AskBeacon-performing genomic data exchange and analytics with natural language
Journal article   Open access   Peer reviewed

Anuradha Wickramarachchi, Shakila Tonni, Sonali Majumdar, Sarvnaz Karimi, Sulev Kõks, Brendan Hosking, Jordi Rambla, Natalie A Twine, Yatish Jain and Denis C Bauer
Bioinformatics (Oxford, England), Vol.41(3), btaf079
2025
PMID: 39985504
CC BY V4.0 Open Access

Abstract

Keywords: Databases, Genetic; Female; Genomics - methods; Humans; Male; Natural Language Processing; Parkinson Disease - genetics; Software
Enabling clinicians and researchers to directly interact with global genomic data resources by removing technological barriers is vital for medical genomics. AskBeacon enables large language models (LLMs) to be applied to securely shared cohorts via the Global Alliance for Genomics and Health Beacon protocol. By simply "asking" Beacon, actionable insights can be gained, analyzed, and made publication-ready. In the Parkinson's Progression Markers Initiative (PPMI), we use natural language to ask whether the sex differences observed in Parkinson's disease are due to X-linked or autosomal markers. AskBeacon returns a publication-ready visualization showing that for PPMI the autosomal marker occurred 1.4 times more often in males with Parkinson's disease than in females, whereas the X-linked marker showed no difference. We evaluate commercial and open-weight LLMs, as well as different architectures, to identify the best strategy for translating research questions to Beacon queries. AskBeacon implements extensive safety guardrails to ensure that genomic data is not exposed to the LLM directly, and that the code generated for data extraction, analysis, and visualization is sanitized and hallucination-resistant, so data cannot be leaked or falsified. AskBeacon is available at https://github.com/aehrc/AskBeacon.
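
To make the idea of translating a research question into a Beacon query more concrete, the following is a minimal sketch of what a count-only request body and the male-to-female comparison might look like. It is not taken from the AskBeacon code base. The function names, the ontology codes, and the reading of "1.4 times more often" as a ratio of carrier frequencies are assumptions made for illustration, and the request layout follows the general Beacon v2 pattern of a `meta` block plus a `query` block with `filters`.

```python
import json

# Hypothetical ontology codes, chosen for illustration only; a real Beacon
# advertises the filtering terms it actually supports.
SEX_TERMS = {"male": "NCIT:C20197", "female": "NCIT:C16576"}
PARKINSON_TERM = "MONDO:0005180"


def build_count_query(sex: str, disease_term: str = PARKINSON_TERM) -> dict:
    """Build a Beacon v2-style request body that asks only for a count.

    Requesting counts rather than records means no individual-level
    genomic data is returned.
    """
    return {
        "meta": {"apiVersion": "v2.0"},
        "query": {
            "filters": [{"id": SEX_TERMS[sex]}, {"id": disease_term}],
            "requestedGranularity": "count",
        },
    }


def carrier_ratio(male_carriers: int, male_total: int,
                  female_carriers: int, female_total: int) -> float:
    """Ratio of the marker's carrier frequency in males to that in females."""
    male_freq = male_carriers / male_total
    female_freq = female_carriers / female_total
    return male_freq / female_freq


if __name__ == "__main__":
    # Print the request body that would be sent for the male cohort.
    print(json.dumps(build_count_query("male"), indent=2))
```

In this sketch, one such query would be issued per sex and per marker, and the returned counts would be passed to `carrier_ratio`; a ratio near 1 would correspond to the "no difference" result reported for the X-linked marker.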

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Collaboration types
Domestic collaboration
International collaboration
Citation topics
6 Social Sciences
6.238 Bibliometrics, Scientometrics & Research Integrity
6.238.2805 Cultural Narratives
Web Of Science research areas
Biochemical Research Methods
Biotechnology & Applied Microbiology
Computer Science, Interdisciplinary Applications
Mathematical & Computational Biology
Statistics & Probability
ESI research areas
Biology & Biochemistry