Abstract
Background
Anorexia nervosa (AN) is a polygenic, severe metabopsychiatric disorder with poorly understood aetiology. Eight genome-wide significant loci have been identified, and SNP-based heritability is estimated at ∼11–17% (Watson et al., 2019). A key challenge in understanding the mechanisms through which these loci alter AN risk is uncertainty about which variants in each risk region drive the association: because of linkage disequilibrium (LD) across these regions, variants other than the most strongly associated SNP may be responsible. It is therefore important to define the full spectrum of genetic variants in these regions. Advances in long-read sequencing technologies, such as nanopore sequencing, offer an unprecedented opportunity to interrogate complex genetic variation in individuals at base-pair resolution and with high accuracy.
Methods
We implemented a novel pilot approach using targeted nanopore sequencing with adaptive sampling (Oxford Nanopore Technologies) of the eight AN-associated loci in 10 individuals with AN. Sequencing data were aligned to the human reference genome GRCh38. To characterise these loci, we leveraged a variety of publicly available variant calling algorithms and databases, including LD calculation with the newly developed TopLD tool for SNPs identified in the most recent AN genome-wide association study (GWAS), and variant annotation with the Variant Effect Predictor (VEP). The primary focus was on less well characterised variant types, including retrotransposons, tandem repeats, and short tandem repeats, located in regions of high LD and carrying functional variant annotations.
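To make the LD reasoning concrete, the following is a minimal sketch of the standard pairwise r² between two biallelic sites computed from phased haplotypes, followed by a filter that keeps candidate sites in high LD with a lead SNP. It assumes 0/1 allele coding and an illustrative r² ≥ 0.8 cut-off; the function names `ld_r2` and `in_high_ld` are hypothetical, and this is the textbook estimator, not TopLD's implementation or the study's prioritisation criteria.

```python
import math


def ld_r2(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic sites.

    hap_a and hap_b are aligned lists of phased alleles (0 = ref, 1 = alt),
    one entry per haplotype. Textbook estimator only; not TopLD's method.
    """
    if not hap_a or len(hap_a) != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # alt-allele frequency at site A
    p_b = sum(hap_b) / n  # alt-allele frequency at site B
    # Frequency of haplotypes carrying the alt allele at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan  # r^2 is undefined when either site is monomorphic
    return d * d / denom


def in_high_ld(lead_hap, candidates, threshold=0.8):
    """Return names of candidate sites with r^2 >= threshold to the lead SNP.

    The 0.8 cut-off is an illustrative assumption, not the study's criterion.
    """
    selected = []
    for name, hap in candidates.items():
        r2 = ld_r2(lead_hap, hap)
        if not math.isnan(r2) and r2 >= threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    lead = [0, 0, 1, 1]
    # Hypothetical candidate sites: v1 is perfectly correlated with the lead,
    # v2 is independent of it, and v3 is monomorphic.
    candidates = {"v1": [0, 0, 1, 1], "v2": [0, 1, 0, 1], "v3": [0, 0, 0, 0]}
    print(in_high_ld(lead, candidates))  # -> ['v1']
```

Because r² squares D, sites in perfect negative LD with the lead SNP (alleles always on opposite haplotypes) also score 1.0, which is why a strongly associated lead SNP cannot by itself distinguish which correlated variant is causal.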
Results
Target regions were clearly enriched (average coverage per sample ≥10.3X) and contained putatively relevant variants. Prioritised variants occurred primarily in introns and intergenic regions; one variant overlapping FOXP1 lay in the 3ʹ UTR. Notably, we identified a ∼2,000 bp SINE-VNTR-Alu subfamily D (SVA-D) element in the lead GWAS target region, in the intergenic region between IP6K2 and PRKAR2A. This element showed length variation (both expansions and contractions) in all samples and bordered a sharp decrease in LD. It overlaps several putative regulatory elements and interaction regions for surrounding genes.
Discussion
We successfully applied adaptive sampling to uncover genetic variation at base-pair resolution in eight selected AN-associated loci. Our results highlight the potential of this technology to uncover novel or complex variation, not captured by GWAS, that may underpin risk regions. We present a subset of prioritised variants as examples of targets for investigation. While preliminary, variants such as the polymorphic SVA-D element in the lead GWAS locus could contribute to mechanisms of phenotypic risk. Based on protein, gene, and regulatory annotations in the UCSC Genome Browser, we speculate that this SVA variant may influence the regulation of several neighbouring genes, including genes already implicated as plausible contributors to AN biology. This exploratory study was not powered to detect significant effects, and its results should be interpreted cautiously. Nevertheless, the rich variation with putative regulatory effects captured in this pilot, much of which has not previously been explored, provides new opportunities to improve understanding of genetic risk mechanisms in AN. We aim to continue characterising these types of genetic elements in larger, more diverse cohorts to better understand whether and how these variants may contribute to risk.