Polymerase chain reaction (PCR) amplicon sequencing allows reliable identification of an organism by amplifying and analyzing a single conserved marker gene or DNA barcode. Because this approach generally involves a single gene, its protocol is easier to run for diagnostic purposes than multilocus or whole-genome sequencing, while remaining highly reliable. Sanger-based high-quality amplicon sequencing is therefore widely deployed for species identification and high-throughput biosecurity surveillance. However, data analysis can become a limiting factor in large-scale surveillance or diagnostic settings because it involves manual quality control of the raw sequencing data, alignment of the forward and reverse reads, and, finally, a web-based BLASTN search of all the amplicons. Here, we present a bioinformatics pipeline that automates the entire analysis. As a result, the analysis is reproducible and scales to a high volume of samples. Furthermore, the pipeline is built on the open-source tools Nextflow and Singularity; thus, apart from Nextflow and Singularity themselves, it requires no software installation, no paid commercial software, and no programming expertise from end users, making it widely adaptable.
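To make the manual steps named above concrete, the following minimal Python sketch illustrates quality trimming and forward/reverse read merging on toy data. It is an illustration only and not sangerFlow's implementation: the function names, the Phred threshold of 20, the minimum overlap of 10 bases, and the use of exact overlap matching (rather than a proper pairwise alignment) are all assumptions made for this example. The BLASTN search step is omitted.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from the sangerFlow pipeline.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def trim_by_quality(seq, quals, min_q=20):
    """Keep the span between the first and last base with Phred >= min_q."""
    good = [i for i, q in enumerate(quals) if q >= min_q]
    if not good:
        return ""  # no base passed the threshold
    return seq[good[0]:good[-1] + 1]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_reads(forward, reverse, min_overlap=10):
    """Merge a forward read with a reverse read from the opposite strand.

    Simplification: the reverse read is reverse-complemented and joined at
    the longest exact suffix/prefix overlap. Real reads contain errors, so
    production tools use alignment instead of exact matching.
    """
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None  # no overlap of at least min_overlap bases


if __name__ == "__main__":
    # Toy amplicon; the two reads overlap by 12 bases.
    amplicon = "ACGTACGTTAGCCGATAGCTTTGACCA"
    fwd = amplicon[:20]
    rev = reverse_complement(amplicon[8:])
    print(merge_reads(fwd, rev) == amplicon)
    print(trim_by_quality("NNACGTNN", [5, 8, 30, 35, 40, 32, 9, 3]))
```

In this sketch, a pair of reads that fails to merge returns `None`, which a batch workflow could use to flag samples for manual review.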
Details
Title
sangerFlow, an Automated Bioinformatics Pipeline to Analyze Sanger Amplicon Sequencing Data for Pest and Pathogen Diagnosis
Authors/Creators
M. Asaduzzaman Prodhan
Matthew Power - Department of Primary Industries and Regional Development
Monica Kehoe - Department of Primary Industries and Regional Development