An Automated Workflow for Mitochondrial DNA Extraction and Analysis from High-Throughput Sequencing Data
Next-generation sequencing data are rich in information and contain many off-target sequences (reads), including mitochondrial reads, that are often ignored but may be biologically relevant. Mitochondrial DNA (mtDNA) is now providing new perspectives on the tree of life and the etiology of common complex diseases. The mtDNA encodes important bioenergetic genes, has a very high mutation rate, and can be present in thousands of copies per cell. The mechanisms by which new mtDNA mutations arise among thousands of other mtDNA copies (a state called heteroplasmy) are poorly understood, and their study is complicated by the presence of nuclear mitochondrial insertions (NUMTs). My research uses current de novo and reference-based methods of mitochondrial genome extraction from high-throughput sequencing data. I implement a workflow that maps reads from a popular next-generation platform (Illumina) to custom-built reference genomes and extracts the NUMTs using Galaxy, an open-source genome analysis platform. Workflows in Galaxy can be shared and published via the web, improving repeatability and data sharing among scientists. I discuss how to extend this workflow to include NUMT insertion rate detection in gene trees and heteroplasmy variant detection and annotation.
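As a rough illustration of the reference-based step, the sketch below selects reads that a mapper has placed on the mitochondrial contig, working directly on SAM-format text lines. It is not the Galaxy workflow itself. The function name `mito_reads`, the contig names `chrM` and `MT`, and the `min_mapq` parameter are assumptions made for this example; the actual contig name depends on the custom reference used. A mapping-quality threshold is only one simple heuristic for discarding ambiguously placed reads and does not by itself separate NUMT-derived reads from true mtDNA reads.

```python
from typing import Iterable, Iterator, Set

# Hypothetical contig names; the real name depends on the reference genome.
MITO_CONTIGS: Set[str] = {"chrM", "MT"}

# SAM FLAG bit 0x4 marks a read as unmapped.
UNMAPPED_FLAG = 0x4


def mito_reads(
    sam_lines: Iterable[str],
    contigs: Set[str] = MITO_CONTIGS,
    min_mapq: int = 0,
) -> Iterator[str]:
    """Yield names of mapped reads aligned to a mitochondrial contig."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has 11 mandatory fields
        flag = int(fields[1])   # FLAG
        rname = fields[2]       # reference (contig) name
        mapq = int(fields[4])   # mapping quality
        if flag & UNMAPPED_FLAG:
            continue
        if rname in contigs and mapq >= min_mapq:
            yield fields[0]     # QNAME (read name)
```

For example, `list(mito_reads(open("sample.sam"), min_mapq=30))` would list the names of confidently mapped mitochondrial reads, where `sample.sam` is a hypothetical file name.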