"Automating my own job: deep learning for population genetic inference"
Feb 17, 2021 Schedule:
- Virtual Tea Time
- 03:00 to 03:30 PM Eastern Time (US and Canada)
- Virtual Colloquium
- 03:30 to 04:30 PM Eastern Time (US and Canada)
Population-scale genomic datasets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date most methods make use of only one or a few summaries of the input sequences and therefore ignore much of the information encoded in the data. In this talk we will demonstrate the enormous potential of adapting techniques from the field of machine learning to population genetic inference. We find that supervised machine learning tools typically outperform more conventional methods in part because of their ability to simultaneously examine of a variety of different facets of genetic variation measured by summary statistics. Modern deep learning tools also have the ability of operating directly on DNA sequence alignments as input, and can perform both evolutionary model selection and parameter estimation. Because these methods do not require explicit evolutionary model-based formulae for summarizing sequence data, they are particularly well suited for problems that have not received detailed theoretical treatments. Thus, when applied to population genetic data, machine learning approaches outperform expert-derived statistical methods, and offer a new path forward in cases where no theory-based approaches exist.