PhD Seminar - Jarrett Phillips

Date and Time

Location

J.D. MacLachlan Room 228

Details

Title

Novel Statistical Approaches to Estimating Intraspecific Sample Sizes for Animal DNA Barcoding

Abstract

Determining adequate sample sizes for successful species identification has been recognized as vital since the early days of DNA barcoding; however, deep taxon sampling is often secondary to maximizing the number of different taxa sampled. While general consensus points to sampling 5-10 individuals per species as sufficient for most phylogeographic barcoding applications, this figure is highly constrained by numerous biological and statistical factors, meaning that no one universal sample size can readily be applied across all known taxa. Despite this, few attempts have been made to solve the problem.

The present research addresses sample size determination by developing rigorous statistical models to predict the specimen sample sizes needed to uncover the majority of cytochrome oxidase subunit I (COI) DNA barcode haplotype diversity existing within animal species. Specifically, this work aims to address the issue of sampling sufficiency - the sample size at which sampling accuracy is maximized and above which no new sampling information is likely to be gained. Findings based on a simple Method of Moments estimator show that hundreds to thousands of specimens likely must be randomly sampled to uncover all predicted haplotype diversity in species of ray-finned fishes. While haplotype accumulation curves lend credence to this result, the underlying assumptions of the current model are likely to be overly simplistic.

In this project, existing and new haplotype diversity sample size models will be calibrated with simulations of species haplotype accumulation curves based on established statistical interpolation/extrapolation and regression techniques, specifically Kriging. The goal is to accurately determine the value on the x-axis where haplotype saturation is likely to occur for a variety of representative vertebrate and invertebrate taxa within the Barcode of Life Data Systems (BOLD). The work will be carried out using the statistical platform R. The end result will be the first widely accessible approach allowing the estimation of specimen sample sizes for the accurate and rapid characterization of genetic diversity existing within species.

Advisor: Dr. Daniel Gillis
Co-Advisor: Dr. Robert Hanner

Advisory Committee Member: Dr. Deborah Stacey
Advisory Committee Member: Dr. Graham Taylor
