PhD Seminar: Eddie Ma

Date and Time

Location

JD MacLachlan 228

Details

Title: Automated Error Correction for DNA Sequencing using Neural Networks: Basecall Validation, Deletion, Resolving Ambiguity, and Prospective Editing Activities 

Abstract:

Artificial Neural Networks (ANNs) are steadily gaining traction as a flexible and versatile strategy for sequence analysis, image analysis, and time series analysis. Although sequencing technologies vary, the data emitted by DNA sequencing platforms are fully compatible with ANNs: they are time series data that can be formatted as real-valued vector windows for use in classification and regression tasks. Identifying and labelling DNA bases from sequencing sensor data is termed basecalling, and it is usually paired with an estimate of confidence for each labelled base. The basecaller used is particular to the sequencing platform. While basecalling from Sanger sequencing time series data using the modern KB and PHRED algorithms is accurate and consistent, a human editor must still do further work to obtain an inspected and finished DNA sequence suitable for use in vast DNA libraries. In this seminar, I describe two novel artificial neural network strategies for post-processing basecalled sequences that can alleviate some of the work required of the human editor. The DNA sequences and Sanger sequencing sensor data used in the training and test subsets are sampled from Barcode of Life Data Systems, the most comprehensive online DNA barcoding database and analytics platform. The ANNs described operate on both the initially basecalled sequences and the sensor data emitted from a Sanger sequencing platform. In particular, three tasks are demonstrated: validating emitted bases, deleting erroneously called bases, and resolving ambiguous base labels marked with a low confidence value. Additional sequence finishing activities currently performed by the human editor that similar ANN systems could automate in future are also discussed.
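
As an illustration of what formatting trace data as real-valued vector windows might look like, here is a minimal sketch. It is not the method presented in the seminar. The function name `extract_windows`, the four-channel layout, the channel order, the zero padding at the edges, and the window size are all assumptions chosen for the example.

```python
import numpy as np

def extract_windows(trace, peak_positions, half_width=5):
    """Cut one fixed-size window around each basecall position.

    trace: array of shape (n_samples, 4), one column per dye channel
           (hypothetical layout, assumed order A, C, G, T).
    peak_positions: sample indices at which a basecaller placed a base.
    Returns an array of shape (n_bases, (2 * half_width + 1) * 4).
    """
    trace = np.asarray(trace, dtype=float)
    # Zero-pad both ends so windows near the edges keep the same length.
    padded = np.pad(trace, ((half_width, half_width), (0, 0)), mode="constant")
    width = 2 * half_width + 1
    # In the padded array, index p corresponds to original sample p - half_width,
    # so padded[p:p + width] is centred on original sample p.
    windows = [padded[p:p + width].ravel() for p in peak_positions]
    return np.array(windows)

# Toy trace: 10 samples, 4 channels, filled with consecutive integers.
toy_trace = np.arange(40).reshape(10, 4)
windows = extract_windows(toy_trace, peak_positions=[0, 5, 9], half_width=2)
print(windows.shape)
```

Each row is one flattened window that could be passed to a classifier or regressor as a fixed-length input vector, for example to decide whether the base called at that position should be kept, deleted, or relabelled.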

Advisor: Dr. Stefan Kremer
Advisory Committee Member: Dr. David Calvert
Non-Advisory Committee Member: Dr. Andrew Hamilton-Wright
