(External) Development of a toolset for the efficient analysis, characterization, and reporting of viral genomics data
Advisor: Oliver Lung, Canadian Food Inspection Agency
Suggested co-advisors: Zvonimir Poljak
For this project, the student will help develop a data analysis platform for analyzing metagenomic and viral genomic datasets. These datasets and their associated metadata are becoming increasingly complex. In addition, a wealth of supporting information, such as protein functions, the functional roles of specific mutations, and data generated through antigenic cartography, is being compiled at an accelerating rate for important zoonotic viral pathogens such as SARS-CoV-2, influenza, and mpox. Integrating these sources of information with other tools will be key to generating reports that summarize the main insights about the viral sequences under evaluation. Generating such reports is normally difficult because it requires highly qualified personnel. By developing this platform, the student(s) working on this project will help simplify some aspects of data analysis, thereby improving turnaround time, the reporting of key findings, and efficiency. Ultimately, this work will allow resources at various levels to be reallocated to more complex computational tasks. This is particularly important at the National Centre for Foreign Animal Disease, where it will improve our efficiency in analyzing and characterizing current, novel, and emerging high-consequence viral pathogens. The work will also benefit health agencies in developing nations, where access to high-quality analytical and bioinformatic services and resources is often limited. Finally, students will further develop their Python programming skills and learn how to integrate machine learning models and other advanced analytical approaches into a browser-based toolkit.
This project is suitable for one or two semesters. The student is required to be on-site occasionally.
Knowledge/Skills
Python programming; familiarity with training scikit-learn models; familiarity with creating visualizations in Python (e.g., scatterplots, waterfall plots); an understanding of common statistical methods used in genomics and metabarcoding research; and familiarity with the Ubuntu command line.