(External) Development of an AI-based tool for the design of heterologous DNA sequences used in biopharming
Advisor: Ashley Meyers, PlantForm Corporation
Suggested co-advisors: Emma Allen-Vercoe, Joe Colasanti, Milad Eskandari, Jennifer Geddes-McAlister, Thomas Graham
PlantForm Corporation is a privately owned Canadian biotechnology start-up established in 2008 out of the University of Guelph. The company is based on technology developed by Professor Chris Hall’s group to express heterologous proteins through transient Agrobacterium-mediated infection of Nicotiana benthamiana. Its products are targeted at human therapeutics (monoclonal antibodies and enzymes), veterinary vaccines, and medical countermeasures. One of the most significant challenges PlantForm faces is product yield, which underpins the commercial viability of biopharmaceuticals produced in this plant-based expression system.
Expression yield from a heterologous gene sequence is influenced by many disparate factors, one of which is the design of the DNA sequence encoding the target of interest. Most PlantForm products are natively produced in humans or in commercially important animals, whose codon usage and gene architecture differ from those of N. benthamiana, our production host. In this project, we propose to use bioinformatics tools to review the distribution of codon usage across a selection of genes encoding proteins known to be well expressed in this plant. In a previous project, we showed that these genes do not differ detectably in overall codon usage from a genome-wide analysis such as that reported in the Kazusa Codon Usage Database. We therefore propose to look for general ‘rules’ governing codon selection for each amino acid in the expressed protein. In simple terms: once a particular codon has been used for a given amino acid, how many further instances of that amino acid occur in the protein before the same codon is used again? The analysis should also cover how often, and in what sequence context, ‘rare’ codons are used. Secondly, we would like to determine whether common codons are preferentially used in the first (5′) part of the cDNA sequence.
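The codon-reuse question and the 5′ bias check could be prototyped along the lines of the Python sketch below. It is a minimal illustration under stated assumptions: the input is an in-frame coding sequence, the standard genetic code applies, and the codon usage table is supplied by the user (for example, N. benthamiana frequencies exported from the Kazusa database). The function names, the "gap" definition (number of intervening occurrences of the same amino acid, so 0 means immediate reuse) and the 50-codon default window are hypothetical choices, not an established method.

```python
from collections import defaultdict
from statistics import mean

# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons (incomplete trailing codon ignored)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_reuse_gaps(cds: str) -> dict[str, list[int]]:
    """For each amino acid, record how many other occurrences of that amino acid
    fall between consecutive uses of the same codon (0 = immediate reuse)."""
    last_seen: dict[str, int] = {}               # codon -> occurrence index of its amino acid
    aa_count: dict[str, int] = defaultdict(int)  # amino acid -> occurrences seen so far
    gaps: dict[str, list[int]] = defaultdict(list)
    for codon in split_codons(cds):
        aa = CODON_TO_AA.get(codon)
        if aa is None or aa == "*":
            continue  # skip ambiguous triplets and stop codons
        idx = aa_count[aa]
        if codon in last_seen:
            gaps[aa].append(idx - last_seen[codon] - 1)
        last_seen[codon] = idx
        aa_count[aa] += 1
    return dict(gaps)


def relative_usage(usage: dict[str, float]) -> dict[str, float]:
    """Convert raw codon frequencies into each codon's share of its amino acid's total."""
    totals: dict[str, float] = defaultdict(float)
    for codon, freq in usage.items():
        if codon in CODON_TO_AA:
            totals[CODON_TO_AA[codon]] += freq
    return {
        codon: freq / totals[CODON_TO_AA[codon]]
        for codon, freq in usage.items()
        if codon in CODON_TO_AA and totals[CODON_TO_AA[codon]] > 0
    }


def five_prime_bias(cds: str, usage: dict[str, float], window: int = 50) -> tuple[float, float]:
    """Mean relative codon usage in the first `window` codons versus the rest of the CDS."""
    rel = relative_usage(usage)
    codons = split_codons(cds)

    def scores(chunk: list[str]) -> list[float]:
        # Skip stops and single-codon amino acids (Met, Trp), whose share is always 1.
        return [rel[c] for c in chunk if c in rel and CODON_TO_AA[c] not in "*MW"]

    head, tail = scores(codons[:window]), scores(codons[window:])
    return (mean(head) if head else float("nan"), mean(tail) if tail else float("nan"))
```

For example, `codon_reuse_gaps("GCTGCCGCTGCAGCTAAAAAAAAG")` returns `{'A': [1, 1], 'K': [0]}`: each reuse of GCT is separated by one other alanine codon, and AAA is reused immediately. Pooling these gaps across the well-expressed gene set would give the per-amino-acid distributions the question above asks about.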
We would also like to examine the native DNA sequence of a selected product of interest to PlantForm. This sequence will be ‘mapped’ for codon usage, with particular attention to where rarely used codons are located in the cDNA. By the end of the project, we aim to be able to rationally design a DNA sequence that closely mimics the native sequence while applying N. benthamiana codon usage preferences. In practice, we would design a series of sequences, all encoding the same target protein, chosen to span a range of predicted expression outcomes. For example, the series could include the native DNA sequence, a small number of sequences designed with currently available N. benthamiana codon-optimisation tools, sequences built from randomly selected synonymous codons, and at least one sequence based on the rules derived from the bioinformatics analysis described above. Ultimately, we hope to reach a point where we could move on to developing an AI-based tool that designs optimised DNA sequences for expressing heterologous proteins in the plant-based platform.
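Mapping rare codons in a native sequence, and generating the simpler comparison sequences for the design series, could start from something like the Python sketch below. It assumes the standard genetic code, an in-frame cDNA, and a user-supplied codon usage table (for example, N. benthamiana frequencies from the Kazusa database). The function names, the 10% rarity threshold, and the usage-weighted sampling option are hypothetical illustrations, not PlantForm's method or the behaviour of any existing codon-optimisation tool.

```python
import random
from collections import defaultdict
from typing import Optional

# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS: dict[str, list[str]] = defaultdict(list)  # amino acid -> all its codons
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS[_aa].append(_codon)


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons (incomplete trailing codon ignored)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_usage(usage: dict[str, float]) -> dict[str, float]:
    """Convert raw codon frequencies into each codon's share of its amino acid's total."""
    totals: dict[str, float] = defaultdict(float)
    for codon, freq in usage.items():
        if codon in CODON_TO_AA:
            totals[CODON_TO_AA[codon]] += freq
    return {
        codon: freq / totals[CODON_TO_AA[codon]]
        for codon, freq in usage.items()
        if codon in CODON_TO_AA and totals[CODON_TO_AA[codon]] > 0
    }


def rare_codon_map(cds: str, usage: dict[str, float],
                   threshold: float = 0.1) -> list[tuple[int, str, str, float]]:
    """List (1-based codon position, codon, amino acid, share) for every codon whose
    share of its amino acid's usage is below `threshold` (an arbitrary cut-off)."""
    rel = relative_usage(usage)
    hits = []
    for pos, codon in enumerate(split_codons(cds), start=1):
        share = rel.get(codon)
        if share is not None and share < threshold:
            hits.append((pos, codon, CODON_TO_AA[codon], share))
    return hits


def design_variant(cds: str, usage: Optional[dict[str, float]] = None,
                   seed: Optional[int] = None) -> str:
    """Recode every sense codon with a synonymous codon, preserving the protein.

    usage=None: pick uniformly among synonyms (a 'randomly selected codons' control).
    usage given: sample synonyms in proportion to their frequency in the table.
    Stop codons and non-ACGT triplets are copied unchanged.
    """
    rng = random.Random(seed)
    out = []
    for codon in split_codons(cds):
        aa = CODON_TO_AA.get(codon)
        if aa is None or aa == "*":
            out.append(codon)
            continue
        synonyms = SYNONYMS[aa]
        weights = [usage.get(c, 0.0) for c in synonyms] if usage else None
        if weights is not None and sum(weights) <= 0:
            weights = None  # no usage data for this amino acid: fall back to uniform
        out.append(rng.choices(synonyms, weights=weights, k=1)[0])
    return "".join(out)
```

Plotting the positions returned by `rare_codon_map` along the cDNA gives the ‘map’ described above, while calling `design_variant` with different seeds produces several random-codon controls for the same protein. The weighted mode is only one simple way to apply a usage table; designs from dedicated codon-optimisation tools and from the derived rules would be produced separately.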
This is a one-semester project.
Knowledge/Skills
- Experience conducting bibliographic research
- Programming skills
- Highly computer literate
- Strong teamwork skills