(Internal) A bioinformatic investigation into the evolution of double-stranded DNA bacterial viruses

Advisor: Andrew Kropinski, Pathobiology

Proposed computational advisors: Nicole Ricker, Zvonimir Poljak

 

Introduction:

"British evolutionary biologist and geneticist J.B.S. Haldane quipped that if a god or divine being had created all living organisms on Earth, then that creator must have an “inordinate fondness for beetles.” Beetles (phylum Arthropoda, class Insecta, order Coleoptera) account for a greater number of species than any other single group of living animal (Weird Science: An Inordinate Fondness for Beetles).” This was before anybody started to enumerate bacterial viruses (bacteriophages, phages) which are the most abundant lifeform on earth with numbers in excess of 1031. The most common phages belong to the Class Caudoviricetes which possess dsDNA genomes and tails (article link).

Most of these viruses employ one of two lifestyles: lytic (virulent), in which the infected cell is killed, releasing new virus particles; or temperate, in which the virus may suppress its lytic tendency and coexist with its host. We do not know how these viruses originated, though one theory posits that they evolved from their hosts. Since they are obligate parasites dependent on their host’s transcription, translation and DNA replication machinery, one might expect their genomes to share characteristics with those of their hosts to optimize efficiency. One such characteristic is the guanine plus cytosine content of the DNA, referred to as the mol%GC. In the 1960s and 1970s this was estimated physico-chemically, but it can now be calculated directly from the DNA sequence. A preliminary study carried out in the early 1970s suggested a positive correlation between the mol%GC of a phage and that of its host. The current research will explore this relationship with data from fully sequenced bacteriophages and their hosts.
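
As an illustration of how mol%GC can be calculated from sequence, the minimal Python sketch below computes 100 × (G + C) / (A + C + G + T) for a single sequence string. The function name gc_percent is hypothetical, and the choice to ignore ambiguous bases such as N (rather than count them in the denominator) is an assumption that should be agreed with the supervisory team.

```python
def gc_percent(sequence: str) -> float:
    """Return the mol%GC of a DNA sequence given as a string.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    ambiguity codes such as N do not affect the result. Case is ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total
```

For example, gc_percent("ATGC") gives 50.0. For whole genomes the same calculation would be applied to each sequence record read from a FASTA or GenBank file.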

Experimental outline:

  1. Familiarize yourself with phage genome and taxonomy resources, including the NCBI Virus portal and the ICTV database with its MSL (Master Species List) and VMR (Virus Metadata Resource).
  2. Familiarize yourself with bacterial genome and taxonomy resources, including NCBI Microbial Genomes, the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), GCM (Global Catalogue of Microorganisms), and GOLD (Genomes OnLine Database).
  3. Discuss with the supervisory team the scope (analysis of all phages and their hosts, or a taxonomic subset) and the statistical approaches to be taken.
  4. Recover the desired data from the appropriate databases in the format required for analysis. N.B. This may necessitate writing a script to analyze bulk-downloaded data or to query the databases themselves.
  5. Carry out appropriate statistical analysis.
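
Steps 4 and 5 might be sketched as follows, assuming the recovered data have been reduced to a tab-separated table with one phage-host pair per row. The column names phage_gc and host_gc, the function names, and the example rows are all hypothetical; the values are synthetic placeholders used only to exercise the code, not measurements. Pearson's r is shown as one possible statistic, and the actual choice of method is left to the discussion in step 3.

```python
import csv
import io
import math


def load_gc_pairs(handle):
    """Read (phage_gc, host_gc) pairs from a tab-separated file handle.

    Assumption: the table has header columns named phage_gc and host_gc.
    Rows with a missing or non-numeric value in either column are skipped.
    """
    phage_gc, host_gc = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p = float(row["phage_gc"])
            h = float(row["host_gc"])
        except (TypeError, ValueError):
            continue  # incomplete record: leave it out of the analysis
        phage_gc.append(p)
        host_gc.append(h)
    return phage_gc, host_gc


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Synthetic placeholder table (NOT real data); the last row lacks a phage value.
EXAMPLE_TSV = (
    "phage_name\thost_name\tphage_gc\thost_gc\n"
    "phage_A\thost_A\t40.0\t42.0\n"
    "phage_B\thost_B\t50.0\t51.0\n"
    "phage_C\thost_C\t60.0\t66.0\n"
    "phage_D\thost_D\t\t55.0\n"
)

phage_values, host_values = load_gc_pairs(io.StringIO(EXAMPLE_TSV))
r = pearson_r(phage_values, host_values)
print(f"pairs used: {len(phage_values)}, Pearson r = {r:.3f}")
```

In practice the in-memory example would be replaced by an opened file, and a rank-based statistic or a model that accounts for taxonomic grouping may be more appropriate if the pairs are not independent.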

This is a one-semester project. The student is required to occasionally be on-site.

Knowledge/Skills:

Programming, statistics