(Internal) Fine-tuning Vision-Language Models for Biodiversity Image Captioning

Advisor: Graham Taylor, School of Engineering

Proposed biological advisors: Joey Bernhardt, Alex Smith, Dirk Steinke

In collaboration with the Vector Institute, this project aims to improve how artificial intelligence (AI) models understand and describe images of plants, animals, and other organisms in their natural habitats. By fine-tuning existing vision-language models, we are building a specialized system that generates detailed, biologically accurate captions for biodiversity images.

Imagine an AI that doesn't just see a "bird" but recognizes "a ruby-throated hummingbird (Archilochus colubris) hovering near a red trumpet-shaped flower, its iridescent green back and forked tail visible against a blurred forest background." Our goal is to develop an AI system capable of producing such rich, scientifically relevant descriptions from images of diverse species.

This enhanced image captioner will serve as a crucial component in a larger biodiversity analysis system that leverages state-of-the-art large language models and retrieval-augmented generation (RAG) techniques. By generating detailed descriptions of organisms, their features, and their environments, our AI will assist researchers, conservationists, and citizen scientists in species identification, ecological studies, and biodiversity monitoring.
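To make the role of the captioner concrete, here is a minimal sketch of how a generated caption might feed the retrieval step of a RAG pipeline. It is a toy illustration under stated assumptions, not the project's design: the reference notes, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the bag-of-words similarity are all hypothetical stand-ins for a curated knowledge base and a learned embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical reference notes; a real system would query a curated
# biodiversity knowledge base rather than a hard-coded dictionary.
REFERENCE_NOTES = {
    "Archilochus colubris": "ruby-throated hummingbird iridescent green back "
                            "forked tail feeds at red tubular flowers",
    "Danaus plexippus": "monarch butterfly orange wings black veins "
                        "white spots feeds on milkweed",
    "Bombus impatiens": "common eastern bumble bee black body yellow thorax "
                        "visits many flowers",
}


def tokenize(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(caption, notes, k=1):
    """Return the k species whose notes best match the caption."""
    query = tokenize(caption)
    ranked = sorted(notes,
                    key=lambda name: cosine(query, tokenize(notes[name])),
                    reverse=True)
    return ranked[:k]


def build_prompt(caption, notes, k=1):
    """Combine the caption and retrieved notes into a prompt for an LLM."""
    names = retrieve(caption, notes, k)
    context = "\n".join(f"- {name}: {notes[name]}" for name in names)
    return (f"Image caption: {caption}\n"
            f"Reference notes:\n{context}\n"
            "Question: which species is most likely shown?")


caption = ("a hummingbird hovering near a red trumpet-shaped flower, its "
           "iridescent green back and forked tail visible")
print(build_prompt(caption, REFERENCE_NOTES))
```

The more specific the caption (colours, tail shape, flower type), the more the retrieval step has to match against, which is the motivation for fine-tuning the captioner on biodiversity imagery.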

The project bridges the gap between cutting-edge AI technology and biodiversity informatics, potentially accelerating scientific discovery and conservation efforts. It promises to make the vast amount of biodiversity imagery collected worldwide more accessible and analyzable, supporting critical research on ecosystems, species distributions, and the impacts of environmental changes.

By improving how AI sees and describes the natural world, we're taking a significant step toward more efficient, accurate, and comprehensive biodiversity research and conservation strategies.

This project is suitable for one or two semesters.

Knowledge/Skills

Essential Qualifications:

  1. Strong programming skills, particularly in Python
  2. Familiarity with computer vision techniques and models
  3. Understanding of natural language processing (NLP) concepts
  4. Experience with large language models and their applications
  5. Familiarity with biodiversity informatics and taxonomic classification systems
  6. Basic understanding of biology, ecology, and species identification principles
  7. Experience with data preprocessing and dataset curation
  8. Experience with version control systems (e.g., Git)
  9. Familiarity with scientific writing and technical documentation

Preferred Qualifications:

  1. Experience with deep learning frameworks (e.g., PyTorch, JAX)
  2. Knowledge of retrieval-augmented generation (RAG) systems
  3. Understanding of model fine-tuning techniques
  4. Experience with prompt engineering for large language models
  5. Basic knowledge of REST APIs and web services
  6. Basic understanding of high-performance computing environments
  7. Knowledge of data visualization techniques
  8. Familiarity with biodiversity databases (e.g., BOLD, iNaturalist)