(Internal) Fine-tuning Vision-Language Models for Biodiversity Image Captioning

Advisor: Graham Taylor, School of Engineering

Proposed biological advisors: Joey Bernhardt, Alex Smith, Dirk Steinke

In collaboration with the Vector Institute, this project aims to revolutionize how artificial intelligence (AI) understands and describes images of plants, animals, and other living organisms in their natural habitats. By fine-tuning advanced AI vision-language models, we're creating a specialized system that can generate detailed, biologically accurate captions for biodiversity images.

Imagine an AI that doesn't just see a "bird" but recognizes "a ruby-throated hummingbird (Archilochus colubris) hovering near a red trumpet-shaped flower, its iridescent green back and forked tail visible against a blurred forest background." Our goal is to develop an AI system capable of producing such rich, scientifically relevant descriptions from images of diverse species.

This enhanced image captioner will serve as a crucial component in a larger biodiversity analysis system that leverages state-of-the-art large language models and retrieval-augmented generation (RAG) techniques. By generating detailed descriptions of organisms, their features, and their environments, our AI will assist researchers, conservationists, and citizen scientists in species identification, ecological studies, and biodiversity monitoring.

The project bridges the gap between cutting-edge AI technology and biodiversity informatics, potentially accelerating scientific discovery and conservation efforts. It aims to make the large volume of biodiversity imagery collected worldwide more accessible and analyzable, supporting research on ecosystems, species distributions, and the impacts of environmental change.

By improving how AI sees and describes the natural world, we're taking a significant step toward more efficient, accurate, and comprehensive biodiversity research and conservation strategies.

This project is suitable for one or two semesters.

Knowledge/Skills

Essential Qualifications:

Strong programming skills, particularly in Python
Familiarity with computer vision techniques and models
Understanding of natural language processing (NLP) concepts
Experience with large language models and their applications
Familiarity with biodiversity informatics and taxonomic classification systems
Basic understanding of biology, ecology, and species identification principles
Experience with data preprocessing and dataset curation
Experience with version control systems (e.g., Git)
Familiarity with scientific writing and technical documentation

Preferred Qualifications:

Experience with deep learning frameworks (e.g., PyTorch, JAX)
Knowledge of retrieval-augmented generation (RAG) systems
Understanding of model fine-tuning techniques
Experience with prompt engineering for large language models
Basic knowledge of REST APIs and web services
Basic understanding of high-performance computing environments
Knowledge of data visualization techniques
Familiarity with biodiversity databases (e.g., BOLD, iNaturalist)


Graduate Program in Bioinformatics
