CARE-AI Seminar Series: Building More Controllable Text-to-Image Generation
Date and Time
Location
Reynolds Building, 1101
Details
Abstract: Text-to-image generation models have recently gained tremendous popularity thanks to their ability to produce accurate, diverse, and even creative images from text prompts. However, text prompts are highly ambiguous as a means of conveying visual control: for example, we might want to generate an image containing "my own backpack", or an image with "my backyard" as the background. Such control signals cannot be well represented as text, so we need diverse types of control signals to complement the text-to-image generation process. Specifically, we work on two novel tasks: (1) subject-driven image generation, where the model must generate images containing a given subject (such as a specific dog or backpack), and (2) subject-driven image editing, where the model must swap or add a given subject into a given scene. We first propose new benchmarks for these two tasks and then propose new training algorithms to address them.
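To make the first task concrete, the sketch below illustrates subject-driven image generation in the DreamBooth style, where a rare placeholder token is bound to a user's subject by fine-tuning, so that ordinary prompts can then reference that specific subject. This is only an illustrative sketch using the Hugging Face diffusers library, not the training algorithm presented in the talk; the checkpoint path and the placeholder token "sks" are assumptions.

```python
# Illustrative sketch only: subject-driven generation via a DreamBooth-style
# placeholder token. Assumes a base checkpoint has already been fine-tuned on
# a handful of photos so that the token "sks" denotes one specific backpack.
import torch
from diffusers import StableDiffusionPipeline

# Load the (hypothetically fine-tuned) text-to-image pipeline; the path
# "./dreambooth-backpack" is a placeholder for the tuned weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-backpack", torch_dtype=torch.float16
).to("cuda")

# The placeholder token "sks" now refers to the user's own backpack, giving
# a control signal that a plain text prompt could not express.
image = pipe("a photo of sks backpack in a snowy backyard").images[0]
image.save("my_backpack.png")
```

Subject-driven editing could analogously be sketched with an inpainting pipeline, masking the region of the scene where the given subject should be swapped in or added.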
Bio: Wenhu Chen is an assistant professor in the David R. Cheriton School of Computer Science at the University of Waterloo, a faculty member at the Vector Institute, a Canada CIFAR AI Chair, and a part-time researcher at Google DeepMind. He obtained his PhD from the Department of Computer Science at the University of California, Santa Barbara, in 2021, and spent a wonderful postdoctoral year at Google Research. His main research interests include natural language processing, large language models, vision-language interaction, and image generation.