Transforming Computer Vision with LLMs


Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. However, to truly understand the world, LLM-powered agents need to be able to see. While vision-language models present a promising pathway to such multimodal understanding, text-only LLMs can also achieve remarkable success on vision tasks through prompting and tool use.

In this talk, Jacob Marks will give an overview of key LLM-centered projects that are transforming the field of computer vision, such as VisProg, ViperGPT, VoxelGPT, and HuggingGPT. He will also discuss his first-hand experience of building VoxelGPT, shedding light on the challenges and lessons learned, as well as a practitioner’s insights into domain-specific prompt engineering. He will conclude with his thoughts on the future of LLMs in computer vision.

This event is open to all and is especially relevant for researchers and practitioners interested in computer vision, generative AI, LLMs, and machine learning. RSVP now for an enlightening session!

Jacob Marks

ML Engineer and Evangelist at Voxel51, ex-Google X

Jacob Marks is an ML Engineer and Developer Evangelist at Voxel51, where he created VoxelGPT, an LLM-powered AI assistant for computer vision. In addition, he leads open source efforts in vector search, semantic search, and generative AI. He has been a Top 10 writer in AI on Medium, with 6,400+ followers. Prior to joining Voxel51, Jacob worked at Google X, Samsung Research, and Wolfram Research. In a past life, he was a theoretical physicist: he completed his Ph.D. at Stanford, where he investigated quantum phases of matter.

