








Vision-Language Models (VLMs) are transforming how AI **sees, understands, and explains the world.** From image captioning to multimodal reasoning, they power the next wave of intelligent applications. In this live session, we'll:

* Break down the fundamentals of how **Vision-Language Models** process both text and images
* Showcase **lightweight VLMs** (SmolVLM, MiniCPM-o, Qwen-VL, etc.) that can run on modest hardware
* Demonstrate real-time examples: **image captioning, visual Q&A, and multimodal retrieval** (see the sketches after this list)
* Compare open-source VLMs with larger commercial ones, and discuss where small models shine
* Share tips on deploying VLMs for **startups, apps, and research projects**

✅ **Format:** Demo-driven walkthrough + interactive Q&A
✅ **Who's it for:** AI engineers, product managers, researchers, and builders curious about multimodal AI
✅ **Takeaway:** A working understanding of VLMs, access to demo notebooks, and ideas for real-world applications
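To give a flavour of the captioning and visual Q&A demos, here is a minimal sketch using a small VLM through Hugging Face `transformers`. The `HuggingFaceTB/SmolVLM-Instruct` checkpoint and the local `photo.jpg` are illustrative assumptions, stand-ins for whatever model and image the session actually uses:

```python
# Minimal image captioning / visual Q&A sketch with a lightweight VLM.
# Assumes the HuggingFaceTB/SmolVLM-Instruct checkpoint and a local photo.jpg.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct", torch_dtype=torch.bfloat16
).to(device)

image = Image.open("photo.jpg").convert("RGB")  # placeholder image path

# Chat-style prompt: the processor interleaves the image with the text.
# Swap the text for a question to get visual Q&A instead of captioning.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The same two-step pattern (processor builds the multimodal prompt, model generates text) carries over to the other small VLMs mentioned above, which is why they are practical on modest hardware.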
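Multimodal retrieval typically uses a dual-encoder model rather than a generative VLM: text and images are embedded into one shared space and ranked by similarity. A minimal sketch with OpenAI's CLIP, assuming placeholder image filenames (the session's notebooks may use a different retrieval model):

```python
# Minimal text-to-image retrieval sketch with CLIP's dual encoder.
# The gallery paths are placeholders; any small local image set works.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

gallery = ["cat.jpg", "dog.jpg", "street.jpg"]  # hypothetical filenames
images = [Image.open(p).convert("RGB") for p in gallery]
query = "a dog playing outside"

# Embed the query and every gallery image, then score the query
# against each image in the shared embedding space.
inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
outputs = model(**inputs)
scores = outputs.logits_per_text.softmax(dim=-1)[0]  # one row per text query

best = scores.argmax().item()
print(f"Best match for {query!r}: {gallery[best]} (p={scores[best]:.2f})")
```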