F5HW+FGX, Vaiaku, Tuvalu
Tired of slow inference and complex serving pipelines? Join us for a **live hands-on demo of vLLM**, the high-performance inference engine designed for large language models.

In this session, you’ll learn:

* How to install and configure **vLLM** step by step (see the minimal sketch after this list)
* Best practices for serving models efficiently with **continuous batching and PagedAttention**
* How vLLM compares to traditional serving frameworks like TGI (Text Generation Inference) and Hugging Face Inference Endpoints
* Tips for running vLLM locally and scaling on the cloud

This is a **practical, no-fluff workshop**: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.

🔹 **Format:** Live coding + Q&A
🔹 **Who’s it for:** AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 **Takeaway:** A working vLLM setup and a deeper understanding of efficient LLM serving
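As a small preview of the hands-on portion, here is a minimal sketch of vLLM’s offline Python API. It assumes `pip install vllm` and a CUDA-capable GPU; the model name and sampling settings below are placeholders for illustration, not necessarily what the workshop will use:

```python
# Minimal vLLM offline-inference sketch.
# Placeholder model: swap in whatever checkpoint you plan to serve.
from vllm import LLM, SamplingParams

# Loads the weights and sets up the PagedAttention-backed KV cache.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM schedules these prompts together via continuous batching,
# so throughput scales well beyond one-request-at-a-time serving.
prompts = [
    "The key advantage of PagedAttention is",
    "Serving LLMs in production requires",
]
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text)
```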
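For the serving side, vLLM ships an OpenAI-compatible HTTP server (started with `vllm serve <model>`). The sketch below queries it with the official `openai` client; the port is vLLM’s default, and the model name again is just an example that must match whatever the server loaded:

```python
# Query a running vLLM OpenAI-compatible server, e.g. started with:
#   vllm serve facebook/opt-125m
# Assumes the default port 8000. vLLM ignores the API key, but the
# client library requires one to be set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="facebook/opt-125m",  # must match the model the server loaded
    prompt="vLLM makes serving efficient because",
    max_tokens=64,
)
print(resp.choices[0].text)
```

Because the endpoint speaks the OpenAI API, existing client code can usually point at a vLLM server just by changing `base_url`, which is a big part of why migrating off hosted inference is low-friction.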