Ever felt like you need a humongous VLM to get the job done? Alibaba’s Qwen team has news for you: enter the dense Qwen3-VL 4B/8B models (Instruct & Thinking), complete with FP8 checkpoints! These new models run smoothly on low-VRAM hardware, yet keep a 256K-token context window (expandable to 1M) and the full capability surface of their larger siblings. Here’s what you need to know:
What’s New?
– Models Galore: Qwen3-VL-4B and Qwen3-VL-8B, each in Instruct and Thinking flavors, plus FP8 versions of these checkpoints. They’re compact, dense, and ready for action.
– Same Capabilities, Smaller Size: These compact models still understand images and videos, perform OCR in 32 languages, handle spatial grounding, and control GUIs on desktop and mobile. No compromises here!
– Under the Hood: They’ve got Interleaved-MRoPE, DeepStack, and Text–Timestamp Alignment for robust video understanding. It’s all in the model cards.
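To make one of those architectural ideas concrete, here is a toy NumPy sketch of what "interleaved" multimodal rotary positions could look like: instead of giving the temporal, height, and width axes each a contiguous chunk of the rotary frequency spectrum, the axes take turns, so every axis sees both low and high frequencies. The function name and the exact round-robin pattern are illustrative assumptions, not Qwen's actual Interleaved-MRoPE implementation.

```python
def interleaved_mrope_freq_assignment(head_dim=128, axes=("t", "h", "w")):
    # Hypothetical sketch: assign each rotary frequency pair to one of the
    # three position axes (temporal, height, width) in a round-robin
    # pattern, so each axis covers the full low-to-high frequency range
    # rather than owning one contiguous band.
    num_pairs = head_dim // 2  # rotary embeddings act on pairs of dims
    return [axes[i % len(axes)] for i in range(num_pairs)]

assignment = interleaved_mrope_freq_assignment()
print(assignment[:6])  # ['t', 'h', 'w', 't', 'h', 'w']
```

The intuition for spreading each axis across the spectrum is robustness on long videos: the temporal axis keeps access to low-frequency (long-range) components instead of being confined to one band.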
FP8: The Game Changer
– Fine-Grained and Powerful: Fine-grained FP8 quantization with a block size of 128 keeps quality nearly on par with the BF16 checkpoints, and because Qwen ships the FP8 weights directly, there are no re-quantization headaches on your end!
– Tooling Up: Transformers doesn’t load these FP8 checkpoints directly yet, so serve them with vLLM or SGLang instead. As a bonus, vLLM’s FP8 support roughly halves weight memory versus BF16 on H100-class GPUs.
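To see why block-128 scaling matters, here is a toy NumPy simulation of fine-grained FP8 quantization: one scale per block of 128 values, with a crude stand-in for E4M3 rounding. This is purely illustrative; it is not Qwen's or vLLM's actual quantization kernel, and real FP8 checkpoints store true FP8 tensors plus their per-block scales.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_e4m3_round(x):
    # Crude simulation of FP8 E4M3 rounding: keep ~3 mantissa bits.
    # (Real FP8 also has a limited exponent range and subnormals, which
    # the per-block scaling below largely sidesteps.)
    m, e = np.frexp(x)                      # x = m * 2**e, with m in [0.5, 1)
    return np.ldexp(np.round(m * 16) / 16, e)

def blockwise_fp8_quantize(weights, block_size=128):
    """Quantize-then-dequantize with one scale per block of `block_size` values.

    Per-block scales keep an outlier in one block from crushing the
    precision of every other block, which is why fine-grained FP8 can
    sit so close to BF16 quality.
    """
    flat = weights.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero blocks
    q = fp8_e4m3_round(np.clip(flat / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return (q * scales).reshape(weights.shape), scales

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
w_hat, scales = blockwise_fp8_quantize(w)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

With 3 mantissa bits, each value's relative error stays below about 1/16, and since the FP8 weights come pre-quantized in the checkpoint, an engine like vLLM only needs to load the tensors and scales, not redo this step.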
Why You Should Care
– Qwen’s new 4B/8B models come in both Instruct and Thinking variants, with FP8 checkpoints for low-VRAM deployment.
– They retain the full capability surface of the larger Qwen3-VL models, making them a strong fit for single-GPU and edge budgets.
Join the Qwen Community
– Check out the model on Hugging Face.
– Explore tutorials, code, and notebooks on our GitHub page.
– Follow us on Twitter, join our ML SubReddit, subscribe to our newsletter, and even join us on Telegram!