Engineering
Blog posts in the Engineering category
June 4, 2026
Popping the GPU Bubble
Photon, Moondream's inference engine, achieves near-real-time VLM inference (~33 ms on an NVIDIA B200). This post is a peek at how it delivers up to 35% higher decode throughput by optimizing how the GPU works.