Showcase

Announcing Gaze Detection.

January 6, 2025

Vision-Language Models (VLMs) are “foundational” as they can be adapted for many different tasks. Since Moondream’s launch we’ve released several capabilities such as Object Detection and Pointing. With a new Moondream launch scheduled later this week, we’re excited to pre-announce a new capability: Gaze Detection.

This capability does what it says: it determines what the people in an image are looking at. This is useful for captioning videos, understanding social dynamics, and for specific applications such as sports analytics or detecting when drivers and operators are distracted. It's likely useful for even more use cases we haven't thought of yet. That's why it's so exciting for us to keep Moondream open source: it makes it easier (and cheaper) for everyone to build together.

Moondream's results are promising. It currently scores 0.103 on the GazeFollow benchmark's Avg L2 metric (lower is better), which is close to the state of the art: Gaze-LLE, a specialized model, reaches 0.099, and a human annotator scores about 0.096. In other words, Moondream is roughly as accurate as asking a person to do it.
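For context on those numbers: GazeFollow's Avg L2 metric is the average Euclidean distance between predicted and annotated gaze points, expressed in normalized [0, 1] image coordinates. A minimal sketch of how such a score is computed (the input points below are illustrative, not actual benchmark data):

```python
import math

def avg_l2(preds, targets):
    """Average L2 (Euclidean) distance between predicted and
    ground-truth gaze points in normalized [0, 1] coordinates.
    Lower is better; 0.0 means every prediction was exact."""
    assert len(preds) == len(targets)
    total = 0.0
    for (px, py), (tx, ty) in zip(preds, targets):
        total += math.hypot(px - tx, py - ty)
    return total / len(preds)

# Illustrative predicted vs. annotated gaze points.
preds = [(0.50, 0.40), (0.20, 0.75)]
targets = [(0.55, 0.43), (0.25, 0.70)]
print(round(avg_l2(preds, targets), 4))  # → 0.0645
```

On this scale, the gap between Moondream (0.103) and a human annotator (0.096) amounts to well under one percent of the image diagonal.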

This capability will be part of our upcoming Moondream release, scheduled for later this week. Meanwhile, you can see it in action and try it out here. We're excited to see gaze detection used in next-generation Vision AI apps. Let us know if you have plans to use it, or if you have any questions.

We have more release announcements lined up this week. Keep Moondreaming, y’all.


VISION AI THAT RUNS EVERYWHERE