Powerful visual AI.
Tiny footprint.

Moondream is an open-source visual language model that understands images using simple text prompts. It's fast and wildly capable.

Start Coding

See It In Action

Features

Vision AI at Warp Speed

Forget everything you thought you needed to know about computer vision. With Moondream, there's no training, no ground truth data, and no heavy infrastructure. Just a model, a prompt, and a whole world of visual understanding.

Ridiculously lightweight

Under 2B parameters. Quantized to 4-bit. Just 1GB. Moondream runs anywhere — from edge devices to your laptop.

Actually affordable

Run it locally for free. Or use our cloud API to process a high volume of images quickly and cheaply. Free tier included.

Simple by design

Choose a capability. Write a prompt. Get results. That's it. Moondream is designed for developers who don't want to babysit models.

Versatile as hell

Go beyond basic visual Q&A. Moondream can caption, detect objects, locate things, read documents, follow gaze, and more.

Tried, tested, trusted

6M+ downloads. 8K+ GitHub stars. Used across industries — from healthcare to robotics to mobile apps.

Ridiculously lightweight

Under 2B parameters. Quantized to 4-bit. Just 1GB. Moondream runs anywhere — from edge devices to your laptop.

Actually affordable

Run it locally for free. Or use our cloud API to process a high volume of images quickly and cheaply. Free tier included.

Simple by design

Choose a capability. Write a prompt. Get results. That's it. Moondream is designed for developers who don't want to babysit models.

Versatile as hell

Go beyond basic visual Q&A. Moondream can caption, detect objects, locate things, read documents, follow gaze, and more.

Tried, tested, trusted

6M+ downloads. 8K+ GitHub stars. Used across industries — from healthcare to robotics to mobile apps.

Ridiculously lightweight

Under 2B parameters. Quantized to 4-bit. Just 1GB. Moondream runs anywhere — from edge devices to your laptop.

Actually affordable

Run it locally for free. Or use our cloud API to process a high volume of images quickly and cheaply. Free tier included.

Simple by design

Choose a capability. Write a prompt. Get results. That's it. Moondream is designed for developers who don't want to babysit models.

Versatile as hell

Go beyond basic visual Q&A. Moondream can caption, detect objects, locate things, read documents, follow gaze, and more.

Tried, tested, trusted

6M+ downloads. 8K+ GitHub stars. Used across industries — from healthcare to robotics to mobile apps.

Capabilities

One Model, Many Capabilities

Moondream supports a growing set of visual capabilities — all accessible through natural-language prompts.

Image Captioning

Caption:

The image shows a man in a blue jumpsuit and yellow hard hat standing in a large industrial setting. He is wearing safety glasses and ear protection, and is holding a clipboard and a pen, appearing to be taking notes...

Manufacturing

Compliance

Synthetic Data

Image Captioning

Visual Question Answering

Query: Is any vehicle unsecured? Describe.

Yes, there is an unsecured truck parked in the area. The truck is filled with boxes, and it appears to be a delivery truck. The presence of the unsecured truck and the boxes suggests that it might be a delivery service or a delivery truck parked in a public area.

Transportation

Security

Agentic AI

Visual Question Answering

Object Detection

Detect: License plate

(x=0.431, y=0.713, x2=0.569, y2=0.921)

Retail

Inventory

Transportation

Robotics

Object Detection

(x=0.431, y=0.713, x2=0.569, y2=0.921)

A shelf stocked with soda cans, water bottles, and snack bars

Pointing (x, y)

Point: Defect in train tracks.

(x=0.431, y=0.505)

Quality Control

Compliance

Transportation

Defense

Surveillance

Pointing (x, y)

(x=0.431, y=0.505)

Drone view with multiple vehicles near a building

Gaze Detection

Detect Gaze:

The operator is looking at the bottom-right section of the control panel, near the red warning light.

Manufacturing

Safety

Transportation

Retail

Real-world Agentic AI

Gaze Detection

The operator is looking at the bottom-right section of the control panel, near the red warning light.

Operator at a control panel with a flashing alert

OCR & Document Understanding

Query: Transcribe the text in natural reading order.

"Preface, The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities..."

Logistics

Office Automation

Legal

OCR & Document Understanding

"Preface, The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities..."

Moondream is trusted by:

Performance

Blazingly Fast and Cost-Effective.

Moondream is the most efficient VLM ever built. Only 1GB size and packed with architectural optimizations, it runs fast on commodity hardware and scales beautifully in the cloud.

Fast
Blazing fast even on laptops and mobile
Low Memory
Low memory usage and power consumption
Cost-Effective
Lower cloud costs at any volume
Runs everywhere
No GPU rentals or server tuning required

Quickstart →See It In Action →

Get Started

Get Running in Minutes.

Moondream is open source and you can install and run it anywhere, for free. You can have it running on your computer or in our cloud in a matter of minutes.

Run It Yourself

Moondream Station is free
Works with our Python and Node clients
Works offline, fully under your control
CPU or GPU compatible

Moondream Station

Run in the Cloud

No downloads required
Free tier: 5,000 requests per day
Works with same Python or Node clients
Scales to production

Moondream Cloud

Community

Trusted by Developers Everywhere.

Used in real-world applications across retail, logistics, healthcare, defense,and more.

View All Customer Success Stories

Ben Caunt

Software Engineer & Founder

Moondream allowed me to implement semantic behaviors for robotics systems far easier than any other way I could think of doing it.

Aya & Dan Bochman

Founders of FashnAI

We were looking for the fastest VLM model that could still reliably handle our use-case, and moondream nailed it.

Anurag Phadke

Software Engineer

Blazingly fast, clean interface and APIs, extensible, open-source, actually works, friendly website w/ fine-tuning instructions.

Powerful visual AI.Tiny footprint.

Vision AI at Warp Speed

Ridiculously lightweight

Actually affordable

Simple by design

Versatile as hell

Tried, tested, trusted

Ridiculously lightweight

Actually affordable

Simple by design

Versatile as hell

Tried, tested, trusted

Ridiculously lightweight

Actually affordable

Simple by design

Versatile as hell

Tried, tested, trusted

One Model, Many Capabilities

Image Captioning

Image Captioning

Visual Question Answering

Visual Question Answering

Object Detection

Object Detection

Pointing (x, y)

Pointing (x, y)

Gaze Detection

Gaze Detection

OCR & Document Understanding

OCR & Document Understanding

Moondream is trusted by:

Blazingly Fast and Cost-Effective.

Get Running in Minutes.

Trusted by Developers Everywhere.

Powerful visual AI.
Tiny footprint.