The production-ready vision language model.
Production VLMs need more than just accuracy. They need to be fast enough for real-time decisions and efficient enough to run anywhere you deploy. That's what Moondream is built for.

Try the open models. You might already be done.
Moondream might already nail your use case out of the box. The open models are commercially friendly and can run anywhere. Use our playground to try it out or download it and run it yourself.
No credit card required. $5 in credits added monthly.
```python
# Caption an image in four lines
import moondream as md
from PIL import Image

model = md.vl(model="moondream-2b")
image = Image.open("shelf.jpg")
print(model.caption(image).caption)
# → 'A warehouse shelf with six cardboard cartons…'
```
Moondream 3 Preview (9B MoE)
Recommended: Sparse mixture-of-experts architecture with frontier-level visual reasoning, segmentation, and long-context queries.
Moondream 2 (2B Dense)
Stable: The production workhorse. Compact, proven, commercially friendly, and easy to deploy across GPUs, CPUs, and edge devices.
Moondream 2 0.5B (distillation target)
Tiny: A small fine-tuning base for constrained hardware where every megabyte matters.
Need more? Lens gets you to production-grade accuracy
Your data is specific, so the model has to be. Lens is a fine-tuning platform with a simple API. No dataset uploads, no infrastructure, no ML team required.
A simple hosted API — no hardware to rent or manage. Supports SFT and RL. Vibe-code your fine-tune script in minutes.
Your fine-tuned model is instantly ready to run on Moondream Cloud or locally with Photon. No cumbersome download or install step.
Our team handles the labeling protocol, loss design, and evaluation. You keep the weights, the training code, and the data. Unlike ML consulting, you walk away self-sufficient.
Our reinforcement-learning fine-tune API can dramatically improve accuracy with as few as 20 labeled images — not thousands.
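To make the shape of that workflow concrete, here is a minimal sketch of submitting a small labeled set to a fine-tuning API. This is illustrative only: the endpoint, base-model identifier, and field names below are invented placeholders, not Lens's actual API surface.

```python
# Illustrative only: the endpoint, model identifier, and field names below
# are placeholders, not the real Lens API.
import json

# As few as 20 labeled images, per the RL fine-tuning claim above
labeled = [{"image": f"img_{i:02d}.jpg", "label": "carton"} for i in range(20)]

payload = {
    "base_model": "moondream-2b",  # assumed identifier
    "method": "rl",                # Lens supports SFT and RL
    "examples": labeled,
}

body = json.dumps(payload)
# requests.post("https://api.example.com/lens/finetune", data=body)  # placeholder URL
print(f"{len(payload['examples'])} labeled examples, {len(body)} bytes")
```

The point is the size of the lift: the client-side work is assembling a small JSON payload, not standing up training infrastructure.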
Fast, efficient, and runs everywhere you need it.
Once your model is accurate, performance and cost become the next wall. Photon is the inference engine we built to run Moondream in production. Moondream Cloud and partner clouds give you a hosted path if you want one.
Under 500 ms is the difference between a useful answer and a late one. Photon runs Moondream in roughly half the time vLLM does on the same hardware.
A VLM running across a fleet of cameras at the wrong efficiency costs thousands a day. Moondream is the lowest-cost VLM we have measured across the inference providers we tested.
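A quick back-of-the-envelope calculation shows why per-image efficiency dominates at fleet scale. All numbers below are illustrative assumptions, not measured Moondream pricing.

```python
# Back-of-the-envelope fleet cost. Every number here is an assumed
# illustration, not a real price or measured rate.
cameras = 200                  # fleet size (assumed)
frames_per_camera_per_min = 4  # frames sampled to the VLM (assumed)
price_per_1k_images = 2.00     # $ per 1,000 inferences (assumed)

images_per_day = cameras * frames_per_camera_per_min * 60 * 24
cost_per_day = images_per_day / 1000 * price_per_1k_images
print(f"{images_per_day:,} images/day -> ${cost_per_day:,.2f}/day")
```

At these assumed rates, a modest 200-camera fleet already generates over a million inferences a day, so halving per-image cost saves four figures daily.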
Your deployment story will change. Start in the cloud, move to the edge, or run air-gapped. You pick the hardware. The model and APIs stay the same.
Measured on the ChartQA test split with prefix caching enabled. Latency is the P50 of a single direct-answer query call; throughput is sustained requests per second at batch 64.
```python
import moondream as md
from PIL import Image

# Initialize with local GPU inference
model = md.vl(api_key="YOUR_API_KEY", local=True)

# Load an image
image = Image.open("path/to/image.jpg")

# Generate a caption
caption = model.caption(image)["caption"]
print("Caption:", caption)
```
Launch is just the start
One vendor for the full stack. Models drift. Engineers leave. New use cases appear. With stitched-together vendors, nobody owns the outage. With Moondream, we do.
With stitched-together vendors:
- Model vendor (weights only)
- Fine-tuning vendor (your data goes elsewhere)
- Inference provider (different SLA)
- Your on-call engineer (owns everything)

With Moondream:
- Model, weights, and roadmap
- Lens fine-tuning and evals
- Photon and Moondream Cloud
- One team on call, 24/7 on enterprise plans
Four products that work together. Use one. Use all of them.
Open models: The foundation. Free for commercial use. 2B, 1B, and 0.5B checkpoints on Hugging Face.
Lens: Fine-tuning with a simple API. Self-serve or white-glove. You keep the weights.
Photon: Inference engine. Hand-tuned kernels. Mac, Windows, CUDA — Jetson to B200.
Moondream Cloud: Hosted inference. OpenAI-compatible API. Pay per image, no commitment.
Enterprise: Plans for teams running Moondream in production. One team owns the full stack.
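Because the hosted API is OpenAI-compatible, an existing OpenAI-style client can point at it with a base-URL swap. The sketch below assembles a standard chat-completions request body with an inline base64 image; the model name is an assumption for illustration, so check the Moondream Cloud docs for the real identifiers before wiring this up.

```python
# Sketch of an OpenAI-compatible chat-completions request body.
# The model name is an assumed placeholder; consult the Moondream
# Cloud docs for real endpoint and model identifiers.
import base64


def build_caption_request(image_bytes: bytes, model: str = "moondream") -> dict:
    """Assemble a chat-completions payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Caption this image."},
            ],
        }],
    }


req = build_caption_request(b"\xff\xd8not-a-real-jpeg")
print(req["messages"][0]["content"][1]["text"])
```

An OpenAI-compatible shape means no bespoke SDK to learn: the same request body works against any endpoint that speaks the chat-completions format.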
Try the open model. Or talk to us about production.
The model is free, open, and the fastest way to see if Moondream fits. If you already know it does, we can skip ahead and talk about fine-tuning, inference, and a support plan.