The production-ready vision language model.
Production VLMs need more than just accuracy. They need to be fast enough for real-time decisions and efficient enough to run anywhere you deploy. That's what Moondream is built for.

Try the open models. You might already be done.
Moondream might already nail your use case out of the box. The open models are commercially friendly and can run anywhere. Use our playground to try it out or download it and run it yourself.
No credit card required. $5 in credits added monthly.
```python
# Caption an image in four lines
import moondream as md
from PIL import Image

model = md.vl(model="moondream-2b")
image = Image.open("shelf.jpg")
print(model.caption(image).caption)
# → 'A warehouse shelf with six cardboard cartons…'
```
Moondream 3 Preview (9B MoE)
Recommended: Sparse mixture-of-experts architecture with frontier-level visual reasoning, segmentation, and long-context queries.
Moondream 2 (2B Dense)
Stable: The production workhorse. Compact, proven, commercially friendly, and easy to deploy across GPUs, CPUs, and edge devices.
Moondream 2 0.5B (distillation target)
Tiny: A small fine-tuning base for constrained hardware where every megabyte matters.
Need more? Lens gets you to production-grade accuracy
Your data is specific, so the model has to be. Lens is a fine-tuning platform with a simple API. No dataset uploads, no infrastructure, no ML team required.
A simple hosted API — no hardware to rent or manage. Supports SFT and RL. Vibe-code your fine-tune script in minutes.
Your fine-tuned model is instantly ready to run on Moondream Cloud or locally with Photon. No cumbersome download or install step.
Our team handles the labeling protocol, loss design, and evaluation. You keep the weights, the training code, and the data. Unlike ML consulting, you walk away self-sufficient.
Our reinforcement-learning fine-tune API can dramatically improve accuracy with as few as 20 labeled images — not thousands.
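To make the shape of that workflow concrete, here is a minimal sketch of submitting a small labeled set to a fine-tuning API. This is illustrative only: the endpoint, base-model identifier, and field names below are invented placeholders, not Lens's actual API surface.

```python
# Illustrative only: the endpoint, model identifier, and field names below
# are placeholders, not the real Lens API.
import json

# As few as 20 labeled images, per the RL fine-tuning claim above
labeled = [{"image": f"img_{i:02d}.jpg", "label": "carton"} for i in range(20)]

payload = {
    "base_model": "moondream-2b",  # assumed identifier
    "method": "rl",                # Lens supports SFT and RL
    "examples": labeled,
}

body = json.dumps(payload)
# requests.post("https://api.example.com/lens/finetune", data=body)  # placeholder URL
print(f"{len(payload['examples'])} labeled examples, {len(body)} bytes")
```

The point is the size of the lift: the client-side work is assembling a small JSON payload, not standing up training infrastructure.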
Fast, efficient, and runs everywhere you need it.
Once your model is accurate, performance and cost become the next wall. Photon is the inference engine we built to run Moondream in production. Moondream Cloud and partner clouds give you a hosted path if you want one.
Under 500 ms is the difference between a useful answer and a late one. Photon runs Moondream in roughly half the time vLLM does on the same hardware.
A VLM running across a fleet of cameras at the wrong efficiency costs thousands a day. Moondream is the lowest-cost VLM we have measured across the inference providers we tested.
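A quick back-of-the-envelope calculation shows why per-image efficiency dominates at fleet scale. All numbers below are illustrative assumptions, not measured Moondream pricing.

```python
# Back-of-the-envelope fleet cost. Every number here is an assumed
# illustration, not a real price or measured rate.
cameras = 200                  # fleet size (assumed)
frames_per_camera_per_min = 4  # frames sampled to the VLM (assumed)
price_per_1k_images = 2.00     # $ per 1,000 inferences (assumed)

images_per_day = cameras * frames_per_camera_per_min * 60 * 24
cost_per_day = images_per_day / 1000 * price_per_1k_images
print(f"{images_per_day:,} images/day -> ${cost_per_day:,.2f}/day")
```

At these assumed rates, a modest 200-camera fleet already generates over a million inferences a day, so halving per-image cost saves four figures daily.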
Your deployment story will change. Start in the cloud, move to the edge, or run air-gapped. You pick the hardware. The model and APIs stay the same.
Measured on the ChartQA test split with prefix caching enabled. Latency is the P50 of a single direct-answer query call; throughput is sustained requests per second at batch 64.
```python
import moondream as md
from PIL import Image

# Initialize with local GPU inference
model = md.vl(api_key="YOUR_API_KEY", local=True)

# Load an image
image = Image.open("path/to/image.jpg")

# Generate a caption
caption = model.caption(image)["caption"]
print("Caption:", caption)
```
Launch is just the start
One vendor for the full stack. Models drift. Engineers leave. New use cases appear. With stitched-together vendors, nobody owns the outage. With Moondream, we do.
With stitched-together vendors:
- Model vendor (weights only)
- Fine-tuning vendor (your data goes elsewhere)
- Inference provider (different SLA)
- Your on-call engineer (owns everything)

With Moondream:
- Model, weights, and roadmap
- Lens fine-tuning and evals
- Photon and Moondream Cloud
- One team on call, 24/7 on enterprise plans
Four products that work together. Use one. Use all of them.
Open models: The foundation. Free for commercial use. 2B, 1B, and 0.5B checkpoints on Hugging Face.
Lens: Fine-tuning with a simple API. Self-serve or white-glove. You keep the weights.
Photon: Inference engine. Hand-tuned kernels. Mac, Windows, CUDA — Jetson to B200.
Moondream Cloud: Hosted inference. OpenAI-compatible API. Pay per image, no commitment.
Enterprise: Plans for teams running Moondream in production. One team owns the full stack.
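Because the hosted API is OpenAI-compatible, an existing OpenAI-style client can point at it with a base-URL swap. The sketch below assembles a standard chat-completions request body with an inline base64 image; the model name is an assumption for illustration, so check the Moondream Cloud docs for the real identifiers before wiring this up.

```python
# Sketch of an OpenAI-compatible chat-completions request body.
# The model name is an assumed placeholder; consult the Moondream
# Cloud docs for real endpoint and model identifiers.
import base64


def build_caption_request(image_bytes: bytes, model: str = "moondream") -> dict:
    """Assemble a chat-completions payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Caption this image."},
            ],
        }],
    }


req = build_caption_request(b"\xff\xd8not-a-real-jpeg")
print(req["messages"][0]["content"][1]["text"])
```

An OpenAI-compatible shape means no bespoke SDK to learn: the same request body works against any endpoint that speaks the chat-completions format.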
Try the open model. Or talk to us about production.
The model is free, open, and the fastest way to see if Moondream fits. If you already know it does, we can skip ahead and talk about fine-tuning, inference, and a support plan.