Open‑Vocabulary Object Detection
Moondream Object Detection understands natural language to locate any object in your images. Fast, accurate bounding boxes powered by visual understanding.
Detect anything you can describe
Traditional object detection relies on predefined classes. Need to find “damaged boxes” or “person wearing red”? If it's not in the training set, you're stuck retraining the model.
Moondream understands language. Describe what you're looking for in plain English and get accurate bounding boxes instantly. No retraining required.
Built for production use cases
Same API, endless applications. See Object Detection across different domains.
Use Case: Damage Detection
Use Case: Robotics
Use Case: Computer Use
Use Case: Security & Safety
Fast, accurate object detection
Moondream 3 scores highest among the models compared below on the RefCOCO, RefCOCOg, and RefCOCO+ grounding benchmarks, while being faster and more cost-effective.
| Benchmark | Moondream | GPT-5 | Gemini 2.5 Flash | Claude 4 Sonnet |
|---|---|---|---|---|
| RefCOCO | 91.1 | 57.2 | 75.8 | 30.1 |
| RefCOCOg | 88.6 | 49.8 | 75.1 | 26.2 |
| RefCOCO+ | 81.8 | 46.3 | 70.2 | 23.4 |
How object detection works
Describe what you're looking for in natural language and get precise bounding boxes instantly.
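The sample response in this section reports each box as `x_min`, `y_min`, `x_max`, `y_max` values between 0 and 1. Assuming these are fractions of the image width and height (the page does not state the convention explicitly), a minimal sketch for scaling them to pixel coordinates could look like the following. The helper `to_pixel_boxes` is hypothetical and not part of the `moondream` package.

```python
from typing import Dict, List, Tuple


def to_pixel_boxes(
    objects: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, int, int]]:
    """Scale boxes to pixels, ASSUMING coordinates are fractions of image size."""
    boxes = []
    for obj in objects:
        left = round(obj["x_min"] * width)
        top = round(obj["y_min"] * height)
        right = round(obj["x_max"] * width)
        bottom = round(obj["y_max"] * height)
        boxes.append((left, top, right, bottom))
    return boxes


# Sample response copied from this section; the 1000x1000 size is illustrative.
sample = {
    "objects": [
        {"x_min": 0.422, "y_min": 0.579, "x_max": 0.704, "y_max": 0.906}
    ]
}
pixel_boxes = to_pixel_boxes(sample["objects"], width=1000, height=1000)
print(pixel_boxes)  # [(422, 579, 704, 906)]
```

With a real image loaded through PIL, `image.size` gives the `(width, height)` pair to pass in, and each resulting tuple can be handed to `image.crop(...)` or a drawing routine.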
Detect
"dirty dishes"
Output
{
  "objects": [
    {
      "x_min": 0.422,
      "y_min": 0.579,
      "x_max": 0.704,
      "y_max": 0.906
    }
  ]
}
Code
import moondream as md
from PIL import Image
# Initialize with API key
model = md.vl(api_key="your-api-key")
# Load your image
image = Image.open("kitchen.jpg")
# Detect objects using natural language
result = model.detect(image, "dirty dishes")
print(result["objects"])
FAQ
Common questions about Object Detection, pricing, and integration.