Natural language
Your question is the only code you need
Traditional computer vision requires predefined categories and outputs. Need to extract “total amount due” or answer “is this person wearing PPE?”? If it's not in the training set, you're stuck building custom pipelines.
Moondream understands language, not just labels. Ask any question about any image and get detailed, accurate answers instantly. From OCR to scene understanding, your question is the only limit.
Real-world demos
Visual Understanding, Instantly
Same API, endless applications. See Query at work across different domains.

Security Monitoring
Query
"what is happening?"
Response
a man is trying to break into a white car
Document Extraction
Query
"return as json: total, merchant, item"
Response
{ "total": 20.0, "merchant": "Anthropic, PBC", "item": "Claude Pro" }
Safety Compliance
Query
"what is the worker doing?"
Response
Inspecting equipment while wearing a hard hat, noting details on a clipboard.
Media Tagging
Query
"provide 5 tags"
Response
["Golden Gate Bridge", "suspension bridge", "orange", "water", "sunrise"]
How it works
How Query works
Ask any natural language question about an image and get detailed, accurate answers.

Query
"what is happening?"
Output
485ms • 741 tokens • $0.000249
{
  "answer": "A man is trying to break into a white car"
}
Code
import moondream as md
from PIL import Image
# Initialize with API key
model = md.vl(api_key="your-api-key")
# Load your image
image = Image.open("security_camera.jpg")
# Ask a question about the image
result = model.query(image, "what is happening?")
print(result["answer"])
FAQ
Common questions about Query, pricing, and integration.
Caption generates a general description of the entire image. Query lets you ask specific questions and get targeted answers about particular aspects of the image, from scene understanding to document extraction.
Query supports a wide range of questions including scene understanding ("What is happening?"), object counting ("How many cars are visible?"), document extraction ("What is the total amount?"), compliance checking ("Is this person wearing PPE?"), and more.
Query can extract data from documents. It excels at pulling information from receipts, invoices, forms, and other documents, and you can ask for specific fields or request structured output such as JSON.
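When you request JSON, the answer comes back as text, so it needs to be parsed before use. The sketch below is one way to do that, not part of the Moondream client: `parse_json_answer` is a hypothetical helper, and the sample string is the response from the Document Extraction demo above. It assumes the answer contains a single JSON object, possibly with some surrounding text.

```python
import json


def parse_json_answer(answer: str) -> dict:
    """Parse a JSON object out of an answer string (hypothetical helper).

    Takes the span from the first '{' to the last '}' so that any text
    around the object is ignored.
    """
    start = answer.find("{")
    end = answer.rfind("}")
    if start == -1 or end <= start:
        raise ValueError(f"no JSON object found in answer: {answer!r}")
    return json.loads(answer[start : end + 1])


# Answer text copied from the Document Extraction demo.
# With the client, this would come from something like:
#   answer = model.query(image, "return as json: total, merchant, item")["answer"]
answer = '{ "total": 20.0, "merchant": "Anthropic, PBC", "item": "Claude Pro" }'

receipt = parse_json_answer(answer)
print(receipt["merchant"], receipt["total"])
```

Treat the parsed fields as untrusted input: check that the keys you asked for are present and have the types you expect before passing them downstream.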
Query uses the same per-token pricing as all other Moondream skills. Every Moondream Cloud account includes $5 in free monthly credits to experiment and build.
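For a rough sense of what the free credits cover, the arithmetic below uses the sample figures from the Output panel above (741 tokens, $0.000249). It assumes every query costs about the same as that sample, which real workloads will not match exactly, since token counts vary with the image and the length of the answer. The blended per-token rate is derived from the sample and is not an official price.

```python
# Figures taken from the sample Output panel and the free-credit amount above.
SAMPLE_COST_USD = 0.000249
SAMPLE_TOKENS = 741
FREE_CREDITS_USD = 5.00

# Blended rate implied by the sample (an estimate, not a published price).
cost_per_million_tokens = SAMPLE_COST_USD / SAMPLE_TOKENS * 1_000_000

# How many sample-sized queries the monthly free credits would cover.
queries_per_month = int(FREE_CREDITS_USD // SAMPLE_COST_USD)

print(f"~${cost_per_million_tokens:.2f} per million tokens")
print(f"~{queries_per_month:,} sample-sized queries per month")
```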
Query is available in both Moondream Cloud and the downloadable model, so you can run it locally for free on your own hardware.
Moondream Query delivers sub-200ms latency for most queries, making it suitable for real-time applications like video analysis and interactive systems. This is significantly faster than larger models like GPT-4o or Claude.
Query works with video as well. It operates on still images and can be applied frame by frame to video. Combined with its low latency, this enables real-time video understanding and monitoring applications.
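The frame-by-frame approach can be sketched as a small sampling loop. Everything named here is illustrative: `query_frames`, `ask`, and `every_n` are hypothetical, and `fake_ask` is a stub standing in for a real call such as `model.query(frame, question)["answer"]`. Decoding the video into frames is left to whichever video library you use.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def query_frames(
    frames: Iterable[Frame],
    ask: Callable[[Frame, str], str],
    question: str,
    every_n: int = 30,
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, answer) for every `every_n`-th frame.

    Sampling keeps the number of queries (and the cost) bounded instead of
    sending every frame of the video.
    """
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index, ask(frame, question)


# Stub in place of the real client so the sketch runs on its own.
def fake_ask(frame: str, question: str) -> str:
    return f"stub answer for {frame}"


frames = ["f0", "f1", "f2", "f3", "f4"]  # stand-ins for decoded video frames
results = list(query_frames(frames, fake_ask, "what is happening?", every_n=2))
for index, answer in results:
    print(index, answer)
```

Choosing `every_n` is a trade-off: a smaller value reacts faster to changes in the scene, while a larger one sends fewer queries.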
Running into problems or need help? Reach us on Discord.


