Detect API
The /detect endpoint identifies and locates specific objects within images. It returns bounding box coordinates for each detected instance.
Endpoint
POST https://api.moondream.ai/v1/detect
Request Format
Parameter | Type | Required | Description |
---|---|---|---|
image_url | string | Yes | Base64-encoded image with data URI prefix (e.g., "data:image/jpeg;base64,...") |
object | string | Yes | The type of object to detect (e.g., "person", "car", "face") |
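As a rough sketch of how the two parameters above could be assembled into a request using only the Python standard library: the helper names `build_detect_payload` and `build_detect_request` are hypothetical, and the `X-Moondream-Auth` header name is an assumption that this page does not confirm, so check the authentication documentation for your account before relying on it.

```python
import base64
import json
import urllib.request

DETECT_URL = "https://api.moondream.ai/v1/detect"


def build_detect_payload(image_bytes: bytes, object_name: str,
                         mime_type: str = "image/jpeg") -> dict:
    """Build the JSON body: a base64 data URI plus the object to detect."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "image_url": f"data:{mime_type};base64,{encoded}",
        "object": object_name,
    }


def build_detect_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request (nothing is sent yet)."""
    return urllib.request.Request(
        DETECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name; not stated in this section.
            "X-Moondream-Auth": api_key,
        },
    )
```

The resulting request could then be sent with `urllib.request.urlopen(request)` and the response body decoded with `json.loads`.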
Streaming Support
This endpoint does not support streaming responses.
Response Format
{
  "request_id": "2025-03-25_detect_2025-03-25-21:00:39-715d03",
  "objects": [
    {
      "x_min": 0.2, // left boundary of detection box (normalized 0-1)
      "y_min": 0.3, // top boundary of detection box (normalized 0-1)
      "x_max": 0.6, // right boundary of detection box (normalized 0-1)
      "y_max": 0.8 // bottom boundary of detection box (normalized 0-1)
    },
    // Additional objects...
  ]
}
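One possible way to consume this response is to turn each entry in `objects` into a small typed record. The sketch below assumes the response has already been decoded into a Python dictionary; `BoundingBox` and `parse_detect_response` are hypothetical names, and the range check is a defensive choice made for this example rather than documented API behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detect_response(response: dict) -> Tuple[str, List[BoundingBox]]:
    """Return (request_id, boxes) from a decoded /detect response."""
    boxes = []
    for obj in response.get("objects", []):
        box = BoundingBox(
            float(obj["x_min"]), float(obj["y_min"]),
            float(obj["x_max"]), float(obj["y_max"]),
        )
        # Defensive check (an assumption of this sketch): coordinates
        # should be normalized and ordered min <= max.
        if not (0.0 <= box.x_min <= box.x_max <= 1.0
                and 0.0 <= box.y_min <= box.y_max <= 1.0):
            raise ValueError(f"Unexpected box coordinates: {box}")
        boxes.append(box)
    return response["request_id"], boxes
```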
Coordinate System
Coordinates are normalized to the image dimensions, ranging from 0 to 1:
- (0,0) is the top-left corner of the image
- (1,1) is the bottom-right corner of the image
To convert to pixel coordinates, multiply by the image dimensions:
- pixel_x_min = x_min * image_width
- pixel_y_min = y_min * image_height
- pixel_x_max = x_max * image_width
- pixel_y_max = y_max * image_height
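The conversion above can be sketched as a small helper. The function name `to_pixel_box` is hypothetical, and rounding to whole pixels is a choice made for this example; keep the floating-point values if you need sub-pixel precision.

```python
from typing import Tuple


def to_pixel_box(obj: dict, image_width: int,
                 image_height: int) -> Tuple[int, int, int, int]:
    """Scale one normalized detection to (left, top, right, bottom) pixels."""
    left = round(obj["x_min"] * image_width)
    top = round(obj["y_min"] * image_height)
    right = round(obj["x_max"] * image_width)
    bottom = round(obj["y_max"] * image_height)
    return left, top, right, bottom


# The sample box from the response format, on a 1000x500 image.
sample = {"x_min": 0.2, "y_min": 0.3, "x_max": 0.6, "y_max": 0.8}
print(to_pixel_box(sample, 1000, 500))  # (200, 150, 600, 400)
```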
Examples
import moondream as md
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Initialize with API key
model = md.vl(api_key="your-api-key")
# Load an image
image = Image.open("path/to/image.jpg")
# Detect objects (e.g., "person", "car", "face", etc.)
result = model.detect(image, "person")
detections = result["objects"]
request_id = result["request_id"]
print(f"Found {len(detections)} people")
print(f"Request ID: {request_id}")
# Visualize the detections
plt.figure(figsize=(10, 10))
plt.imshow(image)
ax = plt.gca()
for obj in detections:
    # Convert normalized coordinates to pixel values
    x_min = obj["x_min"] * image.width
    y_min = obj["y_min"] * image.height
    x_max = obj["x_max"] * image.width
    y_max = obj["y_max"] * image.height
    # Calculate width and height for the rectangle
    width = x_max - x_min
    height = y_max - y_min
    # Create a rectangle patch
    rect = patches.Rectangle(
        (x_min, y_min), width, height,
        linewidth=2, edgecolor='r', facecolor='none'
    )
    ax.add_patch(rect)
    # Label the box at its top-left corner
    plt.text(
        x_min, y_min, "Person",
        color='white', fontsize=12,
        bbox=dict(facecolor='red', alpha=0.5)
    )
plt.axis('off')
plt.savefig("output_with_detections.jpg")
plt.show()
Common Object Types
Moondream can detect a wide range of objects. Here are some commonly used examples:
- person
- face
- car
- dog
- cat
- building
- furniture
- text
- food
- plant
Zero-Shot Detection
Moondream's object detection is zero-shot: it can detect virtually any object you name, not only those in a predefined list. For best results, describe the object as specifically as possible.
Performance Considerations
Detection performance varies based on:
- Image resolution and quality
- Object size relative to the image
- Lighting conditions
- Occlusion (partial visibility)
- Object orientation
Error Handling
Common error responses:
Status Code | Description |
---|---|
400 | Bad Request - Invalid parameters or image format |
401 | Unauthorized - Invalid or missing API key |
413 | Payload Too Large - Image size exceeds limits |
429 | Too Many Requests - Rate limit exceeded |
500 | Internal Server Error - Server-side issue |
Error Response Format
Error responses are returned in the following format:
{
  "error": {
    "message": "Detailed error description",
    "type": "error_type",
    "param": "parameter_name",
    "code": "error_code"
  }
}
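A minimal sketch of how a client might turn this error body into an exception follows. `DetectAPIError` and `parse_error_response` are hypothetical names, and treating 429 and 500 as retryable is an assumption of this example, not a documented guarantee.

```python
import json

# Assumption: rate-limit and server errors are worth retrying.
RETRYABLE_STATUS_CODES = {429, 500}


class DetectAPIError(Exception):
    """Carries the fields of the error object alongside the HTTP status."""

    def __init__(self, status_code, message, error_type=None,
                 param=None, code=None):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message
        self.error_type = error_type
        self.param = param
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.status_code in RETRYABLE_STATUS_CODES


def parse_error_response(status_code: int, body: str) -> DetectAPIError:
    """Build a DetectAPIError, tolerating bodies that are not valid JSON."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}
    if not isinstance(error, dict):
        error = {}
    return DetectAPIError(
        status_code,
        error.get("message", f"HTTP {status_code}"),
        error_type=error.get("type"),
        param=error.get("param"),
        code=error.get("code"),
    )
```

A caller could raise the returned exception and check `retryable` before deciding whether to back off and try again.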
Limitations
- Maximum image size: 10 MB
- Supported image formats: JPEG, PNG, GIF (first frame only)
- Detection works best on clearly visible objects
- Multiple small objects may be more challenging to detect
- Rate limits apply based on your plan
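The size and format limits above can be checked locally before uploading, which avoids a round trip that would end in a 400 or 413. This is a sketch under stated assumptions: the helper names are hypothetical, the limit is interpreted as 10 MiB of raw image bytes (the page does not say whether it applies before or after base64 encoding), and the format is guessed from the file's leading magic bytes.

```python
from typing import Optional

# Assumption: the 10 MB limit means 10 * 1024 * 1024 raw bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# Leading bytes of the supported formats.
_MAGIC_PREFIXES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the magic bytes, or None."""
    for prefix, mime_type in _MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime_type
    return None


def check_image(data: bytes) -> str:
    """Return the MIME type if the image passes the local checks."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"Image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    mime_type = sniff_image_type(data)
    if mime_type is None:
        raise ValueError("Unsupported format; expected JPEG, PNG, or GIF")
    return mime_type
```

The returned MIME type can be reused when building the data URI prefix for the `image_url` parameter.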