/detect
The /detect endpoint identifies and locates specific objects within images. It returns bounding box coordinates for each detected instance.
Endpoint
bash
POST https://api.moondream.ai/v1/detect
Request Format
Parameter | Type | Required | Description |
---|---|---|---|
`image_url` | string | Yes | Base64-encoded image as a data URI, including the prefix (e.g., `"data:image/jpeg;base64,..."`) |
`object` | string | Yes | The type of object to detect (e.g., "person", "car", "face") |
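As a rough illustration of the two parameters above, the sketch below builds (but does not send) a raw HTTP request using only the Python standard library. The helper name `build_detect_request` is hypothetical, and the `X-Moondream-Auth` header name is an assumption, since this section does not say how the API key is passed; check the authentication docs before relying on it.

```python
import base64
import json
import urllib.request

DETECT_URL = "https://api.moondream.ai/v1/detect"


def build_detect_request(image_bytes, obj, api_key, mime_type="image/jpeg"):
    """Build (but do not send) a POST request for the /detect endpoint."""
    # image_url must be a data URI: "data:<mime>;base64,<payload>"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "image_url": f"data:{mime_type};base64,{encoded}",
        "object": obj,
    }
    return urllib.request.Request(
        DETECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header name is not given in this section.
            "X-Moondream-Auth": api_key,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; building it separately makes the payload easy to inspect first.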
Streaming Support
This endpoint does not support streaming responses.
Response Format
json
{
  "request_id": "2025-03-25_detect_2025-03-25-21:00:39-715d03",
  "objects": [
    {
      "x_min": 0.2,  // left boundary of detection box (normalized 0-1)
      "y_min": 0.3,  // top boundary of detection box (normalized 0-1)
      "x_max": 0.6,  // right boundary of detection box (normalized 0-1)
      "y_max": 0.8   // bottom boundary of detection box (normalized 0-1)
    },
    // Additional objects...
  ]
}
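The `//` comments in the sample are annotations and would not appear in an actual JSON body. A minimal sketch of reading such a response into typed records follows; the `Box` class and `parse_detect_response` function are illustrative names, not part of any Moondream client.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One detection, in normalized (0-1) coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detect_response(body):
    """Return (request_id, list of Box) from a /detect JSON response body."""
    data = json.loads(body)
    boxes = [
        Box(o["x_min"], o["y_min"], o["x_max"], o["y_max"])
        for o in data.get("objects", [])
    ]
    return data.get("request_id"), boxes
```

An empty `objects` list simply yields no boxes, which is the natural way to represent "nothing detected" in this sketch.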
Coordinate System
Coordinates are normalized to the image dimensions, ranging from 0 to 1:
- (0,0) is the top-left corner of the image
- (1,1) is the bottom-right corner of the image
To convert to pixel coordinates, multiply by the image dimensions:
- pixel_x_min = x_min * image_width
- pixel_y_min = y_min * image_height
- pixel_x_max = x_max * image_width
- pixel_y_max = y_max * image_height
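The conversion above can be wrapped in a small helper. This is a sketch: the function name `to_pixel_box` is hypothetical, and rounding to whole pixels is a choice made here for drawing and cropping, not something the API prescribes.

```python
def to_pixel_box(obj, image_width, image_height):
    """Convert one normalized detection to integer pixel coordinates.

    `obj` is a dict with x_min, y_min, x_max, y_max in the 0-1 range.
    Returns (left, top, right, bottom).
    """
    return (
        round(obj["x_min"] * image_width),
        round(obj["y_min"] * image_height),
        round(obj["x_max"] * image_width),
        round(obj["y_max"] * image_height),
    )


# The sample box from the response format, on a 1000x500 image:
box = to_pixel_box(
    {"x_min": 0.2, "y_min": 0.3, "x_max": 0.6, "y_max": 0.8}, 1000, 500
)
print(box)  # (200, 150, 600, 400)
```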
Examples
python
import moondream as md
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Detect objects (e.g., "person", "car", "face", etc.)
result = model.detect(image, "person")
detections = result["objects"]
request_id = result["request_id"]
print(f"Found {len(detections)} people")
print(f"Request ID: {request_id}")

# Visualize the detections
plt.figure(figsize=(10, 10))
plt.imshow(image)
ax = plt.gca()

for obj in detections:
    # Convert normalized coordinates to pixel values
    x_min = obj["x_min"] * image.width
    y_min = obj["y_min"] * image.height
    x_max = obj["x_max"] * image.width
    y_max = obj["y_max"] * image.height

    # Calculate width and height for the rectangle
    width = x_max - x_min
    height = y_max - y_min

    # Create a rectangle patch
    rect = patches.Rectangle(
        (x_min, y_min), width, height,
        linewidth=2, edgecolor='r', facecolor='none'
    )
    ax.add_patch(rect)
    plt.text(
        x_min, y_min, "Person",
        color='white', fontsize=12,
        bbox=dict(facecolor='red', alpha=0.5)
    )

plt.axis('off')
plt.savefig("output_with_detections.jpg")
plt.show()
Common Object Types
Moondream can detect a wide range of objects. Here are some commonly used examples:
- person
- face
- car
- dog
- cat
- building
- furniture
- text
- food
- plant
Zero-Shot Detection
Moondream's object detection is zero-shot, meaning it can detect virtually any object you specify, not just from a predefined list. Try describing the object as specifically as possible for best results.
Performance Considerations
Detection performance varies based on:
- Image resolution and quality
- Object size relative to the image
- Lighting conditions
- Occlusion (partial visibility)
- Object orientation
Error Handling
Common error responses:
Status Code | Description |
---|---|
400 | Bad Request - Invalid parameters or image format |
401 | Unauthorized - Invalid or missing API key |
413 | Payload Too Large - Image size exceeds limits |
429 | Too Many Requests - Rate limit exceeded |
500 | Internal Server Error - Server-side issue |
Error Response Format
Error responses are returned in the following format:
json
{
  "error": {
    "message": "Detailed error description",
    "type": "error_type",
    "param": "parameter_name",
    "code": "error_code"
  }
}
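One way a client might consume this format is sketched below. The names `parse_error`, `is_retryable`, and `RETRYABLE_STATUS` are hypothetical, and treating 429 and 500 as worth retrying is an assumption based on the status table, not documented API behavior.

```python
import json

# Assumption: rate limiting (429) and server errors (500) may succeed on
# retry, while 400, 401, and 413 need the request itself to change.
RETRYABLE_STATUS = {429, 500}


def parse_error(body):
    """Return the "error" object from an error response, or {} if absent."""
    try:
        data = json.loads(body)
    except ValueError:
        return {}  # body was not JSON at all
    err = data.get("error") if isinstance(data, dict) else None
    return err if isinstance(err, dict) else {}


def is_retryable(status):
    """True if a request with this HTTP status is worth retrying later."""
    return status in RETRYABLE_STATUS
```

Returning an empty dict for malformed bodies keeps the caller's logging code simple: `parse_error(body).get("message", "unknown error")` works in every case.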
Limitations
- Maximum image size: 10MB
- Supported image formats: JPEG, PNG, GIF (first frame only)
- Detection works best on clearly visible objects
- Multiple small objects may be more challenging to detect
- Rate limits apply based on your plan
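Some of these limits can be checked locally before uploading. The sketch below inspects the file's leading bytes to identify JPEG, PNG, or GIF data and compares its size against the documented maximum. The helper names are hypothetical, and two assumptions are made: that "10MB" means 10 × 1024 × 1024 bytes, and that the limit applies to the raw image rather than the Base64-encoded payload.

```python
# Assumption: 10MB is read as 10 MiB and applies to the raw image bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# Standard file signatures (magic bytes) for the supported formats.
_SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def sniff_mime_type(data):
    """Return the MIME type implied by the leading bytes, or None."""
    for mime, signatures in _SIGNATURES.items():
        if data.startswith(signatures):
            return mime
    return None


def check_image(data):
    """Raise ValueError if the image looks unsupported; else return its MIME type."""
    mime = sniff_mime_type(data)
    if mime is None:
        raise ValueError("unsupported image format (expected JPEG, PNG, or GIF)")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    return mime
```

The returned MIME type can be reused when building the data URI prefix, so the declared type always matches the actual file contents.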