/detect

The /detect endpoint identifies and locates specific objects within images. It returns bounding box coordinates for each detected instance.

Endpoint

bash
POST https://api.moondream.ai/v1/detect

Request Format

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image_url` | string | Yes | Base64-encoded image with a data URI prefix (e.g., `"data:image/jpeg;base64,..."`) |
| `object` | string | Yes | The type of object to detect (e.g., "person", "car", "face") |
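
For reference, a raw HTTP call with Python's `requests` might look like the sketch below. The `X-Moondream-Auth` header name is an assumption here, so confirm the exact authentication header in the API's authentication docs.

python
import base64
import requests

API_KEY = "your-api-key"

# Read the image, base64-encode it, and add the data URI prefix
with open("path/to/image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image_url": f"data:image/jpeg;base64,{encoded}",
    "object": "person",
}

# NOTE: the auth header name is an assumption; check the authentication docs.
response = requests.post(
    "https://api.moondream.ai/v1/detect",
    headers={"X-Moondream-Auth": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
print(response.json()["objects"])
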
Streaming Support

This endpoint does not support streaming responses.

Response Format

json
{"request_id": "2025-03-25_detect_2025-03-25-21:00:39-715d03","objects": [  {    "x_min": 0.2,   // left boundary of detection box (normalized 0-1)    "y_min": 0.3,   // top boundary of detection box (normalized 0-1)    "x_max": 0.6,   // right boundary of detection box (normalized 0-1)    "y_max": 0.8    // bottom boundary of detection box (normalized 0-1)  },  // Additional objects...]}
Coordinate System

Coordinates are normalized to the image dimensions, ranging from 0 to 1:

  • (0,0) is the top-left corner of the image
  • (1,1) is the bottom-right corner of the image

To convert to pixel coordinates, multiply by the image dimensions:

  • pixel_x_min = x_min * image_width
  • pixel_y_min = y_min * image_height
  • pixel_x_max = x_max * image_width
  • pixel_y_max = y_max * image_height
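
As a minimal sketch of this conversion (plain Python with Pillow, using an example detection dict in the shape returned by the API):

python
from PIL import Image

image = Image.open("path/to/image.jpg")
obj = {"x_min": 0.2, "y_min": 0.3, "x_max": 0.6, "y_max": 0.8}  # example detection

# Multiply normalized coordinates by the image dimensions to get pixels
pixel_box = (
    int(obj["x_min"] * image.width),   # left
    int(obj["y_min"] * image.height),  # top
    int(obj["x_max"] * image.width),   # right
    int(obj["y_max"] * image.height),  # bottom
)
print(pixel_box)  # (left, top, right, bottom) tuple, usable with image.crop(pixel_box)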

Examples

python
import moondream as md
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Detect objects (e.g., "person", "car", "face", etc.)
result = model.detect(image, "person")
detections = result["objects"]
request_id = result["request_id"]
print(f"Found {len(detections)} people")
print(f"Request ID: {request_id}")

# Visualize the detections
plt.figure(figsize=(10, 10))
plt.imshow(image)
ax = plt.gca()

for obj in detections:
    # Convert normalized coordinates to pixel values
    x_min = obj["x_min"] * image.width
    y_min = obj["y_min"] * image.height
    x_max = obj["x_max"] * image.width
    y_max = obj["y_max"] * image.height

    # Calculate width and height for the rectangle
    width = x_max - x_min
    height = y_max - y_min

    # Create a rectangle patch
    rect = patches.Rectangle(
        (x_min, y_min), width, height,
        linewidth=2, edgecolor='r', facecolor='none'
    )
    ax.add_patch(rect)
    plt.text(
        x_min, y_min, "Person",
        color='white', fontsize=12,
        bbox=dict(facecolor='red', alpha=0.5)
    )

plt.axis('off')
plt.savefig("output_with_detections.jpg")
plt.show()

Common Object Types

Moondream can detect a wide range of objects. Here are some commonly used examples:

  • person
  • face
  • car
  • dog
  • cat
  • building
  • furniture
  • text
  • food
  • plant

Zero-Shot Detection

Moondream's object detection is zero-shot, meaning it can detect virtually any object you specify, not just from a predefined list. Try describing the object as specifically as possible for best results.
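
For example, reusing the `model` and `image` objects from the Python example above (the query strings are purely illustrative):

python
# More specific queries often localize better than generic ones
result = model.detect(image, "red coffee mug")
print(len(result["objects"]), "matches")

result = model.detect(image, "person wearing a yellow hard hat")
print(len(result["objects"]), "matches")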

Performance Considerations

Detection performance varies based on:

  • Image resolution and quality
  • Object size relative to the image
  • Lighting conditions
  • Occlusion (partial visibility)
  • Object orientation

Error Handling

Common error responses:

| Status Code | Description |
| --- | --- |
| 400 | Bad Request - Invalid parameters or image format |
| 401 | Unauthorized - Invalid or missing API key |
| 413 | Payload Too Large - Image size exceeds limits |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side issue |

Error Response Format

Error responses are returned in the following format:

json
{"error": {  "message": "Detailed error description",  "type": "error_type",  "param": "parameter_name",  "code": "error_code"}}

Limitations

  • Maximum image size: 10MB
  • Supported image formats: JPEG, PNG, GIF (first frame only)
  • Detection works best on clearly visible objects
  • Multiple small objects may be more challenging to detect
  • Rate limits apply based on your plan
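
If you want to catch the size and format limits client-side, a simple pre-flight check might look like the sketch below. The limits are taken from the list above; note that the docs do not state whether the 10MB limit applies before or after base64 encoding, so treat this as a rough guard rather than a guarantee.

python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10MB limit from the list above
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}

def validate_image(path):
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"Image is {size} bytes, which exceeds the 10MB limit")
    return path

validate_image("path/to/image.jpg")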