/detect

The /detect endpoint identifies and locates specific objects within images. It returns bounding box coordinates for each detected instance.

Endpoint

bash
POST https://api.moondream.ai/v1/detect

Request Format

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image_url` | string | Yes | Base64-encoded image with a data URI prefix (e.g., `"data:image/jpeg;base64,..."`) |
| `object` | string | Yes | The type of object to detect (e.g., "person", "car", "face") |
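
For reference, a raw HTTP call with Python's `requests` might look like the sketch below. The `X-Moondream-Auth` header name is an assumption here, so confirm the exact authentication header in the API's authentication docs.

python
import base64
import requests

API_KEY = "your-api-key"

# Read the image, base64-encode it, and add the data URI prefix
with open("path/to/image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image_url": f"data:image/jpeg;base64,{encoded}",
    "object": "person",
}

# NOTE: the auth header name is an assumption; check the authentication docs.
response = requests.post(
    "https://api.moondream.ai/v1/detect",
    headers={"X-Moondream-Auth": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
print(response.json()["objects"])
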
Streaming Support

This endpoint does not support streaming responses.

Response Format

json
{"request_id": "2025-03-25_detect_2025-03-25-21:00:39-715d03","objects": [  {    "x_min": 0.2,   // left boundary of detection box (normalized 0-1)    "y_min": 0.3,   // top boundary of detection box (normalized 0-1)    "x_max": 0.6,   // right boundary of detection box (normalized 0-1)    "y_max": 0.8    // bottom boundary of detection box (normalized 0-1)  },  // Additional objects...]}
Coordinate System

Coordinates are normalized to the image dimensions, ranging from 0 to 1:

  • (0,0) is the top-left corner of the image
  • (1,1) is the bottom-right corner of the image

To convert to pixel coordinates, multiply by the image dimensions:

  • pixel_x_min = x_min * image_width
  • pixel_y_min = y_min * image_height
  • pixel_x_max = x_max * image_width
  • pixel_y_max = y_max * image_height
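
As a minimal sketch of this conversion (plain Python with Pillow, using an example detection dict in the shape returned by the API):

python
from PIL import Image

image = Image.open("path/to/image.jpg")
obj = {"x_min": 0.2, "y_min": 0.3, "x_max": 0.6, "y_max": 0.8}  # example detection

# Multiply normalized coordinates by the image dimensions to get pixels
pixel_box = (
    int(obj["x_min"] * image.width),   # left
    int(obj["y_min"] * image.height),  # top
    int(obj["x_max"] * image.width),   # right
    int(obj["y_max"] * image.height),  # bottom
)
print(pixel_box)  # (left, top, right, bottom) tuple, usable with image.crop(pixel_box)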

Examples

python
import moondream as md
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Detect objects (e.g., "person", "car", "face", etc.)
result = model.detect(image, "person")
detections = result["objects"]
request_id = result["request_id"]
print(f"Found {len(detections)} people")
print(f"Request ID: {request_id}")

# Visualize the detections
plt.figure(figsize=(10, 10))
plt.imshow(image)
ax = plt.gca()

for obj in detections:
    # Convert normalized coordinates to pixel values
    x_min = obj["x_min"] * image.width
    y_min = obj["y_min"] * image.height
    x_max = obj["x_max"] * image.width
    y_max = obj["y_max"] * image.height

    # Calculate width and height for the rectangle
    width = x_max - x_min
    height = y_max - y_min

    # Create a rectangle patch
    rect = patches.Rectangle(
        (x_min, y_min), width, height,
        linewidth=2, edgecolor='r', facecolor='none'
    )
    ax.add_patch(rect)
    plt.text(
        x_min, y_min, "Person",
        color='white', fontsize=12,
        bbox=dict(facecolor='red', alpha=0.5)
    )

plt.axis('off')
plt.savefig("output_with_detections.jpg")
plt.show()

Common Object Types

Moondream can detect a wide range of objects. Here are some commonly used examples:

  • person
  • face
  • car
  • dog
  • cat
  • building
  • furniture
  • text
  • food
  • plant

Zero-Shot Detection

Moondream's object detection is zero-shot, meaning it can detect virtually any object you specify, not just from a predefined list. Try describing the object as specifically as possible for best results.
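
For example, reusing the `model` and `image` objects from the Python example above (the query strings are purely illustrative):

python
# More specific queries often localize better than generic ones
result = model.detect(image, "red coffee mug")
print(len(result["objects"]), "matches")

result = model.detect(image, "person wearing a yellow hard hat")
print(len(result["objects"]), "matches")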

Performance Considerations

Detection performance varies based on:

  • Image resolution and quality
  • Object size relative to the image
  • Lighting conditions
  • Occlusion (partial visibility)
  • Object orientation

Error Handling

Common error responses:

| Status Code | Description |
| --- | --- |
| 400 | Bad Request - Invalid parameters or image format |
| 401 | Unauthorized - Invalid or missing API key |
| 413 | Payload Too Large - Image size exceeds limits |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side issue |

Error Response Format

Error responses are returned in the following format:

json
{"error": {  "message": "Detailed error description",  "type": "error_type",  "param": "parameter_name",  "code": "error_code"}}

Limitations

  • Maximum image size: 10MB
  • Supported image formats: JPEG, PNG, GIF (first frame only)
  • Detection works best on clearly visible objects
  • Multiple small objects may be more challenging to detect
  • Rate limits apply based on your plan
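
If you want to catch the size and format limits client-side, a simple pre-flight check might look like the sketch below. The limits are taken from the list above; note that the docs do not state whether the 10MB limit applies before or after base64 encoding, so treat this as a rough guard rather than a guarantee.

python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10MB limit from the list above
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}

def validate_image(path):
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"Image is {size} bytes, which exceeds the 10MB limit")
    return path

validate_image("path/to/image.jpg")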