/point

The /point endpoint returns precise coordinates for specific objects in an image. Unlike /detect, which returns bounding boxes, this endpoint returns a center point for each instance of the specified object.

Endpoint

bash
POST https://api.moondream.ai/v1/point

Request Format

| Parameter | Type | Required | Description |
|---|---|---|---|
| `image_url` | string | Yes | Base64-encoded image with data URI prefix (e.g., `"data:image/jpeg;base64,..."`) |
| `object` | string | Yes | The type of object to locate (e.g., "person", "car", "face") |

Streaming Support

This endpoint does not support streaming responses.
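As a rough illustration of the Request Format parameters, the sketch below builds the JSON body from raw image bytes and posts it with the standard library. The helper names `build_point_payload` and `post_point` are hypothetical, and the `X-Moondream-Auth` header name is an assumption, since this page does not say how the API key is sent; check it against your account documentation before relying on it.

```python
import base64
import json
import urllib.request

POINT_URL = "https://api.moondream.ai/v1/point"


def build_point_payload(image_bytes: bytes, obj: str, mime_type: str = "image/jpeg") -> dict:
    """Build the request body: a base64 data URI plus the object to locate."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "image_url": f"data:{mime_type};base64,{encoded}",
        "object": obj,
    }


def post_point(payload: dict, api_key: str) -> dict:
    """POST the payload and return the decoded JSON response (not streamed)."""
    request = urllib.request.Request(
        POINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header name is not stated on this page.
            "X-Moondream-Auth": api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `post_point(build_point_payload(open("photo.jpg", "rb").read(), "face"), "your-api-key")`, where the file path and key are placeholders.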

Response Format

json
{
  "request_id": "2025-03-25_point_2025-03-25-21:00:39-715d03",
  "points": [
    {
      "x": 0.65,  // x coordinate (normalized 0-1)
      "y": 0.42   // y coordinate (normalized 0-1)
    },
    // Additional points...
  ]
}

Coordinate System

Coordinates are normalized to the image dimensions, ranging from 0 to 1:

  • (0,0) is the top-left corner of the image
  • (1,1) is the bottom-right corner of the image

To convert to pixel coordinates, multiply by the image dimensions:

  • pixel_x = x * image_width
  • pixel_y = y * image_height
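The conversion above can be sketched as a small helper. The function name `to_pixel` is hypothetical, and rounding plus clamping to the last valid pixel index is an added assumption (useful when the result indexes into an image array), not something the API prescribes.

```python
def to_pixel(x: float, y: float, image_width: int, image_height: int) -> tuple[int, int]:
    """Convert normalized (0-1) coordinates to integer pixel coordinates."""
    # pixel = normalized * dimension, as described above.
    pixel_x = round(x * image_width)
    pixel_y = round(y * image_height)
    # Assumption: clamp so that x = 1.0 or y = 1.0 still maps inside the image.
    pixel_x = max(0, min(pixel_x, image_width - 1))
    pixel_y = max(0, min(pixel_y, image_height - 1))
    return pixel_x, pixel_y
```

If sub-pixel precision matters, for example when plotting, the unrounded products `x * image_width` and `y * image_height` can be used directly.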

Examples

python
import moondream as md
from PIL import Image
import matplotlib.pyplot as plt

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Locate objects (e.g., "person", "car", "face", etc.)
result = model.point(image, "face")
points = result["points"]
request_id = result["request_id"]
print(f"Found {len(points)} faces")
print(f"Request ID: {request_id}")

# Visualize the points
plt.figure(figsize=(10, 10))
plt.imshow(image)

for point in points:
    # Convert normalized coordinates to pixel values
    x = point["x"] * image.width
    y = point["y"] * image.height

    # Plot the point
    plt.plot(x, y, 'ro', markersize=15, alpha=0.7)
    plt.text(
        x + 10, y, "Face",
        color='white', fontsize=12,
        bbox=dict(facecolor='red', alpha=0.5)
    )

plt.axis('off')
plt.savefig("output_with_points.jpg")
plt.show()

Use Cases

  • Interactive image experiences
  • Accurate object counting
  • Heat map generation
  • Creating image annotations
  • Precise focus point identification for photography applications
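To make the counting and heat map use cases concrete, here is a minimal sketch that bins the normalized points from a response into a coarse grid. The function name `point_heatmap` and the grid size are illustrative assumptions; the input is the `points` list in the response format shown earlier.

```python
def point_heatmap(points: list[dict], rows: int = 4, cols: int = 4) -> list[list[int]]:
    """Count how many points fall into each cell of a rows x cols grid."""
    grid = [[0] * cols for _ in range(rows)]
    for point in points:
        # Scale normalized coordinates to cell indices; clamp 1.0 into the last cell.
        col = min(int(point["x"] * cols), cols - 1)
        row = min(int(point["y"] * rows), rows - 1)
        grid[row][col] += 1
    return grid


def count_objects(points: list[dict]) -> int:
    """Each point is one instance, so the count is the list length."""
    return len(points)
```

The resulting grid can be passed to a plotting library such as matplotlib (`plt.imshow(grid)`) to render the heat map.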

Difference Between /point and /detect

Point vs. Detect
  • /point returns center coordinates only (x, y)
  • /detect returns bounding boxes (x_min, y_min, x_max, y_max)

Use /point when you need:

  • Precise object center points
  • Simpler data for plotting
  • Multiple instance counting

Use /detect when you need:

  • Object size information
  • Visual highlighting of complete objects
  • Cropping regions of interest
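The relationship between the two outputs can be illustrated by reducing a /detect-style bounding box to its center. This is only a sketch: the field names follow the (x_min, y_min, x_max, y_max) tuple listed above, the helper name `box_center` is hypothetical, and the point /point returns for the same object is not guaranteed to match this geometric center exactly.

```python
def box_center(box: dict) -> dict:
    """Return the geometric center of a normalized bounding box."""
    return {
        "x": (box["x_min"] + box["x_max"]) / 2,
        "y": (box["y_min"] + box["y_max"]) / 2,
    }


def box_size(box: dict) -> tuple[float, float]:
    """Width and height of the box: information a single point cannot carry."""
    return box["x_max"] - box["x_min"], box["y_max"] - box["y_min"]
```

Going from a box to a point loses the size information, which is why /detect is the better fit for cropping or highlighting.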

Common Object Types

Moondream can locate a wide range of objects. Here are some commonly used examples:

  • person
  • face
  • car
  • dog
  • cat
  • building
  • furniture
  • text
  • food
  • plant

Zero-Shot Detection

Like object detection, Moondream's pointing is zero-shot: it can locate virtually any object you specify, not just objects from a predefined list. For best results, describe the object as specifically as possible.

Error Handling

Common error responses:

| Status Code | Description |
|---|---|
| 400 | Bad Request - Invalid parameters or image format |
| 401 | Unauthorized - Invalid or missing API key |
| 413 | Payload Too Large - Image size exceeds limits |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side issue |

Error Response Format

Error responses are returned in the following format:

json
{
  "error": {
    "message": "Detailed error description",
    "type": "error_type",
    "param": "parameter_name",
    "code": "error_code"
  }
}
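One way a client might consume this error format is sketched below. The helper name `parse_error` is hypothetical, and treating 429 and 500 as retryable is a common convention assumed here, not something this page specifies.

```python
import json

# Assumption: rate limits and server errors are worth retrying; the rest are not.
RETRYABLE_STATUS = {429, 500}


def parse_error(status_code: int, body: str) -> dict:
    """Extract the documented error fields from a response body."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}
    if not isinstance(error, dict):
        error = {"message": str(error)}
    return {
        "status": status_code,
        "message": error.get("message", "Unknown error"),
        "type": error.get("type"),
        "param": error.get("param"),
        "code": error.get("code"),
        "retryable": status_code in RETRYABLE_STATUS,
    }
```

A caller could back off and retry when `retryable` is true, and surface `message` and `param` to the user otherwise.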

Limitations

  • Maximum image size: 10MB
  • Supported image formats: JPEG, PNG, GIF (first frame only)
  • Detection works best on clearly visible objects
  • Multiple small objects may be more challenging to locate
  • Rate limits apply based on your plan
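The size and format limits can be checked locally before uploading, which avoids a round trip that would end in a 400 or 413. The sketch below is an assumption-laden illustration: `check_image` is a hypothetical helper, "10MB" is interpreted as 10 * 1024 * 1024 bytes, and formats are recognized from their standard file signatures rather than by anything the API exposes.

```python
# Assumption: "10MB" means 10 MiB; the page does not say which unit is intended.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# Standard leading bytes for the supported formats.
SIGNATURES = {
    "JPEG": (b"\xff\xd8\xff",),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "GIF": (b"GIF87a", b"GIF89a"),
}


def detect_format(data: bytes):
    """Return "JPEG", "PNG", or "GIF" based on the file signature, else None."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def check_image(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the image passes both checks."""
    problems = []
    if len(data) > MAX_IMAGE_BYTES:
        problems.append(f"image is {len(data)} bytes, over the {MAX_IMAGE_BYTES}-byte limit")
    if detect_format(data) is None:
        problems.append("unsupported format: expected JPEG, PNG, or GIF")
    return problems
```

Note that base64 encoding inflates the payload by roughly a third, so whether the limit applies to the raw file or the encoded request is worth confirming separately.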