Caption API

The /caption endpoint generates natural language descriptions of images, from brief summaries to detailed explanations of visual content.

Endpoint

POST https://api.moondream.ai/v1/caption

Request Format

Parameter	Type	Required	Description
`image_url`	string	Yes	Base64-encoded image as a data URI (e.g., `"data:image/jpeg;base64,..."`)
`length`	string	No	Caption detail level: `"short"` or `"normal"` (default: `"normal"`)
`stream`	boolean	No	Whether to stream the response token by token (default: `false`)
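
As an illustration of the parameters above, the following sketch builds a JSON request body from raw image bytes. The helper name `build_caption_payload` and its defaults are hypothetical. Authentication and the HTTP call itself are omitted because this section does not specify the required headers.

```python
import base64
import json


def build_caption_payload(image_bytes: bytes, mime_type: str = "image/jpeg",
                          length: str = "normal", stream: bool = False) -> str:
    """Return a JSON request body for POST /v1/caption (hypothetical helper)."""
    if length not in ("short", "normal"):
        raise ValueError(f"length must be 'short' or 'normal', got {length!r}")
    # The image is sent as a data URI: "data:<mime type>;base64,<encoded bytes>"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "image_url": f"data:{mime_type};base64,{encoded}",
        "length": length,
        "stream": stream,
    }
    return json.dumps(payload)


# Usage: read a file from disk and build the body for a short caption
# with open("path/to/image.jpg", "rb") as f:
#     body = build_caption_payload(f.read(), length="short")
```

Validating `length` on the client avoids a round trip that would likely end in a 400 response.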

Response Format

For non-streaming responses:

{
  "request_id": "2025-03-25_caption_2025-03-25-21:00:39-715d03",
  "caption": "A detailed caption describing the image..."
}

For streaming responses, you'll receive a series of data events:

data: {"chunk": "A scene ", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"chunk": "showing a mountain", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"chunk": " landscape.", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"completed": true, "chunk": "", "org_id": "a349504c-8006-54f3-8862-ba0c41d2b4d7", "request_id": "2025-03-25_caption_123456"}
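
If you consume the stream directly rather than through the client library, the events can be reassembled into a caption along these lines. This is a minimal sketch that assumes each event arrives as one `data:` line containing a JSON object, as in the sample above. The function name `iter_caption_chunks` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_caption_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each chunk from streamed `data:` event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        event = json.loads(line[len("data:"):])
        if event.get("completed"):
            break  # the final event carries an empty chunk
        yield event.get("chunk", "")


sample = [
    'data: {"chunk": "A scene ", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"chunk": "showing a mountain", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"chunk": " landscape.", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"completed": true, "chunk": "", "request_id": "2025-03-25_caption_123456"}',
]
caption = "".join(iter_caption_chunks(sample))
print(caption)  # A scene showing a mountain landscape.
```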

Examples

import moondream as md
from PIL import Image
 
# Initialize with API key
model = md.vl(api_key="your-api-key")
 
# Load an image
image = Image.open("path/to/image.jpg")
 
# Generate a short caption
result = model.caption(image, length="short")
caption = result["caption"]
request_id = result["request_id"]
print(f"Short caption: {caption}")
print(f"Request ID: {request_id}")
 
# Generate a detailed caption
result = model.caption(image, length="normal")
detailed_caption = result["caption"]
print(f"Detailed caption: {detailed_caption}")
 
# Stream the caption generation
stream_result = model.caption(image, length="normal", stream=True)
for chunk in stream_result["chunk"]:  # Note: Uses "chunk" field, not "caption"
    print(chunk, end="", flush=True)

Caption Length Options

Length Descriptions

short: Brief 1-2 sentence summary (e.g., "A red car parked on a street.")
normal: Detailed description covering elements, context, colors, positioning, etc.

Use Cases

Generating alt text for accessibility
Content indexing and organization
Image search functionality
Social media content creation
Automated reporting and documentation

Error Handling

Common error responses:

Status Code	Description
400	Bad Request - Invalid parameters or image format
401	Unauthorized - Invalid or missing API key
413	Payload Too Large - Image size exceeds limits
429	Too Many Requests - Rate limit exceeded
500	Internal Server Error - Server-side issue

Error Response Format

Error responses are returned in the following format:

{
  "error": {
    "message": "Detailed error description",
    "type": "error_type",
    "param": "parameter_name",
    "code": "error_code"
  }
}
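
One way to handle these responses on the client is sketched below. It extracts the documented fields from the error body and flags whether a retry might help. Treating only 429 and 500 as retryable is an assumption rather than documented behavior, and `describe_error` is a hypothetical helper.

```python
import json

RETRYABLE_STATUS = {429, 500}  # assumption: only these are worth retrying


def describe_error(status_code: int, body: str) -> dict:
    """Summarize an error response: message, code, param, and a retry hint."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None  # body was not JSON (e.g., a proxy error page)
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "status": status_code,
        "message": error.get("message", "Unknown error"),
        "code": error.get("code"),
        "param": error.get("param"),
        "retry": status_code in RETRYABLE_STATUS,
    }
```

When `retry` is true, waiting before the next attempt (for example, with exponential backoff) is a common way to avoid hitting the rate limit again.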

Limitations

Maximum image size: 10 MB
Supported image formats: JPEG, PNG, GIF (first frame only)
Rate limits apply based on your plan
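
Checking these limits before upload can avoid 400 and 413 responses. The sketch below identifies the format from the file's leading bytes and enforces the size limit. It reads the size limit as 10 MiB and assumes the limit applies to the raw file rather than the Base64-encoded payload, and neither point is confirmed here. `detect_mime_type` is a hypothetical helper.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit read as 10 MiB

# Leading "magic" bytes of the supported formats
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def detect_mime_type(data: bytes) -> str:
    """Return the MIME type of a supported image, or raise ValueError."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    for mime, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return mime
    raise ValueError("unsupported format: expected JPEG, PNG, or GIF")
```

The returned MIME type can be used directly in the data URI prefix of `image_url`.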
