Caption API
The /caption endpoint generates natural-language descriptions of images, from brief summaries to detailed explanations of visual content.
Endpoint
POST https://api.moondream.ai/v1/caption
Request Format
Parameter | Type | Required | Description |
---|---|---|---|
image_url | string | Yes | Base64-encoded image with data URI prefix (e.g., "data:image/jpeg;base64,...") |
length | string | No | Caption detail level: "short" or "normal" (default: "normal") |
stream | boolean | No | Whether to stream the response token by token (default: false) |
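As an illustration of the parameters above, the following minimal sketch builds a JSON request body from raw image bytes. The helper name `build_caption_payload` and its `mime_type` argument are hypothetical and not part of the API. This section does not name the authentication header, so sending the request is left out.

```python
import base64
import json


def build_caption_payload(image_bytes: bytes, mime_type: str = "image/jpeg",
                          length: str = "normal", stream: bool = False) -> str:
    """Return a JSON body for POST /v1/caption (hypothetical helper)."""
    if length not in ("short", "normal"):
        raise ValueError('length must be "short" or "normal"')
    # The image_url field carries the image as a Base64 data URI.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "image_url": f"data:{mime_type};base64,{encoded}",
        "length": length,
        "stream": stream,
    }
    return json.dumps(payload)
```

Validating `length` on the client is a design choice in this sketch. It surfaces a typo before the request is sent, rather than as a 400 response.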
Response Format
For non-streaming responses:
{
"request_id": "2025-03-25_caption_2025-03-25-21:00:39-715d03",
"caption": "A detailed caption describing the image..."
}
For streaming responses, you'll receive a series of data events. The final event has "completed": true and an empty "chunk":
data: {"chunk": "A scene ", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"chunk": "showing a mountain", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"chunk": " landscape.", "completed": false, "request_id": "2025-03-25_caption_123456"}
data: {"completed": true, "chunk": "", "org_id": "a349504c-8006-54f3-8862-ba0c41d2b4d7", "request_id": "2025-03-25_caption_123456"}
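If you consume the stream without the client library, the data events can be reassembled into a full caption. The sketch below assumes each event arrives as one text line starting with `data:` followed by a JSON object, as in the sample above. The function name `iter_caption_chunks` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_caption_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield caption text chunks from streamed data-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        event = json.loads(line[len("data:"):])
        if event.get("completed"):
            break  # final event: its chunk is empty, so stop here
        yield event.get("chunk", "")


sample = [
    'data: {"chunk": "A scene ", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"chunk": "showing a mountain", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"chunk": " landscape.", "completed": false, "request_id": "2025-03-25_caption_123456"}',
    'data: {"completed": true, "chunk": "", "request_id": "2025-03-25_caption_123456"}',
]
caption = "".join(iter_caption_chunks(sample))
print(caption)  # A scene showing a mountain landscape.
```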
Examples
import moondream as md
from PIL import Image
# Initialize with API key
model = md.vl(api_key="your-api-key")
# Load an image
image = Image.open("path/to/image.jpg")
# Generate a short caption
result = model.caption(image, length="short")
caption = result["caption"]
request_id = result["request_id"]
print(f"Short caption: {caption}")
print(f"Request ID: {request_id}")
# Generate a detailed caption
result = model.caption(image, length="normal")
detailed_caption = result["caption"]
print(f"Detailed caption: {detailed_caption}")
# Stream the caption generation
stream_result = model.caption(image, length="normal", stream=True)
for chunk in stream_result["chunk"]:  # Note: Uses "chunk" field, not "caption"
    print(chunk, end="", flush=True)
Caption Length Options
Length Descriptions
- short: Brief 1-2 sentence summary (e.g., "A red car parked on a street.")
- normal: Detailed description covering the main elements, their context, colors, and positioning.
Use Cases
- Generating alt text for accessibility
- Content indexing and organization
- Image search functionality
- Social media content creation
- Automated reporting and documentation
Error Handling
Common error responses:
Status Code | Description |
---|---|
400 | Bad Request - Invalid parameters or image format |
401 | Unauthorized - Invalid or missing API key |
413 | Payload Too Large - Image size exceeds limits |
429 | Too Many Requests - Rate limit exceeded |
500 | Internal Server Error - Server-side issue |
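One way to act on these status codes is to retry only the errors that may be transient. The sketch below treats 429 and 500 as retryable and uses capped exponential backoff. That policy is an assumption, not documented API behavior, and the names `should_retry` and `backoff_delay` are hypothetical.

```python
# Assumption: only rate-limit and server-side errors are worth retrying;
# 400, 401, and 413 need a changed request, so retrying them as-is cannot help.
RETRYABLE_STATUS = {429, 500}


def should_retry(status_code: int) -> bool:
    """Return True if the same request might succeed on a later attempt."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

In a request loop, you would call `should_retry` on each failed response and sleep for `backoff_delay(attempt)` seconds before trying again.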
Error Response Format
Error responses are returned in the following format:
{
"error": {
"message": "Detailed error description",
"type": "error_type",
"param": "parameter_name",
"code": "error_code"
}
}
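The error body above maps naturally onto an exception type. The following is a minimal sketch, assuming the body matches the documented shape. It falls back to the raw text when the body is not valid JSON. `CaptionAPIError` and `error_from_response` are hypothetical names, not part of the client library.

```python
import json
from typing import Optional


class CaptionAPIError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status_code: int, message: str,
                 type: Optional[str] = None, param: Optional[str] = None,
                 code: Optional[str] = None):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message
        self.type = type
        self.param = param
        self.code = code


def error_from_response(status_code: int, body: str) -> CaptionAPIError:
    """Build a CaptionAPIError from an HTTP status code and response body."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    message = err.get("message") or body.strip() or "unknown error"
    return CaptionAPIError(status_code, message, err.get("type"),
                           err.get("param"), err.get("code"))
```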
Limitations
- Maximum image size: 10 MB
- Supported image formats: JPEG, PNG, GIF (first frame only)
- Rate limits apply based on your plan
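These limits can be checked on the client before uploading. The sketch below is illustrative only: it reads "10MB" as 10 × 1024 × 1024 bytes and applies the limit to the raw file rather than to the Base64-encoded payload. Both are assumptions. It detects the format from the file's leading signature bytes, and `validate_image` is a hypothetical helper.

```python
from typing import Optional

# Assumption: "10MB" is read as 10 MiB and applies to the raw image bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024

# Leading signature bytes for the three supported formats.
_SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def detect_mime_type(data: bytes) -> Optional[str]:
    """Return the MIME type for JPEG, PNG, or GIF data, else None."""
    for mime, signatures in _SIGNATURES.items():
        if data.startswith(signatures):
            return mime
    return None


def validate_image(data: bytes) -> str:
    """Raise ValueError if the image breaks a limit; otherwise return its MIME type."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    mime = detect_mime_type(data)
    if mime is None:
        raise ValueError("unsupported format: expected JPEG, PNG, or GIF")
    return mime
```

The returned MIME type can be reused when building the data URI prefix for `image_url`.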