tl;dr ~ Moondream Cloud makes the Moondream model and inference engine available through an API. It is in early beta and intended for experimental, non-production use.
Moondream Cloud offers three capabilities:
- Caption: provide a thorough English-language description of an image.
- Query: answer specific questions about an image.
- Detect: identify objects or aspects of an image and provide the location as a bounding box. * See limitations, below.
Accessing Moondream Cloud is free at this time but requires registration in order to obtain API keys and view access logs.
Supported image formats are jpg, png, and webp.
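The three formats named above (jpg, png, webp) can be checked locally before uploading. The following is a minimal sketch, not part of any official client: the function name `is_supported_image` is hypothetical, the case-insensitive match is an assumption, and because this page does not say whether a `.jpeg` extension is accepted, the sketch allows only the extensions listed.

```python
from pathlib import Path

# Extensions named in this document. Whether ".jpeg" is also accepted
# is not stated, so it is deliberately left out of this sketch.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".webp"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    The comparison is case-insensitive, which is an assumption of this
    sketch rather than documented API behavior.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only inspects the file name; it does not verify that the file contents are a valid image.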
The detect capability will only detect a single object in an image; if there is no object to detect, it will still return a "best effort" bounding box. Work to support no results and many results is underway.
Currently the API accepts multipart POST requests. Client libraries for your favorite language are in development.
The HTTPS routes are unsurprising:
POST /v1/caption
POST /v1/query
POST /v1/detect
POST /v1/ocr
caption does not require a prompt parameter, although it accepts one if you want to use prompt engineering to shape the caption.
query requires a prompt parameter, which should be as specific as possible to obtain the desired result.
detect requires a prompt parameter, which should name the object or aspect you wish to detect in the image.
ocr does not require or make use of a prompt parameter.
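The prompt rules above can be enforced on the client before a request is sent. This is a minimal sketch under stated assumptions: `PROMPT_RULES` and `build_body_field` are hypothetical names, the `body` form field and its `{"prompt": ...}` shape are taken from the curl example at the end of this page, and sending an empty JSON object when there is no prompt is a guess rather than documented behavior.

```python
import json
from typing import Optional

# Prompt rules as described in this document.
PROMPT_RULES = {
    "caption": "optional",
    "query": "required",
    "detect": "required",
    "ocr": "ignored",
}


def build_body_field(capability: str, prompt: Optional[str] = None) -> str:
    """Return the JSON string for the `body` form field.

    Raises ValueError for an unknown capability or a missing required prompt.
    """
    if capability not in PROMPT_RULES:
        raise ValueError(f"unknown capability: {capability!r}")
    rule = PROMPT_RULES[capability]
    if rule == "required" and not prompt:
        raise ValueError(f"{capability} requires a prompt")
    if rule == "ignored" or not prompt:
        # Assumption: an empty JSON object is acceptable when no prompt is sent.
        return json.dumps({})
    return json.dumps({"prompt": prompt})
```

Failing early on a missing prompt avoids spending a request on a call that the rules above say cannot succeed.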
Each request should include an authentication header, X-MD-Auth, whose value is the API key obtained from the dashboard.
Results are in JSON format:
{
"filename": "<name of the input filename>",
"action": "<capability used in the request: caption|query|detect>",
"result": "<requested results>"
}
Note that for caption, query, and ocr the result will be an English-language description.
If no text is identified in an image used for an ocr request, Moondream will respond with a phrase such as "There is no discernible text within the image."
For detect the result will be an array of bounding box sets, for example:
{
"filename":"file.jpg",
"result":[[0.18,0.33,0.12,0.98]],
"action":"detect"
}
At this time exactly one bounding box is always returned, but the format is forward compatible with multiple bounding boxes, which is why the result is an array of arrays.
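A response in the format described above could be unpacked along the following lines. This is a sketch, not an official client: `parse_result` is a hypothetical name, and the four-numbers-per-box check is inferred from the single example on this page. The page does not state what the four values mean or in what order they appear, so the sketch returns them uninterpreted.

```python
import json
from typing import List, Union


def parse_result(raw: str) -> Union[str, List[List[float]]]:
    """Parse a response body and return its `result` field.

    For `detect`, returns a list of boxes (each a list of four floats).
    For other actions, returns the description text.
    """
    data = json.loads(raw)
    action = data["action"]
    result = data["result"]

    if action == "detect":
        boxes = []
        for box in result:
            # Assumption: each box has four numbers, as in the example above.
            if len(box) != 4:
                raise ValueError(f"expected 4 values per box, got {len(box)}")
            boxes.append([float(v) for v in box])
        return boxes

    if not isinstance(result, str):
        raise ValueError(f"expected text result for {action!r}")
    return result
```

Iterating over the outer array, rather than indexing the first box, keeps the caller working if the API later returns zero or several boxes.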
curl -XPOST \
  -H 'Content-Type: multipart/form-data' \
  -H 'X-MD-Auth: <API-KEY>' \
  -F 'body={"prompt": "What is the predominant color in this image?"};type=application/json' \
  -F "content=@<LOCAL-FILE-PATH>" \
  https://api.moondream.ai/v1/query
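For readers who prefer Python to curl, the same multipart request can be assembled with the standard library alone. This is a rough equivalent of the curl command above, not an official client: `build_request` and `MIME_TYPES` are hypothetical names, the `body` and `content` field names and the base URL are copied from the curl example, and the per-file `Content-Type` values are an assumption about what the server expects. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional

API_BASE = "https://api.moondream.ai/v1"  # from the curl example

# Assumption: standard MIME types for the three supported formats.
MIME_TYPES = {".jpg": "image/jpeg", ".png": "image/png", ".webp": "image/webp"}


def build_request(
    capability: str,
    api_key: str,
    image_path: str,
    image_bytes: bytes,
    prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST request for one capability."""
    boundary = uuid.uuid4().hex
    name = os.path.basename(image_path)
    ext = os.path.splitext(name)[1].lower()
    mime = MIME_TYPES.get(ext, "application/octet-stream")
    body_json = json.dumps({"prompt": prompt} if prompt else {})

    # First part: the JSON `body` field. Second part: the image as `content`.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="body"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{body_json}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        f"{API_BASE}/{capability}",
        data=head + image_bytes + tail,
        method="POST",
        headers={
            "X-MD-Auth": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (performs a network call, so it is left commented out):
# with open("photo.jpg", "rb") as f:
#     req = build_request("query", "<API-KEY>", "photo.jpg", f.read(),
#                         prompt="What is the predominant color in this image?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the body by hand keeps the sketch dependency-free; a third-party HTTP library with built-in multipart support would shorten it considerably.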