Try Moondream Cloud

tl;dr ~ Moondream Cloud makes the Moondream model and inference engine available via API. It is an early beta intended for experimental, non-production use.

Moondream Cloud offers four capabilities:

Caption - provide a thorough English-language description of an image.
Query - answer specific questions about an image.
Detect - identify an object or aspect of an image and provide its location as a bounding box (see the limitations below).
OCR - read the text that appears in an image.

Accessing Moondream Cloud is free at this time but requires registration in order to obtain API keys and view access logs.

Getting Started

Register for access by signing up (upper right). We use GitHub as our authentication mechanism. The only information or access we request from GitHub is your email address.
This should land you in a prototype dashboard, where you can obtain an API key and view API access logs.
Save your API key when it is shown: it is a "one-time view" key. If you lose it, you will need to create a new one, which is easy.
Using the API key, you can now make requests to the Moondream API; see "Using the API" below.

Limitations of the Current "Proof of Concept" Beta:

Images supported at this time: jpg, png, and webp.
The detect capability detects only a single object in an image. If there is no object to detect, it still returns a "best effort" bounding box. Work to support zero results and multiple results is underway.
Counting items and OCR are not reliable for more complex images, but are much improved over early versions of Moondream.
Moondream Cloud may generate inaccurate statements and may struggle to understand intricate or nuanced instructions.
Moondream Cloud may not be free from societal biases. Users should be aware of this and exercise caution and critical thinking when using the API.
Moondream Cloud may generate offensive, inappropriate, or hurtful content if it is prompted to do so. Abuse of Moondream Cloud may result in suspension of your account.
API access is rate limited, and malicious use or abuse of the system may result in suspension of your account.
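
The supported-format limitation above can be checked on the client before uploading. The following is a minimal sketch, not part of Moondream Cloud: the function name is hypothetical, the check looks only at the file extension, and treating ".jpeg" as equivalent to ".jpg" is an assumption.

```python
from pathlib import Path

# Extensions for the formats listed above. ".jpeg" is an assumption:
# the text names only jpg, png, and webp.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_supported_image(path):
    """Return True if the file extension matches a supported image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This inspects the name only, so a mislabeled file would still pass the check.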

Using the API

Currently the API is available via multipart POST requests. Client libraries for your favorite language are in development.

Routes:

The HTTPS routes are unsurprising:

POST /v1/caption
POST /v1/query
POST /v1/detect
POST /v1/ocr

Parameters:

Every request should be a multipart POST request that includes an image file in a supported format.
caption does not require a prompt parameter, although it accepts one if you want to use some prompt engineering to get the caption you want.
query requires a prompt parameter, which should be as specific as possible to obtain the desired result.
detect requires a prompt parameter, which should name the object or aspect you wish to detect in the image.
ocr does not require or make use of a prompt parameter.
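
The prompt rules above can be enforced before a request is sent. This is a minimal sketch with hypothetical names (PROMPT_RULES, build_body). It assumes the prompt travels in a JSON object, as in the curl example at the end of this page. The source does not say whether a request without a prompt should send an empty object or leave the JSON part out, so returning an empty dict here is an assumption.

```python
# How each route treats the prompt parameter, per the list above.
PROMPT_RULES = {
    "caption": "optional",
    "query": "required",
    "detect": "required",
    "ocr": "unused",
}


def build_body(action, prompt=None):
    """Return the dict to serialize into the request's JSON part."""
    rule = PROMPT_RULES.get(action)
    if rule is None:
        raise ValueError(f"Unknown action: {action!r}")
    if rule == "required" and not prompt:
        raise ValueError(f"{action} requires a prompt")
    if rule == "unused" or not prompt:
        # No prompt to send (assumption: an empty object is acceptable).
        return {}
    return {"prompt": prompt}
```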

Authentication:

Each request should include an authentication header named X-MD-Auth whose value is the API key obtained from the dashboard.
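
Because the key is shown only once, it is convenient to keep it outside your source code. The sketch below builds the header from an environment variable. The variable name MOONDREAM_API_KEY and the helper auth_headers are hypothetical choices for illustration, not something the API defines.

```python
import os

AUTH_HEADER = "X-MD-Auth"  # header name given in the text above
KEY_ENV_VAR = "MOONDREAM_API_KEY"  # hypothetical variable name, not part of the API


def auth_headers(api_key=None):
    """Return the headers dict carrying the API key.

    Falls back to the KEY_ENV_VAR environment variable when no key is passed.
    """
    key = api_key or os.environ.get(KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"No API key given and {KEY_ENV_VAR} is not set")
    return {AUTH_HEADER: key}
```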

API results:

Results are in JSON format:

{
  "filename": "<name of the input file>",
  "action": "<capability used in the request: caption|query|detect>",
  "result": "<requested results>"
}

Note that for caption, query, and ocr the result will be English-language text.

If no text is identified in an image used for an ocr request, Moondream will respond with a phrase such as "There is no discernible text within the image."

For detect the result will be an array of bounding box sets, for example:

  {
    "filename":"file.jpg",
    "result":[[0.18,0.33,0.12,0.98]],
    "action":"detect"
  }

At this time exactly one bounding box is always returned. The array-of-arrays format is intentional: it is forward compatible with multiple bounding boxes.
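
Client code can treat the detect result as a list from the start, so it keeps working once multiple boxes are returned. A minimal sketch follows, with a hypothetical helper name (detect_boxes). It returns each box as four raw numbers and does not interpret them, because the coordinate order and units are not specified above.

```python
import json


def detect_boxes(response_text):
    """Parse a detect reply and return its bounding boxes as 4-tuples of floats."""
    reply = json.loads(response_text)
    if reply.get("action") != "detect":
        raise ValueError(f"Not a detect reply: action={reply.get('action')!r}")
    boxes = []
    for box in reply["result"]:
        if len(box) != 4:
            raise ValueError(f"Expected four numbers per box, got {box!r}")
        # Values are passed through as-is; their meaning is not interpreted.
        boxes.append(tuple(float(value) for value in box))
    return boxes
```

With the example reply above, detect_boxes returns a one-element list, and iterating over it already handles the multi-box case.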

Curl Example:

curl -XPOST \
  -H 'Content-Type: multipart/form-data' \
  -H 'X-MD-Auth: <API-KEY>' \
  -F 'body={"prompt": "What is the predominant color in this image?"};type=application/json' \
  -F "content=@<LOCAL-FILE-PATH>" \
  https://api.moondream.ai/v1/query
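
Until client libraries are available, the same request can be made from Python with only the standard library. This is a sketch under assumptions, not an official client. The form field names ("body" and "content"), the host, and the header come from the curl example above. The function names, the hand-built multipart encoding, and the guessed image content type are illustrative choices.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_BASE = "https://api.moondream.ai/v1"  # host taken from the curl example


def encode_multipart(prompt, filename, image_bytes):
    """Return (body, content_type) for a two-part multipart/form-data request."""
    boundary = uuid.uuid4().hex
    image_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = [
        # "body" part: JSON document carrying the prompt, as in the curl example.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="body"\r\n'
            "Content-Type: application/json\r\n\r\n"
        ).encode() + json.dumps({"prompt": prompt}).encode() + b"\r\n",
        # "content" part: the raw image file.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="content"; filename="{filename}"\r\n'
            f"Content-Type: {image_type}\r\n\r\n"
        ).encode() + image_bytes + b"\r\n",
        # Closing delimiter.
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def query_image(api_key, image_path, prompt):
    """POST an image and a prompt to /v1/query and return the decoded JSON reply."""
    path = Path(image_path)
    body, content_type = encode_multipart(prompt, path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{API_BASE}/query",
        data=body,
        headers={"Content-Type": content_type, "X-MD-Auth": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling query_image("<API-KEY>", "<LOCAL-FILE-PATH>", "What is the predominant color in this image?") mirrors the curl command. HTTP errors surface as urllib.error.HTTPError and are left unhandled here.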