Moondream launches playground.
November 26, 2024
Today we're re-launching our Playground, an interactive tool that lets you kick the tires and evaluate how Moondream might work for your use cases.
The first change you'll notice is the improved UX. Our favorite new feature is the prompt suggester. The playground analyzes the image and automatically generates prompt suggestions for you.
Secondly, you may notice how speedy the responses are. Moondream is a tiny 1.9B-parameter model, which means it not only runs everywhere, it also runs blazingly fast on PCs and servers. The playground runs on NVIDIA L40S GPUs, which are comparable to the RTX 3090s you'd find in mainstream PCs.
Last is the introduction of "Capabilities". A key feature of VLMs is that they understand human-style questions ("prompts"). That's great when you're using the model as a tool. However, when you're developing an app, you likely want more control over how the model behaves. Capabilities let you specify exactly what type of answers you want. This release features three of them:
Visual Question Answering (VQA): Our most generalized capability. Ask it any question, and it responds in a human-like way. For example: "What kind of cake is this?", "What color is the left surfboard?", or "Is there a leak in this pipeline?".
Object detection: Get a list of bounding box coordinates for items that match the prompt. For example: "headlights", "green shirt", or "Ferrari car".
Image captioning: Get accurate, descriptive text for any image, either in short or long form. This is great for annotating images and video frames.
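To make the distinction concrete, here's a minimal sketch of how the three capabilities map to differently shaped requests and answers. The function name, field names, and response shapes below are illustrative assumptions for this post, not Moondream's actual API:

```python
import base64
import json
from typing import Optional

def build_request(capability: str, image_bytes: bytes,
                  prompt: Optional[str] = None) -> dict:
    """Package an image plus capability-specific options into one payload.

    Hypothetical helper: the "capability", "image", "prompt", and "length"
    fields are illustrative, not the real Moondream request schema.
    """
    payload = {
        "capability": capability,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    if capability in ("vqa", "detect"):
        # VQA and detection are prompt-driven: "What kind of cake is this?",
        # "headlights", "green shirt", etc.
        payload["prompt"] = prompt
    elif capability == "caption":
        # Captioning needs no prompt, only a length preference.
        payload["length"] = prompt or "short"
    return payload

# Each capability returns a differently shaped answer (illustrative):
#   vqa     -> {"answer": "..."}                         free-form text
#   detect  -> {"objects": [{"x_min": ..., "y_min": ..., ...}]}  bounding boxes
#   caption -> {"caption": "..."}                        descriptive text

req = build_request("detect", b"fake-image-bytes", prompt="green shirt")
print(json.dumps({k: req[k] for k in ("capability", "prompt")}))
```

The point is that a capability pins down the answer's shape up front: your app can parse bounding boxes from "detect" or plain text from "caption" without guessing how a free-form prompt will be answered.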
Try it out and let us know what you think in our Discord channel. We hope the playground helps you on your journey to build amazing new things.