Minimal output
The lightest localization primitive
Traditional models output coordinates as verbose text—hundreds of tokens to describe a single location. Need batch processing or real-time inference? You're paying for every character.
Moondream uses dedicated grounding tokens. Two tokens per point. No overhead, no parsing, just normalized (x, y) coordinates ready for robots, UI automation, or counting pipelines.
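Because the output is just a list of normalized points, a counting pipeline reduces to taking the length of that list. A minimal sketch, using a hard-coded `result` dict that mirrors the documented response shape (the values here are illustrative, not a real API response):

```python
# Sketch: counting objects from Point output.
# `result` mimics the shape returned by model.point(); values are illustrative.
result = {
    "points": [
        {"x": 0.250, "y": 0.789},
        {"x": 0.373, "y": 0.548},
        {"x": 0.674, "y": 0.360},
    ]
}

# Each entry is one matched instance, so the count is just the list length.
count = len(result["points"])
print(f"Detected {count} matches")  # → Detected 3 matches
```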
Real-world demos
Precision ready for production
Same API, endless applications. See Point in action across different domains.
Use cases: UI automation · Robotics · Industrial · Damage detection
How it works
How Point works
Describe what you're looking for in natural language and get precise (x, y) coordinates instantly.
"damaged cookies"
Output
738ms • 744 tokens • $0.000239
Try it{
"points": [
{ "x": 0.250, "y": 0.789 },
{ "x": 0.373, "y": 0.548 },
{ "x": 0.674, "y": 0.360 }
]
}Code
import moondream as md
from PIL import Image
# Initialize with API key
model = md.vl(api_key="your-api-key")
# Load your image
image = Image.open("production_line.jpg")
# Point to defects using natural language
result = model.point(image, "damaged cookies")
print(result["points"])FAQ
FAQ
Common questions about Point, pricing, and integration.
How is Point different from Object Detection and Segment?
Point returns a single (x, y) coordinate for the location you describe, Object Detection returns bounding boxes, and Segment returns pixel-accurate polygons. Point is ideal when you need the exact center or a specific feature of an object, without the overhead of boundaries or masks.
Can I point to a specific part or feature of an object?
Yes. Point accepts natural language descriptions like "center of the plate," "tip of the pencil," or "animal's nose." You can describe any feature, position, or spatial relationship.
Can Point find multiple objects at once?
Yes. Point can return coordinates for multiple instances matching your description. For example, asking for "centers of all the apples" will return coordinates for each apple in the image.
What format are the coordinates returned in?
Point returns normalized (x, y) coordinates between 0 and 1, making it easy to scale to any image resolution or integrate with downstream systems.
What are the best use cases for Point?
Point excels at robotic grasping (finding grip points), UI automation (clicking specific elements), spatial analysis (measuring distances), and any application that needs exact position data without the overhead of full object boundaries.
How is Point priced?
Point uses the same per-token pricing as all other Moondream skills. Every Moondream Cloud account includes $5 in free monthly credits to experiment and build.
Can I run Point locally?
Yes. Point is available in both Moondream Cloud and the downloadable model, giving you flexibility to run it wherever you need.
How accurate is Point?
Point leverages the same grounding capabilities as Moondream's Object Detection, achieving state-of-the-art accuracy on standard benchmarks like RefCOCO, RefCOCOg, and RefCOCO+. Coordinates are precise to the pixel level.
Running into problems or need help? Reach out to us on Discord.
Join Discord