The New Standard in Open‑World Segmentation

Moondream Segmentation turns text prompts, points, or boxes into pixel-perfect SVG masks. State-of-the-art segmentation, available today in Moondream Cloud.


Open-vocabulary

Beyond fixed classes

Traditional segmentation only recognizes what it was trained on. Need “hairline cracks” or “overripe fruit”? If those classes aren't in the training set, you're stuck curating data, labeling examples, and retraining models.

Moondream understands language, not just labels. Describe any object, attribute, or spatial relationship and get pixel-accurate boundaries without retraining. Your vocabulary is the only limit.

Real-world demos

Pixel-Perfect Prompting

Same API, endless applications. See Segmentation across different domains.

Use cases

Defect Tracking
Agriculture
Robotics
Media

Benchmarks

Top performance, single model

Moondream pairs top-tier grounding with precise segmentation in a single model, removing the need for any multi-model setup.

Benchmark	Moondream	SAM 3	Gemini 2.5 Flash	SAM 3 + Gemini 2.5 Pro
RefCOCO-M	86.9%	42.3%	73.9%	86.3%
RefCOCO	81.8%	39.8%	66.6%	74.9%
RefCOCO+	74.7%	27.0%	60.9%	66.9%
RefCOCOg	76.4%	34.8%	68.1%	73.3%
LVIS	62.6%	62.6%	--	--
Avg time	5.3s	0.4s	2.6s	10.5s
Cost / 1K images	$0.40	$2.20	$9.00	$40.00
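To make the cost and latency rows easier to compare, the sketch below turns them into ratios and a rough budget estimate. The figures are copied from the table; the helper names are hypothetical. Because Segmentation is billed per token, the budget estimate is only an average-based approximation, not a quote.

```python
# Cost per 1,000 images in US cents and average time in seconds, copied from
# the benchmark table. Integer cents keep the budget arithmetic exact.
COST_CENTS_PER_1K = {
    "Moondream": 40,
    "SAM 3": 220,
    "Gemini 2.5 Flash": 900,
    "SAM 3 + Gemini 2.5 Pro": 4000,
}
AVG_TIME_S = {
    "Moondream": 5.3,
    "SAM 3": 0.4,
    "Gemini 2.5 Flash": 2.6,
    "SAM 3 + Gemini 2.5 Pro": 10.5,
}


def cost_ratio(model: str, baseline: str = "Moondream") -> float:
    """How many times more `model` costs per image than `baseline`."""
    return COST_CENTS_PER_1K[model] / COST_CENTS_PER_1K[baseline]


def latency_ratio(model: str, baseline: str = "Moondream") -> float:
    """Average time of `model` divided by that of `baseline` (<1 means faster)."""
    return AVG_TIME_S[model] / AVG_TIME_S[baseline]


def images_for_budget(budget_cents: int, model: str = "Moondream") -> int:
    """Hypothetical helper: rough image count a budget covers at the table's rate.

    Real billing is per token, so actual counts will vary.
    """
    return budget_cents * 1000 // COST_CENTS_PER_1K[model]


if __name__ == "__main__":
    for name in COST_CENTS_PER_1K:
        print(f"{name}: {cost_ratio(name):.1f}x cost, {latency_ratio(name):.2f}x time")
    # The $5 monthly credit expressed in cents.
    print("Images covered by $5:", images_for_budget(500))
```

At the table's rates, the SAM 3 + Gemini 2.5 Pro pipeline costs 100 times more per image than Moondream, and $5 of credit covers roughly 12,500 images.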

Other Moondream Skills

Query

Answers questions about the image.

Object Detect

Returns bounding rectangles.

Caption

Describes the image.

Point

Returns 2D (x, y) coordinates.

How it works

How Segmentation works

Text prompts, points, and boxes can be used alone or together to produce pixel-accurate SVG polygons.

Inputs

Tell it what to find

Describe it: “the person in blue”, “cracked tiles”, or “ripe fruit”, in plain English with no class labels
Point to it: Click inside one or more objects to guide the model
Box it in: Draw a rectangle to constrain the search area
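As an illustration only, the three input types can be modeled as one container that checks at least one of them is present and that coordinates are in range. The class name, field names, and the normalized [0, 1] coordinate convention are assumptions for this sketch, not the Moondream Cloud request format.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative only: these names and the normalized [0, 1] coordinate
# convention are assumptions, not the Moondream Cloud request schema.
Point = tuple[float, float]               # (x, y)
Box = tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def _in_unit_range(*values: float) -> bool:
    return all(0.0 <= v <= 1.0 for v in values)


@dataclass
class SegmentationInputs:
    prompt: Optional[str] = None                       # e.g. "cracked tiles"
    points: list[Point] = field(default_factory=list)  # clicks inside objects
    box: Optional[Box] = None                          # search-area constraint

    def validate(self) -> None:
        """Raise ValueError unless at least one usable input is present."""
        has_prompt = bool(self.prompt and self.prompt.strip())
        if not (has_prompt or self.points or self.box):
            raise ValueError("provide a prompt, at least one point, or a box")
        for x, y in self.points:
            if not _in_unit_range(x, y):
                raise ValueError(f"point ({x}, {y}) lies outside the image")
        if self.box is not None:
            x0, y0, x1, y1 = self.box
            if not (_in_unit_range(x0, y0, x1, y1) and x0 < x1 and y0 < y1):
                raise ValueError(f"box {self.box} is empty or outside the image")
```

For example, `SegmentationInputs(prompt="ripe fruit", box=(0.0, 0.5, 1.0, 1.0))` would pair a description with a box over the bottom half of the image, assuming y grows downward.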

Outputs

Get pixel-perfect vectors

SVG polygons that trace exact object boundaries
Render in-browser, export to design tools, or feed downstream pipelines
Compact, editable, resolution-independent
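Because the output is vector data, downstream code can work with it directly. The sketch below assumes a mask arrives as SVG path data using only absolute `M`, `L`, and `Z` commands in pixel coordinates; real output may use other path commands or a different wrapper. It parses the path into polygons, computes area with the shoelace formula, derives a Detect-style bounding box, and rasterizes the polygons into a binary mask by sampling pixel centers with the even-odd rule. SVG's default fill rule is nonzero, so the even-odd choice is also an assumption; the two can differ for nested or self-intersecting rings.

```python
import re

# Numbers (optional sign, decimals, exponent) or single command letters.
_TOKEN = re.compile(r"-?\d*\.?\d+(?:[eE][-+]?\d+)?|[A-Za-z]")

Polygon = list[tuple[float, float]]


def parse_path(d: str) -> list[Polygon]:
    """Split path data like "M0 0 L4 0 L4 4 Z" into polygons (absolute M/L/Z only)."""
    polygons: list[Polygon] = []
    current: Polygon = []
    tokens = _TOKEN.findall(d)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.isalpha():
            if tok not in ("M", "L", "Z", "z"):
                raise ValueError(f"unsupported path command: {tok}")
            # M starts a new ring and Z closes the current one.
            if tok in ("M", "Z", "z") and current:
                polygons.append(current)
                current = []
            i += 1
        else:
            # Coordinate pair after M or L (implicit L after M is treated the same).
            current.append((float(tok), float(tokens[i + 1])))
            i += 2
    if current:
        polygons.append(current)
    return polygons


def polygon_area(poly: Polygon) -> float:
    """Absolute area of one ring via the shoelace formula."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def bounding_box(polygons: list[Polygon]) -> tuple[float, float, float, float]:
    """(x_min, y_min, x_max, y_max) enclosing every ring."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def rasterize(polygons: list[Polygon], width: int, height: int) -> list[list[int]]:
    """Binary mask (height rows of width ints), even-odd rule at pixel centers."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        cy = row + 0.5
        for col in range(width):
            cx = col + 0.5
            inside = False
            for poly in polygons:
                n = len(poly)
                for k in range(n):
                    (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
                    # Count edges crossed by a ray cast to the right of (cx, cy).
                    if (y0 > cy) != (y1 > cy):
                        x_cross = x0 + (cy - y0) * (x1 - x0) / (y1 - y0)
                        if cx < x_cross:
                            inside = not inside
            mask[row][col] = int(inside)
    return mask
```

This pure-Python loop checks every edge for every pixel, so it is only suitable for small images; a dedicated rasterizer would be far faster in production.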

FAQ

Common questions about Segmentation, pricing, and integration.

How is Segmentation different from Object Detect and Point?

Object Detect returns bounding rectangles and Point returns specific (x, y) coordinates. Segmentation returns full pixel-accurate polygons that follow object boundaries, giving you precise shapes rather than coarse regions or single points.

Does Segmentation support open-world text prompts?

Yes. Segmentation accepts referring expressions such as "the person in the blue jacket" or "the third car from the left." You can use text prompts alone or combine them with points and boxes.

What format does Segmentation use for its outputs?

Segmentation outputs native SVG polygons. These can be rendered directly in the browser, edited in design tools, or stored as lightweight vector masks.

How does pricing work?

Segmentation uses the same per-token pricing as all other Moondream skills. Every Moondream Cloud account includes $5 in free monthly credits to experiment and build.

Is Segmentation available in the downloadable Moondream model?

Not yet. Segmentation is launching as a cloud-only preview. We plan to add Segmentation to the downloadable model over time.

How does Moondream compare to Gemini 2.5 Flash and SAM 3?

Across the referring-expression benchmarks we track (RefCOCO, RefCOCO+, RefCOCOg, and RefCOCO-M), Moondream outperforms Gemini 2.5 Flash and SAM 3, including SAM 3 when paired with Gemini 2.5 Pro for grounding. Moondream also performs grounding and segmentation within a single model, which simplifies integration and reduces latency compared with the two-model SAM 3 + Gemini 2.5 Pro pipeline (5.3s vs. 10.5s average in the benchmark table).

Can I use Segmentation for video?

Yes. Segmentation operates on still images and can be applied frame by frame to video. Use the same prompt, points, or boxes on each frame to build a video segmentation workflow.

Running into problems or need help? Reach us on Discord.
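The frame-by-frame video workflow described in the FAQ can be sketched as a simple loop. Here `segment` is a placeholder for whatever single-image Segmentation call you use (hypothetical, not an SDK function), and `every_n` is an illustrative option, not a product feature: it segments only every Nth frame and reuses the previous mask in between, trading accuracy for cost and time.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def segment_video(
    frames: Iterable[Frame],
    segment: Callable[[Frame, str], Mask],  # hypothetical single-image call
    prompt: str,
    every_n: int = 1,
) -> Iterator[tuple[int, Mask]]:
    """Yield (frame_index, mask) for each frame, reusing the same prompt.

    With every_n > 1, only every Nth frame is segmented and the most recent
    mask is repeated for the frames in between. As a generator, invalid
    arguments are reported when iteration starts.
    """
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    last: Optional[Mask] = None
    for index, frame in enumerate(frames):
        if index % every_n == 0 or last is None:
            last = segment(frame, prompt)
        yield index, last
```

Frames can come from any video decoder; the loop itself does not depend on one.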