The New Standard in Open‑World Segmentation

Moondream Segmentation turns text prompts, points, or boxes into pixel-accurate SVG. State-of-the-art segmentation, available today in Moondream Cloud.

men crossing road
football players
Kimera sports car
beautiful bird
train engine
Open-vocabulary

Beyond fixed classes

Traditional segmentation only recognizes the classes it was trained on. Need “hairline cracks” or “overripe fruit”? If the class is not in the training set, you're stuck curating data, labeling examples, and retraining models.

Moondream understands language, not just labels. Describe any object, any attribute, any spatial relationship. Get pixel-accurate boundaries instantly. Your vocabulary is the only limit.

Real-world demos

Pixel Perfect Prompting

Same API, endless applications. See Segmentation across different domains.

Use cases:

  • Defect Tracking
  • Agriculture
  • Robotics
  • Media

Benchmarks

Top performance, single model

Moondream pairs top-tier grounding with precise segmentation in a single model, removing the need for any multi-model setup.

                   Moondream   SAM3    Gemini Flash   SAM3 + Gemini 2.5 Pro
RefCOCO-M          86.9%       42.3%   73.9%          86.3%
RefCOCO            81.8%       39.8%   66.6%          74.9%
RefCOCO+           74.7%       27.0%   60.9%          66.9%
RefCOCOg           76.4%       34.8%   68.1%          73.3%
LVIS               62.6%       62.6%   --             --
Avg Time           5.3s        0.4s    2.6s           10.5s
Cost / 1K images   $0.40       $2.20   $9.00          $40.00
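
To put the cost row in per-image terms, the arithmetic can be sketched as below. The prices are copied from the benchmark figures above; the dictionary and function names are labels chosen for this example only.

```python
# Cost per 1,000 images in USD, copied from the benchmark figures above.
COST_PER_1K = {
    "Moondream": 0.40,
    "SAM3": 2.20,
    "Gemini Flash": 9.00,
    "SAM3 + Gemini 2.5 Pro": 40.00,
}


def cost_per_image(cost_per_1k: float) -> float:
    """Convert a price per 1,000 images into a price per single image."""
    return cost_per_1k / 1000


def relative_cost(name: str, baseline: str = "Moondream") -> float:
    """How many times more expensive `name` is than `baseline`."""
    return COST_PER_1K[name] / COST_PER_1K[baseline]


for name, cost in COST_PER_1K.items():
    print(f"{name}: ${cost_per_image(cost):.4f} per image, "
          f"{relative_cost(name):.1f}x the Moondream price")
```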

How it works

How Segmentation works

Text prompts, points, and boxes can be used alone or together to guide the model, which returns pixel-accurate SVG polygons.

Inputs

Tell it what to find

  • Describe it: “the person in blue”, “cracked tiles”, “ripe fruit” — plain English, no class labels
  • Point to it: Click inside one or more objects to guide the model
  • Box it in: Draw a rectangle to constrain the search area

Outputs

Get pixel-perfect vectors

  • SVG polygons that trace exact object boundaries
  • Render in-browser, export to design tools, or feed downstream pipelines
  • Compact, editable, resolution-independent
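
As a rough illustration of feeding the output to a downstream pipeline, the sketch below reads polygon vertices out of an SVG string and rescales them to another canvas size. It assumes the result arrives as `<polygon points="...">` elements, which this page does not confirm, and the sample SVG is made up for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Made-up sample; the real response format is an assumption here.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<polygon points="10,10 60,10 60,40 10,40"/>'
    "</svg>"
)


def polygons_from_svg(svg_text):
    """Return every <polygon> in the SVG as a list of (x, y) tuples."""
    root = ET.fromstring(svg_text)
    polygons = []
    for element in root.iter():
        # Match <polygon> with or without the SVG namespace prefix.
        if element.tag in ("polygon", SVG_NS + "polygon"):
            raw = element.get("points", "").replace(",", " ").split()
            numbers = [float(value) for value in raw]
            polygons.append(list(zip(numbers[0::2], numbers[1::2])))
    return polygons


def scale_polygon(points, from_size, to_size):
    """Rescale vertices from one (width, height) canvas to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return [(x * sx, y * sy) for x, y in points]


polygons = polygons_from_svg(SAMPLE_SVG)
scaled = scale_polygon(polygons[0], (100, 100), (200, 50))
print(scaled)
```

Because the shape is stored as vertices rather than pixels, rescaling is a multiplication per point and loses no detail.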

FAQ

Common questions about Segmentation, pricing, and integration.

How is Segmentation different from Object Detect and Point?

Object Detect returns bounding rectangles and Point returns specific (x, y) coordinates. Segmentation returns full pixel-accurate polygons that follow object boundaries, giving you precise shapes rather than coarse regions or points.
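
One way to see the difference: a bounding rectangle can always be derived from a polygon, but it can cover much more than the object itself. The sketch below uses a made-up L-shaped polygon; the function names are illustrative and not part of any Moondream API.

```python
def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def box_fill_ratio(points):
    """Fraction of the bounding box that the polygon actually covers."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    return polygon_area(points) / ((x_max - x_min) * (y_max - y_min))


# Made-up L-shaped object: its box is 4 x 4, but the shape fills under half.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(bounding_box(l_shape))    # (0, 0, 4, 4)
print(polygon_area(l_shape))    # 7.0
print(box_fill_ratio(l_shape))  # 0.4375
```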

Can I describe the target in natural language?

Yes. Segmentation accepts referring expressions such as "the person in the blue jacket" or "the third car from the left." You can use text prompts alone or combine them with points and boxes.

What format does Segmentation output?

Segmentation outputs native SVG polygons. These can be rendered directly in the browser, edited in design tools, or stored as lightweight vector masks.
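
As a small sketch of the "render directly in the browser" path, the snippet below wraps a list of vertices in a standalone SVG overlay. The vertices, canvas size, and styling are placeholder values, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def polygon_to_svg(points, width, height, fill="red", opacity=0.4):
    """Build a minimal SVG document containing one semi-transparent polygon."""
    point_text = " ".join(f"{x:g},{y:g}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
        f'<polygon points="{point_text}" fill="{fill}" fill-opacity="{opacity}"/>'
        "</svg>"
    )


overlay = polygon_to_svg([(10, 10), (60, 10), (60, 40), (10, 40)], 100, 100)
ET.fromstring(overlay)  # raises ParseError if the markup is not well-formed
print(overlay)
```

The resulting string can be saved as an `.svg` file or placed over the source image in a web page.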

How is Segmentation priced?

Segmentation uses the same per-token pricing as all other Moondream skills. Every Moondream Cloud account includes $5 in free monthly credits to experiment and build.

Can I run Segmentation locally?

Not yet. Segmentation is launching as a cloud-only preview. We plan to add Segmentation to the downloadable model over time.

How does Moondream compare with other segmentation models?

Across the benchmarks we track (RefCOCO, RefCOCO+, RefCOCOg, and RefCOCO-M), Moondream outperforms Gemini 2.5 Flash and SAM 3, including SAM 3 paired with Gemini 2.5 Pro for grounding. Moondream also performs grounding and segmentation within a single model, which simplifies integration and reduces latency compared with a two-model pipeline.

Does Segmentation work on video?

Yes. Segmentation operates on still images and can be applied frame-by-frame to video. You can use the same prompt, points, or boxes on each frame to build a video segmentation workflow.
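
The frame-by-frame approach can be sketched as a simple loop. Here `segment_frame` is a hypothetical placeholder for whatever call you use to segment one image; it is not a Moondream function, and the stand-in below only returns dummy polygons so the example runs on its own.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Polygon = List[Tuple[float, float]]


def segment_video(
    frames: Iterable[bytes],
    segment_frame: Callable[[bytes, str], List[Polygon]],
    prompt: str,
) -> Iterator[Tuple[int, List[Polygon]]]:
    """Apply the same prompt to every frame and yield (frame_index, polygons)."""
    for index, frame in enumerate(frames):
        yield index, segment_frame(frame, prompt)


def fake_segment_frame(frame: bytes, prompt: str) -> List[Polygon]:
    """Stand-in for a real segmentation call: one triangle per non-empty frame."""
    return [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]] if frame else []


frames = [b"frame-0", b"", b"frame-2"]
results = list(segment_video(frames, fake_segment_frame, "the person in blue"))
for index, polygons in results:
    print(index, len(polygons))
```

Each frame is segmented independently in this sketch, so matching the same object across frames is left to the caller.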
Running into problems or need help? Reach us on Discord.