Simple pricing for Moondream.

Models and Photon are free. Cloud and Lens are pay-per-token. Paid plans add collaboration and support.

Start free · Start Team

Pick what you need.

Photon

Free

The world's fastest VLM inference engine — runs everywhere, from edge to server-class hardware.

$ npm install moondream

Learn more about Photon

Models

Free

Open and free to use — hosting them as a cloud service requires our license.

Or, if you really want to download them: Moondream 2 · Moondream 3 (preview)

Cloud

Pay per token

Hosted Moondream inference — lightning fast, always on, with HIPAA and SOC 2 available.

Moondream 3.1 9B A2B — Real-time

$0.30 / 1M input tokens

$1.00 / 1M output tokens

Moondream 3.1 9B A2B — Batch

$0.15 / 1M input tokens

$0.50 / 1M output tokens

Moondream 3 Preview — Real-time

$0.30 / 1M input tokens

$2.50 / 1M output tokens

Moondream 3 Preview — Batch

$0.15 / 1M input tokens

$1.25 / 1M output tokens

Start building
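To make the per-token rates concrete, here is a minimal sketch of how a Cloud bill could be estimated from the price list above. The model keys and the `cloud_cost` helper are hypothetical labels for illustration, not Moondream API identifiers, and the sketch ignores usage credits.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the Cloud price list.
# The dictionary keys are illustrative labels, not official model IDs.
CLOUD_PRICES = {
    ("moondream-3.1-9b-a2b", "realtime"): (Decimal("0.30"), Decimal("1.00")),
    ("moondream-3.1-9b-a2b", "batch"): (Decimal("0.15"), Decimal("0.50")),
    ("moondream-3-preview", "realtime"): (Decimal("0.30"), Decimal("2.50")),
    ("moondream-3-preview", "batch"): (Decimal("0.15"), Decimal("1.25")),
}

MILLION = Decimal(1_000_000)


def cloud_cost(model: str, mode: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for one workload, before any credits."""
    in_price, out_price = CLOUD_PRICES[(model, mode)]
    return (in_price * input_tokens + out_price * output_tokens) / MILLION


# 40M input tokens and 5M output tokens on Moondream 3.1 9B A2B.
realtime = cloud_cost("moondream-3.1-9b-a2b", "realtime", 40_000_000, 5_000_000)
batch = cloud_cost("moondream-3.1-9b-a2b", "batch", 40_000_000, 5_000_000)
print(f"real-time: ${realtime:.2f}, batch: ${batch:.2f}")
```

For this workload, batch costs half as much as real-time, matching the listed rates.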

Lens

Pay per token

Fine-tune Moondream for your task.

Rollouts

Same as Cloud inference

Training

$0.60 / 1M training tokens

Fine-tune a model
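As a rough illustration of how the two Lens line items combine, the sketch below adds training tokens at $0.60 per 1M to rollout tokens billed at Cloud inference rates. The `lens_cost` helper and its parameter names are hypothetical, and the caller supplies the rollout rates because the page does not say which Cloud mode a rollout uses.

```python
from decimal import Decimal

# USD per 1M training tokens, from the Lens price list.
TRAINING_PRICE = Decimal("0.60")
MILLION = Decimal(1_000_000)


def lens_cost(
    training_tokens: int,
    rollout_input_tokens: int,
    rollout_output_tokens: int,
    rollout_in_price: Decimal,
    rollout_out_price: Decimal,
) -> Decimal:
    """Estimate the USD cost of a fine-tuning run: training plus rollouts.

    Rollout prices are per 1M tokens and are taken from the Cloud price
    list for whichever model and mode the run uses (an assumption the
    caller has to make).
    """
    training = TRAINING_PRICE * training_tokens
    rollouts = (
        rollout_in_price * rollout_input_tokens
        + rollout_out_price * rollout_output_tokens
    )
    return (training + rollouts) / MILLION


# 10M training tokens, plus rollouts of 20M input and 2M output tokens,
# assuming Moondream 3.1 9B A2B real-time rates ($0.30 in, $1.00 out).
total = lens_cost(10_000_000, 20_000_000, 2_000_000, Decimal("0.30"), Decimal("1.00"))
print(f"estimated run cost: ${total:.2f}")
```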

Need collaboration or support?

Paid plans add seats, shared resources, billing, and direct support. They do not change token pricing.

Free

$0 / month

Best for individual developers.

Start free

1 seat
Personal workspace
Discord support
$5/month usage credits

Recommended

Team

$350 / month

or $300 / month billed annually

Best for teams building together.

Start Team

10 seats
Shared fine-tunes
Shared projects
Shared API keys
Shared billing
Dedicated Slack
1 hour/month of consulting
$5/month usage credits

Additional seats available. Contact us.

Enterprise

Custom

Best for larger, regulated, or custom deployments.

Custom seats
HIPAA / BAA available
Security review
Priority support
Custom deployment support
Custom billing

Paid plans do not change Cloud or Lens token pricing.

Good to know

Every workspace includes $5/month usage credits.
Batch and fine-tune (LoRA) multipliers are the same for every model.
Fine-tuned models use the same inference pricing as base models.
Fine-tunes are unlimited. Usage and rate limits still apply.
Paid plans are for collaboration and support, not cheaper tokens.
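Putting the notes above together, a monthly total can be sketched as the plan fee plus any token usage beyond the $5 credit. This is a sketch under an assumption: the page does not describe how credits are applied, so it treats them as a simple offset against that month's Cloud and Lens usage, with nothing carried over. The plan keys and the `monthly_total` helper are hypothetical.

```python
from decimal import Decimal

# Plan fees in USD per month, from the plan cards above.
# "team_annual" is the per-month rate when billed annually.
PLAN_FEES = {
    "free": Decimal("0"),
    "team_monthly": Decimal("350"),
    "team_annual": Decimal("300"),
}

# Every workspace includes $5/month usage credits.
MONTHLY_CREDITS = Decimal("5")


def monthly_total(plan: str, token_usage_usd: Decimal) -> Decimal:
    """Plan fee plus token usage left after credits (assumed simple offset)."""
    billable_usage = max(Decimal("0"), token_usage_usd - MONTHLY_CREDITS)
    return PLAN_FEES[plan] + billable_usage


# The same $17.00 of token usage on two plans: the token price does not
# change with the plan, only the fixed fee does.
print(monthly_total("free", Decimal("17.00")))
print(monthly_total("team_monthly", Decimal("17.00")))

# Difference over 12 months between monthly and annual Team billing.
annual_saving = (PLAN_FEES["team_monthly"] - PLAN_FEES["team_annual"]) * 12
print(annual_saving)
```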