
Pricing

Moondream Cloud Pricing

Moondream runs anywhere. Moondream Cloud is the easiest, fastest, and cheapest way to use it, with no setup required.

Standard Plan

Pay-as-you-go for builders and teams.

Usage-based

Billed by tokens used (input and output counted separately)

Price

  • $0.30 per 1,000,000 input tokens
  • $2.50 per 1,000,000 output tokens
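Input and output tokens are priced separately, so the cost of a request is each token count multiplied by its per-million rate, with the two parts summed. Below is a minimal Python sketch. It assumes exact per-token pricing with no rounding or minimum charge, and it ignores the monthly free credits. `request_cost` and the example token counts are hypothetical.

```python
from decimal import Decimal

# Standard Plan rates in USD per 1,000,000 tokens, taken from the price list above.
INPUT_PRICE_PER_M = Decimal("0.30")
OUTPUT_PRICE_PER_M = Decimal("2.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: USD cost of one request, pricing input and output separately.

    Assumes billing is exact per token with no rounding or minimum charge.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_MILLION
    return input_cost + output_cost


# Illustrative workload (hypothetical numbers): 10,000 requests,
# each with 800 input tokens and 50 output tokens.
per_request = request_cost(800, 50)
print(f"per request: ${per_request}")
print(f"10,000 requests: ${per_request * 10_000}")
```

For this hypothetical mix, each request costs $0.000365 and 10,000 requests come to $3.65. `Decimal` is used instead of `float` so that the per-token amounts add up without binary rounding error.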

What you get

  • Usage-based billing. Input and output are counted separately and billed per token.
  • $5 in free credits every month to jumpstart experiments.
  • Dense grounding tokens. Do more with fewer tokens for grounding-heavy workloads.
  • Start instantly. No sales call required.
  • Privacy first. We never train on your data.

Validity & refunds

  • Prepaid tokens are valid for 12 months.
  • Unused tokens expire after one year and are non-refundable (except where required by law).

Enterprise Plan

Tailored for regulated and high-scale workloads.

Custom Pricing

What you get

  • Custom pricing & terms. Volume, commit, or bespoke commercial agreements.
  • Compliance options. HIPAA support for covered workloads.
  • Deployment flexibility. Run in your own environment (on-prem or VPC) for full data control.
  • Performance at scale. Higher/dedicated throughput and capacity, aligned to your SLA needs.
  • Dedicated support. Priority channel with 24/7 incident response.
  • Data controls. No training on your data; enterprise retention and governance options.

FAQ

Answers to the most common questions

How does billing work?

You pre-purchase tokens (credits). As you make requests, we deduct the exact number of input and output tokens used. When your balance runs low, you can top up instantly. You can also enable auto-top-up, which automatically tops up your account whenever the balance falls below an amount you specify.
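The prepaid flow described here can be modeled as a balance that each request draws down by its exact token cost, plus an optional threshold that triggers a refill. Below is a minimal sketch under stated assumptions. `PrepaidAccount`, its fields, and the rule that a request is refused when the balance cannot cover it are hypothetical illustrations, not Moondream's billing implementation. The threshold and top-up amount stand in for whatever values you configure.

```python
from dataclasses import dataclass
from decimal import Decimal

# Standard Plan rates in USD per 1,000,000 tokens.
INPUT_PRICE_PER_M = Decimal("0.30")
OUTPUT_PRICE_PER_M = Decimal("2.50")


@dataclass
class PrepaidAccount:
    """Hypothetical prepaid credit balance with optional auto-top-up."""

    balance: Decimal                       # remaining credit in USD
    auto_top_up: bool = False
    threshold: Decimal = Decimal("0")      # refill when the balance drops below this
    top_up_amount: Decimal = Decimal("0")  # credit added per automatic refill

    def charge(self, input_tokens: int, output_tokens: int) -> Decimal:
        """Deduct the exact token cost of one request, then refill if needed."""
        cost = (Decimal(input_tokens) * INPUT_PRICE_PER_M
                + Decimal(output_tokens) * OUTPUT_PRICE_PER_M) / Decimal(1_000_000)
        # Assumption: a request the balance cannot cover is rejected.
        if cost > self.balance:
            raise RuntimeError("insufficient balance; top up to continue")
        self.balance -= cost
        if self.auto_top_up and self.balance < self.threshold:
            self.balance += self.top_up_amount
        return cost


# Example: start with $1.00 and refill $10.00 whenever the balance dips below $0.50.
account = PrepaidAccount(balance=Decimal("1.00"), auto_top_up=True,
                         threshold=Decimal("0.50"), top_up_amount=Decimal("10.00"))
for _ in range(3):
    account.charge(input_tokens=200_000, output_tokens=50_000)  # $0.185 each
print(account.balance)
```

Each call in the example costs $0.185. After the third call the balance falls to $0.445, below the $0.50 threshold, so it is refilled to $10.445.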

Do prepaid tokens expire?

Yes. Prepaid tokens are valid for 12 months from the purchase date. Unused tokens expire at the end of that period and are non-refundable (except where required by law).

Do you train on my data?

No. Your request data is never used to train our models.

What data do you retain?

Operational logs are retained only as needed to run and secure the service. Enterprise customers can request stricter retention or zero-retention options, and on-prem/VPC deployments keep data within your environment.

Why does Moondream use fewer tokens?

Moondream uses a denser tokenization scheme (SuperBPE) plus dedicated grounding tokens that represent x,y coordinates and bounding boxes. As a result, it generates answers with fewer tokens, which makes it both faster and cheaper to run: you get quicker responses and lower costs per task than with standard vision models.

Can I run Moondream locally?

Yes! You can download and run Moondream locally (see our running locally guide). For even better performance, Moondream Cloud uses a custom inference engine optimized specifically for Moondream, delivering 4-5x faster inference. If you want to deploy this optimized engine on-prem or in your own VPC, contact us about an Enterprise Plan.

What if I hit rate limits?

You may see rate-limit errors on the Standard Plan. Reach out and we'll right-size your limits; the Enterprise Plan provides higher or dedicated capacity aligned to your needs.
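On the client side, a common way to absorb occasional rate-limit errors is to retry with exponential backoff and jitter. Below is a minimal sketch. It assumes your client raises a distinguishable error on a rate limit (commonly an HTTP 429 response). `RateLimitError` and `with_backoff` are hypothetical placeholders, not part of any Moondream SDK, and the delay parameters are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical placeholder for the error your client raises on a rate limit."""


def with_backoff(call: Callable[[], T], max_retries: int = 5,
                 base_delay: float = 0.5, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimitError with capped exponential backoff and full jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # The upper bound doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # waiting a random time below it keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


# Demo: a fake request that is rate-limited twice before succeeding.
attempts = 0

def flaky_request() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky_request, sleep=lambda _: None)  # skip real waits in the demo
print(result, attempts)
```

The `sleep` parameter is injectable so the demo (and tests) can skip real waiting. Backoff only smooths over brief bursts. Sustained rate limiting is better solved by asking the Moondream team to raise your limits.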

Is Moondream cheaper than standard vision models?

Often, yes, for grounding-heavy workloads, because dense tokens typically mean fewer tokens per request. Your actual cost depends on your usage patterns and response sizes.

Can I start on Standard and move to Enterprise later?

Absolutely. Start on Standard and talk to Sales at any time to migrate to an Enterprise agreement.

Need a custom deployment or volume pricing?

Our team can help with private cloud, on-prem deployments, or bespoke SLAs tailored to your industry requirements.