Moondream Lens

The fastest path to production-ready vision AI.

Production-grade accuracy on your use case. No infrastructure. Just an API.

Start fine-tuning

10-20 images

RL can be enough to reach production-quality behavior.

Zero infrastructure

No GPUs, pipelines, or CUDA debugging required.

Same afternoon

Train with Lens and deploy on Photon without a handoff.

Training session

Production-ready in hours

RL + SFT

Loop preview

Images in, scores or labels back, deployment out.

10 steps typical
1. Create rollouts
2. Score or label outputs
3. Call train_step
4. Evaluate and deploy

Run profile

RL data

10-20 imgs

Rollout batch

32-64

Deploy target

Photon

Photon endpoint

https://api.moondream.ai/v1/your-model

Deploy with Photon

Why Lens

Production-ready faster than you'd expect

Getting a vision model to production usually means weeks of data collection, expensive GPU time, and a lot of trial and error. Lens compresses that into hours.

You need less data than you think.

RL fine-tuning works with as few as ten to twenty images. You don't need thousands of labeled examples to get production-quality results.

You need zero infrastructure.

Lens is a pay-as-you-go API. No GPUs to rent, no training pipelines to build, no CUDA errors to debug. You call train_step and the model gets better.

Training actually works.

Moondream converges reliably. You're not spending weeks tweaking hyperparameters and hoping for the best. Most teams see production-quality results in under a day.

How It Works

Two methods. Both are simple.

Whether you're scoring model outputs (RL) or providing labeled examples (SFT), the workflow fits in a few lines of code.

Reinforcement Learning

Let Moondream try, then teach it what good looks like.

Best when you can score outputs programmatically, or when you want to iterate fast with minimal data.

The loop:

  1. Create rollouts. Send your images. Moondream generates outputs for each one.
  2. Score them. Assign scores however you want, programmatically or by hand. You define what "good" means.
  3. Call train_step. Pass in your scored rollouts. The model updates.
  4. Repeat. Accumulate 32 to 64 scored rollouts, train, repeat about 10 times, then run your evals.
import moondream as md

lens = md.Lens()

# 1. Generate rollouts
rollouts = lens.create_rollouts(
    images=my_images,
    prompt="Describe the damage in this photo."
)

# 2. Score them (your logic)
for r in rollouts:
    r.score = my_scoring_function(r.output)

# 3. Train
lens.train_step(rollouts=rollouts)

# 4. Repeat ~10x, then evaluate

Ten images. Ten training steps. A few hours. That's often all it takes.
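The scoring logic in step 2 is entirely yours. As a minimal sketch (the rubric and keyword list here are hypothetical, for a vehicle-damage task), a scorer might reward outputs that mention the categories you care about:

```python
def my_scoring_function(output: str) -> float:
    """Score a rollout output between 0.0 and 1.0.

    Hypothetical rubric: reward outputs that name the damage
    types we care about. Swap in any logic you want, including
    a human review step that assigns scores by hand.
    """
    damage_terms = {"scratch", "dent", "hail", "crack"}
    text = output.lower()
    hits = sum(1 for term in damage_terms if term in text)
    return hits / len(damage_terms)
```

Because you define the score, the same loop works for classification, extraction, or open-ended description tasks.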

Supervised Fine-Tuning

Use labeled examples when you already have them.

You have labeled data and want a direct path. Call train_step with your image-output pairs. No rollouts, no scoring loop.

import moondream as md

lens = md.Lens()

# Just call train_step with your labeled image-output pairs
lens.train_step(
    training_data=[
        {"image": img, "prompt": "...", "output": output}
        for img, output in my_dataset
    ]
)

Which method should I use?

Not sure? Start with SFT if you have labeled data. Try RL if you want results with minimal data and fast iteration.

RL

Best for: you can score outputs programmatically, or you have very little data
Data needed: as few as 10-20 images
Strengths: extremely data-efficient; great for nuanced or open-ended tasks
Typical loop: rollouts, score, train_step, repeat ~10x

SFT

Best for: you have labeled image-output pairs ready to go
Data needed: 500+ labeled examples for best results
Strengths: fast, predictable, works with standard labeled datasets
Typical loop: train_step, repeat, evaluate

Use Cases

What teams are building with Lens

Vehicle inspection

Detect scratches, dents, hail damage, and missing parts from photos. Fine-tuned models consistently outperform the base model on domain-specific damage categories.

Document understanding

Extract fields from invoices, receipts, forms, and IDs. Train Moondream to understand your specific document layouts and return structured data.

Retail and e-commerce

Classify products, detect shelf placement, verify packaging, and moderate user-uploaded images. Works with catalogs of any size.

Medical imaging

Identify findings in X-rays, dermatology images, pathology slides, and other clinical media. Lens supports HIPAA-compliant training environments.

Manufacturing QA

Catch defects on production lines. Fine-tuned models adapt to your specific products, tolerances, and failure modes.

Your use case

If Moondream can see it, Lens can teach it to understand it. These are starting points, not limits.

Self-Serve Or White-Glove

Do it yourself or let us do it

Self-serve

Move fast with the API

You have data and want to move fast. Use the API to train and deploy on your own schedule. Full control over every parameter. Most teams go from data to production in under a day.

Best fit for teams that already know their eval loop, have clear ownership of the dataset, and want iteration speed more than services.

Start fine-tuning
White-glove

Hand the process to us

You want results without managing the process. Our team handles dataset preparation, scoring strategy, training, evaluation, and deployment. You tell us what the model should do. We make it happen.

This isn't a support tier. It's a service. We do the work.

Pricing

Pay for what you use

No subscriptions, no seat licenses, no minimums. You pay for the training steps you run and the inference you serve through Photon.

Self-serve

Training: pay-per-step based on model size
Deployment: Photon inference pricing
Support: docs + community
Pricing: starts at $X per training step

White-glove

Training: included in engagement
Deployment: Photon inference pricing
Support: dedicated team
Pricing: custom, starting at $X/month

You'll see the cost per step before you start. No surprises.

See full pricing

Deploy With Photon

Trained and deployed in the same afternoon

Every model you fine-tune with Lens is deployment-ready on Photon. No conversion steps, no export headaches, no separate serving infrastructure. One API call.

# Deploy your fine-tuned model
deployment = md.photon.deploy(
    model=lens.model_id,
    target="cloud",  # or "edge" or "on-device"
)

print(deployment.endpoint)  # https://api.moondream.ai/v1/your-model

Your fine-tuned model runs wherever Photon runs: cloud, on-premise, edge devices, or embedded hardware. Same model, same weights, same behavior everywhere.

Cloud · On-premise · Edge devices · Embedded hardware
Learn more about Photon

Results

Less data, less time, better results.

Base Moondream is general-purpose. Fine-tuned Moondream is built for your job.

Vehicle damage detection

Base model accuracy: 71%
Fine-tuned accuracy: 94%
Method: RL
Data used: 20 images

Invoice field extraction

Base model accuracy: 68%
Fine-tuned accuracy: 96%
Method: SFT
Data used: 1,800 examples

Product classification

Base model accuracy: 74%
Fine-tuned accuracy: 92%
Method: SFT
Data used: 4,500 examples

Defect detection (PCB)

Base model accuracy: 62%
Fine-tuned accuracy: 91%
Method: RL
Data used: 15 images

Results vary by use case and data quality. These are representative examples from real fine-tuning runs.

Technical Details

For the engineers who want specifics

Training methods

Both reinforcement learning and supervised fine-tuning are fully supported. Use them independently or combine them: SFT for a solid baseline, RL to refine.

RL details

Rollouts are generated server-side. You score them locally using any logic you want. Accumulate 32 to 64 scored rollouts per train_step call. A typical training loop runs about 10 steps before evaluation.
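The accumulate-then-train cadence is plain batching. A sketch of the bookkeeping, independent of the Lens API itself (the batch size and step count are the suggested defaults above):

```python
BATCH_SIZE = 32   # accumulate 32-64 scored rollouts per train_step
NUM_STEPS = 10    # a typical run is ~10 steps before evaluation

def batched(items, size):
    """Yield consecutive batches of scored rollouts."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 320 scored rollouts split into 10 train_step-sized batches
scored_rollouts = list(range(BATCH_SIZE * NUM_STEPS))
for batch in batched(scored_rollouts, BATCH_SIZE):
    pass  # each batch would feed one lens.train_step(rollouts=batch) call
```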

SFT details

Pass image-output pairs directly to train_step. Standard labeled data formats supported. No minimum batch size enforced, but larger batches produce more stable updates.

Training time

Each train_step call completes in seconds to minutes depending on batch size. A full RL loop (10 steps) typically finishes in under an hour. SFT depends on dataset size but most runs complete in one to two hours.

Evaluation

Bring your own eval set and scoring logic. We surface training metrics in the dashboard, but you define what "good" looks like for your use case.
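Since you define what "good" means, a simple eval might be exact-match accuracy over a held-out set. The names here are hypothetical; swap in whatever comparison logic fits your task:

```python
def accuracy(predictions, eval_set):
    """Fraction of predictions that match the expected output.

    `eval_set` is a list of {"image": ..., "expected": ...} dicts;
    `predictions` holds the model's output for each, in order.
    Comparison is case- and whitespace-insensitive exact match.
    """
    correct = sum(
        1 for pred, example in zip(predictions, eval_set)
        if pred.strip().lower() == example["expected"].strip().lower()
    )
    return correct / len(eval_set)
```

Run the same eval before and after fine-tuning to quantify the lift for your use case.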

Model versioning

Every training session produces a versioned model artifact. You can compare versions, roll back, and promote models to production through the API.

Export

Fine-tuned models can be exported in standard formats for self-hosting. No lock-in.

FAQ

Frequently asked questions

Concise answers for teams evaluating Lens for production fine-tuning.


Ready to take Moondream to production?

Start fine-tuning

Most teams go from data to production in under a day.


Talk to our team

Tell us what you're building. We'll get you to production-ready together.