Lens
The fastest path to production-ready vision AI.
Production-grade accuracy on your use case. No infrastructure. Just an API.
10-20 images
RL can be enough to reach production-quality behavior.
Zero infrastructure
No GPUs, pipelines, or CUDA debugging required.
Same afternoon
Train with Lens and deploy on Photon without a handoff.
Training session
Production-ready in hours
Loop preview
Images in, scores or labels back, deployment out.
Run profile
RL data
10-20 imgs
Rollout batch
32-64
Deploy target
Photon
Why Lens
Production-ready faster than you'd expect
Getting a vision model to production usually means weeks of data collection, expensive GPU time, and a lot of trial and error. Lens compresses that into hours.
You need less data than you think.
RL fine-tuning works with as few as ten to twenty images. You don't need thousands of labeled examples to get production-quality results.
You need zero infrastructure.
Lens is a pay-as-you-go API. No GPUs to rent, no training pipelines to build, no CUDA errors to debug. You call train_step and the model gets better.
Training actually works.
Moondream converges reliably. You're not spending weeks tweaking hyperparameters and hoping for the best. Most teams see production-quality results in under a day.
How It Works
Two methods. Both are simple.
Whether you're scoring model outputs (RL) or providing labeled examples (SFT), the workflow fits in a few lines of code.
Let Moondream try, then teach it what good looks like.
Best when you can score outputs programmatically, or when you want to iterate fast with minimal data.
The loop:
- Create rollouts. Send your images. Moondream generates outputs for each one.
- Score them. Assign scores however you want, programmatically or by hand. You define what "good" means.
- Call train_step. Pass in your scored rollouts. The model updates.
- Repeat. Accumulate 32 to 64 scored rollouts, train, repeat about 10 times, then run your evals.
import moondream as md
lens = md.Lens()
# 1. Generate rollouts
rollouts = lens.create_rollouts(
    images=my_images,
    prompt="Describe the damage in this photo."
)
# 2. Score them (your logic)
for r in rollouts:
    r.score = my_scoring_function(r.output)
# 3. Train
lens.train_step(rollouts=rollouts)
# 4. Repeat ~10x, then evaluate

Ten images. Ten training steps. A few hours. That's often all it takes.
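The `my_scoring_function` in the loop is yours to define. A toy sketch for the damage-description prompt above, assuming keyword matching is good enough for a first pass (the term list and thresholds are illustrative, not a recommended metric):

```python
def my_scoring_function(output: str) -> float:
    """Toy scorer: reward outputs that name a concrete damage type
    and stay concise. Swap in your own logic: a regex, an exact-match
    check against structured fields, or a human-in-the-loop review."""
    damage_terms = {"scratch", "dent", "hail", "crack", "missing"}
    text = output.lower()
    hits = sum(term in text for term in damage_terms)
    score = min(hits / 2, 1.0)       # two or more terms earns full credit
    if len(text.split()) > 60:       # penalize rambling descriptions
        score *= 0.5
    return score
```

Anything that maps an output string to a number works here; the loop only cares that higher means better.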
Use labeled examples when you already have them.
You have labeled data and want a direct path. Call train_step with your image-output pairs. No rollouts, no scoring loop.
import moondream as md
lens = md.Lens()
# Just call train_step with your labeled data
lens.train_step(
    training_data=[
        {"image": img, "prompt": "...", "output": output}
        for img, output in my_dataset
    ]
)

Which method should I use?
Not sure? Start with SFT if you have labeled data. Try RL if you want results with minimal data and fast iteration.
RL
Best for
You can score outputs programmatically, or you have very little data
Data needed
As few as 10-20 images
Strengths
Extremely data-efficient, great for nuanced or open-ended tasks
Typical loop
Rollouts, score, train_step, repeat ~10x
SFT
Best for
You have labeled image-output pairs ready to go
Data needed
500+ labeled examples for best results
Strengths
Fast, predictable, works with standard labeled datasets
Typical loop
train_step, repeat, evaluate
Use Cases
What teams are building with Lens
Vehicle inspection
Detect scratches, dents, hail damage, and missing parts from photos. Fine-tuned models consistently outperform the base model on domain-specific damage categories.
Document understanding
Extract fields from invoices, receipts, forms, and IDs. Train Moondream to understand your specific document layouts and return structured data.
Retail and e-commerce
Classify products, detect shelf placement, verify packaging, and moderate user-uploaded images. Works with catalogs of any size.
Medical imaging
Identify findings in X-rays, dermatology images, pathology slides, and other clinical media. Lens supports HIPAA-compliant training environments.
Manufacturing QA
Catch defects on production lines. Fine-tuned models adapt to your specific products, tolerances, and failure modes.
Your use case
If Moondream can see it, Lens can teach it to understand it. These are starting points, not limits.
Self-Serve Or White-Glove
Do it yourself or let us do it
Move fast with the API
You have data and want to move fast. Use the API to train and deploy on your own schedule. Full control over every parameter. Most teams go from data to production in under a day.
Best fit for teams that already know their eval loop, have clear ownership of the dataset, and want iteration speed more than services.
Hand the process to us
You want results without managing the process. Our team handles dataset preparation, scoring strategy, training, evaluation, and deployment. You tell us what the model should do. We make it happen.
This isn't a support tier. It's a service. We do the work.
Pricing
Pay for what you use
No subscriptions, no seat licenses, no minimums. You pay for the training steps you run and the inference you serve through Photon.
Self-serve
Training
Pay-per-step based on model size
Deployment
Photon inference pricing
Support
Docs + community
Pricing
Starts at $X per training step
White-glove
Training
Included in engagement
Deployment
Photon inference pricing
Support
Dedicated team
Pricing
Custom, starting at $X/month
You'll see the cost per step before you start. No surprises.
See full pricing

Deploy With Photon
Trained and deployed in the same afternoon
Every model you fine-tune with Lens is deployment-ready on Photon. No conversion steps, no export headaches, no separate serving infrastructure. One API call.
# Deploy your fine-tuned model
deployment = md.photon.deploy(
    model=lens.model_id,
    target="cloud",  # or "edge" or "on-device"
)
print(deployment.endpoint)  # https://api.moondream.ai/v1/your-model

Your fine-tuned model runs wherever Photon runs: cloud, on-premise, edge devices, or embedded hardware. Same model, same weights, same behavior everywhere.
Results
Less data, less time, better results.
Base Moondream is general-purpose. Fine-tuned Moondream is built for your job.
Vehicle damage detection
Base model accuracy
71%
Fine-tuned accuracy
94%
Method
RL
Data used
20 images
Invoice field extraction
Base model accuracy
68%
Fine-tuned accuracy
96%
Method
SFT
Data used
1,800 examples
Product classification
Base model accuracy
74%
Fine-tuned accuracy
92%
Method
SFT
Data used
4,500 examples
Defect detection (PCB)
Base model accuracy
62%
Fine-tuned accuracy
91%
Method
RL
Data used
15 images
Results vary by use case and data quality. These are representative examples from real fine-tuning runs.
Technical Details
For the engineers who want specifics
Training methods
Both reinforcement learning and supervised fine-tuning are fully supported. Use them independently or combine them: SFT for a solid baseline, RL to refine.
RL details
Rollouts are generated server-side. You score them locally using any logic you want. Accumulate 32 to 64 scored rollouts per train_step call. A typical training loop runs about 10 steps before evaluation.
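The accumulate-then-train cadence is ordinary client-side code. A minimal sketch of the batching step, assuming you collect scored rollouts into a list first (48 is just a midpoint of the 32-64 guideline, not an API requirement):

```python
def batch_rollouts(scored_rollouts, batch_size=48):
    """Yield scored rollouts in batches sized for one train_step call.
    Each yielded list is what you would pass as train_step(rollouts=...)."""
    for i in range(0, len(scored_rollouts), batch_size):
        yield scored_rollouts[i:i + batch_size]
```

With ~10 batches accumulated this way, the full loop is roughly 10 train_step calls followed by your eval run.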
SFT details
Pass image-output pairs directly to train_step. Standard labeled data formats supported. No minimum batch size enforced, but larger batches produce more stable updates.
Training time
Each train_step call completes in seconds to minutes depending on batch size. A full RL loop (10 steps) typically finishes in under an hour. SFT depends on dataset size but most runs complete in one to two hours.
Evaluation
Bring your own eval set and scoring logic. We surface training metrics in the dashboard, but you define what "good" looks like for your use case.
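Because the eval set and scoring logic are yours, evaluation is plain code: run held-out images through the model, score each output, and track the pass rate. A minimal sketch, where `predict` stands in for whatever inference call you use and `score` is your own metric in [0, 1]:

```python
def eval_accuracy(examples, predict, score, threshold=0.5):
    """examples: (image, expected) pairs held out from training.
    Counts an example as passing when its score clears the threshold."""
    passed = sum(
        score(predict(image), expected) >= threshold
        for image, expected in examples
    )
    return passed / len(examples)
```

Running this before and after fine-tuning gives you the base-versus-tuned comparison on your own definition of "good".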
Model versioning
Every training session produces a versioned model artifact. You can compare versions, roll back, and promote models to production through the API.
Export
Fine-tuned models can be exported in standard formats for self-hosting. No lock-in.
FAQ
Frequently asked questions
Concise answers for teams evaluating Lens for production fine-tuning.
Ready to take Moondream to production?
Talk to our team
Tell us what you're building. We'll get you to production-ready together.