Rock Paper Scissors

Classify real photos of hand gestures as rock, paper, or scissors. With only 5 training examples per class and 50 RL steps, accuracy jumps from 54.8% to 98.8%. This demo highlights extreme data efficiency: a useful model from almost no training data.
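To put those figures in terms of errors, the short calculation below uses only the numbers quoted above (three classes, 5 examples per class, 54.8% and 98.8% accuracy). It is plain arithmetic and does not call any Moondream API.

```python
# Figures quoted in the paragraph above.
classes = ["rock", "paper", "scissors"]
examples_per_class = 5
base_accuracy = 0.548
tuned_accuracy = 0.988

# Total number of labeled training images used.
total_examples = examples_per_class * len(classes)

# Convert accuracy to error rate and compare the two.
base_error = 1.0 - base_accuracy
tuned_error = 1.0 - tuned_accuracy
relative_error_reduction = (base_error - tuned_error) / base_error

print(f"training examples: {total_examples}")
print(f"error rate: {base_error:.1%} -> {tuned_error:.1%}")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

In other words, 15 labeled images take the error rate from 45.2% to 1.2%.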

What's a fine-tune?

A fine-tune is a custom-trained version of Moondream, optimized for your specific task using your own data. It starts from the base model and learns to perform better on exactly the kind of images and questions you care about.

Base Moondream 3 Preview result: paper (incorrect)

Fine-tuned Moondream 3 Preview result: scissors (correct)

Results

See it in action

These examples use real inputs from the task and compare the base model's output with the fine-tuned model's output.

Hand gesture photo showing scissors.

Scissors. The base model said "paper." The fine-tuned model correctly said "scissors."

Hand gesture photo showing paper.

Paper. The base model said "rock." The fine-tuned model correctly said "paper."

Hand gesture photo showing rock.

Rock. The base model said "paper." The fine-tuned model correctly said "rock."

Getting Started

What is fine-tuning?

Fine-tuning takes a general-purpose vision model and trains it on your specific task. You provide examples of what you want the model to recognize, and the model learns to do that one thing very well. The result is a custom model that performs far better on your task than the base model does out of the box.

Who is this for?

Anyone building a product or system that needs to understand images. You do not need a machine learning background. If you can collect example images and describe what you want the model to see, you can fine-tune Moondream with Lens.

For teams that want hands-on help, we also offer a white-glove fine-tuning service.

How it works

1. Prepare your data.

Collect images that represent your task. Label them with the outputs you want. For object detection, that means drawing bounding boxes. For classification, that means assigning categories.

2. Train with the API.

Send your data to the Moondream Lens API. Choose SFT to teach the model from labeled examples, or RL to optimize outputs against a scoring function. Training runs on our infrastructure. There is no hardware to manage.

3. Deploy your model.

Your fine-tuned model is ready to use through the Moondream API or through Photon, our self-hosted inference engine. Run it in the cloud or on your own hardware.
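For a classification task like rock, paper, scissors, the data preparation and scoring parts of these steps can be sketched in plain Python. The sketch assumes a hypothetical folder layout (`data/rock/`, `data/paper/`, `data/scissors/`) and uses hypothetical helper names (`sample_few_shot`, `score_output`); none of these come from the Moondream API. It picks a few labeled images per class and defines an exact-match scoring function of the kind an RL run could optimize against.

```python
import random
from pathlib import Path

CLASSES = ("rock", "paper", "scissors")


def sample_few_shot(root, per_class=5, seed=0):
    """Return (image_path, label) pairs, `per_class` per class.

    Assumes a hypothetical layout: <root>/<label>/*.jpg
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    pairs = []
    for label in CLASSES:
        files = sorted(Path(root, label).glob("*.jpg"))
        if len(files) < per_class:
            raise ValueError(
                f"need {per_class} images for {label!r}, found {len(files)}"
            )
        pairs.extend((path, label) for path in rng.sample(files, per_class))
    return pairs


def score_output(model_output, expected_label):
    """Exact-match reward: 1.0 if the answer is the expected class, else 0.0."""
    # Tolerate surrounding whitespace, capitalization, and a trailing period.
    answer = model_output.strip().lower().rstrip(".")
    return 1.0 if answer == expected_label else 0.0
```

Exact match is a deliberately strict choice here: the task prompt asks for a single word, so anything other than the expected class name earns no reward.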

Code

See the code

This example shows the minimum Python you need to run a training step on labeled examples for this task.

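import moondream as md

lens = md.Lens()

PROMPT = "Is this rock, paper, or scissors? Respond with rock, paper, or scissors only."

# training_examples is assumed to be a list of (image, label) pairs loaded
# elsewhere, where each label is "rock", "paper", or "scissors".

# Train on labeled examples for this task, using each image's own label
# as the expected output.
lens.train_step(
    training_data=[
        {
            "image": image,
            "prompt": PROMPT,
            "output": label,
        }
        for image, label in training_examples
    ]
)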
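To check whether training helped, you can compare predictions against labels on held-out images. The sketch below is plain Python: `accuracy` is a hypothetical helper, and the prediction lists stand in for whatever your base and fine-tuned models return, since this page does not show the inference call. The sample data is the three examples shown earlier on this page, not the full evaluation set behind the 54.8% and 98.8% figures.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their label (case-insensitive)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(p.strip().lower() == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# The three examples shown earlier on this page.
labels = ["scissors", "paper", "rock"]
base_predictions = ["paper", "rock", "paper"]
tuned_predictions = ["scissors", "paper", "rock"]

base_score = accuracy(base_predictions, labels)
tuned_score = accuracy(tuned_predictions, labels)

print(base_score)   # 0.0
print(tuned_score)  # 1.0
```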

Why Moondream

The fastest path from idea to production

The point is not to learn another platform. The point is to get a custom model into production with as little friction as possible.

Fully hosted.

Training runs on our infrastructure. You send data through the Lens API and get a model back. No GPUs to rent, no environments to configure, and no drivers to debug.

API-only.

Fine-tuning is a handful of API calls. There is no UI to learn, no platform to onboard to, and no proprietary format to adopt. It fits into the workflow you already have.

Pay as you go.

You pay for the compute you use. Fine-tuning starts at a few dollars. Every account gets $5 of free credits each month, so you can run your first fine-tune at no cost.

Built by the model team.

Moondream's fine-tuning system is built by the same team that designed the model architecture. The training pipeline is optimized specifically for Moondream.

White Glove

Need help? We'll build it for you.

Not every team has the bandwidth to run a fine-tune in-house. If you want help, our team can handle the process end to end.

White-glove service

Our team works with you to define the task, prepare the data, run the training, and validate the results. When we are done, we hand off everything: the fine-tuned model, the training data, the evaluation benchmarks, and documentation on how to maintain and improve the model over time.

You own the model. You own the data. We just get you there faster.

Task definition and benchmark design.

Data review, preparation, and labeling guidance.

Training, evaluation, and handoff documentation.

A model your team can run through the API or Photon.

FAQ

Frequently asked questions

Concise answers for teams evaluating Moondream for production fine-tuning.

Ready to take Moondream to production?

Every Moondream account includes $5 of free credits per month. No credit card required.

Start fine-tuning

Start with the docs and run your first experiment in a few API calls.

Talk to our team

Tell us what you are building. We can help with data, training, evaluation, and deployment.