GeoGuessr Countries

Predict the country from a single street-view image by reading road markings, signage, and vegetation. Supervised fine-tuning (SFT) on a small dataset of 25 images per country takes Moondream from 28.6% to 71.1% accuracy across 53 countries, outperforming GPT-5.4 at 69.8%.

Accuracy

Base: 28.6%
GPT-5.4: 69.8%
Fine-tuned: 71.1%

Method: SFT
Steps: 1,000
Training time: 3 hrs 24 min
Cost: $53.28
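To put the headline figures in perspective, the short sketch below derives a few numbers from them. It assumes every one of the 53 countries contributes exactly 25 training images, which the page implies but does not state outright. The per-step values are simple averages over the whole run, not measurements of any individual step.

```python
# Figures quoted on this page.
IMAGES_PER_COUNTRY = 25
COUNTRIES = 53
BASE_ACC, GPT_ACC, TUNED_ACC = 28.6, 69.8, 71.1  # accuracy in percent
STEPS = 1_000
TRAINING_SECONDS = 3 * 3600 + 24 * 60  # 3 hrs 24 min
COST_USD = 53.28


def summarize() -> dict:
    """Derive dataset size, accuracy deltas, and per-step averages."""
    return {
        # Assumption: all 53 countries have exactly 25 training images.
        "training_images": IMAGES_PER_COUNTRY * COUNTRIES,
        "gain_points": round(TUNED_ACC - BASE_ACC, 1),
        "margin_over_gpt_points": round(TUNED_ACC - GPT_ACC, 1),
        "seconds_per_step": TRAINING_SECONDS / STEPS,
        "usd_per_step": round(COST_USD / STEPS, 5),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under that assumption the training set holds 1,325 images, the fine-tuned model gains 42.5 percentage points over the base model, and it leads GPT-5.4 by 1.3 points.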

See it in action

Compare the base model against the fine-tuned model across representative benchmark examples.

Prompt

What country is this, return only the name.

United Kingdom

[Image: street-view image from the United Kingdom]

Base model: World (incorrect)
Fine-tuned model: United Kingdom (correct)

Russia

[Image: street-view image from Russia]

Base model: USA (incorrect)
Fine-tuned model: Russia (correct)

Colombia

[Image: street-view image from Colombia]

Base model: Japan (incorrect)
Fine-tuned model: Colombia (correct)
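The page does not say how answers are scored. One plausible reading, given that the prompt asks for the country name only, is a case-insensitive exact match. The sketch below scores the three examples above that way; `normalize` and `exact_match_accuracy` are hypothetical helpers, not part of any Moondream API.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(name.strip().lower().split())


def exact_match_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that equal their label after normalization."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return hits / len(labels)


# The three examples shown above.
labels = ["United Kingdom", "Russia", "Colombia"]
base_predictions = ["World", "USA", "Japan"]
tuned_predictions = ["United Kingdom", "Russia", "Colombia"]

if __name__ == "__main__":
    print("base:", exact_match_accuracy(base_predictions, labels))
    print("fine-tuned:", exact_match_accuracy(tuned_predictions, labels))
```

A strict scorer like this would mark aliases such as "UK" wrong, so a real benchmark might map known aliases to one canonical name before comparing.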

Perfection in 3 steps

1. Bring examples. Collect images for the task you want Moondream to learn.

2. Fine-tune. Teach Moondream with supervised fine-tuning (SFT) or reinforcement learning (RL). Pass your data to the API and we handle the rest.

3. Deploy. Use your model through the API or run it locally with Photon.

What is fine-tuning?

Moondream starts as a general model trained on broad, public information. Fine-tuning makes it great at one specific task by teaching it the products, documents, categories, or internal information that matter to your business.

Who is this for?

This is for teams putting vision AI into production. If you already know the task and need the model to master that job, fine-tuning is how you get there. It is built for teams that need frontier performance at real-time speed.

See the code

Fine-tuning is just a small API loop: format your data, call `train_step`, and the model updates as you go.

import moondream as md

# Create fine-tune
ft = md.ft(
    api_key="your-api-key",
    name="geoguessr countries",
    rank=32,
)

# Boilerplate and data-loading code omitted here; it is expected to
# provide `pil_image`, the image used in the request below.

# Update the model
ft.train_step([{
    "mode": "sft",
    "request": {
        "skill": "query",
        "image": pil_image,
        "question": "What country is this, return only the name.",
    },
    "target": {"answer": "United Kingdom"},
}])
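As a rough illustration of the "small API loop" described above, the sketch below turns (image, country) pairs into payloads shaped like the one in the snippet and feeds them to a `train_step` callable in batches. The helper names and `batch_size` are hypothetical. Sending several examples in one call is an assumption, based only on the snippet passing a list. Nothing here calls the real service, so any callable (for example `ft.train_step`) can be supplied.

```python
from typing import Any, Callable, Iterator, Sequence

PROMPT = "What country is this, return only the name."


def make_sft_example(image: Any, country: str) -> dict:
    """Build one SFT payload in the shape shown in the snippet above."""
    return {
        "mode": "sft",
        "request": {"skill": "query", "image": image, "question": PROMPT},
        "target": {"answer": country},
    }


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_training(
    train_step: Callable[[list], Any],
    samples: Sequence[tuple[Any, str]],
    batch_size: int = 4,  # hypothetical default, not a documented value
) -> int:
    """Call `train_step` once per batch and return the number of calls made."""
    steps = 0
    for batch in batches(samples, batch_size):
        train_step([make_sft_example(image, country) for image, country in batch])
        steps += 1
    return steps
```

With the fine-tune object from the snippet, usage would look like `run_training(ft.train_step, samples)`, where `samples` is a list of `(pil_image, "Country")` pairs prepared by your own data-loading code.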

Ready to take Moondream to production?

Get started

Start with the docs and run your first experiment in a few API calls.

Need help? We'll build it for you.

We can help define the task, prepare the data, run training, validate results, and hand off a model your team can use.