GeoGuessr Countries
Predict the country from a single street-view image by reading road markings, signage, and vegetation. Supervised fine-tuning (SFT) on a small dataset of 25 images per country raises Moondream's accuracy across 53 countries from 28.6% to 71.1%, ahead of GPT-5.4 at 69.8%.
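The training set described above is small and balanced: 25 images for each of 53 countries, or 1,325 images in total. The sketch below shows one way to draw such a split from a larger labeled pool. It assumes the pool is a list of `(image_path, country)` pairs. The function name `sample_per_country`, the file names, and the fixed seed are illustrative choices, not part of the Moondream API.

```python
import random
from collections import defaultdict


def sample_per_country(pool, per_country=25, seed=0):
    """Return up to `per_country` (image_path, country) pairs for each country.

    `pool` is a list of (image_path, country) tuples (hypothetical format).
    Countries with fewer images than `per_country` keep all of theirs.
    """
    by_country = defaultdict(list)
    for path, country in pool:
        by_country[country].append((path, country))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    selected = []
    for country in sorted(by_country):
        items = by_country[country]
        k = min(per_country, len(items))
        selected.extend(rng.sample(items, k))
    return selected


if __name__ == "__main__":
    # Toy pool: 40 images for each of three countries (hypothetical paths).
    pool = [(f"{c.lower()}_{i}.jpg", c)
            for c in ["Colombia", "Russia", "United Kingdom"]
            for i in range(40)]
    subset = sample_per_country(pool, per_country=25)
    print(len(subset))  # 75
    # At full scale: 25 images x 53 countries = 1,325 training images.
    print(25 * 53)  # 1325
```

Sampling each country separately keeps the classes balanced. A plain random sample over the whole pool would over-represent countries that happen to have more street-view coverage.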
Accuracy
| Setting | Value |
| --- | --- |
| Method | SFT |
| Steps | 1,000 |
| Training time | 3 hrs 24 min |
| Cost | $53.28 |
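The run summary can be converted into average per-step and per-hour rates. The sketch below does that arithmetic using only the reported values: 1,000 steps, 3 hrs 24 min, and $53.28. It assumes the reported time and cost cover the whole run. The helper name `run_rates` is illustrative.

```python
def run_rates(steps, hours, minutes, cost_usd):
    """Derive average per-step and per-hour rates from a training run summary."""
    total_seconds = (hours * 60 + minutes) * 60
    total_hours = total_seconds / 3600
    return {
        "seconds_per_step": total_seconds / steps,
        "cost_per_step_usd": cost_usd / steps,
        "cost_per_hour_usd": cost_usd / total_hours,
    }


rates = run_rates(steps=1000, hours=3, minutes=24, cost_usd=53.28)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the run averages about 12.24 seconds and roughly $0.053 per step, or about $15.67 per hour.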
See it in action
Compare the base model against the fine-tuned model across representative benchmark examples.
Prompt
What country is this, return only the name.
Ground truth: United Kingdom
Base model: World (incorrect)
Fine-tuned model: United Kingdom (correct)

Ground truth: Russia
Base model: USA (incorrect)
Fine-tuned model: Russia (correct)

Ground truth: Colombia
Base model: Japan (incorrect)
Fine-tuned model: Colombia (correct)
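On this task, accuracy comes down to comparing the returned country name against the label. The sketch below is a minimal exact-match scorer. It assumes comparison ignores case, surrounding whitespace, and a trailing period. The benchmark's actual matching rules are not stated here, so treat `normalize` and `accuracy` as hypothetical helpers. Applied to the three examples above, the base model scores 0 of 3 and the fine-tuned model scores 3 of 3.

```python
def normalize(answer: str) -> str:
    """Trim whitespace, drop a trailing period, and lowercase the answer."""
    return answer.strip().rstrip(".").strip().lower()


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match their label after normalization."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(y) for p, y in zip(predictions, labels))
    return correct / len(labels)


labels = ["United Kingdom", "Russia", "Colombia"]
base_answers = ["World", "USA", "Japan"]
fine_tuned_answers = ["United Kingdom", "Russia", "Colombia"]

print(accuracy(base_answers, labels))        # 0.0
print(accuracy(fine_tuned_answers, labels))  # 1.0
```

Strict exact matching penalizes aliases: "USA" and "United States" would count as different answers. The prompt's "return only the name" instruction helps keep answers in a comparable form.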
Perfection in 3 steps
What is fine-tuning?
Moondream starts as a general model trained on broad, public information. Fine-tuning makes it great at one specific task by teaching it the products, documents, categories, or internal information that matter to your business.
Who is this for?
This is for teams putting vision AI into production that need frontier-level performance at real-time speed. If you already know the task and need the model to master it, fine-tuning is how you get there.
See the code
Fine-tuning is just a small API loop: format your data, call `train_step`, and the model updates as you go.
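```python
import moondream as md
from PIL import Image

# Create a fine-tune; rank=32 is the rank setting used for this run
ft = md.ft(
    api_key="your-api-key",
    name="geoguessr countries",
    rank=32,
)

# Boilerplate and dataset-loading code omitted.
# A single local image stands in for the dataset; the filename is hypothetical.
pil_image = Image.open("street_view_uk.jpg")

# Update the model with one supervised (SFT) example for the "query" skill
ft.train_step([{
    "mode": "sft",
    "request": {
        "skill": "query",
        "image": pil_image,
        "question": "What country is this, return only the name.",
    },
    "target": {"answer": "United Kingdom"},
}])
```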
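The omitted data code must turn labeled images into the request/target dictionaries that `train_step` receives. The sketch below covers that formatting step, plus simple batching. `make_example` and `batched` are hypothetical helpers. The dictionary layout is copied from the example above, and the batch size of 2 is arbitrary. It also assumes `train_step` accepts more than one example per list: the list-shaped argument suggests this, but the snippet does not confirm it. The Moondream call is left as a comment because it needs an API key.

```python
from itertools import islice

PROMPT = "What country is this, return only the name."


def make_example(image, country):
    """Build one SFT example in the same shape as the train_step call above."""
    return {
        "mode": "sft",
        "request": {
            "skill": "query",
            "image": image,
            "question": PROMPT,
        },
        "target": {"answer": country},
    }


def batched(examples, batch_size):
    """Yield successive lists of at most `batch_size` examples."""
    it = iter(examples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Placeholder strings stand in for PIL images in this sketch.
    labeled = [("img_uk", "United Kingdom"), ("img_ru", "Russia"), ("img_co", "Colombia")]
    examples = [make_example(img, country) for img, country in labeled]
    for batch in batched(examples, batch_size=2):
        print(len(batch))  # 2, then 1
        # With a real fine-tune handle this would be: ft.train_step(batch)
```

Keeping the prompt in one constant ensures that training and evaluation use identical wording, which matters when the target answer format depends on the instruction.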
Ready to take Moondream to production?
Need help? We'll build it for you.
We can help define the task, prepare the data, run training, validate results, and hand off a model your team can use.