Changelog

This page documents all notable changes to Moondream.

Improvements

Improved chart understanding (ChartQA up from 74.8 to 77.5, or 82.2 with Program of Thought (PoT) prompting)
Added temperature and nucleus sampling to reduce repetitive outputs
Better OCR for documents and tables (prompt with "Transcribe the text" or "Transcribe the text in natural reading order")
Object detection now supports document layout detection (figure, formula, text, etc.)
Improved UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
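
For readers unfamiliar with the sampling options mentioned above, the following is a minimal sketch of how temperature scaling and nucleus (top-p) sampling are commonly combined when choosing the next token. It is not Moondream's implementation: the function name sample_token, its default values, and the toy logits are assumptions made for illustration.

```python
import math
import random


def sample_token(logits, temperature=0.5, top_p=0.9, rng=random):
    """Pick an index from `logits` (hypothetical helper, not a Moondream API)."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most-likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample within the nucleus, weighted by the original probabilities.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0]  # made-up scores for four tokens
    print([sample_token(toy_logits) for _ in range(10)])
```

Cutting off the low-probability tail while still sampling among the likely tokens is what lets this approach avoid both the repetition loops of greedy decoding and the incoherence of unrestricted sampling.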

Quantization-Aware Training

4-bit model with quantization-aware training for faster inference and lower memory use
Runs at 184 tokens/second on an RTX 3090 with 2.4 GB memory usage (42% less than full precision)
Only a 0.6% drop in accuracy (74.5 vs 74.9 average score)
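
As a quick worked check of the memory figure above, the short calculation below derives the full-precision footprint implied by "2.4 GB, 42% less than full precision". The variable names are illustrative, and the result is only an estimate from the rounded numbers on this page, not a measured value.

```python
# Figures quoted in the changelog entry above.
quantized_gb = 2.4   # memory use of the 4-bit model
reduction = 0.42     # stated saving relative to full precision

# If the 4-bit model uses (1 - reduction) of the full-precision memory,
# the implied full-precision footprint follows by division.
full_precision_gb = quantized_gb / (1 - reduction)
saved_gb = full_precision_gb - quantized_gb

print(f"Implied full-precision memory: {full_precision_gb:.1f} GB")
print(f"Implied saving: {saved_gb:.1f} GB")
```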

Improvements

Added support for long-form captioning
Added open-vocabulary image tagging
Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)

Bug Fixes

Developer

Structured Output

Support for structured output formats such as JSON, XML, Markdown and CSV.
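
When requesting JSON output, the reply still arrives as text and needs to be parsed. The sketch below shows one defensive way to do that. It assumes, purely for illustration, that the reply may or may not be wrapped in a Markdown code fence; parse_json_answer is a hypothetical helper, not part of any Moondream client, and the sample reply is made up.

```python
import json

FENCE = "`" * 3  # a Markdown code-fence marker, built here to keep the example readable


def parse_json_answer(answer: str):
    """Parse a JSON reply, tolerating an optional surrounding code fence.

    Hypothetical helper: the fenced-reply behaviour is an assumption,
    not documented Moondream behaviour.
    """
    text = answer.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]          # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]                 # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)                    # raises json.JSONDecodeError on bad input


if __name__ == "__main__":
    # Made-up reply to a prompt such as "List the objects in the image as JSON".
    reply = FENCE + 'json\n{"objects": ["cat", "sofa"]}\n' + FENCE
    print(parse_json_answer(reply))
```

Letting json.loads raise on malformed input, rather than silently returning a default, makes it easier to detect when a prompt needs to be tightened.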

New Capability: Gaze Detection

Experimental capability that tracks human attention in images.

Benchmark Improvements

Significant improvements across industry benchmarks.

Better OCR

Moondream 0.5B: World's Smallest VLM

Moondream 2024-07-23 Release

OCR & Document Improvements

Moondream 2024-04-02 Release

Enhanced OCR & Captioning

Moondream 2024-03-04 Initial Release

Initial Release