Changelog
This page documents all notable changes to Moondream.
Moondream 2025-04-15 Release
Improvements
- Improved chart understanding (ChartQA up from 74.8 to 77.5, or 82.2 with Program of Thought (PoT) prompting)
- Added temperature and nucleus sampling to reduce repetitive outputs
- Better OCR for documents and tables (prompt with "Transcribe the text" or "Transcribe the text in natural reading order")
- Object detection now supports document layout detection (figure, formula, text, etc.)
- Improved UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
- Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
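To make the sampling change concrete, here is a minimal, standard-library sketch of temperature scaling followed by nucleus (top-p) filtering. It illustrates the general technique only; the function name `sample_top_p` and the default values for `temperature` and `top_p` are assumptions for illustration and are not taken from Moondream's implementation.

```python
import math
import random


def sample_top_p(logits, temperature=0.5, top_p=0.9, rng=random):
    """Pick a token index using temperature scaling and nucleus filtering.

    The default values are illustrative assumptions, not Moondream's settings.
    """
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [value / temperature for value in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Nucleus filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus = []
    cumulative = 0.0
    for index in order:
        nucleus.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    # Sample within the nucleus; random.choices accepts unnormalized weights.
    weights = [probs[index] for index in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


# Example: a fixed seed makes the draw reproducible.
token = sample_top_p([2.0, 1.5, 0.1, -1.0], rng=random.Random(0))
```

Because low-probability tokens are cut off before sampling, the output stays varied without drifting into unlikely continuations, which is how this kind of sampling tends to reduce repetitive loops compared with greedy decoding.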
Moondream 2025-04-14 QAT Release
Quantization Aware Training
- 4-bit model with quantization-aware training for faster inference and lower memory use
- Runs at 184 tokens/second on an RTX 3090 with 2.4 GB memory usage (42% less than full precision)
- Only a 0.6% drop in accuracy (74.5 vs 74.9 average score)
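The figures above can be cross-checked with a little arithmetic. The sketch below assumes the 42% saving is measured against the full-precision model's memory footprint on the same hardware; the derived numbers are implied by the quoted values and are not stated in the release notes.

```python
# Values quoted in the release notes.
quantized_gb = 2.4        # memory use of the 4-bit QAT model
memory_reduction = 0.42   # "42% less than full precision"
qat_score = 74.5          # average score, quantized
full_score = 74.9         # average score, full precision

# Implied full-precision footprint: quantized = full * (1 - reduction).
full_precision_gb = quantized_gb / (1 - memory_reduction)

# Accuracy gap in absolute points and relative to the full-precision score.
gap_points = full_score - qat_score
gap_relative_pct = 100 * gap_points / full_score

print(f"implied full-precision memory: {full_precision_gb:.2f} GB")
print(f"accuracy gap: {gap_points:.1f} points ({gap_relative_pct:.2f}% relative)")
```

With the rounded scores quoted here, the gap works out to 0.4 points, or roughly 0.5% relative. That is close to the stated 0.6%; the difference presumably comes from rounding in the published averages.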
Moondream 2025-03-27 Release
Improvements
- Added support for long-form captioning
- Added open-vocabulary image tagging
- Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
- Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
- Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
Bug Fixes
- Fixed a token streaming bug affecting multi-byte Unicode characters
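For readers unfamiliar with this class of bug, the sketch below shows how a multi-byte UTF-8 character can be split across streamed chunks, and how an incremental decoder avoids emitting broken characters. It is an illustration under the assumption that streamed tokens map to raw byte chunks; it is not Moondream's actual fix, and `stream_text` is a hypothetical helper.

```python
import codecs


def stream_text(byte_chunks):
    """Yield text from byte chunks without splitting multi-byte characters."""
    # The incremental decoder buffers an incomplete byte sequence until
    # the rest of the character arrives in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at the end; raises UnicodeDecodeError if the stream stopped
    # in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "café ✓" encoded as UTF-8, with both non-ASCII characters split across chunks.
chunks = [b"caf\xc3", b"\xa9 \xe2\x9c", b"\x93"]

# Decoding each chunk independently produces U+FFFD replacement characters.
naive = [chunk.decode("utf-8", errors="replace") for chunk in chunks]

# The incremental decoder holds back partial characters until they are complete.
fixed = list(stream_text(chunks))
print("".join(fixed))
```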
Developer
- gpt-fast style compile() now supported in HF Transformers implementation
Moondream 2025-01-08 Release
Structured Output
Support for structured output formats such as JSON, XML, Markdown and CSV.
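As a rough illustration of how structured output might be consumed, the sketch below asks for a JSON reply and validates it before use. Everything here is an assumption for illustration: `query_fn` stands in for whatever call sends an image and a prompt to the model and returns its text, and the prompt wording and the helper name `ask_for_json` are hypothetical rather than part of Moondream's API.

```python
import json


def ask_for_json(query_fn, image, question, keys):
    """Request a JSON answer with the given keys and parse it defensively.

    query_fn(image, prompt) -> str is a hypothetical stand-in for the model call.
    """
    prompt = f"{question} Respond in JSON with the keys: {', '.join(keys)}."
    raw = query_fn(image, prompt)

    # Models sometimes wrap JSON in extra text, so extract the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])

    # Fail loudly if an expected key is missing instead of passing bad data on.
    missing = [key for key in keys if key not in data]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return data


# Usage with a fake model call, so the example runs without any model.
def fake_query(image, prompt):
    return 'Here you go: {"make": "Toyota", "color": "red"}'


result = ask_for_json(fake_query, image=None,
                      question="Describe the car.", keys=["make", "color"])
print(result["make"], result["color"])
```

Validating the reply at the boundary keeps malformed output from propagating into downstream code, whichever output format is requested.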
New Capability: Gaze Detection
Experimental capability that tracks human attention in images.
- Driver gaze detection for automotive applications
- Sports gaze detection for analyzing player focus
Benchmark Improvements
Significant improvements across industry benchmarks.
Better OCR
- Improved vision layer for better text reading capabilities
- Enhanced document querying and understanding
- Better chart and diagram interpretation
Moondream 2024-12-04 Release
Moondream 0.5B: World's Smallest VLM
- 0.5B parameters optimized for edge devices and mobile platforms
- 479 MiB compressed at 8-bit, 375 MiB at 4-bit
- Memory usage: 996 MiB at 8-bit, 816 MiB at 4-bit
- Released under Apache License
Moondream 2024-07-23 Release
OCR & Document Improvements
- Significant improvements in OCR and document understanding
- Optimized for local runtime performance
Moondream 2024-04-02 Release
Enhanced OCR & Captioning
- Improved OCR capabilities for better text recognition
- Enhanced image captioning for more detailed descriptions
Moondream 2024-03-04 Initial Release
Initial Release
- 1.8B-parameter vision language model
- Optimized for edge devices
- Less than 5 GB of memory required at 16-bit precision
- Basic visual understanding capabilities
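The memory figure above is consistent with a back-of-the-envelope estimate from the parameter count. The sketch below assumes 2 bytes per parameter at 16-bit precision and decimal gigabytes; attributing the remaining headroom to activations and runtime overhead is an assumption, not something the release notes state.

```python
# Rough weight-memory estimate for the initial release.
params = 1.8e9            # 1.8B parameters (from the release notes)
bytes_per_param = 2       # 16-bit precision = 2 bytes per parameter
stated_limit_gb = 5.0     # "less than 5 GB" (from the release notes)

weights_gb = params * bytes_per_param / 1e9
headroom_gb = stated_limit_gb - weights_gb

print(f"weights alone: {weights_gb:.1f} GB")
print(f"headroom under the stated limit: {headroom_gb:.1f} GB")
```

The weights alone account for about 3.6 GB, which leaves roughly 1.4 GB under the stated limit for everything else needed at inference time.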