Changelog

This page documents all notable changes to Moondream.

Moondream 2025-03-27 Release
Improvements
  • Added support for long-form captioning
  • Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
  • Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
  • Improved object detection, especially for small objects
  • More detailed and diverse outputs for image tagging queries
Bug Fixes
  • Fixed token streaming bug affecting multi-byte unicode characters
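The class of bug named above usually appears when each streamed chunk is decoded on its own, so a character whose bytes span two chunks comes out garbled. The following is a minimal sketch of one common remedy, using Python's standard incremental UTF-8 decoder. It is not Moondream's actual fix; the `stream_text` helper and the sample chunks are hypothetical and only illustrate the technique.

```python
import codecs


def stream_text(byte_chunks):
    """Yield decoded text from byte chunks, holding back incomplete UTF-8 sequences.

    Hypothetical helper for illustration; not part of Moondream.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        # The decoder buffers trailing bytes that do not yet form a full character.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is left; a truncated sequence becomes U+FFFD here.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" takes two bytes and the moon emoji four; both are split across chunks.
data = "café 🌙".encode("utf-8")
chunks = [data[:4], data[4:8], data[8:]]
pieces = list(stream_text(chunks))
print(pieces)
```

Decoding each chunk independently with `bytes.decode` would instead raise an error or emit replacement characters at the chunk boundaries.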
Developer
  • gpt-fast-style compile() is now supported in the Hugging Face Transformers implementation
Moondream 2025-01-08 Release
Structured Output

Support for structured output formats such as JSON, XML, Markdown and CSV.

New Capability: Gaze Detection

Experimental capability that tracks human attention in images.

  • Driver gaze detection for automotive applications
  • Sports gaze detection for analyzing player focus
Benchmark Improvements

Significant improvements across industry benchmarks.

Better OCR
  • Improved vision layer for better text reading capabilities
  • Enhanced document querying and understanding
  • Better chart and diagram interpretation
Moondream 2024-12-04 Release
Moondream 0.5B: World's Smallest VLM
  • 0.5B parameters optimized for edge devices and mobile platforms
  • 479 MiB compressed at 8-bit, 375 MiB at 4-bit
  • Memory usage: 996 MiB at 8-bit, 816 MiB at 4-bit
  • Released under the Apache License
Moondream 2024-11-25 Release
Playground Launch
  • Improved user experience with automatic prompt suggester
  • Visual Question Answering (VQA) for human-like responses
  • Object detection with bounding box coordinates
  • Image captioning for annotations
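Bounding boxes from a detector are often reported relative to the image size and have to be scaled before drawing. The sketch below assumes a box given as a dictionary with `x_min`, `y_min`, `x_max`, `y_max` values normalized to the range 0 to 1. That format, the `to_pixel_box` helper, and the sample values are assumptions for illustration; this changelog does not specify the coordinate format.

```python
def to_pixel_box(box, width, height):
    """Scale a normalized box (values in 0..1) to integer pixel coordinates.

    The dictionary keys are an assumed format, not a documented one.
    """
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )


# Hypothetical detection on a 640x480 image.
box = {"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}
pixels = to_pixel_box(box, width=640, height=480)
print(pixels)
```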
Moondream 2024-07-23 Release
OCR & Document Improvements
  • Significant improvements in OCR and document understanding
  • Optimized for local runtime performance
Moondream 2024-04-02 Release
Enhanced OCR & Captioning
  • Improved OCR capabilities for better text recognition
  • Enhanced image captioning for more detailed descriptions
Moondream 2024-03-04 Initial Release
Initial Release
  • 1.8B-parameter vision language model
  • Optimized for edge devices
  • Requires less than 5 GB of memory at 16-bit precision
  • Basic visual understanding capabilities
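As a rough sanity check on the memory figure above, the weights alone can be estimated from the parameter count and the precision. This is a back-of-the-envelope sketch: `weight_memory_bytes` is a hypothetical helper, and the estimate covers weights only, ignoring activations, caches, and runtime overhead.

```python
def weight_memory_bytes(num_params, bits_per_param):
    """Estimate the memory needed to hold the model weights alone."""
    return num_params * bits_per_param / 8


# 1.8B parameters at 16 bits (2 bytes) each.
bytes_16bit = weight_memory_bytes(1.8e9, 16)
print(f"{bytes_16bit / 1e9:.1f} GB")      # 3.6 GB
print(f"{bytes_16bit / 2**30:.2f} GiB")   # 3.35 GiB
```

Weights of about 3.6 GB leave some headroom under the stated 5 GB limit for everything else the runtime needs.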