In software there is an old adage: good, fast, cheap, pick two. For Photon 1.3.0, we did not get the memo. This release is faster, it fixes bugs, and Photon is now completely free. We picked all three for you.
Starting with version 1.3.0, running Moondream locally with Photon is totally free. No API key is required: just download it and start calling it. You will still need to pass your MOONDREAM_API_KEY if you want to run a finetuned model, or if you want telemetry into your inference activity. Both of these are still free; the API key is only there to link the activity to your account.
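As a rough illustration of the optional-key behavior described above, here is a minimal Python sketch that reads MOONDREAM_API_KEY from the environment and treats a missing key as a normal case. The helper names `resolve_api_key` and `describe_mode` are hypothetical and are not part of Photon or the moondream package; how the key is actually handed to Photon is covered in the docs.

```python
import os


def resolve_api_key(env=None):
    """Return the MOONDREAM_API_KEY value, or None when it is unset or blank.

    Hypothetical helper for illustration; not part of the moondream package.
    """
    env = os.environ if env is None else env
    key = env.get("MOONDREAM_API_KEY", "").strip()
    return key or None


def describe_mode(api_key):
    """Summarize what a given key state allows, per the release notes."""
    if api_key is None:
        # No key: plain local inference still works and is free.
        return "local inference only"
    # Key present: finetuned models and telemetry can be linked to an account.
    return "local inference, plus finetunes and telemetry linked to your account"


print(describe_mode(resolve_api_key()))
```

The point of the sketch is only that an absent key should not be treated as an error in application code.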
If you have been waiting to try Moondream in production, or wondering about inference costs at the edge or on-prem, now is a perfect time to get started.
Moondream is faster
Photon runs on Windows, Mac, and NVIDIA GPUs. This release makes Moondream faster across all of them.
On NVIDIA GPUs, the biggest gains are on older cards. On an A100, you can expect roughly 25 to 44 percent more throughput on standard queries, and up to about 70 percent more when the model reasons step by step. Answers also come back sooner, with latency down about 30 percent. A10 cards see similar gains, roughly 30 to 45 percent. Jetson Thor is up to 50 percent faster at low batch sizes. Newer hardware is faster as well. Check out our performance results for more details.
What this means in practice: the same GPU now handles more images per second and returns each answer sooner. Photon with Moondream does more per machine, so you can size down your hardware or push more throughput on what you have.
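To make the sizing point concrete, here is a small back-of-the-envelope sketch in Python. The baseline of 40 images per second per GPU and the target load of 1,000 images per second are made-up numbers chosen for illustration; only the 25 and 44 percent gains come from the A100 figures above. Real capacity planning should use your own measurements.

```python
import math


def throughput_after_gain(per_gpu_images_per_s, gain_percent):
    """Per-GPU throughput after a relative throughput gain."""
    return per_gpu_images_per_s * (1 + gain_percent / 100)


def gpus_needed(target_images_per_s, per_gpu_images_per_s):
    """Smallest whole number of GPUs that covers the target load."""
    return math.ceil(target_images_per_s / per_gpu_images_per_s)


# Hypothetical workload, not a measured benchmark.
baseline = 40.0   # images/s per GPU before the upgrade (assumed)
target = 1000.0   # images/s the fleet must sustain (assumed)

before = gpus_needed(target, baseline)                                # 25
after_low = gpus_needed(target, throughput_after_gain(baseline, 25))  # 20
after_high = gpus_needed(target, throughput_after_gain(baseline, 44)) # 18

print(before, after_low, after_high)
```

Under these assumed numbers, the same load fits on 18 to 20 GPUs instead of 25.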
Decoding is also faster on Apple Silicon Macs, so local development on a laptop feels quicker.
Finetunes run faster, on more hardware
Lens, our finetune service, lets you customize Moondream for your own tasks. You could teach Moondream to detect defective welds, or to check compliance with your own brand guidelines; the possibilities are endless. This release makes these finetunes run much faster and makes them available on far more machines.
The overhead of using a finetune dropped sharply. A large finetune that previously added about 140 milliseconds per request now adds under 1 millisecond. The overhead now scales with the size of the finetune you actually use, not the maximum the engine supports.
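For a sense of scale, the sketch below works out how much of a request the finetune overhead used to account for. The 200 millisecond base latency is an assumed figure for illustration only; the 140 millisecond and 1 millisecond overheads are the numbers quoted above, with 1 millisecond used as an upper bound.

```python
def total_latency_ms(base_ms, finetune_overhead_ms):
    """Per-request latency: base model time plus finetune overhead."""
    return base_ms + finetune_overhead_ms


def overhead_share(base_ms, finetune_overhead_ms):
    """Fraction of total request time spent on finetune overhead."""
    return finetune_overhead_ms / total_latency_ms(base_ms, finetune_overhead_ms)


base_ms = 200.0  # assumed base latency, not a measured value

old_share = overhead_share(base_ms, 140.0)  # previous large-finetune overhead
new_share = overhead_share(base_ms, 1.0)    # upper bound on the new overhead

print(f"before: {old_share:.1%} of request time, after: at most {new_share:.1%}")
```

With a shorter base latency the old overhead would have been an even larger share of each request, which is why the change matters most for fast, small queries.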
Finetunes are now supported on Apple Silicon and Windows, in addition to NVIDIA. Previously, Windows and Mac could not run them at all. If your team works on Macs or Windows machines, you can now use customized Moondream models there.
An accuracy fix for older GPUs
We fixed a small accuracy issue that affected a few older GPUs, including the A100, A10, and RTX 3090. On these cards, the math that prepared data for the model was rounding the wrong way and pushing values slightly low. Output is now slightly more accurate on these cards. The effect was minor, and newer GPUs were never affected.
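The release notes do not say which numeric format or rounding mode was involved, so the following is only a generic Python illustration of the effect: always rounding down to a coarse grid produces a systematic low bias, while rounding to the nearest grid point averages out. The grid step of 8 and the integer test values are arbitrary choices for the demo, not Photon internals.

```python
STEP = 8  # arbitrary grid spacing for the demo


def round_down(x):
    """Snap x to the grid by always rounding toward the lower grid point."""
    return (x // STEP) * STEP


def round_nearest(x):
    """Snap x to the nearest grid point (Python's round breaks ties to even)."""
    return round(x / STEP) * STEP


def mean_error(rounder, values):
    """Average of (rounded - original) over the given values."""
    return sum(rounder(v) - v for v in values) / len(values)


values = range(1024)  # evenly spread inputs covering 128 full grid cells

bias_down = mean_error(round_down, values)        # consistently negative
bias_nearest = mean_error(round_nearest, values)  # errors cancel out

print(bias_down, bias_nearest)
```

In this toy setup the round-down error averages to -3.5, just under half a grid step, while round-to-nearest averages to 0. A small per-value bias of this kind is what "pushing values slightly low" looks like in aggregate.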
How to get it
To install, run pip install moondream. Docs are at docs.moondream.ai. Happy Moondreaming.



