Moondream

The Open Source VLM That Runs Everywhere.

Over 6 million downloads!

Explore the lineup.

Moondream 2B

Powerful and fast.

Parameters: 1.9B
Quantization: fp16, int8, int4
Memory: 2GiB
Training: Quantization-Aware Training
Target Devices: Servers, PC, Mobile
Inference: GPU, CPU-Optimized
License: Apache 2.0

New

Moondream 0.5B

Tiny and speedy.

Parameters: 0.5B
Quantization: int8, int4
Memory: 1GiB
Training: Quantization-Aware Training
Target Devices: Mobile, Edge
Inference: GPU, CPU-Optimized
License: Apache 2.0

Discover the capabilities.

Query

Get human-like answers from any prompt.

Prompt: List all the food shown in this image.

A halved avocado, cherry tomatoes, green onions, spinach, mushrooms, and a few peppers. There are also two eggs on the board.
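
With the Python client from the quickstart below, an image can be encoded once and then queried repeatedly, as the separate encode_image step suggests. A minimal sketch (the image path and prompts are placeholders):

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.bin")

# Encode once, then reuse the encoding for several questions.
encoded = model.encode_image(Image.open("./breakfast.jpg"))  # placeholder image

for prompt in [
    "List all the food shown in this image.",
    "How many eggs are on the board?",
]:
    result = model.query(encoded, prompt)
    print(prompt, "->", result["answer"])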

Caption

Generate detailed descriptions of any scene.

The image depicts a vibrant clownfish, with its distinctive red body and white stripes, swimming near a large, purple coral. The clownfish is positioned […]
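
A minimal sketch of captioning with the Python client, assuming it exposes a caption method alongside query and accepts a PIL image directly (the method name, return shape, and image path are assumptions):

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.bin")
image = Image.open("./reef.jpg")  # placeholder image

# Assumed API: caption(image) returns a dict with the text under "caption".
result = model.caption(image)
print(result["caption"])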

Object Detection

Get bounding boxes from a prompt.

Prompt: Drone

4 objects detected.
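
A minimal sketch of prompt-based detection, assuming the client exposes a detect method that returns one bounding box per matched object (the method name and return shape are assumptions):

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.bin")
image = Image.open("./drones.jpg")  # placeholder image

# Assumed API: detect(image, object) returns a dict with an "objects" list,
# each entry a bounding box for one match.
result = model.detect(image, "drone")
print(f"{len(result['objects'])} objects detected.")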

Point

Get X, Y locations for any items.

Prompt: Sign in with Apple Button

1 point detected.
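
A minimal sketch of pointing, assuming the client exposes a point method that returns an X, Y coordinate for each match (the method name and return shape are assumptions):

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.bin")
image = Image.open("./screenshot.png")  # placeholder image

# Assumed API: point(image, object) returns a dict with a "points" list,
# one X, Y location per match.
result = model.point(image, "Sign in with Apple button")
print(f"{len(result['points'])} point(s) detected.")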

Get started in 5 minutes.

Our clients are optimized for CPU and GPU inference, and are a snap to learn.

pip install moondream

import moondream as md
from PIL import Image

# Initialize with a downloaded model
model = md.vl(model="./moondream-2b-int8.bin")

# Process the image
image = Image.open("./image.jpg")
encoded = model.encode_image(image)

# Query the image
result = model.query(encoded, "Is this a hot dog?")
print("Answer:", result["answer"])


What our fans say

More testing of the amazing Moondream open source multimodal LLM today! It is massively small: 1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset. I am really impressed. More soon.

Brian Roemmele

@BrianRoemmele

Moondream: a 1.6 Billion parameter model that is quite effective and possibly able to go toe to toe with the bigger models in the future.

MasteringMachines AI

@MstrMachines

MoonDream - A tiny vision language model that performs on par w/ models twice its size by @vikhyatk. Its so fast, you might not even catch it streaming output!

Luis C

@lucataco93

moondream is *wicked* fast.

CJ

@cj_pais


First small language model I've seen that has proper vision capabilities

Tom Dörr

@tom_doerr