Moondream with Transformers

This guide shows you how to run Moondream directly with Hugging Face Transformers, giving you maximum control over model execution and parameters.

Prerequisites

First, you'll need to install the core dependencies:

bash
pip install transformers torch pillow einops

System Requirements

  • RAM: 8GB+ (16GB recommended)
  • Storage: 5GB for model weights
  • GPU: Recommended but not required (4GB+ VRAM)
  • Python: 3.8 or higher
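Before loading the model, it can help to confirm that the interpreter and the packages from the install command are actually visible to Python. The following is a minimal sketch using only the standard library; the helper names `missing_modules` and `python_is_supported` are illustrative and not part of Moondream or Transformers.

```python
import importlib.util
import sys

# Import names for the packages in the pip command above.
# Note that the "pillow" distribution is imported as "PIL".
REQUIRED_MODULES = ["transformers", "torch", "PIL", "einops"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 8)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
```

If the second line lists any names, re-run the install command in the same environment you use to run your script.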

Platform-Specific Setup

bash
# Install pyvips for faster image processing
pip install pyvips-binary pyvips

Basic Usage

Here's a simple example demonstrating the core Moondream capabilities:

python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    # Uncomment for GPU acceleration & pip install accelerate
    # device_map={"": "cuda"}
)

# Load your image
image = Image.open("path/to/your/image.jpg")

# 1. Image Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("Detailed caption:")
for t in model.caption(image, length="normal", stream=True)["caption"]:
    print(t, end="", flush=True)

# 2. Visual Question Answering
print("Asking questions about the image:")
print(model.query(image, "How many people are in the image?")["answer"])

# 3. Object Detection
print("Detecting objects:")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# 4. Visual Pointing
print("Locating objects:")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")

Advanced Features

GPU Acceleration

To enable GPU acceleration:

python
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    device_map={"": "cuda"},  # Use "cuda" for NVIDIA GPUs
)
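The snippet above assumes a CUDA device is present. If the same script has to run on machines with and without a GPU, one option is to choose the `device_map` at runtime. This is a sketch, not an official recipe: `pick_device_map` and `load_moondream` are hypothetical helper names, and passing `device_map=None` simply leaves the model on the default (CPU) device.

```python
def pick_device_map():
    """Return a device_map for CUDA if it is usable, otherwise None."""
    try:
        import torch
    except ImportError:
        # torch is not installed; nothing to place on a GPU.
        return None
    return {"": "cuda"} if torch.cuda.is_available() else None


def load_moondream():
    """Load the model on the GPU when available, falling back to the CPU."""
    # Imported lazily so pick_device_map() can be used on its own.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",
        revision="2025-01-09",
        trust_remote_code=True,
        device_map=pick_device_map(),
    )
```

Call `model = load_moondream()` once at startup. As noted in the basic example, a non-empty `device_map` also requires `pip install accelerate`.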

Multiple Model Instances

If you have enough VRAM (4-5GB per instance), you can run multiple instances of the model for parallel processing:

python
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    device_map={"": "cuda"},
)

model2 = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    device_map={"": "cuda"},
)
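Loading two instances only helps if work is actually spread across them. One possible way to do that is to give each instance its own thread and its own slice of the images, so no instance is called from two threads at once. The helper below is a sketch under that assumption; `caption_all` is an illustrative name, and how much speedup you see depends on your hardware and on how much of the inference time is spent outside the Python interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def caption_all(models, images, length="short"):
    """Caption `images`, splitting them round-robin across `models`.

    Captions are returned in the same order as `images`.
    """
    def run_shard(model, shard):
        # Each thread talks to exactly one model instance.
        return [
            (index, model.caption(image, length=length)["caption"])
            for index, image in shard
        ]

    indexed = list(enumerate(images))
    # Instance i receives items i, i + n, i + 2n, ... where n = len(models).
    shards = [indexed[i::len(models)] for i in range(len(models))]

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        shard_results = list(pool.map(run_shard, models, shards))

    captions = [None] * len(indexed)
    for shard_result in shard_results:
        for index, text in shard_result:
            captions[index] = text
    return captions
```

With the two instances loaded above, this would be called as `caption_all([model, model2], images)`, where `images` is a list of PIL images.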

Efficient Image Encoding

For multiple operations on the same image, encode it once to save processing time:

python
image = Image.open("path/to/your/image.jpg")
encoded_image = model.encode_image(image)

# Reuse the encoded image for each inference
print(model.caption(encoded_image, length="short")["caption"])
print(model.query(encoded_image, "How many people are in the image?")["answer"])
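The same pattern extends naturally to a list of questions about one image. The helper below is a small sketch built only on the `encode_image` and `query` calls shown above; the name `ask_many` is illustrative.

```python
def ask_many(model, image, questions):
    """Encode `image` once, then answer every question against that encoding."""
    encoded = model.encode_image(image)
    return {
        question: model.query(encoded, question)["answer"]
        for question in questions
    }
```

For example, `ask_many(model, image, ["How many people are in the image?", "What is the weather like?"])` returns a dictionary mapping each question to its answer while paying the encoding cost only once.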

API Reference

Captioning

python
model.caption(image, length="normal", stream=False)

| Parameter | Type | Description |
| --- | --- | --- |
| `image` | PIL.Image or encoded image | The image to process |
| `length` | str | Caption detail level: "short" or "normal" |
| `stream` | bool | Whether to stream the response token by token |
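When `stream=True`, the `"caption"` entry is iterated piece by piece, as in the basic usage example. If you want to show the text as it arrives and also keep the complete caption, a small wrapper like the one below can do both. This is a sketch; `print_stream` is an illustrative name and works on any iterable of strings.

```python
def print_stream(chunks):
    """Print text chunks as they arrive and return the assembled text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

Usage would look like `text = print_stream(model.caption(image, length="normal", stream=True)["caption"])`.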

Visual Question Answering

python
model.query(image, question, stream=False)

| Parameter | Type | Description |
| --- | --- | --- |
| `image` | PIL.Image or encoded image | The image to process |
| `question` | str | The question to ask about the image |
| `stream` | bool | Whether to stream the response token by token |

Object Detection

python
model.detect(image, object_name)

| Parameter | Type | Description |
| --- | --- | --- |
| `image` | PIL.Image or encoded image | The image to process |
| `object_name` | str | The type of object to detect |
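To draw or crop detections you usually need pixel coordinates. The sketch below assumes that each entry in `["objects"]` is a dictionary with `x_min`, `y_min`, `x_max`, and `y_max` keys holding values normalized to the range 0 to 1. That output format is an assumption not stated in this guide, so inspect one result from your model revision before relying on it. `boxes_to_pixels` is an illustrative helper name.

```python
def boxes_to_pixels(objects, width, height):
    """Convert normalized bounding boxes to integer pixel coordinates.

    Assumes each object has x_min, y_min, x_max, y_max in the range 0..1.
    Returns a list of (left, top, right, bottom) tuples.
    """
    return [
        (
            round(obj["x_min"] * width),
            round(obj["y_min"] * height),
            round(obj["x_max"] * width),
            round(obj["y_max"] * height),
        )
        for obj in objects
    ]
```

With a PIL image, the call would be `boxes_to_pixels(model.detect(image, "face")["objects"], *image.size)`.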

Visual Pointing

python
model.point(image, object_name)

| Parameter | Type | Description |
| --- | --- | --- |
| `image` | PIL.Image or encoded image | The image to process |
| `object_name` | str | The type of object to locate |
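Pointing results can be mapped onto the image in a similar way. This sketch assumes each entry in `["points"]` is a dictionary with `x` and `y` values normalized to the range 0 to 1. That is an assumption about the output format rather than something this guide specifies, and `points_to_pixels` is an illustrative name.

```python
def points_to_pixels(points, width, height):
    """Convert normalized points to integer (x, y) pixel coordinates.

    Assumes each point has x and y in the range 0..1.
    """
    return [
        (round(point["x"] * width), round(point["y"] * height))
        for point in points
    ]
```

For a PIL image this would be called as `points_to_pixels(model.point(image, "person")["points"], *image.size)`.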

Performance Optimization

Best Practices

  • Use GPU acceleration when possible
  • Reuse encoded images for multiple operations
  • For batch processing, pre-load the model once
  • Process images in batches rather than loading/unloading the model repeatedly
  • Resize very large images to reasonable dimensions before processing
  • Use quantization for deployment on memory-constrained devices
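As one way to apply the resizing advice, the sketch below caps the longer side of an image while preserving its aspect ratio. The 1024-pixel default is an arbitrary illustrative choice, not a Moondream recommendation, and both helper names are hypothetical. `downscale` relies only on the `size` attribute and `resize` method that PIL images provide.

```python
def fit_within(width, height, max_side=1024):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Guard against a dimension rounding down to zero on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale(image, max_side=1024):
    """Return `image` unchanged if it is small enough, else a resized copy."""
    new_size = fit_within(*image.size, max_side=max_side)
    return image if new_size == image.size else image.resize(new_size)
```

Calling `image = downscale(Image.open("path/to/your/image.jpg"))` before `encode_image` keeps very large photos from consuming unnecessary memory and time.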

Troubleshooting

Common Issues

  • Out of Memory: Reduce the image size or use a lighter model variant
  • Slow Performance: Enable GPU acceleration and reuse encoded images
  • Library Errors: Ensure all dependencies are installed correctly
  • Unexpected Results: Check image formatting and question clarity

Next Steps

Now that you understand how to use Moondream with Transformers, you might want to:

  • Try advanced prompting techniques
  • Integrate Moondream into your own applications
  • Create custom pipelines for specialized tasks
  • Explore our recipes for common use cases