Using transformers (recommended)
pip install transformers einops
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
model_id = "vikhyatk/moondream2"
revision = "2024-07-23"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
image = Image.open('<IMAGE_PATH>')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
The model is updated regularly, so we recommend pinning the model version to a specific release as shown above.
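Because encoding and answering are separate calls, the embedding returned by encode_image can be computed once and reused for several questions about the same image. The following is a minimal sketch. It assumes the model exposes encode_image and answer_question with the call signatures used in the example above. The helper ask_many and the FakeModel class are hypothetical and exist only so the example runs without downloading any weights.

```python
from typing import Any, List


def ask_many(model: Any, image: Any, questions: List[str], tokenizer: Any) -> List[str]:
    """Encode `image` once, then answer each question against that encoding.

    Hypothetical helper: assumes model.encode_image(image) and
    model.answer_question(enc_image, question, tokenizer), as in the example above.
    """
    enc_image = model.encode_image(image)  # the vision encoder runs only once
    return [model.answer_question(enc_image, q, tokenizer) for q in questions]


class FakeModel:
    """Hypothetical stand-in that mimics the two calls and counts encodings."""

    def __init__(self) -> None:
        self.encode_calls = 0

    def encode_image(self, image: Any) -> str:
        self.encode_calls += 1
        return f"emb({image})"

    def answer_question(self, enc_image: str, question: str, tokenizer: Any) -> str:
        return f"{question} -> {enc_image}"


if __name__ == "__main__":
    fake = FakeModel()
    replies = ask_many(fake, "cat.jpg", ["Describe this image.", "What color is it?"], tokenizer=None)
    print(fake.encode_calls)  # 1
    print(replies)
```

With the real model, the call would be ask_many(model, Image.open('<IMAGE_PATH>'), [...], tokenizer).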
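To enable Flash Attention on the text model, pass attn_implementation="flash_attention_2" when instantiating the model. The example below loads the weights in float16 and moves the model to a CUDA device. Flash Attention 2 also requires the flash-attn package to be installed.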
import torch

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision,
    torch_dtype=torch.float16, attn_implementation="flash_attention_2"
).to("cuda")
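The same script may need to run on machines both with and without Flash Attention. In that case, one option is to request it only when the flash-attn package can be imported. The following is a minimal sketch that assumes omitting attn_implementation leaves transformers on the model's default attention. The function name flash_attention_kwargs is hypothetical. The check only looks for the package and does not verify that a compatible GPU is present.

```python
import importlib.util
from typing import Dict


def flash_attention_kwargs() -> Dict[str, str]:
    """Return extra from_pretrained kwargs that enable Flash Attention 2 when possible.

    Hypothetical helper: it only checks that the `flash_attn` package is
    importable; it does not check for a compatible CUDA GPU.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return {"attn_implementation": "flash_attention_2"}
    # Fall back to the model's default attention by passing nothing extra.
    return {}


if __name__ == "__main__":
    print(flash_attention_kwargs())
```

To use it, spread the result into the loading call: AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, revision=revision, torch_dtype=torch.float16, **flash_attention_kwargs()).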
Batch inference is also supported.
answers = model.batch_answer(
    images=[Image.open('<IMAGE_PATH_1>'), Image.open('<IMAGE_PATH_2>')],
    prompts=["Describe this image.", "Are there people in this image?"],
    tokenizer=tokenizer,
)
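When there are more images than fit in memory at once, the image/prompt pairs can be split into fixed-size chunks and passed to batch_answer one chunk at a time. The following is a minimal sketch. It assumes batch_answer accepts the images, prompts, and tokenizer keyword arguments shown above and returns one answer per pair. The helper chunked_batch_answer and its batch_size default are hypothetical. A stand-in function replaces the model so the example runs offline.

```python
from typing import Any, Callable, List, Sequence


def chunked_batch_answer(
    batch_answer: Callable[..., List[str]],
    images: Sequence[Any],
    prompts: Sequence[str],
    tokenizer: Any,
    batch_size: int = 8,
) -> List[str]:
    """Call `batch_answer` on consecutive chunks and concatenate the answers in order."""
    if len(images) != len(prompts):
        raise ValueError("images and prompts must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    answers: List[str] = []
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        answers.extend(
            batch_answer(
                images=list(images[start:end]),
                prompts=list(prompts[start:end]),
                tokenizer=tokenizer,
            )
        )
    return answers


def fake_batch_answer(images, prompts, tokenizer):
    """Hypothetical stand-in for model.batch_answer, used for offline testing."""
    return [f"{p} ({img})" for img, p in zip(images, prompts)]


if __name__ == "__main__":
    imgs = ["a.jpg", "b.jpg", "c.jpg"]
    qs = ["Describe this image."] * 3
    print(chunked_batch_answer(fake_batch_answer, imgs, qs, tokenizer=None, batch_size=2))
```

With the real model, pass model.batch_answer as the first argument. A smaller batch_size trades throughput for lower peak memory.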
Using the GitHub Repository
Clone this repository and install dependencies.
pip install -r requirements.txt
sample.py provides a command-line interface for running the model. When the --prompt argument is omitted, the script lets you ask questions interactively.
python sample.py --image [IMAGE_PATH] --prompt [PROMPT]
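The behavior described above can be sketched with argparse: the script answers a single --prompt when one is given and otherwise falls back to an interactive loop. This is a hypothetical illustration, not the contents of sample.py. The names build_parser and run are assumptions, and the answer callback stands in for the model call.

```python
import argparse
from typing import Callable, Iterable, Iterator


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Ask questions about an image.")
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument("--prompt", default=None, help="question; omit for interactive mode")
    return parser


def run(
    args: argparse.Namespace,
    answer: Callable[[str, str], str],
    lines: Iterable[str],
) -> Iterator[str]:
    """Yield one answer for args.prompt, or one per input line when no prompt is given."""
    if args.prompt is not None:
        yield answer(args.image, args.prompt)
        return
    # Interactive mode: `lines` would normally be sys.stdin; a blank line ends the session.
    for line in lines:
        question = line.strip()
        if not question:
            break
        yield answer(args.image, question)
```

Wired to a real model, answer would open the image, call model.encode_image, and then call model.answer_question. The script body would then be: for reply in run(build_parser().parse_args(), answer, sys.stdin): print(reply). Making run a generator means each answer is printed as soon as its question is entered.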
Use the gradio_demo.py script to start a Gradio interface for the model.
python gradio_demo.py
webcam_gradio_demo.py provides a Gradio interface for the model that uses your webcam as input and runs inference in real time.
python webcam_gradio_demo.py
Limitations