Over 5 million downloads, and trusted by fast-growing companies worldwide
PROMPT-BASED
Powered by state-of-the-art vision language models (VLMs). Prompt in minutes, no model training required.
RUN EVERYWHERE
One-click distillation creates smaller, more efficient models optimized for your target device.
EASY INTEGRATION
Client libraries make it a snap to integrate into Python, JavaScript, and beyond. Switch from cloud to local inference with a flag.
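As a rough illustration of what switching between cloud and local inference with a single flag can look like, here is a minimal Python sketch. Every name in it (`VisionClient`, the `local` flag, `describe_request`, and both endpoint URLs) is hypothetical and invented for this example; it is not the actual client library API, and it makes no network calls.

```python
from dataclasses import dataclass

# Placeholder endpoints, assumed for illustration only.
CLOUD_ENDPOINT = "https://api.example.com/v1"
LOCAL_ENDPOINT = "http://localhost:8000/v1"


@dataclass
class VisionClient:
    """Hypothetical client wrapper; not the real library API."""

    local: bool = False  # the single flag that selects the backend

    @property
    def endpoint(self) -> str:
        # Same calling code, different backend depending on the flag.
        return LOCAL_ENDPOINT if self.local else CLOUD_ENDPOINT

    def describe_request(self, task: str, prompt: str) -> dict:
        """Build the request a real client might send (nothing is sent here)."""
        return {"url": f"{self.endpoint}/{task}", "prompt": prompt}


cloud = VisionClient()
on_device = VisionClient(local=True)
```

The point of the design is that application code calls the same methods either way; only the constructor argument changes.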
VISUAL QUESTION ANSWERING
Ask questions about an image in natural language and get answers in plain text.
OCR
High-accuracy text recognition from any type of document: newspapers, books, forms, and even handwritten notes.
COUNTING
Get accurate counts of objects from prompts, e.g. “people”, “paperclips”, “trucks”.
CLASSIFICATION
Categorize images based on your predetermined options. “Alligator or Crocodile?” “Is this fruit fresh or rotten?”
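One common way to classify against predetermined options with a prompt-based model is to list the options in the prompt and then map the free-text reply back onto exactly one of them. The sketch below shows that pattern in Python under stated assumptions: `build_prompt`, `classify`, and `fake_model` are hypothetical helpers written for this example, and `fake_model` stands in for a real model call (which would also receive the image).

```python
from typing import Callable, Optional, Sequence


def build_prompt(question: str, options: Sequence[str]) -> str:
    """Append the allowed answers so the model is steered toward one of them."""
    choices = ", ".join(options)
    return f"{question} Answer with exactly one of: {choices}."


def classify(
    ask_model: Callable[[str], str],
    question: str,
    options: Sequence[str],
) -> Optional[str]:
    """Return the single option named in the reply, or None if ambiguous."""
    reply = ask_model(build_prompt(question, options)).strip().lower()
    matches = [opt for opt in options if opt.lower() in reply]
    # Accept only an unambiguous answer; zero or several matches yield None.
    return matches[0] if len(matches) == 1 else None


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always gives the same reply."""
    return "Alligator."


label = classify(fake_model, "Alligator or crocodile?", ["alligator", "crocodile"])
```

Returning `None` for ambiguous replies lets the caller retry or fall back, rather than silently accepting a guess.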