15 minutes
Beginner
Multiple endpoints
Python with Gradio

Build an Image Analysis App with Python Client

Create a powerful image analysis application with Gradio, using the Moondream Python client library for simplified API access and clean code.

Image Captioning

Generate short and detailed descriptions of images using the client library with adjustable length settings.

Visual Question Answering

Ask natural language questions about image content and get AI-powered answers with cleaner code.

Object Detection

Detect objects in images with bounding box visualization using the client library's detect method.

Point Detection

Find precise object locations with point identification using a simple client method call.

Build an Image Analysis App with Python Client

In this beginner-friendly tutorial, we'll create a powerful image analysis application using Gradio and the Moondream Python client library. With just a single Python file, you'll be able to:

  • Upload and display images
  • Generate captions (short and detailed)
  • Ask questions about image content
  • Detect objects in images with bounding boxes
  • Find object points in images
  • See the complete API responses

This version uses the official Moondream Python client, making your code cleaner and easier to maintain.

Prerequisites

  • Python 3 installed on your machine
  • A Moondream API key (you can store it in a .env file in Step 3, or paste it directly into the code)

Step 1: Set Up Your Environment

bash
# Create a virtual environment
python -m venv moondream-env
source moondream-env/bin/activate  # On Windows: moondream-env\Scripts\activate

# Install dependencies
pip install gradio pillow python-dotenv moondream

Step 2: Create the Application File

Create a single file named moondream_client_app.py:

python
import os
import json
from PIL import Image, ImageDraw
import gradio as gr
from dotenv import load_dotenv
import moondream as md

# Load environment variables
load_dotenv()

# Moondream API settings
API_KEY = os.getenv("MOONDREAM_API_KEY")
if not API_KEY:
    API_KEY = "your_api_key_here"  # Replace with your actual API key if not using a .env file

# Initialize the Moondream client
model = md.vl(api_key=API_KEY)


def get_caption(image, length="normal"):
    """Generate a caption for the image"""
    if image is None:
        return {"caption": "Please upload an image first.", "api_response": ""}
    try:
        result = model.caption(image, length=length)
        return {
            "caption": result["caption"],
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "caption": f"Error: {str(e)}",
            "api_response": str(e)
        }


def ask_question(image, question):
    """Ask a question about the image"""
    if image is None:
        return {"answer": "Please upload an image first.", "api_response": ""}
    if not question.strip():
        return {"answer": "Please enter a question.", "api_response": ""}
    try:
        result = model.query(image, question)
        return {
            "answer": result["answer"],
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "answer": f"Error: {str(e)}",
            "api_response": str(e)
        }


def detect_objects(image, object_type):
    """Detect objects in the image"""
    if image is None:
        return {"image_with_boxes": None, "detection_result": "Please upload an image first.", "api_response": ""}
    if not object_type.strip():
        return {"image_with_boxes": image, "detection_result": "Please specify an object type to detect.", "api_response": ""}
    try:
        result = model.detect(image, object_type)
        objects = result["objects"]

        # Draw bounding boxes on a copy of the image
        img_with_boxes = image.copy()
        draw = ImageDraw.Draw(img_with_boxes)

        width, height = image.size
        for obj in objects:
            # Convert normalized coordinates to pixel values
            x_min = obj["x_min"] * width
            y_min = obj["y_min"] * height
            x_max = obj["x_max"] * width
            y_max = obj["y_max"] * height

            # Draw rectangle
            draw.rectangle(
                [(x_min, y_min), (x_max, y_max)],
                outline="red",
                width=3
            )

        return {
            "image_with_boxes": img_with_boxes,
            "detection_result": f"Found {len(objects)} {object_type}(s)",
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "image_with_boxes": image,
            "detection_result": f"Error: {str(e)}",
            "api_response": str(e)
        }


def find_object_points(image, object_type):
    """Find the points of the specified objects in the image"""
    if image is None:
        return {"image_with_points": None, "point_result": "Please upload an image first.", "api_response": ""}
    if not object_type.strip():
        return {"image_with_points": image, "point_result": "Please specify an object type to locate.", "api_response": ""}
    try:
        result = model.point(image, object_type)
        points = result.get("points", [])

        # Draw points on a copy of the image
        img_with_points = image.copy()
        draw = ImageDraw.Draw(img_with_points)

        width, height = image.size
        for point in points:
            # Convert normalized coordinates to pixel values
            x = point["x"] * width
            y = point["y"] * height

            # Draw a circle at each point
            point_radius = 8
            draw.ellipse(
                [(x - point_radius, y - point_radius),
                 (x + point_radius, y + point_radius)],
                fill="red",
                outline="white",
                width=2
            )

            # Add label text near the point
            draw.text(
                (x + 10, y),
                object_type,
                fill="white"
            )

        return {
            "image_with_points": img_with_points,
            "point_result": f"Found {len(points)} {object_type}(s)",
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "image_with_points": image,
            "point_result": f"Error: {str(e)}",
            "api_response": str(e)
        }


# Gradio app functions that wrap the API functions
def generate_short_caption(image):
    result = get_caption(image, "short")
    return result["caption"], result["api_response"]


def generate_detailed_caption(image):
    result = get_caption(image, "normal")
    return result["caption"], result["api_response"]


def process_question(image, question):
    result = ask_question(image, question)
    return result["answer"], result["api_response"]


def process_detection(image, object_type):
    result = detect_objects(image, object_type)
    return result["image_with_boxes"], result["detection_result"], result["api_response"]


def process_point(image, object_type):
    result = find_object_points(image, object_type)
    return result["image_with_points"], result["point_result"], result["api_response"]


# Create the Gradio interface
with gr.Blocks(title="Moondream Image Analyzer (Client Library)") as app:
    gr.Markdown("# Moondream Image Analyzer (Client Library)")
    gr.Markdown("Upload an image and analyze it using Moondream AI with the Python client library")

    with gr.Row():
        # Left column for image upload
        with gr.Column(scale=1):
            input_image = gr.Image(type="pil", label="Upload Image")

        # Right column for results
        with gr.Column(scale=2):
            with gr.Tab("Caption"):
                with gr.Row():
                    short_btn = gr.Button("Generate Short Caption")
                    detailed_btn = gr.Button("Generate Detailed Caption")

                caption_output = gr.Textbox(label="Caption Result")
                caption_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Question Answering"):
                question_input = gr.Textbox(label="Ask a question about the image")
                ask_btn = gr.Button("Ask Question")
                answer_output = gr.Textbox(label="Answer")
                answer_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Object Detection"):
                object_input = gr.Textbox(label="Object type to detect (e.g., person, car, dog)")
                detect_btn = gr.Button("Detect Objects")
                detection_image = gr.Image(type="pil", label="Detection Result")
                detection_output = gr.Textbox(label="Detection Summary")
                detection_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Point Detection"):
                point_object_input = gr.Textbox(label="Object type to locate (e.g., person, car, dog)")
                point_btn = gr.Button("Find Object Points")
                point_image = gr.Image(type="pil", label="Point Detection Result")
                point_output = gr.Textbox(label="Point Detection Summary")
                point_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

    # Set up event handlers
    short_btn.click(
        generate_short_caption,
        inputs=input_image,
        outputs=[caption_output, caption_api_response]
    )

    detailed_btn.click(
        generate_detailed_caption,
        inputs=input_image,
        outputs=[caption_output, caption_api_response]
    )

    ask_btn.click(
        process_question,
        inputs=[input_image, question_input],
        outputs=[answer_output, answer_api_response]
    )

    detect_btn.click(
        process_detection,
        inputs=[input_image, object_input],
        outputs=[detection_image, detection_output, detection_api_response]
    )

    # Set up point detection
    point_btn.click(
        process_point,
        inputs=[input_image, point_object_input],
        outputs=[point_image, point_output, point_api_response]
    )

# Run the app
if __name__ == "__main__":
    app.launch()

Step 3: Create an Environment File (Optional)

For better security, create a .env file in the same directory:

bash
MOONDREAM_API_KEY=your_api_key_here

Step 4: Run the Application

bash
# Run the Gradio app
python moondream_client_app.py

Gradio will start a local web server (typically at http://127.0.0.1:7860/). If you pass share=True to app.launch(), it will also generate a temporary public link you can use to access your app from any device.

Using the Application

Once the app is running:

  1. Upload an image using the image upload area
  2. Navigate between tabs to try different features:
     - Caption: Generate short or detailed descriptions
     - Question Answering: Ask about objects, colors, actions, etc.
     - Object Detection: Find specific objects with bounding boxes
     - Point Detection: Identify points of objects in the image
  3. Examine the API responses to learn how the Moondream API works

How the App Works

This simple application demonstrates several key concepts:

  1. Client Library: Using the Moondream Python client for simplified API access
  2. Image Processing: Directly passing PIL images to the client library
  3. Response Handling: Parsing responses and displaying results
  4. Error Handling: Gracefully handling API errors and exceptions
  5. UI Design: Creating a user-friendly interface with tabs and sections
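To make the response-handling step concrete: detection results come back with coordinates normalized to the 0-1 range, and the app scales them by the image dimensions before drawing. Extracted as a standalone helper (an illustration, not part of the app file above):

```python
def bbox_to_pixels(obj, width, height):
    """Scale a normalized bounding box (values in 0-1) to pixel coordinates."""
    return (
        obj["x_min"] * width,
        obj["y_min"] * height,
        obj["x_max"] * width,
        obj["y_max"] * height,
    )

# Example: a box covering the left half of a 640x480 image
box = bbox_to_pixels(
    {"x_min": 0.0, "y_min": 0.0, "x_max": 0.5, "y_max": 1.0}, 640, 480
)
# box == (0.0, 0.0, 320.0, 480.0)
```

Normalized coordinates mean the same response works for any image size, which is why the scaling happens on the client side.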

Key Advantages of Using the Client Library

Compared to direct API calls, the client library offers several benefits:

  1. Simplified Code: No need to handle base64 encoding/decoding or construct HTTP requests
  2. Better Type Safety: The client library provides proper interfaces for inputs and outputs
  3. Automatic Retries: The client can handle transient errors and retry automatically
  4. Simplified Authentication: Just provide your API key once when initializing the client
  5. Consistent Interface: Methods for all endpoints follow a similar pattern

Client vs. Direct API

Notice how much cleaner the code is with the client library. Instead of:

python
image_b64 = image_to_base64(image)
payload = {"image_url": image_b64, "question": question}
response = requests.post(f"{API_BASE}/query", headers=HEADERS, json=payload)
result = response.json()

You can simply write:

python
result = model.query(image, question)
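The image_to_base64 helper in the direct-API snippet is left undefined. A minimal sketch of what such a helper might do, shown here for raw image bytes (hypothetical; a PIL version would first save the image to an in-memory buffer, and the exact data-URL format expected should be checked against the Moondream API docs):

```python
import base64

def image_to_base64(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL for a JSON payload."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

With the client library, all of this encoding is handled for you.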

Extending the Application

Here are some ways you could enhance this basic app:

  1. Add Streaming Support: Implement streaming for caption and query endpoints with stream=True
  2. Save Results: Add buttons to save images with detected objects
  3. Image URL Support: Allow users to input image URLs instead of uploading
  4. Batch Processing: Add capability to process multiple images
  5. Custom Styling: Improve the UI with custom CSS and themes
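As a starting point for batch processing (idea 4), here is a sketch using a thread pool; since API calls are I/O-bound, threads give a simple speedup. The caption_fn parameter stands in for any of the per-image functions above and is stubbed in the example so it runs without an API key:

```python
from concurrent.futures import ThreadPoolExecutor

def caption_many(images, caption_fn, max_workers=4):
    """Apply a captioning function to many images concurrently.

    Results are returned in the same order as the input images.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(caption_fn, images))

# Example with a stub in place of a real API call
results = caption_many([1, 2, 3], lambda img: {"caption": f"image {img}"})
```

In the real app you would pass something like `lambda img: get_caption(img, "short")` and a list of PIL images, and you may want to cap max_workers to respect API rate limits.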

Next Steps