Build an Image Analysis App with Python Client
Create a powerful image analysis application with Gradio, using the Moondream Python client library for simplified API access and clean code.
In this beginner-friendly tutorial, we'll create a powerful image analysis application using Gradio and the Moondream Python client library. With just a single Python file, you'll be able to:
- Upload and display images
- Generate captions (short and detailed)
- Ask questions about image content
- Detect objects in images with bounding boxes
- Find object points in images
- See the complete API responses
This version uses the official Moondream Python client, making your code cleaner and easier to maintain.
Prerequisites
- Python 3.8+
- A Moondream API key (get one at console.moondream.ai)
- Basic Python knowledge
Step 1: Set Up Your Environment
```bash
# Create a virtual environment
python -m venv moondream-env
source moondream-env/bin/activate  # On Windows: moondream-env\Scripts\activate

# Install dependencies
pip install gradio pillow python-dotenv moondream
```
Step 2: Create the Application File
Create a single file named `moondream_client_app.py`:
```python
import os
import json
from PIL import Image, ImageDraw
import gradio as gr
from dotenv import load_dotenv
import moondream as md

# Load environment variables
load_dotenv()

# Moondream API settings
API_KEY = os.getenv('MOONDREAM_API_KEY')
if not API_KEY:
    API_KEY = "your_api_key_here"  # Replace with your actual API key if not using a .env file

# Initialize the Moondream client
model = md.vl(api_key=API_KEY)

def get_caption(image, length="normal"):
    """Generate a caption for the image"""
    if image is None:
        return "Please upload an image first."
    try:
        result = model.caption(image, length=length)
        return {
            "caption": result["caption"],
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "caption": f"Error: {str(e)}",
            "api_response": str(e)
        }

def ask_question(image, question):
    """Ask a question about the image"""
    if image is None:
        return "Please upload an image first."
    if not question.strip():
        return "Please enter a question."
    try:
        result = model.query(image, question)
        return {
            "answer": result["answer"],
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "answer": f"Error: {str(e)}",
            "api_response": str(e)
        }

def detect_objects(image, object_type):
    """Detect objects in the image"""
    if image is None:
        return "Please upload an image first."
    if not object_type.strip():
        return "Please specify an object type to detect."
    try:
        result = model.detect(image, object_type)
        objects = result["objects"]

        # Draw bounding boxes on a copy of the image
        img_with_boxes = image.copy()
        draw = ImageDraw.Draw(img_with_boxes)
        width, height = image.size

        for obj in objects:
            # Convert normalized coordinates to pixel values
            x_min = obj["x_min"] * width
            y_min = obj["y_min"] * height
            x_max = obj["x_max"] * width
            y_max = obj["y_max"] * height

            # Draw rectangle
            draw.rectangle(
                [(x_min, y_min), (x_max, y_max)],
                outline="red",
                width=3
            )

        return {
            "image_with_boxes": img_with_boxes,
            "detection_result": f"Found {len(objects)} {object_type}(s)",
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "image_with_boxes": image,
            "detection_result": f"Error: {str(e)}",
            "api_response": str(e)
        }

def find_object_points(image, object_type):
    """Find the points of the specified objects in the image"""
    if image is None:
        return "Please upload an image first."
    if not object_type.strip():
        return "Please specify an object type to locate."
    try:
        result = model.point(image, object_type)
        points = result.get("points", [])

        # Draw points on a copy of the image
        img_with_points = image.copy()
        draw = ImageDraw.Draw(img_with_points)
        width, height = image.size

        for point in points:
            # Convert normalized coordinates to pixel values
            x = point["x"] * width
            y = point["y"] * height

            # Draw a circle at each point
            point_radius = 8
            draw.ellipse(
                [(x - point_radius, y - point_radius),
                 (x + point_radius, y + point_radius)],
                fill="red",
                outline="white",
                width=2
            )

            # Add a label near the point
            draw.text(
                (x + 10, y),
                object_type,
                fill="white"
            )

        return {
            "image_with_points": img_with_points,
            "point_result": f"Found {len(points)} {object_type}(s)",
            "api_response": json.dumps(result, indent=2)
        }
    except Exception as e:
        return {
            "image_with_points": image,
            "point_result": f"Error: {str(e)}",
            "api_response": str(e)
        }

# Gradio app functions that wrap the API functions
def generate_short_caption(image):
    result = get_caption(image, "short")
    return result["caption"], result["api_response"]

def generate_detailed_caption(image):
    result = get_caption(image, "normal")
    return result["caption"], result["api_response"]

def process_question(image, question):
    result = ask_question(image, question)
    return result["answer"], result["api_response"]

def process_detection(image, object_type):
    result = detect_objects(image, object_type)
    return result["image_with_boxes"], result["detection_result"], result["api_response"]

def process_point(image, object_type):
    result = find_object_points(image, object_type)
    return result["image_with_points"], result["point_result"], result["api_response"]

# Create the Gradio interface
with gr.Blocks(title="Moondream Image Analyzer (Client Library)") as app:
    gr.Markdown("# Moondream Image Analyzer (Client Library)")
    gr.Markdown("Upload an image and analyze it using Moondream AI with the Python client library")

    with gr.Row():
        # Left column for image upload
        with gr.Column(scale=1):
            input_image = gr.Image(type="pil", label="Upload Image")

        # Right column for results
        with gr.Column(scale=2):
            with gr.Tab("Caption"):
                with gr.Row():
                    short_btn = gr.Button("Generate Short Caption")
                    detailed_btn = gr.Button("Generate Detailed Caption")
                caption_output = gr.Textbox(label="Caption Result")
                caption_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Question Answering"):
                question_input = gr.Textbox(label="Ask a question about the image")
                ask_btn = gr.Button("Ask Question")
                answer_output = gr.Textbox(label="Answer")
                answer_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Object Detection"):
                object_input = gr.Textbox(label="Object type to detect (e.g., person, car, dog)")
                detect_btn = gr.Button("Detect Objects")
                detection_image = gr.Image(type="pil", label="Detection Result")
                detection_output = gr.Textbox(label="Detection Summary")
                detection_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

            with gr.Tab("Point Detection"):
                point_object_input = gr.Textbox(label="Object type to locate (e.g., person, car, dog)")
                point_btn = gr.Button("Find Object Points")
                point_image = gr.Image(type="pil", label="Point Detection Result")
                point_output = gr.Textbox(label="Point Detection Summary")
                point_api_response = gr.Code(language="json", label="API Response Details", elem_id="pretty_json")

    # Set up event handlers
    short_btn.click(
        generate_short_caption,
        inputs=input_image,
        outputs=[caption_output, caption_api_response]
    )

    detailed_btn.click(
        generate_detailed_caption,
        inputs=input_image,
        outputs=[caption_output, caption_api_response]
    )

    ask_btn.click(
        process_question,
        inputs=[input_image, question_input],
        outputs=[answer_output, answer_api_response]
    )

    detect_btn.click(
        process_detection,
        inputs=[input_image, object_input],
        outputs=[detection_image, detection_output, detection_api_response]
    )

    point_btn.click(
        process_point,
        inputs=[input_image, point_object_input],
        outputs=[point_image, point_output, point_api_response]
    )

# Run the app
if __name__ == "__main__":
    app.launch()
```
Step 3: Create an Environment File (Optional)
For better security, create a `.env` file in the same directory:
MOONDREAM_API_KEY=your_api_key_here
Step 4: Run the Application
```bash
# Run the Gradio app
python moondream_client_app.py
```
Gradio will start a local web server (typically at http://127.0.0.1:7860/). If you launch with `app.launch(share=True)`, it will also generate a temporary public link you can use to access your app from any device.
Using the Application
Once the app is running:
1. Upload an image using the image upload area
2. Navigate between tabs to try different features:
   - Caption: Generate short or detailed descriptions
   - Question Answering: Ask about objects, colors, actions, etc.
   - Object Detection: Find specific objects with bounding boxes
   - Point Detection: Identify points of objects in the image
3. Examine the API responses to learn how the Moondream API works
How the App Works
This simple application demonstrates several key concepts:
- Client Library: Using the Moondream Python client for simplified API access
- Image Processing: Directly passing PIL images to the client library
- Response Handling: Parsing responses and displaying results
- Error Handling: Gracefully handling API errors and exceptions
- UI Design: Creating a user-friendly interface with tabs and sections
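The response-handling pattern used throughout the app can be sketched in isolation. The helper below mirrors what each wrapper function does: pull the display text out of the response dict and pretty-print the full payload for the "API Response Details" panel. The `sample` dict here is illustrative, shaped like a caption response, not real API output:

```python
import json

def format_result(result, text_key):
    """Extract the display text from a response dict and pretty-print
    the full payload for a JSON details panel."""
    return {
        text_key: result[text_key],
        "api_response": json.dumps(result, indent=2),
    }

# Illustrative payload shaped like a caption response (not real API output)
sample = {"caption": "A dog on a beach", "request_id": "abc123"}
formatted = format_result(sample, "caption")
print(formatted["caption"])       # A dog on a beach
print(formatted["api_response"])  # pretty-printed JSON of the whole dict
```

Keeping the raw `api_response` alongside the extracted text is what lets the UI show both a clean answer and the full payload without a second API call.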
Key Advantages of Using the Client Library
Compared to direct API calls, the client library offers several benefits:
- Simplified Code: No need to handle base64 encoding/decoding or construct HTTP requests
- Better Type Safety: The client library provides proper interfaces for inputs and outputs
- Automatic Retries: The client can handle transient errors and retry automatically
- Simplified Authentication: Just provide your API key once when initializing the client
- Consistent Interface: Methods for all endpoints follow a similar pattern
Client vs. Direct API
Notice how much cleaner the code is with the client library. Instead of:

```python
image_b64 = image_to_base64(image)
payload = {"image_url": image_b64, "question": question}
response = requests.post(f"{API_BASE}/query", headers=HEADERS, json=payload)
result = response.json()
```

You can simply write:

```python
result = model.query(image, question)
```
Extending the Application
Here are some ways you could enhance this basic app:
- Add Streaming Support: Implement streaming for caption and query endpoints with `stream=True`
- Save Results: Add buttons to save images with detected objects
- Image URL Support: Allow users to input image URLs instead of uploading
- Batch Processing: Add capability to process multiple images
- Custom Styling: Improve the UI with custom CSS and themes
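The streaming idea can be sketched like this. It assumes that with `stream=True` the client returns the caption as an iterable of text chunks instead of a finished string; check the client docs for the exact shape. The stub class is hypothetical, standing in for the real client so the snippet runs without an API key:

```python
def stream_caption(model, image):
    """Yield caption text incrementally; assumes stream=True makes
    result["caption"] an iterable of text chunks."""
    result = model.caption(image, length="normal", stream=True)
    for chunk in result["caption"]:
        yield chunk

class _StubStreamingModel:
    """Hypothetical stand-in for the real client, for offline testing."""
    def caption(self, image, length="normal", stream=False):
        chunks = ["A dog ", "on a ", "beach"]
        return {"caption": iter(chunks) if stream else "".join(chunks)}

pieces = list(stream_caption(_StubStreamingModel(), None))
print("".join(pieces))  # A dog on a beach
```

In the Gradio app you would make the wrapped handler a generator and `yield` the accumulated text, which Gradio renders as progressively updating output.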
Next Steps
- Explore our API Reference for detailed information on all endpoints
- Check out the Recipes section for more code examples
- Learn about our Transformers Integration for local model deployment