Video Activity Recognition for Retail

National Retail Chain

Automated detection of specific actions in surveillance footage for loss prevention and operations.

The Challenge

Monitoring in-store activity across hundreds of camera feeds required a large team of security operators. Human reviewers could only actively watch a fraction of feeds at any time, and fatigue led to missed incidents. The retailer needed automated activity classification to flag events of interest in real time.

The Solution

Fine-tuned on labeled video frames covering 174 action categories, the Moondream model identifies the specific activity happening in each frame. Supervised fine-tuning (SFT) on temporal action data enabled the model to distinguish between similar actions that the base model consistently confused.
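As a rough illustration of how per-frame classification can be framed as a query task and turned into flags, here is a minimal sketch. It is not the retailer's pipeline or Moondream's API: `ask` is a hypothetical stand-in for whatever inference call the deployment exposes, and the action labels and watchlist are invented placeholders, since the case study does not list its categories.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple

# Hypothetical labels for illustration only; the real category list is not published here.
ACTIONS: List[str] = [
    "picking up item",
    "putting item back",
    "opening bag",
    "pushing cart",
]
# Hypothetical subset of labels that should raise a flag for an operator.
WATCHLIST: Set[str] = {"opening bag"}

# `ask(frame, question)` is an assumed callable that returns the model's text answer.
AskFn = Callable[[object, str], str]


def build_question(actions: Sequence[str]) -> str:
    """Phrase classification as a closed-set question for a query-style model."""
    options = ", ".join(actions)
    return (
        "Which one of these actions is happening in the frame? "
        f"Options: {options}. Answer with one option only."
    )


def normalize_answer(answer: str, actions: Sequence[str]) -> Optional[str]:
    """Map a free-text answer onto a known label, or None if it matches nothing."""
    cleaned = answer.strip().strip(".").lower()
    for action in actions:
        if cleaned == action.lower():
            return action
    return None


def classify_frame(frame: object, ask: AskFn, actions: Sequence[str] = ACTIONS) -> Optional[str]:
    """Ask the model about one frame and return a validated label."""
    return normalize_answer(ask(frame, build_question(actions)), actions)


def flag_events(
    frames: Iterable[Tuple[str, object]],
    ask: AskFn,
    watchlist: Set[str] = WATCHLIST,
) -> List[Tuple[str, str]]:
    """Return (feed_id, label) pairs for frames whose label is on the watchlist."""
    flagged = []
    for feed_id, frame in frames:
        label = classify_frame(frame, ask)
        if label in watchlist:
            flagged.append((feed_id, label))
    return flagged
```

Rejecting answers that fall outside the label set, rather than guessing the nearest one, keeps unexpected model output from producing false flags.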

Business Impact
  • Action classification accuracy improved from 10.2% to 38.4%
  • Covers 174 distinct action categories
  • Enables real-time flagging across hundreds of camera feeds
  • Reduced security staffing requirements for video monitoring
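To put the accuracy figures above in context, the short sketch below computes the absolute gain, the relative improvement, and a chance-level baseline. The chance figure assumes uniform random guessing over 174 balanced categories, which the case study does not state; the function name is illustrative.

```python
# Figures reported in the case study (percent accuracy and category count).
BASE_ACCURACY = 10.2
TUNED_ACCURACY = 38.4
NUM_CATEGORIES = 174


def accuracy_summary(base: float, tuned: float, num_classes: int) -> dict:
    """Summarize an accuracy change given in percent."""
    return {
        # Difference in percentage points, not a relative percentage.
        "absolute_gain_points": round(tuned - base, 1),
        # How many times higher the fine-tuned accuracy is than the base accuracy.
        "relative_factor": round(tuned / base, 2),
        # Assumption: uniform guessing over balanced classes.
        "chance_percent": round(100 / num_classes, 2),
    }


summary = accuracy_summary(BASE_ACCURACY, TUNED_ACCURACY, NUM_CATEGORIES)
print(summary)
```

Under these assumptions the gain is 28.2 percentage points, roughly a 3.76x improvement, against a chance level of about 0.57%.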

Complete Vision AI Stack

This solution uses Moondream's integrated stack, from model training through production deployment. Every layer is designed to work together, so you can go from problem to deployed system without stitching together tools from different vendors.

AI Model Layer

Base Model: Moondream 3
Fine-Tuning: SFT via Lens
Production Model: Moondream 2

Deployment Layer

Inference Engine: Photon
Target Hardware: NVIDIA T4
Deployment: On-Premises

Technical Details

Training Method: SFT
Training Steps: 1000
Task Type: query

Template Match: 0% / 50%