Video Activity Recognition for Retail
National Retail Chain
Automated detection of specific actions in surveillance footage for loss prevention and operations.

Monitoring in-store activity across hundreds of camera feeds required a large team of security operators. Human reviewers could only actively watch a fraction of feeds at any time, and fatigue led to missed incidents. The retailer needed automated activity classification to flag events of interest in real time.
Fine-tuned on labeled video frames covering dozens of action categories, the Moondream model identifies the specific activity in each frame. Supervised fine-tuning (SFT) on temporal action data enabled the model to distinguish between similar actions that the base model consistently confused.
- Action classification accuracy improved from 10.2% to 38.4%, a gain of 28.2 percentage points
- Covers 174 distinct action categories
- Enables real-time flagging across hundreds of camera feeds
- Reduced security staffing requirements for video monitoring
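One way to turn per-frame labels into real-time flags is to require that an action of interest appears several times within a short window of frames before alerting, so a single misclassified frame does not page an operator. The sketch below is illustrative only: the `Flag` type, the `flag_events` function, the window sizes, and the label strings are all hypothetical, and it assumes the per-frame labels have already been produced by the fine-tuned model. It is not a description of the retailer's actual pipeline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Flag:
    """A single alert raised for one camera feed (hypothetical type)."""
    camera_id: str
    action: str
    frame_index: int


def flag_events(
    camera_id: str,
    frame_labels: Iterable[str],
    watchlist: Set[str],
    window: int = 5,
    min_hits: int = 3,
) -> Iterator[Flag]:
    """Yield a Flag when a watchlisted action fills `min_hits` of the last `window` frames.

    `frame_labels` is assumed to be the stream of per-frame action labels
    predicted by the model for one camera, in frame order.
    """
    if not 1 <= min_hits <= window:
        raise ValueError("require 1 <= min_hits <= window")

    recent = deque(maxlen=window)  # sliding window of the latest labels
    active = None                  # action already flagged, to suppress duplicates

    for index, label in enumerate(frame_labels):
        recent.append(label)
        if label in watchlist and recent.count(label) >= min_hits:
            if label != active:
                active = label
                yield Flag(camera_id, label, index)
        elif active is not None and recent.count(active) == 0:
            # The flagged action has left the window; allow a fresh flag later.
            active = None


# Example with made-up labels for a single feed.
labels = [
    "walking", "concealing item", "walking",
    "concealing item", "concealing item", "walking", "concealing item",
]
for flag in flag_events("cam-01", labels, watchlist={"concealing item"}):
    print(flag)
```

Running one such generator per feed keeps the state per camera small (a window of a few labels), which is the property that makes this kind of smoothing cheap to apply across hundreds of feeds. The right `window` and `min_hits` values depend on frame rate and on how costly a false alert is.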
Complete Vision AI Stack
This solution uses Moondream's integrated stack from model training through production deployment. Every layer is designed to work together, so you go from problem to deployed system without stitching together tools from different vendors.
AI Model Layer
- Base Model: Moondream 3
- Fine-Tuning: SFT via Lens
- Production Model: Moondream 2
Deployment Layer
- Inference Engine: Photon
- Target Hardware: NVIDIA T4
- Deployment: On-Premises

- Training Method: SFT
- Training Steps: 1000
- Task Type: query
- Template Match: 0% → 50%
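For readers who want to reproduce this kind of before-and-after comparison on their own labeled frames, the sketch below scores a list of model answers against reference labels. It rests on assumptions: the page does not define "Template Match", so the code treats it as the share of answers that are exactly one of the allowed category labels (after trimming whitespace and lowercasing). The `normalize` and `score` names and the sample labels are hypothetical.

```python
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count as errors."""
    return " ".join(text.strip().lower().split())


def score(
    predictions: List[str],
    references: List[str],
    allowed_labels: Iterable[str],
) -> Dict[str, float]:
    """Return accuracy and an ASSUMED template-match rate for query-style answers.

    accuracy:       share of answers equal to the reference label.
    template_match: share of answers that are exactly one allowed label
                    (an assumed definition; the source does not specify it).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example to score")

    allowed = {normalize(label) for label in allowed_labels}
    total = len(references)
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    on_template = sum(normalize(p) in allowed for p in predictions)
    return {"accuracy": correct / total, "template_match": on_template / total}


# Example with made-up answers: one free-form reply breaks the template.
result = score(
    predictions=["Opening Door", "the person is walking", "walking "],
    references=["opening door", "walking", "walking"],
    allowed_labels=["opening door", "walking"],
)
print(result)
```

For context on the accuracy figures, picking uniformly at random among 174 categories would be right about 0.57% of the time (1/174), so both the base and fine-tuned numbers sit well above chance.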