Driver Monitoring System

Vehicle → Face → VLM analysis pipeline

Detailed Project 2024–2025

Python, YOLOv8, YOLO12, DeepFace, SCRFD, LLaVA, Moondream2

Description

A computer vision pipeline that checks for traffic violations (tinted windows, seatbelts, child seats, phone usage) in three stages: vehicle detection, face detection on vehicle ROIs, and VLM analysis for contextual understanding. Because each stage only processes the regions passed on by the previous one, the staged design improves precision and reduces runtime. YAML configs let operators swap models, tune thresholds, and choose outputs quickly.
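
The three stages can be sketched as a small orchestration loop. This is a minimal illustration, not the project's actual code: `run_pipeline`, `Finding`, `crop`, and `to_frame_coords` are hypothetical names, and the three stage callables stand in for whichever vehicle detector, face detector, and VLM are configured. Running the VLM on every vehicle ROI is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
Frame = List[List[int]]           # stand-in for an image: rows of pixel values


@dataclass
class Finding:
    vehicle_box: Box                                      # frame coordinates
    face_boxes: List[Box] = field(default_factory=list)   # frame coordinates
    violations: List[str] = field(default_factory=list)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region inside `box` out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def to_frame_coords(local: Box, parent: Box) -> Box:
    """Shift a box found inside a crop back into full-frame coordinates."""
    ox, oy = parent[0], parent[1]
    return (local[0] + ox, local[1] + oy, local[2] + ox, local[3] + oy)


def run_pipeline(
    frame: Frame,
    detect_vehicles: Callable[[Frame], List[Box]],
    detect_faces: Callable[[Frame], List[Box]],
    analyze: Callable[[Frame, List[Box]], List[str]],
) -> List[Finding]:
    """Run vehicle -> face -> VLM on one frame (hypothetical stage callables)."""
    findings = []
    for vehicle_box in detect_vehicles(frame):       # stage 1: whole frame
        roi = crop(frame, vehicle_box)               # cropped once, reused below
        local_faces = detect_faces(roi)              # stage 2: vehicle ROI only
        violations = analyze(roi, local_faces)       # stage 3: same ROI
        faces = [to_frame_coords(f, vehicle_box) for f in local_faces]
        findings.append(Finding(vehicle_box, faces, list(violations)))
    return findings
```

Passing the stages in as callables keeps the loop independent of any particular model, which matches the idea of swapping detectors and VLMs through configuration.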

Key Features

Detectors — YOLOv8/YOLO12 vehicle detection over frames and streams provides high recall at speeds compatible with real‑time processing.
Face stage — RetinaFace/SCRFD runs only on vehicle ROIs to locate the driver and passengers; restricting the search this way reduces false positives.
VLM stage — LLaVA/Moondream2/Rex‑Omni infer contextual violations (e.g., seatbelt, phone usage) beyond simple heuristics.
Runtime — GPU/CPU auto‑detection, batch control, and headless‑friendly operation suit pipelines and server deployments.
Config — YAML‑driven model selection, thresholds, and outputs enable fast iteration for different environments.
Outputs — violation logs, crops, annotated frames, and JSON summaries integrate cleanly with downstream systems.
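
As a rough sketch of how the Runtime, Config, and Outputs features might fit together: the mapping passed to `from_mapping` is what a YAML loader would return for the config file. All key names, defaults, and the `PipelineConfig`, `pick_device`, and `summarize` names are assumptions for illustration and are not taken from the project. The only third-party call used is `torch.cuda.is_available()`, and PyTorch is treated as optional.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Mapping


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@dataclass
class PipelineConfig:
    # Hypothetical keys and defaults; the real config schema may differ.
    vehicle_model: str = "yolov8"
    face_model: str = "scrfd"
    vlm_model: str = "moondream2"
    vehicle_conf: float = 0.5
    face_conf: float = 0.5
    batch_size: int = 4
    device: str = "auto"

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "PipelineConfig":
        """Build a config from a parsed YAML mapping, rejecting typos early."""
        unknown = set(raw) - set(cls.__dataclass_fields__)
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        cfg = cls(**raw)
        for name in ("vehicle_conf", "face_conf"):
            if not 0.0 <= getattr(cfg, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        if cfg.device == "auto":
            cfg.device = pick_device()  # resolve once, at load time
        return cfg


def summarize(violations: List[Dict[str, Any]], cfg: PipelineConfig) -> str:
    """Serialize per-type counts plus the raw records as a JSON summary."""
    counts: Dict[str, int] = {}
    for record in violations:
        counts[record["type"]] = counts.get(record["type"], 0) + 1
    payload = {"config": asdict(cfg), "counts": counts, "violations": violations}
    return json.dumps(payload, indent=2, sort_keys=True)
```

Including the resolved config in the summary is a design choice of this sketch: it lets a downstream system see which models and thresholds produced a given set of violations.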

Challenges and Solutions

Trade‑offs — balancing speed against accuracy across detectors and VLMs, depending on hardware constraints and accuracy targets.
Pipeline — robust orchestration across stages, so that no stage repeats work and ROIs are propagated consistently from one stage to the next.
Tuning — calibrating thresholds and reducing false positives, guided by evaluation datasets.
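
One simple way to calibrate a threshold against an evaluation set is to sweep the observed confidence scores and keep the lowest threshold that still meets a precision target, which preserves as much recall as possible. The sketch below assumes that approach; the project may calibrate differently, and `precision_recall` and `calibrate_threshold` are hypothetical helper names.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold counts as a detection."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no detections: no false ones
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibrate_threshold(
    scores: Sequence[float], labels: Sequence[bool], min_precision: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the lowest qualifying threshold.

    Recall never increases as the threshold rises, so the lowest threshold
    that reaches min_precision also has the best recall among those that do.
    Returns None when no observed score reaches the target.
    """
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(scores, labels, threshold)
        if precision >= min_precision:
            return threshold, precision, recall
    return None
```

For example, with scores `[0.2, 0.4, 0.6, 0.7, 0.9]` and labels `[False, True, False, True, True]`, a precision target of 0.9 selects the threshold 0.7.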