AI-Powered Industrial Surveillance Platform
Unified Technical Blueprint — Part A: Sections 1-10
| Document Property | Value |
|---|---|
| Version | 1.0.0 |
| Classification | Technical Blueprint — Production Design |
| Target DVR | CP PLUS ORANGE CP-UVR-0801E1-CV2 |
| Channels | 8 active (scalable to 64+) |
| Resolution | 960 x 1080 per channel |
| DVR Network | 192.168.29.200/24, RTSP port 554 |
| Date | 2025 |
Cross-Reference Guide: This unified blueprint synthesizes six specialist design documents. For detailed specifications on any subsystem, refer to:
- architecture.md — Full architecture, scaling, failover, cost estimation
- video_ingestion.md — RTSP configuration, FFmpeg commands, edge gateway specs
- ai_vision.md — Model configurations, inference code, benchmarks
- database_schema.md — Complete DDL, triggers, views, RLS policies
- suspicious_activity.md — Detection algorithms, scoring engine pseudocode
- training_system.md — Training pipelines, quality gates, versioning logic
Table of Contents
- Section 1: Executive Summary
- Section 2: Kimi Swarm Team and Agent Responsibilities
- Section 3: Assumptions
- Section 4: Full Architecture
- Section 5: Data Flow from DVR to Cloud to Dashboard
- Section 6: Recommended Tech Stack
- Section 7: Database Schema
- Section 8: AI Model and Training Strategy
- Section 9: Suspicious Activity Night-Mode Design
- Section 10: Live Video Streaming Design
Section 1: Executive Summary
1.1 Project Objective
This blueprint defines the complete technical design for an AI-powered industrial surveillance platform that transforms a legacy CP PLUS 8-channel DVR system into a modern, intelligent security operations center. The platform processes real-time video from 8 camera channels, applies state-of-the-art computer vision and face recognition AI, detects suspicious activity during night hours, and provides a unified dashboard for security operators — all while maintaining the highest standards of reliability, security, and data privacy.
The system is designed around a cloud+edge hybrid architecture where all compute-intensive AI inference runs in the cloud (AWS Mumbai), while a local edge gateway handles stream ingestion, buffering, and site-local concerns. A WireGuard VPN tunnel protects all communication between edge and cloud, ensuring the DVR has zero public internet exposure.
1.2 Key Capabilities
| Capability | Description | Technology |
|---|---|---|
| Human Detection | Real-time person detection across all 8 channels at 15-20 FPS | YOLO11m + TensorRT FP16, 640x640 |
| Face Detection | Accurate face localization with 5-point landmarks for alignment | SCRFD-500M-BNKPS, 640x640 |
| Face Recognition | 512-D embedding extraction with 99.83% LFW accuracy | ArcFace R100 (IR-SE, MS1MV3) |
| Person Tracking | Persistent identity tracking across frames with occlusion recovery | ByteTrack (Kalman + IoU), 80.3% MOTA |
| Unknown Clustering | Automatic grouping of unknown faces for operator review | HDBSCAN + DBSCAN fallback, 89.5% purity |
| Night Mode Surveillance | 10-detection-module suspicious activity analysis (22:00-06:00) | Composite scoring engine with time-decay |
| AI Vibe Controls | Three intuitive presets (Relaxed/Balanced/Strict) mapping to 4 confidence levels | Dynamic threshold adjustment |
| Safe Self-Learning | Three-mode training system with conflict detection and approval workflows | MLflow + Airflow + Manual Review |
| 24/7 Reliability | Graceful degradation: video never stops, AI catch-up on recovery | Tiered storage + circuit breakers + replay |
| Real-Time Alerts | 6-level escalation (NONE to EMERGENCY) with multi-channel notifications | Telegram, WhatsApp, Email, Webhook |
| Live Dashboard | Multi-camera grid with HLS streaming and single-camera low-latency WebRTC | Next.js 14 + HLS.js + WebRTC |
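The AI Vibe Controls row above can be pictured as a small lookup table mapping the operator-facing preset to internal thresholds. A minimal sketch, with illustrative numbers only (the tuned production values live in ai_vision.md): a stricter vibe lowers the detection confidence floor (more sensitive) and raises the face-match similarity required to accept a person as known.

```python
# Illustrative preset table -- the numeric thresholds are assumptions,
# not the tuned values from ai_vision.md.
VIBE_PRESETS = {
    "relaxed":  {"detection_conf": 0.55, "face_match_cosine": 0.35},
    "balanced": {"detection_conf": 0.40, "face_match_cosine": 0.45},
    "strict":   {"detection_conf": 0.30, "face_match_cosine": 0.55},
}

def thresholds_for(vibe: str) -> dict:
    """Resolve an operator-facing vibe into internal thresholds,
    falling back to 'balanced' for unknown values."""
    return VIBE_PRESETS.get(vibe.lower(), VIBE_PRESETS["balanced"])
```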
1.3 Architecture Approach
The platform follows a cloud+edge+VPN hybrid pattern with five network security zones:
Cameras (8ch) --> DVR (local) --> Edge Gateway (local) --> WireGuard VPN --> AWS Cloud (EKS)
                                  |                        |
                                  | 2TB NVMe buffer        | Encrypted tunnel
                                  | 7-day ring buffer      | UDP 51820
                                  | FFmpeg ingestion       | ChaCha20-Poly1305
Key architectural decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Cloud Provider | AWS ap-south-1 (Mumbai) | Lowest latency to India, mature managed services |
| Container Orchestration | Amazon EKS + K3s edge | Managed control plane, GPU node support, lightweight edge |
| VPN | WireGuard | ~60% faster than OpenVPN, modern crypto, simple setup |
| Message Queue | Apache Kafka (MSK) | Durable ordered log, replay capability, proven at scale |
| AI Inference | NVIDIA Triton + TensorRT | GPU-optimized, dynamic batching, model ensemble |
| Database | PostgreSQL 16 + pgvector | ACID compliance, native 512-D vector support |
| Object Storage | MinIO (edge+cloud) + S3 (archive) | S3-compatible API, tiered cost optimization |
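As a concrete illustration of the WireGuard decision, the edge-side peer configuration might look like the following sketch. The keys and the cloud endpoint hostname are placeholders; the tunnel addresses follow the 10.200.0.1/10.200.0.2 peering and MTU 1400 assumption described elsewhere in this document.

```ini
# /etc/wireguard/wg0.conf -- edge gateway peer (illustrative sketch;
# keys and the cloud endpoint hostname are placeholders)
[Interface]
Address = 10.200.0.2/32
PrivateKey = <edge-private-key>
MTU = 1400

[Peer]
# Cloud VPN endpoint
PublicKey = <cloud-public-key>
Endpoint = <cloud-vpn-host>:51820
AllowedIPs = 10.200.0.1/32, 10.100.0.0/16
PersistentKeepalive = 25
```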
1.4 Target Environment
The platform targets a CP PLUS ORANGE CP-UVR-0801E1-CV2 DVR with the following characteristics:
| Property | Value | Impact on Design |
|---|---|---|
| Brand/Model | CP PLUS ORANGE CP-UVR-0801E1-CV2 | Dahua-compatible RTSP URL scheme |
| Channels | 8 active | Initial deployment scope |
| Resolution | 960 x 1080 per channel | AI input: letterbox to 640x640 |
| LAN IP | 192.168.29.200/24 | Edge gateway on same subnet |
| RTSP Port | 554 | TCP interleaved mandatory |
| ONVIF | V2.6.1.867657 (Server V19.06) | Auto-discovery supported |
| DVR Disk | FULL (0 bytes free) | All archival is edge-managed; no DVR recording |
| VPN Access | WireGuard-secured | No public exposure; all traffic encrypted |
Critical Design Impact: The DVR disk being full means the system cannot rely on DVR-side recording or playback features. All archival storage is managed by the edge gateway's 2TB NVMe buffer and cloud tiering.
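Because nothing can be written to the DVR, the edge gateway must enforce the 7-day ring buffer itself. A minimal pruning sketch, assuming segments are plain files under a recording root (the path layout and a file-mtime retention check are assumptions, not the edge agent's actual implementation):

```python
import os
import time

RETENTION_SECS = 7 * 24 * 3600  # 7-day ring buffer

def prune_ring_buffer(root, now=None, retention=RETENTION_SECS):
    """Delete recorded segments older than the retention window,
    based on file modification time. Returns the removed paths."""
    now = now or time.time()
    removed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > retention:
                os.remove(path)
                removed.append(path)
    return removed
```

In practice this would run on a timer (or react to disk-pressure thresholds) so the 2TB buffer never fills.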
1.5 Key Differentiators
1. AI Vibe Controls Instead of exposing complex threshold parameters to operators, the system provides three intuitive "vibe" presets — Relaxed, Balanced, and Strict — that internally map to optimized configurations for detection sensitivity and face match strictness. This innovation makes the system accessible to non-technical security staff while maintaining AI precision.
2. Safe Self-Learning Training System The platform captures operator corrections (confirmations, corrections, merges, rejections) and feeds them back into model improvement through a carefully designed three-mode learning pipeline: Manual Only, Suggested Learning (recommended), and Approved Auto-Update. A synchronous conflict detector blocks five types of label conflicts before they reach the training dataset, ensuring model integrity.
3. 24/7 Reliability with Graceful Degradation The system is architected around a single priority: video recording never stops. If the AI inference service fails, recording continues locally with queued catch-up processing on recovery. If the VPN tunnel fails, the edge gateway maintains 7 days of local buffer. If the cloud database fails, alerts accumulate in Kafka's durable log. Every failure mode has a defined degradation strategy.
4. 10-Module Night Surveillance The suspicious activity detection system goes beyond simple motion detection to provide comprehensive behavioral analysis through 10 specialized detection modules — from intrusion and loitering to abandoned objects and repeated re-entry patterns — all combined through a composite scoring engine with exponential time-decay.
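The composite scoring engine's exponential time-decay can be sketched in a few lines. This is a simplified reference, not the production formula from suspicious_activity.md; the half-life value and the (timestamp, score) event shape are assumptions for illustration.

```python
import math
import time

def composite_score(events, now=None, half_life_s=300.0):
    """Combine per-module suspicion scores with exponential time-decay:
    each event's contribution halves every `half_life_s` seconds.
    `events` is an iterable of (timestamp, module_score) pairs."""
    now = now or time.time()
    return sum(
        score * math.exp(-math.log(2) * (now - ts) / half_life_s)
        for ts, score in events
    )
```

A loitering event from five minutes ago thus contributes half its original score, so sustained activity outranks stale one-off detections.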
1.6 Production Readiness Assessment
| Dimension | Status | Notes |
|---|---|---|
| Architecture Completeness | Production-Ready | All 12 services fully specified with resource allocations |
| AI Model Selection | Production-Ready | Industry-standard models with published benchmarks |
| Database Design | Production-Ready | 29 tables, 4 views, 8 triggers, partitioning, RLS |
| Security Architecture | Production-Ready | 7-layer defense in depth, encrypted credentials, VPN-only |
| Scaling Path | Defined | 8 -> 16 -> 32 -> 64+ cameras with concrete resource allocations |
| Failover Design | Production-Ready | Graceful degradation matrix for all failure modes |
| Estimated Timeline | 14 weeks | 4 implementation phases defined |
| Estimated Monthly Cost | ~$2,140 USD | 8-camera deployment at steady state |
Section 2: Kimi Swarm Team and Agent Responsibilities
The unified blueprint was synthesized from the outputs of 11 specialist agents, each responsible for a specific domain of the platform design.
2.1 Agent Responsibility Matrix
| # | Agent | Responsibility | Key Deliverables |
|---|---|---|---|
| 1 | Requirements Analyst | Elicited and structured all functional/non-functional requirements | Requirements traceability matrix, user stories, acceptance criteria |
| 2 | System Architect | Designed overall cloud+edge+VPN topology and service interactions | Deployment topology, 5 security zones, scaling roadmap, failover matrix |
| 3 | Video Ingestion Engineer | Specified RTSP configuration, edge gateway, and stream processing | RTSP URL patterns, FFmpeg commands, auto-reconnect logic, HLS generation |
| 4 | AI Vision Scientist | Selected and configured all CV/AI models for the inference pipeline | Model selection table, inference pipeline architecture, confidence handling |
| 5 | Database Architect | Designed complete data model with partitioning, indexing, and security | 29 tables + 4 views + 8 triggers, pgvector HNSW index, RLS policies |
| 6 | Suspicious Activity Designer | Designed 10 detection modules and composite scoring engine | Detection algorithms, scoring formula, YAML configuration schema |
| 7 | Training System Engineer | Designed self-learning pipeline with safety controls | 3 learning modes, conflict detection, quality gates, versioning |
| 8 | Frontend Developer | Designed Next.js dashboard with real-time video and alerts | Component architecture, HLS.js integration, WebSocket alerts |
| 9 | DevOps Engineer | Specified CI/CD, monitoring, and infrastructure-as-code | GitHub Actions + ArgoCD, Prometheus/Grafana, alerting rules |
| 10 | Security Architect | Designed defense-in-depth security across all layers | 7 security layers, secret management, encryption standards |
| 11 | Technical Writer (this document) | Synthesized all specialist outputs into unified blueprint | 10-section unified document with cross-references |
2.2 Agent Interaction Flow
+-----------------------------------------------------------------------------+
| KIMI SWARM TEAM ORCHESTRATION |
+-----------------------------------------------------------------------------+
| |
| Requirements Analyst |
| | |
| v |
| +---------+ +---------+ +---------+ +---------+ |
| | System |<-->| Video |<-->| AI |<-->| Database| |
| |Architect| |Ingestion| |Vision | |Architect| |
| +---------+ +---------+ +---------+ +---------+ |
| ^ | |
| | +---------+ +---------+ | |
| +---------->|Suspicious|<-->|Training |<-------+ |
| |Activity | |System | |
| |Designer | |Engineer | |
| +---------+ +---------+ |
| | |
| v |
| +---------+ +---------+ +---------+ |
| |Frontend | |DevOps | |Security | |
| |Developer| |Engineer | |Architect| |
| +---------+ +---------+ +---------+ |
| | |
| v |
| +---------------------+ |
| | Technical Writer | |
| | (Unified Blueprint) | |
| +---------------------+ |
| |
+-----------------------------------------------------------------------------+
2.3 Cross-Agent Design Consistency
The following cross-cutting concerns were harmonized across all agent outputs during synthesis:
| Concern | Resolution | Agents Coordinated |
|---|---|---|
| Video latency budget | < 100ms end-to-end (AI); ~35-65s (HLS live) | Video Ingestion, AI Vision, Frontend |
| Face embedding storage | 512-D float32, pgvector HNSW index, cosine similarity | Database, AI Vision, Training |
| Event data retention | 90 days hot (MinIO), 1 year cold (Glacier), 7 days edge | Database, Architecture, Video Ingestion |
| Alert escalation | 6 levels: NONE -> LOW -> MEDIUM -> HIGH -> CRITICAL -> EMERGENCY | Suspicious Activity, Database, Frontend |
| Model versioning | Semantic MAJOR.MINOR.PATCH with MLflow registry | Training, AI Vision, Architecture |
| Graceful degradation | Video never stops; AI catch-up on recovery | Architecture, Video Ingestion, AI Vision |
| Security zones | 5 zones: Internet -> ALB -> Application -> Data -> Edge | Architecture, Security, Video Ingestion |
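The face-embedding row above hinges on cosine similarity over 512-D vectors; in SQL this would go through pgvector's cosine-distance operator, but the metric itself is worth pinning down. A minimal pure-Python reference:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. 512-D
    ArcFace embeddings); 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```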
Section 3: Assumptions
All assumptions made across the specialist designs are consolidated below. These should be validated before implementation begins.
3.1 Network and Hardware Assumptions
| ID | Assumption | Validation Method | Risk if Invalid |
|---|---|---|---|
| NW-01 | Edge gateway has dual Ethernet: one for local DVR subnet (192.168.29.0/24), one for internet/VPN | Physical site survey | Cannot bridge DVR to VPN |
| NW-02 | Site internet bandwidth >= 16 Mbps sustained upload for 8 channels | ISP speed test | Video drops, AI delays |
| NW-03 | WireGuard UDP port 51820 is not blocked by site firewall | Firewall rule check | VPN cannot establish |
| NW-04 | DVR RTSP server supports TCP interleaved transport (rtsp_transport tcp) | FFmpeg test probe | UDP fallback has packet loss |
| NW-05 | DVR supports 16+ concurrent RTSP sessions (8 channels x 2 streams) | Session stress test | Stream contention |
| NW-06 | MTU 1400 is viable through site NAT/firewall for WireGuard tunnel | Ping with DF bit test | Fragmentation issues |
| HW-01 | Intel NUC 13 Pro (i5-1340P, 16GB RAM, 2TB NVMe) is available for edge gateway | Hardware procurement | May need Jetson Orin alternative |
| HW-02 | Edge gateway has UPS backup for graceful shutdown on power loss | Electrical survey | Data corruption on hard power-off |
| HW-03 | AWS g4dn.xlarge (T4 GPU) instances are available in ap-south-1 | AWS EC2 capacity check | Need alternative GPU instance |
3.2 DVR Capabilities Assumptions
| ID | Assumption | Validation Method | Risk if Invalid |
|---|---|---|---|
| DVR-01 | DVR RTSP streams are accessible at rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | FFmpeg connectivity test | Need alternative URL format |
| DVR-02 | DVR continues serving RTSP streams even with disk full (0 bytes free) | 24-hour stream stability test | Streams may stall |
| DVR-03 | DVR sub-stream (subtype=1) provides sufficient quality for AI inference (typically 352x288 to 704x576) | Frame quality inspection | May need main stream for AI |
| DVR-04 | DVR ONVIF server supports device discovery and stream URI retrieval | ONVIF Device Manager test | Manual camera configuration needed |
| DVR-05 | DVR channel numbering is 1-indexed (1-8) | ONVIF profile enumeration | Off-by-one errors in configuration |
| DVR-06 | DVR Digest authentication works with the provided credentials | RTSP DESCRIBE request test | May need Basic auth or different scheme |
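Assumptions DVR-01 and DVR-05 can be folded into a small helper that builds the Dahua-compatible RTSP URL for a given channel and stream. A sketch under those assumptions (the password stays the {PASS} placeholder from DVR-01):

```python
def rtsp_url(channel: int, subtype: int = 1,
             host: str = "192.168.29.200", port: int = 554,
             user: str = "admin", password: str = "{PASS}") -> str:
    """Build the Dahua-compatible RTSP URL assumed in DVR-01.
    subtype 0 = main stream, 1 = sub stream; channels are 1-indexed
    per DVR-05."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be 1-8 for this 8-channel DVR")
    return (f"rtsp://{user}:{password}@{host}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")
```

Validating these URLs against the live DVR (e.g. with an FFmpeg probe) is exactly the DVR-01 validation step.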
3.3 Environmental Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| ENV-01 | Cameras provide adequate lighting for face recognition during night hours (minimum 10 lux at face distance) | Face recognition accuracy degrades; may need IR illumination |
| ENV-02 | Camera angles allow frontal face capture at entry/exit points (yaw < 45 degrees) | Face recognition miss rate increases |
| ENV-03 | Indoor industrial environment with minimal weather interference | False positive rate from rain/shadows is low |
| ENV-04 | Maximum person-to-camera distance is within 10 meters for face recognition | Faces may be too small (< 20px) for reliable detection |
| ENV-05 | Camera positions are stable (no PTZ movement during normal operation) | Zone calibration remains valid |
3.4 Operational Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| OPS-01 | Security operators will review unknown face clusters and provide identity labels daily | Unknown person database grows without enrichment |
| OPS-02 | Admin will review training suggestions at least weekly in "Suggested Learning" mode | Training queue backlog accumulates |
| OPS-03 | Site has authorized personnel who can access edge gateway for maintenance (SSH, physical) | Remote troubleshooting limited |
| OPS-04 | Alert fatigue is a genuine concern — false positive rate > 20% leads to ignored alerts | AI vibe controls and suppression tuned accordingly |
| OPS-05 | Incident video review requires 10-second pre-event and 30-second post-event clips | Clip configuration fixed |
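OPS-05 fixes the incident clip window, which reduces to a trivial but easily-misread calculation. A sketch, assuming event timestamps are plain epoch seconds:

```python
PRE_EVENT_S, POST_EVENT_S = 10, 30  # per OPS-05

def clip_window(event_ts: float):
    """Return (start, end) timestamps for the incident review clip:
    10 s before the event through 30 s after it."""
    return event_ts - PRE_EVENT_S, event_ts + POST_EVENT_S
```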
3.5 Security Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| SEC-01 | WireGuard encryption (ChaCha20-Poly1305) meets organizational security requirements | May need additional encryption layer |
| SEC-02 | AWS VPC with private subnets satisfies data residency requirements for India | Compliance review needed |
| SEC-03 | Face embeddings (512-D vectors) do not constitute PII under applicable regulations | Legal review needed for biometric data handling |
| SEC-04 | Edge gateway physical security is equivalent to server room security | Tampering risk if edge is physically accessible |
| SEC-05 | DVR credentials can be stored encrypted (AES-256) in cloud database | Key management infrastructure required |
3.6 AI Performance Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| AI-01 | YOLO11m TensorRT FP16 achieves > 75% person AP@50 on surveillance footage | May need fine-tuning on site-specific data |
| AI-02 | ArcFace R100 achieves > 98% Rank-1 accuracy on enrolled persons with 5+ reference images | Enrollment quality gates ensure minimum samples |
| AI-03 | HDBSCAN achieves > 89% cluster purity on 512-D face embeddings from this camera setup | Fallback to DBSCAN if density varies too much |
| AI-04 | ByteTrack maintains < 2 ID switches per 100 frames in industrial environment with occlusion | May need BoT-SORT upgrade for complex scenes |
| AI-05 | GPU (T4) can sustain 15-20 FPS processing per stream across 8 streams with batching | CPU fallback at 5-8 FPS if GPU unavailable |
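The detection models above consume square 640x640 inputs while each channel delivers 960x1080 frames, so frames are letterboxed (scaled to fit, then padded). A sketch of the geometry, independent of any specific image library:

```python
def letterbox_params(src_w: int, src_h: int, dst: int = 640):
    """Compute the scale and symmetric padding needed to fit a frame
    into a square model input while preserving aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2
    pad_y = (dst - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y

# A 960x1080 channel frame fills the input vertically and is
# padded horizontally:
scale, w, h, px, py = letterbox_params(960, 1080)
```

The same parameters are needed in reverse to map detection boxes back to frame coordinates.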
Section 4: Full Architecture
4.1 High-Level System Architecture
The platform employs a cloud+edge hybrid architecture with five network security zones. Video streams are ingested at the edge, processed by AI in the cloud, and presented through a web-based dashboard. A WireGuard VPN tunnel provides encrypted, zero-exposure connectivity between edge and cloud.
+=============================================================================+
| CLOUD+EDGE+VPN ARCHITECTURE |
+=============================================================================+
| |
| ZONE 0: INTERNET (UNTRUSTED) |
| +---------------------+ |
| | Users / Browsers | |
| | HTTPS :443 | |
| +----------+----------+ |
| | |
| v |
| ZONE 1: AWS VPC EDGE (DEMILITARIZED) |
| +--------------------------------------------------------------+ |
| | AWS ALB (:443) + WAF v2 + Rate Limit + Geo-Restriction | |
| | | | |
| | v | |
| | Traefik Ingress Controller (:8443) | |
| | - Route: /api/* -> Backend Service | |
| | - Route: /ws/* -> WebSocket Handler | |
| | - Route: / -> Next.js Web App | |
| | - TLS: Let's Encrypt auto certificates | |
| +--------------------------------------------------------------+ |
| | |
| v |
| ZONE 2: AWS VPC APPLICATION (TRUSTED) |
| +--------------------------------------------------------------+ |
| | +-------------+ +-------------+ +---------------------+ | |
| | | Stream | | AI Inference| | Suspicious Activity | | |
| | | Ingestion | | Service | | Service (Night Mode)| | |
| | | (Go/FFmpeg) | | (Triton) | | (Go/Python) | | |
| | | :8081 | | :8001 gRPC | | :8083 | | |
| | +-------------+ +-------------+ +---------------------+ | |
| | +-------------+ +-------------+ +---------------------+ | |
| | | Backend API | | Training | | Notification | | |
| | | (Go/Gin) | | Service | | Service | | |
| | | :8080 | | (PyTorch) | | (Go) | | |
| | +-------------+ +-------------+ +---------------------+ | |
| | +--------------------+ | |
| | | Web Frontend | HLS Playback Service | |
| | | (Next.js 14 :3000) | (Go :8085) | |
| | +--------------------+ | |
| +--------------------------------------------------------------+ |
| | |
| v |
| ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED) |
| +--------------------------------------------------------------+ |
| | +-------------+ +-------------+ +-------------+ | |
| | | PostgreSQL | | Redis | | Kafka | | |
| | | 16 (RDS) | | 7 Cluster | | (MSK) | | |
| | | :5432 | | :6379 | | :9092 | | |
| | | pgvector | | Pub/Sub | | 3 brokers | | |
| | | HNSW index | | Streams | | 3 AZs | | |
| | +-------------+ +-------------+ +-------------+ | |
| | +-------------+ +-----------------------------------+ | |
| | | MinIO | | S3 (Cold Archive) | | |
| | | (S3-compat) | | - Standard (30d) | | |
| | | :9000 | | - IA (31-90d) | | |
| | | 10 TB | | - Glacier Deep Archive (90d+) | | |
| | +-------------+ +-----------------------------------+ | |
| +--------------------------------------------------------------+ |
| | |
| | WireGuard VPN Tunnel (UDP 51820) |
| | ChaCha20-Poly1305 encryption |
| | Cloud peer: 10.200.0.1/32 <-> Edge peer: 10.200.0.2/32 |
| v |
| ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED) |
| +--------------------------------------------------------------+ |
| | +--------------------------------------------------------+ | |
| | | EDGE GATEWAY (Intel NUC) | | |
| | | Ubuntu 22.04 LTS | K3s v1.28+ | 2TB NVMe | | |
| | | | | |
| | | +-----------------+ +-----------------+ | | |
| | | | Stream Manager | | HLS Segmenter | | | |
| | | | (Python/asyncio)| | (FFmpeg/nginx) | | | |
| | | | 8x RTSP feeds | | 2s segments | | | |
| | | +-----------------+ +-----------------+ | | |
| | | +-----------------+ +-----------------+ | | |
| | | | Frame Extractor | | Buffer Manager | | | |
| | | | (AI decimation) | | (20GB ring buf) | | | |
| | | +-----------------+ +-----------------+ | | |
| | | +--------------------------------------------------+ | | |
| | | | VPN Client (WireGuard) | Health Monitor | | | |
| | | +--------------------------------------------------+ | | |
| | +--------------------------------------------------------+ | |
| | | |
| | Local Network (192.168.29.0/24) |
| | +------------------+ +------------------+ |
| | | CP PLUS DVR | | Local Monitor | |
| | | 192.168.29.200 | | 192.168.29.10 | |
| | | 8ch | RTSP :554 | | (optional) | |
| | +------------------+ +------------------+ |
| | CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 |
| +--------------------------------------------------------------+ |
| |
+=============================================================================+
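The edge HLS Segmenter shown in Zone 4 produces 2-second segments. A hedged sketch of the FFmpeg invocation it might use, built as an argument list; the output layout and playlist window size are assumptions, while the flags themselves are standard FFmpeg HLS muxer options:

```python
def hls_segment_args(rtsp_url: str, out_dir: str, channel: int):
    """Build an FFmpeg command that copies an RTSP feed into 2-second
    HLS segments with a rolling playlist (no re-encode)."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",      # TCP interleaved, per NW-04
        "-i", rtsp_url,
        "-c", "copy",                  # passthrough, no transcode
        "-f", "hls",
        "-hls_time", "2",              # 2 s segments
        "-hls_list_size", "6",         # rolling playlist window
        "-hls_flags", "delete_segments",
        f"{out_dir}/ch{channel}/index.m3u8",
    ]
```

Copy mode keeps edge CPU use minimal; transcoding, if ever needed, belongs on the cloud side.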
4.2 Service Interaction Diagram
+-----------------------------------------------------------------------------+
| SERVICE INTERACTIONS |
+-----------------------------------------------------------------------------+
| |
| INTERNET USERS |
| | |
| | HTTPS :443 |
| v |
| +---------+ +----------+ +----------+ |
| | AWS ALB |----->| Traefik |----->| Next.js | Web Frontend |
| | +WAF | | Ingress | | (SSR) | Dashboard |
| +---------+ +----------+ +----+-----+ |
| | |
| +--------------------+--------------------+ |
| | | | |
| v v v |
| +---------+ +------------+ +----------+ |
| |Backend | | WebSocket | | HLS | |
| |API (Go) | | Handler | | Playback | |
| |:8080 | | /ws/alerts | | Service | |
| +----+----+ +------------+ +----+-----+ |
| | |
| | gRPC :50051 |
| v |
| +---------+ +------------+ +----------+ +----------+ |
| | Stream | | AI | |Suspicious| |Training | |
| |Ingestion|<-->| Inference |<-->| Activity | |Service | |
| |(Go) | |(Triton) | |(Night) | |(PyTorch) | |
| +----+----+ +------+-----+ +----+-----+ +----+-----+ |
| | | | | |
| v v v v |
| +---------------------------------------------------------------+ |
| | KAFKA (MSK) | |
| | streams.raw (8 parts) ai.detections (16 parts) | |
| | alerts.critical (4 parts) training.data (30-day ret.) | |
| | notifications.* system.metrics (7-day ret.) | |
| +---------------------------------------------------------------+ |
| | | | | |
| v v v v |
| +---------+ +------------+ +----------+ +----------+ |
| |PostgreSQL| | Redis | | MinIO | | MLflow | |
| |16 +pgvec | |7 Cluster | |S3-compat | | Model | |
| |:5432 | |:6379 | |:9000 | | Registry | |
| +---------+ +------------+ +----------+ +----------+ |
| |
| Edge Gateway: WireGuard peer at 10.200.0.2/32 |
| Stream Ingestion pulls frames via VPN -> sends to Kafka |
| |
+-----------------------------------------------------------------------------+
4.3 Network Security Zones
Five security zones provide defense in depth, from the public internet to the physically isolated edge network.
+=============================================================================+
| NETWORK SECURITY ZONES |
+=============================================================================+
| |
| +---------------------------------------------------------------------+ |
| | ZONE 0: INTERNET (UNTRUSTED) | |
| | - Public users, any source IP | |
| | - AWS Shield Standard DDoS protection | |
| | - Geo-restriction: allow specific countries only | |
| +---------------------------+-----------------------------------------+ |
| | |
| | HTTPS :443 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 1: AWS VPC EDGE (DEMILITARIZED) | |
| | - ALB + WAF v2 (SQL injection, XSS, rate limiting rules) | |
| | - Traefik Ingress (:8443) | |
| | - Auth: JWT + RBAC, API keys for edge gateway | |
| | - Public API endpoints ONLY | |
| | SG: alb-public-sg: 443 from 0.0.0.0/0 | |
| | SG: traefik-sg: 8443 from alb-sg ONLY | |
| +---------------------------+-----------------------------------------+ |
| | |
| | Internal :8080-8090 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 2: AWS VPC APPLICATION (TRUSTED, ISOLATED) | |
| | - Stream Ingestion, AI Inference, Suspicious Activity | |
| | - Training, Backend API, Notification Services | |
| | - Pod Security: No root, read-only FS, no privilege escalation | |
| | - Network Policies: Ingress only from API GW namespace | |
| | SG: app-sg: 8080-8090 from traefik-sg ONLY | |
| +---------------------------+-----------------------------------------+ |
| | |
| | Data Layer :5432, :6379, :9092, :9000 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED) | |
| | - PostgreSQL (RDS), Redis (ElastiCache), Kafka (MSK) | |
| | - MinIO object storage, S3 cold archive | |
| | - Security Groups: ONLY from app-sg | |
| | - RDS: Encrypted at rest (AWS KMS), no public access | |
| | - S3: Bucket policy deny all except VPC endpoint | |
| +---------------------------+-----------------------------------------+ |
| | |
| | WireGuard VPN (UDP 51820) |
| | ChaCha20-Poly1305 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED) | |
| | - Edge Gateway (Intel NUC), K3s node | |
| | - WireGuard peer, stream ingestion, local buffer | |
| | - DVR (192.168.29.200): NO internet access, local ONLY | |
| | - Edge Firewall: ALLOW 192.168.29.0/24 -> DVR :554,:80 | |
| | ALLOW OUT 51820/udp -> Cloud VPN endpoint | |
| | DENY ALL other incoming | |
| +---------------------------------------------------------------------+ |
| |
+=============================================================================+
4.4 Service Descriptions
| # | Service | Purpose | Technology | Port | Replicas |
|---|---|---|---|---|---|
| 1 | Edge Gateway Agent | RTSP stream pull, local recording, VPN endpoint, heartbeat | Go 1.21, systemd + K3s | 8080, 51820 | 1 (per site) |
| 2 | Stream Ingestion | Receive frames from edge, decode, produce to Kafka, store segments | Go 1.21, FFmpeg | 8081 | 3-20 (HPA) |
| 3 | AI Inference | GPU-accelerated detection, face recognition, embedding | Triton 2.40, TensorRT | 8000, 8001, 8002 | 1-4 (GPU HPA) |
| 4 | Suspicious Activity | Night-mode analysis, 10 detection modules, scoring engine | Python 3.11, OpenCV | 8083 | 2-8 (HPA) |
| 5 | Training Service | Model retraining, fine-tuning, A/B validation | PyTorch 2.1, CUDA 12.1 | 8084 | 0-1 (GPU spot) |
| 6 | Backend API | REST API, authentication, business logic | Go 1.21, Gin | 8080 | 3-10 (HPA) |
| 7 | Web Frontend | Dashboard, live view, timeline, analytics | Next.js 14, React 18 | 3000 | 3 (CDN) |
| 8 | Notification | Multi-channel alert dispatch (Telegram, WhatsApp, Email) | Go 1.21 | 8086 | 2-5 (HPA) |
| 9 | HLS Playback | HLS segment serving for dashboard live view | Go 1.21 | 8085 | 2-4 (HPA) |
| 10 | PostgreSQL | Primary database with pgvector for embeddings | PostgreSQL 16 (RDS) | 5432 | 1 (Multi-AZ) |
| 11 | Redis | Session store, cache, pub/sub, stream tracking | Redis 7 (ElastiCache) | 6379 | 2 shards x 2 replicas |
| 12 | Kafka | Event bus, durable log, stream replay | Apache Kafka (MSK) | 9092 | 3 brokers x 3 AZs |
| 13 | MinIO | Object storage for video, snapshots, model artifacts | MinIO (S3-compatible) | 9000, 9001 | Edge: 1, Cloud: 4 |
4.5 Physical Edge Gateway Specification
| Component | Specification |
|---|---|
| Hardware | Intel NUC 13 Pro, Core i5-1340P (12 cores, 16 threads) |
| Alternative | NVIDIA Jetson Orin NX 16GB (for on-edge AI inference) |
| RAM | 16GB DDR4-3200 (32GB recommended for 16+ channels) |
| Storage | 2TB NVMe SSD (7-day circular buffer for all 8 streams) |
| LAN | Intel i226-V 2.5GbE (local DVR subnet) |
| WAN | Second Ethernet or WiFi (internet for VPN) |
| OS | Ubuntu 22.04.4 LTS Server (no GUI) |
| Container Runtime | Docker CE 25.x + Docker Compose 2.x |
| K8s Distribution | K3s v1.28+ (lightweight, single-node or 2-node HA) |
| Power | UPS-backed, auto-restart on power loss (BIOS setting) |
| Network | Dual interface: eth0 for local DVR, eth1 for internet/VPN |
4.6 Cloud Infrastructure Specification
| Component | Specification |
|---|---|
| Region | Primary: ap-south-1 (Mumbai), DR: ap-southeast-1 (Singapore) |
| VPC | 10.100.0.0/16, 3 AZs, private subnets only for workloads |
| EKS | Managed node groups: on-demand for API, spot for batch/GPU |
| GPU Nodes | g4dn.xlarge (NVIDIA T4) for Triton inference, 1-4 auto-scaled |
| ALB | Internet-facing, WAF v2 attached, Shield Advanced optional |
| RDS | PostgreSQL 16, db.r6g.xlarge, Multi-AZ, encrypted at rest |
| ElastiCache | Redis 7, cluster mode enabled, 2 shards x 2 replicas |
| MSK (Kafka) | 3 broker nodes, kafka.m5.large, 3 AZs |
| S3 | Standard (hot 30d), IA (31-90d), Glacier Deep Archive (90d+) |
4.7 Scaling Approach
The system scales from the initial 8-camera deployment to 64+ cameras through well-defined phases:
+-----------------------------------------------------------------------------+
| CAMERA SCALING ROADMAP |
+-----------------------------------------------------------------------------+
| |
| CURRENT: 8 cameras (1 DVR) |
| +-- Edge: Intel NUC i5-1340P, 16GB RAM |
| +-- Bandwidth: ~16 Mbps upstream (2 Mbps per H.264 stream) |
| +-- Cloud AI: 1x T4 GPU (8 streams @ 1 fps, batch=8) |
| +-- Kafka: 8 partitions (streams.raw) |
| +-- PostgreSQL: db.r6g.xlarge |
| +-- Monthly cost: ~$2,140 |
| |
| PHASE 1: 16 cameras (2 DVRs / 2 sites) |
| +-- Edge: 2x Intel NUC (one per site) |
| +-- Bandwidth: ~32 Mbps |
| +-- Cloud AI: 1x T4 GPU (batch=16, still sufficient) |
| +-- Kafka: 16 partitions |
| +-- Monthly cost: ~$3,200 |
| |
| PHASE 2: 32 cameras (4 DVRs / 4 sites) |
| +-- Edge: 4x Intel NUC |
| +-- VPN: Hub-spoke model (4 edge peers -> 1 cloud endpoint) |
| +-- Bandwidth: ~64 Mbps |
| +-- Cloud AI: 2x T4 GPUs (HPA: 2-6 replicas) |
| +-- Kafka: 32 partitions |
| +-- PostgreSQL: db.r6g.2xlarge |
| +-- Monthly cost: ~$5,500 |
| |
| PHASE 3: 64 cameras (8 DVRs / 8 sites) |
| +-- Edge: 8x Intel NUC (or Jetson Orin for edge AI pre-filter) |
| +-- Bandwidth: ~128 Mbps (dedicated circuit recommended) |
| +-- Cloud AI: 4x T4 GPUs or 2x A10G (g5.2xlarge) |
| +-- Kafka: 64 partitions, consider MSK multi-cluster |
| +-- PostgreSQL: db.r6g.4xlarge + read replica |
| +-- Monthly cost: ~$9,800 |
| |
+-----------------------------------------------------------------------------+
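The per-phase figures in the roadmap follow simple linear rules: roughly 2 Mbps of upstream per H.264 stream, one streams.raw partition per camera, and one DVR/site per 8 channels. A sketch of that arithmetic (the 2 Mbps figure is the roadmap's own planning assumption, not a measurement):

```python
def scaling_estimate(cameras: int, mbps_per_stream: float = 2.0):
    """Linear capacity rules implied by the scaling roadmap:
    ~2 Mbps upstream per stream, one Kafka partition per camera,
    one 8-channel DVR per site."""
    return {
        "bandwidth_mbps": cameras * mbps_per_stream,
        "kafka_partitions": cameras,
        "sites": max(1, cameras // 8),
    }
```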
4.8 Failover and Reliability Design
The graceful degradation matrix defines behavior for every failure mode:
+=============================================================================+
| GRACEFUL DEGRADATION MATRIX |
+=============================================================================+
| |
| Failure Mode | Degradation Strategy |
| ------------------------- | ----------------------------------------------- |
| AI Inference Service DOWN | Continue recording ALL video locally |
| (GPU failure, model crash)| Events stored as "unprocessed" |
| | No real-time alerts |
| | Queue frames for later batch processing |
| | Dashboard shows "AI OFFLINE" banner |
| |
| Kafka DOWN (MSK outage) | Edge Gateway buffers locally (20GB ring buffer) |
| | Backpressure: reduce to key frames only (0.2fps)|
| | Auto-reconnect with 2x exponential backoff |
| | Replay from local buffer when Kafka recovers |
| |
| VPN Tunnel DOWN | Full local operation mode |
| (internet outage) | All recording continues locally (7-day buffer) |
| | Local alert buzzer/relay (configurable) |
| | No cloud dashboard access |
| | Auto-sync when VPN recovers |
| |
| PostgreSQL DOWN (RDS) | Alert queue builds in Kafka (durable log) |
| | Events not lost (Kafka 7-day retention) |
| | Read-only dashboard mode (Redis cache) |
| | Alert on-call engineer |
| |
| Notification Service DOWN | Alerts accumulate in DB |
| | Retry with exponential backoff |
| | Dead letter after 24 hours |
| | Dashboard shows pending count |
| |
| Edge Gateway DOWN (power) | Cloud dashboard shows "SITE OFFLINE" |
| | Last known recordings in cloud |
| | Alert sent immediately |
| | UPS: graceful shutdown, preserve data |
| |
+=============================================================================+
Priority Order (highest first):
- Video recording NEVER STOPS (local edge priority)
- Critical alerts ALWAYS FIRE (local buzzer + queued cloud alerts)
- AI inference gracefully degrades to batch catch-up on recovery
- Dashboard operates in read-only/cache mode during DB outage
- Cloud sync resumes automatically when connectivity restored
Reliability Mechanisms:
| Mechanism | Implementation | Target |
|---|---|---|
| Stream Reconnect | Exponential backoff: 1s -> 2s -> 4s -> 8s -> max 30s | < 60s recovery |
| Circuit Breaker | 5 failures -> OPEN (60s) -> HALF_OPEN (3 test calls) -> CLOSED | Prevent cascade failures |
| VPN Watchdog | Ping every 30s, restart WireGuard on 3 consecutive failures | < 90s VPN recovery |
| Kafka Producer | acks=all, retries=10, enable.idempotence=true, LZ4 compression | Zero message loss |
| Kafka Consumer | Manual offset commit AFTER DB write success | Exactly-once processing |
| Health Checks | 5-layer: K8s probes -> Service metrics -> Dependency checks -> E2E synthetic -> Edge heartbeat | < 2 min detection |
| Auto-scaling | GPU util > 80% for 2 min -> scale out; Kafka lag > 1000 for 5 min -> scale out | Proactive capacity |
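The circuit-breaker row above can be made concrete as a small state machine: 5 consecutive failures trip the breaker OPEN, calls are rejected for 60 seconds, then 3 successful probe calls in HALF_OPEN close it again. A minimal sketch of that policy (the class name and injected clock are illustrative, not from the production services):

```python
import time

class CircuitBreaker:
    """Sketch of the 5-failure / 60 s / 3-probe breaker from the table above."""

    def __init__(self, failure_threshold=5, open_seconds=60.0, probe_calls=3,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.probe_calls = probe_calls
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.probes_ok = 0

    def allow(self) -> bool:
        """Return True if a call may proceed; OPEN rejects until timeout."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.open_seconds:
                self.state, self.probes_ok = "HALF_OPEN", 0
            else:
                return False
        return True  # CLOSED and HALF_OPEN both admit calls

    def record(self, success: bool) -> None:
        """Report the outcome of a call and update breaker state."""
        if success:
            if self.state == "HALF_OPEN":
                self.probes_ok += 1
                if self.probes_ok >= self.probe_calls:
                    self.state, self.failures = "CLOSED", 0
            else:
                self.failures = 0
        elif self.state == "HALF_OPEN":
            self._trip()  # any failure during probing reopens immediately
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.failures = 0
```

Each downstream dependency (Triton, PostgreSQL, notification endpoints) would get its own breaker instance so one failing dependency cannot cascade.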
Section 5: Data Flow from DVR to Cloud to Dashboard
This section traces the complete data journey from camera capture through AI processing to user presentation.
5.1 Overview: Seven Data Flows
+=============================================================================+
| SEVEN DATA FLOW PATHWAYS |
+=============================================================================+
| |
| Flow 1: Camera --> DVR --> Edge Gateway |
| [Analog/Digital] -> [H.264 Encode] -> [RTSP Server] |
| |
| Flow 2: Edge Gateway --> VPN --> Cloud Kafka |
| [FFmpeg ingest] -> [Frame extract] -> [Kafka Producer] |
| |
| Flow 3: Stream Ingestion --> AI Inference |
| [Kafka Consumer] -> [GPU Batch] -> [Detection + Face Recog.] |
| |
| Flow 4: AI Inference --> Events --> Database |
| [Detection results] -> [Event enrich] -> [PostgreSQL] |
| |
| Flow 5: Events --> Alerts --> Notifications |
| [Scoring engine] -> [Alert create] -> [Multi-channel send] |
| |
| Flow 6: Live Streams --> Browser Dashboard |
| [HLS segmenter] -> [Nginx relay] -> [HLS.js player] |
| |
| Flow 7: Training Feedback Loop |
| [Operator review] -> [Conflict detect] -> [Model update] |
| |
+=============================================================================+
5.2 Flow 1: Camera to DVR to Edge Gateway
Path: Analog/Digital Camera -> DVR internal encoder -> DVR RTSP server -> Edge Gateway FFmpeg client
Protocol Stack:
| Layer | Technology | Details |
|---|---|---|
| Camera Interface | Analog BNC / CVBS / AHD | CP PLUS DVR supports multiple analog standards |
| DVR Encoding | H.264 High Profile | Hardware encoder, real-time, low latency |
| DVR Storage | Internal HDD (currently FULL) | 0 bytes free — no local recording possible |
| Network Transport | RTSP over TCP (interleaved) | Mandatory for reliable NAT/VPN traversal |
| URL Pattern | rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | N=1-8, M=0(main)/1(sub) |
| Client | FFmpeg 6.0+ | -rtsp_transport tcp -stimeout 5000000 |
| Frame Rate | 25 FPS (PAL) or 30 FPS (NTSC) | Configurable per channel |
| Resolution (main) | 960 x 1080 (per channel) | Full resolution |
| Resolution (sub) | 352 x 288 to 704 x 576 | Lower bandwidth for AI |
FFmpeg RTSP Connection Command:
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp \
-stimeout 5000000 \
-fflags +genpts+discardcorrupt+igndts+ignidx \
-reorder_queue_size 64 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-c copy -f segment -segment_time 60 -reset_timestamps 1 \
-strftime 1 "/data/buffer/ch1/%Y%m%d_%H%M%S.mkv"
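Left unattended, this command exits whenever the DVR drops the TCP session, so in practice it runs under a supervisor that applies the reconnect backoff from the Section 4.8 reliability table (1 s doubling to a 30 s cap). A sketch, with the 60-second "stable run resets backoff" rule as an assumption:

```python
import subprocess
import time

def backoff_delays(base: float = 1.0, cap: float = 30.0):
    """Yield 1, 2, 4, 8, ... seconds, capped at 30 s (Sec. 4.8 reconnect policy)."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2

def supervise(cmd: list) -> None:
    """Restart the FFmpeg segmenter whenever it exits.  `cmd` is the ffmpeg
    argv shown above; a run longer than 60 s (assumption) resets the backoff."""
    delays = backoff_delays()
    while True:
        started = time.monotonic()
        subprocess.run(cmd)                 # blocks until ffmpeg exits
        if time.monotonic() - started > 60:
            delays = backoff_delays()       # healthy run: start backoff over
        time.sleep(next(delays))
```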
Latency Budget:
| Stage | Latency |
|---|---|
| Camera -> DVR (analog) | ~1-5 ms |
| DVR encoding | ~50-100 ms |
| RTSP over LAN | ~1-2 ms |
| Total (camera to edge gateway) | ~52-107 ms |
5.3 Flow 2: Edge Gateway to VPN Tunnel to Cloud
Path: Edge Gateway FFmpeg -> Frame extraction -> JPEG encoding -> Kafka Producer -> WireGuard VPN -> Cloud MSK
Frame Processing Pipeline:
+------------+ +-------------+ +---------------+ +-------------+ +-----------+
| Raw RTSP | -> | FFmpeg | -> | Frame | -> | JPEG | -> | Kafka |
| H.264 | | Demux/Decode| | Decimation | | Encoder | | Producer |
| 25 FPS | | | | (1 fps) | | Quality 85 | | (LZ4) |
| 960x1080 | | | | 640x640 crop | | | | |
+------------+ +-------------+ +---------------+ +-------------+ +-----------+
FFmpeg Frame Extraction for AI:
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp -stimeout 5000000 \
-fflags +genpts+discardcorrupt -reorder_queue_size 64 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-vf "fps=1,scale=640:640:force_original_aspect_ratio=decrease,pad=640:640:(ow-iw)/2:(oh-ih)/2:black" \
-q:v 5 -f image2pipe -vcodec mjpeg pipe:1
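The image2pipe output is a bare concatenation of JPEG frames on stdout, so the consumer must split it on SOI/EOI markers before producing individual frames to Kafka. A minimal splitter sketch (the chunk size is arbitrary; JPEG byte stuffing means 0xFFD9 appears only as the end-of-image marker in FFmpeg's mjpeg output, which carries no embedded thumbnails):

```python
def iter_jpeg_frames(stream, chunk_size: int = 65536):
    """Yield complete JPEG frames from a raw MJPEG byte stream, as produced
    by the image2pipe command above, by scanning SOI (FFD8) / EOI (FFD9)."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while True:
            soi = buf.find(b"\xff\xd8")
            if soi < 0:
                # keep a trailing 0xFF in case the SOI marker was split
                buf = buf[-1:] if buf.endswith(b"\xff") else b""
                break
            eoi = buf.find(b"\xff\xd9", soi + 2)
            if eoi < 0:
                buf = buf[soi:]          # incomplete frame, wait for more data
                break
            yield buf[soi:eoi + 2]
            buf = buf[eoi + 2:]
```

In the real pipeline `stream` would be the ffmpeg process's stdout pipe, and each yielded frame would become one Kafka message with camera and timestamp metadata attached.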
WireGuard VPN Tunnel Configuration:
| Parameter | Value |
|---|---|
| Protocol | UDP 51820 |
| Encryption | ChaCha20-Poly1305 |
| Key Exchange | Curve25519 (ECDH) |
| Preshared Key | Enabled per-peer |
| Keepalive | 25 seconds |
| MTU | 1400 (to account for WireGuard + IP headers) |
| Cloud Endpoint | 10.200.0.1/32 (EC2 bastion or ALB) |
| Edge Endpoint | 10.200.0.2/32 |
| Route | 10.200.0.0/16 (AWS VPC) accessible from edge |
VPN watchdog script runs every 30 seconds; restarts WireGuard on 3 consecutive ping failures.
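The watchdog's restart decision reduces to a consecutive-failure counter. A sketch with the probe and restart actions injected so the policy itself is testable (the restart command and the wg0 interface name are assumptions):

```python
import time

def watchdog(probe, restart, max_failures=3, interval=30.0, cycles=None):
    """Restart WireGuard after `max_failures` consecutive failed probes.
    probe() -> bool pings the cloud tunnel address (e.g. 10.200.0.1);
    restart() runs something like `systemctl restart wg-quick@wg0`
    (interface name assumed).  `cycles` bounds the loop for testing;
    None means run forever, probing every `interval` seconds."""
    failures, n = 0, 0
    while cycles is None or n < cycles:
        n += 1
        failures = 0 if probe() else failures + 1
        if failures >= max_failures:
            restart()
            failures = 0  # give the tunnel a fresh window to recover
        time.sleep(interval)
```

A single successful ping resets the counter, so transient packet loss does not trigger a restart; only three misses in a row (about 90 seconds) do.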
Latency Budget:
| Stage | Latency |
|---|---|
| Frame extraction (FFmpeg) | ~50-100 ms |
| JPEG encoding | ~5-10 ms |
| Kafka produce (local) | ~1-2 ms |
| WireGuard tunnel | ~5-15 ms (Mumbai -> India site) |
| MSK broker | ~1-2 ms |
| Total (edge to cloud Kafka) | ~62-129 ms |
5.4 Flow 3: Stream Ingestion to AI Inference
Path: Kafka streams.raw topic -> Stream Ingestion consumer -> Triton Inference Server -> Kafka ai.detections topic
Pipeline Architecture:
+------------+ +-------------------+ +------------------+ +-------------+
| streams.raw| -> | Stream Ingestion | -> | NVIDIA Triton | -> | ai.detections |
| (8 parts) | | (Go consumer) | | (GPU inference) | | (16 parts) |
| JPEG frames| | Batch aggregator | | gRPC :8001 | | Detection |
| + metadata | | (batch=8, timeout)| | Dynamic batching | | + embeddings |
+------------+ +-------------------+ +------------------+ +-------------+
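The batch aggregator in the diagram flushes on whichever comes first: a full batch (8 frames) or a timeout, so a lone camera's frames are never stranded waiting for a batch to fill. A Python sketch of the same policy (the 250 ms max_wait is illustrative; the production consumer is written in Go):

```python
import time

class BatchAggregator:
    """Collect frames into GPU batches: flush at batch_size, or flush a
    partial batch once the oldest queued frame exceeds max_wait seconds."""

    def __init__(self, batch_size=8, max_wait=0.25, clock=time.monotonic):
        self.batch_size, self.max_wait, self.clock = batch_size, max_wait, clock
        self.items, self.first_at = [], None

    def add(self, frame):
        """Queue one frame; return a full batch if this add triggered a flush."""
        if not self.items:
            self.first_at = self.clock()
        self.items.append(frame)
        return self.flush() if len(self.items) >= self.batch_size else None

    def poll(self):
        """Return a partial batch if the oldest frame has waited too long."""
        if self.items and self.clock() - self.first_at >= self.max_wait:
            return self.flush()
        return None

    def flush(self):
        batch, self.items = self.items, []
        return batch
```

The consumer loop would call `add()` per Kafka message and `poll()` between polls, sending any returned batch to Triton over gRPC.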
Triton Model Configuration:
| Model | Inputs | Outputs | GPU Memory | Latency (P50) |
|---|---|---|---|---|
| YOLO11m-det (TensorRT FP16) | 3x640x640 float16 | Bboxes, scores, labels | ~2.1 GB | 12 ms |
| SCRFD-500M (TensorRT FP16) | 3x640x640 float16 | Bboxes, landmarks, scores | ~1.8 GB | 8 ms |
| ArcFace R100 (TensorRT FP16) | 3x112x112 float16 | 512-D embedding | ~3.2 GB | 5 ms |
Total GPU memory: ~7.1 GB (fits in T4 16 GB with 8 streams)
Latency Budget:
| Stage | Latency |
|---|---|
| Kafka consume (batch) | ~10-50 ms |
| Preprocessing (resize, normalize) | ~5-15 ms |
| YOLO11m inference (GPU) | ~12 ms (P50) |
| SCRFD face detection (GPU) | ~8 ms (P50) |
| ArcFace embedding (GPU, per face) | ~5 ms (P50) |
| Post-processing (NMS, matching) | ~10-30 ms |
| Kafka produce (results) | ~1-2 ms |
| Total (Kafka to detection output) | ~51-132 ms |
5.5 Flow 4: AI Inference to Events to Database
Path: AI Detection results -> Event enricher -> PostgreSQL (multiple tables)
Data Transformation:
+------------+ +-------------------+ +---------------------+ +------------+
| Detection | -> | Event Enricher | -> | PostgreSQL Writer | -> | events |
| results | | - Add camera_id | | - UPSERT person | | persons |
| (raw) | | - Match person | | - INSERT event | | embeddings |
| | | - Check whitelist | | - INSERT embedding | | face_crops |
+------------+ +-------------------+ +---------------------+ +------------+
Database Write Operations per Detection:
| Operation | Table | Type | Notes |
|---|---|---|---|
| Insert event record | events | INSERT | With bounding box, confidence, timestamp |
| Upsert person | persons | INSERT/UPDATE | If new face, create person record |
| Insert face crop | face_crops | INSERT | S3 URL, bounding box, quality score |
| Upsert embedding | face_embeddings | INSERT/UPDATE | 512-D vector, pgvector HNSW index |
| Increment counters | camera_stats | UPDATE | Daily aggregation |
5.6 Flow 5: Events to Alerts to Notifications
Path: AI events -> Suspicious Activity scoring engine -> Alert creation -> Notification dispatch
Scoring and Escalation:
+------------+ +-------------------+ +------------------+ +-------------+
| AI events | -> | Suspicious Activity| -> | Alert Manager | -> | Notification |
| (persons, | | Scoring Engine | | - Deduplicate | | Service |
| faces) | | - 10 modules | | - Rate limit | | - Telegram |
| | | - Composite score | | - Suppress dup | | - WhatsApp |
| | | - Time decay | | - Escalation | | - Email |
+------------+ +-------------------+ +------------------+ +-------------+
Alert Escalation Matrix:
| Score | Level | Color | Notification | Action |
|---|---|---|---|---|
| 0.00 - 0.20 | NONE | Gray | None | Log only |
| 0.20 - 0.40 | LOW | Blue | Dashboard only | Log + indicator |
| 0.40 - 0.60 | MEDIUM | Yellow | Dashboard + App push | Alert dispatched |
| 0.60 - 0.80 | HIGH | Orange | All of above + Telegram | Immediate alert |
| 0.80 - 1.00 | CRITICAL | Red | All of above + WhatsApp + Email | Critical alert |
| > 1.00 | EMERGENCY | Purple + flashing | All channels + SMS | Emergency dispatch |
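The matrix maps directly to a threshold lookup. A sketch (the half-open band boundaries are an assumption, since the table's endpoints overlap; the channel lists abbreviate the Notification column):

```python
def escalation_level(score: float):
    """Map a composite suspicion score to (level, channels) per the matrix
    above.  Bands are treated as [lo, hi) with scores above 1.00 escalating
    to EMERGENCY; exact boundary handling is an assumption."""
    bands = [
        (0.20, "NONE", []),
        (0.40, "LOW", ["dashboard"]),
        (0.60, "MEDIUM", ["dashboard", "push"]),
        (0.80, "HIGH", ["dashboard", "push", "telegram"]),
    ]
    for upper, level, channels in bands:
        if score < upper:
            return level, channels
    if score <= 1.00:
        return "CRITICAL", ["dashboard", "push", "telegram", "whatsapp", "email"]
    return "EMERGENCY", ["dashboard", "push", "telegram", "whatsapp", "email", "sms"]
```

The Alert Manager would run this after deduplication and rate limiting, dispatching one notification task per returned channel.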
5.7 Flow 6: Live Streams to Browser Dashboard
Path: DVR RTSP -> Edge Gateway FFmpeg -> HLS segmenter -> Nginx -> CDN -> Browser HLS.js
+--------+ +---------------+ +---------------+ +---------+ +----------+
| DVR | -> | Edge Gateway | -> | HLS Segmenter | -> | Nginx | -> | Browser |
| RTSP | | FFmpeg | | (2s segments) | | (relay) | | HLS.js |
| 25 FPS | | -copyts | | H.264 + AAC | | HTTPS | | Video tag|
+--------+ +---------------+ +---------------+ +---------+ +----------+
HLS Configuration:
| Parameter | Value |
|---|---|
| Segment duration | 2 seconds |
| Segment list size | 5 segments (10-second sliding window) |
| Playlist type | Live (no #EXT-X-ENDLIST) |
| Codec | H.264 High Profile + AAC-LC |
| Adaptive bitrate | 3 variants: high (3 Mbps), mid (1 Mbps), low (500 Kbps) |
Latency:
| Stage | Latency |
|---|---|
| DVR encoding | ~50-100 ms |
| RTSP to edge | ~1-2 ms |
| FFmpeg demux/remux | ~20-50 ms |
| HLS segmenting (2s) | ~2000 ms |
| Nginx relay | ~1-5 ms |
| CDN propagation | ~10-50 ms |
| HLS.js buffer | ~1-2 segments (2-4s) |
| Browser decode | ~20-50 ms |
| Total (camera to eye) | ~4.1 - 6.3 seconds (sum of stages above; dominated by segmenting + player buffer) |
5.8 Flow 7: Training Feedback Loop
Path: Operator review actions -> Conflict detection -> Training dataset -> Model training -> Quality gates -> Deployment
+------------+ +------------------+ +----------------+ +-------------+ +-----------+
| Operator | -> | Conflict | -> | Training | -> | Quality | -> | Deployment |
| Review | | Detection | | Dataset | | Gates | | (A/B test) |
| (confirm, | | (5 types) | | - Curate | | - Precision | | |
| correct, | | - Block conflicts| | - Label | | >= 0.97 | | |
| merge, | | - Queue safe | | - Augment | | - Recall | | |
| reject) | | additions | | - Version | | >= 0.95 | | |
+------------+ +------------------+ +----------------+ +-------------+ +-----------+
Training Data Flow:
| Stage | Frequency | Trigger |
|---|---|---|
| Review action collection | Continuous | Operator clicks on dashboard |
| Conflict detection | Immediate (synchronous) | Every review action |
| Training dataset build | Weekly (or on-demand) | Queue threshold or manual |
| Model training | On dataset build | Airflow DAG trigger |
| Quality gate evaluation | After training | Automated pipeline |
| A/B deployment | After quality pass | Admin approval |
| Full production | After A/B success | Auto-promote at 48h |
Section 6: Recommended Tech Stack
6.1 Technology Selection Matrix
| Layer | Technology | Version | Purpose | Rationale |
|---|---|---|---|---|
| Cloud Platform | AWS | 2025 | Infrastructure (ap-south-1 Mumbai) | Best India region latency, mature managed services |
| Container Orchestration | Amazon EKS | v1.28+ | Managed Kubernetes control plane | GPU node support, Cluster Autoscaler |
| Edge K8s | K3s | v1.28+ | Lightweight Kubernetes at edge | Single binary, resource-efficient |
| VPN | WireGuard | v1.0+ | Encrypted tunnel between edge and cloud | ~60% faster than OpenVPN, modern crypto |
| Reverse Proxy | Traefik | v2.10+ | Kubernetes Ingress controller | Native K8s integration, automatic TLS |
| AI Inference | NVIDIA Triton | 2.40 | GPU model serving, dynamic batching | Multi-framework, TensorRT optimization |
| CV Framework | OpenCV | 4.8+ | Image processing, pre/post-processing | Industry standard, Python/Go bindings |
| AI/ML Framework | PyTorch | 2.1+ | Model training, custom inference | Ecosystem, CUDA 12 support |
| Deep Learning | TensorRT | 8.6+ | GPU-optimized inference for YOLO, SCRFD, ArcFace | FP16 support, 3-5x speedup |
| Language: AI | Python | 3.11 | AI inference, training, suspicious activity detection | Ecosystem, scientific computing |
| Language: Services | Go | 1.21 | Stream ingestion, backend API, notifications | Performance, concurrency, small binaries |
| Language: Frontend | TypeScript | 5.2 | Web dashboard | Type safety, React ecosystem |
| Web Framework | Next.js | 14 (App Router) | React SSR dashboard | Server components, streaming |
| UI Library | React | 18 | Component-based UI | Concurrent features, Suspense |
| Styling | Tailwind CSS | 3.4 | Utility-first CSS | Rapid development, consistent design |
| Video Player | HLS.js | 1.4 | Browser HLS playback | MSE-based, adaptive bitrate |
| Database | PostgreSQL | 16 | Primary database, vector storage | ACID, pgvector extension |
| Vector Search | pgvector | 0.5+ | HNSW index for 512-D face embeddings | Native PostgreSQL, ivfflat+hnsw |
| Cache/Session | Redis | 7 | Session store, pub/sub, rate limiting | Data structures, cluster mode |
| Message Queue | Apache Kafka | 3.6+ (MSK) | Durable event log, stream replay | Exactly-once, retention, partitions |
| Object Storage | MinIO | latest (RELEASE.2024) | S3-compatible hot storage | Edge + cloud, erasure coding |
| Cold Archive | Amazon S3 | Standard/IA/Glacier | Tiered archival (30d/90d/365d) | Cost optimization |
| Model Registry | MLflow | 2.8+ | Model versioning, experiment tracking | Open source, S3 artifact store |
| Orchestration | Apache Airflow | 2.7+ | Training pipeline DAGs | Backfill, retries, observability |
| Monitoring | Prometheus | 2.47+ | Metrics collection | Pull-based, K8s service discovery |
| Visualization | Grafana | 10.1+ | Dashboards, alerting | Panels, annotations, shared links |
| Log Aggregation | Grafana Loki | 2.9+ | Centralized logging | Label-based, cost-effective |
| CI/CD | GitHub Actions | v4 | Build, test, lint pipelines | Native GitHub integration |
| GitOps | ArgoCD | 2.9+ | Kubernetes continuous delivery | Declarative, drift detection |
| Infrastructure | Terraform | 1.6+ | IaC for AWS resources | State management, modules |
| Secrets | AWS Secrets Manager | - | Encrypted credential storage | Rotation, IAM integration |
6.2 Hardware Requirements
Edge Gateway (Per Site)
| Component | Minimum | Recommended | High Availability |
|---|---|---|---|
| CPU | Intel i5-1340P (12 cores) | Intel i7-1370P (14 cores) | 2x Intel i7 (HA cluster) |
| RAM | 16 GB DDR4-3200 | 32 GB DDR4-3200 | 32 GB per node |
| Storage | 1 TB NVMe SSD | 2 TB NVMe SSD | 2 TB per node + NAS sync |
| Network | 1 Gbps Ethernet | 2.5 Gbps Ethernet | Dual NIC + bonding |
| GPU (optional) | None | NVIDIA Jetson Orin NX 16GB | On-edge AI pre-filtering |
| Power | UPS 600VA | UPS 1000VA | Dual PSU + generator |
Cloud GPU Nodes (AI Inference)
| Cameras | GPU | VRAM | Streams | Cost/month (spot) |
|---|---|---|---|---|
| 1-8 | g4dn.xlarge (T4) | 16 GB | 8 | ~$200-350 |
| 8-16 | g4dn.xlarge (T4) | 16 GB | 16 | ~$350-500 |
| 16-32 | g4dn.2xlarge (T4) | 16 GB | 32 | ~$600-900 |
| 32-64 | g5.2xlarge (A10G) | 24 GB | 64 | ~$1200-1800 |
| 64+ | p4d.24xlarge (8x A100) | 8 x 40 GB | 128 | ~$5000-8000 |
6.3 Software Versions Summary
| Category | Software | Version |
|---|---|---|
| Operating System | Ubuntu Server LTS | 22.04.4 |
| Container Runtime | Docker CE | 25.x |
| Container Orchestration | Kubernetes (EKS/K3s) | 1.28+ |
| AI Serving | NVIDIA Triton Inference Server | 2.40 |
| GPU Runtime | CUDA | 12.1+ |
| GPU Driver | NVIDIA Driver | 535+ |
| Deep Learning Optimization | TensorRT | 8.6+ |
| AI Framework | PyTorch | 2.1+ |
| Computer Vision | OpenCV | 4.8+ |
| Video Processing | FFmpeg | 6.0+ |
| Service Language | Go | 1.21+ |
| AI/Training Language | Python | 3.11+ |
| Frontend Framework | Next.js | 14 |
| UI Library | React | 18 |
| Database | PostgreSQL | 16 |
| Message Queue | Apache Kafka | 3.6+ |
| Cache | Redis | 7 |
| Object Storage | MinIO | 2024+ |
| CI/CD | GitHub Actions | v4 |
| GitOps | ArgoCD | 2.9+ |
| Monitoring | Prometheus + Grafana | 2.47+ / 10.1+ |
| Logging | Grafana Loki | 2.9+ |
| VPN | WireGuard | 1.0+ |
| Model Registry | MLflow | 2.8+ |
| Orchestration | Apache Airflow | 2.7+ |
| Infrastructure | Terraform | 1.6+ |
6.4 Port Reference
| Service | Port | Protocol | Location | Notes |
|---|---|---|---|---|
| DVR RTSP | 554 | TCP | 192.168.29.200 | Local network only |
| DVR HTTP | 80 | TCP | 192.168.29.200 | Admin UI, local only |
| DVR HTTPS | 443 | TCP | 192.168.29.200 | Admin UI, local only |
| DVR TCP | 25001 | TCP | 192.168.29.200 | Proprietary protocol |
| DVR UDP | 25002 | UDP | 192.168.29.200 | Proprietary protocol |
| DVR NTP | 123 | UDP | 192.168.29.200 | Time sync |
| WireGuard | 51820 | UDP | Cloud + Edge | VPN tunnel |
| Edge Admin | 8080 | TCP | 192.168.29.5 | Local admin UI |
| Edge SSH | 22 | TCP | 192.168.29.5 | Admin access only |
| Traefik HTTP | 8000 | TCP | EKS | Internal HTTP entrypoint |
| Traefik HTTPS | 8443 | TCP | EKS | Internal HTTPS entrypoint |
| ALB HTTPS | 443 | TCP | AWS | Public-facing |
| Backend API | 8080 | TCP | EKS pods | Internal service port |
| Triton HTTP | 8000 | TCP | EKS GPU nodes | Model inference HTTP |
| Triton gRPC | 8001 | TCP | EKS GPU nodes | Model inference gRPC |
| Triton Metrics | 8002 | TCP | EKS GPU nodes | Prometheus metrics |
| PostgreSQL | 5432 | TCP | RDS | VPC-private |
| Redis | 6379 | TCP | ElastiCache | VPC-private |
| Kafka | 9092 | TCP | MSK | VPC-private |
| MinIO API | 9000 | TCP | EKS + Edge | S3-compatible API |
| MinIO Console | 9001 | TCP | EKS + Edge | Admin console |
| Prometheus | 9090 | TCP | EKS | Metrics collection |
| Grafana | 3000 | TCP | EKS | Dashboards |
Section 7: Database Schema
7.1 Schema Overview
The database is designed around a relational core (PostgreSQL 16) with pgvector extension for 512-dimensional face embedding storage and similarity search. The schema consists of 29 tables, 4 views, and 8 trigger functions, organized into 10 logical domains.
Schema Philosophy:
- Strict normalization for reference data (cameras, persons, rules) to ensure data integrity
- JSONB flexibility for event metadata and configuration to accommodate evolving AI outputs
- Partitioning on all high-volume time-series tables for query performance and lifecycle management
- pgvector HNSW indexing for sub-10ms face similarity search at scale
- Row-level security (RLS) for multi-tenant site isolation
- AES-256 encryption for all stored credentials (DVR passwords, API tokens)
7.2 Entity Relationship Overview
+=============================================================================+
| ENTITY RELATIONSHIP DIAGRAM |
+=============================================================================+
| |
| SITE (1) --------------------< (N) DVR |
| | | |
| | | (1) |
| | v |
| | CAMERA (N) <------------------< (N) ALERT_RULE|
| | | | |
| | | (N) | (1) |
| | v v |
| | +---------------------------------------------------------+ |
| | | EVENT (N) -->--(1) PERSON (1)--< (N) FACE_EMBEDDING |
| | | | | |
| | | | (N) | (N) |
| | | v v |
| | | FACE_CROP (N) PERSON_CLUSTER |
| | | | |
| | | | (N) +---------+|
| | | v | Training||
| | | MEDIA_FILE (1) ----------------------------------------->| Dataset ||
| | | |---------||
| | +--------------------------------------------------------->| Job ||
| | | Model ||
| | +---------+ | Version ||
| | | Review | +---------+|
| | | Action | |
| | +---------+ |
| | ^ |
| | | (N) |
| +------------------------------------+ |
| USER (N) -->--(N) ROLE_PERMISSION |
| | |
| | (1) |
| v |
| WATCHLIST (N) -->--(N) WATCHLIST_ENTRY |
| |
| +---------+ +---------+ +---------+ +---------+ |
| | Telegram| |WhatsApp | | Email | |Webhook | |
| | Config | | Config | | Config | | Config | |
| +---------+ +---------+ +---------+ +---------+ |
| ^ ^ ^ ^ |
| | | | | |
| +--------------+-------------+--------------+ |
| | |
| NOTIFICATION_CHANNEL |
| | |
| | (1) |
| v |
| NOTIFICATION_LOG |
| |
| +---------+ +---------+ +---------+ |
| | Audit | | System | | Device | |
| | Log | | Health | | Connect.| |
| |(partitioned) | Log | | Log | |
| +---------+ +---------+ +---------+ |
| |
+=============================================================================+
7.3 Core Tables Summary
7.3.1 Site and Infrastructure Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| sites | Physical locations (factories, warehouses) | id, name, location, timezone, settings | 1-10 |
| dvrs | DVR/NVR devices per site | id, site_id, ip_address, port, username, password_encrypted, model, channels, status | 1-10 |
| cameras | Individual camera channels | id, dvr_id, channel_number, name, rtsp_url, resolution, fps, status, zone_config, zone_description | 8-64 |
7.3.2 AI Detection and Identity Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| events | All AI detection events (partitioned monthly) | id, camera_id, event_type, timestamp, confidence, bounding_box, person_id, face_crop_id, track_id | 1M-10M/month |
| persons | Known and unknown individuals | id, name, status (known/unknown/blacklisted), role, company, notes, created_at | 100-10,000 |
| face_crops | Cropped face images metadata | id, event_id, person_id, storage_path, bounding_box, quality_score, blur_score, pose_yaw, pose_pitch | 500K-5M/month |
| face_embeddings | 512-D face embeddings (pgvector) | id, person_id, face_crop_id, embedding (vector(512)), model_version, is_primary | 500K-5M |
| person_clusters | Unknown person cluster groups | id, cluster_label, representative_embedding_id, sample_count, first_seen, last_seen, status | 10-1,000 |
7.3.3 Alert and Notification Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| alert_rules | Per-camera alert configuration | id, camera_id, rule_type, name, config_json, schedule, enabled | 50-500 |
| alerts | Generated alert records | id, camera_id, rule_id, person_id, alert_type, severity, status, message | 1K-50K/month |
| notification_channels | Alert destination endpoints | id, name, channel_type, config_json, is_active | 5-20 |
| telegram_configs | Telegram Bot API credentials | id, channel_id, bot_token_encrypted, chat_id | 1-5 |
| whatsapp_configs | WhatsApp Business API credentials | id, channel_id, api_key_encrypted, phone_number_id | 1-5 |
| notification_log | Delivery status per notification | id, alert_id, channel_id, status, sent_at, error_message | 1K-50K/month |
7.3.4 Watchlist and Access Control Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| users | Dashboard users and operators | id, username, email, password_hash, role, is_active | 5-50 |
| roles | Permission roles | id, name, permissions_json | 3-10 |
| watchlists | Named monitoring lists | id, name, watch_type (vip/blacklist/custom), is_active | 5-20 |
| watchlist_entries | Persons on watchlists | id, watchlist_id, person_id, added_by, added_at | 10-1,000 |
7.3.5 Training and ML Pipeline Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| training_datasets | Curated face datasets for training | id, name, description, person_ids_json, sample_count, version, status | 10-100 |
| training_jobs | Model training job tracking | id, dataset_id, model_version_from, model_version_to, status, metrics_json | 10-100 |
| model_versions | Registry of trained model versions | id, version_string, training_job_id, metrics_json, is_production, is_rollback_available | 10-50 |
| review_actions | Operator review decisions | id, event_id, reviewer_id, action, from_person_id, to_person_id, notes | 1K-100K |
7.3.6 Media and Storage Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| media_files | Registry of stored video/images | id, file_type, storage_path, size_bytes, checksum, camera_id, event_id, retention_until | 100K-1M |
| video_clips | Video clip metadata for incidents | id, media_file_id, start_time, end_time, camera_id, event_id, duration_seconds | 10K-100K |
7.3.7 Audit and Monitoring Tables (Partitioned)
| Table | Purpose | Partition | Retention |
|---|---|---|---|
| audit_logs | All user and system actions | Monthly by timestamp | 1 year (Glacier) |
| system_health_logs | Component health metrics | Monthly by timestamp | 90 days |
| device_connectivity_logs | Camera/DVR connectivity events | Monthly by timestamp | 90 days |
7.4 Indexing Strategy
7.4.1 pgvector HNSW Index (Critical Path)
-- HNSW index for sub-10ms face similarity search
-- m / ef_construction set build-time index quality; hnsw.ef_search (runtime)
-- controls the recall/speed tradeoff (higher = more accurate, slower)
CREATE INDEX idx_face_embeddings_hnsw
ON face_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
-- Query: Find top-K similar faces
SELECT person_id, 1 - (embedding <=> query_vector) AS similarity
FROM face_embeddings
WHERE is_primary = true
ORDER BY embedding <=> query_vector
LIMIT 5;
| Parameter | Value | Rationale |
|---|---|---|
| m | 16 | Number of bi-directional links per node (higher = better recall, more memory) |
| ef_construction | 128 | Build-time exploration factor (higher = better index quality) |
| ef_search (runtime SET) | 64-256 | Search-time exploration factor (SET hnsw.ef_search = 128) |
| Distance metric | Cosine similarity (<=>) | Optimal for normalized face embeddings |
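pgvector's <=> operator is cosine distance, so the query's 1 - (embedding <=> query_vector) is ordinary cosine similarity; for L2-normalized ArcFace embeddings this reduces to a dot product. A small pure-Python check of the quantity the index ranks by (illustrative only, for validating match thresholds offline):

```python
import math

def cosine_similarity(a, b):
    """Python equivalent of 1 - (embedding <=> query_vector) under
    pgvector's cosine operator; for unit-norm vectors this equals dot(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```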
7.4.2 B-Tree Indexes (Standard Queries)
| Table | Index | Purpose |
|---|---|---|
events |
(camera_id, timestamp DESC) |
Time-range queries per camera |
events |
(event_type, timestamp DESC) |
Filter by event type |
events |
(person_id) WHERE person_id IS NOT NULL |
Person event lookup |
face_crops |
(person_id, quality_score DESC) |
Best quality face per person |
alerts |
(status, created_at DESC) |
Pending alerts by age |
alerts |
(severity, status) |
Critical alert dashboard |
persons |
(status, name) |
Person directory with status filter |
persons |
(created_at DESC) |
Recently added persons |
media_files |
(retention_until) WHERE retention_until < NOW() + 7 days |
Expiring media cleanup |
7.5 Partitioning Strategy
All high-volume time-series tables are partitioned monthly using pg_partman for automated partition management.
+-----------------------------------------------------------------------------+
| PARTITIONING ARCHITECTURE |
+-----------------------------------------------------------------------------+
| |
| events (parent, empty) |
| +-- events_y2024m01 (Jan 2024 data) |
| +-- events_y2024m02 (Feb 2024 data) |
| +-- events_y2024m03 (Mar 2024 data) |
| +-- events_y2024m04 (Apr 2024 data) |
| +-- events_y2024m05 (May 2024 data) <-- Hot (in memory) |
| +-- events_default (fallback) |
| |
| Partition pruning: WHERE timestamp >= '2024-05-01' |
| -> Only scans events_y2024m05 |
| -> ~30x faster for time-range queries |
| |
| Managed by: pg_partman extension |
| - Auto-create: 2 months ahead |
| - Auto-drop: After retention period (detach + archive) |
| |
+-----------------------------------------------------------------------------+
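The monthly names in the diagram follow a deterministic convention. In production pg_partman generates these automatically; this helper only mirrors the y/m suffix scheme shown above, e.g. for locating which partition a given timestamp lands in:

```python
from datetime import date

def partition_name(parent: str, day: date) -> str:
    """Name of the monthly partition holding rows for `day`, matching the
    events_yYYYYmMM convention in the diagram (zero-padded month)."""
    return f"{parent}_y{day.year}m{day.month:02d}"
```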
Partitioned Tables:
| Table | Partition Key | Partition Type | Retention |
|---|---|---|---|
| events | timestamp | Monthly RANGE | 90 days hot, 1 year archive |
| audit_logs | timestamp | Monthly RANGE | 1 year total |
| system_health_logs | timestamp | Monthly RANGE | 90 days |
| device_connectivity_logs | timestamp | Monthly RANGE | 90 days |
| face_crops | created_at | Monthly RANGE | 90 days hot, 1 year archive |
7.6 Retention Policies
| Data Tier | Storage | Duration | Lifecycle |
|---|---|---|---|
| Hot Tier | PostgreSQL + MinIO | 0-30 days | Fast query, indexed, in-memory cache |
| Warm Tier | S3 Standard | 30-90 days | Available on-demand, still indexed |
| Cold Tier | S3 Infrequent Access | 90-365 days | Retrieval within minutes |
| Archive Tier | Glacier Deep Archive | 1-7 years | Retrieval within 12-48 hours |
| Compliance | Glacier Vault Lock | 7+ years | Immutable, legal hold |
Automated Cleanup:
| Task | Frequency | Mechanism |
|---|---|---|
| Expire old event partitions | Daily (pg_partman) | DETACH PARTITION + S3 upload |
| Delete expired media files | Daily | Cron job: DELETE from media_files + MinIO removal |
| Purge old notification logs | Weekly | DELETE WHERE created_at < NOW() - INTERVAL '90 days' |
| Archive face crops to S3 | Daily | Lambda: copy to S3 IA, update storage_path |
| Compress audit logs | Monthly | pglz/zstd compression on detached partitions |
| Vacuum and analyze | Continuous (autovacuum) | PostgreSQL autovacuum daemon |
7.7 Security Considerations
7.7.1 Credential Encryption
All sensitive credentials stored with AES-256 encryption:
| Table | Encrypted Field | Encryption |
|---|---|---|
| dvrs | password_encrypted | AES-256-CBC, key from AWS Secrets Manager |
| telegram_configs | bot_token_encrypted | AES-256-CBC |
| whatsapp_configs | api_key_encrypted | AES-256-CBC |
7.7.2 Row-Level Security (RLS)
For multi-site deployments, RLS policies enforce that users only see data for sites they have access to:
-- Enable RLS on critical tables
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
ALTER TABLE alerts ENABLE ROW LEVEL SECURITY;
-- Policy: Users see only data from their assigned sites
CREATE POLICY site_isolation_events ON events
USING (camera_id IN (
SELECT c.id FROM cameras c
JOIN dvrs d ON c.dvr_id = d.id
JOIN site_users su ON d.site_id = su.site_id
WHERE su.user_id = current_setting('app.current_user_id')::UUID
));
7.7.3 Access Control
| Role | Permissions |
|---|---|
| super_admin | Full access to all sites, all operations |
| site_admin | Full access to assigned sites, user management |
| operator | View dashboards, acknowledge alerts, review persons |
| viewer | Read-only access to dashboards and events |
7.7.4 Audit Trail
The audit_logs table (partitioned monthly) captures every significant action:
| Action | Captured Data |
|---|---|
| `login` | User, IP, timestamp, MFA status, success/failure |
| `person_create` | Creator, name, initial status, source event |
| `person_update` | Updater, changed fields, old/new values |
| `alert_acknowledge` | Acknowledger, alert ID, timestamp |
| `alert_resolve` | Resolver, resolution notes |
| `training_approve` | Approver, model version, dataset version |
| `model_deploy` | Deployer, version, A/B split percentage |
| `config_change` | Changer, changed parameters, old/new values |
7.7.5 Backup Strategy
| Component | Method | Frequency | Retention |
|---|---|---|---|
| PostgreSQL | RDS automated backups | Daily | 35 days |
| PostgreSQL | Manual snapshots | Before any schema change | 90 days |
| MinIO/S3 | Cross-region replication | Continuous | 90 days in DR region |
| Face embeddings | pg_dump + vector export | Weekly | 90 days |
| Model artifacts | MLflow artifact store | On training completion | Indefinite |
Reference: For complete DDL including all CREATE TABLE statements, triggers, views, and functions, see
database_schema.md — Sections 2 through 15 contain the full schema definition with comments and constraints.
Section 8: AI Model and Training Strategy
8.1 AI Model Selection
The inference pipeline is built around three core deep learning models for human detection, face detection, and face recognition, supported by tracking, clustering, and auxiliary detection components. All neural models are optimized with TensorRT for GPU inference and run on a single NVIDIA T4 GPU with dynamic batching.
| Component | Model | Framework | Input Size | FPS (T4) | Accuracy |
|---|---|---|---|---|---|
| Human Detection | YOLO11m (Ultralytics) | PyTorch -> ONNX -> TensorRT FP16 | 640 x 640 | 213 | mAP@50: 80.5% (COCO) |
| Face Detection | SCRFD-500M-BNKPS (InsightFace) | PyTorch -> ONNX -> TensorRT FP16 | 640 x 640 | ~400 | AP_medium: 87.2% (WIDERFace) |
| Face Recognition | ArcFace R100 (IR-SE100) | PyTorch -> ONNX -> TensorRT FP16 | 112 x 112 | ~800 | 99.83% (LFW), 98.35% (MegaFace) |
| Person Tracking | ByteTrack | Native Python + NumPy | N/A | N/A | 80.3% MOTA (MOT17) |
| Unknown Clustering | HDBSCAN + DBSCAN fallback | scikit-learn | 512-D vectors | N/A | 89.5% purity, 0.855 BCubed F |
| Fall Detection | YOLOv8n-pose | TensorRT FP16 | 640 x 640 | ~300 | Part of suspicious activity |
| Object Detection | YOLOv8s | TensorRT FP16 | 640 x 640 | ~450 | Abandoned object detection |
8.1.1 Human Detection: YOLO11m
| Property | Value |
|---|---|
| Architecture | CSPDarknet backbone + PANet neck + Decoupled head |
| Parameters | 19.6 M |
| FLOPs | 68.2 B (at 640x640) |
| TensorRT Optimization | FP16, dynamic batch (1-16), layer fusion |
| GPU Memory | ~2.1 GB at batch=8 |
| Person class priority | Highest NMS score weighting for person class |
| Preprocessing | Letterbox resize to 640x640, normalize [0,1] |
Export pipeline:
# PyTorch -> ONNX -> TensorRT Engine
yolo export model=yolo11m.pt format=onnx imgsz=640 half=True opset=17 simplify=True
trtexec --onnx=yolo11m.onnx --saveEngine=yolo11m.engine --fp16 \
--minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:16x3x640x640
8.1.2 Face Detection: SCRFD-500M-BNKPS
| Property | Value |
|---|---|
| Architecture | Single-stage detector with FPN, BN+KPS head |
| Compute budget | ~500 MFLOPs ("500M" in the model name refers to FLOPs, not parameters; the network itself has well under 1 M parameters) |
| Detects | Face bounding box + 5 facial landmarks |
| Minimum face size | 20 x 20 pixels (configurable) |
| NMS threshold | 0.45 (IoU) |
| Confidence threshold | 0.5 (minimum detection score) |
| GPU Memory | ~1.8 GB at batch=32 |
8.1.3 Face Recognition: ArcFace R100 (IR-SE100)
| Property | Value |
|---|---|
| Backbone | IR-SE100 (Improved ResNet-100 with SE blocks) |
| Training data | MS1MV3 (5.8M images, 85K identities) |
| Loss function | ArcFace additive angular margin (m=0.5) |
| Embedding dimension | 512 (float32, L2-normalized) |
| Distance metric | Cosine similarity (1 - cosine_distance) |
| Matching threshold (strict) | 0.60 |
| Matching threshold (balanced) | 0.45 |
| Matching threshold (relaxed) | 0.30 |
| GPU Memory | ~3.2 GB at batch=64 |
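The matching rule above reduces to a dot product once embeddings are L2-normalized. A minimal NumPy sketch (toy 4-D vectors stand in for the 512-D ArcFace embeddings; in production the gallery search is done by pgvector, as described in Section 8.3.1):

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a raw embedding to unit length so dot product == cosine similarity."""
    return v / np.linalg.norm(v)

def match_embedding(query: np.ndarray, gallery: np.ndarray, threshold: float = 0.45):
    """Return (best_index, similarity) when the best cosine similarity clears the
    balanced threshold (0.45), else (None, similarity). Gallery rows are assumed
    to be L2-normalized already."""
    sims = gallery @ l2_normalize(query)   # cosine similarity per known embedding
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return best, float(sims[best])
    return None, float(sims[best])

# Toy example: two known identities, query close to the first one
gallery = np.stack([l2_normalize(np.array([1.0, 0.0, 0.0, 0.0])),
                    l2_normalize(np.array([0.0, 1.0, 0.0, 0.0]))])
idx, sim = match_embedding(np.array([0.9, 0.1, 0.0, 0.0]), gallery)
```

A query falling below the threshold would enter the unknown-person path of Section 8.3.1 instead of being assigned a known `person_id`.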
Published benchmarks on standard datasets:
| Dataset | Accuracy | Notes |
|---|---|---|
| LFW (Labeled Faces in the Wild) | 99.83% | Unconstrained face verification |
| CFP-FP (Frontal-Profile) | 99.17% | Cross-pose evaluation |
| AgeDB-30 | 98.28% | Age-invariant recognition |
| MegaFace (1M distractors) | 98.35% | Large-scale recognition |
| IJB-C | 96.18% (TAR@FAR=1e-4) | Template-based verification |
8.2 Inference Pipeline Architecture
+=============================================================================+
| REAL-TIME INFERENCE PIPELINE |
+=============================================================================+
| |
| INPUT: RTSP Frame (640x640, 1 fps per stream) |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Frame Preprocessor| -> | YOLO11m Detector | -> | Person Detection | |
| | - Resize | | (TensorRT FP16) | | Results: | |
| | - Normalize | | GPU: 12ms (P50) | | - bbox (x1,y1,x2, | |
| | - NCHW layout | | Batch: 1-16 | | y2) | |
| +-------------------+ +-------------------+ | - confidence | |
| | - class (person) | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Crop Extract | <- | SCRFD-500M | <- | Face Detection | |
| | (ROI from person | | (TensorRT FP16) | | Results: | |
| | bounding box) | | GPU: 8ms (P50) | | - face bbox | |
| | | | Batch: per-face | | - 5 landmarks | |
| +-------------------+ +-------------------+ | - confidence | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Alignment | <- | ArcFace R100 | <- | Embedding Vector | |
| | (5-point affine | | (TensorRT FP16) | | 512-D float32, | |
| | transform to | | GPU: 5ms (P50) | | L2-normalized | |
| | 112x112) | | Batch: 1-64 | | | |
| +-------------------+ +-------------------+ +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Matching | <- | Person Tracking | <- | Track-to-Person | |
| | (cosine similarity| | (ByteTrack) | | Association | |
| | vs. known DB) | | CPU: 2ms/frame | | - Match embedding | |
| +-------------------+ +-------------------+ | to known persons | |
| | | | | - Create/update | |
| | | | | track | |
| v v v +-------------------+ |
| +-------------------+ |
| | Confidence Scorer | |
| | (aggregate score | |
| | for all detect) | |
| +-------------------+ |
| | |
| v |
| OUTPUT: DetectionEvent (JSON) |
| { person_id, track_id, confidence, bbox, face_crop, |
| embedding, recognized_name?, quality_scores } |
| |
+=============================================================================+
End-to-end latency budget per frame:
| Stage | GPU | CPU Fallback |
|---|---|---|
| Frame preprocessing | 2-5 ms | 5-10 ms |
| YOLO11m detection | 12 ms (P50) | 35-56 ms (ONNX+OpenVINO) |
| SCRFD face detection | 8 ms (P50) | 15-25 ms |
| ArcFace embedding (per face) | 5 ms (P50) | 12-18 ms |
| ByteTrack tracking | 2 ms | 2-5 ms |
| Post-processing | 5-10 ms | 10-20 ms |
| Total (no face) | ~29 ms | ~67-116 ms |
| Total (1 face) | ~34 ms | ~79-134 ms |
| Total (5 faces) | ~54 ms | ~127-214 ms |
8.3 Face Recognition Matching Strategy
8.3.1 Known Person Matching
+-----------------------------------------------------------------------------+
| FACE RECOGNITION MATCHING FLOW |
+-----------------------------------------------------------------------------+
| |
| New Face Embedding (512-D) |
| | |
| v |
| +-------------------+ |
| | L2 Normalize | embedding = embedding / ||embedding||_2 |
| +-------------------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | pgvector HNSW | -> | Top-5 Candidates | |
| | Similarity Search | | (cosine distance) | |
| | ef_search=128 | +-------------------+ |
| +-------------------+ | |
| v |
| +-------------------+ +-------------------+ |
| | Threshold Check | <- | Best Match Score | |
| | (per AI Vibe) | +-------------------+ |
| +-------------------+ | |
| | | |
| +------------+-------------+ |
| | |
| +----------+----------+ |
| | | |
| v v |
| Above threshold Below threshold |
| (Recognized) (Unknown) |
| | | |
| v v |
| +------------+ +------------------+ |
| | Assign to | | Check against | |
| | known | | recent unknown | |
| | person_id | | embeddings | |
| | (with | | (5-min window) | |
| | confidence)| +--------+---------+ |
| +------------+ | |
| | |
| +--------+--------+ |
| | | |
| v v |
| Similar unknown No similar unknown |
| (same person) (new unknown) |
| | | |
| v v |
| Reuse person_id Create new |
| Update centroid unknown person |
| record |
| |
+-----------------------------------------------------------------------------+
8.3.2 AI Vibe Threshold Mapping
The AI Vibe system maps three intuitive presets to internal confidence thresholds:
| Vibe | Face Match Threshold | Detection Confidence | Use Case |
|---|---|---|---|
| Relaxed | 0.30 cosine similarity | 0.40 minimum | Known persons re-identified more easily; more false positives acceptable |
| Balanced | 0.45 cosine similarity | 0.55 minimum | Default; good precision-recall tradeoff |
| Strict | 0.60 cosine similarity | 0.70 minimum | High-security scenarios; minimize false positives |
Per-stream Vibe Selection:
- Vibe can be set per camera via dashboard
- Night mode automatically applies Strict vibe
- Alert-triggered cameras automatically upgrade to Strict for 5 minutes
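The preset table and override rules can be expressed as a small lookup. This is an illustrative sketch (the dict name and function are hypothetical, not part of the documented API); the values mirror the table above:

```python
# Vibe presets mirroring the threshold table above (names are illustrative).
AI_VIBE_PRESETS = {
    "relaxed":  {"face_match_threshold": 0.30, "detection_confidence": 0.40},
    "balanced": {"face_match_threshold": 0.45, "detection_confidence": 0.55},
    "strict":   {"face_match_threshold": 0.60, "detection_confidence": 0.70},
}

def effective_vibe(camera_vibe: str, is_night: bool, alert_active: bool) -> str:
    """Night mode and alert-triggered cameras force the Strict preset."""
    return "strict" if (is_night or alert_active) else camera_vibe

# A camera configured as Balanced, evaluated during night hours -> Strict
thresholds = AI_VIBE_PRESETS[effective_vibe("balanced", is_night=True, alert_active=False)]
```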
8.4 Unknown Person Clustering Approach
Unknown persons (faces that don't match any known person above threshold) are automatically clustered to help operators identify recurring visitors.
8.4.1 Clustering Pipeline
+-----------------------------------------------------------------------------+
| UNKNOWN PERSON CLUSTERING PIPELINE |
+-----------------------------------------------------------------------------+
| |
| Unknown Face Embeddings (streaming) |
| | |
| v |
| +-------------------+ |
| | Sliding Window | Keep last N embeddings in memory (configurable) |
| | Buffer (500) | + persistent storage for long-term clustering |
| +-------------------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | HDBSCAN Clustering| -> | Primary clusters | min_cluster_size=5 |
| | (density-based) | | formed | min_samples=2 |
| | metric=cosine | +-------------------+ eps=auto |
| +-------------------+ | |
| | (fallback) | |
| v v |
| +-------------------+ +-------------------+ |
| | DBSCAN Fallback | | Merge with | Check: temporal gap |
| | (if HDBSCAN fails | | existing clusters | < 30 days, cosine sim |
|  | to find structure)| | - centroid       | |
| +-------------------+ | distance | |
| +-------------------+ |
| | |
| v |
| +-------------------+ |
| | Operator Review | Dashboard shows clusters |
| | Queue | pending identification |
| +-------------------+ |
| |
+-----------------------------------------------------------------------------+
8.4.2 Clustering Parameters
| Parameter | Value | Description |
|---|---|---|
| Algorithm | HDBSCAN (primary), DBSCAN (fallback) | Density-based for irregular cluster shapes |
| Distance metric | Cosine similarity | Optimal for face embeddings |
| Minimum cluster size | 5 embeddings | Minimum to form a cluster |
| Minimum samples | 2 | Core point density threshold |
| Merge threshold | 0.85 cosine similarity | Merge clusters if centroids are close |
| Temporal window | 30 days | Maximum gap between cluster appearances |
| Review trigger | 10+ embeddings | Send to operator review queue |
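The merge step in the pipeline combines the two parameters above (centroid similarity > 0.85 and temporal gap < 30 days). A minimal NumPy sketch of that check, with hypothetical function and argument names:

```python
import numpy as np

def should_merge(centroid_a, centroid_b, last_seen_gap_days,
                 sim_threshold=0.85, max_gap_days=30):
    """Merge two unknown-person clusters when their centroids are close in
    cosine similarity AND the clusters appeared within the temporal window."""
    a = centroid_a / np.linalg.norm(centroid_a)
    b = centroid_b / np.linalg.norm(centroid_b)
    return float(a @ b) >= sim_threshold and last_seen_gap_days <= max_gap_days

# Near-identical centroids seen 3 days apart -> merge
merge = should_merge(np.array([0.6, 0.8, 0.0]), np.array([0.62, 0.78, 0.02]), 3)
# Same centroids but a 45-day gap -> keep clusters separate
no_merge = should_merge(np.array([0.6, 0.8, 0.0]), np.array([0.62, 0.78, 0.02]), 45)
```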
8.4.3 Clustering Quality Targets
| Metric | Target | Measurement |
|---|---|---|
| Cluster Purity | > 89% | % of embeddings in a cluster belonging to the same person |
| BCubed F-Measure | > 0.85 | Harmonic mean of precision and recall for clustering |
| Silhouette Score | > 0.3 | Separation quality between clusters |
| False Merge Rate | < 5% | Different persons incorrectly merged |
| Split Rate | < 15% | Same person split into multiple clusters |
8.5 Confidence Handling
8.5.1 Confidence Score Computation
Each detection event carries an aggregate confidence score computed from multiple signals:
confidence_aggregate = weighted_average(
detection_confidence: 0.35 * yolo_confidence,
face_detection_quality: 0.25 * scrfd_confidence,
face_recognition_score: 0.25 * (1 - cosine_distance_to_match),
face_quality_score: 0.15 * quality_composite
)
Where quality_composite = average(
1.0 - blur_score, # Sharpness (higher is better)
1.0 - abs(pose_yaw)/90, # Frontal preference
illumination_score, # Well-lit face
resolution_adequacy # Sufficient pixels for face
)
8.5.2 Confidence Levels
| Level | Score Range | Color | Action |
|---|---|---|---|
| High Confidence | 0.80 - 1.00 | Green | Auto-accept, no review needed |
| Medium Confidence | 0.60 - 0.79 | Yellow | Accepted, flagged for periodic review |
| Low Confidence | 0.40 - 0.59 | Orange | Requires operator review within 24h |
| Very Low Confidence | 0.00 - 0.39 | Red | Rejected, not used for training |
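Bucketing an aggregate score into the four levels above is a simple threshold cascade; a sketch (level names are illustrative labels for the table rows):

```python
def confidence_level(score: float) -> str:
    """Map an aggregate confidence score to the review tier in the table above."""
    if score >= 0.80:
        return "high"        # auto-accept, no review needed
    if score >= 0.60:
        return "medium"      # accepted, periodic review
    if score >= 0.40:
        return "low"         # operator review within 24h
    return "very_low"        # rejected, excluded from training
```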
8.6 Training Workflow Overview
The safe self-learning system captures operator feedback and converts it into model improvements through a carefully controlled pipeline.
8.6.1 Three Learning Modes
| Mode | Description | Use Case | Risk Level |
|---|---|---|---|
| Manual Only | Operator explicitly triggers training runs | Highly regulated environments | Lowest |
| Suggested Learning (Recommended) | System suggests training candidates; operator approves | Standard production deployment | Low |
| Approved Auto-Update | Auto-training triggers after admin approval threshold | Mature deployment with trusted operators | Medium |
8.6.2 Training Pipeline Architecture
+=============================================================================+
| SAFE SELF-LEARNING PIPELINE |
+=============================================================================+
| |
| STEP 1: COLLECTION |
| +-------------------+ |
| | Operator Review | confirm, correct_name, merge, reject |
| | Actions | + automatic high-confidence acceptances |
| +-------------------+ |
| | |
| v |
| STEP 2: CONFLICT DETECTION (Synchronous, blocks immediately) |
| +-------------------+ +-------------------+ +-------------------+ |
| | Label Conflict | -> | If conflict found | -> | Block from training | |
| | Detector | | (5 types) | | dataset, alert admin | |
| | - Same face, diff | +-------------------+ +-------------------+ |
| | names | |
| | - Diff faces, same| |
| | name | |
| | - Merge circular | |
| | reference | |
| | - Name to already-| |
| | deleted person | |
| | - Quality below | |
| | threshold | |
| +-------------------+ |
| | |
| v |
| STEP 3: DATASET CURATION |
| +-------------------+ |
| | Training Dataset | - Collect approved examples |
| | Builder | - Balance classes (min 5 per person) |
| | | - Augmentation (flip, rotate, brightness) |
| | | - Quality filter (blur, pose, illumination) |
| | | - Train/val split (80/20) |
| +-------------------+ |
| | |
| v |
| STEP 4: MODEL TRAINING |
| +-------------------+ |
| | Training Job | - ArcFace R100 backbone |
| | (Airflow DAG) | - Fine-tuning on curated dataset |
| | | - Cosine annealing LR schedule |
| | | - Early stopping (patience=10) |
| | | - Mixed precision (AMP) |
| | | - Typical duration: 2-8 hours on V100 |
| +-------------------+ |
| | |
| v |
| STEP 5: QUALITY GATES |
| +-------------------+ +-------------------+ +-------------------+ |
| | Gate 1: Hold-out | -> | Gate 2: Compare | -> | Gate 3: Identity | |
| | evaluation | | vs current | | accuracy | |
| | (precision, | | production | | (100% known) | |
| | recall, f1) | | (no >2% regress)| | | |
| +-------------------+ +-------------------+ +-------------------+ |
| | | | |
| +------------+-------------+--------------------------+ |
| | |
| +----------+----------+ |
| | | |
| v v |
| ALL PASSED ANY FAILED |
| | | |
| v v |
| +------------+ +------------------+ |
| | Proceed to | | REJECT | |
| | Deployment | | - Log failure | |
| +------------+ | - Alert admin | |
| | - Keep in staging| |
| +------------------+ |
| |
| STEP 6: DEPLOYMENT |
| +-------------------+ |
| | A/B Testing | - Shadow mode: 0% traffic (validation) |
| | (gradual rollout) | - Canary: 5% traffic for 24h |
| | | - Monitor: latency, error rate, FP rate |
| | | - Full rollout: 100% traffic |
| | | - Rollback: < 60 seconds to previous version |
| +-------------------+ |
| |
+=============================================================================+
8.7 Model Versioning and Rollback
8.7.1 Semantic Versioning
| Version Component | Increment When | Example |
|---|---|---|
| MAJOR (X.0.0) | Full retraining, architecture change, breaking embedding change | 1.0.0 -> 2.0.0 (new backbone) |
| MINOR (x.Y.0) | Fine-tuning, significant new data (>50 new identities) | 1.0.0 -> 1.1.0 (new employees) |
| PATCH (x.y.Z) | Incremental update, centroid update, hotfix | 1.0.0 -> 1.0.1 (new photos added) |
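The version bump rules above can be sketched as a small helper (hypothetical function; `change` categories map one-to-one to the table rows):

```python
def bump_version(version: str, change: str) -> str:
    """Bump a MAJOR.MINOR.PATCH string. `change` is one of:
    'major' (retraining/architecture/breaking embedding change),
    'minor' (fine-tuning, >50 new identities),
    'patch' (incremental update, centroid update, hotfix)."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```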
8.7.2 Version States
| State | Description | Transition |
|---|---|---|
| `TRAINING` | Model is being trained | Auto -> STAGING on completion |
| `STAGING` | Awaiting quality gate evaluation | Auto -> AWAITING_APPROVAL on pass |
| `AWAITING_APPROVAL` | Pending admin approval | Manual -> CANARY on approve |
| `CANARY` | 5% traffic, monitoring | Auto -> PRODUCTION on success (24h) |
| `PRODUCTION` | 100% traffic, active serving | Manual -> ARCHIVED on new version deploy |
| `ARCHIVED` | Kept for rollback, no traffic | Auto -> ROLLBACK_AVAILABLE after 30 days |
| `ROLLBACK_AVAILABLE` | Can be rolled back to | Manual -> PRODUCTION on rollback trigger |
| `DEPRECATED` | Cannot be rolled back to | Final state |
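The state table above is a finite state machine; encoding the allowed transitions explicitly makes illegal moves impossible to trigger by accident. A sketch (the expiry path from ROLLBACK_AVAILABLE to DEPRECATED is an assumption; the table does not specify how a version enters DEPRECATED):

```python
# Allowed transitions from the version-state table (auto and manual combined).
TRANSITIONS = {
    "TRAINING": {"STAGING"},
    "STAGING": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"CANARY"},
    "CANARY": {"PRODUCTION"},
    "PRODUCTION": {"ARCHIVED"},
    "ARCHIVED": {"ROLLBACK_AVAILABLE"},
    "ROLLBACK_AVAILABLE": {"PRODUCTION", "DEPRECATED"},  # DEPRECATED entry assumed
    "DEPRECATED": set(),  # terminal state
}

def can_transition(current: str, target: str) -> bool:
    """Validate a requested state change against the transition table."""
    return target in TRANSITIONS.get(current, set())
```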
8.7.3 Rollback Procedure
+-----------------------------------------------------------------------------+
| EMERGENCY ROLLBACK PROCEDURE |
+-----------------------------------------------------------------------------+
| |
| Trigger: Admin initiates rollback or automatic rollback on failure |
| |
| Step 1: Validate target version exists and is in ROLLBACK_AVAILABLE state |
| Step 2: Load target model artifacts from S3/MinIO (pre-warm GPU) |
| Step 3: Atomic switch: update model reference in Triton config |
| Step 4: Triton SIGHUP reload (zero-downtime model swap) |
| Step 5: Validate: send test inference requests, check latency |
| Step 6: If validation fails -> auto-revert to previous production |
| Step 7: If validation passes -> update database model version records |
| Step 8: Log rollback event in audit_logs |
| |
| Maximum rollback time: < 60 seconds |
| Zero inference downtime during rollback |
| |
+-----------------------------------------------------------------------------+
8.8 Quality Gates
8.8.1 Gate Thresholds
| Gate | Metric | Minimum | Maximum | Critical |
|---|---|---|---|---|
| Hold-out Evaluation | Precision | 0.97 | — | Yes (cannot override) |
| Hold-out Evaluation | Recall | 0.95 | — | Yes |
| Hold-out Evaluation | F1 Score | 0.96 | — | Yes |
| No Regression | Metric regression vs production | — | 2% | No (admin can override) |
| Identity Accuracy | Known identity recall | 100% | — | Yes |
| Latency | P99 inference latency | — | 150 ms | Yes |
| Confusion Analysis | False positive rate | — | 5% | No |
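The gate thresholds above can be evaluated with a direct check per row; the sketch below (hypothetical function and metric key names) also tracks whether any critical gate failed, since critical failures cannot be overridden by an admin:

```python
def evaluate_gates(metrics: dict) -> dict:
    """Apply the gate thresholds from the table above.
    Each entry maps gate name -> (passed, is_critical)."""
    gates = {
        "precision":       (metrics["precision"] >= 0.97, True),
        "recall":          (metrics["recall"] >= 0.95, True),
        "f1":              (metrics["f1"] >= 0.96, True),
        "no_regression":   (metrics["max_regression_pct"] <= 2.0, False),
        "identity_recall": (metrics["known_identity_recall"] >= 1.0, True),
        "p99_latency":     (metrics["p99_latency_ms"] <= 150, True),
        "false_positive":  (metrics["fp_rate_pct"] <= 5.0, False),
    }
    critical_failure = any(not ok for ok, critical in gates.values() if critical)
    return {"gates": {name: ok for name, (ok, _) in gates.items()},
            "critical_failure": critical_failure,
            "overall": all(ok for ok, _ in gates.values())}

# Values taken from the example gate report in Section 8.8.2
report = evaluate_gates({"precision": 0.9842, "recall": 0.9678, "f1": 0.9759,
                         "max_regression_pct": 0.8, "known_identity_recall": 1.0,
                         "p99_latency_ms": 128, "fp_rate_pct": 2.1})
```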
8.8.2 Quality Gate Report Example
{
"gate_run_id": "550e8400-e29b-41d4-a716-446655440000",
"candidate_model_version": "1.2.0",
"baseline_model_version": "1.1.0",
"timestamp": "2024-01-25T10:30:00Z",
"overall_result": "PASSED",
"gates": [
{
"name": "holdout_performance",
"status": "PASSED",
"critical": true,
"metrics": {
"precision": 0.9842,
"recall": 0.9678,
"f1_score": 0.9759
}
},
{
"name": "no_regression",
"status": "PASSED",
"metrics": {
"max_regression_pct": 0.8,
"per_metric": {
"precision": 0.003,
"recall": -0.008,
"f1_score": -0.002
}
}
},
{
"name": "known_identity_accuracy",
"status": "PASSED",
"metrics": {
"known_identities_tested": 142,
"perfect_accuracy": 142,
"accuracy_below_threshold": 0
}
},
{
"name": "latency_requirement",
"status": "PASSED",
"metrics": {
"p50_latency_ms": 45,
"p99_latency_ms": 128,
"threshold_ms": 150
}
}
]
}
8.8.3 Embedding Update Strategies
After a model passes quality gates and is deployed, the face embedding database must be updated. Five strategies are available:
| Strategy | When to Use | Duration | Impact |
|---|---|---|---|
| Centroid Update | Few new examples (<10 per identity), same model | Seconds | Update running mean only |
| Incremental Add | Many new examples (10-100 per identity), same model | Minutes | Add new embeddings, keep existing |
| Full Reindex | Model version changed, or >10% of identities updated | Hours | Recompute all embeddings |
| Merge and Update | Identity merge operation | Seconds | Weighted centroid merge |
| Rollback Reindex | Model rollback | Minutes | Restore previous embeddings |
Decision Matrix:
+-----------------------------------------------------------------------------+
| EMBEDDING UPDATE STRATEGY SELECTION |
+-----------------------------------------------------------------------------+
| |
| Model changed? |
| | |
| +-- YES -> FULL_REINDEX (required, embeddings are model-dependent) |
| | |
| NO -> What changed? |
| | |
| +-- Identity merge -> MERGE_AND_UPDATE |
| | |
| +-- Rollback -> ROLLBACK_REINDEX |
| | |
| +-- New examples? |
| | |
| +-- < 10 per identity, < 10% total -> CENTROID_UPDATE |
| | |
| +-- Otherwise -> INCREMENTAL_ADD |
| |
+-----------------------------------------------------------------------------+
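The decision matrix above translates into a short ordered cascade; a sketch with hypothetical parameter names:

```python
def choose_update_strategy(model_changed: bool, identity_merge: bool,
                           is_rollback: bool, new_per_identity: int,
                           pct_identities_updated: float) -> str:
    """Direct translation of the embedding-update decision matrix above."""
    if model_changed:
        return "FULL_REINDEX"        # required: embeddings are model-dependent
    if identity_merge:
        return "MERGE_AND_UPDATE"    # weighted centroid merge
    if is_rollback:
        return "ROLLBACK_REINDEX"    # restore previous embeddings
    if new_per_identity < 10 and pct_identities_updated < 10.0:
        return "CENTROID_UPDATE"     # cheap running-mean update
    return "INCREMENTAL_ADD"         # add new embeddings, keep existing
```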
Reference: For complete model export commands, INT8 calibration scripts, performance benchmarks, and the full Python module structure, see
ai_vision.md — Sections 10-14. For the complete training pipeline code, Airflow DAG definitions, and quality gate implementations, see training_system.md — Sections 5-10.
Section 9: Suspicious Activity Night-Mode Design
9.1 Overview
The suspicious activity detection system provides comprehensive behavioral analysis during night hours (22:00-06:00 by default) through 10 specialized detection modules. Each module operates on the output of the AI inference pipeline (detected persons, tracked positions, and face identities) to identify anomalous behavior patterns.
The system features a composite scoring engine that combines signals from all modules with exponential time-decay, enabling unified threat assessment and intelligent escalation. Each camera can be independently configured with custom zones, thresholds, and schedules.
9.2 Ten Detection Modules Summary
| # | Module | Description | Severity | Key CV Model |
|---|---|---|---|---|
| 1 | Intrusion Detection | Detects persons entering restricted polygon zones | HIGH (default) | YOLO11m detections + zone polygon |
| 2 | Loitering Detection | Flags persons dwelling in an area longer than threshold | MEDIUM (default) | ByteTrack + timer per track |
| 3 | Running Detection | Identifies abnormally fast movement | MEDIUM (default) | YOLOv8n-pose + optical flow speed |
| 4 | Crowding Detection | Alerts when group density exceeds threshold | HIGH (default) | DBSCAN spatial clustering |
| 5 | Fall Detection | Detects persons falling or collapsing | CRITICAL | YOLOv8n-pose keypoint analysis |
| 6 | Abandoned Object | Identifies unattended objects left behind | HIGH (default) | YOLOv8s + MOG2 background subtraction |
| 7 | After-Hours Presence | Detects any person presence during night hours | MEDIUM (default) | YOLO11m person class only |
| 8 | Zone Breach | Triggers on crossing virtual boundary lines | MEDIUM (default) | ByteTrack + line crossing algorithm |
| 9 | Repeated Re-entry | Flags patterns of entering/exiting an area multiple times | MEDIUM (default) | ByteTrack + entry/exit state machine |
| 10 | Suspicious Dwell Time | Alerts on extended presence near sensitive areas | MEDIUM (configurable) | ByteTrack + per-zone timers |
9.3 Module Details
9.3.1 Module 1: Intrusion Detection
Detects when a person enters a user-defined restricted polygon zone.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `confidence_threshold` | 0.55 | 0.3-0.9 | Minimum person detection confidence |
| `overlap_threshold` | 0.30 | 0.1-0.9 | Min IoU between person bbox and zone |
| `cooldown_seconds` | 60 | 0-3600 | Cooldown before re-alerting same zone |
| `zone_severity` | HIGH | LOW/MEDIUM/HIGH | Per-zone configurable |
Algorithm:
For each detected person:
For each restricted zone polygon:
Compute IoU(person_bbox, zone_polygon)
If IoU > overlap_threshold AND confidence > confidence_threshold:
If zone not in cooldown:
Trigger INTRUSION alert
Start cooldown timer
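The spec above uses the IoU between the person bbox and the zone polygon. A common lightweight variant tests whether the person's foot point (bottom-center of the bbox) falls inside the polygon; a pure-Python ray-casting sketch of that simplified check (not the full IoU computation):

```python
def point_in_polygon(x: float, y: float, polygon) -> bool:
    """Ray-casting test: count polygon edges crossed by a horizontal ray
    cast rightward from (x, y). `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

zone = [(0, 0), (10, 0), (10, 10), (0, 10)]   # square restricted zone
foot_point = (5.0, 9.5)                        # bottom-center of a person bbox
breach = point_in_polygon(*foot_point, zone)
```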
9.3.2 Module 2: Loitering Detection
Flags persons who remain in an area longer than a threshold.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `dwell_time_threshold_seconds` | 300 | 30-1800 | Time before triggering loitering alert |
| `movement_tolerance_pixels` | 50 | 10-200 | Max centroid movement to still count as "stationary" |
| `cooldown_seconds` | 300 | 0-3600 | Cooldown after alert |
Algorithm:
For each active track:
If track centroid moved < tolerance in last N seconds:
Increment dwell timer
If dwell_timer > threshold:
Trigger LOITERING alert
Reset timer (or hold until movement detected)
Else:
Reset dwell timer
9.3.3 Module 3: Running Detection
Identifies abnormally fast movement using pose keypoints and optical flow.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `speed_threshold_pixels_per_second` | 150 | 50-500 | Pixel speed threshold |
| `speed_threshold_kmh` | 15.0 | 5-40 | Real-world speed (requires calibration) |
| `confirmation_frames` | 3 | 1-10 | Consecutive frames to confirm running |
Algorithm:
For each active track:
Compute torso keypoint displacement between frames
Convert pixel speed to km/h (if calibration available)
Apply Farneback optical flow for refinement
If speed > threshold for confirmation_frames:
Trigger RUNNING alert
9.3.4 Module 4: Crowding Detection
Alerts when person group density exceeds threshold.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `count_threshold` | 5 | 2-50 | Minimum person count in cluster |
| `area_threshold` | 0.15 | 0.05-0.5 | Fraction of frame covered by group |
| `density_threshold` | 0.05 | 0.01-0.2 | Persons per square meter (calibrated) |
| `dbscan_eps` | 0.08 | 0.01-0.3 | DBSCAN neighborhood radius (normalized) |
Algorithm:
Collect all person centroids in current frame
Run DBSCAN(eps=0.08, min_samples=2) on centroids
For each cluster:
If cluster_size >= count_threshold OR cluster_area >= area_threshold:
Trigger CROWDING alert
9.3.5 Module 5: Fall Detection
Detects persons falling or collapsing using pose keypoint analysis.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `fall_score_threshold` | 0.75 | 0.5-0.95 | Combined fall confidence score |
| `min_keypoint_confidence` | 0.30 | 0.1-0.5 | Minimum keypoint detection confidence |
| `torso_angle_threshold_deg` | 45 | 30-75 | Torso angle from vertical to trigger |
| `aspect_ratio_threshold` | 1.2 | 0.8-2.0 | Width/height ratio of person bbox |
| `temporal_confirmation_ms` | 1000 | 500-3000 | Duration to confirm fall (not just a bend) |
Algorithm:
For each detected person with pose keypoints:
Compute torso angle from vertical (using shoulder-hip line)
Compute bbox aspect ratio
Check if person is on ground (feet keypoint confidence drops)
Calculate fall_score = weighted_combination(angle, aspect_ratio, ground_contact)
If fall_score > threshold AND duration > confirmation_ms:
Trigger FALL alert (CRITICAL severity)
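The torso-angle term of the fall score is plain geometry on the shoulder and hip keypoint midpoints. A sketch (function name is illustrative; image coordinates assumed, with y growing downward):

```python
import math

def torso_angle_from_vertical(shoulder_mid, hip_mid) -> float:
    """Angle in degrees between the shoulder-to-hip line and the vertical
    image axis. An upright torso yields ~0 deg; a person lying down
    approaches 90 deg, crossing the 45 deg default threshold."""
    dx = hip_mid[0] - shoulder_mid[0]
    dy = hip_mid[1] - shoulder_mid[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

upright = torso_angle_from_vertical((100, 50), (100, 150))   # vertical torso
fallen = torso_angle_from_vertical((100, 100), (190, 110))   # near-horizontal torso
```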
9.3.6 Module 6: Abandoned Object Detection
Identifies unattended objects using background subtraction and object detection.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `unattended_time_threshold_seconds` | 60 | 10-600 | Time before object is considered abandoned |
| `proximity_threshold_pixels` | 100 | 20-300 | Max distance from owner before "unattended" |
| `watchlist_classes` | ["backpack", "suitcase", "box", "bag"] | — | Object classes to monitor |
| `bg_learning_rate` | 0.005 | 0.001-0.01 | MOG2 background model learning rate |
Algorithm:
Run YOLOv8s to detect objects in watchlist_classes
Run MOG2 background subtraction to identify static foreground
For each detected object:
Track owner proximity (nearest person)
If owner distance > threshold AND object stationary > time_threshold:
Trigger ABANDONED_OBJECT alert
9.3.7 Module 7: After-Hours Presence
Simple but effective: any person detected during night hours triggers an alert.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `detection_confidence_threshold` | 0.50 | 0.3-0.9 | Minimum person detection confidence |
| `min_detection_frames` | 5 | 1-30 | Frames to confirm (avoid false positives) |
| `check_authorized_personnel` | false | true/false | If true, check against known persons whitelist |
9.3.8 Module 8: Zone Breach
Detects crossing of virtual boundary lines (directional or bidirectional).
| Parameter | Default | Range | Description |
|---|---|---|---|
| `boundary_lines` | [] (user-defined) | — | Array of {start, end, direction, severity} |
| `allowed_direction` | "both" | both/a_to_b/b_to_a | Which direction is allowed |
| `crossing_threshold_pixels` | 20 | 5-100 | Min distance past line to trigger |
| `cooldown_seconds` | 30 | 0-3600 | Cooldown per (track, line) pair |
Algorithm:
For each active track:
For each boundary line:
Check if track centroid crosses line in forbidden direction
Using line equation: ax + by + c = 0, check sign change
If crossed AND distance_past_line > threshold:
Trigger ZONE_BREACH alert
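The sign-change test in the algorithm above is the 2-D cross product, which is equivalent to evaluating the line equation ax + by + c = 0 at a point. A sketch (function names are illustrative; the distance-past-line threshold is omitted for brevity):

```python
def line_side(p, a, b) -> float:
    """Signed position of point p relative to the directed line a -> b:
    positive on one side, negative on the other, zero on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed(prev_pt, curr_pt, a, b) -> bool:
    """A track crosses the boundary when consecutive centroids lie on
    opposite sides of the line (sign change between frames)."""
    return line_side(prev_pt, a, b) * line_side(curr_pt, a, b) < 0

# Vertical boundary from (5, 0) to (5, 10); track centroid moves across it
did_cross = crossed((3, 5), (7, 5), (5, 0), (5, 10))
```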
9.3.9 Module 9: Repeated Re-entry Patterns
Detects suspicious patterns of entering and exiting an area multiple times.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `reentry_zone` | Full frame | polygon | Area to monitor for entries/exits |
| `time_window_seconds` | 600 | 60-3600 | Time window for counting cycles |
| `reentry_threshold` | 3 | 2-10 | Min entry/exit cycles to trigger |
| `min_cycle_duration_seconds` | 30 | 5-300 | Min duration of one cycle |
State Machine:
For each track:
Track state: OUTSIDE -> ENTERING -> INSIDE -> EXITING -> OUTSIDE
Each complete cycle (entry + exit) increments counter
If cycle_count >= threshold within time_window:
Trigger REENTRY_PATTERN alert
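The state machine above can be sketched as a small per-track counter with a sliding time window (class name is hypothetical; the `min_cycle_duration_seconds` check is omitted for brevity):

```python
class ReentryCounter:
    """Counts complete entry/exit cycles for one track within a sliding
    time window, per the state machine above."""

    def __init__(self, threshold: int = 3, window_s: float = 600.0):
        self.inside = False
        self.cycles = []          # timestamps of completed cycles
        self.threshold = threshold
        self.window_s = window_s

    def update(self, now: float, in_zone: bool) -> bool:
        """Feed one observation; returns True when the alert should fire."""
        if in_zone and not self.inside:
            self.inside = True                 # OUTSIDE -> INSIDE
        elif not in_zone and self.inside:
            self.inside = False                # INSIDE -> OUTSIDE: one cycle done
            self.cycles.append(now)
        # Drop cycles that fell out of the sliding window
        self.cycles = [t for t in self.cycles if now - t <= self.window_s]
        return len(self.cycles) >= self.threshold

rc = ReentryCounter(threshold=2, window_s=600)
events = [(0, True), (40, False), (80, True), (120, False)]  # two full cycles
fired = any(rc.update(t, in_zone) for t, in_zone in events)
```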
9.3.10 Module 10: Suspicious Dwell Time
Extended presence near sensitive areas (different from general loitering).
| Parameter | Default | Range | Description |
|---|---|---|---|
| `sensitive_zones` | [] (user-defined) | — | Zones with custom dwell thresholds |
| `default_dwell_threshold_seconds` | 120 | 10-1800 | Default threshold |
| `max_gap_seconds` | 5.0 | 1.0-30.0 | Max disappearance gap before timer reset |
Predefined zone types with default thresholds:
| Zone Type | Default Threshold | Default Severity |
|---|---|---|
| `main_entrance` | 60s | MEDIUM |
| `emergency_exit` | 30s | HIGH |
| `equipment_room` | 45s | HIGH |
| `storage_area` | 120s | MEDIUM |
| `elevator_bank` | 90s | LOW |
| `parking_access` | 60s | MEDIUM |
9.4 Activity Scoring Engine
9.4.1 Composite Score Formula
All 10 modules feed into a unified scoring engine that produces a single suspicious activity score per camera:
S_total(t) = SUM_i( weight_i * signal_i(t) * decay(t - t_i) ) + bonus_cross_module
Where:
weight_i: module-specific weight (see table below)
signal_i(t): normalized signal value from module i [0, 1]
decay(delta_t): exponential time-decay function
bonus_cross_module: extra score when multiple modules fire simultaneously
t_i: timestamp of most recent event from module i
9.4.2 Module Weights
| Module | Weight | Signal Source | Signal Range |
|---|---|---|---|
| Intrusion Detection | 0.25 | overlap_ratio * confidence | 0.0 - 1.0 |
| Loitering Detection | 0.15 | dwell_ratio (dwell_time / threshold) | 0.0 - 1.0+ |
| Running Detection | 0.10 | speed_ratio normalized | 0.0 - 1.0+ |
| Crowding Detection | 0.12 | crowd_density_score | 0.0 - 1.0 |
| Fall Detection | 0.20 | fall_confidence_score | 0.0 - 1.0 |
| Abandoned Object | 0.18 | unattended_ratio (duration / threshold) | 0.0 - 1.0+ |
| After-Hours Presence | 0.05 | binary (1 if detected) * zone_severity_multiplier | 0.0 - 1.0 |
| Zone Breach | 0.12 | severity_mapped (LOW=0.3, MED=0.6, HIGH=1.0) | 0.0 - 1.0 |
| Re-entry Patterns | 0.10 | cycle_ratio (count / threshold) | 0.0 - 1.0+ |
| Suspicious Dwell | 0.13 | dwell_ratio (duration / zone_threshold) | 0.0 - 1.0+ |
Note: Weights sum to 1.40 — this is intentional to allow cross-module amplification when multiple modules fire simultaneously.
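Combining the formula, the weight table, and the decay function, a minimal scoring sketch might look like the following. The module keys and the shape of `events` are illustrative, not the production schema.

```python
import math

WEIGHTS = {  # from the module weights table above
    "intrusion": 0.25, "loitering": 0.15, "running": 0.10, "crowding": 0.12,
    "fall": 0.20, "abandoned_object": 0.18, "after_hours": 0.05,
    "zone_breach": 0.12, "reentry": 0.10, "suspicious_dwell": 0.13,
}

def time_decay(delta_t, half_life=300):
    """Exponential decay with a 5-minute half-life."""
    return math.exp(-0.693 * delta_t / half_life)

def composite_score(events, now, cross_module_bonus=0.0):
    """events: {module_name: (signal, event_timestamp)} — latest event per module."""
    score = sum(
        WEIGHTS[module] * signal * time_decay(now - ts)
        for module, (signal, ts) in events.items()
    )
    return score + cross_module_bonus
```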
9.4.3 Time-Decay Function
import math

def time_decay(delta_t_seconds, half_life=300):
    """Exponential decay with 5-minute half-life by default."""
    return math.exp(-0.693 * delta_t_seconds / half_life)

# Decay reference:
#   0 min -> 1.000 (full contribution)
#   1 min -> 0.871
#   5 min -> 0.500
#  10 min -> 0.250
#  20 min -> 0.063
#  30 min -> 0.016 (effectively zero)
9.4.4 Cross-Module Amplification Bonus
When multiple modules detect simultaneously for the same track or in close proximity:
def compute_cross_module_bonus(active_signals, n_same_track_signals=0,
                               n_same_zone_signals=0, proximity_weight=0.15):
    n_modules = len(active_signals)
    if n_modules <= 1:
        return 0.0
    # Base bonus: +15% per additional module
    base_bonus = proximity_weight * (n_modules - 1)
    # Track overlap: same person triggering multiple rules -> higher threat
    track_bonus = 0.10 * (n_same_track_signals - 1) if n_same_track_signals >= 2 else 0.0
    # Zone overlap: multiple signals in same zone -> higher threat
    zone_bonus = 0.08 * (n_same_zone_signals - 1) if n_same_zone_signals >= 2 else 0.0
    return min(base_bonus + track_bonus + zone_bonus, 0.50)  # Cap at +0.50
9.4.5 Escalation Thresholds
| Score Range | Threat Level | Color | Actions |
|---|---|---|---|
| 0.00 - 0.20 | NONE | Gray | Log only, no alert |
| 0.20 - 0.40 | LOW | Blue | Log + dashboard indicator |
| 0.40 - 0.60 | MEDIUM | Yellow | Log + non-urgent alert dispatch |
| 0.60 - 0.80 | HIGH | Orange | Log + immediate alert + highlight |
| 0.80 - 1.00 | CRITICAL | Red | Log + all channels + security dispatch recommendation |
| > 1.00 | EMERGENCY | Purple/Flashing | All channels + automatic escalation to security lead |
9.5 Night Mode Scheduler
9.5.1 Automatic Schedule
| Parameter | Default | Configurable |
|---|---|---|
| Start time | 22:00 (10 PM) | Yes, per camera |
| End time | 06:00 (6 AM) | Yes, per camera |
| Gradual transition | 15 minutes | Yes (0-60 min) |
| Timezone | Local site timezone | Yes |
| Override | Manual toggle available | Admin only |
9.5.2 Gradual Transition
During the 15-minute transition window, sensitivity ramps linearly:
Transition Start (21:45)      Night Full (22:00)      Transition End (22:15)
        |                             |                          |
        v                             v                          v
Sensitivity:  0% --- 25% --- 50% --- 75% --- 100% ------ 100% ------ 100%
             |______|______|______|______|
               Ramp up to full night sensitivity over 15 minutes
This prevents sudden spikes in alerts when night mode activates.
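The linear ramp reduces to a one-line expression. This sketch assumes the ramp runs from 21:45 to 22:00 as in the diagram above; the function name is ours.

```python
def night_sensitivity(now_minutes, start_minutes=21 * 60 + 45, ramp_minutes=15):
    """Linear ramp from 0% to 100% night sensitivity over the transition window.

    now_minutes: minutes since midnight (float).
    """
    elapsed = now_minutes - start_minutes
    if elapsed <= 0:
        return 0.0                          # day mode, before the transition
    return min(elapsed / ramp_minutes, 1.0)  # clamp at full night sensitivity
```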
9.5.3 Night Mode Behavior Changes
| Aspect | Day Mode | Night Mode |
|---|---|---|
| Detection modules | Intrusion, Crowding, Fall, Abandoned Object | All 10 modules active |
| AI Vibe preset | Per-camera setting | Automatically Strict |
| Confidence threshold | Per-camera setting | +0.10 (stricter) |
| Scoring engine weights | Standard weights | +25% intrusion, +20% fall |
| Alert suppression | 5-minute cooldown | 2-minute cooldown (faster alerts) |
| After-hours detection | Disabled | Enabled (primary night function) |
9.6 Per-Camera Configuration
Each camera has independent configuration for all detection modules:
# Example: Camera 1 - Main Entrance
cam_01:
  enabled: true
  location: "Main Entrance Lobby"
  night_mode:
    enabled: true
    custom_schedule: null          # Use system default (22:00-06:00)
    sensitivity_multiplier: 1.0    # Standard sensitivity
  intrusion_detection:
    enabled: true
    confidence_threshold: 0.65
    overlap_threshold: 0.30
    cooldown_seconds: 30
    restricted_zones:
      - zone_id: "server_room_door"
        polygon: [[0.65,0.20], [0.85,0.20], [0.85,0.60], [0.65,0.60]]
        severity: "HIGH"
  loitering_detection:
    enabled: true
    dwell_time_threshold_seconds: 300
    movement_tolerance_pixels: 50
  running_detection:
    enabled: true
    speed_threshold_pixels_per_second: 150
    confirmation_frames: 3
  fall_detection:
    enabled: true
    fall_score_threshold: 0.75
    temporal_confirmation_ms: 1000
  # ... (all 10 modules configured)
9.7 Alert Generation Logic
9.7.1 Alert Lifecycle
+------------+ +------------+ +------------+ +------------+
| DETECTED | -> | SUPPRESSED | -> | EVIDENCE | -> | DISPATCHED |
| (Rule fire)| | (Dedup) | | (Capture) | | (Send) |
+------------+ +------------+ +------------+ +------------+
|
v
+------------+
| ACKNOWLEDGE|
| or AUTO |
+------------+
9.7.2 Suppression Rules
| Condition | Action | Reason |
|---|---|---|
| Duplicate within suppression window | Log + increment counter | Prevent alert spam |
| Detection confidence < rule minimum | Log only | Insufficient evidence |
| Threat score < LOW threshold | Log only | Below alert threshold |
| Max alerts/hour for camera exceeded | Log + rate-limit flag | Prevent overflow |
| Composite score indicates low overall threat | Log + dashboard only | Reduce noise |
9.7.3 Suppression Configuration
| Parameter | Default | Range |
|---|---|---|
| Default suppression window | 5 minutes | 0-60 minutes |
| Max alerts per hour per camera | 20 | 5-100 |
| Max alerts per hour per rule | 10 | 5-50 |
| Evidence snapshot frames before | 5 frames | 1-30 |
| Evidence snapshot frames after | 10 frames | 1-30 |
| Evidence clip duration | 10 seconds | 5-60 |
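A minimal single-camera sketch of the dedup and rate-limit rules above; the production service tracks these per camera and per rule, and all names here are illustrative.

```python
from collections import deque

class AlertSuppressor:
    """Dedup (suppression window) + hourly rate limit, per the defaults above."""

    def __init__(self, window_seconds=300, max_per_hour=20):
        self.window = window_seconds
        self.max_per_hour = max_per_hour
        self.last_fired = {}   # (camera_id, rule) -> last dispatch timestamp
        self.recent = deque()  # dispatch timestamps for this camera

    def should_dispatch(self, camera_id, rule, now):
        key = (camera_id, rule)
        # Duplicate within suppression window -> log only, increment counter
        if key in self.last_fired and now - self.last_fired[key] < self.window:
            return False
        # Hourly rate limit: drop timestamps older than one hour, then check cap
        while self.recent and now - self.recent[0] > 3600:
            self.recent.popleft()
        if len(self.recent) >= self.max_per_hour:
            return False
        self.last_fired[key] = now
        self.recent.append(now)
        return True
```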
9.7.4 Severity Assignment
Final alert severity considers both the triggering module and the composite score context:
def assign_alert_severity(detection_event, composite_score):
    base_severity = detection_event['severity']  # From module config
    severity_levels = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3, 'CRITICAL': 4}
    base_level = severity_levels.get(base_severity, 2)
    # Escalation: high composite score bumps a LOW/MEDIUM severity up one level
    if composite_score >= 0.80 and base_level < 3:
        base_level = min(base_level + 1, 4)
    # Escalation: multiple concurrent detections for the same track
    if detection_event.get('concurrent_detections_count', 0) >= 2:
        base_level = min(base_level + 1, 4)
    # Zone-specific escalation override
    if detection_event.get('zone_severity_override'):
        zone_level = severity_levels.get(detection_event['zone_severity_override'], base_level)
        base_level = max(base_level, zone_level)
    reverse_levels = {v: k for k, v in severity_levels.items()}
    return reverse_levels.get(base_level, 'MEDIUM')
9.8 Integration with Main AI Pipeline
The suspicious activity service consumes detection events from the main AI pipeline:
+-----------------------------------------------------------------------------+
| SUSPICIOUS ACTIVITY INTEGRATION WITH MAIN PIPELINE |
+-----------------------------------------------------------------------------+
| |
| Main AI Pipeline Output: |
| { person_id, track_id, bbox, keypoints, face_embedding, timestamp, |
| camera_id, confidence, face_crop_path } |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Kafka Topic | -> | Suspicious Activity| -> | Scoring Engine | |
| | ai.detections | | Service | | (per camera) | |
| | (JSON events) | | - 10 modules | | - Composite score | |
| +-------------------+ | - Per-camera config| | - Time decay | |
| | - Zone polygons | | - Cross-module | |
| +-------------------+ | bonus | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | Alert Manager | <- | Scoring Output | |
| | - Deduplicate | | - Score [0, 1.5] | |
| | - Rate limit | | - Threat level | |
| | - Severity assign | | - Active signals | |
| +---------+---------+ +-------------------+ |
| | |
| v |
| +-------------------+ |
| | Alerts Table (DB) | |
| | Notification Svc | |
| +-------------------+ |
| |
+-----------------------------------------------------------------------------+
Key integration points:
- Suspicious Activity Service is a Kafka consumer on the `ai.detections` topic
- Processes events after face recognition (has access to person identity)
- Produces alert records to the `alerts.critical` topic for notification dispatch
- Updates the composite score in Redis (with TTL = 2 * half_life) for real-time dashboard display
- Stores all alert records in PostgreSQL for history and analytics
Reference: For complete detection algorithm pseudocode, zone configuration YAML schema, scoring engine implementation, and evidence capture logic, see `suspicious_activity.md` — Sections 2-6.
Section 10: Live Video Streaming Design
10.1 RTSP Stream Configuration for CP PLUS DVR
10.1.1 URL Format
The CP PLUS ORANGE DVR uses a Dahua-compatible RTSP URL scheme:
rtsp://admin:{password}@{dvr_ip}:554/cam/realmonitor?channel={N}&subtype={M}
Where:
- N = channel number (1-8)
- M = stream type (0 = main stream, 1 = sub stream)
Example URLs for all 8 channels:
| Channel | Main Stream | Sub Stream |
|---|---|---|
| CH1 | rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0 | rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=1 |
| CH2 | ...channel=2&subtype=0 | ...channel=2&subtype=1 |
| CH3 | ...channel=3&subtype=0 | ...channel=3&subtype=1 |
| CH4 | ...channel=4&subtype=0 | ...channel=4&subtype=1 |
| CH5 | ...channel=5&subtype=0 | ...channel=5&subtype=1 |
| CH6 | ...channel=6&subtype=0 | ...channel=6&subtype=1 |
| CH7 | ...channel=7&subtype=0 | ...channel=7&subtype=1 |
| CH8 | ...channel=8&subtype=0 | ...channel=8&subtype=1 |
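The URL scheme can also be generated programmatically. A small helper sketch, with defaults taken from this document's DVR settings (the function name is ours):

```python
def rtsp_url(channel, substream=False, host="192.168.29.200",
             user="admin", password="password", port=554):
    """Build a CP PLUS / Dahua-style RTSP URL for the given channel."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be 1-8")
    subtype = 1 if substream else 0  # 0 = main stream, 1 = sub stream
    return (f"rtsp://{user}:{password}@{host}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")
```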
10.1.2 Stream Properties
| Property | Main Stream (subtype=0) | Sub Stream (subtype=1) |
|---|---|---|
| Resolution | 960 x 1080 | 352 x 288 to 704 x 576 |
| Frame rate | 25 FPS (PAL) | 25 FPS |
| Video codec | H.264 High Profile | H.264 Baseline/Main |
| Bitrate | ~4 Mbps per channel | ~1 Mbps per channel |
| Audio | G.711/AAC (optional) | None |
| Use case | Fullscreen viewing, evidence clips | AI inference, multi-camera grid |
10.1.3 Stream Discovery
The edge gateway can auto-discover streams via ONVIF:
from onvif import ONVIFCamera

camera = ONVIFCamera('192.168.29.200', 80, 'admin', 'password')
media_service = camera.create_media_service()
profiles = media_service.GetProfiles()
for profile in profiles:
    stream_uri = media_service.GetStreamUri({
        'StreamSetup': {'Stream': 'RTP_unicast', 'Transport': 'RTSP'},
        'ProfileToken': profile.token
    })
    print(f"Channel: {profile.token}, URI: {stream_uri.Uri}")
10.2 Edge Gateway Stream Handling
10.2.1 FFmpeg Ingestion Pipeline
The edge gateway runs one FFmpeg process per camera stream:
# Main stream: HLS generation for live viewing
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp -stimeout 5000000 \
-fflags +genpts+discardcorrupt+igndts+ignidx \
-reorder_queue_size 64 -buffer_size 655360 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-c:v copy -c:a copy \
-f hls -hls_time 2 -hls_list_size 5 -hls_delete_threshold 2 \
-hls_flags delete_segments+omit_endlist+program_date_time \
-hls_segment_filename "/data/hls/ch1_%04d.ts" \
"/data/hls/ch1.m3u8" \
2>> /var/log/ffmpeg_ch1.log
10.2.2 Stream Health Monitoring
| Check | Frequency | Failure Action |
|---|---|---|
| FFmpeg process alive | Every 5s | Restart process |
| RTSP connection health | Every 10s | Reconnect with backoff |
| Frame rate validation | Every 30s | Alert if FPS < 20 |
| Bitrate validation | Every 30s | Alert if bitrate < 50% expected |
| Disk space check | Every 60s | Alert if < 10% free, emergency if < 5% |
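The table's failure conditions can be folded into one evaluation step. A minimal sketch; the alert codes and function name are illustrative:

```python
def evaluate_stream_health(fps, bitrate_bps, expected_bitrate_bps, free_disk_pct):
    """Map the monitoring checks above to a list of alert codes."""
    alerts = []
    if fps < 20:
        alerts.append("LOW_FPS")                 # frame rate validation
    if bitrate_bps < 0.5 * expected_bitrate_bps:
        alerts.append("LOW_BITRATE")             # bitrate < 50% of expected
    if free_disk_pct < 5:
        alerts.append("DISK_EMERGENCY")          # < 5% free
    elif free_disk_pct < 10:
        alerts.append("DISK_LOW")                # < 10% free
    return alerts
```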
10.2.3 Auto-Reconnect Logic
import random

class StreamReconnectManager:
    """Handles RTSP stream reconnection with exponential backoff."""
    INITIAL_BACKOFF = 1.0      # seconds
    MAX_BACKOFF = 60.0         # seconds
    BACKOFF_MULTIPLIER = 2.0
    JITTER = 0.1               # 10% random jitter

    def __init__(self):
        self.consecutive_failures = 0

    def on_disconnect(self):
        """Return the wait time before the next reconnect attempt."""
        self.consecutive_failures += 1
        # 1s, 2s, 4s, 8s, ... capped at MAX_BACKOFF
        wait_time = min(
            self.INITIAL_BACKOFF * (self.BACKOFF_MULTIPLIER ** (self.consecutive_failures - 1)),
            self.MAX_BACKOFF
        )
        # Add jitter to prevent thundering herd
        wait_time *= (1 + random.uniform(-self.JITTER, self.JITTER))
        return wait_time

    def on_success(self):
        self.consecutive_failures = 0

    def should_circuit_break(self):
        return self.consecutive_failures >= 5  # Open circuit after 5 failures
10.3 HLS Generation for Dashboard
10.3.1 HLS Segment Configuration
| Parameter | Value | Rationale |
|---|---|---|
| Segment duration (`-hls_time`) | 2 seconds | Balance between latency and segment count |
| Playlist size (`-hls_list_size`) | 5 segments | 10-second sliding window for live playback |
| Delete threshold | 2 segments beyond playlist size | Disk cleanup |
| Flags | `delete_segments+omit_endlist+program_date_time` | Live mode, no end list, accurate timing |
| Segment naming | `ch{N}_%04d.ts` | Sequential numbering for cache busting |
| Segment path | `/data/hls/` | Fast NVMe storage |
10.3.2 Multi-Bitrate HLS (Optional)
For adaptive bitrate streaming, three variants are generated per channel:
# High quality (main stream, copy codec)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v copy -f hls -hls_time 2 \
  -hls_list_size 5 -hls_flags delete_segments \
  -hls_segment_filename "ch1_high_%04d.ts" "ch1_high.m3u8"
# Medium quality (transcoded)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v libx264 -preset fast -crf 23 \
-vf "scale=640:480" -f hls -hls_time 2 \
-hls_segment_filename "ch1_mid_%04d.ts" "ch1_mid.m3u8"
# Low quality (sub stream)
ffmpeg -i "rtsp://...channel=1&subtype=1" -c:v copy -f hls -hls_time 2 \
-hls_segment_filename "ch1_low_%04d.ts" "ch1_low.m3u8"
Master playlist:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=960x1080
ch1_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=640x480
ch1_mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=352x288
ch1_low.m3u8
10.3.3 HLS Latency Budget
| Stage | Latency |
|---|---|
| DVR encoding | 50-100 ms |
| RTSP to edge | 1-2 ms |
| FFmpeg demux/remux | 20-50 ms |
| HLS segment duration | 2000 ms (2-second segments) |
| Nginx/CDN delivery | 10-50 ms |
| HLS.js buffer | 2000-4000 ms (1-2 segments) |
| Browser decode + render | 20-50 ms |
| Total (camera to eye) | ~4.1 - 6.3 seconds (dominated by segment duration and player buffer) |
10.4 WebRTC for Low-Latency Single Camera
For single-camera fullscreen viewing where low latency is critical, WebRTC provides sub-second delivery.
10.4.1 WebRTC Architecture
+------------+ +-------------------+ +-------------------+ +--------+
| Browser | | Edge Gateway | | FFmpeg | | DVR |
| (WebRTC |<-->| (WHIP/WHEP |<-->| (decode RTSP, |<-->| RTSP |
| client) | | bridge) | | encode VP8/H.264)| | Server |
+------------+ +-------------------+ +-------------------+ +--------+
10.4.2 WebRTC Configuration
| Parameter | Value |
|---|---|
| Signaling protocol | WHIP (ingress) / WHEP (egress) |
| Video codec | H.264 (hardware) or VP8 (software) |
| Latency target | < 500 ms end-to-end |
| ICE servers | STUN only (both peers behind NAT) |
| Max bitrate | 3 Mbps |
| Resolution | 960x1080 (main stream) |
10.4.3 WebRTC Latency Budget
| Stage | Latency |
|---|---|
| DVR encoding | 50-100 ms |
| RTSP to edge | 1-2 ms |
| FFmpeg decode + WebRTC encode | 30-80 ms |
| Network (edge to browser via VPN) | 100-200 ms |
| Browser decode | 20-50 ms |
| Total | ~200-430 ms |
10.5 Multi-Camera Grid Layout
10.5.1 Layout Configurations
| Layout | Cameras | Stream Used | Per-Camera Resolution | Total Bandwidth |
|---|---|---|---|---|
| 1x1 (fullscreen) | 1 | Main (subtype=0) | 960x1080 | ~4 Mbps |
| 2x2 grid | 4 | Sub (subtype=1) | 352x288 | ~4 Mbps total |
| 3x3 grid | 8+1 empty | Sub (subtype=1) | 352x288 | ~8 Mbps total |
| 4x2 grid | 8 | Sub (subtype=1) | 352x288 | ~8 Mbps total |
| Custom | User-defined | Mixed | Mixed | Sum of selected |
Smart stream selection: The dashboard automatically switches streams based on layout:
- Fullscreen single camera -> Main stream (high quality)
- Grid layout -> Sub stream (bandwidth-efficient)
- Camera clicked for fullscreen -> Dynamically switch to main stream
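The smart stream selection rule can be sketched as a simple mapping from layout to per-channel subtype; the function name and return shape are ours, not the dashboard's API.

```python
def select_stream(layout, fullscreen_channel=None):
    """Return {channel: subtype} for the dashboard's smart stream selection.

    Main stream (subtype=0) only for the fullscreen camera; sub stream
    (subtype=1) for every tile in a grid layout.
    """
    if layout == "1x1":
        return {fullscreen_channel: 0}          # main stream, high quality
    tiles = {"2x2": 4, "3x3": 8, "4x2": 8}[layout]  # 3x3 = 8 cameras + 1 empty
    return {ch: 1 for ch in range(1, tiles + 1)}    # sub streams, bandwidth-efficient
```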
10.5.2 Grid Rendering
+-----------------------------------------------------------------------------+
| DASHBOARD GRID LAYOUTS |
+-----------------------------------------------------------------------------+
| |
| 1x1 Layout: 2x2 Layout: |
| +------------------------+ +----------+----------+ |
| | | | CH1 | CH2 | |
| | Camera 1 | | (sub) | (sub) | |
| | Main stream | | | | |
| | 960x1080 | +----------+----------+ |
| | ~4 Mbps | | CH3 | CH4 | |
| +------------------------+ | (sub) | (sub) | |
| | | | |
| +----------+----------+ |
| |
| 3x3 Layout (8 cameras): |
| +----------+----------+----------+ |
| | CH1 | CH2 | CH3 | |
| | (sub) | (sub) | (sub) | |
| +----------+----------+----------+ |
| | CH4 | CH5 | CH6 | |
| | (sub) | (sub) | (sub) | |
| +----------+----------+----------+ |
| | CH7 | CH8 | [Empty] | |
| | (sub) | (sub) | | |
| +----------+----------+----------+ |
| |
| Bandwidth: ~8 Mbps total for 3x3 layout (8 x ~1 Mbps sub streams) |
| |
+-----------------------------------------------------------------------------+
10.6 Bandwidth Optimization
10.6.1 Total Bandwidth Budget
| Traffic Type | Direction | Bandwidth | Notes |
|---|---|---|---|
| 8x RTSP ingestion | Edge -> DVR (local) | ~32 Mbps receive | Local LAN only |
| 8x HLS upload to cloud | Edge -> Cloud (via VPN) | ~8-16 Mbps upload | Transcoded and compressed |
| AI frames to cloud | Edge -> Cloud (via VPN) | ~2-4 Mbps upload | 1 FPS, JPEG compressed |
| Dashboard HLS playback | Cloud -> Browser | ~8 Mbps per user | Cached at CDN |
| Control/management | Bidirectional | < 1 Mbps | WebSocket, API calls |
| Total edge upload | Edge -> Cloud (via VPN) | ~10-20 Mbps | Primary concern for site bandwidth |
10.6.2 Optimization Techniques
| Technique | Savings | Implementation |
|---|---|---|
| Sub-stream for grid view | 75% bandwidth reduction | Use subtype=1 (352x288) instead of subtype=0 (960x1080) |
| H.264 copy (no re-encode) for main stream | Zero CPU overhead | -c:v copy when no format change needed |
| JPEG quality tuning for AI frames | 50-70% size reduction | Quality 70-85 depending on scene complexity |
| Frame deduplication for AI | 10-30% frame reduction | Skip frames with < 2% pixel change |
| HLS segment caching at edge | Reduces cloud upload spikes | 5-segment buffer smooths burstiness |
| Gzip compression for API/WebSocket | 60-80% reduction | Content-Encoding: gzip |
10.7 Fallback Handling
10.7.1 Stream Failure Fallback Chain
Step 1: RTSP connection fails
+-> Retry with exponential backoff (3 attempts)
+-> Try UDP transport if TCP fails
+-> Circuit breaker opens after 5 consecutive failures
|
Step 2: Stream stall detected (no frames for 10s)
+-> Kill FFmpeg process
+-> Restart with fresh connection
|
Step 3: Camera marked OFFLINE
+-> Dashboard shows "Camera Offline" placeholder
+-> HLS playlist returns 404
+-> Last known frame displayed with timestamp overlay
+-> Alert sent to operations team
|
Step 4: Camera recovers
+-> Circuit breaker transitions to HALF_OPEN
+-> Test stream pulled for 10 seconds
+-> On success: circuit CLOSED, stream resumes
+-> Dashboard auto-refreshes
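The circuit-breaker transitions in Steps 1-4 can be sketched as follows. This is a simplified model: the 10-second test-stream probe itself is elided, and the class name is ours.

```python
class StreamCircuitBreaker:
    """CLOSED -> OPEN after 5 consecutive failures; HALF_OPEN probe on recovery."""
    FAILURE_THRESHOLD = 5

    def __init__(self):
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.FAILURE_THRESHOLD:
            self.state = "OPEN"          # camera marked OFFLINE

    def begin_probe(self):
        """Camera appears to recover: pull a test stream before resuming."""
        if self.state == "OPEN":
            self.state = "HALF_OPEN"

    def record_probe_result(self, success):
        if self.state == "HALF_OPEN":
            if success:
                self.state = "CLOSED"    # stream resumes
                self.failures = 0
            else:
                self.state = "OPEN"      # back to offline
```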
10.7.2 Offline Placeholder
When a camera is offline, the HLS endpoint returns a static playlist:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts
The dashboard detects the #EXT-X-ERROR tag (a custom, non-standard extension; standard players ignore unrecognized tags) and displays a camera offline indicator with the last known timestamp.
10.7.3 Edge Buffer Management
The 2TB NVMe edge storage is partitioned for circular buffer operation:
| Directory | Max Size | Retention | Cleanup |
|---|---|---|---|
| `/data/hls/` | 20 GB | Rolling (5 segments) | Automatic via FFmpeg |
| `/data/buffer/ch1-ch8/` | 1.5 TB | 7 days circular | Age-based FIFO |
| `/data/buffer/ai_frames/` | 100 GB | 24 hours | Age-based |
| `/data/buffer/evidence/` | 200 GB | 30 days | Event-linked retention |
| `/data/logs/` | 10 GB | 30 days | Logrotate |
| `/data/tmp/` | 50 GB | On process exit | Cleanup on restart |
| Total reserved | ~1.88 TB | — | Fits in 2TB NVMe |
Buffer exhaustion handling:
- At 80% capacity: Alert admin, begin aggressive cleanup of old non-evidence data
- At 90% capacity: Stop non-critical buffering (AI frames), preserve HLS + evidence only
- At 95% capacity: Emergency mode — evidence-only recording, all other buffers purged
- Never delete evidence clips linked to unresolved alerts
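The capacity tiers above map naturally to a policy function. A minimal sketch; the tier names are illustrative:

```python
def buffer_policy(used_pct):
    """Map NVMe usage percentage to the cleanup tier described above."""
    if used_pct >= 95:
        return "EMERGENCY"   # evidence-only recording, purge other buffers
    if used_pct >= 90:
        return "CRITICAL"    # stop AI-frame buffering, keep HLS + evidence
    if used_pct >= 80:
        return "WARN"        # alert admin, aggressive cleanup of old data
    return "NORMAL"
```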
10.7.4 DVR Full Disk Mitigation
Since the DVR disk is full (0 bytes free), the system does not rely on DVR-side recording:
| Function | Traditional | Our Design |
|---|---|---|
| Continuous recording | DVR internal HDD | Edge gateway 2TB NVMe buffer |
| Event/alert clips | DVR playback export | Cloud MinIO + S3 archival |
| Long-term storage | DVR disk rotation | AWS S3 tiered lifecycle |
| Playback | DVR web UI | Cloud dashboard with timeline |
Reference: For complete FFmpeg commands including multi-output tee muxer, frame extraction for AI, WebRTC bridge code, and the ring buffer implementation, see `video_ingestion.md` — Sections 4-7.
End of Part A (Sections 1-10)
This unified technical blueprint synthesizes outputs from 11 specialist agents across 6 domain-specific design documents. For detailed implementation code, DDL, algorithms, and configuration, refer to the individual specialist documents listed in the cross-reference guide at the top of this document.
| Document | Path | Content |
|---|---|---|
| Architecture | `architecture.md` | Full deployment specs, scaling, cost, failover |
| Video Ingestion | `video_ingestion.md` | RTSP config, FFmpeg, edge gateway, HLS, WebRTC |
| AI Vision | `ai_vision.md` | Model configs, inference pipeline, benchmarks |
| Database Schema | `database_schema.md` | Complete DDL, triggers, views, RLS |
| Suspicious Activity | `suspicious_activity.md` | 10 detection modules, scoring engine |
| Training System | `training_system.md` | Learning pipeline, quality gates, versioning |