Blueprint Part A

Architecture, data flow, AI, database, and streaming strategy.

AI-Powered Industrial Surveillance Platform

Unified Technical Blueprint — Part A: Sections 1-10

Document Property | Value
Version | 1.0.0
Classification | Technical Blueprint (Production Design)
Target DVR | CP PLUS ORANGE CP-UVR-0801E1-CV2
Channels | 8 active (scalable to 64+)
Resolution | 960 x 1080 per channel
DVR Network | 192.168.29.200/24, RTSP port 554
Date | 2025

Cross-Reference Guide: This unified blueprint synthesizes six specialist design documents. For detailed specifications on any subsystem, refer to:

  • architecture.md — Full architecture, scaling, failover, cost estimation
  • video_ingestion.md — RTSP configuration, FFmpeg commands, edge gateway specs
  • ai_vision.md — Model configurations, inference code, benchmarks
  • database_schema.md — Complete DDL, triggers, views, RLS policies
  • suspicious_activity.md — Detection algorithms, scoring engine pseudocode
  • training_system.md — Training pipelines, quality gates, versioning logic


Section 1: Executive Summary

1.1 Project Objective

This blueprint defines the complete technical design for an AI-powered industrial surveillance platform that turns a legacy CP PLUS 8-channel DVR system into a modern, intelligent security operations center. The platform processes real-time video from 8 camera channels, applies computer vision and face recognition models, detects suspicious activity during night hours (22:00-06:00), and gives security operators a unified dashboard. Reliability, security, and data privacy are treated as first-class requirements throughout the design.

The system is designed around a cloud+edge hybrid architecture where all compute-intensive AI inference runs in the cloud (AWS Mumbai), while a local edge gateway handles stream ingestion, buffering, and site-local concerns. A WireGuard VPN tunnel protects all communication between edge and cloud, ensuring the DVR has zero public internet exposure.

1.2 Key Capabilities

Capability | Description | Technology
Human Detection | Real-time person detection across all 8 channels at 15-20 FPS | YOLO11m + TensorRT FP16, 640x640
Face Detection | Face localization with 5-point landmarks for alignment | SCRFD-500M-BNKPS, 640x640
Face Recognition | 512-D embedding extraction (99.83% LFW accuracy) | ArcFace R100 IR-SE100 (MS1MV3)
Person Tracking | Persistent identity tracking across frames with occlusion recovery | ByteTrack (Kalman + IoU), 80.3% MOTA on MOT17
Unknown Clustering | Automatic grouping of unknown faces for operator review | HDBSCAN + DBSCAN fallback, 89.5% purity
Night Mode Surveillance | Suspicious activity analysis with 10 detection modules, active 22:00-06:00 | Composite scoring engine with time-decay
AI Vibe Controls | Three presets (Relaxed/Balanced/Strict) mapping to 4 confidence levels | Dynamic threshold adjustment
Safe Self-Learning | Three-mode training system with conflict detection and approval workflows | MLflow + Airflow + Manual Review
24/7 Reliability | Graceful degradation: video never stops, AI catches up on recovery | Tiered storage + circuit breakers + replay
Real-Time Alerts | 6-level escalation (NONE to EMERGENCY) with multi-channel notifications | Telegram, WhatsApp, Email, Webhook
Live Dashboard | Multi-camera grid over HLS; single-camera low-latency view over WebRTC | Next.js 14 + HLS.js + WebRTC
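The AI Vibe Controls row says three presets drive the underlying thresholds. The sketch below shows one hypothetical way to structure that mapping. The `Vibe` enum, the `PRESETS` values, and the four level names (HIGH, MEDIUM, LOW, NONE) are assumptions for illustration, not values from the specialist documents. It also assumes that "Strict" means more sensitive person detection combined with stricter face matching.

```python
from dataclasses import dataclass
from enum import Enum


class Vibe(Enum):
    RELAXED = "relaxed"
    BALANCED = "balanced"
    STRICT = "strict"


@dataclass(frozen=True)
class VibeThresholds:
    person_conf: float  # minimum person-detection confidence to report a detection
    face_accept: float  # minimum cosine similarity to accept an identity match
    face_review: float  # lower bound of the "send to operator review" band


# Hypothetical values for illustration only; real values must come from site calibration.
PRESETS = {
    Vibe.RELAXED: VibeThresholds(person_conf=0.55, face_accept=0.40, face_review=0.30),
    Vibe.BALANCED: VibeThresholds(person_conf=0.45, face_accept=0.45, face_review=0.35),
    Vibe.STRICT: VibeThresholds(person_conf=0.35, face_accept=0.55, face_review=0.45),
}


def keep_detection(score: float, vibe: Vibe) -> bool:
    """True if a person detection passes the preset's confidence floor."""
    return score >= PRESETS[vibe].person_conf


def face_confidence_level(similarity: float, vibe: Vibe) -> str:
    """Bucket a face-match similarity into one of four hypothetical confidence levels."""
    t = PRESETS[vibe]
    if similarity >= t.face_accept + 0.10:
        return "HIGH"
    if similarity >= t.face_accept:
        return "MEDIUM"
    if similarity >= t.face_review:
        return "LOW"  # plausible match: route to operator review
    return "NONE"
```

Operators only ever pick a `Vibe`; every downstream service reads thresholds from the same table, so changing a preset never requires touching model code.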

1.3 Architecture Approach

The platform follows a cloud+edge+VPN hybrid pattern with five network security zones:

Cameras (8ch) --> DVR (local) --> Edge Gateway (local) --> WireGuard VPN --> AWS Cloud (EKS)
                                      |                        |
                                      | 2TB NVMe buffer         | Encrypted tunnel
                                      | 7-day ring buffer       | UDP 51820
                                      | FFmpeg ingestion        | ChaCha20-Poly1305

Key architectural decisions:

Decision | Choice | Rationale
Cloud Provider | AWS ap-south-1 (Mumbai) | Low latency from Indian sites, mature managed services
Container Orchestration | Amazon EKS (cloud) + K3s (edge) | Managed control plane and GPU node support in the cloud; lightweight Kubernetes at the edge
VPN | WireGuard | ~60% faster than OpenVPN, modern cryptography, simple setup
Message Queue | Apache Kafka (Amazon MSK) | Durable ordered log, replay capability, proven at scale
AI Inference | NVIDIA Triton + TensorRT | GPU-optimized, dynamic batching, model ensembles
Database | PostgreSQL 16 + pgvector | ACID compliance; 512-D vector similarity search through the pgvector extension
Object Storage | MinIO (edge + cloud) + S3 (archive) | S3-compatible API, tiered cost optimization
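The Database row relies on pgvector for face matching. In pgvector, the `<=>` operator returns cosine distance (1 minus cosine similarity), so a query of the form `ORDER BY embedding <=> $1 LIMIT 1` over an HNSW index built with `vector_cosine_ops` approximates the exhaustive search below. The function names and the gallery layout are hypothetical, and the 3-D vectors in the test stand in for real 512-D embeddings.

```python
import math
from typing import Mapping, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; pgvector's `<=>` distance equals 1 - this value."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def best_match(query: Sequence[float],
               gallery: Mapping[str, Sequence[float]],
               threshold: float) -> Optional[Tuple[str, float]]:
    """Exhaustive nearest neighbour by cosine similarity; None if the best score is below threshold."""
    best: Optional[Tuple[str, float]] = None
    for person_id, embedding in gallery.items():
        sim = cosine_similarity(query, embedding)
        if best is None or sim > best[1]:
            best = (person_id, sim)
    if best is None or best[1] < threshold:
        return None  # treat as an unknown face (candidate for clustering)
    return best
```

HNSW trades a small amount of recall for sub-linear lookup time. At 8 cameras and a modest enrolled gallery, the exhaustive version is also a useful reference for validating index recall.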

1.4 Target Environment

The platform targets a CP PLUS ORANGE CP-UVR-0801E1-CV2 DVR with the following characteristics:

Property | Value | Impact on Design
Brand/Model | CP PLUS ORANGE CP-UVR-0801E1-CV2 | Dahua-compatible RTSP URL scheme
Channels | 8 active | Initial deployment scope
Resolution | 960 x 1080 per channel | AI input: letterbox to 640x640
LAN IP | 192.168.29.200/24 | Edge gateway on same subnet
RTSP Port | 554 | TCP interleaved transport mandatory
ONVIF | V2.6.1.867657 (Server V19.06) | Auto-discovery supported
DVR Disk | FULL (0 bytes free) | All archival is edge-managed; no DVR recording
VPN Access | WireGuard-secured | No public exposure; all traffic encrypted

Critical Design Impact: The DVR disk being full means the system cannot rely on DVR-side recording or playback features. All archival storage is managed by the edge gateway's 2TB NVMe buffer and cloud tiering.
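The Resolution row above notes that each 960 x 1080 frame is letterboxed to 640x640 before inference. The arithmetic is an aspect-preserving resize followed by symmetric padding. Detections must then be mapped back to source pixels before zones or face crops are applied. The function names below are illustrative, not taken from the AI vision design.

```python
from typing import Tuple


def letterbox_params(src_w: int, src_h: int, dst: int = 640) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving resize into dst x dst."""
    scale = min(dst / src_w, dst / src_h)  # the limiting side fills the square exactly
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst - new_w) // 2          # any odd leftover pixel goes to the right/bottom
    pad_top = (dst - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def to_source_coords(x: float, y: float, scale: float,
                     pad_left: int, pad_top: int) -> Tuple[float, float]:
    """Map a point from letterboxed model space back to original frame pixels."""
    return (x - pad_left) / scale, (y - pad_top) / scale


scale, w, h, left, top = letterbox_params(960, 1080)
print(w, h, left, top)  # 569 640 35 0
```

For this DVR the height is the limiting side (scale = 640/1080), so each frame becomes 569 x 640 with 35 px of padding on the left and 36 px on the right.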

1.5 Key Differentiators

1. AI Vibe Controls. Instead of exposing complex threshold parameters to operators, the system provides three "vibe" presets (Relaxed, Balanced, and Strict) that internally map to tuned configurations for detection sensitivity and face-match strictness. Non-technical security staff can therefore adjust the system's behavior without editing raw model thresholds.

2. Safe Self-Learning Training System. The platform captures operator feedback (confirmations, corrections, merges, rejections) and feeds it back into model improvement through a three-mode learning pipeline: Manual Only, Suggested Learning (recommended), and Approved Auto-Update. A synchronous conflict detector blocks five types of label conflicts before they reach the training dataset, protecting model integrity.

3. 24/7 Reliability with Graceful Degradation. The system is architected around a single priority: video recording never stops. If the AI inference service fails, recording continues locally and frames are queued for catch-up processing on recovery. If the VPN tunnel fails, the edge gateway keeps up to 7 days of local buffer. If the cloud database fails, alerts accumulate in Kafka's durable log. Every failure mode has a defined degradation strategy.

4. 10-Module Night Surveillance. The suspicious activity detection system goes beyond simple motion detection: 10 specialized detection modules, covering cases from intrusion and loitering to abandoned objects and repeated re-entry patterns, perform behavioral analysis whose outputs are combined by a composite scoring engine with exponential time-decay.
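The composite scoring idea in differentiator 4 can be sketched as a weighted sum of recent module detections, each decayed exponentially with age, then mapped onto the six escalation levels. The source does not give the module weights, the decay constant, or the level cut-offs, so every number below (including the 300 s half-life) is hypothetical, and only four of the ten modules are shown.

```python
import math
from datetime import time
from typing import Iterable, Mapping, Tuple

NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)

# Hypothetical weights for four of the ten modules.
MODULE_WEIGHTS = {"intrusion": 1.0, "loitering": 0.6, "abandoned_object": 0.8, "reentry": 0.5}

# Hypothetical lower bounds for the six escalation levels, ascending.
LEVEL_THRESHOLDS = (
    (0.00, "NONE"), (0.20, "LOW"), (0.40, "MEDIUM"),
    (0.60, "HIGH"), (0.80, "CRITICAL"), (0.95, "EMERGENCY"),
)

Event = Tuple[str, float, float]  # (module, confidence in [0, 1], timestamp in seconds)


def in_night_window(t: time) -> bool:
    """True inside 22:00-06:00; the window wraps past midnight, so it is an OR, not an AND."""
    return t >= NIGHT_START or t < NIGHT_END


def composite_score(events: Iterable[Event], now: float, half_life_s: float = 300.0,
                    weights: Mapping[str, float] = MODULE_WEIGHTS) -> float:
    """Sum of weight * confidence * exp(-lambda * age), capped at 1.0."""
    decay = math.log(2) / half_life_s  # lambda chosen so a contribution halves every half_life_s
    total = 0.0
    for module, confidence, ts in events:
        age = max(0.0, now - ts)
        total += weights.get(module, 0.5) * confidence * math.exp(-decay * age)
    return min(1.0, total)


def escalation_level(score: float) -> str:
    """Highest level whose lower bound the score reaches."""
    level = "NONE"
    for lower_bound, name in LEVEL_THRESHOLDS:
        if score >= lower_bound:
            level = name
    return level
```

Because old evidence fades rather than expiring abruptly, a single loitering hit adds little on its own, while several modules firing within a few minutes push the score into the higher levels.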

1.6 Production Readiness Assessment

Dimension | Status | Notes
Architecture Completeness | Production-Ready | All 12 services fully specified with resource allocations
AI Model Selection | Production-Ready | Industry-standard models with published benchmarks
Database Design | Production-Ready | 29 tables, 4 views, 8 triggers, partitioning, RLS
Security Architecture | Production-Ready | 7-layer defense in depth, encrypted credentials, VPN-only
Scaling Path | Defined | 8 -> 16 -> 32 -> 64+ cameras with concrete resource allocations
Failover Design | Production-Ready | Graceful degradation matrix for all failure modes
Estimated Timeline | 14 weeks | 4 implementation phases defined
Estimated Monthly Cost | ~$2,140 USD | 8-camera deployment at steady state

Section 2: Kimi Swarm Team and Agent Responsibilities

The unified blueprint was synthesized from the outputs of 11 specialist agents, each responsible for a specific domain of the platform design.

2.1 Agent Responsibility Matrix

# | Agent | Responsibility | Key Deliverables
1 | Requirements Analyst | Elicited and structured all functional/non-functional requirements | Requirements traceability matrix, user stories, acceptance criteria
2 | System Architect | Designed overall cloud+edge+VPN topology and service interactions | Deployment topology, 5 security zones, scaling roadmap, failover matrix
3 | Video Ingestion Engineer | Specified RTSP configuration, edge gateway, and stream processing | RTSP URL patterns, FFmpeg commands, auto-reconnect logic, HLS generation
4 | AI Vision Scientist | Selected and configured all CV/AI models for the inference pipeline | Model selection table, inference pipeline architecture, confidence handling
5 | Database Architect | Designed complete data model with partitioning, indexing, and security | 29 tables + 4 views + 8 triggers, pgvector HNSW index, RLS policies
6 | Suspicious Activity Designer | Designed 10 detection modules and composite scoring engine | Detection algorithms, scoring formula, YAML configuration schema
7 | Training System Engineer | Designed self-learning pipeline with safety controls | 3 learning modes, conflict detection, quality gates, versioning
8 | Frontend Developer | Designed Next.js dashboard with real-time video and alerts | Component architecture, HLS.js integration, WebSocket alerts
9 | DevOps Engineer | Specified CI/CD, monitoring, and infrastructure-as-code | GitHub Actions + ArgoCD, Prometheus/Grafana, alerting rules
10 | Security Architect | Designed defense-in-depth security across all layers | 7 security layers, secret management, encryption standards
11 | Technical Writer (this document) | Synthesized all specialist outputs into unified blueprint | 10-section unified document with cross-references

2.2 Agent Interaction Flow

+------------------------------------------------------------------------------+
|                        KIMI SWARM TEAM ORCHESTRATION                         |
+------------------------------------------------------------------------------+
|                                                                              |
|   Requirements Analyst                                                       |
|        |                                                                     |
|        v                                                                     |
|   +---------+    +---------+    +---------+    +---------+                   |
|   | System  |<-->| Video   |<-->| AI      |<-->| Database|                   |
|   |Architect|    |Ingestion|    |Vision   |    |Architect|                   |
|   +---------+    +---------+    +---------+    +---------+                   |
|        ^                                            |                        |
|        |           +----------+    +----------+     |                        |
|        +---------->|Suspicious|<-->|Training  |<----+                        |
|                    |Activity  |    |System    |                              |
|                    |Designer  |    |Engineer  |                              |
|                    +----+-----+    +----------+                              |
|                         |                                                    |
|                         v                                                    |
|                    +---------+    +---------+    +---------+                 |
|                    |Frontend |    |DevOps   |    |Security |                 |
|                    |Developer|    |Engineer |    |Architect|                 |
|                    +----+----+    +---------+    +---------+                 |
|                         |                                                    |
|                         v                                                    |
|                    +---------------------+                                   |
|                    | Technical Writer    |                                   |
|                    | (Unified Blueprint) |                                   |
|                    +---------------------+                                   |
|                                                                              |
+------------------------------------------------------------------------------+

2.3 Cross-Agent Design Consistency

The following cross-cutting concerns were harmonized across all agent outputs during synthesis:

Concern | Resolution | Agents Coordinated
Video latency budget | < 100ms end-to-end (AI); ~35-65s (HLS live) | Video Ingestion, AI Vision, Frontend
Face embedding storage | 512-D float32, pgvector HNSW index, cosine similarity | Database, AI Vision, Training
Event data retention | 90 days hot (MinIO), 1 year cold (Glacier), 7 days edge | Database, Architecture, Video Ingestion
Alert escalation | 6 levels: NONE -> LOW -> MEDIUM -> HIGH -> CRITICAL -> EMERGENCY | Suspicious Activity, Database, Frontend
Model versioning | Semantic MAJOR.MINOR.PATCH with MLflow registry | Training, AI Vision, Architecture
Graceful degradation | Video never stops; AI catch-up on recovery | Architecture, Video Ingestion, AI Vision
Security zones | 5 zones: Internet -> ALB -> Application -> Data -> Edge | Architecture, Security, Video Ingestion
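The graceful degradation row ("Video never stops; AI catch-up on recovery") is usually implemented with a circuit breaker in front of the inference client. The sketch below is a hypothetical, single-threaded illustration. `InferenceBreaker`, its parameters, and the CLOSED/OPEN/HALF_OPEN state names are assumptions, and recording is assumed to happen on a separate path that the breaker never touches.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List, Optional


class InferenceBreaker:
    """Hypothetical circuit breaker: call inference when healthy, queue frames for catch-up when not."""

    def __init__(self, infer: Callable[[Any], Any], failure_threshold: int = 3,
                 reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.infer = infer
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.backlog: Deque[Any] = deque()

    def submit(self, frame: Any) -> Optional[Any]:
        """Run inference on a live frame, or queue it and return None if the service is down."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                self.backlog.append(frame)  # fail fast: skip a service known to be down
                return None
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = self.infer(frame)
        except Exception:
            self._record_failure()
            self.backlog.append(frame)
            return None
        self.failures = 0
        self.state = "CLOSED"
        return result

    def drain(self, limit: Optional[int] = None) -> List[Any]:
        """Catch-up: replay queued frames in arrival order while the breaker stays closed."""
        results: List[Any] = []
        while self.backlog and self.state == "CLOSED" and (limit is None or len(results) < limit):
            frame = self.backlog.popleft()
            try:
                results.append(self.infer(frame))
            except Exception:
                self.backlog.appendleft(frame)  # keep order; retry later
                self._record_failure()
                break
        return results

    def _record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

In this sketch live frames are served first once the service recovers, and `drain(limit=...)` lets the backlog be replayed in bounded batches so catch-up never starves real-time inference.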

Section 3: Assumptions

All assumptions made across the specialist designs are consolidated below. These should be validated before implementation begins.

3.1 Network and Hardware Assumptions

ID | Assumption | Validation Method | Risk if Invalid
NW-01 | Edge gateway has dual Ethernet: one for local DVR subnet (192.168.29.0/24), one for internet/VPN | Physical site survey | Cannot bridge DVR to VPN
NW-02 | Site internet bandwidth >= 16 Mbps sustained upload for 8 channels | ISP speed test | Video drops, AI delays
NW-03 | WireGuard UDP port 51820 is not blocked by site firewall | Firewall rule check | VPN cannot establish
NW-04 | DVR RTSP server supports TCP interleaved transport (rtsp_transport tcp) | FFmpeg test probe | UDP fallback has packet loss
NW-05 | DVR supports 16+ concurrent RTSP sessions (8 channels x 2 streams) | Session stress test | Stream contention
NW-06 | MTU 1400 is viable through site NAT/firewall for WireGuard tunnel | Ping with DF bit test | Fragmentation issues
HW-01 | Intel NUC 13 Pro (i5-1340P, 16GB RAM, 512GB NVMe) is available for edge gateway | Hardware procurement | May need Jetson Orin alternative
HW-02 | Edge gateway has UPS backup for graceful shutdown on power loss | Electrical survey | Data corruption on hard power-off
HW-03 | AWS g4dn.xlarge (T4 GPU) instances are available in ap-south-1 | AWS EC2 capacity check | Need alternative GPU instance
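Assumption NW-06 is validated with a don't-fragment ping, and the payload size follows from header arithmetic. Each WireGuard data packet adds a 16-byte header (type, reserved, receiver index, counter) and a 16-byte Poly1305 tag, plus the outer UDP and IP headers. The helper names below are illustrative.

```python
WG_DATA_HEADER = 16    # type (1) + reserved (3) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16       # Poly1305 authentication tag
UDP_HEADER = 8
IPV4_HEADER = 20       # without options
IPV6_HEADER = 40
ICMP_ECHO_HEADER = 8


def wireguard_overhead(ipv6_underlay: bool = False) -> int:
    """Bytes WireGuard adds to every tunnelled packet on the underlay network."""
    ip = IPV6_HEADER if ipv6_underlay else IPV4_HEADER
    return ip + UDP_HEADER + WG_DATA_HEADER + WG_AUTH_TAG


def required_underlay_mtu(tunnel_mtu: int, ipv6_underlay: bool = False) -> int:
    """Smallest site path MTU that carries full-size tunnel packets without fragmentation."""
    return tunnel_mtu + wireguard_overhead(ipv6_underlay)


def df_ping_payload(path_mtu: int) -> int:
    """ICMP payload for an IPv4 don't-fragment probe that exactly fills path_mtu."""
    return path_mtu - IPV4_HEADER - ICMP_ECHO_HEADER


print(required_underlay_mtu(1400), df_ping_payload(required_underlay_mtu(1400)))  # 1460 1432
```

A tunnel MTU of 1400 therefore needs a 1460-byte IPv4 path (1480 over IPv6). On Linux, `ping -M do -s 1432 <cloud endpoint>` succeeding from the edge gateway confirms that path, where `<cloud endpoint>` is a placeholder for the cloud peer's public address. The common WireGuard default of 1420 is 1500 minus the 80-byte IPv6 worst case, so 1400 leaves margin for PPPoE or other encapsulation on the site link.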

3.2 DVR Capabilities Assumptions

ID | Assumption | Validation Method | Risk if Invalid
DVR-01 | DVR RTSP streams are accessible at rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | FFmpeg connectivity test | Need alternative URL format
DVR-02 | DVR continues serving RTSP streams even with disk full (0 bytes free) | 24-hour stream stability test | Streams may stall
DVR-03 | DVR sub-stream (subtype=1) provides sufficient quality for AI inference (typically 352x288 to 704x576) | Frame quality inspection | May need main stream for AI
DVR-04 | DVR ONVIF server supports device discovery and stream URI retrieval | ONVIF Device Manager test | Manual camera configuration needed
DVR-05 | DVR channel numbering is 1-indexed (1-8) | ONVIF profile enumeration | Off-by-one errors in configuration
DVR-06 | DVR Digest authentication works with the provided credentials | RTSP DESCRIBE request test | May need Basic auth or different scheme
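Assumptions DVR-01, DVR-05, and NW-04 together pin down how the edge gateway opens each stream. The sketch below builds the Dahua-style URL and an FFmpeg argument list that pulls over TCP interleaved transport and remuxes to 2-second HLS segments. The function names, the output path, and the playlist length of 6 are hypothetical. Treating subtype 0 as the main stream is also an assumption, based on the Dahua convention.

```python
from typing import List
from urllib.parse import quote

DVR_HOST = "192.168.29.200"
DVR_RTSP_PORT = 554


def rtsp_url(channel: int, subtype: int, user: str, password: str,
             host: str = DVR_HOST, port: int = DVR_RTSP_PORT, max_channel: int = 8) -> str:
    """Dahua-style RTSP URL (DVR-01); channels are 1-indexed (DVR-05)."""
    if not 1 <= channel <= max_channel:
        raise ValueError(f"channel must be in 1..{max_channel}")
    if subtype not in (0, 1):  # assumption: 0 = main stream, 1 = sub-stream
        raise ValueError("subtype must be 0 or 1")
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{creds}@{host}:{port}/cam/realmonitor?channel={channel}&subtype={subtype}"


def hls_ffmpeg_args(url: str, out_playlist: str, segment_s: int = 2) -> List[str]:
    """FFmpeg argv: RTSP over TCP interleaved transport, remuxed (no re-encode) to HLS."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",        # NW-04: avoid UDP packet loss
        "-i", url,
        "-c", "copy",                    # remux only; keeps edge CPU usage low
        "-f", "hls",
        "-hls_time", str(segment_s),     # target segment length in seconds
        "-hls_list_size", "6",
        "-hls_flags", "delete_segments",
        out_playlist,
    ]
```

With `-c copy`, FFmpeg can only cut segments on keyframes, so actual segment durations follow the DVR's GOP length rather than exactly `-hls_time`. The argument list should be passed to `subprocess` directly, not through a shell, so the password is never shell-interpreted.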

3.3 Environmental Assumptions

ID | Assumption | Impact if Invalid
ENV-01 | Scene lighting at night is adequate for face recognition (minimum 10 lux at face distance) | Face recognition accuracy degrades; may need IR illumination
ENV-02 | Camera angles allow frontal face capture at entry/exit points (yaw < 45 degrees) | Face recognition miss rate increases
ENV-03 | Indoor industrial environment with minimal weather interference | False positive rate from rain and shadows increases
ENV-04 | Maximum person-to-camera distance is within 10 meters for face recognition | Faces may be too small (< 20px) for reliable detection
ENV-05 | Camera positions are stable (no PTZ movement during normal operation) | Zone calibration becomes invalid and must be redone

3.4 Operational Assumptions

ID | Assumption | Impact if Invalid
OPS-01 | Security operators will review unknown face clusters and provide identity labels daily | Unknown person database grows without enrichment
OPS-02 | Admin will review training suggestions at least weekly in "Suggested Learning" mode | Training queue backlog accumulates
OPS-03 | Site has authorized personnel who can access the edge gateway for maintenance (SSH, physical) | Remote troubleshooting limited
OPS-04 | Alert fatigue is a genuine concern: a false positive rate above 20% leads operators to ignore alerts | Suppression defaults may be stricter than needed; vibe presets need retuning
OPS-05 | Incident video review requires 10-second pre-event and 30-second post-event clips | Clip window must be reconfigured
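OPS-05's clip window (10 s before, 30 s after) can be served from the edge HLS segments. This sketch assumes 2-second segments, matching the edge HLS segmenter, and assumes that segment i spans [2i, 2i + 2) seconds from stream start. `clip_segments` is a hypothetical helper.

```python
import math


def clip_segments(event_t: float, pre_s: float = 10.0, post_s: float = 30.0,
                  segment_s: float = 2.0) -> range:
    """Indices of the segments covering [event_t - pre_s, event_t + post_s).

    Assumes segment i spans [i * segment_s, (i + 1) * segment_s) from stream start.
    """
    start = max(0.0, event_t - pre_s)  # clamp for events near stream start
    first = math.floor(start / segment_s)
    last = math.ceil((event_t + post_s) / segment_s) - 1
    return range(first, last + 1)


print(list(clip_segments(100.0))[:3], len(clip_segments(100.0)))  # [45, 46, 47] 20
```

An event at an unaligned time such as t = 101 s needs 21 segments, because the window straddles a boundary at both ends. With keyframe-aligned (`-c copy`) segments, real durations vary, so a production version would accumulate the `#EXTINF` durations from the playlist instead of assuming a fixed 2 s.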

3.5 Security Assumptions

ID | Assumption | Impact if Invalid
SEC-01 | WireGuard encryption (ChaCha20-Poly1305) meets organizational security requirements | May need additional encryption layer
SEC-02 | AWS VPC with private subnets satisfies data residency requirements for India | Compliance review needed
SEC-03 | Face embeddings (512-D vectors) do not constitute PII under applicable regulations | Legal review needed for biometric data handling
SEC-04 | Edge gateway physical security is equivalent to server room security | Tampering risk if edge is physically accessible
SEC-05 | DVR credentials can be stored encrypted (AES-256) in cloud database | Key management infrastructure required

3.6 AI Performance Assumptions

ID | Assumption | Impact if Invalid
AI-01 | YOLO11m TensorRT FP16 achieves > 75% person AP@50 on surveillance footage | May need fine-tuning on site-specific data
AI-02 | ArcFace R100 achieves > 98% Rank-1 accuracy on enrolled persons with 5+ reference images | May need more reference images per person or stricter enrollment quality gates
AI-03 | HDBSCAN achieves > 89% cluster purity on 512-D face embeddings from this camera setup | Fallback to DBSCAN if density varies too much
AI-04 | ByteTrack maintains < 2 ID switches per 100 frames in industrial environment with occlusion | May need BoT-SORT upgrade for complex scenes
AI-05 | GPU (T4) can sustain 15-20 FPS processing per stream across 8 streams with batching | CPU fallback at 5-8 FPS if GPU unavailable

Section 4: Full Architecture

4.1 High-Level System Architecture

The platform employs a cloud+edge hybrid architecture with five network security zones. Video streams are ingested at the edge, processed by AI in the cloud, and presented through a web-based dashboard. A WireGuard VPN tunnel provides encrypted, zero-exposure connectivity between edge and cloud.

+=============================================================================+
|                         CLOUD+EDGE+VPN ARCHITECTURE                          |
+=============================================================================+
|                                                                              |
|   ZONE 0: INTERNET (UNTRUSTED)                                               |
|   +---------------------+                                                    |
|   |  Users / Browsers   |                                                    |
|   |  HTTPS :443         |                                                    |
|   +----------+----------+                                                    |
|              |                                                               |
|              v                                                               |
|   ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                       |
|   +--------------------------------------------------------------+          |
|   |  AWS ALB (:443) + WAF v2 + Rate Limit + Geo-Restriction      |          |
|   |       |                                                      |          |
|   |       v                                                      |          |
|   |  Traefik Ingress Controller (:8443)                          |          |
|   |  - Route: /api/*  -> Backend Service                         |          |
|   |  - Route: /ws/*   -> WebSocket Handler                       |          |
|   |  - Route: /       -> Next.js Web App                         |          |
|   |  - TLS: Let's Encrypt auto certificates                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 2: AWS VPC APPLICATION (TRUSTED)                                      |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Stream      |  | AI Inference|  | Suspicious Activity |   |          |
|   |  | Ingestion   |  | Service     |  | Service (Night Mode)|   |          |
|   |  | (Go/FFmpeg) |  | (Triton)    |  | (Go/Python)         |   |          |
|   |  | :8081       |  | :8001 gRPC  |  | :8083               |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Backend API |  | Training    |  | Notification        |   |          |
|   |  | (Go/Gin)    |  | Service     |  | Service             |   |          |
|   |  | :8080       |  | (PyTorch)   |  | (Go)                |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +--------------------+                                      |          |
|   |  | Web Frontend       |  HLS Playback Service               |          |
|   |  | (Next.js 14 :3000) |  (Go :8085)                         |          |
|   |  +--------------------+                                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                                   |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  | PostgreSQL  |  | Redis       |  | Kafka       |           |          |
|   |  | 16 (RDS)    |  | 7 Cluster   |  | (MSK)       |           |          |
|   |  | :5432       |  | :6379       |  | :9092       |           |          |
|   |  | pgvector    |  | Pub/Sub     |  | 3 brokers   |           |          |
|   |  | HNSW index  |  | Streams     |  | 3 AZs       |           |          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   |  | MinIO       |  | S3 (Cold Archive)                 |      |          |
|   |  | (S3-compat) |  | - Standard (30d)                  |      |          |
|   |  | :9000       |  | - IA (31-90d)                     |      |          |
|   |  | 10 TB       |  | - Glacier Deep Archive (90d+)     |      |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              | WireGuard VPN Tunnel (UDP 51820)                                |
|              | ChaCha20-Poly1305 encryption                                    |
|              | Cloud peer: 10.200.0.1/32 <-> Edge peer: 10.200.0.2/32         |
|              v                                                               |
|   ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                                 |
|   +--------------------------------------------------------------+          |
|   |  +--------------------------------------------------------+  |          |
|   |  |              EDGE GATEWAY (Intel NUC)                  |  |          |
|   |  |  Ubuntu 22.04 LTS | K3s v1.28+ | 2TB NVMe             |  |          |
|   |  |                                                          |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Stream Manager  |  | HLS Segmenter   |                |  |          |
|   |  |  | (Python/asyncio)|  | (FFmpeg/nginx)  |                |  |          |
|   |  |  | 8x RTSP feeds   |  | 2s segments     |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Frame Extractor |  | Buffer Manager  |                |  |          |
|   |  |  | (AI decimation) |  | (20GB ring buf) |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  |  | VPN Client (WireGuard)  |  Health Monitor         |    |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  +--------------------------------------------------------+  |          |
|   |                            |                                             |
|   |   Local Network (192.168.29.0/24)                                       |
|   |   +------------------+    +------------------+                           |
|   |   | CP PLUS DVR      |    | Local Monitor    |                           |
|   |   | 192.168.29.200   |    | 192.168.29.10    |                           |
|   |   | 8ch | RTSP :554  |    | (optional)       |                           |
|   |   +------------------+    +------------------+                           |
|   |   CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8                                      |
|   +--------------------------------------------------------------+          |
|                                                                              |
+=============================================================================+

4.2 Service Interaction Diagram

+------------------------------------------------------------------------------+
|                             SERVICE INTERACTIONS                             |
+------------------------------------------------------------------------------+
|                                                                              |
|   INTERNET USERS                                                             |
|        |                                                                     |
|        | HTTPS :443                                                          |
|        v                                                                     |
|   +---------+      +----------+      +----------+                            |
|   | AWS ALB |----->| Traefik  |----->| Next.js  |  Web Frontend              |
|   | +WAF    |      | Ingress  |      | (SSR)    |  Dashboard                 |
|   +---------+      +----------+      +----+-----+                            |
|                                           |                                  |
|                        +------------------+-----------------+                |
|                        |                  |                 |                |
|                        v                  v                 v                |
|                   +---------+      +------------+      +----------+          |
|                   |Backend  |      | WebSocket  |      | HLS      |          |
|                   |API (Go) |      | Handler    |      | Playback |          |
|                   |:8080    |      | /ws/alerts |      | Service  |          |
|                   +----+----+      +------------+      +----------+          |
|                        |                                                     |
|                        | gRPC :50051                                         |
|                        v                                                     |
|   +---------+    +------------+    +----------+    +----------+              |
|   | Stream  |    | AI         |    |Suspicious|    |Training  |              |
|   |Ingestion|<-->| Inference  |<-->| Activity |    |Service   |              |
|   |(Go)     |    |(Triton)    |    |(Night)   |    |(PyTorch) |              |
|   +----+----+    +-----+------+    +----+-----+    +----+-----+              |
|        |               |                |               |                    |
|        v               v                v               v                    |
|   +------------------------------------------------------------------+       |
|   |                           KAFKA (MSK)                            |       |
|   |  streams.raw (8 partitions)   ai.detections (16 partitions)      |       |
|   |  alerts.critical (4 partitions)   training.data (30-day ret.)    |       |
|   |  notifications.*   system.metrics (7-day ret.)                   |       |
|   +------------------------------------------------------------------+       |
|        |               |                |               |                    |
|        v               v                v               v                    |
|   +----------+   +------------+    +----------+    +----------+              |
|   |PostgreSQL|   | Redis      |    | MinIO    |    | MLflow   |              |
|   |16 +pgvec |   | 7 Cluster  |    | S3-compat|    | Model    |              |
|   |:5432     |   | :6379      |    | :9000    |    | Registry |              |
|   +----------+   +------------+    +----------+    +----------+              |
|                                                                              |
|   Edge Gateway: WireGuard peer at 10.200.0.2/32                              |
|   Stream Ingestion pulls frames via VPN -> sends to Kafka                    |
|                                                                              |
+------------------------------------------------------------------------------+
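The topic partition counts in the diagram interact with how messages are keyed. If every message carries its camera channel as the key, all frames and detections from one camera land on the same partition and stay ordered. The explicit modulo partitioner below is a hypothetical stand-in for the hashing a Kafka client's default partitioner performs on keyed records, chosen because its result is easy to reason about.

```python
from typing import Dict, Iterable, Set

# Partition counts taken from the service interaction diagram.
TOPIC_PARTITIONS: Dict[str, int] = {"streams.raw": 8, "ai.detections": 16, "alerts.critical": 4}


def partition_for(topic: str, channel: int) -> int:
    """Hypothetical keyed partitioner: the same channel always maps to the same partition."""
    if channel < 1:
        raise ValueError("channels are 1-indexed")
    return (channel - 1) % TOPIC_PARTITIONS[topic]


def partitions_in_use(topic: str, channels: Iterable[int]) -> Set[int]:
    """Which partitions receive traffic for a given set of cameras."""
    return {partition_for(topic, ch) for ch in channels}


eight_cameras = range(1, 9)
print(len(partitions_in_use("streams.raw", eight_cameras)))    # 8
print(len(partitions_in_use("ai.detections", eight_cameras)))  # 8 of 16
```

With channel-only keys and 8 cameras, only half of the 16 `ai.detections` partitions carry traffic. The extra partitions become useful when the deployment grows to 16 cameras, or if detections are keyed on something finer than the camera, such as camera plus track ID.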

4.3 Network Security Zones

Five security zones provide defense in depth, from the public internet to the physically isolated edge network.

+=============================================================================+
|                         NETWORK SECURITY ZONES                               |
+=============================================================================+
|                                                                              |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 0: INTERNET (UNTRUSTED)                                        |    |
|  |  - Public users, any source IP                                        |    |
|  |  - AWS Shield Standard DDoS protection                               |    |
|  |  - Geo-restriction: allow specific countries only                    |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | HTTPS :443                                    |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                |    |
|  |  - ALB + WAF v2 (SQL injection, XSS, rate limiting rules)           |    |
|  |  - Traefik Ingress (:8443)                                          |    |
|  |  - Auth: JWT + RBAC, API keys for edge gateway                     |    |
|  |  - Public API endpoints ONLY                                        |    |
|  |  SG: alb-public-sg: 443 from 0.0.0.0/0                             |    |
|  |  SG: traefik-sg: 8443 from alb-public-sg ONLY                       |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Internal :8080-8090                         |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 2: AWS VPC APPLICATION (TRUSTED, ISOLATED)                     |    |
|  |  - Stream Ingestion, AI Inference, Suspicious Activity              |    |
|  |  - Training, Backend API, Notification Services                     |    |
|  |  - Pod Security: No root, read-only FS, no privilege escalation    |    |
|  |  - Network Policies: Ingress only from API GW namespace            |    |
|  |  SG: app-sg: 8080-8090 from traefik-sg ONLY                         |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Data Layer :5432, :6379, :9092, :9000       |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                            |    |
|  |  - PostgreSQL (RDS), Redis (ElastiCache), Kafka (MSK)               |    |
|  |  - MinIO object storage, S3 cold archive                            |    |
|  |  - Security Groups: ONLY from app-sg                                |    |
|  |  - RDS: Encrypted at rest (AWS KMS), no public access              |    |
|  |  - S3: Bucket policy deny all except VPC endpoint                   |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | WireGuard VPN (UDP 51820)                     |
|                              | ChaCha20-Poly1305                             |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                          |    |
|  |  - Edge Gateway (Intel NUC), K3s node                                |    |
|  |  - WireGuard peer, stream ingestion, local buffer                    |    |
|  |  - DVR (192.168.29.200): NO internet access, local ONLY             |    |
|  |  - Edge Firewall: ALLOW 192.168.29.0/24 -> DVR :554,:80           |    |
|  |                   ALLOW OUT 51820/udp -> Cloud VPN endpoint        |    |
|  |                   DENY ALL other incoming                           |    |
|  +---------------------------------------------------------------------+    |
|                                                                              |
+=============================================================================+

4.4 Service Descriptions

# | Service | Purpose | Technology | Port | Replicas
--- | --- | --- | --- | --- | ---
1 | Edge Gateway Agent | RTSP stream pull, local recording, VPN endpoint, heartbeat | Go 1.21, systemd + K3s | 8080, 51820 | 1 (per site)
2 | Stream Ingestion | Receive frames from edge, decode, produce to Kafka, store segments | Go 1.21, FFmpeg | 8081 | 3-20 (HPA)
3 | AI Inference | GPU-accelerated detection, face recognition, embedding | Triton 2.40, TensorRT | 8000, 8001, 8002 | 1-4 (GPU HPA)
4 | Suspicious Activity | Night-mode analysis, 10 detection modules, scoring engine | Python 3.11, OpenCV | 8083 | 2-8 (HPA)
5 | Training Service | Model retraining, fine-tuning, A/B validation | PyTorch 2.1, CUDA 12.1 | 8084 | 0-1 (GPU spot)
6 | Backend API | REST API, authentication, business logic | Go 1.21, Gin | 8080 | 3-10 (HPA)
7 | Web Frontend | Dashboard, live view, timeline, analytics | Next.js 14, React 18 | 3000 | 3 (CDN)
8 | Notification | Multi-channel alert dispatch (Telegram, WhatsApp, Email) | Go 1.21 | 8086 | 2-5 (HPA)
9 | HLS Playback | HLS segment serving for dashboard live view | Go 1.21 | 8085 | 2-4 (HPA)
10 | PostgreSQL | Primary database with pgvector for embeddings | PostgreSQL 16 (RDS) | 5432 | 1 (Multi-AZ)
11 | Redis | Session store, cache, pub/sub, stream tracking | Redis 7 (ElastiCache) | 6379 | 2 shards x 2 replicas
12 | Kafka | Event bus, durable log, stream replay | Apache Kafka (MSK) | 9092 | 3 brokers spread across 3 AZs
13 | MinIO | Object storage for video, snapshots, model artifacts | MinIO (S3-compatible) | 9000, 9001 | Edge: 1, Cloud: 4

4.5 Physical Edge Gateway Specification

Component | Specification
--- | ---
Hardware | Intel NUC 13 Pro, Core i5-1340P (12 cores, 16 threads)
Alternative | NVIDIA Jetson Orin NX 16GB (for on-edge AI inference)
RAM | 16GB DDR4-3200 (32GB recommended for 16+ channels)
Storage | 2TB NVMe SSD (7-day circular buffer for all 8 streams)
LAN | Intel i226-V 2.5GbE (local DVR subnet)
WAN | Second Ethernet or WiFi (internet for VPN)
OS | Ubuntu 22.04.4 LTS Server (no GUI)
Container Runtime | Docker CE 25.x + Docker Compose 2.x
K8s Distribution | K3s v1.28+ (lightweight; single-node, or 2 server nodes with an external datastore for HA, since embedded etcd HA needs 3 servers)
Power | UPS-backed, auto-restart on power loss (BIOS setting)
Network | Dual interface: eth0 for local DVR, eth1 for internet/VPN
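The 7-day buffer size can be checked with simple arithmetic. A minimal sketch, assuming a constant ~2 Mbps per H.264 main stream (the per-stream figure used in Section 4.7) and decimal terabytes; real DVR bitrates vary with scene activity and filesystem overhead is ignored, so treat the results as estimates. The function names are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def buffer_size_tb(streams: int, mbps_per_stream: float, days: float) -> float:
    """Storage needed for continuous recording, in decimal TB (10**12 bytes)."""
    bytes_per_second = streams * mbps_per_stream * 1e6 / 8  # Mbps -> bytes/s
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


def buffer_days(capacity_tb: float, streams: int, mbps_per_stream: float) -> float:
    """How many days of recording fit into `capacity_tb` decimal terabytes."""
    bytes_per_day = streams * mbps_per_stream * 1e6 / 8 * SECONDS_PER_DAY
    return capacity_tb * 1e12 / bytes_per_day
```

At 8 x 2 Mbps the 7-day buffer needs about 1.21 TB, which leaves headroom on a 2 TB drive; a 1 TB drive holds roughly 5.8 days at the same rate.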

4.6 Cloud Infrastructure Specification

Component | Specification
--- | ---
Region | Primary: ap-south-1 (Mumbai), DR: ap-southeast-1 (Singapore)
VPC | 10.100.0.0/16, 3 AZs, private subnets only for workloads
EKS | Managed node groups: on-demand for API, spot for batch/GPU
GPU Nodes | g4dn.xlarge (NVIDIA T4) for Triton inference, 1-4 auto-scaled
ALB | Internet-facing, WAF v2 attached, Shield Advanced optional
RDS | PostgreSQL 16, db.r6g.xlarge, Multi-AZ, encrypted at rest
ElastiCache | Redis 7, cluster mode enabled, 2 shards x 2 replicas
MSK (Kafka) | 3 broker nodes, kafka.m5.large, 3 AZs
S3 | Standard (hot 30d), Standard-IA (31-90d), Glacier Deep Archive (90d+)

4.7 Scaling Approach

The system scales from the initial 8-camera deployment to 64+ cameras through well-defined phases:

+-----------------------------------------------------------------------------+
|                        CAMERA SCALING ROADMAP                                |
+-----------------------------------------------------------------------------+
|                                                                              |
|  CURRENT: 8 cameras (1 DVR)                                                  |
|  +-- Edge: 1x Intel NUC (i5/16GB min., i7/32GB rec.)                        |
|  +-- Bandwidth: ~16 Mbps upstream (2 Mbps per H.264 stream)                 |
|  +-- Cloud AI: 1x T4 GPU (8 streams @ 1 fps, batch=8)                       |
|  +-- Kafka: 8 partitions (streams.raw)                                      |
|  +-- PostgreSQL: db.r6g.xlarge                                              |
|  +-- Monthly cost: ~$2,140                                                  |
|                                                                              |
|  PHASE 1: 16 cameras (2 DVRs / 2 sites)                                      |
|  +-- Edge: 2x Intel NUC (one per site)                                      |
|  +-- Bandwidth: ~32 Mbps                                                    |
|  +-- Cloud AI: 1x T4 GPU (batch=16, still sufficient)                       |
|  +-- Kafka: 16 partitions                                                   |
|  +-- Monthly cost: ~$3,200                                                  |
|                                                                              |
|  PHASE 2: 32 cameras (4 DVRs / 4 sites)                                      |
|  +-- Edge: 4x Intel NUC                                                     |
|  +-- VPN: Hub-spoke model (4 edge peers -> 1 cloud endpoint)                |
|  +-- Bandwidth: ~64 Mbps                                                    |
|  +-- Cloud AI: 2x T4 GPUs (HPA: 2-6 replicas)                               |
|  +-- Kafka: 32 partitions                                                   |
|  +-- PostgreSQL: db.r6g.2xlarge                                             |
|  +-- Monthly cost: ~$5,500                                                  |
|                                                                              |
|  PHASE 3: 64 cameras (8 DVRs / 8 sites)                                      |
|  +-- Edge: 8x Intel NUC (or Jetson Orin for edge AI pre-filter)              |
|  +-- Bandwidth: ~128 Mbps (dedicated circuit recommended)                   |
|  +-- Cloud AI: 4x T4 GPUs or 2x g5.2xlarge (1x A10G each)                   |
|  +-- Kafka: 64 partitions, consider MSK multi-cluster                        |
|  +-- PostgreSQL: db.r6g.4xlarge + read replica                              |
|  +-- Monthly cost: ~$9,800                                                  |
|                                                                              |
+-----------------------------------------------------------------------------+
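The roadmap follows a few linear rules of thumb. The capacity helper below, with hypothetical names, reproduces them under stated assumptions: ~2 Mbps upstream per stream, one streams.raw partition per camera, and a ceiling of 16 streams per T4 (taken from Phase 1, "batch=16, still sufficient"). It ignores HPA burst headroom and cost, so it is only a first-order check:

```python
import math
from dataclasses import dataclass

MBPS_PER_STREAM = 2.0     # H.264 main-stream estimate used in the roadmap
MAX_STREAMS_PER_T4 = 16   # assumption taken from Phase 1 ("batch=16, still sufficient")


@dataclass(frozen=True)
class CapacityPlan:
    cameras: int
    upstream_mbps: float
    kafka_partitions: int
    t4_gpus: int


def plan(cameras: int) -> CapacityPlan:
    """First-order sizing for a given camera count."""
    return CapacityPlan(
        cameras=cameras,
        upstream_mbps=cameras * MBPS_PER_STREAM,
        kafka_partitions=cameras,  # one streams.raw partition per camera
        t4_gpus=max(1, math.ceil(cameras / MAX_STREAMS_PER_T4)),
    )
```

For 8, 16, 32 and 64 cameras this gives 16, 32, 64 and 128 Mbps and 1, 1, 2 and 4 T4 GPUs, in line with the roadmap.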

4.8 Failover and Reliability Design

The graceful degradation matrix defines behavior for every failure mode:

+=============================================================================+
|                     GRACEFUL DEGRADATION MATRIX                              |
+=============================================================================+
|                                                                              |
|  Failure Mode              | Degradation Strategy                            |
|  ------------------------- | ----------------------------------------------- |
|  AI Inference Service DOWN | Continue recording ALL video locally            |
|  (GPU failure, model crash)| Events stored as "unprocessed"                  |
|                            | No real-time alerts                             |
|                            | Queue frames for later batch processing         |
|                            | Dashboard shows "AI OFFLINE" banner             |
|                                                                              |
|  Kafka DOWN (MSK outage)   | Edge Gateway buffers locally (20GB ring buffer) |
|                            | Backpressure: reduce to key frames only (0.2fps)|
|                            | Auto-reconnect with 2x exponential backoff      |
|                            | Replay from local buffer when Kafka recovers    |
|                                                                              |
|  VPN Tunnel DOWN           | Full local operation mode                       |
|  (internet outage)         | All recording continues locally (7-day buffer)  |
|                            | Local alert buzzer/relay (configurable)         |
|                            | No cloud dashboard access                       |
|                            | Auto-sync when VPN recovers                     |
|                                                                              |
|  PostgreSQL DOWN (RDS)     | Alert queue builds in Kafka (durable log)       |
|                            | Events not lost (Kafka 7-day retention)         |
|                            | Read-only dashboard mode (Redis cache)          |
|                            | Alert on-call engineer                          |
|                                                                              |
|  Notification Service DOWN | Alerts accumulate in DB                         |
|                            | Retry with exponential backoff                  |
|                            | Dead letter after 24 hours                      |
|                            | Dashboard shows pending count                   |
|                                                                              |
|  Edge Gateway DOWN (power) | Cloud dashboard shows "SITE OFFLINE"            |
|                            | Last known recordings in cloud                  |
|                            | Alert sent immediately                          |
|                            | UPS: graceful shutdown, preserve data           |
|                                                                              |
+=============================================================================+

Priority Order (highest first):

  1. Video recording NEVER STOPS (local edge priority)
  2. Critical alerts ALWAYS FIRE (local buzzer + queued cloud alerts)
  3. AI inference gracefully degrades to batch catch-up on recovery
  4. Dashboard operates in read-only/cache mode during DB outage
  5. Cloud sync resumes automatically when connectivity restored
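Priorities 1 and 3 imply an edge-side buffer that never blocks recording and sheds load instead of failing. A minimal sketch of such a buffer, assuming frames are held in memory under a byte budget (the 20 GB figure from the Kafka row of the matrix; a production gateway would more likely spill to disk), evicting the oldest frames when full and, in backpressure mode, accepting only key frames. The `Frame` and `RingBuffer` names are hypothetical:

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Frame:
    ts: float        # capture timestamp in seconds
    keyframe: bool   # True for key (I) frames
    data: bytes


class RingBuffer:
    """FIFO bounded by total bytes; the oldest frames are evicted first."""

    def __init__(self, capacity_bytes: int = 20 * 1024**3):  # 20 GiB default
        self.capacity = capacity_bytes
        self.used = 0
        self.frames: deque[Frame] = deque()
        self.backpressure = False  # set while Kafka is unreachable

    def push(self, frame: Frame) -> bool:
        """Buffer a frame; returns False if backpressure mode rejected it."""
        if self.backpressure and not frame.keyframe:
            return False  # degrade to key frames only
        self.frames.append(frame)
        self.used += len(frame.data)
        while self.used > self.capacity:
            evicted = self.frames.popleft()  # overwrite-oldest semantics
            self.used -= len(evicted.data)
        return True

    def drain(self) -> Iterator[Frame]:
        """Yield buffered frames oldest-first, e.g. for replay into Kafka."""
        while self.frames:
            frame = self.frames.popleft()
            self.used -= len(frame.data)
            yield frame
```

The gateway would set `backpressure = True` when the Kafka producer starts failing and iterate over `drain()` to replay once the connection recovers.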

Reliability Mechanisms:

Mechanism | Implementation | Target
--- | --- | ---
Stream Reconnect | Exponential backoff: 1s -> 2s -> 4s -> 8s -> 16s -> 30s cap | < 60s recovery
Circuit Breaker | 5 failures -> OPEN (60s) -> HALF_OPEN (3 test calls) -> CLOSED | Prevent cascade failures
VPN Watchdog | Ping every 30s, restart WireGuard after 3 consecutive failures | ~90s detection, < 2 min recovery
Kafka Producer | acks=all, retries=10, enable.idempotence=true, LZ4 compression | No loss of acknowledged messages (assuming replication factor 3 and min.insync.replicas=2)
Kafka Consumer | Manual offset commit AFTER the DB write succeeds; idempotent (UPSERT) writes | At-least-once delivery, effectively exactly-once in the database
Health Checks | 5-layer: K8s probes -> Service metrics -> Dependency checks -> E2E synthetic -> Edge heartbeat | < 2 min detection
Auto-scaling | GPU util > 80% for 2 min -> scale out; Kafka lag > 1000 for 5 min -> scale out | Proactive capacity
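A sketch of the first two mechanisms, assuming a single-threaded caller: `backoff_delays` doubles from 1 s to a 30 s cap, and `CircuitBreaker` trips to OPEN after 5 consecutive failures, rejects calls for 60 s, then admits 3 trial calls in HALF_OPEN and closes only if all 3 succeed (any trial failure reopens it). The names and the "all trials must succeed" rule are assumptions for illustration; the clock is injectable so the state machine can be tested without sleeping:

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Reconnect delays doubling from `base` up to `cap`: 1, 2, 4, 8, 16, 30, 30, ..."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a cool-down;
    HALF_OPEN -> CLOSED after K successful trials, or back to OPEN on any failure."""

    def __init__(self, failure_threshold: int = 5, open_seconds: float = 60.0,
                 trial_calls: int = 3, clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.trial_calls = trial_calls
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.trials_started = 0
        self.trials_ok = 0

    def allow(self) -> bool:
        """Call before each request; False means fail fast without calling."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            self.state, self.trials_started, self.trials_ok = "HALF_OPEN", 0, 0
        if self.state == "HALF_OPEN":
            if self.trials_started >= self.trial_calls:
                return False  # trial budget used; wait for the outcomes
            self.trials_started += 1
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.trials_ok += 1
            if self.trials_ok >= self.trial_calls:
                self.state, self.failures = "CLOSED", 0
        else:
            self.failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at, self.failures = "OPEN", self.clock(), 0
```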

Section 5: Data Flow from DVR to Cloud to Dashboard

This section traces the complete data journey from camera capture through AI processing to user presentation.

5.1 Overview: Seven Data Flows

+=============================================================================+
|                        SEVEN DATA FLOW PATHWAYS                              |
+=============================================================================+
|                                                                              |
|  Flow 1: Camera --> DVR --> Edge Gateway                                    |
|          [Analog/Digital] -> [H.264 Encode] -> [RTSP Server]                |
|                                                                              |
|  Flow 2: Edge Gateway --> VPN --> Cloud Kafka                               |
|          [FFmpeg ingest] -> [Frame extract] -> [Kafka Producer]             |
|                                                                              |
|  Flow 3: Stream Ingestion --> AI Inference                                  |
|          [Kafka Consumer] -> [GPU Batch] -> [Detection + Face Recog.]       |
|                                                                              |
|  Flow 4: AI Inference --> Events --> Database                               |
|          [Detection results] -> [Event enrich] -> [PostgreSQL]              |
|                                                                              |
|  Flow 5: Events --> Alerts --> Notifications                                |
|          [Scoring engine] -> [Alert create] -> [Multi-channel send]         |
|                                                                              |
|  Flow 6: Live Streams --> Browser Dashboard                                 |
|          [HLS segmenter] -> [Nginx relay] -> [HLS.js player]                |
|                                                                              |
|  Flow 7: Training Feedback Loop                                             |
|          [Operator review] -> [Conflict detect] -> [Model update]           |
|                                                                              |
+=============================================================================+

5.2 Flow 1: Camera to DVR to Edge Gateway

Path: Analog/Digital Camera -> DVR internal encoder -> DVR RTSP server -> Edge Gateway FFmpeg client

Protocol Stack:

Layer | Technology | Details
--- | --- | ---
Camera Interface | Analog BNC / CVBS / AHD | CP PLUS DVR supports multiple analog standards
DVR Encoding | H.264 High Profile | Hardware encoder, real-time, low latency
DVR Storage | Internal HDD (currently FULL) | 0 bytes free; no local recording possible
Network Transport | RTSP over TCP (interleaved) | Avoids UDP packet loss and firewall/NAT problems
URL Pattern | rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | N=1-8, M=0 (main) / 1 (sub)
Client | FFmpeg 6.0+ | -rtsp_transport tcp -timeout 5000000 (socket timeout in microseconds; -stimeout was replaced by -timeout in FFmpeg 5.0)
Frame Rate | 25 FPS (PAL) or 30 FPS (NTSC) | Configurable per channel
Resolution (main) | 960 x 1080 (1080N) per channel | Full resolution
Resolution (sub) | 352 x 288 to 704 x 576 | Lower bandwidth for AI

FFmpeg RTSP Connection Command:

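# -timeout is the RTSP socket I/O timeout in microseconds (5 s); FFmpeg 5.0+ no longer accepts -stimeout.
# DVR_PASS is assumed to be an environment variable holding the DVR password; /data/buffer/ch1 must exist.
ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp \
    -timeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 \
    -i "rtsp://admin:${DVR_PASS}@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c copy -f segment -segment_time 60 -reset_timestamps 1 \
    -strftime 1 "/data/buffer/ch1/%Y%m%d_%H%M%S.mkv"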
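When a supervisor process launches FFmpeg, building the command as an argument list avoids shell quoting problems, and percent-encoding the credentials keeps characters such as `@`, `:` or `/` in the DVR password from breaking the URL. A sketch, assuming the CP PLUS URL pattern from the table above; the function names are hypothetical, and it is worth confirming that your FFmpeg build decodes percent-encoded RTSP credentials:

```python
from urllib.parse import quote

DVR_HOST = "192.168.29.200"


def rtsp_url(channel: int, subtype: int, user: str, password: str) -> str:
    """CP PLUS RTSP URL with percent-encoded credentials."""
    if not 1 <= channel <= 8 or subtype not in (0, 1):
        raise ValueError("channel must be 1-8 and subtype 0 (main) or 1 (sub)")
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{creds}@{DVR_HOST}:554/cam/realmonitor"
            f"?channel={channel}&subtype={subtype}")


def record_cmd(channel: int, url: str) -> list[str]:
    """argv form of the 60 s segment-recording command, using -timeout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "warning",
        "-rtsp_transport", "tcp", "-timeout", "5000000",
        "-fflags", "+genpts+discardcorrupt+igndts+ignidx",
        "-reorder_queue_size", "64",
        "-i", url,
        "-c", "copy", "-f", "segment", "-segment_time", "60",
        "-reset_timestamps", "1", "-strftime", "1",
        f"/data/buffer/ch{channel}/%Y%m%d_%H%M%S.mkv",
    ]
```

A supervisor could then run `subprocess.run(record_cmd(1, rtsp_url(1, 0, "admin", os.environ["DVR_PASS"])))` (with `DVR_PASS` a hypothetical environment variable) and restart it on exit using the backoff schedule from Section 4.8.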

Latency Budget:

Stage | Latency
--- | ---
Camera -> DVR (analog) | ~1-5 ms
DVR encoding | ~50-100 ms
RTSP over LAN | ~1-2 ms
Total (camera to edge gateway) | ~52-107 ms

5.3 Flow 2: Edge Gateway to VPN Tunnel to Cloud

Path: Edge Gateway FFmpeg -> Frame extraction -> JPEG encoding -> Kafka Producer -> WireGuard VPN -> Cloud MSK

Frame Processing Pipeline:

+------------+    +-------------+    +---------------+    +-------------+    +-----------+
| Raw RTSP   | -> | FFmpeg      | -> | Frame         | -> | JPEG        | -> | Kafka     |
| H.264      |    | Demux/Decode|    | Decimation    |    | Encoder     |    | Producer  |
| 25 FPS     |    |             |    | (1 fps)       |    | Quality 85  |    | (LZ4)     |
| 960x1080   |    |             |    | 640x640 pad   |    |             |    |           |
+------------+    +-------------+    +---------------+    +-------------+    +-----------+

FFmpeg Frame Extraction for AI:

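# fps=1 decimates to one frame per second; scale+pad letterboxes to 640x640 without cropping.
# -q:v 5 is FFmpeg's MJPEG quantizer scale (lower = higher quality), not a 0-100 JPEG quality value.
ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -timeout 5000000 \
    -fflags +genpts+discardcorrupt -reorder_queue_size 64 \
    -i "rtsp://admin:${DVR_PASS}@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -vf "fps=1,scale=640:640:force_original_aspect_ratio=decrease,pad=640:640:(ow-iw)/2:(oh-ih)/2:black" \
    -q:v 5 -f image2pipe -vcodec mjpeg pipe:1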
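With `-f image2pipe -vcodec mjpeg`, FFmpeg writes JPEG images back to back on stdout with no length framing, so the reader has to split the byte stream itself. A minimal sketch that scans for the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers; it assumes JPEG byte stuffing (a 0xFF in entropy-coded data is followed by 0x00) and no embedded thumbnails, which holds for typical FFmpeg MJPEG output, so `FF D9` only appears as the real end marker. The function name is hypothetical:

```python
from typing import Iterable, Iterator

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpegs(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete JPEG images from arbitrarily sized chunks of an MJPEG pipe."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(SOI)
            if start < 0:
                del buf[:-1]  # keep a trailing 0xFF in case SOI straddles chunks
                break
            end = buf.find(EOI, start + 2)
            if end < 0:
                del buf[:start]  # drop bytes before the partial image, wait for more
                break
            yield bytes(buf[start:end + 2])
            del buf[:end + 2]
```

In the gateway this could consume the FFmpeg pipe as `split_jpegs(iter(lambda: proc.stdout.read(65536), b""))`, where `proc` is a `subprocess.Popen` started with `stdout=subprocess.PIPE`.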

WireGuard VPN Tunnel Configuration:

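Parameter | Value
--- | ---
Protocol | UDP 51820
Encryption | ChaCha20-Poly1305
Key Exchange | Curve25519 (ECDH)
Preshared Key | Enabled per-peer
Keepalive | 25 seconds (PersistentKeepalive)
MTU | 1400 (headroom for WireGuard's 60-80 byte encapsulation overhead)
Cloud Endpoint | 10.200.0.1/32 (EC2 VPN gateway instance, e.g. the bastion; an ALB cannot carry UDP, so any load balancer in front must be an NLB with a UDP listener)
Edge Endpoint | 10.200.0.2/32
Route | 10.100.0.0/16 (AWS VPC CIDR from Section 4.6) reachable from the edge through the tunnel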

VPN watchdog script runs every 30 seconds; restarts WireGuard on 3 consecutive ping failures.

Latency Budget:

Stage | Latency
--- | ---
Frame extraction (FFmpeg) | ~50-100 ms
JPEG encoding | ~5-10 ms
Kafka produce (local) | ~1-2 ms
WireGuard tunnel | ~5-15 ms (India site -> Mumbai)
MSK broker | ~1-2 ms
Total (edge to cloud Kafka) | ~62-129 ms

5.4 Flow 3: Stream Ingestion to AI Inference

Path: Kafka streams.raw topic -> Stream Ingestion consumer -> Triton Inference Server -> Kafka ai.detections topic

Pipeline Architecture:

+------------+    +-------------------+    +------------------+    +-------------+
| streams.raw| -> | Stream Ingestion  | -> | NVIDIA Triton    | -> | ai.detections |
| (8 parts)  |    | (Go consumer)     |    | (GPU inference)  |    | (16 parts)    |
| JPEG frames|    | Batch aggregator  |    | gRPC :8001       |    | Detection     |
| + metadata |    | (batch=8, timeout)|    | Dynamic batching |    | + embeddings  |
+------------+    +-------------------+    +------------------+    +-------------+
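The batch aggregator trades a little latency for GPU efficiency: it flushes when 8 frames are collected or when a short deadline expires after the first frame arrives, whichever comes first. A minimal Python sketch using `queue.Queue`; the 50 ms default timeout is an assumption (the document does not give a value), and the Go consumer would likely express the same policy with a channel and a timer. Triton's dynamic batching applies a similar policy on the server side:

```python
import queue
import time
from typing import Any


def next_batch(q: "queue.Queue[Any]", max_batch: int = 8,
               timeout_s: float = 0.05) -> list[Any]:
    """Block for the first frame, then gather more until the batch is full or
    `timeout_s` has elapsed since the first frame arrived."""
    batch = [q.get()]  # wait (indefinitely) for work
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```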

Triton Model Configuration:

Model | Inputs | Outputs | GPU Memory | Latency (P50)
--- | --- | --- | --- | ---
YOLO11m-det (TensorRT FP16) | 3x640x640 float16 | Bboxes, scores, labels | ~2.1 GB | 12 ms
SCRFD-500M (TensorRT FP16) | 3x640x640 float16 | Bboxes, landmarks, scores | ~1.8 GB | 8 ms
ArcFace R100 (TensorRT FP16) | 3x112x112 float16 | 512-D embedding | ~3.2 GB | 5 ms

Total GPU memory: ~7.1 GB (fits in T4 16 GB with 8 streams)

Latency Budget:

Stage | Latency
--- | ---
Kafka consume (batch) | ~10-50 ms
Preprocessing (resize, normalize) | ~5-15 ms
YOLO11m inference (GPU) | ~12 ms (P50)
SCRFD face detection (GPU) | ~8 ms (P50)
ArcFace embedding (GPU, per face) | ~5 ms (P50)
Post-processing (NMS, matching) | ~10-30 ms
Kafka produce (results) | ~1-2 ms
Total (Kafka to detection output) | ~51-122 ms (one face per frame; add ~5 ms per additional face)
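Each budget total is the sum of per-stage (min, max) ranges, with single P50 figures counted as equal min and max. A small helper, with hypothetical names, makes the totals easy to recheck when a stage changes; the stage values are copied from the Flow 3 table with one face per frame, and because P50 values are summed this is a typical-case estimate, not a tail-latency bound:

```python
Range = tuple[float, float]  # (min_ms, max_ms)


def total_latency(stages: dict[str, Range]) -> Range:
    """End-to-end (min, max) latency as the sum of per-stage ranges."""
    return (sum(lo for lo, _ in stages.values()),
            sum(hi for _, hi in stages.values()))


FLOW3_STAGES_MS: dict[str, Range] = {
    "kafka_consume_batch": (10, 50),
    "preprocess": (5, 15),
    "yolo11m_p50": (12, 12),
    "scrfd_p50": (8, 8),
    "arcface_p50_one_face": (5, 5),
    "postprocess_nms_matching": (10, 30),
    "kafka_produce_results": (1, 2),
}
```

With those inputs it returns (51, 122). Applied to the Flow 6 stages, the same helper gives roughly 4.1 to 6.3 s once the 2-4 s player buffer is included.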

5.5 Flow 4: AI Inference to Events to Database

Path: AI Detection results -> Event enricher -> PostgreSQL (multiple tables)

Data Transformation:

+------------+    +-------------------+    +---------------------+    +------------+
| Detection  | -> | Event Enricher    | -> | PostgreSQL Writer   | -> | events     |
| results    |    | - Add camera_id   |    | - UPSERT person     |    | persons    |
| (raw)      |    | - Match person    |    | - INSERT event      |    | embeddings |
|            |    | - Check whitelist |    | - INSERT embedding  |    | face_crops |
+------------+    +-------------------+    +---------------------+    +------------+

Database Write Operations per Detection:

Operation | Table | Type | Notes
--- | --- | --- | ---
Insert event record | events | INSERT | With bounding box, confidence, timestamp
Upsert person | persons | INSERT/UPDATE | If new face, create person record
Insert face crop | face_crops | INSERT | S3 URL, bounding box, quality score
Upsert embedding | face_embeddings | INSERT/UPDATE | 512-D vector, pgvector HNSW index
Increment counters | camera_stats | UPDATE | Daily aggregation
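Because the consumer commits Kafka offsets only after the database write (Section 4.8), a crash between the write and the commit redelivers the message, so delivery is at-least-once; idempotent writes turn that into effectively exactly-once storage. A sketch, assuming a deterministic UUIDv5 event key built from camera, frame timestamp and detection index, and a hypothetical `events` schema with a unique key on `(event_id, occurred_at)`. PostgreSQL requires unique constraints on a partitioned table to include the partition key, which is assumed here to be `occurred_at`:

```python
import uuid

# Fixed namespace for this deployment (any constant UUID works; this one is made up).
EVENT_NAMESPACE = uuid.UUID("6f1c2a4e-0000-4000-8000-000000000001")


def event_id(camera_id: int, frame_ts_ms: int, detection_idx: int) -> uuid.UUID:
    """Deterministic ID: a redelivered Kafka message yields the same UUID."""
    return uuid.uuid5(EVENT_NAMESPACE, f"{camera_id}:{frame_ts_ms}:{detection_idx}")


# Hypothetical schema: events is partitioned by occurred_at, so the unique key
# includes it; a replayed message hits ON CONFLICT and inserts nothing.
INSERT_EVENT_SQL = """
INSERT INTO events (event_id, camera_id, occurred_at, confidence)
VALUES (%(event_id)s, %(camera_id)s, %(occurred_at)s, %(confidence)s)
ON CONFLICT (event_id, occurred_at) DO NOTHING
"""
```

The consumer would execute the insert, commit the transaction, and only then commit the Kafka offset.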

5.6 Flow 5: Events to Alerts to Notifications

Path: AI events -> Suspicious Activity scoring engine -> Alert creation -> Notification dispatch

Scoring and Escalation:

+------------+    +-------------------+    +------------------+    +-------------+
| AI events  | -> | Suspicious Activity| -> | Alert Manager    | -> | Notification |
| (persons,  |    | Scoring Engine     |    | - Deduplicate    |    | Service      |
|  faces)    |    | - 10 modules       |    | - Rate limit     |    | - Telegram   |
|            |    | - Composite score  |    | - Suppress dup   |    | - WhatsApp   |
|            |    | - Time decay       |    | - Escalation     |    | - Email      |
+------------+    +-------------------+    +------------------+    +-------------+

Alert Escalation Matrix:

Score | Level | Color | Notification | Action
--- | --- | --- | --- | ---
0.00 - 0.20 | NONE | Gray | None | Log only
0.20 - 0.40 | LOW | Blue | Dashboard only | Log + indicator
0.40 - 0.60 | MEDIUM | Yellow | Dashboard + App push | Alert dispatched
0.60 - 0.80 | HIGH | Orange | All of above + Telegram | Immediate alert
0.80 - 1.00 | CRITICAL | Red | All of above + WhatsApp + Email | Critical alert
> 1.00 | EMERGENCY | Purple + flashing | All channels + SMS | Emergency dispatch
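The score bands share their endpoints (0.20 appears in both NONE and LOW), so an implementation has to pick a convention. A minimal sketch, assuming each band includes its lower bound and excludes its upper bound, a score of exactly 1.00 is CRITICAL, and anything above 1.00 is EMERGENCY; `alert_level` is a hypothetical name:

```python
from bisect import bisect_right

# Inclusive lower bounds of LOW, MEDIUM, HIGH, CRITICAL (assumed convention).
_LOWER_BOUNDS = [0.20, 0.40, 0.60, 0.80]
_LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def alert_level(score: float) -> str:
    """Map a composite suspicion score to an escalation level."""
    if score > 1.00:
        return "EMERGENCY"
    return _LEVELS[bisect_right(_LOWER_BOUNDS, score)]
```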

5.7 Flow 6: Live Streams to Browser Dashboard

Path: DVR RTSP -> Edge Gateway FFmpeg -> HLS segmenter -> Nginx -> CDN -> Browser HLS.js

+--------+    +---------------+    +---------------+    +---------+    +----------+
| DVR    | -> | Edge Gateway  | -> | HLS Segmenter | -> | Nginx   | -> | Browser  |
| RTSP   |    | FFmpeg        |    | (2s segments) |    | (relay) |    | HLS.js   |
| 25 FPS |    | -copyts       |    | H.264 + AAC   |    | HTTPS   |    | Video tag|
+--------+    +---------------+    +---------------+    +---------+    +----------+

HLS Configuration:

Parameter | Value
--- | ---
Segment duration | 2 seconds
Segment list size | 5 segments (10-second sliding window)
Playlist type | Live (no #EXT-X-ENDLIST)
Codec | H.264 High Profile + AAC-LC
Adaptive bitrate | 3 variants: high (3 Mbps), mid (1 Mbps), low (500 Kbps); requires transcoding, since stream copy only yields the source bitrate
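A sketch of a single-rendition segmenter matching this table, built as an FFmpeg argument list. It uses the HLS muxer options `-hls_time`, `-hls_list_size` and `-hls_flags delete_segments`; the output directory and file naming are hypothetical. Because video is stream-copied, segments can only be cut on key frames, so the DVR's I-frame interval must be 2 s or less for segments to stay near 2 s; the three-variant bitrate ladder would need transcoding and is not shown:

```python
def hls_cmd(rtsp_url: str, channel: int, out_dir: str = "/data/hls") -> list[str]:
    """argv for a live HLS segmenter: 2 s segments, 5-segment sliding window."""
    base = f"{out_dir}/ch{channel}"
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "warning",
        "-rtsp_transport", "tcp", "-timeout", "5000000",
        "-i", rtsp_url,
        "-c:v", "copy",                   # no re-encode; cuts only on key frames
        "-c:a", "aac",                    # FFmpeg's native AAC encoder (AAC-LC)
        "-f", "hls",
        "-hls_time", "2",
        "-hls_list_size", "5",
        "-hls_flags", "delete_segments",  # prune segments that left the playlist
        "-hls_segment_filename", f"{base}_%05d.ts",
        f"{base}.m3u8",
    ]
```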

Latency:

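Stage | Latency
--- | ---
DVR encoding | ~50-100 ms
RTSP to edge | ~1-2 ms
FFmpeg demux/remux | ~20-50 ms
HLS segmenting (2s) | ~2000 ms
Nginx relay | ~1-5 ms
CDN propagation | ~10-50 ms
HLS.js buffer | ~1-2 segments (2000-4000 ms)
Browser decode | ~20-50 ms
Total (camera to eye) | ~4.1 - 6.3 seconds

The 1-2 segment player buffer assumes HLS.js is tuned below its default liveSyncDurationCount of 3 segments (about 6 s behind live with 2 s segments).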

5.8 Flow 7: Training Feedback Loop

Path: Operator review actions -> Conflict detection -> Training dataset -> Model training -> Quality gates -> Deployment

+------------+    +------------------+    +----------------+    +-------------+    +-----------+
| Operator   | -> | Conflict         | -> | Training       | -> | Quality     | -> | Deployment |
| Review     |    | Detection        |    | Dataset        |    | Gates       |    | (A/B test) |
| (confirm,  |    | (5 types)        |    | - Curate       |    | - Precision |    |            |
|  correct,  |    | - Block conflicts|    | - Label        |    |   >= 0.97   |    |            |
|  merge,    |    | - Queue safe     |    | - Augment      |    | - Recall    |    |            |
|  reject)   |    |   additions      |    | - Version      |    |   >= 0.95   |    |            |
+------------+    +------------------+    +----------------+    +-------------+    +-----------+

Training Data Flow:

Stage | Frequency | Trigger
--- | --- | ---
Review action collection | Continuous | Operator clicks on dashboard
Conflict detection | Immediate (synchronous) | Every review action
Training dataset build | Weekly (or on-demand) | Queue threshold or manual
Model training | On dataset build | Airflow DAG trigger
Quality gate evaluation | After training | Automated pipeline
A/B deployment | After quality pass | Admin approval
Full production | After A/B success | Auto-promote at 48h
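A sketch of the gate and promotion checks, assuming precision and recall are computed from true-positive, false-positive and false-negative counts on a held-out evaluation set, and that the A/B stage exposes a single "regressed" flag; training_system.md remains the authoritative description, and the names here are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_PRECISION = 0.97
MIN_RECALL = 0.95
AB_SOAK_PERIOD = timedelta(hours=48)


@dataclass(frozen=True)
class EvalCounts:
    tp: int  # true positives on the held-out set
    fp: int  # false positives
    fn: int  # false negatives


def passes_quality_gates(c: EvalCounts) -> bool:
    """Precision >= 0.97 and recall >= 0.95; empty denominators count as failing."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return precision >= MIN_PRECISION and recall >= MIN_RECALL


def ready_to_promote(ab_started: datetime, now: datetime, regressed: bool) -> bool:
    """Auto-promote once the A/B candidate has run 48 h without a regression."""
    return not regressed and now - ab_started >= AB_SOAK_PERIOD
```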

Section 6: Recommended Tech Stack

6.1 Technology Selection Matrix

Layer | Technology | Version | Purpose | Rationale
--- | --- | --- | --- | ---
Cloud Platform | AWS | 2025 | Infrastructure (ap-south-1 Mumbai) | Best India region latency, mature managed services
Container Orchestration | Amazon EKS | v1.28+ | Managed Kubernetes control plane | GPU node support, Cluster Autoscaler
Edge K8s | K3s | v1.28+ | Lightweight Kubernetes at edge | Single binary, resource-efficient
VPN | WireGuard | v1.0+ | Encrypted tunnel between edge and cloud | In-kernel, small codebase, lower overhead than OpenVPN, modern crypto
Reverse Proxy | Traefik | v2.10+ | Kubernetes Ingress controller | Native K8s integration, automatic TLS
AI Inference | NVIDIA Triton | 2.40 | GPU model serving, dynamic batching | Multi-framework, TensorRT optimization
CV Framework | OpenCV | 4.8+ | Image processing, pre/post-processing | Industry standard, Python/Go bindings
AI/ML Framework | PyTorch | 2.1+ | Model training, custom inference | Ecosystem, CUDA 12 support
Deep Learning | TensorRT | 8.6+ | GPU-optimized inference for YOLO, SCRFD, ArcFace | FP16 support, typically 3-5x speedup
Language: AI | Python | 3.11 | AI inference, training, suspicious activity detection | Ecosystem, scientific computing
Language: Services | Go | 1.21 | Stream ingestion, backend API, notifications | Performance, concurrency, small binaries
Language: Frontend | TypeScript | 5.2 | Web dashboard | Type safety, React ecosystem
Web Framework | Next.js | 14 (App Router) | React SSR dashboard | Server components, streaming
UI Library | React | 18 | Component-based UI | Concurrent features, Suspense
Styling | Tailwind CSS | 3.4 | Utility-first CSS | Rapid development, consistent design
Video Player | HLS.js | 1.4 | Browser HLS playback | MSE-based, adaptive bitrate
Database | PostgreSQL | 16 | Primary database, vector storage | ACID, pgvector extension
Vector Search | pgvector | 0.5+ | HNSW index for 512-D face embeddings | Native PostgreSQL, IVFFlat and HNSW indexes
Cache/Session | Redis | 7 | Session store, pub/sub, rate limiting | Data structures, cluster mode
Message Queue | Apache Kafka | 3.6+ (MSK) | Durable event log, stream replay | Exactly-once within Kafka (transactions), retention, partitions
Object Storage | MinIO | latest (RELEASE.2024) | S3-compatible hot storage | Edge + cloud, erasure coding
Cold Archive | Amazon S3 | Standard/IA/Glacier | Tiered archival (30d/90d/365d) | Cost optimization
Model Registry | MLflow | 2.8+ | Model versioning, experiment tracking | Open source, S3 artifact store
Orchestration | Apache Airflow | 2.7+ | Training pipeline DAGs | Backfill, retries, observability
Monitoring | Prometheus | 2.47+ | Metrics collection | Pull-based, K8s service discovery
Visualization | Grafana | 10.1+ | Dashboards, alerting | Panels, annotations, shared links
Log Aggregation | Grafana Loki | 2.9+ | Centralized logging | Label-based, cost-effective
CI/CD | GitHub Actions | v4 | Build, test, lint pipelines | Native GitHub integration
GitOps | ArgoCD | 2.9+ | Kubernetes continuous delivery | Declarative, drift detection
Infrastructure | Terraform | 1.6+ | IaC for AWS resources | State management, modules
Secrets | AWS Secrets Manager | - | Encrypted credential storage | Rotation, IAM integration

6.2 Hardware Requirements

Edge Gateway (Per Site)

Component | Minimum | Recommended | High Availability
--- | --- | --- | ---
CPU | Intel i5-1340P (12 cores) | Intel i7-1370P (14 cores) | 2x Intel i7 (HA cluster)
RAM | 16 GB DDR4-3200 | 32 GB DDR4-3200 | 32 GB per node
Storage | 1 TB NVMe SSD (~5.8 days at 8 x 2 Mbps) | 2 TB NVMe SSD | 2 TB per node + NAS sync
Network | 1 Gbps Ethernet | 2.5 Gbps Ethernet | Dual NIC + bonding
GPU (optional) | None | NVIDIA Jetson Orin NX 16GB | On-edge AI pre-filtering
Power | UPS 600VA | UPS 1000VA | Dual PSU + generator

Cloud GPU Nodes (AI Inference)

Cameras | Instance (GPU) | VRAM | Streams | Cost/month (spot)
--- | --- | --- | --- | ---
1-8 | g4dn.xlarge (T4) | 16 GB | 8 | ~$200-350
8-16 | g4dn.xlarge (T4) | 16 GB | 16 | ~$350-500
16-32 | g4dn.2xlarge (T4) | 16 GB | 32 | ~$600-900
32-64 | g5.2xlarge (A10G) | 24 GB | 64 | ~$1200-1800
64+ | p4d.24xlarge (8x A100) | 8x 40 GB | 128 | ~$5000-8000

6.3 Software Versions Summary

Category | Software | Version
--- | --- | ---
Operating System | Ubuntu Server LTS | 22.04.4
Container Runtime | Docker CE | 25.x
Container Orchestration | Kubernetes (EKS/K3s) | 1.28+
AI Serving | NVIDIA Triton Inference Server | 2.40
GPU Runtime | CUDA | 12.1+
GPU Driver | NVIDIA Driver | 535+
Deep Learning Optimization | TensorRT | 8.6+
AI Framework | PyTorch | 2.1+
Computer Vision | OpenCV | 4.8+
Video Processing | FFmpeg | 6.0+
Service Language | Go | 1.21+
AI/Training Language | Python | 3.11+
Frontend Framework | Next.js | 14
UI Library | React | 18
Database | PostgreSQL | 16
Message Queue | Apache Kafka | 3.6+
Cache | Redis | 7
Object Storage | MinIO | 2024+
CI/CD | GitHub Actions | v4
GitOps | ArgoCD | 2.9+
Monitoring | Prometheus + Grafana | 2.47+ / 10.1+
Logging | Grafana Loki | 2.9+
VPN | WireGuard | 1.0+
Model Registry | MLflow | 2.8+
Orchestration | Apache Airflow | 2.7+
Infrastructure | Terraform | 1.6+

6.4 Port Reference

Service | Port | Protocol | Location | Notes
--- | --- | --- | --- | ---
DVR RTSP | 554 | TCP | 192.168.29.200 | Local network only
DVR HTTP | 80 | TCP | 192.168.29.200 | Admin UI, local only
DVR HTTPS | 443 | TCP | 192.168.29.200 | Admin UI, local only
DVR TCP | 25001 | TCP | 192.168.29.200 | Proprietary protocol
DVR UDP | 25002 | UDP | 192.168.29.200 | Proprietary protocol
DVR NTP | 123 | UDP | 192.168.29.200 | Time sync
WireGuard | 51820 | UDP | Cloud + Edge | VPN tunnel
Edge Admin | 8080 | TCP | 192.168.29.5 | Local admin UI
Edge SSH | 22 | TCP | 192.168.29.5 | Admin access only
Traefik HTTP | 8000 | TCP | EKS | Internal HTTP entrypoint
Traefik HTTPS | 8443 | TCP | EKS | Internal HTTPS entrypoint
ALB HTTPS | 443 | TCP | AWS | Public-facing
Backend API | 8080 | TCP | EKS pods | Internal service port
Triton HTTP | 8000 | TCP | EKS GPU nodes | Model inference HTTP
Triton gRPC | 8001 | TCP | EKS GPU nodes | Model inference gRPC
Triton Metrics | 8002 | TCP | EKS GPU nodes | Prometheus metrics
PostgreSQL | 5432 | TCP | RDS | VPC-private
Redis | 6379 | TCP | ElastiCache | VPC-private
Kafka | 9092 | TCP | MSK | VPC-private; 9092 is MSK's plaintext listener (TLS clients use 9094)
MinIO API | 9000 | TCP | EKS + Edge | S3-compatible API
MinIO Console | 9001 | TCP | EKS + Edge | Admin console
Prometheus | 9090 | TCP | EKS | Metrics collection
Grafana | 3000 | TCP | EKS | Dashboards

Section 7: Database Schema

7.1 Schema Overview

The database is designed around a relational core (PostgreSQL 16) with pgvector extension for 512-dimensional face embedding storage and similarity search. The schema consists of 29 tables, 4 views, and 8 trigger functions, organized into 10 logical domains.

Schema Philosophy:

  • Strict normalization for reference data (cameras, persons, rules) to ensure data integrity
  • JSONB flexibility for event metadata and configuration to accommodate evolving AI outputs
  • Partitioning on all high-volume time-series tables for query performance and lifecycle management
  • pgvector HNSW indexing for sub-10ms face similarity search at scale
  • Row-level security (RLS) for multi-tenant site isolation
  • AES-256 encryption for all stored credentials (DVR passwords, API tokens)
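Time-series partitioning is commonly implemented as monthly range partitions, which turns retention into detaching or dropping whole partitions (for example `ALTER TABLE events DETACH PARTITION events_2025_01`) instead of bulk deletes. A sketch that generates PostgreSQL declarative-partitioning DDL, assuming a parent table declared with `PARTITION BY RANGE (occurred_at)` and monthly granularity; the table and column names are hypothetical, and database_schema.md holds the actual DDL:

```python
from datetime import date


def month_partition_ddl(parent: str, year: int, month: int) -> str:
    """CREATE TABLE ... PARTITION OF covering one calendar month [start, next month)."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)  # rolls over in December
    name = f"{parent}_{start:%Y_%m}"
    return (f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');")
```

A scheduled job could create next month's partition ahead of time so inserts never fail for lack of a matching range.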

7.2 Entity Relationship Overview

+=============================================================================+
|                    ENTITY RELATIONSHIP DIAGRAM                               |
+=============================================================================+
|                                                                              |
|   SITE (1) --------------------< (N) DVR                                     |
|    |                              |                                          |
|    |                              | (1)                                      |
|    |                              v                                          |
|    |                           CAMERA (N) <------------------< (N) ALERT_RULE|
|    |                              |                              |           |
|    |                              | (N)                            | (1)      |
|    |                              v                              v           |
|    |   +---------------------------------------------------------+           |
|    |   | EVENT (N) -->--(1) PERSON (1)--< (N) FACE_EMBEDDING               |
|    |   |   |                                                      |         |
|    |   |   | (N)                                                  | (N)     |
|    |   |   v                                                      v         |
|    |   | FACE_CROP (N)                                    PERSON_CLUSTER     |
|    |   |   |                                                                  |
|    |   |   | (N)                                                  +---------+|
|    |   |   v                                                      | Training||
|    |   | MEDIA_FILE (1) ----------------------------------------->| Dataset  ||
|    |   |                                                          |---------||
|    |   +--------------------------------------------------------->| Job      ||
|    |                                                              | Model    ||
|    |                              +---------+                     | Version  ||
|    |                              | Review  |                     +---------+|
|    |                              | Action  |                                |
|    |                              +---------+                                |
|    |                                    ^                                    |
|    |                                    | (N)                                |
|    +------------------------------------+                                    |
|   USER (N) -->--(N) ROLE_PERMISSION                                          |
|    |                                                                         |
|    | (1)                                                                     |
|    v                                                                         |
|   WATCHLIST (N) -->--(N) WATCHLIST_ENTRY                                     |
|                                                                              |
|   +---------+    +---------+    +---------+    +---------+                  |
|   | Telegram|    |WhatsApp |    | Email   |    |Webhook  |                  |
|   | Config  |    | Config  |    | Config  |    | Config  |                  |
|   +---------+    +---------+    +---------+    +---------+                  |
|        ^              ^             ^              ^                         |
|        |              |             |              |                         |
|        +--------------+-------------+--------------+                         |
|                         |                                                    |
|                   NOTIFICATION_CHANNEL                                         |
|                         |                                                    |
|                         | (1)                                                |
|                         v                                                    |
|                   NOTIFICATION_LOG                                             |
|                                                                              |
|   +---------+    +---------+    +---------+                                  |
|   | Audit   |    | System  |    | Device  |                                  |
|   | Log     |    | Health  |    | Connect.|                                  |
|   |(partitioned) |  Log    |    |  Log    |                                  |
|   +---------+    +---------+    +---------+                                  |
|                                                                              |
+=============================================================================+

7.3 Core Tables Summary

7.3.1 Site and Infrastructure Tables

Table Purpose Key Fields Rows (est.)
sites Physical locations (factories, warehouses) id, name, location, timezone, settings 1-10
dvrs DVR/NVR devices per site id, site_id, ip_address, port, username, password_encrypted, model, channels, status 1-10
cameras Individual camera channels id, dvr_id, channel_number, name, rtsp_url, resolution, fps, status, zone_config, zone_description 8-64

7.3.2 AI Detection and Identity Tables

Table Purpose Key Fields Rows (est.)
events All AI detection events (partitioned monthly) id, camera_id, event_type, timestamp, confidence, bounding_box, person_id, face_crop_id, track_id 1M-10M/month
persons Known and unknown individuals id, name, status (known/unknown/blacklisted), role, company, notes, created_at 100-10,000
face_crops Cropped face images metadata id, event_id, person_id, storage_path, bounding_box, quality_score, blur_score, pose_yaw, pose_pitch 500K-5M/month
face_embeddings 512-D face embeddings (pgvector) id, person_id, face_crop_id, embedding (vector(512)), model_version, is_primary 500K-5M
person_clusters Unknown person cluster groups id, cluster_label, representative_embedding_id, sample_count, first_seen, last_seen, status 10-1,000

7.3.3 Alert and Notification Tables

Table Purpose Key Fields Rows (est.)
alert_rules Per-camera alert configuration id, camera_id, rule_type, name, config_json, schedule, enabled 50-500
alerts Generated alert records id, camera_id, rule_id, person_id, alert_type, severity, status, message 1K-50K/month
notification_channels Alert destination endpoints id, name, channel_type, config_json, is_active 5-20
telegram_configs Telegram Bot API credentials id, channel_id, bot_token_encrypted, chat_id 1-5
whatsapp_configs WhatsApp Business API credentials id, channel_id, api_key_encrypted, phone_number_id 1-5
notification_log Delivery status per notification id, alert_id, channel_id, status, sent_at, error_message 1K-50K/month

7.3.4 Watchlist and Access Control Tables

Table Purpose Key Fields Rows (est.)
users Dashboard users and operators id, username, email, password_hash, role, is_active 5-50
roles Permission roles id, name, permissions_json 3-10
watchlists Named monitoring lists id, name, watch_type (vip/blacklist/custom), is_active 5-20
watchlist_entries Persons on watchlists id, watchlist_id, person_id, added_by, added_at 10-1,000

7.3.5 Training and ML Pipeline Tables

Table Purpose Key Fields Rows (est.)
training_datasets Curated face datasets for training id, name, description, person_ids_json, sample_count, version, status 10-100
training_jobs Model training job tracking id, dataset_id, model_version_from, model_version_to, status, metrics_json 10-100
model_versions Registry of trained model versions id, version_string, training_job_id, metrics_json, is_production, is_rollback_available 10-50
review_actions Operator review decisions id, event_id, reviewer_id, action, from_person_id, to_person_id, notes 1K-100K

7.3.6 Media and Storage Tables

Table Purpose Key Fields Rows (est.)
media_files Registry of stored video/images id, file_type, storage_path, size_bytes, checksum, camera_id, event_id, retention_until 100K-1M
video_clips Video clip metadata for incidents id, media_file_id, start_time, end_time, camera_id, event_id, duration_seconds 10K-100K

7.3.7 Audit and Monitoring Tables (Partitioned)

Table Purpose Partition Retention
audit_logs All user and system actions Monthly by timestamp 1 year (Glacier)
system_health_logs Component health metrics Monthly by timestamp 90 days
device_connectivity_logs Camera/DVR connectivity events Monthly by timestamp 90 days

7.4 Indexing Strategy

7.4.1 pgvector HNSW Index (Critical Path)

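-- HNSW index for sub-10ms face similarity search
-- ef_search controls recall/speed tradeoff (higher = more accurate, slower)
CREATE INDEX idx_face_embeddings_hnsw
ON face_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Query: find the top-K most similar primary embeddings.
-- $1 is the probe embedding bound as vector(512). <=> returns cosine
-- distance, so 1 - distance is cosine similarity.
-- The is_primary filter is applied to the candidates the index scan returns,
-- so keep ef_search well above LIMIT or the query can return fewer than 5 rows.
SET hnsw.ef_search = 128;
SELECT person_id, 1 - (embedding <=> $1) AS similarity
FROM face_embeddings
WHERE is_primary = true
ORDER BY embedding <=> $1
LIMIT 5;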
Parameter Value Rationale
m 16 Number of bi-directional links per node (higher = better recall, more memory)
ef_construction 128 Build-time exploration factor (higher = better index quality)
ef_search (runtime SET) 64-256 Search-time exploration factor (SET hnsw.ef_search = 128)
Distance operator Cosine distance (<=>); similarity = 1 - distance Suited to L2-normalized ArcFace embeddings, which are trained with a cosine-margin objective
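Because HNSW is approximate, an exact brute-force search over a sample of embeddings is a useful baseline for measuring the recall obtained at a given `ef_search`. The sketch below is a minimal pure-Python version of what `ORDER BY embedding <=> $1 LIMIT k` computes; the helper names (`cosine_distance`, `exact_top_k`, `recall_at_k`) are hypothetical and not part of the platform.

```python
import math
from typing import List, Sequence, Tuple

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """The quantity pgvector's <=> operator returns: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query: Sequence[float],
                rows: List[Tuple[str, Sequence[float]]],
                k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force equivalent of ORDER BY embedding <=> query LIMIT k.
    rows are (person_id, embedding) pairs; returns (person_id, similarity)."""
    scored = sorted(((pid, cosine_distance(query, emb)) for pid, emb in rows),
                    key=lambda pair: pair[1])
    return [(pid, 1.0 - dist) for pid, dist in scored[:k]]

def recall_at_k(approx_ids: List[str], exact_ids: List[str]) -> float:
    """Fraction of the exact top-k IDs that the approximate (HNSW) result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy 3-D vectors stand in for 512-D ArcFace embeddings.
rows = [("alice", [1.0, 0.0, 0.0]), ("bob", [0.9, 0.1, 0.0]), ("carol", [0.0, 1.0, 0.0])]
print(exact_top_k([1.0, 0.0, 0.0], rows, k=2))
```

Running the exact search and the HNSW query side by side on a few hundred probe embeddings, for several `ef_search` values, gives a direct measurement of the recall/latency tradeoff described in the table.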

7.4.2 B-Tree Indexes (Standard Queries)

Table Index Purpose
events (camera_id, timestamp DESC) Time-range queries per camera
events (event_type, timestamp DESC) Filter by event type
events (person_id) WHERE person_id IS NOT NULL Person event lookup
face_crops (person_id, quality_score DESC) Best quality face per person
alerts (status, created_at DESC) Pending alerts by age
alerts (severity, status) Critical alert dashboard
persons (status, name) Person directory with status filter
persons (created_at DESC) Recently added persons
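media_files (retention_until) Expiring media cleanup, queried with WHERE retention_until < NOW() + INTERVAL '7 days' (NOW() cannot appear in a partial-index predicate because it is not IMMUTABLE)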

7.5 Partitioning Strategy

All high-volume time-series tables are partitioned monthly using pg_partman for automated partition management.

+-----------------------------------------------------------------------------+
|                    PARTITIONING ARCHITECTURE                                 |
+-----------------------------------------------------------------------------+
|                                                                              |
|   events (parent, empty)                                                     |
|   +-- events_y2024m01   (Jan 2024 data)                                     |
|   +-- events_y2024m02   (Feb 2024 data)                                     |
|   +-- events_y2024m03   (Mar 2024 data)                                     |
|   +-- events_y2024m04   (Apr 2024 data)                                     |
|   +-- events_y2024m05   (May 2024 data)  <-- Hot (in memory)               |
|   +-- events_default    (fallback)                                          |
|                                                                              |
|   Partition pruning: WHERE timestamp >= '2024-05-01'                        |
|                      -> Scans events_y2024m05 (+ later and default parts)    |
|                      -> Jan-Apr partitions are never read                    |
|                                                                              |
|   Managed by: pg_partman extension                                          |
|   - Auto-create: 2 months ahead                                             |
|   - Auto-drop: After retention period (detach + archive)                    |
|                                                                              |
+-----------------------------------------------------------------------------+

Partitioned Tables:

Table Partition Key Partition Type Retention
events timestamp Monthly RANGE 90 days hot, 1 year archive
audit_logs timestamp Monthly RANGE 1 year total
system_health_logs timestamp Monthly RANGE 90 days
device_connectivity_logs timestamp Monthly RANGE 90 days
face_crops created_at Monthly RANGE 90 days hot, 1 year archive
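pg_partman creates and maintains these child partitions automatically. To make the naming and range bounds concrete, the sketch below produces equivalent native PostgreSQL DDL for the current month plus two premade months. It is illustrative only: `monthly_partition_ddl` is a hypothetical helper, and the `events_yYYYYmMM` naming follows the partitioning diagram.

```python
from datetime import date
from typing import List

def add_months(d: date, n: int) -> date:
    """First day of the month that is n months after d's month."""
    month_index = d.year * 12 + (d.month - 1) + n
    return date(month_index // 12, month_index % 12 + 1, 1)

def monthly_partition_ddl(parent: str, today: date, premake: int = 2) -> List[str]:
    """CREATE TABLE ... PARTITION OF statements for the current month and
    `premake` future months. Upper bounds are exclusive, so each partition
    covers [first day of month, first day of next month)."""
    statements = []
    for offset in range(premake + 1):
        start = add_months(today, offset)
        end = add_months(start, 1)
        name = f"{parent}_y{start.year}m{start.month:02d}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
    return statements

for stmt in monthly_partition_ddl("events", date(2024, 5, 17)):
    print(stmt)
```

One constraint worth keeping in mind when writing the parent DDL: on a partitioned table, any primary key or unique constraint must include the partition column, so `events` needs a key such as `(id, timestamp)` rather than `id` alone.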

7.6 Retention Policies

Data Tier Storage Duration Lifecycle
Hot Tier PostgreSQL + MinIO 0-30 days Fast query, indexed, in-memory cache
Warm Tier S3 Standard 30-90 days Available on-demand, still indexed
Cold Tier S3 Standard-IA 90-365 days Millisecond access (same latency as S3 Standard); per-GB retrieval fee
Archive Tier Glacier Deep Archive 1-7 years Retrieval within 12-48 hours
Compliance Glacier Vault Lock 7+ years Immutable, legal hold
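The tier boundaries translate directly into an age-based lookup, which is roughly what a lifecycle job needs to decide where a record belongs. A minimal sketch, assuming ages are measured from the record's creation time and approximating "7 years" as 7 x 365 days; `retention_tier` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Exclusive upper bounds in days for each tier, mirroring the retention table.
# Assumption: "7 years" is approximated as 7 * 365 days.
TIER_BOUNDS = [
    (30, "Hot"),
    (90, "Warm"),
    (365, "Cold"),
    (7 * 365, "Archive"),
]

def retention_tier(created_at: datetime, now: Optional[datetime] = None) -> str:
    """Tier a record belongs in, based only on its age."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - created_at).days
    for upper_days, tier in TIER_BOUNDS:
        if age_days < upper_days:
            return tier
    # Beyond 7 years; only data under legal hold is expected to remain (assumption).
    return "Compliance"

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
for age in (10, 45, 200, 800, 3000):
    print(age, retention_tier(now - timedelta(days=age), now))
```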

Automated Cleanup:

Task Frequency Mechanism
Expire old event partitions Daily (pg_partman) DETACH PARTITION + S3 upload
Delete expired media files Daily Cron job: DELETE from media_files + MinIO removal
Purge old notification logs Weekly DELETE WHERE created_at < NOW() - INTERVAL '90 days'
Archive face crops to S3 Daily Lambda: copy to S3 IA, update storage_path
Compress audit logs Monthly pglz/zstd compression on detached partitions
Vacuum and analyze Continuous (autovacuum) PostgreSQL autovacuum daemon, triggered by per-table dead-tuple thresholds

7.7 Security Considerations

7.7.1 Credential Encryption

All sensitive credentials stored with AES-256 encryption:

Table Encrypted Field Encryption
dvrs password_encrypted AES-256-CBC, key from AWS Secrets Manager
telegram_configs bot_token_encrypted AES-256-CBC
whatsapp_configs api_key_encrypted AES-256-CBC

7.7.2 Row-Level Security (RLS)

For multi-site deployments, RLS policies enforce that users only see data for sites they have access to:

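-- Enable RLS on critical tables.
-- Once RLS is enabled, a table with no policy denies all rows to non-owner
-- roles, so persons and alerts need site policies of their own. Table owners
-- bypass RLS unless FORCE ROW LEVEL SECURITY is also set.
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
ALTER TABLE alerts ENABLE ROW LEVEL SECURITY;

-- Policy: users see only events from cameras at their assigned sites.
-- The application sets app.current_user_id per session or transaction;
-- missing_ok = true plus NULLIF makes an unset value match no rows instead of erroring.
CREATE POLICY site_isolation_events ON events
    USING (camera_id IN (
        SELECT c.id FROM cameras c
        JOIN dvrs d ON c.dvr_id = d.id
        JOIN site_users su ON d.site_id = su.site_id
        WHERE su.user_id = NULLIF(current_setting('app.current_user_id', true), '')::UUID
    ));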
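The policy's subquery walks cameras, then dvrs, then site_users. An in-memory equivalent is handy for writing unit tests that state which events a given user should see before running them against a real database. This is a sketch only: the dataclasses are simplified stand-ins for the tables, and `visible_camera_ids` / `visible_events` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class Camera:
    id: str
    dvr_id: str

@dataclass(frozen=True)
class Dvr:
    id: str
    site_id: str

def visible_camera_ids(user_id: str, cameras: Iterable[Camera], dvrs: Iterable[Dvr],
                       site_users: Iterable[Tuple[str, str]]) -> Set[str]:
    """Mirror of the site_isolation_events subquery.
    site_users holds (site_id, user_id) pairs."""
    user_sites = {site_id for site_id, uid in site_users if uid == user_id}
    dvr_site = {d.id: d.site_id for d in dvrs}
    # A camera is visible when its DVR belongs to one of the user's sites.
    return {c.id for c in cameras if dvr_site.get(c.dvr_id) in user_sites}

def visible_events(user_id: str, events: List[Dict], cameras: List[Camera],
                   dvrs: List[Dvr], site_users: List[Tuple[str, str]]) -> List[Dict]:
    """Event rows (dicts with a 'camera_id' key) the policy would expose."""
    allowed = visible_camera_ids(user_id, cameras, dvrs, site_users)
    return [e for e in events if e["camera_id"] in allowed]
```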

7.7.3 Access Control

Role Permissions
super_admin Full access to all sites, all operations
site_admin Full access to assigned sites, user management
operator View dashboards, acknowledge alerts, review persons
viewer Read-only access to dashboards and events
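Roles are stored with a permissions_json column. A minimal sketch of how the four roles could be expressed and checked in the API layer, combined with the same site scoping that RLS enforces in the database. The permission strings and the `*` / `site:*` wildcard convention are hypothetical, not a defined schema.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical permission strings; the real permissions_json format is defined elsewhere.
# "*" = every operation on every site; "site:*" = every operation on assigned sites.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "super_admin": frozenset({"*"}),
    "site_admin": frozenset({"site:*"}),
    "operator": frozenset({"dashboards:view", "events:view",
                           "alerts:acknowledge", "persons:review"}),
    "viewer": frozenset({"dashboards:view", "events:view"}),
}

def is_allowed(role: str, permission: str, site_id: str, user_sites: Set[str]) -> bool:
    """Role check plus site scoping; unknown roles are denied everything."""
    perms = ROLE_PERMISSIONS.get(role, frozenset())
    if "*" in perms:
        return True                      # super_admin: all sites, all operations
    if site_id not in user_sites:
        return False                     # every other role is limited to assigned sites
    return "site:*" in perms or permission in perms
```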

7.7.4 Audit Trail

The audit_logs table (partitioned monthly) captures every significant action:

Action Captured Data
login User, IP, timestamp, MFA status, success/failure
person_create Creator, name, initial status, source event
person_update Updater, changed fields, old/new values
alert_acknowledge Acknowledger, alert ID, timestamp
alert_resolve Resolver, resolution notes
training_approve Approver, model version, dataset version
model_deploy Deployer, version, A/B split percentage
config_change Changer, changed parameters, old/new values
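person_update and config_change entries must record which fields changed along with their old and new values. A minimal sketch of computing that diff before the audit row is written; the record shape (`action`, `actor`, `timestamp`, `changes`) is an assumption for illustration, not the audit_logs DDL.

```python
from datetime import datetime, timezone
from typing import Any, Dict

def field_changes(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """{field: {"old": ..., "new": ...}} for every key whose value differs.
    A key present on only one side is reported with None on the other side."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = {"old": before, "new": after}
    return changes

def audit_record(action: str, actor: str,
                 old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical audit payload for actions such as person_update or config_change."""
    return {
        "action": action,
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "changes": field_changes(old, new),
    }
```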

7.7.5 Backup Strategy

Component Method Frequency Retention
PostgreSQL RDS automated backups Daily 35 days
PostgreSQL Manual snapshots Before any schema change 90 days
MinIO/S3 Cross-region replication Continuous 90 days in DR region
Face embeddings pg_dump + vector export Weekly 90 days
Model artifacts MLflow artifact store On training completion Indefinite

Reference: For complete DDL including all CREATE TABLE statements, triggers, views, and functions, see database_schema.md — Sections 2 through 15 contain the full schema definition with comments and constraints.


Section 8: AI Model and Training Strategy

8.1 AI Model Selection

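The core inference pipeline uses three complementary deep learning models, for human detection, face detection, and face recognition, all optimized with TensorRT for GPU inference. Two auxiliary TensorRT models (pose estimation and object detection) feed suspicious-activity detection, while tracking and clustering run on the CPU. All GPU models share a single NVIDIA T4 with dynamic batching.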

Component Model Framework Input Size FPS (T4) Accuracy
Human Detection YOLO11m (Ultralytics) PyTorch -> ONNX -> TensorRT FP16 640 x 640 213 mAP@50: 80.5% (COCO)
Face Detection SCRFD-500M-BNKPS (InsightFace) PyTorch -> ONNX -> TensorRT FP16 640 x 640 ~400 AP_medium: 87.2% (WIDERFace)
Face Recognition ArcFace R100 (IR-SE100) PyTorch -> ONNX -> TensorRT FP16 112 x 112 ~800 99.83% (LFW), 98.35% (MegaFace)
Person Tracking ByteTrack Native Python + NumPy N/A N/A 80.3% MOTA (MOT17)
Unknown Clustering HDBSCAN + DBSCAN fallback scikit-learn 512-D vectors N/A 89.5% purity, 0.855 BCubed F
Fall Detection YOLOv8n-pose TensorRT FP16 640 x 640 ~300 N/A (pose keypoints feed suspicious-activity fall detection)
Object Detection YOLOv8s TensorRT FP16 640 x 640 ~450 N/A (feeds abandoned-object detection)

8.1.1 Human Detection: YOLO11m

Property Value
Architecture CSPDarknet backbone + PANet neck + Decoupled head
Parameters 19.6 M
FLOPs 68.2 B (at 640x640)
TensorRT Optimization FP16, dynamic batch (1-16), layer fusion
GPU Memory ~2.1 GB at batch=8
Person class priority Highest NMS score weighting for person class
Preprocessing Letterbox resize to 640x640, normalize [0,1]

Export pipeline:

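# PyTorch -> ONNX -> TensorRT engine
# dynamic=True exports dynamic input axes (including batch); without it the ONNX input
# is fixed at batch 1 and trtexec rejects the min/opt/max shape profile below.
# FP16 is applied by trtexec (--fp16), so the ONNX graph can stay FP32.
yolo export model=yolo11m.pt format=onnx imgsz=640 dynamic=True opset=17 simplify=True
trtexec --onnx=yolo11m.onnx --saveEngine=yolo11m.engine --fp16 \
  --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:16x3x640x640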
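The Preprocessing row specifies a letterbox resize to 640x640 with [0,1] normalization (8-bit pixels divided by 255). The letterbox geometry matters twice: once to resize the frame, and again to map detector boxes back to camera pixels. A minimal sketch of that arithmetic, assuming centered padding and the 960 x 1080 channel resolution from the document header; `letterbox_params` and `box_to_source` are hypothetical helper names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]

def letterbox_params(src_w: int, src_h: int, dst: int = 640) -> Tuple[float, int, int, int, int]:
    """Scale factor, resized width/height, and left/top padding for a centered letterbox."""
    scale = min(dst / src_w, dst / src_h)          # fit the longer side, keep aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst - new_w) // 2                  # remaining padding goes right/bottom
    pad_top = (dst - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top

def box_to_source(box: Box, scale: float, pad_left: int, pad_top: int) -> Box:
    """Map an (x1, y1, x2, y2) box from 640x640 model space back to camera pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_left) / scale, (y1 - pad_top) / scale,
            (x2 - pad_left) / scale, (y2 - pad_top) / scale)

scale, new_w, new_h, pad_left, pad_top = letterbox_params(960, 1080)
print(scale, new_w, new_h, pad_left, pad_top)
```

For a 960 x 1080 channel the height is the limiting side, so the frame is scaled by 640/1080 and padded only horizontally (35 px on the left, 36 on the right).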

8.1.2 Face Detection: SCRFD-500M-BNKPS

Property Value
Architecture Single-stage detector with FPN, BN+KPS head
Parameters ~0.57 M (the "500M" in the name denotes ~500 MFLOPs of compute; this is the lightweight SCRFD variant)
Detects Face bounding box + 5 facial landmarks
Minimum face size 20 x 20 pixels (configurable)
NMS threshold 0.45 (IoU)
Confidence threshold 0.5 (minimum detection score)
GPU Memory ~1.8 GB at batch=32

8.1.3 Face Recognition: ArcFace R100 (IR-SE100)

Property Value
Backbone IR-SE100 (Improved ResNet-100 with SE blocks)
Training data MS1MV3 (5.8M images, 85K identities)
Loss function ArcFace additive angular margin (m=0.5)
Embedding dimension 512 (float32, L2-normalized)
Distance metric Cosine similarity (1 - cosine_distance)
Matching threshold (strict) 0.60
Matching threshold (balanced) 0.45
Matching threshold (relaxed) 0.30
GPU Memory ~3.2 GB at batch=64

Published benchmarks on standard datasets:

Dataset Accuracy Notes
LFW (Labeled Faces in the Wild) 99.83% Unconstrained face verification
CFP-FP (Frontal-Profile) 99.17% Cross-pose evaluation
AgeDB-30 98.28% Age-invariant recognition
MegaFace (1M distractors) 98.35% Large-scale recognition
IJB-C 96.18% (TAR@FAR=1e-4) Template-based verification

8.2 Inference Pipeline Architecture

+=============================================================================+
|                    REAL-TIME INFERENCE PIPELINE                              |
+=============================================================================+
|                                                                              |
|  INPUT: RTSP frame (960x1080 source, sampled at 1 fps per stream,          |
|         letterboxed to 640x640)                                             |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Frame Preprocessor| -> | YOLO11m Detector  | -> | Person Detection  |    |
|  | - Resize          |    | (TensorRT FP16)   |    | Results:          |    |
|  | - Normalize       |    | GPU: 12ms (P50)   |    | - bbox (x1,y1,x2, |    |
|  | - NCHW layout     |    | Batch: 1-16       |    |   y2)             |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     | - class (person)  |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Crop Extract | <- | SCRFD-500M        | <- | Face Detection    |    |
|  | (ROI from person  |    | (TensorRT FP16)   |    | Results:          |    |
|  |  bounding box)    |    | GPU: 8ms (P50)    |    | - face bbox       |    |
|  |                   |    | Batch: per-face   |    | - 5 landmarks     |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Alignment    | <- | ArcFace R100      | <- | Embedding Vector  |    |
|  | (5-point affine   |    | (TensorRT FP16)   |    | 512-D float32,   |    |
|  |  transform to     |    | GPU: 5ms (P50)    |    | L2-normalized     |    |
|  |  112x112)         |    | Batch: 1-64       |    |                   |    |
|  +-------------------+    +-------------------+    +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Matching     | <- | Person Tracking   | <- | Track-to-Person   |    |
|  | (cosine similarity|    | (ByteTrack)       |    | Association       |    |
|  |  vs. known DB)    |    | CPU: 2ms/frame    |    | - Match embedding |    |
|  +-------------------+    +-------------------+    |   to known persons  |    |
|       |  |  |                                      | - Create/update     |    |
|       |  |  |                                      |   track             |    |
|       v  v  v                                      +-------------------+    |
|  +-------------------+                                                        |
|  | Confidence Scorer |                                                        |
|  | (aggregate score  |                                                        |
|  |  for all detect)  |                                                        |
|  +-------------------+                                                        |
|       |                                                                       |
|       v                                                                       |
|  OUTPUT: DetectionEvent (JSON)                                               |
|  { person_id, track_id, confidence, bbox, face_crop,                         |
|    embedding, recognized_name?, quality_scores }                             |
|                                                                              |
+=============================================================================+

End-to-end latency budget per frame (GPU totals sum the lower bound of each stage; CPU totals give the full range):

Stage GPU CPU Fallback
Frame preprocessing 2-5 ms 5-10 ms
YOLO11m detection 12 ms (P50) 35-56 ms (ONNX+OpenVINO)
SCRFD face detection 8 ms (P50) 15-25 ms
ArcFace embedding (per face) 5 ms (P50) 12-18 ms
ByteTrack tracking 2 ms 2-5 ms
Post-processing 5-10 ms 10-20 ms
Total (no face) ~29 ms ~67-116 ms
Total (1 face) ~34 ms ~79-134 ms
Total (5 faces) ~54 ms ~127-206 ms
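The totals can be reproduced from the per-stage figures: the GPU totals equal the sum of each stage's lower bound, while the CPU column sums both ends of every range, with one ArcFace pass per detected face. A small calculator makes that arithmetic explicit and easy to rerun when a stage's benchmark changes; the stage keys are illustrative names.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]

# (low, high) milliseconds per stage from the latency table; single values use low == high.
GPU_MS: Dict[str, Range] = {
    "preprocess": (2, 5), "yolo": (12, 12), "scrfd": (8, 8),
    "arcface_per_face": (5, 5), "bytetrack": (2, 2), "postprocess": (5, 10),
}
CPU_MS: Dict[str, Range] = {
    "preprocess": (5, 10), "yolo": (35, 56), "scrfd": (15, 25),
    "arcface_per_face": (12, 18), "bytetrack": (2, 5), "postprocess": (10, 20),
}

def frame_latency(stages: Dict[str, Range], faces: int) -> Range:
    """(best, worst) per-frame latency: fixed stages plus one ArcFace pass per face."""
    fixed = [name for name in stages if name != "arcface_per_face"]
    low = sum(stages[n][0] for n in fixed) + faces * stages["arcface_per_face"][0]
    high = sum(stages[n][1] for n in fixed) + faces * stages["arcface_per_face"][1]
    return low, high

for faces in (0, 1, 5):
    print(faces, "GPU", frame_latency(GPU_MS, faces), "CPU", frame_latency(CPU_MS, faces))
```

For example, 8 streams at 1 fps on the 5-face GPU upper bound (~62 ms) need about 8 x 62 ms ≈ 0.5 s of serialized GPU time per second of video, before dynamic batching is taken into account.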

8.3 Face Recognition Matching Strategy

8.3.1 Known Person Matching

+-----------------------------------------------------------------------------+
|                    FACE RECOGNITION MATCHING FLOW                            |
+-----------------------------------------------------------------------------+
|                                                                              |
|  New Face Embedding (512-D)                                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | L2 Normalize      |  embedding = embedding / ||embedding||_2              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | pgvector HNSW     | -> | Top-5 Candidates  |                              |
|  | Similarity Search |    | (cosine distance) |                              |
|  | ef_search=128     |    +-------------------+                              |
|  +-------------------+            |                                          |
|                                   v                                          |
|  +-------------------+    +-------------------+                              |
|  | Threshold Check   | <- | Best Match Score  |                              |
|  | (per AI Vibe)     |    +-------------------+                              |
|  +-------------------+            |                                          |
|       |                          |                                          |
|       +------------+-------------+                                          |
|                    |                                                        |
|         +----------+----------+                                             |
|         |                     |                                             |
|         v                     v                                             |
|    Above threshold      Below threshold                                     |
|    (Recognized)         (Unknown)                                           |
|         |                     |                                             |
|         v                     v                                             |
|  +------------+       +------------------+                                 |
|  | Assign to  |       | Check against    |                                 |
|  | known      |       | recent unknown   |                                 |
|  | person_id  |       | embeddings       |                                 |
|  | (with      |       | (5-min window)   |                                 |
|  | confidence)|       +--------+---------+                                 |
|  +------------+                |                                            |
|                                |                                            |
|                       +--------+--------+                                   |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Similar unknown    No similar unknown                      |
|                  (same person)      (new unknown)                           |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Reuse person_id   Create new                              |
|                  Update centroid   unknown person                           |
|                                    record                                   |
|                                                                              |
+-----------------------------------------------------------------------------+
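The decision logic after the HNSW lookup can be sketched as follows. The known-person threshold comes from the active AI Vibe; the similarity threshold for re-matching a recent unknown (`UNKNOWN_REMATCH_THRESHOLD`), the count-weighted centroid update, and the in-memory `RecentUnknown` buffer are assumptions made for illustration, since the flow does not specify them.

```python
import math
import uuid
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

UNKNOWN_WINDOW_S = 300             # 5-minute window from the matching flow
UNKNOWN_REMATCH_THRESHOLD = 0.45   # assumption: the blueprint does not give this value

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # For L2-normalized vectors the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

@dataclass
class RecentUnknown:
    person_id: str
    centroid: List[float]   # kept L2-normalized
    count: int
    last_seen: float        # epoch seconds

def match_face(embedding: Sequence[float], candidates: List[Tuple[str, float]],
               threshold: float, recent: List[RecentUnknown],
               now: float) -> Tuple[str, bool]:
    """Return (person_id, is_known).

    candidates: (person_id, cosine similarity) pairs from the HNSW top-5 query.
    threshold:  face-match threshold of the active AI Vibe.
    recent:     buffer of recent unknowns (pruning of expired entries omitted).
    """
    emb = l2_normalize(embedding)
    if candidates:
        best_id, best_sim = max(candidates, key=lambda c: c[1])
        if best_sim >= threshold:
            return best_id, True                      # recognized known person

    # Below threshold: look for the same face among unknowns seen within the window.
    live = [u for u in recent if now - u.last_seen <= UNKNOWN_WINDOW_S]
    best: Optional[RecentUnknown] = max(live, key=lambda u: dot(emb, u.centroid), default=None)
    if best is not None and dot(emb, best.centroid) >= UNKNOWN_REMATCH_THRESHOLD:
        # Reuse person_id; pull the centroid toward the new face, then re-normalize.
        best.centroid = l2_normalize([c * best.count + e for c, e in zip(best.centroid, emb)])
        best.count += 1
        best.last_seen = now
        return best.person_id, False

    new_unknown = RecentUnknown(str(uuid.uuid4()), emb, 1, now)
    recent.append(new_unknown)
    return new_unknown.person_id, False
```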

8.3.2 AI Vibe Threshold Mapping

The AI Vibe system maps three intuitive presets to internal confidence thresholds:

Vibe Face Match Threshold Detection Confidence Use Case
Relaxed 0.30 cosine similarity 0.40 minimum Known persons re-identified more easily; more false positives acceptable
Balanced 0.45 cosine similarity 0.55 minimum Default; good precision-recall tradeoff
Strict 0.60 cosine similarity 0.70 minimum High-security scenarios; minimize false positives

Per-stream Vibe Selection:

  • Vibe can be set per camera via dashboard
  • Night mode automatically applies Strict vibe
  • Alert-triggered cameras automatically upgrade to Strict for 5 minutes
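The vibe table and the per-stream rules combine into a small resolution step that runs before thresholds are applied. A sketch, assuming the caller already knows whether night mode is active (how night hours are defined is outside this section) and tracks the timestamp of each camera's most recent alert; `Vibe`, `VIBES`, and `effective_vibe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Vibe:
    face_match: float   # minimum cosine similarity to accept a known-person match
    detection: float    # minimum detection confidence

VIBES: Dict[str, Vibe] = {
    "relaxed": Vibe(0.30, 0.40),
    "balanced": Vibe(0.45, 0.55),   # default
    "strict": Vibe(0.60, 0.70),
}
ALERT_BOOST_S = 5 * 60              # alert-triggered Strict lasts 5 minutes

def effective_vibe(camera_vibe: str, is_night: bool,
                   last_alert_ts: Optional[float], now: float) -> str:
    """Night mode and a recent alert both force 'strict'; otherwise use the camera setting."""
    if is_night:
        return "strict"
    if last_alert_ts is not None and now - last_alert_ts < ALERT_BOOST_S:
        return "strict"
    return camera_vibe

thresholds = VIBES[effective_vibe("relaxed", False, last_alert_ts=100.0, now=250.0)]
print(thresholds)
```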

8.4 Unknown Person Clustering Approach

Unknown persons (faces that don't match any known person above threshold) are automatically clustered to help operators identify recurring visitors.

8.4.1 Clustering Pipeline

+-----------------------------------------------------------------------------+
|                    UNKNOWN PERSON CLUSTERING PIPELINE                        |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Unknown Face Embeddings (streaming)                                         |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | Sliding Window    |  Keep last N embeddings in memory (configurable)     |
|  | Buffer (500)      |  + persistent storage for long-term clustering       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | HDBSCAN Clustering| -> | Primary clusters  |  min_cluster_size=5        |
|  | (density-based)   |    | formed             |  min_samples=2             |
|  | metric=cosine     |    +-------------------+  eps=auto                   |
|  +-------------------+            |                                          |
|       | (fallback)                |                                          |
|       v                           v                                          |
|  +-------------------+    +-------------------+                              |
|  | DBSCAN Fallback   |    | Merge with        |  Check: temporal gap       |
|  | (if HDBSCAN fails |    | existing clusters |  < 30 days, cosine sim     |
|  |  to find structure|    | - centroid        |  > 0.85                    |
|  +-------------------+    |   distance        |                            |
|                           +-------------------+                            |
|                                   |                                          |
|                                   v                                          |
|                           +-------------------+                              |
|                           | Operator Review   |  Dashboard shows clusters   |
|                           | Queue             |  pending identification     |
|                           +-------------------+                              |
|                                                                              |
+-----------------------------------------------------------------------------+
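The DBSCAN fallback is the simpler of the two algorithms to show in full. Below is a plain-Python sketch of classic DBSCAN over cosine distance, followed by the minimum-cluster-size filter from section 8.4.2. `eps` is passed explicitly because the diagram's `eps=auto` heuristic is not specified; production code would more likely call scikit-learn's `DBSCAN(metric="cosine")` than this O(n²) loop.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

NOISE = -1

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dbscan_cosine(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Classic DBSCAN; labels[i] is a cluster index or NOISE.
    As in scikit-learn, min_samples counts the point itself."""
    n = len(points)
    neighbors = [[j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE                 # may later become a border point
            continue
        labels[i] = cluster                   # i is a core point: start a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster           # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])    # core point: keep expanding
        cluster += 1
    return labels  # every entry has been assigned by now

def enforce_min_cluster_size(labels: List[int], min_cluster_size: int = 5) -> List[int]:
    """Demote clusters smaller than min_cluster_size to noise."""
    sizes = Counter(label for label in labels if label != NOISE)
    return [label if label != NOISE and sizes[label] >= min_cluster_size else NOISE
            for label in labels]
```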

8.4.2 Clustering Parameters

Parameter Value Description
Algorithm HDBSCAN (primary), DBSCAN (fallback) Density-based for irregular cluster shapes
Distance metric Cosine similarity Optimal for face embeddings
Minimum cluster size 5 embeddings Minimum to form a cluster
Minimum samples 2 Core point density threshold
Merge threshold 0.85 cosine similarity Merge clusters if centroids are close
Temporal window 30 days Maximum gap between cluster appearances
Review trigger 10+ embeddings Send to operator review queue
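The merge rule and review trigger can be expressed as two predicates. This is a sketch under stated assumptions: the "temporal gap" is interpreted as the time between the two clusters' activity periods (zero if they overlap), and the `Cluster` dataclass is a simplified stand-in for a person_clusters row.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence

MERGE_SIMILARITY = 0.85
TEMPORAL_WINDOW = timedelta(days=30)
REVIEW_TRIGGER = 10

@dataclass
class Cluster:
    centroid: Sequence[float]
    first_seen: datetime
    last_seen: datetime
    sample_count: int

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def temporal_gap(a: Cluster, b: Cluster) -> timedelta:
    """Time between the two clusters' activity periods; zero if they overlap (assumption)."""
    if a.last_seen < b.first_seen:
        return b.first_seen - a.last_seen
    if b.last_seen < a.first_seen:
        return a.first_seen - b.last_seen
    return timedelta(0)

def should_merge(a: Cluster, b: Cluster) -> bool:
    """Merge when centroids are close AND the clusters were active within 30 days."""
    return (cosine_similarity(a.centroid, b.centroid) > MERGE_SIMILARITY
            and temporal_gap(a, b) < TEMPORAL_WINDOW)

def needs_review(c: Cluster) -> bool:
    return c.sample_count >= REVIEW_TRIGGER
```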

8.4.3 Clustering Quality Targets

Metric Target Measurement
Cluster Purity > 89% % of embeddings in a cluster belonging to the same person
BCubed F-Measure > 0.85 Harmonic mean of precision and recall for clustering
Silhouette Score > 0.3 Separation quality between clusters
False Merge Rate < 5% Different persons incorrectly merged
Split Rate < 15% Same person split into multiple clusters
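Purity and BCubed F are defined per item (embedding) against a ground-truth identity, so they can only be measured on an operator-labeled sample. A minimal sketch of both metrics; it assumes noise points from HDBSCAN/DBSCAN have been removed or relabeled as singletons beforehand, since scoring all noise as one cluster would distort both numbers.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

def cluster_purity(pred: List[Hashable], truth: List[Hashable]) -> float:
    """Share of items carrying the majority true label of their predicted cluster."""
    by_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for p, t in zip(pred, truth):
        by_cluster[p][t] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(pred)

def bcubed(pred: List[Hashable], truth: List[Hashable]) -> Tuple[float, float, float]:
    """Item-averaged BCubed precision, recall, and F-measure."""
    n = len(pred)
    same_both = Counter(zip(pred, truth))   # items sharing both cluster and true label
    cluster_sizes = Counter(pred)
    label_sizes = Counter(truth)
    precision = sum(same_both[(p, t)] / cluster_sizes[p] for p, t in zip(pred, truth)) / n
    recall = sum(same_both[(p, t)] / label_sizes[t] for p, t in zip(pred, truth)) / n
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

pred = [0, 0, 0, 1, 1]
truth = ["a", "a", "b", "b", "b"]
print(cluster_purity(pred, truth), bcubed(pred, truth))
```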

8.5 Confidence Handling

8.5.1 Confidence Score Computation

Each detection event carries an aggregate confidence score computed from multiple signals:

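# Aggregate confidence: weights sum to 1.0, so the result stays in [0, 1]
# whenever every input is in [0, 1].
def aggregate_confidence(yolo_confidence, scrfd_confidence,
                         cosine_distance_to_match, quality_composite):
    return (0.35 * yolo_confidence                       # detection_confidence
            + 0.25 * scrfd_confidence                    # face_detection_quality
            + 0.25 * (1.0 - cosine_distance_to_match)    # face_recognition_score
            + 0.15 * quality_composite)                  # face_quality_score

# quality_composite: unweighted mean of four [0, 1] signals.
def compute_quality_composite(blur_score, pose_yaw,
                              illumination_score, resolution_adequacy):
    sharpness = 1.0 - blur_score                      # Sharpness (higher is better)
    frontal = 1.0 - min(abs(pose_yaw), 90.0) / 90.0   # Frontal preference; yaw in degrees, clamped
    return (sharpness + frontal
            + illumination_score                      # Well-lit face
            + resolution_adequacy) / 4.0              # Sufficient pixels for face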

8.5.2 Confidence Levels

| Level | Score Range | Color | Action |
| --- | --- | --- | --- |
| High Confidence | [0.80, 1.00] | Green | Auto-accept, no review needed |
| Medium Confidence | [0.60, 0.80) | Yellow | Accepted, flagged for periodic review |
| Low Confidence | [0.40, 0.60) | Orange | Requires operator review within 24h |
| Very Low Confidence | [0.00, 0.40) | Red | Rejected, not used for training |
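A minimal Python rendering of the aggregate score and the confidence bands, assuming every input is already scaled to [0, 1] except `cosine_distance`, which lies in [0, 2] and is therefore clamped after being subtracted from 1. The clamps and the treatment of band boundaries (lower bound inclusive) are assumptions rather than part of the specification.

```python
from statistics import mean


def quality_composite(blur_score, pose_yaw_deg, illumination_score, resolution_adequacy):
    """Average of the four face-quality terms; each term is in [0, 1], higher is better."""
    frontal = max(0.0, 1.0 - abs(pose_yaw_deg) / 90.0)  # clamp for yaw beyond +/-90 degrees
    return mean([1.0 - blur_score, frontal, illumination_score, resolution_adequacy])


def aggregate_confidence(yolo_conf, scrfd_conf, cosine_distance, quality):
    """Weighted sum with weights 0.35/0.25/0.25/0.15 (they total 1.0)."""
    recognition = min(max(1.0 - cosine_distance, 0.0), 1.0)  # cosine distance is in [0, 2]
    return 0.35 * yolo_conf + 0.25 * scrfd_conf + 0.25 * recognition + 0.15 * quality


def confidence_level(score):
    """Map an aggregate score onto the four confidence bands."""
    if score >= 0.80:
        return "HIGH"
    if score >= 0.60:
        return "MEDIUM"
    if score >= 0.40:
        return "LOW"
    return "VERY_LOW"
```

For a sharp, frontal, well-lit face (blur 0.1, yaw 0, illumination 0.8, resolution adequacy 1.0) the quality composite is 0.925; with YOLO 0.9, SCRFD 0.8 and a match distance of 0.2 the aggregate is about 0.854, which falls in the High Confidence band.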

8.6 Training Workflow Overview

The safe self-learning system captures operator feedback and converts it into model improvements through a carefully controlled pipeline.

8.6.1 Three Learning Modes

| Mode | Description | Use Case | Risk Level |
| --- | --- | --- | --- |
| Manual Only | Operator explicitly triggers training runs | Highly regulated environments | Lowest |
| Suggested Learning (Recommended) | System suggests training candidates; operator approves | Standard production deployment | Low |
| Approved Auto-Update | Auto-training triggers after admin approval threshold | Mature deployment with trusted operators | Medium |

8.6.2 Training Pipeline Architecture

+=============================================================================+
|                    SAFE SELF-LEARNING PIPELINE                               |
+=============================================================================+
|                                                                              |
|  STEP 1: COLLECTION                                                          |
|  +-------------------+                                                       |
|  | Operator Review   |  confirm, correct_name, merge, reject                |
|  | Actions           |  + automatic high-confidence acceptances              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 2: CONFLICT DETECTION (Synchronous, blocks immediately)               |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Label Conflict    | -> | If conflict found | -> | Block from        |    |
|  | Detector          |    | (5 types)         |    | training dataset, |    |
|  | - Same face, diff |    +-------------------+    | alert admin       |    |
|  |   names           |                             +-------------------+    |
|  | - Diff faces, same|                                                       |
|  |   name            |                                                       |
|  | - Merge circular  |                                                       |
|  |   reference       |                                                       |
|  | - Name to already-|                                                       |
|  |   deleted person  |                                                       |
|  | - Quality below   |                                                       |
|  |   threshold       |                                                       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 3: DATASET CURATION                                                    |
|  +-------------------+                                                       |
|  | Training Dataset  |  - Collect approved examples                         |
|  | Builder           |  - Balance classes (min 5 per person)                |
|  |                   |  - Augmentation (flip, rotate, brightness)           |
|  |                   |  - Quality filter (blur, pose, illumination)         |
|  |                   |  - Train/val split (80/20)                            |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 4: MODEL TRAINING                                                      |
|  +-------------------+                                                       |
|  | Training Job      |  - ArcFace R100 backbone                              |
|  | (Airflow DAG)     |  - Fine-tuning on curated dataset                     |
|  |                   |  - Cosine annealing LR schedule                        |
|  |                   |  - Early stopping (patience=10)                       |
|  |                   |  - Mixed precision (AMP)                              |
|  |                   |  - Typical duration: 2-8 hours on V100                |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 5: QUALITY GATES                                                       |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Gate 1: Hold-out  | -> | Gate 2: Compare   | -> | Gate 3: Identity  |    |
|  |    evaluation     |    |    vs current     |    |    accuracy       |    |
|  |    (precision,    |    |    production     |    |    (100% known)   |    |
|  |     recall, f1)   |    |    (<=2% regress) |    |                   |    |
|  +-------------------+    +-------------------+    +-------------------+    |
|       |                          |                          |                |
|       +------------+-------------+--------------------------+                |
|                    |                                                          |
|         +----------+----------+                                              |
|         |                     |                                              |
|         v                     v                                              |
|     ALL PASSED            ANY FAILED                                       |
|         |                     |                                              |
|         v                     v                                              |
|  +------------+       +------------------+                                 |
|  | Proceed to |       | REJECT           |                                 |
|  | Deployment |       | - Log failure    |                                 |
|  +------------+       | - Alert admin    |                                 |
|                       | - Keep in staging|                                 |
|                       +------------------+                                 |
|                                                                              |
|  STEP 6: DEPLOYMENT                                                          |
|  +-------------------+                                                       |
|  | A/B Testing       |  - Shadow mode: 0% traffic (validation)              |
|  | (gradual rollout) |  - Canary: 5% traffic for 24h                        |
|  |                   |  - Monitor: latency, error rate, FP rate              |
|  |                   |  - Full rollout: 100% traffic                         |
|  |                   |  - Rollback: < 60 seconds to previous version         |
|  +-------------------+                                                       |
|                                                                              |
+=============================================================================+

8.7 Model Versioning and Rollback

8.7.1 Semantic Versioning

| Version Component | Increment When | Example |
| --- | --- | --- |
| MAJOR (X.0.0) | Full retraining, architecture change, breaking embedding change | 1.0.0 -> 2.0.0 (new backbone) |
| MINOR (x.Y.0) | Fine-tuning, significant new data (>50 new identities) | 1.0.0 -> 1.1.0 (new employees) |
| PATCH (x.y.Z) | Incremental update, centroid update, hotfix | 1.0.0 -> 1.0.1 (new photos added) |
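The increment rules can be expressed as a small helper. The boolean flags and the `new_identities` count are hypothetical inputs that a training job would have to supply; the >50-identity cut-off comes from the table.

```python
def classify_change(full_retrain=False, architecture_changed=False,
                    embedding_breaking=False, fine_tuned=False, new_identities=0):
    """Return MAJOR, MINOR or PATCH following the versioning table."""
    if full_retrain or architecture_changed or embedding_breaking:
        return "MAJOR"
    if fine_tuned or new_identities > 50:
        return "MINOR"
    return "PATCH"  # incremental update, centroid update, hotfix


def bump_version(version, change):
    """Increment a MAJOR.MINOR.PATCH string, resetting the lower components."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "MAJOR":
        return f"{major + 1}.0.0"
    if change == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```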

8.7.2 Version States

| State | Description | Transition |
| --- | --- | --- |
| TRAINING | Model is being trained | Auto -> STAGING on completion |
| STAGING | Awaiting quality gate evaluation | Auto -> AWAITING_APPROVAL on pass |
| AWAITING_APPROVAL | Pending admin approval | Manual -> CANARY on approve |
| CANARY | 5% traffic, monitoring | Auto -> PRODUCTION on success (24h) |
| PRODUCTION | 100% traffic, active serving | Manual -> ARCHIVED on new version deploy |
| ARCHIVED | Kept for rollback, no traffic | Auto -> ROLLBACK_AVAILABLE after 30 days |
| ROLLBACK_AVAILABLE | Can be rolled back to | Manual -> PRODUCTION on rollback trigger |
| DEPRECATED | Cannot be rolled back to | Final state |

8.7.3 Rollback Procedure

+-----------------------------------------------------------------------------+
|                    EMERGENCY ROLLBACK PROCEDURE                              |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Trigger: Admin initiates rollback or automatic rollback on failure         |
|                                                                              |
|  Step 1: Validate target version exists and is in ROLLBACK_AVAILABLE state  |
|  Step 2: Load target model artifacts from S3/MinIO (pre-warm GPU)          |
|  Step 3: Atomic switch: update model reference in Triton config             |
|  Step 4: Reload via Triton repository API (explicit model-control mode)     |
|  Step 5: Validate: send test inference requests, check latency              |
|  Step 6: If validation fails -> auto-revert to previous production          |
|  Step 7: If validation passes -> update database model version records      |
|  Step 8: Log rollback event in audit_logs                                   |
|                                                                              |
|  Maximum rollback time: < 60 seconds                                        |
|  Zero inference downtime during rollback                                    |
|                                                                              |
+-----------------------------------------------------------------------------+

8.8 Quality Gates

8.8.1 Gate Thresholds

| Gate | Metric | Minimum | Maximum | Critical |
| --- | --- | --- | --- | --- |
| Hold-out Evaluation | Precision | 0.97 | - | Yes (cannot override) |
| Hold-out Evaluation | Recall | 0.95 | - | Yes |
| Hold-out Evaluation | F1 Score | 0.96 | - | Yes |
| No Regression | Metric regression vs production | - | 2% | No (admin can override) |
| Identity Accuracy | Known identity recall | 100% | - | Yes |
| Latency | P99 inference latency | - | 150 ms | Yes |
| Confusion Analysis | False positive rate | - | 5% | No |
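A sketch of how these thresholds could be evaluated. It assumes the candidate's metrics arrive as a flat dict, with regression expressed in percent (matching `max_regression_pct` in the report example) and rates as fractions; the metric keys, the `OVERRIDDEN` status and the `overrides` argument are illustrative. Only non-critical gates honour an admin override, mirroring the Critical column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gate:
    name: str
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    critical: bool = True


GATES = [
    Gate("holdout_precision", "precision", minimum=0.97),
    Gate("holdout_recall", "recall", minimum=0.95),
    Gate("holdout_f1", "f1_score", minimum=0.96),
    Gate("no_regression", "max_regression_pct", maximum=2.0, critical=False),
    Gate("known_identity_accuracy", "known_identity_recall", minimum=1.0),
    Gate("latency_requirement", "p99_latency_ms", maximum=150.0),
    Gate("confusion_analysis", "false_positive_rate", maximum=0.05, critical=False),
]


def evaluate_gates(metrics, overrides=frozenset()):
    """Check every gate; a failed non-critical gate passes only if listed in overrides."""
    results = []
    for gate in GATES:
        value = metrics[gate.metric]
        ok = ((gate.minimum is None or value >= gate.minimum)
              and (gate.maximum is None or value <= gate.maximum))
        if ok:
            status = "PASSED"
        elif not gate.critical and gate.name in overrides:
            status = "OVERRIDDEN"
        else:
            status = "FAILED"
        results.append({"name": gate.name, "status": status,
                        "critical": gate.critical, "value": value})
    overall = "FAILED" if any(r["status"] == "FAILED" for r in results) else "PASSED"
    return {"overall_result": overall, "gates": results}
```

Fed the metrics from the report example (precision 0.9842, recall 0.9678, F1 0.9759, 0.8% regression, 142 of 142 known identities, P99 128 ms) plus a false positive rate under 5%, this returns PASSED; a latency failure cannot be overridden.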

8.8.2 Quality Gate Report Example

{
  "gate_run_id": "550e8400-e29b-41d4-a716-446655440000",
  "candidate_model_version": "1.2.0",
  "baseline_model_version": "1.1.0",
  "timestamp": "2024-01-25T10:30:00Z",
  "overall_result": "PASSED",
  "gates": [
    {
      "name": "holdout_performance",
      "status": "PASSED",
      "critical": true,
      "metrics": {
        "precision": 0.9842,
        "recall": 0.9678,
        "f1_score": 0.9759
      }
    },
    {
      "name": "no_regression",
      "status": "PASSED",
      "metrics": {
        "max_regression_pct": 0.8,
        "per_metric": {
          "precision": 0.003,
          "recall": -0.008,
          "f1_score": -0.002
        }
      }
    },
    {
      "name": "known_identity_accuracy",
      "status": "PASSED",
      "metrics": {
        "known_identities_tested": 142,
        "perfect_accuracy": 142,
        "accuracy_below_threshold": 0
      }
    },
    {
      "name": "latency_requirement",
      "status": "PASSED",
      "metrics": {
        "p50_latency_ms": 45,
        "p99_latency_ms": 128,
        "threshold_ms": 150
      }
    }
  ]
}

8.8.3 Embedding Update Strategies

After a model passes quality gates and is deployed, the face embedding database must be updated. Five strategies are available:

| Strategy | When to Use | Duration | Impact |
| --- | --- | --- | --- |
| Centroid Update | Few new examples (<10 per identity), same model | Seconds | Update running mean only |
| Incremental Add | Many new examples (10-100 per identity), same model | Minutes | Add new embeddings, keep existing |
| Full Reindex | Model version changed, or >10% of identities updated | Hours | Recompute all embeddings |
| Merge and Update | Identity merge operation | Seconds | Weighted centroid merge |
| Rollback Reindex | Model rollback | Minutes | Restore previous embeddings |

Decision Matrix:

+-----------------------------------------------------------------------------+
|                    EMBEDDING UPDATE STRATEGY SELECTION                       |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Rollback to a previous model?                                              |
|       |                                                                     |
|       +-- YES -> ROLLBACK_REINDEX (restore stored embeddings)               |
|       |                                                                     |
|       NO -> Model changed?                                                  |
|               |                                                             |
|               +-- YES -> FULL_REINDEX (embeddings are model-dependent)      |
|               |                                                             |
|               NO -> What changed?                                           |
|                       |                                                     |
|                       +-- Identity merge -> MERGE_AND_UPDATE                |
|                       |                                                     |
|                       +-- > 10% of identities updated -> FULL_REINDEX       |
|                       |                                                     |
|                       +-- < 10 new examples per identity -> CENTROID_UPDATE |
|                       |                                                     |
|                       +-- Otherwise -> INCREMENTAL_ADD                      |
|                                                                              |
+-----------------------------------------------------------------------------+
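The selection rules can be encoded directly. This sketch checks for a rollback before testing for a model change, because ROLLBACK_REINDEX restores stored embeddings instead of recomputing them, and it routes updates that touch more than 10% of identities to FULL_REINDEX, as the strategy table specifies. The keyword arguments are hypothetical names for facts the update job would need to know.

```python
def select_embedding_strategy(*, is_rollback, model_changed, is_merge,
                              max_new_per_identity, fraction_identities_updated):
    """Pick one of the five embedding update strategies."""
    if is_rollback:
        return "ROLLBACK_REINDEX"      # restore the embeddings stored with the target version
    if model_changed:
        return "FULL_REINDEX"          # embeddings are model-dependent
    if is_merge:
        return "MERGE_AND_UPDATE"      # weighted centroid merge
    if fraction_identities_updated > 0.10:
        return "FULL_REINDEX"
    if max_new_per_identity < 10:
        return "CENTROID_UPDATE"       # update running mean only
    return "INCREMENTAL_ADD"           # add new embeddings, keep existing ones
```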

Reference: For complete model export commands, INT8 calibration scripts, performance benchmarks, and the full Python module structure, see ai_vision.md — Sections 10-14. For the complete training pipeline code, Airflow DAG definitions, and quality gate implementations, see training_system.md — Sections 5-10.


Section 9: Suspicious Activity Night-Mode Design

9.1 Overview

The suspicious activity detection system provides comprehensive behavioral analysis during night hours (22:00-06:00 by default) through 10 specialized detection modules. Each module operates on the output of the AI inference pipeline (detected persons, tracked positions, and face identities) to identify anomalous behavior patterns.

The system features a composite scoring engine that combines signals from all modules with exponential time-decay, enabling unified threat assessment and intelligent escalation. Each camera can be independently configured with custom zones, thresholds, and schedules.

9.2 Ten Detection Modules Summary

| # | Module | Description | Severity | Key CV Model |
| --- | --- | --- | --- | --- |
| 1 | Intrusion Detection | Detects persons entering restricted polygon zones | HIGH (default) | YOLO11m detections + zone polygon |
| 2 | Loitering Detection | Flags persons dwelling in an area longer than threshold | MEDIUM (default) | ByteTrack + timer per track |
| 3 | Running Detection | Identifies abnormally fast movement | MEDIUM (default) | YOLOv8n-pose + optical flow speed |
| 4 | Crowding Detection | Alerts when group density exceeds threshold | HIGH (default) | DBSCAN spatial clustering |
| 5 | Fall Detection | Detects persons falling or collapsing | CRITICAL | YOLOv8n-pose keypoint analysis |
| 6 | Abandoned Object | Identifies unattended objects left behind | HIGH (default) | YOLOv8s + MOG2 background subtraction |
| 7 | After-Hours Presence | Detects any person presence during night hours | MEDIUM (default) | YOLO11m person class only |
| 8 | Zone Breach | Triggers on crossing virtual boundary lines | MEDIUM (default) | ByteTrack + line crossing algorithm |
| 9 | Repeated Re-entry | Flags patterns of entering/exiting an area multiple times | MEDIUM (default) | ByteTrack + entry/exit state machine |
| 10 | Suspicious Dwell Time | Alerts on extended presence near sensitive areas | MEDIUM (configurable) | ByteTrack + per-zone timers |

9.3 Module Details

9.3.1 Module 1: Intrusion Detection

Detects when a person enters a user-defined restricted polygon zone.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| confidence_threshold | 0.55 | 0.3-0.9 | Minimum person detection confidence |
| overlap_threshold | 0.30 | 0.1-0.9 | Min IoU between person bbox and zone |
| cooldown_seconds | 60 | 0-3600 | Cooldown before re-alerting same zone |
| zone_severity | HIGH | LOW/MEDIUM/HIGH | Per-zone configurable |

Algorithm:

For each detected person:
    For each restricted zone polygon:
        Compute IoU(person_bbox, zone_polygon)
        If IoU > overlap_threshold AND confidence > confidence_threshold:
            If zone not in cooldown:
                Trigger INTRUSION alert
                Start cooldown timer
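A self-contained sketch of the IoU test, using Sutherland-Hodgman clipping of the zone polygon against the person's box and the shoelace formula for areas; in production a geometry library such as Shapely would normally do this. Coordinates are assumed normalized to [0, 1], and the `persons`/`zones` structures and the `cooldown_until` dict are illustrative. Note that IoU is bounded by the ratio of the smaller area to the larger one, so a person standing fully inside a zone much larger than their box can still score below `overlap_threshold`.

```python
def polygon_area(points):
    """Shoelace formula; returns the absolute area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _clip(points, inside, intersect):
    """One Sutherland-Hodgman pass against a single half-plane."""
    out = []
    for i, cur in enumerate(points):
        prev = points[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out


def clip_polygon_to_box(polygon, box):
    """Clip a polygon to an axis-aligned box (x1, y1, x2, y2)."""
    bx1, by1, bx2, by2 = box

    def at_x(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = [tuple(p) for p in polygon]
    pts = _clip(pts, lambda p: p[0] >= bx1, at_x(bx1))
    pts = _clip(pts, lambda p: p[0] <= bx2, at_x(bx2))
    pts = _clip(pts, lambda p: p[1] >= by1, at_y(by1))
    pts = _clip(pts, lambda p: p[1] <= by2, at_y(by2))
    return pts


def bbox_zone_iou(bbox, zone):
    """IoU between a person bbox (x1, y1, x2, y2) and a zone polygon."""
    inter = polygon_area(clip_polygon_to_box(zone, bbox))
    bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    union = bbox_area + polygon_area(zone) - inter
    return inter / union if union > 0 else 0.0


def check_intrusion(persons, zones, now, cooldown_until,
                    confidence_threshold=0.55, overlap_threshold=0.30, cooldown_seconds=60):
    """persons: dicts with 'track_id', 'bbox', 'confidence'; zones: zone_id -> polygon."""
    alerts = []
    for person in persons:
        if person["confidence"] <= confidence_threshold:
            continue
        for zone_id, polygon in zones.items():
            if bbox_zone_iou(person["bbox"], polygon) <= overlap_threshold:
                continue
            if now < cooldown_until.get(zone_id, float("-inf")):
                continue  # zone still in cooldown
            alerts.append((zone_id, person["track_id"]))
            cooldown_until[zone_id] = now + cooldown_seconds
    return alerts
```

With the `server_room_door` polygon from the per-camera configuration example in section 9.6 (area 0.08), a 0.10 x 0.30 box fully inside it gives an IoU of 0.375, just above the 0.30 default.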

9.3.2 Module 2: Loitering Detection

Flags persons who remain in an area longer than a threshold.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| dwell_time_threshold_seconds | 300 | 30-1800 | Time before triggering loitering alert |
| movement_tolerance_pixels | 50 | 10-200 | Max centroid movement to still count as "stationary" |
| cooldown_seconds | 300 | 0-3600 | Cooldown after alert |

Algorithm:

For each active track:
    If track centroid moved < tolerance in last N seconds:
        Increment dwell timer
        If dwell_timer > threshold:
            Trigger LOITERING alert
            Reset timer (or hold until movement detected)
    Else:
        Reset dwell timer
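One way to realize the dwell timer, sketched under the assumption that stationarity is measured as displacement from the position where the current dwell period began (the pseudocode leaves the look-back window N open), and that the alert is held rather than re-armed until the track moves. The class and method names are hypothetical, and the post-alert cooldown is omitted for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class _DwellState:
    anchor: tuple      # centroid where the current dwell period started
    since: float       # timestamp (seconds) when it started
    alerted: bool = False


class LoiteringDetector:
    def __init__(self, dwell_time_threshold_seconds=300.0, movement_tolerance_pixels=50.0):
        self.threshold = dwell_time_threshold_seconds
        self.tolerance = movement_tolerance_pixels
        self.states = {}

    def update(self, track_id, centroid, t):
        """Feed one observation; returns True exactly when a LOITERING alert should fire."""
        state = self.states.get(track_id)
        if state is None or math.dist(state.anchor, centroid) > self.tolerance:
            # New track or real movement: restart the dwell timer at the new position.
            self.states[track_id] = _DwellState(anchor=centroid, since=t)
            return False
        if not state.alerted and t - state.since > self.threshold:
            state.alerted = True  # hold until movement resets the state
            return True
        return False
```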

9.3.3 Module 3: Running Detection

Identifies abnormally fast movement using pose keypoints and optical flow.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| speed_threshold_pixels_per_second | 150 | 50-500 | Pixel speed threshold |
| speed_threshold_kmh | 15.0 | 5-40 | Real-world speed (requires calibration) |
| confirmation_frames | 3 | 1-10 | Consecutive frames to confirm running |

Algorithm:

For each active track:
    Compute torso keypoint displacement between frames
    Convert pixel speed to km/h (if calibration available)
    Apply Farneback optical flow for refinement
    If speed > threshold for confirmation_frames:
        Trigger RUNNING alert
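A sketch of the speed test and the consecutive-frame confirmation. It assumes the caller supplies one torso point per track per frame (for example the mid-point of the shoulder and hip keypoints) with a timestamp in seconds; the Farneback refinement step (OpenCV's `cv2.calcOpticalFlowFarneback`) is left out to keep the example dependency-free, and the `pixels_per_meter` calibration factor is a hypothetical input.

```python
import math


def pixels_per_second_to_kmh(speed_pps, pixels_per_meter):
    """Convert image-space speed to km/h given a calibration factor."""
    return speed_pps / pixels_per_meter * 3.6


class RunningDetector:
    def __init__(self, speed_threshold_pixels_per_second=150.0, confirmation_frames=3):
        self.threshold = speed_threshold_pixels_per_second
        self.confirmation_frames = confirmation_frames
        self.last = {}    # track_id -> (point, timestamp)
        self.streak = {}  # track_id -> consecutive frames above threshold

    def update(self, track_id, torso_point, t):
        """Returns True on the frame where running is first confirmed for this streak."""
        previous = self.last.get(track_id)
        self.last[track_id] = (torso_point, t)
        if previous is None or t <= previous[1]:
            return False
        speed = math.dist(previous[0], torso_point) / (t - previous[1])
        if speed > self.threshold:
            self.streak[track_id] = self.streak.get(track_id, 0) + 1
        else:
            self.streak[track_id] = 0
        return self.streak[track_id] == self.confirmation_frames
```

At 10 fps, 20 px of torso displacement per frame is 200 px/s, above the 150 px/s default, so the third consecutive fast frame confirms running; with a calibration of 40 px per meter that corresponds to 18 km/h.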

9.3.4 Module 4: Crowding Detection

Alerts when person group density exceeds threshold.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| count_threshold | 5 | 2-50 | Minimum person count in cluster |
| area_threshold | 0.15 | 0.05-0.5 | Fraction of frame covered by group |
| density_threshold | 0.05 | 0.01-0.2 | Persons per square meter (calibrated) |
| dbscan_eps | 0.08 | 0.01-0.3 | DBSCAN neighborhood radius (normalized) |

Algorithm:

Collect all person centroids in the current frame (normalized coordinates)
Run DBSCAN(eps=dbscan_eps, min_samples=2) on the centroids
For each cluster:
    If cluster_size >= count_threshold OR cluster_area >= area_threshold:
        Trigger CROWDING alert
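For illustration, the clustering step is written out below as a plain DBSCAN over normalized centroids (in practice `sklearn.cluster.DBSCAN(eps=..., min_samples=2)` would be used). Here `cluster_area` is assumed to be the area of the box enclosing all member bounding boxes, as a fraction of the frame; the document does not pin that definition down, and the calibrated `density_threshold` check is omitted.

```python
import math


def dbscan(points, eps, min_samples):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:  # neighbourhood includes the point itself
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


def crowding_alerts(boxes, count_threshold=5, area_threshold=0.15, dbscan_eps=0.08):
    """boxes: normalized (x1, y1, x2, y2) person boxes for one frame."""
    centroids = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    labels = dbscan(centroids, dbscan_eps, min_samples=2)
    alerts = []
    for cluster in sorted(set(labels) - {-1}):
        members = [b for b, label in zip(boxes, labels) if label == cluster]
        width = max(b[2] for b in members) - min(b[0] for b in members)
        height = max(b[3] for b in members) - min(b[1] for b in members)
        if len(members) >= count_threshold or width * height >= area_threshold:
            alerts.append({"cluster": cluster, "size": len(members), "area": width * height})
    return alerts
```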

9.3.5 Module 5: Fall Detection

Detects persons falling or collapsing using pose keypoint analysis.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| fall_score_threshold | 0.75 | 0.5-0.95 | Combined fall confidence score |
| min_keypoint_confidence | 0.30 | 0.1-0.5 | Minimum keypoint detection confidence |
| torso_angle_threshold_deg | 45 | 30-75 | Torso angle from vertical to trigger |
| aspect_ratio_threshold | 1.2 | 0.8-2.0 | Width/height ratio of person bbox |
| temporal_confirmation_ms | 1000 | 500-3000 | Duration to confirm fall (not just bend) |

Algorithm:

For each detected person with pose keypoints:
    Compute torso angle from vertical (using shoulder-hip line)
    Compute bbox aspect ratio
    Check if person is on ground (feet keypoint confidence drops)
    Calculate fall_score = weighted_combination(angle, aspect_ratio, ground_contact)
    If fall_score > threshold AND duration > confirmation_ms:
        Trigger FALL alert (CRITICAL severity)
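A sketch of the fall score and its temporal confirmation. The pseudocode only says `weighted_combination`, so the 0.5/0.3/0.2 weights below are hypothetical placeholders to be tuned; keypoint names follow the COCO-17 convention used by YOLOv8-pose (`left_shoulder`, `right_hip`, and so on), image y grows downward, and `ground_contact` is assumed to be a caller-supplied value in [0, 1] derived from the feet-keypoint confidence drop.

```python
import math
from statistics import mean

FALL_WEIGHTS = {"angle": 0.5, "aspect": 0.3, "ground": 0.2}  # hypothetical weights
TORSO_KEYPOINTS = ("left_shoulder", "right_shoulder", "left_hip", "right_hip")


def fall_score(keypoints, bbox, ground_contact, aspect_ratio_threshold=1.2,
               min_keypoint_confidence=0.30):
    """keypoints: name -> (x, y, conf). Returns (score, torso_angle_deg) or None."""
    if any(keypoints[name][2] < min_keypoint_confidence for name in TORSO_KEYPOINTS):
        return None  # torso keypoints too uncertain to judge
    sx = mean(keypoints[n][0] for n in TORSO_KEYPOINTS[:2])
    sy = mean(keypoints[n][1] for n in TORSO_KEYPOINTS[:2])
    hx = mean(keypoints[n][0] for n in TORSO_KEYPOINTS[2:])
    hy = mean(keypoints[n][1] for n in TORSO_KEYPOINTS[2:])
    angle = math.degrees(math.atan2(abs(sx - hx), abs(sy - hy)))  # 0 = upright, 90 = horizontal
    width, height = bbox[2] - bbox[0], bbox[3] - bbox[1]
    aspect = width / height if height > 0 else float("inf")
    score = (FALL_WEIGHTS["angle"] * min(angle / 90.0, 1.0)
             + FALL_WEIGHTS["aspect"] * min(aspect / aspect_ratio_threshold, 1.0)
             + FALL_WEIGHTS["ground"] * ground_contact)
    return score, angle


class FallConfirmer:
    def __init__(self, fall_score_threshold=0.75, torso_angle_threshold_deg=45.0,
                 temporal_confirmation_ms=1000):
        self.score_thr = fall_score_threshold
        self.angle_thr = torso_angle_threshold_deg
        self.confirm_ms = temporal_confirmation_ms
        self.since = {}
        self.alerted = set()

    def update(self, track_id, result, t_ms):
        """result: output of fall_score(); returns True once per confirmed fall."""
        if result and result[0] > self.score_thr and result[1] >= self.angle_thr:
            start = self.since.setdefault(track_id, t_ms)
            if t_ms - start >= self.confirm_ms and track_id not in self.alerted:
                self.alerted.add(track_id)
                return True
        else:
            self.since.pop(track_id, None)  # person recovered or only bent briefly: reset
            self.alerted.discard(track_id)
        return False
```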

9.3.6 Module 6: Abandoned Object Detection

Identifies unattended objects using background subtraction and object detection.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| unattended_time_threshold_seconds | 60 | 10-600 | Time before object is considered abandoned |
| proximity_threshold_pixels | 100 | 20-300 | Max distance from owner before "unattended" |
| watchlist_classes | ["backpack", "suitcase", "box", "bag"] | - | Object classes to monitor |
| bg_learning_rate | 0.005 | 0.001-0.01 | MOG2 background model learning rate |

Algorithm:

Run YOLOv8s to detect objects in watchlist_classes
Run MOG2 background subtraction to identify static foreground
For each detected object:
    Track owner proximity (nearest person)
    If owner distance > threshold AND object stationary > time_threshold:
        Trigger ABANDONED_OBJECT alert
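The ownership and timing part of the algorithm can be sketched without the vision models. The sketch assumes upstream stages already supply, per frame, the positions of watchlist objects that MOG2 reports as static foreground, keyed by a stable object ID, and it treats the nearest person as the owner proxy; both are simplifications. Separately, the stock COCO-trained YOLOv8 checkpoints include `backpack`, `handbag` and `suitcase` but no generic `box` or `bag` class, so those two watchlist entries presumably rely on a custom-trained or remapped model.

```python
import math


class AbandonedObjectMonitor:
    def __init__(self, unattended_time_threshold_seconds=60.0, proximity_threshold_pixels=100.0):
        self.time_threshold = unattended_time_threshold_seconds
        self.proximity = proximity_threshold_pixels
        self.unattended_since = {}  # object_id -> time it was last left alone
        self.alerted = set()

    def update(self, static_objects, person_centroids, t):
        """static_objects: object_id -> (x, y); returns IDs that just became abandoned."""
        for gone in set(self.unattended_since) - set(static_objects):
            del self.unattended_since[gone]   # object picked up or no longer static
            self.alerted.discard(gone)
        alerts = []
        for object_id, position in static_objects.items():
            nearest = min((math.dist(position, p) for p in person_centroids), default=math.inf)
            if nearest <= self.proximity:
                self.unattended_since.pop(object_id, None)  # owner proxy is close by
                self.alerted.discard(object_id)
                continue
            since = self.unattended_since.setdefault(object_id, t)
            if t - since >= self.time_threshold and object_id not in self.alerted:
                self.alerted.add(object_id)
                alerts.append(object_id)
        return alerts
```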

9.3.7 Module 7: After-Hours Presence

Simple by design: any person detected during night hours and confirmed over min_detection_frames frames triggers an alert.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| detection_confidence_threshold | 0.50 | 0.3-0.9 | Minimum person detection confidence |
| min_detection_frames | 5 | 1-30 | Frames to confirm (avoid false positives) |
| check_authorized_personnel | false | true/false | If true, check against known persons whitelist |

9.3.8 Module 8: Zone Breach

Detects crossing of virtual boundary lines (directional or bidirectional).

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| boundary_lines | [] | (user-defined) | Array of {start, end, direction, severity} |
| allowed_direction | "both" | both/a_to_b/b_to_a | Which direction is allowed |
| crossing_threshold_pixels | 20 | 5-100 | Min distance past line to trigger |
| cooldown_seconds | 30 | 0-3600 | Cooldown per (track, line) pair |

Algorithm:

For each active track:
    For each boundary line:
        Check if track centroid crosses line in forbidden direction
        Using line equation: ax + by + c = 0, check sign change
        If crossed AND distance_past_line > threshold:
            Trigger ZONE_BREACH alert
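A sketch of the sign-change test, with two refinements the pseudocode implies but does not spell out: a crossing only counts once the centroid is at least `crossing_threshold_pixels` past the line (which also gives hysteresis against jitter), and points that project outside the segment are ignored. Which side is called A and which B follows the sign of the 2-D cross product, and the `alert_directions` argument (the set of directions that raise an alert) is a hypothetical stand-in for the `allowed_direction` setting.

```python
import math


class LineCrossingDetector:
    """Directional crossing test for one boundary line from a to b (image coordinates)."""

    def __init__(self, a, b, alert_directions=("a_to_b", "b_to_a"),
                 crossing_threshold_pixels=20.0, cooldown_seconds=30.0):
        self.a, self.b = a, b
        self.alert_directions = set(alert_directions)
        self.threshold = crossing_threshold_pixels
        self.cooldown = cooldown_seconds
        self.last_side = {}       # track_id -> -1 (side A) or +1 (side B)
        self.cooldown_until = {}  # one detector per line, so keys act as (track, line) pairs

    def signed_distance(self, p):
        """Distance of p from the line; the sign of the cross product gives the side."""
        (ax, ay), (bx, by) = self.a, self.b
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        return cross / math.dist(self.a, self.b)

    def within_segment(self, p):
        """True if p projects onto the segment itself, not its infinite extension."""
        (ax, ay), (bx, by) = self.a, self.b
        u = ((p[0] - ax) * (bx - ax) + (p[1] - ay) * (by - ay)) / ((bx - ax) ** 2 + (by - ay) ** 2)
        return 0.0 <= u <= 1.0

    def update(self, track_id, centroid, t):
        """Return 'a_to_b' or 'b_to_a' when an alerting crossing is confirmed, else None."""
        d = self.signed_distance(centroid)
        if abs(d) < self.threshold or not self.within_segment(centroid):
            return None  # too close to the line, or beside the segment
        side = 1 if d > 0 else -1
        previous = self.last_side.get(track_id)
        self.last_side[track_id] = side
        if previous is None or previous == side:
            return None
        direction = "a_to_b" if previous < 0 else "b_to_a"
        if direction not in self.alert_directions:
            return None
        if t < self.cooldown_until.get(track_id, -math.inf):
            return None
        self.cooldown_until[track_id] = t + self.cooldown
        return direction
```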

9.3.9 Module 9: Repeated Re-entry Patterns

Detects suspicious patterns of entering and exiting an area multiple times.

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| reentry_zone | Full frame | Polygon | Area to monitor for entries/exits |
| time_window_seconds | 600 | 60-3600 | Time window for counting cycles |
| reentry_threshold | 3 | 2-10 | Min entry/exit cycles to trigger |
| min_cycle_duration_seconds | 30 | 5-300 | Min duration of one cycle |

State Machine:

For each track:
    Track state: OUTSIDE -> ENTERING -> INSIDE -> EXITING -> OUTSIDE
    Each complete cycle (entry + exit) increments counter
    If cycle_count >= threshold within time_window:
        Trigger REENTRY_PATTERN alert
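The state machine can be condensed into entry and exit edges per track, as in the sketch below. It collapses the transitional ENTERING and EXITING states into the moment `is_inside` flips, ignores cycles shorter than `min_cycle_duration_seconds`, and clears the history after an alert so the same pattern is not reported repeatedly; those choices are assumptions. The `is_inside` flag would come from a point-in-polygon test against `reentry_zone`.

```python
from collections import deque


class ReentryDetector:
    def __init__(self, time_window_seconds=600.0, reentry_threshold=3,
                 min_cycle_duration_seconds=30.0):
        self.window = time_window_seconds
        self.threshold = reentry_threshold
        self.min_cycle = min_cycle_duration_seconds
        self.entered_at = {}  # track_id -> entry time while INSIDE
        self.cycles = {}      # track_id -> deque of completed-cycle end times

    def update(self, track_id, is_inside, t):
        """Returns True when the track completes its Nth cycle within the window."""
        entered = self.entered_at.get(track_id)
        if is_inside and entered is None:
            self.entered_at[track_id] = t          # OUTSIDE -> INSIDE
        elif not is_inside and entered is not None:
            del self.entered_at[track_id]          # INSIDE -> OUTSIDE: one full cycle
            if t - entered >= self.min_cycle:
                history = self.cycles.setdefault(track_id, deque())
                history.append(t)
                while history and t - history[0] > self.window:
                    history.popleft()              # drop cycles outside the time window
                if len(history) >= self.threshold:
                    history.clear()
                    return True
        return False
```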

9.3.10 Module 10: Suspicious Dwell Time

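Alerts on extended presence near sensitive areas. Unlike general loitering, each sensitive zone carries its own dwell threshold, and brief disappearances (up to max_gap_seconds) do not reset the timer.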

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| sensitive_zones | [] | (user-defined) | Zones with custom dwell thresholds |
| default_dwell_threshold_seconds | 120 | 10-1800 | Default threshold |
| max_gap_seconds | 5.0 | 1.0-30.0 | Max disappearance gap before timer reset |

Predefined zone types with default thresholds:

| Zone Type | Default Threshold | Default Severity |
| --- | --- | --- |
| main_entrance | 60s | MEDIUM |
| emergency_exit | 30s | HIGH |
| equipment_room | 45s | HIGH |
| storage_area | 120s | MEDIUM |
| elevator_bank | 90s | LOW |
| parking_access | 60s | MEDIUM |
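A sketch of the per-zone dwell timer with gap tolerance. It is called whenever a track is observed inside a given sensitive zone; a disappearance longer than `max_gap_seconds` restarts the timer. The dictionary mirrors the predefined zone types above, while the class name and call pattern are illustrative.

```python
ZONE_DWELL_THRESHOLDS = {
    "main_entrance": 60, "emergency_exit": 30, "equipment_room": 45,
    "storage_area": 120, "elevator_bank": 90, "parking_access": 60,
}


class ZoneDwellTimer:
    def __init__(self, zone_type=None, threshold_seconds=None,
                 default_dwell_threshold_seconds=120.0, max_gap_seconds=5.0):
        if threshold_seconds is None:
            threshold_seconds = ZONE_DWELL_THRESHOLDS.get(zone_type, default_dwell_threshold_seconds)
        self.threshold = threshold_seconds
        self.max_gap = max_gap_seconds
        self.first_seen = {}
        self.last_seen = {}
        self.alerted = set()

    def observe(self, track_id, t):
        """Record that track_id is inside the zone at time t; True when the dwell alert fires."""
        last = self.last_seen.get(track_id)
        if last is None or t - last > self.max_gap:
            self.first_seen[track_id] = t  # first sighting or gap too long: restart
            self.alerted.discard(track_id)
        self.last_seen[track_id] = t
        if t - self.first_seen[track_id] >= self.threshold and track_id not in self.alerted:
            self.alerted.add(track_id)
            return True
        return False
```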

9.4 Activity Scoring Engine

9.4.1 Composite Score Formula

All 10 modules feed into a unified scoring engine that produces a single suspicious activity score per camera:

S_total(t) = SUM_i( weight_i * signal_i(t) * decay(t - t_i) ) + bonus_cross_module

Where:
    weight_i: module-specific weight (see table below)
    signal_i(t): signal value from module i, nominally in [0, 1] (ratio-based signals can exceed 1.0; see the Signal Range column below)
    decay(delta_t): exponential time-decay function
    bonus_cross_module: extra score when multiple modules fire simultaneously
    t_i: timestamp of most recent event from module i

9.4.2 Module Weights

| Module | Weight | Signal Source | Signal Range |
| --- | --- | --- | --- |
| Intrusion Detection | 0.25 | overlap_ratio * confidence | 0.0 - 1.0 |
| Loitering Detection | 0.15 | dwell_ratio (dwell_time / threshold) | 0.0 - 1.0+ |
| Running Detection | 0.10 | speed_ratio normalized | 0.0 - 1.0+ |
| Crowding Detection | 0.12 | crowd_density_score | 0.0 - 1.0 |
| Fall Detection | 0.20 | fall_confidence_score | 0.0 - 1.0 |
| Abandoned Object | 0.18 | unattended_ratio (duration / threshold) | 0.0 - 1.0+ |
| After-Hours Presence | 0.05 | binary (1 if detected) * zone_severity_multiplier | 0.0 - 1.0 |
| Zone Breach | 0.12 | severity_mapped (LOW=0.3, MED=0.6, HIGH=1.0) | 0.0 - 1.0 |
| Re-entry Patterns | 0.10 | cycle_ratio (count / threshold) | 0.0 - 1.0+ |
| Suspicious Dwell | 0.13 | dwell_ratio (duration / zone_threshold) | 0.0 - 1.0+ |

Note: The weights sum to 1.40. This is intentional: when several modules fire at once, the composite score can exceed 1.00 and reach the EMERGENCY band defined in section 9.4.5, providing cross-module amplification.
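Combining the composite formula, the weight table and the half-life decay gives the following sketch. The module keys are hypothetical identifiers, each module contributes only its most recent event as the formula states, and the cross-module bonus is passed in as a precomputed value rather than derived here.

```python
import math

MODULE_WEIGHTS = {
    "intrusion": 0.25, "loitering": 0.15, "running": 0.10, "crowding": 0.12,
    "fall": 0.20, "abandoned_object": 0.18, "after_hours": 0.05,
    "zone_breach": 0.12, "reentry": 0.10, "suspicious_dwell": 0.13,
}


def composite_score(latest_events, now, half_life=300.0, cross_module_bonus=0.0):
    """latest_events: module -> (signal, t_i), the most recent event per module."""
    total = 0.0
    for module, (signal, t_i) in latest_events.items():
        decay = math.exp(-math.log(2) * (now - t_i) / half_life)  # halves every half_life seconds
        total += MODULE_WEIGHTS[module] * signal * decay
    return total + cross_module_bonus
```

For example, a full-strength intrusion signal at `now` contributes 0.25 and a full-strength fall signal from five minutes earlier contributes 0.20 x 0.5 = 0.10, for a score of 0.35 before any bonus.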

9.4.3 Time-Decay Function

import math

def time_decay(delta_t_seconds, half_life=300):
    """Exponential decay with a 5-minute half-life by default."""
    return math.exp(-math.log(2) * delta_t_seconds / half_life)

# Decay reference:
#   0 min -> 1.000 (full contribution)
#   1 min -> 0.871
#   5 min -> 0.500
#  10 min -> 0.250
#  20 min -> 0.063
#  30 min -> 0.016 (effectively zero)

9.4.4 Cross-Module Amplification Bonus

When multiple modules detect simultaneously for the same track or in close proximity:

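from collections import Counter

def compute_cross_module_bonus(active_signals, proximity_weight=0.15):
    """active_signals: list of dicts with 'module', 'track_id', 'zone_id' keys (field names assumed)."""
    n_modules = len({s['module'] for s in active_signals})
    if n_modules <= 1:
        return 0.0

    # Base bonus: +0.15 (proximity_weight) per additional firing module
    base_bonus = proximity_weight * (n_modules - 1)

    # Track overlap: same person triggering multiple rules -> higher threat
    track_counts = Counter(s['track_id'] for s in active_signals if s.get('track_id') is not None)
    n_same_track_signals = max(track_counts.values(), default=0)
    track_bonus = 0.10 * (n_same_track_signals - 1) if n_same_track_signals >= 2 else 0.0

    # Zone overlap: multiple signals in same zone -> higher threat
    zone_counts = Counter(s['zone_id'] for s in active_signals if s.get('zone_id') is not None)
    n_same_zone_signals = max(zone_counts.values(), default=0)
    zone_bonus = 0.08 * (n_same_zone_signals - 1) if n_same_zone_signals >= 2 else 0.0

    return min(base_bonus + track_bonus + zone_bonus, 0.50)  # Cap at +0.50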

9.4.5 Escalation Thresholds

| Score Range | Threat Level | Color | Actions |
| --- | --- | --- | --- |
| [0.00, 0.20) | NONE | Gray | Log only, no alert |
| [0.20, 0.40) | LOW | Blue | Log + dashboard indicator |
| [0.40, 0.60) | MEDIUM | Yellow | Log + non-urgent alert dispatch |
| [0.60, 0.80) | HIGH | Orange | Log + immediate alert + highlight |
| [0.80, 1.00] | CRITICAL | Red | Log + all channels + security dispatch recommendation |
| > 1.00 | EMERGENCY | Purple/Flashing | All channels + automatic escalation to security lead |
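A mapping from composite score to threat level, assuming each band includes its lower bound (so 0.20 is LOW, not NONE) and that a score of exactly 1.00 is still CRITICAL, as the "> 1.00" EMERGENCY row implies:

```python
THREAT_BANDS = [  # (inclusive lower bound, level), checked from the top down
    (0.80, "CRITICAL"),
    (0.60, "HIGH"),
    (0.40, "MEDIUM"),
    (0.20, "LOW"),
]


def threat_level(score):
    """Map a composite suspicious-activity score onto the escalation table."""
    if score > 1.00:
        return "EMERGENCY"
    for lower_bound, level in THREAT_BANDS:
        if score >= lower_bound:
            return level
    return "NONE"
```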

9.5 Night Mode Scheduler

9.5.1 Automatic Schedule

| Parameter | Default | Configurable |
| --- | --- | --- |
| Start time | 22:00 (10 PM) | Yes, per camera |
| End time | 06:00 (6 AM) | Yes, per camera |
| Gradual transition | 15 minutes | Yes (0-60 min) |
| Timezone | Local site timezone | Yes |
| Override | Manual toggle available | Admin only |

9.5.2 Gradual Transition

During the 15-minute transition window, sensitivity ramps linearly:

     Transition Start (21:45)                         Night Full (22:00)
              |                                               |
              v                                               v
Sensitivity:  0% ------- 25% ------- 50% ------- 75% ------- 100% ------> holds at 100%
              21:45      21:48:45    21:52:30    21:56:15    22:00
              |_______________________________________________|
                 Ramp up to full night sensitivity over 15 minutes

This prevents sudden spikes in alerts when night mode activates.
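The schedule and ramp can be computed from the wall-clock time alone. This sketch assumes the ramp runs from `start - ramp` up to `start` (21:45 to 22:00 with the defaults), that the night window wraps past midnight, and that sensitivity drops straight to zero at the end time, since no morning ramp is described. Timezone handling is left to the caller, who should pass local site time.

```python
from datetime import datetime, time, timedelta


def night_sensitivity(now: datetime, start: time = time(22, 0), end: time = time(6, 0),
                      ramp: timedelta = timedelta(minutes=15)) -> float:
    """Return the night-mode sensitivity in [0, 1] for a local timestamp."""
    def minutes(t: time) -> float:
        return t.hour * 60 + t.minute + t.second / 60

    day = 24 * 60
    ramp_min = ramp.total_seconds() / 60
    night_length = (minutes(end) - minutes(start)) % day  # 480 min for 22:00-06:00
    # Minutes since the ramp began, wrapped so that times after midnight still count.
    elapsed = (minutes(now.time()) - (minutes(start) - ramp_min)) % day
    if elapsed < ramp_min:
        return elapsed / ramp_min  # linear ramp-up before the start time
    if elapsed < ramp_min + night_length:
        return 1.0                 # full night sensitivity
    return 0.0                     # day mode
```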

9.5.3 Night Mode Behavior Changes

| Aspect | Day Mode | Night Mode |
| --- | --- | --- |
| Detection modules | Intrusion, Crowding, Fall, Abandoned Object | All 10 modules active |
| AI Vibe preset | Per-camera setting | Automatically Strict |
| Confidence threshold | Per-camera setting | +0.10 (stricter) |
| Scoring engine weights | Standard weights | +25% intrusion, +20% fall |
| Alert suppression | 5-minute cooldown | 2-minute cooldown (faster alerts) |
| After-hours detection | Disabled | Enabled (primary night function) |

9.6 Per-Camera Configuration

Each camera has independent configuration for all detection modules:

# Example: Camera 1 - Main Entrance
cam_01:
  enabled: true
  location: "Main Entrance Lobby"
  night_mode:
    enabled: true
    custom_schedule: null        # Use system default (22:00-06:00)
    sensitivity_multiplier: 1.0   # Standard sensitivity

  intrusion_detection:
    enabled: true
    confidence_threshold: 0.65
    overlap_threshold: 0.30
    cooldown_seconds: 30
    restricted_zones:
      - zone_id: "server_room_door"
        polygon: [[0.65,0.20], [0.85,0.20], [0.85,0.60], [0.65,0.60]]
        severity: "HIGH"

  loitering_detection:
    enabled: true
    dwell_time_threshold_seconds: 300
    movement_tolerance_pixels: 50

  running_detection:
    enabled: true
    speed_threshold_pixels_per_second: 150
    confirmation_frames: 3

  fall_detection:
    enabled: true
    fall_score_threshold: 0.75
    temporal_confirmation_ms: 1000

  # ... (all 10 modules configured)
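Resolving a camera's effective settings could look like the sketch below: module defaults from section 9.3 are overlaid with the camera's overrides, and in night mode the confidence threshold is raised by 0.10 as listed in section 9.5.3. The shallow one-level merge, the clamp to the documented 0.9 upper bound, and the `MODULE_DEFAULTS` subset shown are simplifying assumptions; parsing the YAML itself (for example with PyYAML) is omitted.

```python
import copy

MODULE_DEFAULTS = {  # subset of the defaults listed in section 9.3
    "intrusion_detection": {"enabled": True, "confidence_threshold": 0.55,
                            "overlap_threshold": 0.30, "cooldown_seconds": 60},
    "loitering_detection": {"enabled": True, "dwell_time_threshold_seconds": 300,
                            "movement_tolerance_pixels": 50, "cooldown_seconds": 300},
}


def effective_module_config(camera_cfg, module, night=False, night_confidence_delta=0.10):
    """Overlay camera overrides on module defaults and apply the night-mode adjustment."""
    cfg = copy.deepcopy(MODULE_DEFAULTS.get(module, {}))
    cfg.update(camera_cfg.get(module, {}))  # camera values win over defaults
    if night and "confidence_threshold" in cfg:
        raised = cfg["confidence_threshold"] + night_confidence_delta
        cfg["confidence_threshold"] = round(min(raised, 0.9), 2)
    return cfg
```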

9.7 Alert Generation Logic

9.7.1 Alert Lifecycle

+------------+    +------------+    +------------+    +------------+
|  DETECTED  | -> | SUPPRESSED | -> |  EVIDENCE  | -> | DISPATCHED |
| (Rule fire)|    | (Dedup)    |    | (Capture)  |    | (Send)     |
+------------+    +------------+    +------------+    +------------+
                                                          |
                                                          v
                                                   +------------+
                                                   | ACKNOWLEDGE|
                                                   | or AUTO    |
                                                   +------------+

9.7.2 Suppression Rules

| Condition | Action | Reason |
| --- | --- | --- |
| Duplicate within suppression window | Log + increment counter | Prevent alert spam |
| Detection confidence < rule minimum | Log only | Insufficient evidence |
| Threat score < LOW threshold (0.20) | Log only | Below alert threshold |
| Max alerts/hour for camera exceeded | Log + rate-limit flag | Prevent overflow |
| Composite score indicates low overall threat | Log + dashboard only | Reduce noise |

9.7.3 Suppression Configuration

| Parameter | Default | Range |
| --- | --- | --- |
| Default suppression window | 5 minutes | 0-60 minutes |
| Max alerts per hour per camera | 20 | 5-100 |
| Max alerts per hour per rule | 10 | 5-50 |
| Evidence snapshot frames before | 5 frames | 1-30 |
| Evidence snapshot frames after | 10 frames | 1-30 |
| Evidence clip duration | 10 seconds | 5-60 |
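A sketch of the deduplication window and the two hourly rate limits from the suppression configuration. It assumes the per-rule limit applies per camera, that duplicates are identified by a caller-chosen `dedup_key` (for example a zone or track ID), and that rate-limited alerts do not count against the limits; the class and its return values are illustrative. The confidence and threat-score checks from the suppression rules would run before this step.

```python
from collections import defaultdict, deque


class AlertSuppressor:
    def __init__(self, window_seconds=300.0, max_per_camera_hour=20, max_per_rule_hour=10):
        self.window = window_seconds
        self.max_camera = max_per_camera_hour
        self.max_rule = max_per_rule_hour
        self.last_dispatch = {}              # (camera, rule, key) -> last dispatch time
        self.duplicates = defaultdict(int)   # counter incremented for suppressed duplicates
        self.camera_history = defaultdict(deque)
        self.rule_history = defaultdict(deque)

    @staticmethod
    def _prune(history, now):
        while history and now - history[0] >= 3600:
            history.popleft()  # keep only the last hour

    def decide(self, camera, rule, dedup_key, now):
        """Return 'dispatch', 'duplicate' or 'rate_limited' for a candidate alert."""
        key = (camera, rule, dedup_key)
        last = self.last_dispatch.get(key)
        if last is not None and now - last < self.window:
            self.duplicates[key] += 1
            return "duplicate"
        cam_hist = self.camera_history[camera]
        rule_hist = self.rule_history[(camera, rule)]
        self._prune(cam_hist, now)
        self._prune(rule_hist, now)
        if len(cam_hist) >= self.max_camera or len(rule_hist) >= self.max_rule:
            return "rate_limited"
        self.last_dispatch[key] = now
        cam_hist.append(now)
        rule_hist.append(now)
        return "dispatch"
```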

9.7.4 Severity Assignment

Final alert severity considers both the triggering module and the composite score context:

def assign_alert_severity(detection_event, composite_score):
    base_severity = detection_event['severity']  # From module config
    severity_levels = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3, 'CRITICAL': 4}
    base_level = severity_levels.get(base_severity, 2)

    # Escalation: a composite score in the CRITICAL band bumps LOW/MEDIUM up one level
    if composite_score >= 0.80 and base_level < 3:
        base_level = min(base_level + 1, 4)

    # Escalation: multiple concurrent detections for same track
    if detection_event.get('concurrent_detections_count', 0) >= 2:
        base_level = min(base_level + 1, 4)

    # Zone-specific escalation override
    if detection_event.get('zone_severity_override'):
        zone_level = severity_levels.get(detection_event['zone_severity_override'], base_level)
        base_level = max(base_level, zone_level)

    reverse_levels = {v: k for k, v in severity_levels.items()}
    return reverse_levels.get(base_level, 'MEDIUM')

9.8 Integration with Main AI Pipeline

The suspicious activity service consumes detection events from the main AI pipeline:

+-----------------------------------------------------------------------------+
|               SUSPICIOUS ACTIVITY INTEGRATION WITH MAIN PIPELINE             |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Main AI Pipeline Output:                                                    |
|  { person_id, track_id, bbox, keypoints, face_embedding, timestamp,        |
|    camera_id, confidence, face_crop_path }                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Kafka Topic       | -> | Suspicious        | -> | Scoring Engine    |    |
|  | ai.detections     |    | Activity Service  |    | (per camera)      |    |
|  | (JSON events)     |    | - 10 modules      |    | - Composite score |    |
|  +-------------------+    | - Camera configs  |    | - Time decay      |    |
|                           | - Zone polygons   |    | - Cross-module    |    |
|                           +-------------------+    |   bonus           |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|                           +-------------------+    +-------------------+    |
|                           | Alert Manager     | <- | Scoring Output    |    |
|                           | - Deduplicate     |    | - Score [0, 1.5]  |    |
|                           | - Rate limit      |    | - Threat level    |    |
|                           | - Severity assign |    | - Active signals  |    |
|                           +---------+---------+    +-------------------+    |
|                                     |                                        |
|                                     v                                        |
|                           +-------------------+                             |
|                           | Alerts Table (DB) |                             |
|                           | Notification Svc  |                             |
|                           +-------------------+                             |
|                                                                              |
+-----------------------------------------------------------------------------+

Key integration points:

  • Suspicious Activity Service is a Kafka consumer on the ai.detections topic
  • Processes events after face recognition (has access to person identity)
  • Produces alert records to the alerts.critical topic for notification dispatch
  • Updates the composite score in Redis (with TTL = 2 * half_life) for dashboard real-time display
  • Stores all alert records in PostgreSQL for history and analytics
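A minimal sketch of how the Redis TTL relates to the scoring engine's time decay, assuming exponential decay with a configurable half-life (the actual decay function is specified in suspicious_activity.md). `InMemoryScoreStore` is a hypothetical stand-in for a Redis `SET key value EX ttl` call, and the `score:cam1` key and 300-second half-life are illustrative. Under this decay model a score has fallen to 25% of its last value after two half-lives, which is when the key expires.

```python
import math


def decayed_score(score: float, elapsed_s: float, half_life_s: float) -> float:
    """Exponential time decay (assumed model): the score halves every half_life_s seconds."""
    return score * math.pow(0.5, elapsed_s / half_life_s)


def score_ttl_seconds(half_life_s: float) -> int:
    """TTL = 2 * half_life, rounded up to whole seconds as Redis EX expects."""
    return math.ceil(2 * half_life_s)


class InMemoryScoreStore:
    """Hypothetical stand-in for Redis SET <key> <value> EX <ttl>."""

    def __init__(self):
        self._data = {}

    def set_with_ttl(self, key: str, value: float, ttl_s: int, now: float) -> None:
        self._data[key] = (value, now + ttl_s)

    def get(self, key: str, now: float):
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            return None  # missing or expired, like a Redis key past its TTL
        return entry[0]


half_life = 300.0  # seconds; illustrative value, not taken from the blueprint
store = InMemoryScoreStore()
store.set_with_ttl("score:cam1", 0.92, score_ttl_seconds(half_life), now=0.0)
```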

Reference: For complete detection algorithm pseudocode, zone configuration YAML schema, scoring engine implementation, and evidence capture logic, see suspicious_activity.md — Sections 2-6.


Section 10: Live Video Streaming Design

10.1 RTSP Stream Configuration for CP PLUS DVR

10.1.1 URL Format

The CP PLUS ORANGE DVR uses a Dahua-compatible RTSP URL scheme:

rtsp://admin:{password}@{dvr_ip}:554/cam/realmonitor?channel={N}&subtype={M}

Where:
    N = channel number (1-8)
    M = stream type (0 = main stream, 1 = sub stream)

Example URLs for all 8 channels:

Channel Main Stream Sub Stream
CH1 rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0 rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=1
CH2 ...channel=2&subtype=0 ...channel=2&subtype=1
CH3 ...channel=3&subtype=0 ...channel=3&subtype=1
CH4 ...channel=4&subtype=0 ...channel=4&subtype=1
CH5 ...channel=5&subtype=0 ...channel=5&subtype=1
CH6 ...channel=6&subtype=0 ...channel=6&subtype=1
CH7 ...channel=7&subtype=0 ...channel=7&subtype=1
CH8 ...channel=8&subtype=0 ...channel=8&subtype=1
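A small helper can generate these URLs instead of hand-editing them. This sketch percent-encodes the credentials so that a password containing `@`, `:` or `#` does not break URL parsing; it assumes the RTSP client (for example FFmpeg) decodes percent-encoded credentials, which should be verified against this DVR. The function name and the `max_channel` parameter are illustrative.

```python
from urllib.parse import quote


def build_rtsp_url(dvr_ip: str, password: str, channel: int, subtype: int,
                   user: str = "admin", port: int = 554, max_channel: int = 8) -> str:
    """Build a Dahua-style RTSP URL for one DVR channel.

    channel: 1..max_channel; subtype: 0 = main stream, 1 = sub stream.
    """
    if not 1 <= channel <= max_channel:
        raise ValueError(f"channel must be between 1 and {max_channel}")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # safe='' escapes every reserved character in the credentials, including '/'
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{creds}@{dvr_ip}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


urls = {ch: (build_rtsp_url("192.168.29.200", "p@ss:word", ch, 0),
             build_rtsp_url("192.168.29.200", "p@ss:word", ch, 1))
        for ch in range(1, 9)}
print(urls[1][0])
# rtsp://admin:p%40ss%3Aword@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0
```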

10.1.2 Stream Properties

Property Main Stream (subtype=0) Sub Stream (subtype=1)
Resolution 960 x 1080 352 x 288 to 704 x 576
Frame rate 25 FPS (PAL) 25 FPS
Video codec H.264 High Profile H.264 Baseline/Main
Bitrate ~4 Mbps per channel ~1 Mbps per channel
Audio G.711/AAC (optional) None
Use case Fullscreen viewing, evidence clips AI inference, multi-camera grid

10.1.3 Stream Discovery

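The edge gateway can enumerate the DVR's media profiles and their RTSP URIs via ONVIF (ONVIF must be enabled on the DVR):

from onvif import ONVIFCamera  # provided by the onvif-zeep package

camera = ONVIFCamera('192.168.29.200', 80, 'admin', 'password')
media_service = camera.create_media_service()
profiles = media_service.GetProfiles()

for profile in profiles:
    # ONVIF StreamSetup: Stream is 'RTP-Unicast' or 'RTP-Multicast';
    # Transport is a structure whose Protocol field selects RTSP.
    stream_uri = media_service.GetStreamUri({
        'StreamSetup': {'Stream': 'RTP-Unicast', 'Transport': {'Protocol': 'RTSP'}},
        'ProfileToken': profile.token
    })
    # Profile tokens are vendor-defined; map them to channels via the returned URI
    print(f"Profile: {profile.token}, URI: {stream_uri.Uri}")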

10.2 Edge Gateway Stream Handling

10.2.1 FFmpeg Ingestion Pipeline

The edge gateway runs one FFmpeg process per camera stream:

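# Main stream: HLS generation for live viewing
# -timeout: RTSP socket I/O timeout in microseconds (named -stimeout before FFmpeg 5.0)
# -c:a aac: MPEG-TS segments cannot carry G.711 audio; use -an instead if DVR audio is disabled
ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -timeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 -buffer_size 655360 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c:v copy -c:a aac \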
    -f hls -hls_time 2 -hls_list_size 5 -hls_delete_threshold 2 \
    -hls_flags delete_segments+omit_endlist+program_date_time \
    -hls_segment_filename "/data/hls/ch1_%04d.ts" \
    "/data/hls/ch1.m3u8" \
    2>> /var/log/ffmpeg_ch1.log
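The per-channel process can be launched from a supervisor rather than a shell script. A sketch, assuming a Python supervisor on the edge gateway; `hls_ffmpeg_args` and `start_channel` are hypothetical names. It uses `-timeout` (the RTSP socket timeout in microseconds, which replaced `-stimeout` in FFmpeg 5.0) and transcodes audio to AAC, since MPEG-TS segments cannot carry G.711 audio. Passing an argument list avoids shell-quoting problems with the `&` in the RTSP URL.

```python
import subprocess


def hls_ffmpeg_args(channel: int, rtsp_url: str, hls_dir: str = "/data/hls") -> list:
    """FFmpeg argument list for one channel's live HLS output."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "warning",
        "-rtsp_transport", "tcp", "-timeout", "5000000",  # 5 s socket timeout (µs)
        "-fflags", "+genpts+discardcorrupt+igndts+ignidx",
        "-reorder_queue_size", "64", "-buffer_size", "655360",
        "-i", rtsp_url,
        "-c:v", "copy", "-c:a", "aac",                     # video untouched, audio to AAC
        "-f", "hls", "-hls_time", "2", "-hls_list_size", "5",
        "-hls_delete_threshold", "2",
        "-hls_flags", "delete_segments+omit_endlist+program_date_time",
        "-hls_segment_filename", f"{hls_dir}/ch{channel}_%04d.ts",
        f"{hls_dir}/ch{channel}.m3u8",
    ]


def start_channel(channel: int, rtsp_url: str, log_dir: str = "/var/log") -> subprocess.Popen:
    """Launch one FFmpeg process, appending its stderr to a per-channel log file."""
    with open(f"{log_dir}/ffmpeg_ch{channel}.log", "ab") as log:
        # The child gets its own copy of the file descriptor, so closing the
        # parent's handle when this block exits does not affect logging.
        return subprocess.Popen(hls_ffmpeg_args(channel, rtsp_url),
                                stdout=subprocess.DEVNULL, stderr=log)
```

Keeping the returned `Popen` handle lets the 5-second liveness check in 10.2.2 call `poll()` and restart the process when it reports an exit code.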

10.2.2 Stream Health Monitoring

Check Frequency Failure Action
FFmpeg process alive Every 5s Restart process
RTSP connection health Every 10s Reconnect with backoff
Frame rate validation Every 30s Alert if FPS < 20
Bitrate validation Every 30s Alert if bitrate < 50% expected
Disk space check Every 60s Alert if < 10% free, emergency if < 5%
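The frame-rate and bitrate rows of the table reduce to two threshold comparisons. A sketch, assuming FPS and bitrate are already measured over the 30-second window (for example from FFmpeg progress output); `ChannelStats` and `stream_alerts` are hypothetical names, and the expected bitrate would come from the per-stream figures in 10.1.2.

```python
from dataclasses import dataclass


@dataclass
class ChannelStats:
    fps: float                    # frames/s averaged over the 30 s window
    bitrate_kbps: float           # measured bitrate over the same window
    expected_bitrate_kbps: float  # nominal bitrate for this stream (see 10.1.2)


def stream_alerts(stats: ChannelStats, min_fps: float = 20.0) -> list:
    """Return alert codes for the frame-rate and bitrate checks."""
    alerts = []
    if stats.fps < min_fps:
        alerts.append(f"low_fps:{stats.fps:.1f}")
    if stats.bitrate_kbps < 0.5 * stats.expected_bitrate_kbps:
        alerts.append(f"low_bitrate:{stats.bitrate_kbps:.0f}kbps")
    return alerts
```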

10.2.3 Auto-Reconnect Logic

import random


class StreamReconnectManager:
    """Handles RTSP stream reconnection with exponential backoff and jitter."""

    INITIAL_BACKOFF = 1.0          # seconds
    MAX_BACKOFF = 60.0             # seconds
    BACKOFF_MULTIPLIER = 2.0
    JITTER = 0.1                   # +/-10% random jitter
    CIRCUIT_BREAK_THRESHOLD = 5    # consecutive failures before the circuit opens

    def __init__(self):
        self.consecutive_failures = 0

    def on_disconnect(self):
        """Record a failure and return the delay (seconds) before the next attempt."""
        self.consecutive_failures += 1
        # 1 s, 2 s, 4 s, ... capped at MAX_BACKOFF
        wait_time = min(
            self.INITIAL_BACKOFF * (self.BACKOFF_MULTIPLIER ** (self.consecutive_failures - 1)),
            self.MAX_BACKOFF
        )
        # Jitter keeps the 8 channel workers from reconnecting in lockstep (thundering herd)
        wait_time *= 1 + random.uniform(-self.JITTER, self.JITTER)
        return wait_time

    def on_success(self):
        """Reset the backoff sequence once the stream is delivering frames again."""
        self.consecutive_failures = 0

    def should_circuit_break(self):
        return self.consecutive_failures >= self.CIRCUIT_BREAK_THRESHOLD

10.3 HLS Generation for Dashboard

10.3.1 HLS Segment Configuration

Parameter Value Rationale
Segment duration (-hls_time) 2 seconds Balance between latency and segment count
Playlist size (-hls_list_size) 5 segments 10-second sliding window for live playback
Delete threshold 2 segments beyond playlist size Disk cleanup
Flags delete_segments+omit_endlist+program_date_time Live mode, no end list, accurate timing
Segment naming ch{N}_%04d.ts Sequential numbering for cache busting
Segment path /data/hls/ Fast NVMe storage
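These settings also bound how much space `/data/hls/` needs. A back-of-the-envelope sketch, assuming the ~4 Mbps main-stream bitrate from 10.1.2 and one extra segment per channel that is still being written: each 2-second segment is about 1 MB, each channel keeps about 5 + 2 + 1 = 8 segments on disk, and eight channels therefore use roughly 64 MB, well inside the 20 GB reserved for this directory in 10.7.3.

```python
def hls_disk_footprint_mb(bitrate_mbps: float, segment_s: float, list_size: int,
                          delete_threshold: int, channels: int) -> float:
    """Approximate steady-state size of the HLS directory in megabytes (1 MB = 1e6 bytes).

    Segments kept per channel = playlist entries (-hls_list_size)
    + unreferenced segments retained (-hls_delete_threshold)
    + 1 segment still being written (assumption).
    """
    segment_mb = bitrate_mbps * segment_s / 8  # megabits -> megabytes
    segments_per_channel = list_size + delete_threshold + 1
    return segment_mb * segments_per_channel * channels


print(hls_disk_footprint_mb(4.0, 2.0, 5, 2, 8))  # 64.0
```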

10.3.2 Multi-Bitrate HLS (Optional)

For adaptive bitrate streaming, three variants are generated per channel:

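# High quality (main stream, copy codec); live playlist with segment cleanup
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v copy -f hls -hls_time 2 \
    -hls_flags delete_segments+omit_endlist \
    -hls_segment_filename "ch1_high_%04d.ts" "ch1_high.m3u8"

# Medium quality (transcoded; bitrate capped to the advertised 1.5 Mbps,
# keyframes forced every 2 s so segments cut on schedule)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v libx264 -preset fast -crf 23 \
    -maxrate 1500k -bufsize 3000k -force_key_frames "expr:gte(t,n_forced*2)" \
    -vf "scale=640:480" -f hls -hls_time 2 -hls_flags delete_segments+omit_endlist \
    -hls_segment_filename "ch1_mid_%04d.ts" "ch1_mid.m3u8"

# Low quality (sub stream)
ffmpeg -i "rtsp://...channel=1&subtype=1" -c:v copy -f hls -hls_time 2 \
    -hls_flags delete_segments+omit_endlist \
    -hls_segment_filename "ch1_low_%04d.ts" "ch1_low.m3u8"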

Master playlist:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=960x1080
ch1_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=640x480
ch1_mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=352x288
ch1_low.m3u8
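The master playlist can be generated from the variant list instead of maintained by hand. A sketch; `Variant` and `master_playlist` are hypothetical names. Per RFC 8216, `BANDWIDTH` is the peak segment bit rate of each variant, so the values should reflect peak rather than average bitrates. With the three variants above, the function reproduces the playlist shown.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    uri: str
    bandwidth_bps: int  # peak bit rate of the variant (HLS BANDWIDTH attribute)
    width: int
    height: int


def master_playlist(variants: list) -> str:
    """Render an HLS master playlist, highest-bandwidth variant first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda item: item.bandwidth_bps, reverse=True):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},RESOLUTION={v.width}x{v.height}")
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


playlist = master_playlist([
    Variant("ch1_low.m3u8", 500_000, 352, 288),
    Variant("ch1_high.m3u8", 4_000_000, 960, 1080),
    Variant("ch1_mid.m3u8", 1_500_000, 640, 480),
])
```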

10.3.3 HLS Latency Budget

Stage Latency
DVR encoding 50-100 ms
RTSP to edge 1-2 ms
FFmpeg demux/remux 20-50 ms
HLS segment duration 2000 ms (2-second segments)
Nginx/CDN delivery 10-50 ms
HLS.js buffer 2000-4000 ms (1-2 segments)
Browser decode + render 20-50 ms
Total (camera to eye) ~4.1 - 6.3 seconds

10.4 WebRTC for Low-Latency Single Camera

For single-camera fullscreen viewing where low latency is critical, WebRTC provides sub-second delivery.

10.4.1 WebRTC Architecture

+------------+    +-------------------+    +-------------------+    +--------+
| Browser    |    | Edge Gateway      |    | FFmpeg            |    | DVR    |
| (WebRTC    |<-->| (WHIP/WHEP        |<-->| (decode RTSP,     |<-->| RTSP   |
|  client)   |    |  bridge)          |    |  encode VP8/H.264)|    | Server |
+------------+    +-------------------+    +-------------------+    +--------+

10.4.2 WebRTC Configuration

Parameter Value
Signaling protocol WHIP (ingress) / WHEP (egress)
Video codec H.264 (hardware) or VP8 (software)
Latency target < 500 ms end-to-end
ICE servers STUN (both peers behind NAT); a TURN relay is required if either NAT is symmetric
Max bitrate 3 Mbps
Resolution 960x1080 (main stream)
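For testing the edge bridge outside a browser, a WHEP session starts with a single HTTP POST carrying the client's SDP offer (`Content-Type: application/sdp`); the server replies `201 Created` with the SDP answer and a `Location` header identifying the session, which the client later deletes to stop playback. A sketch of that request, assuming a hypothetical endpoint path such as `/whep/ch1` on the edge gateway and optional bearer-token authentication; generating the SDP offer itself requires a WebRTC stack and is out of scope here.

```python
import urllib.request
from typing import Optional


def build_whep_request(endpoint_url: str, sdp_offer: str,
                       bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build the WHEP POST that carries an SDP offer to the edge gateway.

    A WHEP server answers 201 Created with the SDP answer in the body and a
    Location header naming the session resource (DELETE it to end playback).
    """
    headers = {"Content-Type": "application/sdp"}
    if bearer_token is not None:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(endpoint_url, data=sdp_offer.encode("utf-8"),
                                  headers=headers, method="POST")


# Hypothetical endpoint; a real offer would come from a WebRTC stack
req = build_whep_request("https://edge.example/whep/ch1", "v=0\r\n", bearer_token="token123")
# urllib.request.urlopen(req) would send it and return the 201 response
```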

10.4.3 WebRTC Latency Budget

Stage Latency
DVR encoding 50-100 ms
RTSP to edge 1-2 ms
FFmpeg decode + WebRTC encode 30-80 ms
Network (edge to browser via VPN) 100-200 ms
Browser decode 20-50 ms
Total ~200-430 ms
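Both budgets can be checked by summing the per-stage ranges from the two tables. HLS totals roughly 4.1-6.3 s, dominated by the 2-second segment and the one-to-two-segment player buffer, while WebRTC totals roughly 0.2-0.43 s:

```python
HLS_BUDGET_MS = {
    "DVR encoding": (50, 100),
    "RTSP to edge": (1, 2),
    "FFmpeg demux/remux": (20, 50),
    "HLS segment duration": (2000, 2000),
    "Nginx/CDN delivery": (10, 50),
    "HLS.js buffer": (2000, 4000),
    "Browser decode + render": (20, 50),
}

WEBRTC_BUDGET_MS = {
    "DVR encoding": (50, 100),
    "RTSP to edge": (1, 2),
    "FFmpeg decode + WebRTC encode": (30, 80),
    "Network (edge to browser via VPN)": (100, 200),
    "Browser decode": (20, 50),
}


def total_ms(budget: dict) -> tuple:
    """Sum best-case and worst-case latency across all stages."""
    return (sum(low for low, _ in budget.values()),
            sum(high for _, high in budget.values()))


print(total_ms(HLS_BUDGET_MS))     # (4101, 6252)
print(total_ms(WEBRTC_BUDGET_MS))  # (201, 432)
```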

10.5 Multi-Camera Grid Layout

10.5.1 Layout Configurations

Layout Cameras Stream Used Per-Camera Resolution Total Bandwidth
1x1 (fullscreen) 1 Main (subtype=0) 960x1080 ~4 Mbps
2x2 grid 4 Sub (subtype=1) 352x288 ~4 Mbps total
3x3 grid 8+1 empty Sub (subtype=1) 352x288 ~8 Mbps total
4x2 grid 8 Sub (subtype=1) 352x288 ~8 Mbps total
Custom User-defined Mixed Mixed Sum of selected

Smart stream selection: The dashboard automatically switches streams based on layout:

  • Fullscreen single camera -> Main stream (high quality)
  • Grid layout -> Sub stream (bandwidth-efficient)
  • Camera clicked for fullscreen -> Dynamically switch to main stream
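The selection rule amounts to one decision per layout. A sketch using the approximate bitrates from 10.1.2 (about 4 Mbps main, 1 Mbps sub); the function names are hypothetical, and custom layouts that mix stream types are not modeled here. The totals match the layout table above.

```python
MAIN_STREAM_MBPS = 4.0  # approximate main-stream (subtype=0) bitrate, 10.1.2
SUB_STREAM_MBPS = 1.0   # approximate sub-stream (subtype=1) bitrate, 10.1.2


def select_subtype(visible_cameras: int) -> int:
    """Fullscreen (a single tile) uses the main stream; any grid uses sub streams."""
    return 0 if visible_cameras == 1 else 1


def layout_bandwidth_mbps(visible_cameras: int) -> float:
    """Approximate browser download for a uniform layout."""
    per_camera = MAIN_STREAM_MBPS if select_subtype(visible_cameras) == 0 else SUB_STREAM_MBPS
    return per_camera * visible_cameras


for n in (1, 4, 8):
    print(n, select_subtype(n), layout_bandwidth_mbps(n))
# 1 0 4.0
# 4 1 4.0
# 8 1 8.0
```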

10.5.2 Grid Rendering

+-----------------------------------------------------------------------------+
|                         DASHBOARD GRID LAYOUTS                               |
+-----------------------------------------------------------------------------+
|                                                                              |
|  1x1 Layout:                         2x2 Layout:                            |
|  +------------------------+          +----------+----------+                 |
|  |                        |          | CH1      | CH2      |                 |
|  |   Camera 1             |          | (sub)    | (sub)    |                 |
|  |   Main stream          |          |          |          |                 |
|  |   960x1080             |          +----------+----------+                 |
|  |   ~4 Mbps              |          | CH3      | CH4      |                 |
|  +------------------------+          | (sub)    | (sub)    |                 |
|                                      |          |          |                 |
|                                      +----------+----------+                 |
|                                                                              |
|  3x3 Layout (8 cameras):                                                     |
|  +----------+----------+----------+                                          |
|  | CH1      | CH2      | CH3      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH4      | CH5      | CH6      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH7      | CH8      | [Empty]  |                                          |
|  | (sub)    | (sub)    |          |                                          |
|  +----------+----------+----------+                                          |
|                                                                              |
|  Bandwidth: ~8 Mbps total for 3x3 layout (8 x ~1 Mbps sub streams)          |
|                                                                              |
+-----------------------------------------------------------------------------+

10.6 Bandwidth Optimization

10.6.1 Total Bandwidth Budget

Traffic Type Direction Bandwidth Notes
8x RTSP ingestion DVR -> Edge (local LAN) ~32 Mbps receive Local LAN only
8x HLS upload to cloud Edge -> Cloud (via VPN) ~8-16 Mbps upload Transcoded and compressed
AI frames to cloud Edge -> Cloud (via VPN) ~2-4 Mbps upload 1 FPS, JPEG compressed
Dashboard HLS playback Cloud -> Browser ~8 Mbps per user Cached at CDN
Control/management Bidirectional < 1 Mbps WebSocket, API calls
Total edge upload ~10-20 Mbps Primary concern for site bandwidth
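Working backwards from the AI-frame row gives the average JPEG size the budget allows. A sketch, assuming all 8 channels send 1 frame per second: 2-4 Mbps spread across 8 frames per second leaves about 31-62.5 KB per JPEG (decimal units), which is the target the JPEG quality tuning in 10.6.2 has to hit. The function name is hypothetical.

```python
def jpeg_budget_kb(upload_mbps: float, channels: int = 8, fps: float = 1.0) -> float:
    """Average JPEG size (kilobytes, 1 KB = 1000 bytes) that fits an AI-frame upload budget."""
    bits_per_frame = upload_mbps * 1_000_000 / (channels * fps)
    return bits_per_frame / 8 / 1000


print(jpeg_budget_kb(2.0))  # 31.25
print(jpeg_budget_kb(4.0))  # 62.5
```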

10.6.2 Optimization Techniques

Technique Savings Implementation
Sub-stream for grid view 75% bandwidth reduction Use subtype=1 (352x288) instead of subtype=0 (960x1080)
H.264 copy (no re-encode) for main stream Near-zero CPU overhead (no decode or encode) -c:v copy when no format change needed
JPEG quality tuning for AI frames 50-70% size reduction Quality 70-85 depending on scene complexity
Frame deduplication for AI 10-30% frame reduction Skip frames with < 2% pixel change
HLS segment caching at edge Reduces cloud upload spikes 5-segment buffer smooths burstiness
Compression for API/WebSocket 60-80% reduction Content-Encoding: gzip (HTTP API); permessage-deflate extension (WebSocket)
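The frame-deduplication row can be sketched as a per-pixel difference test. This version works on raw 8-bit grayscale frames passed as `bytes`; the 25-level per-pixel threshold that counts a pixel as changed is an assumption, not a blueprint value, and the function names are hypothetical. Pure Python is far too slow for full 960 x 1080 frames, so a production version would run the same comparison with NumPy or OpenCV on downscaled frames.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes, pixel_threshold: int = 25) -> float:
    """Fraction of 8-bit grayscale pixels whose value moved by more than pixel_threshold."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_threshold)
    return changed / len(curr)


def should_send_to_ai(prev: Optional[bytes], curr: bytes, min_change: float = 0.02) -> bool:
    """Skip frames with less than 2% changed pixels; always send the first frame."""
    return prev is None or changed_fraction(prev, curr) >= min_change
```

Comparing against the last frame that was actually sent, rather than the immediately preceding capture, keeps slow gradual changes from being skipped indefinitely.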

10.7 Fallback Handling

10.7.1 Stream Failure Fallback Chain

Step 1: RTSP connection fails
    +-> Retry with exponential backoff (3 attempts)
    +-> Try UDP transport if TCP fails
    +-> Circuit breaker opens after 5 consecutive failures
    |
Step 2: Stream stall detected (no frames for 10s)
    +-> Kill FFmpeg process
    +-> Restart with fresh connection
    |
Step 3: Camera marked OFFLINE
    +-> Dashboard shows "Camera Offline" placeholder
    +-> HLS endpoint serves the offline placeholder playlist (see 10.7.2)
    +-> Last known frame displayed with timestamp overlay
    +-> Alert sent to operations team
    |
Step 4: Camera recovers
    +-> Circuit breaker transitions to HALF_OPEN
    +-> Test stream pulled for 10 seconds
    +-> On success: circuit CLOSED, stream resumes
    +-> Dashboard auto-refreshes
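The CLOSED, OPEN and HALF_OPEN transitions in the chain above can be captured in a small state machine. A sketch: the 5-failure threshold and 10-second test pull come from the text, while the 60-second cool-down before HALF_OPEN is an assumption (chosen to match `MAX_BACKOFF` in 10.2.3), and the class and method names are hypothetical. The injectable `clock` makes the transitions testable without real waiting.

```python
import time


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a cool-down;
    HALF_OPEN -> CLOSED once a test stream stays healthy for probe_s seconds."""

    def __init__(self, failure_threshold=5, cooldown_s=60.0, probe_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s    # assumed cool-down before a test pull
        self.probe_s = probe_s          # 10 s test stream from Step 4
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.probe_started = None

    def record_failure(self):
        self.failures += 1
        # A failure during the test pull re-opens the circuit immediately
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
            self.probe_started = None

    def allow_attempt(self) -> bool:
        """Return True if a connection attempt (or test pull) may be made now."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "HALF_OPEN"
            self.probe_started = self.clock()
        return self.state != "OPEN"

    def record_frames_ok(self):
        """Call while frames arrive; closes the circuit after probe_s healthy seconds."""
        if self.state == "HALF_OPEN" and self.clock() - self.probe_started >= self.probe_s:
            self.state = "CLOSED"
            self.failures = 0
        elif self.state == "CLOSED":
            self.failures = 0
```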

10.7.2 Offline Placeholder

When a camera is offline, the HLS endpoint returns a static playlist:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts

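The #EXT-X-ERROR tag is a custom extension, not part of the HLS specification (RFC 8216), and standard players such as HLS.js ignore unrecognized tags. The dashboard therefore fetches the playlist text itself, detects the tag, and displays a camera offline indicator with the last known timestamp.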
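Because `#EXT-X-ERROR` is a custom tag, detecting it means inspecting the playlist text directly. A sketch of that check in Python (the dashboard front end would implement the same logic in its own code); `offline_message` is a hypothetical name, and it accepts the tag with or without a space after the colon.

```python
from typing import Optional

OFFLINE_TAG = "#EXT-X-ERROR"  # custom tag used by the offline placeholder (not in RFC 8216)


def offline_message(playlist_text: str) -> Optional[str]:
    """Return the offline message if the playlist carries the custom tag, else None."""
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line.startswith(OFFLINE_TAG + ":"):
            # Accept both '#EXT-X-ERROR:"msg"' and '#EXT-X-ERROR: "msg"'
            return line[len(OFFLINE_TAG) + 1:].strip().strip('"')
    return None


placeholder = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts
"""
print(offline_message(placeholder))  # Camera OFFLINE - Channel 1
```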

10.7.3 Edge Buffer Management

The 2TB NVMe edge storage is partitioned for circular buffer operation:

Directory Max Size Retention Cleanup
/data/hls/ 20 GB Rolling (5 segments) Automatic via FFmpeg
/data/buffer/ch1-ch8/ 1.5 TB 7 days circular Age-based FIFO
/data/buffer/ai_frames/ 100 GB 24 hours Age-based
/data/buffer/evidence/ 200 GB 30 days Event-linked retention
/data/logs/ 10 GB 30 days Logrotate
/data/tmp/ 50 GB On process exit Cleanup on restart
Total reserved ~1.88 TB Fits in 2TB NVMe
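A quick capacity check relates the 1.5 TB channel buffer to its 7-day retention target. The sketch assumes constant bitrates and decimal units (1 TB = 10^12 bytes); the function names are hypothetical. Which stream the circular buffer records is not stated here, so both cases are shown. At the ~4 Mbps main-stream bitrate from 10.1.2, eight channels write about 4 MB/s and fill 1.5 TB in roughly 4.3 days. Seven days fits only if the recorded streams average about 2.5 Mbps per channel or less, as they would when recording sub streams at ~1 Mbps.

```python
SECONDS_PER_DAY = 86_400


def retention_days(capacity_tb: float, per_channel_mbps: float, channels: int = 8) -> float:
    """Days of continuous recording a partition holds at a constant bitrate."""
    bytes_per_second = per_channel_mbps * 1e6 / 8 * channels
    return capacity_tb * 1e12 / bytes_per_second / SECONDS_PER_DAY


def max_mbps_for_retention(capacity_tb: float, days: float, channels: int = 8) -> float:
    """Highest average per-channel bitrate that still fits `days` of retention."""
    return capacity_tb * 1e12 * 8 / (days * SECONDS_PER_DAY) / channels / 1e6


print(round(retention_days(1.5, 4.0), 2))          # 4.34  (main streams)
print(round(retention_days(1.5, 1.0), 2))          # 17.36 (sub streams)
print(round(max_mbps_for_retention(1.5, 7), 2))    # 2.48
```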

Buffer exhaustion handling:

  1. At 80% capacity: Alert admin, begin aggressive cleanup of old non-evidence data
  2. At 90% capacity: Stop non-critical buffering (AI frames), preserve HLS + evidence only
  3. At 95% capacity: Emergency mode — evidence-only recording, all other buffers purged
  4. Never delete evidence clips linked to unresolved alerts
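The escalation steps map directly onto utilization thresholds. A sketch; the mode names are hypothetical labels for the behaviors listed above, and `shutil.disk_usage` reads the filesystem holding `/data`. The 90% and 95% levels line up with the "< 10% free" and "< 5% free" disk checks in 10.2.2. Rule 4 (evidence linked to unresolved alerts is never deleted) applies in every mode and is enforced by the cleanup logic, not by this classifier.

```python
import shutil


def buffer_mode(used_fraction: float) -> str:
    """Map edge-storage utilization (0.0-1.0) to the escalation steps above."""
    if used_fraction >= 0.95:
        return "EMERGENCY_EVIDENCE_ONLY"  # step 3: evidence-only, other buffers purged
    if used_fraction >= 0.90:
        return "CRITICAL_ONLY"            # step 2: stop AI-frame buffering, keep HLS + evidence
    if used_fraction >= 0.80:
        return "AGGRESSIVE_CLEANUP"       # step 1: alert admin, prune old non-evidence data
    return "NORMAL"


def current_buffer_mode(path: str = "/data") -> str:
    """Read utilization of the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)
    return buffer_mode(usage.used / usage.total)
```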

10.7.4 DVR Full Disk Mitigation

Since the DVR disk is full (0 bytes free), the system does not rely on DVR-side recording:

Function Traditional Our Design
Continuous recording DVR internal HDD Edge gateway 2TB NVMe buffer
Event/alert clips DVR playback export Cloud MinIO + S3 archival
Long-term storage DVR disk rotation AWS S3 tiered lifecycle
Playback DVR web UI Cloud dashboard with timeline

Reference: For complete FFmpeg commands including multi-output tee muxer, frame extraction for AI, WebRTC bridge code, and the ring buffer implementation, see video_ingestion.md — Sections 4-7.


End of Part A (Sections 1-10)

This unified technical blueprint synthesizes outputs from 11 specialist agents across 6 domain-specific design documents. For detailed implementation code, DDL, algorithms, and configuration, refer to the individual specialist documents listed in the cross-reference guide at the top of this document.

Document Path Content
Architecture architecture.md Full deployment specs, scaling, cost, failover
Video Ingestion video_ingestion.md RTSP config, FFmpeg, edge gateway, HLS, WebRTC
AI Vision ai_vision.md Model configs, inference pipeline, benchmarks
Database Schema database_schema.md Complete DDL, triggers, views, RLS
Suspicious Activity suspicious_activity.md 10 detection modules, scoring engine
Training System training_system.md Learning pipeline, quality gates, versioning