AI-Powered Industrial Surveillance Platform

Unified Technical Blueprint — Part A: Sections 1-10

Document Property	Value
Version	1.0.0
Classification	Technical Blueprint — Production Design
Target DVR	CP PLUS ORANGE CP-UVR-0801E1-CV2
Channels	8 active (scalable to 64+)
Resolution	960 x 1080 per channel
DVR Network	192.168.29.200/24, RTSP port 554
Date	2025

Cross-Reference Guide: This unified blueprint synthesizes six specialist design documents. For detailed specifications on any subsystem, refer to:

architecture.md — Full architecture, scaling, failover, cost estimation

video_ingestion.md — RTSP configuration, FFmpeg commands, edge gateway specs

ai_vision.md — Model configurations, inference code, benchmarks

database_schema.md — Complete DDL, triggers, views, RLS policies

suspicious_activity.md — Detection algorithms, scoring engine pseudocode

training_system.md — Training pipelines, quality gates, versioning logic

Section 1: Executive Summary
Section 2: Kimi Swarm Team and Agent Responsibilities
Section 3: Assumptions
Section 4: Full Architecture
Section 5: Data Flow from DVR to Cloud to Dashboard
Section 6: Recommended Tech Stack
Section 7: Database Schema
Section 8: AI Model and Training Strategy
Section 9: Suspicious Activity Night-Mode Design
Section 10: Live Video Streaming Design

Section 1: Executive Summary

1.1 Project Objective

This blueprint defines the technical design for an AI-powered industrial surveillance platform that turns a legacy CP PLUS 8-channel DVR system into an intelligent security operations center. The platform processes real-time video from 8 camera channels, applies person detection, tracking, and face recognition, detects suspicious activity during night hours (22:00-06:00), and presents the results in a unified dashboard for security operators. Reliability, security, and data privacy are treated as primary design constraints throughout.

The system is designed around a cloud+edge hybrid architecture where all compute-intensive AI inference runs in the cloud (AWS Mumbai), while a local edge gateway handles stream ingestion, buffering, and site-local concerns. A WireGuard VPN tunnel protects all communication between edge and cloud, ensuring the DVR has zero public internet exposure.

1.2 Key Capabilities

Capability	Description	Technology
Human Detection	Real-time person detection across all 8 channels at 15-20 FPS	YOLO11m + TensorRT FP16, 640x640
Face Detection	Accurate face localization with 5-point landmarks for alignment	SCRFD-500M-BNKPS, 640x640
Face Recognition	512-D embedding extraction with 99.83% LFW accuracy	ArcFace R100 IR-SE100 (MS1MV3)
Person Tracking	Persistent identity tracking across frames with occlusion recovery	ByteTrack (Kalman + IoU), 80.3% MOTA
Unknown Clustering	Automatic grouping of unknown faces for operator review	HDBSCAN + DBSCAN fallback, 89.5% purity
Night Mode Surveillance	10-detection-module suspicious activity analysis (22:00-06:00)	Composite scoring engine with time-decay
AI Vibe Controls	Three intuitive presets (Relaxed/Balanced/Strict) mapping to 4 confidence levels	Dynamic threshold adjustment
Safe Self-Learning	Three-mode training system with conflict detection and approval workflows	MLflow + Airflow + Manual Review
24/7 Reliability	Graceful degradation: video never stops, AI catch-up on recovery	Tiered storage + circuit breakers + replay
Real-Time Alerts	6-level escalation (NONE to EMERGENCY) with multi-channel notifications	Telegram, WhatsApp, Email, Webhook
Live Dashboard	Multi-camera grid with HLS streaming and single-camera low-latency WebRTC	Next.js 14 + HLS.js + WebRTC

1.3 Architecture Approach

The platform follows a cloud+edge+VPN hybrid pattern with five network security zones:

Cameras (8ch) --> DVR (local) --> Edge Gateway (local) --> WireGuard VPN --> AWS Cloud (EKS)
                                      |                        |
                                      | 2TB NVMe buffer         | Encrypted tunnel
                                      | 7-day ring buffer       | UDP 51820
                                      | FFmpeg ingestion        | ChaCha20-Poly1305
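
As a rough sanity check on the buffer figures in the diagram above, the sketch below estimates how much of the 2TB edge buffer a 7-day window would consume. It assumes the recorded bitrate equals the 16 Mbps aggregate upload figure from assumption NW-02 (about 2 Mbps per channel). The actual recording bitrate is not stated in this blueprint, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def buffer_bytes(total_mbps: float, days: float) -> float:
    """Bytes written over `days` at a constant aggregate bitrate (decimal megabits)."""
    bytes_per_second = total_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days


def retention_days(capacity_bytes: float, total_mbps: float) -> float:
    """Days of video that fit in `capacity_bytes` at the given aggregate bitrate."""
    return capacity_bytes / buffer_bytes(total_mbps, 1)


if __name__ == "__main__":
    # Assumption: 8 channels at ~2 Mbps each (NW-02 aggregate of 16 Mbps).
    needed = buffer_bytes(total_mbps=16, days=7)
    capacity = 2e12  # 2 TB NVMe, decimal terabytes
    print(f"7-day window: {needed / 1e12:.2f} TB of {capacity / 1e12:.0f} TB")
    print(f"Max retention at 16 Mbps: {retention_days(capacity, 16):.1f} days")
```

Under these assumptions a 7-day window needs roughly 1.21 TB, so the 2TB buffer leaves headroom for HLS segments and filesystem overhead. A higher per-channel bitrate shrinks that margin proportionally.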

Key architectural decisions:

Decision	Choice	Rationale
Cloud Provider	AWS ap-south-1 (Mumbai)	Lowest latency to India, mature managed services
Container Orchestration	Amazon EKS + K3s edge	Managed control plane, GPU node support, lightweight edge
VPN	WireGuard	~60% faster than OpenVPN, modern crypto, simple setup
Message Queue	Apache Kafka (MSK)	Durable ordered log, replay capability, proven at scale
AI Inference	NVIDIA Triton + TensorRT	GPU-optimized, dynamic batching, model ensemble
Database	PostgreSQL 16 + pgvector	ACID compliance, native 512-D vector support
Object Storage	MinIO (edge+cloud) + S3 (archive)	S3-compatible API, tiered cost optimization
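
To illustrate what the 512-D vector support in the Database row is used for, here is a minimal pure-Python sketch of the cosine similarity comparison between two face embeddings. In pgvector the `<=>` operator returns cosine distance, which is 1 minus this value. The helper names are hypothetical, and a production path would use the database index or a numeric library rather than Python loops.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 512  # ArcFace embedding size stated in this blueprint


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embeddings (1.0 = same direction)."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-D embeddings")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def raw_embedding_bytes(dim: int = EMBEDDING_DIM, bytes_per_float: int = 4) -> int:
    """Raw float32 payload size of one embedding, excluding any storage overhead."""
    return dim * bytes_per_float
```

The raw payload of one float32 embedding is 2048 bytes, which is useful when estimating table growth for enrolled and unknown faces.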

1.4 Target Environment

The platform targets a CP PLUS ORANGE CP-UVR-0801E1-CV2 DVR with the following characteristics:

Property	Value	Impact on Design
Brand/Model	CP PLUS ORANGE CP-UVR-0801E1-CV2	Dahua-compatible RTSP URL scheme
Channels	8 active	Initial deployment scope
Resolution	960 x 1080 per channel	AI input: letterbox to 640x640
LAN IP	192.168.29.200/24	Edge gateway on same subnet
RTSP Port	554	TCP interleaved mandatory
ONVIF	V2.6.1.867657 (Server V19.06)	Auto-discovery supported
DVR Disk	FULL (0 bytes free)	All archival is edge-managed; no DVR recording
VPN Access	WireGuard-secured	No public exposure; all traffic encrypted

Critical Design Impact: The DVR disk being full means the system cannot rely on DVR-side recording or playback features. All archival storage is managed by the edge gateway's 2TB NVMe buffer and cloud tiering.
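
The "letterbox to 640x640" step listed in the table above can be sketched as follows. This is an illustrative calculation of the scale factor and padding for a 960 x 1080 frame, assuming an aspect-preserving resize with symmetric padding; the function name is hypothetical and the actual preprocessing is specified in ai_vision.md.

```python
from typing import Tuple


def letterbox_params(src_w: int, src_h: int, dst: int = 640) -> Tuple[float, int, int, Tuple[int, int, int, int]]:
    """Compute the resize scale, resized size, and (left, right, top, bottom) padding."""
    # Scale by the tighter dimension so the whole frame fits inside dst x dst.
    scale = min(dst / src_w, dst / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = dst - new_w
    pad_y = dst - new_h
    left = pad_x // 2
    top = pad_y // 2
    return scale, new_w, new_h, (left, pad_x - left, top, pad_y - top)


if __name__ == "__main__":
    scale, w, h, pads = letterbox_params(960, 1080)
    print(f"scale={scale:.4f} resized={w}x{h} pads(l,r,t,b)={pads}")
```

For a 960 x 1080 source the height is the limiting dimension, so the frame is resized to 569 x 640 and padded by 35 and 36 pixels on the left and right. Detection boxes must be mapped back through the same scale and offsets before they are drawn on the original frame.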

1.5 Key Differentiators

1. AI Vibe Controls: Instead of exposing complex threshold parameters to operators, the system provides three intuitive "vibe" presets (Relaxed, Balanced, and Strict) that internally map to tuned configurations for detection sensitivity and face match strictness. This makes the system accessible to non-technical security staff without giving up control over AI precision.

2. Safe Self-Learning Training System: The platform captures operator feedback (confirmations, corrections, merges, rejections) and feeds it back into model improvement through a three-mode learning pipeline: Manual Only, Suggested Learning (recommended), and Approved Auto-Update. A synchronous conflict detector blocks five types of label conflicts before they reach the training dataset, protecting model integrity.

3. 24/7 Reliability with Graceful Degradation: The system is architected around a single priority: video recording never stops. If the AI inference service fails, recording continues locally with queued catch-up processing on recovery. If the VPN tunnel fails, the edge gateway maintains 7 days of local buffer. If the cloud database fails, alerts accumulate in Kafka's durable log. Every failure mode has a defined degradation strategy.

4. 10-Module Night Surveillance: The suspicious activity detection system goes beyond simple motion detection to provide behavioral analysis through 10 specialized detection modules, ranging from intrusion and loitering to abandoned objects and repeated re-entry patterns, all combined through a composite scoring engine with exponential time-decay.
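
To make the idea of a composite score with exponential time-decay concrete, here is a minimal sketch. The weighting scheme, the 300-second half-life, and all names are assumptions for illustration; the actual scoring formula is defined in suspicious_activity.md and may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Detection:
    module: str        # e.g. "loitering" (hypothetical module key)
    score: float       # module confidence in [0, 1]
    timestamp: float   # seconds since epoch


def composite_score(
    detections: Iterable[Detection],
    now: float,
    weights: Dict[str, float],
    half_life_s: float = 300.0,  # assumed half-life, not taken from the blueprint
) -> float:
    """Sum weighted module scores, halving each contribution every `half_life_s` seconds."""
    decay_rate = math.log(2) / half_life_s
    total = 0.0
    for d in detections:
        age = max(0.0, now - d.timestamp)
        total += weights.get(d.module, 1.0) * d.score * math.exp(-decay_rate * age)
    return total
```

With this form, a detection that is exactly one half-life old contributes half of its original weighted score, so a burst of recent events outweighs the same events spread over a long period.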

1.6 Production Readiness Assessment

Dimension	Status	Notes
Architecture Completeness	Production-Ready	All 12 services fully specified with resource allocations
AI Model Selection	Production-Ready	Industry-standard models with published benchmarks
Database Design	Production-Ready	29 tables, 4 views, 8 triggers, partitioning, RLS
Security Architecture	Production-Ready	7-layer defense in depth, encrypted credentials, VPN-only
Scaling Path	Defined	8 -> 16 -> 32 -> 64+ cameras with concrete resource allocations
Failover Design	Production-Ready	Graceful degradation matrix for all failure modes
Estimated Timeline	14 weeks	4 implementation phases defined
Estimated Monthly Cost	~$2,140 USD	8-camera deployment at steady state

Section 2: Kimi Swarm Team and Agent Responsibilities

The unified blueprint was synthesized from the outputs of 11 specialist agents, each responsible for a specific domain of the platform design.

2.1 Agent Responsibility Matrix

#	Agent	Responsibility	Key Deliverables
1	Requirements Analyst	Elicited and structured all functional/non-functional requirements	Requirements traceability matrix, user stories, acceptance criteria
2	System Architect	Designed overall cloud+edge+VPN topology and service interactions	Deployment topology, 5 security zones, scaling roadmap, failover matrix
3	Video Ingestion Engineer	Specified RTSP configuration, edge gateway, and stream processing	RTSP URL patterns, FFmpeg commands, auto-reconnect logic, HLS generation
4	AI Vision Scientist	Selected and configured all CV/AI models for the inference pipeline	Model selection table, inference pipeline architecture, confidence handling
5	Database Architect	Designed complete data model with partitioning, indexing, and security	29 tables + 4 views + 8 triggers, pgvector HNSW index, RLS policies
6	Suspicious Activity Designer	Designed 10 detection modules and composite scoring engine	Detection algorithms, scoring formula, YAML configuration schema
7	Training System Engineer	Designed self-learning pipeline with safety controls	3 learning modes, conflict detection, quality gates, versioning
8	Frontend Developer	Designed Next.js dashboard with real-time video and alerts	Component architecture, HLS.js integration, WebSocket alerts
9	DevOps Engineer	Specified CI/CD, monitoring, and infrastructure-as-code	GitHub Actions + ArgoCD, Prometheus/Grafana, alerting rules
10	Security Architect	Designed defense-in-depth security across all layers	7 security layers, secret management, encryption standards
11	Technical Writer (this document)	Synthesized all specialist outputs into unified blueprint	10-section unified document with cross-references

2.2 Agent Interaction Flow

+-----------------------------------------------------------------------------+
|                        KIMI SWARM TEAM ORCHESTRATION                        |
+-----------------------------------------------------------------------------+
|                                                                             |
|   Requirements Analyst                                                      |
|        |                                                                    |
|        v                                                                    |
|   +---------+    +---------+    +---------+    +---------+                  |
|   | System  |<-->| Video   |<-->| AI      |<-->| Database|                  |
|   |Architect|    |Ingestion|    |Vision   |    |Architect|                  |
|   +---------+    +---------+    +---------+    +---------+                  |
|        ^                                            |                       |
|        |           +----------+    +---------+      |                       |
|        +---------->|Suspicious|<-->|Training |<-----+                       |
|                    |Activity  |    |System   |                              |
|                    |Designer  |    |Engineer |                              |
|                    +----------+    +---------+                              |
|                         |                                                   |
|                         v                                                   |
|                    +---------+    +---------+    +---------+                |
|                    |Frontend |    |DevOps   |    |Security |                |
|                    |Developer|    |Engineer |    |Architect|                |
|                    +---------+    +---------+    +---------+                |
|                         |                                                   |
|                         v                                                   |
|                    +---------------------+                                  |
|                    | Technical Writer    |                                  |
|                    | (Unified Blueprint) |                                  |
|                    +---------------------+                                  |
|                                                                             |
+-----------------------------------------------------------------------------+

2.3 Cross-Agent Design Consistency

The following cross-cutting concerns were harmonized across all agent outputs during synthesis:

Concern	Resolution	Agents Coordinated
Video latency budget	< 100ms end-to-end (AI); ~35-65s (HLS live)	Video Ingestion, AI Vision, Frontend
Face embedding storage	512-D float32, pgvector HNSW index, cosine similarity	Database, AI Vision, Training
Event data retention	90 days hot (MinIO), 1 year cold (Glacier), 7 days edge	Database, Architecture, Video Ingestion
Alert escalation	6 levels: NONE -> LOW -> MEDIUM -> HIGH -> CRITICAL -> EMERGENCY	Suspicious Activity, Database, Frontend
Model versioning	Semantic MAJOR.MINOR.PATCH with MLflow registry	Training, AI Vision, Architecture
Graceful degradation	Video never stops; AI catch-up on recovery	Architecture, Video Ingestion, AI Vision
Security zones	5 zones: Internet -> ALB -> Application -> Data -> Edge	Architecture, Security, Video Ingestion
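
The six-level alert escalation in the table above can be represented as an ordered enumeration so that services compare levels consistently. The sketch below is illustrative: the level names come from this blueprint, but the score thresholds and the `level_for_score` helper are hypothetical placeholders.

```python
from bisect import bisect_right
from enum import IntEnum
from typing import Sequence


class AlertLevel(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4
    EMERGENCY = 5


# Hypothetical lower bounds for LOW..EMERGENCY on a normalized 0-1 score.
DEFAULT_THRESHOLDS = (0.2, 0.4, 0.6, 0.8, 0.95)


def level_for_score(score: float, thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> AlertLevel:
    """Map a score to the highest level whose threshold it meets or exceeds."""
    # bisect_right counts how many sorted thresholds are <= score.
    return AlertLevel(bisect_right(thresholds, score))
```

Using an `IntEnum` keeps comparisons such as `level >= AlertLevel.HIGH` readable in notification routing, and the integer value can be stored directly in a database column.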

Section 3: Assumptions

All assumptions made across the specialist designs are consolidated below. These should be validated before implementation begins.

3.1 Network and Hardware Assumptions

ID	Assumption	Validation Method	Risk if Invalid
NW-01	Edge gateway has dual Ethernet: one for local DVR subnet (192.168.29.0/24), one for internet/VPN	Physical site survey	Cannot bridge DVR to VPN
NW-02	Site internet bandwidth >= 16 Mbps sustained upload for 8 channels	ISP speed test	Video drops, AI delays
NW-03	WireGuard UDP port 51820 is not blocked by site firewall	Firewall rule check	VPN cannot establish
NW-04	DVR RTSP server supports TCP interleaved transport (`rtsp_transport tcp`)	FFmpeg test probe	UDP fallback has packet loss
NW-05	DVR supports 16+ concurrent RTSP sessions (8 channels x 2 streams)	Session stress test	Stream contention
NW-06	MTU 1400 is viable through site NAT/firewall for WireGuard tunnel	Ping with DF bit test	Fragmentation issues
HW-01	Intel NUC 13 Pro (i5-1340P, 16GB RAM, 512GB NVMe) is available for edge gateway	Hardware procurement	May need Jetson Orin alternative
HW-02	Edge gateway has UPS backup for graceful shutdown on power loss	Electrical survey	Data corruption on hard power-off
HW-03	AWS g4dn.xlarge (T4 GPU) instances are available in ap-south-1	AWS EC2 capacity check	Need alternative GPU instance
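
For assumption NW-06, the following sketch shows the arithmetic behind the "ping with DF bit" validation. WireGuard adds 60 bytes of overhead per packet over an IPv4 outer path (20-byte IPv4 header, 8-byte UDP header, 32 bytes of WireGuard header and authentication tag) and 80 bytes over IPv6. The function names are illustrative, and the calculation assumes no additional encapsulation such as PPPoE on the site uplink.

```python
WG_OVERHEAD_IPV4 = 60   # 20 IPv4 + 8 UDP + 32 WireGuard framing
WG_OVERHEAD_IPV6 = 80   # 40 IPv6 + 8 UDP + 32 WireGuard framing
ICMP_OVERHEAD_IPV4 = 28  # 20 IPv4 header + 8 ICMP header


def required_path_mtu(tunnel_mtu: int, ipv6_outer: bool = False) -> int:
    """Smallest outer path MTU that carries a full tunnel packet without fragmentation."""
    overhead = WG_OVERHEAD_IPV6 if ipv6_outer else WG_OVERHEAD_IPV4
    return tunnel_mtu + overhead


def ping_payload_for(path_mtu: int) -> int:
    """ICMP payload size that produces an IPv4 packet of exactly `path_mtu` bytes."""
    return path_mtu - ICMP_OVERHEAD_IPV4


if __name__ == "__main__":
    path_mtu = required_path_mtu(1400)
    print(f"outer path MTU needed: {path_mtu}")
    print(f"test payload size: {ping_payload_for(path_mtu)}")
```

For a tunnel MTU of 1400 over IPv4 this gives a required path MTU of 1460 and a test payload of 1432 bytes. On Linux the check could be run as `ping -M do -s 1432 <cloud endpoint>`, where the endpoint address is site-specific.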

3.2 DVR Capabilities Assumptions

ID	Assumption	Validation Method	Risk if Invalid
DVR-01	DVR RTSP streams are accessible at `rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M`	FFmpeg connectivity test	Need alternative URL format
DVR-02	DVR continues serving RTSP streams even with disk full (0 bytes free)	24-hour stream stability test	Streams may stall
DVR-03	DVR sub-stream (subtype=1) provides sufficient quality for AI inference (typically 352x288 to 704x576)	Frame quality inspection	May need main stream for AI
DVR-04	DVR ONVIF server supports device discovery and stream URI retrieval	ONVIF Device Manager test	Manual camera configuration needed
DVR-05	DVR channel numbering is 1-indexed (1-8)	ONVIF profile enumeration	Off-by-one errors in configuration
DVR-06	DVR Digest authentication works with the provided credentials	RTSP DESCRIBE request test	May need Basic auth or different scheme
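
Assumptions DVR-01, DVR-05, and NW-04 can be exercised with a small helper that builds the RTSP URL for each channel and an `ffprobe` command that forces TCP transport. This is a sketch only: the helper names are hypothetical, subtype 0 is assumed to be the main stream (with subtype 1 the sub-stream as stated in DVR-03), and the password is expected to come from a secret store rather than source code.

```python
from typing import List
from urllib.parse import quote

DVR_HOST = "192.168.29.200"
RTSP_PORT = 554
CHANNELS = range(1, 9)  # DVR-05: channels are assumed to be 1-indexed


def rtsp_url(channel: int, subtype: int, password: str, user: str = "admin") -> str:
    """Build the Dahua-style RTSP URL from DVR-01 with URL-encoded credentials."""
    if channel not in CHANNELS:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main, assumed) or 1 (sub)")
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (
        f"rtsp://{creds}@{DVR_HOST}:{RTSP_PORT}"
        f"/cam/realmonitor?channel={channel}&subtype={subtype}"
    )


def ffprobe_command(url: str) -> List[str]:
    """Argument list for a connectivity probe over TCP interleaved transport."""
    return ["ffprobe", "-v", "error", "-rtsp_transport", "tcp", "-show_streams", url]
```

URL-encoding the credentials matters when the DVR password contains characters such as `@` or `:`, which would otherwise break URL parsing. The returned argument list can be passed to `subprocess.run` on the edge gateway.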

3.3 Environmental Assumptions

ID	Assumption	Impact if Invalid
ENV-01	Cameras provide adequate lighting for face recognition during night hours (minimum 10 lux at face distance)	Face recognition accuracy degrades; may need IR illumination
ENV-02	Camera angles allow frontal face capture at entry/exit points (yaw < 45 degrees)	Face recognition miss rate increases
ENV-03	Indoor industrial environment with minimal weather interference	False positive rate from rain/shadows increases
ENV-04	Maximum person-to-camera distance is within 10 meters for face recognition	Faces may be too small (< 20px) for reliable detection
ENV-05	Camera positions are stable (no PTZ movement during normal operation)	Zone calibration becomes invalid; zones must be recalibrated

3.4 Operational Assumptions

ID	Assumption	Impact if Invalid
OPS-01	Security operators will review unknown face clusters and provide identity labels daily	Unknown person database grows without enrichment
OPS-02	Admin will review training suggestions at least weekly in "Suggested Learning" mode	Training queue backlog accumulates
OPS-03	Site has authorized personnel who can access edge gateway for maintenance (SSH, physical)	Remote troubleshooting limited
OPS-04	Alert fatigue is a genuine concern — false positive rate > 20% leads to ignored alerts	AI vibe controls and suppression tuned accordingly
OPS-05	Incident video review requires 10-second pre-event and 30-second post-event clips	Pre-event and post-event clip durations must be reconfigured

3.5 Security Assumptions

ID	Assumption	Impact if Invalid
SEC-01	WireGuard encryption (ChaCha20-Poly1305) meets organizational security requirements	May need additional encryption layer
SEC-02	AWS VPC with private subnets satisfies data residency requirements for India	Compliance review needed
SEC-03	Face embeddings (512-D vectors) do not constitute PII under applicable regulations	Legal review needed for biometric data handling
SEC-04	Edge gateway physical security is equivalent to server room security	Tampering risk if edge is physically accessible
SEC-05	DVR credentials can be stored encrypted (AES-256) in cloud database	Key management infrastructure required

3.6 AI Performance Assumptions

ID	Assumption	Impact if Invalid
AI-01	YOLO11m TensorRT FP16 achieves > 75% person AP@50 on surveillance footage	May need fine-tuning on site-specific data
AI-02	ArcFace R100 achieves > 98% Rank-1 accuracy on enrolled persons with 5+ reference images	Enrollment quality gates ensure minimum samples
AI-03	HDBSCAN achieves > 89% cluster purity on 512-D face embeddings from this camera setup	Fallback to DBSCAN if density varies too much
AI-04	ByteTrack maintains < 2 ID switches per 100 frames in industrial environment with occlusion	May need BoT-SORT upgrade for complex scenes
AI-05	GPU (T4) can sustain 15-20 FPS processing per stream across 8 streams with batching	CPU fallback at 5-8 FPS if GPU unavailable
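
Assumption AI-05 implies an aggregate throughput target that can be checked with simple arithmetic. The sketch below assumes a single GPU serving all 8 streams and treats frames as if processed one at a time, which ignores the throughput gain from dynamic batching; it is a rough budget, not a benchmark.

```python
def aggregate_fps(streams: int, fps_per_stream: float) -> float:
    """Total frames per second the inference service must sustain."""
    return streams * fps_per_stream


def per_frame_budget_ms(total_fps: float) -> float:
    """Average time available per frame if frames were processed serially."""
    return 1000.0 / total_fps


if __name__ == "__main__":
    for fps in (15, 20):
        total = aggregate_fps(8, fps)
        print(f"{fps} FPS/stream -> {total:.0f} FPS total, "
              f"{per_frame_budget_ms(total):.2f} ms per frame")
```

At 15 to 20 FPS per stream the service must sustain 120 to 160 frames per second, which corresponds to an average of roughly 6 to 8 ms per frame across detection, face, and tracking stages combined.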

Section 4: Full Architecture

4.1 High-Level System Architecture

The platform employs a cloud+edge hybrid architecture with five network security zones. Video streams are ingested at the edge, processed by AI in the cloud, and presented through a web-based dashboard. A WireGuard VPN tunnel provides encrypted, zero-exposure connectivity between edge and cloud.

+=============================================================================+
|                         CLOUD+EDGE+VPN ARCHITECTURE                          |
+=============================================================================+
|                                                                              |
|   ZONE 0: INTERNET (UNTRUSTED)                                               |
|   +---------------------+                                                    |
|   |  Users / Browsers   |                                                    |
|   |  HTTPS :443         |                                                    |
|   +----------+----------+                                                    |
|              |                                                               |
|              v                                                               |
|   ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                       |
|   +--------------------------------------------------------------+          |
|   |  AWS ALB (:443) + WAF v2 + Rate Limit + Geo-Restriction      |          |
|   |       |                                                      |          |
|   |       v                                                      |          |
|   |  Traefik Ingress Controller (:8443)                          |          |
|   |  - Route: /api/*  -> Backend Service                         |          |
|   |  - Route: /ws/*   -> WebSocket Handler                       |          |
|   |  - Route: /       -> Next.js Web App                         |          |
|   |  - TLS: Let's Encrypt auto certificates                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 2: AWS VPC APPLICATION (TRUSTED)                                      |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Stream      |  | AI Inference|  | Suspicious Activity |   |          |
|   |  | Ingestion   |  | Service     |  | Service (Night Mode)|   |          |
|   |  | (Go/FFmpeg) |  | (Triton)    |  | (Go/Python)         |   |          |
|   |  | :8081       |  | :8001 gRPC  |  | :8083               |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Backend API |  | Training    |  | Notification        |   |          |
|   |  | (Go/Gin)    |  | Service     |  | Service             |   |          |
|   |  | :8080       |  | (PyTorch)   |  | (Go)                |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +--------------------+                                      |          |
|   |  | Web Frontend       |  HLS Playback Service               |          |
|   |  | (Next.js 14 :3000) |  (Go :8085)                         |          |
|   |  +--------------------+                                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                                   |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  | PostgreSQL  |  | Redis       |  | Kafka       |           |          |
|   |  | 16 (RDS)    |  | 7 Cluster   |  | (MSK)       |           |          |
|   |  | :5432       |  | :6379       |  | :9092       |           |          |
|   |  | pgvector    |  | Pub/Sub     |  | 3 brokers   |           |          |
|   |  | HNSW index  |  | Streams     |  | 3 AZs       |           |          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   |  | MinIO       |  | S3 (Cold Archive)                 |      |          |
|   |  | (S3-compat) |  | - Standard (30d)                  |      |          |
|   |  | :9000       |  | - IA (31-90d)                     |      |          |
|   |  | 10 TB       |  | - Glacier Deep Archive (90d+)     |      |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              | WireGuard VPN Tunnel (UDP 51820)                                |
|              | ChaCha20-Poly1305 encryption                                    |
|              | Cloud peer: 10.200.0.1/32 <-> Edge peer: 10.200.0.2/32         |
|              v                                                               |
|   ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                                 |
|   +--------------------------------------------------------------+          |
|   |  +--------------------------------------------------------+  |          |
|   |  |              EDGE GATEWAY (Intel NUC)                  |  |          |
|   |  |  Ubuntu 22.04 LTS | K3s v1.28+ | 2TB NVMe             |  |          |
|   |  |                                                          |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Stream Manager  |  | HLS Segmenter   |                |  |          |
|   |  |  | (Python/asyncio)|  | (FFmpeg/nginx)  |                |  |          |
|   |  |  | 8x RTSP feeds   |  | 2s segments     |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Frame Extractor |  | Buffer Manager  |                |  |          |
|   |  |  | (AI decimation) |  | (20GB ring buf) |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  |  | VPN Client (WireGuard)  |  Health Monitor         |    |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  +--------------------------------------------------------+  |          |
|   |                            |                                             |
|   |   Local Network (192.168.29.0/24)                                       |
|   |   +------------------+    +------------------+                           |
|   |   | CP PLUS DVR      |    | Local Monitor    |                           |
|   |   | 192.168.29.200   |    | 192.168.29.10    |                           |
|   |   | 8ch | RTSP :554  |    | (optional)       |                           |
|   |   +------------------+    +------------------+                           |
|   |   CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8                                      |
|   +--------------------------------------------------------------+          |
|                                                                              |
+=============================================================================+
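
The tunnel parameters shown in the diagram (peer addresses, UDP port 51820) map onto a standard wg-quick configuration for the edge gateway. The sketch below renders such a file from Python. The key values and endpoint hostname are placeholders, the MTU of 1400 follows assumption NW-06, and the keepalive interval of 25 seconds is an assumption intended to keep NAT mappings open rather than a value from this blueprint.

```python
def render_edge_wg_conf(private_key: str, cloud_public_key: str, cloud_endpoint_host: str) -> str:
    """Render a wg-quick config for the edge peer (10.200.0.2) talking to the cloud peer."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        "Address = 10.200.0.2/32",
        "MTU = 1400",
        "",
        "[Peer]",
        f"PublicKey = {cloud_public_key}",
        f"Endpoint = {cloud_endpoint_host}:51820",
        # Only the cloud peer address is routed through the tunnel.
        "AllowedIPs = 10.200.0.1/32",
        # Assumed value; keeps the NAT mapping alive from behind the site router.
        "PersistentKeepalive = 25",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; real keys are generated with `wg genkey`.
    print(render_edge_wg_conf("<edge-private-key>", "<cloud-public-key>", "vpn.example.com"))
```

Restricting `AllowedIPs` to the single cloud peer address keeps the DVR subnet off the tunnel entirely; the cloud side reaches video only through the edge gateway's own services.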

4.2 Service Interaction Diagram

+-----------------------------------------------------------------------------+
|                           SERVICE INTERACTIONS                               |
+-----------------------------------------------------------------------------+
|                                                                              |
|   INTERNET USERS                                                             |
|        |                                                                     |
|        | HTTPS :443                                                          |
|        v                                                                     |
|   +---------+      +----------+      +----------+                           |
|   | AWS ALB |----->| Traefik  |----->| Next.js  |  Web Frontend             |
|   | +WAF    |      | Ingress  |      | (SSR)    |  Dashboard                |
|   +---------+      +----------+      +----+-----+                           |
|                                             |                                |
|                        +--------------------+--------------------+           |
|                        |                    |                    |           |
|                        v                    v                    v           |
|                   +---------+       +------------+      +----------+       |
|                   |Backend  |       | WebSocket  |      | HLS      |       |
|                   |API (Go) |       | Handler    |      | Playback |       |
|                   |:8080    |       | /ws/alerts |      | Service  |       |
|                   +----+----+       +------------+      +----+-----+       |
|                        |                                               |
|                        | gRPC :50051                                    |
|                        v                                               |
|   +---------+    +------------+    +----------+    +----------+       |
|   | Stream  |    | AI         |    |Suspicious|    |Training  |       |
|   |Ingestion|<-->| Inference  |<-->| Activity |    |Service   |       |
|   |(Go)     |    |(Triton)    |    |(Night)   |    |(PyTorch) |       |
|   +----+----+    +------+-----+    +----+-----+    +----+-----+       |
|        |                |               |               |             |
|        v                v               v               v             |
|   +---------------------------------------------------------------+   |
|   |                        KAFKA (MSK)                            |   |
|   |  streams.raw (8 parts)  ai.detections (16 parts)             |   |
|   |  alerts.critical (4 parts)  training.data (30-day ret.)      |   |
|   |  notifications.*  system.metrics (7-day ret.)                |   |
|   +---------------------------------------------------------------+   |
|        |                |               |               |             |
|        v                v               v               v             |
|   +---------+    +------------+    +----------+    +----------+       |
|   |PostgreSQL|   | Redis      |    | MinIO    |    | MLflow   |       |
|   |16 +pgvec |   |7 Cluster   |    |S3-compat |    | Model    |       |
|   |:5432     |   |:6379       |    |:9000     |    | Registry |       |
|   +---------+    +------------+    +----------+    +----------+       |
|                                                                              |
|   Edge Gateway: WireGuard peer at 10.200.0.2/32                            |
|   Stream Ingestion pulls frames via VPN -> sends to Kafka                   |
|                                                                              |
+-----------------------------------------------------------------------------+
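
The Kafka topics named in the diagram can be captured in a small specification table from which topic-level settings are derived. The sketch below is illustrative: partition counts and retention periods are taken from the diagram where given and left as `None` where the diagram is silent, and the camera-to-partition mapping is a hypothetical scheme for keeping each camera's frames ordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopicSpec:
    name: str
    partitions: Optional[int]      # None where the diagram does not state a count
    retention_days: Optional[int]  # None where the diagram does not state retention


TOPICS = [
    TopicSpec("streams.raw", 8, None),
    TopicSpec("ai.detections", 16, None),
    TopicSpec("alerts.critical", 4, None),
    TopicSpec("training.data", None, 30),
    TopicSpec("system.metrics", None, 7),
]


def retention_ms(days: int) -> int:
    """Convert a retention period in days to a value for Kafka's `retention.ms` setting."""
    return days * 24 * 60 * 60 * 1000


def partition_for_camera(channel: int, partitions: int = 8) -> int:
    """Hypothetical mapping: 1-indexed camera channel to a fixed partition."""
    return (channel - 1) % partitions
```

With 8 partitions on `streams.raw` and 8 channels, this mapping gives each camera its own partition, so per-camera frame order is preserved and consumers can scale to one worker per camera.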

4.3 Network Security Zones

Five security zones provide defense in depth, from the public internet to the physically isolated edge network.

+=============================================================================+
|                         NETWORK SECURITY ZONES                               |
+=============================================================================+
|                                                                              |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 0: INTERNET (UNTRUSTED)                                        |    |
|  |  - Public users, any source IP                                        |    |
|  |  - AWS Shield Standard DDoS protection                               |    |
|  |  - Geo-restriction: allow specific countries only                    |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | HTTPS :443                                    |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                |    |
|  |  - ALB + WAF v2 (SQL injection, XSS, rate limiting rules)           |    |
|  |  - Traefik Ingress (:8443)                                          |    |
|  |  - Auth: JWT + RBAC, API keys for edge gateway                     |    |
|  |  - Public API endpoints ONLY                                        |    |
|  |  SG: alb-public-sg: 443 from 0.0.0.0/0                             |    |
|  |  SG: traefik-sg: 8443 from alb-sg ONLY                              |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Internal :8080-8090                         |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 2: AWS VPC APPLICATION (TRUSTED, ISOLATED)                     |    |
|  |  - Stream Ingestion, AI Inference, Suspicious Activity              |    |
|  |  - Training, Backend API, Notification Services                     |    |
|  |  - Pod Security: No root, read-only FS, no privilege escalation    |    |
|  |  - Network Policies: Ingress only from API GW namespace            |    |
|  |  SG: app-sg: 8080-8090 from traefik-sg ONLY                         |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Data Layer :5432, :6379, :9092, :9000       |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                            |    |
|  |  - PostgreSQL (RDS), Redis (ElastiCache), Kafka (MSK)               |    |
|  |  - MinIO object storage, S3 cold archive                            |    |
|  |  - Security Groups: ONLY from app-sg                                |    |
|  |  - RDS: Encrypted at rest (AWS KMS), no public access              |    |
|  |  - S3: Bucket policy deny all except VPC endpoint                   |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | WireGuard VPN (UDP 51820)                     |
|                              | ChaCha20-Poly1305                             |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                          |    |
|  |  - Edge Gateway (Intel NUC), K3s node                                |    |
|  |  - WireGuard peer, stream ingestion, local buffer                    |    |
|  |  - DVR (192.168.29.200): NO internet access, local ONLY             |    |
|  |  - Edge Firewall: ALLOW 192.168.29.0/24 -> DVR :554,:80           |    |
|  |                   ALLOW OUT 51820/udp -> Cloud VPN endpoint        |    |
|  |                   DENY ALL other incoming                           |    |
|  +---------------------------------------------------------------------+    |
|                                                                              |
+=============================================================================+

4.4 Service Descriptions

#	Service	Purpose	Technology	Port	Replicas
1	Edge Gateway Agent	RTSP stream pull, local recording, VPN endpoint, heartbeat	Go 1.21, systemd + K3s	8080, 51820	1 (per site)
2	Stream Ingestion	Receive frames from edge, decode, produce to Kafka, store segments	Go 1.21, FFmpeg	8081	3-20 (HPA)
3	AI Inference	GPU-accelerated detection, face recognition, embedding	Triton 2.40, TensorRT	8000, 8001, 8002	1-4 (GPU HPA)
4	Suspicious Activity	Night-mode analysis, 10 detection modules, scoring engine	Python 3.11, OpenCV	8083	2-8 (HPA)
5	Training Service	Model retraining, fine-tuning, A/B validation	PyTorch 2.1, CUDA 12.1	8084	0-1 (GPU spot)
6	Backend API	REST API, authentication, business logic	Go 1.21, Gin	8080	3-10 (HPA)
7	Web Frontend	Dashboard, live view, timeline, analytics	Next.js 14, React 18	3000	3 (CDN)
8	Notification	Multi-channel alert dispatch (Telegram, WhatsApp, Email)	Go 1.21	8086	2-5 (HPA)
9	HLS Playback	HLS segment serving for dashboard live view	Go 1.21	8085	2-4 (HPA)
10	PostgreSQL	Primary database with pgvector for embeddings	PostgreSQL 16 (RDS)	5432	1 (Multi-AZ)
11	Redis	Session store, cache, pub/sub, stream tracking	Redis 7 (ElastiCache)	6379	2 shards x 2 replicas
12	Kafka	Event bus, durable log, stream replay	Apache Kafka (MSK)	9092	3 brokers (1 per AZ across 3 AZs)
13	MinIO	Object storage for video, snapshots, model artifacts	MinIO (S3-compatible)	9000, 9001	Edge: 1, Cloud: 4

4.5 Physical Edge Gateway Specification

Component	Specification
Hardware	Intel NUC 13 Pro, Core i5-1340P (12 cores, 16 threads)
Alternative	NVIDIA Jetson Orin NX 16GB (for on-edge AI inference)
RAM	16GB DDR4-3200 (32GB recommended for 16+ channels)
Storage	2TB NVMe SSD (7-day circular buffer for all 8 streams)
LAN	Intel i226-V 2.5GbE (local DVR subnet)
WAN	Second Ethernet or WiFi (internet for VPN)
OS	Ubuntu 22.04.4 LTS Server (no GUI)
Container Runtime	Docker CE 25.x + Docker Compose 2.x
K8s Distribution	K3s v1.28+ (lightweight, single-node or 2-node HA)
Power	UPS-backed, auto-restart on power loss (BIOS setting)
Network	Dual interface: eth0 for local DVR, eth1 for internet/VPN
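
The 2TB figure can be sanity-checked with simple arithmetic. The sketch below assumes the ~2 Mbps per H.264 stream quoted in the scaling roadmap and continuous recording. The function name is illustrative and not part of the platform code.

```python
def buffer_size_tb(streams: int, mbps_per_stream: float, days: float) -> float:
    """Return the storage (decimal TB) needed to hold `days` of continuous video."""
    bytes_per_second = streams * mbps_per_stream * 1_000_000 / 8
    total_bytes = bytes_per_second * days * 86_400  # 86,400 seconds per day
    return total_bytes / 1e12


if __name__ == "__main__":
    needed = buffer_size_tb(streams=8, mbps_per_stream=2.0, days=7)
    print(f"7-day buffer for 8 streams: {needed:.2f} TB (disk: 2 TB)")
```

At 2 Mbps per stream, eight streams need roughly 1.2 TB for seven days, which leaves headroom on a 2TB drive for the OS, containers, and bitrate spikes.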

4.6 Cloud Infrastructure Specification

Component	Specification
Region	Primary: ap-south-1 (Mumbai), DR: ap-southeast-1 (Singapore)
VPC	10.100.0.0/16, 3 AZs, private subnets only for workloads
EKS	Managed node groups: `on-demand` for API, `spot` for batch/GPU
GPU Nodes	g4dn.xlarge (NVIDIA T4) for Triton inference, 1-4 auto-scaled
ALB	Internet-facing, WAF v2 attached, Shield Advanced optional
RDS	PostgreSQL 16, db.r6g.xlarge, Multi-AZ, encrypted at rest
ElastiCache	Redis 7, cluster mode enabled, 2 shards x 2 replicas
MSK (Kafka)	3 broker nodes, kafka.m5.large, 3 AZs
S3	Standard (hot 30d), IA (31-90d), Glacier Deep Archive (90d+)

4.7 Scaling Approach

The system scales from the initial 8-camera deployment to 64+ cameras through well-defined phases:

+-----------------------------------------------------------------------------+
|                        CAMERA SCALING ROADMAP                                |
+-----------------------------------------------------------------------------+
|                                                                              |
|  CURRENT: 8 cameras (1 DVR)                                                  |
|  +-- Edge: Intel NUC i7, 32GB RAM                                           |
|  +-- Bandwidth: ~16 Mbps upstream (2 Mbps per H.264 stream)                 |
|  +-- Cloud AI: 1x T4 GPU (8 streams @ 1 fps, batch=8)                       |
|  +-- Kafka: 8 partitions (streams.raw)                                      |
|  +-- PostgreSQL: db.r6g.xlarge                                              |
|  +-- Monthly cost: ~$2,140                                                  |
|                                                                              |
|  PHASE 1: 16 cameras (2 DVRs / 2 sites)                                      |
|  +-- Edge: 2x Intel NUC (one per site)                                      |
|  +-- Bandwidth: ~32 Mbps                                                    |
|  +-- Cloud AI: 1x T4 GPU (batch=16, still sufficient)                       |
|  +-- Kafka: 16 partitions                                                   |
|  +-- Monthly cost: ~$3,200                                                  |
|                                                                              |
|  PHASE 2: 32 cameras (4 DVRs / 4 sites)                                      |
|  +-- Edge: 4x Intel NUC                                                     |
|  +-- VPN: Hub-spoke model (4 edge peers -> 1 cloud endpoint)                |
|  +-- Bandwidth: ~64 Mbps                                                    |
|  +-- Cloud AI: 2x T4 GPUs (HPA: 2-6 replicas)                               |
|  +-- Kafka: 32 partitions                                                   |
|  +-- PostgreSQL: db.r6g.2xlarge                                             |
|  +-- Monthly cost: ~$5,500                                                  |
|                                                                              |
|  PHASE 3: 64 cameras (8 DVRs / 8 sites)                                      |
|  +-- Edge: 8x Intel NUC (or Jetson Orin for edge AI pre-filter)              |
|  +-- Bandwidth: ~128 Mbps (dedicated circuit recommended)                   |
|  +-- Cloud AI: 4x T4 GPUs or 2x A10G (g5.2xlarge)                           |
|  +-- Kafka: 64 partitions, consider MSK multi-cluster                        |
|  +-- PostgreSQL: db.r6g.4xlarge + read replica                              |
|  +-- Monthly cost: ~$9,800                                                  |
|                                                                              |
+-----------------------------------------------------------------------------+
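
The roadmap figures follow two simple rules: upstream bandwidth is 2 Mbps per camera, and streams.raw gets one Kafka partition per camera. The sketch below re-checks those rules against the numbers listed above and derives the monthly cost per camera. The constants are copied from the roadmap; the helper name is illustrative.

```python
# (cameras, upstream_mbps, kafka_partitions, monthly_cost_usd) as listed in the roadmap
ROADMAP = [
    (8, 16, 8, 2140),
    (16, 32, 16, 3200),
    (32, 64, 32, 5500),
    (64, 128, 64, 9800),
]
MBPS_PER_STREAM = 2  # per H.264 stream, from the roadmap


def summarize(roadmap):
    """Check each phase against the per-camera rules and compute cost per camera."""
    rows = []
    for cameras, mbps, partitions, cost in roadmap:
        rows.append({
            "cameras": cameras,
            "bandwidth_ok": mbps == cameras * MBPS_PER_STREAM,
            "partitions_ok": partitions == cameras,
            "cost_per_camera": cost / cameras,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(ROADMAP):
        print(f"{row['cameras']:>3} cameras: ${row['cost_per_camera']:.2f} per camera per month")
```

Cost per camera falls from about $268 at 8 cameras to about $153 at 64, because the fixed cloud components are shared across more streams.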

4.8 Failover and Reliability Design

The graceful degradation matrix defines behavior for every failure mode:

+=============================================================================+
|                     GRACEFUL DEGRADATION MATRIX                              |
+=============================================================================+
|                                                                              |
|  Failure Mode              | Degradation Strategy                            |
|  ------------------------- | ----------------------------------------------- |
|  AI Inference Service DOWN | Continue recording ALL video locally            |
|  (GPU failure, model crash)| Events stored as "unprocessed"                  |
|                            | No real-time alerts                             |
|                            | Queue frames for later batch processing         |
|                            | Dashboard shows "AI OFFLINE" banner             |
|                                                                              |
|  Kafka DOWN (MSK outage)   | Edge Gateway buffers locally (20GB ring buffer) |
|                            | Backpressure: reduce to key frames only (0.2fps)|
|                            | Auto-reconnect with 2x exponential backoff      |
|                            | Replay from local buffer when Kafka recovers    |
|                                                                              |
|  VPN Tunnel DOWN           | Full local operation mode                       |
|  (internet outage)         | All recording continues locally (7-day buffer)  |
|                            | Local alert buzzer/relay (configurable)         |
|                            | No cloud dashboard access                       |
|                            | Auto-sync when VPN recovers                     |
|                                                                              |
|  PostgreSQL DOWN (RDS)     | Alert queue builds in Kafka (durable log)       |
|                            | Events not lost (Kafka 7-day retention)         |
|                            | Read-only dashboard mode (Redis cache)          |
|                            | Alert on-call engineer                          |
|                                                                              |
|  Notification Service DOWN | Alerts accumulate in DB                         |
|                            | Retry with exponential backoff                  |
|                            | Dead letter after 24 hours                      |
|                            | Dashboard shows pending count                   |
|                                                                              |
|  Edge Gateway DOWN (power) | Cloud dashboard shows "SITE OFFLINE"            |
|                            | Last known recordings in cloud                  |
|                            | Alert sent immediately                          |
|                            | UPS: graceful shutdown, preserve data           |
|                                                                              |
+=============================================================================+

Priority Order (highest first):

Video recording NEVER STOPS (local edge priority)
Critical alerts ALWAYS FIRE (local buzzer + queued cloud alerts)
AI inference gracefully degrades to batch catch-up on recovery
Dashboard operates in read-only/cache mode during DB outage
Cloud sync resumes automatically when connectivity restored

Reliability Mechanisms:

Mechanism	Implementation	Target
Stream Reconnect	Exponential backoff: 1s -> 2s -> 4s -> 8s -> max 30s	< 60s recovery
Circuit Breaker	5 failures -> OPEN (60s) -> HALF_OPEN (3 test calls) -> CLOSED	Prevent cascade failures
VPN Watchdog	Ping every 30s, restart WireGuard on 3 consecutive failures	< 90s VPN recovery
Kafka Producer	`acks=all`, `retries=10`, `enable.idempotence=true`, LZ4 compression	Zero message loss
Kafka Consumer	Manual offset commit AFTER DB write success	At-least-once processing (duplicate-free only if DB writes are idempotent)
Health Checks	5-layer: K8s probes -> Service metrics -> Dependency checks -> E2E synthetic -> Edge heartbeat	< 2 min detection
Auto-scaling	GPU util > 80% for 2 min -> scale out; Kafka lag > 1000 for 5 min -> scale out	Proactive capacity
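
The stream-reconnect and circuit-breaker rows can be sketched as follows. This is a minimal illustration, not the service implementation. It assumes that failures are counted consecutively, that any failure during HALF_OPEN reopens the breaker, and that three successful test calls close it. The class and function names are hypothetical.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield reconnect delays in seconds: 1, 2, 4, 8, 16, then 30 repeatedly."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class CircuitBreaker:
    """CLOSED -> OPEN after 5 failures; OPEN -> HALF_OPEN after 60 s; 3 successes -> CLOSED."""

    def __init__(self, failure_threshold=5, open_seconds=60.0, trial_calls=3,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.trial_calls = trial_calls
        self._clock = clock  # injectable so the state machine can be tested without sleeping
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed; moves OPEN to HALF_OPEN once the wait has elapsed."""
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.open_seconds:
                return False
            self.state = "HALF_OPEN"
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.trial_calls:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive-failure count

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # assumption: one failed test call reopens the breaker
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check allow() before each dependency call and report the outcome with record_success() or record_failure(). While the breaker is OPEN the caller fails fast instead of queuing work behind a dead dependency.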

Section 5: Data Flow from DVR to Cloud to Dashboard

This section traces the complete data journey from camera capture through AI processing to user presentation.

5.1 Overview: Seven Data Flows

+=============================================================================+
|                        SEVEN DATA FLOW PATHWAYS                              |
+=============================================================================+
|                                                                              |
|  Flow 1: Camera --> DVR --> Edge Gateway                                    |
|          [Analog/Digital] -> [H.264 Encode] -> [RTSP Server]                |
|                                                                              |
|  Flow 2: Edge Gateway --> VPN --> Cloud Kafka                               |
|          [FFmpeg ingest] -> [Frame extract] -> [Kafka Producer]             |
|                                                                              |
|  Flow 3: Stream Ingestion --> AI Inference                                  |
|          [Kafka Consumer] -> [GPU Batch] -> [Detection + Face Recog.]       |
|                                                                              |
|  Flow 4: AI Inference --> Events --> Database                               |
|          [Detection results] -> [Event enrich] -> [PostgreSQL]              |
|                                                                              |
|  Flow 5: Events --> Alerts --> Notifications                                |
|          [Scoring engine] -> [Alert create] -> [Multi-channel send]         |
|                                                                              |
|  Flow 6: Live Streams --> Browser Dashboard                                 |
|          [HLS segmenter] -> [Nginx relay] -> [HLS.js player]                |
|                                                                              |
|  Flow 7: Training Feedback Loop                                             |
|          [Operator review] -> [Conflict detect] -> [Model update]           |
|                                                                              |
+=============================================================================+

5.2 Flow 1: Camera to DVR to Edge Gateway

Path: Analog/Digital Camera -> DVR internal encoder -> DVR RTSP server -> Edge Gateway FFmpeg client

Protocol Stack:

Layer	Technology	Details
Camera Interface	Analog BNC / CVBS / AHD	CP PLUS DVR supports multiple analog standards
DVR Encoding	H.264 High Profile	Hardware encoder, real-time, low latency
DVR Storage	Internal HDD (currently FULL)	0 bytes free — no local recording possible
Network Transport	RTSP over TCP (interleaved)	Mandatory for reliable NAT/VPN traversal
URL Pattern	`rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M`	N=1-8, M=0(main)/1(sub)
Client	FFmpeg 6.0+	`-rtsp_transport tcp -timeout 5000000` (microseconds; replaces `-stimeout`, removed in FFmpeg 5.0)
Frame Rate	25 FPS (PAL) or 30 FPS (NTSC)	Configurable per channel
Resolution (main)	960 x 1080 (per channel)	Full resolution
Resolution (sub)	352 x 288 to 704 x 576	Lower bandwidth for AI
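
As an illustration of the URL pattern above, the helper below builds the RTSP URL for a given channel and subtype. It is a sketch: the function name and the percent-encoding of credentials are assumptions added here (encoding keeps characters such as `@` or `:` in a password from breaking the URL), and whether the DVR accepts percent-encoded credentials should be verified on the device.

```python
from urllib.parse import quote


def rtsp_url(channel: int, subtype: int, password: str,
             host: str = "192.168.29.200", user: str = "admin", port: int = 554) -> str:
    """Build the DVR stream URL for channel 1-8 and subtype 0 (main) or 1 (sub)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{credentials}@{host}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


if __name__ == "__main__":
    for ch in range(1, 9):
        print(rtsp_url(ch, 1, "example-password"))
```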

FFmpeg RTSP Connection Command:

ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp \
    -timeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c copy -f segment -segment_time 60 -reset_timestamps 1 \
    -strftime 1 "/data/buffer/ch1/%Y%m%d_%H%M%S.mkv"

Latency Budget:

Stage	Latency
Camera -> DVR (analog)	~1-5 ms
DVR encoding	~50-100 ms
RTSP over LAN	~1-2 ms
Total (camera to edge gateway)	~52-107 ms

5.3 Flow 2: Edge Gateway to VPN Tunnel to Cloud

Path: Edge Gateway FFmpeg -> Frame extraction -> JPEG encoding -> Kafka Producer -> WireGuard VPN -> Cloud MSK

Frame Processing Pipeline:

+------------+    +-------------+    +---------------+    +-------------+    +-----------+
| Raw RTSP   | -> | FFmpeg      | -> | Frame         | -> | JPEG        | -> | Kafka     |
| H.264      |    | Demux/Decode|    | Decimation    |    | Encoder     |    | Producer  |
| 25 FPS     |    |             |    | (1 fps)       |    | Quality 85  |    | (LZ4)     |
| 960x1080   |    |             |    | 640x640 pad   |    |             |    |           |
+------------+    +-------------+    +---------------+    +-------------+    +-----------+

FFmpeg Frame Extraction for AI:

ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -timeout 5000000 \
    -fflags +genpts+discardcorrupt -reorder_queue_size 64 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -vf "fps=1,scale=640:640:force_original_aspect_ratio=decrease,pad=640:640:(ow-iw)/2:(oh-ih)/2:black" \
    -q:v 5 -f image2pipe -vcodec mjpeg pipe:1
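
With `-f image2pipe -vcodec mjpeg pipe:1`, FFmpeg writes JPEG images back to back on stdout, so the consumer has to find the frame boundaries itself. The sketch below splits the byte stream on the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. It is a simple marker scan rather than a full JPEG parser, and it assumes the frames carry no embedded thumbnails. The function names are hypothetical.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(buffer: bytes):
    """Return (complete_frames, remainder) for a buffer of concatenated JPEGs."""
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            # Keep a trailing 0xFF in case a marker is split across two reads.
            return frames, (buffer[-1:] if buffer.endswith(b"\xff") else b"")
        end = buffer.find(EOI, start + 2)
        if end == -1:
            return frames, buffer[start:]  # incomplete frame, wait for more data
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]


def iter_frames(stream, chunk_size: int = 65536):
    """Yield complete JPEG frames from any object with a read(n) method."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        frames, pending = split_jpeg_frames(pending + chunk)
        yield from frames
```

In practice `stream` would be the `stdout` of a `subprocess.Popen` running the command above, and each yielded frame would become the payload of one Kafka message.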

WireGuard VPN Tunnel Configuration:

Parameter	Value
Protocol	UDP 51820
Encryption	ChaCha20-Poly1305
Key Exchange	Curve25519 (ECDH)
Preshared Key	Enabled per-peer
Keepalive	25 seconds
MTU	1400 (to account for WireGuard + IP headers)
Cloud Endpoint	10.200.0.1/32 (EC2 bastion, or behind an NLB; an ALB cannot forward UDP)
Edge Endpoint	10.200.0.2/32
Route	10.100.0.0/16 (AWS VPC) accessible from edge via the tunnel

VPN watchdog script runs every 30 seconds; restarts WireGuard on 3 consecutive ping failures.
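
The watchdog logic can be sketched as a small counter plus a polling loop. This is an illustration under stated assumptions: the interface name `wg0` and the `wg-quick@wg0` systemd unit are assumed, the ping target is the cloud tunnel address from the table above, and the `ping` flags are those of Linux iputils.

```python
import subprocess
import time


def watchdog_step(ping_ok: bool, failures: int, threshold: int = 3):
    """Return (new_failure_count, should_restart) after one ping result."""
    if ping_ok:
        return 0, False
    failures += 1
    if failures >= threshold:
        return 0, True  # reset the counter once a restart is requested
    return failures, False


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run(host: str = "10.200.0.1", interface: str = "wg0", interval_s: int = 30) -> None:
    """Poll the tunnel every 30 s and restart WireGuard after 3 consecutive failures."""
    failures = 0
    while True:
        failures, restart = watchdog_step(ping_once(host), failures)
        if restart:
            subprocess.run(["systemctl", "restart", f"wg-quick@{interface}"], check=False)
        time.sleep(interval_s)
```

Keeping the counting logic in `watchdog_step` separate from the I/O makes the restart rule testable without a live tunnel.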

Latency Budget:

Stage	Latency
Frame extraction (FFmpeg)	~50-100 ms
JPEG encoding	~5-10 ms
Kafka produce (local)	~1-2 ms
WireGuard tunnel	~5-15 ms (Mumbai -> India site)
MSK broker	~1-2 ms
Total (edge to cloud Kafka)	~62-129 ms

5.4 Flow 3: Stream Ingestion to AI Inference

Path: Kafka streams.raw topic -> Stream Ingestion consumer -> Triton Inference Server -> Kafka ai.detections topic

Pipeline Architecture:

+------------+    +-------------------+    +------------------+    +---------------+
| streams.raw| -> | Stream Ingestion  | -> | NVIDIA Triton    | -> | ai.detections |
| (8 parts)  |    | (Go consumer)     |    | (GPU inference)  |    | (16 parts)    |
| JPEG frames|    | Batch aggregator  |    | gRPC :8001       |    | Detection     |
| + metadata |    | (batch=8, timeout)|    | Dynamic batching |    | + embeddings  |
+------------+    +-------------------+    +------------------+    +---------------+
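
The batch aggregator in the diagram flushes when it holds 8 frames or when a timeout expires, whichever comes first. The production consumer is written in Go; the sketch below uses Python only to illustrate the rule. The 50 ms timeout is an assumed value (the diagram does not state one), and the class name is hypothetical.

```python
import time


class BatchAggregator:
    """Collect items and release them as a batch on size or age."""

    def __init__(self, max_batch: int = 8, max_wait_s: float = 0.05,
                 clock=time.monotonic):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s  # assumed timeout, not specified in the blueprint
        self._clock = clock
        self._items = []
        self._first_at = None

    def add(self, item):
        """Add an item; return a full batch if the size limit is reached, else None."""
        if not self._items:
            self._first_at = self._clock()
        self._items.append(item)
        if len(self._items) >= self.max_batch:
            return self._flush()
        return None

    def poll(self):
        """Return a partial batch if the oldest item has waited past the timeout."""
        if self._items and self._clock() - self._first_at >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._items = self._items, []
        self._first_at = None
        return batch
```

The consumer loop would call add() for each Kafka message and poll() on every idle tick, sending whatever batch comes back to Triton. The timeout bounds the extra latency a quiet camera set can add.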

Triton Model Configuration:

Model	Inputs	Outputs	GPU Memory	Latency (P50)
YOLO11m-det (TensorRT FP16)	3x640x640 float16	Bboxes, scores, labels	~2.1 GB	12 ms
SCRFD-500M (TensorRT FP16)	3x640x640 float16	Bboxes, landmarks, scores	~1.8 GB	8 ms
ArcFace R100 (TensorRT FP16)	3x112x112 float16	512-D embedding	~3.2 GB	5 ms

Total GPU memory: ~7.1 GB (fits in T4 16 GB with 8 streams)

Latency Budget:

Stage	Latency
Kafka consume (batch)	~10-50 ms
Preprocessing (resize, normalize)	~5-15 ms
YOLO11m inference (GPU)	~12 ms (P50)
SCRFD face detection (GPU)	~8 ms (P50)
ArcFace embedding (GPU, per face)	~5 ms (P50)
Post-processing (NMS, matching)	~10-30 ms
Kafka produce (results)	~1-2 ms
Total (Kafka to detection output)	~51-122 ms (one face per frame)

5.5 Flow 4: AI Inference to Events to Database

Path: AI Detection results -> Event enricher -> PostgreSQL (multiple tables)

Data Transformation:

+------------+    +-------------------+    +---------------------+    +------------+
| Detection  | -> | Event Enricher    | -> | PostgreSQL Writer   | -> | events     |
| results    |    | - Add camera_id   |    | - UPSERT person     |    | persons    |
| (raw)      |    | - Match person    |    | - INSERT event      |    | embeddings |
|            |    | - Check whitelist |    | - INSERT embedding  |    | face_crops |
+------------+    +-------------------+    +---------------------+    +------------+

Database Write Operations per Detection:

Operation	Table	Type	Notes
Insert event record	`events`	INSERT	With bounding box, confidence, timestamp
Upsert person	`persons`	INSERT/UPDATE	If new face, create person record
Insert face crop	`face_crops`	INSERT	S3 URL, bounding box, quality score
Upsert embedding	`face_embeddings`	INSERT/UPDATE	512-D vector, pgvector HNSW index
Increment counters	`camera_stats`	UPDATE	Daily aggregation

5.6 Flow 5: Events to Alerts to Notifications

Path: AI events -> Suspicious Activity scoring engine -> Alert creation -> Notification dispatch

Scoring and Escalation:

+------------+    +--------------------+    +------------------+    +--------------+
| AI events  | -> | Suspicious Activity| -> | Alert Manager    | -> | Notification |
| (persons,  |    | Scoring Engine     |    | - Deduplicate    |    | Service      |
|  faces)    |    | - 10 modules       |    | - Rate limit     |    | - Telegram   |
|            |    | - Composite score  |    | - Suppress dup   |    | - WhatsApp   |
|            |    | - Time decay       |    | - Escalation     |    | - Email      |
+------------+    +--------------------+    +------------------+    +--------------+

Alert Escalation Matrix:

Score	Level	Color	Notification	Action
0.00 - 0.20	NONE	Gray	None	Log only
0.20 - 0.40	LOW	Blue	Dashboard only	Log + indicator
0.40 - 0.60	MEDIUM	Yellow	Dashboard + App push	Alert dispatched
0.60 - 0.80	HIGH	Orange	All of above + Telegram	Immediate alert
0.80 - 1.00	CRITICAL	Red	All of above + WhatsApp + Email	Critical alert
> 1.00	EMERGENCY	Purple + flashing	All channels + SMS	Emergency dispatch
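
The matrix can be expressed as a lookup on the composite score. Adjacent rows share their boundary values (0.20 appears in both NONE and LOW), so the sketch below assumes each band includes its lower bound, that 1.00 itself is CRITICAL, and that only scores above 1.00 are EMERGENCY. The function name is illustrative.

```python
import bisect

THRESHOLDS = [0.20, 0.40, 0.60, 0.80]  # lower bounds of LOW, MEDIUM, HIGH, CRITICAL
LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def alert_level(score: float) -> str:
    """Map a composite suspicion score to its escalation level."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score > 1.0:
        return "EMERGENCY"
    # bisect_right counts how many lower bounds the score has reached.
    return LEVELS[bisect.bisect_right(THRESHOLDS, score)]


if __name__ == "__main__":
    for s in (0.05, 0.20, 0.59, 0.80, 1.00, 1.25):
        print(f"{s:.2f} -> {alert_level(s)}")
```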

5.7 Flow 6: Live Streams to Browser Dashboard

Path: DVR RTSP -> Edge Gateway FFmpeg -> HLS segmenter -> Nginx -> CDN -> Browser HLS.js

+--------+    +---------------+    +---------------+    +---------+    +----------+
| DVR    | -> | Edge Gateway  | -> | HLS Segmenter | -> | Nginx   | -> | Browser  |
| RTSP   |    | FFmpeg        |    | (2s segments) |    | (relay) |    | HLS.js   |
| 25 FPS |    | -copyts       |    | H.264 + AAC   |    | HTTPS   |    | Video tag|
+--------+    +---------------+    +---------------+    +---------+    +----------+

HLS Configuration:

Parameter	Value
Segment duration	2 seconds
Segment list size	5 segments (10-second sliding window)
Playlist type	Live (no #EXT-X-ENDLIST)
Codec	H.264 High Profile + AAC-LC
Adaptive bitrate	3 variants: high (3 Mbps), mid (1 Mbps), low (500 Kbps)
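
To make the sliding-window parameters concrete, the sketch below renders the media playlist a player would fetch for one variant: five 2-second segments, an advancing media sequence number, and no #EXT-X-ENDLIST tag. The segment file naming (`seg_00042.ts`) and the function name are assumptions for illustration only.

```python
import math


def live_playlist(latest_seq: int, segment_s: float = 2.0, window: int = 5,
                  prefix: str = "seg_") -> str:
    """Render a live HLS media playlist covering the newest `window` segments."""
    first = max(0, latest_seq - window + 1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_s)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",
    ]
    for seq in range(first, latest_seq + 1):
        lines.append(f"#EXTINF:{segment_s:.3f},")
        lines.append(f"{prefix}{seq:05d}.ts")
    # No #EXT-X-ENDLIST: its absence tells the player the stream is live.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(live_playlist(latest_seq=42))
```

Each time the segmenter finishes a segment, the playlist is regenerated with the window shifted by one, so the oldest segment drops off and the media sequence number increases.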

Latency:

Stage	Latency
DVR encoding	~50-100 ms
RTSP to edge	~1-2 ms
FFmpeg demux/remux	~20-50 ms
HLS segmenting (2s)	~2000 ms
Nginx relay	~1-5 ms
CDN propagation	~10-50 ms
HLS.js buffer	~1-2 segments (2-4s)
Browser decode	~20-50 ms
Total (camera to eye)	~4.1 - 6.3 seconds

5.8 Flow 7: Training Feedback Loop

Path: Operator review actions -> Conflict detection -> Training dataset -> Model training -> Quality gates -> Deployment

+------------+    +------------------+    +----------------+    +-------------+    +------------+
| Operator   | -> | Conflict         | -> | Training       | -> | Quality     | -> | Deployment |
| Review     |    | Detection        |    | Dataset        |    | Gates       |    | (A/B test) |
| (confirm,  |    | (5 types)        |    | - Curate       |    | - Precision |    |            |
|  correct,  |    | - Block conflicts|    | - Label        |    |   >= 0.97   |    |            |
|  merge,    |    | - Queue safe     |    | - Augment      |    | - Recall    |    |            |
|  reject)   |    |   additions      |    | - Version      |    |   >= 0.95   |    |            |
+------------+    +------------------+    +----------------+    +-------------+    +------------+
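
The quality gate in the diagram reduces to two threshold checks on a held-out evaluation set. The sketch below computes precision and recall from raw counts and applies the 0.97 and 0.95 limits shown above. The function name and the count-based interface are assumptions; the actual pipeline may evaluate more metrics.

```python
def passes_quality_gates(tp: int, fp: int, fn: int,
                         min_precision: float = 0.97, min_recall: float = 0.95):
    """Return (passed, precision, recall) for true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    passed = precision >= min_precision and recall >= min_recall
    return passed, precision, recall


if __name__ == "__main__":
    ok, p, r = passes_quality_gates(tp=980, fp=20, fn=40)
    print(f"precision={p:.3f} recall={r:.3f} -> {'PASS' if ok else 'FAIL'}")
```

A candidate model that fails either check is not promoted to A/B deployment.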

Training Data Flow:

Stage	Frequency	Trigger
Review action collection	Continuous	Operator clicks on dashboard
Conflict detection	Immediate (synchronous)	Every review action
Training dataset build	Weekly (or on-demand)	Queue threshold or manual
Model training	On dataset build	Airflow DAG trigger
Quality gate evaluation	After training	Automated pipeline
A/B deployment	After quality pass	Admin approval
Full production	After A/B success	Auto-promote at 48h

Section 6: Recommended Tech Stack

6.1 Technology Selection Matrix

Layer	Technology	Version	Purpose	Rationale
Cloud Platform	AWS	2025	Infrastructure (ap-south-1 Mumbai)	Best India region latency, mature managed services
Container Orchestration	Amazon EKS	v1.28+	Managed Kubernetes control plane	GPU node support, Cluster Autoscaler
Edge K8s	K3s	v1.28+	Lightweight Kubernetes at edge	Single binary, resource-efficient
VPN	WireGuard	v1.0+	Encrypted tunnel between edge and cloud	~60% faster than OpenVPN, modern crypto
Reverse Proxy	Traefik	v2.10+	Kubernetes Ingress controller	Native K8s integration, automatic TLS
AI Inference	NVIDIA Triton	2.40	GPU model serving, dynamic batching	Multi-framework, TensorRT optimization
CV Framework	OpenCV	4.8+	Image processing, pre/post-processing	Industry standard, Python/Go bindings
AI/ML Framework	PyTorch	2.1+	Model training, custom inference	Ecosystem, CUDA 12 support
Deep Learning	TensorRT	8.6+	GPU-optimized inference for YOLO, SCRFD, ArcFace	FP16 support, 3-5x speedup
Language: AI	Python	3.11	AI inference, training, suspicious activity detection	Ecosystem, scientific computing
Language: Services	Go	1.21	Stream ingestion, backend API, notifications	Performance, concurrency, small binaries
Language: Frontend	TypeScript	5.2	Web dashboard	Type safety, React ecosystem
Web Framework	Next.js	14 (App Router)	React SSR dashboard	Server components, streaming
UI Library	React	18	Component-based UI	Concurrent features, Suspense
Styling	Tailwind CSS	3.4	Utility-first CSS	Rapid development, consistent design
Video Player	HLS.js	1.4	Browser HLS playback	MSE-based, adaptive bitrate
Database	PostgreSQL	16	Primary database, vector storage	ACID, pgvector extension
Vector Search	pgvector	0.5+	HNSW index for 512-D face embeddings	Native PostgreSQL, ivfflat+hnsw
Cache/Session	Redis	7	Session store, pub/sub, rate limiting	Data structures, cluster mode
Message Queue	Apache Kafka	3.6+ (MSK)	Durable event log, stream replay	Exactly-once, retention, partitions
Object Storage	MinIO	latest (RELEASE.2024)	S3-compatible hot storage	Edge + cloud, erasure coding
Cold Archive	Amazon S3	Standard/IA/Glacier	Tiered archival (30d/90d/365d)	Cost optimization
Model Registry	MLflow	2.8+	Model versioning, experiment tracking	Open source, S3 artifact store
Orchestration	Apache Airflow	2.7+	Training pipeline DAGs	Backfill, retries, observability
Monitoring	Prometheus	2.47+	Metrics collection	Pull-based, K8s service discovery
Visualization	Grafana	10.1+	Dashboards, alerting	Panels, annotations, shared links
Log Aggregation	Grafana Loki	2.9+	Centralized logging	Label-based, cost-effective
CI/CD	GitHub Actions	v4	Build, test, lint pipelines	Native GitHub integration
GitOps	ArgoCD	2.9+	Kubernetes continuous delivery	Declarative, drift detection
Infrastructure	Terraform	1.6+	IaC for AWS resources	State management, modules
Secrets	AWS Secrets Manager	-	Encrypted credential storage	Rotation, IAM integration

6.2 Hardware Requirements

Edge Gateway (Per Site)

Component	Minimum	Recommended	High Availability
CPU	Intel i5-1340P (12 cores)	Intel i7-1370P (14 cores)	2x Intel i7 (HA cluster)
RAM	16 GB DDR4-3200	32 GB DDR4-3200	32 GB per node
Storage	1 TB NVMe SSD	2 TB NVMe SSD	2 TB per node + NAS sync
Network	1 Gbps Ethernet	2.5 Gbps Ethernet	Dual NIC + bonding
GPU (optional)	None	NVIDIA Jetson Orin NX 16GB	On-edge AI pre-filtering
Power	UPS 600VA	UPS 1000VA	Dual PSU + generator

Cloud GPU Nodes (AI Inference)

Cameras	GPU	VRAM	Streams	Cost/month (spot)
1-8	g4dn.xlarge (T4)	16 GB	8	~$200-350
9-16	g4dn.xlarge (T4)	16 GB	16	~$350-500
17-32	g4dn.2xlarge (T4)	16 GB	32	~$600-900
33-64	g5.2xlarge (A10G)	24 GB	64	~$1200-1800
65+	p4d.24xlarge (8x A100)	40 GB per GPU	128	~$5000-8000

6.3 Software Versions Summary

Category	Software	Version
Operating System	Ubuntu Server LTS	22.04.4
Container Runtime	Docker CE	25.x
Container Orchestration	Kubernetes (EKS/K3s)	1.28+
AI Serving	NVIDIA Triton Inference Server	2.40
GPU Runtime	CUDA	12.1+
GPU Driver	NVIDIA Driver	535+
Deep Learning Optimization	TensorRT	8.6+
AI Framework	PyTorch	2.1+
Computer Vision	OpenCV	4.8+
Video Processing	FFmpeg	6.0+
Service Language	Go	1.21+
AI/Training Language	Python	3.11+
Frontend Framework	Next.js	14
UI Library	React	18
Database	PostgreSQL	16
Message Queue	Apache Kafka	3.6+
Cache	Redis	7
Object Storage	MinIO	2024+
CI/CD	GitHub Actions	v4
GitOps	ArgoCD	2.9+
Monitoring	Prometheus + Grafana	2.47+ / 10.1+
Logging	Grafana Loki	2.9+
VPN	WireGuard	1.0+
Model Registry	MLflow	2.8+
Orchestration	Apache Airflow	2.7+
Infrastructure	Terraform	1.6+

6.4 Port Reference

Service	Port	Protocol	Location	Notes
DVR RTSP	554	TCP	192.168.29.200	Local network only
DVR HTTP	80	TCP	192.168.29.200	Admin UI, local only
DVR HTTPS	443	TCP	192.168.29.200	Admin UI, local only
DVR TCP	25001	TCP	192.168.29.200	Proprietary protocol
DVR UDP	25002	UDP	192.168.29.200	Proprietary protocol
DVR NTP	123	UDP	192.168.29.200	Time sync
WireGuard	51820	UDP	Cloud + Edge	VPN tunnel
Edge Admin	8080	TCP	192.168.29.5	Local admin UI
Edge SSH	22	TCP	192.168.29.5	Admin access only
Traefik HTTP	8000	TCP	EKS	Internal HTTP entrypoint
Traefik HTTPS	8443	TCP	EKS	Internal HTTPS entrypoint
ALB HTTPS	443	TCP	AWS	Public-facing
Backend API	8080	TCP	EKS pods	Internal service port
Triton HTTP	8000	TCP	EKS GPU nodes	Model inference HTTP
Triton gRPC	8001	TCP	EKS GPU nodes	Model inference gRPC
Triton Metrics	8002	TCP	EKS GPU nodes	Prometheus metrics
PostgreSQL	5432	TCP	RDS	VPC-private
Redis	6379	TCP	ElastiCache	VPC-private
Kafka	9092	TCP	MSK	VPC-private
MinIO API	9000	TCP	EKS + Edge	S3-compatible API
MinIO Console	9001	TCP	EKS + Edge	Admin console
Prometheus	9090	TCP	EKS	Metrics collection
Grafana	3000	TCP	EKS	Dashboards

Section 7: Database Schema

7.1 Schema Overview

The database is built on a relational core (PostgreSQL 16) with the pgvector extension for storing and searching 512-dimensional face embeddings. The schema consists of 29 tables, 4 views, and 8 trigger functions, organized into 10 logical domains.

Schema Philosophy:

Strict normalization for reference data (cameras, persons, rules) to ensure data integrity
JSONB flexibility for event metadata and configuration to accommodate evolving AI outputs
Partitioning on all high-volume time-series tables for query performance and lifecycle management
pgvector HNSW indexing for sub-10ms face similarity search at scale
Row-level security (RLS) for multi-tenant site isolation
AES-256 encryption for all stored credentials (DVR passwords, API tokens)

7.2 Entity Relationship Overview

+=============================================================================+
|                    ENTITY RELATIONSHIP DIAGRAM                               |
+=============================================================================+
|                                                                              |
|   SITE (1) --------------------< (N) DVR                                     |
|    |                              |                                          |
|    |                              | (1)                                      |
|    |                              v                                          |
|    |                           CAMERA (N) <------------------< (N) ALERT_RULE|
|    |                              |                              |           |
|    |                              | (N)                            | (1)      |
|    |                              v                              v           |
|    |   +---------------------------------------------------------+           |
|    |   | EVENT (N) -->--(1) PERSON (1)--< (N) FACE_EMBEDDING               |
|    |   |   |                                                      |         |
|    |   |   | (N)                                                  | (N)     |
|    |   |   v                                                      v         |
|    |   | FACE_CROP (N)                                    PERSON_CLUSTER     |
|    |   |   |                                                                  |
|    |   |   | (N)                                                  +---------+|
|    |   |   v                                                      | Training||
|    |   | MEDIA_FILE (1) ----------------------------------------->| Dataset  ||
|    |   |                                                          |---------||
|    |   +--------------------------------------------------------->| Job      ||
|    |                                                              | Model    ||
|    |                              +---------+                     | Version  ||
|    |                              | Review  |                     +---------+|
|    |                              | Action  |                                |
|    |                              +---------+                                |
|    |                                    ^                                    |
|    |                                    | (N)                                |
|    +------------------------------------+                                    |
|   USER (N) -->--(N) ROLE_PERMISSION                                          |
|    |                                                                         |
|    | (1)                                                                     |
|    v                                                                         |
|   WATCHLIST (N) -->--(N) WATCHLIST_ENTRY                                     |
|                                                                              |
|   +---------+    +---------+    +---------+    +---------+                  |
|   | Telegram|    |WhatsApp |    | Email   |    |Webhook  |                  |
|   | Config  |    | Config  |    | Config  |    | Config  |                  |
|   +---------+    +---------+    +---------+    +---------+                  |
|        ^              ^             ^              ^                         |
|        |              |             |              |                         |
|        +--------------+-------------+--------------+                         |
|                         |                                                    |
|                   NOTIFICATION_CHANNEL                                         |
|                         |                                                    |
|                         | (1)                                                |
|                         v                                                    |
|                   NOTIFICATION_LOG                                             |
|                                                                              |
|   +---------+    +---------+    +---------+                                  |
|   | Audit   |    | System  |    | Device  |                                  |
|   | Log     |    | Health  |    | Connect.|                                  |
|   |(partitioned) |  Log    |    |  Log    |                                  |
|   +---------+    +---------+    +---------+                                  |
|                                                                              |
+=============================================================================+

7.3 Core Tables Summary

7.3.1 Site and Infrastructure Tables

Table	Purpose	Key Fields	Rows (est.)
`sites`	Physical locations (factories, warehouses)	id, name, location, timezone, settings	1-10
`dvrs`	DVR/NVR devices per site	id, site_id, ip_address, port, username, password_encrypted, model, channels, status	1-10
`cameras`	Individual camera channels	id, dvr_id, channel_number, name, rtsp_url, resolution, fps, status, zone_config, zone_description	8-64

7.3.2 AI Detection and Identity Tables

Table	Purpose	Key Fields	Rows (est.)
`events`	All AI detection events (partitioned monthly)	id, camera_id, event_type, timestamp, confidence, bounding_box, person_id, face_crop_id, track_id	1M-10M/month
`persons`	Known and unknown individuals	id, name, status (known/unknown/blacklisted), role, company, notes, created_at	100-10,000
`face_crops`	Cropped face images metadata	id, event_id, person_id, storage_path, bounding_box, quality_score, blur_score, pose_yaw, pose_pitch	500K-5M/month
`face_embeddings`	512-D face embeddings (pgvector)	id, person_id, face_crop_id, embedding (vector(512)), model_version, is_primary	500K-5M
`person_clusters`	Unknown person cluster groups	id, cluster_label, representative_embedding_id, sample_count, first_seen, last_seen, status	10-1,000

7.3.3 Alert and Notification Tables

Table	Purpose	Key Fields	Rows (est.)
`alert_rules`	Per-camera alert configuration	id, camera_id, rule_type, name, config_json, schedule, enabled	50-500
`alerts`	Generated alert records	id, camera_id, rule_id, person_id, alert_type, severity, status, message	1K-50K/month
`notification_channels`	Alert destination endpoints	id, name, channel_type, config_json, is_active	5-20
`telegram_configs`	Telegram Bot API credentials	id, channel_id, bot_token_encrypted, chat_id	1-5
`whatsapp_configs`	WhatsApp Business API credentials	id, channel_id, api_key_encrypted, phone_number_id	1-5
`notification_log`	Delivery status per notification	id, alert_id, channel_id, status, sent_at, error_message	1K-50K/month

7.3.4 Watchlist and Access Control Tables

Table	Purpose	Key Fields	Rows (est.)
`users`	Dashboard users and operators	id, username, email, password_hash, role, is_active	5-50
`roles`	Permission roles	id, name, permissions_json	3-10
`watchlists`	Named monitoring lists	id, name, watch_type (vip/blacklist/custom), is_active	5-20
`watchlist_entries`	Persons on watchlists	id, watchlist_id, person_id, added_by, added_at	10-1,000

7.3.5 Training and ML Pipeline Tables

Table	Purpose	Key Fields	Rows (est.)
`training_datasets`	Curated face datasets for training	id, name, description, person_ids_json, sample_count, version, status	10-100
`training_jobs`	Model training job tracking	id, dataset_id, model_version_from, model_version_to, status, metrics_json	10-100
`model_versions`	Registry of trained model versions	id, version_string, training_job_id, metrics_json, is_production, is_rollback_available	10-50
`review_actions`	Operator review decisions	id, event_id, reviewer_id, action, from_person_id, to_person_id, notes	1K-100K

7.3.6 Media and Storage Tables

Table	Purpose	Key Fields	Rows (est.)
`media_files`	Registry of stored video/images	id, file_type, storage_path, size_bytes, checksum, camera_id, event_id, retention_until	100K-1M
`video_clips`	Video clip metadata for incidents	id, media_file_id, start_time, end_time, camera_id, event_id, duration_seconds	10K-100K

7.3.7 Audit and Monitoring Tables (Partitioned)

Table	Purpose	Partition	Retention
`audit_logs`	All user and system actions	Monthly by timestamp	1 year (Glacier)
`system_health_logs`	Component health metrics	Monthly by timestamp	90 days
`device_connectivity_logs`	Camera/DVR connectivity events	Monthly by timestamp	90 days

7.4 Indexing Strategy

7.4.1 pgvector HNSW Index (Critical Path)

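-- HNSW index for low-latency (target: sub-10 ms) face similarity search
-- m and ef_construction are build-time parameters; ef_search is set per session at query time
CREATE INDEX idx_face_embeddings_hnsw
ON face_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Query: find the top-K most similar faces
-- <=> returns cosine distance, so similarity = 1 - distance
-- $1 is the 512-D query embedding, bound by the application
SET hnsw.ef_search = 128;  -- recall/speed tradeoff (higher = more accurate, slower)
SELECT person_id, 1 - (embedding <=> $1::vector) AS similarity
FROM face_embeddings
WHERE is_primary = true
ORDER BY embedding <=> $1::vector
LIMIT 5;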

Parameter	Value	Rationale
`m`	16	Number of bi-directional links per node (higher = better recall, more memory)
`ef_construction`	128	Build-time exploration factor (higher = better index quality)
`ef_search` (runtime SET)	64-256	Search-time exploration factor (SET hnsw.ef_search = 128)
Distance metric	Cosine distance (`<=>`); similarity = 1 - distance	Optimal for normalized face embeddings

7.4.2 B-Tree Indexes (Standard Queries)

Table	Index	Purpose
`events`	`(camera_id, timestamp DESC)`	Time-range queries per camera
`events`	`(event_type, timestamp DESC)`	Filter by event type
`events`	`(person_id)` WHERE person_id IS NOT NULL	Person event lookup
`face_crops`	`(person_id, quality_score DESC)`	Best quality face per person
`alerts`	`(status, created_at DESC)`	Pending alerts by age
`alerts`	`(severity, status)`	Critical alert dashboard
`persons`	`(status, name)`	Person directory with status filter
`persons`	`(created_at DESC)`	Recently added persons
`media_files`	`(retention_until)`	Expiring media cleanup (range scan for rows expiring within 7 days; a partial-index predicate cannot reference NOW() because it must be immutable)

7.5 Partitioning Strategy

All high-volume time-series tables are partitioned monthly using pg_partman for automated partition management.

+-----------------------------------------------------------------------------+
|                    PARTITIONING ARCHITECTURE                                 |
+-----------------------------------------------------------------------------+
|                                                                              |
|   events (parent, empty)                                                     |
|   +-- events_y2024m01   (Jan 2024 data)                                     |
|   +-- events_y2024m02   (Feb 2024 data)                                     |
|   +-- events_y2024m03   (Mar 2024 data)                                     |
|   +-- events_y2024m04   (Apr 2024 data)                                     |
|   +-- events_y2024m05   (May 2024 data)  <-- Hot (in memory)               |
|   +-- events_default    (fallback)                                          |
|                                                                              |
|   Partition pruning: WHERE timestamp >= '2024-05-01'                        |
|                      -> Only scans events_y2024m05                           |
|                      -> ~30x faster for time-range queries                  |
|                                                                              |
|   Managed by: pg_partman extension                                          |
|   - Auto-create: 2 months ahead                                             |
|   - Auto-drop: After retention period (detach + archive)                    |
|                                                                              |
+-----------------------------------------------------------------------------+
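The naming and pruning scheme in the diagram can be illustrated with a short Python sketch. The helper names `partition_name` and `partition_bounds` are hypothetical; pg_partman creates the real partitions, and this only mirrors the `events_yYYYYmMM` convention shown above.

```python
from datetime import datetime, timezone


def partition_name(table: str, ts: datetime) -> str:
    """Return the monthly partition name, e.g. events_y2024m05."""
    return f"{table}_y{ts.year:04d}m{ts.month:02d}"


def partition_bounds(ts: datetime):
    """Return the [start, end) range covered by the partition that holds ts."""
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


ts = datetime(2024, 5, 17, 22, 30, tzinfo=timezone.utc)
start, end = partition_bounds(ts)
print(partition_name("events", ts), start.isoformat(), end.isoformat())
```

A query that bounds `timestamp` on both sides with this `[start, end)` range gives the planner the best chance of pruning to a single partition; an open-ended lower bound alone also matches later partitions and the default partition.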

Partitioned Tables:

Table	Partition Key	Partition Type	Retention
`events`	`timestamp`	Monthly RANGE	90 days hot, 1 year archive
`audit_logs`	`timestamp`	Monthly RANGE	1 year total
`system_health_logs`	`timestamp`	Monthly RANGE	90 days
`device_connectivity_logs`	`timestamp`	Monthly RANGE	90 days
`face_crops`	`created_at`	Monthly RANGE	90 days hot, 1 year archive

7.6 Retention Policies

Data Tier	Storage	Duration	Lifecycle
Hot Tier	PostgreSQL + MinIO	0-30 days	Fast query, indexed, in-memory cache
Warm Tier	S3 Standard	30-90 days	Available on-demand, still indexed
Cold Tier	S3 Infrequent Access	90-365 days	Retrieval within minutes
Archive Tier	Glacier Deep Archive	1-7 years	Retrieval within 12-48 hours
Compliance	Glacier Vault Lock	7+ years	Immutable, legal hold
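As a rough illustration of the tiering table, the sketch below maps a record's age to its tier. The function name `storage_tier`, the exclusive upper bounds, and the 365-day year are assumptions made for the example; the actual lifecycle transitions are driven by the storage policies, not by application code like this.

```python
# Tier boundaries taken from the retention table; ages are in days.
# Each entry is (exclusive upper bound in days, tier name).
TIERS = [
    (30, "hot"),           # PostgreSQL + MinIO
    (90, "warm"),          # S3 Standard
    (365, "cold"),         # S3 Infrequent Access
    (7 * 365, "archive"),  # Glacier Deep Archive
]


def storage_tier(age_days: float) -> str:
    """Map a record's age to its storage tier (upper bounds are exclusive)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for upper_bound, tier in TIERS:
        if age_days < upper_bound:
            return tier
    return "compliance"  # Glacier Vault Lock, 7+ years


for age in (0, 45, 200, 1000, 3000):
    print(age, storage_tier(age))
```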

Automated Cleanup:

Task	Frequency	Mechanism
Expire old event partitions	Daily (pg_partman)	DETACH PARTITION + S3 upload
Delete expired media files	Daily	Cron job: DELETE from media_files + MinIO removal
Purge old notification logs	Weekly	DELETE WHERE created_at < NOW() - INTERVAL '90 days'
Archive face crops to S3	Daily	Lambda: copy to S3 IA, update storage_path
Compress audit logs	Monthly	pglz/zstd compression on detached partitions
Vacuum and analyze	Weekly (auto-vacuum)	PostgreSQL autovacuum daemon

7.7 Security Considerations

7.7.1 Credential Encryption

All sensitive credentials stored with AES-256 encryption:

Table	Encrypted Field	Encryption
`dvrs`	`password_encrypted`	AES-256-CBC, key from AWS Secrets Manager
`telegram_configs`	`bot_token_encrypted`	AES-256-CBC
`whatsapp_configs`	`api_key_encrypted`	AES-256-CBC

7.7.2 Row-Level Security (RLS)

For multi-site deployments, RLS policies enforce that users only see data for sites they have access to:

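-- Enable RLS on critical tables
-- Note: once RLS is enabled, a table with no policy is default-deny for
-- non-owner roles, so persons and alerts each need their own policy as well.
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
ALTER TABLE alerts ENABLE ROW LEVEL SECURITY;

-- Policy: Users see only data from their assigned sites
-- site_users is the user-to-site mapping table; app.current_user_id must be
-- set by the application on each connection before queries run.
CREATE POLICY site_isolation_events ON events
    USING (camera_id IN (
        SELECT c.id FROM cameras c
        JOIN dvrs d ON c.dvr_id = d.id
        JOIN site_users su ON d.site_id = su.site_id
        WHERE su.user_id = current_setting('app.current_user_id')::UUID
    ));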

7.7.3 Access Control

Role	Permissions
`super_admin`	Full access to all sites, all operations
`site_admin`	Full access to assigned sites, user management
`operator`	View dashboards, acknowledge alerts, review persons
`viewer`	Read-only access to dashboards and events

7.7.4 Audit Trail

The audit_logs table (partitioned monthly) captures every significant action:

Action	Captured Data
`login`	User, IP, timestamp, MFA status, success/failure
`person_create`	Creator, name, initial status, source event
`person_update`	Updater, changed fields, old/new values
`alert_acknowledge`	Acknowledger, alert ID, timestamp
`alert_resolve`	Resolver, resolution notes
`training_approve`	Approver, model version, dataset version
`model_deploy`	Deployer, version, A/B split percentage
`config_change`	Changer, changed parameters, old/new values

7.7.5 Backup Strategy

Component	Method	Frequency	Retention
PostgreSQL	RDS automated backups	Daily	35 days
PostgreSQL	Manual snapshots	Before any schema change	90 days
MinIO/S3	Cross-region replication	Continuous	90 days in DR region
Face embeddings	pg_dump + vector export	Weekly	90 days
Model artifacts	MLflow artifact store	On training completion	Indefinite

Reference: For complete DDL including all CREATE TABLE statements, triggers, views, and functions, see database_schema.md — Sections 2 through 15 contain the full schema definition with comments and constraints.

Section 8: AI Model and Training Strategy

8.1 AI Model Selection

The core inference pipeline chains three deep learning models (human detection, face detection, and face recognition), all optimized with TensorRT for GPU inference and served from a single NVIDIA T4 GPU with dynamic batching. The table also lists the supporting tracking, clustering, and suspicious-activity components.

Component	Model	Framework	Input Size	FPS (T4)	Accuracy
Human Detection	YOLO11m (Ultralytics)	PyTorch -> ONNX -> TensorRT FP16	640 x 640	213	mAP@50: 80.5% (COCO)
Face Detection	SCRFD-500M-BNKPS (InsightFace)	PyTorch -> ONNX -> TensorRT FP16	640 x 640	~400	AP_medium: 87.2% (WIDERFace)
Face Recognition	ArcFace R100 (IR-SE100)	PyTorch -> ONNX -> TensorRT FP16	112 x 112	~800	99.83% (LFW), 98.35% (MegaFace)
Person Tracking	ByteTrack	Native Python + NumPy	N/A	N/A	80.3% MOTA (MOT17)
Unknown Clustering	HDBSCAN + DBSCAN fallback	scikit-learn	512-D vectors	N/A	89.5% purity, 0.855 BCubed F
Fall Detection	YOLOv8n-pose	TensorRT FP16	640 x 640	~300	Part of suspicious activity
Object Detection	YOLOv8s	TensorRT FP16	640 x 640	~450	Abandoned object detection

8.1.1 Human Detection: YOLO11m

Property	Value
Architecture	CSPDarknet backbone + PANet neck + Decoupled head
Parameters	19.6 M
FLOPs	68.2 B (at 640x640)
TensorRT Optimization	FP16, dynamic batch (1-16), layer fusion
GPU Memory	~2.1 GB at batch=8
Person class priority	Highest NMS score weighting for person class
Preprocessing	Letterbox resize to 640x640, normalize [0,1]

Export pipeline:

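# PyTorch -> ONNX -> TensorRT Engine
# dynamic=True exports a dynamic batch axis, which the 1-16 shape profile below requires.
# FP16 precision is applied by trtexec (--fp16), so the ONNX file stays FP32.
yolo export model=yolo11m.pt format=onnx imgsz=640 dynamic=True opset=17 simplify=True
trtexec --onnx=yolo11m.onnx --saveEngine=yolo11m.engine --fp16 \
  --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:16x3x640x640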

8.1.2 Face Detection: SCRFD-500M-BNKPS

Property	Value
Architecture	Single-stage detector with FPN, BN+KPS head
Parameters	~0.57 M (the "500M" in the model name denotes roughly 500 MFLOPs of compute, not parameters; this is the lightweight SCRFD variant)
Detects	Face bounding box + 5 facial landmarks
Minimum face size	20 x 20 pixels (configurable)
NMS threshold	0.45 (IoU)
Confidence threshold	0.5 (minimum detection score)
GPU Memory	~1.8 GB at batch=32

8.1.3 Face Recognition: ArcFace R100 (IR-SE100)

Property	Value
Backbone	IR-SE100 (Improved ResNet-100 with SE blocks)
Training data	MS1MV3 (5.8M images, 85K identities)
Loss function	ArcFace additive angular margin (m=0.5)
Embedding dimension	512 (float32, L2-normalized)
Distance metric	Cosine similarity (1 - cosine_distance)
Matching threshold (strict)	0.60
Matching threshold (balanced)	0.45
Matching threshold (relaxed)	0.30
GPU Memory	~3.2 GB at batch=64

Published benchmarks on standard datasets:

Dataset	Accuracy	Notes
LFW (Labeled Faces in the Wild)	99.83%	Unconstrained face verification
CFP-FP (Frontal-Profile)	99.17%	Cross-pose evaluation
AgeDB-30	98.28%	Age-invariant recognition
MegaFace (1M distractors)	98.35%	Large-scale recognition
IJB-C	96.18% (TAR@FAR=1e-4)	Template-based verification

8.2 Inference Pipeline Architecture

+=============================================================================+
|                    REAL-TIME INFERENCE PIPELINE                              |
+=============================================================================+
|                                                                              |
|  INPUT: RTSP Frame (640x640, 1 fps per stream)                              |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Frame Preprocessor| -> | YOLO11m Detector  | -> | Person Detection  |    |
|  | - Resize          |    | (TensorRT FP16)   |    | Results:          |    |
|  | - Normalize       |    | GPU: 12ms (P50)   |    | - bbox (x1,y1,x2, |    |
|  | - NCHW layout     |    | Batch: 1-16       |    |   y2)             |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     | - class (person)  |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Crop Extract | <- | SCRFD-500M        | <- | Face Detection    |    |
|  | (ROI from person  |    | (TensorRT FP16)   |    | Results:          |    |
|  |  bounding box)    |    | GPU: 8ms (P50)    |    | - face bbox       |    |
|  |                   |    | Batch: per-face   |    | - 5 landmarks     |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Alignment    | <- | ArcFace R100      | <- | Embedding Vector  |    |
|  | (5-point affine   |    | (TensorRT FP16)   |    | 512-D float32,    |    |
|  |  transform to     |    | GPU: 5ms (P50)    |    | L2-normalized     |    |
|  |  112x112)         |    | Batch: 1-64       |    |                   |    |
|  +-------------------+    +-------------------+    +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Matching     | <- | Person Tracking   | <- | Track-to-Person   |    |
|  | (cosine similarity|    | (ByteTrack)       |    | Association       |    |
|  |  vs. known DB)    |    | CPU: 2ms/frame    |    | - Match embedding |    |
|  +-------------------+    +-------------------+    |   to known persons|    |
|       |  |  |                                      | - Create/update   |    |
|       |  |  |                                      |   track           |    |
|       v  v  v                                      +-------------------+    |
|  +-------------------+                                                        |
|  | Confidence Scorer |                                                        |
|  | (aggregate score  |                                                        |
|  |  for all detect)  |                                                        |
|  +-------------------+                                                        |
|       |                                                                       |
|       v                                                                       |
|  OUTPUT: DetectionEvent (JSON)                                               |
|  { person_id, track_id, confidence, bbox, face_crop,                         |
|    embedding, recognized_name?, quality_scores }                             |
|                                                                              |
+=============================================================================+

End-to-end latency budget per frame:

Stage	GPU	CPU Fallback
Frame preprocessing	2-5 ms	5-10 ms
YOLO11m detection	12 ms (P50)	35-56 ms (ONNX+OpenVINO)
SCRFD face detection	8 ms (P50)	15-25 ms
ArcFace embedding (per face)	5 ms (P50)	12-18 ms
ByteTrack tracking	2 ms	2-5 ms
Post-processing	5-10 ms	10-20 ms
Total (no face)	~29-37 ms	~67-116 ms
Total (1 face)	~34-42 ms	~79-134 ms
Total (5 faces)	~54-62 ms	~127-206 ms
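The totals follow from summing the fixed stages once and the ArcFace stage once per detected face. The sketch below reproduces that arithmetic from the per-stage figures in the table; the dictionary layout and the name `frame_latency` are illustrative, and the model assumes strictly sequential stages with no batching or overlap.

```python
# Per-stage latency in ms as (low, high), copied from the latency table.
GPU = {
    "preprocess": (2, 5),
    "yolo11m": (12, 12),
    "scrfd": (8, 8),
    "arcface_per_face": (5, 5),
    "bytetrack": (2, 2),
    "postprocess": (5, 10),
}
CPU = {
    "preprocess": (5, 10),
    "yolo11m": (35, 56),
    "scrfd": (15, 25),
    "arcface_per_face": (12, 18),
    "bytetrack": (2, 5),
    "postprocess": (10, 20),
}


def frame_latency(stages: dict, faces: int):
    """Sum fixed stages once and the ArcFace stage once per face."""
    low = high = 0
    for name, (lo, hi) in stages.items():
        count = faces if name == "arcface_per_face" else 1
        low += lo * count
        high += hi * count
    return low, high


for faces in (0, 1, 5):
    print(faces, "GPU", frame_latency(GPU, faces), "CPU", frame_latency(CPU, faces))
```

Because ArcFace accepts batches of up to 64 crops, the per-face term is a pessimistic bound on GPU; batched embedding of several faces in one frame would likely cost less than the linear sum.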

8.3 Face Recognition Matching Strategy

8.3.1 Known Person Matching

+-----------------------------------------------------------------------------+
|                    FACE RECOGNITION MATCHING FLOW                            |
+-----------------------------------------------------------------------------+
|                                                                              |
|  New Face Embedding (512-D)                                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | L2 Normalize      |  embedding = embedding / ||embedding||_2              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | pgvector HNSW     | -> | Top-5 Candidates  |                              |
|  | Similarity Search |    | (cosine distance) |                              |
|  | ef_search=128     |    +-------------------+                              |
|  +-------------------+            |                                          |
|                                   v                                          |
|  +-------------------+    +-------------------+                              |
|  | Threshold Check   | <- | Best Match Score  |                              |
|  | (per AI Vibe)     |    +-------------------+                              |
|  +-------------------+            |                                          |
|       |                          |                                          |
|       +------------+-------------+                                          |
|                    |                                                        |
|         +----------+----------+                                             |
|         |                     |                                             |
|         v                     v                                             |
|    Above threshold      Below threshold                                     |
|    (Recognized)         (Unknown)                                           |
|         |                     |                                             |
|         v                     v                                             |
|  +------------+       +------------------+                                 |
|  | Assign to  |       | Check against    |                                 |
|  | known      |       | recent unknown   |                                 |
|  | person_id  |       | embeddings       |                                 |
|  | (with      |       | (5-min window)   |                                 |
|  | confidence)|       +--------+---------+                                 |
|  +------------+                |                                            |
|                                |                                            |
|                       +--------+--------+                                   |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Similar unknown    No similar unknown                      |
|                  (same person)      (new unknown)                           |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Reuse person_id   Create new                              |
|                  Update centroid   unknown person                           |
|                                    record                                   |
|                                                                              |
+-----------------------------------------------------------------------------+
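The matching flow above can be sketched in plain Python. This is a minimal illustration, not the production code: a brute-force loop stands in for the pgvector HNSW search, 3-D toy vectors stand in for 512-D embeddings, the class name `Matcher` is hypothetical, and reusing the same threshold for the recent-unknown check is an assumption (the diagram does not state one). The centroid update step is omitted.

```python
import math
from dataclasses import dataclass, field


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Matcher:
    threshold: float = 0.45          # "balanced" preset
    unknown_window_s: float = 300.0  # 5-minute window for recent unknowns
    known: dict = field(default_factory=dict)     # person_id -> unit embedding
    unknowns: dict = field(default_factory=dict)  # person_id -> (unit embedding, last_seen)
    _next_unknown: int = 1

    def match(self, embedding, now: float):
        q = l2_normalize(embedding)
        # Brute-force stand-in for the HNSW top-K search.
        best_id, best_sim = None, -1.0
        for pid, emb in self.known.items():
            sim = cosine_similarity(q, emb)
            if sim > best_sim:
                best_id, best_sim = pid, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id, "recognized", best_sim
        # Below threshold: try to reuse a recently seen unknown person.
        for pid, (emb, seen) in self.unknowns.items():
            if now - seen <= self.unknown_window_s:
                sim = cosine_similarity(q, emb)
                if sim >= self.threshold:
                    self.unknowns[pid] = (emb, now)  # refresh last_seen only
                    return pid, "unknown_reused", sim
        pid = f"unknown-{self._next_unknown}"
        self._next_unknown += 1
        self.unknowns[pid] = (q, now)
        return pid, "unknown_new", best_sim


m = Matcher(known={"alice": l2_normalize([1.0, 0.0, 0.0])})
print(m.match([0.9, 0.1, 0.0], now=0.0))   # close to alice
print(m.match([0.0, 1.0, 0.0], now=10.0))  # no match, new unknown
print(m.match([0.0, 0.9, 0.1], now=20.0))  # matches the recent unknown
```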

8.3.2 AI Vibe Threshold Mapping

The AI Vibe system maps three intuitive presets to internal confidence thresholds:

Vibe	Face Match Threshold	Detection Confidence	Use Case
Relaxed	0.30 cosine similarity	0.40 minimum	Known persons re-identified more easily; more false positives acceptable
Balanced	0.45 cosine similarity	0.55 minimum	Default; good precision-recall tradeoff
Strict	0.60 cosine similarity	0.70 minimum	High-security scenarios; minimize false positives

Per-stream Vibe Selection:

Vibe can be set per camera via dashboard
Night mode automatically applies Strict vibe
Alert-triggered cameras automatically upgrade to Strict for 5 minutes
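One way to express the preset table and the automatic overrides is shown below. The names `Vibe`, `VIBES`, and `effective_vibe` are hypothetical, and treating the 5-minute alert upgrade as "fewer than 300 seconds since the alert" is an assumption about the boundary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vibe:
    face_match_threshold: float   # minimum cosine similarity for a match
    detection_confidence: float   # minimum detector confidence


# Values copied from the AI Vibe threshold table.
VIBES = {
    "relaxed": Vibe(0.30, 0.40),
    "balanced": Vibe(0.45, 0.55),
    "strict": Vibe(0.60, 0.70),
}

ALERT_UPGRADE_SECONDS = 300  # alert-triggered cameras stay Strict for 5 minutes


def effective_vibe(camera_vibe: str, night_mode: bool,
                   seconds_since_alert: Optional[float] = None) -> Vibe:
    """Resolve a camera's preset, applying the automatic Strict overrides."""
    alert_active = (seconds_since_alert is not None
                    and seconds_since_alert < ALERT_UPGRADE_SECONDS)
    if night_mode or alert_active:
        return VIBES["strict"]
    return VIBES[camera_vibe]


print(effective_vibe("relaxed", night_mode=False))
print(effective_vibe("relaxed", night_mode=True))
print(effective_vibe("balanced", night_mode=False, seconds_since_alert=120))
```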

8.4 Unknown Person Clustering Approach

Unknown persons (faces that don't match any known person above threshold) are automatically clustered to help operators identify recurring visitors.

8.4.1 Clustering Pipeline

+-----------------------------------------------------------------------------+
|                    UNKNOWN PERSON CLUSTERING PIPELINE                        |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Unknown Face Embeddings (streaming)                                         |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | Sliding Window    |  Keep last N embeddings in memory (configurable)     |
|  | Buffer (500)      |  + persistent storage for long-term clustering       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | HDBSCAN Clustering| -> | Primary clusters  |  min_cluster_size=5        |
|  | (density-based)   |    | formed             |  min_samples=2             |
|  | metric=cosine     |    +-------------------+  eps=auto                   |
|  +-------------------+            |                                          |
|       | (fallback)                |                                          |
|       v                           v                                          |
|  +-------------------+    +-------------------+                              |
|  | DBSCAN Fallback   |    | Merge with        |  Check: temporal gap       |
|  | (if HDBSCAN fails |    | existing clusters |  < 30 days, cosine sim     |
|  |  to find structure|    | - centroid        |  > 0.85                    |
|  +-------------------+    |   distance        |                            |
|                           +-------------------+                            |
|                                   |                                          |
|                                   v                                          |
|                           +-------------------+                              |
|                           | Operator Review   |  Dashboard shows clusters   |
|                           | Queue             |  pending identification     |
|                           +-------------------+                              |
|                                                                              |
+-----------------------------------------------------------------------------+

8.4.2 Clustering Parameters

Parameter	Value	Description
Algorithm	HDBSCAN (primary), DBSCAN (fallback)	Density-based for irregular cluster shapes
Distance metric	Cosine similarity	Optimal for face embeddings
Minimum cluster size	5 embeddings	Minimum to form a cluster
Minimum samples	2	Core point density threshold
Merge threshold	0.85 cosine similarity	Merge clusters if centroids are close
Temporal window	30 days	Maximum gap between cluster appearances
Review trigger	10+ embeddings	Send to operator review queue
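The merge and review rules in this table reduce to two small checks. The sketch below is illustrative only: the function names are hypothetical, centroids are assumed to be unit-length means of the member embeddings, and the temporal gap is assumed to be measured between one cluster's last sighting and the other's first.

```python
import math
from datetime import datetime, timedelta

MERGE_SIMILARITY = 0.85         # merge threshold from the table
MAX_GAP = timedelta(days=30)    # temporal window from the table
REVIEW_MIN_SAMPLES = 10         # review trigger from the table


def centroid(embeddings):
    """Mean of the embeddings, re-normalized to unit length."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("centroid has zero length")
    return [x / norm for x in mean]


def should_merge(centroid_a, last_seen_a: datetime,
                 centroid_b, first_seen_b: datetime) -> bool:
    """Merge when centroids are close and the sightings are within 30 days."""
    similarity = sum(x * y for x, y in zip(centroid_a, centroid_b))
    gap = abs(first_seen_b - last_seen_a)
    return similarity > MERGE_SIMILARITY and gap < MAX_GAP


def needs_review(sample_count: int) -> bool:
    """Clusters with 10 or more embeddings go to the operator review queue."""
    return sample_count >= REVIEW_MIN_SAMPLES


a = centroid([[1.0, 0.0], [0.9, 0.1]])
b = centroid([[1.0, 0.0]])
t0 = datetime(2025, 1, 1)
print(should_merge(a, t0, b, t0 + timedelta(days=10)))
print(should_merge(a, t0, b, t0 + timedelta(days=31)))
```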

8.4.3 Clustering Quality Targets

Metric	Target	Measurement
Cluster Purity	> 89%	% of embeddings in a cluster belonging to the same person
BCubed F-Measure	> 0.85	Harmonic mean of precision and recall for clustering
Silhouette Score	> 0.3	Separation quality between clusters
False Merge Rate	< 5%	Different persons incorrectly merged
Split Rate	< 15%	Same person split into multiple clusters
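Purity and BCubed F can be computed from a labelled evaluation set as sketched below. This uses the standard definitions of both metrics; the function names are illustrative, and the sketch assumes every embedding has a cluster id (noise points from HDBSCAN would need to be filtered out or treated as singleton clusters first).

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_ids) -> float:
    """Fraction of items that belong to the majority identity of their cluster."""
    members = defaultdict(list)
    for c, t in zip(cluster_ids, true_ids):
        members[c].append(t)
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(true_ids)


def bcubed_f(cluster_ids, true_ids) -> float:
    """Harmonic mean of BCubed precision and recall, averaged over items."""
    pairs = list(zip(cluster_ids, true_ids))
    pair_counts = Counter(pairs)
    cluster_sizes = Counter(cluster_ids)
    class_sizes = Counter(true_ids)
    n = len(pairs)
    precision = sum(pair_counts[(c, t)] / cluster_sizes[c] for c, t in pairs) / n
    recall = sum(pair_counts[(c, t)] / class_sizes[t] for c, t in pairs) / n
    return 2 * precision * recall / (precision + recall)


clusters = [0, 0, 0, 1, 1, 1]
persons = ["a", "a", "b", "b", "b", "b"]
print(round(cluster_purity(clusters, persons), 4))
print(round(bcubed_f(clusters, persons), 4))
```

In this toy case one embedding of person "b" landed in the cluster dominated by "a", which lowers both metrics below the targets in the table.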

8.5 Confidence Handling

8.5.1 Confidence Score Computation

Each detection event carries an aggregate confidence score computed from multiple signals:

confidence_aggregate =
      0.35 * yolo_confidence                  # detection confidence
    + 0.25 * scrfd_confidence                 # face detection quality
    + 0.25 * (1 - cosine_distance_to_match)   # face recognition score
    + 0.15 * quality_composite                # face quality score
# The weights sum to 1.0, so the result stays in [0, 1] when every input does.

Where quality_composite = average(
    1.0 - blur_score,       # Sharpness (higher is better)
    1.0 - abs(pose_yaw)/90, # Frontal preference
    illumination_score,      # Well-lit face
    resolution_adequacy      # Sufficient pixels for face
)
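
The computation above can be sketched in Python as follows. This is a minimal illustration, not the production code: the function names are hypothetical, all scores are assumed to be normalized to [0, 1], and `pose_yaw_deg` is assumed to be in degrees and is clamped to 90.

```python
def quality_composite(blur_score, pose_yaw_deg, illumination_score, resolution_adequacy):
    """Plain average of the four face-quality terms from Section 8.5.1."""
    frontal = 1.0 - min(abs(pose_yaw_deg), 90.0) / 90.0  # clamp is an assumption
    terms = [1.0 - blur_score, frontal, illumination_score, resolution_adequacy]
    return sum(terms) / len(terms)


def confidence_aggregate(yolo_confidence, scrfd_confidence,
                         cosine_distance_to_match, quality):
    """Weighted sum using the 0.35 / 0.25 / 0.25 / 0.15 weights."""
    return (0.35 * yolo_confidence
            + 0.25 * scrfd_confidence
            + 0.25 * (1.0 - cosine_distance_to_match)
            + 0.15 * quality)


# Example: a sharp, slightly turned, reasonably lit face.
q = quality_composite(blur_score=0.2, pose_yaw_deg=18.0,
                      illumination_score=0.7, resolution_adequacy=0.9)
score = confidence_aggregate(0.9, 0.8, 0.3, q)
```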

8.5.2 Confidence Levels

Level	Score Range	Color	Action
High Confidence	0.80 - 1.00	Green	Auto-accept, no review needed
Medium Confidence	0.60 - 0.79	Yellow	Accepted, flagged for periodic review
Low Confidence	0.40 - 0.59	Orange	Requires operator review within 24h
Very Low Confidence	0.00 - 0.39	Red	Rejected, not used for training

8.6 Training Workflow Overview

The safe self-learning system captures operator feedback and converts it into model improvements through a carefully controlled pipeline.

8.6.1 Three Learning Modes

Mode	Description	Use Case	Risk Level
Manual Only	Operator explicitly triggers training runs	Highly regulated environments	Lowest
Suggested Learning (Recommended)	System suggests training candidates; operator approves	Standard production deployment	Low
Approved Auto-Update	Auto-training triggers after admin approval threshold	Mature deployment with trusted operators	Medium

8.6.2 Training Pipeline Architecture

+=============================================================================+
|                    SAFE SELF-LEARNING PIPELINE                               |
+=============================================================================+
|                                                                              |
|  STEP 1: COLLECTION                                                          |
|  +-------------------+                                                       |
|  | Operator Review   |  confirm, correct_name, merge, reject                |
|  | Actions           |  + automatic high-confidence acceptances              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 2: CONFLICT DETECTION (Synchronous, blocks immediately)               |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Label Conflict    | -> | If conflict found | -> | Block from training |   |
|  | Detector          |    | (5 types)         |    | dataset, alert admin |   |
|  | - Same face, diff |    +-------------------+    +-------------------+    |
|  |   names           |                                                       |
|  | - Diff faces, same|                                                       |
|  |   name            |                                                       |
|  | - Merge circular  |                                                       |
|  |   reference       |                                                       |
|  | - Name to already-|                                                       |
|  |   deleted person  |                                                       |
|  | - Quality below   |                                                       |
|  |   threshold       |                                                       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 3: DATASET CURATION                                                    |
|  +-------------------+                                                       |
|  | Training Dataset  |  - Collect approved examples                         |
|  | Builder           |  - Balance classes (min 5 per person)                |
|  |                   |  - Augmentation (flip, rotate, brightness)           |
|  |                   |  - Quality filter (blur, pose, illumination)         |
|  |                   |  - Train/val split (80/20)                            |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 4: MODEL TRAINING                                                      |
|  +-------------------+                                                       |
|  | Training Job      |  - ArcFace R100 backbone                              |
|  | (Airflow DAG)     |  - Fine-tuning on curated dataset                     |
|  |                   |  - Cosine annealing LR schedule                        |
|  |                   |  - Early stopping (patience=10)                       |
|  |                   |  - Mixed precision (AMP)                              |
|  |                   |  - Typical duration: 2-8 hours on V100                |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 5: QUALITY GATES                                                       |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Gate 1: Hold-out  | -> | Gate 2: Compare   | -> | Gate 3: Identity  |    |
|  |    evaluation     |    |    vs current     |    |    accuracy       |    |
|  |    (precision,    |    |    production     |    |    (100% known)   |    |
|  |     recall, f1)   |    |  (no >2% regress) |    |                   |    |
|  +-------------------+    +-------------------+    +-------------------+    |
|       |                          |                          |                |
|       +------------+-------------+--------------------------+                |
|                    |                                                          |
|         +----------+----------+                                              |
|         |                     |                                              |
|         v                     v                                              |
|     ALL PASSED            ANY FAILED                                       |
|         |                     |                                              |
|         v                     v                                              |
|  +------------+       +------------------+                                 |
|  | Proceed to |       | REJECT           |                                 |
|  | Deployment |       | - Log failure    |                                 |
|  +------------+       | - Alert admin    |                                 |
|                       | - Keep in staging|                                 |
|                       +------------------+                                 |
|                                                                              |
|  STEP 6: DEPLOYMENT                                                          |
|  +-------------------+                                                       |
|  | A/B Testing       |  - Shadow mode: 0% traffic (validation)              |
|  | (gradual rollout) |  - Canary: 5% traffic for 24h                        |
|  |                   |  - Monitor: latency, error rate, FP rate              |
|  |                   |  - Full rollout: 100% traffic                         |
|  |                   |  - Rollback: < 60 seconds to previous version         |
|  +-------------------+                                                       |
|                                                                              |
+=============================================================================+

8.7 Model Versioning and Rollback

8.7.1 Semantic Versioning

Version Component	Increment When	Example
MAJOR (X.0.0)	Full retraining, architecture change, breaking embedding change	1.0.0 -> 2.0.0 (new backbone)
MINOR (x.Y.0)	Fine-tuning, significant new data (>50 new identities)	1.0.0 -> 1.1.0 (new employees)
PATCH (x.y.Z)	Incremental update, centroid update, hotfix	1.0.0 -> 1.0.1 (new photos added)

8.7.2 Version States

State	Description	Transition
`TRAINING`	Model is being trained	Auto -> STAGING on completion
`STAGING`	Awaiting quality gate evaluation	Auto -> AWAITING_APPROVAL on pass
`AWAITING_APPROVAL`	Pending admin approval	Manual -> CANARY on approve
`CANARY`	5% traffic, monitoring	Auto -> PRODUCTION on success (24h)
`PRODUCTION`	100% traffic, active serving	Manual -> ARCHIVED on new version deploy
`ARCHIVED`	Kept for rollback, no traffic	Auto -> ROLLBACK_AVAILABLE after 30 days
`ROLLBACK_AVAILABLE`	Can be rolled back to	Manual -> PRODUCTION on rollback trigger
`DEPRECATED`	Cannot be rolled back to	Final state
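
The state table can be encoded as a small transition map. The sketch below is illustrative only: it covers just the transitions listed in the table (the table does not say which state leads to DEPRECATED or what happens on a failed gate, so those are left out), and the names `ALLOWED_TRANSITIONS`, `MANUAL_TRANSITIONS`, and `next_state` are hypothetical.

```python
ALLOWED_TRANSITIONS = {
    "TRAINING": {"STAGING"},
    "STAGING": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"CANARY"},
    "CANARY": {"PRODUCTION"},
    "PRODUCTION": {"ARCHIVED"},
    "ARCHIVED": {"ROLLBACK_AVAILABLE"},
    "ROLLBACK_AVAILABLE": {"PRODUCTION"},
    "DEPRECATED": set(),  # final state
}

# Transitions the table marks as "Manual" (require a human action).
MANUAL_TRANSITIONS = {
    ("AWAITING_APPROVAL", "CANARY"),
    ("PRODUCTION", "ARCHIVED"),
    ("ROLLBACK_AVAILABLE", "PRODUCTION"),
}


def next_state(current, target, manual=False):
    """Validate a state change and return the new state."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    if (current, target) in MANUAL_TRANSITIONS and not manual:
        raise PermissionError(f"{current} -> {target} requires manual approval")
    return target
```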

8.7.3 Rollback Procedure

+-----------------------------------------------------------------------------+
|                    EMERGENCY ROLLBACK PROCEDURE                              |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Trigger: Admin initiates rollback or automatic rollback on failure         |
|                                                                              |
|  Step 1: Validate target version exists and is in ROLLBACK_AVAILABLE state  |
|  Step 2: Load target model artifacts from S3/MinIO (pre-warm GPU)          |
|  Step 3: Atomic switch: update model reference in Triton config             |
|  Step 4: Triton model load via repository API (zero-downtime swap)         |
|  Step 5: Validate: send test inference requests, check latency              |
|  Step 6: If validation fails -> auto-revert to previous production          |
|  Step 7: If validation passes -> update database model version records      |
|  Step 8: Log rollback event in audit_logs                                   |
|                                                                              |
|  Maximum rollback time: < 60 seconds                                        |
|  Zero inference downtime during rollback                                    |
|                                                                              |
+-----------------------------------------------------------------------------+

8.8 Quality Gates

8.8.1 Gate Thresholds

Gate	Metric	Minimum	Maximum	Critical
Hold-out Evaluation	Precision	0.97	—	Yes (cannot override)
Hold-out Evaluation	Recall	0.95	—	Yes
Hold-out Evaluation	F1 Score	0.96	—	Yes
No Regression	Metric regression vs production	—	2%	No (admin can override)
Identity Accuracy	Known identity recall	100%	—	Yes
Latency	P99 inference latency	—	150 ms	Yes
Confusion Analysis	False positive rate	—	5%	No
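
One way to apply the thresholds above is a table-driven check. This is a sketch under stated assumptions: the metric keys are hypothetical, and it treats every non-critical gate as overridable by an admin, although the table states this explicitly only for the regression gate.

```python
# (gate, metric key, bound kind, threshold, critical); metric keys are hypothetical
GATE_RULES = [
    ("Hold-out Evaluation", "precision", "min", 0.97, True),
    ("Hold-out Evaluation", "recall", "min", 0.95, True),
    ("Hold-out Evaluation", "f1_score", "min", 0.96, True),
    ("No Regression", "max_regression_pct", "max", 2.0, False),
    ("Identity Accuracy", "known_identity_recall", "min", 1.0, True),
    ("Latency", "p99_latency_ms", "max", 150.0, True),
    ("Confusion Analysis", "false_positive_rate", "max", 0.05, False),
]


def evaluate_gates(metrics, admin_overrides=()):
    """Return (overall_passed, failures) for a dict of candidate-model metrics."""
    failures = []
    for gate, key, kind, threshold, critical in GATE_RULES:
        value = metrics[key]
        ok = value >= threshold if kind == "min" else value <= threshold
        if not ok:
            failures.append({"gate": gate, "metric": key,
                             "value": value, "critical": critical})
    # A failure blocks deployment unless it is non-critical and overridden.
    blocking = [f for f in failures
                if f["critical"] or f["metric"] not in admin_overrides]
    return (not blocking, failures)
```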

8.8.2 Quality Gate Report Example

{
  "gate_run_id": "550e8400-e29b-41d4-a716-446655440000",
  "candidate_model_version": "1.2.0",
  "baseline_model_version": "1.1.0",
  "timestamp": "2024-01-25T10:30:00Z",
  "overall_result": "PASSED",
  "gates": [
    {
      "name": "holdout_performance",
      "status": "PASSED",
      "critical": true,
      "metrics": {
        "precision": 0.9842,
        "recall": 0.9678,
        "f1_score": 0.9759
      }
    },
    {
      "name": "no_regression",
      "status": "PASSED",
      "metrics": {
        "max_regression_pct": 0.8,
        "per_metric": {
          "precision": 0.003,
          "recall": -0.008,
          "f1_score": -0.002
        }
      }
    },
    {
      "name": "known_identity_accuracy",
      "status": "PASSED",
      "metrics": {
        "known_identities_tested": 142,
        "perfect_accuracy": 142,
        "accuracy_below_threshold": 0
      }
    },
    {
      "name": "latency_requirement",
      "status": "PASSED",
      "metrics": {
        "p50_latency_ms": 45,
        "p99_latency_ms": 128,
        "threshold_ms": 150
      }
    }
  ]
}

8.8.3 Embedding Update Strategies

After a model passes quality gates and is deployed, the face embedding database must be updated. Five strategies are available:

Strategy	When to Use	Duration	Impact
Centroid Update	Few new examples (<10 per identity), same model	Seconds	Update running mean only
Incremental Add	Many new examples (10-100 per identity), same model	Minutes	Add new embeddings, keep existing
Full Reindex	Model version changed, or >10% of identities updated	Hours	Recompute all embeddings
Merge and Update	Identity merge operation	Seconds	Weighted centroid merge
Rollback Reindex	Model rollback	Minutes	Restore previous embeddings

Decision Matrix:

+-----------------------------------------------------------------------------+
|                    EMBEDDING UPDATE STRATEGY SELECTION                       |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Model changed?                                                              |
|       |                                                                      |
|       +-- YES -> FULL_REINDEX (required, embeddings are model-dependent)     |
|       |                                                                      |
|       NO -> What changed?                                                    |
|               |                                                              |
|               +-- Identity merge -> MERGE_AND_UPDATE                         |
|               |                                                              |
|               +-- Rollback -> ROLLBACK_REINDEX                               |
|               |                                                              |
|               +-- New examples?                                              |
|                       |                                                      |
|                       +-- < 10 per identity, < 10% total -> CENTROID_UPDATE |
|                       |                                                      |
|                       +-- Otherwise -> INCREMENTAL_ADD                       |
|                                                                              |
+-----------------------------------------------------------------------------+
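
The decision matrix above translates directly into a small function. The sketch follows the branches in the order drawn; the function and parameter names are hypothetical.

```python
def select_embedding_update_strategy(model_changed, change_type=None,
                                     max_new_per_identity=0,
                                     fraction_identities_updated=0.0):
    """Pick an update strategy following the decision matrix in Section 8.8.3.

    change_type: "identity_merge", "rollback", or None for plain new examples.
    """
    if model_changed:
        return "FULL_REINDEX"  # embeddings are model-dependent
    if change_type == "identity_merge":
        return "MERGE_AND_UPDATE"
    if change_type == "rollback":
        return "ROLLBACK_REINDEX"
    if max_new_per_identity < 10 and fraction_identities_updated < 0.10:
        return "CENTROID_UPDATE"
    return "INCREMENTAL_ADD"
```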

Reference: For complete model export commands, INT8 calibration scripts, performance benchmarks, and the full Python module structure, see ai_vision.md — Sections 10-14. For the complete training pipeline code, Airflow DAG definitions, and quality gate implementations, see training_system.md — Sections 5-10.

Section 9: Suspicious Activity Night-Mode Design

9.1 Overview

The suspicious activity detection system provides comprehensive behavioral analysis during night hours (22:00-06:00 by default) through 10 specialized detection modules. Each module operates on the output of the AI inference pipeline (detected persons, tracked positions, and face identities) to identify anomalous behavior patterns.

The system features a composite scoring engine that combines signals from all modules with exponential time-decay, enabling unified threat assessment and intelligent escalation. Each camera can be independently configured with custom zones, thresholds, and schedules.

9.2 Ten Detection Modules Summary

#	Module	Description	Severity	Key CV Model
1	Intrusion Detection	Detects persons entering restricted polygon zones	HIGH (default)	YOLO11m detections + zone polygon
2	Loitering Detection	Flags persons dwelling in an area longer than threshold	MEDIUM (default)	ByteTrack + timer per track
3	Running Detection	Identifies abnormally fast movement	MEDIUM (default)	YOLOv8n-pose + optical flow speed
4	Crowding Detection	Alerts when group density exceeds threshold	HIGH (default)	DBSCAN spatial clustering
5	Fall Detection	Detects persons falling or collapsing	CRITICAL	YOLOv8n-pose keypoint analysis
6	Abandoned Object	Identifies unattended objects left behind	HIGH (default)	YOLOv8s + MOG2 background subtraction
7	After-Hours Presence	Detects any person presence during night hours	MEDIUM (default)	YOLO11m person class only
8	Zone Breach	Triggers on crossing virtual boundary lines	MEDIUM (default)	ByteTrack + line crossing algorithm
9	Repeated Re-entry	Flags patterns of entering/exiting an area multiple times	MEDIUM (default)	ByteTrack + entry/exit state machine
10	Suspicious Dwell Time	Alerts on extended presence near sensitive areas	MEDIUM (configurable)	ByteTrack + per-zone timers

9.3 Module Details

9.3.1 Module 1: Intrusion Detection

Detects when a person enters a user-defined restricted polygon zone.

Parameter	Default	Range	Description
`confidence_threshold`	0.55	0.3-0.9	Minimum person detection confidence
`overlap_threshold`	0.30	0.1-0.9	Min IoU between person bbox and zone
`cooldown_seconds`	60	0-3600	Cooldown before re-alerting same zone
`zone_severity`	HIGH	LOW/MEDIUM/HIGH	Per-zone configurable

Algorithm:

For each detected person:
    For each restricted zone polygon:
        Compute IoU(person_bbox, zone_polygon)
        If IoU > overlap_threshold AND confidence > confidence_threshold:
            If zone not in cooldown:
                Trigger INTRUSION alert
                Start cooldown timer
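
The IoU between an axis-aligned person bbox and an arbitrary zone polygon can be computed without external libraries by clipping the polygon to the box. The sketch below uses Sutherland-Hodgman clipping and the shoelace area formula; the function names are hypothetical and the cooldown logic is omitted.

```python
def polygon_area(points):
    """Shoelace formula; returns a non-negative area (0.0 for an empty list)."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_polygon_to_bbox(polygon, bbox):
    """Sutherland-Hodgman clipping of a polygon against (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bbox

    def x_cross(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cross(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= x_min, x_cross(x_min)),
        (lambda p: p[0] <= x_max, x_cross(x_max)),
        (lambda p: p[1] >= y_min, y_cross(y_min)),
        (lambda p: p[1] <= y_max, y_cross(y_max)),
    ]
    points = [tuple(p) for p in polygon]
    for inside, intersect in edges:
        clipped = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    clipped.append(intersect(prev, cur))
                clipped.append(cur)
            elif inside(prev):
                clipped.append(intersect(prev, cur))
        points = clipped
        if not points:
            break
    return points


def bbox_zone_iou(bbox, zone_polygon):
    x_min, y_min, x_max, y_max = bbox
    bbox_area = (x_max - x_min) * (y_max - y_min)
    inter = polygon_area(clip_polygon_to_bbox(zone_polygon, bbox))
    union = bbox_area + polygon_area([tuple(p) for p in zone_polygon]) - inter
    return inter / union if union > 0 else 0.0


def is_intrusion(bbox, confidence, zone_polygon,
                 overlap_threshold=0.30, confidence_threshold=0.55):
    return (bbox_zone_iou(bbox, zone_polygon) > overlap_threshold
            and confidence > confidence_threshold)
```

Because IoU divides by the union, a person standing fully inside a large zone can still score below `overlap_threshold`. Dividing the intersection by the bbox area is a common alternative, though the algorithm above specifies IoU.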

9.3.2 Module 2: Loitering Detection

Flags persons who remain in an area longer than a threshold.

Parameter	Default	Range	Description
`dwell_time_threshold_seconds`	300	30-1800	Time before triggering loitering alert
`movement_tolerance_pixels`	50	10-200	Max centroid movement to still count as "stationary"
`cooldown_seconds`	300	0-3600	Cooldown after alert

Algorithm:

For each active track:
    If track centroid moved < tolerance in last N seconds:
        Increment dwell timer
        If dwell_timer > threshold:
            Trigger LOITERING alert
            Reset timer (or hold until movement detected)
    Else:
        Reset dwell timer
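
A minimal sketch of the dwell timer, assuming one centroid update per track per frame. It simplifies "moved < tolerance in last N seconds" to "stayed within tolerance of the position where the timer started", and it leaves out `cooldown_seconds`; the class name is hypothetical.

```python
import math


class LoiteringDetector:
    def __init__(self, dwell_time_threshold_seconds=300, movement_tolerance_pixels=50):
        self.threshold = dwell_time_threshold_seconds
        self.tolerance = movement_tolerance_pixels
        self._anchors = {}  # track_id -> (x, y, timer_start)

    def update(self, track_id, centroid, timestamp):
        """Return True when the track has dwelt longer than the threshold."""
        x, y = centroid
        anchor = self._anchors.get(track_id)
        moved = (anchor is None
                 or math.hypot(x - anchor[0], y - anchor[1]) >= self.tolerance)
        if moved:
            self._anchors[track_id] = (x, y, timestamp)  # (re)start the dwell timer
            return False
        if timestamp - anchor[2] > self.threshold:
            self._anchors[track_id] = (x, y, timestamp)  # reset timer after alerting
            return True
        return False
```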

9.3.3 Module 3: Running Detection

Identifies abnormally fast movement using pose keypoints and optical flow.

Parameter	Default	Range	Description
`speed_threshold_pixels_per_second`	150	50-500	Pixel speed threshold
`speed_threshold_kmh`	15.0	5-40	Real-world speed (requires calibration)
`confirmation_frames`	3	1-10	Consecutive frames to confirm running

Algorithm:

For each active track:
    Compute torso keypoint displacement between frames
    Convert pixel speed to km/h (if calibration available)
    Apply Farneback optical flow for refinement
    If speed > threshold for confirmation_frames:
        Trigger RUNNING alert
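
The pixel-speed part of this algorithm can be sketched as below. The km/h conversion and the Farneback optical-flow refinement are deliberately left out, and the class name and the choice of a single torso point per track are assumptions.

```python
import math


class RunningDetector:
    def __init__(self, speed_threshold_pixels_per_second=150, confirmation_frames=3):
        self.threshold = speed_threshold_pixels_per_second
        self.confirmation_frames = confirmation_frames
        self._last = {}    # track_id -> ((x, y), timestamp)
        self._streak = {}  # track_id -> consecutive fast frames

    def update(self, track_id, torso_xy, timestamp):
        """Return True once speed has exceeded the threshold for enough frames."""
        prev = self._last.get(track_id)
        self._last[track_id] = (torso_xy, timestamp)
        if prev is None or timestamp <= prev[1]:
            return False  # first observation or non-increasing timestamp
        (px, py), pt = prev
        speed = math.hypot(torso_xy[0] - px, torso_xy[1] - py) / (timestamp - pt)
        if speed > self.threshold:
            self._streak[track_id] = self._streak.get(track_id, 0) + 1
        else:
            self._streak[track_id] = 0
        return self._streak[track_id] >= self.confirmation_frames
```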

9.3.4 Module 4: Crowding Detection

Alerts when person group density exceeds threshold.

Parameter	Default	Range	Description
`count_threshold`	5	2-50	Minimum person count in cluster
`area_threshold`	0.15	0.05-0.5	Fraction of frame covered by group
`density_threshold`	0.05	0.01-0.2	Persons per square meter (calibrated)
`dbscan_eps`	0.08	0.01-0.3	DBSCAN neighborhood radius (normalized)

Algorithm:

Collect all person centroids in current frame
Run DBSCAN(eps=dbscan_eps, min_samples=2) on centroids
For each cluster:
    If cluster_size >= count_threshold OR cluster_area >= area_threshold:
        Trigger CROWDING alert
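
A dependency-free sketch of the clustering step follows. It groups centroids into connected components under an eps-neighbourhood, which for `min_samples=2` should give the same grouping as DBSCAN (every point with at least one neighbour is a core point). A production system would more likely call a library DBSCAN. Only the count condition is checked; the area condition and the function names are left as assumptions.

```python
import math


def cluster_centroids(centroids, eps=0.08):
    """Return clusters (lists of indices) of points chained within eps; singletons are noise."""
    unvisited = set(range(len(centroids)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(centroids[i], centroids[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            component.extend(near)
            frontier.extend(near)
        if len(component) >= 2:
            clusters.append(sorted(component))
    return clusters


def crowding_clusters(centroids, count_threshold=5, eps=0.08):
    """Clusters large enough to trigger a CROWDING alert (count condition only)."""
    return [c for c in cluster_centroids(centroids, eps) if len(c) >= count_threshold]
```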

9.3.5 Module 5: Fall Detection

Detects persons falling or collapsing using pose keypoint analysis.

Parameter	Default	Range	Description
`fall_score_threshold`	0.75	0.5-0.95	Combined fall confidence score
`min_keypoint_confidence`	0.30	0.1-0.5	Minimum keypoint detection confidence
`torso_angle_threshold_deg`	45	30-75	Torso angle from vertical to trigger
`aspect_ratio_threshold`	1.2	0.8-2.0	Width/height ratio of person bbox
`temporal_confirmation_ms`	1000	500-3000	Duration to confirm fall (not just bend)

Algorithm:

For each detected person with pose keypoints:
    Compute torso angle from vertical (using shoulder-hip line)
    Compute bbox aspect ratio
    Check if person is on ground (feet keypoint confidence drops)
    Calculate fall_score = weighted_combination(angle, aspect_ratio, ground_contact)
    If fall_score > threshold AND duration > confirmation_ms:
        Trigger FALL alert (CRITICAL severity)
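
The two geometric cues can be computed as in the sketch below. The weighting that combines them into `fall_score` is not specified here, so the sketch returns the individual indicators only; the function names and the use of shoulder and hip midpoints are assumptions.

```python
import math


def torso_angle_from_vertical(shoulder_mid, hip_mid):
    """Angle in degrees between the shoulder-hip line and the image vertical."""
    dx = hip_mid[0] - shoulder_mid[0]
    dy = hip_mid[1] - shoulder_mid[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def fall_indicators(shoulder_mid, hip_mid, bbox,
                    torso_angle_threshold_deg=45, aspect_ratio_threshold=1.2):
    """Return the raw cues and whether each exceeds its threshold."""
    x_min, y_min, x_max, y_max = bbox
    height = max(y_max - y_min, 1e-6)  # guard against a degenerate bbox
    aspect_ratio = (x_max - x_min) / height
    angle = torso_angle_from_vertical(shoulder_mid, hip_mid)
    return {
        "torso_angle_deg": angle,
        "aspect_ratio": aspect_ratio,
        "torso_tilted": angle > torso_angle_threshold_deg,
        "bbox_horizontal": aspect_ratio > aspect_ratio_threshold,
    }
```

An upright person gives an angle near 0 degrees and a narrow bbox, while a person lying down gives an angle near 90 degrees and a wide bbox. Temporal confirmation over `temporal_confirmation_ms` would still be needed to separate a fall from a brief bend.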

9.3.6 Module 6: Abandoned Object Detection

Identifies unattended objects using background subtraction and object detection.

Parameter	Default	Range	Description
`unattended_time_threshold_seconds`	60	10-600	Time before object is considered abandoned
`proximity_threshold_pixels`	100	20-300	Max distance from owner before "unattended"
`watchlist_classes`	["backpack", "suitcase", "box", "bag"]	—	Object classes to monitor
`bg_learning_rate`	0.005	0.001-0.01	MOG2 background model learning rate

Algorithm:

Run YOLOv8s to detect objects in watchlist_classes
Run MOG2 background subtraction to identify static foreground
For each detected object:
    Track owner proximity (nearest person)
    If owner distance > threshold AND object stationary > time_threshold:
        Trigger ABANDONED_OBJECT alert

9.3.7 Module 7: After-Hours Presence

Simple but effective: any person detected during night hours triggers an alert.

Parameter	Default	Range	Description
`detection_confidence_threshold`	0.50	0.3-0.9	Minimum person detection confidence
`min_detection_frames`	5	1-30	Frames to confirm (avoid false positives)
`check_authorized_personnel`	false	true/false	If true, check against known persons whitelist

9.3.8 Module 8: Zone Breach

Detects crossing of virtual boundary lines (directional or bidirectional).

Parameter	Default	Range	Description
`boundary_lines`	[] (user-defined)	—	Array of {start, end, direction, severity}
`allowed_direction`	"both"	both/a_to_b/b_to_a	Which direction is allowed
`crossing_threshold_pixels`	20	5-100	Min distance past line to trigger
`cooldown_seconds`	30	0-3600	Cooldown per (track, line) pair

Algorithm:

For each active track:
    For each boundary line:
        Check if track centroid crosses line in forbidden direction
        Using line equation: ax + by + c = 0, check sign change
        If crossed AND distance_past_line > threshold:
            Trigger ZONE_BREACH alert
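
The sign-change test can be written with a 2D cross product, which is equivalent to evaluating ax + by + c for the line through `start` and `end`. In this sketch, "side A" is taken to be the side where the cross product is positive; that convention, the function names, and the use of the infinite line rather than the finite segment are assumptions.

```python
import math


def signed_side(point, start, end):
    """Cross product of (end - start) and (point - start); its sign gives the side."""
    return ((end[0] - start[0]) * (point[1] - start[1])
            - (end[1] - start[1]) * (point[0] - start[0]))


def distance_to_line(point, start, end):
    """Perpendicular distance from point to the infinite line through start and end."""
    return abs(signed_side(point, start, end)) / math.dist(start, end)


def crossing_direction(prev_pt, cur_pt, start, end, crossing_threshold_pixels=20):
    """Return 'a_to_b', 'b_to_a', or None if no confirmed crossing occurred."""
    before = signed_side(prev_pt, start, end)
    after = signed_side(cur_pt, start, end)
    if before * after >= 0:
        return None  # same side, or a point exactly on the line
    if distance_to_line(cur_pt, start, end) <= crossing_threshold_pixels:
        return None  # not yet far enough past the line
    return "a_to_b" if before > 0 else "b_to_a"
```

The caller would compare the returned direction with `allowed_direction` and apply the per-(track, line) cooldown before raising ZONE_BREACH.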

9.3.9 Module 9: Repeated Re-entry Patterns

Detects suspicious patterns of entering and exiting an area multiple times.

Parameter	Default	Range	Description
`reentry_zone`	Full frame	polygon	Area to monitor for entries/exits
`time_window_seconds`	600	60-3600	Time window for counting cycles
`reentry_threshold`	3	2-10	Min entry/exit cycles to trigger
`min_cycle_duration_seconds`	30	5-300	Min duration of one cycle

State Machine:

For each track:
    Track state: OUTSIDE -> ENTERING -> INSIDE -> EXITING -> OUTSIDE
    Each complete cycle (entry + exit) increments counter
    If cycle_count >= threshold within time_window:
        Trigger REENTRY_PATTERN alert
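
A compact sketch of the cycle counter is shown below. It collapses the ENTERING and EXITING transitional states into a simple inside/outside flag per track, which is a simplification of the state machine above; the class name is hypothetical.

```python
from collections import deque


class ReentryCounter:
    def __init__(self, time_window_seconds=600, reentry_threshold=3,
                 min_cycle_duration_seconds=30):
        self.window = time_window_seconds
        self.threshold = reentry_threshold
        self.min_cycle = min_cycle_duration_seconds
        self._entered = {}  # track_id -> entry timestamp while inside
        self._cycles = {}   # track_id -> deque of cycle completion times

    def update(self, track_id, in_zone, timestamp):
        """Return True when enough entry/exit cycles fall inside the time window."""
        entered_at = self._entered.get(track_id)
        if in_zone and entered_at is None:
            self._entered[track_id] = timestamp  # OUTSIDE -> INSIDE
        elif not in_zone and entered_at is not None:
            del self._entered[track_id]          # INSIDE -> OUTSIDE
            if timestamp - entered_at >= self.min_cycle:
                self._cycles.setdefault(track_id, deque()).append(timestamp)
        done = self._cycles.get(track_id, deque())
        while done and timestamp - done[0] > self.window:
            done.popleft()  # drop cycles older than the window
        return len(done) >= self.threshold
```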

9.3.10 Module 10: Suspicious Dwell Time

Extended presence near sensitive areas (different from general loitering).

Parameter	Default	Range	Description
`sensitive_zones`	[] (user-defined)	—	Zones with custom dwell thresholds
`default_dwell_threshold_seconds`	120	10-1800	Default threshold
`max_gap_seconds`	5.0	1.0-30.0	Max disappearance gap before timer reset

Predefined zone types with default thresholds:

Zone Type	Default Threshold	Default Severity
`main_entrance`	60s	MEDIUM
`emergency_exit`	30s	HIGH
`equipment_room`	45s	HIGH
`storage_area`	120s	MEDIUM
`elevator_bank`	90s	LOW
`parking_access`	60s	MEDIUM

9.4 Activity Scoring Engine

9.4.1 Composite Score Formula

All 10 modules feed into a unified scoring engine that produces a single suspicious activity score per camera:

S_total(t) = SUM_i( weight_i * signal_i(t) * decay(t - t_i) ) + bonus_cross_module

Where:
    weight_i: module-specific weight (see table below)
    signal_i(t): normalized signal value from module i [0, 1]
    decay(delta_t): exponential time-decay function
    bonus_cross_module: extra score when multiple modules fire simultaneously
    t_i: timestamp of most recent event from module i

9.4.2 Module Weights

Module	Weight	Signal Source	Signal Range
Intrusion Detection	0.25	overlap_ratio * confidence	0.0 - 1.0
Loitering Detection	0.15	dwell_ratio (dwell_time / threshold)	0.0 - 1.0+
Running Detection	0.10	speed_ratio normalized	0.0 - 1.0+
Crowding Detection	0.12	crowd_density_score	0.0 - 1.0
Fall Detection	0.20	fall_confidence_score	0.0 - 1.0
Abandoned Object	0.18	unattended_ratio (duration / threshold)	0.0 - 1.0+
After-Hours Presence	0.05	binary (1 if detected) * zone_severity_multiplier	0.0 - 1.0
Zone Breach	0.12	severity_mapped (LOW=0.3, MED=0.6, HIGH=1.0)	0.0 - 1.0
Re-entry Patterns	0.10	cycle_ratio (count / threshold)	0.0 - 1.0+
Suspicious Dwell	0.13	dwell_ratio (duration / zone_threshold)	0.0 - 1.0+

Note: Weights sum to 1.40 — this is intentional to allow cross-module amplification when multiple modules fire simultaneously.

9.4.3 Time-Decay Function

import math

def time_decay(delta_t_seconds, half_life=300):
    """Exponential decay with 5-minute half-life by default."""
    # ln(2) / half_life is the decay rate that halves the value every half_life seconds
    return math.exp(-math.log(2) * delta_t_seconds / half_life)

# Decay reference:
#   0 min -> 1.000 (full contribution)
#   1 min -> 0.871
#   5 min -> 0.500
#  10 min -> 0.250
#  20 min -> 0.063
#  30 min -> 0.016 (effectively zero)
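
Putting the formula from 9.4.1, the weights from 9.4.2, and the decay function together gives the sketch below. The dictionary keys and the `events` input format are hypothetical, and the cross-module bonus is passed in as a precomputed number.

```python
MODULE_WEIGHTS = {
    "intrusion": 0.25, "loitering": 0.15, "running": 0.10, "crowding": 0.12,
    "fall": 0.20, "abandoned_object": 0.18, "after_hours": 0.05,
    "zone_breach": 0.12, "reentry": 0.10, "suspicious_dwell": 0.13,
}


def time_decay(delta_t_seconds, half_life=300):
    """Halves the contribution every half_life seconds."""
    return 0.5 ** (delta_t_seconds / half_life)


def composite_score(events, now, cross_module_bonus=0.0):
    """S_total(t) = SUM_i(weight_i * signal_i * decay(t - t_i)) + bonus.

    events: module name -> (signal value, timestamp of the most recent event).
    """
    total = 0.0
    for module, (signal, t_i) in events.items():
        total += MODULE_WEIGHTS[module] * signal * time_decay(now - t_i)
    return total + cross_module_bonus


# An intrusion 5 minutes ago (signal 0.8) plus loitering right now (signal 1.0).
example = composite_score({"intrusion": (0.8, 0.0), "loitering": (1.0, 300.0)}, now=300.0)
```

In this example the intrusion contributes 0.25 * 0.8 * 0.5 = 0.10 and the loitering contributes 0.15, for a total of 0.25 before any bonus.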

9.4.4 Cross-Module Amplification Bonus

When multiple modules detect simultaneously for the same track or in close proximity:

def compute_cross_module_bonus(active_signals, proximity_weight=0.15,
                               n_same_track_signals=0, n_same_zone_signals=0):
    """Extra score when several modules fire together.

    n_same_track_signals / n_same_zone_signals: number of active signals that
    share one track or one zone. The caller is assumed to supply these counts.
    """
    n_modules = len(active_signals)
    if n_modules <= 1:
        return 0.0

    # Base bonus: +0.15 per additional module
    base_bonus = proximity_weight * (n_modules - 1)

    # Track overlap: same person triggering multiple rules -> higher threat
    track_bonus = 0.10 * (n_same_track_signals - 1) if n_same_track_signals >= 2 else 0

    # Zone overlap: multiple signals in same zone -> higher threat
    zone_bonus = 0.08 * (n_same_zone_signals - 1) if n_same_zone_signals >= 2 else 0

    return min(base_bonus + track_bonus + zone_bonus, 0.50)  # Cap at +0.50

9.4.5 Escalation Thresholds

Score Range	Threat Level	Color	Actions
[0.00, 0.20)	NONE	Gray	Log only, no alert
[0.20, 0.40)	LOW	Blue	Log + dashboard indicator
[0.40, 0.60)	MEDIUM	Yellow	Log + non-urgent alert dispatch
[0.60, 0.80)	HIGH	Orange	Log + immediate alert + highlight
[0.80, 1.00]	CRITICAL	Red	Log + all channels + security dispatch recommendation
> 1.00	EMERGENCY	Purple/Flashing	All channels + automatic escalation to security lead
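
The mapping from score to threat level can be sketched as a simple lookup. The sketch assumes each range includes its lower bound, so a score of exactly 0.20 is LOW and exactly 1.00 is still CRITICAL; the function name is hypothetical.

```python
def threat_level(score):
    """Map a composite score to the threat levels of Section 9.4.5."""
    if score > 1.00:
        return "EMERGENCY"
    for lower_bound, level in ((0.80, "CRITICAL"), (0.60, "HIGH"),
                               (0.40, "MEDIUM"), (0.20, "LOW")):
        if score >= lower_bound:
            return level
    return "NONE"
```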

9.5 Night Mode Scheduler

9.5.1 Automatic Schedule

Parameter	Default	Configurable
Start time	22:00 (10 PM)	Yes, per camera
End time	06:00 (6 AM)	Yes, per camera
Gradual transition	15 minutes	Yes (0-60 min)
Timezone	Local site timezone	Yes
Override	Manual toggle available	Admin only

9.5.2 Gradual Transition

During the 15-minute transition window, sensitivity ramps linearly:

Transition Start (21:45)                        Night Full (22:00)
             |                                          |
             v                                          v
Sensitivity: 0% ------ 25% ------ 50% ------ 75% ------ 100% (held for the rest of night mode)
             |__________________________________________|
                 Ramp up to full night sensitivity over 15 minutes

This prevents sudden spikes in alerts when night mode activates.
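
A linear ramp of this kind can be sketched as below. It assumes the ramp starts `transition_minutes` before the configured night start and reaches 100% at the start time (21:45 to 22:00 with the defaults); the morning transition is not covered, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def night_sensitivity(now, night_start, transition_minutes=15):
    """Return a sensitivity factor in [0.0, 1.0] for the evening ramp."""
    if transition_minutes <= 0:
        return 1.0 if now >= night_start else 0.0
    ramp = timedelta(minutes=transition_minutes)
    fraction = (now - (night_start - ramp)) / ramp  # timedelta / timedelta -> float
    return max(0.0, min(1.0, fraction))


start = datetime(2025, 1, 1, 22, 0)
halfway = night_sensitivity(datetime(2025, 1, 1, 21, 52, 30), start)
```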

9.5.3 Night Mode Behavior Changes

Aspect	Day Mode	Night Mode
Detection modules	Intrusion, Crowding, Fall, Abandoned Object	All 10 modules active
AI Vibe preset	Per-camera setting	Automatically Strict
Confidence threshold	Per-camera setting	+0.10 (stricter)
Scoring engine weights	Standard weights	+25% intrusion, +20% fall
Alert suppression	5-minute cooldown	2-minute cooldown (faster alerts)
After-hours detection	Disabled	Enabled (primary night function)

9.6 Per-Camera Configuration

Each camera has independent configuration for all detection modules:

# Example: Camera 1 - Main Entrance
cam_01:
  enabled: true
  location: "Main Entrance Lobby"
  night_mode:
    enabled: true
    custom_schedule: null        # Use system default (22:00-06:00)
    sensitivity_multiplier: 1.0   # Standard sensitivity

  intrusion_detection:
    enabled: true
    confidence_threshold: 0.65
    overlap_threshold: 0.30
    cooldown_seconds: 30
    restricted_zones:
      - zone_id: "server_room_door"
        polygon: [[0.65,0.20], [0.85,0.20], [0.85,0.60], [0.65,0.60]]
        severity: "HIGH"

  loitering_detection:
    enabled: true
    dwell_time_threshold_seconds: 300
    movement_tolerance_pixels: 50

  running_detection:
    enabled: true
    speed_threshold_pixels_per_second: 150
    confirmation_frames: 3

  fall_detection:
    enabled: true
    fall_score_threshold: 0.75
    temporal_confirmation_ms: 1000

  # ... (all 10 modules configured)

9.7 Alert Generation Logic

9.7.1 Alert Lifecycle

+------------+    +------------+    +------------+    +------------+
|  DETECTED  | -> | SUPPRESSED | -> |  EVIDENCE  | -> | DISPATCHED |
| (Rule fire)|    | (Dedup)    |    | (Capture)  |    | (Send)     |
+------------+    +------------+    +------------+    +------------+
                                                          |
                                                          v
                                                   +------------+
                                                   | ACKNOWLEDGE|
                                                   | or AUTO    |
                                                   +------------+

9.7.2 Suppression Rules

Condition	Action	Reason
Duplicate within suppression window	Log + increment counter	Prevent alert spam
Detection confidence < rule minimum	Log only	Insufficient evidence
Threat score < LOW threshold	Log only	Below alert threshold
Max alerts/hour for camera exceeded	Log + rate-limit flag	Prevent overflow
Composite score indicates low overall threat	Log + dashboard only	Reduce noise

9.7.3 Suppression Configuration

Parameter	Default	Range
Default suppression window	5 minutes	0-60 minutes
Max alerts per hour per camera	20	5-100
Max alerts per hour per rule	10	5-50
Evidence snapshot frames before	5 frames	1-30
Evidence snapshot frames after	10 frames	1-30
Evidence clip duration	10 seconds	5-60
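
The suppression window and the per-camera hourly cap can be sketched together as follows. Deduplicating on the (camera, rule) pair is an assumption, the per-rule hourly cap is omitted for brevity, and the class name is hypothetical.

```python
from collections import deque


class AlertSuppressor:
    def __init__(self, suppression_window_seconds=300, max_alerts_per_hour=20):
        self.window = suppression_window_seconds
        self.max_per_hour = max_alerts_per_hour
        self._last_sent = {}   # (camera_id, rule) -> timestamp of last dispatch
        self._sent_times = {}  # camera_id -> deque of dispatch timestamps

    def should_dispatch(self, camera_id, rule, timestamp):
        """Return True if the alert passes deduplication and rate limiting."""
        key = (camera_id, rule)
        last = self._last_sent.get(key)
        if last is not None and timestamp - last < self.window:
            return False  # duplicate within the suppression window
        recent = self._sent_times.setdefault(camera_id, deque())
        while recent and timestamp - recent[0] >= 3600:
            recent.popleft()  # forget dispatches older than one hour
        if len(recent) >= self.max_per_hour:
            return False  # camera exceeded its hourly budget
        recent.append(timestamp)
        self._last_sent[key] = timestamp
        return True
```

Suppressed alerts would still be logged with a counter or rate-limit flag, as the suppression rules table describes; the sketch only makes the dispatch decision.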

9.7.4 Severity Assignment

Final alert severity considers both the triggering module and the composite score context:

def assign_alert_severity(detection_event, composite_score):
    base_severity = detection_event['severity']  # From module config
    severity_levels = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3, 'CRITICAL': 4}
    base_level = severity_levels.get(base_severity, 2)

    # Escalation: high composite score bumps severity up one level
    if composite_score >= 0.80 and base_level < 3:
        base_level = min(base_level + 1, 4)

    # Escalation: multiple concurrent detections for same track
    if detection_event.get('concurrent_detections_count', 0) >= 2:
        base_level = min(base_level + 1, 4)

    # Zone-specific escalation override
    if detection_event.get('zone_severity_override'):
        zone_level = severity_levels.get(detection_event['zone_severity_override'], base_level)
        base_level = max(base_level, zone_level)

    reverse_levels = {v: k for k, v in severity_levels.items()}
    return reverse_levels.get(base_level, 'MEDIUM')

9.8 Integration with Main AI Pipeline

The suspicious activity service consumes detection events from the main AI pipeline:

+-----------------------------------------------------------------------------+
|               SUSPICIOUS ACTIVITY INTEGRATION WITH MAIN PIPELINE             |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Main AI Pipeline Output:                                                    |
|  { person_id, track_id, bbox, keypoints, face_embedding, timestamp,        |
|    camera_id, confidence, face_crop_path }                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Kafka Topic       | -> | Suspicious Activity| -> | Scoring Engine    |    |
|  | ai.detections     |    | Service            |    | (per camera)      |    |
|  | (JSON events)     |    | - 10 modules       |    | - Composite score |    |
|  +-------------------+    | - Per-camera config|    | - Time decay      |    |
|                           | - Zone polygons    |    | - Cross-module    |    |
|                           +-------------------+    |   bonus           |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|                           +-------------------+    +-------------------+    |
|                           | Alert Manager     | <- | Scoring Output    |    |
|                           | - Deduplicate     |    | - Score [0, 1.5]  |    |
|                           | - Rate limit      |    | - Threat level    |    |
|                           | - Severity assign |    | - Active signals  |    |
|                           +---------+---------+    +-------------------+    |
|                                     |                                        |
|                                     v                                        |
|                           +-------------------+                             |
|                           | Alerts Table (DB) |                             |
|                           | Notification Svc  |                             |
|                           +-------------------+                             |
|                                                                              |
+-----------------------------------------------------------------------------+

Key integration points:

Suspicious Activity Service is a Kafka consumer on the ai.detections topic
Processes events after face recognition (has access to person identity)
Produces alert records to the alerts.critical topic for notification dispatch
Updates the composite score in Redis (with TTL = 2 * half_life) for dashboard real-time display
Stores all alert records in PostgreSQL for history and analytics
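
The TTL of 2 * half_life ties the lifetime of the cached score to the scoring engine's time decay. A minimal sketch, assuming the decay is a plain exponential half-life (the actual decay function is specified in suspicious_activity.md and may differ); `decayed_score` and `redis_ttl_seconds` are hypothetical names used only for illustration:

```python
def decayed_score(score: float, elapsed_s: float, half_life_s: float) -> float:
    """Exponential half-life decay (assumed form): the score halves every half_life_s."""
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    return score * 0.5 ** (elapsed_s / half_life_s)


def redis_ttl_seconds(half_life_s: float) -> int:
    """TTL = 2 * half_life, as stated in the integration points above."""
    return int(2 * half_life_s)
```

Under this assumption, a key that is never refreshed expires at the point where its score has decayed to a quarter of the last written value.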

Reference: For complete detection algorithm pseudocode, zone configuration YAML schema, scoring engine implementation, and evidence capture logic, see suspicious_activity.md — Sections 2-6.

Section 10: Live Video Streaming Design

10.1 RTSP Stream Configuration for CP PLUS DVR

10.1.1 URL Format

The CP PLUS ORANGE DVR uses a Dahua-compatible RTSP URL scheme:

rtsp://admin:{password}@{dvr_ip}:554/cam/realmonitor?channel={N}&subtype={M}

Where:
    N = channel number (1-8)
    M = stream type (0 = main stream, 1 = sub stream)

Example URLs for all 8 channels:

Channel	Main Stream	Sub Stream
CH1	`rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0`	`rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=1`
CH2	`...channel=2&subtype=0`	`...channel=2&subtype=1`
CH3	`...channel=3&subtype=0`	`...channel=3&subtype=1`
CH4	`...channel=4&subtype=0`	`...channel=4&subtype=1`
CH5	`...channel=5&subtype=0`	`...channel=5&subtype=1`
CH6	`...channel=6&subtype=0`	`...channel=6&subtype=1`
CH7	`...channel=7&subtype=0`	`...channel=7&subtype=1`
CH8	`...channel=8&subtype=0`	`...channel=8&subtype=1`
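
The URL scheme above can be generated rather than typed by hand. The following is a sketch only: `build_rtsp_url` is a hypothetical helper, and percent-encoding the credentials is an assumption that matters when the real DVR password contains characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_rtsp_url(host: str, channel: int, subtype: int,
                   user: str = "admin", password: str = "password",
                   port: int = 554) -> str:
    """Build a Dahua-style RTSP URL for one channel and stream type."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1-8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # Percent-encode credentials so reserved characters do not break the URL
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{credentials}@{host}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


if __name__ == "__main__":
    for ch in range(1, 9):
        print(build_rtsp_url("192.168.29.200", ch, subtype=1))
```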

10.1.2 Stream Properties

Property	Main Stream (subtype=0)	Sub Stream (subtype=1)
Resolution	960 x 1080	352 x 288 to 704 x 576
Frame rate	25 FPS (PAL)	25 FPS
Video codec	H.264 High Profile	H.264 Baseline/Main
Bitrate	~4 Mbps per channel	~1 Mbps per channel
Audio	G.711/AAC (optional)	None
Use case	Fullscreen viewing, evidence clips	AI inference, multi-camera grid

10.1.3 Stream Discovery

The edge gateway can auto-discover streams via ONVIF:

from onvif import ONVIFCamera

camera = ONVIFCamera('192.168.29.200', 80, 'admin', 'password')
media_service = camera.create_media_service()
profiles = media_service.GetProfiles()

for profile in profiles:
    stream_uri = media_service.GetStreamUri({
        'StreamSetup': {'Stream': 'RTP_unicast', 'Transport': 'RTSP'},
        'ProfileToken': profile.token
    })
    print(f"Channel: {profile.token}, URI: {stream_uri.Uri}")

10.2 Edge Gateway Stream Handling

10.2.1 FFmpeg Ingestion Pipeline

The edge gateway runs one FFmpeg process per camera stream:

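# Main stream: HLS generation for live viewing
# Note: -stimeout (microseconds) applies to FFmpeg 4.x; FFmpeg 5.0+ removed it in favor of -timeout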
ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -stimeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 -buffer_size 655360 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c:v copy -c:a copy \
    -f hls -hls_time 2 -hls_list_size 5 -hls_delete_threshold 2 \
    -hls_flags delete_segments+omit_endlist+program_date_time \
    -hls_segment_filename "/data/hls/ch1_%04d.ts" \
    "/data/hls/ch1.m3u8" \
    2>> /var/log/ffmpeg_ch1.log

10.2.2 Stream Health Monitoring

Check	Frequency	Failure Action
FFmpeg process alive	Every 5s	Restart process
RTSP connection health	Every 10s	Reconnect with backoff
Frame rate validation	Every 30s	Alert if FPS < 20
Bitrate validation	Every 30s	Alert if bitrate < 50% expected
Disk space check	Every 60s	Alert if < 10% free, emergency if < 5%
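
The thresholds in this table map directly onto a small evaluation function. A minimal sketch, assuming the metrics have already been sampled elsewhere; `StreamSample`, `evaluate_health`, and the action names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamSample:
    process_alive: bool
    fps: float
    bitrate_kbps: float
    expected_bitrate_kbps: float
    disk_free_fraction: float  # 0.0-1.0


def evaluate_health(sample: StreamSample) -> List[str]:
    """Return the failure actions from the table that apply to this sample."""
    actions: List[str] = []
    if not sample.process_alive:
        actions.append("restart_process")
    if sample.fps < 20:
        actions.append("alert_low_fps")
    if sample.bitrate_kbps < 0.5 * sample.expected_bitrate_kbps:
        actions.append("alert_low_bitrate")
    if sample.disk_free_fraction < 0.05:
        actions.append("emergency_disk")
    elif sample.disk_free_fraction < 0.10:
        actions.append("alert_disk")
    return actions
```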

10.2.3 Auto-Reconnect Logic

import random


class StreamReconnectManager:
    """Handles RTSP stream reconnection with exponential backoff."""

    INITIAL_BACKOFF = 1.0       # seconds
    MAX_BACKOFF = 60.0          # seconds
    BACKOFF_MULTIPLIER = 2.0
    JITTER = 0.1                # 10% random jitter

    def __init__(self):
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def on_disconnect(self):
        """Record a failure and return the wait (seconds) before the next attempt."""
        self.consecutive_failures += 1
        # First failure waits INITIAL_BACKOFF; each further failure doubles it, capped at MAX_BACKOFF
        wait_time = min(
            self.current_backoff * (self.BACKOFF_MULTIPLIER ** (self.consecutive_failures - 1)),
            self.MAX_BACKOFF
        )
        # Add jitter to prevent thundering herd
        wait_time *= (1 + random.uniform(-self.JITTER, self.JITTER))
        return wait_time

    def on_success(self):
        """Reset backoff state after a successful connection."""
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def should_circuit_break(self):
        return self.consecutive_failures >= 5  # Open circuit after 5 failures

10.3 HLS Generation for Dashboard

10.3.1 HLS Segment Configuration

Parameter	Value	Rationale
Segment duration (`-hls_time`)	2 seconds	Balance between latency and segment count
Playlist size (`-hls_list_size`)	5 segments	10-second sliding window for live playback
Delete threshold	2 segments beyond playlist size	Disk cleanup
Flags	`delete_segments+omit_endlist+program_date_time`	Live mode, no end list, accurate timing
Segment naming	`ch{N}_%04d.ts`	Sequential numbering for cache busting
Segment path	`/data/hls/`	Fast NVMe storage

10.3.2 Multi-Bitrate HLS (Optional)

For adaptive bitrate streaming, three variants are generated per channel:

# High quality (main stream, copy codec)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v copy -f hls -hls_time 2 \
    -hls_segment_filename "ch1_high_%04d.ts" "ch1_high.m3u8"

# Medium quality (transcoded)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v libx264 -preset fast -crf 23 \
    -vf "scale=640:480" -f hls -hls_time 2 \
    -hls_segment_filename "ch1_mid_%04d.ts" "ch1_mid.m3u8"

# Low quality (sub stream)
ffmpeg -i "rtsp://...channel=1&subtype=1" -c:v copy -f hls -hls_time 2 \
    -hls_segment_filename "ch1_low_%04d.ts" "ch1_low.m3u8"

Master playlist:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=960x1080
ch1_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=640x480
ch1_mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=352x288
ch1_low.m3u8
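
The master playlist can be rendered from a variant list instead of being maintained by hand. This is a sketch under stated assumptions: `Variant` and `master_playlist` are hypothetical names, and the bandwidth and resolution values are copied from the playlist above.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    uri: str
    bandwidth_bps: int
    width: int
    height: int


def master_playlist(variants: Iterable[Variant]) -> str:
    """Render an HLS master playlist, highest bandwidth first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps, reverse=True):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


CH1_VARIANTS = [
    Variant("ch1_low.m3u8", 500_000, 352, 288),
    Variant("ch1_high.m3u8", 4_000_000, 960, 1080),
    Variant("ch1_mid.m3u8", 1_500_000, 640, 480),
]
```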

10.3.3 HLS Latency Budget

Stage	Latency
DVR encoding	50-100 ms
RTSP to edge	1-2 ms
FFmpeg demux/remux	20-50 ms
HLS segment duration	2000 ms (2-second segments)
Nginx/CDN delivery	10-50 ms
HLS.js buffer	2000-4000 ms (1-2 segments)
Browser decode + render	20-50 ms
Total (camera to eye)	~4.1 - 6.3 seconds
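
The end-to-end figure is the sum of the per-stage ranges. A quick check, using the stage values from the table above (the dictionary and function names are hypothetical):

```python
from typing import Dict, Tuple

# (min_ms, max_ms) per stage, copied from the latency budget table
HLS_STAGES_MS: Dict[str, Tuple[int, int]] = {
    "DVR encoding": (50, 100),
    "RTSP to edge": (1, 2),
    "FFmpeg demux/remux": (20, 50),
    "HLS segment duration": (2000, 2000),
    "Nginx/CDN delivery": (10, 50),
    "HLS.js buffer": (2000, 4000),
    "Browser decode + render": (20, 50),
}


def total_latency_ms(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the lower and upper bounds of every stage."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


if __name__ == "__main__":
    low, high = total_latency_ms(HLS_STAGES_MS)
    print(f"{low / 1000:.1f} - {high / 1000:.1f} s")
```

The segment duration and the player buffer dominate the budget, which is why the single-camera low-latency path uses WebRTC rather than shorter HLS segments.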

10.4 WebRTC for Low-Latency Single Camera

For single-camera fullscreen viewing where low latency is critical, WebRTC provides sub-second delivery.

10.4.1 WebRTC Architecture

+------------+    +-------------------+    +-------------------+    +--------+
| Browser    |    | Edge Gateway      |    | FFmpeg            |    | DVR    |
| (WebRTC    |<-->| (WHIP/WHEP        |<-->| (decode RTSP,     |<-->| RTSP   |
|  client)   |    |  bridge)          |    |  encode VP8/H.264)|    | Server |
+------------+    +-------------------+    +-------------------+    +--------+

10.4.2 WebRTC Configuration

Parameter	Value
Signaling protocol	WHIP (ingress) / WHEP (egress)
Video codec	H.264 (hardware) or VP8 (software)
Latency target	< 500 ms end-to-end
ICE servers	STUN only (both peers behind NAT)
Max bitrate	3 Mbps
Resolution	960x1080 (main stream)

10.4.3 WebRTC Latency Budget

Stage	Latency
DVR encoding	50-100 ms
RTSP to edge	1-2 ms
FFmpeg decode + WebRTC encode	30-80 ms
Network (edge to browser via VPN)	100-200 ms
Browser decode	20-50 ms
Total	~200-430 ms

10.5 Multi-Camera Grid Layout

10.5.1 Layout Configurations

Layout	Cameras	Stream Used	Per-Camera Resolution	Total Bandwidth
1x1 (fullscreen)	1	Main (subtype=0)	960x1080	~4 Mbps
2x2 grid	4	Sub (subtype=1)	352x288	~4 Mbps total
3x3 grid	8+1 empty	Sub (subtype=1)	352x288	~8 Mbps total
4x2 grid	8	Sub (subtype=1)	352x288	~8 Mbps total
Custom	User-defined	Mixed	Mixed	Sum of selected

Smart stream selection: The dashboard automatically switches streams based on layout:

Fullscreen single camera -> Main stream (high quality)
Grid layout -> Sub stream (bandwidth-efficient)
Camera clicked for fullscreen -> Dynamically switch to main stream
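
The selection rule reduces to a single comparison on the number of visible cameras. A minimal sketch, assuming the nominal ~4 Mbps main and ~1 Mbps sub stream bitrates from 10.1.2; both function names are hypothetical:

```python
def select_subtype(visible_cameras: int) -> int:
    """Return the RTSP subtype: 0 (main) for fullscreen, 1 (sub) for any grid."""
    if visible_cameras < 1:
        raise ValueError("at least one camera must be visible")
    return 0 if visible_cameras == 1 else 1


def layout_bandwidth_mbps(visible_cameras: int,
                          main_mbps: float = 4.0,
                          sub_mbps: float = 1.0) -> float:
    """Estimate total playback bandwidth for a layout."""
    if select_subtype(visible_cameras) == 0:
        return main_mbps
    return visible_cameras * sub_mbps
```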

10.5.2 Grid Rendering

+-----------------------------------------------------------------------------+
|                         DASHBOARD GRID LAYOUTS                               |
+-----------------------------------------------------------------------------+
|                                                                              |
|  1x1 Layout:                         2x2 Layout:                            |
|  +------------------------+          +----------+----------+                 |
|  |                        |          | CH1      | CH2      |                 |
|  |   Camera 1             |          | (sub)    | (sub)    |                 |
|  |   Main stream          |          |          |          |                 |
|  |   960x1080             |          +----------+----------+                 |
|  |   ~4 Mbps              |          | CH3      | CH4      |                 |
|  +------------------------+          | (sub)    | (sub)    |                 |
|                                      |          |          |                 |
|                                      +----------+----------+                 |
|                                                                              |
|  3x3 Layout (8 cameras):                                                     |
|  +----------+----------+----------+                                          |
|  | CH1      | CH2      | CH3      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH4      | CH5      | CH6      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH7      | CH8      | [Empty]  |                                          |
|  | (sub)    | (sub)    |          |                                          |
|  +----------+----------+----------+                                          |
|                                                                              |
|  Bandwidth: ~8 Mbps total for 3x3 layout (8 x ~1 Mbps sub streams)          |
|                                                                              |
+-----------------------------------------------------------------------------+

10.6 Bandwidth Optimization

10.6.1 Total Bandwidth Budget

Traffic Type	Direction	Bandwidth	Notes
8x RTSP ingestion	DVR -> Edge (local)	~32 Mbps receive	Local LAN only
8x HLS upload to cloud	Edge -> Cloud (via VPN)	~8-16 Mbps upload	Transcoded and compressed
AI frames to cloud	Edge -> Cloud (via VPN)	~2-4 Mbps upload	1 FPS, JPEG compressed
Dashboard HLS playback	Cloud -> Browser	~8 Mbps per user	Cached at CDN
Control/management	Bidirectional	< 1 Mbps	WebSocket, API calls
Total edge upload		~10-20 Mbps	Primary concern for site bandwidth

10.6.2 Optimization Techniques

Technique	Savings	Implementation
Sub-stream for grid view	75% bandwidth reduction	Use subtype=1 (352x288) instead of subtype=0 (960x1080)
H.264 copy (no re-encode) for main stream	Zero CPU overhead	`-c:v copy` when no format change needed
JPEG quality tuning for AI frames	50-70% size reduction	Quality 70-85 depending on scene complexity
Frame deduplication for AI	10-30% frame reduction	Skip frames with < 2% pixel change
HLS segment caching at edge	Reduces cloud upload spikes	5-segment buffer smooths burstiness
Gzip compression for API/WebSocket	60-80% reduction	Content-Encoding: gzip
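
Frame deduplication for AI can be illustrated on raw grayscale buffers. This is a sketch, not the pipeline's implementation: the 2% threshold comes from the table, while the per-pixel tolerance `pixel_delta` and the function names are assumptions. A production version would more likely use vectorized array operations.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes, pixel_delta: int = 16) -> float:
    """Fraction of pixels whose 8-bit intensity moved by more than pixel_delta."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


def should_send(prev: Optional[bytes], curr: bytes,
                min_change: float = 0.02, pixel_delta: int = 16) -> bool:
    """Send the first frame always; skip later frames with < min_change pixel change."""
    if prev is None:
        return True
    return changed_fraction(prev, curr, pixel_delta) >= min_change
```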

10.7 Fallback Handling

10.7.1 Stream Failure Fallback Chain

Step 1: RTSP connection fails
    +-> Retry with exponential backoff (3 attempts)
    +-> Try UDP transport if TCP fails
    +-> Circuit breaker opens after 5 consecutive failures
    |
Step 2: Stream stall detected (no frames for 10s)
    +-> Kill FFmpeg process
    +-> Restart with fresh connection
    |
Step 3: Camera marked OFFLINE
    +-> Dashboard shows "Camera Offline" placeholder
    +-> HLS endpoint serves the static offline playlist (see 10.7.2)
    +-> Last known frame displayed with timestamp overlay
    +-> Alert sent to operations team
    |
Step 4: Camera recovers
    +-> Circuit breaker transitions to HALF_OPEN
    +-> Test stream pulled for 10 seconds
    +-> On success: circuit CLOSED, stream resumes
    +-> Dashboard auto-refreshes
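
The CLOSED / OPEN / HALF_OPEN transitions in the chain above can be sketched as a small state machine. The class and method names are hypothetical; the 5-failure threshold is the one used in 10.2.3, and the 10-second test pull is assumed to be performed by the caller between `begin_probe` and the success or failure report.

```python
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class StreamCircuitBreaker:
    FAILURE_THRESHOLD = 5

    def __init__(self) -> None:
        self.state = CircuitState.CLOSED
        self.failures = 0

    def record_failure(self) -> None:
        """A failed probe reopens the circuit; otherwise open at the threshold."""
        self.failures += 1
        if (self.state is CircuitState.HALF_OPEN
                or self.failures >= self.FAILURE_THRESHOLD):
            self.state = CircuitState.OPEN

    def begin_probe(self) -> None:
        """Move OPEN -> HALF_OPEN before pulling the test stream."""
        if self.state is CircuitState.OPEN:
            self.state = CircuitState.HALF_OPEN

    def record_success(self) -> None:
        """A healthy stream closes the circuit and clears the failure count."""
        self.failures = 0
        self.state = CircuitState.CLOSED
```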

10.7.2 Offline Placeholder

When a camera is offline, the HLS endpoint returns a static playlist:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts

The dashboard detects the custom #EXT-X-ERROR tag (not part of the HLS specification, so standard players ignore it) and displays a camera offline indicator with the last known timestamp.
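
The detection logic is a line scan over the playlist text. The dashboard itself would run this in the browser; the sketch below shows the same logic in Python, with `offline_reason` as a hypothetical helper name.

```python
from typing import Optional

ERROR_TAG = "#EXT-X-ERROR:"


def offline_reason(playlist_text: str) -> Optional[str]:
    """Return the message carried by the custom error tag, or None if absent."""
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        if line.startswith(ERROR_TAG):
            return line[len(ERROR_TAG):].strip().strip('"')
    return None


OFFLINE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts
"""
```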

10.7.3 Edge Buffer Management

The 2TB NVMe edge storage is partitioned for circular buffer operation:

Directory	Max Size	Retention	Cleanup
`/data/hls/`	20 GB	Rolling (5 segments)	Automatic via FFmpeg
`/data/buffer/ch1-ch8/`	1.5 TB	7 days circular	Age-based FIFO
`/data/buffer/ai_frames/`	100 GB	24 hours	Age-based
`/data/buffer/evidence/`	200 GB	30 days	Event-linked retention
`/data/logs/`	10 GB	30 days	Logrotate
`/data/tmp/`	50 GB	On process exit	Cleanup on restart
Total reserved	~1.88 TB	—	Fits in 2TB NVMe
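
Retention for the continuous-recording buffer follows from capacity and aggregate bitrate. A rough check, assuming decimal units (1 TB = 10^12 bytes), constant bitrate, and main streams only; both function names are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def retention_days(capacity_bytes: float, channels: int,
                   mbps_per_channel: float) -> float:
    """Days of continuous recording that fit in capacity_bytes."""
    bytes_per_day = channels * mbps_per_channel * 1_000_000 / 8 * SECONDS_PER_DAY
    return capacity_bytes / bytes_per_day


def max_mbps_for_retention(capacity_bytes: float, channels: int,
                           days: float) -> float:
    """Highest average per-channel bitrate that still meets the retention target."""
    return capacity_bytes * 8 / (days * SECONDS_PER_DAY * channels) / 1_000_000


if __name__ == "__main__":
    print(round(retention_days(1.5e12, 8, 4.0), 2))
    print(round(max_mbps_for_retention(1.5e12, 8, 7), 2))
```

At the nominal ~4 Mbps per channel, 1.5 TB holds roughly 4.3 days of footage for 8 channels. Meeting the 7-day target in the table would require an average of about 2.5 Mbps per channel, for example through lower night-time bitrates, so the retention figure should be validated against measured stream bitrates.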

Buffer exhaustion handling:

At 80% capacity: Alert admin, begin aggressive cleanup of old non-evidence data
At 90% capacity: Stop non-critical buffering (AI frames), preserve HLS + evidence only
At 95% capacity: Emergency mode — evidence-only recording, all other buffers purged
Never delete evidence clips linked to unresolved alerts
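
The capacity thresholds translate into a simple mode lookup. A minimal sketch; the mode names are hypothetical labels for the three tiers listed above:

```python
def buffer_mode(used_fraction: float) -> str:
    """Map disk usage (0.0-1.0) to the buffer exhaustion tier."""
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0 and 1")
    if used_fraction >= 0.95:
        return "emergency"        # evidence-only recording
    if used_fraction >= 0.90:
        return "critical_only"    # HLS + evidence only
    if used_fraction >= 0.80:
        return "cleanup"          # alert admin, purge old non-evidence data
    return "normal"


def may_delete_clip(linked_to_unresolved_alert: bool) -> bool:
    """Evidence linked to an unresolved alert is never deleted, in any mode."""
    return not linked_to_unresolved_alert
```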

10.7.4 DVR Full Disk Mitigation

Since the DVR disk is full (0 bytes free), the system does not rely on DVR-side recording:

Function	Traditional	Our Design
Continuous recording	DVR internal HDD	Edge gateway 2TB NVMe buffer
Event/alert clips	DVR playback export	Cloud MinIO + S3 archival
Long-term storage	DVR disk rotation	AWS S3 tiered lifecycle
Playback	DVR web UI	Cloud dashboard with timeline

Reference: For complete FFmpeg commands including multi-output tee muxer, frame extraction for AI, WebRTC bridge code, and the ring buffer implementation, see video_ingestion.md — Sections 4-7.

End of Part A (Sections 1-10)

This unified technical blueprint synthesizes outputs from 11 specialist agents across 6 domain-specific design documents. For detailed implementation code, DDL, algorithms, and configuration, refer to the individual specialist documents listed in the cross-reference guide at the top of this document.

Document	Path	Content
Architecture	`architecture.md`	Full deployment specs, scaling, cost, failover
Video Ingestion	`video_ingestion.md`	RTSP config, FFmpeg, edge gateway, HLS, WebRTC
AI Vision	`ai_vision.md`	Model configs, inference pipeline, benchmarks
Database Schema	`database_schema.md`	Complete DDL, triggers, views, RLS
Suspicious Activity	`suspicious_activity.md`	10 detection modules, scoring engine
Training System	`training_system.md`	Learning pipeline, quality gates, versioning

Sentinel AI Surveillance Platform — Unified Technical Blueprint (Part B)

Document Property	Value
Document Version	1.0
Date	2025-01-16
Classification	Confidential — Internal Use Only
Author	Technical Architecture Team

Part B Table of Contents

Section 11: Alerting Design (Notification System)
Section 12: Security Design
Section 13: UX / Website Structure
Section 14: Deployment Plan
Section 15: Testing Plan
Section 16: Self-Test Framework
Section 17: Sample Self-Test Report
Section 18: Risks and Mitigations
Section 19: Final Implementation Roadmap
Section 20: Final Production-Readiness Summary

Section 11: Alerting Design

11.1 Architecture Overview

The notification system uses an event-driven architecture built on Redis Pub/Sub for real-time message distribution. All detection events, system alerts, and manual triggers flow through a single pipeline that delivers over two channels: the Telegram Bot API and the WhatsApp Business API (Meta Official). Critical security alerts are protected against loss by retry logic with exponential backoff and a dead letter queue, while token-bucket rate limiting keeps delivery within each channel's API limits.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         ALERTING ARCHITECTURE                                 │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                         EVENT SOURCES                            │       │
│   │                                                                  │       │
│   │  Detection Pipeline ──▶ New person detected                     │       │
│   │  Face Recognition ────▶ Known/Unknown/Watchlist match           │       │
│   │  System Monitors ─────▶ Camera offline, Storage full, VPN down   │       │
│   │  Manual Triggers ─────▶ Operator-initiated alerts                │       │
│   │  AI Anomaly Engine ───▶ Suspicious activity detected             │       │
│   └──────────────────────────┬──────────────────────────────────────┘       │
│                              │                                               │
│                              ▼                                               │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                    REDIS PUB/SUB                                 │       │
│   │                                                                  │       │
│   │  Channel: alerts.critical  ─── High priority, immediate process  │       │
│   │  Channel: alerts.high      ─── Standard priority                 │       │
│   │  Channel: alerts.medium    ─── Batched processing                │       │
│   │  Channel: system.health    ─── System health events              │       │
│   └──────────────┬───────────────────────────────────────────────────┘       │
│                  │                                                            │
│   ┌──────────────┴───────────────────────────────────────────────────┐       │
│   │                    NOTIFICATION ROUTER                           │       │
│   │                     (Python/FastAPI)                             │       │
│   │                                                                  │       │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │       │
│   │  │ Event Parser │──▶ Rules Engine │──▶ Channel Selector    │  │       │
│   │  └──────────────┘  └──────────────┘  └──────────────────────┘  │       │
│   └──────────────────────────┬───────────────────────────────────────┘       │
│                              │                                               │
│          ┌───────────────────┼───────────────────┐                           │
│          ▼                   ▼                   ▼                           │
│   ┌─────────────┐   ┌───────────────┐   ┌──────────────────┐               │
│   │   TEMPLATE  │   │    RATE       │   │   ESCALATION     │               │
│   │   RENDERER  │   │   LIMITER     │   │    ENGINE        │               │
│   │             │   │               │   │                  │               │
│   │  HTML/TXT   │   │ Token Bucket  │   │ 3-level timeout  │               │
│   │  per channel│   │ 4-tier limits │   │ Auto-escalation  │               │
│   └──────┬──────┘   └───────┬───────┘   └────────┬─────────┘               │
│          │                  │                     │                         │
│          └──────────────────┼─────────────────────┘                         │
│                             ▼                                               │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    CHANNEL ADAPTERS                              │      │
│   │                                                                  │      │
│   │  ┌──────────────────────────┐  ┌──────────────────────────┐     │      │
│   │  │    TELEGRAM BOT API      │  │  WHATSAPP BUSINESS API   │     │      │
│   │  │                          │  │                          │     │      │
│   │  │  - HTML formatting       │  │  - Template messages     │     │      │
│   │  │  - Inline keyboards      │  │  - Session messages      │     │      │
│   │  │  - Media groups          │  │  - Interactive messages  │     │      │
│   │  │  - Edit/Delete messages  │  │  - Media attachments     │     │      │
│   │  │  - Webhook receipts      │  │  - Message status API    │     │      │
│   │  └──────────┬───────────────┘  └──────────┬───────────────┘     │      │
│   └─────────────┼─────────────────────────────┼─────────────────────┘      │
│                 │                             │                              │
│                 ▼                             ▼                              │
│          ┌──────────────┐            ┌──────────────┐                       │
│          │  Telegram    │            │  WhatsApp    │                       │
│          │  Servers     │            │  Cloud API   │                       │
│          └──────────────┘            └──────────────┘                       │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                    SUPPORTING SERVICES                           │       │
│   │                                                                  │       │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │       │
│   │  │ RETRY MGR    │  │    DLQ       │  │  DELIVERY TRACKER    │  │       │
│   │  │ Exponential  │  │ Redis-backed │  │  Webhook callbacks   │  │       │
│   │  │ 5 max        │  │ Admin review │  │  Status dashboard    │  │       │
│   │  └──────────────┘  └──────────────┘  └──────────────────────┘  │       │
│   └─────────────────────────────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────────────────────────────┘

Key Design Principles:

Principle	Implementation
Guaranteed delivery	At-least-once delivery via retry with exponential backoff; dead letter queue for permanent failures
Ordered processing	Events within a single camera stream processed in sequence; no alert reordering
Non-blocking	Alert generation does not block the detection pipeline; async processing via queues
Channel isolation	Failure in one channel (e.g., Telegram down) does not affect the other (WhatsApp continues)
Deduplication	5-minute window for duplicate suppression; composite key based on camera + person + event type
Observability	Every notification tracked from creation through delivery with full audit trail
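
The deduplication principle can be sketched as a keyed timestamp map. This is an illustration only: `AlertDeduplicator` is a hypothetical in-memory class, and it assumes the 5-minute window starts at the first accepted alert rather than sliding with each duplicate. A shared store would be needed if several router instances run in parallel.

```python
import time
from typing import Dict, Optional, Tuple

DedupKey = Tuple[str, str, str]  # (camera_id, person_id, event_type)


class AlertDeduplicator:
    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        self._last_accepted: Dict[DedupKey, float] = {}

    def is_duplicate(self, camera_id: str, person_id: str, event_type: str,
                     now: Optional[float] = None) -> bool:
        """Return True if the same key was accepted within the window."""
        now = time.monotonic() if now is None else now
        key = (camera_id, person_id, event_type)
        last = self._last_accepted.get(key)
        if last is not None and now - last < self.window_s:
            return True
        self._last_accepted[key] = now
        return False
```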

11.2 Telegram Integration

11.2.1 Bot API Configuration

Telegram integration uses the official Telegram Bot API for message delivery. The bot is configured with encrypted tokens stored in HashiCorp Vault, with HTML message formatting for rich alert presentation.

Parameter	Value	Notes
API Base URL	`https://api.telegram.org/bot<TOKEN>/`	Standard Bot API endpoint
API Version	Bot API 7.x	Latest stable as of Q1 2025
Token Storage	HashiCorp Vault (AES-256-GCM encrypted)	Rotated every 180 days
Communication	HTTPS POST + long-polling (`getUpdates`) fallback	TLS 1.3 required for all calls
Message Format	HTML subset	`<b>`, `<i>`, `<code>`, `<pre>`, `<a href>` tags supported
Max Message Size	4096 characters per message	Longer messages auto-split into parts
Media Size Limit (Image)	10 MB per image	Processed via Pillow for compression
Media Size Limit (Video)	50 MB per video	Processed via FFmpeg for re-encoding
Media Group Limit	Up to 10 items per media group	Album delivery for multi-image alerts
Global Rate Limit	30 messages per second	Across all chats
Per-Chat Rate Limit	1 message per second	Per conversation throttling
Webhook Endpoint	`/webhooks/telegram`	Receives updates (commands, callback queries, membership changes)
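
The global and per-chat limits in this table are the kind of constraint a token bucket enforces. A minimal single-process sketch, assuming one bucket per limit (for example, one global bucket at 30/s plus one bucket per chat at 1/s); `TokenBucket` is a hypothetical class, not the router's actual limiter:

```python
import time
from typing import Optional


class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float,
                 now: Optional[float] = None) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Refill by elapsed time, then spend one token if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A message is sent only when every applicable bucket grants a token; otherwise it stays queued for the next attempt.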

11.2.2 Bot Features and Capabilities

Inline Keyboards: Every alert message includes contextual action buttons that allow operators to respond directly from Telegram without opening the web dashboard.

Keyboard Type	Buttons	Actions
Standard Alert	Acknowledge / View Live / Details	Confirm receipt, open stream, view full info
Watchlist Alert	Acknowledge / View Live / Escalate / Details	Includes escalation for watchlist matches
Blacklist Alert	ACKNOWLEDGE NOW / View Live / Dispatch Security / Escalate / Details	Highest priority actions for blacklist
Escalation Notice	Acknowledge / View Original Alert	Acknowledge escalated alert or view source
System Alert	Acknowledge / View Dashboard / Details	System-level alert actions

Media Groups: When an alert contains multiple evidence images (up to 10), they are sent as a Telegram media group (album). This presents all related images in a single scrollable gallery rather than individual messages, reducing chat clutter.

Webhook Updates: Telegram does not send per-message delivery receipts; the webhook instead receives update objects such as incoming messages and button callbacks:

Webhook Type	Trigger	Action
`message`	Bot receives a command	Process command (e.g., /status, /acknowledge)
`callback_query`	User clicks inline button	Execute action, update message status
`edited_message`	Message edited externally	Log for audit trail
`my_chat_member`	Bot added/removed from chat	Update recipient group membership

Chat Commands:

Command	Description	Response
`/status`	Get system health status	Camera count, offline count, last alert time
`/acknowledge <alert_id>`	Acknowledge an alert	Confirmation or error message
`/cameras`	List all cameras and their status	Camera name, status, last seen
`/health`	Get edge gateway health	CPU, memory, disk, VPN status
`/help`	Show available commands	Command reference

11.2.3 Security Considerations

Telegram bot tokens are among the most sensitive credentials in the system. The following security measures are implemented:

Measure	Implementation
Encryption at rest	AES-256-GCM in Vault
Token rotation	Every 180 days or immediately on compromise suspicion
Rotation procedure	1) Generate new token via BotFather, 2) Update Vault, 3) Notify services to hot-reload, 4) 5-minute grace period, 5) Revoke old token
IP allowlisting	Webhook endpoint accepts only Telegram IP ranges
Webhook secret	Constant-time check of the `X-Telegram-Bot-Api-Secret-Token` header against the `secret_token` registered via `setWebhook`
No token logging	Tokens never appear in application logs
No token in code	Tokens injected via Vault at runtime
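
Telegram echoes the `secret_token` supplied to `setWebhook` in the `X-Telegram-Bot-Api-Secret-Token` header of every webhook request. A sketch of the check, assuming the request headers are available as a plain dictionary; `is_trusted_webhook` is a hypothetical helper:

```python
import hmac
from typing import Mapping

SECRET_HEADER = "x-telegram-bot-api-secret-token"


def is_trusted_webhook(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Compare the secret header to the configured value in constant time."""
    if not expected_secret:
        return False  # refuse to run without a configured secret
    received = ""
    for name, value in headers.items():
        if name.lower() == SECRET_HEADER:  # header names are case-insensitive
            received = value
            break
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

Requests that fail the check should be rejected before the payload is parsed.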

11.3 WhatsApp Business API Integration

11.3.1 Meta Cloud API Configuration

WhatsApp integration uses Meta's official Cloud API (Business Platform), which provides a reliable, enterprise-grade messaging channel. This requires a verified Meta Business account and pre-approved message templates for proactive messaging.

Parameter	Value	Notes
API Base URL	`https://graph.facebook.com/v18.0/`	Meta Graph API v18.0 minimum
Authentication	Permanent Access Token	Scoped to WhatsApp Business Management
Token Storage	HashiCorp Vault (AES-256-GCM encrypted)	Rotated every 180 days
Phone Number ID	Dedicated business phone number	Not shared with other WhatsApp uses
Business Account	Verified Meta Business Account	Required for template message approval
Message Types	Template messages + Session messages	Template for first contact; session for replies
Media Size Limit	5 MB (image), 16 MB (video, audio), 100 MB (document)	Image limit is stricter than Telegram; aggressive compression needed
Supported Media	JPEG, PNG, MP4 (H.264), PDF, Audio	Format validation before upload
Global Rate Limit	80 messages per second	Across all recipients
Per-Recipient Rate Limit	20 messages per minute	Per WhatsApp ID throttling
Webhook Endpoint	`/webhooks/whatsapp`	Receives message status updates

11.3.2 Message Types

Template Messages: Pre-approved message templates are required for any proactive (business-initiated) message. Templates must be created and submitted for approval in Meta Business Manager. Each template contains named parameters that are dynamically populated at send time.

Template Name	Purpose	Parameters	Approval Status
`person_detected_known`	Known person detected	name, role, camera, date, time, confidence, alert_id	Approved
`person_detected_unknown`	Unknown person alert	camera, date, time, confidence	Approved
`watchlist_match`	Person on watchlist detected	name, watchlist_type, camera, date, time	Approved
`blacklist_alert`	Blacklisted person detected	name, camera, date, time	Approved
`suspicious_activity`	Suspicious behavior detected	activity_type, camera, date, time, confidence	Approved
`system_alert`	System health alert	message, timestamp, severity	Approved
`escalation_notice`	Alert escalation notification	alert_id, level, summary, elapsed_minutes	Approved
`daily_digest`	Daily summary of activity	date, total_detections, total_alerts, top_cameras	Approved
`test_message`	System test	timestamp	Approved

Session Messages: Within a 24-hour window after a user sends a message to the business, free-form session messages can be sent without template restrictions. This is used for:

Acknowledgment confirmations
Escalation follow-ups
Interactive conversations initiated by the recipient
Quick reply responses
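
The choice between a template and a session message depends only on when the recipient last wrote to the business number. A minimal sketch, assuming that timestamp is tracked per recipient; `requires_template` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_WINDOW = timedelta(hours=24)


def requires_template(last_inbound_at: Optional[datetime], now: datetime) -> bool:
    """True when no inbound message exists within the last 24 hours."""
    if last_inbound_at is None:
        return True
    return now - last_inbound_at >= SESSION_WINDOW


if __name__ == "__main__":
    now = datetime(2025, 1, 16, 12, 0, tzinfo=timezone.utc)
    print(requires_template(now - timedelta(hours=3), now))
```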

11.3.3 Webhook Event Handling

Webhook Event	Trigger	System Action
`messages.delivered`	Message delivered to device	Update delivery status to `delivered`
`messages.read`	Recipient read the message	Update delivery status to `read`
`messages.failed`	Message delivery failed	Trigger retry or move to DLQ
`message_reaction`	Recipient reacted to message	Log for engagement metrics
`account_alerts`	Meta account issue	Alert admin, review account status
`template_category_update`	Template status change	Update template catalog

11.4 Alert Routing Rules Engine

11.4.1 Condition Types

The routing engine evaluates 9 distinct condition types to determine which recipients receive which alerts through which channels. Multiple conditions can be combined with AND/OR logic for precise targeting.

#	Condition Type	Description	Example Values	Operators
1	`camera`	Source camera identifier	"CAM-01", "CAM-02", "entrance-cam"	equals, in, not_in
2	`person`	Detected known person	"John Smith", "Jane Doe"	equals, in, not_in
3	`role`	Person role category	"employee", "visitor", "vendor", "contractor", "security"	equals, in
4	`event_type`	Type of detection event	"person_detected", "unknown_person", "suspicious_activity", "crowd_gathering", "camera_tamper"	equals, in
5	`zone`	Detection zone name	"entrance", "restricted_area", "parking", "lobby", "warehouse"	equals, in
6	`time`	Time of day range	"08:00-18:00", "22:00-06:00"	between, not_between
7	`day`	Day of week	"monday", "weekday", "weekend"	equals, in
8	`severity`	Alert severity level	"critical", "high", "medium", "low", "info"	equals, in, gte
9	`watchlist`	Watchlist membership	"vip", "blacklist", "authorized", "temporary_access"	equals, in

11.4.2 Rule Structure

Each routing rule consists of conditions, actions, and metadata:

rule:
  id: "rule-001"
  name: "Blacklist Immediate Alert"
  enabled: true
  priority: 100  # Higher number = evaluated first
  
  conditions:
    operator: "AND"
    conditions:
      - field: "watchlist"
        operator: "equals"
        value: "blacklist"
      - field: "severity"
        operator: "in"
        value: ["critical", "high"]
  
  actions:
    - channel: "telegram"
      recipients: ["security_team", "management"]
      template: "blacklist_alert"
      media: ["image", "video"]
      bypass_quiet_hours: true
      priority: "high"
    
    - channel: "whatsapp"
      recipients: ["security_manager"]
      template: "blacklist_alert"
      media: ["image"]
      bypass_quiet_hours: true
  
  metadata:
    created_by: "admin"
    created_at: "2025-01-01T00:00:00Z"
    last_modified: "2025-01-10T12:00:00Z"
    tags: ["critical", "blacklist"]
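
A minimal sketch of how a condition tree like the one above might be evaluated, assuming each alert is a flat dict keyed by condition field. The function names, the severity ranking used for `gte`, and the in-memory rule list are assumptions for illustration; the time-range operators (`between`, `not_between`) are omitted.

```python
# Hypothetical evaluator for the rule structure shown above.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]  # assumed ranking for "gte"


def evaluate_condition(cond, alert):
    """Evaluate one leaf condition ({field, operator, value}) against an alert dict."""
    actual = alert.get(cond["field"])
    op, expected = cond["operator"], cond["value"]
    if op == "equals":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "gte":  # severity comparison by rank
        return (actual in SEVERITY_ORDER
                and SEVERITY_ORDER.index(actual) >= SEVERITY_ORDER.index(expected))
    raise ValueError(f"Unsupported operator: {op}")


def evaluate_group(group, alert):
    """Evaluate {'operator': 'AND'|'OR', 'conditions': [...]}; groups may nest."""
    results = (
        evaluate_group(c, alert) if "conditions" in c else evaluate_condition(c, alert)
        for c in group["conditions"]
    )
    return all(results) if group["operator"] == "AND" else any(results)


def matching_rules(rules, alert):
    """Return enabled rules that match, highest priority first."""
    active = sorted((r for r in rules if r.get("enabled", True)),
                    key=lambda r: r["priority"], reverse=True)
    return [r for r in active if evaluate_group(r["conditions"], alert)]
```

With the "Blacklist Immediate Alert" rule loaded as a dict, an alert such as `{"watchlist": "blacklist", "severity": "high"}` matches, while the same alert with `"watchlist": "vip"` does not.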

11.4.3 Default Routing Rules

The system ships with a comprehensive set of default routing rules that cover common surveillance scenarios:

#	Scenario	Conditions	Severity	Recipients	Channels	Media	Quiet Hours
1	Known employee normal hours	role=employee, time=08:00-18:00, weekday	Info	None (log only)	—	—	N/A
2	Known employee after hours	role=employee, time=18:00-08:00	Low	Security team	Telegram	Image	Respected
3	Known visitor during hours	role=visitor, time=08:00-18:00	Low	Reception desk	Telegram	Image	Respected
4	Unknown person detected	event_type=unknown_person	Medium	Security team	Telegram + WhatsApp	Image	Respected
5	Unknown person after hours	event_type=unknown_person, time=22:00-06:00	High	Security team + Manager	Both	Image + Video	Bypassed
6	Watchlist match	watchlist=watchlist	High	Security team	Both	Image + Video	Respected
7	Blacklist match	watchlist=blacklist	Critical	All groups	Both (bypass quiet)	Image + Video	Bypassed
8	VIP detected	watchlist=vip	Low	Reception desk	Telegram	Image	Respected
9	Camera offline	event_type=camera_offline	High	IT team + Security team	Telegram	None	Bypassed
10	Storage > 90%	event_type=storage_warning	High	IT team + Management	Both	None	Bypassed
11	Storage > 95%	event_type=storage_critical	Critical	All groups	Both (bypass quiet)	None	Bypassed
12	VPN tunnel down	event_type=vpn_down	Critical	IT team + Management	Both (bypass quiet)	None	Bypassed
13	Suspicious activity	event_type=suspicious_activity	High	Security team	Both	Image + Video	Respected
14	Crowd gathering	event_type=crowd_gathering	Medium	Security team	Telegram	Image	Respected

11.5 Recipient Groups and Quiet Hours

11.5.1 Recipient Group Management

Recipient groups are the primary mechanism for organizing alert destinations. Each group contains one or more contacts with specified channels.

Group Name	Members	Primary Channel	Backup Channel	Alert Preferences	Quiet Hours
Security Team	On-site security guards	Telegram	WhatsApp	All except info	Disabled
Security Manager	Shift supervisor	WhatsApp	Telegram	Medium and above	Disabled
IT Team	Infrastructure staff	Telegram	WhatsApp	System alerts only	Nights
Management	Facility managers	WhatsApp	Telegram	Critical only	Disabled
Reception	Front desk staff	Telegram	None	Visitor-related, VIP	Disabled
After-Hours	On-call personnel	WhatsApp	Telegram	High and Critical	Disabled

Group Configuration Interface:

Groups are managed through the web dashboard at /settings/notifications/groups. Each group can be configured with:

Setting	Description
Group name	Human-readable identifier
Description	Purpose of the group
Members	List of Telegram chat IDs and WhatsApp phone numbers
Default channel	Primary delivery channel
Alert severity filter	Minimum severity to deliver
Quiet hours override	Whether quiet hours apply to this group
Media preferences	Which media types to include
Max alerts per hour	Rate limit for this group

11.5.2 Quiet Hours Configuration

Quiet hours allow suppressing non-critical alerts during configured time windows. Critical alerts always bypass quiet hours — this is a non-configurable safety measure.

quiet_hours:
  enabled: false                    # DISABLED BY DEFAULT for security
  preset: "none"                    # none / nights / weekends / custom
  
  custom_schedule:
    - label: "Weekday Nights"
      days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      start_time: "22:00"
      end_time: "06:00"
      timezone: "Asia/Kolkata"
      
    - label: "Weekend All Day"
      days: ["Saturday", "Sunday"]
      start_time: "00:00"
      end_time: "23:59"
      timezone: "Asia/Kolkata"
  
  allowed_during_quiet:             # Which severities bypass quiet hours
    - "critical"                    # Always delivered (non-configurable)
    
  emergency_bypass:
    enabled: true
    triggers:
      - severity: "critical"
      - tag: "emergency"
      - rule_override: "bypass_quiet_hours"
    notification_method: "all_channels"
    
  suppression_behavior: "queue"     # queue / discard / digest
  # "queue": Hold until quiet hours end
  # "discard": Drop non-critical alerts entirely
  # "digest": Send summary when quiet hours end

Security Note: Quiet hours are disabled by default because the surveillance use case requires continuous awareness. Any decision to enable quiet hours must be documented with security team sign-off.
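
The quiet-hours decision can be sketched as follows. The key detail is that a window such as 22:00-06:00 crosses midnight, so a plain `start <= now < end` test is not enough. Function names and parameters are assumptions; timezone conversion and day-of-week matching are left out for brevity.

```python
from datetime import time


def in_window(now, start, end):
    """True if `now` falls in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 22:00-06:00


def should_deliver_now(severity, now, start, end, quiet_enabled=True, rule_bypass=False):
    """Decide whether an alert is delivered immediately.

    Critical alerts and rules flagged bypass_quiet_hours are always delivered;
    everything else is held back (queued, discarded, or digested) inside the window.
    """
    if not quiet_enabled or severity == "critical" or rule_bypass:
        return True
    return not in_window(now, start, end)


print(should_deliver_now("low", time(23, 30), time(22, 0), time(6, 0)))       # False
print(should_deliver_now("critical", time(23, 30), time(22, 0), time(6, 0)))  # True
```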

11.5.3 Per-Recipient Quiet Hours

Individual recipients can configure personal quiet hours that override group settings:

Recipient	Personal Quiet Hours	Group Setting	Effect
Security Guard A	None	Security Team (Disabled)	Receives all alerts
IT Manager	23:00-07:00	IT Team (Nights)	Matches group — no IT alerts at night
Manager B	22:00-08:00	Management (Disabled)	Personal quiet hours applied

11.6 Message Templates

11.6.1 Telegram HTML Templates

All Telegram templates use a safe HTML subset for rich formatting with inline action keyboards.

Template: Person Detected (Known)

🔍 <b>Person Detected</b>

<b>{name}</b> ({role})
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<a href="{dashboard_url}">View in Dashboard</a>

Template: Unknown Person Detected

❓ <b>Unknown Person Detected</b>

📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<i>This person is not in the database.</i>

<a href="{naming_url}">Name This Person</a>

Template: Watchlist Match

⚠️ <b>WATCHLIST ALERT</b>

<b>{name}</b>
📋 Watchlist: {watchlist_type}
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<i>This person is on a watchlist and requires attention.</i>

Template: Blacklist Alert

🚨 <b>BLACKLIST ALERT</b> 🚨

⚠️ <b>{name}</b> has been detected!
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<b>This person is BLACKLISTED. Immediate attention required.</b>

<a href="{dispatch_url}">🚨 Dispatch Security</a>

Template: Escalation Notice

⬆️ <b>Alert Escalated — Level {escalation_level}</b>

Alert #{alert_id} has been escalated.

Original: {alert_summary}
⏱️ Unacknowledged for {elapsed_minutes} minutes
Threshold: {threshold_minutes} minutes

<i>Please review immediately.</i>

Template: System Alert

⚙️ <b>System Alert</b>

{message}

🕐 {timestamp}
Severity: {severity}

<a href="{health_dashboard_url}">View System Health</a>

Template: Daily Digest

📊 <b>Daily Activity Digest — {date}</b>

👥 Persons Detected: {total_detections}
🔔 Alerts Generated: {total_alerts}
📹 Cameras Online: {cameras_online}/{cameras_total}

Top Cameras:
{camera_list}

<a href="{full_report_url}">View Full Report</a>

11.6.2 WhatsApp Template Format

WhatsApp templates use a different format — they are pre-registered with Meta and use numbered parameter substitution:

Template: person_detected_known

🔍 Person Detected

{{1}} ({{2}})
📍 Camera: {{3}}
🕐 {{4}} at {{5}}
🎯 Confidence: {{6}}%
Alert ID: {{7}}

Parameters: {{1}}=name, {{2}}=role, {{3}}=camera_name, {{4}}=date, {{5}}=time, {{6}}=confidence, {{7}}=alert_id

11.6.3 Template Variable Reference

Variable	Description	Source	Example
`{name}`	Detected person's name	Person database	"John Smith"
`{role}`	Person's role	Person database	"Employee"
`{camera_name}`	Camera display name	Camera configuration	"Main Entrance"
`{date}`	Event date	Event timestamp	"2025-01-16"
`{time}`	Event time	Event timestamp	"14:32:15"
`{confidence}`	Detection confidence %	AI inference result	"97.3"
`{alert_id}`	Unique alert identifier	Alert database	"ALT-20250116-001"
`{watchlist_type}`	Watchlist category	Watchlist configuration	"Blacklist"
`{activity_type}`	Type of suspicious activity	AI classification	"Loitering"
`{severity}`	Alert severity	Rules engine	"Critical"
`{dashboard_url}`	Deep link to dashboard	System configuration	"https://..."
`{elapsed_minutes}`	Time since alert creation	System clock	"15"
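
As an illustration of how the variables above might be substituted, the sketch below renders a shortened Telegram template with HTML-escaped values and builds the ordered parameter list for the WhatsApp `person_detected_known` template. The helper names and the shortened template text are assumptions; only Python's standard `html.escape` and `str.format` are used.

```python
import html

# Shortened version of the "Person Detected (Known)" template, for illustration
TELEGRAM_TEMPLATE = (
    "🔍 <b>Person Detected</b>\n\n"
    "<b>{name}</b> ({role})\n"
    "📍 Camera: {camera_name}"
)

# Order of {{1}}..{{7}} for the WhatsApp person_detected_known template
WHATSAPP_PARAM_ORDER = ["name", "role", "camera_name", "date", "time", "confidence", "alert_id"]


def render_telegram(template, variables):
    """Fill {placeholders}, escaping values so they cannot inject HTML tags."""
    safe = {key: html.escape(str(value)) for key, value in variables.items()}
    return template.format(**safe)


def whatsapp_parameters(variables, order=WHATSAPP_PARAM_ORDER):
    """Return template parameters as strings in numbered-placeholder order."""
    return [str(variables[key]) for key in order]


event = {
    "name": "John Smith", "role": "Employee", "camera_name": "Main Entrance",
    "date": "2025-01-16", "time": "14:32:15", "confidence": 97.3,
    "alert_id": "ALT-20250116-001",
}
print(render_telegram(TELEGRAM_TEMPLATE, event))
print(whatsapp_parameters(event))
```

Escaping matters because names entered by operators (for example "R&D <Lab>") would otherwise be parsed as markup by Telegram's HTML mode.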

11.7 Retry Logic and Rate Limiting

11.7.1 Retry Configuration

Failed notifications are retried using an exponential backoff strategy to avoid overwhelming downstream services.

Parameter	Value	Description
Maximum retries	5	After the initial attempt and 5 failed retries, move to DLQ
Base delay	2 seconds	Initial retry wait time
Exponential base	2	Delay multiplier (2^n)
Maximum delay	300 seconds (5 minutes)	Cap on retry delay
Jitter	Up to 1 second random	Prevents thundering herd

Retry Schedule:

Attempt	Delay	Cumulative Time
1 (initial)	Immediate	0s
2	2s + jitter	~2s
3	4s + jitter	~6s
4	8s + jitter	~14s
5	16s + jitter	~30s
6 (final)	32s + jitter	~62s
DLQ	—	After 62s total
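
The schedule above follows directly from the retry parameters. A minimal sketch of the delay calculation, assuming jitter is added after the 300-second cap is applied (the function name and the injectable `rng` parameter are illustrative):

```python
import random


def retry_delay(retry_number, base_delay=2.0, factor=2.0,
                max_delay=300.0, max_jitter=1.0, rng=random.random):
    """Seconds to wait before retry `retry_number` (1 = first retry).

    Delay doubles on each retry, is capped at max_delay, and gets up to
    max_jitter seconds of random jitter to spread out simultaneous retries.
    """
    backoff = min(base_delay * factor ** (retry_number - 1), max_delay)
    return backoff + rng() * max_jitter


# Without jitter the five retries wait 2, 4, 8, 16 and 32 seconds (62 s in total)
schedule = [retry_delay(n, rng=lambda: 0.0) for n in range(1, 6)]
print(schedule, sum(schedule))
```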

Error Classification:

Error Code	Description	Retry?
Timeout	Request timed out	Yes
429 Too Many Requests	Rate limited by provider	Yes (with longer delay)
500 Internal Server Error	Provider error	Yes
502 Bad Gateway	Provider gateway error	Yes
503 Service Unavailable	Provider temporarily down	Yes
409 Conflict	Request conflict	Yes
401 Unauthorized	Authentication failed	No (credential issue)
403 Forbidden	Permission denied	No (configuration issue)
400 Bad Request	Invalid request	No (template/parameter issue)
Chat not found	Chat ID invalid or recipient never started the bot	No

Non-Retryable Errors (Immediate DLQ):

Invalid bot token (401)
Bot blocked by user (403)
Chat not found
Malformed template (400)
Message too long (after split)
Unsupported media format

11.7.2 Circuit Breaker

Each channel adapter implements a circuit breaker to prevent cascading failures:

Parameter	Value
Failure threshold	10 consecutive failures
Open state duration	60 seconds
Half-open test calls	3 successful calls required
Monitoring window	5 minutes

Circuit States:

State	Behavior	Transition Trigger
`Closed`	Normal operation — all requests pass	Initial state, or after half-open success
`Open`	Fast fail — no requests sent to provider	10 consecutive failures
`Half-Open`	Limited test requests allowed	After 60-second open timeout
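
The three states can be sketched as a small class. This is an illustrative outline under stated assumptions, not the adapter implementation: a failure during half-open reopens the circuit immediately, and the 5-minute monitoring window is not modeled. The `clock` parameter is a hypothetical hook so the timing can be tested.

```python
import time


class CircuitBreaker:
    """Closed -> Open after N consecutive failures; Open -> Half-Open after a timeout."""

    def __init__(self, failure_threshold=10, open_seconds=60.0,
                 half_open_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.half_open_successes = half_open_successes
        self.clock = clock
        self.state = "closed"
        self.failures = 0    # consecutive failures while closed
        self.successes = 0   # successful test calls while half-open
        self.opened_at = 0.0

    def allow_request(self):
        """Return False while open (fast fail); move to half-open after the timeout."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            self.state = "half_open"
            self.successes = 0
        return True

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.half_open_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # any success breaks the consecutive-failure streak

    def record_failure(self):
        if self.state == "half_open":
            self._open()  # assumption: one failed test call reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A channel adapter would call `allow_request()` before each send and report the outcome with `record_success()` or `record_failure()`.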

11.7.3 Rate Limiting Tiers

The notification system implements multi-tier rate limiting to prevent abuse and ensure fair resource distribution:

Tier	Limit	Scope	Burst
Global (all channels)	200 messages/minute	Across all channels combined	20
Telegram Global	30 messages/second	All Telegram traffic	5
Telegram Per-Chat	1 message/second	Per conversation	1
WhatsApp Global	80 messages/second	All WhatsApp traffic	10
WhatsApp Per-Recipient	20 messages/minute	Per phone number	3
Per Camera Source	30 alerts/minute	Prevents camera spam	5
Per Severity (Critical)	No limit	Critical alerts bypass rate limits	N/A

Token Bucket Algorithm: Each tier maintains a token bucket. A token is consumed per message. Tokens replenish at the configured rate. If no tokens are available, the message is queued or rejected based on priority.
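
A minimal token bucket sketch for one tier, assuming tokens refill continuously at the configured rate and the "Burst" column is the bucket capacity. The class name and the injectable `clock` are illustrative.

```python
import time


class TokenBucket:
    """Token bucket: `rate_per_second` refill rate, `burst` maximum stored tokens."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.updated = clock()

    def try_consume(self, tokens=1):
        """Take `tokens` if available; return False so the caller can queue or reject."""
        now = self.clock()
        # Refill based on elapsed time, never exceeding the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


# Telegram per-chat tier: 1 message/second, burst of 1
per_chat = TokenBucket(rate_per_second=1, burst=1)
print(per_chat.try_consume())  # True: first message passes
print(per_chat.try_consume())  # False: bucket is empty until it refills
```

For per-minute tiers the rate is the limit divided by 60 (for example, 20 messages/minute is a rate of 1/3 token per second).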

11.7.4 Alert Deduplication

Alerts are deduplicated to prevent notification spam when the same event triggers repeatedly:

Deduplication Key	Components	Window	Action on Duplicate
Known person	`camera_id + person_id + event_type`	5 minutes	Suppress, append counter to original
Unknown person	`camera_id + event_type`	5 minutes	Suppress, append counter to original
System alert	`alert_type + source_id`	15 minutes	Suppress, update existing message
Watchlist match	`camera_id + person_id + watchlist_id`	10 minutes	Suppress, append counter

When a duplicate is detected, the original message is updated with a counter (e.g., "+3 more detections"), avoiding a flood of similar messages.
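
The deduplication table can be sketched as an in-memory lookup keyed by the listed components. This assumes the window is measured from the first alert in a series (a fixed window rather than a sliding one); the class name and the plain dict store are illustrative, and a shared store would be needed across multiple workers.

```python
class Deduplicator:
    """Suppress repeat alerts with the same key inside a per-type time window."""

    # Window lengths in seconds, taken from the table above
    WINDOWS = {
        "known_person": 300,
        "unknown_person": 300,
        "system_alert": 900,
        "watchlist_match": 600,
    }

    def __init__(self):
        self._seen = {}  # key -> [first_seen_timestamp, duplicate_count]

    def check(self, kind, key_parts, now):
        """Return (is_duplicate, duplicate_count) for an alert arriving at `now`."""
        key = (kind, *key_parts)
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] < self.WINDOWS[kind]:
            entry[1] += 1  # feeds the "+N more detections" counter
            return True, entry[1]
        self._seen[key] = [now, 0]  # first alert, or window expired: start a new series
        return False, 0


dedup = Deduplicator()
key = ("CAM-01", "person-42", "person_detected")
print(dedup.check("known_person", key, now=0))    # (False, 0): send the alert
print(dedup.check("known_person", key, now=120))  # (True, 1): suppress, update counter
```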

11.8 Escalation Rules

11.8.1 Escalation Thresholds

When an alert goes unacknowledged, it automatically escalates through up to 3 levels, each with increasing urgency and broader recipient distribution.

Severity	Level 1 (Primary)	Level 2 (Secondary)	Level 3 (Final)
Critical	5 minutes	10 minutes	20 minutes
High	15 minutes	30 minutes	60 minutes
Medium	30 minutes	60 minutes	120 minutes
Low	60 minutes	120 minutes	240 minutes
Info	Never	Never	Never
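
The threshold table translates into a simple lookup. The sketch below assumes every threshold is measured from the time the alert was created, not from the previous escalation; the names are illustrative.

```python
# Minutes unacknowledged before reaching Level 1, 2, and 3 (from the table above)
ESCALATION_MINUTES = {
    "critical": (5, 10, 20),
    "high": (15, 30, 60),
    "medium": (30, 60, 120),
    "low": (60, 120, 240),
    "info": (),  # never escalates
}


def escalation_level(severity, elapsed_minutes):
    """Return the escalation level (0-3) reached after `elapsed_minutes` without acknowledgment."""
    thresholds = ESCALATION_MINUTES[severity.lower()]
    return sum(1 for threshold in thresholds if elapsed_minutes >= threshold)


print(escalation_level("high", 45))      # 2
print(escalation_level("critical", 4))   # 0
```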

11.8.2 Escalation Actions per Level

Level	Name	Notification Action	Recipient Expansion	Severity Change
0	Original	Standard routing rules	Primary recipients only	Original severity
1	Primary	Re-notify with escalation prefix	Add management group	Increase by one level
2	Secondary	Force all channels, bypass quiet hours	Add all groups	Increase by one level
3	Final	All-hands notification, include audit trail	All configured recipients	Set to Critical

Escalation Cancellation: Acknowledgment cancels ALL pending escalation timers for an alert. Acknowledgment can occur via:

Telegram inline "Acknowledge" button click
WhatsApp quick reply "Ack"
Web dashboard "Acknowledge" button
REST API POST /api/v1/alerts/{id}/acknowledge
Chat command /acknowledge {alert_id}

11.8.3 Escalation Notification Template

⬆️ <b>ESCALATION — Level {level}</b>

Original Alert: {alert_summary}
Alert ID: {alert_id}
First Detected: {first_detected_time}
Current Time: {current_time}
Unacknowledged: {elapsed_minutes} minutes
Escalation Threshold: {threshold_minutes} minutes

This alert has been escalated because it has not been acknowledged.
Please review immediately.

<a href="{acknowledge_url}">✅ Acknowledge Now</a>
<a href="{view_alert_url}">👁 View Details</a>

11.9 Media Attachment Handling

11.9.1 Media Processing Pipeline

When an alert includes media (snapshot images or video clips), a multi-stage processing pipeline ensures the media meets channel-specific requirements:

Original Media (from detection)
       │
       ▼
┌──────────────────┐
│  1. Store Original│  ──▶ MinIO/S3 (full resolution archival)
│     in Storage    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  2. Process for  │
│     Telegram     │
└────────┬─────────┘
         │
         ├──▶ Image: Resize 1280x720, JPEG quality 85, max 10 MB
         ├──▶ Video: H.264, 1280x720, max 50 MB, max 60 seconds
         └──▶ Media Group: Each image < 10 MB, max 10 items
         │
         ▼
┌──────────────────┐
│  3. Process for  │
│     WhatsApp     │
└────────┬─────────┘
         │
         ├──▶ Image: Resize 1600x900, JPEG quality 80, max 16 MB
         └──▶ Video: H.264, 1280x720, max 16 MB, max 60 seconds

11.9.2 Image Processing Details

Step	Operation	Parameters
1. Load	Open source image	Pillow (PIL)
2. Convert	Convert to RGB	Drop alpha channel if present
3. Resize	Scale to target dimensions	Lanczos resampling
4. Compress	JPEG encoding	Quality: 85 (Telegram), 80 (WhatsApp)
5. Check size	Verify file size under limit	If over limit, reduce quality iteratively
6. Fallback	Aggressive compression	If quality < 50 and still over limit, reduce dimensions

Iterative Quality Reduction:

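import io

from PIL import Image


def compress_image_to_limit(image, size_limit_mb, channel, min_side=64):
    """Return JPEG bytes under size_limit_mb, lowering quality first, then dimensions."""
    if image.mode != 'RGB':
        image = image.convert('RGB')  # JPEG cannot store an alpha channel
    start_quality = 85 if channel == 'telegram' else 80
    min_quality = 50  # Below this floor, shrink dimensions instead (Fallback step)

    while True:
        # Step quality down by 5 until the encoded file fits
        for quality in range(start_quality, min_quality - 1, -5):
            buffer = io.BytesIO()
            image.save(buffer, format='JPEG', quality=quality, optimize=True)
            if buffer.tell() / (1024 * 1024) <= size_limit_mb:
                return buffer.getvalue()

        # Still over the limit: reduce dimensions by 25% and retry
        new_size = (int(image.width * 0.75), int(image.height * 0.75))
        if min(new_size) < min_side:  # min_side is an illustrative stop condition
            raise ValueError('Image cannot be compressed under the size limit')
        image = image.resize(new_size, Image.LANCZOS)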

11.9.3 Video Processing Details

Videos are processed with FFmpeg, targeting a bitrate calculated from the size limit. The command below shows a single constrained-bitrate pass; in a two-pass encode, the same options are run with -pass 1 and then -pass 2.

# Calculate target bitrate: (size_limit_bytes * 8) / duration_seconds
# Example: 16 MB limit, 10 second clip = (16*1024*1024*8) / 10 = ~13.4 Mbps
# The 10M video + 128k audio target below leaves headroom under that ceiling.
#
# -b:v                   Target video bitrate
# -maxrate               Maximum bitrate
# -bufsize               Rate-control buffer size
# -c:a / -b:a            Audio encoding
# -movflags +faststart   Web-optimized (index at start of file)
# -preset                Encoding speed/quality tradeoff

ffmpeg -i input.mp4 \
    -c:v libx264 \
    -b:v 10M \
    -maxrate 12M \
    -bufsize 20M \
    -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
    -c:a aac -b:a 128k \
    -movflags +faststart \
    -preset fast \
    -y output.mp4
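
The bitrate arithmetic in the comment above can be written as a helper. This is a sketch: the 10% safety margin for container overhead and the function name are assumptions, and the result would be passed to FFmpeg as the `-b:v` value.

```python
def target_video_bitrate_bps(size_limit_mb, duration_seconds,
                             audio_bitrate_bps=128_000, safety_margin=0.9):
    """Video bitrate (bits/second) that keeps a clip under size_limit_mb.

    total = (size_limit_bytes * 8) / duration_seconds, reduced by a safety
    margin for container overhead, minus the bitrate reserved for audio.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    total_bps = size_limit_mb * 1024 * 1024 * 8 / duration_seconds
    return int(total_bps * safety_margin) - audio_bitrate_bps


# 16 MB WhatsApp limit, 10-second clip: the raw ceiling is about 13.4 Mbps
print(target_video_bitrate_bps(16, 10, audio_bitrate_bps=0, safety_margin=1.0))
print(target_video_bitrate_bps(16, 10))  # with margin and audio reserved
```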

11.10 Delivery Tracking

11.10.1 Delivery Status Lifecycle

Every notification progresses through a well-defined status lifecycle, tracked in the database for audit and troubleshooting:

Status	Description	Terminal?
`pending`	Queued, waiting to be sent	No
`processing`	Currently being sent to provider	No
`sent`	API request to provider succeeded	No
`delivered`	Provider confirmed delivery to device	No
`read`	Recipient opened/read the message	No
`engaged`	User interacted (button click, reaction)	Yes
`failed`	Permanently failed (non-retryable error)	Yes
`retrying`	Scheduled for retry attempt	No
`dead_letter`	Moved to DLQ after all retries exhausted	Yes
`suppressed`	Blocked by quiet hours or deduplication	Yes
`cancelled`	Cancelled (e.g., acknowledged before send)	Yes
`expired`	Message TTL expired before delivery	Yes

Status Transitions:

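pending → processing → sent → delivered → read → engaged
   │           │
   │           ├──▶ failed        (non-retryable error)
   │           └──▶ retrying ──▶ processing (next attempt)
   │                    └──▶ dead_letter  (all retries exhausted)
   ├──▶ suppressed  (quiet hours or deduplication)
   ├──▶ cancelled   (e.g., acknowledged before send)
   └──▶ expired     (TTL elapsed before delivery)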

11.10.2 Dead Letter Queue (DLQ)

Failed notifications that exhaust all retry attempts are moved to a Redis-backed Dead Letter Queue. Admin users can review and manage DLQ entries through the web dashboard.

DLQ Feature	Description
Storage	Redis sorted set, ordered by failure timestamp
Retention	30 days
View	Filterable by channel, error type, date range
Actions	Retry individual, Retry all (batch), Discard, Export
Alert	Daily digest of DLQ count; alert if > 10 entries
Auto-retry	Optional: automatically retry DLQ entries every 6 hours

11.11 API Endpoints Summary

11.11.1 REST Endpoints (13 endpoints)

#	Method	Endpoint	Purpose	Auth
1	GET	`/api/v1/notifications/rules`	List all routing rules	Admin
2	POST	`/api/v1/notifications/rules`	Create new routing rule	Admin
3	GET	`/api/v1/notifications/rules/{id}`	Get specific rule	Admin
4	PUT	`/api/v1/notifications/rules/{id}`	Update routing rule	Admin
5	DELETE	`/api/v1/notifications/rules/{id}`	Delete routing rule	Admin
6	GET	`/api/v1/notifications/templates`	List message templates	Admin
7	POST	`/api/v1/notifications/templates`	Create/update template	Admin
8	GET	`/api/v1/notifications/delivery-status/{alert_id}`	Get delivery status for alert	Operator+
9	GET	`/api/v1/notifications/{id}/status`	Single notification status	Operator+
10	POST	`/api/v1/notifications/{id}/retry`	Manual retry of failed notification	Admin
11	GET	`/api/v1/notifications/dlq`	List dead letter queue	Admin
12	POST	`/api/v1/notifications/dlq/retry-all`	Retry all DLQ entries	Admin
13	POST	`/api/v1/notifications/dlq/clear`	Clear all DLQ entries	Admin

11.11.2 Alert Management Endpoints

#	Method	Endpoint	Purpose	Auth
1	GET	`/api/v1/alerts`	List alerts with filters	Operator+
2	GET	`/api/v1/alerts/{id}`	Get single alert details	Operator+
3	POST	`/api/v1/alerts/{id}/acknowledge`	Acknowledge alert	Operator+
4	POST	`/api/v1/alerts/{id}/resolve`	Resolve alert	Operator+
5	POST	`/api/v1/alerts/{id}/ignore`	Ignore alert	Operator+
6	POST	`/api/v1/alerts/{id}/false-positive`	Mark as false positive	Operator+
7	POST	`/api/v1/alerts/bulk/acknowledge`	Bulk acknowledge	Operator+
8	POST	`/api/v1/alerts/bulk/ignore`	Bulk ignore	Operator+

11.11.3 WebSocket Endpoints (2 endpoints)

Endpoint	Purpose	Authentication
`WS /api/v1/notifications/live`	Real-time notification stream for connected clients	JWT token in query parameter
`WS /api/v1/alerts/stream`	Live alert feed for operator dashboards	JWT token in query parameter

11.11.4 Webhook Endpoints (2 endpoints)

Endpoint	Source	Purpose
`POST /webhooks/telegram`	Telegram servers	Receive delivery receipts, callback queries, chat events
`POST /webhooks/whatsapp`	Meta servers	Receive message status updates, incoming messages

Webhook Security:

Measure	Implementation
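Telegram	Secret token check: the `X-Telegram-Bot-Api-Secret-Token` header must match the `secret_token` registered via `setWebhook`
WhatsApp	HMAC-SHA256 signature verification (`X-Hub-Signature-256` header) using app secret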
IP allowlisting	Only accept requests from Telegram/Meta IP ranges
Replay protection	Reject messages with timestamps older than 5 minutes
Rate limiting	100 requests per minute per source IP
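
A sketch of the two webhook checks using only the Python standard library. It assumes Meta signs the raw request body with HMAC-SHA256 and sends the result as `sha256=<hex>` in the `X-Hub-Signature-256` header, and that Telegram echoes the secret configured through `setWebhook` in the `X-Telegram-Bot-Api-Secret-Token` header. The function names are illustrative.

```python
import hashlib
import hmac


def verify_whatsapp_signature(app_secret, raw_body, signature_header):
    """Validate the X-Hub-Signature-256 header against the raw (unparsed) body bytes."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    provided = signature_header[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected.encode(), provided.encode())


def verify_telegram_secret(configured_secret, header_value):
    """Compare the X-Telegram-Bot-Api-Secret-Token header with the configured secret."""
    return hmac.compare_digest(configured_secret.encode(), (header_value or "").encode())
```

The signature must be computed over the exact bytes received; re-serializing parsed JSON can change whitespace or key order and make a valid request fail verification.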

Section 12: Security Design

12.1 Security Architecture Overview

The Sentinel AI Surveillance Platform implements defense-in-depth security across seven distinct layers. Every component — from network perimeter to data storage — has been designed with security as a primary consideration, reflecting the sensitive nature of surveillance data, biometric information, and the critical safety function the system performs.

┌──────────────────────────────────────────────────────────────────────────────┐
│                      DEFENSE IN DEPTH ARCHITECTURE                            │
│                                                                              │
│   LAYER 1: PERIMETER                                                         │
│   ─────────────────                                                          │
│   AWS WAF v2 │ Geo-restriction │ DDoS protection │ Rate limiting             │
│                                                                              │
│   LAYER 2: TRANSPORT                                                         │
│   ─────────────────                                                          │
│   TLS 1.3 │ mTLS internal │ WireGuard ChaCha20-Poly1305 │ Certificate mgmt  │
│                                                                              │
│   LAYER 3: AUTHENTICATION & AUTHORIZATION                                    │
│   ─────────────────────────────────────────                                  │
│   Argon2id │ JWT ES256 │ TOTP MFA │ RBAC 4 roles │ API keys                 │
│                                                                              │
│   LAYER 4: APPLICATION SECURITY                                              │
│   ────────────────────────────                                               │
│   Input validation │ Parameterized queries │ CSP │ CSRF │ CORS │ File upload │
│                                                                              │
│   LAYER 5: DATA SECURITY                                                     │
│   ────────────────────                                                       │
│   AES-256-GCM at rest │ Field-level encryption │ Signed URLs │ Key rotation  │
│                                                                              │
│   LAYER 6: NETWORK SEGMENTATION                                              │
│   ───────────────────────────                                                │
│   VPC private subnets │ Security groups │ Network Policies │ Firewall rules  │
│                                                                              │
│   LAYER 7: AUDIT & MONITORING                                                │
│   ─────────────────────────                                                  │
│   Hash-chain audit log │ Real-time alerts │ CloudTrail │ Flow Logs           │
└──────────────────────────────────────────────────────────────────────────────┘

12.2 SSL/TLS Configuration

12.2.1 Protocol and Cipher Suite Requirements

All external-facing services enforce strong TLS configuration with modern cipher suites:

Setting	Value	Rationale
Minimum TLS Version	TLS 1.2	Fallback for older clients; TLS 1.3 preferred
Preferred TLS Version	TLS 1.3	Fastest, most secure handshake
Cipher Suites (TLS 1.2)	`ECDHE-ECDSA-AES256-GCM-SHA384`	Forward secrecy, AES-GCM authenticated encryption
Cipher Suites (TLS 1.2)	`ECDHE-RSA-AES256-GCM-SHA384`	Same with RSA certificates
Cipher Suites (TLS 1.2)	`ECDHE-ECDSA-CHACHA20-POLY1305`	Mobile-optimized cipher
Cipher Suites (TLS 1.2)	`ECDHE-RSA-CHACHA20-POLY1305`	Mobile-optimized with RSA
Cipher Suites (TLS 1.3)	`TLS_AES_256_GCM_SHA384`	Preferred TLS 1.3 cipher
Cipher Suites (TLS 1.3)	`TLS_CHACHA20_POLY1305_SHA256`	Alternative TLS 1.3 cipher
Disabled Ciphers	CBC mode, RC4, 3DES, DES, MD5, SHA1, RSA key exchange (no forward secrecy)	Known weaknesses
HSTS	`max-age=63072000; includeSubDomains; preload`	2-year HSTS with preload eligibility
OCSP Stapling	Enabled	Reduces certificate validation latency
Certificate Provider	Let's Encrypt (ACME v2)	Free, automated, trusted
Auto-renewal	60 days before expiry	Ensures 30+ day buffer
Certificate Transparency	Required	All certificates publicly logged

12.2.2 mTLS for Internal Service Communication

All inter-service communication uses mutual TLS (mTLS) with client certificate verification. This means both the client and server must present valid certificates signed by the internal Certificate Authority.

Parameter	Value
Internal CA	Self-managed ECDSA P-256 CA
Certificate lifetime	90 days (auto-rotated)
Verification mode	Required (reject if no client cert)
Revocation	CRL + OCSP
Service identity	SPIFFE URI in certificate Subject Alternative Name

Benefits of mTLS:

Even if network boundaries are breached, unauthorized services cannot access internal APIs
Every service-to-service call is authenticated and encrypted
Certificates provide strong service identity (not just IP-based)
No shared secrets between services (except Vault tokens)

12.2.3 TLS Configuration Code Example

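# FastAPI TLS configuration (served by uvicorn)
import ssl

import uvicorn
from fastapi import FastAPI

app = FastAPI()

# TLS settings passed to uvicorn as keyword arguments
ssl_config = {
    "ssl_keyfile": "/certs/server.key",
    "ssl_certfile": "/certs/server.crt",
    "ssl_ca_certs": "/certs/ca.crt",          # For mTLS
    "ssl_cert_reqs": ssl.CERT_REQUIRED,        # Require client cert
    # uvicorn exposes no minimum-version option. PROTOCOL_TLS_SERVER negotiates
    # the highest shared version; Python 3.10+ defaults the floor to TLS 1.2.
    "ssl_version": ssl.PROTOCOL_TLS_SERVER,
    # This list applies to TLS 1.2; TLS 1.3 suites follow OpenSSL defaults
    "ssl_ciphers": "ECDHE-ECDSA-AES256-GCM-SHA384:"
                    "ECDHE-RSA-AES256-GCM-SHA384:"
                    "ECDHE-ECDSA-CHACHA20-POLY1305:"
                    "ECDHE-RSA-CHACHA20-POLY1305",
}

if __name__ == "__main__":
    # Host and port are example values
    uvicorn.run(app, host="0.0.0.0", port=8443, **ssl_config)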

12.3 Authentication

12.3.1 Password Policy

Requirement	Value	Enforcement
Minimum length	12 characters	Hard validation
Complexity	At least one uppercase, one lowercase, one digit, one special character	Regex validation
Password history	Last 12 passwords cannot be reused	Database check
Hashing algorithm	Argon2id (memory-hard, resistant to GPU cracking)	Passwords never stored in plaintext
Argon2id parameters	Time cost: 3, Memory: 64MB, Parallelism: 4	Tuned for 500ms hash time
HaveIBeenPwned check	Enabled for all new passwords	k-anonymity API (no full password sent)
Maximum age	90 days	Configurable; reminder at 75 days
Lockout after failures	5 failed attempts	30-minute lockout
Password change	Users cannot reuse current password	Immediate validation
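
The k-anonymity check in the table works by sending only the first 5 hex characters of the password's SHA-1 hash to the Pwned Passwords range endpoint and matching the remaining 35 characters locally. The sketch below shows the hashing and response parsing only; the HTTP call is left out, and the function names are illustrative.

```python
import hashlib


def hibp_prefix_and_suffix(password):
    """Split the uppercase SHA-1 hex digest into the 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix, range_response):
    """Look up `suffix` in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_response.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix not present: password not found in known breaches


prefix, suffix = hibp_prefix_and_suffix("password")
# Only `prefix` leaves the system: GET https://api.pwnedpasswords.com/range/<prefix>
print(prefix, len(suffix))
```

A new password would be rejected when `breach_count` returns a value greater than zero.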

12.3.2 JWT Token Configuration

Parameter	Value	Notes
Signing algorithm	ES256 (ECDSA with P-256 curve)	Smaller signatures than RS256; about 128-bit security, comparable to RSA-3072
Access token lifetime	15 minutes	Short-lived for security
Refresh token lifetime	7 days	Long-lived but revocable
Key rotation	Every 180 days	Dual-key support for zero-downtime rotation
Key storage	HashiCorp Vault	Private key never exposed to application filesystem
Token binding	Session ID + browser fingerprint	Detects token theft/reuse
Claims	`sub`, `iss`, `aud`, `exp`, `iat`, `jti`, `role`, `permissions`, `mfa_verified`, `session_id`	Standard + custom claims
Issuer	`sentinel-ai`	Verified by all services
Audience	`sentinel-api`	Scope-limited

JWT Token Structure:

{
  "header": {
    "alg": "ES256",
    "typ": "JWT",
    "kid": "key-2025-01"
  },
  "payload": {
    "sub": "user-uuid-here",
    "iss": "sentinel-ai",
    "aud": "sentinel-api",
    "exp": 1705500000,
    "iat": 1705499100,
    "jti": "unique-token-id",
    "role": "operator",
    "permissions": ["alerts:view", "alerts:acknowledge", "cameras:view"],
    "mfa_verified": true,
    "session_id": "sess-uuid-here"
  }
}

12.3.3 Multi-Factor Authentication (MFA)

Parameter	Value
Method	TOTP (Time-based One-Time Password) per RFC 6238
Issuer label	"Sentinel AI Surveillance"
Algorithm	SHA-1 (for compatibility)
Digit length	6 digits
Time step	30 seconds
Valid window	1 step before and after current (3-step tolerance)
Recovery codes	10 single-use codes generated at setup
Enforced for	Super Admin, Admin roles (mandatory)
Optional for	Operator, Viewer roles (recommended)
QR code format	otpauth://totp/Sentinel%20AI:{username}?secret={secret}&issuer=Sentinel%20AI

MFA Enforcement Matrix:

Role	MFA Required	Can Disable
Super Admin	Yes	No
Admin	Yes	No
Operator	No (Recommended)	Yes
Viewer	No	Yes
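
For reference, the TOTP parameters above (SHA-1, 6 digits, 30-second step, one step of tolerance either side) correspond to the following standard-library sketch of RFC 6238. It takes the raw secret bytes; the Base32 decoding of the secret from the otpauth URI, replay protection, and rate limiting are omitted, and the function names are illustrative.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, digits=6, step=30):
    """Compute the TOTP code for raw `secret` bytes at `unix_time` (RFC 6238, SHA-1)."""
    counter = int(unix_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret, submitted, unix_time, window=1, step=30):
    """Accept codes from the current step and `window` steps before or after it."""
    return any(
        hmac.compare_digest(
            totp(secret, unix_time + drift * step, step=step).encode(),
            submitted.encode(),
        )
        for drift in range(-window, window + 1)
    )


# RFC 6238 test secret; at T=59 the 8-digit SHA-1 code is 94287082
print(totp(b"12345678901234567890", 59, digits=8))
```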

12.4 Role-Based Access Control (RBAC)

12.4.1 Role Definitions

Role	Level	Description	Typical Users	Count
Super Admin	L1	Full system access; can manage other admins	CISO, CTO, Platform Lead	1-2
Admin	L2	Administrative functions; day-to-day management	Security Manager, IT Manager	2-4
Operator	L3	Day-to-day surveillance operations	Security guards, SOC analysts	5-20
Viewer	L4	Read-only access for review and audit	Auditors, Management	2-10

12.4.2 Permission Matrix (30+ Permissions)

Permission	Super Admin	Admin	Operator	Viewer
`users:full_access`	Y	N	N	N
`users:manage` (create/edit/deactivate)	Y	Y	N	N
`users:view` (list, details)	Y	Y	Y	Y
`users:reset_password`	Y	Y	N	N
`users:reset_mfa`	Y	Y	N	N
`cameras:full_access`	Y	N	N	N
`cameras:manage` (add/edit/remove)	Y	Y	N	N
`cameras:view` (list, status)	Y	Y	Y	Y
`cameras:control` (PTZ, restart stream)	Y	Y	Y	N
`cameras:configure_zones`	Y	Y	N	N
`alerts:manage` (edit rules, bulk actions)	Y	Y	N	N
`alerts:view` (list, filter, search)	Y	Y	Y	Y
`alerts:acknowledge`	Y	Y	Y	N
`alerts:resolve`	Y	Y	Y	N
`alerts:mark_false_positive`	Y	Y	Y	N
`persons:full_access`	Y	N	N	N
`persons:manage` (create/edit/delete)	Y	Y	N	N
`persons:view` (gallery, profiles)	Y	Y	Y	Y
`persons:name_unknown`	Y	Y	Y	N
`persons:merge`	Y	Y	Y	N
`watchlists:manage` (create/edit/delete)	Y	Y	N	N
`watchlists:view` (list, members)	Y	Y	Y	Y
`watchlists:add_remove_members`	Y	Y	Y	N
`ai_settings:manage` (change defaults)	Y	Y	N	N
`ai_settings:view` (see current settings)	Y	Y	Y	Y
`ai_settings:adjust` (operator adjustments)	Y	Y	Y	N
`reports:full_access`	Y	N	N	N
`reports:view` (all reports)	Y	Y	Y	Y
`reports:export`	Y	Y	Y	N
`system:full_access`	Y	N	N	N
`system:manage` (config changes)	Y	Y	N	N
`system:view` (health, status)	Y	Y	Y	Y
`audit:view` (audit logs)	Y	Y	N	N
`notifications:manage` (routing rules)	Y	Y	N	N
`storage:manage` (retention policies)	Y	Y	N	N
`storage:view` (usage, reports)	Y	Y	Y	Y
`privacy:manage` (GDPR actions)	Y	Y	N	N
`privacy:view` (consent status)	Y	Y	Y	Y

12.4.3 Resource-Level Permissions

Beyond global permissions, the system supports resource-level access control:

Resource Type	Granularity	Example
Cameras	Per-camera access	Operator A can only view CAM-01, CAM-02
Zones	Per-zone access	Operator B can only view "entrance" zone
Alerts	Per-camera origin	Viewer can only see alerts from specific cameras
Persons	Per-department	HR can only view employee records
Watchlists	Per-watchlist	Security can only view "blacklist", not "vip"
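
A minimal sketch of how a global permission check and a per-camera scope could be combined under a default-deny policy. The `User` type, its `allowed_cameras` field, and the `is_allowed` helper are hypothetical names introduced for illustration; only a subset of the permission matrix is reproduced.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Subset of the permission matrix in 12.4.2 (illustrative, not exhaustive).
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "admin": frozenset({"cameras:manage", "cameras:view", "cameras:control",
                        "alerts:view", "alerts:acknowledge"}),
    "operator": frozenset({"cameras:view", "cameras:control",
                           "alerts:view", "alerts:acknowledge"}),
    "viewer": frozenset({"cameras:view", "alerts:view"}),
}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str
    # Hypothetical scope field: None means "no per-camera restriction".
    allowed_cameras: Optional[FrozenSet[str]] = None


def is_allowed(user: User, permission: str, camera_id: Optional[str] = None) -> bool:
    """Default deny: the global permission AND the resource scope must both pass."""
    if permission not in ROLE_PERMISSIONS.get(user.role, frozenset()):
        return False
    if camera_id is not None and user.allowed_cameras is not None:
        return camera_id in user.allowed_cameras
    return True


operator_a = User("operator-a", "operator", frozenset({"CAM-01", "CAM-02"}))
print(is_allowed(operator_a, "cameras:view", "CAM-01"))
print(is_allowed(operator_a, "cameras:view", "CAM-03"))
```

Unknown roles resolve to an empty permission set, so a misconfigured account is denied rather than silently allowed.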

12.5 VPN and Network Security

12.5.1 WireGuard VPN Configuration

WireGuard provides the encrypted tunnel between cloud infrastructure and the edge site:

Parameter	Value	Notes
Protocol	WireGuard	Modern, simple, fast VPN
Port	UDP 51820	Single port, firewall-friendly
Authentication	Curve25519 key pairs + Preshared Key (PSK)	Defense in depth
Encryption	ChaCha20-Poly1305	Fast on hardware without AES-NI
Key exchange	Curve25519 elliptic curve	128-bit security
Tunnel network	10.200.0.0/24	Dedicated VPN subnet
Cloud endpoint	10.200.0.1/32	Single IP for cloud side
Edge endpoint	10.200.0.2/32	Single IP for edge side
AllowedIPs (cloud)	10.200.0.2/32, 192.168.29.0/24	Edge + camera network only
AllowedIPs (edge)	10.100.0.0/16, 10.200.0.0/24	Full cloud VPC + VPN
Keepalive	25 seconds	Prevents NAT timeout
Key rotation	365 days	Annual rotation via maintenance window
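
The AllowedIPs rows act as a per-peer allowlist: an address is reachable through the tunnel only if it falls inside one of the listed networks. The sketch below checks that property with the standard `ipaddress` module; the `is_routable` helper is a hypothetical name, and the check only models the CIDR match, not WireGuard itself.

```python
import ipaddress
from typing import List

# Values copied from the table above.
CLOUD_ALLOWED_IPS = ["10.200.0.2/32", "192.168.29.0/24"]
EDGE_ALLOWED_IPS = ["10.100.0.0/16", "10.200.0.0/24"]


def is_routable(address: str, allowed_ips: List[str]) -> bool:
    """Return True if the address falls inside any AllowedIPs network."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_ips)


# The cloud side may reach the DVR on the camera LAN through the tunnel...
print(is_routable("192.168.29.200", CLOUD_ALLOWED_IPS))
# ...but an arbitrary internet address is not accepted on either side.
print(is_routable("8.8.8.8", EDGE_ALLOWED_IPS))
```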

12.5.2 Network Segmentation Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                         NETWORK ARCHITECTURE                                  │
│                                                                              │
│   INTERNET                                                                   │
│      │                                                                       │
│      ▼                                                                       │
│   ┌──────────────┐                                                           │
│   │  AWS WAF     │                                                           │
│   │  + ALB       │                                                           │
│   └──────┬───────┘                                                           │
│          │                                                                   │
│   ═══════╪════════════════ AWS CLOUD VPC: 10.100.0.0/16 ═══════════════════  │
│          │                                                                   │
│          │    ┌──────────────────────────────────────────────────────┐       │
│          │    │  PUBLIC SUBNET: 10.100.1.0/24                      │       │
│          │    │  - ALB (Application Load Balancer)                  │       │
│          │    │  - NAT Gateway                                       │       │
│          │    │  - WireGuard VPN Gateway (10.200.0.1)               │       │
│          │    │  - Bastion Host (emergency SSH, admin IPs only)     │       │
│          │    └──────────────────────────────────────────────────────┘       │
│          │                                                                   │
│          └────▶┌──────────────────────────────────────────────────────┐      │
│               │  PRIVATE SUBNET: 10.100.2.0/24 (App Tier)            │      │
│               │  - EKS Worker Nodes (API, AI, Web pods)              │      │
│               │  - Stream Ingestion Service                          │      │
│               │  - Alert Engine                                      │      │
│               │  - Notification Service                              │      │
│               └──────────────────────────────────────────────────────┘      │
│               ▲                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  DATA SUBNET: 10.100.3.0/24 (No Internet)          │ │
│               │    │  - RDS PostgreSQL (Multi-AZ)                        │ │
│               │    │  - ElastiCache Redis Cluster                        │ │
│               │    │  - Amazon MSK Kafka                                 │ │
│               │    │  - NO INTERNET ACCESS (VPC endpoints only)          │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  MONITORING SUBNET: 10.100.4.0/24                  │ │
│               │    │  - Prometheus, Grafana, Alertmanager                │ │
│               │    │  - Loki (log aggregation)                           │ │
│               │    │  - Jaeger (distributed tracing)                     │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                                                              │
│   ════════════╪══════════════════════════════════════════════════════════    │
│               │                                                              │
│               │         WireGuard VPN Tunnel (UDP 51820)                   │
│               │                                                              │
│   ════════════╪══════════════════════════════════════════════════════════    │
│               │                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  EDGE GATEWAY: 192.168.29.5/24 (Intel NUC)           │ │
│               │    │  OS: Ubuntu Server 22.04 LTS (minimal)               │ │
│               │    │  - Docker Compose stack                              │ │
│               │    │  - WireGuard Client (10.200.0.2)                     │ │
│               │    │  - Local MinIO (hot storage)                         │ │
│               │    │  - Redis (local cache)                               │ │
│               │    │  - Video Capture Service                             │ │
│               │    │  - AI Inference (edge models)                        │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                              │                               │
│               │    ┌─────────────────────────┴──────────────────────┐       │
│               │    │  CAMERA LAN: 192.168.29.0/24                    │       │
│               │    │  - CP PLUS DVR: 192.168.29.200 (8 channels)     │       │
│               │    │  - RTSP streams on port 554                     │       │
│               │    │  - NO INTERNET ACCESS                           │       │
│               │    │  - NO ROUTE TO CLOUD (only via edge gateway)    │       │
│               │    └────────────────────────────────────────────────┘       │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

12.5.3 Firewall Rules

Edge Gateway Firewall (iptables):

Direction	Protocol	Port	Source	Destination	Action	Purpose
IN	TCP	22	Admin IP range only	Edge gateway	ACCEPT	SSH management
IN	UDP	51820	Cloud VPN IP	Edge gateway	ACCEPT	WireGuard tunnel
IN	TCP	8080	Local LAN only	Edge gateway	ACCEPT	Admin UI
IN	—	—	Any	Edge gateway	DROP	Default deny
OUT	TCP	443	Edge gateway	AWS S3 endpoint	ACCEPT	Cloud storage sync
OUT	UDP	51820	Edge gateway	Cloud VPN IP	ACCEPT	WireGuard tunnel
OUT	TCP	8080	Edge gateway	Local LAN	ACCEPT	Internal services
OUT	—	—	Edge gateway	Internet	DROP	No direct internet

Cloud Firewall (AWS Security Groups):

Direction	Protocol	Port	Source	Action	Purpose
IN	TCP	443	0.0.0.0/0	ACCEPT	Public HTTPS
IN	UDP	51820	Edge gateway IP	ACCEPT	WireGuard
IN	TCP	5432	App security group	ACCEPT	PostgreSQL
IN	TCP	6379	App security group	ACCEPT	Redis
IN	TCP	9092	App security group	ACCEPT	Kafka
IN	TCP	2222	Admin IPs only	ACCEPT	Bastion SSH (non-standard port, see 12.13)
IN	—	—	Any	DENY	Implicit default deny (security groups are allow-list only)

12.6 Secret Management

12.6.1 Vault Integration

All secrets are stored in HashiCorp Vault with automatic rotation policies:

Secret Type	Encryption	Rotation Frequency	Rotation Method	Access Pattern
Database passwords	AES-256-GCM	90 days	Terraform + Vault dynamic credentials	Short-lived (1-hour TTL)
JWT signing keys	AES-256-GCM	180 days	Dual-key grace period	Zero-downtime rotation
Internal API keys	AES-256-GCM	90 days	Zero-downtime rotation	Automated
Telegram bot tokens	AES-256-GCM	180 days	Regenerate via BotFather	Semi-automated
WhatsApp API tokens	AES-256-GCM	180 days	Regenerate via Meta Business Manager	Semi-automated
DVR credentials	AES-256-GCM	180 days	Manual via DVR web UI	Manual
TLS certificates	ACME auto	60 days	cert-manager + Let's Encrypt	Fully automated
WireGuard keys	AES-256-GCM	365 days	Maintenance window rotation	Scripted
Backup encryption keys	AES-256-GCM	365 days	Re-encrypt all backups	Automated
Session secrets	AES-256-GCM	On security incident	Immediate revocation	Admin trigger

12.6.2 Dynamic Database Credentials

Instead of static database passwords, the system uses Vault's dynamic credential engine:

Application → Vault (request db credentials)
                  │
                  ▼
           Vault creates temporary DB user
           (TTL: 1 hour, auto-revoke)
                  │
                  ▼
           Application receives credentials
           Uses them for DB connections
                  │
                  ▼
           After TTL expires → Vault revokes DB user
           Application requests new credentials

Benefits:

No long-lived database passwords in application configuration
Each application instance gets unique credentials
Automatic credential rotation without application restart
Full audit trail of credential issuance and revocation
Instant credential revocation on compromise
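
The request/renew cycle above can be sketched as a small lease-aware provider. This is an illustration under stated assumptions, not the platform's implementation: `CredentialProvider`, `Lease`, and the injected `fetch` callable are hypothetical names, and `fetch` stands in for whatever client call actually requests credentials from Vault. Renewing at 80% of the TTL is also an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    ttl_seconds: float
    issued_at: float


class CredentialProvider:
    """Caches short-lived DB credentials and refreshes them before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, str, float]],  # returns (user, password, ttl)
        clock: Callable[[], float] = time.monotonic,
        renew_fraction: float = 0.8,  # assumed: renew at 80% of the TTL
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._renew_fraction = renew_fraction
        self._lease: Optional[Lease] = None

    def get(self) -> Lease:
        now = self._clock()
        lease = self._lease
        if lease is None or now - lease.issued_at >= lease.ttl_seconds * self._renew_fraction:
            username, password, ttl = self._fetch()
            lease = Lease(username, password, ttl, now)
            self._lease = lease
        return lease
```

Renewing before the TTL elapses means new connections never pick up credentials that Vault is about to revoke; connections opened with an old lease still need to be recycled by the connection pool.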

12.6.3 Field-Level Encryption

PII and biometric data in the database uses AES-256-GCM field-level encryption:

Field Category	Example Fields	Encryption
Personal identification	`name_encrypted`, `email_encrypted`, `phone_encrypted`	AES-256-GCM per-field
Employment data	`employee_id_encrypted`, `department_encrypted`	AES-256-GCM per-field
Biometric data	`face_encoding_encrypted` (512-D vector)	AES-256-GCM per-field
Media metadata	`location_encrypted` (GPS coordinates)	AES-256-GCM per-field

Encryption Architecture:

Application receives plaintext data
       │
       ▼
[Encrypt field-by-field using Vault KMS]
       │
       ▼
Store ciphertext in PostgreSQL
       │
       ▼
[Decrypt only in application layer when needed]
       │
       ▼
Decrypted data never logged, never cached

12.7 Audit Logging

12.7.1 Tamper-Resistant Hash-Chain

The audit log implements a cryptographically linked chain to ensure integrity:

Field	Purpose	Example
`event_id`	Unique UUID for each audit event	`550e8400-e29b-41d4-a716-446655440000`
`timestamp`	ISO 8601 timestamp	`2025-01-16T14:32:15Z`
`event_type`	Category of event	`user_login`, `person_viewed`, `alert_acknowledged`
`actor_id`	User who performed the action	`user-uuid-here`
`actor_role`	Role of the actor at the time	`operator`
`resource_type`	Type of resource accessed	`person`, `camera`, `alert`
`resource_id`	Specific resource identifier	`person-123`, `cam-01`
`action`	Action performed	`view`, `edit`, `delete`, `create`
`result`	Success or failure	`success`, `failure`, `denied`
`ip_address`	Source IP address	`10.100.2.15`
`session_id`	Session identifier	`sess-uuid-here`
`previous_hash`	SHA-256 hash of the previous entry	`a3f5c2...`
`entry_hash`	SHA-256 hash of current entry content	`b7e1d9...`
`signature`	ECDSA signature of the entry hash	`30450221...`

Chain Verification: Any modification to historical entries invalidates all subsequent hashes and signatures, making tampering detectable.
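
A minimal sketch of the hash-chain idea using only SHA-256. The canonical JSON serialization, the all-zero genesis hash, and the helper names are assumptions made for illustration; the ECDSA `signature` field is left out because it needs a key-management layer that is outside this sketch.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS_HASH = "0" * 64  # assumed sentinel for the first entry


def compute_entry_hash(entry: Dict, previous_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()


def append_entry(chain: List[Dict], entry: Dict) -> Dict:
    """Link a new audit entry to the tail of the chain."""
    previous_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    record = dict(entry, previous_hash=previous_hash)
    record["entry_hash"] = compute_entry_hash(entry, previous_hash)
    chain.append(record)
    return record


def verify_chain(chain: List[Dict]) -> Optional[int]:
    """Return the index of the first invalid record, or None if the chain is intact."""
    previous_hash = GENESIS_HASH
    for index, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k not in ("previous_hash", "entry_hash")}
        if record["previous_hash"] != previous_hash:
            return index
        if record["entry_hash"] != compute_entry_hash(body, previous_hash):
            return index
        previous_hash = record["entry_hash"]
    return None
```

Because each `entry_hash` covers the previous hash, editing one historical record forces every later hash to be recomputed, which is what the signatures are meant to prevent.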

12.7.2 Log Retention Policy

Log Type	Online Retention	Archive Retention	Storage Type
Authentication events	1 year	6 years	WORM (Write-Once-Read-Many)
Authorization decisions	1 year	6 years	WORM
Person data modifications	1 year	6 years	WORM
Alert actions (ack, resolve)	1 year	3 years	Standard
Configuration changes	2 years	5 years	Standard
Security events	1 year	6 years	WORM
System health events	90 days	1 year	Standard
API access logs	90 days	1 year	Standard

12.7.3 Real-Time Security Alerting

Automated detection rules trigger alerts on suspicious patterns:

Rule ID	Rule Name	Condition	Auto-Response
SEC-001	Brute force login	> 5 failed logins from same IP in 5 minutes	Block IP for 1 hour; alert security team
SEC-002	Credential stuffing	> 10 unique usernames from same IP in 5 minutes	Block IP for 24 hours; alert security team
SEC-003	Impossible travel	Logins > 500 km apart within 1 hour	Force MFA re-verification; alert security team
SEC-004	Privilege escalation	> 20 admin actions in 10 minutes from new user	Alert security team; log for review
SEC-005	Data exfiltration	> 1 GB downloaded by single user in 1 hour	Suspend account; alert security team
SEC-006	Off-hours admin	Admin action between 22:00-06:00	Log + notify security manager
SEC-007	MFA bypass attempt	> 3 MFA failures then success without MFA	Block account; alert security team
SEC-008	Suspicious media access	> 50 media downloads by non-security role	Alert security team
SEC-009	Unknown device login	Login from unrecognized device fingerprint	Require MFA; notify user
SEC-010	Concurrent sessions	> 3 concurrent sessions for same user	Force logout of oldest session
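
As an illustration of how a rule such as SEC-001 could be evaluated, the sketch below keeps a sliding window of failure timestamps per IP. The class name and in-memory storage are assumptions; a production rule engine would more likely keep this state in Redis or a stream processor.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginMonitor:
    """Sliding-window counter: fires when failures exceed the threshold."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failed login; return True if the rule condition is met."""
        events = self._events[ip]
        events.append(timestamp)
        # Drop failures that have aged out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        # The rule reads "> 5 failed logins", so the sixth failure triggers it.
        return len(events) > self.threshold
```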

12.8 Media Access Security

12.8.1 Signed URL Architecture

Media files are never served directly from object storage. All access is mediated through signed URLs:

Parameter	Value	Notes
Default expiration	5 minutes	Short-lived to prevent sharing
Maximum expiration	1 hour	For bulk exports only
URL binding	Tied to user session	Invalidated on logout
Single-use option	Available for sensitive media	Blacklist incident footage
Access logging	Every media request logged	User ID, media ID, timestamp, IP
IP binding	Optional	URL valid only from requesting IP
Watermarking	Optional	Username/timestamp overlay on images

Signed URL Flow:

1. User requests to view media
2. System checks: authentication + authorization + consent
3. If allowed: generate signed URL with HMAC-SHA256 signature
4. URL format: https://cdn.example.com/media/{id}?token={jwt}&sig={hmac}
5. Redirect user to signed URL
6. CDN/Object storage validates signature and expiry
7. Media served if valid; 403 if expired or invalid
8. Access logged with full context
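
Steps 3 to 7 can be sketched with the standard library as follows. This is a simplified illustration: it replaces the `token={jwt}` parameter with a plain `expires` parameter and binds the signature to a session ID, and the function names, parameter layout, and message format are all assumptions rather than the platform's actual scheme.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(secret: bytes, media_id: str, session_id: str, expires: int) -> str:
    message = f"{media_id}:{session_id}:{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_media_url(base_url: str, media_id: str, session_id: str, secret: bytes,
                   ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Build a URL that is valid for ttl_seconds and only for this session."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(secret, media_id, session_id, expires)})
    return f"{base_url}/media/{media_id}?{query}"


def verify_media_url(url: str, session_id: str, secret: bytes,
                     now: Optional[float] = None) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    media_id = parsed.path.rsplit("/", 1)[-1]
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(_signature(secret, media_id, session_id, expires), sig)
```

Including the session ID in the signed message is what ties the URL to a session: a link copied to another user fails verification even before it expires.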

12.8.2 Media Access Controls

Control	Implementation
No direct S3/MinIO URLs	All access via signed URL proxy
Authentication required	Valid JWT session required for all media requests
Authorization enforced	RBAC checks per media item; camera-level permissions respected
Access logging	Every media request logged with user ID, media ID, timestamp, IP, session
DPO notification	Automatic notification for access to sensitive media (blacklist incidents)
Secure deletion	Overwrite with random data + verification before removal
Download tracking	Number of downloads per media item tracked and reported

12.9 API Security

12.9.1 Defense Layers

Layer	Implementation	Details
Rate limiting	Per-endpoint, per-user tiers	Token bucket algorithm; 100 req/min default; 10 req/min for auth endpoints
Input validation	Pydantic models on all endpoints	Strict type checking; reject unknown fields; max length limits
SQL injection prevention	Parameterized queries only	No dynamic SQL construction; ORM for all database access
XSS prevention	Output encoding + CSP headers	User input never rendered as HTML; Content-Security-Policy enforced
CSRF protection	SameSite=Strict cookies + tokens	State-changing operations require CSRF token validation
CORS	Restricted to known origins	No wildcard origins; explicit allowlist per environment
Request size limits	10 MB default; 50 MB for media upload	Prevents DoS via large payloads
Request timeout	30 seconds default	Prevents resource exhaustion
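
The token bucket named in the rate-limiting row can be sketched as below. The class and parameter names are illustrative assumptions; a multi-instance deployment would need shared state (for example in Redis) rather than a per-process object.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Default tier from the table: 100 requests per minute per user.
default_bucket = TokenBucket(capacity=100, refill_per_second=100 / 60)
# Auth endpoints: 10 requests per minute.
auth_bucket = TokenBucket(capacity=10, refill_per_second=10 / 60)
```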

12.9.2 Security Headers

Header	Value	Purpose
`Strict-Transport-Security`	`max-age=63072000; includeSubDomains; preload`	Enforce HTTPS for 2 years
`X-Content-Type-Options`	`nosniff`	Prevent MIME-type sniffing
`X-Frame-Options`	`DENY`	Prevent clickjacking
`X-XSS-Protection`	`0`	Disabled — CSP is preferred defense
`Referrer-Policy`	`strict-origin-when-cross-origin`	Minimal referrer information
`Permissions-Policy`	`camera=(), microphone=(), geolocation=()`	Disable browser APIs not needed
`Content-Security-Policy`	`default-src 'self'; script-src 'self' 'nonce-{random}'; style-src 'self' 'unsafe-inline'; img-src 'self' blob: data: https://*.amazonaws.com; media-src 'self' blob: https://*.amazonaws.com; connect-src 'self' wss://*.example.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self';`	Comprehensive CSP
`Cache-Control` (API)	`no-store, no-cache, must-revalidate, proxy-revalidate`	Prevent caching of API responses
`Pragma` (API)	`no-cache`	Legacy cache directive

12.10 Session Security

Parameter	Value	Notes
Cookie flags	`HttpOnly; Secure; SameSite=Strict`	Blocks script access to cookies and cross-site sending; complements CSP and CSRF tokens
Access token storage	Memory only (JavaScript variable)	Never stored in localStorage
Access token max-age	15 minutes	Short-lived
Refresh token storage	`HttpOnly` secure cookie	Cannot be accessed by JavaScript
Refresh token max-age	7 days	Long-lived but revocable
Session absolute timeout	8 hours	Force re-login after 8 hours
Idle timeout	30 minutes	Expire if no activity
Max concurrent sessions	3 per user	Prevents session abuse
Session fixation protection	Regenerate session ID on login	Prevent fixation attacks
Session binding	Browser fingerprint + IP validation	Detect session theft
Force logout capability	Admin can revoke all sessions for any user	Immediate effect via Redis
Session storage	Redis with AUTH enabled	Encrypted at rest
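
A minimal sketch of how the absolute timeout, idle timeout, and concurrent-session limit from the table could be evaluated. The `Session` type and helper names are hypothetical; timestamps are plain epoch seconds.

```python
from dataclasses import dataclass
from typing import List

ABSOLUTE_TIMEOUT_S = 8 * 3600   # force re-login after 8 hours
IDLE_TIMEOUT_S = 30 * 60        # expire after 30 minutes without activity
MAX_CONCURRENT_SESSIONS = 3


@dataclass
class Session:
    session_id: str
    created_at: float
    last_activity_at: float


def session_state(session: Session, now: float) -> str:
    """Classify a session as active or expired (absolute timeout checked first)."""
    if now - session.created_at >= ABSOLUTE_TIMEOUT_S:
        return "expired_absolute"
    if now - session.last_activity_at >= IDLE_TIMEOUT_S:
        return "expired_idle"
    return "active"


def sessions_to_revoke(sessions: List[Session],
                       max_sessions: int = MAX_CONCURRENT_SESSIONS) -> List[Session]:
    """Return the oldest sessions that exceed the concurrency limit."""
    ordered = sorted(sessions, key=lambda s: s.created_at)
    excess = len(ordered) - max_sessions
    return ordered[:excess] if excess > 0 else []
```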

12.11 Data Privacy (GDPR Compliance)

12.11.1 GDPR Compliance Matrix

GDPR Principle	Implementation Detail	Evidence
Lawful Basis	Legitimate interest assessment documented per processing purpose	LIA document filed with DPO
Data Minimization	Only facial feature embeddings (512-D vector) stored; raw images discarded after encoding	Architecture documentation
Purpose Limitation	Facial data used ONLY for security/safety purposes; no marketing or secondary use	Privacy policy
Storage Limitation	Automated retention enforcement; cryptographic deletion after expiry	Retention policy configuration
Accuracy	Regular review and correction procedures; user can request correction	Data correction workflow
Integrity & Confidentiality	AES-256-GCM encryption, RBAC access controls, audit logging	Security architecture
Accountability	DPO appointed; Privacy Impact Assessment completed; Records of Processing maintained	Compliance documentation
Transparency	Privacy notice displayed at camera entry points; privacy policy on website	Physical signage + web policy

12.11.2 Consent Management

Consent is managed through a comprehensive lifecycle:

Stage	Description	Transition Trigger
`pending`	Consent requested but not yet obtained	Initial system setup
`granted`	Explicit consent obtained	User signs consent form
`withdrawn`	Consent actively withdrawn	User requests deletion/stop processing
`deleted`	All data removed; audit trail only	Deletion workflow complete
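
The lifecycle can be read as a small state machine. The sketch below assumes a strictly linear order (pending, granted, withdrawn, deleted); the source table does not say whether other transitions, such as re-granting after withdrawal, are permitted, so the `ALLOWED_TRANSITIONS` map is an assumption.

```python
from enum import Enum
from typing import Dict, Set


class ConsentState(str, Enum):
    PENDING = "pending"
    GRANTED = "granted"
    WITHDRAWN = "withdrawn"
    DELETED = "deleted"


# Assumed linear lifecycle; adjust if re-consent is allowed.
ALLOWED_TRANSITIONS: Dict[ConsentState, Set[ConsentState]] = {
    ConsentState.PENDING: {ConsentState.GRANTED},
    ConsentState.GRANTED: {ConsentState.WITHDRAWN},
    ConsentState.WITHDRAWN: {ConsentState.DELETED},
    ConsentState.DELETED: set(),
}


def transition(current: ConsentState, target: ConsentState) -> ConsentState:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal consent transition: {current.value} -> {target.value}")
    return target


def may_process_biometrics(state: ConsentState) -> bool:
    """Biometric processing is only permitted while consent is granted."""
    return state is ConsentState.GRANTED
```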

Consent Metadata:

Field	Description
Consent method	`written` / `digital` / `verbal`
Consent document reference	ID of signed consent form
Consent date	When consent was obtained
Consent recorder	Who recorded the consent
Consent expiry	Annual expiry date
Consent scope	What processing is consented to

Withdrawal Processing:

User submits withdrawal request (any channel)
System flags person record for deletion
Delete face embeddings (biometric data) within 72 hours
Delete all personal images from storage
Anonymize detection events (keep event, replace name with [REDACTED], remove person link)
Delete related event clips
Log all deletion actions in audit trail
Confirm completion to user within 30 days

12.11.3 Privacy Mode Controls

Four privacy modes are available per camera:

Mode	Recording	Face Recognition	Alerts	Live View	Use Case
Full Operation	Yes	Yes	All	Yes	Standard surveillance
Recording Only	Yes	No	Motion only (no face)	Yes	Areas where facial recognition is not needed
Live View Only	No	No	No	Yes	Privacy-sensitive areas; viewing only
Privacy Mode	No	No	No	Privacy overlay	Break rooms, restrooms — privacy completely protected

12.12 Edge Gateway Security

12.12.1 Hardening Checklist

#	Hardening Measure	Implementation
1	Minimal OS	Ubuntu Server 22.04 LTS — no desktop packages
2	Disabled Bluetooth	`systemctl stop bluetooth; systemctl disable bluetooth`
3	Disabled WiFi	`nmcli radio wifi off; modprobe -r iwlwifi`
4	Disabled CUPS	`systemctl stop cups; systemctl disable cups`
5	Disabled avahi/mDNS	`systemctl stop avahi-daemon; systemctl disable avahi-daemon`
6	Disabled snapd	`systemctl stop snapd; systemctl disable snapd`
7	Disabled modemmanager	`systemctl stop ModemManager; systemctl disable ModemManager`
8	SSH key-only	`PasswordAuthentication no; PubkeyAuthentication yes`
9	SSH LAN-only	`ListenAddress 192.168.29.5`
10	SSH root disabled	`PermitRootLogin no`
11	SSH rate limit	`MaxAuthTries 3; ClientAliveInterval 300`
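12	SSH protocol 2	Protocol 2 only (the sole protocol in OpenSSH 7.4+; the legacy `Protocol 2` directive is deprecated and no longer needed)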
13	SSH modern ciphers	`Ciphers chacha20-poly1305@openssh.com`
14	Auto-updates	`unattended-upgrades` — security updates only
15	Update schedule	Daily at 03:00; auto-reboot at 04:00 if required
16	Disk encryption	LUKS + TPM2 auto-unseal
17	Tamper detection	File integrity monitoring (AIDE) for critical config
18	Container security	Non-root users, read-only root FS, no new privileges
19	Firewall	iptables default deny; explicit allow only
20	No direct internet access	Outbound traffic limited to the VPN tunnel and the S3 sync rule in 12.5.3

12.12.2 LUKS Disk Encryption with TPM2

The edge gateway uses LUKS full-disk encryption with TPM2 auto-unseal for headless operation:

# During setup: format the data partition as LUKS2 (prompts for a recovery passphrase)
cryptsetup luksFormat /dev/nvme0n1p2 \
    --type luks2 \
    --cipher aes-xts-plain64 \
    --key-size 512 \
    --pbkdf argon2id

# Enroll a TPM2-sealed key bound to PCRs 0, 2 and 7.
# cryptsetup itself has no --tpm2-device option; systemd-cryptenroll (systemd 248+) handles TPM2 enrollment.
systemd-cryptenroll /dev/nvme0n1p2 \
    --tpm2-device=auto \
    --tpm2-pcrs=0+2+7

# During boot: systemd-cryptsetup unseals via TPM2 if the PCRs match.
# /etc/crypttab entry:
#   data  /dev/nvme0n1p2  none  tpm2-device=auto

PCR Measurements Bound:

PCR	Purpose
PCR 0	Core system firmware executable code
PCR 2	Extended or pluggable executable code
PCR 7	Secure Boot state

12.13 Cloud Infrastructure Security

Control	Implementation	Verification
Private subnets	All internal services in private subnets; no public IPs	VPC flow logs
Security groups	Least privilege; explicit allow only; no default allow-all	Quarterly review
Database access	No public access; app servers only via security group reference	AWS Config rule
Bastion host	Emergency access only; non-standard SSH port (2222); admin IP allowlist only	Access log audit
IMDSv2	Enforced on all EC2 instances; no IMDSv1 fallback	Instance metadata check
Container security	Non-root users, read-only root FS, no new privileges, drop ALL capabilities	Pod Security admission
Image scanning	Trivy + Snyk on every build; HIGH/CRITICAL vulnerabilities block deployment	CI/CD pipeline gate
Image signing	Cosign signature verification required before deployment	Admission controller
Resource quotas	Kubernetes LimitRange on all namespaces	Resource quota monitoring
Network policies	Default deny all ingress/egress; explicit rules per service	Policy audit
Pod Security	Restricted standard enforced cluster-wide	Pod Security admission
Secrets management	Vault + External Secrets Operator; no secrets in Git	Secret scanning
Logging	All AWS API calls logged via CloudTrail; VPC Flow Logs enabled	Log analysis

12.14 Secrets Rotation Policy

Secret Type	Frequency	Method	Automation	Rollback
Database passwords	90 days	Terraform + Vault dynamic credentials	Full	N/A (short-lived)
JWT signing keys	180 days	Dual-key grace period; new key signs, old key verifies for 7 days	Full	Keep old key for 7 days
Internal API keys	90 days	Zero-downtime: add new key, deploy, remove old key	Full	Immediate via config revert
Telegram/WhatsApp tokens	180 days or on suspicion	Generate new via provider, update Vault, 5-min grace, revoke old	Semi	Old token valid for 5-minute grace
TLS certificates	60 days	cert-manager + Let's Encrypt auto-renewal	Full	Previous certificate cached
WireGuard keys	365 days	Maintenance window: generate new keys, update both endpoints simultaneously	Scripted	Manual key restore
DVR credentials	180 days	Manual via DVR web UI	Manual	Previous password documented
Backup encryption keys	365 days	Generate new key, re-encrypt all backups in background	Full	Previous key kept for 30 days
Session secrets	On security incident	Immediate: generate new secret, force all re-authentication	Admin trigger	Not applicable

12.15 Incident Response

12.15.1 Security Event Detection and Response

Phase	Timeline	Actions	Responsible
Detection	Automated (real-time)	Automated rules + behavioral analysis detect anomaly; alert generated	System
Assessment	0-15 minutes	On-call engineer evaluates severity; determines if genuine security event	On-call Engineer
Containment	15-60 minutes	Isolate affected systems; revoke compromised credentials; block malicious IPs	Security Team
Eradication	1-4 hours	Remove root cause; patch vulnerabilities; rotate all exposed secrets	Engineering
Recovery	4-24 hours	Restore from clean backups; verify system integrity; re-enable services	Platform Team
Lessons Learned	24-48 hours	Post-mortem; update procedures; implement preventive measures	Security Team

12.15.2 Breach Notification Procedure

Phase	Timeline	GDPR Requirement	Actions
Detection & Assessment	0-24 hours	—	Confirm breach; contain; assemble response team
Investigation	24-72 hours	Article 33(1)	Forensic analysis; determine scope of affected data
Supervisory Authority	Within 72 hours	Article 33	Notify Data Protection Authority
Data Subjects	Without undue delay	Article 34	Notify affected individuals if high risk
Recovery	Post-notification	—	Restore from clean backups; apply patches
Post-Incident	Within 48 hours	Article 5(2)	Root cause analysis; update plans; document

12.15.3 Breach Severity Classification

Level	Criteria	Notification Required	Example
Low	No personal data accessed	Internal only	Failed attack attempt; no data exposure
Medium	Limited personal data; no sensitive data	DPA notification	Username/email list exposed
High	Sensitive personal data or biometric data accessed	DPA + Data subjects	Facial embeddings database accessed
Critical	Large-scale biometric exfiltration; ongoing threat	DPA + Data subjects + Public	Ransomware attack with biometric data theft

12.16 Security Checklist Summary

The complete security checklist contains 130+ items across the 14 categories listed below. The following table summarizes the key items per category:

Category	Items	Key Requirements
SSL/TLS	8	TLS 1.3, strong cipher suites only, HSTS, OCSP stapling, auto-renewal
Authentication	13	Argon2id, JWT ES256, MFA enforcement, password policy, HaveIBeenPwned
RBAC	7	4 roles, 30+ permissions, resource-level access, default deny
VPN & Network	10	WireGuard + PSK, 5 security zones, firewall deny-all, network policies
Secret Management	10	Vault storage, dynamic credentials, field encryption, rotation schedule
Audit Logging	11	Hash-chain integrity, 20+ fields per entry, WORM storage, real-time alerts
Media Access	8	Signed URLs, session-bound, 5-min expiry, single-use option, watermarking
API Security	11	Rate limiting, Pydantic validation, parameterized queries, CSP, CSRF, CORS
Session Security	8	HttpOnly/Secure/Strict cookies, 8h absolute timeout, 30m idle timeout
Data Privacy (GDPR)	13	Consent tracking, right to deletion, anonymization, DPO, PIA
Edge Gateway	12	20-point hardening, LUKS + TPM2, tamper detection, auto-updates
Cloud Infrastructure	11	Private subnets, image scanning, Pod Security, IMDSv2, CloudTrail
Secrets Rotation	7	All types scheduled, 60-day TLS, 90-day DB, dual-key JWT
Incident Response	9	Detection rules, breach notification, severity classification, post-mortem
Total	138	—

Section 13: UX / Website Structure

13.1 Design System

13.1.1 Design Philosophy

The UX design follows a "dark cockpit" philosophy optimized for 24/7 surveillance operations. The interface minimizes eye strain during long monitoring shifts while ensuring critical information is immediately visible. All design decisions prioritize operator efficiency and rapid threat identification.

Principle	Implementation
Dark mode default	Near-black background with blue-tinted grays to reduce eye strain in low-light environments
Information density	High-density layouts that maximize data visible without scrolling
At-a-glance status	Color-coded status indicators for immediate situational awareness
Progressive disclosure	Advanced controls hidden behind "Expand" toggles; essential info always visible
Consistent patterns	Same interaction patterns reused across all 18 pages
Responsive feedback	Every action produces visible feedback within 100ms

13.1.2 Color Palette

Token	Hex	RGB	Usage	Contrast Ratio
`--bg-primary`	`#0B0E14`	rgb(11, 14, 20)	Main application background	—
`--bg-secondary`	`#151922`	rgb(21, 25, 34)	Card and panel backgrounds	—
`--bg-tertiary`	`#1E2330`	rgb(30, 35, 48)	Elevated surfaces, modals, dropdowns	—
`--bg-sidebar`	`#0D1117`	rgb(13, 17, 23)	Sidebar navigation background	—
`--bg-hover`	`#1A2030`	rgb(26, 32, 48)	Row/card hover state	—
`--bg-selected`	`#1E3A5F`	rgb(30, 58, 95)	Selected item background	—
`--text-primary`	`#E2E8F0`	rgb(226, 232, 240)	Headings, important content	15.8:1
`--text-secondary`	`#94A3B8`	rgb(148, 163, 184)	Labels, descriptions, metadata	9.2:1
`--text-muted`	`#64748B`	rgb(100, 116, 139)	Placeholder text, disabled states	6.1:1
`--accent-blue`	`#3B82F6`	rgb(59, 130, 246)	Primary accent — buttons, links, active states	4.5:1
`--accent-blue-hover`	`#2563EB`	rgb(37, 99, 235)	Button/link hover state	5.1:1
`--accent-green`	`#10B981`	rgb(16, 185, 129)	Success, online status, positive trends	5.3:1
`--accent-red`	`#EF4444`	rgb(239, 68, 68)	Critical alerts, errors, offline status	5.0:1
`--accent-orange`	`#F59E0B`	rgb(245, 158, 11)	Warnings, medium severity	5.4:1
`--accent-yellow`	`#FBBF24`	rgb(251, 191, 36)	Watchlist indicators, highlights	6.1:1
`--accent-purple`	`#8B5CF6`	rgb(139, 92, 246)	AI features, special highlights	4.8:1
`--border-color`	`#1E293B`	rgb(30, 41, 59)	Card borders, dividers, separators	—
`--border-focus`	`#3B82F6`	rgb(59, 130, 246)	Focus ring color	—
`--shadow-sm`	`0 1px 2px rgba(0,0,0,0.3)`	—	Subtle elevation	—
`--shadow-md`	`0 4px 6px rgba(0,0,0,0.4)`	—	Card elevation	—
`--shadow-lg`	`0 10px 25px rgba(0,0,0,0.5)`	—	Modal/dialog elevation	—
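
The contrast ratios in the table can be re-checked with the WCAG 2.x relative-luminance formula. The sketch below is a plain implementation of that formula; the table does not state which background each ratio was measured against, so comparing against `--bg-primary` is an assumption.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    s = channel / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a #RRGGBB color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


BG_PRIMARY = "#0B0E14"  # assumed reference background
for token, color in [("--text-primary", "#E2E8F0"), ("--text-muted", "#64748B")]:
    print(f"{token}: {contrast_ratio(color, BG_PRIMARY):.1f}:1")
```

Running a check like this in CI against the token file would flag any palette change that drops body text below the WCAG AA threshold of 4.5:1.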

13.1.3 Typography

Token	Font Family	Size	Weight	Line Height	Letter Spacing	Usage
Display	Inter	28px	700 (Bold)	1.2	-0.02em	Page titles
H1	Inter	22px	600 (Semi-bold)	1.3	-0.01em	Section headings
H2	Inter	18px	600 (Semi-bold)	1.4	0	Card titles, modal headers
H3	Inter	15px	500 (Medium)	1.4	0	Sub-sections, form labels
Body	Inter	14px	400 (Regular)	1.5	0	General text, descriptions
Body Small	Inter	13px	400 (Regular)	1.5	0	Secondary body text
Caption	Inter	12px	400 (Regular)	1.4	0.01em	Captions, metadata, footnotes
Timestamp	JetBrains Mono	12px	400 (Regular)	1.4	0	All timestamps, durations
Code	JetBrains Mono	13px	400 (Regular)	1.5	0	Code snippets, IDs, technical data
Badge	Inter	11px	500 (Medium)	1	0.02em	Status badges, tags

13.1.4 Spacing and Layout

Token	Value	Usage
Sidebar expanded	260px	Full navigation with labels and icons
Sidebar collapsed	72px	Icons only; hover for tooltip
Top bar height	56px	Clock, alerts, user menu
Content padding	24px	Page content horizontal padding
Content max-width	1400px	Maximum content width; content is centered on wider viewports
Card padding	16px	Internal card padding
Card border radius	12px	Card and panel corners
Card gap	16px	Gap between cards in grid
Button border radius	8px	Button corners
Input border radius	6px	Form input corners
Modal border radius	16px	Modal/dialog corners
Toast border radius	8px	Toast notification corners
Avatar size (small)	24px	Inline avatars
Avatar size (medium)	40px	Card headers, lists
Avatar size (large)	64px	Profile pages
Icon size (default)	20px	Navigation and actions
Icon size (small)	16px	Inline icons
Scrollbar width	8px	Custom styled scrollbar

13.2 Global Navigation Structure

13.2.1 Layout Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│ [Logo]  Sentinel AI Surveillance              [Clock] [Alerts] [👤 User] │  ▲ 56px
├────────┬───────────────────────────────────────────────────────────────────┤
│        │                                                                    │
│  [📊]  │                    MAIN CONTENT AREA                              │
│  Dash  │                                                                    │
│  board │    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│        │    │   Card 1     │  │   Card 2     │  │   Card 3     │         │
│  [📹]  │    │              │  │              │  │              │         │
│  Live  │    └──────────────┘  └──────────────┘  └──────────────┘         │
│        │                                                                    │
│  [🔔]  │    ┌──────────────────────────────────────────────────┐         │
│ Alerts │    │              Wide Card / Table                   │         │
│        │    └──────────────────────────────────────────────────┘         │
│  [🔍]  │                                                                    │
│ Detec  │                                                                    │
│ tions  │                                                                    │
│        │                                                                    │
│ [remaining navigation items...]                                            │
│        │                                                                    │
├────────┤                                                                    │
│◁ / ▷  │                                                                    │
└────────┴───────────────────────────────────────────────────────────────────┘
  ◄── 260px (expanded) / 72px (collapsed) ──►

13.2.2 Navigation Menu Items

#	Icon	Label	Route	Badge Type	Required Permission
1	`LayoutDashboard`	Dashboard	`/dashboard`	None	Any
2	`Video`	Live View	`/live`	Online camera count	`cameras:view`
3	`Bell`	Alert Center	`/alerts`	Pending alert count	`alerts:view`
4	`ScanEye`	Detections	`/detections`	None	`cameras:view`
5	`Users`	Person Gallery	`/persons`	Total person count	`persons:view`
6	`UserQuestion`	Unknown Review	`/unknowns`	Queue count	`persons:view`
7	`ClockAlert`	Suspicious Activity	`/timeline`	None	`alerts:view`
8	`Search`	Search	`/search`	None	Any
9	`ShieldAlert`	Watchlists	`/watchlists`	None	`watchlists:view`
10	`Sparkles`	AI Vibe Settings	`/settings/ai`	None	`ai_settings:view`
11	`Brain`	Training Review	`/training`	Pending suggestions	`ai_settings:view`
12	`Activity`	System Health	`/health`	Status dot (green/yellow/red)	`system:view`
13	`Settings`	Settings	`/settings`	None	Admin functions

Settings Submenu:

#	Icon	Label	Route	Required Permission
13a	`Camera`	Camera Management	`/settings/cameras`	`cameras:manage`
13b	`HardDrive`	Retention & Storage	`/settings/storage`	`storage:manage`
13c	`UserCog`	Admin Users	`/settings/users`	`users:manage`
13d	`BellRing`	Notification Settings	`/settings/notifications`	`notifications:manage`
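
The permission column in the two tables above implies that the sidebar is filtered per user. A minimal sketch of that filtering follows, using the routes and permission strings from the tables. The `NavItem` type, the function names, and the rule that the Settings parent appears only when at least one submenu entry is visible are assumptions made for illustration, not part of the specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NavItem:
    label: str
    route: str
    permission: Optional[str]  # None corresponds to "Any" in the table


NAV_ITEMS = [
    NavItem("Dashboard", "/dashboard", None),
    NavItem("Live View", "/live", "cameras:view"),
    NavItem("Alert Center", "/alerts", "alerts:view"),
    NavItem("Detections", "/detections", "cameras:view"),
    NavItem("Person Gallery", "/persons", "persons:view"),
    NavItem("Unknown Review", "/unknowns", "persons:view"),
    NavItem("Suspicious Activity", "/timeline", "alerts:view"),
    NavItem("Search", "/search", None),
    NavItem("Watchlists", "/watchlists", "watchlists:view"),
    NavItem("AI Vibe Settings", "/settings/ai", "ai_settings:view"),
    NavItem("Training Review", "/training", "ai_settings:view"),
    NavItem("System Health", "/health", "system:view"),
]

SETTINGS_SUBMENU = [
    NavItem("Camera Management", "/settings/cameras", "cameras:manage"),
    NavItem("Retention & Storage", "/settings/storage", "storage:manage"),
    NavItem("Admin Users", "/settings/users", "users:manage"),
    NavItem("Notification Settings", "/settings/notifications", "notifications:manage"),
]


def visible_items(items: Iterable[NavItem], granted: Iterable[str]) -> List[NavItem]:
    """Keep items that need no permission or whose permission is granted."""
    granted_set = set(granted)
    return [i for i in items if i.permission is None or i.permission in granted_set]


def build_sidebar(granted: Iterable[str]) -> List[str]:
    """Return the ordered list of routes to render for one user."""
    granted_set = set(granted)
    routes = [i.route for i in visible_items(NAV_ITEMS, granted_set)]
    submenu = visible_items(SETTINGS_SUBMENU, granted_set)
    if submenu:  # assumption: hide the Settings parent when it would be empty
        routes.append("/settings")
        routes.extend(i.route for i in submenu)
    return routes
```

For example, a user holding only `cameras:view` would see Dashboard, Live View, Detections, and Search. The same check would still need to be enforced on the server, since hiding a menu entry does not protect the route.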

13.2.3 Top Bar

Element	Position	Content	Update Frequency
Logo + Brand	Left	Sentinel AI logo + text	Static
Current Time	Center-Right	`HH:MM:SS` live clock	Every second
Alert Badge	Right	Bell icon with red count badge	On alert change
User Menu	Far right	Avatar + dropdown menu	Static

User Menu Dropdown:

Item	Action
Profile	Navigate to user profile
Preferences	Theme, timezone, notification preferences
Keyboard Shortcuts	Show shortcut reference modal
Help & Documentation	Open help center
Logout	End session (clears all tokens)

13.3 Page Descriptions

13.3.1 Page 1: Login (`/login`)

The login page is the entry point to the system. It is designed for quick, secure access with minimal friction.

Feature	Specification
Layout	Centered card on dark background
Logo	Sentinel AI logo (large) centered above form
Fields	Username/email (text input), Password (password input with show/hide toggle)
Remember me	Checkbox — "Keep me signed in for 7 days"
Submit	"Sign In" button — full width, accent blue
MFA step	Appears after successful password verification; 6-digit TOTP input with auto-focus
Error states	Inline validation; shake animation on error
Footer	Application version number (e.g., "v2.3.1"), copyright, privacy policy link
Security	Rate limiting (5 attempts / 15 min), CAPTCHA after 3 failures
Redirect	After login, redirect to originally requested URL (or Dashboard)
Session	JWT access token (15 min) + refresh token cookie (7 days)
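
The Security row above combines two thresholds: a CAPTCHA after 3 failures and a hard limit of 5 attempts per 15 minutes. One way to sketch this is a sliding window of failed attempts per account or IP key. The class name, the in-memory storage, and the choice to count only failed attempts are assumptions; a production deployment would more likely keep these counters in a shared store.

```python
from collections import deque
from typing import Deque, Dict

WINDOW_SECONDS = 15 * 60      # 15-minute window from the Security row
MAX_ATTEMPTS = 5              # block at 5 failed attempts in the window
CAPTCHA_AFTER_FAILURES = 3    # require CAPTCHA from the 3rd failure on


class LoginThrottle:
    """Sliding-window tracker of failed logins (in-memory, illustrative only)."""

    def __init__(self) -> None:
        self._failures: Dict[str, Deque[float]] = {}

    def _recent(self, key: str, now: float) -> Deque[float]:
        # Drop failures that have aged out of the window.
        q = self._failures.setdefault(key, deque())
        while q and now - q[0] >= WINDOW_SECONDS:
            q.popleft()
        return q

    def check(self, key: str, now: float) -> str:
        """Return 'allow', 'captcha', or 'blocked' for the next attempt."""
        count = len(self._recent(key, now))
        if count >= MAX_ATTEMPTS:
            return "blocked"
        if count >= CAPTCHA_AFTER_FAILURES:
            return "captcha"
        return "allow"

    def record_failure(self, key: str, now: float) -> None:
        self._recent(key, now).append(now)

    def record_success(self, key: str) -> None:
        # Assumption: a successful login clears the failure history.
        self._failures.pop(key, None)
```

Timestamps are passed in explicitly (seconds) so the logic can be tested without waiting; a caller would typically pass `time.time()`.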

13.3.2 Page 2: Dashboard (`/dashboard`)

The Dashboard is the primary landing page providing at-a-glance situational awareness.

┌──────────────────────────────────────────────────────────────────────────────┐
│  Dashboard                                          [Refresh] [Date Range] │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌──────────────┐ │
│  │  📹 8/8       │ │  🔔 12        │ │  👥 47        │ │  ✓ Healthy  │ │
│  │  Cameras      │ │  Alerts Today │ │  Persons      │ │  System     │ │
│  │  Online       │ │  3 Critical   │ │  Detected     │ │  All Good   │ │
│  └────────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │
│                                                                              │
│  ┌────────────────────────────────────────┐ ┌──────────────────────────┐   │
│  │  Alert Distribution (Last 24 Hours)   │ │  Recent Alerts           │   │
│  │                                        │ │                          │   │
│  │  8 ┤          ██                       │ │  🔴 CAM-01  Unknown     │   │
│  │  6 ┤    ██    ██  ██                   │ │     14:32 — Entrance    │   │
│  │  4 ┤    ██ ██ ██  ██ ██                │ │  🟡 CAM-03  Watchlist   │   │
│  │  2 ┤ ██ ██ ██ ██  ██ ██ ██             │ │     14:15 — Parking     │   │
│  │  0 ┼────┬────┬────┬────┬────┬────┬──  │ │  🟠 CAM-05  System      │   │
│  │     00  04  08  12  16  20           │ │     12:08 — Storage 90% │   │
│  │                                        │ │                          │   │
│  └────────────────────────────────────────┘ └──────────────────────────┘   │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────┐     │
│  │  Camera Status Grid (2x4)                                        │     │
│  │                                                                    │     │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │     │
│  │  │ CAM-01  │ │ CAM-02  │ │ CAM-03  │ │ CAM-04  │           │     │
│  │  │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │           │     │
│  │  │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│           │     │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘           │     │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │     │
│  │  │ CAM-05  │ │ CAM-06  │ │ CAM-07  │ │ CAM-08  │           │     │
│  │  │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │           │     │
│  │  │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│           │     │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘           │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────┐     │
│  │  Activity Feed                                                    │     │
│  │  14:32 — Unknown person detected at CAM-01 (Entrance)            │     │
│  │  14:15 — Watchlist match: John Smith at CAM-03 (Parking)         │     │
│  │  13:58 — Operator Alice acknowledged alert #ALT-2847             │     │
│  │  13:42 — Camera CAM-05 stream reconnected                        │     │
│  │  13:30 — Daily training completed: 3 new face clusters          │     │
│  └──────────────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘

Dashboard Components:

Component	Refresh Rate	Description
Stat cards	30 seconds	Active cameras, alerts today, persons detected, system health
Alert distribution chart	5 minutes	Bar chart showing alerts by hour for last 24 hours
Recent alerts card	30 seconds	Last 5 alerts with severity badge, camera, timestamp
Camera status grid	30 seconds	2x4 grid of all 8 cameras with live thumbnail and status dot
Activity feed	Real-time (WebSocket)	Recent system events — detections, alerts, operator actions
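
The alert distribution chart needs alert timestamps grouped into 24 hourly buckets. The sketch below shows one way to do that, assuming buckets are aligned to clock hours and that the current partial hour is the last bar. The function name and those alignment choices are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def hourly_alert_counts(alert_times: Iterable[datetime], now: datetime) -> List[int]:
    """Return 24 counts, oldest hour first, ending with the hour containing `now`."""
    # End of the window is the top of the next hour, so the current hour is included.
    end = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    start = end - timedelta(hours=24)
    counts = [0] * 24
    for ts in alert_times:
        if start <= ts < end:
            bucket = int((ts - start).total_seconds() // 3600)
            counts[bucket] += 1
    return counts
```

With the chart refreshing every 5 minutes, recomputing the full 24 buckets on each refresh is cheap at this site's alert volume; a pre-aggregated database view would be an alternative at larger scale.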

13.3.3 Page 3: Live Camera View (`/live`)

The live view is the primary monitoring interface, showing real-time streams from all 8 cameras.

Feature	Specification
Default layout	2x4 grid (8 cameras)
Layout options	1x1 (single), 2x2 (4 cameras), 2x4 (8 cameras), 4x4 (16 cameras for future scaling)
Stream format	HLS (HTTP Live Streaming) by default; WebRTC as the alternative when lower latency is needed
Per-camera overlay	Camera name, status dot, expand button, snapshot button
Grid controls	Play all / Pause all, Refresh all streams, Layout selector
Camera states	Loading (spinner), Playing, Paused, Error (retry button), Offline (gray placeholder)
Fullscreen	Click any camera to expand; press `F` to toggle fullscreen for focused camera
Camera switching	Press `1`-`8` to focus camera by number
Snapshot	Press `S` or click camera snapshot button to capture current frame
Recording indicator	Red pulsing dot on cameras actively recording
Alert overlay	Flashing border on camera that triggered recent alert
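
The keyboard shortcuts in the table above (`1`-`8`, `F`, `S`) can be modeled as a small state reducer. The sketch below is one possible shape. `LiveViewState`, `handle_key`, and the rule that `F` and `S` act only when a camera is focused are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LiveViewState:
    focused: Optional[int] = None  # 1-based camera number, None if no focus
    fullscreen: bool = False


def handle_key(
    state: LiveViewState, key: str, camera_count: int = 8
) -> Tuple[LiveViewState, Optional[Tuple[str, int]]]:
    """Return the new state and an optional side-effect request."""
    k = key.lower()
    # Digits 1-8 focus the camera with that number, if it exists.
    if len(k) == 1 and k in "12345678" and int(k) <= camera_count:
        return replace(state, focused=int(k)), None
    # F toggles fullscreen for the focused camera.
    if k == "f" and state.focused is not None:
        return replace(state, fullscreen=not state.fullscreen), None
    # S requests a snapshot of the focused camera's current frame.
    if k == "s" and state.focused is not None:
        return state, ("snapshot", state.focused)
    return state, None
```

Keeping the handler free of side effects means the snapshot request is returned to the caller rather than executed here, which makes the shortcut behavior easy to unit test.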

13.3.4 Page 4: Alert Center (`/alerts`)

The Alert Center provides comprehensive alert management with filtering, batch actions, and detailed investigation tools.

Feature	Specification
Filter bar	Date range picker, severity multi-select (Critical/High/Medium/Low/Info), camera multi-select, status filter (Pending/Acknowledged/Resolved/Ignored), type filter
Severity legend	Color-coded badges: Critical (red), High (orange), Medium (yellow), Low (blue), Info (gray)
Alert cards	Each card: thumbnail image, camera name, timestamp, severity badge, person name (if known), description, current status
Card actions	Acknowledge, Resolve, Ignore, View Details, Mark False Positive
Bulk actions	Checkbox selection; batch Acknowledge or Ignore
Sort options	Newest first (default), Oldest first, Severity (highest first), Camera name
Pagination	20 alerts per page; infinite scroll option
Empty state	"No alerts in the selected period" with illustration
Detail panel	Slide-out panel with full alert info: images, video clip, AI confidence, detection metadata, person profile link
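
The sort options and the 20-per-page pagination above can be sketched as follows. The severity ranking follows the order in the legend. The dictionary field names and the tie-break rule (newest first within the same severity or camera) are assumptions.

```python
from typing import Any, Dict, List

# Rank follows the severity legend: Critical is highest.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}
PAGE_SIZE = 20

Alert = Dict[str, Any]  # hypothetical shape: id, severity, timestamp, camera


def sort_alerts(alerts: List[Alert], order: str = "newest") -> List[Alert]:
    """Sort alerts by one of the options listed in the Alert Center spec."""
    newest_first = sorted(alerts, key=lambda a: a["timestamp"], reverse=True)
    if order == "newest":
        return newest_first
    if order == "oldest":
        return newest_first[::-1]
    if order == "severity":
        # sorted() is stable, so ties keep the newest-first order.
        return sorted(newest_first, key=lambda a: SEVERITY_RANK[a["severity"]])
    if order == "camera":
        return sorted(newest_first, key=lambda a: a["camera"])
    raise ValueError(f"unknown sort order: {order}")


def page_of(items: List[Any], page: int, page_size: int = PAGE_SIZE) -> List[Any]:
    """Return the 1-based page of items."""
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

In practice the sort and pagination would usually be pushed into the database query; the in-memory version here only illustrates the intended ordering semantics.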

13.3.5 Page 5: Recent Detections (`/detections`)

Shows all recent detection events with face thumbnails and recognition results.

Feature	Specification
Filter controls	Known/Unknown/All toggle, date range picker, camera selector, person name search
Detection cards	Face thumbnail + name (or "Unknown") + confidence percentage + camera name + timestamp + watchlist badge
Card click	Opens detail view with full-size image, sighting history for that person, camera info
Actions	"Name This Person" (unknowns), "View Profile" (known), "Add to Watchlist"
Confidence indicator	Visual bar showing confidence level; color-coded (green > 90%, yellow 70-90%, orange < 70%)
Grid layout	4 columns desktop, 3 tablet, 2 mobile
Auto-refresh	New detections appear at top without page reload (WebSocket)
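
The confidence indicator's color bands can be expressed as a small function. This sketch takes confidence as a fraction between 0 and 1 and treats exactly 90% as yellow and exactly 70% as yellow, following the bands as written in the table; the function name is hypothetical.

```python
def confidence_color(confidence: float) -> str:
    """Map a recognition confidence in [0, 1] to the indicator color."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.90:
        return "green"
    if confidence >= 0.70:
        return "yellow"
    return "orange"
```

For instance, the 87.3% confidence used in the onboarding flow example would render as yellow.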

13.3.6 Page 6: Person Gallery (`/persons`)

A browsable gallery of all known persons in the system.

Feature	Specification
Search bar	Full-text search across names, roles, departments, tags
Role filters	Employee / Visitor / Vendor / Contractor / Other — pill-style toggle buttons
Sort options	Name (A-Z), Last Seen (recent first), Sightings Count (highest first), Date Added (newest first)
Person cards	Face image, name, role badge, department, last seen timestamp, total sightings count
Grid layout	5 columns desktop (xl), 4 columns (lg), 3 columns (md), 2 columns (sm)
Pagination	50 persons per page
Actions	Click card → navigate to Person Profile; right-click context menu
Bulk actions	Select multiple for bulk add to watchlist
Empty state	"No persons found" with "Add your first person" CTA

13.3.7 Page 7: Unknown Persons Review (`/unknowns`)

The review queue for unidentified persons — a critical workflow for building the person database.

Feature	Specification
Queue view	Cards of unknown person clusters (grouped by face similarity via DBSCAN)
Cluster card	Representative face image + cluster size (number of sightings) + first/last seen + cameras detected at + confidence range
Actions per cluster	Name This Person, Merge with Existing, Ignore Cluster, Mark as Reviewed
AI insight panel	Pattern suggestion: "Seen 5x at entrance between 08:00-09:00 — possibly employee"
Progress indicator	"23 unknown clusters remaining" with progress bar
Batch review	Keyboard navigation (arrow keys + Enter to select action) for rapid review
Empty state	"Great job! No unknown persons to review. All caught up!" with celebration animation
Reviewed history	Tab to view previously reviewed clusters
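
The queue view groups unknown sightings with DBSCAN. To illustrate how that grouping behaves, here is a compact, dependency-free DBSCAN over embedding vectors using cosine distance. The distance metric, the `eps` and `min_samples` values, and the toy 2-D vectors in the usage note are assumptions; a real pipeline would more likely use a library implementation on high-dimensional face embeddings.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks unclustered points."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]
```

As a toy usage example, `dbscan([(1, 0), (0.99, 0.05), (0, 1), (0.05, 0.99)], eps=0.05, min_samples=2)` separates the two directions into two clusters. Points labeled as noise correspond to single sightings that would not yet form a cluster card.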

13.3.8 Page 8: Person Profile (`/persons/{id}`)

Detailed view of a single person's information, detection history, and management options.

Feature	Specification
Header	Name, role badge, status (Active/Inactive), action buttons (Edit, Delete, Add to Watchlist)
Photo gallery	Primary face photo (large) + additional reference photos in thumbnail grid below
Info panel	Department, employee ID, contact information, notes, tags, date added, added by
Sighting history	Timeline of all detections — timestamp, camera name, confidence, thumbnail image
Sighting stats	Total sightings, first seen, last seen, most common camera, most common time
Watchlist memberships	Which watchlists this person belongs to, with badge per watchlist
Activity log	Who created/edited the profile and when; full audit trail
Danger zone	Delete person (with confirmation dialog explaining consequences)

13.3.9 Page 9: Suspicious Activity Timeline (`/timeline`)

A timeline-based visualization of flagged events for pattern analysis.

Feature	Specification
Timeline view	Horizontal time axis with event markers positioned by timestamp
Event types	Unusual movement (orange), Loitering (yellow), Unauthorized access (red), Crowd gathering (purple)
Color coding	Each event type has a distinct color; severity affects marker size
Filters	Event type multi-select, camera selector, date range, severity threshold
Zoom levels	Hour view, Day view (default), Week view, Month view
Click marker	Opens detail panel with description, evidence images, AI reasoning, confidence
Density heatmap	Background shows detection density to identify high-activity periods

13.3.10 Page 10: Search (`/search`)

Global search across all data types in the system.

Feature	Specification
Search bar	Prominent centered search input with clear button
Category filters	Person, Camera, Event, Alert — toggle pills
Results grouping	Results grouped by category with section headers
Person search	Type name or upload a photo for face recognition similarity search
Camera search	By name, location, or status
Event search	By description, camera, person, or event type
Alert search	By ID, description, or camera
Keyboard shortcut	`/` (forward slash) focuses search from any page
Recent searches	Dropdown shows recent searches for quick access
Empty state	"No results found" with search tips

13.3.11 Page 11: Watchlists (`/watchlists`)

Management interface for watchlist categories and their members.

Feature	Specification
Watchlist cards	Name, icon (selected from preset), color, member count, alert settings summary
Create button	"+ New Watchlist" with modal: name, icon picker, color picker, alert configuration
Default watchlists	VIP (green), Blacklist (red), Authorized (blue), Temporary Access (yellow)
Card click	Opens watchlist detail with full member list
Member management	Add from gallery (search + select), remove member, bulk import via CSV
Alert settings	Per-watchlist: alert timing, severity override, notify groups, quiet hours override
Test button	"Test Alert" — sends test notification for this watchlist to verify configuration
Member table	Sortable by name, date added, added by, sightings count

13.3.12 Page 12: AI Vibe Settings (`/settings/ai`)

The AI Vibe Settings page presents AI configuration as friendly questions rather than technical parameters.

#	Setting	Question	Options	Description
1	Detection Sensitivity	"How carefully should the AI watch?"	Relaxed / Balanced / High / Maximum	Controls how aggressively the AI reports detections
2	Face Match Threshold	"How confident should the AI be before naming someone?"	Lenient / Normal / Strict / Very Strict	A lower threshold yields more matches but also more false positives
3	Night Mode	"How should the AI behave at night?"	Off / Diminished / Active / Enhanced	Night-specific model and sensitivity adjustment
4	Evidence Capture	"What should be saved when someone is detected?"	Photo Only / Photo + 5s Clip / Photo + 10s Clip / Full Recording	Media stored per detection event
5	Alert Style	"When should alerts be sent?"	Silent / Digest / Normal / Urgent / Critical	Controls alert frequency and channels used
6	Learning Mode	"Should the AI learn from new sightings?"	Off / Review First / Auto-Learn Cautiously / Auto-Learn Aggressively	How unknown face clusters are handled
7	Privacy Mode	"How should privacy be handled?"	Full Recognition / Blur Unrecognized / Blur All Faces / Privacy Zones	Face processing and display privacy

Each setting control:

Segmented button group (pill-shaped options)
Selected option highlighted in accent blue
Brief description below updates on selection
Current value displayed as badge
Auto-save (no save button); toast confirms: "Detection Sensitivity updated to High"
Expand toggle reveals internal numerical values (Admin permission required)

Advanced Mode (Admin only): When expanded, each control shows the internal parameter values:

Setting	Option	Internal Value
Detection Sensitivity	Relaxed	Confidence threshold: 0.85, NMS: 0.5
Detection Sensitivity	Balanced	Confidence threshold: 0.70, NMS: 0.45
Detection Sensitivity	High	Confidence threshold: 0.55, NMS: 0.4
Detection Sensitivity	Maximum	Confidence threshold: 0.40, NMS: 0.35
Face Match Threshold	Lenient	Similarity threshold: 0.60
Face Match Threshold	Normal	Similarity threshold: 0.70
Face Match Threshold	Strict	Similarity threshold: 0.80
Face Match Threshold	Very Strict	Similarity threshold: 0.90
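
The mapping from friendly options to internal values in the table above lends itself to a simple lookup. The sketch below copies the values from the table. The key names, the function names, and the use of `>=` for threshold comparisons are assumptions, as is reading "NMS" as the non-maximum suppression threshold.

```python
from typing import Dict

DETECTION_SENSITIVITY: Dict[str, Dict[str, float]] = {
    "Relaxed":  {"confidence_threshold": 0.85, "nms_threshold": 0.50},
    "Balanced": {"confidence_threshold": 0.70, "nms_threshold": 0.45},
    "High":     {"confidence_threshold": 0.55, "nms_threshold": 0.40},
    "Maximum":  {"confidence_threshold": 0.40, "nms_threshold": 0.35},
}

FACE_MATCH_THRESHOLD: Dict[str, float] = {
    "Lenient": 0.60,
    "Normal": 0.70,
    "Strict": 0.80,
    "Very Strict": 0.90,
}


def resolve_ai_settings(sensitivity: str, face_match: str) -> Dict[str, float]:
    """Translate the two friendly options into internal parameters."""
    try:
        detection = DETECTION_SENSITIVITY[sensitivity]
        similarity = FACE_MATCH_THRESHOLD[face_match]
    except KeyError as exc:
        raise ValueError(f"unknown option: {exc.args[0]}") from None
    return {**detection, "similarity_threshold": similarity}


def should_report(score: float, settings: Dict[str, float]) -> bool:
    """A detection is reported when its score reaches the confidence threshold."""
    return score >= settings["confidence_threshold"]


def is_face_match(similarity: float, settings: Dict[str, float]) -> bool:
    """A face is named when similarity reaches the match threshold."""
    return similarity >= settings["similarity_threshold"]
```

Keeping the table in one place means the Advanced Mode display and the inference pipeline read the same values, so the UI cannot drift from what the models actually use.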

13.3.13 Page 13: Training Review (`/training`)

Interface for reviewing AI-suggested face clusters and approving them for model training.

Feature	Specification
Suggestion cards	Face cluster the AI is uncertain about — multiple face images + AI confidence + reason for suggestion
Card layout	Grid of face thumbnails + confidence bar + suggestion reason ("Seen 8x at different cameras, high confidence match")
Actions per suggestion	Approve (add to training data), Reject (not a valid cluster), Merge with Existing Person
Batch actions	Select multiple suggestions for bulk Approve/Reject
Queue status	"12 suggestions pending review" with progress bar
Filter	By confidence level, camera, date range
History	Tab showing previously reviewed suggestions with outcome
Training metrics	Model accuracy trend, training data count, last training time

13.3.14 Page 14: System Health (`/health`)

Real-time system health monitoring dashboard.

Feature	Specification
Status overview	Large status indicator: All Systems Operational (green) / Degraded (yellow) / Critical (red)
Service cards	Per-service status card: Video Capture, AI Inference, Database, Storage, Notifications, VPN
Per-service metrics	Status dot, uptime percentage, last restart, CPU, memory
Camera health table	All 8 cameras: stream status, FPS, bitrate, last seen, error count
System metrics	CPU usage (%), memory usage (%), disk usage (%), network I/O
Logs viewer	Recent system logs with severity filtering (DEBUG/INFO/WARNING/ERROR/CRITICAL); tail -f style auto-scroll
Refresh	Auto-refresh every 30 seconds; manual refresh button
Historical view	Toggle to show metrics history (last 1h, 6h, 24h, 7d)
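
The status overview has to reduce the per-service states to a single indicator. One straightforward rule, assumed here rather than stated in the specification, is to show the worst state among all services. The status keys and function name are illustrative.

```python
from typing import Dict, Tuple

# Ordered from best to worst.
STATUS_ORDER = ["operational", "degraded", "critical"]

STATUS_DISPLAY: Dict[str, Tuple[str, str]] = {
    "operational": ("All Systems Operational", "green"),
    "degraded": ("Degraded", "yellow"),
    "critical": ("Critical", "red"),
}


def overall_status(service_status: Dict[str, str]) -> Tuple[str, str]:
    """Return (label, color) for the worst status among all services."""
    worst = max(service_status.values(), key=STATUS_ORDER.index, default="operational")
    return STATUS_DISPLAY[worst]
```

For example, if Video Capture, AI Inference, Database, Storage, and Notifications are operational but VPN is degraded, the overview would show "Degraded" in yellow.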

13.3.15 Page 15: Notification Settings (`/settings/notifications`)

Configuration interface for the notification system.

Feature	Specification
Recipient groups	Add/edit/delete groups; each group has name, Telegram chat IDs, WhatsApp numbers, alert preferences
Routing rules	Visual rule builder with drag-and-drop condition blocks (camera, person, role, event_type, zone, time, day, severity, watchlist)
Quiet hours	Schedule builder with day-of-week checkboxes, time range pickers, timezone selector
Template editor	Edit message templates per alert type; live preview with sample data; variable reference panel
Delivery status	Real-time view showing notification delivery states (pending/sent/delivered/failed)
Test buttons	"Send Test Alert" per channel to verify configuration
DLQ viewer	Dead letter queue entries with retry/discard actions
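
Quiet hours schedules often span midnight (for example 22:00 to 06:00), which is easy to get wrong. The sketch below checks whether a moment falls inside a quiet window, under the assumption that an overnight window belongs to the day on which it starts. The function name and parameters are hypothetical, and the caller is expected to pass a datetime already converted to the timezone chosen in the schedule builder.

```python
from datetime import datetime, time, timedelta
from typing import Set


def in_quiet_hours(now: datetime, days: Set[int], start: time, end: time) -> bool:
    """Check a quiet window; `days` uses datetime.weekday() numbering (Mon=0)."""
    t = now.time()
    if start <= end:
        # Same-day window, e.g. 12:00-14:00.
        return now.weekday() in days and start <= t < end
    # Overnight window, e.g. 22:00-06:00.
    if t >= start:
        return now.weekday() in days
    if t < end:
        # Early-morning part belongs to the previous day's window.
        return (now - timedelta(days=1)).weekday() in days
    return False
```

With a Monday-to-Friday 22:00-06:00 schedule, Saturday at 05:00 is still quiet (Friday's window), while Monday at 05:00 is not, because no window started on Sunday.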

13.3.16 Page 16: Admin Users (`/settings/users`)

User management interface for administrators.

Feature	Specification
Users table	Username, email, role badge, status (Active/Inactive), last login, MFA status, actions menu
Add user	Modal: username, email, role selector, password (or send invite link), MFA toggle
Edit user	Role, status, force password change on next login, reset MFA, session revocation
User activity log	Login history (timestamp, IP, device), actions taken, settings changed
Bulk actions	Deactivate multiple accounts simultaneously
Filter	By role, status, last login date range
Sort	By username, role, last login, created date
Pagination	25 users per page

13.3.17 Page 17: Camera Management (`/settings/cameras`)

Configuration interface for camera setup and zone management.

Feature	Specification
Camera cards	Name, status (Online/Offline/Disabled), IP/connection string, stream info (resolution, FPS), action buttons (Edit, Test, Disable)
Add camera	Modal: name, location, stream URL, credentials, channel number, description
Edit camera	All camera properties; test connection button
Zone configuration	Interactive polygon drawing on live camera feed; zone name, color, sensitivity, type (Entrance/Restricted/Detection/Ignore)
Stream settings	Resolution (720p/1080p), frame rate (5/10/15/25/30 FPS), codec (H.264/H.265), night mode toggle
Recording settings	Continuous/event-triggered, retention policy, storage location
Camera ordering	Drag to reorder cameras in grid layout

13.3.18 Page 18: Retention & Storage (`/settings/storage`)

Storage management and retention policy configuration.

Feature	Specification
Storage overview	Donut chart showing usage breakdown: Video recordings, Detection snapshots, Training data, System logs, Free space
Numerical values	Total capacity / Used / Free; warning at > 80% (yellow), high at > 90% (orange), critical at > 95% (red)
Retention policies	Dropdown per category: 7 days / 14 days / 30 days / 60 days / 90 days / 180 days / 365 days / Forever
Auto-cleanup	Enable toggle + schedule time picker (daily at 03:00 default)
Actions	"Save Settings", "Run Cleanup Now" (with confirmation), "Export Storage Report"
Growth projection	Estimated days until full based on current growth rate
Storage alerts	Configure alert thresholds (80% warning, 90% high, 95% critical)
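
The alert thresholds and the growth projection reduce to two small calculations. The sketch below uses the default 80/90/95 thresholds from the table and a linear projection. The strict `>` comparison, the linear growth model, and the function names are assumptions.

```python
from typing import Optional

# Default thresholds from the Storage alerts row, highest first.
THRESHOLDS = [(95.0, "critical"), (90.0, "high"), (80.0, "warning")]


def storage_alert_level(used_pct: float) -> Optional[str]:
    """Return the alert level for a usage percentage, or None below 80%."""
    for limit, level in THRESHOLDS:
        if used_pct > limit:
            return level
    return None


def days_until_full(total_gb: float, used_gb: float, daily_growth_gb: float) -> Optional[float]:
    """Linear estimate of days until capacity; None if usage is not growing."""
    if daily_growth_gb <= 0:
        return None
    return max(total_gb - used_gb, 0.0) / daily_growth_gb
```

For example, `days_until_full(1000, 800, 10)` gives 20 days. The daily growth figure itself would come from recent usage history, for instance the average change over the last 7 days, which also accounts for space reclaimed by auto-cleanup.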

13.4 Key User Flows

13.4.1 Flow 1: Daily Operator — Monitor & Respond

┌──────────────────────────────────────────────────────────────────────────────┐
│                  FLOW 1: DAILY OPERATOR (Monitor & Respond)                   │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: LOGIN                                                              │
│  ──────────────                                                              │
│  Enter username → Enter password → MFA code (if enabled)                    │
│  → Redirect to Dashboard                                                     │
│                                                                              │
│  STEP 2: DASHBOARD REVIEW (~30 seconds)                                     │
│  ─────────────────────────────────────                                       │
│  Glance at stat cards:                                                       │
│    ├─ All 8 cameras online? ✓                                               │
│    ├─ Any critical alerts pending? (red badge)                               │
│    ├─ Any unknown persons detected?                                          │
│    └─ System health OK?                                                      │
│                                                                              │
│  If critical alert visible:                                                  │
│    → Click alert card → Go to Alert Center                                  │
│  If no urgent alerts:                                                        │
│    → Click "Live View" in sidebar                                           │
│                                                                              │
│  STEP 3: LIVE CAMERA MONITORING (ongoing)                                   │
│  ─────────────────────────────────────                                       │
│  View 2x4 grid of all cameras                                               │
│  Observe feeds for anomalies                                                │
│                                                                              │
│  When alert toast appears (top-right):                                       │
│    → Toast slides in with sound notification                                 │
│    → Click toast to view alert details                                       │
│                                                                              │
│  STEP 4: ALERT RESPONSE                                                     │
│  ──────────────────                                                          │
│  Click alert toast OR navigate to Alert Center                               │
│  Review alert card:                                                          │
│    ├─ Thumbnail image                                                        │
│    ├─ Camera name, timestamp                                                 │
│    ├─ Alert type (unknown person, watchlist match, etc.)                     │
│    └─ Severity level                                                         │
│                                                                              │
│  Click "View Details" for full information:                                  │
│    ├─ Full-size image / video clip                                           │
│    ├─ AI confidence score                                                    │
│    ├─ Detection metadata (bounding box, zone)                                │
│    └─ Person profile link (if known)                                         │
│                                                                              │
│  DECISION:                                                                   │
│    ├─ False detection → Click "Mark as False Positive"                       │
│    ├─ Legitimate alert → Click "Acknowledge" or "Resolve"                    │
│    ├─ Unknown person → Click "Name This Person"                              │
│    ├─ Needs escalation → Click "Escalate"                                    │
│    └─ Need live view → Click "View Live" to jump to camera                   │
│                                                                              │
│  STEP 5: RETURN TO MONITORING                                               │
│  ────────────────────────────                                                │
│  After handling alert, return to Live View                                   │
│  Continue monitoring cycle                                                   │
│                                                                              │
│  STEP 6: END OF SHIFT                                                       │
│  ──────────────────                                                          │
│  Review unacknowledged alerts (if any)                                       │
│  Check System Health page                                                    │
│  Hand over to next operator (verbal + note any pending issues)               │
│  Click user menu → Logout                                                    │
└──────────────────────────────────────────────────────────────────────────────┘

13.4.2 Flow 2: New Person Onboarding

┌──────────────────────────────────────────────────────────────────────────────┐
│                    FLOW 2: NEW PERSON ONBOARDING                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRIGGER: System detects unknown person → Alert created → Operator notified │
│                                                                              │
│  STEP 1: REVIEW DETECTION                                                    │
│  ────────────────────                                                        │
│  Navigate to "Recent Detections" via sidebar                                 │
│  Filter: "Unknown" (toggle button)                                           │
│  Click on unknown detection card                                             │
│                                                                              │
│  Detail view shows:                                                          │
│    ├─ Full-size face image                                                   │
│    ├─ Camera: CAM-01 (Entrance)                                              │
│    ├─ Timestamp: 2025-01-16 14:32:15                                        │
│    ├─ Confidence: 87.3%                                                      │
│    └─ AI note: "No matching person found in database"                        │
│                                                                              │
│  STEP 2: NAME THE PERSON                                                     │
│  ────────────────────                                                        │
│  Click "Name This Person" button                                             │
│  Modal dialog appears:                                                       │
│                                                                              │
│    ┌────────────────────────────────────┐                                   │
│    │  Name This Person                   │                                   │
│    │                                     │                                   │
│    │  Face: [thumbnail]                  │                                   │
│    │                                     │                                   │
│    │  Full Name *     [____________]     │                                   │
│    │  Role *          [Employee ▼]       │                                   │
│    │  Department      [____________]     │                                   │
│    │  Employee ID     [____________]     │                                   │
│    │  Notes           [____________]     │                                   │
│    │  Tags            [____________]     │                                   │
│    │                                     │                                   │
│    │  Similar existing persons:          │                                   │
│    │  [No similar persons found]         │                                   │
│    │                                     │                                   │
│    │  [Cancel]  [Save & Create Profile]  │                                   │
│    └────────────────────────────────────┘                                   │
│                                                                              │
│  STEP 3: SIMILARITY CHECK                                                    │
│  ────────────────────                                                        │
│  System searches for similar existing persons                                │
│  If matches found: display side-by-side comparison                           │
│    → Option to merge with existing person instead of creating new            │
│  If no matches: proceed with creation                                        │
│                                                                              │
│  STEP 4: SAVE PROFILE                                                        │
│  ──────────────                                                              │
│  Click "Save & Create Profile"                                               │
│  Toast notification: "Profile created for [Name]"                            │
│  Detection card updates with person name                                     │
│  Person now appears in Person Gallery                                        │
│                                                                              │
│  STEP 5: ADD TRAINING IMAGES (Optional)                                      │
│  ────────────────────────────────────                                        │
│  Navigate to Person Profile                                                  │
│  Click "Upload Reference Photos"                                             │
│  Select additional clear face images                                         │
│  System queues for model retraining                                          │
│  Toast: "3 new training images added. Model will retrain automatically."     │
└──────────────────────────────────────────────────────────────────────────────┘
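The similarity check in Step 3 could be sketched as a cosine-similarity search of the new face embedding against stored person embeddings. This is a minimal sketch under stated assumptions: embeddings are plain lists of floats, the function names and person IDs are hypothetical, and the use of cosine similarity with the `FACE_MATCH_THRESHOLD` value from Section 14.4.2 is an assumption rather than something this section confirms.

```python
import math

FACE_MATCH_THRESHOLD = 0.70  # FACE_MATCH_THRESHOLD from Section 14.4.2


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_persons(new_embedding, known_embeddings, threshold=FACE_MATCH_THRESHOLD):
    """Return (person_id, score) pairs at or above the threshold, best match first."""
    scored = [
        (person_id, cosine_similarity(new_embedding, embedding))
        for person_id, embedding in known_embeddings.items()
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-D embeddings for illustration; real face embeddings are much longer.
known = {"p-001": [1.0, 0.0], "p-002": [0.0, 1.0]}
candidates = find_similar_persons([0.9, 0.1], known)
# An empty result corresponds to "[No similar persons found]" in the modal.
```

A non-empty result would drive the side-by-side comparison and the merge option; an empty one lets profile creation proceed.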

13.4.3 Flow 3: Unknown Person Review Queue

┌──────────────────────────────────────────────────────────────────────────────┐
│                  FLOW 3: UNKNOWN PERSON REVIEW QUEUE                          │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: OPEN REVIEW QUEUE                                                   │
│  ──────────────────────                                                      │
│  Sidebar → "Unknown Persons Review"                                          │
│  View: Grid of unknown person cluster cards                                  │
│  Header: "23 unknown clusters remaining"                                     │
│                                                                              │
│  STEP 2: SELECT CLUSTER                                                      │
│  ────────────────                                                            │
│  Click on a cluster card to expand                                           │
│  Shows:                                                                      │
│    ├─ Representative face (largest)                                          │
│    ├─ Gallery of all face instances in cluster                               │
│    ├─ Sighting history (camera, time, count)                                 │
│    ├─ AI pattern insight: "Seen 5x at entrance between 08:00-09:00"         │
│    └─ Confidence distribution graph                                          │
│                                                                              │
│  STEP 3: MAKE DECISION                                                       │
│  ────────────────                                                            │
│  Options:                                                                    │
│    ├─ [Name This Person] → Enter details → Create new profile               │
│    ├─ [Merge with Existing] → Search/select person → Confirm merge          │
│    ├─ [Ignore Cluster] → "False detection / not a person" → Remove         │
│    └─ [Mark Reviewed] → "Unsure, keep in queue for later"                   │
│                                                                              │
│  STEP 4: QUEUE UPDATES                                                       │
│  ────────────────                                                            │
│  Processed item removed from queue                                           │
│  Toast confirms action: "Cluster marked as [Name]. 22 remaining."            │
│  Auto-advance to next cluster (optional)                                     │
│  Keyboard shortcut: Right arrow → next cluster                               │
│                                                                              │
│  STEP 5: CONTINUE REVIEW                                                     │
│  ────────────────                                                            │
│  Process all clusters or stop and resume later                               │
│  Queue persists across sessions                                              │
│  New clusters automatically added as detected                                │
└──────────────────────────────────────────────────────────────────────────────┘
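The decision and queue-update steps above can be modeled as a small in-memory queue. This is only a sketch: the class, the action names, and every toast string except "Cluster marked as [Name]. N remaining." are hypothetical, and persistence across sessions is left out.

```python
from collections import deque
from dataclasses import dataclass, field

VALID_ACTIONS = ("name", "merge", "ignore", "mark_reviewed")


@dataclass
class ReviewQueue:
    """Hypothetical in-memory model of the unknown-person review queue."""
    clusters: deque = field(default_factory=deque)

    def decide(self, action, name=None):
        """Apply a decision to the front cluster and return the toast text."""
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action in ("name", "merge") and not name:
            raise ValueError(f"action '{action}' needs a person name")
        if not self.clusters:
            raise IndexError("review queue is empty")

        cluster_id = self.clusters.popleft()
        if action == "name":
            message = f"Cluster marked as {name}."
        elif action == "merge":
            message = f"Cluster merged with {name}."
        elif action == "ignore":
            message = "Cluster ignored."
        else:
            # "Unsure, keep in queue for later": move the cluster to the back.
            self.clusters.append(cluster_id)
            message = "Cluster kept for later review."
        return f"{message} {len(self.clusters)} remaining."


queue = ReviewQueue(deque(["cluster-a", "cluster-b", "cluster-c"]))
toast = queue.decide("name", name="John Doe")
```

Validating the action before popping keeps the queue unchanged when a request is rejected.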

13.4.4 Flow 4: AI Settings Adjustment

┌──────────────────────────────────────────────────────────────────────────────┐
│                    FLOW 4: AI SETTINGS ADJUSTMENT                             │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: NAVIGATE TO AI VIBE SETTINGS                                        │
│  ────────────────────────────────────                                        │
│  Sidebar → "AI Vibe Settings" (Sparkles icon)                                │
│  View: Scrollable page with 7 setting sections                               │
│                                                                              │
│  STEP 2: ADJUST DETECTION SENSITIVITY                                        │
│  ────────────────────────────────                                            │
│  Section: "How carefully should the AI watch?"                               │
│  Current: [Relaxed] [Balanced] [High] [Maximum]                              │
│  Change: Click "High"                                                        │
│  Description updates:                                                        │
│    "High: The AI will catch almost everything.                               │
│     Expect more alerts, including some false positives."                     │
│  Toast: "Detection Sensitivity updated to High"                              │
│  Change takes effect immediately                                             │
│                                                                              │
│  STEP 3: ADJUST ALERT STYLE                                                  │
│  ────────────────────                                                        │
│  Section: "When should alerts be sent?"                                      │
│  Current: [Silent] [Digest] [Normal] [Urgent] [Critical]                     │
│  Change: Click "Critical"                                                    │
│  Description updates:                                                        │
│    "Critical: Only truly important events trigger alerts.                    │
│     All other activity is logged but not alerted."                           │
│  Toast: "Alert Style updated to Critical"                                    │
│                                                                              │
│  STEP 4: REVIEW ADVANCED (Admin only)                                        │
│  ────────────────────────────────────                                        │
│  Click "Expand" on Advanced Settings                                         │
│  Shows internal values:                                                      │
│    Detection Sensitivity: High                                               │
│    ├─ Confidence Threshold: 0.55                                             │
│    ├─ NMS Threshold: 0.40                                                    │
│    └─ Model: yolo11m.onnx                                                    │
│  Admin can directly edit numerical values                                    │
│                                                                              │
│  STEP 5: DONE                                                                │
│  ────────                                                                    │
│  All changes auto-saved                                                      │
│  Return to monitoring — changes effective immediately                        │
└──────────────────────────────────────────────────────────────────────────────┘
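Because Step 4 lets an admin edit the numerical values directly, some server-side validation is likely needed. The sketch below uses the values shown for the "High" preset; the class and function names are hypothetical, and the assumption that both thresholds must lie in [0, 1] is not stated in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionSettings:
    confidence_threshold: float
    nms_threshold: float
    model: str


# Values displayed for the "High" preset in Step 4 above.
HIGH = DetectionSettings(confidence_threshold=0.55, nms_threshold=0.40, model="yolo11m.onnx")


def apply_admin_override(settings, **changes):
    """Return a copy with the edited values, rejecting thresholds outside [0, 1]."""
    for key in ("confidence_threshold", "nms_threshold"):
        if key in changes and not 0.0 <= changes[key] <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1, got {changes[key]}")
    return replace(settings, **changes)


# Example: an admin raises the confidence threshold slightly.
edited = apply_admin_override(HIGH, confidence_threshold=0.60)
```

Keeping the settings object immutable means the previous values remain available for an audit log entry or a rollback.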

13.4.5 Flow 5: Watchlist Alert Configuration

┌──────────────────────────────────────────────────────────────────────────────┐
│                 FLOW 5: WATCHLIST ALERT CONFIGURATION                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: NAVIGATE TO WATCHLISTS                                              │
│  ────────────────────────────                                                │
│  Sidebar → "Watchlists"                                                      │
│  View: Grid of existing watchlist cards                                      │
│  Default: VIP, Blacklist, Authorized, Temporary Access                      │
│                                                                              │
│  STEP 2: CREATE NEW WATCHLIST (Optional)                                     │
│  ────────────────────────────────────                                        │
│  Click "+ New Watchlist"                                                     │
│  Modal:                                                                      │
│    Name: [Security Escort Required]                                          │
│    Icon: [🛡️] (icon picker)                                                 │
│    Color: [Orange] (color picker)                                            │
│    Description: [People who require security escort]                         │
│    Click "Create"                                                            │
│  New watchlist card appears in grid                                          │
│                                                                              │
│  STEP 3: ADD MEMBERS                                                         │
│  ────────────────                                                            │
│  Click on watchlist card                                                     │
│  Click "Add from Gallery"                                                    │
│  Search/select persons to add:                                               │
│    [☑] John Doe                                                             │
│    [☑] Jane Smith                                                           │
│    [☐] Bob Johnson (not selected)                                            │
│  Click "Add to Watchlist"                                                    │
│  Toast: "2 persons added to Security Escort Required"                        │
│                                                                              │
│  STEP 4: CONFIGURE ALERTS                                                    │
│  ────────────────────                                                        │
│  Click "Settings" tab on watchlist detail                                    │
│  Configure:                                                                  │
│    Alert Timing:    [☑] Immediate    [☐] Delayed (___ min)                  │
│    Severity:        [☐] Inherit    [☑] Force Critical                       │
│    Notify Groups:   [☑] Security Team    [☐] Management                     │
│    Media:           [☑] Image    [☑] Video                                  │
│    Quiet Hours:     [☐] Respect global    [☑] Always alert                  │
│    Escalation:      [☑] Enable escalation (5/10/20 min)                     │
│  Click "Save"                                                                │
│                                                                              │
│  STEP 5: TEST                                                                │
│  ────────                                                                    │
│  Click "Test Alert" button                                                   │
│  System sends test alert through configured channels                         │
│  Verify: Telegram message received ✓                                         │
│  Verify: WhatsApp message received ✓                                         │
│  Watchlist is now active and monitoring                                      │
└──────────────────────────────────────────────────────────────────────────────┘
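The "5/10/20 min" escalation setting in Step 4 could be interpreted as three escalation steps measured from the moment the alert fired. That interpretation, and the function names below, are assumptions; the blueprint does not say whether the intervals are absolute or cumulative.

```python
from datetime import datetime, timedelta

ESCALATION_OFFSETS_MIN = (5, 10, 20)  # from "Enable escalation (5/10/20 min)"


def escalation_times(alert_time, offsets_min=ESCALATION_OFFSETS_MIN):
    """Wall-clock times at which each escalation step would fire."""
    return [alert_time + timedelta(minutes=m) for m in offsets_min]


def current_escalation_level(alert_time, now, acknowledged, offsets_min=ESCALATION_OFFSETS_MIN):
    """0 means not escalated; n means the n-th escalation step has been reached."""
    if acknowledged:
        return 0
    elapsed = now - alert_time
    return sum(1 for m in offsets_min if elapsed >= timedelta(minutes=m))


fired_at = datetime(2025, 1, 16, 14, 32, 15)
schedule = escalation_times(fired_at)
level = current_escalation_level(fired_at, fired_at + timedelta(minutes=12), acknowledged=False)
```

Acknowledging the alert resets the level to 0, which matches the intent that escalation applies only to unhandled alerts.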

13.5 Component Specifications

13.5.1 Camera Feed Component

State	Visual	Interaction
Loading	Centered spinner overlay, camera name visible	None — wait for stream
Playing	Live stream active, recording dot if applicable	Click to focus, hover for controls
Paused	Stream paused, large play button overlay	Click to resume
Error	Error icon + "Connection failed" + Retry button	Click Retry to reconnect
Offline	Gray placeholder with camera icon + "Offline"	Shows last online timestamp
Disabled	Grayed out with "Disabled" badge	No stream attempted

Prop	Type	Required	Default	Description
`cameraId`	`string`	Yes	—	Unique camera identifier (e.g., "cam-01")
`name`	`string`	Yes	—	Display name shown as overlay
`streamUrl`	`string`	Yes	—	HLS or WebRTC stream URL
`status`	`'online' \| 'offline' \| 'reconnecting' \| 'disabled'`	Yes	—	Current camera status
`layout`	`'grid' \| 'fullscreen'`	No	`'grid'`	Current layout mode
`quality`	`'auto' \| 'hd' \| 'sd'`	No	`'auto'`	Stream quality preference
`showControls`	`boolean`	No	`true`	Show overlay controls
`onFocus`	`(id: string) => void`	No	—	Callback when camera is focused
`onSnapshot`	`(id: string) => void`	No	—	Callback when snapshot is taken

13.5.2 Alert Card Component

Prop	Type	Required	Description
`id`	`string`	Yes	Alert unique identifier
`severity`	`'critical' \| 'high' \| 'medium' \| 'low' \| 'info'`	Yes	Alert severity level
`type`	`string`	Yes	Alert type classification
`cameraName`	`string`	Yes	Source camera display name
`timestamp`	`Date`	Yes	When the alert occurred
`thumbnail`	`string`	No	URL to thumbnail image
`personName`	`string`	No	Identified person name (if known)
`status`	`'pending' \| 'acknowledged' \| 'resolved' \| 'ignored'`	Yes	Current alert status
`onAcknowledge`	`() => void`	No	Acknowledge callback
`onResolve`	`() => void`	No	Resolve callback
`onIgnore`	`() => void`	No	Ignore callback
`onViewDetails`	`() => void`	No	View details callback

13.5.3 Stat Card Component

Prop	Type	Required	Description
`title`	`string`	Yes	Card label (e.g., "Cameras Online")
`value`	`string \| number`	Yes	Main displayed value (e.g., "8/8")
`icon`	`LucideIcon`	Yes	Icon component from Lucide React
`color`	`'green' \| 'blue' \| 'orange' \| 'red' \| 'purple'`	No	Color theme (default: blue)
`trend`	`number`	No	Percentage change from previous period
`subtitle`	`string`	No	Secondary text below value
`href`	`string`	No	Navigation link (e.g., to detail page)

13.6 Toast Notification System

Type	Icon	Color	Duration	Use Case
Success	Check circle	Green (`#10B981`)	3 seconds	Action completed successfully
Error	X circle	Red (`#EF4444`)	5 seconds (or persistent)	Action failed; may require user attention
Warning	Alert triangle	Orange (`#F59E0B`)	4 seconds	Non-critical issue; may need attention
Info	Info circle	Blue (`#3B82F6`)	3 seconds	Informational message
Alert	Bell	Red (`#EF4444`)	Persistent (until dismissed)	Critical alert notification

Toast behavior:

Appears in top-right corner
Stacks up to 5 toasts simultaneously
Older toasts pushed down when new ones arrive
Hovering pauses auto-dismiss timer
Click to dismiss immediately
Swipe right to dismiss (mobile)

13.7 Modal System

Size	Width	Use Case
Small	400px	Confirmations, simple forms
Medium (default)	560px	Standard forms, detail views
Large	800px	Complex forms, image viewers
Fullscreen	100%	Camera fullscreen, large data tables

Modal behavior:

Backdrop click to close (configurable)
Escape key to close (configurable)
Focus trap — Tab cycles within modal
Return focus to trigger element on close
Body scroll locked when modal open
Enter key submits primary action (forms)

13.8 Responsive Behavior

Breakpoint	Width	Layout Changes
`xs`	< 576px	Single column; stacked layouts; bottom tab bar; hamburger menu; camera grid 1x1 or 2x1
`sm`	576-767px	Two column layouts; sidebar as overlay drawer; camera grid 2x2
`md`	768-991px	Collapsed sidebar (72px); filters as drawer; camera grid 2x3; 3-column person gallery
`lg`	992-1199px	Sidebar expanded (260px); full desktop layout; 4-column person gallery
`xl`	1200-1399px	Full desktop layout; 5-column person gallery; 2x4 camera grid
`xxl`	1400px+	Max content width 1400px centered; all features visible

13.9 Keyboard Shortcuts

Shortcut	Context	Action
`?`	Global	Show keyboard shortcuts reference modal
`/`	Global	Focus global search bar
`Escape`	Global	Close modal / exit fullscreen / deselect
`F`	Live View	Toggle fullscreen on focused camera
`S`	Live View	Take snapshot of focused camera
`1-8`	Live View	Focus camera 1-8
`Space`	Live View	Pause/play focused camera stream
`A`	Alert Center	Acknowledge selected alert
`R`	Alert Center	Resolve selected alert
`N`	Detections / Unknowns	Name unknown person
`→`	Unknown Review	Next cluster
`←`	Unknown Review	Previous cluster
`Ctrl+K`	Global	Command palette (quick navigation)
`Ctrl+Shift+A`	Global	Acknowledge most recent alert
`M`	Live View	Toggle mute on camera audio
`+` / `-`	Timeline	Zoom in / zoom out

13.10 Animation Guidelines

Animation	Duration	Easing	Description
Page transition	200ms	`ease-out`	Fade in on route change
Modal open	250ms	`cubic-bezier(0.16, 1, 0.3, 1)`	Scale up + fade in
Modal close	150ms	`ease-in`	Scale down + fade out
Sidebar toggle	250ms	`ease-in-out`	Width transition 260px ↔ 72px
Toast slide-in	300ms	`ease-out`	Slide from right + fade in
Toast fade-out	200ms	`ease-in`	Fade out before removal
Card hover lift	150ms	`ease`	Subtle translateY(-2px) + shadow increase
Segmented slider	200ms	`ease`	Sliding background between options
Pulse (recording)	2s	`ease-in-out` infinite	Red dot opacity oscillation
Stats update	500ms	`ease`	Number count-up animation
Skeleton shimmer	1.5s	`linear infinite`	Shimmer gradient sweep
Alert flash	1s	`ease-out`	Border flash on camera with new alert
Camera focus	300ms	`ease-out`	Expand to fullscreen
Dropdown open	150ms	`ease-out`	Fade + slight translateY
Tooltip	100ms	`ease`	Fade in on hover

13.11 Technology Stack

Layer	Technology	Version	Purpose
Framework	React	18.x	UI library
Meta-framework	Next.js	14.x	SSR, routing, API routes
Language	TypeScript	5.x	Type safety
Styling	Tailwind CSS	3.x	Utility-first CSS
Theme	CSS Custom Properties	—	Dark mode via `dark` class
UI Components	shadcn/ui	latest	Base component library
Icons	Lucide React	latest	Consistent icon set
State Management	Zustand	4.x	Lightweight global state
Data Fetching	TanStack Query (React Query)	5.x	Server state management
Real-time	Socket.IO Client	4.x	WebSocket for live updates
Video	hls.js	latest	HLS stream playback
Video (WebRTC)	native	—	WebRTC stream fallback
Charts	Recharts	2.x	Data visualization
Date/Time	date-fns	2.x	Date formatting and manipulation
Forms	React Hook Form	7.x	Form state management
Validation	Zod	3.x	Schema validation
Zone Drawing	SVG + native events	—	Polygon drawing on camera feed
Testing	Vitest	1.x	Unit testing
E2E Testing	Playwright	1.x	Browser automation testing
Build	Next.js built-in	—	Production optimization

Section 14: Deployment Plan

14.1 Deployment Architecture Overview

The deployment architecture spans two physical environments: AWS cloud for centralized services and an Intel NUC edge gateway at the surveillance site, connected by an encrypted WireGuard VPN tunnel. Every workload is containerized: Docker Compose runs the edge services, and Kubernetes (EKS) runs the cloud services, where ArgoCD provides GitOps-based continuous delivery.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         DEPLOYMENT ARCHITECTURE                               │
│                                                                              │
│    ┌────────────────────────────────────────────────────────────────────┐   │
│    │                        AWS CLOUD                                   │   │
│    │                                                                    │   │
│    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │   │
│    │  │ Route 53 │──▶  ALB     │──▶  EKS     │──▶  App Pods       │ │   │
│    │  │   DNS    │  │ TLS 1.3  │  │ Cluster  │  │  (FastAPI/Next)  │ │   │
│    │  └──────────┘  └──────────┘  └────┬─────┘  └──────────────────┘ │   │
│    │                                    │                              │   │
│    │  ┌──────────┐  ┌──────────┐  ┌────┴─────┐  ┌──────────────────┐ │   │
│    │  │   S3     │  │   RDS    │  │ ElastiCache│  │  MSK Kafka       │ │   │
│    │  │  Media   │  │ Postgres │  │  Redis    │  │  (Event Bus)     │ │   │
│    │  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘ │   │
│    │                                                                    │   │
│    │  ┌──────────────────────────────────────────────────────────────┐ │   │
│    │  │  WireGuard VPN Gateway (EC2)  ←────→  Edge Gateway          │ │   │
│    │  │  UDP 51820                    Tunnel   (Intel NUC, Site)     │ │   │
│    │  └──────────────────────────────────────────────────────────────┘ │   │
│    └────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│    ┌────────────────────────────────────────────────────────────────────┐   │
│    │                        EDGE SITE                                   │   │
│    │                                                                    │   │
│    │  ┌─────────────────────────────────────────────────────────────┐  │   │
│    │  │              Intel NUC (Ubuntu Server 22.04)                │  │   │
│    │  │                                                             │  │   │
│    │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │   │
│    │  │  │ Video Capture│  │ AI Inference │  │   MinIO      │    │  │   │
│    │  │  │  (RTSP/FFmpeg)│  │ (YOLO/Face) │  │  (Storage)   │    │  │   │
│    │  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │   │
│    │  │                                                             │  │   │
│    │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │   │
│    │  │  │    Redis     │  │  WireGuard   │  │ Node Exporter│    │  │   │
│    │  │  │   (Cache)    │  │   (VPN)      │  │  (Metrics)   │    │  │   │
│    │  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │   │
│    │  │                                                             │  │   │
│    │  └─────────────────────────────────────────────────────────────┘  │   │
│    │                              │                                     │   │
│    │                    ┌─────────┴──────────┐                         │   │
│    │                    │  Camera LAN         │                         │   │
│    │                    │  CP PLUS DVR        │                         │   │
│    │                    │  192.168.29.200:554 │                         │   │
│    │                    │  (8 channels)       │                         │   │
│    │                    └─────────────────────┘                         │   │
│    └────────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────────┘

14.2 Cloud Deployment (AWS EKS)

14.2.1 EKS Cluster Configuration

Parameter	Value	Notes
Kubernetes version	1.28+	Latest stable at deployment
Control plane	Managed by AWS	Multi-AZ availability
Node group type	Managed (EC2)	t3.large for general, g4dn.xlarge for GPU
CNI	Amazon VPC CNI	Native VPC networking for pods
Ingress controller	NGINX Ingress + cert-manager	TLS termination at ALB
GitOps	ArgoCD	Declarative continuous deployment
Pod identity	IRSA (IAM Roles for Service Accounts)	No long-term AWS credentials

14.2.2 Cloud Service Resources

Service	AWS Service	Instance/Tier	HA Mode	Monthly Est.
Orchestration	Amazon EKS	Managed control plane	Multi-AZ	$73
Application nodes	EC2 (t3.large)	3 nodes (on-demand)	Multi-AZ spread	$200
GPU nodes	EC2 (g4dn.xlarge)	1 node (spot preferred)	Single + auto-recovery	$350
Database	RDS PostgreSQL 15	db.r6g.xlarge Multi-AZ	Multi-AZ with failover	$520
Cache	ElastiCache Redis	cache.r6g.large (2 shards)	Cluster mode	$260
Message bus	Amazon MSK	kafka.m5.large (3 brokers)	Multi-AZ	$350
Object storage	S3	Standard + IA + Glacier	Cross-region replication	$200
Load balancer	ALB	Application Load Balancer	Multi-AZ	$25
DNS	Route 53	Hosted zone + health checks	Global	$15
VPN gateway	EC2 (t3.micro)	WireGuard endpoint	Single (monitor for HA)	$15
Secrets	AWS Secrets Manager	Vault integration	Multi-AZ	$10
Monitoring	CloudWatch	Logs + metrics + alarms	Multi-AZ	$50
Total				~$2,068/month
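As a consistency check, the Monthly Est. column can be summed directly. The figures below are copied from the table rows; the dictionary keys are just labels chosen for this sketch.

```python
# Monthly estimates in USD, one entry per row of the table in 14.2.2.
MONTHLY_ESTIMATES_USD = {
    "Orchestration (EKS)": 73,
    "Application nodes (t3.large x3)": 200,
    "GPU nodes (g4dn.xlarge)": 350,
    "Database (RDS PostgreSQL)": 520,
    "Cache (ElastiCache Redis)": 260,
    "Message bus (MSK)": 350,
    "Object storage (S3)": 200,
    "Load balancer (ALB)": 25,
    "DNS (Route 53)": 15,
    "VPN gateway (t3.micro)": 15,
    "Secrets (Secrets Manager)": 10,
    "Monitoring (CloudWatch)": 50,
}

total_usd = sum(MONTHLY_ESTIMATES_USD.values())
print(f"Estimated cloud cost: ${total_usd:,}/month")  # prints "Estimated cloud cost: $2,068/month"
```

Keeping the estimates in one structure makes it easy to re-run the total when an instance type or tier changes.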

14.3 Edge Deployment (Intel NUC)

14.3.1 Edge Hardware Specification

Component	Specification	Notes
Device	Intel NUC 13 Pro (or equivalent)	Fanless preferred for reliability
CPU	Intel Core i7-1360P (12 cores, 16 threads)	Sufficient for 8 streams + AI inference
RAM	32 GB DDR4-3200 (2x16 GB)	Dual channel for memory bandwidth
Storage (OS)	500 GB NVMe SSD (Samsung 980 Pro or equivalent)	Fast boot and application loading
Storage (Data)	2 TB NVMe SSD (Samsung 990 Pro or equivalent)	7-day local recording buffer
Network	Intel i226-V 2.5 GbE (dual port)	Dual NIC for WAN + LAN separation
WiFi	Disabled in BIOS	Security — no wireless
Bluetooth	Disabled in BIOS	Security — no wireless
TPM	TPM 2.0 enabled	For LUKS auto-unseal
OS	Ubuntu Server 22.04 LTS (minimal install)	No desktop environment
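The 2 TB data drive and the 7-day buffer together imply an upper bound on the average recording bitrate per channel. The sketch below assumes decimal terabytes, that the whole drive is available for recordings (no filesystem, frame, or MinIO overhead), and continuous recording on all 8 channels; the function name is hypothetical.

```python
def max_bitrate_mbps(capacity_bytes, channels, retention_days):
    """Average per-channel bitrate (Mbit/s) that fills the drive in exactly retention_days."""
    seconds = retention_days * 24 * 60 * 60
    total_bits = capacity_bytes * 8
    return total_bits / (channels * seconds) / 1_000_000


budget = max_bitrate_mbps(2 * 10**12, channels=8, retention_days=7)
print(f"{budget:.2f} Mbit/s per channel")  # prints "3.31 Mbit/s per channel"
```

If the DVR main stream averages more than this, the buffer would need a shorter retention window, the sub stream, or more storage.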

14.3.2 Edge Docker Compose Configuration

version: "3.8"

services:
  # RTSP stream capture and frame extraction
  video-capture:
    image: sentinel/surveillance-video-capture:v2.3.1
    restart: unless-stopped
    network_mode: host
    environment:
      - DVR_IP=192.168.29.200
      - DVR_PORT=554
      - NUM_CHANNELS=8
      - FRAME_EXTRACT_FPS=1
      - RECORDING_SEGMENT_SEC=10
      - REDIS_HOST=localhost
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=localhost:9000
    volumes:
      - /data/frames:/app/frames
      - /data/recordings:/app/recordings
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
      - minio
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"

  # AI inference service (lightweight edge models)
  ai-inference:
    image: sentinel/surveillance-ai-inference:edge-v2.3.1
    restart: unless-stopped
    runtime: nvidia  # Requires the NVIDIA Container Toolkit; remove this line on CPU-only hosts, otherwise the container will not start
    environment:
      - MODEL_PATH=/models
      - REDIS_HOST=redis            # Compose service name; this container is not on the host network, so localhost would point at itself
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=minio:9000
      - INFERENCE_BATCH_SIZE=8
      - CONFIDENCE_THRESHOLD=0.7
      - NMS_THRESHOLD=0.45
    volumes:
      - ./models:/models:ro
      - /data/frames:/app/frames:ro
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
    deploy:
      resources:
        limits:
          cpus: '6.0'
          memory: 8G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"

  # Local object storage (S3-compatible)
  minio:
    image: minio/minio:RELEASE.2024-latest  # Placeholder tag; pin to an actual dated RELEASE.<timestamp> tag before deploying
    restart: unless-stopped
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - /data/minio:/data
    environment:
      - MINIO_ROOT_USER_FILE=/run/secrets/minio_user
      - MINIO_ROOT_PASSWORD_FILE=/run/secrets/minio_password
    secrets:
      - minio_user
      - minio_password
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G

  # Local cache and Pub/Sub
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    # REDIS_PASSWORD is assumed to be supplied through the Compose .env file
    command: >
      redis-server
      --requirepass "${REDIS_PASSWORD}"
      --appendonly yes
      --maxmemory 512mb
      --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    ports:
      - "127.0.0.1:6379:6379"
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M

  # WireGuard VPN client
  wireguard:
    image: linuxserver/wireguard:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - ./wireguard-config:/config
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 64M

  # Metrics exporter for Prometheus
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

volumes:
  redis_data:
    driver: local

secrets:
  minio_user:
    file: ./secrets/minio_user.txt
  minio_password:
    file: ./secrets/minio_password.txt
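A quick way to sanity-check the Compose file is to add up the resource limits and compare them with the NUC specified in 14.3.1 (16 threads, 32 GB RAM). The limits below are copied from the `deploy.resources.limits` entries above; node-exporter declares none and is left out, and the headroom calculation is only an illustration.

```python
# (cpus, memory in MiB) per service, from the Compose limits above.
LIMITS = {
    "video-capture": (4.0, 4096),
    "ai-inference": (6.0, 8192),
    "minio": (1.0, 1024),
    "redis": (0.5, 512),
    "wireguard": (0.25, 64),
}

HOST_THREADS = 16             # Intel Core i7-1360P, Section 14.3.1
HOST_MEMORY_MIB = 32 * 1024   # 32 GB RAM, Section 14.3.1

total_cpus = sum(cpus for cpus, _ in LIMITS.values())
total_memory_mib = sum(mem for _, mem in LIMITS.values())
fits = total_cpus <= HOST_THREADS and total_memory_mib <= HOST_MEMORY_MIB

print(f"CPU limits: {total_cpus} of {HOST_THREADS} threads")              # 11.75 of 16
print(f"Memory limits: {total_memory_mib} of {HOST_MEMORY_MIB} MiB")      # 13888 of 32768
```

The remaining capacity is what the host OS, Docker itself, and node-exporter have to share.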

14.4 Configuration and Environment Variables

14.4.1 Environment Structure

Environment	URL Pattern	Data	Purpose
Development	`*.dev.internal`	Synthetic test data	Feature development, local testing
Staging	`*.staging.example.com`	Anonymized production-like data	Integration testing, UAT
Production	`*.example.com`	Real operational data	Live surveillance operations

14.4.2 Required Environment Variables

# ─── APPLICATION ───
APP_ENV=production                    # dev | staging | production
APP_NAME="Sentinel AI Surveillance"
APP_VERSION=2.3.1
APP_DEBUG=false
APP_SECRET_KEY=<random-256-bit-key>   # Used for session signing
LOG_LEVEL=INFO                         # DEBUG | INFO | WARNING | ERROR | CRITICAL

# ─── SERVER ───
API_HOST=0.0.0.0
API_PORT=8080
WORKERS=4                              # Uvicorn worker processes
TIMEZONE=Asia/Kolkata

# ─── DATABASE ───
DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/surveillance
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
DB_POOL_TIMEOUT=30
DB_ECHO=false                          # Set true for SQL logging (dev only)

# ─── REDIS ───
REDIS_URL=redis://:password@redis-endpoint:6379/0
REDIS_POOL_SIZE=50
REDIS_SOCKET_TIMEOUT=5

# ─── OBJECT STORAGE (S3 or MinIO) ───
STORAGE_TYPE=s3                        # s3 | minio
STORAGE_ENDPOINT=s3.amazonaws.com
STORAGE_BUCKET=sentinel-surveillance-media
STORAGE_REGION=ap-south-1
STORAGE_ACCESS_KEY=<access-key>
STORAGE_SECRET_KEY=<secret-key>
STORAGE_SECURE=true
STORAGE_URL_EXPIRY=300                 # Signed URL expiry in seconds

# ─── DVR / CAMERA CONNECTION ───
DVR_IP=192.168.29.200
DVR_PORT=554
DVR_USERNAME=admin
DVR_PASSWORD=<dvr-password>
DVR_CHANNELS=8
DVR_STREAM_QUALITY=0                   # 0=main (high), 1=sub (low)
DVR_RTSP_TEMPLATE="rtsp://{user}:{pass}@{ip}:{port}/user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"

# ─── AI MODELS ───
MODEL_PATH=/models
HUMAN_DETECTION_MODEL=yolo11m.onnx
FACE_DETECTION_MODEL=scrfd_10g_bnkps.onnx
FACE_RECOGNITION_MODEL=arcface_r100.onnx
CONFIDENCE_THRESHOLD=0.7
NMS_THRESHOLD=0.45
FACE_MATCH_THRESHOLD=0.70
UNKNOWN_CLUSTER_EPS=0.35
UNKNOWN_CLUSTER_MIN_SAMPLES=3

# ─── TELEGRAM NOTIFICATIONS ───
TELEGRAM_ENABLED=true
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_WEBHOOK_URL=https://api.example.com/webhooks/telegram
TELEGRAM_WEBHOOK_SECRET=<webhook-secret>
TELEGRAM_ADMIN_CHAT_ID=<admin-chat-id>

# ─── WHATSAPP NOTIFICATIONS ───
WHATSAPP_ENABLED=true
WHATSAPP_API_VERSION=v18.0
WHATSAPP_ACCESS_TOKEN=<access-token>
WHATSAPP_PHONE_NUMBER_ID=<phone-number-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN=<verify-token>
WHATSAPP_BUSINESS_ACCOUNT_ID=<business-account-id>

# ─── VPN ───
VPN_ENABLED=true
VPN_TYPE=wireguard
VPN_ENDPOINT=wg.example.com:51820
VPN_PUBLIC_KEY=<server-public-key>
VPN_PRIVATE_KEY=<client-private-key>
VPN_PRESHARED_KEY=<preshared-key>
VPN_ALLOWED_IPS=10.100.0.0/16
VPN_KEEPALIVE=25

# ─── AUTHENTICATION ───
JWT_SECRET_KEY=<ecdsa-private-key-pem>
JWT_PUBLIC_KEY=<ecdsa-public-key-pem>
JWT_ALGORITHM=ES256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=15
JWT_REFRESH_TOKEN_EXPIRE_DAYS=7
MFA_REQUIRED_ROLES=super_admin,admin
MFA_ISSUER="Sentinel AI Surveillance"

# ─── MONITORING ───
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
GRAFANA_URL=https://grafana.example.com
SENTRY_DSN=<sentry-dsn>
HEALTH_CHECK_INTERVAL=30

# ─── RETENTION ───
RECORDING_RETENTION_DAYS=90
DETECTION_SNAPSHOT_RETENTION_DAYS=90
EVENT_LOG_RETENTION_DAYS=365
AUDIT_LOG_RETENTION_DAYS=365
TRAINING_DATA_RETENTION_DAYS=365
AUTO_CLEANUP_ENABLED=true
AUTO_CLEANUP_HOUR=3                    # 3:00 AM daily

# ─── SECURITY ───
CORS_ALLOWED_ORIGINS=https://app.example.com,https://staging.example.com
CSP_REPORT_ONLY=false
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
SESSION_MAX_AGE_HOURS=8
SESSION_IDLE_TIMEOUT_MINUTES=30
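
As a minimal sketch of how the DVR variables above might be consumed, the following Python expands DVR_RTSP_TEMPLATE into one URL per channel. The function name build_rtsp_urls, the 1-based channel numbering, and the percent-encoding of credentials are assumptions made for illustration; they are not confirmed behavior of the DVR or of the video capture service.

```python
import os
from urllib.parse import quote


def build_rtsp_urls(env=os.environ):
    """Expand DVR_RTSP_TEMPLATE into one RTSP URL per configured channel."""
    # Credentials are percent-encoded so characters such as '@' or '&'
    # cannot break the URL structure (assumption: the DVR accepts this).
    values = {
        "user": quote(env["DVR_USERNAME"], safe=""),
        "pass": quote(env["DVR_PASSWORD"], safe=""),
        "ip": env["DVR_IP"],
        "port": env.get("DVR_PORT", "554"),
        "quality": env.get("DVR_STREAM_QUALITY", "0"),
    }
    template = env["DVR_RTSP_TEMPLATE"]
    channels = int(env.get("DVR_CHANNELS", "8"))
    # Assumption: channel numbering starts at 1.
    return [
        template.format_map({**values, "ch": ch})
        for ch in range(1, channels + 1)
    ]
```

Passing a plain dictionary instead of os.environ keeps the helper easy to unit test. Because the resulting URLs embed the DVR password, they should be masked before being written to logs.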

14.5 Rollout Stages

14.5.1 Stage 1: Foundation (Weeks 1-4)

Objective: Infrastructure, VPN connectivity, and core data layer operational.

Week	Tasks	Deliverables	Success Criteria
1	AWS account setup, VPC creation (3 AZs), EKS cluster deployment, IAM roles	Cloud network ready	VPC flow logs active; EKS nodes Ready
1	RDS PostgreSQL Multi-AZ, ElastiCache Redis cluster	Data layer ready	DB connections successful; replication lag < 1s
2	S3 buckets (media, backups, logs), lifecycle policies, CORS	Storage ready	Upload/download test successful
2	WireGuard VPN gateway (EC2), key generation, firewall rules	VPN endpoint ready	Tunnel handshake successful
3	Edge gateway: OS install, hardening, Docker, WireGuard client	Edge device ready	Edge connects to cloud over VPN
3	Edge services: MinIO, Redis, video capture container	Edge services running	RTSP streams reachable from edge
4	Database schema migration (29 tables), seed data (admin user, 8 cameras)	Database ready	Schema matches design; seed data present
4	Monitoring: Prometheus, Grafana, CloudWatch dashboards	Monitoring active	Dashboards accessible; metrics flowing
4	End-to-end connectivity test	Full pipeline verified	Video from DVR → Edge → Cloud (VPN) → S3

Milestone M1 — Infrastructure Ready (End of Week 4):

All cloud services deployed and healthy
VPN tunnel established and stable (< 100ms latency)
Edge gateway online, all Docker services running
Database schema deployed with migrations and seed data
All 8 camera streams reachable from edge
Basic monitoring and alerting in place

14.5.2 Stage 2: Core AI Pipeline (Weeks 5-8)

Objective: Video ingestion, AI detection, face recognition, and basic API operational.

Week	Tasks	Deliverables	Success Criteria
5	Video capture service: RTSP ingestion, frame extraction, segment recording	Stream ingestion working	All 8 streams connected; FPS > 5 per stream
5	Kafka topic setup, stream ingestion producer	Event streaming ready	Frames published to Kafka
6	AI Inference Service: YOLO (human detection), SCRFD (face detection)	Detection models running	mAP > 0.90 for human detection
6	Detection event storage in PostgreSQL	Detection database working	Events queryable via API
7	ArcFace (face recognition) model deployment, embedding generation, pgvector	Face recognition working	Rank-1 accuracy > 95% on test set
7	Person matching logic: known person lookup, unknown person handling	Person matching working	Correct identification in < 100ms
8	FastAPI core: health endpoints, camera endpoints, detection endpoints	API core functional	All endpoints return correct data
8	Basic authentication: login, JWT token issuance, password hashing	Auth working	Login → token → authenticated requests

Milestone M2 — AI Pipeline Operational (End of Week 8):

All 8 camera streams ingesting at target FPS
Human detection, face detection, and face recognition operational
Detection events stored and queryable
Person matching (known/unknown) working
Basic REST API serving authenticated requests
End-to-end: Camera → Detection → Database → API

14.5.3 Stage 3: Application Layer (Weeks 9-12)

Objective: Web dashboard, alerting, notifications, and person management operational.

Week	Tasks	Deliverables	Success Criteria
9	Next.js project setup, design system, Tailwind config, dark theme	Frontend foundation	Login page renders correctly
9	Authentication flow: login form, MFA input, token management, logout	Auth UI working	Full login → dashboard flow
10	Dashboard page: stat cards, alert chart, camera grid, activity feed	Dashboard live	All widgets populated with real data
10	Live camera view: HLS player, grid layout, fullscreen, camera controls	Live view working	All 8 streams visible, playable
10	Alert engine: rule evaluation, severity assignment, routing	Alert generation working	Alerts created within 5s of detection
11	Telegram integration: bot setup, message templates, inline keyboards	Telegram alerts working	Test alert received in Telegram
11	WhatsApp integration: template messages, session messages	WhatsApp alerts working	Test template message received
11	Person management: gallery, profile, CRUD, face matching display	Person management working	Person created, detected, viewed
12	Unknown review queue: cluster display, naming, merging, ignore	Review queue working	Unknown person processed through queue
12	Watchlists: CRUD, member management, alert routing	Watchlists working	Watchlist match triggers correct alert
12	WebSocket: real-time alert feed, dashboard updates	Real-time working	Alerts appear without page refresh

Milestone M3 — Application Live (End of Week 12):

Web dashboard accessible with live camera feeds
Alerts generated and delivered via Telegram and WhatsApp
Person management (add, view, match, review unknowns) working
Watchlist alerts functional with correct routing
Real-time updates via WebSocket
All RBAC permissions enforced in UI

14.5.4 Stage 4: Intelligence (Weeks 13-16)

Objective: Night mode, training pipeline, self-learning, and advanced features.

Week	Tasks	Deliverables	Success Criteria
13	Night mode: low-light model training, deployment, auto-scheduling	Night mode working	Detection mAP > 0.75 in < 5 lux conditions
13	AI Vibe Settings page: all 7 controls, auto-save, advanced mode	Settings page working	All controls functional, changes effective immediately
14	Training pipeline: data collection, model training job, evaluation	Training pipeline working	Model accuracy improves with new training data
14	Model versioning: A/B testing, shadow mode, promotion workflow	Model management working	Blue/green model deployment
15	Self-learning service: automatic unknown clustering, suggestions	Self-learning working	Suggestions generated for unknown clusters
15	Privacy mode: face blurring, privacy zones, per-camera settings	Privacy mode working	Faces blurred according to settings
15	Suspicious activity detection: pattern rules, anomaly scoring	Advanced alerts working	Anomaly alerts generated for unusual behavior
16	Search service: face similarity search, text search, filters	Search working	Results returned in < 500ms
16	System health dashboard: service cards, metrics, logs viewer	Health dashboard working	All systems visible with status

Milestone M4 — Intelligence Features Live (End of Week 16):

Night mode detection operational
Training pipeline runs and improves models
Self-learning suggestions appear in review queue
Privacy modes configurable and effective
Suspicious activity alerts functional
Search returns results in acceptable time
All AI Vibe Settings controls operational

14.5.5 Stage 5: Hardening (Weeks 17-20)

Objective: Security hardening, testing framework, operations readiness, production go-live.

Week	Tasks	Deliverables	Success Criteria
17	Security penetration test (external vendor)	Pen test report	All critical/high findings addressed
17	SAST/DAST scans, dependency vulnerability scan	Scan reports	Zero critical vulnerabilities
17	Self-test framework: 21 test suites, scheduling, reporting	Testing framework deployed	All test suites execute successfully
18	Backup configuration: pgBackRest, S3 sync, restore procedures	Backup system ready	Restore test successful
18	DR environment setup, failover procedures, quarterly drill schedule	DR ready	DR failover test: RTO < 1 hour
18	Incident response runbooks: 5 documented procedures	Runbooks complete	All scenarios documented
19	Load testing: 8/16/32/64 camera simulation	Load test report	System handles 64 cameras within SLA
19	Performance tuning: database queries, API response times, cache optimization	Tuning complete	p95 API response < 200ms
19	Operations team training: system overview, runbooks, escalation procedures	Team trained	Training sign-off complete
19	98-item go-live checklist review	Checklist complete	All items pass
20	Final readiness review, security sign-off, management approval	Go approval	All stakeholders sign off
20	Production DNS cutover, monitoring, 72-hour stability period	Production live	72-hour stability confirmed

Milestone M5 — Production Go-Live (End of Week 20):

Security audit complete with all findings addressed
Self-test framework passing (score >= 85)
DR tested and verified (RTO < 1 hour, RPO < 15 minutes)
Operations team trained and runbooks reviewed
Load test passed at 64-camera target
98-item go-live checklist: all items complete
System stable in production for 72+ hours

14.6 Kubernetes Manifests Overview

Resource Type	Name	Purpose	Namespace
Deployment	`api`	FastAPI application server (3 replicas)	sentinel
Deployment	`ai-inference`	AI model serving (GPU node)	sentinel
Deployment	`video-capture`	RTSP stream ingestion (edge)	sentinel
Deployment	`alert-engine`	Alert generation and routing	sentinel
Deployment	`notification-service`	Telegram/WhatsApp delivery	sentinel
Deployment	`frontend`	Next.js web application	sentinel
Deployment	`websocket`	WebSocket real-time server	sentinel
StatefulSet	`redis`	Session cache and Pub/Sub	sentinel-data
Service	`api-service`	Internal API access (ClusterIP)	sentinel
Service	`ai-service`	AI inference access (ClusterIP)	sentinel
Service	`frontend-service`	Web app access (ClusterIP)	sentinel
Ingress	`sentinel-ingress`	External HTTPS routing	sentinel
ConfigMap	`app-config`	Application configuration	sentinel
ConfigMap	`nginx-config`	Ingress/Nginx configuration	sentinel
Secret	`app-secrets`	Encrypted secrets (Vault agent injector)	sentinel
Secret	`tls-cert`	TLS certificate (cert-manager)	sentinel
HPA	`api-hpa`	Auto-scale API: 3-10 replicas	sentinel
HPA	`ai-hpa`	Auto-scale AI: 1-4 replicas	sentinel
NetworkPolicy	`default-deny`	Block all unauthorized traffic	sentinel
NetworkPolicy	`allow-api`	API ingress rules	sentinel
NetworkPolicy	`allow-ai`	AI service communication rules	sentinel
PodDisruptionBudget	`api-pdb`	Ensure 2 API pods minimum	sentinel
ServiceMonitor	`api-metrics`	Prometheus scraping config	sentinel-monitoring
PrometheusRule	`alert-rules`	Alerting rules for platform	sentinel-monitoring

14.7 VPN Setup Procedure

14.7.1 Cloud VPN Gateway Setup

#!/bin/bash
# cloud-vpn-setup.sh — Run on cloud VPN EC2 instance

# 1. System preparation
sudo apt update && sudo apt install -y wireguard wireguard-tools iptables-persistent

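# 2. Generate WireGuard keys and restrict the private key to root
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey

# 3. Create WireGuard configuration (replace the <...> placeholders with real key values before running)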
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.1/24
ListenPort = 51820
PrivateKey = <CLOUD_PRIVATE_KEY>
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

# Edge Gateway peer
[Peer]
PublicKey = <EDGE_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
AllowedIPs = 10.200.0.2/32, 192.168.29.0/24
PersistentKeepalive = 25
EOF

# 4. Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf

# 5. Start WireGuard
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# 6. Verify
sudo wg show
ping -c 3 10.200.0.2

14.7.2 Edge VPN Client Setup

#!/bin/bash
# edge-vpn-setup.sh — Run on Intel NUC edge gateway

# 1. Install WireGuard
sudo apt update && sudo apt install -y wireguard wireguard-tools

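# 2. Generate keys and restrict the private key to root
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey

# 3. Configure (replace the <...> placeholders with real key values before running)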
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.2/32
PrivateKey = <EDGE_PRIVATE_KEY>
DNS = 10.100.0.2

[Peer]
PublicKey = <CLOUD_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
Endpoint = <CLOUD_PUBLIC_IP>:51820
AllowedIPs = 10.100.0.0/16, 10.200.0.0/24
PersistentKeepalive = 25
EOF

# 4. Start and enable
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# 5. Verify connectivity
ping -c 3 10.200.0.1        # Cloud VPN gateway
ping -c 3 10.100.0.2        # Cloud DNS/internal service
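
Beyond a one-off ping, tunnel health can be checked continuously by looking at handshake age. The sketch below parses the output of `wg show wg0 latest-handshakes` (one line per peer: public key, then a Unix timestamp, with 0 meaning no handshake yet). The function name stale_peers and the 180-second threshold are illustrative assumptions, not values taken from this blueprint.

```python
import time


def stale_peers(handshake_output, max_age_s=180, now=None):
    """Return public keys whose last handshake is missing or too old.

    handshake_output is the text printed by `wg show wg0 latest-handshakes`:
    one line per peer, "<public-key> <unix-timestamp>" separated by a tab.
    """
    now = time.time() if now is None else now
    stale = []
    for line in handshake_output.strip().splitlines():
        public_key, timestamp = line.split()
        last_handshake = int(timestamp)
        # A timestamp of 0 means the peer has never completed a handshake.
        if last_handshake == 0 or now - last_handshake > max_age_s:
            stale.append(public_key)
    return stale
```

In practice the input could come from `subprocess.run(["wg", "show", "wg0", "latest-handshakes"], capture_output=True, text=True, check=True).stdout`, executed with root privileges on the gateway.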

14.8 Database Initialization

14.8.1 Migration Strategy

Database migrations are managed with Alembic (SQLAlchemy) and executed as Kubernetes init containers before application startup:

initContainers:
  - name: db-migrations
    image: sentinel/surveillance-api:v2.3.1
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false

Migration Rules:

Rule	Implementation
Backward compatibility	All migrations must be backward-compatible within a release
Destructive changes	2-phase deployment: add new column in release N, drop old in release N+1
Automatic execution	Migrations run automatically before application startup via init container
Health check	Migration status exposed via `/health/ready` endpoint
Rollback	`alembic downgrade` script available for emergency rollback
Version tracking	`alembic_version` table tracks current schema version
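
The health-check and version-tracking rules above can be sketched as a small readiness probe that compares the revision stored in the alembic_version table (Alembic's default single-column table, version_num) with the head revision the application expects. This is only an illustration: the function name migration_status is hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3


def migration_status(conn, expected_head):
    """Compare the stored Alembic revision with the expected head revision."""
    try:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:
        # Table is missing: migrations have never been applied.
        row = None
    current = row[0] if row else None
    return {
        "current": current,
        "expected": expected_head,
        "ready": current == expected_head,
    }
```

A `/health/ready` handler could return HTTP 503 while "ready" is false, which keeps traffic away from pods whose schema is behind the code they run.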

14.9 SSL Certificate Setup

14.9.1 cert-manager Configuration

# ClusterIssuer for Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}

---
# Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sentinel-tls
  namespace: sentinel
spec:
  secretName: sentinel-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.example.com
    - api.example.com
    - ws.example.com
  usages:
    - digital signature
    - key encipherment
  privateKey:
    algorithm: ECDSA
    size: 256

Section 15: Testing Plan

15.1 Testing Strategy Overview

The testing strategy is built on a three-tier pyramid of unit, integration, and end-to-end tests, supplemented by load, failover, security, AI pipeline, and notification testing. The goal is comprehensive coverage of all functional and non-functional requirements with automated execution in CI/CD.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         TESTING PYRAMID                                       │
│                                                                              │
│                        ┌─────────┐                                          │
│                        │  E2E    │  ~20 tests                               │
│                        │ Tests   │  Full system scenarios                   │
│                        ├─────────┤                                          │
│                      ┌─────────────┐                                        │
│                      │ Integration │  ~100 tests                            │
│                      │   Tests     │  Service-to-service                    │
│                      ├─────────────┤                                        │
│                   ┌───────────────────┐                                    │
│                   │    Unit Tests      │  ~300 tests                        │
│                   │  (Components, AI)  │  Isolated functions                  │
│                   └───────────────────┘                                    │
└──────────────────────────────────────────────────────────────────────────────┘

15.2 Unit Testing Strategy

Component	Framework	Coverage Target	Mock Strategy	CI Execution
API backend (Python)	pytest + pytest-asyncio	85%+	pytest-mock, moto (AWS), responses (HTTP)	Every commit
Frontend (React/TS)	Vitest + React Testing Library	80%+	MSW (API mocking), jsdom	Every commit
AI models (Python)	pytest	70%+ (model logic)	Mock inference engine, fixture data	Every commit
Database models	pytest + asyncpg	80%+	testcontainers-postgres	Every commit
Notification adapters	pytest	80%+	responses library for HTTP mocking	Every commit

15.3 Integration Testing

Integration Pair	Scope	Framework	Strategy
API + Database	CRUD operations, transactions, query performance	pytest + testcontainers	PostgreSQL container per test run
API + Redis	Caching, Pub/Sub, session storage	pytest + Redis container	Redis container per test run
API + S3/MinIO	Media upload, download, presigned URLs	pytest + LocalStack	S3 mock via LocalStack
Alert Engine + Router	Rule evaluation, routing decisions	pytest	Mock channel adapters
Telegram Adapter	Message formatting, API calls, error handling	pytest + responses	HTTP request/response mocking
WhatsApp Adapter	Template rendering, API calls, error handling	pytest + responses	HTTP request/response mocking
Auth + Database	User CRUD, password hashing, session management	pytest + testcontainers	Full auth flow testing

15.4 System Testing (End-to-End)

#	Scenario	Steps	Expected Result
1	Full detection pipeline	Trigger motion → verify detection stored → verify alert created → verify notification sent	All components process correctly within SLA
2	Person recognition flow	Known person walks by → verify face detected → verify identity matched → verify no false alert	Correct person identified with > 95% confidence
3	Unknown person flow	Unknown person detected → verify "Unknown" classification → verify review queue updated	Unknown queued for operator review within 5 seconds
4	Watchlist alert (blacklist)	Blacklist person detected → verify immediate critical alert → verify notification to security team	Alert within 5 seconds, correct severity, all channels
5	Night mode detection	Low-light detection scenario → verify night model used → verify detection confidence acceptable	Detection mAP > 0.75 in < 5 lux conditions
6	Privacy mode	Enable privacy mode → verify face blurring in live view → verify no face recognition occurs	Faces blurred, no biometric processing
7	Alert escalation	Create critical alert → leave it unacknowledged → verify escalation levels trigger at correct times	Level 1 at 5min, Level 2 at 10min, Level 3 at 20min
8	VPN failure recovery	Disconnect VPN → verify local operation continues → reconnect VPN → verify sync resumes	No data loss; automatic recovery
9	Database failover	Trigger RDS failover → verify application continues → verify no data loss	< 60 second downtime; zero data loss
10	Complete user flow	Login → view dashboard → view live cameras → receive alert → acknowledge → logout	All pages load; all actions succeed
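
The escalation timing expected in scenario 7 can be expressed as a small lookup, which is also convenient for asserting boundary cases in the E2E test. This is a sketch only: the function name escalation_level and the convention that level 0 means "not yet escalated" are assumptions.

```python
# (minutes unacknowledged, escalation level), highest threshold first.
# Values come from scenario 7: Level 1 at 5 min, Level 2 at 10 min, Level 3 at 20 min.
ESCALATION_THRESHOLDS_MIN = [(20, 3), (10, 2), (5, 1)]


def escalation_level(minutes_unacknowledged):
    """Return the escalation level for a critical alert left unacknowledged."""
    for threshold, level in ESCALATION_THRESHOLDS_MIN:
        if minutes_unacknowledged >= threshold:
            return level
    return 0  # assumption: 0 means the alert has not escalated yet
```

Checking the values just below and exactly at each threshold (for example 4.9 and 5 minutes) guards against off-by-one errors in the timer logic.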

15.5 Load Testing Plan

Scenario	Camera Count	Duration	Users	Target Metrics
Baseline	8	1 hour	5 concurrent	Establish baseline metrics
Scale-up	16	2 hours	10 concurrent	Verify 2x capacity; p95 latency < 500ms
Scale-up	32	2 hours	20 concurrent	Verify 4x capacity; auto-scaling triggers
Stress test	64	1 hour	50 concurrent	Find breaking point; error rate < 1%
Sustained	8	24 hours	5 concurrent	Memory leak detection; stability verification
Spike test	8→64→8	30 minutes	Ramp up/down	Verify auto-scaling response time

15.6 Failover Testing

Test Case	Description	Pass Criteria
API pod failure	Kill 1 API pod	Traffic routed to healthy pods; zero failed requests
Database failover	Trigger RDS Multi-AZ failover	< 60s downtime; no data loss; connections re-established
Redis failure	Restart Redis cluster	Session recovery; cache warm within 5 minutes
VPN tunnel failure	Disconnect WireGuard	Auto-reconnect within 30s; streams resume
Edge gateway restart	Reboot edge device	Full recovery within 5 minutes; all streams reconnect
AI inference failure	Kill inference container	Queue buffers frames; recovery < 30s; no frame loss
Complete cloud failure	Simulate region outage	DR test: RTO < 1 hour; RPO < 15 minutes

15.7 Security Testing

Test Type	Tool	Scope	Frequency	Gate
Static Analysis (SAST)	Bandit, Semgrep	Source code	Every commit	Block on HIGH/CRITICAL
Dependency Scan	Snyk, pip-audit	All dependencies	Daily	Block on HIGH/CRITICAL
Container Image Scan	Trivy	Docker images	Every build	Block on HIGH/CRITICAL
Dynamic Analysis (DAST)	OWASP ZAP	Running application	Weekly	Review findings
Penetration Test	External vendor	Full stack	Quarterly	All findings addressed
TLS Configuration	testssl.sh	SSL/TLS endpoints	Monthly	Grade A+ required
API Security	OWASP ZAP API scan	All REST endpoints	Weekly	Review findings
Secrets Scan	TruffleHog, GitLeaks	Git repositories	Every commit	Block on findings

15.8 AI Pipeline Testing

Test	Description	Target Metric	Test Data
Human detection accuracy	Evaluate YOLO on held-out test set	mAP > 0.90	1000 labeled frames
Face detection accuracy	Evaluate SCRFD on test set	Detection rate > 0.85	500 labeled face images
Face recognition accuracy	Evaluate ArcFace on test set	Rank-1 accuracy > 0.95	200 person gallery
False positive rate	Measure incorrect person matches	< 2%	Simulated impostor set
False negative rate	Measure missed person matches	< 5%	Known person test set
Inference latency	Measure end-to-end processing	< 200ms per frame (p95)	Benchmark suite
Night mode accuracy	Test low-light detection	mAP > 0.75	200 low-light frames
Batch processing	Test throughput at batch size 8	> 40 FPS aggregate	Benchmark suite
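
As an illustration of how three of the targets above (false positive rate, false negative rate, p95 latency) might be checked in a test, the sketch below computes them from raw trial data. The helper names, the (is_known, matched) trial format, and the nearest-rank percentile method are assumptions chosen for simplicity.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def error_rates(trials):
    """trials: list of (is_known, matched) booleans, one per probe face.

    Returns (false_positive_rate, false_negative_rate). Both groups
    (known persons and impostors) must be non-empty.
    """
    genuine = [matched for is_known, matched in trials if is_known]
    impostor = [matched for is_known, matched in trials if not is_known]
    fnr = genuine.count(False) / len(genuine)   # known person not matched
    fpr = impostor.count(True) / len(impostor)  # impostor wrongly matched
    return fpr, fnr


def meets_targets(trials, latencies_ms):
    """Check FPR < 2%, FNR < 5%, and p95 latency < 200 ms per frame."""
    fpr, fnr = error_rates(trials)
    return fpr < 0.02 and fnr < 0.05 and p95(latencies_ms) < 200
```

Keeping the thresholds in one function makes it straightforward to fail a CI job when a retrained model regresses on any of them.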

15.9 Notification Testing

Test	Description	Verification
Telegram delivery	Send test alert via Telegram	Message received; formatting correct; buttons functional
WhatsApp delivery	Send test alert via WhatsApp	Template message received; parameters correct
Routing rules	Trigger alert matching specific rule	Delivered to correct recipients only
Quiet hours	Send alert during quiet hours	Non-critical suppressed; critical bypasses
Escalation	Leave critical alert unacknowledged	Escalation notifications at correct thresholds
Rate limiting	Trigger burst of 50 alerts	Rate limiting applied; no provider blocks
Media attachments	Send alert with image + video	Media processed to correct size; delivered
Delivery tracking	Verify webhook receipts	Status updated correctly in dashboard
DLQ handling	Force 5 failed deliveries	Messages moved to DLQ; admin notification sent
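
The quiet-hours behavior in the table (non-critical alerts suppressed, critical alerts bypass) can be sketched as follows. The 22:00 to 06:00 window, the severity string "critical", and the function names are hypothetical; the actual window is a configuration decision not stated here.

```python
from datetime import time

# Hypothetical quiet-hours window; it wraps past midnight.
QUIET_START = time(22, 0)
QUIET_END = time(6, 0)


def in_quiet_hours(now, start=QUIET_START, end=QUIET_END):
    """Return True if the local time `now` falls inside the quiet window."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end


def should_send(severity, now, start=QUIET_START, end=QUIET_END):
    """Critical alerts always go out; others are held during quiet hours."""
    return severity == "critical" or not in_quiet_hours(now, start, end)
```

A notification test can then assert both sides of each boundary, for example that a low-severity alert at 05:59 is suppressed while the same alert at 06:00 is delivered.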

15.10 Test Environments

Environment	Data	Purpose	Pipeline Stage
Local dev	Synthetic (10 cameras, 100 persons)	Developer testing	Pre-commit
CI	Synthetic (generated per run)	Automated test execution	Every commit
Staging	Anonymized production-like (8 cameras, 500 persons)	Pre-production validation	Post-merge
Load test	Generated (64 cameras, 10,000 persons)	Performance testing	Weekly schedule
DR	Minimal (2 cameras, 10 persons)	Disaster recovery validation	Quarterly

15.11 CI/CD Pipeline for Testing

┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Push   │──▶│   Lint   │──▶│  Unit    │──▶│  SAST    │──▶│  Build   │
│         │   │ + Format │   │  Tests   │   │ + Scan   │   │  Images  │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
                                                                   │
                                                                   ▼
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│ Deploy  │◀──│   E2E    │◀──│   DAST   │◀──│  Image   │◀──│  Push    │
│Staging  │   │  Tests   │   │  Scan    │   │  Scan    │   │ Registry │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
      │
      ▼
┌─────────┐
│ Deploy  │ (Manual approval required)
│   Prod  │
└─────────┘

Stage	Tools	Coverage Gate	Duration
Lint + Format	ruff, black, mypy, ESLint, Prettier	Zero lint errors	30s
Unit Tests	pytest, Vitest	80%+ coverage	3 min
SAST + Secrets	Bandit, Semgrep, TruffleHog	No HIGH/CRITICAL	2 min
Build	Docker buildx	Build succeeds	5 min
Image Scan	Trivy, Snyk	No HIGH/CRITICAL CVEs	2 min
DAST	OWASP ZAP	No HIGH/CRITICAL findings	10 min
E2E Tests	Playwright, pytest	All scenarios pass	8 min
Deploy Staging	ArgoCD	Health checks pass	3 min

Section 16: Self-Test Framework

16.1 Framework Architecture

The Self-Test Framework is a standalone FastAPI service that continuously validates platform health and readiness through automated test execution.

┌──────────────────────────────────────────────────────────────────────────────┐
│                     SELF-TEST FRAMEWORK ARCHITECTURE                          │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                    TEST ORCHESTRATOR (FastAPI)                       │   │
│   │                                                                      │   │
│   │   Scheduler        Queue         Executor       Aggregator          │   │
│   │   (cron/APScheduler) │           (asyncio)        │                 │   │
│   │        │            │            │               │                  │   │
│   │   15m health ◄─────┼────────────┼───────────────┤                  │   │
│   │   Daily 3am  ◄─────┼────────────┼───────────────┤                  │   │
│   │   On-demand  ◄─────┼────────────┼───────────────┤                  │   │
│   │                      │            │               │                  │   │
│   │                      ▼            ▼               ▼                  │   │
│   │              ┌─────────────────────────────────────┐                 │   │
│   │              │         Reporter + Storage           │                 │   │
│   │              │  PostgreSQL + S3 (evidence)          │                 │   │
│   │              └─────────────────────────────────────┘                 │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                     21 TEST SUITES (170+ CASES)                      │   │
│   │                                                                      │   │
│   │   Infrastructure (TC-01..04)    │   Streaming (TC-05..06)           │   │
│   │   Core AI (TC-07..10)           │   Alerts (TC-11..13)              │   │
│   │   Capture (TC-14..15)           │   Search (TC-16)                  │   │
│   │   Training (TC-17)              │   Security (TC-18..19)            │   │
│   │   Resilience (TC-20..21)        │                                   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────────┘

16.2 Test Suite Catalog (21 Suites)

Suite ID	Name	Tests	Priority	Description
TC-INF-01	DVR Connectivity	8	P0	RTSP handshake, stream access, credential validation
TC-INF-02	VPN Health	6	P0	Tunnel status, latency, packet loss, throughput
TC-INF-03	Database Health	8	P0	Connection pool, query performance, replication lag
TC-INF-04	Storage Health	7	P0	Disk space, read/write performance, object storage
TC-STR-05	Camera Stream Access	10	P0	All 8 channels streaming, FPS, bitrate verification
TC-STR-06	Live Streaming	6	P1	HLS stream delivery to browsers, latency check
TC-AI-07	Human Detection	12	P0	YOLO accuracy, confidence thresholds, edge cases
TC-AI-08	Face Detection	10	P0	SCRFD accuracy, face bounding box quality
TC-AI-09	Face Recognition	12	P0	ArcFace embeddings, person matching accuracy
TC-AI-10	Unknown Clustering	8	P1	Face grouping quality, similarity thresholds
TC-ALT-11	Alert Generation	10	P0	Rule evaluation, severity assignment, routing
TC-ALT-12	Telegram Delivery	8	P1	Message delivery, formatting, media, error handling
TC-ALT-13	WhatsApp Delivery	8	P1	Template delivery, session messages, error handling
TC-CAP-14	Image Capture	6	P1	Frame extraction quality, storage, metadata
TC-CAP-15	Video Clip Capture	6	P1	Clip generation, compression, storage
TC-SEA-16	Search Retrieval	8	P1	Face search accuracy, text search, performance
TC-TRA-17	Training Workflow	8	P2	Model retraining, evaluation, deployment
TC-SEC-18	Admin Login Security	10	P0	Auth flow, MFA, session management, brute force
TC-SEC-19	RBAC Enforcement	12	P0	Permission checks, role-based access, resource-level
TC-RES-20	Restart Recovery	8	P1	Service restart, state recovery, data integrity
TC-RES-21	Load Handling	7	P1	8/16/32/64 camera simulation, throughput

Total: 21 suites, 178 test cases

16.3 Test Scheduling

Schedule	Suites	Trigger	Notification
Every 15 minutes	Infrastructure (TC-01..04)	APScheduler cron	Alert on failure
Daily at 03:00 UTC	All 21 suites	APScheduler cron	Full report via email + Slack
On-demand	Any subset	Admin API call	Immediate report
Post-deployment	Critical path (TC-01,05,07,11,18)	CI/CD webhook	Pipeline gate
Weekly (Sunday 04:00)	Full suite + extended load tests	APScheduler cron	Weekly report

16.4 Production Readiness Scoring

Base Score: 100.0

Deductions:
  P0 failure:  -20.0 points each
  P1 failure:  -10.0 points each
  P2 failure:  -5.0 points each
  P3 failure:  -2.0 points each

Minimum score: 0.0
Maximum score: 100.0

Verdict	Score Range	Meaning	Recommended Action
GO	95.0 - 100.0	All critical systems healthy	Proceed with confidence
GO WITH CAVEATS	85.0 - 94.9	Minor issues, non-critical	Proceed with monitoring plan
CONDITIONAL GO	70.0 - 84.9	Significant issues	Fix P1 issues before deployment
NO-GO	0.0 - 69.9	Critical failures	Do not deploy; address P0 issues first
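
The deduction rules and verdict bands above translate directly into code. The following is a minimal sketch that assumes the score depends only on the priority of each failed test case; the function names readiness_score and verdict are illustrative.

```python
# Points deducted per failed test case, by priority (from the rules above).
DEDUCTIONS = {"P0": 20.0, "P1": 10.0, "P2": 5.0, "P3": 2.0}

# Lower bound of each verdict band, highest first.
VERDICTS = [
    (95.0, "GO"),
    (85.0, "GO WITH CAVEATS"),
    (70.0, "CONDITIONAL GO"),
    (0.0, "NO-GO"),
]


def readiness_score(failed_priorities):
    """Start at 100 and subtract the deduction for each failed test case."""
    score = 100.0 - sum(DEDUCTIONS[p] for p in failed_priorities)
    return max(0.0, min(100.0, score))  # clamp to the 0-100 range


def verdict(score):
    """Map a readiness score to its verdict band."""
    for floor, label in VERDICTS:
        if score >= floor:
            return label
    return "NO-GO"
```

For example, a run with a single P2 failure scores 95.0 and remains GO, whereas a single P0 failure drops the score to 80.0 and the verdict to CONDITIONAL GO.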

16.5 Report Generation

Format	Use Case	Generation	Retention
JSON API	Programmatic consumption, CI/CD integration	Immediate	90 days
HTML Dashboard	Web-based viewing, trend analysis	~5 seconds	90 days
PDF Report	Email distribution, compliance archiving	~30 seconds	1 year

Section 17: Sample Self-Test Report

17.1 Report Header

================================================================================
           SENTINEL AI SURVEILLANCE PLATFORM — SELF-TEST REPORT
================================================================================

Report ID:          STR-20250116-030015
Generated:          2025-01-16 03:00:15 UTC
Environment:        production
Version:            v2.3.1
Triggered By:       Scheduled (Daily 3:00 AM)
Duration:           18 minutes 42 seconds
Overall Status:     GO WITH CAVEATS

17.2 Executive Summary

Metric	Value
Verdict	GO WITH CAVEATS
Production Readiness Score	94.8 / 100
Total Test Cases	170
Passed	168 (98.8%)
Failed	2 (1.2%)
Skipped	0 (0.0%)
Previous Run Score	97.2 / 100
Score Change	-2.4 (downward)

Priority Breakdown:

Priority	Total	Passed	Failed	Pass Rate
P0 (Critical)	42	42	0	100.0%
P1 (High)	70	68	2	97.1%
P2 (Medium)	38	38	0	100.0%
P3 (Low)	20	20	0	100.0%

17.3 System Metrics at Test Time

Metric	Value	Status
Active Cameras	8 / 8	Online
Stream FPS (avg)	28.5	Normal
AI Inference Latency (p95)	42ms	Normal
Detection Rate (last hour)	47 events	Normal
Database Connections	18 / 100	Healthy
Storage Usage	67%	Healthy
VPN Latency	12ms	Excellent
API Response Time (p95)	78ms	Normal
Telegram Delivery Rate (24h)	99.2%	Healthy
WhatsApp Delivery Rate (24h)	99.8%	Healthy

17.4 Failed Test Cases

Failure 1: TC-ALT-12-004 — Telegram Media Group Delivery

Field	Value
Test Case	TC-ALT-12-004
Suite	Telegram Delivery (TC-ALT-12)
Priority	P1
Status	FAILED
Duration	12,450 ms
Severity	Medium

Description: Verify that media group (multiple images) is delivered correctly via Telegram when an alert contains multiple evidence images.

Expected Result: All 3 images delivered as a media group album within 10 seconds.

Actual Result: Only 2 of 3 images delivered. Third image failed with error: telegram_api_error: Request Entity Too Large (413). Image size after processing: 10.8 MB (exceeds Telegram's 10 MB per-image limit for media groups).

Root Cause: The media processing pipeline resizes images to 1280x720 but does not enforce a hard 10 MB per-image cap for Telegram media groups. The iterative quality reduction loop stops at quality 50 but can still produce files > 10 MB.

Recommended Fix: Add a hard size-cap check after image processing. If an image still exceeds 10 MB once quality has been reduced to 50, apply additional compression (reduce dimensions or switch to WebP format).

Workaround: Single-image delivery mode works correctly. Multi-image alerts temporarily deliver images individually.
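
The recommended fix for this failure could look roughly like the sketch below, which lowers quality down to the existing floor of 50 and then shrinks dimensions until the image fits. It is an illustration under stated assumptions, not the pipeline's actual code. fit_under_cap and its parameters are hypothetical, the encode callable stands in for whatever image library the media pipeline uses, and the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes.

```python
from typing import Callable

# Assumption: the 10 MB per-image limit expressed in bytes.
MEDIA_GROUP_CAP_BYTES = 10 * 1024 * 1024


def fit_under_cap(
    encode: Callable[[int, float], bytes],
    cap_bytes: int = MEDIA_GROUP_CAP_BYTES,
    start_quality: int = 90,
    min_quality: int = 50,
    quality_step: int = 10,
    scale_step: float = 0.75,
    min_scale: float = 0.25,
) -> bytes:
    """Re-encode until the result fits under cap_bytes.

    encode(quality, scale) must return the encoded image bytes for the given
    quality (1-100) and dimension scale factor (1.0 = processed size).
    """
    quality, scale = start_quality, 1.0
    while True:
        data = encode(quality, scale)
        if len(data) <= cap_bytes:
            return data
        if quality > min_quality:
            # Phase 1: reduce quality, never below the floor.
            quality = max(min_quality, quality - quality_step)
        elif scale * scale_step >= min_scale:
            # Phase 2: quality floor reached, so shrink the dimensions instead.
            scale *= scale_step
        else:
            raise ValueError(f"image cannot be compressed under {cap_bytes} bytes")
```

Raising instead of returning an oversized file keeps the cap hard: the caller can then fall back to single-image delivery, which the workaround above reports as working.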

Failure 2: TC-RES-20-006 — AI Inference Recovery After Simulated Crash

Field	Value
Test Case	TC-RES-20-006
Suite	Restart Recovery (TC-RES-20)
Priority	P1
Status	FAILED
Duration	65,200 ms
Severity	Medium

Description: Verify that the AI inference service recovers and resumes processing within 60 seconds after a simulated process crash.

Expected Result: AI inference pod restarts and resumes processing frames within 60 seconds for all 8 cameras.

Actual Result: Pod restarted successfully (18 seconds), but detection did not resume for Camera 3 and Camera 7. The other 6 cameras resumed within 45 seconds.

Root Cause: Model warm-up failed because of a race condition in GPU memory allocation during concurrent channel initialization. All 8 channel processors attempt to load the face recognition model simultaneously; on resource-constrained edge hardware, the channels that lose the initialization race run out of memory (OOM).

Recommended Fix: Implement shared model loading — load each model once and share across all channel processors. Add initialization semaphore.

Workaround: Manual restart of affected channel processors via admin API.
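
The recommended fix (load each model once, share it across channel processors, and gate initialization with a semaphore) can be sketched as follows. This is a minimal illustration using Python threads. SharedModelRegistry and its methods are hypothetical names, and the loader callable stands in for the real model-loading code, which is not shown in this report.

```python
import threading
from typing import Any, Callable, Dict


class SharedModelRegistry:
    """Loads each named model at most once and shares it between callers."""

    def __init__(self, max_concurrent_loads: int = 1) -> None:
        self._models: Dict[str, Any] = {}
        self._name_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()
        # Initialization semaphore: bounds how many models load at the same time.
        self._init_gate = threading.Semaphore(max_concurrent_loads)

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        with self._registry_lock:
            if name in self._models:
                return self._models[name]
            name_lock = self._name_locks.setdefault(name, threading.Lock())
        with name_lock:  # only one caller may load a given model
            with self._registry_lock:
                if name in self._models:  # loaded while we were waiting
                    return self._models[name]
            with self._init_gate:
                model = loader()
            with self._registry_lock:
                self._models[name] = model
            return model
```

Each channel processor would call registry.get("face_recognition", load_fn) during start-up: the first caller performs the load, and the other seven block briefly and then receive the same object, so the model is allocated only once.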

17.5 Trending (Last 15 Days)

Date	Score	Verdict	Notes
2025-01-02	96.5	GO	—
2025-01-03	98.2	GO	—
2025-01-04	97.1	GO	—
2025-01-05	98.8	GO	—
2025-01-06	97.5	GO	—
2025-01-07	98.2	GO	—
2025-01-08	96.8	GO	TC-RES-21 had 1 P3 failure
2025-01-09	97.2	GO	—
2025-01-10	98.2	GO	—
2025-01-11	97.5	GO	—
2025-01-12	98.2	GO	—
2025-01-13	97.2	GO	—
2025-01-14	98.2	GO	—
2025-01-15	97.2	GO	—
2025-01-16	94.8	GO WITH CAVEATS	2 P1 failures (see above)

17.6 Conclusion and Recommendations

Verdict: GO WITH CAVEATS

The Sentinel AI Surveillance Platform is operational and safe to use. All 42 P0 (Critical) test cases passed, confirming that core surveillance functions are working correctly.

Two P1 (High) priority issues were identified with documented workarounds. Both fixes are scheduled for v2.3.2.

Recommended Actions:

- Address TC-ALT-12-004: Add aggressive compression for Telegram media group images
- Address TC-RES-20-006: Implement shared model loading in AI inference service
- Monitor Telegram multi-image alert delivery metrics (workaround active)
- Monitor AI inference recovery metrics (manual restart documented in runbook)
- Validate both fixes in next daily test run after v2.3.2 deployment

Section 18: Risks and Mitigations

18.1 Risk Register Summary

#	Category	Risk	Likelihood	Impact	Score	Mitigation	Owner
T1	Technical	DVR disk full (0 bytes free)	High	Critical	20	Auto-rotation at 85%; emergency cleanup; secondary storage	Platform
T2	Technical	AI false positives in low light	Medium	High	12	Night models; adjustable thresholds; operator review	AI Team
T3	Technical	Face rec accuracy with masks/angles	Medium	Medium	9	Multi-angle training; pose normalization	AI Team
T4	Technical	VPN tunnel instability	Medium	High	12	Auto-reconnect; local buffering; redundant endpoints	Platform
T5	Technical	DB performance at scale	Medium	Medium	9	Partitioning; read replicas; archiving	Platform
O1	Operational	Edge hardware failure	Medium	Critical	15	Cold spare; config backup; documented replacement	Operations
O2	Operational	Internet loss at edge site	Medium	High	12	Local storage buffer; 4G failover; local AI continues	Operations
O3	Operational	Operator training gaps	Medium	Medium	9	Training program; inline help; escalation procedures	Operations
O4	Operational	Alert fatigue	Medium	High	12	Escalation rules; alert grouping; severity routing	Operations
S1	Security	Biometric data breach	Low	Critical	10	AES-256-GCM; signed URLs; GDPR deletion; audit	Security
S2	Security	Unauthorized feed access	Low	Critical	10	RBAC; JWT; MFA; session binding; rate limiting	Security
S3	Security	Bot token compromise	Low	High	8	Vault encryption; 180-day rotation; IP allowlist	Security
A1	AI/ML	Model drift over time	Medium	High	12	Monthly evaluation; auto-monitoring; retraining	AI Team
A2	AI/ML	Training data poisoning	Low	Critical	10	Validation; multi-person review; audit trail	AI Team
A3	AI/ML	Demographic bias	Medium	High	12	Diverse data; fairness audits; human-in-loop	AI Team
A4	AI/ML	Edge hardware insufficient	Medium	High	12	CPU models; cloud offloading; GPU upgrade path	AI Team
I1	Integration	DVR firmware incompatibility	Medium	High	12	RTSP compliance check; firmware validation	Engineering
C1	Compliance	GDPR non-compliance	Low	Critical	10	PIA; consent mgmt; right to deletion; DPO	DPO
R1	Resource	Budget overrun	Medium	Medium	9	Reserved instances; cost monitoring; quotas	Finance
R3	Resource	Timeline delay	Medium	High	12	Phased delivery; parallel work; weekly tracking	PMO

18.2 Critical Risks Requiring Immediate Action

T1 — DVR Disk Full (Score: 20)
- Action: Emergency disk cleanup within 24 hours
- Implement automatic rotation at 85% capacity
- Configure critical alerts at 90%, 95%, 98%
- Owner: Platform Team | Due: 2025-01-17

O1 — Edge Hardware Failure (Score: 15)
- Action: Procure cold spare device
- Document hardware replacement runbook
- Automate configuration restoration from GitOps
- Owner: Operations Team | Due: 2025-02-01
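
The scores in the 18.1 register are consistent with a simple likelihood x impact product. The sketch below makes that explicit. The numeric scale (Low = 2, Medium = 3, High = 4, Critical = 5) is inferred from the table rather than stated in it, and the threshold of 15 for immediate action is an assumption that happens to match the two risks singled out in 18.2. All function and constant names are hypothetical.

```python
from typing import Dict, List, Tuple

# Inferred scale: reproduces every score in the 18.1 table, but is not
# stated anywhere in the register itself.
LEVELS = {"Low": 2, "Medium": 3, "High": 4, "Critical": 5}

# Assumption: risks scoring 15 or more need immediate action (T1 and O1 do).
IMMEDIATE_ACTION_THRESHOLD = 15


def risk_score(likelihood: str, impact: str) -> int:
    """Score a risk as the product of its likelihood and impact levels."""
    return LEVELS[likelihood] * LEVELS[impact]


def needs_immediate_action(register: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return risk IDs at or above the threshold, highest score first."""
    scored = {rid: risk_score(lik, imp) for rid, (lik, imp) in register.items()}
    flagged = [rid for rid, s in scored.items() if s >= IMMEDIATE_ACTION_THRESHOLD]
    return sorted(flagged, key=lambda rid: scored[rid], reverse=True)


# A few rows copied from the register: ID -> (likelihood, impact).
REGISTER = {
    "T1": ("High", "Critical"),
    "T2": ("Medium", "High"),
    "O1": ("Medium", "Critical"),
    "S1": ("Low", "Critical"),
    "S3": ("Low", "High"),
}
```

With these rows, needs_immediate_action(REGISTER) returns ["T1", "O1"], matching the two entries listed in 18.2.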

Section 19: Final Implementation Roadmap

19.1 Five-Phase Implementation (20 Weeks)

Phase	Weeks	Name	Theme	Key Milestone
1	1-4	Foundation	Infrastructure, VPN, edge, database	M1: Infrastructure Ready
2	5-8	Core AI Pipeline	Video ingestion, detection, recognition	M2: AI Pipeline Operational
3	9-12	Application Layer	Dashboard, alerts, notifications	M3: Application Live
4	13-16	Intelligence	Night mode, training, self-learning	M4: Intelligence Features
5	17-20	Hardening	Security, testing, operations, go-live	M5: Production Go-Live

19.2 Key Milestones and Deliverables

Milestone	Target Week	Deliverables	Entry Criteria	Exit Criteria
M1 Infrastructure	Week 4	Cloud services, VPN, edge gateway, database, monitoring	Project kickoff, hardware delivered	All services healthy, VPN stable, schema deployed
M2 AI Pipeline	Week 8	Video capture, YOLO, SCRFD, ArcFace, detection DB, API	M1 complete, models ready	All 8 streams ingesting, AI accuracy targets met, API functional
M3 Application	Week 12	Dashboard, alerts, Telegram, WhatsApp, person mgmt, WebSocket	M2 complete, frontend env ready	Dashboard live, alerts delivered, person management working
M4 Intelligence	Week 16	Night mode, training pipeline, self-learning, privacy, search	M3 complete, training data accumulated	All intelligence features operational
M5 Go-Live	Week 20	Security audit, test framework, DR, runbooks, load test, checklist	M4 complete, security audit scheduled	All audits passed, checklist complete, 72h stability

19.3 Phase Details

Phase 1 (Weeks 1-4): VPC, EKS, RDS, Redis, Kafka, S3, WireGuard VPN, edge gateway OS hardening, Docker setup, database schema with migrations, monitoring stack (Prometheus, Grafana).

Phase 2 (Weeks 5-8): RTSP capture service, YOLO human detection, SCRFD face detection, ArcFace face recognition, embedding storage with pgvector, person matching logic, FastAPI core, authentication.

Phase 3 (Weeks 9-12): Next.js frontend, design system, dashboard, live camera view (HLS), alert engine with rules, Telegram Bot API integration, WhatsApp Business API integration, person gallery and profile, unknown review queue, watchlists, WebSocket real-time updates.

Phase 4 (Weeks 13-16): Night mode AI model, AI Vibe Settings page, training pipeline with model versioning, self-learning service for unknown clusters, privacy mode with face blurring, suspicious activity detection, search service (face + text), system health dashboard.

Phase 5 (Weeks 17-20): Penetration testing, SAST/DAST, self-test framework (21 suites), backup/DR setup, incident response runbooks, load testing (8-64 cameras), performance tuning, operations training, go-live checklist (98 items), production cutover, 72-hour stability monitoring.

19.4 Resource Allocation

Phase	Engineering	AI/ML	DevOps	QA	Security
1: Foundation	2	—	2	—	1
2: Core AI	2	2	1	1	—
3: Application	3	1	1	2	—
4: Intelligence	2	2	1	1	—
5: Hardening	2	1	2	2	2

Section 20: Final Production-Readiness Summary

20.1 System at a Glance

Category	Specification
Architecture	Cloud (AWS EKS) + Edge (Intel NUC) + VPN (WireGuard)
Services	12 containerized microservices
Security Zones	5 (Public, App Private, Database, Edge LAN, Camera LAN)
AI Pipeline	YOLO11m (human detection) + SCRFD (face detection) + ArcFace (recognition)
Embeddings	512-dimensional face vectors stored in pgvector
Database	PostgreSQL 15, 29 tables, partitioned, AES-256-GCM encrypted
Web Application	18 pages, dark mode, Next.js 14, real-time WebSocket
Notifications	Telegram Bot API + WhatsApp Business API (dual channel)
Security	TLS 1.3, Argon2id, JWT ES256, TOTP MFA, RBAC (4 roles, 30+ permissions)
Testing	21 test suites, 170+ test cases, automated readiness scoring
Reliability	99.9% uptime target, RTO 1 hour, RPO 15 minutes
Timeline	20 weeks (5 months) to production
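
To put the 99.9% uptime target in concrete terms, the arithmetic below converts it into an allowed-downtime budget. This is a worked illustration only. The function name is hypothetical, and the 30-day and 365-day measurement windows are assumptions, since the table does not say over which period uptime is measured.

```python
def downtime_budget_minutes(uptime_target: float, window_days: int) -> float:
    """Allowed downtime in minutes for an uptime target over a window of days."""
    if not 0.0 < uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return round((1.0 - uptime_target) * window_minutes, 1)


monthly = downtime_budget_minutes(0.999, 30)    # 43.2 minutes
yearly = downtime_budget_minutes(0.999, 365)    # 525.6 minutes (8.76 hours)
```

If the target were measured over a 30-day window, a single outage that used the full 1-hour RTO would exceed the 43.2-minute budget, so the choice of measurement window matters when the two figures are read together.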

20.2 Readiness Checklist Summary

Category	Items	Status
Infrastructure	14	Ready to implement
Security	18	Ready to implement
AI/ML Pipeline	15	Ready to implement
Application	16	Ready to implement
Operations	15	Ready to implement
Data & Privacy	10	Ready to implement
Documentation	10	Ready to implement
Total	98	Ready to implement

20.3 Estimated Timeline

Milestone	Target	Duration
M1: Infrastructure Ready	Week 4	4 weeks
M2: AI Pipeline Operational	Week 8	4 weeks
M3: Application Live	Week 12	4 weeks
M4: Intelligence Features	Week 16	4 weeks
M5: Production Go-Live	Week 20	4 weeks
Total to Production	20 weeks	~5 months

Appendices

Appendix A: Cross-Reference to Specialist Documents

Document	Path	Content
Notification System	`/mnt/agents/output/notification_system.md`	Telegram, WhatsApp, routing rules, templates, retry logic
Security Architecture	`/mnt/agents/output/security_architecture.md`	SSL/TLS, auth, RBAC, VPN, secrets, audit, GDPR, checklist
Web UX Design	`/mnt/agents/output/web_ux_design.md`	Design system, 18 pages, navigation, user flows, AI vibe settings
Self-Test Framework	`/mnt/agents/output/self_test_framework.md`	Framework architecture, 21 suites, scheduling, sample report
Operations Plan	`/mnt/agents/output/operations_plan.md`	Monitoring, logging, backup, DR, incident response, runbooks
Architecture	`/mnt/agents/output/architecture.md`	System architecture, data flow, scaling strategy, cost estimates

Appendix B: Acronyms

Acronym	Full Form
AI	Artificial Intelligence
ALB	Application Load Balancer
API	Application Programming Interface
ArcFace	Additive Angular Margin Loss for Deep Face Recognition
CORS	Cross-Origin Resource Sharing
CSP	Content Security Policy
CSRF	Cross-Site Request Forgery
DLQ	Dead Letter Queue
DVR	Digital Video Recorder
EKS	Elastic Kubernetes Service
ES256	ECDSA using P-256 and SHA-256
FFmpeg	Fast Forward MPEG (multimedia framework)
FPS	Frames Per Second
GDPR	General Data Protection Regulation
GPU	Graphics Processing Unit
HLS	HTTP Live Streaming
HPA	Horizontal Pod Autoscaler
HSTS	HTTP Strict Transport Security
JWT	JSON Web Token
LUKS	Linux Unified Key Setup
mAP	mean Average Precision
MFA	Multi-Factor Authentication
mTLS	Mutual TLS
NMS	Non-Maximum Suppression
NUC	Next Unit of Computing
OCSP	Online Certificate Status Protocol
PII	Personally Identifiable Information
PSK	Pre-Shared Key
RBAC	Role-Based Access Control
RDS	Relational Database Service
RPO	Recovery Point Objective
RTO	Recovery Time Objective
RTSP	Real Time Streaming Protocol
S3	Simple Storage Service
SAST	Static Application Security Testing
SCRFD	Sample and Computation Redistribution for Efficient Face Detection
SLA	Service Level Agreement
SQL	Structured Query Language
SSL	Secure Sockets Layer
TLS	Transport Layer Security
TOTP	Time-based One-Time Password
TPM	Trusted Platform Module
UAT	User Acceptance Testing
VPC	Virtual Private Cloud
VPN	Virtual Private Network
WAF	Web Application Firewall
WORM	Write Once Read Many
XSS	Cross-Site Scripting
YOLO	You Only Look Once

End of Document

Document Version: 1.0 | Classification: Confidential - Internal Use Only | Next Review: 2025-04-16 | Owner: Sentinel AI Architecture Team