Unified Blueprint

Master document with the full surveillance platform design.

AI-Powered Industrial Surveillance Platform

Unified Technical Blueprint — Part A: Sections 1-10

Document Property | Value
Version | 1.0.0
Classification | Technical Blueprint - Production Design
Target DVR | CP PLUS ORANGE CP-UVR-0801E1-CV2
Channels | 8 active (scalable to 64+)
Resolution | 960 x 1080 per channel
DVR Network | 192.168.29.200/24, RTSP port 554
Date | 2025

Cross-Reference Guide: This unified blueprint synthesizes six specialist design documents. For detailed specifications on any subsystem, refer to:

  • architecture.md — Full architecture, scaling, failover, cost estimation
  • video_ingestion.md — RTSP configuration, FFmpeg commands, edge gateway specs
  • ai_vision.md — Model configurations, inference code, benchmarks
  • database_schema.md — Complete DDL, triggers, views, RLS policies
  • suspicious_activity.md — Detection algorithms, scoring engine pseudocode
  • training_system.md — Training pipelines, quality gates, versioning logic


Section 1: Executive Summary

1.1 Project Objective

This blueprint defines the complete technical design for an AI-powered industrial surveillance platform that turns a legacy CP PLUS 8-channel DVR system into an intelligent security operations center. The platform processes real-time video from 8 camera channels, applies person detection, tracking, and face recognition, detects suspicious activity during night hours (22:00-06:00), and presents the results in a unified dashboard for security operators. Reliability, security, and data privacy are treated as design constraints throughout.

The system is designed around a cloud+edge hybrid architecture where all compute-intensive AI inference runs in the cloud (AWS Mumbai), while a local edge gateway handles stream ingestion, buffering, and site-local concerns. A WireGuard VPN tunnel protects all communication between edge and cloud, ensuring the DVR has zero public internet exposure.

1.2 Key Capabilities

Capability | Description | Technology
Human Detection | Real-time person detection across all 8 channels at 15-20 FPS | YOLO11m + TensorRT FP16, 640x640
Face Detection | Accurate face localization with 5-point landmarks for alignment | SCRFD-500M-BNKPS, 640x640
Face Recognition | 512-D embedding extraction with 99.83% LFW accuracy | ArcFace R100 IR-SE100 (MS1MV3)
Person Tracking | Persistent identity tracking across frames with occlusion recovery | ByteTrack (Kalman + IoU), 80.3% MOTA
Unknown Clustering | Automatic grouping of unknown faces for operator review | HDBSCAN + DBSCAN fallback, 89.5% purity
Night Mode Surveillance | 10-module suspicious activity analysis (22:00-06:00) | Composite scoring engine with time-decay
AI Vibe Controls | Three intuitive presets (Relaxed/Balanced/Strict) mapping to 4 confidence levels | Dynamic threshold adjustment
Safe Self-Learning | Three-mode training system with conflict detection and approval workflows | MLflow + Airflow + Manual Review
24/7 Reliability | Graceful degradation: video never stops, AI catch-up on recovery | Tiered storage + circuit breakers + replay
Real-Time Alerts | 6-level escalation (NONE to EMERGENCY) with multi-channel notifications | Telegram, WhatsApp, Email, Webhook
Live Dashboard | Multi-camera grid with HLS streaming and single-camera low-latency WebRTC | Next.js 14 + HLS.js + WebRTC

1.3 Architecture Approach

The platform follows a cloud+edge+VPN hybrid pattern with five network security zones:

Cameras (8ch) --> DVR (local) --> Edge Gateway (local) --> WireGuard VPN --> AWS Cloud (EKS)
                                      |                        |
                                      | 2TB NVMe buffer         | Encrypted tunnel
                                      | 7-day ring buffer       | UDP 51820
                                      | FFmpeg ingestion        | ChaCha20-Poly1305
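
The buffer figures in the diagram (2TB NVMe, 7-day ring buffer) imply a ceiling on the average recording bitrate. The arithmetic below is a rough sketch, not a sizing guarantee: it assumes decimal units (1 TB = 10**12 bytes), that the whole disk is available to the buffer, and an even split across the 8 channels.

```python
# Sketch only: the 2 TB / 7-day figures come from the diagram above; decimal
# units and an even per-channel split are assumptions made for illustration.

SECONDS_PER_DAY = 86_400


def max_sustained_mbps(buffer_bytes: int, retention_days: float) -> float:
    """Highest total average bitrate (Mbit/s) that still fits the retention window."""
    total_bits = buffer_bytes * 8
    return total_bits / (retention_days * SECONDS_PER_DAY) / 1_000_000


total_mbps = max_sustained_mbps(2 * 10**12, 7)
per_channel_mbps = total_mbps / 8  # 8 active channels

print(f"total: {total_mbps:.1f} Mbit/s, per channel: {per_channel_mbps:.2f} Mbit/s")
```

Under these assumptions the buffer sustains roughly 26.5 Mbit/s in total, or about 3.3 Mbit/s per channel; recording at a higher average bitrate would shorten the retention window below 7 days.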

Key architectural decisions:

Decision | Choice | Rationale
Cloud Provider | AWS ap-south-1 (Mumbai) | Lowest latency to India, mature managed services
Container Orchestration | Amazon EKS + K3s edge | Managed control plane, GPU node support, lightweight edge
VPN | WireGuard | ~60% faster than OpenVPN, modern crypto, simple setup
Message Queue | Apache Kafka (MSK) | Durable ordered log, replay capability, proven at scale
AI Inference | NVIDIA Triton + TensorRT | GPU-optimized, dynamic batching, model ensemble
Database | PostgreSQL 16 + pgvector | ACID compliance, native 512-D vector support
Object Storage | MinIO (edge+cloud) + S3 (archive) | S3-compatible API, tiered cost optimization

1.4 Target Environment

The platform targets a CP PLUS ORANGE CP-UVR-0801E1-CV2 DVR with the following characteristics:

Property | Value | Impact on Design
Brand/Model | CP PLUS ORANGE CP-UVR-0801E1-CV2 | Dahua-compatible RTSP URL scheme
Channels | 8 active | Initial deployment scope
Resolution | 960 x 1080 per channel | AI input: letterbox to 640x640
LAN IP | 192.168.29.200/24 | Edge gateway on same subnet
RTSP Port | 554 | TCP interleaved mandatory
ONVIF | V2.6.1.867657 (Server V19.06) | Auto-discovery supported
DVR Disk | FULL (0 bytes free) | All archival is edge-managed; no DVR recording
VPN Access | WireGuard-secured | No public exposure; all traffic encrypted

Critical Design Impact: The DVR disk being full means the system cannot rely on DVR-side recording or playback features. All archival storage is managed by the edge gateway's 2TB NVMe buffer and cloud tiering.
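
The table above states that each 960 x 1080 frame is letterboxed to the 640x640 model input. The sketch below works out the resulting resize and padding, and how a detection coordinate maps back to the source frame. The function names and the centered-padding convention are illustrative assumptions, not taken from the specialist documents.

```python
def letterbox_geometry(src_w: int, src_h: int, dst: int = 640) -> dict:
    """Aspect-preserving resize into a dst x dst square, padding split evenly."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = dst - new_w, dst - new_h
    left, top = pad_w // 2, pad_h // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        # Order: left, top, right, bottom (pixels of padding on each side).
        "padding": (left, top, pad_w - left, pad_h - top),
    }


def to_source_xy(x: float, y: float, geom: dict) -> tuple:
    """Map a point in the 640x640 model input back to source-frame pixels."""
    left, top, _, _ = geom["padding"]
    return (x - left) / geom["scale"], (y - top) / geom["scale"]


geom = letterbox_geometry(960, 1080)
print(geom["resized"], geom["padding"])
```

For a 960 x 1080 source the frame is height-limited: it is resized to 569 x 640 and padded with 35 and 36 pixels on the left and right. Detection boxes must be shifted and rescaled with the same geometry before they are stored against the original frame.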

1.5 Key Differentiators

1. AI Vibe Controls: Instead of exposing complex threshold parameters to operators, the system provides three intuitive "vibe" presets (Relaxed, Balanced, and Strict) that internally map to optimized configurations for detection sensitivity and face match strictness. This makes the system accessible to non-technical security staff while maintaining AI precision.

2. Safe Self-Learning Training System: The platform captures operator corrections (confirmations, corrections, merges, rejections) and feeds them back into model improvement through a carefully designed three-mode learning pipeline: Manual Only, Suggested Learning (recommended), and Approved Auto-Update. A synchronous conflict detector blocks five types of label conflicts before they reach the training dataset, ensuring model integrity.

3. 24/7 Reliability with Graceful Degradation: The system is architected around a single priority: video recording never stops. If the AI inference service fails, recording continues locally with queued catch-up processing on recovery. If the VPN tunnel fails, the edge gateway maintains 7 days of local buffer. If the cloud database fails, alerts accumulate in Kafka's durable log. Every failure mode has a defined degradation strategy.

4. 10-Module Night Surveillance: The suspicious activity detection system goes beyond simple motion detection to provide comprehensive behavioral analysis through 10 specialized detection modules, from intrusion and loitering to abandoned objects and repeated re-entry patterns, all combined through a composite scoring engine with exponential time-decay.
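
As a rough illustration of the composite scoring idea in item 4, the sketch below combines per-module signals with an exponential time-decay and maps the result onto the six escalation levels (NONE to EMERGENCY). The weights, half-life, level thresholds, and the Signal type are hypothetical placeholders; the actual formula is specified in suspicious_activity.md.

```python
import math
from dataclasses import dataclass

LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL", "EMERGENCY"]
# Hypothetical lower bounds for each level; not values from the design documents.
LEVEL_BOUNDS = [0.0, 0.2, 0.4, 0.6, 0.8, 0.95]


@dataclass
class Signal:
    module: str      # e.g. "intrusion", "loitering"
    severity: float  # module output in [0, 1]
    weight: float    # hypothetical per-module weight
    age_s: float     # seconds since the signal was raised


def composite_score(signals, half_life_s: float = 60.0) -> float:
    """Weighted sum of severities, each decayed by exp(-lambda * age), capped at 1."""
    lam = math.log(2) / half_life_s
    raw = sum(s.weight * s.severity * math.exp(-lam * s.age_s) for s in signals)
    return min(raw, 1.0)


def escalation_level(score: float) -> str:
    """Return the highest level whose lower bound the score reaches."""
    level = LEVELS[0]
    for name, bound in zip(LEVELS, LEVEL_BOUNDS):
        if score >= bound:
            level = name
    return level


signals = [
    Signal("intrusion", severity=0.9, weight=0.5, age_s=0.0),
    Signal("loitering", severity=0.6, weight=0.3, age_s=60.0),  # one half-life old
]
score = composite_score(signals)
print(round(score, 2), escalation_level(score))
```

With these placeholder numbers the fresh intrusion signal contributes 0.45 and the one-half-life-old loitering signal contributes 0.09, giving 0.54 (MEDIUM). The decay means stale signals fade out instead of holding an alert at a high level indefinitely.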

1.6 Production Readiness Assessment

Dimension | Status | Notes
Architecture Completeness | Production-Ready | All 12 services fully specified with resource allocations
AI Model Selection | Production-Ready | Industry-standard models with published benchmarks
Database Design | Production-Ready | 29 tables, 4 views, 8 triggers, partitioning, RLS
Security Architecture | Production-Ready | 7-layer defense in depth, encrypted credentials, VPN-only
Scaling Path | Defined | 8 -> 16 -> 32 -> 64+ cameras with concrete resource allocations
Failover Design | Production-Ready | Graceful degradation matrix for all failure modes
Estimated Timeline | 14 weeks | 4 implementation phases defined
Estimated Monthly Cost | ~$2,140 USD | 8-camera deployment at steady state

Section 2: Kimi Swarm Team and Agent Responsibilities

The unified blueprint was synthesized from the outputs of 11 specialist agents, each responsible for a specific domain of the platform design.

2.1 Agent Responsibility Matrix

# | Agent | Responsibility | Key Deliverables
1 | Requirements Analyst | Elicited and structured all functional/non-functional requirements | Requirements traceability matrix, user stories, acceptance criteria
2 | System Architect | Designed overall cloud+edge+VPN topology and service interactions | Deployment topology, 5 security zones, scaling roadmap, failover matrix
3 | Video Ingestion Engineer | Specified RTSP configuration, edge gateway, and stream processing | RTSP URL patterns, FFmpeg commands, auto-reconnect logic, HLS generation
4 | AI Vision Scientist | Selected and configured all CV/AI models for the inference pipeline | Model selection table, inference pipeline architecture, confidence handling
5 | Database Architect | Designed complete data model with partitioning, indexing, and security | 29 tables + 4 views + 8 triggers, pgvector HNSW index, RLS policies
6 | Suspicious Activity Designer | Designed 10 detection modules and composite scoring engine | Detection algorithms, scoring formula, YAML configuration schema
7 | Training System Engineer | Designed self-learning pipeline with safety controls | 3 learning modes, conflict detection, quality gates, versioning
8 | Frontend Developer | Designed Next.js dashboard with real-time video and alerts | Component architecture, HLS.js integration, WebSocket alerts
9 | DevOps Engineer | Specified CI/CD, monitoring, and infrastructure-as-code | GitHub Actions + ArgoCD, Prometheus/Grafana, alerting rules
10 | Security Architect | Designed defense-in-depth security across all layers | 7 security layers, secret management, encryption standards
11 | Technical Writer (this document) | Synthesized all specialist outputs into unified blueprint | 10-section unified document with cross-references

2.2 Agent Interaction Flow

+-----------------------------------------------------------------------------+
|                        KIMI SWARM TEAM ORCHESTRATION                        |
+-----------------------------------------------------------------------------+
|                                                                             |
|   Requirements Analyst                                                      |
|        |                                                                    |
|        v                                                                    |
|   +---------+    +---------+    +---------+    +---------+                  |
|   | System  |<-->| Video   |<-->| AI      |<-->| Database|                  |
|   |Architect|    |Ingestion|    |Vision   |    |Architect|                  |
|   +---------+    +---------+    +---------+    +---------+                  |
|        ^                                            |                       |
|        |           +----------+    +---------+      |                       |
|        +---------->|Suspicious|<-->|Training |<-----+                       |
|                    |Activity  |    |System   |                              |
|                    |Designer  |    |Engineer |                              |
|                    +----------+    +---------+                              |
|                        |                                                    |
|                        v                                                    |
|                   +---------+    +---------+    +---------+                 |
|                   |Frontend |    |DevOps   |    |Security |                 |
|                   |Developer|    |Engineer |    |Architect|                 |
|                   +---------+    +---------+    +---------+                 |
|                        |                                                    |
|                        v                                                    |
|                   +---------------------+                                   |
|                   | Technical Writer    |                                   |
|                   | (Unified Blueprint) |                                   |
|                   +---------------------+                                   |
|                                                                             |
+-----------------------------------------------------------------------------+

2.3 Cross-Agent Design Consistency

The following cross-cutting concerns were harmonized across all agent outputs during synthesis:

Concern | Resolution | Agents Coordinated
Video latency budget | < 100ms end-to-end (AI); ~35-65s (HLS live) | Video Ingestion, AI Vision, Frontend
Face embedding storage | 512-D float32, pgvector HNSW index, cosine similarity | Database, AI Vision, Training
Event data retention | 90 days hot (MinIO), 1 year cold (Glacier), 7 days edge | Database, Architecture, Video Ingestion
Alert escalation | 6 levels: NONE -> LOW -> MEDIUM -> HIGH -> CRITICAL -> EMERGENCY | Suspicious Activity, Database, Frontend
Model versioning | Semantic MAJOR.MINOR.PATCH with MLflow registry | Training, AI Vision, Architecture
Graceful degradation | Video never stops; AI catch-up on recovery | Architecture, Video Ingestion, AI Vision
Security zones | 5 zones: Internet -> ALB -> Application -> Data -> Edge | Architecture, Security, Video Ingestion
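
The face embedding row above fixes the comparison metric as cosine similarity over 512-D vectors. The sketch below shows that comparison in plain Python for clarity; in the platform itself the search would run inside pgvector against the HNSW index. The gallery structure, the best_match helper, and the 0.45 threshold are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, gallery: dict, threshold: float = 0.45):
    """Return (name, similarity) of the closest enrolled embedding, or (None, sim).

    The threshold is a placeholder; real values depend on the chosen vibe preset.
    Note that pgvector's cosine operator returns a distance, i.e. 1 - similarity.
    """
    name, sim = None, -1.0
    for candidate, embedding in gallery.items():
        s = cosine_similarity(query, embedding)
        if s > sim:
            name, sim = candidate, s
    return (name, sim) if sim >= threshold else (None, sim)


# Toy 512-D vectors: two orthogonal unit vectors stand in for real embeddings.
alice = [1.0] + [0.0] * 511
bob = [0.0, 1.0] + [0.0] * 510
print(best_match(alice, {"alice": alice, "bob": bob}))
```

A query that falls below the threshold for every enrolled person returns no name, which is the case that would feed the unknown-face clustering described in Section 1.2.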

Section 3: Assumptions

All assumptions made across the specialist designs are consolidated below. These should be validated before implementation begins.

3.1 Network and Hardware Assumptions

ID | Assumption | Validation Method | Risk if Invalid
NW-01 | Edge gateway has dual Ethernet: one for local DVR subnet (192.168.29.0/24), one for internet/VPN | Physical site survey | Cannot bridge DVR to VPN
NW-02 | Site internet bandwidth >= 16 Mbps sustained upload for 8 channels | ISP speed test | Video drops, AI delays
NW-03 | WireGuard UDP port 51820 is not blocked by site firewall | Firewall rule check | VPN cannot establish
NW-04 | DVR RTSP server supports TCP interleaved transport (rtsp_transport tcp) | FFmpeg test probe | UDP fallback has packet loss
NW-05 | DVR supports 16+ concurrent RTSP sessions (8 channels x 2 streams) | Session stress test | Stream contention
NW-06 | MTU 1400 is viable through site NAT/firewall for WireGuard tunnel | Ping with DF bit test | Fragmentation issues
HW-01 | Intel NUC 13 Pro (i5-1340P, 16GB RAM, 512GB NVMe) is available for edge gateway | Hardware procurement | May need Jetson Orin alternative
HW-02 | Edge gateway has UPS backup for graceful shutdown on power loss | Electrical survey | Data corruption on hard power-off
HW-03 | AWS g4dn.xlarge (T4 GPU) instances are available in ap-south-1 | AWS EC2 capacity check | Need alternative GPU instance
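
For NW-06, the "ping with DF bit" check needs a payload size derived from the MTU under test. The sketch below does that arithmetic and builds a Linux iputils ping command; it assumes an IPv4 underlay and that the cloud peer address 10.200.0.1 from the architecture diagram answers ICMP, neither of which is confirmed by the source documents.

```python
IPV4_HEADER = 20
ICMP_HEADER = 8
UDP_HEADER = 8
# WireGuard data message: 16-byte header (type, receiver index, counter)
# plus a 16-byte Poly1305 authentication tag.
WIREGUARD_OVERHEAD = 32


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def outer_packet_size(tunnel_mtu: int) -> int:
    """Size on the wire (IPv4 underlay) of a full-MTU packet sent through the tunnel."""
    return tunnel_mtu + IPV4_HEADER + UDP_HEADER + WIREGUARD_OVERHEAD


def ping_command(host: str, mtu: int, count: int = 4) -> list:
    """Linux iputils ping with DF set (-M do) and a payload sized for the MTU."""
    return ["ping", "-M", "do", "-c", str(count), "-s", str(icmp_payload_for_mtu(mtu)), host]


print(" ".join(ping_command("10.200.0.1", 1400)))
print("outer packet size:", outer_packet_size(1400))
```

Run through the tunnel, a 1372-byte payload exercises the full 1400-byte tunnel MTU. On the underlay, the site path must carry 1460-byte packets without fragmentation for that MTU to hold.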

3.2 DVR Capabilities Assumptions

ID | Assumption | Validation Method | Risk if Invalid
DVR-01 | DVR RTSP streams are accessible at rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | FFmpeg connectivity test | Need alternative URL format
DVR-02 | DVR continues serving RTSP streams even with disk full (0 bytes free) | 24-hour stream stability test | Streams may stall
DVR-03 | DVR sub-stream (subtype=1) provides sufficient quality for AI inference (typically 352x288 to 704x576) | Frame quality inspection | May need main stream for AI
DVR-04 | DVR ONVIF server supports device discovery and stream URI retrieval | ONVIF Device Manager test | Manual camera configuration needed
DVR-05 | DVR channel numbering is 1-indexed (1-8) | ONVIF profile enumeration | Off-by-one errors in configuration
DVR-06 | DVR Digest authentication works with the provided credentials | RTSP DESCRIBE request test | May need Basic auth or different scheme
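
Assumptions DVR-01, DVR-05, and NW-04 can be exercised together with a small probe script. The sketch below builds the RTSP URL from the pattern in DVR-01 and an ffprobe command that forces TCP transport. It assumes subtype 0 is the main stream (the source only confirms subtype=1 as the sub-stream), and the helper names are illustrative.

```python
from urllib.parse import quote

DVR_HOST = "192.168.29.200"
RTSP_PORT = 554


def rtsp_url(channel: int, subtype: int, user: str, password: str,
             host: str = DVR_HOST, port: int = RTSP_PORT) -> str:
    """Build a stream URL following the pattern in DVR-01 (channels are 1-indexed)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):  # assumption: 0 = main stream, 1 = sub-stream
        raise ValueError("subtype must be 0 or 1")
    # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{credentials}@{host}:{port}/cam/realmonitor?channel={channel}&subtype={subtype}"


def probe_command(url: str) -> list:
    """ffprobe invocation that forces TCP interleaved transport and prints stream info."""
    return [
        "ffprobe", "-v", "error",
        "-rtsp_transport", "tcp",
        "-show_entries", "stream=codec_name,width,height",
        "-of", "json",
        url,
    ]


url = rtsp_url(1, 1, "admin", "p@ss")  # placeholder credentials
print(url)
print(" ".join(probe_command(url)))
```

Passing the command list to subprocess.run for each of channels 1 to 8 gives a quick connectivity check, and the reported width and height confirm or refute the sub-stream resolution assumed in DVR-03.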

3.3 Environmental Assumptions

ID | Assumption | Impact if Invalid
ENV-01 | Cameras provide adequate lighting for face recognition during night hours (minimum 10 lux at face distance) | Face recognition accuracy degrades; may need IR illumination
ENV-02 | Camera angles allow frontal face capture at entry/exit points (yaw < 45 degrees) | Face recognition miss rate increases
ENV-03 | Indoor industrial environment with minimal weather interference | False positive rate from rain/shadows increases
ENV-04 | Maximum person-to-camera distance is within 10 meters for face recognition | Faces may be too small (< 20px) for reliable detection
ENV-05 | Camera positions are stable (no PTZ movement during normal operation) | Zone calibration becomes invalid when a camera moves

3.4 Operational Assumptions

ID | Assumption | Impact if Invalid
OPS-01 | Security operators will review unknown face clusters and provide identity labels daily | Unknown person database grows without enrichment
OPS-02 | Admin will review training suggestions at least weekly in "Suggested Learning" mode | Training queue backlog accumulates
OPS-03 | Site has authorized personnel who can access edge gateway for maintenance (SSH, physical) | Remote troubleshooting limited
OPS-04 | Alert fatigue is a genuine concern: a false positive rate > 20% leads to ignored alerts | AI vibe controls and suppression tuned accordingly
OPS-05 | Incident video review requires 10-second pre-event and 30-second post-event clips | Clip configuration fixed

3.5 Security Assumptions

ID | Assumption | Impact if Invalid
SEC-01 | WireGuard encryption (ChaCha20-Poly1305) meets organizational security requirements | May need additional encryption layer
SEC-02 | AWS VPC with private subnets satisfies data residency requirements for India | Compliance review needed
SEC-03 | Face embeddings (512-D vectors) do not constitute PII under applicable regulations | Legal review needed for biometric data handling
SEC-04 | Edge gateway physical security is equivalent to server room security | Tampering risk if edge is physically accessible
SEC-05 | DVR credentials can be stored encrypted (AES-256) in cloud database | Key management infrastructure required

3.6 AI Performance Assumptions

ID | Assumption | Impact if Invalid
AI-01 | YOLO11m TensorRT FP16 achieves > 75% person AP@50 on surveillance footage | May need fine-tuning on site-specific data
AI-02 | ArcFace R100 achieves > 98% Rank-1 accuracy on enrolled persons with 5+ reference images | Enrollment quality gates ensure minimum samples
AI-03 | HDBSCAN achieves > 89% cluster purity on 512-D face embeddings from this camera setup | Fallback to DBSCAN if density varies too much
AI-04 | ByteTrack maintains < 2 ID switches per 100 frames in industrial environment with occlusion | May need BoT-SORT upgrade for complex scenes
AI-05 | GPU (T4) can sustain 15-20 FPS processing per stream across 8 streams with batching | CPU fallback at 5-8 FPS if GPU unavailable
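
Because these assumptions carry numeric thresholds, they can be checked mechanically against site measurements before implementation begins. The sketch below encodes the bounds from the table; the metric key names and the failed_assumptions helper are hypothetical, and the sample measurements are invented for illustration only.

```python
import operator

# Bounds copied from the AI performance table; metric names are illustrative.
ASSUMPTIONS = {
    "AI-01": ("person_ap50", operator.gt, 0.75),
    "AI-02": ("rank1_accuracy", operator.gt, 0.98),
    "AI-03": ("cluster_purity", operator.gt, 0.89),
    "AI-04": ("id_switches_per_100_frames", operator.lt, 2.0),
    "AI-05": ("fps_per_stream", operator.ge, 15.0),
}


def failed_assumptions(measured: dict) -> list:
    """Return the IDs whose metric is missing or does not satisfy its bound."""
    failed = []
    for assumption_id, (metric, compare, bound) in ASSUMPTIONS.items():
        value = measured.get(metric)
        if value is None or not compare(value, bound):
            failed.append(assumption_id)
    return failed


# Invented example measurements, not benchmark results.
sample = {
    "person_ap50": 0.70,
    "rank1_accuracy": 0.985,
    "cluster_purity": 0.91,
    "id_switches_per_100_frames": 1.2,
    "fps_per_stream": 17.0,
}
print(failed_assumptions(sample))
```

In this invented sample only AI-01 fails, which per the table would point toward fine-tuning the detector on site-specific data.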

Section 4: Full Architecture

4.1 High-Level System Architecture

The platform employs a cloud+edge hybrid architecture with five network security zones. Video streams are ingested at the edge, processed by AI in the cloud, and presented through a web-based dashboard. A WireGuard VPN tunnel provides encrypted, zero-exposure connectivity between edge and cloud.

+=============================================================================+
|                         CLOUD+EDGE+VPN ARCHITECTURE                          |
+=============================================================================+
|                                                                              |
|   ZONE 0: INTERNET (UNTRUSTED)                                               |
|   +---------------------+                                                    |
|   |  Users / Browsers   |                                                    |
|   |  HTTPS :443         |                                                    |
|   +----------+----------+                                                    |
|              |                                                               |
|              v                                                               |
|   ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                       |
|   +--------------------------------------------------------------+          |
|   |  AWS ALB (:443) + WAF v2 + Rate Limit + Geo-Restriction      |          |
|   |       |                                                      |          |
|   |       v                                                      |          |
|   |  Traefik Ingress Controller (:8443)                          |          |
|   |  - Route: /api/*  -> Backend Service                         |          |
|   |  - Route: /ws/*   -> WebSocket Handler                       |          |
|   |  - Route: /       -> Next.js Web App                         |          |
|   |  - TLS: Let's Encrypt auto certificates                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 2: AWS VPC APPLICATION (TRUSTED)                                      |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Stream      |  | AI Inference|  | Suspicious Activity |   |          |
|   |  | Ingestion   |  | Service     |  | Service (Night Mode)|   |          |
|   |  | (Go/FFmpeg) |  | (Triton)    |  | (Go/Python)         |   |          |
|   |  | :8081       |  | :8001 gRPC  |  | :8083               |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  | Backend API |  | Training    |  | Notification        |   |          |
|   |  | (Go/Gin)    |  | Service     |  | Service             |   |          |
|   |  | :8080       |  | (PyTorch)   |  | (Go)                |   |          |
|   |  +-------------+  +-------------+  +---------------------+   |          |
|   |  +--------------------+                                      |          |
|   |  | Web Frontend       |  HLS Playback Service               |          |
|   |  | (Next.js 14 :3000) |  (Go :8085)                         |          |
|   |  +--------------------+                                      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              v                                                               |
|   ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                                   |
|   +--------------------------------------------------------------+          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  | PostgreSQL  |  | Redis       |  | Kafka       |           |          |
|   |  | 16 (RDS)    |  | 7 Cluster   |  | (MSK)       |           |          |
|   |  | :5432       |  | :6379       |  | :9092       |           |          |
|   |  | pgvector    |  | Pub/Sub     |  | 3 brokers   |           |          |
|   |  | HNSW index  |  | Streams     |  | 3 AZs       |           |          |
|   |  +-------------+  +-------------+  +-------------+           |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   |  | MinIO       |  | S3 (Cold Archive)                 |      |          |
|   |  | (S3-compat) |  | - Standard (30d)                  |      |          |
|   |  | :9000       |  | - IA (31-90d)                     |      |          |
|   |  | 10 TB       |  | - Glacier Deep Archive (90d+)     |      |          |
|   |  +-------------+  +-----------------------------------+      |          |
|   +--------------------------------------------------------------+          |
|              |                                                               |
|              | WireGuard VPN Tunnel (UDP 51820)                                |
|              | ChaCha20-Poly1305 encryption                                    |
|              | Cloud peer: 10.200.0.1/32 <-> Edge peer: 10.200.0.2/32         |
|              v                                                               |
|   ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                                 |
|   +--------------------------------------------------------------+          |
|   |  +--------------------------------------------------------+  |          |
|   |  |              EDGE GATEWAY (Intel NUC)                  |  |          |
|   |  |  Ubuntu 22.04 LTS | K3s v1.28+ | 2TB NVMe             |  |          |
|   |  |                                                          |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Stream Manager  |  | HLS Segmenter   |                |  |          |
|   |  |  | (Python/asyncio)|  | (FFmpeg/nginx)  |                |  |          |
|   |  |  | 8x RTSP feeds   |  | 2s segments     |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  | Frame Extractor |  | Buffer Manager  |                |  |          |
|   |  |  | (AI decimation) |  | (20GB ring buf) |                |  |          |
|   |  |  +-----------------+  +-----------------+                |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  |  | VPN Client (WireGuard)  |  Health Monitor         |    |  |          |
|   |  |  +--------------------------------------------------+    |  |          |
|   |  +--------------------------------------------------------+  |          |
|   |                            |                                             |
|   |   Local Network (192.168.29.0/24)                                       |
|   |   +------------------+    +------------------+                           |
|   |   | CP PLUS DVR      |    | Local Monitor    |                           |
|   |   | 192.168.29.200   |    | 192.168.29.10    |                           |
|   |   | 8ch | RTSP :554  |    | (optional)       |                           |
|   |   +------------------+    +------------------+                           |
|   |   CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8                                      |
|   +--------------------------------------------------------------+          |
|                                                                              |
+=============================================================================+

4.2 Service Interaction Diagram

+-----------------------------------------------------------------------------+
|                           SERVICE INTERACTIONS                               |
+-----------------------------------------------------------------------------+
|                                                                              |
|   INTERNET USERS                                                             |
|        |                                                                     |
|        | HTTPS :443                                                          |
|        v                                                                     |
|   +---------+      +----------+      +----------+                           |
|   | AWS ALB |----->| Traefik  |----->| Next.js  |  Web Frontend             |
|   | +WAF    |      | Ingress  |      | (SSR)    |  Dashboard                |
|   +---------+      +----------+      +----+-----+                           |
|                                             |                                |
|                        +--------------------+--------------------+           |
|                        |                    |                    |           |
|                        v                    v                    v           |
|                   +---------+       +------------+      +----------+       |
|                   |Backend  |       | WebSocket  |      | HLS      |       |
|                   |API (Go) |       | Handler    |      | Playback |       |
|                   |:8080    |       | /ws/alerts |      | Service  |       |
|                   +----+----+       +------------+      +----+-----+       |
|                        |                                               |
|                        | gRPC :50051                                    |
|                        v                                               |
|   +---------+    +------------+    +----------+    +----------+       |
|   | Stream  |    | AI         |    |Suspicious|    |Training  |       |
|   |Ingestion|<-->| Inference  |<-->| Activity |    |Service   |       |
|   |(Go)     |    |(Triton)    |    |(Night)   |    |(PyTorch) |       |
|   +----+----+    +------+-----+    +----+-----+    +----+-----+       |
|        |                |               |               |             |
|        v                v               v               v             |
|   +---------------------------------------------------------------+   |
|   |                        KAFKA (MSK)                            |   |
|   |  streams.raw (8 parts)  ai.detections (16 parts)             |   |
|   |  alerts.critical (4 parts)  training.data (30-day ret.)      |   |
|   |  notifications.*  system.metrics (7-day ret.)                |   |
|   +---------------------------------------------------------------+   |
|        |                |               |               |             |
|        v                v               v               v             |
|   +---------+    +------------+    +----------+    +----------+       |
|   |PostgreSQL|   | Redis      |    | MinIO    |    | MLflow   |       |
|   |16 +pgvec |   |7 Cluster   |    |S3-compat |    | Model    |       |
|   |:5432     |   |:6379       |    |:9000     |    | Registry |       |
|   +---------+    +------------+    +----------+    +----------+       |
|                                                                              |
|   Edge Gateway: WireGuard peer at 10.200.0.2/32                            |
|   Stream Ingestion pulls frames via VPN -> sends to Kafka                   |
|                                                                              |
+-----------------------------------------------------------------------------+
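
The Kafka topics named in the diagram can be captured as a small configuration table. The sketch below records the partition counts and retention periods shown above and converts retention to the retention.ms topic setting. The mapping of camera channel to partition is a hypothetical scheme for illustration; the actual keying strategy is not stated in this document.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Partition counts and retention periods as listed in the diagram above.
TOPICS = {
    "streams.raw": {"partitions": 8},
    "ai.detections": {"partitions": 16},
    "alerts.critical": {"partitions": 4},
    "training.data": {"retention_days": 30},
    "system.metrics": {"retention_days": 7},
}


def retention_ms(days: int) -> int:
    """Convert a retention period in days to milliseconds."""
    return days * MS_PER_DAY


def topic_configs() -> dict:
    """Kafka topic-level settings for topics with an explicit retention period."""
    return {
        name: {"retention.ms": str(retention_ms(spec["retention_days"]))}
        for name, spec in TOPICS.items()
        if "retention_days" in spec
    }


def partition_for_channel(channel: int, topic: str = "streams.raw") -> int:
    """Hypothetical mapping: 1-indexed channel N goes to partition (N - 1) mod count."""
    if channel < 1:
        raise ValueError("channels are 1-indexed")
    return (channel - 1) % TOPICS[topic]["partitions"]


print(topic_configs())
print([partition_for_channel(ch) for ch in range(1, 9)])
```

Pinning each channel to its own streams.raw partition would keep frames from one camera in order, which matters for tracking, while the 16 partitions on ai.detections leave headroom for the scaling path beyond 8 cameras.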

4.3 Network Security Zones

Five security zones provide defense in depth, from the public internet to the physically isolated edge network.

+=============================================================================+
|                         NETWORK SECURITY ZONES                               |
+=============================================================================+
|                                                                              |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 0: INTERNET (UNTRUSTED)                                        |    |
|  |  - Public users, any source IP                                        |    |
|  |  - AWS Shield Standard DDoS protection                               |    |
|  |  - Geo-restriction: allow specific countries only                    |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | HTTPS :443                                    |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 1: AWS VPC EDGE (DEMILITARIZED)                                |    |
|  |  - ALB + WAF v2 (SQL injection, XSS, rate limiting rules)           |    |
|  |  - Traefik Ingress (:8443)                                          |    |
|  |  - Auth: JWT + RBAC, API keys for edge gateway                     |    |
|  |  - Public API endpoints ONLY                                        |    |
|  |  SG: alb-public-sg: 443 from 0.0.0.0/0                             |    |
|  |  SG: traefik-sg: 8443 from alb-sg ONLY                              |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Internal :8080-8090                         |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 2: AWS VPC APPLICATION (TRUSTED, ISOLATED)                     |    |
|  |  - Stream Ingestion, AI Inference, Suspicious Activity              |    |
|  |  - Training, Backend API, Notification Services                     |    |
|  |  - Pod Security: No root, read-only FS, no privilege escalation    |    |
|  |  - Network Policies: Ingress only from API GW namespace            |    |
|  |  SG: app-sg: 8080-8090 from traefik-sg ONLY                         |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | Data Layer :5432, :6379, :9092, :9000       |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED)                            |    |
|  |  - PostgreSQL (RDS), Redis (ElastiCache), Kafka (MSK)               |    |
|  |  - MinIO object storage, S3 cold archive                            |    |
|  |  - Security Groups: ONLY from app-sg                                |    |
|  |  - RDS: Encrypted at rest (AWS KMS), no public access              |    |
|  |  - S3: Bucket policy deny all except VPC endpoint                   |    |
|  +---------------------------+-----------------------------------------+    |
|                              |                                               |
|                              | WireGuard VPN (UDP 51820)                     |
|                              | ChaCha20-Poly1305                             |
|                              v                                               |
|  +---------------------------------------------------------------------+    |
|  |  ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED)                          |    |
|  |  - Edge Gateway (Intel NUC), K3s node                                |    |
|  |  - WireGuard peer, stream ingestion, local buffer                    |    |
|  |  - DVR (192.168.29.200): NO internet access, local ONLY             |    |
|  |  - Edge Firewall: ALLOW 192.168.29.0/24 -> DVR :554,:80           |    |
|  |                   ALLOW OUT 51820/udp -> Cloud VPN endpoint        |    |
|  |                   DENY ALL other incoming                           |    |
|  +---------------------------------------------------------------------+    |
|                                                                              |
+=============================================================================+
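The zone diagram can be read as a default-deny allow-list. The sketch below transcribes the TCP flows between cloud zones into a lookup table; the zone names are shorthand invented for this example, and the WireGuard UDP link to Zone 4 is deliberately left out because it is a tunnel rather than a per-port rule.

```python
# Allowed TCP flows transcribed from the zone diagram: (source, destination) -> ports.
# Zone names are hypothetical shorthand for Zones 0-3.
ALLOWED_FLOWS = {
    ("internet", "vpc_edge"): {443},
    ("vpc_edge", "application"): set(range(8080, 8091)),
    ("application", "data"): {5432, 6379, 9092, 9000},
}


def is_flow_allowed(src: str, dst: str, port: int) -> bool:
    """Default-deny: a flow passes only if the diagram lists it explicitly."""
    return port in ALLOWED_FLOWS.get((src, dst), set())


print(is_flow_allowed("internet", "vpc_edge", 443))   # True
print(is_flow_allowed("internet", "data", 5432))      # False: zones cannot be skipped
```

A table like this can double as a test fixture for reviewing security group definitions against the intended design.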

4.4 Service Descriptions

# | Service | Purpose | Technology | Port | Replicas
--- | ------- | ------- | ---------- | ---- | --------
1 | Edge Gateway Agent | RTSP stream pull, local recording, VPN endpoint, heartbeat | Go 1.21, systemd + K3s | 8080, 51820 | 1 (per site)
2 | Stream Ingestion | Receive frames from edge, decode, produce to Kafka, store segments | Go 1.21, FFmpeg | 8081 | 3-20 (HPA)
3 | AI Inference | GPU-accelerated detection, face recognition, embedding | Triton 2.40, TensorRT | 8000, 8001, 8002 | 1-4 (GPU HPA)
4 | Suspicious Activity | Night-mode analysis, 10 detection modules, scoring engine | Python 3.11, OpenCV | 8083 | 2-8 (HPA)
5 | Training Service | Model retraining, fine-tuning, A/B validation | PyTorch 2.1, CUDA 12.1 | 8084 | 0-1 (GPU spot)
6 | Backend API | REST API, authentication, business logic | Go 1.21, Gin | 8080 | 3-10 (HPA)
7 | Web Frontend | Dashboard, live view, timeline, analytics | Next.js 14, React 18 | 3000 | 3 (CDN)
8 | Notification | Multi-channel alert dispatch (Telegram, WhatsApp, Email) | Go 1.21 | 8086 | 2-5 (HPA)
9 | HLS Playback | HLS segment serving for dashboard live view | Go 1.21 | 8085 | 2-4 (HPA)
10 | PostgreSQL | Primary database with pgvector for embeddings | PostgreSQL 16 (RDS) | 5432 | 1 (Multi-AZ)
11 | Redis | Session store, cache, pub/sub, stream tracking | Redis 7 (ElastiCache) | 6379 | 2 shards x 2 replicas
12 | Kafka | Event bus, durable log, stream replay | Apache Kafka (MSK) | 9092 | 3 brokers x 3 AZs
13 | MinIO | Object storage for video, snapshots, model artifacts | MinIO (S3-compatible) | 9000, 9001 | Edge: 1, Cloud: 4

4.5 Physical Edge Gateway Specification

Component Specification
Hardware Intel NUC 13 Pro, Core i5-1340P (12 cores, 16 threads)
Alternative NVIDIA Jetson Orin NX 16GB (for on-edge AI inference)
RAM 16GB DDR4-3200 (32GB recommended for 16+ channels)
Storage 2TB NVMe SSD (7-day circular buffer for all 8 streams)
LAN Intel i226-V 2.5GbE (local DVR subnet)
WAN Second Ethernet or WiFi (internet for VPN)
OS Ubuntu 22.04.4 LTS Server (no GUI)
Container Runtime Docker CE 25.x + Docker Compose 2.x
K8s Distribution K3s v1.28+ (lightweight, single-node or 2-node HA)
Power UPS-backed, auto-restart on power loss (BIOS setting)
Network Dual interface: eth0 for local DVR, eth1 for internet/VPN
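As a sanity check on the 7-day circular buffer, the calculation below estimates the storage it needs. It assumes the 2 Mbps per-stream H.264 bitrate quoted in Section 4.7 and constant-bitrate recording; real usage will vary with scene complexity and container overhead.

```python
def buffer_size_tb(streams: int, mbps_per_stream: float, days: float) -> float:
    """Return the size of a circular recording buffer in decimal terabytes."""
    bytes_per_second = streams * mbps_per_stream * 1_000_000 / 8
    return bytes_per_second * days * 86_400 / 1e12


needed = buffer_size_tb(streams=8, mbps_per_stream=2.0, days=7)
print(f"{needed:.2f} TB")  # 1.21 TB, which leaves headroom on a 2 TB drive
```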

4.6 Cloud Infrastructure Specification

Component Specification
Region Primary: ap-south-1 (Mumbai), DR: ap-southeast-1 (Singapore)
VPC 10.100.0.0/16, 3 AZs, private subnets only for workloads
EKS Managed node groups: on-demand for API, spot for batch/GPU
GPU Nodes g4dn.xlarge (NVIDIA T4) for Triton inference, 1-4 auto-scaled
ALB Internet-facing, WAF v2 attached, Shield Advanced optional
RDS PostgreSQL 16, db.r6g.xlarge, Multi-AZ, encrypted at rest
ElastiCache Redis 7, cluster mode enabled, 2 shards x 2 replicas
MSK (Kafka) 3 broker nodes, kafka.m5.large, 3 AZs
S3 Standard (hot 30d), IA (31-90d), Glacier Deep Archive (90d+)

4.7 Scaling Approach

The system scales from the initial 8-camera deployment to 64+ cameras in three phases, doubling the camera count at each step:

+-----------------------------------------------------------------------------+
|                        CAMERA SCALING ROADMAP                                |
+-----------------------------------------------------------------------------+
|                                                                              |
|  CURRENT: 8 cameras (1 DVR)                                                  |
|  +-- Edge: Intel NUC 13 Pro (i5-1340P), 16GB RAM                            |
|  +-- Bandwidth: ~16 Mbps upstream (2 Mbps per H.264 stream)                 |
|  +-- Cloud AI: 1x T4 GPU (8 streams @ 1 fps, batch=8)                       |
|  +-- Kafka: 8 partitions (streams.raw)                                      |
|  +-- PostgreSQL: db.r6g.xlarge                                              |
|  +-- Monthly cost: ~$2,140                                                  |
|                                                                              |
|  PHASE 1: 16 cameras (2 DVRs / 2 sites)                                      |
|  +-- Edge: 2x Intel NUC (one per site)                                      |
|  +-- Bandwidth: ~32 Mbps                                                    |
|  +-- Cloud AI: 1x T4 GPU (batch=16, still sufficient)                       |
|  +-- Kafka: 16 partitions                                                   |
|  +-- Monthly cost: ~$3,200                                                  |
|                                                                              |
|  PHASE 2: 32 cameras (4 DVRs / 4 sites)                                      |
|  +-- Edge: 4x Intel NUC                                                     |
|  +-- VPN: Hub-spoke model (4 edge peers -> 1 cloud endpoint)                |
|  +-- Bandwidth: ~64 Mbps                                                    |
|  +-- Cloud AI: 2x T4 GPUs (HPA: 2-6 replicas)                               |
|  +-- Kafka: 32 partitions                                                   |
|  +-- PostgreSQL: db.r6g.2xlarge                                             |
|  +-- Monthly cost: ~$5,500                                                  |
|                                                                              |
|  PHASE 3: 64 cameras (8 DVRs / 8 sites)                                      |
|  +-- Edge: 8x Intel NUC (or Jetson Orin for edge AI pre-filter)              |
|  +-- Bandwidth: ~128 Mbps (dedicated circuit recommended)                   |
|  +-- Cloud AI: 4x T4 GPUs or 2x A10G (g5.2xlarge)                           |
|  +-- Kafka: 64 partitions, consider MSK multi-cluster                        |
|  +-- PostgreSQL: db.r6g.4xlarge + read replica                              |
|  +-- Monthly cost: ~$9,800                                                  |
|                                                                              |
+-----------------------------------------------------------------------------+
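The bandwidth and partition figures in the roadmap follow two linear rules: 2 Mbps of upstream per camera and one `streams.raw` partition per camera. The sketch below reproduces them for any camera count; `CapacityPlan` and `plan_capacity` are illustrative names, and GPU and cost figures are not modeled because they do not scale linearly.

```python
from dataclasses import dataclass

MBPS_PER_STREAM = 2  # per H.264 stream, as stated in the roadmap


@dataclass(frozen=True)
class CapacityPlan:
    cameras: int
    upstream_mbps: int
    kafka_partitions: int


def plan_capacity(cameras: int) -> CapacityPlan:
    """Derive upstream bandwidth and Kafka partitions from the camera count."""
    if cameras < 1:
        raise ValueError("cameras must be at least 1")
    return CapacityPlan(
        cameras=cameras,
        upstream_mbps=cameras * MBPS_PER_STREAM,
        kafka_partitions=cameras,  # one partition per camera stream
    )


for count in (8, 16, 32, 64):
    print(plan_capacity(count))
```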

4.8 Failover and Reliability Design

The graceful degradation matrix defines the system's behavior under each of the six principal failure modes:

+=============================================================================+
|                     GRACEFUL DEGRADATION MATRIX                              |
+=============================================================================+
|                                                                              |
|  Failure Mode              | Degradation Strategy                            |
|  ------------------------- | ----------------------------------------------- |
|  AI Inference Service DOWN | Continue recording ALL video locally            |
|  (GPU failure, model crash)| Events stored as "unprocessed"                  |
|                            | No real-time alerts                             |
|                            | Queue frames for later batch processing         |
|                            | Dashboard shows "AI OFFLINE" banner             |
|                                                                              |
|  Kafka DOWN (MSK outage)   | Edge Gateway buffers locally (20GB ring buffer) |
|                            | Backpressure: reduce to key frames only (0.2fps)|
|                            | Auto-reconnect with 2x exponential backoff      |
|                            | Replay from local buffer when Kafka recovers    |
|                                                                              |
|  VPN Tunnel DOWN           | Full local operation mode                       |
|  (internet outage)         | All recording continues locally (7-day buffer)  |
|                            | Local alert buzzer/relay (configurable)         |
|                            | No cloud dashboard access                       |
|                            | Auto-sync when VPN recovers                     |
|                                                                              |
|  PostgreSQL DOWN (RDS)     | Alert queue builds in Kafka (durable log)       |
|                            | Events not lost (Kafka 7-day retention)         |
|                            | Read-only dashboard mode (Redis cache)          |
|                            | Alert on-call engineer                          |
|                                                                              |
|  Notification Service DOWN | Alerts accumulate in DB                         |
|                            | Retry with exponential backoff                  |
|                            | Dead letter after 24 hours                      |
|                            | Dashboard shows pending count                   |
|                                                                              |
|  Edge Gateway DOWN (power) | Cloud dashboard shows "SITE OFFLINE"            |
|                            | Last known recordings in cloud                  |
|                            | Alert sent immediately                          |
|                            | UPS: graceful shutdown, preserve data           |
|                                                                              |
+=============================================================================+
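The "Kafka DOWN" row relies on a size-bounded local ring buffer. A minimal in-memory sketch of that idea follows: it keeps the newest frames within a byte budget and evicts the oldest first. The class name and API are hypothetical, and a production edge buffer would presumably be disk-backed rather than held in RAM.

```python
from collections import deque


class ByteRingBuffer:
    """Keeps the newest frames within a byte budget, evicting the oldest first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._frames = deque()

    def push(self, frame: bytes) -> None:
        if len(frame) > self.capacity_bytes:
            raise ValueError("frame is larger than the whole buffer")
        self._frames.append(frame)
        self.used_bytes += len(frame)
        # Evict from the oldest end until the budget is respected again.
        while self.used_bytes > self.capacity_bytes:
            self.used_bytes -= len(self._frames.popleft())

    def drain(self):
        """Yield buffered frames oldest-first, e.g. for replay once Kafka recovers."""
        while self._frames:
            frame = self._frames.popleft()
            self.used_bytes -= len(frame)
            yield frame
```

With a 20 GB budget, `capacity_bytes` would be `20 * 10**9`; the eviction logic is the same at any size.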

Priority Order (highest first):

  1. Video recording NEVER STOPS (local edge priority)
  2. Critical alerts ALWAYS FIRE (local buzzer + queued cloud alerts)
  3. AI inference gracefully degrades to batch catch-up on recovery
  4. Dashboard operates in read-only/cache mode during DB outage
  5. Cloud sync resumes automatically when connectivity restored

Reliability Mechanisms:

Mechanism | Implementation | Target
--------- | -------------- | ------
Stream Reconnect | Exponential backoff: 1s -> 2s -> 4s -> 8s -> max 30s | < 60s recovery
Circuit Breaker | 5 failures -> OPEN (60s) -> HALF_OPEN (3 test calls) -> CLOSED | Prevent cascade failures
VPN Watchdog | Ping every 30s, restart WireGuard on 3 consecutive failures | < 90s VPN recovery
Kafka Producer | acks=all, retries=10, enable.idempotence=true, LZ4 compression | Zero message loss
Kafka Consumer | Manual offset commit AFTER DB write success | At-least-once delivery (exactly-once effect requires idempotent DB writes)
Health Checks | 5-layer: K8s probes -> Service metrics -> Dependency checks -> E2E synthetic -> Edge heartbeat | < 2 min detection
Auto-scaling | GPU util > 80% for 2 min -> scale out; Kafka lag > 1000 for 5 min -> scale out | Proactive capacity
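The reconnect schedule and the circuit breaker can be sketched as below, using the thresholds from the table. Some behavior is assumed rather than specified: failures are counted consecutively, any failure during HALF_OPEN reopens the circuit, and three successful test calls close it. The clock is injectable so the state machine can be tested without sleeping.

```python
import time


def backoff_delays(base: float = 1.0, cap: float = 30.0):
    """Yield 1s, 2s, 4s, 8s, ... doubling each time and capped at 30s."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_seconds=60.0,
                 test_calls=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.test_calls = test_calls
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed; moves OPEN -> HALF_OPEN after the timeout."""
        if self.state == "OPEN" and self._clock() - self._opened_at >= self.open_seconds:
            self.state = "HALF_OPEN"
            self._successes = 0
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.test_calls:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0  # assumption: only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # assumption: one failed test call reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

Callers would check `allow()` before each downstream request and report the outcome with `record_success()` or `record_failure()`.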

Section 5: Data Flow from DVR to Cloud to Dashboard

This section traces the complete data journey from camera capture through AI processing to user presentation.

5.1 Overview: Seven Data Flows

+=============================================================================+
|                        SEVEN DATA FLOW PATHWAYS                              |
+=============================================================================+
|                                                                              |
|  Flow 1: Camera --> DVR --> Edge Gateway                                    |
|          [Analog/Digital] -> [H.264 Encode] -> [RTSP Server]                |
|                                                                              |
|  Flow 2: Edge Gateway --> VPN --> Cloud Kafka                               |
|          [FFmpeg ingest] -> [Frame extract] -> [Kafka Producer]             |
|                                                                              |
|  Flow 3: Stream Ingestion --> AI Inference                                  |
|          [Kafka Consumer] -> [GPU Batch] -> [Detection + Face Recog.]       |
|                                                                              |
|  Flow 4: AI Inference --> Events --> Database                               |
|          [Detection results] -> [Event enrich] -> [PostgreSQL]              |
|                                                                              |
|  Flow 5: Events --> Alerts --> Notifications                                |
|          [Scoring engine] -> [Alert create] -> [Multi-channel send]         |
|                                                                              |
|  Flow 6: Live Streams --> Browser Dashboard                                 |
|          [HLS segmenter] -> [Nginx relay] -> [HLS.js player]                |
|                                                                              |
|  Flow 7: Training Feedback Loop                                             |
|          [Operator review] -> [Conflict detect] -> [Model update]           |
|                                                                              |
+=============================================================================+

5.2 Flow 1: Camera to DVR to Edge Gateway

Path: Analog/Digital Camera -> DVR internal encoder -> DVR RTSP server -> Edge Gateway FFmpeg client

Protocol Stack:

Layer | Technology | Details
----- | ---------- | -------
Camera Interface | Analog BNC / CVBS / AHD | CP PLUS DVR supports multiple analog standards
DVR Encoding | H.264 High Profile | Hardware encoder, real-time, low latency
DVR Storage | Internal HDD (currently FULL) | 0 bytes free; no local recording possible
Network Transport | RTSP over TCP (interleaved) | Mandatory for reliable NAT/VPN traversal
URL Pattern | rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | N=1-8, M=0(main)/1(sub)
Client | FFmpeg 6.0+ | -rtsp_transport tcp -timeout 5000000 (5 s socket timeout in microseconds; replaces -stimeout, which FFmpeg 5.0 removed)
Frame Rate | 25 FPS (PAL) or 30 FPS (NTSC) | Configurable per channel
Resolution (main) | 960 x 1080 (per channel) | Full resolution
Resolution (sub) | 352 x 288 to 704 x 576 | Lower bandwidth for AI
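The URL pattern in the table can be wrapped in a small builder that validates the channel and subtype ranges. This is a sketch: the function name is illustrative, and percent-encoding the credentials is an assumption about how the RTSP client parses the userinfo part, so it should be verified against the DVR before use.

```python
from urllib.parse import quote


def rtsp_url(password: str, channel: int, subtype: int = 0,
             host: str = "192.168.29.200", port: int = 554,
             user: str = "admin") -> str:
    """Build the DVR stream URL for one channel (1-8) and subtype (0=main, 1=sub)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # Assumption: special characters in credentials are percent-encoded.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{credentials}@{host}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


print(rtsp_url("p@ss", channel=1))
```

Reading the password from the environment or a secrets store, rather than embedding it in scripts, keeps it out of shell history and process listings.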

FFmpeg RTSP Connection Command:

ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp \
    -timeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c copy -f segment -segment_time 60 -reset_timestamps 1 \
    -strftime 1 "/data/buffer/ch1/%Y%m%d_%H%M%S.mkv"

Latency Budget:

Stage Latency
Camera -> DVR (analog) ~1-5 ms
DVR encoding ~50-100 ms
RTSP over LAN ~1-2 ms
Total (camera to edge gateway) ~52-107 ms

5.3 Flow 2: Edge Gateway to VPN Tunnel to Cloud

Path: Edge Gateway FFmpeg -> Frame extraction -> JPEG encoding -> Kafka Producer -> WireGuard VPN -> Cloud MSK

Frame Processing Pipeline:

+------------+    +-------------+    +---------------+    +-------------+    +-----------+
| Raw RTSP   | -> | FFmpeg      | -> | Frame         | -> | JPEG        | -> | Kafka     |
| H.264      |    | Demux/Decode|    | Decimation    |    | Encoder     |    | Producer  |
| 25 FPS     |    |             |    | (1 fps)       |    | Quality 85  |    | (LZ4)     |
| 960x1080   |    |             |    | 640x640 pad   |    |             |    |           |
+------------+    +-------------+    +---------------+    +-------------+    +-----------+

FFmpeg Frame Extraction for AI:

ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -timeout 5000000 \
    -fflags +genpts+discardcorrupt -reorder_queue_size 64 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -vf "fps=1,scale=640:640:force_original_aspect_ratio=decrease,pad=640:640:(ow-iw)/2:(oh-ih)/2:black" \
    -q:v 5 -f image2pipe -vcodec mjpeg pipe:1
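With `-f image2pipe -vcodec mjpeg`, FFmpeg writes JPEG images back to back on stdout, so the consumer has to find frame boundaries itself. The sketch below splits a byte buffer on the JPEG start-of-image and end-of-image markers and returns any incomplete tail for the next read. It assumes frames carry no embedded thumbnails; `split_jpegs` is an illustrative helper, not part of any library.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpegs(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete JPEG frames, leftover bytes to prepend to the next chunk)."""
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start < 0:
            # Keep a trailing 0xFF: it may be the first half of the next SOI.
            tail = buffer[pos:]
            return frames, tail[-1:] if tail.endswith(b"\xff") else b""
        end = buffer.find(EOI, start + 2)
        if end < 0:
            return frames, buffer[start:]  # frame still incomplete
        frames.append(buffer[start:end + 2])
        pos = end + 2
```

A reader loop would call `split_jpegs(leftover + chunk)` for each chunk read from the FFmpeg pipe and hand the completed frames to the Kafka producer.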

WireGuard VPN Tunnel Configuration:

Parameter Value
Protocol UDP 51820
Encryption ChaCha20-Poly1305
Key Exchange Curve25519 (ECDH)
Preshared Key Enabled per-peer
Keepalive 25 seconds
MTU 1400 (to account for WireGuard + IP headers)
Cloud Endpoint 10.200.0.1/32 (EC2 bastion or NLB; an ALB cannot forward UDP)
Edge Endpoint 10.200.0.2/32
Route 10.100.0.0/16 (AWS VPC) accessible from edge

A VPN watchdog script runs every 30 seconds and restarts WireGuard after 3 consecutive ping failures.
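The watchdog rule reduces to a consecutive-failure counter. In the sketch below the `ping` and `restart` callables are injected so the logic can be tested in isolation. On a real gateway they might wrap a ping to the cloud tunnel address and a restart of the WireGuard service; those commands and the interface name are assumptions, not part of this design.

```python
from typing import Callable


class VpnWatchdog:
    """Restarts the tunnel after N consecutive failed health checks."""

    def __init__(self, ping: Callable[[], bool], restart: Callable[[], None],
                 max_failures: int = 3) -> None:
        self._ping = ping
        self._restart = restart
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one check (intended every 30 s); return True if a restart was triggered."""
        if self._ping():
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self._restart()
            self.consecutive_failures = 0
            return True
        return False
```

A scheduler such as a systemd timer or a simple sleep loop would call `tick()` every 30 seconds.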

Latency Budget:

Stage Latency
Frame extraction (FFmpeg) ~50-100 ms
JPEG encoding ~5-10 ms
Kafka produce (local) ~1-2 ms
WireGuard tunnel ~5-15 ms (Mumbai -> India site)
MSK broker ~1-2 ms
Total (edge to cloud Kafka) ~62-129 ms
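Each total in these latency budgets is the sum of the per-stage lower and upper bounds. A small helper makes that check repeatable whenever a stage estimate changes; the stage names are copied from the table above.

```python
def total_budget(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum per-stage (low, high) latency bounds in milliseconds."""
    lows, highs = zip(*stages.values())
    return sum(lows), sum(highs)


EDGE_TO_KAFKA_MS = {
    "Frame extraction (FFmpeg)": (50, 100),
    "JPEG encoding": (5, 10),
    "Kafka produce (local)": (1, 2),
    "WireGuard tunnel": (5, 15),
    "MSK broker": (1, 2),
}

print(total_budget(EDGE_TO_KAFKA_MS))  # (62, 129)
```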

5.4 Flow 3: Stream Ingestion to AI Inference

Path: Kafka streams.raw topic -> Stream Ingestion consumer -> Triton Inference Server -> Kafka ai.detections topic

Pipeline Architecture:

+------------+    +-------------------+    +------------------+    +---------------+
| streams.raw| -> | Stream Ingestion  | -> | NVIDIA Triton    | -> | ai.detections |
| (8 parts)  |    | (Go consumer)     |    | (GPU inference)  |    | (16 parts)    |
| JPEG frames|    | Batch aggregator  |    | gRPC :8001       |    | Detection     |
| + metadata |    | (batch=8, timeout)|    | Dynamic batching |    | + embeddings  |
+------------+    +-------------------+    +------------------+    +---------------+
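The batch aggregator in the diagram flushes either when 8 frames have arrived or when a timeout expires, so a quiet camera cannot stall the batch. The production consumer is written in Go; the Python sketch below only illustrates the size-or-timeout rule. The 50 ms timeout is a hypothetical value, since the diagram does not state one.

```python
import time
from typing import Optional


class BatchAggregator:
    """Collects items and releases them as a batch on size or age, whichever comes first."""

    def __init__(self, batch_size: int = 8, timeout_s: float = 0.05,
                 clock=time.monotonic) -> None:
        self.batch_size = batch_size
        self.timeout_s = timeout_s  # hypothetical value, not specified in the design
        self._clock = clock
        self._items = []
        self._first_at = 0.0

    def add(self, item) -> Optional[list]:
        """Add one item; return a full batch when batch_size is reached."""
        if not self._items:
            self._first_at = self._clock()
        self._items.append(item)
        if len(self._items) >= self.batch_size:
            return self._flush()
        return None

    def poll(self) -> Optional[list]:
        """Return a partial batch once its oldest item has waited timeout_s."""
        if self._items and self._clock() - self._first_at >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self) -> list:
        batch, self._items = self._items, []
        return batch
```

The consumer loop would call `add()` per Kafka message and `poll()` on every idle tick, sending any returned batch to Triton over gRPC.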

Triton Model Configuration:

Model | Inputs | Outputs | GPU Memory | Latency (P50)
----- | ------ | ------- | ---------- | -------------
YOLO11m-det (TensorRT FP16) | 3x640x640 float16 | Bboxes, scores, labels | ~2.1 GB | 12 ms
SCRFD-500M (TensorRT FP16) | 3x640x640 float16 | Bboxes, landmarks, scores | ~1.8 GB | 8 ms
ArcFace R100 (TensorRT FP16) | 3x112x112 float16 | 512-D embedding | ~3.2 GB | 5 ms

Total GPU memory: ~7.1 GB (fits in T4 16 GB with 8 streams)

Latency Budget:

Stage Latency
Kafka consume (batch) ~10-50 ms
Preprocessing (resize, normalize) ~5-15 ms
YOLO11m inference (GPU) ~12 ms (P50)
SCRFD face detection (GPU) ~8 ms (P50)
ArcFace embedding (GPU, per face) ~5 ms (P50)
Post-processing (NMS, matching) ~10-30 ms
Kafka produce (results) ~1-2 ms
Total (Kafka to detection output) ~51-122 ms

5.5 Flow 4: AI Inference to Events to Database

Path: AI Detection results -> Event enricher -> PostgreSQL (multiple tables)

Data Transformation:

+------------+    +-------------------+    +---------------------+    +------------+
| Detection  | -> | Event Enricher    | -> | PostgreSQL Writer   | -> | events     |
| results    |    | - Add camera_id   |    | - UPSERT person     |    | persons    |
| (raw)      |    | - Match person    |    | - INSERT event      |    | embeddings |
|            |    | - Check whitelist |    | - INSERT embedding  |    | face_crops |
+------------+    +-------------------+    +---------------------+    +------------+

Database Write Operations per Detection:

Operation | Table | Type | Notes
--------- | ----- | ---- | -----
Insert event record | events | INSERT | With bounding box, confidence, timestamp
Upsert person | persons | INSERT/UPDATE | If new face, create person record
Insert face crop | face_crops | INSERT | S3 URL, bounding box, quality score
Upsert embedding | face_embeddings | INSERT/UPDATE | 512-D vector, pgvector HNSW index
Increment counters | camera_stats | UPDATE | Daily aggregation
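The "Upsert person" decision depends on whether a new embedding matches a known person. In the design this lookup runs inside PostgreSQL through the pgvector HNSW index; the sketch below swaps in a brute-force, in-memory cosine comparison purely to illustrate the match-or-create rule. The 0.5 similarity threshold and the function names are hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_person(embedding: list[float], known: dict[str, list[float]],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching person ID, or None when a new person should be created."""
    best_id, best_score = None, threshold
    for person_id, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


known = {"p1": [1.0, 0.0, 0.0], "p2": [0.0, 1.0, 0.0]}
print(match_person([0.9, 0.1, 0.0], known))  # p1
print(match_person([0.0, 0.0, 1.0], known))  # None -> create a new person record
```

The example uses 3-dimensional vectors for readability; the production embeddings are 512-dimensional.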

5.6 Flow 5: Events to Alerts to Notifications

Path: AI events -> Suspicious Activity scoring engine -> Alert creation -> Notification dispatch

Scoring and Escalation:

+------------+    +--------------------+    +------------------+    +--------------+
| AI events  | -> | Suspicious Activity| -> | Alert Manager    | -> | Notification |
| (persons,  |    | Scoring Engine     |    | - Deduplicate    |    | Service      |
|  faces)    |    | - 10 modules       |    | - Rate limit     |    | - Telegram   |
|            |    | - Composite score  |    | - Suppress dup   |    | - WhatsApp   |
|            |    | - Time decay       |    | - Escalation     |    | - Email      |
+------------+    +--------------------+    +------------------+    +--------------+

Alert Escalation Matrix:

Score | Level | Color | Notification | Action
----- | ----- | ----- | ------------ | ------
0.00 - 0.20 | NONE | Gray | None | Log only
0.20 - 0.40 | LOW | Blue | Dashboard only | Log + indicator
0.40 - 0.60 | MEDIUM | Yellow | Dashboard + App push | Alert dispatched
0.60 - 0.80 | HIGH | Orange | All of above + Telegram | Immediate alert
0.80 - 1.00 | CRITICAL | Red | All of above + WhatsApp + Email | Critical alert
> 1.00 | EMERGENCY | Purple + flashing | All channels + SMS | Emergency dispatch
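The matrix maps a composite score to a level by threshold lookup. Because adjacent ranges share their boundary values, the sketch below assumes a boundary score belongs to the higher level (0.20 is LOW, 0.80 is CRITICAL) and that exactly 1.00 is still CRITICAL; that tie-breaking rule is an assumption, not something the matrix states.

```python
import bisect

THRESHOLDS = [0.20, 0.40, 0.60, 0.80]  # lower bounds of LOW, MEDIUM, HIGH, CRITICAL
LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def alert_level(score: float) -> str:
    """Map a composite suspicion score to an escalation level."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score > 1.0:
        return "EMERGENCY"
    # bisect_right counts how many thresholds are <= score.
    return LEVELS[bisect.bisect_right(THRESHOLDS, score)]


for s in (0.05, 0.20, 0.55, 0.80, 1.00, 1.20):
    print(s, alert_level(s))
```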

5.7 Flow 6: Live Streams to Browser Dashboard

Path: DVR RTSP -> Edge Gateway FFmpeg -> HLS segmenter -> Nginx -> CDN -> Browser HLS.js

+--------+    +---------------+    +---------------+    +---------+    +----------+
| DVR    | -> | Edge Gateway  | -> | HLS Segmenter | -> | Nginx   | -> | Browser  |
| RTSP   |    | FFmpeg        |    | (2s segments) |    | (relay) |    | HLS.js   |
| 25 FPS |    | -copyts       |    | H.264 + AAC   |    | HTTPS   |    | Video tag|
+--------+    +---------------+    +---------------+    +---------+    +----------+

HLS Configuration:

Parameter Value
Segment duration 2 seconds
Segment list size 5 segments (10-second sliding window)
Playlist type Live (no #EXT-X-ENDLIST)
Codec H.264 High Profile + AAC-LC
Adaptive bitrate 3 variants: high (3 Mbps), mid (1 Mbps), low (500 Kbps)
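A live playlist with these settings is a sliding window: it lists the five most recent 2-second segments, advances the media sequence number as old segments drop off, and never carries an end marker. The sketch below generates one variant's playlist text. The segment file naming is hypothetical, and the master playlist for the three bitrate variants is omitted.

```python
def live_playlist(first_seq: int, window: int = 5, seg_seconds: int = 2,
                  prefix: str = "seg") -> str:
    """Render a live HLS media playlist covering `window` consecutive segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{seg_seconds}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for seq in range(first_seq, first_seq + window):
        lines.append(f"#EXTINF:{seg_seconds:.3f},")
        lines.append(f"{prefix}_{seq:06d}.ts")  # hypothetical naming scheme
    # No #EXT-X-ENDLIST tag: its absence tells the player the stream is live.
    return "\n".join(lines) + "\n"


print(live_playlist(first_seq=42))
```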

Latency:

Stage Latency
DVR encoding ~50-100 ms
RTSP to edge ~1-2 ms
FFmpeg demux/remux ~20-50 ms
HLS segmenting (2s) ~2000 ms
Nginx relay ~1-5 ms
CDN propagation ~10-50 ms
HLS.js buffer ~1-2 segments (2-4s)
Browser decode ~20-50 ms
Total (camera to eye) ~4.1 - 6.3 seconds (dominated by segment duration and player buffer)

5.8 Flow 7: Training Feedback Loop

Path: Operator review actions -> Conflict detection -> Training dataset -> Model training -> Quality gates -> Deployment

+------------+    +------------------+    +----------------+    +-------------+    +------------+
| Operator   | -> | Conflict         | -> | Training       | -> | Quality     | -> | Deployment |
| Review     |    | Detection        |    | Dataset        |    | Gates       |    | (A/B test) |
| (confirm,  |    | (5 types)        |    | - Curate       |    | - Precision |    |            |
|  correct,  |    | - Block conflicts|    | - Label        |    |   >= 0.97   |    |            |
|  merge,    |    | - Queue safe     |    | - Augment      |    | - Recall    |    |            |
|  reject)   |    |   additions      |    | - Version      |    |   >= 0.95   |    |            |
+------------+    +------------------+    +----------------+    +-------------+    +------------+

Training Data Flow:

Stage | Frequency | Trigger
----- | --------- | -------
Review action collection | Continuous | Operator clicks on dashboard
Conflict detection | Immediate (synchronous) | Every review action
Training dataset build | Weekly (or on-demand) | Queue threshold or manual
Model training | On dataset build | Airflow DAG trigger
Quality gate evaluation | After training | Automated pipeline
A/B deployment | After quality pass | Admin approval
Full production | After A/B success | Auto-promote at 48h
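The quality gate step can be expressed as a threshold check over evaluation metrics. The sketch below uses the precision and recall minimums shown in the diagram; the metric names, the `GateResult` type, and the treatment of a missing metric as a failure are assumptions. The full gate list is defined in training_system.md and may include more criteria.

```python
from dataclasses import dataclass, field

GATES = {"precision": 0.97, "recall": 0.95}  # minimums from the feedback-loop diagram


@dataclass
class GateResult:
    passed: bool
    failures: list = field(default_factory=list)


def evaluate_gates(metrics: dict) -> GateResult:
    """Compare candidate-model metrics against each gate; a missing metric fails."""
    failures = []
    for name, minimum in GATES.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(f"{name}={value} is below the required {minimum:.2f}")
    return GateResult(passed=not failures, failures=failures)


print(evaluate_gates({"precision": 0.98, "recall": 0.96}).passed)  # True
print(evaluate_gates({"precision": 0.98, "recall": 0.94}).failures)
```

Only a passing result would move the candidate forward to the admin-approved A/B deployment stage.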

Section 6: Recommended Tech Stack

6.1 Technology Selection Matrix

Layer | Technology | Version | Purpose | Rationale
----- | ---------- | ------- | ------- | ---------
Cloud Platform | AWS | 2025 | Infrastructure (ap-south-1 Mumbai) | Best India region latency, mature managed services
Container Orchestration | Amazon EKS | v1.28+ | Managed Kubernetes control plane | GPU node support, Cluster Autoscaler
Edge K8s | K3s | v1.28+ | Lightweight Kubernetes at edge | Single binary, resource-efficient
VPN | WireGuard | v1.0+ | Encrypted tunnel between edge and cloud | ~60% faster than OpenVPN, modern crypto
Reverse Proxy | Traefik | v2.10+ | Kubernetes Ingress controller | Native K8s integration, automatic TLS
AI Inference | NVIDIA Triton | 2.40 | GPU model serving, dynamic batching | Multi-framework, TensorRT optimization
CV Framework | OpenCV | 4.8+ | Image processing, pre/post-processing | Industry standard, Python/Go bindings
AI/ML Framework | PyTorch | 2.1+ | Model training, custom inference | Ecosystem, CUDA 12 support
Deep Learning | TensorRT | 8.6+ | GPU-optimized inference for YOLO, SCRFD, ArcFace | FP16 support, 3-5x speedup
Language: AI | Python | 3.11 | AI inference, training, suspicious activity detection | Ecosystem, scientific computing
Language: Services | Go | 1.21 | Stream ingestion, backend API, notifications | Performance, concurrency, small binaries
Language: Frontend | TypeScript | 5.2 | Web dashboard | Type safety, React ecosystem
Web Framework | Next.js | 14 (App Router) | React SSR dashboard | Server components, streaming
UI Library | React | 18 | Component-based UI | Concurrent features, Suspense
Styling | Tailwind CSS | 3.4 | Utility-first CSS | Rapid development, consistent design
Video Player | HLS.js | 1.4 | Browser HLS playback | MSE-based, adaptive bitrate
Database | PostgreSQL | 16 | Primary database, vector storage | ACID, pgvector extension
Vector Search | pgvector | 0.5+ | HNSW index for 512-D face embeddings | Native PostgreSQL, ivfflat+hnsw
Cache/Session | Redis | 7 | Session store, pub/sub, rate limiting | Data structures, cluster mode
Message Queue | Apache Kafka | 3.6+ (MSK) | Durable event log, stream replay | Exactly-once, retention, partitions
Object Storage | MinIO | latest (RELEASE.2024) | S3-compatible hot storage | Edge + cloud, erasure coding
Cold Archive | Amazon S3 | Standard/IA/Glacier | Tiered archival (30d/90d/365d) | Cost optimization
Model Registry | MLflow | 2.8+ | Model versioning, experiment tracking | Open source, S3 artifact store
Orchestration | Apache Airflow | 2.7+ | Training pipeline DAGs | Backfill, retries, observability
Monitoring | Prometheus | 2.47+ | Metrics collection | Pull-based, K8s service discovery
Visualization | Grafana | 10.1+ | Dashboards, alerting | Panels, annotations, shared links
Log Aggregation | Grafana Loki | 2.9+ | Centralized logging | Label-based, cost-effective
CI/CD | GitHub Actions | v4 | Build, test, lint pipelines | Native GitHub integration
GitOps | ArgoCD | 2.9+ | Kubernetes continuous delivery | Declarative, drift detection
Infrastructure | Terraform | 1.6+ | IaC for AWS resources | State management, modules
Secrets | AWS Secrets Manager | - | Encrypted credential storage | Rotation, IAM integration

6.2 Hardware Requirements

Edge Gateway (Per Site)

Component | Minimum | Recommended | High Availability
--------- | ------- | ----------- | -----------------
CPU | Intel i5-1340P (12 cores) | Intel i7-1370P (14 cores) | 2x Intel i7 (HA cluster)
RAM | 16 GB DDR4-3200 | 32 GB DDR4-3200 | 32 GB per node
Storage | 1 TB NVMe SSD | 2 TB NVMe SSD | 2 TB per node + NAS sync
Network | 1 Gbps Ethernet | 2.5 Gbps Ethernet | Dual NIC + bonding
GPU (optional) | None | NVIDIA Jetson Orin NX 16GB | On-edge AI pre-filtering
Power | UPS 600VA | UPS 1000VA | Dual PSU + generator

Cloud GPU Nodes (AI Inference)

Cameras | Instance (GPU) | VRAM | Streams | Cost/month (spot)
------- | -------------- | ---- | ------- | -----------------
1-8 | g4dn.xlarge (T4) | 16 GB | 8 | ~$200-350
9-16 | g4dn.xlarge (T4) | 16 GB | 16 | ~$350-500
17-32 | g4dn.2xlarge (T4) | 16 GB | 32 | ~$600-900
33-64 | g5.2xlarge (A10G) | 24 GB | 64 | ~$1200-1800
65+ | p4d.24xlarge (A100) | 40 GB | 128 | ~$5000-8000

6.3 Software Versions Summary

Category Software Version
Operating System Ubuntu Server LTS 22.04.4
Container Runtime Docker CE 25.x
Container Orchestration Kubernetes (EKS/K3s) 1.28+
AI Serving NVIDIA Triton Inference Server 2.40
GPU Runtime CUDA 12.1+
GPU Driver NVIDIA Driver 535+
Deep Learning Optimization TensorRT 8.6+
AI Framework PyTorch 2.1+
Computer Vision OpenCV 4.8+
Video Processing FFmpeg 6.0+
Service Language Go 1.21+
AI/Training Language Python 3.11+
Frontend Framework Next.js 14
UI Library React 18
Database PostgreSQL 16
Message Queue Apache Kafka 3.6+
Cache Redis 7
Object Storage MinIO 2024+
CI/CD GitHub Actions v4
GitOps ArgoCD 2.9+
Monitoring Prometheus + Grafana 2.47+ / 10.1+
Logging Grafana Loki 2.9+
VPN WireGuard 1.0+
Model Registry MLflow 2.8+
Orchestration Apache Airflow 2.7+
Infrastructure Terraform 1.6+

6.4 Port Reference

Service | Port | Protocol | Location | Notes
------- | ---- | -------- | -------- | -----
DVR RTSP | 554 | TCP | 192.168.29.200 | Local network only
DVR HTTP | 80 | TCP | 192.168.29.200 | Admin UI, local only
DVR HTTPS | 443 | TCP | 192.168.29.200 | Admin UI, local only
DVR TCP | 25001 | TCP | 192.168.29.200 | Proprietary protocol
DVR UDP | 25002 | UDP | 192.168.29.200 | Proprietary protocol
DVR NTP | 123 | UDP | 192.168.29.200 | Time sync
WireGuard | 51820 | UDP | Cloud + Edge | VPN tunnel
Edge Admin | 8080 | TCP | 192.168.29.5 | Local admin UI
Edge SSH | 22 | TCP | 192.168.29.5 | Admin access only
Traefik HTTP | 8000 | TCP | EKS | Internal HTTP entrypoint
Traefik HTTPS | 8443 | TCP | EKS | Internal HTTPS entrypoint
ALB HTTPS | 443 | TCP | AWS | Public-facing
Backend API | 8080 | TCP | EKS pods | Internal service port
Triton HTTP | 8000 | TCP | EKS GPU nodes | Model inference HTTP
Triton gRPC | 8001 | TCP | EKS GPU nodes | Model inference gRPC
Triton Metrics | 8002 | TCP | EKS GPU nodes | Prometheus metrics
PostgreSQL | 5432 | TCP | RDS | VPC-private
Redis | 6379 | TCP | ElastiCache | VPC-private
Kafka | 9092 | TCP | MSK | VPC-private
MinIO API | 9000 | TCP | EKS + Edge | S3-compatible API
MinIO Console | 9001 | TCP | EKS + Edge | Admin console
Prometheus | 9090 | TCP | EKS | Metrics collection
Grafana | 3000 | TCP | EKS | Dashboards

Section 7: Database Schema

7.1 Schema Overview

The database is designed around a relational core (PostgreSQL 16) with pgvector extension for 512-dimensional face embedding storage and similarity search. The schema consists of 29 tables, 4 views, and 8 trigger functions, organized into 10 logical domains.

Schema Philosophy:

  • Strict normalization for reference data (cameras, persons, rules) to ensure data integrity
  • JSONB flexibility for event metadata and configuration to accommodate evolving AI outputs
  • Partitioning on all high-volume time-series tables for query performance and lifecycle management
  • pgvector HNSW indexing for sub-10ms face similarity search at scale
  • Row-level security (RLS) for multi-tenant site isolation
  • AES-256 encryption for all stored credentials (DVR passwords, API tokens)

7.2 Entity Relationship Overview

+=============================================================================+
|                    ENTITY RELATIONSHIP DIAGRAM                               |
+=============================================================================+
|                                                                              |
|   SITE (1) --------------------< (N) DVR                                     |
|    |                              |                                          |
|    |                              | (1)                                      |
|    |                              v                                          |
|    |                           CAMERA (N) <------------------< (N) ALERT_RULE|
|    |                              |                              |           |
|    |                              | (N)                            | (1)      |
|    |                              v                              v           |
|    |   +---------------------------------------------------------+           |
|    |   | EVENT (N) -->--(1) PERSON (1)--< (N) FACE_EMBEDDING               |
|    |   |   |                                                      |         |
|    |   |   | (N)                                                  | (N)     |
|    |   |   v                                                      v         |
|    |   | FACE_CROP (N)                                    PERSON_CLUSTER     |
|    |   |   |                                                                  |
|    |   |   | (N)                                                  +---------+|
|    |   |   v                                                      | Training||
|    |   | MEDIA_FILE (1) ----------------------------------------->| Dataset  ||
|    |   |                                                          |---------||
|    |   +--------------------------------------------------------->| Job      ||
|    |                                                              | Model    ||
|    |                              +---------+                     | Version  ||
|    |                              | Review  |                     +---------+|
|    |                              | Action  |                                |
|    |                              +---------+                                |
|    |                                    ^                                    |
|    |                                    | (N)                                |
|    +------------------------------------+                                    |
|   USER (N) -->--(N) ROLE_PERMISSION                                          |
|    |                                                                         |
|    | (1)                                                                     |
|    v                                                                         |
|   WATCHLIST (N) -->--(N) WATCHLIST_ENTRY                                     |
|                                                                              |
|   +---------+    +---------+    +---------+    +---------+                  |
|   | Telegram|    |WhatsApp |    | Email   |    |Webhook  |                  |
|   | Config  |    | Config  |    | Config  |    | Config  |                  |
|   +---------+    +---------+    +---------+    +---------+                  |
|        ^              ^             ^              ^                         |
|        |              |             |              |                         |
|        +--------------+-------------+--------------+                         |
|                         |                                                    |
|                   NOTIFICATION_CHANNEL                                         |
|                         |                                                    |
|                         | (1)                                                |
|                         v                                                    |
|                   NOTIFICATION_LOG                                             |
|                                                                              |
|   +---------+    +---------+    +---------+                                  |
|   | Audit   |    | System  |    | Device  |                                  |
|   | Log     |    | Health  |    | Connect.|                                  |
|   |(partitioned) |  Log    |    |  Log    |                                  |
|   +---------+    +---------+    +---------+                                  |
|                                                                              |
+=============================================================================+

7.3 Core Tables Summary

7.3.1 Site and Infrastructure Tables

Table Purpose Key Fields Rows (est.)
sites Physical locations (factories, warehouses) id, name, location, timezone, settings 1-10
dvrs DVR/NVR devices per site id, site_id, ip_address, port, username, password_encrypted, model, channels, status 1-10
cameras Individual camera channels id, dvr_id, channel_number, name, rtsp_url, resolution, fps, status, zone_config, zone_description 8-64

7.3.2 AI Detection and Identity Tables

Table Purpose Key Fields Rows (est.)
events All AI detection events (partitioned monthly) id, camera_id, event_type, timestamp, confidence, bounding_box, person_id, face_crop_id, track_id 1M-10M/month
persons Known and unknown individuals id, name, status (known/unknown/blacklisted), role, company, notes, created_at 100-10,000
face_crops Cropped face images metadata id, event_id, person_id, storage_path, bounding_box, quality_score, blur_score, pose_yaw, pose_pitch 500K-5M/month
face_embeddings 512-D face embeddings (pgvector) id, person_id, face_crop_id, embedding (vector(512)), model_version, is_primary 500K-5M
person_clusters Unknown person cluster groups id, cluster_label, representative_embedding_id, sample_count, first_seen, last_seen, status 10-1,000

7.3.3 Alert and Notification Tables

Table Purpose Key Fields Rows (est.)
alert_rules Per-camera alert configuration id, camera_id, rule_type, name, config_json, schedule, enabled 50-500
alerts Generated alert records id, camera_id, rule_id, person_id, alert_type, severity, status, message 1K-50K/month
notification_channels Alert destination endpoints id, name, channel_type, config_json, is_active 5-20
telegram_configs Telegram Bot API credentials id, channel_id, bot_token_encrypted, chat_id 1-5
whatsapp_configs WhatsApp Business API credentials id, channel_id, api_key_encrypted, phone_number_id 1-5
notification_log Delivery status per notification id, alert_id, channel_id, status, sent_at, error_message 1K-50K/month

7.3.4 Watchlist and Access Control Tables

Table Purpose Key Fields Rows (est.)
users Dashboard users and operators id, username, email, password_hash, role, is_active 5-50
roles Permission roles id, name, permissions_json 3-10
watchlists Named monitoring lists id, name, watch_type (vip/blacklist/custom), is_active 5-20
watchlist_entries Persons on watchlists id, watchlist_id, person_id, added_by, added_at 10-1,000

7.3.5 Training and ML Pipeline Tables

Table Purpose Key Fields Rows (est.)
training_datasets Curated face datasets for training id, name, description, person_ids_json, sample_count, version, status 10-100
training_jobs Model training job tracking id, dataset_id, model_version_from, model_version_to, status, metrics_json 10-100
model_versions Registry of trained model versions id, version_string, training_job_id, metrics_json, is_production, is_rollback_available 10-50
review_actions Operator review decisions id, event_id, reviewer_id, action, from_person_id, to_person_id, notes 1K-100K

7.3.6 Media and Storage Tables

Table Purpose Key Fields Rows (est.)
media_files Registry of stored video/images id, file_type, storage_path, size_bytes, checksum, camera_id, event_id, retention_until 100K-1M
video_clips Video clip metadata for incidents id, media_file_id, start_time, end_time, camera_id, event_id, duration_seconds 10K-100K

7.3.7 Audit and Monitoring Tables (Partitioned)

Table Purpose Partition Retention
audit_logs All user and system actions Monthly by timestamp 1 year (Glacier)
system_health_logs Component health metrics Monthly by timestamp 90 days
device_connectivity_logs Camera/DVR connectivity events Monthly by timestamp 90 days

7.4 Indexing Strategy

7.4.1 pgvector HNSW Index (Critical Path)

-- HNSW index for sub-10ms face similarity search
-- hnsw.ef_search (set at query time, not at index build) controls the recall/speed tradeoff (higher = more accurate, slower)
CREATE INDEX idx_face_embeddings_hnsw
ON face_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Query: Find top-K similar faces
SELECT person_id, 1 - (embedding <=> query_vector) AS similarity
FROM face_embeddings
WHERE is_primary = true
ORDER BY embedding <=> query_vector
LIMIT 5;
Parameter Value Rationale
m 16 Number of bi-directional links per node (higher = better recall, more memory)
ef_construction 128 Build-time exploration factor (higher = better index quality)
ef_search (runtime SET) 64-256 Search-time exploration factor (SET hnsw.ef_search = 128)
Distance metric Cosine distance (<=>) Optimal for L2-normalized face embeddings; similarity = 1 - distance
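
The `<=>` operator returns cosine distance, which the query above converts to a similarity with `1 - distance`. The sketch below reproduces that ranking exactly in pure Python, which can serve as a brute-force baseline when checking HNSW recall for a given ef_search value. The function names and the in-memory `rows` list are illustrative assumptions, not part of the schema.

```python
import math

def cosine_distance(a, b):
    """Cosine distance in the sense of pgvector's <=>: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k_similar(query, rows, k=5):
    """Exact top-k search by descending similarity.

    rows is a hypothetical in-memory stand-in for
    face_embeddings WHERE is_primary = true: (person_id, embedding) pairs.
    """
    scored = [(pid, 1.0 - cosine_distance(query, emb)) for pid, emb in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact) if exact else 1.0

rows = [
    ("alice", [1.0, 0.0, 0.0]),
    ("bob", [0.0, 1.0, 0.0]),
    ("carol", [0.9, 0.1, 0.0]),
]
result = top_k_similar([1.0, 0.0, 0.0], rows, k=2)
```

Comparing the person IDs returned by the indexed query against `top_k_similar` with `recall_at_k` gives a simple way to choose an ef_search value from the 64-256 range.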

7.4.2 B-Tree Indexes (Standard Queries)

Table Index Purpose
events (camera_id, timestamp DESC) Time-range queries per camera
events (event_type, timestamp DESC) Filter by event type
events (person_id) WHERE person_id IS NOT NULL Person event lookup
face_crops (person_id, quality_score DESC) Best quality face per person
alerts (status, created_at DESC) Pending alerts by age
alerts (severity, status) Critical alert dashboard
persons (status, name) Person directory with status filter
persons (created_at DESC) Recently added persons
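media_files (retention_until) Expiring media cleanup via range scan (retention_until < NOW() + INTERVAL '7 days'); a plain index is used because a partial-index predicate must be immutable and cannot call NOW()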

7.5 Partitioning Strategy

All high-volume time-series tables are partitioned monthly using pg_partman for automated partition management.

+-----------------------------------------------------------------------------+
|                    PARTITIONING ARCHITECTURE                                 |
+-----------------------------------------------------------------------------+
|                                                                              |
|   events (parent, empty)                                                     |
|   +-- events_y2024m01   (Jan 2024 data)                                     |
|   +-- events_y2024m02   (Feb 2024 data)                                     |
|   +-- events_y2024m03   (Mar 2024 data)                                     |
|   +-- events_y2024m04   (Apr 2024 data)                                     |
|   +-- events_y2024m05   (May 2024 data)  <-- Hot (in memory)               |
|   +-- events_default    (fallback)                                          |
|                                                                              |
|   Partition pruning: WHERE timestamp >= '2024-05-01'                        |
|                      -> Only scans events_y2024m05                           |
|                      -> ~30x faster for time-range queries                  |
|                                                                              |
|   Managed by: pg_partman extension                                          |
|   - Auto-create: 2 months ahead                                             |
|   - Auto-drop: After retention period (detach + archive)                    |
|                                                                              |
+-----------------------------------------------------------------------------+

Partitioned Tables:

Table Partition Key Partition Type Retention
events timestamp Monthly RANGE 90 days hot, 1 year archive
audit_logs timestamp Monthly RANGE 1 year total
system_health_logs timestamp Monthly RANGE 90 days
device_connectivity_logs timestamp Monthly RANGE 90 days
face_crops created_at Monthly RANGE 90 days hot, 1 year archive
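
As a rough illustration of how a timestamp maps to a monthly RANGE partition, the sketch below derives the partition name and its half-open bounds. It follows the `events_y2024m05` naming shown in the diagram. pg_partman may use a different suffix convention by default, and the helper name `month_partition` is an assumption made for this example.

```python
from datetime import date, datetime

def month_partition(table, ts):
    """Return (partition_name, lower_bound, upper_bound) for a timestamp.

    Bounds are half-open: lower <= ts < upper, as in
    PARTITION ... FOR VALUES FROM (lower) TO (upper).
    """
    lower = date(ts.year, ts.month, 1)
    if ts.month == 12:
        upper = date(ts.year + 1, 1, 1)  # roll over to January of next year
    else:
        upper = date(ts.year, ts.month + 1, 1)
    name = f"{table}_y{ts.year:04d}m{ts.month:02d}"
    return name, lower, upper

may_partition = month_partition("events", datetime(2024, 5, 17, 3, 0))
december_partition = month_partition("events", datetime(2024, 12, 31, 23, 59))
```

A query constrained to the returned bounds touches only that one partition, which is the pruning behaviour the diagram describes.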

7.6 Retention Policies

Data Tier Storage Duration Lifecycle
Hot Tier PostgreSQL + MinIO 0-30 days Fast query, indexed, in-memory cache
Warm Tier S3 Standard 30-90 days Available on-demand, still indexed
Cold Tier S3 Infrequent Access 90-365 days Retrieval within minutes
Archive Tier Glacier Deep Archive 1-7 years Retrieval within 12-48 hours
Compliance Glacier Vault Lock 7+ years Immutable, legal hold
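
The tier boundaries above can be expressed as a small lookup, sketched below. The table does not say which tier owns a boundary day, so this sketch assumes each lower bound is inclusive (day 30 is already Warm) and treats a year as 365 days. The function name and tier labels are illustrative.

```python
def storage_tier(age_days):
    """Map the age of a record or media object (in days) to its retention tier."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 30:
        return "hot"          # PostgreSQL + MinIO
    if age_days < 90:
        return "warm"         # S3 Standard
    if age_days < 365:
        return "cold"         # S3 Infrequent Access
    if age_days < 7 * 365:
        return "archive"      # Glacier Deep Archive
    return "compliance"       # Glacier Vault Lock

tiers = [storage_tier(d) for d in (0, 30, 89, 90, 365, 7 * 365)]
```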

Automated Cleanup:

Task Frequency Mechanism
Expire old event partitions Daily (pg_partman) DETACH PARTITION + S3 upload
Delete expired media files Daily Cron job: DELETE from media_files + MinIO removal
Purge old notification logs Weekly DELETE WHERE created_at < NOW() - INTERVAL '90 days'
Archive face crops to S3 Daily Lambda: copy to S3 IA, update storage_path
Compress audit logs Monthly pglz/zstd compression on detached partitions
Vacuum and analyze Weekly (auto-vacuum) PostgreSQL autovacuum daemon

7.7 Security Considerations

7.7.1 Credential Encryption

All sensitive credentials stored with AES-256 encryption:

Table Encrypted Field Encryption
dvrs password_encrypted AES-256-CBC, key from AWS Secrets Manager
telegram_configs bot_token_encrypted AES-256-CBC
whatsapp_configs api_key_encrypted AES-256-CBC

7.7.2 Row-Level Security (RLS)

For multi-site deployments, RLS policies enforce that users only see data for sites they have access to:

-- Enable RLS on critical tables
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
ALTER TABLE alerts ENABLE ROW LEVEL SECURITY;

-- Policy: Users see only data from their assigned sites
CREATE POLICY site_isolation_events ON events
    USING (camera_id IN (
        SELECT c.id FROM cameras c
        JOIN dvrs d ON c.dvr_id = d.id
        JOIN site_users su ON d.site_id = su.site_id
        WHERE su.user_id = current_setting('app.current_user_id')::UUID
    ));

7.7.3 Access Control

Role Permissions
super_admin Full access to all sites, all operations
site_admin Full access to assigned sites, user management
operator View dashboards, acknowledge alerts, review persons
viewer Read-only access to dashboards and events

7.7.4 Audit Trail

The audit_logs table (partitioned monthly) captures every significant action:

Action Captured Data
login User, IP, timestamp, MFA status, success/failure
person_create Creator, name, initial status, source event
person_update Updater, changed fields, old/new values
alert_acknowledge Acknowledger, alert ID, timestamp
alert_resolve Resolver, resolution notes
training_approve Approver, model version, dataset version
model_deploy Deployer, version, A/B split percentage
config_change Changer, changed parameters, old/new values

7.7.5 Backup Strategy

Component Method Frequency Retention
PostgreSQL RDS automated backups Daily 35 days
PostgreSQL Manual snapshots Before any schema change 90 days
MinIO/S3 Cross-region replication Continuous 90 days in DR region
Face embeddings pg_dump + vector export Weekly 90 days
Model artifacts MLflow artifact store On training completion Indefinite

Reference: For complete DDL including all CREATE TABLE statements, triggers, views, and functions, see database_schema.md — Sections 2 through 15 contain the full schema definition with comments and constraints.


Section 8: AI Model and Training Strategy

8.1 AI Model Selection

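The inference pipeline is built on three core deep learning models (human detection, face detection, and face recognition), all optimized with TensorRT and served from a single NVIDIA T4 GPU with dynamic batching. Tracking (ByteTrack) and unknown-person clustering (HDBSCAN) run on the CPU, and two auxiliary pose and object models support suspicious activity detection.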

Component Model Framework Input Size FPS (T4) Accuracy
Human Detection YOLO11m (Ultralytics) PyTorch -> ONNX -> TensorRT FP16 640 x 640 213 mAP@50: 80.5% (COCO)
Face Detection SCRFD-500M-BNKPS (InsightFace) PyTorch -> ONNX -> TensorRT FP16 640 x 640 ~400 AP_medium: 87.2% (WIDERFace)
Face Recognition ArcFace R100 (IR-SE100) PyTorch -> ONNX -> TensorRT FP16 112 x 112 ~800 99.83% (LFW), 98.35% (MegaFace)
Person Tracking ByteTrack Native Python + NumPy N/A N/A 80.3% MOTA (MOT17)
Unknown Clustering HDBSCAN + DBSCAN fallback scikit-learn 512-D vectors N/A 89.5% purity, 0.855 BCubed F
Fall Detection YOLOv8n-pose TensorRT FP16 640 x 640 ~300 Part of suspicious activity
Object Detection YOLOv8s TensorRT FP16 640 x 640 ~450 Abandoned object detection

8.1.1 Human Detection: YOLO11m

Property Value
Architecture CSPDarknet backbone + PANet neck + Decoupled head
Parameters 20.1 M
FLOPs 68.0 B (at 640x640)
TensorRT Optimization FP16, dynamic batch (1-16), layer fusion
GPU Memory ~2.1 GB at batch=8
Person class priority Highest NMS score weighting for person class
Preprocessing Letterbox resize to 640x640, normalize [0,1]
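
To make the letterbox step concrete, the sketch below computes the resize scale and padding for an arbitrary source frame, and maps a detection box from model space back to source pixels. It is a generic formulation and may differ by a pixel from the rounding used inside Ultralytics. The function names are assumptions for illustration.

```python
def letterbox_geometry(src_w, src_h, dst=640):
    """Scale and padding needed to fit a src_w x src_h frame into dst x dst
    while preserving aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = dst - new_w, dst - new_h
    left, top = pad_w // 2, pad_h // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad": (left, top, pad_w - left, pad_h - top),  # left, top, right, bottom
    }

def unletterbox_box(box, geom):
    """Map an (x1, y1, x2, y2) box from model space back to source pixels."""
    left, top = geom["pad"][0], geom["pad"][1]
    s = geom["scale"]
    x1, y1, x2, y2 = box
    return ((x1 - left) / s, (y1 - top) / s, (x2 - left) / s, (y2 - top) / s)

geom = letterbox_geometry(960, 1080)
src_box = unletterbox_box((35.0, 0.0, 604.0, 640.0), geom)
```

For the 960 x 1080 channels listed in the document header, this gives a 569 x 640 resized image with 35 and 36 pixels of padding on the left and right.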

Export pipeline:

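# PyTorch -> ONNX -> TensorRT engine
# dynamic=True exports a dynamic batch axis, which the min/opt/max shape profile below requires.
# FP16 conversion is performed by trtexec (--fp16), so the ONNX export itself stays in FP32.
yolo export model=yolo11m.pt format=onnx imgsz=640 dynamic=True opset=17 simplify=True
trtexec --onnx=yolo11m.onnx --saveEngine=yolo11m.engine --fp16 \
  --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:16x3x640x640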

8.1.2 Face Detection: SCRFD-500M-BNKPS

Property Value
Architecture Single-stage detector with FPN, BN+KPS head
Parameters ~0.57 M (the "500M" in the model name refers to ~500 MFLOPs of compute; this is the lightest SCRFD variant)
Detects Face bounding box + 5 facial landmarks
Minimum face size 20 x 20 pixels (configurable)
NMS threshold 0.45 (IoU)
Confidence threshold 0.5 (minimum detection score)
GPU Memory ~1.8 GB at batch=32

8.1.3 Face Recognition: ArcFace R100 (IR-SE100)

Property Value
Backbone IR-SE100 (Improved ResNet-100 with SE blocks)
Training data MS1MV3 (5.8M images, 85K identities)
Loss function ArcFace additive angular margin (m=0.5)
Embedding dimension 512 (float32, L2-normalized)
Distance metric Cosine similarity (1 - cosine_distance)
Matching threshold (strict) 0.60
Matching threshold (balanced) 0.45
Matching threshold (relaxed) 0.30
GPU Memory ~3.2 GB at batch=64

Published benchmarks on standard datasets:

Dataset Accuracy Notes
LFW (Labeled Faces in the Wild) 99.83% Unconstrained face verification
CFP-FP (Frontal-Profile) 99.17% Cross-pose evaluation
AgeDB-30 98.28% Age-invariant recognition
MegaFace (1M distractors) 98.35% Large-scale recognition
IJB-C 96.18% (TAR@FAR=1e-4) Template-based verification

8.2 Inference Pipeline Architecture

+=============================================================================+
|                    REAL-TIME INFERENCE PIPELINE                              |
+=============================================================================+
|                                                                              |
|  INPUT: RTSP Frame (640x640, 1 fps per stream)                              |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Frame Preprocessor| -> | YOLO11m Detector  | -> | Person Detection  |    |
|  | - Resize          |    | (TensorRT FP16)   |    | Results:          |    |
|  | - Normalize       |    | GPU: 12ms (P50)   |    | - bbox (x1,y1,x2, |    |
|  | - NCHW layout     |    | Batch: 1-16       |    |   y2)             |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     | - class (person)  |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Crop Extract | <- | SCRFD-500M        | <- | Face Detection    |    |
|  | (ROI from person  |    | (TensorRT FP16)   |    | Results:          |    |
|  |  bounding box)    |    | GPU: 8ms (P50)    |    | - face bbox       |    |
|  |                   |    | Batch: per-face   |    | - 5 landmarks     |    |
|  +-------------------+    +-------------------+    | - confidence      |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Alignment    | <- | ArcFace R100      | <- | Embedding Vector  |    |
|  | (5-point affine   |    | (TensorRT FP16)   |    | 512-D float32,   |    |
|  |  transform to     |    | GPU: 5ms (P50)    |    | L2-normalized     |    |
|  |  112x112)         |    | Batch: 1-64       |    |                   |    |
|  +-------------------+    +-------------------+    +---------+---------+    |
|                                                               |              |
|                                                               v              |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Face Matching     | <- | Person Tracking   | <- | Track-to-Person   |    |
|  | (cosine similarity|    | (ByteTrack)       |    | Association       |    |
|  |  vs. known DB)    |    | CPU: 2ms/frame    |    | - Match embedding |    |
|  +-------------------+    +-------------------+    |   to known persons  |    |
|       |  |  |                                      | - Create/update     |    |
|       |  |  |                                      |   track             |    |
|       v  v  v                                      +-------------------+    |
|  +-------------------+                                                        |
|  | Confidence Scorer |                                                        |
|  | (aggregate score  |                                                        |
|  |  for all detect)  |                                                        |
|  +-------------------+                                                        |
|       |                                                                       |
|       v                                                                       |
|  OUTPUT: DetectionEvent (JSON)                                               |
|  { person_id, track_id, confidence, bbox, face_crop,                         |
|    embedding, recognized_name?, quality_scores }                             |
|                                                                              |
+=============================================================================+

End-to-end latency budget per frame:

Stage GPU CPU Fallback
Frame preprocessing 2-5 ms 5-10 ms
YOLO11m detection 12 ms (P50) 35-56 ms (ONNX+OpenVINO)
SCRFD face detection 8 ms (P50) 15-25 ms
ArcFace embedding (per face) 5 ms (P50) 12-18 ms
ByteTrack tracking 2 ms 2-5 ms
Post-processing 5-10 ms 10-20 ms
Total (no face) ~29 ms ~67-116 ms
Total (1 face) ~34 ms ~79-134 ms
Total (5 faces) ~54 ms ~127-206 ms
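
The totals can be reproduced by summing the per-stage figures, with the ArcFace cost multiplied by the number of faces in the frame. The sketch below does this for both columns and returns a (lower, upper) bound in milliseconds. The dictionary layout and function name are assumptions for illustration. The stage values are taken from the table, with single P50 figures treated as both bounds.

```python
# (low, high) milliseconds per stage, copied from the latency table
GPU_MS = {
    "preprocess": (2, 5),
    "yolo": (12, 12),
    "scrfd": (8, 8),
    "arcface_per_face": (5, 5),
    "bytetrack": (2, 2),
    "postprocess": (5, 10),
}
CPU_MS = {
    "preprocess": (5, 10),
    "yolo": (35, 56),
    "scrfd": (15, 25),
    "arcface_per_face": (12, 18),
    "bytetrack": (2, 5),
    "postprocess": (10, 20),
}

def frame_latency_ms(stages, faces):
    """Sum stage latencies; the per-face stage is counted once per face."""
    low = high = 0
    for name, (stage_low, stage_high) in stages.items():
        repeats = faces if name == "arcface_per_face" else 1
        low += repeats * stage_low
        high += repeats * stage_high
    return low, high

gpu_five_faces = frame_latency_ms(GPU_MS, 5)
cpu_one_face = frame_latency_ms(CPU_MS, 1)
```

The GPU totals in the table correspond to the lower bound returned here (for example, 54 ms for five faces); the upper bound adds the slack in preprocessing and post-processing.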

8.3 Face Recognition Matching Strategy

8.3.1 Known Person Matching

+-----------------------------------------------------------------------------+
|                    FACE RECOGNITION MATCHING FLOW                            |
+-----------------------------------------------------------------------------+
|                                                                              |
|  New Face Embedding (512-D)                                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | L2 Normalize      |  embedding = embedding / ||embedding||_2              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | pgvector HNSW     | -> | Top-5 Candidates  |                              |
|  | Similarity Search |    | (cosine distance) |                              |
|  | ef_search=128     |    +-------------------+                              |
|  +-------------------+            |                                          |
|                                   v                                          |
|  +-------------------+    +-------------------+                              |
|  | Threshold Check   | <- | Best Match Score  |                              |
|  | (per AI Vibe)     |    +-------------------+                              |
|  +-------------------+            |                                          |
|       |                          |                                          |
|       +------------+-------------+                                          |
|                    |                                                        |
|         +----------+----------+                                             |
|         |                     |                                             |
|         v                     v                                             |
|    Above threshold      Below threshold                                     |
|    (Recognized)         (Unknown)                                           |
|         |                     |                                             |
|         v                     v                                             |
|  +------------+       +------------------+                                 |
|  | Assign to  |       | Check against    |                                 |
|  | known      |       | recent unknown   |                                 |
|  | person_id  |       | embeddings       |                                 |
|  | (with      |       | (5-min window)   |                                 |
|  | confidence)|       +--------+---------+                                 |
|  +------------+                |                                            |
|                                |                                            |
|                       +--------+--------+                                   |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Similar unknown    No similar unknown                      |
|                  (same person)      (new unknown)                           |
|                       |                 |                                   |
|                       v                 v                                   |
|                  Reuse person_id   Create new                              |
|                  Update centroid   unknown person                           |
|                                    record                                   |
|                                                                              |
+-----------------------------------------------------------------------------+
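
The decision flow above can be sketched as a single function. This is a minimal illustration under stated assumptions: the top-5 candidates arrive as (person_id, similarity) pairs, "above threshold" is treated as greater-than-or-equal, the same threshold is reused when comparing against recent unknowns (the source does not specify one), and the `UnknownTrack` record and `unknown-N` ID scheme are hypothetical.

```python
import math
from dataclasses import dataclass

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

@dataclass
class UnknownTrack:
    person_id: str
    centroid: list
    count: int
    last_seen: float  # seconds since epoch

def resolve_identity(embedding, candidates, recent_unknowns, now,
                     threshold=0.45, window_s=300.0):
    """Return (outcome, person_id, similarity) for one face embedding."""
    # 1. Known-person match: best of the top-K candidates against the threshold
    if candidates:
        best_id, best_sim = max(candidates, key=lambda c: c[1])
        if best_sim >= threshold:
            return ("recognized", best_id, best_sim)
    # 2. Reuse a recent unknown seen within the 5-minute window
    for unk in recent_unknowns:
        in_window = now - unk.last_seen <= window_s
        if in_window and cosine_similarity(embedding, unk.centroid) >= threshold:
            unk.count += 1
            # Running-mean centroid update
            unk.centroid = [c + (e - c) / unk.count
                            for c, e in zip(unk.centroid, embedding)]
            unk.last_seen = now
            return ("unknown_existing", unk.person_id, None)
    # 3. Otherwise create a new unknown person record
    new_id = f"unknown-{len(recent_unknowns) + 1}"
    recent_unknowns.append(UnknownTrack(new_id, list(embedding), 1, now))
    return ("unknown_new", new_id, None)

unknowns = []
r1 = resolve_identity([1.0, 0.0], [("p1", 0.72)], unknowns, now=0.0)
r2 = resolve_identity([0.0, 1.0], [("p1", 0.10)], unknowns, now=10.0)
r3 = resolve_identity([0.1, 1.0], [("p1", 0.12)], unknowns, now=60.0)
r4 = resolve_identity([0.0, 1.0], [], unknowns, now=1000.0)
```

In the example, the third face reuses the unknown created by the second, while the fourth arrives after the window has expired and starts a new record.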

8.3.2 AI Vibe Threshold Mapping

The AI Vibe system maps three intuitive presets to internal confidence thresholds:

Vibe Face Match Threshold Detection Confidence Use Case
Relaxed 0.30 cosine similarity 0.40 minimum Known persons re-identified more easily; more false positives acceptable
Balanced 0.45 cosine similarity 0.55 minimum Default; good precision-recall tradeoff
Strict 0.60 cosine similarity 0.70 minimum High-security scenarios; minimize false positives

Per-stream Vibe Selection:

  • Vibe can be set per camera via dashboard
  • Night mode automatically applies Strict vibe
  • Alert-triggered cameras automatically upgrade to Strict for 5 minutes
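
A minimal sketch of how the effective thresholds for a camera might be resolved from these rules. The threshold values come from the table above. The function signature, the `Thresholds` record, and the use of epoch seconds for the last alert time are assumptions made for illustration.

```python
from dataclasses import dataclass

# vibe -> (face match threshold, minimum detection confidence)
VIBES = {
    "relaxed": (0.30, 0.40),
    "balanced": (0.45, 0.55),
    "strict": (0.60, 0.70),
}
ALERT_UPGRADE_SECONDS = 5 * 60

@dataclass(frozen=True)
class Thresholds:
    vibe: str
    face_match: float
    detection_confidence: float

def effective_thresholds(camera_vibe, night_mode, last_alert_ts, now):
    """Apply the night-mode and alert-upgrade overrides to a camera's vibe."""
    vibe = camera_vibe
    recently_alerted = (
        last_alert_ts is not None
        and 0 <= now - last_alert_ts < ALERT_UPGRADE_SECONDS
    )
    if night_mode or recently_alerted:
        vibe = "strict"
    face_match, detection_confidence = VIBES[vibe]
    return Thresholds(vibe, face_match, detection_confidence)

daytime = effective_thresholds("relaxed", False, None, now=1000.0)
night = effective_thresholds("relaxed", True, None, now=1000.0)
after_alert = effective_thresholds("balanced", False, 900.0, now=1000.0)
expired = effective_thresholds("balanced", False, 600.0, now=1000.0)
```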

8.4 Unknown Person Clustering Approach

Unknown persons (faces that don't match any known person above threshold) are automatically clustered to help operators identify recurring visitors.

8.4.1 Clustering Pipeline

+-----------------------------------------------------------------------------+
|                    UNKNOWN PERSON CLUSTERING PIPELINE                        |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Unknown Face Embeddings (streaming)                                         |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+                                                       |
|  | Sliding Window    |  Keep last N embeddings in memory (configurable)     |
|  | Buffer (500)      |  + persistent storage for long-term clustering       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+                              |
|  | HDBSCAN Clustering| -> | Primary clusters  |  min_cluster_size=5        |
|  | (density-based)   |    | formed             |  min_samples=2             |
|  | metric=cosine     |    +-------------------+  eps=auto                   |
|  +-------------------+            |                                          |
|       | (fallback)                |                                          |
|       v                           v                                          |
|  +-------------------+    +-------------------+                              |
|  | DBSCAN Fallback   |    | Merge with        |  Check: temporal gap       |
|  | (if HDBSCAN fails |    | existing clusters |  < 30 days, cosine sim     |
|  |  to find structure|    | - centroid        |  > 0.85                    |
|  +-------------------+    |   distance        |                            |
|                           +-------------------+                            |
|                                   |                                          |
|                                   v                                          |
|                           +-------------------+                              |
|                           | Operator Review   |  Dashboard shows clusters   |
|                           | Queue             |  pending identification     |
|                           +-------------------+                              |
|                                                                              |
+-----------------------------------------------------------------------------+

8.4.2 Clustering Parameters

Parameter Value Description
Algorithm HDBSCAN (primary), DBSCAN (fallback) Density-based for irregular cluster shapes
Distance metric Cosine distance (1 - cosine similarity) Optimal for L2-normalized face embeddings
Minimum cluster size 5 embeddings Minimum to form a cluster
Minimum samples 2 Core point density threshold
Merge threshold 0.85 cosine similarity Merge clusters if centroids are close
Temporal window 30 days Maximum gap between cluster appearances
Review trigger 10+ embeddings Send to operator review queue
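
The merge rule (centroid similarity above 0.85 and a temporal gap under 30 days) can be sketched as a single predicate. The function and argument names below are illustrative assumptions; only the two thresholds come from the table.

```python
import math
from datetime import datetime, timedelta

MERGE_SIMILARITY = 0.85
MAX_TEMPORAL_GAP = timedelta(days=30)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def should_merge(new_centroid, new_first_seen, existing_centroid, existing_last_seen):
    """True when a freshly formed cluster should be folded into an existing one."""
    gap = abs(new_first_seen - existing_last_seen)
    close_in_time = gap < MAX_TEMPORAL_GAP
    close_in_space = cosine_similarity(new_centroid, existing_centroid) > MERGE_SIMILARITY
    return close_in_time and close_in_space

seen = datetime(2025, 3, 1)
merge_ok = should_merge([1.0, 0.0], seen, [0.99, 0.05], seen - timedelta(days=10))
too_old = should_merge([1.0, 0.0], seen, [0.99, 0.05], seen - timedelta(days=31))
too_far = should_merge([1.0, 0.0], seen, [0.5, 0.87], seen - timedelta(days=1))
```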

8.4.3 Clustering Quality Targets

Metric Target Measurement
Cluster Purity > 89% % of embeddings in a cluster belonging to the same person
BCubed F-Measure > 0.85 Harmonic mean of precision and recall for clustering
Silhouette Score > 0.3 Separation quality between clusters
False Merge Rate < 5% Different persons incorrectly merged
Split Rate < 15% Same person split into multiple clusters
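
The first two targets can be measured on a labelled validation set. The sketch below uses the standard definitions of cluster purity and BCubed F-measure; it assumes ground-truth person labels are available for each embedding, and the function names are illustrative.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_ids, true_ids):
    """Share of embeddings that belong to the majority person of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, person in zip(cluster_ids, true_ids):
        by_cluster[cluster][person] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)

def bcubed_f(cluster_ids, true_ids):
    """Harmonic mean of per-item BCubed precision and recall."""
    pairs = list(zip(cluster_ids, true_ids))
    n = len(pairs)
    cluster_sizes = Counter(cluster_ids)
    person_sizes = Counter(true_ids)
    pair_sizes = Counter(pairs)
    # Precision: fraction of an item's cluster that shares its true identity
    precision = sum(pair_sizes[p] / cluster_sizes[p[0]] for p in pairs) / n
    # Recall: fraction of an item's true identity found in its cluster
    recall = sum(pair_sizes[p] / person_sizes[p[1]] for p in pairs) / n
    return 2 * precision * recall / (precision + recall)

clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
purity = cluster_purity(clusters, truth)
f_score = bcubed_f(clusters, truth)
```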

8.5 Confidence Handling

8.5.1 Confidence Score Computation

Each detection event carries an aggregate confidence score computed from multiple signals:

confidence_aggregate = weighted_average(
    detection_confidence:    0.35 * yolo_confidence,
    face_detection_quality:  0.25 * scrfd_confidence,
    face_recognition_score:  0.25 * (1 - cosine_distance_to_match),
    face_quality_score:      0.15 * quality_composite
)

Where quality_composite = average(
    1.0 - blur_score,       # Sharpness (higher is better)
    1.0 - abs(pose_yaw)/90, # Frontal preference
    illumination_score,      # Well-lit face
    resolution_adequacy      # Sufficient pixels for face
)

8.5.2 Confidence Levels

Level Score Range Color Action
High Confidence 0.80 - 1.00 Green Auto-accept, no review needed
Medium Confidence 0.60 - 0.79 Yellow Accepted, flagged for periodic review
Low Confidence 0.40 - 0.59 Orange Requires operator review within 24h
Very Low Confidence 0.00 - 0.39 Red Rejected, not used for training
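
The table leaves scores such as 0.795 between two ranges. One possible reading, sketched below, treats each lower bound as inclusive; the level identifiers and the boundary handling are assumptions.

```python
def confidence_level(score):
    """Map an aggregate score to (level, action) using inclusive lower bounds."""
    if score >= 0.80:
        return "HIGH", "auto-accept"
    if score >= 0.60:
        return "MEDIUM", "accept, flag for periodic review"
    if score >= 0.40:
        return "LOW", "operator review within 24h"
    return "VERY_LOW", "reject, exclude from training"


for s in (0.837, 0.795, 0.40, 0.39):
    print(s, confidence_level(s))
```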

8.6 Training Workflow Overview

The safe self-learning system captures operator feedback and converts it into model improvements through a carefully controlled pipeline.

8.6.1 Three Learning Modes

Mode Description Use Case Risk Level
Manual Only Operator explicitly triggers training runs Highly regulated environments Lowest
Suggested Learning (Recommended) System suggests training candidates; operator approves Standard production deployment Low
Approved Auto-Update Auto-training triggers after admin approval threshold Mature deployment with trusted operators Medium

8.6.2 Training Pipeline Architecture

+=============================================================================+
|                    SAFE SELF-LEARNING PIPELINE                               |
+=============================================================================+
|                                                                              |
|  STEP 1: COLLECTION                                                          |
|  +-------------------+                                                       |
|  | Operator Review   |  confirm, correct_name, merge, reject                |
|  | Actions           |  + automatic high-confidence acceptances              |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 2: CONFLICT DETECTION (Synchronous, blocks immediately)               |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Label Conflict    | -> | If conflict found | -> | Block from training |   |
|  | Detector          |    | (5 types)         |    | dataset, alert admin |   |
|  | - Same face, diff |    +-------------------+    +-------------------+    |
|  |   names           |                                                       |
|  | - Diff faces, same|                                                       |
|  |   name            |                                                       |
|  | - Merge circular  |                                                       |
|  |   reference       |                                                       |
|  | - Name to already-|                                                       |
|  |   deleted person  |                                                       |
|  | - Quality below   |                                                       |
|  |   threshold       |                                                       |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 3: DATASET CURATION                                                    |
|  +-------------------+                                                       |
|  | Training Dataset  |  - Collect approved examples                         |
|  | Builder           |  - Balance classes (min 5 per person)                |
|  |                   |  - Augmentation (flip, rotate, brightness)           |
|  |                   |  - Quality filter (blur, pose, illumination)         |
|  |                   |  - Train/val split (80/20)                            |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 4: MODEL TRAINING                                                      |
|  +-------------------+                                                       |
|  | Training Job      |  - ArcFace R100 backbone                              |
|  | (Airflow DAG)     |  - Fine-tuning on curated dataset                     |
|  |                   |  - Cosine annealing LR schedule                        |
|  |                   |  - Early stopping (patience=10)                       |
|  |                   |  - Mixed precision (AMP)                              |
|  |                   |  - Typical duration: 2-8 hours on V100                |
|  +-------------------+                                                       |
|       |                                                                      |
|       v                                                                      |
|  STEP 5: QUALITY GATES                                                       |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Gate 1: Hold-out  | -> | Gate 2: Compare   | -> | Gate 3: Identity  |    |
|  |    evaluation     |    |    vs current     |    |    accuracy       |    |
|  |    (precision,    |    |    production     |    |    (100% known)   |    |
|  |     recall, f1)   |    |    (no >2% regress)|   |                   |    |
|  +-------------------+    +-------------------+    +-------------------+    |
|       |                          |                          |                |
|       +------------+-------------+--------------------------+                |
|                    |                                                          |
|         +----------+----------+                                              |
|         |                     |                                              |
|         v                     v                                              |
|     ALL PASSED            ANY FAILED                                       |
|         |                     |                                              |
|         v                     v                                              |
|  +------------+       +------------------+                                 |
|  | Proceed to |       | REJECT           |                                 |
|  | Deployment |       | - Log failure    |                                 |
|  +------------+       | - Alert admin    |                                 |
|                       | - Keep in staging|                                 |
|                       +------------------+                                 |
|                                                                              |
|  STEP 6: DEPLOYMENT                                                          |
|  +-------------------+                                                       |
|  | A/B Testing       |  - Shadow mode: 0% traffic (validation)              |
|  | (gradual rollout) |  - Canary: 5% traffic for 24h                        |
|  |                   |  - Monitor: latency, error rate, FP rate              |
|  |                   |  - Full rollout: 100% traffic                         |
|  |                   |  - Rollback: < 60 seconds to previous version         |
|  +-------------------+                                                       |
|                                                                              |
+=============================================================================+

8.7 Model Versioning and Rollback

8.7.1 Semantic Versioning

Version Component Increment When Example
MAJOR (X.0.0) Full retraining, architecture change, breaking embedding change 1.0.0 -> 2.0.0 (new backbone)
MINOR (x.Y.0) Fine-tuning, significant new data (>50 new identities) 1.0.0 -> 1.1.0 (new employees)
PATCH (x.y.Z) Incremental update, centroid update, hotfix 1.0.0 -> 1.0.1 (new photos added)

8.7.2 Version States

State Description Transition
TRAINING Model is being trained Auto -> STAGING on completion
STAGING Awaiting quality gate evaluation Auto -> AWAITING_APPROVAL on pass
AWAITING_APPROVAL Pending admin approval Manual -> CANARY on approve
CANARY 5% traffic, monitoring Auto -> PRODUCTION on success (24h)
PRODUCTION 100% traffic, active serving Manual -> ARCHIVED on new version deploy
ARCHIVED Kept for rollback, no traffic Auto -> ROLLBACK_AVAILABLE after 30 days
ROLLBACK_AVAILABLE Can be rolled back to Manual -> PRODUCTION on rollback trigger
DEPRECATED Cannot be rolled back to Final state
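
The state table can be encoded as a small transition map that rejects illegal moves. This is a sketch only: the table does not say how a version reaches DEPRECATED, so that state has no inbound edge here, and the `advance` helper is a hypothetical name.

```python
# Allowed transitions copied from the table above.
TRANSITIONS = {
    "TRAINING": {"STAGING"},
    "STAGING": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"CANARY"},
    "CANARY": {"PRODUCTION"},
    "PRODUCTION": {"ARCHIVED"},
    "ARCHIVED": {"ROLLBACK_AVAILABLE"},
    "ROLLBACK_AVAILABLE": {"PRODUCTION"},
    "DEPRECATED": set(),  # final state: no outgoing transitions
}


def advance(current, target):
    """Return the new state, or raise if the table does not allow the move."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


state = "TRAINING"
for nxt in ("STAGING", "AWAITING_APPROVAL", "CANARY", "PRODUCTION"):
    state = advance(state, nxt)
print(state)
```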

8.7.3 Rollback Procedure

+-----------------------------------------------------------------------------+
|                    EMERGENCY ROLLBACK PROCEDURE                              |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Trigger: Admin initiates rollback or automatic rollback on failure         |
|                                                                              |
|  Step 1: Validate target version exists and is in ROLLBACK_AVAILABLE state  |
|  Step 2: Load target model artifacts from S3/MinIO (pre-warm GPU)          |
|  Step 3: Atomic switch: update model reference in Triton config             |
|  Step 4: Triton SIGHUP reload (zero-downtime model swap)                   |
|  Step 5: Validate: send test inference requests, check latency              |
|  Step 6: If validation fails -> auto-revert to previous production          |
|  Step 7: If validation passes -> update database model version records      |
|  Step 8: Log rollback event in audit_logs                                   |
|                                                                              |
|  Maximum rollback time: < 60 seconds                                        |
|  Zero inference downtime during rollback                                    |
|                                                                              |
+-----------------------------------------------------------------------------+

8.8 Quality Gates

8.8.1 Gate Thresholds

Gate | Metric | Minimum | Maximum | Critical
Hold-out Evaluation | Precision | 0.97 | - | Yes (cannot override)
Hold-out Evaluation | Recall | 0.95 | - | Yes
Hold-out Evaluation | F1 Score | 0.96 | - | Yes
No Regression | Metric regression vs production | - | 2% | No (admin can override)
Identity Accuracy | Known identity recall | 100% | - | Yes
Latency | P99 inference latency | - | 150 ms | Yes
Confusion Analysis | False positive rate | - | 5% | No
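
A minimal sketch of how these thresholds might be checked, reading precision, recall, F1, and known-identity recall as lower bounds and regression, P99 latency, and false positive rate as upper bounds. The metric keys `known_identity_recall` and `false_positive_rate` are hypothetical, and letting an admin override any non-critical gate is an assumption.

```python
# (metric key, bound kind, limit, critical)
GATES = [
    ("precision", "min", 0.97, True),
    ("recall", "min", 0.95, True),
    ("f1_score", "min", 0.96, True),
    ("max_regression_pct", "max", 2.0, False),
    ("known_identity_recall", "min", 1.0, True),
    ("p99_latency_ms", "max", 150.0, True),
    ("false_positive_rate", "max", 0.05, False),
]


def evaluate_gates(metrics, overrides=frozenset()):
    """Return (overall_result, failed_metric_names)."""
    failures = []
    for name, kind, limit, critical in GATES:
        value = metrics[name]
        passed = value >= limit if kind == "min" else value <= limit
        if passed:
            continue
        if not critical and name in overrides:
            continue  # overrides never apply to critical gates
        failures.append(name)
    return ("PASSED" if not failures else "FAILED"), failures


candidate = {
    "precision": 0.9842, "recall": 0.9678, "f1_score": 0.9759,
    "max_regression_pct": 0.8, "known_identity_recall": 1.0,
    "p99_latency_ms": 128, "false_positive_rate": 0.03,
}
print(evaluate_gates(candidate))
```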

8.8.2 Quality Gate Report Example

{
  "gate_run_id": "550e8400-e29b-41d4-a716-446655440000",
  "candidate_model_version": "1.2.0",
  "baseline_model_version": "1.1.0",
  "timestamp": "2024-01-25T10:30:00Z",
  "overall_result": "PASSED",
  "gates": [
    {
      "name": "holdout_performance",
      "status": "PASSED",
      "critical": true,
      "metrics": {
        "precision": 0.9842,
        "recall": 0.9678,
        "f1_score": 0.9759
      }
    },
    {
      "name": "no_regression",
      "status": "PASSED",
      "metrics": {
        "max_regression_pct": 0.8,
        "per_metric": {
          "precision": 0.003,
          "recall": -0.008,
          "f1_score": -0.002
        }
      }
    },
    {
      "name": "known_identity_accuracy",
      "status": "PASSED",
      "metrics": {
        "known_identities_tested": 142,
        "perfect_accuracy": 142,
        "accuracy_below_threshold": 0
      }
    },
    {
      "name": "latency_requirement",
      "status": "PASSED",
      "metrics": {
        "p50_latency_ms": 45,
        "p99_latency_ms": 128,
        "threshold_ms": 150
      }
    }
  ]
}

8.8.3 Embedding Update Strategies

After a model passes quality gates and is deployed, the face embedding database must be updated. Five strategies are available:

Strategy When to Use Duration Impact
Centroid Update Few new examples (<10 per identity), same model Seconds Update running mean only
Incremental Add Many new examples (10-100 per identity), same model Minutes Add new embeddings, keep existing
Full Reindex Model version changed, or >10% of identities updated Hours Recompute all embeddings
Merge and Update Identity merge operation Seconds Weighted centroid merge
Rollback Reindex Model rollback Minutes Restore previous embeddings

Decision Matrix:

+-----------------------------------------------------------------------------+
|                    EMBEDDING UPDATE STRATEGY SELECTION                       |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Model changed?                                                              |
|       |                                                                      |
|       +-- YES -> FULL_REINDEX (required, embeddings are model-dependent)     |
|       |                                                                      |
|       NO -> What changed?                                                    |
|               |                                                              |
|               +-- Identity merge -> MERGE_AND_UPDATE                         |
|               |                                                              |
|               +-- Rollback -> ROLLBACK_REINDEX                               |
|               |                                                              |
|               +-- New examples?                                              |
|                       |                                                      |
|                       +-- < 10 per identity, < 10% total -> CENTROID_UPDATE |
|                       |                                                      |
|                       +-- Otherwise -> INCREMENTAL_ADD                       |
|                                                                              |
+-----------------------------------------------------------------------------+
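
The decision matrix translates directly into a short selection function. The sketch below follows the matrix as drawn; note that the strategy table also lists ">10% of identities updated" as a Full Reindex trigger, which the matrix routes to INCREMENTAL_ADD instead. Parameter names are illustrative.

```python
def select_strategy(model_changed, is_merge=False, is_rollback=False,
                    max_new_per_identity=0, fraction_identities_updated=0.0):
    """Pick an embedding update strategy following the decision matrix."""
    if model_changed:
        return "FULL_REINDEX"  # embeddings are model-dependent
    if is_merge:
        return "MERGE_AND_UPDATE"
    if is_rollback:
        return "ROLLBACK_REINDEX"
    if max_new_per_identity < 10 and fraction_identities_updated < 0.10:
        return "CENTROID_UPDATE"
    return "INCREMENTAL_ADD"


print(select_strategy(False, max_new_per_identity=3, fraction_identities_updated=0.02))
```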

Reference: For complete model export commands, INT8 calibration scripts, performance benchmarks, and the full Python module structure, see ai_vision.md — Sections 10-14. For the complete training pipeline code, Airflow DAG definitions, and quality gate implementations, see training_system.md — Sections 5-10.


Section 9: Suspicious Activity Night-Mode Design

9.1 Overview

The suspicious activity detection system provides comprehensive behavioral analysis during night hours (22:00-06:00 by default) through 10 specialized detection modules. Each module operates on the output of the AI inference pipeline (detected persons, tracked positions, and face identities) to identify anomalous behavior patterns.

The system features a composite scoring engine that combines signals from all modules with exponential time-decay, enabling unified threat assessment and intelligent escalation. Each camera can be independently configured with custom zones, thresholds, and schedules.

9.2 Ten Detection Modules Summary

# Module Description Severity Key CV Model
1 Intrusion Detection Detects persons entering restricted polygon zones HIGH (default) YOLO11m detections + zone polygon
2 Loitering Detection Flags persons dwelling in an area longer than threshold MEDIUM (default) ByteTrack + timer per track
3 Running Detection Identifies abnormally fast movement MEDIUM (default) YOLOv8n-pose + optical flow speed
4 Crowding Detection Alerts when group density exceeds threshold HIGH (default) DBSCAN spatial clustering
5 Fall Detection Detects persons falling or collapsing CRITICAL YOLOv8n-pose keypoint analysis
6 Abandoned Object Identifies unattended objects left behind HIGH (default) YOLOv8s + MOG2 background subtraction
7 After-Hours Presence Detects any person presence during night hours MEDIUM (default) YOLO11m person class only
8 Zone Breach Triggers on crossing virtual boundary lines MEDIUM (default) ByteTrack + line crossing algorithm
9 Repeated Re-entry Flags patterns of entering/exiting an area multiple times MEDIUM (default) ByteTrack + entry/exit state machine
10 Suspicious Dwell Time Alerts on extended presence near sensitive areas MEDIUM (configurable) ByteTrack + per-zone timers

9.3 Module Details

9.3.1 Module 1: Intrusion Detection

Detects when a person enters a user-defined restricted polygon zone.

Parameter Default Range Description
confidence_threshold 0.55 0.3-0.9 Minimum person detection confidence
overlap_threshold 0.30 0.1-0.9 Min IoU between person bbox and zone
cooldown_seconds 60 0-3600 Cooldown before re-alerting same zone
zone_severity HIGH LOW/MEDIUM/HIGH Per-zone configurable

Algorithm:

For each detected person:
    For each restricted zone polygon:
        Compute IoU(person_bbox, zone_polygon)
        If IoU > overlap_threshold AND confidence > confidence_threshold:
            If zone not in cooldown:
                Trigger INTRUSION alert
                Start cooldown timer
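
The overlap test can be sketched in pure Python by clipping the zone polygon to the person bounding box (Sutherland-Hodgman) and measuring areas with the shoelace formula. All function names are illustrative. One caveat worth checking during tuning: IoU shrinks when a zone is much larger than a person, so a person fully inside a large zone can score below overlap_threshold; dividing the intersection by the bbox area instead is one alternative.

```python
def _clip(points, inside, intersect):
    """One Sutherland-Hodgman pass: clip a polygon against a single half-plane."""
    out = []
    for i, cur in enumerate(points):
        prev = points[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out


def _at_x(a, b, x):
    """Point where segment a-b meets the vertical line at x."""
    return (x, a[1] + (b[1] - a[1]) * (x - a[0]) / (b[0] - a[0]))


def _at_y(a, b, y):
    """Point where segment a-b meets the horizontal line at y."""
    return (a[0] + (b[0] - a[0]) * (y - a[1]) / (b[1] - a[1]), y)


def polygon_area(points):
    """Shoelace formula; 0.0 for degenerate polygons."""
    n = len(points)
    if n < 3:
        return 0.0
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2.0


def bbox_zone_iou(bbox, zone):
    """IoU between an axis-aligned bbox (x1, y1, x2, y2) and a zone polygon."""
    x1, y1, x2, y2 = bbox
    clipped = [tuple(p) for p in zone]
    clipped = _clip(clipped, lambda p: p[0] >= x1, lambda a, b: _at_x(a, b, x1))
    clipped = _clip(clipped, lambda p: p[0] <= x2, lambda a, b: _at_x(a, b, x2))
    clipped = _clip(clipped, lambda p: p[1] >= y1, lambda a, b: _at_y(a, b, y1))
    clipped = _clip(clipped, lambda p: p[1] <= y2, lambda a, b: _at_y(a, b, y2))
    inter = polygon_area(clipped)
    union = (x2 - x1) * (y2 - y1) + polygon_area(zone) - inter
    return inter / union if union > 0 else 0.0


zone = [[0.65, 0.20], [0.85, 0.20], [0.85, 0.60], [0.65, 0.60]]
person_bbox = (0.70, 0.30, 0.80, 0.70)  # hypothetical detection, normalized coords
iou = bbox_zone_iou(person_bbox, zone)
print(iou > 0.30)
```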

9.3.2 Module 2: Loitering Detection

Flags persons who remain in an area longer than a threshold.

Parameter Default Range Description
dwell_time_threshold_seconds 300 30-1800 Time before triggering loitering alert
movement_tolerance_pixels 50 10-200 Max centroid movement to still count as "stationary"
cooldown_seconds 300 0-3600 Cooldown after alert

Algorithm:

For each active track:
    If track centroid moved < tolerance in last N seconds:
        Increment dwell timer
        If dwell_timer > threshold:
            Trigger LOITERING alert
            Reset timer (or hold until movement detected)
    Else:
        Reset dwell timer
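
A per-track dwell timer along these lines might look like the sketch below. It interprets "moved < tolerance" as distance from the point where the current dwell started, which is an assumption; cooldown handling is omitted, and the class name is illustrative.

```python
import math


class DwellTracker:
    """Accumulates dwell time while a track's centroid stays near an anchor point."""

    def __init__(self, threshold_s=300.0, tolerance_px=50.0):
        self.threshold_s = threshold_s
        self.tolerance_px = tolerance_px
        self.anchor = None     # centroid where the current dwell started
        self.start_ts = None   # timestamp when the current dwell started

    def update(self, centroid, ts):
        """Return True when the dwell time exceeds the threshold."""
        if self.anchor is None or math.dist(centroid, self.anchor) >= self.tolerance_px:
            # First observation, or the track moved away: restart the timer.
            self.anchor, self.start_ts = centroid, ts
            return False
        return ts - self.start_ts > self.threshold_s


tracker = DwellTracker()
results = [tracker.update(c, t) for c, t in
           [((100, 100), 0), ((110, 105), 200), ((120, 100), 301), ((400, 100), 302)]]
print(results)
```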

9.3.3 Module 3: Running Detection

Identifies abnormally fast movement using pose keypoints and optical flow.

Parameter Default Range Description
speed_threshold_pixels_per_second 150 50-500 Pixel speed threshold
speed_threshold_kmh 15.0 5-40 Real-world speed (requires calibration)
confirmation_frames 3 1-10 Consecutive frames to confirm running

Algorithm:

For each active track:
    Compute torso keypoint displacement between frames
    Convert pixel speed to km/h (if calibration available)
    Apply Farneback optical flow for refinement
    If speed > threshold for confirmation_frames:
        Trigger RUNNING alert
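
The pixel-speed check with frame confirmation can be sketched as below. This covers only the displacement and confirmation steps; the km/h conversion and Farneback optical flow refinement are left out, and the class name is illustrative.

```python
import math


class RunningDetector:
    """Flags a track once its speed exceeds the threshold for N consecutive frames."""

    def __init__(self, speed_threshold_px_s=150.0, confirmation_frames=3):
        self.speed_threshold_px_s = speed_threshold_px_s
        self.confirmation_frames = confirmation_frames
        self.prev = None   # (torso point, timestamp) from the previous frame
        self.streak = 0    # consecutive frames above the speed threshold

    def update(self, torso_pt, ts):
        if self.prev is not None and ts > self.prev[1]:
            speed = math.dist(torso_pt, self.prev[0]) / (ts - self.prev[1])
            self.streak = self.streak + 1 if speed > self.speed_threshold_px_s else 0
        self.prev = (torso_pt, ts)
        return self.streak >= self.confirmation_frames


detector = RunningDetector()
# 20 px per frame at 10 fps is roughly 200 px/s, above the 150 px/s default.
flags = [detector.update((100 + 20 * i, 300), i * 0.1) for i in range(4)]
print(flags)
```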

9.3.4 Module 4: Crowding Detection

Alerts when person group density exceeds threshold.

Parameter Default Range Description
count_threshold 5 2-50 Minimum person count in cluster
area_threshold 0.15 0.05-0.5 Fraction of frame covered by group
density_threshold 0.05 0.01-0.2 Persons per square meter (calibrated)
dbscan_eps 0.08 0.01-0.3 DBSCAN neighborhood radius (normalized)

Algorithm:

Collect all person centroids in current frame
Run DBSCAN(eps=dbscan_eps, min_samples=2) on centroids
For each cluster:
    If cluster_size >= count_threshold OR cluster_area >= area_threshold:
        Trigger CROWDING alert
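
With min_samples=2 (counting the point itself, as scikit-learn does), DBSCAN clusters are simply the connected components of the "within eps" graph that contain at least two points. The sketch below uses that equivalence with a small union-find so it needs no external library. It applies only the count condition; the area condition is omitted, and the function name is illustrative.

```python
import math


def crowd_clusters(centroids, eps=0.08, count_threshold=5):
    """Return index groups of centroids chained within eps that meet count_threshold."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids[i], centroids[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= count_threshold]


# Five people in a row 0.05 apart (normalized coords), plus two isolated people.
people = [(0.10, 0.5), (0.15, 0.5), (0.20, 0.5), (0.25, 0.5), (0.30, 0.5),
          (0.80, 0.1), (0.90, 0.9)]
print(crowd_clusters(people))
```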

9.3.5 Module 5: Fall Detection

Detects persons falling or collapsing using pose keypoint analysis.

Parameter Default Range Description
fall_score_threshold 0.75 0.5-0.95 Combined fall confidence score
min_keypoint_confidence 0.30 0.1-0.5 Minimum keypoint detection confidence
torso_angle_threshold_deg 45 30-75 Torso angle from vertical to trigger
aspect_ratio_threshold 1.2 0.8-2.0 Width/height ratio of person bbox
temporal_confirmation_ms 1000 500-3000 Duration to confirm fall (not just bend)

Algorithm:

For each detected person with pose keypoints:
    Compute torso angle from vertical (using shoulder-hip line)
    Compute bbox aspect ratio
    Check if person is on ground (feet keypoint confidence drops)
    Calculate fall_score = weighted_combination(angle, aspect_ratio, ground_contact)
    If fall_score > threshold AND duration > confirmation_ms:
        Trigger FALL alert (CRITICAL severity)
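
The source does not give the weights or normalisation behind weighted_combination, so the sketch below picks illustrative ones (0.5 / 0.3 / 0.2, each term saturating at its threshold) purely to show the shape of the computation. Temporal confirmation is not included.

```python
import math


def torso_angle_deg(shoulder_mid, hip_mid):
    """Angle between the shoulder-hip line and the image vertical (0 = upright)."""
    dx = shoulder_mid[0] - hip_mid[0]
    dy = shoulder_mid[1] - hip_mid[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def fall_score(angle_deg, aspect_ratio, ground_contact,
               angle_thr=45.0, aspect_thr=1.2, weights=(0.5, 0.3, 0.2)):
    """Weighted combination in [0, 1]; weights and normalisation are assumptions."""
    angle_term = min(angle_deg / angle_thr, 1.0)       # saturates at the angle threshold
    aspect_term = min(aspect_ratio / aspect_thr, 1.0)  # saturates at the aspect threshold
    w_angle, w_aspect, w_ground = weights
    return w_angle * angle_term + w_aspect * aspect_term + w_ground * ground_contact


lying = fall_score(torso_angle_deg((100, 200), (180, 210)), aspect_ratio=1.8, ground_contact=0.6)
upright = fall_score(torso_angle_deg((100, 100), (100, 180)), aspect_ratio=0.4, ground_contact=0.0)
print(lying > 0.75, upright > 0.75)
```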

9.3.6 Module 6: Abandoned Object Detection

Identifies unattended objects using background subtraction and object detection.

Parameter Default Range Description
unattended_time_threshold_seconds 60 10-600 Time before object is considered abandoned
proximity_threshold_pixels 100 20-300 Max distance from owner before "unattended"
watchlist_classes ["backpack", "suitcase", "box", "bag"] Object classes to monitor
bg_learning_rate 0.005 0.001-0.01 MOG2 background model learning rate

Algorithm:

Run YOLOv8s to detect objects in watchlist_classes
Run MOG2 background subtraction to identify static foreground
For each detected object:
    Track owner proximity (nearest person)
    If owner distance > threshold AND object stationary > time_threshold:
        Trigger ABANDONED_OBJECT alert

9.3.7 Module 7: After-Hours Presence

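Any person detected during night hours triggers an alert once the detection persists for min_detection_frames frames. When check_authorized_personnel is true, detections are first checked against the known-persons whitelist.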

Parameter Default Range Description
detection_confidence_threshold 0.50 0.3-0.9 Minimum person detection confidence
min_detection_frames 5 1-30 Frames to confirm (avoid false positives)
check_authorized_personnel false true/false If true, check against known persons whitelist

9.3.8 Module 8: Zone Breach

Detects crossing of virtual boundary lines (directional or bidirectional).

Parameter Default Range Description
boundary_lines [] (user-defined) Array of {start, end, direction, severity}
allowed_direction "both" both/a_to_b/b_to_a Which direction is allowed
crossing_threshold_pixels 20 5-100 Min distance past line to trigger
cooldown_seconds 30 0-3600 Cooldown per (track, line) pair

Algorithm:

For each active track:
    For each boundary line:
        Check if track centroid crosses line in forbidden direction
        Using line equation: ax + by + c = 0, check sign change
        If crossed AND distance_past_line > threshold:
            Trigger ZONE_BREACH alert
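
The sign-change test can be sketched with a signed distance to the line and a dead band of crossing_threshold_pixels on either side, so a track hovering near the line does not trigger. Naming the negative side "A" is a convention chosen here. The sketch treats the boundary as an infinite line and leaves out segment end checks and cooldown.

```python
def signed_distance(point, start, end):
    """Signed perpendicular distance from point to the line a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = start, end
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    return (a * point[0] + b * point[1] + c) / (a * a + b * b) ** 0.5


class LineCrossingDetector:
    """Reports a crossing once a track is clearly past the line on the other side."""

    def __init__(self, start, end, threshold_px=20.0):
        self.start, self.end = start, end
        self.threshold_px = threshold_px
        self.last_side = 0  # -1 or +1 after the track was clearly on one side

    def update(self, point):
        d = signed_distance(point, self.start, self.end)
        if abs(d) <= self.threshold_px:
            return None  # inside the dead band around the line
        side = 1 if d > 0 else -1
        crossed = self.last_side != 0 and side != self.last_side
        self.last_side = side
        if not crossed:
            return None
        return "a_to_b" if side > 0 else "b_to_a"


detector = LineCrossingDetector(start=(100, 0), end=(100, 200))
events = [detector.update(p) for p in [(70, 50), (110, 60), (130, 60), (60, 60)]]
print(events)
```

The caller would then compare the returned direction against allowed_direction to decide whether to raise ZONE_BREACH.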

9.3.9 Module 9: Repeated Re-entry Patterns

Detects suspicious patterns of entering and exiting an area multiple times.

Parameter Default Range Description
reentry_zone Full frame polygon Area to monitor for entries/exits
time_window_seconds 600 60-3600 Time window for counting cycles
reentry_threshold 3 2-10 Min entry/exit cycles to trigger
min_cycle_duration_seconds 30 5-300 Min duration of one cycle

State Machine:

For each track:
    Track state: OUTSIDE -> ENTERING -> INSIDE -> EXITING -> OUTSIDE
    Each complete cycle (entry + exit) increments counter
    If cycle_count >= threshold within time_window:
        Trigger REENTRY_PATTERN alert
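
A simplified version of the state machine is sketched below. It collapses the transitional ENTERING and EXITING states into inside/outside flips and reads min_cycle_duration_seconds as the minimum time spent inside the zone; both are assumptions, and the class name is illustrative.

```python
from collections import deque


class ReentryCounter:
    """Counts completed entry/exit cycles for one track within a sliding time window."""

    def __init__(self, time_window_s=600.0, reentry_threshold=3, min_cycle_s=30.0):
        self.time_window_s = time_window_s
        self.reentry_threshold = reentry_threshold
        self.min_cycle_s = min_cycle_s
        self.inside = False
        self.entered_at = None
        self.cycles = deque()  # exit timestamps of completed cycles

    def update(self, is_inside, ts):
        """Feed one observation; return True when the cycle count reaches the threshold."""
        if is_inside and not self.inside:
            self.entered_at = ts
        elif not is_inside and self.inside:
            if ts - self.entered_at >= self.min_cycle_s:
                self.cycles.append(ts)
        self.inside = is_inside
        # Drop cycles that fell out of the time window.
        while self.cycles and ts - self.cycles[0] > self.time_window_s:
            self.cycles.popleft()
        return len(self.cycles) >= self.reentry_threshold


counter = ReentryCounter()
observations = [(True, 0), (False, 40), (True, 100), (False, 150), (True, 200), (False, 260)]
alerts = [counter.update(inside, ts) for inside, ts in observations]
print(alerts)
```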

9.3.10 Module 10: Suspicious Dwell Time

Extended presence near sensitive areas (different from general loitering).

Parameter Default Range Description
sensitive_zones [] (user-defined) Zones with custom dwell thresholds
default_dwell_threshold_seconds 120 10-1800 Default threshold
max_gap_seconds 5.0 1.0-30.0 Max disappearance gap before timer reset

Predefined zone types with default thresholds:

Zone Type Default Threshold Default Severity
main_entrance 60s MEDIUM
emergency_exit 30s HIGH
equipment_room 45s HIGH
storage_area 120s MEDIUM
elevator_bank 90s LOW
parking_access 60s MEDIUM

9.4 Activity Scoring Engine

9.4.1 Composite Score Formula

All 10 modules feed into a unified scoring engine that produces a single suspicious activity score per camera:

S_total(t) = SUM_i( weight_i * signal_i(t) * decay(t - t_i) ) + bonus_cross_module

Where:
    weight_i: module-specific weight (see table below)
    signal_i(t): normalized signal value from module i (nominally in [0, 1]; ratio-based signals can exceed 1, see table below)
    decay(delta_t): exponential time-decay function
    bonus_cross_module: extra score when multiple modules fire simultaneously
    t_i: timestamp of most recent event from module i
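
As a sketch of how the formula could be evaluated for one camera, the function below sums weight * signal * decay over the modules that have fired and adds the cross-module bonus as a precomputed argument. The module keys are hypothetical identifiers; the weights are those listed in the module weights table, and the decay uses a 5-minute half-life.

```python
WEIGHTS = {
    "intrusion": 0.25, "loitering": 0.15, "running": 0.10, "crowding": 0.12,
    "fall": 0.20, "abandoned_object": 0.18, "after_hours": 0.05,
    "zone_breach": 0.12, "reentry": 0.10, "suspicious_dwell": 0.13,
}


def composite_score(events, now, half_life=300.0, bonus_cross_module=0.0):
    """events maps module name -> (signal value, timestamp of its most recent event)."""
    total = 0.0
    for module, (signal, t_i) in events.items():
        decay = 0.5 ** ((now - t_i) / half_life)  # halves every half_life seconds
        total += WEIGHTS[module] * signal * decay
    return total + bonus_cross_module


now = 1_000.0
events = {
    "intrusion": (0.8, now),          # fired just now
    "loitering": (1.0, now - 300.0),  # fired five minutes ago
}
score = composite_score(events, now, bonus_cross_module=0.15)
print(round(score, 3))
```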

9.4.2 Module Weights

Module Weight Signal Source Signal Range
Intrusion Detection 0.25 overlap_ratio * confidence 0.0 - 1.0
Loitering Detection 0.15 dwell_ratio (dwell_time / threshold) 0.0 - 1.0+
Running Detection 0.10 speed_ratio normalized 0.0 - 1.0+
Crowding Detection 0.12 crowd_density_score 0.0 - 1.0
Fall Detection 0.20 fall_confidence_score 0.0 - 1.0
Abandoned Object 0.18 unattended_ratio (duration / threshold) 0.0 - 1.0+
After-Hours Presence 0.05 binary (1 if detected) * zone_severity_multiplier 0.0 - 1.0
Zone Breach 0.12 severity_mapped (LOW=0.3, MED=0.6, HIGH=1.0) 0.0 - 1.0
Re-entry Patterns 0.10 cycle_ratio (count / threshold) 0.0 - 1.0+
Suspicious Dwell 0.13 dwell_ratio (duration / zone_threshold) 0.0 - 1.0+

Note: Weights sum to 1.40 — this is intentional to allow cross-module amplification when multiple modules fire simultaneously.

9.4.3 Time-Decay Function

import math

def time_decay(delta_t_seconds, half_life=300):
    """Exponential decay with 5-minute half-life by default."""
    # ln(2) / half_life is the rate at which the contribution halves every half_life seconds
    return math.exp(-math.log(2) * delta_t_seconds / half_life)

# Decay reference:
#   0 min -> 1.000 (full contribution)
#   1 min -> 0.871
#   5 min -> 0.500
#  10 min -> 0.250
#  20 min -> 0.063
#  30 min -> 0.016 (effectively zero)

9.4.4 Cross-Module Amplification Bonus

When multiple modules fire simultaneously for the same track or in close proximity, the composite score receives an additional bonus:

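from collections import Counter

def compute_cross_module_bonus(active_signals, proximity_weight=0.15):
    """Extra score when several modules fire at once.

    Assumes each active signal is a dict with optional 'track_id' and
    'zone_id' keys (illustrative field names, not confirmed by the source).
    """
    n_modules = len(active_signals)
    if n_modules <= 1:
        return 0.0

    def largest_group(key):
        # Size of the largest set of signals sharing the same value for `key`
        counts = Counter(s[key] for s in active_signals if s.get(key) is not None)
        return max(counts.values(), default=0)

    n_same_track_signals = largest_group('track_id')
    n_same_zone_signals = largest_group('zone_id')

    # Base bonus: +0.15 per additional module
    base_bonus = proximity_weight * (n_modules - 1)

    # Track overlap: same person triggering multiple rules -> higher threat
    track_bonus = 0.10 * (n_same_track_signals - 1) if n_same_track_signals >= 2 else 0.0

    # Zone overlap: multiple signals in same zone -> higher threat
    zone_bonus = 0.08 * (n_same_zone_signals - 1) if n_same_zone_signals >= 2 else 0.0

    return min(base_bonus + track_bonus + zone_bonus, 0.50)  # Cap at +0.50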

9.4.5 Escalation Thresholds

Score Range Threat Level Color Actions
0.00 - 0.20 NONE Gray Log only, no alert
0.20 - 0.40 LOW Blue Log + dashboard indicator
0.40 - 0.60 MEDIUM Yellow Log + non-urgent alert dispatch
0.60 - 0.80 HIGH Orange Log + immediate alert + highlight
0.80 - 1.00 CRITICAL Red Log + all channels + security dispatch recommendation
> 1.00 EMERGENCY Purple/Flashing All channels + automatic escalation to security lead
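
Adjacent ranges in the table share their boundary values (0.20, 0.40, and so on). The sketch below resolves this by treating each lower bound as inclusive, with EMERGENCY reserved for scores strictly above 1.00; that boundary handling is an assumption.

```python
def threat_level(score):
    """Map a composite score to a threat level using inclusive lower bounds."""
    if score > 1.00:
        return "EMERGENCY"
    for lower, level in ((0.80, "CRITICAL"), (0.60, "HIGH"),
                         (0.40, "MEDIUM"), (0.20, "LOW")):
        if score >= lower:
            return level
    return "NONE"


for s in (0.10, 0.425, 0.80, 1.00, 1.20):
    print(s, threat_level(s))
```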

9.5 Night Mode Scheduler

9.5.1 Automatic Schedule

Parameter Default Configurable
Start time 22:00 (10 PM) Yes, per camera
End time 06:00 (6 AM) Yes, per camera
Gradual transition 15 minutes Yes (0-60 min)
Timezone Local site timezone Yes
Override Manual toggle available Admin only

9.5.2 Gradual Transition

During the 15-minute transition window, sensitivity ramps linearly:

Transition Start (21:45)          Night Full (22:00)         Transition End (22:15)
      |                                  |                           |
      v                                  v                           v
Sensitivity: 0% ---- 25% ---- 50% ---- 75% ---- 100% ---- 100% ---- 100%
              |__________|__________|__________|__________|__________|
                  Ramp up to full night sensitivity over 15 minutes

This prevents sudden spikes in alerts when night mode activates.
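
A minimal sketch of the evening ramp, assuming (as the diagram suggests) that the ramp starts transition_minutes before the night start time and reaches 100% at that time. The morning transition is not covered, and the function name is illustrative.

```python
from datetime import datetime, timedelta


def night_sensitivity(now, night_start, transition_minutes=15):
    """Linear ramp from 0.0 to 1.0 over the window that ends at night_start."""
    if transition_minutes <= 0:
        return 1.0 if now >= night_start else 0.0
    window = timedelta(minutes=transition_minutes)
    fraction = (now - (night_start - window)) / window  # timedelta / timedelta -> float
    return max(0.0, min(1.0, fraction))


start = datetime(2025, 1, 1, 22, 0)
for clock in (datetime(2025, 1, 1, 21, 45), datetime(2025, 1, 1, 21, 52, 30),
              datetime(2025, 1, 1, 22, 10)):
    print(clock.time(), night_sensitivity(clock, start))
```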

9.5.3 Night Mode Behavior Changes

Aspect Day Mode Night Mode
Detection modules Intrusion, Crowding, Fall, Abandoned Object All 10 modules active
AI Vibe preset Per-camera setting Automatically Strict
Confidence threshold Per-camera setting +0.10 (stricter)
Scoring engine weights Standard weights +25% intrusion, +20% fall
Alert suppression 5-minute cooldown 2-minute cooldown (faster alerts)
After-hours detection Disabled Enabled (primary night function)

9.6 Per-Camera Configuration

Each camera has independent configuration for all detection modules:

# Example: Camera 1 - Main Entrance
cam_01:
  enabled: true
  location: "Main Entrance Lobby"
  night_mode:
    enabled: true
    custom_schedule: null        # Use system default (22:00-06:00)
    sensitivity_multiplier: 1.0   # Standard sensitivity

  intrusion_detection:
    enabled: true
    confidence_threshold: 0.65
    overlap_threshold: 0.30
    cooldown_seconds: 30
    restricted_zones:
      - zone_id: "server_room_door"
        polygon: [[0.65,0.20], [0.85,0.20], [0.85,0.60], [0.65,0.60]]
        severity: "HIGH"

  loitering_detection:
    enabled: true
    dwell_time_threshold_seconds: 300
    movement_tolerance_pixels: 50

  running_detection:
    enabled: true
    speed_threshold_pixels_per_second: 150
    confirmation_frames: 3

  fall_detection:
    enabled: true
    fall_score_threshold: 0.75
    temporal_confirmation_ms: 1000

  # ... (all 10 modules configured)

9.7 Alert Generation Logic

9.7.1 Alert Lifecycle

+------------+    +------------+    +------------+    +------------+
|  DETECTED  | -> | SUPPRESSED | -> |  EVIDENCE  | -> | DISPATCHED |
| (Rule fire)|    | (Dedup)    |    | (Capture)  |    | (Send)     |
+------------+    +------------+    +------------+    +------------+
                                                          |
                                                          v
                                                   +------------+
                                                   | ACKNOWLEDGE|
                                                   | or AUTO    |
                                                   +------------+

9.7.2 Suppression Rules

Condition Action Reason
Duplicate within suppression window Log + increment counter Prevent alert spam
Detection confidence < rule minimum Log only Insufficient evidence
Threat score < LOW threshold Log only Below alert threshold
Max alerts/hour for camera exceeded Log + rate-limit flag Prevent overflow
Composite score indicates low overall threat Log + dashboard only Reduce noise

9.7.3 Suppression Configuration

Parameter Default Range
Default suppression window 5 minutes 0-60 minutes
Max alerts per hour per camera 20 5-100
Max alerts per hour per rule 10 5-50
Evidence snapshot frames before 5 frames 1-30
Evidence snapshot frames after 10 frames 1-30
Evidence clip duration 10 seconds 5-60
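
The duplicate window and per-camera hourly cap could be combined as in the sketch below. Keying duplicates on (camera, rule, track) is an assumption, since the source does not define what counts as a duplicate. The per-rule cap is omitted, and the class and return labels are illustrative.

```python
from collections import defaultdict, deque


class AlertSuppressor:
    """Applies a duplicate-suppression window and a per-camera hourly rate limit."""

    def __init__(self, window_s=300.0, max_per_hour_camera=20):
        self.window_s = window_s
        self.max_per_hour_camera = max_per_hour_camera
        self.last_sent = {}                   # (camera, rule, track) -> last dispatch time
        self.sent_times = defaultdict(deque)  # camera -> dispatch times in the last hour

    def decide(self, camera_id, rule, track_id, ts):
        key = (camera_id, rule, track_id)
        last = self.last_sent.get(key)
        if last is not None and ts - last < self.window_s:
            return "SUPPRESS_DUPLICATE"
        recent = self.sent_times[camera_id]
        while recent and ts - recent[0] >= 3600:
            recent.popleft()  # forget dispatches older than one hour
        if len(recent) >= self.max_per_hour_camera:
            return "SUPPRESS_RATE_LIMIT"
        self.last_sent[key] = ts
        recent.append(ts)
        return "DISPATCH"


suppressor = AlertSuppressor(window_s=300, max_per_hour_camera=2)
decisions = [
    suppressor.decide("cam_01", "INTRUSION", 7, 0),
    suppressor.decide("cam_01", "INTRUSION", 7, 100),
    suppressor.decide("cam_01", "LOITERING", 7, 120),
    suppressor.decide("cam_01", "RUNNING", 9, 130),
    suppressor.decide("cam_01", "INTRUSION", 7, 3700),
]
print(decisions)
```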

9.7.4 Severity Assignment

Final alert severity considers both the triggering module and the composite score context:

def assign_alert_severity(detection_event, composite_score):
    base_severity = detection_event['severity']  # From module config
    severity_levels = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3, 'CRITICAL': 4}
    base_level = severity_levels.get(base_severity, 2)

    # Escalation: high composite score bumps severity up one level
    if composite_score >= 0.80 and base_level < 3:
        base_level = min(base_level + 1, 4)

    # Escalation: multiple concurrent detections for same track
    if detection_event.get('concurrent_detections_count', 0) >= 2:
        base_level = min(base_level + 1, 4)

    # Zone-specific escalation override
    if detection_event.get('zone_severity_override'):
        zone_level = severity_levels.get(detection_event['zone_severity_override'], base_level)
        base_level = max(base_level, zone_level)

    reverse_levels = {v: k for k, v in severity_levels.items()}
    return reverse_levels.get(base_level, 'MEDIUM')

9.8 Integration with Main AI Pipeline

The suspicious activity service consumes detection events from the main AI pipeline:

+-----------------------------------------------------------------------------+
|               SUSPICIOUS ACTIVITY INTEGRATION WITH MAIN PIPELINE             |
+-----------------------------------------------------------------------------+
|                                                                              |
|  Main AI Pipeline Output:                                                    |
|  { person_id, track_id, bbox, keypoints, face_embedding, timestamp,        |
|    camera_id, confidence, face_crop_path }                                  |
|       |                                                                      |
|       v                                                                      |
|  +-------------------+    +-------------------+    +-------------------+    |
|  | Kafka Topic       | -> | Suspicious Activity| -> | Scoring Engine    |    |
|  | ai.detections     |    | Service            |    | (per camera)      |    |
|  | (JSON events)     |    | - 10 modules       |    | - Composite score |    |
|  +-------------------+    | - Per-camera config|    | - Time decay      |    |
|                           | - Zone polygons    |    | - Cross-module    |    |
|                           +-------------------+    |   bonus           |    |
|                                                     +---------+---------+    |
|                                                               |              |
|                                                               v              |
|                           +-------------------+    +-------------------+    |
|                           | Alert Manager     | <- | Scoring Output    |    |
|                           | - Deduplicate     |    | - Score [0, 1.5]  |    |
|                           | - Rate limit      |    | - Threat level    |    |
|                           | - Severity assign |    | - Active signals  |    |
|                           +---------+---------+    +-------------------+    |
|                                     |                                        |
|                                     v                                        |
|                           +-------------------+                             |
|                           | Alerts Table (DB) |                             |
|                           | Notification Svc  |                             |
|                           +-------------------+                             |
|                                                                              |
+-----------------------------------------------------------------------------+

Key integration points:

  • Suspicious Activity Service is a Kafka consumer on the ai.detections topic
  • Processes events after face recognition (has access to person identity)
  • Produces alert records to the alerts.critical topic for notification dispatch
  • Updates the composite score in Redis (with TTL = 2 * half_life) for dashboard real-time display
  • Stores all alert records in PostgreSQL for history and analytics

Reference: For complete detection algorithm pseudocode, zone configuration YAML schema, scoring engine implementation, and evidence capture logic, see suspicious_activity.md — Sections 2-6.


Section 10: Live Video Streaming Design

10.1 RTSP Stream Configuration for CP PLUS DVR

10.1.1 URL Format

The CP PLUS ORANGE DVR uses a Dahua-compatible RTSP URL scheme:

rtsp://admin:{password}@{dvr_ip}:554/cam/realmonitor?channel={N}&subtype={M}

Where:
    N = channel number (1-8)
    M = stream type (0 = main stream, 1 = sub stream)

Example URLs for all 8 channels:

Channel Main Stream Sub Stream
CH1 rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0 rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=1
CH2 ...channel=2&subtype=0 ...channel=2&subtype=1
CH3 ...channel=3&subtype=0 ...channel=3&subtype=1
CH4 ...channel=4&subtype=0 ...channel=4&subtype=1
CH5 ...channel=5&subtype=0 ...channel=5&subtype=1
CH6 ...channel=6&subtype=0 ...channel=6&subtype=1
CH7 ...channel=7&subtype=0 ...channel=7&subtype=1
CH8 ...channel=8&subtype=0 ...channel=8&subtype=1
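
The URL scheme above can be generated programmatically. The following is a minimal sketch: the helper name build_rtsp_url and its parameters are illustrative and not part of the platform code. It percent-encodes the credentials, on the assumption that real passwords may contain characters such as @ or / that would otherwise break the URL.

```python
from urllib.parse import quote


def build_rtsp_url(dvr_ip: str, password: str, channel: int, subtype: int,
                   username: str = "admin", port: int = 554) -> str:
    """Build a Dahua-style RTSP URL for one DVR channel (hypothetical helper)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1-8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # Percent-encode credentials so reserved characters stay out of the URL syntax
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return (f"rtsp://{user}:{pwd}@{dvr_ip}:{port}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


# (main, sub) stream URLs for all 8 channels
urls = {ch: (build_rtsp_url("192.168.29.200", "password", ch, 0),
             build_rtsp_url("192.168.29.200", "password", ch, 1))
        for ch in range(1, 9)}
```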

10.1.2 Stream Properties

Property Main Stream (subtype=0) Sub Stream (subtype=1)
Resolution 960 x 1080 352 x 288 to 704 x 576
Frame rate 25 FPS (PAL) 25 FPS
Video codec H.264 High Profile H.264 Baseline/Main
Bitrate ~4 Mbps per channel ~1 Mbps per channel
Audio G.711/AAC (optional) None
Use case Fullscreen viewing, evidence clips AI inference, multi-camera grid

10.1.3 Stream Discovery

The edge gateway can auto-discover streams via ONVIF:

from onvif import ONVIFCamera

camera = ONVIFCamera('192.168.29.200', 80, 'admin', 'password')
media_service = camera.create_media_service()
profiles = media_service.GetProfiles()

for profile in profiles:
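    # ONVIF StreamSetup: Stream is 'RTP-Unicast'; Transport is a nested {'Protocol': ...}
    stream_uri = media_service.GetStreamUri({
        'StreamSetup': {'Stream': 'RTP-Unicast', 'Transport': {'Protocol': 'RTSP'}},
        'ProfileToken': profile.token
    })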
    print(f"Channel: {profile.token}, URI: {stream_uri.Uri}")

10.2 Edge Gateway Stream Handling

10.2.1 FFmpeg Ingestion Pipeline

The edge gateway runs one FFmpeg process per camera stream:

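# Main stream: HLS generation for live viewing
# Note: -stimeout (RTSP socket timeout, microseconds) applies to FFmpeg 4.x;
# FFmpeg 5.0 and later use -timeout for the same purpose.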
ffmpeg -hide_banner -loglevel warning \
    -rtsp_transport tcp -stimeout 5000000 \
    -fflags +genpts+discardcorrupt+igndts+ignidx \
    -reorder_queue_size 64 -buffer_size 655360 \
    -i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
    -c:v copy -c:a copy \
    -f hls -hls_time 2 -hls_list_size 5 -hls_delete_threshold 2 \
    -hls_flags delete_segments+omit_endlist+program_date_time \
    -hls_segment_filename "/data/hls/ch1_%04d.ts" \
    "/data/hls/ch1.m3u8" \
    2>> /var/log/ffmpeg_ch1.log

10.2.2 Stream Health Monitoring

Check Frequency Failure Action
FFmpeg process alive Every 5s Restart process
RTSP connection health Every 10s Reconnect with backoff
Frame rate validation Every 30s Alert if FPS < 20
Bitrate validation Every 30s Alert if bitrate < 50% expected
Disk space check Every 60s Alert if < 10% free, emergency if < 5%

10.2.3 Auto-Reconnect Logic

import random


class StreamReconnectManager:
    """Handles RTSP stream reconnection with exponential backoff."""

    INITIAL_BACKOFF = 1.0       # seconds
    MAX_BACKOFF = 60.0          # seconds
    BACKOFF_MULTIPLIER = 2.0
    JITTER = 0.1                # 10% random jitter

    def __init__(self):
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def on_disconnect(self):
        """Record a failure and return the delay (seconds) before the next attempt."""
        # First failure waits INITIAL_BACKOFF; each later failure doubles it,
        # capped at MAX_BACKOFF (1, 2, 4, 8, 16, 32, 60, 60, ...)
        self.current_backoff = min(
            self.INITIAL_BACKOFF * (self.BACKOFF_MULTIPLIER ** self.consecutive_failures),
            self.MAX_BACKOFF
        )
        self.consecutive_failures += 1
        # Add jitter to prevent thundering herd
        return self.current_backoff * (1 + random.uniform(-self.JITTER, self.JITTER))

    def on_success(self):
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def should_circuit_break(self):
        return self.consecutive_failures >= 5  # Open circuit after 5 failures

10.3 HLS Generation for Dashboard

10.3.1 HLS Segment Configuration

Parameter Value Rationale
Segment duration (-hls_time) 2 seconds Balance between latency and segment count
Playlist size (-hls_list_size) 5 segments 10-second sliding window for live playback
Delete threshold 2 segments beyond playlist size Disk cleanup
Flags delete_segments+omit_endlist+program_date_time Live mode, no end list, accurate timing
Segment naming ch{N}_%04d.ts Sequential numbering for cache busting
Segment path /data/hls/ Fast NVMe storage

10.3.2 Multi-Bitrate HLS (Optional)

For adaptive bitrate streaming, three variants are generated per channel:

# High quality (main stream, copy codec)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v copy -f hls -hls_time 2 \
    -hls_list_size 5 -hls_flags delete_segments+omit_endlist \
    -hls_segment_filename "ch1_high_%04d.ts" "ch1_high.m3u8"

# Medium quality (transcoded)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v libx264 -preset fast -crf 23 \
    -vf "scale=640:480" -f hls -hls_time 2 \
    -hls_segment_filename "ch1_mid_%04d.ts" "ch1_mid.m3u8"

# Low quality (sub stream)
ffmpeg -i "rtsp://...channel=1&subtype=1" -c:v copy -f hls -hls_time 2 \
    -hls_segment_filename "ch1_low_%04d.ts" "ch1_low.m3u8"

Master playlist:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=960x1080
ch1_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=640x480
ch1_mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=352x288
ch1_low.m3u8

10.3.3 HLS Latency Budget

Stage Latency
DVR encoding 50-100 ms
RTSP to edge 1-2 ms
FFmpeg demux/remux 20-50 ms
HLS segment duration 2000 ms (2-second segments)
Nginx/CDN delivery 10-50 ms
HLS.js buffer 2000-4000 ms (1-2 segments)
Browser decode + render 20-50 ms
Total (camera to eye) ~4.1 - 6.3 seconds
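
The end-to-end figure is the sum of the per-stage bounds. A short check using the stage values from the table (the variable and function names are illustrative):

```python
# (stage, min_ms, max_ms) taken from the latency budget table
HLS_STAGES = [
    ("DVR encoding", 50, 100),
    ("RTSP to edge", 1, 2),
    ("FFmpeg demux/remux", 20, 50),
    ("HLS segment duration", 2000, 2000),
    ("Nginx/CDN delivery", 10, 50),
    ("HLS.js buffer", 2000, 4000),
    ("Browser decode + render", 20, 50),
]


def total_latency_ms(stages):
    """Return (best_case, worst_case) end-to-end latency in milliseconds."""
    return (sum(lo for _, lo, _ in stages), sum(hi for _, _, hi in stages))


best, worst = total_latency_ms(HLS_STAGES)
print(f"{best / 1000:.1f} - {worst / 1000:.1f} s")  # 4.1 - 6.3 s
```

The segment duration and the player buffer dominate the total, which is why the WebRTC path in 10.4 is used when sub-second latency matters.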

10.4 WebRTC for Low-Latency Single Camera

For single-camera fullscreen viewing where low latency is critical, WebRTC provides sub-second delivery.

10.4.1 WebRTC Architecture

+------------+    +-------------------+    +-------------------+    +--------+
| Browser    |    | Edge Gateway      |    | FFmpeg            |    | DVR    |
| (WebRTC    |<-->| (WHIP/WHEP        |<-->| (decode RTSP,     |<-->| RTSP   |
|  client)   |    |  bridge)          |    |  encode VP8/H.264)|    | Server |
+------------+    +-------------------+    +-------------------+    +--------+

10.4.2 WebRTC Configuration

Parameter Value
Signaling protocol WHIP (ingress) / WHEP (egress)
Video codec H.264 (hardware) or VP8 (software)
Latency target < 500 ms end-to-end
ICE servers STUN only (both peers behind NAT)
Max bitrate 3 Mbps
Resolution 960x1080 (main stream)

10.4.3 WebRTC Latency Budget

Stage Latency
DVR encoding 50-100 ms
RTSP to edge 1-2 ms
FFmpeg decode + WebRTC encode 30-80 ms
Network (edge to browser via VPN) 100-200 ms
Browser decode 20-50 ms
Total ~200-430 ms

10.5 Multi-Camera Grid Layout

10.5.1 Layout Configurations

Layout Cameras Stream Used Per-Camera Resolution Total Bandwidth
1x1 (fullscreen) 1 Main (subtype=0) 960x1080 ~4 Mbps
2x2 grid 4 Sub (subtype=1) 352x288 ~4 Mbps total
3x3 grid 8+1 empty Sub (subtype=1) 352x288 ~8 Mbps total
4x2 grid 8 Sub (subtype=1) 352x288 ~8 Mbps total
Custom User-defined Mixed Mixed Sum of selected

Smart stream selection: The dashboard automatically switches streams based on layout:

  • Fullscreen single camera -> Main stream (high quality)
  • Grid layout -> Sub stream (bandwidth-efficient)
  • Camera clicked for fullscreen -> Dynamically switch to main stream
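
A minimal sketch of this selection rule, assuming the only input is the number of cameras visible in the current layout. The function names and the default bitrates (taken from the ~4 Mbps and ~1 Mbps figures in 10.1.2) are illustrative.

```python
def select_subtype(visible_cameras: int) -> int:
    """Return the RTSP subtype for the current layout: 0 = main, 1 = sub."""
    if visible_cameras < 1:
        raise ValueError("at least one camera must be visible")
    return 0 if visible_cameras == 1 else 1


def layout_bandwidth_mbps(visible_cameras: int,
                          main_mbps: float = 4.0, sub_mbps: float = 1.0) -> float:
    """Approximate viewer bandwidth for a layout at the nominal stream bitrates."""
    per_camera = main_mbps if select_subtype(visible_cameras) == 0 else sub_mbps
    return visible_cameras * per_camera
```

For example, layout_bandwidth_mbps(8) gives 8.0 for the 3x3 and 4x2 grids, and layout_bandwidth_mbps(1) gives 4.0 for fullscreen, matching the table.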

10.5.2 Grid Rendering

+-----------------------------------------------------------------------------+
|                         DASHBOARD GRID LAYOUTS                               |
+-----------------------------------------------------------------------------+
|                                                                              |
|  1x1 Layout:                         2x2 Layout:                            |
|  +------------------------+          +----------+----------+                 |
|  |                        |          | CH1      | CH2      |                 |
|  |   Camera 1             |          | (sub)    | (sub)    |                 |
|  |   Main stream          |          |          |          |                 |
|  |   960x1080             |          +----------+----------+                 |
|  |   ~4 Mbps              |          | CH3      | CH4      |                 |
|  +------------------------+          | (sub)    | (sub)    |                 |
|                                      |          |          |                 |
|                                      +----------+----------+                 |
|                                                                              |
|  3x3 Layout (8 cameras):                                                     |
|  +----------+----------+----------+                                          |
|  | CH1      | CH2      | CH3      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH4      | CH5      | CH6      |                                          |
|  | (sub)    | (sub)    | (sub)    |                                          |
|  +----------+----------+----------+                                          |
|  | CH7      | CH8      | [Empty]  |                                          |
|  | (sub)    | (sub)    |          |                                          |
|  +----------+----------+----------+                                          |
|                                                                              |
|  Bandwidth: ~8 Mbps total for 3x3 layout (8 x ~1 Mbps sub streams)          |
|                                                                              |
+-----------------------------------------------------------------------------+

10.6 Bandwidth Optimization

10.6.1 Total Bandwidth Budget

Traffic Type Direction Bandwidth Notes
8x RTSP ingestion DVR -> Edge (local) ~32 Mbps receive Local LAN only
8x HLS upload to cloud Edge -> Cloud (via VPN) ~8-16 Mbps upload Transcoded and compressed
AI frames to cloud Edge -> Cloud (via VPN) ~2-4 Mbps upload 1 FPS, JPEG compressed
Dashboard HLS playback Cloud -> Browser ~8 Mbps per user Cached at CDN
Control/management Bidirectional < 1 Mbps WebSocket, API calls
Total edge upload ~10-20 Mbps Primary concern for site bandwidth

10.6.2 Optimization Techniques

Technique Savings Implementation
Sub-stream for grid view 75% bandwidth reduction Use subtype=1 (352x288) instead of subtype=0 (960x1080)
H.264 copy (no re-encode) for main stream No transcoding CPU cost (remux only) -c:v copy when no format change needed
JPEG quality tuning for AI frames 50-70% size reduction Quality 70-85 depending on scene complexity
Frame deduplication for AI 10-30% frame reduction Skip frames with < 2% pixel change
HLS segment caching at edge Reduces cloud upload spikes 5-segment buffer smooths burstiness
Gzip compression for API/WebSocket 60-80% reduction Content-Encoding: gzip
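
The frame deduplication row can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: frames are assumed to be equal-sized 8-bit grayscale buffers, a pixel counts as changed when its value moves by more than pixel_delta (an assumed noise threshold), and the function names are hypothetical.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes, pixel_delta: int = 16) -> float:
    """Fraction of pixels whose grayscale value moved by more than pixel_delta."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


def should_send(prev: Optional[bytes], curr: bytes, min_change: float = 0.02) -> bool:
    """Always send the first frame; afterwards send only if >= 2% of pixels changed."""
    return prev is None or changed_fraction(prev, curr) >= min_change
```

A production version would more likely operate on downscaled frames with a vectorized library; the pure-Python loop here only shows the decision rule.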

10.7 Fallback Handling

10.7.1 Stream Failure Fallback Chain

Step 1: RTSP connection fails
    +-> Retry with exponential backoff (3 attempts)
    +-> Try UDP transport if TCP fails
    +-> Circuit breaker opens after 5 consecutive failures
    |
Step 2: Stream stall detected (no frames for 10s)
    +-> Kill FFmpeg process
    +-> Restart with fresh connection
    |
Step 3: Camera marked OFFLINE
    +-> Dashboard shows "Camera Offline" placeholder
    +-> HLS endpoint serves the static offline playlist (see 10.7.2)
    +-> Last known frame displayed with timestamp overlay
    +-> Alert sent to operations team
    |
Step 4: Camera recovers
    +-> Circuit breaker transitions to HALF_OPEN
    +-> Test stream pulled for 10 seconds
    +-> On success: circuit CLOSED, stream resumes
    +-> Dashboard auto-refreshes
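
The circuit breaker transitions in this chain (CLOSED, OPEN, HALF_OPEN) can be sketched as a small state machine. The class name, the cool-down value, and the injectable clock are assumptions for illustration; the chain above does not specify how long the circuit stays open before a test pull.

```python
import time


class StreamCircuitBreaker:
    """CLOSED -> OPEN after N failures; OPEN -> HALF_OPEN after a cool-down."""

    def __init__(self, failure_threshold=5, cooldown_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s      # assumed value; not specified above
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        # A failed probe in HALF_OPEN reopens the circuit immediately
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def record_success(self):
        # A successful test pull (or normal frame flow) closes the circuit
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_attempt(self):
        """Return True if a connection attempt may be made now."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "HALF_OPEN"      # permit a single test stream pull
        return self.state != "OPEN"
```

The caller would mark the camera OFFLINE while allow_attempt() returns False, and call record_success() only after the 10-second test pull completes.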

10.7.2 Offline Placeholder

When a camera is offline, the HLS endpoint returns a static playlist:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts

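#EXT-X-ERROR is a custom, non-standard tag, so standard HLS players are expected to ignore it. The dashboard inspects the playlist text for the tag and, when it is present, displays a camera offline indicator with the last known timestamp.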
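
A minimal sketch of that check, written in Python for consistency with the other listings (the dashboard would do the equivalent in the browser). The function name offline_reason is hypothetical.

```python
import re
from typing import Optional

# Matches the custom tag and captures its message with or without quotes
_ERROR_TAG = re.compile(r'^#EXT-X-ERROR:\s*"?(.*?)"?\s*$')


def offline_reason(playlist_text: str) -> Optional[str]:
    """Return the message carried by #EXT-X-ERROR, or None if the tag is absent."""
    for line in playlist_text.splitlines():
        match = _ERROR_TAG.match(line.strip())
        if match:
            return match.group(1)
    return None
```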

10.7.3 Edge Buffer Management

The 2TB NVMe edge storage is partitioned for circular buffer operation:

Directory Max Size Retention Cleanup
/data/hls/ 20 GB Rolling (5 segments) Automatic via FFmpeg
/data/buffer/ch1-ch8/ 1.5 TB 7 days circular Age-based FIFO
/data/buffer/ai_frames/ 100 GB 24 hours Age-based
/data/buffer/evidence/ 200 GB 30 days Event-linked retention
/data/logs/ 10 GB 30 days Logrotate
/data/tmp/ 50 GB On process exit Cleanup on restart
Total reserved ~1.88 TB Fits in 2TB NVMe
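
As a sizing check on the 7-day retention target for the channel buffers, the sketch below relates capacity, bitrate, and retention. It assumes continuous recording of all eight channels and decimal terabytes; the function names are illustrative. At the ~4 Mbps main-stream rate from 10.1.2, 1.5 TB holds a little over four days, so a 7-day window implies an average of roughly 2.5 Mbps per channel (for example through a lower recording bitrate or motion-based recording, which is an assumption here rather than something specified above).

```python
def retention_days(capacity_bytes: float, channels: int, mbps_per_channel: float) -> float:
    """Days of continuous recording that fit in capacity_bytes."""
    bytes_per_day = channels * mbps_per_channel * 1e6 / 8 * 86_400
    return capacity_bytes / bytes_per_day


def max_mbps_for_retention(capacity_bytes: float, channels: int, days: float) -> float:
    """Average per-channel bitrate (Mbps) that fills capacity_bytes in `days`."""
    return capacity_bytes * 8 / (channels * days * 86_400 * 1e6)


BUFFER_BYTES = 1.5e12  # 1.5 TB, decimal units assumed
print(round(retention_days(BUFFER_BYTES, 8, 4.0), 1))        # 4.3
print(round(max_mbps_for_retention(BUFFER_BYTES, 8, 7), 2))  # 2.48
```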

Buffer exhaustion handling:

  1. At 80% capacity: Alert admin, begin aggressive cleanup of old non-evidence data
  2. At 90% capacity: Stop non-critical buffering (AI frames), preserve HLS + evidence only
  3. At 95% capacity: Emergency mode — evidence-only recording, all other buffers purged
  4. Never delete evidence clips linked to unresolved alerts
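
The thresholds above map naturally onto a small policy function. This is a sketch only: the mode names and function names are hypothetical, and the usage fraction is assumed to come from a call such as shutil.disk_usage on the buffer mount.

```python
import shutil


def used_fraction(path: str) -> float:
    """Fraction of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def buffer_mode(fraction_used: float) -> str:
    """Map disk usage to the handling level from the exhaustion list."""
    if not 0.0 <= fraction_used <= 1.0:
        raise ValueError("fraction_used must be between 0 and 1")
    if fraction_used >= 0.95:
        return "EMERGENCY"        # evidence-only recording, other buffers purged
    if fraction_used >= 0.90:
        return "CRITICAL_ONLY"    # stop AI-frame buffering, keep HLS + evidence
    if fraction_used >= 0.80:
        return "CLEANUP"          # alert admin, aggressive cleanup of old data
    return "NORMAL"


def may_delete(is_evidence: bool, alert_resolved: bool) -> bool:
    """Rule 4: evidence linked to an unresolved alert is never deleted."""
    return not (is_evidence and not alert_resolved)
```

Checking the highest threshold first keeps the levels mutually exclusive, and may_delete is meant to be consulted in every mode, including EMERGENCY.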

10.7.4 DVR Full Disk Mitigation

Since the DVR disk is full (0 bytes free), the system does not rely on DVR-side recording:

Function Traditional Our Design
Continuous recording DVR internal HDD Edge gateway 2TB NVMe buffer
Event/alert clips DVR playback export Cloud MinIO + S3 archival
Long-term storage DVR disk rotation AWS S3 tiered lifecycle
Playback DVR web UI Cloud dashboard with timeline

Reference: For complete FFmpeg commands including multi-output tee muxer, frame extraction for AI, WebRTC bridge code, and the ring buffer implementation, see video_ingestion.md — Sections 4-7.


End of Part A (Sections 1-10)

This unified technical blueprint synthesizes outputs from 11 specialist agents across 6 domain-specific design documents. For detailed implementation code, DDL, algorithms, and configuration, refer to the individual specialist documents listed in the cross-reference guide at the top of this document.

Document Path Content
Architecture architecture.md Full deployment specs, scaling, cost, failover
Video Ingestion video_ingestion.md RTSP config, FFmpeg, edge gateway, HLS, WebRTC
AI Vision ai_vision.md Model configs, inference pipeline, benchmarks
Database Schema database_schema.md Complete DDL, triggers, views, RLS
Suspicious Activity suspicious_activity.md 10 detection modules, scoring engine
Training System training_system.md Learning pipeline, quality gates, versioning

Sentinel AI Surveillance Platform — Unified Technical Blueprint (Part B)

Document Version: 1.0
Date: 2025-01-16
Classification: Confidential — Internal Use Only
Author: Technical Architecture Team


Part B Table of Contents

  • Section 11: Alerting Design (Notification System)
  • Section 12: Security Design
  • Section 13: UX / Website Structure
  • Section 14: Deployment Plan
  • Section 15: Testing Plan
  • Section 16: Self-Test Framework
  • Section 17: Sample Self-Test Report
  • Section 18: Risks and Mitigations
  • Section 19: Final Implementation Roadmap
  • Section 20: Final Production-Readiness Summary

Section 11: Alerting Design

11.1 Architecture Overview

The notification system uses an event-driven architecture built on Redis Pub/Sub for real-time message distribution. All detection events, system alerts, and manual triggers flow through a unified pipeline that delivers over two channels: the Telegram Bot API and the WhatsApp Business API (Meta's official Cloud API). The design goal is that critical security alerts are not lost: token-bucket rate limiting keeps sends within provider limits, failed sends are retried with exponential backoff, and messages that still fail are moved to a dead letter queue for admin review.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         ALERTING ARCHITECTURE                                 │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                         EVENT SOURCES                            │       │
│   │                                                                  │       │
│   │  Detection Pipeline ──▶ New person detected                     │       │
│   │  Face Recognition ────▶ Known/Unknown/Watchlist match           │       │
│   │  System Monitors ─────▶ Camera offline, Storage full, VPN down   │       │
│   │  Manual Triggers ─────▶ Operator-initiated alerts                │       │
│   │  AI Anomaly Engine ───▶ Suspicious activity detected             │       │
│   └──────────────────────────┬──────────────────────────────────────┘       │
│                              │                                               │
│                              ▼                                               │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                    REDIS PUB/SUB                                 │       │
│   │                                                                  │       │
│   │  Channel: alerts.critical  ─── High priority, immediate process  │       │
│   │  Channel: alerts.high      ─── Standard priority                 │       │
│   │  Channel: alerts.medium    ─── Batched processing                │       │
│   │  Channel: system.health    ─── System health events              │       │
│   └──────────────┬───────────────────────────────────────────────────┘       │
│                  │                                                            │
│   ┌──────────────┴───────────────────────────────────────────────────┐       │
│   │                    NOTIFICATION ROUTER                           │       │
│   │                     (Python/FastAPI)                             │       │
│   │                                                                  │       │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │       │
│   │  │ Event Parser │──▶ Rules Engine │──▶ Channel Selector    │  │       │
│   │  └──────────────┘  └──────────────┘  └──────────────────────┘  │       │
│   └──────────────────────────┬───────────────────────────────────────┘       │
│                              │                                               │
│          ┌───────────────────┼───────────────────┐                           │
│          ▼                   ▼                   ▼                           │
│   ┌─────────────┐   ┌───────────────┐   ┌──────────────────┐               │
│   │   TEMPLATE  │   │    RATE       │   │   ESCALATION     │               │
│   │   RENDERER  │   │   LIMITER     │   │    ENGINE        │               │
│   │             │   │               │   │                  │               │
│   │  HTML/TXT   │   │ Token Bucket  │   │ 3-level timeout  │               │
│   │  per channel│   │ 4-tier limits │   │ Auto-escalation  │               │
│   └──────┬──────┘   └───────┬───────┘   └────────┬─────────┘               │
│          │                  │                     │                         │
│          └──────────────────┼─────────────────────┘                         │
│                             ▼                                               │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    CHANNEL ADAPTERS                              │      │
│   │                                                                  │      │
│   │  ┌──────────────────────────┐  ┌──────────────────────────┐     │      │
│   │  │    TELEGRAM BOT API      │  │  WHATSAPP BUSINESS API   │     │      │
│   │  │                          │  │                          │     │      │
│   │  │  - HTML formatting       │  │  - Template messages     │     │      │
│   │  │  - Inline keyboards      │  │  - Session messages      │     │      │
│   │  │  - Media groups          │  │  - Interactive messages  │     │      │
│   │  │  - Edit/Delete messages  │  │  - Media attachments     │     │      │
│   │  │  - Webhook updates       │  │  - Message status API    │     │      │
│   │  └──────────┬───────────────┘  └──────────┬───────────────┘     │      │
│   └─────────────┼─────────────────────────────┼─────────────────────┘      │
│                 │                             │                              │
│                 ▼                             ▼                              │
│          ┌──────────────┐            ┌──────────────┐                       │
│          │  Telegram    │            │  WhatsApp    │                       │
│          │  Servers     │            │  Cloud API   │                       │
│          └──────────────┘            └──────────────┘                       │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────┐       │
│   │                    SUPPORTING SERVICES                           │       │
│   │                                                                  │       │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │       │
│   │  │ RETRY MGR    │  │    DLQ       │  │  DELIVERY TRACKER    │  │       │
│   │  │ Exponential  │  │ Redis-backed │  │  Webhook callbacks   │  │       │
│   │  │ 5 max        │  │ Admin review │  │  Status dashboard    │  │       │
│   │  └──────────────┘  └──────────────┘  └──────────────────────┘  │       │
│   └─────────────────────────────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────────────────────────────┘

Key Design Principles:

Principle Implementation
Guaranteed delivery At-least-once delivery via retry with exponential backoff; dead letter queue for permanent failures
Ordered processing Events within a single camera stream processed in sequence; no alert reordering
Non-blocking Alert generation does not block the detection pipeline; async processing via queues
Channel isolation Failure in one channel (e.g., Telegram down) does not affect the other (WhatsApp continues)
Deduplication 5-minute window for duplicate suppression; composite key based on camera + person + event type
Observability Every notification tracked from creation through delivery with full audit trail
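
The deduplication principle can be sketched as follows. This is an in-memory illustration with hypothetical names; a deployed version would more plausibly keep the keys in Redis with an expiry so that all router instances share them. The window here starts at the first alert and is not extended by suppressed duplicates, which is one possible reading of the 5-minute window.

```python
import time


class AlertDeduplicator:
    """Suppress repeats of (camera, person, event type) within a time window."""

    def __init__(self, window_s: float = 300.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self._first_seen = {}   # composite key -> time the window opened

    def is_duplicate(self, camera_id: str, person_id: str, event_type: str) -> bool:
        key = (camera_id, person_id, event_type)
        now = self.clock()
        opened = self._first_seen.get(key)
        if opened is not None and now - opened < self.window_s:
            return True         # still inside the window: suppress
        self._first_seen[key] = now
        return False
```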

11.2 Telegram Integration

11.2.1 Bot API Configuration

Telegram integration uses the official Telegram Bot API for message delivery. The bot is configured with encrypted tokens stored in HashiCorp Vault, with HTML message formatting for rich alert presentation.

Parameter Value Notes
API Base URL https://api.telegram.org/bot<TOKEN>/ Standard Bot API endpoint
API Version Bot API 7.x or later Pin to the version validated in testing
Token Storage HashiCorp Vault (AES-256-GCM encrypted) Rotated every 180 days
Communication HTTPS POST; updates via webhook (getUpdates long polling as fallback) TLS 1.3 required for all calls
Message Format HTML subset <b>, <i>, <code>, <pre>, <a href> tags supported
Max Message Size 4096 characters per message Longer messages auto-split into parts
Media Size Limit (Image) 10 MB per image Processed via Pillow for compression
Media Size Limit (Video) 50 MB per video Processed via FFmpeg for re-encoding
Media Group Limit Up to 10 items per media group Album delivery for multi-image alerts
Global Rate Limit 30 messages per second Across all chats
Per-Chat Rate Limit 1 message per second Per conversation throttling
Webhook Endpoint /webhooks/telegram Receives bot updates (commands, callback queries, membership changes)
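
The auto-split behaviour for messages over 4096 characters can be sketched as below. The function name is hypothetical, the limit is counted in Python characters, and the sketch splits plain text only: with HTML formatting, a real implementation would also need to avoid cutting inside a tag.

```python
from typing import List

TELEGRAM_MAX_CHARS = 4096


def split_message(text: str, limit: int = TELEGRAM_MAX_CHARS) -> List[str]:
    """Split text into parts of at most `limit` characters, preferring line breaks."""
    parts = []
    while len(text) > limit:
        # Prefer the last newline that keeps the part within the limit
        cut = text.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit          # no usable newline: hard split
        parts.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        parts.append(text)
    return parts
```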

11.2.2 Bot Features and Capabilities

Inline Keyboards: Every alert message includes contextual action buttons that allow operators to respond directly from Telegram without opening the web dashboard.

Keyboard Type Buttons Actions
Standard Alert Acknowledge / View Live / Details Confirm receipt, open stream, view full info
Watchlist Alert Acknowledge / View Live / Escalate / Details Includes escalation for watchlist matches
Blacklist Alert ACKNOWLEDGE NOW / View Live / Dispatch Security / Escalate / Details Highest priority actions for blacklist
Escalation Notice Acknowledge / View Original Alert Acknowledge escalated alert or view source
System Alert Acknowledge / View Dashboard / Details System-level alert actions

Media Groups: When an alert contains multiple evidence images (up to 10), they are sent as a Telegram media group (album). This presents all related images in a single scrollable gallery rather than individual messages, reducing chat clutter.

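Webhook Updates: Telegram delivers bot updates (incoming commands, button callbacks, edits, and membership changes) via webhooks. The Bot API does not provide per-message delivery or read receipts.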

Webhook Type Trigger Action
message Bot receives a command Process command (e.g., /status, /acknowledge)
callback_query User clicks inline button Execute action, update message status
edited_message User edits a message visible to the bot Log for audit trail
my_chat_member Bot added/removed from chat Update recipient group membership

Chat Commands:

Command Description Response
/status Get system health status Camera count, offline count, last alert time
/acknowledge <alert_id> Acknowledge an alert Confirmation or error message
/cameras List all cameras and their status Camera name, status, last seen
/health Get edge gateway health CPU, memory, disk, VPN status
/help Show available commands Command reference

11.2.3 Security Considerations

Telegram bot tokens are among the most sensitive credentials in the system. The following security measures are implemented:

Measure Implementation
Encryption at rest AES-256-GCM in Vault
Token rotation Every 180 days or immediately on compromise suspicion
Rotation procedure 1) Regenerate the token via BotFather (the previous token is invalidated immediately, so no grace period is possible), 2) Update Vault, 3) Notify services to hot-reload, 4) Send a test message to confirm delivery
IP allowlisting Webhook endpoint accepts only Telegram IP ranges
Webhook secret Secret token set via setWebhook (secret_token) and verified against the X-Telegram-Bot-Api-Secret-Token header on every incoming request
No token logging Tokens never appear in application logs
No token in code Tokens injected via Vault at runtime

11.3 WhatsApp Business API Integration

11.3.1 Meta Cloud API Configuration

WhatsApp integration uses Meta's official Cloud API (Business Platform), which provides a reliable, enterprise-grade messaging channel. This requires a verified Meta Business account and pre-approved message templates for proactive messaging.

Parameter Value Notes
API Base URL https://graph.facebook.com/v18.0/ Meta Graph API v18.0 minimum
Authentication Permanent Access Token Scoped to WhatsApp Business Management
Token Storage HashiCorp Vault (AES-256-GCM encrypted) Rotated every 180 days
Phone Number ID Dedicated business phone number Not shared with other WhatsApp uses
Business Account Verified Meta Business Account Required for template message approval
Message Types Template messages + Session messages Template for first contact; session for replies
Media Size Limit 5 MB per image; 16 MB per video or audio file Stricter than Telegram; aggressive compression needed
Supported Media JPEG, PNG, MP4 (H.264), PDF, Audio Format validation before upload
Global Rate Limit 80 messages per second Across all recipients
Per-Recipient Rate Limit 20 messages per minute Per WhatsApp ID throttling
Webhook Endpoint /webhooks/whatsapp Receives message status updates
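
The rate limits listed for both channels (Telegram: 30 messages per second globally and 1 per second per chat; WhatsApp: 80 per second globally and 20 per minute per recipient) are enforced by the token-bucket limiter named in the architecture diagram. A minimal single-process sketch follows; the class name and injectable clock are illustrative, and a multi-instance router would need shared state.

```python
import time


class TokenBucket:
    """Token bucket: refills at rate_per_s, holds at most `capacity` tokens."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate_per_s)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Example: the WhatsApp per-recipient limit of 20 messages per minute
per_recipient = TokenBucket(rate_per_s=20 / 60, capacity=20)
```

A send would typically need a token from both the global bucket and the per-chat or per-recipient bucket before it is dispatched.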

11.3.2 Message Types

Template Messages: Pre-approved message templates are required for any proactive (business-initiated) message. Templates must be created and submitted for approval in Meta Business Manager. Each template contains named parameters that are dynamically populated at send time.

Template Name Purpose Parameters Approval Status
person_detected_known Known person detected name, role, camera, date, time, confidence, alert_id Approved
person_detected_unknown Unknown person alert camera, date, time, confidence Approved
watchlist_match Person on watchlist detected name, watchlist_type, camera, date, time Approved
blacklist_alert Blacklisted person detected name, camera, date, time Approved
suspicious_activity Suspicious behavior detected activity_type, camera, date, time, confidence Approved
system_alert System health alert message, timestamp, severity Approved
escalation_notice Alert escalation notification alert_id, level, summary, elapsed_minutes Approved
daily_digest Daily summary of activity date, total_detections, total_alerts, top_cameras Approved
test_message System test timestamp Approved

Session Messages: Within a 24-hour window after a user sends a message to the business, free-form session messages can be sent without template restrictions. This is used for:

  • Acknowledgment confirmations
  • Escalation follow-ups
  • Interactive conversations initiated by the recipient
  • Quick reply responses

11.3.3 Webhook Event Handling

Webhook Event Trigger System Action
messages.delivered Message delivered to device Update delivery status to delivered
messages.read Recipient read the message Update delivery status to read
messages.failed Message delivery failed Trigger retry or move to DLQ
message_reaction Recipient reacted to message Log for engagement metrics
account_alerts Meta account issue Alert admin, review account status
template_category_update Template category changed by Meta Update template catalog

11.4 Alert Routing Rules Engine

11.4.1 Condition Types

The routing engine evaluates 9 distinct condition types to determine which recipients receive which alerts through which channels. Multiple conditions can be combined with AND/OR logic for precise targeting.

# Condition Type Description Example Values Operators
1 camera Source camera identifier "CAM-01", "CAM-02", "entrance-cam" equals, in, not_in
2 person Detected known person "John Smith", "Jane Doe" equals, in, not_in
3 role Person role category "employee", "visitor", "vendor", "contractor", "security" equals, in
4 event_type Type of detection event "person_detected", "unknown_person", "suspicious_activity", "crowd_gathering", "camera_tamper" equals, in
5 zone Detection zone name "entrance", "restricted_area", "parking", "lobby", "warehouse" equals, in
6 time Time of day range "08:00-18:00", "22:00-06:00" between, not_between
7 day Day of week "monday", "weekday", "weekend" equals, in
8 severity Alert severity level "critical", "high", "medium", "low", "info" equals, in, gte
9 watchlist Watchlist membership "vip", "blacklist", "authorized", "temporary_access" equals, in

11.4.2 Rule Structure

Each routing rule consists of conditions, actions, and metadata:

rule:
  id: "rule-001"
  name: "Blacklist Immediate Alert"
  enabled: true
  priority: 100  # Higher number = evaluated first
  
  conditions:
    operator: "AND"
    conditions:
      - field: "watchlist"
        operator: "equals"
        value: "blacklist"
      - field: "severity"
        operator: "in"
        value: ["critical", "high"]
  
  actions:
    - channel: "telegram"
      recipients: ["security_team", "management"]
      template: "blacklist_alert"
      media: ["image", "video"]
      bypass_quiet_hours: true
      priority: "high"
    
    - channel: "whatsapp"
      recipients: ["security_manager"]
      template: "blacklist_alert"
      media: ["image"]
      bypass_quiet_hours: true
  
  metadata:
    created_by: "admin"
    created_at: "2025-01-01T00:00:00Z"
    last_modified: "2025-01-10T12:00:00Z"
    tags: ["critical", "blacklist"]
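
A minimal sketch of how a rule shaped like the one above could be evaluated. The operator names come from the condition-type table; the event dictionary shape, the function names, and the decision to return every matching rule in priority order are assumptions made for illustration. The time-range operators (between, not_between) are left out to keep the example short.

```python
from typing import Any, Dict, List

# Severity ranking used by the "gte" operator (lowest to highest).
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def match_condition(cond: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Evaluate one leaf condition against an event dictionary."""
    actual = event.get(cond["field"])
    op, expected = cond["operator"], cond["value"]
    if actual is None:
        return False  # a field missing from the event never matches
    if op == "equals":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "gte":
        return SEVERITY_ORDER.index(actual) >= SEVERITY_ORDER.index(expected)
    raise ValueError(f"unsupported operator: {op}")


def match_group(group: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Evaluate an AND/OR group; nested groups are handled recursively."""
    results = (
        match_group(c, event) if "conditions" in c else match_condition(c, event)
        for c in group["conditions"]
    )
    return all(results) if group["operator"] == "AND" else any(results)


def matching_rules(rules: List[Dict[str, Any]], event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return enabled rules that match, highest priority first."""
    ordered = sorted(rules, key=lambda r: r["priority"], reverse=True)
    return [
        r for r in ordered
        if r.get("enabled", True) and match_group(r["conditions"], event)
    ]


blacklist_rule = {
    "id": "rule-001",
    "enabled": True,
    "priority": 100,
    "conditions": {
        "operator": "AND",
        "conditions": [
            {"field": "watchlist", "operator": "equals", "value": "blacklist"},
            {"field": "severity", "operator": "in", "value": ["critical", "high"]},
        ],
    },
}
```

Calling matching_rules([blacklist_rule], {"watchlist": "blacklist", "severity": "high"}) returns the rule, while an event with watchlist "vip" returns an empty list.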

11.4.3 Default Routing Rules

The system ships with a comprehensive set of default routing rules that cover common surveillance scenarios:

# Scenario Conditions Severity Recipients Channels Media Quiet Hours
1 Known employee normal hours role=employee, time=08:00-18:00, weekday Info None (log only) N/A N/A N/A
2 Known employee after hours role=employee, time=18:00-08:00 Low Security team Telegram Image Respected
3 Known visitor during hours role=visitor, time=08:00-18:00 Low Reception desk Telegram Image Respected
4 Unknown person detected event_type=unknown_person Medium Security team Telegram + WhatsApp Image Respected
5 Unknown person after hours event_type=unknown_person, time=22:00-06:00 High Security team + Manager Both Image + Video Bypassed
6 Watchlist match watchlist=watchlist High Security team Both Image + Video Respected
7 Blacklist match watchlist=blacklist Critical All groups Both (bypass quiet) Image + Video Bypassed
8 VIP detected watchlist=vip Low Reception desk Telegram Image Respected
9 Camera offline event_type=camera_offline High IT team + Security team Telegram None Bypassed
10 Storage > 90% event_type=storage_warning High IT team + Management Both None Bypassed
11 Storage > 95% event_type=storage_critical Critical All groups Both (bypass quiet) None Bypassed
12 VPN tunnel down event_type=vpn_down Critical IT team + Management Both (bypass quiet) None Bypassed
13 Suspicious activity event_type=suspicious_activity High Security team Both Image + Video Respected
14 Crowd gathering event_type=crowd_gathering Medium Security team Telegram Image Respected

11.5 Recipient Groups and Quiet Hours

11.5.1 Recipient Group Management

Recipient groups are the primary mechanism for organizing alert destinations. Each group contains one or more contacts with specified channels.

Group Name Members Primary Channel Backup Channel Alert Preferences Quiet Hours
Security Team On-site security guards Telegram WhatsApp All except info Disabled
Security Manager Shift supervisor WhatsApp Telegram Medium and above Disabled
IT Team Infrastructure staff Telegram WhatsApp System alerts only Nights
Management Facility managers WhatsApp Telegram Critical only Disabled
Reception Front desk staff Telegram None Visitor-related, VIP Disabled
After-Hours On-call personnel WhatsApp Telegram High and Critical Disabled

Group Configuration Interface:

Groups are managed through the web dashboard at /settings/notifications/groups. Each group can be configured with:

Setting Description
Group name Human-readable identifier
Description Purpose of the group
Members List of Telegram chat IDs and WhatsApp phone numbers
Default channel Primary delivery channel
Alert severity filter Minimum severity to deliver
Quiet hours override Whether quiet hours apply to this group
Media preferences Which media types to include
Max alerts per hour Rate limit for this group

11.5.2 Quiet Hours Configuration

Quiet hours allow suppressing non-critical alerts during configured time windows. Critical alerts always bypass quiet hours — this is a non-configurable safety measure.

quiet_hours:
  enabled: false                    # DISABLED BY DEFAULT for security
  preset: "none"                    # none / nights / weekends / custom
  
  custom_schedule:
    - label: "Weekday Nights"
      days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      start_time: "22:00"
      end_time: "06:00"
      timezone: "Asia/Kolkata"
      
    - label: "Weekend All Day"
      days: ["Saturday", "Sunday"]
      start_time: "00:00"
      end_time: "23:59"
      timezone: "Asia/Kolkata"
  
  allowed_during_quiet:             # Which severities bypass quiet hours
    - "critical"                    # Always delivered (non-configurable)
    
  emergency_bypass:
    enabled: true
    triggers:
      - severity: "critical"
      - tag: "emergency"
      - rule_override: "bypass_quiet_hours"
    notification_method: "all_channels"
    
  suppression_behavior: "queue"     # queue / discard / digest
  # "queue": Hold until quiet hours end
  # "discard": Drop non-critical alerts entirely
  # "digest": Send summary when quiet hours end

Security Note: Quiet hours are disabled by default because the surveillance use case requires continuous awareness. Any decision to enable quiet hours must be documented with security team sign-off.
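
The overnight window (22:00 to 06:00) is the part of this configuration that is easiest to get wrong, because it spans two calendar days. The sketch below shows one way to evaluate it. It assumes the caller has already converted the current time to the schedule's timezone (for example with datetime.astimezone and zoneinfo), that a window belongs to the day on which it starts, and that the dictionary mirrors the YAML keys. The function names and the QUIET_HOURS sample (enabled here only for demonstration) are hypothetical.

```python
from datetime import datetime, time, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Sample config mirroring the YAML above, enabled only for this demonstration.
QUIET_HOURS = {
    "enabled": True,
    "allowed_during_quiet": ["critical"],
    "custom_schedule": [
        {"days": DAY_NAMES[:5], "start_time": "22:00", "end_time": "06:00"},
        {"days": DAY_NAMES[5:], "start_time": "00:00", "end_time": "23:59"},
    ],
}


def in_quiet_window(local_now: datetime, entry: dict) -> bool:
    """True if local_now falls inside one schedule entry."""
    start = time.fromisoformat(entry["start_time"])
    end = time.fromisoformat(entry["end_time"])
    today = DAY_NAMES[local_now.weekday()]
    yesterday = DAY_NAMES[(local_now - timedelta(days=1)).weekday()]
    now_t = local_now.time()
    if start <= end:
        # Same-day window, e.g. 00:00 to 23:59
        return today in entry["days"] and start <= now_t <= end
    # Overnight window: the evening part belongs to the start day,
    # the early-morning part to the day after it.
    return (today in entry["days"] and now_t >= start) or \
           (yesterday in entry["days"] and now_t < end)


def should_suppress(severity: str, local_now: datetime, config: dict) -> bool:
    """Apply the global switch, the severity bypass, then the schedule."""
    if not config.get("enabled", False):
        return False
    if severity in config.get("allowed_during_quiet", ["critical"]):
        return False
    return any(in_quiet_window(local_now, e)
               for e in config.get("custom_schedule", []))
```

With this sample, a high-severity alert at 23:00 on a Monday is suppressed, while a critical alert at the same moment is delivered.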

11.5.3 Per-Recipient Quiet Hours

Individual recipients can configure personal quiet hours that override group settings:

Recipient Personal Quiet Hours Group Override Effect
Security Guard A None Security Team (Disabled) Receives all alerts
IT Manager 23:00-07:00 IT Team (Nights) Matches group — no IT alerts at night
Manager B 22:00-08:00 Management (Disabled) Personal quiet hours applied

11.6 Message Templates

11.6.1 Telegram HTML Templates

All Telegram templates use a safe HTML subset for rich formatting with inline action keyboards.

Template: Person Detected (Known)

🔍 <b>Person Detected</b>

<b>{name}</b> ({role})
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<a href="{dashboard_url}">View in Dashboard</a>

Template: Unknown Person Detected

❓ <b>Unknown Person Detected</b>

📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<i>This person is not in the database.</i>

<a href="{naming_url}">Name This Person</a>

Template: Watchlist Match

⚠️ <b>WATCHLIST ALERT</b>

<b>{name}</b>
📋 Watchlist: {watchlist_type}
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<i>This person is on a watchlist and requires attention.</i>

Template: Blacklist Alert

🚨 <b>BLACKLIST ALERT</b> 🚨

⚠️ <b>{name}</b> has been detected!
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%

<b>This person is BLACKLISTED. Immediate attention required.</b>

<a href="{dispatch_url}">🚨 Dispatch Security</a>

Template: Escalation Notice

⬆️ <b>Alert Escalated — Level {escalation_level}</b>

Alert #{alert_id} has been escalated.

Original: {alert_summary}
⏱️ Unacknowledged for {elapsed_minutes} minutes
Threshold: {threshold_minutes} minutes

<i>Please review immediately.</i>

Template: System Alert

⚙️ <b>System Alert</b>

{message}

🕐 {timestamp}
Severity: {severity}

<a href="{health_dashboard_url}">View System Health</a>

Template: Daily Digest

📊 <b>Daily Activity Digest — {date}</b>

👥 Persons Detected: {total_detections}
🔔 Alerts Generated: {total_alerts}
📹 Cameras Online: {cameras_online}/{cameras_total}

Top Cameras:
{camera_list}

<a href="{full_report_url}">View Full Report</a>

11.6.2 WhatsApp Template Format

WhatsApp templates use a different format — they are pre-registered with Meta and use numbered parameter substitution:

Template: person_detected_known

🔍 Person Detected

{{1}} ({{2}})
📍 Camera: {{3}}
🕐 {{4}} at {{5}}
🎯 Confidence: {{6}}%
Alert ID: {{7}}

Parameters: {{1}}=name, {{2}}=role, {{3}}=camera_name, {{4}}=date, {{5}}=time, {{6}}=confidence, {{7}}=alert_id
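
Because WhatsApp parameters are positional, the named alert variables have to be placed in the registered order before sending. The sketch below builds a request body in the shape the WhatsApp Cloud API expects for template messages. The helper name, the PERSON_DETECTED_KNOWN ordering constant, and the "en" language code are assumptions; the language must match whatever was registered with Meta.

```python
from typing import Dict, List

# Positional order of {{1}}..{{7}} for the person_detected_known template.
PERSON_DETECTED_KNOWN = [
    "name", "role", "camera_name", "date", "time", "confidence", "alert_id",
]


def build_template_payload(to: str, template_name: str, order: List[str],
                           values: Dict[str, object],
                           language: str = "en") -> dict:
    """Map named variables to numbered body parameters, in order."""
    missing = [key for key in order if key not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {
            "name": template_name,
            "language": {"code": language},
            "components": [{
                "type": "body",
                "parameters": [
                    {"type": "text", "text": str(values[key])} for key in order
                ],
            }],
        },
    }
```

Failing fast on a missing variable keeps a malformed request from reaching the provider, where it would be rejected as a non-retryable 400.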

11.6.3 Template Variable Reference

Variable Description Source Example
{name} Detected person's name Person database "John Smith"
{role} Person's role Person database "Employee"
{camera_name} Camera display name Camera configuration "Main Entrance"
{date} Event date Event timestamp "2025-01-16"
{time} Event time Event timestamp "14:32:15"
{confidence} Detection confidence % AI inference result "97.3"
{alert_id} Unique alert identifier Alert database "ALT-20250116-001"
{watchlist_type} Watchlist category Watchlist configuration "Blacklist"
{activity_type} Type of suspicious activity AI classification "Loitering"
{severity} Alert severity Rules engine "Critical"
{dashboard_url} Deep link to dashboard System configuration "https://..."
{elapsed_minutes} Time since alert creation System clock "15"
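
Values such as person names come from operator input, so they should be escaped before being substituted into a Telegram HTML template; otherwise a stray "<" or "&" makes the message fail to parse. A minimal sketch using only the standard library follows. The function name is hypothetical, and it assumes the templates use the single-brace placeholders listed above and contain no other literal braces.

```python
import html
from typing import Dict


def render_telegram(template: str, variables: Dict[str, object]) -> str:
    """Substitute {placeholders} after HTML-escaping every value."""
    safe = {key: html.escape(str(value)) for key, value in variables.items()}
    # str.format raises KeyError if the template needs a variable
    # that was not supplied, which surfaces template bugs early.
    return template.format(**safe)
```

For example, rendering "<b>{name}</b> ({role})" with the name "A <b>& B" keeps the template's own bold tags intact while the value's markup is neutralized.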

11.7 Retry Logic and Rate Limiting

11.7.1 Retry Configuration

Failed notifications are retried using an exponential backoff strategy to avoid overwhelming downstream services.

Parameter Value Description
Maximum retries 5 After the initial attempt and all 5 retries fail (6 attempts total), move to DLQ
Base delay 2 seconds Initial retry wait time
Exponential base 2 Delay multiplier (2^n)
Maximum delay 300 seconds (5 minutes) Cap on retry delay
Jitter Up to 1 second random Prevents thundering herd

Retry Schedule:

Attempt Delay Cumulative Time
1 (initial) Immediate 0s
2 2s + jitter ~2s
3 4s + jitter ~6s
4 8s + jitter ~14s
5 16s + jitter ~30s
6 (final) 32s + jitter ~62s
DLQ After 62s total
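
The schedule follows directly from the parameters: retry n waits base_delay * 2^(n-1) seconds, capped at the maximum delay, plus jitter. A small sketch, with the function name and the injectable rng argument as assumptions added for testability:

```python
import random
from typing import Callable


def retry_delay(retry_number: int, base: float = 2.0, factor: float = 2.0,
                cap: float = 300.0, jitter: float = 1.0,
                rng: Callable[[], float] = random.random) -> float:
    """Delay in seconds before retry `retry_number` (1-based)."""
    delay = min(base * factor ** (retry_number - 1), cap)
    return delay + rng() * jitter  # jitter is added after the cap


# Without jitter the five retries wait 2, 4, 8, 16 and 32 seconds,
# which sums to the 62 seconds shown in the schedule.
schedule = [retry_delay(n, rng=lambda: 0.0) for n in range(1, 6)]
```

The 300-second cap never triggers with only five retries; it matters if the maximum retry count is raised later.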

Retryable Errors:

Error Code Description Retry?
Timeout Request timed out Yes
429 Too Many Requests Rate limited by provider Yes (with longer delay)
500 Internal Server Error Provider error Yes
502 Bad Gateway Provider gateway error Yes
503 Service Unavailable Provider temporarily down Yes
409 Conflict Request conflict Yes
401 Unauthorized Authentication failed No (credential issue)
403 Forbidden Permission denied No (configuration issue)
400 Bad Request Invalid request No (template/parameter issue)
Chat not found Chat ID is invalid or the recipient never started the bot No

Non-Retryable Errors (Immediate DLQ):

  • Invalid bot token (401)
  • Bot blocked by user (403)
  • Chat not found
  • Malformed template (400)
  • Message too long (after split)
  • Unsupported media format

11.7.2 Circuit Breaker

Each channel adapter implements a circuit breaker to prevent cascading failures:

Parameter Value
Failure threshold 10 consecutive failures
Open state duration 60 seconds
Half-open test calls 3 successful calls required
Monitoring window 5 minutes

Circuit States:

State Behavior Transition Trigger
Closed Normal operation — all requests pass Initial state, or after half-open success
Open Fast fail — no requests sent to provider 10 consecutive failures
Half-Open Limited test requests allowed After 60-second open timeout
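
The three states can be sketched as a small state machine. The thresholds are the ones in the table. Two behaviors are assumptions because the tables do not specify them: a failure during half-open reopens the circuit immediately, and the 5-minute monitoring window is not modeled. The class and method names are hypothetical.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 10, open_seconds: float = 60.0,
                 half_open_successes: int = 3,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.half_open_successes = half_open_successes
        self.clock = clock
        self.state = "closed"
        self.failures = 0      # consecutive failures while closed
        self.successes = 0     # consecutive successes while half-open
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Fast-fail while open; move to half-open after the timeout."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            self.state = "half_open"
            self.successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.half_open_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # any success resets the consecutive count

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._trip()  # assumed: one failed probe reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A channel adapter would call allow_request() before each send and report the outcome with record_success() or record_failure().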

11.7.3 Rate Limiting Tiers

The notification system implements multi-tier rate limiting to prevent abuse and ensure fair resource distribution:

Tier Limit Scope Burst
Global (all channels) 200 messages/minute Across all channels combined 20
Telegram Global 30 messages/second All Telegram traffic 5
Telegram Per-Chat 1 message/second Per conversation 1
WhatsApp Global 80 messages/second All WhatsApp traffic 10
WhatsApp Per-Recipient 20 messages/minute Per phone number 3
Per Camera Source 30 alerts/minute Prevents camera spam 5
Per Severity (Critical) No limit Critical alerts bypass the platform's internal rate limits (provider-imposed limits still apply) N/A

Token Bucket Algorithm: Each tier maintains a token bucket. A token is consumed per message. Tokens replenish at the configured rate. If no tokens are available, the message is queued or rejected based on priority.
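
A minimal token bucket sketch for one tier. The rate and burst values in the usage lines are taken from the WhatsApp per-recipient row (20 messages per minute, burst 3). The class name, method name, and injectable clock are assumptions, and the queue-or-reject decision is left to the caller.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so a burst is allowed
        self.clock = clock
        self.last = clock()

    def try_consume(self, n: int = 1) -> bool:
        """Refill based on elapsed time, then take n tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# WhatsApp per-recipient tier: 20 messages/minute with a burst of 3.
per_recipient = TokenBucket(rate_per_sec=20 / 60, burst=3)
```

When try_consume() returns False, the caller decides whether to queue the message or reject it based on its priority.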

11.7.4 Alert Deduplication

Alerts are deduplicated to prevent notification spam when the same event triggers repeatedly:

Deduplication Key Components Window Action on Duplicate
Known person camera_id + person_id + event_type 5 minutes Suppress, append counter to original
Unknown person camera_id + event_type 5 minutes Suppress, append counter to original
System alert alert_type + source_id 15 minutes Suppress, update existing message
Watchlist match camera_id + person_id + watchlist_id 10 minutes Suppress, append counter

When a duplicate is detected, the original message is updated with a counter (e.g., "+3 more detections"), avoiding a flood of similar messages.
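
The deduplication table can be sketched as a keyed lookup with a time window. This in-memory version is only illustrative: a production deployment would more likely keep the keys in Redis with a TTL. Anchoring the window at the first occurrence (rather than sliding it on every duplicate) is an assumption, as are the category labels and class name.

```python
from typing import Dict, List, Tuple

# category -> (fields that form the key, window in seconds)
DEDUP_RULES = {
    "known_person": (("camera_id", "person_id", "event_type"), 5 * 60),
    "unknown_person": (("camera_id", "event_type"), 5 * 60),
    "system_alert": (("alert_type", "source_id"), 15 * 60),
    "watchlist_match": (("camera_id", "person_id", "watchlist_id"), 10 * 60),
}


class Deduplicator:
    def __init__(self) -> None:
        # key -> [timestamp of first occurrence, duplicates seen so far]
        self._seen: Dict[Tuple, List[float]] = {}

    def check(self, category: str, event: dict, now: float) -> int:
        """Return 0 for a new alert, or the running duplicate count."""
        fields, window = DEDUP_RULES[category]
        key = (category,) + tuple(event[f] for f in fields)
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] < window:
            entry[1] += 1
            return int(entry[1])   # caller suppresses and updates "+N more"
        self._seen[key] = [now, 0]
        return 0                   # caller sends a fresh notification
```

A return value of 3 corresponds to editing the original message to read "+3 more detections".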

11.8 Escalation Rules

11.8.1 Escalation Thresholds

When an alert goes unacknowledged, it automatically escalates through up to 3 levels, each with increasing urgency and broader recipient distribution.

Severity Level 1 (Primary) Level 2 (Secondary) Level 3 (Final)
Critical 5 minutes 10 minutes 20 minutes
High 15 minutes 30 minutes 60 minutes
Medium 30 minutes 60 minutes 120 minutes
Low 60 minutes 120 minutes 240 minutes
Info Never Never Never

11.8.2 Escalation Actions per Level

Level Name Notification Action Recipient Expansion Severity Change
0 Original Standard routing rules Primary recipients only Original severity
1 Primary Re-notify with escalation prefix Add management group Increase by one level
2 Secondary Force all channels, bypass quiet hours Add all groups Increase by one level
3 Final All-hands notification, include audit trail All configured recipients Set to Critical
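
The two escalation tables combine into a simple lookup: elapsed time since alert creation selects the level, and the level determines the effective severity. In this sketch the thresholds are the ones listed above; treating the per-level severity increases as cumulative is an assumption, and the function names are hypothetical.

```python
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]

# Minutes since alert creation at which levels 1, 2 and 3 are reached.
ESCALATION_MINUTES = {
    "critical": (5, 10, 20),
    "high": (15, 30, 60),
    "medium": (30, 60, 120),
    "low": (60, 120, 240),
    "info": (),                 # info alerts never escalate
}


def escalation_level(severity: str, elapsed_minutes: float) -> int:
    """Number of thresholds already crossed (0 means not escalated)."""
    return sum(1 for t in ESCALATION_MINUTES[severity] if elapsed_minutes >= t)


def escalated_severity(severity: str, level: int) -> str:
    """Severity after escalation: +1 per level, forced to critical at level 3."""
    if level >= 3:
        return "critical"
    index = min(SEVERITY_ORDER.index(severity) + level, len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[index]
```

For a high-severity alert left unacknowledged for 31 minutes, escalation_level returns 2 and escalated_severity returns "critical".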

Escalation Cancellation: Acknowledgment cancels ALL pending escalation timers for an alert. Acknowledgment can occur via:

  • Telegram inline "Acknowledge" button click
  • WhatsApp quick reply "Ack"
  • Web dashboard "Acknowledge" button
  • REST API POST /api/v1/alerts/{id}/acknowledge
  • Chat command /acknowledge {alert_id}

11.8.3 Escalation Notification Template

⬆️ <b>ESCALATION — Level {level}</b>

Original Alert: {alert_summary}
Alert ID: {alert_id}
First Detected: {first_detected_time}
Current Time: {current_time}
Unacknowledged: {elapsed_minutes} minutes
Escalation Threshold: {threshold_minutes} minutes

This alert has been escalated because it has not been acknowledged.
Please review immediately.

<a href="{acknowledge_url}">✅ Acknowledge Now</a>
<a href="{view_alert_url}">👁 View Details</a>

11.9 Media Attachment Handling

11.9.1 Media Processing Pipeline

When an alert includes media (snapshot images or video clips), a multi-stage processing pipeline ensures the media meets channel-specific requirements:

Original Media (from detection)
       │
       ▼
┌──────────────────┐
│  1. Store Original│  ──▶ MinIO/S3 (full resolution archival)
│     in Storage    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  2. Process for  │
│     Telegram     │
└────────┬─────────┘
         │
         ├──▶ Image: Resize 1280x720, JPEG quality 85, max 10 MB
         ├──▶ Video: H.264, 1280x720, max 50 MB, max 60 seconds
         └──▶ Media Group: Each image < 10 MB, max 10 items
         │
         ▼
┌──────────────────┐
│  3. Process for  │
│     WhatsApp     │
└────────┬─────────┘
         │
         ├──▶ Image: Resize 1600x900, JPEG quality 80, max 5 MB
         └──▶ Video: H.264, 1280x720, max 16 MB, max 60 seconds

11.9.2 Image Processing Details

Step Operation Parameters
1. Load Open source image Pillow (PIL)
2. Convert Convert to RGB Drop alpha channel if present
3. Resize Scale to target dimensions Lanczos resampling
4. Compress JPEG encoding Quality: 85 (Telegram), 80 (WhatsApp)
5. Check size Verify file size under limit If over limit, reduce quality iteratively
6. Fallback Aggressive compression If quality < 50 and still over limit, reduce dimensions

Iterative Quality Reduction:

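import io

from PIL import Image


def compress_image_to_limit(image, size_limit_mb, channel, min_side=64):
    """Return JPEG bytes no larger than size_limit_mb for the given channel."""
    if image.mode != 'RGB':
        image = image.convert('RGB')  # JPEG cannot store an alpha channel
    size_limit_bytes = size_limit_mb * 1024 * 1024
    start_quality = 85 if channel == 'telegram' else 80
    min_quality = 40

    while True:
        # Step quality down by 5 until the encoded size fits
        for quality in range(start_quality, min_quality - 1, -5):
            buffer = io.BytesIO()
            image.save(buffer, format='JPEG', quality=quality, optimize=True)
            if buffer.tell() <= size_limit_bytes:
                return buffer.getvalue()

        # Still over the limit: reduce dimensions by 25% and retry
        new_size = (int(image.width * 0.75), int(image.height * 0.75))
        if min(new_size) < min_side:
            # Guard against shrinking forever on an unreachable limit
            raise ValueError('image cannot be compressed under the size limit')
        image = image.resize(new_size, Image.LANCZOS)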

11.9.3 Video Processing Details

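Videos are processed with FFmpeg using bitrate-constrained encoding, with the target bitrate derived from the channel's size limit and the clip duration. The command shown is a single pass with a capped peak rate; a true two-pass encode would run it twice, once with -pass 1 and once with -pass 2.

# Calculate total bitrate: (size_limit_bytes * 8) / duration_seconds
# Example: 16 MB limit, 10 second clip = (16*1024*1024*8) / 10 = ~13.4 Mbps
# The video bitrate must stay below that total to leave room for the
# 128 kbps audio track and container overhead, hence 10M target / 12M peak.
#
# -b:v       target video bitrate
# -maxrate   maximum (peak) bitrate
# -bufsize   rate-control buffer size
# -movflags  +faststart moves the index to the front (web-optimized)
# -preset    encoding speed/quality tradeoff

ffmpeg -y -i input.mp4 \
    -c:v libx264 \
    -b:v 10M \
    -maxrate 12M \
    -bufsize 20M \
    -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
    -c:a aac -b:a 128k \
    -movflags +faststart \
    -preset fast \
    output.mp4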
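
The bitrate arithmetic in the comment can be written as a small helper that also reserves room for the audio track. The 5% safety margin for container overhead and the function name are assumptions, not values from the design.

```python
def target_video_bitrate_kbps(size_limit_mb: float, duration_s: float,
                              audio_kbps: int = 128,
                              safety: float = 0.95) -> int:
    """Video bitrate (kbps) that keeps the whole file under the size limit."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    total_bits = size_limit_mb * 1024 * 1024 * 8
    total_kbps = total_bits / duration_s / 1000
    # Keep a margin for container overhead, then subtract the audio track.
    video_kbps = int(total_kbps * safety - audio_kbps)
    if video_kbps <= 0:
        raise ValueError("clip is too long for this size limit")
    return video_kbps
```

For a 16 MB limit and a 10-second clip this gives roughly 12.6 Mbps for video, a little under the 13.4 Mbps total computed above. The result could be passed to FFmpeg as the -b:v value with a "k" suffix.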

11.10 Delivery Tracking

11.10.1 Delivery Status Lifecycle

Every notification progresses through a well-defined status lifecycle, tracked in the database for audit and troubleshooting:

Status Description Terminal?
pending Queued, waiting to be sent No
processing Currently being sent to provider No
sent API request to provider succeeded No
delivered Provider confirmed delivery to device No
read Recipient opened/read the message No
engaged User interacted (button click, reaction) Yes
failed Permanently failed (non-retryable error) Yes
retrying Scheduled for retry attempt No
dead_letter Moved to DLQ after all retries exhausted Yes
suppressed Blocked by quiet hours or deduplication Yes
cancelled Cancelled (e.g., acknowledged before send) Yes
expired Message TTL expired before delivery Yes

Status Transitions:

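pending → processing → sent → delivered → read → engaged

Branches off the main path:
  pending    → suppressed | cancelled | expired
  processing → retrying (retryable error) → processing
  processing → failed (non-retryable error)
  retrying   → dead_letter (all retries exhausted)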

11.10.2 Dead Letter Queue (DLQ)

Failed notifications that exhaust all retry attempts are moved to a Redis-backed Dead Letter Queue. Admin users can review and manage DLQ entries through the web dashboard.

DLQ Feature Description
Storage Redis sorted set, ordered by failure timestamp
Retention 30 days
View Filterable by channel, error type, date range
Actions Retry individual, Retry all (batch), Discard, Export
Alert Daily digest of DLQ count; alert if > 10 entries
Auto-retry Optional: automatically retry DLQ entries every 6 hours

11.11 API Endpoints Summary

11.11.1 REST Endpoints (13 endpoints)

# Method Endpoint Purpose Auth
1 GET /api/v1/notifications/rules List all routing rules Admin
2 POST /api/v1/notifications/rules Create new routing rule Admin
3 GET /api/v1/notifications/rules/{id} Get specific rule Admin
4 PUT /api/v1/notifications/rules/{id} Update routing rule Admin
5 DELETE /api/v1/notifications/rules/{id} Delete routing rule Admin
6 GET /api/v1/notifications/templates List message templates Admin
7 POST /api/v1/notifications/templates Create/update template Admin
8 GET /api/v1/notifications/delivery-status/{alert_id} Get delivery status for alert Operator+
9 GET /api/v1/notifications/{id}/status Single notification status Operator+
10 POST /api/v1/notifications/{id}/retry Manual retry of failed notification Admin
11 GET /api/v1/notifications/dlq List dead letter queue Admin
12 POST /api/v1/notifications/dlq/retry-all Retry all DLQ entries Admin
13 POST /api/v1/notifications/dlq/clear Clear all DLQ entries Admin

11.11.2 Alert Management Endpoints

# Method Endpoint Purpose Auth
1 GET /api/v1/alerts List alerts with filters Operator+
2 GET /api/v1/alerts/{id} Get single alert details Operator+
3 POST /api/v1/alerts/{id}/acknowledge Acknowledge alert Operator+
4 POST /api/v1/alerts/{id}/resolve Resolve alert Operator+
5 POST /api/v1/alerts/{id}/ignore Ignore alert Operator+
6 POST /api/v1/alerts/{id}/false-positive Mark as false positive Operator+
7 POST /api/v1/alerts/bulk/acknowledge Bulk acknowledge Operator+
8 POST /api/v1/alerts/bulk/ignore Bulk ignore Operator+

11.11.3 WebSocket Endpoints (2 endpoints)

Endpoint Purpose Authentication
WS /api/v1/notifications/live Real-time notification stream for connected clients JWT token in query parameter
WS /api/v1/alerts/stream Live alert feed for operator dashboards JWT token in query parameter

11.11.4 Webhook Endpoints (2 endpoints)

Endpoint Source Purpose
POST /webhooks/telegram Telegram servers Receive incoming messages, callback queries, chat events
POST /webhooks/whatsapp Meta servers Receive message status updates, incoming messages

Webhook Security:

Measure Implementation
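Telegram Secret token check: the X-Telegram-Bot-Api-Secret-Token header must match the secret_token registered via setWebhook
WhatsApp HMAC-SHA256 signature verification of the raw request body (X-Hub-Signature-256 header) using app secret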
IP allowlisting Only accept requests from Telegram/Meta IP ranges
Replay protection Reject messages with timestamps older than 5 minutes
Rate limiting 100 requests per minute per source IP
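
A sketch of the two authenticity checks using only the standard library. Meta signs the raw request body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as "sha256=<hex>". Telegram's Bot API does not sign the body; it echoes the secret token supplied when the webhook was registered in the X-Telegram-Bot-Api-Secret-Token header. The function names are hypothetical, and extracting headers from the web framework is left out.

```python
import hashlib
import hmac
from typing import Optional


def verify_whatsapp_signature(app_secret: str, raw_body: bytes,
                              header_value: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header against the raw body."""
    prefix = "sha256="
    if not header_value or not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body,
                        hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking the match position through timing
    return hmac.compare_digest(expected.encode("utf-8"),
                               received.encode("utf-8"))


def verify_telegram_secret(configured_secret: str,
                           header_value: Optional[str]) -> bool:
    """Check the X-Telegram-Bot-Api-Secret-Token header."""
    if header_value is None:
        return False
    return hmac.compare_digest(configured_secret.encode("utf-8"),
                               header_value.encode("utf-8"))
```

The signature must be computed over the bytes exactly as received, before any JSON parsing or re-serialization.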

Section 12: Security Design

12.1 Security Architecture Overview

The Sentinel AI Surveillance Platform implements defense-in-depth security across seven distinct layers. Every component — from network perimeter to data storage — has been designed with security as a primary consideration, reflecting the sensitive nature of surveillance data, biometric information, and the critical safety function the system performs.

┌──────────────────────────────────────────────────────────────────────────────┐
│                      DEFENSE IN DEPTH ARCHITECTURE                            │
│                                                                              │
│   LAYER 1: PERIMETER                                                         │
│   ─────────────────                                                          │
│   AWS WAF v2 │ Geo-restriction │ DDoS protection │ Rate limiting             │
│                                                                              │
│   LAYER 2: TRANSPORT                                                         │
│   ─────────────────                                                          │
│   TLS 1.3 │ mTLS internal │ WireGuard ChaCha20-Poly1305 │ Certificate mgmt  │
│                                                                              │
│   LAYER 3: AUTHENTICATION & AUTHORIZATION                                    │
│   ─────────────────────────────────────────                                  │
│   Argon2id │ JWT ES256 │ TOTP MFA │ RBAC 4 roles │ API keys                 │
│                                                                              │
│   LAYER 4: APPLICATION SECURITY                                              │
│   ────────────────────────────                                               │
│   Input validation │ Parameterized queries │ CSP │ CSRF │ CORS │ File upload │
│                                                                              │
│   LAYER 5: DATA SECURITY                                                     │
│   ────────────────────                                                       │
│   AES-256-GCM at rest │ Field-level encryption │ Signed URLs │ Key rotation  │
│                                                                              │
│   LAYER 6: NETWORK SEGMENTATION                                              │
│   ───────────────────────────                                                │
│   VPC private subnets │ Security groups │ Network Policies │ Firewall rules  │
│                                                                              │
│   LAYER 7: AUDIT & MONITORING                                                │
│   ─────────────────────────                                                  │
│   Hash-chain audit log │ Real-time alerts │ CloudTrail │ Flow Logs           │
└──────────────────────────────────────────────────────────────────────────────┘

12.2 SSL/TLS Configuration

12.2.1 Protocol and Cipher Suite Requirements

All external-facing services enforce strong TLS configuration with modern cipher suites:

Setting Value Rationale
Minimum TLS Version TLS 1.2 Fallback for older clients; TLS 1.3 preferred
Preferred TLS Version TLS 1.3 Fastest, most secure handshake
Cipher Suites (TLS 1.2) ECDHE-ECDSA-AES256-GCM-SHA384 Forward secrecy, AES-GCM authenticated encryption
Cipher Suites (TLS 1.2) ECDHE-RSA-AES256-GCM-SHA384 Same with RSA certificates
Cipher Suites (TLS 1.2) ECDHE-ECDSA-CHACHA20-POLY1305 Mobile-optimized cipher
Cipher Suites (TLS 1.2) ECDHE-RSA-CHACHA20-POLY1305 Mobile-optimized with RSA
Cipher Suites (TLS 1.3) TLS_AES_256_GCM_SHA384 Preferred TLS 1.3 cipher (256-bit AES-GCM)
Cipher Suites (TLS 1.3) TLS_CHACHA20_POLY1305_SHA256 Alternative TLS 1.3 cipher
Disabled Ciphers CBC mode, RC4, 3DES, DES, MD5, SHA1, RSA key exchange (no forward secrecy) Known weaknesses
HSTS max-age=63072000; includeSubDomains; preload 2-year HSTS with preload eligibility
OCSP Stapling Enabled Reduces certificate validation latency
Certificate Provider Let's Encrypt (ACME v2) Free, automated, trusted
Auto-renewal 30 days before expiry Renews at day 60 of the 90-day certificate lifetime, leaving a 30-day buffer
Certificate Transparency Required All certificates publicly logged

12.2.2 mTLS for Internal Service Communication

All inter-service communication uses mutual TLS (mTLS) with client certificate verification. This means both the client and server must present valid certificates signed by the internal Certificate Authority.

Parameter Value
Internal CA Self-managed ECDSA P-256 CA
Certificate lifetime 90 days (auto-rotated)
Verification mode Required (reject if no client cert)
Revocation CRL + OCSP
Service identity SPIFFE URI in certificate Subject Alternative Name

Benefits of mTLS:

  • Even if network boundaries are breached, unauthorized services cannot access internal APIs
  • Every service-to-service call is authenticated and encrypted
  • Certificates provide strong service identity (not just IP-based)
  • No shared secrets between services (except Vault tokens)

12.2.3 TLS Configuration Code Example

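# FastAPI TLS configuration (served by uvicorn)
import ssl

import uvicorn
from fastapi import FastAPI

app = FastAPI()

# TLS settings passed to uvicorn as keyword arguments.
# uvicorn exposes no minimum-version option; the TLS 1.2 floor relies on
# Python's ssl defaults (3.10+) and should also be enforced at the load balancer.
ssl_config = {
    "ssl_keyfile": "/certs/server.key",
    "ssl_certfile": "/certs/server.crt",
    "ssl_ca_certs": "/certs/ca.crt",          # For mTLS
    "ssl_cert_reqs": ssl.CERT_REQUIRED,        # Require client cert
    # Applies to TLS 1.2 suites; TLS 1.3 suites are not affected by this setting
    "ssl_ciphers": "ECDHE-ECDSA-AES256-GCM-SHA384:"
                    "ECDHE-RSA-AES256-GCM-SHA384:"
                    "ECDHE-ECDSA-CHACHA20-POLY1305:"
                    "ECDHE-RSA-CHACHA20-POLY1305",
}

if __name__ == "__main__":
    # Host and port are placeholder values for illustration
    uvicorn.run(app, host="0.0.0.0", port=8443, **ssl_config)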

12.3 Authentication

12.3.1 Password Policy

Requirement Value Enforcement
Minimum length 12 characters Hard validation
Complexity At least one uppercase, one lowercase, one digit, one special character Regex validation
Password history Last 12 passwords cannot be reused Database check
Hashing algorithm Argon2id (memory-hard, resistant to GPU cracking) Passwords never stored in plaintext
Argon2id parameters Time cost: 3, Memory: 64MB, Parallelism: 4 Tuned for 500ms hash time
HaveIBeenPwned check Enabled for all new passwords k-anonymity API (no full password sent)
Maximum age 90 days Configurable; reminder at 75 days
Lockout after failures 5 failed attempts 30-minute lockout
Password change Users cannot reuse current password Immediate validation
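
The k-anonymity check works by sending only the first five hex characters of the password's SHA-1 hash to the Pwned Passwords range endpoint and comparing the returned suffixes locally. The sketch below covers the hashing and the response parsing; the HTTP call itself is omitted, and the function names are hypothetical.

```python
import hashlib
from typing import Tuple


def hibp_range_query(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into (5-char prefix, suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def is_breached(suffix: str, range_response: str) -> bool:
    """Scan 'SUFFIX:COUNT' lines returned for the prefix."""
    for line in range_response.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            # Padded responses can include entries with a count of 0
            return int(count.strip() or 0) > 0
    return False


prefix, suffix = hibp_range_query("password")
# Only `prefix` would be sent over the network; `suffix` never leaves the server.
```

Only the prefix is transmitted, so the service cannot tell which of the many hashes sharing that prefix was being checked.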

12.3.2 JWT Token Configuration

Parameter Value Notes
Signing algorithm ES256 (ECDSA with P-256 curve) Smaller keys and signatures than RS256 at comparable security
Access token lifetime 15 minutes Short-lived for security
Refresh token lifetime 7 days Long-lived but revocable
Key rotation Every 180 days Dual-key support for zero-downtime rotation
Key storage HashiCorp Vault Private key never exposed to application filesystem
Token binding Session ID + browser fingerprint Detects token theft/reuse
Claims sub, iss, aud, exp, iat, jti, role, permissions, mfa_verified Standard + custom claims
Issuer sentinel-ai Verified by all services
Audience sentinel-api Scope-limited

JWT Token Structure:

{
  "header": {
    "alg": "ES256",
    "typ": "JWT",
    "kid": "key-2025-01"
  },
  "payload": {
    "sub": "user-uuid-here",
    "iss": "sentinel-ai",
    "aud": "sentinel-api",
    "exp": 1705500000,
    "iat": 1705499100,
    "jti": "unique-token-id",
    "role": "operator",
    "permissions": ["alerts:view", "alerts:acknowledge", "cameras:view"],
    "mfa_verified": true,
    "session_id": "sess-uuid-here"
  }
}

12.3.3 Multi-Factor Authentication (MFA)

Parameter Value
Method TOTP (Time-based One-Time Password) per RFC 6238
Issuer label "Sentinel AI Surveillance"
Algorithm SHA-1 (for compatibility)
Digit length 6 digits
Time step 30 seconds
Valid window 1 step before and after current (3-step tolerance)
Recovery codes 10 single-use codes generated at setup
Enforced for Super Admin, Admin roles (mandatory)
Optional for Operator, Viewer roles (recommended)
QR code format otpauth://totp/Sentinel%20AI:{username}?secret={secret}&issuer=Sentinel%20AI
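
The TOTP parameters above (HMAC-SHA-1, 6 digits, 30-second step, one step of tolerance on either side) map onto a few lines of standard-library code. This is a sketch of the RFC 6238 computation for illustration; the function names are hypothetical, the secret is passed as raw bytes (after Base32 decoding), and recovery codes and replay tracking are not covered.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code for the time step containing unix_time."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step and `window` steps before and after it."""
    for drift in range(-window, window + 1):
        candidate = totp(secret, unix_time + drift * step, step)
        if hmac.compare_digest(candidate.encode(), submitted.encode()):
            return True
    return False
```

With the RFC 6238 test secret b"12345678901234567890", the code at Unix time 59 is "287082"; verify_totp still accepts it 30 seconds later because of the one-step window.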

MFA Enforcement Matrix:

Role MFA Required Can Disable
Super Admin Yes No
Admin Yes No
Operator No (Recommended) Yes
Viewer No Yes

12.4 Role-Based Access Control (RBAC)

12.4.1 Role Definitions

Role Level Description Typical Users Count
Super Admin L1 Full system access; can manage other admins CISO, CTO, Platform Lead 1-2
Admin L2 Administrative functions; day-to-day management Security Manager, IT Manager 2-4
Operator L3 Day-to-day surveillance operations Security guards, SOC analysts 5-20
Viewer L4 Read-only access for review and audit Auditors, Management 2-10

12.4.2 Permission Matrix (30+ Permissions)

Permission Super Admin Admin Operator Viewer
users:full_access Y N N N
users:manage (create/edit/deactivate) Y Y N N
users:view (list, details) Y Y Y Y
users:reset_password Y Y N N
users:reset_mfa Y Y N N
cameras:full_access Y N N N
cameras:manage (add/edit/remove) Y Y N N
cameras:view (list, status) Y Y Y Y
cameras:control (PTZ, restart stream) Y Y Y N
cameras:configure_zones Y Y N N
alerts:manage (edit rules, bulk actions) Y Y N N
alerts:view (list, filter, search) Y Y Y Y
alerts:acknowledge Y Y Y N
alerts:resolve Y Y Y N
alerts:mark_false_positive Y Y Y N
persons:full_access Y N N N
persons:manage (create/edit/delete) Y Y N N
persons:view (gallery, profiles) Y Y Y Y
persons:name_unknown Y Y Y N
persons:merge Y Y Y N
watchlists:manage (create/edit/delete) Y Y N N
watchlists:view (list, members) Y Y Y Y
watchlists:add_remove_members Y Y Y N
ai_settings:manage (change defaults) Y Y N N
ai_settings:view (see current settings) Y Y Y Y
ai_settings:adjust (operator adjustments) Y Y Y N
reports:full_access Y N N N
reports:view (all reports) Y Y Y Y
reports:export Y Y Y N
system:full_access Y N N N
system:manage (config changes) Y Y N N
system:view (health, status) Y Y Y Y
audit:view (audit logs) Y Y N N
notifications:manage (routing rules) Y Y N N
storage:manage (retention policies) Y Y N N
storage:view (usage, reports) Y Y Y Y
privacy:manage (GDPR actions) Y Y N N
privacy:view (consent status) Y Y Y Y
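
Every row of the matrix is hierarchical: if a role holds a permission, all higher roles hold it too. That allows the matrix to be stored as the lowest-privilege role level (L1 to L4) that still has each permission. The sketch below encodes an excerpt of the rows; the snake_case role keys, the dictionary names, and the deny-by-default handling of unknown values are assumptions.

```python
# Role levels from the role definitions table (lower number = more privilege).
ROLE_LEVEL = {"super_admin": 1, "admin": 2, "operator": 3, "viewer": 4}

# Excerpt of the matrix: the highest level number that still has a "Y".
PERMISSION_MIN_LEVEL = {
    "users:full_access": 1,
    "users:manage": 2,
    "audit:view": 2,
    "cameras:control": 3,
    "alerts:acknowledge": 3,
    "reports:export": 3,
    "alerts:view": 4,
    "system:view": 4,
}


def has_permission(role: str, permission: str) -> bool:
    """True if the role's level is at or above the permission's threshold."""
    level = ROLE_LEVEL.get(role)
    needed = PERMISSION_MIN_LEVEL.get(permission)
    if level is None or needed is None:
        return False  # unknown role or permission: deny by default
    return level <= needed
```

If a future permission breaks the hierarchy (granted to Operator but not Admin, for example), this compact form no longer fits and an explicit role-to-permission set would be needed.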

12.4.3 Resource-Level Permissions

Beyond global permissions, the system supports resource-level access control:

Resource Type Granularity Example
Cameras Per-camera access Operator A can only view CAM-01, CAM-02
Zones Per-zone access Operator B can only view "entrance" zone
Alerts Per-camera origin Viewer can only see alerts from specific cameras
Persons Per-department HR can only view employee records
Watchlists Per-watchlist Security can only view "blacklist", not "vip"

12.5 VPN and Network Security

12.5.1 WireGuard VPN Configuration

WireGuard provides the encrypted tunnel between cloud infrastructure and the edge site:

Parameter Value Notes
Protocol WireGuard Modern, simple, fast VPN
Port UDP 51820 Single port, firewall-friendly
Authentication Curve25519 key pairs + Preshared Key (PSK) Defense in depth
Encryption ChaCha20-Poly1305 Fast on hardware without AES-NI
Key exchange Curve25519 elliptic curve 128-bit security
Tunnel network 10.200.0.0/24 Dedicated VPN subnet
Cloud endpoint 10.200.0.1/32 Single IP for cloud side
Edge endpoint 10.200.0.2/32 Single IP for edge side
AllowedIPs (cloud) 10.200.0.2/32, 192.168.29.0/24 Edge + camera network only
AllowedIPs (edge) 10.100.0.0/16, 10.200.0.0/24 Full cloud VPC + VPN
Keepalive 25 seconds Prevents NAT timeout
Key rotation 365 days Annual rotation via maintenance window
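
As an illustration of how the edge-side parameters above map onto a wg-quick style configuration, the sketch below renders the file from Python. The function name `render_edge_config`, the key placeholders, and the endpoint hostname are assumptions; the addresses, AllowedIPs, and keepalive value come from the table.

```python
import ipaddress

# Values taken from the parameter table (edge side).
EDGE_ADDRESS = "10.200.0.2/32"
EDGE_ALLOWED_IPS = ["10.100.0.0/16", "10.200.0.0/24"]
KEEPALIVE_SECONDS = 25


def render_edge_config(private_key: str, peer_public_key: str,
                       preshared_key: str, endpoint: str) -> str:
    """Render a wg-quick style config for the edge gateway (illustrative only)."""
    for cidr in EDGE_ALLOWED_IPS:
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {EDGE_ADDRESS}",
        "",
        "[Peer]",
        f"PublicKey = {peer_public_key}",
        f"PresharedKey = {preshared_key}",
        f"Endpoint = {endpoint}",
        f"AllowedIPs = {', '.join(EDGE_ALLOWED_IPS)}",
        f"PersistentKeepalive = {KEEPALIVE_SECONDS}",
    ]
    return "\n".join(lines) + "\n"
```

In practice the key material would be injected from the secret store at deploy time rather than passed around as literals.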

12.5.2 Network Segmentation Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                         NETWORK ARCHITECTURE                                  │
│                                                                              │
│   INTERNET                                                                   │
│      │                                                                       │
│      ▼                                                                       │
│   ┌──────────────┐                                                           │
│   │  AWS WAF     │                                                           │
│   │  + ALB       │                                                           │
│   └──────┬───────┘                                                           │
│          │                                                                   │
│   ═══════╪════════════════ AWS CLOUD VPC: 10.100.0.0/16 ═══════════════════  │
│          │                                                                   │
│          │    ┌──────────────────────────────────────────────────────┐       │
│          │    │  PUBLIC SUBNET: 10.100.1.0/24                      │       │
│          │    │  - ALB (Application Load Balancer)                  │       │
│          │    │  - NAT Gateway                                       │       │
│          │    │  - WireGuard VPN Gateway (10.200.0.1)               │       │
│          │    │  - Bastion Host (emergency SSH, admin IPs only)     │       │
│          │    └──────────────────────────────────────────────────────┘       │
│          │                                                                   │
│          └────▶┌──────────────────────────────────────────────────────┐      │
│               │  PRIVATE SUBNET: 10.100.2.0/24 (App Tier)            │      │
│               │  - EKS Worker Nodes (API, AI, Web pods)              │      │
│               │  - Stream Ingestion Service                          │      │
│               │  - Alert Engine                                      │      │
│               │  - Notification Service                              │      │
│               └──────────────────────────────────────────────────────┘      │
│               ▲                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  DATA SUBNET: 10.100.3.0/24 (No Internet)          │ │
│               │    │  - RDS PostgreSQL (Multi-AZ)                        │ │
│               │    │  - ElastiCache Redis Cluster                        │ │
│               │    │  - Amazon MSK Kafka                                 │ │
│               │    │  - NO INTERNET ACCESS (VPC endpoints only)          │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  MONITORING SUBNET: 10.100.4.0/24                  │ │
│               │    │  - Prometheus, Grafana, Alertmanager                │ │
│               │    │  - Loki (log aggregation)                           │ │
│               │    │  - Jaeger (distributed tracing)                     │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                                                              │
│   ════════════╪══════════════════════════════════════════════════════════    │
│               │                                                              │
│               │         WireGuard VPN Tunnel (UDP 51820)                   │
│               │                                                              │
│   ════════════╪══════════════════════════════════════════════════════════    │
│               │                                                              │
│               │    ┌──────────────────────────────────────────────────────┐ │
│               │    │  EDGE GATEWAY: 192.168.29.5/24 (Intel NUC)           │ │
│               │    │  OS: Ubuntu Server 22.04 LTS (minimal)               │ │
│               │    │  - Docker Compose stack                              │ │
│               │    │  - WireGuard Client (10.200.0.2)                     │ │
│               │    │  - Local MinIO (hot storage)                         │ │
│               │    │  - Redis (local cache)                               │ │
│               │    │  - Video Capture Service                             │ │
│               │    │  - AI Inference (edge models)                        │ │
│               │    └──────────────────────────────────────────────────────┘ │
│               │                              │                               │
│               │    ┌─────────────────────────┴──────────────────────┐       │
│               │    │  CAMERA LAN: 192.168.29.0/24                    │       │
│               │    │  - CP PLUS DVR: 192.168.29.200 (8 channels)     │       │
│               │    │  - RTSP streams on port 554                     │       │
│               │    │  - NO INTERNET ACCESS                           │       │
│               │    │  - NO ROUTE TO CLOUD (only via edge gateway)    │       │
│               │    └────────────────────────────────────────────────┘       │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

12.5.3 Firewall Rules

Edge Gateway Firewall (iptables):

Direction Protocol Port Source Destination Action Purpose
IN TCP 22 Admin IP range only Edge gateway ACCEPT SSH management
IN UDP 51820 Cloud VPN IP Edge gateway ACCEPT WireGuard tunnel
IN TCP 8080 Local LAN only Edge gateway ACCEPT Admin UI
IN Any Any Any Edge gateway DROP Default deny
OUT TCP 443 Edge gateway AWS S3 endpoint ACCEPT Cloud storage sync
OUT UDP 51820 Edge gateway Cloud VPN IP ACCEPT WireGuard tunnel
OUT TCP 8080 Edge gateway Local LAN ACCEPT Internal services
OUT Any Any Edge gateway Internet DROP No direct internet

Cloud Firewall (AWS Security Groups):

Direction Protocol Port Source Action Purpose
IN TCP 443 0.0.0.0/0 ACCEPT Public HTTPS
IN UDP 51820 Edge gateway IP ACCEPT WireGuard
IN TCP 5432 App security group ACCEPT PostgreSQL
IN TCP 6379 App security group ACCEPT Redis
IN TCP 9092 App security group ACCEPT Kafka
IN TCP 22 Admin IPs only ACCEPT Bastion SSH
IN Any Any Any DROP Default deny

12.6 Secret Management

12.6.1 Vault Integration

All secrets are stored in HashiCorp Vault with automatic rotation policies:

Secret Type Encryption Rotation Frequency Rotation Method Access Pattern
Database passwords AES-256-GCM 90 days Terraform + Vault dynamic credentials Short-lived (1-hour TTL)
JWT signing keys AES-256-GCM 180 days Dual-key grace period Zero-downtime rotation
Internal API keys AES-256-GCM 90 days Zero-downtime rotation Automated
Telegram bot tokens AES-256-GCM 180 days Regenerate via BotFather Semi-automated
WhatsApp API tokens AES-256-GCM 180 days Regenerate via Meta Business Manager Semi-automated
DVR credentials AES-256-GCM 180 days Manual via DVR web UI Manual
TLS certificates ACME auto 60 days cert-manager + Let's Encrypt Fully automated
WireGuard keys AES-256-GCM 365 days Maintenance window rotation Scripted
Backup encryption keys AES-256-GCM 365 days Re-encrypt all backups Automated
Session secrets AES-256-GCM On security incident Immediate revocation Admin trigger

12.6.2 Dynamic Database Credentials

Instead of static database passwords, the system uses Vault's dynamic credential engine:

Application → Vault (request db credentials)
                  │
                  ▼
           Vault creates temporary DB user
           (TTL: 1 hour, auto-revoke)
                  │
                  ▼
           Application receives credentials
           Uses them for DB connections
                  │
                  ▼
           After TTL expires → Vault revokes DB user
           Application requests new credentials

Benefits:

  • No long-lived database passwords in application configuration
  • Each application instance gets unique credentials
  • Automatic credential rotation without application restart
  • Full audit trail of credential issuance and revocation
  • Instant credential revocation on compromise
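
The request/expire/re-request cycle above might look like this on the application side. It is a sketch under stated assumptions: `CredentialCache` and the injected `fetch` callable are hypothetical, and `fetch` stands in for whatever client call actually asks Vault for a new database user. The 300-second refresh margin is an illustrative choice, not a documented value.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float  # deadline on the injected clock


class CredentialCache:
    """Caches short-lived DB credentials and refreshes them shortly before the TTL ends."""

    def __init__(self, fetch: Callable[[], Tuple[str, str]],
                 ttl_seconds: float = 3600.0,
                 refresh_margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical: requests new credentials from Vault
        self._ttl = ttl_seconds          # 1-hour TTL, as in the flow above
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._lease: Optional[Lease] = None

    def get(self) -> Lease:
        now = self._clock()
        # Refresh when there is no lease yet or the current one is about to expire.
        if self._lease is None or now >= self._lease.expires_at - self._refresh_margin:
            username, password = self._fetch()
            self._lease = Lease(username, password, now + self._ttl)
        return self._lease
```

Refreshing before expiry lets connection pools roll over to the new user without an application restart.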

12.6.3 Field-Level Encryption

PII and biometric data in the database uses AES-256-GCM field-level encryption:

Field Category Example Fields Encryption
Personal identification name_encrypted, email_encrypted, phone_encrypted AES-256-GCM per-field
Employment data employee_id_encrypted, department_encrypted AES-256-GCM per-field
Biometric data face_encoding_encrypted (512-D vector) AES-256-GCM per-field
Media metadata location_encrypted (GPS coordinates) AES-256-GCM per-field

Encryption Architecture:

Application receives plaintext data
       │
       ▼
[Encrypt field-by-field using Vault KMS]
       │
       ▼
Store ciphertext in PostgreSQL
       │
       ▼
[Decrypt only in application layer when needed]
       │
       ▼
Decrypted data never logged, never cached

12.7 Audit Logging

12.7.1 Tamper-Resistant Hash-Chain

The audit log implements a cryptographically linked chain to ensure integrity:

Field Purpose Example
event_id Unique UUID for each audit event 550e8400-e29b-41d4-a716-446655440000
timestamp ISO 8601 timestamp 2025-01-16T14:32:15Z
event_type Category of event user_login, person_viewed, alert_acknowledged
actor_id User who performed the action user-uuid-here
actor_role Role of the actor at the time operator
resource_type Type of resource accessed person, camera, alert
resource_id Specific resource identifier person-123, cam-01
action Action performed view, edit, delete, create
result Success or failure success, failure, denied
ip_address Source IP address 10.100.2.15
session_id Session identifier sess-uuid-here
previous_hash SHA-256 hash of the previous entry a3f5c2...
entry_hash SHA-256 hash of current entry content b7e1d9...
signature ECDSA signature of the entry hash 30450221...

Chain Verification: Any modification to historical entries invalidates all subsequent hashes and signatures, making tampering detectable.
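
A minimal sketch of the hash-chain mechanics follows. It assumes each `entry_hash` covers the previous hash plus a canonical JSON encoding of the entry. That encoding, the all-zero genesis value, and the helper names are illustrative choices. The ECDSA signature step is omitted because it needs a third-party cryptography library.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's previous_hash


def compute_entry_hash(entry: Dict, previous_hash: str) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) keeps the hash reproducible.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()


def append_entry(chain: List[Dict], entry: Dict) -> Dict:
    previous_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    record = dict(entry, previous_hash=previous_hash)
    record["entry_hash"] = compute_entry_hash(entry, previous_hash)
    chain.append(record)
    return record


def verify_chain(chain: List[Dict]) -> bool:
    previous_hash = GENESIS_HASH
    for record in chain:
        # Recover the original entry by dropping the two chain fields.
        entry = {k: v for k, v in record.items()
                 if k not in ("previous_hash", "entry_hash")}
        if record["previous_hash"] != previous_hash:
            return False
        if record["entry_hash"] != compute_entry_hash(entry, previous_hash):
            return False
        previous_hash = record["entry_hash"]
    return True
```

Editing any field of an earlier record changes its recomputed hash, so `verify_chain` fails at that record.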

12.7.2 Log Retention Policy

Log Type Online Retention Archive Retention Storage Type
Authentication events 1 year 6 years WORM (Write-Once-Read-Many)
Authorization decisions 1 year 6 years WORM
Person data modifications 1 year 6 years WORM
Alert actions (ack, resolve) 1 year 3 years Standard
Configuration changes 2 years 5 years Standard
Security events 1 year 6 years WORM
System health events 90 days 1 year Standard
API access logs 90 days 1 year Standard

12.7.3 Real-Time Security Alerting

Automated detection rules trigger alerts on suspicious patterns:

Rule ID Rule Name Condition Auto-Response
SEC-001 Brute force login > 5 failed logins from same IP in 5 minutes Block IP for 1 hour; alert security team
SEC-002 Credential stuffing > 10 unique usernames from same IP in 5 minutes Block IP for 24 hours; alert security team
SEC-003 Impossible travel Logins > 500 km apart within 1 hour Force MFA re-verification; alert security team
SEC-004 Privilege escalation > 20 admin actions in 10 minutes from new user Alert security team; log for review
SEC-005 Data exfiltration > 1 GB downloaded by single user in 1 hour Suspend account; alert security team
SEC-006 Off-hours admin Admin action between 22:00 and 06:00 Log + notify security manager
SEC-007 MFA bypass attempt > 3 MFA failures then success without MFA Block account; alert security team
SEC-008 Suspicious media access > 50 media downloads by non-security role Alert security team
SEC-009 Unknown device login Login from unrecognized device fingerprint Require MFA; notify user
SEC-010 Concurrent sessions > 3 concurrent sessions for same user Force logout of oldest session
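
Threshold rules such as SEC-001 can be evaluated with a sliding window per source IP. The sketch below is one possible shape, with a hypothetical `FailedLoginMonitor` class. The threshold (more than 5 failures) and window (5 minutes) come from the table, and the blocking action itself is left to the caller.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginMonitor:
    """Sliding-window counter for failed logins per source IP (SEC-001 style rule)."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failure; return True when the rule fires for this IP."""
        events = self._failures[ip]
        events.append(timestamp)
        # Drop failures that fell out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        # The rule reads "> 5 failed logins", so the sixth failure triggers it.
        return len(events) > self.threshold
```

A production version would keep these counters in Redis so that all API pods share the same view of each IP.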

12.8 Media Access Security

12.8.1 Signed URL Architecture

Media files are never served directly from object storage. All access is mediated through signed URLs:

Parameter Value Notes
Default expiration 5 minutes Short-lived to prevent sharing
Maximum expiration 1 hour For bulk exports only
URL binding Tied to user session Invalidated on logout
Single-use option Available for sensitive media Blacklist incident footage
Access logging Every media request logged User ID, media ID, timestamp, IP
IP binding Optional URL valid only from requesting IP
Watermarking Optional Username/timestamp overlay on images

Signed URL Flow:

1. User requests to view media
2. System checks: authentication + authorization + consent
3. If allowed: generate signed URL with HMAC-SHA256 signature
4. URL format: https://cdn.example.com/media/{id}?token={jwt}&sig={hmac}
5. Redirect user to signed URL
6. CDN/Object storage validates signature and expiry
7. Media served if valid; 403 if expired or invalid
8. Access logged with full context
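
Steps 3 to 7 of the flow can be sketched with the standard library. This is a simplified illustration: it replaces the `token={jwt}` parameter with a plain `expires` value and binds the signature to the requesting session ID. The function names and the message layout are assumptions, not the platform's actual format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(secret: bytes, media_id: str, session_id: str, expires: int) -> str:
    message = f"{media_id}|{session_id}|{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_media_url(base_url: str, media_id: str, session_id: str, secret: bytes,
                   ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Build a URL valid for ttl_seconds (default 5 minutes) and bound to one session."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires,
                       "sig": _signature(secret, media_id, session_id, expires)})
    return f"{base_url}/media/{media_id}?{query}"


def verify_media_url(url: str, session_id: str, secret: bytes,
                     now: Optional[float] = None) -> bool:
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        provided = params["sig"][0]
    except (KeyError, ValueError):
        return False
    media_id = parsed.path.rsplit("/", 1)[-1]
    expected = _signature(secret, media_id, session_id, expires)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return current < expires
```

Because the session ID is part of the signed message, a URL copied to another session fails verification even before it expires.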

12.8.2 Media Access Controls

Control Implementation
No direct S3/MinIO URLs All access via signed URL proxy
Authentication required Valid JWT session required for all media requests
Authorization enforced RBAC checks per media item; camera-level permissions respected
Access logging Every media request logged with user ID, media ID, timestamp, IP, session
DPO notification Automatic notification for access to sensitive media (blacklist incidents)
Secure deletion Overwrite with random data + verification before removal
Download tracking Number of downloads per media item tracked and reported

12.9 API Security

12.9.1 Defense Layers

Layer Implementation Details
Rate limiting Per-endpoint, per-user tiers Token bucket algorithm; 100 req/min default; 10 req/min for auth endpoints
Input validation Pydantic models on all endpoints Strict type checking; reject unknown fields; max length limits
SQL injection prevention Parameterized queries only No dynamic SQL construction; ORM for all database access
XSS prevention Output encoding + CSP headers User input never rendered as HTML; Content-Security-Policy enforced
CSRF protection SameSite=Strict cookies + tokens State-changing operations require CSRF token validation
CORS Restricted to known origins No wildcard origins; explicit allowlist per environment
Request size limits 10 MB default; 50 MB for media upload Prevents DoS via large payloads
Request timeout 30 seconds default Prevents resource exhaustion
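
The token bucket algorithm named in the rate-limiting row can be illustrated as below. The `TokenBucket` class is a hypothetical single-process sketch. The burst capacity defaulting to one minute's worth of requests is an assumption, and a shared store would be needed to enforce limits across pods.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token bucket: refills continuously, each request spends one token."""

    def __init__(self, rate_per_minute: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Assumption: burst capacity defaults to one minute's worth of requests.
        self.capacity = float(rate_per_minute if capacity is None else capacity)
        self.refill_per_second = rate_per_minute / 60.0
        self._clock = clock
        self.tokens = self.capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self.updated
        self.updated = now
        # Add tokens earned since the last call, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Limits from the table: 100 req/min by default, 10 req/min for auth endpoints.
default_bucket = TokenBucket(rate_per_minute=100)
auth_bucket = TokenBucket(rate_per_minute=10)
```

A rejected request would typically be answered with HTTP 429 and a Retry-After header.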

12.9.2 Security Headers

Header Value Purpose
Strict-Transport-Security max-age=63072000; includeSubDomains; preload Enforce HTTPS for 2 years
X-Content-Type-Options nosniff Prevent MIME-type sniffing
X-Frame-Options DENY Prevent clickjacking
X-XSS-Protection 0 Disabled — CSP is preferred defense
Referrer-Policy strict-origin-when-cross-origin Minimal referrer information
Permissions-Policy camera=(), microphone=(), geolocation=() Disable browser APIs not needed
Content-Security-Policy default-src 'self'; script-src 'self' 'nonce-{random}'; style-src 'self' 'unsafe-inline'; img-src 'self' blob: data: https://*.amazonaws.com; media-src 'self' blob: https://*.amazonaws.com; connect-src 'self' wss://*.example.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self'; Comprehensive CSP
Cache-Control (API) no-store, no-cache, must-revalidate, proxy-revalidate Prevent caching of API responses
Pragma (API) no-cache Legacy cache directive
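
Since the CSP uses a per-response script nonce, the headers cannot be fully static. The sketch below shows one way to assemble them per request. The helper name `build_security_headers` is hypothetical, and the header values are copied from the table above.

```python
import secrets
from typing import Dict, Tuple

STATIC_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "X-XSS-Protection": "0",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
}


def build_security_headers() -> Tuple[Dict[str, str], str]:
    """Return (headers, nonce); the nonce must also be set on each inline <script> tag."""
    nonce = secrets.token_urlsafe(16)  # fresh, unpredictable value per response
    csp_directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "style-src 'self' 'unsafe-inline'",
        "img-src 'self' blob: data: https://*.amazonaws.com",
        "media-src 'self' blob: https://*.amazonaws.com",
        "connect-src 'self' wss://*.example.com",
        "frame-ancestors 'none'",
        "base-uri 'self'",
        "form-action 'self'",
    ]
    headers = dict(STATIC_HEADERS)
    headers["Content-Security-Policy"] = "; ".join(csp_directives)
    return headers, nonce
```

Reusing a nonce across responses would defeat its purpose, which is why it is generated inside the function rather than at import time.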

12.10 Session Security

Parameter Value Notes
Cookie flags HttpOnly; Secure; SameSite=Strict Mitigates cookie theft via XSS and cross-site request forgery
Access token storage Memory only (JavaScript variable) Never stored in localStorage
Access token max-age 15 minutes Short-lived
Refresh token storage HttpOnly secure cookie Cannot be accessed by JavaScript
Refresh token max-age 7 days Long-lived but revocable
Session absolute timeout 8 hours Force re-login after 8 hours
Idle timeout 30 minutes Expire if no activity
Max concurrent sessions 3 per user Prevents session abuse
Session fixation protection Regenerate session ID on login Prevent fixation attacks
Session binding Browser fingerprint + IP validation Detect session theft
Force logout capability Admin can revoke all sessions for any user Immediate effect via Redis
Session storage Redis with AUTH enabled Encrypted at rest
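
The absolute and idle timeouts interact as shown in this small sketch. The `Session` dataclass and the status strings are hypothetical, and the two timeout values are the ones listed in the table.

```python
from dataclasses import dataclass

ABSOLUTE_TIMEOUT_SECONDS = 8 * 3600   # force re-login after 8 hours
IDLE_TIMEOUT_SECONDS = 30 * 60        # expire after 30 minutes without activity


@dataclass
class Session:
    created_at: float       # epoch seconds at login
    last_activity: float    # epoch seconds of the most recent request
    revoked: bool = False   # set by an admin force-logout


def session_status(session: Session, now: float) -> str:
    """Return 'active', 'revoked', 'expired_absolute', or 'expired_idle'."""
    if session.revoked:
        return "revoked"
    # The absolute limit applies regardless of how recently the user was active.
    if now - session.created_at >= ABSOLUTE_TIMEOUT_SECONDS:
        return "expired_absolute"
    if now - session.last_activity >= IDLE_TIMEOUT_SECONDS:
        return "expired_idle"
    return "active"
```

Checking revocation first means an admin force-logout takes effect even for a session that is otherwise fresh.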

12.11 Data Privacy (GDPR Compliance)

12.11.1 GDPR Compliance Matrix

GDPR Principle Implementation Detail Evidence
Lawful Basis Legitimate interest assessment documented per processing purpose LIA document filed with DPO
Data Minimization Only facial feature embeddings (512-D vector) stored; raw images discarded after encoding Architecture documentation
Purpose Limitation Facial data used ONLY for security/safety purposes; no marketing or secondary use Privacy policy
Storage Limitation Automated retention enforcement; cryptographic deletion after expiry Retention policy configuration
Accuracy Regular review and correction procedures; user can request correction Data correction workflow
Integrity & Confidentiality AES-256-GCM encryption, RBAC access controls, audit logging Security architecture
Accountability DPO appointed; Privacy Impact Assessment completed; Records of Processing maintained Compliance documentation
Transparency Privacy notice displayed at camera entry points; privacy policy on website Physical signage + web policy

12.11.2 Consent Management

Consent is managed through a comprehensive lifecycle:

Stage Description Transition Trigger
pending Consent requested but not yet obtained Initial system setup
granted Explicit consent obtained User signs consent form
withdrawn Consent actively withdrawn User requests deletion/stop processing
deleted All data removed; audit trail only Deletion workflow complete
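
The lifecycle can be enforced as a small state machine. The sketch below assumes the stages progress strictly in the order listed (pending, granted, withdrawn, deleted). That linear ordering and the `transition` helper are assumptions; the table does not say, for example, whether a pending request can be withdrawn directly.

```python
from enum import Enum
from typing import Dict, Set


class ConsentState(Enum):
    PENDING = "pending"
    GRANTED = "granted"
    WITHDRAWN = "withdrawn"
    DELETED = "deleted"


# Assumed linear lifecycle; "deleted" is terminal.
ALLOWED_TRANSITIONS: Dict[ConsentState, Set[ConsentState]] = {
    ConsentState.PENDING: {ConsentState.GRANTED},
    ConsentState.GRANTED: {ConsentState.WITHDRAWN},
    ConsentState.WITHDRAWN: {ConsentState.DELETED},
    ConsentState.DELETED: set(),
}


def transition(current: ConsentState, target: ConsentState) -> ConsentState:
    """Return the new state, or raise ValueError for a transition the lifecycle forbids."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal consent transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions in code keeps a deleted record from being silently reactivated.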

Consent Metadata:

Field Description
Consent method written / digital / verbal
Consent document reference ID of signed consent form
Consent date When consent was obtained
Consent recorder Who recorded the consent
Consent expiry Annual expiry date
Consent scope What processing is consented to

Withdrawal Processing:

  1. User submits withdrawal request (any channel)
  2. System flags person record for deletion
  3. Delete face embeddings (biometric data) within 72 hours
  4. Delete all personal images from storage
  5. Anonymize detection events (keep event, replace name with [REDACTED], remove person link)
  6. Delete related event clips
  7. Log all deletion actions in audit trail
  8. Confirm completion to user within 30 days
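
Step 5 (anonymizing detection events while keeping the event itself) might be implemented along these lines. The field names `person_name` and `person_id` are hypothetical, since the actual event schema is defined elsewhere.

```python
from typing import Any, Dict

REDACTED = "[REDACTED]"


def anonymize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with the name redacted and the person link removed."""
    anonymized = dict(event)          # shallow copy; the caller's dict is not mutated
    anonymized["person_name"] = REDACTED
    anonymized["person_id"] = None    # break the link to the person record
    return anonymized
```

For example, an event recorded on `CAM-01` keeps its camera and timestamp fields, so aggregate statistics remain valid after the person's data is removed.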

12.11.3 Privacy Mode Controls

Four privacy modes are available per camera:

Mode Recording Face Recognition Alerts Live View Use Case
Full Operation Yes Yes All Yes Standard surveillance
Recording Only Yes No Motion only (no face) Yes Areas where facial recognition is not needed
Live View Only No No No Yes Privacy-sensitive areas; viewing only
Privacy Mode No No No Privacy overlay Break rooms, restrooms — privacy completely protected

12.12 Edge Gateway Security

12.12.1 Hardening Checklist

# Hardening Measure Implementation
1 Minimal OS Ubuntu Server 22.04 LTS — no desktop packages
2 Disabled Bluetooth systemctl stop bluetooth; systemctl disable bluetooth
3 Disabled WiFi nmcli radio wifi off; modprobe -r iwlwifi
4 Disabled CUPS systemctl stop cups; systemctl disable cups
5 Disabled avahi/mDNS systemctl stop avahi-daemon; systemctl disable avahi-daemon
6 Disabled snapd systemctl stop snapd; systemctl disable snapd
7 Disabled modemmanager systemctl stop ModemManager; systemctl disable ModemManager
8 SSH key-only PasswordAuthentication no; PubkeyAuthentication yes
9 SSH LAN-only ListenAddress 192.168.29.5
10 SSH root disabled PermitRootLogin no
11 SSH rate limit MaxAuthTries 3; ClientAliveInterval 300
12 SSH protocol 2 Protocol 2 only (the sole protocol supported by OpenSSH 7.4 and later; the Protocol directive is deprecated there)
13 SSH modern ciphers Ciphers chacha20-poly1305@openssh.com
14 Auto-updates unattended-upgrades — security updates only
15 Update schedule Daily at 03:00; auto-reboot at 04:00 if required
16 Disk encryption LUKS + TPM2 auto-unseal
17 Tamper detection File integrity monitoring (AIDE) for critical config
18 Container security Non-root users, read-only root FS, no new privileges
19 Firewall iptables default deny; explicit allow only
20 No internet access All outbound traffic via VPN tunnel only

12.12.2 LUKS Disk Encryption with TPM2

The edge gateway uses LUKS full-disk encryption with TPM2 auto-unseal for headless operation:

# During setup: encrypt the data partition (prompts for a recovery passphrase)
cryptsetup luksFormat /dev/nvme0n1p2 \
    --type luks2 \
    --cipher aes-xts-plain64 \
    --key-size 512 \
    --pbkdf argon2id

# Enroll a TPM2-sealed key bound to PCR measurements.
# The --tpm2-* options belong to systemd-cryptenroll, not cryptsetup.
systemd-cryptenroll /dev/nvme0n1p2 \
    --tpm2-device=auto \
    --tpm2-pcrs=0+2+7

# During boot: systemd-cryptsetup auto-unseals via this /etc/crypttab entry if PCRs match
# data  /dev/nvme0n1p2  none  tpm2-device=auto

PCR Measurements Bound:

PCR Purpose
PCR 0 Core system firmware executable code
PCR 2 Extended or pluggable executable code
PCR 7 Secure Boot state

12.13 Cloud Infrastructure Security

Control Implementation Verification
Private subnets All internal services in private subnets; no public IPs VPC flow logs
Security groups Least privilege; explicit allow only; no default allow-all Quarterly review
Database access No public access; app servers only via security group reference AWS Config rule
Bastion host Emergency access only; non-standard SSH port (2222); admin IP allowlist only Access log audit
IMDSv2 Enforced on all EC2 instances; no IMDSv1 fallback Instance metadata check
Container security Non-root users, read-only root FS, no new privileges, drop ALL capabilities Pod Security admission
Image scanning Trivy + Snyk on every build; HIGH/CRITICAL vulnerabilities block deployment CI/CD pipeline gate
Image signing Cosign signature verification required before deployment Admission controller
Resource quotas Kubernetes ResourceQuota and LimitRange on all namespaces Resource quota monitoring
Network policies Default deny all ingress/egress; explicit rules per service Policy audit
Pod Security Restricted standard enforced cluster-wide Pod Security admission
Secrets management Vault + External Secrets Operator; no secrets in Git Secret scanning
Logging All AWS API calls logged via CloudTrail; VPC Flow Logs enabled Log analysis

12.14 Secrets Rotation Policy

Secret Type Frequency Method Automation Rollback
Database passwords 90 days Terraform + Vault dynamic credentials Full N/A (short-lived)
JWT signing keys 180 days Dual-key grace period; new key signs, old key verifies for 7 days Full Keep old key for 7 days
Internal API keys 90 days Zero-downtime: add new key, deploy, remove old key Full Immediate via config revert
Telegram/WhatsApp tokens 180 days or on suspicion Generate new via provider, update Vault, 5-min grace, revoke old Semi Old token valid for 5-minute grace
TLS certificates 60 days cert-manager + Let's Encrypt auto-renewal Full Previous certificate cached
WireGuard keys 365 days Maintenance window: generate new keys, update both endpoints simultaneously Scripted Manual key restore
DVR credentials 180 days Manual via DVR web UI Manual Previous password documented
Backup encryption keys 365 days Generate new key, re-encrypt all backups in background Full Previous key kept for 30 days
Session secrets On security incident Immediate: generate new secret, force all re-authentication Admin trigger Not applicable
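
A scheduler or monitoring job could flag overdue secrets from the frequencies above. The sketch below is illustrative: the dictionary keys and the `rotation_due` helper are hypothetical names, and session secrets are excluded because they rotate on incident rather than on a schedule.

```python
from datetime import date, timedelta

# Rotation intervals in days, taken from the policy table.
ROTATION_DAYS = {
    "database_passwords": 90,
    "jwt_signing_keys": 180,
    "internal_api_keys": 90,
    "messaging_tokens": 180,       # Telegram/WhatsApp
    "tls_certificates": 60,
    "wireguard_keys": 365,
    "dvr_credentials": 180,
    "backup_encryption_keys": 365,
}


def next_rotation(secret_type: str, last_rotated: date) -> date:
    return last_rotated + timedelta(days=ROTATION_DAYS[secret_type])


def rotation_due(secret_type: str, last_rotated: date, today: date) -> bool:
    """True once the scheduled interval has fully elapsed."""
    return today >= next_rotation(secret_type, last_rotated)
```

An unknown secret type raises `KeyError`, which is preferable to silently treating it as never due.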

12.15 Incident Response

12.15.1 Security Event Detection and Response

Phase Timeline Actions Responsible
Detection Automated (real-time) Automated rules + behavioral analysis detect anomaly; alert generated System
Assessment 0-15 minutes On-call engineer evaluates severity; determines if genuine security event On-call Engineer
Containment 15-60 minutes Isolate affected systems; revoke compromised credentials; block malicious IPs Security Team
Eradication 1-4 hours Remove root cause; patch vulnerabilities; rotate all exposed secrets Engineering
Recovery 4-24 hours Restore from clean backups; verify system integrity; re-enable services Platform Team
Lessons Learned 24-48 hours Post-mortem; update procedures; implement preventive measures Security Team

12.15.2 Breach Notification Procedure

Phase Timeline GDPR Requirement Actions
Detection & Assessment 0-24 hours Confirm breach; contain; assemble response team
Investigation 24-72 hours Article 33(1) Forensic analysis; determine scope of affected data
Supervisory Authority Within 72 hours of becoming aware Article 33 Notify Data Protection Authority
Data Subjects Without undue delay Article 34 Notify affected individuals if high risk
Recovery Post-notification Restore from clean backups; apply patches
Post-Incident Within 48 hours Article 5(2) Root cause analysis; update plans; document

12.15.3 Breach Severity Classification

Level Criteria Notification Required Example
Low No personal data accessed Internal only Failed attack attempt; no data exposure
Medium Limited personal data; no sensitive data DPA notification Username/email list exposed
High Sensitive personal data or biometric data accessed DPA + Data subjects Facial embeddings database accessed
Critical Large-scale biometric exfiltration; ongoing threat DPA + Data subjects + Public Ransomware attack with biometric data theft

12.16 Security Checklist Summary

The complete security checklist contains 130+ items across the 14 categories below. The following table summarizes the key items per category:

Category Items Key Requirements
SSL/TLS 8 TLS 1.3, strong cipher suites only, HSTS, OCSP stapling, auto-renewal
Authentication 13 Argon2id, JWT ES256, MFA enforcement, password policy, HaveIBeenPwned
RBAC 7 4 roles, 30+ permissions, resource-level access, default deny
VPN & Network 10 WireGuard + PSK, 5 security zones, firewall deny-all, network policies
Secret Management 10 Vault storage, dynamic credentials, field encryption, rotation schedule
Audit Logging 11 Hash-chain integrity, 20+ fields per entry, WORM storage, real-time alerts
Media Access 8 Signed URLs, session-bound, 5-min expiry, single-use option, watermarking
API Security 11 Rate limiting, Pydantic validation, parameterized queries, CSP, CSRF, CORS
Session Security 8 HttpOnly/Secure/Strict cookies, 8h absolute timeout, 30m idle timeout
Data Privacy (GDPR) 13 Consent tracking, right to deletion, anonymization, DPO, PIA
Edge Gateway 12 20-point hardening, LUKS + TPM2, tamper detection, auto-updates
Cloud Infrastructure 11 Private subnets, image scanning, Pod Security, IMDSv2, CloudTrail
Secrets Rotation 7 All types scheduled, 60-day TLS, 90-day DB, dual-key JWT
Incident Response 9 Detection rules, breach notification, severity classification, post-mortem
Total 130+

Section 13: UX / Website Structure

13.1 Design System

13.1.1 Design Philosophy

The UX design follows a "dark cockpit" philosophy optimized for 24/7 surveillance operations. The interface minimizes eye strain during long monitoring shifts while ensuring critical information is immediately visible. All design decisions prioritize operator efficiency and rapid threat identification.

Principle Implementation
Dark mode default Near-black background with blue-tinted grays to reduce eye strain in low-light environments
Information density High-density layouts that maximize data visible without scrolling
At-a-glance status Color-coded status indicators for immediate situational awareness
Progressive disclosure Advanced controls hidden behind "Expand" toggles; essential info always visible
Consistent patterns Same interaction patterns reused across all 18 pages
Responsive feedback Every action produces visible feedback within 100ms

13.1.2 Color Palette

Token Hex RGB Usage Contrast Ratio
--bg-primary #0B0E14 rgb(11, 14, 20) Main application background
--bg-secondary #151922 rgb(21, 25, 34) Card and panel backgrounds
--bg-tertiary #1E2330 rgb(30, 35, 48) Elevated surfaces, modals, dropdowns
--bg-sidebar #0D1117 rgb(13, 17, 23) Sidebar navigation background
--bg-hover #1A2030 rgb(26, 32, 48) Row/card hover state
--bg-selected #1E3A5F rgb(30, 58, 95) Selected item background
--text-primary #E2E8F0 rgb(226, 232, 240) Headings, important content 15.8:1
--text-secondary #94A3B8 rgb(148, 163, 184) Labels, descriptions, metadata 9.2:1
--text-muted #64748B rgb(100, 115, 139) Placeholder text, disabled states 6.1:1
--accent-blue #3B82F6 rgb(59, 130, 246) Primary accent — buttons, links, active states 4.5:1
--accent-blue-hover #2563EB rgb(37, 99, 235) Button/link hover state 5.1:1
--accent-green #10B981 rgb(16, 185, 129) Success, online status, positive trends 5.3:1
--accent-red #EF4444 rgb(239, 68, 68) Critical alerts, errors, offline status 5.0:1
--accent-orange #F59E0B rgb(245, 158, 11) Warnings, medium severity 5.4:1
--accent-yellow #FBBF24 rgb(251, 191, 36) Watchlist indicators, highlights 6.1:1
--accent-purple #8B5CF6 rgb(139, 92, 246) AI features, special highlights 4.8:1
--border-color #1E293B rgb(30, 41, 59) Card borders, dividers, separators
--border-focus #3B82F6 rgb(59, 130, 246) Focus ring color
--shadow-sm 0 1px 2px rgba(0,0,0,0.3) Subtle elevation
--shadow-md 0 4px 6px rgba(0,0,0,0.4) Card elevation
--shadow-lg 0 10px 25px rgba(0,0,0,0.5) Modal/dialog elevation
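
Contrast ratios for palette tokens can be checked programmatically with the WCAG 2.x relative-luminance formula. The sketch below is a verification aid with hypothetical helper names. The table does not state which background each ratio was measured against, so no specific listed value is asserted here.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG relative luminance)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio between two hex colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio("#E2E8F0", "#0B0E14")` evaluates `--text-primary` on `--bg-primary`. WCAG AA requires at least 4.5:1 for normal-size text.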

13.1.3 Typography

Token Font Family Size Weight Line Height Letter Spacing Usage
Display Inter 28px 700 (Bold) 1.2 -0.02em Page titles
H1 Inter 22px 600 (Semi-bold) 1.3 -0.01em Section headings
H2 Inter 18px 600 (Semi-bold) 1.4 0 Card titles, modal headers
H3 Inter 15px 500 (Medium) 1.4 0 Sub-sections, form labels
Body Inter 14px 400 (Regular) 1.5 0 General text, descriptions
Body Small Inter 13px 400 (Regular) 1.5 0 Secondary body text
Caption Inter 12px 400 (Regular) 1.4 0.01em Captions, metadata, footnotes
Timestamp JetBrains Mono 12px 400 (Regular) 1.4 0 All timestamps, durations
Code JetBrains Mono 13px 400 (Regular) 1.5 0 Code snippets, IDs, technical data
Badge Inter 11px 500 (Medium) 1 0.02em Status badges, tags

13.1.4 Spacing and Layout

Token Value Usage
Sidebar expanded 260px Full navigation with labels and icons
Sidebar collapsed 72px Icons only; hover for tooltip
Top bar height 56px Clock, alerts, user menu
Content padding 24px Page content horizontal padding
Content max-width 1400px Maximum content width; content is centered on wider viewports
Card padding 16px Internal card padding
Card border radius 12px Card and panel corners
Card gap 16px Gap between cards in grid
Button border radius 8px Button corners
Input border radius 6px Form input corners
Modal border radius 16px Modal/dialog corners
Toast border radius 8px Toast notification corners
Avatar size (small) 24px Inline avatars
Avatar size (medium) 40px Card headers, lists
Avatar size (large) 64px Profile pages
Icon size (default) 20px Navigation and actions
Icon size (small) 16px Inline icons
Scrollbar width 8px Custom styled scrollbar

13.2 Global Navigation Structure

13.2.1 Layout Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│ [Logo]  Sentinel AI Surveillance              [Clock] [Alerts] [👤 User] │  ▲ 56px
├────────┬───────────────────────────────────────────────────────────────────┤
│        │                                                                    │
│  [📊]  │                    MAIN CONTENT AREA                              │
│  Dash  │                                                                    │
│  board │    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│        │    │   Card 1     │  │   Card 2     │  │   Card 3     │         │
│  [📹]  │    │              │  │              │  │              │         │
│  Live  │    └──────────────┘  └──────────────┘  └──────────────┘         │
│        │                                                                    │
│  [🔔]  │    ┌──────────────────────────────────────────────────┐         │
│ Alerts │    │              Wide Card / Table                   │         │
│        │    └──────────────────────────────────────────────────┘         │
│  [🔍]  │                                                                    │
│ Detec  │                                                                    │
│ tions  │                                                                    │
│        │                                                                    │
│ [remaining navigation items...]                                            │
│        │                                                                    │
├────────┤                                                                    │
│◁ / ▷  │                                                                    │
└────────┴───────────────────────────────────────────────────────────────────┘
  ◄── 260px (expanded) / 72px (collapsed) ──►

13.2.2 Navigation Menu Items

| # | Icon | Label | Route | Badge Type | Required Permission |
|---|---|---|---|---|---|
| 1 | LayoutDashboard | Dashboard | /dashboard | None | Any |
| 2 | Video | Live View | /live | Online camera count | cameras:view |
| 3 | Bell | Alert Center | /alerts | Pending alert count | alerts:view |
| 4 | ScanEye | Detections | /detections | None | cameras:view |
| 5 | Users | Person Gallery | /persons | Total person count | persons:view |
| 6 | UserQuestion | Unknown Review | /unknowns | Queue count | persons:view |
| 7 | ClockAlert | Suspicious Activity | /timeline | None | alerts:view |
| 8 | Search | Search | /search | None | Any |
| 9 | ShieldAlert | Watchlists | /watchlists | None | watchlists:view |
| 10 | Sparkles | AI Vibe Settings | /settings/ai | None | ai_settings:view |
| 11 | Brain | Training Review | /training | Pending suggestions | ai_settings:view |
| 12 | Activity | System Health | /health | Status dot (green/yellow/red) | system:view |
| 13 | Settings | Settings | /settings | None | Admin functions |

Settings Submenu:

| # | Icon | Label | Route | Required Permission |
|---|---|---|---|---|
| 13a | Camera | Camera Management | /settings/cameras | cameras:manage |
| 13b | HardDrive | Retention & Storage | /settings/storage | storage:manage |
| 13c | UserCog | Admin Users | /settings/users | users:manage |
| 13d | BellRing | Notification Settings | /settings/notifications | notifications:manage |
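
The permission column above implies that the sidebar is filtered per user. A minimal sketch of that filtering, assuming permissions arrive as a set of strings and that "Any" means no permission is required; the `NavItem` type and `visible_items` helper are hypothetical names, and only a subset of the menu is listed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NavItem:
    label: str
    route: str
    permission: Optional[str]  # None stands for "Any" in the table


# Subset of the menu tables above; labels, routes, and permissions are copied from them.
NAV_ITEMS = [
    NavItem("Dashboard", "/dashboard", None),
    NavItem("Live View", "/live", "cameras:view"),
    NavItem("Alert Center", "/alerts", "alerts:view"),
    NavItem("Watchlists", "/watchlists", "watchlists:view"),
    NavItem("System Health", "/health", "system:view"),
    NavItem("Camera Management", "/settings/cameras", "cameras:manage"),
]


def visible_items(granted: set[str]) -> list[NavItem]:
    """Return the items a user may see, preserving menu order."""
    return [
        item
        for item in NAV_ITEMS
        if item.permission is None or item.permission in granted
    ]
```

A user holding only `alerts:view` would see Dashboard and Alert Center. Hiding a menu entry is a convenience only; the same permission check would still need to be enforced on the server for each route.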

13.2.3 Top Bar

| Element | Position | Content | Update Frequency |
|---|---|---|---|
| Logo + Brand | Left | Sentinel AI logo + text | Static |
| Current Time | Center-Right | HH:MM:SS live clock | Every second |
| Alert Badge | Right | Bell icon with red count badge | On alert change |
| User Menu | Far right | Avatar + dropdown menu | Static |

User Menu Dropdown:

Item Action
Profile Navigate to user profile
Preferences Theme, timezone, notification preferences
Keyboard Shortcuts Show shortcut reference modal
Help & Documentation Open help center
Logout End session (clears all tokens)

13.3 Page Descriptions

13.3.1 Page 1: Login (/login)

The login page is the entry point to the system. It is designed for quick, secure access with minimal friction.

Feature Specification
Layout Centered card on dark background
Logo Sentinel AI logo (large) centered above form
Fields Username/email (text input), Password (password input with show/hide toggle)
Remember me Checkbox — "Keep me signed in for 7 days"
Submit "Sign In" button — full width, accent blue
MFA step Appears after successful password; 6-digit TOTP input with auto-focus
Error states Inline validation; shake animation on error
Footer "v2.3.1" version number, copyright, privacy policy link
Security Rate limiting (5 attempts / 15 min), CAPTCHA after 3 failures
Redirect After login, redirect to originally requested URL (or Dashboard)
Session JWT access token (15 min) + refresh token cookie (7 days)
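
The security row above (5 attempts per 15 minutes, CAPTCHA after 3 failures) can be sketched as a sliding-window counter. This is an illustration only: it assumes the limits count failed attempts, that the key is a username or client IP, and that state lives in process memory; `LoginThrottle` is a hypothetical name, and a production deployment would more likely keep these counters in a shared store.

```python
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5           # from the spec: 5 attempts
WINDOW_SECONDS = 15 * 60   # per 15 minutes
CAPTCHA_AFTER = 3          # CAPTCHA after 3 failures


class LoginThrottle:
    """Tracks recent failed logins per key (assumed: username or client IP)."""

    def __init__(self):
        self._failures = defaultdict(deque)

    def _prune(self, key, now):
        # Drop failures that have aged out of the 15-minute window.
        failures = self._failures[key]
        while failures and now - failures[0] >= WINDOW_SECONDS:
            failures.popleft()
        return failures

    def record_failure(self, key, now=None):
        now = time.time() if now is None else now
        self._prune(key, now).append(now)

    def status(self, key, now=None):
        """Return 'ok', 'captcha', or 'blocked' for the next attempt."""
        now = time.time() if now is None else now
        count = len(self._prune(key, now))
        if count >= MAX_ATTEMPTS:
            return "blocked"
        if count >= CAPTCHA_AFTER:
            return "captcha"
        return "ok"
```

The `now` parameter exists so the window logic can be exercised without waiting; callers would normally omit it.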

13.3.2 Page 2: Dashboard (/dashboard)

The Dashboard is the primary landing page providing at-a-glance situational awareness.

┌──────────────────────────────────────────────────────────────────────────────┐
│  Dashboard                                          [Refresh] [Date Range] │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌──────────────┐ │
│  │  📹 8/8       │ │  🔔 12        │ │  👥 47        │ │  ✓ Healthy  │ │
│  │  Cameras      │ │  Alerts Today │ │  Persons      │ │  System     │ │
│  │  Online       │ │  3 Critical   │ │  Detected     │ │  All Good   │ │
│  └────────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │
│                                                                              │
│  ┌────────────────────────────────────────┐ ┌──────────────────────────┐   │
│  │  Alert Distribution (Last 24 Hours)   │ │  Recent Alerts           │   │
│  │                                        │ │                          │   │
│  │  8 ┤          ██                       │ │  🔴 CAM-01  Unknown     │   │
│  │  6 ┤    ██    ██  ██                   │ │     14:32 — Entrance    │   │
│  │  4 ┤    ██ ██ ██  ██ ██                │ │  🟡 CAM-03  Watchlist   │   │
│  │  2 ┤ ██ ██ ██ ██  ██ ██ ██             │ │     14:15 — Parking     │   │
│  │  0 ┼────┬────┬────┬────┬────┬────┬──  │ │  🟠 CAM-05  System      │   │
│  │     00  04  08  12  16  20           │ │     12:08 — Storage 90% │   │
│  │                                        │ │                          │   │
│  └────────────────────────────────────────┘ └──────────────────────────┘   │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────┐     │
│  │  Camera Status Grid (2x4)                                        │     │
│  │                                                                    │     │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │     │
│  │  │ CAM-01  │ │ CAM-02  │ │ CAM-03  │ │ CAM-04  │           │     │
│  │  │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │           │     │
│  │  │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│           │     │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘           │     │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │     │
│  │  │ CAM-05  │ │ CAM-06  │ │ CAM-07  │ │ CAM-08  │           │     │
│  │  │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │ │ [LIVE]  │           │     │
│  │  │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│           │     │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘           │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────┐     │
│  │  Activity Feed                                                    │     │
│  │  14:32 — Unknown person detected at CAM-01 (Entrance)            │     │
│  │  14:15 — Watchlist match: John Smith at CAM-03 (Parking)         │     │
│  │  13:58 — Operator Alice acknowledged alert #ALT-2847             │     │
│  │  13:42 — Camera CAM-05 stream reconnected                        │     │
│  │  13:30 — Daily training completed: 3 new face clusters          │     │
│  └──────────────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘

Dashboard Components:

| Component | Refresh Rate | Description |
|---|---|---|
| Stat cards | 30 seconds | Active cameras, alerts today, persons detected, system health |
| Alert distribution chart | 5 minutes | Bar chart showing alerts by hour for last 24 hours |
| Recent alerts card | 30 seconds | Last 5 alerts with severity badge, camera, timestamp |
| Camera status grid | 30 seconds | 2x4 grid of all 8 cameras with live thumbnail and status dot |
| Activity feed | Real-time (WebSocket) | Recent system events: detections, alerts, operator actions |
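
The alert distribution chart needs alert timestamps grouped into hourly bars. One way this could be computed is sketched below, assuming a rolling 24-hour window that ends at the refresh time; `alerts_per_hour` is a hypothetical helper, and the mockup's clock-hour axis suggests the real chart may align buckets to wall-clock hours instead.

```python
from datetime import datetime, timedelta


def alerts_per_hour(alert_times, now):
    """Count alerts in each of the 24 one-hour buckets ending at `now`.

    Index 0 is the oldest hour, index 23 the most recent one.
    """
    start = now - timedelta(hours=24)
    counts = [0] * 24
    for ts in alert_times:
        if start <= ts < now:
            index = int((ts - start).total_seconds() // 3600)
            counts[index] += 1
    return counts
```

Alerts older than 24 hours are ignored, so the same function can be called on every 5-minute refresh with the full recent alert list.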

13.3.3 Page 3: Live Camera View (/live)

The live view is the primary monitoring interface, showing real-time streams from all 8 cameras.

Feature Specification
Default layout 2x4 grid (8 cameras)
Layout options 1x1 (single), 2x2 (4 cameras), 2x4 (8 cameras), 4x4 (16 cameras for future scaling)
Stream format HLS (HTTP Live Streaming) by default, with WebRTC as a lower-latency alternative
Per-camera overlay Camera name, status dot, expand button, snapshot button
Grid controls Play all / Pause all, Refresh all streams, Layout selector
Camera states Loading (spinner), Playing, Paused, Error (retry button), Offline (gray placeholder)
Fullscreen Click any camera to expand; press F to toggle fullscreen for focused camera
Camera switching Press 1-8 to focus camera by number
Snapshot Press S or click camera snapshot button to capture current frame
Recording indicator Red pulsing dot on cameras actively recording
Alert overlay Flashing border on camera that triggered recent alert

13.3.4 Page 4: Alert Center (/alerts)

The Alert Center provides comprehensive alert management with filtering, batch actions, and detailed investigation tools.

Feature Specification
Filter bar Date range picker, severity multi-select (Critical/High/Medium/Low/Info), camera multi-select, status filter (Pending/Acknowledged/Resolved/Ignored), type filter
Severity legend Color-coded badges: Critical (red), High (orange), Medium (yellow), Low (blue), Info (gray)
Alert cards Each card: thumbnail image, camera name, timestamp, severity badge, person name (if known), description, current status
Card actions Acknowledge, Resolve, Ignore, View Details, Mark False Positive
Bulk actions Checkbox selection; batch Acknowledge or Ignore
Sort options Newest first (default), Oldest first, Severity (highest first), Camera name
Pagination 20 alerts per page; infinite scroll option
Empty state "No alerts in the selected period" with illustration
Detail panel Slide-out panel with full alert info: images, video clip, AI confidence, detection metadata, person profile link
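
The "Severity (highest first)" sort and the 20-per-page pagination above can be sketched together. This assumes alerts are dictionaries with `severity` and `timestamp` keys and that ties within a severity are broken newest first; both the field names and the tie-break are assumptions, not part of the specification.

```python
# Severity order taken from the filter bar above, most severe first.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}
PAGE_SIZE = 20  # from the spec: 20 alerts per page


def page_by_severity(alerts, page):
    """Return one 1-based page of alerts, highest severity first."""
    # Sort newest first, then rely on the stable sort to group by severity
    # while keeping the newest-first order inside each group.
    ordered = sorted(alerts, key=lambda a: a["timestamp"], reverse=True)
    ordered.sort(key=lambda a: SEVERITY_RANK[a["severity"]])
    start = (page - 1) * PAGE_SIZE
    return ordered[start:start + PAGE_SIZE]
```

In practice the ordering and paging would usually be pushed into the database query; the sketch only makes the intended ordering explicit.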

13.3.5 Page 5: Recent Detections (/detections)

Shows all recent detection events with face thumbnails and recognition results.

Feature Specification
Filter controls Known/Unknown/All toggle, date range picker, camera selector, person name search
Detection cards Face thumbnail + name (or "Unknown") + confidence percentage + camera name + timestamp + watchlist badge
Card click Opens detail view with full-size image, sighting history for that person, camera info
Actions "Name This Person" (unknowns), "View Profile" (known), "Add to Watchlist"
Confidence indicator Visual bar showing confidence level; color-coded (green > 90%, yellow 70-90%, orange < 70%)
Grid layout 4 columns desktop, 3 tablet, 2 mobile
Auto-refresh New detections appear at top without page reload (WebSocket)
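
The confidence indicator's color bands translate directly into a small lookup. A minimal sketch, assuming confidence is expressed as a percentage from 0 to 100 and that the boundary values 70 and 90 fall in the yellow band (the specification does not say which side they belong to); `confidence_color` is a hypothetical name:

```python
def confidence_color(confidence):
    """Map a 0-100 confidence percentage to the indicator color."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence > 90:
        return "green"
    if confidence >= 70:
        return "yellow"
    return "orange"
```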

13.3.6 Page 6: Person Gallery (/persons)

A browsable gallery of all known persons in the system.

Feature Specification
Search bar Full-text search across names, roles, departments, tags
Role filters Employee / Visitor / Vendor / Contractor / Other — pill-style toggle buttons
Sort options Name (A-Z), Last Seen (recent first), Sightings Count (highest first), Date Added (newest first)
Person cards Face image, name, role badge, department, last seen timestamp, total sightings count
Grid layout 5 columns desktop (xl), 4 columns (lg), 3 columns (md), 2 columns (sm)
Pagination 50 persons per page
Actions Click card → navigate to Person Profile; right-click context menu
Bulk actions Select multiple for bulk add to watchlist
Empty state "No persons found" with "Add your first person" CTA

13.3.7 Page 7: Unknown Persons Review (/unknowns)

The review queue for unidentified persons, a critical workflow for building the person database.

Feature Specification
Queue view Cards of unknown person clusters (grouped by face similarity via DBSCAN)
Cluster card Representative face image + cluster size (number of sightings) + first/last seen + cameras detected at + confidence range
Actions per cluster Name This Person, Merge with Existing, Ignore Cluster, Mark as Reviewed
AI insight panel Pattern suggestion: "Seen 5x at entrance between 08:00-09:00 — possibly employee"
Progress indicator "23 unknown clusters remaining" with progress bar
Batch review Keyboard navigation (arrow keys + Enter to select action) for rapid review
Empty state "Great job! No unknown persons to review. All caught up!" with celebration animation
Reviewed history Tab to view previously reviewed clusters
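
Each cluster card aggregates the sightings that the clustering step grouped together. The sketch below shows how the card fields listed above could be derived once a cluster's sightings are known; the clustering itself (DBSCAN) is not shown, and the `Sighting` type and `summarize_cluster` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sighting:
    camera: str
    seen_at: datetime
    confidence: float


def summarize_cluster(sightings):
    """Build the fields shown on one unknown-person cluster card."""
    if not sightings:
        raise ValueError("a cluster needs at least one sighting")
    times = [s.seen_at for s in sightings]
    scores = [s.confidence for s in sightings]
    return {
        "size": len(sightings),
        "first_seen": min(times),
        "last_seen": max(times),
        "cameras": sorted({s.camera for s in sightings}),
        "confidence_range": (min(scores), max(scores)),
    }
```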

13.3.8 Page 8: Person Profile (/persons/{id})

Detailed view of a single person's information, detection history, and management options.

Feature Specification
Header Name, role badge, status (Active/Inactive), action buttons (Edit, Delete, Add to Watchlist)
Photo gallery Primary face photo (large) + additional reference photos in thumbnail grid below
Info panel Department, employee ID, contact information, notes, tags, date added, added by
Sighting history Timeline of all detections — timestamp, camera name, confidence, thumbnail image
Sighting stats Total sightings, first seen, last seen, most common camera, most common time
Watchlist memberships Which watchlists this person belongs to, with badge per watchlist
Activity log Who created/edited the profile and when; full audit trail
Danger zone Delete person (with confirmation dialog explaining consequences)

13.3.9 Page 9: Suspicious Activity Timeline (/timeline)

A timeline-based visualization of flagged events for pattern analysis.

Feature Specification
Timeline view Horizontal time axis with event markers positioned by timestamp
Event types Unusual movement (orange), Loitering (yellow), Unauthorized access (red), Crowd gathering (purple)
Color coding Each event type has a distinct color; severity affects marker size
Filters Event type multi-select, camera selector, date range, severity threshold
Zoom levels Hour view, Day view (default), Week view, Month view
Click marker Opens detail panel with description, evidence images, AI reasoning, confidence
Density heatmap Background shows detection density to identify high-activity periods

13.3.10 Page 10: Search (/search)

Global search across all data types in the system.

Feature Specification
Search bar Prominent centered search input with clear button
Category filters Person, Camera, Event, Alert — toggle pills
Results grouping Results grouped by category with section headers
Person search Type name or upload a photo for face recognition similarity search
Camera search By name, location, or status
Event search By description, camera, person, or event type
Alert search By ID, description, or camera
Keyboard shortcut / (forward slash) focuses search from any page
Recent searches Dropdown shows recent searches for quick access
Empty state "No results found" with search tips

13.3.11 Page 11: Watchlists (/watchlists)

Management interface for watchlist categories and their members.

Feature Specification
Watchlist cards Name, icon (selected from preset), color, member count, alert settings summary
Create button "+ New Watchlist" with modal: name, icon picker, color picker, alert configuration
Default watchlists VIP (green), Blacklist (red), Authorized (blue), Temporary Access (yellow)
Card click Opens watchlist detail with full member list
Member management Add from gallery (search + select), remove member, bulk import via CSV
Alert settings Per-watchlist: alert timing, severity override, notify groups, quiet hours override
Test button "Test Alert" — sends test notification for this watchlist to verify configuration
Member table Sortable by name, date added, added by, sightings count

13.3.12 Page 12: AI Vibe Settings (/settings/ai)

The AI Vibe Settings page presents AI configuration as friendly questions rather than technical parameters.

| # | Setting | Question | Options | Description |
|---|---|---|---|---|
| 1 | Detection Sensitivity | "How carefully should the AI watch?" | Relaxed / Balanced / High / Maximum | Controls how aggressively the AI reports detections |
| 2 | Face Match Threshold | "How confident should the AI be before naming someone?" | Lenient / Normal / Strict / Very Strict | A lower threshold yields more matches but also more false positives |
| 3 | Night Mode | "How should the AI behave at night?" | Off / Diminished / Active / Enhanced | Night-specific model and sensitivity adjustment |
| 4 | Evidence Capture | "What should be saved when someone is detected?" | Photo Only / Photo + 5s Clip / Photo + 10s Clip / Full Recording | Media stored per detection event |
| 5 | Alert Style | "When should alerts be sent?" | Silent / Digest / Normal / Urgent / Critical | Controls alert frequency and channels used |
| 6 | Learning Mode | "Should the AI learn from new sightings?" | Off / Review First / Auto-Learn Cautiously / Auto-Learn Aggressively | How unknown face clusters are handled |
| 7 | Privacy Mode | "How should privacy be handled?" | Full Recognition / Blur Unrecognized / Blur All Faces / Privacy Zones | Face processing and display privacy |

Each setting control:

  • Segmented button group (pill-shaped options)
  • Selected option highlighted in accent blue
  • Brief description below updates on selection
  • Current value displayed as badge
  • Auto-save (no save button); toast confirms: "Detection Sensitivity updated to High"
  • Expand toggle reveals internal numerical values (Admin permission required)

Advanced Mode (Admin only): When expanded, each control shows the internal parameter values:

| Setting | Option | Internal Value |
|---|---|---|
| Detection Sensitivity | Relaxed | Confidence threshold: 0.85, NMS: 0.5 |
| Detection Sensitivity | Balanced | Confidence threshold: 0.70, NMS: 0.45 |
| Detection Sensitivity | High | Confidence threshold: 0.55, NMS: 0.4 |
| Detection Sensitivity | Maximum | Confidence threshold: 0.40, NMS: 0.35 |
| Face Match Threshold | Lenient | Similarity threshold: 0.60 |
| Face Match Threshold | Normal | Similarity threshold: 0.70 |
| Face Match Threshold | Strict | Similarity threshold: 0.80 |
| Face Match Threshold | Very Strict | Similarity threshold: 0.90 |
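
The friendly options resolve to numeric parameters through a fixed lookup. A minimal sketch of that lookup, using the values from the table above; the dictionary layout, key names, and the `resolve_ai_settings` function are illustrative assumptions rather than the platform's actual configuration schema.

```python
# Values copied from the Advanced Mode table; key names are hypothetical.
DETECTION_SENSITIVITY = {
    "Relaxed": {"confidence_threshold": 0.85, "nms": 0.50},
    "Balanced": {"confidence_threshold": 0.70, "nms": 0.45},
    "High": {"confidence_threshold": 0.55, "nms": 0.40},
    "Maximum": {"confidence_threshold": 0.40, "nms": 0.35},
}

FACE_MATCH_THRESHOLD = {
    "Lenient": 0.60,
    "Normal": 0.70,
    "Strict": 0.80,
    "Very Strict": 0.90,
}


def resolve_ai_settings(sensitivity, face_match):
    """Translate two friendly option labels into internal parameters."""
    try:
        detection = DETECTION_SENSITIVITY[sensitivity]
        similarity = FACE_MATCH_THRESHOLD[face_match]
    except KeyError as exc:
        raise ValueError(f"unknown option: {exc.args[0]}") from None
    return {**detection, "similarity_threshold": similarity}
```

Keeping the mapping in one table means the operator-facing labels can be renamed without touching the inference configuration.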

13.3.13 Page 13: Training Review (/training)

Interface for reviewing AI-suggested face clusters and approving them for model training.

Feature Specification
Suggestion cards Face cluster the AI is uncertain about — multiple face images + AI confidence + reason for suggestion
Card layout Grid of face thumbnails + confidence bar + suggestion reason ("Seen 8x at different cameras, high confidence match")
Actions per suggestion Approve (add to training data), Reject (not a valid cluster), Merge with Existing Person
Batch actions Select multiple suggestions for bulk Approve/Reject
Queue status "12 suggestions pending review" with progress bar
Filter By confidence level, camera, date range
History Tab showing previously reviewed suggestions with outcome
Training metrics Model accuracy trend, training data count, last training time

13.3.14 Page 14: System Health (/health)

Real-time system health monitoring dashboard.

Feature Specification
Status overview Large status indicator: All Systems Operational (green) / Degraded (yellow) / Critical (red)
Service cards Per-service status card: Video Capture, AI Inference, Database, Storage, Notifications, VPN
Per-service metrics Status dot, uptime percentage, last restart, CPU, memory
Camera health table All 8 cameras: stream status, FPS, bitrate, last seen, error count
System metrics CPU usage (%), memory usage (%), disk usage (%), network I/O
Logs viewer Recent system logs with severity filtering (DEBUG/INFO/WARNING/ERROR/CRITICAL); tail -f style auto-scroll
Refresh Auto-refresh every 30 seconds; manual refresh button
Historical view Toggle to show metrics history (last 1h, 6h, 24h, 7d)
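
The specification does not say how the large status indicator is derived from the per-service cards. One plausible rule, sketched here purely as an assumption, is "worst status wins": the overall state is the most severe state reported by any service. The state names and `overall_status` are hypothetical.

```python
# Ordered from least to most severe; names are assumed, not from the spec.
STATE_ORDER = ["operational", "degraded", "critical"]


def overall_status(service_states):
    """Return the most severe state among all services.

    `service_states` maps a service name to one of STATE_ORDER.
    An empty mapping is treated as operational.
    """
    return max(
        service_states.values(),
        key=STATE_ORDER.index,
        default="operational",
    )
```

Under this rule a single degraded VPN tunnel turns the whole indicator yellow, which may or may not be the desired behavior; a weighted rule would need per-service criticality that this section does not define.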

13.3.15 Page 15: Notification Settings (/settings/notifications)

Configuration interface for the notification system.

Feature Specification
Recipient groups Add/edit/delete groups; each group has name, Telegram chat IDs, WhatsApp numbers, alert preferences
Routing rules Visual rule builder with drag-and-drop condition blocks (camera, person, role, event_type, zone, time, day, severity, watchlist)
Quiet hours Schedule builder with day-of-week checkboxes, time range pickers, timezone selector
Template editor Edit message templates per alert type; live preview with sample data; variable reference panel
Delivery status Real-time view showing notification delivery states (pending/sent/delivered/failed)
Test buttons "Send Test Alert" per channel to verify configuration
DLQ viewer Dead letter queue entries with retry/discard actions
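
Quiet-hours schedules commonly span midnight (for example 22:00 to 06:00), which is easy to get wrong. The sketch below shows one way to evaluate such a window, assuming the selected days refer to the day on which the window starts and that the time has already been converted to the schedule's timezone; `in_quiet_hours` is a hypothetical helper.

```python
from datetime import time


def in_quiet_hours(now_time, weekday, start, end, days):
    """Return True if the given local time falls inside the quiet window.

    weekday: 0 = Monday ... 6 = Sunday.
    days: set of weekdays on which a quiet window starts.
    """
    if start <= end:
        # Same-day window, e.g. 13:00-14:00.
        return weekday in days and start <= now_time < end
    # Overnight window, e.g. 22:00-06:00.
    if now_time >= start:
        return weekday in days
    if now_time < end:
        # Early morning belongs to the window that started the day before.
        return (weekday - 1) % 7 in days
    return False
```

With a Monday-only 22:00 to 06:00 window, Tuesday 05:00 is quiet but Monday 05:00 is not, because that early-morning stretch would belong to a Sunday window.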

13.3.16 Page 16: Admin Users (/settings/users)

User management interface for administrators.

Feature Specification
Users table Username, email, role badge, status (Active/Inactive), last login, MFA status, actions menu
Add user Modal: username, email, role selector, password (or send invite link), MFA toggle
Edit user Role, status, force password change on next login, reset MFA, session revocation
User activity log Login history (timestamp, IP, device), actions taken, settings changed
Bulk actions Deactivate multiple accounts simultaneously
Filter By role, status, last login date range
Sort By username, role, last login, created date
Pagination 25 users per page

13.3.17 Page 17: Camera Management (/settings/cameras)

Configuration interface for camera setup and zone management.

Feature Specification
Camera cards Name, status (Online/Offline/Disabled), IP/connection string, stream info (resolution, FPS), action buttons (Edit, Test, Disable)
Add camera Modal: name, location, stream URL, credentials, channel number, description
Edit camera All camera properties; test connection button
Zone configuration Interactive polygon drawing on live camera feed; zone name, color, sensitivity, type (Entrance/Restricted/Detection/Ignore)
Stream settings Resolution (720p/1080p), frame rate (5/10/15/25/30 FPS), codec (H.264/H.265), night mode toggle
Recording settings Continuous/event-triggered, retention policy, storage location
Camera ordering Drag to reorder cameras in grid layout

13.3.18 Page 18: Retention & Storage (/settings/storage)

Storage management and retention policy configuration.

Feature Specification
Storage overview Donut chart showing usage breakdown: Video recordings, Detection snapshots, Training data, System logs, Free space
Numerical values Total capacity / Used / Free; warning at > 80% (yellow), critical at > 95% (red)
Retention policies Dropdown per category: 7 days / 14 days / 30 days / 60 days / 90 days / 180 days / 365 days / Forever
Auto-cleanup Enable toggle + schedule time picker (daily at 03:00 default)
Actions "Save Settings", "Run Cleanup Now" (with confirmation), "Export Storage Report"
Growth projection Estimated days until full based on current growth rate
Storage alerts Configure alert thresholds (80% warning, 90% high, 95% critical)
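
The default thresholds and the growth projection above reduce to simple arithmetic. A minimal sketch, assuming usage is given as a fraction of capacity, that a threshold is reached at equality, and that growth is linear; the function names and the gigabyte units are illustrative.

```python
def storage_alert_level(used_fraction):
    """Classify storage usage against the default 80/90/95% thresholds."""
    if used_fraction >= 0.95:
        return "critical"
    if used_fraction >= 0.90:
        return "high"
    if used_fraction >= 0.80:
        return "warning"
    return "ok"


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days until storage is full, or None if usage is not growing."""
    if growth_gb_per_day <= 0:
        return None
    remaining = max(capacity_gb - used_gb, 0)
    return remaining / growth_gb_per_day
```

For a 1000 GB volume with 900 GB used and 10 GB of daily growth, the projection is 10 days and the alert level is "high".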

13.4 Key User Flows

13.4.1 Flow 1: Daily Operator — Monitor & Respond

┌──────────────────────────────────────────────────────────────────────────────┐
│                  FLOW 1: DAILY OPERATOR (Monitor & Respond)                   │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: LOGIN                                                              │
│  ──────────────                                                              │
│  Enter username → Enter password → MFA code (if enabled)                    │
│  → Redirect to Dashboard                                                     │
│                                                                              │
│  STEP 2: DASHBOARD REVIEW (~30 seconds)                                     │
│  ─────────────────────────────────────                                       │
│  Glance at stat cards:                                                       │
│    ├─ All 8 cameras online? ✓                                               │
│    ├─ Any critical alerts pending? (red badge)                               │
│    ├─ Any unknown persons detected?                                          │
│    └─ System health OK?                                                      │
│                                                                              │
│  If critical alert visible:                                                  │
│    → Click alert card → Go to Alert Center                                  │
│  If no urgent alerts:                                                        │
│    → Click "Live View" in sidebar                                           │
│                                                                              │
│  STEP 3: LIVE CAMERA MONITORING (ongoing)                                   │
│  ─────────────────────────────────────                                       │
│  View 2x4 grid of all cameras                                               │
│  Observe feeds for anomalies                                                │
│                                                                              │
│  When alert toast appears (top-right):                                       │
│    → Toast slides in with sound notification                                 │
│    → Click toast to view alert details                                       │
│                                                                              │
│  STEP 4: ALERT RESPONSE                                                     │
│  ──────────────────                                                          │
│  Click alert toast OR navigate to Alert Center                               │
│  Review alert card:                                                          │
│    ├─ Thumbnail image                                                        │
│    ├─ Camera name, timestamp                                                 │
│    ├─ Alert type (unknown person, watchlist match, etc.)                     │
│    └─ Severity level                                                         │
│                                                                              │
│  Click "View Details" for full information:                                  │
│    ├─ Full-size image / video clip                                           │
│    ├─ AI confidence score                                                    │
│    ├─ Detection metadata (bounding box, zone)                                │
│    └─ Person profile link (if known)                                         │
│                                                                              │
│  DECISION:                                                                   │
│    ├─ False detection → Click "Mark as False Positive"                       │
│    ├─ Legitimate alert → Click "Acknowledge" or "Resolve"                    │
│    ├─ Unknown person → Click "Name This Person"                              │
│    ├─ Needs escalation → Click "Escalate"                                    │
│    └─ Need live view → Click "View Live" to jump to camera                   │
│                                                                              │
│  STEP 5: RETURN TO MONITORING                                               │
│  ────────────────────────────                                                │
│  After handling alert, return to Live View                                   │
│  Continue monitoring cycle                                                   │
│                                                                              │
│  STEP 6: END OF SHIFT                                                       │
│  ──────────────────                                                          │
│  Review unacknowledged alerts (if any)                                       │
│  Check System Health page                                                    │
│  Hand over to next operator (verbal + note any pending issues)               │
│  Click user menu → Logout                                                    │
└──────────────────────────────────────────────────────────────────────────────┘

13.4.2 Flow 2: New Person Onboarding

┌──────────────────────────────────────────────────────────────────────────────┐
│                    FLOW 2: NEW PERSON ONBOARDING                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRIGGER: System detects unknown person → Alert created → Operator notified │
│                                                                              │
│  STEP 1: REVIEW DETECTION                                                    │
│  ────────────────────                                                        │
│  Navigate to "Recent Detections" via sidebar                                 │
│  Filter: "Unknown" (toggle button)                                           │
│  Click on unknown detection card                                             │
│                                                                              │
│  Detail view shows:                                                          │
│    ├─ Full-size face image                                                   │
│    ├─ Camera: CAM-01 (Entrance)                                              │
│    ├─ Timestamp: 2025-01-16 14:32:15                                        │
│    ├─ Confidence: 87.3%                                                      │
│    └─ AI note: "No matching person found in database"                        │
│                                                                              │
│  STEP 2: NAME THE PERSON                                                     │
│  ────────────────────                                                        │
│  Click "Name This Person" button                                             │
│  Modal dialog appears:                                                       │
│                                                                              │
│    ┌────────────────────────────────────┐                                   │
│    │  Name This Person                   │                                   │
│    │                                     │                                   │
│    │  Face: [thumbnail]                  │                                   │
│    │                                     │                                   │
│    │  Full Name *     [____________]     │                                   │
│    │  Role *          [Employee ▼]       │                                   │
│    │  Department      [____________]     │                                   │
│    │  Employee ID     [____________]     │                                   │
│    │  Notes           [____________]     │                                   │
│    │  Tags            [____________]     │                                   │
│    │                                     │                                   │
│    │  Similar existing persons:          │                                   │
│    │  [No similar persons found]         │                                   │
│    │                                     │                                   │
│    │  [Cancel]  [Save & Create Profile]  │                                   │
│    └────────────────────────────────────┘                                   │
│                                                                              │
│  STEP 3: SIMILARITY CHECK                                                    │
│  ────────────────────                                                        │
│  System searches for similar existing persons                                │
│  If matches found: display side-by-side comparison                           │
│    → Option to merge with existing person instead of creating new            │
│  If no matches: proceed with creation                                        │
│                                                                              │
│  STEP 4: SAVE PROFILE                                                        │
│  ──────────────                                                              │
│  Click "Save & Create Profile"                                               │
│  Toast notification: "Profile created for [Name]"                            │
│  Detection card updates with person name                                     │
│  Person now appears in Person Gallery                                        │
│                                                                              │
│  STEP 5: ADD TRAINING IMAGES (Optional)                                      │
│  ────────────────────────────────────                                        │
│  Navigate to Person Profile                                                  │
│  Click "Upload Reference Photos"                                             │
│  Select additional clear face images                                         │
│  System queues for model retraining                                          │
│  Toast: "3 new training images added. Model will retrain automatically."     │
└──────────────────────────────────────────────────────────────────────────────┘
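
The similarity check in Step 3 could be sketched as a search over stored face embeddings. This is a minimal illustration under stated assumptions, not the platform's implementation: it assumes cosine similarity as the metric and reuses the FACE_MATCH_THRESHOLD value of 0.70 from Section 14.4.2. The function name find_similar_persons and the toy 3-dimensional vectors are hypothetical.

```python
import math

FACE_MATCH_THRESHOLD = 0.70  # value from Section 14.4.2; the metric is an assumption


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_persons(new_embedding, gallery, threshold=FACE_MATCH_THRESHOLD):
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [
        (name, cosine_similarity(new_embedding, embedding))
        for name, embedding in gallery.items()
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Toy gallery with made-up embeddings (real face embeddings are much longer).
gallery = {
    "John Doe": [0.9, 0.1, 0.0],
    "Jane Smith": [0.0, 1.0, 0.0],
}
candidates = find_similar_persons([1.0, 0.0, 0.0], gallery)
```

A non-empty result would correspond to the "matches found" branch (side-by-side comparison with a merge option); an empty list corresponds to "proceed with creation".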

13.4.3 Flow 3: Unknown Person Review Queue

┌──────────────────────────────────────────────────────────────────────────────┐
│                  FLOW 3: UNKNOWN PERSON REVIEW QUEUE                          │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: OPEN REVIEW QUEUE                                                   │
│  ──────────────────────                                                      │
│  Sidebar → "Unknown Persons Review"                                          │
│  View: Grid of unknown person cluster cards                                  │
│  Header: "23 unknown clusters remaining"                                     │
│                                                                              │
│  STEP 2: SELECT CLUSTER                                                      │
│  ────────────────                                                            │
│  Click on a cluster card to expand                                           │
│  Shows:                                                                      │
│    ├─ Representative face (largest)                                          │
│    ├─ Gallery of all face instances in cluster                               │
│    ├─ Sighting history (camera, time, count)                                 │
│    ├─ AI pattern insight: "Seen 5x at entrance between 08:00-09:00"         │
│    └─ Confidence distribution graph                                          │
│                                                                              │
│  STEP 3: MAKE DECISION                                                       │
│  ────────────────                                                            │
│  Options:                                                                    │
│    ├─ [Name This Person] → Enter details → Create new profile               │
│    ├─ [Merge with Existing] → Search/select person → Confirm merge          │
│    ├─ [Ignore Cluster] → "False detection / not a person" → Remove         │
│    └─ [Mark Reviewed] → "Unsure, keep in queue for later"                   │
│                                                                              │
│  STEP 4: QUEUE UPDATES                                                       │
│  ────────────────                                                            │
│  Processed item removed from queue                                           │
│  Toast confirms action: "Cluster marked as [Name]. 22 remaining."            │
│  Auto-advance to next cluster (optional)                                     │
│  Keyboard shortcut: Right arrow → next cluster                               │
│                                                                              │
│  STEP 5: CONTINUE REVIEW                                                     │
│  ────────────────                                                            │
│  Process all clusters or stop and resume later                               │
│  Queue persists across sessions                                              │
│  New clusters automatically added as detected                                │
└──────────────────────────────────────────────────────────────────────────────┘
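
The queue behaviour in Steps 3 and 4 can be sketched as a small state update. This is an illustrative model only: the Decision enum and apply_decision function are hypothetical names, and moving a "Mark Reviewed" cluster to the back of the queue is an assumption about what "keep in queue for later" means.

```python
from collections import deque
from enum import Enum


class Decision(Enum):
    NAME = "name"
    MERGE = "merge"
    IGNORE = "ignore"
    MARK_REVIEWED = "mark_reviewed"


def apply_decision(queue, decision):
    """Apply a decision to the cluster at the front of the queue.

    NAME, MERGE and IGNORE remove the cluster. MARK_REVIEWED moves it to
    the back so it stays available for a later pass (assumed behaviour).
    Returns the number of clusters still in the queue.
    """
    cluster = queue.popleft()
    if decision is Decision.MARK_REVIEWED:
        queue.append(cluster)
    return len(queue)


queue = deque(f"cluster-{i}" for i in range(1, 24))  # 23 clusters, as in Step 1
remaining = apply_decision(queue, Decision.NAME)
print(f"{remaining} remaining")  # prints "22 remaining"
```

Because the queue persists across sessions, a real implementation would store this state server-side rather than in memory.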

13.4.4 Flow 4: AI Settings Adjustment

┌──────────────────────────────────────────────────────────────────────────────┐
│                    FLOW 4: AI SETTINGS ADJUSTMENT                             │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: NAVIGATE TO AI VIBE SETTINGS                                        │
│  ────────────────────────────────────                                        │
│  Sidebar → "AI Vibe Settings" (Sparkles icon)                                │
│  View: Scrollable page with 7 setting sections                               │
│                                                                              │
│  STEP 2: ADJUST DETECTION SENSITIVITY                                        │
│  ────────────────────────────────                                            │
│  Section: "How carefully should the AI watch?"                               │
│  Current: [Relaxed] [Balanced] [High] [Maximum]                              │
│  Change: Click "High"                                                        │
│  Description updates:                                                        │
│    "High: The AI will catch almost everything.                               │
│     Expect more alerts, including some false positives."                     │
│  Toast: "Detection Sensitivity updated to High"                              │
│  Change takes effect immediately                                             │
│                                                                              │
│  STEP 3: ADJUST ALERT STYLE                                                  │
│  ────────────────────                                                        │
│  Section: "When should alerts be sent?"                                      │
│  Current: [Silent] [Digest] [Normal] [Urgent] [Critical]                     │
│  Change: Click "Critical"                                                    │
│  Description updates:                                                        │
│    "Critical: Only truly important events trigger alerts.                    │
│     All other activity is logged but not alerted."                           │
│  Toast: "Alert Style updated to Critical"                                    │
│                                                                              │
│  STEP 4: REVIEW ADVANCED (Admin only)                                        │
│  ────────────────────────────────────                                        │
│  Click "Expand" on Advanced Settings                                         │
│  Shows internal values:                                                      │
│    Detection Sensitivity: High                                               │
│    └─ Confidence Threshold: 0.55                                             │
│    └─ NMS Threshold: 0.40                                                    │
│    └─ Model: yolo11m.onnx                                                    │
│  Admin can directly edit numerical values                                    │
│                                                                              │
│  STEP 5: DONE                                                                │
│  ────────                                                                    │
│  All changes auto-saved                                                      │
│  Return to monitoring — changes effective immediately                        │
└──────────────────────────────────────────────────────────────────────────────┘
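
The relationship between a named preset and the internal values shown in Step 4 might look like the sketch below. Only the "High" values (0.55, 0.40, yolo11m.onnx) come from this document; the other presets are deliberately left out rather than guessed. The names DetectionParams and resolve_params are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionParams:
    confidence_threshold: float
    nms_threshold: float
    model: str


# Only "High" is documented in Step 4; other presets are intentionally omitted.
PRESETS = {
    "High": DetectionParams(0.55, 0.40, "yolo11m.onnx"),
}


def resolve_params(preset, overrides=None):
    """Return the preset's parameters with any admin overrides applied."""
    params = PRESETS[preset]
    if overrides:
        params = replace(params, **overrides)
    for name in ("confidence_threshold", "nms_threshold"):
        value = getattr(params, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return params


default_high = resolve_params("High")
tuned = resolve_params("High", {"confidence_threshold": 0.60})
```

Validating the range before saving matters here because Step 4 lets an admin edit the numerical values directly and changes take effect immediately.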

13.4.5 Flow 5: Watchlist Alert Configuration

┌──────────────────────────────────────────────────────────────────────────────┐
│                 FLOW 5: WATCHLIST ALERT CONFIGURATION                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: NAVIGATE TO WATCHLISTS                                              │
│  ────────────────────────────                                                │
│  Sidebar → "Watchlists"                                                      │
│  View: Grid of existing watchlist cards                                      │
│  Default: VIP, Blacklist, Authorized, Temporary Access                      │
│                                                                              │
│  STEP 2: CREATE NEW WATCHLIST (Optional)                                     │
│  ────────────────────────────────────                                        │
│  Click "+ New Watchlist"                                                     │
│  Modal:                                                                      │
│    Name: [Security Escort Required]                                          │
│    Icon: [🛡️] (icon picker)                                                 │
│    Color: [Orange] (color picker)                                            │
│    Description: [People who require security escort]                         │
│    Click "Create"                                                            │
│  New watchlist card appears in grid                                          │
│                                                                              │
│  STEP 3: ADD MEMBERS                                                         │
│  ────────────────                                                            │
│  Click on watchlist card                                                     │
│  Click "Add from Gallery"                                                    │
│  Search/select persons to add:                                               │
│    [☑] John Doe                                                             │
│    [☑] Jane Smith                                                           │
│    [☐] Bob Johnson (not selected)                                            │
│  Click "Add to Watchlist"                                                    │
│  Toast: "2 persons added to Security Escort Required"                        │
│                                                                              │
│  STEP 4: CONFIGURE ALERTS                                                    │
│  ────────────────────                                                        │
│  Click "Settings" tab on watchlist detail                                    │
│  Configure:                                                                  │
│    Alert Timing:    [☑] Immediate    [☐] Delayed (___ min)                  │
│    Severity:        [☐] Inherit    [☑] Force Critical                       │
│    Notify Groups:   [☑] Security Team    [☐] Management                     │
│    Media:           [☑] Image    [☑] Video                                  │
│    Quiet Hours:     [☐] Respect global    [☑] Always alert                  │
│    Escalation:      [☑] Enable escalation (5/10/20 min)                     │
│  Click "Save"                                                                │
│                                                                              │
│  STEP 5: TEST                                                                │
│  ────────                                                                    │
│  Click "Test Alert" button                                                   │
│  System sends test alert through configured channels                         │
│  Verify: Telegram message received ✓                                         │
│  Verify: WhatsApp message received ✓                                         │
│  Watchlist is now active and monitoring                                      │
└──────────────────────────────────────────────────────────────────────────────┘
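
The alert settings chosen in Step 4 can be modelled as a small configuration object. The sketch below is hypothetical: the class and function names are illustrative, and it assumes the 5/10/20 minute escalation offsets are measured from the original alert time while the alert remains unacknowledged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WatchlistAlertConfig:
    force_critical: bool = True               # "Force Critical" checked
    escalation_minutes: tuple = (5, 10, 20)   # "Enable escalation (5/10/20 min)"
    always_alert: bool = True                 # "Always alert" ignores quiet hours


def effective_severity(config, detected_severity):
    """Force Critical overrides the detected severity; Inherit keeps it."""
    return "critical" if config.force_critical else detected_severity


def escalation_times(config, alert_time):
    """Times at which an unacknowledged alert would be escalated."""
    return [alert_time + timedelta(minutes=m) for m in config.escalation_minutes]


def should_notify(config, in_quiet_hours):
    """Alerts are suppressed only when quiet hours apply and are respected."""
    return config.always_alert or not in_quiet_hours


config = WatchlistAlertConfig()
schedule = escalation_times(config, datetime(2025, 1, 1, 22, 0))
```

With the settings shown in the flow, a detection at 22:00 would escalate at 22:05, 22:10 and 22:20 if nobody acknowledges it.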

13.5 Component Specifications

13.5.1 Camera Feed Component

State     Visual                                            Interaction
Loading   Centered spinner overlay, camera name visible     None (wait for stream)
Playing   Live stream active, recording dot if applicable   Click to focus, hover for controls
Paused    Stream paused, large play button overlay          Click to resume
Error     Error icon + "Connection failed" + Retry button   Click Retry to reconnect
Offline   Gray placeholder with camera icon + "Offline"     Shows last online timestamp
Disabled  Grayed out with "Disabled" badge                  No stream attempted

Prop          Type                                                Required  Default  Description
cameraId      string                                              Yes       -        Unique camera identifier (e.g., "cam-01")
name          string                                              Yes       -        Display name shown as overlay
streamUrl     string                                              Yes       -        HLS or WebRTC stream URL
status        'online' | 'offline' | 'reconnecting' | 'disabled'  Yes       -        Current camera status
layout        'grid' | 'fullscreen'                               No        'grid'   Current layout mode
quality       'auto' | 'hd' | 'sd'                                No        'auto'   Stream quality preference
showControls  boolean                                             No        true     Show overlay controls
onFocus       (id: string) => void                                No        -        Callback when camera is focused
onSnapshot    (id: string) => void                                No        -        Callback when snapshot is taken

13.5.2 Alert Card Component

Prop Type Required Description
id string Yes Alert unique identifier
severity 'critical' | 'high' | 'medium' | 'low' | 'info' Yes Alert severity level
type string Yes Alert type classification
cameraName string Yes Source camera display name
timestamp Date Yes When the alert occurred
thumbnail string No URL to thumbnail image
personName string No Identified person name (if known)
status 'pending' | 'acknowledged' | 'resolved' | 'ignored' Yes Current alert status
onAcknowledge () => void No Acknowledge callback
onResolve () => void No Resolve callback
onIgnore () => void No Ignore callback
onViewDetails () => void No View details callback

13.5.3 Stat Card Component

Prop Type Required Description
title string Yes Card label (e.g., "Cameras Online")
value string | number Yes Main displayed value (e.g., "8/8")
icon LucideIcon Yes Icon component from Lucide React
color 'green' | 'blue' | 'orange' | 'red' | 'purple' No Color theme (default: blue)
trend number No Percentage change from previous period
subtitle string No Secondary text below value
href string No Navigation link (e.g., to detail page)

13.6 Toast Notification System

Type Icon Color Duration Use Case
Success Check circle Green (#10B981) 3 seconds Action completed successfully
Error X circle Red (#EF4444) 5 seconds (or persistent) Action failed; may require user attention
Warning Alert triangle Orange (#F59E0B) 4 seconds Non-critical issue; may need attention
Info Info circle Blue (#3B82F6) 3 seconds Informational message
Alert Bell Red (#EF4444) Persistent (until dismissed) Critical alert notification

Toast behavior:

  • Appears in top-right corner
  • Stacks up to 5 toasts simultaneously
  • Older toasts pushed down when new ones arrive
  • Hovering pauses auto-dismiss timer
  • Click to dismiss immediately
  • Swipe right to dismiss (mobile)

13.7 Modal System

Size Width Use Case
Small 400px Confirmations, simple forms
Medium (default) 560px Standard forms, detail views
Large 800px Complex forms, image viewers
Fullscreen 100% Camera fullscreen, large data tables

Modal behavior:

  • Backdrop click to close (configurable)
  • Escape key to close (configurable)
  • Focus trap — Tab cycles within modal
  • Return focus to trigger element on close
  • Body scroll locked when modal open
  • Enter key submits primary action (forms)

13.8 Responsive Behavior

Breakpoint Width Layout Changes
xs < 576px Single column; stacked layouts; bottom tab bar; hamburger menu; camera grid 1x1 or 2x1
sm 576-767px Two column layouts; sidebar as overlay drawer; camera grid 2x2
md 768-991px Collapsed sidebar (72px); filters as drawer; camera grid 2x3; 3-column person gallery
lg 992-1199px Sidebar expanded (260px); full desktop layout; 4-column person gallery
xl 1200-1399px Full desktop layout; 5-column person gallery; 2x4 camera grid
xxl 1400px+ Max content width 1400px centered; all features visible
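
The breakpoint ranges in the table can be expressed as a lookup on viewport width. The sketch below is only a restatement of the table in code (the real front end would express these in its CSS configuration); the function name breakpoint_for is hypothetical.

```python
# (name, minimum width in px), taken from the table above
BREAKPOINTS = [
    ("xs", 0),
    ("sm", 576),
    ("md", 768),
    ("lg", 992),
    ("xl", 1200),
    ("xxl", 1400),
]


def breakpoint_for(width_px):
    """Return the name of the widest breakpoint whose minimum fits the width."""
    name = BREAKPOINTS[0][0]
    for candidate, min_width in BREAKPOINTS:
        if width_px >= min_width:
            name = candidate
    return name


print(breakpoint_for(1024))  # prints "lg"
```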

13.9 Keyboard Shortcuts

Shortcut Context Action
? Global Show keyboard shortcuts reference modal
/ Global Focus global search bar
Escape Global Close modal / exit fullscreen / deselect
F Live View Toggle fullscreen on focused camera
S Live View Take snapshot of focused camera
1-8 Live View Focus camera 1-8
Space Live View Pause/play focused camera stream
A Alert Center Acknowledge selected alert
R Alert Center Resolve selected alert
N Detections / Unknowns Name unknown person
→ (Right arrow) Unknown Review Next cluster
← (Left arrow) Unknown Review Previous cluster
Ctrl+K Global Command palette (quick navigation)
Ctrl+Shift+A Global Acknowledge most recent alert
M Live View Toggle mute on camera audio
+ / - Timeline Zoom in / zoom out

13.10 Animation Guidelines

Animation Duration Easing Description
Page transition 200ms ease-out Fade in on route change
Modal open 250ms cubic-bezier(0.16, 1, 0.3, 1) Scale up + fade in
Modal close 150ms ease-in Scale down + fade out
Sidebar toggle 250ms ease-in-out Width transition 260px ↔ 72px
Toast slide-in 300ms ease-out Slide from right + fade in
Toast fade-out 200ms ease-in Fade out before removal
Card hover lift 150ms ease Subtle translateY(-2px) + shadow increase
Segmented slider 200ms ease Sliding background between options
Pulse (recording) 2s ease-in-out infinite Red dot opacity oscillation
Stats update 500ms ease Number count-up animation
Skeleton shimmer 1.5s linear infinite Shimmer gradient sweep
Alert flash 1s ease-out Border flash on camera with new alert
Camera focus 300ms ease-out Expand to fullscreen
Dropdown open 150ms ease-out Fade + slight translateY
Tooltip 100ms ease Fade in on hover

13.11 Technology Stack

Layer             Technology                    Version  Purpose
Framework         React                         18.x     UI library
Meta-framework    Next.js                       14.x     SSR, routing, API routes
Language          TypeScript                    5.x      Type safety
Styling           Tailwind CSS                  3.x      Utility-first CSS
Theme             CSS Custom Properties         -        Dark mode via dark class
UI Components     shadcn/ui                     latest   Base component library
Icons             Lucide React                  latest   Consistent icon set
State Management  Zustand                       4.x      Lightweight global state
Data Fetching     TanStack Query (React Query)  5.x      Server state management
Real-time         Socket.IO Client              4.x      WebSocket for live updates
Video             hls.js                        latest   HLS stream playback
Video (WebRTC)    native                        -        WebRTC stream fallback
Charts            Recharts                      2.x      Data visualization
Date/Time         date-fns                      2.x      Date formatting and manipulation
Forms             React Hook Form               7.x      Form state management
Validation        Zod                           3.x      Schema validation
Zone Drawing      SVG + native events           -        Polygon drawing on camera feed
Testing           Vitest                        1.x      Unit testing
E2E Testing       Playwright                    1.x      Browser automation testing
Build             Next.js built-in              -        Production optimization

Section 14: Deployment Plan

14.1 Deployment Architecture Overview

The deployment architecture spans two physical environments: AWS cloud for centralized services and an Intel NUC edge gateway at the surveillance site. The two environments are connected via an encrypted WireGuard VPN tunnel. All services are containerized: cloud workloads run on Kubernetes (EKS) with GitOps-based continuous delivery through ArgoCD, and edge services run under Docker Compose.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         DEPLOYMENT ARCHITECTURE                               │
│                                                                              │
│    ┌────────────────────────────────────────────────────────────────────┐   │
│    │                        AWS CLOUD                                   │   │
│    │                                                                    │   │
│    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │   │
│    │  │ Route 53 │──▶  ALB     │──▶  EKS     │──▶  App Pods       │ │   │
│    │  │   DNS    │  │ TLS 1.3  │  │ Cluster  │  │  (FastAPI/Next)  │ │   │
│    │  └──────────┘  └──────────┘  └────┬─────┘  └──────────────────┘ │   │
│    │                                    │                              │   │
│    │  ┌──────────┐  ┌──────────┐  ┌────┴─────┐  ┌──────────────────┐ │   │
│    │  │   S3     │  │   RDS    │  │ ElastiCache│  │  MSK Kafka       │ │   │
│    │  │  Media   │  │ Postgres │  │  Redis    │  │  (Event Bus)     │ │   │
│    │  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘ │   │
│    │                                                                    │   │
│    │  ┌──────────────────────────────────────────────────────────────┐ │   │
│    │  │  WireGuard VPN Gateway (EC2)  ←────→  Edge Gateway          │ │   │
│    │  │  UDP 51820                    Tunnel   (Intel NUC, Site)     │ │   │
│    │  └──────────────────────────────────────────────────────────────┘ │   │
│    └────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│    ┌────────────────────────────────────────────────────────────────────┐   │
│    │                        EDGE SITE                                   │   │
│    │                                                                    │   │
│    │  ┌─────────────────────────────────────────────────────────────┐  │   │
│    │  │              Intel NUC (Ubuntu Server 22.04)                │  │   │
│    │  │                                                             │  │   │
│    │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │   │
│    │  │  │ Video Capture│  │ AI Inference │  │   MinIO      │    │  │   │
│    │  │  │  (RTSP/FFmpeg)│  │ (YOLO/Face) │  │  (Storage)   │    │  │   │
│    │  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │   │
│    │  │                                                             │  │   │
│    │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │   │
│    │  │  │    Redis     │  │  WireGuard   │  │ Node Exporter│    │  │   │
│    │  │  │   (Cache)    │  │   (VPN)      │  │  (Metrics)   │    │  │   │
│    │  │  └──────────────┘  └──────────────┘  └──────────────┘    │  │   │
│    │  │                                                             │  │   │
│    │  └─────────────────────────────────────────────────────────────┘  │   │
│    │                              │                                     │   │
│    │                    ┌─────────┴──────────┐                         │   │
│    │                    │  Camera LAN         │                         │   │
│    │                    │  CP PLUS DVR        │                         │   │
│    │                    │  192.168.29.200:554 │                         │   │
│    │                    │  (8 channels)       │                         │   │
│    │                    └─────────────────────┘                         │   │
│    └────────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────────┘

14.2 Cloud Deployment (AWS EKS)

14.2.1 EKS Cluster Configuration

Parameter Value Notes
Kubernetes version 1.28+ Latest stable at deployment
Control plane Managed by AWS Multi-AZ availability
Node group type Managed (EC2) t3.large for general, g4dn.xlarge for GPU
CNI Amazon VPC CNI Native VPC networking for pods
Ingress controller NGINX Ingress + cert-manager TLS termination at ALB
GitOps ArgoCD Declarative continuous deployment
Pod identity IRSA (IAM Roles for Service Accounts) No long-term AWS credentials

14.2.2 Cloud Service Resources

Service AWS Service Instance/Tier HA Mode Monthly Est.
Orchestration Amazon EKS Managed control plane Multi-AZ $73
Application nodes EC2 (t3.large) 3 nodes (on-demand) Multi-AZ spread $200
GPU nodes EC2 (g4dn.xlarge) 1 node (spot preferred) Single + auto-recovery $350
Database RDS PostgreSQL 15 db.r6g.xlarge Multi-AZ Multi-AZ with failover $520
Cache ElastiCache Redis cache.r6g.large (2 shards) Cluster mode $260
Message bus Amazon MSK kafka.m5.large (3 brokers) Multi-AZ $350
Object storage S3 Standard + IA + Glacier Cross-region replication $200
Load balancer ALB Application Load Balancer Multi-AZ $25
DNS Route 53 Hosted zone + health checks Global $15
VPN gateway EC2 (t3.micro) WireGuard endpoint Single (monitor for HA) $15
Secrets AWS Secrets Manager Vault integration Multi-AZ $10
Monitoring CloudWatch Logs + metrics + alarms Multi-AZ $50
Total ~$2,068/month
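
As a sanity check, the total row can be recomputed from the line items. The figures below are the monthly estimates from the table; the dictionary keys are shortened labels chosen for this sketch.

```python
# Monthly estimates in USD, copied from the table above.
monthly_costs_usd = {
    "Amazon EKS": 73,
    "EC2 t3.large (3 nodes)": 200,
    "EC2 g4dn.xlarge": 350,
    "RDS PostgreSQL": 520,
    "ElastiCache Redis": 260,
    "Amazon MSK": 350,
    "S3": 200,
    "ALB": 25,
    "Route 53": 15,
    "VPN gateway": 15,
    "Secrets Manager": 10,
    "CloudWatch": 50,
}

total = sum(monthly_costs_usd.values())
annual = total * 12

print(f"Total: ${total:,}/month")   # prints "Total: $2,068/month"
print(f"Annual: ${annual:,}/year")  # prints "Annual: $24,816/year"
```

These are list-price style estimates; actual spend depends on data transfer, spot pricing for the GPU node, and storage growth.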

14.3 Edge Deployment (Intel NUC)

14.3.1 Edge Hardware Specification

Component Specification Notes
Device Intel NUC 13 Pro (or equivalent) Fanless preferred for reliability
CPU Intel Core i7-1360P (12 cores, 16 threads) Sufficient for 8 streams + AI inference
RAM 32 GB DDR4-3200 (2x16 GB) Dual channel for memory bandwidth
Storage (OS) 500 GB NVMe SSD (Samsung 980 Pro or equivalent) Fast boot and application loading
Storage (Data) 2 TB NVMe SSD (Samsung 990 Pro or equivalent) 7-day local recording buffer
Network Intel i226-V 2.5 GbE (dual port) Dual NIC for WAN + LAN separation
WiFi Disabled in BIOS Security — no wireless
Bluetooth Disabled in BIOS Security — no wireless
TPM TPM 2.0 enabled For LUKS auto-unlock
OS Ubuntu Server 22.04 LTS (minimal install) No desktop environment
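
The 2 TB data disk and the 7-day recording buffer imply an upper bound on per-channel bitrate. The arithmetic below is a rough sizing sketch that assumes the whole disk is available for recordings (in practice MinIO objects and extracted frames share it), continuous recording on all 8 channels, and decimal terabytes.

```python
DATA_DISK_BYTES = 2 * 10**12   # 2 TB NVMe (decimal terabytes)
RETENTION_DAYS = 7             # local recording buffer
CHANNELS = 8

seconds = RETENTION_DAYS * 24 * 60 * 60
total_bitrate_bps = DATA_DISK_BYTES * 8 / seconds
per_channel_mbps = total_bitrate_bps / CHANNELS / 1e6

print(f"Budget: {total_bitrate_bps / 1e6:.2f} Mbit/s total, "
      f"{per_channel_mbps:.2f} Mbit/s per channel")
# prints "Budget: 26.46 Mbit/s total, 3.31 Mbit/s per channel"
```

If the DVR main streams run above this budget, either the retention window shrinks or the sub-stream has to be recorded instead.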

14.3.2 Edge Docker Compose Configuration

version: "3.8"

services:
  # RTSP stream capture and frame extraction
  video-capture:
    image: sentinel/surveillance-video-capture:v2.3.1
    restart: unless-stopped
    network_mode: host
    environment:
      - DVR_IP=192.168.29.200
      - DVR_PORT=554
      - NUM_CHANNELS=8
      - FRAME_EXTRACT_FPS=1
      - RECORDING_SEGMENT_SEC=10
      - REDIS_HOST=localhost
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=localhost:9000
    volumes:
      - /data/frames:/app/frames
      - /data/recordings:/app/recordings
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
      - minio
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"

  # AI inference service (lightweight edge models)
  ai-inference:
    image: sentinel/surveillance-ai-inference:edge-v2.3.1
    restart: unless-stopped
    runtime: nvidia  # Requires the NVIDIA Container Toolkit; remove this line on CPU-only hosts (Docker does not fall back automatically)
    environment:
      - MODEL_PATH=/models
      # This service is not on the host network, so Redis and MinIO are reached by Compose service name
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=minio:9000
      - INFERENCE_BATCH_SIZE=8
      - CONFIDENCE_THRESHOLD=0.7
      - NMS_THRESHOLD=0.45
    volumes:
      - ./models:/models:ro
      - /data/frames:/app/frames:ro
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
    deploy:
      resources:
        limits:
          cpus: '6.0'
          memory: 8G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"

  # Local object storage (S3-compatible)
  minio:
    image: minio/minio:RELEASE.2024-latest  # placeholder tag; pin to an actual dated RELEASE.* tag before deploying
    restart: unless-stopped
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - /data/minio:/data
    environment:
      - MINIO_ROOT_USER_FILE=/run/secrets/minio_user
      - MINIO_ROOT_PASSWORD_FILE=/run/secrets/minio_password
    secrets:
      - minio_user
      - minio_password
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G

  # Local cache and Pub/Sub
  redis:
    image: redis:7-alpine
    restart: unless-stopped
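    # The password placeholder was lost in this copy; REDIS_PASSWORD is an assumed
    # variable name, supplied via the shell or an .env file next to this compose file.
    command: >
      redis-server
      --requirepass "${REDIS_PASSWORD}"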
      --appendonly yes
      --maxmemory 512mb
      --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    ports:
      - "127.0.0.1:6379:6379"
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M

  # WireGuard VPN client
  wireguard:
    image: linuxserver/wireguard:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - ./wireguard-config:/config
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 64M

  # Metrics exporter for Prometheus
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

volumes:
  redis_data:
    driver: local

secrets:
  minio_user:
    file: ./secrets/minio_user.txt
  minio_password:
    file: ./secrets/minio_password.txt

14.4 Configuration and Environment Variables

14.4.1 Environment Structure

Environment URL Pattern Data Purpose
Development *.dev.internal Synthetic test data Feature development, local testing
Staging *.staging.example.com Anonymized production-like data Integration testing, UAT
Production *.example.com Real operational data Live surveillance operations

14.4.2 Required Environment Variables

# ─── APPLICATION ───
APP_ENV=production                    # dev | staging | production
APP_NAME="Sentinel AI Surveillance"
APP_VERSION=2.3.1
APP_DEBUG=false
APP_SECRET_KEY=<random-256-bit-key>   # Used for session signing
LOG_LEVEL=INFO                         # DEBUG | INFO | WARNING | ERROR | CRITICAL

# ─── SERVER ───
API_HOST=0.0.0.0
API_PORT=8080
WORKERS=4                              # Uvicorn worker processes
TIMEZONE=Asia/Kolkata

# ─── DATABASE ───
DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/surveillance
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
DB_POOL_TIMEOUT=30
DB_ECHO=false                          # Set true for SQL logging (dev only)

# ─── REDIS ───
REDIS_URL=redis://:password@redis-endpoint:6379/0
REDIS_POOL_SIZE=50
REDIS_SOCKET_TIMEOUT=5

# ─── OBJECT STORAGE (S3 or MinIO) ───
STORAGE_TYPE=s3                        # s3 | minio
STORAGE_ENDPOINT=s3.amazonaws.com
STORAGE_BUCKET=sentinel-surveillance-media
STORAGE_REGION=ap-south-1
STORAGE_ACCESS_KEY=<access-key>
STORAGE_SECRET_KEY=<secret-key>
STORAGE_SECURE=true
STORAGE_URL_EXPIRY=300                 # Signed URL expiry in seconds

# ─── DVR / CAMERA CONNECTION ───
DVR_IP=192.168.29.200
DVR_PORT=554
DVR_USERNAME=admin
DVR_PASSWORD=<dvr-password>
DVR_CHANNELS=8
DVR_STREAM_QUALITY=0                   # 0=main (high), 1=sub (low)
DVR_RTSP_TEMPLATE="rtsp://{user}:{pass}@{ip}:{port}/user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"

# ─── AI MODELS ───
MODEL_PATH=/models
HUMAN_DETECTION_MODEL=yolo11m.onnx
FACE_DETECTION_MODEL=scrfd_10g_bnkps.onnx
FACE_RECOGNITION_MODEL=arcface_r100.onnx
CONFIDENCE_THRESHOLD=0.7
NMS_THRESHOLD=0.45
FACE_MATCH_THRESHOLD=0.70
UNKNOWN_CLUSTER_EPS=0.35
UNKNOWN_CLUSTER_MIN_SAMPLES=3

# ─── TELEGRAM NOTIFICATIONS ───
TELEGRAM_ENABLED=true
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_WEBHOOK_URL=https://api.example.com/webhooks/telegram
TELEGRAM_WEBHOOK_SECRET=<webhook-secret>
TELEGRAM_ADMIN_CHAT_ID=<admin-chat-id>

# ─── WHATSAPP NOTIFICATIONS ───
WHATSAPP_ENABLED=true
WHATSAPP_API_VERSION=v18.0
WHATSAPP_ACCESS_TOKEN=<access-token>
WHATSAPP_PHONE_NUMBER_ID=<phone-number-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN=<verify-token>
WHATSAPP_BUSINESS_ACCOUNT_ID=<business-account-id>

# ─── VPN ───
VPN_ENABLED=true
VPN_TYPE=wireguard
VPN_ENDPOINT=wg.example.com:51820
VPN_PUBLIC_KEY=<server-public-key>
VPN_PRIVATE_KEY=<client-private-key>
VPN_PRESHARED_KEY=<preshared-key>
VPN_ALLOWED_IPS=10.100.0.0/16
VPN_KEEPALIVE=25

# ─── AUTHENTICATION ───
JWT_SECRET_KEY=<ecdsa-private-key-pem>
JWT_PUBLIC_KEY=<ecdsa-public-key-pem>
JWT_ALGORITHM=ES256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=15
JWT_REFRESH_TOKEN_EXPIRE_DAYS=7
MFA_REQUIRED_ROLES=super_admin,admin
MFA_ISSUER="Sentinel AI Surveillance"

# ─── MONITORING ───
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
GRAFANA_URL=https://grafana.example.com
SENTRY_DSN=<sentry-dsn>
HEALTH_CHECK_INTERVAL=30

# ─── RETENTION ───
RECORDING_RETENTION_DAYS=90
DETECTION_SNAPSHOT_RETENTION_DAYS=90
EVENT_LOG_RETENTION_DAYS=365
AUDIT_LOG_RETENTION_DAYS=365
TRAINING_DATA_RETENTION_DAYS=365
AUTO_CLEANUP_ENABLED=true
AUTO_CLEANUP_HOUR=3                    # 3:00 AM daily

# ─── SECURITY ───
CORS_ALLOWED_ORIGINS=https://app.example.com,https://staging.example.com
CSP_REPORT_ONLY=false
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
SESSION_MAX_AGE_HOURS=8
SESSION_IDLE_TIMEOUT_MINUTES=30
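
The DVR_RTSP_TEMPLATE value above uses named placeholders. The following is a minimal sketch of how a capture service might expand it into one URL per channel. The function name build_rtsp_urls, the 1-based channel numbering, and the percent-encoding of credentials are assumptions for illustration; the blueprint does not specify them.

```python
from urllib.parse import quote

# Template copied from DVR_RTSP_TEMPLATE in the listing above.
DVR_RTSP_TEMPLATE = (
    "rtsp://{user}:{pass}@{ip}:{port}/"
    "user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"
)


def build_rtsp_urls(template, user, password, ip, port, channels, quality=0):
    """Return one RTSP URL per channel (assumes channels are numbered from 1)."""
    # Percent-encode credentials so characters such as '@' or '&' cannot
    # break the URL structure. Whether the DVR accepts encoded credentials
    # in the path portion is an assumption to verify on the device.
    safe_user = quote(user, safe="")
    safe_pass = quote(password, safe="")
    urls = []
    for ch in range(1, channels + 1):
        # format_map is used because "pass" is a Python keyword and cannot
        # be written as a keyword argument to str.format().
        urls.append(template.format_map({
            "user": safe_user, "pass": safe_pass, "ip": ip,
            "port": port, "ch": ch, "quality": quality,
        }))
    return urls


urls = build_rtsp_urls(DVR_RTSP_TEMPLATE, "admin", "p@ss", "192.168.29.200", 554, 8)
print(urls[0])
```

Keeping the expansion in one function means the main/sub stream choice (DVR_STREAM_QUALITY) can be switched per call without editing the template.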

14.5 Rollout Stages

14.5.1 Stage 1: Foundation (Weeks 1-4)

Objective: Infrastructure, VPN connectivity, and core data layer operational.

Week Tasks Deliverables Success Criteria
1 AWS account setup, VPC creation (3 AZs), EKS cluster deployment, IAM roles Cloud network ready VPC flow logs active; EKS nodes Ready
1 RDS PostgreSQL Multi-AZ, ElastiCache Redis cluster Data layer ready DB connections successful; replication lag < 1s
2 S3 buckets (media, backups, logs), lifecycle policies, CORS Storage ready Upload/download test successful
2 WireGuard VPN gateway (EC2), key generation, firewall rules VPN endpoint ready Tunnel handshake successful
3 Edge gateway: OS install, hardening, Docker, WireGuard client Edge device ready Edge connects to cloud over VPN
3 Edge services: MinIO, Redis, video capture container Edge services running RTSP streams reachable from edge
4 Database schema migration (29 tables), seed data (admin user, 8 cameras) Database ready Schema matches design; seed data present
4 Monitoring: Prometheus, Grafana, CloudWatch dashboards Monitoring active Dashboards accessible; metrics flowing
4 End-to-end connectivity test Full pipeline verified Video from DVR → Edge → Cloud (VPN) → S3

Milestone M1 — Infrastructure Ready (End of Week 4):

  • All cloud services deployed and healthy
  • VPN tunnel established and stable (< 100ms latency)
  • Edge gateway online, all Docker services running
  • Database schema deployed with migrations and seed data
  • All 8 camera streams reachable from edge
  • Basic monitoring and alerting in place

14.5.2 Stage 2: Core AI Pipeline (Weeks 5-8)

Objective: Video ingestion, AI detection, face recognition, and basic API operational.

Week Tasks Deliverables Success Criteria
5 Video capture service: RTSP ingestion, frame extraction, segment recording Stream ingestion working All 8 streams connected; FPS > 5 per stream
5 Kafka topic setup, stream ingestion producer Event streaming ready Frames published to Kafka
6 AI Inference Service: YOLO (human detection), SCRFD (face detection) Detection models running mAP > 0.90 for human detection
6 Detection event storage in PostgreSQL Detection database working Events queryable via API
7 ArcFace (face recognition) model deployment, embedding generation, pgvector Face recognition working Rank-1 accuracy > 95% on test set
7 Person matching logic: known person lookup, unknown person handling Person matching working Correct identification in < 100ms
8 FastAPI core: health endpoints, camera endpoints, detection endpoints API core functional All endpoints return correct data
8 Basic authentication: login, JWT token issuance, password hashing Auth working Login → token → authenticated requests

Milestone M2 — AI Pipeline Operational (End of Week 8):

  • All 8 camera streams ingesting at target FPS
  • Human detection, face detection, and face recognition operational
  • Detection events stored and queryable
  • Person matching (known/unknown) working
  • Basic REST API serving authenticated requests
  • End-to-end: Camera → Detection → Database → API

14.5.3 Stage 3: Application Layer (Weeks 9-12)

Objective: Web dashboard, alerting, notifications, and person management operational.

Week Tasks Deliverables Success Criteria
9 Next.js project setup, design system, Tailwind config, dark theme Frontend foundation Login page renders correctly
9 Authentication flow: login form, MFA input, token management, logout Auth UI working Full login → dashboard flow
10 Dashboard page: stat cards, alert chart, camera grid, activity feed Dashboard live All widgets populated with real data
10 Live camera view: HLS player, grid layout, fullscreen, camera controls Live view working All 8 streams visible, playable
10 Alert engine: rule evaluation, severity assignment, routing Alert generation working Alerts created within 5s of detection
11 Telegram integration: bot setup, message templates, inline keyboards Telegram alerts working Test alert received in Telegram
11 WhatsApp integration: template messages, session messages WhatsApp alerts working Test template message received
11 Person management: gallery, profile, CRUD, face matching display Person management working Person created, detected, viewed
12 Unknown review queue: cluster display, naming, merging, ignore Review queue working Unknown person processed through queue
12 Watchlists: CRUD, member management, alert routing Watchlists working Watchlist match triggers correct alert
12 WebSocket: real-time alert feed, dashboard updates Real-time working Alerts appear without page refresh

Milestone M3 — Application Live (End of Week 12):

  • Web dashboard accessible with live camera feeds
  • Alerts generated and delivered via Telegram and WhatsApp
  • Person management (add, view, match, review unknowns) working
  • Watchlist alerts functional with correct routing
  • Real-time updates via WebSocket
  • All RBAC permissions enforced in UI

14.5.4 Stage 4: Intelligence (Weeks 13-16)

Objective: Night mode, training pipeline, self-learning, and advanced features.

Week Tasks Deliverables Success Criteria
13 Night mode: low-light model training, deployment, auto-scheduling Night mode working Detection mAP > 0.75 in < 5 lux conditions
13 AI Vibe Settings page: all 7 controls, auto-save, advanced mode Settings page working All controls functional, changes effective immediately
14 Training pipeline: data collection, model training job, evaluation Training pipeline working Model accuracy improves with new training data
14 Model versioning: A/B testing, shadow mode, promotion workflow Model management working Blue/green model deployment
15 Self-learning service: automatic unknown clustering, suggestions Self-learning working Suggestions generated for unknown clusters
15 Privacy mode: face blurring, privacy zones, per-camera settings Privacy mode working Faces blurred according to settings
15 Suspicious activity detection: pattern rules, anomaly scoring Advanced alerts working Anomaly alerts generated for unusual behavior
16 Search service: face similarity search, text search, filters Search working Results returned in < 500ms
16 System health dashboard: service cards, metrics, logs viewer Health dashboard working All systems visible with status

Milestone M4 — Intelligence Features Live (End of Week 16):

  • Night mode detection operational
  • Training pipeline runs and improves models
  • Self-learning suggestions appear in review queue
  • Privacy modes configurable and effective
  • Suspicious activity alerts functional
  • Search returns results in acceptable time
  • All AI Vibe Settings controls operational

14.5.5 Stage 5: Hardening (Weeks 17-20)

Objective: Security hardening, testing framework, operations readiness, production go-live.

Week Tasks Deliverables Success Criteria
17 Security penetration test (external vendor) Pen test report All critical/high findings addressed
17 SAST/DAST scans, dependency vulnerability scan Scan reports Zero critical vulnerabilities
17 Self-test framework: 21 test suites, scheduling, reporting Testing framework deployed All test suites execute successfully
18 Backup configuration: pgBackRest, S3 sync, restore procedures Backup system ready Restore test successful
18 DR environment setup, failover procedures, quarterly drill schedule DR ready DR failover test: RTO < 1 hour
18 Incident response runbooks: 5 documented procedures Runbooks complete All scenarios documented
19 Load testing: 8/16/32/64 camera simulation Load test report System handles 64 cameras within SLA
19 Performance tuning: database queries, API response times, cache optimization Tuning complete p95 API response < 200ms
19 Operations team training: system overview, runbooks, escalation procedures Team trained Training sign-off complete
19 98-item go-live checklist review Checklist complete All items pass
20 Final readiness review, security sign-off, management approval Go approval All stakeholders sign off
20 Production DNS cutover, monitoring, 72-hour stability period Production live 72-hour stability confirmed

Milestone M5 — Production Go-Live (End of Week 20):

  • Security audit complete with all findings addressed
  • Self-test framework passing (score >= 85)
  • DR tested and verified (RTO < 1 hour, RPO < 15 minutes)
  • Operations team trained and runbooks reviewed
  • Load test passed at 64-camera target
  • 98-item go-live checklist: all items complete
  • System stable in production for 72+ hours

14.6 Kubernetes Manifests Overview

Resource Type Name Purpose Namespace
Deployment api FastAPI application server (3 replicas) sentinel
Deployment ai-inference AI model serving (GPU node) sentinel
Deployment video-capture RTSP stream ingestion (edge) sentinel
Deployment alert-engine Alert generation and routing sentinel
Deployment notification-service Telegram/WhatsApp delivery sentinel
Deployment frontend Next.js web application sentinel
Deployment websocket WebSocket real-time server sentinel
StatefulSet redis Session cache and Pub/Sub sentinel-data
Service api-service Internal API access (ClusterIP) sentinel
Service ai-service AI inference access (ClusterIP) sentinel
Service frontend-service Web app access (ClusterIP) sentinel
Ingress sentinel-ingress External HTTPS routing sentinel
ConfigMap app-config Application configuration sentinel
ConfigMap nginx-config Ingress/Nginx configuration sentinel
Secret app-secrets Encrypted secrets (Vault agent injector) sentinel
Secret tls-cert TLS certificate (cert-manager) sentinel
HPA api-hpa Auto-scale API: 3-10 replicas sentinel
HPA ai-hpa Auto-scale AI: 1-4 replicas sentinel
NetworkPolicy default-deny Block all unauthorized traffic sentinel
NetworkPolicy allow-api API ingress rules sentinel
NetworkPolicy allow-ai AI service communication rules sentinel
PodDisruptionBudget api-pdb Ensure 2 API pods minimum sentinel
ServiceMonitor api-metrics Prometheus scraping config sentinel-monitoring
PrometheusRule alert-rules Alerting rules for platform sentinel-monitoring
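
The api-hpa and ai-hpa rows above give replica bounds only. The sketch below shows how the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), interacts with those bounds. The 70% utilization target is an assumption, since this overview does not state the metric or its target, and the autoscaler's tolerance band is ignored for simplicity.

```python
import math


def desired_replicas(current, current_metric, target_metric, min_replicas, max_replicas):
    """Apply the HPA scaling rule, then clamp to the configured bounds."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# api-hpa bounds (3-10) come from the table above; the 70% target is assumed.
print(desired_replicas(3, 95, 70, 3, 10))   # load above target: scale up
print(desired_replicas(8, 140, 70, 3, 10))  # raw result exceeds the maximum
print(desired_replicas(4, 20, 70, 3, 10))   # raw result falls below the minimum
```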

14.7 VPN Setup Procedure

14.7.1 Cloud VPN Gateway Setup

#!/bin/bash
# cloud-vpn-setup.sh — Run on cloud VPN EC2 instance

# 1. System preparation
sudo apt update && sudo apt install -y wireguard wireguard-tools iptables-persistent

# 2. Generate WireGuard keys (private key readable by root only)
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey

# 3. Create WireGuard configuration
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.1/24
ListenPort = 51820
PrivateKey = <CLOUD_PRIVATE_KEY>
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

# Edge Gateway peer
[Peer]
PublicKey = <EDGE_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
AllowedIPs = 10.200.0.2/32, 192.168.29.0/24
PersistentKeepalive = 25
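EOF
# The config embeds the private key; wg-quick warns if it is world-accessible
sudo chmod 600 /etc/wireguard/wg0.conf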

# 4. Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf

# 5. Start WireGuard
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# 6. Verify
sudo wg show
ping -c 3 10.200.0.2
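
The verification step above can also be automated. The sketch below parses the text printed by `wg show wg0 latest-handshakes` (one line per peer: public key, tab, Unix timestamp, with 0 meaning no handshake yet) and reports peers whose last handshake is missing or old. The function name stale_peers and the 180-second threshold are assumptions, and the sample keys are placeholders.

```python
import time


def stale_peers(latest_handshakes_output, max_age_seconds=180, now=None):
    """Return public keys of peers whose last handshake is missing or too old.

    Expects the text printed by `wg show wg0 latest-handshakes`: one line per
    peer, "<public-key><TAB><unix-timestamp>", with 0 meaning "never".
    """
    now = time.time() if now is None else now
    stale = []
    for line in latest_handshakes_output.strip().splitlines():
        public_key, timestamp = line.split()
        ts = int(timestamp)
        if ts == 0 or now - ts > max_age_seconds:
            stale.append(public_key)
    return stale


# Made-up sample output; the keys are placeholders, not real keys.
sample = "EDGE_KEY_A=\t1700000000\nEDGE_KEY_B=\t0\n"
print(stale_peers(sample, now=1700000060))
```

A monitoring job could feed the real command output into this function and alert when the returned list is not empty.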

14.7.2 Edge VPN Client Setup

#!/bin/bash
# edge-vpn-setup.sh — Run on Intel NUC edge gateway

# 1. Install WireGuard
sudo apt update && sudo apt install -y wireguard wireguard-tools

# 2. Generate keys (private key readable by root only)
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey

# 3. Configure
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.2/32
PrivateKey = <EDGE_PRIVATE_KEY>
DNS = 10.100.0.2

[Peer]
PublicKey = <CLOUD_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
Endpoint = <CLOUD_PUBLIC_IP>:51820
AllowedIPs = 10.100.0.0/16, 10.200.0.0/24
PersistentKeepalive = 25
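EOF
# The config embeds the private key; wg-quick warns if it is world-accessible
sudo chmod 600 /etc/wireguard/wg0.conf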

# 4. Start and enable
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# 5. Verify connectivity
ping -c 3 10.200.0.1        # Cloud VPN gateway
ping -c 3 10.100.0.2        # Cloud DNS/internal service
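
WireGuard only sends traffic through the tunnel for destinations listed in the peer's AllowedIPs, so a failed ping can simply mean the target is not covered. The sketch below checks targets against the edge configuration above using the standard ipaddress module. The helper name covered_by_allowed_ips is hypothetical.

```python
import ipaddress


def covered_by_allowed_ips(target, allowed_ips):
    """Return True if `target` falls inside any network in an AllowedIPs string."""
    address = ipaddress.ip_address(target)
    return any(
        address in ipaddress.ip_network(net.strip())
        for net in allowed_ips.split(",")
    )


# AllowedIPs from the edge [Peer] section above.
EDGE_ALLOWED_IPS = "10.100.0.0/16, 10.200.0.0/24"

for target in ("10.200.0.1", "10.100.0.2", "192.168.29.200"):
    print(target, covered_by_allowed_ips(target, EDGE_ALLOWED_IPS))
```

The DVR address 192.168.29.200 is not covered on the edge side, which is expected: the edge gateway reaches the DVR directly on the local network rather than through the tunnel.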

14.8 Database Initialization

14.8.1 Migration Strategy

Database migrations are managed with Alembic (SQLAlchemy) and executed as Kubernetes init containers before application startup:

initContainers:
  - name: db-migrations
    image: sentinel/surveillance-api:v2.3.1
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false

Migration Rules:

| Rule | Implementation |
|------|----------------|
| Backward compatibility | All migrations must be backward-compatible within a release |
| Destructive changes | 2-phase deployment: add new column in release N, drop old in release N+1 |
| Automatic execution | Migrations run automatically before application startup via init container |
| Health check | Migration status exposed via /health/ready endpoint |
| Rollback | alembic downgrade script available for emergency rollback |
| Version tracking | alembic_version table tracks current schema version |
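
The health-check and version-tracking rules above can be combined into a readiness probe: read the revision stored in alembic_version (Alembic keeps it in the version_num column) and compare it with the revision the running release expects. The sketch below uses an in-memory SQLite database so it runs anywhere; production would query PostgreSQL. The function name schema_is_current and the revision ID are hypothetical.

```python
import sqlite3


def schema_is_current(conn, expected_head):
    """Return True when alembic_version holds exactly the expected revision."""
    try:
        rows = conn.execute("SELECT version_num FROM alembic_version").fetchall()
    except sqlite3.OperationalError:
        # Table missing: migrations have never run against this database.
        return False
    return [row[0] for row in rows] == [expected_head]


# Demo with in-memory SQLite; the revision ID below is made up.
conn = sqlite3.connect(":memory:")
print(schema_is_current(conn, "a1b2c3d4e5f6"))  # no alembic_version table yet

conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('a1b2c3d4e5f6')")
print(schema_is_current(conn, "a1b2c3d4e5f6"))  # revision now matches
```

A /health/ready handler could return a non-200 status whenever this check is False, which keeps pods out of rotation until the init container has finished.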

14.9 SSL Certificate Setup

14.9.1 cert-manager Configuration

# ClusterIssuer for Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}

---
# Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sentinel-tls
  namespace: sentinel
spec:
  secretName: sentinel-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.example.com
    - api.example.com
    - ws.example.com
  usages:
    - digital signature
    - key encipherment
  privateKey:
    algorithm: ECDSA
    size: 256

Section 15: Testing Plan

15.1 Testing Strategy Overview

The testing strategy encompasses five levels of testing, from isolated unit tests to full system end-to-end validation. The goal is comprehensive coverage of all functional and non-functional requirements with automated execution in CI/CD.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         TESTING PYRAMID                                       │
│                                                                              │
│                        ┌─────────┐                                          │
│                        │  E2E    │  ~20 tests                               │
│                        │ Tests   │  Full system scenarios                   │
│                        ├─────────┤                                          │
│                      ┌─────────────┐                                        │
│                      │ Integration │  ~100 tests                            │
│                      │   Tests     │  Service-to-service                    │
│                      ├─────────────┤                                        │
│                   ┌───────────────────┐                                    │
│                   │    Unit Tests      │  ~300 tests                        │
│                   │  (Components, AI)  │  Isolated functions                  │
│                   └───────────────────┘                                    │
└──────────────────────────────────────────────────────────────────────────────┘

15.2 Unit Testing Strategy

| Component | Framework | Coverage Target | Mock Strategy | CI Execution |
|-----------|-----------|-----------------|---------------|--------------|
| API backend (Python) | pytest + pytest-asyncio | 85%+ | pytest-mock, moto (AWS), responses (HTTP) | Every commit |
| Frontend (React/TS) | Vitest + React Testing Library | 80%+ | MSW (API mocking), jsdom | Every commit |
| AI models (Python) | pytest | 70%+ (model logic) | Mock inference engine, fixture data | Every commit |
| Database models | pytest + asyncpg | 80%+ | testcontainers-postgres | Every commit |
| Notification adapters | pytest | 80%+ | responses library for HTTP mocking | Every commit |
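
As an illustration of the "mock inference engine" strategy for AI model logic, the sketch below tests a threshold decision without loading any model. The function classify_face and the engine method best_match are hypothetical names invented for this example; only the 0.70 threshold comes from FACE_MATCH_THRESHOLD in the environment configuration. The test functions follow the pytest naming convention, and the mocking uses the standard library's unittest.mock.

```python
from unittest.mock import Mock

FACE_MATCH_THRESHOLD = 0.70  # value of FACE_MATCH_THRESHOLD in the configuration


def classify_face(engine, image, threshold=FACE_MATCH_THRESHOLD):
    """Hypothetical model-logic function: wraps an engine's best match."""
    person_id, score = engine.best_match(image)
    return person_id if score >= threshold else "unknown"


def test_known_person_above_threshold():
    engine = Mock()
    engine.best_match.return_value = ("person-42", 0.91)
    assert classify_face(engine, b"fake-image") == "person-42"
    engine.best_match.assert_called_once_with(b"fake-image")


def test_low_score_is_unknown():
    engine = Mock()
    engine.best_match.return_value = ("person-42", 0.55)
    assert classify_face(engine, b"fake-image") == "unknown"
```

Because the engine is replaced by a Mock, these tests exercise only the decision logic and stay fast enough to run on every commit.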

15.3 Integration Testing

Integration Pair Scope Framework Strategy
API + Database CRUD operations, transactions, query performance pytest + testcontainers PostgreSQL container per test run
API + Redis Caching, Pub/Sub, session storage pytest + Redis container Redis container per test run
API + S3/MinIO Media upload, download, presigned URLs pytest + LocalStack S3 mock via LocalStack
Alert Engine + Router Rule evaluation, routing decisions pytest Mock channel adapters
Telegram Adapter Message formatting, API calls, error handling pytest + responses HTTP request/response mocking
WhatsApp Adapter Template rendering, API calls, error handling pytest + responses HTTP request/response mocking
Auth + Database User CRUD, password hashing, session management pytest + testcontainers Full auth flow testing

15.4 System Testing (End-to-End)

# Scenario Steps Expected Result
1 Full detection pipeline Trigger motion → verify detection stored → verify alert created → verify notification sent All components process correctly within SLA
2 Person recognition flow Known person walks by → verify face detected → verify identity matched → verify no false alert Correct person identified with > 95% confidence
3 Unknown person flow Unknown person detected → verify "Unknown" classification → verify review queue updated Unknown queued for operator review within 5 seconds
4 Watchlist alert (blacklist) Blacklist person detected → verify immediate critical alert → verify notification to security team Alert within 5 seconds, correct severity, all channels
5 Night mode detection Low-light detection scenario → verify night model used → verify detection confidence acceptable Detection mAP > 0.75 in < 5 lux conditions
6 Privacy mode Enable privacy mode → verify face blurring in live view → verify no face recognition occurs Faces blurred, no biometric processing
7 Alert escalation Create critical alert → don't acknowledge → verify escalation levels trigger at correct times Level 1 at 5min, Level 2 at 10min, Level 3 at 20min
8 VPN failure recovery Disconnect VPN → verify local operation continues → reconnect VPN → verify sync resumes No data loss; automatic recovery
9 Database failover Trigger RDS failover → verify application continues → verify no data loss < 60 second downtime; zero data loss
10 Complete user flow Login → view dashboard → view live cameras → receive alert → acknowledge → logout All pages load; all actions succeed
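
Scenario 7 above fixes the escalation timings at 5, 10, and 20 minutes. A minimal sketch of the level lookup an end-to-end test could assert against follows; the function name escalation_level and the convention that level 0 means "not yet escalated" are assumptions.

```python
# Escalation thresholds in minutes, taken from scenario 7 in the table above.
ESCALATION_THRESHOLDS = [(5, 1), (10, 2), (20, 3)]


def escalation_level(minutes_unacknowledged):
    """Return the highest escalation level reached (0 = not escalated)."""
    level = 0
    for threshold, candidate in ESCALATION_THRESHOLDS:
        if minutes_unacknowledged >= threshold:
            level = candidate
    return level


for minutes in (4, 5, 12, 20):
    print(minutes, escalation_level(minutes))
```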

15.5 Load Testing Plan

Scenario Camera Count Duration Users Target Metrics
Baseline 8 1 hour 5 concurrent Establish baseline metrics
Scale-up 16 2 hours 10 concurrent Verify 2x capacity; p95 latency < 500ms
Scale-up 32 2 hours 20 concurrent Verify 4x capacity; auto-scaling triggers
Stress test 64 1 hour 50 concurrent Find breaking point; error rate < 1%
Sustained 8 24 hours 5 concurrent Memory leak detection; stability verification
Spike test 8→64→8 30 minutes Ramp up/down Verify auto-scaling response time
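
The targets above are stated as a p95 latency and an error rate. The sketch below shows one way to evaluate a run against them, using the nearest-rank percentile definition. Applying both limits (p95 < 500 ms and error rate < 1%) to a single run is an assumption, since the table attaches them to different scenarios; the function names are hypothetical and pct is expected to be an integer.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def passes_load_gate(latencies_ms, failed_requests, total_requests,
                     p95_limit_ms=500, max_error_rate=0.01):
    """Return True when both the latency and error-rate limits are met."""
    p95 = percentile(latencies_ms, 95)
    error_rate = failed_requests / total_requests
    return p95 < p95_limit_ms and error_rate < max_error_rate


# Synthetic samples: 95 fast requests and 5 slow ones.
samples = [120] * 95 + [800] * 5
print(percentile(samples, 95), passes_load_gate(samples, 3, 1000))
```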

15.6 Failover Testing

Test Case Description Pass Criteria
API pod failure Kill 1 API pod Traffic routed to healthy pods; zero failed requests
Database failover Trigger RDS Multi-AZ failover < 60s downtime; no data loss; connections re-established
Redis failure Restart Redis cluster Session recovery; cache warm within 5 minutes
VPN tunnel failure Disconnect WireGuard Auto-reconnect within 30s; streams resume
Edge gateway restart Reboot edge device Full recovery within 5 minutes; all streams reconnect
AI inference failure Kill inference container Queue buffers frames; recovery < 30s; no frame loss
Complete cloud failure Simulate region outage DR test: RTO < 1 hour; RPO < 15 minutes

15.7 Security Testing

Test Type Tool Scope Frequency Gate
Static Analysis (SAST) Bandit, Semgrep Source code Every commit Block on HIGH/CRITICAL
Dependency Scan Snyk, pip-audit All dependencies Daily Block on HIGH/CRITICAL
Container Image Scan Trivy Docker images Every build Block on HIGH/CRITICAL
Dynamic Analysis (DAST) OWASP ZAP Running application Weekly Review findings
Penetration Test External vendor Full stack Quarterly All findings addressed
TLS Configuration testssl.sh SSL/TLS endpoints Monthly Grade A+ required
API Security OWASP ZAP API scan All REST endpoints Weekly Review findings
Secrets Scan TruffleHog, GitLeaks Git repositories Every commit Block on findings

15.8 AI Pipeline Testing

Test Description Target Metric Test Data
Human detection accuracy Evaluate YOLO on held-out test set mAP > 0.90 1000 labeled frames
Face detection accuracy Evaluate SCRFD on test set Detection rate > 0.85 500 labeled face images
Face recognition accuracy Evaluate ArcFace on test set Rank-1 accuracy > 0.95 200 person gallery
False positive rate Measure incorrect person matches < 2% Simulated impostor set
False negative rate Measure missed person matches < 5% Known person test set
Inference latency Measure end-to-end processing < 200ms per frame (p95) Benchmark suite
Night mode accuracy Test low-light detection mAP > 0.75 200 low-light frames
Batch processing Test throughput at batch size 8 > 40 FPS aggregate Benchmark suite
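
Rank-1 accuracy, the face recognition target above, is the fraction of probe images whose single best gallery match has the correct identity. The sketch below computes it with cosine similarity on toy 2-dimensional vectors; real ArcFace embeddings are much higher-dimensional, and the function names and sample data are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank1_accuracy(probes, gallery):
    """probes: list of (true_id, embedding); gallery: dict of id -> embedding."""
    correct = 0
    for true_id, embedding in probes:
        # The top-ranked gallery identity is the one with the highest similarity.
        best_id = max(gallery, key=lambda pid: cosine_similarity(embedding, gallery[pid]))
        correct += best_id == true_id
    return correct / len(probes)


# Toy data: the third probe is labeled "alice" but lies closer to "bob".
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
probes = [("alice", [0.9, 0.1]), ("bob", [0.2, 0.8]), ("alice", [0.4, 0.6])]
print(round(rank1_accuracy(probes, gallery), 3))
```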

15.9 Notification Testing

Test Description Verification
Telegram delivery Send test alert via Telegram Message received; formatting correct; buttons functional
WhatsApp delivery Send test alert via WhatsApp Template message received; parameters correct
Routing rules Trigger alert matching specific rule Delivered to correct recipients only
Quiet hours Send alert during quiet hours Non-critical suppressed; critical bypasses
Escalation Leave critical alert unacknowledged Escalation notifications at correct thresholds
Rate limiting Trigger burst of 50 alerts Rate limiting applied; no provider blocks
Media attachments Send alert with image + video Media processed to correct size; delivered
Delivery tracking Verify webhook receipts Status updated correctly in dashboard
DLQ handling Force 5 failed deliveries Messages moved to DLQ; admin notification sent
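
The quiet-hours row above expects non-critical alerts to be suppressed while critical alerts bypass the window. A minimal sketch of that decision follows. The 22:00 to 06:00 window, the severity label "critical", and the function name should_deliver are assumptions, since the blueprint does not define them in this section.

```python
from datetime import time


def should_deliver(severity, now, quiet_start=time(22, 0), quiet_end=time(6, 0)):
    """Critical alerts always go out; others are held during quiet hours."""
    if severity == "critical":
        return True
    if quiet_start <= quiet_end:
        in_quiet = quiet_start <= now < quiet_end
    else:
        # The window wraps past midnight (for example 22:00 to 06:00).
        in_quiet = now >= quiet_start or now < quiet_end
    return not in_quiet


print(should_deliver("critical", time(2, 0)))
print(should_deliver("low", time(2, 0)))
print(should_deliver("low", time(12, 0)))
```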

15.10 Test Environments

Environment Data Purpose Pipeline Stage
Local dev Synthetic (10 cameras, 100 persons) Developer testing Pre-commit
CI Synthetic (generated per run) Automated test execution Every commit
Staging Anonymized production-like (8 cameras, 500 persons) Pre-production validation Post-merge
Load test Generated (64 cameras, 10,000 persons) Performance testing Weekly schedule
DR Minimal (2 cameras, 10 persons) Disaster recovery validation Quarterly

15.11 CI/CD Pipeline for Testing

┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Push   │──▶│   Lint   │──▶│  Unit    │──▶│  SAST    │──▶│  Build   │
│         │   │ + Format │   │  Tests   │   │ + Scan   │   │  Images  │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
                                                                   │
                                                                   ▼
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│ Deploy  │◀──│   E2E    │◀──│   DAST   │◀──│  Image   │◀──│  Push    │
│Staging  │   │  Tests   │   │  Scan    │   │  Scan    │   │ Registry │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
      │
      ▼
┌─────────┐
│ Deploy  │ (Manual approval required)
│   Prod  │
└─────────┘

| Stage | Tools | Gate | Duration |
|-------|-------|------|----------|
| Lint + Format | ruff, black, mypy, ESLint, Prettier | Zero lint errors | 30s |
| Unit Tests | pytest, Vitest | 80%+ coverage | 3 min |
| SAST + Secrets | Bandit, Semgrep, TruffleHog | No HIGH/CRITICAL | 2 min |
| Build | Docker buildx | Build succeeds | 5 min |
| Image Scan | Trivy, Snyk | No HIGH/CRITICAL CVEs | 2 min |
| DAST | OWASP ZAP | No HIGH/CRITICAL findings | 10 min |
| E2E Tests | Playwright, pytest | All scenarios pass | 8 min |
| Deploy Staging | ArgoCD | Health checks pass | 3 min |

Section 16: Self-Test Framework

16.1 Framework Architecture

The Self-Test Framework is a standalone FastAPI service that continuously validates platform health and readiness through automated test execution.

┌──────────────────────────────────────────────────────────────────────────────┐
│                     SELF-TEST FRAMEWORK ARCHITECTURE                          │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                    TEST ORCHESTRATOR (FastAPI)                       │   │
│   │                                                                      │   │
│   │   Scheduler        Queue         Executor       Aggregator          │   │
│   │   (cron/APScheduler) │           (asyncio)        │                 │   │
│   │        │            │            │               │                  │   │
│   │   15m health ◄─────┼────────────┼───────────────┤                  │   │
│   │   Daily 3am  ◄─────┼────────────┼───────────────┤                  │   │
│   │   On-demand  ◄─────┼────────────┼───────────────┤                  │   │
│   │                      │            │               │                  │   │
│   │                      ▼            ▼               ▼                  │   │
│   │              ┌─────────────────────────────────────┐                 │   │
│   │              │         Reporter + Storage           │                 │   │
│   │              │  PostgreSQL + S3 (evidence)          │                 │   │
│   │              └─────────────────────────────────────┘                 │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                     21 TEST SUITES (170+ CASES)                      │   │
│   │                                                                      │   │
│   │   Infrastructure (TC-01..04)    │   Streaming (TC-05..06)           │   │
│   │   Core AI (TC-07..10)           │   Alerts (TC-11..13)              │   │
│   │   Capture (TC-14..15)           │   Search (TC-16)                  │   │
│   │   Training (TC-17)              │   Security (TC-18..19)            │   │
│   │   Resilience (TC-20..21)        │                                   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────────┘

16.2 Test Suite Catalog (21 Suites)

Suite ID Name Tests Priority Description
TC-INF-01 DVR Connectivity 8 P0 RTSP handshake, stream access, credential validation
TC-INF-02 VPN Health 6 P0 Tunnel status, latency, packet loss, throughput
TC-INF-03 Database Health 8 P0 Connection pool, query performance, replication lag
TC-INF-04 Storage Health 7 P0 Disk space, read/write performance, object storage
TC-STR-05 Camera Stream Access 10 P0 All 8 channels streaming, FPS, bitrate verification
TC-STR-06 Live Streaming 6 P1 HLS stream delivery to browsers, latency check
TC-AI-07 Human Detection 12 P0 YOLO accuracy, confidence thresholds, edge cases
TC-AI-08 Face Detection 10 P0 SCRFD accuracy, face bounding box quality
TC-AI-09 Face Recognition 12 P0 ArcFace embeddings, person matching accuracy
TC-AI-10 Unknown Clustering 8 P1 Face grouping quality, similarity thresholds
TC-ALT-11 Alert Generation 10 P0 Rule evaluation, severity assignment, routing
TC-ALT-12 Telegram Delivery 8 P1 Message delivery, formatting, media, error handling
TC-ALT-13 WhatsApp Delivery 8 P1 Template delivery, session messages, error handling
TC-CAP-14 Image Capture 6 P1 Frame extraction quality, storage, metadata
TC-CAP-15 Video Clip Capture 6 P1 Clip generation, compression, storage
TC-SEA-16 Search Retrieval 8 P1 Face search accuracy, text search, performance
TC-TRA-17 Training Workflow 8 P2 Model retraining, evaluation, deployment
TC-SEC-18 Admin Login Security 10 P0 Auth flow, MFA, session management, brute force
TC-SEC-19 RBAC Enforcement 12 P0 Permission checks, role-based access, resource-level
TC-RES-20 Restart Recovery 8 P1 Service restart, state recovery, data integrity
TC-RES-21 Load Handling 7 P1 8/16/32/64 camera simulation, throughput

Total: 21 suites, 178 test cases (sum of the Tests column)

16.3 Test Scheduling

Schedule Suites Trigger Notification
Every 15 minutes Infrastructure (TC-01..04) APScheduler cron Alert on failure
Daily at 03:00 UTC All 21 suites APScheduler cron Full report via email + Slack
On-demand Any subset Admin API call Immediate report
Post-deployment Critical path (TC-01,05,07,11,18) CI/CD webhook Pipeline gate
Weekly (Sunday 04:00) Full suite + extended load tests APScheduler cron Weekly report

16.4 Production Readiness Scoring

Base Score: 100.0

Deductions:
  P0 failure:  -20.0 points each
  P1 failure:  -10.0 points each
  P2 failure:  -5.0 points each
  P3 failure:  -2.0 points each

Minimum score: 0.0
Maximum score: 100.0

| Verdict | Score Range | Meaning | Recommended Action |
|---------|-------------|---------|--------------------|
| GO | 95.0 - 100.0 | All critical systems healthy | Proceed with confidence |
| GO WITH CAVEATS | 85.0 - 94.9 | Minor issues, non-critical | Proceed with monitoring plan |
| CONDITIONAL GO | 70.0 - 84.9 | Significant issues | Fix P1 issues before deployment |
| NO-GO | 0.0 - 69.9 | Critical failures | Do not deploy; address P0 issues first |
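
The deduction list and verdict bands above translate directly into two small functions. The sketch below applies them literally, with one deduction per failed test case and each band treated as "score at or above its lower bound". The function names are hypothetical, and the full scoring engine may weight results differently.

```python
# Point deductions per failed test case, by priority (from the list above).
DEDUCTIONS = {"P0": 20.0, "P1": 10.0, "P2": 5.0, "P3": 2.0}

# Verdict bands as (lower bound, label), highest band first.
VERDICTS = [
    (95.0, "GO"),
    (85.0, "GO WITH CAVEATS"),
    (70.0, "CONDITIONAL GO"),
    (0.0, "NO-GO"),
]


def readiness_score(failed_priorities):
    """Start at 100 and subtract one deduction per failure, clamped to 0..100."""
    score = 100.0 - sum(DEDUCTIONS[p] for p in failed_priorities)
    return max(0.0, min(100.0, score))


def verdict(score):
    """Return the label of the first band whose lower bound the score meets."""
    for lower_bound, label in VERDICTS:
        if score >= lower_bound:
            return label


score = readiness_score(["P1", "P2"])
print(score, verdict(score))
```

Comparing against lower bounds avoids the gaps between the printed ranges, such as scores between 94.9 and 95.0.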

16.5 Report Generation

Format Use Case Generation Retention
JSON API Programmatic consumption, CI/CD integration Immediate 90 days
HTML Dashboard Web-based viewing, trend analysis ~5 seconds 90 days
PDF Report Email distribution, compliance archiving ~30 seconds 1 year

Section 17: Sample Self-Test Report

17.1 Report Header

================================================================================
           SENTINEL AI SURVEILLANCE PLATFORM — SELF-TEST REPORT
================================================================================

Report ID:          STR-20250116-030015
Generated:          2025-01-16 03:00:15 UTC
Environment:        production
Version:            v2.3.1
Triggered By:       Scheduled (Daily 3:00 AM)
Duration:           18 minutes 42 seconds
Overall Status:     GO WITH CAVEATS

17.2 Executive Summary

Metric Value
Verdict GO WITH CAVEATS
Production Readiness Score 94.8 / 100
Total Test Cases 170
Passed 168 (98.8%)
Failed 2 (1.2%)
Skipped 0 (0.0%)
Previous Run Score 97.2 / 100
Score Change -2.4 (downward)

Priority Breakdown:

Priority Total Passed Failed Pass Rate
P0 (Critical) 42 42 0 100.0%
P1 (High) 70 68 2 97.1%
P2 (Medium) 38 38 0 100.0%
P3 (Low) 20 20 0 100.0%

17.3 System Metrics at Test Time

Metric Value Status
Active Cameras 8 / 8 Online
Stream FPS (avg) 28.5 Normal
AI Inference Latency (p95) 42ms Normal
Detection Rate (last hour) 47 events Normal
Database Connections 18 / 100 Healthy
Storage Usage 67% Healthy
VPN Latency 12ms Excellent
API Response Time (p95) 78ms Normal
Telegram Delivery Rate (24h) 99.2% Healthy
WhatsApp Delivery Rate (24h) 99.8% Healthy

17.4 Failed Test Cases

Failure 1: TC-ALT-12-004 — Telegram Media Group Delivery

Field Value
Test Case TC-ALT-12-004
Suite Telegram Delivery (TC-ALT-12)
Priority P1
Status FAILED
Duration 12,450 ms
Severity Medium

Description: Verify that media group (multiple images) is delivered correctly via Telegram when an alert contains multiple evidence images.

Expected Result: All 3 images delivered as a media group album within 10 seconds.

Actual Result: Only 2 of 3 images delivered. Third image failed with error: telegram_api_error: Request Entity Too Large (413). Image size after processing: 10.8 MB (exceeds Telegram's 10 MB per-image limit for media groups).

Root Cause: The media processing pipeline resizes images to 1280x720 but does not enforce a hard 10 MB per-image cap for Telegram media groups. The iterative quality reduction loop stops at quality 50 but can still produce files > 10 MB.

Recommended Fix: Add a hard size cap check after image processing. If image exceeds 10 MB after quality reduction to 50%, apply additional compression (reduce dimensions or use WebP format).

Workaround: Single-image delivery mode works correctly. Multi-image alerts temporarily deliver images individually.
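
The recommended fix for TC-ALT-12-004 can be sketched as a search over quality steps and then dimension scales that stops at the first encoding under the cap. This is a minimal Python illustration, not the pipeline's actual code: `encode(quality, scale)` is a hypothetical callback standing in for whatever image library the media pipeline uses, the quality and scale ladders are placeholder values, and reading "10 MB" as 10 MiB is an assumption.

```python
from typing import Callable, Optional, Sequence

# 10 MB per-image limit cited in the failure report (10 MiB assumed here).
TELEGRAM_MEDIA_GROUP_CAP = 10 * 1024 * 1024


def fit_under_cap(
    encode: Callable[[int, float], bytes],
    max_bytes: int = TELEGRAM_MEDIA_GROUP_CAP,
    qualities: Sequence[int] = (85, 70, 60, 50),
    scales: Sequence[float] = (1.0, 0.75, 0.5),
) -> Optional[bytes]:
    """Return the first encoding that fits under max_bytes, else None.

    encode(quality, scale) is a hypothetical callback wrapping the real
    encoder; scale is a multiplier applied to the image dimensions.
    """
    for scale in scales:           # shrink dimensions only after ...
        for quality in qualities:  # ... every quality step has been tried
            data = encode(quality, scale)
            if len(data) <= max_bytes:
                return data
    # Nothing fit: the caller has to decide how to handle the image.
    return None
```

The difference from the behavior described in the root cause is the explicit size check on every candidate and the `None` result when no candidate fits, so an oversized image is never handed to the Telegram client.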


Failure 2: TC-RES-20-006 — AI Inference Recovery After Simulated Crash

Field Value
Test Case TC-RES-20-006
Suite Restart Recovery (TC-RES-20)
Priority P1
Status FAILED
Duration 65,200 ms
Severity Medium

Description: Verify that the AI inference service recovers and resumes processing within 60 seconds after a simulated process crash.

Expected Result: AI inference pod restarts and resumes processing frames within 60 seconds for all 8 cameras.

Actual Result: Pod restarted successfully (18 seconds), but detection did not resume for Camera 3 and Camera 7. The other 6 cameras resumed within 45 seconds.

Root Cause: Model warm-up failed because of a race condition in GPU memory allocation during concurrent channel initialization. All 8 channel processors attempt to load the face recognition model simultaneously. On resource-constrained edge hardware, this causes out-of-memory (OOM) failures for the channels that lose the initialization race.

Recommended Fix: Implement shared model loading — load each model once and share across all channel processors. Add initialization semaphore.

Workaround: Manual restart of affected channel processors via admin API.
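
The recommended fix for TC-RES-20-006 (load each model once, share it, and guard initialization) can be illustrated with a small registry. This is a sketch under stated assumptions: `SharedModelRegistry`, `start_channels`, and the `loader` callback are hypothetical names, and a plain `threading.Lock` plays the role of the initialization semaphore.

```python
import threading
from typing import Any, Callable, Dict, List


class SharedModelRegistry:
    """Loads each named model at most once and shares it across threads."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}
        # Serializes model loading so channels cannot race on GPU memory.
        self._lock = threading.Lock()

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        # Concurrent callers wait here for the first load to finish instead
        # of each allocating memory for a private copy of the model.
        with self._lock:
            if name not in self._models:
                self._models[name] = loader()
            return self._models[name]


def start_channels(
    registry: SharedModelRegistry,
    loader: Callable[[], Any],
    channel_count: int = 8,
) -> List[Any]:
    """Simulate channel processors requesting the same model concurrently."""
    results: List[Any] = [None] * channel_count

    def worker(index: int) -> None:
        results[index] = registry.get("face_recognition", loader)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(channel_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A single lock keeps the sketch simple but also serializes loading of different models. A per-model lock would allow unrelated models to load in parallel if memory headroom permits.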

17.5 Trending (Last 15 Days)

Date Score Verdict Notes
2025-01-02 96.5 GO
2025-01-03 98.2 GO
2025-01-04 97.1 GO
2025-01-05 98.8 GO
2025-01-06 97.5 GO
2025-01-07 98.2 GO
2025-01-08 96.8 GO TC-RES-21 had 1 P3 failure
2025-01-09 97.2 GO
2025-01-10 98.2 GO
2025-01-11 97.5 GO
2025-01-12 98.2 GO
2025-01-13 97.2 GO
2025-01-14 98.2 GO
2025-01-15 97.2 GO
2025-01-16 94.8 GO WITH CAVEATS 2 P1 failures (see above)
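
The score change reported in 17.2 can be reproduced from the trend data with a few lines of Python. The comparison against the mean of the earlier runs is an illustrative addition and not part of the report's scoring; the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (date, readiness score) pairs copied from the trend table.
SCORES: List[Tuple[str, float]] = [
    ("2025-01-02", 96.5), ("2025-01-03", 98.2), ("2025-01-04", 97.1),
    ("2025-01-05", 98.8), ("2025-01-06", 97.5), ("2025-01-07", 98.2),
    ("2025-01-08", 96.8), ("2025-01-09", 97.2), ("2025-01-10", 98.2),
    ("2025-01-11", 97.5), ("2025-01-12", 98.2), ("2025-01-13", 97.2),
    ("2025-01-14", 98.2), ("2025-01-15", 97.2), ("2025-01-16", 94.8),
]


def change_vs_previous(scores: List[Tuple[str, float]]) -> float:
    """Difference between the latest run and the run before it."""
    return round(scores[-1][1] - scores[-2][1], 1)


def deviation_from_baseline(scores: List[Tuple[str, float]]) -> float:
    """Latest score minus the mean of all earlier runs in the window."""
    baseline = mean(score for _, score in scores[:-1])
    return round(scores[-1][1] - baseline, 2)
```

`change_vs_previous(SCORES)` gives the -2.4 shown in the executive summary. The baseline deviation is a slightly larger drop because most earlier runs scored above 97.2.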

17.6 Conclusion and Recommendations

Verdict: GO WITH CAVEATS

The Sentinel AI Surveillance Platform is operational and safe to use. All 42 P0 (Critical) test cases passed, confirming that core surveillance functions are working correctly.

Two P1 (High) priority issues were identified with documented workarounds. Both fixes are scheduled for v2.3.2.

Recommended Actions:

  • Address TC-ALT-12-004: Add aggressive compression for Telegram media group images
  • Address TC-RES-20-006: Implement shared model loading in AI inference service
  • Monitor Telegram multi-image alert delivery metrics (workaround active)
  • Monitor AI inference recovery metrics (manual restart documented in runbook)
  • Validate both fixes in next daily test run after v2.3.2 deployment

Section 18: Risks and Mitigations

18.1 Risk Register Summary

# | Category | Risk | Likelihood | Impact | Score | Mitigation | Owner
T1 | Technical | DVR disk full (0 bytes free) | High | Critical | 20 | Auto-rotation at 85%; emergency cleanup; secondary storage | Platform
T2 | Technical | AI false positives in low light | Medium | High | 12 | Night models; adjustable thresholds; operator review | AI Team
T3 | Technical | Face rec accuracy with masks/angles | Medium | Medium | 9 | Multi-angle training; pose normalization | AI Team
T4 | Technical | VPN tunnel instability | Medium | High | 12 | Auto-reconnect; local buffering; redundant endpoints | Platform
T5 | Technical | DB performance at scale | Medium | Medium | 9 | Partitioning; read replicas; archiving | Platform
O1 | Operational | Edge hardware failure | Medium | Critical | 15 | Cold spare; config backup; documented replacement | Operations
O2 | Operational | Internet loss at edge site | Medium | High | 12 | Local storage buffer; 4G failover; local AI continues | Operations
O3 | Operational | Operator training gaps | Medium | Medium | 9 | Training program; inline help; escalation procedures | Operations
O4 | Operational | Alert fatigue | Medium | High | 12 | Escalation rules; alert grouping; severity routing | Operations
S1 | Security | Biometric data breach | Low | Critical | 10 | AES-256-GCM; signed URLs; GDPR deletion; audit | Security
S2 | Security | Unauthorized feed access | Low | Critical | 10 | RBAC; JWT; MFA; session binding; rate limiting | Security
S3 | Security | Bot token compromise | Low | High | 8 | Vault encryption; 180-day rotation; IP allowlist | Security
A1 | AI/ML | Model drift over time | Medium | High | 12 | Monthly evaluation; auto-monitoring; retraining | AI Team
A2 | AI/ML | Training data poisoning | Low | Critical | 10 | Validation; multi-person review; audit trail | AI Team
A3 | AI/ML | Demographic bias | Medium | High | 12 | Diverse data; fairness audits; human-in-loop | AI Team
A4 | AI/ML | Edge hardware insufficient | Medium | High | 12 | CPU models; cloud offloading; GPU upgrade path | AI Team
I1 | Integration | DVR firmware incompatibility | Medium | High | 12 | RTSP compliance check; firmware validation | Engineering
C1 | Compliance | GDPR non-compliance | Low | Critical | 10 | PIA; consent mgmt; right to deletion; DPO | DPO
R1 | Resource | Budget overrun | Medium | Medium | 9 | Reserved instances; cost monitoring; quotas | Finance
R3 | Resource | Timeline delay | Medium | High | 12 | Phased delivery; parallel work; weekly tracking | PMO
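
The Score column in the register is consistent with a likelihood times impact product. The Python sketch below uses numeric scales inferred from the rows above (Low=2, Medium=3, High=4 for likelihood; Medium=3, High=4, Critical=5 for impact). These scales, the threshold of 15 for immediate action, and the function names are assumptions; the register does not state them explicitly.

```python
from typing import List, Tuple

# Scales inferred from the register rows; not stated in the source.
LIKELIHOOD = {"Low": 2, "Medium": 3, "High": 4}
IMPACT = {"Medium": 3, "High": 4, "Critical": 5}

# Assumed cut-off: the two risks listed for immediate action score 15 and 20.
IMMEDIATE_ACTION_THRESHOLD = 15


def risk_score(likelihood: str, impact: str) -> int:
    """Score = likelihood level x impact level."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]


# A subset of register rows: (id, likelihood, impact).
REGISTER: List[Tuple[str, str, str]] = [
    ("T1", "High", "Critical"),
    ("T2", "Medium", "High"),
    ("O1", "Medium", "Critical"),
    ("S1", "Low", "Critical"),
    ("S3", "Low", "High"),
    ("R1", "Medium", "Medium"),
]


def immediate_action(register: List[Tuple[str, str, str]]) -> List[Tuple[str, int]]:
    """Return (id, score) for risks at or above the threshold, highest first."""
    scored = [(rid, risk_score(lik, imp)) for rid, lik, imp in register]
    urgent = [pair for pair in scored if pair[1] >= IMMEDIATE_ACTION_THRESHOLD]
    return sorted(urgent, key=lambda pair: pair[1], reverse=True)
```

Applied to the sample rows, `immediate_action(REGISTER)` returns T1 (20) and O1 (15), the same two risks singled out in 18.2.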

18.2 Critical Risks Requiring Immediate Action

  1. T1 — DVR Disk Full (Score: 20)

    • Action: Emergency disk cleanup within 24 hours
    • Implement automatic rotation at 85% capacity
    • Configure critical alerts at 90%, 95%, 98%
    • Owner: Platform Team | Due: 2025-01-17
  2. O1 — Edge Hardware Failure (Score: 15)

    • Action: Procure cold spare device
    • Document hardware replacement runbook
    • Automate configuration restoration from GitOps
    • Owner: Operations Team | Due: 2025-02-01
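
The T1 thresholds above (rotation at 85%, alerts at 90%, 95%, and 98%) map naturally onto a small decision function. The Python sketch below is illustrative only: the action labels and function names are hypothetical, and `used_percent` assumes the recording volume is visible as a local mount, whereas a DVR would more likely be queried through its own interface.

```python
import shutil
from typing import List

ROTATION_THRESHOLD = 85.0              # start automatic rotation
ALERT_THRESHOLDS = (90.0, 95.0, 98.0)  # critical alert levels


def disk_actions(used_percent: float) -> List[str]:
    """Map a disk usage percentage to the actions listed for risk T1."""
    actions: List[str] = []
    if used_percent >= ROTATION_THRESHOLD:
        actions.append("rotate")
    crossed = [t for t in ALERT_THRESHOLDS if used_percent >= t]
    if crossed:
        # Report only the highest alert level that has been crossed.
        actions.append(f"alert:{max(crossed):g}")
    return actions


def used_percent(path: str) -> float:
    """Usage of the filesystem containing path (local mounts only)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, a volume at 96% yields both a rotation action and a 95% alert, while one at 80% yields no action.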

Section 19: Final Implementation Roadmap

19.1 Five-Phase Implementation (20 Weeks)

Phase | Weeks | Name | Theme | Key Milestone
1 | 1-4 | Foundation | Infrastructure, VPN, edge, database | M1: Infrastructure Ready
2 | 5-8 | Core AI Pipeline | Video ingestion, detection, recognition | M2: AI Pipeline Operational
3 | 9-12 | Application Layer | Dashboard, alerts, notifications | M3: Application Live
4 | 13-16 | Intelligence | Night mode, training, self-learning | M4: Intelligence Features
5 | 17-20 | Hardening | Security, testing, operations, go-live | M5: Production Go-Live

19.2 Key Milestones and Deliverables

Milestone | Target Week | Deliverables | Entry Criteria | Exit Criteria
M1 Infrastructure | Week 4 | Cloud services, VPN, edge gateway, database, monitoring | Project kickoff, hardware delivered | All services healthy, VPN stable, schema deployed
M2 AI Pipeline | Week 8 | Video capture, YOLO, SCRFD, ArcFace, detection DB, API | M1 complete, models ready | All 8 streams ingesting, AI accuracy targets met, API functional
M3 Application | Week 12 | Dashboard, alerts, Telegram, WhatsApp, person mgmt, WebSocket | M2 complete, frontend env ready | Dashboard live, alerts delivered, person management working
M4 Intelligence | Week 16 | Night mode, training pipeline, self-learning, privacy, search | M3 complete, training data accumulated | All intelligence features operational
M5 Go-Live | Week 20 | Security audit, test framework, DR, runbooks, load test, checklist | M4 complete, security audit scheduled | All audits passed, checklist complete, 72h stability

19.3 Phase Details

Phase 1 (Weeks 1-4): VPC, EKS, RDS, Redis, Kafka, S3, WireGuard VPN, edge gateway OS hardening, Docker setup, database schema with migrations, monitoring stack (Prometheus, Grafana).

Phase 2 (Weeks 5-8): RTSP capture service, YOLO human detection, SCRFD face detection, ArcFace face recognition, embedding storage with pgvector, person matching logic, FastAPI core, authentication.

Phase 3 (Weeks 9-12): Next.js frontend, design system, dashboard, live camera view (HLS), alert engine with rules, Telegram Bot API integration, WhatsApp Business API integration, person gallery and profile, unknown review queue, watchlists, WebSocket real-time updates.

Phase 4 (Weeks 13-16): Night mode AI model, AI Vibe Settings page, training pipeline with model versioning, self-learning service for unknown clusters, privacy mode with face blurring, suspicious activity detection, search service (face + text), system health dashboard.

Phase 5 (Weeks 17-20): Penetration testing, SAST/DAST, self-test framework (21 suites), backup/DR setup, incident response runbooks, load testing (8-64 cameras), performance tuning, operations training, go-live checklist (98 items), production cutover, 72-hour stability monitoring.

19.4 Resource Allocation

Phase Engineering AI/ML DevOps QA Security
1: Foundation 2 2 1
2: Core AI 2 2 1 1
3: Application 3 1 1 2
4: Intelligence 2 2 1 1
5: Hardening 2 1 2 2 2

Section 20: Final Production-Readiness Summary

20.1 System at a Glance

Category Specification
Architecture Cloud (AWS EKS) + Edge (Intel NUC) + VPN (WireGuard)
Services 12 containerized microservices
Security Zones 5 (Public, App Private, Database, Edge LAN, Camera LAN)
AI Pipeline YOLO11m (human detection) + SCRFD (face detection) + ArcFace (recognition)
Embeddings 512-dimensional face vectors stored in pgvector
Database PostgreSQL 15, 29 tables, partitioned, AES-256-GCM encrypted
Web Application 18 pages, dark mode, Next.js 14, real-time WebSocket
Notifications Telegram Bot API + WhatsApp Business API (dual channel)
Security TLS 1.3, Argon2id, JWT ES256, TOTP MFA, RBAC (4 roles, 30+ permissions)
Testing 21 test suites, 170+ test cases, automated readiness scoring
Reliability 99.9% uptime target, RTO 1 hour, RPO 15 minutes
Timeline 20 weeks (5 months) to production
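
The reliability targets in the table can be turned into a concrete downtime budget. The Python sketch below is plain arithmetic; the measurement windows (30 days and 365 days) and the function name are assumptions, since the section does not say how the 99.9% target is measured.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1 - availability_percent / 100.0)


monthly_budget = downtime_budget_minutes(99.9, 30)   # about 43.2 minutes
yearly_budget = downtime_budget_minutes(99.9, 365)   # about 525.6 minutes
```

Under these assumptions a 99.9% target allows roughly 43 minutes of downtime per 30 days, or about 8.8 hours per year. A single outage that consumes the full 1-hour RTO would therefore exceed a 30-day budget, so the two targets are worth reading together.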

20.2 Readiness Checklist Summary

Category Items Status
Infrastructure 14 Ready to implement
Security 18 Ready to implement
AI/ML Pipeline 15 Ready to implement
Application 16 Ready to implement
Operations 15 Ready to implement
Data & Privacy 10 Ready to implement
Documentation 10 Ready to implement
Total 98 Ready to implement

20.3 Estimated Timeline

Milestone Target Duration
M1: Infrastructure Ready Week 4 4 weeks
M2: AI Pipeline Operational Week 8 4 weeks
M3: Application Live Week 12 4 weeks
M4: Intelligence Features Week 16 4 weeks
M5: Production Go-Live Week 20 4 weeks
Total to Production 20 weeks ~5 months

Appendices

Appendix A: Cross-Reference to Specialist Documents

Document | Path | Content
Notification System | /mnt/agents/output/notification_system.md | Telegram, WhatsApp, routing rules, templates, retry logic
Security Architecture | /mnt/agents/output/security_architecture.md | SSL/TLS, auth, RBAC, VPN, secrets, audit, GDPR, checklist
Web UX Design | /mnt/agents/output/web_ux_design.md | Design system, 18 pages, navigation, user flows, AI vibe settings
Self-Test Framework | /mnt/agents/output/self_test_framework.md | Framework architecture, 21 suites, scheduling, sample report
Operations Plan | /mnt/agents/output/operations_plan.md | Monitoring, logging, backup, DR, incident response, runbooks
Architecture | /mnt/agents/output/architecture.md | System architecture, data flow, scaling strategy, cost estimates

Appendix B: Acronyms

Acronym Full Form
AI Artificial Intelligence
ALB Application Load Balancer
API Application Programming Interface
ArcFace Additive Angular Margin Loss for Deep Face Recognition
CORS Cross-Origin Resource Sharing
CSP Content Security Policy
CSRF Cross-Site Request Forgery
DLQ Dead Letter Queue
DVR Digital Video Recorder
EKS Elastic Kubernetes Service
ES256 ECDSA using P-256 and SHA-256
FFmpeg Fast Forward MPEG (multimedia framework)
FPS Frames Per Second
GDPR General Data Protection Regulation
GPU Graphics Processing Unit
HLS HTTP Live Streaming
HPA Horizontal Pod Autoscaler
HSTS HTTP Strict Transport Security
JWT JSON Web Token
LUKS Linux Unified Key Setup
MFA Multi-Factor Authentication
mAP mean Average Precision
mTLS Mutual TLS
NMS Non-Maximum Suppression
NUC Next Unit of Computing
OCSP Online Certificate Status Protocol
PII Personally Identifiable Information
PSK Pre-Shared Key
RBAC Role-Based Access Control
RDS Relational Database Service
RPO Recovery Point Objective
RTO Recovery Time Objective
RTSP Real Time Streaming Protocol
S3 Simple Storage Service
SAST Static Application Security Testing
SCRFD Sample and Computation Redistribution for Efficient Face Detection
SLA Service Level Agreement
SQL Structured Query Language
SSL Secure Sockets Layer
TLS Transport Layer Security
TOTP Time-based One-Time Password
TPM Trusted Platform Module
UAT User Acceptance Testing
VPC Virtual Private Cloud
VPN Virtual Private Network
WAF Web Application Firewall
WORM Write Once Read Many
XSS Cross-Site Scripting
YOLO You Only Look Once

End of Document

Document Version: 1.0 | Classification: Confidential, Internal Use Only | Next Review: 2025-04-16 | Owner: Sentinel AI Architecture Team