AI-Powered Industrial Surveillance Platform
Unified Technical Blueprint — Part A: Sections 1-10
| Document Property | Value |
|---|---|
| Version | 1.0.0 |
| Classification | Technical Blueprint — Production Design |
| Target DVR | CP PLUS ORANGE CP-UVR-0801E1-CV2 |
| Channels | 8 active (scalable to 64+) |
| Resolution | 960 x 1080 per channel |
| DVR Network | 192.168.29.200/24, RTSP port 554 |
| Date | 2025 |
Cross-Reference Guide: This unified blueprint synthesizes six specialist design documents. For detailed specifications on any subsystem, refer to:
- architecture.md: Full architecture, scaling, failover, cost estimation
- video_ingestion.md: RTSP configuration, FFmpeg commands, edge gateway specs
- ai_vision.md: Model configurations, inference code, benchmarks
- database_schema.md: Complete DDL, triggers, views, RLS policies
- suspicious_activity.md: Detection algorithms, scoring engine pseudocode
- training_system.md: Training pipelines, quality gates, versioning logic
Table of Contents
- Section 1: Executive Summary
- Section 2: Kimi Swarm Team and Agent Responsibilities
- Section 3: Assumptions
- Section 4: Full Architecture
- Section 5: Data Flow from DVR to Cloud to Dashboard
- Section 6: Recommended Tech Stack
- Section 7: Database Schema
- Section 8: AI Model and Training Strategy
- Section 9: Suspicious Activity Night-Mode Design
- Section 10: Live Video Streaming Design
Section 1: Executive Summary
1.1 Project Objective
This blueprint defines the complete technical design for an AI-powered industrial surveillance platform that turns a legacy CP PLUS 8-channel DVR system into an intelligent security operations center. The platform processes real-time video from 8 camera channels, applies person detection, tracking, and face recognition, detects suspicious activity during night hours (22:00-06:00), and presents the results in a unified dashboard for security operators. Reliability, security, and data privacy are treated as design constraints throughout.
The system is designed around a cloud+edge hybrid architecture where all compute-intensive AI inference runs in the cloud (AWS Mumbai), while a local edge gateway handles stream ingestion, buffering, and site-local concerns. A WireGuard VPN tunnel protects all communication between edge and cloud, ensuring the DVR has zero public internet exposure.
1.2 Key Capabilities
| Capability | Description | Technology |
|---|---|---|
| Human Detection | Real-time person detection across all 8 channels at 15-20 FPS | YOLO11m + TensorRT FP16, 640x640 |
| Face Detection | Accurate face localization with 5-point landmarks for alignment | SCRFD-500M-BNKPS, 640x640 |
| Face Recognition | 512-D embedding extraction with 99.83% LFW accuracy | ArcFace R100 IR-SE100 (MS1MV3) |
| Person Tracking | Persistent identity tracking across frames with occlusion recovery | ByteTrack (Kalman + IoU), 80.3% MOTA |
| Unknown Clustering | Automatic grouping of unknown faces for operator review | HDBSCAN + DBSCAN fallback, 89.5% purity |
| Night Mode Surveillance | 10-detection-module suspicious activity analysis (22:00-06:00) | Composite scoring engine with time-decay |
| AI Vibe Controls | Three intuitive presets (Relaxed/Balanced/Strict) mapping to 4 confidence levels | Dynamic threshold adjustment |
| Safe Self-Learning | Three-mode training system with conflict detection and approval workflows | MLflow + Airflow + Manual Review |
| 24/7 Reliability | Graceful degradation: video never stops, AI catch-up on recovery | Tiered storage + circuit breakers + replay |
| Real-Time Alerts | 6-level escalation (NONE to EMERGENCY) with multi-channel notifications | Telegram, WhatsApp, Email, Webhook |
| Live Dashboard | Multi-camera grid with HLS streaming and single-camera low-latency WebRTC | Next.js 14 + HLS.js + WebRTC |
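The night-mode window in the table (22:00-06:00) wraps past midnight, which is easy to get wrong with a plain `start <= now < end` comparison. The following is a minimal sketch of a wrap-aware check; the function name and the half-open boundary convention (22:00 included, 06:00 excluded) are assumptions, not taken from the specialist documents.

```python
from datetime import time

NIGHT_START = time(22, 0)  # from the capability table: 22:00
NIGHT_END = time(6, 0)     # from the capability table: 06:00


def is_night_mode(now: time, start: time = NIGHT_START, end: time = NIGHT_END) -> bool:
    """Return True when `now` falls inside the half-open window [start, end)."""
    if start <= end:
        # Window does not cross midnight (e.g. 01:00-05:00).
        return start <= now < end
    # Window wraps past midnight (e.g. 22:00-06:00).
    return now >= start or now < end
```

A caller would pass the site-local wall-clock time, for example `is_night_mode(datetime.now(site_tz).time())`, where `site_tz` is whatever time zone the deployment uses.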
1.3 Architecture Approach
The platform follows a cloud+edge+VPN hybrid pattern with five network security zones:
Cameras (8ch) --> DVR (local) --> Edge Gateway (local) --> WireGuard VPN --> AWS Cloud (EKS)
                                  |                        |
                                  | 2TB NVMe buffer        | Encrypted tunnel
                                  | 7-day ring buffer      | UDP 51820
                                  | FFmpeg ingestion       | ChaCha20-Poly1305
Key architectural decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Cloud Provider | AWS ap-south-1 (Mumbai) | Lowest latency to India, mature managed services |
| Container Orchestration | Amazon EKS + K3s edge | Managed control plane, GPU node support, lightweight edge |
| VPN | WireGuard | ~60% faster than OpenVPN, modern crypto, simple setup |
| Message Queue | Apache Kafka (MSK) | Durable ordered log, replay capability, proven at scale |
| AI Inference | NVIDIA Triton + TensorRT | GPU-optimized, dynamic batching, model ensemble |
| Database | PostgreSQL 16 + pgvector | ACID compliance, native 512-D vector support |
| Object Storage | MinIO (edge+cloud) + S3 (archive) | S3-compatible API, tiered cost optimization |
1.4 Target Environment
The platform targets a CP PLUS ORANGE CP-UVR-0801E1-CV2 DVR with the following characteristics:
| Property | Value | Impact on Design |
|---|---|---|
| Brand/Model | CP PLUS ORANGE CP-UVR-0801E1-CV2 | Dahua-compatible RTSP URL scheme |
| Channels | 8 active | Initial deployment scope |
| Resolution | 960 x 1080 per channel | AI input: letterbox to 640x640 |
| LAN IP | 192.168.29.200/24 | Edge gateway on same subnet |
| RTSP Port | 554 | TCP interleaved mandatory |
| ONVIF | V2.6.1.867657 (Server V19.06) | Auto-discovery supported |
| DVR Disk | FULL (0 bytes free) | All archival is edge-managed; no DVR recording |
| VPN Access | WireGuard-secured | No public exposure; all traffic encrypted |
Critical Design Impact: The DVR disk being full means the system cannot rely on DVR-side recording or playback features. All archival storage is managed by the edge gateway's 2TB NVMe buffer and cloud tiering.
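The table notes that 960 x 1080 frames are letterboxed to 640x640 for AI input. As a sketch of the arithmetic involved (the function name and the even padding split are assumptions), the scale factor and padding can be computed as follows:

```python
def letterbox_params(src_w: int, src_h: int, dst: int = 640) -> dict:
    """Compute resize and padding to fit a frame into a dst x dst square
    while preserving its aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = dst - new_w, dst - new_h
    # Split the padding as evenly as possible between the two sides.
    left, top = pad_w // 2, pad_h // 2
    return {
        "scale": scale,
        "size": (new_w, new_h),
        "pad": (left, top, pad_w - left, pad_h - top),  # left, top, right, bottom
    }
```

For a 960 x 1080 frame the height is the limiting dimension, so the frame is resized to 569 x 640 and padded horizontally. Detections produced in the 640x640 space must be mapped back by subtracting the padding and dividing by `scale`.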
1.5 Key Differentiators
1. AI Vibe Controls: Instead of exposing complex threshold parameters to operators, the system provides three intuitive "vibe" presets (Relaxed, Balanced, and Strict) that internally map to optimized configurations for detection sensitivity and face match strictness. This makes the system accessible to non-technical security staff while maintaining AI precision.
2. Safe Self-Learning Training System: The platform captures operator corrections (confirmations, corrections, merges, rejections) and feeds them back into model improvement through a three-mode learning pipeline: Manual Only, Suggested Learning (recommended), and Approved Auto-Update. A synchronous conflict detector blocks five types of label conflicts before they reach the training dataset, protecting model integrity.
3. 24/7 Reliability with Graceful Degradation: The system is architected around a single priority: video recording never stops. If the AI inference service fails, recording continues locally with queued catch-up processing on recovery. If the VPN tunnel fails, the edge gateway maintains 7 days of local buffer. If the cloud database fails, alerts accumulate in Kafka's durable log. Every failure mode has a defined degradation strategy.
4. 10-Module Night Surveillance: The suspicious activity detection system goes beyond simple motion detection to provide behavioral analysis through 10 specialized detection modules, ranging from intrusion and loitering to abandoned objects and repeated re-entry patterns, all combined through a composite scoring engine with exponential time-decay.
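To make the idea of a composite score with exponential time-decay concrete, here is a minimal sketch. The weights, the 300-second half-life, and the `Signal` structure are illustrative assumptions only; the actual scoring formula is defined in suspicious_activity.md and may differ.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    module: str      # detection module name, e.g. "intrusion" or "loitering"
    severity: float  # raw module output in the range 0.0-1.0
    age_s: float     # seconds elapsed since the detection fired


# Hypothetical per-module weights and half-life, for illustration only.
WEIGHTS = {"intrusion": 1.0, "loitering": 0.6, "abandoned_object": 0.8, "repeated_reentry": 0.7}
HALF_LIFE_S = 300.0


def composite_score(signals, weights=WEIGHTS, half_life_s=HALF_LIFE_S) -> float:
    """Sum weighted severities, halving each contribution every half_life_s seconds."""
    total = 0.0
    for s in signals:
        decay = 0.5 ** (s.age_s / half_life_s)  # exponential time-decay
        total += weights.get(s.module, 0.0) * s.severity * decay
    return total
```

Under these assumed values, a fresh intrusion detection contributes its full weight, while the same detection five minutes later contributes half as much, so stale evidence fades instead of holding the score high indefinitely.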
1.6 Production Readiness Assessment
| Dimension | Status | Notes |
|---|---|---|
| Architecture Completeness | Production-Ready | All 13 services fully specified with resource allocations |
| AI Model Selection | Production-Ready | Industry-standard models with published benchmarks |
| Database Design | Production-Ready | 29 tables, 4 views, 8 triggers, partitioning, RLS |
| Security Architecture | Production-Ready | 7-layer defense in depth, encrypted credentials, VPN-only |
| Scaling Path | Defined | 8 -> 16 -> 32 -> 64+ cameras with concrete resource allocations |
| Failover Design | Production-Ready | Graceful degradation matrix for all failure modes |
| Estimated Timeline | 14 weeks | 4 implementation phases defined |
| Estimated Monthly Cost | ~$2,140 USD | 8-camera deployment at steady state |
Section 2: Kimi Swarm Team and Agent Responsibilities
The unified blueprint was synthesized from the outputs of 11 specialist agents, each responsible for a specific domain of the platform design.
2.1 Agent Responsibility Matrix
| # | Agent | Responsibility | Key Deliverables |
|---|---|---|---|
| 1 | Requirements Analyst | Elicited and structured all functional/non-functional requirements | Requirements traceability matrix, user stories, acceptance criteria |
| 2 | System Architect | Designed overall cloud+edge+VPN topology and service interactions | Deployment topology, 5 security zones, scaling roadmap, failover matrix |
| 3 | Video Ingestion Engineer | Specified RTSP configuration, edge gateway, and stream processing | RTSP URL patterns, FFmpeg commands, auto-reconnect logic, HLS generation |
| 4 | AI Vision Scientist | Selected and configured all CV/AI models for the inference pipeline | Model selection table, inference pipeline architecture, confidence handling |
| 5 | Database Architect | Designed complete data model with partitioning, indexing, and security | 29 tables + 4 views + 8 triggers, pgvector HNSW index, RLS policies |
| 6 | Suspicious Activity Designer | Designed 10 detection modules and composite scoring engine | Detection algorithms, scoring formula, YAML configuration schema |
| 7 | Training System Engineer | Designed self-learning pipeline with safety controls | 3 learning modes, conflict detection, quality gates, versioning |
| 8 | Frontend Developer | Designed Next.js dashboard with real-time video and alerts | Component architecture, HLS.js integration, WebSocket alerts |
| 9 | DevOps Engineer | Specified CI/CD, monitoring, and infrastructure-as-code | GitHub Actions + ArgoCD, Prometheus/Grafana, alerting rules |
| 10 | Security Architect | Designed defense-in-depth security across all layers | 7 security layers, secret management, encryption standards |
| 11 | Technical Writer (this document) | Synthesized all specialist outputs into unified blueprint | 10-section unified document with cross-references |
2.2 Agent Interaction Flow
+-----------------------------------------------------------------------------+
| KIMI SWARM TEAM ORCHESTRATION |
+-----------------------------------------------------------------------------+
| |
| Requirements Analyst |
| | |
| v |
| +---------+ +---------+ +---------+ +---------+ |
| | System |<-->| Video |<-->| AI |<-->| Database| |
| |Architect| |Ingestion| |Vision | |Architect| |
| +---------+ +---------+ +---------+ +---------+ |
| ^ | |
| | +---------+ +---------+ | |
| +---------->|Suspicious|<-->|Training |<-------+ |
| |Activity | |System | |
| |Designer | |Engineer | |
| +---------+ +---------+ |
| | |
| v |
| +---------+ +---------+ +---------+ |
| |Frontend | |DevOps | |Security | |
| |Developer| |Engineer | |Architect| |
| +---------+ +---------+ +---------+ |
| | |
| v |
| +---------------------+ |
| | Technical Writer | |
| | (Unified Blueprint) | |
| +---------------------+ |
| |
+-----------------------------------------------------------------------------+
2.3 Cross-Agent Design Consistency
The following cross-cutting concerns were harmonized across all agent outputs during synthesis:
| Concern | Resolution | Agents Coordinated |
|---|---|---|
| Video latency budget | < 100ms end-to-end (AI); ~35-65s (HLS live) | Video Ingestion, AI Vision, Frontend |
| Face embedding storage | 512-D float32, pgvector HNSW index, cosine similarity | Database, AI Vision, Training |
| Event data retention | 90 days hot (MinIO), 1 year cold (Glacier), 7 days edge | Database, Architecture, Video Ingestion |
| Alert escalation | 6 levels: NONE -> LOW -> MEDIUM -> HIGH -> CRITICAL -> EMERGENCY | Suspicious Activity, Database, Frontend |
| Model versioning | Semantic MAJOR.MINOR.PATCH with MLflow registry | Training, AI Vision, Architecture |
| Graceful degradation | Video never stops; AI catch-up on recovery | Architecture, Video Ingestion, AI Vision |
| Security zones | 5 zones: Internet -> ALB -> Application -> Data -> Edge | Architecture, Security, Video Ingestion |
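The face embedding row above fixes the storage format at 512-D float32 with cosine similarity. The sketch below, using only the standard library, shows the resulting per-embedding size and the similarity computation; the helper names are assumptions. In the database itself, pgvector exposes cosine distance (1 minus cosine similarity) through its `<=>` operator.

```python
import math
from array import array

EMBEDDING_DIM = 512


def pack_embedding(values) -> bytes:
    """Serialize an embedding as float32 (4 bytes per dimension)."""
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(values)}")
    return array("f", values).tobytes()


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Each stored embedding therefore occupies 512 x 4 = 2048 bytes before any index overhead.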
Section 3: Assumptions
All assumptions made across the specialist designs are consolidated below. These should be validated before implementation begins.
3.1 Network and Hardware Assumptions
| ID | Assumption | Validation Method | Risk if Invalid |
|---|---|---|---|
| NW-01 | Edge gateway has dual Ethernet: one for local DVR subnet (192.168.29.0/24), one for internet/VPN | Physical site survey | Cannot bridge DVR to VPN |
| NW-02 | Site internet bandwidth >= 16 Mbps sustained upload for 8 channels | ISP speed test | Video drops, AI delays |
| NW-03 | WireGuard UDP port 51820 is not blocked by site firewall | Firewall rule check | VPN cannot establish |
| NW-04 | DVR RTSP server supports TCP interleaved transport (rtsp_transport tcp) | FFmpeg test probe | UDP fallback has packet loss |
| NW-05 | DVR supports 16+ concurrent RTSP sessions (8 channels x 2 streams) | Session stress test | Stream contention |
| NW-06 | MTU 1400 is viable through site NAT/firewall for WireGuard tunnel | Ping with DF bit test | Fragmentation issues |
| HW-01 | Intel NUC 13 Pro (i5-1340P, 16GB RAM, 2TB NVMe) is available for edge gateway | Hardware procurement | May need Jetson Orin alternative |
| HW-02 | Edge gateway has UPS backup for graceful shutdown on power loss | Electrical survey | Data corruption on hard power-off |
| HW-03 | AWS g4dn.xlarge (T4 GPU) instances are available in ap-south-1 | AWS EC2 capacity check | Need alternative GPU instance |
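Assumption NW-02 can be checked against an ISP speed test with simple arithmetic. The sketch below assumes roughly 2 Mbps per H.264 stream (the figure used in the scaling roadmap in Section 4.7); the `headroom` parameter is a hypothetical safety margin, not a value from the specialist documents.

```python
def required_upload_mbps(channels: int, per_stream_mbps: float = 2.0,
                         headroom: float = 0.0) -> float:
    """Sustained upload bandwidth needed for `channels` streams."""
    return channels * per_stream_mbps * (1.0 + headroom)


def link_is_sufficient(measured_mbps: float, channels: int,
                       per_stream_mbps: float = 2.0, headroom: float = 0.0) -> bool:
    """Compare a measured sustained upload rate against the requirement."""
    return measured_mbps >= required_upload_mbps(channels, per_stream_mbps, headroom)
```

With no headroom, 8 channels need 16 Mbps, which matches NW-02. A margin such as `headroom=0.25` raises this to 20 Mbps and leaves room for VPN overhead and alert traffic.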
3.2 DVR Capabilities Assumptions
| ID | Assumption | Validation Method | Risk if Invalid |
|---|---|---|---|
| DVR-01 | DVR RTSP streams are accessible at rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | FFmpeg connectivity test | Need alternative URL format |
| DVR-02 | DVR continues serving RTSP streams even with disk full (0 bytes free) | 24-hour stream stability test | Streams may stall |
| DVR-03 | DVR sub-stream (subtype=1) provides sufficient quality for AI inference (typically 352x288 to 704x576) | Frame quality inspection | May need main stream for AI |
| DVR-04 | DVR ONVIF server supports device discovery and stream URI retrieval | ONVIF Device Manager test | Manual camera configuration needed |
| DVR-05 | DVR channel numbering is 1-indexed (1-8) | ONVIF profile enumeration | Off-by-one errors in configuration |
| DVR-06 | DVR Digest authentication works with the provided credentials | RTSP DESCRIBE request test | May need Basic auth or different scheme |
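Assumptions DVR-01, DVR-05, and NW-04 can be exercised together with a small URL builder and an `ffprobe` invocation. This is a sketch under stated assumptions: the helper names are hypothetical, `subtype=0` is assumed to be the main stream (the table only confirms `subtype=1` as the sub-stream), and percent-encoding the credentials assumes the RTSP client decodes them before authenticating.

```python
from urllib.parse import quote

DVR_HOST = "192.168.29.200"
RTSP_PORT = 554
CHANNELS = range(1, 9)  # DVR-05: channels are 1-indexed (1-8)


def rtsp_url(channel: int, subtype: int, user: str, password: str) -> str:
    """Build the Dahua-style RTSP URL from DVR-01 for one channel."""
    if channel not in CHANNELS:
        raise ValueError(f"channel must be 1-8, got {channel}")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (assumed main) or 1 (sub-stream)")
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{auth}@{DVR_HOST}:{RTSP_PORT}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


def ffprobe_args(url: str) -> list:
    """Argument list for a TCP-interleaved connectivity probe (NW-04)."""
    return ["ffprobe", "-v", "error", "-rtsp_transport", "tcp", "-show_streams", url]
```

The argument list can be passed to `subprocess.run(ffprobe_args(url), timeout=15)` on the edge gateway; a non-zero exit code suggests that the URL scheme, credentials, or TCP transport assumption needs revisiting.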
3.3 Environmental Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| ENV-01 | Scene lighting is adequate for face recognition during night hours (minimum 10 lux at face distance) | Face recognition accuracy degrades; may need IR illumination |
| ENV-02 | Camera angles allow frontal face capture at entry/exit points (yaw < 45 degrees) | Face recognition miss rate increases |
| ENV-03 | Indoor industrial environment with minimal weather interference | False positive rate from rain/shadows increases |
| ENV-04 | Maximum person-to-camera distance is within 10 meters for face recognition | Faces may be too small (< 20px) for reliable detection |
| ENV-05 | Camera positions are stable (no PTZ movement during normal operation) | Zone calibration remains valid |
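One way to sanity-check ENV-04 before the site survey is a pinhole-camera estimate of how many pixels a face spans at a given distance. The horizontal field of view (90 degrees) and face width (0.16 m) below are assumed illustrative values, and the 960-pixel frame width assumes the main stream is used; actual lens parameters must come from the installed cameras.

```python
import math


def face_width_px(distance_m: float, image_width_px: int = 960,
                  hfov_deg: float = 90.0, face_width_m: float = 0.16) -> float:
    """Approximate on-image face width for a pinhole camera model."""
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg / 2.0))
    return image_width_px * face_width_m / scene_width_m


def max_distance_m(min_face_px: float = 20.0, image_width_px: int = 960,
                   hfov_deg: float = 90.0, face_width_m: float = 0.16) -> float:
    """Largest distance at which a face still spans min_face_px pixels."""
    return image_width_px * face_width_m / (
        2.0 * min_face_px * math.tan(math.radians(hfov_deg / 2.0)))
```

With these assumed values a face at 10 m spans roughly 8 px, well under the 20 px threshold mentioned in ENV-04, so the assumption likely depends on narrower lenses or closer camera placement at entry and exit points than the defaults used here.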
3.4 Operational Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| OPS-01 | Security operators will review unknown face clusters and provide identity labels daily | Unknown person database grows without enrichment |
| OPS-02 | Admin will review training suggestions at least weekly in "Suggested Learning" mode | Training queue backlog accumulates |
| OPS-03 | Site has authorized personnel who can access edge gateway for maintenance (SSH, physical) | Remote troubleshooting limited |
| OPS-04 | Alert fatigue is a genuine concern — false positive rate > 20% leads to ignored alerts | AI vibe controls and suppression tuned accordingly |
| OPS-05 | Incident video review requires 10-second pre-event and 30-second post-event clips | Clip configuration fixed |
3.5 Security Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| SEC-01 | WireGuard encryption (ChaCha20-Poly1305) meets organizational security requirements | May need additional encryption layer |
| SEC-02 | AWS VPC with private subnets satisfies data residency requirements for India | Compliance review needed |
| SEC-03 | Face embeddings (512-D vectors) do not constitute PII under applicable regulations | Legal review needed for biometric data handling |
| SEC-04 | Edge gateway physical security is equivalent to server room security | Tampering risk if edge is physically accessible |
| SEC-05 | DVR credentials can be stored encrypted (AES-256) in cloud database | Key management infrastructure required |
3.6 AI Performance Assumptions
| ID | Assumption | Impact if Invalid |
|---|---|---|
| AI-01 | YOLO11m TensorRT FP16 achieves > 75% person AP@50 on surveillance footage | May need fine-tuning on site-specific data |
| AI-02 | ArcFace R100 achieves > 98% Rank-1 accuracy on enrolled persons with 5+ reference images | Enrollment quality gates ensure minimum samples |
| AI-03 | HDBSCAN achieves > 89% cluster purity on 512-D face embeddings from this camera setup | Fallback to DBSCAN if density varies too much |
| AI-04 | ByteTrack maintains < 2 ID switches per 100 frames in industrial environment with occlusion | May need BoT-SORT upgrade for complex scenes |
| AI-05 | GPU (T4) can sustain 15-20 FPS processing per stream across 8 streams with batching | CPU fallback at 5-8 FPS if GPU unavailable |
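AI-05 implies an aggregate throughput target for the GPU. The sketch below turns the per-stream frame rate into a per-batch latency budget, assuming a batch size of 8 (the batch size mentioned in the scaling roadmap); the function name is hypothetical.

```python
def inference_budget(streams: int, fps_per_stream: float, batch_size: int) -> dict:
    """Aggregate frame rate and the time available to process each batch."""
    frames_per_s = streams * fps_per_stream
    batches_per_s = frames_per_s / batch_size
    return {
        "frames_per_s": frames_per_s,
        "batches_per_s": batches_per_s,
        "ms_per_batch": 1000.0 / batches_per_s,
    }
```

For 8 streams at 15-20 FPS this gives 120-160 frames per second in total, so each batch of 8 must complete the full pipeline in roughly 50-67 ms for the GPU to keep up. Benchmarking the T4 against that budget is one way to validate AI-05.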
Section 4: Full Architecture
4.1 High-Level System Architecture
The platform employs a cloud+edge hybrid architecture with five network security zones. Video streams are ingested at the edge, processed by AI in the cloud, and presented through a web-based dashboard. A WireGuard VPN tunnel provides encrypted, zero-exposure connectivity between edge and cloud.
+=============================================================================+
| CLOUD+EDGE+VPN ARCHITECTURE |
+=============================================================================+
| |
| ZONE 0: INTERNET (UNTRUSTED) |
| +---------------------+ |
| | Users / Browsers | |
| | HTTPS :443 | |
| +----------+----------+ |
| | |
| v |
| ZONE 1: AWS VPC EDGE (DEMILITARIZED) |
| +--------------------------------------------------------------+ |
| | AWS ALB (:443) + WAF v2 + Rate Limit + Geo-Restriction | |
| | | | |
| | v | |
| | Traefik Ingress Controller (:8443) | |
| | - Route: /api/* -> Backend Service | |
| | - Route: /ws/* -> WebSocket Handler | |
| | - Route: / -> Next.js Web App | |
| | - TLS: Let's Encrypt auto certificates | |
| +--------------------------------------------------------------+ |
| | |
| v |
| ZONE 2: AWS VPC APPLICATION (TRUSTED) |
| +--------------------------------------------------------------+ |
| | +-------------+ +-------------+ +---------------------+ | |
| | | Stream | | AI Inference| | Suspicious Activity | | |
| | | Ingestion | | Service | | Service (Night Mode)| | |
| | | (Go/FFmpeg) | | (Triton) | | (Go/Python) | | |
| | | :8081 | | :8001 gRPC | | :8083 | | |
| | +-------------+ +-------------+ +---------------------+ | |
| | +-------------+ +-------------+ +---------------------+ | |
| | | Backend API | | Training | | Notification | | |
| | | (Go/Gin) | | Service | | Service | | |
| | | :8080 | | (PyTorch) | | (Go) | | |
| | +-------------+ +-------------+ +---------------------+ | |
| | +--------------------+ | |
| | | Web Frontend | HLS Playback Service | |
| | | (Next.js 14 :3000) | (Go :8085) | |
| | +--------------------+ | |
| +--------------------------------------------------------------+ |
| | |
| v |
| ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED) |
| +--------------------------------------------------------------+ |
| | +-------------+ +-------------+ +-------------+ | |
| | | PostgreSQL | | Redis | | Kafka | | |
| | | 16 (RDS) | | 7 Cluster | | (MSK) | | |
| | | :5432 | | :6379 | | :9092 | | |
| | | pgvector | | Pub/Sub | | 3 brokers | | |
| | | HNSW index | | Streams | | 3 AZs | | |
| | +-------------+ +-------------+ +-------------+ | |
| | +-------------+ +-----------------------------------+ | |
| | | MinIO | | S3 (Cold Archive) | | |
| | | (S3-compat) | | - Standard (30d) | | |
| | | :9000 | | - IA (31-90d) | | |
| | | 10 TB | | - Glacier Deep Archive (90d+) | | |
| | +-------------+ +-----------------------------------+ | |
| +--------------------------------------------------------------+ |
| | |
| | WireGuard VPN Tunnel (UDP 51820) |
| | ChaCha20-Poly1305 encryption |
| | Cloud peer: 10.200.0.1/32 <-> Edge peer: 10.200.0.2/32 |
| v |
| ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED) |
| +--------------------------------------------------------------+ |
| | +--------------------------------------------------------+ | |
| | | EDGE GATEWAY (Intel NUC) | | |
| | | Ubuntu 22.04 LTS | K3s v1.28+ | 2TB NVMe | | |
| | | | | |
| | | +-----------------+ +-----------------+ | | |
| | | | Stream Manager | | HLS Segmenter | | | |
| | | | (Python/asyncio)| | (FFmpeg/nginx) | | | |
| | | | 8x RTSP feeds | | 2s segments | | | |
| | | +-----------------+ +-----------------+ | | |
| | | +-----------------+ +-----------------+ | | |
| | | | Frame Extractor | | Buffer Manager | | | |
| | | | (AI decimation) | | (20GB ring buf) | | | |
| | | +-----------------+ +-----------------+ | | |
| | | +--------------------------------------------------+ | | |
| | | | VPN Client (WireGuard) | Health Monitor | | | |
| | | +--------------------------------------------------+ | | |
| | +--------------------------------------------------------+ | |
| | | |
| | Local Network (192.168.29.0/24) |
| | +------------------+ +------------------+ |
| | | CP PLUS DVR | | Local Monitor | |
| | | 192.168.29.200 | | 192.168.29.10 | |
| | | 8ch | RTSP :554 | | (optional) | |
| | +------------------+ +------------------+ |
| | CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 |
| +--------------------------------------------------------------+ |
| |
+=============================================================================+
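The tunnel parameters in the diagram (edge peer 10.200.0.2/32, cloud peer 10.200.0.1/32, UDP 51820) together with the MTU of 1400 from NW-06 can be rendered into a wg-quick style configuration for the edge gateway. This is a sketch only: the function name is hypothetical, the `AllowedIPs` value and the 25-second keepalive are assumptions, and real keys must come from the secret store rather than source code.

```python
def render_edge_wg_conf(private_key: str, cloud_public_key: str, endpoint: str) -> str:
    """Render a wg-quick configuration for the edge side of the tunnel."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        "Address = 10.200.0.2/32",      # edge peer address from the diagram
        "MTU = 1400",                   # NW-06
        "",
        "[Peer]",
        f"PublicKey = {cloud_public_key}",
        f"Endpoint = {endpoint}",       # cloud VPN endpoint, UDP 51820
        "AllowedIPs = 10.200.0.1/32",   # assumption: route only the cloud peer
        "PersistentKeepalive = 25",     # assumption: keep NAT mappings alive
    ]
    return "\n".join(lines) + "\n"
```

A call such as `render_edge_wg_conf(edge_key, cloud_pub, "vpn.example.com:51820")` (with a placeholder hostname) produces text suitable for `/etc/wireguard/wg0.conf`. If cloud services are addressed by their VPC IPs rather than through the tunnel peer, `AllowedIPs` would need to include the relevant VPC range as well.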
4.2 Service Interaction Diagram
+-----------------------------------------------------------------------------+
| SERVICE INTERACTIONS |
+-----------------------------------------------------------------------------+
| |
| INTERNET USERS |
| | |
| | HTTPS :443 |
| v |
| +---------+ +----------+ +----------+ |
| | AWS ALB |----->| Traefik |----->| Next.js | Web Frontend |
| | +WAF | | Ingress | | (SSR) | Dashboard |
| +---------+ +----------+ +----+-----+ |
| | |
| +--------------------+--------------------+ |
| | | | |
| v v v |
| +---------+ +------------+ +----------+ |
| |Backend | | WebSocket | | HLS | |
| |API (Go) | | Handler | | Playback | |
| |:8080 | | /ws/alerts | | Service | |
| +----+----+ +------------+ +----+-----+ |
| | |
| | gRPC :50051 |
| v |
| +---------+ +------------+ +----------+ +----------+ |
| | Stream | | AI | |Suspicious| |Training | |
| |Ingestion|<-->| Inference |<-->| Activity | |Service | |
| |(Go) | |(Triton) | |(Night) | |(PyTorch) | |
| +----+----+ +------+-----+ +----+-----+ +----+-----+ |
| | | | | |
| v v v v |
| +---------------------------------------------------------------+ |
| | KAFKA (MSK) | |
| | streams.raw (8 parts) ai.detections (16 parts) | |
| | alerts.critical (4 parts) training.data (30-day ret.) | |
| | notifications.* system.metrics (7-day ret.) | |
| +---------------------------------------------------------------+ |
| | | | | |
| v v v v |
| +---------+ +------------+ +----------+ +----------+ |
| |PostgreSQL| | Redis | | MinIO | | MLflow | |
| |16 +pgvec | |7 Cluster | |S3-compat | | Model | |
| |:5432 | |:6379 | |:9000 | | Registry | |
| +---------+ +------------+ +----------+ +----------+ |
| |
| Edge Gateway: WireGuard peer at 10.200.0.2/32 |
| Stream Ingestion pulls frames via VPN -> sends to Kafka |
| |
+-----------------------------------------------------------------------------+
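Kafka preserves ordering only within a partition, so frames and detections from one camera should consistently land on the same partition. The sketch below uses the partition counts from the diagram; the camera-to-partition mapping, the zero-based `site_id`, and the function name are assumptions rather than the documented producer logic.

```python
# Partition counts taken from the service interaction diagram.
TOPIC_PARTITIONS = {"streams.raw": 8, "ai.detections": 16, "alerts.critical": 4}
CHANNELS_PER_DVR = 8


def partition_for(topic: str, site_id: int, channel: int) -> int:
    """Map a camera to a fixed partition so per-camera ordering is preserved.

    site_id is zero-based (hypothetical); channel is 1-indexed as on the DVR.
    """
    if not 1 <= channel <= CHANNELS_PER_DVR:
        raise ValueError(f"channel must be 1-{CHANNELS_PER_DVR}, got {channel}")
    camera_index = site_id * CHANNELS_PER_DVR + (channel - 1)
    return camera_index % TOPIC_PARTITIONS[topic]
```

With 8 cameras and 8 partitions on `streams.raw`, each camera gets its own partition. On `alerts.critical`, which has 4 partitions, two cameras share each partition, which still keeps every camera's alerts in order.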
4.3 Network Security Zones
Five security zones provide defense in depth, from the public internet to the physically isolated edge network.
+=============================================================================+
| NETWORK SECURITY ZONES |
+=============================================================================+
| |
| +---------------------------------------------------------------------+ |
| | ZONE 0: INTERNET (UNTRUSTED) | |
| | - Public users, any source IP | |
| | - AWS Shield Standard DDoS protection | |
| | - Geo-restriction: allow specific countries only | |
| +---------------------------+-----------------------------------------+ |
| | |
| | HTTPS :443 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 1: AWS VPC EDGE (DEMILITARIZED) | |
| | - ALB + WAF v2 (SQL injection, XSS, rate limiting rules) | |
| | - Traefik Ingress (:8443) | |
| | - Auth: JWT + RBAC, API keys for edge gateway | |
| | - Public API endpoints ONLY | |
| | SG: alb-public-sg: 443 from 0.0.0.0/0 | |
| | SG: traefik-sg: 8443 from alb-public-sg ONLY | |
| +---------------------------+-----------------------------------------+ |
| | |
| | Internal :8080-8090 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 2: AWS VPC APPLICATION (TRUSTED, ISOLATED) | |
| | - Stream Ingestion, AI Inference, Suspicious Activity | |
| | - Training, Backend API, Notification Services | |
| | - Pod Security: No root, read-only FS, no privilege escalation | |
| | - Network Policies: Ingress only from API GW namespace | |
| | SG: app-sg: 8080-8090 from traefik-sg ONLY | |
| +---------------------------+-----------------------------------------+ |
| | |
| | Data Layer :5432, :6379, :9092, :9000 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 3: AWS VPC DATA (HIGHLY RESTRICTED) | |
| | - PostgreSQL (RDS), Redis (ElastiCache), Kafka (MSK) | |
| | - MinIO object storage, S3 cold archive | |
| | - Security Groups: ONLY from app-sg | |
| | - RDS: Encrypted at rest (AWS KMS), no public access | |
| | - S3: Bucket policy deny all except VPC endpoint | |
| +---------------------------+-----------------------------------------+ |
| | |
| | WireGuard VPN (UDP 51820) |
| | ChaCha20-Poly1305 |
| v |
| +---------------------------------------------------------------------+ |
| | ZONE 4: EDGE NETWORK (PHYSICALLY ISOLATED) | |
| | - Edge Gateway (Intel NUC), K3s node | |
| | - WireGuard peer, stream ingestion, local buffer | |
| | - DVR (192.168.29.200): NO internet access, local ONLY | |
| | - Edge Firewall: ALLOW 192.168.29.0/24 -> DVR :554,:80 | |
| | ALLOW OUT 51820/udp -> Cloud VPN endpoint | |
| | DENY ALL other incoming | |
| +---------------------------------------------------------------------+ |
| |
+=============================================================================+
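The zone-to-zone rules in the diagram amount to a default-deny allowlist, which can be modeled and unit-tested before being written as infrastructure code. In this sketch the group names follow the diagram except `data-sg`, which is a hypothetical name for the Zone 3 security group; the rule table is an illustration, not the deployed configuration.

```python
# (destination group, allowed source, (low port, high port))
RULES = [
    ("alb-public-sg", "0.0.0.0/0", (443, 443)),
    ("traefik-sg", "alb-public-sg", (8443, 8443)),
    ("app-sg", "traefik-sg", (8080, 8090)),
    ("data-sg", "app-sg", (5432, 5432)),  # PostgreSQL
    ("data-sg", "app-sg", (6379, 6379)),  # Redis
    ("data-sg", "app-sg", (9092, 9092)),  # Kafka
    ("data-sg", "app-sg", (9000, 9000)),  # MinIO
]


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit rule matches."""
    return any(
        dst == destination and src == source and low <= port <= high
        for dst, src, (low, high) in RULES
    )
```

Checks such as `is_allowed("traefik-sg", "data-sg", 5432)` returning `False` express the intent that the ingress tier can never reach the data tier directly.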
4.4 Service Descriptions
| # | Service | Purpose | Technology | Port | Replicas |
|---|---|---|---|---|---|
| 1 | Edge Gateway Agent | RTSP stream pull, local recording, VPN endpoint, heartbeat | Go 1.21, systemd + K3s | 8080, 51820 | 1 (per site) |
| 2 | Stream Ingestion | Receive frames from edge, decode, produce to Kafka, store segments | Go 1.21, FFmpeg | 8081 | 3-20 (HPA) |
| 3 | AI Inference | GPU-accelerated detection, face recognition, embedding | Triton 2.40, TensorRT | 8000, 8001, 8002 | 1-4 (GPU HPA) |
| 4 | Suspicious Activity | Night-mode analysis, 10 detection modules, scoring engine | Python 3.11, OpenCV | 8083 | 2-8 (HPA) |
| 5 | Training Service | Model retraining, fine-tuning, A/B validation | PyTorch 2.1, CUDA 12.1 | 8084 | 0-1 (GPU spot) |
| 6 | Backend API | REST API, authentication, business logic | Go 1.21, Gin | 8080 | 3-10 (HPA) |
| 7 | Web Frontend | Dashboard, live view, timeline, analytics | Next.js 14, React 18 | 3000 | 3 (CDN) |
| 8 | Notification | Multi-channel alert dispatch (Telegram, WhatsApp, Email) | Go 1.21 | 8086 | 2-5 (HPA) |
| 9 | HLS Playback | HLS segment serving for dashboard live view | Go 1.21 | 8085 | 2-4 (HPA) |
| 10 | PostgreSQL | Primary database with pgvector for embeddings | PostgreSQL 16 (RDS) | 5432 | 1 (Multi-AZ) |
| 11 | Redis | Session store, cache, pub/sub, stream tracking | Redis 7 (ElastiCache) | 6379 | 2 shards x 2 replicas |
| 12 | Kafka | Event bus, durable log, stream replay | Apache Kafka (MSK) | 9092 | 3 brokers x 3 AZs |
| 13 | MinIO | Object storage for video, snapshots, model artifacts | MinIO (S3-compatible) | 9000, 9001 | Edge: 1, Cloud: 4 |
4.5 Physical Edge Gateway Specification
| Component | Specification |
|---|---|
| Hardware | Intel NUC 13 Pro, Core i5-1340P (12 cores, 16 threads) |
| Alternative | NVIDIA Jetson Orin NX 16GB (for on-edge AI inference) |
| RAM | 16GB DDR4-3200 (32GB recommended for 16+ channels) |
| Storage | 2TB NVMe SSD (7-day circular buffer for all 8 streams) |
| LAN | Intel i226-V 2.5GbE (local DVR subnet) |
| WAN | Second Ethernet or WiFi (internet for VPN) |
| OS | Ubuntu 22.04.4 LTS Server (no GUI) |
| Container Runtime | Docker CE 25.x + Docker Compose 2.x |
| K8s Distribution | K3s v1.28+ (lightweight, single-node or 2-node HA) |
| Power | UPS-backed, auto-restart on power loss (BIOS setting) |
| Network | Dual interface: eth0 for local DVR, eth1 for internet/VPN |
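The 2TB storage figure can be checked against the 7-day retention target with a short calculation. The sketch assumes an average of 2 Mbps per stream (the figure used in the scaling roadmap) and decimal units; actual bitrates depend on the DVR encoder settings and scene activity.

```python
def ring_buffer_bytes(channels: int = 8, mbps_per_stream: float = 2.0,
                      days: float = 7.0) -> float:
    """Bytes needed to retain `days` of continuous video for all channels."""
    bytes_per_s = channels * mbps_per_stream * 1_000_000 / 8  # megabits -> bytes
    return bytes_per_s * days * 86_400                        # seconds per day
```

Under these assumptions, 8 streams for 7 days need about 1.21 TB, which fits on the 2TB NVMe with room left for HLS segments, the operating system, and container images.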
4.6 Cloud Infrastructure Specification
| Component | Specification |
|---|---|
| Region | Primary: ap-south-1 (Mumbai), DR: ap-southeast-1 (Singapore) |
| VPC | 10.100.0.0/16, 3 AZs, private subnets only for workloads |
| EKS | Managed node groups: on-demand for API, spot for batch/GPU |
| GPU Nodes | g4dn.xlarge (NVIDIA T4) for Triton inference, 1-4 auto-scaled |
| ALB | Internet-facing, WAF v2 attached, Shield Advanced optional |
| RDS | PostgreSQL 16, db.r6g.xlarge, Multi-AZ, encrypted at rest |
| ElastiCache | Redis 7, cluster mode enabled, 2 shards x 2 replicas |
| MSK (Kafka) | 3 broker nodes, kafka.m5.large, 3 AZs |
| S3 | Standard (hot 30d), IA (31-90d), Glacier Deep Archive (90d+) |
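The S3 tiering row maps object age to a storage class. A minimal sketch of that mapping follows; it uses the S3 storage class identifiers `STANDARD`, `STANDARD_IA`, and `DEEP_ARCHIVE`, while the function name and the exact treatment of day 90 are assumptions. In production this would normally be expressed as an S3 lifecycle policy rather than application code.

```python
def storage_tier(age_days: int) -> str:
    """Return the S3 storage class for an object of the given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 30:
        return "STANDARD"      # hot, first 30 days
    if age_days <= 90:
        return "STANDARD_IA"   # infrequent access, days 31-90
    return "DEEP_ARCHIVE"      # Glacier Deep Archive beyond 90 days
```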
4.7 Scaling Approach
The system scales from the initial 8-camera deployment to 64+ cameras through well-defined phases:
+-----------------------------------------------------------------------------+
| CAMERA SCALING ROADMAP |
+-----------------------------------------------------------------------------+
| |
| CURRENT: 8 cameras (1 DVR) |
| +-- Edge: Intel NUC 13 Pro (i5-1340P), 16GB RAM |
| +-- Bandwidth: ~16 Mbps upstream (2 Mbps per H.264 stream) |
| +-- Cloud AI: 1x T4 GPU (8 streams @ 1 fps, batch=8) |
| +-- Kafka: 8 partitions (streams.raw) |
| +-- PostgreSQL: db.r6g.xlarge |
| +-- Monthly cost: ~$2,140 |
| |
| PHASE 1: 16 cameras (2 DVRs / 2 sites) |
| +-- Edge: 2x Intel NUC (one per site) |
| +-- Bandwidth: ~32 Mbps |
| +-- Cloud AI: 1x T4 GPU (batch=16, still sufficient) |
| +-- Kafka: 16 partitions |
| +-- Monthly cost: ~$3,200 |
| |
| PHASE 2: 32 cameras (4 DVRs / 4 sites) |
| +-- Edge: 4x Intel NUC |
| +-- VPN: Hub-spoke model (4 edge peers -> 1 cloud endpoint) |
| +-- Bandwidth: ~64 Mbps |
| +-- Cloud AI: 2x T4 GPUs (HPA: 2-6 replicas) |
| +-- Kafka: 32 partitions |
| +-- PostgreSQL: db.r6g.2xlarge |
| +-- Monthly cost: ~$5,500 |
| |
| PHASE 3: 64 cameras (8 DVRs / 8 sites) |
| +-- Edge: 8x Intel NUC (or Jetson Orin for edge AI pre-filter) |
| +-- Bandwidth: ~128 Mbps (dedicated circuit recommended) |
| +-- Cloud AI: 4x T4 GPUs or 2x A10G (g5.2xlarge) |
| +-- Kafka: 64 partitions, consider MSK multi-cluster |
| +-- PostgreSQL: db.r6g.4xlarge + read replica |
| +-- Monthly cost: ~$9,800 |
| |
+-----------------------------------------------------------------------------+
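The roadmap figures follow a simple pattern: bandwidth and Kafka partitions grow linearly with camera count, while monthly cost grows more slowly. The sketch below encodes the roadmap numbers to make the per-camera cost trend explicit; the function name is hypothetical and the costs are the estimates quoted in the roadmap, not measured values.

```python
# Estimated monthly cost in USD per camera count, from the scaling roadmap.
MONTHLY_COST_USD = {8: 2140, 16: 3200, 32: 5500, 64: 9800}
MBPS_PER_STREAM = 2  # per H.264 stream, from the roadmap


def phase_summary(cameras: int) -> dict:
    """Derived figures for one roadmap phase."""
    cost = MONTHLY_COST_USD[cameras]
    return {
        "bandwidth_mbps": cameras * MBPS_PER_STREAM,
        "kafka_partitions": cameras,  # one streams.raw partition per camera
        "cost_per_camera_usd": cost / cameras,
    }
```

On these estimates the cost per camera falls from about $268 at 8 cameras to about $153 at 64 cameras, reflecting shared fixed costs such as the database, Kafka cluster, and load balancer.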
4.8 Failover and Reliability Design
The graceful degradation matrix defines behavior for every failure mode:
+=============================================================================+
| GRACEFUL DEGRADATION MATRIX |
+=============================================================================+
| |
| Failure Mode | Degradation Strategy |
| ------------------------- | ----------------------------------------------- |
| AI Inference Service DOWN | Continue recording ALL video locally |
| (GPU failure, model crash)| Events stored as "unprocessed" |
| | No real-time alerts |
| | Queue frames for later batch processing |
| | Dashboard shows "AI OFFLINE" banner |
| |
| Kafka DOWN (MSK outage) | Edge Gateway buffers locally (20GB ring buffer) |
| | Backpressure: reduce to key frames only (0.2fps)|
| | Auto-reconnect with 2x exponential backoff |
| | Replay from local buffer when Kafka recovers |
| |
| VPN Tunnel DOWN | Full local operation mode |
| (internet outage) | All recording continues locally (7-day buffer) |
| | Local alert buzzer/relay (configurable) |
| | No cloud dashboard access |
| | Auto-sync when VPN recovers |
| |
| PostgreSQL DOWN (RDS) | Alert queue builds in Kafka (durable log) |
| | Events not lost (Kafka 7-day retention) |
| | Read-only dashboard mode (Redis cache) |
| | Alert on-call engineer |
| |
| Notification Service DOWN | Alerts accumulate in DB |
| | Retry with exponential backoff |
| | Dead letter after 24 hours |
| | Dashboard shows pending count |
| |
| Edge Gateway DOWN (power) | Cloud dashboard shows "SITE OFFLINE" |
| | Last known recordings in cloud |
| | Alert sent immediately |
| | UPS: graceful shutdown, preserve data |
| |
+=============================================================================+
Priority Order (highest first):
- Video recording NEVER STOPS (local edge priority)
- Critical alerts ALWAYS FIRE (local buzzer + queued cloud alerts)
- AI inference gracefully degrades to batch catch-up on recovery
- Dashboard operates in read-only/cache mode during DB outage
- Cloud sync resumes automatically when connectivity restored
Reliability Mechanisms:
| Mechanism | Implementation | Target |
|---|---|---|
| Stream Reconnect | Exponential backoff: 1s -> 2s -> 4s -> 8s -> max 30s | < 60s recovery |
| Circuit Breaker | 5 failures -> OPEN (60s) -> HALF_OPEN (3 test calls) -> CLOSED | Prevent cascade failures |
| VPN Watchdog | Ping every 30s, restart WireGuard on 3 consecutive failures | < 90s VPN recovery |
| Kafka Producer | acks=all, retries=10, enable.idempotence=true, LZ4 compression | Zero message loss |
| Kafka Consumer | Manual offset commit AFTER DB write success | At-least-once delivery; idempotent DB writes make replays safe |
| Health Checks | 5-layer: K8s probes -> Service metrics -> Dependency checks -> E2E synthetic -> Edge heartbeat | < 2 min detection |
| Auto-scaling | GPU util > 80% for 2 min -> scale out; Kafka lag > 1000 for 5 min -> scale out | Proactive capacity |
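The circuit breaker and reconnect rows can be read as a small state machine plus a capped doubling schedule. The sketch below is one possible reading, not the platform's implementation: the class and method names are hypothetical, and it assumes that any failure during HALF_OPEN re-opens the breaker, which the table does not specify.

```python
import time


class CircuitBreaker:
    """CLOSED -> OPEN after N failures; OPEN -> HALF_OPEN after a cool-down;
    HALF_OPEN -> CLOSED after M successful test calls."""

    def __init__(self, failure_threshold=5, open_seconds=60.0,
                 half_open_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.half_open_successes = half_open_successes
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self.open_seconds:
                self.state = "HALF_OPEN"
                self._successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.half_open_successes:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # assumption: a failed test call re-opens the breaker
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self._opened_at = self._clock()


def backoff_delays(base=1.0, cap=30.0, attempts=7):
    """Stream-reconnect delays in seconds: 1, 2, 4, 8, ... capped at 30."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Injecting the clock keeps the breaker testable without real 60-second waits.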
Section 5: Data Flow from DVR to Cloud to Dashboard
This section traces the complete data journey from camera capture through AI processing to user presentation.
5.1 Overview: Seven Data Flows
+=============================================================================+
| SEVEN DATA FLOW PATHWAYS |
+=============================================================================+
| |
| Flow 1: Camera --> DVR --> Edge Gateway |
| [Analog/Digital] -> [H.264 Encode] -> [RTSP Server] |
| |
| Flow 2: Edge Gateway --> VPN --> Cloud Kafka |
| [FFmpeg ingest] -> [Frame extract] -> [Kafka Producer] |
| |
| Flow 3: Stream Ingestion --> AI Inference |
| [Kafka Consumer] -> [GPU Batch] -> [Detection + Face Recog.] |
| |
| Flow 4: AI Inference --> Events --> Database |
| [Detection results] -> [Event enrich] -> [PostgreSQL] |
| |
| Flow 5: Events --> Alerts --> Notifications |
| [Scoring engine] -> [Alert create] -> [Multi-channel send] |
| |
| Flow 6: Live Streams --> Browser Dashboard |
| [HLS segmenter] -> [Nginx relay] -> [HLS.js player] |
| |
| Flow 7: Training Feedback Loop |
| [Operator review] -> [Conflict detect] -> [Model update] |
| |
+=============================================================================+
5.2 Flow 1: Camera to DVR to Edge Gateway
Path: Analog/Digital Camera -> DVR internal encoder -> DVR RTSP server -> Edge Gateway FFmpeg client
Protocol Stack:
| Layer | Technology | Details |
|---|---|---|
| Camera Interface | Analog BNC / CVBS / AHD | CP PLUS DVR supports multiple analog standards |
| DVR Encoding | H.264 High Profile | Hardware encoder, real-time, low latency |
| DVR Storage | Internal HDD (currently FULL) | 0 bytes free — no local recording possible |
| Network Transport | RTSP over TCP (interleaved) | Mandatory for reliable NAT/VPN traversal |
| URL Pattern | rtsp://admin:{PASS}@192.168.29.200:554/cam/realmonitor?channel=N&subtype=M | N=1-8, M=0(main)/1(sub) |
| Client | FFmpeg 6.0+ | -rtsp_transport tcp -timeout 5000000 (microseconds; replaces -stimeout, which FFmpeg 5.0 removed) |
| Frame Rate | 25 FPS (PAL) or 30 FPS (NTSC) | Configurable per channel |
| Resolution (main) | 960 x 1080 (per channel) | Full resolution |
| Resolution (sub) | 352 x 288 to 704 x 576 | Lower bandwidth for AI |
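The URL pattern in the table can be generated per channel rather than hand-written. A minimal sketch, assuming the host and port from the document header; the function name `rtsp_url` is illustrative, and percent-encoding the credentials follows general URI practice, so it should be checked against how the DVR and RTSP client actually parse them:

```python
from urllib.parse import quote

DVR_HOST = "192.168.29.200"
RTSP_PORT = 554


def rtsp_url(channel: int, subtype: int, user: str, password: str) -> str:
    """Build the DVR stream URL for channel 1-8, subtype 0 (main) or 1 (sub)."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # Assumption: reserved characters in credentials are percent-encoded.
    cred = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return (f"rtsp://{cred}@{DVR_HOST}:{RTSP_PORT}"
            f"/cam/realmonitor?channel={channel}&subtype={subtype}")


if __name__ == "__main__":
    for ch in range(1, 9):
        print(rtsp_url(ch, 1, "admin", "example-password"))
```

In practice the password would come from the secrets store rather than a literal.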
FFmpeg RTSP Connection Command:
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp \
-timeout 5000000 \
-fflags +genpts+discardcorrupt+igndts+ignidx \
-reorder_queue_size 64 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-c copy -f segment -segment_time 60 -reset_timestamps 1 \
-strftime 1 "/data/buffer/ch1/%Y%m%d_%H%M%S.mkv"
Latency Budget:
| Stage | Latency |
|---|---|
| Camera -> DVR (analog) | ~1-5 ms |
| DVR encoding | ~50-100 ms |
| RTSP over LAN | ~1-2 ms |
| Total (camera to edge gateway) | ~52-107 ms |
5.3 Flow 2: Edge Gateway to VPN Tunnel to Cloud
Path: Edge Gateway FFmpeg -> Frame extraction -> JPEG encoding -> Kafka Producer -> WireGuard VPN -> Cloud MSK
Frame Processing Pipeline:
+------------+    +-------------+    +---------------+    +-------------+    +-----------+
| Raw RTSP   | -> | FFmpeg      | -> | Frame         | -> | JPEG        | -> | Kafka     |
| H.264      |    | Demux/Decode|    | Decimation    |    | Encoder     |    | Producer  |
| 25 FPS     |    |             |    | (1 fps)       |    | Quality 85  |    | (LZ4)     |
| 960x1080   |    |             |    | 640x640 pad   |    |             |    |           |
+------------+    +-------------+    +---------------+    +-------------+    +-----------+
FFmpeg Frame Extraction for AI:
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp -timeout 5000000 \
-fflags +genpts+discardcorrupt -reorder_queue_size 64 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-vf "fps=1,scale=640:640:force_original_aspect_ratio=decrease,pad=640:640:(ow-iw)/2:(oh-ih)/2:black" \
-q:v 5 -f image2pipe -vcodec mjpeg pipe:1
WireGuard VPN Tunnel Configuration:
| Parameter | Value |
|---|---|
| Protocol | UDP 51820 |
| Encryption | ChaCha20-Poly1305 |
| Key Exchange | Curve25519 (ECDH) |
| Preshared Key | Enabled per-peer |
| Keepalive | 25 seconds |
| MTU | 1400 (to account for WireGuard + IP headers) |
| Cloud Endpoint | 10.200.0.1/32 (EC2 bastion; an ALB cannot terminate WireGuard because it does not carry UDP) |
| Edge Endpoint | 10.200.0.2/32 |
| Route | 10.200.0.0/16 (AWS VPC) accessible from edge |
VPN watchdog script runs every 30 seconds; restarts WireGuard on 3 consecutive ping failures.
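The watchdog rule reduces to a counter of consecutive failed pings. The sketch below shows that logic with the ping and restart actions injected as callables; the class name `VpnWatchdog` is hypothetical, and the real script may differ in how it probes the tunnel and restarts the interface.

```python
from typing import Callable


class VpnWatchdog:
    """Restart the tunnel after `max_failures` consecutive failed pings."""

    def __init__(self, ping: Callable[[], bool], restart: Callable[[], None],
                 max_failures: int = 3):
        self._ping = ping
        self._restart = restart
        self._max_failures = max_failures
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one check (scheduled every 30 s); return True if a restart fired."""
        if self._ping():
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self._max_failures:
            self._restart()
            self.consecutive_failures = 0
            return True
        return False
```

A single successful ping resets the counter, so intermittent packet loss does not trigger a restart.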
Latency Budget:
| Stage | Latency |
|---|---|
| Frame extraction (FFmpeg) | ~50-100 ms |
| JPEG encoding | ~5-10 ms |
| Kafka produce (local) | ~1-2 ms |
| WireGuard tunnel | ~5-15 ms (Mumbai -> India site) |
| MSK broker | ~1-2 ms |
| Total (edge to cloud Kafka) | ~62-129 ms |
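Each latency budget in this section is the sum of the lower and upper bounds of its stages. A small helper, using the Flow 2 rows above as input (the names `FLOW2_STAGES` and `total_budget` are illustrative), makes the totals easy to re-check whenever a stage estimate changes:

```python
# (stage, low_ms, high_ms) taken from the Flow 2 latency table
FLOW2_STAGES = [
    ("Frame extraction (FFmpeg)", 50, 100),
    ("JPEG encoding", 5, 10),
    ("Kafka produce (local)", 1, 2),
    ("WireGuard tunnel", 5, 15),
    ("MSK broker", 1, 2),
]


def total_budget(stages):
    """Return (low_ms, high_ms) summed over all stages."""
    low = sum(lo for _, lo, _ in stages)
    high = sum(hi for _, _, hi in stages)
    return low, high


if __name__ == "__main__":
    low, high = total_budget(FLOW2_STAGES)
    print(f"Total (edge to cloud Kafka): ~{low}-{high} ms")
```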
5.4 Flow 3: Stream Ingestion to AI Inference
Path: Kafka streams.raw topic -> Stream Ingestion consumer -> Triton Inference Server -> Kafka ai.detections topic
Pipeline Architecture:
+------------+    +-------------------+    +------------------+    +---------------+
| streams.raw| -> | Stream Ingestion  | -> | NVIDIA Triton    | -> | ai.detections |
| (8 parts)  |    | (Go consumer)     |    | (GPU inference)  |    | (16 parts)    |
| JPEG frames|    | Batch aggregator  |    | gRPC :8001       |    | Detection     |
| + metadata |    | (batch=8, timeout)|    | Dynamic batching |    | + embeddings  |
+------------+    +-------------------+    +------------------+    +---------------+
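The batch aggregator in the pipeline flushes when either the batch is full or the oldest frame has waited too long. The production consumer is described as a Go service; the sketch below uses Python only to illustrate the size-or-timeout rule. The class name and the 50 ms timeout are assumptions, since the blueprint gives batch=8 but no timeout value.

```python
import time


class BatchAggregator:
    """Collect frames and release them as a batch on size or timeout."""

    def __init__(self, batch_size=8, timeout_s=0.05, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_s = timeout_s  # assumed value, not from the blueprint
        self._clock = clock
        self._items = []
        self._first_at = None

    def add(self, frame):
        """Add a frame; return a full batch if one is ready, else None."""
        if not self._items:
            self._first_at = self._clock()
        self._items.append(frame)
        if len(self._items) >= self.batch_size:
            return self._flush()
        return None

    def poll(self):
        """Return a partial batch once the oldest frame exceeds the timeout."""
        if self._items and self._clock() - self._first_at >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._items = self._items, []
        self._first_at = None
        return batch
```

The timeout bounds added latency when fewer than eight cameras deliver a frame in the same interval.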
Triton Model Configuration:
| Model | Inputs | Outputs | GPU Memory | Latency (P50) |
|---|---|---|---|---|
| YOLO11m-det (TensorRT FP16) | 3x640x640 float16 | Bboxes, scores, labels | ~2.1 GB | 12 ms |
| SCRFD-500M (TensorRT FP16) | 3x640x640 float16 | Bboxes, landmarks, scores | ~1.8 GB | 8 ms |
| ArcFace R100 (TensorRT FP16) | 3x112x112 float16 | 512-D embedding | ~3.2 GB | 5 ms |
Total GPU memory: ~7.1 GB (fits in T4 16 GB with 8 streams)
Latency Budget:
| Stage | Latency |
|---|---|
| Kafka consume (batch) | ~10-50 ms |
| Preprocessing (resize, normalize) | ~5-15 ms |
| YOLO11m inference (GPU) | ~12 ms (P50) |
| SCRFD face detection (GPU) | ~8 ms (P50) |
| ArcFace embedding (GPU, per face) | ~5 ms (P50) |
| Post-processing (NMS, matching) | ~10-30 ms |
| Kafka produce (results) | ~1-2 ms |
| Total (Kafka to detection output) | ~51-122 ms |
5.5 Flow 4: AI Inference to Events to Database
Path: AI Detection results -> Event enricher -> PostgreSQL (multiple tables)
Data Transformation:
+------------+    +-------------------+    +---------------------+    +------------+
| Detection  | -> | Event Enricher    | -> | PostgreSQL Writer   | -> | events     |
| results    |    | - Add camera_id   |    | - UPSERT person     |    | persons    |
| (raw)      |    | - Match person    |    | - INSERT event      |    | embeddings |
|            |    | - Check whitelist |    | - INSERT embedding  |    | face_crops |
+------------+    +-------------------+    +---------------------+    +------------+
Database Write Operations per Detection:
| Operation | Table | Type | Notes |
|---|---|---|---|
| Insert event record | events | INSERT | With bounding box, confidence, timestamp |
| Upsert person | persons | INSERT/UPDATE | If new face, create person record |
| Insert face crop | face_crops | INSERT | S3 URL, bounding box, quality score |
| Upsert embedding | face_embeddings | INSERT/UPDATE | 512-D vector, pgvector HNSW index |
| Increment counters | camera_stats | UPDATE | Daily aggregation |
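Because the consumer commits its Kafka offset only after the database write succeeds, a restart between the two steps replays the message, so the writes above need to tolerate duplicates. The sketch below illustrates that ordering with in-memory stand-ins; `DuplicateSafeStore`, `consume`, and the use of an event ID as the upsert key are hypothetical and not taken from the schema.

```python
class DuplicateSafeStore:
    """In-memory stand-in for the PostgreSQL writer (hypothetical)."""

    def __init__(self):
        self.events = {}

    def upsert_event(self, event_id, payload):
        # Keyed by event_id, so a replayed message overwrites, not duplicates.
        self.events[event_id] = payload


def consume(messages, store, commit):
    """Write each (offset, event_id, payload) first, then commit offset + 1."""
    for offset, event_id, payload in messages:
        store.upsert_event(event_id, payload)  # raises on failure: no commit
        commit(offset + 1)  # committed offset = next message to read


if __name__ == "__main__":
    store, committed = DuplicateSafeStore(), []
    batch = [(0, "e0", "person"), (1, "e1", "face")]
    consume(batch, store, committed.append)
    consume(batch, store, committed.append)  # simulated replay after restart
    print(len(store.events), committed)
```

Replaying the same batch leaves the store unchanged, which is the property that makes commit-after-write safe.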
5.6 Flow 5: Events to Alerts to Notifications
Path: AI events -> Suspicious Activity scoring engine -> Alert creation -> Notification dispatch
Scoring and Escalation:
+------------+    +---------------------+    +------------------+    +--------------+
| AI events  | -> | Suspicious Activity | -> | Alert Manager    | -> | Notification |
| (persons,  |    | Scoring Engine      |    | - Deduplicate    |    | Service      |
| faces)     |    | - 10 modules        |    | - Rate limit     |    | - Telegram   |
|            |    | - Composite score   |    | - Suppress dup   |    | - WhatsApp   |
|            |    | - Time decay        |    | - Escalation     |    | - Email      |
+------------+    +---------------------+    +------------------+    +--------------+
Alert Escalation Matrix:
| Score | Level | Color | Notification | Action |
|---|---|---|---|---|
| 0.00 to <0.20 | NONE | Gray | None | Log only |
| 0.20 to <0.40 | LOW | Blue | Dashboard only | Log + indicator |
| 0.40 to <0.60 | MEDIUM | Yellow | Dashboard + App push | Alert dispatched |
| 0.60 to <0.80 | HIGH | Orange | All of above + Telegram | Immediate alert |
| 0.80 to 1.00 | CRITICAL | Red | All of above + WhatsApp + Email | Critical alert |
| > 1.00 | EMERGENCY | Purple + flashing | All channels + SMS | Emergency dispatch |
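The escalation matrix maps a composite score to a level by threshold lookup. A minimal sketch follows, assuming a score that sits exactly on a boundary belongs to the higher level and that only scores above 1.00 count as EMERGENCY; the function name `alert_level` is illustrative.

```python
import bisect

_THRESHOLDS = [0.20, 0.40, 0.60, 0.80]
_LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def alert_level(score: float) -> str:
    """Map a composite suspicion score to an escalation level."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score > 1.0:
        return "EMERGENCY"
    # bisect_right places a boundary value in the higher band (assumption).
    return _LEVELS[bisect.bisect_right(_THRESHOLDS, score)]


if __name__ == "__main__":
    for s in (0.05, 0.20, 0.59, 0.80, 1.00, 1.30):
        print(s, alert_level(s))
```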
5.7 Flow 6: Live Streams to Browser Dashboard
Path: DVR RTSP -> Edge Gateway FFmpeg -> HLS segmenter -> Nginx -> CDN -> Browser HLS.js
+--------+    +---------------+    +---------------+    +---------+    +----------+
| DVR    | -> | Edge Gateway  | -> | HLS Segmenter | -> | Nginx   | -> | Browser  |
| RTSP   |    | FFmpeg        |    | (2s segments) |    | (relay) |    | HLS.js   |
| 25 FPS |    | -copyts       |    | H.264 + AAC   |    | HTTPS   |    | Video tag|
+--------+    +---------------+    +---------------+    +---------+    +----------+
HLS Configuration:
| Parameter | Value |
|---|---|
| Segment duration | 2 seconds |
| Segment list size | 5 segments (10-second sliding window) |
| Playlist type | Live (no #EXT-X-ENDLIST) |
| Codec | H.264 High Profile + AAC-LC |
| Adaptive bitrate | 3 variants: high (3 Mbps), mid (1 Mbps), low (500 Kbps) |
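The three adaptive-bitrate variants are advertised to the player through an HLS master playlist. The sketch below generates one from the bitrates in the table; the per-camera path layout (`cam1/high/index.m3u8`) and the function names are assumptions for illustration only.

```python
# (variant name, peak bandwidth in bits per second) from the HLS table
VARIANTS = [("high", 3_000_000), ("mid", 1_000_000), ("low", 500_000)]


def master_playlist(camera: str) -> str:
    """Return a master playlist listing one media playlist per variant."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for name, bandwidth in VARIANTS:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}")
        lines.append(f"{camera}/{name}/index.m3u8")  # hypothetical path layout
    return "\n".join(lines) + "\n"


def window_seconds(segment_s: int = 2, list_size: int = 5) -> int:
    """Length of the live sliding window covered by the media playlist."""
    return segment_s * list_size


if __name__ == "__main__":
    print(master_playlist("cam1"))
    print("sliding window:", window_seconds(), "seconds")
```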
Latency:
| Stage | Latency |
|---|---|
| DVR encoding | ~50-100 ms |
| RTSP to edge | ~1-2 ms |
| FFmpeg demux/remux | ~20-50 ms |
| HLS segmenting (2s) | ~2000 ms |
| Nginx relay | ~1-5 ms |
| CDN propagation | ~10-50 ms |
| HLS.js buffer | ~1-2 segments (2-4s) |
| Browser decode | ~20-50 ms |
| Total (camera to eye) | ~4.1 - 6.3 seconds (dominated by segmenting and the HLS.js buffer) |
5.8 Flow 7: Training Feedback Loop
Path: Operator review actions -> Conflict detection -> Training dataset -> Model training -> Quality gates -> Deployment
+------------+    +------------------+    +----------------+    +-------------+    +------------+
| Operator   | -> | Conflict         | -> | Training       | -> | Quality     | -> | Deployment |
| Review     |    | Detection        |    | Dataset        |    | Gates       |    | (A/B test) |
| (confirm,  |    | (5 types)        |    | - Curate       |    | - Precision |    |            |
|  correct,  |    | - Block conflicts|    | - Label        |    |   >= 0.97   |    |            |
|  merge,    |    | - Queue safe     |    | - Augment      |    | - Recall    |    |            |
|  reject)   |    |   additions      |    | - Version      |    |   >= 0.95   |    |            |
+------------+    +------------------+    +----------------+    +-------------+    +------------+
Training Data Flow:
| Stage | Frequency | Trigger |
|---|---|---|
| Review action collection | Continuous | Operator clicks on dashboard |
| Conflict detection | Immediate (synchronous) | Every review action |
| Training dataset build | Weekly (or on-demand) | Queue threshold or manual |
| Model training | On dataset build | Airflow DAG trigger |
| Quality gate evaluation | After training | Automated pipeline |
| A/B deployment | After quality pass | Admin approval |
| Full production | After A/B success | Auto-promote at 48h |
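The quality gate stage compares a candidate model's metrics with the precision and recall minimums shown in the pipeline diagram. A minimal sketch of that check, assuming metrics arrive as a plain dictionary (the names `GATES`, `failed_gates`, and `may_deploy` are illustrative; the full gate list lives in training_system.md):

```python
# Minimums from the feedback-loop diagram: precision >= 0.97, recall >= 0.95
GATES = {"precision": 0.97, "recall": 0.95}


def failed_gates(metrics: dict, gates: dict = GATES) -> list:
    """Return the names of gates the candidate model does not meet."""
    return [name for name, minimum in gates.items()
            if metrics.get(name, 0.0) < minimum]


def may_deploy(metrics: dict) -> bool:
    """A candidate proceeds to A/B deployment only if every gate passes."""
    return not failed_gates(metrics)


if __name__ == "__main__":
    candidate = {"precision": 0.981, "recall": 0.947}
    print(may_deploy(candidate), failed_gates(candidate))
```

A missing metric is treated as a failure, so an incomplete evaluation cannot pass by omission.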
Section 6: Recommended Tech Stack
6.1 Technology Selection Matrix
| Layer | Technology | Version | Purpose | Rationale |
|---|---|---|---|---|
| Cloud Platform | AWS | 2025 | Infrastructure (ap-south-1 Mumbai) | Best India region latency, mature managed services |
| Container Orchestration | Amazon EKS | v1.28+ | Managed Kubernetes control plane | GPU node support, Cluster Autoscaler |
| Edge K8s | K3s | v1.28+ | Lightweight Kubernetes at edge | Single binary, resource-efficient |
| VPN | WireGuard | v1.0+ | Encrypted tunnel between edge and cloud | ~60% faster than OpenVPN, modern crypto |
| Reverse Proxy | Traefik | v2.10+ | Kubernetes Ingress controller | Native K8s integration, automatic TLS |
| AI Inference | NVIDIA Triton | 2.40 | GPU model serving, dynamic batching | Multi-framework, TensorRT optimization |
| CV Framework | OpenCV | 4.8+ | Image processing, pre/post-processing | Industry standard, Python/Go bindings |
| AI/ML Framework | PyTorch | 2.1+ | Model training, custom inference | Ecosystem, CUDA 12 support |
| Deep Learning | TensorRT | 8.6+ | GPU-optimized inference for YOLO, SCRFD, ArcFace | FP16 support, 3-5x speedup |
| Language: AI | Python | 3.11 | AI inference, training, suspicious activity detection | Ecosystem, scientific computing |
| Language: Services | Go | 1.21 | Stream ingestion, backend API, notifications | Performance, concurrency, small binaries |
| Language: Frontend | TypeScript | 5.2 | Web dashboard | Type safety, React ecosystem |
| Web Framework | Next.js | 14 (App Router) | React SSR dashboard | Server components, streaming |
| UI Library | React | 18 | Component-based UI | Concurrent features, Suspense |
| Styling | Tailwind CSS | 3.4 | Utility-first CSS | Rapid development, consistent design |
| Video Player | HLS.js | 1.4 | Browser HLS playback | MSE-based, adaptive bitrate |
| Database | PostgreSQL | 16 | Primary database, vector storage | ACID, pgvector extension |
| Vector Search | pgvector | 0.5+ | HNSW index for 512-D face embeddings | Native PostgreSQL, ivfflat+hnsw |
| Cache/Session | Redis | 7 | Session store, pub/sub, rate limiting | Data structures, cluster mode |
| Message Queue | Apache Kafka | 3.6+ (MSK) | Durable event log, stream replay | Exactly-once, retention, partitions |
| Object Storage | MinIO | latest (RELEASE.2024) | S3-compatible hot storage | Edge + cloud, erasure coding |
| Cold Archive | Amazon S3 | Standard/IA/Glacier | Tiered archival (30d/90d/365d) | Cost optimization |
| Model Registry | MLflow | 2.8+ | Model versioning, experiment tracking | Open source, S3 artifact store |
| Orchestration | Apache Airflow | 2.7+ | Training pipeline DAGs | Backfill, retries, observability |
| Monitoring | Prometheus | 2.47+ | Metrics collection | Pull-based, K8s service discovery |
| Visualization | Grafana | 10.1+ | Dashboards, alerting | Panels, annotations, shared links |
| Log Aggregation | Grafana Loki | 2.9+ | Centralized logging | Label-based, cost-effective |
| CI/CD | GitHub Actions | v4 | Build, test, lint pipelines | Native GitHub integration |
| GitOps | ArgoCD | 2.9+ | Kubernetes continuous delivery | Declarative, drift detection |
| Infrastructure | Terraform | 1.6+ | IaC for AWS resources | State management, modules |
| Secrets | AWS Secrets Manager | - | Encrypted credential storage | Rotation, IAM integration |
6.2 Hardware Requirements
Edge Gateway (Per Site)
| Component | Minimum | Recommended | High Availability |
|---|---|---|---|
| CPU | Intel i5-1340P (12 cores) | Intel i7-1370P (14 cores) | 2x Intel i7 (HA cluster) |
| RAM | 16 GB DDR4-3200 | 32 GB DDR4-3200 | 32 GB per node |
| Storage | 1 TB NVMe SSD | 2 TB NVMe SSD | 2 TB per node + NAS sync |
| Network | 1 Gbps Ethernet | 2.5 Gbps Ethernet | Dual NIC + bonding |
| GPU (optional) | None | NVIDIA Jetson Orin NX 16GB | On-edge AI pre-filtering |
| Power | UPS 600VA | UPS 1000VA | Dual PSU + generator |
Cloud GPU Nodes (AI Inference)
| Cameras | GPU | VRAM | Streams | Cost/month (spot) |
|---|---|---|---|---|
| 1-8 | g4dn.xlarge (T4) | 16 GB | 8 | ~$200-350 |
| 8-16 | g4dn.xlarge (T4) | 16 GB | 16 | ~$350-500 |
| 16-32 | g4dn.2xlarge (T4) | 16 GB | 32 | ~$600-900 |
| 32-64 | g5.2xlarge (A10G) | 24 GB | 64 | ~$1200-1800 |
| 64+ | p4d.24xlarge (8x A100) | 40 GB per GPU | 128 | ~$5000-8000 |
6.3 Software Versions Summary
| Category | Software | Version |
|---|---|---|
| Operating System | Ubuntu Server LTS | 22.04.4 |
| Container Runtime | Docker CE | 25.x |
| Container Orchestration | Kubernetes (EKS/K3s) | 1.28+ |
| AI Serving | NVIDIA Triton Inference Server | 2.40 |
| GPU Runtime | CUDA | 12.1+ |
| GPU Driver | NVIDIA Driver | 535+ |
| Deep Learning Optimization | TensorRT | 8.6+ |
| AI Framework | PyTorch | 2.1+ |
| Computer Vision | OpenCV | 4.8+ |
| Video Processing | FFmpeg | 6.0+ |
| Service Language | Go | 1.21+ |
| AI/Training Language | Python | 3.11+ |
| Frontend Framework | Next.js | 14 |
| UI Library | React | 18 |
| Database | PostgreSQL | 16 |
| Message Queue | Apache Kafka | 3.6+ |
| Cache | Redis | 7 |
| Object Storage | MinIO | 2024+ |
| CI/CD | GitHub Actions | v4 |
| GitOps | ArgoCD | 2.9+ |
| Monitoring | Prometheus + Grafana | 2.47+ / 10.1+ |
| Logging | Grafana Loki | 2.9+ |
| VPN | WireGuard | 1.0+ |
| Model Registry | MLflow | 2.8+ |
| Orchestration | Apache Airflow | 2.7+ |
| Infrastructure | Terraform | 1.6+ |
6.4 Port Reference
| Service | Port | Protocol | Location | Notes |
|---|---|---|---|---|
| DVR RTSP | 554 | TCP | 192.168.29.200 | Local network only |
| DVR HTTP | 80 | TCP | 192.168.29.200 | Admin UI, local only |
| DVR HTTPS | 443 | TCP | 192.168.29.200 | Admin UI, local only |
| DVR TCP | 25001 | TCP | 192.168.29.200 | Proprietary protocol |
| DVR UDP | 25002 | UDP | 192.168.29.200 | Proprietary protocol |
| DVR NTP | 123 | UDP | 192.168.29.200 | Time sync |
| WireGuard | 51820 | UDP | Cloud + Edge | VPN tunnel |
| Edge Admin | 8080 | TCP | 192.168.29.5 | Local admin UI |
| Edge SSH | 22 | TCP | 192.168.29.5 | Admin access only |
| Traefik HTTP | 8000 | TCP | EKS | Internal HTTP entrypoint |
| Traefik HTTPS | 8443 | TCP | EKS | Internal HTTPS entrypoint |
| ALB HTTPS | 443 | TCP | AWS | Public-facing |
| Backend API | 8080 | TCP | EKS pods | Internal service port |
| Triton HTTP | 8000 | TCP | EKS GPU nodes | Model inference HTTP |
| Triton gRPC | 8001 | TCP | EKS GPU nodes | Model inference gRPC |
| Triton Metrics | 8002 | TCP | EKS GPU nodes | Prometheus metrics |
| PostgreSQL | 5432 | TCP | RDS | VPC-private |
| Redis | 6379 | TCP | ElastiCache | VPC-private |
| Kafka | 9092 | TCP | MSK | VPC-private |
| MinIO API | 9000 | TCP | EKS + Edge | S3-compatible API |
| MinIO Console | 9001 | TCP | EKS + Edge | Admin console |
| Prometheus | 9090 | TCP | EKS | Metrics collection |
| Grafana | 3000 | TCP | EKS | Dashboards |
Section 7: Database Schema
7.1 Schema Overview
The database is designed around a relational core (PostgreSQL 16) with pgvector extension for 512-dimensional face embedding storage and similarity search. The schema consists of 29 tables, 4 views, and 8 trigger functions, organized into 10 logical domains.
Schema Philosophy:
- Strict normalization for reference data (cameras, persons, rules) to ensure data integrity
- JSONB flexibility for event metadata and configuration to accommodate evolving AI outputs
- Partitioning on all high-volume time-series tables for query performance and lifecycle management
- pgvector HNSW indexing for sub-10ms face similarity search at scale
- Row-level security (RLS) for multi-tenant site isolation
- AES-256 encryption for all stored credentials (DVR passwords, API tokens)
7.2 Entity Relationship Overview
+=============================================================================+
| ENTITY RELATIONSHIP DIAGRAM |
+=============================================================================+
| |
| SITE (1) --------------------< (N) DVR |
| | | |
| | | (1) |
| | v |
| | CAMERA (N) <------------------< (N) ALERT_RULE|
| | | | |
| | | (N) | (1) |
| | v v |
| | +---------------------------------------------------------+ |
| | | EVENT (N) -->--(1) PERSON (1)--< (N) FACE_EMBEDDING |
| | | | | |
| | | | (N) | (N) |
| | | v v |
| | | FACE_CROP (N) PERSON_CLUSTER |
| | | | |
| | | | (N) +---------+|
| | | v | Training||
| | | MEDIA_FILE (1) ----------------------------------------->| Dataset ||
| | | |---------||
| | +--------------------------------------------------------->| Job ||
| | | Model ||
| | +---------+ | Version ||
| | | Review | +---------+|
| | | Action | |
| | +---------+ |
| | ^ |
| | | (N) |
| +------------------------------------+ |
| USER (N) -->--(N) ROLE_PERMISSION |
| | |
| | (1) |
| v |
| WATCHLIST (N) -->--(N) WATCHLIST_ENTRY |
| |
| +---------+ +---------+ +---------+ +---------+ |
| | Telegram| |WhatsApp | | Email | |Webhook | |
| | Config | | Config | | Config | | Config | |
| +---------+ +---------+ +---------+ +---------+ |
| ^ ^ ^ ^ |
| | | | | |
| +--------------+-------------+--------------+ |
| | |
| NOTIFICATION_CHANNEL |
| | |
| | (1) |
| v |
| NOTIFICATION_LOG |
| |
| +---------+ +---------+ +---------+ |
| | Audit | | System | | Device | |
| | Log | | Health | | Connect.| |
| |(partitioned) | Log | | Log | |
| +---------+ +---------+ +---------+ |
| |
+=============================================================================+
7.3 Core Tables Summary
7.3.1 Site and Infrastructure Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| sites | Physical locations (factories, warehouses) | id, name, location, timezone, settings | 1-10 |
| dvrs | DVR/NVR devices per site | id, site_id, ip_address, port, username, password_encrypted, model, channels, status | 1-10 |
| cameras | Individual camera channels | id, dvr_id, channel_number, name, rtsp_url, resolution, fps, status, zone_config, zone_description | 8-64 |
7.3.2 AI Detection and Identity Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| events | All AI detection events (partitioned monthly) | id, camera_id, event_type, timestamp, confidence, bounding_box, person_id, face_crop_id, track_id | 1M-10M/month |
| persons | Known and unknown individuals | id, name, status (known/unknown/blacklisted), role, company, notes, created_at | 100-10,000 |
| face_crops | Cropped face images metadata | id, event_id, person_id, storage_path, bounding_box, quality_score, blur_score, pose_yaw, pose_pitch | 500K-5M/month |
| face_embeddings | 512-D face embeddings (pgvector) | id, person_id, face_crop_id, embedding (vector(512)), model_version, is_primary | 500K-5M |
| person_clusters | Unknown person cluster groups | id, cluster_label, representative_embedding_id, sample_count, first_seen, last_seen, status | 10-1,000 |
7.3.3 Alert and Notification Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| alert_rules | Per-camera alert configuration | id, camera_id, rule_type, name, config_json, schedule, enabled | 50-500 |
| alerts | Generated alert records | id, camera_id, rule_id, person_id, alert_type, severity, status, message | 1K-50K/month |
| notification_channels | Alert destination endpoints | id, name, channel_type, config_json, is_active | 5-20 |
| telegram_configs | Telegram Bot API credentials | id, channel_id, bot_token_encrypted, chat_id | 1-5 |
| whatsapp_configs | WhatsApp Business API credentials | id, channel_id, api_key_encrypted, phone_number_id | 1-5 |
| notification_log | Delivery status per notification | id, alert_id, channel_id, status, sent_at, error_message | 1K-50K/month |
7.3.4 Watchlist and Access Control Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| users | Dashboard users and operators | id, username, email, password_hash, role, is_active | 5-50 |
| roles | Permission roles | id, name, permissions_json | 3-10 |
| watchlists | Named monitoring lists | id, name, watch_type (vip/blacklist/custom), is_active | 5-20 |
| watchlist_entries | Persons on watchlists | id, watchlist_id, person_id, added_by, added_at | 10-1,000 |
7.3.5 Training and ML Pipeline Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| training_datasets | Curated face datasets for training | id, name, description, person_ids_json, sample_count, version, status | 10-100 |
| training_jobs | Model training job tracking | id, dataset_id, model_version_from, model_version_to, status, metrics_json | 10-100 |
| model_versions | Registry of trained model versions | id, version_string, training_job_id, metrics_json, is_production, is_rollback_available | 10-50 |
| review_actions | Operator review decisions | id, event_id, reviewer_id, action, from_person_id, to_person_id, notes | 1K-100K |
7.3.6 Media and Storage Tables
| Table | Purpose | Key Fields | Rows (est.) |
|---|---|---|---|
| media_files | Registry of stored video/images | id, file_type, storage_path, size_bytes, checksum, camera_id, event_id, retention_until | 100K-1M |
| video_clips | Video clip metadata for incidents | id, media_file_id, start_time, end_time, camera_id, event_id, duration_seconds | 10K-100K |
7.3.7 Audit and Monitoring Tables (Partitioned)
| Table | Purpose | Partition | Retention |
|---|---|---|---|
| audit_logs | All user and system actions | Monthly by timestamp | 1 year (Glacier) |
| system_health_logs | Component health metrics | Monthly by timestamp | 90 days |
| device_connectivity_logs | Camera/DVR connectivity events | Monthly by timestamp | 90 days |
7.4 Indexing Strategy
7.4.1 pgvector HNSW Index (Critical Path)
-- HNSW index for sub-10ms face similarity search
-- ef_search controls recall/speed tradeoff (higher = more accurate, slower)
CREATE INDEX idx_face_embeddings_hnsw
ON face_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
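-- Query: Find top-K similar faces (query_vector is a placeholder for the 512-D probe embedding)
-- <=> returns cosine distance, so similarity = 1 - distance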
SELECT person_id, 1 - (embedding <=> query_vector) AS similarity
FROM face_embeddings
WHERE is_primary = true
ORDER BY embedding <=> query_vector
LIMIT 5;
| Parameter | Value | Rationale |
|---|---|---|
| m | 16 | Number of bi-directional links per node (higher = better recall, more memory) |
| ef_construction | 128 | Build-time exploration factor (higher = better index quality) |
| ef_search (runtime SET) | 64-256 | Search-time exploration factor (SET hnsw.ef_search = 128) |
| Distance metric | Cosine distance (<=>) | Suited to L2-normalized face embeddings; similarity = 1 - distance |
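For intuition, the top-K query can be reproduced exactly with a brute-force scan: compute the cosine distance between the probe and every stored embedding, then keep the K closest. The sketch below does this in plain Python with hypothetical names (`cosine_distance`, `top_k`); the HNSW index returns approximately the same result without scanning every row.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity the <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=5):
    """rows: iterable of (person_id, embedding).

    Mirrors ORDER BY embedding <=> query LIMIT k, returning
    (person_id, similarity) pairs with the most similar first.
    """
    scored = [(pid, 1.0 - cosine_distance(emb, query)) for pid, emb in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    gallery = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0]), ("carol", [1.0, 1.0])]
    print(top_k([1.0, 0.0], gallery, k=2))
```

A scan like this is also a convenient way to measure the recall of a given ef_search setting on a sample of probes.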
7.4.2 B-Tree Indexes (Standard Queries)
| Table | Index | Purpose |
|---|---|---|
| events | (camera_id, timestamp DESC) | Time-range queries per camera |
| events | (event_type, timestamp DESC) | Filter by event type |
| events | (person_id) WHERE person_id IS NOT NULL | Person event lookup |
| face_crops | (person_id, quality_score DESC) | Best quality face per person |
| alerts | (status, created_at DESC) | Pending alerts by age |
| alerts | (severity, status) | Critical alert dashboard |
| persons | (status, name) | Person directory with status filter |
| persons | (created_at DESC) | Recently added persons |
| media_files | (retention_until) | Expiring media cleanup; the 7-day cutoff is applied in the query because a partial-index predicate cannot call NOW() |
7.5 Partitioning Strategy
All high-volume time-series tables are partitioned monthly using pg_partman for automated partition management.
+-----------------------------------------------------------------------------+
| PARTITIONING ARCHITECTURE |
+-----------------------------------------------------------------------------+
| |
| events (parent, empty) |
| +-- events_y2024m01 (Jan 2024 data) |
| +-- events_y2024m02 (Feb 2024 data) |
| +-- events_y2024m03 (Mar 2024 data) |
| +-- events_y2024m04 (Apr 2024 data) |
| +-- events_y2024m05 (May 2024 data) <-- Hot (in memory) |
| +-- events_default (fallback) |
| |
| Partition pruning: WHERE timestamp >= '2024-05-01' |
| -> Only scans events_y2024m05 |
| -> ~30x faster for time-range queries |
| |
| Managed by: pg_partman extension |
| - Auto-create: 2 months ahead |
| - Auto-drop: After retention period (detach + archive) |
| |
+-----------------------------------------------------------------------------+
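Partition pruning works because each month maps to exactly one child table. The sketch below derives the child-table name for a timestamp and lists the partitions a time-range query would touch. The `_yYYYYmMM` naming follows the diagram above, and the function names are illustrative; actual names depend on how pg_partman is configured.

```python
from datetime import datetime


def partition_name(table: str, ts: datetime) -> str:
    """Monthly child table for a timestamp, e.g. events_y2024m05."""
    return f"{table}_y{ts.year:04d}m{ts.month:02d}"


def partitions_for_range(table: str, start: datetime, end: datetime) -> list:
    """Child tables scanned by a query with start <= timestamp <= end."""
    if end < start:
        raise ValueError("end must not precede start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{table}_y{year:04d}m{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    print(partition_name("events", datetime(2024, 5, 17)))
    print(partitions_for_range("events", datetime(2023, 12, 20), datetime(2024, 1, 5)))
```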
Partitioned Tables:
| Table | Partition Key | Partition Type | Retention |
|---|---|---|---|
| events | timestamp | Monthly RANGE | 90 days hot, 1 year archive |
| audit_logs | timestamp | Monthly RANGE | 1 year total |
| system_health_logs | timestamp | Monthly RANGE | 90 days |
| device_connectivity_logs | timestamp | Monthly RANGE | 90 days |
| face_crops | created_at | Monthly RANGE | 90 days hot, 1 year archive |
7.6 Retention Policies
| Data Tier | Storage | Duration | Lifecycle |
|---|---|---|---|
| Hot Tier | PostgreSQL + MinIO | 0-30 days | Fast query, indexed, in-memory cache |
| Warm Tier | S3 Standard | 30-90 days | Available on-demand, still indexed |
| Cold Tier | S3 Infrequent Access | 90-365 days | Retrieval within minutes |
| Archive Tier | Glacier Deep Archive | 1-7 years | Retrieval within 12-48 hours |
| Compliance | Glacier Vault Lock | 7+ years | Immutable, legal hold |
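The tier boundaries translate into a lookup on object age. A minimal sketch follows, assuming an object moves to the next tier on the boundary day itself (day 30 is warm, day 90 is cold) and that a year is counted as 365 days; the function name `storage_tier` is illustrative.

```python
def storage_tier(age_days: int) -> str:
    """Return the retention tier for an object of the given age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 30:
        return "hot"         # PostgreSQL + MinIO
    if age_days < 90:
        return "warm"        # S3 Standard
    if age_days < 365:
        return "cold"        # S3 Infrequent Access
    if age_days < 7 * 365:
        return "archive"     # Glacier Deep Archive
    return "compliance"      # Glacier Vault Lock


if __name__ == "__main__":
    for age in (0, 30, 90, 365, 7 * 365):
        print(age, storage_tier(age))
```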
Automated Cleanup:
| Task | Frequency | Mechanism |
|---|---|---|
| Expire old event partitions | Daily (pg_partman) | DETACH PARTITION + S3 upload |
| Delete expired media files | Daily | Cron job: DELETE from media_files + MinIO removal |
| Purge old notification logs | Weekly | DELETE WHERE created_at < NOW() - INTERVAL '90 days' |
| Archive face crops to S3 | Daily | Lambda: copy to S3 IA, update storage_path |
| Compress audit logs | Monthly | pglz/zstd compression on detached partitions |
| Vacuum and analyze | Weekly (auto-vacuum) | PostgreSQL autovacuum daemon |
7.7 Security Considerations
7.7.1 Credential Encryption
All sensitive credentials stored with AES-256 encryption:
| Table | Encrypted Field | Encryption |
|---|---|---|
| dvrs | password_encrypted | AES-256-CBC, key from AWS Secrets Manager |
| telegram_configs | bot_token_encrypted | AES-256-CBC |
| whatsapp_configs | api_key_encrypted | AES-256-CBC |
7.7.2 Row-Level Security (RLS)
For multi-site deployments, RLS policies enforce that users only see data for sites they have access to:
-- Enable RLS on critical tables
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
ALTER TABLE alerts ENABLE ROW LEVEL SECURITY;
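-- Policy: Users see only data from their assigned sites
-- (persons and alerts need equivalent policies; with RLS enabled and no policy,
-- non-owner roles see no rows at all)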
CREATE POLICY site_isolation_events ON events
USING (camera_id IN (
SELECT c.id FROM cameras c
JOIN dvrs d ON c.dvr_id = d.id
JOIN site_users su ON d.site_id = su.site_id
WHERE su.user_id = current_setting('app.current_user_id')::UUID
));
7.7.3 Access Control
| Role | Permissions |
|---|---|
| super_admin | Full access to all sites, all operations |
| site_admin | Full access to assigned sites, user management |
| operator | View dashboards, acknowledge alerts, review persons |
| viewer | Read-only access to dashboards and events |
7.7.4 Audit Trail
The audit_logs table (partitioned monthly) captures every significant action:
| Action | Captured Data |
|---|---|
| login | User, IP, timestamp, MFA status, success/failure |
| person_create | Creator, name, initial status, source event |
| person_update | Updater, changed fields, old/new values |
| alert_acknowledge | Acknowledger, alert ID, timestamp |
| alert_resolve | Resolver, resolution notes |
| training_approve | Approver, model version, dataset version |
| model_deploy | Deployer, version, A/B split percentage |
| config_change | Changer, changed parameters, old/new values |
7.7.5 Backup Strategy
| Component | Method | Frequency | Retention |
|---|---|---|---|
| PostgreSQL | RDS automated backups | Daily | 35 days |
| PostgreSQL | Manual snapshots | Before any schema change | 90 days |
| MinIO/S3 | Cross-region replication | Continuous | 90 days in DR region |
| Face embeddings | pg_dump + vector export | Weekly | 90 days |
| Model artifacts | MLflow artifact store | On training completion | Indefinite |
Reference: For complete DDL including all CREATE TABLE statements, triggers, views, and functions, see database_schema.md; Sections 2 through 15 contain the full schema definition with comments and constraints.
Section 8: AI Model and Training Strategy
8.1 AI Model Selection
The inference pipeline uses three complementary deep learning models — for human detection, face detection, and face recognition — all optimized with TensorRT for GPU inference. All models run on a single NVIDIA T4 GPU with dynamic batching.
| Component | Model | Framework | Input Size | FPS (T4) | Accuracy |
|---|---|---|---|---|---|
| Human Detection | YOLO11m (Ultralytics) | PyTorch -> ONNX -> TensorRT FP16 | 640 x 640 | 213 | mAP@50: 80.5% (COCO) |
| Face Detection | SCRFD-500M-BNKPS (InsightFace) | PyTorch -> ONNX -> TensorRT FP16 | 640 x 640 | ~400 | AP_medium: 87.2% (WIDERFace) |
| Face Recognition | ArcFace R100 (IR-SE100) | PyTorch -> ONNX -> TensorRT FP16 | 112 x 112 | ~800 | 99.83% (LFW), 98.35% (MegaFace) |
| Person Tracking | ByteTrack | Native Python + NumPy | N/A | N/A | 80.3% MOTA (MOT17) |
| Unknown Clustering | HDBSCAN + DBSCAN fallback | scikit-learn | 512-D vectors | N/A | 89.5% purity, 0.855 BCubed F |
| Fall Detection | YOLOv8n-pose | TensorRT FP16 | 640 x 640 | ~300 | Part of suspicious activity |
| Object Detection | YOLOv8s | TensorRT FP16 | 640 x 640 | ~450 | Abandoned object detection |
8.1.1 Human Detection: YOLO11m
| Property | Value |
|---|---|
| Architecture | CSPDarknet backbone + PANet neck + Decoupled head |
| Parameters | 19.6 M |
| FLOPs | 68.2 B (at 640x640) |
| TensorRT Optimization | FP16, dynamic batch (1-16), layer fusion |
| GPU Memory | ~2.1 GB at batch=8 |
| Person class priority | Highest NMS score weighting for person class |
| Preprocessing | Letterbox resize to 640x640, normalize [0,1] |
Export pipeline:
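# PyTorch -> ONNX -> TensorRT Engine
# dynamic=True keeps the batch axis dynamic so the trtexec shape profiles below apply;
# FP16 precision is applied at engine build time via --fp16.
yolo export model=yolo11m.pt format=onnx imgsz=640 dynamic=True opset=17 simplify=True
trtexec --onnx=yolo11m.onnx --saveEngine=yolo11m.engine --fp16 \
    --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:16x3x640x640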
8.1.2 Face Detection: SCRFD-500M-BNKPS
| Property | Value |
|---|---|
| Architecture | Single-stage detector with FPN, BN+KPS head |
| Compute | ~500 MFLOPs (the "500M" in the model name refers to FLOPs, not parameters; this is the lightweight variant of the SCRFD family) |
| Detects | Face bounding box + 5 facial landmarks |
| Minimum face size | 20 x 20 pixels (configurable) |
| NMS threshold | 0.45 (IoU) |
| Confidence threshold | 0.5 (minimum detection score) |
| GPU Memory | ~1.8 GB at batch=32 |
8.1.3 Face Recognition: ArcFace R100 (IR-SE100)
| Property | Value |
|---|---|
| Backbone | IR-SE100 (Improved ResNet-100 with SE blocks) |
| Training data | MS1MV3 (5.8M images, 85K identities) |
| Loss function | ArcFace additive angular margin (m=0.5) |
| Embedding dimension | 512 (float32, L2-normalized) |
| Distance metric | Cosine similarity (1 - cosine_distance) |
| Matching threshold (strict) | 0.60 |
| Matching threshold (balanced) | 0.45 |
| Matching threshold (relaxed) | 0.30 |
| GPU Memory | ~3.2 GB at batch=64 |
Published benchmarks on standard datasets:
| Dataset | Accuracy | Notes |
|---|---|---|
| LFW (Labeled Faces in the Wild) | 99.83% | Unconstrained face verification |
| CFP-FP (Frontal-Profile) | 99.17% | Cross-pose evaluation |
| AgeDB-30 | 98.28% | Age-invariant recognition |
| MegaFace (1M distractors) | 98.35% | Large-scale recognition |
| IJB-C | 96.18% (TAR@FAR=1e-4) | Template-based verification |
8.2 Inference Pipeline Architecture
+=============================================================================+
| REAL-TIME INFERENCE PIPELINE |
+=============================================================================+
| |
| INPUT: RTSP Frame (640x640, 1 fps per stream) |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Frame Preprocessor| -> | YOLO11m Detector | -> | Person Detection | |
| | - Resize | | (TensorRT FP16) | | Results: | |
| | - Normalize | | GPU: 12ms (P50) | | - bbox (x1,y1,x2, | |
| | - NCHW layout | | Batch: 1-16 | | y2) | |
| +-------------------+ +-------------------+ | - confidence | |
| | - class (person) | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Crop Extract | <- | SCRFD-500M | <- | Face Detection | |
| | (ROI from person | | (TensorRT FP16) | | Results: | |
| | bounding box) | | GPU: 8ms (P50) | | - face bbox | |
| | | | Batch: per-face | | - 5 landmarks | |
| +-------------------+ +-------------------+ | - confidence | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Alignment | <- | ArcFace R100 | <- | Embedding Vector | |
| | (5-point affine | | (TensorRT FP16) | | 512-D float32, | |
| | transform to | | GPU: 5ms (P50) | | L2-normalized | |
| | 112x112) | | Batch: 1-64 | | | |
| +-------------------+ +-------------------+ +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Face Matching | <- | Person Tracking | <- | Track-to-Person | |
| | (cosine similarity| | (ByteTrack) | | Association | |
| | vs. known DB) | | CPU: 2ms/frame | | - Match embedding | |
| +-------------------+ +-------------------+ | to known persons | |
| | | | | - Create/update | |
| | | | | track | |
| v v v +-------------------+ |
| +-------------------+ |
| | Confidence Scorer | |
| | (aggregate score | |
| | for all detect) | |
| +-------------------+ |
| | |
| v |
| OUTPUT: DetectionEvent (JSON) |
| { person_id, track_id, confidence, bbox, face_crop, |
| embedding, recognized_name?, quality_scores } |
| |
+=============================================================================+
End-to-end latency budget per frame:
| Stage | GPU | CPU Fallback |
|---|---|---|
| Frame preprocessing | 2-5 ms | 5-10 ms |
| YOLO11m detection | 12 ms (P50) | 35-56 ms (ONNX+OpenVINO) |
| SCRFD face detection | 8 ms (P50) | 15-25 ms |
| ArcFace embedding (per face) | 5 ms (P50) | 12-18 ms |
| ByteTrack tracking | 2 ms | 2-5 ms |
| Post-processing | 5-10 ms | 10-20 ms |
| Total (no face) | ~29-37 ms | ~67-116 ms |
| Total (1 face) | ~34-42 ms | ~79-134 ms |
| Total (5 faces) | ~54-62 ms | ~127-206 ms |
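The totals follow from summing the per-stage figures, with the ArcFace cost counted once per face. The sketch below reproduces that arithmetic for the GPU column. The names `GPU_STAGES_MS` and `frame_latency_ms` are illustrative only, and the sketch assumes faces are embedded one after another rather than batched.

```python
# Per-stage GPU latency in ms as (low, high), transcribed from the table above.
GPU_STAGES_MS = {
    "preprocess": (2, 5),
    "yolo11m": (12, 12),
    "scrfd": (8, 8),
    "bytetrack": (2, 2),
    "postprocess": (5, 10),
}
ARCFACE_PER_FACE_MS = (5, 5)


def frame_latency_ms(n_faces, stages=GPU_STAGES_MS, per_face=ARCFACE_PER_FACE_MS):
    """Return (best_case, worst_case) latency in ms for a frame with n_faces faces."""
    low = sum(lo for lo, _ in stages.values()) + n_faces * per_face[0]
    high = sum(hi for _, hi in stages.values()) + n_faces * per_face[1]
    return low, high


for faces in (0, 1, 5):
    print(faces, frame_latency_ms(faces))
```

The best-case values (29, 34, and 54 ms) correspond to the lower ends of the preprocessing and post-processing ranges.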
8.3 Face Recognition Matching Strategy
8.3.1 Known Person Matching
+-----------------------------------------------------------------------------+
| FACE RECOGNITION MATCHING FLOW |
+-----------------------------------------------------------------------------+
| |
| New Face Embedding (512-D) |
| | |
| v |
| +-------------------+ |
| | L2 Normalize | embedding = embedding / ||embedding||_2 |
| +-------------------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | pgvector HNSW | -> | Top-5 Candidates | |
| | Similarity Search | | (cosine distance) | |
| | ef_search=128 | +-------------------+ |
| +-------------------+ | |
| v |
| +-------------------+ +-------------------+ |
| | Threshold Check | <- | Best Match Score | |
| | (per AI Vibe) | +-------------------+ |
| +-------------------+ | |
| | | |
| +------------+-------------+ |
| | |
| +----------+----------+ |
| | | |
| v v |
| Above threshold Below threshold |
| (Recognized) (Unknown) |
| | | |
| v v |
| +------------+ +------------------+ |
| | Assign to | | Check against | |
| | known | | recent unknown | |
| | person_id | | embeddings | |
| | (with | | (5-min window) | |
| | confidence)| +--------+---------+ |
| +------------+ | |
| | |
| +--------+--------+ |
| | | |
| v v |
| Similar unknown No similar unknown |
| (same person) (new unknown) |
| | | |
| v v |
| Reuse person_id Create new |
| Update centroid unknown person |
| record |
| |
+-----------------------------------------------------------------------------+
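The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the production matcher: a brute-force loop stands in for the pgvector HNSW search, `match_face` and its return labels are hypothetical names, and reusing the face-match threshold for the recent-unknown comparison is an assumption, since the diagram does not state that value.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_face(embedding, known, recent_unknowns, threshold=0.45):
    """Return (status, person_id, score) for one face embedding.

    known and recent_unknowns map person_id -> reference embedding.
    recent_unknowns is assumed to be pre-filtered to the 5-minute window.
    """
    # Brute-force stand-in for the HNSW top-k search: keep the best candidate.
    best_id, best_score = None, -1.0
    for person_id, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_id, best_score = person_id, score
    # Ties with the threshold are treated as matches (an assumption).
    if best_id is not None and best_score >= threshold:
        return "recognized", best_id, best_score

    # Below threshold: try to reuse a recently seen unknown identity.
    for unknown_id, reference in recent_unknowns.items():
        score = cosine_similarity(embedding, reference)
        if score >= threshold:
            return "unknown_reused", unknown_id, score
    return "unknown_new", None, best_score
```

A caller would create a new unknown person record on `"unknown_new"` and update the stored centroid on `"unknown_reused"`.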
8.3.2 AI Vibe Threshold Mapping
The AI Vibe system maps three intuitive presets to internal confidence thresholds:
| Vibe | Face Match Threshold | Detection Confidence | Use Case |
|---|---|---|---|
| Relaxed | 0.30 cosine similarity | 0.40 minimum | Known persons re-identified more easily; more false positives acceptable |
| Balanced | 0.45 cosine similarity | 0.55 minimum | Default; good precision-recall tradeoff |
| Strict | 0.60 cosine similarity | 0.70 minimum | High-security scenarios; minimize false positives |
Per-stream Vibe Selection:
- Vibe can be set per camera via dashboard
- Night mode automatically applies Strict vibe
- Alert-triggered cameras automatically upgrade to Strict for 5 minutes
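As a rough illustration of how the presets and the automatic overrides might combine, the sketch below resolves the vibe in force for a camera. The function names and the `seconds_since_last_alert` argument are hypothetical; the threshold values come from the table above.

```python
VIBE_THRESHOLDS = {
    "relaxed": {"face_match": 0.30, "detection": 0.40},
    "balanced": {"face_match": 0.45, "detection": 0.55},
    "strict": {"face_match": 0.60, "detection": 0.70},
}
ALERT_UPGRADE_SECONDS = 5 * 60  # alert-triggered cameras stay Strict for 5 minutes


def effective_vibe(camera_vibe, night_mode=False, seconds_since_last_alert=None):
    """Resolve the vibe currently in force for one camera."""
    if night_mode:
        return "strict"
    recently_alerted = (
        seconds_since_last_alert is not None
        and seconds_since_last_alert < ALERT_UPGRADE_SECONDS
    )
    return "strict" if recently_alerted else camera_vibe


def thresholds_for(camera_vibe, night_mode=False, seconds_since_last_alert=None):
    """Look up the thresholds for the effective vibe."""
    vibe = effective_vibe(camera_vibe, night_mode, seconds_since_last_alert)
    return VIBE_THRESHOLDS[vibe]
```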
8.4 Unknown Person Clustering Approach
Unknown persons (faces that don't match any known person above threshold) are automatically clustered to help operators identify recurring visitors.
8.4.1 Clustering Pipeline
+-----------------------------------------------------------------------------+
| UNKNOWN PERSON CLUSTERING PIPELINE |
+-----------------------------------------------------------------------------+
| |
| Unknown Face Embeddings (streaming) |
| | |
| v |
| +-------------------+ |
| | Sliding Window | Keep last N embeddings in memory (configurable) |
| | Buffer (500) | + persistent storage for long-term clustering |
| +-------------------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | HDBSCAN Clustering| -> | Primary clusters | min_cluster_size=5 |
| | (density-based) | | formed | min_samples=2 |
| | metric=cosine | +-------------------+ eps=auto |
| +-------------------+ | |
| | (fallback) | |
| v v |
| +-------------------+ +-------------------+ |
| | DBSCAN Fallback | | Merge with | Check: temporal gap |
| | (if HDBSCAN fails | | existing clusters | < 30 days, cosine sim |
| | to find structure| | - centroid | > 0.85 |
| +-------------------+ | distance | |
| +-------------------+ |
| | |
| v |
| +-------------------+ |
| | Operator Review | Dashboard shows clusters |
| | Queue | pending identification |
| +-------------------+ |
| |
+-----------------------------------------------------------------------------+
8.4.2 Clustering Parameters
| Parameter | Value | Description |
|---|---|---|
| Algorithm | HDBSCAN (primary), DBSCAN (fallback) | Density-based for irregular cluster shapes |
| Distance metric | Cosine similarity | Optimal for face embeddings |
| Minimum cluster size | 5 embeddings | Minimum to form a cluster |
| Minimum samples | 2 | Core point density threshold |
| Merge threshold | 0.85 cosine similarity | Merge clusters if centroids are close |
| Temporal window | 30 days | Maximum gap between cluster appearances |
| Review trigger | 10+ embeddings | Send to operator review queue |
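The merge rule combines two of the parameters above: centroid similarity and temporal gap. A minimal sketch of that check follows, assuming each cluster exposes a centroid and the timestamps of its appearances; `should_merge` is a hypothetical helper name.

```python
import math
from datetime import datetime, timedelta

MERGE_SIMILARITY = 0.85
TEMPORAL_WINDOW = timedelta(days=30)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def should_merge(centroid_a, last_seen_a, centroid_b, first_seen_b):
    """True when two clusters are close in embedding space and in time."""
    gap = abs(first_seen_b - last_seen_a)
    similar = cosine_similarity(centroid_a, centroid_b) > MERGE_SIMILARITY
    return gap < TEMPORAL_WINDOW and similar
```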
8.4.3 Clustering Quality Targets
| Metric | Target | Measurement |
|---|---|---|
| Cluster Purity | > 89% | % of embeddings in a cluster belonging to the same person |
| BCubed F-Measure | > 0.85 | Harmonic mean of precision and recall for clustering |
| Silhouette Score | > 0.3 | Separation quality between clusters |
| False Merge Rate | < 5% | Different persons incorrectly merged |
| Split Rate | < 15% | Same person split into multiple clusters |
8.5 Confidence Handling
8.5.1 Confidence Score Computation
Each detection event carries an aggregate confidence score computed from multiple signals:
confidence_aggregate = weighted_average(
    detection_confidence: 0.35 * yolo_confidence,
    face_detection_quality: 0.25 * scrfd_confidence,
    face_recognition_score: 0.25 * (1 - cosine_distance_to_match),
    face_quality_score: 0.15 * quality_composite
)
Where quality_composite = average(
    1.0 - blur_score,           # Sharpness (higher is better)
    1.0 - abs(pose_yaw)/90,     # Frontal preference
    illumination_score,         # Well-lit face
    resolution_adequacy         # Sufficient pixels for face
)
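Because the four weights sum to 1.0, the aggregate is a plain weighted sum. The sketch below is one way to express it in Python. The final clamp to [0, 1] is an added assumption, there to guard against cosine distances above 1.

```python
def quality_composite(blur_score, pose_yaw_deg, illumination_score, resolution_adequacy):
    """Average of the four face-quality components, each expected in [0, 1]."""
    components = [
        1.0 - blur_score,                  # sharpness
        1.0 - abs(pose_yaw_deg) / 90.0,    # frontal preference
        illumination_score,                # well-lit face
        resolution_adequacy,               # sufficient pixels
    ]
    return sum(components) / len(components)


def confidence_aggregate(yolo_conf, scrfd_conf, cosine_distance_to_match, quality):
    """Weighted sum of the four signals using the weights from 8.5.1."""
    score = (
        0.35 * yolo_conf
        + 0.25 * scrfd_conf
        + 0.25 * (1.0 - cosine_distance_to_match)
        + 0.15 * quality
    )
    return max(0.0, min(1.0, score))  # clamp is an assumption, see lead-in


quality = quality_composite(0.2, 45.0, 0.8, 1.0)
print(confidence_aggregate(0.9, 0.8, 0.3, quality))
```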
8.5.2 Confidence Levels
| Level | Score Range | Color | Action |
|---|---|---|---|
| High Confidence | 0.80 - 1.00 | Green | Auto-accept, no review needed |
| Medium Confidence | 0.60 - 0.79 | Yellow | Accepted, flagged for periodic review |
| Low Confidence | 0.40 - 0.59 | Orange | Requires operator review within 24h |
| Very Low Confidence | 0.00 - 0.39 | Red | Rejected, not used for training |
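The table lists ranges to two decimals, which leaves values such as 0.795 unassigned. The sketch below assumes half-open intervals keyed on the lower bound, so every score in [0, 1] maps to exactly one level. The level identifiers are illustrative.

```python
# (lower bound, level, action), ordered from highest to lowest.
CONFIDENCE_LEVELS = [
    (0.80, "high", "Auto-accept, no review needed"),
    (0.60, "medium", "Accepted, flagged for periodic review"),
    (0.40, "low", "Requires operator review within 24h"),
    (0.00, "very_low", "Rejected, not used for training"),
]


def confidence_level(score):
    """Map an aggregate confidence score to (level, action)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for lower, level, action in CONFIDENCE_LEVELS:
        if score >= lower:
            return level, action
    raise AssertionError("unreachable: the last lower bound is 0.0")
```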
8.6 Training Workflow Overview
The safe self-learning system captures operator feedback and converts it into model improvements through a carefully controlled pipeline.
8.6.1 Three Learning Modes
| Mode | Description | Use Case | Risk Level |
|---|---|---|---|
| Manual Only | Operator explicitly triggers training runs | Highly regulated environments | Lowest |
| Suggested Learning (Recommended) | System suggests training candidates; operator approves | Standard production deployment | Low |
| Approved Auto-Update | Auto-training triggers after admin approval threshold | Mature deployment with trusted operators | Medium |
8.6.2 Training Pipeline Architecture
+=============================================================================+
| SAFE SELF-LEARNING PIPELINE |
+=============================================================================+
| |
| STEP 1: COLLECTION |
| +-------------------+ |
| | Operator Review | confirm, correct_name, merge, reject |
| | Actions | + automatic high-confidence acceptances |
| +-------------------+ |
| | |
| v |
| STEP 2: CONFLICT DETECTION (Synchronous, blocks immediately) |
| +-------------------+ +-------------------+ +-------------------+ |
| | Label Conflict | -> | If conflict found | -> | Block from training | |
| | Detector | | (5 types) | | dataset, alert admin | |
| | - Same face, diff | +-------------------+ +-------------------+ |
| | names | |
| | - Diff faces, same| |
| | name | |
| | - Merge circular | |
| | reference | |
| | - Name to already-| |
| | deleted person | |
| | - Quality below | |
| | threshold | |
| +-------------------+ |
| | |
| v |
| STEP 3: DATASET CURATION |
| +-------------------+ |
| | Training Dataset | - Collect approved examples |
| | Builder | - Balance classes (min 5 per person) |
| | | - Augmentation (flip, rotate, brightness) |
| | | - Quality filter (blur, pose, illumination) |
| | | - Train/val split (80/20) |
| +-------------------+ |
| | |
| v |
| STEP 4: MODEL TRAINING |
| +-------------------+ |
| | Training Job | - ArcFace R100 backbone |
| | (Airflow DAG) | - Fine-tuning on curated dataset |
| | | - Cosine annealing LR schedule |
| | | - Early stopping (patience=10) |
| | | - Mixed precision (AMP) |
| | | - Typical duration: 2-8 hours on V100 |
| +-------------------+ |
| | |
| v |
| STEP 5: QUALITY GATES |
| +-------------------+ +-------------------+ +-------------------+ |
| | Gate 1: Hold-out | -> | Gate 2: Compare | -> | Gate 3: Identity | |
| | evaluation | | vs current | | accuracy | |
| | (precision, | | production | | (100% known) | |
| | recall, f1) | | (no >2% regress)| | | |
| +-------------------+ +-------------------+ +-------------------+ |
| | | | |
| +------------+-------------+--------------------------+ |
| | |
| +----------+----------+ |
| | | |
| v v |
| ALL PASSED ANY FAILED |
| | | |
| v v |
| +------------+ +------------------+ |
| | Proceed to | | REJECT | |
| | Deployment | | - Log failure | |
| +------------+ | - Alert admin | |
| | - Keep in staging| |
| +------------------+ |
| |
| STEP 6: DEPLOYMENT |
| +-------------------+ |
| | A/B Testing | - Shadow mode: 0% traffic (validation) |
| | (gradual rollout) | - Canary: 5% traffic for 24h |
| | | - Monitor: latency, error rate, FP rate |
| | | - Full rollout: 100% traffic |
| | | - Rollback: < 60 seconds to previous version |
| +-------------------+ |
| |
+=============================================================================+
8.7 Model Versioning and Rollback
8.7.1 Semantic Versioning
| Version Component | Increment When | Example |
|---|---|---|
| MAJOR (X.0.0) | Full retraining, architecture change, breaking embedding change | 1.0.0 -> 2.0.0 (new backbone) |
| MINOR (x.Y.0) | Fine-tuning, significant new data (>50 new identities) | 1.0.0 -> 1.1.0 (new employees) |
| PATCH (x.y.Z) | Incremental update, centroid update, hotfix | 1.0.0 -> 1.0.1 (new photos added) |
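A small helper can make the increment rules explicit. The sketch below is an assumption-laden illustration: `classify_change` reduces the table to three inputs (backbone change, fine-tuning, count of new identities), and both function names are hypothetical.

```python
def classify_change(new_backbone, fine_tuned, new_identities):
    """Pick the version component to increment, following the table above."""
    if new_backbone:
        return "major"
    if fine_tuned or new_identities > 50:
        return "minor"
    return "patch"


def bump_version(version, change):
    """Return the next semantic version string for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```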
8.7.2 Version States
| State | Description | Transition |
|---|---|---|
| TRAINING | Model is being trained | Auto -> STAGING on completion |
| STAGING | Awaiting quality gate evaluation | Auto -> AWAITING_APPROVAL on pass |
| AWAITING_APPROVAL | Pending admin approval | Manual -> CANARY on approve |
| CANARY | 5% traffic, monitoring | Auto -> PRODUCTION on success (24h) |
| PRODUCTION | 100% traffic, active serving | Manual -> ARCHIVED on new version deploy |
| ARCHIVED | Kept for rollback, no traffic | Auto -> ROLLBACK_AVAILABLE after 30 days |
| ROLLBACK_AVAILABLE | Can be rolled back to | Manual -> PRODUCTION on rollback trigger |
| DEPRECATED | Cannot be rolled back to | Final state |
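The version states form a small state machine, which can be encoded as a transition map. The sketch below transcribes only the transitions listed in the table. The table does not say how a version reaches DEPRECATED, so no edge leads to it here. `can_transition` and `advance` are hypothetical helper names.

```python
# Allowed transitions, transcribed from the version-state table.
TRANSITIONS = {
    "TRAINING": {"STAGING"},
    "STAGING": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"CANARY"},
    "CANARY": {"PRODUCTION"},
    "PRODUCTION": {"ARCHIVED"},
    "ARCHIVED": {"ROLLBACK_AVAILABLE"},
    "ROLLBACK_AVAILABLE": {"PRODUCTION"},
    "DEPRECATED": set(),  # final state
}


def can_transition(current, target):
    """True if the table allows moving a model version from current to target."""
    return target in TRANSITIONS.get(current, set())


def advance(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```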
8.7.3 Rollback Procedure
+-----------------------------------------------------------------------------+
| EMERGENCY ROLLBACK PROCEDURE |
+-----------------------------------------------------------------------------+
| |
| Trigger: Admin initiates rollback or automatic rollback on failure |
| |
| Step 1: Validate target version exists and is in ROLLBACK_AVAILABLE state |
| Step 2: Load target model artifacts from S3/MinIO (pre-warm GPU) |
| Step 3: Atomic switch: update model reference in Triton config |
| Step 4: Reload model via Triton model repository load API (zero-downtime swap) |
| Step 5: Validate: send test inference requests, check latency |
| Step 6: If validation fails -> auto-revert to previous production |
| Step 7: If validation passes -> update database model version records |
| Step 8: Log rollback event in audit_logs |
| |
| Maximum rollback time: < 60 seconds |
| Zero inference downtime during rollback |
| |
+-----------------------------------------------------------------------------+
8.8 Quality Gates
8.8.1 Gate Thresholds
| Gate | Metric | Minimum | Maximum | Critical |
|---|---|---|---|---|
| Hold-out Evaluation | Precision | 0.97 | — | Yes (cannot override) |
| Hold-out Evaluation | Recall | 0.95 | — | Yes |
| Hold-out Evaluation | F1 Score | 0.96 | — | Yes |
| No Regression | Metric regression vs production | — | 2% | No (admin can override) |
| Identity Accuracy | Known identity recall | 100% | — | Yes |
| Latency | P99 inference latency | — | 150 ms | Yes |
| Confusion Analysis | False positive rate | — | 5% | No |
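The thresholds above can be evaluated mechanically. The following sketch assumes metrics arrive as a flat dictionary and that every non-critical gate may be waived by an admin override; the table states this only for the regression gate. Metric key names such as `known_identity_recall` are illustrative.

```python
# (metric key, "min" or "max", limit, critical), transcribed from the table above.
GATES = [
    ("precision", "min", 0.97, True),
    ("recall", "min", 0.95, True),
    ("f1_score", "min", 0.96, True),
    ("max_regression_pct", "max", 2.0, False),
    ("known_identity_recall", "min", 1.0, True),
    ("p99_latency_ms", "max", 150.0, True),
    ("false_positive_rate", "max", 0.05, False),
]


def evaluate_gates(metrics, admin_override=False):
    """Return (overall_result, failed_metric_names) for a candidate model."""
    failures = []
    for name, kind, limit, critical in GATES:
        value = metrics[name]
        passed = value >= limit if kind == "min" else value <= limit
        # Critical gates can never be overridden.
        if not passed and (critical or not admin_override):
            failures.append(name)
    return ("PASSED" if not failures else "REJECTED"), failures
```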
8.8.2 Quality Gate Report Example
{
"gate_run_id": "550e8400-e29b-41d4-a716-446655440000",
"candidate_model_version": "1.2.0",
"baseline_model_version": "1.1.0",
"timestamp": "2024-01-25T10:30:00Z",
"overall_result": "PASSED",
"gates": [
{
"name": "holdout_performance",
"status": "PASSED",
"critical": true,
"metrics": {
"precision": 0.9842,
"recall": 0.9678,
"f1_score": 0.9759
}
},
{
"name": "no_regression",
"status": "PASSED",
"metrics": {
"max_regression_pct": 0.8,
"per_metric": {
"precision": 0.003,
"recall": -0.008,
"f1_score": -0.002
}
}
},
{
"name": "known_identity_accuracy",
"status": "PASSED",
"metrics": {
"known_identities_tested": 142,
"perfect_accuracy": 142,
"accuracy_below_threshold": 0
}
},
{
"name": "latency_requirement",
"status": "PASSED",
"metrics": {
"p50_latency_ms": 45,
"p99_latency_ms": 128,
"threshold_ms": 150
}
}
]
}
8.8.3 Embedding Update Strategies
After a model passes quality gates and is deployed, the face embedding database must be updated. Five strategies are available:
| Strategy | When to Use | Duration | Impact |
|---|---|---|---|
| Centroid Update | Few new examples (<10 per identity), same model | Seconds | Update running mean only |
| Incremental Add | Many new examples (10-100 per identity), same model | Minutes | Add new embeddings, keep existing |
| Full Reindex | Model version changed, or >10% of identities updated | Hours | Recompute all embeddings |
| Merge and Update | Identity merge operation | Seconds | Weighted centroid merge |
| Rollback Reindex | Model rollback | Minutes | Restore previous embeddings |
Decision Matrix:
+-----------------------------------------------------------------------------+
| EMBEDDING UPDATE STRATEGY SELECTION |
+-----------------------------------------------------------------------------+
| |
| Model changed? |
| | |
| +-- YES -> FULL_REINDEX (required, embeddings are model-dependent) |
| | |
| NO -> What changed? |
| | |
| +-- Identity merge -> MERGE_AND_UPDATE |
| | |
| +-- Rollback -> ROLLBACK_REINDEX |
| | |
| +-- New examples? |
| | |
| +-- < 10 per identity, < 10% total -> CENTROID_UPDATE |
| | |
| +-- Otherwise -> INCREMENTAL_ADD |
| |
+-----------------------------------------------------------------------------+
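The decision tree translates directly into a function. The sketch below follows the tree's branch order. The parameter names are hypothetical. Note that the strategy table also lists "more than 10% of identities updated" as a reason for a full reindex, whereas the tree routes that case to INCREMENTAL_ADD; the code mirrors the tree.

```python
def select_update_strategy(
    model_changed,
    identity_merge=False,
    rollback=False,
    max_new_per_identity=0,
    fraction_identities_updated=0.0,
):
    """Pick an embedding update strategy following the decision tree above."""
    if model_changed:
        return "FULL_REINDEX"  # embeddings are model-dependent
    if identity_merge:
        return "MERGE_AND_UPDATE"
    if rollback:
        return "ROLLBACK_REINDEX"
    if max_new_per_identity < 10 and fraction_identities_updated < 0.10:
        return "CENTROID_UPDATE"
    return "INCREMENTAL_ADD"
```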
Reference: For complete model export commands, INT8 calibration scripts, performance benchmarks, and the full Python module structure, see ai_vision.md, Sections 10-14. For the complete training pipeline code, Airflow DAG definitions, and quality gate implementations, see training_system.md, Sections 5-10.
Section 9: Suspicious Activity Night-Mode Design
9.1 Overview
The suspicious activity detection system provides comprehensive behavioral analysis during night hours (22:00-06:00 by default) through 10 specialized detection modules. Each module operates on the output of the AI inference pipeline (detected persons, tracked positions, and face identities) to identify anomalous behavior patterns.
The system features a composite scoring engine that combines signals from all modules with exponential time-decay, enabling unified threat assessment and intelligent escalation. Each camera can be independently configured with custom zones, thresholds, and schedules.
9.2 Ten Detection Modules Summary
| # | Module | Description | Severity | Key CV Model |
|---|---|---|---|---|
| 1 | Intrusion Detection | Detects persons entering restricted polygon zones | HIGH (default) | YOLO11m detections + zone polygon |
| 2 | Loitering Detection | Flags persons dwelling in an area longer than threshold | MEDIUM (default) | ByteTrack + timer per track |
| 3 | Running Detection | Identifies abnormally fast movement | MEDIUM (default) | YOLOv8n-pose + optical flow speed |
| 4 | Crowding Detection | Alerts when group density exceeds threshold | HIGH (default) | DBSCAN spatial clustering |
| 5 | Fall Detection | Detects persons falling or collapsing | CRITICAL | YOLOv8n-pose keypoint analysis |
| 6 | Abandoned Object | Identifies unattended objects left behind | HIGH (default) | YOLOv8s + MOG2 background subtraction |
| 7 | After-Hours Presence | Detects any person presence during night hours | MEDIUM (default) | YOLO11m person class only |
| 8 | Zone Breach | Triggers on crossing virtual boundary lines | MEDIUM (default) | ByteTrack + line crossing algorithm |
| 9 | Repeated Re-entry | Flags patterns of entering/exiting an area multiple times | MEDIUM (default) | ByteTrack + entry/exit state machine |
| 10 | Suspicious Dwell Time | Alerts on extended presence near sensitive areas | MEDIUM (configurable) | ByteTrack + per-zone timers |
9.3 Module Details
9.3.1 Module 1: Intrusion Detection
Detects when a person enters a user-defined restricted polygon zone.
| Parameter | Default | Range | Description |
|---|---|---|---|
| confidence_threshold | 0.55 | 0.3-0.9 | Minimum person detection confidence |
| overlap_threshold | 0.30 | 0.1-0.9 | Min IoU between person bbox and zone |
| cooldown_seconds | 60 | 0-3600 | Cooldown before re-alerting same zone |
| zone_severity | HIGH | LOW/MEDIUM/HIGH | Per-zone configurable |
Algorithm:
For each detected person:
    For each restricted zone polygon:
        Compute IoU(person_bbox, zone_polygon)
        If IoU > overlap_threshold AND confidence > confidence_threshold:
            If zone not in cooldown:
                Trigger INTRUSION alert
                Start cooldown timer
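A simplified sketch of this loop is shown below. To stay dependency-free it assumes zones are axis-aligned rectangles rather than arbitrary polygons, so the IoU is box-to-box; a polygon implementation would need a geometry library. `IntrusionDetector` and its method names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IntrusionDetector:
    def __init__(self, zones, confidence_threshold=0.55,
                 overlap_threshold=0.30, cooldown_seconds=60):
        self.zones = zones  # zone_id -> (x1, y1, x2, y2); rectangles are an assumption
        self.confidence_threshold = confidence_threshold
        self.overlap_threshold = overlap_threshold
        self.cooldown_seconds = cooldown_seconds
        self._last_alert = {}  # zone_id -> timestamp of the last alert

    def process(self, detections, now):
        """detections: list of (bbox, confidence). Returns zone ids that alert."""
        alerts = []
        for bbox, confidence in detections:
            if confidence <= self.confidence_threshold:
                continue
            for zone_id, zone_box in self.zones.items():
                last = self._last_alert.get(zone_id)
                if last is not None and now - last < self.cooldown_seconds:
                    continue  # zone is still in cooldown
                if box_iou(bbox, zone_box) > self.overlap_threshold:
                    self._last_alert[zone_id] = now  # start cooldown timer
                    alerts.append(zone_id)
        return alerts
```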
9.3.2 Module 2: Loitering Detection
Flags persons who remain in an area longer than a threshold.
| Parameter | Default | Range | Description |
|---|---|---|---|
| dwell_time_threshold_seconds | 300 | 30-1800 | Time before triggering loitering alert |
| movement_tolerance_pixels | 50 | 10-200 | Max centroid movement to still count as "stationary" |
| cooldown_seconds | 300 | 0-3600 | Cooldown after alert |
Algorithm:
For each active track:
    If track centroid moved < tolerance in last N seconds:
        Increment dwell timer
        If dwell_timer > threshold:
            Trigger LOITERING alert
            Reset timer (or hold until movement detected)
    Else:
        Reset dwell timer
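One way to realize the dwell timer is to remember where and when each track became stationary. The sketch below makes two assumptions: movement is measured against the position at which the timer started, not over a sliding window, and the timer resets after an alert (the first of the two options in the algorithm). `LoiteringTracker` is a hypothetical name.

```python
import math


class LoiteringTracker:
    def __init__(self, dwell_time_threshold_seconds=300, movement_tolerance_pixels=50):
        self.threshold = dwell_time_threshold_seconds
        self.tolerance = movement_tolerance_pixels
        self._anchor = {}  # track_id -> (x, y, timestamp when dwell started)

    def update(self, track_id, centroid, now):
        """Return True when the track has stayed within tolerance past the threshold."""
        anchor = self._anchor.get(track_id)
        if anchor is None or math.dist(anchor[:2], centroid) >= self.tolerance:
            # First sighting, or the person moved: restart the dwell timer here.
            self._anchor[track_id] = (centroid[0], centroid[1], now)
            return False
        if now - anchor[2] > self.threshold:
            # Alert, then reset the timer at the current position.
            self._anchor[track_id] = (centroid[0], centroid[1], now)
            return True
        return False
```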
9.3.3 Module 3: Running Detection
Identifies abnormally fast movement using pose keypoints and optical flow.
| Parameter | Default | Range | Description |
|---|---|---|---|
| speed_threshold_pixels_per_second | 150 | 50-500 | Pixel speed threshold |
| speed_threshold_kmh | 15.0 | 5-40 | Real-world speed (requires calibration) |
| confirmation_frames | 3 | 1-10 | Consecutive frames to confirm running |
Algorithm:
For each active track:
    Compute torso keypoint displacement between frames
    Convert pixel speed to km/h (if calibration available)
    Apply Farneback optical flow for refinement
    If speed > threshold for confirmation_frames:
        Trigger RUNNING alert
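The pixel-speed and confirmation logic can be sketched without the optical-flow refinement or the km/h calibration, both of which are omitted here. `RunningDetector` is a hypothetical name, and the torso point is assumed to be supplied by the pose model.

```python
import math


class RunningDetector:
    def __init__(self, speed_threshold_pixels_per_second=150.0, confirmation_frames=3):
        self.speed_threshold = speed_threshold_pixels_per_second
        self.confirmation_frames = confirmation_frames
        self._last = {}    # track_id -> (x, y, timestamp)
        self._streak = {}  # track_id -> consecutive frames above the threshold

    def update(self, track_id, torso_point, now):
        """Return True once the track has been fast for confirmation_frames in a row."""
        prev = self._last.get(track_id)
        self._last[track_id] = (torso_point[0], torso_point[1], now)
        if prev is None or now <= prev[2]:
            return False  # need two samples with increasing timestamps
        speed = math.dist(prev[:2], torso_point) / (now - prev[2])
        if speed > self.speed_threshold:
            self._streak[track_id] = self._streak.get(track_id, 0) + 1
        else:
            self._streak[track_id] = 0
        return self._streak[track_id] >= self.confirmation_frames
```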
9.3.4 Module 4: Crowding Detection
Alerts when person group density exceeds threshold.
| Parameter | Default | Range | Description |
|---|---|---|---|
| count_threshold | 5 | 2-50 | Minimum person count in cluster |
| area_threshold | 0.15 | 0.05-0.5 | Fraction of frame covered by group |
| density_threshold | 0.05 | 0.01-0.2 | Persons per square meter (calibrated) |
| dbscan_eps | 0.08 | 0.01-0.3 | DBSCAN neighborhood radius (normalized) |
Algorithm:
Collect all person centroids in current frame
Run DBSCAN(eps=0.08, min_samples=2) on centroids
For each cluster:
    If cluster_size >= count_threshold OR cluster_area >= area_threshold:
        Trigger CROWDING alert
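With min_samples=2, every point that has at least one neighbour within eps is a core point, so the DBSCAN clusters reduce to connected components of size two or more. The dependency-free sketch below uses that equivalence in place of a library call. It checks only the count condition; the area condition is left out. Function names are hypothetical, and centroids are assumed to be normalized to [0, 1].

```python
import math


def cluster_centroids(points, eps=0.08):
    """Group points linked by chains of neighbours within eps (clusters of size >= 2)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [
                i for i in unvisited
                if math.dist(points[current], points[i]) <= eps
            ]
            for i in neighbours:
                unvisited.remove(i)
            component.extend(neighbours)
            frontier.extend(neighbours)
        if len(component) >= 2:  # isolated points are noise
            clusters.append(sorted(component))
    return clusters


def crowding_alerts(points, count_threshold=5, eps=0.08):
    """Return the clusters (as index lists) that meet the person-count threshold."""
    return [c for c in cluster_centroids(points, eps) if len(c) >= count_threshold]
```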
9.3.5 Module 5: Fall Detection
Detects persons falling or collapsing using pose keypoint analysis.
| Parameter | Default | Range | Description |
|---|---|---|---|
| fall_score_threshold | 0.75 | 0.5-0.95 | Combined fall confidence score |
| min_keypoint_confidence | 0.30 | 0.1-0.5 | Minimum keypoint detection confidence |
| torso_angle_threshold_deg | 45 | 30-75 | Torso angle from vertical to trigger |
| aspect_ratio_threshold | 1.2 | 0.8-2.0 | Width/height ratio of person bbox |
| temporal_confirmation_ms | 1000 | 500-3000 | Duration to confirm fall (not just bend) |
Algorithm:
For each detected person with pose keypoints:
    Compute torso angle from vertical (using shoulder-hip line)
    Compute bbox aspect ratio
    Check if person is on ground (feet keypoint confidence drops)
    Calculate fall_score = weighted_combination(angle, aspect_ratio, ground_contact)
    If fall_score > threshold AND duration > confirmation_ms:
        Trigger FALL alert (CRITICAL severity)
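Two of the geometric cues can be computed directly from keypoints and the bounding box. The sketch below covers only the torso angle and the aspect ratio. It does not compute fall_score, because the weights of the combination are not given here. Function names are hypothetical, and image coordinates are assumed to have y increasing downward.

```python
import math


def torso_angle_deg(shoulder_mid, hip_mid):
    """Angle between the shoulder-hip line and the image vertical, in degrees."""
    dx = shoulder_mid[0] - hip_mid[0]
    dy = shoulder_mid[1] - hip_mid[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def fall_cues(shoulder_mid, hip_mid, bbox,
              torso_angle_threshold_deg=45.0, aspect_ratio_threshold=1.2):
    """Return the two geometric cues as booleans for one person."""
    x1, y1, x2, y2 = bbox
    height = y2 - y1
    aspect_ratio = (x2 - x1) / height if height > 0 else float("inf")
    return {
        "torso_tilted": torso_angle_deg(shoulder_mid, hip_mid) > torso_angle_threshold_deg,
        "bbox_wide": aspect_ratio > aspect_ratio_threshold,
    }
```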
9.3.6 Module 6: Abandoned Object Detection
Identifies unattended objects using background subtraction and object detection.
| Parameter | Default | Range | Description |
|---|---|---|---|
| unattended_time_threshold_seconds | 60 | 10-600 | Time before object is considered abandoned |
| proximity_threshold_pixels | 100 | 20-300 | Max distance from owner before "unattended" |
| watchlist_classes | ["backpack", "suitcase", "box", "bag"] | — | Object classes to monitor |
| bg_learning_rate | 0.005 | 0.001-0.01 | MOG2 background model learning rate |
Algorithm:
Run YOLOv8s to detect objects in watchlist_classes
Run MOG2 background subtraction to identify static foreground
For each detected object:
    Track owner proximity (nearest person)
    If owner distance > threshold AND object stationary > time_threshold:
        Trigger ABANDONED_OBJECT alert
9.3.7 Module 7: After-Hours Presence
Simple but effective: any person detected during night hours triggers an alert.
| Parameter | Default | Range | Description |
|---|---|---|---|
| detection_confidence_threshold | 0.50 | 0.3-0.9 | Minimum person detection confidence |
| min_detection_frames | 5 | 1-30 | Frames to confirm (avoid false positives) |
| check_authorized_personnel | false | true/false | If true, check against known persons whitelist |
9.3.8 Module 8: Zone Breach
Detects crossing of virtual boundary lines (directional or bidirectional).
| Parameter | Default | Range | Description |
|---|---|---|---|
| boundary_lines | [] (user-defined) | — | Array of {start, end, direction, severity} |
| allowed_direction | "both" | both/a_to_b/b_to_a | Which direction is allowed |
| crossing_threshold_pixels | 20 | 5-100 | Min distance past line to trigger |
| cooldown_seconds | 30 | 0-3600 | Cooldown per (track, line) pair |
Algorithm:
For each active track:
    For each boundary line:
        Check if track centroid crosses line in forbidden direction
        Using line equation: ax + by + c = 0, check sign change
        If crossed AND distance_past_line > threshold:
            Trigger ZONE_BREACH alert
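The sign-change test can be written with a 2-D cross product, which is the line equation ax + by + c = 0 evaluated at the point. In the sketch below the direction labels are an assumption: a crossing from the positive side to the negative side is called "a_to_b". The boundary is treated as an infinite line, not a segment. Function names are hypothetical.

```python
import math


def side_of_line(point, start, end):
    """Signed value: positive on one side of the directed line start->end, negative on the other."""
    return (end[0] - start[0]) * (point[1] - start[1]) - (end[1] - start[1]) * (point[0] - start[0])


def distance_to_line(point, start, end):
    """Perpendicular distance in pixels from point to the line through start and end."""
    length = math.hypot(end[0] - start[0], end[1] - start[1])
    return abs(side_of_line(point, start, end)) / length


def crossed_line(prev_point, curr_point, start, end, crossing_threshold_pixels=20.0):
    """Return "a_to_b", "b_to_a", or None if no confirmed crossing occurred."""
    before = side_of_line(prev_point, start, end)
    after = side_of_line(curr_point, start, end)
    if before * after >= 0:
        return None  # no sign change, so the centroid stayed on one side
    if distance_to_line(curr_point, start, end) < crossing_threshold_pixels:
        return None  # not far enough past the line yet
    return "a_to_b" if before > 0 else "b_to_a"
```

A caller would compare the returned direction with `allowed_direction` and trigger ZONE_BREACH only for a forbidden direction.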
9.3.9 Module 9: Repeated Re-entry Patterns
Detects suspicious patterns of entering and exiting an area multiple times.
| Parameter | Default | Range | Description |
|---|---|---|---|
| reentry_zone | Full frame | polygon | Area to monitor for entries/exits |
| time_window_seconds | 600 | 60-3600 | Time window for counting cycles |
| reentry_threshold | 3 | 2-10 | Min entry/exit cycles to trigger |
| min_cycle_duration_seconds | 30 | 5-300 | Min duration of one cycle |
State Machine:
For each track:
    Track state: OUTSIDE -> ENTERING -> INSIDE -> EXITING -> OUTSIDE
    Each complete cycle (entry + exit) increments counter
    If cycle_count >= threshold within time_window:
        Trigger REENTRY_PATTERN alert
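A compact version of this state machine is sketched below, with one instance per track. It collapses the transitional ENTERING and EXITING states into the inside/outside edge, and it interprets the cycle duration as the time spent inside the zone. Both are simplifying assumptions. `ReentryCounter` is a hypothetical name.

```python
from collections import deque


class ReentryCounter:
    def __init__(self, time_window_seconds=600, reentry_threshold=3,
                 min_cycle_duration_seconds=30):
        self.window = time_window_seconds
        self.threshold = reentry_threshold
        self.min_cycle = min_cycle_duration_seconds
        self._inside = False
        self._entered_at = None
        self._cycles = deque()  # exit timestamps of completed cycles

    def update(self, inside, now):
        """Feed one observation; return True while the re-entry pattern is active."""
        if inside and not self._inside:
            self._entered_at = now  # OUTSIDE -> INSIDE
        elif not inside and self._inside:
            # INSIDE -> OUTSIDE: count the cycle only if it lasted long enough.
            if now - self._entered_at >= self.min_cycle:
                self._cycles.append(now)
        self._inside = inside
        # Drop cycles that have fallen out of the time window.
        while self._cycles and now - self._cycles[0] > self.window:
            self._cycles.popleft()
        return len(self._cycles) >= self.threshold
```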
9.3.10 Module 10: Suspicious Dwell Time
Extended presence near sensitive areas (different from general loitering).
| Parameter | Default | Range | Description |
|---|---|---|---|
| sensitive_zones | [] (user-defined) | — | Zones with custom dwell thresholds |
| default_dwell_threshold_seconds | 120 | 10-1800 | Default threshold |
| max_gap_seconds | 5.0 | 1.0-30.0 | Max disappearance gap before timer reset |
Predefined zone types with default thresholds:
| Zone Type | Default Threshold | Default Severity |
|---|---|---|
| main_entrance | 60s | MEDIUM |
| emergency_exit | 30s | HIGH |
| equipment_room | 45s | HIGH |
| storage_area | 120s | MEDIUM |
| elevator_bank | 90s | LOW |
| parking_access | 60s | MEDIUM |
9.4 Activity Scoring Engine
9.4.1 Composite Score Formula
All 10 modules feed into a unified scoring engine that produces a single suspicious activity score per camera:
S_total(t) = SUM_i( weight_i * signal_i(t) * decay(t - t_i) ) + bonus_cross_module
Where:
weight_i: module-specific weight (see table below)
signal_i(t): normalized signal value from module i [0, 1]
decay(delta_t): exponential time-decay function
bonus_cross_module: extra score when multiple modules fire simultaneously
t_i: timestamp of most recent event from module i
9.4.2 Module Weights
| Module | Weight | Signal Source | Signal Range |
|---|---|---|---|
| Intrusion Detection | 0.25 | overlap_ratio * confidence | 0.0 - 1.0 |
| Loitering Detection | 0.15 | dwell_ratio (dwell_time / threshold) | 0.0 - 1.0+ |
| Running Detection | 0.10 | speed_ratio normalized | 0.0 - 1.0+ |
| Crowding Detection | 0.12 | crowd_density_score | 0.0 - 1.0 |
| Fall Detection | 0.20 | fall_confidence_score | 0.0 - 1.0 |
| Abandoned Object | 0.18 | unattended_ratio (duration / threshold) | 0.0 - 1.0+ |
| After-Hours Presence | 0.05 | binary (1 if detected) * zone_severity_multiplier | 0.0 - 1.0 |
| Zone Breach | 0.12 | severity_mapped (LOW=0.3, MED=0.6, HIGH=1.0) | 0.0 - 1.0 |
| Re-entry Patterns | 0.10 | cycle_ratio (count / threshold) | 0.0 - 1.0+ |
| Suspicious Dwell | 0.13 | dwell_ratio (duration / zone_threshold) | 0.0 - 1.0+ |
Note: Weights sum to 1.40 — this is intentional to allow cross-module amplification when multiple modules fire simultaneously.
9.4.3 Time-Decay Function
import math

def time_decay(delta_t_seconds, half_life=300):
    """Exponential decay with 5-minute half-life by default."""
    # ln(2) / half_life is the decay rate that halves the value every half_life seconds
    return math.exp(-math.log(2) * delta_t_seconds / half_life)

# Decay reference:
#  0 min -> 1.000 (full contribution)
#  1 min -> 0.871
#  5 min -> 0.500
# 10 min -> 0.250
# 20 min -> 0.063
# 30 min -> 0.016 (effectively zero)
9.4.4 Cross-Module Amplification Bonus
When multiple modules detect simultaneously for the same track or in close proximity:
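from collections import Counter

def compute_cross_module_bonus(active_signals, proximity_weight=0.15):
    """Bonus for simultaneous detections.

    Assumes each signal is a dict that may carry 'track_id' and 'zone_id' keys.
    """
    n_modules = len(active_signals)
    if n_modules <= 1:
        return 0.0
    # Base bonus: +0.15 per additional module
    base_bonus = proximity_weight * (n_modules - 1)
    # Size of the largest group of signals sharing one track / one zone
    tracks = Counter(s['track_id'] for s in active_signals if s.get('track_id') is not None)
    zones = Counter(s['zone_id'] for s in active_signals if s.get('zone_id') is not None)
    n_same_track_signals = max(tracks.values(), default=0)
    n_same_zone_signals = max(zones.values(), default=0)
    # Track overlap: same person triggering multiple rules -> higher threat
    track_bonus = 0.10 * (n_same_track_signals - 1) if n_same_track_signals >= 2 else 0.0
    # Zone overlap: multiple signals in same zone -> higher threat
    zone_bonus = 0.08 * (n_same_zone_signals - 1) if n_same_zone_signals >= 2 else 0.0
    return min(base_bonus + track_bonus + zone_bonus, 0.50)  # Cap at +0.50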
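To show how the weights, decay, and bonus combine, the formula in 9.4.1 can be evaluated directly. The sketch below is illustrative only: the dictionary keys are hypothetical identifiers for the ten modules, the event format `{module: (signal, t_i)}` is an assumption, and the bonus is passed in as a precomputed number.

```python
# Weights copied from the table in 9.4.2; the key names are hypothetical
WEIGHTS = {
    "intrusion": 0.25, "loitering": 0.15, "running": 0.10, "crowding": 0.12,
    "fall": 0.20, "abandoned_object": 0.18, "after_hours": 0.05,
    "zone_breach": 0.12, "reentry": 0.10, "suspicious_dwell": 0.13,
}


def composite_score(events, now, bonus=0.0, half_life=300.0):
    """Evaluate S_total(t) for one camera.

    events maps module name -> (signal value, timestamp of its latest event).
    Timestamps and `now` are in seconds.
    """
    total = 0.0
    for module, (signal, t_i) in events.items():
        decay = 0.5 ** ((now - t_i) / half_life)  # same curve as exp(-ln2 * dt / half_life)
        total += WEIGHTS[module] * signal * decay
    return total + bonus


# Intrusion firing right now plus a fall detected one half-life (5 minutes) ago
score = composite_score({"intrusion": (1.0, 600.0), "fall": (1.0, 300.0)}, now=600.0)
```

Here the intrusion contributes its full 0.25 and the older fall contributes half of 0.20, so the score before any bonus is 0.35.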
9.4.5 Escalation Thresholds
| Score Range | Threat Level | Color | Actions |
|---|---|---|---|
| 0.00 - 0.20 | NONE | Gray | Log only, no alert |
| 0.20 - 0.40 | LOW | Blue | Log + dashboard indicator |
| 0.40 - 0.60 | MEDIUM | Yellow | Log + non-urgent alert dispatch |
| 0.60 - 0.80 | HIGH | Orange | Log + immediate alert + highlight |
| 0.80 - 1.00 | CRITICAL | Red | Log + all channels + security dispatch recommendation |
| > 1.00 | EMERGENCY | Purple/Flashing | All channels + automatic escalation to security lead |
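The ranges in the table share their boundary values (0.20 appears in both NONE and LOW, for example), so an implementation has to pick one side. The sketch below assumes a score exactly on a boundary belongs to the higher level and that EMERGENCY applies only above 1.00; the function name is hypothetical.

```python
import bisect

_BOUNDS = [0.20, 0.40, 0.60, 0.80]
_LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def threat_level(score):
    """Map a composite score to a threat level using half-open intervals."""
    if score > 1.00:
        return "EMERGENCY"
    # bisect_right places a boundary value into the higher bucket (assumed convention)
    return _LEVELS[bisect.bisect_right(_BOUNDS, score)]
```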
9.5 Night Mode Scheduler
9.5.1 Automatic Schedule
| Parameter | Default | Configurable |
|---|---|---|
| Start time | 22:00 (10 PM) | Yes, per camera |
| End time | 06:00 (6 AM) | Yes, per camera |
| Gradual transition | 15 minutes | Yes (0-60 min) |
| Timezone | Local site timezone | Yes |
| Override | Manual toggle available | Admin only |
9.5.2 Gradual Transition
During the 15-minute transition window (21:45 to 22:00 with the default schedule), sensitivity ramps linearly from 0% to 100%:
Transition Start (21:45) Night Full (22:00) Transition End (22:15)
| | |
v v v
Sensitivity: 0% ---- 25% ---- 50% ---- 75% ---- 100% ---- 100% ---- 100%
|__________|__________|__________|__________|__________|
Ramp up to full night sensitivity over 15 minutes
This prevents sudden spikes in alerts when night mode activates.
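A linear ramp of this kind can be computed from the local wall-clock time. The sketch below is a minimal illustration under two assumptions: the ramp occupies the transition_minutes immediately before the configured start time, and sensitivity drops straight back to 0 at the end time (the ramp-down behaviour is not specified here). The function name is hypothetical.

```python
from datetime import time


def _minutes(t):
    return t.hour * 60 + t.minute + t.second / 60.0


def night_sensitivity(now, start=time(22, 0), end=time(6, 0), transition_minutes=15):
    """Return night-mode sensitivity in [0.0, 1.0] for a local wall-clock time."""
    day = 24 * 60
    since_start = (_minutes(now) - _minutes(start)) % day   # minutes after night start
    night_length = (_minutes(end) - _minutes(start)) % day  # handles windows crossing midnight
    if since_start < night_length:
        return 1.0                                          # inside the night window
    until_start = day - since_start                         # minutes before night start
    if transition_minutes > 0 and until_start <= transition_minutes:
        return 1.0 - until_start / transition_minutes       # linear ramp-up
    return 0.0
```

With the defaults, 21:45 maps to 0.0, 21:52:30 to 0.5, and any time from 22:00 up to (but not including) 06:00 to 1.0.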
9.5.3 Night Mode Behavior Changes
| Aspect | Day Mode | Night Mode |
|---|---|---|
| Detection modules | Intrusion, Crowding, Fall, Abandoned Object | All 10 modules active |
| AI Vibe preset | Per-camera setting | Automatically Strict |
| Confidence threshold | Per-camera setting | +0.10 (stricter) |
| Scoring engine weights | Standard weights | +25% intrusion, +20% fall |
| Alert suppression | 5-minute cooldown | 2-minute cooldown (faster alerts) |
| After-hours detection | Disabled | Enabled (primary night function) |
9.6 Per-Camera Configuration
Each camera has independent configuration for all detection modules:
# Example: Camera 1 - Main Entrance
cam_01:
  enabled: true
  location: "Main Entrance Lobby"
  night_mode:
    enabled: true
    custom_schedule: null        # Use system default (22:00-06:00)
    sensitivity_multiplier: 1.0  # Standard sensitivity
  intrusion_detection:
    enabled: true
    confidence_threshold: 0.65
    overlap_threshold: 0.30
    cooldown_seconds: 30
    restricted_zones:
      - zone_id: "server_room_door"
        polygon: [[0.65,0.20], [0.85,0.20], [0.85,0.60], [0.65,0.60]]
        severity: "HIGH"
  loitering_detection:
    enabled: true
    dwell_time_threshold_seconds: 300
    movement_tolerance_pixels: 50
  running_detection:
    enabled: true
    speed_threshold_pixels_per_second: 150
    confirmation_frames: 3
  fall_detection:
    enabled: true
    fall_score_threshold: 0.75
    temporal_confirmation_ms: 1000
  # ... (all 10 modules configured)
9.7 Alert Generation Logic
9.7.1 Alert Lifecycle
+------------+ +------------+ +------------+ +------------+
| DETECTED | -> | SUPPRESSED | -> | EVIDENCE | -> | DISPATCHED |
| (Rule fire)| | (Dedup) | | (Capture) | | (Send) |
+------------+ +------------+ +------------+ +------------+
|
v
+------------+
| ACKNOWLEDGE|
| or AUTO |
+------------+
9.7.2 Suppression Rules
| Condition | Action | Reason |
|---|---|---|
| Duplicate within suppression window | Log + increment counter | Prevent alert spam |
| Detection confidence < rule minimum | Log only | Insufficient evidence |
| Threat score < LOW threshold | Log only | Below alert threshold |
| Max alerts/hour for camera exceeded | Log + rate-limit flag | Prevent overflow |
| Composite score indicates low overall threat | Log + dashboard only | Reduce noise |
9.7.3 Suppression Configuration
| Parameter | Default | Range |
|---|---|---|
| Default suppression window | 5 minutes | 0-60 minutes |
| Max alerts per hour per camera | 20 | 5-100 |
| Max alerts per hour per rule | 10 | 5-50 |
| Evidence snapshot frames before | 5 frames | 1-30 frames |
| Evidence snapshot frames after | 10 frames | 1-30 frames |
| Evidence clip duration | 10 seconds | 5-60 seconds |
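The suppression window and the per-camera hourly cap can be combined in one decision step. The following is a sketch under stated assumptions: the class and method names are hypothetical, the duplicate key is taken to be (camera, rule, track), and only the per-camera cap is modelled (the per-rule cap would follow the same pattern).

```python
from collections import defaultdict, deque


class AlertSuppressor:
    """Decides whether an alert is dispatched, deduplicated, or rate-limited."""

    def __init__(self, window_seconds=300.0, max_per_hour_per_camera=20):
        self.window_seconds = window_seconds
        self.max_per_hour = max_per_hour_per_camera
        self._last_sent = {}                   # (camera, rule, track) -> last dispatch time
        self._sent_times = defaultdict(deque)  # camera -> dispatch times in the last hour

    def decide(self, camera_id, rule, track_id, now):
        """Return 'DISPATCH', 'DUPLICATE' or 'RATE_LIMITED' for an event at time `now`."""
        key = (camera_id, rule, track_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return "DUPLICATE"
        recent = self._sent_times[camera_id]
        while recent and now - recent[0] >= 3600.0:
            recent.popleft()                   # forget dispatches older than one hour
        if len(recent) >= self.max_per_hour:
            return "RATE_LIMITED"
        self._last_sent[key] = now
        recent.append(now)
        return "DISPATCH"
```

Checking for duplicates before the rate limit means a repeated event never consumes part of the camera's hourly budget.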
9.7.4 Severity Assignment
Final alert severity considers both the triggering module and the composite score context:
def assign_alert_severity(detection_event, composite_score):
    base_severity = detection_event['severity']  # From module config
    severity_levels = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 3, 'CRITICAL': 4}
    base_level = severity_levels.get(base_severity, 2)

    # Escalation: high composite score bumps severity up one level
    if composite_score >= 0.80 and base_level < 3:
        base_level = min(base_level + 1, 4)

    # Escalation: multiple concurrent detections for same track
    if detection_event.get('concurrent_detections_count', 0) >= 2:
        base_level = min(base_level + 1, 4)

    # Zone-specific escalation override
    if detection_event.get('zone_severity_override'):
        zone_level = severity_levels.get(detection_event['zone_severity_override'], base_level)
        base_level = max(base_level, zone_level)

    reverse_levels = {v: k for k, v in severity_levels.items()}
    return reverse_levels.get(base_level, 'MEDIUM')
9.8 Integration with Main AI Pipeline
The suspicious activity service consumes detection events from the main AI pipeline:
+-----------------------------------------------------------------------------+
| SUSPICIOUS ACTIVITY INTEGRATION WITH MAIN PIPELINE |
+-----------------------------------------------------------------------------+
| |
| Main AI Pipeline Output: |
| { person_id, track_id, bbox, keypoints, face_embedding, timestamp, |
| camera_id, confidence, face_crop_path } |
| | |
| v |
| +-------------------+ +-------------------+ +-------------------+ |
| | Kafka Topic | -> | Suspicious Activity| -> | Scoring Engine | |
| | ai.detections | | Service | | (per camera) | |
| | (JSON events) | | - 10 modules | | - Composite score | |
| +-------------------+ | - Per-camera config| | - Time decay | |
| | - Zone polygons | | - Cross-module | |
| +-------------------+ | bonus | |
| +---------+---------+ |
| | |
| v |
| +-------------------+ +-------------------+ |
| | Alert Manager | <- | Scoring Output | |
| | - Deduplicate | | - Score [0, 1.5] | |
| | - Rate limit | | - Threat level | |
| | - Severity assign | | - Active signals | |
| +---------+---------+ +-------------------+ |
| | |
| v |
| +-------------------+ |
| | Alerts Table (DB) | |
| | Notification Svc | |
| +-------------------+ |
| |
+-----------------------------------------------------------------------------+
Key integration points:
- Suspicious Activity Service is a Kafka consumer on the ai.detections topic
- Processes events after face recognition (has access to person identity)
- Produces alert records to the alerts.critical topic for notification dispatch
- Updates the composite score in Redis (with TTL = 2 * half_life) for dashboard real-time display
- Stores all alert records in PostgreSQL for history and analytics
Reference: For complete detection algorithm pseudocode, zone configuration YAML schema, scoring engine implementation, and evidence capture logic, see suspicious_activity.md, Sections 2-6.
Section 10: Live Video Streaming Design
10.1 RTSP Stream Configuration for CP PLUS DVR
10.1.1 URL Format
The CP PLUS ORANGE DVR uses a Dahua-compatible RTSP URL scheme:
rtsp://admin:{password}@{dvr_ip}:554/cam/realmonitor?channel={N}&subtype={M}
Where:
N = channel number (1-8)
M = stream type (0 = main stream, 1 = sub stream)
Example URLs for all 8 channels:
| Channel | Main Stream | Sub Stream |
|---|---|---|
| CH1 | rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0 | rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=1 |
| CH2 | ...channel=2&subtype=0 | ...channel=2&subtype=1 |
| CH3 | ...channel=3&subtype=0 | ...channel=3&subtype=1 |
| CH4 | ...channel=4&subtype=0 | ...channel=4&subtype=1 |
| CH5 | ...channel=5&subtype=0 | ...channel=5&subtype=1 |
| CH6 | ...channel=6&subtype=0 | ...channel=6&subtype=1 |
| CH7 | ...channel=7&subtype=0 | ...channel=7&subtype=1 |
| CH8 | ...channel=8&subtype=0 | ...channel=8&subtype=1 |
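Because the URL pattern is regular, the sixteen URLs can be generated rather than hand-written. The helper below is a sketch; its name and default arguments are assumptions, and it percent-encodes the credentials so that a password containing characters such as @ or : does not break URL parsing.

```python
from urllib.parse import quote


def rtsp_url(channel, subtype, password, host="192.168.29.200", user="admin", port=554):
    """Build the Dahua-style RTSP URL for one DVR channel and stream type."""
    if not 1 <= channel <= 8:
        raise ValueError("channel must be in 1..8")
    if subtype not in (0, 1):
        raise ValueError("subtype must be 0 (main) or 1 (sub)")
    # Percent-encode credentials; safe='' also encodes '/', ':' and '@'
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{credentials}@{host}:{port}/cam/realmonitor?channel={channel}&subtype={subtype}"


# Main and sub stream URLs for all 8 channels
all_urls = {(ch, st): rtsp_url(ch, st, "password") for ch in range(1, 9) for st in (0, 1)}
```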
10.1.2 Stream Properties
| Property | Main Stream (subtype=0) | Sub Stream (subtype=1) |
|---|---|---|
| Resolution | 960 x 1080 | 352 x 288 to 704 x 576 |
| Frame rate | 25 FPS (PAL) | 25 FPS |
| Video codec | H.264 High Profile | H.264 Baseline/Main |
| Bitrate | ~4 Mbps per channel | ~1 Mbps per channel |
| Audio | G.711/AAC (optional) | None |
| Use case | Fullscreen viewing, evidence clips | AI inference, multi-camera grid |
10.1.3 Stream Discovery
The edge gateway can auto-discover streams via ONVIF:
from onvif import ONVIFCamera

camera = ONVIFCamera('192.168.29.200', 80, 'admin', 'password')
media_service = camera.create_media_service()
profiles = media_service.GetProfiles()

for profile in profiles:
    # ONVIF expects the stream type 'RTP-Unicast' and a nested Transport.Protocol field
    stream_uri = media_service.GetStreamUri({
        'StreamSetup': {'Stream': 'RTP-Unicast', 'Transport': {'Protocol': 'RTSP'}},
        'ProfileToken': profile.token
    })
    print(f"Channel: {profile.token}, URI: {stream_uri.Uri}")
10.2 Edge Gateway Stream Handling
10.2.1 FFmpeg Ingestion Pipeline
The edge gateway runs one FFmpeg process per camera stream:
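# Main stream: HLS generation for live viewing
# Note: FFmpeg 5.0 and later dropped the RTSP -stimeout option; use -timeout (also in microseconds) on those builds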
ffmpeg -hide_banner -loglevel warning \
-rtsp_transport tcp -stimeout 5000000 \
-fflags +genpts+discardcorrupt+igndts+ignidx \
-reorder_queue_size 64 -buffer_size 655360 \
-i "rtsp://admin:password@192.168.29.200:554/cam/realmonitor?channel=1&subtype=0" \
-c:v copy -c:a copy \
-f hls -hls_time 2 -hls_list_size 5 -hls_delete_threshold 2 \
-hls_flags delete_segments+omit_endlist+program_date_time \
-hls_segment_filename "/data/hls/ch1_%04d.ts" \
"/data/hls/ch1.m3u8" \
2>> /var/log/ffmpeg_ch1.log
10.2.2 Stream Health Monitoring
| Check | Frequency | Failure Action |
|---|---|---|
| FFmpeg process alive | Every 5s | Restart process |
| RTSP connection health | Every 10s | Reconnect with backoff |
| Frame rate validation | Every 30s | Alert if FPS < 20 |
| Bitrate validation | Every 30s | Alert if bitrate < 50% expected |
| Disk space check | Every 60s | Alert if < 10% free, emergency if < 5% |
10.2.3 Auto-Reconnect Logic
import random

class StreamReconnectManager:
    """Handles RTSP stream reconnection with exponential backoff."""
    INITIAL_BACKOFF = 1.0      # seconds
    MAX_BACKOFF = 60.0         # seconds
    BACKOFF_MULTIPLIER = 2.0
    JITTER = 0.1               # 10% random jitter

    def __init__(self):
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def on_disconnect(self):
        """Return the delay in seconds before the next reconnect attempt."""
        self.consecutive_failures += 1
        # First failure waits INITIAL_BACKOFF; each further failure doubles it.
        # The exponent is capped so long outages cannot overflow the float power.
        exponent = min(self.consecutive_failures - 1, 16)
        self.current_backoff = min(
            self.INITIAL_BACKOFF * self.BACKOFF_MULTIPLIER ** exponent,
            self.MAX_BACKOFF
        )
        # Add jitter to prevent thundering herd
        return self.current_backoff * (1 + random.uniform(-self.JITTER, self.JITTER))

    def on_success(self):
        self.current_backoff = self.INITIAL_BACKOFF
        self.consecutive_failures = 0

    def should_circuit_break(self):
        return self.consecutive_failures >= 5  # Open circuit after 5 failures
10.3 HLS Generation for Dashboard
10.3.1 HLS Segment Configuration
| Parameter | Value | Rationale |
|---|---|---|
| Segment duration (-hls_time) | 2 seconds | Balance between latency and segment count |
| Playlist size (-hls_list_size) | 5 segments | 10-second sliding window for live playback |
| Delete threshold | 2 segments beyond playlist size | Disk cleanup |
| Flags | delete_segments+omit_endlist+program_date_time | Live mode, no end list, accurate timing |
| Segment naming | ch{N}_%04d.ts | Sequential numbering for cache busting |
| Segment path | /data/hls/ | Fast NVMe storage |
10.3.2 Multi-Bitrate HLS (Optional)
For adaptive bitrate streaming, three variants are generated per channel:
# High quality (main stream, copy codec)
# No -hls_playlist_type vod here: a VOD playlist keeps every segment and is not a sliding live window
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v copy -f hls -hls_time 2 \
-hls_segment_filename "ch1_high_%04d.ts" "ch1_high.m3u8"
# Medium quality (transcoded)
ffmpeg -i "rtsp://...channel=1&subtype=0" -c:v libx264 -preset fast -crf 23 \
-vf "scale=640:480" -f hls -hls_time 2 \
-hls_segment_filename "ch1_mid_%04d.ts" "ch1_mid.m3u8"
# Low quality (sub stream)
ffmpeg -i "rtsp://...channel=1&subtype=1" -c:v copy -f hls -hls_time 2 \
-hls_segment_filename "ch1_low_%04d.ts" "ch1_low.m3u8"
Master playlist:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=960x1080
ch1_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=640x480
ch1_mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=352x288
ch1_low.m3u8
10.3.3 HLS Latency Budget
| Stage | Latency |
|---|---|
| DVR encoding | 50-100 ms |
| RTSP to edge | 1-2 ms |
| FFmpeg demux/remux | 20-50 ms |
| HLS segment duration | 2000 ms (2-second segments) |
| Nginx/CDN delivery | 10-50 ms |
| HLS.js buffer | 2000-4000 ms (1-2 segments) |
| Browser decode + render | 20-50 ms |
| Total (camera to eye) | ~4.1 - 6.3 seconds (sum of the stages above) |
10.4 WebRTC for Low-Latency Single Camera
For single-camera fullscreen viewing where low latency is critical, WebRTC provides sub-second delivery.
10.4.1 WebRTC Architecture
+------------+ +-------------------+ +-------------------+ +--------+
| Browser | | Edge Gateway | | FFmpeg | | DVR |
| (WebRTC |<-->| (WHIP/WHEP |<-->| (decode RTSP, |<-->| RTSP |
| client) | | bridge) | | encode VP8/H.264)| | Server |
+------------+ +-------------------+ +-------------------+ +--------+
10.4.2 WebRTC Configuration
| Parameter | Value |
|---|---|
| Signaling protocol | WHIP (ingress) / WHEP (egress) |
| Video codec | H.264 (hardware) or VP8 (software) |
| Latency target | < 500 ms end-to-end |
| ICE servers | STUN only (both peers behind NAT) |
| Max bitrate | 3 Mbps |
| Resolution | 960x1080 (main stream) |
10.4.3 WebRTC Latency Budget
| Stage | Latency |
|---|---|
| DVR encoding | 50-100 ms |
| RTSP to edge | 1-2 ms |
| FFmpeg decode + WebRTC encode | 30-80 ms |
| Network (edge to browser via VPN) | 100-200 ms |
| Browser decode | 20-50 ms |
| Total | ~200-430 ms |
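Both latency budgets (HLS in 10.3.3 and WebRTC above) can be checked by summing the per-stage ranges. The snippet below copies the stage values from the two tables; the dictionary and function names are illustrative.

```python
# (min_ms, max_ms) per stage, copied from the tables in 10.3.3 and 10.4.3
HLS_STAGES_MS = {
    "DVR encoding": (50, 100),
    "RTSP to edge": (1, 2),
    "FFmpeg demux/remux": (20, 50),
    "HLS segment duration": (2000, 2000),
    "Nginx/CDN delivery": (10, 50),
    "HLS.js buffer": (2000, 4000),
    "Browser decode + render": (20, 50),
}
WEBRTC_STAGES_MS = {
    "DVR encoding": (50, 100),
    "RTSP to edge": (1, 2),
    "FFmpeg decode + WebRTC encode": (30, 80),
    "Network (edge to browser via VPN)": (100, 200),
    "Browser decode": (20, 50),
}


def total_latency_ms(stages):
    """Return (best_case_ms, worst_case_ms) by summing every stage."""
    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    return best, worst


hls_total = total_latency_ms(HLS_STAGES_MS)        # (4101, 6252)
webrtc_total = total_latency_ms(WEBRTC_STAGES_MS)  # (201, 432)
```

The segment duration and the player buffer dominate the HLS figure, which is why the WebRTC path is roughly an order of magnitude faster for single-camera viewing.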
10.5 Multi-Camera Grid Layout
10.5.1 Layout Configurations
| Layout | Cameras | Stream Used | Per-Camera Resolution | Total Bandwidth |
|---|---|---|---|---|
| 1x1 (fullscreen) | 1 | Main (subtype=0) | 960x1080 | ~4 Mbps |
| 2x2 grid | 4 | Sub (subtype=1) | 352x288 | ~4 Mbps total |
| 3x3 grid | 8+1 empty | Sub (subtype=1) | 352x288 | ~8 Mbps total |
| 4x2 grid | 8 | Sub (subtype=1) | 352x288 | ~8 Mbps total |
| Custom | User-defined | Mixed | Mixed | Sum of selected |
Smart stream selection: The dashboard automatically switches streams based on layout:
- Fullscreen single camera -> Main stream (high quality)
- Grid layout -> Sub stream (bandwidth-efficient)
- Camera clicked for fullscreen -> Dynamically switch to main stream
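The selection rule above reduces to a small function. This is a sketch, not dashboard code: the function name is hypothetical, and the bandwidth constants are the approximate per-channel bitrates from 10.1.2.

```python
MAIN_MBPS = 4.0  # approximate main stream bitrate (subtype=0)
SUB_MBPS = 1.0   # approximate sub stream bitrate (subtype=1)


def select_streams(visible_channels, fullscreen_channel=None):
    """Return ({channel: subtype}, estimated total Mbps) for the current layout."""
    if fullscreen_channel is not None:
        # A camera clicked for fullscreen switches to its main stream
        return {fullscreen_channel: 0}, MAIN_MBPS
    if len(visible_channels) == 1:
        return {visible_channels[0]: 0}, MAIN_MBPS
    # Any grid layout uses the sub stream for every tile
    return {ch: 1 for ch in visible_channels}, SUB_MBPS * len(visible_channels)
```

For example, `select_streams([1, 2, 3, 4])` selects the sub stream for all four tiles at about 4 Mbps in total, matching the 2x2 row of the table.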
10.5.2 Grid Rendering
+-----------------------------------------------------------------------------+
| DASHBOARD GRID LAYOUTS |
+-----------------------------------------------------------------------------+
| |
| 1x1 Layout: 2x2 Layout: |
| +------------------------+ +----------+----------+ |
| | | | CH1 | CH2 | |
| | Camera 1 | | (sub) | (sub) | |
| | Main stream | | | | |
| | 960x1080 | +----------+----------+ |
| | ~4 Mbps | | CH3 | CH4 | |
| +------------------------+ | (sub) | (sub) | |
| | | | |
| +----------+----------+ |
| |
| 3x3 Layout (8 cameras): |
| +----------+----------+----------+ |
| | CH1 | CH2 | CH3 | |
| | (sub) | (sub) | (sub) | |
| +----------+----------+----------+ |
| | CH4 | CH5 | CH6 | |
| | (sub) | (sub) | (sub) | |
| +----------+----------+----------+ |
| | CH7 | CH8 | [Empty] | |
| | (sub) | (sub) | | |
| +----------+----------+----------+ |
| |
| Bandwidth: ~8 Mbps total for 3x3 layout (8 x ~1 Mbps sub streams) |
| |
+-----------------------------------------------------------------------------+
10.6 Bandwidth Optimization
10.6.1 Total Bandwidth Budget
| Traffic Type | Direction | Bandwidth | Notes |
|---|---|---|---|
| 8x RTSP ingestion | DVR -> Edge (local) | ~32 Mbps receive | Local LAN only |
| 8x HLS upload to cloud | Edge -> Cloud (via VPN) | ~8-16 Mbps upload | Transcoded and compressed |
| AI frames to cloud | Edge -> Cloud (via VPN) | ~2-4 Mbps upload | 1 FPS, JPEG compressed |
| Dashboard HLS playback | Cloud -> Browser | ~8 Mbps per user | Cached at CDN |
| Control/management | Bidirectional | < 1 Mbps | WebSocket, API calls |
| Total edge upload | Edge -> Cloud | ~10-20 Mbps | Primary concern for site bandwidth |
10.6.2 Optimization Techniques
| Technique | Savings | Implementation |
|---|---|---|
| Sub-stream for grid view | 75% bandwidth reduction | Use subtype=1 (352x288) instead of subtype=0 (960x1080) |
| H.264 copy (no re-encode) for main stream | Zero CPU overhead | -c:v copy when no format change needed |
| JPEG quality tuning for AI frames | 50-70% size reduction | Quality 70-85 depending on scene complexity |
| Frame deduplication for AI | 10-30% frame reduction | Skip frames with < 2% pixel change |
| HLS segment caching at edge | Reduces cloud upload spikes | 5-segment buffer smooths burstiness |
| Gzip compression for API/WebSocket | 60-80% reduction | Content-Encoding: gzip |
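As an illustration of the frame deduplication row, the sketch below compares two grayscale frames and skips the new one when fewer than 2% of pixels changed. The interpretation of "pixel change" (a per-pixel intensity difference above a tolerance of 10) and the function names are assumptions; frames are represented as equal-length `bytes` objects to keep the example dependency-free.

```python
def changed_fraction(prev, curr, tolerance=10):
    """Fraction of pixels whose grayscale value moved by more than `tolerance`."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tolerance)
    return changed / len(prev)


def should_send(prev, curr, min_change=0.02):
    """Send the first frame always; afterwards only frames with enough change."""
    return prev is None or changed_fraction(prev, curr) >= min_change


background = bytes([100] * 100)
one_pixel_changed = bytes([100] * 99 + [200])         # 1% of pixels differ
three_pixels_changed = bytes([100] * 97 + [200] * 3)  # 3% of pixels differ
```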
10.7 Fallback Handling
10.7.1 Stream Failure Fallback Chain
Step 1: RTSP connection fails
+-> Retry with exponential backoff (3 attempts)
+-> Try UDP transport if TCP fails
+-> Circuit breaker opens after 5 consecutive failures
|
Step 2: Stream stall detected (no frames for 10s)
+-> Kill FFmpeg process
+-> Restart with fresh connection
|
Step 3: Camera marked OFFLINE
+-> Dashboard shows "Camera Offline" placeholder
+-> HLS endpoint serves the static offline playlist (see 10.7.2)
+-> Last known frame displayed with timestamp overlay
+-> Alert sent to operations team
|
Step 4: Camera recovers
+-> Circuit breaker transitions to HALF_OPEN
+-> Test stream pulled for 10 seconds
+-> On success: circuit CLOSED, stream resumes
+-> Dashboard auto-refreshes
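The circuit-breaker part of this chain can be modelled as a three-state machine. The sketch below is a minimal illustration; the class and method names are hypothetical, while the threshold of 5 failures and the 10-second probe come from the steps above.

```python
class StreamCircuitBreaker:
    """CLOSED -> OPEN after repeated failures; OPEN -> HALF_OPEN -> CLOSED on recovery."""

    def __init__(self, failure_threshold=5, probe_seconds=10.0):
        self.failure_threshold = failure_threshold
        self.probe_seconds = probe_seconds
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        if self.state == "HALF_OPEN":
            self.state = "OPEN"            # probe failed: camera is still offline
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "OPEN"            # camera is marked OFFLINE

    def begin_probe(self):
        if self.state == "OPEN":
            self.state = "HALF_OPEN"       # pull a test stream

    def record_probe_success(self, healthy_seconds):
        # Close the circuit only after the test stream stayed healthy long enough
        if self.state == "HALF_OPEN" and healthy_seconds >= self.probe_seconds:
            self.state = "CLOSED"
            self.failures = 0
```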
10.7.2 Offline Placeholder
When a camera is offline, the HLS endpoint returns a static playlist:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts
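#EXT-X-ERROR is a custom tag, not part of the HLS specification, so standard players ignore it. The dashboard parses the playlist text itself, detects the tag, and displays a camera offline indicator with the last known timestamp.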
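Detecting the offline marker only requires a line scan of the playlist text. The sketch below shows the logic in Python for consistency with the other examples in this document (the dashboard itself would do the equivalent in the browser); the function name is hypothetical.

```python
def offline_reason(playlist_text):
    """Return the #EXT-X-ERROR message if the playlist carries one, else None."""
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-ERROR:"):
            # Keep everything after the first colon and drop the surrounding quotes
            return line.split(":", 1)[1].strip().strip('"')
    return None


OFFLINE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ERROR: "Camera OFFLINE - Channel 1"
#EXTINF:2.000,
offline_placeholder.ts
"""
reason = offline_reason(OFFLINE_PLAYLIST)
```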
10.7.3 Edge Buffer Management
The 2TB NVMe edge storage is partitioned for circular buffer operation:
| Directory | Max Size | Retention | Cleanup |
|---|---|---|---|
| /data/hls/ | 20 GB | Rolling (5 segments) | Automatic via FFmpeg |
| /data/buffer/ch1-ch8/ | 1.5 TB | 7 days circular | Age-based FIFO |
| /data/buffer/ai_frames/ | 100 GB | 24 hours | Age-based |
| /data/buffer/evidence/ | 200 GB | 30 days | Event-linked retention |
| /data/logs/ | 10 GB | 30 days | Logrotate |
| /data/tmp/ | 50 GB | On process exit | Cleanup on restart |
| Total reserved | ~1.88 TB | n/a | Fits in 2TB NVMe |
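How many days the 1.5 TB circular buffer actually holds depends on the recorded bitrate, which is worth checking during sizing. The sketch below is a back-of-the-envelope calculation that assumes decimal units (1 TB = 10^12 bytes) and a constant bitrate per channel; the function names are illustrative.

```python
SECONDS_PER_DAY = 86400


def retention_days(capacity_tb, channels, mbps_per_channel):
    """Days of continuous recording that fit in the buffer."""
    bytes_per_day = channels * mbps_per_channel * 1e6 / 8 * SECONDS_PER_DAY
    return capacity_tb * 1e12 / bytes_per_day


def max_bitrate_mbps(capacity_tb, channels, days):
    """Average per-channel bitrate that fills the buffer in exactly `days`."""
    total_bits = capacity_tb * 1e12 * 8
    return total_bits / (channels * days * SECONDS_PER_DAY) / 1e6


days_at_4mbps = retention_days(1.5, 8, 4.0)      # about 4.3 days
bitrate_for_7d = max_bitrate_mbps(1.5, 8, 7)     # about 2.5 Mbps per channel
```

At the ~4 Mbps main-stream figure from 10.1.2, eight channels fill 1.5 TB in roughly 4.3 days, so the 7-day target holds only if the average recorded bitrate stays near 2.5 Mbps per channel. Measuring the real bitrate on site would settle which figure applies.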
Buffer exhaustion handling:
- At 80% capacity: Alert admin, begin aggressive cleanup of old non-evidence data
- At 90% capacity: Stop non-critical buffering (AI frames), preserve HLS + evidence only
- At 95% capacity: Emergency mode — evidence-only recording, all other buffers purged
- Never delete evidence clips linked to unresolved alerts
10.7.4 DVR Full Disk Mitigation
Since the DVR disk is full (0 bytes free), the system does not rely on DVR-side recording:
| Function | Traditional | Our Design |
|---|---|---|
| Continuous recording | DVR internal HDD | Edge gateway 2TB NVMe buffer |
| Event/alert clips | DVR playback export | Cloud MinIO + S3 archival |
| Long-term storage | DVR disk rotation | AWS S3 tiered lifecycle |
| Playback | DVR web UI | Cloud dashboard with timeline |
Reference: For complete FFmpeg commands including multi-output tee muxer, frame extraction for AI, WebRTC bridge code, and the ring buffer implementation, see video_ingestion.md, Sections 4-7.
End of Part A (Sections 1-10)
This unified technical blueprint synthesizes outputs from 11 specialist agents across 6 domain-specific design documents. For detailed implementation code, DDL, algorithms, and configuration, refer to the individual specialist documents listed in the cross-reference guide at the top of this document.
| Document | Path | Content |
|---|---|---|
| Architecture | architecture.md | Full deployment specs, scaling, cost, failover |
| Video Ingestion | video_ingestion.md | RTSP config, FFmpeg, edge gateway, HLS, WebRTC |
| AI Vision | ai_vision.md | Model configs, inference pipeline, benchmarks |
| Database Schema | database_schema.md | Complete DDL, triggers, views, RLS |
| Suspicious Activity | suspicious_activity.md | 10 detection modules, scoring engine |
| Training System | training_system.md | Learning pipeline, quality gates, versioning |
Sentinel AI Surveillance Platform — Unified Technical Blueprint (Part B)
Document Version: 1.0 Date: 2025-01-16 Classification: Confidential — Internal Use Only Author: Technical Architecture Team
Part B Table of Contents
- Section 11: Alerting Design (Notification System)
- Section 12: Security Design
- Section 13: UX / Website Structure
- Section 14: Deployment Plan
- Section 15: Testing Plan
- Section 16: Self-Test Framework
- Section 17: Sample Self-Test Report
- Section 18: Risks and Mitigations
- Section 19: Final Implementation Roadmap
- Section 20: Final Production-Readiness Summary
Section 11: Alerting Design
11.1 Architecture Overview
The notification system uses an event-driven architecture built on Redis Pub/Sub for real-time message distribution. All detection events, system alerts, and manual triggers flow through a unified pipeline that delivers over two channels: the Telegram Bot API and the WhatsApp Business API (Meta Official). Critical alerts get at-least-once delivery through retries with exponential backoff and a dead letter queue for permanent failures, while token-bucket rate limiting keeps outbound traffic within each channel's API limits.
┌──────────────────────────────────────────────────────────────────────────────┐
│ ALERTING ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ EVENT SOURCES │ │
│ │ │ │
│ │ Detection Pipeline ──▶ New person detected │ │
│ │ Face Recognition ────▶ Known/Unknown/Watchlist match │ │
│ │ System Monitors ─────▶ Camera offline, Storage full, VPN down │ │
│ │ Manual Triggers ─────▶ Operator-initiated alerts │ │
│ │ AI Anomaly Engine ───▶ Suspicious activity detected │ │
│ └──────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ REDIS PUB/SUB │ │
│ │ │ │
│ │ Channel: alerts.critical ─── High priority, immediate process │ │
│ │ Channel: alerts.high ─── Standard priority │ │
│ │ Channel: alerts.medium ─── Batched processing │ │
│ │ Channel: system.health ─── System health events │ │
│ └──────────────┬───────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┴───────────────────────────────────────────────────┐ │
│ │ NOTIFICATION ROUTER │ │
│ │ (Python/FastAPI) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │
│ │ │ Event Parser │──▶ Rules Engine │──▶ Channel Selector │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌───────────────┐ ┌──────────────────┐ │
│ │ TEMPLATE │ │ RATE │ │ ESCALATION │ │
│ │ RENDERER │ │ LIMITER │ │ ENGINE │ │
│ │ │ │ │ │ │ │
│ │ HTML/TXT │ │ Token Bucket │ │ 3-level timeout │ │
│ │ per channel│ │ 4-tier limits │ │ Auto-escalation │ │
│ └──────┬──────┘ └───────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └──────────────────┼─────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CHANNEL ADAPTERS │ │
│ │ │ │
│ │ ┌──────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ TELEGRAM BOT API │ │ WHATSAPP BUSINESS API │ │ │
│ │ │ │ │ │ │ │
│ │ │ - HTML formatting │ │ - Template messages │ │ │
│ │ │ - Inline keyboards │ │ - Session messages │ │ │
│ │ │ - Media groups │ │ - Interactive messages │ │ │
│ │ │ - Edit/Delete messages │ │ - Media attachments │ │ │
│ │ │ - Webhook receipts │ │ - Message status API │ │ │
│ │ └──────────┬───────────────┘ └──────────┬───────────────┘ │ │
│ └─────────────┼─────────────────────────────┼─────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Telegram │ │ WhatsApp │ │
│ │ Servers │ │ Cloud API │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ SUPPORTING SERVICES │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │
│ │ │ RETRY MGR │ │ DLQ │ │ DELIVERY TRACKER │ │ │
│ │ │ Exponential │ │ Redis-backed │ │ Webhook callbacks │ │ │
│ │ │ 5 max │ │ Admin review │ │ Status dashboard │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
Key Design Principles:
| Principle | Implementation |
|---|---|
| Guaranteed delivery | At-least-once delivery via retry with exponential backoff; dead letter queue for permanent failures |
| Ordered processing | Events within a single camera stream processed in sequence; no alert reordering |
| Non-blocking | Alert generation does not block the detection pipeline; async processing via queues |
| Channel isolation | Failure in one channel (e.g., Telegram down) does not affect the other (WhatsApp continues) |
| Deduplication | 5-minute window for duplicate suppression; composite key based on camera + person + event type |
| Observability | Every notification tracked from creation through delivery with full audit trail |
11.2 Telegram Integration
11.2.1 Bot API Configuration
Telegram integration uses the official Telegram Bot API for message delivery. The bot is configured with encrypted tokens stored in HashiCorp Vault, with HTML message formatting for rich alert presentation.
| Parameter | Value | Notes |
|---|---|---|
| API Base URL | https://api.telegram.org/bot<TOKEN>/ | Standard Bot API endpoint |
| API Version | Bot API 7.x | Latest stable as of Q1 2025 |
| Token Storage | HashiCorp Vault (AES-256-GCM encrypted) | Rotated every 180 days |
| Communication | HTTPS POST (webhooks for inbound updates; long polling via getUpdates as fallback) | TLS 1.3 required for all calls; the Bot API has no WebSocket interface |
| Message Format | HTML subset | <b>, <i>, <code>, <pre>, <a href> tags supported |
| Max Message Size | 4096 characters per message | Longer messages auto-split into parts |
| Media Size Limit (Image) | 10 MB per image | Processed via Pillow for compression |
| Media Size Limit (Video) | 50 MB per video | Processed via FFmpeg for re-encoding |
| Media Group Limit | Up to 10 items per media group | Album delivery for multi-image alerts |
| Global Rate Limit | 30 messages per second | Across all chats |
| Per-Chat Rate Limit | 1 message per second | Per conversation throttling |
| Webhook Endpoint | /webhooks/telegram | Receives incoming updates (commands, callback queries, membership changes) |
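The global and per-chat limits in the table map naturally onto two layers of token buckets. The sketch below is one possible arrangement, not the router's actual code: the class names are hypothetical, timestamps are passed in explicitly so the logic is easy to test, and a message is sent only when both the chat's bucket and the global bucket have a token.

```python
class TokenBucket:
    """Token bucket refilled at `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.updated = now

    def refill(self, now):
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


class TelegramLimiter:
    """Applies the 30 msg/s global limit and the 1 msg/s per-chat limit together."""

    def __init__(self, global_rate=30, per_chat_rate=1):
        self.global_bucket = TokenBucket(global_rate, global_rate)
        self.per_chat_rate = per_chat_rate
        self.chat_buckets = {}

    def allow(self, chat_id, now):
        if chat_id not in self.chat_buckets:
            self.chat_buckets[chat_id] = TokenBucket(self.per_chat_rate, 1, now)
        chat = self.chat_buckets[chat_id]
        chat.refill(now)
        self.global_bucket.refill(now)
        # Spend tokens only when both limits permit the send
        if chat.tokens >= 1.0 and self.global_bucket.tokens >= 1.0:
            chat.tokens -= 1.0
            self.global_bucket.tokens -= 1.0
            return True
        return False
```

A caller that receives False would typically queue the message and retry shortly afterwards rather than drop it.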
11.2.2 Bot Features and Capabilities
Inline Keyboards: Every alert message includes contextual action buttons that allow operators to respond directly from Telegram without opening the web dashboard.
| Keyboard Type | Buttons | Actions |
|---|---|---|
| Standard Alert | Acknowledge / View Live / Details | Confirm receipt, open stream, view full info |
| Watchlist Alert | Acknowledge / View Live / Escalate / Details | Includes escalation for watchlist matches |
| Blacklist Alert | ACKNOWLEDGE NOW / View Live / Dispatch Security / Escalate / Details | Highest priority actions for blacklist |
| Escalation Notice | Acknowledge / View Original Alert | Acknowledge escalated alert or view source |
| System Alert | Acknowledge / View Dashboard / Details | System-level alert actions |
Media Groups: When an alert contains multiple evidence images (up to 10), they are sent as a Telegram media group (album). This presents all related images in a single scrollable gallery rather than individual messages, reducing chat clutter.
Webhook Updates: Telegram does not send per-message delivery receipts. The webhook endpoint receives Update objects of the following types:
| Update Type | Trigger | Action |
|---|---|---|
| message | Bot receives a command | Process command (e.g., /status, /acknowledge) |
| callback_query | User clicks inline button | Execute action, update message status |
| edited_message | Message edited externally | Log for audit trail |
| my_chat_member | Bot added/removed from chat | Update recipient group membership |
Chat Commands:
| Command | Description | Response |
|---|---|---|
| /status | Get system health status | Camera count, offline count, last alert time |
| /acknowledge <alert_id> | Acknowledge an alert | Confirmation or error message |
| /cameras | List all cameras and their status | Camera name, status, last seen |
| /health | Get edge gateway health | CPU, memory, disk, VPN status |
| /help | Show available commands | Command reference |
11.2.3 Security Considerations
Telegram bot tokens are among the most sensitive credentials in the system. The following security measures are implemented:
| Measure | Implementation |
|---|---|
| Encryption at rest | AES-256-GCM in Vault |
| Token rotation | Every 180 days or immediately on compromise suspicion |
| Rotation procedure | 1) Generate new token via BotFather, 2) Update Vault, 3) Notify services to hot-reload, 4) 5-minute grace period, 5) Revoke old token |
| IP allowlisting | Webhook endpoint accepts only Telegram IP ranges |
| Webhook secret | secret_token registered via setWebhook and checked against the X-Telegram-Bot-Api-Secret-Token header on every incoming request |
| No token logging | Tokens never appear in application logs |
| No token in code | Tokens injected via Vault at runtime |
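Telegram's setWebhook method accepts a secret_token, which Telegram then sends back in the X-Telegram-Bot-Api-Secret-Token header of each webhook request. A minimal check might look like the sketch below; the function name is hypothetical, and `headers` is assumed to be a plain dict with exact-case keys (web frameworks usually offer case-insensitive lookup instead).

```python
import hmac

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def is_valid_telegram_webhook(headers, expected_secret):
    """Return True only if the request carries the secret registered with setWebhook."""
    received = headers.get(SECRET_HEADER, "")
    # compare_digest runs in constant time, which avoids leaking the secret via timing
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))
```

Requests that fail the check should be rejected before any payload parsing, and the expected secret would be loaded from Vault alongside the bot token.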
11.3 WhatsApp Business API Integration
11.3.1 Meta Cloud API Configuration
WhatsApp integration uses Meta's official Cloud API (Business Platform), which provides a reliable, enterprise-grade messaging channel. This requires a verified Meta Business account and pre-approved message templates for proactive messaging.
| Parameter | Value | Notes |
|---|---|---|
| API Base URL | https://graph.facebook.com/v18.0/ |
Meta Graph API v18.0 minimum |
| Authentication | Permanent Access Token | Scoped to WhatsApp Business Management |
| Token Storage | HashiCorp Vault (AES-256-GCM encrypted) | Rotated every 180 days |
| Phone Number ID | Dedicated business phone number | Not shared with other WhatsApp uses |
| Business Account | Verified Meta Business Account | Required for template message approval |
| Message Types | Template messages + Session messages | Template for first contact; session for replies |
| Media Size Limit (All) | 16 MB per file | Stricter than Telegram; aggressive compression needed |
| Supported Media | JPEG, PNG, MP4 (H.264), PDF, Audio | Format validation before upload |
| Global Rate Limit | 80 messages per second | Across all recipients |
| Per-Recipient Rate Limit | 20 messages per minute | Per WhatsApp ID throttling |
| Webhook Endpoint | /webhooks/whatsapp | Receives message status updates |
11.3.2 Message Types
Template Messages: Pre-approved message templates are required for any proactive (business-initiated) message. Templates must be created and submitted for approval in Meta Business Manager. Each template contains named parameters that are dynamically populated at send time.
| Template Name | Purpose | Parameters | Approval Status |
|---|---|---|---|
| person_detected_known | Known person detected | name, role, camera, date, time, confidence, alert_id | Approved |
| person_detected_unknown | Unknown person alert | camera, date, time, confidence | Approved |
| watchlist_match | Person on watchlist detected | name, watchlist_type, camera, date, time | Approved |
| blacklist_alert | Blacklisted person detected | name, camera, date, time | Approved |
| suspicious_activity | Suspicious behavior detected | activity_type, camera, date, time, confidence | Approved |
| system_alert | System health alert | message, timestamp, severity | Approved |
| escalation_notice | Alert escalation notification | alert_id, level, summary, elapsed_minutes | Approved |
| daily_digest | Daily summary of activity | date, total_detections, total_alerts, top_cameras | Approved |
| test_message | System test | timestamp | Approved |
Session Messages: Within a 24-hour window after a user sends a message to the business, free-form session messages can be sent without template restrictions. This is used for:
- Acknowledgment confirmations
- Escalation follow-ups
- Interactive conversations initiated by the recipient
- Quick reply responses
11.3.3 Webhook Event Handling
| Webhook Event | Trigger | System Action |
|---|---|---|
| messages.delivered | Message delivered to device | Update delivery status to delivered |
| messages.read | Recipient read the message | Update delivery status to read |
| messages.failed | Message delivery failed | Trigger retry or move to DLQ |
| message_reaction | Recipient reacted to message | Log for engagement metrics |
| account_alerts | Meta account issue | Alert admin, review account status |
| template_category_update | Template status change | Update template catalog |
11.4 Alert Routing Rules Engine
11.4.1 Condition Types
The routing engine evaluates 9 distinct condition types to determine which recipients receive which alerts through which channels. Multiple conditions can be combined with AND/OR logic for precise targeting.
| # | Condition Type | Description | Example Values | Operators |
|---|---|---|---|---|
| 1 | camera | Source camera identifier | "CAM-01", "CAM-02", "entrance-cam" | equals, in, not_in |
| 2 | person | Detected known person | "John Smith", "Jane Doe" | equals, in, not_in |
| 3 | role | Person role category | "employee", "visitor", "vendor", "contractor", "security" | equals, in |
| 4 | event_type | Type of detection event | "person_detected", "unknown_person", "suspicious_activity", "crowd_gathering", "camera_tamper" | equals, in |
| 5 | zone | Detection zone name | "entrance", "restricted_area", "parking", "lobby", "warehouse" | equals, in |
| 6 | time | Time of day range | "08:00-18:00", "22:00-06:00" | between, not_between |
| 7 | day | Day of week | "monday", "weekday", "weekend" | equals, in |
| 8 | severity | Alert severity level | "critical", "high", "medium", "low", "info" | equals, in, gte |
| 9 | watchlist | Watchlist membership | "vip", "blacklist", "authorized", "temporary_access" | equals, in |
11.4.2 Rule Structure
Each routing rule consists of conditions, actions, and metadata:
rule:
  id: "rule-001"
  name: "Blacklist Immediate Alert"
  enabled: true
  priority: 100  # Higher number = evaluated first
  conditions:
    operator: "AND"
    conditions:
      - field: "watchlist"
        operator: "equals"
        value: "blacklist"
      - field: "severity"
        operator: "in"
        value: ["critical", "high"]
  actions:
    - channel: "telegram"
      recipients: ["security_team", "management"]
      template: "blacklist_alert"
      media: ["image", "video"]
      bypass_quiet_hours: true
      priority: "high"
    - channel: "whatsapp"
      recipients: ["security_manager"]
      template: "blacklist_alert"
      media: ["image"]
      bypass_quiet_hours: true
  metadata:
    created_by: "admin"
    created_at: "2025-01-01T00:00:00Z"
    last_modified: "2025-01-10T12:00:00Z"
    tags: ["critical", "blacklist"]
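The rule structure above can be evaluated with a small recursive matcher. The following is a minimal sketch, not the platform's actual engine: the function names (`match_condition`, `match_group`, `matching_rules`) and the event dictionary shape are assumptions made for illustration, and only the `equals`, `in`, `not_in`, and `gte` operators are covered (the time-range operators `between` and `not_between` are left out).

```python
# Severity ranks, lowest to highest, used by the "gte" operator.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def match_condition(cond: dict, event: dict) -> bool:
    """Evaluate one leaf condition (field / operator / value) against an event."""
    actual = event.get(cond["field"])
    op, expected = cond["operator"], cond["value"]
    if op == "equals":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "gte":
        if actual not in SEVERITY_ORDER or expected not in SEVERITY_ORDER:
            return False
        return SEVERITY_ORDER.index(actual) >= SEVERITY_ORDER.index(expected)
    raise ValueError(f"unsupported operator: {op}")


def match_group(group: dict, event: dict) -> bool:
    """Combine child conditions (or nested groups) with AND/OR logic."""
    results = (
        match_group(c, event) if "conditions" in c else match_condition(c, event)
        for c in group["conditions"]
    )
    return all(results) if group["operator"] == "AND" else any(results)


def matching_rules(rules: list, event: dict) -> list:
    """Return enabled rules that match, highest priority first."""
    hits = [r for r in rules if r["enabled"] and match_group(r["conditions"], event)]
    return sorted(hits, key=lambda r: r["priority"], reverse=True)


# Hypothetical in-memory form of rule-001 (actions and metadata omitted).
RULE_001 = {
    "id": "rule-001",
    "enabled": True,
    "priority": 100,
    "conditions": {
        "operator": "AND",
        "conditions": [
            {"field": "watchlist", "operator": "equals", "value": "blacklist"},
            {"field": "severity", "operator": "in", "value": ["critical", "high"]},
        ],
    },
}

if __name__ == "__main__":
    event = {"watchlist": "blacklist", "severity": "critical", "camera": "CAM-01"}
    print([r["id"] for r in matching_rules([RULE_001], event)])
```

Because groups nest, an OR group can be placed inside an AND group without changing the matcher.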
11.4.3 Default Routing Rules
The system ships with a comprehensive set of default routing rules that cover common surveillance scenarios:
| # | Scenario | Conditions | Severity | Recipients | Channels | Media | Quiet Hours |
|---|---|---|---|---|---|---|---|
| 1 | Known employee normal hours | role=employee, time=08:00-18:00, weekday | Info | None (log only) | — | — | N/A |
| 2 | Known employee after hours | role=employee, time=18:00-08:00 | Low | Security team | Telegram | Image | Respected |
| 3 | Known visitor during hours | role=visitor, time=08:00-18:00 | Low | Reception desk | Telegram | Image | Respected |
| 4 | Unknown person detected | event_type=unknown_person | Medium | Security team | Telegram + WhatsApp | Image | Respected |
| 5 | Unknown person after hours | event_type=unknown_person, time=22:00-06:00 | High | Security team + Manager | Both | Image + Video | Bypassed |
| 6 | Watchlist match | watchlist=watchlist | High | Security team | Both | Image + Video | Respected |
| 7 | Blacklist match | watchlist=blacklist | Critical | All groups | Both (bypass quiet) | Image + Video | Bypassed |
| 8 | VIP detected | watchlist=vip | Low | Reception desk | Telegram | Image | Respected |
| 9 | Camera offline | event_type=camera_offline | High | IT team + Security team | Telegram | None | Bypassed |
| 10 | Storage > 90% | event_type=storage_warning | High | IT team + Management | Both | None | Bypassed |
| 11 | Storage > 95% | event_type=storage_critical | Critical | All groups | Both (bypass quiet) | None | Bypassed |
| 12 | VPN tunnel down | event_type=vpn_down | Critical | IT team + Management | Both (bypass quiet) | None | Bypassed |
| 13 | Suspicious activity | event_type=suspicious_activity | High | Security team | Both | Image + Video | Respected |
| 14 | Crowd gathering | event_type=crowd_gathering | Medium | Security team | Telegram | Image | Respected |
11.5 Recipient Groups and Quiet Hours
11.5.1 Recipient Group Management
Recipient groups are the primary mechanism for organizing alert destinations. Each group contains one or more contacts with specified channels.
| Group Name | Members | Primary Channel | Backup Channel | Alert Preferences | Quiet Hours |
|---|---|---|---|---|---|
| Security Team | On-site security guards | Telegram | WhatsApp | All except info | Disabled |
| Security Manager | Shift supervisor | Telegram | WhatsApp | Medium and above | Disabled |
| IT Team | Infrastructure staff | Telegram | WhatsApp | System alerts only | Nights |
| Management | Facility managers | Telegram | WhatsApp | Critical only | Disabled |
| Reception | Front desk staff | Telegram | None | Visitor-related, VIP | Disabled |
| After-Hours | On-call personnel | Telegram | WhatsApp | High and Critical | Disabled |
Group Configuration Interface:
Groups are managed through the web dashboard at /settings/notifications/groups. Each group can be configured with:
| Setting | Description |
|---|---|
| Group name | Human-readable identifier |
| Description | Purpose of the group |
| Members | List of Telegram chat IDs and WhatsApp phone numbers |
| Default channel | Primary delivery channel |
| Alert severity filter | Minimum severity to deliver |
| Quiet hours override | Whether quiet hours apply to this group |
| Media preferences | Which media types to include |
| Max alerts per hour | Rate limit for this group |
11.5.2 Quiet Hours Configuration
Quiet hours allow suppressing non-critical alerts during configured time windows. Critical alerts always bypass quiet hours — this is a non-configurable safety measure.
quiet_hours:
  enabled: false  # DISABLED BY DEFAULT for security
  preset: "none"  # none / nights / weekends / custom
  custom_schedule:
    - label: "Weekday Nights"
      days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      start_time: "22:00"
      end_time: "06:00"
      timezone: "Asia/Kolkata"
    - label: "Weekend All Day"
      days: ["Saturday", "Sunday"]
      start_time: "00:00"
      end_time: "23:59"
      timezone: "Asia/Kolkata"
  allowed_during_quiet:  # Which severities bypass quiet hours
    - "critical"  # Always delivered (non-configurable)
  emergency_bypass:
    enabled: true
    triggers:
      - severity: "critical"
      - tag: "emergency"
      - rule_override: "bypass_quiet_hours"
    notification_method: "all_channels"
  suppression_behavior: "queue"  # queue / discard / digest
  # "queue": Hold until quiet hours end
  # "discard": Drop non-critical alerts entirely
  # "digest": Send summary when quiet hours end
Security Note: Quiet hours are disabled by default because the surveillance use case requires continuous awareness. Any decision to enable quiet hours must be documented with security team sign-off.
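Windows such as 22:00-06:00 cross midnight, which is easy to get wrong. The sketch below shows one way to test whether a timestamp falls inside a configured window. It assumes the caller has already converted the timestamp to the schedule's timezone, and it assumes an overnight window that starts on a listed day extends into the following morning. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Mirrors the custom_schedule entries from the configuration above.
SCHEDULE = [
    {"days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
     "start_time": "22:00", "end_time": "06:00"},
    {"days": ["Saturday", "Sunday"],
     "start_time": "00:00", "end_time": "23:59"},
]


def in_window(now_local: datetime, days: list, start: time, end: time) -> bool:
    """True if now_local falls inside the window; handles windows crossing midnight."""
    today = DAY_NAMES[now_local.weekday()]
    t = now_local.time()
    if start <= end:  # same-day window
        return today in days and start <= t <= end
    if t >= start:  # evening part of an overnight window
        return today in days
    if t < end:  # morning part belongs to the previous day's window
        yesterday = DAY_NAMES[(now_local - timedelta(days=1)).weekday()]
        return yesterday in days
    return False


def should_suppress(severity: str, now_local: datetime,
                    schedule: list, enabled: bool = False) -> bool:
    """Critical alerts are never suppressed; quiet hours are off by default."""
    if not enabled or severity == "critical":
        return False
    return any(
        in_window(now_local, w["days"],
                  time.fromisoformat(w["start_time"]),
                  time.fromisoformat(w["end_time"]))
        for w in schedule
    )


if __name__ == "__main__":
    friday_night = datetime(2025, 1, 17, 23, 0)
    print(should_suppress("high", friday_night, SCHEDULE, enabled=True))
    print(should_suppress("critical", friday_night, SCHEDULE, enabled=True))
```

Under this interpretation, 03:00 on a Monday is not quiet (Sunday is not in the weekday list), while 03:00 on a Tuesday is.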
11.5.3 Per-Recipient Quiet Hours
Individual recipients can configure personal quiet hours that override group settings:
| Recipient | Personal Quiet Hours | Group Override | Effect |
|---|---|---|---|
| Security Guard A | None | Security Team (Disabled) | Receives all alerts |
| IT Manager | 23:00-07:00 | IT Team (Nights) | Matches group — no IT alerts at night |
| Manager B | 22:00-08:00 | Management (Disabled) | Personal quiet hours applied |
11.6 Message Templates
11.6.1 Telegram HTML Templates
All Telegram templates use a safe HTML subset for rich formatting with inline action keyboards.
Template: Person Detected (Known)
🔍 <b>Person Detected</b>
<b>{name}</b> ({role})
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<a href="{dashboard_url}">View in Dashboard</a>
Template: Unknown Person Detected
❓ <b>Unknown Person Detected</b>
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<i>This person is not in the database.</i>
<a href="{naming_url}">Name This Person</a>
Template: Watchlist Match
⚠️ <b>WATCHLIST ALERT</b>
<b>{name}</b>
📋 Watchlist: {watchlist_type}
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<i>This person is on a watchlist and requires attention.</i>
Template: Blacklist Alert
🚨 <b>BLACKLIST ALERT</b> 🚨
⚠️ <b>{name}</b> has been detected!
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<b>This person is BLACKLISTED. Immediate attention required.</b>
<a href="{dispatch_url}">🚨 Dispatch Security</a>
Template: Escalation Notice
⬆️ <b>Alert Escalated — Level {escalation_level}</b>
Alert #{alert_id} has been escalated.
Original: {alert_summary}
⏱️ Unacknowledged for {elapsed_minutes} minutes
Threshold: {threshold_minutes} minutes
<i>Please review immediately.</i>
Template: System Alert
⚙️ <b>System Alert</b>
{message}
🕐 {timestamp}
Severity: {severity}
<a href="{health_dashboard_url}">View System Health</a>
Template: Daily Digest
📊 <b>Daily Activity Digest — {date}</b>
👥 Persons Detected: {total_detections}
🔔 Alerts Generated: {total_alerts}
📹 Cameras Online: {cameras_online}/{cameras_total}
Top Cameras:
{camera_list}
<a href="{full_report_url}">View Full Report</a>
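Because these templates are sent with Telegram's HTML parse mode, any `<`, `>`, or `&` in a substituted value (for example a person's name) must be escaped or the message may be rejected or rendered incorrectly. The following is a minimal rendering sketch. The `render_telegram` helper and the `PERSON_DETECTED` constant are illustrative names, not part of the platform's codebase.

```python
import html

# The "Person Detected (Known)" template from above as a Python format string.
PERSON_DETECTED = (
    "🔍 <b>Person Detected</b>\n"
    "<b>{name}</b> ({role})\n"
    "📍 Camera: {camera_name}\n"
    "🕐 {date} at {time}\n"
    "🎯 Confidence: {confidence}%\n"
    '<a href="{dashboard_url}">View in Dashboard</a>'
)


def render_telegram(template: str, **values) -> str:
    """Substitute values into a template, HTML-escaping each value first.

    Only the values are escaped; the template's own tags are left intact.
    """
    safe = {key: html.escape(str(value), quote=True) for key, value in values.items()}
    return template.format(**safe)


if __name__ == "__main__":
    print(render_telegram(
        PERSON_DETECTED,
        name="John Smith",
        role="Employee",
        camera_name="Main Entrance",
        date="2025-01-16",
        time="14:32:15",
        confidence="97.3",
        dashboard_url="https://example.invalid/alerts/ALT-20250116-001",
    ))
```

A missing variable raises `KeyError`, which surfaces template/parameter mismatches before the message reaches the provider.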
11.6.2 WhatsApp Template Format
WhatsApp templates use a different format — they are pre-registered with Meta and use numbered parameter substitution:
Template: person_detected_known
🔍 Person Detected
{{1}} ({{2}})
📍 Camera: {{3}}
🕐 {{4}} at {{5}}
🎯 Confidence: {{6}}%
Alert ID: {{7}}
Parameters: {{1}}=name, {{2}}=role, {{3}}=camera_name, {{4}}=date, {{5}}=time, {{6}}=confidence, {{7}}=alert_id
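The numbered placeholders are filled positionally, so the order of the parameter list matters. The sketch below builds the JSON body for a template message in the shape used by Meta's Cloud API `messages` endpoint. It only constructs the payload and performs no HTTP request. The `build_template_payload` helper, the `PARAM_ORDER` constant, the language code `en`, and the sample phone number are assumptions for illustration.

```python
import json

# Positional order of {{1}}..{{7}} for person_detected_known, per the mapping above.
PARAM_ORDER = ["name", "role", "camera_name", "date", "time", "confidence", "alert_id"]


def build_template_payload(to: str, template_name: str, values: dict,
                           language: str = "en") -> dict:
    """Build the request body for a WhatsApp template message."""
    parameters = [{"type": "text", "text": str(values[key])} for key in PARAM_ORDER]
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {
            "name": template_name,
            "language": {"code": language},
            "components": [{"type": "body", "parameters": parameters}],
        },
    }


if __name__ == "__main__":
    payload = build_template_payload(
        to="919999999999",  # placeholder recipient number
        template_name="person_detected_known",
        values={
            "name": "John Smith", "role": "Employee", "camera_name": "Main Entrance",
            "date": "2025-01-16", "time": "14:32:15", "confidence": "97.3",
            "alert_id": "ALT-20250116-001",
        },
    )
    print(json.dumps(payload, indent=2))
```

The body would be POSTed to the `{phone-number-id}/messages` path under the API base URL with the access token as a bearer credential. The language code must match the one the template was approved under.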
11.6.3 Template Variable Reference
| Variable | Description | Source | Example |
|---|---|---|---|
| {name} | Detected person's name | Person database | "John Smith" |
| {role} | Person's role | Person database | "Employee" |
| {camera_name} | Camera display name | Camera configuration | "Main Entrance" |
| {date} | Event date | Event timestamp | "2025-01-16" |
| {time} | Event time | Event timestamp | "14:32:15" |
| {confidence} | Detection confidence % | AI inference result | "97.3" |
| {alert_id} | Unique alert identifier | Alert database | "ALT-20250116-001" |
| {watchlist_type} | Watchlist category | Watchlist configuration | "Blacklist" |
| {activity_type} | Type of suspicious activity | AI classification | "Loitering" |
| {severity} | Alert severity | Rules engine | "Critical" |
| {dashboard_url} | Deep link to dashboard | System configuration | "https://..." |
| {elapsed_minutes} | Time since alert creation | System clock | "15" |
11.7 Retry Logic and Rate Limiting
11.7.1 Retry Configuration
Failed notifications are retried using an exponential backoff strategy to avoid overwhelming downstream services.
| Parameter | Value | Description |
|---|---|---|
| Maximum retries | 5 | After 5 failed retries (6 attempts in total), move to DLQ |
| Base delay | 2 seconds | Initial retry wait time |
| Exponential base | 2 | Delay multiplier (2^n) |
| Maximum delay | 300 seconds (5 minutes) | Cap on retry delay |
| Jitter | Up to 1 second random | Prevents thundering herd |
Retry Schedule:
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 (initial) | Immediate | 0s |
| 2 | 2s + jitter | ~2s |
| 3 | 4s + jitter | ~6s |
| 4 | 8s + jitter | ~14s |
| 5 | 16s + jitter | ~30s |
| 6 (final) | 32s + jitter | ~62s |
| DLQ | — | After 62s total |
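The schedule above follows directly from the configuration: the delay before retry n is the base delay times 2^(n-1), capped at the maximum delay, plus jitter. A minimal sketch follows. The `retry_delay` function and its parameter names are illustrative, and the injectable `rng` exists only to make the jitter deterministic in tests.

```python
import random


def retry_delay(retry_number: int, base: float = 2.0, factor: float = 2.0,
                cap: float = 300.0, max_jitter: float = 1.0,
                rng=random.random) -> float:
    """Seconds to wait before the given retry (1 = first retry, i.e. attempt 2)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    delay = min(base * factor ** (retry_number - 1), cap)
    return delay + rng() * max_jitter  # jitter in [0, max_jitter)


if __name__ == "__main__":
    # Without jitter, the five retries reproduce the table: 2, 4, 8, 16, 32 seconds.
    delays = [retry_delay(n, rng=lambda: 0.0) for n in range(1, 6)]
    print(delays, sum(delays))
```

With only five retries the 300-second cap is never reached; it matters only if the maximum retry count is raised.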
Error Classification:
| Error Code | Description | Retry? |
|---|---|---|
| Timeout | Request timed out | Yes |
| 429 Too Many Requests | Rate limited by provider | Yes (with longer delay) |
| 500 Internal Server Error | Provider error | Yes |
| 502 Bad Gateway | Provider gateway error | Yes |
| 503 Service Unavailable | Provider temporarily down | Yes |
| 409 Conflict | Request conflict | Yes |
| 401 Unauthorized | Authentication failed | No (credential issue) |
| 403 Forbidden | Permission denied | No (configuration issue) |
| 400 Bad Request | Invalid request | No (template/parameter issue) |
| Chat not found | Recipient blocked bot | No |
Non-Retryable Errors (Immediate DLQ):
- Invalid bot token (401)
- Bot blocked by user (403)
- Chat not found
- Malformed template (400)
- Message too long (after split)
- Unsupported media format
11.7.2 Circuit Breaker
Each channel adapter implements a circuit breaker to prevent cascading failures:
| Parameter | Value |
|---|---|
| Failure threshold | 10 consecutive failures |
| Open state duration | 60 seconds |
| Half-open test calls | 3 successful calls required |
| Monitoring window | 5 minutes |
Circuit States:
| State | Behavior | Transition Trigger |
|---|---|---|
| Closed | Normal operation: all requests pass | Initial state, or after half-open success |
| Open | Fast fail: no requests sent to provider | 10 consecutive failures |
| Half-Open | Limited test requests allowed | After 60-second open timeout |
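The three states can be captured in a small state machine. The following is a sketch under stated assumptions rather than the adapter's actual implementation: the class and method names are hypothetical, a failure during half-open is assumed to re-open the circuit (the table does not say), and the 5-minute monitoring window is not modelled.

```python
import time


class CircuitBreaker:
    """Closed -> Open after N consecutive failures; Open -> Half-Open after a
    timeout; Half-Open -> Closed after M consecutive successes."""

    def __init__(self, failure_threshold: int = 10, open_seconds: float = 60.0,
                 half_open_successes: int = 3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.half_open_successes = half_open_successes
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Return False while the circuit is open (fast fail)."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            self.state = "half_open"  # timeout elapsed: allow test calls
            self.successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.half_open_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # any success resets the consecutive-failure count

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # assumption: one failed test call re-opens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A channel adapter would call `allow_request()` before each send and report the outcome with `record_success()` or `record_failure()`.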
11.7.3 Rate Limiting Tiers
The notification system implements multi-tier rate limiting to prevent abuse and ensure fair resource distribution:
| Tier | Limit | Scope | Burst |
|---|---|---|---|
| Global (all channels) | 200 messages/minute | Across all channels combined | 20 |
| Telegram Global | 30 messages/second | All Telegram traffic | 5 |
| Telegram Per-Chat | 1 message/second | Per conversation | 1 |
| WhatsApp Global | 80 messages/second | All WhatsApp traffic | 10 |
| WhatsApp Per-Recipient | 20 messages/minute | Per phone number | 3 |
| Per Camera Source | 30 alerts/minute | Prevents camera spam | 5 |
| Per Severity (Critical) | No limit | Critical alerts bypass rate limits | N/A |
Token Bucket Algorithm: Each tier maintains a token bucket. A token is consumed per message. Tokens replenish at the configured rate. If no tokens are available, the message is queued or rejected based on priority.
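A token bucket for a single tier can be sketched in a few lines. This example assumes the table's Burst column is the bucket capacity and that tokens refill continuously at the configured rate. The `TokenBucket` class is illustrative, and the injectable clock is there for testing.

```python
import time


class TokenBucket:
    """Single-tier token bucket: `rate_per_second` refill, `burst` capacity."""

    def __init__(self, rate_per_second: float, burst: float, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.updated = clock()

    def try_consume(self, tokens: float = 1.0) -> bool:
        """Take tokens if available; return False so the caller can queue or reject."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


if __name__ == "__main__":
    # Telegram per-chat tier: 1 message/second, burst of 1.
    per_chat = TokenBucket(rate_per_second=1.0, burst=1)
    print(per_chat.try_consume(), per_chat.try_consume())
```

A message would need a token from every tier that applies to it (global, channel, per-recipient, per-camera), and critical alerts would skip the check entirely.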
11.7.4 Alert Deduplication
Alerts are deduplicated to prevent notification spam when the same event triggers repeatedly:
| Deduplication Key | Components | Window | Action on Duplicate |
|---|---|---|---|
| Known person | camera_id + person_id + event_type | 5 minutes | Suppress, append counter to original |
| Unknown person | camera_id + event_type | 5 minutes | Suppress, append counter to original |
| System alert | alert_type + source_id | 15 minutes | Suppress, update existing message |
| Watchlist match | camera_id + person_id + watchlist_id | 10 minutes | Suppress, append counter |
When a duplicate is detected, the original message is updated with a counter (e.g., "+3 more detections"), avoiding a flood of similar messages.
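The deduplication keys and windows can be expressed as a lookup table plus a small in-memory tracker. This sketch assumes a fixed window measured from the first occurrence (the design does not say whether the window slides) and keeps state in a dictionary. A production deployment would more likely use a shared store with expiry. All names are illustrative.

```python
# kind -> (fields forming the key, window in seconds), from the table above.
DEDUP_RULES = {
    "known_person": (("camera_id", "person_id", "event_type"), 5 * 60),
    "unknown_person": (("camera_id", "event_type"), 5 * 60),
    "system_alert": (("alert_type", "source_id"), 15 * 60),
    "watchlist_match": (("camera_id", "person_id", "watchlist_id"), 10 * 60),
}


class Deduplicator:
    def __init__(self):
        self._seen = {}  # key -> (first_seen_timestamp, duplicate_count)

    def observe(self, kind: str, event: dict, now: float):
        """Return (is_duplicate, duplicate_count) for this event at time `now`."""
        fields, window = DEDUP_RULES[kind]
        key = (kind,) + tuple(event[f] for f in fields)
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] < window:
            count = entry[1] + 1
            self._seen[key] = (entry[0], count)
            return True, count  # suppress; caller updates "+N more detections"
        self._seen[key] = (now, 0)  # first occurrence, or window expired
        return False, 0


if __name__ == "__main__":
    dedup = Deduplicator()
    event = {"camera_id": "CAM-01", "person_id": "p-42", "event_type": "person_detected"}
    for t in (0, 100, 200, 301):
        print(t, dedup.observe("known_person", event, now=t))
```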
11.8 Escalation Rules
11.8.1 Escalation Thresholds
When an alert goes unacknowledged, it automatically escalates through up to 3 levels, each with increasing urgency and broader recipient distribution.
| Severity | Level 1 (Primary) | Level 2 (Secondary) | Level 3 (Final) |
|---|---|---|---|
| Critical | 5 minutes | 10 minutes | 20 minutes |
| High | 15 minutes | 30 minutes | 60 minutes |
| Medium | 30 minutes | 60 minutes | 120 minutes |
| Low | 60 minutes | 120 minutes | 240 minutes |
| Info | Never | Never | Never |
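The threshold table maps directly to a lookup: the current escalation level is the number of thresholds the unacknowledged time has reached. A minimal sketch follows, with hypothetical names.

```python
# Minutes unacknowledged before Level 1, 2, 3 (from the table above).
ESCALATION_MINUTES = {
    "critical": (5, 10, 20),
    "high": (15, 30, 60),
    "medium": (30, 60, 120),
    "low": (60, 120, 240),
    "info": (),  # never escalates
}


def escalation_level(severity: str, elapsed_minutes: float) -> int:
    """Return 0 (not escalated) through 3 (final) for an unacknowledged alert."""
    thresholds = ESCALATION_MINUTES[severity]
    return sum(1 for limit in thresholds if elapsed_minutes >= limit)


if __name__ == "__main__":
    for minutes in (4, 5, 12, 20):
        print(minutes, escalation_level("critical", minutes))
```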
11.8.2 Escalation Actions per Level
| Level | Name | Notification Action | Recipient Expansion | Severity Change |
|---|---|---|---|---|
| 0 | Original | Standard routing rules | Primary recipients only | Original severity |
| 1 | Primary | Re-notify with escalation prefix | Add management group | Increase by one level |
| 2 | Secondary | Force all channels, bypass quiet hours | Add all groups, increase severity | Increase by one level |
| 3 | Final | All-hands notification, include audit trail | All configured recipients | Set to Critical |
Escalation Cancellation: Acknowledgment cancels ALL pending escalation timers for an alert. Acknowledgment can occur via:
- Telegram inline "Acknowledge" button click
- WhatsApp quick reply "Ack"
- Web dashboard "Acknowledge" button
- REST API: POST /api/v1/alerts/{id}/acknowledge
- Chat command: /acknowledge {alert_id}
11.8.3 Escalation Notification Template
⬆️ <b>ESCALATION — Level {level}</b>
Original Alert: {alert_summary}
Alert ID: {alert_id}
First Detected: {first_detected_time}
Current Time: {current_time}
Unacknowledged: {elapsed_minutes} minutes
Escalation Threshold: {threshold_minutes} minutes
This alert has been escalated because it has not been acknowledged.
Please review immediately.
<a href="{acknowledge_url}">✅ Acknowledge Now</a>
<a href="{view_alert_url}">👁 View Details</a>
11.9 Media Attachment Handling
11.9.1 Media Processing Pipeline
When an alert includes media (snapshot images or video clips), a multi-stage processing pipeline ensures the media meets channel-specific requirements:
Original Media (from detection)
│
▼
┌──────────────────┐
│ 1. Store Original│ ──▶ MinIO/S3 (full resolution archival)
│ in Storage │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ 2. Process for │
│ Telegram │
└────────┬─────────┘
│
├──▶ Image: Resize 1280x720, JPEG quality 85, max 10 MB
├──▶ Video: H.264, 1280x720, max 50 MB, max 60 seconds
└──▶ Media Group: Each image < 10 MB, max 10 items
│
▼
┌──────────────────┐
│ 3. Process for │
│ WhatsApp │
└────────┬─────────┘
│
├──▶ Image: Resize 1600x900, JPEG quality 80, max 16 MB
└──▶ Video: H.264, 1280x720, max 16 MB, max 60 seconds
11.9.2 Image Processing Details
| Step | Operation | Parameters |
|---|---|---|
| 1. Load | Open source image | Pillow (PIL) |
| 2. Convert | Convert to RGB | Drop alpha channel if present |
| 3. Resize | Scale to target dimensions | Lanczos resampling |
| 4. Compress | JPEG encoding | Quality: 85 (Telegram), 80 (WhatsApp) |
| 5. Check size | Verify file size under limit | If over limit, reduce quality iteratively |
| 6. Fallback | Aggressive compression | If quality < 50 and still over limit, reduce dimensions |
Iterative Quality Reduction:
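import io

from PIL import Image


def compress_image_to_limit(image, size_limit_mb, channel):
    """Return JPEG bytes within size_limit_mb, lowering quality first, then dimensions."""
    if image.mode != 'RGB':
        image = image.convert('RGB')  # JPEG has no alpha channel
    quality = 85 if channel == 'telegram' else 80
    min_quality = 40
    while quality >= min_quality:
        buffer = io.BytesIO()
        image.save(buffer, format='JPEG', quality=quality, optimize=True)
        size_mb = buffer.tell() / (1024 * 1024)
        if size_mb <= size_limit_mb:
            return buffer.getvalue()
        quality -= 5
    # If still over limit, reduce dimensions by 25% and retry
    new_size = (int(image.width * 0.75), int(image.height * 0.75))
    if min(new_size) < 1:
        # Guard against unbounded recursion when the limit cannot be met
        raise ValueError('image cannot be compressed below the size limit')
    image = image.resize(new_size, Image.LANCZOS)
    return compress_image_to_limit(image, size_limit_mb, channel)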
11.9.3 Video Processing Details
Videos are processed with FFmpeg using two-pass encoding to achieve the target bitrate calculated from the size limit:
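# Calculate target bitrate: (size_limit_bytes * 8) / duration_seconds
# Example: 16 MB limit, 10 second clip = (16*1024*1024*8) / 10 = ~13.4 Mbps total
# (10M video + 128k audio leaves headroom for container overhead)
# Note: comments cannot follow a trailing backslash, so they are kept on their own lines.
# Pass 1: analysis only (no audio, output discarded)
ffmpeg -y -i input.mp4 \
  -c:v libx264 -preset fast -b:v 10M -maxrate 12M -bufsize 20M \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  -pass 1 -an -f null /dev/null
# Pass 2: final encode using the pass-1 statistics, AAC audio, web-optimized MP4
ffmpeg -y -i input.mp4 \
  -c:v libx264 -preset fast -b:v 10M -maxrate 12M -bufsize 20M \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  -pass 2 -c:a aac -b:a 128k -movflags +faststart \
  output.mp4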
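The bitrate arithmetic in the comment above can be wrapped in a helper so the `-b:v` value is derived per clip rather than hard-coded. This is a sketch: the function name is hypothetical, and the 5% safety margin for container overhead is an assumption, not a figure from the design.

```python
def target_video_bitrate_kbps(size_limit_mb: float, duration_s: float,
                              audio_kbps: int = 128, safety: float = 0.95) -> int:
    """Video bitrate (kbit/s) that keeps a clip under size_limit_mb.

    total = (size_limit_bytes * 8) / duration_seconds; the audio bitrate and a
    safety margin for container overhead are subtracted from that budget.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_kbps = size_limit_mb * 1024 * 1024 * 8 / duration_s / 1000
    video_kbps = total_kbps * safety - audio_kbps
    if video_kbps <= 0:
        raise ValueError("size limit too small for this duration")
    return int(video_kbps)


if __name__ == "__main__":
    # 16 MB WhatsApp limit, 10-second clip
    print(target_video_bitrate_kbps(16, 10))
    # Same limit, 60-second clip
    print(target_video_bitrate_kbps(16, 60))
```

The result can be passed to FFmpeg as, for example, `-b:v 12622k`. Longer clips get a proportionally lower bitrate.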
11.10 Delivery Tracking
11.10.1 Delivery Status Lifecycle
Every notification progresses through a well-defined status lifecycle, tracked in the database for audit and troubleshooting:
| Status | Description | Terminal? |
|---|---|---|
| pending | Queued, waiting to be sent | No |
| processing | Currently being sent to provider | No |
| sent | API request to provider succeeded | No |
| delivered | Provider confirmed delivery to device | No |
| read | Recipient opened/read the message | No |
| engaged | User interacted (button click, reaction) | Yes |
| failed | Permanently failed (non-retryable error) | Yes |
| retrying | Scheduled for retry attempt | No |
| dead_letter | Moved to DLQ after all retries exhausted | Yes |
| suppressed | Blocked by quiet hours or deduplication | Yes |
| cancelled | Cancelled (e.g., acknowledged before send) | Yes |
| expired | Message TTL expired before delivery | Yes |
Status Transitions:
pending → processing → sent → delivered → read → engaged
│ │ │ │
▼ ▼ ▼ ▼
retrying cancelled failed suppressed
│
▼
dead_letter
11.10.2 Dead Letter Queue (DLQ)
Failed notifications that exhaust all retry attempts are moved to a Redis-backed Dead Letter Queue. Admin users can review and manage DLQ entries through the web dashboard.
| DLQ Feature | Description |
|---|---|
| Storage | Redis sorted set, ordered by failure timestamp |
| Retention | 30 days |
| View | Filterable by channel, error type, date range |
| Actions | Retry individual, Retry all (batch), Discard, Export |
| Alert | Daily digest of DLQ count; alert if > 10 entries |
| Auto-retry | Optional: automatically retry DLQ entries every 6 hours |
11.11 API Endpoints Summary
11.11.1 REST Endpoints (13 endpoints)
| # | Method | Endpoint | Purpose | Auth |
|---|---|---|---|---|
| 1 | GET | /api/v1/notifications/rules | List all routing rules | Admin |
| 2 | POST | /api/v1/notifications/rules | Create new routing rule | Admin |
| 3 | GET | /api/v1/notifications/rules/{id} | Get specific rule | Admin |
| 4 | PUT | /api/v1/notifications/rules/{id} | Update routing rule | Admin |
| 5 | DELETE | /api/v1/notifications/rules/{id} | Delete routing rule | Admin |
| 6 | GET | /api/v1/notifications/templates | List message templates | Admin |
| 7 | POST | /api/v1/notifications/templates | Create/update template | Admin |
| 8 | GET | /api/v1/notifications/delivery-status/{alert_id} | Get delivery status for alert | Operator+ |
| 9 | GET | /api/v1/notifications/{id}/status | Single notification status | Operator+ |
| 10 | POST | /api/v1/notifications/{id}/retry | Manual retry of failed notification | Admin |
| 11 | GET | /api/v1/notifications/dlq | List dead letter queue | Admin |
| 12 | POST | /api/v1/notifications/dlq/retry-all | Retry all DLQ entries | Admin |
| 13 | POST | /api/v1/notifications/dlq/clear | Clear all DLQ entries | Admin |
11.11.2 Alert Management Endpoints
| # | Method | Endpoint | Purpose | Auth |
|---|---|---|---|---|
| 1 | GET | /api/v1/alerts | List alerts with filters | Operator+ |
| 2 | GET | /api/v1/alerts/{id} | Get single alert details | Operator+ |
| 3 | POST | /api/v1/alerts/{id}/acknowledge | Acknowledge alert | Operator+ |
| 4 | POST | /api/v1/alerts/{id}/resolve | Resolve alert | Operator+ |
| 5 | POST | /api/v1/alerts/{id}/ignore | Ignore alert | Operator+ |
| 6 | POST | /api/v1/alerts/{id}/false-positive | Mark as false positive | Operator+ |
| 7 | POST | /api/v1/alerts/bulk/acknowledge | Bulk acknowledge | Operator+ |
| 8 | POST | /api/v1/alerts/bulk/ignore | Bulk ignore | Operator+ |
11.11.3 WebSocket Endpoints (2 endpoints)
| Endpoint | Purpose | Authentication |
|---|---|---|
| WS /api/v1/notifications/live | Real-time notification stream for connected clients | JWT token in query parameter |
| WS /api/v1/alerts/stream | Live alert feed for operator dashboards | JWT token in query parameter |
11.11.4 Webhook Endpoints (2 endpoints)
| Endpoint | Source | Purpose |
|---|---|---|
| POST /webhooks/telegram | Telegram servers | Receive delivery receipts, callback queries, chat events |
| POST /webhooks/whatsapp | Meta servers | Receive message status updates, incoming messages |
Webhook Security:
| Measure | Implementation |
|---|---|
| Telegram | HMAC-SHA256 signature verification using bot token |
| WhatsApp | SHA-256 signature verification using app secret |
| IP allowlisting | Only accept requests from Telegram/Meta IP ranges |
| Replay protection | Reject messages with timestamps older than 5 minutes |
| Rate limiting | 100 requests per minute per source IP |
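For the Meta (WhatsApp) webhook, the signature check amounts to recomputing an HMAC-SHA256 over the raw request body with the app secret and comparing it, in constant time, with the value Meta sends in the `X-Hub-Signature-256` header (formatted as `sha256=<hex digest>`). The sketch below shows that check together with the 5-minute replay window. The function names are illustrative, and how the event timestamp is extracted from the payload is left to the caller.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Validate an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode(), received.encode())


def is_fresh(event_timestamp: float, now: float, max_age_seconds: int = 300) -> bool:
    """Replay protection: reject events older than max_age_seconds."""
    return now - event_timestamp <= max_age_seconds


if __name__ == "__main__":
    secret = b"example-app-secret"  # placeholder; the real secret would come from Vault
    body = b'{"object":"whatsapp_business_account"}'
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(verify_meta_signature(secret, body, header))
    print(verify_meta_signature(secret, body + b" ", header))
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization.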
Section 12: Security Design
12.1 Security Architecture Overview
The Sentinel AI Surveillance Platform implements defense-in-depth security across seven distinct layers. Every component — from network perimeter to data storage — has been designed with security as a primary consideration, reflecting the sensitive nature of surveillance data, biometric information, and the critical safety function the system performs.
┌──────────────────────────────────────────────────────────────────────────────┐
│ DEFENSE IN DEPTH ARCHITECTURE │
│ │
│ LAYER 1: PERIMETER │
│ ───────────────── │
│ AWS WAF v2 │ Geo-restriction │ DDoS protection │ Rate limiting │
│ │
│ LAYER 2: TRANSPORT │
│ ───────────────── │
│ TLS 1.3 │ mTLS internal │ WireGuard ChaCha20-Poly1305 │ Certificate mgmt │
│ │
│ LAYER 3: AUTHENTICATION & AUTHORIZATION │
│ ───────────────────────────────────────── │
│ Argon2id │ JWT ES256 │ TOTP MFA │ RBAC 4 roles │ API keys │
│ │
│ LAYER 4: APPLICATION SECURITY │
│ ──────────────────────────── │
│ Input validation │ Parameterized queries │ CSP │ CSRF │ CORS │ File upload │
│ │
│ LAYER 5: DATA SECURITY │
│ ──────────────────── │
│ AES-256-GCM at rest │ Field-level encryption │ Signed URLs │ Key rotation │
│ │
│ LAYER 6: NETWORK SEGMENTATION │
│ ─────────────────────────── │
│ VPC private subnets │ Security groups │ Network Policies │ Firewall rules │
│ │
│ LAYER 7: AUDIT & MONITORING │
│ ───────────────────────── │
│ Hash-chain audit log │ Real-time alerts │ CloudTrail │ Flow Logs │
└──────────────────────────────────────────────────────────────────────────────┘
12.2 SSL/TLS Configuration
12.2.1 Protocol and Cipher Suite Requirements
All external-facing services enforce strong TLS configuration with modern cipher suites:
| Setting | Value | Rationale |
|---|---|---|
| Minimum TLS Version | TLS 1.2 | Fallback for older clients; TLS 1.3 preferred |
| Preferred TLS Version | TLS 1.3 | Fastest, most secure handshake |
| Cipher Suites (TLS 1.2) | ECDHE-ECDSA-AES256-GCM-SHA384 | Forward secrecy, AES-GCM authenticated encryption |
| Cipher Suites (TLS 1.2) | ECDHE-RSA-AES256-GCM-SHA384 | Same with RSA certificates |
| Cipher Suites (TLS 1.2) | ECDHE-ECDSA-CHACHA20-POLY1305 | Mobile-optimized cipher |
| Cipher Suites (TLS 1.2) | ECDHE-RSA-CHACHA20-POLY1305 | Mobile-optimized with RSA |
| Cipher Suites (TLS 1.3) | TLS_AES_256_GCM_SHA384 | Preferred TLS 1.3 cipher |
| Cipher Suites (TLS 1.3) | TLS_CHACHA20_POLY1305_SHA256 | Alternative TLS 1.3 cipher |
| Disabled Ciphers | CBC mode, RC4, 3DES, DES, MD5, SHA1, RSA key exchange (no forward secrecy) | Known weaknesses |
| HSTS | max-age=63072000; includeSubDomains; preload | 2-year HSTS with preload eligibility |
| OCSP Stapling | Enabled | Reduces certificate validation latency |
| Certificate Provider | Let's Encrypt (ACME v2) | Free, automated, trusted |
| Auto-renewal | 60 days before expiry | Ensures 30+ day buffer |
| Certificate Transparency | Required | All certificates publicly logged |
12.2.2 mTLS for Internal Service Communication
All inter-service communication uses mutual TLS (mTLS) with client certificate verification. This means both the client and server must present valid certificates signed by the internal Certificate Authority.
| Parameter | Value |
|---|---|
| Internal CA | Self-managed ECDSA P-256 CA |
| Certificate lifetime | 90 days (auto-rotated) |
| Verification mode | Required (reject if no client cert) |
| Revocation | CRL + OCSP |
| Service identity | SPIFFE URI in certificate Subject Alternative Name |
Benefits of mTLS:
- Even if network boundaries are breached, unauthorized services cannot access internal APIs
- Every service-to-service call is authenticated and encrypted
- Certificates provide strong service identity (not just IP-based)
- No shared secrets between services (except Vault tokens)
12.2.3 TLS Configuration Code Example
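# FastAPI TLS configuration (served by uvicorn)
import ssl

import uvicorn
from fastapi import FastAPI

app = FastAPI()

# TLS settings passed to uvicorn.run() as keyword arguments
ssl_config = {
    "ssl_keyfile": "/certs/server.key",
    "ssl_certfile": "/certs/server.crt",
    "ssl_ca_certs": "/certs/ca.crt",  # For mTLS
    "ssl_cert_reqs": ssl.CERT_REQUIRED,  # Require client cert
    # uvicorn's SSL options do not include a minimum-version setting; the AEAD
    # suites below exist only in TLS 1.2+, which excludes older protocol versions.
    "ssl_ciphers": "ECDHE-ECDSA-AES256-GCM-SHA384:"
                   "ECDHE-RSA-AES256-GCM-SHA384:"
                   "ECDHE-ECDSA-CHACHA20-POLY1305:"
                   "ECDHE-RSA-CHACHA20-POLY1305",
}

if __name__ == "__main__":
    # Host and port are placeholders for illustration
    uvicorn.run(app, host="0.0.0.0", port=8443, **ssl_config)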
12.3 Authentication
12.3.1 Password Policy
| Requirement | Value | Enforcement |
|---|---|---|
| Minimum length | 12 characters | Hard validation |
| Complexity | At least one uppercase, one lowercase, one digit, one special character | Regex validation |
| Password history | Last 12 passwords cannot be reused | Database check |
| Hashing algorithm | Argon2id (memory-hard, resistant to GPU cracking) | Passwords never stored in plaintext |
| Argon2id parameters | Time cost: 3, Memory: 64MB, Parallelism: 4 | Tuned for 500ms hash time |
| HaveIBeenPwned check | Enabled for all new passwords | k-anonymity API (no full password sent) |
| Maximum age | 90 days | Configurable; reminder at 75 days |
| Lockout after failures | 5 failed attempts | 30-minute lockout |
| Password change | Users cannot reuse current password | Immediate validation |
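Two of the rows above lend themselves to a short illustration: the length/complexity rule and the k-anonymity lookup. With k-anonymity, only the first five hex characters of the password's SHA-1 digest are sent to the Pwned Passwords range endpoint, and the returned suffix list is searched locally. The sketch below performs no network request. The function names are hypothetical, and treating any non-alphanumeric character as "special" is an assumption.

```python
import hashlib
import re


def password_policy_errors(password: str) -> list:
    """Return a list of violated rules; an empty list means the password passes."""
    errors = []
    if len(password) < 12:
        errors.append("must be at least 12 characters")
    if not re.search(r"[A-Z]", password):
        errors.append("must contain an uppercase letter")
    if not re.search(r"[a-z]", password):
        errors.append("must contain a lowercase letter")
    if not re.search(r"\d", password):
        errors.append("must contain a digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        errors.append("must contain a special character")
    return errors


def hibp_prefix_and_suffix(password: str) -> tuple:
    """Split the uppercase SHA-1 hex digest into (5-char prefix, 35-char suffix).

    Only the prefix leaves the system; the suffix is compared locally against
    the list the range endpoint returns for that prefix.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def is_pwned(suffix: str, range_response: str) -> bool:
    """Check a 'SUFFIX:COUNT' line list (one entry per line) for our suffix."""
    return any(line.split(":")[0].strip() == suffix
               for line in range_response.splitlines() if line)


if __name__ == "__main__":
    print(password_policy_errors("short"))
    print(hibp_prefix_and_suffix("password")[0])
```

SHA-1 is used here only because the lookup service is keyed on it; stored passwords are still hashed with Argon2id as specified above.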
12.3.2 JWT Token Configuration
| Parameter | Value | Notes |
|---|---|---|
| Signing algorithm | ES256 (ECDSA with P-256 curve) | Smaller signatures than RS256; same security |
| Access token lifetime | 15 minutes | Short-lived for security |
| Refresh token lifetime | 7 days | Long-lived but revocable |
| Key rotation | Every 180 days | Dual-key support for zero-downtime rotation |
| Key storage | HashiCorp Vault | Private key never exposed to application filesystem |
| Token binding | Session ID + browser fingerprint | Detects token theft/reuse |
| Claims | sub, iss, aud, exp, iat, jti, role, permissions, mfa_verified | Standard + custom claims |
| Issuer | sentinel-ai | Verified by all services |
| Audience | sentinel-api | Scope-limited |
JWT Token Structure:
{
"header": {
"alg": "ES256",
"typ": "JWT",
"kid": "key-2025-01"
},
"payload": {
"sub": "user-uuid-here",
"iss": "sentinel-ai",
"aud": "sentinel-api",
"exp": 1705500000,
"iat": 1705499100,
"jti": "unique-token-id",
"role": "operator",
"permissions": ["alerts:view", "alerts:acknowledge", "cameras:view"],
"mfa_verified": true,
"session_id": "sess-uuid-here"
}
}
12.3.3 Multi-Factor Authentication (MFA)
| Parameter | Value |
|---|---|
| Method | TOTP (Time-based One-Time Password) per RFC 6238 |
| Issuer label | "Sentinel AI Surveillance" |
| Algorithm | SHA-1 (for compatibility) |
| Digit length | 6 digits |
| Time step | 30 seconds |
| Valid window | 1 step before and after current (3-step tolerance) |
| Recovery codes | 10 single-use codes generated at setup |
| Enforced for | Super Admin, Admin roles (mandatory) |
| Optional for | Operator, Viewer roles (recommended) |
| QR code format | otpauth://totp/Sentinel%20AI:{username}?secret={secret}&issuer=Sentinel%20AI |
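The TOTP parameters above (SHA-1, 6 digits, 30-second step, ±1 step window) map directly onto RFC 6238. The following is a minimal standard-library sketch; the function names `totp` and `verify_totp` are illustrative, and the secret is passed as raw bytes (a Base32-encoded secret from the QR code would need decoding first).

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA-1 variant)."""
    counter = int(for_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, now: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus `window` steps before and after it."""
    now = time.time() if now is None else now
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step), submitted)
        for drift in range(-window, window + 1)
    )
```

A production verifier would also record the last accepted time step per user so that a code cannot be replayed within its validity window.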
MFA Enforcement Matrix:
| Role | MFA Required | Can Disable |
|---|---|---|
| Super Admin | Yes | No |
| Admin | Yes | No |
| Operator | No (Recommended) | Yes |
| Viewer | No | Yes |
12.4 Role-Based Access Control (RBAC)
12.4.1 Role Definitions
| Role | Level | Description | Typical Users | Count |
|---|---|---|---|---|
| Super Admin | L1 | Full system access; can manage other admins | CISO, CTO, Platform Lead | 1-2 |
| Admin | L2 | Administrative functions; day-to-day management | Security Manager, IT Manager | 2-4 |
| Operator | L3 | Day-to-day surveillance operations | Security guards, SOC analysts | 5-20 |
| Viewer | L4 | Read-only access for review and audit | Auditors, Management | 2-10 |
12.4.2 Permission Matrix (30+ Permissions)
| Permission | Super Admin | Admin | Operator | Viewer |
|---|---|---|---|---|
| users:full_access | Y | N | N | N |
| users:manage (create/edit/deactivate) | Y | Y | N | N |
| users:view (list, details) | Y | Y | Y | Y |
| users:reset_password | Y | Y | N | N |
| users:reset_mfa | Y | Y | N | N |
| cameras:full_access | Y | N | N | N |
| cameras:manage (add/edit/remove) | Y | Y | N | N |
| cameras:view (list, status) | Y | Y | Y | Y |
| cameras:control (PTZ, restart stream) | Y | Y | Y | N |
| cameras:configure_zones | Y | Y | N | N |
| alerts:manage (edit rules, bulk actions) | Y | Y | N | N |
| alerts:view (list, filter, search) | Y | Y | Y | Y |
| alerts:acknowledge | Y | Y | Y | N |
| alerts:resolve | Y | Y | Y | N |
| alerts:mark_false_positive | Y | Y | Y | N |
| persons:full_access | Y | N | N | N |
| persons:manage (create/edit/delete) | Y | Y | N | N |
| persons:view (gallery, profiles) | Y | Y | Y | Y |
| persons:name_unknown | Y | Y | Y | N |
| persons:merge | Y | Y | Y | N |
| watchlists:manage (create/edit/delete) | Y | Y | N | N |
| watchlists:view (list, members) | Y | Y | Y | Y |
| watchlists:add_remove_members | Y | Y | Y | N |
| ai_settings:manage (change defaults) | Y | Y | N | N |
| ai_settings:view (see current settings) | Y | Y | Y | Y |
| ai_settings:adjust (operator adjustments) | Y | Y | Y | N |
| reports:full_access | Y | N | N | N |
| reports:view (all reports) | Y | Y | Y | Y |
| reports:export | Y | Y | Y | N |
| system:full_access | Y | N | N | N |
| system:manage (config changes) | Y | Y | N | N |
| system:view (health, status) | Y | Y | Y | Y |
| audit:view (audit logs) | Y | Y | N | N |
| notifications:manage (routing rules) | Y | Y | N | N |
| storage:manage (retention policies) | Y | Y | N | N |
| storage:view (usage, reports) | Y | Y | Y | Y |
| privacy:manage (GDPR actions) | Y | Y | N | N |
| privacy:view (consent status) | Y | Y | Y | Y |
12.4.3 Resource-Level Permissions
Beyond global permissions, the system supports resource-level access control:
| Resource Type | Granularity | Example |
|---|---|---|
| Cameras | Per-camera access | Operator A can only view CAM-01, CAM-02 |
| Zones | Per-zone access | Operator B can only view "entrance" zone |
| Alerts | Per-camera origin | Viewer can only see alerts from specific cameras |
| Persons | Per-department | HR can only view employee records |
| Watchlists | Per-watchlist | Security can only view "blacklist", not "vip" |
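A two-stage check, global permission first and resource scope second, is one way the combination of the permission matrix and resource-level access could be evaluated. The sketch below covers only a handful of permissions and only the per-camera case; the `User` type, the `camera_scope` field, and the lowercase role keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Small excerpt of the permission matrix; a real table would hold every row.
ROLE_PERMISSIONS = {
    "operator": {"cameras:view", "cameras:control", "alerts:view", "alerts:acknowledge"},
    "viewer": {"cameras:view", "alerts:view"},
}


@dataclass
class User:
    role: str
    # None means "no per-camera restriction"; otherwise an explicit allowlist.
    camera_scope: Optional[frozenset] = None


def can_access_camera(user: User, permission: str, camera_id: str) -> bool:
    """Default deny: the role must grant the permission AND the camera must be in scope."""
    if permission not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    return user.camera_scope is None or camera_id in user.camera_scope


operator_a = User(role="operator", camera_scope=frozenset({"CAM-01", "CAM-02"}))
```

With this shape, an unknown role or a camera outside the allowlist both fall through to a denial without any special-case code.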
12.5 VPN and Network Security
12.5.1 WireGuard VPN Configuration
WireGuard provides the encrypted tunnel between cloud infrastructure and the edge site:
| Parameter | Value | Notes |
|---|---|---|
| Protocol | WireGuard | Modern, simple, fast VPN |
| Port | UDP 51820 | Single port, firewall-friendly |
| Authentication | Curve25519 key pairs + Preshared Key (PSK) | Defense in depth |
| Encryption | ChaCha20-Poly1305 | Fast on hardware without AES-NI |
| Key exchange | Curve25519 elliptic curve | 128-bit security |
| Tunnel network | 10.200.0.0/24 | Dedicated VPN subnet |
| Cloud endpoint | 10.200.0.1/32 | Single IP for cloud side |
| Edge endpoint | 10.200.0.2/32 | Single IP for edge side |
| AllowedIPs (cloud) | 10.200.0.2/32, 192.168.29.0/24 | Edge + camera network only |
| AllowedIPs (edge) | 10.100.0.0/16, 10.200.0.0/24 | Full cloud VPC + VPN |
| Keepalive | 25 seconds | Prevents NAT timeout |
| Key rotation | 365 days | Annual rotation via maintenance window |
12.5.2 Network Segmentation Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ NETWORK ARCHITECTURE │
│ │
│ INTERNET │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ AWS WAF │ │
│ │ + ALB │ │
│ └──────┬───────┘ │
│ │ │
│ ═══════╪════════════════ AWS CLOUD VPC: 10.100.0.0/16 ═══════════════════ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ PUBLIC SUBNET: 10.100.1.0/24 │ │
│ │ │ - ALB (Application Load Balancer) │ │
│ │ │ - NAT Gateway │ │
│ │ │ - WireGuard VPN Gateway (10.200.0.1) │ │
│ │ │ - Bastion Host (emergency SSH, admin IPs only) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ └────▶┌──────────────────────────────────────────────────────┐ │
│ │ PRIVATE SUBNET: 10.100.2.0/24 (App Tier) │ │
│ │ - EKS Worker Nodes (API, AI, Web pods) │ │
│ │ - Stream Ingestion Service │ │
│ │ - Alert Engine │ │
│ │ - Notification Service │ │
│ └──────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ DATA SUBNET: 10.100.3.0/24 (No Internet) │ │
│ │ │ - RDS PostgreSQL (Multi-AZ) │ │
│ │ │ - ElastiCache Redis Cluster │ │
│ │ │ - Amazon MSK Kafka │ │
│ │ │ - NO INTERNET ACCESS (VPC endpoints only) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ MONITORING SUBNET: 10.100.4.0/24 │ │
│ │ │ - Prometheus, Grafana, Alertmanager │ │
│ │ │ - Loki (log aggregation) │ │
│ │ │ - Jaeger (distributed tracing) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ════════════╪══════════════════════════════════════════════════════════ │
│ │ │
│ │ WireGuard VPN Tunnel (UDP 51820) │
│ │ │
│ ════════════╪══════════════════════════════════════════════════════════ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ EDGE GATEWAY: 192.168.29.5/24 (Intel NUC) │ │
│ │ │ OS: Ubuntu Server 22.04 LTS (minimal) │ │
│ │ │ - Docker Compose stack │ │
│ │ │ - WireGuard Client (10.200.0.2) │ │
│ │ │ - Local MinIO (hot storage) │ │
│ │ │ - Redis (local cache) │ │
│ │ │ - Video Capture Service │ │
│ │ │ - AI Inference (edge models) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │ │
│ │ ┌─────────────────────────┴──────────────────────┐ │
│ │ │ CAMERA LAN: 192.168.29.0/24 │ │
│ │ │ - CP PLUS DVR: 192.168.29.200 (8 channels) │ │
│ │ │ - RTSP streams on port 554 │ │
│ │ │ - NO INTERNET ACCESS │ │
│ │ │ - NO ROUTE TO CLOUD (only via edge gateway) │ │
│ │ └────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
12.5.3 Firewall Rules
Edge Gateway Firewall (iptables):
| Direction | Protocol | Port | Source | Destination | Action | Purpose |
|---|---|---|---|---|---|---|
| IN | TCP | 22 | Admin IP range only | Edge gateway | ACCEPT | SSH management |
| IN | UDP | 51820 | Cloud VPN IP | Edge gateway | ACCEPT | WireGuard tunnel |
| IN | TCP | 8080 | Local LAN only | Edge gateway | ACCEPT | Admin UI |
| IN | — | — | Any | Edge gateway | DROP | Default deny |
| OUT | TCP | 443 | Edge gateway | AWS S3 endpoint | ACCEPT | Cloud storage sync |
| OUT | UDP | 51820 | Edge gateway | Cloud VPN IP | ACCEPT | WireGuard tunnel |
| OUT | TCP | 8080 | Edge gateway | Local LAN | ACCEPT | Internal services |
| OUT | — | — | Edge gateway | Internet | DROP | No direct internet |
Cloud Firewall (AWS Security Groups):
| Direction | Protocol | Port | Source | Action | Purpose |
|---|---|---|---|---|---|
| IN | TCP | 443 | 0.0.0.0/0 | ACCEPT | Public HTTPS |
| IN | UDP | 51820 | Edge gateway IP | ACCEPT | WireGuard |
| IN | TCP | 5432 | App security group | ACCEPT | PostgreSQL |
| IN | TCP | 6379 | App security group | ACCEPT | Redis |
| IN | TCP | 9092 | App security group | ACCEPT | Kafka |
| IN | TCP | 22 | Admin IPs only | ACCEPT | Bastion SSH |
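| IN | — | — | Any | DENY (implicit) | Default deny; security groups are allow-only, so unmatched traffic is dropped without an explicit rule |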
12.6 Secret Management
12.6.1 Vault Integration
All secrets are stored in HashiCorp Vault with automatic rotation policies:
| Secret Type | Encryption | Rotation Frequency | Rotation Method | Access Pattern |
|---|---|---|---|---|
| Database passwords | AES-256-GCM | 90 days | Terraform + Vault dynamic credentials | Short-lived (1-hour TTL) |
| JWT signing keys | AES-256-GCM | 180 days | Dual-key grace period | Zero-downtime rotation |
| Internal API keys | AES-256-GCM | 90 days | Zero-downtime rotation | Automated |
| Telegram bot tokens | AES-256-GCM | 180 days | Regenerate via BotFather | Semi-automated |
| WhatsApp API tokens | AES-256-GCM | 180 days | Regenerate via Meta Business Manager | Semi-automated |
| DVR credentials | AES-256-GCM | 180 days | Manual via DVR web UI | Manual |
| TLS certificates | ACME auto | 60 days | cert-manager + Let's Encrypt | Fully automated |
| WireGuard keys | AES-256-GCM | 365 days | Maintenance window rotation | Scripted |
| Backup encryption keys | AES-256-GCM | 365 days | Re-encrypt all backups | Automated |
| Session secrets | AES-256-GCM | On security incident | Immediate revocation | Admin trigger |
12.6.2 Dynamic Database Credentials
Instead of static database passwords, the system uses Vault's dynamic credential engine:
Application → Vault (request db credentials)
│
▼
Vault creates temporary DB user
(TTL: 1 hour, auto-revoke)
│
▼
Application receives credentials
Uses them for DB connections
│
▼
After TTL expires → Vault revokes DB user
Application requests new credentials
Benefits:
- No long-lived database passwords in application configuration
- Each application instance gets unique credentials
- Automatic credential rotation without application restart
- Full audit trail of credential issuance and revocation
- Instant credential revocation on compromise
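On the application side, the request/expire/re-request loop above can be handled by a small lease cache that refreshes shortly before the TTL runs out. The sketch below is an assumption about how that might be structured: `Lease`, `CredentialCache`, and the 5-minute refresh margin are hypothetical, and the `fetch` callable stands in for whatever Vault client call issues the credentials.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float  # absolute time at which Vault revokes the DB user


class CredentialCache:
    """Hands out the current lease and renews it before it expires."""

    def __init__(self, fetch: Callable[[], Lease], refresh_margin: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # injected; would call Vault in practice
        self._margin = refresh_margin  # refresh this many seconds early
        self._clock = clock
        self._lease: Optional[Lease] = None

    def get(self) -> Lease:
        now = self._clock()
        if self._lease is None or now >= self._lease.expires_at - self._margin:
            self._lease = self._fetch()
        return self._lease
```

Refreshing early leaves time for connection pools to drain old connections before the previous database user disappears.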
12.6.3 Field-Level Encryption
PII and biometric data in the database uses AES-256-GCM field-level encryption:
| Field Category | Example Fields | Encryption |
|---|---|---|
| Personal identification | name_encrypted, email_encrypted, phone_encrypted | AES-256-GCM per-field |
| Employment data | employee_id_encrypted, department_encrypted | AES-256-GCM per-field |
| Biometric data | face_encoding_encrypted (512-D vector) | AES-256-GCM per-field |
| Media metadata | location_encrypted (GPS coordinates) | AES-256-GCM per-field |
Encryption Architecture:
Application receives plaintext data
│
▼
[Encrypt field-by-field using Vault KMS]
│
▼
Store ciphertext in PostgreSQL
│
▼
[Decrypt only in application layer when needed]
│
▼
Decrypted data never logged, never cached
12.7 Audit Logging
12.7.1 Tamper-Resistant Hash-Chain
The audit log implements a cryptographically linked chain to ensure integrity:
| Field | Purpose | Example |
|---|---|---|
| event_id | Unique UUID for each audit event | 550e8400-e29b-41d4-a716-446655440000 |
| timestamp | ISO 8601 timestamp | 2025-01-16T14:32:15Z |
| event_type | Category of event | user_login, person_viewed, alert_acknowledged |
| actor_id | User who performed the action | user-uuid-here |
| actor_role | Role of the actor at the time | operator |
| resource_type | Type of resource accessed | person, camera, alert |
| resource_id | Specific resource identifier | person-123, cam-01 |
| action | Action performed | view, edit, delete, create |
| result | Success or failure | success, failure, denied |
| ip_address | Source IP address | 10.100.2.15 |
| session_id | Session identifier | sess-uuid-here |
| previous_hash | SHA-256 hash of the previous entry | a3f5c2... |
| entry_hash | SHA-256 hash of current entry content | b7e1d9... |
| signature | ECDSA signature of the entry hash | 30450221... |
Chain Verification: Any modification to historical entries invalidates all subsequent hashes and signatures, making tampering detectable.
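The chain property can be illustrated with a short standard-library sketch. It covers only the SHA-256 linkage; the ECDSA signature is omitted. Several details are assumptions rather than the blueprint's specification: the all-zero genesis hash, canonical JSON as the hashed representation, and folding `previous_hash` into each `entry_hash`.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed anchor for the first entry
_CHAIN_FIELDS = ("previous_hash", "entry_hash", "signature")


def entry_hash(entry: dict, previous_hash: str) -> str:
    """Hash the entry content together with the previous entry's hash."""
    body = {k: v for k, v in entry.items() if k not in _CHAIN_FIELDS}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list, entry: dict) -> dict:
    """Seal an entry with the chain fields and append it."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    sealed = dict(entry, previous_hash=prev)
    sealed["entry_hash"] = entry_hash(sealed, prev)
    chain.append(sealed)
    return sealed


def verify_chain(chain: list) -> int:
    """Return the index of the first invalid entry, or -1 if the chain is intact."""
    prev = GENESIS_HASH
    for i, e in enumerate(chain):
        if e.get("previous_hash") != prev or e.get("entry_hash") != entry_hash(e, prev):
            return i
        prev = e["entry_hash"]
    return -1
```

Editing any historical field changes that entry's recomputed hash, so verification fails at the modified entry; re-hashing it to hide the edit would then break the link to the next entry instead.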
12.7.2 Log Retention Policy
| Log Type | Online Retention | Archive Retention | Storage Type |
|---|---|---|---|
| Authentication events | 1 year | 6 years | WORM (Write-Once-Read-Many) |
| Authorization decisions | 1 year | 6 years | WORM |
| Person data modifications | 1 year | 6 years | WORM |
| Alert actions (ack, resolve) | 1 year | 3 years | Standard |
| Configuration changes | 2 years | 5 years | Standard |
| Security events | 1 year | 6 years | WORM |
| System health events | 90 days | 1 year | Standard |
| API access logs | 90 days | 1 year | Standard |
12.7.3 Real-Time Security Alerting
Automated detection rules trigger alerts on suspicious patterns:
| Rule ID | Rule Name | Condition | Auto-Response |
|---|---|---|---|
| SEC-001 | Brute force login | > 5 failed logins from same IP in 5 minutes | Block IP for 1 hour; alert security team |
| SEC-002 | Credential stuffing | > 10 unique usernames from same IP in 5 minutes | Block IP for 24 hours; alert security team |
| SEC-003 | Impossible travel | Logins > 500 km apart within 1 hour | Force MFA re-verification; alert security team |
| SEC-004 | Privilege escalation | > 20 admin actions in 10 minutes from new user | Alert security team; log for review |
| SEC-005 | Data exfiltration | > 1 GB downloaded by single user in 1 hour | Suspend account; alert security team |
| SEC-006 | Off-hours admin | Admin action between 22:00-06:00 | Log + notify security manager |
| SEC-007 | MFA bypass attempt | > 3 MFA failures then success without MFA | Block account; alert security team |
| SEC-008 | Suspicious media access | > 50 media downloads by non-security role | Alert security team |
| SEC-009 | Unknown device login | Login from unrecognized device fingerprint | Require MFA; notify user |
| SEC-010 | Concurrent sessions | > 3 concurrent sessions for same user | Force logout of oldest session |
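Most of these rules are threshold checks over a sliding time window. As an illustration, SEC-001 (more than 5 failed logins from one IP in 5 minutes) might be evaluated as sketched below; the class name `FailedLoginMonitor` and the in-memory storage are assumptions, and a deployed version would more likely keep its counters in Redis.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Sliding-window counter of failed logins per source IP."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window = window_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failure; return True when the rule fires (count > threshold)."""
        q = self._failures[ip]
        q.append(timestamp)
        # Drop failures that have fallen out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.threshold
```

The caller decides what the auto-response is; the monitor only reports that the condition has been met.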
12.8 Media Access Security
12.8.1 Signed URL Architecture
Media files are never served directly from object storage. All access is mediated through signed URLs:
| Parameter | Value | Notes |
|---|---|---|
| Default expiration | 5 minutes | Short-lived to prevent sharing |
| Maximum expiration | 1 hour | For bulk exports only |
| URL binding | Tied to user session | Invalidated on logout |
| Single-use option | Available for sensitive media | Blacklist incident footage |
| Access logging | Every media request logged | User ID, media ID, timestamp, IP |
| IP binding | Optional | URL valid only from requesting IP |
| Watermarking | Optional | Username/timestamp overlay on images |
Signed URL Flow:
1. User requests to view media
2. System checks: authentication + authorization + consent
3. If allowed: generate signed URL with HMAC-SHA256 signature
4. URL format: https://cdn.example.com/media/{id}?token={jwt}&sig={hmac}
5. Redirect user to signed URL
6. CDN/Object storage validates signature and expiry
7. Media served if valid; 403 if expired or invalid
8. Access logged with full context
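Steps 3 to 7 of the flow can be sketched with the standard library. This is a simplified illustration, not the blueprint's exact URL format: it uses a plain `expires` query parameter in place of the `token={jwt}` parameter, binds the signature to the session ID, and assumes URL-safe media IDs. The function names and the signed message layout are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(key: bytes, media_id: str, session_id: str, expires: int) -> str:
    message = "{}:{}:{}".format(media_id, session_id, expires).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_media_url(base_url: str, media_id: str, session_id: str, key: bytes,
                   ttl: int = 300, now: Optional[float] = None) -> str:
    """Build a URL that is valid for `ttl` seconds and only for one session."""
    expires = int((time.time() if now is None else now) + ttl)
    query = urlencode({"expires": expires,
                       "sig": _signature(key, media_id, session_id, expires)})
    return "{}/media/{}?{}".format(base_url, media_id, query)


def verify_media_url(url: str, session_id: str, key: bytes,
                     now: Optional[float] = None) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    media_id = parsed.path.rsplit("/", 1)[-1]
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(_signature(key, media_id, session_id, expires), sig)
```

Because the session ID is part of the signed message but not part of the URL, a link copied to another user fails validation even before it expires.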
12.8.2 Media Access Controls
| Control | Implementation |
|---|---|
| No direct S3/MinIO URLs | All access via signed URL proxy |
| Authentication required | Valid JWT session required for all media requests |
| Authorization enforced | RBAC checks per media item; camera-level permissions respected |
| Access logging | Every media request logged with user ID, media ID, timestamp, IP, session |
| DPO notification | Automatic notification for access to sensitive media (blacklist incidents) |
| Secure deletion | Overwrite with random data + verification before removal |
| Download tracking | Number of downloads per media item tracked and reported |
12.9 API Security
12.9.1 Defense Layers
| Layer | Implementation | Details |
|---|---|---|
| Rate limiting | Per-endpoint, per-user tiers | Token bucket algorithm; 100 req/min default; 10 req/min for auth endpoints |
| Input validation | Pydantic models on all endpoints | Strict type checking; reject unknown fields; max length limits |
| SQL injection prevention | Parameterized queries only | No dynamic SQL construction; ORM for all database access |
| XSS prevention | Output encoding + CSP headers | User input never rendered as HTML; Content-Security-Policy enforced |
| CSRF protection | SameSite=Strict cookies + tokens | State-changing operations require CSRF token validation |
| CORS | Restricted to known origins | No wildcard origins; explicit allowlist per environment |
| Request size limits | 10 MB default; 50 MB for media upload | Prevents DoS via large payloads |
| Request timeout | 30 seconds default | Prevents resource exhaustion |
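The token bucket algorithm named in the rate-limiting row works as sketched below: each client holds a bucket that refills at a steady rate and each request spends one token. The class name and the explicit `now` argument are choices made for this illustration; per-user and per-endpoint buckets would in practice live in a shared store rather than in process memory.

```python
class TokenBucket:
    """Token bucket: `capacity` is the burst size, `refill_per_second` the sustained rate."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = float(capacity)
        self.rate = refill_per_second
        self.tokens = float(capacity)  # start full
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Auth endpoints: 10 requests per minute, as in the table above.
auth_bucket = TokenBucket(capacity=10, refill_per_second=10 / 60)
```

A request that returns `False` would typically be answered with HTTP 429.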
12.9.2 Security Headers
| Header | Value | Purpose |
|---|---|---|
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload | Enforce HTTPS for 2 years |
| X-Content-Type-Options | nosniff | Prevent MIME-type sniffing |
| X-Frame-Options | DENY | Prevent clickjacking |
| X-XSS-Protection | 0 | Disabled; CSP is the preferred defense |
| Referrer-Policy | strict-origin-when-cross-origin | Minimal referrer information |
| Permissions-Policy | camera=(), microphone=(), geolocation=() | Disable browser APIs not needed |
| Content-Security-Policy | default-src 'self'; script-src 'self' 'nonce-{random}'; style-src 'self' 'unsafe-inline'; img-src 'self' blob: data: https://*.amazonaws.com; media-src 'self' blob: https://*.amazonaws.com; connect-src 'self' wss://*.example.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self'; | Comprehensive CSP |
| Cache-Control (API) | no-store, no-cache, must-revalidate, proxy-revalidate | Prevent caching of API responses |
| Pragma (API) | no-cache | Legacy cache directive |
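The `'nonce-{random}'` placeholder in the CSP has to be replaced with a fresh, unpredictable value on every response. A minimal sketch of that step follows; it includes only a subset of the headers and CSP directives from the table, and the function name `build_security_headers` is hypothetical.

```python
import secrets

# Headers whose values do not change between responses (subset of the table).
STATIC_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}


def build_security_headers() -> tuple:
    """Return (headers, nonce); the nonce must also be set on each inline <script>."""
    nonce = secrets.token_urlsafe(16)  # 128 bits of randomness per response
    csp = (
        "default-src 'self'; "
        "script-src 'self' 'nonce-{}'; "
        "frame-ancestors 'none'; base-uri 'self'; form-action 'self'"
    ).format(nonce)
    headers = dict(STATIC_HEADERS)
    headers["Content-Security-Policy"] = csp
    return headers, nonce
```

Reusing a nonce across responses would defeat its purpose, which is why it is generated inside the function rather than stored with the static headers.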
12.10 Session Security
| Parameter | Value | Notes |
|---|---|---|
| Cookie flags | HttpOnly; Secure; SameSite=Strict | Mitigates cookie theft via XSS and cross-site request forgery |
| Access token storage | Memory only (JavaScript variable) | Never stored in localStorage |
| Access token max-age | 15 minutes | Short-lived |
| Refresh token storage | HttpOnly secure cookie | Cannot be accessed by JavaScript |
| Refresh token max-age | 7 days | Long-lived but revocable |
| Session absolute timeout | 8 hours | Force re-login after 8 hours |
| Idle timeout | 30 minutes | Expire if no activity |
| Max concurrent sessions | 3 per user | Prevents session abuse |
| Session fixation protection | Regenerate session ID on login | Prevent fixation attacks |
| Session binding | Browser fingerprint + IP validation | Detect session theft |
| Force logout capability | Admin can revoke all sessions for any user | Immediate effect via Redis |
| Session storage | Redis with AUTH enabled | Encrypted at rest |
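The absolute and idle timeouts interact: a session ends at whichever limit is reached first. A minimal sketch of that check, assuming a hypothetical `Session` record with creation and last-activity timestamps:

```python
from dataclasses import dataclass

ABSOLUTE_TIMEOUT = 8 * 3600  # 8 hours from login, regardless of activity
IDLE_TIMEOUT = 30 * 60       # 30 minutes without activity


@dataclass
class Session:
    created_at: float  # login time (seconds since epoch)
    last_seen: float   # time of the most recent authenticated request


def session_state(session: Session, now: float) -> str:
    """Classify a session as active or expired, checking the absolute limit first."""
    if now - session.created_at >= ABSOLUTE_TIMEOUT:
        return "expired_absolute"
    if now - session.last_seen >= IDLE_TIMEOUT:
        return "expired_idle"
    return "active"
```

Activity refreshes `last_seen` but never `created_at`, so continuous use cannot extend a session past 8 hours.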
12.11 Data Privacy (GDPR Compliance)
12.11.1 GDPR Compliance Matrix
| GDPR Principle | Implementation Detail | Evidence |
|---|---|---|
| Lawful Basis | Legitimate interest assessment documented per processing purpose | LIA document filed with DPO |
| Data Minimization | Only facial feature embeddings (512-D vector) stored; raw images discarded after encoding | Architecture documentation |
| Purpose Limitation | Facial data used ONLY for security/safety purposes; no marketing or secondary use | Privacy policy |
| Storage Limitation | Automated retention enforcement; cryptographic deletion after expiry | Retention policy configuration |
| Accuracy | Regular review and correction procedures; user can request correction | Data correction workflow |
| Integrity & Confidentiality | AES-256-GCM encryption, RBAC access controls, audit logging | Security architecture |
| Accountability | DPO appointed; Privacy Impact Assessment completed; Records of Processing maintained | Compliance documentation |
| Transparency | Privacy notice displayed at camera entry points; privacy policy on website | Physical signage + web policy |
12.11.2 Consent Management
Consent is managed through a comprehensive lifecycle:
| Stage | Description | Transition Trigger |
|---|---|---|
| pending | Consent requested but not yet obtained | Initial system setup |
| granted | Explicit consent obtained | User signs consent form |
| withdrawn | Consent actively withdrawn | User requests deletion/stop processing |
| deleted | All data removed; audit trail only | Deletion workflow complete |
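The lifecycle can be enforced as a small state machine so that no code path skips a stage. The sketch below assumes a strictly linear progression (pending, granted, withdrawn, deleted); whether other transitions, such as withdrawing before consent was ever granted, should be allowed is not stated in the table and is left out here.

```python
# Assumed linear lifecycle; each state lists the states it may move to.
ALLOWED_TRANSITIONS = {
    "pending": {"granted"},
    "granted": {"withdrawn"},
    "withdrawn": {"deleted"},
    "deleted": set(),  # terminal: only the audit trail remains
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError for a disallowed transition."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError("consent cannot move from {!r} to {!r}".format(current, target))
    return target
```

Rejecting invalid transitions at this level means the deletion workflow can only start from a recorded withdrawal.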
Consent Metadata:
| Field | Description |
|---|---|
| Consent method | written / digital / verbal |
| Consent document reference | ID of signed consent form |
| Consent date | When consent was obtained |
| Consent recorder | Who recorded the consent |
| Consent expiry | Annual expiry date |
| Consent scope | What processing is consented to |
Withdrawal Processing:
- User submits withdrawal request (any channel)
- System flags person record for deletion
- Delete face embeddings (biometric data) within 72 hours
- Delete all personal images from storage
- Anonymize detection events (keep event, replace name with [REDACTED], remove person link)
- Delete related event clips
- Log all deletion actions in audit trail
- Confirm completion to user within 30 days
12.11.3 Privacy Mode Controls
Four privacy modes are available per camera:
| Mode | Recording | Face Recognition | Alerts | Live View | Use Case |
|---|---|---|---|---|---|
| Full Operation | Yes | Yes | All | Yes | Standard surveillance |
| Recording Only | Yes | No | Motion only (no face) | Yes | Areas where facial recognition is not needed |
| Live View Only | No | No | No | Yes | Privacy-sensitive areas; viewing only |
| Privacy Mode | No | No | No | Privacy overlay | Break rooms, restrooms — privacy completely protected |
12.12 Edge Gateway Security
12.12.1 Hardening Checklist
| # | Hardening Measure | Implementation |
|---|---|---|
| 1 | Minimal OS | Ubuntu Server 22.04 LTS — no desktop packages |
| 2 | Disabled Bluetooth | systemctl stop bluetooth; systemctl disable bluetooth |
| 3 | Disabled WiFi | nmcli radio wifi off; modprobe -r iwlwifi |
| 4 | Disabled CUPS | systemctl stop cups; systemctl disable cups |
| 5 | Disabled avahi/mDNS | systemctl stop avahi-daemon; systemctl disable avahi-daemon |
| 6 | Disabled snapd | systemctl stop snapd; systemctl disable snapd |
| 7 | Disabled modemmanager | systemctl stop ModemManager; systemctl disable ModemManager |
| 8 | SSH key-only | PasswordAuthentication no; PubkeyAuthentication yes |
| 9 | SSH LAN-only | ListenAddress 192.168.29.5 |
| 10 | SSH root disabled | PermitRootLogin no |
| 11 | SSH rate limit | MaxAuthTries 3; ClientAliveInterval 300 |
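| 12 | SSH protocol 2 | Protocol 2 only; OpenSSH 7.4 and later (as shipped with Ubuntu 22.04) no longer support protocol 1 on the server, so no directive is needed |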
| 13 | SSH modern ciphers | Ciphers chacha20-poly1305@openssh.com |
| 14 | Auto-updates | unattended-upgrades — security updates only |
| 15 | Update schedule | Daily at 03:00; auto-reboot at 04:00 if required |
| 16 | Disk encryption | LUKS + TPM2 auto-unseal |
| 17 | Tamper detection | File integrity monitoring (AIDE) for critical config |
| 18 | Container security | Non-root users, read-only root FS, no new privileges |
| 19 | Firewall | iptables default deny; explicit allow only |
| 20 | No internet access | All outbound traffic via VPN tunnel only |
12.12.2 LUKS Disk Encryption with TPM2
The edge gateway uses LUKS full-disk encryption with TPM2 auto-unseal for headless operation:
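# During setup: encrypt the data partition (prompts for a recovery passphrase)
cryptsetup luksFormat /dev/nvme0n1p2 \
  --type luks2 \
  --cipher aes-xts-plain64 \
  --key-size 512 \
  --pbkdf argon2id
# Enroll a TPM2-sealed key bound to PCRs 0, 2 and 7.
# cryptsetup itself has no --tpm2-* options; the TPM2 binding is handled by
# systemd-cryptenroll (assumes a systemd build with TPM2 support).
systemd-cryptenroll /dev/nvme0n1p2 \
  --tpm2-device=auto \
  --tpm2-pcrs=0+2+7
# During boot: the volume is unlocked through an /etc/crypttab entry;
# the TPM2 releases the key only if the PCR values match.
# data  /dev/nvme0n1p2  none  luks,tpm2-device=auto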
PCR Measurements Bound:
| PCR | Purpose |
|---|---|
| PCR 0 | Core system firmware executable code |
| PCR 2 | Extended or pluggable executable code |
| PCR 7 | Secure Boot state |
12.13 Cloud Infrastructure Security
| Control | Implementation | Verification |
|---|---|---|
| Private subnets | All internal services in private subnets; no public IPs | VPC flow logs |
| Security groups | Least privilege; explicit allow only; no default allow-all | Quarterly review |
| Database access | No public access; app servers only via security group reference | AWS Config rule |
| Bastion host | Emergency access only; non-standard SSH port (2222); admin IP allowlist only | Access log audit |
| IMDSv2 | Enforced on all EC2 instances; no IMDSv1 fallback | Instance metadata check |
| Container security | Non-root users, read-only root FS, no new privileges, drop ALL capabilities | Pod Security admission |
| Image scanning | Trivy + Snyk on every build; HIGH/CRITICAL vulnerabilities block deployment | CI/CD pipeline gate |
| Image signing | Cosign signature verification required before deployment | Admission controller |
| Resource quotas | Kubernetes LimitRange on all namespaces | Resource quota monitoring |
| Network policies | Default deny all ingress/egress; explicit rules per service | Policy audit |
| Pod Security | Restricted standard enforced cluster-wide | Pod Security admission |
| Secrets management | Vault + External Secrets Operator; no secrets in Git | Secret scanning |
| Logging | All AWS API calls logged via CloudTrail; VPC Flow Logs enabled | Log analysis |
12.14 Secrets Rotation Policy
| Secret Type | Frequency | Method | Automation | Rollback |
|---|---|---|---|---|
| Database passwords | 90 days | Terraform + Vault dynamic credentials | Full | N/A (short-lived) |
| JWT signing keys | 180 days | Dual-key grace period; new key signs, old key verifies for 7 days | Full | Keep old key for 7 days |
| Internal API keys | 90 days | Zero-downtime: add new key, deploy, remove old key | Full | Immediate via config revert |
| Telegram/WhatsApp tokens | 180 days or on suspicion | Generate new via provider, update Vault, 5-min grace, revoke old | Semi | Old token valid for 5-minute grace |
| TLS certificates | 60 days | cert-manager + Let's Encrypt auto-renewal | Full | Previous certificate cached |
| WireGuard keys | 365 days | Maintenance window: generate new keys, update both endpoints simultaneously | Scripted | Manual key restore |
| DVR credentials | 180 days | Manual via DVR web UI | Manual | Previous password documented |
| Backup encryption keys | 365 days | Generate new key, re-encrypt all backups in background | Full | Previous key kept for 30 days |
| Session secrets | On security incident | Immediate: generate new secret, force all re-authentication | Admin trigger | Not applicable |
12.15 Incident Response
12.15.1 Security Event Detection and Response
| Phase | Timeline | Actions | Responsible |
|---|---|---|---|
| Detection | Automated (real-time) | Automated rules + behavioral analysis detect anomaly; alert generated | System |
| Assessment | 0-15 minutes | On-call engineer evaluates severity; determines if genuine security event | On-call Engineer |
| Containment | 15-60 minutes | Isolate affected systems; revoke compromised credentials; block malicious IPs | Security Team |
| Eradication | 1-4 hours | Remove root cause; patch vulnerabilities; rotate all exposed secrets | Engineering |
| Recovery | 4-24 hours | Restore from clean backups; verify system integrity; re-enable services | Platform Team |
| Lessons Learned | 24-48 hours | Post-mortem; update procedures; implement preventive measures | Security Team |
12.15.2 Breach Notification Procedure
| Phase | Timeline | GDPR Requirement | Actions |
|---|---|---|---|
| Detection & Assessment | 0-24 hours | — | Confirm breach; contain; assemble response team |
| Investigation | 24-72 hours | Article 33(1) | Forensic analysis; determine scope of affected data |
| Supervisory Authority | Within 72 hours | Article 33 | Notify Data Protection Authority |
| Data Subjects | Without undue delay | Article 34 | Notify affected individuals if high risk |
| Recovery | Post-notification | — | Restore from clean backups; apply patches |
| Post-Incident | Within 48 hours | Article 5(2) | Root cause analysis; update plans; document |
12.15.3 Breach Severity Classification
| Level | Criteria | Notification Required | Example |
|---|---|---|---|
| Low | No personal data accessed | Internal only | Failed attack attempt; no data exposure |
| Medium | Limited personal data; no sensitive data | DPA notification | Username/email list exposed |
| High | Sensitive personal data or biometric data accessed | DPA + Data subjects | Facial embeddings database accessed |
| Critical | Large-scale biometric exfiltration; ongoing threat | DPA + Data subjects + Public | Ransomware attack with biometric data theft |
12.16 Security Checklist Summary
The complete security checklist contains 130+ items. The following table summarizes the key requirements for each of the 14 categories covered above:
| Category | Items | Key Requirements |
|---|---|---|
| SSL/TLS | 8 | TLS 1.3, strong cipher suites only, HSTS, OCSP stapling, auto-renewal |
| Authentication | 13 | Argon2id, JWT ES256, MFA enforcement, password policy, HaveIBeenPwned |
| RBAC | 7 | 4 roles, 30+ permissions, resource-level access, default deny |
| VPN & Network | 10 | WireGuard + PSK, 5 security zones, firewall deny-all, network policies |
| Secret Management | 10 | Vault storage, dynamic credentials, field encryption, rotation schedule |
| Audit Logging | 11 | Hash-chain integrity, 20+ fields per entry, WORM storage, real-time alerts |
| Media Access | 8 | Signed URLs, session-bound, 5-min expiry, single-use option, watermarking |
| API Security | 11 | Rate limiting, Pydantic validation, parameterized queries, CSP, CSRF, CORS |
| Session Security | 8 | HttpOnly/Secure/Strict cookies, 8h absolute timeout, 30m idle timeout |
| Data Privacy (GDPR) | 13 | Consent tracking, right to deletion, anonymization, DPO, PIA |
| Edge Gateway | 12 | 20-point hardening, LUKS + TPM2, tamper detection, auto-updates |
| Cloud Infrastructure | 11 | Private subnets, image scanning, Pod Security, IMDSv2, CloudTrail |
| Secrets Rotation | 7 | All types scheduled, 60-day TLS, 90-day DB, dual-key JWT |
| Incident Response | 9 | Detection rules, breach notification, severity classification, post-mortem |
| Total | 130+ | — |
Section 13: UX / Website Structure
13.1 Design System
13.1.1 Design Philosophy
The UX design follows a "dark cockpit" philosophy optimized for 24/7 surveillance operations. The interface minimizes eye strain during long monitoring shifts while ensuring critical information is immediately visible. All design decisions prioritize operator efficiency and rapid threat identification.
| Principle | Implementation |
|---|---|
| Dark mode default | Near-black background with blue-tinted grays to reduce eye strain in low-light environments |
| Information density | High-density layouts that maximize data visible without scrolling |
| At-a-glance status | Color-coded status indicators for immediate situational awareness |
| Progressive disclosure | Advanced controls hidden behind "Expand" toggles; essential info always visible |
| Consistent patterns | Same interaction patterns reused across all 18 pages |
| Responsive feedback | Every action produces visible feedback within 100ms |
13.1.2 Color Palette
| Token | Hex / Value | RGB | Usage | Contrast Ratio |
|---|---|---|---|---|
| `--bg-primary` | `#0B0E14` | rgb(11, 14, 20) | Main application background | n/a |
| `--bg-secondary` | `#151922` | rgb(21, 25, 34) | Card and panel backgrounds | n/a |
| `--bg-tertiary` | `#1E2330` | rgb(30, 35, 48) | Elevated surfaces, modals, dropdowns | n/a |
| `--bg-sidebar` | `#0D1117` | rgb(13, 17, 23) | Sidebar navigation background | n/a |
| `--bg-hover` | `#1A2030` | rgb(26, 32, 48) | Row/card hover state | n/a |
| `--bg-selected` | `#1E3A5F` | rgb(30, 58, 95) | Selected item background | n/a |
| `--text-primary` | `#E2E8F0` | rgb(226, 232, 240) | Headings, important content | 15.8:1 |
| `--text-secondary` | `#94A3B8` | rgb(148, 163, 184) | Labels, descriptions, metadata | 9.2:1 |
| `--text-muted` | `#64748B` | rgb(100, 116, 139) | Placeholder text, disabled states | 6.1:1 |
| `--accent-blue` | `#3B82F6` | rgb(59, 130, 246) | Primary accent: buttons, links, active states | 4.5:1 |
| `--accent-blue-hover` | `#2563EB` | rgb(37, 99, 235) | Button/link hover state | 5.1:1 |
| `--accent-green` | `#10B981` | rgb(16, 185, 129) | Success, online status, positive trends | 5.3:1 |
| `--accent-red` | `#EF4444` | rgb(239, 68, 68) | Critical alerts, errors, offline status | 5.0:1 |
| `--accent-orange` | `#F59E0B` | rgb(245, 158, 11) | Warnings, medium severity | 5.4:1 |
| `--accent-yellow` | `#FBBF24` | rgb(251, 191, 36) | Watchlist indicators, highlights | 6.1:1 |
| `--accent-purple` | `#8B5CF6` | rgb(139, 92, 246) | AI features, special highlights | 4.8:1 |
| `--border-color` | `#1E293B` | rgb(30, 41, 59) | Card borders, dividers, separators | n/a |
| `--border-focus` | `#3B82F6` | rgb(59, 130, 246) | Focus ring color | n/a |
| `--shadow-sm` | `0 1px 2px rgba(0,0,0,0.3)` | n/a | Subtle elevation | n/a |
| `--shadow-md` | `0 4px 6px rgba(0,0,0,0.4)` | n/a | Card elevation | n/a |
| `--shadow-lg` | `0 10px 25px rgba(0,0,0,0.5)` | n/a | Modal/dialog elevation | n/a |
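The contrast ratios in the palette can be checked with the WCAG 2.x relative-luminance formula. The sketch below is illustrative only: the table does not state which background each ratio was measured against, so the pairing of `--text-primary` on `--bg-primary` in the usage note is an assumption, and the function names are hypothetical.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color."""
    def linearize(channel: int) -> float:
        c = channel / 255
        # The WCAG text uses 0.03928 as the cutoff; for 8-bit channels
        # 0.04045 (the sRGB standard value) gives identical results.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), range 1 to 21."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio("#E2E8F0", "#0B0E14")` gives the ratio for primary text on the main background under the assumed pairing. WCAG AA requires at least 4.5:1 for normal-size text, so a check such as `contrast_ratio(fg, bg) >= 4.5` could run as a design-token lint step.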
13.1.3 Typography
| Token | Font Family | Size | Weight | Line Height | Letter Spacing | Usage |
|---|---|---|---|---|---|---|
| Display | Inter | 28px | 700 (Bold) | 1.2 | -0.02em | Page titles |
| H1 | Inter | 22px | 600 (Semi-bold) | 1.3 | -0.01em | Section headings |
| H2 | Inter | 18px | 600 (Semi-bold) | 1.4 | 0 | Card titles, modal headers |
| H3 | Inter | 15px | 500 (Medium) | 1.4 | 0 | Sub-sections, form labels |
| Body | Inter | 14px | 400 (Regular) | 1.5 | 0 | General text, descriptions |
| Body Small | Inter | 13px | 400 (Regular) | 1.5 | 0 | Secondary body text |
| Caption | Inter | 12px | 400 (Regular) | 1.4 | 0.01em | Captions, metadata, footnotes |
| Timestamp | JetBrains Mono | 12px | 400 (Regular) | 1.4 | 0 | All timestamps, durations |
| Code | JetBrains Mono | 13px | 400 (Regular) | 1.5 | 0 | Code snippets, IDs, technical data |
| Badge | Inter | 11px | 500 (Medium) | 1 | 0.02em | Status badges, tags |
13.1.4 Spacing and Layout
| Token | Value | Usage |
|---|---|---|
| Sidebar expanded | 260px | Full navigation with labels and icons |
| Sidebar collapsed | 72px | Icons only; hover for tooltip |
| Top bar height | 56px | Clock, alerts, user menu |
| Content padding | 24px | Page content horizontal padding |
| Content max-width | 1400px | Maximum content width; content is centered on wider viewports |
| Card padding | 16px | Internal card padding |
| Card border radius | 12px | Card and panel corners |
| Card gap | 16px | Gap between cards in grid |
| Button border radius | 8px | Button corners |
| Input border radius | 6px | Form input corners |
| Modal border radius | 16px | Modal/dialog corners |
| Toast border radius | 8px | Toast notification corners |
| Avatar size (small) | 24px | Inline avatars |
| Avatar size (medium) | 40px | Card headers, lists |
| Avatar size (large) | 64px | Profile pages |
| Icon size (default) | 20px | Navigation and actions |
| Icon size (small) | 16px | Inline icons |
| Scrollbar width | 8px | Custom styled scrollbar |
13.2 Global Navigation Structure
13.2.1 Layout Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ [Logo] Sentinel AI Surveillance [Clock] [Alerts] [👤 User] │ ▲ 56px
├────────┬───────────────────────────────────────────────────────────────────┤
│ │ │
│ [📊] │ MAIN CONTENT AREA │
│ Dash │ │
│ board │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ │ Card 1 │ │ Card 2 │ │ Card 3 │ │
│ [📹] │ │ │ │ │ │ │ │
│ Live │ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ [🔔] │ ┌──────────────────────────────────────────────────┐ │
│ Alerts │ │ Wide Card / Table │ │
│ │ └──────────────────────────────────────────────────┘ │
│ [🔍] │ │
│ Detec │ │
│ tions │ │
│ │ │
│ [remaining navigation items...] │
│ │ │
├────────┤ │
│◁ / ▷ │ │
└────────┴───────────────────────────────────────────────────────────────────┘
◄── 260px (expanded) / 72px (collapsed) ──►
13.2.2 Navigation Menu Items
| # | Icon | Label | Route | Badge Type | Required Permission |
|---|---|---|---|---|---|
| 1 | `LayoutDashboard` | Dashboard | `/dashboard` | None | Any |
| 2 | `Video` | Live View | `/live` | Online camera count | `cameras:view` |
| 3 | `Bell` | Alert Center | `/alerts` | Pending alert count | `alerts:view` |
| 4 | `ScanEye` | Detections | `/detections` | None | `cameras:view` |
| 5 | `Users` | Person Gallery | `/persons` | Total person count | `persons:view` |
| 6 | `UserQuestion` | Unknown Review | `/unknowns` | Queue count | `persons:view` |
| 7 | `ClockAlert` | Suspicious Activity | `/timeline` | None | `alerts:view` |
| 8 | `Search` | Search | `/search` | None | Any |
| 9 | `ShieldAlert` | Watchlists | `/watchlists` | None | `watchlists:view` |
| 10 | `Sparkles` | AI Vibe Settings | `/settings/ai` | None | `ai_settings:view` |
| 11 | `Brain` | Training Review | `/training` | Pending suggestions | `ai_settings:view` |
| 12 | `Activity` | System Health | `/health` | Status dot (green/yellow/red) | `system:view` |
| 13 | `Settings` | Settings | `/settings` | None | Admin functions |
Settings Submenu:
| # | Icon | Label | Route | Required Permission |
|---|---|---|---|---|
| 13a | `Camera` | Camera Management | `/settings/cameras` | `cameras:manage` |
| 13b | `HardDrive` | Retention & Storage | `/settings/storage` | `storage:manage` |
| 13c | `UserCog` | Admin Users | `/settings/users` | `users:manage` |
| 13d | `BellRing` | Notification Settings | `/settings/notifications` | `notifications:manage` |
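The permission column above implies that the sidebar is filtered per user. A minimal sketch of that filtering follows, assuming permissions arrive as a set of strings and that "Any" means no permission is required. The `NavItem` type and `visible_nav_items` function are hypothetical names, and item 13 (Settings) is left out because its "Admin functions" requirement is not expressed as a concrete permission string.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class NavItem:
    label: str
    route: str
    permission: Optional[str]  # None models "Any" (no permission required)


# Items 1-12 from the navigation table, in display order.
NAV_ITEMS = [
    NavItem("Dashboard", "/dashboard", None),
    NavItem("Live View", "/live", "cameras:view"),
    NavItem("Alert Center", "/alerts", "alerts:view"),
    NavItem("Detections", "/detections", "cameras:view"),
    NavItem("Person Gallery", "/persons", "persons:view"),
    NavItem("Unknown Review", "/unknowns", "persons:view"),
    NavItem("Suspicious Activity", "/timeline", "alerts:view"),
    NavItem("Search", "/search", None),
    NavItem("Watchlists", "/watchlists", "watchlists:view"),
    NavItem("AI Vibe Settings", "/settings/ai", "ai_settings:view"),
    NavItem("Training Review", "/training", "ai_settings:view"),
    NavItem("System Health", "/health", "system:view"),
]


def visible_nav_items(granted: Set[str]) -> List[NavItem]:
    """Return the sidebar items a user may see, preserving menu order."""
    return [
        item for item in NAV_ITEMS
        if item.permission is None or item.permission in granted
    ]
```

A user holding only `alerts:view` would see Dashboard, Alert Center, Suspicious Activity, and Search. Hiding a menu entry is a convenience only; the same permission check still has to be enforced on the server for each route.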
13.2.3 Top Bar
| Element | Position | Content | Update Frequency |
|---|---|---|---|
| Logo + Brand | Left | Sentinel AI logo + text | Static |
| Current Time | Center-Right | `HH:MM:SS` live clock | Every second |
| Alert Badge | Right | Bell icon with red count badge | On alert change |
| User Menu | Far right | Avatar + dropdown menu | Static |
User Menu Dropdown:
| Item | Action |
|---|---|
| Profile | Navigate to user profile |
| Preferences | Theme, timezone, notification preferences |
| Keyboard Shortcuts | Show shortcut reference modal |
| Help & Documentation | Open help center |
| Logout | End session (clears all tokens) |
13.3 Page Descriptions
13.3.1 Page 1: Login (/login)
The login page is the entry point to the system. It is designed for quick, secure access with minimal friction.
| Feature | Specification |
|---|---|
| Layout | Centered card on dark background |
| Logo | Sentinel AI logo (large) centered above form |
| Fields | Username/email (text input), Password (password input with show/hide toggle) |
| Remember me | Checkbox — "Keep me signed in for 7 days" |
| Submit | "Sign In" button — full width, accent blue |
| MFA step | Appears after successful password verification; 6-digit TOTP input with auto-focus |
| Error states | Inline validation; shake animation on error |
| Footer | Application version number (e.g., "v2.3.1"), copyright, privacy policy link |
| Security | Rate limiting (5 attempts / 15 min), CAPTCHA after 3 failures |
| Redirect | After login, redirect to originally requested URL (or Dashboard) |
| Session | JWT access token (15 min) + refresh token cookie (7 days) |
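The Security row (5 attempts per 15 minutes, CAPTCHA after 3 failures) can be sketched as a sliding-window counter. This is an illustrative in-memory version, not the platform's implementation. It assumes that only failed attempts count toward the limit, that the key is a username or client IP, and that a production deployment would keep the counters in a shared store. The class and method names are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

WINDOW_SECONDS = 15 * 60      # 15-minute sliding window
MAX_ATTEMPTS = 5              # block once this many failures are in the window
CAPTCHA_AFTER_FAILURES = 3    # require CAPTCHA from this many failures on


class LoginThrottle:
    """Tracks recent failed logins per key (e.g. username or client IP)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._failures: Dict[str, Deque[float]] = {}

    def _recent(self, key: str) -> Deque[float]:
        """Drop failures older than the window and return the remainder."""
        now = self._clock()
        failures = self._failures.setdefault(key, deque())
        while failures and now - failures[0] >= WINDOW_SECONDS:
            failures.popleft()
        return failures

    def record_failure(self, key: str) -> None:
        self._recent(key).append(self._clock())

    def record_success(self, key: str) -> None:
        # A successful login clears the failure history for that key.
        self._failures.pop(key, None)

    def needs_captcha(self, key: str) -> bool:
        return len(self._recent(key)) >= CAPTCHA_AFTER_FAILURES

    def is_blocked(self, key: str) -> bool:
        return len(self._recent(key)) >= MAX_ATTEMPTS
```

The login handler would call `is_blocked` before checking credentials and `needs_captcha` when rendering the form. Injecting the clock keeps the window logic testable without waiting 15 minutes.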
13.3.2 Page 2: Dashboard (/dashboard)
The Dashboard is the primary landing page providing at-a-glance situational awareness.
┌──────────────────────────────────────────────────────────────────────────────┐
│ Dashboard [Refresh] [Date Range] │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │ 📹 8/8 │ │ 🔔 12 │ │ 👥 47 │ │ ✓ Healthy │ │
│ │ Cameras │ │ Alerts Today │ │ Persons │ │ System │ │
│ │ Online │ │ 3 Critical │ │ Detected │ │ All Good │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │
│ │
│ ┌────────────────────────────────────────┐ ┌──────────────────────────┐ │
│ │ Alert Distribution (Last 24 Hours) │ │ Recent Alerts │ │
│ │ │ │ │ │
│ │ 8 ┤ ██ │ │ 🔴 CAM-01 Unknown │ │
│ │ 6 ┤ ██ ██ ██ │ │ 14:32 — Entrance │ │
│ │ 4 ┤ ██ ██ ██ ██ ██ │ │ 🟡 CAM-03 Watchlist │ │
│ │ 2 ┤ ██ ██ ██ ██ ██ ██ ██ │ │ 13:15 — Parking │ │
│ │ 0 ┼────┬────┬────┬────┬────┬────┬── │ │ 🟠 CAM-05 System │ │
│ │ 00 04 08 12 16 20 │ │ 12:08 — Storage 90% │ │
│ │ │ │ │ │
│ └────────────────────────────────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Camera Status Grid (2x4) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ CAM-01 │ │ CAM-02 │ │ CAM-03 │ │ CAM-04 │ │ │
│ │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ │
│ │ │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ CAM-05 │ │ CAM-06 │ │ CAM-07 │ │ CAM-08 │ │ │
│ │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ │
│ │ │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Activity Feed │ │
│ │ 14:32 — Unknown person detected at CAM-01 (Entrance) │ │
│ │ 14:15 — Watchlist match: John Smith at CAM-03 (Parking) │ │
│ │ 13:58 — Operator Alice acknowledged alert #ALT-2847 │ │
│ │ 13:42 — Camera CAM-05 stream reconnected │ │
│ │ 13:30 — Daily training completed: 3 new face clusters │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
Dashboard Components:
| Component | Refresh Rate | Description |
|---|---|---|
| Stat cards | 30 seconds | Active cameras, alerts today, persons detected, system health |
| Alert distribution chart | 5 minutes | Bar chart showing alerts by hour for last 24 hours |
| Recent alerts card | 30 seconds | Last 5 alerts with severity badge, camera, timestamp |
| Camera status grid | 30 seconds | 2x4 grid of all 8 cameras with live thumbnail and status dot |
| Activity feed | Real-time (WebSocket) | Recent system events — detections, alerts, operator actions |
13.3.3 Page 3: Live Camera View (/live)
The live view is the primary monitoring interface, showing real-time streams from all 8 cameras.
| Feature | Specification |
|---|---|
| Default layout | 2x4 grid (8 cameras) |
| Layout options | 1x1 (single), 2x2 (4 cameras), 2x4 (8 cameras), 4x4 (16 cameras for future scaling) |
| Stream format | HLS (HTTP Live Streaming) with WebRTC fallback for lower latency |
| Per-camera overlay | Camera name, status dot, expand button, snapshot button |
| Grid controls | Play all / Pause all, Refresh all streams, Layout selector |
| Camera states | Loading (spinner), Playing, Paused, Error (retry button), Offline (gray placeholder) |
| Fullscreen | Click any camera to expand; press F to toggle fullscreen for focused camera |
| Camera switching | Press 1-8 to focus camera by number |
| Snapshot | Press S or click camera snapshot button to capture current frame |
| Recording indicator | Red pulsing dot on cameras actively recording |
| Alert overlay | Flashing border on camera that triggered recent alert |
13.3.4 Page 4: Alert Center (/alerts)
The Alert Center provides comprehensive alert management with filtering, batch actions, and detailed investigation tools.
| Feature | Specification |
|---|---|
| Filter bar | Date range picker, severity multi-select (Critical/High/Medium/Low/Info), camera multi-select, status filter (Pending/Acknowledged/Resolved/Ignored), type filter |
| Severity legend | Color-coded badges: Critical (red), High (orange), Medium (yellow), Low (blue), Info (gray) |
| Alert cards | Each card: thumbnail image, camera name, timestamp, severity badge, person name (if known), description, current status |
| Card actions | Acknowledge, Resolve, Ignore, View Details, Mark False Positive |
| Bulk actions | Checkbox selection; batch Acknowledge or Ignore |
| Sort options | Newest first (default), Oldest first, Severity (highest first), Camera name |
| Pagination | 20 alerts per page; infinite scroll option |
| Empty state | "No alerts in the selected period" with illustration |
| Detail panel | Slide-out panel with full alert info: images, video clip, AI confidence, detection metadata, person profile link |
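The sort options and the 20-per-page pagination can be sketched as below. The `Alert` fields, the `order` keywords, and the tie-break (newest first within the same severity) are assumptions made for illustration. Only the severity names, the default order, and the page size come from the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Lower rank sorts first, so "Critical" leads the severity ordering.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}
PAGE_SIZE = 20


@dataclass
class Alert:
    alert_id: str
    severity: str
    camera: str
    created_at: datetime


def sort_alerts(alerts: List[Alert], order: str = "newest") -> List[Alert]:
    """Return a new list sorted by one of the Alert Center sort options."""
    if order == "newest":
        return sorted(alerts, key=lambda a: a.created_at, reverse=True)
    if order == "oldest":
        return sorted(alerts, key=lambda a: a.created_at)
    if order == "severity":
        # Python's sort is stable, so sorting newest-first and then by rank
        # keeps the newest alert first within each severity level.
        newest_first = sorted(alerts, key=lambda a: a.created_at, reverse=True)
        return sorted(newest_first, key=lambda a: SEVERITY_RANK[a.severity])
    if order == "camera":
        return sorted(alerts, key=lambda a: a.camera)
    raise ValueError(f"unknown sort order: {order}")


def paginate(alerts: List[Alert], page_number: int) -> List[Alert]:
    """Return the 1-indexed page of PAGE_SIZE alerts (empty past the end)."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * PAGE_SIZE
    return alerts[start:start + PAGE_SIZE]
```

In practice the sorting and paging would more likely run in the database query. The in-memory version only shows the intended ordering semantics.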
13.3.5 Page 5: Recent Detections (/detections)
Shows all recent detection events with face thumbnails and recognition results.
| Feature | Specification |
|---|---|
| Filter controls | Known/Unknown/All toggle, date range picker, camera selector, person name search |
| Detection cards | Face thumbnail + name (or "Unknown") + confidence percentage + camera name + timestamp + watchlist badge |
| Card click | Opens detail view with full-size image, sighting history for that person, camera info |
| Actions | "Name This Person" (unknowns), "View Profile" (known), "Add to Watchlist" |
| Confidence indicator | Visual bar showing confidence level; color-coded (green > 90%, yellow 70-90%, orange < 70%) |
| Grid layout | 4 columns desktop, 3 tablet, 2 mobile |
| Auto-refresh | New detections appear at top without page reload (WebSocket) |
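The confidence indicator's color bands can be written as a small lookup. This sketch assumes confidence is expressed as a percentage from 0 to 100 and reads the boundaries literally: exactly 90% is yellow and exactly 70% is yellow. The function name is hypothetical.

```python
def confidence_color(confidence_pct: float) -> str:
    """Map a recognition confidence percentage to its indicator color."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_pct > 90:
        return "green"
    if confidence_pct >= 70:
        return "yellow"
    return "orange"
```

With these bands, the 87.3% detection used in the onboarding flow would render with a yellow bar.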
13.3.6 Page 6: Person Gallery (/persons)
A browsable gallery of all known persons in the system.
| Feature | Specification |
|---|---|
| Search bar | Full-text search across names, roles, departments, tags |
| Role filters | Employee / Visitor / Vendor / Contractor / Other — pill-style toggle buttons |
| Sort options | Name (A-Z), Last Seen (recent first), Sightings Count (highest first), Date Added (newest first) |
| Person cards | Face image, name, role badge, department, last seen timestamp, total sightings count |
| Grid layout | 5 columns desktop (xl), 4 columns (lg), 3 columns (md), 2 columns (sm) |
| Pagination | 50 persons per page |
| Actions | Click card → navigate to Person Profile; right-click context menu |
| Bulk actions | Select multiple for bulk add to watchlist |
| Empty state | "No persons found" with "Add your first person" CTA |
13.3.7 Page 7: Unknown Persons Review (/unknowns)
The review queue for unidentified persons, a critical workflow for building the person database.
| Feature | Specification |
|---|---|
| Queue view | Cards of unknown person clusters (grouped by face similarity via DBSCAN) |
| Cluster card | Representative face image + cluster size (number of sightings) + first/last seen + cameras detected at + confidence range |
| Actions per cluster | Name This Person, Merge with Existing, Ignore Cluster, Mark as Reviewed |
| AI insight panel | Pattern suggestion: "Seen 5x at entrance between 08:00-09:00 — possibly employee" |
| Progress indicator | "23 unknown clusters remaining" with progress bar |
| Batch review | Keyboard navigation (arrow keys + Enter to select action) for rapid review |
| Empty state | "Great job! No unknown persons to review. All caught up!" with celebration animation |
| Reviewed history | Tab to view previously reviewed clusters |
13.3.8 Page 8: Person Profile (/persons/{id})
Detailed view of a single person's information, detection history, and management options.
| Feature | Specification |
|---|---|
| Header | Name, role badge, status (Active/Inactive), action buttons (Edit, Delete, Add to Watchlist) |
| Photo gallery | Primary face photo (large) + additional reference photos in thumbnail grid below |
| Info panel | Department, employee ID, contact information, notes, tags, date added, added by |
| Sighting history | Timeline of all detections — timestamp, camera name, confidence, thumbnail image |
| Sighting stats | Total sightings, first seen, last seen, most common camera, most common time |
| Watchlist memberships | Which watchlists this person belongs to, with badge per watchlist |
| Activity log | Who created/edited the profile and when; full audit trail |
| Danger zone | Delete person (with confirmation dialog explaining consequences) |
13.3.9 Page 9: Suspicious Activity Timeline (/timeline)
A timeline-based visualization of flagged events for pattern analysis.
| Feature | Specification |
|---|---|
| Timeline view | Horizontal time axis with event markers positioned by timestamp |
| Event types | Unusual movement (orange), Loitering (yellow), Unauthorized access (red), Crowd gathering (purple) |
| Color coding | Each event type has a distinct color; severity affects marker size |
| Filters | Event type multi-select, camera selector, date range, severity threshold |
| Zoom levels | Hour view, Day view (default), Week view, Month view |
| Click marker | Opens detail panel with description, evidence images, AI reasoning, confidence |
| Density heatmap | Background shows detection density to identify high-activity periods |
13.3.10 Page 10: Search (/search)
Global search across all data types in the system.
| Feature | Specification |
|---|---|
| Search bar | Prominent centered search input with clear button |
| Category filters | Person, Camera, Event, Alert — toggle pills |
| Results grouping | Results grouped by category with section headers |
| Person search | Type name or upload a photo for face recognition similarity search |
| Camera search | By name, location, or status |
| Event search | By description, camera, person, or event type |
| Alert search | By ID, description, or camera |
| Keyboard shortcut | / (forward slash) focuses search from any page |
| Recent searches | Dropdown shows recent searches for quick access |
| Empty state | "No results found" with search tips |
13.3.11 Page 11: Watchlists (/watchlists)
Management interface for watchlist categories and their members.
| Feature | Specification |
|---|---|
| Watchlist cards | Name, icon (selected from preset), color, member count, alert settings summary |
| Create button | "+ New Watchlist" with modal: name, icon picker, color picker, alert configuration |
| Default watchlists | VIP (green), Blacklist (red), Authorized (blue), Temporary Access (yellow) |
| Card click | Opens watchlist detail with full member list |
| Member management | Add from gallery (search + select), remove member, bulk import via CSV |
| Alert settings | Per-watchlist: alert timing, severity override, notify groups, quiet hours override |
| Test button | "Test Alert" — sends test notification for this watchlist to verify configuration |
| Member table | Sortable by name, date added, added by, sightings count |
13.3.12 Page 12: AI Vibe Settings (/settings/ai)
The AI Vibe Settings page presents AI configuration as friendly questions rather than technical parameters.
| # | Setting | Question | Options | Description |
|---|---|---|---|---|
| 1 | Detection Sensitivity | "How carefully should the AI watch?" | Relaxed / Balanced / High / Maximum | Controls how aggressively the AI reports detections |
| 2 | Face Match Threshold | "How confident should the AI be before naming someone?" | Lenient / Normal / Strict / Very Strict | More lenient = more matches but more false positives |
| 3 | Night Mode | "How should the AI behave at night?" | Off / Diminished / Active / Enhanced | Night-specific model and sensitivity adjustment |
| 4 | Evidence Capture | "What should be saved when someone is detected?" | Photo Only / Photo + 5s Clip / Photo + 10s Clip / Full Recording | Media stored per detection event |
| 5 | Alert Style | "When should alerts be sent?" | Silent / Digest / Normal / Urgent / Critical | Controls alert frequency and channels used |
| 6 | Learning Mode | "Should the AI learn from new sightings?" | Off / Review First / Auto-Learn Cautiously / Auto-Learn Aggressively | How unknown face clusters are handled |
| 7 | Privacy Mode | "How should privacy be handled?" | Full Recognition / Blur Unrecognized / Blur All Faces / Privacy Zones | Face processing and display privacy |
Each setting control:
- Segmented button group (pill-shaped options)
- Selected option highlighted in accent blue
- Brief description below updates on selection
- Current value displayed as badge
- Auto-save (no save button); toast confirms: "Detection Sensitivity updated to High"
- Expand toggle reveals internal numerical values (Admin permission required)
Advanced Mode (Admin only): When expanded, each control shows the internal parameter values:
| Setting | Option | Internal Value |
|---|---|---|
| Detection Sensitivity | Relaxed | Confidence threshold: 0.85, NMS: 0.5 |
| Detection Sensitivity | Balanced | Confidence threshold: 0.70, NMS: 0.45 |
| Detection Sensitivity | High | Confidence threshold: 0.55, NMS: 0.4 |
| Detection Sensitivity | Maximum | Confidence threshold: 0.40, NMS: 0.35 |
| Face Match Threshold | Lenient | Similarity threshold: 0.60 |
| Face Match Threshold | Normal | Similarity threshold: 0.70 |
| Face Match Threshold | Strict | Similarity threshold: 0.80 |
| Face Match Threshold | Very Strict | Similarity threshold: 0.90 |
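The mapping from friendly options to internal values is essentially a lookup table. A minimal sketch follows, using the values from the table above; the dictionary layout, key names, and `resolve_preset` function are illustrative assumptions rather than the platform's configuration format.

```python
from typing import Dict

# Friendly option -> internal parameters, copied from the Advanced Mode table.
PRESETS: Dict[str, Dict[str, Dict[str, float]]] = {
    "Detection Sensitivity": {
        "Relaxed": {"confidence_threshold": 0.85, "nms_threshold": 0.50},
        "Balanced": {"confidence_threshold": 0.70, "nms_threshold": 0.45},
        "High": {"confidence_threshold": 0.55, "nms_threshold": 0.40},
        "Maximum": {"confidence_threshold": 0.40, "nms_threshold": 0.35},
    },
    "Face Match Threshold": {
        "Lenient": {"similarity_threshold": 0.60},
        "Normal": {"similarity_threshold": 0.70},
        "Strict": {"similarity_threshold": 0.80},
        "Very Strict": {"similarity_threshold": 0.90},
    },
}


def resolve_preset(setting: str, option: str) -> Dict[str, float]:
    """Return a copy of the internal parameters for a friendly option."""
    try:
        return dict(PRESETS[setting][option])
    except KeyError:
        raise ValueError(f"unknown setting/option: {setting!r} / {option!r}")
```

For instance, `resolve_preset("Detection Sensitivity", "High")` returns a confidence threshold of 0.55 and an NMS threshold of 0.40. Returning a copy means an admin override in Advanced Mode can modify the result without altering the shared preset table.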
13.3.13 Page 13: Training Review (/training)
Interface for reviewing AI-suggested face clusters and approving them for model training.
| Feature | Specification |
|---|---|
| Suggestion cards | Face cluster the AI is uncertain about — multiple face images + AI confidence + reason for suggestion |
| Card layout | Grid of face thumbnails + confidence bar + suggestion reason ("Seen 8x at different cameras, high confidence match") |
| Actions per suggestion | Approve (add to training data), Reject (not a valid cluster), Merge with Existing Person |
| Batch actions | Select multiple suggestions for bulk Approve/Reject |
| Queue status | "12 suggestions pending review" with progress bar |
| Filter | By confidence level, camera, date range |
| History | Tab showing previously reviewed suggestions with outcome |
| Training metrics | Model accuracy trend, training data count, last training time |
13.3.14 Page 14: System Health (/health)
Real-time system health monitoring dashboard.
| Feature | Specification |
|---|---|
| Status overview | Large status indicator: All Systems Operational (green) / Degraded (yellow) / Critical (red) |
| Service cards | Per-service status card: Video Capture, AI Inference, Database, Storage, Notifications, VPN |
| Per-service metrics | Status dot, uptime percentage, last restart, CPU, memory |
| Camera health table | All 8 cameras: stream status, FPS, bitrate, last seen, error count |
| System metrics | CPU usage (%), memory usage (%), disk usage (%), network I/O |
| Logs viewer | Recent system logs with severity filtering (DEBUG/INFO/WARNING/ERROR/CRITICAL); tail -f style auto-scroll |
| Refresh | Auto-refresh every 30 seconds; manual refresh button |
| Historical view | Toggle to show metrics history (last 1h, 6h, 24h, 7d) |
13.3.15 Page 15: Notification Settings (/settings/notifications)
Configuration interface for the notification system.
| Feature | Specification |
|---|---|
| Recipient groups | Add/edit/delete groups; each group has name, Telegram chat IDs, WhatsApp numbers, alert preferences |
| Routing rules | Visual rule builder with drag-and-drop condition blocks (camera, person, role, event_type, zone, time, day, severity, watchlist) |
| Quiet hours | Schedule builder with day-of-week checkboxes, time range pickers, timezone selector |
| Template editor | Edit message templates per alert type; live preview with sample data; variable reference panel |
| Delivery status | Real-time view showing notification delivery states (pending/sent/delivered/failed) |
| Test buttons | "Send Test Alert" per channel to verify configuration |
| DLQ viewer | Dead letter queue entries with retry/discard actions |
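The quiet-hours schedule (day-of-week checkboxes plus a time range) needs care when the range crosses midnight. The sketch below assumes the timestamp has already been converted to the configured timezone, that days use Python's `weekday()` numbering (Monday = 0), and that a selected day refers to the day on which the window starts. The `QuietHours` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet


@dataclass(frozen=True)
class QuietHours:
    days: FrozenSet[int]  # weekday() values on which a window starts
    start: time
    end: time

    def contains(self, local_dt: datetime) -> bool:
        """True if local_dt falls inside a quiet window."""
        now = local_dt.time()
        day = local_dt.weekday()
        if self.start <= self.end:
            # Same-day window, e.g. 13:00-14:00.
            return day in self.days and self.start <= now < self.end
        # Overnight window, e.g. 22:00-06:00.
        if now >= self.start:
            return day in self.days
        if now < self.end:
            # Early-morning tail belongs to the window started yesterday.
            return (day - 1) % 7 in self.days
        return False
```

With a weekday schedule of 22:00 to 06:00, Saturday at 05:00 is still quiet because the window opened on Friday night, while Saturday at 23:00 is not.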
13.3.16 Page 16: Admin Users (/settings/users)
User management interface for administrators.
| Feature | Specification |
|---|---|
| Users table | Username, email, role badge, status (Active/Inactive), last login, MFA status, actions menu |
| Add user | Modal: username, email, role selector, password (or send invite link), MFA toggle |
| Edit user | Role, status, force password change on next login, reset MFA, session revocation |
| User activity log | Login history (timestamp, IP, device), actions taken, settings changed |
| Bulk actions | Deactivate multiple accounts simultaneously |
| Filter | By role, status, last login date range |
| Sort | By username, role, last login, created date |
| Pagination | 25 users per page |
13.3.17 Page 17: Camera Management (/settings/cameras)
Configuration interface for camera setup and zone management.
| Feature | Specification |
|---|---|
| Camera cards | Name, status (Online/Offline/Disabled), IP/connection string, stream info (resolution, FPS), action buttons (Edit, Test, Disable) |
| Add camera | Modal: name, location, stream URL, credentials, channel number, description |
| Edit camera | All camera properties; test connection button |
| Zone configuration | Interactive polygon drawing on live camera feed; zone name, color, sensitivity, type (Entrance/Restricted/Detection/Ignore) |
| Stream settings | Resolution (720p/1080p), frame rate (5/10/15/25/30 FPS), codec (H.264/H.265), night mode toggle |
| Recording settings | Continuous/event-triggered, retention policy, storage location |
| Camera ordering | Drag to reorder cameras in grid layout |
13.3.18 Page 18: Retention & Storage (/settings/storage)
Storage management and retention policy configuration.
| Feature | Specification |
|---|---|
| Storage overview | Donut chart showing usage breakdown: Video recordings, Detection snapshots, Training data, System logs, Free space |
| Numerical values | Total capacity / Used / Free; warning at > 80% (yellow), critical at > 95% (red) |
| Retention policies | Dropdown per category: 7 days / 14 days / 30 days / 60 days / 90 days / 180 days / 365 days / Forever |
| Auto-cleanup | Enable toggle + schedule time picker (daily at 03:00 default) |
| Actions | "Save Settings", "Run Cleanup Now" (with confirmation), "Export Storage Report" |
| Growth projection | Estimated days until full based on current growth rate |
| Storage alerts | Configure alert thresholds (80% warning, 90% high, 95% critical) |
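The storage alert levels and the growth projection can be sketched as follows. The thresholds use the 80% / 90% / 95% defaults from the table with a strict "greater than" comparison. The projection assumes linear growth at the current daily rate, which is a simplification, and the function names are hypothetical.

```python
from typing import Optional

# (threshold percent, level), checked from most to least severe.
THRESHOLDS = [(95.0, "critical"), (90.0, "high"), (80.0, "warning")]


def storage_level(used_gb: float, total_gb: float) -> str:
    """Classify disk usage against the default alert thresholds."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    used_pct = 100.0 * used_gb / total_gb
    for threshold, level in THRESHOLDS:
        if used_pct > threshold:
            return level
    return "ok"


def days_until_full(used_gb: float, total_gb: float,
                    daily_growth_gb: float) -> Optional[float]:
    """Linear projection of days until capacity; None if usage is not growing."""
    if daily_growth_gb <= 0:
        return None
    return max(total_gb - used_gb, 0.0) / daily_growth_gb
```

For a 1000 GB volume with 600 GB used and 20 GB of daily growth, the projection is 20 days. A retention cleanup that frees space changes the growth rate, so the estimate should be recomputed after each cleanup run.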
13.4 Key User Flows
13.4.1 Flow 1: Daily Operator — Monitor & Respond
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 1: DAILY OPERATOR (Monitor & Respond) │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: LOGIN │
│ ────────────── │
│ Enter username → Enter password → MFA code (if enabled) │
│ → Redirect to Dashboard │
│ │
│ STEP 2: DASHBOARD REVIEW (~30 seconds) │
│ ───────────────────────────────────── │
│ Glance at stat cards: │
│ ├─ All 8 cameras online? ✓ │
│ ├─ Any critical alerts pending? (red badge) │
│ ├─ Any unknown persons detected? │
│ └─ System health OK? │
│ │
│ If critical alert visible: │
│ → Click alert card → Go to Alert Center │
│ If no urgent alerts: │
│ → Click "Live View" in sidebar │
│ │
│ STEP 3: LIVE CAMERA MONITORING (ongoing) │
│ ───────────────────────────────────── │
│ View 2x4 grid of all cameras │
│ Observe feeds for anomalies │
│ │
│ When alert toast appears (top-right): │
│ → Toast slides in with sound notification │
│ → Click toast to view alert details │
│ │
│ STEP 4: ALERT RESPONSE │
│ ────────────────── │
│ Click alert toast OR navigate to Alert Center │
│ Review alert card: │
│ ├─ Thumbnail image │
│ ├─ Camera name, timestamp │
│ ├─ Alert type (unknown person, watchlist match, etc.) │
│ └─ Severity level │
│ │
│ Click "View Details" for full information: │
│ ├─ Full-size image / video clip │
│ ├─ AI confidence score │
│ ├─ Detection metadata (bounding box, zone) │
│ └─ Person profile link (if known) │
│ │
│ DECISION: │
│ ├─ False detection → Click "Mark as False Positive" │
│ ├─ Legitimate alert → Click "Acknowledge" or "Resolve" │
│ ├─ Unknown person → Click "Name This Person" │
│ ├─ Needs escalation → Click "Escalate" │
│ └─ Need live view → Click "View Live" to jump to camera │
│ │
│ STEP 5: RETURN TO MONITORING │
│ ──────────────────────────── │
│ After handling alert, return to Live View │
│ Continue monitoring cycle │
│ │
│ STEP 6: END OF SHIFT │
│ ────────────────── │
│ Review unacknowledged alerts (if any) │
│ Check System Health page │
│ Hand over to next operator (verbal + note any pending issues) │
│ Click user menu → Logout │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.2 Flow 2: New Person Onboarding
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 2: NEW PERSON ONBOARDING │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ TRIGGER: System detects unknown person → Alert created → Operator notified │
│ │
│ STEP 1: REVIEW DETECTION │
│ ──────────────────── │
│ Navigate to "Detections" via sidebar │
│ Filter: "Unknown" (toggle button) │
│ Click on unknown detection card │
│ │
│ Detail view shows: │
│ ├─ Full-size face image │
│ ├─ Camera: CAM-01 (Entrance) │
│ ├─ Timestamp: 2025-01-16 14:32:15 │
│ ├─ Confidence: 87.3% │
│ └─ AI note: "No matching person found in database" │
│ │
│ STEP 2: NAME THE PERSON │
│ ──────────────────── │
│ Click "Name This Person" button │
│ Modal dialog appears: │
│ │
│ ┌────────────────────────────────────┐ │
│ │ Name This Person │ │
│ │ │ │
│ │ Face: [thumbnail] │ │
│ │ │ │
│ │ Full Name * [____________] │ │
│ │ Role * [Employee ▼] │ │
│ │ Department [____________] │ │
│ │ Employee ID [____________] │ │
│ │ Notes [____________] │ │
│ │ Tags [____________] │ │
│ │ │ │
│ │ Similar existing persons: │ │
│ │ [No similar persons found] │ │
│ │ │ │
│ │ [Cancel] [Save & Create Profile] │ │
│ └────────────────────────────────────┘ │
│ │
│ STEP 3: SIMILARITY CHECK │
│ ──────────────────── │
│ System searches for similar existing persons │
│ If matches found: display side-by-side comparison │
│ → Option to merge with existing person instead of creating new │
│ If no matches: proceed with creation │
│ │
│ STEP 4: SAVE PROFILE │
│ ────────────── │
│ Click "Save & Create Profile" │
│ Toast notification: "Profile created for [Name]" │
│ Detection card updates with person name │
│ Person now appears in Person Gallery │
│ │
│ STEP 5: ADD TRAINING IMAGES (Optional) │
│ ──────────────────────────────────── │
│ Navigate to Person Profile │
│ Click "Upload Reference Photos" │
│ Select additional clear face images │
│ System queues for model retraining │
│ Toast: "3 new training images added. Model will retrain automatically." │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.3 Flow 3: Unknown Person Review Queue
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 3: UNKNOWN PERSON REVIEW QUEUE │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: OPEN REVIEW QUEUE │
│ ────────────────────── │
│ Sidebar → "Unknown Review" │
│ View: Grid of unknown person cluster cards │
│ Header: "23 unknown clusters remaining" │
│ │
│ STEP 2: SELECT CLUSTER │
│ ──────────────── │
│ Click on a cluster card to expand │
│ Shows: │
│ ├─ Representative face (largest) │
│ ├─ Gallery of all face instances in cluster │
│ ├─ Sighting history (camera, time, count) │
│ ├─ AI pattern insight: "Seen 5x at entrance between 08:00-09:00" │
│ └─ Confidence distribution graph │
│ │
│ STEP 3: MAKE DECISION │
│ ──────────────── │
│ Options: │
│ ├─ [Name This Person] → Enter details → Create new profile │
│ ├─ [Merge with Existing] → Search/select person → Confirm merge │
│ ├─ [Ignore Cluster] → "False detection / not a person" → Remove │
│ └─ [Mark Reviewed] → "Unsure, keep in queue for later" │
│ │
│ STEP 4: QUEUE UPDATES │
│ ──────────────── │
│ Processed item removed from queue │
│ Toast confirms action: "Cluster marked as [Name]. 22 remaining." │
│ Auto-advance to next cluster (optional) │
│ Keyboard shortcut: Right arrow → next cluster │
│ │
│ STEP 5: CONTINUE REVIEW │
│ ──────────────── │
│ Process all clusters or stop and resume later │
│ Queue persists across sessions │
│ New clusters automatically added as detected │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.4 Flow 4: AI Settings Adjustment
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 4: AI SETTINGS ADJUSTMENT │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: NAVIGATE TO AI VIBE SETTINGS │
│ ──────────────────────────────────── │
│ Sidebar → "AI Vibe Settings" (Sparkles icon) │
│ View: Scrollable page with 7 setting sections │
│ │
│ STEP 2: ADJUST DETECTION SENSITIVITY │
│ ──────────────────────────────── │
│ Section: "How carefully should the AI watch?" │
│ Current: [Relaxed] [Balanced] [High] [Maximum] │
│ Change: Click "High" │
│ Description updates: │
│ "High: The AI will catch almost everything. │
│ Expect more alerts, including some false positives." │
│ Toast: "Detection Sensitivity updated to High" │
│ Change takes effect immediately │
│ │
│ STEP 3: ADJUST ALERT STYLE │
│ ──────────────────── │
│ Section: "When should alerts be sent?" │
│ Current: [Silent] [Digest] [Normal] [Urgent] [Critical] │
│ Change: Click "Critical" │
│ Description updates: │
│ "Critical: Only truly important events trigger alerts. │
│ All other activity is logged but not alerted." │
│ Toast: "Alert Style updated to Critical" │
│ │
│ STEP 4: REVIEW ADVANCED (Admin only) │
│ ──────────────────────────────────── │
│ Click "Expand" on Advanced Settings │
│ Shows internal values: │
│ Detection Sensitivity: High │
│ ├─ Confidence Threshold: 0.55 │
│ ├─ NMS Threshold: 0.40 │
│ └─ Model: yolo11m.onnx │
│ Admin can directly edit numerical values │
│ │
│ STEP 5: DONE │
│ ──────── │
│ All changes auto-saved │
│ Return to monitoring — changes effective immediately │
└──────────────────────────────────────────────────────────────────────────────┘
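One way to back the four sensitivity labels is a lookup from label to internal thresholds, with optional admin overrides from Step 4. The sketch below is hedged: only the "High" row (confidence 0.55, NMS 0.40, `yolo11m.onnx`) is taken from the flow above. The values for the other three labels, and the names `DetectionPreset`, `PRESETS`, and `resolve_preset`, are hypothetical placeholders.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DetectionPreset:
    confidence_threshold: float
    nms_threshold: float
    model: str


PRESETS = {
    "relaxed":  DetectionPreset(0.75, 0.50, "yolo11m.onnx"),  # assumed values
    "balanced": DetectionPreset(0.65, 0.45, "yolo11m.onnx"),  # assumed values
    "high":     DetectionPreset(0.55, 0.40, "yolo11m.onnx"),  # from Step 4 above
    "maximum":  DetectionPreset(0.40, 0.35, "yolo11m.onnx"),  # assumed values
}


def resolve_preset(label: str,
                   confidence: Optional[float] = None,
                   nms: Optional[float] = None) -> DetectionPreset:
    """Return the preset for a UI label, applying admin overrides if given."""
    preset = PRESETS[label.lower()]
    for name, value in (("confidence", confidence), ("nms", nms)):
        if value is not None and not 0.0 < value < 1.0:
            raise ValueError(f"{name} override must be between 0 and 1")
    return replace(
        preset,
        confidence_threshold=preset.confidence_threshold if confidence is None else confidence,
        nms_threshold=preset.nms_threshold if nms is None else nms,
    )


print(resolve_preset("High"))
```

A lower confidence threshold lets more detections through, which matches the "expect more alerts, including some false positives" description for the High setting.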
13.4.5 Flow 5: Watchlist Alert Configuration
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 5: WATCHLIST ALERT CONFIGURATION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: NAVIGATE TO WATCHLISTS │
│ ──────────────────────────── │
│ Sidebar → "Watchlists" │
│ View: Grid of existing watchlist cards │
│ Default: VIP, Blacklist, Authorized, Temporary Access │
│ │
│ STEP 2: CREATE NEW WATCHLIST (Optional) │
│ ──────────────────────────────────── │
│ Click "+ New Watchlist" │
│ Modal: │
│ Name: [Security Escort Required] │
│ Icon: [🛡️] (icon picker) │
│ Color: [Orange] (color picker) │
│ Description: [People who require security escort] │
│ Click "Create" │
│ New watchlist card appears in grid │
│ │
│ STEP 3: ADD MEMBERS │
│ ──────────────── │
│ Click on watchlist card │
│ Click "Add from Gallery" │
│ Search/select persons to add: │
│ [☑] John Doe │
│ [☑] Jane Smith │
│ [☐] Bob Johnson (not selected) │
│ Click "Add to Watchlist" │
│ Toast: "2 persons added to Security Escort Required" │
│ │
│ STEP 4: CONFIGURE ALERTS │
│ ──────────────────── │
│ Click "Settings" tab on watchlist detail │
│ Configure: │
│ Alert Timing: [☑] Immediate [☐] Delayed (___ min) │
│ Severity: [☐] Inherit [☑] Force Critical │
│ Notify Groups: [☑] Security Team [☐] Management │
│ Media: [☑] Image [☑] Video │
│ Quiet Hours: [☐] Respect global [☑] Always alert │
│ Escalation: [☑] Enable escalation (5/10/20 min) │
│ Click "Save" │
│ │
│ STEP 5: TEST │
│ ──────── │
│ Click "Test Alert" button │
│ System sends test alert through configured channels │
│ Verify: Telegram message received ✓ │
│ Verify: WhatsApp message received ✓ │
│ Watchlist is now active and monitoring │
└──────────────────────────────────────────────────────────────────────────────┘
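The "5/10/20 min" escalation setting in Step 4 can be read as three follow-up notifications at fixed offsets from the original alert, stopping once the alert is acknowledged. That reading is an assumption, as are the function and constant names in this minimal sketch.

```python
from datetime import datetime, timedelta
from typing import Optional

# Offsets taken from the "Enable escalation (5/10/20 min)" option.
ESCALATION_MINUTES = (5, 10, 20)


def escalation_times(alert_time: datetime,
                     acknowledged_at: Optional[datetime] = None,
                     offsets: tuple = ESCALATION_MINUTES) -> list:
    """Return the escalation timestamps that fire before acknowledgement."""
    times = [alert_time + timedelta(minutes=m) for m in offsets]
    if acknowledged_at is None:
        return times
    # Escalations scheduled at or after the acknowledgement never fire.
    return [t for t in times if t < acknowledged_at]


alert = datetime(2025, 1, 1, 22, 0)
for when in escalation_times(alert, acknowledged_at=alert + timedelta(minutes=12)):
    print(when.isoformat())
```

With an acknowledgement 12 minutes after the alert, only the 5-minute and 10-minute escalations are sent.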
13.5 Component Specifications
13.5.1 Camera Feed Component
| State | Visual | Interaction |
|---|---|---|
| Loading | Centered spinner overlay, camera name visible | None — wait for stream |
| Playing | Live stream active, recording dot if applicable | Click to focus, hover for controls |
| Paused | Stream paused, large play button overlay | Click to resume |
| Error | Error icon + "Connection failed" + Retry button | Click Retry to reconnect |
| Offline | Gray placeholder with camera icon + "Offline" | Shows last online timestamp |
| Disabled | Grayed out with "Disabled" badge | No stream attempted |
| Prop | Type | Required | Default | Description |
|---|---|---|---|---|
| `cameraId` | `string` | Yes | — | Unique camera identifier (e.g., "cam-01") |
| `name` | `string` | Yes | — | Display name shown as overlay |
| `streamUrl` | `string` | Yes | — | HLS or WebRTC stream URL |
| `status` | `'online' \| 'offline' \| 'reconnecting' \| 'disabled'` | Yes | — | Current camera status |
| `layout` | `'grid' \| 'fullscreen'` | No | `'grid'` | Current layout mode |
| `quality` | `'auto' \| 'hd' \| 'sd'` | No | `'auto'` | Stream quality preference |
| `showControls` | `boolean` | No | `true` | Show overlay controls |
| `onFocus` | `(id: string) => void` | No | — | Callback when camera is focused |
| `onSnapshot` | `(id: string) => void` | No | — | Callback when snapshot is taken |
13.5.2 Alert Card Component
| Prop | Type | Required | Description |
|---|---|---|---|
| `id` | `string` | Yes | Alert unique identifier |
| `severity` | `'critical' \| 'high' \| 'medium' \| 'low' \| 'info'` | Yes | Alert severity level |
| `type` | `string` | Yes | Alert type classification |
| `cameraName` | `string` | Yes | Source camera display name |
| `timestamp` | `Date` | Yes | When the alert occurred |
| `thumbnail` | `string` | No | URL to thumbnail image |
| `personName` | `string` | No | Identified person name (if known) |
| `status` | `'pending' \| 'acknowledged' \| 'resolved' \| 'ignored'` | Yes | Current alert status |
| `onAcknowledge` | `() => void` | No | Acknowledge callback |
| `onResolve` | `() => void` | No | Resolve callback |
| `onIgnore` | `() => void` | No | Ignore callback |
| `onViewDetails` | `() => void` | No | View details callback |
13.5.3 Stat Card Component
| Prop | Type | Required | Description |
|---|---|---|---|
| `title` | `string` | Yes | Card label (e.g., "Cameras Online") |
| `value` | `string \| number` | Yes | Main displayed value (e.g., "8/8") |
| `icon` | `LucideIcon` | Yes | Icon component from Lucide React |
| `color` | `'green' \| 'blue' \| 'orange' \| 'red' \| 'purple'` | No | Color theme (default: blue) |
| `trend` | `number` | No | Percentage change from previous period |
| `subtitle` | `string` | No | Secondary text below value |
| `href` | `string` | No | Navigation link (e.g., to detail page) |
13.6 Toast Notification System
| Type | Icon | Color | Duration | Use Case |
|---|---|---|---|---|
| Success | Check circle | Green (#10B981) | 3 seconds | Action completed successfully |
| Error | X circle | Red (#EF4444) | 5 seconds (or persistent) | Action failed; may require user attention |
| Warning | Alert triangle | Orange (#F59E0B) | 4 seconds | Non-critical issue; may need attention |
| Info | Info circle | Blue (#3B82F6) | 3 seconds | Informational message |
| Alert | Bell | Red (#EF4444) | Persistent (until dismissed) | Critical alert notification |
Toast behavior:
- Appears in top-right corner
- Stacks up to 5 toasts simultaneously
- Older toasts pushed down when new ones arrive
- Hovering pauses auto-dismiss timer
- Click to dismiss immediately
- Swipe right to dismiss (mobile)
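The stacking and auto-dismiss rules above can be modeled independently of the UI framework. The sketch below is a language-neutral model written in Python, not the React implementation. The durations come from the table; evicting the oldest toast when a sixth arrives is an assumption (the source only states the limit of 5), hover-pause is not modeled, and the class names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Auto-dismiss durations in seconds, from the table; None means persistent.
DURATIONS = {"success": 3.0, "error": 5.0, "warning": 4.0, "info": 3.0, "alert": None}
MAX_VISIBLE = 5


@dataclass
class Toast:
    kind: str
    message: str
    expires_at: Optional[float]  # None for persistent toasts


class ToastStack:
    def __init__(self) -> None:
        self.visible: deque = deque()  # newest first, so older toasts sit lower

    def push(self, kind: str, message: str, now: float) -> Toast:
        duration = DURATIONS[kind]
        toast = Toast(kind, message, None if duration is None else now + duration)
        self.visible.appendleft(toast)
        while len(self.visible) > MAX_VISIBLE:
            self.visible.pop()  # assumption: evict the oldest toast
        return toast

    def tick(self, now: float) -> None:
        """Drop every toast whose auto-dismiss timer has elapsed."""
        self.visible = deque(
            t for t in self.visible if t.expires_at is None or t.expires_at > now
        )


stack = ToastStack()
stack.push("success", "Detection Sensitivity updated to High", now=0.0)
stack.push("alert", "Watchlist match on camera 3", now=1.0)
stack.tick(now=4.0)
print([t.message for t in stack.visible])
```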
13.7 Modal System
| Size | Width | Use Case |
|---|---|---|
| Small | 400px | Confirmations, simple forms |
| Medium (default) | 560px | Standard forms, detail views |
| Large | 800px | Complex forms, image viewers |
| Fullscreen | 100% | Camera fullscreen, large data tables |
Modal behavior:
- Backdrop click to close (configurable)
- Escape key to close (configurable)
- Focus trap — Tab cycles within modal
- Return focus to trigger element on close
- Body scroll locked when modal open
- Enter key submits primary action (forms)
13.8 Responsive Behavior
| Breakpoint | Width | Layout Changes |
|---|---|---|
| `xs` | < 576px | Single column; stacked layouts; bottom tab bar; hamburger menu; camera grid 1x1 or 2x1 |
| `sm` | 576-767px | Two column layouts; sidebar as overlay drawer; camera grid 2x2 |
| `md` | 768-991px | Collapsed sidebar (72px); filters as drawer; camera grid 2x3; 3-column person gallery |
| `lg` | 992-1199px | Sidebar expanded (260px); full desktop layout; 4-column person gallery |
| `xl` | 1200-1399px | Full desktop layout; 5-column person gallery; 2x4 camera grid |
| `xxl` | 1400px+ | Max content width 1400px centered; all features visible |
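The breakpoint boundaries are half-open ranges, which is easy to get wrong by one pixel at the edges. As a minimal sketch (the function name `breakpoint_for` is hypothetical, and the production UI would express this in Tailwind configuration rather than Python), the table maps a viewport width to a breakpoint as follows.

```python
import bisect

# Lower bound (px) of each breakpoint, from the table above.
BREAKPOINTS = [(0, "xs"), (576, "sm"), (768, "md"), (992, "lg"), (1200, "xl"), (1400, "xxl")]
_STARTS = [start for start, _ in BREAKPOINTS]


def breakpoint_for(width_px: int) -> str:
    """Return the breakpoint name whose range contains width_px."""
    if width_px < 0:
        raise ValueError("viewport width cannot be negative")
    # bisect_right finds the first lower bound greater than width_px;
    # the entry just before it is the active breakpoint.
    return BREAKPOINTS[bisect.bisect_right(_STARTS, width_px) - 1][1]


for width in (575, 576, 991, 1400):
    print(width, breakpoint_for(width))
```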
13.9 Keyboard Shortcuts
| Shortcut | Context | Action |
|---|---|---|
| `?` | Global | Show keyboard shortcuts reference modal |
| `/` | Global | Focus global search bar |
| `Escape` | Global | Close modal / exit fullscreen / deselect |
| `F` | Live View | Toggle fullscreen on focused camera |
| `S` | Live View | Take snapshot of focused camera |
| `1-8` | Live View | Focus camera 1-8 |
| `Space` | Live View | Pause/play focused camera stream |
| `A` | Alert Center | Acknowledge selected alert |
| `R` | Alert Center | Resolve selected alert |
| `N` | Detections / Unknowns | Name unknown person |
| `→` | Unknown Review | Next cluster |
| `←` | Unknown Review | Previous cluster |
| `Ctrl+K` | Global | Command palette (quick navigation) |
| `Ctrl+Shift+A` | Global | Acknowledge most recent alert |
| `M` | Live View | Toggle mute on camera audio |
| `+` / `-` | Timeline | Zoom in / zoom out |
13.10 Animation Guidelines
| Animation | Duration | Easing | Description |
|---|---|---|---|
| Page transition | 200ms | `ease-out` | Fade in on route change |
| Modal open | 250ms | `cubic-bezier(0.16, 1, 0.3, 1)` | Scale up + fade in |
| Modal close | 150ms | `ease-in` | Scale down + fade out |
| Sidebar toggle | 250ms | `ease-in-out` | Width transition 260px ↔ 72px |
| Toast slide-in | 300ms | `ease-out` | Slide from right + fade in |
| Toast fade-out | 200ms | `ease-in` | Fade out before removal |
| Card hover lift | 150ms | `ease` | Subtle translateY(-2px) + shadow increase |
| Segmented slider | 200ms | `ease` | Sliding background between options |
| Pulse (recording) | 2s | `ease-in-out infinite` | Red dot opacity oscillation |
| Stats update | 500ms | `ease` | Number count-up animation |
| Skeleton shimmer | 1.5s | `linear infinite` | Shimmer gradient sweep |
| Alert flash | 1s | `ease-out` | Border flash on camera with new alert |
| Camera focus | 300ms | `ease-out` | Expand to fullscreen |
| Dropdown open | 150ms | `ease-out` | Fade + slight translateY |
| Tooltip | 100ms | `ease` | Fade in on hover |
13.11 Technology Stack
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18.x | UI library |
| Meta-framework | Next.js | 14.x | SSR, routing, API routes |
| Language | TypeScript | 5.x | Type safety |
| Styling | Tailwind CSS | 3.x | Utility-first CSS |
| Theme | CSS Custom Properties | — | Dark mode via the `dark` class |
| UI Components | shadcn/ui | latest | Base component library |
| Icons | Lucide React | latest | Consistent icon set |
| State Management | Zustand | 4.x | Lightweight global state |
| Data Fetching | TanStack Query (React Query) | 5.x | Server state management |
| Real-time | Socket.IO Client | 4.x | WebSocket for live updates |
| Video | hls.js | latest | HLS stream playback |
| Video (WebRTC) | native | — | WebRTC stream fallback |
| Charts | Recharts | 2.x | Data visualization |
| Date/Time | date-fns | 2.x | Date formatting and manipulation |
| Forms | React Hook Form | 7.x | Form state management |
| Validation | Zod | 3.x | Schema validation |
| Zone Drawing | SVG + native events | — | Polygon drawing on camera feed |
| Testing | Vitest | 1.x | Unit testing |
| E2E Testing | Playwright | 1.x | Browser automation testing |
| Build | Next.js built-in | — | Production optimization |
Section 14: Deployment Plan
14.1 Deployment Architecture Overview
The deployment architecture spans two physical environments: AWS cloud for centralized services and an Intel NUC edge gateway at the surveillance site. Both environments are connected via an encrypted WireGuard VPN tunnel. All deployments use containerization (Docker/Kubernetes) with GitOps-based continuous delivery.
┌──────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT ARCHITECTURE │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ AWS CLOUD │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Route 53 │──▶ ALB │──▶ EKS │──▶ App Pods │ │ │
│ │ │ DNS │ │ TLS 1.3 │ │ Cluster │ │ (FastAPI/Next) │ │ │
│ │ └──────────┘ └──────────┘ └────┬─────┘ └──────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌────┴─────┐ ┌──────────────────┐ │ │
│ │ │ S3 │ │ RDS │ │ ElastiCache│ │ MSK Kafka │ │ │
│ │ │ Media │ │ Postgres │ │ Redis │ │ (Event Bus) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ WireGuard VPN Gateway (EC2) ←────→ Edge Gateway │ │ │
│ │ │ UDP 51820 Tunnel (Intel NUC, Site) │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ EDGE SITE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Intel NUC (Ubuntu Server 22.04) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Video Capture│ │ AI Inference │ │ MinIO │ │ │ │
│ │ │ │ (RTSP/FFmpeg)│ │ (YOLO/Face) │ │ (Storage) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Redis │ │ WireGuard │ │ Node Exporter│ │ │ │
│ │ │ │ (Cache) │ │ (VPN) │ │ (Metrics) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌─────────┴──────────┐ │ │
│ │ │ Camera LAN │ │ │
│ │ │ CP PLUS DVR │ │ │
│ │ │ 192.168.29.200:554 │ │ │
│ │ │ (8 channels) │ │ │
│ │ └─────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
14.2 Cloud Deployment (AWS EKS)
14.2.1 EKS Cluster Configuration
| Parameter | Value | Notes |
|---|---|---|
| Kubernetes version | 1.28+ | Latest stable at deployment |
| Control plane | Managed by AWS | Multi-AZ availability |
| Node group type | Managed (EC2) | t3.large for general, g4dn.xlarge for GPU |
| CNI | Amazon VPC CNI | Native VPC networking for pods |
| Ingress controller | NGINX Ingress + cert-manager | TLS termination at ALB |
| GitOps | ArgoCD | Declarative continuous deployment |
| Pod identity | IRSA (IAM Roles for Service Accounts) | No long-term AWS credentials |
14.2.2 Cloud Service Resources
| Service | AWS Service | Instance/Tier | HA Mode | Monthly Est. |
|---|---|---|---|---|
| Orchestration | Amazon EKS | Managed control plane | Multi-AZ | $73 |
| Application nodes | EC2 (t3.large) | 3 nodes (on-demand) | Multi-AZ spread | $200 |
| GPU nodes | EC2 (g4dn.xlarge) | 1 node (spot preferred) | Single + auto-recovery | $350 |
| Database | RDS PostgreSQL 15 | db.r6g.xlarge Multi-AZ | Multi-AZ with failover | $520 |
| Cache | ElastiCache Redis | cache.r6g.large (2 shards) | Cluster mode | $260 |
| Message bus | Amazon MSK | kafka.m5.large (3 brokers) | Multi-AZ | $350 |
| Object storage | S3 | Standard + IA + Glacier | Cross-region replication | $200 |
| Load balancer | ALB | Application Load Balancer | Multi-AZ | $25 |
| DNS | Route 53 | Hosted zone + health checks | Global | $15 |
| VPN gateway | EC2 (t3.micro) | WireGuard endpoint | Single (monitor for HA) | $15 |
| Secrets | AWS Secrets Manager | Vault integration | Multi-AZ | $10 |
| Monitoring | CloudWatch | Logs + metrics + alarms | Multi-AZ | $50 |
| Total | | | | ~$2,068/month |
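The monthly total can be re-derived from the line items in the table. The snippet below simply sums the "Monthly Est." column; the dictionary keys are shortened labels chosen for this sketch, and the figures are the table's estimates, not quoted AWS prices.

```python
# Monthly estimates in USD, copied from the table above.
MONTHLY_ESTIMATES_USD = {
    "Amazon EKS control plane": 73,
    "EC2 t3.large x3": 200,
    "EC2 g4dn.xlarge (GPU)": 350,
    "RDS PostgreSQL 15": 520,
    "ElastiCache Redis": 260,
    "Amazon MSK": 350,
    "S3": 200,
    "ALB": 25,
    "Route 53": 15,
    "WireGuard EC2": 15,
    "Secrets Manager": 10,
    "CloudWatch": 50,
}

total = sum(MONTHLY_ESTIMATES_USD.values())
largest = max(MONTHLY_ESTIMATES_USD, key=MONTHLY_ESTIMATES_USD.get)

print(f"Total: ${total:,}/month")
print(f"Largest line item: {largest} ({MONTHLY_ESTIMATES_USD[largest] / total:.0%} of total)")
```

The line items sum to $2,068 per month, with the Multi-AZ database as the largest single cost.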
14.3 Edge Deployment (Intel NUC)
14.3.1 Edge Hardware Specification
| Component | Specification | Notes |
|---|---|---|
| Device | Intel NUC 13 Pro (or equivalent) | Fanless preferred for reliability |
| CPU | Intel Core i7-1360P (12 cores, 16 threads) | Sufficient for 8 streams + AI inference |
| RAM | 32 GB DDR4-3200 (2x16 GB) | Dual channel for memory bandwidth |
| Storage (OS) | 500 GB NVMe SSD (Samsung 980 Pro or equivalent) | Fast boot and application loading |
| Storage (Data) | 2 TB NVMe SSD (Samsung 990 Pro or equivalent) | 7-day local recording buffer |
| Network | Intel i226-V 2.5 GbE (dual port) | Dual NIC for WAN + LAN separation |
| WiFi | Disabled in BIOS | Security — no wireless |
| Bluetooth | Disabled in BIOS | Security — no wireless |
| TPM | TPM 2.0 enabled | For LUKS auto-unseal |
| OS | Ubuntu Server 22.04 LTS (minimal install) | No desktop environment |
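Whether the 2 TB data drive holds a 7-day buffer depends on the recording bitrate, which this section does not state. The following sketch works the arithmetic under an assumed 2 Mbit/s per channel; that bitrate and both function names are hypothetical, so substitute the measured DVR main-stream bitrate before relying on the result.

```python
SECONDS_PER_DAY = 86_400


def recording_buffer_tb(channels: int, mbps_per_channel: float, days: int) -> float:
    """Decimal terabytes needed for `days` of continuous recording."""
    bytes_per_second = channels * mbps_per_channel * 1_000_000 / 8
    return bytes_per_second * days * SECONDS_PER_DAY / 1e12


def max_mbps_per_channel(capacity_tb: float, channels: int, days: int) -> float:
    """Highest per-channel bitrate (Mbit/s) that still fits the capacity."""
    total_bits = capacity_tb * 1e12 * 8
    return total_bits / (channels * days * SECONDS_PER_DAY) / 1_000_000


# Assumed bitrate: 2 Mbit/s per channel (not specified in this document).
needed = recording_buffer_tb(channels=8, mbps_per_channel=2.0, days=7)
ceiling = max_mbps_per_channel(capacity_tb=2.0, channels=8, days=7)
print(f"7-day buffer at 2 Mbit/s x 8 channels: {needed:.2f} TB")
print(f"Bitrate ceiling for a 2 TB drive: {ceiling:.2f} Mbit/s per channel")
```

Under that assumption the buffer needs about 1.21 TB, and the 2 TB drive tolerates roughly 3.3 Mbit/s per channel before the 7-day window no longer fits. Filesystem overhead and MinIO metadata are not included.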
14.3.2 Edge Docker Compose Configuration
version: "3.8"
services:
  # RTSP stream capture and frame extraction
  video-capture:
    image: sentinel/surveillance-video-capture:v2.3.1
    restart: unless-stopped
    network_mode: host
    environment:
      - DVR_IP=192.168.29.200
      - DVR_PORT=554
      - NUM_CHANNELS=8
      - FRAME_EXTRACT_FPS=1
      - RECORDING_SEGMENT_SEC=10
      # Host networking: Redis and MinIO are reached via their published ports
      - REDIS_HOST=localhost
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=localhost:9000
    volumes:
      - /data/frames:/app/frames
      - /data/recordings:/app/recordings
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
      - minio
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
  # AI inference service (lightweight edge models)
  ai-inference:
    image: sentinel/surveillance-ai-inference:edge-v2.3.1
    restart: unless-stopped
    # Requires the NVIDIA container runtime; remove this line on CPU-only hosts
    runtime: nvidia
    environment:
      - MODEL_PATH=/models
      # Bridge-networked service: use Compose service names, not localhost
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=minio:9000
      - INFERENCE_BATCH_SIZE=8
      - CONFIDENCE_THRESHOLD=0.7
      - NMS_THRESHOLD=0.45
    volumes:
      - ./models:/models:ro
      - /data/frames:/app/frames:ro
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
    deploy:
      resources:
        limits:
          cpus: '6.0'
          memory: 8G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
  # Local object storage (S3-compatible)
  minio:
    # Placeholder tag: pin to a specific dated MinIO release before deployment
    image: minio/minio:RELEASE.2024-latest
    restart: unless-stopped
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - /data/minio:/data
    environment:
      - MINIO_ROOT_USER_FILE=/run/secrets/minio_user
      - MINIO_ROOT_PASSWORD_FILE=/run/secrets/minio_password
    secrets:
      - minio_user
      - minio_password
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
  # Local cache and Pub/Sub
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    # The password value was lost in the source; REDIS_PASSWORD is an assumed
    # variable name, supplied through the shell environment or a .env file
    command: >
      redis-server
      --requirepass "${REDIS_PASSWORD}"
      --appendonly yes
      --maxmemory 512mb
      --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    ports:
      - "127.0.0.1:6379:6379"
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
  # WireGuard VPN client
  wireguard:
    image: linuxserver/wireguard:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - ./wireguard-config:/config
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 64M
  # Metrics exporter for Prometheus
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
volumes:
  redis_data:
    driver: local
secrets:
  minio_user:
    file: ./secrets/minio_user.txt
  minio_password:
    file: ./secrets/minio_password.txt
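As a sanity check, the resource limits in the Compose file can be added up and compared with the edge hardware from Section 14.3.1 (16 threads, 32 GB RAM). This is a sketch of the arithmetic only: the limits are copied from the file above, `node-exporter` is omitted because it declares no limit, and the 32 GB of RAM is treated as 32 GiB for simplicity.

```python
# (CPU limit, memory limit in MiB) per service, from the Compose file.
LIMITS = {
    "video-capture": (4.0, 4096),
    "ai-inference": (6.0, 8192),
    "minio": (1.0, 1024),
    "redis": (0.5, 512),
    "wireguard": (0.25, 64),
}

HOST_THREADS = 16             # Intel Core i7-1360P
HOST_MEMORY_MIB = 32 * 1024   # 32 GB RAM, treated as GiB here

total_cpus = sum(cpus for cpus, _ in LIMITS.values())
total_memory_mib = sum(mem for _, mem in LIMITS.values())

print(f"CPU limits:    {total_cpus} of {HOST_THREADS} threads")
print(f"Memory limits: {total_memory_mib / 1024:.2f} of {HOST_MEMORY_MIB / 1024:.0f} GiB")
print(f"Headroom:      {HOST_THREADS - total_cpus} threads, "
      f"{(HOST_MEMORY_MIB - total_memory_mib) / 1024:.2f} GiB")
```

The declared limits total 11.75 CPUs and about 13.56 GiB, which leaves headroom for the operating system, Docker itself, and the unconstrained metrics exporter.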
14.4 Configuration and Environment Variables
14.4.1 Environment Structure
| Environment | URL Pattern | Data | Purpose |
|---|---|---|---|
| Development | `*.dev.internal` | Synthetic test data | Feature development, local testing |
| Staging | `*.staging.example.com` | Anonymized production-like data | Integration testing, UAT |
| Production | `*.example.com` | Real operational data | Live surveillance operations |
14.4.2 Required Environment Variables
# ─── APPLICATION ───
APP_ENV=production # dev | staging | production
APP_NAME="Sentinel AI Surveillance"
APP_VERSION=2.3.1
APP_DEBUG=false
APP_SECRET_KEY=<random-256-bit-key> # Used for session signing
LOG_LEVEL=INFO # DEBUG | INFO | WARNING | ERROR | CRITICAL
# ─── SERVER ───
API_HOST=0.0.0.0
API_PORT=8080
WORKERS=4 # Uvicorn worker processes
TIMEZONE=Asia/Kolkata
# ─── DATABASE ───
DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/surveillance
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
DB_POOL_TIMEOUT=30
DB_ECHO=false # Set true for SQL logging (dev only)
# ─── REDIS ───
REDIS_URL=redis://:password@redis-endpoint:6379/0
REDIS_POOL_SIZE=50
REDIS_SOCKET_TIMEOUT=5
# ─── OBJECT STORAGE (S3 or MinIO) ───
STORAGE_TYPE=s3 # s3 | minio
STORAGE_ENDPOINT=s3.amazonaws.com
STORAGE_BUCKET=sentinel-surveillance-media
STORAGE_REGION=ap-south-1
STORAGE_ACCESS_KEY=<access-key>
STORAGE_SECRET_KEY=<secret-key>
STORAGE_SECURE=true
STORAGE_URL_EXPIRY=300 # Signed URL expiry in seconds
# ─── DVR / CAMERA CONNECTION ───
DVR_IP=192.168.29.200
DVR_PORT=554
DVR_USERNAME=admin
DVR_PASSWORD=<dvr-password>
DVR_CHANNELS=8
DVR_STREAM_QUALITY=0 # 0=main (high), 1=sub (low)
DVR_RTSP_TEMPLATE="rtsp://{user}:{pass}@{ip}:{port}/user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"
# ─── AI MODELS ───
MODEL_PATH=/models
HUMAN_DETECTION_MODEL=yolo11m.onnx
FACE_DETECTION_MODEL=scrfd_10g_bnkps.onnx
FACE_RECOGNITION_MODEL=arcface_r100.onnx
CONFIDENCE_THRESHOLD=0.7
NMS_THRESHOLD=0.45
FACE_MATCH_THRESHOLD=0.70
UNKNOWN_CLUSTER_EPS=0.35
UNKNOWN_CLUSTER_MIN_SAMPLES=3
# ─── TELEGRAM NOTIFICATIONS ───
TELEGRAM_ENABLED=true
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_WEBHOOK_URL=https://api.example.com/webhooks/telegram
TELEGRAM_WEBHOOK_SECRET=<webhook-secret>
TELEGRAM_ADMIN_CHAT_ID=<admin-chat-id>
# ─── WHATSAPP NOTIFICATIONS ───
WHATSAPP_ENABLED=true
WHATSAPP_API_VERSION=v18.0
WHATSAPP_ACCESS_TOKEN=<access-token>
WHATSAPP_PHONE_NUMBER_ID=<phone-number-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN=<verify-token>
WHATSAPP_BUSINESS_ACCOUNT_ID=<business-account-id>
# ─── VPN ───
VPN_ENABLED=true
VPN_TYPE=wireguard
VPN_ENDPOINT=wg.example.com:51820
VPN_PUBLIC_KEY=<server-public-key>
VPN_PRIVATE_KEY=<client-private-key>
VPN_PRESHARED_KEY=<preshared-key>
VPN_ALLOWED_IPS=10.100.0.0/16
VPN_KEEPALIVE=25
# ─── AUTHENTICATION ───
JWT_SECRET_KEY=<ecdsa-private-key-pem>
JWT_PUBLIC_KEY=<ecdsa-public-key-pem>
JWT_ALGORITHM=ES256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=15
JWT_REFRESH_TOKEN_EXPIRE_DAYS=7
MFA_REQUIRED_ROLES=super_admin,admin
MFA_ISSUER="Sentinel AI Surveillance"
# ─── MONITORING ───
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
GRAFANA_URL=https://grafana.example.com
SENTRY_DSN=<sentry-dsn>
HEALTH_CHECK_INTERVAL=30
# ─── RETENTION ───
RECORDING_RETENTION_DAYS=90
DETECTION_SNAPSHOT_RETENTION_DAYS=90
EVENT_LOG_RETENTION_DAYS=365
AUDIT_LOG_RETENTION_DAYS=365
TRAINING_DATA_RETENTION_DAYS=365
AUTO_CLEANUP_ENABLED=true
AUTO_CLEANUP_HOUR=3 # 3:00 AM daily
# ─── SECURITY ───
CORS_ALLOWED_ORIGINS=https://app.example.com,https://staging.example.com
CSP_REPORT_ONLY=false
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
SESSION_MAX_AGE_HOURS=8
SESSION_IDLE_TIMEOUT_MINUTES=30
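The `DVR_RTSP_TEMPLATE` variable expands into one URL per channel. The sketch below shows one way to do that expansion; it is not the video capture service's actual code. Two assumptions are worth flagging: channels are numbered from 1, and credentials are percent-encoded so that characters such as `@` do not break the URL (verify that the DVR firmware accepts encoded credentials). The function name `build_rtsp_urls` is hypothetical.

```python
from urllib.parse import quote

# Same template as DVR_RTSP_TEMPLATE in the environment file above.
DVR_RTSP_TEMPLATE = (
    "rtsp://{user}:{pass}@{ip}:{port}/"
    "user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"
)


def build_rtsp_urls(user: str, password: str, ip: str = "192.168.29.200",
                    port: int = 554, channels: int = 8, quality: int = 0,
                    template: str = DVR_RTSP_TEMPLATE) -> list:
    """Expand the template into one RTSP URL per channel (1..channels)."""
    fields = {
        "user": quote(user, safe=""),      # percent-encode reserved characters
        "pass": quote(password, safe=""),  # "pass" is a keyword, so use a dict
        "ip": ip,
        "port": port,
        "quality": quality,                # 0 = main stream, 1 = sub stream
    }
    return [template.format(ch=ch, **fields) for ch in range(1, channels + 1)]


for url in build_rtsp_urls("admin", "example-password")[:2]:
    print(url)
```

In a deployment the arguments would come from `DVR_USERNAME`, `DVR_PASSWORD`, `DVR_IP`, `DVR_PORT`, `DVR_CHANNELS`, and `DVR_STREAM_QUALITY`. Avoid writing the expanded URLs to logs, since they embed the DVR password.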
14.5 Rollout Stages
14.5.1 Stage 1: Foundation (Weeks 1-4)
Objective: Infrastructure, VPN connectivity, and core data layer operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 1 | AWS account setup, VPC creation (3 AZs), EKS cluster deployment, IAM roles | Cloud network ready | VPC flow logs active; EKS nodes Ready |
| 1 | RDS PostgreSQL Multi-AZ, ElastiCache Redis cluster | Data layer ready | DB connections successful; replication lag < 1s |
| 2 | S3 buckets (media, backups, logs), lifecycle policies, CORS | Storage ready | Upload/download test successful |
| 2 | WireGuard VPN gateway (EC2), key generation, firewall rules | VPN endpoint ready | Tunnel handshake successful |
| 3 | Edge gateway: OS install, hardening, Docker, WireGuard client | Edge device ready | Edge connects to cloud over VPN |
| 3 | Edge services: MinIO, Redis, video capture container | Edge services running | RTSP streams reachable from edge |
| 4 | Database schema migration (29 tables), seed data (admin user, 8 cameras) | Database ready | Schema matches design; seed data present |
| 4 | Monitoring: Prometheus, Grafana, CloudWatch dashboards | Monitoring active | Dashboards accessible; metrics flowing |
| 4 | End-to-end connectivity test | Full pipeline verified | Video from DVR → Edge → Cloud (VPN) → S3 |
Milestone M1 — Infrastructure Ready (End of Week 4):
- All cloud services deployed and healthy
- VPN tunnel established and stable (< 100ms latency)
- Edge gateway online, all Docker services running
- Database schema deployed with migrations and seed data
- All 8 camera streams reachable from edge
- Basic monitoring and alerting in place
14.5.2 Stage 2: Core AI Pipeline (Weeks 5-8)
Objective: Video ingestion, AI detection, face recognition, and basic API operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 5 | Video capture service: RTSP ingestion, frame extraction, segment recording | Stream ingestion working | All 8 streams connected; FPS > 5 per stream |
| 5 | Kafka topic setup, stream ingestion producer | Event streaming ready | Frames published to Kafka |
| 6 | AI Inference Service: YOLO (human detection), SCRFD (face detection) | Detection models running | mAP > 0.90 for human detection |
| 6 | Detection event storage in PostgreSQL | Detection database working | Events queryable via API |
| 7 | ArcFace (face recognition) model deployment, embedding generation, pgvector | Face recognition working | Rank-1 accuracy > 95% on test set |
| 7 | Person matching logic: known person lookup, unknown person handling | Person matching working | Correct identification in < 100ms |
| 8 | FastAPI core: health endpoints, camera endpoints, detection endpoints | API core functional | All endpoints return correct data |
| 8 | Basic authentication: login, JWT token issuance, password hashing | Auth working | Login → token → authenticated requests |
Milestone M2 — AI Pipeline Operational (End of Week 8):
- All 8 camera streams ingesting at target FPS
- Human detection, face detection, and face recognition operational
- Detection events stored and queryable
- Person matching (known/unknown) working
- Basic REST API serving authenticated requests
- End-to-end: Camera → Detection → Database → API
14.5.3 Stage 3: Application Layer (Weeks 9-12)
Objective: Web dashboard, alerting, notifications, and person management operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 9 | Next.js project setup, design system, Tailwind config, dark theme | Frontend foundation | Login page renders correctly |
| 9 | Authentication flow: login form, MFA input, token management, logout | Auth UI working | Full login → dashboard flow |
| 10 | Dashboard page: stat cards, alert chart, camera grid, activity feed | Dashboard live | All widgets populated with real data |
| 10 | Live camera view: HLS player, grid layout, fullscreen, camera controls | Live view working | All 8 streams visible, playable |
| 10 | Alert engine: rule evaluation, severity assignment, routing | Alert generation working | Alerts created within 5s of detection |
| 11 | Telegram integration: bot setup, message templates, inline keyboards | Telegram alerts working | Test alert received in Telegram |
| 11 | WhatsApp integration: template messages, session messages | WhatsApp alerts working | Test template message received |
| 11 | Person management: gallery, profile, CRUD, face matching display | Person management working | Person created, detected, viewed |
| 12 | Unknown review queue: cluster display, naming, merging, ignore | Review queue working | Unknown person processed through queue |
| 12 | Watchlists: CRUD, member management, alert routing | Watchlists working | Watchlist match triggers correct alert |
| 12 | WebSocket: real-time alert feed, dashboard updates | Real-time working | Alerts appear without page refresh |
Milestone M3 — Application Live (End of Week 12):
- Web dashboard accessible with live camera feeds
- Alerts generated and delivered via Telegram and WhatsApp
- Person management (add, view, match, review unknowns) working
- Watchlist alerts functional with correct routing
- Real-time updates via WebSocket
- All RBAC permissions enforced in UI
14.5.4 Stage 4: Intelligence (Weeks 13-16)
Objective: Night mode, training pipeline, self-learning, and advanced features.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 13 | Night mode: low-light model training, deployment, auto-scheduling | Night mode working | Detection mAP > 0.75 in < 5 lux conditions |
| 13 | AI Vibe Settings page: all 7 controls, auto-save, advanced mode | Settings page working | All controls functional, changes effective immediately |
| 14 | Training pipeline: data collection, model training job, evaluation | Training pipeline working | Model accuracy improves with new training data |
| 14 | Model versioning: A/B testing, shadow mode, promotion workflow | Model management working | New model version promoted via blue/green deployment |
| 15 | Self-learning service: automatic unknown clustering, suggestions | Self-learning working | Suggestions generated for unknown clusters |
| 15 | Privacy mode: face blurring, privacy zones, per-camera settings | Privacy mode working | Faces blurred according to settings |
| 15 | Suspicious activity detection: pattern rules, anomaly scoring | Advanced alerts working | Anomaly alerts generated for unusual behavior |
| 16 | Search service: face similarity search, text search, filters | Search working | Results returned in < 500ms |
| 16 | System health dashboard: service cards, metrics, logs viewer | Health dashboard working | All systems visible with status |
Milestone M4 — Intelligence Features Live (End of Week 16):
- Night mode detection operational
- Training pipeline runs and improves models
- Self-learning suggestions appear in review queue
- Privacy modes configurable and effective
- Suspicious activity alerts functional
- Search returns results in acceptable time
- All AI Vibe Settings controls operational
14.5.5 Stage 5: Hardening (Weeks 17-20)
Objective: Security hardening, testing framework, operations readiness, production go-live.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 17 | Security penetration test (external vendor) | Pen test report | All critical/high findings addressed |
| 17 | SAST/DAST scans, dependency vulnerability scan | Scan reports | Zero critical vulnerabilities |
| 17 | Self-test framework: 21 test suites, scheduling, reporting | Testing framework deployed | All test suites execute successfully |
| 18 | Backup configuration: pgBackRest, S3 sync, restore procedures | Backup system ready | Restore test successful |
| 18 | DR environment setup, failover procedures, quarterly drill schedule | DR ready | DR failover test: RTO < 1 hour |
| 18 | Incident response runbooks: 5 documented procedures | Runbooks complete | All scenarios documented |
| 19 | Load testing: 8/16/32/64 camera simulation | Load test report | System handles 64 cameras within SLA |
| 19 | Performance tuning: database queries, API response times, cache optimization | Tuning complete | p95 API response < 200ms |
| 19 | Operations team training: system overview, runbooks, escalation procedures | Team trained | Training sign-off complete |
| 19 | 98-item go-live checklist review | Checklist complete | All items pass |
| 20 | Final readiness review, security sign-off, management approval | Go approval | All stakeholders sign off |
| 20 | Production DNS cutover, monitoring, 72-hour stability period | Production live | 72-hour stability confirmed |
Milestone M5 — Production Go-Live (End of Week 20):
- Security audit complete with all findings addressed
- Self-test framework passing (score >= 85)
- DR tested and verified (RTO < 1 hour, RPO < 15 minutes)
- Operations team trained and runbooks reviewed
- Load test passed at 64-camera target
- 98-item go-live checklist: all items complete
- System stable in production for 72+ hours
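The M5 criteria are all measurable, so the go/no-go decision can be expressed as a checklist function. The sketch below is illustrative: the `ReadinessReport` fields and the `go_live_blockers` name are hypothetical, and "all findings addressed" is approximated as zero open critical or high findings, following the Week 17 success criterion.

```python
from dataclasses import dataclass


@dataclass
class ReadinessReport:
    open_critical_or_high_findings: int
    self_test_score: float
    rto_minutes: float
    rpo_minutes: float
    team_trained: bool
    max_cameras_load_tested: int
    checklist_items_complete: int
    stable_hours: float


def go_live_blockers(report: ReadinessReport) -> list:
    """Return the unmet M5 criteria; an empty list means go."""
    checks = [
        (report.open_critical_or_high_findings == 0, "security findings still open"),
        (report.self_test_score >= 85, "self-test score below 85"),
        (report.rto_minutes < 60, "RTO is not under 1 hour"),
        (report.rpo_minutes < 15, "RPO is not under 15 minutes"),
        (report.team_trained, "operations training not signed off"),
        (report.max_cameras_load_tested >= 64, "load test below 64 cameras"),
        (report.checklist_items_complete == 98, "go-live checklist incomplete"),
        (report.stable_hours >= 72, "less than 72 hours of stable production"),
    ]
    return [reason for passed, reason in checks if not passed]


report = ReadinessReport(0, 91.5, 45, 10, True, 64, 98, 48)
print(go_live_blockers(report))
```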
14.6 Kubernetes Manifests Overview
| Resource Type | Name | Purpose | Namespace |
|---|---|---|---|
| Deployment | `api` | FastAPI application server (3 replicas) | sentinel |
| Deployment | `ai-inference` | AI model serving (GPU node) | sentinel |
| Deployment | `video-capture` | RTSP stream ingestion (edge) | sentinel |
| Deployment | `alert-engine` | Alert generation and routing | sentinel |
| Deployment | `notification-service` | Telegram/WhatsApp delivery | sentinel |
| Deployment | `frontend` | Next.js web application | sentinel |
| Deployment | `websocket` | WebSocket real-time server | sentinel |
| StatefulSet | `redis` | Session cache and Pub/Sub | sentinel-data |
| Service | `api-service` | Internal API access (ClusterIP) | sentinel |
| Service | `ai-service` | AI inference access (ClusterIP) | sentinel |
| Service | `frontend-service` | Web app access (ClusterIP) | sentinel |
| Ingress | `sentinel-ingress` | External HTTPS routing | sentinel |
| ConfigMap | `app-config` | Application configuration | sentinel |
| ConfigMap | `nginx-config` | Ingress/Nginx configuration | sentinel |
| Secret | `app-secrets` | Encrypted secrets (Vault agent injector) | sentinel |
| Secret | `tls-cert` | TLS certificate (cert-manager) | sentinel |
| HPA | `api-hpa` | Auto-scale API: 3-10 replicas | sentinel |
| HPA | `ai-hpa` | Auto-scale AI: 1-4 replicas | sentinel |
| NetworkPolicy | `default-deny` | Block all unauthorized traffic | sentinel |
| NetworkPolicy | `allow-api` | API ingress rules | sentinel |
| NetworkPolicy | `allow-ai` | AI service communication rules | sentinel |
| PodDisruptionBudget | `api-pdb` | Ensure 2 API pods minimum | sentinel |
| ServiceMonitor | `api-metrics` | Prometheus scraping config | sentinel-monitoring |
| PrometheusRule | `alert-rules` | Alerting rules for platform | sentinel-monitoring |
14.7 VPN Setup Procedure
14.7.1 Cloud VPN Gateway Setup
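#!/bin/bash
# cloud-vpn-setup.sh: run on the cloud VPN EC2 instance
set -euo pipefail
# 1. System preparation
sudo apt update && sudo apt install -y wireguard wireguard-tools iptables-persistent
# 2. Generate WireGuard keys (only the public key is echoed to the terminal)
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
# Generate the preshared key that both peers reference as <PRESHARED_KEY>
wg genpsk | sudo tee /etc/wireguard/presharedkey > /dev/null
sudo chmod 600 /etc/wireguard/privatekey /etc/wireguard/presharedkey
# 3. Create WireGuard configuration
# Replace each <...> placeholder with the real key material before starting the tunnel.
# eth0 is assumed to be the primary interface; adjust PostUp/PostDown if it is named differently.
sudo tee /etc/wireguard/wg0.conf > /dev/null << 'EOF'
[Interface]
Address = 10.200.0.1/24
ListenPort = 51820
PrivateKey = <CLOUD_PRIVATE_KEY>
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
# Edge Gateway peer
[Peer]
PublicKey = <EDGE_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
AllowedIPs = 10.200.0.2/32, 192.168.29.0/24
PersistentKeepalive = 25
EOF
# The config holds the private key, so restrict it to root
sudo chmod 600 /etc/wireguard/wg0.conf
# 4. Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
# 5. Start WireGuard
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0
# 6. Verify (the ping succeeds only after the edge peer from 14.7.2 has connected)
sudo wg show
ping -c 3 10.200.0.2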
14.7.2 Edge VPN Client Setup
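#!/bin/bash
# edge-vpn-setup.sh: run on the Intel NUC edge gateway
set -euo pipefail
# 1. Install WireGuard
sudo apt update && sudo apt install -y wireguard wireguard-tools
# 2. Generate keys (only the public key is echoed to the terminal)
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey
# 3. Configure
# Replace each <...> placeholder; <PRESHARED_KEY> must match the value configured on the cloud gateway.
# Note: wg-quick applies the DNS line through resolvconf, so a resolvconf provider must be installed.
sudo tee /etc/wireguard/wg0.conf > /dev/null << 'EOF'
[Interface]
Address = 10.200.0.2/32
PrivateKey = <EDGE_PRIVATE_KEY>
DNS = 10.100.0.2
[Peer]
PublicKey = <CLOUD_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
Endpoint = <CLOUD_PUBLIC_IP>:51820
AllowedIPs = 10.100.0.0/16, 10.200.0.0/24
PersistentKeepalive = 25
EOF
# The config holds the private key, so restrict it to root
sudo chmod 600 /etc/wireguard/wg0.conf
# 4. Start and enable
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0
# 5. Verify connectivity
ping -c 3 10.200.0.1 # Cloud VPN gateway
ping -c 3 10.100.0.2 # Cloud DNS/internal service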
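The routing implied by the two AllowedIPs lists can be sanity-checked offline before touching either host. The following is a minimal sketch using Python's standard `ipaddress` module; the constant names and the `routed_via_tunnel` helper are illustrative and not part of the deployment scripts.

```python
import ipaddress
from typing import Sequence

# AllowedIPs copied from the two wg0.conf files above.
EDGE_ALLOWED_IPS = ["10.100.0.0/16", "10.200.0.0/24"]     # edge -> cloud peer
CLOUD_ALLOWED_IPS = ["10.200.0.2/32", "192.168.29.0/24"]  # cloud -> edge peer


def routed_via_tunnel(destination: str, allowed_ips: Sequence[str]) -> bool:
    """Return True if the destination falls inside any AllowedIPs network."""
    addr = ipaddress.ip_address(destination)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ips)


if __name__ == "__main__":
    # Addresses pinged in the edge verification step.
    for target in ("10.200.0.1", "10.100.0.2"):
        print(target, routed_via_tunnel(target, EDGE_ALLOWED_IPS))
    # DVR address from the document header, as seen from the cloud side.
    print("192.168.29.200", routed_via_tunnel("192.168.29.200", CLOUD_ALLOWED_IPS))
```

A check like this only confirms that WireGuard would accept and route the traffic; reaching the DVR subnet from the cloud additionally depends on forwarding being enabled on the edge gateway, which the script above does not configure.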
14.8 Database Initialization
14.8.1 Migration Strategy
Database migrations are managed with Alembic (SQLAlchemy) and executed as Kubernetes init containers before application startup:
initContainers:
  - name: db-migrations
    image: sentinel/surveillance-api:v2.3.1
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
Migration Rules:
| Rule | Implementation |
|---|---|
| Backward compatibility | All migrations must be backward-compatible within a release |
| Destructive changes | 2-phase deployment: add new column in release N, drop old in release N+1 |
| Automatic execution | Migrations run automatically before application startup via init container |
| Health check | Migration status exposed via /health/ready endpoint |
| Rollback | alembic downgrade script available for emergency rollback |
| Version tracking | alembic_version table tracks current schema version |
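The health-check and version-tracking rules can be combined into a small readiness probe. The sketch below assumes the probe compares the row in `alembic_version` (column `version_num`, as Alembic creates it) with the head revision the image was built against. SQLite stands in for PostgreSQL so the example is self-contained, and the revision ID and the `migration_ready` name are made up for illustration.

```python
import sqlite3


def migration_ready(conn: sqlite3.Connection, expected_head: str) -> bool:
    """Return True when alembic_version holds exactly the expected head revision."""
    try:
        rows = conn.execute("SELECT version_num FROM alembic_version").fetchall()
    except sqlite3.OperationalError:
        # Table missing: migrations have never run against this database.
        return False
    return [row[0] for row in rows] == [expected_head]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migration_ready(conn, "a1b2c3d4e5f6"))  # table does not exist yet
    conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
    conn.execute("INSERT INTO alembic_version VALUES ('a1b2c3d4e5f6')")
    print(migration_ready(conn, "a1b2c3d4e5f6"))  # schema is at the expected head
```

A `/health/ready` handler could return a non-200 status while this check is false, which keeps traffic away from pods whose schema is behind the code.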
14.9 SSL Certificate Setup
14.9.1 cert-manager Configuration
# ClusterIssuer for Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}
---
# Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sentinel-tls
  namespace: sentinel
spec:
  secretName: sentinel-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.example.com
    - api.example.com
    - ws.example.com
  usages:
    - digital signature
    - key encipherment
  privateKey:
    algorithm: ECDSA
    size: 256
Section 15: Testing Plan
15.1 Testing Strategy Overview
The testing strategy encompasses five levels of testing, from isolated unit tests to full system end-to-end validation. The goal is comprehensive coverage of all functional and non-functional requirements with automated execution in CI/CD.
┌──────────────────────────────────────────────────────────────────────────────┐
│                               TESTING PYRAMID                                │
│                                                                              │
│             ┌─────────┐                                                      │
│             │   E2E   │            ~20 tests                                 │
│             │  Tests  │            Full system scenarios                     │
│           ┌─┴─────────┴─┐                                                    │
│           │ Integration │          ~100 tests                                │
│           │    Tests    │          Service-to-service                        │
│        ┌──┴─────────────┴──┐                                                 │
│        │    Unit Tests     │       ~300 tests                                │
│        │ (Components, AI)  │       Isolated functions                        │
│        └───────────────────┘                                                 │
└──────────────────────────────────────────────────────────────────────────────┘
15.2 Unit Testing Strategy
| Component | Framework | Coverage Target | Mock Strategy | CI Execution |
|---|---|---|---|---|
| API backend (Python) | pytest + pytest-asyncio | 85%+ | pytest-mock, moto (AWS), responses (HTTP) | Every commit |
| Frontend (React/TS) | Vitest + React Testing Library | 80%+ | MSW (API mocking), jsdom | Every commit |
| AI models (Python) | pytest | 70%+ (model logic) | Mock inference engine, fixture data | Every commit |
| Database models | pytest + asyncpg | 80%+ | testcontainers-postgres | Every commit |
| Notification adapters | pytest | 80%+ | responses library for HTTP mocking | Every commit |
15.3 Integration Testing
| Integration Pair | Scope | Framework | Strategy |
|---|---|---|---|
| API + Database | CRUD operations, transactions, query performance | pytest + testcontainers | PostgreSQL container per test run |
| API + Redis | Caching, Pub/Sub, session storage | pytest + Redis container | Redis container per test run |
| API + S3/MinIO | Media upload, download, presigned URLs | pytest + LocalStack | S3 mock via LocalStack |
| Alert Engine + Router | Rule evaluation, routing decisions | pytest | Mock channel adapters |
| Telegram Adapter | Message formatting, API calls, error handling | pytest + responses | HTTP request/response mocking |
| WhatsApp Adapter | Template rendering, API calls, error handling | pytest + responses | HTTP request/response mocking |
| Auth + Database | User CRUD, password hashing, session management | pytest + testcontainers | Full auth flow testing |
15.4 System Testing (End-to-End)
| # | Scenario | Steps | Expected Result |
|---|---|---|---|
| 1 | Full detection pipeline | Trigger motion → verify detection stored → verify alert created → verify notification sent | All components process correctly within SLA |
| 2 | Person recognition flow | Known person walks by → verify face detected → verify identity matched → verify no false alert | Correct person identified with > 95% confidence |
| 3 | Unknown person flow | Unknown person detected → verify "Unknown" classification → verify review queue updated | Unknown queued for operator review within 5 seconds |
| 4 | Watchlist alert (blacklist) | Blacklist person detected → verify immediate critical alert → verify notification to security team | Alert within 5 seconds, correct severity, all channels |
| 5 | Night mode detection | Low-light detection scenario → verify night model used → verify detection confidence acceptable | Detection mAP > 0.75 in < 5 lux conditions |
| 6 | Privacy mode | Enable privacy mode → verify face blurring in live view → verify no face recognition occurs | Faces blurred, no biometric processing |
| 7 | Alert escalation | Create critical alert → don't acknowledge → verify escalation levels trigger at correct times | Level 1 at 5min, Level 2 at 10min, Level 3 at 20min |
| 8 | VPN failure recovery | Disconnect VPN → verify local operation continues → reconnect VPN → verify sync resumes | No data loss; automatic recovery |
| 9 | Database failover | Trigger RDS failover → verify application continues → verify no data loss | < 60 second downtime; zero data loss |
| 10 | Complete user flow | Login → view dashboard → view live cameras → receive alert → acknowledge → logout | All pages load; all actions succeed |
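Scenario 7 fixes the escalation thresholds at 5, 10, and 20 minutes without acknowledgement. A minimal sketch of the check an end-to-end test could assert against is shown below; the function and constant names are hypothetical and the real alert engine may model escalation differently.

```python
# Minutes without acknowledgement at which each level fires (scenario 7),
# ordered from the highest level down so the first match wins.
ESCALATION_THRESHOLDS_MIN = [(20, 3), (10, 2), (5, 1)]


def escalation_level(minutes_unacknowledged: float) -> int:
    """Return the highest escalation level reached, or 0 if none has fired."""
    for threshold, level in ESCALATION_THRESHOLDS_MIN:
        if minutes_unacknowledged >= threshold:
            return level
    return 0


if __name__ == "__main__":
    for minutes in (0, 4.9, 5, 12, 20, 45):
        print(minutes, escalation_level(minutes))
```

In a test, the clock would normally be faked so the 20-minute threshold can be crossed without waiting in real time.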
15.5 Load Testing Plan
| Scenario | Camera Count | Duration | Users | Target Metrics |
|---|---|---|---|---|
| Baseline | 8 | 1 hour | 5 concurrent | Establish baseline metrics |
| Scale-up | 16 | 2 hours | 10 concurrent | Verify 2x capacity; p95 latency < 500ms |
| Scale-up | 32 | 2 hours | 20 concurrent | Verify 4x capacity; auto-scaling triggers |
| Stress test | 64 | 1 hour | 50 concurrent | Find breaking point; error rate < 1% |
| Sustained | 8 | 24 hours | 5 concurrent | Memory leak detection; stability verification |
| Spike test | 8→64→8 | 30 minutes | Ramp up/down | Verify auto-scaling response time |
15.6 Failover Testing
| Test Case | Description | Pass Criteria |
|---|---|---|
| API pod failure | Kill 1 API pod | Traffic routed to healthy pods; zero failed requests |
| Database failover | Trigger RDS Multi-AZ failover | < 60s downtime; no data loss; connections re-established |
| Redis failure | Restart Redis cluster | Session recovery; cache warm within 5 minutes |
| VPN tunnel failure | Disconnect WireGuard | Auto-reconnect within 30s; streams resume |
| Edge gateway restart | Reboot edge device | Full recovery within 5 minutes; all streams reconnect |
| AI inference failure | Kill inference container | Queue buffers frames; recovery < 30s; no frame loss |
| Complete cloud failure | Simulate region outage | DR test: RTO < 1 hour; RPO < 15 minutes |
15.7 Security Testing
| Test Type | Tool | Scope | Frequency | Gate |
|---|---|---|---|---|
| Static Analysis (SAST) | Bandit, Semgrep | Source code | Every commit | Block on HIGH/CRITICAL |
| Dependency Scan | Snyk, pip-audit | All dependencies | Daily | Block on HIGH/CRITICAL |
| Container Image Scan | Trivy | Docker images | Every build | Block on HIGH/CRITICAL |
| Dynamic Analysis (DAST) | OWASP ZAP | Running application | Weekly | Review findings |
| Penetration Test | External vendor | Full stack | Quarterly | All findings addressed |
| TLS Configuration | testssl.sh | SSL/TLS endpoints | Monthly | Grade A+ required |
| API Security | OWASP ZAP API scan | All REST endpoints | Weekly | Review findings |
| Secrets Scan | TruffleHog, GitLeaks | Git repositories | Every commit | Block on findings |
15.8 AI Pipeline Testing
| Test | Description | Target Metric | Test Data |
|---|---|---|---|
| Human detection accuracy | Evaluate YOLO on held-out test set | mAP > 0.90 | 1000 labeled frames |
| Face detection accuracy | Evaluate SCRFD on test set | Detection rate > 0.85 | 500 labeled face images |
| Face recognition accuracy | Evaluate ArcFace on test set | Rank-1 accuracy > 0.95 | 200 person gallery |
| False positive rate | Measure incorrect person matches | < 2% | Simulated impostor set |
| False negative rate | Measure missed person matches | < 5% | Known person test set |
| Inference latency | Measure end-to-end processing | < 200ms per frame (p95) | Benchmark suite |
| Night mode accuracy | Test low-light detection | mAP > 0.75 | 200 low-light frames |
| Batch processing | Test throughput at batch size 8 | > 40 FPS aggregate | Benchmark suite |
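The false positive and false negative targets in the table reduce to two ratios over the impostor set and the known-person set. The sketch below shows one way to compute them and apply the gates, assuming match outcomes have already been counted; the `MatchCounts` structure and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchCounts:
    true_accepts: int   # known person matched to the correct identity
    false_rejects: int  # known person missed
    false_accepts: int  # impostor matched to someone in the gallery
    true_rejects: int   # impostor correctly left unmatched


def false_positive_rate(c: MatchCounts) -> float:
    """Share of impostor trials that produced an incorrect match."""
    trials = c.false_accepts + c.true_rejects
    if trials == 0:
        raise ValueError("impostor set is empty")
    return c.false_accepts / trials


def false_negative_rate(c: MatchCounts) -> float:
    """Share of known-person trials where the match was missed."""
    trials = c.false_rejects + c.true_accepts
    if trials == 0:
        raise ValueError("known-person set is empty")
    return c.false_rejects / trials


def passes_gates(c: MatchCounts, max_fpr: float = 0.02, max_fnr: float = 0.05) -> bool:
    """Apply the < 2% false positive and < 5% false negative targets."""
    return false_positive_rate(c) < max_fpr and false_negative_rate(c) < max_fnr


if __name__ == "__main__":
    counts = MatchCounts(true_accepts=960, false_rejects=40,
                         false_accepts=5, true_rejects=495)
    print(false_positive_rate(counts), false_negative_rate(counts), passes_gates(counts))
```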
15.9 Notification Testing
| Test | Description | Verification |
|---|---|---|
| Telegram delivery | Send test alert via Telegram | Message received; formatting correct; buttons functional |
| WhatsApp delivery | Send test alert via WhatsApp | Template message received; parameters correct |
| Routing rules | Trigger alert matching specific rule | Delivered to correct recipients only |
| Quiet hours | Send alert during quiet hours | Non-critical suppressed; critical bypasses |
| Escalation | Leave critical alert unacknowledged | Escalation notifications at correct thresholds |
| Rate limiting | Trigger burst of 50 alerts | Rate limiting applied; no provider blocks |
| Media attachments | Send alert with image + video | Media processed to correct size; delivered |
| Delivery tracking | Verify webhook receipts | Status updated correctly in dashboard |
| DLQ handling | Force 5 failed deliveries | Messages moved to DLQ; admin notification sent |
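The quiet-hours row implies a simple rule: critical alerts are always delivered, and everything else is held while the quiet window is active. A minimal sketch follows, assuming a 22:00 to 06:00 window (the document does not specify the actual hours) and a severity label of `critical`; the function names are hypothetical.

```python
from datetime import time


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """Return True if `now` is inside [start, end), handling windows past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight


def should_deliver(severity: str, now: time,
                   start: time = time(22, 0), end: time = time(6, 0)) -> bool:
    """Critical alerts bypass quiet hours; other severities are suppressed."""
    return severity == "critical" or not in_quiet_hours(now, start, end)


if __name__ == "__main__":
    print(should_deliver("critical", time(23, 30)))
    print(should_deliver("medium", time(23, 30)))
    print(should_deliver("medium", time(12, 0)))
```

The wrap-around branch is the part worth covering in the notification test, since a window that crosses midnight is the common configuration.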
15.10 Test Environments
| Environment | Data | Purpose | Pipeline Stage |
|---|---|---|---|
| Local dev | Synthetic (10 cameras, 100 persons) | Developer testing | Pre-commit |
| CI | Synthetic (generated per run) | Automated test execution | Every commit |
| Staging | Anonymized production-like (8 cameras, 500 persons) | Pre-production validation | Post-merge |
| Load test | Generated (64 cameras, 10,000 persons) | Performance testing | Weekly schedule |
| DR | Minimal (2 cameras, 10 persons) | Disaster recovery validation | Quarterly |
15.11 CI/CD Pipeline for Testing
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Push   │──▶│   Lint   │──▶│   Unit   │──▶│   SAST   │──▶│  Build   │
│         │   │ + Format │   │  Tests   │   │  + Scan  │   │  Images  │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
                                                                │
                                                                ▼
┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│ Deploy  │◀──│   E2E    │◀──│   DAST   │◀──│  Image   │◀──│   Push   │
│ Staging │   │  Tests   │   │   Scan   │   │   Scan   │   │ Registry │
└─────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
     │
     ▼
┌─────────┐
│ Deploy  │  (Manual approval required)
│  Prod   │
└─────────┘
| Stage | Tools | Coverage Gate | Duration |
|---|---|---|---|
| Lint + Format | ruff, black, mypy, ESLint, Prettier | Zero lint errors | 30s |
| Unit Tests | pytest, Vitest | 80%+ coverage | 3 min |
| SAST + Secrets | Bandit, Semgrep, TruffleHog | No HIGH/CRITICAL | 2 min |
| Build | Docker buildx | Build succeeds | 5 min |
| Image Scan | Trivy, Snyk | No HIGH/CRITICAL CVEs | 2 min |
| DAST | OWASP ZAP | No HIGH/CRITICAL findings | 10 min |
| E2E Tests | Playwright, pytest | All scenarios pass | 8 min |
| Deploy Staging | ArgoCD | Health checks pass | 3 min |
Section 16: Self-Test Framework
16.1 Framework Architecture
The Self-Test Framework is a standalone FastAPI service that continuously validates platform health and readiness through automated test execution.
┌──────────────────────────────────────────────────────────────────────────────┐
│ SELF-TEST FRAMEWORK ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TEST ORCHESTRATOR (FastAPI) │ │
│ │ │ │
│ │ Scheduler Queue Executor Aggregator │ │
│ │ (cron/APScheduler) │ (asyncio) │ │ │
│ │ │ │ │ │ │ │
│ │ 15m health ◄─────┼────────────┼───────────────┤ │ │
│ │ Daily 3am ◄─────┼────────────┼───────────────┤ │ │
│ │ On-demand ◄─────┼────────────┼───────────────┤ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Reporter + Storage │ │ │
│ │ │ PostgreSQL + S3 (evidence) │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 21 TEST SUITES (170+ CASES) │ │
│ │ │ │
│ │ Infrastructure (TC-01..04) │ Streaming (TC-05..06) │ │
│ │ Core AI (TC-07..10) │ Alerts (TC-11..13) │ │
│ │ Capture (TC-14..15) │ Search (TC-16) │ │
│ │ Training (TC-17) │ Security (TC-18..19) │ │
│ │ Resilience (TC-20..21) │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
16.2 Test Suite Catalog (21 Suites)
| Suite ID | Name | Tests | Priority | Description |
|---|---|---|---|---|
| TC-INF-01 | DVR Connectivity | 8 | P0 | RTSP handshake, stream access, credential validation |
| TC-INF-02 | VPN Health | 6 | P0 | Tunnel status, latency, packet loss, throughput |
| TC-INF-03 | Database Health | 8 | P0 | Connection pool, query performance, replication lag |
| TC-INF-04 | Storage Health | 7 | P0 | Disk space, read/write performance, object storage |
| TC-STR-05 | Camera Stream Access | 10 | P0 | All 8 channels streaming, FPS, bitrate verification |
| TC-STR-06 | Live Streaming | 6 | P1 | HLS stream delivery to browsers, latency check |
| TC-AI-07 | Human Detection | 12 | P0 | YOLO accuracy, confidence thresholds, edge cases |
| TC-AI-08 | Face Detection | 10 | P0 | SCRFD accuracy, face bounding box quality |
| TC-AI-09 | Face Recognition | 12 | P0 | ArcFace embeddings, person matching accuracy |
| TC-AI-10 | Unknown Clustering | 8 | P1 | Face grouping quality, similarity thresholds |
| TC-ALT-11 | Alert Generation | 10 | P0 | Rule evaluation, severity assignment, routing |
| TC-ALT-12 | Telegram Delivery | 8 | P1 | Message delivery, formatting, media, error handling |
| TC-ALT-13 | WhatsApp Delivery | 8 | P1 | Template delivery, session messages, error handling |
| TC-CAP-14 | Image Capture | 6 | P1 | Frame extraction quality, storage, metadata |
| TC-CAP-15 | Video Clip Capture | 6 | P1 | Clip generation, compression, storage |
| TC-SEA-16 | Search Retrieval | 8 | P1 | Face search accuracy, text search, performance |
| TC-TRA-17 | Training Workflow | 8 | P2 | Model retraining, evaluation, deployment |
| TC-SEC-18 | Admin Login Security | 10 | P0 | Auth flow, MFA, session management, brute force |
| TC-SEC-19 | RBAC Enforcement | 12 | P0 | Permission checks, role-based access, resource-level |
| TC-RES-20 | Restart Recovery | 8 | P1 | Service restart, state recovery, data integrity |
| TC-RES-21 | Load Handling | 7 | P1 | 8/16/32/64 camera simulation, throughput |
Total: 21 suites, 170 test cases
16.3 Test Scheduling
| Schedule | Suites | Trigger | Notification |
|---|---|---|---|
| Every 15 minutes | Infrastructure (TC-01..04) | APScheduler cron | Alert on failure |
| Daily at 03:00 UTC | All 21 suites | APScheduler cron | Full report via email + Slack |
| On-demand | Any subset | Admin API call | Immediate report |
| Post-deployment | Critical path (TC-01,05,07,11,18) | CI/CD webhook | Pipeline gate |
| Weekly (Sunday 04:00) | Full suite + extended load tests | APScheduler cron | Weekly report |
16.4 Production Readiness Scoring
Base Score: 100.0
Deductions:
  P0 failure: -20.0 points each
  P1 failure: -10.0 points each
  P2 failure: -5.0 points each
  P3 failure: -2.0 points each
Minimum score: 0.0
Maximum score: 100.0
| Verdict | Score Range | Meaning | Recommended Action |
|---|---|---|---|
| GO | 95.0 - 100.0 | All critical systems healthy | Proceed with confidence |
| GO WITH CAVEATS | 85.0 - 94.9 | Minor issues, non-critical | Proceed with monitoring plan |
| CONDITIONAL GO | 70.0 - 84.9 | Significant issues | Fix P1 issues before deployment |
| NO-GO | 0.0 - 69.9 | Critical failures | Do not deploy; address P0 issues first |
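The deduction rules and verdict bands above translate directly into code. The sketch below applies a plain per-failure deduction and clamps the result; any additional weighting a real run might use is not modeled, and the function names are illustrative.

```python
from typing import Iterable

# Deduction per failed test case, keyed by priority (values from 16.4).
DEDUCTIONS = {"P0": 20.0, "P1": 10.0, "P2": 5.0, "P3": 2.0}

# Verdict bands as (lower bound, label), checked from highest to lowest.
VERDICT_BANDS = [
    (95.0, "GO"),
    (85.0, "GO WITH CAVEATS"),
    (70.0, "CONDITIONAL GO"),
    (0.0, "NO-GO"),
]


def readiness_score(failed_priorities: Iterable[str]) -> float:
    """Start at 100 and subtract each failure's deduction, clamped to [0, 100]."""
    score = 100.0 - sum(DEDUCTIONS[p] for p in failed_priorities)
    return max(0.0, min(100.0, score))


def verdict(score: float) -> str:
    """Map a score to the verdict band that contains it."""
    for lower_bound, label in VERDICT_BANDS:
        if score >= lower_bound:
            return label
    return "NO-GO"


if __name__ == "__main__":
    for failures in ([], ["P3", "P3"], ["P2", "P3"], ["P0"], ["P0", "P0"]):
        score = readiness_score(failures)
        print(failures, score, verdict(score))
```

Under these rules a single P0 failure drops the score to 80.0, which already falls below the GO WITH CAVEATS band.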
16.5 Report Generation
| Format | Use Case | Generation | Retention |
|---|---|---|---|
| JSON API | Programmatic consumption, CI/CD integration | Immediate | 90 days |
| HTML Dashboard | Web-based viewing, trend analysis | ~5 seconds | 90 days |
| PDF Report | Email distribution, compliance archiving | ~30 seconds | 1 year |
Section 17: Sample Self-Test Report
17.1 Report Header
================================================================================
SENTINEL AI SURVEILLANCE PLATFORM — SELF-TEST REPORT
================================================================================
Report ID: STR-20250116-030015
Generated: 2025-01-16 03:00:15 UTC
Environment: production
Version: v2.3.1
Triggered By: Scheduled (Daily 3:00 AM)
Duration: 18 minutes 42 seconds
Overall Status: GO WITH CAVEATS
17.2 Executive Summary
| Metric | Value |
|---|---|
| Verdict | GO WITH CAVEATS |
| Production Readiness Score | 94.8 / 100 |
| Total Test Cases | 170 |
| Passed | 168 (98.8%) |
| Failed | 2 (1.2%) |
| Skipped | 0 (0.0%) |
| Previous Run Score | 97.2 / 100 |
| Score Change | -2.4 (downward) |
Priority Breakdown:
| Priority | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| P0 (Critical) | 42 | 42 | 0 | 100.0% |
| P1 (High) | 70 | 68 | 2 | 97.1% |
| P2 (Medium) | 38 | 38 | 0 | 100.0% |
| P3 (Low) | 20 | 20 | 0 | 100.0% |
17.3 System Metrics at Test Time
| Metric | Value | Status |
|---|---|---|
| Active Cameras | 8 / 8 | Online |
| Stream FPS (avg) | 28.5 | Normal |
| AI Inference Latency (p95) | 42ms | Normal |
| Detection Rate (last hour) | 47 events | Normal |
| Database Connections | 18 / 100 | Healthy |
| Storage Usage | 67% | Healthy |
| VPN Latency | 12ms | Excellent |
| API Response Time (p95) | 78ms | Normal |
| Telegram Delivery Rate (24h) | 99.2% | Healthy |
| WhatsApp Delivery Rate (24h) | 99.8% | Healthy |
17.4 Failed Test Cases
Failure 1: TC-ALT-12-004 — Telegram Media Group Delivery
| Field | Value |
|---|---|
| Test Case | TC-ALT-12-004 |
| Suite | Telegram Delivery (TC-ALT-12) |
| Priority | P1 |
| Status | FAILED |
| Duration | 12,450 ms |
| Severity | Medium |
Description: Verify that media group (multiple images) is delivered correctly via Telegram when an alert contains multiple evidence images.
Expected Result: All 3 images delivered as a media group album within 10 seconds.
Actual Result: Only 2 of 3 images delivered. Third image failed with error: telegram_api_error: Request Entity Too Large (413). Image size after processing: 10.8 MB (exceeds Telegram's 10 MB per-image limit for media groups).
Root Cause: The media processing pipeline resizes images to 1280x720 but does not enforce a hard 10 MB per-image cap for Telegram media groups. The iterative quality reduction loop stops at quality 50 but can still produce files > 10 MB.
Recommended Fix: Add a hard size cap check after image processing. If image exceeds 10 MB after quality reduction to 50%, apply additional compression (reduce dimensions or use WebP format).
Workaround: Single-image delivery mode works correctly. Multi-image alerts temporarily deliver images individually.
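The recommended hard cap can be expressed as a loop that tries lower quality first and then smaller dimensions until the payload fits. This is a sketch under stated assumptions: `encode` is a hypothetical callable supplied by the media pipeline (it would wrap whatever imaging library is in use), the quality and scale steps are illustrative, and a WebP fallback is not shown.

```python
from typing import Callable, Optional, Sequence, Tuple

TELEGRAM_MAX_BYTES = 10 * 1024 * 1024  # 10 MB per-image limit cited in the failure

# encode(quality, scale) -> encoded image bytes; provided by the caller.
Encoder = Callable[[int, float], bytes]


def fit_under_cap(
    encode: Encoder,
    max_bytes: int = TELEGRAM_MAX_BYTES,
    qualities: Sequence[int] = (85, 70, 50),
    scales: Sequence[float] = (1.0, 0.75, 0.5),
) -> Optional[Tuple[bytes, int, float]]:
    """Return (data, quality, scale) for the first combination under the cap.

    Quality is reduced first at each scale; dimensions shrink only when the
    lowest quality still exceeds the cap. Returns None if nothing fits.
    """
    for scale in scales:
        for quality in qualities:
            data = encode(quality, scale)
            if len(data) <= max_bytes:
                return data, quality, scale
    return None


if __name__ == "__main__":
    # Stand-in encoder: size shrinks with quality and with the square of scale.
    def fake_encode(quality: int, scale: float) -> bytes:
        return bytes(int(300 * scale * scale * quality / 100))

    result = fit_under_cap(fake_encode, max_bytes=100)
    if result is not None:
        data, quality, scale = result
        print(len(data), quality, scale)
```

Returning None rather than an oversized payload lets the caller decide between dropping the image from the media group and sending it individually, as the current workaround does.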
Failure 2: TC-RES-20-006 — AI Inference Recovery After Simulated Crash
| Field | Value |
|---|---|
| Test Case | TC-RES-20-006 |
| Suite | Restart Recovery (TC-RES-20) |
| Priority | P1 |
| Status | FAILED |
| Duration | 65,200 ms |
| Severity | Medium |
Description: Verify that the AI inference service recovers and resumes processing within 60 seconds after a simulated process crash.
Expected Result: AI inference pod restarts and resumes processing frames within 60 seconds for all 8 cameras.
Actual Result: Pod restarted successfully (18 seconds), but detection did not resume for Camera 3 and Camera 7. The other 6 cameras resumed within 45 seconds.
Root Cause: Model warm-up failed because of a race condition in GPU memory allocation during concurrent channel initialization. All 8 channel processors attempt to load the face recognition model simultaneously; on resource-constrained edge hardware, this causes an out-of-memory (OOM) failure for the channels that lose the initialization race.
Recommended Fix: Implement shared model loading — load each model once and share across all channel processors. Add initialization semaphore.
Workaround: Manual restart of affected channel processors via admin API.
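The recommended fix combines two ideas: load each model once and share the instance, and bound how many loads run at the same time. A minimal thread-based sketch is shown below; the `SharedModelRegistry` class is hypothetical, `loader` stands in for whatever actually places the model on the GPU, and the real service may use a different concurrency model.

```python
import threading
from typing import Any, Callable, Dict


class SharedModelRegistry:
    """Load each model at most once and hand the same instance to every channel."""

    def __init__(self, max_concurrent_loads: int = 1) -> None:
        self._models: Dict[str, Any] = {}
        self._name_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()
        # Initialization semaphore: caps simultaneous loads across all models.
        self._load_slots = threading.Semaphore(max_concurrent_loads)

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        with self._registry_lock:
            name_lock = self._name_locks.setdefault(name, threading.Lock())
        with name_lock:  # only one thread may load a given model
            if name not in self._models:
                with self._load_slots:
                    self._models[name] = loader()
            return self._models[name]


if __name__ == "__main__":
    load_calls = []

    def load_face_model() -> object:
        load_calls.append(1)  # record each real load
        return object()

    registry = SharedModelRegistry()
    results = []
    threads = [
        threading.Thread(
            target=lambda: results.append(registry.get("face_recognition", load_face_model))
        )
        for _ in range(8)  # one thread per channel processor
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(load_calls), all(r is results[0] for r in results))
```

If a load raises, nothing is cached, so the next channel that asks for the model retries instead of inheriting a failed state.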
17.5 Trending (Last 14 Days)
| Date | Score | Verdict | Notes |
|---|---|---|---|
| 2025-01-02 | 96.5 | GO | — |
| 2025-01-03 | 98.2 | GO | — |
| 2025-01-04 | 97.1 | GO | — |
| 2025-01-05 | 98.8 | GO | — |
| 2025-01-06 | 97.5 | GO | — |
| 2025-01-07 | 98.2 | GO | — |
| 2025-01-08 | 96.8 | GO | TC-RES-21 had 1 P3 failure |
| 2025-01-09 | 97.2 | GO | — |
| 2025-01-10 | 98.2 | GO | — |
| 2025-01-11 | 97.5 | GO | — |
| 2025-01-12 | 98.2 | GO | — |
| 2025-01-13 | 97.2 | GO | — |
| 2025-01-14 | 98.2 | GO | — |
| 2025-01-15 | 97.2 | GO | — |
| 2025-01-16 | 94.8 | GO WITH CAVEATS | 2 P1 failures (see above) |
17.6 Conclusion and Recommendations
Verdict: GO WITH CAVEATS
The Sentinel AI Surveillance Platform is operational and safe to use. All 42 P0 (Critical) test cases passed, confirming that core surveillance functions are working correctly.
Two P1 (High) priority issues were identified with documented workarounds. Both fixes are scheduled for v2.3.2.
Recommended Actions:
- Address TC-ALT-12-004: Add aggressive compression for Telegram media group images
- Address TC-RES-20-006: Implement shared model loading in AI inference service
- Monitor Telegram multi-image alert delivery metrics (workaround active)
- Monitor AI inference recovery metrics (manual restart documented in runbook)
- Validate both fixes in next daily test run after v2.3.2 deployment
Section 18: Risks and Mitigations
18.1 Risk Register Summary
| # | Category | Risk | Likelihood | Impact | Score | Mitigation | Owner |
|---|---|---|---|---|---|---|---|
| T1 | Technical | DVR disk full (0 bytes free) | High | Critical | 20 | Auto-rotation at 85%; emergency cleanup; secondary storage | Platform |
| T2 | Technical | AI false positives in low light | Medium | High | 12 | Night models; adjustable thresholds; operator review | AI Team |
| T3 | Technical | Face rec accuracy with masks/angles | Medium | Medium | 9 | Multi-angle training; pose normalization | AI Team |
| T4 | Technical | VPN tunnel instability | Medium | High | 12 | Auto-reconnect; local buffering; redundant endpoints | Platform |
| T5 | Technical | DB performance at scale | Medium | Medium | 9 | Partitioning; read replicas; archiving | Platform |
| O1 | Operational | Edge hardware failure | Medium | Critical | 15 | Cold spare; config backup; documented replacement | Operations |
| O2 | Operational | Internet loss at edge site | Medium | High | 12 | Local storage buffer; 4G failover; local AI continues | Operations |
| O3 | Operational | Operator training gaps | Medium | Medium | 9 | Training program; inline help; escalation procedures | Operations |
| O4 | Operational | Alert fatigue | Medium | High | 12 | Escalation rules; alert grouping; severity routing | Operations |
| S1 | Security | Biometric data breach | Low | Critical | 10 | AES-256-GCM; signed URLs; GDPR deletion; audit | Security |
| S2 | Security | Unauthorized feed access | Low | Critical | 10 | RBAC; JWT; MFA; session binding; rate limiting | Security |
| S3 | Security | Bot token compromise | Low | High | 8 | Vault encryption; 180-day rotation; IP allowlist | Security |
| A1 | AI/ML | Model drift over time | Medium | High | 12 | Monthly evaluation; auto-monitoring; retraining | AI Team |
| A2 | AI/ML | Training data poisoning | Low | Critical | 10 | Validation; multi-person review; audit trail | AI Team |
| A3 | AI/ML | Demographic bias | Medium | High | 12 | Diverse data; fairness audits; human-in-loop | AI Team |
| A4 | AI/ML | Edge hardware insufficient | Medium | High | 12 | CPU models; cloud offloading; GPU upgrade path | AI Team |
| I1 | Integration | DVR firmware incompatibility | Medium | High | 12 | RTSP compliance check; firmware validation | Engineering |
| C1 | Compliance | GDPR non-compliance | Low | Critical | 10 | PIA; consent mgmt; right to deletion; DPO | DPO |
| R1 | Resource | Budget overrun | Medium | Medium | 9 | Reserved instances; cost monitoring; quotas | Finance |
| R3 | Resource | Timeline delay | Medium | High | 12 | Phased delivery; parallel work; weekly tracking | PMO |
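The Score column appears consistent with a likelihood times impact product on an ordinal scale. The mapping below (Low = 2, Medium = 3, High = 4, Critical = 5) is inferred from the rows of the register and is not stated in the source, so treat it as an assumption; the helper name is illustrative.

```python
# Ordinal levels inferred from the register rows (assumption, not from the source).
LEVEL = {"Low": 2, "Medium": 3, "High": 4, "Critical": 5}


def risk_score(likelihood: str, impact: str) -> int:
    """Multiply the ordinal likelihood and impact levels."""
    return LEVEL[likelihood] * LEVEL[impact]


if __name__ == "__main__":
    # (risk id, likelihood, impact, score listed in the register)
    rows = [
        ("T1", "High", "Critical", 20),
        ("T2", "Medium", "High", 12),
        ("T3", "Medium", "Medium", 9),
        ("O1", "Medium", "Critical", 15),
        ("S1", "Low", "Critical", 10),
        ("S3", "Low", "High", 8),
    ]
    for risk_id, likelihood, impact, listed in rows:
        print(risk_id, risk_score(likelihood, impact), listed)
```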
18.2 Critical Risks Requiring Immediate Action
T1 — DVR Disk Full (Score: 20)
- Action: Emergency disk cleanup within 24 hours
- Implement automatic rotation at 85% capacity
- Configure critical alerts at 90%, 95%, 98%
- Owner: Platform Team | Due: 2025-01-17
O1 — Edge Hardware Failure (Score: 15)
- Action: Procure cold spare device
- Document hardware replacement runbook
- Automate configuration restoration from GitOps
- Owner: Operations Team | Due: 2025-02-01
Section 19: Final Implementation Roadmap
19.1 Five-Phase Implementation (20 Weeks)
| Phase | Weeks | Name | Theme | Key Milestone |
|---|---|---|---|---|
| 1 | 1-4 | Foundation | Infrastructure, VPN, edge, database | M1: Infrastructure Ready |
| 2 | 5-8 | Core AI Pipeline | Video ingestion, detection, recognition | M2: AI Pipeline Operational |
| 3 | 9-12 | Application Layer | Dashboard, alerts, notifications | M3: Application Live |
| 4 | 13-16 | Intelligence | Night mode, training, self-learning | M4: Intelligence Features |
| 5 | 17-20 | Hardening | Security, testing, operations, go-live | M5: Production Go-Live |
19.2 Key Milestones and Deliverables
| Milestone | Target Week | Deliverables | Entry Criteria | Exit Criteria |
|---|---|---|---|---|
| M1 Infrastructure | Week 4 | Cloud services, VPN, edge gateway, database, monitoring | Project kickoff, hardware delivered | All services healthy, VPN stable, schema deployed |
| M2 AI Pipeline | Week 8 | Video capture, YOLO, SCRFD, ArcFace, detection DB, API | M1 complete, models ready | All 8 streams ingesting, AI accuracy targets met, API functional |
| M3 Application | Week 12 | Dashboard, alerts, Telegram, WhatsApp, person mgmt, WebSocket | M2 complete, frontend env ready | Dashboard live, alerts delivered, person management working |
| M4 Intelligence | Week 16 | Night mode, training pipeline, self-learning, privacy, search | M3 complete, training data accumulated | All intelligence features operational |
| M5 Go-Live | Week 20 | Security audit, test framework, DR, runbooks, load test, checklist | M4 complete, security audit scheduled | All audits passed, checklist complete, 72h stability |
19.3 Phase Details
Phase 1 (Weeks 1-4): VPC, EKS, RDS, Redis, Kafka, S3, WireGuard VPN, edge gateway OS hardening, Docker setup, database schema with migrations, monitoring stack (Prometheus, Grafana).
Phase 2 (Weeks 5-8): RTSP capture service, YOLO human detection, SCRFD face detection, ArcFace face recognition, embedding storage with pgvector, person matching logic, FastAPI core, authentication.
Phase 3 (Weeks 9-12): Next.js frontend, design system, dashboard, live camera view (HLS), alert engine with rules, Telegram Bot API integration, WhatsApp Business API integration, person gallery and profile, unknown review queue, watchlists, WebSocket real-time updates.
Phase 4 (Weeks 13-16): Night mode AI model, AI Vibe Settings page, training pipeline with model versioning, self-learning service for unknown clusters, privacy mode with face blurring, suspicious activity detection, search service (face + text), system health dashboard.
Phase 5 (Weeks 17-20): Penetration testing, SAST/DAST, self-test framework (21 suites), backup/DR setup, incident response runbooks, load testing (8-64 cameras), performance tuning, operations training, go-live checklist (98 items), production cutover, 72-hour stability monitoring.
19.4 Resource Allocation
| Phase | Engineering | AI/ML | DevOps | QA | Security |
|---|---|---|---|---|---|
| 1: Foundation | 2 | — | 2 | — | 1 |
| 2: Core AI | 2 | 2 | 1 | 1 | — |
| 3: Application | 3 | 1 | 1 | 2 | — |
| 4: Intelligence | 2 | 2 | 1 | 1 | — |
| 5: Hardening | 2 | 1 | 2 | 2 | 2 |
Section 20: Final Production-Readiness Summary
20.1 System at a Glance
| Category | Specification |
|---|---|
| Architecture | Cloud (AWS EKS) + Edge (Intel NUC) + VPN (WireGuard) |
| Services | 12 containerized microservices |
| Security Zones | 5 (Public, App Private, Database, Edge LAN, Camera LAN) |
| AI Pipeline | YOLO11m (human detection) + SCRFD (face detection) + ArcFace (recognition) |
| Embeddings | 512-dimensional face vectors stored in pgvector |
| Database | PostgreSQL 15, 29 tables, partitioned, AES-256-GCM encrypted |
| Web Application | 18 pages, dark mode, Next.js 14, real-time WebSocket |
| Notifications | Telegram Bot API + WhatsApp Business API (dual channel) |
| Security | TLS 1.3, Argon2id, JWT ES256, TOTP MFA, RBAC (4 roles, 30+ permissions) |
| Testing | 21 test suites, 170+ test cases, automated readiness scoring |
| Reliability | 99.9% uptime target, RTO 1 hour, RPO 15 minutes |
| Timeline | 20 weeks (5 months) to production |
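To make the reliability row concrete, the following sketch converts an availability target into a downtime budget and compares it with the stated RTO. It is a minimal illustration, not part of the platform code: the function name `downtime_budget` and the choice of a 30-day and a 365-day window are assumptions, since the blueprint does not say over which period the 99.9% target is measured.

```python
from datetime import timedelta


def downtime_budget(availability: float, period: timedelta) -> timedelta:
    """Maximum downtime permitted within `period` at the given availability.

    `availability` is a fraction between 0 and 1 (0.999 for 99.9%).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return period * (1.0 - availability)


if __name__ == "__main__":
    target = 0.999                 # uptime target from the table
    rto = timedelta(hours=1)       # RTO from the table

    # ASSUMPTION: measurement windows are illustrative, not from the blueprint.
    monthly = downtime_budget(target, timedelta(days=30))
    yearly = downtime_budget(target, timedelta(days=365))

    print("30-day budget (minutes):", monthly.total_seconds() / 60)
    print("365-day budget (hours):", yearly.total_seconds() / 3600)
    print("One full-RTO outage fits the 30-day budget:", rto <= monthly)
```

At 99.9% the 30-day budget is about 43.2 minutes and the 365-day budget about 8.76 hours, so a single incident that uses the full 1-hour RTO would exceed a monthly budget while still fitting an annual one. This is worth settling when the SLA measurement window is defined.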
20.2 Readiness Checklist Summary
| Category | Items | Status |
|---|---|---|
| Infrastructure | 14 | Ready to implement |
| Security | 18 | Ready to implement |
| AI/ML Pipeline | 15 | Ready to implement |
| Application | 16 | Ready to implement |
| Operations | 15 | Ready to implement |
| Data & Privacy | 10 | Ready to implement |
| Documentation | 10 | Ready to implement |
| Total | 98 | Ready to implement |
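The category counts above can be tracked programmatically during go-live preparation. The sketch below is a hypothetical helper, assuming readiness is simply the share of completed checklist items; the blueprint mentions automated readiness scoring but does not define its formula in this section, so `readiness_percent` and its equal weighting of items are assumptions.

```python
from typing import Mapping

# Item counts per category, transcribed from the 20.2 table.
CHECKLIST_ITEMS = {
    "Infrastructure": 14,
    "Security": 18,
    "AI/ML Pipeline": 15,
    "Application": 16,
    "Operations": 15,
    "Data & Privacy": 10,
    "Documentation": 10,
}


def readiness_percent(completed: Mapping[str, int]) -> float:
    """Percentage of all checklist items completed.

    ASSUMPTION: every item carries equal weight; the real scoring
    scheme may weight categories differently.
    """
    total = sum(CHECKLIST_ITEMS.values())
    done = 0
    for category, count in completed.items():
        limit = CHECKLIST_ITEMS[category]  # KeyError for an unknown category
        if not 0 <= count <= limit:
            raise ValueError(f"{category}: {count} is outside 0..{limit}")
        done += count
    return 100.0 * done / total


if __name__ == "__main__":
    progress = {"Infrastructure": 14, "Security": 9}
    print(f"Readiness: {readiness_percent(progress):.1f}%")
```

Summing the category counts also confirms the table total of 98 items.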
20.3 Estimated Timeline
| Milestone | Target | Duration |
|---|---|---|
| M1: Infrastructure Ready | Week 4 | 4 weeks |
| M2: AI Pipeline Operational | Week 8 | 4 weeks |
| M3: Application Live | Week 12 | 4 weeks |
| M4: Intelligence Features | Week 16 | 4 weeks |
| M5: Production Go-Live | Week 20 | 4 weeks |
| Total to Production | 20 weeks | ~5 months |
Appendices
Appendix A: Cross-Reference to Specialist Documents
| Document | Path | Content |
|---|---|---|
| Notification System | /mnt/agents/output/notification_system.md | Telegram, WhatsApp, routing rules, templates, retry logic |
| Security Architecture | /mnt/agents/output/security_architecture.md | SSL/TLS, auth, RBAC, VPN, secrets, audit, GDPR, checklist |
| Web UX Design | /mnt/agents/output/web_ux_design.md | Design system, 18 pages, navigation, user flows, AI vibe settings |
| Self-Test Framework | /mnt/agents/output/self_test_framework.md | Framework architecture, 21 suites, scheduling, sample report |
| Operations Plan | /mnt/agents/output/operations_plan.md | Monitoring, logging, backup, DR, incident response, runbooks |
| Architecture | /mnt/agents/output/architecture.md | System architecture, data flow, scaling strategy, cost estimates |
Appendix B: Acronyms
| Acronym | Full Form |
|---|---|
| AI | Artificial Intelligence |
| ALB | Application Load Balancer |
| API | Application Programming Interface |
| ArcFace | Additive Angular Margin Loss for Deep Face Recognition |
| CORS | Cross-Origin Resource Sharing |
| CSP | Content Security Policy |
| CSRF | Cross-Site Request Forgery |
| DAST | Dynamic Application Security Testing |
| DLQ | Dead Letter Queue |
| DVR | Digital Video Recorder |
| EKS | Elastic Kubernetes Service |
| ES256 | ECDSA using P-256 and SHA-256 |
| FFmpeg | Fast Forward MPEG (multimedia framework) |
| FPS | Frames Per Second |
| GDPR | General Data Protection Regulation |
| GPU | Graphics Processing Unit |
| HLS | HTTP Live Streaming |
| HPA | Horizontal Pod Autoscaler |
| HSTS | HTTP Strict Transport Security |
| JWT | JSON Web Token |
| LUKS | Linux Unified Key Setup |
| mAP | mean Average Precision |
| MFA | Multi-Factor Authentication |
| mTLS | Mutual TLS |
| NMS | Non-Maximum Suppression |
| NUC | Next Unit of Computing |
| OCSP | Online Certificate Status Protocol |
| PII | Personally Identifiable Information |
| PSK | Pre-Shared Key |
| RBAC | Role-Based Access Control |
| RDS | Relational Database Service |
| RPO | Recovery Point Objective |
| RTO | Recovery Time Objective |
| RTSP | Real Time Streaming Protocol |
| S3 | Simple Storage Service |
| SAST | Static Application Security Testing |
| SCRFD | Sample and Computation Redistribution for Efficient Face Detection |
| SLA | Service Level Agreement |
| SQL | Structured Query Language |
| SSL | Secure Sockets Layer |
| TLS | Transport Layer Security |
| TOTP | Time-based One-Time Password |
| TPM | Trusted Platform Module |
| UAT | User Acceptance Testing |
| VPC | Virtual Private Cloud |
| VPN | Virtual Private Network |
| WAF | Web Application Firewall |
| WORM | Write Once Read Many |
| XSS | Cross-Site Scripting |
| YOLO | You Only Look Once |
End of Document
Document Version: 1.0 | Classification: Confidential - Internal Use Only | Next Review: 2025-04-16 | Owner: Sentinel AI Architecture Team