Sentinel AI Surveillance Platform — Unified Technical Blueprint (Part B)
Document Version: 1.0 Date: 2025-01-16 Classification: Confidential — Internal Use Only Author: Technical Architecture Team
Part B Table of Contents
- Section 11: Alerting Design (Notification System)
- Section 12: Security Design
- Section 13: UX / Website Structure
- Section 14: Deployment Plan
- Section 15: Testing Plan
- Section 16: Self-Test Framework
- Section 17: Sample Self-Test Report
- Section 18: Risks and Mitigations
- Section 19: Final Implementation Roadmap
- Section 20: Final Production-Readiness Summary
Section 11: Alerting Design
11.1 Architecture Overview
The notification system uses an event-driven architecture built on Redis Pub/Sub for real-time message distribution. All detection events, system alerts, and manual triggers flow through a unified pipeline that delivers over two channels: the Telegram Bot API and the official Meta WhatsApp Business API. The design goal is that critical security alerts are not lost; rate limiting, retry logic, and a dead letter queue protect delivery while keeping throughput high.
┌──────────────────────────────────────────────────────────────────────────────┐
│ ALERTING ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ EVENT SOURCES │ │
│ │ │ │
│ │ Detection Pipeline ──▶ New person detected │ │
│ │ Face Recognition ────▶ Known/Unknown/Watchlist match │ │
│ │ System Monitors ─────▶ Camera offline, Storage full, VPN down │ │
│ │ Manual Triggers ─────▶ Operator-initiated alerts │ │
│ │ AI Anomaly Engine ───▶ Suspicious activity detected │ │
│ └──────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ REDIS PUB/SUB │ │
│ │ │ │
│ │ Channel: alerts.critical ─── High priority, immediate process │ │
│ │ Channel: alerts.high ─── Standard priority │ │
│ │ Channel: alerts.medium ─── Batched processing │ │
│ │ Channel: system.health ─── System health events │ │
│ └──────────────┬───────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┴───────────────────────────────────────────────────┐ │
│ │ NOTIFICATION ROUTER │ │
│ │ (Python/FastAPI) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │
│ │ │ Event Parser │──▶ Rules Engine │──▶ Channel Selector │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌───────────────┐ ┌──────────────────┐ │
│ │ TEMPLATE │ │ RATE │ │ ESCALATION │ │
│ │ RENDERER │ │ LIMITER │ │ ENGINE │ │
│ │ │ │ │ │ │ │
│ │ HTML/TXT │ │ Token Bucket │ │ 3-level timeout │ │
│ │ per channel│ │ 4-tier limits │ │ Auto-escalation │ │
│ └──────┬──────┘ └───────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └──────────────────┼─────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CHANNEL ADAPTERS │ │
│ │ │ │
│ │ ┌──────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ TELEGRAM BOT API │ │ WHATSAPP BUSINESS API │ │ │
│ │ │ │ │ │ │ │
│ │ │ - HTML formatting │ │ - Template messages │ │ │
│ │ │ - Inline keyboards │ │ - Session messages │ │ │
│ │ │ - Media groups │ │ - Interactive messages │ │ │
│ │ │ - Edit/Delete messages │ │ - Media attachments │ │ │
│ │ │ - Webhook receipts │ │ - Message status API │ │ │
│ │ └──────────┬───────────────┘ └──────────┬───────────────┘ │ │
│ └─────────────┼─────────────────────────────┼─────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Telegram │ │ WhatsApp │ │
│ │ Servers │ │ Cloud API │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ SUPPORTING SERVICES │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │
│ │ │ RETRY MGR │ │ DLQ │ │ DELIVERY TRACKER │ │ │
│ │ │ Exponential │ │ Redis-backed │ │ Webhook callbacks │ │ │
│ │ │ 5 max │ │ Admin review │ │ Status dashboard │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
Key Design Principles:
| Principle | Implementation |
|---|---|
| Guaranteed delivery | At-least-once delivery via retry with exponential backoff; dead letter queue for permanent failures |
| Ordered processing | Events within a single camera stream processed in sequence; no alert reordering |
| Non-blocking | Alert generation does not block the detection pipeline; async processing via queues |
| Channel isolation | Failure in one channel (e.g., Telegram down) does not affect the other (WhatsApp continues) |
| Deduplication | 5-minute window for duplicate suppression; composite key based on camera + person + event type |
| Observability | Every notification tracked from creation through delivery with full audit trail |
11.2 Telegram Integration
11.2.1 Bot API Configuration
Telegram integration uses the official Telegram Bot API for message delivery. The bot is configured with encrypted tokens stored in HashiCorp Vault, with HTML message formatting for rich alert presentation.
| Parameter | Value | Notes |
|---|---|---|
| API Base URL | https://api.telegram.org/bot<TOKEN>/ | Standard Bot API endpoint |
| API Version | Bot API 7.x | Latest stable as of Q1 2025 |
| Token Storage | HashiCorp Vault (AES-256-GCM encrypted) | Rotated every 180 days |
| Communication | HTTPS POST (the Bot API has no WebSocket transport) | TLS 1.3 required for all calls |
| Message Format | HTML subset | <b>, <i>, <code>, <pre>, <a href> tags supported |
| Max Message Size | 4096 characters per message | Longer messages auto-split into parts |
| Media Size Limit (Image) | 10 MB per image | Processed via Pillow for compression |
| Media Size Limit (Video) | 50 MB per video | Processed via FFmpeg for re-encoding |
| Media Group Limit | Up to 10 items per media group | Album delivery for multi-image alerts |
| Global Rate Limit | 30 messages per second | Across all chats |
| Per-Chat Rate Limit | 1 message per second | Per conversation throttling |
| Webhook Endpoint | /webhooks/telegram | Receives delivery receipts and callback queries |
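The auto-split behavior for messages over 4096 characters could look roughly like the sketch below. The function name `split_message` is hypothetical, and the sketch deliberately ignores HTML tag boundaries: a production splitter would also need to avoid cutting inside a tag such as `<b>...</b>`.

```python
TELEGRAM_MAX_CHARS = 4096  # per-message limit from the configuration table


def split_message(text: str, limit: int = TELEGRAM_MAX_CHARS) -> list[str]:
    """Split text into parts of at most `limit` characters, preferring line breaks."""
    if limit < 1:
        raise ValueError("limit must be positive")
    parts = []
    remaining = text
    while len(remaining) > limit:
        # Prefer to cut at the last newline inside the window, else hard-cut.
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit
        parts.append(remaining[:cut])
        # Drop the newline(s) we cut at so the next part starts cleanly.
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        parts.append(remaining)
    return parts
```

Cutting at line breaks keeps each alert field on a single part where possible; a hard cut is used only when a single line exceeds the limit.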
11.2.2 Bot Features and Capabilities
Inline Keyboards: Every alert message includes contextual action buttons that allow operators to respond directly from Telegram without opening the web dashboard.
| Keyboard Type | Buttons | Actions |
|---|---|---|
| Standard Alert | Acknowledge / View Live / Details | Confirm receipt, open stream, view full info |
| Watchlist Alert | Acknowledge / View Live / Escalate / Details | Includes escalation for watchlist matches |
| Blacklist Alert | ACKNOWLEDGE NOW / View Live / Dispatch Security / Escalate / Details | Highest priority actions for blacklist |
| Escalation Notice | Acknowledge / View Original Alert | Acknowledge escalated alert or view source |
| System Alert | Acknowledge / View Dashboard / Details | System-level alert actions |
Media Groups: When an alert contains multiple evidence images (up to 10), they are sent as a Telegram media group (album). This presents all related images in a single scrollable gallery rather than individual messages, reducing chat clutter.
Webhook Receipts: Telegram delivers message status updates via webhooks:
| Webhook Type | Trigger | Action |
|---|---|---|
| message | Bot receives a command | Process command (e.g., /status, /acknowledge) |
| callback_query | User clicks inline button | Execute action, update message status |
| edited_message | Message edited externally | Log for audit trail |
| my_chat_member | Bot added/removed from chat | Update recipient group membership |
Chat Commands:
| Command | Description | Response |
|---|---|---|
| /status | Get system health status | Camera count, offline count, last alert time |
| /acknowledge <alert_id> | Acknowledge an alert | Confirmation or error message |
| /cameras | List all cameras and their status | Camera name, status, last seen |
| /health | Get edge gateway health | CPU, memory, disk, VPN status |
| /help | Show available commands | Command reference |
11.2.3 Security Considerations
Telegram bot tokens are among the most sensitive credentials in the system. The following security measures are implemented:
| Measure | Implementation |
|---|---|
| Encryption at rest | AES-256-GCM in Vault |
| Token rotation | Every 180 days or immediately on compromise suspicion |
| Rotation procedure | 1) Generate new token via BotFather, 2) Update Vault, 3) Notify services to hot-reload, 4) 5-minute grace period, 5) Revoke old token |
| IP allowlisting | Webhook endpoint accepts only Telegram IP ranges |
| Webhook secret | HMAC verification on incoming webhook payloads |
| No token logging | Tokens never appear in application logs |
| No token in code | Tokens injected via Vault at runtime |
11.3 WhatsApp Business API Integration
11.3.1 Meta Cloud API Configuration
WhatsApp integration uses Meta's official Cloud API (Business Platform), which provides a reliable, enterprise-grade messaging channel. This requires a verified Meta Business account and pre-approved message templates for proactive messaging.
| Parameter | Value | Notes |
|---|---|---|
| API Base URL | https://graph.facebook.com/v18.0/ | Meta Graph API v18.0 minimum |
| Authentication | Permanent Access Token | Scoped to WhatsApp Business Management |
| Token Storage | HashiCorp Vault (AES-256-GCM encrypted) | Rotated every 180 days |
| Phone Number ID | Dedicated business phone number | Not shared with other WhatsApp uses |
| Business Account | Verified Meta Business Account | Required for template message approval |
| Message Types | Template messages + Session messages | Template for first contact; session for replies |
| Media Size Limit (All) | 16 MB per file | Stricter than Telegram; aggressive compression needed |
| Supported Media | JPEG, PNG, MP4 (H.264), PDF, Audio | Format validation before upload |
| Global Rate Limit | 80 messages per second | Across all recipients |
| Per-Recipient Rate Limit | 20 messages per minute | Per WhatsApp ID throttling |
| Webhook Endpoint | /webhooks/whatsapp | Receives message status updates |
11.3.2 Message Types
Template Messages: Pre-approved message templates are required for any proactive (business-initiated) message. Templates must be created and submitted for approval in Meta Business Manager. Each template contains named parameters that are dynamically populated at send time.
| Template Name | Purpose | Parameters | Approval Status |
|---|---|---|---|
| person_detected_known | Known person detected | name, role, camera, date, time, confidence, alert_id | Approved |
| person_detected_unknown | Unknown person alert | camera, date, time, confidence | Approved |
| watchlist_match | Person on watchlist detected | name, watchlist_type, camera, date, time | Approved |
| blacklist_alert | Blacklisted person detected | name, camera, date, time | Approved |
| suspicious_activity | Suspicious behavior detected | activity_type, camera, date, time, confidence | Approved |
| system_alert | System health alert | message, timestamp, severity | Approved |
| escalation_notice | Alert escalation notification | alert_id, level, summary, elapsed_minutes | Approved |
| daily_digest | Daily summary of activity | date, total_detections, total_alerts, top_cameras | Approved |
| test_message | System test | timestamp | Approved |
Session Messages: Within a 24-hour window after a user sends a message to the business, free-form session messages can be sent without template restrictions. This is used for:
- Acknowledgment confirmations
- Escalation follow-ups
- Interactive conversations initiated by the recipient
- Quick reply responses
11.3.3 Webhook Event Handling
| Webhook Event | Trigger | System Action |
|---|---|---|
| messages.delivered | Message delivered to device | Update delivery status to delivered |
| messages.read | Recipient read the message | Update delivery status to read |
| messages.failed | Message delivery failed | Trigger retry or move to DLQ |
| message_reaction | Recipient reacted to message | Log for engagement metrics |
| account_alerts | Meta account issue | Alert admin, review account status |
| template_category_update | Template status change | Update template catalog |
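The event-to-action mapping in the table can be sketched as a small dispatcher. This is an illustration only: `handle_webhook_event`, the returned action dicts, and the `notification_id` / `retry_count` fields are hypothetical, and `event` stands for an already-parsed internal record rather than the raw webhook payload.

```python
MAX_RETRIES = 5  # matches the maximum retries in the retry configuration (11.7.1)


def handle_webhook_event(event_type: str, event: dict) -> dict:
    """Map a webhook event type to the system action listed in the table."""
    notification_id = event.get("notification_id")
    if event_type == "messages.delivered":
        return {"notification_id": notification_id, "action": "set_status", "status": "delivered"}
    if event_type == "messages.read":
        return {"notification_id": notification_id, "action": "set_status", "status": "read"}
    if event_type == "messages.failed":
        # Retry while attempts remain, otherwise hand over to the dead letter queue.
        if event.get("retry_count", 0) < MAX_RETRIES:
            return {"notification_id": notification_id, "action": "retry"}
        return {"notification_id": notification_id, "action": "dead_letter"}
    if event_type == "message_reaction":
        return {"notification_id": notification_id, "action": "log_engagement"}
    if event_type == "account_alerts":
        return {"action": "alert_admin"}
    if event_type == "template_category_update":
        return {"action": "update_template_catalog"}
    # Unknown event types are ignored rather than treated as errors.
    return {"action": "ignore"}
```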
11.4 Alert Routing Rules Engine
11.4.1 Condition Types
The routing engine evaluates 9 distinct condition types to determine which recipients receive which alerts through which channels. Multiple conditions can be combined with AND/OR logic for precise targeting.
| # | Condition Type | Description | Example Values | Operators |
|---|---|---|---|---|
| 1 | camera | Source camera identifier | "CAM-01", "CAM-02", "entrance-cam" | equals, in, not_in |
| 2 | person | Detected known person | "John Smith", "Jane Doe" | equals, in, not_in |
| 3 | role | Person role category | "employee", "visitor", "vendor", "contractor", "security" | equals, in |
| 4 | event_type | Type of detection event | "person_detected", "unknown_person", "suspicious_activity", "crowd_gathering", "camera_tamper" | equals, in |
| 5 | zone | Detection zone name | "entrance", "restricted_area", "parking", "lobby", "warehouse" | equals, in |
| 6 | time | Time of day range | "08:00-18:00", "22:00-06:00" | between, not_between |
| 7 | day | Day of week | "monday", "weekday", "weekend" | equals, in |
| 8 | severity | Alert severity level | "critical", "high", "medium", "low", "info" | equals, in, gte |
| 9 | watchlist | Watchlist membership | "vip", "blacklist", "authorized", "temporary_access" | equals, in |
11.4.2 Rule Structure
Each routing rule consists of conditions, actions, and metadata:
rule:
  id: "rule-001"
  name: "Blacklist Immediate Alert"
  enabled: true
  priority: 100  # Higher number = evaluated first
  conditions:
    operator: "AND"
    conditions:
      - field: "watchlist"
        operator: "equals"
        value: "blacklist"
      - field: "severity"
        operator: "in"
        value: ["critical", "high"]
  actions:
    - channel: "telegram"
      recipients: ["security_team", "management"]
      template: "blacklist_alert"
      media: ["image", "video"]
      bypass_quiet_hours: true
      priority: "high"
    - channel: "whatsapp"
      recipients: ["security_manager"]
      template: "blacklist_alert"
      media: ["image"]
      bypass_quiet_hours: true
  metadata:
    created_by: "admin"
    created_at: "2025-01-01T00:00:00Z"
    last_modified: "2025-01-10T12:00:00Z"
    tags: ["critical", "blacklist"]
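A rule of this shape could be evaluated along the lines of the sketch below. The function names are hypothetical, the event is assumed to be a flat dict keyed by condition field, and only the `equals`, `in`, and `not_in` operators are covered; `between`, `not_between`, and `gte` would need their own handling.

```python
def match_condition(condition: dict, event: dict) -> bool:
    """Evaluate one leaf condition against an event dict."""
    actual = event.get(condition["field"])
    op, expected = condition["operator"], condition["value"]
    if op == "equals":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    raise ValueError(f"unsupported operator: {op}")


def match_group(group: dict, event: dict) -> bool:
    """Combine child conditions with the group's AND/OR operator."""
    results = (match_condition(c, event) for c in group["conditions"])
    return all(results) if group["operator"] == "AND" else any(results)


def select_rules(rules: list[dict], event: dict) -> list[dict]:
    """Return enabled, matching rules with the highest priority first."""
    matched = [
        r for r in rules
        if r.get("enabled", True) and match_group(r["conditions"], event)
    ]
    return sorted(matched, key=lambda r: r["priority"], reverse=True)
```

Sorting by descending `priority` reflects the "higher number = evaluated first" comment in the rule definition.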
11.4.3 Default Routing Rules
The system ships with a comprehensive set of default routing rules that cover common surveillance scenarios:
| # | Scenario | Conditions | Severity | Recipients | Channels | Media | Quiet Hours |
|---|---|---|---|---|---|---|---|
| 1 | Known employee normal hours | role=employee, time=08:00-18:00, weekday | Info | None (log only) | — | — | N/A |
| 2 | Known employee after hours | role=employee, time=18:00-08:00 | Low | Security team | Telegram | Image | Respected |
| 3 | Known visitor during hours | role=visitor, time=08:00-18:00 | Low | Reception desk | Telegram | Image | Respected |
| 4 | Unknown person detected | event_type=unknown_person | Medium | Security team | Telegram + WhatsApp | Image | Respected |
| 5 | Unknown person after hours | event_type=unknown_person, time=22:00-06:00 | High | Security team + Manager | Both | Image + Video | Bypassed |
| 6 | Watchlist match | watchlist=watchlist | High | Security team | Both | Image + Video | Respected |
| 7 | Blacklist match | watchlist=blacklist | Critical | All groups | Both (bypass quiet) | Image + Video | Bypassed |
| 8 | VIP detected | watchlist=vip | Low | Reception desk | Telegram | Image | Respected |
| 9 | Camera offline | event_type=camera_offline | High | IT team + Security team | Telegram | None | Bypassed |
| 10 | Storage > 90% | event_type=storage_warning | High | IT team + Management | Both | None | Bypassed |
| 11 | Storage > 95% | event_type=storage_critical | Critical | All groups | Both (bypass quiet) | None | Bypassed |
| 12 | VPN tunnel down | event_type=vpn_down | Critical | IT team + Management | Both (bypass quiet) | None | Bypassed |
| 13 | Suspicious activity | event_type=suspicious_activity | High | Security team | Both | Image + Video | Respected |
| 14 | Crowd gathering | event_type=crowd_gathering | Medium | Security team | Telegram | Image | Respected |
11.5 Recipient Groups and Quiet Hours
11.5.1 Recipient Group Management
Recipient groups are the primary mechanism for organizing alert destinations. Each group contains one or more contacts with specified channels.
| Group Name | Members | Primary Channel | Backup Channel | Alert Preferences | Quiet Hours |
|---|---|---|---|---|---|
| Security Team | On-site security guards | Telegram | (not specified) | All except info | Disabled |
| Security Manager | Shift supervisor | Telegram | (not specified) | Medium and above | Disabled |
| IT Team | Infrastructure staff | Telegram | (not specified) | System alerts only | Nights |
| Management | Facility managers | Telegram | (not specified) | Critical only | Disabled |
| Reception | Front desk staff | Telegram | None | Visitor-related, VIP | Disabled |
| After-Hours | On-call personnel | Telegram | (not specified) | High and Critical | Disabled |
Group Configuration Interface:
Groups are managed through the web dashboard at /settings/notifications/groups. Each group can be configured with:
| Setting | Description |
|---|---|
| Group name | Human-readable identifier |
| Description | Purpose of the group |
| Members | List of Telegram chat IDs and WhatsApp phone numbers |
| Default channel | Primary delivery channel |
| Alert severity filter | Minimum severity to deliver |
| Quiet hours override | Whether quiet hours apply to this group |
| Media preferences | Which media types to include |
| Max alerts per hour | Rate limit for this group |
11.5.2 Quiet Hours Configuration
Quiet hours allow suppressing non-critical alerts during configured time windows. Critical alerts always bypass quiet hours — this is a non-configurable safety measure.
quiet_hours:
  enabled: false  # DISABLED BY DEFAULT for security
  preset: "none"  # none / nights / weekends / custom
  custom_schedule:
    - label: "Weekday Nights"
      days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      start_time: "22:00"
      end_time: "06:00"
      timezone: "Asia/Kolkata"
    - label: "Weekend All Day"
      days: ["Saturday", "Sunday"]
      start_time: "00:00"
      end_time: "23:59"
      timezone: "Asia/Kolkata"
  allowed_during_quiet:  # Which severities bypass quiet hours
    - "critical"  # Always delivered (non-configurable)
  emergency_bypass:
    enabled: true
    triggers:
      - severity: "critical"
      - tag: "emergency"
      - rule_override: "bypass_quiet_hours"
    notification_method: "all_channels"
  suppression_behavior: "queue"  # queue / discard / digest
  # "queue": Hold until quiet hours end
  # "discard": Drop non-critical alerts entirely
  # "digest": Send summary when quiet hours end
Security Note: Quiet hours are disabled by default because the surveillance use case requires continuous awareness. Any decision to enable quiet hours must be documented with security team sign-off.
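The quiet-hours decision can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, `now` is assumed to be already converted to the schedule's timezone, and day-of-week matching is left out. The main point is the handling of windows such as 22:00-06:00 that cross midnight.

```python
from datetime import datetime, time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Overnight window, e.g. 22:00-06:00: late evening OR early morning.
    return now >= start or now < end


def should_suppress(severity: str, now: datetime, start: time, end: time,
                    enabled: bool = False, bypass: bool = False) -> bool:
    """Decide whether an alert is held back by quiet hours."""
    # Critical alerts and explicit rule bypasses are never suppressed.
    if not enabled or bypass or severity == "critical":
        return False
    return in_window(now.time(), start, end)
```

The `enabled` default of `False` mirrors the configuration, where quiet hours are off unless explicitly turned on.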
11.5.3 Per-Recipient Quiet Hours
Individual recipients can configure personal quiet hours that override group settings:
| Recipient | Personal Quiet Hours | Group Override | Effect |
|---|---|---|---|
| Security Guard A | None | Security Team (Disabled) | Receives all alerts |
| IT Manager | 23:00-07:00 | IT Team (Nights) | Matches group — no IT alerts at night |
| Manager B | 22:00-08:00 | Management (Disabled) | Personal quiet hours applied |
11.6 Message Templates
11.6.1 Telegram HTML Templates
All Telegram templates use a safe HTML subset for rich formatting with inline action keyboards.
Template: Person Detected (Known)
🔍 <b>Person Detected</b>
<b>{name}</b> ({role})
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<a href="{dashboard_url}">View in Dashboard</a>
Template: Unknown Person Detected
❓ <b>Unknown Person Detected</b>
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<i>This person is not in the database.</i>
<a href="{naming_url}">Name This Person</a>
Template: Watchlist Match
⚠️ <b>WATCHLIST ALERT</b>
<b>{name}</b>
📋 Watchlist: {watchlist_type}
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<i>This person is on a watchlist and requires attention.</i>
Template: Blacklist Alert
🚨 <b>BLACKLIST ALERT</b> 🚨
⚠️ <b>{name}</b> has been detected!
📍 Camera: {camera_name}
🕐 {date} at {time}
🎯 Confidence: {confidence}%
<b>This person is BLACKLISTED. Immediate attention required.</b>
<a href="{dispatch_url}">🚨 Dispatch Security</a>
Template: Escalation Notice
⬆️ <b>Alert Escalated — Level {escalation_level}</b>
Alert #{alert_id} has been escalated.
Original: {alert_summary}
⏱️ Unacknowledged for {elapsed_minutes} minutes
Threshold: {threshold_minutes} minutes
<i>Please review immediately.</i>
Template: System Alert
⚙️ <b>System Alert</b>
{message}
🕐 {timestamp}
Severity: {severity}
<a href="{health_dashboard_url}">View System Health</a>
Template: Daily Digest
📊 <b>Daily Activity Digest — {date}</b>
👥 Persons Detected: {total_detections}
🔔 Alerts Generated: {total_alerts}
📹 Cameras Online: {cameras_online}/{cameras_total}
Top Cameras:
{camera_list}
<a href="{full_report_url}">View Full Report</a>
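Because these templates are sent in HTML parse mode, any substituted value containing `<`, `>`, or `&` has to be escaped before it is placed into the message. A minimal rendering sketch, assuming templates are stored as Python format strings (the `render_template` helper and the storage format are assumptions, not the platform's confirmed design):

```python
import html

# The "Person Detected (Known)" template from above, as a format string.
PERSON_DETECTED = (
    "🔍 <b>Person Detected</b>\n"
    "<b>{name}</b> ({role})\n"
    "📍 Camera: {camera_name}\n"
    "🕐 {date} at {time}\n"
    "🎯 Confidence: {confidence}%\n"
    '<a href="{dashboard_url}">View in Dashboard</a>'
)


def render_template(template: str, **values) -> str:
    """Fill placeholders after HTML-escaping every substituted value."""
    escaped = {key: html.escape(str(value), quote=True) for key, value in values.items()}
    return template.format(**escaped)
```

Only the values are escaped; the template's own tags are left intact, so a person name such as `<Eve & Co>` cannot inject markup into the alert.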
11.6.2 WhatsApp Template Format
WhatsApp templates use a different format — they are pre-registered with Meta and use numbered parameter substitution:
Template: person_detected_known
🔍 Person Detected
{{1}} ({{2}})
📍 Camera: {{3}}
🕐 {{4}} at {{5}}
🎯 Confidence: {{6}}%
Alert ID: {{7}}
Parameters: {{1}}=name, {{2}}=role, {{3}}=camera_name, {{4}}=date, {{5}}=time, {{6}}=confidence, {{7}}=alert_id
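The numbered parameters are supplied positionally when the message is sent. The sketch below builds a request body in the general shape of a Cloud API template message; the helper name, the `en_US` language code, and the sample phone number in the usage note are assumptions, and the exact payload should be checked against Meta's current API reference.

```python
def build_template_payload(to: str, template_name: str, params: list[str],
                           language_code: str = "en_US") -> dict:
    """Build a template-message request body with positional body parameters."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {
            "name": template_name,
            "language": {"code": language_code},
            "components": [
                {
                    "type": "body",
                    # Order matters: params[0] fills {{1}}, params[1] fills {{2}}, ...
                    "parameters": [{"type": "text", "text": str(p)} for p in params],
                }
            ],
        },
    }
```

For `person_detected_known`, the list would carry name, role, camera_name, date, time, confidence, and alert_id in that order, for example `build_template_payload("15551234567", "person_detected_known", [...])`.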
11.6.3 Template Variable Reference
| Variable | Description | Source | Example |
|---|---|---|---|
| {name} | Detected person's name | Person database | "John Smith" |
| {role} | Person's role | Person database | "Employee" |
| {camera_name} | Camera display name | Camera configuration | "Main Entrance" |
| {date} | Event date | Event timestamp | "2025-01-16" |
| {time} | Event time | Event timestamp | "14:32:15" |
| {confidence} | Detection confidence % | AI inference result | "97.3" |
| {alert_id} | Unique alert identifier | Alert database | "ALT-20250116-001" |
| {watchlist_type} | Watchlist category | Watchlist configuration | "Blacklist" |
| {activity_type} | Type of suspicious activity | AI classification | "Loitering" |
| {severity} | Alert severity | Rules engine | "Critical" |
| {dashboard_url} | Deep link to dashboard | System configuration | "https://..." |
| {elapsed_minutes} | Time since alert creation | System clock | "15" |
11.7 Retry Logic and Rate Limiting
11.7.1 Retry Configuration
Failed notifications are retried using an exponential backoff strategy to avoid overwhelming downstream services.
| Parameter | Value | Description |
|---|---|---|
| Maximum retries | 5 | After the initial attempt and 5 failed retries, move to DLQ |
| Base delay | 2 seconds | Initial retry wait time |
| Exponential base | 2 | Delay multiplier (2^n) |
| Maximum delay | 300 seconds (5 minutes) | Cap on retry delay |
| Jitter | Up to 1 second random | Prevents thundering herd |
Retry Schedule:
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 (initial) | Immediate | 0s |
| 2 | 2s + jitter | ~2s |
| 3 | 4s + jitter | ~6s |
| 4 | 8s + jitter | ~14s |
| 5 | 16s + jitter | ~30s |
| 6 (final) | 32s + jitter | ~62s |
| DLQ | — | After 62s total |
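The schedule above follows from the configured parameters: retry k waits base_delay × 2^(k-1) seconds, capped at the maximum delay, plus jitter. A small sketch that reproduces it (the constant and function names are illustrative):

```python
import random

BASE_DELAY = 2.0       # seconds
EXPONENTIAL_BASE = 2
MAX_DELAY = 300.0      # seconds
MAX_JITTER = 1.0       # seconds
MAX_RETRIES = 5


def retry_delay(retry_number: int, jitter: bool = True) -> float:
    """Delay in seconds before retry `retry_number` (1-based, so attempt 2 is retry 1)."""
    if not 1 <= retry_number <= MAX_RETRIES:
        raise ValueError("retry_number out of range")
    delay = min(BASE_DELAY * EXPONENTIAL_BASE ** (retry_number - 1), MAX_DELAY)
    # Random jitter spreads out retries that would otherwise fire together.
    return delay + (random.uniform(0, MAX_JITTER) if jitter else 0.0)
```

Without jitter the five delays are 2, 4, 8, 16, and 32 seconds, which sum to the 62 seconds shown in the table; with only five retries the 300-second cap is never reached.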
Error Classification:
| Error Code | Description | Retry? |
|---|---|---|
| Timeout | Request timed out | Yes |
| 429 Too Many Requests | Rate limited by provider | Yes (with longer delay) |
| 500 Internal Server Error | Provider error | Yes |
| 502 Bad Gateway | Provider gateway error | Yes |
| 503 Service Unavailable | Provider temporarily down | Yes |
| 409 Conflict | Request conflict | Yes |
| 401 Unauthorized | Authentication failed | No (credential issue) |
| 403 Forbidden | Permission denied | No (configuration issue) |
| 400 Bad Request | Invalid request | No (template/parameter issue) |
| Chat not found | Chat ID unknown or no longer reachable by the bot | No |
Non-Retryable Errors (Immediate DLQ):
- Invalid bot token (401)
- Bot blocked by user (403)
- Chat not found
- Malformed template (400)
- Message too long (after split)
- Unsupported media format
11.7.2 Circuit Breaker
Each channel adapter implements a circuit breaker to prevent cascading failures:
| Parameter | Value |
|---|---|
| Failure threshold | 10 consecutive failures |
| Open state duration | 60 seconds |
| Half-open test calls | 3 successful calls required |
| Monitoring window | 5 minutes |
Circuit States:
| State | Behavior | Transition Trigger |
|---|---|---|
| Closed | Normal operation: all requests pass | Initial state, or after half-open success |
| Open | Fast fail: no requests sent to provider | 10 consecutive failures |
| Half-Open | Limited test requests allowed | After 60-second open timeout |
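The three states and their transitions can be sketched as a small class. This is a simplified illustration: the class and method names are hypothetical, the 5-minute monitoring window is not modeled, and reopening the circuit on any failure during half-open is an assumption the tables do not spell out.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=10, open_seconds=60.0,
                 half_open_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.half_open_successes = half_open_successes
        self.clock = clock
        self.state = "closed"
        self.failures = 0      # consecutive failures while closed
        self.successes = 0     # successful test calls while half-open
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Return True if a request may be sent to the provider."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_seconds:
                self.state = "half_open"
                self.successes = 0
                return True
            return False  # fast fail while the circuit is open
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.half_open_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # a success breaks the consecutive-failure streak

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # assumption: one failed test call reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

Passing the clock in as a parameter keeps the breaker testable without real waiting. One instance per channel adapter preserves the channel isolation principle.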
11.7.3 Rate Limiting Tiers
The notification system implements multi-tier rate limiting to prevent abuse and ensure fair resource distribution:
| Tier | Limit | Scope | Burst |
|---|---|---|---|
| Global (all channels) | 200 messages/minute | Across all channels combined | 20 |
| Telegram Global | 30 messages/second | All Telegram traffic | 5 |
| Telegram Per-Chat | 1 message/second | Per conversation | 1 |
| WhatsApp Global | 80 messages/second | All WhatsApp traffic | 10 |
| WhatsApp Per-Recipient | 20 messages/minute | Per phone number | 3 |
| Per Camera Source | 30 alerts/minute | Prevents camera spam | 5 |
| Per Severity (Critical) | No limit | Critical alerts bypass rate limits | N/A |
Token Bucket Algorithm: Each tier maintains a token bucket. A token is consumed per message. Tokens replenish at the configured rate. If no tokens are available, the message is queued or rejected based on priority.
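A minimal in-memory sketch of one such bucket is shown below. Treating the "Burst" column as the bucket capacity is an assumption, as are the class and method names; a multi-process deployment would need shared state (for example in Redis) rather than a Python object.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_second: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_second     # tokens added per second
        self.capacity = float(burst)    # maximum tokens held at once
        self.tokens = float(burst)      # start full
        self.clock = clock
        self.updated_at = clock()

    def try_consume(self, tokens: float = 1.0) -> bool:
        """Take `tokens` from the bucket if available; return whether it succeeded."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.rate)
        self.updated_at = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

For the Telegram per-chat tier this would be `TokenBucket(rate_per_second=1, burst=1)`; a `False` result is the point where the caller queues or rejects the message according to its priority.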
11.7.4 Alert Deduplication
Alerts are deduplicated to prevent notification spam when the same event triggers repeatedly:
| Deduplication Key | Components | Window | Action on Duplicate |
|---|---|---|---|
| Known person | camera_id + person_id + event_type | 5 minutes | Suppress, append counter to original |
| Unknown person | camera_id + event_type | 5 minutes | Suppress, append counter to original |
| System alert | alert_type + source_id | 15 minutes | Suppress, update existing message |
| Watchlist match | camera_id + person_id + watchlist_id | 10 minutes | Suppress, append counter |
When a duplicate is detected, the original message is updated with a counter (e.g., "+3 more detections"), avoiding a flood of similar messages.
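The composite-key suppression can be sketched as below. The class name is hypothetical, the window is assumed to start at the first occurrence of a key (a fixed rather than sliding window), and state is kept in memory for illustration; a shared store would be needed across workers.

```python
class Deduplicator:
    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._seen = {}  # key -> [first_seen_timestamp, duplicate_count]

    def check(self, key: tuple, now: float) -> tuple[bool, int]:
        """Return (is_duplicate, duplicates_so_far) for this key at time `now`."""
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] < self.window:
            entry[1] += 1
            return True, entry[1]
        # First occurrence, or the previous window has expired: start a new one.
        self._seen[key] = [now, 0]
        return False, 0
```

The returned count is what feeds the "+3 more detections" counter on the original message. A known-person key would be a tuple such as `("CAM-01", "person-42", "person_detected")`.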
11.8 Escalation Rules
11.8.1 Escalation Thresholds
When an alert goes unacknowledged, it automatically escalates through up to 3 levels, each with increasing urgency and broader recipient distribution.
| Severity | Level 1 (Primary) | Level 2 (Secondary) | Level 3 (Final) |
|---|---|---|---|
| Critical | 5 minutes | 10 minutes | 20 minutes |
| High | 15 minutes | 30 minutes | 60 minutes |
| Medium | 30 minutes | 60 minutes | 120 minutes |
| Low | 60 minutes | 120 minutes | 240 minutes |
| Info | Never | Never | Never |
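The threshold table can be expressed as a lookup plus a single function. The sketch assumes each threshold is measured from alert creation (consistent with the `{elapsed_minutes}` variable) rather than from the previous escalation level; the names are illustrative.

```python
# Minutes unacknowledged before reaching level 1, 2, and 3, per severity.
ESCALATION_MINUTES = {
    "critical": (5, 10, 20),
    "high": (15, 30, 60),
    "medium": (30, 60, 120),
    "low": (60, 120, 240),
    "info": (),  # info alerts never escalate
}


def escalation_level(severity: str, elapsed_minutes: float,
                     acknowledged: bool = False) -> int:
    """Return the escalation level (0-3) reached after `elapsed_minutes`."""
    if acknowledged:
        return 0  # acknowledgment cancels all pending escalations
    thresholds = ESCALATION_MINUTES[severity.lower()]
    return sum(1 for t in thresholds if elapsed_minutes >= t)
```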
11.8.2 Escalation Actions per Level
| Level | Name | Notification Action | Recipient Expansion | Severity Change |
|---|---|---|---|---|
| 0 | Original | Standard routing rules | Primary recipients only | Original severity |
| 1 | Primary | Re-notify with escalation prefix | Add management group | Increase by one level |
| 2 | Secondary | Force all channels, bypass quiet hours | Add all groups, increase severity | Increase by one level |
| 3 | Final | All-hands notification, include audit trail | All configured recipients | Set to Critical |
Escalation Cancellation: Acknowledgment cancels ALL pending escalation timers for an alert. Acknowledgment can occur via:
- Telegram inline "Acknowledge" button click
- WhatsApp quick reply "Ack"
- Web dashboard "Acknowledge" button
- REST API: POST /api/v1/alerts/{id}/acknowledge
- Chat command: /acknowledge {alert_id}
11.8.3 Escalation Notification Template
⬆️ <b>ESCALATION — Level {level}</b>
Original Alert: {alert_summary}
Alert ID: {alert_id}
First Detected: {first_detected_time}
Current Time: {current_time}
Unacknowledged: {elapsed_minutes} minutes
Escalation Threshold: {threshold_minutes} minutes
This alert has been escalated because it has not been acknowledged.
Please review immediately.
<a href="{acknowledge_url}">✅ Acknowledge Now</a>
<a href="{view_alert_url}">👁 View Details</a>
11.9 Media Attachment Handling
11.9.1 Media Processing Pipeline
When an alert includes media (snapshot images or video clips), a multi-stage processing pipeline ensures the media meets channel-specific requirements:
Original Media (from detection)
│
▼
┌──────────────────┐
│ 1. Store Original│ ──▶ MinIO/S3 (full resolution archival)
│ in Storage │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ 2. Process for │
│ Telegram │
└────────┬─────────┘
│
├──▶ Image: Resize 1280x720, JPEG quality 85, max 10 MB
├──▶ Video: H.264, 1280x720, max 50 MB, max 60 seconds
└──▶ Media Group: Each image < 10 MB, max 10 items
│
▼
┌──────────────────┐
│ 3. Process for │
│ WhatsApp │
└────────┬─────────┘
│
├──▶ Image: Resize 1600x900, JPEG quality 80, max 16 MB
└──▶ Video: H.264, 1280x720, max 16 MB, max 60 seconds
11.9.2 Image Processing Details
| Step | Operation | Parameters |
|---|---|---|
| 1. Load | Open source image | Pillow (PIL) |
| 2. Convert | Convert to RGB | Drop alpha channel if present |
| 3. Resize | Scale to target dimensions | Lanczos resampling |
| 4. Compress | JPEG encoding | Quality: 85 (Telegram), 80 (WhatsApp) |
| 5. Check size | Verify file size under limit | If over limit, reduce quality iteratively |
| 6. Fallback | Aggressive compression | If quality < 50 and still over limit, reduce dimensions |
Iterative Quality Reduction:
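import io

from PIL import Image


def compress_image_to_limit(image, size_limit_mb, channel):
    """Return JPEG bytes no larger than size_limit_mb."""
    # Starting quality per channel: 85 for Telegram, 80 for WhatsApp
    quality = 85 if channel == 'telegram' else 80
    min_quality = 40
    while quality >= min_quality:
        buffer = io.BytesIO()
        image.save(buffer, format='JPEG', quality=quality, optimize=True)
        size_mb = buffer.tell() / (1024 * 1024)
        if size_mb <= size_limit_mb:
            return buffer.getvalue()
        quality -= 5
    # If still over limit, reduce dimensions by 25% and retry
    new_size = (int(image.width * 0.75), int(image.height * 0.75))
    if min(new_size) < 1:
        # Guard against endless recursion when the limit cannot be met
        raise ValueError('image cannot be compressed below the size limit')
    image = image.resize(new_size, Image.LANCZOS)
    return compress_image_to_limit(image, size_limit_mb, channel)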
11.9.3 Video Processing Details
Videos are processed with FFmpeg at a target bitrate calculated from the size limit. The command below is a single-pass, bitrate-capped example; a true two-pass encode runs FFmpeg twice, with -pass 1 and then -pass 2.
# Calculate target bitrate: (size_limit_bytes * 8) / duration_seconds
# Example: 16 MB limit, 10 second clip = (16*1024*1024*8) / 10 = ~13.4 Mbps
# -b:v: target video bitrate; -maxrate / -bufsize: bitrate cap and rate-control buffer
# -c:a / -b:a: audio encoding; -movflags +faststart: web-optimized; -preset: speed/quality tradeoff
ffmpeg -i input.mp4 \
  -c:v libx264 \
  -b:v 10M \
  -maxrate 12M \
  -bufsize 20M \
  -vf "scale=1280:720:force_original_aspect_ratio=decrease" \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  -preset fast \
  -y output.mp4
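The bitrate arithmetic in the comment can be made explicit. The sketch below also subtracts the audio bitrate and applies a safety margin for container overhead; both of those refinements, and the function name, are assumptions layered on the formula above rather than part of the documented pipeline.

```python
def target_video_bitrate(size_limit_mb: float, duration_seconds: float,
                         audio_kbps: int = 128, safety_margin: float = 0.9) -> int:
    """Return a video bitrate in bits/second that should fit under the size limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Total budget: (size_limit_bytes * 8) / duration_seconds
    total_bps = size_limit_mb * 1024 * 1024 * 8 / duration_seconds
    # Reserve headroom for container overhead, then subtract the audio stream.
    video_bps = total_bps * safety_margin - audio_kbps * 1000
    if video_bps <= 0:
        raise ValueError("clip is too long for the size limit")
    return int(video_bps)
```

With no audio and no margin, a 16 MB limit and a 10-second clip give about 13.4 Mbps, matching the worked example; the result would be passed to FFmpeg as the `-b:v` value.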
11.10 Delivery Tracking
11.10.1 Delivery Status Lifecycle
Every notification progresses through a well-defined status lifecycle, tracked in the database for audit and troubleshooting:
| Status | Description | Terminal? |
|---|---|---|
| pending | Queued, waiting to be sent | No |
| processing | Currently being sent to provider | No |
| sent | API request to provider succeeded | No |
| delivered | Provider confirmed delivery to device | No |
| read | Recipient opened/read the message | No |
| engaged | User interacted (button click, reaction) | Yes |
| failed | Permanently failed (non-retryable error) | Yes |
| retrying | Scheduled for retry attempt | No |
| dead_letter | Moved to DLQ after all retries exhausted | Yes |
| suppressed | Blocked by quiet hours or deduplication | Yes |
| cancelled | Cancelled (e.g., acknowledged before send) | Yes |
| expired | Message TTL expired before delivery | Yes |
Status Transitions:
pending → processing → sent → delivered → read → engaged
│ │ │ │
▼ ▼ ▼ ▼
retrying cancelled failed suppressed
│
▼
dead_letter
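The status table can be expressed as an enumeration with an explicit terminal set. This is a sketch only: the class and function names are assumptions, and the guard covers just the one rule the table states unambiguously, namely that a terminal status is never left.

```python
from enum import Enum


class DeliveryStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"
    ENGAGED = "engaged"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD_LETTER = "dead_letter"
    SUPPRESSED = "suppressed"
    CANCELLED = "cancelled"
    EXPIRED = "expired"


# Statuses marked "Yes" in the Terminal? column
TERMINAL_STATUSES = frozenset({
    DeliveryStatus.ENGAGED,
    DeliveryStatus.FAILED,
    DeliveryStatus.DEAD_LETTER,
    DeliveryStatus.SUPPRESSED,
    DeliveryStatus.CANCELLED,
    DeliveryStatus.EXPIRED,
})


def is_terminal(status: DeliveryStatus) -> bool:
    return status in TERMINAL_STATUSES


def advance(current: DeliveryStatus, new: DeliveryStatus) -> DeliveryStatus:
    """Move to a new status, refusing to leave a terminal one."""
    if is_terminal(current):
        raise ValueError(f"{current.value} is terminal; cannot move to {new.value}")
    return new
```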
11.10.2 Dead Letter Queue (DLQ)
Failed notifications that exhaust all retry attempts are moved to a Redis-backed Dead Letter Queue. Admin users can review and manage DLQ entries through the web dashboard.
| DLQ Feature | Description |
|---|---|
| Storage | Redis sorted set, ordered by failure timestamp |
| Retention | 30 days |
| View | Filterable by channel, error type, date range |
| Actions | Retry individual, Retry all (batch), Discard, Export |
| Alert | Daily digest of DLQ count; alert if > 10 entries |
| Auto-retry | Optional: automatically retry DLQ entries every 6 hours |
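The storage, retention, and alert rows above can be illustrated with an in-memory stand-in for the Redis sorted set. This is a sketch under stated assumptions: the class and method names are hypothetical, and a sorted Python list replaces Redis. In Redis the same operations would map to ZADD (add with the failure timestamp as score), ZREMRANGEBYSCORE (purge) and ZCARD (count).

```python
from bisect import insort

RETENTION_SECONDS = 30 * 24 * 3600  # 30-day retention from the table
ALERT_THRESHOLD = 10                # alert if more than 10 entries


class DeadLetterQueue:
    """In-memory stand-in for a sorted set ordered by failure timestamp."""

    def __init__(self) -> None:
        self._entries = []  # sorted list of (failed_at, notification_id, error)

    def add(self, notification_id: str, error: str, failed_at: float) -> None:
        insort(self._entries, (failed_at, notification_id, error))

    def purge_expired(self, now: float) -> int:
        """Drop entries older than the retention period; return how many were removed."""
        cutoff = now - RETENTION_SECONDS
        before = len(self._entries)
        self._entries = [e for e in self._entries if e[0] >= cutoff]
        return before - len(self._entries)

    def needs_alert(self) -> bool:
        return len(self._entries) > ALERT_THRESHOLD

    def oldest_first(self) -> list:
        return [notification_id for _, notification_id, _ in self._entries]
```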
11.11 API Endpoints Summary
11.11.1 REST Endpoints (13 endpoints)
| # | Method | Endpoint | Purpose | Auth |
|---|---|---|---|---|
| 1 | GET | /api/v1/notifications/rules | List all routing rules | Admin |
| 2 | POST | /api/v1/notifications/rules | Create new routing rule | Admin |
| 3 | GET | /api/v1/notifications/rules/{id} | Get specific rule | Admin |
| 4 | PUT | /api/v1/notifications/rules/{id} | Update routing rule | Admin |
| 5 | DELETE | /api/v1/notifications/rules/{id} | Delete routing rule | Admin |
| 6 | GET | /api/v1/notifications/templates | List message templates | Admin |
| 7 | POST | /api/v1/notifications/templates | Create/update template | Admin |
| 8 | GET | /api/v1/notifications/delivery-status/{alert_id} | Get delivery status for alert | Operator+ |
| 9 | GET | /api/v1/notifications/{id}/status | Single notification status | Operator+ |
| 10 | POST | /api/v1/notifications/{id}/retry | Manual retry of failed notification | Admin |
| 11 | GET | /api/v1/notifications/dlq | List dead letter queue | Admin |
| 12 | POST | /api/v1/notifications/dlq/retry-all | Retry all DLQ entries | Admin |
| 13 | POST | /api/v1/notifications/dlq/clear | Clear all DLQ entries | Admin |
11.11.2 Alert Management Endpoints
| # | Method | Endpoint | Purpose | Auth |
|---|---|---|---|---|
| 1 | GET | /api/v1/alerts | List alerts with filters | Operator+ |
| 2 | GET | /api/v1/alerts/{id} | Get single alert details | Operator+ |
| 3 | POST | /api/v1/alerts/{id}/acknowledge | Acknowledge alert | Operator+ |
| 4 | POST | /api/v1/alerts/{id}/resolve | Resolve alert | Operator+ |
| 5 | POST | /api/v1/alerts/{id}/ignore | Ignore alert | Operator+ |
| 6 | POST | /api/v1/alerts/{id}/false-positive | Mark as false positive | Operator+ |
| 7 | POST | /api/v1/alerts/bulk/acknowledge | Bulk acknowledge | Operator+ |
| 8 | POST | /api/v1/alerts/bulk/ignore | Bulk ignore | Operator+ |
11.11.3 WebSocket Endpoints (2 endpoints)
| Endpoint | Purpose | Authentication |
|---|---|---|
| WS /api/v1/notifications/live | Real-time notification stream for connected clients | JWT token in query parameter |
| WS /api/v1/alerts/stream | Live alert feed for operator dashboards | JWT token in query parameter |
11.11.4 Webhook Endpoints (2 endpoints)
| Endpoint | Source | Purpose |
|---|---|---|
| POST /webhooks/telegram | Telegram servers | Receive delivery receipts, callback queries, chat events |
| POST /webhooks/whatsapp | Meta servers | Receive message status updates, incoming messages |
Webhook Security:
| Measure | Implementation |
|---|---|
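| Telegram | Secret token check: compare the X-Telegram-Bot-Api-Secret-Token header against the secret_token registered via setWebhook (Telegram does not sign webhook payloads) |
| WhatsApp | HMAC-SHA256 signature verification (X-Hub-Signature-256 header) using app secret |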
| IP allowlisting | Only accept requests from Telegram/Meta IP ranges |
| Replay protection | Reject messages with timestamps older than 5 minutes |
| Rate limiting | 100 requests per minute per source IP |
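The app-secret signature check and the 5-minute replay window from the table can be sketched as follows. This is a minimal illustration, assuming the Meta-style header format `X-Hub-Signature-256: sha256=<hex digest>` computed over the raw request body; the function names and the way the event timestamp is obtained are assumptions.

```python
import hashlib
import hmac

MAX_AGE_SECONDS = 5 * 60  # replay window from the table


def verify_hub_signature(app_secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check an 'sha256=<hex>' signature header against the raw request body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    provided = header_value[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


def is_fresh(event_timestamp: float, now: float) -> bool:
    """Reject events older than the replay window."""
    return now - event_timestamp <= MAX_AGE_SECONDS
```

The signature must be computed over the exact bytes received, before any JSON parsing, since re-serialization can change the payload.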
Section 12: Security Design
12.1 Security Architecture Overview
The Sentinel AI Surveillance Platform implements defense-in-depth security across seven distinct layers. Every component — from network perimeter to data storage — has been designed with security as a primary consideration, reflecting the sensitive nature of surveillance data, biometric information, and the critical safety function the system performs.
┌──────────────────────────────────────────────────────────────────────────────┐
│ DEFENSE IN DEPTH ARCHITECTURE │
│ │
│ LAYER 1: PERIMETER │
│ ───────────────── │
│ AWS WAF v2 │ Geo-restriction │ DDoS protection │ Rate limiting │
│ │
│ LAYER 2: TRANSPORT │
│ ───────────────── │
│ TLS 1.3 │ mTLS internal │ WireGuard ChaCha20-Poly1305 │ Certificate mgmt │
│ │
│ LAYER 3: AUTHENTICATION & AUTHORIZATION │
│ ───────────────────────────────────────── │
│ Argon2id │ JWT ES256 │ TOTP MFA │ RBAC 4 roles │ API keys │
│ │
│ LAYER 4: APPLICATION SECURITY │
│ ──────────────────────────── │
│ Input validation │ Parameterized queries │ CSP │ CSRF │ CORS │ File upload │
│ │
│ LAYER 5: DATA SECURITY │
│ ──────────────────── │
│ AES-256-GCM at rest │ Field-level encryption │ Signed URLs │ Key rotation │
│ │
│ LAYER 6: NETWORK SEGMENTATION │
│ ─────────────────────────── │
│ VPC private subnets │ Security groups │ Network Policies │ Firewall rules │
│ │
│ LAYER 7: AUDIT & MONITORING │
│ ───────────────────────── │
│ Hash-chain audit log │ Real-time alerts │ CloudTrail │ Flow Logs │
└──────────────────────────────────────────────────────────────────────────────┘
12.2 SSL/TLS Configuration
12.2.1 Protocol and Cipher Suite Requirements
All external-facing services enforce strong TLS configuration with modern cipher suites:
| Setting | Value | Rationale |
|---|---|---|
| Minimum TLS Version | TLS 1.2 | Fallback for older clients; TLS 1.3 preferred |
| Preferred TLS Version | TLS 1.3 | Fastest, most secure handshake |
| Cipher Suites (TLS 1.2) | ECDHE-ECDSA-AES256-GCM-SHA384 | Forward secrecy, AES-GCM authenticated encryption |
| Cipher Suites (TLS 1.2) | ECDHE-RSA-AES256-GCM-SHA384 | Same with RSA certificates |
| Cipher Suites (TLS 1.2) | ECDHE-ECDSA-CHACHA20-POLY1305 | Mobile-optimized cipher |
| Cipher Suites (TLS 1.2) | ECDHE-RSA-CHACHA20-POLY1305 | Mobile-optimized with RSA |
| Cipher Suites (TLS 1.3) | TLS_AES_256_GCM_SHA384 | Preferred TLS 1.3 cipher (RFC 8446 makes only TLS_AES_128_GCM_SHA256 mandatory to implement) |
| Cipher Suites (TLS 1.3) | TLS_CHACHA20_POLY1305_SHA256 | Alternative TLS 1.3 cipher |
| Disabled Ciphers | CBC mode, RC4, 3DES, DES, MD5, SHA1, RSA key exchange (no forward secrecy) | Known weaknesses |
| HSTS | max-age=63072000; includeSubDomains; preload | 2-year HSTS with preload eligibility |
| OCSP Stapling | Enabled | Reduces certificate validation latency |
| Certificate Provider | Let's Encrypt (ACME v2) | Free, automated, trusted |
| Auto-renewal | At 60 days of certificate age (30 days before the 90-day expiry) | Ensures a 30-day buffer |
| Certificate Transparency | Required | All certificates publicly logged |
12.2.2 mTLS for Internal Service Communication
All inter-service communication uses mutual TLS (mTLS) with client certificate verification. This means both the client and server must present valid certificates signed by the internal Certificate Authority.
| Parameter | Value |
|---|---|
| Internal CA | Self-managed ECDSA P-256 CA |
| Certificate lifetime | 90 days (auto-rotated) |
| Verification mode | Required (reject if no client cert) |
| Revocation | CRL + OCSP |
| Service identity | SPIFFE URI in certificate Subject Alternative Name |
Benefits of mTLS:
- Even if network boundaries are breached, unauthorized services cannot access internal APIs
- Every service-to-service call is authenticated and encrypted
- Certificates provide strong service identity (not just IP-based)
- No shared secrets between services (except Vault tokens)
12.2.3 TLS Configuration Code Example
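# FastAPI TLS configuration
import ssl

import uvicorn
from fastapi import FastAPI

app = FastAPI()

# TLS settings passed to uvicorn.run()
ssl_config = {
    "ssl_keyfile": "/certs/server.key",
    "ssl_certfile": "/certs/server.crt",
    "ssl_ca_certs": "/certs/ca.crt",     # CA bundle used to verify client certs (mTLS)
    "ssl_cert_reqs": ssl.CERT_REQUIRED,  # Reject connections without a client cert
    # uvicorn exposes no minimum-version option. These suites are defined only
    # for TLS 1.2, so TLS 1.0/1.1 cannot negotiate; TLS 1.3 suites are managed
    # separately by OpenSSL and stay enabled.
    "ssl_ciphers": "ECDHE-ECDSA-AES256-GCM-SHA384:"
                   "ECDHE-RSA-AES256-GCM-SHA384:"
                   "ECDHE-ECDSA-CHACHA20-POLY1305:"
                   "ECDHE-RSA-CHACHA20-POLY1305",
}

if __name__ == "__main__":
    # Host and port are illustrative values, not taken from the deployment plan
    uvicorn.run(app, host="0.0.0.0", port=8443, **ssl_config)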
12.3 Authentication
12.3.1 Password Policy
| Requirement | Value | Enforcement |
|---|---|---|
| Minimum length | 12 characters | Hard validation |
| Complexity | At least one uppercase, one lowercase, one digit, one special character | Regex validation |
| Password history | Last 12 passwords cannot be reused | Database check |
| Hashing algorithm | Argon2id (memory-hard, resistant to GPU cracking) | Passwords never stored in plaintext |
| Argon2id parameters | Time cost: 3, Memory: 64MB, Parallelism: 4 | Tuned for 500ms hash time |
| HaveIBeenPwned check | Enabled for all new passwords | k-anonymity API (no full password sent) |
| Maximum age | 90 days | Configurable; reminder at 75 days |
| Lockout after failures | 5 failed attempts | 30-minute lockout |
| Password change | Users cannot reuse current password | Immediate validation |
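The length, complexity, and HaveIBeenPwned rows can be sketched with the standard library. This is an illustration under assumptions: the function names are hypothetical, "special character" is taken to mean any non-alphanumeric ASCII character, and the network call to the range API is left out. The k-anonymity scheme sends only the first 5 hex characters of the SHA-1 digest and compares the remaining 35 locally.

```python
import hashlib
import re

MIN_LENGTH = 12

# Complexity rules from the policy table, checked in this order
_COMPLEXITY_RULES = [
    ("an uppercase letter", re.compile(r"[A-Z]")),
    ("a lowercase letter", re.compile(r"[a-z]")),
    ("a digit", re.compile(r"[0-9]")),
    ("a special character", re.compile(r"[^A-Za-z0-9]")),
]


def policy_violations(password: str) -> list:
    """Return the list of unmet requirements (empty list means the password passes)."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"at least {MIN_LENGTH} characters")
    problems.extend(name for name, rx in _COMPLEXITY_RULES if not rx.search(password))
    return problems


def hibp_range_parts(password: str) -> tuple:
    """Split the SHA-1 digest into the 5-char prefix sent to the API and the local suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]
```

Only the prefix leaves the system; the suffix is matched against the returned candidate list on the server side of the application.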
12.3.2 JWT Token Configuration
| Parameter | Value | Notes |
|---|---|---|
| Signing algorithm | ES256 (ECDSA with P-256 curve) | Smaller keys and signatures than RS256 at comparable security |
| Access token lifetime | 15 minutes | Short-lived for security |
| Refresh token lifetime | 7 days | Long-lived but revocable |
| Key rotation | Every 180 days | Dual-key support for zero-downtime rotation |
| Key storage | HashiCorp Vault | Private key never exposed to application filesystem |
| Token binding | Session ID + browser fingerprint | Detects token theft/reuse |
| Claims | sub, iss, aud, exp, iat, jti, role, permissions, mfa_verified, session_id | Standard + custom claims |
| Issuer | sentinel-ai | Verified by all services |
| Audience | sentinel-api | Scope-limited |
JWT Token Structure:
{
  "header": {
    "alg": "ES256",
    "typ": "JWT",
    "kid": "key-2025-01"
  },
  "payload": {
    "sub": "user-uuid-here",
    "iss": "sentinel-ai",
    "aud": "sentinel-api",
    "exp": 1705500000,
    "iat": 1705499100,
    "jti": "unique-token-id",
    "role": "operator",
    "permissions": ["alerts:view", "alerts:acknowledge", "cameras:view"],
    "mfa_verified": true,
    "session_id": "sess-uuid-here"
  }
}
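Once the ES256 signature has been verified, each service still has to check the claims. The sketch below covers that second step only and assumes signature verification happened elsewhere; the function name and the role strings `super_admin` and `admin` are assumptions (the sample payload confirms only `operator`).

```python
EXPECTED_ISSUER = "sentinel-ai"
EXPECTED_AUDIENCE = "sentinel-api"
MAX_ACCESS_LIFETIME_S = 15 * 60                 # access token lifetime from the table
MFA_REQUIRED_ROLES = {"super_admin", "admin"}   # hypothetical role spellings


def claim_errors(payload: dict, now: int) -> list:
    """Return a list of claim problems for an already signature-verified token."""
    errors = []
    if payload.get("iss") != EXPECTED_ISSUER:
        errors.append("unexpected issuer")
    if payload.get("aud") != EXPECTED_AUDIENCE:
        errors.append("unexpected audience")
    exp, iat = payload.get("exp"), payload.get("iat")
    if not isinstance(exp, int) or not isinstance(iat, int):
        errors.append("missing exp/iat")
    else:
        if now >= exp:
            errors.append("token expired")
        if exp - iat > MAX_ACCESS_LIFETIME_S:
            errors.append("lifetime exceeds 15 minutes")
    if payload.get("role") in MFA_REQUIRED_ROLES and payload.get("mfa_verified") is not True:
        errors.append("MFA required for this role")
    return errors
```

In the sample payload above, `exp - iat` is 900 seconds, which matches the 15-minute access token lifetime.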
12.3.3 Multi-Factor Authentication (MFA)
| Parameter | Value |
|---|---|
| Method | TOTP (Time-based One-Time Password) per RFC 6238 |
| Issuer label | "Sentinel AI Surveillance" |
| Algorithm | SHA-1 (for compatibility) |
| Digit length | 6 digits |
| Time step | 30 seconds |
| Valid window | 1 step before and after current (3-step tolerance) |
| Recovery codes | 10 single-use codes generated at setup |
| Enforced for | Super Admin, Admin roles (mandatory) |
| Optional for | Operator, Viewer roles (recommended) |
| QR code format | otpauth://totp/Sentinel%20AI:{username}?secret={secret}&issuer=Sentinel%20AI |
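The parameters above (SHA-1, 6 digits, 30-second step, one step of tolerance either side) correspond to the RFC 6238 algorithm, which can be sketched with the standard library alone. The function names are illustrative; the secret is passed as raw bytes, so a Base32 secret from the QR code would need decoding first.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter derived from the current time step."""
    return hotp(secret, unix_time // step, digits)


def verify_totp(secret: bytes, candidate: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus `window` steps before and after it."""
    counter = unix_time // step
    for delta in range(-window, window + 1):
        if counter + delta < 0:
            continue
        expected = hotp(secret, counter + delta)
        if hmac.compare_digest(expected.encode("utf-8"), candidate.encode("utf-8")):
            return True
    return False
```

A production verifier would also record the last accepted counter per user so that a code cannot be replayed within its validity window.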
MFA Enforcement Matrix:
| Role | MFA Required | Can Disable |
|---|---|---|
| Super Admin | Yes | No |
| Admin | Yes | No |
| Operator | No (Recommended) | Yes |
| Viewer | No | Yes |
12.4 Role-Based Access Control (RBAC)
12.4.1 Role Definitions
| Role | Level | Description | Typical Users | Count |
|---|---|---|---|---|
| Super Admin | L1 | Full system access; can manage other admins | CISO, CTO, Platform Lead | 1-2 |
| Admin | L2 | Administrative functions; day-to-day management | Security Manager, IT Manager | 2-4 |
| Operator | L3 | Day-to-day surveillance operations | Security guards, SOC analysts | 5-20 |
| Viewer | L4 | Read-only access for review and audit | Auditors, Management | 2-10 |
12.4.2 Permission Matrix (30+ Permissions)
| Permission | Super Admin | Admin | Operator | Viewer |
|---|---|---|---|---|
| users:full_access | Y | N | N | N |
| users:manage (create/edit/deactivate) | Y | Y | N | N |
| users:view (list, details) | Y | Y | Y | Y |
| users:reset_password | Y | Y | N | N |
| users:reset_mfa | Y | Y | N | N |
| cameras:full_access | Y | N | N | N |
| cameras:manage (add/edit/remove) | Y | Y | N | N |
| cameras:view (list, status) | Y | Y | Y | Y |
| cameras:control (PTZ, restart stream) | Y | Y | Y | N |
| cameras:configure_zones | Y | Y | N | N |
| alerts:manage (edit rules, bulk actions) | Y | Y | N | N |
| alerts:view (list, filter, search) | Y | Y | Y | Y |
| alerts:acknowledge | Y | Y | Y | N |
| alerts:resolve | Y | Y | Y | N |
| alerts:mark_false_positive | Y | Y | Y | N |
| persons:full_access | Y | N | N | N |
| persons:manage (create/edit/delete) | Y | Y | N | N |
| persons:view (gallery, profiles) | Y | Y | Y | Y |
| persons:name_unknown | Y | Y | Y | N |
| persons:merge | Y | Y | Y | N |
| watchlists:manage (create/edit/delete) | Y | Y | N | N |
| watchlists:view (list, members) | Y | Y | Y | Y |
| watchlists:add_remove_members | Y | Y | Y | N |
| ai_settings:manage (change defaults) | Y | Y | N | N |
| ai_settings:view (see current settings) | Y | Y | Y | Y |
| ai_settings:adjust (operator adjustments) | Y | Y | Y | N |
| reports:full_access | Y | N | N | N |
| reports:view (all reports) | Y | Y | Y | Y |
| reports:export | Y | Y | Y | N |
| system:full_access | Y | N | N | N |
| system:manage (config changes) | Y | Y | N | N |
| system:view (health, status) | Y | Y | Y | Y |
| audit:view (audit logs) | Y | Y | N | N |
| notifications:manage (routing rules) | Y | Y | N | N |
| storage:manage (retention policies) | Y | Y | N | N |
| storage:view (usage, reports) | Y | Y | Y | Y |
| privacy:manage (GDPR actions) | Y | Y | N | N |
| privacy:view (consent status) | Y | Y | Y | Y |
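A permission check over this matrix reduces to a lookup with default deny. The sketch below encodes a handful of rows from the matrix; the enum, the role value strings, and the function name are assumptions, and a full implementation would load all rows.

```python
from enum import Enum


class Role(Enum):
    SUPER_ADMIN = "super_admin"   # value spellings are assumptions
    ADMIN = "admin"
    OPERATOR = "operator"
    VIEWER = "viewer"


_ALL = frozenset(Role)
_ADMINS = frozenset({Role.SUPER_ADMIN, Role.ADMIN})
_OPERATORS_UP = _ADMINS | {Role.OPERATOR}

# Subset of the permission matrix: permission -> roles marked "Y"
PERMISSIONS = {
    "system:full_access": frozenset({Role.SUPER_ADMIN}),
    "users:manage": _ADMINS,
    "audit:view": _ADMINS,
    "alerts:acknowledge": _OPERATORS_UP,
    "cameras:control": _OPERATORS_UP,
    "alerts:view": _ALL,
}


def has_permission(role: Role, permission: str) -> bool:
    """Default deny: a permission missing from the table is granted to nobody."""
    return role in PERMISSIONS.get(permission, frozenset())
```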
12.4.3 Resource-Level Permissions
Beyond global permissions, the system supports resource-level access control:
| Resource Type | Granularity | Example |
|---|---|---|
| Cameras | Per-camera access | Operator A can only view CAM-01, CAM-02 |
| Zones | Per-zone access | Operator B can only view "entrance" zone |
| Alerts | Per-camera origin | Viewer can only see alerts from specific cameras |
| Persons | Per-department | HR can only view employee records |
| Watchlists | Per-watchlist | Security can only view "blacklist", not "vip" |
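Resource-level scoping layers a second check on top of the role check. A minimal sketch for the per-camera case, assuming a scope object attached to each user; the class name, field name, and the convention that `None` means unrestricted are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ResourceScope:
    # None means no per-camera restriction; an empty set means no cameras at all
    camera_ids: Optional[FrozenSet[str]] = None


def can_view_camera(scope: ResourceScope, camera_id: str) -> bool:
    """Apply the per-camera restriction after the global cameras:view check has passed."""
    return scope.camera_ids is None or camera_id in scope.camera_ids


# "Operator A can only view CAM-01, CAM-02"
operator_a = ResourceScope(camera_ids=frozenset({"CAM-01", "CAM-02"}))
print(can_view_camera(operator_a, "CAM-01"), can_view_camera(operator_a, "CAM-03"))
```

The same shape extends to zones, departments, and watchlists by adding one optional set per resource type.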
12.5 VPN and Network Security
12.5.1 WireGuard VPN Configuration
WireGuard provides the encrypted tunnel between cloud infrastructure and the edge site:
| Parameter | Value | Notes |
|---|---|---|
| Protocol | WireGuard | Modern, simple, fast VPN |
| Port | UDP 51820 | Single port, firewall-friendly |
| Authentication | Curve25519 key pairs + Preshared Key (PSK) | Defense in depth |
| Encryption | ChaCha20-Poly1305 | Fast on hardware without AES-NI |
| Key exchange | Curve25519 elliptic curve | 128-bit security |
| Tunnel network | 10.200.0.0/24 | Dedicated VPN subnet |
| Cloud endpoint | 10.200.0.1/32 | Single IP for cloud side |
| Edge endpoint | 10.200.0.2/32 | Single IP for edge side |
| AllowedIPs (cloud) | 10.200.0.2/32, 192.168.29.0/24 | Edge + camera network only |
| AllowedIPs (edge) | 10.100.0.0/16, 10.200.0.0/24 | Full cloud VPC + VPN |
| Keepalive | 25 seconds | Prevents NAT timeout |
| Key rotation | 365 days | Annual rotation via maintenance window |
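The AllowedIPs rows act as a routing filter: a peer only sends (and accepts) traffic for addresses inside those ranges. The effect can be checked offline with the standard `ipaddress` module. This is an illustrative sketch; the constant and function names are assumptions, and the ranges are copied from the table.

```python
import ipaddress

# AllowedIPs as listed in the table above
CLOUD_ALLOWED_IPS = ["10.200.0.2/32", "192.168.29.0/24"]
EDGE_ALLOWED_IPS = ["10.100.0.0/16", "10.200.0.0/24"]


def is_routed(destination: str, allowed_ips: list) -> bool:
    """True if the destination falls inside any AllowedIPs range."""
    address = ipaddress.ip_address(destination)
    return any(address in ipaddress.ip_network(network) for network in allowed_ips)


# The cloud side can reach the DVR on the camera LAN through the tunnel
print(is_routed("192.168.29.200", CLOUD_ALLOWED_IPS))
# The edge side has no tunnel route to arbitrary internet hosts
print(is_routed("8.8.8.8", EDGE_ALLOWED_IPS))
```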
12.5.2 Network Segmentation Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ NETWORK ARCHITECTURE │
│ │
│ INTERNET │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ AWS WAF │ │
│ │ + ALB │ │
│ └──────┬───────┘ │
│ │ │
│ ═══════╪════════════════ AWS CLOUD VPC: 10.100.0.0/16 ═══════════════════ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ PUBLIC SUBNET: 10.100.1.0/24 │ │
│ │ │ - ALB (Application Load Balancer) │ │
│ │ │ - NAT Gateway │ │
│ │ │ - WireGuard VPN Gateway (10.200.0.1) │ │
│ │ │ - Bastion Host (emergency SSH, admin IPs only) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ └────▶┌──────────────────────────────────────────────────────┐ │
│ │ PRIVATE SUBNET: 10.100.2.0/24 (App Tier) │ │
│ │ - EKS Worker Nodes (API, AI, Web pods) │ │
│ │ - Stream Ingestion Service │ │
│ │ - Alert Engine │ │
│ │ - Notification Service │ │
│ └──────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ DATA SUBNET: 10.100.3.0/24 (No Internet) │ │
│ │ │ - RDS PostgreSQL (Multi-AZ) │ │
│ │ │ - ElastiCache Redis Cluster │ │
│ │ │ - Amazon MSK Kafka │ │
│ │ │ - NO INTERNET ACCESS (VPC endpoints only) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ MONITORING SUBNET: 10.100.4.0/24 │ │
│ │ │ - Prometheus, Grafana, Alertmanager │ │
│ │ │ - Loki (log aggregation) │ │
│ │ │ - Jaeger (distributed tracing) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ════════════╪══════════════════════════════════════════════════════════ │
│ │ │
│ │ WireGuard VPN Tunnel (UDP 51820) │
│ │ │
│ ════════════╪══════════════════════════════════════════════════════════ │
│ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │
│ │ │ EDGE GATEWAY: 192.168.29.5/24 (Intel NUC) │ │
│ │ │ OS: Ubuntu Server 22.04 LTS (minimal) │ │
│ │ │ - Docker Compose stack │ │
│ │ │ - WireGuard Client (10.200.0.2) │ │
│ │ │ - Local MinIO (hot storage) │ │
│ │ │ - Redis (local cache) │ │
│ │ │ - Video Capture Service │ │
│ │ │ - AI Inference (edge models) │ │
│ │ └──────────────────────────────────────────────────────┘ │
│ │ │ │
│ │ ┌─────────────────────────┴──────────────────────┐ │
│ │ │ CAMERA LAN: 192.168.29.0/24 │ │
│ │ │ - CP PLUS DVR: 192.168.29.200 (8 channels) │ │
│ │ │ - RTSP streams on port 554 │ │
│ │ │ - NO INTERNET ACCESS │ │
│ │ │ - NO ROUTE TO CLOUD (only via edge gateway) │ │
│ │ └────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
12.5.3 Firewall Rules
Edge Gateway Firewall (iptables):
| Direction | Protocol | Port | Source | Destination | Action | Purpose |
|---|---|---|---|---|---|---|
| IN | TCP | 22 | Admin IP range only | Edge gateway | ACCEPT | SSH management |
| IN | UDP | 51820 | Cloud VPN IP | Edge gateway | ACCEPT | WireGuard tunnel |
| IN | TCP | 8080 | Local LAN only | Edge gateway | ACCEPT | Admin UI |
| IN | — | — | Any | Edge gateway | DROP | Default deny |
| OUT | TCP | 443 | Edge gateway | AWS S3 endpoint | ACCEPT | Cloud storage sync |
| OUT | UDP | 51820 | Edge gateway | Cloud VPN IP | ACCEPT | WireGuard tunnel |
| OUT | TCP | 8080 | Edge gateway | Local LAN | ACCEPT | Internal services |
| OUT | — | — | Edge gateway | Internet | DROP | No direct internet |
Cloud Firewall (AWS Security Groups):
| Direction | Protocol | Port | Source | Action | Purpose |
|---|---|---|---|---|---|
| IN | TCP | 443 | 0.0.0.0/0 | ACCEPT | Public HTTPS |
| IN | UDP | 51820 | Edge gateway IP | ACCEPT | WireGuard |
| IN | TCP | 5432 | App security group | ACCEPT | PostgreSQL |
| IN | TCP | 6379 | App security group | ACCEPT | Redis |
| IN | TCP | 9092 | App security group | ACCEPT | Kafka |
| IN | TCP | 22 | Admin IPs only | ACCEPT | Bastion SSH |
| IN | — | — | Any | DROP | Default deny |
12.6 Secret Management
12.6.1 Vault Integration
All secrets are stored in HashiCorp Vault with automatic rotation policies:
| Secret Type | Encryption | Rotation Frequency | Rotation Method | Access Pattern |
|---|---|---|---|---|
| Database passwords | AES-256-GCM | 90 days | Terraform + Vault dynamic credentials | Short-lived (1-hour TTL) |
| JWT signing keys | AES-256-GCM | 180 days | Dual-key grace period | Zero-downtime rotation |
| Internal API keys | AES-256-GCM | 90 days | Zero-downtime rotation | Automated |
| Telegram bot tokens | AES-256-GCM | 180 days | Regenerate via BotFather | Semi-automated |
| WhatsApp API tokens | AES-256-GCM | 180 days | Regenerate via Meta Business Manager | Semi-automated |
| DVR credentials | AES-256-GCM | 180 days | Manual via DVR web UI | Manual |
| TLS certificates | ACME auto | 60 days | cert-manager + Let's Encrypt | Fully automated |
| WireGuard keys | AES-256-GCM | 365 days | Maintenance window rotation | Scripted |
| Backup encryption keys | AES-256-GCM | 365 days | Re-encrypt all backups | Automated |
| Session secrets | AES-256-GCM | On security incident | Immediate revocation | Admin trigger |
12.6.2 Dynamic Database Credentials
Instead of static database passwords, the system uses Vault's dynamic credential engine:
Application → Vault (request db credentials)
│
▼
Vault creates temporary DB user
(TTL: 1 hour, auto-revoke)
│
▼
Application receives credentials
Uses them for DB connections
│
▼
After TTL expires → Vault revokes DB user
Application requests new credentials
Benefits:
- No long-lived database passwords in application configuration
- Each application instance gets unique credentials
- Automatic credential rotation without application restart
- Full audit trail of credential issuance and revocation
- Instant credential revocation on compromise
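On the application side, the flow above amounts to caching a lease and requesting a new one shortly before the TTL runs out. The sketch below shows that logic without any Vault client: `fetch` stands in for whatever call obtains credentials, and the class names and the 5-minute renewal margin are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    expires_at: float  # Unix time at which the temporary DB user is revoked


class CredentialCache:
    """Hand out the current lease, fetching a new one before the old one expires."""

    def __init__(self, fetch: Callable[[], Lease],
                 renew_margin_s: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._margin = renew_margin_s
        self._clock = clock
        self._lease: Optional[Lease] = None

    def get(self) -> Lease:
        now = self._clock()
        if self._lease is None or now >= self._lease.expires_at - self._margin:
            self._lease = self._fetch()
        return self._lease
```

Connection pools need to pick up the new credentials as well; long-lived connections opened with a revoked user will fail once the lease ends.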
12.6.3 Field-Level Encryption
PII and biometric data in the database uses AES-256-GCM field-level encryption:
| Field Category | Example Fields | Encryption |
|---|---|---|
| Personal identification | name_encrypted, email_encrypted, phone_encrypted | AES-256-GCM per-field |
| Employment data | employee_id_encrypted, department_encrypted | AES-256-GCM per-field |
| Biometric data | face_encoding_encrypted (512-D vector) | AES-256-GCM per-field |
| Media metadata | location_encrypted (GPS coordinates) | AES-256-GCM per-field |
Encryption Architecture:
Application receives plaintext data
│
▼
[Encrypt field-by-field using Vault KMS]
│
▼
Store ciphertext in PostgreSQL
│
▼
[Decrypt only in application layer when needed]
│
▼
Decrypted data never logged, never cached
12.7 Audit Logging
12.7.1 Tamper-Resistant Hash-Chain
The audit log implements a cryptographically linked chain to ensure integrity:
| Field | Purpose | Example |
|---|---|---|
| event_id | Unique UUID for each audit event | 550e8400-e29b-41d4-a716-446655440000 |
| timestamp | ISO 8601 timestamp | 2025-01-16T14:32:15Z |
| event_type | Category of event | user_login, person_viewed, alert_acknowledged |
| actor_id | User who performed the action | user-uuid-here |
| actor_role | Role of the actor at the time | operator |
| resource_type | Type of resource accessed | person, camera, alert |
| resource_id | Specific resource identifier | person-123, cam-01 |
| action | Action performed | view, edit, delete, create |
| result | Success or failure | success, failure, denied |
| ip_address | Source IP address | 10.100.2.15 |
| session_id | Session identifier | sess-uuid-here |
| previous_hash | SHA-256 hash of the previous entry | a3f5c2... |
| entry_hash | SHA-256 hash of current entry content | b7e1d9... |
| signature | ECDSA signature of the entry hash | 30450221... |
Chain Verification: Any modification to historical entries invalidates all subsequent hashes and signatures, making tampering detectable.
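The chaining and verification can be sketched with the standard library. This is an illustration under stated assumptions: the canonical JSON serialization, the all-zero genesis hash, and the choice to mix `previous_hash` into the hashed content are not specified in this document, and the ECDSA signature step is omitted.

```python
import hashlib
import json
from typing import Optional

GENESIS_HASH = "0" * 64  # assumed value of previous_hash for the first entry


def entry_hash(event: dict, previous_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical serialization of the event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> None:
    previous = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    chain.append({
        "event": event,
        "previous_hash": previous,
        "entry_hash": entry_hash(event, previous),
    })


def first_invalid_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None if intact."""
    previous = GENESIS_HASH
    for index, record in enumerate(chain):
        if record["previous_hash"] != previous:
            return index
        if record["entry_hash"] != entry_hash(record["event"], previous):
            return index
        previous = record["entry_hash"]
    return None
```

Because each hash feeds into the next, an attacker who edits one entry would have to recompute every later hash, and the per-entry signatures are what prevent that recomputation from going unnoticed.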
12.7.2 Log Retention Policy
| Log Type | Online Retention | Archive Retention | Storage Type |
|---|---|---|---|
| Authentication events | 1 year | 6 years | WORM (Write-Once-Read-Many) |
| Authorization decisions | 1 year | 6 years | WORM |
| Person data modifications | 1 year | 6 years | WORM |
| Alert actions (ack, resolve) | 1 year | 3 years | Standard |
| Configuration changes | 2 years | 5 years | Standard |
| Security events | 1 year | 6 years | WORM |
| System health events | 90 days | 1 year | Standard |
| API access logs | 90 days | 1 year | Standard |
12.7.3 Real-Time Security Alerting
Automated detection rules trigger alerts on suspicious patterns:
| Rule ID | Rule Name | Condition | Auto-Response |
|---|---|---|---|
| SEC-001 | Brute force login | > 5 failed logins from same IP in 5 minutes | Block IP for 1 hour; alert security team |
| SEC-002 | Credential stuffing | > 10 unique usernames from same IP in 5 minutes | Block IP for 24 hours; alert security team |
| SEC-003 | Impossible travel | Logins > 500 km apart within 1 hour | Force MFA re-verification; alert security team |
| SEC-004 | Privilege escalation | > 20 admin actions in 10 minutes from new user | Alert security team; log for review |
| SEC-005 | Data exfiltration | > 1 GB downloaded by single user in 1 hour | Suspend account; alert security team |
| SEC-006 | Off-hours admin | Admin action between 22:00-06:00 | Log + notify security manager |
| SEC-007 | MFA bypass attempt | > 3 MFA failures then success without MFA | Block account; alert security team |
| SEC-008 | Suspicious media access | > 50 media downloads by non-security role | Alert security team |
| SEC-009 | Unknown device login | Login from unrecognized device fingerprint | Require MFA; notify user |
| SEC-010 | Concurrent sessions | > 3 concurrent sessions for same user | Force logout of oldest session |
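Rules such as SEC-001 are sliding-window counters keyed by source. A minimal in-memory sketch of that rule follows; the class and method names are assumptions, and a deployed version would presumably keep the counters in Redis so that all API instances share them.

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """SEC-001: flag an IP with more than `threshold` failures inside `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 300.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failed login; return True if the rule now triggers for this IP."""
        events = self._failures[ip]
        events.append(timestamp)
        # Drop failures that have slid out of the window
        while events and events[0] <= timestamp - self.window_s:
            events.popleft()
        return len(events) > self.threshold
```

With the default settings the sixth failure within five minutes triggers the rule, matching the "> 5" condition in the table.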
12.8 Media Access Security
12.8.1 Signed URL Architecture
Media files are never served directly from object storage. All access is mediated through signed URLs:
| Parameter | Value | Notes |
|---|---|---|
| Default expiration | 5 minutes | Short-lived to prevent sharing |
| Maximum expiration | 1 hour | For bulk exports only |
| URL binding | Tied to user session | Invalidated on logout |
| Single-use option | Available for sensitive media | Blacklist incident footage |
| Access logging | Every media request logged | User ID, media ID, timestamp, IP |
| IP binding | Optional | URL valid only from requesting IP |
| Watermarking | Optional | Username/timestamp overlay on images |
Signed URL Flow:
1. User requests to view media
2. System checks: authentication + authorization + consent
3. If allowed: generate signed URL with HMAC-SHA256 signature
4. URL format: https://cdn.example.com/media/{id}?token={jwt}&sig={hmac}
5. Redirect user to signed URL
6. CDN/Object storage validates signature and expiry
7. Media served if valid; 403 if expired or invalid
8. Access logged with full context
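Steps 3 to 7 can be sketched with an HMAC over the media ID, session, and expiry. This is a simplified illustration: it replaces the `token={jwt}` parameter with a plain `exp` timestamp, and the function names, query parameter names, and signed message layout are assumptions rather than the platform's format.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(key: bytes, media_id: str, session_id: str, expires_at: int) -> str:
    message = f"{media_id}:{session_id}:{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_media_url(base_url: str, media_id: str, session_id: str,
                   expires_at: int, key: bytes) -> str:
    """Build a URL that is valid only for this session and until expires_at."""
    query = urlencode({"exp": expires_at,
                       "sig": _signature(key, media_id, session_id, expires_at)})
    return f"{base_url}/media/{media_id}?{query}"


def verify_media_url(url: str, session_id: str, now: int, key: bytes) -> bool:
    """Validate expiry and signature; the caller returns 403 when this is False."""
    parsed = urlparse(url)
    media_id = parsed.path.rsplit("/", 1)[-1]
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["exp"][0])
        provided = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires_at:
        return False
    expected = _signature(key, media_id, session_id, expires_at)
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

Including the session ID in the signed message is what ties the URL to the user session, so a copied link fails verification under a different session.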
12.8.2 Media Access Controls
| Control | Implementation |
|---|---|
| No direct S3/MinIO URLs | All access via signed URL proxy |
| Authentication required | Valid JWT session required for all media requests |
| Authorization enforced | RBAC checks per media item; camera-level permissions respected |
| Access logging | Every media request logged with user ID, media ID, timestamp, IP, session |
| DPO notification | Automatic notification for access to sensitive media (blacklist incidents) |
| Secure deletion | Overwrite with random data + verification before removal |
| Download tracking | Number of downloads per media item tracked and reported |
12.9 API Security
12.9.1 Defense Layers
| Layer | Implementation | Details |
|---|---|---|
| Rate limiting | Per-endpoint, per-user tiers | Token bucket algorithm; 100 req/min default; 10 req/min for auth endpoints |
| Input validation | Pydantic models on all endpoints | Strict type checking; reject unknown fields; max length limits |
| SQL injection prevention | Parameterized queries only | No dynamic SQL construction; ORM for all database access |
| XSS prevention | Output encoding + CSP headers | User input never rendered as HTML; Content-Security-Policy enforced |
| CSRF protection | SameSite=Strict cookies + tokens | State-changing operations require CSRF token validation |
| CORS | Restricted to known origins | No wildcard origins; explicit allowlist per environment |
| Request size limits | 10 MB default; 50 MB for media upload | Prevents DoS via large payloads |
| Request timeout | 30 seconds default | Prevents resource exhaustion |
12.9.2 Security Headers
| Header | Value | Purpose |
|---|---|---|
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload | Enforce HTTPS for 2 years |
| X-Content-Type-Options | nosniff | Prevent MIME-type sniffing |
| X-Frame-Options | DENY | Prevent clickjacking |
| X-XSS-Protection | 0 | Disabled; CSP is the preferred defense |
| Referrer-Policy | strict-origin-when-cross-origin | Minimal referrer information |
| Permissions-Policy | camera=(), microphone=(), geolocation=() | Disable browser APIs not needed |
| Content-Security-Policy | default-src 'self'; script-src 'self' 'nonce-{random}'; style-src 'self' 'unsafe-inline'; img-src 'self' blob: data: https://*.amazonaws.com; media-src 'self' blob: https://*.amazonaws.com; connect-src 'self' wss://*.example.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self'; | Comprehensive CSP |
| Cache-Control (API) | no-store, no-cache, must-revalidate, proxy-revalidate | Prevent caching of API responses |
| Pragma (API) | no-cache | Legacy cache directive |
12.10 Session Security
| Parameter | Value | Notes |
|---|---|---|
| Cookie flags | HttpOnly; Secure; SameSite=Strict | Limits cookie theft via XSS and blocks cross-site request forgery |
| Access token storage | Memory only (JavaScript variable) | Never stored in localStorage |
| Access token max-age | 15 minutes | Short-lived |
| Refresh token storage | HttpOnly secure cookie | Cannot be accessed by JavaScript |
| Refresh token max-age | 7 days | Long-lived but revocable |
| Session absolute timeout | 8 hours | Force re-login after 8 hours |
| Idle timeout | 30 minutes | Expire if no activity |
| Max concurrent sessions | 3 per user | Prevents session abuse |
| Session fixation protection | Regenerate session ID on login | Prevent fixation attacks |
| Session binding | Browser fingerprint + IP validation | Detect session theft |
| Force logout capability | Admin can revoke all sessions for any user | Immediate effect via Redis |
| Session storage | Redis with AUTH enabled | Encrypted at rest |
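The absolute and idle timeouts in the table combine into a single expiry check. A minimal sketch, assuming the session record stores its creation time and last activity time as Unix timestamps; the function and constant names are hypothetical.

```python
IDLE_TIMEOUT_S = 30 * 60          # expire after 30 minutes without activity
ABSOLUTE_TIMEOUT_S = 8 * 60 * 60  # force re-login 8 hours after creation


def session_expired(created_at: float, last_seen_at: float, now: float) -> bool:
    """A session ends when either the idle or the absolute timeout is reached."""
    return (now - created_at >= ABSOLUTE_TIMEOUT_S
            or now - last_seen_at >= IDLE_TIMEOUT_S)
```

Activity refreshes only `last_seen_at`, so a continuously active session still ends at the 8-hour mark.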
12.11 Data Privacy (GDPR Compliance)
12.11.1 GDPR Compliance Matrix
| GDPR Principle | Implementation Detail | Evidence |
|---|---|---|
| Lawful Basis | Legitimate interest assessment documented per processing purpose | LIA document filed with DPO |
| Data Minimization | Only facial feature embeddings (512-D vector) stored; raw images discarded after encoding | Architecture documentation |
| Purpose Limitation | Facial data used ONLY for security/safety purposes; no marketing or secondary use | Privacy policy |
| Storage Limitation | Automated retention enforcement; cryptographic deletion after expiry | Retention policy configuration |
| Accuracy | Regular review and correction procedures; user can request correction | Data correction workflow |
| Integrity & Confidentiality | AES-256-GCM encryption, RBAC access controls, audit logging | Security architecture |
| Accountability | DPO appointed; Privacy Impact Assessment completed; Records of Processing maintained | Compliance documentation |
| Transparency | Privacy notice displayed at camera entry points; privacy policy on website | Physical signage + web policy |
12.11.2 Consent Management
Consent is managed through a comprehensive lifecycle:
| Stage | Description | Transition Trigger |
|---|---|---|
| pending | Consent requested but not yet obtained | Initial system setup |
| granted | Explicit consent obtained | User signs consent form |
| withdrawn | Consent actively withdrawn | User requests deletion/stop processing |
| deleted | All data removed; audit trail only | Deletion workflow complete |
Consent Metadata:
| Field | Description |
|---|---|
| Consent method | written / digital / verbal |
| Consent document reference | ID of signed consent form |
| Consent date | When consent was obtained |
| Consent recorder | Who recorded the consent |
| Consent expiry | Annual expiry date |
| Consent scope | What processing is consented to |
Withdrawal Processing:
- User submits withdrawal request (any channel)
- System flags person record for deletion
- Delete face embeddings (biometric data) within 72 hours
- Delete all personal images from storage
- Anonymize detection events (keep event, replace name with [REDACTED], remove person link)
- Delete related event clips
- Log all deletion actions in audit trail
- Confirm completion to user within 30 days
12.11.3 Privacy Mode Controls
Four privacy modes are available per camera:
| Mode | Recording | Face Recognition | Alerts | Live View | Use Case |
|---|---|---|---|---|---|
| Full Operation | Yes | Yes | All | Yes | Standard surveillance |
| Recording Only | Yes | No | Motion only (no face) | Yes | Areas where facial recognition is not needed |
| Live View Only | No | No | No | Yes | Privacy-sensitive areas; viewing only |
| Privacy Mode | No | No | No | Privacy overlay | Break rooms, restrooms — privacy completely protected |
12.12 Edge Gateway Security
12.12.1 Hardening Checklist
| # | Hardening Measure | Implementation |
|---|---|---|
| 1 | Minimal OS | Ubuntu Server 22.04 LTS — no desktop packages |
| 2 | Disabled Bluetooth | systemctl stop bluetooth; systemctl disable bluetooth |
| 3 | Disabled WiFi | nmcli radio wifi off; modprobe -r iwlwifi |
| 4 | Disabled CUPS | systemctl stop cups; systemctl disable cups |
| 5 | Disabled avahi/mDNS | systemctl stop avahi-daemon; systemctl disable avahi-daemon |
| 6 | Disabled snapd | systemctl stop snapd; systemctl disable snapd |
| 7 | Disabled modemmanager | systemctl stop ModemManager; systemctl disable ModemManager |
| 8 | SSH key-only | PasswordAuthentication no; PubkeyAuthentication yes |
| 9 | SSH LAN-only | ListenAddress 192.168.29.5 |
| 10 | SSH root disabled | PermitRootLogin no |
| 11 | SSH attempt limit and keepalive | MaxAuthTries 3; ClientAliveInterval 300 |
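| 12 | SSH protocol 2 | Protocol 2 only; the OpenSSH release shipped with Ubuntu 22.04 supports no other version, so the deprecated Protocol directive is unnecessary |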
| 13 | SSH modern ciphers | Ciphers chacha20-poly1305@openssh.com |
| 14 | Auto-updates | unattended-upgrades — security updates only |
| 15 | Update schedule | Daily at 03:00; auto-reboot at 04:00 if required |
| 16 | Disk encryption | LUKS + TPM2 auto-unseal |
| 17 | Tamper detection | File integrity monitoring (AIDE) for critical config |
| 18 | Container security | Non-root users, read-only root FS, no new privileges |
| 19 | Firewall | iptables default deny; explicit allow only |
| 20 | No internet access | All outbound traffic via VPN tunnel only |
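The SSH items in the checklist lend themselves to an automated check. The following sketch audits the text of an sshd_config file against four of the directives listed above. It is a simplified illustration: `EXPECTED` and `audit_sshd_config` are hypothetical names, and the parser ignores `Match` blocks, `Include` files, and `key=value` syntax.

```python
# Expected values for a subset of the SSH hardening items (keys lowercased,
# because sshd_config keywords are case-insensitive).
EXPECTED = {
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "permitrootlogin": "no",
    "maxauthtries": "3",
}


def audit_sshd_config(text: str) -> list[str]:
    """Return a list of findings; an empty list means all checks passed."""
    seen: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        # sshd uses the first value it obtains for each keyword.
        seen.setdefault(key, value)

    findings = []
    for key, want in EXPECTED.items():
        got = seen.get(key)
        if got is None:
            findings.append(f"{key}: not set (expected {want})")
        elif got.lower() != want:
            findings.append(f"{key}: {got} (expected {want})")
    return findings
```

A check like this could run alongside the AIDE integrity monitoring so that configuration drift is reported rather than discovered during an incident.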
12.12.2 LUKS Disk Encryption with TPM2
The edge gateway uses LUKS full-disk encryption with TPM2 auto-unseal for headless operation:
# During setup: encrypt the data partition (prompts for a recovery passphrase)
cryptsetup luksFormat /dev/nvme0n1p2 \
--type luks2 \
--cipher aes-xts-plain64 \
--key-size 512 \
--pbkdf argon2id
# Enroll a TPM2-sealed key bound to PCR measurements.
# cryptsetup has no TPM2 options; enrollment is done with systemd-cryptenroll (systemd 248 or later)
systemd-cryptenroll /dev/nvme0n1p2 \
--tpm2-device=auto \
--tpm2-pcrs=0+2+7
# During boot: /etc/crypttab entry; the TPM2 unseals the key if the PCRs match
data /dev/nvme0n1p2 none luks,tpm2-device=auto
PCR Measurements Bound:
| PCR | Purpose |
|---|---|
| PCR 0 | Core system firmware executable code |
| PCR 2 | Extended or pluggable executable code |
| PCR 7 | Secure Boot state |
12.13 Cloud Infrastructure Security
| Control | Implementation | Verification |
|---|---|---|
| Private subnets | All internal services in private subnets; no public IPs | VPC flow logs |
| Security groups | Least privilege; explicit allow only; no default allow-all | Quarterly review |
| Database access | No public access; app servers only via security group reference | AWS Config rule |
| Bastion host | Emergency access only; non-standard SSH port (2222); admin IP allowlist only | Access log audit |
| IMDSv2 | Enforced on all EC2 instances; no IMDSv1 fallback | Instance metadata check |
| Container security | Non-root users, read-only root FS, no new privileges, drop ALL capabilities | Pod Security admission |
| Image scanning | Trivy + Snyk on every build; HIGH/CRITICAL vulnerabilities block deployment | CI/CD pipeline gate |
| Image signing | Cosign signature verification required before deployment | Admission controller |
| Resource quotas | Kubernetes ResourceQuota and LimitRange on all namespaces | Resource quota monitoring |
| Network policies | Default deny all ingress/egress; explicit rules per service | Policy audit |
| Pod Security | Restricted standard enforced cluster-wide | Pod Security admission |
| Secrets management | Vault + External Secrets Operator; no secrets in Git | Secret scanning |
| Logging | All AWS API calls logged via CloudTrail; VPC Flow Logs enabled | Log analysis |
12.14 Secrets Rotation Policy
| Secret Type | Frequency | Method | Automation | Rollback |
|---|---|---|---|---|
| Database passwords | 90 days | Terraform + Vault dynamic credentials | Full | N/A (short-lived) |
| JWT signing keys | 180 days | Dual-key grace period; new key signs, old key verifies for 7 days | Full | Keep old key for 7 days |
| Internal API keys | 90 days | Zero-downtime: add new key, deploy, remove old key | Full | Immediate via config revert |
| Telegram/WhatsApp tokens | 180 days or on suspicion | Generate new via provider, update Vault, 5-min grace, revoke old | Semi | Old token valid for 5-minute grace |
| TLS certificates | 60 days | cert-manager + Let's Encrypt auto-renewal | Full | Previous certificate cached |
| WireGuard keys | 365 days | Maintenance window: generate new keys, update both endpoints simultaneously | Scripted | Manual key restore |
| DVR credentials | 180 days | Manual via DVR web UI | Manual | Previous password documented |
| Backup encryption keys | 365 days | Generate new key, re-encrypt all backups in background | Full | Previous key kept for 30 days |
| Session secrets | On security incident | Immediate: generate new secret, force all re-authentication | Admin trigger | Not applicable |
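The scheduled frequencies in the table can be turned into due dates with simple date arithmetic. The sketch below is an assumption-laden illustration: the secret type keys and function names are hypothetical, and event-driven rotations (session secrets, token rotation on suspicion) are deliberately left out.

```python
from datetime import date, timedelta

# Rotation intervals in days, transcribed from the policy table.
ROTATION_DAYS = {
    "database_password": 90,
    "jwt_signing_key": 180,
    "internal_api_key": 90,
    "messaging_token": 180,       # Telegram/WhatsApp tokens
    "tls_certificate": 60,
    "wireguard_key": 365,
    "dvr_credential": 180,
    "backup_encryption_key": 365,
}


def next_rotation(secret_type: str, last_rotated: date) -> date:
    """Date on which the secret is next due for rotation."""
    return last_rotated + timedelta(days=ROTATION_DAYS[secret_type])


def is_overdue(secret_type: str, last_rotated: date, today: date) -> bool:
    """True once the due date has passed."""
    return today > next_rotation(secret_type, last_rotated)
```

A scheduled job could call `is_overdue` for each tracked secret and raise an alert for the manual and semi-automated entries, where nothing else enforces the schedule.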
12.15 Incident Response
12.15.1 Security Event Detection and Response
| Phase | Timeline | Actions | Responsible |
|---|---|---|---|
| Detection | Automated (real-time) | Automated rules + behavioral analysis detect anomaly; alert generated | System |
| Assessment | 0-15 minutes | On-call engineer evaluates severity; determines if genuine security event | On-call Engineer |
| Containment | 15-60 minutes | Isolate affected systems; revoke compromised credentials; block malicious IPs | Security Team |
| Eradication | 1-4 hours | Remove root cause; patch vulnerabilities; rotate all exposed secrets | Engineering |
| Recovery | 4-24 hours | Restore from clean backups; verify system integrity; re-enable services | Platform Team |
| Lessons Learned | 24-48 hours | Post-mortem; update procedures; implement preventive measures | Security Team |
12.15.2 Breach Notification Procedure
| Phase | Timeline | GDPR Requirement | Actions |
|---|---|---|---|
| Detection & Assessment | 0-24 hours | — | Confirm breach; contain; assemble response team |
| Investigation | 24-72 hours | Article 33(1) | Forensic analysis; determine scope of affected data |
| Supervisory Authority | Within 72 hours | Article 33 | Notify Data Protection Authority |
| Data Subjects | Without undue delay | Article 34 | Notify affected individuals if high risk |
| Recovery | Post-notification | — | Restore from clean backups; apply patches |
| Post-Incident | Within 48 hours | Article 5(2) | Root cause analysis; update plans; document |
12.15.3 Breach Severity Classification
| Level | Criteria | Notification Required | Example |
|---|---|---|---|
| Low | No personal data accessed | Internal only | Failed attack attempt; no data exposure |
| Medium | Limited personal data; no sensitive data | DPA notification | Username/email list exposed |
| High | Sensitive personal data or biometric data accessed | DPA + Data subjects | Facial embeddings database accessed |
| Critical | Large-scale biometric exfiltration; ongoing threat | DPA + Data subjects + Public | Ransomware attack with biometric data theft |
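The classification table and the 72-hour supervisory authority deadline can be sketched as a small decision function. This is a simplification for illustration: the three boolean inputs, the `classify` and `dpa_deadline` names, and the notification labels are assumptions, and a real assessment involves judgment that the flags do not capture.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Who must be notified at each level, transcribed from the table.
NOTIFICATIONS = {
    Severity.LOW: {"internal"},
    Severity.MEDIUM: {"dpa"},
    Severity.HIGH: {"dpa", "data_subjects"},
    Severity.CRITICAL: {"dpa", "data_subjects", "public"},
}


def classify(personal_data: bool, sensitive_or_biometric: bool,
             large_scale_or_ongoing: bool) -> Severity:
    """Map three coarse (assumed) facts about a breach to a severity level."""
    if not personal_data:
        return Severity.LOW
    if not sensitive_or_biometric:
        return Severity.MEDIUM
    if large_scale_or_ongoing:
        return Severity.CRITICAL
    return Severity.HIGH


def dpa_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority (72 hours after awareness)."""
    return aware_at + timedelta(hours=72)
```

For instance, unauthorized access to the facial embeddings database without evidence of large-scale exfiltration would classify as `Severity.HIGH`, which requires notifying both the DPA and the data subjects.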
12.16 Security Checklist Summary
The complete security checklist contains 130+ items. The following table summarizes the item count and key requirements for each category:
| Category | Items | Key Requirements |
|---|---|---|
| SSL/TLS | 8 | TLS 1.3, strong cipher suites only, HSTS, OCSP stapling, auto-renewal |
| Authentication | 13 | Argon2id, JWT ES256, MFA enforcement, password policy, HaveIBeenPwned |
| RBAC | 7 | 4 roles, 30+ permissions, resource-level access, default deny |
| VPN & Network | 10 | WireGuard + PSK, 5 security zones, firewall deny-all, network policies |
| Secret Management | 10 | Vault storage, dynamic credentials, field encryption, rotation schedule |
| Audit Logging | 11 | Hash-chain integrity, 20+ fields per entry, WORM storage, real-time alerts |
| Media Access | 8 | Signed URLs, session-bound, 5-min expiry, single-use option, watermarking |
| API Security | 11 | Rate limiting, Pydantic validation, parameterized queries, CSP, CSRF, CORS |
| Session Security | 8 | HttpOnly/Secure/Strict cookies, 8h absolute timeout, 30m idle timeout |
| Data Privacy (GDPR) | 13 | Consent tracking, right to deletion, anonymization, DPO, PIA |
| Edge Gateway | 12 | 20-point hardening, LUKS + TPM2, tamper detection, auto-updates |
| Cloud Infrastructure | 11 | Private subnets, image scanning, Pod Security, IMDSv2, CloudTrail |
| Secrets Rotation | 7 | All types scheduled, 60-day TLS, 90-day DB, dual-key JWT |
| Incident Response | 9 | Detection rules, breach notification, severity classification, post-mortem |
| Total | 130+ | — |
Section 13: UX / Website Structure
13.1 Design System
13.1.1 Design Philosophy
The UX design follows a "dark cockpit" philosophy optimized for 24/7 surveillance operations. The interface minimizes eye strain during long monitoring shifts while ensuring critical information is immediately visible. All design decisions prioritize operator efficiency and rapid threat identification.
| Principle | Implementation |
|---|---|
| Dark mode default | Near-black background with blue-tinted grays to reduce eye strain in low-light environments |
| Information density | High-density layouts that maximize data visible without scrolling |
| At-a-glance status | Color-coded status indicators for immediate situational awareness |
| Progressive disclosure | Advanced controls hidden behind "Expand" toggles; essential info always visible |
| Consistent patterns | Same interaction patterns reused across all 18 pages |
| Responsive feedback | Every action produces visible feedback within 100ms |
13.1.2 Color Palette
| Token | Hex | RGB | Usage | Contrast Ratio |
|---|---|---|---|---|
| --bg-primary | #0B0E14 | rgb(11, 14, 20) | Main application background | N/A |
| --bg-secondary | #151922 | rgb(21, 25, 34) | Card and panel backgrounds | N/A |
| --bg-tertiary | #1E2330 | rgb(30, 35, 48) | Elevated surfaces, modals, dropdowns | N/A |
| --bg-sidebar | #0D1117 | rgb(13, 17, 23) | Sidebar navigation background | N/A |
| --bg-hover | #1A2030 | rgb(26, 32, 48) | Row/card hover state | N/A |
| --bg-selected | #1E3A5F | rgb(30, 58, 95) | Selected item background | N/A |
| --text-primary | #E2E8F0 | rgb(226, 232, 240) | Headings, important content | 15.8:1 |
| --text-secondary | #94A3B8 | rgb(148, 163, 184) | Labels, descriptions, metadata | 9.2:1 |
| --text-muted | #64748B | rgb(100, 116, 139) | Placeholder text, disabled states | 6.1:1 |
| --accent-blue | #3B82F6 | rgb(59, 130, 246) | Primary accent: buttons, links, active states | 4.5:1 |
| --accent-blue-hover | #2563EB | rgb(37, 99, 235) | Button/link hover state | 5.1:1 |
| --accent-green | #10B981 | rgb(16, 185, 129) | Success, online status, positive trends | 5.3:1 |
| --accent-red | #EF4444 | rgb(239, 68, 68) | Critical alerts, errors, offline status | 5.0:1 |
| --accent-orange | #F59E0B | rgb(245, 158, 11) | Warnings, medium severity | 5.4:1 |
| --accent-yellow | #FBBF24 | rgb(251, 191, 36) | Watchlist indicators, highlights | 6.1:1 |
| --accent-purple | #8B5CF6 | rgb(139, 92, 246) | AI features, special highlights | 4.8:1 |
| --border-color | #1E293B | rgb(30, 41, 59) | Card borders, dividers, separators | N/A |
| --border-focus | #3B82F6 | rgb(59, 130, 246) | Focus ring color | N/A |
| --shadow-sm | 0 1px 2px rgba(0,0,0,0.3) | N/A | Subtle elevation | N/A |
| --shadow-md | 0 4px 6px rgba(0,0,0,0.4) | N/A | Card elevation | N/A |
| --shadow-lg | 0 10px 25px rgba(0,0,0,0.5) | N/A | Modal/dialog elevation | N/A |
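The palette does not state which background each contrast ratio was measured against, so the listed ratios are worth re-checking when tokens change. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas; the function names are hypothetical, and pairing a token with `--bg-primary` in the usage note is an assumption.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def _linear(channel: int) -> float:
    """Linearize one sRGB channel as in the WCAG 2.x luminance definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    r, g, b = (_linear(v) for v in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

Calling `contrast_ratio("#E2E8F0", "#0B0E14")` gives the ratio of `--text-primary` on `--bg-primary`; WCAG AA asks for at least 4.5:1 for normal-size text.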
13.1.3 Typography
| Token | Font Family | Size | Weight | Line Height | Letter Spacing | Usage |
|---|---|---|---|---|---|---|
| Display | Inter | 28px | 700 (Bold) | 1.2 | -0.02em | Page titles |
| H1 | Inter | 22px | 600 (Semi-bold) | 1.3 | -0.01em | Section headings |
| H2 | Inter | 18px | 600 (Semi-bold) | 1.4 | 0 | Card titles, modal headers |
| H3 | Inter | 15px | 500 (Medium) | 1.4 | 0 | Sub-sections, form labels |
| Body | Inter | 14px | 400 (Regular) | 1.5 | 0 | General text, descriptions |
| Body Small | Inter | 13px | 400 (Regular) | 1.5 | 0 | Secondary body text |
| Caption | Inter | 12px | 400 (Regular) | 1.4 | 0.01em | Captions, metadata, footnotes |
| Timestamp | JetBrains Mono | 12px | 400 (Regular) | 1.4 | 0 | All timestamps, durations |
| Code | JetBrains Mono | 13px | 400 (Regular) | 1.5 | 0 | Code snippets, IDs, technical data |
| Badge | Inter | 11px | 500 (Medium) | 1 | 0.02em | Status badges, tags |
13.1.4 Spacing and Layout
| Token | Value | Usage |
|---|---|---|
| Sidebar expanded | 260px | Full navigation with labels and icons |
| Sidebar collapsed | 72px | Icons only; hover for tooltip |
| Top bar height | 56px | Clock, alerts, user menu |
| Content padding | 24px | Page content horizontal padding |
| Content max-width | 1400px | Maximum content width; content is centered on wider viewports |
| Card padding | 16px | Internal card padding |
| Card border radius | 12px | Card and panel corners |
| Card gap | 16px | Gap between cards in grid |
| Button border radius | 8px | Button corners |
| Input border radius | 6px | Form input corners |
| Modal border radius | 16px | Modal/dialog corners |
| Toast border radius | 8px | Toast notification corners |
| Avatar size (small) | 24px | Inline avatars |
| Avatar size (medium) | 40px | Card headers, lists |
| Avatar size (large) | 64px | Profile pages |
| Icon size (default) | 20px | Navigation and actions |
| Icon size (small) | 16px | Inline icons |
| Scrollbar width | 8px | Custom styled scrollbar |
13.2 Global Navigation Structure
13.2.1 Layout Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ [Logo] Sentinel AI Surveillance [Clock] [Alerts] [👤 User] │ ▲ 56px
├────────┬───────────────────────────────────────────────────────────────────┤
│ │ │
│ [📊] │ MAIN CONTENT AREA │
│ Dash │ │
│ board │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ │ Card 1 │ │ Card 2 │ │ Card 3 │ │
│ [📹] │ │ │ │ │ │ │ │
│ Live │ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ [🔔] │ ┌──────────────────────────────────────────────────┐ │
│ Alerts │ │ Wide Card / Table │ │
│ │ └──────────────────────────────────────────────────┘ │
│ [🔍] │ │
│ Detec │ │
│ tions │ │
│ │ │
│ [remaining navigation items...] │
│ │ │
├────────┤ │
│◁ / ▷ │ │
└────────┴───────────────────────────────────────────────────────────────────┘
◄── 260px (expanded) / 72px (collapsed) ──►
13.2.2 Navigation Menu Items
| # | Icon | Label | Route | Badge Type | Required Permission |
|---|---|---|---|---|---|
| 1 | LayoutDashboard | Dashboard | /dashboard | None | Any |
| 2 | Video | Live View | /live | Online camera count | cameras:view |
| 3 | Bell | Alert Center | /alerts | Pending alert count | alerts:view |
| 4 | ScanEye | Detections | /detections | None | cameras:view |
| 5 | Users | Person Gallery | /persons | Total person count | persons:view |
| 6 | UserQuestion | Unknown Review | /unknowns | Queue count | persons:view |
| 7 | ClockAlert | Suspicious Activity | /timeline | None | alerts:view |
| 8 | Search | Search | /search | None | Any |
| 9 | ShieldAlert | Watchlists | /watchlists | None | watchlists:view |
| 10 | Sparkles | AI Vibe Settings | /settings/ai | None | ai_settings:view |
| 11 | Brain | Training Review | /training | Pending suggestions | ai_settings:view |
| 12 | Activity | System Health | /health | Status dot (green/yellow/red) | system:view |
| 13 | Settings | Settings | /settings | None | Admin functions |
Settings Submenu:
| # | Icon | Label | Route | Required Permission |
|---|---|---|---|---|
| 13a | Camera | Camera Management | /settings/cameras | cameras:manage |
| 13b | HardDrive | Retention & Storage | /settings/storage | storage:manage |
| 13c | UserCog | Admin Users | /settings/users | users:manage |
| 13d | BellRing | Notification Settings | /settings/notifications | notifications:manage |
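The Required Permission column implies that the sidebar is filtered per user. A minimal sketch of that filtering follows; `NavItem`, `NAV_ITEMS`, and `visible_items` are hypothetical names, only a few menu entries are reproduced, and treating "Any" as "no permission required" is an assumption.

```python
from typing import NamedTuple, Optional


class NavItem(NamedTuple):
    label: str
    route: str
    permission: Optional[str]  # None stands for "Any" in the menu table


# A subset of the menu, transcribed from the tables above.
NAV_ITEMS = [
    NavItem("Dashboard", "/dashboard", None),
    NavItem("Live View", "/live", "cameras:view"),
    NavItem("Alert Center", "/alerts", "alerts:view"),
    NavItem("Watchlists", "/watchlists", "watchlists:view"),
    NavItem("Camera Management", "/settings/cameras", "cameras:manage"),
]


def visible_items(granted: set[str]) -> list[NavItem]:
    """Return the menu entries a user with the given permissions may see."""
    return [
        item for item in NAV_ITEMS
        if item.permission is None or item.permission in granted
    ]
```

Hiding a menu entry is a usability measure only; the corresponding API routes still need their own server-side permission checks.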
13.2.3 Top Bar
| Element | Position | Content | Update Frequency |
|---|---|---|---|
| Logo + Brand | Left | Sentinel AI logo + text | Static |
| Current Time | Center-Right | HH:MM:SS live clock | Every second |
| Alert Badge | Right | Bell icon with red count badge | On alert change |
| User Menu | Far right | Avatar + dropdown menu | Static |
User Menu Dropdown:
| Item | Action |
|---|---|
| Profile | Navigate to user profile |
| Preferences | Theme, timezone, notification preferences |
| Keyboard Shortcuts | Show shortcut reference modal |
| Help & Documentation | Open help center |
| Logout | End session (clears all tokens) |
13.3 Page Descriptions
13.3.1 Page 1: Login (/login)
The login page is the entry point to the system. It is designed for quick, secure access with minimal friction.
| Feature | Specification |
|---|---|
| Layout | Centered card on dark background |
| Logo | Sentinel AI logo (large) centered above form |
| Fields | Username/email (text input), Password (password input with show/hide toggle) |
| Remember me | Checkbox — "Keep me signed in for 7 days" |
| Submit | "Sign In" button — full width, accent blue |
| MFA step | Appears after successful password verification; 6-digit TOTP input with auto-focus |
| Error states | Inline validation; shake animation on error |
| Footer | "v2.3.1" version number, copyright, privacy policy link |
| Security | Rate limiting (5 attempts / 15 min), CAPTCHA after 3 failures |
| Redirect | After login, redirect to originally requested URL (or Dashboard) |
| Session | JWT access token (15 min) + refresh token cookie (7 days) |
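The Security row (5 attempts per 15 minutes, CAPTCHA after 3 failures) can be sketched as a sliding-window counter. This is an in-memory illustration under stated assumptions: the class and method names are hypothetical, only failed attempts are counted, and a real deployment would keep the counters in shared storage rather than process memory.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60
MAX_ATTEMPTS = 5
CAPTCHA_AFTER_FAILURES = 3


class LoginThrottle:
    def __init__(self) -> None:
        # key (for example username or IP) -> timestamps of failed attempts
        self._failures = defaultdict(deque)

    def _prune(self, key: str, now: float) -> deque:
        """Drop failures that have aged out of the 15-minute window."""
        q = self._failures[key]
        while q and now - q[0] >= WINDOW_SECONDS:
            q.popleft()
        return q

    def check(self, key: str, now: float) -> str:
        """Return 'blocked', 'captcha', or 'ok' for the next attempt."""
        n = len(self._prune(key, now))
        if n >= MAX_ATTEMPTS:
            return "blocked"
        if n >= CAPTCHA_AFTER_FAILURES:
            return "captcha"
        return "ok"

    def record_failure(self, key: str, now: float) -> None:
        self._prune(key, now).append(now)

    def record_success(self, key: str) -> None:
        self._failures.pop(key, None)
```

Passing the timestamp in explicitly (for example `time.time()`) keeps the class easy to test with fixed clock values.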
13.3.2 Page 2: Dashboard (/dashboard)
The Dashboard is the primary landing page providing at-a-glance situational awareness.
┌──────────────────────────────────────────────────────────────────────────────┐
│ Dashboard [Refresh] [Date Range] │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │ 📹 8/8 │ │ 🔔 12 │ │ 👥 47 │ │ ✓ Healthy │ │
│ │ Cameras │ │ Alerts Today │ │ Persons │ │ System │ │
│ │ Online │ │ 3 Critical │ │ Detected │ │ All Good │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ └──────────────┘ │
│ │
│ ┌────────────────────────────────────────┐ ┌──────────────────────────┐ │
│ │ Alert Distribution (Last 24 Hours) │ │ Recent Alerts │ │
│ │ │ │ │ │
│ │ 8 ┤ ██ │ │ 🔴 CAM-01 Unknown │ │
│ │ 6 ┤ ██ ██ ██ │ │ 14:32 — Entrance │ │
│ │ 4 ┤ ██ ██ ██ ██ ██ │ │ 🟡 CAM-03 Watchlist │ │
│ │ 2 ┤ ██ ██ ██ ██ ██ ██ ██ │ │ 13:15 — Parking │ │
│ │ 0 ┼────┬────┬────┬────┬────┬────┬── │ │ 🟠 CAM-05 System │ │
│ │ 00 04 08 12 16 20 │ │ 12:08 — Storage 90% │ │
│ │ │ │ │ │
│ └────────────────────────────────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Camera Status Grid (2x4) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ CAM-01 │ │ CAM-02 │ │ CAM-03 │ │ CAM-04 │ │ │
│ │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ │
│ │ │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ CAM-05 │ │ CAM-06 │ │ CAM-07 │ │ CAM-08 │ │ │
│ │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ [LIVE] │ │ │
│ │ │ ● Online│ │ ● Online│ │ ● Online│ │ ● Online│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Activity Feed │ │
│ │ 14:32 — Unknown person detected at CAM-01 (Entrance) │ │
│ │ 14:15 — Watchlist match: John Smith at CAM-03 (Parking) │ │
│ │ 13:58 — Operator Alice acknowledged alert #ALT-2847 │ │
│ │ 13:42 — Camera CAM-05 stream reconnected │ │
│ │ 13:30 — Daily training completed: 3 new face clusters │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
Dashboard Components:
| Component | Refresh Rate | Description |
|---|---|---|
| Stat cards | 30 seconds | Active cameras, alerts today, persons detected, system health |
| Alert distribution chart | 5 minutes | Bar chart showing alerts by hour for last 24 hours |
| Recent alerts card | 30 seconds | Last 5 alerts with severity badge, camera, timestamp |
| Camera status grid | 30 seconds | 2x4 grid of all 8 cameras with live thumbnail and status dot |
| Activity feed | Real-time (WebSocket) | Recent system events — detections, alerts, operator actions |
13.3.3 Page 3: Live Camera View (/live)
The live view is the primary monitoring interface, showing real-time streams from all 8 cameras.
| Feature | Specification |
|---|---|
| Default layout | 2x4 grid (8 cameras) |
| Layout options | 1x1 (single), 2x2 (4 cameras), 2x4 (8 cameras), 4x4 (16 cameras for future scaling) |
| Stream format | HLS (HTTP Live Streaming) by default, with WebRTC available where lower latency is needed |
| Per-camera overlay | Camera name, status dot, expand button, snapshot button |
| Grid controls | Play all / Pause all, Refresh all streams, Layout selector |
| Camera states | Loading (spinner), Playing, Paused, Error (retry button), Offline (gray placeholder) |
| Fullscreen | Click any camera to expand; press F to toggle fullscreen for focused camera |
| Camera switching | Press 1-8 to focus camera by number |
| Snapshot | Press S or click camera snapshot button to capture current frame |
| Recording indicator | Red pulsing dot on cameras actively recording |
| Alert overlay | Flashing border on camera that triggered recent alert |
13.3.4 Page 4: Alert Center (/alerts)
The Alert Center provides comprehensive alert management with filtering, batch actions, and detailed investigation tools.
| Feature | Specification |
|---|---|
| Filter bar | Date range picker, severity multi-select (Critical/High/Medium/Low/Info), camera multi-select, status filter (Pending/Acknowledged/Resolved/Ignored), type filter |
| Severity legend | Color-coded badges: Critical (red), High (orange), Medium (yellow), Low (blue), Info (gray) |
| Alert cards | Each card: thumbnail image, camera name, timestamp, severity badge, person name (if known), description, current status |
| Card actions | Acknowledge, Resolve, Ignore, View Details, Mark False Positive |
| Bulk actions | Checkbox selection; batch Acknowledge or Ignore |
| Sort options | Newest first (default), Oldest first, Severity (highest first), Camera name |
| Pagination | 20 alerts per page; infinite scroll option |
| Empty state | "No alerts in the selected period" with illustration |
| Detail panel | Slide-out panel with full alert info: images, video clip, AI confidence, detection metadata, person profile link |
13.3.5 Page 5: Recent Detections (/detections)
Shows all recent detection events with face thumbnails and recognition results.
| Feature | Specification |
|---|---|
| Filter controls | Known/Unknown/All toggle, date range picker, camera selector, person name search |
| Detection cards | Face thumbnail + name (or "Unknown") + confidence percentage + camera name + timestamp + watchlist badge |
| Card click | Opens detail view with full-size image, sighting history for that person, camera info |
| Actions | "Name This Person" (unknowns), "View Profile" (known), "Add to Watchlist" |
| Confidence indicator | Visual bar showing confidence level; color-coded (green > 90%, yellow 70-90%, orange < 70%) |
| Grid layout | 4 columns desktop, 3 tablet, 2 mobile |
| Auto-refresh | New detections appear at top without page reload (WebSocket) |
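The confidence indicator's color bands can be expressed as a small function. The sketch assumes confidence is a fraction between 0 and 1 and that the boundary values 70% and 90% both fall in the yellow band, which follows the table's "green > 90%" and "orange < 70%" wording; the function name is hypothetical.

```python
def confidence_color(confidence: float) -> str:
    """Map a recognition confidence (0.0 to 1.0) to its indicator color."""
    if confidence > 0.90:
        return "green"
    if confidence >= 0.70:
        return "yellow"
    return "orange"
```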
13.3.6 Page 6: Person Gallery (/persons)
A browsable gallery of all known persons in the system.
| Feature | Specification |
|---|---|
| Search bar | Full-text search across names, roles, departments, tags |
| Role filters | Employee / Visitor / Vendor / Contractor / Other — pill-style toggle buttons |
| Sort options | Name (A-Z), Last Seen (recent first), Sightings Count (highest first), Date Added (newest first) |
| Person cards | Face image, name, role badge, department, last seen timestamp, total sightings count |
| Grid layout | 5 columns desktop (xl), 4 columns (lg), 3 columns (md), 2 columns (sm) |
| Pagination | 50 persons per page |
| Actions | Click card → navigate to Person Profile; right-click context menu |
| Bulk actions | Select multiple for bulk add to watchlist |
| Empty state | "No persons found" with "Add your first person" CTA |
13.3.7 Page 7: Unknown Persons Review (/unknowns)
The review queue for unidentified persons, a critical workflow for building the person database.
| Feature | Specification |
|---|---|
| Queue view | Cards of unknown person clusters (grouped by face similarity via DBSCAN) |
| Cluster card | Representative face image + cluster size (number of sightings) + first/last seen + cameras detected at + confidence range |
| Actions per cluster | Name This Person, Merge with Existing, Ignore Cluster, Mark as Reviewed |
| AI insight panel | Pattern suggestion: "Seen 5x at entrance between 08:00-09:00 — possibly employee" |
| Progress indicator | "23 unknown clusters remaining" with progress bar |
| Batch review | Keyboard navigation (arrow keys + Enter to select action) for rapid review |
| Empty state | "Great job! No unknown persons to review. All caught up!" with celebration animation |
| Reviewed history | Tab to view previously reviewed clusters |
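The queue view groups unknown faces with DBSCAN. To show what that grouping does, here is a minimal pure-Python DBSCAN over small vectors. The document names only the algorithm, so the cosine distance metric and the `eps` and `min_samples` parameters are assumptions, and a production system would more likely use an optimized library on high-dimensional face embeddings.

```python
import math


def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def dbscan(points, eps: float, min_samples: int) -> list:
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

Points labeled -1 correspond to faces seen too rarely to form a cluster; whether those appear in the review queue is a product decision the table does not specify.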
13.3.8 Page 8: Person Profile (/persons/{id})
Detailed view of a single person's information, detection history, and management options.
| Feature | Specification |
|---|---|
| Header | Name, role badge, status (Active/Inactive), action buttons (Edit, Delete, Add to Watchlist) |
| Photo gallery | Primary face photo (large) + additional reference photos in thumbnail grid below |
| Info panel | Department, employee ID, contact information, notes, tags, date added, added by |
| Sighting history | Timeline of all detections — timestamp, camera name, confidence, thumbnail image |
| Sighting stats | Total sightings, first seen, last seen, most common camera, most common time |
| Watchlist memberships | Which watchlists this person belongs to, with badge per watchlist |
| Activity log | Who created/edited the profile and when; full audit trail |
| Danger zone | Delete person (with confirmation dialog explaining consequences) |
13.3.9 Page 9: Suspicious Activity Timeline (/timeline)
A timeline-based visualization of flagged events for pattern analysis.
| Feature | Specification |
|---|---|
| Timeline view | Horizontal time axis with event markers positioned by timestamp |
| Event types | Unusual movement (orange), Loitering (yellow), Unauthorized access (red), Crowd gathering (purple) |
| Color coding | Each event type has a distinct color; severity affects marker size |
| Filters | Event type multi-select, camera selector, date range, severity threshold |
| Zoom levels | Hour view, Day view (default), Week view, Month view |
| Click marker | Opens detail panel with description, evidence images, AI reasoning, confidence |
| Density heatmap | Background shows detection density to identify high-activity periods |
13.3.10 Page 10: Search (/search)
Global search across all data types in the system.
| Feature | Specification |
|---|---|
| Search bar | Prominent centered search input with clear button |
| Category filters | Person, Camera, Event, Alert — toggle pills |
| Results grouping | Results grouped by category with section headers |
| Person search | Type name or upload a photo for face recognition similarity search |
| Camera search | By name, location, or status |
| Event search | By description, camera, person, or event type |
| Alert search | By ID, description, or camera |
| Keyboard shortcut | / (forward slash) focuses search from any page |
| Recent searches | Dropdown shows recent searches for quick access |
| Empty state | "No results found" with search tips |
13.3.11 Page 11: Watchlists (/watchlists)
Management interface for watchlist categories and their members.
| Feature | Specification |
|---|---|
| Watchlist cards | Name, icon (selected from preset), color, member count, alert settings summary |
| Create button | "+ New Watchlist" with modal: name, icon picker, color picker, alert configuration |
| Default watchlists | VIP (green), Blacklist (red), Authorized (blue), Temporary Access (yellow) |
| Card click | Opens watchlist detail with full member list |
| Member management | Add from gallery (search + select), remove member, bulk import via CSV |
| Alert settings | Per-watchlist: alert timing, severity override, notify groups, quiet hours override |
| Test button | "Test Alert" — sends test notification for this watchlist to verify configuration |
| Member table | Sortable by name, date added, added by, sightings count |
13.3.12 Page 12: AI Vibe Settings (/settings/ai)
The AI Vibe Settings page presents AI configuration as friendly questions rather than technical parameters.
| # | Setting | Question | Options | Description |
|---|---|---|---|---|
| 1 | Detection Sensitivity | "How carefully should the AI watch?" | Relaxed / Balanced / High / Maximum | Controls how aggressively the AI reports detections |
| 2 | Face Match Threshold | "How confident should the AI be before naming someone?" | Lenient / Normal / Strict / Very Strict | Lower = more matches but more false positives |
| 3 | Night Mode | "How should the AI behave at night?" | Off / Diminished / Active / Enhanced | Night-specific model and sensitivity adjustment |
| 4 | Evidence Capture | "What should be saved when someone is detected?" | Photo Only / Photo + 5s Clip / Photo + 10s Clip / Full Recording | Media stored per detection event |
| 5 | Alert Style | "When should alerts be sent?" | Silent / Digest / Normal / Urgent / Critical | Controls alert frequency and channels used |
| 6 | Learning Mode | "Should the AI learn from new sightings?" | Off / Review First / Auto-Learn Cautiously / Auto-Learn Aggressively | How unknown face clusters are handled |
| 7 | Privacy Mode | "How should privacy be handled?" | Full Recognition / Blur Unrecognized / Blur All Faces / Privacy Zones | Face processing and display privacy |
Each setting control:
- Segmented button group (pill-shaped options)
- Selected option highlighted in accent blue
- Brief description below updates on selection
- Current value displayed as badge
- Auto-save (no save button); toast confirms: "Detection Sensitivity updated to High"
- Expand toggle reveals internal numerical values (Admin permission required)
Advanced Mode (Admin only): When expanded, each control shows the internal parameter values:
| Setting | Option | Internal Value |
|---|---|---|
| Detection Sensitivity | Relaxed | Confidence threshold: 0.85, NMS: 0.5 |
| Detection Sensitivity | Balanced | Confidence threshold: 0.70, NMS: 0.45 |
| Detection Sensitivity | High | Confidence threshold: 0.55, NMS: 0.4 |
| Detection Sensitivity | Maximum | Confidence threshold: 0.40, NMS: 0.35 |
| Face Match Threshold | Lenient | Similarity threshold: 0.60 |
| Face Match Threshold | Normal | Similarity threshold: 0.70 |
| Face Match Threshold | Strict | Similarity threshold: 0.80 |
| Face Match Threshold | Very Strict | Similarity threshold: 0.90 |
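The Advanced Mode table is effectively a lookup from friendly option names to internal parameters. A minimal sketch of that mapping follows; the dictionary and function names and the parameter keys are hypothetical, while the numeric values are copied from the table.

```python
# Internal values per friendly option, transcribed from the Advanced Mode table.
DETECTION_SENSITIVITY = {
    "Relaxed": {"confidence_threshold": 0.85, "nms": 0.5},
    "Balanced": {"confidence_threshold": 0.70, "nms": 0.45},
    "High": {"confidence_threshold": 0.55, "nms": 0.4},
    "Maximum": {"confidence_threshold": 0.40, "nms": 0.35},
}

FACE_MATCH_THRESHOLD = {
    "Lenient": 0.60,
    "Normal": 0.70,
    "Strict": 0.80,
    "Very Strict": 0.90,
}


def resolve_settings(sensitivity: str, face_match: str) -> dict:
    """Translate the two friendly options into internal parameters."""
    params = dict(DETECTION_SENSITIVITY[sensitivity])
    params["similarity_threshold"] = FACE_MATCH_THRESHOLD[face_match]
    return params


def is_face_match(similarity: float, face_match: str) -> bool:
    """True when a similarity score meets the selected threshold."""
    return similarity >= FACE_MATCH_THRESHOLD[face_match]
```

With this layout, a similarity score of 0.75 names a person under "Normal" but not under "Strict", which is the trade-off the setting's description refers to.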
13.3.13 Page 13: Training Review (/training)
Interface for reviewing AI-suggested face clusters and approving them for model training.
| Feature | Specification |
|---|---|
| Suggestion cards | Face cluster the AI is uncertain about — multiple face images + AI confidence + reason for suggestion |
| Card layout | Grid of face thumbnails + confidence bar + suggestion reason ("Seen 8x at different cameras, high confidence match") |
| Actions per suggestion | Approve (add to training data), Reject (not a valid cluster), Merge with Existing Person |
| Batch actions | Select multiple suggestions for bulk Approve/Reject |
| Queue status | "12 suggestions pending review" with progress bar |
| Filter | By confidence level, camera, date range |
| History | Tab showing previously reviewed suggestions with outcome |
| Training metrics | Model accuracy trend, training data count, last training time |
13.3.14 Page 14: System Health (/health)
Real-time system health monitoring dashboard.
| Feature | Specification |
|---|---|
| Status overview | Large status indicator: All Systems Operational (green) / Degraded (yellow) / Critical (red) |
| Service cards | Per-service status card: Video Capture, AI Inference, Database, Storage, Notifications, VPN |
| Per-service metrics | Status dot, uptime percentage, last restart, CPU, memory |
| Camera health table | All 8 cameras: stream status, FPS, bitrate, last seen, error count |
| System metrics | CPU usage (%), memory usage (%), disk usage (%), network I/O |
| Logs viewer | Recent system logs with severity filtering (DEBUG/INFO/WARNING/ERROR/CRITICAL); tail -f style auto-scroll |
| Refresh | Auto-refresh every 30 seconds; manual refresh button |
| Historical view | Toggle to show metrics history (last 1h, 6h, 24h, 7d) |
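The blueprint does not state how the overall status indicator is derived from the per-service cards. One plausible rule, sketched below in Python purely as an assumption, is "worst state wins": any service that is down makes the system Critical, and any degraded service makes it Degraded. `ServiceState` and `overall_status` are hypothetical names.

```python
from enum import Enum


class ServiceState(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


def overall_status(services: dict[str, ServiceState]) -> str:
    """Collapse per-service states into the single dashboard indicator.

    Assumed rule (not specified in the blueprint): the worst state wins.
    """
    states = set(services.values())
    if ServiceState.DOWN in states:
        return "Critical"
    if ServiceState.DEGRADED in states:
        return "Degraded"
    return "All Systems Operational"
```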
13.3.15 Page 15: Notifications Settings (/settings/notifications)
Configuration interface for the notification system.
| Feature | Specification |
|---|---|
| Recipient groups | Add/edit/delete groups; each group has name, Telegram chat IDs, WhatsApp numbers, alert preferences |
| Routing rules | Visual rule builder with drag-and-drop condition blocks (camera, person, role, event_type, zone, time, day, severity, watchlist) |
| Quiet hours | Schedule builder with day-of-week checkboxes, time range pickers, timezone selector |
| Template editor | Edit message templates per alert type; live preview with sample data; variable reference panel |
| Delivery status | Real-time view showing notification delivery states (pending/sent/delivered/failed) |
| Test buttons | "Send Test Alert" per channel to verify configuration |
| DLQ viewer | Dead letter queue entries with retry/discard actions |
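Quiet hours combine day-of-week checkboxes with a time range, and a range such as 22:00 to 06:00 crosses midnight. The Python sketch below shows one way to evaluate such a schedule. It assumes that `now` has already been converted to the timezone chosen in the selector, and that a checked day refers to the day on which the window starts. Both are assumptions, and `in_quiet_hours` is a hypothetical name.

```python
from datetime import datetime, time, timedelta


def in_quiet_hours(now: datetime, start: time, end: time, days: set[int]) -> bool:
    """Return True if `now` falls inside the quiet-hours window.

    `days` uses datetime.weekday() numbering (Monday=0 ... Sunday=6).
    """
    t = now.time()
    if start <= end:
        # Same-day window, e.g. 13:00-15:00.
        return now.weekday() in days and start <= t < end
    # Overnight window, e.g. 22:00-06:00.
    if t >= start:
        # Evening part: belongs to today's window.
        return now.weekday() in days
    if t < end:
        # Early-morning part: belongs to the window that started yesterday.
        return (now - timedelta(days=1)).weekday() in days
    return False
```

With a Monday-only window of 22:00 to 06:00, both Monday 23:00 and Tuesday 05:00 are treated as quiet, while Tuesday 23:00 is not.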
13.3.16 Page 16: Admin Users (/settings/users)
User management interface for administrators.
| Feature | Specification |
|---|---|
| Users table | Username, email, role badge, status (Active/Inactive), last login, MFA status, actions menu |
| Add user | Modal: username, email, role selector, password (or send invite link), MFA toggle |
| Edit user | Role, status, force password change on next login, reset MFA, session revocation |
| User activity log | Login history (timestamp, IP, device), actions taken, settings changed |
| Bulk actions | Deactivate multiple accounts simultaneously |
| Filter | By role, status, last login date range |
| Sort | By username, role, last login, created date |
| Pagination | 25 users per page |
13.3.17 Page 17: Camera Management (/settings/cameras)
Configuration interface for camera setup and zone management.
| Feature | Specification |
|---|---|
| Camera cards | Name, status (Online/Offline/Disabled), IP/connection string, stream info (resolution, FPS), action buttons (Edit, Test, Disable) |
| Add camera | Modal: name, location, stream URL, credentials, channel number, description |
| Edit camera | All camera properties; test connection button |
| Zone configuration | Interactive polygon drawing on live camera feed; zone name, color, sensitivity, type (Entrance/Restricted/Detection/Ignore) |
| Stream settings | Resolution (720p/1080p), frame rate (5/10/15/25/30 FPS), codec (H.264/H.265), night mode toggle |
| Recording settings | Continuous/event-triggered, retention policy, storage location |
| Camera ordering | Drag to reorder cameras in grid layout |
13.3.18 Page 18: Retention & Storage (/settings/storage)
Storage management and retention policy configuration.
| Feature | Specification |
|---|---|
| Storage overview | Donut chart showing usage breakdown: Video recordings, Detection snapshots, Training data, System logs, Free space |
| Numerical values | Total capacity / Used / Free; warning at > 80% (yellow), critical at > 95% (red) |
| Retention policies | Dropdown per category: 7 days / 14 days / 30 days / 60 days / 90 days / 180 days / 365 days / Forever |
| Auto-cleanup | Enable toggle + schedule time picker (daily at 03:00 default) |
| Actions | "Save Settings", "Run Cleanup Now" (with confirmation), "Export Storage Report" |
| Growth projection | Estimated days until full based on current growth rate |
| Storage alerts | Configure alert thresholds (80% warning, 90% high, 95% critical) |
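The alert thresholds and the growth projection are simple arithmetic on disk usage. The Python sketch below illustrates both. It assumes strict "greater than" comparisons at each level (the table states this for 80% and 95%; applying it to 90% is an assumption) and a linear growth model for the projection. `storage_severity` and `days_until_full` are hypothetical names.

```python
from typing import Optional


def storage_severity(used_pct: float) -> str:
    """Map disk usage (percent) to an alert level using the default thresholds."""
    if used_pct > 95:
        return "critical"
    if used_pct > 90:
        return "high"
    if used_pct > 80:
        return "warning"
    return "ok"


def days_until_full(total_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> Optional[float]:
    """Estimate days until the disk is full, assuming linear growth.

    Returns None when usage is flat or shrinking, since no fill date exists.
    """
    if growth_gb_per_day <= 0:
        return None
    free_gb = max(total_gb - used_gb, 0.0)
    return free_gb / growth_gb_per_day
```

For a 2,000 GB volume with 1,500 GB used and 25 GB of growth per day, `days_until_full(2000, 1500, 25)` gives 20 days.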
13.4 Key User Flows
13.4.1 Flow 1: Daily Operator — Monitor & Respond
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 1: DAILY OPERATOR (Monitor & Respond) │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: LOGIN │
│ ────────────── │
│ Enter username → Enter password → MFA code (if enabled) │
│ → Redirect to Dashboard │
│ │
│ STEP 2: DASHBOARD REVIEW (~30 seconds) │
│ ───────────────────────────────────── │
│ Glance at stat cards: │
│ ├─ All 8 cameras online? ✓ │
│ ├─ Any critical alerts pending? (red badge) │
│ ├─ Any unknown persons detected? │
│ └─ System health OK? │
│ │
│ If critical alert visible: │
│ → Click alert card → Go to Alert Center │
│ If no urgent alerts: │
│ → Click "Live View" in sidebar │
│ │
│ STEP 3: LIVE CAMERA MONITORING (ongoing) │
│ ───────────────────────────────────── │
│ View 2x4 grid of all cameras │
│ Observe feeds for anomalies │
│ │
│ When alert toast appears (top-right): │
│ → Toast slides in with sound notification │
│ → Click toast to view alert details │
│ │
│ STEP 4: ALERT RESPONSE │
│ ────────────────── │
│ Click alert toast OR navigate to Alert Center │
│ Review alert card: │
│ ├─ Thumbnail image │
│ ├─ Camera name, timestamp │
│ ├─ Alert type (unknown person, watchlist match, etc.) │
│ └─ Severity level │
│ │
│ Click "View Details" for full information: │
│ ├─ Full-size image / video clip │
│ ├─ AI confidence score │
│ ├─ Detection metadata (bounding box, zone) │
│ └─ Person profile link (if known) │
│ │
│ DECISION: │
│ ├─ False detection → Click "Mark as False Positive" │
│ ├─ Legitimate alert → Click "Acknowledge" or "Resolve" │
│ ├─ Unknown person → Click "Name This Person" │
│ ├─ Needs escalation → Click "Escalate" │
│ └─ Need live view → Click "View Live" to jump to camera │
│ │
│ STEP 5: RETURN TO MONITORING │
│ ──────────────────────────── │
│ After handling alert, return to Live View │
│ Continue monitoring cycle │
│ │
│ STEP 6: END OF SHIFT │
│ ────────────────── │
│ Review unacknowledged alerts (if any) │
│ Check System Health page │
│ Hand over to next operator (verbal + note any pending issues) │
│ Click user menu → Logout │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.2 Flow 2: New Person Onboarding
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 2: NEW PERSON ONBOARDING │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ TRIGGER: System detects unknown person → Alert created → Operator notified │
│ │
│ STEP 1: REVIEW DETECTION │
│ ──────────────────── │
│ Navigate to "Recent Detections" via sidebar │
│ Filter: "Unknown" (toggle button) │
│ Click on unknown detection card │
│ │
│ Detail view shows: │
│ ├─ Full-size face image │
│ ├─ Camera: CAM-01 (Entrance) │
│ ├─ Timestamp: 2025-01-16 14:32:15 │
│ ├─ Confidence: 87.3% │
│ └─ AI note: "No matching person found in database" │
│ │
│ STEP 2: NAME THE PERSON │
│ ──────────────────── │
│ Click "Name This Person" button │
│ Modal dialog appears: │
│ │
│ ┌────────────────────────────────────┐ │
│ │ Name This Person │ │
│ │ │ │
│ │ Face: [thumbnail] │ │
│ │ │ │
│ │ Full Name * [____________] │ │
│ │ Role * [Employee ▼] │ │
│ │ Department [____________] │ │
│ │ Employee ID [____________] │ │
│ │ Notes [____________] │ │
│ │ Tags [____________] │ │
│ │ │ │
│ │ Similar existing persons: │ │
│ │ [No similar persons found] │ │
│ │ │ │
│ │ [Cancel] [Save & Create Profile] │ │
│ └────────────────────────────────────┘ │
│ │
│ STEP 3: SIMILARITY CHECK │
│ ──────────────────── │
│ System searches for similar existing persons │
│ If matches found: display side-by-side comparison │
│ → Option to merge with existing person instead of creating new │
│ If no matches: proceed with creation │
│ │
│ STEP 4: SAVE PROFILE │
│ ────────────── │
│ Click "Save & Create Profile" │
│ Toast notification: "Profile created for [Name]" │
│ Detection card updates with person name │
│ Person now appears in Person Gallery │
│ │
│ STEP 5: ADD TRAINING IMAGES (Optional) │
│ ──────────────────────────────────── │
│ Navigate to Person Profile │
│ Click "Upload Reference Photos" │
│ Select additional clear face images │
│ System queues for model retraining │
│ Toast: "3 new training images added. Model will retrain automatically." │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.3 Flow 3: Unknown Person Review Queue
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 3: UNKNOWN PERSON REVIEW QUEUE │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: OPEN REVIEW QUEUE │
│ ────────────────────── │
│ Sidebar → "Unknown Persons Review" │
│ View: Grid of unknown person cluster cards │
│ Header: "23 unknown clusters remaining" │
│ │
│ STEP 2: SELECT CLUSTER │
│ ──────────────── │
│ Click on a cluster card to expand │
│ Shows: │
│ ├─ Representative face (largest) │
│ ├─ Gallery of all face instances in cluster │
│ ├─ Sighting history (camera, time, count) │
│ ├─ AI pattern insight: "Seen 5x at entrance between 08:00-09:00" │
│ └─ Confidence distribution graph │
│ │
│ STEP 3: MAKE DECISION │
│ ──────────────── │
│ Options: │
│ ├─ [Name This Person] → Enter details → Create new profile │
│ ├─ [Merge with Existing] → Search/select person → Confirm merge │
│ ├─ [Ignore Cluster] → "False detection / not a person" → Remove │
│ └─ [Mark Reviewed] → "Unsure, keep in queue for later" │
│ │
│ STEP 4: QUEUE UPDATES │
│ ──────────────── │
│ Processed item removed from queue │
│ Toast confirms action: "Cluster marked as [Name]. 22 remaining." │
│ Auto-advance to next cluster (optional) │
│ Keyboard shortcut: Right arrow → next cluster │
│ │
│ STEP 5: CONTINUE REVIEW │
│ ──────────────── │
│ Process all clusters or stop and resume later │
│ Queue persists across sessions │
│ New clusters automatically added as detected │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.4 Flow 4: AI Settings Adjustment
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 4: AI SETTINGS ADJUSTMENT │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: NAVIGATE TO AI VIBE SETTINGS │
│ ──────────────────────────────────── │
│ Sidebar → "AI Vibe Settings" (Sparkles icon) │
│ View: Scrollable page with 7 setting sections │
│ │
│ STEP 2: ADJUST DETECTION SENSITIVITY │
│ ──────────────────────────────── │
│ Section: "How carefully should the AI watch?" │
│ Current: [Relaxed] [Balanced] [High] [Maximum] │
│ Change: Click "High" │
│ Description updates: │
│ "High: The AI will catch almost everything. │
│ Expect more alerts, including some false positives." │
│ Toast: "Detection Sensitivity updated to High" │
│ Change takes effect immediately │
│ │
│ STEP 3: ADJUST ALERT STYLE │
│ ──────────────────── │
│ Section: "When should alerts be sent?" │
│ Current: [Silent] [Digest] [Normal] [Urgent] [Critical] │
│ Change: Click "Critical" │
│ Description updates: │
│ "Critical: Only truly important events trigger alerts. │
│ All other activity is logged but not alerted." │
│ Toast: "Alert Style updated to Critical" │
│ │
│ STEP 4: REVIEW ADVANCED (Admin only) │
│ ──────────────────────────────────── │
│ Click "Expand" on Advanced Settings │
│ Shows internal values: │
│ Detection Sensitivity: High │
│ └─ Confidence Threshold: 0.55 │
│ └─ NMS Threshold: 0.40 │
│ └─ Model: yolo11m.onnx │
│ Admin can directly edit numerical values │
│ │
│ STEP 5: DONE │
│ ──────── │
│ All changes auto-saved │
│ Return to monitoring — changes effective immediately │
└──────────────────────────────────────────────────────────────────────────────┘
13.4.5 Flow 5: Watchlist Alert Configuration
┌──────────────────────────────────────────────────────────────────────────────┐
│ FLOW 5: WATCHLIST ALERT CONFIGURATION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: NAVIGATE TO WATCHLISTS │
│ ──────────────────────────── │
│ Sidebar → "Watchlists" │
│ View: Grid of existing watchlist cards │
│ Default: VIP, Blacklist, Authorized, Temporary Access │
│ │
│ STEP 2: CREATE NEW WATCHLIST (Optional) │
│ ──────────────────────────────────── │
│ Click "+ New Watchlist" │
│ Modal: │
│ Name: [Security Escort Required] │
│ Icon: [🛡️] (icon picker) │
│ Color: [Orange] (color picker) │
│ Description: [People who require security escort] │
│ Click "Create" │
│ New watchlist card appears in grid │
│ │
│ STEP 3: ADD MEMBERS │
│ ──────────────── │
│ Click on watchlist card │
│ Click "Add from Gallery" │
│ Search/select persons to add: │
│ [☑] John Doe │
│ [☑] Jane Smith │
│ [☐] Bob Johnson (not selected) │
│ Click "Add to Watchlist" │
│ Toast: "2 persons added to Security Escort Required" │
│ │
│ STEP 4: CONFIGURE ALERTS │
│ ──────────────────── │
│ Click "Settings" tab on watchlist detail │
│ Configure: │
│ Alert Timing: [☑] Immediate [☐] Delayed (___ min) │
│ Severity: [☐] Inherit [☑] Force Critical │
│ Notify Groups: [☑] Security Team [☐] Management │
│ Media: [☑] Image [☑] Video │
│ Quiet Hours: [☐] Respect global [☑] Always alert │
│ Escalation: [☑] Enable escalation (5/10/20 min) │
│ Click "Save" │
│ │
│ STEP 5: TEST │
│ ──────── │
│ Click "Test Alert" button │
│ System sends test alert through configured channels │
│ Verify: Telegram message received ✓ │
│ Verify: WhatsApp message received ✓ │
│ Watchlist is now active and monitoring │
└──────────────────────────────────────────────────────────────────────────────┘
13.5 Component Specifications
13.5.1 Camera Feed Component
| State | Visual | Interaction |
|---|---|---|
| Loading | Centered spinner overlay, camera name visible | None — wait for stream |
| Playing | Live stream active, recording dot if applicable | Click to focus, hover for controls |
| Paused | Stream paused, large play button overlay | Click to resume |
| Error | Error icon + "Connection failed" + Retry button | Click Retry to reconnect |
| Offline | Gray placeholder with camera icon + "Offline" | Shows last online timestamp |
| Disabled | Grayed out with "Disabled" badge | No stream attempted |
| Prop | Type | Required | Default | Description |
|---|---|---|---|---|
| cameraId | string | Yes | — | Unique camera identifier (e.g., "cam-01") |
| name | string | Yes | — | Display name shown as overlay |
| streamUrl | string | Yes | — | HLS or WebRTC stream URL |
| status | 'online' \| 'offline' \| 'reconnecting' \| 'disabled' | Yes | — | Current camera status |
| layout | 'grid' \| 'fullscreen' | No | 'grid' | Current layout mode |
| quality | 'auto' \| 'hd' \| 'sd' | No | 'auto' | Stream quality preference |
| showControls | boolean | No | true | Show overlay controls |
| onFocus | (id: string) => void | No | — | Callback when camera is focused |
| onSnapshot | (id: string) => void | No | — | Callback when snapshot is taken |
13.5.2 Alert Card Component
| Prop | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Alert unique identifier |
| severity | 'critical' \| 'high' \| 'medium' \| 'low' \| 'info' | Yes | Alert severity level |
| type | string | Yes | Alert type classification |
| cameraName | string | Yes | Source camera display name |
| timestamp | Date | Yes | When the alert occurred |
| thumbnail | string | No | URL to thumbnail image |
| personName | string | No | Identified person name (if known) |
| status | 'pending' \| 'acknowledged' \| 'resolved' \| 'ignored' | Yes | Current alert status |
| onAcknowledge | () => void | No | Acknowledge callback |
| onResolve | () => void | No | Resolve callback |
| onIgnore | () => void | No | Ignore callback |
| onViewDetails | () => void | No | View details callback |
13.5.3 Stat Card Component
| Prop | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Card label (e.g., "Cameras Online") |
| value | string \| number | Yes | Main displayed value (e.g., "8/8") |
| icon | LucideIcon | Yes | Icon component from Lucide React |
| color | 'green' \| 'blue' \| 'orange' \| 'red' \| 'purple' | No | Color theme (default: blue) |
| trend | number | No | Percentage change from previous period |
| subtitle | string | No | Secondary text below value |
| href | string | No | Navigation link (e.g., to detail page) |
13.6 Toast Notification System
| Type | Icon | Color | Duration | Use Case |
|---|---|---|---|---|
| Success | Check circle | Green (#10B981) | 3 seconds | Action completed successfully |
| Error | X circle | Red (#EF4444) | 5 seconds (or persistent) | Action failed; may require user attention |
| Warning | Alert triangle | Orange (#F59E0B) | 4 seconds | Non-critical issue; may need attention |
| Info | Info circle | Blue (#3B82F6) | 3 seconds | Informational message |
| Alert | Bell | Red (#EF4444) | Persistent (until dismissed) | Critical alert notification |
Toast behavior:
- Appears in top-right corner
- Stacks up to 5 toasts simultaneously
- Older toasts pushed down when new ones arrive
- Hovering pauses auto-dismiss timer
- Click to dismiss immediately
- Swipe right to dismiss (mobile)
13.7 Modal System
| Size | Width | Use Case |
|---|---|---|
| Small | 400px | Confirmations, simple forms |
| Medium (default) | 560px | Standard forms, detail views |
| Large | 800px | Complex forms, image viewers |
| Fullscreen | 100% | Camera fullscreen, large data tables |
Modal behavior:
- Backdrop click to close (configurable)
- Escape key to close (configurable)
- Focus trap — Tab cycles within modal
- Return focus to trigger element on close
- Body scroll locked when modal open
- Enter key submits primary action (forms)
13.8 Responsive Behavior
| Breakpoint | Width | Layout Changes |
|---|---|---|
| xs | < 576px | Single column; stacked layouts; bottom tab bar; hamburger menu; camera grid 1x1 or 2x1 |
| sm | 576-767px | Two column layouts; sidebar as overlay drawer; camera grid 2x2 |
| md | 768-991px | Collapsed sidebar (72px); filters as drawer; camera grid 2x3; 3-column person gallery |
| lg | 992-1199px | Sidebar expanded (260px); full desktop layout; 4-column person gallery |
| xl | 1200-1399px | Full desktop layout; 5-column person gallery; 2x4 camera grid |
| xxl | 1400px+ | Max content width 1400px centered; all features visible |
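The breakpoint ranges above partition viewport widths without gaps, which a short lookup makes easy to verify. The sketch below is written in Python only to illustrate the logic; the production front end would express the same thresholds in its own stack. `BREAKPOINTS` and `breakpoint_for` are hypothetical names.

```python
# Minimum viewport width (px) for each breakpoint, widest first.
# Values are taken from the table above; "xs" is the fallback below 576px.
BREAKPOINTS = [
    (1400, "xxl"),
    (1200, "xl"),
    (992, "lg"),
    (768, "md"),
    (576, "sm"),
]


def breakpoint_for(width_px: int) -> str:
    """Return the breakpoint name that applies to a viewport width."""
    for min_width, name in BREAKPOINTS:
        if width_px >= min_width:
            return name
    return "xs"
```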
13.9 Keyboard Shortcuts
| Shortcut | Context | Action |
|---|---|---|
| ? | Global | Show keyboard shortcuts reference modal |
| / | Global | Focus global search bar |
| Escape | Global | Close modal / exit fullscreen / deselect |
| F | Live View | Toggle fullscreen on focused camera |
| S | Live View | Take snapshot of focused camera |
| 1-8 | Live View | Focus camera 1-8 |
| Space | Live View | Pause/play focused camera stream |
| A | Alert Center | Acknowledge selected alert |
| R | Alert Center | Resolve selected alert |
| N | Detections / Unknowns | Name unknown person |
| → | Unknown Review | Next cluster |
| ← | Unknown Review | Previous cluster |
| Ctrl+K | Global | Command palette (quick navigation) |
| Ctrl+Shift+A | Global | Acknowledge most recent alert |
| M | Live View | Toggle mute on camera audio |
| + / - | Timeline | Zoom in / zoom out |
13.10 Animation Guidelines
| Animation | Duration | Easing | Description |
|---|---|---|---|
| Page transition | 200ms | ease-out | Fade in on route change |
| Modal open | 250ms | cubic-bezier(0.16, 1, 0.3, 1) | Scale up + fade in |
| Modal close | 150ms | ease-in | Scale down + fade out |
| Sidebar toggle | 250ms | ease-in-out | Width transition 260px ↔ 72px |
| Toast slide-in | 300ms | ease-out | Slide from right + fade in |
| Toast fade-out | 200ms | ease-in | Fade out before removal |
| Card hover lift | 150ms | ease | Subtle translateY(-2px) + shadow increase |
| Segmented slider | 200ms | ease | Sliding background between options |
| Pulse (recording) | 2s | ease-in-out infinite | Red dot opacity oscillation |
| Stats update | 500ms | ease | Number count-up animation |
| Skeleton shimmer | 1.5s | linear infinite | Shimmer gradient sweep |
| Alert flash | 1s | ease-out | Border flash on camera with new alert |
| Camera focus | 300ms | ease-out | Expand to fullscreen |
| Dropdown open | 150ms | ease-out | Fade + slight translateY |
| Tooltip | 100ms | ease | Fade in on hover |
13.11 Technology Stack
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18.x | UI library |
| Meta-framework | Next.js | 14.x | SSR, routing, API routes |
| Language | TypeScript | 5.x | Type safety |
| Styling | Tailwind CSS | 3.x | Utility-first CSS |
| Theme | CSS Custom Properties | — | Dark mode via dark class |
| UI Components | shadcn/ui | latest | Base component library |
| Icons | Lucide React | latest | Consistent icon set |
| State Management | Zustand | 4.x | Lightweight global state |
| Data Fetching | TanStack Query (React Query) | 5.x | Server state management |
| Real-time | Socket.IO Client | 4.x | WebSocket for live updates |
| Video | hls.js | latest | HLS stream playback |
| Video (WebRTC) | native | — | WebRTC stream fallback |
| Charts | Recharts | 2.x | Data visualization |
| Date/Time | date-fns | 2.x | Date formatting and manipulation |
| Forms | React Hook Form | 7.x | Form state management |
| Validation | Zod | 3.x | Schema validation |
| Zone Drawing | SVG + native events | — | Polygon drawing on camera feed |
| Testing | Vitest | 1.x | Unit testing |
| E2E Testing | Playwright | 1.x | Browser automation testing |
| Build | Next.js built-in | — | Production optimization |
Section 14: Deployment Plan
14.1 Deployment Architecture Overview
The deployment architecture spans two physical environments: AWS cloud for centralized services and an Intel NUC edge gateway at the surveillance site. The two environments are connected by an encrypted WireGuard VPN tunnel. All services run in containers (Kubernetes on EKS in the cloud, Docker Compose on the edge) and are released through GitOps-based continuous delivery.
┌──────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT ARCHITECTURE │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ AWS CLOUD │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Route 53 │──▶ ALB │──▶ EKS │──▶ App Pods │ │ │
│ │ │ DNS │ │ TLS 1.3 │ │ Cluster │ │ (FastAPI/Next) │ │ │
│ │ └──────────┘ └──────────┘ └────┬─────┘ └──────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌────┴─────┐ ┌──────────────────┐ │ │
│ │ │ S3 │ │ RDS │ │ ElastiCache│ │ MSK Kafka │ │ │
│ │ │ Media │ │ Postgres │ │ Redis │ │ (Event Bus) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ WireGuard VPN Gateway (EC2) ←────→ Edge Gateway │ │ │
│ │ │ UDP 51820 Tunnel (Intel NUC, Site) │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ EDGE SITE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Intel NUC (Ubuntu Server 22.04) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Video Capture│ │ AI Inference │ │ MinIO │ │ │ │
│ │ │ │ (RTSP/FFmpeg)│ │ (YOLO/Face) │ │ (Storage) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Redis │ │ WireGuard │ │ Node Exporter│ │ │ │
│ │ │ │ (Cache) │ │ (VPN) │ │ (Metrics) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌─────────┴──────────┐ │ │
│ │ │ Camera LAN │ │ │
│ │ │ CP PLUS DVR │ │ │
│ │ │ 192.168.29.200:554 │ │ │
│ │ │ (8 channels) │ │ │
│ │ └─────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
14.2 Cloud Deployment (AWS EKS)
14.2.1 EKS Cluster Configuration
| Parameter | Value | Notes |
|---|---|---|
| Kubernetes version | 1.28+ | Latest stable at deployment |
| Control plane | Managed by AWS | Multi-AZ availability |
| Node group type | Managed (EC2) | t3.large for general, g4dn.xlarge for GPU |
| CNI | Amazon VPC CNI | Native VPC networking for pods |
| Ingress controller | NGINX Ingress + cert-manager | TLS termination at ALB |
| GitOps | ArgoCD | Declarative continuous deployment |
| Pod identity | IRSA (IAM Roles for Service Accounts) | No long-term AWS credentials |
14.2.2 Cloud Service Resources
| Service | AWS Service | Instance/Tier | HA Mode | Monthly Est. |
|---|---|---|---|---|
| Orchestration | Amazon EKS | Managed control plane | Multi-AZ | $73 |
| Application nodes | EC2 (t3.large) | 3 nodes (on-demand) | Multi-AZ spread | $200 |
| GPU nodes | EC2 (g4dn.xlarge) | 1 node (spot preferred) | Single + auto-recovery | $350 |
| Database | RDS PostgreSQL 15 | db.r6g.xlarge Multi-AZ | Multi-AZ with failover | $520 |
| Cache | ElastiCache Redis | cache.r6g.large (2 shards) | Cluster mode | $260 |
| Message bus | Amazon MSK | kafka.m5.large (3 brokers) | Multi-AZ | $350 |
| Object storage | S3 | Standard + IA + Glacier | Cross-region replication | $200 |
| Load balancer | ALB | Application Load Balancer | Multi-AZ | $25 |
| DNS | Route 53 | Hosted zone + health checks | Global | $15 |
| VPN gateway | EC2 (t3.micro) | WireGuard endpoint | Single (monitor for HA) | $15 |
| Secrets | AWS Secrets Manager | Vault integration | Multi-AZ | $10 |
| Monitoring | CloudWatch | Logs + metrics + alarms | Multi-AZ | $50 |
| Total | | | | ~$2,068/month |
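The monthly total is the sum of the line items in the table. The short Python check below recomputes it from the per-service estimates so the figure can be re-derived whenever a line item changes. The dictionary keys are shortened labels chosen for illustration, and `monthly_total` is a hypothetical name.

```python
# Monthly estimates (USD) copied from the Cloud Service Resources table.
MONTHLY_ESTIMATES_USD = {
    "Amazon EKS control plane": 73,
    "EC2 t3.large application nodes": 200,
    "EC2 g4dn.xlarge GPU node": 350,
    "RDS PostgreSQL 15": 520,
    "ElastiCache Redis": 260,
    "Amazon MSK": 350,
    "S3": 200,
    "ALB": 25,
    "Route 53": 15,
    "WireGuard gateway (EC2 t3.micro)": 15,
    "AWS Secrets Manager": 10,
    "CloudWatch": 50,
}


def monthly_total(estimates: dict[str, int]) -> int:
    """Sum the per-service monthly estimates."""
    return sum(estimates.values())


if __name__ == "__main__":
    print(f"Estimated total: ${monthly_total(MONTHLY_ESTIMATES_USD):,}/month")
```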
14.3 Edge Deployment (Intel NUC)
14.3.1 Edge Hardware Specification
| Component | Specification | Notes |
|---|---|---|
| Device | Intel NUC 13 Pro (or equivalent) | Fanless preferred for reliability |
| CPU | Intel Core i7-1360P (12 cores, 16 threads) | Sufficient for 8 streams + AI inference |
| RAM | 32 GB DDR4-3200 (2x16 GB) | Dual channel for memory bandwidth |
| Storage (OS) | 500 GB NVMe SSD (Samsung 980 Pro or equivalent) | Fast boot and application loading |
| Storage (Data) | 2 TB NVMe SSD (Samsung 990 Pro or equivalent) | 7-day local recording buffer |
| Network | Intel i226-V 2.5 GbE (dual port) | Dual NIC for WAN + LAN separation |
| WiFi | Disabled in BIOS | Security — no wireless |
| Bluetooth | Disabled in BIOS | Security — no wireless |
| TPM | TPM 2.0 enabled | For LUKS auto-unseal |
| OS | Ubuntu Server 22.04 LTS (minimal install) | No desktop environment |
14.3.2 Edge Docker Compose Configuration
version: "3.8"
services:
  # RTSP stream capture and frame extraction
  video-capture:
    image: sentinel/surveillance-video-capture:v2.3.1
    restart: unless-stopped
    # Host networking: Redis and MinIO are reached through their published localhost ports
    network_mode: host
    environment:
      - DVR_IP=192.168.29.200
      - DVR_PORT=554
      - NUM_CHANNELS=8
      - FRAME_EXTRACT_FPS=1
      - RECORDING_SEGMENT_SEC=10
      - REDIS_HOST=localhost
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=localhost:9000
    volumes:
      - /data/frames:/app/frames
      - /data/recordings:/app/recordings
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
      - minio
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
  # AI inference service (lightweight edge models)
  ai-inference:
    image: sentinel/surveillance-ai-inference:edge-v2.3.1
    restart: unless-stopped
    # Requires the NVIDIA container runtime; remove this line on CPU-only hosts
    runtime: nvidia
    environment:
      - MODEL_PATH=/models
      # This service runs on the Compose bridge network, so other services are
      # addressed by service name (localhost would resolve to this container)
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - MINIO_ENDPOINT=minio:9000
      - INFERENCE_BATCH_SIZE=8
      - CONFIDENCE_THRESHOLD=0.7
      - NMS_THRESHOLD=0.45
    volumes:
      - ./models:/models:ro
      - /data/frames:/app/frames:ro
      - ./secrets:/run/secrets:ro
    depends_on:
      - redis
    deploy:
      resources:
        limits:
          cpus: '6.0'
          memory: 8G
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
  # Local object storage (S3-compatible)
  minio:
    image: minio/minio:RELEASE.2024-latest
    restart: unless-stopped
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - /data/minio:/data
    environment:
      - MINIO_ROOT_USER_FILE=/run/secrets/minio_user
      - MINIO_ROOT_PASSWORD_FILE=/run/secrets/minio_password
    secrets:
      - minio_user
      - minio_password
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
  # Local cache and Pub/Sub
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    # Assumes REDIS_PASSWORD is provided via the shell environment or a .env file
    command: >
      redis-server
      --requirepass "${REDIS_PASSWORD}"
      --appendonly yes
      --maxmemory 512mb
      --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    ports:
      - "127.0.0.1:6379:6379"
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
  # WireGuard VPN client
  wireguard:
    image: linuxserver/wireguard:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - ./wireguard-config:/config
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    deploy:
      resources:
        limits:
          cpus: '0.25'
          memory: 64M
  # Metrics exporter for Prometheus
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
volumes:
  redis_data:
    driver: local
secrets:
  minio_user:
    file: ./secrets/minio_user.txt
  minio_password:
    file: ./secrets/minio_password.txt
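The per-service resource limits in the Compose file can be checked against the edge hardware in Section 14.3.1 (16 threads, 32 GB RAM). The Python sketch below sums the declared limits. It assumes Docker's binary units (4G = 4096 MiB), treats the 32 GB of RAM as 32 × 1024 MiB, and omits node-exporter because no limit is declared for it. `LIMITS`, `budget`, and `fits` are hypothetical names.

```python
# (CPU limit, memory limit in MiB) per service, copied from the Compose file.
LIMITS = {
    "video-capture": (4.0, 4096),
    "ai-inference": (6.0, 8192),
    "minio": (1.0, 1024),
    "redis": (0.5, 512),
    "wireguard": (0.25, 64),
}

HOST_THREADS = 16              # Intel Core i7-1360P, 16 threads
HOST_MEMORY_MIB = 32 * 1024    # 32 GB RAM, treated as MiB for this estimate


def budget(limits: dict[str, tuple[float, int]]) -> tuple[float, int]:
    """Return total (CPU, memory MiB) reserved by the declared limits."""
    cpus = sum(cpu for cpu, _ in limits.values())
    memory = sum(mem for _, mem in limits.values())
    return cpus, memory


def fits(limits: dict[str, tuple[float, int]],
         host_threads: int, host_memory_mib: int) -> bool:
    """True when the summed limits stay within the host's capacity."""
    cpus, memory = budget(limits)
    return cpus <= host_threads and memory <= host_memory_mib
```

Under these assumptions the limits add up to 11.75 CPUs and 13,888 MiB, leaving headroom on the host for the operating system and any unbounded containers.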
14.4 Configuration and Environment Variables
14.4.1 Environment Structure
| Environment | URL Pattern | Data | Purpose |
|---|---|---|---|
| Development | *.dev.internal | Synthetic test data | Feature development, local testing |
| Staging | *.staging.example.com | Anonymized production-like data | Integration testing, UAT |
| Production | *.example.com | Real operational data | Live surveillance operations |
14.4.2 Required Environment Variables
# ─── APPLICATION ───
APP_ENV=production # dev | staging | production
APP_NAME="Sentinel AI Surveillance"
APP_VERSION=2.3.1
APP_DEBUG=false
APP_SECRET_KEY=<random-256-bit-key> # Used for session signing
LOG_LEVEL=INFO # DEBUG | INFO | WARNING | ERROR | CRITICAL
# ─── SERVER ───
API_HOST=0.0.0.0
API_PORT=8080
WORKERS=4 # Uvicorn worker processes
TIMEZONE=Asia/Kolkata
# ─── DATABASE ───
DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/surveillance
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
DB_POOL_TIMEOUT=30
DB_ECHO=false # Set true for SQL logging (dev only)
# ─── REDIS ───
REDIS_URL=redis://:password@redis-endpoint:6379/0
REDIS_POOL_SIZE=50
REDIS_SOCKET_TIMEOUT=5
# ─── OBJECT STORAGE (S3 or MinIO) ───
STORAGE_TYPE=s3 # s3 | minio
STORAGE_ENDPOINT=s3.amazonaws.com
STORAGE_BUCKET=sentinel-surveillance-media
STORAGE_REGION=ap-south-1
STORAGE_ACCESS_KEY=<access-key>
STORAGE_SECRET_KEY=<secret-key>
STORAGE_SECURE=true
STORAGE_URL_EXPIRY=300 # Signed URL expiry in seconds
# ─── DVR / CAMERA CONNECTION ───
DVR_IP=192.168.29.200
DVR_PORT=554
DVR_USERNAME=admin
DVR_PASSWORD=<dvr-password>
DVR_CHANNELS=8
DVR_STREAM_QUALITY=0 # 0=main (high), 1=sub (low)
DVR_RTSP_TEMPLATE="rtsp://{user}:{pass}@{ip}:{port}/user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"
# ─── AI MODELS ───
MODEL_PATH=/models
HUMAN_DETECTION_MODEL=yolo11m.onnx
FACE_DETECTION_MODEL=scrfd_10g_bnkps.onnx
FACE_RECOGNITION_MODEL=arcface_r100.onnx
CONFIDENCE_THRESHOLD=0.7
NMS_THRESHOLD=0.45
FACE_MATCH_THRESHOLD=0.70
UNKNOWN_CLUSTER_EPS=0.35
UNKNOWN_CLUSTER_MIN_SAMPLES=3
# ─── TELEGRAM NOTIFICATIONS ───
TELEGRAM_ENABLED=true
TELEGRAM_BOT_TOKEN=<bot-token>
TELEGRAM_WEBHOOK_URL=https://api.example.com/webhooks/telegram
TELEGRAM_WEBHOOK_SECRET=<webhook-secret>
TELEGRAM_ADMIN_CHAT_ID=<admin-chat-id>
# ─── WHATSAPP NOTIFICATIONS ───
WHATSAPP_ENABLED=true
WHATSAPP_API_VERSION=v18.0
WHATSAPP_ACCESS_TOKEN=<access-token>
WHATSAPP_PHONE_NUMBER_ID=<phone-number-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN=<verify-token>
WHATSAPP_BUSINESS_ACCOUNT_ID=<business-account-id>
# ─── VPN ───
VPN_ENABLED=true
VPN_TYPE=wireguard
VPN_ENDPOINT=wg.example.com:51820
VPN_PUBLIC_KEY=<server-public-key>
VPN_PRIVATE_KEY=<client-private-key>
VPN_PRESHARED_KEY=<preshared-key>
VPN_ALLOWED_IPS=10.100.0.0/16
VPN_KEEPALIVE=25
# ─── AUTHENTICATION ───
JWT_SECRET_KEY=<ecdsa-private-key-pem>
JWT_PUBLIC_KEY=<ecdsa-public-key-pem>
JWT_ALGORITHM=ES256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=15
JWT_REFRESH_TOKEN_EXPIRE_DAYS=7
MFA_REQUIRED_ROLES=super_admin,admin
MFA_ISSUER="Sentinel AI Surveillance"
# ─── MONITORING ───
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
GRAFANA_URL=https://grafana.example.com
SENTRY_DSN=<sentry-dsn>
HEALTH_CHECK_INTERVAL=30
# ─── RETENTION ───
RECORDING_RETENTION_DAYS=90
DETECTION_SNAPSHOT_RETENTION_DAYS=90
EVENT_LOG_RETENTION_DAYS=365
AUDIT_LOG_RETENTION_DAYS=365
TRAINING_DATA_RETENTION_DAYS=365
AUTO_CLEANUP_ENABLED=true
AUTO_CLEANUP_HOUR=3 # 3:00 AM daily
# ─── SECURITY ───
CORS_ALLOWED_ORIGINS=https://app.example.com,https://staging.example.com
CSP_REPORT_ONLY=false
RATE_LIMIT_DEFAULT=100/minute
RATE_LIMIT_AUTH=10/minute
SESSION_MAX_AGE_HOURS=8
SESSION_IDLE_TIMEOUT_MINUTES=30
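As an illustration of how the DVR settings combine, the following sketch renders DVR_RTSP_TEMPLATE once per channel. It assumes that channels are numbered from 1 and that credentials should be percent-encoded before substitution; neither is stated in this blueprint, and the helper name build_rtsp_urls is hypothetical.

```python
from urllib.parse import quote

# Template copied from DVR_RTSP_TEMPLATE above.
DVR_RTSP_TEMPLATE = (
    "rtsp://{user}:{pass}@{ip}:{port}/"
    "user={user}&password={pass}&channel={ch}&stream={quality}.sdp?"
)


def build_rtsp_urls(user, password, ip, port, channels, quality=0):
    """Return one RTSP URL per channel (assumption: channels are 1-based)."""
    fields = {
        "user": quote(user, safe=""),
        # "pass" is a Python keyword, so it is supplied through a dict.
        "pass": quote(password, safe=""),
        "ip": ip,
        "port": port,
        "quality": quality,
    }
    return [
        DVR_RTSP_TEMPLATE.format(ch=ch, **fields)
        for ch in range(1, channels + 1)
    ]


# Placeholder credentials; real values come from DVR_USERNAME / DVR_PASSWORD.
urls = build_rtsp_urls("admin", "p@ss", "192.168.29.200", 554, channels=8)
print(urls[0])
```

Whether a given DVR firmware accepts percent-encoded credentials in the path portion should be verified against the device before relying on this.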
14.5 Rollout Stages
14.5.1 Stage 1: Foundation (Weeks 1-4)
Objective: Infrastructure, VPN connectivity, and core data layer operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 1 | AWS account setup, VPC creation (3 AZs), EKS cluster deployment, IAM roles | Cloud network ready | VPC flow logs active; EKS nodes Ready |
| 1 | RDS PostgreSQL Multi-AZ, ElastiCache Redis cluster | Data layer ready | DB connections successful; replication lag < 1s |
| 2 | S3 buckets (media, backups, logs), lifecycle policies, CORS | Storage ready | Upload/download test successful |
| 2 | WireGuard VPN gateway (EC2), key generation, firewall rules | VPN endpoint ready | Tunnel handshake successful |
| 3 | Edge gateway: OS install, hardening, Docker, WireGuard client | Edge device ready | Edge connects to cloud over VPN |
| 3 | Edge services: MinIO, Redis, video capture container | Edge services running | RTSP streams reachable from edge |
| 4 | Database schema migration (29 tables), seed data (admin user, 8 cameras) | Database ready | Schema matches design; seed data present |
| 4 | Monitoring: Prometheus, Grafana, CloudWatch dashboards | Monitoring active | Dashboards accessible; metrics flowing |
| 4 | End-to-end connectivity test | Full pipeline verified | Video from DVR → Edge → Cloud (VPN) → S3 |
Milestone M1 — Infrastructure Ready (End of Week 4):
- All cloud services deployed and healthy
- VPN tunnel established and stable (< 100ms latency)
- Edge gateway online, all Docker services running
- Database schema deployed with migrations and seed data
- All 8 camera streams reachable from edge
- Basic monitoring and alerting in place
14.5.2 Stage 2: Core AI Pipeline (Weeks 5-8)
Objective: Video ingestion, AI detection, face recognition, and basic API operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 5 | Video capture service: RTSP ingestion, frame extraction, segment recording | Stream ingestion working | All 8 streams connected; FPS > 5 per stream |
| 5 | Kafka topic setup, stream ingestion producer | Event streaming ready | Frames published to Kafka |
| 6 | AI Inference Service: YOLO (human detection), SCRFD (face detection) | Detection models running | mAP > 0.90 for human detection |
| 6 | Detection event storage in PostgreSQL | Detection database working | Events queryable via API |
| 7 | ArcFace (face recognition) model deployment, embedding generation, pgvector | Face recognition working | Rank-1 accuracy > 95% on test set |
| 7 | Person matching logic: known person lookup, unknown person handling | Person matching working | Correct identification in < 100ms |
| 8 | FastAPI core: health endpoints, camera endpoints, detection endpoints | API core functional | All endpoints return correct data |
| 8 | Basic authentication: login, JWT token issuance, password hashing | Auth working | Login → token → authenticated requests |
Milestone M2 — AI Pipeline Operational (End of Week 8):
- All 8 camera streams ingesting at target FPS
- Human detection, face detection, and face recognition operational
- Detection events stored and queryable
- Person matching (known/unknown) working
- Basic REST API serving authenticated requests
- End-to-end: Camera → Detection → Database → API
14.5.3 Stage 3: Application Layer (Weeks 9-12)
Objective: Web dashboard, alerting, notifications, and person management operational.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 9 | Next.js project setup, design system, Tailwind config, dark theme | Frontend foundation | Login page renders correctly |
| 9 | Authentication flow: login form, MFA input, token management, logout | Auth UI working | Full login → dashboard flow |
| 10 | Dashboard page: stat cards, alert chart, camera grid, activity feed | Dashboard live | All widgets populated with real data |
| 10 | Live camera view: HLS player, grid layout, fullscreen, camera controls | Live view working | All 8 streams visible, playable |
| 10 | Alert engine: rule evaluation, severity assignment, routing | Alert generation working | Alerts created within 5s of detection |
| 11 | Telegram integration: bot setup, message templates, inline keyboards | Telegram alerts working | Test alert received in Telegram |
| 11 | WhatsApp integration: template messages, session messages | WhatsApp alerts working | Test template message received |
| 11 | Person management: gallery, profile, CRUD, face matching display | Person management working | Person created, detected, viewed |
| 12 | Unknown review queue: cluster display, naming, merging, ignore | Review queue working | Unknown person processed through queue |
| 12 | Watchlists: CRUD, member management, alert routing | Watchlists working | Watchlist match triggers correct alert |
| 12 | WebSocket: real-time alert feed, dashboard updates | Real-time working | Alerts appear without page refresh |
Milestone M3 — Application Live (End of Week 12):
- Web dashboard accessible with live camera feeds
- Alerts generated and delivered via Telegram and WhatsApp
- Person management (add, view, match, review unknowns) working
- Watchlist alerts functional with correct routing
- Real-time updates via WebSocket
- All RBAC permissions enforced in UI
14.5.4 Stage 4: Intelligence (Weeks 13-16)
Objective: Night mode, training pipeline, self-learning, and advanced features.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 13 | Night mode: low-light model training, deployment, auto-scheduling | Night mode working | Detection mAP > 0.75 in < 5 lux conditions |
| 13 | AI Vibe Settings page: all 7 controls, auto-save, advanced mode | Settings page working | All controls functional, changes effective immediately |
| 14 | Training pipeline: data collection, model training job, evaluation | Training pipeline working | Model accuracy improves with new training data |
| 14 | Model versioning: A/B testing, shadow mode, promotion workflow | Model management working | Blue/green model deployment |
| 15 | Self-learning service: automatic unknown clustering, suggestions | Self-learning working | Suggestions generated for unknown clusters |
| 15 | Privacy mode: face blurring, privacy zones, per-camera settings | Privacy mode working | Faces blurred according to settings |
| 15 | Suspicious activity detection: pattern rules, anomaly scoring | Advanced alerts working | Anomaly alerts generated for unusual behavior |
| 16 | Search service: face similarity search, text search, filters | Search working | Results returned in < 500ms |
| 16 | System health dashboard: service cards, metrics, logs viewer | Health dashboard working | All systems visible with status |
Milestone M4 — Intelligence Features Live (End of Week 16):
- Night mode detection operational
- Training pipeline runs and improves models
- Self-learning suggestions appear in review queue
- Privacy modes configurable and effective
- Suspicious activity alerts functional
- Search returns results in acceptable time
- All AI Vibe Settings controls operational
14.5.5 Stage 5: Hardening (Weeks 17-20)
Objective: Security hardening, testing framework, operations readiness, production go-live.
| Week | Tasks | Deliverables | Success Criteria |
|---|---|---|---|
| 17 | Security penetration test (external vendor) | Pen test report | All critical/high findings addressed |
| 17 | SAST/DAST scans, dependency vulnerability scan | Scan reports | Zero critical vulnerabilities |
| 17 | Self-test framework: 21 test suites, scheduling, reporting | Testing framework deployed | All test suites execute successfully |
| 18 | Backup configuration: pgBackRest, S3 sync, restore procedures | Backup system ready | Restore test successful |
| 18 | DR environment setup, failover procedures, quarterly drill schedule | DR ready | DR failover test: RTO < 1 hour |
| 18 | Incident response runbooks: 5 documented procedures | Runbooks complete | All scenarios documented |
| 19 | Load testing: 8/16/32/64 camera simulation | Load test report | System handles 64 cameras within SLA |
| 19 | Performance tuning: database queries, API response times, cache optimization | Tuning complete | p95 API response < 200ms |
| 19 | Operations team training: system overview, runbooks, escalation procedures | Team trained | Training sign-off complete |
| 19 | 98-item go-live checklist review | Checklist complete | All items pass |
| 20 | Final readiness review, security sign-off, management approval | Go approval | All stakeholders sign off |
| 20 | Production DNS cutover, monitoring, 72-hour stability period | Production live | 72-hour stability confirmed |
Milestone M5 — Production Go-Live (End of Week 20):
- Security audit complete with all findings addressed
- Self-test framework passing (score >= 85)
- DR tested and verified (RTO < 1 hour, RPO < 15 minutes)
- Operations team trained and runbooks reviewed
- Load test passed at 64-camera target
- 98-item go-live checklist: all items complete
- System stable in production for 72+ hours
14.6 Kubernetes Manifests Overview
| Resource Type | Name | Purpose | Namespace |
|---|---|---|---|
| Deployment | api | FastAPI application server (3 replicas) | sentinel |
| Deployment | ai-inference | AI model serving (GPU node) | sentinel |
| Deployment | video-capture | RTSP stream ingestion (edge) | sentinel |
| Deployment | alert-engine | Alert generation and routing | sentinel |
| Deployment | notification-service | Telegram/WhatsApp delivery | sentinel |
| Deployment | frontend | Next.js web application | sentinel |
| Deployment | websocket | WebSocket real-time server | sentinel |
| StatefulSet | redis | Session cache and Pub/Sub | sentinel-data |
| Service | api-service | Internal API access (ClusterIP) | sentinel |
| Service | ai-service | AI inference access (ClusterIP) | sentinel |
| Service | frontend-service | Web app access (ClusterIP) | sentinel |
| Ingress | sentinel-ingress | External HTTPS routing | sentinel |
| ConfigMap | app-config | Application configuration | sentinel |
| ConfigMap | nginx-config | Ingress/Nginx configuration | sentinel |
| Secret | app-secrets | Encrypted secrets (Vault agent injector) | sentinel |
| Secret | tls-cert | TLS certificate (cert-manager) | sentinel |
| HPA | api-hpa | Auto-scale API: 3-10 replicas | sentinel |
| HPA | ai-hpa | Auto-scale AI: 1-4 replicas | sentinel |
| NetworkPolicy | default-deny | Block all unauthorized traffic | sentinel |
| NetworkPolicy | allow-api | API ingress rules | sentinel |
| NetworkPolicy | allow-ai | AI service communication rules | sentinel |
| PodDisruptionBudget | api-pdb | Ensure 2 API pods minimum | sentinel |
| ServiceMonitor | api-metrics | Prometheus scraping config | sentinel-monitoring |
| PrometheusRule | alert-rules | Alerting rules for platform | sentinel-monitoring |
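To make the HPA replica bounds concrete, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentMetric / targetMetric), clamped to the 3-10 (api-hpa) and 1-4 (ai-hpa) ranges listed above. The 70% CPU target is an assumed value for illustration, and the controller's tolerance band and stabilization window are deliberately ignored.

```python
import math


def desired_replicas(current, current_metric, target_metric, min_replicas, max_replicas):
    """Core HPA calculation, clamped to the configured replica bounds."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


TARGET_CPU = 70  # percent; assumed target, not specified in this blueprint

# api-hpa: 3-10 replicas
print(desired_replicas(3, 90, TARGET_CPU, 3, 10))   # moderate load, scales up by one
print(desired_replicas(8, 140, TARGET_CPU, 3, 10))  # heavy load, capped at the maximum
# ai-hpa: 1-4 replicas
print(desired_replicas(1, 20, TARGET_CPU, 1, 4))    # light load, held at the minimum
```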
14.7 VPN Setup Procedure
14.7.1 Cloud VPN Gateway Setup
#!/bin/bash
# cloud-vpn-setup.sh — Run on cloud VPN EC2 instance
# 1. System preparation
sudo apt update && sudo apt install -y wireguard wireguard-tools iptables-persistent
# 2. Generate WireGuard keys (umask 077 keeps the private key readable by root only)
sudo sh -c 'umask 077; wg genkey | tee /etc/wireguard/privatekey | wg pubkey | tee /etc/wireguard/publickey'
# 3. Create WireGuard configuration
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.1/24
ListenPort = 51820
PrivateKey = <CLOUD_PRIVATE_KEY>
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
# Edge Gateway peer
[Peer]
PublicKey = <EDGE_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
AllowedIPs = 10.200.0.2/32, 192.168.29.0/24
PersistentKeepalive = 25
EOF
sudo chmod 600 /etc/wireguard/wg0.conf # Config contains the private key; keep it root-only
# 4. Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
# 5. Start WireGuard
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0
# 6. Verify (the ping succeeds only after the edge peer from 14.7.2 is connected)
sudo wg show
ping -c 3 10.200.0.2
14.7.2 Edge VPN Client Setup
#!/bin/bash
# edge-vpn-setup.sh — Run on Intel NUC edge gateway
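# 1. Install WireGuard (wg-quick needs a resolvconf provider to apply the DNS setting in step 3)
sudo apt update && sudo apt install -y wireguard wireguard-tools resolvconf
# 2. Generate keys (umask 077 keeps the private key readable by root only)
sudo sh -c 'umask 077; wg genkey | tee /etc/wireguard/privatekey | wg pubkey | tee /etc/wireguard/publickey'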
# 3. Configure
sudo tee /etc/wireguard/wg0.conf << 'EOF'
[Interface]
Address = 10.200.0.2/32
PrivateKey = <EDGE_PRIVATE_KEY>
DNS = 10.100.0.2
[Peer]
PublicKey = <CLOUD_PUBLIC_KEY>
PresharedKey = <PRESHARED_KEY>
Endpoint = <CLOUD_PUBLIC_IP>:51820
AllowedIPs = 10.100.0.0/16, 10.200.0.0/24
PersistentKeepalive = 25
EOF
sudo chmod 600 /etc/wireguard/wg0.conf # Config contains the private key; keep it root-only
# 4. Start and enable
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0
# 5. Verify connectivity
ping -c 3 10.200.0.1 # Cloud VPN gateway
ping -c 3 10.100.0.2 # Cloud DNS/internal service
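Before bringing the tunnel up, the AllowedIPs entries on both peers can be sanity-checked offline. The sketch below uses only the addresses that appear in the two wg0.conf files and the DVR_IP setting; the helper name is_routed is hypothetical, and the check covers routing intent only, not firewall rules or key exchange.

```python
import ipaddress

# AllowedIPs for the edge peer, as configured on the cloud gateway (14.7.1).
CLOUD_SIDE_ALLOWED = ["10.200.0.2/32", "192.168.29.0/24"]
# AllowedIPs for the cloud peer, as configured on the edge gateway (14.7.2).
EDGE_SIDE_ALLOWED = ["10.100.0.0/16", "10.200.0.0/24"]


def is_routed(ip, allowed_ips):
    """True if ip falls inside any AllowedIPs network for the peer."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_ips)


checks = {
    "cloud -> edge tunnel address": is_routed("10.200.0.2", CLOUD_SIDE_ALLOWED),
    "cloud -> DVR on site LAN": is_routed("192.168.29.200", CLOUD_SIDE_ALLOWED),
    "edge -> cloud VPN gateway": is_routed("10.200.0.1", EDGE_SIDE_ALLOWED),
    "edge -> cloud DNS/internal service": is_routed("10.100.0.2", EDGE_SIDE_ALLOWED),
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'NOT ROUTED'}")
```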
14.8 Database Initialization
14.8.1 Migration Strategy
Database migrations are managed with Alembic (SQLAlchemy) and executed as Kubernetes init containers before application startup:
initContainers:
  - name: db-migrations
    image: sentinel/surveillance-api:v2.3.1
    command: ["alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
Migration Rules:
| Rule | Implementation |
|---|---|
| Backward compatibility | All migrations must be backward-compatible within a release |
| Destructive changes | 2-phase deployment: add new column in release N, drop old in release N+1 |
| Automatic execution | Migrations run automatically before application startup via init container |
| Health check | Migration status exposed via /health/ready endpoint |
| Rollback | alembic downgrade script available for emergency rollback |
| Version tracking | alembic_version table tracks current schema version |
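The health-check and version-tracking rules can be sketched as a comparison between the revision stored in alembic_version and the revision the application expects. This is a minimal illustration only: sqlite3 stands in for PostgreSQL so the example is self-contained, the revision ID is made up, and how /health/ready is actually implemented is not described in this blueprint.

```python
import sqlite3


def current_revision(conn):
    """Return the revision in alembic_version, or None if never migrated."""
    try:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:  # table does not exist yet
        return None
    return row[0] if row else None


def is_ready(conn, expected_head):
    """Readiness passes only when the schema is at the expected revision."""
    return current_revision(conn) == expected_head


EXPECTED_HEAD = "a1b2c3d4e5f6"  # hypothetical revision ID

conn = sqlite3.connect(":memory:")
print(is_ready(conn, EXPECTED_HEAD))  # schema not migrated yet

# Alembic stores the current revision in alembic_version.version_num.
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES (?)", (EXPECTED_HEAD,))
print(is_ready(conn, EXPECTED_HEAD))  # schema at expected head
```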
14.9 SSL Certificate Setup
14.9.1 cert-manager Configuration
# ClusterIssuer for Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}
---
# Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sentinel-tls
  namespace: sentinel
spec:
  secretName: sentinel-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.example.com
    - api.example.com
    - ws.example.com
  usages:
    - digital signature
    - key encipherment
  privateKey:
    algorithm: ECDSA
    size: 256
Section 15: Testing Plan
15.1 Testing Strategy Overview
The testing strategy encompasses five levels of testing, from isolated unit tests to full system end-to-end validation. The goal is comprehensive coverage of all functional and non-functional requirements with automated execution in CI/CD.
┌──────────────────────────────────────────────────────────────────────────────┐
│ TESTING PYRAMID │
│ │
│ ┌─────────┐ │
│ │ E2E │ ~20 tests │
│ │ Tests │ Full system scenarios │
│ ├─────────┤ │
│ ┌─────────────┐ │
│ │ Integration │ ~100 tests │
│ │ Tests │ Service-to-service │
│ ├─────────────┤ │
│ ┌───────────────────┐ │
│ │ Unit Tests │ ~300 tests │
│ │ (Components, AI) │ Isolated functions │
│ └───────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
15.2 Unit Testing Strategy
| Component | Framework | Coverage Target | Mock Strategy | CI Execution |
|---|---|---|---|---|
| API backend (Python) | pytest + pytest-asyncio | 85%+ | pytest-mock, moto (AWS), responses (HTTP) | Every commit |
| Frontend (React/TS) | Vitest + React Testing Library | 80%+ | MSW (API mocking), jsdom | Every commit |
| AI models (Python) | pytest | 70%+ (model logic) | Mock inference engine, fixture data | Every commit |
| Database models | pytest + asyncpg | 80%+ | testcontainers-postgres | Every commit |
| Notification adapters | pytest | 80%+ | responses library for HTTP mocking | Every commit |
15.3 Integration Testing
| Integration Pair | Scope | Framework | Strategy |
|---|---|---|---|
| API + Database | CRUD operations, transactions, query performance | pytest + testcontainers | PostgreSQL container per test run |
| API + Redis | Caching, Pub/Sub, session storage | pytest + Redis container | Redis container per test run |
| API + S3/MinIO | Media upload, download, presigned URLs | pytest + LocalStack | S3 mock via LocalStack |
| Alert Engine + Router | Rule evaluation, routing decisions | pytest | Mock channel adapters |
| Telegram Adapter | Message formatting, API calls, error handling | pytest + responses | HTTP request/response mocking |
| WhatsApp Adapter | Template rendering, API calls, error handling | pytest + responses | HTTP request/response mocking |
| Auth + Database | User CRUD, password hashing, session management | pytest + testcontainers | Full auth flow testing |
15.4 System Testing (End-to-End)
| # | Scenario | Steps | Expected Result |
|---|---|---|---|
| 1 | Full detection pipeline | Trigger motion → verify detection stored → verify alert created → verify notification sent | All components process correctly within SLA |
| 2 | Person recognition flow | Known person walks by → verify face detected → verify identity matched → verify no false alert | Correct person identified with > 95% confidence |
| 3 | Unknown person flow | Unknown person detected → verify "Unknown" classification → verify review queue updated | Unknown queued for operator review within 5 seconds |
| 4 | Watchlist alert (blacklist) | Blacklist person detected → verify immediate critical alert → verify notification to security team | Alert within 5 seconds, correct severity, all channels |
| 5 | Night mode detection | Low-light detection scenario → verify night model used → verify detection confidence acceptable | Detection mAP > 0.75 in < 5 lux conditions |
| 6 | Privacy mode | Enable privacy mode → verify face blurring in live view → verify no face recognition occurs | Faces blurred, no biometric processing |
| 7 | Alert escalation | Create critical alert → don't acknowledge → verify escalation levels trigger at correct times | Level 1 at 5min, Level 2 at 10min, Level 3 at 20min |
| 8 | VPN failure recovery | Disconnect VPN → verify local operation continues → reconnect VPN → verify sync resumes | No data loss; automatic recovery |
| 9 | Database failover | Trigger RDS failover → verify application continues → verify no data loss | < 60 second downtime; zero data loss |
| 10 | Complete user flow | Login → view dashboard → view live cameras → receive alert → acknowledge → logout | All pages load; all actions succeed |
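Scenario 7 depends on escalation thresholds of 5, 10, and 20 minutes. A test helper for that scenario might compute the expected level from the alert age, as in the sketch below. The function name and the rule that an acknowledged alert reports level 0 are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Thresholds taken from scenario 7 in the table above.
ESCALATION_AFTER = {
    1: timedelta(minutes=5),
    2: timedelta(minutes=10),
    3: timedelta(minutes=20),
}


def escalation_level(created_at, now, acknowledged=False):
    """Return the highest escalation level reached (0 = not escalated)."""
    if acknowledged:
        return 0  # assumption: acknowledgement stops escalation
    elapsed = now - created_at
    reached = [lvl for lvl, delay in ESCALATION_AFTER.items() if elapsed >= delay]
    return max(reached, default=0)


created = datetime(2025, 1, 16, 3, 0)
for minutes in (4, 5, 12, 20):
    level = escalation_level(created, created + timedelta(minutes=minutes))
    print(f"{minutes} min unacknowledged: level {level}")
```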
15.5 Load Testing Plan
| Scenario | Camera Count | Duration | Users | Target Metrics |
|---|---|---|---|---|
| Baseline | 8 | 1 hour | 5 concurrent | Establish baseline metrics |
| Scale-up | 16 | 2 hours | 10 concurrent | Verify 2x capacity; p95 latency < 500ms |
| Scale-up | 32 | 2 hours | 20 concurrent | Verify 4x capacity; auto-scaling triggers |
| Stress test | 64 | 1 hour | 50 concurrent | Find breaking point; error rate < 1% |
| Sustained | 8 | 24 hours | 5 concurrent | Memory leak detection; stability verification |
| Spike test | 8→64→8 | 30 minutes | Ramp up/down | Verify auto-scaling response time |
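The latency and error-rate targets in this table can be evaluated from raw load-test samples. The sketch below uses the nearest-rank definition of a percentile and combines the p95 < 500ms and error rate < 1% targets into a single gate; combining them is an assumption, since the table attaches them to different scenarios.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 1..100)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def passes_gate(latencies_ms, errors, total, p95_limit_ms=500, max_error_rate=0.01):
    """True when both the p95 latency and the error rate are within target."""
    p95 = percentile(latencies_ms, 95)
    return p95 < p95_limit_ms and (errors / total) < max_error_rate


# Illustrative samples: 18 fast requests, one slow, one very slow.
samples = [100] * 18 + [450, 900]
print(percentile(samples, 95))
print(passes_gate(samples, errors=3, total=1000))
```

With 20 samples the nearest-rank p95 is the 19th ordered value, so the single 900 ms outlier does not fail the gate on its own.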
15.6 Failover Testing
| Test Case | Description | Pass Criteria |
|---|---|---|
| API pod failure | Kill 1 API pod | Traffic routed to healthy pods; zero failed requests |
| Database failover | Trigger RDS Multi-AZ failover | < 60s downtime; no data loss; connections re-established |
| Redis failure | Restart Redis cluster | Session recovery; cache warm within 5 minutes |
| VPN tunnel failure | Disconnect WireGuard | Auto-reconnect within 30s; streams resume |
| Edge gateway restart | Reboot edge device | Full recovery within 5 minutes; all streams reconnect |
| AI inference failure | Kill inference container | Queue buffers frames; recovery < 30s; no frame loss |
| Complete cloud failure | Simulate region outage | DR test: RTO < 1 hour; RPO < 15 minutes |
15.7 Security Testing
| Test Type | Tool | Scope | Frequency | Gate |
|---|---|---|---|---|
| Static Analysis (SAST) | Bandit, Semgrep | Source code | Every commit | Block on HIGH/CRITICAL |
| Dependency Scan | Snyk, pip-audit | All dependencies | Daily | Block on HIGH/CRITICAL |
| Container Image Scan | Trivy | Docker images | Every build | Block on HIGH/CRITICAL |
| Dynamic Analysis (DAST) | OWASP ZAP | Running application | Weekly | Review findings |
| Penetration Test | External vendor | Full stack | Quarterly | All findings addressed |
| TLS Configuration | testssl.sh | SSL/TLS endpoints | Monthly | Grade A+ required |
| API Security | OWASP ZAP API scan | All REST endpoints | Weekly | Review findings |
| Secrets Scan | TruffleHog, GitLeaks | Git repositories | Every commit | Block on findings |
15.8 AI Pipeline Testing
| Test | Description | Target Metric | Test Data |
|---|---|---|---|
| Human detection accuracy | Evaluate YOLO on held-out test set | mAP > 0.90 | 1000 labeled frames |
| Face detection accuracy | Evaluate SCRFD on test set | Detection rate > 0.85 | 500 labeled face images |
| Face recognition accuracy | Evaluate ArcFace on test set | Rank-1 accuracy > 0.95 | 200 person gallery |
| False positive rate | Measure incorrect person matches | < 2% | Simulated impostor set |
| False negative rate | Measure missed person matches | < 5% | Known person test set |
| Inference latency | Measure end-to-end processing | < 200ms per frame (p95) | Benchmark suite |
| Night mode accuracy | Test low-light detection | mAP > 0.75 | 200 low-light frames |
| Batch processing | Test throughput at batch size 8 | > 40 FPS aggregate | Benchmark suite |
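The false positive and false negative targets can be computed from similarity scores produced by the recognition model. The sketch below assumes scores are cosine similarities compared against FACE_MATCH_THRESHOLD (0.70 in the environment configuration); the function name and the sample scores are illustrative only.

```python
FACE_MATCH_THRESHOLD = 0.70  # value from the environment configuration


def match_error_rates(impostor_scores, genuine_scores, threshold=FACE_MATCH_THRESHOLD):
    """Return (false_positive_rate, false_negative_rate) at the threshold.

    impostor_scores: similarities for pairs of different people.
    genuine_scores:  similarities for pairs of the same person.
    """
    false_pos = sum(score >= threshold for score in impostor_scores)
    false_neg = sum(score < threshold for score in genuine_scores)
    return false_pos / len(impostor_scores), false_neg / len(genuine_scores)


# Made-up scores: 1 of 100 impostor pairs and 4 of 100 genuine pairs misclassified.
impostors = [0.10] * 99 + [0.75]
genuine = [0.90] * 96 + [0.60] * 4

fpr, fnr = match_error_rates(impostors, genuine)
print(f"FPR={fpr:.2%} (target < 2%), FNR={fnr:.2%} (target < 5%)")
```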
15.9 Notification Testing
| Test | Description | Verification |
|---|---|---|
| Telegram delivery | Send test alert via Telegram | Message received; formatting correct; buttons functional |
| WhatsApp delivery | Send test alert via WhatsApp | Template message received; parameters correct |
| Routing rules | Trigger alert matching specific rule | Delivered to correct recipients only |
| Quiet hours | Send alert during quiet hours | Non-critical suppressed; critical bypasses |
| Escalation | Leave critical alert unacknowledged | Escalation notifications at correct thresholds |
| Rate limiting | Trigger burst of 50 alerts | Rate limiting applied; no provider blocks |
| Media attachments | Send alert with image + video | Media processed to correct size; delivered |
| Delivery tracking | Verify webhook receipts | Status updated correctly in dashboard |
| DLQ handling | Force 5 failed deliveries | Messages moved to DLQ; admin notification sent |
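The DLQ handling test implies a bounded retry loop that parks a message after five failed deliveries and notifies an administrator. A minimal sketch of that behavior follows; the callables send and notify_admin and the list used as the dead letter queue are hypothetical stand-ins, and backoff between attempts is omitted for brevity.

```python
MAX_ATTEMPTS = 5  # matches "Force 5 failed deliveries" in the table above


def deliver_with_dlq(message, send, dead_letters, notify_admin, max_attempts=MAX_ATTEMPTS):
    """Try to send; after max_attempts failures, move the message to the DLQ."""
    last_error = None
    for _attempt in range(max_attempts):
        try:
            send(message)
            return True
        except Exception as exc:  # a real adapter would catch narrower errors
            last_error = exc
    dead_letters.append(
        {"message": message, "error": str(last_error), "attempts": max_attempts}
    )
    notify_admin(f"Delivery failed after {max_attempts} attempts: {last_error}")
    return False


def always_fail(_message):
    raise RuntimeError("provider unavailable")


dlq, admin_notes = [], []
delivered = deliver_with_dlq({"alert_id": 42}, always_fail, dlq, admin_notes.append)
print(delivered, len(dlq), admin_notes)
```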
15.10 Test Environments
| Environment | Data | Purpose | Pipeline Stage |
|---|---|---|---|
| Local dev | Synthetic (10 cameras, 100 persons) | Developer testing | Pre-commit |
| CI | Synthetic (generated per run) | Automated test execution | Every commit |
| Staging | Anonymized production-like (8 cameras, 500 persons) | Pre-production validation | Post-merge |
| Load test | Generated (64 cameras, 10,000 persons) | Performance testing | Weekly schedule |
| DR | Minimal (2 cameras, 10 persons) | Disaster recovery validation | Quarterly |
15.11 CI/CD Pipeline for Testing
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Push │──▶│ Lint │──▶│ Unit │──▶│ SAST │──▶│ Build │
│ │ │ + Format │ │ Tests │ │ + Scan │ │ Images │
└─────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
│
▼
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Deploy │◀──│ E2E │◀──│ DAST │◀──│ Image │◀──│ Push │
│Staging │ │ Tests │ │ Scan │ │ Scan │ │ Registry │
└─────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
│
▼
┌─────────┐
│ Deploy │ (Manual approval required)
│ Prod │
└─────────┘
| Stage | Tools | Coverage Gate | Duration |
|---|---|---|---|
| Lint + Format | ruff, black, mypy, ESLint, Prettier | Zero lint errors | 30s |
| Unit Tests | pytest, Vitest | 80%+ coverage | 3 min |
| SAST + Secrets | Bandit, Semgrep, TruffleHog | No HIGH/CRITICAL | 2 min |
| Build | Docker buildx | Build succeeds | 5 min |
| Image Scan | Trivy, Snyk | No HIGH/CRITICAL CVEs | 2 min |
| DAST | OWASP ZAP | No HIGH/CRITICAL findings | 10 min |
| E2E Tests | Playwright, pytest | All scenarios pass | 8 min |
| Deploy Staging | ArgoCD | Health checks pass | 3 min |
Section 16: Self-Test Framework
16.1 Framework Architecture
The Self-Test Framework is a standalone FastAPI service that continuously validates platform health and readiness through automated test execution.
┌──────────────────────────────────────────────────────────────────────────────┐
│ SELF-TEST FRAMEWORK ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TEST ORCHESTRATOR (FastAPI) │ │
│ │ │ │
│ │ Scheduler Queue Executor Aggregator │ │
│ │ (cron/APScheduler) │ (asyncio) │ │ │
│ │ │ │ │ │ │ │
│ │ 15m health ◄─────┼────────────┼───────────────┤ │ │
│ │ Daily 3am ◄─────┼────────────┼───────────────┤ │ │
│ │ On-demand ◄─────┼────────────┼───────────────┤ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Reporter + Storage │ │ │
│ │ │ PostgreSQL + S3 (evidence) │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 21 TEST SUITES (170+ CASES) │ │
│ │ │ │
│ │ Infrastructure (TC-01..04) │ Streaming (TC-05..06) │ │
│ │ Core AI (TC-07..10) │ Alerts (TC-11..13) │ │
│ │ Capture (TC-14..15) │ Search (TC-16) │ │
│ │ Training (TC-17) │ Security (TC-18..19) │ │
│ │ Resilience (TC-20..21) │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
16.2 Test Suite Catalog (21 Suites)
| Suite ID | Name | Tests | Priority | Description |
|---|---|---|---|---|
| TC-INF-01 | DVR Connectivity | 8 | P0 | RTSP handshake, stream access, credential validation |
| TC-INF-02 | VPN Health | 6 | P0 | Tunnel status, latency, packet loss, throughput |
| TC-INF-03 | Database Health | 8 | P0 | Connection pool, query performance, replication lag |
| TC-INF-04 | Storage Health | 7 | P0 | Disk space, read/write performance, object storage |
| TC-STR-05 | Camera Stream Access | 10 | P0 | All 8 channels streaming, FPS, bitrate verification |
| TC-STR-06 | Live Streaming | 6 | P1 | HLS stream delivery to browsers, latency check |
| TC-AI-07 | Human Detection | 12 | P0 | YOLO accuracy, confidence thresholds, edge cases |
| TC-AI-08 | Face Detection | 10 | P0 | SCRFD accuracy, face bounding box quality |
| TC-AI-09 | Face Recognition | 12 | P0 | ArcFace embeddings, person matching accuracy |
| TC-AI-10 | Unknown Clustering | 8 | P1 | Face grouping quality, similarity thresholds |
| TC-ALT-11 | Alert Generation | 10 | P0 | Rule evaluation, severity assignment, routing |
| TC-ALT-12 | Telegram Delivery | 8 | P1 | Message delivery, formatting, media, error handling |
| TC-ALT-13 | WhatsApp Delivery | 8 | P1 | Template delivery, session messages, error handling |
| TC-CAP-14 | Image Capture | 6 | P1 | Frame extraction quality, storage, metadata |
| TC-CAP-15 | Video Clip Capture | 6 | P1 | Clip generation, compression, storage |
| TC-SEA-16 | Search Retrieval | 8 | P1 | Face search accuracy, text search, performance |
| TC-TRA-17 | Training Workflow | 8 | P2 | Model retraining, evaluation, deployment |
| TC-SEC-18 | Admin Login Security | 10 | P0 | Auth flow, MFA, session management, brute force |
| TC-SEC-19 | RBAC Enforcement | 12 | P0 | Permission checks, role-based access, resource-level |
| TC-RES-20 | Restart Recovery | 8 | P1 | Service restart, state recovery, data integrity |
| TC-RES-21 | Load Handling | 7 | P1 | 8/16/32/64 camera simulation, throughput |
Total: 21 suites, 178 test cases
16.3 Test Scheduling
| Schedule | Suites | Trigger | Notification |
|---|---|---|---|
| Every 15 minutes | Infrastructure (TC-01..04) | APScheduler cron | Alert on failure |
| Daily at 03:00 UTC | All 21 suites | APScheduler cron | Full report via email + Slack |
| On-demand | Any subset | Admin API call | Immediate report |
| Post-deployment | Critical path (TC-01,05,07,11,18) | CI/CD webhook | Pipeline gate |
| Weekly (Sunday 04:00) | Full suite + extended load tests | APScheduler cron | Weekly report |
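The schedules refer to suites by their numeric suffix (for example TC-01..04 or TC-01,05,07,11,18). One way to resolve those shorthand references to full suite IDs from the catalog is sketched below; the helper name select_suites is hypothetical.

```python
# Suite IDs from the catalog in 16.2.
SUITE_IDS = [
    "TC-INF-01", "TC-INF-02", "TC-INF-03", "TC-INF-04",
    "TC-STR-05", "TC-STR-06",
    "TC-AI-07", "TC-AI-08", "TC-AI-09", "TC-AI-10",
    "TC-ALT-11", "TC-ALT-12", "TC-ALT-13",
    "TC-CAP-14", "TC-CAP-15",
    "TC-SEA-16", "TC-TRA-17",
    "TC-SEC-18", "TC-SEC-19",
    "TC-RES-20", "TC-RES-21",
]


def select_suites(numbers):
    """Return the suite IDs whose numeric suffix is in numbers."""
    wanted = set(numbers)
    return [sid for sid in SUITE_IDS if int(sid.rsplit("-", 1)[1]) in wanted]


INFRASTRUCTURE = select_suites(range(1, 5))       # every 15 minutes
CRITICAL_PATH = select_suites([1, 5, 7, 11, 18])  # post-deployment gate
print(INFRASTRUCTURE)
print(CRITICAL_PATH)
```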
16.4 Production Readiness Scoring
Base Score: 100.0
Deductions:
P0 failure: -20.0 points each
P1 failure: -10.0 points each
P2 failure: -5.0 points each
P3 failure: -2.0 points each
Minimum score: 0.0
Maximum score: 100.0
| Verdict | Score Range | Meaning | Recommended Action |
|---|---|---|---|
| GO | 95.0 - 100.0 | All critical systems healthy | Proceed with confidence |
| GO WITH CAVEATS | 85.0 - 94.9 | Minor issues, non-critical | Proceed with monitoring plan |
| CONDITIONAL GO | 70.0 - 84.9 | Significant issues | Fix P1 issues before deployment |
| NO-GO | 0.0 - 69.9 | Critical failures | Do not deploy; address P0 issues first |
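The deduction rules and verdict bands above translate directly into a small scoring function. The sketch below is one possible reading: each verdict band is treated as a lower bound (score >= 95.0 is GO, and so on) so that values between the listed ranges, such as 94.95, still map to a verdict. The function names are hypothetical.

```python
# Per-failure deductions from 16.4.
DEDUCTIONS = {"P0": 20.0, "P1": 10.0, "P2": 5.0, "P3": 2.0}

# Verdict bands as (lower bound, label), highest first.
VERDICTS = [
    (95.0, "GO"),
    (85.0, "GO WITH CAVEATS"),
    (70.0, "CONDITIONAL GO"),
    (0.0, "NO-GO"),
]


def readiness_score(failed_priorities):
    """Start at 100 and subtract the deduction for each failed test."""
    score = 100.0 - sum(DEDUCTIONS[p] for p in failed_priorities)
    return max(0.0, min(100.0, score))


def verdict(score):
    """Map a score to its verdict band."""
    for floor, label in VERDICTS:
        if score >= floor:
            return label
    return "NO-GO"


for failures in ([], ["P2"], ["P1"], ["P0"], ["P0", "P1", "P1"]):
    score = readiness_score(failures)
    print(failures, score, verdict(score))
```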
16.5 Report Generation
| Format | Use Case | Generation | Retention |
|---|---|---|---|
| JSON API | Programmatic consumption, CI/CD integration | Immediate | 90 days |
| HTML Dashboard | Web-based viewing, trend analysis | ~5 seconds | 90 days |
| PDF Report | Email distribution, compliance archiving | ~30 seconds | 1 year |
Section 17: Sample Self-Test Report
17.1 Report Header
================================================================================
SENTINEL AI SURVEILLANCE PLATFORM — SELF-TEST REPORT
================================================================================
Report ID: STR-20250116-030015
Generated: 2025-01-16 03:00:15 UTC
Environment: production
Version: v2.3.1
Triggered By: Scheduled (Daily 3:00 AM)
Duration: 18 minutes 42 seconds
Overall Status: GO WITH CAVEATS
17.2 Executive Summary
| Metric | Value |
|---|---|
| Verdict | GO WITH CAVEATS |
| Production Readiness Score | 94.8 / 100 |
| Total Test Cases | 170 |
| Passed | 168 (98.8%) |
| Failed | 2 (1.2%) |
| Skipped | 0 (0.0%) |
| Previous Run Score | 97.2 / 100 |
| Score Change | -2.4 (downward) |
Priority Breakdown:
| Priority | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| P0 (Critical) | 42 | 42 | 0 | 100.0% |
| P1 (High) | 70 | 68 | 2 | 97.1% |
| P2 (Medium) | 38 | 38 | 0 | 100.0% |
| P3 (Low) | 20 | 20 | 0 | 100.0% |
17.3 System Metrics at Test Time
| Metric | Value | Status |
|---|---|---|
| Active Cameras | 8 / 8 | Online |
| Stream FPS (avg) | 28.5 | Normal |
| AI Inference Latency (p95) | 42ms | Normal |
| Detection Rate (last hour) | 47 events | Normal |
| Database Connections | 18 / 100 | Healthy |
| Storage Usage | 67% | Healthy |
| VPN Latency | 12ms | Excellent |
| API Response Time (p95) | 78ms | Normal |
| Telegram Delivery Rate (24h) | 99.2% | Healthy |
| WhatsApp Delivery Rate (24h) | 99.8% | Healthy |
17.4 Failed Test Cases
Failure 1: TC-ALT-12-004 — Telegram Media Group Delivery
| Field | Value |
|---|---|
| Test Case | TC-ALT-12-004 |
| Suite | Telegram Delivery (TC-ALT-12) |
| Priority | P1 |
| Status | FAILED |
| Duration | 12,450 ms |
| Severity | Medium |
Description: Verify that a media group (multiple images) is delivered correctly via Telegram when an alert contains multiple evidence images.
Expected Result: All 3 images delivered as a media group album within 10 seconds.
Actual Result: Only 2 of 3 images delivered. Third image failed with error: telegram_api_error: Request Entity Too Large (413). Image size after processing: 10.8 MB (exceeds Telegram's 10 MB per-image limit for media groups).
Root Cause: The media processing pipeline resizes images to 1280x720 but does not enforce a hard 10 MB per-image cap for Telegram media groups. The iterative quality reduction loop stops at quality 50 but can still produce files > 10 MB.
Recommended Fix: Add a hard size-cap check after image processing. If an image still exceeds 10 MB after quality has been reduced to 50, apply additional compression (reduce dimensions or switch to WebP format) until it fits.
Workaround: Single-image delivery mode works correctly. Multi-image alerts temporarily deliver images individually.
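One possible shape for the recommended size cap is sketched below in Python. It is a sketch under stated assumptions, not the pipeline's actual code: the `encode` callback is a hypothetical stand-in for whatever image library the media pipeline uses, and the quality and scale steps are illustrative. The loop tries the quality steps first and then falls back to smaller dimensions, so it only stops when the output fits under the cap or every option has been tried.

```python
from typing import Callable, Iterable, Optional, Tuple

# 10 MB per-image limit for media groups, as stated in the failure report.
TELEGRAM_MEDIA_GROUP_CAP = 10 * 1024 * 1024

# Hypothetical encoder signature: (quality, scale) -> encoded image bytes.
Encoder = Callable[[int, float], bytes]


def fit_under_cap(
    encode: Encoder,
    qualities: Iterable[int] = (85, 70, 60, 50),
    scales: Iterable[float] = (1.0, 0.75, 0.5),
    max_bytes: int = TELEGRAM_MEDIA_GROUP_CAP,
) -> Optional[Tuple[bytes, int, float]]:
    """Return (data, quality, scale) for the first encoding that fits.

    Quality steps are tried at full size first; dimensions are reduced only
    if every quality step is still too large. Returns None if nothing fits.
    """
    qualities = tuple(qualities)
    for scale in scales:
        for quality in qualities:
            data = encode(quality, scale)
            if len(data) <= max_bytes:
                return data, quality, scale
    return None


def fake_encode(quality: int, scale: float) -> bytes:
    """Toy encoder for the demo: size grows with quality and pixel count."""
    return bytes(int(24 * quality * scale * scale))


# Demo with a tiny cap so the toy encoder exercises the dimension fallback.
result = fit_under_cap(fake_encode, max_bytes=1000)
if result is not None:
    data, quality, scale = result
    print(f"fits: {len(data)} bytes at quality={quality}, scale={scale}")
```

If `fit_under_cap` returns `None`, the caller could fall back to the individual-delivery workaround described above rather than sending an oversized image.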
Failure 2: TC-RES-20-006 — AI Inference Recovery After Simulated Crash
| Field | Value |
|---|---|
| Test Case | TC-RES-20-006 |
| Suite | Restart Recovery (TC-RES-20) |
| Priority | P1 |
| Status | FAILED |
| Duration | 65,200 ms |
| Severity | Medium |
Description: Verify that the AI inference service recovers and resumes processing within 60 seconds after a simulated process crash.
Expected Result: AI inference pod restarts and resumes processing frames within 60 seconds for all 8 cameras.
Actual Result: The pod restarted successfully (18 seconds), but detection did not resume for Camera 3 and Camera 7. The other 6 cameras resumed within 45 seconds.
Root Cause: Model warm-up failed because of a race condition in GPU memory allocation during concurrent channel initialization. All 8 channel processors attempt to load the face recognition model simultaneously. On resource-constrained edge hardware, this causes out-of-memory (OOM) errors for the channels that lose the initialization race.
Recommended Fix: Implement shared model loading: load each model once and share it across all channel processors. Add an initialization semaphore.
Workaround: Manual restart of affected channel processors via admin API.
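The recommended fix for this failure (load once, share, and gate initialization with a semaphore) could look roughly like the Python sketch below. The class name `SharedModelRegistry`, the `loader` callback, and the thread-per-channel demo are assumptions made for illustration; the real inference service may structure its channel processors differently.

```python
import threading
from typing import Any, Callable, Dict


class SharedModelRegistry:
    """Load each model at most once and share it across channel processors."""

    def __init__(self, max_concurrent_loads: int = 1) -> None:
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self._key_locks: Dict[str, threading.Lock] = {}
        # Initialization semaphore: bounds how many model loads run at once.
        self._init_slots = threading.Semaphore(max_concurrent_loads)

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        with self._lock:
            if name in self._models:
                return self._models[name]
            key_lock = self._key_locks.setdefault(name, threading.Lock())
        with key_lock:
            # Re-check: another thread may have finished loading while we waited.
            with self._lock:
                if name in self._models:
                    return self._models[name]
            with self._init_slots:
                model = loader()
            with self._lock:
                self._models[name] = model
            return model


load_calls = []


def load_face_model() -> object:
    """Stand-in for an expensive model load (hypothetical)."""
    load_calls.append(1)
    return object()


registry = SharedModelRegistry()
results = []


def channel_processor() -> None:
    results.append(registry.get("face_recognition", load_face_model))


# Eight channel processors start together, as in the failing test scenario.
threads = [threading.Thread(target=channel_processor) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"loads: {len(load_calls)}, processors served: {len(results)}")
```

With this structure, only one thread pays the loading cost per model, so the eight processors no longer compete for memory during warm-up.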
17.5 Trending (Last 15 Days)
| Date | Score | Verdict | Notes |
|---|---|---|---|
| 2025-01-02 | 96.5 | GO | — |
| 2025-01-03 | 98.2 | GO | — |
| 2025-01-04 | 97.1 | GO | — |
| 2025-01-05 | 98.8 | GO | — |
| 2025-01-06 | 97.5 | GO | — |
| 2025-01-07 | 98.2 | GO | — |
| 2025-01-08 | 96.8 | GO | TC-RES-21 had 1 P3 failure |
| 2025-01-09 | 97.2 | GO | — |
| 2025-01-10 | 98.2 | GO | — |
| 2025-01-11 | 97.5 | GO | — |
| 2025-01-12 | 98.2 | GO | — |
| 2025-01-13 | 97.2 | GO | — |
| 2025-01-14 | 98.2 | GO | — |
| 2025-01-15 | 97.2 | GO | — |
| 2025-01-16 | 94.8 | GO WITH CAVEATS | 2 P1 failures (see above) |
17.6 Conclusion and Recommendations
Verdict: GO WITH CAVEATS
The Sentinel AI Surveillance Platform is operational and safe to use. All 42 P0 (Critical) test cases passed, confirming that core surveillance functions are working correctly.
Two P1 (High) priority issues were identified with documented workarounds. Both fixes are scheduled for v2.3.2.
Recommended Actions:
- Address TC-ALT-12-004: Enforce a hard 10 MB per-image cap, with fallback compression, for Telegram media group images
- Address TC-RES-20-006: Implement shared model loading in AI inference service
- Monitor Telegram multi-image alert delivery metrics (workaround active)
- Monitor AI inference recovery metrics (manual restart documented in runbook)
- Validate both fixes in next daily test run after v2.3.2 deployment
Section 18: Risks and Mitigations
18.1 Risk Register Summary
| # | Category | Risk | Likelihood | Impact | Score | Mitigation | Owner |
|---|---|---|---|---|---|---|---|
| T1 | Technical | DVR disk full (0 bytes free) | High | Critical | 20 | Auto-rotation at 85%; emergency cleanup; secondary storage | Platform |
| T2 | Technical | AI false positives in low light | Medium | High | 12 | Night models; adjustable thresholds; operator review | AI Team |
| T3 | Technical | Face rec accuracy with masks/angles | Medium | Medium | 9 | Multi-angle training; pose normalization | AI Team |
| T4 | Technical | VPN tunnel instability | Medium | High | 12 | Auto-reconnect; local buffering; redundant endpoints | Platform |
| T5 | Technical | DB performance at scale | Medium | Medium | 9 | Partitioning; read replicas; archiving | Platform |
| O1 | Operational | Edge hardware failure | Medium | Critical | 15 | Cold spare; config backup; documented replacement | Operations |
| O2 | Operational | Internet loss at edge site | Medium | High | 12 | Local storage buffer; 4G failover; local AI continues | Operations |
| O3 | Operational | Operator training gaps | Medium | Medium | 9 | Training program; inline help; escalation procedures | Operations |
| O4 | Operational | Alert fatigue | Medium | High | 12 | Escalation rules; alert grouping; severity routing | Operations |
| S1 | Security | Biometric data breach | Low | Critical | 10 | AES-256-GCM; signed URLs; GDPR deletion; audit | Security |
| S2 | Security | Unauthorized feed access | Low | Critical | 10 | RBAC; JWT; MFA; session binding; rate limiting | Security |
| S3 | Security | Bot token compromise | Low | High | 8 | Vault encryption; 180-day rotation; IP allowlist | Security |
| A1 | AI/ML | Model drift over time | Medium | High | 12 | Monthly evaluation; auto-monitoring; retraining | AI Team |
| A2 | AI/ML | Training data poisoning | Low | Critical | 10 | Validation; multi-person review; audit trail | AI Team |
| A3 | AI/ML | Demographic bias | Medium | High | 12 | Diverse data; fairness audits; human-in-loop | AI Team |
| A4 | AI/ML | Edge hardware insufficient | Medium | High | 12 | CPU models; cloud offloading; GPU upgrade path | AI Team |
| I1 | Integration | DVR firmware incompatibility | Medium | High | 12 | RTSP compliance check; firmware validation | Engineering |
| C1 | Compliance | GDPR non-compliance | Low | Critical | 10 | PIA; consent mgmt; right to deletion; DPO | DPO |
| R1 | Resource | Budget overrun | Medium | Medium | 9 | Reserved instances; cost monitoring; quotas | Finance |
| R3 | Resource | Timeline delay | Medium | High | 12 | Phased delivery; parallel work; weekly tracking | PMO |
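Every score in the register is consistent with Score = Likelihood × Impact when the levels are weighted Low = 2, Medium = 3, High = 4, Critical = 5. That weighting is inferred from the table rather than stated in it, so treat the Python sketch below as an assumption-based check, not the official scoring rule.

```python
# Level weights inferred from the register (assumption): each listed score
# equals the likelihood weight multiplied by the impact weight.
LEVEL = {"Low": 2, "Medium": 3, "High": 4, "Critical": 5}


def risk_score(likelihood: str, impact: str) -> int:
    """Return the risk score as likelihood weight times impact weight."""
    return LEVEL[likelihood] * LEVEL[impact]


# A sample of rows from the register: (id, likelihood, impact, listed score).
SAMPLE_ROWS = [
    ("T1", "High", "Critical", 20),
    ("T2", "Medium", "High", 12),
    ("T3", "Medium", "Medium", 9),
    ("O1", "Medium", "Critical", 15),
    ("S1", "Low", "Critical", 10),
    ("S3", "Low", "High", 8),
]

for risk_id, likelihood, impact, listed in SAMPLE_ROWS:
    computed = risk_score(likelihood, impact)
    status = "matches" if computed == listed else "differs"
    print(f"{risk_id}: computed {computed}, listed {listed} ({status})")
```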
18.2 Critical Risks Requiring Immediate Action
T1 — DVR Disk Full (Score: 20)
- Action: Emergency disk cleanup within 24 hours
- Implement automatic rotation at 85% capacity
- Configure critical alerts at 90%, 95%, 98%
- Owner: Platform Team | Due: 2025-01-17
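The T1 thresholds above (rotation at 85%, critical alerts at 90%, 95%, and 98%) can be expressed as a small decision function. This Python sketch is illustrative only: the function name, the action labels, and the choice of inclusive comparisons are assumptions, not part of the platform's monitoring configuration.

```python
from typing import List

ROTATION_THRESHOLD = 85           # percent used: start automatic rotation
ALERT_THRESHOLDS = (90, 95, 98)   # percent used: critical alert levels


def storage_actions(used_percent: float) -> List[str]:
    """Return the actions implied by the current disk usage percentage."""
    actions: List[str] = []
    if used_percent >= ROTATION_THRESHOLD:
        actions.append("rotate")
    crossed = [t for t in ALERT_THRESHOLDS if used_percent >= t]
    if crossed:
        # Report only the highest threshold crossed to avoid duplicate alerts.
        actions.append(f"alert:{max(crossed)}")
    return actions


for usage in (67, 86, 96, 100):
    print(usage, storage_actions(usage))
```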
O1 — Edge Hardware Failure (Score: 15)
- Action: Procure cold spare device
- Document hardware replacement runbook
- Automate configuration restoration from GitOps
- Owner: Operations Team | Due: 2025-02-01
Section 19: Final Implementation Roadmap
19.1 Five-Phase Implementation (20 Weeks)
| Phase | Weeks | Name | Theme | Key Milestone |
|---|---|---|---|---|
| 1 | 1-4 | Foundation | Infrastructure, VPN, edge, database | M1: Infrastructure Ready |
| 2 | 5-8 | Core AI Pipeline | Video ingestion, detection, recognition | M2: AI Pipeline Operational |
| 3 | 9-12 | Application Layer | Dashboard, alerts, notifications | M3: Application Live |
| 4 | 13-16 | Intelligence | Night mode, training, self-learning | M4: Intelligence Features |
| 5 | 17-20 | Hardening | Security, testing, operations, go-live | M5: Production Go-Live |
19.2 Key Milestones and Deliverables
| Milestone | Target Week | Deliverables | Entry Criteria | Exit Criteria |
|---|---|---|---|---|
| M1 Infrastructure | Week 4 | Cloud services, VPN, edge gateway, database, monitoring | Project kickoff, hardware delivered | All services healthy, VPN stable, schema deployed |
| M2 AI Pipeline | Week 8 | Video capture, YOLO, SCRFD, ArcFace, detection DB, API | M1 complete, models ready | All 8 streams ingesting, AI accuracy targets met, API functional |
| M3 Application | Week 12 | Dashboard, alerts, Telegram, WhatsApp, person mgmt, WebSocket | M2 complete, frontend env ready | Dashboard live, alerts delivered, person management working |
| M4 Intelligence | Week 16 | Night mode, training pipeline, self-learning, privacy, search | M3 complete, training data accumulated | All intelligence features operational |
| M5 Go-Live | Week 20 | Security audit, test framework, DR, runbooks, load test, checklist | M4 complete, security audit scheduled | All audits passed, checklist complete, 72h stability |
19.3 Phase Details
Phase 1 (Weeks 1-4): VPC, EKS, RDS, Redis, Kafka, S3, WireGuard VPN, edge gateway OS hardening, Docker setup, database schema with migrations, monitoring stack (Prometheus, Grafana).
Phase 2 (Weeks 5-8): RTSP capture service, YOLO human detection, SCRFD face detection, ArcFace face recognition, embedding storage with pgvector, person matching logic, FastAPI core, authentication.
Phase 3 (Weeks 9-12): Next.js frontend, design system, dashboard, live camera view (HLS), alert engine with rules, Telegram Bot API integration, WhatsApp Business API integration, person gallery and profile, unknown review queue, watchlists, WebSocket real-time updates.
Phase 4 (Weeks 13-16): Night mode AI model, AI Vibe Settings page, training pipeline with model versioning, self-learning service for unknown clusters, privacy mode with face blurring, suspicious activity detection, search service (face + text), system health dashboard.
Phase 5 (Weeks 17-20): Penetration testing, SAST/DAST, self-test framework (21 suites), backup/DR setup, incident response runbooks, load testing (8-64 cameras), performance tuning, operations training, go-live checklist (98 items), production cutover, 72-hour stability monitoring.
19.4 Resource Allocation
| Phase | Engineering | AI/ML | DevOps | QA | Security |
|---|---|---|---|---|---|
| 1: Foundation | 2 | — | 2 | — | 1 |
| 2: Core AI | 2 | 2 | 1 | 1 | — |
| 3: Application | 3 | 1 | 1 | 2 | — |
| 4: Intelligence | 2 | 2 | 1 | 1 | — |
| 5: Hardening | 2 | 1 | 2 | 2 | 2 |
Section 20: Final Production-Readiness Summary
20.1 System at a Glance
| Category | Specification |
|---|---|
| Architecture | Cloud (AWS EKS) + Edge (Intel NUC) + VPN (WireGuard) |
| Services | 12 containerized microservices |
| Security Zones | 5 (Public, App Private, Database, Edge LAN, Camera LAN) |
| AI Pipeline | YOLO11m (human detection) + SCRFD (face detection) + ArcFace (recognition) |
| Embeddings | 512-dimensional face vectors stored in pgvector |
| Database | PostgreSQL 15, 29 tables, partitioned, AES-256-GCM encrypted |
| Web Application | 18 pages, dark mode, Next.js 14, real-time WebSocket |
| Notifications | Telegram Bot API + WhatsApp Business API (dual channel) |
| Security | TLS 1.3, Argon2id, JWT ES256, TOTP MFA, RBAC (4 roles, 30+ permissions) |
| Testing | 21 test suites, 170+ test cases, automated readiness scoring |
| Reliability | 99.9% uptime target, RTO 1 hour, RPO 15 minutes |
| Timeline | 20 weeks (5 months) to production |
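To make the 99.9% uptime target in the table above concrete, the Python sketch below converts it into an allowed-downtime budget. The measurement windows (30 days and 365 days) are assumptions chosen for illustration; the table does not say over which period the target is evaluated.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the downtime allowed, in minutes, for a given uptime target."""
    return (1 - uptime_percent / 100.0) * period_days * 24 * 60


monthly = downtime_budget_minutes(99.9, 30)
yearly = downtime_budget_minutes(99.9, 365)
print(f"30-day budget: {monthly:.1f} minutes")
print(f"365-day budget: {yearly:.1f} minutes ({yearly / 60:.2f} hours)")
```

Under a 30-day window the budget is about 43 minutes, so a single incident that uses the full 1-hour RTO would exceed it; over a 365-day window the same incident fits within the roughly 8.8-hour budget.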
20.2 Readiness Checklist Summary
| Category | Items | Status |
|---|---|---|
| Infrastructure | 14 | Ready to implement |
| Security | 18 | Ready to implement |
| AI/ML Pipeline | 15 | Ready to implement |
| Application | 16 | Ready to implement |
| Operations | 15 | Ready to implement |
| Data & Privacy | 10 | Ready to implement |
| Documentation | 10 | Ready to implement |
| Total | 98 | Ready to implement |
20.3 Estimated Timeline
| Milestone | Target | Duration |
|---|---|---|
| M1: Infrastructure Ready | Week 4 | 4 weeks |
| M2: AI Pipeline Operational | Week 8 | 4 weeks |
| M3: Application Live | Week 12 | 4 weeks |
| M4: Intelligence Features | Week 16 | 4 weeks |
| M5: Production Go-Live | Week 20 | 4 weeks |
| Total to Production | 20 weeks | ~5 months |
Appendices
Appendix A: Cross-Reference to Specialist Documents
| Document | Path | Content |
|---|---|---|
| Notification System | /mnt/agents/output/notification_system.md | Telegram, WhatsApp, routing rules, templates, retry logic |
| Security Architecture | /mnt/agents/output/security_architecture.md | SSL/TLS, auth, RBAC, VPN, secrets, audit, GDPR, checklist |
| Web UX Design | /mnt/agents/output/web_ux_design.md | Design system, 18 pages, navigation, user flows, AI vibe settings |
| Self-Test Framework | /mnt/agents/output/self_test_framework.md | Framework architecture, 21 suites, scheduling, sample report |
| Operations Plan | /mnt/agents/output/operations_plan.md | Monitoring, logging, backup, DR, incident response, runbooks |
| Architecture | /mnt/agents/output/architecture.md | System architecture, data flow, scaling strategy, cost estimates |
Appendix B: Acronyms
| Acronym | Full Form |
|---|---|
| AI | Artificial Intelligence |
| ALB | Application Load Balancer |
| API | Application Programming Interface |
| ArcFace | Additive Angular Margin Loss for Deep Face Recognition |
| CORS | Cross-Origin Resource Sharing |
| CSP | Content Security Policy |
| CSRF | Cross-Site Request Forgery |
| DLQ | Dead Letter Queue |
| DVR | Digital Video Recorder |
| EKS | Elastic Kubernetes Service |
| ES256 | ECDSA using P-256 and SHA-256 |
| FFmpeg | Fast Forward MPEG (multimedia framework) |
| FPS | Frames Per Second |
| GDPR | General Data Protection Regulation |
| GPU | Graphics Processing Unit |
| HLS | HTTP Live Streaming |
| HPA | Horizontal Pod Autoscaler |
| HSTS | HTTP Strict Transport Security |
| JWT | JSON Web Token |
| LUKS | Linux Unified Key Setup |
| mAP | mean Average Precision |
| MFA | Multi-Factor Authentication |
| mTLS | Mutual TLS |
| NMS | Non-Maximum Suppression |
| NUC | Next Unit of Computing |
| OCSP | Online Certificate Status Protocol |
| PII | Personally Identifiable Information |
| PSK | Pre-Shared Key |
| RBAC | Role-Based Access Control |
| RDS | Relational Database Service |
| RPO | Recovery Point Objective |
| RTO | Recovery Time Objective |
| RTSP | Real Time Streaming Protocol |
| S3 | Simple Storage Service |
| SAST | Static Application Security Testing |
| SCRFD | Sample and Computation Redistribution for Efficient Face Detection |
| SLA | Service Level Agreement |
| SQL | Structured Query Language |
| SSL | Secure Sockets Layer |
| TLS | Transport Layer Security |
| TOTP | Time-based One-Time Password |
| TPM | Trusted Platform Module |
| UAT | User Acceptance Testing |
| VPC | Virtual Private Cloud |
| VPN | Virtual Private Network |
| WAF | Web Application Firewall |
| WORM | Write Once Read Many |
| XSS | Cross-Site Scripting |
| YOLO | You Only Look Once |
End of Document
Document Version: 1.0 Classification: Confidential — Internal Use Only Next Review: 2025-04-16 Owner: Sentinel AI Architecture Team