
UC-PLT-401: Multi-Region High Availability

1. Metadata

| Property        | Value                                                            |
| --------------- | ---------------------------------------------------------------- |
| ID              | UC-PLT-401                                                       |
| Actor           | Cloud Infrastructure Manager                                     |
| Trigger         | Regional datacenter outage or high latency detection             |
| Pre-conditions  | Database replication active; Traffic manager (CDN) configured    |
| Post-conditions | Traffic failed over to secondary region; Zero perceived downtime |
| Side Effects    | Temporary increase in inter-regional data transfer costs         |

2. Description

Keeps the nutrition platform available during a regional cloud provider outage by failing traffic over to a secondary region, with a target of 99.9% uptime for critical school-hour operations.
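
To make the 99.9% target concrete, the sketch below converts an availability percentage into a downtime budget. The function name and the 30-day and 365-day measurement windows are assumptions chosen for illustration; the spec does not say over which period uptime is measured.

```typescript
// Hypothetical helper: converts an availability target into allowed downtime.
// The measurement windows below are assumptions, not part of the spec.
const SECONDS_PER_DAY = 24 * 60 * 60;

function downtimeBudgetSeconds(
  availabilityPercent: number,
  periodSeconds: number
): number {
  // Allowed downtime is the unavailable fraction of the period.
  return (periodSeconds * (100 - availabilityPercent)) / 100;
}

// Budget for the 99.9% target over an assumed 30-day window (in seconds).
const monthlyBudgetSeconds = downtimeBudgetSeconds(99.9, 30 * SECONDS_PER_DAY);

// Budget for the same target over an assumed 365-day window (in seconds).
const yearlyBudgetSeconds = downtimeBudgetSeconds(99.9, 365 * SECONDS_PER_DAY);
```

Under these assumptions the budget comes to roughly 43.2 minutes per 30 days, or about 8.76 hours per year. If uptime is counted only during school hours, the measurement window is shorter and the budget shrinks in proportion.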

3. Success Scenario

  1. Health Check: Global Traffic Manager detects an increase in 5xx errors from the Primary Region (e.g., India West).
  2. Decision: System triggers the 'Failover' circuit breaker.
  3. Redirection: DNS entries are updated to point to the Secondary Region (e.g., India South) within 30 seconds.
  4. State Sync: Secondary Region promotes its PostgreSQL Read-Replica to 'Leader' status.
  5. Service Activation: Cloudflare Workers begin routing all /api/v1/* traffic to the new Cluster.
  6. Notification: Infra-monitoring sends an 'Urgent' alert to the DevOps team.
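
Steps 1, 2, and 5 above can be sketched as a latching circuit breaker plus a routing function. This is a minimal illustration under stated assumptions, not the platform's implementation: the class and function names, the window size, the error-rate threshold, and the `.example` origin hostnames are all hypothetical. DNS updates, replica promotion, and alerting (steps 3, 4, and 6) are out of scope.

```typescript
// Hypothetical sketch: every identifier and hostname below is an assumption.
type Region = "india-west" | "india-south";

interface BreakerConfig {
  windowSize: number;         // number of most recent responses to inspect
  errorRateThreshold: number; // fraction of 5xx responses that trips the breaker
}

class FailoverBreaker {
  private readonly config: BreakerConfig;
  private readonly statuses: number[] = [];
  private tripped = false;

  constructor(config: BreakerConfig) {
    this.config = config;
  }

  // Step 1: record one HTTP status observed from the primary region.
  record(status: number): void {
    this.statuses.push(status);
    if (this.statuses.length > this.config.windowSize) {
      this.statuses.shift(); // keep only the most recent window
    }
    // Evaluate only on a full window so one early error cannot trip it.
    if (this.statuses.length === this.config.windowSize) {
      const errors = this.statuses.filter((s) => s >= 500 && s <= 599).length;
      if (errors / this.config.windowSize >= this.config.errorRateThreshold) {
        this.tripped = true; // Step 2: latch until an operator resets it
      }
    }
  }

  isTripped(): boolean {
    return this.tripped;
  }

  // Manual fail-back once the primary region is healthy again.
  reset(): void {
    this.tripped = false;
    this.statuses.length = 0;
  }
}

// Placeholder origins using the reserved .example TLD.
const ORIGINS: Record<Region, string> = {
  "india-west": "https://india-west.platform.example",
  "india-south": "https://india-south.platform.example",
};

// Step 5: choose the upstream URL for /api/v1/* paths.
// Returns undefined for other paths, which this sketch does not route.
function resolveUpstream(
  path: string,
  breaker: FailoverBreaker
): string | undefined {
  if (!path.startsWith("/api/v1/")) {
    return undefined;
  }
  const region: Region = breaker.isTripped() ? "india-south" : "india-west";
  return ORIGINS[region] + path;
}
```

The breaker latches instead of closing automatically, so traffic does not flap between regions while the primary is recovering. Automatic fail-back would be a separate design decision.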

4. Acceptance Criteria

  • [ ] RPO (Recovery Point Objective): Less than 5 seconds of committed writes lost during failover.
  • [ ] RTO (Recovery Time Objective): Full service restoration in < 60 seconds.
  • [ ] Global Edge: Static assets must be served from 200+ edge locations so that static-asset latency stays below 100 ms.
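
The RPO and RTO criteria above could be checked against timestamps captured during a failover drill. The sketch below is one way to do that. The `FailoverDrill` fields and the function name are hypothetical, and it assumes all three timestamps come from synchronized clocks.

```typescript
// Hypothetical drill record; the field names are assumptions for illustration.
interface FailoverDrill {
  lastReplicatedCommitMs: number; // last write the standby had received
  primaryFailedAtMs: number;      // moment the primary stopped serving writes
  serviceRestoredAtMs: number;    // first successful request in the secondary
}

interface Objectives {
  rpoMs: number; // maximum tolerated data-loss window
  rtoMs: number; // maximum tolerated recovery time
}

// Thresholds taken from the acceptance criteria: RPO < 5 s, RTO < 60 s.
const OBJECTIVES: Objectives = { rpoMs: 5000, rtoMs: 60000 };

interface DrillResult {
  dataLossWindowMs: number;
  recoveryTimeMs: number;
  rpoMet: boolean;
  rtoMet: boolean;
}

function evaluateDrill(
  drill: FailoverDrill,
  objectives: Objectives = OBJECTIVES
): DrillResult {
  // Writes made after the last replicated commit and before the failure are lost.
  const dataLossWindowMs = Math.max(
    0,
    drill.primaryFailedAtMs - drill.lastReplicatedCommitMs
  );
  // Recovery time runs from the failure to the first successful request.
  const recoveryTimeMs = drill.serviceRestoredAtMs - drill.primaryFailedAtMs;
  return {
    dataLossWindowMs,
    recoveryTimeMs,
    // Both criteria use strict "<", so hitting the limit exactly is a failure.
    rpoMet: dataLossWindowMs < objectives.rpoMs,
    rtoMet: recoveryTimeMs < objectives.rtoMs,
  };
}
```

For example, a drill where the last replicated commit is 3 seconds before the failure and service returns 46 seconds after it meets both objectives. A recovery that takes exactly 60 seconds fails the RTO check.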

5. Infrastructure Diagram

graph TD
    U[User] --> CF[Cloudflare Global Edge]
    CF -- Condition: West Down --> S2[Region: India South]
    CF -- Normal --> S1[Region: India West]
    S1 --> DB1[(Primary DB)]
    S2 --> DB2[(Standby DB)]
    DB1 -. Replicate .-> DB2