UC-PLT-401: Multi-Region High Availability
1. Metadata
| Property | Value |
|---|---|
| ID | UC-PLT-401 |
| Actor | Cloud Infrastructure Manager |
| Trigger | Regional datacenter outage or high latency detection |
| Pre-conditions | Database replication active; Traffic manager (CDN) configured |
| Post-conditions | Traffic failed over to secondary region; no user-perceived downtime beyond the RTO window |
| Side Effects | Temporary increase in inter-regional data transfer costs |
2. Description
Keeps the nutrition platform available during a regional cloud outage, targeting 99.9% uptime for critical school-hour operations.
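
To make the 99.9% figure concrete, the short calculation below converts an availability target into a downtime budget. It is a sketch only: the 30-day, around-the-clock measurement window and the function name `downtime_budget_seconds` are assumptions, since this use case does not state how uptime is measured.

```python
def downtime_budget_seconds(availability: float, period_days: float) -> float:
    """Return the downtime allowed (in seconds) for a target availability."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1.0 - availability)


# Assumed window: 30 days, measured around the clock.
budget_30d = downtime_budget_seconds(0.999, 30)
print(f"{budget_30d / 60:.1f} minutes of downtime allowed per 30 days")

# How many failovers fit in the budget if each one uses the full 60-second RTO.
failovers_within_budget = int(budget_30d // 60)
print(f"{failovers_within_budget} worst-case failovers fit in that budget")
```

Under these assumptions the budget is about 43.2 minutes per 30 days. If uptime is measured over school hours only, the window and therefore the budget are smaller.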
3. Success Scenario
- Health Check: Global Traffic Manager detects an increase in 5xx errors from the Primary Region (e.g., India West).
- Decision: System triggers the 'Failover' circuit breaker.
- Redirection: DNS entries are updated to point to the Secondary Region (e.g., India South) within 30 seconds.
- State Sync: Secondary Region promotes its PostgreSQL Read-Replica to 'Leader' status.
- Service Activation: Cloudflare Workers begin routing all `/api/v1/*` traffic to the new cluster.
- Notification: Infra-monitoring sends an 'Urgent' alert to the DevOps team.
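
The scenario does not say how the 'Failover' circuit breaker decides to trip. One possible approach is a sliding window over recent response codes, sketched below. The class name `FailoverBreaker`, the window size, the error threshold, the minimum sample count, and the region identifiers are all illustrative assumptions, not values from this specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FailoverBreaker:
    """Trips once the share of 5xx responses in a sliding window is too high."""

    window_size: int = 100        # assumed: number of recent responses tracked
    error_threshold: float = 0.5  # assumed: 5xx share that triggers failover
    min_samples: int = 20         # assumed: guard against tripping on few requests
    primary: str = "india-west"   # hypothetical region identifiers
    secondary: str = "india-south"
    tripped: bool = False
    _statuses: deque = field(default_factory=deque, init=False, repr=False)

    def record(self, status_code: int) -> bool:
        """Record one response from the primary; return True if tripped."""
        self._statuses.append(status_code)
        if len(self._statuses) > self.window_size:
            self._statuses.popleft()
        if not self.tripped and len(self._statuses) >= self.min_samples:
            errors = sum(1 for s in self._statuses if 500 <= s <= 599)
            if errors / len(self._statuses) >= self.error_threshold:
                self.tripped = True  # stays tripped until reset by an operator
        return self.tripped

    def active_region(self) -> str:
        """Region that should currently receive traffic."""
        return self.secondary if self.tripped else self.primary


breaker = FailoverBreaker(window_size=10, error_threshold=0.5, min_samples=5)
for status in [200] * 5 + [503] * 5:
    breaker.record(status)
print(breaker.active_region())
```

The breaker in this sketch latches once tripped, so traffic does not flap back to the primary while it is still recovering. Failing back would be a separate, deliberate step.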
4. Acceptance Criteria
- [ ] RPO (Recovery Point Objective): Data loss during failover is < 5 seconds of committed writes.
- [ ] RTO (Recovery Time Objective): Full service restoration in < 60 seconds.
- [ ] Global Edge: Static assets must be served from 200+ edge locations to ensure < 100 ms latency.
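
The RPO and RTO criteria above could be checked after a failover drill with a small helper like the one below. This is a minimal sketch: the `FailoverDrill` record and its field names are hypothetical, and it assumes the drill captures four timestamps (in seconds) from monitoring and replication logs.

```python
from dataclasses import dataclass

RPO_LIMIT_S = 5.0   # from the acceptance criteria: data loss < 5 seconds
RTO_LIMIT_S = 60.0  # from the acceptance criteria: restoration < 60 seconds


@dataclass
class FailoverDrill:
    """Timestamps (seconds) captured during one failover drill (hypothetical)."""

    outage_detected_at: float
    service_restored_at: float
    last_primary_commit_at: float     # last write committed on the primary
    last_replicated_commit_at: float  # last write that reached the standby


def evaluate(drill: FailoverDrill) -> dict:
    """Compare measured data loss and recovery time against the limits."""
    data_loss_s = max(
        0.0, drill.last_primary_commit_at - drill.last_replicated_commit_at
    )
    recovery_s = drill.service_restored_at - drill.outage_detected_at
    return {
        "rpo_s": data_loss_s,
        "rto_s": recovery_s,
        "rpo_ok": data_loss_s < RPO_LIMIT_S,
        "rto_ok": recovery_s < RTO_LIMIT_S,
    }


result = evaluate(
    FailoverDrill(
        outage_detected_at=1000.0,
        service_restored_at=1042.0,
        last_primary_commit_at=999.75,
        last_replicated_commit_at=998.5,
    )
)
print(result)
```

In this example drill the standby was 1.25 seconds behind and service returned after 42 seconds, so both criteria pass.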
5. Infrastructure Diagram
graph TD
U[User] --> CF[Cloudflare Global Edge]
CF -- Condition: West Down --> S2[Region: India South]
CF -- Normal --> S1[Region: India West]
S1 --> DB1[(Primary DB)]
S2 --> DB2[(Standby DB)]
DB1 -. Replicate .-> DB2
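
The routing shown in the diagram can be summarised as a small lookup. This is an illustrative sketch rather than the platform's actual edge logic; the region keys, database labels, and function names are assumptions chosen to mirror the diagram nodes.

```python
# Hypothetical identifiers mirroring the diagram: S1/DB1 and S2/DB2.
REGIONS = {
    "india-west": {"db": "primary-db"},
    "india-south": {"db": "standby-db"},
}


def select_region(west_healthy: bool) -> str:
    """Normal traffic goes to India West; fail over to India South when West is down."""
    return "india-west" if west_healthy else "india-south"


def resolve_backend(west_healthy: bool) -> tuple:
    """Return the (region, database) pair that serves a request."""
    region = select_region(west_healthy)
    # After failover the standby must already be promoted (see the State Sync
    # step) before it can accept writes.
    return region, REGIONS[region]["db"]


print(resolve_backend(west_healthy=True))
print(resolve_backend(west_healthy=False))
```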