DR Readiness Checklist
When to use: Before declaring disaster-recovery readiness for a new environment, or as a periodic validation (recommended quarterly).
Prerequisites:
- Access to backup storage (S3 or compatible)
- Non-production environment available for restore testing
- Familiarity with the drill scripts in scripts/ops/
Backups and Restore
- Daily backup schedule configured and observed.
- Most recent backup verified in object storage.
- Backup drill run recently: scripts/ops/backup-restore-drill.sh
- Restore path tested in non-production environment.
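Below is a minimal sketch of the "most recent backup verified" check. It assumes backups land as objects under a single S3 prefix. The function names, the 26-hour threshold (one daily cycle plus a grace window), and the zero-size check are illustrative assumptions and are not part of the drill script. The input mirrors the "Contents" entries returned by the S3 ListObjectsV2 API, each carrying "Key", "LastModified" and "Size".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: one daily backup cycle plus a 2-hour grace window.
DEFAULT_MAX_AGE = timedelta(hours=26)


def newest_backup(objects):
    """Return the entry with the latest LastModified, or None if there are none."""
    if not objects:
        return None
    return max(objects, key=lambda obj: obj["LastModified"])


def check_latest_backup(objects, now=None, max_age=DEFAULT_MAX_AGE):
    """Return (ok, reason) describing the most recent backup object.

    `objects` are dicts shaped like S3 ListObjectsV2 "Contents" entries:
    "Key", "LastModified" (timezone-aware datetime) and "Size".
    """
    now = now or datetime.now(timezone.utc)
    latest = newest_backup(objects)
    if latest is None:
        return False, "no backup objects found"
    # A zero-byte object may indicate a backup job that failed mid-upload.
    if latest.get("Size") == 0:
        return False, f"latest backup {latest['Key']} is empty"
    age = now - latest["LastModified"]
    if age > max_age:
        return False, f"latest backup {latest['Key']} is stale ({age} old)"
    return True, f"latest backup {latest['Key']} is {age} old"
```

With boto3, the input could be collected from client.get_paginator("list_objects_v2").paginate(Bucket=..., Prefix=...) by concatenating each page's "Contents". A paginator matters here because a single ListObjectsV2 call returns at most 1,000 keys. Freshness alone does not prove the backup can be restored, which is why the drill and the non-production restore remain separate items.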
Failover
- Game day script executed in staging: scripts/ops/ha-failover-game-day.sh
- Worker failover and recovery verified.
- Temporal namespace health remains stable across failover event.
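One way to make "remains stable across failover" measurable is to record periodic health probes, for example every few seconds while scripts/ops/ha-failover-game-day.sh runs, and then evaluate the longest outage. The sketch below assumes the probes are already collected as (timestamp, healthy) pairs. How they are gathered for Temporal or the workers is left open, and the 60-second limit is a hypothetical threshold to tune per environment.

```python
from datetime import datetime, timedelta


def longest_outage(samples):
    """Return the longest run of failed probes as a timedelta.

    `samples` is a list of (timestamp, healthy) tuples sorted by timestamp.
    An outage runs from the first failed sample to the next healthy sample,
    or to the last sample if the run never recovers.
    """
    longest = timedelta(0)
    start = None
    for ts, healthy in samples:
        if not healthy and start is None:
            start = ts  # outage begins
        elif healthy and start is not None:
            longest = max(longest, ts - start)  # outage ends
            start = None
    if start is not None:
        # Still failing at the final sample: count up to that sample.
        longest = max(longest, samples[-1][0] - start)
    return longest


def failover_is_stable(samples, max_outage=timedelta(seconds=60)):
    """True if the final probe is healthy and no outage exceeded max_outage."""
    recovered = bool(samples) and samples[-1][1]
    return recovered and longest_outage(samples) <= max_outage
```

Requiring the final probe to be healthy keeps a failover that never recovered from passing just because it failed late in the window.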
Data Integrity
- Audit chain verification succeeds for critical tenants.
- Tool registry export/import tested.
- Quota and tenant policy data available post-restore.
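The checklist does not define the audit chain format. Assuming it is a SHA-256 hash chain in which each entry stores the previous entry's hash, verification after a restore might look like the sketch below. The field names (payload, prev_hash, hash), the all-zero genesis value, and the JSON canonicalization are hypothetical; a real verifier must match whatever format the platform actually writes.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash value for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus canonical JSON of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append an entry linked to the current tail (used here to build test data)."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev:
            return i  # link to the previous entry is broken or missing
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return i  # payload or stored hash was altered
        prev = entry["hash"]
    return None
```

Returning the index of the first broken link helps distinguish a tampered or corrupted entry from a restore that lost the head of the chain (which fails at index 0). Entries lost from the end are not detected by this check alone; comparing the restored tail hash with one recorded before the incident would cover that case.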
Operational Signals
- /api/health and /api/health/detail checks pass in normal and failover states.
- Security alerts and quota critical events reviewed.
- LLM failover events observed and reasonable (no sustained churn).
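Below is a minimal probe for the /api/health and /api/health/detail items, using only the Python standard library. It assumes both endpoints are plain HTTP GETs that return 200 when healthy. The base URL, timeout, and function names are illustrative assumptions.

```python
import urllib.error
import urllib.request

HEALTH_PATHS = ("/api/health", "/api/health/detail")


def check_health(base_url, paths=HEALTH_PATHS, timeout=5.0):
    """Return {path: HTTP status code, or an "unreachable: ..." string}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses are raised as HTTPError; keep the status code.
            results[path] = err.code
        except OSError as err:
            # URLError (refused, DNS failure) and socket timeouts subclass OSError.
            results[path] = f"unreachable: {err}"
    return results


def all_healthy(results):
    """True only if every endpoint answered with HTTP 200."""
    return bool(results) and all(code == 200 for code in results.values())
```

Running it once in the normal state and again while the failover game day is in progress covers both halves of the item. Treating any code other than 200 as unhealthy is deliberately strict and may need relaxing if the detail endpoint reports degraded states with other codes.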
Ownership
- Incident commander and on-call ownership documented.
- Escalation contacts for platform, database, and security confirmed.
- Recovery decision log template prepared for incident usage.