Skip to main content

DR Readiness Checklist

When to use: Before declaring disaster-recovery readiness for a new environment, or as a periodic validation (recommended quarterly).

Prerequisites:

  • Access to backup storage (S3 or compatible)
  • Non-production environment available for restore testing
  • Familiarity with scripts/ops/ drill scripts

Run this checklist before declaring disaster-recovery readiness.

Backups and Restore

  • Daily backup schedule configured and observed.
  • Most recent backup verified in object storage.
  • Backup drill run recently:
    • scripts/ops/backup-restore-drill.sh
  • Restore path tested in non-production environment.

Failover

  • Game day script executed in staging:
    • scripts/ops/ha-failover-game-day.sh
  • Worker failover and recovery verified.
  • Temporal namespace health remains stable across failover event.

Data Integrity

  • Audit chain verification succeeds for critical tenants.
  • Tool registry export/import tested.
  • Quota and tenant policy data available post-restore.

Operational Signals

  • /api/health and /api/health/detail checks pass in normal and failover states.
  • Security alerts and quota critical events reviewed.
  • LLM failover events observed and reasonable (no sustained churn).

Ownership

  • Incident commander and on-call ownership documented.
  • Escalation contacts for platform, database, and security confirmed.
  • Recovery decision log template prepared for incident usage.