Scaling Runbook

When to use: Worker throughput is degraded, task queue depth is growing, or step latency exceeds SLO targets for 10+ minutes.

Prerequisites:

  • kubectl access to the target Kubernetes cluster
  • Prometheus/Grafana dashboards for Temporal and worker metrics
  • Permission to adjust HPA limits or node pool sizes

Trigger Conditions

A scaling review is required when one or more of the following conditions persist for 10+ minutes:

  • Temporal task queue depth is growing continuously.
  • Agent step latency p95 is above SLO (> 8s end-to-end).
  • Worker CPU > 70% or memory > 80% sustained.
  • Workflow schedule-to-start latency p95 exceeds target (> 2s).
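
These triggers can be confirmed directly against Prometheus. The queries below are a minimal sketch: they assume the Prometheus API is reachable at http://<prometheus>:9090 and that the queue-depth gauge and schedule-to-start histogram use the names shown; verify both against your dashboards.

  # Queue depth still growing? A positive derivative over the last 10 minutes means yes.
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=deriv(temporal_task_queue_depth[10m]) > 0'

  # p95 schedule-to-start latency over the last 10 minutes (histogram name assumed).
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=histogram_quantile(0.95, sum(rate(temporal_workflow_task_schedule_to_start_latency_bucket[10m])) by (le))'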

Indicators and Thresholds

  • temporal_task_queue_depth: warn at >= 30, critical at >= 100.
  • Worker CPU utilization: warn at >= 70%, critical at >= 85%.
  • Worker memory utilization: warn at >= 80%, critical at >= 90%.
  • Workflow failure rate: warn at >= 3%, critical at >= 8%.
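
To spot-check current values against these thresholds, a minimal sketch follows. It assumes the worker pods carry an app=cruvero-worker label and that workflow completion/failure counters exist under the names shown; substitute your own.

  # Worker CPU and memory per pod (compare usage against the requests/limits behind the 70% / 80% thresholds).
  kubectl -n cruvero top pods -l app=cruvero-worker

  # Current task queue depth (warn >= 30, critical >= 100).
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=max(temporal_task_queue_depth)'

  # Workflow failure rate over the last 10 minutes (counter names assumed).
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(temporal_workflow_failed_total[10m])) / sum(rate(temporal_workflow_completed_total[10m]))'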

Step-by-Step Actions

  1. Confirm health baseline:
    • curl -fsS http://<ui>/api/health
    • curl -fsS http://<ui>/api/health/detail
  2. Identify which component is the bottleneck (example checks below):
    • worker pods saturated (CPU, memory, or no free task slots),
    • Temporal matching/history services under pressure,
    • Postgres connection or query saturation.
  3. Scale workers immediately if queue depth is rising:
    • kubectl -n cruvero scale deployment/cruvero-worker --replicas=<N>
  4. If HPA is enabled, tune its bounds/target instead, since an active HPA reverts manual replica changes (see the HPA sketch below):
    • edit deploy/kubernetes/hpa.yaml and raise maxReplicas,
    • adjust the CPU and/or queue-depth target.
  5. Temporal scaling (platform/SRE step; see the deployment-specific sketch below):
    • increase matching/history capacity,
    • add shards/partitions based on Temporal deployment model,
    • verify frontend saturation is not the limiter.
  6. Postgres scaling (see the connection checks below):
    • ensure PgBouncer pooling is in use,
    • increase pool safely,
    • route heavy reads to replicas where supported.
  7. Recheck queue drain and latency after each change before further scaling.
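
Example checks for step 2, as a sketch: it assumes the SDK worker metric temporal_worker_task_slots_available and the Temporal server latency histogram are scraped into the same Prometheus; exact names vary by SDK and server version.

  # Worker saturation: available task slots near zero means the workers themselves are the bottleneck (SDK metric name assumed).
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=min(temporal_worker_task_slots_available) by (task_queue)'

  # Temporal matching/history pressure: p95 request latency per service (server metric/label names assumed; cross-check the Temporal server dashboards).
  curl -fsS 'http://<prometheus>:9090/api/v1/query' \
    --data-urlencode 'query=histogram_quantile(0.95, sum(rate(service_latency_bucket[5m])) by (le, service_name))'

  # Postgres saturation: see the step 6 checks further below.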
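
For steps 3 and 4, a sketch of the manual scale-out and the HPA adjustment. The replica values are examples, and the HPA object is assumed to share the cruvero-worker name defined in deploy/kubernetes/hpa.yaml.

  # Manual scale-out; only effective while no HPA manages the deployment.
  kubectl -n cruvero scale deployment/cruvero-worker --replicas=<N>
  kubectl -n cruvero rollout status deployment/cruvero-worker

  # With HPA enabled, raise the ceiling instead (HPA name assumed).
  kubectl -n cruvero patch hpa cruvero-worker --type merge -p '{"spec": {"maxReplicas": 12}}'

  # Or edit maxReplicas and the metric targets in the manifest and re-apply it.
  kubectl apply -f deploy/kubernetes/hpa.yaml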
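
Steps 5 and 6 depend on how Temporal and Postgres are deployed. The sketch below assumes Temporal services run as separate deployments and that a PgBouncer pod fronts Postgres; every name in angle brackets is a placeholder.

  # Temporal: scale the matching/history services (deployment names depend on the install method; confirm them before scaling).
  kubectl -n <temporal-namespace> scale deployment/<temporal-matching> --replicas=<N>
  kubectl -n <temporal-namespace> scale deployment/<temporal-history> --replicas=<N>

  # PgBouncer: confirm pooling is in use and pools are not exhausted (requires psql in the PgBouncer image; 6432 is the default listen port).
  kubectl -n cruvero exec -it <pgbouncer-pod> -- psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"

  # Postgres: backend count versus max_connections.
  kubectl -n cruvero exec -it <postgres-pod> -- psql -U <db-user> -c \
    "SELECT count(*) AS backends, current_setting('max_connections') AS max_connections FROM pg_stat_activity;"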

Verification

  • Task queue depth trending downward for 15+ minutes.
  • Worker utilization below warning thresholds.
  • Step latency p95 returns to target budget.
  • No new quota or timeout spikes in audit/metrics.
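
To watch the drain, the same Prometheus endpoint and metric-name assumptions as above apply:

  # Queue depth should trend downward: a sustained negative derivative over 15 minutes.
  watch -n 60 "curl -fsS 'http://<prometheus>:9090/api/v1/query' --data-urlencode 'query=deriv(temporal_task_queue_depth[15m])'"

  # Worker utilization should settle back under the warning thresholds.
  watch -n 60 kubectl -n cruvero top pods -l app=cruvero-worker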

Escalation Path

  • Escalate to Platform/SRE when:
    • scaling workers does not reduce queue depth,
    • Temporal services remain saturated after worker scale-out,
    • Postgres saturation persists after pool tuning.
  • Escalate to Application owners when:
    • the workload spike is caused by runaway workflow/tool behavior,
    • a tenant-specific abuse pattern needs policy mitigation.