Tenant Onboarding Runbook
When to use: A new tenant is approved for production or staging, or an existing tenant is being migrated to a dedicated namespace with custom policies.
Prerequisites:
- Admin access to Postgres and Temporal
- Vault access for storing LLM credentials
- Tenant approval documentation with agreed quotas and isolation requirements
Trigger Conditions
Run this procedure when:
- a new tenant is approved for production/staging,
- an existing tenant is being migrated to a dedicated namespace and policies.
Checklist
- Tenant record created and enabled.
- Temporal namespace provisioned and healthy.
- Quotas and rate limits configured.
- LLM credentials stored in Vault.
- Tool registry seeded.
- Health verification complete.
Step-by-Step Actions
- Create tenant configuration:
go run ./cmd/tenant create --id <tenant_id> --display-name "<name>" --namespace cruvero-<tenant_id> --memory-namespace <tenant_id>
- Verify tenant exists:
go run ./cmd/tenant get --id <tenant_id>
- Set initial quotas/rate limits (example):
go run ./cmd/tenant update --id <tenant_id> --rpm 120 --rph 5000 --tpd 2000000 --max-runs-per-day 10000
- Configure allowed/blocked tools/models for tenant policy:
go run ./cmd/tenant update --id <tenant_id> --allowed-models x-ai/grok-4-fast --allowed-tools http_get,calculator
- Store tenant LLM secrets in Vault (path model):
secret/cruvero/tenants/<tenant_id>/llm/openrouter
secret/cruvero/tenants/<tenant_id>/llm/azure
- Seed tool registry (global/default plus tenant-specific if needed):
go run ./cmd/seed-registry --id default --version v2.0.0
- Validate health from UI/API:
curl -fsS http://<ui>/api/health
curl -fsS http://<ui>/api/health/detail
- Run sample workflow in tenant context:
go run ./cmd/run --prompt "Health check for tenant <tenant_id>"
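The names in the steps above all derive from the tenant ID, so they can be generated once instead of typed by hand. The following is a minimal sketch, not part of the Cruvero tooling: `BuildPlan`, `Plan`, and the tenant ID pattern are hypothetical names and assumptions introduced for illustration. The namespace convention, Vault paths, and CLI flags are copied from the steps above.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// tenantIDPattern is an ASSUMED naming rule (lowercase letters, digits,
// hyphens). The runbook does not define one; adjust to your own policy.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]*$`)

// Plan (hypothetical type) holds the names and commands derived from one tenant ID.
type Plan struct {
	Namespace       string
	MemoryNamespace string
	VaultPaths      []string
	Commands        [][]string
}

// BuildPlan (hypothetical helper) derives the Temporal namespace, memory
// namespace, Vault paths, and the create/get commands listed in the runbook.
func BuildPlan(tenantID, displayName string) (Plan, error) {
	if !tenantIDPattern.MatchString(tenantID) {
		return Plan{}, fmt.Errorf("invalid tenant id %q", tenantID)
	}
	p := Plan{
		Namespace:       "cruvero-" + tenantID,
		MemoryNamespace: tenantID,
	}
	// Provider names are the two shown in the runbook's Vault path model.
	for _, provider := range []string{"openrouter", "azure"} {
		p.VaultPaths = append(p.VaultPaths,
			fmt.Sprintf("secret/cruvero/tenants/%s/llm/%s", tenantID, provider))
	}
	p.Commands = [][]string{
		{"go", "run", "./cmd/tenant", "create",
			"--id", tenantID,
			"--display-name", displayName,
			"--namespace", p.Namespace,
			"--memory-namespace", p.MemoryNamespace},
		{"go", "run", "./cmd/tenant", "get", "--id", tenantID},
	}
	return p, nil
}

func main() {
	// "acme" and "Acme Corp" are placeholder values for this sketch.
	plan, err := BuildPlan("acme", "Acme Corp")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("namespace:", plan.Namespace)
	for _, path := range plan.VaultPaths {
		fmt.Println("vault path:", path)
	}
	for _, cmd := range plan.Commands {
		// Printed for review only; arguments containing spaces are not quoted.
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

Keeping each command as an argument slice rather than one string means it can be passed to `exec.Command` later without shell quoting issues, if the plan is ever executed instead of printed.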
Verification
- Tenant appears in tenant list and is enabled=true.
- Namespace is reachable and workers pick up tasks.
- Sample workflow completes without cross-tenant leakage.
- Quota counters initialize and increment for tenant runs.
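The two health endpoints from the step list can also be probed programmatically as part of verification. The sketch below is an illustration under stated assumptions: `CheckHealth`, `StatusFunc`, and the `UI_BASE_URL` environment variable are hypothetical names, and it treats any status of 400 or above as a failure, similar to `curl -f`. It does not inspect the response body, because the runbook does not describe the payload format.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// StatusFunc (hypothetical type) fetches a URL and reports its HTTP status code.
type StatusFunc func(url string) (int, error)

// httpStatus adapts an *http.Client to a StatusFunc.
func httpStatus(client *http.Client) StatusFunc {
	return func(url string) (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		return resp.StatusCode, nil
	}
}

// CheckHealth (hypothetical helper) requests both health endpoints named in
// the runbook and returns an error on the first transport failure or on any
// status >= 400.
func CheckHealth(get StatusFunc, baseURL string) error {
	base := strings.TrimRight(baseURL, "/")
	for _, path := range []string{"/api/health", "/api/health/detail"} {
		status, err := get(base + path)
		if err != nil {
			return fmt.Errorf("GET %s: %w", path, err)
		}
		if status >= 400 {
			return fmt.Errorf("GET %s: unexpected status %d", path, status)
		}
	}
	return nil
}

func main() {
	// UI_BASE_URL is an assumed variable for this sketch, e.g. http://<ui>.
	base := os.Getenv("UI_BASE_URL")
	if base == "" {
		fmt.Println("set UI_BASE_URL to the UI host before running")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	if err := CheckHealth(httpStatus(client), base); err != nil {
		fmt.Fprintln(os.Stderr, "health check failed:", err)
		os.Exit(1)
	}
	fmt.Println("health endpoints OK")
}
```

Passing the fetch function as a parameter keeps the check independent of the network, so it can be exercised with a stub before it is pointed at a real environment.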
Escalation Path
- Escalate to Platform/SRE if namespace provisioning fails or workers cannot connect.
- Escalate to Security if Vault secret retrieval or lease renewal fails.
- Escalate to Product/Application owner if policy defaults block expected tools/models.