Tenant Onboarding Runbook

When to use: A new tenant is approved for production or staging, or an existing tenant is being migrated to a dedicated namespace with custom policies.

Prerequisites:

  • Admin access to Postgres and Temporal
  • Vault access for storing LLM credentials
  • Tenant approval documentation with agreed quotas and isolation requirements

Trigger Conditions

Run this procedure when:

  • a new tenant is approved for production or staging, or
  • an existing tenant is being migrated to a dedicated namespace or dedicated policies.

Checklist

  • Tenant record created and enabled.
  • Temporal namespace provisioned and healthy.
  • Quotas and rate limits configured.
  • LLM credentials stored in Vault.
  • Tool registry seeded.
  • Health verification complete.

Step-by-Step Actions

  1. Create tenant configuration:
    • go run ./cmd/tenant create --id <tenant_id> --display-name "<name>" --namespace cruvero-<tenant_id> --memory-namespace <tenant_id>
  2. Verify tenant exists:
    • go run ./cmd/tenant get --id <tenant_id>
  3. Set initial quotas/rate limits (example values; use the quotas agreed in the tenant approval documentation):
    • go run ./cmd/tenant update --id <tenant_id> --rpm 120 --rph 5000 --tpd 2000000 --max-runs-per-day 10000
  4. Configure tenant policy for allowed and blocked tools and models (the example below sets allow-lists only):
    • go run ./cmd/tenant update --id <tenant_id> --allowed-models x-ai/grok-4-fast --allowed-tools http_get,calculator
  5. Store tenant LLM secrets in Vault, one path per provider, using this path scheme:
    • secret/cruvero/tenants/<tenant_id>/llm/openrouter
    • secret/cruvero/tenants/<tenant_id>/llm/azure
  6. Seed tool registry (global/default plus tenant-specific if needed):
    • go run ./cmd/seed-registry --id default --version v2.0.0
  7. Validate health from UI/API:
    • curl -fsS http://<ui>/api/health
    • curl -fsS http://<ui>/api/health/detail
  8. Run sample workflow in tenant context:
    • go run ./cmd/run --prompt "Health check for tenant <tenant_id>"
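
Steps 1 and 5 derive several names from the same tenant ID: the Temporal namespace (cruvero-<tenant_id>), the memory namespace (<tenant_id>), and the Vault paths (secret/cruvero/tenants/<tenant_id>/llm/<provider>). Computing them in one place can help avoid typos when onboarding by hand. The following is a minimal sketch, not part of the project's tooling: DeriveTenantNames, TenantNames, and the tenant ID validation pattern are hypothetical names and rules introduced here for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// tenantIDPattern is a hypothetical rule (lowercase letters, digits, hyphens).
// The runbook does not define which tenant IDs are valid; adjust as needed.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]*$`)

// TenantNames holds the values that steps 1 and 5 derive from a tenant ID.
type TenantNames struct {
	Namespace       string   // value for --namespace in step 1
	MemoryNamespace string   // value for --memory-namespace in step 1
	VaultPaths      []string // one Vault path per LLM provider (step 5)
}

// DeriveTenantNames applies the naming scheme shown in the runbook steps.
// It only builds strings; it does not talk to Temporal or Vault.
func DeriveTenantNames(tenantID string, providers []string) (TenantNames, error) {
	if !tenantIDPattern.MatchString(tenantID) {
		return TenantNames{}, fmt.Errorf("invalid tenant id %q", tenantID)
	}
	names := TenantNames{
		Namespace:       "cruvero-" + tenantID,
		MemoryNamespace: tenantID,
	}
	for _, p := range providers {
		path := fmt.Sprintf("secret/cruvero/tenants/%s/llm/%s", tenantID, p)
		names.VaultPaths = append(names.VaultPaths, path)
	}
	return names, nil
}

func main() {
	// "acme" is a placeholder tenant ID used only for this example.
	names, err := DeriveTenantNames("acme", []string{"openrouter", "azure"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("namespace:", names.Namespace)
	fmt.Println("memory namespace:", names.MemoryNamespace)
	for _, p := range names.VaultPaths {
		fmt.Println("vault path:", p)
	}
}
```

The printed values can be pasted into the commands in steps 1 and 5. Rejecting unexpected characters up front is a design choice of this sketch, intended to keep slashes or uppercase letters out of namespace names and Vault paths.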

Verification

  • Tenant appears in the tenant list with enabled=true.
  • Namespace is reachable and workers pick up tasks.
  • Sample workflow completes without cross-tenant leakage.
  • Quota counters initialize and increment for tenant runs.
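
The health checks in step 7 can also be scripted, so that verification fails loudly instead of relying on someone reading curl output. The sketch below requests /api/health and /api/health/detail and treats any non-2xx status as a failure, which is slightly stricter than curl -f (which fails on status 400 and above). It assumes the UI base URL is supplied through a UI_BASE_URL environment variable; that variable and the function names HealthURLs, statusOK, and CheckHealth are hypothetical, not part of the project.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// HealthURLs returns the two health endpoints from step 7 for a base URL.
func HealthURLs(base string) []string {
	base = strings.TrimRight(base, "/")
	return []string{base + "/api/health", base + "/api/health/detail"}
}

// statusOK reports whether an HTTP status code is in the 2xx range.
func statusOK(code int) bool {
	return code >= 200 && code <= 299
}

// CheckHealth sends a GET request and returns an error on transport
// failure or on any non-2xx response.
func CheckHealth(ctx context.Context, client *http.Client, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return fmt.Errorf("build request for %s: %w", url, err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("GET %s: %w", url, err)
	}
	defer resp.Body.Close()
	if !statusOK(resp.StatusCode) {
		return fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	return nil
}

func main() {
	// UI_BASE_URL is an assumed variable, e.g. the http://<ui> host from step 7.
	base := os.Getenv("UI_BASE_URL")
	if base == "" {
		fmt.Println("UI_BASE_URL is not set; nothing to check")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	failed := false
	for _, u := range HealthURLs(base) {
		if err := CheckHealth(context.Background(), client, u); err != nil {
			fmt.Println("FAIL:", err)
			failed = true
			continue
		}
		fmt.Println("ok:", u)
	}
	if failed {
		os.Exit(1)
	}
}
```

A non-zero exit code makes the check usable as a gate in an onboarding script. It covers only endpoint reachability and status; the remaining verification items (worker task pickup, tenant isolation, quota counters) still need to be checked separately.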

Escalation Path

  • Escalate to Platform/SRE if namespace provisioning fails or workers cannot connect.
  • Escalate to Security if Vault secret retrieval or lease renewal fails.
  • Escalate to Product/Application owner if policy defaults block expected tools/models.