
Models, Pricing, and Cost

Cruvero tracks LLM usage and cost at the per-step and per-run level. It supports multiple LLM providers with automatic failover, model preference controls per namespace, and cost-aware execution that can downgrade models when approaching quota limits.

Source: internal/llm/*, internal/agent/cost.go

Provider

  • Providers: openrouter (default), azure, openai, and google.
  • Model IDs are provider-scoped:
    • OpenRouter: <vendor>/<model>, for example x-ai/grok-4.1-fast
    • Azure: azure/<deployment>
    • OpenAI: openai/<model>
    • Google Gemini: google/<model>
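
The prefix scheme above can be sketched as a small routing helper. This is only an illustration: the function name providerOf is hypothetical, and it assumes that any ID without an azure/, openai/, or google/ prefix belongs to the default provider, openrouter. Cruvero's actual resolution logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// providerOf guesses the provider from a provider-scoped model ID.
// Assumption: IDs without a known provider prefix go to the default
// provider, openrouter.
func providerOf(modelID string) string {
	prefix, _, found := strings.Cut(modelID, "/")
	if !found {
		return "openrouter"
	}
	switch prefix {
	case "azure", "openai", "google":
		return prefix
	}
	return "openrouter"
}

func main() {
	ids := []string{
		"x-ai/grok-4.1-fast",
		"azure/deploy-gpt4o-prod",
		"openai/<model>",
		"google/<model>",
	}
	for _, id := range ids {
		fmt.Printf("%-26s -> %s\n", id, providerOf(id))
	}
}
```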

Model Catalog

  • Stored in model_catalog (Postgres).
  • Ingested via cmd/models-refresh.
  • Queried via cmd/models-list.
  • For Azure, cmd/models-refresh --source azure also uses:
    • CRUVERO_AZURE_PRICING_JSON (optional, per deployment)
    • CRUVERO_AZURE_CONTEXT_JSON (optional, per deployment)

Azure Override Format

The Azure deployment listing does not include pricing or context-length metadata, so supply both as JSON maps.

CRUVERO_AZURE_PRICING_JSON accepts deployment keys (with or without azure/ prefix):

{
  "deploy-gpt4o-prod": { "prompt": "0.000005", "completion": "0.000015", "request": "0" },
  "azure/deploy-gpt4o-mini": { "prompt": "0.00000015", "completion": "0.00000060", "request": "0" },
  "gpt-4o": { "prompt": "0.000005", "completion": "0.000015", "request": "0" }
}

  • prompt and completion are per-token prices (same format used across model catalog pricing fields).
  • request is an optional fixed per-request surcharge.
  • Deployment-specific keys win over model-name fallback keys when both are present.
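
The key rules above (optional azure/ prefix, deployment key before model-name fallback) can be sketched as follows. The type Price and the functions parsePricing and lookupPrice are hypothetical names chosen for this example, not Cruvero's API. The sketch also assumes that a key is normalized by dropping the azure/ prefix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Price mirrors one entry of the override map; values are decimal strings.
type Price struct {
	Prompt     string `json:"prompt"`
	Completion string `json:"completion"`
	Request    string `json:"request"`
}

// sampleOverrides is the example map from this page.
const sampleOverrides = `{
  "deploy-gpt4o-prod": { "prompt": "0.000005", "completion": "0.000015", "request": "0" },
  "azure/deploy-gpt4o-mini": { "prompt": "0.00000015", "completion": "0.00000060", "request": "0" },
  "gpt-4o": { "prompt": "0.000005", "completion": "0.000015", "request": "0" }
}`

// parsePricing decodes the JSON and drops an optional "azure/" prefix from
// each key. If both forms of the same key appear, which entry survives is
// unspecified in this sketch.
func parsePricing(raw string) (map[string]Price, error) {
	var in map[string]Price
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		return nil, err
	}
	out := make(map[string]Price, len(in))
	for k, v := range in {
		out[strings.TrimPrefix(k, "azure/")] = v
	}
	return out, nil
}

// lookupPrice tries the deployment key first, then the model-name fallback.
func lookupPrice(prices map[string]Price, deployment, model string) (Price, bool) {
	if p, ok := prices[strings.TrimPrefix(deployment, "azure/")]; ok {
		return p, true
	}
	p, ok := prices[model]
	return p, ok
}

func main() {
	prices, err := parsePricing(sampleOverrides)
	if err != nil {
		panic(err)
	}
	p, _ := lookupPrice(prices, "azure/deploy-gpt4o-mini", "gpt-4o")
	fmt.Println("deployment match, prompt price:", p.Prompt)
	p, _ = lookupPrice(prices, "deploy-unlisted", "gpt-4o")
	fmt.Println("model fallback, prompt price:", p.Prompt)
}
```

Prices stay as strings here, matching the JSON; conversion to numbers is left to the cost calculation.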

CRUVERO_AZURE_CONTEXT_JSON maps deployment/model keys to context length:

{
  "deploy-gpt4o-prod": 128000,
  "azure/deploy-gpt4o-mini": 128000,
  "gpt-4o": 128000
}

Cost Accounting

  • Each decision stores estimated and actual cost.
  • If the provider returns a usage cost, it supersedes the estimate.
  • Per-tool cost hints can be set in tool registry definitions.
  • For Azure, when the provider-reported usage cost is 0, Cruvero computes the cost from the CRUVERO_AZURE_PRICING_JSON deployment overrides as:
    • request + (prompt_tokens * prompt_price) + (completion_tokens * completion_price)
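
As a worked example of the Azure fallback formula, the sketch below applies it to the deploy-gpt4o-prod prices from the override example. The function names parsePrice and azureCost are hypothetical. Treating an empty request string as 0 and using float64 arithmetic are simplifying assumptions; the real implementation in internal/agent/cost.go may use a different numeric type.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePrice converts a decimal price string to float64. An empty string
// counts as 0 so the optional request surcharge can be omitted.
func parsePrice(s string) (float64, error) {
	if s == "" {
		return 0, nil
	}
	return strconv.ParseFloat(s, 64)
}

// azureCost computes
// request + prompt_tokens*prompt_price + completion_tokens*completion_price.
func azureCost(prompt, completion, request string, promptTokens, completionTokens int) (float64, error) {
	pp, err := parsePrice(prompt)
	if err != nil {
		return 0, fmt.Errorf("prompt price: %w", err)
	}
	cp, err := parsePrice(completion)
	if err != nil {
		return 0, fmt.Errorf("completion price: %w", err)
	}
	rq, err := parsePrice(request)
	if err != nil {
		return 0, fmt.Errorf("request price: %w", err)
	}
	return rq + float64(promptTokens)*pp + float64(completionTokens)*cp, nil
}

func main() {
	// 1000 prompt tokens and 500 completion tokens at the example prices:
	// 0 + 1000*0.000005 + 500*0.000015 = 0.005 + 0.0075 = 0.0125
	cost, err := azureCost("0.000005", "0.000015", "0", 1000, 500)
	if err != nil {
		panic(err)
	}
	fmt.Printf("estimated cost: %.6f\n", cost)
}
```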

Cost Policy

  • MaxCostPerRun, MaxCostPerStep
  • PreferCheaper with CheapModel
  • ToolCostDefault
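
One way these policy fields could interact is sketched below. The field names come from the list above, but the struct layout, the function pickModel, and the exact downgrade rule are assumptions made for illustration: the preferred model is kept while the step and run stay within their limits, and CheapModel is used otherwise when PreferCheaper is set. The actual behavior is defined in internal/agent/cost.go.

```go
package main

import "fmt"

// CostPolicy is a hypothetical shape for the policy fields listed above.
type CostPolicy struct {
	MaxCostPerRun   float64
	MaxCostPerStep  float64
	PreferCheaper   bool
	CheapModel      string
	ToolCostDefault float64
}

// pickModel returns the model to use for the next step.
// Assumed rule: keep the preferred model while both limits hold; otherwise
// fall back to CheapModel if PreferCheaper is set; otherwise report failure.
func pickModel(p CostPolicy, preferred string, spent, estimate float64) (string, bool) {
	withinStep := estimate <= p.MaxCostPerStep
	withinRun := spent+estimate <= p.MaxCostPerRun
	if withinStep && withinRun {
		return preferred, true
	}
	if p.PreferCheaper && p.CheapModel != "" {
		return p.CheapModel, true
	}
	return "", false
}

func main() {
	policy := CostPolicy{
		MaxCostPerRun:  1.0,
		MaxCostPerStep: 0.1,
		PreferCheaper:  true,
		CheapModel:     "azure/deploy-gpt4o-mini",
	}
	m, ok := pickModel(policy, "azure/deploy-gpt4o-prod", 0.20, 0.05)
	fmt.Println("early in the run:", m, ok)
	m, ok = pickModel(policy, "azure/deploy-gpt4o-prod", 0.98, 0.05)
	fmt.Println("near the run limit:", m, ok)
}
```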

Preferred Models

  • Stored in kv_store under model_prefs:<namespace>.
  • Managed via cmd/model-prefs or tool model_prefs.

Speculative Execution

  • Run multiple models and select by lowest cost or highest similarity.
  • cmd/run --spec-models ... --spec-select ...
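
The lowest-cost selection rule can be sketched as below. The Candidate type and selectLowestCost function are hypothetical, and the costs in main are made-up illustration values. Similarity-based selection is not shown because this page does not define how similarity is measured.

```go
package main

import "fmt"

// Candidate is a hypothetical record of one speculative model result.
type Candidate struct {
	Model  string
	Cost   float64
	Output string
}

// selectLowestCost returns the cheapest candidate. Ties keep the earliest
// entry. The boolean is false when the slice is empty.
func selectLowestCost(cs []Candidate) (Candidate, bool) {
	if len(cs) == 0 {
		return Candidate{}, false
	}
	best := cs[0]
	for _, c := range cs[1:] {
		if c.Cost < best.Cost {
			best = c
		}
	}
	return best, true
}

func main() {
	// Illustration values only, not measured costs.
	results := []Candidate{
		{Model: "azure/deploy-gpt4o-prod", Cost: 0.0125, Output: "answer A"},
		{Model: "azure/deploy-gpt4o-mini", Cost: 0.0005, Output: "answer B"},
	}
	if best, ok := selectLowestCost(results); ok {
		fmt.Println("selected:", best.Model)
	}
}
```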