Source: docs/manual/models-cost.md

This page is generated by site/scripts/sync-manual-docs.mjs.

Models, Pricing, and Cost

Cruvero tracks LLM usage and cost at the per-step and per-run level. It supports multiple LLM providers with automatic failover, model preference controls per namespace, and cost-aware execution that can downgrade models when approaching quota limits.

Source: internal/llm/*, internal/agent/cost.go

Provider

Providers: openrouter (default), azure, openai, and google.
Model IDs are provider-scoped:
- OpenRouter: x-ai/grok-4.1-fast
- Azure: azure/<deployment>
- OpenAI: openai/<model>
- Google Gemini: google/<model>

Model Catalog

Stored in model_catalog (Postgres).
Ingested via cmd/models-refresh.
Queried via cmd/models-list. For Azure, cmd/models-refresh --source azure uses:
CRUVERO_AZURE_PRICING_JSON (optional, per deployment)
CRUVERO_AZURE_CONTEXT_JSON (optional, per deployment)

Azure Override Format

Azure deployment listing does not include price/context metadata, so provide overrides with JSON maps.

CRUVERO_AZURE_PRICING_JSON accepts deployment keys (with or without azure/ prefix):

{
  "deploy-gpt4o-prod": { "prompt": "0.000005", "completion": "0.000015", "request": "0" },
  "azure/deploy-gpt4o-mini": { "prompt": "0.00000015", "completion": "0.00000060", "request": "0" },
  "gpt-4o": { "prompt": "0.000005", "completion": "0.000015", "request": "0" }
}

prompt and completion are per-token prices (same format used across model catalog pricing fields).
request is an optional fixed per-request surcharge.
Deployment-specific keys win over model-name fallback keys when both are present.

CRUVERO_AZURE_CONTEXT_JSON maps deployment/model keys to context length:

{
  "deploy-gpt4o-prod": 128000,
  "azure/deploy-gpt4o-mini": 128000,
  "gpt-4o": 128000
}

Cost Accounting

Each decision stores estimated and actual cost.
If usage cost is returned by provider, it supersedes estimates.
Per-tool cost hints can be set in tool registry definitions.
For Azure, when provider usage cost is 0, Cruvero computes cost as:
- request + (prompt_tokens * prompt_price) + (completion_tokens * completion_price)
- using CRUVERO_AZURE_PRICING_JSON deployment overrides.

Cost Policy

MaxCostPerRun, MaxCostPerStep
PreferCheaper with CheapModel
ToolCostDefault

Preferred Models

Stored in kv_store under model_prefs:<namespace>.
Managed via cmd/model-prefs or tool model_prefs.

Speculative Execution

Run multiple models and select by lowest cost or highest similarity.
cmd/run --spec-models ... --spec-select ...

Provider​

Model Catalog​

Azure Override Format​

Cost Accounting​

Cost Policy​

Preferred Models​

Speculative Execution​

Related Docs​