Tool Registry Restructure
The tool registry restructure (Phase 19) extends the tool registry with usage metrics, quality tracking, semantic search, and automated quarantine. It adds a metrics store, weighted scoring, and a quality tracker that can disable tools exhibiting degraded behavior.
Source: internal/registry/* (extended)
Overview
The tool registry restructure upgrades tool discovery from keyword-based matching to semantic vector search with quality tracking. It introduces a 3-stage search pipeline (vector retrieval, quality re-ranking, result assembly), per-tool quality metrics with LLM auto-rating, degradation detection with quarantine escalation, and user-submitted feedback. All functionality extends the existing internal/registry/ package.
When enabled (CRUVERO_TOOL_SEARCH_SEMANTIC=true), the agent's filterRegistryForPrompt uses semantic vector search instead of keyword scoring. When the flag is disabled or the vector store is unavailable, it falls back transparently to the existing keyword scoring.
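The fallback behavior can be sketched roughly as follows. This is an illustration only: `searchFunc`, `selectTools`, and `errStoreUnavailable` are hypothetical names, and comparing the variable to the literal string "true" is an assumption about how the flag is parsed. Only the environment variable name comes from this document.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// searchFunc is a hypothetical signature for a tool search backend.
type searchFunc func(query string) ([]string, error)

// errStoreUnavailable is an illustrative sentinel for an unreachable vector store.
var errStoreUnavailable = errors.New("vector store unavailable")

// selectTools tries semantic search when enabled and falls back to keyword
// scoring when the flag is off or the semantic backend returns an error.
func selectTools(query string, semanticEnabled bool, semantic, keyword searchFunc) ([]string, string) {
	if semanticEnabled && semantic != nil {
		if tools, err := semantic(query); err == nil {
			return tools, "semantic"
		}
	}
	tools, _ := keyword(query)
	return tools, "keyword"
}

func main() {
	// Assumption: the flag is considered enabled only for the exact value "true".
	enabled := os.Getenv("CRUVERO_TOOL_SEARCH_SEMANTIC") == "true"

	// Stub backends: the semantic one simulates an unavailable vector store.
	semantic := func(string) ([]string, error) { return nil, errStoreUnavailable }
	keyword := func(string) ([]string, error) { return []string{"email_dispatch"}, nil }

	tools, mode := selectTools("send an email", enabled, semantic, keyword)
	fmt.Println(mode, tools)
}
```

With the stub backends above, the keyword path is taken whether or not the flag is set, because the simulated vector store always fails.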
Semantic Search
Three-stage pipeline for tool discovery by semantic similarity:
Stage 1: Vector Retrieval
- Embed the query text using the configured embedding provider
- Search the tool_registry vector store collection for top-K candidates (default K=30)
- Apply tenant isolation filter
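To make Stage 1 concrete, here is a small in-memory stand-in for what a vector store does: filter by tenant, rank by cosine similarity, keep the top K. It is not the actual store integration. The `Doc`, `Hit`, and `TopK` names are hypothetical, and the query embedding is assumed to have been computed already by the embedding provider.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc is a hypothetical indexed tool description with its embedding.
type Doc struct {
	Tool   string
	Tenant string
	Vec    []float64
}

// Hit is one retrieval candidate.
type Hit struct {
	Tool       string
	Similarity float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero norm.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK keeps only the given tenant's documents and returns the k most similar.
func TopK(query []float64, docs []Doc, tenant string, k int) []Hit {
	hits := make([]Hit, 0, len(docs))
	for _, d := range docs {
		// Tenant isolation: skip other tenants (and mismatched dimensions).
		if d.Tenant != tenant || len(d.Vec) != len(query) {
			continue
		}
		hits = append(hits, Hit{Tool: d.Tool, Similarity: cosine(query, d.Vec)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Similarity > hits[j].Similarity })
	if k >= 0 && len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	docs := []Doc{
		{Tool: "email_dispatch", Tenant: "my_tenant", Vec: []float64{1, 0}},
		{Tool: "calendar_lookup", Tenant: "my_tenant", Vec: []float64{0, 1}},
		{Tool: "email_dispatch", Tenant: "other_tenant", Vec: []float64{1, 0}},
	}
	for _, h := range TopK([]float64{1, 0}, docs, "my_tenant", 30) {
		fmt.Println(h.Tool, h.Similarity)
	}
}
```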
Stage 2: Quality Re-Ranking
Score each candidate using a weighted formula:
score = W_sim * similarity + W_qual * quality + W_rec * recency
| Weight | Default | Source |
|---|---|---|
| Similarity | 0.5 | Vector cosine similarity from Stage 1 |
| Quality | 0.35 | success_rate * avg_llm_rating from tool metrics |
| Recency | 0.15 | Recency decay from last successful call |
Tools with active quarantine entries are excluded from results.
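A minimal sketch of the weighted formula, using the default weights from the table. The `Weights` and `Candidate` types are hypothetical, and the recency value is assumed to be already normalized to 0.0-1.0, since the decay function itself is not specified here.

```go
package main

import "fmt"

// Weights holds the three ranking weights; the defaults are 0.5, 0.35, 0.15.
type Weights struct {
	Similarity, Quality, Recency float64
}

// Candidate is a hypothetical Stage 1 result joined with its tool metrics.
type Candidate struct {
	Name         string
	Similarity   float64 // cosine similarity from vector retrieval
	SuccessRate  float64 // fraction of successful calls
	AvgLLMRating float64 // mean LLM auto-rating, 0.0-1.0
	Recency      float64 // recency decay value, assumed to be in 0.0-1.0
}

// Score applies score = W_sim*similarity + W_qual*quality + W_rec*recency,
// where quality = success_rate * avg_llm_rating.
func Score(c Candidate, w Weights) float64 {
	quality := c.SuccessRate * c.AvgLLMRating
	return w.Similarity*c.Similarity + w.Quality*quality + w.Recency*c.Recency
}

func main() {
	w := Weights{Similarity: 0.5, Quality: 0.35, Recency: 0.15}
	c := Candidate{
		Name:         "email_dispatch",
		Similarity:   0.8,
		SuccessRate:  0.9,
		AvgLLMRating: 0.8,
		Recency:      1.0,
	}
	fmt.Printf("%s %.3f\n", c.Name, Score(c, w))
}
```

For the sample candidate, quality is 0.9 * 0.8 = 0.72, so the composite score is 0.5 * 0.8 + 0.35 * 0.72 + 0.15 * 1.0 = 0.802.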
Stage 3: Result Assembly
- Sort by composite score, truncate to requested limit (default 20)
- Return scored results with component breakdowns for transparency
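Result assembly might look something like the sketch below. The `Scored` type and `Assemble` function are hypothetical, and the quarantine exclusion from Stage 2 is folded in here purely to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored is a hypothetical result row carrying the component breakdown.
type Scored struct {
	Name                         string
	Similarity, Quality, Recency float64
	Score                        float64
	Quarantined                  bool
}

// Assemble drops quarantined tools, sorts by composite score (highest first),
// and truncates the list to limit.
func Assemble(in []Scored, limit int) []Scored {
	out := make([]Scored, 0, len(in))
	for _, s := range in {
		if !s.Quarantined {
			out = append(out, s)
		}
	}
	// Stable sort keeps the retrieval order for candidates with equal scores.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	results := Assemble([]Scored{
		{Name: "a", Score: 0.4},
		{Name: "b", Score: 0.9, Quarantined: true},
		{Name: "c", Score: 0.7},
	}, 20)
	for _, r := range results {
		fmt.Println(r.Name, r.Score)
	}
}
```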
Quality Tracking
LLM Auto-Rating
After each tool execution in ToolExecuteActivity, a non-blocking Temporal activity records an ExecutionOutcome including:
- Binary success/failure (existing)
- Execution latency
- LLM quality rating (0.0-1.0) from a post-execution assessment prompt
The LLM rating uses a lightweight prompt asking the model to rate tool output relevance and correctness. This runs as a child activity with a short timeout (CRUVERO_TOOL_QUALITY_RATING_TIMEOUT, default 5s) and fire-and-forget semantics, so tool execution is never blocked.
Composite Quality Score
Each tool maintains a running quality score computed as success_rate * avg_llm_rating. This score is stored in the extended tool_retry_stats table and used during search re-ranking.
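One way to maintain such a running score is with simple counters, sketched below. The `ToolMetrics` type is hypothetical and does not reflect the actual columns of tool_retry_stats. Treating a negative rating as "rating skipped" and returning 0 when no data exists are both assumptions made for the example.

```go
package main

import "fmt"

// ToolMetrics is a hypothetical in-memory view of the per-tool counters.
type ToolMetrics struct {
	Calls      int
	Successes  int
	RatingSum  float64 // sum of LLM ratings received so far
	RatedCalls int     // calls that actually received a rating
}

// Record folds one execution outcome into the counters. A negative rating
// means the auto-rating was skipped (for example, it timed out).
func (m *ToolMetrics) Record(success bool, rating float64) {
	m.Calls++
	if success {
		m.Successes++
	}
	if rating >= 0 {
		m.RatingSum += rating
		m.RatedCalls++
	}
}

// Quality returns success_rate * avg_llm_rating, or 0 when there is no data.
func (m *ToolMetrics) Quality() float64 {
	if m.Calls == 0 || m.RatedCalls == 0 {
		return 0
	}
	successRate := float64(m.Successes) / float64(m.Calls)
	avgRating := m.RatingSum / float64(m.RatedCalls)
	return successRate * avgRating
}

func main() {
	var m ToolMetrics
	m.Record(true, 0.8)
	m.Record(true, 0.8)
	m.Record(true, 0.8)
	m.Record(false, 0.4)
	// success_rate = 3/4, avg_llm_rating = 2.8/4
	fmt.Printf("quality=%.3f\n", m.Quality())
}
```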
Degradation Detection
A rolling quality score is computed per tool. When the score drops below a configurable threshold (CRUVERO_TOOL_QUALITY_DEGRADE_THRESHOLD, default 0.3), the tracker escalates in three steps:
- Warning: Structured log warning + NATS event (if Phase 12 active) or memory episode fallback
- Alert: Set degraded_since timestamp in tool metrics
- Quarantine escalation: If quality stays below threshold for N consecutive calls (CRUVERO_TOOL_QUALITY_QUARANTINE_AFTER, default 5), insert into existing tool_quarantine table with reason referencing quality degradation
Tool Feedback
Users can submit quality ratings through the tool-feedback CLI or the API. Each submission records a rating (0.0-1.0) and an optional comment in the tool_feedback table. Feedback contributes to the tool's running quality metrics without modifying immutable tool definitions.
Configuration
| Variable | Default | Description |
|---|---|---|
| CRUVERO_TOOL_SEARCH_SEMANTIC | false | Enable semantic vector search for tool discovery |
| CRUVERO_TOOL_SEARCH_COLLECTION | tool_registry | Vector store collection name |
| CRUVERO_TOOL_SEARCH_K | 30 | Vector retrieval candidates (Stage 1) |
| CRUVERO_TOOL_SEARCH_RESULT_LIMIT | 20 | Max tools returned to agent |
| CRUVERO_TOOL_SEARCH_W_SIMILARITY | 0.5 | Ranking weight: vector similarity |
| CRUVERO_TOOL_SEARCH_W_QUALITY | 0.35 | Ranking weight: quality score |
| CRUVERO_TOOL_SEARCH_W_RECENCY | 0.15 | Ranking weight: recency decay |
| CRUVERO_TOOL_QUALITY_ENABLED | true | Enable quality tracking and LLM auto-rating |
| CRUVERO_TOOL_QUALITY_RATING_TIMEOUT | 5s | Timeout for LLM auto-rating activity |
| CRUVERO_TOOL_QUALITY_DEGRADE_THRESHOLD | 0.3 | Quality score below which a tool is considered degraded |
| CRUVERO_TOOL_QUALITY_QUARANTINE_AFTER | 5 | Consecutive degraded calls before quarantine escalation |
CLI Tools
tool-feedback submit
Submit a quality rating for a tool.
tool-feedback submit --tool email_dispatch --rating 0.85 --comment "Fast and reliable"
tool-feedback list
List recent feedback for a tool.
tool-feedback list --tool email_dispatch --limit 20
tool-feedback list --tool email_dispatch --format json
tool-feedback metrics
Display current quality metrics for a tool or list degraded tools.
tool-feedback metrics --tool email_dispatch
tool-feedback metrics --degraded
tool-feedback metrics --tool email_dispatch --format json
seed-registry (indexing)
The existing seed-registry CLI is extended to index tool descriptions into the vector store after seeding.
seed-registry --file tools.json --tenant my_tenant