Tool Registry Restructure

The tool registry restructure (Phase 19) extends the tool registry with usage metrics, quality tracking, semantic search, and automated quarantine. It adds a metrics store, weighted scoring, and a quality tracker that can disable tools exhibiting degraded behavior.

Source: internal/registry/* (extended)

Overview

The tool registry restructure upgrades tool discovery from keyword-based matching to semantic vector search with quality tracking. It introduces a 3-stage search pipeline (vector retrieval, quality re-ranking, result assembly), per-tool quality metrics with LLM auto-rating, degradation detection with quarantine escalation, and user-submitted feedback. All functionality extends the existing internal/registry/ package.

When enabled (CRUVERO_TOOL_SEARCH_SEMANTIC=true), the agent's filterRegistryForPrompt uses semantic vector search instead of keyword scoring. When disabled or when the vector store is unavailable, it falls back to existing keyword scoring transparently.
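
The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual filterRegistryForPrompt code: the Searcher interface, the searchFunc adapter, the selectTools function, and the errVectorStoreUnavailable sentinel are all hypothetical names introduced here, and the sketch assumes that any semantic-search error counts as "unavailable".

```go
package main

import (
	"errors"
	"fmt"
)

// Searcher is a hypothetical interface; the real registry API may differ.
type Searcher interface {
	Search(query string, limit int) ([]string, error)
}

// searchFunc adapts a plain function to Searcher.
type searchFunc func(query string, limit int) ([]string, error)

func (f searchFunc) Search(query string, limit int) ([]string, error) { return f(query, limit) }

// errVectorStoreUnavailable is an assumed sentinel error used only in this sketch.
var errVectorStoreUnavailable = errors.New("vector store unavailable")

// selectTools uses semantic search when it is enabled and succeeds, and
// otherwise falls back to keyword scoring. The returned mode names the path taken.
func selectTools(semanticEnabled bool, semantic, keyword Searcher, query string, limit int) ([]string, string, error) {
	if semanticEnabled && semantic != nil {
		if tools, err := semantic.Search(query, limit); err == nil {
			return tools, "semantic", nil
		}
		// Assumption: any semantic-search error is treated as "unavailable".
	}
	tools, err := keyword.Search(query, limit)
	return tools, "keyword", err
}

func main() {
	keyword := searchFunc(func(string, int) ([]string, error) { return []string{"email_dispatch"}, nil })
	broken := searchFunc(func(string, int) ([]string, error) { return nil, errVectorStoreUnavailable })

	tools, mode, err := selectTools(true, broken, keyword, "send an email", 20)
	fmt.Println(tools, mode, err)
}
```

Returning the mode alongside the results is a design choice of this sketch; it makes the otherwise transparent fallback observable in logs and tests.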

Semantic search runs as a three-stage pipeline that discovers tools by semantic similarity:

Stage 1: Vector Retrieval

  • Embed the query text using the configured embedding provider
  • Search the tool_registry vector store collection for top-K candidates (default K=30)
  • Apply tenant isolation filter
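
To make Stage 1 concrete, here is a small in-memory sketch of top-K retrieval with a tenant filter. It is only a stand-in for the real vector store: the IndexedTool and Candidate types and the retrieveTopK function are hypothetical, and the query is assumed to be already embedded.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// IndexedTool is a hypothetical in-memory stand-in for a vector store record.
type IndexedTool struct {
	Name     string
	TenantID string
	Vector   []float64
}

// Candidate pairs a tool name with its similarity to the query.
type Candidate struct {
	Name       string
	Similarity float64
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// retrieveTopK keeps only the tenant's tools, ranks them by cosine
// similarity to the query vector, and returns at most k candidates.
func retrieveTopK(index []IndexedTool, query []float64, tenantID string, k int) []Candidate {
	var out []Candidate
	for _, t := range index {
		if t.TenantID != tenantID {
			continue // tenant isolation
		}
		out = append(out, Candidate{Name: t.Name, Similarity: cosine(query, t.Vector)})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Similarity > out[j].Similarity })
	if k >= 0 && len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	index := []IndexedTool{
		{Name: "email_dispatch", TenantID: "my_tenant", Vector: []float64{1, 0}},
		{Name: "sms_dispatch", TenantID: "my_tenant", Vector: []float64{0.6, 0.8}},
		{Name: "other_tool", TenantID: "other_tenant", Vector: []float64{1, 0}},
	}
	for _, c := range retrieveTopK(index, []float64{1, 0}, "my_tenant", 30) {
		fmt.Printf("%s %.2f\n", c.Name, c.Similarity)
	}
}
```

A production vector store would apply the tenant filter inside the query rather than after loading records; the loop here only illustrates the ordering of the steps.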

Stage 2: Quality Re-Ranking

Score each candidate using a weighted formula:

score = W_sim * similarity + W_qual * quality + W_rec * recency

| Weight     | Default | Source                                          |
|------------|---------|-------------------------------------------------|
| Similarity | 0.5     | Vector cosine similarity from Stage 1           |
| Quality    | 0.35    | success_rate * avg_llm_rating from tool metrics |
| Recency    | 0.15    | Recency decay from last successful call         |

Tools with active quarantine entries are excluded from results.
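
As a worked illustration of the formula and the quarantine exclusion, the following sketch scores candidates with the default weights. The Weights, Candidate, and Scored types and the rerank function are hypothetical names. The recency component is taken as an input already normalized to [0, 1], because the decay function itself is not specified here.

```go
package main

import "fmt"

// Weights mirrors the three ranking weights; the defaults follow the table above.
type Weights struct{ Similarity, Quality, Recency float64 }

func DefaultWeights() Weights { return Weights{Similarity: 0.5, Quality: 0.35, Recency: 0.15} }

// Candidate is a hypothetical Stage 1 result joined with its tool metrics.
type Candidate struct {
	Name         string
	Similarity   float64 // cosine similarity from Stage 1
	SuccessRate  float64
	AvgLLMRating float64
	Recency      float64 // assumed to be pre-computed in [0, 1]
}

// Scored carries the composite score plus its component breakdown.
type Scored struct {
	Name                                string
	Score, Similarity, Quality, Recency float64
}

// rerank drops quarantined tools and scores the rest with the weighted formula.
func rerank(w Weights, cands []Candidate, quarantined map[string]bool) []Scored {
	out := make([]Scored, 0, len(cands))
	for _, c := range cands {
		if quarantined[c.Name] {
			continue
		}
		quality := c.SuccessRate * c.AvgLLMRating
		out = append(out, Scored{
			Name:       c.Name,
			Score:      w.Similarity*c.Similarity + w.Quality*quality + w.Recency*c.Recency,
			Similarity: c.Similarity,
			Quality:    quality,
			Recency:    c.Recency,
		})
	}
	return out
}

func main() {
	cands := []Candidate{
		{Name: "email_dispatch", Similarity: 0.8, SuccessRate: 0.9, AvgLLMRating: 0.8, Recency: 1.0},
		{Name: "flaky_tool", Similarity: 0.95, SuccessRate: 0.2, AvgLLMRating: 0.5, Recency: 0.4},
	}
	for _, s := range rerank(DefaultWeights(), cands, map[string]bool{"flaky_tool": true}) {
		fmt.Printf("%s %.3f\n", s.Name, s.Score)
	}
}
```

For email_dispatch the arithmetic is 0.5 * 0.8 + 0.35 * (0.9 * 0.8) + 0.15 * 1.0 = 0.4 + 0.252 + 0.15 = 0.802, while flaky_tool is excluded despite its higher similarity.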

Stage 3: Result Assembly

  • Sort by composite score, truncate to requested limit (default 20)
  • Return scored results with component breakdowns for transparency
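
Stage 3 amounts to a sort and a truncation. A minimal sketch follows, assuming a hypothetical ScoredTool type that keeps the component breakdown next to the composite score; the assemble function name is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredTool is a hypothetical result type carrying the composite score
// together with the per-component values that produced it.
type ScoredTool struct {
	Name       string
	Score      float64
	Similarity float64
	Quality    float64
	Recency    float64
}

// assemble returns a copy of results sorted by descending score and
// truncated to limit. A non-positive limit disables truncation.
func assemble(results []ScoredTool, limit int) []ScoredTool {
	out := make([]ScoredTool, len(results))
	copy(out, results)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if limit > 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	results := []ScoredTool{
		{Name: "sms_dispatch", Score: 0.61, Similarity: 0.6, Quality: 0.7, Recency: 0.5},
		{Name: "email_dispatch", Score: 0.80, Similarity: 0.8, Quality: 0.72, Recency: 1.0},
	}
	for _, r := range assemble(results, 20) {
		fmt.Printf("%s score=%.2f sim=%.2f qual=%.2f rec=%.2f\n",
			r.Name, r.Score, r.Similarity, r.Quality, r.Recency)
	}
}
```

Sorting a copy keeps the caller's slice untouched, and the stable sort preserves Stage 1 order between tools with equal scores.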

Quality Tracking

LLM Auto-Rating

After each tool execution in ToolExecuteActivity, a non-blocking Temporal activity records an ExecutionOutcome including:

  • Binary success/failure (existing)
  • Execution latency
  • LLM quality rating (0.0-1.0) from a post-execution assessment prompt

The LLM rating uses a lightweight prompt that asks the model to rate the relevance and correctness of the tool output. The rating runs as a separate activity with a short timeout (5s by default, set by CRUVERO_TOOL_QUALITY_RATING_TIMEOUT) and fire-and-forget semantics, so tool execution is never blocked.

Composite Quality Score

Each tool maintains a running quality score computed as success_rate * avg_llm_rating. This score is stored in the extended tool_retry_stats table and used during search re-ranking.
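
A running score of this form can be kept with a few counters. The sketch below assumes a hypothetical ToolMetrics struct; it is not the tool_retry_stats schema. Two behaviors are assumptions of the sketch: calls without an LLM rating are left out of the average, and a tool with no calls or no ratings scores 0.

```go
package main

import "fmt"

// ToolMetrics is a hypothetical in-memory counterpart of per-tool stats.
type ToolMetrics struct {
	Calls      int
	Successes  int
	RatedCalls int
	RatingSum  float64
}

// Record adds one execution outcome. Pass rated=false when the LLM
// rating timed out or was skipped.
func (m *ToolMetrics) Record(success bool, rating float64, rated bool) {
	m.Calls++
	if success {
		m.Successes++
	}
	if rated {
		m.RatedCalls++
		m.RatingSum += rating
	}
}

// SuccessRate is successes divided by calls, or 0 with no calls.
func (m ToolMetrics) SuccessRate() float64 {
	if m.Calls == 0 {
		return 0
	}
	return float64(m.Successes) / float64(m.Calls)
}

// AvgLLMRating is the mean of recorded ratings, or 0 with no ratings.
func (m ToolMetrics) AvgLLMRating() float64 {
	if m.RatedCalls == 0 {
		return 0
	}
	return m.RatingSum / float64(m.RatedCalls)
}

// Quality is the composite score success_rate * avg_llm_rating.
func (m ToolMetrics) Quality() float64 { return m.SuccessRate() * m.AvgLLMRating() }

func main() {
	var m ToolMetrics
	m.Record(true, 0.8, true)
	m.Record(true, 0.6, true)
	m.Record(true, 0, false) // rating unavailable
	m.Record(false, 0, false)
	fmt.Printf("success=%.2f rating=%.2f quality=%.3f\n", m.SuccessRate(), m.AvgLLMRating(), m.Quality())
}
```

With three successes out of four calls and ratings of 0.8 and 0.6, the quality is 0.75 * 0.7 = 0.525.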

Degradation Detection

A rolling quality score is computed per tool. When the score drops below a configurable threshold (CRUVERO_TOOL_QUALITY_DEGRADE_THRESHOLD, default 0.3), the response escalates in three steps:

  1. Warning: emit a structured log warning and a NATS event (if Phase 12 is active), or fall back to a memory episode
  2. Alert: set the degraded_since timestamp in the tool metrics
  3. Quarantine escalation: if quality stays below the threshold for N consecutive calls (CRUVERO_TOOL_QUALITY_QUARANTINE_AFTER, default 5), insert an entry into the existing tool_quarantine table with a reason referencing quality degradation

Tool Feedback

Users can submit quality ratings through the tool-feedback CLI or API. Each submission records a rating (0.0-1.0) and an optional comment in the tool_feedback table. Feedback contributes to the tool's running quality metrics without modifying immutable tool definitions.
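
A feedback record needs little more than range validation before it is stored. The sketch below assumes a hypothetical Feedback struct and Validate method; the actual tool_feedback columns and the API payload may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Feedback is a hypothetical shape for one user-submitted rating.
type Feedback struct {
	Tool    string
	Rating  float64 // expected in [0.0, 1.0]
	Comment string  // optional
}

// Validate rejects a missing tool name and any rating outside [0.0, 1.0].
// The comparison is written so that NaN is rejected as well.
func (f Feedback) Validate() error {
	if strings.TrimSpace(f.Tool) == "" {
		return errors.New("tool name is required")
	}
	if !(f.Rating >= 0 && f.Rating <= 1) {
		return fmt.Errorf("rating %v is outside [0.0, 1.0]", f.Rating)
	}
	return nil
}

func main() {
	fb := Feedback{Tool: "email_dispatch", Rating: 0.85, Comment: "Fast and reliable"}
	fmt.Println(fb.Validate())
	fmt.Println(Feedback{Tool: "email_dispatch", Rating: 1.5}.Validate())
}
```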

Configuration

| Variable                               | Default       | Description                                             |
|----------------------------------------|---------------|---------------------------------------------------------|
| CRUVERO_TOOL_SEARCH_SEMANTIC           | false         | Enable semantic vector search for tool discovery        |
| CRUVERO_TOOL_SEARCH_COLLECTION         | tool_registry | Vector store collection name                            |
| CRUVERO_TOOL_SEARCH_K                  | 30            | Vector retrieval candidates (Stage 1)                   |
| CRUVERO_TOOL_SEARCH_RESULT_LIMIT       | 20            | Max tools returned to agent                             |
| CRUVERO_TOOL_SEARCH_W_SIMILARITY       | 0.5           | Ranking weight: vector similarity                       |
| CRUVERO_TOOL_SEARCH_W_QUALITY          | 0.35          | Ranking weight: quality score                           |
| CRUVERO_TOOL_SEARCH_W_RECENCY          | 0.15          | Ranking weight: recency decay                           |
| CRUVERO_TOOL_QUALITY_ENABLED           | true          | Enable quality tracking and LLM auto-rating             |
| CRUVERO_TOOL_QUALITY_RATING_TIMEOUT    | 5s            | Timeout for LLM auto-rating activity                    |
| CRUVERO_TOOL_QUALITY_DEGRADE_THRESHOLD | 0.3           | Quality score below which a tool is considered degraded |
| CRUVERO_TOOL_QUALITY_QUARANTINE_AFTER  | 5             | Consecutive degraded calls before quarantine escalation |

CLI Tools

tool-feedback submit

Submit a quality rating for a tool.

tool-feedback submit --tool email_dispatch --rating 0.85 --comment "Fast and reliable"

tool-feedback list

List recent feedback for a tool.

tool-feedback list --tool email_dispatch --limit 20
tool-feedback list --tool email_dispatch --format json

tool-feedback metrics

Display current quality metrics for a tool or list degraded tools.

tool-feedback metrics --tool email_dispatch
tool-feedback metrics --degraded
tool-feedback metrics --tool email_dispatch --format json

seed-registry (indexing)

The existing seed-registry CLI is extended to index tool descriptions into the vector store after seeding.

seed-registry --file tools.json --tenant my_tenant