
Prompt Library v2

Advanced prompt management extensions for deployment environments, composable snippets, A/B experimentation, structured evaluation, version diffing, CI/CD integration, and prompt analytics.

Source: internal/promptlib/*

Overview

Phase 26 extends the Prompt Library with lifecycle management features that bridge the gap between "prompts exist in a catalog" and "prompts are safely managed across their lifecycle in production." All extensions are backward-compatible — when disabled, the system behaves identically to the base Phase 18 prompt library.

Deployment Environments

Prompts are promoted through named environments (default: dev → staging → production). Each promotion is an assignment: the immutable prompt version is linked to the environment, not copied.

prompt created → dev → staging → production
(quality gates enforce thresholds at each transition)
  • EnvironmentStore.Promote upserts the assignment and appends to promotion history
  • EnvironmentStore.GetActive resolves the current prompt version for an environment
  • Searcher filters results by environment when SearchQuery.Environment is set
  • When environments are disabled, all prompts are visible (Phase 18 behavior)
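
A minimal Go sketch of how a caller might drive a promotion. The method names `Promote` and `GetActive` come from the list above, but the `Assignment` shape and the exact signatures are illustrative assumptions, not the library's actual API.

```go
package promptlibdoc

import (
	"context"
	"fmt"
)

// Assignment links an immutable prompt version to an environment; the field
// names here are assumptions for illustration.
type Assignment struct {
	PromptID    string
	Version     int
	Environment string
}

// EnvironmentStore mirrors the operations listed above with hypothetical signatures.
type EnvironmentStore interface {
	// Promote upserts the assignment and appends to promotion history.
	Promote(ctx context.Context, a Assignment) error
	// GetActive resolves the current prompt version for an environment.
	GetActive(ctx context.Context, promptID, env string) (Assignment, error)
}

// promoteToStaging promotes a version and reads back the active assignment.
func promoteToStaging(ctx context.Context, store EnvironmentStore, promptID string, version int) error {
	a := Assignment{PromptID: promptID, Version: version, Environment: "staging"}
	if err := store.Promote(ctx, a); err != nil {
		return fmt.Errorf("promote %s v%d to staging: %w", promptID, version, err)
	}
	active, err := store.GetActive(ctx, promptID, "staging")
	if err != nil {
		return err
	}
	fmt.Printf("staging now serves %s v%d\n", active.PromptID, active.Version)
	return nil
}
```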

Quality Gates

Promotion can be gated on quality thresholds:

| Gate | Description |
| --- | --- |
| MinUsageCount | Minimum number of agent invocations before promotion |
| MinSuccessRate | Minimum success rate (0.0–1.0) from prompt_metrics |
| MinAvgRating | Minimum average LLM rating (0.0–1.0) |
| RequireEvalPass | Require a passing eval run before promotion |

All conditions must pass. Each failure includes a human-readable reason (e.g., "success_rate 0.72 < minimum 0.80").
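
A sketch of what the gate check could look like in Go, assuming a config struct shaped like the table above. The gate and metric field names mirror the table; the evaluation logic and reason strings are illustrative, modeled on the example failure message.

```go
package promptlibdoc

import "fmt"

// QualityGates mirrors the gate names in the table above.
type QualityGates struct {
	MinUsageCount   int
	MinSuccessRate  float64
	MinAvgRating    float64
	RequireEvalPass bool
}

// PromptMetrics is an assumed snapshot of the prompt_metrics data consulted by the gates.
type PromptMetrics struct {
	UsageCount  int
	SuccessRate float64
	AvgRating   float64
	EvalPassed  bool
}

// checkGates returns one human-readable reason per failed condition,
// in the "success_rate 0.72 < minimum 0.80" style described above.
// An empty result means all gates pass and promotion may proceed.
func checkGates(g QualityGates, m PromptMetrics) []string {
	var reasons []string
	if m.UsageCount < g.MinUsageCount {
		reasons = append(reasons, fmt.Sprintf("usage_count %d < minimum %d", m.UsageCount, g.MinUsageCount))
	}
	if m.SuccessRate < g.MinSuccessRate {
		reasons = append(reasons, fmt.Sprintf("success_rate %.2f < minimum %.2f", m.SuccessRate, g.MinSuccessRate))
	}
	if m.AvgRating < g.MinAvgRating {
		reasons = append(reasons, fmt.Sprintf("avg_rating %.2f < minimum %.2f", m.AvgRating, g.MinAvgRating))
	}
	if g.RequireEvalPass && !m.EvalPassed {
		reasons = append(reasons, "no passing eval run found")
	}
	return reasons
}
```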

Composable Snippets

Prompts can reference other prompts as composable fragments using Go template syntax:

{{snippet "safety-guardrails"}}
{{snippet "output-format" "v3"}}
{{snippet "preamble" "production"}}
  • First argument: snippet prompt ID
  • Optional second argument: version number or environment label
  • Resolution: by version → by environment label → latest
  • Cycle detection prevents infinite loops (max depth configurable, default 3)
  • Snippet dependencies tracked in prompt_snippet_refs table
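
Because snippets use Go template syntax, the helper can be pictured as a custom template function. Below is a minimal sketch wiring a `snippet` function into text/template; the resolver is a stub, and the real resolution order (version → environment label → latest), depth limit, and dependency tracking are not shown.

```go
package promptlibdoc

import (
	"strings"
	"text/template"
)

// resolveSnippet is a hypothetical lookup: a snippet prompt ID plus an
// optional version or environment selector, returning the snippet body.
func resolveSnippet(id string, selector ...string) (string, error) {
	// A real resolver would load the snippet by version, then environment
	// label, then latest, and enforce the configured max depth.
	return "(body of " + id + ")", nil
}

// renderPrompt parses and executes a prompt body that may contain
// {{snippet "id"}} or {{snippet "id" "v3"}} references.
func renderPrompt(promptBody string, data any) (string, error) {
	tmpl, err := template.New("prompt").
		Funcs(template.FuncMap{"snippet": resolveSnippet}).
		Parse(promptBody)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}
```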

A/B Experiments

Controlled prompt A/B testing using Temporal SideEffect for replay-safe variant selection:

  • Traffic split by percentage (variants must sum to 100%)
  • Variant selection is deterministic on replay (seeded from RunID + PromptID)
  • Outcomes recorded fire-and-forget (non-blocking)
  • Auto-completion when sample size threshold reached (promotes winner)
  • Only one active experiment per prompt at a time
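
A sketch of the replay-safe selection step using the Temporal Go SDK. Only the use of workflow.SideEffect and the RunID + PromptID seeding come from the description above; the `Variant` shape and the hashing details are assumptions.

```go
package promptlibdoc

import (
	"hash/fnv"

	"go.temporal.io/sdk/workflow"
)

// Variant is a hypothetical shape: a prompt version plus its traffic share.
type Variant struct {
	PromptHash string
	Percent    int // all variants in an experiment must sum to 100
}

// pickVariant makes the choice inside workflow.SideEffect, so the selected
// index is recorded in workflow history and is identical on replay.
func pickVariant(ctx workflow.Context, promptID string, variants []Variant) (Variant, error) {
	var idx int
	err := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
		// Seed from RunID + PromptID so the split is stable for a given run.
		h := fnv.New32a()
		h.Write([]byte(workflow.GetInfo(ctx).WorkflowExecution.RunID + promptID))
		bucket := int(h.Sum32() % 100)
		cum := 0
		for i, v := range variants {
			cum += v.Percent
			if bucket < cum {
				return i
			}
		}
		return len(variants) - 1
	}).Get(&idx)
	if err != nil {
		return Variant{}, err
	}
	return variants[idx], nil
}
```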

Evaluation Framework

Structured evaluation with datasets, scorers, and a Temporal workflow orchestrator:

Datasets

Versioned collections of input/expected-output pairs. Created from JSON, YAML, CSV, or production logs. Copy-on-write versioning ensures immutable dataset snapshots.

Built-in Scorers

| Scorer | Description |
| --- | --- |
| exact_match | Binary 1.0/0.0 on exact string match |
| contains | 1.0 if all required substrings found |
| regex | 1.0 if output matches pattern |
| cosine_similarity | Embedding-based similarity (0.0–1.0) |
| llm_judge | LLM-as-a-judge quality rating (0.0–1.0) |
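
A sketch of the scorer plug-in point, with two of the built-ins re-expressed for illustration. The `Scorer` interface name and signature are assumptions; only the scorer names and their 0.0–1.0 semantics come from the table above.

```go
package promptlibdoc

import (
	"context"
	"regexp"
	"strings"
)

// Scorer is an assumed interface: every scorer returns a value in [0.0, 1.0].
type Scorer interface {
	Name() string
	Score(ctx context.Context, output, expected string) (float64, error)
}

// containsScorer mirrors the built-in "contains" scorer: 1.0 only if every
// required substring is present in the output.
type containsScorer struct{ required []string }

func (containsScorer) Name() string { return "contains" }

func (s containsScorer) Score(_ context.Context, output, _ string) (float64, error) {
	for _, sub := range s.required {
		if !strings.Contains(output, sub) {
			return 0.0, nil
		}
	}
	return 1.0, nil
}

// regexScorer mirrors the built-in "regex" scorer: 1.0 if the output matches the pattern.
type regexScorer struct{ pattern *regexp.Regexp }

func (regexScorer) Name() string { return "regex" }

func (s regexScorer) Score(_ context.Context, output, _ string) (float64, error) {
	if s.pattern.MatchString(output) {
		return 1.0, nil
	}
	return 0.0, nil
}
```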

EvalRunWorkflow

Temporal workflow that processes dataset entries with configurable concurrency:

  1. Load prompt and dataset
  2. For each entry: render template → call LLM → run scorers → store result
  3. Aggregate into EvalSummary (per-scorer averages, pass rate, latency, cost)
  4. Individual entry failures do not abort the run
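
The workflow body might look roughly like the sketch below. The activity names ("LoadDataset", "ScoreEntry"), the result types, and the folding of render/LLM-call/scoring into a single activity are simplifications; per-entry concurrency (CRUVERO_PROMPTLIB_EVAL_MAX_CONCURRENT) is omitted for brevity.

```go
package promptlibdoc

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// EvalEntry, EvalResult, and EvalSummary are simplified stand-ins for the real types.
type EvalEntry struct{ Input, Expected string }

type EvalResult struct {
	Scores map[string]float64
	Err    string
}

type EvalSummary struct {
	Results  []EvalResult
	PassRate float64
}

// EvalRunWorkflow processes dataset entries and aggregates a summary.
func EvalRunWorkflow(ctx workflow.Context, promptHash, datasetID string) (EvalSummary, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Minute, // per-entry timeout (see CRUVERO_PROMPTLIB_EVAL_TIMEOUT)
	})

	// Step 1: load the dataset (the prompt load is folded into the activities here).
	var entries []EvalEntry
	if err := workflow.ExecuteActivity(ctx, "LoadDataset", datasetID).Get(ctx, &entries); err != nil {
		return EvalSummary{}, err
	}

	// Step 2: score each entry (render → LLM → scorers → store).
	// Individual entry failures are recorded, not fatal to the run.
	summary := EvalSummary{}
	passed := 0
	for _, e := range entries {
		var res EvalResult
		if err := workflow.ExecuteActivity(ctx, "ScoreEntry", promptHash, e).Get(ctx, &res); err != nil {
			res = EvalResult{Err: err.Error()}
		}
		if res.Err == "" {
			passed++
		}
		summary.Results = append(summary.Results, res)
	}

	// Step 3: aggregate into the summary.
	if len(entries) > 0 {
		summary.PassRate = float64(passed) / float64(len(entries))
	}
	return summary, nil
}
```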

Version Diff

Line-level diff between prompt versions with metadata comparison:

  • Myers diff algorithm produces add/delete/modify/equal hunks
  • Context lines configurable (default 3)
  • Summary-only mode for prompts > 10KB
  • Metadata diff detects parameter, tag, and type changes
  • Available via API (/api/prompts/{id}/diff) and CLI (prompt-diff)
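
The hunk kinds above map onto a result structure along these lines; the type and field names are assumptions about the shape returned by the API and CLI, not the package's actual types.

```go
package promptlibdoc

// HunkOp enumerates the hunk kinds produced by the Myers diff.
type HunkOp string

const (
	HunkAdd    HunkOp = "add"
	HunkDelete HunkOp = "delete"
	HunkModify HunkOp = "modify"
	HunkEqual  HunkOp = "equal"
)

// Hunk is one contiguous change, carrying the configured context lines.
type Hunk struct {
	Op       HunkOp
	OldStart int      // 1-based line number in the older version
	NewStart int      // 1-based line number in the newer version
	Lines    []string // changed lines plus surrounding context
}

// VersionDiff is the full comparison between two prompt versions.
type VersionDiff struct {
	Hunks        []Hunk
	SummaryOnly  bool     // set when the prompt exceeds 10KB
	MetadataDiff []string // parameter, tag, and type changes
}
```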

CI/CD Integration

The prompt-eval CLI is designed for CI/CD pipelines:

  • --ci flag for machine-readable JSON output
  • --github-summary for GitHub Actions job summary markdown
  • --regression-baseline auto to find the most recent passing baseline
  • --format markdown for PR comment comparison tables
  • Exit code 0 on pass, 1 on fail (CI/CD compatible)

Example GitHub Actions usage:

prompt-eval \
--prompt-hash "$HASH" \
--dataset "$DATASET_ID" \
--scorers "exact_match,llm_judge" \
--threshold 0.80 \
--fail-on-regression \
--ci --github-summary

Production Log → Dataset Pipeline

Converts production agent runs into eval datasets via the audit log:

  • Successful runs become entries with expected_output = actual output
  • Failed runs become entries flagged for human review
  • Configurable time range, entry limits, and failure-only filtering
  • Creates regression test suites from real production data
prompt-dataset --from-logs \
--prompt-hash abc123 \
--since 168h \
--max-entries 500 \
--failures-only

NATS Cache Invalidation

When a prompt is promoted, a NATS event (prompt.promoted) triggers immediate cache invalidation across all agents. Without NATS, agents fall back to TTL-based expiry (default 5 minutes).

  • Publisher: best-effort (failure does not roll back promotion)
  • Subscriber: read-through cache with TTL fallback
  • Subject: configurable (default cruvero.prompts.events)
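
A sketch of the subscriber side using nats.go: invalidate the local prompt cache when a prompt.promoted event arrives, and keep serving from the TTL cache if the payload is unreadable. The subject is the default from above; the event payload shape and cache interface are assumptions.

```go
package promptlibdoc

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

// promotedEvent is an assumed payload for prompt.promoted events.
type promotedEvent struct {
	PromptID    string `json:"prompt_id"`
	Environment string `json:"environment"`
}

// promptCache is an assumed local read-through cache with TTL fallback.
type promptCache interface {
	Invalidate(promptID string)
}

// subscribePromotions invalidates cached prompts as promotion events arrive.
func subscribePromotions(nc *nats.Conn, cache promptCache) (*nats.Subscription, error) {
	return nc.Subscribe("cruvero.prompts.events", func(msg *nats.Msg) {
		var ev promotedEvent
		if err := json.Unmarshal(msg.Data, &ev); err != nil {
			log.Printf("ignoring malformed prompt event: %v", err) // TTL expiry still applies
			return
		}
		cache.Invalidate(ev.PromptID)
	})
}
```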

Provider Blueprints

Provider-agnostic intermediate representation between prompt content and LLM API calls:

| Adapter | Format |
| --- | --- |
| OpenAIAdapter | OpenAI chat completion format |
| AnthropicAdapter | Anthropic messages format (system extracted) |
| AzureAdapter | Azure OpenAI with deployment name mapping |

RenderToBlueprint maps PromptType to message roles (system → system message, user → user message, task → system + user pair, chain_of_thought → system with CoT + user).
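
The mapping can be pictured as below. The `Blueprint` and `Message` shapes and the function signature are assumptions, and the chain-of-thought instruction text is illustrative only; the role mapping itself follows the description above.

```go
package promptlibdoc

// Message and Blueprint are assumed shapes for the provider-agnostic
// intermediate representation.
type Message struct {
	Role    string // "system" or "user"
	Content string
}

type Blueprint struct {
	Messages []Message
}

// renderToBlueprint sketches the PromptType → message role mapping.
func renderToBlueprint(promptType, rendered, userInput string) Blueprint {
	switch promptType {
	case "system":
		return Blueprint{Messages: []Message{{Role: "system", Content: rendered}}}
	case "user":
		return Blueprint{Messages: []Message{{Role: "user", Content: rendered}}}
	case "task":
		return Blueprint{Messages: []Message{
			{Role: "system", Content: rendered},
			{Role: "user", Content: userInput},
		}}
	case "chain_of_thought":
		// The appended CoT instruction is illustrative, not the library's wording.
		return Blueprint{Messages: []Message{
			{Role: "system", Content: rendered + "\n\nReason step by step before giving the final answer."},
			{Role: "user", Content: userInput},
		}}
	default:
		return Blueprint{Messages: []Message{{Role: "user", Content: rendered}}}
	}
}
```

Adapters then translate the Blueprint into each provider's wire format; per the table above, AnthropicAdapter extracts the system message into Anthropic's dedicated system field.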

Prompt Analytics

Time-series queries over prompt usage and quality metrics:

  • GetTimeSeries: usage_count, success_rate, avg_rating, failure_count bucketed by hour/day/week
  • GetTopPrompts: prompts ranked by metric over time range
  • GetPromptComparison: side-by-side metrics for multiple prompts
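
A sketch of the analytics query surface; the method signatures, interval labels, and the `TimeSeriesPoint` shape are assumptions drawn from the bullets above.

```go
package promptlibdoc

import (
	"context"
	"time"
)

// TimeSeriesPoint is one bucketed value of a metric.
type TimeSeriesPoint struct {
	Bucket time.Time
	Value  float64
}

// Analytics is an assumed query interface over prompt usage and quality metrics.
type Analytics interface {
	// GetTimeSeries buckets a metric ("usage_count", "success_rate",
	// "avg_rating", "failure_count") by "hour", "day", or "week".
	GetTimeSeries(ctx context.Context, promptHash, metric, interval string, from, to time.Time) ([]TimeSeriesPoint, error)
	// GetTopPrompts ranks prompt hashes by a metric over the given range.
	GetTopPrompts(ctx context.Context, metric string, from, to time.Time, limit int) ([]string, error)
}
```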

API endpoints:

  • GET /api/prompts/{hash}/analytics?metric=usage_count&interval=day
  • GET /api/prompts/rankings?metric=success_rate&limit=10
  • GET /api/prompts/compare?hashes=abc,def,ghi

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| CRUVERO_PROMPTLIB_ENVS_ENABLED | true | Enable deployment environments |
| CRUVERO_PROMPTLIB_DEFAULT_ENVS | dev,staging,production | Environment names created per tenant |
| CRUVERO_PROMPTLIB_SNIPPETS_ENABLED | true | Enable snippet composition |
| CRUVERO_PROMPTLIB_SNIPPET_MAX_DEPTH | 3 | Max nested snippet depth |
| CRUVERO_PROMPTLIB_EXPERIMENTS_ENABLED | false | Enable A/B experimentation |
| CRUVERO_PROMPTLIB_EXPERIMENT_MAX_VARIANTS | 4 | Max variants per experiment |
| CRUVERO_PROMPTLIB_EVAL_ENABLED | true | Enable evaluation framework |
| CRUVERO_PROMPTLIB_EVAL_TIMEOUT | 300s | Per-entry eval timeout |
| CRUVERO_PROMPTLIB_EVAL_MAX_CONCURRENT | 10 | Max concurrent eval entries |
| CRUVERO_PROMPTLIB_DIFF_CONTEXT_LINES | 3 | Lines of context in diff output |
| CRUVERO_PROMPTLIB_NATS_CACHE_ENABLED | false | Enable NATS cache invalidation |
| CRUVERO_PROMPTLIB_NATS_SUBJECT | cruvero.prompts.events | NATS subject for prompt events |
| CRUVERO_PROMPTLIB_BLUEPRINT_ENABLED | false | Enable provider-agnostic blueprints |
| CRUVERO_PROMPTLIB_ANALYTICS_RETENTION | 90d | Analytics data retention period |

CLI Tools

| CLI | Description |
| --- | --- |
| prompt-eval | Run eval against dataset, exit 0/1 for CI/CD |
| prompt-dataset | Create/manage eval datasets (JSON, YAML, CSV, logs) |
| prompt-experiment | Create/list/complete A/B experiments |
| prompt-diff | Diff prompt versions with colored output |