Testing and Fixtures
Cruvero uses a layered testing approach built around deterministic replay. Unit tests use the agenttest harness to run agent workflows with mocked LLM decisions and tool responses against Temporal's test environment. Integration tests validate database stores, API routes, and cross-package interactions. Golden log comparison catches regressions by diffing recorded decision logs against replayed runs.
This document is for developers adding features, debugging test failures, or extending test coverage.
Prerequisites: Go 1.25+, running Postgres (for integration tests), Temporal dev server (optional, for end-to-end tests).
Unit Testing
The internal/agenttest package provides a Temporal test harness that registers all agent activities and lets you control LLM decisions and tool results.
Suite Setup
Create a Suite with a testing.T and configure the decider and tool mocks:
func TestMyFeature(t *testing.T) {
    s := agenttest.Suite{T: t}
    s.MockLLM(agenttest.StepDecider(map[int]agent.Decision{
        0: {Action: "tool", ToolName: "http_get", ToolArgs: json.RawMessage(`{"url":"https://example.com"}`)},
        1: {Action: "halt", HaltReason: "done"},
    }))
    s.MockTool("http_get", func(args json.RawMessage) (json.RawMessage, error) {
        return json.RawMessage(`{"status":200}`), nil
    })
    result := s.Run(agent.RunConfig{Prompt: "Fetch example.com"})
    // assert on result
}
DeterministicJSON
DeterministicJSON creates an LLM decider from a fixture map keyed by input hash. The hash is computed from the prompt, step index, and agent state, so the same inputs always produce the same decision:
fixtures, _ := agenttest.LoadFixtures("fixtures/decision_logs/run.json")
s.MockLLM(agenttest.DeterministicJSON(fixtures))
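The exact key derivation lives inside agenttest; purely as an illustration of the idea, the sketch below assumes the fixture key is a hex-encoded SHA-256 digest over a canonical JSON encoding of the three inputs. Inspect the package source for the real key format before hand-writing fixtures.
import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// inputHash is illustrative only, not the agenttest implementation: it hashes
// a canonical JSON encoding of (prompt, step index, agent state).
func inputHash(prompt string, step int, state json.RawMessage) string {
    payload, _ := json.Marshal(struct {
        Prompt string          `json:"prompt"`
        Step   int             `json:"step"`
        State  json.RawMessage `json:"state"`
    }{prompt, step, state})
    sum := sha256.Sum256(payload)
    return hex.EncodeToString(sum[:])
}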
MockTool
MockTool registers a function that intercepts tool calls by name. Mocks take priority over the real tool executor:
s.MockTool("bash_exec", func(args json.RawMessage) (json.RawMessage, error) {
return json.RawMessage(`{"stdout":"ok","exit_code":0}`), nil
})
ScenarioRunner
ScenarioRunner provides step-by-step tool expectations for ordered test scenarios. Each step declares the expected tool name, optional expected arguments, and the mock result:
runner := agenttest.NewScenarioRunner([]agenttest.ScenarioStep{
    {ExpectedTool: "http_get", MockResult: json.RawMessage(`{"status":200}`)},
    {ExpectedTool: "bash_exec", MockResult: json.RawMessage(`{"stdout":"done"}`)},
})
s.UseScenario(runner)
After the run, call runner.Verify(t) to assert all steps were consumed.
ChaosExecutor
ChaosExecutor wraps a tool executor with fault injection for resilience testing. Configure failure rates, timeout injection, and contradiction rates:
s.UseChaos(agenttest.ChaosConfig{
    FailureRate:       0.2,
    TimeoutRate:       0.1,
    ContradictionRate: 0.05,
    MaxFailures:       5,
})
Golden Decision Logs
Golden logs record an agent execution trace as a JSON fixture. Subsequent test runs replay the same inputs and compare the decision log against the golden file.
- Convention: fixtures stored in fixtures/decision_logs/*.json
- Update goldens: set UPDATE_GOLDEN=1 and run the test, or call agenttest.UpdateGoldenIfEnv in your test
- Comparison: agenttest.ValidateFixtureCoverage checks that all fixture keys were exercised (see the sketch after this list)
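Putting the pieces together, a golden-log regression test might look like the sketch below. The exact signatures of UpdateGoldenIfEnv and ValidateFixtureCoverage are assumptions here; check the agenttest package for the real ones.
func TestGoldenReplay(t *testing.T) {
    s := agenttest.Suite{T: t}
    fixtures, err := agenttest.LoadFixtures("fixtures/decision_logs/run.json")
    if err != nil {
        t.Fatal(err)
    }
    s.MockLLM(agenttest.DeterministicJSON(fixtures))
    result := s.Run(agent.RunConfig{Prompt: "Fetch example.com"})
    // Assumed signatures: rewrite the golden when UPDATE_GOLDEN=1 is set,
    // then verify every fixture key was exercised during the run.
    agenttest.UpdateGoldenIfEnv(t, "fixtures/decision_logs/run.json", result)
    agenttest.ValidateFixtureCoverage(t, fixtures)
}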
Fixture Recording
Record a live run's decision log as a fixture file:
go run ./cmd/record-fixtures --workflow-id <id> --out fixtures/decisions/run.json
Output is keyed by input_hash, so each unique LLM input maps to its recorded decision.
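The file shape is roughly the following (hashes truncated, field names illustrative guesses mirroring agent.Decision; inspect a recorded file for the authoritative schema):
{
  "3f9a...e1": { "action": "tool", "tool_name": "http_get", "tool_args": { "url": "https://example.com" } },
  "7bc2...90": { "action": "halt", "halt_reason": "done" }
}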
Replay Comparison
Compare an original decision log with a replayed run to detect behavioral drift:
go run ./cmd/replay-compare --workflow-id <id> --prompt "..."
This re-runs the workflow with the same prompt and registry version, then diffs the resulting decision logs step by step.
Approval Flow Tests
TestApprovalFlow demonstrates signaling approvals in the test environment. The Temporal test harness supports SignalWorkflow to simulate human approval signals within unit tests.
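As a sketch using the Temporal Go SDK's test suite directly (the workflow function, signal name, and payload below are placeholders for Cruvero's actual ones):
import (
    "testing"
    "time"

    "go.temporal.io/sdk/testsuite"
)

func TestApprovalFlowSketch(t *testing.T) {
    var ts testsuite.WorkflowTestSuite
    env := ts.NewTestWorkflowEnvironment()
    // Deliver the approval signal one second into workflow time; the test
    // environment auto-advances its clock, so this costs no real time.
    env.RegisterDelayedCallback(func() {
        env.SignalWorkflow("approval", true) // signal name is a placeholder
    }, time.Second)
    // agent.RunWorkflow stands in for the real workflow under test.
    env.ExecuteWorkflow(agent.RunWorkflow, agent.RunConfig{Prompt: "deploy"})
    if !env.IsWorkflowCompleted() || env.GetWorkflowError() != nil {
        t.Fatalf("workflow did not complete cleanly: %v", env.GetWorkflowError())
    }
}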
Integration Tests
Postgres Integration
Tests that require a real database use the CRUVERO_POSTGRES_TEST_URL (or CRUVERO_POSTGRES_URL) environment variable. Run them with the integration build tag:
# Immune system store tests
go test -tags integration ./internal/agent -run TestPostgresImmunityStore -count=1
# Composite tool registry tests
go test -tags integration ./internal/tools -run TestPostgresCompositeRegistryExecution -count=1
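A conventional way to gate these tests combines the integration build tag with an env-var skip. The helper below is a hypothetical sketch, not project code:
//go:build integration

package agent_test

import (
    "os"
    "testing"
)

// postgresURL resolves the test database URL, preferring
// CRUVERO_POSTGRES_TEST_URL, and skips the test when neither is set.
func postgresURL(t *testing.T) string {
    t.Helper()
    url := os.Getenv("CRUVERO_POSTGRES_TEST_URL")
    if url == "" {
        url = os.Getenv("CRUVERO_POSTGRES_URL")
    }
    if url == "" {
        t.Skip("set CRUVERO_POSTGRES_TEST_URL to run Postgres integration tests")
    }
    return url
}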
Security Sandbox Host Integration Tests
Requires local runtime binaries (runsc, nsjail) and explicit opt-in:
CRUVERO_RUN_HOST_SANDBOX_TESTS=true go test -tags 'security integration' ./internal/security -run Host
Covers host execution compatibility for runsc and nsjail sandbox modes.
API Route Tests
API routes use humatest for handler-level testing and httptest for HTTP-level integration. Tests create a test server with mock dependencies and verify request/response contracts.
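A minimal httptest-level sketch follows; the /healthz route and inline mux are hypothetical stand-ins for the real router built with mock dependencies:
package api_test

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestHealthRoute(t *testing.T) {
    // Stand-in for constructing the real router with mock dependencies.
    mux := http.NewServeMux()
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    srv := httptest.NewServer(mux)
    defer srv.Close()

    resp, err := http.Get(srv.URL + "/healthz")
    if err != nil {
        t.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        t.Fatalf("want 200, got %d", resp.StatusCode)
    }
}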
Supervisor Trust Tests
Unit tests covering trust score computation, agent selection, and delegation chain tracking:
go test ./internal/supervisor -run TestComputeTrustScore -count=1
go test ./internal/supervisor -run TestGetBestAgent -count=1
go test ./internal/supervisor -run TestDelegationChain -count=1
Coverage
The project targets 80% minimum test coverage on all packages. Check coverage with:
go test -coverprofile=coverage.out ./...
go tool cover -func=coverage.out
Per-package coverage is enforced in CI. Use -covermode=atomic with -race for accurate race-safe coverage.
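For example, a single race-safe coverage run:
# Race-safe coverage in one pass
go test -race -covermode=atomic -coverprofile=coverage.out ./...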
Running Tests
# Run all unit tests
go test ./...
# Run a specific test by name
go test ./internal/agent -run TestMyFeature -count=1
# Run with race detector
go test -race ./...
# Verbose output
go test -v ./internal/agent/...
# Skip test caching
go test -count=1 ./...