Testing and Fixtures

Cruvero uses a layered testing approach built around deterministic replay. Unit tests use the agenttest harness to run agent workflows against Temporal's test environment with mocked LLM decisions and tool responses. Integration tests validate database stores, API routes, and cross-package interactions. Golden log comparison catches regressions by diffing recorded decision logs against replayed runs.

This document is for developers adding features, debugging test failures, or extending test coverage.

Prerequisites: Go 1.25+, a running Postgres instance (for integration tests), and a Temporal dev server (optional, for end-to-end tests).

Unit Testing

The internal/agenttest package provides a Temporal test harness that registers all agent activities and lets you control LLM decisions and tool results.

Suite Setup

Create a Suite with a testing.T and configure the decider and tool mocks:

func TestMyFeature(t *testing.T) {
    s := agenttest.Suite{T: t}
    s.MockLLM(agenttest.StepDecider(map[int]agent.Decision{
        0: {Action: "tool", ToolName: "http_get", ToolArgs: json.RawMessage(`{"url":"https://example.com"}`)},
        1: {Action: "halt", HaltReason: "done"},
    }))
    s.MockTool("http_get", func(args json.RawMessage) (json.RawMessage, error) {
        return json.RawMessage(`{"status":200}`), nil
    })
    result := s.Run(agent.RunConfig{Prompt: "Fetch example.com"})
    // assert on result
}
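
What you assert depends on what Suite.Run returns; as an illustration, assuming the run result exposes the final halt reason (the field name is an assumption, not a documented part of the agenttest API):

// HaltReason as a field on the run result is an assumption; check the
// actual result type in internal/agenttest.
if result.HaltReason != "done" {
    t.Fatalf("expected halt reason %q, got %q", "done", result.HaltReason)
}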

DeterministicJSON

DeterministicJSON creates an LLM decider from a fixture map keyed by input hash. The hash is computed from the prompt, step index, and agent state, so the same inputs always produce the same decision:

fixtures, _ := agenttest.LoadFixtures("fixtures/decision_logs/run.json")
s.MockLLM(agenttest.DeterministicJSON(fixtures))
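
The exact hashing scheme is internal to agenttest, but conceptually it amounts to something like this sketch (field order and encoding are assumptions):

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// inputHash illustrates deriving a stable fixture key from the prompt,
// step index, and serialized agent state. The real implementation may
// differ in field order and encoding.
func inputHash(prompt string, step int, state []byte) string {
    h := sha256.New()
    fmt.Fprintf(h, "%s|%d|", prompt, step)
    h.Write(state)
    return hex.EncodeToString(h.Sum(nil))
}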

MockTool

MockTool registers a function that intercepts tool calls by name. Mocks take priority over the real tool executor:

s.MockTool("bash_exec", func(args json.RawMessage) (json.RawMessage, error) {
return json.RawMessage(`{"stdout":"ok","exit_code":0}`), nil
})
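
A mock can also unmarshal and validate its arguments before answering, using only the MockTool signature shown above and the standard library:

s.MockTool("http_get", func(args json.RawMessage) (json.RawMessage, error) {
    var req struct {
        URL string `json:"url"`
    }
    if err := json.Unmarshal(args, &req); err != nil {
        return nil, fmt.Errorf("bad http_get args: %w", err)
    }
    if req.URL != "https://example.com" {
        return nil, fmt.Errorf("unexpected url %q", req.URL)
    }
    return json.RawMessage(`{"status":200}`), nil
})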

ScenarioRunner

ScenarioRunner provides step-by-step tool expectations for ordered test scenarios. Each step declares the expected tool name, optional expected arguments, and the mock result:

runner := agenttest.NewScenarioRunner([]agenttest.ScenarioStep{
    {ExpectedTool: "http_get", MockResult: json.RawMessage(`{"status":200}`)},
    {ExpectedTool: "bash_exec", MockResult: json.RawMessage(`{"stdout":"done"}`)},
})
s.UseScenario(runner)

After the run, call runner.Verify(t) to assert all steps were consumed.
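
To pin the arguments as well, a step can carry the expected raw JSON; the ExpectedArgs field name here is inferred from the description above and may not match the actual struct:

runner := agenttest.NewScenarioRunner([]agenttest.ScenarioStep{
    {
        ExpectedTool: "http_get",
        // ExpectedArgs is an assumed field name for the optional
        // argument match described above.
        ExpectedArgs: json.RawMessage(`{"url":"https://example.com"}`),
        MockResult:   json.RawMessage(`{"status":200}`),
    },
})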

ChaosExecutor

ChaosExecutor wraps a tool executor with fault injection for resilience testing. Configure failure rates, timeout injection, and contradiction rates:

s.UseChaos(agenttest.ChaosConfig{
    FailureRate:       0.2,  // fraction of tool calls that fail outright
    TimeoutRate:       0.1,  // fraction of tool calls that time out
    ContradictionRate: 0.05, // fraction of tool calls that return contradictory results
    MaxFailures:       5,    // cap on total injected faults per run (assumed semantics)
})

Golden Decision Logs

Golden logs record an agent execution trace as a JSON fixture. Subsequent test runs replay the same inputs and compare the decision log against the golden file.

  • Convention: fixtures stored in fixtures/decision_logs/*.json
  • Update goldens: set UPDATE_GOLDEN=1 and run the test, or call agenttest.UpdateGoldenIfEnv in your test (see the sketch after this list)
  • Comparison: agenttest.ValidateFixtureCoverage checks that all fixture keys were exercised
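
Putting the pieces together, a golden-log test might look like the following sketch; the exact signatures of UpdateGoldenIfEnv and ValidateFixtureCoverage are assumptions:

func TestGoldenReplay(t *testing.T) {
    s := agenttest.Suite{T: t}

    fixtures, err := agenttest.LoadFixtures("fixtures/decision_logs/run.json")
    if err != nil {
        t.Fatal(err)
    }
    s.MockLLM(agenttest.DeterministicJSON(fixtures))

    result := s.Run(agent.RunConfig{Prompt: "Fetch example.com"})

    // Rewrites the golden file when UPDATE_GOLDEN=1 is set (signature assumed).
    agenttest.UpdateGoldenIfEnv(t, "fixtures/decision_logs/run.json", result)

    // Fails the test if any fixture key was never exercised (signature assumed).
    agenttest.ValidateFixtureCoverage(t, fixtures)
}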

Fixture Recording

Record a live run's decision log as a fixture file:

go run ./cmd/record-fixtures --workflow-id <id> --out fixtures/decision_logs/run.json

Output is keyed by input_hash, so each unique LLM input maps to its recorded decision.
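
The exact schema is whatever the recorder emits, but roughly the file is a map from input hash to decision; the field names below mirror the Decision struct and are illustrative:

{
  "a1b2c3...": {"action": "tool", "tool_name": "http_get", "tool_args": {"url": "https://example.com"}},
  "d4e5f6...": {"action": "halt", "halt_reason": "done"}
}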

Replay Comparison

Compare an original decision log with a replayed run to detect behavioral drift:

go run ./cmd/replay-compare --workflow-id <id> --prompt "..."

This re-runs the workflow with the same prompt and registry version, then diffs the resulting decision logs step by step.

Approval Flow Tests

TestApprovalFlow demonstrates signaling approvals in the test environment. The Temporal test harness supports SignalWorkflow to simulate human approval signals within unit tests.
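
A minimal sketch using the Temporal Go SDK's testsuite package directly; ApprovalWorkflow, the "approval" signal name, and the payload are hypothetical stand-ins:

func TestApprovalFlow(t *testing.T) {
    var ts testsuite.WorkflowTestSuite
    env := ts.NewTestWorkflowEnvironment()

    // Deliver the (hypothetical) approval signal one second into
    // simulated workflow time.
    env.RegisterDelayedCallback(func() {
        env.SignalWorkflow("approval", true)
    }, time.Second)

    env.ExecuteWorkflow(ApprovalWorkflow, agent.RunConfig{Prompt: "Deploy to prod"})

    if !env.IsWorkflowCompleted() {
        t.Fatal("workflow did not complete")
    }
}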

Integration Tests

Postgres Integration

Tests that require a real database use the CRUVERO_POSTGRES_TEST_URL (or CRUVERO_POSTGRES_URL) environment variable. Run them with the integration build tag:

# Immune system store tests
go test -tags integration ./internal/agent -run TestPostgresImmunityStore -count=1

# Composite tool registry tests
go test -tags integration ./internal/tools -run TestPostgresCompositeRegistryExecution -count=1
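
These tests typically skip themselves when no database is configured; a sketch of that gate (the helper is illustrative, not the actual test code):

// In a file guarded by: //go:build integration
func postgresURL(t *testing.T) string {
    t.Helper()
    for _, key := range []string{"CRUVERO_POSTGRES_TEST_URL", "CRUVERO_POSTGRES_URL"} {
        if url := os.Getenv(key); url != "" {
            return url
        }
    }
    t.Skip("set CRUVERO_POSTGRES_TEST_URL to run Postgres integration tests")
    return ""
}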

Security Sandbox Host Integration Tests

Requires local runtime binaries (runsc, nsjail) and explicit opt-in:

CRUVERO_RUN_HOST_SANDBOX_TESTS=true go test -tags 'security integration' ./internal/security -run Host

Covers host execution compatibility for runsc and nsjail sandbox modes.

API Route Tests

API routes use humatest for handler-level testing and httptest for HTTP-level integration. Tests create a test server with mock dependencies and verify request/response contracts.
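
A hedged sketch of the handler-level pattern with huma v2's humatest package; registerRoutes and the /healthz route are placeholders for the app's actual route setup:

func TestHealthRoute(t *testing.T) {
    _, api := humatest.New(t)

    // registerRoutes is a placeholder for the app's route registration.
    registerRoutes(api)

    resp := api.Get("/healthz")
    if resp.Code != http.StatusOK {
        t.Fatalf("expected 200, got %d", resp.Code)
    }
}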

Supervisor Trust Tests

Unit tests covering trust score computation, agent selection, and delegation chain tracking:

go test ./internal/supervisor -run TestComputeTrustScore -count=1
go test ./internal/supervisor -run TestGetBestAgent -count=1
go test ./internal/supervisor -run TestDelegationChain -count=1

Coverage

The project targets 80% minimum test coverage on all packages. Check coverage with:

go test -coverprofile=coverage.out ./...
go tool cover -func=coverage.out

Per-package coverage is enforced in CI. Use -covermode=atomic with -race for accurate race-safe coverage.
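
Combined, that looks like:

go test -race -covermode=atomic -coverprofile=coverage.out ./...
go tool cover -func=coverage.out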

Running Tests

# Run all unit tests
go test ./...

# Run a specific test by name
go test ./internal/agent -run TestMyFeature -count=1

# Run with race detector
go test -race ./...

# Verbose output
go test -v ./internal/agent/...

# Skip test caching
go test -count=1 ./...