Skip to content

Glossary

One-line definitions for the vocabulary used throughout the harness, the docs, and the agent prompts. Terms are grouped by topic; within a group, they are listed alphabetically. The operator-facing groups come first; the Harness internals section at the end covers vocabulary you only need when extending the harness or reading raw state.

Audit lifecycle

Audit run. One invocation of bin/audit. May contain many iterations and many agents.

Iteration. One outer pass of the audit loop. Each iteration builds work cards, assigns roles, launches agents, and waits for them to exit.

Session. Operator-facing concept. A continuous stretch of audit iterations against the same target and backend, normally rooted in the same output/<target>/<backend>/ tree.

Cold start. An iteration where no agent has a state file yet — typically the first iteration of a fresh target. Cold start also runs the recon survey before the deep agents launch.

Recon. A short breadth-first review pass over the in-scope source set. Runs at the start of an audit (or via bin/audit-recon on its own), surveys the code for candidate bugs with one model, gates them with an independent second model, and seeds the work queue with prioritized candidates. Cached on the target source SHA. See Recon discovery.

Recon-hypothesis card. A work card derived from a recon finding (kind: "recon-hypothesis" in work-cards.jsonl). High-confidence cards are tried before the regular queue; validator-rejected cards are kept at lower priority rather than deleted.

Validator. The second-opinion model run on every recon emission. Reads the same source independently, votes Promote / Reject / Uncertain, and ranks what reaches the audit's work queue. The validator vote is a triage signal, not proof.

Resume. An iteration where the agent reads structured state to continue prior hypotheses. Recon is skipped on resume — the cache from cold start is re-used.

Compaction. The backend's automatic shortening of the conversation when it nears the context limit. The harness emits a checkpoint warning before compaction so the agent can save findings to its state file.

Session seed. A small set of PRIOR SESSION SEED ranges (files + line windows) the agent already covered. The prompt tells the agent not to re-read those ranges after compaction.

Strategies

Strategy (S1–S8). A named recipe an agent follows: how to pick a hypothesis, find an input, mutate it, and decide what the result means. See Strategy model.

REF. Shared grep recipes used alongside any strategy. Not itself a strategy.

Rotation. Switching an agent's current strategy after sustained dry effort. Effort-gated, not iteration-gated.

Guard chain. A repeating upstream error string ("Error: regexp too big", NS_ERROR_…) that blocks a run of testcases in one subsystem.

GUARD-… card. A synthetic work card the harness appends when a guard chain saturates, asking the agent for a path past the guard.

Probe and execution

Probe (bin/probe). The only execution gate for testcases. Reads headers, picks the right runner, coverage-gates, runs the sanitizer, and records state/runs.jsonl.

Coverage gate. A pre-run on a sancov-instrumented build that confirms the testcase reaches the named target code before spending a sanitizer-run budget.

HIT / MISSED / CLEAN. Probe verdicts.

  • MISSED — the testcase did not reach the target code.
  • HIT — it did.
  • CLEAN — it ran without sanitizer output.

Confirm run. A 5-times re-run of a candidate crash (bin/probe --confirm) before promotion, to filter flaky single-run results.

Harness (testcase HARNESS: header). A sibling source file (harness.c, harness.cc, harness.cpp, harness.cxx, harness.C, or a language-specific runner) that bin/probe compiles and links against the target library to exercise an API.

Scratch dir (scratch-N/). In-progress testcase work for agent N. Anything here is provisional until probe confirms it.

Artifacts

Crash (crashes/CRASH-*). A sanitizer-confirmed reproducer with a saved trace, an input, and a report. Promotion requires a memory-safety or explicit boundary violation reachable through the target's documented input boundary.

Finding (findings/FIND-*). A concrete security issue with a written report naming file:function:line, an issue class, and a reviewer-actionable rationale. May or may not have a reproducer.

Rejected crash (crashes-rejected/). A candidate that failed triage, indexed with a reason so future sessions do not refile it.

Cluster file (CRASH-CLUSTERS.html, FINDING-CLUSTERS.html). A browser-readable summary grouping reports that share a root cause. Per-backend at the result tree; cross-backend at the target root. The .md siblings are the generated markdown source.

Export bundle. The maintainer-facing form of a crash, produced by bin/export-repro: REPORT.md, reproduce.sh, input.<ext>, optional harness.*, sanitizer.txt. See Reproduce a crash.

Configuration

target.toml. Per-target generated config: source metadata, sanitizer binaries, build system, threat model. Lives at output/<target>/target.toml. See Target config reference.

Attacker controls. [threat_model].attacker_controls — the tokens describing what an external caller can legitimately control. Valid tokens are bytes, call-sequence, timing, race, env, protocol-state, and fs-state. A crash whose trigger source falls outside this set stays in crashes/ but is flagged with a contract concern, which sets CVSS MAT:P (Modified Attack Requirements: present) in the CVSS-BTE score — threat-model fit is a scoring question, not a filing one.

Findings-only mode. [sanitizer].enabled = []. Typical for interpreted / managed-runtime targets (Python, Ruby, Node, Java, PHP) but valid for any project without an ASan build. Runtime diagnostics are filed under findings/, not crashes/.

.session-env. Dynamic per-run paths and identifiers (RESULTS_DIR, TARGET_ROOT, TARGET_REV, TARGET_SLUG, LOGDIR, SESSION_STARTED) written by bin/audit at startup into output/<target>/<backend>/results/.session-env. bin/probe discovers it by walking up from the testcase path, so no env vars need to be exported by hand.

Backends

Backend. The LLM CLI driving the agent loop — claude, codex, gemini, or oss (local Ollama via Codex).

Ensemble mode. --backend all (or omitted) — cycles installed hosted backends across iterations, writing per-backend result trees.

Cyber-access program. Provider-side trusted-access registration (OpenAI's Trusted Access for Cyber, Anthropic's Cyber Verification Program) that reduces false-positive policy interruptions during authorised defensive research.

Harness internals

Vocabulary for the queue, structured state, and cost machinery. You only need these when extending the harness or reading raw state files directly.

Work-card pipeline

Work card. A single unit of audit work — one source file × strategy, or one prior fix, or a synthetic GUARD-… task. Lives in work-cards.jsonl / patch-cards.jsonl.

Patch card. A prior-fix work card (strategy S1), built by bin/patch-cards from the target's VCS history.

Companion card. A second work card for the same file with a different strategy, emitted when a file fires more than one code-feature signal.

Ranker (bin/rank-work). The deterministic scorer that walks the source tree, applies CODE_PATTERNS, structural and coverage signals, and emits the ranked queue.

Claim. The lease an agent takes on a card when it adopts the card into a hypothesis. Recorded in claims.jsonl. Expires after 30 minutes by default.

Work surface. The dedupe key the work queue uses to prevent duplicate claims. Surface-card cards key on file:function (file-scoped cards use file:S<n> where n is the strategy number). A separate diversity gate prevents two agents from sharing a subsystem at the same time.

Subsystem. The first one to five path components of a source file (parser/xml, crypto/aes, …). Used for blocklisting, ownership, and guard-saturation tracking. Depth is auto-picked at startup.

Dry streak. Consecutive iterations on the same strategy with no confirmed result. Tracked per agent in .agent_strategy_streak_<n>.

State

state/*.jsonl. Structured append-only records: hypotheses.jsonl, claims.jsonl, runs.jsonl, notes.jsonl.

Hypothesis. A narrow, falsifiable claim about a specific file:function:line with a named input shape, guard gap, and expected diagnostic. The unit of agent work.

Status. A hypothesis lifecycle marker: PENDING, INVESTIGATING, NEEDS_TESTCASE, NEEDS_DEEPER_PROBE, ENV-BLOCKED, DISCARDED, CRASH-XXX, FIND-XXX.

Cost levers

See Cost model for full context.

Prompt cache. Cached cross-agent prompt fragments built once per iteration into .static-prompt-rules.md.

Capping wrappers. bin/rg-safe, bin/peek, bin/show-patch, bin/scratch-search — enforce per-call byte / line ceilings so agents cannot dump raw source into context.

ASan budget. Per-agent per-iteration sanitizer-run budget (25 for browser agents, 60 for shell agents). Coverage-gate runs do not count against it.

Tried-inputs / hits log. tried-inputs-N.log and hits-N.log — per-agent memory of testcase shapes and reached coverage edges, so the next session does not repeat them.