Triage Results¶
Triage keeps the output of a run useful to a maintainer. The rule is simple:
- Record every concrete security issue.
- Be strict about what stays in
crashes/.
A reproducible crash is the strongest kind of prioritisation evidence
there is. A non-crashing issue, or a security issue without a
sanitizer reproducer, belongs in findings/ with a clear report.
The four kinds of result¶
| Type | Directory | Use when |
|---|---|---|
| Crash | crashes/CRASH-NNN-N/ |
A testcase produces sanitizer evidence or an accepted security-boundary violation. |
| Finding | findings/FIND-NNN/ |
Any concrete security issue — see below. |
| Rejected crash | crashes-rejected/ |
A crash candidate is low value, out of scope, or incomplete. |
| Needs-review crash | crashes-needs-review/ |
A borderline LLM-confirm rejection is requeued for another pass before hard rejection. |
findings/ accepts any concrete security class — memory safety,
logic, auth, injection, info disclosure, crypto, race, boundary
violation, and so on. A sanitizer reproducer is not required.
crashes-rejected/ is still part of the workflow — it stops the
harness from refiling the same low-value crash. Findings have a
separate substance gate that drops one of two markers in the FIND
directory while the report waits for content or a second review:
.needs-content— the FIND directory has noreport.mdordescription.mdyet..pending-drop— the LLM substance gate has rejected the report once. Cleared on the next accept; on a second reject the FIND is moved tofindings-rejected/rather than deleted.
The .needs-content marker surfaces as a NEEDS CONTENT value in the
Status column of findings/FINDING-CLUSTERS.html; .pending-drop is
an internal triage marker and is not shown in that column. Either
address the underlying issue
(add the report, sharpen the rationale) and rerun triage, or touch
.reviewed (or .keep) in the FIND directory to pin the current
state as-is.
How the gates decide¶
The LLM-backed crash gates (trace validity, report completeness,
legitimacy) are multi-vote and fail open. A single keep vote keeps
the crash; a rejection sticks only once independent negative votes
reach quorum (two by default, CRASH_GATE_QUORUM). A crash that
collects one negative vote without reaching quorum is parked in
crashes-needs-review/ and requeued for another pass instead of
being rejected outright.
Findings face a parallel mechanism: the substance gate needs two
rejects before a FIND moves to findings-rejected/ (the
.pending-drop marker records the first), and findings promoted
without sanitizer evidence need two independent Promote votes from
the validator.
Common rejection reasons¶
Most operators arrive on this page because something landed in
crashes-rejected/. Start here.
| Reason | Why it is rejected |
|---|---|
| Null dereference only | Usually no memory-safety impact unless a boundary violation is also shown. |
| OOM | Resource exhaustion alone is not the evidence this harness promotes. |
| Assertion-only abort | Debug assertion failures need a security boundary or sanitizer diagnostic. |
| Timeout-only behaviour | A hang needs a stronger impact story and reproduction discipline. |
| Harness-only misuse | The testcase violates a contract no real caller can violate. |
| Missing files | No testcase, no sanitizer output, or an incomplete report. |
Not a rejection — kept but downgraded. A reproducing sanitizer crash whose
Trigger source falls outside the target's attacker_controls (for example a
call-sequence, env, or race trigger on a bytes-only target) is not
moved to crashes-rejected/. Triage keeps it in crashes/, adds a
## Contract concern block, and the scorer sets CVSS MAT:P (Modified
Attack Requirements: present) so it ranks below in-scope crashes in the
CVSS-BTE score. It still counts as an accepted crash. Agents file the
reproducible crash; triage scores the threat-model fit.
Before filing a similar crash, check:
For findings, scan <RESULTS_DIR>/findings/FINDING-CLUSTERS.html and
look at the Status column (OK, NEEDS CONTENT, or NEEDS
ATTENTION). NEEDS CONTENT means no report.md yet. A separate
.pending-drop marker (not shown in this column) means the LLM
substance gate rejected the report once; a second reject moves it to
findings-rejected/.
What a strong crash looks like¶
A strong crash artifact has:
- a runnable testcase;
- saved sanitizer output;
- a confirmation run when applicable;
- a report explaining the boundary and caller controls;
- a trigger source that fits
target.toml; - a root cause in target code, not in harness-only misuse.
The crash path is intentionally stricter than the finding path.
crashes/ should help maintainers prioritise issues they can rerun
quickly. If the underlying concern is real but the crash evidence is
weak, keep the report as a FIND instead of forcing it through crash
triage.
Sanitizer classes that typically belong in crashes/:
- out-of-range read or write;
- container-overflow;
- stack-buffer-overflow;
- heap-buffer-overflow;
- heap-use-after-free;
- alloc-dealloc-mismatch;
- similar memory-safety diagnostics.
Crash report fields¶
Every crash report is written in two formats side by side:
REPORT.md— the source of truth.REPORT.html— auto-generated sibling, easiest to read in a browser (same content with the field table aligned, severity badge, and external links resolved).
Open either. Hand-edit REPORT.md only.
Crash reports include the human explanation and a ## Fields table.
The rows are also emitted as bare-label lines. Triage reads the
bare-label form, and REPORT.html renders the table. The fields:
Surface: network|library-api|cli|maint-tool|unknown
Trigger source: bytes|call-sequence|timing|race|protocol-state|env|fs-state
Caller contract: obeyed|violated|unspecified
Boundary:
Caller controls:
Trusted caller actions:
Parameter control: direct|mapped|harness-only|none
Notes:
Surfacedescribes where the crash is reachable from. Agents write a short label, optionally followed by prose (library-api — C harness calls app_read_memory); export normalises it to one of the tokens above, andbin/reachabilityclassifies it (it may promotelibrary→library_popularorcli→cli_productionbased on external caller counts) before computing the advisory severity. An unsetSurfacedefaults tounknownand under-scores real findings, so always set it.Trigger sourceis compared againstattacker_controlsto set severity, not to decide filing: a trigger fully withinattacker_controlsscores as security; one with a component outside it stays incrashes/but is flagged with a contract concern (CVSS MAT:P), lowering the CVSS-BTE score. It is not moved out ofcrashes/on this basis.Parameter controlis especially important for C harnesses. It tells triage whether a value is externally controlled or only invented by the harness.
A typical exported REPORT.md looks like this. bin/export-repro
emits the ## Fields table first, then the bare-label lines triage
parses, then the auto-Severity bullet, then the agent's narrative,
then ## Expected sanitizer output, then the reproduce pointer.
(The project, symbols, and line numbers below are invented for
illustration.)
# CRASH-001-1: heap-buffer-overflow READ in app_next_char
## Fields
| Field | Value |
|:----------------------|:------|
| Primitive | heap-buffer-overflow READ of size 4 |
| Severity | Medium (CVSS-BTE 4.0: 5.5) |
| Surface | library-api (C harness calls app_read_memory) |
| Trigger source | bytes |
| Caller contract | obeyed |
| Boundary | Untrusted document bytes parsed by the library. |
| Caller controls | Document contents and length. |
| Parameter control | direct |
| Trusted caller actions| none |
| Cluster | C1 |
| Dedup frames | app_next_char → app_parse_char_ref → app_parse_reference |
| Reproduction rate | 5/5 |
| Strategy | S7 |
Surface: library-api
Trigger source: bytes
Caller contract: obeyed
Dedup frames: app_next_char → app_parse_char_ref → app_parse_reference
Boundary: Untrusted document bytes parsed by the library.
Caller controls: Document contents and length.
Parameter control: direct
Strategy: S7
- **Severity**: Medium (CVSS-BTE 4.0: 5.5 Medium; CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:N/VA:L/SC:N/SI:N/SA:N/E:P/CR:M/IR:M/AR:M; primitive=heap READ of 4 byte(s); surface=library)
Out-of-bounds 4-byte read in `app_next_char` reached from
`app_parse_char_ref` while consuming a malformed numeric character
reference. The length check on the entity buffer occurs after the
read, not before. Reachable from any caller passing attacker-supplied
documents through `app_read_memory` / `app_read_file`.
## Expected sanitizer output
```
==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address …
#0 0x… in app_next_char parser.c:225
#1 0x… in app_parse_char_ref parser.c:2403
#2 0x… in app_parse_reference parser.c:7611
…
```
Full original output: `sanitizer.txt`.
## Reproduce
Run `./reproduce.sh /path/to/clean/sampleproj`.
Notes on the fields:
Clusteris filled in bybin/cluster-crashesafter triage; agents leave it blank or use the generated marker.Advisory: yesis added (aboveSurface) when the fix requires an ABI/API-impacting change — see theFix Directionsection in the agent's narrative.Dedup framesis the top-3 ClusterFuzz-style frame chain used for duplicate detection.- The auto-Severity bullet (
- **Severity**: …) is rewritten bybin/reachabilityon every triage pass; hand-edits there are lost. - Severity is the CVSS v4.0 score — one industry-standard
metric, computed by the vendored FIRST reference scorer. The bullet
and the Fields-table
Severityrow carry the level plus the score (Medium (CVSS-BTE 4.0: 5.5)); the generated## Severity rationalesection shows the full vector and how each metric was derived from the report's classification and Fields. - The CVSS vector is derived mechanically: AV/UI from the surface tier; VC/VI/VA/SC/SI/SA from the primitive class; E from reproducer/exploit evidence; CR/IR/AR from local reach and usage; MAT from caller-control or contract concerns. PR:N is the worst-case default because the harness has no auth signal. Review those assumptions against the real deployment before filing an advisory — each derivation line is a reviewable claim.
- Non-shipping code (test/maintenance/internal harness) is represented with Environmental modified impact metrics rather than a custom cap.
- Cluster size has no CVSS v4.0 metric. It is reported as a verification fact and used for triage priority separately.
Finding requirements¶
Findings live under:
What every FIND needs:
- A report file at the FIND root:
report.mdordescription.md. - A concrete location —
file:function:line, an endpoint, a config key, …. - The security issue class — memory safety, auth bypass, injection, info disclosure, crypto, race, boundary violation, logic flaw, ….
- A rationale a reviewer can act on (impact, caller control, what is wrong).
A typical non-crashing FIND report.md looks like:
# FIND-007
## Fields
| Field | Value |
|:---------------|:-------------------------------------------------------|
| Class | logic / authorization bypass |
| Severity | Medium (CVSS-BTE 4.0: 4.5) |
| Surface | library-api |
| Location | src/policy.c:check_acl:142 |
| Caller control | request bytes |
| Cluster | F3 |
Class: authorization bypass
Surface: library-api
Caller controls: request bytes
## Issue
`check_acl` short-circuits to ALLOW when `policy_count == 0`, which is the
default for an uninitialised handle. A caller that obtains a handle without
calling `policy_load()` first will pass any ACL check.
## Impact
Untrusted clients can reach privileged operations before policy is loaded.
## How to verify
Trace `check_acl` callers; no testcase needed.
Findings do not need a runnable reproducer — the report is the evidence. If you do have a testcase or saved sanitizer output, include it alongside.
report.html is generated automatically as a sibling of report.md
and is usually the easiest way to read a finding — open it in a
browser. Do not hand-write it.
Optional but encouraged:
affected-files.txt;- a testcase;
- captured sanitizer output;
- screenshots;
- any other supporting artifact.
What does not belong:
- vague suspicion;
- "looks suspicious" without a nameable location;
- provably unreachable code.
Triage asks an LLM to filter out vacuous reports. Substantive findings without a reproducer are kept.
When the LLM substance gate rejects a FIND once, the directory stays
put with .pending-drop. A later accept clears that marker. If the
reject quorum is reached, the directory moves to findings-rejected/
so QA can audit false rejects and recover anything worth keeping. Add
.reviewed or .keep when a human has confirmed the report is
intentionally terse.
Exported crash bundle¶
Accepted crashes are converted into:
CRASH-001-1/
REPORT.md
REPORT.html
reproduce.sh
input.<ext>
harness.{c,cc,cpp,cxx} # when applicable
sanitizer.txt
patch.diff # optional: candidate fix that passes `git apply --check`
reachability.json # optional: caller search + advisory severity
.audit/
The audit-side originals move under .audit/ for provenance. The
root files are the maintainer-facing interface. As above, hand-edit
REPORT.md only — REPORT.html is regenerated on every triage pass.
Run a bundle¶
A maintainer can reproduce a bundle against their own clean checkout:
reproduce.sh is self-contained. It compiles or launches the right
ASan runner against the input next to it, prints the running
command, and exits with the ASan exit code. The last line is always
[repro] exit=<n>. For a crashing bundle, a non-zero exit
accompanied by ASan output in stderr is a successful reproduction.
The script needs:
- a clean source checkout at the revision recorded in
REPORT.md(or a close-enough revision — small drift is usually fine); - a working
clang/clang++with ASan support onPATH; - whatever build-system dependencies the target needs (CMake, Mercurial, …). The script may rebuild the ASan target on first run.
For browser targets, reproduce.sh launches the configured ASan
browser binary against a file:// URL pointing at the bundled
input.html. For C harness bundles, it compiles harness.c against
the target's static library before invoking it against
input.<ext>. Header-only C++ libraries omit the static-library
link automatically (see
Target config reference).
Clusters and duplicates¶
Crashes and findings are clustered after triage. A maintainer looking at the result tree sees families rather than a wall of independent entries.
crashes/
CRASH-CLUSTERS.md
CRASH-CLUSTERS.html
CRASH-001-1/REPORT.md ← Cluster: C1
CRASH-001-1/REPORT.html
CRASH-001-2/REPORT.md ← Cluster: C1, .dup-of -> CRASH-001-1
CRASH-002-1/REPORT.md ← Cluster: C2
findings/
FINDING-CLUSTERS.md
FINDING-CLUSTERS.html
FIND-001/report.md ← Cluster: F1, Dedup key: [llm] auth/check-acl-zero-policy
FIND-001/report.html
FIND-007/report.md ← Cluster: F1, .dup-of -> FIND-001
How the cluster files and markers work:
CRASH-CLUSTERS.htmlandFINDING-CLUSTERS.htmlare the browser review pages — one row per cluster, with severity, member count, and the canonical member. The.mdsiblings are generated source files.- Each report has a
Cluster: <ID>line. For FINDs, a second lineDedup key: [<source>] <token>records the signature that grouped it.[loc]is the deterministic location key, rendered asfile:function.[llm]is the LLM-chosen canonical token shared across reports of the same root cause filed from different surface sites. ([title]is the fallback key derived from the report title.) - Non-canonical members carry a
.dup-offile naming the canonical member. They are not deleted — a duplicate may still carry a useful variant or a clearer reproducer. Treat the canonical member as the primary report.
Review a cluster top-down:
- Open
CRASH-CLUSTERS.htmlorFINDING-CLUSTERS.html. - Follow the canonical member.
- Skim the
.dup-ofsiblings only if you need additional reproducers or variant inputs.
Use the backend-local HTML tables when reviewing one run:
output/<target>/<backend>/results/crashes/CRASH-CLUSTERS.html
output/<target>/<backend>/results/findings/FINDING-CLUSTERS.html
Use the target-root HTML tables when comparing every backend for a target:
Maintenance commands¶
These are maintenance commands for regenerating or enriching artifacts after manual edits. They are not needed for normal review; read the generated HTML pages first.
export RESULTS=output/<target>/<backend>/results
bin/cluster-crashes "$RESULTS"
bin/cluster-findings "$RESULTS"
bin/cluster-crashes output/<target>
bin/cluster-findings output/<target>
bin/show-exclusions "$RESULTS"
bin/export-repro CRASH-001-1 --slug <target>
bin/reachability --report "$RESULTS/crashes/CRASH-001-1/" --severity-only
bin/reachability --batch "$RESULTS"
What each command does:
bin/export-repro— builds the handoff bundle.bin/reachability --severity-only— updates severity metadata without public code-search queries. Run without that flag only when external caller context is intended.bin/cluster-crashesandbin/cluster-findings— group reports that share a root cause, write the cluster summaries, and stampCluster:andDedup key:lines into each report. Both run automatically during triage. Rerun them by hand after manually editing reports or moving artifacts.
Triage mindset¶
Promote evidence, not ideas.
- A crash should be easy for a maintainer to rerun, understand, and map back to a real input boundary.
- A finding should be concrete enough that a reviewer knows what code or behaviour to inspect.
- Everything else belongs in state, scratch, or rejected indexes until it becomes real evidence.
Ready to hand a crash to the upstream project? See Reproduce a crash — it is written for the maintainer receiving the bundle, and it is the page to send along with one.