# PEN-TEST-002 — Phase 11a-2 Pennant criteria A/B (V2)
| Field | Value |
|---|---|
| Test ID | PEN-TEST-002 |
| Date | 2026-05-11 |
| Strategy | (detection-only) |
| Cohort produced | DET-V2-2026-05-11 |
| Cohort consumed (for baseline comparison) | DET-BASELINE-2026-05-11 |
| Status | complete |
## Purpose
Re-run of the Phase 11a A/B test against a less aggressive variant (V2: pennant 7–17, flagpole 1–5). Baseline events and outcomes from PEN-TEST-001 are reused unchanged; only V2 is freshly scanned.
## Method
`run_v2.py` is a small driver over the same harness machinery as
PEN-TEST-001: it mutates the config, calls the detector, writes the
parquet output, and computes outcomes inline. The override mechanism,
outcome formula, and universe are identical to PEN-TEST-001.
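The "mutate config" step amounts to overriding the duration and flagpole ranges on top of the baseline criteria. The sketch below is a minimal illustration under stated assumptions. `PennantConfig`, `apply_override`, and the field names are hypothetical and are not the harness's actual API. The units of the ranges are not stated in this record and are left unspecified. Only the V2 values (pennant 7–17, flagpole 1–5) come from this record.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PennantConfig:
    # Hypothetical field names; the real harness config may differ.
    # Units are whatever the detector uses (not stated in this record).
    pennant_range: tuple[int, int]
    flagpole_range: tuple[int, int]


# V2 ranges as stated in this record: pennant 7-17, flagpole 1-5.
V2_OVERRIDE = {"pennant_range": (7, 17), "flagpole_range": (1, 5)}


def apply_override(base: PennantConfig, override: dict) -> PennantConfig:
    """Return a new config with the override applied; `base` is left untouched."""
    return replace(base, **override)


if __name__ == "__main__":
    # Illustrative placeholder values, NOT the actual baseline criteria.
    baseline = PennantConfig(pennant_range=(5, 20), flagpole_range=(1, 10))
    v2 = apply_override(baseline, V2_OVERRIDE)
    print("baseline:", baseline)
    print("v2:      ", v2)
```

Returning a copy rather than mutating in place is one way to guarantee that the baseline cohort, which is reused unchanged, cannot be affected by the V2 override.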
## Headline
V2 keeps 39 % of baseline volume (6,108 events), and its hit rate at +15 % MFE (maximum favorable excursion) is 0.4 pp above baseline. V2 is the sweet spot in the duration × flagpole-tightness sweep: neither the loosest variant (Baseline) nor the strictest (V4, in PEN-TEST-003) matches it on per-event quality.
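The two headline numbers are a volume ratio and a hit-rate difference in percentage points. The sketch below shows that arithmetic on per-event MFE values. It assumes that MFE is stored as a fractional return (so +15 % is `0.15`) and that an event reaching exactly the threshold counts as a hit. The MFE computation itself follows the PEN-TEST-001 outcome formula, which is not reproduced here. `hit_rate` and `compare` are hypothetical helpers, not functions from `analyze_v2.py`.

```python
from collections.abc import Sequence

# +15 % MFE expressed as a fractional return (assumption).
HIT_THRESHOLD = 0.15


def hit_rate(mfe: Sequence[float], threshold: float = HIT_THRESHOLD) -> float:
    """Fraction of events whose MFE reaches the threshold (inclusive, assumed)."""
    if not mfe:
        raise ValueError("empty cohort")
    return sum(1 for x in mfe if x >= threshold) / len(mfe)


def compare(baseline_mfe: Sequence[float], variant_mfe: Sequence[float]) -> dict:
    """Volume retained by the variant and its hit-rate change in percentage points."""
    return {
        "volume_retained": len(variant_mfe) / len(baseline_mfe),
        # Difference of two rates, scaled to percentage points (not percent).
        "hit_rate_delta_pp": 100.0 * (hit_rate(variant_mfe) - hit_rate(baseline_mfe)),
    }


if __name__ == "__main__":
    # Toy MFE values for illustration only; not data from this test.
    toy_baseline = [0.05, 0.20, 0.10, 0.16, 0.02]
    toy_variant = [0.20, 0.01]
    print(compare(toy_baseline, toy_variant))
```

Reporting the delta in percentage points keeps it directly comparable to the "+0.4 pp" figure above, rather than expressing it as a relative percentage change of the baseline rate.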
## Files in this directory
- `run_v2.py`: harness driver
- `analyze_v2.py`: statistics emitter
- `run_v2.log`, `run_v2.stdout.log`: run logs
- `summary_v2.json`: headline JSON
- `report.md` → `../../reports/Pennant/pennant_criteria_ab_test_v2_2026-05-11.md`
## Cohort outputs
`Pennant/cohorts/DET-V2-2026-05-11/{events,outcomes}.parquet`
## Related findings
- F-002: V2 keeps 39 % of baseline volume with a +0.4 pp hit-rate gain; it is the sweet spot in the sweep.