
PEN-TEST-002 — Phase 11a-2 Pennant criteria A/B (V2)

| Field | Value |
| --- | --- |
| Test ID | PEN-TEST-002 |
| Date | 2026-05-11 |
| Strategy | (detection-only) |
| Cohort produced | DET-V2-2026-05-11 |
| Cohort consumed (for baseline comparison) | DET-BASELINE-2026-05-11 |
| Status | complete |

Purpose

Re-run of the Phase 11a A/B test against a less-aggressive variant (V2: pennant 7–17, flagpole 1–5). Baseline events and outcomes from PEN-TEST-001 are reused unchanged; only V2 is freshly scanned.

Method

run_v2.py is a small driver over the same harness machinery as PEN-TEST-001 — mutate config, call detector, write parquet, compute outcomes inline. Override mechanism, outcome formula, and universe are identical to PEN-TEST-001.
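The four driver steps (mutate config, call detector, write parquet, compute outcomes) can be sketched as follows. This is a minimal illustration, not the contents of run_v2.py: the config key names, `apply_overrides`, `run_variant`, and the injected `detect`, `compute_outcomes`, and `write` callables are all hypothetical stand-ins for the harness machinery. Only the V2 bounds (pennant 7–17, flagpole 1–5) come from this page.

```python
from copy import deepcopy

# V2 bounds from the Purpose section; the key names are hypothetical.
V2_OVERRIDES = {
    "pennant_min": 7,
    "pennant_max": 17,
    "flagpole_min": 1,
    "flagpole_max": 5,
}


def apply_overrides(base_config, overrides):
    """Return a copy of base_config with overrides applied.

    Unknown keys are rejected so a typo cannot silently leave a
    baseline value in place.
    """
    unknown = set(overrides) - set(base_config)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = deepcopy(base_config)
    cfg.update(overrides)
    return cfg


def run_variant(base_config, overrides, detect, compute_outcomes, write):
    """Mutate config, call detector, write events, compute and write outcomes.

    detect, compute_outcomes and write are placeholders for the real
    harness functions, whose names and signatures are not given here.
    """
    cfg = apply_overrides(base_config, overrides)
    events = detect(cfg)
    write("events", events)      # e.g. events.parquet in the cohort directory
    outcomes = compute_outcomes(events)
    write("outcomes", outcomes)  # e.g. outcomes.parquet
    return events, outcomes
```

Passing the detector and writer in as arguments keeps the sketch independent of the real harness; an actual driver would presumably import them directly.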

Headline

V2 keeps 39 % of baseline volume (6,108 events), and its +15 % MFE hit-rate is 0.4 pp above baseline. V2 is the sweet spot in the duration × flagpole-tightness sweep: neither the loosest (Baseline) nor the strictest (V4 in PEN-TEST-003) variant matches it on per-event quality.
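For readers checking the headline arithmetic, the two figures are a volume ratio and a difference of rates in percentage points. The sketch below assumes the hit-rate is the share of events whose outcome counts as a hit (here, presumably MFE reaching +15 %); the actual outcome formula is defined in PEN-TEST-001 and is not reproduced here. The function name and the input numbers are illustrative only and are not the PEN-TEST-002 results.

```python
def compare_to_baseline(baseline_events, baseline_hits,
                        variant_events, variant_hits):
    """Return (volume retention in %, hit-rate delta in percentage points)."""
    if baseline_events <= 0 or variant_events <= 0:
        raise ValueError("event counts must be positive")
    retention_pct = 100.0 * variant_events / baseline_events
    baseline_rate = 100.0 * baseline_hits / baseline_events
    variant_rate = 100.0 * variant_hits / variant_events
    # A percentage-point delta is a plain difference of the two rates,
    # not a relative change.
    return retention_pct, variant_rate - baseline_rate


# Toy numbers for illustration only.
retention, delta_pp = compare_to_baseline(1000, 200, 400, 84)
print(f"retention {retention:.1f} %, hit-rate delta {delta_pp:+.1f} pp")
# prints: retention 40.0 %, hit-rate delta +1.0 pp
```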

Files in this directory

  • run_v2.py — harness driver
  • analyze_v2.py — statistics emitter
  • run_v2.log, run_v2.stdout.log — run logs
  • summary_v2.json — headline JSON
  • report.md → ../../reports/Pennant/pennant_criteria_ab_test_v2_2026-05-11.md

Cohort outputs

  • Pennant/cohorts/DET-V2-2026-05-11/{events,outcomes}.parquet
  • F-002 — V2 keeps 39 % of baseline volume with +0.4 pp hit-rate; the sweet spot in the sweep.