PEN-TEST-003: Phase 11a-3 Pennant criteria A/B (V3 + V4)
| Field | Value |
|---|---|
| Test ID | PEN-TEST-003 |
| Date | 2026-05-11 |
| Strategy | (detection-only) |
| Cohorts produced | DET-V3-2026-05-11, DET-V4-2026-05-11 |
| Cohort consumed | DET-BASELINE-2026-05-11 (for comparison) |
| Status | complete |
Purpose
Continuation of PEN-TEST-001 / -002, pushing
flagpole.max_duration_bars tighter (5 → 3 → 2). V3 keeps
pennant 6–17 / flagpole 1–3; V4 tightens further to flagpole 1–2.
Method
run_v3_v4.py runs the same harness for V3 and V4 in sequence,
each producing its own events + outcomes parquet.
analyze_v3_v4.py also computes V3∩V4 overlap (which events
appear in both cohorts) for diagnostic purposes.
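The overlap step can be sketched as a set intersection over event keys. This is a minimal illustration, not the code in analyze_v3_v4.py: the function name overlap_summary and the (symbol, bar date) key are assumptions, since the events parquet schema is not described here, and the sample keys are toy data.

```python
from typing import Dict, Hashable, Iterable


def overlap_summary(
    v3_keys: Iterable[Hashable], v4_keys: Iterable[Hashable]
) -> Dict[str, float]:
    """Count events shared by two cohorts, given one hashable key per event.

    The key that uniquely identifies an event is an assumption; a
    (symbol, bar date) tuple is used in the toy example below.
    """
    v3, v4 = set(v3_keys), set(v4_keys)
    both = v3 & v4
    return {
        "v3_events": len(v3),
        "v4_events": len(v4),
        "in_both": len(both),
        "v3_only": len(v3 - v4),
        "v4_only": len(v4 - v3),
        # Fraction of V4 events that also appear in V3 (0.0 for an empty V4).
        "share_of_v4_in_v3": len(both) / len(v4) if v4 else 0.0,
    }


# Toy keys for illustration only; these are not real cohort events.
toy_v3 = [("AAA", "2024-01-03"), ("BBB", "2024-02-10"), ("CCC", "2024-03-05")]
toy_v4 = [("AAA", "2024-01-03"), ("CCC", "2024-03-05"), ("DDD", "2024-04-01")]
toy_summary = overlap_summary(toy_v3, toy_v4)
print(toy_summary)
```

A non-zero v4_only count would mean the tighter flagpole bound surfaces events that V3 did not detect, so V4 should not be assumed to be a strict subset of V3 without checking.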
Headline
V3 (flagpole 1–3): 5,200 events, a clean cut. V4 (flagpole 1–2): 4,101 events, too aggressive: the V4 cohort's 30-day endpoint mean is below baseline despite the tighter selection, so the extra trimming delivers no quality lift. The practical floor is V3.
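For scale, the two headline counts can be compared directly. The sketch below uses only the event counts reported above; the helper name relative_change is illustrative. It compares cohort sizes only and says nothing about which individual events are shared, which is what the overlap diagnostics cover.

```python
V3_EVENTS = 5_200  # flagpole 1-3, from the headline
V4_EVENTS = 4_101  # flagpole 1-2, from the headline


def relative_change(before: int, after: int) -> float:
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


dropped = V3_EVENTS - V4_EVENTS
change = relative_change(V3_EVENTS, V4_EVENTS)
print(f"V4 has {dropped} fewer events than V3 ({change:.1%})")
# prints: V4 has 1099 fewer events than V3 (-21.1%)
```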
Files in this directory
- run_v3_v4.py: harness driver for both V3 and V4
- analyze_v3_v4.py: statistics + V3∩V4 overlap analysis
- run_v3_v4.log, run_v3_v4.stdout.log: run logs
- summary_v3.json, summary_v4.json: headline JSONs
- summary_v3_v4_overlap.json: overlap diagnostics
- report.md → ../../reports/Pennant/pennant_criteria_ab_test_v3_v4_2026-05-11.md
Cohort outputs
- Pennant/cohorts/DET-V3-2026-05-11/{events,outcomes}.parquet
- Pennant/cohorts/DET-V4-2026-05-11/{events,outcomes}.parquet
Related findings
- F-003 — V4 trims too aggressively; V3 is
the practical floor for
flagpole.max_duration_bars.