PEN-TEST-005 — Pennant population behavior study (Phase 12 Stage 2a)

Field	Value
Test ID	PEN-TEST-005
Date	2026-05-12
Strategy	(population study — no strategy simulation)
Cohort consumed	DET-BASELINE-2026-05-11
Status	complete

Purpose

This study analyzes the population distribution of the 15,528 baseline pennants in the DET-BASELINE-2026-05-11 cohort. It applies two parallel categorizations (Part A: a 9-bucket hypothesis-driven schema with a catch-all; Part B: k-means with k = 5 on 5 features per pennant) and adds shared sections on time-to-reversal, sector / regime breakdown, and headline distribution statistics. The goal was to understand what these patterns actually do after they form, before designing any strategy around them.

Method

01_pull_trajectories.py — pulls anchor + 30-trading-day forward closes for every event in the cohort from Postgres prices_daily, plus tickers sector and market_context regime data. Writes baseline_trajectories.parquet, tickers.parquet, market_context.parquet.
02_analyze.py — end-to-end driver. Computes Section 1 headline stats, applies Part A categorization (with a programmatic check for unmatched patterns), runs Part B k-means with elbow analysis, computes Section 3 give-back/recovery and median trajectories, computes Section 4 sector + regime breakdowns. Writes per-section parquets + JSONs + 8 charts (7 required + bonus elbow).

Headline

The 15,528 baseline pennants form a near-symmetric outcome distribution: 41 % reach +10 % MFE, 36 % reach –10 % MAE within 30 days. Median MFE +7.5 %, median MAE –6.6 %.
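As a rough illustration of how the headline shares could be derived, the sketch below computes MFE and MAE per event and the fraction of events crossing a ±10 % threshold. It assumes excursions are measured close-to-close against the anchor close; the function names and the toy events are hypothetical and are not taken from 02_analyze.py.

```python
# Toy sketch: MFE / MAE per event and the share of events crossing a threshold.
# Assumption: excursions are measured close-to-close against the anchor close.

def excursions(anchor_close, forward_closes):
    """Return (mfe, mae) as fractional returns relative to the anchor close."""
    returns = [close / anchor_close - 1.0 for close in forward_closes]
    return max(returns), min(returns)


def share_reaching(values, threshold):
    """Fraction of values at or beyond the threshold, in the threshold's direction."""
    if threshold >= 0:
        hits = sum(1 for v in values if v >= threshold)
    else:
        hits = sum(1 for v in values if v <= threshold)
    return hits / len(values)


# Hypothetical events: (anchor close, forward closes).
# Real rows in the study carry 30 forward closes, not 3.
toy_events = {
    "A": (100.0, [105.0, 112.0, 103.0]),
    "B": (50.0, [48.0, 44.0, 47.0]),
    "C": (20.0, [21.0, 19.0, 20.5]),
}

mfes, maes = [], []
for anchor, closes in toy_events.values():
    mfe, mae = excursions(anchor, closes)
    mfes.append(mfe)
    maes.append(mae)

share_up = share_reaching(mfes, 0.10)     # share reaching +10 % MFE
share_down = share_reaching(maes, -0.10)  # share reaching -10 % MAE
```

Note that under this definition MFE can be negative when an event never closes above its anchor; a variant that floors MFE at zero would shift the median but not the threshold shares.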

Give-back / recovery is the central finding for strategy design: the median winner gives back 68 % of its peak by day 30; the median loser recovers 88 % of its trough by day 30. Pennants are bursts that mean-revert, not sustained-trend continuation patterns.
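One plausible way to define the two ratios is sketched below: give-back as the fraction of the peak return surrendered by the last day of the window, and recovery as the fraction of the trough drawdown regained by the last day. These definitions are assumptions for illustration; the exact formulas and the winner / loser split used in 02_analyze.py are not stated in this record.

```python
# Toy sketch of give-back and recovery on a path of cumulative returns.
# Assumption: returns[i] is the cumulative return from the anchor on day i + 1,
# and the last element is the end of the observation window (day 30 in the study).

def give_back(returns):
    """Fraction of the peak return surrendered by the final day."""
    peak = max(returns)
    if peak <= 0:
        return None  # never above the anchor, so give-back is undefined
    return (peak - returns[-1]) / peak


def recovery(returns):
    """Fraction of the trough drawdown regained by the final day."""
    trough = min(returns)
    if trough >= 0:
        return None  # never below the anchor, so recovery is undefined
    return (returns[-1] - trough) / (-trough)


# Hypothetical paths, shortened to four days for readability.
toy_winner = [0.05, 0.20, 0.12, 0.05]
toy_loser = [-0.04, -0.10, -0.06, -0.02]

winner_give_back = give_back(toy_winner)  # (0.20 - 0.05) / 0.20
loser_recovery = recovery(toy_loser)      # (-0.02 + 0.10) / 0.10
```

With these definitions a value above 1 means the path crossed back through the anchor, for example a winner that finishes the window below its starting price.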

Part A: 9 buckets (with catch-all for 13.6 % "modest movers" that fell through the original 8). Part B: 5 clusters split into modest winners (37 %), slow drifters (36 %), strong late winners (12 %), sharp early disasters (12 %), huge winners (2.8 %).
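The catch-all mechanism in Part A can be pictured as an ordered list of rules with a fallback label, followed by a count of how many events fell through. The sketch below is a minimal illustration only: the rule names, thresholds, and toy events are hypothetical, and the study's actual bucket definitions are not listed in this record.

```python
from collections import Counter

# Hypothetical rules, checked in order. The first matching rule wins.
RULES = [
    ("big_winner", lambda e: e["mfe"] >= 0.20),
    ("big_loser", lambda e: e["mae"] <= -0.20),
]
CATCH_ALL = "modest_mover"  # label for events that match no rule


def categorize(event, rules=RULES, catch_all=CATCH_ALL):
    """Return the first matching bucket name, or the catch-all label."""
    for name, predicate in rules:
        if predicate(event):
            return name
    return catch_all


# Hypothetical events described by their MFE and MAE.
toy_events = [
    {"mfe": 0.25, "mae": -0.03},
    {"mfe": 0.04, "mae": -0.22},
    {"mfe": 0.06, "mae": -0.05},
    {"mfe": 0.08, "mae": -0.07},
]

counts = Counter(categorize(e) for e in toy_events)
unmatched_share = counts[CATCH_ALL] / len(toy_events)
```

Reporting the fallback share explicitly is what makes a gap in the schema visible; a large value is the signal that a dedicated bucket may be warranted.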

Files in this directory

01_pull_trajectories.py — Postgres → parquet trajectory puller
02_analyze.py — analysis driver
02_analyze.log — run log
baseline_trajectories.parquet — anchor + d1..d30 forward returns per event (4.8 MB, 15,528 rows × 35 cols)
df_with_categories.parquet — main analysis frame with both Part A and Part B labels attached
tickers.parquet, market_context.parquet — Postgres pulls retained for re-running
section1.json — headline stats
partA_a1.parquet, partA_a5.parquet — Part A summary and per-year tables
partB_centroids.parquet, partB_centroids_all.parquet, partB_elbow.parquet — Part B k-means outputs
section3_up.parquet, section3_dn.parquet, section3_medians.npz, section3_counts.json — give-back / recovery / median trajectories
section4_sector.parquet, section4_regime_split.parquet, section4_cross.parquet, section4_cross_pct.parquet — sector + regime tables
report.md → ../../reports/Pennant/pennant_population_behavior_2026-05-12.md
Charts (referenced by report, in ../../charts/Pennant/):
pennant_population_mfe_distribution.png
pennant_population_mae_distribution.png
pennant_population_days_to_peak.png
pennant_population_outcome_distribution_partA.png
pennant_population_outcome_distribution_partB.png
pennant_population_partB_elbow.png (bonus elbow diagnostic)
pennant_population_trajectory_typical_shapes.png
pennant_population_regime_split.png

F-005 — Near-symmetric outcome distribution at the population level.
F-006 — 68 % median give-back / 88 % median recovery; pennants are bursts, not trends.

Notable judgment calls (preserved from execution)

9th Part A bucket added after the original 8 left 2,109 patterns (13.6 %) unmatched. Approved by El Don over three alternatives.
k = 5 manually chosen for Part B after winsorization flattened the elbow. The strict-elbow rule would have picked k = 4; k = 5 preserves the modest / strong / huge winner tri-modal decomposition.
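For readers unfamiliar with elbow selection, the sketch below shows one common mechanical rule: pick the k where the inertia curve bends most sharply, measured by the discrete second difference. This is an assumed stand-in; the record does not say how the "strict-elbow rule" was computed, and the inertia values are made up for illustration rather than taken from partB_elbow.parquet.

```python
# Toy sketch of an elbow rule based on the discrete second difference.
# Assumption: inertias[i] is the k-means inertia for k = i + 1.

def elbow_by_second_difference(inertias):
    """Return the k whose inertia curve has the largest second difference."""
    best_k, best_bend = None, float("-inf")
    # The second difference needs a neighbor on each side, so skip both ends.
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up inertias for k = 1..6, not the study's values.
toy_inertias = [1000.0, 700.0, 450.0, 250.0, 220.0, 200.0]
chosen_k = elbow_by_second_difference(toy_inertias)
```

When the curve is nearly flat, the bend values at neighboring k are close together and the rule's choice is weakly supported, which is the kind of situation where a manual override on interpretability grounds is defensible.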

Reproducer

cd /home/kungfujones/Projects/Uriel/build_v1 && source .venv/bin/activate
cd /home/kungfujones/Projects/Uriel/Pennant
python tests/2026-05-12_PEN-TEST-005/01_pull_trajectories.py
python tests/2026-05-12_PEN-TEST-005/02_analyze.py

Both scripts hard-code absolute paths to the cohort parquets (originally Pennant/ab_test/, now Pennant/cohorts/DET-BASELINE-2026-05-11/). To re-run today, the input paths must be updated first; PEN-TEST-004 has the same pending infrastructure work.
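One lightweight way to avoid editing the scripts after each directory move is to resolve the cohort directory from an environment variable with a fallback default. The sketch below is a suggestion, not what the scripts currently do: the variable name PENNANT_COHORT_DIR and the helper are hypothetical, and the default is assumed to be relative to the project root.

```python
import os
from pathlib import Path

# Default taken from the current cohort location named in this record.
# Assumption: the path is interpreted relative to the project root.
DEFAULT_COHORT_DIR = Path("Pennant/cohorts/DET-BASELINE-2026-05-11")


def resolve_cohort_dir(env=os.environ):
    """Return the cohort directory, honoring a hypothetical PENNANT_COHORT_DIR override."""
    override = env.get("PENNANT_COHORT_DIR")
    return Path(override) if override else DEFAULT_COHORT_DIR


# Example: no override set, then an explicit override.
default_dir = resolve_cohort_dir({})
moved_dir = resolve_cohort_dir({"PENNANT_COHORT_DIR": "/data/cohorts/archive"})
```

Passing the environment mapping as an argument keeps the helper easy to exercise without touching the real process environment.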