Pennant results matrix: column glossary
Authoritative definition of every column in results_matrix.csv (23 columns). Each entry
gives the type, plain-English meaning, units / format, when the cell is
empty vs zero, and caveats worth knowing before reading or filling
the column. Where a column references another artifact (registry,
report), the entry points to it.
This glossary is the source of truth for column semantics. If the glossary and a row disagree, the glossary wins and the row is buggy: fix the row.
1. run_id
- Type: string
- Format: PEN-TEST-NNN, or PEN-TEST-NNN<a-z> for sub-rows
- Definition: Unique identifier for one test row. Sequential across all tests, reserved before the test runs. A single conceptual test that compares multiple cohorts or variants gets one base PEN-TEST-NNN plus suffix letters per sub-row (PEN-TEST-004a, 004b, … 004e). The suffix is lowercase.
- Empty vs zero: never empty. Required field.
- Caveats: Never reused. Skipping an ID is fine; reusing one breaks the audit trail. The matching test directory at tests/<date>_<run_id>/ may collapse multiple sub-rows into one directory (PEN-TEST-004a–e share tests/2026-05-11_PEN-TEST-004/); the suffix is for the matrix, not the filesystem.
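The run_id format can be checked mechanically. The following is a minimal sketch, assuming NNN means exactly three digits (as in PEN-TEST-004) and at most one lowercase suffix letter; the function names are illustrative and not part of any existing validator.

```python
import re

# Assumption: NNN is exactly three digits, optionally followed by one
# lowercase suffix letter for sub-rows.
RUN_ID_RE = re.compile(r"PEN-TEST-(\d{3})([a-z])?")


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id matches the assumed PEN-TEST-NNN[a-z] shape."""
    return RUN_ID_RE.fullmatch(run_id) is not None


def base_run_id(run_id: str) -> str:
    """Drop the sub-row suffix, e.g. to locate the shared test directory."""
    match = RUN_ID_RE.fullmatch(run_id)
    if match is None:
        raise ValueError(f"not a run_id: {run_id!r}")
    return f"PEN-TEST-{match.group(1)}"
```

Because IDs are case-sensitive, the pattern deliberately rejects both pen-test-004a and PEN-TEST-004A.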
2. run_date
- Type: string
- Format: ISO date YYYY-MM-DD
- Definition: Date the test produced its final headline result. For multi-day tests, the completion date. For sub-rows produced by the same execution, all share the same date.
- Empty vs zero: never empty. Required field.
- Caveats: Not the date the row was added to the matrix. If a result is back-filled later, the original run date is used.
3. purpose
- Type: string
- Format: short phrase, sentence-case, no trailing period
- Definition: Plain-English answer to "what were we trying to learn?" Each cell in this column should be intelligible without reading the report.
- Empty vs zero: never empty. Required field.
- Caveats: Keep terse; full context lives in the report. Avoid result claims here ("V2 is best"); those go in notes or key_metric once measured.
4. period
- Type: string
- Format: YYYY-YYYY (e.g. 2007-2026)
- Definition: Calendar window the test covers. For detection scans, the date range of events emitted; for backtests, the date range of equity-curve simulation.
- Empty vs zero: never empty.
- Caveats: Year-resolution only. The actual scan/backtest date range may be a partial year on either end (2026 data through May 8 only). Report contains the precise range.
5. detection_id
- Type: string
- Format: PEN-DET-<label> (lowercase short label, no embedded date)
- Definition: ID of the detection-parameter set that produced the event cohort consumed by this row. References an entry in the strategies/Pennant/registry.md Detection-variants table. The corresponding cohort parquets live at cohorts/DET-<UPPER>-<scan-date>/.
- Empty vs zero: never empty for any test (everything consumes some detection variant; even baseline counts).
- Caveats: Detection IDs are parameter-set IDs, not cohort IDs. If the same parameter set is re-scanned (different scan date), the detection ID stays the same; the cohort directory gets a new scan date in its directory name. The registry shows the latest cohort per detection ID.
6. strategy_id
- Type: string
- Format: PEN-<asset>-<NNN> (e.g. PEN-STOCK-001, PEN-OPT-001)
- Definition: ID of the trading strategy variant simulated. Each has a locked spec at strategies/<id>.md registered in strategies/Pennant/registry.md.
- Empty vs zero: empty for detection-only tests and population analyses (no strategy simulated). Required for backtests.
- Caveats: A change to mechanics — different sizing, different exit thresholds, options overlay — requires a new strategy ID, not a parameter on the existing one. This is the lock-once-write rule.
7. precursor
- Type: string
- Format: one of none, rule1, rule2, rule1_or_rule2
- Definition: Precursor filter applied at entry / event time. Rule 1 (Momentum) and Rule 2 (Breakout) are the surviving 5-y / 10-y / 20-y precursor profiles from the original Phase 1 pennant findings; documented in build_v1/reports/findings_report.md and the trading action plan. The value none means the unconditional population; rule1_or_rule2 means at least one of the two rules fires.
- Empty vs zero: never empty; none is the explicit value.
- Caveats: None of the Pennant-era tests (PEN-TEST-001..005) have used precursor filtering yet, so every row is none. Future tests that apply Rule 1/2 will populate this column.
8. regime_filter
- Type: string
- Format: one of none, spy200+vix35, vix_vvix
- Definition: Market-regime gate applied at entry time.
  - none: entries unconditional on market state.
  - spy200+vix35: skip entries on days where SPY < SPY-200-SMA and VIX > 35 (the Phase 7 "circuit breaker" gate; 240 such days in the 2007–2026 calendar).
  - vix_vvix: placeholder for the VIX × VVIX joint gate explored in the VolGap call-only family; not yet used in any Pennant-line test.
- Empty vs zero: never empty; none is the explicit value.
- Caveats: Detection-only and population-analysis tests are none because the regime gate is a strategy concept, not a detector one. Backtests in PEN-TEST-004 use spy200+vix35.
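The spy200+vix35 condition reduces to a two-part boolean test. A minimal sketch, assuming the gate is evaluated on same-day closing values and that the 200-day SMA is supplied by the caller; the function and parameter names are hypothetical, and the harness may define the inputs differently.

```python
def circuit_breaker_blocks_entry(
    spy_close: float, spy_sma200: float, vix_close: float
) -> bool:
    """Return True when the spy200+vix35 gate would skip entries for the day.

    Assumption: both conditions use closing values; the glossary only states
    SPY < SPY-200-SMA and VIX > 35.
    """
    return spy_close < spy_sma200 and vix_close > 35.0
```

Both conditions must hold at once: a high VIX with SPY above its 200-day average, or a weak SPY with a calm VIX, leaves entries open.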
9. trades
- Type: int
- Units: count (no thousands separators in CSV)
- Definition: Polysemous; the meaning depends on test type:
  - Backtest rows: number of trades taken (= entries that passed cash + regime gates). Not the number of detected events; not the number of cohort rows.
  - Detection-only rows: number of pennants detected. Equal to the row count of the cohort's events.parquet.
  - Population analyses: number of patterns analyzed (= cohort rows with usable forward outcomes). For PEN-TEST-005, this is 15,528, which is 6 fewer than the 15,534 detected events, because 6 had no forward data.
- Empty vs zero: never empty for retroactive rows; future rows could be empty if the test is a pure documentation exercise.
- Caveats: Do not compare across test types without understanding the units. The gap between a backtest's 4,533 trades and a detection's 5,155 events reflects skipped entries (no-cash, regime-gated), not a quality difference.
10. win_rate
- Type: float
- Units: percent (decimal value, e.g. 43.7 for 43.7 %)
- Definition: % of completed trades that closed profitable (P&L > 0). Computed per the canonical backtest harness; the exact accounting (after-friction vs gross, partial fills as separate trades or aggregated) is whatever the report defines.
- Empty vs zero: empty for non-backtests (no trades to win or lose). Zero would mean the strategy ran 1+ trades and none closed profitable, which is a measured outcome, distinct from "doesn't apply".
- Caveats: Breakout / continuation strategies typically have win rates in the 40–55 % range; this is structural, not a defect. The PEN-STOCK-001 scaled exit takes a half-exit at +15 % and trails the runner, so the per-trade outcome is asymmetric. Win rate alone is not a quality metric for this class of strategy; pair it with profit_factor or sharpe.
11. cagr_pct
- Type: float
- Units: percent (annualized)
- Definition: Compound annual growth rate of equity over the period. Computed as ((final_equity / starting_capital) ^ (1 / years) − 1) × 100.
- Empty vs zero: empty for non-backtests. Zero means a measured CAGR of 0 % (strategy ran but ended at starting capital).
- Caveats: Sensitive to choice of starting capital and to whether dividends / cash drift are included. The Pennant-line backtests use $10K starting capital and ignore SPY drift on idle cash. Compare CAGRs only across backtests with matching capital + cash conventions.
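As a worked illustration of the CAGR definition, here is a minimal sketch that returns the value in percent, matching the column's units. The function name and the example figures are illustrative only and are not taken from any matrix row.

```python
def cagr_pct(final_equity: float, starting_capital: float, years: float) -> float:
    """Compound annual growth rate, in percent."""
    if starting_capital <= 0 or years <= 0:
        raise ValueError("starting_capital and years must be positive")
    growth = final_equity / starting_capital
    return (growth ** (1.0 / years) - 1.0) * 100.0


# Illustrative only: doubling $10,000 over 10 years is roughly 7.18 % a year.
example = cagr_pct(20000.0, 10000.0, 10.0)
```

Ending at exactly the starting capital yields 0.0, which is the "measured zero" case and not an empty cell.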
12. total_return_pct
- Type: float
- Units: percent (cumulative, not annualized)
- Definition: (final_equity − starting_capital) / starting_capital × 100. The headline "what did $10K become" number.
- Empty vs zero: empty for non-backtests. Zero means the strategy ended at exactly the starting capital.
- Caveats: Like cagr_pct, depends on starting capital and cash drift. Useful side-by-side with CAGR as a sanity check: (1 + total/100) ^ (1/years) should equal 1 + cagr/100.
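The sanity check between total return and CAGR can be automated when reviewing a row. A minimal sketch, assuming a relative tolerance of 1e-3 to absorb rounding in the stored values; the tolerance and the function names are assumptions, not part of the existing tooling.

```python
import math


def total_return_pct(final_equity: float, starting_capital: float) -> float:
    """Cumulative return in percent, per the column definition."""
    return (final_equity - starting_capital) / starting_capital * 100.0


def returns_consistent(
    total_pct: float, cagr_pct: float, years: float, rel_tol: float = 1e-3
) -> bool:
    """Check that (1 + total/100) ^ (1/years) matches 1 + cagr/100."""
    implied_annual_growth = (1.0 + total_pct / 100.0) ** (1.0 / years)
    return math.isclose(implied_annual_growth, 1.0 + cagr_pct / 100.0, rel_tol=rel_tol)
```

A failed check usually points to a mismatch in the years value or a transcription error in one of the two cells.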
13. final_equity
- Type: float
- Units: dollars (no $ sign, no comma, raw number)
- Definition: Equity at the end of the period. The dollar amount $10K turned into.
- Empty vs zero: empty for non-backtests. Zero would mean a wipeout (PEN-STOCK-001 can't wipe out; the worst case is a drift down with no leverage).
- Caveats: All Pennant-line backtests start at $10,000. Future variants might start elsewhere; if so, document in notes.
14. max_dd_pct
- Type: float (negative)
- Units: percent
- Definition: Largest peak-to-trough drawdown in equity over the period, expressed as a negative percent of the high-water mark at the peak.
- Empty vs zero: empty for non-backtests. Zero would mean the equity curve was monotonically non-decreasing, which is possible but unlikely over a 19-year window.
- Caveats: Drawdown is path-dependent. Two strategies with identical CAGR can have very different max-DD. Pair with CAGR to compute the MAR ratio (cagr_pct / abs(max_dd_pct)): values above 0.5 are good, above 1.0 are exceptional, and below 0.2 mean the equity curve has poor sequence properties.
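The drawdown and MAR definitions translate into a single pass over the equity curve. A minimal sketch, assuming the curve is a plain sequence of positive equity values in time order; the function names and sample values are illustrative.

```python
from typing import Sequence


def max_dd_pct(equity: Sequence[float]) -> float:
    """Largest peak-to-trough drawdown as a negative percent of the peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # running high-water mark
        drawdown = (value - peak) / peak * 100.0  # zero or negative
        worst = min(worst, drawdown)
    return worst


def mar_ratio(cagr_pct: float, max_dd: float) -> float:
    """MAR = cagr_pct / abs(max_dd_pct); undefined when drawdown is zero."""
    if max_dd == 0:
        raise ValueError("MAR is undefined for a zero drawdown")
    return cagr_pct / abs(max_dd)
```

For the curve 100, 120, 90, 130, 117 the worst drawdown is the fall from 120 to 90, which is -25 %; the later dip from 130 to 117 is only -10 %.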
15. sharpe
- Type: float
- Units: dimensionless ratio (annualized)
- Definition: Annualized Sharpe ratio of daily equity returns. Risk-free rate assumed zero (the convention in the Pennant backtest harness: both the strategy and the benchmark would see the same risk-free rate, so it cancels in apples-to-apples comparisons).
- Empty vs zero: empty for non-backtests. Zero would mean measured mean return = 0 with positive variance.
- Caveats: Sharpe penalizes upside variance equally with downside variance, so a strategy with frequent +30 % winners can score worse on Sharpe than a strategy with steady +5 % winners. For breakout strategies prefer the Sortino ratio (when reported in key_metric) or the MAR ratio.
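For orientation, a zero-risk-free Sharpe ratio on daily returns can be sketched as below. The annualization factor of 252 trading days and the use of the sample standard deviation are assumptions; the glossary does not state which conventions the harness uses.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS_PER_YEAR = 252  # assumption: not specified in the glossary


def annualized_sharpe(daily_returns: Sequence[float]) -> float:
    """Annualized Sharpe ratio with the risk-free rate taken as zero."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns")
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("Sharpe is undefined for zero variance")
    return mean / stdev * math.sqrt(TRADING_DAYS_PER_YEAR)
```

The standard deviation in the denominator treats large gains and large losses alike, which is the source of the upside-variance penalty noted in the caveats.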
16. profit_factor
- Type: float
- Units: dimensionless ratio
- Definition: Sum of profits divided by absolute sum of losses across all closed trades. Values > 1.0 mean profitable; intuition: a PF of 1.5 means $1.50 won per $1.00 lost.
- Empty vs zero: empty for non-backtests. Zero would mean no winning trades — implausible but theoretically possible.
- Caveats: Not reported by the PEN-TEST-004 harness — that test's headline summary tracks Sharpe and max-DD instead. Left empty in all current rows; future backtests should populate.
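Since future backtests are expected to populate this column, a minimal sketch of the calculation may help. It assumes per-trade P&L values are available as a flat list; returning None when there are no losing trades is an illustrative choice, as the glossary does not say how that case should be recorded.

```python
from typing import Optional, Sequence


def profit_factor(trade_pnls: Sequence[float]) -> Optional[float]:
    """Sum of profits divided by the absolute sum of losses."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return None  # assumption: undefined without any losing trade
    return gross_profit / gross_loss
```

A list of 150.0 and -100.0 gives 1.5, matching the "$1.50 won per $1.00 lost" reading, and a list with only losses gives 0.0.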
17. mean_mfe_pct
- Type: float
- Units: percent
- Definition: Mean of the forward 30-trading-day Maximum Favorable Excursion across the cohort. MFE = highest forward close vs anchor close, in percent. Available for detection-only and population tests; not meaningful for backtests (the strategy may exit before MFE is reached).
- Empty vs zero: empty for backtest rows. Zero would mean no event in the cohort ever reached a positive forward close — not observed.
- Caveats: Heavily skewed by penny-stock pops in the right tail (max observed: +1,795 %). Use the median (in the report) for a more robust central value; the mean reflects the tail.
18. mean_mae_pct
- Type: float (negative)
- Units: percent
- Definition: Mean of the forward 30-trading-day Maximum Adverse Excursion. MAE = lowest forward close vs anchor close, in percent (so usually negative).
- Empty vs zero: empty for backtest rows. Zero would mean no event ever closed below anchor in the next 30 trading days (not observed).
- Caveats: Like mean_mfe_pct, skewed by its tail, here the left one (min observed: -98.6 %). The median is more robust. Note that an event with positive MAE means the stock never closed below anchor; it happens (~25 % of huge-winner cluster patterns).
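The per-event MFE and MAE behind these two means can be sketched as follows. The sketch assumes the caller passes the closes for the forward window (up to 30 trading days, anchor day excluded); that windowing detail and the function name are assumptions.

```python
from typing import Sequence, Tuple


def mfe_mae_pct(anchor_close: float, forward_closes: Sequence[float]) -> Tuple[float, float]:
    """Return (MFE, MAE) in percent relative to the anchor close."""
    if not forward_closes:
        raise ValueError("no forward closes; event has no forward data")
    mfe = (max(forward_closes) / anchor_close - 1.0) * 100.0
    mae = (min(forward_closes) / anchor_close - 1.0) * 100.0
    return mfe, mae
```

With an anchor of 10.0 and forward closes of 10.5 and 11.0, the MAE comes out positive (about +5 %), which is the "never closed below anchor" case described in the caveats.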
19. hit_rate_15pct_mfe
- Type: float
- Units: percent
- Definition: % of events in the cohort whose MFE reached ≥ +15 % at any point in the forward 30-trading-day window. The "did the move happen at all?" metric, independent of whether it stuck.
- Empty vs zero: empty for backtest rows. Zero would mean not a single event in the cohort hit +15 %.
- Caveats: The 15 % threshold matches the PEN-STOCK-001 Leg 1 exit. Other reports may quote +5 %, +10 %, +20 %, etc.; only the 15 % column lives in the matrix. The companion give-back metric (median 68 %; see PEN-TEST-005's key_metric) tells you that hit-rate ≠ retained gain.
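Given per-event MFE values, the hit rate is a thresholded count. A minimal sketch with the threshold as a parameter, so the same function covers the +5 %, +10 %, or +20 % variants quoted in other reports; the function name and sample values are illustrative.

```python
from typing import Sequence


def hit_rate_pct(mfe_values: Sequence[float], threshold_pct: float = 15.0) -> float:
    """Percent of events whose MFE reached at least threshold_pct."""
    if not mfe_values:
        raise ValueError("cohort is empty")
    hits = sum(1 for mfe in mfe_values if mfe >= threshold_pct)
    return hits / len(mfe_values) * 100.0
```

The comparison is inclusive, so an event whose MFE is exactly 15.0 counts as a hit.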
20. key_metric
- Type: string
- Format: semicolon-separated name: value pairs, free-form per test
- Definition: Flexible field for whatever's most distinctive about the test that the standard columns don't capture. Examples:
  - Population analyses: huge_winner_share_cluster2: 2.8%; give_back_median: 68%
  - Detection variants: per_pattern_expectancy: +4.9% vs baseline
  - Future Rule-1/Rule-2 tests: rule1_population_size: 4521; rule1_lift_vs_baseline: +X pp
- Empty vs zero: empty when the standard columns capture everything important.
- Caveats: Not machine-parsed. If a value here becomes important enough to compare across tests, it earns a dedicated column.
21. report_link
- Type: string
- Format: relative path from Pennant/ root
- Definition: Path to the canonical detailed report markdown. Example: reports/pennant_strategy_backtest_2026-05-11.md.
- Empty vs zero: never empty for completed tests.
- Caveats: Always relative to Pennant/, never absolute, never with a leading slash. The same report can be referenced by multiple rows (PEN-TEST-004a–e all link to reports/pennant_strategy_backtest_2026-05-11.md).
22. charts
- Type: string
- Format: semicolon-separated relative paths from Pennant/ root (no spaces around the ;)
- Definition: Charts associated with this row. In the rendered HTML / markdown publishing layer these become numbered hyperlinks ([1], [2], …). One row can carry many charts (PEN-TEST-005 has 8). All paths must point to files that exist under charts/.
- Empty vs zero: empty if the test produced no chart (Phase 11a/a-2/a-3 detection-only A/B reports are pure text). Never the literal string none.
- Caveats: Validator does not currently check that the files exist on disk. If you mistype a path, the reference is silently broken. Use the same path the report uses (relative to Pennant/).
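Because the validator does not check chart files, a manual check before committing a row can catch mistyped paths. A minimal sketch, assuming the caller knows the location of the Pennant/ root; missing_charts is a hypothetical helper, not a function in update_matrix.py.

```python
from pathlib import Path
from typing import List


def missing_charts(charts_cell: str, pennant_root: Path) -> List[str]:
    """Return the chart paths in a charts cell that do not exist on disk."""
    if not charts_cell:
        return []  # empty cell: the test produced no chart
    # Paths are semicolon-separated with no surrounding spaces.
    return [
        rel_path
        for rel_path in charts_cell.split(";")
        if not (pennant_root / rel_path).is_file()
    ]
```

An empty list means every referenced chart was found; any returned path is a broken reference to fix in the row.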
23. notes
- Type: string
- Format: free-form, may contain commas (CSV-quoted on write)
- Definition: Anything else worth recording that doesn't fit the structured columns. Constraints / caveats specific to this row; reasons a number looks weird; references to related tests; failure-mode commentary.
- Empty vs zero: empty when nothing else needs saying.
- Caveats: Not machine-parsed. Keep prose terse; longer context belongs in the report. Avoid duplicating information already in key_metric or purpose.
Conventions across all columns
- Empty cells indicate "doesn't apply" or "not measured", not "missing data" and not "zero". The publishing layer renders empty cells as blank, not as n/a.
- All percentages are decimal values, not fractions (43.7 means 43.7 %, not 4,370 %).
- All dollar values are raw numbers (40398, not $40,398 or 40,398.00).
- All paths are relative to Pennant/, no leading slash, forward slashes only.
- All IDs are case-sensitive: PEN-TEST-001a ≠ pen-test-001a.
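When reading the matrix in code, the empty-versus-zero convention is easiest to preserve by mapping empty cells to None instead of 0. A minimal sketch using the standard csv module; the two sample rows use made-up run IDs and are not real matrix rows.

```python
import csv
import io
from typing import Optional


def parse_float_cell(cell: str) -> Optional[float]:
    """Empty cell -> None ("doesn't apply"); anything else -> float."""
    return None if cell == "" else float(cell)


# Hypothetical rows for illustration only.
sample = io.StringIO(
    "run_id,win_rate,final_equity\n"
    "PEN-TEST-900a,,\n"
    "PEN-TEST-900b,0,40398\n"
)
parsed = [
    (row["run_id"], parse_float_cell(row["win_rate"]), parse_float_cell(row["final_equity"]))
    for row in csv.DictReader(sample)
]
```

The first row keeps None for both numeric cells, while the second keeps a measured win rate of 0.0, so the two cases stay distinguishable downstream.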
Updating the schema
If a new column is needed: edit SCHEMA in
infra/update_matrix.py; migrate existing rows by reading the CSV
into Python, adding the new key with appropriate defaults, and writing
back; then add a glossary entry here. The _validate() function in
update_matrix.py rejects rows with extra or missing keys, so the
CSV and schema must stay in lockstep.
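The row-migration step can be sketched with the standard csv module. This is an illustration under assumptions: add_column is a hypothetical helper, the new column is appended as the last field, and the sketch does not touch SCHEMA or _validate(), whose internals are not shown here.

```python
import csv


def add_column(csv_path: str, new_key: str, default: str = "") -> None:
    """Add new_key to every row of the CSV, filled with default."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    if new_key in fieldnames:
        raise ValueError(f"column already exists: {new_key}")

    fieldnames.append(new_key)
    for row in rows:
        row[new_key] = default  # empty string keeps the "doesn't apply" meaning

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # values containing commas are quoted on write
```

Defaulting to an empty string and not to 0 follows the empty-versus-zero convention: existing rows never measured the new quantity.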