Pennant results matrix — column glossary

Authoritative definition of every column in results_matrix.csv (23 columns). Each entry gives the type, a plain-English meaning, units or format, when the cell is empty versus zero, and caveats worth knowing before reading or filling the column. Where a column references another artifact (registry, report), the entry points to it.

This glossary is the source of truth for column semantics. If the glossary and a row disagree, the glossary wins and the row is buggy — fix the row.


1. run_id

  • Type: string
  • Format: PEN-TEST-NNN or PEN-TEST-NNN<a-z> for sub-rows
  • Definition: Unique identifier for one test row. Sequential across all tests, reserved before the test runs. A single conceptual test that compares multiple cohorts or variants gets one base PEN-TEST-NNN plus suffix letters per sub-row (PEN-TEST-004a, 004b, … 004e). The suffix is lowercase.
  • Empty vs zero: never empty. Required field.
  • Caveats: Never reused. Skipping an ID is fine; reusing one breaks the audit trail. The matching test directory at tests/<date>_<run_id>/ may collapse multiple sub-rows into one directory (PEN-TEST-004a–e share tests/2026-05-11_PEN-TEST-004/); the suffix is for the matrix, not the filesystem.
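
The format rule above can be checked mechanically. The following is a minimal sketch, assuming NNN is always exactly three digits (as in every example in this glossary); the names RUN_ID_RE and split_run_id are illustrative and not part of infra/update_matrix.py.

```python
import re

# Pattern from the Format line: PEN-TEST-NNN plus an optional lowercase suffix.
# Assumption: NNN is exactly three digits.
RUN_ID_RE = re.compile(r"PEN-TEST-(\d{3})([a-z]?)")

def split_run_id(run_id: str) -> tuple[str, str]:
    """Return (base_id, suffix); raise ValueError on a malformed ID."""
    match = RUN_ID_RE.fullmatch(run_id)
    if match is None:
        raise ValueError(f"malformed run_id: {run_id!r}")
    return f"PEN-TEST-{match.group(1)}", match.group(2)
```

Splitting off the suffix also yields the base ID that sub-rows share when their test directories are collapsed into one.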

2. run_date

  • Type: string
  • Format: ISO date YYYY-MM-DD
  • Definition: Date the test produced its final headline result. For multi-day tests, the completion date. For sub-rows produced by the same execution, all share the same date.
  • Empty vs zero: never empty. Required field.
  • Caveats: Not the date the row was added to the matrix. If a result is back-filled later, the original run date is used.

3. purpose

  • Type: string
  • Format: short phrase, sentence-case, no trailing period
  • Definition: Plain-English answer to "what were we trying to learn?" One row of this column should be intelligible without reading the report.
  • Empty vs zero: never empty. Required field.
  • Caveats: Keep terse; full context lives in the report. Avoid result claims here ("V2 is best") — those go in notes or key_metric once measured.

4. period

  • Type: string
  • Format: YYYY-YYYY (e.g. 2007-2026)
  • Definition: Calendar window the test covers. For detection scans, the date range of events emitted; for backtests, the date range of equity-curve simulation.
  • Empty vs zero: never empty.
  • Caveats: Year resolution only. The actual scan/backtest date range may cover a partial year on either end (for example, 2026 data runs through May 8 only). The report contains the precise range.

5. detection_id

  • Type: string
  • Format: PEN-DET-<label> (lowercase short label, no embedded date)
  • Definition: ID of the detection-parameter set that produced the event cohort consumed by this row. References an entry in the Detection-variants table of strategies/Pennant/registry.md. The corresponding cohort parquets live at cohorts/DET-<UPPER>-<scan-date>/.
  • Empty vs zero: never empty for any test (everything consumes some detection variant; even baseline counts).
  • Caveats: Detection IDs are parameter-set IDs, not cohort IDs. If the same parameter set is re-scanned (different scan date), the detection ID stays the same; the cohort directory gets a new scan date in its directory name. The registry shows the latest cohort per detection ID.
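
The relationship between a detection ID and its cohort directory might be sketched as below. This assumes <UPPER> is simply the uppercased label and that the scan date is passed in as an already-formatted string; neither detail is confirmed by the registry, and cohort_dir is a hypothetical helper.

```python
def cohort_dir(detection_id: str, scan_date: str) -> str:
    """Build the cohort directory for one scan of a detection-parameter set.

    Assumption: the directory label is the detection label uppercased.
    """
    prefix = "PEN-DET-"
    if not detection_id.startswith(prefix):
        raise ValueError(f"not a detection ID: {detection_id!r}")
    label = detection_id[len(prefix):]
    return f"cohorts/DET-{label.upper()}-{scan_date}/"
```

Re-scanning the same parameter set changes only the scan_date argument, which is why one detection ID can map to several cohort directories.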

6. strategy_id

  • Type: string
  • Format: PEN-<asset>-<NNN> (e.g. PEN-STOCK-001, PEN-OPT-001)
  • Definition: ID of the trading strategy variant simulated. Each has a locked spec at strategies/<id>.md registered in strategies/Pennant/registry.md.
  • Empty vs zero: empty for detection-only tests and population analyses (no strategy simulated). Required for backtests.
  • Caveats: A change to mechanics — different sizing, different exit thresholds, options overlay — requires a new strategy ID, not a parameter on the existing one. This is the lock-once-write rule.

7. precursor

  • Type: string
  • Format: one of none, rule1, rule2, rule1_or_rule2
  • Definition: Precursor filter applied at entry / event time. Rule 1 (Momentum) and Rule 2 (Breakout) are the surviving 5-y / 10-y / 20-y precursor profiles from the original Phase 1 pennant findings; documented in build_v1/reports/findings_report.md and the trading action plan. none means the unconditional population. rule1_or_rule2 means at least one of the two rules fires.
  • Empty vs zero: never empty; none is the explicit value.
  • Caveats: None of the Pennant-era tests (PEN-TEST-001..005) have used precursor filtering yet — every row is none. Future tests that apply Rule 1/2 will populate this column.

8. regime_filter

  • Type: string
  • Format: one of none, spy200+vix35, vix_vvix
  • Definition: Market-regime gate applied at entry time.
  • none — entries unconditional on market state.
  • spy200+vix35 — skip entries on days where SPY < SPY-200-SMA and VIX > 35 (the Phase 7 "circuit breaker" gate; 240 such days in the 2007–2026 calendar).
  • vix_vvix — placeholder for the VIX × VVIX joint gate explored in the VolGap call-only family; not yet used in any Pennant-line test.
  • Empty vs zero: never empty; none is the explicit value.
  • Caveats: Detection-only and population-analysis tests are none because the regime gate is a strategy concept, not a detector one. Backtests in PEN-TEST-004 use spy200+vix35.
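
As an illustration of the gate logic, here is a minimal sketch of an entry check. The function name and its arguments are hypothetical; only the spy200+vix35 condition comes from the definition above, and vix_vvix is left unimplemented because this glossary does not define its thresholds.

```python
def entry_allowed(regime_filter: str, spy_close: float,
                  spy_sma200: float, vix: float) -> bool:
    """Return True if an entry may be taken on a day with these readings."""
    if regime_filter == "none":
        return True
    if regime_filter == "spy200+vix35":
        # Skip only when BOTH conditions hold: SPY below its 200-day SMA and VIX above 35.
        return not (spy_close < spy_sma200 and vix > 35)
    if regime_filter == "vix_vvix":
        raise NotImplementedError("vix_vvix is a placeholder with no defined thresholds")
    raise ValueError(f"unknown regime_filter: {regime_filter!r}")
```

Note that a high VIX alone, or SPY below its average alone, does not block an entry.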

9. trades

  • Type: int
  • Units: count (no thousands separators in CSV)
  • Definition: Polysemous — meaning depends on test type:
  • Backtest rows: number of trades taken (= entries that passed cash + regime gates). Not the number of detected events; not the number of cohort rows.
  • Detection-only rows: number of pennants detected. Equal to the row count of the cohort's events.parquet.
  • Population analyses: number of patterns analyzed (= cohort rows with usable forward outcomes). For PEN-TEST-005, this is 15,528 — 6 fewer than the 15,534 detected events, because 6 had no forward data.
  • Empty vs zero: never empty for retroactive rows; future rows could be empty if the test is a pure documentation exercise.
  • Caveats: Do not compare across test types without understanding the units. When a backtest reports 4,533 trades against a detection's 5,155 events, the gap reflects skipped entries (no-cash, regime-gated), not a quality difference.

10. win_rate

  • Type: float
  • Units: percent (decimal value, e.g. 43.7 for 43.7 %)
  • Definition: % of completed trades that closed profitable (P&L > 0). Computed per the canonical backtest harness; the exact accounting (after-friction vs gross, partial fills as separate trades or aggregated) is whatever the report defines.
  • Empty vs zero: empty for non-backtests (no trades to win or lose). Zero would mean the strategy ran 1+ trades and every one lost — a measured outcome, distinct from "doesn't apply".
  • Caveats: Breakout / continuation strategies typically have win rates in the 40–55 % range — this is structural, not a defect. The PEN-STOCK-001 scaled-exit takes a half-exit at +15 % and trails the runner, so the per-trade outcome is asymmetric. Win-rate alone is not a quality metric for this class of strategy; pair it with profit_factor or sharpe.

11. cagr_pct

  • Type: float
  • Units: percent (annualized)
  • Definition: Compound annual growth rate of equity over the period. Computed as ((final_equity / starting_capital) ^ (1 / years) − 1) × 100.
  • Empty vs zero: empty for non-backtests. Zero means a measured CAGR of 0 % (strategy ran but ended at starting capital).
  • Caveats: Sensitive to choice of starting capital and to whether dividends / cash drift are included. The Pennant-line backtests use $10K starting capital and ignore SPY drift on idle cash. Compare CAGRs only across backtests with matching capital + cash conventions.

12. total_return_pct

  • Type: float
  • Units: percent (cumulative, not annualized)
  • Definition: (final_equity − starting_capital) / starting_capital × 100. The headline "what did $10K become" number.
  • Empty vs zero: empty for non-backtests. Zero means the strategy returned exactly the starting capital.
  • Caveats: Like cagr_pct, depends on starting capital and cash drift. Useful side-by-side with CAGR for sanity check: (1 + total/100) ^ (1/years) should equal 1 + cagr/100.
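
The sanity check between cagr_pct and total_return_pct can be sketched as follows. The function names and the tolerance are assumptions, and the figures in the usage note are hypothetical rather than taken from any row.

```python
def total_return_pct(final_equity: float, starting_capital: float) -> float:
    """Cumulative return in percent."""
    return (final_equity - starting_capital) / starting_capital * 100.0

def cagr_pct(final_equity: float, starting_capital: float, years: float) -> float:
    """Compound annual growth rate in percent."""
    return ((final_equity / starting_capital) ** (1.0 / years) - 1.0) * 100.0

def is_consistent(total_pct: float, cagr: float, years: float,
                  tol: float = 1e-6) -> bool:
    """Check (1 + total/100) ^ (1/years) == 1 + cagr/100 within a tolerance."""
    return abs((1.0 + total_pct / 100.0) ** (1.0 / years) - (1.0 + cagr / 100.0)) < tol
```

For example, a hypothetical $10,000 growing to $40,000 over 2 years is a 300 % total return and a 100 % CAGR, and is_consistent(300.0, 100.0, 2) holds.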

13. final_equity

  • Type: float
  • Units: dollars (no $ sign, no comma, raw number)
  • Definition: Equity at the end of the period. The dollar amount $10K turned into.
  • Empty vs zero: empty for non-backtests. Zero would mean a wipeout (PEN-STOCK-001 can't wipe out — the worst case is a drift down with no leverage).
  • Caveats: All Pennant-line backtests start at $10,000. Future variants might start elsewhere; if so, document in notes.

14. max_dd_pct

  • Type: float (negative)
  • Units: percent
  • Definition: Largest peak-to-trough drawdown in equity over the period, expressed as a negative percent of the high-water mark at the peak.
  • Empty vs zero: empty for non-backtests. Zero would mean the equity curve was monotonically non-decreasing — possible but unlikely over a 19-year window.
  • Caveats: Drawdown is path-dependent. Two strategies with identical CAGR can have very different max-DD. Pair with CAGR to compute the MAR ratio (cagr_pct / abs(max_dd_pct)) — values above 0.5 are good, above 1.0 are exceptional, below 0.2 mean the equity curve has poor sequence properties.
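
A minimal sketch of the drawdown and MAR calculations, assuming the equity curve is a non-empty list of positive values in chronological order (the harness's actual sampling frequency is not stated here):

```python
def max_drawdown_pct(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a negative percent of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # high-water mark so far
        worst = min(worst, (value - peak) / peak * 100.0)
    return worst

def mar_ratio(cagr_pct: float, max_dd_pct: float) -> float:
    """CAGR divided by the absolute max drawdown (undefined when drawdown is 0)."""
    return cagr_pct / abs(max_dd_pct)
```

With the hypothetical curve [100, 120, 90, 130, 117] the worst decline is from 120 to 90, so the function returns -25.0 even though the curve later makes a new high.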

15. sharpe

  • Type: float
  • Units: dimensionless ratio (annualized)
  • Definition: Annualized Sharpe ratio of daily equity returns. Risk-free rate assumed zero (the convention in the Pennant backtest harness; the strategy and the benchmark would both see the same risk-free rate, so it cancels in apples-to-apples comparisons).
  • Empty vs zero: empty for non-backtests. Zero would mean measured mean return = 0 with positive variance.
  • Caveats: Sharpe penalizes upside variance equally with downside variance — a strategy with frequent +30 % winners scores worse on Sharpe than a strategy with steady +5 % winners. For breakout strategies prefer the Sortino ratio (when reported in key_metric) or the MAR ratio.
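
One common way to compute an annualized Sharpe ratio with a zero risk-free rate is sketched below. The 252-trading-day year, the sample standard deviation, and the square-root scaling are conventional assumptions; this glossary does not confirm which the Pennant harness uses.

```python
import math
import statistics

def annualized_sharpe(daily_returns: list[float], periods_per_year: int = 252) -> float:
    """Mean daily return over its sample standard deviation, scaled to a year.

    Assumes a zero risk-free rate and at least two returns with non-zero variance.
    """
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(periods_per_year)
```

Because the denominator uses all deviations from the mean, large winners raise it just as large losers do, which is the upside-variance penalty described above.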

16. profit_factor

  • Type: float
  • Units: dimensionless ratio
  • Definition: Sum of profits divided by absolute sum of losses across all closed trades. Values > 1.0 mean profitable; intuition: a PF of 1.5 means $1.50 won per $1.00 lost.
  • Empty vs zero: empty for non-backtests. Zero would mean no winning trades — implausible but theoretically possible.
  • Caveats: Not reported by the PEN-TEST-004 harness — that test's headline summary tracks Sharpe and max-DD instead. Left empty in all current rows; future backtests should populate.

17. mean_mfe_pct

  • Type: float
  • Units: percent
  • Definition: Mean of the forward 30-trading-day Maximum Favorable Excursion across the cohort. MFE = highest forward close vs anchor close, in percent. Available for detection-only and population tests; not meaningful for backtests (the strategy may exit before MFE is reached).
  • Empty vs zero: empty for backtest rows. Zero would mean no event in the cohort ever reached a positive forward close — not observed.
  • Caveats: Heavily skewed by penny-stock pops in the right tail (max observed: +1,795 %). Use the median (in the report) for a more robust central value; the mean reflects the tail.

18. mean_mae_pct

  • Type: float (negative)
  • Units: percent
  • Definition: Mean of the forward 30-trading-day Maximum Adverse Excursion. MAE = lowest forward close vs anchor close, in percent (so usually negative).
  • Empty vs zero: empty for backtest rows. Zero would mean no event ever closed below anchor in the next 30 days — not observed.
  • Caveats: Like mean_mfe_pct, has a left tail (min observed: -98.6 %). The median is more robust. Note that an event with positive MAE means the stock never closed below anchor — it happens (~25 % of huge-winner cluster patterns).
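
The per-event MFE and MAE definitions can be sketched as below, assuming the forward closes (up to 30 trading days) are available as a non-empty list. The column values are then the means of these per-event figures across the cohort.

```python
def mfe_mae_pct(anchor_close: float, forward_closes: list[float]) -> tuple[float, float]:
    """Return (MFE, MAE) in percent relative to the anchor close."""
    mfe = (max(forward_closes) - anchor_close) / anchor_close * 100.0
    mae = (min(forward_closes) - anchor_close) / anchor_close * 100.0
    return mfe, mae
```

If every forward close is above the anchor, the MAE comes out positive, which is the case the caveat above describes.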

19. hit_rate_15pct_mfe

  • Type: float
  • Units: percent
  • Definition: % of events in the cohort whose MFE reached ≥ +15 % at any point in the forward 30-trading-day window. The "did the move happen at all?" metric, independent of whether it stuck.
  • Empty vs zero: empty for backtest rows. Zero would mean not a single event in the cohort hit +15 %.
  • Caveats: The 15 % threshold matches the PEN-STOCK-001 Leg 1 exit. Other reports may quote +5 %, +10 %, +20 %, etc.; only the 15 % column lives in the matrix. The companion give-back metric (median 68 % — see PEN-TEST-005's key_metric) tells you that hit-rate ≠ retained gain.

20. key_metric

  • Type: string
  • Format: semicolon-separated name: value pairs, free-form per test
  • Definition: Flexible field for whatever's most distinctive about the test that the standard columns don't capture. Examples:
  • Population analyses: huge_winner_share_cluster2: 2.8%; give_back_median: 68%
  • Detection variants: per_pattern_expectancy: +4.9% vs baseline
  • Future Rule-1/Rule-2 tests: rule1_population_size: 4521; rule1_lift_vs_baseline: +X pp
  • Empty vs zero: empty when the standard columns capture everything important.
  • Caveats: Not machine-parsed. If a value here becomes important enough to compare across tests, it earns a dedicated column.

21.

  • Type: string
  • Format: relative path from Pennant/ root
  • Definition: Path to the canonical detailed report markdown. Example: reports/pennant_strategy_backtest_2026-05-11.md.
  • Empty vs zero: never empty for completed tests.
  • Caveats: Always relative to Pennant/, never absolute, never with a leading slash. The same report can be referenced by multiple rows (PEN-TEST-004a–e all link to reports/pennant_strategy_backtest_2026-05-11.md).

22. charts

  • Type: string
  • Format: semicolon-separated relative paths from Pennant/ root (no spaces around the ;)
  • Definition: Charts associated with this row. In the rendered HTML / markdown publishing layer these become numbered hyperlinks ([1], [2], …). One row can carry many charts (PEN-TEST-005 has 8). All paths must point to files that exist under charts/.
  • Empty vs zero: empty if the test produced no chart (Phase 11a/a-2/a-3 detection-only A/B reports are pure text). Never the literal string none.
  • Caveats: Validator does not currently check that the files exist on disk. If you mistype a path, the reference is silently broken. Use the same path the report uses (relative to Pennant/).
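
Until the validator checks paths, a small standalone check can catch mistyped references. This is a sketch only: missing_charts is a hypothetical helper, not part of infra/update_matrix.py, and it assumes the caller supplies the location of the Pennant/ root.

```python
from pathlib import Path

def missing_charts(charts_cell: str, pennant_root: Path) -> list[str]:
    """Return the chart paths in one cell that do not exist under pennant_root."""
    if not charts_cell:
        return []  # empty cell: the test produced no charts
    return [rel for rel in charts_cell.split(";")
            if not (pennant_root / rel).is_file()]
```

Running this over every row before publishing would surface broken references that otherwise fail silently.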

23. notes

  • Type: string
  • Format: free-form, may contain commas (CSV-quoted on write)
  • Definition: Anything else worth recording that doesn't fit the structured columns. Constraints / caveats specific to this row; reasons a number looks weird; references to related tests; failure-mode commentary.
  • Empty vs zero: empty when nothing else needs saying.
  • Caveats: Not machine-parsed. Keep prose terse; longer context belongs in the report. Avoid duplicating information already in key_metric or purpose.

Conventions across all columns

  • Empty cells indicate "doesn't apply" or "not measured", not "missing data" and not "zero". The publishing layer renders empty cells as blank, not as n/a.
  • All percentages are stored as percent values, not fractions: 43.7 means 43.7 % (reading it as a fraction would give 4,370 %).
  • All dollar values are raw numbers (40398, not $40,398 or 40,398.00).
  • All paths are relative to Pennant/, no leading slash, forward slashes only.
  • All IDs are case-sensitive: PEN-TEST-001a ≠ pen-test-001a.

Updating the schema

If a new column is needed: (1) edit SCHEMA in infra/update_matrix.py; (2) migrate existing rows by reading the CSV into Python, adding the new key with an appropriate default, and writing the file back; (3) add a glossary entry here. The _validate() function in update_matrix.py rejects rows with extra or missing keys, so the CSV and the schema must stay in lockstep.
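
The row-migration step might look like the sketch below. It assumes the new column is appended at the end and that an empty string is a suitable default; add_column and migrate_csv are hypothetical helpers, and editing SCHEMA in infra/update_matrix.py remains a separate manual step.

```python
import csv

def add_column(fieldnames: list[str], rows: list[dict], new_key: str,
               default: str = "") -> tuple[list[str], list[dict]]:
    """Return new fieldnames and rows with new_key added to every row."""
    if new_key in fieldnames:
        raise ValueError(f"column already exists: {new_key}")
    return [*fieldnames, new_key], [{**row, new_key: default} for row in rows]

def migrate_csv(path: str, new_key: str, default: str = "") -> None:
    """Read the matrix CSV, add the new column to all rows, and write it back."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    fieldnames, rows = add_column(fieldnames, rows, new_key, default)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Separating the pure add_column step from the file I/O makes the migration easy to test before touching the real CSV.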