# Player Prop Pipeline Audit

Audited at: approximately 2026-04-02 00:00 ET, using live repo state and live database state from the April 1, 2026 slate.

## Pipeline Map

1. Upstream event + lineup ingestion
   - Events come from `rm_events`.
   - Lineups come from `GameLineup`.
   - MLB lineups are synced by [sync-mlb-lineups.ts](/var/www/html/rainmaker/backend/src/scripts/sync-mlb-lineups.ts).
   - NBA lineups are scraped externally before forecast runs.
   - Canonical rosters and name resolution come from [canonical-names.ts](/var/www/html/rainmaker/backend/src/services/canonical-names.ts) and `/home/administrator/rainmaker_players.json`.

2. Candidate generation
   - Cross-sport candidates come from [team-prop-market-candidates.ts](/var/www/html/rainmaker/backend/src/services/team-prop-market-candidates.ts).
   - MLB candidates come from [mlb-prop-markets.ts](/var/www/html/rainmaker/backend/src/services/mlb-prop-markets.ts).

3. Precompute generation
   - Daily precompute runs through [weather-report.ts](/var/www/html/rainmaker/backend/src/workers/weather-report.ts).
   - Team bundles are generated first.
   - `PLAYER_PROP` rows are extracted from team bundles and stored into `rm_forecast_precomputed`.
   - The worker expects at least `5` player props per team and `10` per event.

4. Serve-time filtering + fallback
   - Event-card readiness is computed in [events.ts](/var/www/html/rainmaker/backend/src/routes/events.ts).
   - Forecast popup data is served from [forecast.ts](/var/www/html/rainmaker/backend/src/routes/forecast.ts).
   - When no visible stored props exist, the route seeds `featuredPlayers` from lineups and tries to attach source-backed fallback props.

5. Frontend consumption
   - Event badges render in [EventCard.tsx](/var/www/html/rainmaker/src/components/EventCard.tsx).
   - Popup per-player table already exists in [ForecastPopup.tsx](/var/www/html/rainmaker/src/components/ForecastPopup.tsx).
   - Clips are still wired to radar cards, not the per-player signal table.

## What The Audit Found

### 1. Markets exist. Inventory is not the main upstream problem.

Candidate audit run:

```bash
npm --prefix /var/www/html/rainmaker/backend run audit:player-prop-candidates -- --date 2026-04-01 --include-mlb --no-fail-on-findings
```

Result:

- 27 events
- 54 teams
- 0 zero-candidate teams
- 0 zero-candidate events
- Avg 23 candidates per team
- MLB: 151 two-way candidates, 258 one-way candidates
- NBA: 1 two-way candidate, 745 one-way candidates
- NHL: 0 two-way candidates, 87 one-way candidates

Conclusion:

- There is enough candidate volume to surface more players.
- The main failure is downstream of candidate discovery.
- NBA and NHL candidates are almost entirely one-way. If the pipeline implicitly weights two-way market quality too heavily, it will underproduce despite plenty of candidate rows.

### 2. Persisted `PLAYER_PROP` output is far below the worker's own floor.

April 1 stored `PLAYER_PROP` counts:

- MLB: 15 events, only 1 met the worker minimum of 5 home + 5 away + 10 total.
- NBA: 9 events, 0 met the worker minimum.
- NHL: 3 events, 0 met the worker minimum.

Raw table summary:

- `PLAYER_PROP EXPIRED`: 173
- `PLAYER_PROP STALE`: 3
- `TEAM_PROPS EXPIRED`: 53
- `TEAM_PROPS STALE`: 1

Example hard failure:

- `mlb-cws-mia-20260401` had 2 `TEAM_PROPS` rows and 1 `GAME_MARKETS` row but 0 `PLAYER_PROP` rows.

Conclusion:

- Team bundle generation is landing.
- Player-prop extraction or persistence is still failing frequently after team props exist.
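
The floor used in the counts above (5 home + 5 away + 10 total) can be written as a small check. This is a minimal sketch, not the worker's actual code: `PropCounts`, `meetsPlayerPropFloor`, and `eventsBelowFloor` are hypothetical names, and the per-side counts are assumed to be available from a query over `rm_forecast_precomputed`.

```typescript
// Hypothetical shape: stored PLAYER_PROP counts for one event, split by side.
interface PropCounts {
  eventId: string;
  home: number;
  away: number;
}

// Floor values taken from the audit text: 5 per team, 10 per event.
const MIN_PER_TEAM = 5;
const MIN_PER_EVENT = 10;

function meetsPlayerPropFloor(c: PropCounts): boolean {
  return (
    c.home >= MIN_PER_TEAM &&
    c.away >= MIN_PER_TEAM &&
    c.home + c.away >= MIN_PER_EVENT
  );
}

// Returns the event ids that miss the floor, for use in an audit report.
function eventsBelowFloor(counts: PropCounts[]): string[] {
  return counts.filter((c) => !meetsPlayerPropFloor(c)).map((c) => c.eventId);
}
```

An event such as `mlb-cws-mia-20260401`, with 0 `PLAYER_PROP` rows on both sides, would be returned by `eventsBelowFloor`.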

### 3. The lineup feed is polluted, and roster validation alone is not enough.

Example: `mlb-tex-bal-20260401`

- `GameLineup` for `BAL/TEX` included names like `Brandon Nimmo`, `Andrew McCutchen`, `Pete Alonso`, and `Taylor Ward`.
- Both `rotowire` and `mlb_statsapi` rows contained the same polluted names for that game.

Worse:

- Canonical roster data is also dirty.
- `TEX` currently includes `Brandon Nimmo`.
- `BAL` currently includes `Pete Alonso` and `Taylor Ward`.

Conclusion:

- Source preference alone will not fix MLB lineup integrity.
- Roster-only validation is too weak because the roster source is contaminated too.

### 4. The route currently trusts the latest lineup row too much.

[forecast.ts](/var/www/html/rainmaker/backend/src/routes/forecast.ts) fetches a single `GameLineup` row, ordered by most recent `updatedAt`.

That means:

- whichever lineup source wrote last wins
- there is no source ranking
- there is no sanity scoring for impossible player/team combinations
- polluted lineup rows seed `featuredPlayers`

Conclusion:

- Radar quality is currently gated by `GameLineup` quality, not just prop quality.

### 5. Event-card and popup readiness can drift from true pipeline health.

[events.ts](/var/www/html/rainmaker/backend/src/routes/events.ts) derives card readiness from visible priced `PLAYER_PROP` rows after filtering.

That means:

- if extraction underproduces, the card looks empty even when candidates exist
- if rows exist but are filtered out, the badge downgrades to `empty_after_run`
- there is no first-class instrumentation telling us whether the failure was caused by:
  - no candidates
  - no extracted rows
  - unpriced rows
  - injury filtering
  - sibling suppression
  - dedup

Conclusion:

- The current API shape hides the real failure stage.
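
One way to expose the failure stage is to classify each event by the first stage at which its row count reaches zero. The sketch below is illustrative only: `EventStageCounts`, `FailureStage`, and `classifyFailureStage` are hypothetical names, and the filter order (pricing, then injury filtering, then sibling suppression, then dedup) is an assumption that would need to match the real order in `events.ts`.

```typescript
// Hypothetical per-event counts; none of these field names come from the repo.
interface EventStageCounts {
  candidates: number; // candidate markets found
  extracted: number; // PLAYER_PROP rows extracted from team bundles
  priced: number; // extracted rows that carry a price
  afterInjuryFilter: number;
  afterSiblingSuppression: number;
  afterDedup: number; // rows finally visible
}

type FailureStage =
  | "no_candidates"
  | "no_extracted_rows"
  | "unpriced_rows"
  | "injury_filtering"
  | "sibling_suppression"
  | "dedup"
  | "none";

// Returns the first stage at which the row count drops to zero.
function classifyFailureStage(c: EventStageCounts): FailureStage {
  if (c.candidates === 0) return "no_candidates";
  if (c.extracted === 0) return "no_extracted_rows";
  if (c.priced === 0) return "unpriced_rows";
  if (c.afterInjuryFilter === 0) return "injury_filtering";
  if (c.afterSiblingSuppression === 0) return "sibling_suppression";
  if (c.afterDedup === 0) return "dedup";
  return "none";
}
```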

### 6. Operational repair flow is not healthy enough yet.

`rm_weather_report_runs` on April 1 showed:

- `DAILY_9AM_ET`: `SUCCESS`
- `MLB_1230_ET`: `SUCCESS`
- `MLB_1630_ET`: `SUCCESS`
- `MLB_1930_ET`: `SUCCESS`
- `PLAYER_PROP_FLOOR_REPAIR`: 2 rows stuck in `RUNNING`

Conclusion:

- There is already a repair concept in production.
- It is not recovering cleanly.
- If the repair path stalls, the player-prop floor never gets restored.

### 7. Frontend popup table is not the blocker anymore.

Current repo state:

- Per-player popup signal table is already implemented.
- The remaining UI gap is clip actions in the per-player table's action area.
- Aggregate player analysis is still mostly the radar/card layer, not a richer player-summary system.

Conclusion:

- Do not treat “build the table UI” as the next core task.
- It is already in the repo.

## Ranked Todo List

### P0. Fix lineup integrity before using lineups as truth

Why:

- `featuredPlayers` are seeded from polluted lineups.
- MLB lineup contamination exists in both `rotowire` and `mlb_statsapi`.
- Canonical rosters are also contaminated, so simple roster filtering is not enough.

Work:

- Add source trust ranking for `GameLineup` reads instead of blind latest-row selection.
- Add per-player lineup validation against event teams using stronger evidence than roster alone:
  - candidate market presence for that event/team
  - opponent-aware game membership
  - probable starter / batting-order plausibility for MLB
- Add quarantine logic for lineup rows with too many impossible names.
- Add an audit job that compares lineup players against event-team candidate markets and flags suspicious rows.

Done criteria:

- Radar no longer shows obvious cross-game players for MLB.
- A dirty lineup row cannot fully poison `featuredPlayers`.
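
The source ranking, candidate-market validation, and quarantine steps above could be combined roughly as follows. This is a sketch under stated assumptions: `LineupRow`, `SOURCE_RANK`, `suspiciousShare`, and `pickLineup` are hypothetical, the trust order and the 0.3 quarantine threshold are placeholders, and names are compared by exact string match rather than through `canonical-names.ts`.

```typescript
// Hypothetical lineup row; `source` values mirror the sources named in the audit.
interface LineupRow {
  source: string;
  updatedAt: number; // epoch millis
  players: string[];
}

// Assumed trust order (lower is better); the real ranking is a product decision.
const SOURCE_RANK: Record<string, number> = { mlb_statsapi: 0, rotowire: 1 };

// Fraction of lineup players that have no candidate market for this event/team.
function suspiciousShare(row: LineupRow, candidatePlayers: string[]): number {
  if (row.players.length === 0) return 1;
  const unknown = row.players.filter((p) => candidatePlayers.indexOf(p) === -1);
  return unknown.length / row.players.length;
}

// Picks the most trusted non-quarantined row; ties go to the newest row.
function pickLineup(
  rows: LineupRow[],
  candidatePlayers: string[],
  maxSuspiciousShare = 0.3 // assumed quarantine threshold
): LineupRow | null {
  const usable = rows
    .filter((r) => suspiciousShare(r, candidatePlayers) <= maxSuspiciousShare)
    .sort((a, b) => {
      const ra = SOURCE_RANK[a.source] ?? 99;
      const rb = SOURCE_RANK[b.source] ?? 99;
      return ra !== rb ? ra - rb : b.updatedAt - a.updatedAt;
    });
  if (usable.length === 0) return null;
  // Strip remaining unmatched names so one bad entry cannot seed featuredPlayers.
  const best = usable[0];
  return {
    ...best,
    players: best.players.filter((p) => candidatePlayers.indexOf(p) !== -1),
  };
}
```

Because the April 1 contamination appeared in both sources for the same game, `pickLineup` can return `null`; callers would then need a fallback that does not depend on lineups at all.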

### P0. Fix extraction/persistence gap from `TEAM_PROPS` to `PLAYER_PROP`

Why:

- Candidate audit says markets exist.
- Team props exist for every event.
- Persisted player-prop counts are still far below the floor.

Work:

- Audit `generateTeamPropsForTeam()` output vs `storePrecomputed()` writes per side.
- Measure, per event:
  - candidates found
  - props returned by `generateTeamProps`
  - props filtered before team-bundle store
  - props skipped by `shouldPersistExtractedPlayerProp`
  - props inserted into `rm_forecast_precomputed`
- Add structured logging for those counts by event + team side.
- Find and fix the largest drop-off stage.

Done criteria:

- Stored `PLAYER_PROP` counts get close to available candidate volume where markets exist.
- `TEAM_PROPS` with zero extracted `PLAYER_PROP` becomes exceptional, not normal.
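
The per-event measurements listed above form a funnel, and the "largest drop-off stage" is the adjacent pair of stages with the biggest loss. A minimal sketch follows; the stage keys, `largestDropOff`, and `funnelLogLine` are hypothetical names chosen to mirror the list, and the counts are assumed to be collected by the worker.

```typescript
// Stage names follow the measurement list above; counts are supplied by the caller.
const FUNNEL_STAGES = [
  "candidates_found",
  "props_returned",
  "props_after_bundle_filter",
  "props_after_persist_gate",
  "props_inserted",
] as const;

type FunnelStage = (typeof FUNNEL_STAGES)[number];
type FunnelCounts = Record<FunnelStage, number>;

interface DropOff {
  from: FunnelStage;
  to: FunnelStage;
  lost: number;
}

// Returns the adjacent stage pair with the largest absolute loss.
function largestDropOff(counts: FunnelCounts): DropOff {
  let worst: DropOff = { from: FUNNEL_STAGES[0], to: FUNNEL_STAGES[1], lost: 0 };
  for (let i = 1; i < FUNNEL_STAGES.length; i++) {
    const from = FUNNEL_STAGES[i - 1];
    const to = FUNNEL_STAGES[i];
    const lost = counts[from] - counts[to];
    if (lost > worst.lost) worst = { from, to, lost };
  }
  return worst;
}

// One structured log line per event + team side.
function funnelLogLine(
  eventId: string,
  side: "home" | "away",
  counts: FunnelCounts
): string {
  return JSON.stringify({ eventId, side, ...counts, worst: largestDropOff(counts) });
}
```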

### P0. Sanitize `TEAM_PROPS` bundles and audit them separately from standalone rows

Why:

- The current live slate still has active `TEAM_PROPS` bundles with hollow props inside them.
- Standalone `PLAYER_PROP` rows can be clean while the forecast popup still renders junk from bundle payloads.
- Low-information unders are leaking through the bundle path and making the UI look broken even after standalone cleanup.

Work:

- Persist only renderable props into `TEAM_PROPS` bundle payloads.
- Filter old cache/precomputed bundle props on read so stale hollow entries do not surface publicly.
- Extend the slate audit to report bundle-level metadata gaps:
  - missing `signal_tier`
  - missing `forecast_direction`
  - missing `agreement_score`
  - missing `market_quality_label`
- Replay or invalidate pre-fix bundle rows after deploy so the live popup stops reading dirty payloads.

Done criteria:

- Forecast popup no longer shows hollow bundle props.
- Slate audit reports zero bundle-level metadata gaps for active regenerated rows.
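
The "persist only renderable props" rule and the bundle-level gap report can share one helper. The sketch below assumes a bundle prop is a plain object keyed by the four metadata fields listed above; `BundleProp`, `missingFields`, and `sanitizeBundle` are hypothetical names, and treating blank strings as missing is an assumption.

```typescript
// Metadata fields the slate audit is asked to report on.
const REQUIRED_BUNDLE_FIELDS = [
  "signal_tier",
  "forecast_direction",
  "agreement_score",
  "market_quality_label",
] as const;

type BundleProp = Record<string, unknown>;

// Lists the required fields that are absent, null, or blank on one prop.
function missingFields(prop: BundleProp): string[] {
  return REQUIRED_BUNDLE_FIELDS.filter((f) => {
    const v = prop[f];
    return v === undefined || v === null || (typeof v === "string" && v.trim() === "");
  });
}

// Keeps only renderable props and counts gaps per field for the audit report.
function sanitizeBundle(props: BundleProp[]): {
  renderable: BundleProp[];
  gaps: Record<string, number>;
} {
  const gaps: Record<string, number> = {};
  for (const f of REQUIRED_BUNDLE_FIELDS) gaps[f] = 0;
  const renderable = props.filter((p) => {
    const missing = missingFields(p);
    for (const f of missing) gaps[f] += 1;
    return missing.length === 0;
  });
  return { renderable, gaps };
}
```

The same function could run on write (before persisting a `TEAM_PROPS` payload) and on read (to hide pre-fix rows), which keeps the two paths from drifting apart.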

### P0. Stop treating one-way candidate-heavy leagues like they have no usable inventory

Why:

- NBA had 745 one-way candidates and only 1 two-way candidate.
- NHL had 87 one-way candidates and 0 two-way candidates.
- Current output is still below floor.

Work:

- Explicitly define publishability rules by league for one-way candidates.
- Allow fallback/surfacing from verified one-way candidates where the current market source is good enough.
- Separate “high-confidence pick-grade” from “surfaceable radar-table row”.

Done criteria:

- NBA/NHL can surface broad player tables without requiring two-way market symmetry everywhere.
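
Separating "pick-grade" from "surfaceable radar-table row" could start as an explicit per-league policy table. This is a sketch only: `Candidate`, `ALLOW_ONE_WAY_RADAR`, and `gradeCandidate` are hypothetical, the `sourceVerified` flag stands in for whatever "good enough" check is chosen, and the per-league values are placeholders rather than recommendations.

```typescript
type League = "MLB" | "NBA" | "NHL";
type SurfaceGrade = "pick_grade" | "radar_row" | "hidden";

interface Candidate {
  league: League;
  twoWay: boolean; // both sides of the market are priced
  sourceVerified: boolean; // assumed flag: market source judged good enough
}

// Placeholder policy: whether verified one-way candidates may surface per league.
const ALLOW_ONE_WAY_RADAR: Record<League, boolean> = {
  MLB: false,
  NBA: true,
  NHL: true,
};

function gradeCandidate(c: Candidate): SurfaceGrade {
  if (!c.sourceVerified) return "hidden";
  if (c.twoWay) return "pick_grade";
  return ALLOW_ONE_WAY_RADAR[c.league] ? "radar_row" : "hidden";
}
```

Keeping the policy in one table makes the league rule reviewable in a diff instead of being implied by scoring heuristics.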

### P1. Add real stage-by-stage pipeline telemetry

Why:

- Right now the API only tells us the end state.
- That makes debugging slow and sloppy.

Work:

- Add per-event metrics for:
  - lineup source chosen
  - candidate counts
  - extracted player props
  - persisted player props
  - visible player props after serve filters
  - filtered-out counts by reason
- Expose an admin audit endpoint or log report.

Done criteria:

- One query or one audit report shows where an event failed in the pipeline.

### P1. Fix missing confidence/probability population for NBA and NCAAB props

Why:

- The April 2 slate audit found NBA confidence/probability mostly missing and NCAAB partially missing.
- Frontend prop rows render, but without usable signal quality data they look half-baked and make ranking less trustworthy.
- This is not a cosmetic bug. Missing probability means the generation or persistence path is skipping a core scoring field.

Work:

- Trace where `confidence_score`, `prob`, and related projected-probability fields are dropped for NBA/NCAAB.
- Compare league-specific generation output before persistence and after persistence.
- Add a hard audit that fails if a league publishes props without confidence above a tolerable missing-rate threshold.
- Ensure frontend contracts receive a normalized probability field consistently across legacy and per-player modes.

Done criteria:

- NBA and NCAAB props ship with populated probability/confidence at the same standard MLB already meets.
- Missing-confidence props become an exception that is visible in audits, not a normal slate condition.
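
The hard audit described above reduces to a per-league missing rate compared against a threshold. A minimal sketch, assuming published rows expose `confidence_score` and `prob` as nullable numbers; `PublishedProp`, `missingRateByLeague`, `leaguesOverThreshold`, and the 5% default threshold are hypothetical.

```typescript
interface PublishedProp {
  league: string;
  confidence_score: number | null;
  prob: number | null;
}

// Share of props per league that lack confidence or probability.
function missingRateByLeague(props: PublishedProp[]): Record<string, number> {
  const total: Record<string, number> = {};
  const missing: Record<string, number> = {};
  for (const p of props) {
    total[p.league] = (total[p.league] || 0) + 1;
    if (p.confidence_score === null || p.prob === null) {
      missing[p.league] = (missing[p.league] || 0) + 1;
    }
  }
  const rates: Record<string, number> = {};
  for (const league of Object.keys(total)) {
    rates[league] = (missing[league] || 0) / total[league];
  }
  return rates;
}

// Leagues over the tolerated missing rate; an audit would fail if this is non-empty.
function leaguesOverThreshold(props: PublishedProp[], maxMissingRate = 0.05): string[] {
  const rates = missingRateByLeague(props);
  return Object.keys(rates)
    .filter((league) => rates[league] > maxMissingRate)
    .sort();
}
```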

### P1. Investigate severe directional skew and add publish-time guardrails

Why:

- The April 2 slate audit showed an extreme UNDER skew, especially in NBA/NHL/NCAAB.
- A one-sided board can be caused by candidate bias, scoring bias, or publishability rules that systematically suppress the other side.
- Frontend direction rendering is not the root cause here; the upstream slate itself looks lopsided.

Work:

- Audit direction distribution by league at each stage:
  - candidate market discovery
  - generated team bundles
  - extracted `PLAYER_PROP`
  - post-integrity published rows
- Check whether prompts, candidate fetchers, or signal-ranking heuristics implicitly favor unders.
- Add alerting for extreme same-day league skew instead of discovering it manually after publish.
- Add a minimum publishability threshold so obviously weak recommendations do not ship just because they have a direction.

Done criteria:

- We can identify exactly where over-side inventory is being lost.
- Slates with pathological direction skew trigger an audit warning before they reach the frontend.

### P1. Fix stale-prop recovery for events that lose all active player props

Why:

- The April 2 slate audit found a live MLB event with stale prop rows and no active replacement.
- That leaves the forecast page with game markets but zero surfaced player props, which is a real product hole.
- This is adjacent to repair-run orchestration, but it deserves a specific event-level recovery check.

Work:

- Add an audit for events with active game markets but zero active player props.
- Verify stale/expired replacement runs actually regenerate event-level props before lock.
- Add targeted checks for pitcher-driven MLB failures where all prop inventory disappears after stale marking.

Done criteria:

- Active events with usable markets do not sit on the frontend with zero player props because stale rows were never replaced.
- MLB stale-prop failures are visible by event and recover automatically or fail loudly.

### P1. Fix repair-run orchestration and stale-run recovery

Why:

- `PLAYER_PROP_FLOOR_REPAIR` runs are stuck in `RUNNING`.

Work:

- Identify where that schedule is launched.
- Ensure stale `RUNNING` floor-repair runs are auto-recovered or failed cleanly.
- Verify the repair path actually rehydrates missing player props instead of silently stalling.

Done criteria:

- No orphaned `RUNNING` floor-repair runs on the same date.
- Repair runs leave measurable inventory improvements.
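
Failing stale `RUNNING` rows cleanly needs only a cutoff and a status transition. The sketch below works on in-memory rows to show the rule; `ReportRun`, the `startedAt` field, the `FAILED` status, and the 30-minute cutoff are assumptions, since the audit only confirms the `RUNNING` and `SUCCESS` states in `rm_weather_report_runs`.

```typescript
interface ReportRun {
  id: number;
  schedule: string; // e.g. "PLAYER_PROP_FLOOR_REPAIR"
  status: "RUNNING" | "SUCCESS" | "FAILED";
  startedAt: number; // epoch millis (assumed column)
}

// Assumed cutoff; the right value depends on how long a healthy repair run takes.
const STALE_AFTER_MS = 30 * 60 * 1000;

// Returns runs that have been RUNNING longer than the cutoff.
function findStaleRuns(
  runs: ReportRun[],
  now: number,
  staleAfterMs = STALE_AFTER_MS
): ReportRun[] {
  return runs.filter((r) => r.status === "RUNNING" && now - r.startedAt > staleAfterMs);
}

// Produces updated copies with stale runs failed, leaving the input untouched.
function failStaleRuns(runs: ReportRun[], now: number): ReportRun[] {
  const staleIds = findStaleRuns(runs, now).map((r) => r.id);
  return runs.map((r) =>
    staleIds.indexOf(r.id) !== -1 ? { ...r, status: "FAILED" as const } : r
  );
}
```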

### P2. Fix low-value or template-looking projections before they publish

Why:

- The April 2 slate audit found suspicious identical NBA 3PT projections across unrelated players and games.
- It also found props with recommendation directions that appear publishable despite very weak confidence.
- That smells like fallback/template values or insufficient per-player calibration, not real individualized output.

Work:

- Audit projection distributions by league/stat type and flag repeated default values that cluster unnaturally.
- Add league/stat-specific sanity checks for projection spread and probability coherence.
- Block publish of props that fail minimum confidence / probability / projection-consistency thresholds.
- Verify projection generation is using real player-specific inputs for NBA 3PT props instead of a fallback bucket.

Done criteria:

- Template-looking projection clusters are visible in audit output and blocked from publish when they indicate bad generation.
- Very low-confidence props do not make it to the board as actionable picks.
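
Flagging repeated default values can be approximated by grouping projections on league, stat type, and value, then reporting groups shared by many distinct players across more than one game. A minimal sketch; `Projection`, `Cluster`, `findProjectionClusters`, the two-decimal rounding, and the `minPlayers = 4` default are all assumptions.

```typescript
interface Projection {
  league: string;
  statType: string;
  player: string;
  eventId: string;
  value: number;
}

interface Cluster {
  key: string; // league|statType|value
  players: string[];
  events: string[];
}

// Flags identical projection values shared by many distinct players across games.
function findProjectionClusters(rows: Projection[], minPlayers = 4): Cluster[] {
  const groups: Record<string, Cluster> = {};
  for (const r of rows) {
    const key = [r.league, r.statType, r.value.toFixed(2)].join("|");
    const g = groups[key] || (groups[key] = { key, players: [], events: [] });
    if (g.players.indexOf(r.player) === -1) g.players.push(r.player);
    if (g.events.indexOf(r.eventId) === -1) g.events.push(r.eventId);
  }
  return Object.keys(groups)
    .map((k) => groups[k])
    .filter((g) => g.players.length >= minPlayers && g.events.length > 1);
}
```

Some repetition is legitimate for low-count stats, so a real threshold would likely need to vary by stat family.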

### P2. Upgrade radar seeding logic after lineup integrity is fixed

Why:

- Current fallback spread still caps breadth by side.
- Radar depends too much on lineups even when precomputed props are thin.

Work:

- Seed from validated lineup players first.
- Then backfill breadth from candidate-backed players when stored props are thin.
- Use event-side candidate coverage, not just lineup order, to reach the 10-player target.

Done criteria:

- “10 surfaced players per game where upstream markets exist” becomes true in practice, not just in tests.
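
The seeding order above (validated lineup players first, then candidate-backed backfill up to the 10-player target) is sketched below. `seedRadarPlayers` is a hypothetical helper, names are compared by exact string match, and it works per event rather than balancing by side.

```typescript
// Seeds from validated lineup players first, then backfills from candidate-backed
// players until the target is reached. Duplicates are skipped.
function seedRadarPlayers(
  validatedLineup: string[],
  candidateBacked: string[],
  target = 10 // the 10-player target named in the done criteria
): string[] {
  const seeded: string[] = [];
  const add = (name: string): void => {
    if (seeded.length < target && seeded.indexOf(name) === -1) seeded.push(name);
  };
  validatedLineup.forEach((name) => add(name));
  candidateBacked.forEach((name) => add(name));
  return seeded;
}
```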

### P2. Diversify NHL prop-market coverage beyond SOG unders

Why:

- The April 2 slate audit showed NHL inventory dominated by shots-on-goal unders.
- Even if technically valid, that board is too narrow and repetitive to be useful.
- This is a coverage problem, not just a UI problem.

Work:

- Audit which NHL market families are available upstream versus which ones survive publish.
- Expand candidate and publish support for goals, assists, points, saves, and blocked shots where source quality is acceptable.
- Add league-level diversity reporting so a single market family cannot silently dominate the board.

Done criteria:

- NHL slates show broader market-family representation when upstream inventory exists.
- Coverage audits can explain when the board narrows for real market reasons versus pipeline bias.

### P2. Hydrate `prop_highlights` reliably for game forecasts

Why:

- The April 2 slate audit found most games with empty `prop_highlights`.
- That leaves the main forecast experience without prop narratives even when event-level props exist elsewhere in the system.
- The contract already supports `prop_highlights`; the issue is keeping them populated and source-backed.

Work:

- Audit where `prop_highlights` are lost:
  - forecast generation
  - publish-time suppression
  - serve-time hydration fallback
- Ensure games with active, source-backed props get at least a minimal highlight set.
- Add event-level audit output for “forecast exists, props exist, highlights empty”.

Done criteria:

- Empty `prop_highlights` become explainable exceptions instead of the default state for most games.
- Game forecast pages consistently carry prop context when source-backed props are available.

### P2. Finish the frontend follow-through

Why:

- Popup table exists, but clips are still tied to radar cards.
- Aggregate player analysis is still shallow.

Work:

- Add clip actions to per-player table rows.
- Improve player summary blocks so the popup is not just a radar strip plus prop rows.

Done criteria:

- Table rows support clipping directly.
- Player analysis is richer than the current single-card summaries.

## Suggested Execution Order

1. Instrument the extraction/persistence drop-off and lineup-source choice.
2. Fix lineup integrity gating.
3. Fix extraction/persistence losses from team bundles into player props.
4. Relax or redesign one-way candidate handling by league.
5. Re-run coverage audit on a fresh pregame slate.
6. Only then finish clips and deeper aggregate-player UI.

## Bottom Line

The main blocker is not “we don’t have markets.”

The main blockers are:

- dirty lineup truth
- weak validation of lineup-seeded players
- underproduction of persisted `PLAYER_PROP` rows even when team bundles and candidates exist
- no good telemetry for where the rows disappear

## April 2 Follow-Up Todo

### Already fixed

- Poisoned `source_market_fallback` props no longer publish without exact PIFF support.
- Forecast routes now regenerate poisoned team bundles instead of trusting stale fallback cache.
- Public player-prop contracts now expose `marketLine` consistently alongside `marketLineValue`.
- Unlock patches now preserve normalized line + signal fields instead of stripping them back out on merge.

### P0. Deploy and backfill the shared standalone prop payload contract

Why:

- The repo now has a shared standalone `PLAYER_PROP` payload builder and metadata gate across the daily worker, route regen path, and repair path.
- The live April 2 audit still shows old bad rows:
  - NBA: 83.3% missing direction/agreement/market quality/display fields
  - NHL: 48-60% missing core metadata, 21.8% missing display fields
- Until the fixed code is deployed and the active slate is repaired, the frontend can still show hollow or mislabeled rows even though the write paths are fixed in repo.

Work:

- Deploy the shared persistence-contract fix across:
  - [weather-report.ts](/var/www/html/rainmaker/backend/src/workers/weather-report.ts)
  - [forecast.ts](/var/www/html/rainmaker/backend/src/routes/forecast.ts)
  - [mlb-team-props-repair.ts](/var/www/html/rainmaker/backend/src/workers/mlb-team-props-repair.ts)
- Backfill, or mark stale, any active `PLAYER_PROP` rows missing any of:
  - `forecast_direction`
  - `agreement_score`
  - `market_quality_label`
  - `forecast`
  - `prop_type`
  - `sportsbook_display`
- Re-run the slate audit and require payload-gap counts to hit zero for active rows before calling the fix complete.

Done criteria:

- Active standalone props no longer ship with missing core metadata or blank display fields.
- The live slate audit shows `0` missing payload fields for active `PLAYER_PROP` rows.

### P0. Rebuild clean inventory for events that went empty after junk-row cleanup

Why:

- After deploying the shared payload contract and staling the old hollow rows, the live April 2 audit no longer shows payload gaps.
- That cleanup also exposed the true inventory hole. These events all have zero active player props:
  - `MIN @ DET`
  - `NOP @ POR`
  - `PHX @ CHA`
  - `LAL @ OKC`
  - `SAS @ LAC`
  - `CLE @ GSW`
  - `NSH @ LA`
  - `NYM @ SF`
- That is a better state than serving garbage, but it is still a product failure.

Work:

- Replay only the empty event/team sides under the fixed persistence contract.
- Measure, per empty event, whether replay produces:
  - no candidates
  - candidates but no publishable props
  - publishable props that fail metadata/pricing gates
- Keep the stale junk rows stale. Do not reactivate them just to make the board look full.
- Promote any event that remains empty after replay into a first-class audit failure with its actual cause.

Done criteria:

- Empty events are either repopulated with clean active props or explicitly classified by root cause.
- The board is not showing zero active props for events where usable source-backed inventory actually exists.

### P0. Persist complete signal metadata for every publishable prop

Why:

- The line field is not the main blocker anymore.
- Active props still ship with missing `signal_tier`, `agreement_score`, `market_quality_label`, and sometimes blank `forecast_direction`.
- That leaves the board looking hollow even when the prop itself exists.

Work:

- Trace where `agreement_score`, `agreement_label`, `agreement_sources`, `market_quality_score`, `market_quality_label`, and `signal_tier` are dropped between generation and `rm_forecast_precomputed`.
- Backfill or recompute missing signal metadata in `TEAM_PROPS_ROUTE_EXTRACT` and any remaining legacy write paths.
- Add an audit that fails if a publishable prop lands without line, direction, probability, and signal metadata.

Done criteria:

- Unlocked prop cards consistently show line, direction, projected probability, signal tier, agreement, and market quality when a prop is active.

### P0. Fix directional skew at the model/publish layer

Why:

- The poisoned fallback cleanup removed junk rows.
- The active slate is still overwhelmingly UNDER-heavy.
- That is now a real upstream calibration problem, not just a fallback artifact.

Work:

- Measure over/under distribution at:
  - candidate fetch
  - generated team bundle
  - persisted player-prop row
  - visible published row
- Identify where over-side inventory disappears.
- Add publish-time skew warnings by league and hard-stop thresholds for pathological same-day slates.

Done criteria:

- We can point to the exact stage causing over-side loss.
- Same-day leagues do not quietly ship 90%+ UNDER boards without alerts.
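
A publish-time skew check can be computed from direction counts per league. The sketch below is illustrative: `DirectedProp`, `SkewReport`, and `skewByLeague` are hypothetical, the 0.9 block level mirrors the "90%+ UNDER" case above, and the 0.75 warn level and minimum sample of 10 are assumptions.

```typescript
type Direction = "OVER" | "UNDER";

interface DirectedProp {
  league: string;
  direction: Direction;
}

interface SkewReport {
  league: string;
  total: number;
  underShare: number;
  level: "ok" | "warn" | "block";
}

// Reports UNDER share per league and grades it against assumed thresholds.
function skewByLeague(
  props: DirectedProp[],
  warnAt = 0.75,
  blockAt = 0.9,
  minSample = 10
): SkewReport[] {
  const total: Record<string, number> = {};
  const under: Record<string, number> = {};
  for (const p of props) {
    total[p.league] = (total[p.league] || 0) + 1;
    if (p.direction === "UNDER") under[p.league] = (under[p.league] || 0) + 1;
  }
  return Object.keys(total)
    .sort()
    .map((league) => {
      const underShare = (under[league] || 0) / total[league];
      // Use the larger side so an extreme OVER board is caught as well.
      const dominant = Math.max(underShare, 1 - underShare);
      let level: SkewReport["level"] = "ok";
      if (total[league] >= minSample && dominant >= blockAt) level = "block";
      else if (total[league] >= minSample && dominant >= warnAt) level = "warn";
      return { league, total: total[league], underShare, level };
    });
}
```

Running the same function on candidate, bundle, persisted, and published rows would show the stage at which the over side disappears.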

### P0. Fix odds freshness for MLB batter props

Why:

- The live audit showed Riley/Olson batter props priced 200+ cents off current book odds.
- That is bad enough to make the board look fake even if the direction is right.

Work:

- Audit how often active `PLAYER_PROP` odds diverge from the latest normalized market rows by league and stat family.
- Add a freshness threshold that stales or refreshes props when odds drift past a tolerable gap.
- Verify MLB batter props are included in the same-day odds refresh path before first pitch.

Done criteria:

- Active MLB batter props do not sit 200+ cents off current books.
- Props with stale odds are refreshed or pulled before users see them.
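
A freshness threshold needs a consistent way to measure odds drift in cents. The sketch below assumes American odds and the common convention that -110 and +110 are 20 cents apart; whether the audit's "200+ cents" figure was computed this way is not confirmed. `toCentsScale`, `oddsDriftCents`, `freshnessAction`, and the 20-cent default tolerance are hypothetical.

```typescript
// Maps American odds onto a continuous "cents" scale so that -110 and +110
// are 20 cents apart. Assumes valid American odds (<= -100 or >= +100).
function toCentsScale(americanOdds: number): number {
  return americanOdds >= 100 ? americanOdds - 100 : americanOdds + 100;
}

// Absolute distance in cents between the stored price and the current price.
function oddsDriftCents(storedOdds: number, currentOdds: number): number {
  return Math.abs(toCentsScale(storedOdds) - toCentsScale(currentOdds));
}

type FreshnessAction = "keep" | "refresh";

// maxDriftCents is an assumed tolerance; props past it should be refreshed or staled.
function freshnessAction(
  storedOdds: number,
  currentOdds: number,
  maxDriftCents = 20
): FreshnessAction {
  return oddsDriftCents(storedOdds, currentOdds) > maxDriftCents ? "refresh" : "keep";
}
```

Under this convention a prop stored at +120 against a current price of +340 has drifted 220 cents and would be refreshed.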

### P1. Eliminate phantom lines and market mismatches

Why:

- The live audit found props with no sportsbook equivalent and cases where label/direction/source market did not line up cleanly.
- Even one fake-looking prop damages trust.

Work:

- Add a publish check that requires a current normalized market match for non-MLB source-backed props.
- Validate that prop label, `forecast_direction`, selected odds side, and normalized market all agree.
- Flag props that cannot be matched back to a live candidate row for the same event/player/stat/line.

Done criteria:

- No active prop reaches the board if the source market cannot be reconciled.
- Direction/label mismatches are blocked upstream.

### P1. Fix featured-player completeness

Why:

- Featured players still have too many nulls in `projectedMinutes` and `analysis`.
- The live audit also reported `?` prop labels in some featured-player payloads.

Work:

- Audit `buildFeaturedPlayersFromRows()` and `buildSupplementalFeaturedPlayerRows()` for cases where `propLabel`, `propType`, or projected minutes degrade to null.
- Ensure supplemental rows always carry a usable label and stat family.
- Improve minutes hydration from lineup snapshots and fallback context.

Done criteria:

- Featured-player props have usable labels.
- `projectedMinutes` is populated whenever lineup data exists.
- Bench filler rows no longer dominate the featured-player surface.

### P1. Fix name-quality and canonical-player mismatches

Why:

- The live audit surfaced suspicious names and likely spelling drift.
- Bad names poison trust and also break market matching.

Work:

- Audit active props against canonical player names and source-market player names.
- Add fuzzy-name mismatch reporting and hard blocks for low-confidence canonical matches.
- Repair the upstream canonical mapping for repeated offenders.

Done criteria:

- Obvious misspellings stop reaching the frontend.
- Player-market matching is stable enough that fake names do not survive publish.

### P1. Finish `prop_highlights` hydration

Why:

- The write path now hydrates highlights, but the slate audit still shows too many stale gaps and too few useful game-level prop narratives.

Work:

- Add a post-run audit that checks:
  - forecast exists
  - active props exist
  - `prop_highlights` still empty
- Backfill missing highlights for active events from top active props after publish.

Done criteria:

- Game forecasts with active props usually have prop highlights.
- Empty highlights become explainable exceptions instead of the default.

### P2. Diversify NHL market families

Why:

- NHL still leans hard into SOG unders.
- Even if valid, it makes the board repetitive and low-value.

Work:

- Compare available NHL candidate families to published families.
- Expand publish support for goals, assists, points, saves, and blocked shots when source quality is acceptable.
- Add a league diversity audit so one family cannot silently dominate.

Done criteria:

- NHL boards show more than SOG unders when upstream inventory exists.

### P2. Keep projection-cluster detection active

Why:

- The worst templated fallback rows were removed, but projection-cluster count is still non-zero in the audit.

Work:

- Keep the cluster audit in the publish pipeline.
- Add stat-family specific thresholds for repeated identical projections.
- Investigate remaining repeated values that survive after fallback cleanup.

Done criteria:

- Template-looking projection clusters fail audit before publish.
