# Projection Data Sources — Future Ingestion Plan

Reference doc for external projection data that could enhance Rainmaker forecasts.
None of these sources are integrated yet; this doc records the ingestion plan for the next sprint.

---

## FanGraphs (MLB)

**Type**: Player projection models (Steamer, ZiPS, Depth Charts blend)

**Access**: Free CSV export
- URL: `https://www.fangraphs.com/projections?type=steamer&pos=all&stats=bat&lg=all`
- Also: `?type=zips`, `?type=depthcharts` (recommended; blends Steamer and ZiPS with manually curated playing-time estimates)
- Pitchers: `&stats=pit`
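
The query parameters above combine mechanically, so the fetch script can build each URL rather than hard-coding it. A minimal sketch, assuming only the parameter values listed in this doc; the function name `build_projection_url` is hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.fangraphs.com/projections"

# Projection systems and stat groups named in this doc.
SYSTEMS = {"steamer", "zips", "depthcharts"}
STAT_GROUPS = {"bat", "pit"}


def build_projection_url(system: str = "depthcharts", stats: str = "bat") -> str:
    """Return the projections page URL for one system and one stat group."""
    if system not in SYSTEMS:
        raise ValueError(f"unknown projection system: {system!r}")
    if stats not in STAT_GROUPS:
        raise ValueError(f"unknown stat group: {stats!r}")
    # pos=all and lg=all are copied from the example URL above.
    query = urlencode({"type": system, "pos": "all", "stats": stats, "lg": "all"})
    return f"{BASE_URL}?{query}"
```

Calling `build_projection_url("steamer", "bat")` reproduces the batter URL listed above. Whether that page serves the CSV directly or only through an export control still needs to be checked before the downloader is written.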

**Key Fields (Batters)**:
| Field | Description |
|-------|-------------|
| PA | Plate Appearances |
| HR | Home Runs |
| RBI | Runs Batted In |
| SB | Stolen Bases |
| AVG | Batting Average |
| OBP | On-Base Percentage |
| wOBA | Weighted On-Base Average |
| WAR | Wins Above Replacement |

**Key Fields (Pitchers)**:
| Field | Description |
|-------|-------------|
| IP | Innings Pitched |
| K/9 | Strikeouts per 9 innings |
| ERA | Earned Run Average |
| WHIP | Walks plus hits per inning pitched |
| FIP | Fielding Independent Pitching |

**Refresh**: Daily during MLB season (April–October)

**Ingestion Plan**:
1. Scheduled download: `curl` CSV daily at 3:00 AM ET
2. Parse CSV → normalize player names (match to CanonicalPlayer)
3. Store in new `rm_projections` table:
   ```sql
   CREATE TABLE rm_projections (
     id SERIAL PRIMARY KEY,
     player_name TEXT NOT NULL,
     canonical_player_id UUID REFERENCES "CanonicalPlayer"(id),
     league TEXT NOT NULL,
     source TEXT NOT NULL,  -- 'fangraphs_steamer', 'fangraphs_zips', 'torvik'
     season INT NOT NULL,
     projections JSONB NOT NULL,
     fetched_at TIMESTAMPTZ DEFAULT NOW()
   );
   ```
4. PIFF engine reads projections as additional signal for edge calculation
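
Steps 2 and 3 can be sketched as a pure function that turns CSV text into dicts shaped like `rm_projections` rows. This is an illustration under assumptions, not the planned implementation: the `Name` column header, the `match_key` field, and both function names are hypothetical, and the real export headers should be confirmed against an actual download.

```python
import csv
import io
import json
import unicodedata
from datetime import datetime, timezone


def normalize_name(name: str) -> str:
    """Strip accents and punctuation, collapse whitespace, and lowercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch for ch in no_accents if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.lower().split())


def csv_to_rows(csv_text: str, source: str, season: int, league: str = "MLB") -> list[dict]:
    """Convert an exported projections CSV into rm_projections-shaped dicts."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        name = record.pop("Name")  # assumed column header
        rows.append({
            "player_name": name,
            # Hypothetical lookup key for matching against CanonicalPlayer.
            "match_key": normalize_name(name),
            "league": league,
            "source": source,
            "season": season,
            # Remaining columns go into the JSONB field; values stay strings here.
            "projections": json.dumps(record),
            "fetched_at": fetched_at,
        })
    return rows
```

Keeping the parser free of network and database calls makes it easy to test against a saved CSV; resolving `canonical_player_id` and inserting the rows would happen in a later step.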

**License**: FanGraphs data is free for non-commercial/research use.

---

## Bart Torvik / T-Rank (NCAAB)

**Type**: Team efficiency ratings and win probability model

**Access**: Free data via barttorvik.com or `toRvik` R package
- Web: `https://barttorvik.com/trank.php`
- R package: `devtools::install_github("andreweatherman/toRvik")`
- Python: Scrape from barttorvik.com (no official API)

**Key Fields**:
| Field | Description |
|-------|-------------|
| AdjOE | Adjusted Offensive Efficiency (points per 100 possessions, adj. for opponent) |
| AdjDE | Adjusted Defensive Efficiency |
| Barthag | Projected win probability against an average Division I team (Pythagorean expectation from AdjOE and AdjDE) |
| Tempo | Possessions per 40 minutes |
| Luck | Record vs expected record |
| SOS | Strength of Schedule |
| eFG% | Effective Field Goal % |
| TO% | Turnover % |
| OR% | Offensive Rebound % |
| FTRate | Free Throw Rate |

**Refresh**: Daily during NCAAB season (November–March/April)

**Ingestion Plan**:
1. Python scraper or R script → CSV export daily at 2:00 AM ET
2. Team name normalization (map Torvik names to CanonicalTeam)
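3. Store in `rm_projections` table with `source = 'torvik'` (the schema above is player-keyed, so team rows will need a team reference column or a nullable `player_name`)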
4. DVP generator uses Torvik AdjOE/AdjDE to weight defensive efficiency
5. PIFF engine uses tempo + efficiency to adjust prop expectations
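
Step 2 (team name normalization) can be sketched as an alias lookup over a punctuation-insensitive key. The alias entries and function names below are hypothetical placeholders; the real map has to be built by comparing scraped Torvik names with CanonicalTeam records.

```python
import re
from typing import Optional

# Hypothetical aliases: normalized source name -> canonical team name.
TEAM_ALIASES = {
    "connecticut": "UConn",
    "north carolina st": "NC State",
    "saint marys": "Saint Mary's",
}


def team_key(name: str) -> str:
    """Lowercase, expand '&', drop punctuation, and collapse whitespace."""
    lowered = name.lower().replace("&", " and ")
    stripped = re.sub(r"[^a-z0-9\s]", "", lowered)
    return " ".join(stripped.split())


def to_canonical_team(name: str, aliases: Optional[dict] = None) -> Optional[str]:
    """Return the canonical team name, or None when no alias matches."""
    lookup = TEAM_ALIASES if aliases is None else aliases
    return lookup.get(team_key(name))
```

Returning `None` for unmatched names, instead of guessing, lets the scraper log them so the alias map can be extended by hand.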

**License**: Free for research; no commercial redistribution without permission.

---

## Integration Priority

1. **Torvik (NCAAB)** — High priority, season ends soon (March Madness)
   - AdjOE/AdjDE directly improves DVP accuracy
   - Tempo adjusts expected stat volumes for props
2. **FanGraphs (MLB)** — Medium priority, season starts April
   - Projection data most valuable for early-season prop evaluation
   - WAR/wOBA context for Grok prompt enrichment
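
The tempo adjustment mentioned under Torvik can be illustrated with a simple pace model. This is one common convention, not a description of how PIFF or Torvik computes anything: the multiplicative possession estimate, the linear scaling, and both function names are assumptions.

```python
def expected_possessions(tempo_a: float, tempo_b: float, league_avg_tempo: float) -> float:
    """Estimate game possessions from both teams' tempo (multiplicative model)."""
    if league_avg_tempo <= 0:
        raise ValueError("league_avg_tempo must be positive")
    return tempo_a * tempo_b / league_avg_tempo


def scale_stat_to_pace(baseline_stat: float, baseline_possessions: float,
                       game_possessions: float) -> float:
    """Scale a per-game stat expectation linearly with possessions."""
    if baseline_possessions <= 0:
        raise ValueError("baseline_possessions must be positive")
    return baseline_stat * game_possessions / baseline_possessions
```

Under this model two average-tempo teams produce an average-pace game, so a baseline expectation passes through unchanged, while a matchup between two fast teams raises counting-stat expectations proportionally.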

## Next Steps

- [ ] Create `rm_projections` table migration
- [ ] Build Torvik scraper (`scripts/torvik_scraper.py`)
- [ ] Build FanGraphs CSV downloader (`scripts/fangraphs_fetch.py`)
- [ ] Add projection context to PIFF engine
- [ ] Add projection context to Grok prompt builder
