# PLAN.md — TipSharks Predictive Decision Engine Migration

> Strategic plan to replace the classic race book with a live predictive decision engine.
> Architecture adapted from BoxBox F1 Fantasy investigation (forecast → simulate → optimize → visualize).
> Created: 2026-06-22. Status: DRAFT — pending approval before implementation.

---

## 1. Executive Summary

TipSharks currently runs a **pairwise logistic Elo engine** (horse/driver/trainer) with softmax win probabilities and Bradley-Terry place estimates. No ML, no Monte Carlo, no speed figures, no exotic bet optimization, no interactive UI. The mobile client is a **read-only text-string race book** with single-tip generation.

Goal: transform TipSharks into a **live, predictive decision engine** with:
- Gradient-boosted ranking models (replacing/augmenting Elo as feature input)
- 10,000-run Monte Carlo simulation per race (with harness DNF/break-stride sampling)
- Exotic bet portfolio optimizer (Kelly criterion, budget-constrained)
- Interactive speed maps, confidence interval overlays, ticket portfolio builder
- Value-finder push alerts

The three-service monorepo layout stays; the work spans all three services.

---

## 2. Current State (Baseline)

### 2.1 tab-api-ingest (TypeScript/Prisma/BullMQ)

**Working:**
- TAB Affiliates API client (meetings, races, runners, results, dividends, odds)
- Prisma schema: Meeting, Race, Horse, Runner, Result, Dividend, OddsSnapshot, Scrape, JobRun
- BullMQ queues: morning-scrape (6AM), pre-race (T-60/T-15), post-race (T+5), cleanup, dead-letter
- Horse matching (6-priority name normalization)
- ~10 test files, ~80-100 cases

**Gaps (fields available in TAB API but NOT stored):**
| Field | API Location | Impact |
|-------|-------------|--------|
| `mile_rate_400`, `mile_rate_800` | EventRace | Sectional timing — critical for pace ratings |
| `gear` | EventRunner | Tack changes (blinkers, tongue tie) — feature input |
| `speedmap.settling_lengths` | EventRunner | Gate speed / early positioning — speed map UI |
| `track_direction`, `rail_position`, `track_circumference`, `track_home_straight` | EventRace | Track bias normalization |
| `start_type`, `gait` | EventRace | Harness-specific modeling (mobile/standing start) |
| `track_surface` | EventRace | Surface normalization (turf/synthetic/dirt) |
| `actual_start` | EventRace | Lateness / going-down analysis |
| `flucs`, `flucs_with_timestamp` | EventRunner | Odds movement — value detection |
| `allowance_weight` | RunnerDetail | Weight normalization precision |
| `deduction` | EventRunner | Scratching deductions — exotic math |
| `favourite`, `mover` | EventRunner | Market signals |
| `form_indicators` | EventRunner | Structured form flags |
| `prize_money` | EventRunner | Career earnings — class feature |
| `silk_url_64x64`, `silk_url_128x128` | EventRunner | UI silks |

**Not available from TAB API (needs external source):**
- Stride length/frequency (video analysis — Punter's Intelligence, Sky Racing)
- Track moisture readings (penetrometer — Bureau of Meteorology, track APIs)
- Wind speed/direction (weather API)
- Breaking-stride rates (harness — HRNZ stewards' reports)

### 2.2 tipsharks-elo-api (Python/FastAPI/SQLAlchemy)

**Working:**
- Elo engine: `packages/core/ratings/engine.py` (698 lines) — pairwise logistic Elo, K-factor with RD scaling, barrier/handicap learned adjustments
- Effective rating: `R_eff = R_horse + α*R_driver + β*R_trainer + barrier_adj + handicap_adj` (α=0.35, β=0.15)
- Prediction engine: `packages/core/ratings/predictions.py` (613 lines) — softmax win probs, Bradley-Terry place probs, 95% CI from RD
- PostgreSQL: 11 tables (meetings, races, horses, drivers, trainers, starters, rating_snapshots, barrier_adjustments, handicap_adjustments, prediction_history, audit_logs)
- REST API: 35+ endpoints under `/v1/` (ratings, races, predictions, analytics, export CSV/Parquet/PDF, admin ingest/recompute/jobs/audit)
- Web UI: 10 Bootstrap + Chart.js pages (home, horse/driver/trainer detail, race, race-card, search, analytics, analytics-dashboard, data-correction)
- CLI worker: `apps/backend/worker/cli.py` (1170 lines) — Click-based
- Scheduler, data quality, rate limiting, caching
- ROI simulation script (`scripts/roi_simulation.py`, 391 lines)
- 23 test files, ~41% coverage (core packages ~90%)
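
The rating-to-probability chain above can be sketched as follows. This is a minimal illustration, not the code in `engine.py` or `predictions.py`. α and β come from the effective-rating formula above. The 400-point, base-10 logistic scale is an assumption; the engine's actual scale and any softmax temperature may differ. With that choice, a two-runner softmax reduces exactly to the pairwise logistic probability, and this is what ties the pairwise Elo updates to the softmax win probabilities. All function names are hypothetical.

```python
# Hypothetical sketch; ALPHA/BETA are from the plan, SCALE is an assumed Elo scale
ALPHA = 0.35
BETA = 0.15
SCALE = 400.0  # assumption: standard base-10 Elo scale; engine.py may differ


def effective_rating(r_horse, r_driver, r_trainer, barrier_adj=0.0, handicap_adj=0.0):
    """R_eff = R_horse + alpha*R_driver + beta*R_trainer + barrier_adj + handicap_adj."""
    return r_horse + ALPHA * r_driver + BETA * r_trainer + barrier_adj + handicap_adj


def pairwise_win_prob(r_i, r_j):
    """Logistic (Elo / Bradley-Terry) probability that runner i finishes ahead of runner j."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / SCALE))


def softmax_win_probs(ratings):
    """Softmax with weights 10**(R/SCALE), so two-runner odds match pairwise_win_prob."""
    top = max(ratings)  # subtract the max for numerical stability
    weights = [10 ** ((r - top) / SCALE) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `softmax_win_probs([1600, 1500, 1400])` returns three probabilities that sum to 1, ordered by rating.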

**Stubs (NotImplementedError) and unintegrated modules:**
- `packages/ml/features.py` (99 lines) — ML feature engineering
- `packages/ml/ensemble.py` (109 lines) — ensemble model
- `packages/betting/odds_client.py` (98 lines) — live odds client
- `packages/core/ratings/form_cycle.py` (133 lines) — form cycle detection
- `packages/core/ratings/time_weighted_elo.py` (148 lines) — working class but NOT integrated
- `packages/core/storage/track_conditions.py` (235 lines) — learning stub, no DB/API integration
- `packages/regions/australia.py` — stub

**Built but never called:**
- `packages/betting/value_bets.py` `ValueBetFinder.kelly_criterion()` (lines 67-93): works, but nothing calls it

**Completely missing:**
- Gradient-boosted ranking models (XGBoost/LightGBM)
- Monte Carlo simulation engine
- Exotic bet optimizer (exacta/trifecta/quinella/first-four)
- Portfolio optimizer (Kelly-based multi-race bankroll)
- Speed figures, pace ratings
- Track bias normalization (integrate `track_conditions.py`)
- Weather/track condition handling in rating engine (DB columns exist, never used)
- Modern interactive UI

### 2.3 tipsharks-client (React Native/Expo + FastAPI/MongoDB)

**Working:**
- Expo Router screens: Tips Home, Races, Schedule, Profile, Race Detail, Tip Builder, Tip Result, Login, Register
- Backend: `backend/server.py` (2424 lines) — FastAPI + MongoDB, proxies to elo-api via `EloApiClient`, mock fallback
- Zustand stores: appStore, authStore, prefStore (persisted)
- React Query (staleTime 5min), AsyncStorage cache layer
- Components: RaceCard, RunnerRow, ConfidenceMeter (static 3-segment), TipCard
- Types: Runner, Race, Tip, Schedule, Notification, UserPreferences
- Tests: RunnerRow (10), TipCard (~12), stores, cache, filters, formatters, tip-result, notifications (477 lines)

**UI/UX paradigm: TEXT-STRING RACE BOOK.**
- No interactive speed maps
- No drag-and-drop
- No confidence interval visualizations (static 3-segment bar only)
- No ticket/portfolio builder
- No value-finder alerts (badges exist in data, no alerting UI)
- No track visualization, no pace diagram
- Probability display: mini horizontal bar, no distribution curve

---

## 3. Gap Analysis vs Brief

| Brief Requirement | Current State | Work Required |
|-------------------|--------------|---------------|
| Sectional timing (opening 400m, closing 600m) | API has `mile_rate_400/800`, not stored | Add Prisma fields + ingest + Python storage |
| Stride length/frequency | Not available from TAB API | External source TBD (out of scope v1) |
| Track condition (Good/Soft/Heavy) | DB columns exist, never used in engine | Integrate `track_conditions.py` into engine |
| Weather (moisture, wind, temp) | Not available from TAB API | External weather API integration |
| Barrier draw / post position | Captured in Runner.barrier | Already have; use in speed maps + features |
| Gate speed (harness) | `speedmap.settling_lengths` in API, not stored | Add field + ingest + feature |
| Trainer skill (constructor equivalent) | Elo rating exists | Add track/distance-specific strike rate features |
| Jockey/driver skill | Elo rating exists | Add tactical execution, pathfinding, track-specific ROI features |
| Speed figures (normalized) | NOTHING | Build from scratch — feature engineering pipeline |
| Early/Late pace rating | NOTHING | Build from sectionals + settling_lengths |
| Gradient-boosted ranking models | STUBS only | Build XGBoost/LightGBM pipeline |
| Monte Carlo simulation (10,000 runs) | NOTHING | Build from scratch |
| Harness break-stride sampling (DNF layer) | NOTHING | Build probability model from HRNZ data |
| Thoroughbred weight normalization | Weight captured, not normalized | Build weight-adjusted speed figure |
| Exotic bet optimizer | NOTHING | Build exacta/trifecta/quinella/first-four probability models |
| Kelly criterion portfolio optimizer | `value_bets.py` has Kelly, never called | Integrate + build portfolio optimizer |
| Value-finder alerts | Badges in data, no alerting | Build alerting service + UI |
| 90% confidence intervals | 95% CI from RD exists | Add Monte Carlo-derived 90% CI |
| Interactive speed maps | NOTHING | Build drag-and-drop track visualization |
| Confidence interval overlays | Static 3-segment bar | Build distribution curve viz |
| Ticket portfolio builder | NOTHING | Build budget-constrained ticket composer UI |
| Modern UI/UX | Text-string race book | Full UI overhaul |

---

## 4. Target Architecture

Adapted from BoxBox pipeline: **RAW DATA → FEATURE ENGINEERING → PREDICTION MODELS → SIMULATION ENGINE → VALUE & ODDS RULES → OPTIMIZATION LAYER → MODERN UI/UX**.

```
┌─────────────────────────────────────────────────────────────┐
│                    tab-api-ingest (TS)                       │
│  TAB API → Prisma (meetings, races, runners, results, odds,  │
│            sectionals, gear, speedmap, track context)        │
│  + External: weather API, HRNZ stewards (break-stride)       │
│  BullMQ: morning-scrape, pre-race, post-race, cleanup        │
└──────────────────────────┬──────────────────────────────────┘
                           │ HTTP (existing ingest_client)
┌──────────────────────────▼──────────────────────────────────┐
│                   tipsharks-elo-api (Python)                 │
│                                                              │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │ Elo Engine  │  │ Feature      │  │ ML Ranking Models  │  │
│  │ (existing)  │→ │ Engineering  │→ │ (XGBoost/LightGBM) │  │
│  │ horse/driver│  │ (new package)│  │ (new, replaces     │  │
│  │ /trainer    │  │ speed figs,  │  │  ml stubs)         │  │
│  │ + RD + adj  │  │ pace, bias,  │  │                    │  │
│  └─────────────┘  │ weight norm  │  └─────────┬──────────┘  │
│                   └──────────────┘            │             │
│                                               ▼             │
│                              ┌──────────────────────────┐    │
│                              │ Monte Carlo Simulation   │    │
│                              │ (new package)            │    │
│                              │ 10,000 runs/race         │    │
│                              │ + DNF/break-stride       │    │
│                              │ + pace variance          │    │
│                              └───────────┬──────────────┘    │
│                                          ▼                   │
│                              ┌──────────────────────────┐    │
│                              │ Value & Odds Rules        │    │
│                              │ (integrate value_bets.py) │    │
│                              │ overlays/underlays       │    │
│                              └───────────┬──────────────┘    │
│                                          ▼                   │
│                              ┌──────────────────────────┐    │
│                              │ Portfolio Optimizer      │    │
│                              │ (new package)            │    │
│                              │ Kelly + budget constraint│    │
│                              │ exotics: exacta/trifecta │    │
│                              │ /quinella/first-four     │    │
│                              └───────────┬──────────────┘    │
│                                          ▼                   │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  REST API (existing /v1/ + new endpoints)           │    │
│  │  + WebSocket (existing, enhanced)                   │    │
│  │  + Web UI (existing Bootstrap, enhanced)            │    │
│  └──────────────────────────┬──────────────────────────┘    │
└─────────────────────────────┼───────────────────────────────┘
                              │ HTTP
┌─────────────────────────────▼───────────────────────────────┐
│                tipsharks-client (React Native)              │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Speed Map    │  │ Confidence   │  │ Ticket Portfolio │  │
│  │ (drag-drop   │  │ Interval     │  │ Builder          │  │
│  │  track viz)  │  │ Overlays     │  │ (budget + risk   │  │
│  │              │  │ (distribution│  │  profile + Kelly)│  │
│  │              │  │  curves)     │  │                  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Value Finder │  │ Race Browser │  │ Tips (enhanced)  │  │
│  │ Alerts       │  │ (existing,   │  │ (MC-backed)      │  │
│  │ (push)       │  │  enhanced)   │  │                  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│                                                              │
│  Backend (FastAPI + MongoDB): proxy + cache + notifications  │
└─────────────────────────────────────────────────────────────┘
```

---

## 5. Data Mapping: F1 Telemetry → Horse Racing

| F1 Component (BoxBox) | Horse Racing Equivalent | Product Implementation |
|------------------------|------------------------|------------------------|
| Telemetry & Lap Data | Sectional timing (opening 400m, closing 600m), stride data | Normalized speed figures (track-bias-corrected "True Pace" + closing burst) |
| Tyre Degradation | Track condition (Good/Soft/Heavy, Fast/Slushy) + distance changes | Stamina drop-off model over extra 200m; moisture profile degradation |
| Pit-Stop Performance | Barrier draw (T) / post position (H) + gate speed | Early positioning efficiency; "trapped wide" / "pocketed on pegs" risk |
| Constructor Skill | Trainer + stud/bloodline historical metrics | Trainer strike rate at track/distance when peaking horse |
| Driver Skill | Jockey (T) / Driver (H) analytics | Tactical execution, pathfinding efficiency, pace awareness, track ROI |
| Weather/Environment | Track moisture, temp, wind speed/direction on home straight | Dynamic sim adjustment for headwind impact on front-runners |
| DNF / Reliability | Breaking stride (harness) | DNF probability sampling in Monte Carlo |
| Fantasy Points | Win/Place/Exotic dividends | Official tote dividend rules conversion |
| Lineup Optimizer (1.4M combos) | Exotic bet portfolio (trifectas, first fours, quaddies) | Millions of exotic permutations, budget-constrained Kelly |
| Two-Team Portfolio | Risk profile (Conservative/Balanced/Aggressive) | Portfolio ticket distribution under uncertainty |
| Confidence Intervals | 90% CI for favorites | Safe floor vs volatile ceiling visualization |
| Multiplier Tiers | Risk tiers | Conservative (win/place), Balanced (quinella/exacta), Aggressive (trifecta/first-four) |

---

## 6. Migration Phases

### Phase 0 — Schema & Ingest Backfill (tab-api-ingest)
**Goal:** Capture all available TAB API fields currently dropped.

- [ ] Extend Prisma schema: add fields to Race (track_direction, rail_position, track_circumference, track_home_straight, start_type, gait, track_surface, actual_start), Runner (gear, speedmap_settling_lengths, favourite, mover, deduction, flucs, flucs_with_timestamp, form_indicators, prize_money, allowance_weight, silk_url), Result (mile_rate_400, mile_rate_800, sectional_times JSON)
- [ ] Update TAB API client schemas (`schemas.ts`) to parse new fields
- [ ] Update RaceService/MeetingService to persist new fields
- [ ] Add migration script to backfill historical data where available
- [ ] Add tests for new field parsing
- [ ] External: integrate weather API (Bureau of Meteorology / OpenWeatherMap) for moisture/wind/temp per track

### Phase 1 — Feature Engineering (tipsharks-elo-api)
**Goal:** Build speed figures, pace ratings, track bias normalization, weight normalization.

- [ ] New package `packages/features/`:
  - `speed_figures.py` — normalized speed figures (track-bias-corrected, moisture-adjusted)
  - `pace_ratings.py` — early pace rating (gate speed from settling_lengths), late pace rating (closing 600m sectional)
  - `track_bias.py` — integrate `track_conditions.py` stub into engine; normalize for track variance
  - `weight_normalization.py` — thoroughbred weight-adjusted speed (3.5kg drop → calculated upgrade)
  - `trainer_features.py` — track/distance-specific strike rates
  - `jockey_features.py` — tactical execution, pathfinding, track ROI
- [ ] New storage tables: `speed_figures`, `pace_ratings`, `track_bias_profiles`, `feature_vectors`
- [ ] Integrate features into Elo engine's `compute_effective_rating` as additional terms
- [ ] Tests for each feature module
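
Here is a minimal sketch of the weight and track-bias normalization step that `speed_figures.py` and `weight_normalization.py` would perform. The following are hypothetical placeholders to be fitted from historical results, not established values:

- the calibration constants `LENGTHS_PER_KG` and `SECONDS_PER_LENGTH`
- the 56 kg reference weight
- the `RunRecord` fields

```python
from dataclasses import dataclass

# Hypothetical calibration constants: placeholders to be fitted from historical data
LENGTHS_PER_KG = 0.5       # assumed lengths of performance per kg carried
SECONDS_PER_LENGTH = 0.17  # assumed time value of one length


@dataclass
class RunRecord:
    distance_m: float          # race distance in metres
    time_s: float              # runner's finishing time in seconds
    weight_kg: float           # weight carried, after any allowance_weight claim
    track_bias_s: float = 0.0  # hypothetical venue/condition adjustment (+ = slow track)


def weight_adjusted_time(run: RunRecord, reference_weight_kg: float) -> float:
    """Estimate the finishing time the runner would have run at the reference weight."""
    extra_kg = run.weight_kg - reference_weight_kg
    # Carrying more than the reference cost time, so credit it back (and vice versa)
    return run.time_s - extra_kg * LENGTHS_PER_KG * SECONDS_PER_LENGTH


def speed_figure(run: RunRecord, reference_weight_kg: float = 56.0) -> float:
    """Average speed in m/s after weight and track-bias normalization (higher = faster)."""
    adjusted = weight_adjusted_time(run, reference_weight_kg) - run.track_bias_s
    return run.distance_m / adjusted
```

With these placeholder constants, a 3.5 kg weight difference is worth 3.5 × 0.5 × 0.17 ≈ 0.30 s.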

### Phase 2 — ML Ranking Models (tipsharks-elo-api)
**Goal:** Replace `packages/ml/` stubs with working gradient-boosted ranking models.

- [ ] Implement `packages/ml/features.py` — build feature vectors from Phase 1 features + Elo ratings + historical form
- [ ] Implement `packages/ml/ensemble.py` — XGBoost/LightGBM ranking models for finishing order
- [ ] Training pipeline: historical races (3-5 years) → features → model training → versioning
- [ ] Model registry: store trained models with metadata (accuracy, training date, feature set)
- [ ] Prediction endpoint: feed Elo + features → ML model → predicted finishing order
- [ ] Integrate with existing `PredictionEngine` (ML as augmentation, not replacement)
- [ ] Tests + accuracy tracking (extend `prediction_history` table)

### Phase 3 — Monte Carlo Simulation (tipsharks-elo-api)
**Goal:** Build 10,000-run simulation engine per race.

- [ ] New package `packages/simulation/`:
  - `monte_carlo.py` — core simulation loop (10,000 runs/race)
  - `samplers.py` — sample high-variance variables:
    - Bad starts (barrier-dependent)
    - Checking/blocking in running
    - Breaking stride (harness) — DNF probability from HRNZ stewards' data
    - Pace variance (early vs late pace rating noise)
    - Track condition shift (Good → Heavy)
  - `distributions.py` — Win/Place/Exotic probability distributions from sim results
  - `confidence_intervals.py` — 90% CI from sim percentiles (replace RD-based 95% CI)
- [ ] Integrate with ML ranking model (sim samples around ML predictions)
- [ ] New storage: `simulation_results` table (race_id, run_count, win_dist, place_dist, exotic_dist, ci_lower, ci_upper, percentiles)
- [ ] API endpoints: `GET /v1/races/{id}/simulation`, `POST /v1/admin/simulate`
- [ ] WebSocket enhancement: stream sim progress
- [ ] Tests for simulation determinism (seeded), distribution validity

### Phase 4 — Value & Odds Rules (tipsharks-elo-api)
**Goal:** Map simulated probabilities against live market odds; identify overlays/underlays.

- [ ] Implement `packages/betting/odds_client.py` (replace stub) — live odds from tab-api-ingest OddsSnapshot or direct TAB API
- [ ] Integrate `ValueBetFinder.kelly_criterion()` into API + worker
- [ ] New module `packages/betting/value_rules.py`:
  - Overlay detection (model prob > market-implied prob)
  - Underlay detection (market traps)
  - Edge calculation
- [ ] New storage: `value_bets` table (race_id, runner_id, model_prob, market_prob, edge, kelly_fraction, bet_size)
- [ ] API endpoints: `GET /v1/races/{id}/value-bets`, `GET /v1/value-alerts`
- [ ] Worker job: scan upcoming races for value bets, trigger alerts
- [ ] Tests
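
The overlay/underlay rules and Kelly sizing can be sketched as follows, assuming decimal odds (10-1 is 11.0). Caveats:

- Removing the market margin proportionally is only one of several ways to de-vig a market.
- The `classify` threshold is a hypothetical placeholder.
- This sketch does not reproduce the actual signature of `ValueBetFinder.kelly_criterion()`.

```python
from typing import Dict


def market_implied_probs(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Convert decimal odds to probabilities, removing the market margin proportionally."""
    raw = {runner: 1.0 / odds for runner, odds in decimal_odds.items()}
    overround = sum(raw.values())  # > 1 when the market carries a margin
    return {runner: p / overround for runner, p in raw.items()}


def edge(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked: p * odds - 1 (positive means overlay)."""
    return model_prob * decimal_odds - 1.0


def kelly_fraction(model_prob: float, decimal_odds: float) -> float:
    """Full-Kelly fraction f* = (b*p - q) / b with b = odds - 1; 0 when there is no edge."""
    b = decimal_odds - 1.0
    if b <= 0:
        return 0.0
    q = 1.0 - model_prob
    return max(0.0, (b * model_prob - q) / b)


def classify(model_prob: float, market_prob: float, threshold: float = 0.02) -> str:
    """Hypothetical rule: flag overlays/underlays when probabilities differ by > threshold."""
    diff = model_prob - market_prob
    if diff > threshold:
        return "overlay"
    if diff < -threshold:
        return "underlay"
    return "fair"
```

Consider the Phase 6 alert example: 25% model probability at 10-1. There the edge is 0.25 × 11 − 1 = 1.75, and full Kelly stakes 17.5% of bankroll. In practice, fractional Kelly is commonly used to temper variance.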

### Phase 5 — Portfolio Optimizer (tipsharks-elo-api)
**Goal:** Budget-constrained exotic bet optimization via Kelly criterion.

- [ ] New package `packages/optimizer/`:
  - `exotic_probs.py` — exacta/trifecta/quinella/first-four probability models from Monte Carlo results
  - `portfolio.py` — evaluate millions of exotic permutations, recommend optimal ticket distribution
  - `kelly.py` — budget-constrained Kelly criterion (extend existing `value_bets.py`)
  - `risk_profiles.py` — Conservative (win/place), Balanced (quinella/exacta), Aggressive (trifecta/first-four)
- [ ] New storage: `portfolio_recommendations` table (race_id/meeting_id, budget, risk_profile, tickets JSON, expected_value, confidence)
- [ ] API endpoints: `POST /v1/optimize/portfolio` (input: budget, risk profile, race/meeting), `GET /v1/portfolios/{id}`
- [ ] Tests for optimizer correctness, budget constraints, Kelly math
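
One way for `exotic_probs.py` to derive exotic probabilities directly from Monte Carlo output is to count how often each combination appears in the simulated finishing orders. Kelly stakes can then be fitted to the budget by scaling them down proportionally. This is a hypothetical sketch:

- Proportional scaling is a simple heuristic, not a joint Kelly optimum across correlated tickets.
- The 0.50 minimum unit is an assumed placeholder.
- The risk-profile mapping restates the tiers listed above.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Risk tiers from the plan: profile -> allowed bet types
RISK_PROFILES = {
    "conservative": ("win", "place"),
    "balanced": ("quinella", "exacta"),
    "aggressive": ("trifecta", "first_four"),
}


def exotic_probs(orders: Sequence[Sequence[str]]) -> Dict[str, Dict[Tuple[str, ...], float]]:
    """Estimate each exotic's probability as its share of simulated finishing orders."""
    n = len(orders)
    counts = {bet: Counter() for bet in ("exacta", "quinella", "trifecta", "first_four")}
    for order in orders:
        if len(order) >= 2:
            counts["exacta"][tuple(order[:2])] += 1
            counts["quinella"][tuple(sorted(order[:2]))] += 1  # any order of the top two
        if len(order) >= 3:
            counts["trifecta"][tuple(order[:3])] += 1
        if len(order) >= 4:
            counts["first_four"][tuple(order[:4])] += 1
    return {bet: {combo: c / n for combo, c in ctr.items()} for bet, ctr in counts.items()}


def allocate_budget(
    kelly_fractions: Dict[str, float], bankroll: float, budget: float, unit: float = 0.5
) -> Dict[str, float]:
    """Kelly stakes on the bankroll, scaled to fit the budget, rounded down to the unit."""
    raw = {ticket: f * bankroll for ticket, f in kelly_fractions.items() if f > 0}
    total = sum(raw.values())
    scale = min(1.0, budget / total) if total > 0 else 0.0
    # Flooring to the unit guarantees the rounded stakes never exceed the budget
    stakes = {ticket: (s * scale // unit) * unit for ticket, s in raw.items()}
    return {ticket: s for ticket, s in stakes.items() if s > 0}
```

Combinations that never appear in the simulation get no probability at all. Rare first-four permutations therefore need enough runs to be estimated.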

### Phase 6 — UI/UX Overhaul (tipsharks-client)
**Goal:** Replace text-string race book with interactive predictive decision engine.

- [ ] **Interactive Speed Maps** (`frontend/src/components/SpeedMap.tsx`):
  - Visual track grid (SVG/Canvas) with running lines
  - Drag-and-drop horses to different running lines
  - Toggle slider for pace-leader missing start → re-simulate
  - Uses `speedmap.settling_lengths` + Monte Carlo position tendencies
- [ ] **Confidence Interval Overlays** (`frontend/src/components/ConfidenceCurve.tsx`):
  - Distribution curve per runner (from Monte Carlo percentiles)
  - 90% CI visualization (safe floor vs volatile ceiling)
  - Track condition shift toggle (Good → Heavy shows downside)
- [ ] **Ticket Portfolio Builder** (`frontend/src/components/TicketBuilder.tsx` + `app/ticket-builder.tsx`):
  - Budget input (e.g. $50)
  - Risk profile selector (Conservative/Balanced/Aggressive)
  - Engine evaluates combinations → structured wagering ticket
  - Wins/Places/Exotics breakdown
  - Expected value + confidence display
- [ ] **Value Finder Alerts** (`frontend/src/components/ValueAlerts.tsx` + push notifications):
  - "Model detects 25% win probability; Market price offers 10-1"
  - Push notification integration (existing Twilio/SendGrid/Resend)
  - Alert feed screen
- [ ] **Enhanced Race Detail** (`app/race/[id].tsx`):
  - Integrate speed map, confidence curves, value badges
  - Monte Carlo-backed probabilities (replace static bars)
  - Runner cards with CI ranges
- [ ] **Enhanced Tips** (`app/tip-result.tsx`):
  - MC-backed confidence (distribution curve, not 3-segment bar)
  - Portfolio ticket display (not single recommendation)
- [ ] Backend (`backend/server.py`):
  - New endpoints: `/simulation/{race_id}`, `/optimize/portfolio`, `/value-alerts`, `/portfolios/{id}`
  - Proxy to elo-api new endpoints
  - MongoDB cache for simulation results, portfolios
- [ ] New types: `SimulationResult`, `ConfidenceInterval`, `PortfolioTicket`, `ValueAlert`, `SpeedMapData`
- [ ] Tests for each new component + screen

### Phase 7 — Web UI Enhancement (tipsharks-elo-api)
**Goal:** Enhance existing Bootstrap web UI with new features (parity with mobile).

- [ ] Add simulation results to race page
- [ ] Add value bets panel
- [ ] Add portfolio optimizer page
- [ ] Add confidence interval charts (Chart.js distribution curves)
- [ ] Add model accuracy tracking for ML + Monte Carlo

---

## 7. Schema Changes Summary

### tab-api-ingest (Prisma)

**Race** — add:
```prisma
trackDirection      String?   @db.VarChar(10)   // "Left" | "Right"
railPosition        String?   @db.VarChar(100)  // "True Entire Circuit"
trackCircumference  Int?                        // meters
trackHomeStraight   Int?                        // meters
startType           String?   @db.VarChar(20)   // "mobile" | "standing" (harness)
gait                String?   @db.VarChar(20)   // harness gait
trackSurface        String?   @db.VarChar(20)   // "turf" | "synthetic" | "dirt"
actualStart         DateTime? @db.Timestamptz   // actual start timestamp
weatherSnapshots    WeatherSnapshot[]           // back-relation required by WeatherSnapshot.race
```

**Runner** — add:
```prisma
gear                Json?                         // ["Blinkers", "Tongue Tie"]
speedmapSettling    Int?                          // settling_lengths from speedmap
favourite           Boolean   @default(false)
mover               Boolean   @default(false)
deduction           Json?                         // {win: Decimal, place: Decimal}
flucs               Json?                         // odds fluctuation array
flucsTimeline       Json?                         // flucs_with_timestamp
formIndicators      Json?                         // [{name, group, negative, priority}]
prizeMoney          String?   @db.VarChar(100)    // career earnings string
allowanceWeight     Decimal?  @db.Decimal(4, 2)   // jockey claim (kg)
silkUrl64           String?   @db.VarChar(255)
silkUrl128          String?   @db.VarChar(255)
```

**Result** — add:
```prisma
mileRate400         Decimal?  @db.Decimal(10, 3)  // last 400m sectional
mileRate800         Decimal?  @db.Decimal(10, 3)  // last 800m sectional
sectionalTimes      Json?                         // {opening_400, closing_600, ...}
```

**New model: WeatherSnapshot**
```prisma
model WeatherSnapshot {
  id              String    @id @default(uuid())
  raceId          String
  race            Race      @relation(fields: [raceId], references: [id])
  temperature     Decimal?  @db.Decimal(5, 2)  // Celsius
  windSpeed       Decimal?  @db.Decimal(5, 2)  // km/h
  windDirection   String?   @db.VarChar(10)    // degrees or cardinal
  moistureReading Decimal?  @db.Decimal(5, 2)  // penetrometer
  capturedAt      DateTime  @db.Timestamptz
  source          String    @db.VarChar(50)    // "BOM" | "OpenWeatherMap"
}
```

### tipsharks-elo-api (SQLAlchemy)

**New tables:**
```python
# Column summaries only; model bodies are elided until each phase defines them
class SpeedFigure: ...              # horse_id, race_id, figure, track_bias_adj, moisture_adj
class PaceRating: ...               # horse_id, race_id, early_pace, late_pace
class TrackBiasProfile: ...         # venue, start_type, distance_bucket, bias_value
class FeatureVector: ...            # race_id, horse_id, features JSONB, version
class MLModel: ...                  # model_id, version, type, trained_at, accuracy, features
class SimulationResult: ...         # race_id, run_count, win_dist JSONB, place_dist JSONB, exotic_dist JSONB, ci_lower, ci_upper, percentiles JSONB
class ValueBet: ...                 # race_id, horse_id, model_prob, market_prob, edge, kelly_fraction, bet_size
class PortfolioRecommendation: ...  # race_id/meeting_id, budget, risk_profile, tickets JSONB, expected_value, confidence
```

### tipsharks-client (MongoDB + TypeScript types)

**New types:**
```typescript
type RiskProfile = 'conservative' | 'balanced' | 'aggressive';
interface ConfidenceInterval { lower: number; median: number; upper: number; }
interface SimulationResult {
  raceId: string; runCount: number;
  winDistribution: Record<string, number>;
  placeDistribution: Record<string, number>;
  exoticDistribution: Record<string, number>;
  confidenceIntervals: Record<string, ConfidenceInterval>;
}
// Assumed shape of a single ticket line; not yet specified elsewhere in this plan
interface Ticket { betType: string; selections: string[]; stake: number; }
interface PortfolioTicket {
  id: string; raceId: string; budget: number; riskProfile: RiskProfile;
  tickets: Ticket[]; expectedValue: number; confidence: number;
}
interface ValueAlert {
  id: string; raceId: string; runnerId: string; runnerName: string;
  modelProbability: number; marketOdds: number; edge: number; message: string;
}
interface SpeedMapData {
  raceId: string; runners: { id: string; name: string; barrier: number; settlingLine: number; }[];
}
```

---

## 8. New Packages/Modules

### tipsharks-elo-api
| Package | Purpose | Replaces |
|---------|---------|----------|
| `packages/features/` | Speed figures, pace ratings, track bias, weight norm | — |
| `packages/simulation/` | Monte Carlo engine, samplers, distributions | — |
| `packages/optimizer/` | Exotic probs, portfolio, Kelly, risk profiles | — |
| `packages/ml/` (implement) | Feature vectors, XGBoost/LightGBM ranking | Existing stubs |
| `packages/betting/` (implement) | Odds client, value rules | Existing stubs |

### tipsharks-client
| Component | Purpose |
|-----------|---------|
| `src/components/SpeedMap.tsx` | Interactive drag-drop track visualization |
| `src/components/ConfidenceCurve.tsx` | Distribution curve + 90% CI overlay |
| `src/components/TicketBuilder.tsx` | Budget + risk profile + ticket composer |
| `src/components/ValueAlerts.tsx` | Value-finder alert feed |
| `app/ticket-builder.tsx` | Portfolio builder screen |
| `app/value-alerts.tsx` | Value alerts screen |

---

## 9. Testing Strategy

- **tab-api-ingest:** Add tests for new field parsing, backfill script, weather API integration
- **tipsharks-elo-api:**
  - Feature engineering: unit tests per feature module (speed figure correctness, pace rating, weight norm)
  - ML: model accuracy tests, feature vector consistency, training pipeline
  - Monte Carlo: determinism (seeded), distribution validity, DNF sampling rates, performance (10k runs < 5s)
  - Optimizer: budget constraint satisfaction, Kelly math correctness, exotic probability validity
  - Integration: end-to-end race → features → ML → sim → value → portfolio
- **tipsharks-client:**
  - Component tests: SpeedMap drag-drop, ConfidenceCurve render, TicketBuilder interactions, ValueAlerts
  - Screen tests: ticket-builder, value-alerts, enhanced race detail
  - Backend: new endpoint proxy tests

---

## 10. Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|-----------|
| Stride/moisture/wind data unavailable from TAB API | Feature gaps | Integrate external weather API (BOM/OpenWeatherMap); stride data deferred to v2 |
| Breaking-stride rates need HRNZ stewards' data | Harness DNF model accuracy | Scrape HRNZ stewards' reports (existing `hrnz_scraper` package); start with historical break-stride rates |
| ML model accuracy with limited features | Poor predictions | Start with Elo + features as inputs; ML augments Elo and doesn't replace it; track accuracy in `prediction_history` from day one; defer a public accuracy dashboard (as BoxBox has) to v2 (Section 11) |
| Monte Carlo performance (10k runs × N races) | Slow API | Run sims in worker (BullMQ equivalent — Python RQ/Celery); cache results; WebSocket for progress |
| Exotic optimizer combinatorial explosion | Timeout | Prune low-probability permutations; use Kelly to rank; cap evaluation count |
| UI complexity (drag-drop speed maps) | Dev time | Use React Native gesture handler + Reanimated; SVG track via react-native-svg |
| Backfill of historical sectionals | Data gaps | Only available for recent races; document limitation; model degrades gracefully |
| Existing Elo engine disruption | Regression | ML + Monte Carlo augment, don't replace Elo; feature flags for rollout |
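
The combinatorial-explosion mitigation can be sketched for trifectas: prune permutations, then cap the evaluation count. This sketch swaps in the Harville formula (chained conditional win probabilities) as a cheap analytic estimate in place of Monte Carlo counts. The threshold and cap values are hypothetical. When win probabilities sum to 1, every Harville trifecta probability is bounded by its leader's win probability. Runners below the threshold can therefore be dropped as leaders before enumeration starts.

```python
import heapq
from itertools import permutations
from typing import Dict, List, Tuple

Trifecta = Tuple[str, str, str]


def harville_trifecta(p: Dict[str, float], a: str, b: str, c: str) -> float:
    """Harville estimate: P(a, b, c) = p_a * p_b / (1 - p_a) * p_c / (1 - p_a - p_b)."""
    d1 = 1.0 - p[a]
    d2 = 1.0 - p[a] - p[b]
    if d1 <= 0 or d2 <= 0:
        return 0.0  # degenerate field: the remaining runners hold no probability mass
    return p[a] * (p[b] / d1) * (p[c] / d2)


def top_trifectas(
    p: Dict[str, float], min_prob: float = 0.001, cap: int = 50
) -> List[Tuple[float, Trifecta]]:
    """Enumerate trifectas, drop those below min_prob, keep at most `cap` by probability."""
    # P(a, b, c) <= p_a when p sums to 1, so weak leaders can be skipped outright
    leaders = [r for r in p if p[r] >= min_prob]
    candidates = (
        (harville_trifecta(p, a, b, c), (a, b, c))
        for a in leaders
        for b, c in permutations([r for r in p if r != a], 2)
    )
    kept = (item for item in candidates if item[0] >= min_prob)
    return heapq.nlargest(cap, kept)  # bounded memory: only `cap` items retained
```

For example, with win probabilities `{"A": 0.5, "B": 0.3, "C": 0.2}`, the most likely trifecta is A-B-C at 0.5 × (0.3 / 0.5) × (0.2 / 0.2) = 0.3.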

---

## 11. Out of Scope (v1)

- Stride length/frequency data (needs video analysis source — TBD)
- Real-time sectional tracking during live race (post-race only for v1)
- Multi-week planner heuristics (BoxBox-style future-round planning)
- Two-team portfolio (BoxBox uses 2 teams; TipSharks uses risk profiles instead)
- Model accuracy public dashboard (defer to v2)

---

## 12. Approval Gate

**This plan is DRAFT.** Before implementation:
1. Confirm scope (v1 vs deferred items)
2. Confirm external data sources (weather API, HRNZ stewards access)
3. Confirm ML stack (XGBoost vs LightGBM vs CatBoost)
4. Confirm UI framework choices (react-native-svg, gesture handler)
5. Confirm phase ordering (sequential vs parallel)

Once approved, create TODO.md entries per phase and dispatch implementation.