# HarnessElo Rating Mathematics

## Overview

HarnessElo uses a **multi-runner Elo system** based on pairwise logistic comparisons. Unlike traditional 1v1 Elo (chess), harness races have 2-20+ starters competing simultaneously. We handle this by:

1. Computing pairwise expected outcomes for all pairs
2. Updating each entity based on average performance against all opponents
3. Incorporating multi-entity contributions (horse + driver + trainer)
4. Learning condition adjustments (barrier, handicap) from data

## Core Elo Formula

### Pairwise Expected Outcome

For two starters $i$ and $j$ with effective ratings $R_i$ and $R_j$:

$$
E_{ij} = \frac{1}{1 + e^{-(R_i - R_j) / c}}
$$

Where:
- $E_{ij}$ = probability that $i$ finishes ahead of $j$
- $c$ = scale factor (default: 400)
- $e$ = Euler's number

This is the **logistic sigmoid** function.
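As a minimal sketch of the formula above (the function name `expected_outcome` is illustrative, not taken from the codebase), the pairwise expectation can be computed like this:

```python
import math


def expected_outcome(r_i: float, r_j: float, c: float = 400.0) -> float:
    """Probability that starter i finishes ahead of starter j (logistic, base e)."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j) / c))


# Equal ratings give an even chance; a 100-point edge gives roughly 56%.
print(expected_outcome(1500, 1500))            # 0.5
print(round(expected_outcome(1600, 1500), 3))  # 0.562
```

The two directions of a pair are complementary: `expected_outcome(a, b) + expected_outcome(b, a)` equals 1 up to floating-point error.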

### Actual Outcome

$$
S_{ij} = \begin{cases}
1 & \text{if } \text{placing}_i < \text{placing}_j \\
0 & \text{if } \text{placing}_i > \text{placing}_j \\
\text{skip} & \text{if either has no placing}
\end{cases}
$$

### Rating Update

For starter $i$ in a race with $n$ finishers:

$$
\Delta R_i = K \cdot \frac{1}{n-1} \sum_{j \neq i} (S_{ij} - E_{ij})
$$

Where:
- $K$ = update rate (default: 24.0)
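- The sum runs over all other starters $j$ that have a placing; skipped pairs contribute nothing, and $n$ counts only starters with a placing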
- Average pairwise error determines update
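The update rule can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the function name `race_updates` and the dict-based inputs are hypothetical, and placings are assumed to be distinct (the source does not define how dead heats are scored).

```python
import math


def race_updates(ratings, placings, k=24.0, c=400.0):
    """Return {starter: delta_rating} for one race.

    ratings  -- {starter: effective rating}
    placings -- {starter: finishing position, or None if no placing}
    """
    # Starters without a placing are skipped entirely (no update).
    finishers = [s for s in ratings if placings.get(s) is not None]
    n = len(finishers)
    if n < 2:
        return {s: 0.0 for s in finishers}

    deltas = {}
    for i in finishers:
        error = 0.0
        for j in finishers:
            if i == j:
                continue
            expected = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j]) / c))
            actual = 1.0 if placings[i] < placings[j] else 0.0
            error += actual - expected
        deltas[i] = k * error / (n - 1)  # average pairwise error, scaled by K
    return deltas


deltas = race_updates(
    {"A": 1500, "B": 1500, "C": 1500, "D": 1500},
    {"A": 1, "B": 2, "C": 3, "D": None},  # D did not finish
)
print(deltas)  # {'A': 12.0, 'B': 0.0, 'C': -12.0}
```

The non-finisher `D` receives no entry, so its rating is left unchanged.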

### Properties

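**Zero-sum**: When every pair has a result and all starters share the same $K$, the updates in a race sum to exactly zero (no rating inflation), because $S_{ij} + S_{ji} = 1$ and $E_{ij} + E_{ji} = 1$ for every pair:

$$
\sum_{i=1}^{n} \Delta R_i = 0
$$

With a per-entity $K_{\text{eff}}$ (see Rating Deviation), the sum is only approximately zero.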

**Winner gains, loser loses**: Winner has positive $\Delta R$, last place has negative

**Magnitude depends on surprise**: Upset wins → large update, expected outcome → small update

---

## Multi-Entity Effective Rating

HarnessElo combines ratings from horse, driver, and trainer:

$$
R_{\text{eff}} = R_{\text{horse}} + \alpha \cdot R_{\text{driver}} + \beta \cdot R_{\text{trainer}} + A_{\text{barrier}} + A_{\text{handicap}}
$$

### Entity Weights

| Entity  | Symbol | Default | Rationale                                  |
|---------|--------|---------|-------------------------------------------|
| Horse   | -      | 1.0     | Primary athlete                           |
| Driver  | α      | 0.35    | Significant skill impact                  |
| Trainer | β      | 0.15    | Preparation and strategy                  |

**Configurable via**: `DRIVER_WEIGHT_ALPHA`, `TRAINER_WEIGHT_BETA`

### Rating Update Distribution

When horse $i$ receives update $\Delta R_i$:

- Horse rating: $R_{\text{horse}} \leftarrow R_{\text{horse}} + \Delta R_i$
- Driver rating: $R_{\text{driver}} \leftarrow R_{\text{driver}} + \alpha \cdot \Delta R_i$
- Trainer rating: $R_{\text{trainer}} \leftarrow R_{\text{trainer}} + \beta \cdot \Delta R_i$

This ensures proportional contribution to effective rating.
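A minimal sketch of the effective rating and the update distribution, assuming a simple `Entry` record (a hypothetical structure; the actual data model is not shown in this document):

```python
from dataclasses import dataclass

ALPHA = 0.35  # driver weight (default of DRIVER_WEIGHT_ALPHA)
BETA = 0.15   # trainer weight (default of TRAINER_WEIGHT_BETA)


@dataclass
class Entry:
    horse: float
    driver: float
    trainer: float
    barrier_adj: float = 0.0
    handicap_adj: float = 0.0


def effective_rating(e: Entry, alpha: float = ALPHA, beta: float = BETA) -> float:
    """R_eff = R_horse + alpha * R_driver + beta * R_trainer + adjustments."""
    return e.horse + alpha * e.driver + beta * e.trainer + e.barrier_adj + e.handicap_adj


def apply_update(e: Entry, delta: float, alpha: float = ALPHA, beta: float = BETA) -> None:
    """Distribute one starter's delta across horse, driver, and trainer."""
    e.horse += delta
    e.driver += alpha * delta
    e.trainer += beta * delta


entry = Entry(horse=1500.0, driver=1500.0, trainer=1500.0)
before = effective_rating(entry)
apply_update(entry, 12.0)
after = effective_rating(entry)
print(round(after - before, 2))  # 13.74
```

Because the driver and trainer deltas are weighted once on update and again inside $R_{\text{eff}}$, the effective rating moves by $(1 + \alpha^2 + \beta^2)\,\Delta R_i$, which is $1.145 \times 12 = 13.74$ with the default weights.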

---

## Condition Adjustments

### Barrier Adjustment

Barrier position can affect performance (inside/outside gates).

$$
A_{\text{barrier}}(v, t, d, b) = \text{lookup}(\text{venue}, \text{start\_type}, \text{distance\_bucket}, \text{barrier})
$$

**Learning Rule**:

$$
A \leftarrow A + \eta \cdot (\text{actual\_advantage} - A)
$$

Where:
- $\eta$ = learning rate (default: 0.5)
- Actual advantage = average rating delta for this barrier condition

**Example**: If barrier 1 horses consistently beat expectations, $A_{\text{barrier}}$ increases.
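The learning rule is an exponential moving average toward the observed advantage. A small sketch, assuming adjustments are stored in a dictionary keyed by the lookup tuple (the storage layout and the name `learn_adjustment` are assumptions):

```python
from collections import defaultdict

# key: (venue, start_type, distance_bucket, barrier); unseen keys start at 0.0
adjustments = defaultdict(float)


def learn_adjustment(key, observed_advantage, eta=0.5):
    """Move the stored adjustment a fraction eta toward the observed advantage."""
    current = adjustments[key]
    adjustments[key] = current + eta * (observed_advantage - current)
    return adjustments[key]


key = ("Addington", "mobile", "1700-2000", 1)
print(learn_adjustment(key, 10.0))  # 5.0
print(learn_adjustment(key, 10.0))  # 7.5
print(learn_adjustment(key, 10.0))  # 8.75
```

With $\eta = 0.5$ the gap to a steady observed advantage halves on every update, which is why high $\eta$ adapts quickly but also tracks noise.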

### Handicap Adjustment

Handicaps (back marks) force horses to start further back.

$$
A_{\text{handicap}}(v, t, d, h) = \text{lookup}(\text{venue}, \text{start\_type}, \text{distance\_bucket}, \text{handicap\_m})
$$

**Expected**: Negative adjustments for larger handicaps (disadvantage)

**Learning**: Same incremental update as barriers

### Distance Buckets

To generalize across similar races, distances are bucketed:

| Bucket       | Range (meters) |
|--------------|----------------|
| `<1700`      | 0 - 1699       |
| `1700-2000`  | 1700 - 1999    |
| `2000-2400`  | 2000 - 2399    |
| `>2400`      | 2400+          |

**Configurable via**: `DISTANCE_BUCKETS`
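The default bucketing in the table can be expressed as a small function. This sketch hard-codes the default boundaries; how `DISTANCE_BUCKETS` is parsed is not described here, so that part is left out.

```python
def distance_bucket(distance_m: int) -> str:
    """Map a race distance in meters to its default bucket label."""
    if distance_m < 1700:
        return "<1700"
    if distance_m < 2000:
        return "1700-2000"
    if distance_m < 2400:
        return "2000-2400"
    return ">2400"


print(distance_bucket(1609))  # <1700
print(distance_bucket(2000))  # 2000-2400
print(distance_bucket(2400))  # >2400
```

Each lower bound is inclusive, so a 2000 m race falls in `2000-2400` and a 2400 m race in `>2400`, matching the ranges in the table.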

---

### Adjustment Interpretation

**Barrier adjustments** reflect the advantage/disadvantage of starting positions:

- **Barrier 1 (inside)**: Typically has positive adjustment (+5 to +15 rating points)
  - Shorter distance to travel
  - Better position in early stages
  - Higher on faster tracks (mobile starts)

- **Barrier 8+ (outside)**: Typically has negative adjustment (-10 to -20 rating points)
  - Longer distance to travel
  - Risk of being caught wide
  - More pronounced at shorter distances

**Handicap adjustments** reflect the disadvantage of back marks:

- **0m (front line)**: Zero adjustment (baseline)
- **10m back**: Approximately -10 to -15 rating points
- **20m back**: Approximately -20 to -30 rating points
- **30m+ back**: Approximately -40 to -60 rating points

**Venue-specific variations**:
- **Tight tracks** (e.g., Addington): Larger barrier effects
- **Wide tracks** (e.g., Auckland): Smaller barrier effects
- **Distance dependency**: Effects more pronounced at shorter distances (<1700m)

**Validation guidelines**:
1. Barrier 1 should generally have positive adjustment
2. Higher barriers should trend toward negative adjustments
3. Larger handicaps should have increasingly negative adjustments
4. Sum of all adjustments for a condition should be near zero (balanced)
5. Adjustments should stabilize after 100+ observations

**Configuration**: Set `ENABLE_ADJUSTMENTS=true` and run recompute with `--learn-adjustments` flag.

---

## Tuning Reference

For a full list of tuning knobs and recommended workflows, see `docs/elo_tuning.md`.

## Rating Deviation (RD)

**(Implemented, optional)**

Track **uncertainty** in ratings using Rating Deviation (RD), inspired by Glicko.

### Initial RD
New entities start with high uncertainty:

$$
RD_0 = 350
$$

**Configuration**: `RD_MAX=350.0`

---

## Break-even Odds for Place Probability

To convert a place probability $p$ into the **minimum decimal odds** required to
break even:

$$
\text{break-even decimal odds} = \frac{1}{p}
$$

Decimal odds here mean **payout per $1 stake, including stake** (e.g., 2.50
pays $2.50 total on a $1 bet). Any offered odds above this threshold imply
positive expected value for that probability.

| Place Probability | Break-even Decimal Odds | Observed Place Rate | Break-even Odds (Observed) |
|---:|---:|---:|---:|
| 0.05 | 20.00 | 6.65% | 15.04 |
| 0.10 | 10.00 | 11.05% | 9.05 |
| 0.15 | 6.67 | 16.73% | 5.98 |
| 0.20 | 5.00 | 21.16% | 4.73 |
| 0.25 | 4.00 | 24.76% | 4.04 |
| 0.30 | 3.33 | 28.27% | 3.54 |
| 0.35 | 2.86 | 31.92% | 3.13 |
| 0.40 | 2.50 | 36.23% | 2.76 |
| 0.45 | 2.22 | 39.44% | 2.54 |
| 0.50 | 2.00 | 41.31% | 2.42 |
| 0.55 | 1.82 | 45.96% | 2.18 |
| 0.60 | 1.67 | 48.64% | 2.06 |
| 0.65 | 1.54 | 50.88% | 1.97 |
| 0.70 | 1.43 | 50.17% | 1.99 |
| 0.75 | 1.33 | 57.85% | 1.73 |
| 0.80 | 1.25 | 55.19% | 1.81 |
| 0.85 | 1.18 | 61.54% | 1.63 |
| 0.90 | 1.11 | 65.51% | 1.53 |
| 0.95 | 1.05 | 56.93% | 1.76 |
| 1.00 | 1.00 | 66.27% | 1.51 |

Observed place rate is the historical top-3 finish rate for starters whose
predicted place probability rounds to the band (nearest 0.05), using the
current database and model settings.

Observed break-even odds use the same conversion:

$$
\text{break-even decimal odds (observed)} = \frac{1}{p_{\text{observed}}}
$$
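The conversion and the resulting expected value can be sketched as below. The helper names are illustrative only.

```python
def break_even_odds(p: float) -> float:
    """Minimum decimal odds (stake included) at which a bet with probability p breaks even."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def expected_value(p: float, decimal_odds: float) -> float:
    """Expected profit per $1 staked at the given decimal odds."""
    return p * decimal_odds - 1.0


print(round(break_even_odds(0.25), 2))    # 4.0
print(round(break_even_odds(0.2116), 2))  # 4.73
print(expected_value(0.25, 4.5))          # 0.125
```

Using the observed place rate instead of the predicted probability changes the threshold: in the 0.50 band, the observed rate of 41.31% requires odds of about 2.42 rather than 2.00.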

### RD Decay (with activity)
After each race, uncertainty decreases:

$$
RD_{\text{new}} = \max(RD_{\text{old}} - \Delta_{\text{decay}}, RD_{\min})
$$

Where $\Delta_{\text{decay}}$ defaults to 15.0 rating points per race.

**Configuration**: `RD_DECAY_PER_RACE=15.0`, `RD_MIN=50.0`

Starting from $RD_0 = 350$, RD reaches the minimum of 50 after $(350 - 50) / 15 = 20$ races.

### RD Inflation (inactivity)
If no races for $t$ days since last race:

$$
RD_{\text{new}} = \min(RD_{\text{old}} + \lambda \cdot t, RD_{\max})
$$

Where $\lambda$ defaults to 0.5 rating points per day.

**Configuration**: `RD_INFLATION_PER_DAY=0.5`

**Example**: 100 days inactive → +50 RD increase
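The decay and inflation rules can be sketched together. The constants mirror the documented defaults; the function names are hypothetical, and the order in which inflation and decay are applied around a race is not specified here.

```python
RD_MAX = 350.0
RD_MIN = 50.0
RD_DECAY_PER_RACE = 15.0
RD_INFLATION_PER_DAY = 0.5


def rd_after_race(rd: float) -> float:
    """Reduce uncertainty after a race, never below RD_MIN."""
    return max(rd - RD_DECAY_PER_RACE, RD_MIN)


def rd_after_inactivity(rd: float, days: float) -> float:
    """Grow uncertainty with days since the last race, never above RD_MAX."""
    return min(rd + RD_INFLATION_PER_DAY * days, RD_MAX)


rd = RD_MAX
for _ in range(20):
    rd = rd_after_race(rd)
print(rd)                              # 50.0
print(rd_after_inactivity(rd, 100))    # 100.0
print(rd_after_inactivity(320.0, 100)) # 350.0 (capped at RD_MAX)
```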

### Effective K-Factor

When RD tracking is enabled, adjust update rate based on uncertainty:

$$
K_{\text{eff}} = K_{\text{base}} \cdot \frac{RD}{RD_0}
$$

Where $RD_0 = 350$ (initial RD for new entities).

**Effect**:
- New/inactive entities (high RD): Larger updates, faster convergence
- Established entities (low RD ~50): Smaller updates, more stable
- Mid-range entities (RD ~150): Moderate update rates

**Example**:
- Entity with RD=350: $K_{\text{eff}} = 24.0 \times \frac{350}{350} = 24.0$
- Entity with RD=100: $K_{\text{eff}} = 24.0 \times \frac{100}{350} \approx 6.9$
- Entity with RD=50: $K_{\text{eff}} = 24.0 \times \frac{50}{350} \approx 3.4$

**Enable via**: `ENABLE_RD=true`

**Note**: RD-based K-factor adjustment helps ratings converge faster for uncertain entities while maintaining stability for established entities.
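A one-line sketch of the scaling, reproducing the example values above (the function name is illustrative):

```python
def effective_k(rd: float, k_base: float = 24.0, rd_0: float = 350.0) -> float:
    """Scale the base K-factor by the entity's current rating deviation."""
    return k_base * rd / rd_0


for rd in (350.0, 100.0, 50.0):
    print(rd, round(effective_k(rd), 1))
# 350.0 24.0
# 100.0 6.9
# 50.0 3.4
```

One consequence worth keeping in mind: once starters in the same race use different $K_{\text{eff}}$ values, their updates no longer cancel exactly, so the per-race sum of updates can drift slightly from zero.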

---

## Parameter Tuning

### Scale Factor ($c$)

Controls sensitivity of expected outcomes.

- **Smaller** $c$: Sharper probabilities, faster rating changes
- **Larger** $c$: Smoother probabilities, slower rating changes

**Formula**:
$$
E_{ij} = \frac{1}{1 + e^{-(R_i - R_j) / c}}
$$

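**Default**: 400 (the same numeric scale as chess Elo). Note that chess Elo uses base 10 rather than $e$, so with $c = 400$ this curve is flatter: a 400-point gap gives $E_{ij} \approx 0.73$ here versus $\approx 0.91$ under the chess formula.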

**Tuning**: Adjust based on calibration (see evaluation reports)

### K-Factor

Controls update magnitude.

- **Smaller** $K$: Ratings move slowly; more stable once established, but slower to converge and less responsive to new data
- **Larger** $K$: Ratings change rapidly; faster convergence, but more volatile

**Default**: 24.0

**Recommendations**:
- Increase $K$ if ratings don't converge in 20-30 races
- Decrease $K$ if ratings are too noisy

### Driver/Trainer Weights

**Current defaults**:
- $\alpha = 0.35$ (driver)
- $\beta = 0.15$ (trainer)

**Tuning approach**:
1. Disable adjustments: `ENABLE_ADJUSTMENTS=false`
2. Disable trainer: `ENABLE_TRAINER=false`
3. Vary $\alpha$ from 0.2 to 0.5
4. Measure winner accuracy and calibration
5. Re-enable trainer, vary $\beta$ from 0.1 to 0.3
6. Select values that maximize predictive accuracy

### Adjustment Learning Rate

Controls how quickly condition adjustments adapt.

**Default**: $\eta = 0.5$

**Trade-off**:
- High $\eta$: Fast adaptation, but noisy
- Low $\eta$: Stable, but slow to learn

---

## Example Calculation

### Scenario
Race with 3 horses, equal ratings (1500), horse A wins:

**Initial ratings**:
- $R_A = 1500$
- $R_B = 1500$
- $R_C = 1500$

**Expected outcomes** (all equal):
- $E_{AB} = E_{AC} = 0.5$ (A vs others)
- $E_{BA} = E_{BC} = 0.5$ (B vs others)
- $E_{CA} = E_{CB} = 0.5$ (C vs others)

**Actual outcomes**:
- A wins (placing=1): $S_{AB} = 1$, $S_{AC} = 1$
- B second (placing=2): $S_{BA} = 0$, $S_{BC} = 1$
- C third (placing=3): $S_{CA} = 0$, $S_{CB} = 0$

**Updates** (with $K = 24$, $n = 3$):

$$
\Delta R_A = 24 \cdot \frac{1}{2} \cdot [(1 - 0.5) + (1 - 0.5)] = 24 \cdot 0.5 \cdot 1.0 = +12
$$

$$
\Delta R_B = 24 \cdot \frac{1}{2} \cdot [(0 - 0.5) + (1 - 0.5)] = 24 \cdot 0.5 \cdot 0.0 = 0
$$

$$
\Delta R_C = 24 \cdot \frac{1}{2} \cdot [(0 - 0.5) + (0 - 0.5)] = 24 \cdot 0.5 \cdot (-1.0) = -12
$$

**New ratings**:
- $R_A = 1512$
- $R_B = 1500$ (no change)
- $R_C = 1488$

**Sum**: $+12 + 0 - 12 = 0$ ✓ (zero-sum)

---

## Convergence and Stability

### Typical Convergence
- **10 races**: Rating stabilizes to within ±50 points
- **30 races**: Rating reliable to within ±20 points
- **100+ races**: Very stable rating

### Preventing Drift
- Zero-sum updates ensure no inflation
- Periodic validation: check global average rating stays near 1500

### Handling Outliers
- DNFs excluded from calculations (no rating penalty)
- Very large upsets are bounded: each pairwise error satisfies $|S_{ij} - E_{ij}| < 1$, so a single race can never move a rating by more than $K$

---

## Evaluation Metrics

See `scripts/evaluate.py` for implementation.

### Winner Accuracy
Fraction of races where top-rated horse wins:

$$
\text{Accuracy} = \frac{\text{races where } \arg\max(R_i) \text{ won}}{\text{total races}}
$$

**Target**: 35-45% (harness racing has high variance)
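A minimal sketch of this metric, assuming each race is given as a list of `(rating, placing)` tuples. This input shape is an assumption for illustration and is not necessarily what `scripts/evaluate.py` uses.

```python
def winner_accuracy(races):
    """Fraction of races in which the top-rated starter finished first.

    races -- iterable of races, each a list of (rating, placing) tuples;
             placing may be None for starters without a placing.
    """
    hits = 0
    total = 0
    for starters in races:
        if not starters:
            continue
        top_rated = max(starters, key=lambda s: s[0])
        total += 1
        if top_rated[1] == 1:
            hits += 1
    return hits / total if total else 0.0


races = [
    [(1600, 1), (1500, 2)],             # favourite wins
    [(1600, 3), (1500, 1), (1400, 2)],  # favourite beaten
]
print(winner_accuracy(races))  # 0.5
```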

### Top-3 Hit Rate
Average, over all races, of the share of the three top-rated horses that finish in the actual top 3. For a single race:

$$
\text{Hit Rate} = \frac{|\text{predicted top 3} \cap \text{actual top 3}|}{3}
$$

**Target**: 60-75%
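The overlap calculation can be sketched as follows, averaging the per-race value over all races. The `(starter_id, rating, placing)` tuple layout is assumed for illustration.

```python
def top3_hit_rate(races):
    """Mean per-race overlap between the three top-rated starters and the actual top 3.

    races -- list of races, each a list of (starter_id, rating, placing) tuples.
    """
    per_race = []
    for starters in races:
        by_rating = sorted(starters, key=lambda s: s[1], reverse=True)
        predicted = {s[0] for s in by_rating[:3]}
        actual = {s[0] for s in starters if s[2] is not None and s[2] <= 3}
        per_race.append(len(predicted & actual) / 3)
    return sum(per_race) / len(per_race) if per_race else 0.0


race = [("A", 1600, 1), ("B", 1550, 4), ("C", 1500, 2), ("D", 1450, 3)]
print(round(top3_hit_rate([race]), 3))  # 0.667
```

Here the predicted top 3 is `{A, B, C}` and the actual top 3 is `{A, C, D}`, so two of three predictions hit.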

### Calibration
Group predictions by probability bins, compare predicted vs actual win rates:

| Predicted Win % | Actual Win % | Error |
|-----------------|--------------|-------|
| 5-10%           | 7.2%         | 0.02  |
| 10-15%          | 12.8%        | 0.02  |
| 40-50%          | 46.3%        | 0.04  |

**Target**: Mean absolute error < 0.05
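One way to compute such a table is sketched below. The equal-width bins, the function name, and the unweighted mean over non-empty bins are all assumptions; the evaluation script may bin or weight differently.

```python
from collections import defaultdict


def calibration_report(predictions, n_bins=20):
    """Bin predictions and compare mean predicted probability with actual win rate.

    predictions -- iterable of (predicted_win_probability, won) pairs.
    Returns (rows, mae), where each row is (bin_start, mean_predicted, actual_rate).
    """
    bins = defaultdict(lambda: [0.0, 0, 0])  # [sum of predictions, wins, count]
    for p, won in predictions:
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b][0] += p
        bins[b][1] += int(won)
        bins[b][2] += 1

    rows = []
    for b in sorted(bins):
        pred_sum, wins, count = bins[b]
        rows.append((b / n_bins, pred_sum / count, wins / count))

    # Unweighted mean over non-empty bins (assumption; weighting by count is also common).
    mae = sum(abs(mean_p - rate) for _, mean_p, rate in rows) / len(rows) if rows else 0.0
    return rows, mae


sample = [(0.12, True), (0.12, False), (0.12, False), (0.12, False),
          (0.42, True), (0.42, False)]
rows, mae = calibration_report(sample)
print(round(mae, 3))  # 0.105
```

In this toy sample the 0.12 bin wins 25% of the time (error 0.13) and the 0.42 bin wins 50% (error 0.08), giving a mean absolute error of 0.105.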

---

## Known Limitations

1. **No time decay**: Ratings don't account for horse aging or injury
2. **Track conditions**: Weather and track state not modeled
3. **Race strategy**: No modeling of pace, positioning tactics
4. **Small sample**: New horses/drivers volatile until 20+ races
5. **Form cycles**: No recency weighting (all races equal)

## Future Improvements

1. **Time-weighted updates**: Recent races count more
2. **Track condition adjustments**: Wet/dry, heavy/fast
3. **Class/grade modeling**: Separate ratings by race class
4. **Non-parametric methods**: ML on top of Elo features
5. **Multi-objective**: Optimize for both accuracy and calibration