# HarnessElo Data Model

## Overview

The HarnessElo database schema is designed for:
- **Traceability**: Store raw HRNZ JSON for audit trails
- **Normalization**: Separate dimensions (horses, drivers, trainers)
- **Determinism**: Support reproducible rating computation
- **Performance**: Indexes on query patterns

## Entity-Relationship Diagram

```
┌─────────────┐
│  meetings   │
│ (PK: id)    │
└──────┬──────┘
       │ 1:N
       │
       ▼
┌─────────────┐
│   races     │
│ (PK: id)    │
│ FK: meeting │
└──────┬──────┘
       │ 1:N
       │
       ▼
┌─────────────┐        ┌──────────────┐
│  starters   │────────│    horses    │
│ (PK: id)    │  N:1   │  (PK: id)    │
│ FK: race    │        └──────────────┘
│ FK: horse   │
│ FK: driver  │        ┌──────────────┐
│ FK: trainer │────────│   drivers    │
└─────────────┘  N:1   │  (PK: id)    │
       │               └──────────────┘
       │
       │               ┌──────────────┐
       └───────────────│  trainers    │
                 N:1   │  (PK: id)    │
                       └──────────────┘

┌──────────────────────┐
│  rating_snapshots    │
│  (PK: id)            │
│  FK: as_of_race_id   │
│  entity_type + id    │
└──────────────────────┘

┌──────────────────────┐
│ barrier_adjustments  │
│ (PK: id)             │
└──────────────────────┘

┌──────────────────────┐
│ handicap_adjustments │
│ (PK: id)             │
└──────────────────────┘
```

## Tables

### meetings

Harness racing meetings/meets.

| Column        | Type      | Constraints           | Description                    |
|---------------|-----------|----------------------|--------------------------------|
| id            | INTEGER   | PRIMARY KEY          | HRNZ meeting ID                |
| meeting_date  | DATE      | NOT NULL, INDEXED    | Date of meeting                |
| venue         | VARCHAR   | NOT NULL             | Track/venue name               |
| raw_json      | JSONB     | NOT NULL             | Original HRNZ response         |
| created_at    | TIMESTAMP | NOT NULL, DEFAULT NOW| Record creation time           |
| updated_at    | TIMESTAMP | NOT NULL, DEFAULT NOW| Last update time               |

**Indexes:**
- `ix_meetings_meeting_date` on `meeting_date`

**Notes:**
- `id` matches HRNZ `meetingId` (external ID, not auto-increment)
- `raw_json` stores complete API response for traceability

---

### races

Individual races within meetings.

| Column        | Type      | Constraints                    | Description                    |
|---------------|-----------|-------------------------------|--------------------------------|
| id            | INTEGER   | PRIMARY KEY, AUTO INCREMENT   | Internal race ID               |
| meeting_id    | INTEGER   | FK meetings.id, INDEXED       | Parent meeting                 |
| race_number   | INTEGER   | NOT NULL                      | Race number (1, 2, 3, ...)     |
| distance_m    | INTEGER   | NULL                          | Distance in meters             |
| start_type    | VARCHAR   | NULL                          | "mobile" or "standing"         |
| gait          | VARCHAR   | NULL                          | "pace" or "trot"               |
| race_datetime | TIMESTAMP | NULL, INDEXED                 | Scheduled start time           |
| raw_json      | JSONB     | NOT NULL                      | Original HRNZ response         |
| created_at    | TIMESTAMP | NOT NULL, DEFAULT NOW         | Record creation time           |
| updated_at    | TIMESTAMP | NOT NULL, DEFAULT NOW         | Last update time               |

**Indexes:**
- `ix_races_meeting_id` on `meeting_id`
- `ix_races_race_datetime` on `race_datetime`

**Constraints:**
- `uq_meeting_race` UNIQUE(`meeting_id`, `race_number`)

**Notes:**
- `race_datetime` used for chronological ordering in recompute
- Optional fields (`distance_m`, `start_type`, `gait`) may be NULL if not available

---

### horses

Horse dimension table.

| Column     | Type      | Constraints           | Description                    |
|------------|-----------|----------------------|--------------------------------|
| id         | INTEGER   | PRIMARY KEY          | HRNZ horse ID                  |
| name       | VARCHAR   | NOT NULL, INDEXED    | Horse name                     |
| raw_json   | JSONB     | NULL                 | Additional metadata            |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Record creation time           |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Last update time               |

**Indexes:**
- `ix_horses_name` on `name`

**Notes:**
- `id` matches HRNZ `horseId`
- `raw_json` can store additional attributes (breed, color, etc.)

---

### drivers

Driver dimension table.

| Column     | Type      | Constraints           | Description                    |
|------------|-----------|----------------------|--------------------------------|
| id         | INTEGER   | PRIMARY KEY          | HRNZ driver ID                 |
| name       | VARCHAR   | NOT NULL, INDEXED    | Driver name                    |
| raw_json   | JSONB     | NULL                 | Additional metadata            |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Record creation time           |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Last update time               |

**Indexes:**
- `ix_drivers_name` on `name`

**Notes:**
- `id` matches HRNZ `driverId`

---

### trainers

Trainer dimension table.

| Column     | Type      | Constraints           | Description                    |
|------------|-----------|----------------------|--------------------------------|
| id         | INTEGER   | PRIMARY KEY          | HRNZ trainer ID                |
| name       | VARCHAR   | NOT NULL, INDEXED    | Trainer name                   |
| raw_json   | JSONB     | NULL                 | Additional metadata            |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Record creation time           |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW| Last update time               |

**Indexes:**
- `ix_trainers_name` on `name`

**Notes:**
- `id` matches HRNZ `trainerId`

---

### starters

Runners/starters in races.

| Column          | Type      | Constraints                    | Description                    |
|-----------------|-----------|-------------------------------|--------------------------------|
| id              | INTEGER   | PRIMARY KEY, AUTO INCREMENT   | Internal starter ID            |
| race_id         | INTEGER   | FK races.id, INDEXED          | Parent race                    |
| horse_id        | INTEGER   | FK horses.id, INDEXED, NULL   | Horse running                  |
| driver_id       | INTEGER   | FK drivers.id, INDEXED, NULL  | Driver                         |
| trainer_id      | INTEGER   | FK trainers.id, INDEXED, NULL | Trainer                        |
| barrier         | INTEGER   | NULL                          | Starting barrier/gate number   |
| handicap_m      | INTEGER   | NULL                          | Handicap in meters             |
| placing         | INTEGER   | NULL                          | Final placing (1=winner)       |
| did_not_finish  | BOOLEAN   | DEFAULT FALSE                 | DNF flag                       |
| raw_json        | JSONB     | NOT NULL                      | Original runner data           |
| created_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Record creation time           |
| updated_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Last update time               |

**Indexes:**
- `ix_starters_race_id` on `race_id`
- `ix_starters_horse_id` on `horse_id`
- `ix_starters_driver_id` on `driver_id`
- `ix_starters_trainer_id` on `trainer_id`

**Foreign Key Behavior:**
- `race_id`: CASCADE on delete (remove starters when race deleted)
- `horse_id`, `driver_id`, `trainer_id`: SET NULL on delete (preserve race data)

**Notes:**
- `placing` is NULL if race has no results yet or horse DNF
- `did_not_finish` indicates pulled up, disqualified, etc.
- `handicap_m` is back mark distance (positive = further back)

---

### rating_snapshots

Rating snapshots after each race.

| Column          | Type      | Constraints                    | Description                    |
|-----------------|-----------|-------------------------------|--------------------------------|
| id              | INTEGER   | PRIMARY KEY, AUTO INCREMENT   | Internal snapshot ID           |
| entity_type     | ENUM      | NOT NULL, INDEXED             | "horse", "driver", "trainer"   |
| entity_id       | INTEGER   | NOT NULL, INDEXED             | ID in respective table         |
| as_of_race_id   | INTEGER   | FK races.id, INDEXED          | Rating after this race         |
| rating          | FLOAT     | NOT NULL                      | Elo rating                     |
| rd              | FLOAT     | NULL                          | Rating deviation (uncertainty) |
| meta            | JSONB     | NULL                          | Metadata (race_count, etc.)    |
| created_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Record creation time           |

**Indexes:**
- `ix_rating_snapshots_entity_type` on `entity_type`
- `ix_rating_snapshots_entity_id` on `entity_id`
- `ix_rating_snapshots_as_of_race_id` on `as_of_race_id`

**Constraints:**
- `uq_entity_race_snapshot` UNIQUE(`entity_type`, `entity_id`, `as_of_race_id`)

**Foreign Key Behavior:**
- `as_of_race_id`: CASCADE on delete (remove snapshots when race deleted)

**Notes:**
- Stores rating **after** processing the race
- `meta` can store:
  - `race_count`: Number of races entity has participated in
  - `delta`: Rating change from this race
  - `components`: Breakdown of effective rating
- To get rating **before** a race, query for `as_of_race_id < target_race_id` ordered by DESC

---

### barrier_adjustments

Learned adjustments for barrier positions.

| Column          | Type      | Constraints                    | Description                    |
|-----------------|-----------|-------------------------------|--------------------------------|
| id              | INTEGER   | PRIMARY KEY, AUTO INCREMENT   | Internal ID                    |
| venue           | VARCHAR   | NULL                          | Specific venue or NULL=global  |
| start_type      | VARCHAR   | NULL                          | "mobile"/"standing" or NULL    |
| distance_bucket | VARCHAR   | NOT NULL                      | e.g., "<1700", "1700-2000"     |
| barrier         | INTEGER   | NOT NULL                      | Barrier number                 |
| adjustment      | FLOAT     | NOT NULL, DEFAULT 0.0         | Rating points adjustment       |
| sample_count    | INTEGER   | NOT NULL, DEFAULT 0           | Number of observations         |
| created_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Record creation time           |
| updated_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Last update time               |

**Constraints:**
- `uq_barrier_adjustment` UNIQUE(`venue`, `start_type`, `distance_bucket`, `barrier`)

**Notes:**
- `venue = NULL` means global adjustment (all venues)
- `start_type = NULL` means applies to any start type
- Adjustments learned via incremental updates (similar to Elo K)
- Positive adjustment = advantage, negative = disadvantage

---

### handicap_adjustments

Learned adjustments for handicaps (back marks).

| Column          | Type      | Constraints                    | Description                    |
|-----------------|-----------|-------------------------------|--------------------------------|
| id              | INTEGER   | PRIMARY KEY, AUTO INCREMENT   | Internal ID                    |
| venue           | VARCHAR   | NULL                          | Specific venue or NULL=global  |
| start_type      | VARCHAR   | NULL                          | "mobile"/"standing" or NULL    |
| distance_bucket | VARCHAR   | NOT NULL                      | e.g., "<1700", "1700-2000"     |
| handicap_m      | INTEGER   | NOT NULL                      | Handicap in meters             |
| adjustment      | FLOAT     | NOT NULL, DEFAULT 0.0         | Rating points adjustment       |
| sample_count    | INTEGER   | NOT NULL, DEFAULT 0           | Number of observations         |
| created_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Record creation time           |
| updated_at      | TIMESTAMP | NOT NULL, DEFAULT NOW         | Last update time               |

**Constraints:**
- `uq_handicap_adjustment` UNIQUE(`venue`, `start_type`, `distance_bucket`, `handicap_m`)

**Notes:**
- Handicaps are typically in 10m increments (0, 10, 20, ...)
- Larger handicaps (further back) should have negative adjustments

---

## Query Patterns

### Get Latest Rating for Entity
```sql
SELECT *
FROM rating_snapshots
WHERE entity_type = 'horse'
  AND entity_id = 12345
ORDER BY as_of_race_id DESC
LIMIT 1;
```

### Get Top 100 Horses by Rating
```sql
WITH latest_ratings AS (
  SELECT DISTINCT ON (entity_id)
    entity_id,
    rating,
    as_of_race_id
  FROM rating_snapshots
  WHERE entity_type = 'horse'
  ORDER BY entity_id, as_of_race_id DESC
)
SELECT
  h.name,
  lr.rating
FROM latest_ratings lr
JOIN horses h ON h.id = lr.entity_id
ORDER BY lr.rating DESC
LIMIT 100;
```

### Get All Races in Date Range (Chronological)
```sql
SELECT
  r.*,
  m.meeting_date,
  m.venue
FROM races r
JOIN meetings m ON m.id = r.meeting_id
WHERE m.meeting_date BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY m.meeting_date, r.race_number;
```

### Get Starters with Full Details
```sql
SELECT
  s.*,
  h.name AS horse_name,
  d.name AS driver_name,
  t.name AS trainer_name
FROM starters s
LEFT JOIN horses h ON h.id = s.horse_id
LEFT JOIN drivers d ON d.id = s.driver_id
LEFT JOIN trainers t ON t.id = s.trainer_id
WHERE s.race_id = 12345;
```

---

## Migration Management

### Alembic
- Migrations in `alembic/versions/`
- Auto-generate: `alembic revision --autogenerate -m "description"`
- Apply: `alembic upgrade head`
- Rollback: `alembic downgrade -1`

### Initial Schema
- Version: `001_initial`
- Creates all tables with indexes and constraints
- Includes ENUM type for `entity_type`

---

## Data Retention

### Current Policy
- Keep all data indefinitely (for deterministic recompute)
- No automatic cleanup

### Future Considerations
- Archive old meetings/races to separate storage
- Keep rating snapshots compressed
- Implement soft deletes for dimension tables

---

## Backup and Recovery

### Recommended Strategy
- Daily full backups
- Continuous WAL archiving for point-in-time recovery
- Test restore procedures regularly

### Critical Tables
1. `meetings`, `races`, `starters` (source data)
2. `horses`, `drivers`, `trainers` (dimensions)
3. `rating_snapshots` (can be recomputed but expensive)
