# Database Schema Documentation

## Entity Relationship Diagram

```mermaid
erDiagram
    MEETINGS ||--o{ RACES : contains
    RACES ||--o{ RUNNERS : has
    RACES ||--o{ SCRAPES : tracked_by
    RACES ||--o{ RESULTS : produces
    RACES ||--o{ DIVIDENDS : pays
    HORSES ||--o{ RUNNERS : participates_as
    RUNNERS ||--o{ RESULTS : finishes_in
    RUNNERS ||--o{ ODDS_SNAPSHOTS : has
    JOB_RUNS ||--o{ SCRAPES : executes
    JOB_RUNS ||--o{ JOB_RUNS : retries

    MEETINGS {
        uuid id PK
        string name
        date date
        string country
        string state
        string category
        string categoryName
        string trackCondition
        string weather
        json videoChannels
        int[] quaddie
        int[] earlyQuaddie
        int toteMeetingNumber
        string toteStatus
        json metadata
        timestamptz createdAt
        timestamptz updatedAt
    }

    RACES {
        uuid id PK
        uuid meetingId FK
        int raceNumber
        string name
        timestamptz startTime
        time toteStartTime
        int distance
        string trackCondition
        string weather
        string status
        string country
        string state
        json metadata
        timestamptz createdAt
        timestamptz updatedAt
    }

    HORSES {
        uuid id PK
        uuid tabEntrantId "TAB primary ID"
        int tabHorseId "TAB secondary ID"
        bigint harnessNzHorseId "Harness NZ ID"
        string name "Original name"
        string normalizedName "For matching"
        string sire
        string dam
        string damSire
        string sex
        string colour
        string country
        timestamptz firstSeen
        timestamptz lastSeen
        int raceCount "Unique races"
    }

    RUNNERS {
        uuid id PK
        uuid raceId FK
        int runnerNumber
        uuid entrantId "Legacy TAB ID"
        string horseName "Display name"
        uuid tabHorseIdLegacy "Legacy field"
        uuid horseId FK "Link to horses"
        int barrier
        decimal weight
        string jockeyName
        uuid jockeyId
        decimal jockeyWeightAllowance
        string trainerName
        uuid trainerId
        string trainerLocation
        string form
        json lastStarts
        string lastTwentyStarts
        int age
        string silkColours
        boolean scratched
        timestamptz scratchedAt
        decimal openingOdds
        decimal currentOdds
        json metadata
        timestamptz createdAt
        timestamptz updatedAt
    }

    RESULTS {
        uuid id PK
        uuid raceId FK
        uuid runnerId FK
        int finishPosition
        decimal margin
        decimal raceTime
        decimal winDividend
        decimal placeDividend
        boolean official
        boolean disqualified
        json metadata
        timestamptz createdAt
    }

    DIVIDENDS {
        uuid id PK
        uuid raceId FK
        string tote "VIC, NSW, SUP"
        string productName "Win, Place, etc"
        string status "final, interim"
        decimal dividend
        decimal poolSize
        decimal jackpotSize
        json positions "runner positions"
        string description
        timestamptz createdAt
    }

    ODDS_SNAPSHOTS {
        uuid id PK
        uuid runnerId FK
        string snapshotType "morning, t60, t15, final"
        decimal fixedWin
        decimal fixedPlace
        decimal poolWin
        decimal poolPlace
        timestamptz capturedAt
    }

    SCRAPES {
        uuid id PK
        uuid raceId FK
        string scrapeType
        timestamptz scrapedAt
        json changesDetected
        boolean success
        string errorMessage
        uuid jobRunId FK
    }

    JOB_RUNS {
        uuid id PK
        string jobType
        string status
        timestamptz startedAt
        timestamptz completedAt
        int durationMs
        int itemsProcessed
        int itemsFailed
        string errorMessage
        json metadata
        int retryCount
        uuid parentJobRunId FK
    }
```

## Table Descriptions

### meetings
**Purpose:** Represents a race meeting at a specific venue on a specific date.

**Key Characteristics:**
- One meeting = one venue + one date
- A meeting contains multiple races (typically 8-12)
- Tracks aggregate information like quaddie race numbers

**Example Record:**
```json
{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "name": "Ellerslie",
  "date": "2026-01-16",
  "country": "NZ",
  "state": null,
  "category": "T",
  "categoryName": "Thoroughbred",
  "trackCondition": "Good 3",
  "quaddie": [6, 7, 8, 9]
}
```

**Indexes:**
- `(date, country, category)` - Efficient date-based queries

### races
**Purpose:** Individual race within a meeting.

**Key Characteristics:**
- Unique per meeting + race number combination
- Status progresses: Open → Interim → Final
- Links to runners, results, and dividends

**Example Record:**
```json
{
  "id": "a7b8c9d0-1234-5678-90ab-cdef12345678",
  "meetingId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "raceNumber": 7,
  "name": "Cambridge Stud Zabeel Classic",
  "startTime": "2026-01-16T15:35:00+13:00",
  "distance": 2000,
  "status": "Final"
}
```

**Indexes:**
- `meetingId` - Find all races in a meeting
- `startTime` - Time-based queries (upcoming races)
- `status` - Filter by race state

**Unique Constraints:**
- `(meetingId, raceNumber)` - One race number per meeting

### horses
**Purpose:** Master table for horse records. Single source of truth for each horse.

**Key Characteristics:**
- Normalized storage (no duplication across races)
- Supports multiple external ID systems
- Tracks career statistics (race count)
- Name stored in two forms: original + normalized

**Example Record:**
```json
{
  "id": "17f92c27-c7ea-4e4a-9626-fecdfd4640ba",
  "tabEntrantId": "e3aa798c-2b5f-4c3d-9a1e-7f8c9d0e1f2a",
  "tabHorseId": 12345,
  "harnessNzHorseId": null,
  "name": "Hypnotize Me",
  "normalizedName": "hypnotize me",
  "sire": "Zacinto",
  "dam": "Mesmerize",
  "damSire": "Thorn Park",
  "sex": "Mare",
  "colour": "Bay",
  "country": "NZ",
  "raceCount": 2,
  "firstSeen": "2026-01-15T06:00:00+13:00",
  "lastSeen": "2026-01-16T06:00:00+13:00"
}
```

**Indexes:**
- `tabEntrantId` (unique) - TAB primary identifier lookup
- `tabHorseId` (unique) - TAB secondary identifier lookup
- `harnessNzHorseId` (unique) - Harness Racing NZ lookup
- `normalizedName` - Fuzzy name matching

**Design Notes:**
- `tabEntrantId` is most reliable for matching (UUID from TAB)
- `tabHorseId` is numeric ID also from TAB (secondary)
- `harnessNzHorseId` supports up to 64-bit integers for Harness Racing NZ imports
- `raceCount` increments only for unique races (not re-scrapes)
- `normalizedName` is lowercase with apostrophes removed for matching

### runners
**Purpose:** A horse's entry in a specific race.

**Key Characteristics:**
- Links horse to race
- Contains race-specific information (barrier, jockey, trainer)
- Tracks odds and scratched status
- Unique per race + runner number

**Example Record:**
```json
{
  "id": "c78a67a5-2e6e-4b5f-b076-1d74a4d5631b",
  "raceId": "a7b8c9d0-1234-5678-90ab-cdef12345678",
  "horseId": "17f92c27-c7ea-4e4a-9626-fecdfd4640ba",
  "runnerNumber": 5,
  "horseName": "Hypnotize Me",
  "barrier": 5,
  "jockeyName": "J. McDonald",
  "trainerName": "M. Smith",
  "scratched": false,
  "currentOdds": 4.50
}
```

**Indexes:**
- `raceId` - Find all runners in a race
- `horseId` - Find all races for a horse
- `entrantId` - Legacy TAB lookups

**Unique Constraints:**
- `(raceId, runnerNumber)` - One runner per number per race

### results
**Purpose:** Race outcomes and finish positions.

**Key Characteristics:**
- One record per runner per race (after race completion)
- Links to both race and runner
- Includes margin in lengths, race time
- Tracks official vs provisional status

**Example Record:**
```json
{
  "id": "d1e2f3a4-5678-90ab-cdef-123456789012",
  "raceId": "a7b8c9d0-1234-5678-90ab-cdef12345678",
  "runnerId": "c78a67a5-2e6e-4b5f-b076-1d74a4d5631b",
  "finishPosition": 1,
  "margin": 0.0,
  "raceTime": 121.45,
  "winDividend": 4.50,
  "official": true
}
```

**Unique Constraints:**
- `(raceId, runnerId)` - One result per runner per race

### dividends
**Purpose:** Betting dividend payouts for various bet types.

**Key Characteristics:**
- Multiple dividends per race (Win, Place, Quinella, etc.)
- Varies by tote (VIC, NSW, SUP)
- Includes pool size and jackpot information

**Example Record:**
```json
{
  "id": "e2f3a4b5-6789-01cd-ef12-345678901234",
  "raceId": "a7b8c9d0-1234-5678-90ab-cdef12345678",
  "tote": "VIC",
  "productName": "Win",
  "status": "final",
  "dividend": 4.50,
  "poolSize": 125430.00,
  "positions": [{"runner_number": 5, "position": 1}]
}
```

**Unique Constraints:**
- `(raceId, tote, productName)` - One dividend per product per tote per race

### odds_snapshots
**Purpose:** Time-series capture of odds at specific intervals.

**Key Characteristics:**
- Multiple snapshots per runner (morning, t60, t15, final)
- Tracks both fixed and pool (tote) odds
- Enables odds movement analysis

**Example Records:**
```json
[
  {
    "runnerId": "c78a67a5-2e6e-4b5f-b076-1d74a4d5631b",
    "snapshotType": "morning",
    "fixedWin": 5.00,
    "poolWin": 4.80,
    "capturedAt": "2026-01-16T06:00:00+13:00"
  },
  {
    "runnerId": "c78a67a5-2e6e-4b5f-b076-1d74a4d5631b",
    "snapshotType": "t15",
    "fixedWin": 4.50,
    "poolWin": 4.60,
    "capturedAt": "2026-01-16T15:20:00+13:00"
  }
]
```

**Unique Constraints:**
- `(runnerId, snapshotType)` - One snapshot per type per runner

**Indexes:**
- `runnerId` - Find all snapshots for a runner
- `(snapshotType, capturedAt)` - Time-series queries

### scrapes
**Purpose:** Audit trail of data collection operations.

**Key Characteristics:**
- One record per race per scrape execution
- Tracks detected changes
- Links to job run for operational context

**Example Record:**
```json
{
  "id": "f3a4b5c6-789a-bcde-f012-3456789abcde",
  "raceId": "a7b8c9d0-1234-5678-90ab-cdef12345678",
  "scrapeType": "morning_scrape",
  "scrapedAt": "2026-01-16T06:15:23+13:00",
  "success": true,
  "changesDetected": {
    "runners_added": 0,
    "scratched": 1,
    "odds_changed": 12
  }
}
```

**Indexes:**
- `(raceId, scrapeType, scrapedAt)` - Race scrape history
- `jobRunId` - Link to batch execution

### job_runs
**Purpose:** Tracks batch processing jobs (scrape runs, data operations).

**Key Characteristics:**
- Parent-child relationship for retries
- Captures execution statistics
- Enables performance monitoring

**Example Record:**
```json
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "jobType": "morning_scrape",
  "status": "completed",
  "startedAt": "2026-01-16T06:00:00+13:00",
  "completedAt": "2026-01-16T06:12:34+13:00",
  "durationMs": 754000,
  "itemsProcessed": 85,
  "itemsFailed": 0
}
```

**Indexes:**
- `(jobType, startedAt)` - Job history by type
- `status` - Active/failed job monitoring
- `startedAt` - Chronological queries

## Data Flow Through Schema

```mermaid
graph TD
    Start[TAB API Response] --> Meeting[Upsert Meeting]
    Meeting --> Race[Upsert Race]
    Race --> JobRun[Create Job Run]

    Race --> ProcessRunners{For Each Runner}
    ProcessRunners --> HorseMatch[Horse Matching]

    HorseMatch --> CheckEntrant{tabEntrantId<br/>exists?}
    CheckEntrant -->|Yes| FindHorse1[Find by tabEntrantId]
    CheckEntrant -->|No| CheckHorseId{tabHorseId<br/>exists?}

    CheckHorseId -->|Yes| FindHorse2[Find by tabHorseId]
    CheckHorseId -->|No| CheckHarnessId{harnessNzHorseId<br/>exists?}

    CheckHarnessId -->|Yes| FindHorse3[Find by harnessNzHorseId]
    CheckHarnessId -->|No| CheckBreeding{Has sire<br/>AND dam?}

    CheckBreeding -->|Yes| FindHorse4[Find by breeding]
    CheckBreeding -->|No| FindHorse5[Find by name only]

    FindHorse1 --> HorseFound{Horse<br/>found?}
    FindHorse2 --> HorseFound
    FindHorse3 --> HorseFound
    FindHorse4 --> HorseFound
    FindHorse5 --> HorseFound

    HorseFound -->|Yes| UseHorse[Use Existing Horse]
    HorseFound -->|No| CreateHorse[Create New Horse]

    UseHorse --> UpdateHorse[Update lastSeen]
    CreateHorse --> UpdateHorse
    UpdateHorse --> IncrementRace{New race<br/>for horse?}

    IncrementRace -->|Yes| IncCount[Increment raceCount]
    IncrementRace -->|No| SkipCount[Skip increment]

    IncCount --> UpsertRunner[Upsert Runner]
    SkipCount --> UpsertRunner

    UpsertRunner --> SnapshotType{Scrape<br/>type?}

    SnapshotType -->|Morning| CreateMorning[Create morning snapshot]
    SnapshotType -->|T-60| CreateT60[Create t60 snapshot]
    SnapshotType -->|T-15| CreateT15[Create t15 snapshot]
    SnapshotType -->|Results| CreateResults[Create result record]

    CreateMorning --> CreateScrape[Create Scrape Record]
    CreateT60 --> CreateScrape
    CreateT15 --> CreateScrape
    CreateResults --> CreateDividends{Dividends<br/>available?}

    CreateDividends -->|Yes| StoreDividends[Store Dividends]
    CreateDividends -->|No| CreateScrape
    StoreDividends --> CreateScrape

    CreateScrape --> Complete[Complete]

    style Start fill:#e1f5ff
    style HorseMatch fill:#fff3e0
    style CreateHorse fill:#c8e6c9
    style UseHorse fill:#c8e6c9
    style Complete fill:#f1f8e9
```

## Relationship Cardinalities

```mermaid
graph LR
    Meeting["MEETING<br/>(1)"]
    Race["RACE<br/>(many)"]
    Horse["HORSE<br/>(1)"]
    Runner["RUNNER<br/>(many)"]
    Result["RESULT<br/>(1)"]
    Odds["ODDS_SNAPSHOT<br/>(many)"]
    Dividend["DIVIDEND<br/>(many)"]
    Scrape["SCRAPE<br/>(many)"]

    Meeting -->|"1:N"| Race
    Horse -->|"1:N"| Runner
    Race -->|"1:N"| Runner
    Runner -->|"1:1"| Result
    Runner -->|"1:N"| Odds
    Race -->|"1:N"| Result
    Race -->|"1:N"| Dividend
    Race -->|"1:N"| Scrape

    style Meeting fill:#ffeb3b
    style Race fill:#ff9800
    style Horse fill:#4caf50
    style Runner fill:#2196f3
    style Result fill:#9c27b0
    style Odds fill:#00bcd4
    style Dividend fill:#e91e63
    style Scrape fill:#795548
```

## Query Patterns

### Find all races for a horse
```sql
SELECT
  r.raceNumber,
  m.name AS meeting,
  m.date,
  ru.runnerNumber,
  ru.barrier,
  res.finishPosition
FROM horses h
JOIN runners ru ON ru.horseId = h.id
JOIN races r ON r.id = ru.raceId
JOIN meetings m ON m.id = r.meetingId
LEFT JOIN results res ON res.runnerId = ru.id
WHERE h.name = 'Hypnotize Me'
ORDER BY m.date DESC, r.startTime DESC;
```

### Find horses racing multiple times this weekend
```sql
SELECT
  h.name,
  h.raceCount,
  COUNT(ru.id) AS this_weekend
FROM horses h
JOIN runners ru ON ru.horseId = h.id
JOIN races r ON r.id = ru.raceId
JOIN meetings m ON m.id = r.meetingId
WHERE m.date BETWEEN '2026-01-17' AND '2026-01-19'
GROUP BY h.id, h.name, h.raceCount
HAVING COUNT(ru.id) > 1
ORDER BY this_weekend DESC;
```

### Analyze odds movements for a race
```sql
SELECT
  ru.runnerNumber,
  ru.horseName,
  os.snapshotType,
  os.fixedWin,
  os.poolWin,
  os.capturedAt
FROM races r
JOIN runners ru ON ru.raceId = r.id
JOIN odds_snapshots os ON os.runnerId = ru.id
WHERE r.id = 'a7b8c9d0-1234-5678-90ab-cdef12345678'
ORDER BY ru.runnerNumber, os.capturedAt;
```

### Check scrape coverage for today
```sql
SELECT
  m.name,
  r.raceNumber,
  r.startTime,
  STRING_AGG(DISTINCT s.scrapeType, ', ') AS scrapes_completed
FROM meetings m
JOIN races r ON r.meetingId = m.id
LEFT JOIN scrapes s ON s.raceId = r.id
WHERE m.date = CURRENT_DATE
GROUP BY m.id, m.name, r.id, r.raceNumber, r.startTime
ORDER BY r.startTime;
```
