# Racing Data Pipeline Documentation

Complete documentation for the TAB horse racing data ingestion system.

## 📚 Documentation Index

### 1. [Data Pipeline Overview](./data-pipeline-overview.md)
**Start here** for a high-level understanding of the entire system.

**Contents:**
- System architecture diagrams
- Data flow sequencing
- Component descriptions (Scheduler, API Client, Services, Database)
- Data types processed (Meetings, Races, Horses, Runners, Results)
- Deployment architecture
- Key design decisions
- Monitoring and observability
- Performance characteristics

**Best for:** New team members, stakeholders, system architects

---

### 2. [Database Schema](./database-schema.md)
Comprehensive reference for all database tables and relationships.

**Contents:**
- Entity Relationship Diagram (ERD)
- Complete table descriptions with examples
- Field explanations and data types
- Index strategy
- Unique constraints
- Relationship cardinalities
- Common query patterns

**Best for:** Database administrators, developers writing queries, data analysts

---

### 3. [Horse Matching Algorithm](./horse-matching-algorithm.md)
Deep dive into the 6-priority horse matching and deduplication strategy.

**Contents:**
- Problem statement (why matching is needed)
- 6-priority matching flow diagrams
- Detailed explanation of each priority level
- Name normalization algorithm
- Race count tracking logic
- Edge case handling
- Performance considerations
- Testing strategy

**Best for:** Developers working on data ingestion, quality assurance engineers

---

## 🎯 Quick Navigation by Role

### For Developers
1. Start: [Data Pipeline Overview](./data-pipeline-overview.md) - Understand the system
2. Deep dive: [Horse Matching Algorithm](./horse-matching-algorithm.md) - Learn matching logic
3. Reference: [Database Schema](./database-schema.md) - Query patterns and tables

### For Database Administrators
1. Start: [Database Schema](./database-schema.md) - Tables and relationships
2. Context: [Data Pipeline Overview](./data-pipeline-overview.md) - How data flows

### For Operations Engineers
1. Start: [Data Pipeline Overview](./data-pipeline-overview.md) - System components
2. Reference: [Database Schema](./database-schema.md) - Data verification queries

### For Data Analysts
1. Start: [Database Schema](./database-schema.md) - Available data and examples
2. Context: [Data Pipeline Overview](./data-pipeline-overview.md) - Data collection schedule
3. Reference: [Horse Matching Algorithm](./horse-matching-algorithm.md) - Understanding horse deduplication

---

## 🔍 Find Information Quickly

### "How do I...?"

**Query all races for a specific horse?**
→ [Database Schema](./database-schema.md#query-patterns) (scroll to "Find all races for a horse")

**Understand why a horse wasn't matched correctly?**
→ [Horse Matching Algorithm](./horse-matching-algorithm.md#priority-descriptions)

**See what data is collected at each scrape?**
→ [Data Pipeline Overview](./data-pipeline-overview.md#data-flow-overview)

**Understand the database relationships?**
→ [Database Schema](./database-schema.md#entity-relationship-diagram)

---

## 📊 Key Diagrams

### System Architecture
See: [Data Pipeline Overview § Architecture Diagram](./data-pipeline-overview.md#architecture-diagram)

```mermaid
graph TB
    TAB[TAB API] -->|HTTP/JSON| RaceService[Race Service]
    RaceService -->|Matches| HorseMatching[Horse Matching]
    HorseMatching -->|Creates/Links| PostgreSQL[(Database)]
```

### Horse Matching Flow
See: [Horse Matching Algorithm § 6-Priority Strategy](./horse-matching-algorithm.md#solution-6-priority-matching-strategy)

```mermaid
flowchart TD
    Start([New Runner]) --> P1{TAB Entrant ID?}
    P1 -->|Match| Use1[Use Horse]
    P1 -->|No Match| P2{TAB Horse ID?}
    P2 -->|Match| Use2[Use Horse]
    P2 -->|No Match| P3{Harness NZ ID?}
    P3 -->|Match| Use3[Use Horse]
    P3 -->|No Match| P4{Breeding Match?}
    P4 -->|Match| Use4[Use Horse]
    P4 -->|No Match| P5{Name Match?}
    P5 -->|Match| Use5[Use Horse + Warn]
    P5 -->|No Match| Create[Create New Horse]
```
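
The cascade in the diagram can be sketched in TypeScript as an ordered series of lookups. This is a minimal illustration, not the project's implementation: `HorseRecord`, `RunnerInput`, `matchHorse`, and every field name are hypothetical, the in-memory array stands in for the database, and the definition of a "breeding match" (name, sire, and dam all agree) is an assumption made for the example.

```typescript
// Hypothetical record shapes; field names are illustrative, not the project's schema.
interface HorseRecord {
  id: number;
  name: string;
  tabEntrantId?: string;
  tabHorseId?: string;
  harnessNzId?: string;
  sire?: string;
  dam?: string;
}

type RunnerInput = Omit<HorseRecord, "id">;

type MatchedBy =
  | "tabEntrantId" | "tabHorseId" | "harnessNzId"
  | "breeding" | "name" | "created";

interface MatchResult {
  horse: HorseRecord;
  matchedBy: MatchedBy;
  warning?: string;
}

// Case-insensitive comparison that treats a missing value as "no match".
const sameText = (a?: string, b?: string): boolean =>
  a !== undefined && b !== undefined && a.toLowerCase() === b.toLowerCase();

function matchHorse(runner: RunnerInput, horses: HorseRecord[]): MatchResult {
  // Priorities 1-3: exact external ID matches, checked in diagram order.
  const idKeys = ["tabEntrantId", "tabHorseId", "harnessNzId"] as const;
  for (const key of idKeys) {
    const value = runner[key];
    if (value === undefined) continue;
    const hit = horses.find((h) => h[key] === value);
    if (hit) return { horse: hit, matchedBy: key };
  }

  // Priority 4 (assumed definition): name, sire, and dam all agree.
  const bred = horses.find(
    (h) =>
      sameText(h.name, runner.name) &&
      sameText(h.sire, runner.sire) &&
      sameText(h.dam, runner.dam),
  );
  if (bred) return { horse: bred, matchedBy: "breeding" };

  // Priority 5: a name-only match is weak evidence, so attach a warning.
  const named = horses.find((h) => sameText(h.name, runner.name));
  if (named) {
    return {
      horse: named,
      matchedBy: "name",
      warning: `Name-only match for "${runner.name}"`,
    };
  }

  // No match at any priority: create a new horse record.
  const created: HorseRecord = { id: horses.length + 1, ...runner };
  horses.push(created);
  return { horse: created, matchedBy: "created" };
}
```

Returning `matchedBy` alongside the horse makes it possible to count how often each priority fires, which is one way the coverage figures later in this README could be derived.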

### Database ERD
See: [Database Schema § Entity Relationship Diagram](./database-schema.md#entity-relationship-diagram)

```mermaid
erDiagram
    MEETINGS ||--o{ RACES : contains
    RACES ||--o{ RUNNERS : has
    HORSES ||--o{ RUNNERS : participates_as
    RUNNERS ||--o{ RESULTS : finishes_in
    RUNNERS ||--o{ ODDS_SNAPSHOTS : has
```
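
To make the relationships concrete, the sketch below walks HORSES → RUNNERS → RACES → MEETINGS in memory to list every race a horse has entered. The row shapes, column names, and the `racesForHorse` helper are hypothetical stand-ins for illustration; the real tables and queries are described in the Database Schema document.

```typescript
// Hypothetical row shapes following the ERD; column names are illustrative.
interface MeetingRow { id: number; venue: string; date: string }
interface RaceRow { id: number; meetingId: number; raceNumber: number }
interface RunnerRow { id: number; raceId: number; horseId: number }

interface RaceEntry { venue: string; date: string; raceNumber: number }

// Follow the foreign keys from a horse's runner rows to their races and meetings.
function racesForHorse(
  horseId: number,
  runners: RunnerRow[],
  races: RaceRow[],
  meetings: MeetingRow[],
): RaceEntry[] {
  const raceById = new Map(races.map((r) => [r.id, r] as const));
  const meetingById = new Map(meetings.map((m) => [m.id, m] as const));

  const entries: RaceEntry[] = [];
  for (const runner of runners) {
    if (runner.horseId !== horseId) continue;
    const race = raceById.get(runner.raceId);
    const meeting = race ? meetingById.get(race.meetingId) : undefined;
    // Skip rows whose parent race or meeting is missing.
    if (race && meeting) {
      entries.push({
        venue: meeting.venue,
        date: meeting.date,
        raceNumber: race.raceNumber,
      });
    }
  }
  return entries;
}
```

Because each runner row points at exactly one race and one horse, the lookup is a simple filter plus two key lookups; in SQL this would correspond to joins along the same foreign keys.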

---

## 📈 System Metrics

**Database:**
- Total runners: 1,795
- Unique horses: 1,765
- Deduplication rate: 1.67%
- Horses running multiple races: 10
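
The figures above are consistent with the deduplication rate being the share of runner records that resolved to an already-known horse: (1,795 − 1,765) / 1,795. The sketch below reproduces that arithmetic; the formula is inferred from the numbers rather than taken from the source code, and `deduplicationRate` is a hypothetical helper name.

```typescript
// Share of runner records that were linked to an existing horse
// instead of creating a new one (assumed definition).
function deduplicationRate(totalRunners: number, uniqueHorses: number): number {
  if (totalRunners <= 0) return 0;
  return (totalRunners - uniqueHorses) / totalRunners;
}

const rate = deduplicationRate(1795, 1765);
console.log(`${(rate * 100).toFixed(2)}%`); // 30 / 1795 → "1.67%"
```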

**Performance:**
- Morning scrape: 8-12 minutes (~85 races)
- T-60/T-15 scrapes: 5-8 minutes
- Database growth: ~15 MB/day

**Coverage:**
- TAB Entrant ID: 100%
- TAB Horse ID: 0% (not provided by current API)
- Harness NZ ID: 0% (ready for future imports)

---

## 🚀 Getting Started

### Prerequisites
- Docker and Docker Compose installed
- PostgreSQL 16 (included in docker-compose)
- Node.js 18+ (for local development)
- Network access to the TAB API (no API key is required for public endpoints)

### Quick Start
```bash
# Clone repository
git clone <repository-url>
cd tab-api-ingest

# Start services
docker compose up -d

# Check status
docker logs racing-app --tail 20

# Access dashboards (open is macOS-only; use xdg-open on Linux)
open http://localhost:3000    # Grafana (admin/admin)
open http://localhost:9091    # Prometheus
open http://localhost:16686   # Jaeger (tracing)
```

### Verify Installation
```bash
# Run tests
npm test

# Check API health
curl http://localhost:9090/health
```

---

## 🛠️ Maintenance Commands

### Database Commands
```bash
# Generate Prisma client after schema changes
npm run prisma:generate

# Create new migration
npm run prisma:migrate

# Open Prisma Studio (database GUI)
npm run prisma:studio
```

### Development Commands
```bash
# Run in development mode (with hot reload)
npm run dev

# Run tests
npm test

# Run specific test file
npm test -- horse-matching

# Build for production, then start the app
npm run build
npm start
```

---

## 🐛 Troubleshooting

### Common Issues

**1. Application won't start**
```bash
# Check if database is ready
docker logs racing-postgres

# Regenerate Prisma client
docker exec racing-app npx prisma generate

# Restart application
docker compose restart app
```

**2. Missing horses.sex column error**
```bash
# The generated Prisma client is out of date: regenerate it, then restart
docker exec racing-app npx prisma generate
docker compose restart app
```

### Health Checks

```bash
# Application health
curl http://localhost:9090/health

# Database health
docker exec racing-postgres pg_isready

# Redis health
docker exec racing-redis redis-cli ping

# Check logs
docker logs racing-app --tail 100 -f
```

---

## 📞 Support

### Resources
- **Source Code:** [GitHub Repository]
- **Issue Tracker:** [GitHub Issues]
- **API Documentation:** https://api.tab.co.nz/docs

### Team Contacts
- **System Owner:** Tom
- **Database Admin:** [Contact]
- **DevOps:** [Contact]

---

## 📝 Contributing

When making changes:

1. **Update documentation** - Keep these docs in sync with code
2. **Write tests** - Follow a TDD approach and keep the suite passing (currently 17/17)
3. **Test schema changes** - Test against a development database first
4. **Update diagrams** - Keep Mermaid diagrams current
5. **Document decisions** - Add to relevant doc sections

### Documentation Structure
```
docs/
├── README.md                      # This file (index)
├── data-pipeline-overview.md      # System architecture
├── database-schema.md             # Database reference
└── horse-matching-algorithm.md    # Matching logic
```

---

## 🎯 System Capabilities

### Horse Data Management
- Horses table with multi-source ID support (TAB Entrant ID, TAB Horse ID, Harness NZ ID)
- 6-priority matching algorithm for deduplication
- Normalized storage preventing duplicate horse records
- Race count tracking and career statistics

### Data Ingestion
- TAB API integration with automatic retry logic
- PostgreSQL database for persistent storage
- Scheduled scraping (morning, T-60, T-15, results)
- Comprehensive documentation
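
As an illustration of what "automatic retry logic" typically looks like, the sketch below retries an async operation with exponential backoff. It is a generic pattern, not the project's API client: `withRetry`, its parameters, and the default values are all hypothetical.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

A caller would wrap each API request in `withRetry`, so that a transient network failure during a scheduled scrape does not abort the whole run.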

### Observability
- Prometheus metrics collection
- Grafana dashboards
- Jaeger distributed tracing
- Health check endpoints

---

## 📖 Glossary

**Meeting:** A race day at a specific venue (e.g., Ellerslie on 2026-01-16)

**Race:** An individual race within a meeting (e.g., Race 7)

**Runner:** A horse's entry in a specific race

**Horse (Master Table):** Normalized record representing the actual horse across all races

**Scrape:** Data collection operation at a specific time

**Snapshot:** Point-in-time odds capture (morning, T-60, T-15, final)

**TAB:** New Zealand's official betting operator (Totalisator Agency Board)

**Thoroughbred (T):** Flat racing category

**Harness (H):** Standardbred/harness racing category

**Greyhound (G):** Greyhound racing category

**Normalized Name:** Lowercase name with apostrophes removed (for matching)

**Deduplication:** Identifying that multiple runner records refer to the same horse
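
The **Normalized Name** definition above can be sketched as a one-line function. `normalizeHorseName` is a hypothetical name, and stripping the curly apostrophe (U+2019) in addition to the straight one is an assumption; the full algorithm is described in the Horse Matching Algorithm document.

```typescript
// Lowercase the name and remove apostrophes so that spelling variants compare equal.
// Handling the curly apostrophe (\u2019) is an assumption beyond the glossary definition.
function normalizeHorseName(name: string): string {
  return name.toLowerCase().replace(/['\u2019]/g, "");
}

console.log(normalizeHorseName("O'Reilly's Choice")); // "oreillys choice"
```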
