# HarnessElo Architecture

## Overview

HarnessElo is a harness racing ratings system that computes advanced multi-runner Elo-style ratings for horses, drivers, and trainers. The system ingests race data from the HRNZ API, stores it in PostgreSQL, computes ratings deterministically, and exposes results via a REST API.

## System Components

```
┌─────────────────────────────────────────────────────────────┐
│                        HRNZ API                             │
│              (Harness Racing New Zealand)                   │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          │ HTTP + Auth
                          ▼
                ┌─────────────────────┐
                │   HRNZ Client       │
                │  - Authentication   │
                │  - Retry Logic      │
                │  - Rate Limiting    │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │  Ingestion Service  │
                │  - Fetch Meetings   │
                │  - Fetch Races      │
                │  - Store Starters   │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │    PostgreSQL       │
                │  - Meetings         │
                │  - Races            │
                │  - Starters         │
                │  - Dimensions       │
                │  - Ratings          │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │  Rating Engine      │
                │  - Multi-runner Elo │
                │  - Pairwise Logic   │
                │  - Adjustments      │
                └──────────┬──────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │  Rating Snapshots   │
                │  - Horse Ratings    │
                │  - Driver Ratings   │
                │  - Trainer Ratings  │
                └──────────┬──────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌──────────┐   ┌────────────┐   ┌──────────┐
    │   API    │   │  Worker    │   │ Evaluate │
    │ FastAPI  │   │    CLI     │   │  Script  │
    └──────────┘   └────────────┘   └──────────┘
```

## Data Flow

### Ingestion Flow

1. **Fetch Meetings**: Query HRNZ API for meetings in date range
2. **Authenticate**: Get HR key using Basic Auth, cache with TTL
3. **Fetch Races**: For each meeting, fetch individual race details
4. **Extract Starters**: Parse runners/starters from race data
5. **Upsert Dimensions**: Insert/update horses, drivers, trainers
6. **Store Race Data**: Save meetings, races, starters with raw JSON
7. **Commit**: Transactional commits per race for safety

### Rating Computation Flow

1. **Load Races**: Query races chronologically (by meeting date, race number)
2. **For Each Race**:
   - Load starters with horse/driver/trainer relationships
   - Load or initialize ratings for all entities
   - Compute effective ratings: R_eff = R_horse + α*R_driver + β*R_trainer + adjustments
   - Calculate pairwise expected outcomes using sigmoid
   - Compute rating deltas based on actual vs expected
   - Update ratings for horse, driver, trainer proportionally
   - Save rating snapshots to database
3. **Commit**: Periodic commits for performance

### Prediction Flow

1. **Load Ratings**: Fetch latest rating snapshots before the race.
2. **Compute Effective Ratings**: Same formula as the rating engine.
3. **Win Probabilities**: Softmax over effective ratings.
4. **Place Probabilities**: Place scores use recent top-3 rate and consistency.
5. **Return**: API returns win/place probabilities plus place rankings.

### API Request Flow

1. **Client Request**: HTTP request to API endpoint
2. **Authentication**: Verify admin token for protected endpoints
3. **Database Query**: Query ratings, races, or starters
4. **Response Formatting**: Convert to Pydantic models
5. **JSON Response**: Return formatted data

## Package Structure

### `packages/core/common`
Shared utilities and configuration:
- **settings.py**: Pydantic settings from environment
- **logging.py**: Structured JSON logging
- **utils.py**: Date parsing, distance bucketing, safe dict access

### `packages/hrnz_scraper`
HRNZ API integration:
- **scraper.py**: HTML scraping with parsing helpers
- **mapper.py**: Normalizes scraped results

### `packages/tab_client`
TAB API integration:
- **client.py**: Async HTTP client with auth, retries, rate limiting

### `packages/core/storage`
Database layer:
- **models.py**: SQLAlchemy ORM models
- **database.py**: Session management
- **repositories.py**: Idempotent CRUD operations
- **ingestion.py**: Orchestrates data fetching and storage

### `packages/core/ratings`
Rating computation:
- **engine.py**: Multi-runner Elo implementation
- **recompute.py**: Batch and incremental rating computation
- **predictions.py**: Win/place probabilities with consistency signal

### `apps/backend/api`
REST API service:
- **main.py**: FastAPI application with endpoints

### `apps/backend/worker`
CLI for background tasks:
- **cli.py**: Click-based CLI for ingest/recompute/info

### `scripts`
Standalone scripts:
- **evaluate_accuracy.py**: Compute accuracy and calibration metrics
- **sweep_param.sh**: Sweep a single tuning parameter
- **sweep_grid.sh**: Grid search across multiple parameters
- **auto_sweep.sh**: Batch sweeps with logging

## Database Schema

See [data_model.md](./data_model.md) for detailed schema.

### Key Tables
- **meetings**: Race meetings (venue, date)
- **races**: Individual races (distance, start type)
- **starters**: Runners in races (barrier, placing)
- **horses/drivers/trainers**: Dimension tables
- **rating_snapshots**: Ratings after each race
- **barrier_adjustments/handicap_adjustments**: Learned condition effects

## Rating Algorithm

See [rating_math.md](./rating_math.md) for detailed formulas.

### Core Approach
- **Multi-runner Elo**: Pairwise logistic model
- **Effective Rating**: Combines horse, driver, trainer
- **Deterministic**: Same input → same output
- **Traceable**: All snapshots stored

## Configuration

All configuration via environment variables (`.env`):

### HRNZ API
- `HRNZ_USERNAME`, `HRNZ_PASSWORD`
- `HRNZ_BASE_URL`
- `HRNZ_KEY_HEADER`

### Database
- `DATABASE_URL`
- `DATABASE_POOL_SIZE`

### Rating Parameters
- `ELO_SCALE_C`: Logistic scale (default 400.0)
- `ELO_K_BASE`: Update rate (default 24.0)
- `DRIVER_WEIGHT_ALPHA`: Driver weight (default 0.35)
- `TRAINER_WEIGHT_BETA`: Trainer weight (default 0.15)
- `ADJ_LEARNING_RATE`: Adjustment learning rate (default 0.5)

### Feature Flags
- `ENABLE_DRIVER`: Include driver ratings
- `ENABLE_TRAINER`: Include trainer ratings
- `ENABLE_ADJUSTMENTS`: Learn barrier/handicap effects
- `ENABLE_RD`: Track rating deviation

## Deployment

### Docker Compose (Development)
```bash
docker compose up -d
docker compose run --rm worker alembic upgrade head
docker compose run --rm worker python -m apps.worker.cli ingest --date 2024-01-01
docker compose run --rm worker python -m apps.worker.cli recompute --from 2024-01-01 --to 2024-01-31
```

### Production Considerations
- Use managed PostgreSQL (e.g., RDS, Cloud SQL)
- Run worker as cron jobs or scheduled tasks
- Set up proper secrets management (not .env files)
- Configure logging aggregation
- Add monitoring/alerting
- Set up API rate limiting
- Enable CORS for web clients
- Use HTTPS with proper certificates

## Security

### API Authentication
- Admin endpoints protected by bearer token
- Token configured via `API_ADMIN_TOKEN`
- **Production**: Use strong random tokens, rotate regularly

### Database
- Connection pooling with limits
- Prepared statements (SQLAlchemy ORM)
- No raw SQL injection vectors

### HRNZ API
- Credentials in environment, never in code
- HR key cached securely with TTL
- Rate limiting to respect API terms

## Performance Optimization

### Database
- Indexes on foreign keys and query columns
- Composite unique constraints for upserts
- Batch commits during recompute (every 50 races)
- Connection pooling

### Rating Computation
- In-memory state during computation
- Minimal database queries (bulk loads)
- Incremental updates possible (future)

### API
- Pagination on list endpoints
- Limit max results (500)
- Eager loading with joinedload to avoid N+1

## Monitoring and Observability

### Logging
- Structured JSON logs
- Contextual fields (meeting_id, race_id)
- Log levels: INFO for operations, DEBUG for details

### Metrics (Future)
- Ingestion rate and errors
- Rating computation time
- API request latency
- Database query performance

## Testing Strategy

### Unit Tests
- Rating engine logic (pairwise calculations)
- HRNZ client auth and retries
- Utility functions

### Integration Tests
- Database operations
- End-to-end ingestion
- Recompute determinism

### Coverage Target
- 70%+ on `packages/ratings`
- 70%+ on `packages/hrnz_client`

## Future Enhancements

### Short Term
- Rating deviation (RD) implementation
- Barrier/handicap adjustment learning
- Web UI for browsing ratings
- Export to CSV/Excel

### Long Term
- Real-time updates via WebSocket
- Betting market integration
- ML-based feature engineering
- Multi-region deployment
- GraphQL API
