# CLAUDE.md - AI Assistant Guide for TipSharks

## Project Overview

**TipSharks** is a racing ratings and predictions platform that computes multi-runner Elo-style ratings for horses, drivers, and trainers from TAB Affiliates API data. The system is production-ready: the rating algorithm is deterministic, and its machine learning enhancements are the barrier and handicap adjustments learned from historical results.

## Privacy and Anonymity Requirements

**CRITICAL - MUST FOLLOW**: This project has strict privacy and anonymity requirements:

### API Usage Rules
1. **No User Tracking**: Never include user identifiers, session IDs, or tracking tokens in API requests
2. **No Personal Data**: Never send personal information (names, emails, addresses) to external APIs
3. **Anonymous Requests**: All TAB API requests must be anonymous - use only public racing data endpoints
4. **No Authentication Data Logging**: Never log API credentials, tokens, or authentication headers
5. **Minimal Headers**: Use only essential HTTP headers - avoid user-agent strings with identifying information

### Data Handling Rules
1. **Public Data Only**: Only ingest and store publicly available racing data (meeting schedules, race results, horse/driver/trainer names)
2. **No User Profiles**: Do not create user accounts or profiles that could identify individuals
3. **Sanitized Logs**: Ensure all logs are sanitized - never log IP addresses, session data, or user identifiers
4. **Aggregate Only**: When exposing data via API, provide only aggregate statistics and public racing information
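
One way to enforce the sanitized-logs rule is a `logging.Filter` that masks sensitive fields before any handler sees them. The sketch below is an illustration, not the project's actual logging code: the `SENSITIVE_KEYS` deny-list, `RedactingFilter`, and `make_record` are hypothetical names.

```python
import logging

# Hypothetical deny-list; extend it to cover any field your code might attach.
SENSITIVE_KEYS = {"ip", "ip_address", "session_id", "user_id", "authorization", "token"}


class RedactingFilter(logging.Filter):
    """Mask sensitive `extra` fields on a log record with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key in SENSITIVE_KEYS:
            if key in record.__dict__:
                setattr(record, key, "[REDACTED]")
        return True  # Keep the record; only its sensitive fields are masked.


def make_record(**extra) -> logging.LogRecord:
    """Build a record the way `logger.info(msg, extra=...)` would."""
    record = logging.LogRecord("demo", logging.INFO, "demo.py", 0, "msg", None, None)
    record.__dict__.update(extra)
    return record
```

Attaching the filter with `logger.addFilter(RedactingFilter())` (or on a handler) applies it to every record that passes through.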

### Implementation Checklist
- ✅ TAB client makes anonymous requests (no auth required - public API)
- ✅ HTTP client uses minimal headers (httpx default)
- ✅ No user tracking or analytics in API calls
- ✅ Logs contain only racing data, no personal information
- ✅ Database stores only public racing data (horses, drivers, trainers are public figures)
- ✅ No cookies, sessions, or user tracking mechanisms

**Why This Matters**:
- Maintains user privacy and trust
- Complies with data protection regulations
- Prevents accidental data leakage
- Ensures the system can operate without user consent requirements (public data only)

**Verification**: Before deploying changes, verify:
1. No user-identifying data in API requests
2. No personal information in database
3. Logs are sanitized and contain only public racing data
4. All external API calls use anonymous/public endpoints

### Key Capabilities
- Multi-runner Elo ratings using pairwise logistic model (handles 2-20+ starters per race)
- Multi-entity ratings: horses, drivers, and trainers with configurable weights
- Condition adjustments: learned barrier and handicap effects from historical data
- Rating deviation (RD) tracking for uncertainty quantification
- REST API with 12 endpoints for querying ratings and predictions
- **Web UI with 6 responsive pages** for browsing ratings and predictions
- Comprehensive evaluation system (winner accuracy, calibration)
- Deterministic recomputation for reproducibility

### Technology Stack
- **Language**: Python 3.12+
- **Web Framework**: FastAPI (async)
- **Web UI**: Vanilla HTML/CSS/JS + Bootstrap 5 + Chart.js
- **Database**: PostgreSQL 16 with SQLAlchemy 2.0 ORM
- **Migrations**: Alembic
- **HTTP Client**: httpx (async)
- **CLI**: Click + Rich
- **Testing**: pytest + pytest-asyncio
- **Linting**: ruff + black + mypy
- **Deployment**: Docker + Docker Compose
- **CI/CD**: GitHub Actions

---

## Repository Structure

```
/home/user/tipsharks/
├── TODO.md                    # Project roadmap and implementation status
├── harnesselo/               # Main application directory
│   ├── apps/                 # Application entry points
│   │   ├── api/             # FastAPI REST API service
│   │   │   └── main.py      # API endpoints (800+ lines)
│   │   ├── web/             # Web UI (NEW)
│   │   │   ├── templates/   # HTML pages (6 pages)
│   │   │   └── static/      # CSS, JS, assets
│   │   └── worker/          # CLI worker for background tasks
│   │       └── cli.py       # Click commands (ingest, recompute, info)
│   ├── packages/            # Shared business logic libraries
│   │   ├── common/          # Utilities and configuration
│   │   │   ├── settings.py  # Pydantic settings (211 lines)
│   │   │   ├── logging.py   # Structured JSON logging
│   │   │   └── utils.py     # Date parsing, bucketing
│   │   ├── tab_client/      # TAB API integration
│   │   │   └── client.py    # Retries, rate limiting (328 lines)
│   │   ├── ratings/         # Core rating engine
│   │   │   ├── engine.py    # Multi-runner Elo algorithm (522 lines)
│   │   │   └── recompute.py # Batch computation (227 lines)
│   │   └── storage/         # Database layer
│   │       ├── models.py    # SQLAlchemy ORM models (342 lines)
│   │       ├── repositories.py # Data access layer (795 lines)
│   │       ├── ingestion.py # Orchestration (182 lines)
│   │       └── database.py  # Session management
│   ├── tests/               # Test suite (4 files)
│   │   ├── test_rating_engine.py
│   │   ├── test_adjustment_learning.py
│   │   ├── test_api_endpoints.py
│   │   └── test_hrnz_client.py
│   ├── scripts/             # Standalone utilities
│   │   └── evaluate.py      # Accuracy evaluation (323 lines)
│   ├── docs/                # Documentation (4 files)
│   │   ├── architecture.md  # System design
│   │   ├── data_model.md    # Database schema
│   │   ├── rating_math.md   # Elo mathematics
│   │   └── ops.md          # Operations guide
│   ├── alembic/            # Database migrations
│   │   └── versions/       # Migration files
│   ├── docker/             # Docker configurations
│   ├── .env.example        # Environment variable template
│   ├── docker-compose.yml  # Multi-container orchestration
│   ├── pyproject.toml      # Python project config
│   └── README.md           # Main documentation
```

---

## Database Schema

### Core Tables (9 total)

#### Race Data
1. **meetings** - Race meetings/meets
   - PK: `id` (String - TAB meeting ID)
   - Fields: `meeting_date`, `venue`, `category`, `raw_json`

2. **races** - Individual races
   - PK: `id` (auto-increment)
   - FK: `meeting_id` (String)
   - Fields: `tab_event_id`, `race_number`, `distance_m`, `race_datetime`
   - Unique: `(meeting_id, race_number)`

3. **starters** - Runners in races
   - PK: `id` (auto-increment)
   - FKs: `race_id`, `horse_id`, `driver_id`, `trainer_id`
   - Fields: `barrier`, `barrier_position`, `runner_number`, `handicap_m`, `placing`, `did_not_finish`

#### Dimension Tables
4. **horses** - Horse entities (PK: Integer from TAB horse ID)
5. **drivers** - Driver entities (PK: Integer generated from name hash)
6. **trainers** - Trainer entities (PK: Integer generated from name hash)
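
The driver and trainer keys are described as integers generated from a name hash. A minimal sketch of that idea follows, assuming SHA-256 over a normalized name; the `name_to_id` helper and the normalization rules are hypothetical, and the real scheme in the codebase may differ.

```python
import hashlib


def name_to_id(name: str) -> int:
    """Derive a stable, non-negative 31-bit integer from a normalized name."""
    # Collapse whitespace and case so "John  Smith" and "john smith" match.
    normalized = " ".join(name.strip().lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Mask to 31 bits so the value fits a signed 32-bit INTEGER column.
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF
```

Python's built-in `hash()` is unsuitable here because string hashes are salted per process. A 31-bit key can collide, so two distinct names may map to the same ID.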

#### Rating Tables
7. **rating_snapshots** - Rating history
   - Unique: `(entity_type, entity_id, as_of_race_id)`
   - Fields: `rating`, `rd`, `meta` (JSONB)

8. **barrier_adjustments** - Learned barrier effects
9. **handicap_adjustments** - Learned handicap effects

### Key Database Features
- **JSONB columns**: Store raw API responses for audit trails
- **Idempotent operations**: Using `INSERT ... ON CONFLICT DO UPDATE`
- **Cascade deletes**: Meetings → Races → Starters
- **Performance indexes**: Composite indexes for common queries (added in migration 20250126_0001)

---

## Development Workflows

### Initial Setup

```bash
# Navigate to harnesselo directory
cd /home/user/tipsharks/harnesselo

# Copy environment template
cp .env.example .env

# Edit .env - TAB API requires no credentials (public API)
# Optionally enable mock mode for testing: TAB_MOCK_MODE=true

# Start services
docker compose up -d

# Run migrations
docker compose run --rm worker alembic upgrade head

# Verify setup
docker compose run --rm worker python -m apps.worker.cli info
```

### Common Development Tasks

#### Running Tests
```bash
# Inside harnesselo directory
pytest                              # Run all tests
pytest -v                           # Verbose output
pytest --cov=packages --cov=apps    # With coverage
pytest tests/test_rating_engine.py  # Specific test file
pytest -k "test_adjustment"         # Pattern matching
```

#### Linting & Formatting
```bash
ruff check .                # Lint code
ruff check . --fix          # Auto-fix issues
black .                     # Format code
black --check .             # Check formatting
mypy packages/ apps/        # Type checking
```

#### Database Operations
```bash
# Create new migration
docker compose run --rm worker alembic revision --autogenerate -m "description"

# Apply migrations
docker compose run --rm worker alembic upgrade head

# Rollback migration
docker compose run --rm worker alembic downgrade -1

# View migration history
docker compose run --rm worker alembic history

# Access database directly
docker compose exec db psql -U harnesselo
```

#### Data Ingestion
```bash
# Ingest single day
docker compose run --rm worker python -m apps.worker.cli ingest --date 2024-01-15

# Ingest date range
docker compose run --rm worker python -m apps.worker.cli ingest \
  --from 2024-01-01 \
  --to 2024-01-31

# Ingest with verbose logging
LOG_LEVEL=DEBUG docker compose run --rm worker python -m apps.worker.cli ingest --date 2024-01-15
```

#### Rating Computation
```bash
# Recompute ratings (incremental)
docker compose run --rm worker python -m apps.worker.cli recompute \
  --from 2024-01-01 \
  --to 2024-01-31

# Recompute with adjustment learning
docker compose run --rm worker python -m apps.worker.cli recompute \
  --from 2024-01-01 \
  --to 2024-12-31 \
  --learn-adjustments

# Full recompute (clear existing ratings)
docker compose run --rm worker python -m apps.worker.cli recompute \
  --from 2020-01-01 \
  --to 2024-12-31 \
  --clear
```

#### Evaluation
```bash
# Run evaluation for date range
docker compose run --rm worker python scripts/evaluate.py \
  --from 2024-01-01 \
  --to 2024-12-31 \
  --out reports/eval_2024.json

# View evaluation results
cat reports/eval_2024.json | jq .
```

### Git Workflow

**IMPORTANT**: This repository uses feature branches with a specific naming convention.

- **Branch Naming**: All Claude-created branches MUST follow pattern: `claude/{description}-{session_id}`
- **Current Branch**: `claude/add-claude-documentation-xza33`
- **Main Branch**: Not specified (check with `git branch -r`)

```bash
# Always work on claude/* branches
git checkout -b claude/feature-name-abc123

# Commit with descriptive messages
git add .
git commit -m "feat: Add barrier adjustment learning algorithm

- Implement performance residual-based learning
- Add repository layer for adjustment storage
- Integrate with recompute workflow"

# Push to remote (use -u flag, retry on network errors)
git push -u origin claude/feature-name-abc123
```

**Commit Message Conventions**:
- `feat:` - New feature
- `fix:` - Bug fix
- `docs:` - Documentation changes
- `test:` - Test additions/changes
- `refactor:` - Code refactoring
- `perf:` - Performance improvements
- `chore:` - Maintenance tasks

---

## Key Conventions & Patterns

### Code Organization

#### 1. Repository Pattern
All database operations go through repositories with a consistent interface:

```python
from typing import List, Optional


class MeetingRepository:
    @staticmethod
    def upsert(session, data) -> Meeting:
        """Upsert meeting using INSERT ... ON CONFLICT"""
        ...

    @staticmethod
    def get_by_id(session, meeting_id: str) -> Optional[Meeting]:
        """Get meeting by its TAB meeting ID (string primary key)"""
        ...

    @staticmethod
    def get_by_date_range(session, from_date, to_date) -> List[Meeting]:
        """Get meetings in date range"""
        ...
```

**Location**: `/home/user/tipsharks/harnesselo/packages/storage/repositories.py`

#### 2. Idempotent Operations
All database writes use PostgreSQL's `INSERT ... ON CONFLICT DO UPDATE`:

```python
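# on_conflict_do_update() is only available on the PostgreSQL dialect's insert()
from sqlalchemy.dialects.postgresql import insert

stmt = insert(Horse).values(id=horse_id, name=name, raw_json=data)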
stmt = stmt.on_conflict_do_update(
    index_elements=["id"],
    set_=dict(name=stmt.excluded.name, raw_json=stmt.excluded.raw_json)
)
session.execute(stmt)
```

**Why**: Ensures ingestion can be safely re-run without duplicates.
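
The effect of an idempotent upsert can be shown without a PostgreSQL server. The sketch below swaps in SQLite (via the standard-library `sqlite3` module), which accepts a similar `ON CONFLICT ... DO UPDATE` clause. The table and the `upsert_horse` helper are illustrative only.

```python
import sqlite3


def upsert_horse(conn: sqlite3.Connection, horse_id: int, name: str) -> None:
    """Insert a horse, or update its name if the ID already exists."""
    conn.execute(
        "INSERT INTO horses (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        (horse_id, name),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE horses (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
upsert_horse(conn, 1, "Old Name")
upsert_horse(conn, 1, "New Name")  # Re-running updates the row instead of duplicating it.
rows = conn.execute("SELECT id, name FROM horses").fetchall()
```

After both calls the table holds a single row carrying the latest name, which is the property that makes re-running ingestion safe.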

#### 3. Settings Singleton
Pydantic settings loaded from environment, cached as singleton:

```python
from packages.common.settings import get_settings

settings = get_settings()  # Singleton, safe to call multiple times
```

**Location**: `/home/user/tipsharks/harnesselo/packages/common/settings.py`
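
A common way to get this singleton behaviour is to cache the factory with `functools.lru_cache`. The sketch below assumes that approach and uses a plain dataclass as a stand-in for the Pydantic class; `DemoSettings` and `get_demo_settings` are hypothetical names, not the project's code.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Stand-in for the Pydantic settings class; only two fields are shown."""

    elo_scale_c: float
    elo_k_base: float


@lru_cache(maxsize=1)
def get_demo_settings() -> DemoSettings:
    """Read the environment once; later calls return the cached instance."""
    return DemoSettings(
        elo_scale_c=float(os.environ.get("ELO_SCALE_C", "400.0")),
        elo_k_base=float(os.environ.get("ELO_K_BASE", "24.0")),
    )
```

With this pattern, tests that change environment variables need to call `get_demo_settings.cache_clear()` before the new values take effect.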

#### 4. Dataclasses for State
Use dataclasses for in-memory state (not ORM models):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class RatingState:
    rating: float
    rd: Optional[float] = None
    race_count: int = 0
    last_race_date: Optional[date] = None
```

**Location**: `/home/user/tipsharks/harnesselo/packages/ratings/engine.py`

#### 5. Async Context Managers
For resource management (HTTP clients, database sessions):

```python
async with TABClient() as client:
    meetings = await client.get_meetings(from_date, to_date)
```
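
For readers unfamiliar with the protocol behind `async with`, here is a minimal sketch of a class that supports it. `DemoClient` is a hypothetical stand-in and makes no network calls. The real `TABClient` presumably opens and closes an `httpx` client in the same two hooks, but that is an assumption.

```python
import asyncio


class DemoClient:
    """Hypothetical stand-in for a client that owns a network resource."""

    def __init__(self) -> None:
        self.is_open = False

    async def __aenter__(self) -> "DemoClient":
        self.is_open = True  # A real client would open its connection pool here.
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.is_open = False  # Runs on normal exit and when the body raises.

    async def get_meetings(self) -> list[str]:
        if not self.is_open:
            raise RuntimeError("client used outside 'async with'")
        return ["demo-meeting"]


async def main() -> tuple[list[str], bool]:
    async with DemoClient() as client:
        meetings = await client.get_meetings()
    return meetings, client.is_open  # is_open is False once the block exits.


result = asyncio.run(main())
```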

### Naming Conventions

- **Files**: `snake_case.py` (e.g., `rating_engine.py`)
- **Classes**: `PascalCase` (e.g., `RatingEngine`, `HorseRepository`)
- **Functions/Methods**: `snake_case()` (e.g., `compute_effective_rating()`)
- **Constants**: `UPPER_SNAKE_CASE` (e.g., `ELO_SCALE_C`, `DEFAULT_RATING`)
- **Private Methods**: `_underscore_prefix()` (e.g., `_get_barrier_adjustment()`)
- **Database Tables**: `snake_case` plural (e.g., `meetings`, `rating_snapshots`)
- **Foreign Keys**: `{table}_id` (e.g., `meeting_id`, `horse_id`)

### API Conventions

- **Endpoints**: RESTful design (`/ratings/horses`, `/races/{race_id}`)
- **Response Models**: Pydantic `BaseModel` subclasses
- **Dependency Injection**: FastAPI `Depends()` for database sessions
- **Error Handling**: `HTTPException` with appropriate status codes
- **Pagination**: Query parameters `limit` (max 500) and `offset`
- **Authentication**: Bearer token for admin endpoints (`/admin/*`)

### Configuration

**Environment Variables** (see `.env.example`):

```bash
# Database
DATABASE_URL=postgresql+psycopg://user:pass@host:port/db

# TAB API (public API - no authentication required)
TAB_BASE_URL=https://api.tab.co.nz/affiliates/v1
TAB_TIMEOUT=30.0
TAB_MAX_RETRIES=3
TAB_DEFAULT_CATEGORY=H              # H (Harness), T (Thoroughbred), G (Greyhound)
TAB_DEFAULT_COUNTRY=NZ
TAB_MOCK_MODE=false                 # Use mock data instead of real API (for testing)

# Rating Parameters
ELO_SCALE_C=400.0                   # Logistic scale factor
ELO_K_BASE=24.0                     # Base K-factor
DRIVER_WEIGHT_ALPHA=0.35            # Driver contribution
TRAINER_WEIGHT_BETA=0.15            # Trainer contribution
ADJ_LEARNING_RATE=0.5               # Adjustment learning rate

# Feature Flags
ENABLE_DRIVER=true                  # Include driver ratings
ENABLE_TRAINER=true                 # Include trainer ratings
ENABLE_ADJUSTMENTS=true             # Learn barrier/handicap adjustments
ENABLE_RD=false                     # Track rating deviation

# API
API_ADMIN_TOKEN=your_secret_token   # Admin endpoint auth
CORS_ALLOW_ORIGINS=*                # CORS configuration

# Logging
LOG_LEVEL=INFO                      # DEBUG, INFO, WARNING, ERROR
LOG_FORMAT=json                     # json or text
```

### Logging

Use structured logging with contextual fields:

```python
import logging

logger = logging.getLogger(__name__)

# Info with context
logger.info("Processing meeting", extra={
    "meeting_id": meeting_id,
    "venue": venue,
    "race_count": len(races)
})

# Debug
logger.debug("Computed effective rating", extra={
    "rating": r_eff,
    "starter_id": starter.id
})

# Error with exception
logger.error("Failed to fetch race", exc_info=True, extra={
    "race_id": race_id
})
```
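
To show how `extra=` fields can end up as top-level JSON keys, here is a minimal formatter sketch built on the standard library. It is an assumption about how such a formatter might work; the project's `packages/common/logging.py` may be implemented differently, and `JsonFormatter` is a hypothetical name.

```python
import json
import logging

# Attributes every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", None, None).__dict__
) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object, including any `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps dates and other non-JSON types from raising.
        return json.dumps(payload, default=str)
```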

---

## Important Implementation Details

### Rating Engine (`packages/ratings/engine.py`)

The core rating algorithm lives in the `RatingEngine` class. Key methods:

- `process_race()`: Main entry point for rating a race
- `_compute_expected_scores()`: Pairwise logistic probabilities
- `_apply_update()`: Update ratings with K-factor scaling
- `_get_barrier_adjustment()`: Retrieve learned barrier effect
- `_get_handicap_adjustment()`: Retrieve learned handicap effect
- `learn_adjustments_from_race()`: Learn adjustments from performance residuals

**Zero-sum property**: All rating changes in a race sum to zero.

**Multi-entity formula**:
```
r_eff = r_horse + α × r_driver + β × r_trainer + adj_barrier + adj_handicap
```
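
The formula and the zero-sum property can be illustrated with a small pairwise update. This is a sketch under stated assumptions, not the engine's code: it assumes a base-10 logistic curve, no dead heats, placings numbered 1..n, and a score equal to the fraction of rivals beaten. `effective_rating` and `pairwise_deltas` are hypothetical names.

```python
def effective_rating(r_horse: float, r_driver: float, r_trainer: float,
                     alpha: float = 0.35, beta: float = 0.15,
                     adj_barrier: float = 0.0, adj_handicap: float = 0.0) -> float:
    """r_eff = r_horse + alpha * r_driver + beta * r_trainer + adjustments."""
    return r_horse + alpha * r_driver + beta * r_trainer + adj_barrier + adj_handicap


def pairwise_deltas(r_eff: list[float], placings: list[int],
                    k: float = 24.0, c: float = 400.0) -> list[float]:
    """Rating change per runner from averaged pairwise logistic expectations."""
    n = len(r_eff)
    deltas = []
    for i in range(n):
        # Expected score: mean probability of beating each rival.
        expected = sum(
            1.0 / (1.0 + 10 ** ((r_eff[j] - r_eff[i]) / c))
            for j in range(n) if j != i
        ) / (n - 1)
        # Actual score: fraction of rivals this runner finished ahead of.
        actual = (n - placings[i]) / (n - 1)
        deltas.append(k * (actual - expected))
    return deltas


deltas = pairwise_deltas([1500.0, 1500.0, 1500.0], [1, 2, 3])
```

Expected and actual scores each sum to n/2 across the field, so the deltas cancel. How a runner's delta is then split between horse, driver, and trainer is not shown here.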

### Repository Patterns (`packages/storage/repositories.py`)

All repositories follow these patterns:

1. **Upsert**: `upsert(session, data)` - Create or update entity
2. **Get by ID**: `get_by_id(session, id)` - Retrieve single entity
3. **Get by range**: `get_by_date_range(session, from_date, to_date)` - Retrieve multiple
4. **Get latest**: `get_latest_{entity}(session, limit, offset)` - Paginated queries

**Critical**: Always use SQLAlchemy sessions properly:
```python
from packages.storage.database import get_session

with get_session() as session:
    horse = HorseRepository.get_by_id(session, horse_id)
    session.commit()  # Only needed after writes; a pure read like this can skip it
```

### API Structure (`apps/api/main.py`)

FastAPI application with dependency injection:

```python
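from fastapi import Depends, FastAPI, Query
from sqlalchemy.orm import Session
from packages.storage.database import get_db
from packages.storage.repositories import RatingSnapshotRepository

app = FastAPI()

@app.get("/ratings/horses")
def get_horses(
    limit: int = Query(100, ge=1, le=500),  # Enforce the documented max of 500
    offset: int = Query(0, ge=0),
    db: Session = Depends(get_db)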
):
    # Query database using repositories
    ratings = RatingSnapshotRepository.get_latest_horses(db, limit, offset)
    return ratings
```

**Admin endpoints** require Bearer token authentication:
```python
from fastapi import Header, HTTPException
from packages.common.settings import get_settings

def verify_admin_token(authorization: str | None = Header(None)):
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing authorization")
    # Strip only the leading scheme; str.replace would also alter the token body
    token = authorization.removeprefix("Bearer ")
    if token != get_settings().api_admin_token:
        raise HTTPException(status_code=403, detail="Invalid token")
```

### Testing Patterns (`tests/`)

Use pytest with async support:

```python
import pytest
from unittest.mock import Mock
from packages.ratings.engine import RatingEngine

@pytest.fixture
def engine():
    """Create rating engine with test settings"""
    settings = Mock()
    settings.elo_scale_c = 400.0
    settings.elo_k_base = 24.0
    return RatingEngine(settings)

def test_two_runner_race_equal_ratings(engine):
    """Test that equal-rated horses split rating changes equally"""
    race = Mock()
    race.id = 1
    race.distance_m = 2000

    starters = [
        Mock(horse_id=1, placing=1, barrier=1, handicap_m=0),
        Mock(horse_id=2, placing=2, barrier=2, handicap_m=0),
    ]

    initial_ratings = {1: 1500.0, 2: 1500.0}
    result = engine.process_race(race, starters, initial_ratings)

    assert result.updates[1].delta > 0  # Winner gains
    assert result.updates[2].delta < 0  # Loser loses
    assert abs(result.updates[1].delta + result.updates[2].delta) < 0.01  # Zero-sum
```

---

## Common Tasks for AI Assistants

### Adding a New API Endpoint

1. **Define Pydantic response model** in `apps/api/main.py`:
```python
class MyNewResponse(BaseModel):
    field1: str
    field2: int
```

2. **Add endpoint function**:
```python
@app.get("/my-new-endpoint", response_model=MyNewResponse)
def my_new_endpoint(db: Session = Depends(get_db)):
    # Query database using repositories
    data = MyRepository.get_data(db)
    return MyNewResponse(field1=data.field1, field2=data.field2)
```

3. **Add test** in `tests/test_api_endpoints.py`:
```python
def test_my_new_endpoint(client, db_session):
    response = client.get("/my-new-endpoint")
    assert response.status_code == 200
    assert "field1" in response.json()
```

4. **Update documentation** if needed in `docs/`

### Adding a Database Table

1. **Add SQLAlchemy model** in `packages/storage/models.py`:
```python
class MyNewTable(Base):
    __tablename__ = "my_new_table"

    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
```

2. **Create migration**:
```bash
docker compose run --rm worker alembic revision --autogenerate -m "Add my_new_table"
```

3. **Review and edit migration** in `alembic/versions/`, add indexes if needed

4. **Apply migration**:
```bash
docker compose run --rm worker alembic upgrade head
```

5. **Add repository** in `packages/storage/repositories.py`:
```python
class MyNewTableRepository:
    @staticmethod
    def upsert(session: Session, data: dict) -> MyNewTable:
        # Implement upsert logic
        ...
```

### Modifying the Rating Algorithm

**CRITICAL**: Rating algorithm changes affect determinism. Always:

1. **Document the change** in `docs/rating_math.md`
2. **Add tests** in `tests/test_rating_engine.py`
3. **Validate zero-sum property** is preserved
4. **Run full recompute** after changes to verify determinism:
```bash
docker compose run --rm worker python -m apps.worker.cli recompute \
  --from 2020-01-01 --to 2024-12-31 --clear
```
5. **Run evaluation** and compare metrics before/after
6. **Update TODO.md** with the change

### Adding Configuration Parameters

1. **Add to Pydantic settings** in `packages/common/settings.py`:
```python
class Settings(BaseSettings):
    # Existing fields...

    my_new_param: float = Field(
        default=1.0,
        description="Description of the parameter"
    )

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False
    )
```

2. **Add to `.env.example`**:
```bash
# My New Feature
MY_NEW_PARAM=1.0
```

3. **Use in code**:
```python
from packages.common.settings import get_settings

settings = get_settings()
value = settings.my_new_param
```

---

## Things to Avoid

### Database Operations

❌ **Don't** query database directly without repositories:
```python
# BAD
horses = session.query(Horse).filter(Horse.id == horse_id).first()
```

✅ **Do** use repository pattern:
```python
# GOOD
horses = HorseRepository.get_by_id(session, horse_id)
```

❌ **Don't** write rows with raw SQL that is not idempotent:
```python
# BAD
session.execute("INSERT INTO horses (id, name) VALUES (?, ?)", (id, name))
```

✅ **Do** use SQLAlchemy with ON CONFLICT:
```python
# GOOD
HorseRepository.upsert(session, {"id": id, "name": name})
```

### Rating Algorithm

❌ **Don't** modify rating calculations without preserving zero-sum:
```python
# BAD - Not zero-sum
for starter in starters:
    rating_change = k_factor * (1 - expected_score)  # Always positive!
```

✅ **Do** ensure zero-sum property:
```python
# GOOD
total_delta = sum(update.delta for update in updates.values())
assert abs(total_delta) < 1e-6  # Verify zero-sum
```

❌ **Don't** change rating parameters without documentation:
```python
# BAD
k_factor = 32.0  # Magic number, no explanation
```

✅ **Do** use configuration with documentation:
```python
# GOOD
k_factor = settings.elo_k_base  # Documented in rating_math.md
```

### API Development

❌ **Don't** return ORM models directly:
```python
# BAD
@app.get("/horses/{id}")
def get_horse(id: int, db: Session = Depends(get_db)):
    return HorseRepository.get_by_id(db, id)  # Returns ORM model!
```

✅ **Do** use Pydantic response models:
```python
# GOOD
@app.get("/horses/{id}", response_model=HorseResponse)
def get_horse(id: int, db: Session = Depends(get_db)):
    horse = HorseRepository.get_by_id(db, id)
    # Pydantic v2 API; the model needs model_config = ConfigDict(from_attributes=True)
    return HorseResponse.model_validate(horse)
```

❌ **Don't** expose internal errors to API users:
```python
# BAD
@app.get("/horses/{id}")
def get_horse(id: int):
    horse = query_database(id)  # Might raise database exception
```

✅ **Do** handle errors gracefully:
```python
# GOOD
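@app.get("/horses/{id}", response_model=HorseResponse)
def get_horse(id: int, db: Session = Depends(get_db)):
    horse = HorseRepository.get_by_id(db, id)
    if not horse:
        raise HTTPException(status_code=404, detail="Horse not found")
    # Convert to the response model rather than returning the ORM object
    return HorseResponse.model_validate(horse)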
```

### Testing

❌ **Don't** test with live API in unit tests:
```python
# BAD
def test_tab_client():
    client = TABClient()
    meetings = client.get_meetings(from_date, to_date)  # Calls real API!
```

✅ **Do** use mock mode for testing:
```python
# GOOD
@pytest.fixture
def mock_tab_client():
    return MockTABClient()  # Uses deterministic mock data

@pytest.mark.asyncio  # `await` requires an async test function
async def test_tab_client(mock_tab_client):
    meetings = await mock_tab_client.get_meetings('2024-12-26', '2024-12-26')
    assert len(meetings) > 0
```

✅ **Do** use respx for HTTP mocking:
```python
# GOOD
import httpx
import respx

@respx.mock
def test_tab_client_retry():
    respx.get("https://api.tab.co.nz/affiliates/v1/racing/meetings").mock(
        return_value=httpx.Response(200, json={"meetings": []})
    )
    # Test retry logic without hitting real API
```

### Git Workflow

❌ **Don't** push to main/master directly:
```bash
# BAD
git push origin main
```

✅ **Do** use feature branches:
```bash
# GOOD
git push -u origin claude/feature-name-abc123
```

❌ **Don't** commit without testing:
```bash
# BAD
git add . && git commit -m "fix" && git push
```

✅ **Do** test before committing:
```bash
# GOOD
pytest && ruff check . && black --check .
git add . && git commit -m "feat: Add feature" && git push
```

---

## Troubleshooting Guide

### Database Issues

**Problem**: Database connection errors
```
sqlalchemy.exc.OperationalError: could not connect to server
```

**Solution**:
```bash
# Check database is running
docker compose ps

# Restart database
docker compose restart db

# Check logs
docker compose logs db

# Verify connection
docker compose exec db psql -U harnesselo -c "SELECT 1"
```

**Problem**: Migration conflicts
```
alembic.util.exc.CommandError: Target database is not up to date
```

**Solution**:
```bash
# View current revision
docker compose run --rm worker alembic current

# View history
docker compose run --rm worker alembic history

# Downgrade and re-upgrade
docker compose run --rm worker alembic downgrade -1
docker compose run --rm worker alembic upgrade head
```

### TAB API Issues

**Problem**: Historical Data Not Available (FR0001)
```
error: "requested meetings/events cannot be over 2 weeks old"
error_code: "FR0001"
```

**Root Cause**: The TAB Affiliates API (public tier) only provides data for the current ± 2 weeks window.

**Solutions**:
1. **For recent data**: Query dates within the last 2 weeks
2. **For historical analysis**: Use mock mode or implement daily ingestion to build historical database
3. **For production**: Set up scheduled daily ingestion to capture data before it expires from API
4. **Alternative**: Contact TAB for access to historical data API (may require different tier/product)

**Problem**: No Meetings Found
```
Found 0 meetings
```

**Root Cause**: No meetings scheduled for the specified date/category/country combination.

**Solutions**:
```bash
# Try different date
docker compose run --rm worker python -m apps.worker.cli ingest --date 2026-01-07

# Try different category (T=Thoroughbred, H=Harness, G=Greyhound)
docker compose run --rm worker python -m apps.worker.cli ingest --date 2026-01-06 --category T

# Try without category filter to see all meetings
docker compose run --rm worker python -m apps.worker.cli ingest --date 2026-01-06
```

**Problem**: Rate limiting
```
429 Too Many Requests
```

**Solution**: The client has built-in exponential backoff. Just wait and retry. TAB API has generous rate limits for the public tier.
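
For reference, exponential backoff can be sketched in a few lines. This is a generic illustration, not the retry code in `packages/tab_client/client.py`: `RetryableError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the base delay and cap are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (for example HTTP 429)."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays of base * 2**attempt seconds, each capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(func: Callable[[], T], max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, sleeping with growing delays between retryable failures."""
    for delay in backoff_delays(max_retries):
        try:
            return func()
        except RetryableError:
            sleep(delay)
    return func()  # Final attempt; any error now propagates to the caller.
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting. Production clients often add random jitter to each delay so that many callers do not retry in lockstep.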

**Mock Mode for Testing Without API Access**:

Mock mode is useful for development and testing:

```bash
# Enable mock mode in .env
TAB_MOCK_MODE=true

# Or pass as environment variable
docker compose run --rm -e TAB_MOCK_MODE=true worker \
  python -m apps.worker.cli ingest --date 2024-12-26

# Result: Generates realistic mock data (1 meeting, 8 races, 80 starters)
```

Mock mode provides:
- Realistic data structure matching TAB API format
- 1-2 meetings per day with 8 races each
- 8-12 runners per race with proper attributes
- Deterministic data (same dates = same data)
- Works for any date (no 2-week restriction)
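
The "same dates = same data" property is typically achieved by seeding a private random generator from the date. The sketch below assumes that approach; `mock_meeting` and its output shape are hypothetical and do not reproduce the project's mock client.

```python
import random


def mock_meeting(date_str: str) -> dict:
    """Generate the same fake meeting every time for a given date string."""
    rng = random.Random(date_str)  # Private RNG; global random state is untouched.
    races = []
    for race_number in range(1, 9):  # 8 races per meeting
        runners = rng.randint(8, 12)  # 8-12 runners per race
        placings = list(range(1, runners + 1))
        rng.shuffle(placings)
        races.append({"race_number": race_number, "placings": placings})
    return {"meeting_date": date_str, "races": races}
```

Seeding `random.Random` with a string is reproducible across processes, unlike anything derived from the built-in `hash()`.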

**For detailed troubleshooting, see**: `/docs/troubleshooting.md`

### Rating Issues

**Problem**: Ratings not appearing after recompute
```bash
# Check if races were ingested
docker compose exec db psql -U harnesselo -c \
  "SELECT COUNT(*) FROM races WHERE race_datetime BETWEEN '2024-01-01' AND '2024-01-31'"

# Check if starters exist
docker compose exec db psql -U harnesselo -c \
  "SELECT COUNT(*) FROM starters WHERE race_id IN (SELECT id FROM races)"

# Check if ratings were computed
docker compose exec db psql -U harnesselo -c \
  "SELECT COUNT(*) FROM rating_snapshots"
```

**Problem**: Rating determinism broken (different results on re-run)
- Check if any random number generation was introduced
- Verify race processing order is deterministic (sorted by race_datetime)
- Check if any async operations are non-deterministic
- Review recent changes to `packages/ratings/engine.py`
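
Two small helpers can make these checks concrete: a stable processing order with an explicit tie-break, and a checksum for comparing the output of two recompute runs. Both are sketches with hypothetical names (`processing_order`, `snapshot_checksum`). Breaking ties on race ID is an assumption, not confirmed behaviour of `recompute.py`.

```python
import hashlib
from datetime import datetime


def processing_order(races: list[tuple[int, datetime]]) -> list[int]:
    """Order (race_id, race_datetime) pairs by start time, then by race ID."""
    return [race_id for race_id, _ in sorted(races, key=lambda r: (r[1], r[0]))]


def snapshot_checksum(snapshots: list[tuple[str, int, int, float]]) -> str:
    """Hash (entity_type, entity_id, as_of_race_id, rating) rows in sorted order."""
    digest = hashlib.sha256()
    for row in sorted(snapshots):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()
```

If two full recomputes over the same date range produce different checksums, the first differing row points at the race where the runs diverged.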

### API Issues

**Problem**: 401 Unauthorized on admin endpoints
```bash
# Check token in .env
grep API_ADMIN_TOKEN .env

# Test with curl
curl -H "Authorization: Bearer YOUR_TOKEN" \
  -X POST http://localhost:8000/admin/ingest \
  -H "Content-Type: application/json" \
  -d '{"date_from": "2024-01-01", "date_to": "2024-01-31"}'
```

**Problem**: Slow API queries
```bash
# Check query performance
docker compose exec db psql -U harnesselo -c \
  "EXPLAIN ANALYZE SELECT * FROM rating_snapshots WHERE entity_type = 'HORSE' ORDER BY rating DESC LIMIT 100"

# Add indexes if needed (create migration)
```

---

## Performance Considerations

### Database Query Optimization

1. **Use composite indexes** for common query patterns:
```sql
CREATE INDEX idx_rating_snapshots_entity_rating
ON rating_snapshots(entity_type, rating DESC);
```

2. **Use `joinedload()` to avoid N+1 queries**:
```python
from sqlalchemy.orm import joinedload

races = session.query(Race)\
    .options(joinedload(Race.starters))\
    .filter(Race.meeting_id == meeting_id)\
    .all()
```

3. **Batch database operations**:
```python
# BAD - One query per horse
for horse_id in horse_ids:
    horse = HorseRepository.get_by_id(session, horse_id)

# GOOD - Single query (wrap it in a repository method to keep the repository pattern)
horses = session.query(Horse).filter(Horse.id.in_(horse_ids)).all()
```

### Rating Computation Performance

1. **Recompute speed target**: >5,000 races/minute
2. **Memory usage**: Monitor for large date ranges
3. **Batch commits**: Commit rating snapshots in batches of 1,000
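
Batched commits can be built on a small chunking helper. The sketch below is illustrative; `chunked` is a hypothetical name, and the commented loop shows how it might be combined with a SQLAlchemy session.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


# Possible use with a session (not executed here):
# for batch in chunked(snapshots, 1000):
#     session.add_all(batch)
#     session.commit()
```

On Python 3.12+, `itertools.batched` offers the same behaviour but yields tuples instead of lists.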

### API Performance

1. **Response time target**: p95 < 200ms
2. **Use pagination**: Always limit query results
3. **Cache static data**: Consider Redis for top N ratings

---

## Security Considerations

### Authentication

- **Admin endpoints**: Protected by Bearer token (`API_ADMIN_TOKEN`)
- **TAB API**: Public API - no credentials required (anonymous access)
- **Database credentials**: Use environment variables, never commit to git
- **Privacy**: No user tracking or personal data collection (see the Privacy and Anonymity Requirements section above)
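
For the admin token, the standard-library `secrets` module covers both generation and comparison. This is a minimal sketch: `generate_admin_token` and `tokens_match` are hypothetical helper names, not functions from the codebase.

```python
import secrets


def generate_admin_token() -> str:
    """Return a URL-safe random token built from 32 random bytes."""
    return secrets.token_urlsafe(32)


def tokens_match(supplied: str, expected: str) -> bool:
    """Compare tokens in constant time to avoid leaking timing information."""
    # Encoding to bytes lets compare_digest accept non-ASCII input safely.
    return secrets.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

A plain `!=` comparison can return as soon as the first differing character is found, so its timing can reveal how much of a guessed token was correct.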

### Input Validation

- **API parameters**: Validated by Pydantic models
- **SQL injection**: Prevented by SQLAlchemy parameterized queries
- **Date ranges**: Validated to prevent excessive queries
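
A date-range check of the kind mentioned above might look like the following. The 366-day cap and the `validate_date_range` name are assumptions for illustration; this guide does not state the real limit.

```python
from datetime import date

MAX_RANGE_DAYS = 366  # Hypothetical cap; the real limit is not stated in this guide.


def validate_date_range(date_from: date, date_to: date,
                        max_days: int = MAX_RANGE_DAYS) -> int:
    """Return the inclusive span in days, or raise ValueError if invalid."""
    if date_to < date_from:
        raise ValueError("date_to must not be earlier than date_from")
    span = (date_to - date_from).days + 1
    if span > max_days:
        raise ValueError(f"range of {span} days exceeds the {max_days}-day limit")
    return span
```

In an endpoint, the `ValueError` would usually be translated into an `HTTPException` with status 400 or 422.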

### CORS Configuration

- **Development**: Allow all origins (`CORS_ALLOW_ORIGINS=*`)
- **Production**: Restrict to specific domains (`CORS_ALLOW_ORIGINS=https://app.example.com`)

---

## Deployment Checklist

Before deploying to production:

- [ ] Set strong `API_ADMIN_TOKEN` (32+ random characters)
- [ ] Configure `CORS_ALLOW_ORIGINS` to specific domains
- [ ] Set up managed PostgreSQL with backups
- [ ] Configure SSL/TLS for API
- [ ] Set up logging aggregation (CloudWatch, Datadog, etc.)
- [ ] Configure monitoring and alerts
- [ ] Run full data backfill
- [ ] Run evaluation and verify metrics (35-45% winner accuracy)
- [ ] Set up automated scheduling (daily ingestion + recompute)
- [ ] Test disaster recovery procedures
- [ ] Review and adjust rating parameters if needed

---

## Additional Resources

### Documentation Files
- **Architecture**: `/home/user/tipsharks/harnesselo/docs/architecture.md`
- **Data Model**: `/home/user/tipsharks/harnesselo/docs/data_model.md`
- **Rating Math**: `/home/user/tipsharks/harnesselo/docs/rating_math.md`
- **Operations**: `/home/user/tipsharks/harnesselo/docs/ops.md`

### Project Status
- **TODO.md**: `/home/user/tipsharks/TODO.md` - Implementation status and roadmap
- **README.md**: `/home/user/tipsharks/harnesselo/README.md` - Quick start guide

### Key Metrics (Target Performance)

**Rating Quality**:
- Winner accuracy: 35-45%
- Top-3 hit rate: 60-75%
- Calibration error: <5%
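
The first two metrics can be computed as follows. This sketch assumes "winner accuracy" means the top-ranked runner won, and "top-3 hit rate" means the actual winner was among the three highest-ranked runners. `scripts/evaluate.py` may define them differently, and the function names are hypothetical.

```python
# Each race is (predicted ranking of runner IDs, best first; actual winner ID).
Race = tuple[list[int], int]


def winner_accuracy(races: list[Race]) -> float:
    """Share of races where the top-ranked runner won."""
    hits = sum(1 for ranking, winner in races if ranking[0] == winner)
    return hits / len(races)


def top3_hit_rate(races: list[Race]) -> float:
    """Share of races where the winner was ranked in the top three."""
    hits = sum(1 for ranking, winner in races if winner in ranking[:3])
    return hits / len(races)


sample = [
    ([1, 2, 3, 4], 1),      # Top pick won
    ([5, 6, 7, 8], 7),      # Winner ranked third
    ([9, 10, 11, 12], 12),  # Winner ranked fourth
    ([1, 2, 3, 4], 2),      # Winner ranked second
]
```

On `sample`, one of four top picks won and three of four winners were ranked in the top three.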

**System Performance**:
- Ingestion error rate: <1%
- Recompute speed: >5,000 races/minute
- API p95 latency: <200ms
- Database query p95: <50ms

**Operational**:
- System uptime: >99.5%
- Backup success rate: 100%
- CI/CD success rate: >95%

---

## Summary for AI Assistants

When working with this codebase:

1. **Always read before modifying** - Read files before making changes
2. **Use repository pattern** - Don't query database directly
3. **Maintain determinism** - Rating changes must be reproducible
4. **Test thoroughly** - Add tests for new features
5. **Document changes** - Update relevant .md files
6. **Follow conventions** - Use existing patterns and naming
7. **Check TODO.md** - Understand project status and roadmap
8. **Preserve zero-sum** - All rating changes in a race must sum to zero
9. **Use feature branches** - Follow `claude/*` naming convention
10. **Run evaluation** - Verify rating quality after algorithm changes

- **Working Directory**: `/home/user/tipsharks/harnesselo` (for development tasks)
- **Root Directory**: `/home/user/tipsharks` (for git operations)

---

- **Last Updated**: 2025-12-26
- **Version**: 1.0.0
- **Status**: Production-ready with ML enhancements