# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Horse Racing Data Scraper** - Near-realtime data collection system for Australian and New Zealand thoroughbred and harness racing using the TAB Affiliates API.

**Tech Stack**: TypeScript, Node.js, PostgreSQL, Redis, Prisma ORM, Docker
**Status**: Phase 3 Complete - Schedulers and observability operational

## Essential Development Commands

```bash
# Development
npm run dev                     # Start with hot reload
npm test                        # Run all tests
npm run test:watch              # Watch mode
npm run test:coverage           # Coverage report
npm run test:unit               # Unit tests only
npm run test:integration        # Integration tests only

# Database
npm run prisma:generate         # Generate Prisma client
npm run prisma:migrate          # Run migrations
npm run prisma:studio           # Open database GUI

# Code Quality
npm run lint                    # Lint TypeScript
npm run lint:fix                # Auto-fix lint issues
npm run format                  # Format with Prettier
npm run type-check              # TypeScript type checking

# Build & Deploy
npm run build                   # Compile TypeScript
npm start                       # Start production server

# Docker
docker-compose up -d            # Start all services
docker-compose logs -f app      # View app logs
docker-compose ps               # Check service status
```

## High-Level Architecture

### Data Flow

```
Schedulers (node-cron)
  ↓
Service Layer (MeetingService, etc.)
  ↓
API Client (TabApiClient with rate limiting & retry)
  ↓
TAB Affiliates API
  ↓
Validation (Zod schemas)
  ↓
Database (Prisma → PostgreSQL)
  ↓
Observability (Pino logs, Prometheus metrics, OpenTelemetry traces)
```

### Scheduler System Architecture

The scheduler system is the core orchestration layer:

1. **SchedulerManager** (`src/schedulers/scheduler-manager.ts`)
   - Initializes and manages all schedulers
   - Provides unified start/stop/status interface
   - Exposes endpoints: `/health`, `/schedulers`, `/trigger/:type`

2. **Base Scheduler Pattern** (`src/schedulers/base-scheduler.ts`)
   - All schedulers extend BaseScheduler
   - Provides common logging, metrics, and error handling
   - Enforces JobRun tracking for all operations

3. **Scheduler Types**:
   - **MorningScrapeScheduler**: Fetches all meetings for the day (6:00 AM)
   - **PreRaceScheduler**: Checks for scratches/changes (T-60, T-15 mins)
   - **PostRaceScheduler**: Collects results (T+5 mins, relative to race start)
   - **CleanupScheduler**: Purges old JobRuns and Scrapes (daily 2:00 AM)

4. **Dynamic Scheduling**:
   - Pre/post-race jobs are scheduled dynamically based on race start times
   - MorningScrapeScheduler creates the day's race schedule
   - Schedulers coordinate via database state (JobRun model)
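
A minimal sketch of how the dynamic pre/post-race trigger times could be derived from a race start time. The offsets (T-60, T-15, T+5) come from the scheduler list above; `planRaceJobs`, `PlannedJob`, and the rule of skipping triggers that have already passed are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical helper: derive trigger times from a race's scheduled start (UTC).
type JobKind = "pre-race" | "post-race";

interface PlannedJob {
  kind: JobKind;
  runAt: Date;
}

const MINUTE_MS = 60_000;
const PRE_RACE_OFFSETS_MIN = [-60, -15]; // T-60 and T-15
const POST_RACE_OFFSET_MIN = 5; // T+5

function planRaceJobs(raceStart: Date, now: Date): PlannedJob[] {
  const at = (offsetMin: number): Date =>
    new Date(raceStart.getTime() + offsetMin * MINUTE_MS);

  const jobs: PlannedJob[] = [
    ...PRE_RACE_OFFSETS_MIN.map((m): PlannedJob => ({ kind: "pre-race", runAt: at(m) })),
    { kind: "post-race", runAt: at(POST_RACE_OFFSET_MIN) },
  ];

  // Assumption: triggers already in the past are dropped rather than run late.
  return jobs.filter((job) => job.runAt.getTime() > now.getTime());
}
```

Because all arithmetic is done on UTC instants, the result does not depend on the server's local timezone.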

### Key Design Decisions

1. **Upsert Pattern Everywhere**: All database operations use upsert to handle re-scraping gracefully
2. **JobRun Tracking**: Every scheduler execution creates a JobRun record for monitoring and debugging
3. **Rate Limiting**: Bottleneck library handles TAB API rate limits (100 req/min default)
4. **Retry Logic**: Exponential backoff with axios-retry for transient failures
5. **Timezone Handling**: All times stored in UTC, converted for display only (uses Luxon)
6. **Change Detection**: Compare previous vs current data, log changes to Scrapes table
7. **Graceful Degradation**: API failures logged but don't crash schedulers
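
To make the retry decision concrete, here is a rough sketch of an exponential backoff schedule using the `retryDelay: 1000` and `maxRetries: 3` values shown in the API client usage in this file. `backoffDelayMs` and the 30-second cap are assumptions; axios-retry ships its own delay helpers (for example `exponentialDelay`), whose exact formula and jitter differ from this sketch.

```typescript
// Illustrative only: not the formula axios-retry uses internally.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number = 1000,
  maxDelayMs: number = 30_000, // assumed cap, not from the project config
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // attempt 1 -> base, attempt 2 -> 2x base, attempt 3 -> 4x base, capped at maxDelayMs
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
}

const delays = [1, 2, 3].map((attempt) => backoffDelayMs(attempt));
console.log(delays); // [ 1000, 2000, 4000 ]
```

In practice a random jitter is usually added on top so that concurrent retries do not hit the API in lockstep.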

### Configuration System

**Zod-validated environment config** (`src/config/index.ts`):
- Database URL, Redis URL, TAB API credentials
- Feature flags for racing types and countries
- Rate limiting and retry parameters
- Scheduler cron expressions and timezone
- Observability settings

All config is validated on startup: the app fails fast with a clear error if it is misconfigured.
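
The fail-fast behaviour can be sketched without any dependencies. This is a hand-rolled stand-in for the Zod schema, not the code in `src/config/index.ts`; `loadConfig` and `AppConfig` are hypothetical names, and only `DATABASE_URL` and `API_RATE_LIMIT_PER_MINUTE` (both mentioned elsewhere in this file) are checked.

```typescript
// Dependency-free stand-in for a Zod-validated config; names are illustrative.
interface AppConfig {
  databaseUrl: string;
  apiRateLimitPerMinute: number;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const problems: string[] = [];

  const databaseUrl = env["DATABASE_URL"] ?? "";
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    problems.push("DATABASE_URL must be a postgres:// or postgresql:// URL");
  }

  // Default of 100 mirrors the conservative rate limit noted in this file.
  const apiRateLimitPerMinute = Number(env["API_RATE_LIMIT_PER_MINUTE"] ?? "100");
  if (!Number.isInteger(apiRateLimitPerMinute) || apiRateLimitPerMinute <= 0) {
    problems.push("API_RATE_LIMIT_PER_MINUTE must be a positive integer");
  }

  if (problems.length > 0) {
    // Report every problem at once so a bad deploy fails with one clear message.
    throw new Error(`Invalid configuration:\n- ${problems.join("\n- ")}`);
  }
  return { databaseUrl, apiRateLimitPerMinute };
}
```

Calling `loadConfig(process.env)` once at startup, before any scheduler is created, is what gives the fail-fast property.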

### Observability Stack

**Three Pillars Implementation**:

1. **Structured Logging (Pino)**
   - Pattern: `logger.info({ contextData }, 'message')`
   - Always include traceId for correlation
   - Log entry/exit of critical operations
   - Located: `src/utils/logger.ts`

2. **Metrics (Prometheus)**
   - Counter: `metrics.counter('name', value, { labels })`
   - Histogram: `metrics.histogram('name', duration, { labels })`
   - Gauge: `metrics.gauge('name', value, { labels })`
   - Exported at: `http://localhost:9090/metrics`
   - Located: `src/metrics/index.ts`

3. **Distributed Tracing (OpenTelemetry)**
   - Auto-instrumentation enabled for HTTP, DB, Redis
   - Manual spans for business operations
   - Traces exported to Jaeger: `http://localhost:16686`

### Testing Philosophy

**TDD Approach - Write tests FIRST**:

```typescript
// Test Pyramid: 60% Unit, 30% Integration, 10% E2E
// Coverage Gates: 85%+ overall, 100% for critical paths

// Unit Test Pattern (mock external dependencies)
describe('ChangeDetectionService', () => {
  it('should detect scratched runner', () => {
    // Arrange, Act, Assert
  });
});

// Integration Test Pattern (use real Prisma + test DB)
describe('MeetingService Integration', () => {
  beforeEach(async () => {
    await prisma.meeting.deleteMany();
  });

  it('should fetch and store meetings', async () => {
    // Test with real database
  });
});
```

**Critical Paths requiring 100% coverage**:
- API client error handling
- Data validation (Zod schemas)
- Change detection logic
- Database upsert operations
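
As an illustration of the kind of logic the change-detection tests would exercise, here is a small sketch that compares two snapshots of a race's runners and reports scratching changes. `RunnerSnapshot`, `RunnerChange`, and `detectScratchingChanges` are hypothetical; the real field names live in `prisma/schema.prisma`.

```typescript
// Hypothetical shapes for illustration; not the project's actual models.
interface RunnerSnapshot {
  runnerId: string;
  name: string;
  scratched: boolean;
}

interface RunnerChange {
  runnerId: string;
  field: "scratched";
  previous: boolean;
  current: boolean;
}

function detectScratchingChanges(
  previous: RunnerSnapshot[],
  current: RunnerSnapshot[],
): RunnerChange[] {
  const previousById = new Map(previous.map((r) => [r.runnerId, r] as const));
  const changes: RunnerChange[] = [];

  for (const runner of current) {
    const before = previousById.get(runner.runnerId);
    // Runners with no earlier snapshot have no baseline, so they are not reported here.
    if (before !== undefined && before.scratched !== runner.scratched) {
      changes.push({
        runnerId: runner.runnerId,
        field: "scratched",
        previous: before.scratched,
        current: runner.scratched,
      });
    }
  }
  return changes;
}
```

Keeping the comparison a pure function of two snapshots makes it straightforward to unit test without mocks.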

### Database Schema Key Points

**Models** (see `prisma/schema.prisma`):
- **Meeting**: Top-level race meeting (venue, date, conditions)
- **Race**: Individual races within a meeting
- **Runner**: Horses/entries in each race
- **Result**: Race outcomes and dividends
- **Scrape**: Audit trail of all scraping operations
- **JobRun**: Scheduler execution tracking

**Relationships**:
- Meeting → Races (1:many)
- Race → Runners (1:many)
- Race → Results (1:many, through Runner)
- Race → Scrapes (1:many) - audit trail

**Important Indexes**:
- `(date, country, category)` on meetings - for daily scrapes
- `startTime` on races - for scheduler queries
- `(raceId, scrapeType, scrapedAt)` on scrapes - for change tracking

### API Client Implementation

**TabApiClient** (`src/api/tab/tab-api-client.ts`):

```typescript
// Features:
// - Rate limiting via Bottleneck (reservoir pattern)
// - Exponential backoff retry via axios-retry
// - Zod schema validation on all responses
// - Full observability (logs, metrics, traces)
// - TypeScript types generated from openapi.json

// Usage:
const client = new TabApiClient({
  baseUrl: config.tabApi.baseUrl,
  rateLimitPerMinute: 100,
  maxRetries: 3,
  retryDelay: 1000,
  timeout: 10000,
});

const meetings = await client.getMeetings(date, country, category);
```
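
The reservoir pattern mentioned in the feature list can be sketched as follows. This is a simplified fixed-window model for intuition only: the `Reservoir` class is hypothetical, takes the clock as a parameter so it can be tested, and does not queue requests the way Bottleneck does (Bottleneck configures the same idea through its `reservoir`, `reservoirRefreshAmount`, and `reservoirRefreshInterval` options).

```typescript
// Minimal fixed-window reservoir; illustrative, not Bottleneck's implementation.
class Reservoir {
  private readonly capacity: number;
  private readonly refreshIntervalMs: number;
  private remaining: number;
  private windowStartMs: number;

  constructor(capacity: number, refreshIntervalMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refreshIntervalMs = refreshIntervalMs;
    this.remaining = capacity;
    this.windowStartMs = nowMs;
  }

  /** Returns true and consumes one slot if a request may be sent at `nowMs`. */
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.refreshIntervalMs) {
      // A full interval has elapsed: refill the reservoir and start a new window.
      this.remaining = this.capacity;
      this.windowStartMs = nowMs;
    }
    if (this.remaining === 0) {
      return false;
    }
    this.remaining -= 1;
    return true;
  }
}
```

With `new Reservoir(100, 60_000, Date.now())`, at most 100 calls succeed per minute-long window, matching the 100 req/min default.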

**Response Validation**:
- All responses validated with Zod schemas (`src/api/tab/schemas.ts`)
- TypeScript types in `src/api/tab/types.ts`
- Throws ValidationError if API response doesn't match schema

### Service Layer Pattern

**MeetingService** (`src/services/meeting-service.ts`) example:

```typescript
class MeetingService {
  constructor(
    private prisma: PrismaClient,
    private apiClient: TabApiClient
  ) {}

  async fetchAndStore(date: Date, country: string): Promise<number> {
    // 1. Fetch from API with error handling
    // 2. Validate with Zod
    // 3. Transform API types to DB types
    // 4. Upsert to database (prevents duplicates)
    // 5. Log and emit metrics
    // 6. Return count of items processed
  }
}
```

**Pattern to follow**:
- Services are thin orchestration layers
- Inject dependencies (Prisma, ApiClient) for testability
- Use upsert for idempotent operations
- Always log with context
- Emit metrics for all operations
- Throw typed errors with context

### Working Context System

**`WORKING_CONTEXT.md`** is the living memory:
- Current phase and task status
- Recent decisions and rationale
- Blockers and questions
- Test coverage and quality metrics
- Infrastructure status
- Next steps for session resumption

**Update after every major task** (30-60 min work blocks)

### Reference Documentation Structure

Detailed specs in `reference-data/claude-reference/`:
- **PROJECT_BRIEF.md**: Complete technical specification (1300+ lines)
- **PATTERNS.md**: Code patterns and examples
- **API_REFERENCE.md**: TAB API endpoint details
- **DOCKER_SETUP.md**: Infrastructure setup

Don't duplicate info from these files - reference them when needed.

## Development Guidelines

### Code Quality Standards

**Non-negotiable gates**:
- ⚠️ NO CODE WITHOUT TESTS (85%+ coverage)
- ⚠️ NO CODE WITHOUT LOGGING (structured with context)
- ⚠️ NO CODE WITHOUT METRICS (async operations must emit metrics)
- ⚠️ All lint and TypeScript errors must be fixed

### Common Patterns to Follow

**1. Structured Logging**:
```typescript
// Pino expects the context object first, then the message
logger.info({
  date: date.toISOString(),
  country,
  traceId,
}, 'Starting operation');
```

**2. Metrics Emission**:
```typescript
metrics.counter('meetings.scraped', 1, { country, category });
metrics.histogram('api.request.duration', durationMs, { endpoint });
```

**3. Error Handling**:
```typescript
try {
  // operation
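} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before use
  const err = error instanceof Error ? error : new Error(String(error));
  // Context object first; Pino's default `err` serializer records message and stack
  logger.error({ err, context }, 'Operation failed');
  throw new ServiceError('Friendly message', { cause: err });
}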
```

**4. Database Upsert**:
```typescript
await prisma.meeting.upsert({
  where: { id: meeting.id },
  update: { /* changeable fields */ },
  create: { /* all fields */ },
});
```
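
Why upsert keeps re-scrapes idempotent can be shown without a database. The sketch below uses an in-memory table keyed by primary key as a stand-in for Prisma; `InMemoryMeetingTable` and the `venue` and `trackCondition` fields are hypothetical, and unlike a real upsert it replaces the whole row rather than only the changeable fields.

```typescript
// In-memory stand-in for a table keyed by primary key; not Prisma.
interface MeetingRow {
  id: string;
  venue: string;
  trackCondition: string;
}

class InMemoryMeetingTable {
  private readonly rows = new Map<string, MeetingRow>();

  /** Inserts the row if the id is new, otherwise overwrites the existing row. */
  upsert(row: MeetingRow): "created" | "updated" {
    const outcome = this.rows.has(row.id) ? "updated" : "created";
    this.rows.set(row.id, { ...row });
    return outcome;
  }

  get(id: string): MeetingRow | undefined {
    return this.rows.get(id);
  }

  count(): number {
    return this.rows.size;
  }
}

const table = new InMemoryMeetingTable();
table.upsert({ id: "m-1", venue: "Randwick", trackCondition: "Good" });
// Re-scraping the same meeting updates the row instead of inserting a second one.
table.upsert({ id: "m-1", venue: "Randwick", trackCondition: "Soft" });
console.log(table.count()); // 1
```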

### Running Tests

```bash
# Run specific test file
npm test -- meeting-service.unit.test

# Run tests matching pattern
npm test -- --testNamePattern="should detect changes"

# Run with coverage for specific file
npm test -- --coverage --collectCoverageFrom="src/services/meeting-service.ts"

# Debug tests with Node inspector
node --inspect-brk node_modules/.bin/jest --runInBand
```

### Docker Services

**Services running locally**:
- PostgreSQL: `localhost:5432`
- Redis: `localhost:6379`
- Jaeger UI: `http://localhost:16686`
- Prometheus: `http://localhost:9091`
- Grafana: `http://localhost:3000` (admin/admin)
- App metrics: `http://localhost:9090/metrics`
- App health: `http://localhost:9090/health`
- Scheduler status: `http://localhost:9090/schedulers`

**Useful Docker commands**:
```bash
# View logs for specific service
docker-compose logs -f postgres

# Restart a service
docker-compose restart app

# Rebuild and restart
docker-compose up -d --build app

# Check resource usage
docker stats
```

### Prisma Tips

```bash
# Generate client after schema changes
npm run prisma:generate

# Create and apply migration
npm run prisma:migrate

# Reset database (DESTROYS ALL DATA)
npx prisma migrate reset

# Open Prisma Studio (DB GUI)
npm run prisma:studio

# Format schema file
npx prisma format

# Validate schema
npx prisma validate
```

## Common Tasks

### Adding a New Scheduler

1. Create in `src/schedulers/[name]-scheduler.ts`
2. Extend `BaseScheduler` class
3. Implement `execute()` method
4. Register it in the `SchedulerManager` initialization (`src/schedulers/scheduler-manager.ts`)
5. Add cron config to `src/schedulers/config.ts`
6. Write tests in `src/schedulers/__tests__/`
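
The steps above might look roughly like this in code. The real `BaseScheduler` lives in `src/schedulers/base-scheduler.ts` and its API is not shown in this file, so `BaseSchedulerSketch`, `JobRunRecord`, and `ExampleScheduler` are hypothetical stand-ins that only illustrate the shape: a subclass implements `execute()`, and the base class records a JobRun and contains failures.

```typescript
// Hypothetical shapes; the real BaseScheduler and JobRun model may differ.
interface JobRunRecord {
  scheduler: string;
  status: "success" | "failed";
  itemsProcessed: number;
  error?: string;
}

abstract class BaseSchedulerSketch {
  readonly schedulerName: string;
  private readonly jobRuns: JobRunRecord[];

  constructor(schedulerName: string, jobRuns: JobRunRecord[]) {
    this.schedulerName = schedulerName;
    this.jobRuns = jobRuns; // stands in for the JobRun table
  }

  /** Subclasses implement the actual work and return a count of items processed. */
  protected abstract execute(): Promise<number>;

  /** Wraps execute() so every run is recorded and failures do not escape. */
  async run(): Promise<JobRunRecord> {
    let record: JobRunRecord;
    try {
      const itemsProcessed = await this.execute();
      record = { scheduler: this.schedulerName, status: "success", itemsProcessed };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      record = { scheduler: this.schedulerName, status: "failed", itemsProcessed: 0, error: message };
    }
    this.jobRuns.push(record);
    return record;
  }
}

class ExampleScheduler extends BaseSchedulerSketch {
  protected async execute(): Promise<number> {
    return 3; // placeholder for real work, e.g. calling a service
  }
}
```

Catching errors in `run()` rather than in each subclass is one way to get the graceful-degradation behaviour described under Key Design Decisions.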

### Adding a New API Endpoint

1. Add TypeScript types to `src/api/tab/types.ts`
2. Create Zod schema in `src/api/tab/schemas.ts`
3. Add method to `TabApiClient` class
4. Write unit tests with mocked axios
5. Write integration tests with real API (optional)

### Adding a New Service

1. Create in `src/services/[name]-service.ts`
2. Inject dependencies (Prisma, ApiClient) in constructor
3. Add structured logging with context
4. Add metrics for key operations
5. Write unit tests with mocked dependencies
6. Write integration tests with real database

### Debugging Common Issues

**Scheduler not running**:
- Check `/schedulers` endpoint for status
- Verify cron expression in config
- Check logs for initialization errors
- Ensure timezone is set correctly

**Database connection failures**:
- Verify `DATABASE_URL` in `.env`
- Check PostgreSQL is running: `docker-compose ps`
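- Test connection: `npx prisma db pull --print` (prints the introspected schema; without `--print`, `db pull` overwrites `prisma/schema.prisma`)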

**API rate limit errors**:
- Check Prometheus metrics: `api_requests_total`
- Adjust `API_RATE_LIMIT_PER_MINUTE` in config
- Review Bottleneck settings in TabApiClient

**Tests failing**:
- Clear Jest cache: `npx jest --clearCache`
- Check test database is clean: see `beforeEach` blocks
- Verify mocks are set up correctly
- Run with `--verbose` for detailed output

## Important Notes

- **Timezones**: All database timestamps are stored in UTC; convert to local time only when reading for display
- **Race IDs**: UUIDs from TAB API, used as primary keys
- **Upsert Strategy**: Prevents duplicates when re-scraping
- **Change Detection**: Scrapes table maintains audit trail
- **Feature Flags**: Enable/disable racing types in config
- **Rate Limits**: TAB API limits unknown, start conservative (100/min)
- **JobRun Tracking**: All scheduler executions logged for monitoring
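
A small sketch of the "store UTC, convert on read" rule. The project uses Luxon for this; the example below swaps in the built-in `Intl.DateTimeFormat` so that it has no dependencies, and `formatLocalTime` is a hypothetical helper name.

```typescript
// The stored value is a UTC instant; the zone is applied only when formatting.
function formatLocalTime(utc: Date, timeZone: string): string {
  return new Intl.DateTimeFormat("en-AU", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).format(utc);
}

const storedStart = new Date("2024-01-15T03:30:00Z"); // as stored in the database
const sydneyTime = formatLocalTime(storedStart, "Australia/Sydney"); // 14:30 local (AEDT, UTC+11)
const aucklandTime = formatLocalTime(storedStart, "Pacific/Auckland"); // 16:30 local (NZDT, UTC+13)
console.log({ sydneyTime, aucklandTime });
```

The same stored instant renders differently per venue, which is why comparisons and scheduling should always use the UTC value.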

## Session Workflow

**When starting work**:
1. Read `WORKING_CONTEXT.md` for current state
2. Check git status for recent changes
3. Run `npm test` to verify current state
4. Review relevant docs in `reference-data/claude-reference/`
5. Continue from documented next steps

**During work**:
- Update `WORKING_CONTEXT.md` after major tasks
- Document decisions and reasoning
- Note blockers or questions

**Before ending session**:
- Update `WORKING_CONTEXT.md` with current status
- Document what's complete, what's in progress
- List clear next steps for resumption
- Commit changes to git

## Quick Reference

**Project structure**:
```
src/
├── api/tab/           # TAB API client with retry & rate limiting
├── services/          # Business logic (MeetingService, etc.)
├── schedulers/        # Cron jobs (Morning, PreRace, PostRace)
├── config/            # Zod-validated environment config
├── utils/             # Logger, helpers
├── metrics/           # Prometheus metrics
└── index.ts           # Application entry point

prisma/
└── schema.prisma      # Database models and relations

reference-data/
└── claude-reference/  # Detailed technical docs
```

**Environment variables** (see `.env.example`):
- Database and Redis URLs
- TAB API credentials and base URL
- Scheduler cron expressions and timezone
- Rate limiting parameters
- Feature flags
- Observability settings

---

**For comprehensive technical details**, refer to `reference-data/claude-reference/PROJECT_BRIEF.md` and `PATTERNS.md`.

**For current work status**, always check `WORKING_CONTEXT.md` first.
