# TipSharks Operations Guide

## Quick Start

### Initial Setup

1. **Clone repository**
```bash
git clone <repository-url>
cd tipsharks
```

2. **Configure environment**
```bash
cp .env.example .env
# Edit .env with your admin token and scraper defaults
```

3. **Start services**
```bash
docker compose up -d
```

4. **Run migrations**
```bash
docker compose run --rm worker alembic upgrade head
```

5. **Verify setup**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli info
```

---

## Data Ingestion

### Backfill Historical Data

**Single month:**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli ingest \
  --from 2024-01-01 \
  --to 2024-01-31
```

**Full year:**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli ingest \
  --from 2024-01-01 \
  --to 2024-12-31
```

**Single date:**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli ingest \
  --date 2024-01-15
```
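A long backfill can also be driven month by month, so that a failure only requires re-running the affected month. The helper below is a minimal sketch, not part of the TipSharks CLI: `month_ranges` is a hypothetical function name, and it assumes GNU `date` is available on the host.

```bash
# Print one "from to" pair per calendar month within [start, end].
# Hypothetical helper; requires GNU date for the -d option.
month_ranges() {
  local start="$1" end="$2" month_end
  # Dates are compared as YYYYMMDD integers
  while [ "$(date -u -d "$start" +%Y%m%d)" -le "$(date -u -d "$end" +%Y%m%d)" ]; do
    # Last day of the month that contains $start
    month_end="$(date -u -d "$(date -u -d "$start" +%Y-%m-01) +1 month -1 day" +%Y-%m-%d)"
    # Clamp the final chunk to the requested end date
    if [ "$(date -u -d "$month_end" +%Y%m%d)" -gt "$(date -u -d "$end" +%Y%m%d)" ]; then
      month_end="$end"
    fi
    echo "$start $month_end"
    start="$(date -u -d "$month_end +1 day" +%Y-%m-%d)"
  done
}

# Usage (commented out so the function can be sourced safely):
# month_ranges 2024-01-01 2024-12-31 | while read -r from to; do
#   docker compose run --rm -T worker python -m apps.backend.worker.cli ingest \
#     --from "$from" --to "$to" < /dev/null
# done
```

The `< /dev/null` redirect in the usage loop keeps `docker compose run` from consuming the date pairs that the loop reads from standard input.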

### HRNZ InfoHorse Scrape (Webhook)

The webhook generates the list of results URLs and scrapes them with Playwright. Set `club_codes` to `"all"` or to a list of club codes.

```bash
curl -X POST http://localhost:8000/webhook/scrape \
  -H "Authorization: Bearer <API_ADMIN_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "club_codes": "all",
    "date_from": "2024-01-01",
    "date_to": "2024-12-31",
    "recompute": true
  }'
```

Duplicates are handled by database upserts on meetings, races, starters, and entities.
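
To scrape specific clubs rather than `all`, the request body can be built from a list of codes. This is a sketch under assumptions: it supposes the list form of `club_codes` is a JSON array of strings, `scrape_payload` is a hypothetical helper, and the club codes in the usage comment are placeholders, not real HRNZ codes.

```bash
# Build a /webhook/scrape request body for specific clubs.
# Assumption: a list of club codes is sent as a JSON array of strings.
# Arguments: date_from, date_to, then one or more club codes.
scrape_payload() {
  local date_from="$1" date_to="$2" codes="" code
  shift 2
  for code in "$@"; do
    # Append each code as a quoted JSON string, comma-separated
    codes="${codes}${codes:+, }\"${code}\""
  done
  printf '{"club_codes": [%s], "date_from": "%s", "date_to": "%s", "recompute": true}\n' \
    "$codes" "$date_from" "$date_to"
}

# Usage (placeholder club codes):
# curl -X POST http://localhost:8000/webhook/scrape \
#   -H "Authorization: Bearer <API_ADMIN_TOKEN>" \
#   -H "Content-Type: application/json" \
#   -d "$(scrape_payload 2024-01-01 2024-01-31 CODE1 CODE2)"
```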

### Incremental Daily Updates

Add to crontab or scheduler:

```bash
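# Run daily at 2 AM. A crontab entry must be a single line, and % must be escaped as \%.
# -T disables TTY allocation, which cron does not provide.
0 2 * * * docker compose -f /path/to/docker-compose.yml run --rm -T worker python -m apps.backend.worker.cli ingest --date $(date +\%Y-\%m-\%d)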
```

### Checking Ingestion Status

**Count records:**
```bash
docker compose exec db psql -U tipsharks -c "
  SELECT
    (SELECT COUNT(*) FROM meetings) AS meetings,
    (SELECT COUNT(*) FROM races) AS races,
    (SELECT COUNT(*) FROM starters) AS starters;
"
```

**Recent meetings:**
```bash
docker compose exec db psql -U tipsharks -c "
  SELECT meeting_date, venue, id
  FROM meetings
  ORDER BY meeting_date DESC
  LIMIT 10;
"
```

---

## Rating Computation

### Full Recompute

**From scratch (clears existing ratings):**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli recompute \
  --from 2020-01-01 \
  --to 2024-12-31 \
  --clear
```

**Incremental (preserves existing):**
```bash
docker compose run --rm worker python -m apps.backend.worker.cli recompute \
  --from 2024-01-01 \
  --to 2024-01-31
```

### Scheduled Recompute

**Daily recompute of last 7 days:**
```bash
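# Run daily at 3 AM (after ingestion). Single line with % escaped as \%, as crontab requires.
# date -d is GNU date syntax; -T disables TTY allocation, which cron does not provide.
0 3 * * * docker compose -f /path/to/docker-compose.yml run --rm -T worker python -m apps.backend.worker.cli recompute --from $(date -d '7 days ago' +\%Y-\%m-\%d) --to $(date +\%Y-\%m-\%d)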
```

### Verify Determinism

Run recompute twice and compare:

```bash
# First run
docker compose run --rm worker python -m apps.backend.worker.cli recompute \
  --from 2024-01-01 --to 2024-01-31 --clear

# Export ratings to the host (COPY ... TO STDOUT avoids writing a file inside the db container)
docker compose exec -T db psql -U tipsharks -c "
  COPY (
    SELECT entity_type, entity_id, rating
    FROM rating_snapshots
    ORDER BY entity_type, entity_id, as_of_race_id
  ) TO STDOUT CSV;
" > /tmp/ratings_run1.csv

# Second run
docker compose run --rm worker python -m apps.backend.worker.cli recompute \
  --from 2024-01-01 --to 2024-01-31 --clear

# Export again
docker compose exec -T db psql -U tipsharks -c "
  COPY (
    SELECT entity_type, entity_id, rating
    FROM rating_snapshots
    ORDER BY entity_type, entity_id, as_of_race_id
  ) TO STDOUT CSV;
" > /tmp/ratings_run2.csv

# Compare
diff /tmp/ratings_run1.csv /tmp/ratings_run2.csv
# Should output nothing (identical files)
```
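
For scheduled checks, the comparison can be wrapped so that it prints a verdict and returns a non-zero exit code on mismatch. This is a minimal sketch; `compare_exports` is a hypothetical helper name, and it only assumes the two CSV exports exist on the host.

```bash
# Compare two exported rating files.
# Prints a verdict; on mismatch, shows the first differing lines and returns 1.
compare_exports() {
  if cmp -s "$1" "$2"; then
    echo "identical ($(wc -l < "$1" | tr -d ' ') rows)"
  else
    echo "MISMATCH between $1 and $2"
    diff "$1" "$2" | head -n 20 || true
    return 1
  fi
}

# Usage:
# compare_exports /tmp/ratings_run1.csv /tmp/ratings_run2.csv
```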

---

## Evaluation

### Generate Evaluation Report

```bash
docker compose run --rm worker python scripts/evaluate.py \
  --from 2024-01-01 \
  --to 2024-12-31 \
  --out reports/eval_2024.json
```

**Outputs:**
- `reports/eval_2024.json` - Detailed metrics
- `reports/eval_2024.md` - Human-readable summary

### View Results

```bash
cat reports/eval_2024.md
```

### Monitoring Metrics Over Time

Run evaluation monthly and track:
- Winner accuracy trend
- Calibration error
- Sample sizes
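
A monthly run needs the first and last day of the month. The sketch below prints the `evaluate.py` arguments for one month; `eval_month_args` is a hypothetical helper, the `reports/eval_YYYY-MM.json` naming is an assumption, and GNU `date` is required.

```bash
# Print evaluate.py arguments covering one calendar month, given as YYYY-MM.
# Assumption: per-month reports are written to reports/eval_YYYY-MM.json.
eval_month_args() {
  local month="$1" last_day
  # Last day of the month, computed with GNU date
  last_day="$(date -u -d "${month}-01 +1 month -1 day" +%Y-%m-%d)"
  echo "--from ${month}-01 --to ${last_day} --out reports/eval_${month}.json"
}

# Usage (left unquoted on purpose so the arguments are split into words):
# docker compose run --rm worker python scripts/evaluate.py $(eval_month_args 2024-02)
```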

---

## API Operations

### Start API

```bash
docker compose up -d api
```

API available at: http://localhost:8000

### Health Check

```bash
curl http://localhost:8000/health
```

Expected:
```json
{
  "status": "healthy",
  "version": "0.1.0"
}
```
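
After a start or restart, the API may take a few seconds to respond. A polling loop along these lines can wait for the expected response; it is a sketch that assumes `curl` is installed, and both function names are hypothetical.

```bash
# Succeed if a /health response body (read from stdin) reports "status": "healthy".
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Poll the health endpoint until it reports healthy or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 2).
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS --max-time 5 "$url" 2>/dev/null | is_healthy; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}

# Usage:
# docker compose up -d api && wait_for_health http://localhost:8000/health
```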

### Query Top Horses

```bash
curl "http://localhost:8000/ratings/horses?limit=10"
```

### Query Horse Details

```bash
curl http://localhost:8000/ratings/horses/12345
```

### Trigger Ingestion (Admin)

```bash
curl -X POST http://localhost:8000/admin/ingest \
  -H "Authorization: Bearer <API_ADMIN_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "date_from": "2024-01-01",
    "date_to": "2024-01-07"
  }'
```

### Trigger Recompute (Admin)

```bash
curl -X POST http://localhost:8000/admin/recompute \
  -H "Authorization: Bearer <API_ADMIN_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "date_from": "2024-01-01",
    "date_to": "2024-01-31",
    "clear_existing": false
  }'
```

---

## Database Management

### Backup Database

**Full backup:**
```bash
docker compose exec -T db pg_dump -U tipsharks tipsharks \
  | gzip > backups/tipsharks_$(date +%Y%m%d).sql.gz
```

**Schema only:**
```bash
docker compose exec -T db pg_dump -U tipsharks --schema-only tipsharks \
  > backups/schema_$(date +%Y%m%d).sql
```
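
A compressed backup can be given a basic integrity check before it is relied on. The sketch below uses `gzip -t`, which validates the archive itself but does not prove that the SQL inside restores cleanly; `verify_backup` is a hypothetical helper name.

```bash
# Succeed only if the file exists, is non-empty, and passes gzip's integrity test.
verify_backup() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip archive: $file" >&2
    return 1
  fi
  echo "ok: $file"
}

# Usage:
# verify_backup "backups/tipsharks_$(date +%Y%m%d).sql.gz"
```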

### Restore Database

```bash
# Stop API
docker compose stop api

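# Drop and recreate database. Connect to the default "postgres" maintenance database,
# because a database cannot be dropped while you are connected to it.
docker compose exec db psql -U tipsharks -d postgres -c "DROP DATABASE IF EXISTS tipsharks;"
docker compose exec db psql -U tipsharks -d postgres -c "CREATE DATABASE tipsharks;"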

# Restore from backup
gunzip -c backups/tipsharks_20240115.sql.gz | \
  docker compose exec -T db psql -U tipsharks tipsharks

# Run migrations (if schema changed)
docker compose run --rm worker alembic upgrade head

# Restart API
docker compose up -d api
```

### Vacuum and Analyze

**Regular maintenance:**
```bash
docker compose exec db psql -U tipsharks -c "VACUUM ANALYZE;"
```

**Full vacuum (requires downtime):**
```bash
docker compose stop api worker
docker compose exec db psql -U tipsharks -c "VACUUM FULL ANALYZE;"
docker compose up -d api worker
```

---

## Migrations

### Create New Migration

```bash
docker compose run --rm worker alembic revision --autogenerate \
  -m "Add new column"
```

### Apply Migrations

```bash
docker compose run --rm worker alembic upgrade head
```

### Rollback Migration

```bash
# Rollback one version
docker compose run --rm worker alembic downgrade -1

# Rollback to specific version
docker compose run --rm worker alembic downgrade abc123
```

### View Migration History

```bash
docker compose run --rm worker alembic history
```

---

## Monitoring and Logs

### View Logs

**API logs:**
```bash
docker compose logs -f api
```

**Worker logs:**
```bash
docker compose logs worker
```

**Database logs:**
```bash
docker compose logs db
```

**All logs:**
```bash
docker compose logs -f
```

### Log Filtering

**Errors only:**
```bash
docker compose logs api | grep ERROR
```

**Specific meeting:**
```bash
docker compose logs api | grep "meeting_id=12345"
```

### Disk Usage

```bash
# Database size
docker compose exec db psql -U tipsharks -c "
  SELECT pg_size_pretty(pg_database_size('tipsharks'));
"

# Table sizes
docker compose exec db psql -U tipsharks -c "
  SELECT
    tablename,
    pg_size_pretty(pg_total_relation_size(quote_ident(tablename))) AS total_size
  FROM pg_tables
  WHERE schemaname = 'public'
  ORDER BY pg_total_relation_size(quote_ident(tablename)) DESC;
"
```
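
The queries above report database size. Host filesystem usage can be checked alongside them with a small threshold check; this is a sketch in which both function names are hypothetical and the 80 percent default is an arbitrary example value.

```bash
# Print the percentage of used space on the filesystem holding the given path.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report usage and return 1 when it reaches the threshold.
# Arguments: path (default /), threshold percent (default 80, an example value).
check_disk() {
  local path="${1:-/}" threshold="${2:-80}" used
  used="$(disk_used_pct "$path")"
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $path is ${used}% full"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Usage:
# check_disk / 80
```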

---

## Performance Tuning

### Database Connection Pool

Edit `.env`:
```bash
DATABASE_POOL_SIZE=10
DATABASE_MAX_OVERFLOW=20
```
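
Assuming these settings map onto SQLAlchemy's `pool_size` and `max_overflow` (the guide does not confirm this), each process can open at most pool size plus max overflow connections. The hypothetical helper below works through that arithmetic so the total can be compared with the PostgreSQL `max_connections` setting, which defaults to 100.

```bash
# Upper bound on open connections:
# (pool size + max overflow) per process, multiplied by the number of processes.
# Arguments: pool size, max overflow, process count (default 1).
max_db_connections() {
  local pool_size="$1" max_overflow="$2" processes="${3:-1}"
  echo $(( (pool_size + max_overflow) * processes ))
}

# Usage: one API process and two workers with the values above
# max_db_connections 10 20 3
```

With the values shown, three processes could open up to 90 connections, which leaves little headroom under the default limit.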

### Increase Worker Throughput

Run multiple ingestion workers:
```bash
# Terminal 1
docker compose run --rm worker python -m apps.backend.worker.cli ingest \
  --from 2024-01-01 --to 2024-01-15

# Terminal 2
docker compose run --rm worker python -m apps.backend.worker.cli ingest \
  --from 2024-01-16 --to 2024-01-31
```

### Optimize Recompute

**Batch size tuning** (edit `packages/ratings/recompute.py`):
```python
# Commit every N races
if idx % 100 == 0:  # Increase from 50 to 100
    session.commit()
```

---

## Troubleshooting

### HRNZ Scraper Issues

**Check webhook logs:**
```bash
docker compose logs --tail 200 api
```

**Validate a results URL:**
```bash
curl -I https://infohorse.hrnz.co.nz/datahrs/results/010131rs.htm
```

### Database Connection Errors

**Check database is running:**
```bash
docker compose ps db
```

**Test connection:**
```bash
docker compose exec db psql -U tipsharks -c "SELECT 1;"
```

**Restart database:**
```bash
docker compose restart db
```

### Ratings Not Appearing

**Check ingestion completed:**
```bash
docker compose exec db psql -U tipsharks -c "
  SELECT
    (SELECT COUNT(*) FROM meetings) AS meetings,
    (SELECT COUNT(*) FROM races) AS races,
    (SELECT COUNT(*) FROM starters) AS starters;
"
```

**Check recompute ran:**
```bash
docker compose exec db psql -U tipsharks -c "
  SELECT COUNT(*) FROM rating_snapshots;
"
```

**View rating distribution:**
```bash
docker compose exec db psql -U tipsharks -c "
  SELECT
    entity_type,
    COUNT(*) AS count,
    AVG(rating)::INT AS avg_rating,
    MIN(rating)::INT AS min_rating,
    MAX(rating)::INT AS max_rating
  FROM (
    SELECT DISTINCT ON (entity_type, entity_id)
      entity_type, entity_id, rating
    FROM rating_snapshots
    ORDER BY entity_type, entity_id, as_of_race_id DESC
  ) latest
  GROUP BY entity_type;
"
```

### API Returns 500 Errors

**Check logs:**
```bash
docker compose logs api | tail -100
```

**Restart API:**
```bash
docker compose restart api
```

**Test health endpoint:**
```bash
curl http://localhost:8000/health
```

### Out of Disk Space

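**Clean unused Docker data:**
```bash
# WARNING: removes all stopped containers, unused images, and unused volumes.
# Keep the db container running and take a backup first, or its data volume may be deleted.
docker system prune -a --volumes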
```

**Delete old data:**
```bash
# Permanently delete meetings before 2020. This does not archive them, so back up first.
docker compose exec db psql -U tipsharks -c "
  DELETE FROM meetings WHERE meeting_date < '2020-01-01';
"
```

---

## Production Deployment

### Environment Setup

1. Use managed PostgreSQL (RDS, Cloud SQL, etc.)
2. Set strong `API_ADMIN_TOKEN`
3. Configure backups (automated snapshots)
4. Set up monitoring (Datadog, Prometheus, etc.)
5. Configure log aggregation (CloudWatch, Stackdriver, etc.)

### Security Checklist

- [ ] Change default database password
- [ ] Generate strong admin token
- [ ] Enable SSL for database connections
- [ ] Restrict database network access (VPC/firewall)
- [ ] Use secrets manager for credentials
- [ ] Enable HTTPS for API
- [ ] Set up rate limiting on API
- [ ] Regular security updates

### Scaling Considerations

**Horizontal:**
- Run multiple API instances behind load balancer
- Use read replicas for query endpoints

**Vertical:**
- Increase database instance size
- Add more CPU/memory to worker container

### Monitoring Metrics

Track these metrics:
- API request latency (p50, p95, p99)
- API error rate
- Ingestion success rate
- Recompute duration
- Database query time
- Rating snapshot count
- Active horses/drivers/trainers

---

## Maintenance Schedule

### Daily
- Ingest previous day's races (automated)
- Incremental recompute (automated)
- Check logs for errors

### Weekly
- Review evaluation metrics
- Check disk usage
- Verify backup integrity

### Monthly
- Full database backup (off-site)
- Vacuum database
- Review and archive old logs
- Performance audit

### Quarterly
- Full recompute from scratch (validate determinism)
- Review and adjust rating parameters
- Update dependencies
- Security audit
