# Historical NZ Harness Racing Data Sources - Analysis

**Date**: 2026-01-07
**Purpose**: Evaluate options for obtaining historical harness racing data for TipSharks

---

## Current Situation

### TAB Affiliates API Limitations
- ✅ **Working**: Individual event access via `/racing/events/{event_id}`
- ✅ **Working**: `/racing/list` endpoint exists
- ❌ **Limitation**: No historical race listings - designed for live/upcoming races only
- ❌ **Impact**: Cannot build historical database by querying past dates

**Tested with**:
- `/racing/meetings?date_from=2026-01-05&date_to=2026-01-05` → 0 results
- `/racing/list?date_from=2026-01-05&date_to=2026-01-05&meet_types=H` → 0 results
- `/racing/events/e8213fb0-289c-4967-a81f-1efe05895284` → ✅ Full event data
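
The probes above can be reproduced with a small script. The sketch below only builds the request URLs. `BASE_URL` is a placeholder assumption, since the real affiliates host is not stated in this document; the paths and query parameters are copied from the tests listed above.

```python
from urllib.parse import urlencode

# Placeholder (assumption): the real TAB Affiliates API host is not given here.
BASE_URL = "https://tab-affiliates.example.invalid"


def build_probe_url(path: str, **params: str) -> str:
    """Return BASE_URL + path, with params appended as a query string."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


# The three probes listed above, for a single past date.
PROBES = [
    build_probe_url("/racing/meetings", date_from="2026-01-05", date_to="2026-01-05"),
    build_probe_url("/racing/list", date_from="2026-01-05", date_to="2026-01-05", meet_types="H"),
    build_probe_url("/racing/events/e8213fb0-289c-4967-a81f-1efe05895284"),
]

if __name__ == "__main__":
    for url in PROBES:
        print(url)
```

Each URL can then be fetched with whichever HTTP client and credentials the existing TAB integration already uses.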

---

## Alternative Data Sources Evaluated

### 1. The Racing API (theracingapi.com)

**Coverage Analysis**:
- **Geographic**: 46 countries including limited NZ coverage
- **NZ Data**: Only 805 all-time records, 212 in last 10 years, 2 in 2026
- **Focus**: Horse racing (thoroughbred) only
- **Historical**: Data from 2017-2026 available

**Verdict**: ❌ **NOT SUITABLE**
- Focus on thoroughbred horse racing, not harness racing
- Very limited NZ coverage (805 total records vs. thousands needed)
- No explicit harness racing support mentioned
- Unlikely to have the comprehensive NZ harness data required

**Sources**:
- [Data Coverage](https://www.theracingapi.com/data-coverage)
- [Main Site](https://www.theracingapi.com/)

---

### 2. Harness Racing New Zealand (HRNZ)

**Known Resources**:
- **Official Website**: https://www.hrnz.co.nz/
- **Results Archive**: https://infohorse.hrnz.co.nz/datahrs/results/results.htm
- **Public Data**: Race fields, form, results, replays, premiership tables

**API Status**: ❌ **NOT AVAILABLE (January 2026)**
- HRNZ is **not currently issuing API keys**
- No timeline for when API access will be available
- Public results archive remains accessible via web interface
- May revisit API access in future

**Data Quality**: ⭐⭐⭐⭐⭐ **EXCELLENT**
- Official governing body for NZ harness racing
- Most authoritative and complete source
- Same data that TAB likely uses

**Access Method**: 🕷️ **WEB SCRAPING REQUIRED**
- API keys not being issued (confirmed January 2026)
- Web scraping of public results archive is necessary
- TipSharks HRNZ scraper implemented for this purpose

**Sources**:
- [HRNZ Official](https://www.hrnz.co.nz/)
- [HRNZ Results Index](https://infohorse.hrnz.co.nz/datahrs/results/results.htm)

**Note**: As of January 2026, HRNZ has confirmed they are not issuing API keys until further notice. The web scraper is currently the only viable method for obtaining historical data.

---

### 3. Third-Party Aggregators

**Racing and Sports** (racingandsports.com.au):
- Provides NZ harness racing results
- Likely has historical data
- Unknown if API access available
- [NZ Harness Results](https://www.racingandsports.com.au/harness-racing-results/new-zealand)

**Sky Racing World**:
- Harness racing results including NZ
- Unknown API availability
- [Harness Results](https://www.skyracingworld.com/harness-racing-results)

**Harnesslink**:
- NZ harness racing news and data
- [NZ Section](https://harnesslink.com/category/new-zealand/)

---

## Recommendations

### Short-term Solutions (Immediate)

#### Option 1: Continue with Mock Data ✅ **CURRENT APPROACH**
**Pros**:
- Already implemented and working
- Perfect for development and testing
- Demonstrates all application capabilities
- No external dependencies

**Cons**:
- Not real data
- Cannot validate rating algorithm accuracy

**Use Case**: Development, testing, demonstrations

---

#### Option 2: Run TAB API Continuously 🔄 **LIVE CAPTURE**
**Implementation**:
```bash
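# Cron job to capture live races every 30 minutes.
# "%" is special in crontab entries and must be escaped as "\%".
# The job must run from the project directory so docker compose finds its compose file.
*/30 * * * * docker compose run --rm worker python -m apps.backend.worker.cli ingest --date $(date +\%Y-\%m-\%d)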
```

**Pros**:
- Uses existing TAB API integration
- Builds genuine historical database over time
- Free and publicly accessible

**Cons**:
- Requires continuous operation
- Takes time to build historical dataset
- Misses races if system is down

**Use Case**: Production deployment for building future database

---

### Long-term Solutions (1-2 weeks effort)

#### Option 3: Contact HRNZ for Data Access 🎯 **RECOMMENDED**
**Action Plan**:
1. Email HRNZ technical/data department
2. Request API access or bulk historical data export
3. Inquire about:
   - Data licensing terms
   - Available date ranges
   - Data format (JSON, CSV, database dump)
   - Cost (if any)
   - Update frequency

**Pros**:
- Most authoritative source
- Likely most complete historical data
- Could get years of historical races
- Proper licensing for commercial use

**Cons**:
- May have costs
- May require approval process
- Unknown turnaround time

**Contact**:
- Website: https://www.hrnz.co.nz/
- Look for "Contact" or "Data Services" section

---

#### Option 4: Web Scraping HRNZ Results Archive ⚠️ **BACKUP OPTION**
**Source**: https://infohorse.hrnz.co.nz/datahrs/results/results.htm

**Implementation Approach**:
1. Scrape historical results pages
2. Parse HTML tables for race data
3. Map to TipSharks data model
4. Populate database
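
Step 2 (parsing HTML tables) can be sketched with the standard library alone. This is a minimal illustration, not the TipSharks scraper: the sample markup and column names are invented, and the real HRNZ pages may use a different structure that needs its own handling.

```python
from html.parser import HTMLParser
from typing import Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []
        self._row: Optional[list] = None
        self._cell: Optional[list] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_results_table(html: str) -> list:
    """Treat the first row as headers and return the other rows as dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample markup (assumption), not copied from the HRNZ archive.
SAMPLE = """
<table>
  <tr><th>Place</th><th>Horse</th><th>Driver</th></tr>
  <tr><td>1</td><td>Example Horse A</td><td>Example Driver X</td></tr>
  <tr><td>2</td><td>Example Horse B</td><td>Example Driver Y</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in parse_results_table(SAMPLE):
        print(record)
```

Keeping the parser separate from the mapping step (step 3) means a page-layout change only touches this one function.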

**Pros**:
- Data is publicly accessible
- Can obtain significant historical data
- One-time bulk import

**Cons**:
- Legal/ethical considerations (check ToS)
- Brittle (breaks if page structure changes)
- May miss some data fields
- Rate limiting required

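**Note**: Pursue only where HRNZ offers no official data access. As of January 2026 HRNZ is not issuing API keys, so scraping is the working method for historical backfill.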
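
The rate-limiting requirement listed under Cons can be met with a small throttle that enforces a minimum gap between requests. The class below is a hypothetical sketch; the name `Throttle` and any interval chosen are assumptions, and a suitable delay should be checked against the site's terms.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        # Record the moment the request is allowed to proceed.
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each page fetch keeps the scraper at or below one request per `min_interval` seconds, however fast parsing runs.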

---

## Hybrid Approach (REQUIRED - HRNZ API Unavailable)

**Status**: HRNZ API keys not being issued (January 2026)

**Solution**: Combine web scraping (historical) + TAB live capture (ongoing)

### Phase 1: Historical Backfill (Immediate)
🕷️ **HRNZ Web Scraper** - Import historical data
- Scrape HRNZ results archive for past races
- Start with recent history (last 1-2 years)
- Gradually expand to older data as needed
- Implement incremental updates to avoid re-scraping

### Phase 2: Live Data Capture (Parallel)
🔄 **TAB API Continuous Capture** - Build database going forward
- Deploy continuous TAB API ingestion
- Capture all races as they occur
- Run every 30 minutes to catch live races
- Creates "live edge" of database

### Phase 3: Reconciliation System (Critical)
🔄 **Deduplication & Merging** - Avoid duplicate records
- Implement meeting reconciliation logic
- Match HRNZ scraped data with TAB live data
- Prefer TAB data when both sources have same race
- Track data source for each record
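
The reconciliation rule can be sketched as follows. The match key (date, venue, race number) and the `RaceRecord` shape are assumptions for illustration; the actual TipSharks model may match on other identifiers.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RaceRecord:
    race_date: date
    venue: str
    race_number: int
    source: str                      # "TAB" or "HRNZ"; kept on every record
    payload: dict = field(default_factory=dict)

    @property
    def key(self) -> tuple:
        # Normalise the venue so " Addington" and "addington" match.
        return (self.race_date, self.venue.strip().lower(), self.race_number)


# Lower number wins when both sources describe the same race.
SOURCE_PRIORITY = {"TAB": 0, "HRNZ": 1}


def reconcile(records: list) -> list:
    """Keep one record per race key, preferring the higher-priority source."""
    best: dict = {}
    for rec in records:
        current = best.get(rec.key)
        if current is None or SOURCE_PRIORITY[rec.source] < SOURCE_PRIORITY[current.source]:
            best[rec.key] = rec
    return sorted(best.values(), key=lambda r: r.key)


if __name__ == "__main__":
    day = date(2026, 1, 5)
    merged = reconcile([
        RaceRecord(day, "Addington", 1, "HRNZ"),
        RaceRecord(day, " addington", 1, "TAB"),
        RaceRecord(day, "Addington", 2, "HRNZ"),
    ])
    for rec in merged:
        print(rec.race_number, rec.source)
```

Because `source` stays on the surviving record, a later pass can still tell which races exist only in scraped form and may be worth re-checking.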

### Phase 4: Incremental Updates (Ongoing)
♻️ **Smart Scraping** - Minimize unnecessary requests
- Track which dates/meetings already scraped
- Only scrape gaps in database
- Periodic re-scraping for data corrections
- Auto-discovery of new meetings on HRNZ index
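
Gap detection for Phase 4 reduces to a set difference over dates. The helper below is a hypothetical sketch; where the set of already-scraped dates is stored (database table, file) is left open.

```python
from datetime import date, timedelta


def missing_dates(start: date, end: date, scraped: set) -> list:
    """Return every date in [start, end] that is not in the scraped set."""
    span = (end - start).days
    candidates = (start + timedelta(days=offset) for offset in range(span + 1))
    return [day for day in candidates if day not in scraped]


if __name__ == "__main__":
    done = {date(2026, 1, 1), date(2026, 1, 3)}
    for day in missing_dates(date(2026, 1, 1), date(2026, 1, 4), done):
        print(day.isoformat())
```

Dates with no meetings should also be recorded as scraped once checked; otherwise they reappear as gaps on every run.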

---

## Technical Implementation Notes

### Live Capture System
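```python
# apps/backend/worker/cli.py enhancement
import asyncio
import logging
import time
from datetime import date

import click

# ingest_date_range is the worker's existing async ingestion coroutine,
# defined elsewhere in the worker package.
logger = logging.getLogger(__name__)


@click.command()
@click.option('--continuous', is_flag=True, help='Run continuously every 30 minutes')
def ingest(continuous: bool):
    """Ingest race data with optional continuous mode."""
    if continuous:
        while True:
            # Recompute the date each cycle so the loop rolls over at midnight
            today = date.today().isoformat()
            try:
                # Ingest today's races
                asyncio.run(ingest_date_range(today, today))
            except Exception:
                # Log and keep looping; one failed cycle should not stop capture
                logger.exception("Ingestion failed for %s", today)

            # Sleep for 30 minutes
            time.sleep(1800)
    else:
        # Existing single-run logic
        pass
```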

### Docker Deployment
```yaml
# docker-compose.yml addition
services:
  ingestion-worker:
    build:
      context: .
      dockerfile: docker/Dockerfile.worker
    command: python -m apps.backend.worker.cli ingest --continuous
    restart: unless-stopped
    environment:
      - TAB_MOCK_MODE=false
    depends_on:
      - db
```

### HRNZ Data Import Script
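```python
# infrastructure/scripts/import_hrnz_bulk.py
from datetime import date

import pandas as pd

# map_hrnz_meeting, MeetingRepository and recompute_ratings are TipSharks
# project helpers, imported from the application package.


def import_hrnz_csv(csv_path: str, session, from_date: date, to_date: date) -> None:
    """Import bulk historical data from an HRNZ CSV export, if one becomes available."""
    df = pd.read_csv(csv_path)

    # Map HRNZ fields to TipSharks model
    for _, row in df.iterrows():
        meeting = map_hrnz_meeting(row)
        MeetingRepository.upsert(session, meeting)

    # Compute ratings for the date range covered by the import
    recompute_ratings(from_date, to_date)
```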

---

## Next Steps

### Immediate Actions
1. ✅ Application rebuild complete
2. 🔄 Deploy continuous live capture if desired
3. 📧 Draft email to HRNZ requesting data access

### HRNZ Contact Email Template
```
Subject: Request for Historical Harness Racing Data Access

Dear HRNZ Team,

I am developing TipSharks, a harness racing ratings and predictions
platform for New Zealand harness racing. The system computes advanced
multi-runner Elo-style ratings for horses, drivers, and trainers.

I am reaching out to inquire about obtaining historical race data to
build a comprehensive database for rating computation:

1. Does HRNZ provide API access to historical race results?
2. Is bulk historical data export available (JSON/CSV/database)?
3. What date ranges are available?
4. What are the licensing terms and costs (if any)?
5. What data fields are included (starters, results, times, etc.)?

I am currently using the TAB Affiliates API for live races, but it
does not provide historical listings. I would greatly appreciate any
guidance on accessing HRNZ's authoritative historical data.

Thank you for your consideration.

Best regards,
[Your Name]
[Contact Information]
```

---

## Conclusion

**Best Path Forward**:

1. **Development**: Continue using mock data ✅
2. **Production**: Deploy continuous TAB API capture 🔄
3. **Historical Data**: Backfill from the HRNZ results archive with the web scraper, since HRNZ is not issuing API keys (January 2026) 🕷️
4. **Follow-up**: Keep the HRNZ data request open and switch to official access if it becomes available 📧

The combination of **HRNZ historical data** (scraped now, official if access is granted later) plus **continuous TAB API capture** provides the most robust long-term solution for TipSharks.

---

## References

### Sources Consulted
- [The Racing API - Data Coverage](https://www.theracingapi.com/data-coverage)
- [The Racing API - Main Site](https://www.theracingapi.com/)
- [HRNZ Official Website](https://www.hrnz.co.nz/)
- [HRNZ Results Index](https://infohorse.hrnz.co.nz/datahrs/results/results.htm)
- [Racing and Sports - NZ Harness Results](https://www.racingandsports.com.au/harness-racing-results/new-zealand)
- [Sky Racing World - Harness Results](https://www.skyracingworld.com/harness-racing-results)
- [Harnesslink - NZ Section](https://harnesslink.com/category/new-zealand/)
- [TAB NZ Racing Hub](https://www.tab.co.nz/racing-hub/racing-index)

---

**Document Status**: Initial Analysis
**Next Review**: After HRNZ contact response
