# Real API Metrics Enabled

## Overview

Removed the test data generator and enabled real API metrics collection from actual TAB API calls. The system now shows genuine observability data from scheduler operations.

## Changes Made

### 1. Removed Test Data Generator

**File**: `src/index.ts`

**Removed**:
- `generateTestData()` function (lines 64-102) - Fake metric generation
- `setInterval(generateTestData, 2000)` - Test data injection loop

**Impact**: Fake metrics no longer pollute Prometheus/Grafana

### 2. Unified Prometheus Registry

**File**: `src/index.ts`

**Before**:
```typescript
const register = new promClient.Registry(); // Custom registry
// ... metrics with registers: [register]
```

**After**:
```typescript
const register = promClient.register; // Use default global registry
```

**Why**: `TabApiClient` and `MeetingService` register their metrics on the default registry. The custom registry in `index.ts` was a second, separate metric store, so metrics recorded on the default registry were invisible to anything reading the custom one.
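
The effect of a split registry can be illustrated with a toy model. `ToyRegistry` and the variable names below are illustrative stand-ins, not prom-client's API, and the sketch assumes the `/metrics` handler serialized only the custom registry:

```typescript
// Toy stand-in for a metrics registry; NOT prom-client's real API.
class ToyRegistry {
  private counters: Record<string, number> = {};

  // Increment a named counter, creating it on first use.
  inc(name: string, by = 1): void {
    this.counters[name] = (this.counters[name] || 0) + by;
  }

  // Serialize every counter this registry knows about, one per line.
  render(): string {
    return Object.keys(this.counters)
      .map((name) => `${name} ${this.counters[name]}`)
      .join("\n");
  }
}

// Hypothetical setup mirroring the "Before" state.
const defaultRegistry = new ToyRegistry(); // written to by the API client and services
const customRegistry = new ToyRegistry(); // created in index.ts and served (assumed)

// A real API call is recorded on the default registry...
defaultRegistry.inc("tab_api_requests_total");

// ...but the served output comes from the custom one, so the series is missing.
const before = customRegistry.render();

// "After" state: the served registry is the one the services write to.
const servedRegistry = defaultRegistry;
const after = servedRegistry.render();

console.log({ before, after });
```

In the toy model `before` is empty while `after` contains the counter, which is the same symptom the registry change addresses.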

### 3. Metric Collection Flow

```
┌─────────────────┐
│  TabApiClient   │ → Creates metrics on default registry
│  (API calls)    │   - tab_api_requests_total
└────────┬────────┘   - tab_api_errors_total
         │            - tab_api_request_duration_seconds
         ↓
┌─────────────────┐
│ MeetingService  │ → Creates metrics on default registry
│  (Processing)   │   - meetings_processed_total
└────────┬────────┘   - races_processed_total
         │            - meeting_service_operation_duration_seconds
         ↓
┌─────────────────┐
│   Schedulers    │ → Trigger API calls & processing
│  (Orchestration)│
└────────┬────────┘
         ↓
┌─────────────────┐
│   Prometheus    │ → Scrapes /metrics endpoint
│   (Scraping)    │   GET http://localhost:9090/metrics
└────────┬────────┘
         ↓
┌─────────────────┐
│    Grafana      │ → Visualizes real data
│  (Dashboard)    │
└─────────────────┘
```

## Current State

### Real Metrics Being Collected

From a test run with a placeholder API key, in which every request failed:

```
tab_api_requests_total{endpoint="/affiliates/v1/racing/meetings",status="error"} 8
tab_api_errors_total{endpoint="/affiliates/v1/racing/meetings",error_type="enotfound"} 8
tab_api_request_duration_seconds_count{endpoint="/affiliates/v1/racing/meetings",status="error"} 8
tab_api_request_duration_seconds_sum{endpoint="/affiliates/v1/racing/meetings",status="error"} 52.696
```

**Interpretation**:
- 8 API calls attempted (1 per date/category/country combination)
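- All failed with `ENOTFOUND`, a DNS resolution failure: the API hostname could not be resolved, so no request reached the server. This points to the configured host or the network rather than to the API key itself
- Total duration: 52.7 seconds (includes retries)
- Average: 6.6 seconds per request (52.696 s / 8)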
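
The average can be recomputed from the `_sum` and `_count` series. The following is a minimal sketch; `sampleValue` is a hypothetical helper that handles only the simple `name{labels} value` lines shown above, not the full Prometheus exposition format:

```typescript
// Sample lines copied from the test run above.
const exposition = `
tab_api_request_duration_seconds_count{endpoint="/affiliates/v1/racing/meetings",status="error"} 8
tab_api_request_duration_seconds_sum{endpoint="/affiliates/v1/racing/meetings",status="error"} 52.696
`;

// Return the value of the first sample whose metric name matches exactly.
function sampleValue(text: string, metric: string): number | undefined {
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Require the name to be followed by "{" or " " so that
    // "..._sum" does not match a longer metric name.
    if (line.indexOf(metric + "{") !== 0 && line.indexOf(metric + " ") !== 0) {
      continue;
    }
    const value = Number(line.slice(line.lastIndexOf(" ") + 1));
    if (!isNaN(value)) return value;
  }
  return undefined;
}

const sum = sampleValue(exposition, "tab_api_request_duration_seconds_sum");
const count = sampleValue(exposition, "tab_api_request_duration_seconds_count");

// Guard against a missing series or a zero count before dividing.
const averageSeconds =
  sum !== undefined && count ? sum / count : undefined;

console.log(averageSeconds); // 52.696 / 8, roughly 6.6 seconds
```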

### Scheduler Logs (Real Behavior)

```json
{
  "level": "ERROR",
  "message": "Failed to fetch meetings for combination",
  "category": "H",
  "country": "AUS",
  "date": "2026-01-13",
  "error": "No response received from TAB API"
}
```

```json
{
  "level": "INFO",
  "message": "Scheduling morning scrape retry",
  "retryCount": 1,
  "failedCombinations": 8,
  "retryInMinutes": 30
}
```

## Benefits

### 1. Authentic Observability
- Real API latencies (not simulated)
- Actual error rates and types
- True retry behavior
- Genuine rate limiting patterns

### 2. Trustworthy Metrics
- Can confidently make decisions based on dashboard data
- Alert thresholds reflect real system behavior
- Performance trends show actual improvements/degradations

### 3. Production-Ready
- Same code path for dev and prod
- No test mode switches to forget about
- Easier debugging (what you see is what's happening)

## Next Steps: Getting Real TAB Data

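Currently seeing `ENOTFOUND` errors, which mean the API hostname could not be resolved. Things to check:
- Network/DNS configuration, or an incorrect API host in the configuration
- Separately, the API key is still a placeholder: `TAB_API_KEY=your-tab-api-key`. A bad key would not normally cause `ENOTFOUND`, but requests will most likely be rejected once the host resolves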

### To Fix (when you have a real API key):

1. **Update .env file**:
```bash
TAB_API_KEY=<your-actual-api-key>
```

2. **Restart the application**:
```bash
docker compose restart app
```

3. **Verify API connectivity**:
```bash
# Check for a completed scrape in the logs
docker compose logs app | grep "Morning scrape completed"

# Check metrics
curl -s http://localhost:9090/metrics | grep 'status="success"'
```
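
If requests still fail after these steps, the kind of failure narrows down the cause. The sketch below is a hypothetical triage helper: `Failure` and `likelyCause` are illustrative names, and the mapping reflects standard Node.js error codes and general HTTP status semantics, not anything documented for the TAB API.

```typescript
// Hypothetical shape: a Node.js network error code, or an HTTP status
// if a response actually arrived.
interface Failure {
  code?: string;
  httpStatus?: number;
}

// Map a failure to the most likely place to look. Assumed mapping only.
function likelyCause(failure: Failure): string {
  switch (failure.code) {
    case "ENOTFOUND":
      return "DNS lookup failed: check the API hostname and network/DNS settings";
    case "ECONNREFUSED":
      return "Connection refused: host resolved, but nothing accepted the connection";
    case "ETIMEDOUT":
      return "Timed out: check the network path, proxy, or firewall";
  }
  if (failure.httpStatus === 401 || failure.httpStatus === 403) {
    return "Credentials rejected: check TAB_API_KEY";
  }
  if (failure.httpStatus === 429) {
    return "Rate limited: reduce request frequency";
  }
  return "Unclassified: inspect the full error";
}

console.log(likelyCause({ code: "ENOTFOUND" }));
console.log(likelyCause({ httpStatus: 401 }));
```

The point of the split is that network-level codes arrive before any HTTP exchange, so a key problem can only show up as an HTTP status once the host is reachable.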

### Expected Behavior with Valid Key

**Morning Scrape (6 AM AEST)**:
```
INFO: Morning scrape completed
  itemsProcessed: 45
  meetingsByDay: {"2026-01-14": 12, "2026-01-15": 10}
  racesByDay: {"2026-01-14": 128, "2026-01-15": 94}
  errors: 0
  durationMs: 8234
```

**Metrics**:
```
tab_api_requests_total{endpoint="/affiliates/v1/racing/meetings",status="success"} 8
meetings_processed_total{operation="fetch",status="success"} 22
races_processed_total{operation="fetch",status="success"} 222
```

## API Endpoints Being Called

With real data, the system will call:

1. **GET /affiliates/v1/racing/meetings** (8 combinations per scrape, drawn from the values below)
   - Categories: T, H, G (Thoroughbred, Harness, Greyhound)
   - Countries: AUS, NZ
   - Dates: Today, Tomorrow

2. **GET /affiliates/v1/racing/meetings/{id}** (per meeting)
   - Called by PreRaceSchedulers at T-60 and T-15
   - Called by PostRaceScheduler at T+5
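
As a rough sketch of how the meetings requests fan out, the code below enumerates every date/category/country combination from the values listed under endpoint 1. `buildCombinations` and `isoDate` are hypothetical helpers, and dates are computed in UTC only to keep the example dependency-free. Note that the full cross product is 12, whereas the test run reported 8 calls, so the real scheduler presumably skips some combinations; which ones is not stated here.

```typescript
// Values listed under endpoint 1 above.
const categories = ["T", "H", "G"];
const countries = ["AUS", "NZ"];

interface Combination {
  date: string;
  category: string;
  country: string;
}

// Format a Date as YYYY-MM-DD in UTC (an assumption; the real scheduler
// may use a local racing timezone instead).
function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Build the full cross product for today and tomorrow.
function buildCombinations(today: Date): Combination[] {
  const tomorrow = new Date(today.getTime() + 24 * 60 * 60 * 1000);
  const dates = [isoDate(today), isoDate(tomorrow)];
  const result: Combination[] = [];
  for (const date of dates) {
    for (const category of categories) {
      for (const country of countries) {
        result.push({ date, category, country });
      }
    }
  }
  return result;
}

const combos = buildCombinations(new Date(Date.UTC(2026, 0, 13)));
console.log(combos.length); // 2 dates x 3 categories x 2 countries = 12
```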

## Grafana Dashboard

The existing Grafana dashboard at http://localhost:3000 already reflects real API calls (currently all errors) and will show successful traffic automatically once API calls succeed:

### Current Panels (will populate with real data):

1. **API Request Rate** - Actual requests/sec to TAB API
2. **API Success Rate** - Real success vs error ratio
3. **API Latency** - True response times (P50, P95, P99)
4. **Error Distribution** - Actual error types encountered
5. **Meetings Processed** - Real meetings fetched
6. **Races Processed** - Real races ingested
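
For reference, the quantity behind the success-rate panel can be sketched as a plain ratio of the `status` label counts on `tab_api_requests_total`. `StatusCounts` and `successRate` are hypothetical names, and a real dashboard query would normally work on rates over a time window rather than raw counter totals:

```typescript
// Request counts by status label, as read from tab_api_requests_total.
interface StatusCounts {
  success: number;
  error: number;
}

// Fraction of requests that succeeded; undefined when no requests were made.
function successRate(counts: StatusCounts): number | undefined {
  const total = counts.success + counts.error;
  return total === 0 ? undefined : counts.success / total;
}

// Failed test run from this document: 8 errors, no successes.
const failedRun = successRate({ success: 0, error: 8 });

// Expected metrics with a valid key: 8 successes.
const healthyRun = successRate({ success: 8, error: 0 });

console.log({ failedRun, healthyRun });
```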

### Additional Panels to Add (Future):

- JobRun success rate over time
- Retry frequency and success rate
- Data freshness (time since last successful scrape)
- Race status distribution (provisional vs confirmed)

## Verification Commands

```bash
# Check if metrics are being collected
curl -s http://localhost:9090/metrics | grep -E "tab_api|meetings_|races_" | head -20

# Watch real-time scheduler activity
docker compose logs -f app | grep -E "INFO|ERROR"

# Check JobRun database
docker exec racing-postgres psql -U racing -d racing_db -c "
  SELECT
    \"jobType\",
    status,
    \"itemsProcessed\",
    \"itemsFailed\",
    \"durationMs\",
    \"startedAt\"
  FROM job_runs
  ORDER BY \"startedAt\" DESC
  LIMIT 10;
"

# Check the health endpoint on the metrics port (the same server that exposes /metrics)
curl -s http://localhost:9090/health

# Access Grafana
open http://localhost:3000
# Default credentials: admin/admin
```

## Files Modified

| File | Change | Purpose |
|------|--------|---------|
| `src/index.ts` | Removed `generateTestData()` | Stop fake metrics |
| `src/index.ts` | Changed to default registry | Unify metric collection |
| `src/index.ts` | Removed test data interval | Stop periodic injection |

## Rollback (if needed)

If you need to restore test data temporarily:

```typescript
// In src/index.ts, before main():
function generateTestData() {
  // ... previous implementation
}

// In main(), before closing brace:
setInterval(generateTestData, 2000);
logger.info('🎲 Test data generator started');
```

However, this is **not recommended** as it defeats the purpose of having real observability.

---

- **Status**: ✅ **COMPLETE**
- **Date**: 2026-01-14
- **Impact**: Dashboard now shows real API behavior instead of simulated data
- **Next**: Configure valid TAB API key to see successful data ingestion
