# Scheduler Monitoring Dashboard

## Summary

A Grafana dashboard now monitors scheduler operations, race data status, and system health. It combines Prometheus metrics scraped from the application with direct SQL queries against the racing database.

## What Was Done

### 1. Cleared Test Data ✅
- Removed all test meetings, races, runners, results, scrapes, and job runs from the database
- The database is now ready for monitoring clean production data

### 2. Created Scheduler Monitoring Dashboard ✅

**Location**: `grafana/dashboards/scheduler-monitoring.json`

**Dashboard Features**:

#### Scheduler Status Overview
- **Active Scheduler Indicators**: Visual status for Morning Scrape, Pre-Race, and Post-Race schedulers
- **Last Successful Run**: Timestamp of the most recent successful job
- **Failed Jobs (1h)**: Count of failures in the last hour with color-coded thresholds

#### Scheduler Execution Metrics
- **Execution Rate**: Real-time rate of successful and failed scheduler runs per minute
- **Duration Tracking**: p50 and p95 latency metrics for each scheduler type
- **Items Processed**: Bar chart of processing volume per scheduler, grouped into 5-minute windows
- **Job Status Distribution**: Pie chart showing success/failure ratios over the last hour
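
Percentile panels such as p50 and p95 are typically derived from histogram buckets with PromQL's `histogram_quantile`. The sketch below illustrates the underlying idea (linear interpolation inside the bucket that contains the target rank). It is not the dashboard's query: the `Bucket` type, the `estimateQuantile` name, and the bucket boundaries in the usage example are assumptions made for illustration.

```typescript
// One cumulative histogram bucket: `count` observations took <= `le` seconds.
// (Hypothetical type, modelled on how Prometheus exposes histogram buckets.)
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the "+Inf" bucket
  count: number; // cumulative observation count
}

// Estimate the q-quantile (0 <= q <= 1) by interpolating linearly
// within the first bucket whose cumulative count reaches the target rank.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted.length > 0 ? sorted[sorted.length - 1].count : 0;
  if (total === 0 || q < 0 || q > 1) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);
  if (idx === -1) return NaN; // counts were not cumulative

  const bucket = sorted[idx];
  const lowerBound = idx > 0 ? sorted[idx - 1].le : 0;
  const lowerCount = idx > 0 ? sorted[idx - 1].count : 0;

  // The +Inf bucket has no finite width, so fall back to the
  // highest finite bound instead of interpolating.
  if (bucket.le === Infinity) return idx > 0 ? lowerBound : NaN;

  const inBucket = bucket.count - lowerCount;
  if (inBucket === 0) return lowerBound;
  return lowerBound + (bucket.le - lowerBound) * ((rank - lowerCount) / inBucket);
}

// Usage with made-up bucket data for one scheduler type.
const exampleBuckets: Bucket[] = [
  { le: 1, count: 50 },
  { le: 5, count: 90 },
  { le: 10, count: 100 },
  { le: Infinity, count: 100 },
];
const p50 = estimateQuantile(0.5, exampleBuckets);
const p95 = estimateQuantile(0.95, exampleBuckets);
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries cover the durations the schedulers actually produce.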

#### Race Data Status
- **Races Today**: Total count of races scheduled for today
- **Meetings Today**: Total count of meetings scheduled for today
- **Minutes Since Last Update**: Monitors data freshness with color-coded thresholds
  - Green: < 60 minutes
  - Yellow: 60-120 minutes
  - Red: > 120 minutes
- **Next Race Start**: Timestamp of the next upcoming race
- **Upcoming Races Table**: Detailed view of next 20 races with:
  - Meeting name
  - Race number and name
  - Start time
  - Minutes until race (color-coded based on urgency)
  - Current status
  - Last updated timestamp
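
The freshness thresholds above can be expressed as a small classifier. This is a sketch rather than the dashboard's actual configuration: the function names are hypothetical, and treating exactly 120 minutes as yellow is an assumption, since the listed ranges do not say which side of the boundary it falls on.

```typescript
type FreshnessColor = "green" | "yellow" | "red";

// Whole minutes elapsed between two timestamps (hypothetical helper).
function minutesBetween(earlier: Date, later: Date): number {
  return Math.floor((later.getTime() - earlier.getTime()) / 60_000);
}

// Map "minutes since last update" to the colour bands listed above.
// Assumption: exactly 120 minutes still counts as yellow.
function freshnessColor(minutesSinceUpdate: number): FreshnessColor {
  if (minutesSinceUpdate < 60) return "green";
  if (minutesSinceUpdate <= 120) return "yellow";
  return "red";
}

// Usage: classify a last-updated timestamp against a reference time.
const lastUpdated = new Date("2024-01-01T00:00:00Z");
const referenceTime = new Date("2024-01-01T01:30:00Z");
const color = freshnessColor(minutesBetween(lastUpdated, referenceTime));
```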

#### Job Run History
- **Recent Job Runs Table**: Last 50 job executions showing:
  - Job type
  - Status (color-coded background)
  - Start time
  - Duration in seconds
  - Items processed and failed
  - Retry count
  - Error messages (truncated to 100 chars)
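
As an illustration of the error-message truncation, here is a minimal sketch. The `truncateError` helper is hypothetical; the dashboard may just as well truncate in its SQL query, and whether it appends an ellipsis is not specified, so this version simply cuts at the limit.

```typescript
// Shorten an error message to at most `maxChars` characters for table display.
// Assumption: a missing message (null) is shown as an empty string.
function truncateError(message: string | null, maxChars = 100): string {
  if (message === null) return "";
  return message.length <= maxChars ? message : message.slice(0, maxChars);
}

// Usage: a long message is cut to 100 characters, a short one is unchanged.
const longMessage = "x".repeat(150);
const shown = truncateError(longMessage);
const unchanged = truncateError("connection refused");
```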

### 3. PostgreSQL Datasource Configuration ✅
- Added PostgreSQL datasource provisioning for Grafana
- Connected to the racing database for direct SQL queries
- Passed the database credentials to the Grafana service through environment variables in `docker-compose.yml`

### 4. Dashboard Configuration
- **Auto-refresh**: 10 seconds (configurable)
- **Timezone**: Pacific/Auckland (New Zealand)
- **Time Range**: Default last 6 hours
- **Live Mode**: Enabled for real-time monitoring
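
These settings correspond to top-level fields in Grafana's dashboard JSON model. The sketch below shows one way to check a loaded dashboard object against them. The `DashboardSettings` type and `findMismatches` helper are hypothetical, and mapping "Live Mode" to the `liveNow` field is an assumption about how the dashboard was exported.

```typescript
// Subset of Grafana's dashboard JSON model relevant to the settings above.
interface DashboardSettings {
  refresh: string;
  timezone: string;
  time: { from: string; to: string };
  liveNow: boolean; // assumed to be the field behind "Live Mode"
}

const expectedSettings: DashboardSettings = {
  refresh: "10s",
  timezone: "Pacific/Auckland",
  time: { from: "now-6h", to: "now" },
  liveNow: true,
};

// Return the names of the settings that differ from the expected values.
function findMismatches(actual: Partial<DashboardSettings>): string[] {
  const problems: string[] = [];
  if (actual.refresh !== expectedSettings.refresh) problems.push("refresh");
  if (actual.timezone !== expectedSettings.timezone) problems.push("timezone");
  if (
    actual.time?.from !== expectedSettings.time.from ||
    actual.time?.to !== expectedSettings.time.to
  ) {
    problems.push("time");
  }
  if (actual.liveNow !== expectedSettings.liveNow) problems.push("liveNow");
  return problems;
}

// Usage: a dashboard whose refresh interval was changed to 30 seconds.
const mismatches = findMismatches({ ...expectedSettings, refresh: "30s" });
```

In practice the argument would come from parsing `grafana/dashboards/scheduler-monitoring.json`; an inline object is used here to keep the example self-contained.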

## Accessing the Dashboard

1. **Start the stack**:
   ```bash
   docker compose up -d
   ```

2. **Access Grafana**:
   - URL: http://localhost:3000
   - Username: `admin`
   - Password: `admin` (Grafana's default; Grafana prompts you to change it on first login)

3. **Find the dashboard**:
   - Navigate to Dashboards
   - Look for "TAB API - Scheduler Monitoring"

## Testing the Dashboard

To generate test metrics and verify the dashboard:

```bash
# Generate scheduler metrics with real data
npx tsx scripts/test-scheduler-metrics.ts
```

This will:
- Trigger a manual morning scrape
- Fetch today's and tomorrow's racing data
- Generate metrics visible in the dashboard
- Create job run records in the database

## Dashboard Metrics Explained

### Prometheus Metrics
These metrics are scraped from the application's `/metrics` endpoint:

- `scheduler_runs_total`: Counter of scheduler executions by type and status
- `scheduler_duration_seconds`: Histogram of execution durations
- `scheduler_items_processed_total`: Counter of items (meetings/races) processed
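
For orientation, this is roughly what a counter looks like in the Prometheus text exposition format served by a `/metrics` endpoint. The sketch renders the format by hand so that it stays dependency-free; the application itself presumably uses a metrics library. The label names `type` and `status`, the label values, and the help text are assumptions based on the descriptions above.

```typescript
type Labels = Record<string, string>;

interface CounterSample {
  labels: Labels;
  value: number;
}

// Escape backslashes, double quotes, and newlines in a label value.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one counter metric family in the Prometheus text exposition format.
function renderCounter(name: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
      .join(",");
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join("\n") + "\n";
}

// Usage with hypothetical label names and values.
const exposition = renderCounter(
  "scheduler_runs_total",
  "Scheduler executions by type and status",
  [
    { labels: { type: "morning_scrape", status: "success" }, value: 3 },
    { labels: { type: "morning_scrape", status: "failure" }, value: 1 },
  ],
);
```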

### Database Queries
Direct SQL queries against PostgreSQL for business-level insights:

- Race and meeting counts for today
- Data freshness calculations
- Upcoming race schedules
- Job run history and status

## Key Insights

The dashboard provides several operational insights:

1. **Data Freshness**: "Minutes Since Last Update" shows whether the stored race data is keeping pace with the live racing schedule
2. **Health Monitoring**: Failed job counts and status distributions quickly reveal system issues
3. **Performance Tracking**: Duration histograms show if schedulers are slowing down over time
4. **Business Context**: Race counts and schedules provide the "why" behind scheduler activity

## Next Steps

Now that monitoring is set up, you can:

1. **Run schedulers**: Start the application and let schedulers run on their cron schedules
2. **Monitor performance**: Watch the dashboard during peak racing times
3. **Set up alerts**: Configure Grafana alerts for critical thresholds
4. **Implement the Pre-Race Scheduler**: Move on to the PreRaceScheduler as planned

## Files Created/Modified

- ✅ `grafana/dashboards/scheduler-monitoring.json` - New dashboard
- ✅ `grafana/provisioning/datasources/postgres.yml` - PostgreSQL datasource config
- ✅ `docker-compose.yml` - Added Postgres env vars to Grafana service
- ✅ `scripts/clear-test-data.ts` - Utility to clear test data
- ✅ `scripts/test-scheduler-metrics.ts` - Test script for dashboard validation
