# Observability Setup Guide

This guide shows you how to set up and view metrics, logs, and traces for the Racing Scraper.

## 🚀 Quick Start

### 1. Start All Services

```bash
# Start all containers in the background
docker-compose up -d

# Verify all containers are healthy
docker-compose ps
```

You should see:
- ✅ racing-postgres (healthy)
- ✅ racing-redis (healthy)
- ✅ racing-jaeger (running)
- ✅ racing-prometheus (running)
- ✅ racing-grafana (running)
- ✅ racing-app (running)

### 2. Generate Some Data

Run the demo script to generate metrics and traces:

```bash
# Run TAB API demo (generates API metrics)
tsx src/api/tab/demo.ts

# Or run the full app
npm run dev
```

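### 3. View Observability Data

The sections below cover each tool in turn: Grafana for dashboards, Prometheus for raw metrics and queries, Jaeger for traces, and the container logs for structured Pino output.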

## 📊 Grafana Dashboards

- **URL**: http://localhost:3000
- **Username**: admin
- **Password**: admin

### Import Dashboard

1. Open Grafana at http://localhost:3000
2. Login (admin/admin)
3. Click **+** → **Import Dashboard**
4. Upload `grafana/dashboards/tab-api-metrics.json`
5. Select **Prometheus** as the data source
6. Click **Import**

### What You'll See

The **TAB API Metrics Dashboard** includes:
- 📈 Request rate by endpoint
- ⏱️  Request duration (p50, p95)
- ❌ Error rates by type
- 📊 Meetings processed (last hour)
- 🏇 Races processed (last hour)
- 🔀 Success vs Error ratio
- 📋 Request count by endpoint table

---

## 📈 Prometheus Metrics

**URL**: http://localhost:9091

### Available Metrics

#### TAB API Client Metrics

```promql
# Request duration histogram
tab_api_request_duration_seconds

# Total requests by endpoint and status
tab_api_requests_total

# Total errors by endpoint and error type
tab_api_errors_total
```

#### Meeting Service Metrics

```promql
# Meetings processed
meetings_processed_total

# Races processed
races_processed_total

# Operation duration
meeting_service_operation_duration_seconds
```

### Example Queries

```promql
# Request rate per second
rate(tab_api_requests_total[5m])

# 95th percentile latency, one result per label set
# (wrap the rate in `sum by (le) (...)` for a single overall value)
histogram_quantile(0.95, rate(tab_api_request_duration_seconds_bucket[5m]))

# Error rate
rate(tab_api_errors_total[5m])

# Total meetings processed in last hour
sum(increase(meetings_processed_total[1h]))
```
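
To make the p95 query less opaque, here is a minimal sketch of the interpolation that `histogram_quantile` performs over cumulative buckets. It is a simplified illustration, not Prometheus's actual code. The bucket bounds and counts are hypothetical, and it works on raw counts where the real query works on per-second rates (the arithmetic is the same).

```typescript
// One cumulative histogram bucket, as exposed by a *_bucket series.
interface Bucket {
  le: number;    // upper bound; Infinity stands for the "+Inf" bucket
  count: number; // observations less than or equal to `le`
}

// Simplified sketch of the interpolation behind histogram_quantile().
// Assumes at least two buckets, the last of which is the +Inf bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  const last = sorted.length - 1;
  const total = sorted[last].count;
  if (total === 0) return NaN;

  // Find the first bucket whose cumulative count reaches the target rank.
  const rank = q * total;
  let i = 0;
  while (i < last && sorted[i].count < rank) i++;

  // The quantile falls in the +Inf bucket: report the highest finite bound.
  if (i === last) return sorted[last - 1].le;

  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  // Assume observations are spread evenly inside the bucket.
  return lower + (sorted[i].le - lower) * ((rank - below) / (sorted[i].count - below));
}

// Hypothetical bucket counts for tab_api_request_duration_seconds.
const exampleBuckets: Bucket[] = [
  { le: 0.5, count: 50 },
  { le: 1, count: 90 },
  { le: 5, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, exampleBuckets)); // 3
```

Because the result is interpolated within a bucket, its precision depends on how finely the bucket boundaries are chosen around the latencies you care about.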

---

## 🔍 Jaeger Distributed Tracing

**URL**: http://localhost:16686

### View Traces

1. Open Jaeger at http://localhost:16686
2. Select **Service**: `racing-scraper`
3. Select **Operation**:
   - `tab_api.get_meetings`
   - `tab_api.get_meeting_by_id`
   - `meeting_service.fetch_and_store_meetings`
4. Click **Find Traces**

### What Traces Show

Each trace includes:
- ⏱️  Total operation duration
- 📍 Span hierarchy (parent → child operations)
- 🏷️  Tags: endpoint, status, meeting info
- 📝 Logs: error messages, state changes
- 🔗 Context propagation across services

**Example Trace:**
```
meeting_service.fetch_and_store_meetings (10.5s)
  └─ tab_api.get_meetings (8.2s)
     └─ HTTP GET /affiliates/v1/racing/meetings (8.1s)
  └─ database.upsert (2.1s)
     └─ meeting.upsert (1.2s)
     └─ race.upsert (0.9s)
```
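
When reading a trace like the one above, it often helps to separate a span's own time from the time spent in its children. The sketch below computes that "self time" for the example trace. The `SpanNode` shape is a hypothetical simplification, not Jaeger's data model, and it assumes child spans run one after another.

```typescript
interface SpanNode {
  name: string;
  durationSec: number;
  children: SpanNode[];
}

interface SelfTime {
  name: string;
  selfSec: number;
}

// Self time = a span's duration minus the time covered by its direct children.
// Assumes children run sequentially; the result is clamped at zero in case
// children overlap or rounding pushes the difference below zero.
function selfTimes(span: SpanNode, out: SelfTime[] = []): SelfTime[] {
  const childTotal = span.children.reduce((sum, c) => sum + c.durationSec, 0);
  out.push({ name: span.name, selfSec: Math.max(0, span.durationSec - childTotal) });
  for (const child of span.children) selfTimes(child, out);
  return out;
}

// The example trace from this section.
const exampleTrace: SpanNode = {
  name: "meeting_service.fetch_and_store_meetings",
  durationSec: 10.5,
  children: [
    {
      name: "tab_api.get_meetings",
      durationSec: 8.2,
      children: [
        { name: "HTTP GET /affiliates/v1/racing/meetings", durationSec: 8.1, children: [] },
      ],
    },
    {
      name: "database.upsert",
      durationSec: 2.1,
      children: [
        { name: "meeting.upsert", durationSec: 1.2, children: [] },
        { name: "race.upsert", durationSec: 0.9, children: [] },
      ],
    },
  ],
};

for (const s of selfTimes(exampleTrace)) {
  console.log(`${s.name}: ${s.selfSec.toFixed(1)}s`);
}
```

For this trace, roughly 0.2 s belongs to the service span itself and about 0.1 s to the TAB client outside the HTTP call, so almost all of the latency sits in the upstream HTTP request.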

---

## 📝 Structured Logs (Pino)

### View Logs

```bash
# View app logs
docker-compose logs -f app

# View logs with timestamps
docker-compose logs -f --timestamps app

# Filter logs by level
docker-compose logs app | grep '"level":50'  # Error logs only
```

### Log Levels

- **10**: TRACE (very detailed)
- **20**: DEBUG (development info)
- **30**: INFO (normal operations) ← default
- **40**: WARN (warnings)
- **50**: ERROR (errors)
- **60**: FATAL (critical errors)

### Example Log Entry

```json
{
  "level": 30,
  "time": 1705196400000,
  "pid": 1,
  "hostname": "racing-app",
  "msg": "Fetching meetings from TAB API",
  "endpoint": "/affiliates/v1/racing/meetings",
  "category": "T",
  "country": "AUS",
  "date": "2026-01-14",
  "duration": 8234
}
```
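
The `grep '"level":50'` filter matches ERROR entries only. If you want everything at or above a given level, a small parser is more flexible. The sketch below is one way to do it, assuming each line of `docker-compose logs` output carries a container-name prefix followed by the JSON entry. The function names are illustrative, not part of the app.

```typescript
// Pino's default numeric levels, as listed above.
const LEVEL_NAMES: Record<number, string> = {
  10: "TRACE", 20: "DEBUG", 30: "INFO", 40: "WARN", 50: "ERROR", 60: "FATAL",
};

interface LogEntry {
  level: number;
  msg?: string;
  [key: string]: unknown;
}

// Parse one line of log output. Compose prefixes each line with the
// container name and a pipe, so parsing starts at the first "{".
function parseLogLine(line: string): LogEntry | null {
  const start = line.indexOf("{");
  if (start === -1) return null;
  try {
    const parsed = JSON.parse(line.slice(start));
    const ok = typeof parsed === "object" && parsed !== null && typeof parsed.level === "number";
    return ok ? parsed : null;
  } catch {
    return null; // not JSON, for example a startup banner
  }
}

// Keep entries at or above `minLevel` and render them in a compact form.
function filterLogs(lines: string[], minLevel: number): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (entry && entry.level >= minLevel) {
      out.push(`${LEVEL_NAMES[entry.level] || entry.level} ${entry.msg || ""}`);
    }
  }
  return out;
}

// Hypothetical log lines for illustration.
const exampleLines = [
  'racing-app  | {"level":30,"msg":"Fetching meetings from TAB API"}',
  'racing-app  | {"level":50,"msg":"TAB API request failed"}',
];
console.log(filterLogs(exampleLines, 40));
```

Passing `40` keeps WARN, ERROR, and FATAL entries, which a single `grep` pattern on one level value cannot express.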

---

## 🔥 Advanced: Custom Grafana Alerts

### Setup Email Alerts

1. In Grafana, go to **Alerting** → **Notification channels** (called **Contact points** in newer Grafana versions that use unified alerting)
2. Add **Email** channel with your SMTP details
3. Edit dashboard panels to add alert rules

### Example Alert: High Error Rate

```
Alert: TAB API High Error Rate
Condition: rate(tab_api_errors_total[5m]) > 0.1
For: 5 minutes
Notify: Email
Message: "TAB API error rate exceeded 0.1 errors/sec for 5 minutes"
```
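
The `For: 5 minutes` clause means the condition has to hold continuously before anyone is notified. The sketch below illustrates that behaviour. It is not Grafana's evaluation engine, and the state names and sample values are assumptions chosen for the example.

```typescript
type AlertState = "inactive" | "pending" | "firing";

interface Sample {
  timeSec: number; // evaluation time in seconds
  value: number;   // e.g. the result of rate(tab_api_errors_total[5m])
}

// Walk evaluations in time order and track how long the condition has held.
function evaluateAlert(samples: Sample[], threshold: number, forSec: number): AlertState[] {
  const states: AlertState[] = [];
  let breachedSince: number | null = null;
  for (const s of samples) {
    if (s.value > threshold) {
      if (breachedSince === null) breachedSince = s.timeSec;
      states.push(s.timeSec - breachedSince >= forSec ? "firing" : "pending");
    } else {
      breachedSince = null; // condition cleared, so the timer resets
      states.push("inactive");
    }
  }
  return states;
}

// Hypothetical one-minute evaluations: the error rate is above 0.1 from t=60s.
const exampleSamples: Sample[] = [0.05, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.05].map(
  (value, i) => ({ timeSec: i * 60, value }),
);
console.log(evaluateAlert(exampleSamples, 0.1, 300));
```

A brief spike that clears within five minutes never leaves the pending state, which is what keeps short-lived upstream blips from sending email.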

---

## 📊 Metrics Endpoint

The app exposes a `/metrics` endpoint for Prometheus:

**URL**: http://localhost:9090/metrics

```bash
# View raw metrics
curl http://localhost:9090/metrics

# Filter for specific metrics
curl http://localhost:9090/metrics | grep tab_api
```
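
If you want more than `grep` on the endpoint's output, the text format is simple enough to parse by hand. The sketch below is a simplified parser, not a complete implementation of the exposition format. The `endpoint` and `status` label names and the sample values are assumptions.

```typescript
interface MetricSample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parse sample lines of the Prometheus text format. Simplified: assumes label
// values contain no commas, quotes, or braces, ignores timestamps, and does
// not handle special values such as +Inf.
function parseMetrics(text: string): MetricSample[] {
  const samples: MetricSample[] = [];
  const lineRe = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip HELP/TYPE lines
    const m = lineRe.exec(line);
    if (!m) continue;
    const labels: Record<string, string> = {};
    if (m[2]) {
      for (const pair of m[2].split(",")) {
        const eq = pair.indexOf("=");
        if (eq > 0) {
          labels[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim().replace(/^"|"$/g, "");
        }
      }
    }
    samples.push({ name: m[1], labels, value: Number(m[3]) });
  }
  return samples;
}

// Hypothetical /metrics output for illustration.
const exampleText = [
  "# HELP tab_api_requests_total Total requests",
  "# TYPE tab_api_requests_total counter",
  'tab_api_requests_total{endpoint="/affiliates/v1/racing/meetings",status="200"} 12',
  "meetings_processed_total 4",
].join("\n");

const tabApiOnly = parseMetrics(exampleText).filter((s) => s.name.indexOf("tab_api") === 0);
console.log(tabApiOnly);
```

For anything beyond quick checks, querying Prometheus itself is usually a better fit than parsing the raw endpoint.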

---

## 🧪 Testing Observability

### Generate Load

```bash
# Run demo multiple times to generate data
for i in {1..10}; do
  tsx src/api/tab/demo.ts
  sleep 5
done
```

### Verify Metrics

```bash
# Check if metrics endpoint is working
curl http://localhost:9090/metrics | head -20

# Check Prometheus targets
curl -s http://localhost:9091/api/v1/targets | jq .
```
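
The targets response can also be checked programmatically. The sketch below uses a subset of the fields Prometheus returns from `/api/v1/targets` and lists any target that is not up. The sample response and its error message are made up for illustration.

```typescript
// Subset of the /api/v1/targets response used here.
interface ActiveTarget {
  labels: Record<string, string>;
  scrapeUrl: string;
  health: string; // "up", "down", or "unknown"
  lastError: string;
}

interface TargetsResponse {
  status: string;
  data: { activeTargets: ActiveTarget[] };
}

// Describe every active target whose last scrape did not succeed.
function unhealthyTargets(resp: TargetsResponse): string[] {
  return resp.data.activeTargets
    .filter((t) => t.health !== "up")
    .map((t) => {
      const job = t.labels["job"] || "unknown job";
      return `${job} (${t.scrapeUrl}): ${t.lastError || "no error reported"}`;
    });
}

// Hypothetical response in which the app target is failing.
const exampleResponse: TargetsResponse = {
  status: "success",
  data: {
    activeTargets: [
      {
        labels: { job: "racing-scraper" },
        scrapeUrl: "http://app:9090/metrics",
        health: "down",
        lastError: "connection refused",
      },
    ],
  },
};
console.log(unhealthyTargets(exampleResponse));
```

On Node 18 or later, the input could come from `fetch("http://localhost:9091/api/v1/targets")` followed by `.json()`. An empty result means every active target was scraped successfully.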

### Verify Traces

```bash
# Check Jaeger health
curl http://localhost:16686/api/services

# Query traces
curl "http://localhost:16686/api/traces?service=racing-scraper&limit=10"
```

---

## 🐛 Troubleshooting

### Grafana shows "No Data"

1. Check Prometheus data source is configured:
   - Go to **Configuration** → **Data Sources**
   - Add **Prometheus** if it is not listed
   - URL: `http://prometheus:9090`
   - Click **Save & Test**

2. Check Prometheus is scraping metrics:
   - Open http://localhost:9091/targets
   - Ensure `racing-scraper` target is **UP**

3. Generate some data by running the demo

### Jaeger shows no traces

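1. Check OTEL environment variables in `.env`. If the app runs inside Docker, `localhost` refers to the app container itself, so the endpoint must use the Jaeger container's hostname on the Compose network instead: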
   ```
   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
   ENABLE_TRACING=true
   ```

2. Restart the app:
   ```bash
   docker-compose restart app
   ```

3. Run demo to generate traces
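
To see how these two variables might turn into an exporter target, here is a minimal sketch. It assumes the app reads them roughly like this, which the guide does not confirm. `resolveTracingConfig` and the fallback endpoint are hypothetical. The `/v1/traces` suffix is the path OTLP over HTTP uses for trace data.

```typescript
interface TracingConfig {
  enabled: boolean;
  tracesUrl: string;
}

// Resolve tracing settings from an environment map such as process.env.
// The fallback endpoint mirrors the value shown in .env above; whether the
// app applies any fallback is an assumption.
function resolveTracingConfig(env: Record<string, string | undefined>): TracingConfig {
  const base = (env["OTEL_EXPORTER_OTLP_ENDPOINT"] || "http://localhost:4318").replace(/\/+$/, "");
  return {
    enabled: (env["ENABLE_TRACING"] || "").toLowerCase() === "true",
    // OTLP over HTTP sends traces to the /v1/traces path under the base endpoint.
    tracesUrl: `${base}/v1/traces`,
  };
}

console.log(
  resolveTracingConfig({
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318",
    ENABLE_TRACING: "true",
  }),
);
```

Logging the resolved values once at startup is a cheap way to confirm that the container picked up the `.env` settings you expect.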

### Prometheus not scraping

1. Check Prometheus config:
   ```bash
   cat prometheus.yml
   ```

2. Ensure app container is reachable:
   ```bash
   docker-compose exec prometheus wget -O- http://app:9090/metrics
   ```

3. Restart Prometheus:
   ```bash
   docker-compose restart prometheus
   ```

---

## 📚 Additional Resources

- [Prometheus Query Language](https://prometheus.io/docs/prometheus/latest/querying/basics/)
- [Grafana Dashboard Documentation](https://grafana.com/docs/grafana/latest/dashboards/)
- [Jaeger Tracing Guide](https://www.jaegertracing.io/docs/)
- [Pino Logging Best Practices](https://github.com/pinojs/pino/blob/master/docs/best-practices.md)

---

## ✅ Verification Checklist

After setup, verify each component:

- [ ] Grafana accessible at http://localhost:3000
- [ ] Dashboard imported and showing data
- [ ] Prometheus accessible at http://localhost:9091
- [ ] Prometheus scraping app metrics (check /targets)
- [ ] Jaeger accessible at http://localhost:16686
- [ ] Traces visible for `racing-scraper` service
- [ ] App logs showing structured JSON
- [ ] Metrics endpoint returns data: http://localhost:9090/metrics

---

**Need help?** Check the troubleshooting section or review the logs:
```bash
docker-compose logs --tail=100
```