# LLM Metrics Guide

This guide explains how to use the metrics system to monitor LLM performance, costs, and reliability.

## Overview

The metrics system tracks:

* **Request latency** (p50, p95, p99 percentiles)
* **Success/failure rates** per provider
* **Cost tracking** with budget enforcement
* **Cache hit rates** and savings
* **Fallback rate** (LLM vs regex parser)
* **Per-provider breakdown**

## CLI Options

### Displaying Metrics

Use `--show-metrics` to display aggregated metrics after processing:

```bash
pr-resolve apply 123 --llm-enabled --show-metrics
```

Output:

```text
=== LLM Metrics Summary ===
Total requests: 42
Success rate: 97.6%
Latency: p50=0.234s, p95=1.456s, p99=2.103s
Total cost: $0.0523
Cache hit rate: 35.7%
Fallback rate: 4.8% (2 fallbacks)

Per-provider breakdown:
  anthropic: 35 requests, p95=1.234s, $0.0412
  ollama: 7 requests, p95=0.567s, $0.0000
```

### Exporting Metrics

Export metrics to a file for analysis:

```bash
# Export to JSON
pr-resolve apply 123 --llm-enabled --show-metrics --metrics-output metrics.json

# Export to CSV (per-request data)
pr-resolve apply 123 --llm-enabled --show-metrics --metrics-output metrics.csv
```

## JSON Export Format

The JSON export includes:

```json
{
  "summary": {
    "total_requests": 42,
    "successful_requests": 41,
    "failed_requests": 1,
    "success_rate": 0.976,
    "latency_p50": 0.234,
    "latency_p95": 1.456,
    "latency_p99": 2.103,
    "latency_avg": 0.567,
    "total_cost": 0.0523,
    "cost_per_comment": 0.00124,
    "cache_hit_rate": 0.357,
    "cache_savings": 0.0187,
    "fallback_count": 2,
    "fallback_rate": 0.048
  },
  "provider_stats": {
    "anthropic": {
      "provider": "anthropic",
      "model": "claude-haiku-4-20250514",
      "total_requests": 35,
      "successful_requests": 34,
      "failed_requests": 1,
      "success_rate": 0.971,
      "total_cost": 0.0412,
      "total_tokens": 15234,
      "avg_latency": 0.678,
      "latency_p50": 0.456,
      "latency_p95": 1.234,
      "latency_p99": 1.567,
      "cache_hit_rate": 0.286,
      "error_counts": {"RateLimitError": 1}
    }
  },
  "pr_info": {
    "owner": "VirtualAgentics",
    "repo": "my-repo",
    "pr_number": 123
  }
}
```

## CSV Export Format

The CSV export contains per-request data for detailed analysis:

| Column | Description |
|--------|-------------|
| `request_id` | Unique request identifier |
| `provider` | Provider name (anthropic, openai, ollama) |
| `model` | Model identifier |
| `latency_seconds` | Request duration |
| `success` | True/False |
| `tokens_input` | Input tokens consumed |
| `tokens_output` | Output tokens generated |
| `cost` | Request cost in USD |
| `cache_hit` | True if served from cache |
| `error_type` | Error class name if failed |

## Key Metrics Explained

### Latency Percentiles

* **p50 (median)**: Typical request latency
* **p95**: 95% of requests complete within this time
* **p99**: Worst-case latency (excluding outliers)

**Interpretation:**

* A large gap between p50 and p95 indicates inconsistent performance
* High p99 may indicate timeout issues or provider instability

### Success Rate

Percentage of requests that completed successfully.

**Targets:**

* \> 99%: Excellent
* 95-99%: Good
* < 95%: Investigate failures

### Cache Hit Rate

Percentage of requests served from the prompt cache.

**Impact:**

* Higher hit rate = lower costs and latency
* Typical range: 20-50% for varied PRs

### Fallback Rate

Percentage of comments where LLM parsing failed and the regex fallback was used.
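As a quick check against the target below, the fallback rate can be read straight from the JSON export. This is a minimal sketch, assuming a `metrics.json` file produced with `--metrics-output`; the field names are those shown in the JSON export format above:

```python
import json
from pathlib import Path

# Summary block of the JSON export described earlier in this guide
summary = json.loads(Path("metrics.json").read_text())["summary"]

# fallback_rate is reported as a fraction (0.048 == 4.8%)
rate = summary["fallback_rate"]
print(f"Fallback rate: {rate:.1%} ({summary['fallback_count']} fallbacks)")

# Guideline from this guide: keep the fallback rate under 10%
if rate >= 0.10:
    print("Warning: fallback rate above target -- investigate parsing failures")
```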
**Impact:**

* Higher fallback rate = lower parsing accuracy
* Target: < 10%

### Cache Savings

Estimated cost saved by cache hits, calculated as:

```text
savings = cache_hits × avg_non_cache_cost
```

## Programmatic Access

Use the `MetricsAggregator` class for custom integrations:

```python
from review_bot_automator.llm.metrics_aggregator import MetricsAggregator
from pathlib import Path

# Create aggregator
aggregator = MetricsAggregator()
aggregator.set_pr_info("owner", "repo", 123)

# Track requests
request_id = aggregator.start_request("anthropic", "claude-haiku-4")
# ... make LLM call ...
aggregator.end_request(
    request_id,
    success=True,
    tokens_input=100,
    tokens_output=50,
    cost=0.0012
)

# Get aggregated metrics
metrics = aggregator.get_aggregated_metrics(comments_processed=10)
print(f"Total cost: ${metrics.total_cost:.4f}")
print(f"Cost per comment: ${metrics.cost_per_comment:.4f}")

# Export
aggregator.export_json(Path("metrics.json"))
aggregator.export_csv(Path("metrics.csv"))

# Human-readable summary
print(aggregator.get_summary_report())
```

## Analyzing Metrics

### Cost Analysis

```python
import json
from pathlib import Path

data = json.loads(Path("metrics.json").read_text())

# Cost breakdown by provider
for provider, stats in data["provider_stats"].items():
    print(f"{provider}: ${stats['total_cost']:.4f} "
          f"({stats['total_requests']} requests)")

# Cost efficiency
summary = data["summary"]
print(f"Cost per comment: ${summary['cost_per_comment']:.4f}")
print(f"Cache savings: ${summary['cache_savings']:.4f}")
```

### Performance Analysis

```python
# Identify slow providers
for provider, stats in data["provider_stats"].items():
    if stats["latency_p95"] > 2.0:
        print(f"Warning: {provider} p95 latency is {stats['latency_p95']:.2f}s")

# Check error patterns
for provider, stats in data["provider_stats"].items():
    if stats["error_counts"]:
        print(f"{provider} errors: {stats['error_counts']}")
```

## Best Practices

1. **Enable metrics in CI/CD**: Track performance trends over time (a minimal sketch appears at the end of this guide)
2. **Set cost budgets**: Use `CR_LLM_COST_BUDGET` to prevent surprises
3. **Monitor fallback rate**: High rates indicate parsing issues
4. **Review error counts**: Identify provider-specific problems
5. **Export for analysis**: Use JSON/CSV for historical tracking

## See Also

* [Cost Estimation](cost-estimation.md) - Pre-run cost estimation
* [LLM Configuration](llm-configuration.md) - Full configuration reference
* [Performance Tuning](performance-tuning.md) - Optimizing performance
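## Appendix: Tracking Trends in CI

To act on best practices 1 and 5, one option is to append each run's summary to a small history file that CI can archive or plot. The sketch below is illustrative rather than part of the tool: it assumes a `metrics.json` file exported with `--metrics-output`, and the `metrics-history.csv` name is just a placeholder. All fields read from the export are documented in the JSON export format above.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path

data = json.loads(Path("metrics.json").read_text())
summary = data["summary"]
pr_info = data.get("pr_info", {})

# Fields worth trending over time (all present in the documented JSON export)
row = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "pr_number": pr_info.get("pr_number"),
    "total_requests": summary["total_requests"],
    "success_rate": summary["success_rate"],
    "latency_p95": summary["latency_p95"],
    "total_cost": summary["total_cost"],
    "cache_hit_rate": summary["cache_hit_rate"],
    "fallback_rate": summary["fallback_rate"],
}

# Append to a running history file (hypothetical name, archived by CI)
history = Path("metrics-history.csv")
write_header = not history.exists()
with history.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    if write_header:
        writer.writeheader()
    writer.writerow(row)
```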