# Confidence Threshold Guide

The confidence threshold determines when to accept LLM-parsed changes versus falling back to regex parsing or rejecting the change entirely.

## Overview

When the LLM parses a CodeRabbit comment, it returns a confidence score (0.0-1.0) indicating how certain it is about the extracted changes. The confidence threshold filters out low-quality parses.

```text
Confidence Score:  0.0 ───────────────────────────────────────── 1.0
                    │              │                              │
                    │     REJECT   │      ACCEPT                  │
                    │  (fallback)  │   (use LLM result)           │
                    │              │                              │
                    └──────────────┴──────────────────────────────┘
                              threshold (default: 0.5)
```

## Configuration

### CLI Option

```bash
pr-resolve apply 123 --llm-enabled --llm-confidence-threshold 0.7
```

### Environment Variable

```bash
export CR_LLM_CONFIDENCE_THRESHOLD=0.7
```

### Config File (YAML)

```yaml
llm:
  confidence_threshold: 0.7
```

### Config File (TOML)

```toml
[llm]
confidence_threshold = 0.7
```

## How Confidence is Calculated

The LLM assigns confidence based on:

1. **Comment structure clarity**: Well-formatted suggestions score higher
2. **Code block presence**: Explicit code blocks increase confidence
3. **File path clarity**: Clear file references improve confidence
4. **Change type recognition**: Standard patterns (add, replace, delete) score higher

### Example Scores

| Comment Type | Typical Confidence |
|--------------|-------------------|
| Clear code suggestion with file path | 0.9-1.0 |
| Code block without explicit file | 0.7-0.85 |
| Inline suggestion | 0.5-0.7 |
| Ambiguous or complex comment | 0.3-0.5 |
| Unrelated or non-actionable | 0.0-0.3 |

## Threshold Recommendations

### Low Threshold (0.3-0.5)

**Use when:**

* You want maximum coverage
* Manual review will catch errors
* Comments are typically well-formatted

```yaml
llm:
  confidence_threshold: 0.3
  fallback_to_regex: true  # Let regex handle low-confidence
```

**Trade-off**: More changes applied, but some may be incorrect

### Default Threshold (0.5)

**Use when:**

* Balanced accuracy and coverage
* Standard CodeRabbit comment formats
* Automated pipelines with some oversight

```yaml
llm:
  confidence_threshold: 0.5
```

**Trade-off**: Good balance of coverage and accuracy

### High Threshold (0.7-0.9)

**Use when:**

* Accuracy is critical
* Automated apply without manual review
* Production environments

```yaml
llm:
  confidence_threshold: 0.8
  fallback_to_regex: false  # Don't apply uncertain changes
```

**Trade-off**: Fewer changes applied, but higher accuracy

## Interaction with Fallback

When `fallback_to_regex: true` (default), low-confidence LLM results trigger regex parsing:

```text
┌─────────────────────────────────────────────────────────────────┐
│                         Comment Input                           │
│                              │                                  │
│                              ▼                                  │
│                      ┌───────────────┐                          │
│                      │  LLM Parser   │                          │
│                      └───────────────┘                          │
│                              │                                  │
│                              ▼                                  │
│                  confidence >= threshold?                       │
│                     │              │                            │
│                    YES            NO                            │
│                     │              │                            │
│                     ▼              ▼                            │
│              ┌─────────┐   ┌─────────────┐                      │
│              │  Accept │   │ Regex       │                      │
│              │  LLM    │   │ Fallback    │                      │
│              │  Result │   │             │                      │
│              └─────────┘   └─────────────┘                      │
│                                  │                              │
│                                  ▼                              │
│                           regex match?                          │
│                           │          │                          │
│                          YES        NO                          │
│                           │          │                          │
│                           ▼          ▼                          │
│                     ┌─────────┐ ┌─────────┐                     │
│                     │ Accept  │ │ Reject  │                     │
│                     │ Regex   │ │ Comment │                     │
│                     └─────────┘ └─────────┘                     │
└─────────────────────────────────────────────────────────────────┘
```

### Disabling Fallback

For strict LLM-only parsing:

```yaml
llm:
  confidence_threshold: 0.7
  fallback_to_regex: false
```

With fallback disabled, low-confidence results are rejected entirely.

## Monitoring Confidence

### Check Fallback Rate

High fallback rates indicate the threshold may be too high:

```bash
pr-resolve apply 123 --llm-enabled --show-metrics
```

Output:

```text
Fallback rate: 15.2% (8 fallbacks)
```

**Interpretation:**

* < 5%: Threshold is appropriate
* 5-15%: Consider lowering threshold slightly
* \> 15%: Threshold may be too high for your comment style

### Per-Request Confidence

Export metrics for detailed analysis:

```bash
pr-resolve apply 123 --llm-enabled --show-metrics --metrics-output metrics.json
```

## Tuning Workflow

1. **Start with default (0.5)**:

   ```bash
   pr-resolve apply 123 --llm-enabled --show-metrics
   ```

2. **Check fallback rate**: If > 10%, try lowering threshold

3. **Check accuracy**: Review applied changes for correctness

4. **Adjust as needed**:

   ```bash
   # Lower threshold if too many fallbacks
   pr-resolve apply 123 --llm-confidence-threshold 0.4

   # Raise threshold if seeing incorrect parses
   pr-resolve apply 123 --llm-confidence-threshold 0.7
   ```

## Common Issues

### High Fallback Rate

**Symptom**: Fallback rate > 20%

**Causes:**

* Threshold too high for comment style
* Complex or non-standard comment formats
* LLM having trouble with specific patterns

**Solutions:**

* Lower confidence threshold
* Check if comments follow CodeRabbit format
* Review fallback comments for patterns

### Incorrect Parses

**Symptom**: Applied changes don't match comment intent

**Causes:**

* Threshold too low
* Ambiguous comments
* LLM misinterpretation

**Solutions:**

* Raise confidence threshold
* Enable stricter validation
* Review comment formatting

### No Changes Applied

**Symptom**: All comments rejected or fallback

**Causes:**

* Threshold too high
* LLM provider issues
* Non-actionable comments

**Solutions:**

* Lower confidence threshold
* Check LLM provider connectivity
* Verify comments contain code suggestions

## See Also

* [LLM Configuration](llm-configuration.md) - Full configuration reference
* [Troubleshooting](troubleshooting.md) - Common issues and solutions
* [Metrics Guide](metrics-guide.md) - Understanding metrics output