# Local LLM Operation Guide

Complete guide for running Review Bot Automator with local LLM inference using Ollama to reduce third-party exposure.

> **See Also**: [Privacy Architecture](privacy-architecture.md) for privacy benefits and [Ollama Setup Guide](ollama-setup.md) for installation instructions.

## Table of Contents

* [Overview](#overview)
* [Prerequisites](#prerequisites)
* [Setup Process](#setup-process)
* [Privacy Verification](#privacy-verification)
* [Troubleshooting](#troubleshooting)
* [Maintenance and Updates](#maintenance-and-updates)
* [Best Practices](#best-practices)
* [Related Documentation](#related-documentation)
* [Frequently Asked Questions](#frequently-asked-questions)

---

## Overview

### Why Local LLM Operation

Local LLM operation with Ollama provides:

* ✅ **Reduced Third-Party Exposure**: LLM vendors (OpenAI/Anthropic) never see your code
* ✅ **Simpler Compliance**: One fewer data processor in your chain (no LLM vendor BAA/DPA)
* ✅ **Cost Savings**: Zero per-request LLM fees after hardware investment
* ✅ **Control**: You manage model updates and data retention
* ✅ **No LLM Rate Limits**: Process as many reviews as your hardware allows

### Important Limitations

**This tool is NOT air-gapped and CANNOT operate offline**:

* ❌ **Requires Internet**: Must fetch PR comments from the GitHub API
* ⚠️ **GitHub Has Access**: Your code is on GitHub (required for PR workflow)
* ⚠️ **CodeRabbit Has Access**: Review bot processes your code (required)
* ✅ **LLM Processing Local**: Only the LLM inference step is local

**What Ollama Actually Does**: Processes review comments locally instead of sending them to OpenAI/Anthropic. This eliminates LLM vendor exposure but does not eliminate GitHub or CodeRabbit access.

### What Works Locally

After setup, these features use local LLM inference:

* ✅ LLM-powered comment parsing (via local Ollama)
* ✅ Code review suggestion application
* ✅ Conflict resolution
* ✅ All pr-resolve commands (`apply`, `analyze`)

### What Requires Internet

Internet is always required for:

* ✅ **GitHub API**: Fetching PR data and review comments
* ✅ **GitHub Push**: Pushing resolved changes back to the PR
* ⚠️ **Initial Setup**: Downloading Ollama, models, and the Review Bot Automator package (provides the `pr-resolve` CLI)

---

## Prerequisites

Before starting local LLM operation, you need:

### System Requirements

* **OS**: Linux, macOS, or Windows (with WSL2)
* **RAM**: Minimum 8GB (16GB+ recommended)
* **Disk**: 10-20GB free space (for models)
* **Internet**: Required for GitHub API access
* **Optional**: GPU (NVIDIA, AMD, or Apple Silicon) for faster inference

### Software Requirements

* **Ollama**: Latest version
* **Python 3.12+**: With pip and venv
* **Review Bot Automator**: Latest version from PyPI (provides the `pr-resolve` CLI)
* **LLM Model**: At least one model (qwen2.5-coder:7b recommended)
* **GitHub Token**: For API access

---

## Setup Process

Follow these steps to set up local LLM operation:

### Step 1: Install Ollama

```bash
# Linux / macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Start Ollama service
ollama serve
```

For detailed installation instructions, see [Ollama Setup Guide](ollama-setup.md).

### Step 2: Download LLM Model

```bash
# Recommended: Qwen2.5 Coder (best quality/speed balance)
ollama pull qwen2.5-coder:7b

# Alternative: CodeLlama
ollama pull codellama:7b

# Verify model downloaded
ollama list
```

**Storage Note**: Models are stored in `~/.ollama/models/` and can be 4-8GB each.
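Before moving on, you can confirm the pulled model actually answers a prompt. A minimal smoke test, assuming the default Ollama port (11434); the prompt text is purely illustrative:

```bash
# One-shot prompt through the CLI (prints the response and exits)
ollama run qwen2.5-coder:7b "Write a one-line Python function that reverses a string."

# Same check against the REST API behind ollama_base_url
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Say hello in one word.", "stream": false}'
```

If either command hangs or errors, fix the Ollama installation before configuring pr-resolve.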
### Step 3: Install Review Bot Automator

```bash
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

# Install review-bot-automator (provides the pr-resolve CLI)
pip install review-bot-automator

# Verify installation
pr-resolve --version
```

### Step 4: Configure Local LLM

Create or update your configuration file:

**config.yaml**:

```yaml
llm:
  enabled: true
  provider: ollama
  model: qwen2.5-coder:7b
  ollama_base_url: http://localhost:11434
  fallback_to_regex: true

github:
  token: ${GITHUB_TOKEN}  # Set via environment variable
```

### Step 5: Set GitHub Token

```bash
# Set GitHub token (required for API access)
export GITHUB_TOKEN=ghp_your_token_here

# Verify GitHub API access
gh auth status
```

### Step 6: Test Local LLM Operation

```bash
# Test with actual PR
pr-resolve apply 123 --llm-preset ollama-local

# Or use custom config
pr-resolve apply 123 --config config.yaml
```

### Step 7: Verify Privacy

```bash
# Run privacy verification script
./scripts/verify_privacy.sh

# Expected output
# ✅ Privacy Verification: PASSED
# ✅ No external LLM connections detected
# ⚠️ GitHub API connections detected (expected)
```

**Note**: The verification script confirms that Ollama only uses localhost for LLM inference. GitHub API calls will still appear in network traffic (this is expected and required).

---

## Privacy Verification

### Automated Verification

Use the provided script to verify Ollama's localhost-only operation:

```bash
# Run privacy verification
./scripts/verify_privacy.sh

# Generates report: privacy-verification-report.md
```

**What This Verifies**:

* ✅ Ollama only communicates on localhost (127.0.0.1:11434)
* ✅ No connections to OpenAI or Anthropic LLM APIs
* ⚠️ GitHub API calls are not blocked (expected behavior)

**What This Does NOT Verify**:

* ❌ Does not prevent GitHub API access (required for the tool to function)
* ❌ Does not verify air-gapped operation (not possible with this tool)
* ❌ Does not prevent CodeRabbit from accessing your code

### Manual Verification

You can manually verify Ollama's local operation:

#### Linux

```bash
# Monitor network connections during inference
# (resolve hostnames so GitHub API connections can be filtered out)
sudo lsof -i -P | grep -v "github.com"

# Run inference
pr-resolve apply 123 --llm-preset ollama-local

# Check connections again - should only see localhost:11434
```

#### macOS

```bash
# Monitor network connections
lsof -i -P | grep -v "github.com"

# Run inference
pr-resolve apply 123 --llm-preset ollama-local

# Verify no new LLM vendor connections (OpenAI/Anthropic)
```

### Understanding Network Traffic

When using pr-resolve with Ollama, you will see:

✅ **Expected Connections**:

* `localhost:11434` (Ollama LLM inference)
* `api.github.com:443` (Fetching PR data)
* `github.com:443` (Pushing changes)

❌ **Connections That Should NOT Appear**:

* `api.openai.com` (OpenAI LLM vendor)
* `api.anthropic.com` (Anthropic LLM vendor)

---

## Troubleshooting

### Issue: "Connection refused to localhost:11434"

**Cause**: Ollama service not running

**Fix**:

```bash
# Start Ollama
ollama serve

# Or use systemd (Linux)
sudo systemctl start ollama

# Verify it's running
curl http://localhost:11434/api/version
```

### Issue: "Model not found"

**Cause**: Model not downloaded or wrong name

**Fix**:

```bash
# List available models
ollama list

# Pull missing model
ollama pull qwen2.5-coder:7b

# Verify the model name in your config
grep model config.yaml
```
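To catch this mismatch before running pr-resolve, you can compare the model name in `config.yaml` against what Ollama reports. A minimal pre-flight sketch, assuming the default Ollama port and that `jq` is installed:

```bash
# Hypothetical pre-flight check: is the configured model actually installed?
MODEL=$(grep 'model:' config.yaml | head -n 1 | awk '{print $2}')

# /api/tags lists locally installed models
if curl -s http://localhost:11434/api/tags | \
    jq -e --arg m "$MODEL" '.models[] | select(.name == $m)' > /dev/null; then
  echo "OK: $MODEL is installed"
else
  echo "Missing: run 'ollama pull $MODEL'"
fi
```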
### Issue: "GitHub API rate limit exceeded"

**Cause**: Too many GitHub API requests

**Fix**:

```bash
# Use authenticated token for higher rate limits
export GITHUB_TOKEN=ghp_your_token_here

# Check rate limit status
gh api rate_limit
```

### Issue: "Out of memory error"

**Cause**: Model too large for available RAM

**Fix**:

```bash
# Use smaller model
ollama pull qwen2.5-coder:3b  # Smaller version

# Update config
# model: qwen2.5-coder:3b

# Or add swap space (Linux)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```

### Issue: "Slow inference speed"

**Cause**: CPU inference without GPU acceleration

**Solutions**:

1. **Use GPU** if available (automatic with NVIDIA/AMD/Apple Silicon)
2. **Use smaller model** (3B instead of 7B)
3. **Increase RAM** allocation
4. **Close other applications** to free resources

```bash
# Check if GPU is being used
ollama ps

# Expected output shows GPU usage
# NAME              ...  SIZE   PROCESSOR
# qwen2.5-coder:7b  ...  4.7GB  100% GPU
```

---

## Maintenance and Updates

### Updating Ollama

```bash
# Linux/macOS: Re-run installer
curl -fsSL https://ollama.ai/install.sh | sh

# Restart Ollama service
sudo systemctl restart ollama  # Linux
# Or restart manually: ollama serve
```

### Updating Models

```bash
# Pull latest version of model
ollama pull qwen2.5-coder:7b

# Old version is automatically replaced
ollama list
```

### Managing Model Storage

```bash
# List all models with sizes
ollama list

# Remove unused models
ollama rm codellama:7b

# Check disk usage
du -sh ~/.ollama/models/
```

### Updating Review Bot Automator

```bash
# Update review-bot-automator (provides the pr-resolve CLI)
pip install --upgrade review-bot-automator

# Verify new version
pr-resolve --version
```

### Monitoring Resource Usage

```bash
# Monitor Ollama memory/CPU usage
ollama ps

# Linux: Monitor with htop
htop

# macOS: Monitor with Activity Monitor
open -a "Activity Monitor"
```

---

## Best Practices

### Security

1. ✅ **Keep Ollama localhost-only** (default: 127.0.0.1:11434; see the check after this list)
2. ✅ **Don't expose Ollama port** to external network
3. ✅ **Use encrypted disk** for model storage (optional)
4. ✅ **Keep GitHub token secure** (use environment variable)
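A quick way to confirm the first two points on Linux (a sketch; `ss` comes from iproute2, and `OLLAMA_HOST` is Ollama's bind-address variable):

```bash
# Confirm Ollama is listening on the loopback interface only
ss -tlnp | grep 11434
# Expected: 127.0.0.1:11434 (not 0.0.0.0:11434 or [::]:11434)

# Show the bind address override, if any (unset means the 127.0.0.1:11434 default)
echo "OLLAMA_HOST=${OLLAMA_HOST:-unset}"
```

If `OLLAMA_HOST` points at a non-local address, unset it and restart Ollama to restore the localhost default.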
### Performance

1. ✅ **Use GPU acceleration** when available
2. ✅ **Choose model size** based on RAM (7B for 16GB+, 3B for 8GB)
3. ✅ **Monitor resource usage** during inference
4. ✅ **Close unnecessary applications** during LLM processing

### Compliance

1. ✅ **Document data flows** for audits (GitHub → local LLM → GitHub)
2. ✅ **Keep privacy verification reports** (`privacy-verification-report.md`)
3. ✅ **Review model provenance** (use official Ollama registry only)
4. ⚠️ **Understand limitations** (GitHub/CodeRabbit still have access)

---

## Related Documentation

### Setup & Configuration

* [Ollama Setup Guide](ollama-setup.md) - Detailed Ollama installation
* [LLM Configuration Guide](llm-configuration.md) - Provider setup and presets
* [Configuration Guide](configuration.md) - General configuration options

### Privacy & Security

* [Privacy Architecture](privacy-architecture.md) - Comprehensive privacy analysis
* [Privacy FAQ](privacy-faq.md) - Common privacy questions answered
* [Security Architecture](security-architecture.md) - Overall security design

### Performance (Best Practices)

* [Performance Benchmarks](performance-benchmarks.md) - Provider performance comparison

---

## Frequently Asked Questions

### Q: Is this air-gapped operation?

**A: No.** This tool requires internet access to fetch PR comments from the GitHub API. Air-gapped operation is not possible because:

* Your code is already on GitHub (required for PR workflow)
* CodeRabbit processes your code (required for review comments)
* pr-resolve must fetch comments from the GitHub API

**What Ollama does**: Eliminates LLM vendor (OpenAI/Anthropic) exposure by processing review comments locally.

### Q: What data does GitHub see?

**A: Everything.** Your code is hosted on GitHub, and GitHub's terms of service apply. Review Bot Automator uses the GitHub API to fetch PR data.

### Q: What data does CodeRabbit see?

**A: Everything.** CodeRabbit (or any review bot) needs access to your code to generate review comments. This is required for the tool to function.

### Q: What data does Ollama/Local LLM see?

**A: Review comments and code context.** Ollama processes the review comments locally on your machine. This data never leaves your machine.

### Q: What's the actual privacy benefit?

**A: Eliminating LLM vendor exposure.** Instead of:

* GitHub (has access) + CodeRabbit (has access) + OpenAI/Anthropic (has access)

You get:

* GitHub (has access) + CodeRabbit (has access) + Local LLM (localhost only)

This reduces third-party exposure by one entity (the LLM vendor).

### Q: Can I use this offline?

**A: No.** Internet is required to:

* Fetch PR comments from the GitHub API
* Push resolved changes back to GitHub

Ollama inference runs locally, but the overall workflow requires internet connectivity.

### Q: Is this compliant with GDPR/HIPAA/SOC2?

**A: It helps, but doesn't solve everything.** Using Ollama:

* ✅ Reduces the number of data processors (one fewer)
* ✅ Simplifies the BAA/DPA chain (no LLM vendor agreement)
* ⚠️ Still requires agreements with GitHub and CodeRabbit

Your code being on GitHub is the primary compliance consideration, not the LLM provider choice.

---

**For more privacy details, see [Privacy Architecture](privacy-architecture.md).**