Local LLM Operation Guide

Complete guide for running Review Bot Automator with local LLM inference using Ollama to reduce third-party exposure.

See Also: Privacy Architecture for privacy benefits and Ollama Setup Guide for installation instructions.

Table of Contents

  • Overview
  • Prerequisites
  • Setup Process
  • Privacy Verification
  • Troubleshooting
  • Maintenance and Updates
  • Best Practices
  • Frequently Asked Questions

Overview

Why Local LLM Operation

Local LLM operation with Ollama provides:

  • Reduced Third-Party Exposure: LLM vendors (OpenAI/Anthropic) never see your code

  • Simpler Compliance: One fewer data processor in your chain (no LLM vendor BAA/DPA)

  • Cost Savings: Zero per-request LLM fees after hardware investment

  • Control: You manage model updates and data retention

  • No LLM Rate Limits: Process as many reviews as your hardware allows

Important Limitations

This tool is NOT air-gapped and CANNOT operate offline:

  • Requires Internet: Must fetch PR comments from GitHub API

  • ⚠️ GitHub Has Access: Your code is on GitHub (required for PR workflow)

  • ⚠️ CodeRabbit Has Access: Review bot processes your code (required)

  • Only LLM Inference Is Local: Just the LLM processing step runs on your machine

What Ollama Actually Does: Processes review comments locally instead of sending them to OpenAI/Anthropic. This eliminates LLM vendor exposure but does not eliminate GitHub or CodeRabbit access.

What Works Locally

After setup, these features use local LLM inference:

  • ✅ LLM-powered comment parsing (via local Ollama)

  • ✅ Code review suggestion application

  • ✅ Conflict resolution

  • ✅ All pr-resolve commands (apply, analyze)

What Requires Internet

Internet is always required for:

  • GitHub API: Fetching PR data and review comments

  • GitHub Push: Pushing resolved changes back to PR

  • ⚠️ Initial Setup: Downloading Ollama, models, and the Review Bot Automator package (provides the pr-resolve CLI)


Prerequisites

Before starting local LLM operation, you need:

System Requirements

  • OS: Linux, macOS, or Windows (with WSL2)

  • RAM: Minimum 8GB (16GB+ recommended)

  • Disk: 10-20GB free space (for models)

  • Internet: Required for GitHub API access

  • Optional: GPU (NVIDIA, AMD, or Apple Silicon) for faster inference
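
You can confirm most of these requirements with standard OS utilities before installing anything. The commands below are common defaults, not part of the tool; adjust for your distribution and skip the GPU check if you plan to run on CPU:

# Linux: check RAM and free disk space (models are stored under ~/.ollama)
free -h
df -h ~

# Linux: check for an NVIDIA GPU (optional)
nvidia-smi

# macOS: check RAM (reported in bytes) and GPU model
sysctl hw.memsize
system_profiler SPDisplaysDataType | grep "Chipset Model"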

Software Requirements

  • Ollama: Latest version

  • Python 3.12+: With pip and venv

  • Review Bot Automator: Latest version from PyPI (provides the pr-resolve CLI)

  • LLM Model: At least one model (qwen2.5-coder:7b recommended)

  • GitHub Token: For API access
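
The setup steps below install these prerequisites, but if you are reusing an existing environment you can confirm them up front (gh is optional and only needed for the GitHub CLI checks shown later):

# Confirm prerequisite tooling is present
ollama --version
python3 --version
pip --version
gh --version   # optional, used for token checks later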


Setup Process

Follow these steps to set up local LLM operation:

Step 1: Install Ollama

# Linux / macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Start Ollama service
ollama serve

For detailed installation instructions, see Ollama Setup Guide.

Step 2: Download LLM Model

# Recommended: Qwen2.5 Coder (best quality/speed balance)
ollama pull qwen2.5-coder:7b

# Alternative: CodeLlama
ollama pull codellama:7b

# Verify model downloaded
ollama list

Storage Note: Models are stored in ~/.ollama/models/ and can be 4-8GB each.
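
Before wiring the model into Review Bot Automator, you can smoke-test it directly with Ollama. The prompt below is arbitrary; any short prompt that produces a response is enough:

# Smoke test: send one prompt, confirm the model responds, then exit
ollama run qwen2.5-coder:7b "Write a one-line Python function that adds two numbers."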

Step 3: Install Review Bot Automator

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate    # Windows

# Install review-bot-automator (provides the pr-resolve CLI)
pip install review-bot-automator

# Verify installation
pr-resolve --version

Step 4: Configure Local LLM

Create or update your configuration file:

config.yaml:

llm:
  enabled: true
  provider: ollama
  model: qwen2.5-coder:7b
  ollama_base_url: http://localhost:11434
  fallback_to_regex: true

github:
  token: ${GITHUB_TOKEN}  # Set via environment variable
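
To catch YAML syntax errors before the first run, you can parse the file from the virtual environment. This sketch assumes PyYAML is importable there; if it is not, install it with pip install pyyaml:

# Sanity-check config.yaml syntax (requires PyYAML in the active venv)
python3 -c "import yaml; yaml.safe_load(open('config.yaml')); print('config.yaml parses OK')"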

Step 5: Set GitHub Token

# Set GitHub token (required for API access)
export GITHUB_TOKEN=ghp_your_token_here

# Verify GitHub API access
gh auth status
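
If you already authenticate with the GitHub CLI, you can reuse its stored token instead of creating a separate one (gh auth token is available in recent gh releases):

# Alternative: reuse the GitHub CLI's stored token
export GITHUB_TOKEN=$(gh auth token)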

Step 6: Test Local LLM Operation

# Test with actual PR
pr-resolve apply 123 --llm-preset ollama-local

# Or use custom config
pr-resolve apply 123 --config config.yaml
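
If the test fails, it helps to confirm that Ollama itself answers before debugging the tool. The request below calls Ollama's REST API directly on localhost; the prompt is arbitrary:

# Verify Ollama responds to a generation request on localhost
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Say hello in one word.",
  "stream": false
}'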

Step 7: Verify Privacy

# Run privacy verification script
./scripts/verify_privacy.sh

# Expected output
# ✅ Privacy Verification: PASSED
# ✅ No external LLM connections detected
# ⚠️  GitHub API connections detected (expected)

Note: The verification script confirms that Ollama only uses localhost for LLM inference. GitHub API calls will still appear in network traffic (this is expected and required).


Privacy Verification

Automated Verification

Use the provided script to verify Ollama’s localhost-only operation:

# Run privacy verification
./scripts/verify_privacy.sh

# Generates report: privacy-verification-report.md

What This Verifies:

  • ✅ Ollama only communicates on localhost (127.0.0.1:11434)

  • ✅ No connections to OpenAI or Anthropic LLM APIs

  • ⚠️ GitHub API calls are not blocked (expected behavior)

What This Does NOT Verify:

  • ❌ Does not prevent GitHub API access (required for tool to function)

  • ❌ Does not verify air-gapped operation (not possible with this tool)

  • ❌ Does not prevent CodeRabbit from accessing your code

Manual Verification

You can manually verify Ollama’s local operation:

Linux

# Monitor network connections during inference
# (drop lsof's -n so hostnames resolve, then filter out GitHub connections)
sudo lsof -i -P | grep -v "github.com"

# Run inference
pr-resolve apply 123 --llm-preset ollama-local

# Check connections again - should only see localhost:11434

macOS

# Monitor network connections
lsof -i -P | grep -v "github.com"

# Run inference
pr-resolve apply 123 --llm-preset ollama-local

# Verify no new LLM vendor connections (OpenAI/Anthropic)

Understanding Network Traffic

When using pr-resolve with Ollama, you will see:

Expected Connections:

  • localhost:11434 (Ollama LLM inference)

  • api.github.com:443 (Fetching PR data)

  • github.com:443 (Pushing changes)

Connections That Should NOT Appear:

  • api.openai.com (OpenAI LLM vendor)

  • api.anthropic.com (Anthropic LLM vendor)
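
As an additional spot-check, you can watch DNS lookups while pr-resolve runs; queries for LLM vendor domains should never appear. This is a rough check only: it captures on the default interface and will not catch cached lookups or DNS-over-HTTPS.

# Terminal 1: watch DNS queries for LLM vendor domains (expect no output)
sudo tcpdump -l port 53 | grep -iE "openai|anthropic"

# Terminal 2: run a normal resolve
pr-resolve apply 123 --llm-preset ollama-local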


Troubleshooting

Issue: “Connection refused to localhost:11434”

Cause: Ollama service not running

Fix:

# Start Ollama
ollama serve

# Or use systemd (Linux)
sudo systemctl start ollama

# Verify it's running
curl http://localhost:11434/api/version

Issue: “Model not found”

Cause: Model not downloaded or wrong name

Fix:

# List available models
ollama list

# Pull missing model
ollama pull qwen2.5-coder:7b

# Verify in config
grep model config.yaml

Issue: “GitHub API rate limit exceeded”

Cause: Too many GitHub API requests

Fix:

# Use authenticated token for higher rate limits
export GITHUB_TOKEN=ghp_your_token_here

# Check rate limit status
gh api rate_limit

Issue: “Out of memory error”

Cause: Model too large for available RAM

Fix:

# Use smaller model
ollama pull qwen2.5-coder:3b  # Smaller version

# Update config
# model: qwen2.5-coder:3b

# Or add swap space (Linux)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Issue: “Slow inference speed”

Cause: CPU inference without GPU acceleration

Solutions:

  1. Use GPU if available (automatic with NVIDIA/AMD/Apple Silicon)

  2. Use smaller model (3B instead of 7B)

  3. Increase RAM allocation

  4. Close other applications to free resources

# Check if GPU is being used
ollama ps

# Expected output shows GPU usage
# NAME              ... SIZE    PROCESSOR
# qwen2.5-coder:7b  ... 4.7GB   100% GPU

Maintenance and Updates

Updating Ollama

# Linux/macOS: Re-run installer
curl -fsSL https://ollama.ai/install.sh | sh

# Restart Ollama service
sudo systemctl restart ollama  # Linux
# Or restart manually: ollama serve

Updating Models

# Pull latest version of model
ollama pull qwen2.5-coder:7b

# Old version is automatically replaced
ollama list

Managing Model Storage

# List all models with sizes
ollama list

# Remove unused models
ollama rm codellama:7b

# Check disk usage
du -sh ~/.ollama/models/

Updating Review Bot Automator

# Update review-bot-automator (provides the pr-resolve CLI)
pip install --upgrade review-bot-automator

# Verify new version
pr-resolve --version

Monitoring Resource Usage

# Monitor Ollama memory/CPU usage
ollama ps

# Linux: Monitor with htop
htop

# macOS: Monitor with Activity Monitor
open -a "Activity Monitor"

Best Practices

Security

  1. Keep Ollama localhost-only (default: 127.0.0.1:11434)

  2. Don’t expose the Ollama port to the external network (see the bind-address check after this list)

  3. Use encrypted disk for model storage (optional)

  4. Keep GitHub token secure (use environment variable)
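
To confirm items 1 and 2, check which address the Ollama port is bound to. Ollama listens on 127.0.0.1:11434 unless the OLLAMA_HOST environment variable overrides it; ss is the Linux form, lsof works on macOS:

# Linux: the listener should show 127.0.0.1:11434, not 0.0.0.0:11434
ss -tlnp | grep 11434

# macOS
lsof -iTCP:11434 -sTCP:LISTEN -n -P

# Empty output here means the default localhost bind is in effect
echo "$OLLAMA_HOST"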

Performance

  1. Use GPU acceleration when available

  2. Choose model size based on RAM (7B for 16GB+, 3B for 8GB)

  3. Monitor resource usage during inference

  4. Close unnecessary applications during LLM processing

Compliance

  1. Document data flows for audits (GitHub → local LLM → GitHub)

  2. Keep privacy verification reports (privacy-verification-report.md)

  3. Review model provenance (use official Ollama registry only)

  4. ⚠️ Understand limitations (GitHub/CodeRabbit still have access)



Frequently Asked Questions

Q: Is this air-gapped operation?

A: No. This tool requires internet access to fetch PR comments from GitHub API. Air-gapped operation is not possible because:

  • Your code is already on GitHub (required for PR workflow)

  • CodeRabbit processes your code (required for review comments)

  • pr-resolve must fetch comments from GitHub API

What Ollama does: Eliminates LLM vendor (OpenAI/Anthropic) exposure by processing review comments locally.

Q: What data does GitHub see?

A: Everything. Your code is hosted on GitHub, and GitHub’s terms of service apply. Review Bot Automator uses GitHub API to fetch PR data.

Q: What data does CodeRabbit see?

A: Everything. CodeRabbit (or any review bot) needs access to your code to generate review comments. This is required for the tool to function.

Q: What data does Ollama/Local LLM see?

A: Review comments and code context. Ollama processes the review comments locally on your machine. The data never leaves localhost.

Q: What’s the actual privacy benefit?

A: Eliminating LLM vendor exposure. Instead of:

  • GitHub (has access) + CodeRabbit (has access) + OpenAI/Anthropic (has access)

You get:

  • GitHub (has access) + CodeRabbit (has access) + Local LLM (localhost only)

This reduces third-party exposure by one entity (the LLM vendor).

Q: Can I use this offline?

A: No. Internet is required to:

  • Fetch PR comments from GitHub API

  • Push resolved changes back to GitHub

Ollama inference runs locally, but the overall workflow requires internet connectivity.

Q: Is this compliant with GDPR/HIPAA/SOC2?

A: It helps, but doesn’t solve everything. Using Ollama:

  • ✅ Reduces the number of data processors (one fewer)

  • ✅ Simplifies BAA/DPA chain (no LLM vendor agreement)

  • ⚠️ Still requires agreements with GitHub and CodeRabbit

Your code being on GitHub is the primary compliance consideration, not the LLM provider choice.


For more privacy details, see Privacy Architecture.