Local LLM Operation Guide
Complete guide for running Review Bot Automator with local LLM inference using Ollama to reduce third-party exposure.
See Also: Privacy Architecture for privacy benefits and Ollama Setup Guide for installation instructions.
Table of Contents
Overview
Prerequisites
Setup Process
Privacy Verification
Troubleshooting
Maintenance and Updates
Best Practices
Frequently Asked Questions
Overview
Why Local LLM Operation
Local LLM operation with Ollama provides:
✅ Reduced Third-Party Exposure: LLM vendors (OpenAI/Anthropic) never see your code
✅ Simpler Compliance: One fewer data processor in your chain (no LLM vendor BAA/DPA)
✅ Cost Savings: Zero per-request LLM fees after hardware investment
✅ Control: You manage model updates and data retention
✅ No LLM Rate Limits: Process as many reviews as your hardware allows
Important Limitations
This tool is NOT air-gapped and CANNOT operate offline:
❌ Requires Internet: Must fetch PR comments from GitHub API
⚠️ GitHub Has Access: Your code is on GitHub (required for PR workflow)
⚠️ CodeRabbit Has Access: Review bot processes your code (required)
✅ LLM Processing Local: Only the LLM inference step is local
What Ollama Actually Does: Processes review comments locally instead of sending them to OpenAI/Anthropic. This eliminates LLM vendor exposure but does not eliminate GitHub or CodeRabbit access.
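You can see this local-only inference path for yourself by calling Ollama's HTTP API directly on localhost (a minimal sketch; assumes the qwen2.5-coder:7b model from the setup steps below is already pulled):
# Send a prompt to the local Ollama API - the request never leaves your machine
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Summarize this review comment", "stream": false}'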
What Works Locally
After setup, these features use local LLM inference:
✅ LLM-powered comment parsing (via local Ollama)
✅ Code review suggestion application
✅ Conflict resolution
✅ All pr-resolve commands (apply, analyze)
What Requires Internet
Internet is always required for:
✅ GitHub API: Fetching PR data and review comments
✅ GitHub Push: Pushing resolved changes back to PR
⚠️ Initial Setup: Downloading Ollama, models, and the Review Bot Automator package (provides the pr-resolve CLI)
Prerequisites
Before starting local LLM operation, you need:
System Requirements
OS: Linux, macOS, or Windows (with WSL2)
RAM: Minimum 8GB (16GB+ recommended)
Disk: 10-20GB free space (for models)
Internet: Required for GitHub API access
Optional: GPU (NVIDIA, AMD, or Apple Silicon) for faster inference
Software Requirements
Ollama: Latest version
Python 3.12+: With pip and venv
Review Bot Automator: Latest version from PyPI (provides the pr-resolve CLI)
LLM Model: At least one model (qwen2.5-coder:7b recommended)
GitHub Token: For API access
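A quick way to sanity-check these prerequisites before starting (a rough sketch; adjust paths and versions to your environment):
# Verify the core tooling is present
ollama --version                              # Ollama installed?
python3 --version                             # Should report 3.12 or newer
python3 -m pip --version                      # pip available
df -h ~                                       # 10-20GB free for models?
echo "${GITHUB_TOKEN:+GITHUB_TOKEN is set}"   # GitHub token exported?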
Setup Process
Follow these steps to set up local LLM operation:
Step 1: Install Ollama
# Linux / macOS
curl -fsSL https://ollama.ai/install.sh | sh
# Verify installation
ollama --version
# Start Ollama service
ollama serve
For detailed installation instructions, see Ollama Setup Guide.
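Before continuing, it is worth confirming the Ollama server is actually reachable on its default localhost port (11434 unless you changed it):
# Should return a small JSON payload with the Ollama version
curl -s http://localhost:11434/api/version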
Step 2: Download LLM Model
# Recommended: Qwen2.5 Coder (best quality/speed balance)
ollama pull qwen2.5-coder:7b
# Alternative: CodeLlama
ollama pull codellama:7b
# Verify model downloaded
ollama list
Storage Note: Models are stored in ~/.ollama/models/ and can be 4-8GB each.
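To inspect a downloaded model's details (parameter count, quantization, context window), ollama show prints its metadata (shown here for the recommended model; substitute your own):
# Print metadata for the recommended model
ollama show qwen2.5-coder:7b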
Step 3: Install Review Bot Automator
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install review-bot-automator (provides the pr-resolve CLI)
pip install review-bot-automator
# Verify installation
pr-resolve --version
Step 4: Configure Local LLM
Create or update your configuration file:
config.yaml:
llm:
  enabled: true
  provider: ollama
  model: qwen2.5-coder:7b
  ollama_base_url: http://localhost:11434
  fallback_to_regex: true
github:
  token: ${GITHUB_TOKEN}  # Set via environment variable
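After saving the config, you can confirm the model named there is actually present in your local Ollama library (a minimal check; requires jq and assumes the default Ollama port):
# List locally available models - the name in config.yaml should appear here
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'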
Step 5: Set GitHub Token
# Set GitHub token (required for API access)
export GITHUB_TOKEN=ghp_your_token_here
# Verify GitHub API access
gh auth status
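If you are already authenticated with the GitHub CLI, you can reuse its stored token rather than pasting one manually (optional; assumes gh auth login has been completed):
# Export the token the GitHub CLI already holds
export GITHUB_TOKEN=$(gh auth token)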
Step 6: Test Local LLM Operation
# Test with actual PR
pr-resolve apply 123 --llm-preset ollama-local
# Or use custom config
pr-resolve apply 123 --config config.yaml
Step 7: Verify Privacy
# Run privacy verification script
./scripts/verify_privacy.sh
# Expected output
# ✅ Privacy Verification: PASSED
# ✅ No external LLM connections detected
# ⚠️ GitHub API connections detected (expected)
Note: The verification script confirms that Ollama only uses localhost for LLM inference. GitHub API calls will still appear in network traffic (this is expected and required).
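If you are curious what such a check involves, its core can be as simple as scanning open connections for LLM vendor endpoints (an illustrative sketch only; the bundled verify_privacy.sh may perform additional checks):
# Fail if any process holds a connection to an LLM vendor API
if sudo lsof -i @api.openai.com -i @api.anthropic.com >/dev/null 2>&1; then
  echo "❌ External LLM connection detected"
else
  echo "✅ No external LLM connections detected"
fi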
Privacy Verification
Automated Verification
Use the provided script to verify Ollama’s localhost-only operation:
# Run privacy verification
./scripts/verify_privacy.sh
# Generates report: privacy-verification-report.md
What This Verifies:
✅ Ollama only communicates on localhost (127.0.0.1:11434)
✅ No connections to OpenAI or Anthropic LLM APIs
⚠️ GitHub API calls are not blocked (expected behavior)
What This Does NOT Verify:
❌ Does not prevent GitHub API access (required for tool to function)
❌ Does not verify air-gapped operation (not possible with this tool)
❌ Does not prevent CodeRabbit from accessing your code
Manual Verification
You can manually verify Ollama’s local operation:
Linux
# Snapshot established network connections during inference
# (-n prints raw IPs, so GitHub API traffic appears as remote addresses on port 443)
sudo lsof -i -n -P | grep ESTABLISHED
# Run inference
pr-resolve apply 123 --llm-preset ollama-local
# Check connections again - besides GitHub traffic on port 443,
# you should only see localhost:11434 (Ollama)
macOS
# Snapshot established network connections
# (-n prints raw IPs; GitHub API traffic appears as remote addresses on port 443)
lsof -i -n -P | grep ESTABLISHED
# Run inference
pr-resolve apply 123 --llm-preset ollama-local
# Verify no new LLM vendor connections (OpenAI/Anthropic)
Understanding Network Traffic
When using pr-resolve with Ollama, you will see:
✅ Expected Connections:
localhost:11434 (Ollama LLM inference)
api.github.com:443 (Fetching PR data)
github.com:443 (Pushing changes)
❌ Connections That Should NOT Appear:
api.openai.com (OpenAI LLM vendor)
api.anthropic.com (Anthropic LLM vendor)
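On Linux, a quick way to spot-check live traffic against this list is to dump established connections with their ports (a sketch; expect only local port 11434 and remote port 443 for GitHub):
# List established TCP connections with numeric addresses
ss -tn state established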
Troubleshooting
Issue: “Connection refused to localhost:11434”
Cause: Ollama service not running
Fix:
# Start Ollama
ollama serve
# Or use systemd (Linux)
sudo systemctl start ollama
# Verify it's running
curl http://localhost:11434/api/version
Issue: “Model not found”
Cause: Model not downloaded or wrong name
Fix:
# List available models
ollama list
# Pull missing model
ollama pull qwen2.5-coder:7b
# Verify in config
grep model config.yaml
Issue: “GitHub API rate limit exceeded”
Cause: Too many GitHub API requests (unauthenticated requests have a much lower hourly limit)
Fix:
# Use authenticated token for higher rate limits
export GITHUB_TOKEN=ghp_your_token_here
# Check rate limit status
gh api rate_limit
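To see exactly how much quota remains and when it resets, you can narrow the output with gh's built-in --jq filter:
# Show remaining requests and reset time for the core REST API
gh api rate_limit --jq '.resources.core'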
Issue: “Out of memory error”
Cause: Model too large for available RAM
Fix:
# Use smaller model
ollama pull qwen2.5-coder:3b # Smaller version
# Update config
# model: qwen2.5-coder:3b
# Or add swap space (Linux)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
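The swap file created above disappears at reboot; to keep it, add an entry to /etc/fstab (standard Linux practice - skip this if you already manage swap another way):
# Persist the swap file across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab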
Issue: “Slow inference speed”
Cause: CPU inference without GPU acceleration
Solutions:
Use GPU if available (automatic with NVIDIA/AMD/Apple Silicon)
Use smaller model (3B instead of 7B)
Ensure enough free RAM so the model is not swapped to disk
Close other applications to free resources
# Check if GPU is being used
ollama ps
# Expected output shows GPU usage
# NAME ... SIZE PROCESSOR
# qwen2.5-coder:7b ... 4.7GB 100% GPU
Maintenance and Updates
Updating Ollama
# Linux/macOS: Re-run installer
curl -fsSL https://ollama.ai/install.sh | sh
# Restart Ollama service
sudo systemctl restart ollama # Linux
# Or restart manually: ollama serve
Updating Models
# Pull latest version of model
ollama pull qwen2.5-coder:7b
# Old version is automatically replaced
ollama list
Managing Model Storage
# List all models with sizes
ollama list
# Remove unused models
ollama rm codellama:7b
# Check disk usage
du -sh ~/.ollama/models/
Updating Review Bot Automator
# Update review-bot-automator (provides the pr-resolve CLI)
pip install --upgrade review-bot-automator
# Verify new version
pr-resolve --version
Monitoring Resource Usage
# Monitor Ollama memory/CPU usage
ollama ps
# Linux: Monitor with htop
htop
# macOS: Monitor with Activity Monitor
open -a "Activity Monitor"
Best Practices
Security
✅ Keep Ollama localhost-only (default: 127.0.0.1:11434)
✅ Don’t expose the Ollama port to external networks
✅ Use encrypted disk for model storage (optional)
✅ Keep GitHub token secure (use environment variable)
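To confirm the localhost-only binding, check the listening socket and any OLLAMA_HOST override (Linux sketch; on macOS, netstat -an | grep 11434 serves the same purpose):
# Listening address should be 127.0.0.1:11434, not 0.0.0.0:11434
ss -tlnp | grep 11434
# OLLAMA_HOST should be unset or point at a 127.0.0.1 address
env | grep OLLAMA_HOST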
Performance
✅ Use GPU acceleration when available
✅ Choose model size based on RAM (7B for 16GB+, 3B for 8GB)
✅ Monitor resource usage during inference
✅ Close unnecessary applications during LLM processing
Compliance
✅ Document data flows for audits (GitHub → local LLM → GitHub)
✅ Keep privacy verification reports (privacy-verification-report.md)
✅ Review model provenance (use official Ollama registry only)
⚠️ Understand limitations (GitHub/CodeRabbit still have access)
Frequently Asked Questions
Q: Is this air-gapped operation?
A: No. This tool requires internet access to fetch PR comments from GitHub API. Air-gapped operation is not possible because:
Your code is already on GitHub (required for PR workflow)
CodeRabbit processes your code (required for review comments)
pr-resolve must fetch comments from GitHub API
What Ollama does: Eliminates LLM vendor (OpenAI/Anthropic) exposure by processing review comments locally.
Q: What data does GitHub see?
A: Everything. Your code is hosted on GitHub, and GitHub’s terms of service apply. Review Bot Automator uses GitHub API to fetch PR data.
Q: What data does CodeRabbit see?
A: Everything. CodeRabbit (or any review bot) needs access to your code to generate review comments. This is required for the tool to function.
Q: What data does Ollama/Local LLM see?
A: Review comments and code context. Ollama processes the review comments locally on your machine. The data never leaves localhost.
Q: What’s the actual privacy benefit?
A: Eliminating LLM vendor exposure. Instead of:
GitHub (has access) + CodeRabbit (has access) + OpenAI/Anthropic (has access)
You get:
GitHub (has access) + CodeRabbit (has access) + Local LLM (localhost only)
This reduces third-party exposure by one entity (the LLM vendor).
Q: Can I use this offline?
A: No. Internet is required to:
Fetch PR comments from GitHub API
Push resolved changes back to GitHub
Ollama inference runs locally, but the overall workflow requires internet connectivity.
Q: Is this compliant with GDPR/HIPAA/SOC2?
A: It helps, but doesn’t solve everything. Using Ollama:
✅ Reduces the number of data processors (one fewer)
✅ Simplifies BAA/DPA chain (no LLM vendor agreement)
⚠️ Still requires agreements with GitHub and CodeRabbit
Your code being on GitHub is the primary compliance consideration, not the LLM provider choice.
For more privacy details, see Privacy Architecture.