Parallel Processing Guide
The Review Bot Automator supports parallel processing to significantly speed up change application for large PRs. This guide explains how parallel processing works, when to use it, and how to optimize performance.
Table of Contents
Overview
Performance Improvements
How It Works
When to Use Parallel Processing
Usage
Configuration
Performance Optimization
Architecture
Safety and Thread Safety
Benchmarking
Use Cases
Troubleshooting
Best Practices
Limitations
See Also
Overview
Parallel processing enables the resolver to apply changes to multiple files concurrently, leveraging multiple CPU cores to dramatically reduce processing time for large PRs.
Key Features
File-Level Parallelization: Processes different files concurrently
Sequential Per-File Processing: Changes to the same file applied sequentially (prevents race conditions)
Thread-Safe: Uses locks and thread-safe collections
Configurable Workers: Adjust worker count based on workload
Exception Handling: Propagates exceptions from worker threads
Compatible with Rollback: Works seamlessly with rollback system
Performance Improvements
Parallel processing provides significant speedup for large PRs:
| PR Size | Sequential | Parallel (4 workers) | Parallel (8 workers) | Speedup |
|---|---|---|---|---|
| 10 files | 5s | 4s | 4s | 1.25x |
| 30 files | 15s | 5s | 4s | 3.75x |
| 100 files | 50s | 15s | 10s | 5x |
| 300 files | 150s | 40s | 22s | 6.8x |
Note: Actual performance depends on system resources, file sizes, and change complexity.
How It Works
Processing Model
The parallel processing system uses a file-level parallelization model:
Group Changes by File: All changes are grouped by file path
Create Worker Pool: ThreadPoolExecutor with configurable workers
Submit File Tasks: Each file’s changes submitted as one task
Process Files in Parallel: Different files processed concurrently
Process Changes Sequentially: Changes to same file applied in order
Collect Results: Thread-safe accumulation of applied/skipped/failed changes
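A compact, self-contained sketch of this model with toy data (a fuller simplified implementation appears in the Architecture section):

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy (path, change) pairs standing in for real Change objects
changes = [("a.py", "c1"), ("b.py", "c2"), ("a.py", "c3")]

# 1. Group changes by file, preserving input order within each file
by_file = defaultdict(list)
for path, change in changes:
    by_file[path].append(change)

applied, lock = [], threading.Lock()  # 6. thread-safe result accumulation

def process_file(file_changes):
    for change in file_changes:  # 5. sequential within one file
        with lock:
            applied.append(change)

# 2-4. One task per file; different files run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_file, fc) for fc in by_file.values()]
    for f in futures:
        f.result()  # surface worker exceptions

print(applied)  # a.py's changes stay in order: c1 before c3
```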
Why File-Level Parallelization?
Prevents Race Conditions:
Multiple concurrent modifications to same file would cause conflicts
Sequential processing per file ensures consistency
Different files are independent, safe to process concurrently
Maintains Change Order:
Changes to same file applied in original order
Dependencies between changes in same file preserved
No risk of out-of-order application
Thread Safety Mechanisms
Thread-Safe Collections: Separate locks for applied, skipped, failed lists
File Grouping: No concurrent access to same file
Exception Propagation: Worker thread exceptions caught and reported
Future Tracking: Maps futures to changes for proper error attribution
When to Use Parallel Processing
Enable Parallel Processing
✅ Large PRs (30+ files)
Significant speedup from concurrent file processing
Worker overhead amortized over many files
✅ Independent File Changes
Changes don’t depend on each other across files
No cross-file dependencies
✅ I/O-Bound Workloads
File reading/writing dominates processing time
Multiple threads reduce I/O wait time
✅ Time-Critical Resolutions
Need to apply changes quickly
Willing to trade resource usage for speed
✅ Sufficient System Resources
4+ CPU cores available
Adequate memory for worker threads
Disable Parallel Processing
❌ Small PRs (< 10 files)
Worker overhead exceeds benefits
Sequential processing is faster
❌ Dependent Changes Across Files
Changes in one file depend on changes in another
Sequential processing ensures correct order
❌ Debugging Sessions
Sequential execution easier to trace
Logging order is clearer
❌ Limited System Resources
System has < 4 CPU cores
Memory constrained environment
❌ Single-File PRs
All changes to one file (no parallelization possible)
Sequential processing required
Usage
CLI Usage
Enable Parallel Processing
# Enable with default 4 workers
pr-resolve apply --pr 123 --owner myorg --repo myrepo --parallel
# Enable with custom worker count
pr-resolve apply --pr 123 --owner myorg --repo myrepo --parallel --max-workers 8
# Enable with other options
pr-resolve apply --pr 123 --owner myorg --repo myrepo \
--mode conflicts-only \
--parallel \
--max-workers 16 \
--rollback \
--validation
Disable Parallel Processing (Default)
# Parallel disabled by default
pr-resolve apply --pr 123 --owner myorg --repo myrepo
# Can explicitly specify sequential (not necessary)
pr-resolve apply --pr 123 --owner myorg --repo myrepo --max-workers 1
Environment Variables
# Enable parallel processing
export CR_PARALLEL="true"
export CR_MAX_WORKERS="8"
pr-resolve apply --pr 123 --owner myorg --repo myrepo
Configuration File
YAML:
# config.yaml
parallel:
  enabled: true
  max_workers: 8
TOML:
# config.toml
[parallel]
enabled = true
max_workers = 8
pr-resolve apply --pr 123 --owner myorg --repo myrepo --config config.yaml
Python API
Using ConflictResolver
from pr_conflict_resolver import ConflictResolver
from pr_conflict_resolver.config import PresetConfig
# Initialize resolver
resolver = ConflictResolver(config=PresetConfig.BALANCED)
# Apply with parallel processing
results = resolver.resolve_pr_conflicts(
    owner="myorg",
    repo="myrepo",
    pr_number=123,
    parallel=True,   # Enable parallel processing
    max_workers=8    # 8 worker threads
)
print(f"Applied: {results.applied_count} changes")
print(f"Time: {results.elapsed_time:.2f}s")
Using apply_changes Directly
from pr_conflict_resolver import ConflictResolver
resolver = ConflictResolver()
# Apply changes with parallel processing
applied, skipped, failed = resolver.apply_changes(
    changes=my_changes,
    validate=True,
    parallel=True,
    max_workers=8
)
print(f"Applied: {len(applied)}")
print(f"Skipped: {len(skipped)}")
print(f"Failed: {len(failed)}")
Configuration
Configuration Precedence
Configuration sources (highest to lowest priority):
1. CLI flags (--parallel, --max-workers N)
2. Environment variables (CR_PARALLEL, CR_MAX_WORKERS)
3. Configuration file (parallel.enabled, parallel.max_workers)
4. Defaults (parallel disabled, 4 workers)
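A minimal sketch of how this precedence could be resolved for the worker count; the loader below is illustrative, not the resolver's actual configuration code:

```python
import os

def resolve_max_workers(cli_value=None, file_config=None):
    """Illustrative precedence: CLI > environment > config file > default."""
    if cli_value is not None:                      # 1. CLI flag
        return int(cli_value)
    env_value = os.environ.get("CR_MAX_WORKERS")   # 2. environment variable
    if env_value is not None:
        return int(env_value)
    if file_config:                                # 3. config file mapping
        return int(file_config.get("parallel", {}).get("max_workers", 4))
    return 4                                       # 4. default

print(resolve_max_workers())                       # 4 (default)
print(resolve_max_workers(cli_value=8))            # 8 (CLI wins over everything)
```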
Configuration Options
| Option | Type | CLI | Environment | Config File | Default |
|---|---|---|---|---|---|
| Enable | boolean | --parallel | CR_PARALLEL | parallel.enabled | false |
| Workers | integer | --max-workers N | CR_MAX_WORKERS | parallel.max_workers | 4 |
Worker Count Guidelines
By PR Size
# Small PRs (10-30 files): 2-4 workers
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 4
# Medium PRs (30-100 files): 4-8 workers
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 8
# Large PRs (100-300 files): 8-16 workers
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 16
# Very large PRs (300+ files): 16-32 workers
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 32
By CPU Cores
# Conservative: Match CPU core count
WORKERS=$(nproc)
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers $WORKERS
# Aggressive: 2x CPU cores (for I/O-bound workloads)
WORKERS=$(($(nproc) * 2))
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers $WORKERS
# Maximum: 4x CPU cores (only for very large PRs with heavy I/O)
WORKERS=$(($(nproc) * 4))
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers $WORKERS
Configuration Examples
Example 1: Development Environment
# dev-config.yaml - Optimize for debugging
mode: all
parallel:
  enabled: false  # Disable for easier debugging
rollback:
  enabled: true
logging:
  level: DEBUG
Example 2: Production Environment
# prod-config.yaml - Balance speed and safety
mode: conflicts-only
parallel:
  enabled: true
  max_workers: 8  # Moderate parallelism
rollback:
  enabled: true  # Always enable safety
validation:
  enabled: true
logging:
  level: INFO
Example 3: High-Performance Environment
# perf-config.yaml - Maximum speed
mode: all
parallel:
  enabled: true
  max_workers: 32  # High parallelism
rollback:
  enabled: true  # Keep safety enabled
validation:
  enabled: false  # Disable for speed
logging:
  level: WARNING  # Minimal logging overhead
Performance Optimization
Optimal Worker Count
The optimal worker count depends on several factors:
System Resources
# Check CPU core count
nproc
# Check available memory (ensure ~100MB per worker)
free -h
# Recommended formula for I/O-bound work:
WORKERS=$(($(nproc) * 2))
PR Characteristics
| PR Characteristic | Recommended Workers | Rationale |
|---|---|---|
| 10-30 small files | 2-4 | Low parallelization benefit |
| 30-100 medium files | 4-8 | Good parallelization benefit |
| 100-300 large files | 8-16 | High parallelization benefit |
| 300+ files | 16-32 | Maximum parallelization |
| Files with conflicts | Reduce by 25% | Conflict resolution overhead |
| Large files (1MB+) | Reduce by 50% | I/O becomes bottleneck |
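One possible reading of this table as code, picking the upper end of each range (the helper name and exact cutoffs are illustrative):

```python
def recommend_workers(file_count, has_conflicts=False, has_large_files=False):
    """Illustrative helper derived from the table above."""
    if file_count <= 30:
        workers = 4
    elif file_count <= 100:
        workers = 8
    elif file_count <= 300:
        workers = 16
    else:
        workers = 32
    if has_conflicts:
        workers = int(workers * 0.75)  # reduce by 25% for conflict overhead
    if has_large_files:
        workers = int(workers * 0.50)  # reduce by 50% when I/O dominates
    return max(workers, 1)

print(recommend_workers(150, has_conflicts=True))  # 16 -> 12
```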
Benchmarking Your Configuration
Test different worker counts to find optimal performance:
#!/bin/bash
# benchmark.sh - Test different worker counts
PR_NUMBER=123
OWNER=myorg
REPO=myrepo
echo "Benchmarking parallel processing..."
echo "PR: #$PR_NUMBER"
echo "=================================="
for workers in 1 2 4 8 16 32; do
echo -n "Workers: $workers ... "
# Run with time measurement
START=$(date +%s)
pr-resolve apply --pr $PR_NUMBER --owner $OWNER --repo $REPO \
--mode dry-run \
--parallel \
--max-workers $workers \
--log-level WARNING \
> /dev/null 2>&1
END=$(date +%s)
DURATION=$((END - START))
echo "${DURATION}s"
done
echo "=================================="
echo "Choose the worker count with lowest duration"
Performance Tips
1. Match Workers to Workload
# Count files in PR first
FILE_COUNT=$(gh pr view 123 --json files --jq '.files | length')
# Choose workers based on file count
if [ $FILE_COUNT -lt 30 ]; then
WORKERS=4
elif [ $FILE_COUNT -lt 100 ]; then
WORKERS=8
else
WORKERS=16
fi
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers $WORKERS
2. Disable Validation for Performance
# Validation adds overhead per change
# Use rollback for safety instead
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 16 \
--no-validation \
--rollback
3. Reduce Logging Overhead
# Debug logging significantly impacts performance
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 16 \
--log-level WARNING # Minimal logging
4. Use Staged Application
# Stage 1: Non-conflicts in parallel (fastest)
pr-resolve apply --pr 123 --owner org --repo repo \
--mode non-conflicts-only \
--parallel --max-workers 16 \
--no-validation
# Stage 2: Conflicts sequentially (more careful)
pr-resolve apply --pr 123 --owner org --repo repo \
--mode conflicts-only \
--validation
Performance Monitoring
Monitor system resources during execution:
# Terminal 1: Run resolver
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 16
# Terminal 2: Monitor resources
watch -n 1 'ps aux | grep pr-resolve | grep -v grep'
# Or use htop for real-time CPU/memory monitoring
htop -p "$(pgrep -d, pr-resolve)"
Architecture
Component Overview
┌────────────────────────────────────────────────────────────┐
│ ConflictResolver │
│ │
│ resolve_pr_conflicts(parallel=True, max_workers=8) │
│ │ │
│ ▼ │
│ apply_changes(parallel=True, max_workers=8) │
│ │ │
│ ▼ │
│ _apply_changes_parallel(changes, validate, max_workers) │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 1. Group Changes by File Path │ │
│ │ changes_by_file[path] = [change1, change2, ...] │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 2. Create Thread Pool │ │
│ │ ThreadPoolExecutor(max_workers=8) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 3. Submit Tasks (one per file) │ │
│ │ future = executor.submit(process_file_changes) │ │
│ │ future_to_changes[future] = file_changes │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────┴─────────────┐ │
│ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Worker 1 │ │ Worker 2 │ ... │
│ │ File A │ │ File B │ │
│ │ ┌──────┐ │ │ ┌──────┐ │ │
│ │ │Change│ │ │ │Change│ │ │
│ │ │ 1 │ │ │ │ 1 │ │ │
│ │ ├──────┤ │ │ ├──────┤ │ │
│ │ │Change│ │ │ │Change│ │ │
│ │ │ 2 │ │ │ │ 2 │ │ │
│ │ └──────┘ │ │ └──────┘ │ │
│ └────────────┘ └────────────┘ │
│ │ │ │
│ └─────────────┬─────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ 4. Collect Results (Thread-Safe) │ │
│ │ applied (with lock) │ │
│ │ skipped (with lock) │ │
│ │ failed (with lock) │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
Thread Pool Architecture
# Simplified implementation (group_by_file, apply_change, and
# mark_as_failed are elided helpers)
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def _apply_changes_parallel(changes, validate, max_workers):
    # 1. Group changes by file
    changes_by_file = group_by_file(changes)

    # 2. Thread-safe result collections
    applied_lock = threading.Lock()
    applied = []
    # ... similar locks/lists for skipped and failed

    # 3. Worker function: applies one file's changes in input order
    def process_file_changes(file_changes):
        for change in file_changes:
            # Apply change sequentially within file
            success = apply_change(change)
            if success:
                with applied_lock:
                    applied.append(change)

    # 4. Execute in parallel
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per file
        future_to_changes = {}
        for file_changes in changes_by_file.values():
            future = executor.submit(process_file_changes, file_changes)
            future_to_changes[future] = file_changes

        # Wait for completion and handle exceptions
        for future in as_completed(future_to_changes):
            try:
                future.result()  # Propagate exceptions
            except Exception as e:
                # Handle worker thread exception
                affected_changes = future_to_changes[future]
                mark_as_failed(affected_changes, str(e))

    return applied, skipped, failed
Data Flow
Input: List[Change]
│
▼
Group by File: Dict[str, List[Change]]
│
├─ File A: [Change1, Change2, Change3]
├─ File B: [Change4, Change5]
└─ File C: [Change6]
│
▼
ThreadPoolExecutor
│
├─ Worker 1 → Process File A → [Applied, Skipped, Failed]
├─ Worker 2 → Process File B → [Applied, Skipped, Failed]
└─ Worker 3 → Process File C → [Applied, Skipped, Failed]
│
▼
Thread-Safe Accumulation
│
├─ applied: [Change1, Change2, Change4, Change5, Change6]
├─ skipped: [Change3]
└─ failed: []
│
▼
Output: Tuple[List[Change], List[Change], List[Tuple[Change, str]]]
Safety and Thread Safety
Thread Safety Mechanisms
1. File-Level Serialization
Problem: Concurrent modifications to same file cause conflicts
Solution: Group changes by file, process each file sequentially
# Group changes by file path
from collections import defaultdict

changes_by_file = defaultdict(list)
for change in changes:
    changes_by_file[change.path].append(change)

# Each file processed by one worker only
for file_changes in changes_by_file.values():
    executor.submit(process_file_changes, file_changes)
2. Thread-Safe Collections
Problem: Multiple threads appending to same list causes race conditions
Solution: Use locks for all shared collections
import threading

# Separate locks for each collection
applied_lock = threading.Lock()
skipped_lock = threading.Lock()
failed_lock = threading.Lock()

# Thread-safe append
def add_applied(change):
    with applied_lock:
        applied.append(change)
3. Exception Propagation
Problem: Worker thread exceptions silently fail
Solution: Track futures, call .result() to propagate exceptions
# Map futures to changes for error attribution
future_to_changes = {}
for file_changes in changes_by_file.values():
    future = executor.submit(process_file, file_changes)
    future_to_changes[future] = file_changes

# Propagate exceptions from workers
for future in as_completed(future_to_changes):
    try:
        future.result()  # Raises if worker had exception
    except Exception as e:
        affected_changes = future_to_changes[future]
        mark_as_failed(affected_changes, str(e))
Compatibility with Rollback
Parallel processing works seamlessly with the rollback system:
# Parallel processing with rollback protection
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 8 \
--rollback
# If any worker thread fails:
# 1. Exception propagated to main thread
# 2. Rollback system triggers
# 3. All changes rolled back (both sequential and parallel)
# 4. Working directory restored to checkpoint
How it works:
Checkpoint created before any worker starts
Workers apply changes concurrently
If any worker fails, exception propagated
Rollback system catches exception
All changes (from all workers) rolled back
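Conceptually, the interaction looks like the sketch below. The checkpoint object and apply_file callable are placeholders; the real rollback API may differ:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def apply_with_rollback(changes_by_file, max_workers, make_checkpoint, apply_file):
    """Conceptual sketch: make_checkpoint() returns an object with restore()."""
    checkpoint = make_checkpoint()  # 1. checkpoint before any worker starts
    try:
        # 2. workers apply changes concurrently, one task per file
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = [executor.submit(apply_file, fc)
                       for fc in changes_by_file.values()]
            for future in as_completed(futures):
                future.result()  # 3. propagate any worker failure
    except Exception:
        checkpoint.restore()     # 4-5. roll back changes from all workers
        raise
```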
Data Integrity
Guaranteed:
✅ No concurrent modifications to same file
✅ Changes to same file applied in order
✅ Thread-safe result accumulation
✅ Exception propagation from workers
✅ Compatible with rollback
Not Guaranteed:
❌ Result order (changes accumulated in completion order, not input order)
❌ Logging order (log messages may be interleaved)
❌ Cross-file dependencies (files processed independently)
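If a caller needs the applied list back in input order, it can re-sort by the original positions; a minimal sketch with string stand-ins for changes:

```python
original_changes = ["c1", "c2", "c3"]  # stand-ins for Change objects
applied = ["c3", "c1"]                 # completion order from the workers

# Restore original input order (correctness is unaffected either way)
input_order = {c: i for i, c in enumerate(original_changes)}
applied_in_input_order = sorted(applied, key=lambda c: input_order[c])
print(applied_in_input_order)          # ['c1', 'c3']
```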
Benchmarking
Simple Benchmark
#!/bin/bash
# simple-benchmark.sh
PR=123
OWNER=myorg
REPO=myrepo
echo "Sequential processing:"
time pr-resolve apply --pr $PR --owner $OWNER --repo $REPO --mode dry-run
echo "Parallel processing (4 workers):"
time pr-resolve apply --pr $PR --owner $OWNER --repo $REPO --mode dry-run --parallel --max-workers 4
echo "Parallel processing (8 workers):"
time pr-resolve apply --pr $PR --owner $OWNER --repo $REPO --mode dry-run --parallel --max-workers 8
Comprehensive Benchmark
#!/bin/bash
# comprehensive-benchmark.sh
PR=123
OWNER=myorg
REPO=myrepo
RESULTS_FILE="benchmark-results.csv"
echo "workers,duration_seconds,files_processed" > $RESULTS_FILE
for workers in 1 2 4 8 16 32; do
echo "Testing with $workers workers..."
START=$(date +%s.%N)
OUTPUT=$(pr-resolve apply --pr $PR --owner $OWNER --repo $REPO \
--mode dry-run \
--parallel \
--max-workers $workers \
--log-level WARNING 2>&1)
END=$(date +%s.%N)
DURATION=$(echo "$END - $START" | bc)
FILES=$(echo "$OUTPUT" | grep -c "Applied change")
echo "$workers,$DURATION,$FILES" >> $RESULTS_FILE
echo " Duration: ${DURATION}s, Files: $FILES"
done
echo "Results saved to $RESULTS_FILE"
# Plot results (requires gnuplot)
gnuplot <<EOF
set terminal png size 800,600
set output 'benchmark-results.png'
set xlabel 'Worker Count'
set ylabel 'Duration (seconds)'
set title 'Parallel Processing Performance'
set grid
plot '$RESULTS_FILE' using 1:2 with linespoints title 'Duration'
EOF
echo "Chart saved to benchmark-results.png"
Analyzing Results
#!/usr/bin/env python3
# analyze-benchmark.py
import pandas as pd
import matplotlib.pyplot as plt
# Load results
df = pd.read_csv('benchmark-results.csv')
# Calculate speedup
baseline = df[df['workers'] == 1]['duration_seconds'].values[0]
df['speedup'] = baseline / df['duration_seconds']
# Calculate efficiency
df['efficiency'] = df['speedup'] / df['workers'] * 100
# Print summary
print("Benchmark Summary:")
print(df[['workers', 'duration_seconds', 'speedup', 'efficiency']])
# Find the fastest configuration (lowest duration, matching the
# bash script's advice; max efficiency would always pick 1 worker)
optimal = df.loc[df['duration_seconds'].idxmin()]
print(f"\nOptimal configuration:")
print(f" Workers: {optimal['workers']}")
print(f" Duration: {optimal['duration_seconds']:.2f}s")
print(f" Speedup: {optimal['speedup']:.2f}x")
print(f" Efficiency: {optimal['efficiency']:.1f}%")
# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# Duration plot
ax1.plot(df['workers'], df['duration_seconds'], 'o-')
ax1.set_xlabel('Worker Count')
ax1.set_ylabel('Duration (seconds)')
ax1.set_title('Processing Duration')
ax1.grid(True)
# Speedup plot
ax2.plot(df['workers'], df['speedup'], 'o-', label='Actual')
ax2.plot(df['workers'], df['workers'], '--', label='Ideal')
ax2.set_xlabel('Worker Count')
ax2.set_ylabel('Speedup')
ax2.set_title('Parallel Speedup')
ax2.legend()
ax2.grid(True)
plt.tight_layout()
plt.savefig('benchmark-analysis.png')
print("\nAnalysis chart saved to benchmark-analysis.png")
Use Cases
Use Case 1: Large PR in Production
Scenario: Applying 150 changes from large PR to production
# Benchmark first to find optimal worker count
./benchmark.sh # Find optimal: 16 workers
# Apply with optimal configuration
pr-resolve apply --pr 456 --owner myorg --repo production \
--mode conflicts-only \
--parallel --max-workers 16 \
--rollback \
--validation \
--log-level INFO \
--log-file /var/log/pr-resolver/prod-$(date +%Y%m%d-%H%M%S).log
# Result: 150 changes applied in 22s (vs 150s sequential)
# Speedup: 6.8x faster
Use Case 2: CI/CD Pipeline Optimization
Scenario: Automated PR resolution in CI/CD needs to be fast
# .github/workflows/auto-resolve.yml
- name: Resolve PR Conflicts
  run: |
    # Enable parallel processing for speed
    export CR_PARALLEL="true"
    export CR_MAX_WORKERS="8"

    # Apply with parallel processing
    pr-resolve apply \
      --pr ${{ github.event.pull_request.number }} \
      --owner ${{ github.repository_owner }} \
      --repo ${{ github.event.repository.name }} \
      --mode conflicts-only \
      --parallel --max-workers 8 \
      --rollback \
      --log-file /tmp/pr-resolve.log
  timeout-minutes: 5  # Reduced from 15 with parallel processing
Use Case 3: Staged Application
Scenario: Apply large PR in stages, optimize each stage
# Stage 1: Non-conflicts (safe, fast, high parallelism)
pr-resolve apply --pr 789 --owner myorg --repo myrepo \
--mode non-conflicts-only \
--parallel --max-workers 32 \
--no-validation \
--log-level WARNING
# Stage 2: Conflicts (risky, careful, moderate parallelism)
pr-resolve apply --pr 789 --owner myorg --repo myrepo \
--mode conflicts-only \
--parallel --max-workers 8 \
--validation \
--rollback \
--log-level INFO
Use Case 4: Development Workflow
Scenario: Developer testing changes locally, wants fast feedback
# Development: Disable parallel for easier debugging
pr-resolve apply --pr 123 --owner myorg --repo myrepo \
--mode dry-run \
--log-level DEBUG
# Production: Enable parallel for speed
pr-resolve apply --pr 123 --owner myorg --repo myrepo \
--parallel --max-workers 8 \
--rollback
Use Case 5: Dynamic Worker Scaling
Scenario: Automatically adjust workers based on PR size
#!/usr/bin/env python3
# smart-apply.py
import subprocess
import sys
def get_pr_file_count(owner, repo, pr_number):
    """Get number of files changed in PR."""
    result = subprocess.run(
        ["gh", "pr", "view", str(pr_number),
         "--repo", f"{owner}/{repo}",
         "--json", "files", "--jq", ".files | length"],
        capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())

def determine_workers(file_count):
    """Determine optimal worker count based on file count."""
    if file_count < 10:
        return 1  # Sequential
    elif file_count < 30:
        return 4
    elif file_count < 100:
        return 8
    elif file_count < 300:
        return 16
    else:
        return 32

def main():
    owner, repo, pr = sys.argv[1:4]

    # Get PR size
    file_count = get_pr_file_count(owner, repo, pr)
    print(f"PR has {file_count} files")

    # Determine workers
    workers = determine_workers(file_count)
    print(f"Using {workers} workers")

    # Build command
    cmd = [
        "pr-resolve", "apply",
        "--pr", pr,
        "--owner", owner,
        "--repo", repo,
        "--rollback"
    ]
    if workers > 1:
        cmd.extend(["--parallel", "--max-workers", str(workers)])

    # Execute
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    main()
Usage:
./smart-apply.py myorg myrepo 123
# Automatically uses optimal worker count
Troubleshooting
Issue 1: Parallel Processing Slower Than Sequential
Symptoms:
--parallel is slower than running without it
High worker overhead
No performance improvement
Causes:
Too many workers for small PR
Worker overhead exceeds benefits
I/O contention on slow disk
Solutions:
# 1. Reduce worker count
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 2
# 2. Disable parallel for small PRs
FILE_COUNT=$(gh pr view 123 --json files --jq '.files | length')
if [ $FILE_COUNT -lt 30 ]; then
pr-resolve apply --pr 123 --owner org --repo repo # No parallel
else
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 8
fi
# 3. Benchmark to find optimal worker count
./benchmark.sh # Test 1, 2, 4, 8, 16, 32 workers
Issue 2: Worker Thread Exceptions
Symptoms:
Error: “Worker thread exception”
Changes marked as failed
Unexpected failures
Causes:
Exception in worker thread
File access issues
Memory constraints
Solutions:
# 1. Check logs for details
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 8 \
--log-level DEBUG \
--log-file /tmp/worker-errors.log
grep -i "worker\|exception" /tmp/worker-errors.log
# 2. Reduce worker count
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 4
# 3. Disable parallel temporarily
pr-resolve apply --pr 123 --owner org --repo repo
# 4. Check system resources
free -h # Check memory
df -h # Check disk space
Issue 3: High Memory Usage
Symptoms:
System slowdown
High memory consumption
Out of memory errors
Causes:
Too many workers
Large files loaded into memory
Memory leak (report as bug)
Solutions:
# 1. Reduce worker count
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 4
# 2. Monitor memory usage
watch -n 1 'ps aux | grep pr-resolve'
# 3. Check memory per worker
# Rule of thumb: ~100MB per worker
AVAILABLE_MB=$(free -m | awk 'NR==2 {print $7}')
MAX_WORKERS=$((AVAILABLE_MB / 100))
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers $MAX_WORKERS
Issue 4: Logging Order Confusion
Symptoms:
Log messages interleaved
Difficult to trace execution
Unclear failure source
Causes:
Multiple worker threads logging concurrently
No log ordering guarantee with parallel processing
Solutions:
# 1. Use timestamps in logs
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 8 \
--log-level INFO | ts '[%Y-%m-%d %H:%M:%.S]'
# 2. Disable parallel for debugging
pr-resolve apply --pr 123 --owner org --repo repo --log-level DEBUG
# 3. Filter logs by file
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 8 \
--log-file /tmp/parallel.log
grep "file_path.py" /tmp/parallel.log
Issue 5: No Speedup on Multi-Core System
Symptoms:
Have 8 CPU cores
No speedup with --parallel
All workers run on one core
Causes:
Python GIL (Global Interpreter Lock) for CPU-bound work
File I/O not concurrent
Disk bottleneck
Solutions:
# 1. Verify I/O-bound workload
# Parallel helps only for I/O-bound work (file operations)
# Not for CPU-bound work (heavy computation)
# 2. Check CPU usage
htop # Should see multiple cores utilized
# 3. Check disk I/O
iostat -x 1 # Should see disk activity
# 4. If truly CPU-bound, parallel won't help
# Example: Complex conflict resolution algorithms
# Consider sequential processing instead
Best Practices
1. Benchmark Before Production Use
# Always benchmark to find optimal worker count
./benchmark.sh > benchmark-$(date +%Y%m%d).txt
# Use results to configure production
2. Match Workers to Workload
# Small PRs: Sequential or 2-4 workers
# Medium PRs: 4-8 workers
# Large PRs: 8-16 workers
# Very large PRs: 16-32 workers
3. Combine with Rollback for Safety
# Always enable rollback with parallel processing
pr-resolve apply --pr 123 --owner org --repo repo \
--parallel --max-workers 8 \
--rollback # Safety net for worker failures
4. Monitor System Resources
# Check resources before running
free -h # Memory
df -h # Disk
nproc # CPU cores
# Monitor during execution
htop -p "$(pgrep -d, pr-resolve)"
5. Use Staged Application for Large PRs
# Stage 1: Non-conflicts (high parallelism)
pr-resolve apply --pr 123 --owner org --repo repo \
--mode non-conflicts-only \
--parallel --max-workers 16
# Stage 2: Conflicts (moderate parallelism)
pr-resolve apply --pr 123 --owner org --repo repo \
--mode conflicts-only \
--parallel --max-workers 8
6. Disable Parallel for Debugging
# Development: Sequential for easier debugging
pr-resolve apply --pr 123 --owner org --repo repo --log-level DEBUG
# Production: Parallel for speed
pr-resolve apply --pr 123 --owner org --repo repo --parallel --max-workers 8
7. Configure Per Environment
# dev-config.yaml
parallel:
  enabled: false  # Easier debugging

# prod-config.yaml
parallel:
  enabled: true
  max_workers: 16  # Optimized for performance
Limitations
1. No Cross-File Dependency Handling
Limitation: Changes in different files processed independently
Impact: If changes in File A depend on changes in File B, may apply in wrong order
Workaround: Disable parallel processing for PRs with cross-file dependencies
2. Result Ordering Not Preserved
Limitation: Results accumulated in completion order, not input order
Impact: Applied changes list may be in different order than input
Note: This doesn’t affect correctness, only result order
3. Logging Order Not Guaranteed
Limitation: Log messages from different workers may interleave
Impact: Logs harder to read with parallel processing
Workaround: Use timestamps or disable parallel for debugging
4. Python GIL Limits CPU-Bound Parallelism
Limitation: Python’s Global Interpreter Lock prevents true CPU parallelism
Impact: Only I/O-bound work benefits from parallel processing
Note: File operations are I/O-bound, so parallel processing still helps significantly
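A tiny self-contained demonstration: threads release the GIL while blocked on I/O (simulated here with time.sleep), so four waits overlap instead of running back to back:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(_):
    time.sleep(0.5)  # simulates a blocking file wait; the GIL is released

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fake_io, range(4)))
print(f"4 overlapped waits: {time.perf_counter() - start:.2f}s")  # ~0.5s, not 2.0s
```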
5. Memory Usage Scales with Workers
Limitation: Each worker consumes memory (~100MB per worker)
Impact: High worker counts may exhaust memory on constrained systems
Solution: Adjust worker count based on available memory
6. Overhead for Small PRs
Limitation: Thread pool creation and management adds overhead
Impact: Parallel processing slower than sequential for small PRs (< 10 files)
Solution: Disable parallel processing for small PRs
7. No Dynamic Worker Scaling
Limitation: Worker count fixed at start, doesn’t adapt during execution
Impact: Can’t optimize worker count for varying file sizes within PR
Solution: Choose worker count based on PR size estimation
See Also
Configuration Reference - Parallel processing configuration
Rollback System - Combining rollback with parallel processing
Getting Started - Basic parallel processing usage
API Reference - Python API for parallel processing
Troubleshooting - General troubleshooting guide