Case Study: Code Review and Auto-Fix Agent
Chapter 70: Case Study — Code Review & Auto-Fix Agent
Chapter Introduction
Code review is one of the most time-consuming and cognitively demanding phases of software engineering. By some estimates, mid-sized engineering teams spend 15–20% of their weekly development time on it. A significant share of review comments also address repetitive, mechanical issues such as naming violations, unhandled exceptions, missing tests, and SQL injection risks. Problems like these can be caught, and often fixed, automatically. This chapter builds a complete Hermes Code Review & Auto-Fix Agent integrated with GitHub Actions CI/CD, giving every Pull Request an AI-powered audit pass before human review begins.
70.1 Requirements: The Pain Points That Automated Code Review Must Solve
The Traditional PR Review Bottleneck
Traditional PR review flow:
Developer opens PR
↓
Wait for reviewer availability (0.5–2 days)
↓
Reviewer reads code line-by-line (30–90 min/PR)
↓
Leave comments (50% are repetitive style/safety issues)
↓
Developer fixes → waits again → repeat...
Core pain point summary:
| Pain Point | Impact | Severity |
|---|---|---|
| Long review wait times | Blocks feature delivery | High |
| Repetitive issues drain reviewer attention | Reduces review quality | High |
| Security vulnerabilities may slip through | Production incident risk | Critical |
| No unified quality baseline | Inconsistent codebase quality | Medium |
| Subjective reviewer differences | Hard to enforce team standards | Medium |
Agent Target Capabilities
The agent should provide:
- Multi-language support: Python, JavaScript/TypeScript, Go
- Multi-dimensional review: Security, performance, readability, conventions
- Auto-fix: Generate patches for clearly fixable issues
- PR integration: Output results as GitHub PR Review comments
- Configurable rules: Teams define their own standards
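Multi-language support starts with routing each changed file to the right analyzer. The following is a minimal sketch that assumes the language is inferred purely from the file extension. The extension map and the `detect_language` helper are illustrative and do not appear in the agent code later in the chapter. The language names match the `language` enum used by the run_static_analysis tool.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension map; values match the run_static_analysis language enum.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".mjs": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".go": "go",
}


def detect_language(filename: str) -> Optional[str]:
    """Return the analyzer language for a repository path, or None if unsupported."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

Files that map to None (for example README.md) are skipped. If .jsx or .tsx files should be reviewed, the workflow's `paths` filter also needs matching entries.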
70.2 System Architecture
High-Level Architecture
┌─────────────────────────────────────────────────────────┐
│ GitHub Repository                                       │
│                                                         │
│ Developer push → Pull Request created/updated           │
└────────────────────────┬────────────────────────────────┘
                         │ webhook / GitHub Actions trigger
                         ▼
┌─────────────────────────────────────────────────────────┐
│ GitHub Actions CI Runner                                │
│                                                         │
│ 1. Fetch PR diff (GitHub API)                           │
│ 2. Parse changed file list                              │
│ 3. Invoke Hermes Agent for analysis                     │
│ 4. Format results and publish Review                    │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│ Hermes Agent Core                                       │
│                                                         │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐   │
│ │ Static      │  │ Semantic    │  │ Fix             │   │
│ │ Analysis    │  │ (LLM Core)  │  │ Generation      │   │
│ │ Toolset     │  │             │  │ Toolset         │   │
│ │             │  │ - Security  │  │                 │   │
│ │ - AST parse │  │ - Logic     │  │ - diff gen      │   │
│ │ - Linter    │  │ - Best prac │  │ - patch apply   │   │
│ │ - Dep scan  │  │ - Review    │  │ - suggestions   │   │
│ └─────────────┘  └─────────────┘  └─────────────────┘   │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│ GitHub PR Review Output                                 │
│                                                         │
│ • Line-level comments (issue location)                  │
│ • Fix suggestions (code blocks)                         │
│ • Overall decision (APPROVE / REQUEST_CHANGES)          │
│ • Optional: auto-commit fix patches                     │
└─────────────────────────────────────────────────────────┘
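The diagram does not specify how the overall decision is chosen. The sketch below shows one simple policy. It is an assumption, not the chapter's prescribed rule. The policy requests changes when any critical or major issue is present, falls back to a neutral COMMENT for minor findings, and approves only a clean diff. `decide_review_action` and `BLOCKING_SEVERITIES` are hypothetical names.

```python
from collections import Counter

# Hypothetical policy: severities that should block a merge.
BLOCKING_SEVERITIES = {"critical", "major"}
SEVERITY_ORDER = ("critical", "major", "minor", "suggestion")


def decide_review_action(issues: list[dict]) -> tuple[str, str]:
    """Map issues (each with a 'severity' key) to a GitHub review event plus a summary line."""
    counts = Counter(issue["severity"] for issue in issues)
    # Counter returns 0 for missing keys, so absent severities are simply skipped
    summary = ", ".join(f"{counts[s]} {s}" for s in SEVERITY_ORDER if counts[s])
    if any(counts[s] for s in BLOCKING_SEVERITIES):
        return "REQUEST_CHANGES", f"Blocking issues found: {summary}"
    if issues:
        return "COMMENT", f"Non-blocking findings: {summary}"
    return "APPROVE", "No issues found by automated review."
```

Using COMMENT rather than APPROVE for minor findings keeps the bot from signing off on code it has just flagged. Also note that the workflow's GITHUB_TOKEN can only submit APPROVE reviews if the repository allows GitHub Actions to create and approve pull requests.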
Tool Inventory
| Tool Name | Purpose | Input | Output |
|---|---|---|---|
| get_pr_diff | Fetch PR changes | PR number | diff text |
| parse_code_ast | AST parse | code + language | AST tree |
| run_static_analysis | Lint + static checks | file + language | issue list |
| check_security_patterns | Security pattern match | code snippet | vuln report |
| generate_fix_patch | Generate fix patch | issue + code | unified diff |
| post_pr_review | Publish PR comments | review data | GitHub API response |
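generate_fix_patch is expected to return a unified diff. The following minimal sketch covers only that final step. It assumes the model has already produced the corrected code as plain text. The step uses the standard library's difflib, and `make_unified_diff` is a hypothetical helper name.

```python
import difflib


def _as_lines(text: str) -> list[str]:
    # Guarantee each line ends with "\n" so the last removed and added lines never run together
    return [line if line.endswith("\n") else line + "\n" for line in text.splitlines(keepends=True)]


def make_unified_diff(original_code: str, fixed_code: str, filename: str) -> str:
    """Render the change from original_code to fixed_code as a git-style unified diff."""
    diff_lines = difflib.unified_diff(
        _as_lines(original_code),
        _as_lines(fixed_code),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff_lines)
```

For example, `make_unified_diff("x = eval(s)\n", "x = int(s)\n", "app/calc.py")` yields a one-hunk diff that replaces the `eval` line. Identical inputs yield an empty string, which is a convenient "nothing to fix" signal.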
70.3 Full Implementation
Project Structure
code-review-agent/
├── agent/
│   ├── hermes_agent.py
│   ├── tools/
│   │   ├── github_tools.py
│   │   ├── analysis_tools.py
│   │   └── fix_tools.py
│   └── prompts/
│       └── system_prompt.py
├── config/
│   └── review_rules.yaml
├── .github/
│   └── workflows/
│       └── code_review.yml
└── main.py
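The chapter does not show review_rules.yaml. The schema below is one possible shape, given purely as an illustration: it lets a team add its own regex rules per language. After the file is parsed (for example with PyYAML's `yaml.safe_load`), a merge step like the hypothetical `merge_custom_rules` could fold those rules into the built-in pattern table (SECURITY_PATTERNS in analysis_tools.py). The validation rejects bad rules when the configuration loads, not halfway through a review.

```python
import re

# What a parsed review_rules.yaml might look like (hypothetical schema).
EXAMPLE_CONFIG = {
    "rules": [
        {"language": "python", "pattern": r"\bprint\(", "name": "Stray print()",
         "severity": "minor", "description": "Use the logging module instead of print()"},
    ]
}

VALID_SEVERITIES = {"critical", "major", "minor", "suggestion"}


def merge_custom_rules(builtin: dict, config: dict) -> dict:
    """Return a new {language: [rule, ...]} table with team rules appended to the built-ins."""
    merged = {lang: list(rules) for lang, rules in builtin.items()}  # never mutate the built-ins
    for rule in config.get("rules", []):
        if rule["severity"] not in VALID_SEVERITIES:
            raise ValueError(f"unknown severity {rule['severity']!r} in rule {rule['name']!r}")
        re.compile(rule["pattern"])  # raises re.error now rather than during a review
        entry = {key: rule[key] for key in ("pattern", "name", "severity", "description")}
        merged.setdefault(rule["language"], []).append(entry)
    return merged
```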
Core Agent
# agent/hermes_agent.py
import json
import os

from openai import OpenAI

# Any OpenAI-compatible endpoint works; the defaults target a local Ollama server
client = OpenAI(
    base_url=os.getenv("HERMES_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("HERMES_API_KEY", "ollama"),  # Ollama ignores the key, but the client requires one
)
MODEL = os.getenv("HERMES_MODEL", "nous-hermes-2-mixtral-8x7b-dpo")

SYSTEM_PROMPT = """You are a senior software engineer specializing in code quality and security review.
Your task is to thoroughly review Pull Request changes and provide actionable feedback.

Review dimensions:
1. Security: SQL injection, XSS, unsafe deserialization, hardcoded secrets
2. Performance: N+1 queries, unnecessary loops, memory leaks, blocking I/O
3. Maintainability: naming, function complexity, duplication, missing docs
4. Robustness: error handling, edge cases, null checks
5. Test coverage: critical paths covered

For each issue, provide:
- Location (filename + line number)
- Severity (critical/major/minor/suggestion)
- Clear description of the problem
- Concrete fix code

Work systematically through each changed file."""

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_pr_diff",
            "description": "Fetch the code diff for a Pull Request",
            "parameters": {
                "type": "object",
                "properties": {
                    "pr_number": {"type": "integer"},
                    "file_filter": {"type": "string", "default": "*"},
                },
                "required": ["pr_number"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_static_analysis",
            "description": "Run static analysis on a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "file_content": {"type": "string"},
                    "language": {
                        "type": "string",
                        "enum": ["python", "javascript", "typescript", "go"],
                    },
                    "filename": {"type": "string"},
                },
                "required": ["file_content", "language", "filename"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_security_patterns",
            "description": "Check for security vulnerability patterns",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"},
                    "language": {"type": "string"},
                },
                "required": ["code", "language"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "generate_fix_patch",
            "description": "Generate a fix patch for a discovered issue",
            "parameters": {
                "type": "object",
                "properties": {
                    "original_code": {"type": "string"},
                    "issue_description": {"type": "string"},
                    "language": {"type": "string"},
                },
                "required": ["original_code", "issue_description", "language"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "post_pr_review",
            "description": "Post review comments to the GitHub PR",
            "parameters": {
                "type": "object",
                "properties": {
                    "pr_number": {"type": "integer"},
                    "review_body": {"type": "string"},
                    # Describe the comment shape so the model emits usable line-level comments
                    "comments": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "path": {"type": "string"},
                                "line": {"type": "integer"},
                                "body": {"type": "string"},
                            },
                            "required": ["path", "line", "body"],
                        },
                    },
                    "action": {
                        "type": "string",
                        "enum": ["APPROVE", "REQUEST_CHANGES", "COMMENT"],
                    },
                },
                "required": ["pr_number", "review_body", "comments", "action"],
            },
        },
    },
]


def run_code_review_agent(pr_number: int, repo: str) -> dict:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"""Please review PR #{pr_number} in repository {repo}.
Steps:
1. Fetch the PR code changes
2. Run static analysis and security checks on each changed file
3. Summarize issues by severity
4. Generate fix suggestions for fixable issues
5. Post the PR Review
Begin the review.""",
        },
    ]
    max_iterations = 20
    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
            temperature=0.1,
        )
        message = response.choices[0].message
        messages.append(message)
        # No tool calls means the model has produced its final answer
        if not message.tool_calls:
            return {"status": "completed", "summary": message.content, "iterations": iteration + 1}
        for tool_call in message.tool_calls:
            try:
                tool_args = json.loads(tool_call.function.arguments)
                result = _dispatch_tool(tool_call.function.name, tool_args)
            except json.JSONDecodeError as exc:
                # Report malformed arguments back to the model instead of crashing the run
                result = {"error": f"invalid JSON in tool arguments: {exc}"}
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    return {"status": "max_iterations_reached", "iterations": max_iterations}


def _dispatch_tool(name: str, args: dict):
    from .tools import github_tools, analysis_tools, fix_tools
    dispatch = {
        "get_pr_diff": github_tools.get_pr_diff,
        "run_static_analysis": analysis_tools.run_static_analysis,
        "check_security_patterns": analysis_tools.check_security_patterns,
        "generate_fix_patch": fix_tools.generate_fix_patch,
        "post_pr_review": github_tools.post_pr_review,
    }
    if name not in dispatch:
        return {"error": f"unknown tool: {name}"}
    try:
        return dispatch[name](**args)
    except TypeError as exc:
        # e.g. the model passed a misspelled or missing argument name
        return {"error": f"bad arguments for {name}: {exc}"}
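github_tools.py appears in the project tree, but its code is not shown. The following is a minimal standard-library sketch of its two tools. It assumes the GITHUB_TOKEN and GITHUB_REPOSITORY variables set by the workflow, and it assumes the model supplies review comments as {path, line, body} objects. `_github_request` and `build_review_payload` are hypothetical helpers. The routes are GitHub's REST endpoints for listing pull request files and creating a pull request review.

```python
# agent/tools/github_tools.py (sketch)
import fnmatch
import json
import os
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def _github_request(method: str, path: str, payload: Optional[dict] = None):
    """Call the GitHub REST API with the workflow's token using only the standard library."""
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(API_ROOT + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def get_pr_diff(pr_number: int, file_filter: str = "*") -> dict:
    """Return {filename, patch} for changed files matching file_filter (first 100 files only)."""
    repo = os.environ["GITHUB_REPOSITORY"]  # "owner/name"
    files = _github_request("GET", f"/repos/{repo}/pulls/{pr_number}/files?per_page=100")
    return {"files": [
        # Binary or very large files come back without a "patch" field
        {"filename": f["filename"], "patch": f.get("patch", "")}
        for f in files if fnmatch.fnmatch(f["filename"], file_filter)
    ]}


def build_review_payload(review_body: str, comments: list, action: str) -> dict:
    """Shape the agent's output into the JSON body of the create-review endpoint."""
    return {
        "body": review_body,
        "event": action,  # APPROVE, REQUEST_CHANGES or COMMENT
        # line + side="RIGHT" address the new version of the file
        "comments": [
            {"path": c["path"], "line": int(c["line"]), "side": "RIGHT", "body": c["body"]}
            for c in comments
        ],
    }


def post_pr_review(pr_number: int, review_body: str, comments: list, action: str) -> dict:
    repo = os.environ["GITHUB_REPOSITORY"]
    payload = build_review_payload(review_body, comments, action)
    return _github_request("POST", f"/repos/{repo}/pulls/{pr_number}/reviews", payload)
```

GitHub rejects a review whose comments point at lines outside the PR's diff. Comment line numbers should therefore come from changed hunks, not from arbitrary positions in the file.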
Security Pattern Checker
# agent/tools/analysis_tools.py
import re

SECURITY_PATTERNS = {
    "python": [
        # The lookbehind skips ast.literal_eval(...) and method calls such as model.eval()
        {"pattern": r"(?<![\w.])eval\s*\(", "name": "Dangerous eval()", "severity": "critical",
         "description": "eval() executes arbitrary code: code injection risk"},
        {"pattern": r'\bf["\'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{', "name": "SQL Injection",
         "severity": "critical",
         "description": "f-string SQL concatenation: use parameterized queries"},
        {"pattern": r"pickle\.loads?\(", "name": "Unsafe deserialization", "severity": "major",
         "description": "pickle.load/pickle.loads on untrusted data can lead to RCE"},
        {"pattern": r'(password|secret|api_key)\s*=\s*["\'][^"\']+["\']',
         "name": "Hardcoded credential", "severity": "critical",
         "description": "Secrets must not be hardcoded in source"},
    ],
    "javascript": [
        {"pattern": r"\beval\s*\(", "name": "Dangerous eval()", "severity": "critical",
         "description": "eval() creates XSS and code injection vectors"},
        # (?!=) skips comparisons such as el.innerHTML === ""
        {"pattern": r"innerHTML\s*\+?=(?!=)", "name": "XSS risk", "severity": "major",
         "description": "Direct innerHTML assignment: use textContent or DOMPurify"},
    ],
    "go": [
        {"pattern": r"fmt\.Sprintf\(.*\b(SELECT|INSERT|UPDATE|DELETE)\b", "name": "SQL Injection",
         "severity": "critical",
         "description": "String-formatted SQL: use parameterized queries"},
    ],
}
# TypeScript is in the tool schema's language enum; reuse the JavaScript rules for it
SECURITY_PATTERNS["typescript"] = SECURITY_PATTERNS["javascript"]


def check_security_patterns(code: str, language: str) -> dict:
    """Flag lines matching known-dangerous patterns (a fast first pass, not a full SAST scan)."""
    patterns = SECURITY_PATTERNS.get(language, [])
    issues = []
    for i, line in enumerate(code.splitlines(), 1):
        for p in patterns:
            if re.search(p["pattern"], line, re.IGNORECASE):
                issues.append({"line": i, "line_content": line.strip(),
                               "issue_name": p["name"], "severity": p["severity"],
                               "description": p["description"]})
    return {"language": language, "total_issues": len(issues), "issues": issues}


def run_static_analysis(file_content: str, language: str, filename: str) -> dict:
    issues = []
    sec = check_security_patterns(file_content, language)
    for issue in sec["issues"]:
        issues.append({
            "line": issue["line"],
            "severity": issue["severity"],
            "message": f"[Security] {issue['issue_name']}: {issue['description']}",
            "rule": "security",
        })
    # Additional linter integration (e.g. flake8 for Python) would go here
    return {"filename": filename, "language": language, "issues": issues}
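check_security_patterns numbers lines relative to the text it receives. When that text is a diff patch rather than a whole file, those numbers must be translated before they can serve as the `line` of a PR comment. The sketch below does this by reading the `@@ -a,b +c,d @@` hunk headers and mapping each added line in the patch to its line number in the new file. The `added_line_numbers` helper is hypothetical.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_line_numbers(patch: str) -> dict[int, int]:
    """Map 1-based patch line index -> new-file line number for every added ('+') line."""
    mapping = {}
    new_line = 0
    for idx, line in enumerate(patch.splitlines(), 1):
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))  # first new-file line covered by this hunk
            continue
        if line.startswith("+"):
            mapping[idx] = new_line
            new_line += 1
        elif line.startswith(("-", "\\")):
            continue  # removed lines and "\ No newline" markers do not exist in the new file
        else:
            new_line += 1  # context line: present in both versions
    return mapping
```

An issue that check_security_patterns reports at a patch line found in the mapping becomes a comment on the mapped file line. Issues on other lines (hunk headers, removed lines, or unchanged context) can be dropped, so the review flags only code the PR actually adds.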
GitHub Actions Workflow
# .github/workflows/code_review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths: ['**.py', '**.js', '**.ts', '**.go']

concurrency:
  group: code-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  ai-code-review:
    runs-on: ubuntu-latest
    # Secrets are not passed to pull_request runs from forks, so skip fork PRs
    if: github.event.pull_request.head.repo.full_name == github.repository
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install openai requests flake8
      - name: Run AI Code Review Agent
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          HERMES_BASE_URL: ${{ secrets.HERMES_BASE_URL }}
          HERMES_API_KEY: ${{ secrets.HERMES_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: python main.py
        timeout-minutes: 10
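main.py is the workflow's entry point, but its code is not shown. The following is a minimal sketch that assumes main.py only needs to validate the environment variables set by the workflow and pass them to run_code_review_agent. `ReviewSettings` and `load_settings` are hypothetical names. Keeping validation in a pure function makes it testable without a GitHub token or model server.

```python
# main.py (sketch)
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("GITHUB_TOKEN", "GITHUB_REPOSITORY", "PR_NUMBER")


@dataclass(frozen=True)
class ReviewSettings:
    repo: str
    pr_number: int


def load_settings(env: Mapping[str, str]) -> ReviewSettings:
    """Validate the variables the workflow passes in, failing with a readable message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
    return ReviewSettings(repo=env["GITHUB_REPOSITORY"], pr_number=int(env["PR_NUMBER"]))


def main() -> int:
    settings = load_settings(os.environ)
    # Imported here so a configuration error is reported before any model client is created
    from agent.hermes_agent import run_code_review_agent
    result = run_code_review_agent(settings.pr_number, settings.repo)
    print(result.get("summary") or result)
    # A non-zero exit code marks the CI step as failed if the agent ran out of iterations
    return 0 if result.get("status") == "completed" else 1
```

In the actual script, main() would be called from the usual `if __name__ == "__main__": sys.exit(main())` guard, with `import sys` added.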
70.4 Pitfalls & Solutions
Pitfall 1: GitHub API Rate Limiting
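Large PRs (100+ files) can quickly exhaust the GitHub API rate limit. Solution: retry with backoff that respects GitHub's own signals. Wait for Retry-After (or until X-RateLimit-Reset) when GitHub sends it, and fall back to exponential backoff otherwise. Do not blindly retry every 403, because it may be a genuine permissions error rather than a rate limit.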
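import time
import requests

def get_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.ok:
            return resp
        # GitHub signals rate limiting with 429, or with 403 plus rate-limit headers
        rate_limited = resp.status_code == 429 or (
            resp.status_code == 403
            and ("Retry-After" in resp.headers or resp.headers.get("X-RateLimit-Remaining") == "0"))
        if not rate_limited:
            resp.raise_for_status()  # e.g. a real permissions error: fail fast, don't retry
        if "Retry-After" in resp.headers:
            wait = int(resp.headers["Retry-After"])
        elif "X-RateLimit-Reset" in resp.headers:  # epoch seconds when the quota resets
            wait = max(1, int(resp.headers["X-RateLimit-Reset"]) - int(time.time()))
        else:
            wait = 10 * 2 ** attempt  # exponential backoff: 10s, 20s, 40s, ...
        time.sleep(wait)
    raise RuntimeError(f"GitHub API still rate-limited after {max_retries} attempts: {url}")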
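Rate limits are only half of the large-PR problem. GitHub's list-pull-request-files endpoint returns at most 100 files per page, and its documentation caps the total at 3,000 files, so a 100+ file PR also needs pagination. The sketch below follows "next" links until they run out. `fetch_page` is a hypothetical injected callable, which lets the loop be exercised offline. With requests, one could be built from the retrying GET above by returning `resp.json()` together with `resp.links.get("next", {}).get("url")`, since requests parses the Link header into `resp.links`.

```python
from typing import Callable, Optional

# fetch_page(url) -> (items_on_this_page, next_page_url_or_None)
PageFetcher = Callable[[str], tuple[list, Optional[str]]]


def fetch_all_pages(first_url: str, fetch_page: PageFetcher, max_pages: int = 30) -> list:
    """Collect items across pages, stopping when there is no next link or max_pages is hit."""
    items: list = []
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:  # max_pages guards against a misbehaving next link
        page_items, url = fetch_page(url)
        items.extend(page_items)
        pages += 1
    return items
```

The default of 30 pages corresponds to 30 × 100 = 3,000 files, the documented ceiling mentioned above.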
Pitfall 2: Unstable Tool-Call Format
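Hermes occasionally produces malformed JSON in tool arguments, for example Python-style dicts with single quotes. Simply swapping ' for " corrupts any value that contains an apostrophe, so fall back to a safe literal parser instead: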
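import ast
import json

def safe_parse_args(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax (single quotes, True/None); literal_eval never executes code
        parsed = ast.literal_eval(raw)  # still raises ValueError/SyntaxError if truly malformed
        return parsed if isinstance(parsed, dict) else {}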
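When even a fallback parse fails, it is usually better to tell the model what went wrong than to crash the run. The sketch below shows that approach. It assumes the tool-result message shape used in run_code_review_agent. `tool_result_message` and its `dispatch` argument are hypothetical.

```python
import ast
import json
from typing import Any, Callable


def tool_result_message(tool_call_id: str, raw_args: str,
                        dispatch: Callable[[dict], Any]) -> dict:
    """Build the role="tool" message for one tool call, turning bad arguments into feedback."""
    try:
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            args = ast.literal_eval(raw_args)  # Python-style dict fallback; never executes code
        if not isinstance(args, dict):
            raise ValueError("arguments must be a JSON object")
        result = dispatch(args)
    except (ValueError, SyntaxError, TypeError) as exc:
        # Returned as the tool result, so the model can retry with corrected arguments
        result = {"error": f"could not use tool arguments ({exc}); resend them as a valid JSON object"}
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}
```

In the agent loop, `dispatch` would be something like `lambda args: _dispatch_tool(tool_call.function.name, args)`.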
Pitfall 3: Token Limit on Large Files
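Never pass full file content. Analyze only the diff hunks plus surrounding context, and merge overlapping windows so that consecutive changed lines are not sent to the model several times: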
def extract_changed_sections(patch: str, context: int = 10) -> str:
    lines = patch.split("\n")
    windows = []  # merged [start, end) line ranges around added/removed lines
    for i, line in enumerate(lines):
        if line.startswith(("+", "-")):
            start, end = max(0, i - context), min(len(lines), i + context + 1)
            if windows and start <= windows[-1][1]:
                windows[-1][1] = end  # overlaps or touches the previous window: extend it
            else:
                windows.append([start, end])
    return "\n---\n".join("\n".join(lines[s:e]) for s, e in windows)
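Even trimmed hunks can add up on a large PR. The sketch below packs per-file sections into batches that stay under an approximate token budget, so each model call fits the context window. It assumes a rough rule of thumb of about four characters per token. That figure is an approximation, not a measurement of the Hermes tokenizer, and real counts vary with language and content. `estimate_tokens` and `batch_by_token_budget` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); replace with the model's tokenizer if available
    return (len(text) + 3) // 4


def batch_by_token_budget(sections: dict[str, str], budget: int = 6000) -> list[dict[str, str]]:
    """Group {filename: diff_section} into batches whose estimated size stays under budget.

    A single section larger than the budget gets a batch of its own; split or truncate it upstream.
    """
    batches: list[dict[str, str]] = []
    current: dict[str, str] = {}
    used = 0
    for filename, text in sections.items():
        cost = estimate_tokens(text)
        if current and used + cost > budget:
            batches.append(current)  # close the full batch and start a new one
            current, used = {}, 0
        current[filename] = text
        used += cost
    if current:
        batches.append(current)
    return batches
```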
Pitfall 4: False Positives in Comments/Strings
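Pattern matching fires on comments and docstrings that merely mention "dangerous" keywords. Strip them first, but leave ordinary string literals alone, because the SQL-injection and hardcoded-credential rules need to see them. Also preserve line breaks so the reported line numbers stay accurate: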
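import re

def strip_comments(code: str, language: str) -> str:
    # Blank out comments and docstrings but keep every newline, so line numbers stay valid
    if language == "python":
        code = re.sub(r'^[ \t]*#.*$', '', code, flags=re.MULTILINE)  # whole-line comments only
        code = re.sub(r'""".*?"""', lambda m: "\n" * m.group(0).count("\n"), code, flags=re.DOTALL)
    elif language in ("javascript", "typescript", "go"):
        code = re.sub(r'^[ \t]*//.*$', '', code, flags=re.MULTILINE)
    return code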
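Regex-based stripping cannot reliably tell a real `#` comment apart from a `#` inside a string literal such as `"#fff"`. For Python, the standard library's tokenize module can, so trailing comments can be blanked as well without disturbing line numbers. In the sketch below, `blank_python_comments` is a hypothetical name. It needs lexically complete code and returns its input unchanged when tokenizing fails, which will happen on some partial diff hunks.

```python
import io
import tokenize


def blank_python_comments(code: str) -> str:
    """Remove every # comment (whole-line or trailing) while keeping line count and strings intact."""
    lines = io.StringIO(code).readlines()  # split exactly as the tokenizer reads lines
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(code).readline))
    except (tokenize.TokenError, SyntaxError):
        return code  # incomplete fragment: leave it to the regex-based stripper
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # 1-based row, 0-based column where the comment begins
            line = lines[row - 1]
            newline = line[len(line.rstrip("\r\n")):]
            lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)
```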
Chapter Summary
This chapter built a complete Hermes-powered code review and auto-fix agent:
- Problem: Repetitive review comments drain engineer time and delay delivery
- Architecture: GitHub Actions → Hermes Agent → multi-tool pipeline → PR comments
- Implementation: Security patterns, static analysis, fix generation, GitHub API posting
- Engineering: Practical solutions for rate limiting, format instability, token limits, and false positives
The agent's value is not to replace human review but to filter out noise, so that reviewers can focus on architecture and business logic instead of stray console.log calls left in production code.
Discussion Questions
- How would you design a metric to measure the alignment rate between AI review and human review?
- Should the agent be allowed to auto-merge PRs if it finds no critical issues?
- How can historical PR data be used to fine-tune the review agent over time?
- In a polyglot project (Python backend + TypeScript frontend), how do you unify the quality baseline?