Chapter 79: Building a Code Review Agent: Automated PR Review and Quality Gate Pipeline
79.1 Value and Challenges
Code review is one of the most expensive human activities in software engineering. A commonly cited guideline from industry code review studies is that an experienced engineer can carefully review only about 200-400 lines of code per hour. For fast-iterating teams, code review routinely becomes the bottleneck in the release cycle.
An AI code review agent can:
- Respond instantly 24/7, reducing PR review wait times from hours to minutes
- Consistently check coding standards, security vulnerabilities, and performance issues
- Free human engineers to focus on architecture and business logic review
However, automated code review also presents unique challenges:
- Context understanding — PR changes cannot be understood without broader project context
- False positive control — Too many false positives erode engineer trust
- Quality gating — How to integrate AI review results into the CI/CD pipeline
79.2 System Architecture
GitHub/GitLab
    ↓ Webhook (PR opened/updated)
[Webhook Server]
    ↓
[PR Context Collector]
    - Fetch diff
    - Fetch changed file list
    - Fetch relevant context files
    - Fetch PR description and title
    ↓
[Code Review Agent (Claude)]
    - Security vulnerability scan
    - Coding standards check
    - Logic error detection
    - Performance issue identification
    - Documentation completeness check
    ↓
[Review Result Processor]
    - Format comments
    - Determine quality gate result
    - Post via GitHub API
    ↓
[Quality Gate]
    - Pass: Auto-approve (optional)
    - Fail: Request changes, block merge
79.3 Webhook Server
79.3.1 Receiving GitHub Webhooks
import hashlib
import hmac
import json
import os

import httpx
from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

app = FastAPI(title="Code Review Agent")

# Load secrets from the environment rather than hardcoding them in source.
GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]


def verify_github_signature(payload: bytes, signature: str) -> bool:
    """Check the X-Hub-Signature-256 header against an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(), payload, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)


@app.post("/webhook/github")
async def handle_github_webhook(request: Request, background_tasks: BackgroundTasks):
    signature = request.headers.get("X-Hub-Signature-256", "")
    payload = await request.body()  # verify the raw bytes, not the parsed JSON
    if not verify_github_signature(payload, signature):
        raise HTTPException(status_code=401, detail="Invalid signature")

    event = request.headers.get("X-GitHub-Event")
    data = json.loads(payload)
    if event == "pull_request":
        action = data.get("action")
        if action in ["opened", "synchronize", "reopened"]:
            # Review in the background so the webhook is acknowledged quickly.
            background_tasks.add_task(process_pull_request, data)
    return {"status": "received"}
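To exercise the signature check locally without GitHub, one can sign a sample payload with the same secret and confirm that the verifier accepts it and rejects a tampered body. The sketch below is self-contained; the secret value, the `sign` and `verify` helpers, and the sample payload are illustrative assumptions rather than part of the server above.

```python
import hashlib
import hmac

SECRET = "test-secret"  # hypothetical value, for local testing only


def sign(payload: bytes, secret: str = SECRET) -> str:
    """Build the value GitHub sends in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(payload: bytes, signature: str, secret: str = SECRET) -> bool:
    """Same check as verify_github_signature, with the secret passed in."""
    return hmac.compare_digest(sign(payload, secret), signature)


body = b'{"action": "opened", "number": 1}'
good_signature = sign(body)

accepted = verify(body, good_signature)
rejected_tampered = verify(body + b" ", good_signature)
rejected_wrong_secret = verify(body, sign(body, "other-secret"))
```

Because the signature covers the exact bytes of the request body, even a single added space makes verification fail, which is why the handler verifies before parsing the JSON.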
79.3.2 PR Context Collection
import base64


class GitHubPRContextCollector:
    """Gathers the PR metadata, diff, changed files, and project config for the reviewer."""

    def __init__(self, token: str):
        self.token = token
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github.v3+json",
        }

    async def collect(self, repo_full_name: str, pr_number: int) -> dict:
        async with httpx.AsyncClient() as client:
            pr_info = await self._get_pr_info(client, repo_full_name, pr_number)
            diff = await self._get_pr_diff(client, repo_full_name, pr_number)
            files = await self._get_pr_files(client, repo_full_name, pr_number)
            # Config files are read at the PR's base commit.
            config_files = await self._get_config_files(
                client, repo_full_name, pr_info.get("base", {}).get("sha", "HEAD")
            )
        return {
            "pr_title": pr_info.get("title", ""),
            # GitHub returns a null body when the PR has no description.
            "pr_description": pr_info.get("body") or "",
            "pr_author": pr_info.get("user", {}).get("login", ""),
            "base_branch": pr_info.get("base", {}).get("ref", ""),
            "diff": diff,
            "changed_files": files,
            "config_files": config_files,
            "stats": {
                "additions": pr_info.get("additions", 0),
                "deletions": pr_info.get("deletions", 0),
                "changed_files_count": pr_info.get("changed_files", 0),
            },
        }

    async def _get_pr_diff(self, client, repo: str, pr_number: int) -> str:
        # The diff media type makes the PR endpoint return a unified diff instead of JSON.
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
            headers={**self.headers, "Accept": "application/vnd.github.v3.diff"},
        )
        response.raise_for_status()
        return response.text[:50000]  # Cap at 50K chars

    async def _get_pr_files(self, client, repo: str, pr_number: int) -> list:
        # Fetches the first page only; PRs with more than 100 files need pagination.
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/files",
            headers=self.headers, params={"per_page": 100},
        )
        response.raise_for_status()
        return [
            {
                "filename": f["filename"],
                "status": f["status"],
                "additions": f["additions"],
                "deletions": f["deletions"],
                "patch": f.get("patch", "")[:3000],  # no patch for binary files
            }
            for f in response.json()
        ]

    async def _get_config_files(self, client, repo: str, sha: str) -> dict:
        config_paths = [".eslintrc.json", "pyproject.toml", ".flake8", "CONTRIBUTING.md"]
        configs = {}
        for path in config_paths:
            try:
                response = await client.get(
                    f"https://api.github.com/repos/{repo}/contents/{path}",
                    headers=self.headers, params={"ref": sha},
                )
                if response.status_code == 200:
                    content = base64.b64decode(response.json()["content"]).decode()
                    configs[path] = content[:2000]
            except (httpx.HTTPError, KeyError, TypeError, ValueError):
                # Config files are optional context; skip any that cannot be fetched or decoded.
                continue
        return configs

    async def _get_pr_info(self, client, repo: str, pr_number: int) -> dict:
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
            headers=self.headers,
        )
        response.raise_for_status()
        return response.json()
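The collector caps the diff at a fixed character count, which can cut a hunk in half. One alternative, sketched here under the assumption that the diff is in standard git unified format (each file section starting with a `diff --git` line), is to keep whole file sections until the budget is used up and record which sections were left out. The function name `truncate_diff_by_file` and its return shape are hypothetical.

```python
import re


def truncate_diff_by_file(diff: str, max_chars: int) -> tuple[str, list[str]]:
    """Keep whole per-file sections of a unified diff within max_chars.

    Returns the kept diff text and the header lines of the omitted sections.
    """
    # Split before every "diff --git" line so each chunk is one file's section.
    sections = [s for s in re.split(r"(?m)^(?=diff --git )", diff) if s]
    kept, omitted, used = [], [], 0
    for section in sections:
        if used + len(section) <= max_chars:
            kept.append(section)
            used += len(section)
        else:
            # Sections that do not fit are skipped; later, smaller ones may still fit.
            omitted.append(section.splitlines()[0])
    return "".join(kept), omitted


sample = (
    "diff --git a/a.py b/a.py\n+print('a')\n"
    "diff --git a/b.py b/b.py\n" + "+x = 1\n" * 50 +
    "diff --git a/c.py b/c.py\n+print('c')\n"
)
text, skipped = truncate_diff_by_file(sample, max_chars=100)
```

The list of skipped headers could be appended to the prompt so the model knows that part of the change was not shown.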
79.4 Code Review Agent
79.4.1 Review Prompt Design
from anthropic import Anthropic
client = Anthropic()
CODE_REVIEW_SYSTEM = """You are a senior engineer specializing in code quality, security, and maintainability.
Review principles:
1. **Constructive**: When identifying issues, always provide specific improvement suggestions
2. **Severity levels**: Distinguish BLOCKER (blocks merge), MAJOR (important recommendation), MINOR (detail suggestion), PRAISE (worth acknowledging)
3. **Precise location**: Specify exact filename and line number
4. **Context-aware**: Consider the overall purpose of the change; avoid criticizing sound design decisions
5. **False positive control**: Under-report rather than over-report uncertain issues
Review priorities:
- 🔴 BLOCKER: Security vulnerabilities (SQL injection, XSS, hardcoded credentials, auth bypass)
- 🔴 BLOCKER: Data loss risk (unhandled exceptions, resource leaks)
- 🟡 MAJOR: Logic errors, unhandled edge cases, O(n²)+ complexity in hot paths
- 🟡 MAJOR: Missing necessary error handling
- 🟢 MINOR: Code style, naming, comments
- 💪 PRAISE: Worthy design decisions or improvements"""
def build_review_prompt(pr_context: dict) -> str:
    files_summary = "\n".join([
        f"- {f['filename']} ({f['status']}, +{f['additions']}/-{f['deletions']})"
        for f in pr_context["changed_files"][:20]
    ])
    config_context = ""
    if pr_context.get("config_files"):
        config_context = "\n\nProject configuration reference:\n" + "\n".join([
            f"### {path}\n```\n{content[:500]}\n```"
            for path, content in list(pr_context["config_files"].items())[:3]
        ])
    return f"""Please review the following Pull Request.
## PR Information
- **Title**: {pr_context['pr_title']}
- **Description**: {pr_context['pr_description'] or '(no description)'}
- **Author**: {pr_context['pr_author']}
- **Target branch**: {pr_context['base_branch']}
- **Changes**: +{pr_context['stats']['additions']} / -{pr_context['stats']['deletions']}
## Changed Files
{files_summary}
{config_context}
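## Code Changes (diff)
```diff
{pr_context['diff'][:40000]}
```

Output your review in this format:

<review_summary>
Overall assessment of this PR (2-3 sentences)
</review_summary>

<issues>
[
  {{
    "severity": "BLOCKER | MAJOR | MINOR | PRAISE",
    "file": "path/to/file",
    "line": 42,
    "title": "Short title",
    "description": "What the problem is and why it matters",
    "suggestion": "Specific improvement"
  }}
]
</issues>

<verdict>
{{"recommendation": "APPROVE | REQUEST_CHANGES | COMMENT", "summary": "One-sentence conclusion"}}
</verdict>"""
    # The <issues> and <verdict> fields above match the keys that ReviewResultProcessor reads in 79.4.2.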
79.4.2 Review Result Processing and Publishing
class ReviewResultProcessor:
    # Review events accepted by GitHub's "create a review" endpoint.
    ALLOWED_EVENTS = {"APPROVE", "REQUEST_CHANGES", "COMMENT"}

    def __init__(self, github_token: str):
        self.token = github_token
        self.headers = {
            "Authorization": f"Bearer {github_token}",
            "Accept": "application/vnd.github.v3+json",
        }

    def parse_review_response(self, raw_response: str) -> dict:
        """Extract the tagged sections from the model output; fall back to safe defaults."""
        result = {"summary": "", "issues": [], "verdict": {}}
        if "<review_summary>" in raw_response:
            result["summary"] = raw_response.split("<review_summary>")[1]\
                .split("</review_summary>")[0].strip()
        if "<issues>" in raw_response:
            issues_text = raw_response.split("<issues>")[1].split("</issues>")[0].strip()
            try:
                issues = json.loads(issues_text)
            except json.JSONDecodeError:
                issues = []
            # Keep only JSON objects so later .get() calls cannot fail.
            if isinstance(issues, list):
                result["issues"] = [i for i in issues if isinstance(i, dict)]
        if "<verdict>" in raw_response:
            verdict_text = raw_response.split("<verdict>")[1].split("</verdict>")[0].strip()
            try:
                verdict = json.loads(verdict_text)
            except json.JSONDecodeError:
                verdict = None
            if isinstance(verdict, dict):
                result["verdict"] = verdict
            else:
                result["verdict"] = {"recommendation": "COMMENT", "summary": "Parse error"}
        return result

    def format_pr_comment(self, review_result: dict) -> str:
        verdict = review_result.get("verdict", {})
        issues = review_result.get("issues", [])
        recommendation = verdict.get("recommendation", "COMMENT")
        rec_emoji = {"APPROVE": "✅", "REQUEST_CHANGES": "❌", "COMMENT": "💬"}.get(recommendation, "💬")
        blockers = [i for i in issues if i.get("severity") == "BLOCKER"]
        majors = [i for i in issues if i.get("severity") == "MAJOR"]
        minors = [i for i in issues if i.get("severity") == "MINOR"]
        praises = [i for i in issues if i.get("severity") == "PRAISE"]
        # Blank lines keep Markdown from reading the text above "---" as a heading.
        comment = f"""## {rec_emoji} AI Code Review Report

**Overall Assessment**: {review_result.get('summary', '')}

**Review Summary**: 🔴 {len(blockers)} BLOCKER | 🟡 {len(majors)} MAJOR | 🟢 {len(minors)} MINOR | 💪 {len(praises)} PRAISE

---

"""
        for label, items in [
            ("🔴 BLOCKER (must fix)", blockers),
            ("🟡 MAJOR (recommended fix)", majors),
            ("🟢 MINOR (optional improvement)", minors),
            ("💪 Well done", praises),
        ]:
            if items:
                comment += f"### {label}\n\n"
                for issue in items:
                    location = f"`{issue.get('file', 'unknown')}`"
                    if issue.get('line'):
                        location += f" (line {issue['line']})"
                    comment += f"**{issue.get('title', 'Issue')}** in {location}\n\n"
                    comment += f"{issue.get('description', '')}\n\n"
                    if issue.get('suggestion'):
                        comment += f"> 💡 Suggestion: {issue['suggestion']}\n\n"
        comment += "\n---\n*Generated by AI Code Review Agent | Model: Claude Opus | This review is advisory; final decisions rest with human reviewers*"
        return comment

    async def post_review(self, repo: str, pr_number: int, review_result: dict) -> bool:
        comment_body = self.format_pr_comment(review_result)
        recommendation = review_result.get("verdict", {}).get("recommendation", "COMMENT")
        # The model may emit an unexpected value; fall back to a plain comment.
        if recommendation not in self.ALLOWED_EVENTS:
            recommendation = "COMMENT"
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
                headers=self.headers,
                json={"body": comment_body, "event": recommendation},
            )
        return response.status_code in [200, 201]
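The `split`-based parsing above works when the model follows the output format exactly. A slightly more defensive variant, sketched below, extracts each tagged section with a regular expression and drops issue entries whose severity is not one of the four levels. `extract_tag` and `parse_issues` are hypothetical helper names, and the sample response is made up for illustration.

```python
import json
import re

VALID_SEVERITIES = ("BLOCKER", "MAJOR", "MINOR", "PRAISE")


def extract_tag(text: str, tag: str) -> str:
    """Return the stripped content of the first <tag>...</tag> pair, or ""."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_issues(raw_response: str) -> list[dict]:
    """Parse the <issues> JSON array, keeping only well-formed entries."""
    try:
        issues = json.loads(extract_tag(raw_response, "issues") or "[]")
    except json.JSONDecodeError:
        return []
    if not isinstance(issues, list):
        return []
    return [
        i for i in issues
        if isinstance(i, dict) and i.get("severity") in VALID_SEVERITIES
    ]


sample_response = """
<review_summary>Small, focused change.</review_summary>
<issues>
[
  {"severity": "BLOCKER", "file": "app/db.py", "line": 12, "title": "SQL built by string concatenation"},
  {"severity": "NITPICK", "file": "app/db.py", "title": "Unknown severity level"},
  "not an object"
]
</issues>
"""
issues = parse_issues(sample_response)
```

Dropping malformed entries rather than failing the whole review fits the chapter's preference for under-reporting over noisy output.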
79.5 Quality Gate Pipeline
79.5.1 Quality Gate Rules
import fnmatch


class QualityGate:
    def __init__(self, config: dict):
        """
        config example:
        {
            "block_on_blocker": True,
            "block_on_major_count": 5,
            "max_pr_size": 500,
            "blocked_file_patterns": ["secrets.py", "*.pem"]
        }
        """
        self.config = config

    def evaluate(self, pr_context: dict, review_result: dict) -> dict:
        failures = []  # any failure blocks the merge
        warnings = []  # warnings are reported but do not block
        issues = review_result.get("issues", [])
        blockers = [i for i in issues if i.get("severity") == "BLOCKER"]
        majors = [i for i in issues if i.get("severity") == "MAJOR"]

        # Rule 1: BLOCKER issues block merge
        if self.config.get("block_on_blocker", True) and blockers:
            failures.append({
                "rule": "no_blockers",
                "message": f"{len(blockers)} BLOCKER issue(s) must be fixed before merging",
                "issues": [b.get("title") for b in blockers],
            })

        # Rule 2: MAJOR count limit
        max_major = self.config.get("block_on_major_count", 999)
        if len(majors) > max_major:
            failures.append({
                "rule": "major_count_limit",
                "message": f"MAJOR issue count ({len(majors)}) exceeds limit ({max_major})",
            })

        # Rule 3: PR size check (warning only)
        max_size = self.config.get("max_pr_size", 1000)
        total_changes = pr_context["stats"]["additions"] + pr_context["stats"]["deletions"]
        if total_changes > max_size:
            warnings.append({
                "rule": "pr_size",
                "message": f"Large PR ({total_changes} lines changed); consider splitting into smaller PRs",
            })

        # Rule 4: Blocked file patterns. Match both the full path and the bare
        # file name, so "secrets.py" also catches "src/secrets.py".
        patterns = self.config.get("blocked_file_patterns", [])
        for file in pr_context["changed_files"]:
            path = file["filename"]
            name = path.rsplit("/", 1)[-1]
            if any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p) for p in patterns):
                failures.append({
                    "rule": "blocked_file",
                    "message": f"Changed protected file: {path}",
                })

        return {
            "passed": len(failures) == 0,
            "failures": failures,
            "warnings": warnings,
            "recommendation": "APPROVE" if len(failures) == 0 else "REQUEST_CHANGES",
        }
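A `REQUEST_CHANGES` review typically blocks a merge only when branch protection is configured to require reviews. Another common way to wire the gate into CI/CD is to report it as a commit status on the PR's head commit and mark that status as required. The sketch below builds the request for GitHub's commit status endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`) from a gate result shaped like the one `QualityGate.evaluate` returns. The helper name, the `ai-code-review/quality-gate` context string, the 140-character cap, and the sample values are assumptions.

```python
def build_commit_status(repo: str, head_sha: str, gate_result: dict) -> tuple[str, dict]:
    """Return (url, json_body) for reporting a gate result as a commit status."""
    if gate_result["passed"]:
        state = "success"
        description = "AI review quality gate passed"
        if gate_result["warnings"]:
            description += f" with {len(gate_result['warnings'])} warning(s)"
    else:
        state = "failure"
        description = "; ".join(f["message"] for f in gate_result["failures"])
    url = f"https://api.github.com/repos/{repo}/statuses/{head_sha}"
    body = {
        "state": state,
        # Keep the description short; 140 characters is assumed to be a safe cap.
        "description": description[:140],
        "context": "ai-code-review/quality-gate",  # hypothetical status name
    }
    return url, body


url, body = build_commit_status(
    "octo-org/example-repo",  # hypothetical repository
    "abc123",                 # hypothetical head commit SHA
    {
        "passed": False,
        "failures": [{"rule": "no_blockers",
                      "message": "1 BLOCKER issue(s) must be fixed before merging"}],
        "warnings": [],
    },
)
```

The returned pair can be sent with the same `httpx.AsyncClient` and headers used elsewhere in this chapter, for example `await client.post(url, headers=headers, json=body)`. In the webhook payload, the head commit SHA is available as `data["pull_request"]["head"]["sha"]`.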
79.5.2 Complete PR Processing Pipeline
import asyncio


async def process_pull_request(data: dict):
    repo = data["repository"]["full_name"]
    pr_number = data["pull_request"]["number"]
    try:
        # Step 1: Collect context
        collector = GitHubPRContextCollector(GITHUB_TOKEN)
        pr_context = await collector.collect(repo, pr_number)

        # Step 2: Call Claude for review. The Anthropic client created earlier is
        # synchronous, so run it in a worker thread to keep the event loop free.
        review_prompt = build_review_prompt(pr_context)
        response = await asyncio.to_thread(
            client.messages.create,
            model="claude-opus-4-5",
            max_tokens=4000,
            system=CODE_REVIEW_SYSTEM,
            messages=[{"role": "user", "content": review_prompt}],
        )

        # Step 3: Parse review results
        processor = ReviewResultProcessor(GITHUB_TOKEN)
        review_result = processor.parse_review_response(response.content[0].text)

        # Step 4: Quality gate evaluation
        gate = QualityGate({
            "block_on_blocker": True,
            "block_on_major_count": 5,
            "max_pr_size": 800,
        })
        gate_result = gate.evaluate(pr_context, review_result)
        if not gate_result["passed"]:
            # A failed gate overrides whatever the model recommended.
            review_result["verdict"]["recommendation"] = "REQUEST_CHANGES"

        # Step 5: Post review to GitHub
        await processor.post_review(repo, pr_number, review_result)
        print(f"PR #{pr_number} review complete. Quality gate passed: {gate_result['passed']}")
    except Exception as e:
        # Runs as a background task, so report the failure instead of raising.
        print(f"PR #{pr_number} review failed: {e}")
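Because a `synchronize` event arrives on every push, a busy PR can trigger several overlapping reviews. One way to handle this, sketched below, is to remember the latest head commit per PR and have a review skip publishing if a newer commit arrived while it was running. `LatestCommitTracker` is a hypothetical in-memory helper; a multi-process deployment would need shared storage instead.

```python
class LatestCommitTracker:
    """Remember the newest head SHA seen for each (repo, PR number) pair."""

    def __init__(self):
        self._latest: dict[tuple[str, int], str] = {}

    def record(self, repo: str, pr_number: int, head_sha: str) -> None:
        """Call when a webhook event arrives, before starting the review."""
        self._latest[(repo, pr_number)] = head_sha

    def is_current(self, repo: str, pr_number: int, head_sha: str) -> bool:
        """Call before publishing: True if no newer commit has been recorded."""
        return self._latest.get((repo, pr_number)) == head_sha


tracker = LatestCommitTracker()

# First push starts a review for commit "aaa111".
tracker.record("octo-org/example-repo", 7, "aaa111")
# A second push arrives while the first review is still running.
tracker.record("octo-org/example-repo", 7, "bbb222")

stale_review_should_post = tracker.is_current("octo-org/example-repo", 7, "aaa111")
fresh_review_should_post = tracker.is_current("octo-org/example-repo", 7, "bbb222")
```

In `process_pull_request`, the head SHA for the event is available as `data["pull_request"]["head"]["sha"]`, so the check can run just before the review is posted.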
79.6 Specialized Review Modes
SPECIALIZED_REVIEW_PROMPTS = {
"security": """Pay special attention to:
- SQL injection: Are there un-parameterized database queries?
- XSS: Is unsanitized user input rendered directly to HTML?
- Hardcoded secrets: Are there API keys, passwords, or tokens in the code?
- Auth checks: Do critical operations have proper authorization?
- Dependency security: Do new dependencies have known vulnerabilities?""",
"performance": """Pay special attention to:
- N+1 queries: Are database queries made inside loops?
- Missing indexes: Do high-frequency queries leverage indexes?
- Memory leaks: Are resources properly released?
- Blocking synchronous operations: Could any be made async?""",
"api_design": """Pay special attention to:
- RESTful convention adherence
- Consistent error response format
- Backward compatibility (does this break existing clients?)
- API documentation completeness"""
}
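The specialized prompts are listed here without showing how a mode is chosen. One plausible approach is to pick modes from the paths of the changed files and append the matching prompts to the base system prompt. The path rules, the `select_review_modes` and `build_system_prompt` helpers, and the shortened prompt texts below are all assumptions for illustration.

```python
import fnmatch

# Shortened stand-ins for SPECIALIZED_REVIEW_PROMPTS so the sketch runs on its own.
PROMPTS = {
    "security": "Pay special attention to: SQL injection, XSS, hardcoded secrets.",
    "performance": "Pay special attention to: N+1 queries, missing indexes.",
    "api_design": "Pay special attention to: RESTful conventions, backward compatibility.",
}

# Hypothetical mapping from file path patterns to review modes.
MODE_RULES = [
    ("*auth*", "security"),
    ("*.sql", "performance"),
    ("*models*", "performance"),
    ("*api/*", "api_design"),
    ("*routes*", "api_design"),
]


def select_review_modes(changed_files: list[str]) -> list[str]:
    """Return the modes whose patterns match at least one changed file, in rule order."""
    modes = []
    for pattern, mode in MODE_RULES:
        if mode not in modes and any(fnmatch.fnmatch(f, pattern) for f in changed_files):
            modes.append(mode)
    return modes


def build_system_prompt(base_prompt: str, modes: list[str]) -> str:
    """Append the selected specialized prompts to the base system prompt."""
    sections = [f"## Focus: {mode}\n{PROMPTS[mode]}" for mode in modes]
    return "\n\n".join([base_prompt, *sections])


modes = select_review_modes(["src/api/users.py", "src/auth/login.py", "README.md"])
system_prompt = build_system_prompt("You are a senior engineer.", modes)
```

With this shape, the result of `build_system_prompt` would be passed as the `system` argument in place of `CODE_REVIEW_SYSTEM` alone, and a PR that touches no matching paths gets the base prompt unchanged.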
Summary
The core engineering pattern for a code review agent is: Webhook trigger → Context collection → AI analysis → Structured output → API publishing. Each stage has its own engineering challenges: webhook reliability and security, balancing context collection completeness against token limits, parsing structured review results, and tuning quality gate thresholds.
The critical success factor is false positive control. A review bot that generates too many meaningless comments is worse than no bot at all: engineers start dismissing every bot comment, and trust in the tool collapses. Start with conservative thresholds and adjust them gradually as review data accumulates; this is usually the safest way to launch in production.