
Chapter 79: Building a Code Review Agent: Automated PR Review and Quality Gate Pipeline

79.1 Value and Challenges

Code review is one of the most expensive human activities in software engineering. Frequently cited industry studies suggest that experienced engineers can carefully review only about 200 to 400 lines of code per hour. For fast-iterating teams, code review routinely becomes the bottleneck in the release cycle.

An AI code review agent can:

Respond around the clock, cutting PR review wait times from hours to minutes
Consistently check coding standards, security vulnerabilities, and performance issues
Free human engineers to focus on architecture and business logic review

However, automated code review also presents unique challenges:

Context understanding: PR changes cannot be understood without broader project context
False positive control: too many false positives erode engineer trust
Quality gating: deciding how AI review results should feed into the CI/CD pipeline

79.2 System Architecture

GitHub/GitLab
     ↓ Webhook (PR opened/updated)
[Webhook Server]
     ↓
[PR Context Collector]
  - Fetch diff
  - Fetch changed file list
  - Fetch relevant context files
  - Fetch PR description and title
     ↓
[Code Review Agent (Claude)]
  - Security vulnerability scan
  - Coding standards check
  - Logic error detection
  - Performance issue identification
  - Documentation completeness check
     ↓
[Review Result Processor]
  - Format comments
  - Determine quality gate result
  - Post via GitHub API
     ↓
[Quality Gate]
  - Pass: Auto-approve (optional)
  - Fail: Request changes, block merge

79.3 Webhook Server

79.3.1 Receiving GitHub Webhooks

from fastapi import FastAPI, Request, HTTPException, BackgroundTasks
import hmac, hashlib, json, os
import httpx

app = FastAPI(title="Code Review Agent")

# Load secrets from the environment; never hardcode them in source control
GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

def verify_github_signature(payload: bytes, signature: str) -> bool:
    expected = "sha256=" + hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(), payload, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

@app.post("/webhook/github")
async def handle_github_webhook(request: Request, background_tasks: BackgroundTasks):
    signature = request.headers.get("X-Hub-Signature-256", "")
    payload = await request.body()
    
    if not verify_github_signature(payload, signature):
        raise HTTPException(status_code=401, detail="Invalid signature")
    
    event = request.headers.get("X-GitHub-Event")
    data = json.loads(payload)
    
    if event == "pull_request":
        action = data.get("action")
        if action in ["opened", "synchronize", "reopened"]:
            background_tasks.add_task(process_pull_request, data)
    
    return {"status": "received"}
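GitHub computes `X-Hub-Signature-256` as an HMAC-SHA256 of the raw request body, keyed with the webhook secret, hex-encoded and prefixed with `sha256=`. To test the verifier locally without GitHub, you can generate that header yourself. The sketch below mirrors `verify_github_signature` but takes the secret as a parameter; the secret and payload are made-up test values.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, payload: bytes) -> str:
    """Build the value GitHub sends in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(secret: str, payload: bytes, signature: str) -> bool:
    """Same check as verify_github_signature, with the secret passed in explicitly."""
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, so it does not leak timing information
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = "test-secret"  # hypothetical secret for local testing only
    body = json.dumps({"action": "opened", "number": 42}).encode()
    header = sign_payload(secret, body)
    print(verify_signature(secret, body, header))          # True
    print(verify_signature(secret, body + b" ", header))   # False: body was altered
    print(verify_signature("wrong-secret", body, header))  # False: secret mismatch
```

Note that the signature covers the exact bytes sent, which is why the handler verifies `await request.body()` before parsing it as JSON.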

79.3.2 PR Context Collection

import base64

class GitHubPRContextCollector:
    
    def __init__(self, token: str):
        self.token = token
        self.headers = {
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github.v3+json"
        }
    
    async def collect(self, repo_full_name: str, pr_number: int) -> dict:
        # httpx's default 5-second timeout is tight for large diffs
        async with httpx.AsyncClient(timeout=30.0) as client:
            pr_info = await self._get_pr_info(client, repo_full_name, pr_number)
            diff = await self._get_pr_diff(client, repo_full_name, pr_number)
            files = await self._get_pr_files(client, repo_full_name, pr_number)
            # Read config from the base commit so the PR cannot rewrite the rules it is reviewed against
            config_files = await self._get_config_files(
                client, repo_full_name, pr_info.get("base", {}).get("sha", "HEAD")
            )
        
        return {
            "pr_title": pr_info.get("title", ""),
            "pr_description": pr_info.get("body") or "",  # body is null when the PR has no description
            "pr_author": pr_info.get("user", {}).get("login", ""),
            "base_branch": pr_info.get("base", {}).get("ref", ""),
            "diff": diff,
            "changed_files": files,
            "config_files": config_files,
            "stats": {
                "additions": pr_info.get("additions", 0),
                "deletions": pr_info.get("deletions", 0),
                "changed_files_count": pr_info.get("changed_files", 0)
            }
        }
    
    async def _get_pr_diff(self, client, repo: str, pr_number: int) -> str:
        # The pull request endpoint returns a unified diff when asked for the diff media type
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
            headers={**self.headers, "Accept": "application/vnd.github.v3.diff"}
        )
        response.raise_for_status()
        return response.text[:50000]  # Cap at 50K chars to bound prompt size
    
    async def _get_pr_files(self, client, repo: str, pr_number: int) -> list:
        # Only the first page (up to 100 files) is fetched; larger PRs need pagination
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/files",
            headers=self.headers, params={"per_page": 100}
        )
        response.raise_for_status()
        return [
            {
                "filename": f["filename"],
                "status": f["status"],
                "additions": f["additions"],
                "deletions": f["deletions"],
                "patch": f.get("patch", "")[:3000]  # binary files have no patch
            }
            for f in response.json()
        ]
    
    async def _get_config_files(self, client, repo: str, sha: str) -> dict:
        config_paths = [".eslintrc.json", "pyproject.toml", ".flake8", "CONTRIBUTING.md"]
        configs = {}
        for path in config_paths:
            try:
                response = await client.get(
                    f"https://api.github.com/repos/{repo}/contents/{path}",
                    headers=self.headers, params={"ref": sha}
                )
                if response.status_code == 200:
                    raw = base64.b64decode(response.json()["content"])
                    configs[path] = raw.decode("utf-8", errors="replace")[:2000]
            except (httpx.HTTPError, KeyError, ValueError):
                pass  # Config files are optional context; skip any that cannot be read
        return configs
    
    async def _get_pr_info(self, client, repo: str, pr_number: int) -> dict:
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
            headers=self.headers
        )
        response.raise_for_status()
        return response.json()
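Capping the raw diff at a fixed character count can cut a file's patch in half and silently drop every file after it. One alternative, sketched here under the assumption that the prompt is built from the per-file `patch` fields returned by `_get_pr_files`, is to add whole patches until a character budget runs out and record which files were left out. The function name, the `--- filename` section header, and the default budget are illustrative choices, not part of the original pipeline.

```python
def assemble_patches(files: list[dict], budget: int = 40_000) -> tuple[str, list[str]]:
    """Concatenate whole per-file patches until the character budget is exhausted.

    Returns the assembled diff text and the filenames that did not fit.
    """
    parts: list[str] = []
    omitted: list[str] = []
    used = 0
    # Smallest patches first: covers as many files as possible within the budget
    for entry in sorted(files, key=lambda item: len(item.get("patch", ""))):
        patch = entry.get("patch", "")
        if not patch:
            continue  # e.g. binary files, which have no textual patch
        section = f"--- {entry['filename']}\n{patch}\n"
        if used + len(section) > budget:
            omitted.append(entry["filename"])
            continue
        parts.append(section)
        used += len(section)
    return "".join(parts), omitted
```

The `omitted` list can be named in the prompt (for example, "Files not shown: ...") so the model does not assume it saw the whole change. Smallest-first ordering maximizes file coverage; a real system might instead prioritize source files over lockfiles or generated code.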

79.4 Code Review Agent

79.4.1 Review Prompt Design

from anthropic import Anthropic

client = Anthropic()

CODE_REVIEW_SYSTEM = """You are a senior engineer specializing in code quality, security, and maintainability.

Review principles:
1. **Constructive**: When identifying issues, always provide specific improvement suggestions
2. **Severity levels**: Distinguish BLOCKER (blocks merge), MAJOR (important recommendation), MINOR (detail suggestion), PRAISE (worth acknowledging)
3. **Precise location**: Specify exact filename and line number
4. **Context-aware**: Consider the overall purpose of the change; avoid criticizing sound design decisions
5. **False positive control**: Under-report rather than over-report uncertain issues

Review priorities:
- 🔴 BLOCKER: Security vulnerabilities (SQL injection, XSS, hardcoded credentials, auth bypass)
- 🔴 BLOCKER: Data loss risk (unhandled exceptions, resource leaks)
- 🟡 MAJOR: Logic errors, unhandled edge cases, O(n²)+ complexity in hot paths
- 🟡 MAJOR: Missing necessary error handling
- 🟢 MINOR: Code style, naming, comments
- 💪 PRAISE: Worthy design decisions or improvements"""

def build_review_prompt(pr_context: dict) -> str:
    files_summary = "\n".join([
        f"- {f['filename']} ({f['status']}, +{f['additions']}/-{f['deletions']})"
        for f in pr_context["changed_files"][:20]
    ])
    
    config_context = ""
    if pr_context.get("config_files"):
        config_context = "\n\nProject configuration reference:\n" + "\n".join([
            f"### {path}\n```\n{content[:500]}\n```"
            for path, content in list(pr_context["config_files"].items())[:3]
        ])
    
    return f"""Please review the following Pull Request.

## PR Information
- **Title**: {pr_context['pr_title']}
- **Description**: {pr_context['pr_description'] or '(no description)'}
- **Author**: {pr_context['pr_author']}
- **Target branch**: {pr_context['base_branch']}
- **Changes**: +{pr_context['stats']['additions']} / -{pr_context['stats']['deletions']}

## Changed Files
{files_summary}

{config_context}

## Code Changes (diff)

```diff
{pr_context['diff'][:40000]}
```

Output your review in this format:

<review_summary>
Overall assessment of this PR (2-3 sentences)
</review_summary>

<issues>
[
  {{
    "severity": "BLOCKER|MAJOR|MINOR|PRAISE",
    "file": "file path",
    "line": line_number_or_null,
    "category": "security|logic|performance|style|docs|design",
    "title": "short title",
    "description": "detailed explanation",
    "suggestion": "specific improvement suggestion (if applicable)"
  }}
]
</issues>

<verdict>
{{
  "recommendation": "APPROVE|REQUEST_CHANGES|COMMENT",
  "blocker_count": 0,
  "major_count": 0,
  "minor_count": 0,
  "praise_count": 0,
  "summary": "one-sentence summary"
}}
</verdict>"""
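The processor in the next section parses this format by splitting on the tags. Models sometimes wrap the JSON inside `<issues>` in a Markdown code fence, or emit a severity outside the four levels the system prompt defines, and plain splitting then fails. A more defensive extraction step, sketched below as an assumption rather than part of the original pipeline, pulls each tagged section with a regular expression, strips an optional fence, and keeps only well-formed issues.

```python
import json
import re
from typing import Any, Optional

ALLOWED_SEVERITIES = {"BLOCKER", "MAJOR", "MINOR", "PRAISE"}


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> block, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def load_json_block(block: str) -> Any:
    """Parse JSON, tolerating an optional Markdown code fence (with or without 'json')."""
    fenced = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", block, re.DOTALL)
    if fenced:
        block = fenced.group(1)
    return json.loads(block)


def parse_issues(text: str) -> list[dict]:
    """Extract the <issues> list, normalizing severities and dropping malformed entries."""
    block = extract_tag(text, "issues")
    if block is None:
        return []
    try:
        issues = load_json_block(block)
    except json.JSONDecodeError:
        return []
    if not isinstance(issues, list):
        return []
    cleaned = []
    for issue in issues:
        if not isinstance(issue, dict):
            continue  # skip stray strings or numbers in the array
        severity = str(issue.get("severity", "")).upper()
        if severity in ALLOWED_SEVERITIES:
            cleaned.append({**issue, "severity": severity})
    return cleaned
```

`parse_review_response` could call `parse_issues` in place of its `<issues>` branch; `extract_tag` plus `load_json_block` handle `<verdict>` the same way.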

79.4.2 Review Result Processing and Publishing

class ReviewResultProcessor:
    
    def __init__(self, github_token: str):
        self.token = github_token
        self.headers = {
            "Authorization": f"Bearer {github_token}",
            "Accept": "application/vnd.github.v3+json"
        }
    
    def parse_review_response(self, raw_response: str) -> dict:
        result = {"summary": "", "issues": [], "verdict": {}}
        
        if "<review_summary>" in raw_response:
            result["summary"] = raw_response.split("<review_summary>")[1]\
                                            .split("</review_summary>")[0].strip()
        
        if "<issues>" in raw_response:
            issues_text = raw_response.split("<issues>")[1].split("</issues>")[0].strip()
            try:
                result["issues"] = json.loads(issues_text)
            except json.JSONDecodeError:
                result["issues"] = []
        
        if "<verdict>" in raw_response:
            verdict_text = raw_response.split("<verdict>")[1].split("</verdict>")[0].strip()
            try:
                result["verdict"] = json.loads(verdict_text)
            except json.JSONDecodeError:
                result["verdict"] = {"recommendation": "COMMENT", "summary": "Parse error"}
        
        return result
    
    def format_pr_comment(self, review_result: dict) -> str:
        verdict = review_result.get("verdict", {})
        issues = review_result.get("issues", [])
        
        recommendation = verdict.get("recommendation", "COMMENT")
        rec_emoji = {"APPROVE": "✅", "REQUEST_CHANGES": "❌", "COMMENT": "💬"}.get(recommendation, "💬")
        
        blockers = [i for i in issues if i.get("severity") == "BLOCKER"]
        majors = [i for i in issues if i.get("severity") == "MAJOR"]
        minors = [i for i in issues if i.get("severity") == "MINOR"]
        praises = [i for i in issues if i.get("severity") == "PRAISE"]
        
        comment = f"""## {rec_emoji} AI Code Review Report

**Overall Assessment**: {review_result.get('summary', '')}

**Review Summary**: 🔴 {len(blockers)} BLOCKER | 🟡 {len(majors)} MAJOR | 🟢 {len(minors)} MINOR | 💪 {len(praises)} PRAISE

---

"""
        for label, items in [
            ("🔴 BLOCKER (must fix)", blockers),
            ("🟡 MAJOR (recommended fix)", majors),
            ("🟢 MINOR (optional improvement)", minors),
            ("💪 Well done", praises)
        ]:
            if items:
                comment += f"### {label}\n\n"
                for issue in items:
                    location = f"`{issue.get('file', 'unknown')}`"
                    if issue.get('line'):
                        location += f" (line {issue['line']})"
                    comment += f"**{issue.get('title', 'Issue')}** — {location}\n\n"
                    comment += f"{issue.get('description', '')}\n\n"
                    if issue.get('suggestion'):
                        comment += f"> 💡 Suggestion: {issue['suggestion']}\n\n"
        
        comment += "\n---\n*Generated by AI Code Review Agent | Model: Claude Opus | This review is advisory — final decisions rest with human reviewers*"
        return comment
    
    async def post_review(self, repo: str, pr_number: int, review_result: dict) -> bool:
        comment_body = self.format_pr_comment(review_result)
        recommendation = review_result.get("verdict", {}).get("recommendation", "COMMENT")
        # The reviews API accepts only these events; fall back to a neutral comment otherwise
        if recommendation not in ("APPROVE", "REQUEST_CHANGES", "COMMENT"):
            recommendation = "COMMENT"
        
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
                headers=self.headers,
                json={"body": comment_body, "event": recommendation}
            )
            return response.status_code in [200, 201]
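The processor posts a single summary comment. GitHub's create-review endpoint also accepts a `comments` array for line-level comments (each with `path`, `line`, `side`, and `body`), but GitHub returns an error if a comment points at a line that is not part of the diff. The sketch below assumes the per-file `patch` strings gathered by the collector: it reads the hunk headers to find which new-file line numbers each patch shows, and keeps only issues that land on those lines. Both helper names are hypothetical.

```python
import re

# Captures the starting line number on the new (right) side of a hunk header
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def right_side_lines(patch: str) -> set[int]:
    """New-file line numbers shown in a unified-diff patch (added or context lines)."""
    lines: set[int] = set()
    current = None
    for raw in patch.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            current = int(header.group(1))
            continue
        if current is None or raw.startswith("\\"):
            continue  # text before the first hunk, or "\ No newline at end of file"
        if raw.startswith("-"):
            continue  # deleted lines exist only on the left (old) side
        lines.add(current)  # an added ('+') or context (' ') line
        current += 1
    return lines


def build_inline_comments(issues: list[dict], files: list[dict]) -> list[dict]:
    """Turn review issues into review comments, skipping lines outside the diff."""
    commentable = {f["filename"]: right_side_lines(f.get("patch", "")) for f in files}
    comments = []
    for issue in issues:
        path, line = issue.get("file"), issue.get("line")
        if isinstance(line, int) and line in commentable.get(path, set()):
            body = f"**{issue.get('severity', 'NOTE')}: {issue.get('title', 'Issue')}**\n\n{issue.get('description', '')}"
            comments.append({"path": path, "line": line, "side": "RIGHT", "body": body})
    return comments
```

The resulting list could be sent as `"comments"` next to `body` and `event` in the `post_review` request; issues that cannot be anchored still appear in the summary comment.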

79.5 Quality Gate Pipeline

79.5.1 Quality Gate Rules

class QualityGate:
    
    def __init__(self, config: dict):
        """
        config example:
        {
            "block_on_blocker": true,
            "block_on_major_count": 5,
            "max_pr_size": 500,
            "blocked_file_patterns": ["secrets.py", "*.pem"]
        }
        """
        self.config = config
    
    def evaluate(self, pr_context: dict, review_result: dict) -> dict:
        failures = []
        warnings = []
        
        issues = review_result.get("issues", [])
        blockers = [i for i in issues if i.get("severity") == "BLOCKER"]
        majors = [i for i in issues if i.get("severity") == "MAJOR"]
        
        # Rule 1: BLOCKER issues block merge
        if self.config.get("block_on_blocker", True) and blockers:
            failures.append({
                "rule": "no_blockers",
                "message": f"{len(blockers)} BLOCKER issue(s) must be fixed before merging",
                "issues": [b.get("title") for b in blockers]
            })
        
        # Rule 2: MAJOR count limit
        max_major = self.config.get("block_on_major_count", 999)
        if len(majors) > max_major:
            failures.append({
                "rule": "major_count_limit",
                "message": f"MAJOR issue count ({len(majors)}) exceeds limit ({max_major})"
            })
        
        # Rule 3: PR size check
        max_size = self.config.get("max_pr_size", 1000)
        total_changes = pr_context["stats"]["additions"] + pr_context["stats"]["deletions"]
        if total_changes > max_size:
            warnings.append({
                "rule": "pr_size",
                "message": f"Large PR ({total_changes} lines changed) — consider splitting into smaller PRs"
            })
        
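        # Rule 4: Blocked file patterns, matched against the full path and the bare filename
        # (fnmatch's "*" also matches "/", but "secrets.py" alone would only match at the repo root)
        import fnmatch, posixpath
        for file in pr_context["changed_files"]:
            path = file["filename"]
            for pattern in self.config.get("blocked_file_patterns", []):
                if fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(posixpath.basename(path), pattern):
                    failures.append({
                        "rule": "blocked_file",
                        "message": f"Changed protected file: {path}"
                    })
                    break  # one failure per file is enough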
        
        return {
            "passed": len(failures) == 0,
            "failures": failures,
            "warnings": warnings,
            "recommendation": "APPROVE" if len(failures) == 0 else "REQUEST_CHANGES"
        }
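`process_pull_request` hardcodes its gate thresholds. To tune them per repository as review data accumulates, one option is to keep them in a small JSON file in each repository and merge it over safe defaults. The sketch assumes a hypothetical `.ai-review.json` file, read from the base commit the way `_get_config_files` reads other configs so a PR cannot loosen its own gate; unknown keys and values of the wrong type are ignored rather than trusted.

```python
import copy
import json
from typing import Optional

# Defaults mirror the fallbacks QualityGate.evaluate() uses when a key is missing
DEFAULT_GATE_CONFIG = {
    "block_on_blocker": True,
    "block_on_major_count": 999,
    "max_pr_size": 1000,
    "blocked_file_patterns": [],
}


def load_gate_config(raw_json: Optional[str]) -> dict:
    """Merge a repository's gate config (JSON text, possibly absent) over the defaults."""
    config = copy.deepcopy(DEFAULT_GATE_CONFIG)
    if not raw_json:
        return config
    try:
        overrides = json.loads(raw_json)
    except json.JSONDecodeError:
        return config
    if not isinstance(overrides, dict):
        return config
    for key, default in DEFAULT_GATE_CONFIG.items():
        if key not in overrides:
            continue
        value = overrides[key]
        if isinstance(default, bool):
            ok = isinstance(value, bool)
        elif isinstance(default, int):
            # bool is a subclass of int in Python, so reject it explicitly
            ok = isinstance(value, int) and not isinstance(value, bool) and value >= 0
        else:  # list of glob patterns
            ok = isinstance(value, list) and all(isinstance(p, str) for p in value)
        if ok:
            config[key] = value
    return config
```

The returned dictionary can be passed straight to `QualityGate(...)`.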

79.5.2 Complete PR Processing Pipeline

import asyncio

async def process_pull_request(data: dict):
    repo = data["repository"]["full_name"]
    pr_number = data["pull_request"]["number"]
    
    try:
        # Step 1: Collect context
        collector = GitHubPRContextCollector(GITHUB_TOKEN)
        pr_context = await collector.collect(repo, pr_number)
        
        # Step 2: Call Claude for review. The Anthropic client is synchronous, so run it
        # in a worker thread instead of blocking the server's event loop.
        review_prompt = build_review_prompt(pr_context)
        response = await asyncio.to_thread(
            client.messages.create,
            model="claude-opus-4-5",
            max_tokens=4000,
            system=CODE_REVIEW_SYSTEM,
            messages=[{"role": "user", "content": review_prompt}]
        )
        
        # Step 3: Parse review results
        processor = ReviewResultProcessor(GITHUB_TOKEN)
        review_result = processor.parse_review_response(response.content[0].text)
        
        # Step 4: Quality gate evaluation
        gate = QualityGate({
            "block_on_blocker": True,
            "block_on_major_count": 5,
            "max_pr_size": 800
        })
        gate_result = gate.evaluate(pr_context, review_result)
        
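        if not gate_result["passed"]:
            review_result["verdict"]["recommendation"] = "REQUEST_CHANGES"
            # Include the gate's reasons so the author can see why merging is blocked
            reasons = "; ".join(f["message"] for f in gate_result["failures"])
            review_result["summary"] += f"\n\n**Quality gate failed**: {reasons}"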
        
        # Step 5: Post review to GitHub
        await processor.post_review(repo, pr_number, review_result)
        
        print(f"PR #{pr_number} review complete. Quality gate passed: {gate_result['passed']}")
        
    except Exception as e:
        print(f"PR #{pr_number} review failed: {e}")
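A `REQUEST_CHANGES` review does not by itself prevent merging; whether it blocks depends on the repository's branch protection settings. One way to make the gate enforceable, sketched here as an alternative rather than the chapter's design, is to report the gate result as a commit status on the PR's head commit through GitHub's `POST /repos/{owner}/{repo}/statuses/{sha}` endpoint, then mark that status context as required in branch protection. The context name `ai-review/quality-gate` is hypothetical.

```python
def build_status_payload(gate_result: dict, context: str = "ai-review/quality-gate") -> dict:
    """Translate a QualityGate.evaluate() result into a commit status request body."""
    if gate_result["passed"]:
        state, description = "success", "AI quality gate passed"
    else:
        reasons = "; ".join(f["message"] for f in gate_result["failures"])
        state, description = "failure", f"AI quality gate failed: {reasons}"
    # Status descriptions are meant to be short, so cap the length conservatively
    return {"state": state, "context": context, "description": description[:140]}


async def post_commit_status(token: str, repo: str, head_sha: str, payload: dict) -> bool:
    """Attach the status to the PR head commit; returns True if GitHub accepted it."""
    import httpx  # the same async HTTP client used throughout this chapter

    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"https://api.github.com/repos/{repo}/statuses/{head_sha}",
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
            json=payload,
        )
    return response.status_code == 201  # GitHub answers 201 Created on success
```

In `process_pull_request`, this could run right after `gate.evaluate(...)`, using `data["pull_request"]["head"]["sha"]` as the commit to annotate.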

79.6 Specialized Review Modes

SPECIALIZED_REVIEW_PROMPTS = {
    "security": """Pay special attention to:
- SQL injection: Are there un-parameterized database queries?
- XSS: Is unsanitized user input rendered directly to HTML?
- Hardcoded secrets: Are there API keys, passwords, or tokens in the code?
- Auth checks: Do critical operations have proper authorization?
- Dependency security: Do new dependencies have known vulnerabilities?""",
    
    "performance": """Pay special attention to:
- N+1 queries: Are database queries made inside loops?
- Missing indexes: Do high-frequency queries leverage indexes?
- Memory leaks: Are resources properly released?
- Blocking synchronous operations: Could any be made async?""",
    
    "api_design": """Pay special attention to:
- RESTful convention adherence
- Consistent error response format
- Backward compatibility (does this break existing clients?)
- API documentation completeness"""
}
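The specialized prompts still need a rule for when each one applies. One simple approach, sketched below, enables modes based on the changed file paths and appends the matching focus lists to the base system prompt. The path patterns in `MODE_RULES` are hypothetical examples a team would tune for its own repository, and `build_system_prompt` takes the prompt dictionary as a parameter so the sketch runs on its own.

```python
import fnmatch

# Hypothetical path rules: which review modes to enable for which changed files
MODE_RULES = {
    "security": ["*auth*", "*login*", "*.sql", "*requirements*.txt", "*package.json"],
    "performance": ["*/models/*", "*query*", "*.sql"],
    "api_design": ["*/api/*", "*routes*", "openapi*.yaml"],
}


def select_modes(filenames: list[str], rules: dict[str, list[str]] = MODE_RULES) -> list[str]:
    """Return the review modes whose patterns match at least one changed file."""
    selected = []
    for mode, patterns in rules.items():
        if any(fnmatch.fnmatchcase(name, p) for name in filenames for p in patterns):
            selected.append(mode)
    return selected


def build_system_prompt(base: str, prompts: dict[str, str], modes: list[str]) -> str:
    """Append the focus list of each selected mode to the base system prompt."""
    sections = [
        f"\n\n## {mode.replace('_', ' ').title()} focus\n{prompts[mode]}"
        for mode in modes
        if mode in prompts
    ]
    return base + "".join(sections)
```

The result would replace `CODE_REVIEW_SYSTEM` in the `messages.create` call, for example `build_system_prompt(CODE_REVIEW_SYSTEM, SPECIALIZED_REVIEW_PROMPTS, select_modes([f["filename"] for f in pr_context["changed_files"]]))`.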

Summary

The core engineering pattern for a code review agent is: Webhook trigger → Context collection → AI analysis → Structured output → API publishing. Each stage has its own engineering challenges: webhook reliability and security, balancing context collection completeness against token limits, parsing structured review results, and tuning quality gate thresholds.

The critical success factor is false positive control. A review bot that generates too many meaningless comments is worse than no bot at all: engineers start dismissing every bot comment, and trust erodes for good. Start with conservative thresholds and adjust them gradually as data accumulates; this is the safest way to launch in production.
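Adjusting thresholds "as data accumulates" requires collecting that data. A minimal sketch, assuming you record whether engineers resolved or dismissed each bot comment (the `Feedback` record, the 50% dismissal threshold, and the 20-sample minimum are all hypothetical), computes a dismissal rate per review category and suggests which categories to suppress.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    category: str    # one of the categories from the review prompt, e.g. "style"
    dismissed: bool  # True if the engineer marked the comment as not useful


def categories_to_suppress(records: list[Feedback], max_dismiss_rate: float = 0.5,
                           min_samples: int = 20) -> list[str]:
    """Return categories whose dismissal rate exceeds the threshold.

    Categories with fewer than min_samples records are left alone, so a handful
    of early reactions cannot silence a whole category.
    """
    totals: dict[str, int] = defaultdict(int)
    dismissed: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.category] += 1
        dismissed[record.category] += int(record.dismissed)
    return sorted(
        category for category, count in totals.items()
        if count >= min_samples and dismissed[category] / count > max_dismiss_rate
    )
```

Issues in suppressed categories could be filtered out of `review_result["issues"]` before `format_pr_comment` runs, and the list revisited periodically as more feedback arrives.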
