Chapter 12

AI Code Review — Automated PR Review Setup and 5 Specialized Templates

Automate PR review with the Claude API: a complete, runnable GitHub Actions configuration that triggers on every PR, posts findings as comments, and keeps costs under $0.01 per review. Plus 5 specialized prompt templates covering security, database performance, error handling, test coverage, and API design, because a generic prompt misses many of the real issues a focused prompt catches.

Positioning AI Review: First-Pass Filter, Not Final Verdict

Before setting up AI review, be precise about what it can and cannot do:

AI Review is good at	AI Review is bad at
Finding common security vulnerabilities (SQL injection, XSS, hardcoded secrets)	Judging whether business logic is correct (AI doesn't know your domain rules)
Spotting N+1 queries, missing indexes, obvious performance issues	Evaluating architecture trade-offs (needs full system context)
Checking error handling completeness (swallowed exceptions)	Identifying bugs that require domain knowledge to recognize
Enforcing code style consistency	Catching conflicts with team historical decisions (tacit knowledge)
24/7 availability, same checklist applied to every PR	Judging whether a feature implementation fits product requirements

Conclusion: AI does the first-pass review, filtering mechanical issues. Humans do the second pass for business logic and architecture. With this division, human review time can drop by an estimated 40-60% while quality improves, because human attention goes only where judgment is genuinely needed.

Method 1: Local Quick Review (Zero Config, Works Now)

# Quick review of current changes before committing
git diff HEAD | claude -p "
You are a senior engineer doing a code review.

Analyze this diff and identify:
1. Security vulnerabilities (SQL injection, XSS, hardcoded secrets, missing auth)
2. Obvious performance issues (N+1 queries, synchronous IO in loops)
3. Incomplete error handling (swallowed exceptions, missing timeouts)
4. Null/undefined access risks

For each issue: filename and line number, severity (high/medium/low), fix suggestion.
Only report real issues. If none found, say LGTM.
"

Method 2: GitHub Actions Auto PR Review (Complete Runnable Config)

# YAML — .github/workflows/ai-review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          echo "DIFF_SIZE=$(wc -c < pr.diff)" >> $GITHUB_ENV

      - name: AI Review
        if: env.DIFF_SIZE != '0'
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install anthropic
          python3 << 'EOF'
          import anthropic, os

          diff = open('pr.diff').read()[:30000]  # cap size to stay within token limit

          client = anthropic.Anthropic()
          response = client.messages.create(
              model="claude-haiku-4-5-20251001",  # Haiku: cheaper and faster for review
              max_tokens=2048,
              messages=[{
                  "role": "user",
                  "content": f"""You are a senior engineer doing a code review.

          Analyze this PR diff and identify:
          1. Security vulnerabilities (SQL injection, XSS, hardcoded secrets, missing auth)
          2. Obvious performance issues (N+1 queries, missing indexes, blocking operations)
          3. Incomplete error handling (swallowed exceptions, missing timeouts)
          4. Null/undefined dereference risks

          Output format:
          **[HIGH/MEDIUM/LOW]** `file:line` - description - fix suggestion

          Only report real issues. Skip nitpicks. If no issues found, say "LGTM".

          Diff:
          {diff}"""
              }]
          )

          review = response.content[0].text
          with open('review.txt', 'w') as f:
              f.write(review)
          print(review)
          EOF

      - name: Post Review Comment
        # Skip when the diff was empty and no review.txt was written
        if: env.DIFF_SIZE != '0'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          REVIEW=$(cat review.txt)
          gh pr comment ${{ github.event.number }} --body "## AI Code Review

          $REVIEW

          ---
          *Auto-generated by Claude. Human review still required for business logic and architecture.*"

Setup note: Add ANTHROPIC_API_KEY under repository Settings → Secrets and variables → Actions. Using claude-haiku-4-5-20251001 instead of a Sonnet model keeps cost at roughly $0.002-0.01 per review, which is negligible for most teams. The script truncates the diff at 30,000 characters, so anything beyond that point is never reviewed. If PR diffs regularly exceed that size, state a policy in .cursorrules: "Each PR focuses on one feature, diff should not exceed 400 lines."
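The `[:30000]` slice in the workflow can cut the diff in the middle of a hunk. One possible alternative, sketched here on the assumption that the input is standard `git diff` output where each file section starts with a `diff --git` line, is to keep only whole file sections that fit the budget and record which files were left out. The names `split_by_file` and `cap_diff` are hypothetical helpers, not part of the original script.

```python
def split_by_file(diff: str) -> list[str]:
    """Split git diff text into one chunk per file section."""
    chunks, current = [], []
    for line in diff.splitlines(keepends=True):
        # A new "diff --git" header closes the previous file section.
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def cap_diff(diff: str, limit: int = 30000) -> tuple[str, list[str]]:
    """Keep every whole file section that still fits within `limit` characters.

    Returns the capped diff and the header lines of the sections left out.
    """
    kept, skipped, used = [], [], 0
    for chunk in split_by_file(diff):
        if used + len(chunk) <= limit:
            kept.append(chunk)
            used += len(chunk)
        else:
            skipped.append(chunk.splitlines()[0])
    return "".join(kept), skipped
```

The skipped headers could be appended to the prompt or the PR comment so that reviewers know which files the AI never saw.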

5 Specialized Review Prompt Templates

Select the template that matches the PR type. Specialized prompts find 3x more real issues than generic ones.
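Template selection can be partly mechanical. The sketch below maps changed file paths to template names. The path patterns and the fallback to the error handling template are assumptions for illustration and would need to match the repository's real layout.

```python
from fnmatch import fnmatch

# Hypothetical path patterns; adjust to the repository's actual layout.
TEMPLATE_RULES = [
    ("security", ["*auth*", "*payment*", "*login*"]),
    ("db_performance", ["*models/*", "*repositories/*", "*.sql"]),
    ("api_design", ["*routes/*", "*controllers/*"]),
    ("test_coverage", ["*test*", "*spec*"]),
]


def pick_templates(changed_files: list[str]) -> list[str]:
    """Return the names of templates whose patterns match any changed file."""
    selected = []
    for name, patterns in TEMPLATE_RULES:
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            selected.append(name)
    # Assumption: error handling applies to most code changes, so it is the fallback.
    return selected or ["error_handling"]
```

For example, `pick_templates(["src/auth/session.ts"])` selects only the security template, while a PR touching nothing that matches falls back to the error handling template.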

Template 1: Security Review (required for auth, payment, user data PRs)

Security Review Prompt

As an application security expert, review the following code for security risks:

1. SQL injection: Is user input directly concatenated into SQL strings? Does the ORM use parameterized queries?
2. XSS: Is user-provided content inserted into the DOM without escaping? Is innerHTML / dangerouslySetInnerHTML safe?
3. Hardcoded secrets: Are there API keys, passwords, tokens, or private keys in the code?
4. Insecure random: Is Math.random() used to generate security tokens? Should use crypto.randomBytes()
5. Path traversal: Are file operations restricted against user-controlled paths?
6. Authorization: Does every API endpoint verify the caller's identity and permissions? Can user A access user B's data?

@[relevant files]

For each issue: file and line number, attack vector, fix with corrected code
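To make items 1 and 4 concrete, here is a minimal Python analogue (the template itself names JavaScript APIs). It uses an in-memory SQLite database. The table, function names, and payload are hypothetical and exist only to show the kind of finding and fix a reviewer should expect.

```python
import secrets
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL string.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def new_session_token() -> str:
    # `secrets` draws from the OS CSPRNG; the `random` module is not suitable for tokens.
    return secrets.token_hex(32)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Classic injection payload: turns the WHERE clause into a condition that is always true.
payload = "x' OR '1'='1"
```

With this payload the unsafe version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.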

Template 2: Database Performance Review (required for DB-intensive PRs)

DB Performance Review Prompt

As a database performance expert, review the database operations in this code:

1. N+1 queries: Are there database queries inside loops? (e.g., findById inside a for loop)
2. Missing indexes: Do the fields in WHERE conditions have indexes? Do ORDER BY fields have indexes?
3. SELECT *: Is SELECT * used when only specific fields are needed?
4. Unbounded result sets: Are there queries that could return tens of thousands of rows without LIMIT?
5. Oversized transactions: Are operations that don't need to be atomic all packed into one transaction?
6. Parallelizable serial queries: Are there independent queries executed sequentially that could use Promise.all?

@[files containing DB operations]

For each issue, estimate the performance impact at 1M rows and provide an optimized version
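Item 1 is the pattern this template most often flags. A small sketch of it with an in-memory SQLite database follows; the schema and function names are hypothetical. Both functions return the same data, but the first issues one query per post and the second issues a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO posts VALUES (1, 1, 'A1'), (2, 2, 'B1'), (3, 1, 'A2');
""")


def titles_with_authors_n_plus_one(conn):
    # N+1: one query for the posts, then one more query for each post.
    rows = conn.execute("SELECT author_id, title FROM posts ORDER BY id").fetchall()
    result = []
    for author_id, title in rows:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        result.append((title, name))
    return result


def titles_with_authors_joined(conn):
    # One round trip: the JOIN lets the database do the lookup.
    return conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
```

With three posts the difference is four queries versus one; the gap grows linearly with the number of rows in the outer result.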

Template 3: Error Handling Review

Error Handling Review Prompt

Review the error handling completeness in this code:

1. Swallowed exceptions: catch blocks with only console.log and no handling or re-throw?
2. Missing timeouts: external API calls or DB queries without timeout configuration?
3. Null dereference: accessing properties on values that could be null/undefined without checks?
4. User-visible error messages: when operations fail, do users get meaningful error responses? (not just 500)
5. Partial failure handling: in batch operations (e.g., sending 100 emails), does one failure cause everything to fail?

@[relevant files]

For each issue: file and line number, plus a corrected code snippet
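Items 1 and 5 often appear together in batch code. The following is a minimal sketch, assuming a caller-supplied `send` function that raises on failure (both `send_all` and `send` are hypothetical names). Failures are logged and collected rather than swallowed, and one bad item does not abort the batch.

```python
import logging

logger = logging.getLogger("batch")


def send_all(recipients, send):
    """Send to every recipient; collect failures instead of aborting the batch."""
    sent, failed = [], []
    for address in recipients:
        try:
            send(address)
            sent.append(address)
        except Exception as exc:  # broad on purpose: one bad item must not stop the rest
            # Record the failure with context rather than silently dropping it.
            logger.warning("send failed for %s: %s", address, exc)
            failed.append((address, str(exc)))
    return sent, failed
```

The caller receives both lists and can decide whether to retry the failures, report them to the user, or fail the whole operation.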

Template 4: Test Coverage Review

Test Coverage Review Prompt

Review the test completeness of this PR:

1. Does the new business logic have corresponding unit tests?
2. Is the happy path covered?
3. Are edge cases tested? (empty input, oversized input, negative numbers, empty arrays)
4. Are error paths tested? (third-party API failure, DB error, insufficient permissions)
5. Do the tests actually verify behavior? Or do they just mock all dependencies and assert the mocks were called?
   (The latter is an "illusion of testing" — it provides no real protection)

@[diff or test files]

For each important code path lacking tests, provide example test case code
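The "illusion of testing" in item 5 is easier to recognize side by side with a real test. In this sketch, `apply_discount` and `checkout` are hypothetical functions invented for the example. The first test only proves a mock was called, while the second checks actual behavior, including an edge case and an error path.

```python
from unittest.mock import Mock


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def checkout(cart_total, percent, discounter=apply_discount):
    return discounter(cart_total, percent)


def test_illusion():
    # Only proves the mock was called; passes even if the discount math is wrong.
    fake = Mock(return_value=0)
    checkout(200.0, 25, discounter=fake)
    fake.assert_called_once_with(200.0, 25)


def test_behavior():
    # Checks real outputs, including an edge case and an error path.
    assert checkout(200.0, 25) == 150.0
    assert checkout(200.0, 0) == 200.0
    try:
        checkout(200.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Both tests pass today, but only `test_behavior` would fail if someone broke the arithmetic inside `apply_discount`.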

Template 5: API Design Review (for new or modified public API endpoints)

API Design Review Prompt

Review this change from an API design perspective:

1. RESTful conventions: Are HTTP verbs used correctly? (GET for reads, POST/PUT/PATCH/DELETE for mutations)
2. Status codes: Do error scenarios return the correct HTTP status codes?
   (404 vs 403 vs 400 vs 422 vs 500 — know the difference)
3. Idempotency: Are PUT and DELETE endpoints idempotent, as HTTP semantics require? Is repeating a PATCH or POST call safe?
4. Backward compatibility: If modifying an existing endpoint, does it break existing callers?
5. Error response format: Is there a consistent error response shape? { error: string, code: string }?
6. Pagination: Do list endpoints have pagination? Do they return total and hasMore?

@[relevant route files]
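Items 5 and 6 describe response shapes, which a short framework-free sketch can pin down. The `paginate` helper, the `data` key, and the `INVALID_PAGINATION` code below are assumptions for illustration; only `total`, `hasMore`, and the `{ error, code }` shape come from the template.

```python
def paginate(items: list, page: int, page_size: int) -> dict:
    """Build a list response carrying `total` and `hasMore`."""
    if page < 1 or page_size < 1:
        # Same error shape as item 5: { error, code }
        return {
            "status": 400,
            "body": {"error": "page and page_size must be >= 1", "code": "INVALID_PAGINATION"},
        }
    start = (page - 1) * page_size
    window = items[start:start + page_size]
    return {
        "status": 200,
        "body": {
            "data": window,
            "total": len(items),
            # More pages exist if the current window does not reach the end.
            "hasMore": start + len(window) < len(items),
        },
    }
```

A reviewer applying the template would look for the same two properties in real route handlers: every list endpoint bounded by a page size, and every failure returned in one consistent shape.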

What Humans Must Still Cover

In the following scenarios, AI review conclusions cannot be trusted — human review is mandatory:

Business logic correctness. Code can be technically flawless but functionally wrong — only someone who knows the business can judge this.
Architecture consistency. AI sees the diff, not the whole system. A locally reasonable design decision may contradict the system's broader architectural principles.
Team tacit knowledge. "Why we don't use Redis for this," "why this module was designed that way" — these historical decisions live only in people's memories.
Product trade-offs. "Is this feature worth this complexity?" That's a product-engineering judgment call, not a question of technical correctness.

Division of labor: AI owns mechanical checks (security, performance, standards, common bugs). Humans own business correctness and architectural soundness. This keeps human review time focused on the decisions that actually require human judgment.

Chapter Key Takeaways

Takeaway	Core Principle
1. Clear positioning	AI Review is a first-pass filter, not a final verdict. Business logic and architecture require human review
2. Zero-config local option	`git diff HEAD | claude -p "..."`: no setup needed, results in 2 minutes before committing
3. Automated pipeline	GitHub Actions + Claude API triggers on every PR; use Haiku model to keep cost under $0.01 per review
4. Specialized templates win	5 specialized templates (security / DB performance / error handling / test coverage / API design) find 3x more real issues than generic prompts
5. Control diff size	The workflow truncates diffs at 30,000 characters, and large diffs degrade review quality. Keep PRs under roughly 400 lines and enforce "one PR, one feature" as a team rule
