Chapter 70

Case Study: Code Review and Auto-Fix Agent

Introduction

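Code review is one of the most time- and attention-consuming activities in software engineering. By some estimates, a mid-sized team spends 15-20% of its engineering time each week on code review, and a considerable share of review comments concern repetitive, convention-level problems (non-standard naming, unhandled exceptions, missing unit tests, SQL injection risks, and so on) that tools should be able to find and fix automatically. This chapter builds a complete Hermes code review and auto-fix agent and wires it into a GitHub Actions CI/CD pipeline, so that every Pull Request receives a thorough AI audit before human review. The goal is to cut review cost substantially and improve the overall health of the codebase.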


70.1 Requirements Analysis: Pain Points of Automated Code Review in CI/CD

Typical problems in code review today

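Traditional PR review flow:
Developer opens a PR
    ↓
Wait until a reviewer is free (0.5-2 days)
    ↓
Reviewer reads line by line (30-90 minutes per PR)
    ↓
Reviewer leaves comments (about half are repetitive convention issues)
    ↓
Developer revises → waits again → repeat...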

Core pain points:

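| Pain point | Impact | Severity |
|---|---|---|
| Long wait for review | Blocks feature delivery | |
| Repetitive convention issues consume reviewer attention | Lowers review quality | |
| Security vulnerabilities may be missed | Risk of production incidents | Very high |
| No shared quality baseline | Uneven code quality across the codebase | |
| Reviewers' judgments vary widely | Hard to enforce one team standard | |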

Target capabilities of the agent

We want the agent to offer:

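  1. Multi-language support: Python, JavaScript/TypeScript, Go
  2. Multi-dimensional review: security, performance, readability, conventions
  3. Automatic fixes: generate a patch directly for clearly fixable issues
  4. PR integration: output appears as GitHub PR Review comments
  5. Configurable rules: teams can define their own review standards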

70.2 System Architecture

Overall architecture diagram

┌─────────────────────────────────────────────────────────┐
│                    GitHub Repository                    │
│                                                         │
│  Developer push → Pull Request opened / updated         │
└────────────────────────────┬────────────────────────────┘
                             │ webhook / GitHub Actions trigger
                             ▼
┌─────────────────────────────────────────────────────────┐
│                GitHub Actions CI Runner                 │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │           Code Review Agent Entrypoint            │  │
│  │                                                   │  │
│  │  1. Fetch PR diff (GitHub API)                    │  │
│  │  2. Parse the list of changed files               │  │
│  │  3. Run Hermes Agent analysis                     │  │
│  │  4. Format results, post Review                   │  │
│  └───────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    Hermes Agent Core                    │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │ Static      │  │ Semantic    │  │ Fix generation  │  │
│  │ analysis    │  │ review      │  │ tools           │  │
│  │ tools       │  │ (LLM core)  │  │                 │  │
│  │             │  │             │  │                 │  │
│  │ - AST parse │  │ - Security  │  │ - Diff gen      │  │
│  │ - Linter    │  │ - Logic     │  │ - Patch apply   │  │
│  │ - Dep check │  │ - Practices │  │ - Suggestions   │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                 GitHub PR Review Output                 │
│                                                         │
│  • Line-level comments (issue location)                 │
│  • Fix suggestions (code blocks)                        │
│  • Verdict (APPROVE / REQUEST_CHANGES / COMMENT)        │
│  • Auto-committed fix (optional)                        │
└─────────────────────────────────────────────────────────┘

Tool set design

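| Tool | Purpose | Input | Output |
|---|---|---|---|
| get_pr_diff | Fetch the PR's changes | PR number | diff text |
| parse_code_ast | AST parsing | code + language | AST |
| run_static_analysis | Static analysis | file contents + file name + language | list of issues |
| check_security_patterns | Security pattern check | code snippet + language | vulnerability report |
| generate_fix_patch | Generate a fix patch | issue description + code | unified diff |
| post_pr_review | Post the PR review | review data | GitHub API response |
| search_similar_issues | Find similar issues | issue description | past cases |

parse_code_ast and search_similar_issues belong to the design only: the implementation in 70.3 registers the other five tools, and its generate_fix_patch returns the complete fixed code rather than a unified diff.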

70.3 Complete Implementation

Project structure

code-review-agent/
├── agent/
│   ├── __init__.py
│   ├── hermes_agent.py      # Hermes Agent core
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── github_tools.py  # GitHub API tools
│   │   ├── analysis_tools.py # code analysis tools
│   │   ├── fix_tools.py     # fix generation tools
│   │   └── language_detector.py # language detection (see 70.4)
│   └── prompts/
│       ├── system_prompt.py
│       └── review_templates.py
├── config/
│   └── review_rules.yaml    # custom review rules
├── .github/
│   └── workflows/
│       └── code_review.yml  # GitHub Actions workflow
└── main.py                  # entry point

Core agent implementation

# agent/hermes_agent.py
import os
import json
from openai import OpenAI
from .tools import github_tools, analysis_tools, fix_tools

# Hermes is served through an OpenAI-compatible API (the defaults point at a local Ollama server)
client = OpenAI(
    base_url=os.getenv("HERMES_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("HERMES_API_KEY", "ollama"),
)

MODEL = os.getenv("HERMES_MODEL", "nous-hermes-2-mixtral-8x7b-dpo")

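SYSTEM_PROMPT = """You are a senior software engineer specializing in code quality and security review.
Your task is to review the code changes in a Pull Request thoroughly, identify potential problems, and propose fixes.

Review dimensions:
1. Security: SQL injection, XSS, insecure deserialization, hard-coded secrets, etc.
2. Performance: N+1 queries, unnecessary loops, memory leaks, blocking I/O, etc.
3. Maintainability: naming conventions, function complexity, duplicated code, missing comments, etc.
4. Robustness: exception handling, boundary conditions, null/None handling, etc.
5. Test coverage: whether critical paths are tested

For each issue, provide:
- Location (file name + line number)
- Severity (critical/major/minor/suggestion)
- A clear description of the problem
- Concrete fix code

Work through the analysis methodically with the tools, and do not skip important files."""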

# Tool definitions (OpenAI function-calling format)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_pr_diff",
            "description": "Fetch the code-change diff of a Pull Request",
            "parameters": {
                "type": "object",
                "properties": {
                    "pr_number": {"type": "integer", "description": "PR number"},
                    "file_filter": {
                        "type": "string",
                        "description": "File filter, e.g. '*.py' or '*.js'",
                        "default": "*"
                    }
                },
                "required": ["pr_number"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_static_analysis",
            "description": "Run static code analysis on the given file",
            "parameters": {
                "type": "object",
                "properties": {
                    "file_content": {"type": "string", "description": "File contents"},
                    "language": {
                        "type": "string",
                        "enum": ["python", "javascript", "typescript", "go"],
                        "description": "Programming language"
                    },
                    "filename": {"type": "string", "description": "File name"}
                },
                "required": ["file_content", "language", "filename"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_security_patterns",
            "description": "Check code for known security-vulnerability patterns",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Code to check"},
                    "language": {"type": "string", "description": "Programming language"}
                },
                "required": ["code", "language"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "generate_fix_patch",
            "description": "Generate fixed code for a detected issue",
            "parameters": {
                "type": "object",
                "properties": {
                    "original_code": {"type": "string", "description": "Original code"},
                    "issue_description": {"type": "string", "description": "Issue description"},
                    "language": {"type": "string", "description": "Programming language"}
                },
                "required": ["original_code", "issue_description", "language"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "post_pr_review",
            "description": "Publish the review results as a GitHub PR Review",
            "parameters": {
                "type": "object",
                "properties": {
                    "pr_number": {"type": "integer"},
                    "review_body": {"type": "string", "description": "Overall review summary"},
                    "comments": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "path": {"type": "string"},
                                "line": {"type": "integer"},
                                "body": {"type": "string"},
                                "severity": {
                                    "type": "string",
                                    "enum": ["critical", "major", "minor", "suggestion"]
                                },
                                # Read by _format_comment in github_tools.py
                                "fix_code": {"type": "string"}
                            },
                            "required": ["path", "line", "body"]
                        }
                    },
                    "action": {
                        "type": "string",
                        "enum": ["APPROVE", "REQUEST_CHANGES", "COMMENT"]
                    }
                },
                "required": ["pr_number", "review_body", "comments", "action"]
            }
        }
    }
]


def dispatch_tool(tool_name: str, tool_args: dict) -> str:
    """Route a tool call to its implementation and return the result as a JSON string"""
    tool_map = {
        "get_pr_diff": github_tools.get_pr_diff,
        "run_static_analysis": analysis_tools.run_static_analysis,
        "check_security_patterns": analysis_tools.check_security_patterns,
        "generate_fix_patch": fix_tools.generate_fix_patch,
        "post_pr_review": github_tools.post_pr_review,
    }
    if tool_name not in tool_map:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    try:
        result = tool_map[tool_name](**tool_args)
        return json.dumps(result, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)})


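def run_code_review_agent(pr_number: int, repo: str) -> dict:
    """Run the code review agent"""
    print(f"[Agent] Starting review of PR #{pr_number} in {repo}")
    
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"""Please perform a thorough code review of PR #{pr_number} in the GitHub repository {repo}.

Steps:
1. Fetch the PR's code changes
2. Run static analysis and security checks on each changed file
3. Summarize the issues found, grouped by severity
4. Generate fix suggestions for the fixable issues
5. Publish the PR Review comments

Please start the review."""
        }
    ]
    
    # Agentic loop: keep calling the model until it stops requesting tools
    max_iterations = 20
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        print(f"[Agent] Reasoning round {iteration}...")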
        
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
            temperature=0.1,  # low temperature keeps review output consistent
        )
        
        message = response.choices[0].message
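        # Store a plain dict (None fields dropped) so the history re-serializes cleanly
        messages.append(message.model_dump(exclude_none=True))
        
        # No tool calls: the model considers the task finished
        if not message.tool_calls:
            print("[Agent] Review finished")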
            return {
                "status": "completed",
                "summary": message.content,
                "iterations": iteration
            }
        
        # Execute each requested tool call and feed the result back to the model
        for tool_call in message.tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            print(f"[Agent] Calling tool: {tool_name}({tool_args})")
            
            result = dispatch_tool(tool_name, tool_args)
            
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    
    return {"status": "max_iterations_reached", "iterations": iteration}
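As written, the loop reports "completed" as soon as the model stops calling tools, even if it never called post_pr_review, in which case the PR receives no review at all. A cheap guard is to scan the conversation for that call before returning. The sketch below assumes the history holds plain dicts in the OpenAI chat format (for example as produced by message.model_dump()); tool_was_called is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict, List


def tool_was_called(messages: List[Dict[str, Any]], tool_name: str) -> bool:
    """Return True if any assistant message in the history requested `tool_name`.

    Assumes OpenAI-style dict messages, e.g. {"role": "assistant", "tool_calls": [...]}.
    """
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        for call in msg.get("tool_calls") or []:
            if call.get("function", {}).get("name") == tool_name:
                return True
    return False


# Example: the model fetched the diff but never posted a review
history = [
    {"role": "system", "content": "..."},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_pr_diff", "arguments": "{\"pr_number\": 7}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{}"},
    {"role": "assistant", "content": "Review done."},
]
print(tool_was_called(history, "post_pr_review"))  # False
```

When it returns False, run_code_review_agent could append one more user message asking the model to publish its findings, or fall back to posting message.content as a plain COMMENT review.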

GitHub API tools

# agent/tools/github_tools.py
import os
import requests

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
REPO = os.getenv("GITHUB_REPOSITORY")  # in "owner/repo" format

def _headers():
    return {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3+json",
        "X-GitHub-Api-Version": "2022-11-28"
    }

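def get_pr_diff(pr_number: int, file_filter: str = "*") -> dict:
    """Fetch the changed files of a PR, following pagination"""
    url = f"https://api.github.com/repos/{REPO}/pulls/{pr_number}/files"
    files, page = [], 1
    while True:
        # The endpoint returns at most 100 files per page (30 by default)
        resp = requests.get(
            url, headers=_headers(),
            params={"per_page": 100, "page": page}, timeout=30
        )
        resp.raise_for_status()
        batch = resp.json()
        files.extend(batch)
        if len(batch) < 100:
            break
        page += 1
    
    result = {"files": [], "total_changes": 0}
    
    for f in files:
        filename = f["filename"]
        # Filter by extension, e.g. "*.py" -> ".py"
        if file_filter != "*":
            ext = file_filter.lstrip("*")
            if not filename.endswith(ext):
                continue
        
        # Skip deleted files: there is nothing left to review
        if f["status"] == "removed":
            continue
            
        result["files"].append({
            "filename": filename,
            "status": f["status"],
            "additions": f["additions"],
            "deletions": f["deletions"],
            # GitHub omits "patch" for binary files and very large diffs
            "patch": f.get("patch", ""),
            "raw_url": f.get("raw_url", "")
        })
        result["total_changes"] += f["additions"] + f["deletions"]
    
    return result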


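def post_pr_review(
    pr_number: int,
    review_body: str,
    comments: list,
    action: str = "COMMENT"
) -> dict:
    """Publish a PR review with line-level comments"""
    # Look up the SHA of the PR's latest commit
    pr_url = f"https://api.github.com/repos/{REPO}/pulls/{pr_number}"
    pr_resp = requests.get(pr_url, headers=_headers(), timeout=30)
    pr_resp.raise_for_status()
    commit_id = pr_resp.json()["head"]["sha"]
    
    # Build the review request. Each comment's line must lie inside a diff hunk of
    # that file; otherwise GitHub rejects the request with 422 Unprocessable Entity.
    review_payload = {
        "commit_id": commit_id,
        "body": review_body,
        # APPROVE with GITHUB_TOKEN only works if the repository allows
        # GitHub Actions to create and approve pull requests
        "event": action,
        "comments": [
            {
                "path": c["path"],
                "line": c["line"],
                "side": "RIGHT",  # comment on the new version of the file
                "body": _format_comment(c)
            }
            for c in comments
            if c.get("line")  # keep only comments that carry a line number
        ]
    }
    
    url = f"https://api.github.com/repos/{REPO}/pulls/{pr_number}/reviews"
    resp = requests.post(url, json=review_payload, headers=_headers(), timeout=30)
    resp.raise_for_status()
    return {"success": True, "review_id": resp.json()["id"]}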


def _format_comment(comment: dict) -> str:
    """Format one comment with a severity badge and an optional fix"""
    severity_emoji = {
        "critical": "🔴",
        "major": "🟠",
        "minor": "🟡",
        "suggestion": "💡"
    }
    emoji = severity_emoji.get(comment.get("severity", "minor"), "💬")
    body = f"{emoji} **[{comment.get('severity', 'comment').upper()}]** {comment['body']}"
    
    if comment.get("fix_code"):
        body += f"\n\n**Suggested fix:**\n```\n{comment['fix_code']}\n```"
    
    return body
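GitHub also renders a fenced block tagged `suggestion` inside a review comment as a suggested change that the PR author can apply with one click, which suits single-line fixes better than a plain code block. A minimal sketch of an alternative formatter, assuming fix_code holds the complete replacement text for the commented line (a multi-line suggestion would also need start_line in the review payload); format_suggestion is a hypothetical helper:

```python
FENCE = "`" * 3  # a Markdown code fence, built at runtime


def format_suggestion(body: str, fix_code: str, severity: str = "minor") -> str:
    """Render a review comment whose fix can be applied from the GitHub UI."""
    header = f"**[{severity.upper()}]** {body}"
    # The suggestion block replaces the commented line(s) with its contents verbatim
    suggestion = f"{FENCE}suggestion\n{fix_code.rstrip()}\n{FENCE}"
    return f"{header}\n\n{suggestion}"


comment = format_suggestion(
    "Use a parameterized query instead of an f-string.",
    'cursor.execute("SELECT * FROM users WHERE name = %s", (username,))',
    severity="critical",
)
print(comment)
```

Because the suggestion text replaces the line exactly, it must keep the original indentation; the plain code block used by _format_comment remains the safer choice for larger rewrites.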

Static analysis tools

# agent/tools/analysis_tools.py
import re
import subprocess
import tempfile
import os
from typing import List, Dict

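# Security vulnerability pattern library (regexes matched line by line)
SECURITY_PATTERNS = {
    "python": [
        {
            # \b stops ast.literal_eval (safe) from matching
            "pattern": r"\beval\s*\(",
            "name": "Dangerous eval() call",
            "severity": "critical",
            "description": "eval() can execute arbitrary code: code-injection risk"
        },
        {
            "pattern": r"\bexec\s*\(",
            "name": "Dangerous exec() call",
            "severity": "critical",
            "description": "exec() can execute arbitrary code: code-injection risk"
        },
        {
            # f-string (either quote style) interpolating into a SELECT statement
            "pattern": r'\bf["\'].*\bSELECT\b.*\{',
            "name": "SQL injection risk",
            "severity": "critical",
            "description": "SQL assembled with an f-string is injectable; use parameterized queries"
        },
        {
            "pattern": r"pickle\.loads?\(",
            "name": "Insecure deserialization",
            "severity": "major",
            "description": "Unpickling untrusted data can lead to remote code execution (RCE)"
        },
        {
            "pattern": r'(password|secret|api_key|token)\s*=\s*["\'][^"\']+["\']',
            "name": "Hard-coded credential",
            "severity": "critical",
            "description": "Secrets must not be hard-coded in source code"
        },
        {
            "pattern": r"shell=True",
            "name": "Shell injection risk",
            "severity": "major",
            "description": "subprocess with shell=True is injectable when arguments include user input"
        },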
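    ],
    "javascript": [
        {
            "pattern": r"\beval\s*\(",
            "name": "Dangerous eval() call",
            "severity": "critical",
            "description": "eval() carries XSS and code-injection risk"
        },
        {
            # Also catches +=, but not == / === comparisons
            "pattern": r"innerHTML\s*\+?=(?!=)",
            "name": "XSS risk",
            "severity": "major",
            "description": "Setting innerHTML directly can cause XSS; use textContent or sanitize with DOMPurify"
        },
        {
            "pattern": r"document\.write\(",
            "name": "Unsafe document.write",
            "severity": "major",
            "description": "document.write can enable XSS attacks"
        },
    ],
    "go": [
        {
            "pattern": r'fmt\.Sprintf.*".*SELECT',
            "name": "SQL injection risk",
            "severity": "critical",
            "description": "SQL built with a format string is injectable; use parameterized queries"
        },
        {
            # External commands go through os/exec; the os package has no Exec function
            "pattern": r"exec\.Command(Context)?\(",
            "name": "Command injection risk",
            "severity": "major",
            "description": "Validate input before passing it to an external command"
        },
    ]
}

# TypeScript reuses the JavaScript patterns; without this alias, .ts files get no security checks
SECURITY_PATTERNS["typescript"] = SECURITY_PATTERNS["javascript"]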


def check_security_patterns(code: str, language: str) -> dict:
    """Scan the code line by line against the language's security patterns"""
    patterns = SECURITY_PATTERNS.get(language, [])
    issues = []
    
    lines = code.split("\n")
    for i, line in enumerate(lines, 1):
        for pattern_def in patterns:
            if re.search(pattern_def["pattern"], line, re.IGNORECASE):
                issues.append({
                    "line": i,
                    "line_content": line.strip(),
                    "issue_name": pattern_def["name"],
                    "severity": pattern_def["severity"],
                    "description": pattern_def["description"]
                })
    
    return {
        "language": language,
        "total_issues": len(issues),
        "issues": issues
    }


def run_static_analysis(
    file_content: str, language: str, filename: str
) -> dict:
    """Run static analysis (linter plus security patterns)"""
    results = {"filename": filename, "language": language, "issues": []}
    
    if language == "python":
        results["issues"].extend(_run_python_analysis(file_content, filename))
    elif language in ("javascript", "typescript"):
        results["issues"].extend(_run_js_analysis(file_content, filename))
    elif language == "go":
        results["issues"].extend(_run_go_analysis(file_content, filename))
    
    # Merge in the security-pattern findings
    sec_result = check_security_patterns(file_content, language)
    for issue in sec_result["issues"]:
        results["issues"].append({
            "line": issue["line"],
            "severity": issue["severity"],
            "message": f"[security] {issue['issue_name']}: {issue['description']}",
            "rule": "security"
        })
    
    return results


def _run_python_analysis(content: str, filename: str) -> List[Dict]:
    """Python static analysis via flake8"""
    issues = []
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False, encoding="utf-8"
    ) as f:
        f.write(content)
        tmp_path = f.name
    
    try:
        # Run flake8 with a parseable row:col:code:text output format
        result = subprocess.run(
            ["flake8", "--max-line-length=100", "--format=%(row)d:%(col)d:%(code)s:%(text)s", tmp_path],
            capture_output=True, text=True, timeout=30
        )
        for line in result.stdout.strip().split("\n"):
            if not line:
                continue
            parts = line.split(":", 3)
            if len(parts) >= 4:
                issues.append({
                    "line": int(parts[0]),
                    "col": int(parts[1]),
                    "rule": parts[2],
                    "message": parts[3].strip(),
                    "severity": "major" if parts[2].startswith("E") else "minor"
                })
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass  # flake8 missing or timed out: skip linter findings rather than fail the review
    finally:
        os.unlink(tmp_path)
    
    return issues


def _run_js_analysis(content: str, filename: str) -> List[Dict]:
    """JavaScript/TypeScript static analysis"""
    # Simplified: basic pattern checks only (ESLint's no-console / no-var rules do this properly)
    issues = []
    lines = content.split("\n")
    for i, line in enumerate(lines, 1):
        # console.log and friends should not ship in production code
        if re.search(r"console\.(log|debug|info)\(", line):
            issues.append({
                "line": i,
                "severity": "minor",
                "message": "console.log found; remove debug output from production code",
                "rule": "no-console"
            })
        # var declarations (prefer let/const)
        if re.search(r"^\s*var\s+", line):
            issues.append({
                "line": i,
                "severity": "minor",
                "message": "var declaration; use let or const instead",
                "rule": "no-var"
            })
    return issues


def _run_go_analysis(content: str, filename: str) -> List[Dict]:
    """Go static analysis (a heuristic; errcheck or staticcheck are more reliable)"""
    issues = []
    # Matches "v, _ := f(...)", "v, _ = pkg.F(...)" and "_ = f(...)": a call result,
    # usually an error, discarded into the blank identifier
    ignored_result = re.compile(r"(,\s*_\s*:?=|^\s*_\s*=)\s*[\w.]+\(")
    lines = content.split("\n")
    for i, line in enumerate(lines, 1):
        if ignored_result.search(line):
            issues.append({
                "line": i,
                "severity": "major",
                "message": "Call result (likely an error) discarded with _; handle the error",
                "rule": "errcheck"
            })
    return issues
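The tool table lists a parse_code_ast tool that the code above does not implement. For Python, the standard-library ast module offers a more precise alternative to line regexes: it only sees real calls, so an eval mentioned in a comment or string is never flagged. A minimal sketch, assuming only calls to the bare names eval/exec and calls with a literal shell=True keyword are of interest; find_risky_calls is a hypothetical name:

```python
import ast
from typing import Dict, List


def find_risky_calls(source: str) -> List[Dict]:
    """Flag eval()/exec() calls and shell=True keyword arguments using the AST."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Bare-name calls only: ast.literal_eval(...) is an Attribute, not a Name
        if isinstance(node.func, ast.Name) and node.func.id in ("eval", "exec"):
            issues.append({"line": node.lineno, "rule": f"no-{node.func.id}"})
        for kw in node.keywords:
            if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                issues.append({"line": node.lineno, "rule": "shell-true"})
    return issues


sample = '''
import ast, subprocess
# eval(user_input) in a comment is ignored
x = ast.literal_eval("[1, 2]")
y = eval(user_input)
subprocess.run(cmd, shell=True)
'''
print(find_risky_calls(sample))
```

The trade-off is that ast.parse needs syntactically complete source, so this works on whole files (or the raw_url content) rather than on diff fragments; the regex scanner remains the fallback for code that does not parse.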

Fix generation tool

# agent/tools/fix_tools.py
import os
import re
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("HERMES_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("HERMES_API_KEY", "ollama"),
)

FENCE = "`" * 3  # Markdown code fence used to delimit code in the prompt


def generate_fix_patch(
    original_code: str,
    issue_description: str,
    language: str
) -> dict:
    """Ask the LLM for a fixed version of the code"""
    prompt = f"""You are an expert {language} developer.

The following code has a problem:
{issue_description}

Original code:
{FENCE}{language}
{original_code}
{FENCE}

Return the complete fixed code only, without any explanation."""

    response = client.chat.completions.create(
        model=os.getenv("HERMES_MODEL", "nous-hermes-2-mixtral-8x7b-dpo"),
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=1000
    )

    fixed_code = response.choices[0].message.content or ""
    # If the model wrapped its answer in a fenced code block, keep only the body
    code_match = re.search(r"```\w*\n(.*?)```", fixed_code, re.DOTALL)
    if code_match:
        fixed_code = code_match.group(1)

    return {
        "original": original_code,
        "fixed": fixed_code,
        "issue": issue_description
    }
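The tool table promises a unified diff, while generate_fix_patch returns the complete fixed code. The standard-library difflib can turn the original/fixed pair into a unified diff, which is easier for a reviewer to scan than two full listings. A minimal sketch; the helper name to_unified_diff and the a/ and b/ path prefixes are illustrative choices:

```python
import difflib


def to_unified_diff(original: str, fixed: str, path: str, context: int = 3) -> str:
    """Render the original/fixed pair returned by generate_fix_patch as a unified diff."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,
    )
    return "".join(diff)


before = 'query = f"SELECT * FROM users WHERE name = \'{username}\'"\ncursor.execute(query)\n'
after = 'query = "SELECT * FROM users WHERE name = %s"\ncursor.execute(query, (username,))\n'
print(to_unified_diff(before, after, "api/user.py"))
```

Line numbers in the resulting hunk headers are relative to the snippet passed in, not to the file, so the diff is meant for display in a comment rather than for applying to the repository.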

GitHub Actions integration

# .github/workflows/code_review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]
    # Only review changes to these file types
    paths:
      - '**.py'
      - '**.js'
      - '**.jsx'
      - '**.ts'
      - '**.tsx'
      - '**.go'

# Prevent concurrent reviews of the same PR: a new push cancels the run in progress
concurrency:
  group: code-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  ai-code-review:
    runs-on: ubuntu-latest
    # Only run for PRs from this repository, not forks (avoids exposing secrets)
    if: github.event.pull_request.head.repo.full_name == github.repository
    
    permissions:
      contents: read
      pull-requests: write  # write access is required to post the review
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install openai requests flake8

      - name: Run AI Code Review Agent
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          HERMES_BASE_URL: ${{ secrets.HERMES_BASE_URL }}
          HERMES_API_KEY: ${{ secrets.HERMES_API_KEY }}
          HERMES_MODEL: ${{ vars.HERMES_MODEL || 'nous-hermes-2-mixtral-8x7b-dpo' }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          python main.py
        timeout-minutes: 10

Main entry point

# main.py
import os
import sys
from agent.hermes_agent import run_code_review_agent

def main():
    pr_number = int(os.getenv("PR_NUMBER", "0"))
    repo = os.getenv("GITHUB_REPOSITORY", "")
    
    if not pr_number or not repo:
        print("Error: missing required environment variable PR_NUMBER or GITHUB_REPOSITORY")
        sys.exit(1)
    
    result = run_code_review_agent(pr_number, repo)
    print(f"Review finished: {result}")
    
    if result["status"] == "completed":
        sys.exit(0)
    else:
        sys.exit(1)

if __name__ == "__main__":
    main()
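main.py relies on the workflow exporting PR_NUMBER. GitHub Actions also writes the full webhook payload to the file named by the GITHUB_EVENT_PATH environment variable, and for pull_request events that payload carries the PR number, so the entry point could fall back to it when the variable is missing. A minimal sketch; resolve_pr_number is a hypothetical helper:

```python
import json
import os
from typing import Optional


def resolve_pr_number() -> Optional[int]:
    """Prefer the PR_NUMBER variable; otherwise read the Actions event payload."""
    if os.getenv("PR_NUMBER"):
        return int(os.environ["PR_NUMBER"])
    event_path = os.getenv("GITHUB_EVENT_PATH")
    if not event_path or not os.path.exists(event_path):
        return None
    with open(event_path, encoding="utf-8") as f:
        event = json.load(f)
    # pull_request events carry the number at the top level and under "pull_request"
    number = event.get("pull_request", {}).get("number") or event.get("number")
    return int(number) if number else None
```

In main(), `pr_number = resolve_pr_number() or 0` could then replace the `int(os.getenv("PR_NUMBER", "0"))` line, keeping the existing "missing variable" check intact.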

70.4 Multi-Language Support

Language detection and routing

# agent/tools/language_detector.py
import os
from typing import Optional

LANGUAGE_MAP = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".go": "go",
    ".java": "java",
    ".rs": "rust",
    ".rb": "ruby",
    ".php": "php",
}

def detect_language(filename: str) -> Optional[str]:
    """Detect the language from the file extension"""
    _, ext = os.path.splitext(filename.lower())
    return LANGUAGE_MAP.get(ext)

def is_supported_language(filename: str) -> bool:
    """Return True if the agent can review files in this language"""
    lang = detect_language(filename)
    return lang in ("python", "javascript", "typescript", "go")
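The heading promises routing as well as detection, but only detection is shown. One way to route, sketched under the assumption that the input is the files list returned by get_pr_diff, is to group reviewable files by language and report the rest as skipped, so each group can be dispatched to the matching analyzer. group_files_by_language is hypothetical, and its extension table is a trimmed copy of LANGUAGE_MAP limited to the four supported languages:

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple

# Trimmed copy of LANGUAGE_MAP: only the languages the analyzers support
SUPPORTED = {".py": "python", ".js": "javascript", ".jsx": "javascript",
             ".ts": "typescript", ".tsx": "typescript", ".go": "go"}


def group_files_by_language(files: List[dict]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split get_pr_diff()["files"] into {language: [filenames]} plus skipped files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    skipped: List[str] = []
    for f in files:
        _, ext = os.path.splitext(f["filename"].lower())
        lang = SUPPORTED.get(ext)
        if lang:
            groups[lang].append(f["filename"])
        else:
            skipped.append(f["filename"])
    return dict(groups), skipped


groups, skipped = group_files_by_language([
    {"filename": "api/user.py"}, {"filename": "web/App.TSX"},
    {"filename": "README.md"}, {"filename": "cmd/main.go"},
])
print(groups)   # {'python': ['api/user.py'], 'typescript': ['web/App.TSX'], 'go': ['cmd/main.go']}
print(skipped)  # ['README.md']
```

Listing the skipped files in the review summary makes it explicit which changes received no automated review.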

Custom review rules

# config/review_rules.yaml
# Review rules the team can customize

general:
  max_function_lines: 50      # maximum lines per function
  max_file_lines: 500         # maximum lines per file
  require_docstrings: true    # require docstrings
  min_test_coverage: 80       # minimum test coverage (%)

python:
  style_guide: "pep8"
  max_complexity: 10          # cyclomatic complexity limit
  forbidden_imports:          # modules that must not be imported
    - "pickle"
    - "marshal"
  
javascript:
  style_guide: "airbnb"
  allow_var: false
  require_strict_mode: true
  
go:
  require_error_handling: true
  max_goroutine_depth: 3

security:
  block_pr_on_critical: true  # block merging when a critical issue is found
  block_pr_on_major: false    # whether a major issue also blocks merging
  require_human_review_patterns:  # force human review when these patterns appear
    - "authentication"
    - "authorization"
    - "payment"
    - "crypto"
    - "password"
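None of the code in 70.3 reads this file yet. A minimal sketch of how the security section could drive the review verdict, assuming the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and that each issue carries a severity field. decide_review_action is a hypothetical helper, and matching the require_human_review_patterns entries as case-insensitive substrings of the changed file paths is an assumption about how those patterns are meant to be used:

```python
from typing import Dict, List, Tuple


def decide_review_action(issues: List[Dict], changed_paths: List[str],
                         security_cfg: Dict) -> Tuple[str, List[str]]:
    """Map issue severities to a review event and list human-review triggers."""
    severities = {i.get("severity") for i in issues}
    if "critical" in severities and security_cfg.get("block_pr_on_critical", True):
        action = "REQUEST_CHANGES"
    elif "major" in severities and security_cfg.get("block_pr_on_major", False):
        action = "REQUEST_CHANGES"
    else:
        # Stay non-blocking and leave approval to humans
        action = "COMMENT"
    patterns = [p.lower() for p in security_cfg.get("require_human_review_patterns", [])]
    triggers = sorted({p for p in patterns for path in changed_paths if p in path.lower()})
    return action, triggers


# Equivalent of the "security" section above after yaml.safe_load
security = {
    "block_pr_on_critical": True,
    "block_pr_on_major": False,
    "require_human_review_patterns": ["authentication", "payment", "password"],
}
issues = [{"severity": "major"}, {"severity": "minor"}]
print(decide_review_action(issues, ["services/payment/refund.py"], security))
# ('COMMENT', ['payment'])
```

Falling back to COMMENT rather than APPROVE means the bot never approves a PR on its own; a non-empty trigger list can be mentioned in the review body to request a human reviewer.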

70.5 Sample Output: PR Review Comment Format

## 🤖 AI Code Review Report

**Summary:**
- 📁 Files scanned: 5
- ✅ Passed: 2 files
- ⚠️ Need attention: 3 files
- 🔴 Critical issues: 1
- 🟠 Major issues: 3
- 🟡 Minor issues: 7

**Overall rating: ⭐⭐⭐ (3/5) - changes requested**

---

### 🔴 Critical Issues (must fix)

**[`api/user.py` line 42]** SQL injection vulnerability
```python
# ❌ Dangerous: SQL built by string interpolation
query = f"SELECT * FROM users WHERE name = '{username}'"
cursor.execute(query)

# ✅ Fix: use a parameterized query
query = "SELECT * FROM users WHERE name = %s"
cursor.execute(query, (username,))
```

Generated automatically by Hermes Code Review Agent | View configuration
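The counts at the top of such a report can be derived mechanically from the collected issues instead of being written by the model, which keeps the numbers consistent with the line comments. A minimal sketch, assuming each issue dict has path and severity keys; build_summary is a hypothetical helper and the star rating is left out:

```python
from collections import Counter
from typing import Dict, List

SEVERITY_LABELS = [("critical", "🔴 Critical"), ("major", "🟠 Major"), ("minor", "🟡 Minor")]


def build_summary(scanned_files: List[str], issues: List[Dict]) -> str:
    """Render the header of the review report from the raw issue list."""
    counts = Counter(i["severity"] for i in issues)
    flagged = {i["path"] for i in issues}
    lines = [
        "## 🤖 AI Code Review Report",
        "",
        "**Summary:**",
        f"- 📁 Files scanned: {len(scanned_files)}",
        f"- ✅ Passed: {len([f for f in scanned_files if f not in flagged])} files",
        f"- ⚠️ Need attention: {len(flagged)} files",
    ]
    lines += [f"- {label} issues: {counts.get(sev, 0)}" for sev, label in SEVERITY_LABELS]
    return "\n".join(lines)


report = build_summary(
    ["api/user.py", "api/auth.py", "web/app.ts"],
    [{"path": "api/user.py", "severity": "critical"},
     {"path": "api/user.py", "severity": "minor"}],
)
print(report)
```

The model can still write the narrative part of review_body; only the numbers come from code.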


70.6 Pitfalls

Pitfall 1: GitHub API rate limits

Problem: large PRs (100+ files) can hit the GitHub API rate limit within a short time.

Solution: wait as long as GitHub's rate-limit headers require, then retry:

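# Meant to live in agent/tools/github_tools.py, which defines _headers()
import time
import requests

def get_file_content_with_retry(url: str, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        resp = requests.get(url, headers=_headers(), timeout=30)
        if resp.ok:
            return resp.text
        # GitHub signals rate limiting with 403 or 429; a 403 may also be a plain permission error
        rate_limited = resp.status_code in (403, 429) and (
            "Retry-After" in resp.headers
            or resp.headers.get("X-RateLimit-Remaining") == "0"
        )
        if not rate_limited:
            resp.raise_for_status()
        if "Retry-After" in resp.headers:
            # Secondary rate limit: GitHub states how many seconds to wait
            wait = int(resp.headers["Retry-After"])
        else:
            # Primary rate limit: wait until the reset time (UTC epoch seconds)
            reset = int(resp.headers.get("X-RateLimit-Reset", time.time() + 60))
            wait = max(0, reset - int(time.time())) + 1
        print(f"Rate limited, waiting {wait} seconds...")
        time.sleep(wait)
    raise RuntimeError("Maximum number of retries exceeded")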

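Pitfall 2: unstable agent output format

Problem: the Hermes model sometimes does not follow the tool-call format strictly, so parsing its arguments fails.

Solution: validate the output and degrade gracefully: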

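import ast
import json

def safe_parse_tool_args(raw_args: str) -> dict:
    """Parse tool arguments defensively, tolerating common format errors"""
    try:
        parsed = json.loads(raw_args)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        pass
    # Common failure: Python-style dicts with single quotes. literal_eval parses them
    # safely, whereas replace("'", '"') would corrupt values such as "don't"
    try:
        parsed = ast.literal_eval(raw_args)
        return parsed if isinstance(parsed, dict) else {}
    except (ValueError, SyntaxError, TypeError):
        return {}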
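Parsing is only half of the problem: a syntactically valid argument object can still lack required fields, and the tool then fails with a TypeError inside dispatch_tool. Checking the arguments against the tool's own JSON schema first gives the model a precise error to correct on its next turn. A minimal sketch that checks only required keys and primitive types (a full validator such as the jsonschema package would cover the rest); missing_or_invalid_args is a hypothetical name:

```python
from typing import Dict, List

# JSON-schema primitive type names mapped to the Python types json.loads produces
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "array": list, "object": dict}


def missing_or_invalid_args(tool_schema: Dict, args: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the call looks valid."""
    params = tool_schema["function"]["parameters"]
    problems = [f"missing required argument '{name}'"
                for name in params.get("required", []) if name not in args]
    for name, value in args.items():
        expected = JSON_TYPES.get(params.get("properties", {}).get(name, {}).get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and expected is not bool)):
            problems.append(f"argument '{name}' should be of type "
                            f"{params['properties'][name]['type']}")
    return problems


schema = {"type": "function", "function": {"name": "get_pr_diff", "parameters": {
    "type": "object",
    "properties": {"pr_number": {"type": "integer"}, "file_filter": {"type": "string"}},
    "required": ["pr_number"]}}}
print(missing_or_invalid_args(schema, {"file_filter": "*.py"}))
# ["missing required argument 'pr_number'"]
```

When the list is non-empty, dispatch_tool can return it as {"error": ...} instead of calling the tool, which the model usually handles better than a Python traceback message.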

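Pitfall 3: large files exceed the token limit

Problem: putting a file of more than about 1,000 lines into the context as-is can exceed the model's token limit.

Solution: analyze only the changed parts of the diff instead of the whole file: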

def extract_changed_sections(patch: str, context_lines: int = 10) -> str:
    """Extract the changed parts of a diff patch, with surrounding context"""
    lines = patch.split("\n")
    # GitHub's per-file "patch" starts at the first @@ hunk header, with no ---/+++ file headers
    changed = [i for i, line in enumerate(lines) if line[:1] in ("+", "-")]
    # Merge overlapping windows so that runs of adjacent changes are emitted only once
    ranges = []
    for i in changed:
        start = max(0, i - context_lines)
        end = min(len(lines), i + context_lines + 1)
        if ranges and start <= ranges[-1][1]:
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            ranges.append([start, end])
    return "\n---\n".join("\n".join(lines[s:e]) for s, e in ranges)
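Analyzing only the diff has a side effect: the model no longer sees real file line numbers, yet post_pr_review needs them, and GitHub only accepts review comments on lines inside a hunk. Prefixing each patch line with its line number in the new file addresses both. A minimal sketch based on the standard `@@ -a,b +c,d @@` hunk header; annotate_patch is a hypothetical name:

```python
import re
from typing import Set, Tuple

# Standard unified-diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def annotate_patch(patch: str) -> Tuple[str, Set[int]]:
    """Prefix patch lines with new-file line numbers and collect commentable lines."""
    out, commentable, new_line = [], set(), 0
    for line in patch.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))  # first new-file line covered by this hunk
            out.append(line)
        elif line.startswith(("+", " ")):
            # Added or context line: it exists in the new file (the RIGHT side of the diff)
            out.append(f"{new_line:>4} {line}")
            commentable.add(new_line)
            new_line += 1
        else:
            # Removed lines and "no newline at end of file" markers have no new-file number
            out.append(f"     {line}")
    return "\n".join(out), commentable


patch = (
    "@@ -10,3 +10,4 @@ def load(user):\n"
    "     conn = get_conn()\n"
    "-    q = build(user)\n"
    '+    q = "SELECT 1"\n'
    "+    log(q)\n"
    "     return q"
)
text, lines = annotate_patch(patch)
print(text)
print(sorted(lines))  # [10, 11, 12, 13]
```

Dropping any model comment whose line is not in the returned set before calling post_pr_review keeps one bad line number from failing the whole review request.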

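Pitfall 4: false-positive security findings

Problem: the regex patterns also match "dangerous code" inside comments and strings, producing many false positives.

Solution: strip comments and string literals (here, triple-quoted strings) before matching: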

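import re

def remove_comments_and_strings(code: str, language: str) -> str:
    """Blank out comments and triple-quoted strings before pattern matching (Python only).

    Line breaks are kept so reported line numbers stay correct. Run the credential
    and SQL rules on the original code: they need to see string contents.
    """
    if language == "python":
        def blank(match):
            return '""' + "\n" * match.group(0).count("\n")
        # Triple-quoted strings first, so a "#" inside a docstring is not taken for a comment
        code = re.sub(r'""".*?"""', blank, code, flags=re.DOTALL)
        code = re.sub(r"'''.*?'''", blank, code, flags=re.DOTALL)
        # Heuristic: this also truncates single-line strings that contain "#"
        code = re.sub(r"#.*$", "", code, flags=re.MULTILINE)
    return code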
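Regexes cannot tell a `#` inside a string from a real comment. For Python, the standard-library tokenize module can: it emits COMMENT tokens only for genuine comments. A minimal sketch that removes comments while keeping every line (so line numbers stay valid) and leaves string literals intact for the credential and SQL rules; strip_python_comments is a hypothetical name, and source that fails to tokenize is returned unchanged:

```python
import io
import tokenize


def strip_python_comments(code: str) -> str:
    """Remove genuine comments from Python source, keeping strings and line numbers."""
    # readlines() splits lines exactly as tokenize's readline sees them
    lines = io.StringIO(code).readlines()
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(code).readline))
    except (tokenize.TokenError, SyntaxError):
        return code  # e.g. a partial diff hunk that does not tokenize: keep it as-is
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # 1-based row, 0-based column
            line = lines[row - 1]
            newline = line[len(line.rstrip("\r\n")):]
            lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)


src = 'color = "#fff"  # eval(x) is only mentioned here\nrun(cmd)  # TODO\n'
print(strip_python_comments(src))
```

Unlike the regex version, `"#fff"` survives untouched and a `#` inside a docstring is never mistaken for a comment; the cost is that diff fragments with unbalanced brackets fall back to the raw text.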

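Chapter Summary

This chapter built a complete code review and auto-fix system on a Hermes agent: a GitHub Actions trigger, diff retrieval through the GitHub API, flake8 plus regex-based security checks, LLM-generated fixes, PR review comments, and a team-configurable rule file.

The agent's value lies not in replacing human review but in filtering out noise and highlighting what matters, so that reviewers can spend their limited attention on architecture and business logic.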

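Questions to consider

  1. How would you design a scoring mechanism that quantifies how closely the agent's reviews agree with human reviews?
  2. For security-sensitive code (payments, authentication), should automatic fixes by the agent be forbidden entirely?
  3. How can historical PR data be used to keep improving the review agent's accuracy?
  4. In a mixed-language project (for example, a Python backend with a TypeScript frontend), how do you establish a single quality baseline?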