Contract Diff

Author: russell-yu · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 87 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install contract-diff
Description
Compare contract templates with scanned stamped contracts, list all differences (additions, deletions, modifications). Output as Word document for easy downl...
Usage Instructions (SKILL.md)

contract-diff

Compare contract templates (Word/PDF) with scanned stamped contracts (PDF/images), list ALL differences, and generate a highlighted visualization showing where changes are.

When to Use

  • User uploads a contract template AND a scanned signed contract
  • User wants to know EVERY difference between template and signed version
  • User needs detailed report showing additions, deletions, and modifications
  • User needs visual highlighting of modified areas in the scanned contract

Workflow

Step 1: Extract Text from Both Files

For contract template (.docx): Use python-docx to extract all text.

For contract template (.pdf): Use PyMuPDF (fitz) to extract text.

For scanned contract (PDF or image): Use OCR with pytesseract to extract text with bounding boxes.
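
The three extraction paths above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual scripts/compare.py: the function names (`extract_docx_text`, `extract_pdf_text`, `ocr_image`, `ocr_scanned_pdf`, `extractor_for`) and the extension-based dispatch rule are hypothetical, and the library imports are deferred so each path only needs its own package.

```python
from pathlib import Path


def extract_docx_text(path):
    """Return the non-empty paragraph texts of a .docx template (needs python-docx)."""
    import docx  # provided by the python-docx package
    return [p.text for p in docx.Document(path).paragraphs if p.text.strip()]


def extract_pdf_text(path):
    """Return the text layer of each page of a PDF template (needs PyMuPDF)."""
    import fitz  # provided by the PyMuPDF package
    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def ocr_image(path):
    """OCR one scanned image; returns words with bounding boxes (needs pytesseract, Pillow)."""
    import pytesseract
    from PIL import Image
    with Image.open(path) as img:
        # Output.DICT yields parallel lists: "text", "left", "top", "width", "height", ...
        return pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)


def ocr_scanned_pdf(path, dpi=200):
    """Render each PDF page to an image, then OCR it (dpi is an assumed default)."""
    import fitz
    import pytesseract
    from PIL import Image
    results = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # RGB pixmap without alpha by default
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            results.append(
                pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
            )
    return results


def extractor_for(path, scanned=False):
    """Pick an extraction function from the file extension (hypothetical dispatch rule)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".docx":
        return extract_docx_text
    if suffix == ".pdf":
        return ocr_scanned_pdf if scanned else extract_pdf_text
    if suffix in {".png", ".jpg", ".jpeg", ".tif", ".tiff"}:
        return ocr_image
    raise ValueError(f"unsupported input: {path}")
```

A scanned PDF usually has no text layer, which is why this sketch rasterizes its pages before OCR instead of calling `get_text()`.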

Step 2: Detailed Comparison

Split text into sentences/paragraphs and categorize:

  1. Only in template - Content that was deleted
  2. Only in scanned - Content that was added
  3. Similar but different - Modified content (with similarity ratio)

Classification uses difflib.SequenceMatcher similarity ratios with these thresholds:

  • > 85% similarity: treated as the same
  • 50-85% similarity: marked as modified
  • < 50% similarity: marked as added/deleted
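
A minimal sketch of this threshold-based classification, using only the standard library. The greedy best-match pairing, the handling of the exact 85% boundary (counted as modified here), and the names `classify`, `SAME_THRESHOLD`, and `MODIFIED_THRESHOLD` are assumptions made for illustration; the skill's own script may pair sentences differently.

```python
from difflib import SequenceMatcher

SAME_THRESHOLD = 0.85      # above this ratio: treated as the same
MODIFIED_THRESHOLD = 0.50  # between the two thresholds: marked as modified


def similarity(a, b):
    """Similarity ratio in [0, 1] as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def classify(template_units, scanned_units):
    """Greedily pair each template unit with its closest unused scanned unit."""
    deleted, modified = [], []
    unmatched = list(scanned_units)  # scanned units not yet paired
    for unit in template_units:
        best, best_ratio = None, 0.0
        for candidate in unmatched:
            ratio = similarity(unit, candidate)
            if ratio > best_ratio:
                best, best_ratio = candidate, ratio
        if best is not None and best_ratio >= MODIFIED_THRESHOLD:
            unmatched.remove(best)
            if best_ratio <= SAME_THRESHOLD:
                modified.append((unit, best, round(best_ratio, 2)))
            # ratios above SAME_THRESHOLD count as unchanged and are not reported
        else:
            deleted.append(unit)  # nothing similar enough in the scan
    # whatever is left in the scan has no counterpart in the template
    return {"deleted": deleted, "added": unmatched, "modified": modified}


template = [
    "The term of this contract is 12 months.",
    "Payment is due within 30 days.",
    "Either party may terminate with notice.",
]
scanned = [
    "The term of this contract is 12 months.",
    "Payment is due within 90 days after delivery of goods.",
    "The supplier shall bear all shipping costs.",
]
result = classify(template, scanned)
```

With these sample sentences, the payment clause lands in `modified`, the termination clause in `deleted`, and the shipping clause in `added`. OCR noise lowers ratios across the board, so the thresholds may need tuning on real scans.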

Step 3: Generate Highlighted Image

For modified content:

  • Find text position in OCR results
  • Draw colored highlight box:
    • 🟡 Yellow = Modified content
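
One way the lookup-and-highlight step might work is sketched below. It assumes the OCR result has the dictionary shape that pytesseract's `image_to_data` returns with `Output.DICT` (parallel `text`, `left`, `top`, `width`, `height` lists) and that the phrase is whitespace-separated; `find_phrase_box` and `draw_highlight` are hypothetical helper names, not functions from the skill.

```python
def find_phrase_box(ocr, phrase):
    """Return (x0, y0, x1, y1) enclosing the first run of OCR words equal to `phrase`."""
    words = phrase.split()
    if not words:
        return None
    # Keep only non-empty OCR tokens, remembering their original indices.
    tokens = [(i, t.strip()) for i, t in enumerate(ocr["text"]) if t.strip()]
    for start in range(len(tokens) - len(words) + 1):
        window = tokens[start:start + len(words)]
        if [text for _, text in window] == words:
            idx = [i for i, _ in window]
            x0 = min(ocr["left"][i] for i in idx)
            y0 = min(ocr["top"][i] for i in idx)
            x1 = max(ocr["left"][i] + ocr["width"][i] for i in idx)
            y1 = max(ocr["top"][i] + ocr["height"][i] for i in idx)
            return (x0, y0, x1, y1)
    return None


def draw_highlight(image, box, color=(255, 215, 0), width=3):
    """Outline `box` in yellow on an RGB Pillow image (needs Pillow)."""
    from PIL import ImageDraw
    ImageDraw.Draw(image).rectangle(box, outline=color, width=width)
    return image


# Hand-written sample in the image_to_data dictionary shape.
sample_ocr = {
    "text":   ["", "Payment", "is", "due", "within", "90", "days"],
    "left":   [0, 10, 90, 120, 170, 250, 285],
    "top":    [0, 20, 20, 21, 20, 19, 20],
    "width":  [0, 70, 20, 40, 70, 25, 50],
    "height": [0, 15, 15, 14, 15, 16, 15],
}
box = find_phrase_box(sample_ocr, "90 days")  # (250, 19, 335, 35)
```

Exact token matching is brittle when OCR misreads a character, and Chinese text is often tokenized character by character, so a real implementation would likely need fuzzy matching at this step.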

Step 4: Generate Detailed Report

Output format:

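# Detailed Contract Comparison Report

## 📋 File Information
- **Template file**: [filename]
- **Stamped contract**: [filename]

## 📊 Comparison Summary
- **Risk level**: 🟢 Low / 🟡 Medium / 🔴 High
- 🔴 Deleted content: X items
- 🟢 Added content: X items
- 🟡 Modified content: X items

## 🔴 Deleted Content (template → stamped contract)
1. [content...]
2. [content...]

## 🟢 Added Content (template → stamped contract)
1. [content...]
2. [content...]

## 🟡 Modified Content Comparison
| Template content | Scanned content | Similarity |
|------------------|-----------------|------------|
| ... | ... | 0.xx |

---
*⚠️ Note: Comparison results are based on OCR text recognition and may contain errors.*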
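
A report along these lines could be written to Word with python-docx, roughly as sketched here. The input shape (a dict with `deleted`, `added`, and `modified` keys, where `modified` holds `(template_text, scanned_text, ratio)` tuples) and the names `build_sections` and `write_report` are assumptions for illustration. The risk level is left out because the source does not say how it is derived.

```python
def build_sections(result):
    """Turn a comparison result into (heading, rows) pairs for the report."""
    return [
        (f"Deleted content: {len(result['deleted'])}", list(result["deleted"])),
        (f"Added content: {len(result['added'])}", list(result["added"])),
        (
            f"Modified content: {len(result['modified'])}",
            [f"{old} -> {new} ({ratio:.2f})" for old, new, ratio in result["modified"]],
        ),
    ]


def write_report(result, path="report.docx"):
    """Write the sections to a Word document (needs python-docx)."""
    from docx import Document
    doc = Document()
    doc.add_heading("Detailed Contract Comparison Report", level=1)
    for heading, rows in build_sections(result):
        doc.add_heading(heading, level=2)
        for row in rows:
            doc.add_paragraph(row, style="List Bullet")
    doc.add_paragraph(
        "Note: results are based on OCR text recognition and may contain errors."
    )
    doc.save(path)
    return path


sections = build_sections(
    {"deleted": ["Clause 7"], "added": [], "modified": [("30 days", "90 days", 0.7)]}
)
```

Keeping `build_sections` free of any Word-specific code means the same data could also feed a plain-text or Markdown report.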

Usage

# Install dependencies
pip install python-docx PyMuPDF pillow pytesseract

# Run the comparison (outputs a Word document)
python scripts/compare.py contract_template.docx signed_contract.pdf

# Specify the output file
python scripts/compare.py template.pdf scan.pdf -o report.docx

Dependencies

Required Python packages:

  • python-docx - for .docx files
  • PyMuPDF (fitz) - for PDF text extraction
  • Pillow - image processing
  • pytesseract - OCR
  • Tesseract-OCR binary (system-level installation required)

Important Notes

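  1. OCR accuracy: OCR of scanned documents may contain errors, especially for handwritten or blurry text
  2. Highlight precision: Highlights rely on the coordinates returned by OCR and may be slightly offset
  3. Detailed comparison: The new algorithm lists all differences, including additions, deletions, and modifications
  4. Redaction: Sensitive information is replaced with ***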
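
For the redaction note, a pattern-based masking pass might look like the sketch below. The two patterns (e-mail addresses and digit runs of seven or more characters, standing in for account, ID, or phone numbers) are illustrative assumptions, and `redact` is a hypothetical helper. Real contracts would need patterns chosen for the fields that actually matter.

```python
import re

# Hypothetical patterns; extend them to match the fields that count as sensitive.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # e-mail addresses
    re.compile(r"\d{7,}"),                        # long digit runs (accounts, IDs, phones)
]


def redact(text, mask="***"):
    """Replace every match of the sensitive patterns with `mask`."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(mask, text)
    return text


masked = redact("Account 6222020200112233445, phone 13800138000")
# masked == "Account ***, phone ***"
```

Applying `redact` to each text fragment before it is written to the report keeps the comparison itself running on the unmasked text.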

Output Files

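File             Description
report.docx      Detailed comparison report in Word format (contains all differences; ready to download)
highlighted.png  Image with highlighted annotations (optional)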

Windows Setup

  1. Install Python 3.12+
  2. Install Tesseract OCR: winget install tesseract-ocr.tesseract
  3. Install Python packages:
    pip install python-docx PyMuPDF pillow pytesseract
    

Example

# Compare two contract files, output as Word document
python scripts/compare.py "合同模板.docx" "盖章合同.pdf" -o "详细比对报告.docx"

Output includes:

  • All content only in template (deletions)
  • All content only in scanned (additions)
  • All similar but modified content with similarity scores
Security Recommendations
The skill appears to implement the advertised functionality (OCR + diff + Word report), but there are several things to consider before running it:

  • Inspect and/or remove list_files.py and any hard-coded paths. list_files.py references an absolute Windows path (C:\Users\yangy\...) and copies files, so it can overwrite files if run in your environment.
  • The compare script attempts to auto-install Python packages using pip (os.system('pip install ...')), which performs network installs from PyPI. Run it in a controlled environment (virtualenv/container) or manually install the listed dependencies instead.
  • SKILL.md claims sensitive-data redaction ("脱敏处理"), but the included scripts do not perform automated redaction; reports in the package include full contract text. Do not use this on real sensitive contracts until you confirm or implement redaction.
  • The scripts require the Tesseract OCR binary; install it from an official source and verify the PATH configuration.
  • Because the skill writes files and can install packages, run it in a sandbox or isolated environment and back up any data you care about first.

To proceed safely: review and clean the code (remove or fix list_files.py), pre-install dependencies in an isolated venv, validate that reports redact sensitive fields if needed, and test on non-sensitive sample documents.
Functional Analysis
Type: OpenClaw Skill
Name: contract-diff
Version: 1.0.0

The skill bundle provides a legitimate contract comparison utility but contains high-risk coding practices. Specifically, scripts/compare.py uses os.system() to automatically install Python packages, which is a potential shell injection vector and an unsafe method for dependency management. The code also contains hardcoded absolute Windows file paths (e.g., C:\Users\yangy\...) and system-level binary paths in list_files.py and scripts/compare.py, indicating poor security hygiene and potential environment-specific vulnerabilities, though no clear evidence of intentional malice or data exfiltration was detected.
Capability Assessment
Purpose & Capability
The name and description call for template vs. scanned-contract comparison with OCR and a Word report; the included scripts use python-docx, PyMuPDF, Pillow, pytesseract, and difflib, which are exactly the tools you would expect for this task.
Instruction Scope
SKILL.md stays on purpose (text extraction, OCR, diff, highlighted images, Word report). However, there are inconsistencies: list_files.py contains a hard-coded absolute path (C:\Users\yangy\.openclaw\workspace\contract-diff\input) and performs shutil.copy to 'template.docx'/'scanned.pdf' (can overwrite files). SKILL.md states '脱敏处理: 敏感信息用 *** 代替' (redaction: sensitive information is replaced with ***), but I found no implementation of systematic redaction in the scripts; reports in the output folder contain full contract text. The scripts also run pip installs at runtime (see Install Mechanism), which expands the runtime scope beyond what SKILL.md describes.
Install Mechanism
No formal install spec is provided, but compare.py includes a try_import helper that calls os.system('pip install <pkg> -q') to install missing Python packages at runtime. That means installing packages from PyPI when the script runs (network activity, arbitrary package install side-effects). This is riskier than an instruction-only skill that expects preinstalled dependencies. The script does not download code from arbitrary URLs, but auto-installing packages without user confirmation is a notable concern.
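A lower-risk alternative to installing at runtime is to check for the dependencies and stop with instructions if any are missing. The sketch below is an assumption about how that could look, not code from the skill; `REQUIRED`, `missing_packages`, and `require_dependencies` are hypothetical names, and the module-to-package mapping follows the dependency list in SKILL.md.

```python
import importlib.util

# Import name -> pip package name, following the dependency list in SKILL.md.
REQUIRED = {
    "docx": "python-docx",
    "fitz": "PyMuPDF",
    "PIL": "pillow",
    "pytesseract": "pytesseract",
}


def missing_packages(required=REQUIRED):
    """Return pip names of packages whose import module cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


def require_dependencies(required=REQUIRED):
    """Exit with install instructions instead of installing anything automatically."""
    missing = missing_packages(required)
    if missing:
        raise SystemExit(
            "Missing packages; install them manually, e.g.:\n"
            f"  pip install {' '.join(missing)}"
        )
```

Calling `require_dependencies()` at the top of the script leaves the decision to install, and the choice of environment, with the user.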
Credentials
The skill declares no required environment variables or credentials, which is proportional. It does attempt to set pytesseract.pytesseract.tesseract_cmd to a Windows path if present (TESSERACT_PATH = 'C:\Program Files\Tesseract-OCR\tesseract.exe') — that is reasonable but platform-specific. It also requires a system-level Tesseract binary (documented in SKILL.md). No secrets or unrelated credentials are requested.
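A platform-neutral way to locate the Tesseract binary is to search PATH first and fall back to the Windows default only if that file exists. This is a sketch under that assumption; `find_tesseract` is a hypothetical helper, and the fallback path is the one quoted in the review above.

```python
import os
import shutil

WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(name="tesseract", fallback=WINDOWS_DEFAULT):
    """Return the path of the Tesseract binary, or None if it cannot be found."""
    found = shutil.which(name)  # searches the directories on PATH
    if found:
        return found
    return fallback if os.path.isfile(fallback) else None
```

The result could then be assigned to pytesseract.pytesseract.tesseract_cmd when it is not None, with a clear error raised otherwise.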
Persistence & Privilege
The skill is not always-enabled and does not request elevated privileges. It does write/copy files in an 'input' directory (and could overwrite files via shutil.copy in list_files.py). It does not modify other skills or system-wide configurations. Running the scripts will modify local files (create report.docx, highlighted images, and the script's own copied files).
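The overwrite risk noted for shutil.copy can be avoided by creating the destination exclusively, so an existing file causes an error rather than being replaced. A minimal sketch follows; `copy_no_overwrite` is a hypothetical helper, not part of the skill.

```python
import shutil
from pathlib import Path


def copy_no_overwrite(src, dst):
    """Copy src to dst, raising FileExistsError if dst already exists."""
    dst = Path(dst)
    # Mode "xb" creates the file exclusively and fails if it is already there.
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Unlike shutil.copy, this does not preserve permission bits; it trades that for the guarantee that no existing file is silently replaced.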
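How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install contract-diff
  3. Once installed, invoke the Skill by name or trigger it with /contract-diff
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output
Version History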
v1.0.0
contract-diff v1.0.0
  • Initial release of the contract-diff tool.
  • Compares contract templates (Word/PDF) with scanned stamped contracts (PDF/images).
  • Detects and lists all differences: additions, deletions, and modifications.
  • Generates a detailed Word report and (optionally) a highlighted image showing modified areas.
  • Supports extraction via OCR, and provides a summary plus detailed, side-by-side difference listings.
Metadata
Slug contract-diff
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
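FAQ

What is Contract Diff?

Compare contract templates with scanned stamped contracts, list all differences (additions, deletions, modifications). Output as Word document for easy downl... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 87 total downloads to date.

How do I install Contract Diff?

Run the command "/install contract-diff" in the OpenClaw or Claude Code chat box to install the skill in one step. The Python packages and the Tesseract OCR binary listed under Dependencies still have to be installed on the system.

Is Contract Diff free?

Yes. Contract Diff is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Contract Diff support?

Contract Diff is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Contract Diff?

It is developed and maintained by russell-yu (@russell-yu); the current version is v1.0.0.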
