russell-yu

Contract Diff

by russell-yu · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
87 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw: /install contract-diff
Description
Compare contract templates with scanned stamped contracts, list all differences (additions, deletions, modifications). Output as Word document for easy downl...
README (SKILL.md)

contract-diff

Compare contract templates (Word/PDF) with scanned stamped contracts (PDF/images), list ALL differences, and generate a highlighted visualization showing where the changes are.

When to Use

  • User uploads a contract template AND a scanned signed contract
  • User wants to know EVERY difference between template and signed version
  • User needs a detailed report showing additions, deletions, and modifications
  • User needs visual highlighting of modified areas in the scanned contract

Workflow

Step 1: Extract Text from Both Files

For contract template (.docx): Use python-docx to extract all text.

For contract template (.pdf): Use PyMuPDF (fitz) to extract text.

For scanned contract (PDF or image): Use OCR with pytesseract to extract text with bounding boxes.
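A minimal sketch of Step 1, assuming python-docx, PyMuPDF, Pillow and pytesseract are installed. The helper names (extract_template_text, group_ocr_words, ocr_image) are hypothetical, not taken from scripts/compare.py. group_ocr_words regroups the word-level output of pytesseract.image_to_data into lines with one bounding box each, which is the shape Step 3 needs for highlighting. The `chi_sim+eng` language setting assumes the Simplified Chinese traineddata is installed; a scanned PDF would first need its pages rasterized (for example with PyMuPDF's Page.get_pixmap), which is not shown here.

```python
# Sketch only: helper names are hypothetical, not the skill's own functions.
# Third-party libraries are imported inside functions so group_ocr_words
# can be used (and tested) without them.


def extract_template_text(path: str) -> str:
    """Return plain text from a .docx (python-docx) or .pdf (PyMuPDF) template."""
    lower = path.lower()
    if lower.endswith(".docx"):
        from docx import Document  # python-docx
        return "\n".join(p.text for p in Document(path).paragraphs)
    if lower.endswith(".pdf"):
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    raise ValueError(f"unsupported template type: {path}")


def group_ocr_words(data: dict, min_conf: float = 0.0, sep: str = " ") -> list:
    """Group word-level OCR output into lines with one bounding box per line.

    `data` has the shape returned by pytesseract.image_to_data(...,
    output_type=Output.DICT): parallel lists under 'text', 'conf', 'block_num',
    'par_num', 'line_num', 'left', 'top', 'width', 'height'.
    Returns [(line_text, (x0, y0, x1, y1)), ...] in Tesseract's output order.
    """
    lines = {}  # (block, paragraph, line) -> [words, box]; dicts keep insertion order
    for i, raw in enumerate(data["text"]):
        word = str(raw).strip()
        if not word or float(data["conf"][i]) < min_conf:
            continue  # skip empty tokens and low-confidence words
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        x0, y0 = data["left"][i], data["top"][i]
        x1, y1 = x0 + data["width"][i], y0 + data["height"][i]
        if key not in lines:
            lines[key] = [[word], [x0, y0, x1, y1]]
        else:
            words, box = lines[key]
            words.append(word)
            box[:] = [min(box[0], x0), min(box[1], y0), max(box[2], x1), max(box[3], y1)]
    # For CJK-only scans, sep="" avoids spaces between per-character "words".
    return [(sep.join(words), tuple(box)) for words, box in lines.values()]


def ocr_image(path: str, lang: str = "chi_sim+eng", min_conf: float = 0.0) -> list:
    """OCR one scanned page image into lines with boxes (needs the Tesseract binary)."""
    from PIL import Image
    import pytesseract
    data = pytesseract.image_to_data(Image.open(path), lang=lang,
                                     output_type=pytesseract.Output.DICT)
    return group_ocr_words(data, min_conf=min_conf)
```

Keeping the grouping logic free of third-party imports makes it easy to check against a hand-built dictionary before running real OCR.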

Step 2: Detailed Comparison

Split both texts into sentences/paragraphs and categorize each segment:

  1. Only in template - Content that was deleted
  2. Only in scanned - Content that was added
  3. Similar but different - Modified content (with similarity ratio)

Using difflib.SequenceMatcher with these similarity thresholds:

  • > 85% similarity: treated as the same
  • 50-85% similarity: marked as modified
  • < 50% similarity: marked as added/deleted
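The classification above can be sketched with the standard library alone. This is a minimal version under stated assumptions: segments are split on Chinese and English sentence punctuation, each template segment is greedily paired with its most similar unused scanned segment, and the boundaries are read as ratio ≥ 0.85 (same), 0.50 ≤ ratio < 0.85 (modified), otherwise deleted/added. split_segments and compare_segments are hypothetical names; scripts/compare.py may pair segments differently.

```python
import re
from difflib import SequenceMatcher

# Sketch only: segment splitting and greedy pairing are assumptions.


def split_segments(text: str) -> list:
    """Split on Chinese/English sentence punctuation, newlines and '. '."""
    parts = re.split(r"[。；！？;!?\n]+|\.\s+", text)
    return [p.strip() for p in parts if p.strip()]


def compare_segments(template: list, scanned: list, same: float = 0.85,
                     modified: float = 0.50) -> dict:
    """Classify segments as deleted, added or modified (with similarity ratio)."""
    unused = list(range(len(scanned)))  # scanned segments not yet paired
    result = {"deleted": [], "added": [], "modified": []}
    for t in template:
        best_j, best_ratio = None, 0.0
        for j in unused:
            ratio = SequenceMatcher(None, t, scanned[j]).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is None or best_ratio < modified:
            result["deleted"].append(t)  # nothing similar enough in the scan
            continue
        unused.remove(best_j)  # consume the pair: "same" or "modified"
        if best_ratio < same:
            result["modified"].append((t, scanned[best_j], round(best_ratio, 2)))
    result["added"] = [scanned[j] for j in unused]  # never paired -> added
    return result
```

For example, compare_segments(split_segments(template_text), split_segments(ocr_text)) yields the three lists the report needs. Greedy pairing costs one SequenceMatcher call per template/scanned pair, which is acceptable for contract-length inputs but can mis-pair near-duplicate clauses.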

Step 3: Generate Highlighted Image

For modified content:

  • Find text position in OCR results
  • Draw colored highlight box:
    • 🟡 Yellow = Modified content
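A sketch of Step 3, assuming OCR lines in the form [(text, (x0, y0, x1, y1)), ...], for example grouped from pytesseract's word boxes. locate_segment, draw_highlights, the 0.6 match threshold and the 4-pixel padding are hypothetical choices. Because the scanned-side text of a modified pair should appear almost verbatim in the OCR output, that is the better string to search for; Pillow is imported only when drawing.

```python
from difflib import SequenceMatcher


def locate_segment(segment, ocr_lines, min_ratio=0.6, pad=4):
    """Return a padded (x0, y0, x1, y1) box for the OCR line most similar to
    `segment`, or None if no line reaches min_ratio."""
    best_box, best_ratio = None, min_ratio
    for text, (x0, y0, x1, y1) in ocr_lines:
        ratio = SequenceMatcher(None, segment, text).ratio()
        if ratio >= best_ratio:  # keep the closest line seen so far
            best_ratio = ratio
            best_box = (max(0, x0 - pad), max(0, y0 - pad), x1 + pad, y1 + pad)
    return best_box


def draw_highlights(image_path, boxes, out_path="highlighted.png"):
    """Draw semi-transparent yellow boxes (modified content) over the scan."""
    from PIL import Image, ImageDraw  # Pillow, imported lazily
    base = Image.open(image_path).convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    for box in boxes:
        draw.rectangle(box, fill=(255, 255, 0, 90), outline=(255, 200, 0, 255), width=2)
    Image.alpha_composite(base, overlay).convert("RGB").save(out_path)
    return out_path
```

Drawing on a separate transparent overlay and compositing keeps the underlying text readable, which a solid fill would hide.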

Step 4: Generate Detailed Report

Output format:

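# Detailed Contract Comparison Report

## 📋 File Information
- **Template file**: [filename]
- **Stamped contract**: [filename]

## 📊 Comparison Summary
- **Risk level**: 🟢 Low / 🟡 Medium / 🔴 High
- 🔴 Deleted content: X
- 🟢 Added content: X
- 🟡 Modified content: X

## 🔴 Deleted Content (template → stamped contract)
1. [content...]
2. [content...]

## 🟢 Added Content (template → stamped contract)
1. [content...]
2. [content...]

## 🟡 Modified Content Comparison
| Template content | Scanned content | Similarity |
|----------|------------|--------|
| ... | ... | 0.xx |

---
*⚠️ Note: comparison results are based on OCR text recognition and may contain errors.*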

Usage

# Install dependencies
pip install python-docx PyMuPDF pillow pytesseract

# Run the comparison (outputs a Word document)
python scripts/compare.py contract_template.docx signed_contract.pdf

# Specify the output file
python scripts/compare.py template.pdf scan.pdf -o report.docx

Dependencies

Required Python packages:

  • python-docx - for .docx files
  • PyMuPDF (fitz) - for PDF text extraction
  • Pillow - image processing
  • pytesseract - OCR
  • Tesseract-OCR binary (system-level installation required)
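Instead of installing packages at run time, a script can check for the dependencies above and stop with a clear message, leaving installation to the user (ideally inside a virtualenv). A minimal sketch; missing_dependencies and the import-name to pip-name mapping are assumptions, not part of the skill.

```python
import importlib.util
import shutil

REQUIRED_MODULES = {  # import name -> pip package name (assumed mapping)
    "docx": "python-docx",
    "fitz": "PyMuPDF",
    "PIL": "pillow",
    "pytesseract": "pytesseract",
}


def missing_dependencies(modules=REQUIRED_MODULES, binaries=("tesseract",)):
    """Return pip package names and executables that are not available.

    Nothing is installed; the caller decides what to do with the result.
    """
    missing = [pip_name for mod, pip_name in modules.items()
               if importlib.util.find_spec(mod) is None]
    missing += [exe for exe in binaries if shutil.which(exe) is None]
    return missing
```

Calling missing_dependencies() before any OCR work and exiting when it returns a non-empty list avoids unattended `pip install` calls. On Windows, a Tesseract install that is not on PATH is reported as missing by this check.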

Important Notes

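  1. OCR accuracy: OCR of scanned copies may contain errors, especially for handwritten or blurry text
  2. Highlight precision: highlighting relies on the coordinates recognized by OCR and may be slightly offset
  3. Detailed comparison: the new algorithm lists every difference, including additions, deletions, and modifications
  4. Redaction: sensitive information is replaced with ***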
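Masking sensitive information with *** can be illustrated with a few regular expressions. This minimal sketch is only an illustration: the three patterns (mainland China mobile numbers, 18-character resident ID numbers, 16 to 19 digit bank card numbers) are assumptions and far from exhaustive, since names, addresses and amounts would need their own rules.

```python
import re

# Digit lookarounds are used instead of \b because CJK characters and digits
# are both "word" characters in Python, so \b would not fire between them.
SENSITIVE_PATTERNS = [
    re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),   # mobile phone numbers (assumed format)
    re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),  # 18-character resident ID numbers
    re.compile(r"(?<!\d)\d{16,19}(?!\d)"),     # bank card numbers
]


def redact(text: str, mask: str = "***") -> str:
    """Replace every match of the sensitive patterns with `mask`."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(mask, text)
    return text
```

Running redact() on extracted text before writing the report keeps the masked values out of every output file.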

Output Files

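| File | Description |
|------|-------------|
| report.docx | Detailed comparison report in Word format (contains all differences; ready to download) |
| highlighted.png | Image with highlighted annotations (optional) |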

Windows Setup

  1. Install Python 3.12+
  2. Install Tesseract OCR: winget install tesseract-ocr.tesseract
  3. Install Python packages:
    pip install python-docx PyMuPDF pillow pytesseract
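If Tesseract is installed but not on PATH, which is common on Windows, pytesseract has to be told where the executable is. A minimal sketch that checks PATH first and then the default Windows install location; find_tesseract and configure_pytesseract are hypothetical names, and the default path is an assumption about where the installer places the binary.

```python
import os
import shutil

# Raw string: in a normal literal, "\tesseract.exe" would start with a tab.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(WINDOWS_DEFAULT,)):
    """Return a path to tesseract: PATH first, then the candidate paths."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary; raise if none is found."""
    import pytesseract  # imported lazily
    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("Tesseract not found; install it and/or add it to PATH.")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Checking PATH first means the same code works unchanged on Linux and macOS, where the binary is normally on PATH.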
    

Example

# Compare two contract files and output a Word document
python compare.py "合同模板.docx" "盖章合同.pdf" -o "详细比对报告.docx"

Output includes:

  • All content found only in the template (deletions)
  • All content found only in the scanned contract (additions)
  • All similar but modified content, with similarity scores
Usage Guidance
The skill appears to implement the advertised functionality (OCR + diff + Word report), but consider the following before running it:

  • Inspect and/or remove list_files.py and any hard-coded paths. list_files.py references an absolute Windows path (C:\Users\yangy\...) and copies files, so it can overwrite files if run in your environment.
  • The compare script attempts to auto-install Python packages using pip (os.system('pip install ...')). If you run it, it will perform network installs from PyPI. Run it in a controlled environment (virtualenv/container) or install the listed dependencies manually instead.
  • SKILL.md claims sensitive-data redaction ("脱敏处理"), but the included scripts do not perform automated redaction; reports in the package include the full contract text. Do not use this on real sensitive contracts until you have confirmed or implemented redaction.
  • The scripts require the Tesseract OCR binary; install it from an official source and verify the PATH configuration.
  • Because the skill writes files and can install packages, run it in a sandbox or isolated environment and back up any data you care about first.

To proceed safely: review and clean the code (remove or fix list_files.py), pre-install dependencies in an isolated venv, confirm that reports redact sensitive fields if needed, and test on non-sensitive sample documents.
Capability Analysis
Type: OpenClaw Skill · Name: contract-diff · Version: 1.0.0

The skill bundle provides a legitimate contract comparison utility but contains high-risk coding practices. Specifically, scripts/compare.py uses os.system() to install Python packages automatically, which is a potential shell-injection vector and an unsafe way to manage dependencies. The code also contains hard-coded absolute Windows file paths (e.g., C:\Users\yangy\...) and system-level binary paths in list_files.py and scripts/compare.py, indicating poor security hygiene and potential environment-specific vulnerabilities, though no clear evidence of intentional malice or data exfiltration was detected.
Capability Assessment
Purpose & Capability
Name/description ask for template vs scanned-contract comparison with OCR and a Word report; the included scripts use python-docx, PyMuPDF, Pillow, pytesseract and difflib which are exactly the tools you would expect for this task.
Instruction Scope
SKILL.md stays on purpose (text extraction, OCR, diff, highlighted images, Word report). However, there are inconsistencies: list_files.py contains a hard-coded absolute path (C:\Users\yangy\.openclaw\workspace\contract-diff\input) and performs shutil.copy to 'template.docx'/'scanned.pdf' (which can overwrite files). SKILL.md states '脱敏处理: 敏感信息用 *** 代替' (redaction: sensitive information is replaced with ***), but I found no implementation of systematic redaction in the scripts; reports in the output folder contain the full contract text. The scripts also run pip installs at runtime (see Install Mechanism), which expands the runtime scope beyond what SKILL.md describes.
Install Mechanism
No formal install spec is provided, but compare.py includes a try_import helper that calls os.system('pip install <pkg> -q') to install missing Python packages at runtime. That means installing packages from PyPI when the script runs (network activity, arbitrary package install side-effects). This is riskier than an instruction-only skill that expects preinstalled dependencies. The script does not download code from arbitrary URLs, but auto-installing packages without user confirmation is a notable concern.
Credentials
The skill declares no required environment variables or credentials, which is proportional. It does attempt to set pytesseract.pytesseract.tesseract_cmd to a Windows path if present (TESSERACT_PATH = 'C:\Program Files\Tesseract-OCR\tesseract.exe') — that is reasonable but platform-specific. It also requires a system-level Tesseract binary (documented in SKILL.md). No secrets or unrelated credentials are requested.
Persistence & Privilege
The skill is not always-enabled and does not request elevated privileges. It does write/copy files in an 'input' directory (and could overwrite files via shutil.copy in list_files.py). It does not modify other skills or system-wide configurations. Running the scripts will modify local files (create report.docx, highlighted images, and the script's own copied files).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install contract-diff
  3. After installation, invoke the skill by name or use /contract-diff
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
contract-diff v1.0.0: initial release of the contract-diff tool.
  • Compares contract templates (Word/PDF) with scanned stamped contracts (PDF/images).
  • Detects and lists all differences: additions, deletions, and modifications.
  • Generates a detailed Word report and (optionally) a highlighted image showing modified areas.
  • Supports extraction via OCR, and provides a summary plus detailed, side-by-side difference listings.
Metadata
Slug: contract-diff
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Contract Diff?

Compare contract templates with scanned stamped contracts, list all differences (additions, deletions, modifications). Output as Word document for easy downl... It is an AI Agent Skill for Claude Code / OpenClaw, with 87 downloads so far.

How do I install Contract Diff?

Run "/install contract-diff" in the OpenClaw or Claude Code chat to install the skill in one step. The Tesseract OCR binary still has to be installed separately at the system level.

Is Contract Diff free?

Yes, Contract Diff is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Contract Diff support?

Contract Diff is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available, provided the Tesseract OCR binary is installed on the system.

Who created Contract Diff?

It is built and maintained by russell-yu (@russell-yu); the current version is v1.0.0.
