Regulation Extractor

Author: youfeijun123 · GitHub · v3.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 90 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install regulation-extractor
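Description
Extracts clauses in structured form from construction engineering code PDFs and syncs them to a Feishu Bitable. Supports deduplication of PDFs with two text layers (original + OCR), RapidOCR recognition for image-only PDFs, splitting by clause number (including numbers with spaces such as 6. 1. 2. 3), circled-digit conversion (e.g. 6.4.④ → 6.4.4), OCR error detection, quality flags, and text cleaning (removing line breaks, page headers, and symbol tables; separating fused Chinese/English text; splitting over-long clauses). Output...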
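The two clause-number cases mentioned in the description (numbers with stray spaces, and a circled digit in the last position) can be illustrated with a minimal sketch. This is not the skill's own code; the function name `normalize_clause_number` and the `CIRCLED` table are hypothetical and exist only for illustration.

```python
import re

# Circled digits ① to ⑳ occupy U+2460..U+2473; map each one to its decimal value.
# (Hypothetical helper table, not taken from the skill's source.)
CIRCLED = {chr(0x2460 + i): str(i + 1) for i in range(20)}


def normalize_clause_number(raw: str) -> str:
    """Normalize a clause number such as '6. 1. 2. 3' or '6.4.④'."""
    # Replace any circled digit with its plain decimal form.
    text = "".join(CIRCLED.get(ch, ch) for ch in raw)
    # Remove whitespace around the dots, as often introduced by OCR.
    return re.sub(r"\s*\.\s*", ".", text.strip())


print(normalize_clause_number("6. 1. 2. 3"))
print(normalize_clause_number("6.4.④"))
```

A real extractor would also need to decide where a clause number ends and the clause text begins, which this sketch does not attempt.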
Security recommendations
Before installing or running this skill:
  1. Inspect and edit quality_check.py to remove or change the hard-coded Path to a directory you control (it currently points to a developer's D: directory). Do not run that script until you confirm the path or provide a safe working directory.
  2. When using sync_to_bitable.py, prefer the --dry_run option first to preview changes; create a least-privileged Feishu app/service account for writes and rotate credentials after use.
  3. Install and run dependencies (PyMuPDF, rapidocr-onnxruntime) in an isolated environment (virtualenv or container) to avoid system-wide changes.
  4. Review any JSON outputs before syncing (they contain extracted regulatory text).
  5. If you need to run these scripts on a multi-tenant or sensitive host, run them in an isolated VM/container and verify network access rules so that only the intended Feishu endpoint is reachable.
  6. Overall risk: functionality appears legitimate, but the hard-coded path and undocumented credential handling are concrete issues to fix before trusting the package.
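For the recommendation to review JSON outputs before syncing, a small preview helper may be enough. The sketch below assumes each output file holds a top-level JSON array of records; that layout is an assumption, since the listing does not document the output schema, and `preview_json` is a hypothetical name.

```python
import json
from pathlib import Path


def preview_json(path, limit=3):
    """Return (record_count, first `limit` records) for a JSON array file.

    Assumes a top-level JSON array; the skill's real schema is not documented here.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return len(data), data[:limit]
```

Calling `preview_json("some_output.json")` and printing the result gives a quick look at how many records would be written and what they contain, without touching the network.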
Functional analysis
Type: OpenClaw Skill
Name: regulation-extractor
Version: 3.0.0

The regulation-extractor skill bundle is a legitimate tool designed to extract, clean, and synchronize construction regulation data from PDFs to Feishu (Lark) Bitable. The scripts (extract_regulation.py, ocr_batch.py, sync_to_bitable.py) perform standard PDF parsing, OCR via RapidOCR, and API interactions with the official Feishu endpoint (open.feishu.cn). No evidence of data exfiltration, malicious command execution, or harmful prompt injection was found; the code logic is transparent and strictly aligned with the stated purpose of document processing and data management.
Capability assessment
Purpose & Capability
Name/description align with the included scripts: extract_regulation.py, ocr_batch.py, deep_clean.py, clean_json.py, quality_check.py, and sync_to_bitable.py implement PDF text extraction, offline RapidOCR, cleaning, quality checks, and Feishu (飞书) sync. External network use is limited to Feishu API for its intended purpose.
Instruction Scope
SKILL.md instructs running each script with user-specified paths, but scripts are not fully consistent with that expectation: quality_check.py ignores CLI input and uses a hard-coded Windows path (output_dir = Path(r"D:\有斐家\小一\常用规范处理成果")), which could read arbitrary JSON files on the host if executed. Other scripts read PDFs and write JSON (expected). The sync script performs network writes to Feishu only when given credentials/IDs.
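One way to remove the hard-coded directory is to take it from the command line. The sketch below is not the actual quality_check.py; the function names and the `output_dir` argument are hypothetical, and it only lists the JSON files it would inspect.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the directory to check from the command line (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description="Quality-check extracted regulation JSON files."
    )
    parser.add_argument(
        "output_dir", type=Path, help="directory containing the JSON files to check"
    )
    return parser.parse_args(argv)


def list_json_files(output_dir: Path):
    """Return the *.json files directly inside output_dir, sorted by name."""
    if not output_dir.is_dir():
        raise NotADirectoryError(f"{output_dir} is not a directory")
    return sorted(output_dir.glob("*.json"))


def main(argv=None) -> int:
    args = parse_args(argv)
    for path in list_json_files(args.output_dir):
        print(path)
    return 0
```

With this shape the script reads only the directory the user names, for example `main(["./output"])`, and fails loudly when the path does not exist.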
Install Mechanism
No automated install spec included (instruction-only), but SKILL.md lists pip deps (PyMuPDF, rapidocr-onnxruntime). That requires installing Python packages manually in the runtime; this is moderate risk but typical. There is no download from untrusted URLs or archive extraction in the skill bundle itself.
Credentials
Feishu credentials (app_id, app_secret, app_token, table_id) are required only for the sync_to_bitable step and are proportional to the stated purpose. However, the skill metadata does not declare required credentials or env vars; credentials are passed as CLI args. The hard-coded path in quality_check.py can access local files unexpectedly, which is disproportionate to the stated single-file quality-check invocation.
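Passing secrets as CLI arguments can leave them in shell history and process listings. A common alternative is to read them from environment variables, sketched below. The variable names (`FEISHU_APP_ID` and so on) and the `FeishuConfig` class are hypothetical; the skill does not define them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FeishuConfig:
    app_id: str
    app_secret: str
    app_token: str
    table_id: str


# Hypothetical environment variable names, chosen for this sketch only.
ENV_NAMES = {
    "app_id": "FEISHU_APP_ID",
    "app_secret": "FEISHU_APP_SECRET",
    "app_token": "FEISHU_APP_TOKEN",
    "table_id": "FEISHU_TABLE_ID",
}


def load_config(env=os.environ) -> FeishuConfig:
    """Build a FeishuConfig from a mapping, failing if any value is missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return FeishuConfig(**{field: env[name] for field, name in ENV_NAMES.items()})
```

The resulting values could then be handed to the sync step in whatever form it accepts; how sync_to_bitable.py consumes them is outside this sketch.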
Persistence & Privilege
The skill does not request persistent installation privileges (always=false), does not modify other skills or system-wide configs, and will not autonomously exfiltrate data except when the user runs the sync script with Feishu credentials. No evidence of attempts to persist credentials or enable background network activity.
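How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install regulation-extractor
  3. Once installed, invoke the Skill by name or trigger it with /regulation-extractor
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.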
Version history
v3.0.0
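Extracts clauses in structured form from construction engineering code PDFs. Supports text-layer and OCR modes, a 5-step cleaning pipeline, splitting of over-long clauses, symbol-table filtering, and Feishu sync. Tested on 21 PDFs: 5,865 clauses, 96.8% clean rate.
Metadata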
Slug regulation-extractor
Version 3.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
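FAQ

What is Regulation Extractor?

It extracts clauses in structured form from construction engineering code PDFs and syncs them to a Feishu Bitable. It supports deduplication of PDFs with two text layers (original + OCR), RapidOCR recognition for image-only PDFs, splitting by clause number (including numbers with spaces such as 6. 1. 2. 3), circled-digit conversion (e.g. 6.4.④ → 6.4.4), OCR error detection, quality flags, and text cleaning (removing line breaks, page headers, and symbol tables; separating fused Chinese/English text; splitting over-long clauses). Output... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 90 total downloads to date.

How do I install Regulation Extractor?

Run the command /install regulation-extractor in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Regulation Extractor free?

Yes. Regulation Extractor is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Regulation Extractor support?

Regulation Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Regulation Extractor?

It is developed and maintained by youfeijun123 (@youfeijun123); the current version is v3.0.0.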
