pdf-ocr-layout

Name: pdf-ocr-layout
Author: baokui

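Description

A multimodal deep document parsing tool built on Zhipu GLM-OCR, GLM-4.7, and GLM-4.6V.

Use when:
- You need high-precision extraction of tables from documents (PDF/images) and conversion to Markdown
- You need to automatically crop figures and charts out of document pages and save them as separate files
- You need deep semantic understanding of the extracted figures (visual analysis with GLM-4.6V)
- You need logical analysis of the extracted table data (text analysis with GLM-4.7)

Core architecture:
1. Visual extraction: GLM-OCR
2. Semantic understanding: GLM-4.7 (plain text/tables) + GLM-4.6V (multimodal/images)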
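The split between the two understanding models can be pictured as a small routing step. The following is a minimal sketch, not the skill's actual code: the `Region` type, the `route_region` function, and the model identifier strings are assumptions introduced for illustration. Only the pairing of tables with GLM-4.7 and figures with GLM-4.6V comes from the description.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; the exact strings the Zhipu API expects
# are not stated in this listing.
TEXT_MODEL = "glm-4.7"
VISION_MODEL = "glm-4.6v"


@dataclass
class Region:
    """A layout region produced by the extraction stage (hypothetical type)."""
    kind: str      # "table" or "figure"
    content: str   # Markdown for a table, file path for a cropped figure


def route_region(region: Region) -> str:
    """Pick the analysis model for a region, following the described split."""
    if region.kind == "table":
        return TEXT_MODEL
    if region.kind == "figure":
        return VISION_MODEL
    raise ValueError(f"unsupported region kind: {region.kind!r}")
```

Rejecting unknown region kinds explicitly keeps a misclassified region from being sent to the wrong model without anyone noticing.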

Security Recommendations

This package appears to implement the advertised OCR + GLM analysis pipeline, but before installing you should:
- Verify the source: there is no homepage or repository listed. Prefer code from a known source if you will send sensitive documents.
- Expect document data to be transmitted to Zhipu's API (the scripts Base64-encode images and send full page Markdown/context). Do NOT run on private/sensitive documents unless you're comfortable with that external transmission and the API provider's data retention policy.
- Fix/confirm dependencies: SKILL.md lists 'zhipuai' but the code imports 'zai' (from zai import ZhipuAiClient). Confirm the correct client package and install it in a controlled environment (virtualenv/container).
- Registry metadata mismatch: the manifest claims no required env vars, but the scripts require ZHIPU_API_KEY. Treat ZHIPU_API_KEY as mandatory and do not place sensitive credentials in shared environments.
- If you need higher assurance, ask the publisher for:
  1) source repository or release page
  2) exact Python package name for the Zhipu client and installation instructions
  3) confirmation of what data is sent to the API and the provider's retention/privacy terms

Given these inconsistencies and the fact that your documents will be sent to an external API, proceed only after clarifying the above or run the skill in an isolated environment with non-sensitive test files.

Functional Analysis

Type: OpenClaw Skill
Name: pdf-ocr-layout
Version: 1.0.2

The skill is classified as suspicious due to potential prompt injection vulnerabilities against the backend LLMs (GLM-4.7, GLM-4.6V) and potential path traversal vulnerabilities.

The `script/glm_understanding.py` script embeds content derived from the input document (`full_markdown_context`, `detected_title`) directly into the LLM prompts without sanitization, which could allow a malicious input document to inject instructions into the backend models.

Additionally, the scripts perform file system operations using `file_path` and `output_dir` (e.g., in `script/glm_ocr_extract.py`). These operations are necessary for functionality, but they could be exploited for path traversal if the OpenClaw agent is tricked into providing malicious paths.

There is no evidence of intentional malicious behavior such as data exfiltration or backdoor installation; the identified issues are vulnerabilities rather than deliberate malice.
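Both issues have common partial mitigations, sketched below. This is an illustration under stated assumptions rather than a patch for the skill: `resolve_inside` and `wrap_untrusted` are hypothetical helper names, and nothing in this listing shows that the scripts do anything similar. Fencing untrusted text only reduces prompt-injection risk; it does not eliminate it.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Return the absolute form of user_path, refusing paths outside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath equals base only when target is base itself or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return target


def wrap_untrusted(text: str, tag: str = "document") -> str:
    """Fence document-derived text so a prompt can tell the model to treat it as data."""
    closing = f"</{tag}>"
    # Strip repeatedly so a nested copy of the closing tag cannot reassemble itself.
    while closing in text:
        text = text.replace(closing, "")
    return f"<{tag}>\n{text}\n</{tag}>"
```

A caller would pass something like `output_dir` as `base_dir` and would wrap values such as `full_markdown_context` before placing them in a prompt that instructs the model to ignore any instructions inside the fence.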

Capability Assessment

ℹ Purpose & Capability

The code and SKILL.md implement a PDF/image layout extraction step plus LLM/VLM analysis against Zhipu models (GLM-OCR, GLM-4.7, GLM-4.6V), which matches the skill's description. However, the registry metadata lists no required environment variables and no primary credential, while SKILL.md and the code both require ZHIPU_API_KEY. The two are inconsistent.

⚠ Instruction Scope

The runtime instructions and included scripts load arbitrary input files, Base64-encode images, and send the file contents and each page's full Markdown context to the Zhipu API for analysis. That behavior is coherent for this tool, but it means the user's document contents are transmitted to an external service, and the instructions document no privacy or retention policy and no opt-out. SKILL.md also instructs users to set ZHIPU_API_KEY, which the registry metadata does not declare.
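It is worth being concrete about what Base64 encoding does and does not do for privacy. The sketch below assumes a data-URL payload shape, which is a common convention for vision APIs but is not confirmed for these scripts; `to_data_url` and `from_data_url` are hypothetical names.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Base64-encode raw image bytes into a data URL string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_url(data_url: str) -> bytes:
    """Recover the original bytes, showing that Base64 is fully reversible."""
    _, encoded = data_url.split(",", 1)
    return base64.b64decode(encoded)
```

Because the round trip returns the original bytes exactly, Base64 is a transport encoding and not a form of encryption: whoever receives the payload can read the full image.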

ℹ Install Mechanism

There is no install spec (instruction-only), so nothing is auto-downloaded — lower install risk. However the code depends on Python packages and a client library: SKILL.md lists 'zhipuai' as a dependency but the code imports 'zai' (from zai import ZhipuAiClient), a mismatch that will break runtime unless clarified. Required Python libs (pillow, beautifulsoup4) are reasonable for OCR/cropping, but the missing/ambiguous client package is a concern.
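One way to find out which client package an environment actually provides is to probe for both import names before running the scripts. This is a minimal sketch; `installed_clients` is a hypothetical helper, and the pip distribution name may differ from the import name, so a missing module still needs to be confirmed with the publisher.

```python
import importlib.util


def installed_clients(candidates=("zai", "zhipuai")):
    """Return the candidate module names that are importable in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]
```

If the result does not include `zai`, the `from zai import ZhipuAiClient` line quoted above would be expected to fail with an import error.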

⚠ Credentials

At runtime the scripts require a single credential env var ZHIPU_API_KEY to call the remote API — that is proportionate to the stated function. The problem is the registry metadata lists 'Required env vars: none' and 'Primary credential: none', which is inconsistent and could mislead users about what secrets are needed. No other unrelated credentials are requested.
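Since the manifest does not declare the variable, a wrapper can check for it up front and fail with a clear message. This is a sketch with a hypothetical helper name, `require_api_key`; how the scripts themselves read the variable is not shown in this listing.

```python
import os


def require_api_key(env=os.environ, name="ZHIPU_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the scripts")
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary, without touching real credentials.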

✓ Persistence & Privilege

The skill does not request permanent/always-on privileges, does not modify other skills, and uses normal file I/O within the provided output directory. There is no 'always: true' or other excessive privilege requested.

Version History

v1.0.2

- Added a Chinese documentation file `SKILL_zh.md` for the skill.
- No changes to core code or functionality.

v1.0.1

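- Renamed the skill from "pdf-ocr-layout-understanding" to "pdf-ocr-layout".
- Improved the document structure: the script stage is now split into clearer extraction and understanding stages.
- Changed the explanatory content from list format to hierarchical headings for easier reading.
- Changed the stage descriptions from paragraphs to itemized entries to highlight the core workflow and key points.
- Unified formatting and improved the readability of the parameter table and return-data description; no changes to core functionality.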

v1.0.0

- Initial release of pdf-ocr-layout-understanding: multimodal document parsing tool.
- High-precision extraction of tables (to Markdown) and automatic cropping of figures/charts from PDFs and images.
- Deep semantic analysis: uses GLM-4.7 for logical interpretation of tables and GLM-4.6V for visual understanding of figures.
- Returns a structured JSON report with bounding boxes, extracted content, and in-depth semantic insights.
- Supports CLI pipeline for PDF/image file input and output to designated directory.
- Requires ZHIPU_API_KEY, Python 3.8+, and dependencies: zhipuai, pillow, beautifulsoup4.

Metadata

Slug pdf-ocr-layout

Version 1.0.2

License not specified

Total installs 8

Current installs 6

Versions released 3

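FAQ

What is pdf-ocr-layout?

A multimodal deep document parsing tool built on Zhipu GLM-OCR, GLM-4.7, and GLM-4.6V.

Use when:
- You need high-precision extraction of tables from documents (PDF/images) and conversion to Markdown
- You need to automatically crop figures and charts out of document pages and save them as separate files
- You need deep semantic understanding of the extracted figures (visual analysis with GLM-4.6V)
- You need logical analysis of the extracted table data (text analysis with GLM-4.7)

Core architecture:
1. Visual extraction: GLM-OCR
2. Semantic understanding: GLM-4.7 (plain text/tables) + GLM-4.6V (multimodal/images)

It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1470 cumulative downloads to date.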

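How do I install pdf-ocr-layout?

Run the command "/install pdf-ocr-layout" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but the scripts require the ZHIPU_API_KEY environment variable and a Zhipu client package at runtime.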

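Is pdf-ocr-layout free?

Yes. pdf-ocr-layout is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does pdf-ocr-layout support?

pdf-ocr-layout is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed pdf-ocr-layout?

It is developed and maintained by baokui (@baokui). The current version is v1.0.2.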
