Total downloads: 157
Favorites: 0
Current installs: 2
Versions: 1
Install in OpenClaw
/install moark-doc-extraction
Description
Extract and recognize text from documents, including PDF and DOCX files.
Usage instructions (SKILL.md)
Document Extraction
This skill lets users extract and recognize text from documents, including PDF and DOCX files, using the external Gitee AI API.
Usage
Ensure you have installed the required dependencies (pip install requests requests-toolbelt). Use the bundled script to perform document extraction.
python {baseDir}/scripts/perform_doc_extraction.py --file /path/to/document.pdf --api-key YOUR_API
Options
No additional parameters are required for this skill.
Workflow
- Execute the perform_doc_extraction.py script with the parameters from the user.
- Parse the script output and find the line starting with EXTRACTION_RESULT:.
- Extract the OCR result from that line (format: EXTRACTION_RESULT: ...).
- Display the OCR result to the user using markdown syntax: 📖[EXTRACTION_RESULT Result].
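The first workflow step can be sketched in Python as follows. This is a minimal sketch, not the skill's own code: `build_command` and `run_extraction` are hypothetical helper names, `base_dir` stands in for the `{baseDir}` placeholder, and the 600-second timeout is an arbitrary assumption. Only the script path and the `--file` and `--api-key` flags come from the usage line above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(base_dir, document, api_key=None):
    """Build the argv list for the bundled extraction script (hypothetical helper)."""
    script = Path(base_dir) / "scripts" / "perform_doc_extraction.py"
    cmd = [sys.executable, str(script), "--file", str(document)]
    if api_key:
        # --api-key is the flag shown in the usage line.
        cmd += ["--api-key", api_key]
    return cmd


def run_extraction(base_dir, document, api_key=None, timeout=600):
    """Run the script and return its captured stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if the (assumed) timeout elapses.
    """
    completed = subprocess.run(
        build_command(base_dir, document, api_key),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the document path contains spaces.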
Notes
- If GITEEAI_API_KEY is not set, remind the user to provide the --api-key argument.
- Please handle the output of the script carefully, ensuring that you only extract and display the relevant information without adding any extra commentary or interpretation.
- You should optimize the output format to make it more concise and user-friendly, but do not change or ignore the content of the result.
- The script prints EXTRACTION_RESULT: in the output. Extract this result and display it using markdown image syntax: 📖[EXTRACTION_RESULT Result].
- Always look for the line starting with EXTRACTION_RESULT: in the script output.
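The API-key note above could be handled with a small check before the script is launched. The sketch below is an assumption-laden illustration: `resolve_api_key` and `missing_key_message` are hypothetical names, and giving the command-line value precedence over the environment variable is a design choice made here, not something the skill documents.

```python
import os


def resolve_api_key(cli_value=None, env=None):
    """Return the key from --api-key or GITEEAI_API_KEY, or None if neither is set."""
    env = os.environ if env is None else env
    # Assumption: an explicit command-line value wins over the environment.
    key = cli_value or env.get("GITEEAI_API_KEY")
    # Treat empty or whitespace-only values as missing.
    return key.strip() if key and key.strip() else None


def missing_key_message():
    """Reminder text to show the user when no key is available."""
    return "GITEEAI_API_KEY is not set. Please provide the --api-key argument."
```

A caller would show `missing_key_message()` to the user whenever `resolve_api_key(...)` returns `None`, instead of running the script without credentials.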
Security recommendations
This skill appears to do exactly what it claims: upload a supplied PDF/DOCX (or fetch a URL) to the Gitee AI async document parse API and return the extracted text. Before installing, consider the following:
- Confirm that you trust the Gitee AI service and are comfortable providing your GITEEAI_API_KEY.
- Update your agent's output-parsing logic to handle the script's actual output format: the script prints 'EXTRACTION_RESULT:' on one line and the extracted text on the following line(s), rather than 'EXTRACTION_RESULT: <text>' on a single line.
- Be cautious about supplying document URLs from untrusted sources. The script will fetch them, which can reach internal network addresses if the runtime has network access.
- Note that the script requests include_image_base64=true, so images may be included in API responses (potentially large or sensitive).
- Ensure the environment has the listed Python dependencies available, or install them in a controlled environment before use.
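One way to act on the URL caution is to screen a URL before handing it to the script. The following is a rough sketch using only the Python standard library; `is_obviously_unsafe_url` is a hypothetical helper, not part of the skill. It only catches non-HTTP schemes, `localhost`, and literal IP addresses in internal ranges. A hostname that resolves to an internal address would pass this check, so it is not a complete defense.

```python
import ipaddress
from urllib.parse import urlparse


def is_obviously_unsafe_url(url):
    """Return True for URLs that clearly target local or internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return True
    host = parsed.hostname
    if not host or host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; vetting a hostname would require DNS resolution.
        return False
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
    )
```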
Functional analysis
Type: OpenClaw Skill
Name: moark-doc-extraction
Version: 1.0.0
The skill is a legitimate tool for document OCR and text extraction using the Gitee AI API. The script `perform_doc_extraction.py` correctly implements task submission and polling to `ai.gitee.com`, and the `SKILL.md` instructions accurately reflect the code's functionality without any signs of prompt injection or malicious intent.
Capability assessment
Purpose & Capability
Name/description, required environment variable (GITEEAI_API_KEY), and the script all point to using Gitee AI document parsing endpoints (ai.gitee.com). The credential requested is consistent with the stated purpose.
Instruction Scope
The SKILL.md instructs the agent to run the bundled script and extract a line starting with 'EXTRACTION_RESULT:'. The script actually prints a line 'EXTRACTION_RESULT:' and then prints the extracted text on subsequent line(s) (i.e., the OCR text is not on the same line as the label). This mismatch could break naive parsers. SKILL.md also suggests displaying the result using a particular markdown/image-like syntax and asks the agent not to add commentary. Additionally, the script accepts either a local file path or a URL and will fetch URLs, which is expected for this use case but means untrusted URLs could cause the runtime to make arbitrary network requests (including to internal endpoints).
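A parser that tolerates the mismatch described above might look like the sketch below. `parse_extraction_result` is a hypothetical helper, and it assumes that everything printed after the marker belongs to the extracted text, which has not been verified against the script. It accepts both layouts: text on the same line as the label, and text on the following line(s).

```python
MARKER = "EXTRACTION_RESULT:"


def parse_extraction_result(stdout):
    """Return the text after the first EXTRACTION_RESULT: marker, or None."""
    lines = stdout.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(MARKER):
            same_line = line[len(MARKER):].strip()
            following = "\n".join(lines[index + 1:]).strip()
            # Join both parts so either layout yields the full text.
            parts = [part for part in (same_line, following) if part]
            return "\n".join(parts) if parts else None
    return None
```

Returning `None` when the marker is absent or empty lets the caller report a failed extraction rather than display a blank result.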
Install Mechanism
No install/download mechanism is provided (instruction-only with a bundled script). Dependencies are standard Python packages (requests, requests-toolbelt) mentioned in the script comments and SKILL.md; nothing is downloaded from unknown or unsafe locations by the installer.
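Because nothing installs the dependencies automatically, an agent could check for them before running the script. A minimal sketch, assuming the import names `requests` and `requests_toolbelt` (the modules provided by the two pip packages); `missing_modules` is a hypothetical helper:

```python
import importlib.util


def missing_modules(module_names=("requests", "requests_toolbelt")):
    """Return the names from module_names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]
```

If the returned list is non-empty, the agent can prompt the user to run the `pip install` command from the usage section instead of failing midway through the script.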
Credentials
Only a single environment variable (GITEEAI_API_KEY) is required and is justified by the use of the Gitee AI API. The script does not read other unrelated env vars or config paths.
Persistence & Privilege
The skill does not request permanent presence or elevated agent privileges (always is false) and does not modify other skills or system-wide agent settings.
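How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install moark-doc-extraction
- Once installation completes, invoke the Skill by name or trigger it with /moark-doc-extraction.
- Provide the inputs described in the Skill's parameter documentation to get structured output.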
Version history
v1.0.0
Initial release of moark-doc-extraction.
- Supports extracting and recognizing text from PDF and DOCX documents using the Gitee AI API.
- Provides a script for document extraction via command line.
- Requires a valid GITEEAI_API_KEY for API access.
- Extraction results are parsed automatically and displayed in a clear, markdown-formatted output.
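FAQ
What is Moark Doc Extraction?
Moark Doc Extraction extracts and recognizes text from documents, including PDF and DOCX files. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 157 total downloads to date.
How do I install Moark Doc Extraction?
Run the command "/install moark-doc-extraction" in the OpenClaw or Claude Code chat box for a one-click install; no extra configuration is required.
Is Moark Doc Extraction free?
Yes. Moark Doc Extraction is completely free, released under the MIT-0 license, and may be freely downloaded, installed, and used.
Which platforms does Moark Doc Extraction support?
Moark Doc Extraction is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Moark Doc Extraction?
It is developed and maintained by fchange (@fchange); the current version is v1.0.0.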