
Moark Doc Extraction

by fchange · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
157 Downloads · 0 Stars · 2 Active Installs · 1 Version
Install in OpenClaw
/install moark-doc-extraction
Description
Extract and recognize text from documents, including PDF and DOCX files.
README (SKILL.md)

Document Extraction

This skill extracts and recognizes text from documents, including PDF and DOCX files, using the external Gitee AI API.

Usage

Ensure you have installed the required dependencies (pip install requests requests-toolbelt). Use the bundled script to perform document extraction.

python {baseDir}/scripts/perform_doc_extraction.py --file /path/to/document.pdf --api-key YOUR_API_KEY
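
For agents that launch the script programmatically, the command above could be assembled as an argument list. This is a minimal sketch: `build_command` and its parameter names are hypothetical, and only the script path and the `--file` and `--api-key` flags come from the usage line above.

```python
import sys
from pathlib import Path
from typing import List


def build_command(base_dir: str, file_path: str, api_key: str) -> List[str]:
    """Build the argument list for the bundled extraction script.

    Hypothetical helper: base_dir stands in for the {baseDir} placeholder.
    """
    script = Path(base_dir) / "scripts" / "perform_doc_extraction.py"
    # A list (rather than one shell string) avoids quoting problems
    # with paths that contain spaces.
    return [sys.executable, str(script), "--file", file_path, "--api-key", api_key]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture the script's standard output for parsing.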

Options

No additional parameters are required for this skill.

Workflow

  1. Execute the perform_doc_extraction.py script with the parameters from the user.
  2. Parse the script output and find the line starting with EXTRACTION_RESULT:.
  3. Extract the OCR result from that line (format: EXTRACTION_RESULT: ...).
  4. Display the OCR result to the user using markdown syntax: 📖[EXTRACTION_RESULT Result].
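
Steps 2 and 3 could be sketched as follows. This is an illustrative parser, not the skill's own code: `parse_extraction_result` is a hypothetical name, and the sketch assumes that everything after the marker belongs to the result. It accepts the text either on the marker line itself or on the lines that follow it.

```python
from typing import Optional

MARKER = "EXTRACTION_RESULT:"


def parse_extraction_result(stdout: str) -> Optional[str]:
    """Return the text after the marker, or None if the marker is
    missing or no text follows it."""
    lines = stdout.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(MARKER):
            # Text on the same line as the marker, if any.
            same_line = line[len(MARKER):].strip()
            # Text on the lines after the marker (assumed to be result text).
            following = "\n".join(lines[i + 1:]).strip()
            parts = [p for p in (same_line, following) if p]
            return "\n".join(parts) if parts else None
    return None
```

Handling both layouts keeps the parser working whether the script prints the result on the marker line or below it.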

Notes

  • If GITEEAI_API_KEY is not set, remind the user to provide the --api-key argument.
  • Please handle the output of the script carefully, ensuring that you only extract and display the relevant information without adding any extra commentary or interpretation.
  • You should optimize the output format to make it more concise and user-friendly, but do not change or ignore the content of the result.
  • The script prints EXTRACTION_RESULT: in its output. Extract this result and display it using the same markdown syntax: 📖[EXTRACTION_RESULT Result].
  • Always look for the line starting with EXTRACTION_RESULT: in the script output.
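
The API key check in the first note might look like the sketch below. `resolve_api_key` is a hypothetical helper, and giving the `--api-key` value precedence over the `GITEEAI_API_KEY` environment variable is an assumption; the bundled script may resolve the key differently.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str],
                    env: Mapping[str, str] = os.environ) -> str:
    """Pick the API key from --api-key or GITEEAI_API_KEY.

    Assumption: the command-line value wins when both are present.
    """
    for candidate in (cli_value, env.get("GITEEAI_API_KEY")):
        if candidate and candidate.strip():
            return candidate.strip()
    # Neither source provided a usable key: prompt the user to supply one.
    raise ValueError("No API key found: set GITEEAI_API_KEY or pass --api-key.")
```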
Usage Guidance
This skill appears to do exactly what it claims: upload a supplied PDF/DOCX (or fetch a URL) to the Gitee AI async document parse API and return extracted text. Before installing, consider:
  1. Confirm you trust the Gitee AI service and are comfortable providing your GITEEAI_API_KEY.
  2. Update your agent's output-parsing logic to handle the script's actual output format (the script prints 'EXTRACTION_RESULT:' on one line and the extracted text on following line(s) rather than 'EXTRACTION_RESULT: <text>' on a single line).
  3. Be cautious about supplying document URLs from untrusted sources: the script will fetch them (this can reach internal network addresses if the runtime has network access).
  4. Note the script requests include_image_base64=true, so images may be included in API responses (potentially large or sensitive).
  5. Ensure the environment has the listed Python dependencies available or install them in a controlled environment before use.
Capability Analysis
Type: OpenClaw Skill
Name: moark-doc-extraction
Version: 1.0.0
The skill is a legitimate tool for document OCR and text extraction using the Gitee AI API. The script `perform_doc_extraction.py` correctly implements task submission and polling to `ai.gitee.com`, and the `SKILL.md` instructions accurately reflect the code's functionality without any signs of prompt injection or malicious intent.
Capability Assessment
Purpose & Capability
Name/description, required environment variable (GITEEAI_API_KEY), and the script all point to using Gitee AI document parsing endpoints (ai.gitee.com). The credential requested is consistent with the stated purpose.
Instruction Scope
The SKILL.md instructs the agent to run the bundled script and extract a line starting with 'EXTRACTION_RESULT:'. The script actually prints a line 'EXTRACTION_RESULT:' and then prints the extracted text on subsequent line(s) (i.e., the OCR text is not on the same line as the label). This mismatch could break naive parsers. SKILL.md also suggests displaying the result using a particular markdown/image-like syntax and asks the agent not to add commentary. Additionally, the script accepts either a local file path or a URL and will fetch URLs, which is expected for this use case but means untrusted URLs could cause the runtime to make arbitrary network requests (including to internal endpoints).
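
One way a caller could reduce the URL risk described above is to screen URLs before handing them to the script. The sketch below is an assumption-laden illustration, not part of the skill: `is_obviously_unsafe_url` is a hypothetical helper that only rejects non-HTTP schemes, localhost names, and literal private or loopback IP addresses. It does not resolve DNS names, so a hostname that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlparse


def is_obviously_unsafe_url(url: str) -> bool:
    """Return True for URLs that clearly target local or internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return True  # e.g. file:// or ftp://
    host = parsed.hostname
    if not host:
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address; DNS names are not resolved in this sketch.
        return False
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_unspecified)
```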
Install Mechanism
No install/download mechanism is provided (instruction-only with a bundled script). Dependencies are standard Python packages (requests, requests-toolbelt) mentioned in the script comments and SKILL.md; nothing is downloaded from unknown or unsafe locations by the installer.
Credentials
Only a single environment variable (GITEEAI_API_KEY) is required and is justified by the use of the Gitee AI API. The script does not read other unrelated env vars or config paths.
Persistence & Privilege
The skill does not request permanent presence or elevated agent privileges (its always setting is false) and does not modify other skills or system-wide agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install moark-doc-extraction
  3. After installation, invoke the skill by name or use /moark-doc-extraction
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of moark-doc-extraction.
  • Supports extracting and recognizing text from PDF and DOCX documents using the Gitee AI API.
  • Provides a script for document extraction via command line.
  • Requires a valid GITEEAI_API_KEY for API access.
  • Extraction results are parsed automatically and displayed in a clear, markdown-formatted output.
Metadata
Slug: moark-doc-extraction
Version: 1.0.0
License: MIT-0
All-time Installs: 2
Active Installs: 2
Total Versions: 1
Frequently Asked Questions

What is Moark Doc Extraction?

Moark Doc Extraction extracts and recognizes text from documents, including PDF and DOCX files. It is an AI agent skill for Claude Code / OpenClaw, with 157 downloads so far.

How do I install Moark Doc Extraction?

Run "/install moark-doc-extraction" in the OpenClaw or Claude Code chat to install it in one step. To use the skill you also need the Python dependencies (requests, requests-toolbelt) and a GITEEAI_API_KEY.

Is Moark Doc Extraction free?

Yes, the skill itself is free and licensed under MIT-0. You can download, install and use it at no cost. It calls the external Gitee AI API, which requires your own GITEEAI_API_KEY.

Which platforms does Moark Doc Extraction support?

Moark Doc Extraction is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Moark Doc Extraction?

It is built and maintained by fchange (@fchange); the current version is v1.0.0.
