
Docling

Author: Er3mit4 · GitHub · v1.0.2
cross-platform ⚠ suspicious
Total downloads: 1367
Favorites: 0
Current installs: 5
Versions: 3
Install in OpenClaw
/install docling
Description
Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed.
Usage instructions (SKILL.md)

Docling - Document & Web Content Extraction

CLI tool for parsing documents and web pages into clean, structured text. Uses GPU acceleration for OCR and ML models.

Prerequisites

  • docling CLI must be installed (e.g., via pipx install docling)
  • For GPU support: NVIDIA GPU with CUDA drivers

When to Use

  • Extract content from a URL → Use docling (not web_fetch)
  • Search for information → Use web_search (Brave)
  • Parse PDFs, DOCX, PPTX → Use docling
  • OCR on images → Use docling
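
The routing rule in the list above can be sketched as a small function. This is only an illustration: `choose_tool` and `DOCUMENT_SUFFIXES` are hypothetical names, and the image extensions are assumptions that are not listed by the skill.

```python
from urllib.parse import urlparse

# Assumption: these suffixes stand in for "PDFs, DOCX, PPTX, images";
# the image extensions are illustrative, not a list taken from the skill.
DOCUMENT_SUFFIXES = (".pdf", ".docx", ".pptx", ".png", ".jpg", ".jpeg")


def choose_tool(request: str) -> str:
    """Return "docling" for a URL or document path, else "web_search"."""
    text = request.strip()
    parsed = urlparse(text)
    # A concrete http(s) URL means the content should be parsed, not searched for.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "docling"
    # Local documents and images are also parsed with docling.
    if text.lower().endswith(DOCUMENT_SUFFIXES):
        return "docling"
    # Anything else is treated as a search query.
    return "web_search"
```

A free-text query such as "docling release notes" falls through to `web_search`, while a URL or a path ending in `.pdf` is routed to `docling`.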

Quick Commands

Web Page → Markdown (default)

docling "<URL>" --from html --to md

Output: creates a .md file in the current directory (or in the directory given by --output)

Web Page → Plain Text

docling "<URL>" --from html --to text --output /tmp/docling_out

PDF with OCR

docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out
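
When these commands are issued from a script, it can help to assemble them as an argument list rather than a shell string, so a URL is never re-interpreted by the shell. A minimal sketch, assuming a hypothetical helper named `docling_argv` and using only the flags shown in the quick commands above:

```python
from typing import List, Optional


def docling_argv(
    source: str,
    from_format: Optional[str] = None,
    to_format: str = "md",
    output_dir: Optional[str] = None,
    ocr: bool = False,
    device: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the docling CLI (illustrative helper)."""
    argv = ["docling", source]
    if from_format:
        argv += ["--from", from_format]
    argv += ["--to", to_format]
    if ocr:
        argv.append("--ocr")
    if device:
        argv += ["--device", device]
    if output_dir:
        argv += ["--output", output_dir]
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`; the helper itself does not run anything.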

Key Options

Option   | Values                                      | Description
--from   | html, pdf, docx, pptx, image, md, csv, xlsx | Input format
--to     | md, text, json, yaml, html                  | Output format
--device | auto, cuda, cpu                             | Accelerator (default: auto)
--output | path                                        | Output directory (recommended: use controlled temp dir)
--ocr    | flag                                        | Enable OCR for images/scanned PDFs
--tables | flag                                        | Extract tables (default: on)
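
The value lists in this table can be checked before a command is run. The sketch below is an assumption-laden illustration: `ALLOWED_VALUES` is copied from the table rather than from the CLI itself, and `unexpected_values` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

# Values copied from the Key Options table; the installed CLI may accept
# a different set, so treat this as a snapshot rather than a specification.
ALLOWED_VALUES = {
    "--from": {"html", "pdf", "docx", "pptx", "image", "md", "csv", "xlsx"},
    "--to": {"md", "text", "json", "yaml", "html"},
    "--device": {"auto", "cuda", "cpu"},
}


def unexpected_values(argv: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (option, value) pairs whose value is not listed in the table."""
    problems = []
    for i, arg in enumerate(argv):
        if arg in ALLOWED_VALUES:
            # A missing value is reported as an empty string.
            value = argv[i + 1] if i + 1 < len(argv) else ""
            if value not in ALLOWED_VALUES[arg]:
                problems.append((arg, value))
    return problems
```

An empty result only means the arguments agree with the table; `docling --help` on the installed version remains the authority.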

Security Notes

⚠️ Avoid these flags unless you trust the source:

  • --enable-remote-services - can send data to remote endpoints
  • --allow-external-plugins - loads third-party code
  • Custom --headers with untrusted values - can redirect requests
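
One way to enforce these notes is to scan an argument list for the three flags before executing it. This is a sketch under assumptions: `find_risky_flags` is a hypothetical name, and it reports every use of `--headers` for review because it cannot tell trusted header values from untrusted ones.

```python
from typing import List, Sequence

# The three flags named in the security notes above.
RISKY_FLAGS = ("--enable-remote-services", "--allow-external-plugins", "--headers")


def find_risky_flags(argv: Sequence[str]) -> List[str]:
    """Return the risky flags present in argv, in order of first appearance."""
    found = []
    for arg in argv:
        # Handle both "--flag value" and "--flag=value" spellings.
        name = arg.split("=", 1)[0]
        if name in RISKY_FLAGS and name not in found:
            found.append(name)
    return found
```

A caller might refuse to run the command, or ask for confirmation, whenever the returned list is non-empty.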

Workflow

  1. For web content extraction: Use docling "<URL>" --from html --to text --output /tmp/docling_out
  2. Read the output file from the specified output directory
  3. Clean up the output directory after reading
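
The three workflow steps can be sketched in Python as follows. This is a minimal illustration, not the skill's own implementation: `extract_text` and its `run` parameter are hypothetical, a fresh temporary directory replaces the fixed /tmp/docling_out path, and the code reads every file in the output directory because the output file name is not specified here.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def extract_text(
    url: str, run: Optional[Callable[[Sequence[str]], object]] = None
) -> str:
    """Run docling on a URL, read the output, then delete the output directory."""
    out_dir = Path(tempfile.mkdtemp(prefix="docling_out_"))
    try:
        # Step 1: extract the page as plain text into the controlled directory.
        argv = ["docling", url, "--from", "html", "--to", "text",
                "--output", str(out_dir)]
        if run is None:
            subprocess.run(argv, check=True)
        else:
            run(argv)  # injectable runner, useful for dry runs and tests
        # Step 2: read whatever files were written.
        files = sorted(p for p in out_dir.rglob("*") if p.is_file())
        return "\n".join(p.read_text(encoding="utf-8") for p in files)
    finally:
        # Step 3: clean up even if extraction or reading failed.
        shutil.rmtree(out_dir, ignore_errors=True)
```

Putting the cleanup in `finally` means the temporary directory is removed on failure as well as on success.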

GPU Support

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"
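
The same check can be wrapped so that a script falls back to the CPU when CUDA, or PyTorch itself, is missing. A small sketch, assuming a hypothetical helper `pick_device` whose result is passed to `--device`; it checks whichever PyTorch the current interpreter can import, which may differ from the one in docling's own environment:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch in this interpreter, so CUDA cannot be confirmed here.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Since `--device` defaults to `auto`, an explicit choice like this is mainly useful for logging or for forcing CPU runs.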

Full CLI Reference

See references/cli-reference.md for complete option list.

Security recommendations
This skill is an instruction-only wrapper around a local `docling` CLI. Before installing or using it: (1) install the `docling` CLI from a trusted source (e.g., the official project or PyPI) and verify package integrity; (2) avoid the flagged options `--enable-remote-services` and `--allow-external-plugins` unless you trust the remote endpoints and plugins (they can send your document data off-host); (3) prefer writing outputs to a controlled temporary directory and remove them after use; (4) don't pass custom headers or other untrusted inputs that might be used to redirect requests or leak data; and (5) be cautious when processing sensitive documents, because OCR and model enrichments may send content to model backends if you enable remote services. The mismatch in the skill metadata about required binaries (registry vs SKILL.md) and the lack of an official source or homepage lower confidence; if you need higher assurance, ask the publisher for the authoritative project URL or a signed release before proceeding.
Functional analysis
Type: OpenClaw Skill
Name: docling
Version: 1.0.2
The skill wraps the `docling` CLI tool, which exposes high-risk capabilities such as `--enable-remote-services` (can send data to remote endpoints) and `--allow-external-plugins` (loads third-party code), as detailed in `SKILL.md` and `references/cli-reference.md`. While `SKILL.md` explicitly warns against using these flags, an AI agent could be susceptible to prompt injection, leading it to ignore these warnings and activate these features, potentially resulting in data exfiltration or remote code execution. This constitutes a significant vulnerability rather than intentional malicious behavior by the skill author.
Capability assessment
Purpose & Capability
The name/description promise (extract/parse web pages, PDFs, images via a CLI with optional GPU) matches the runtime instructions which show command-line usage of a local `docling` tool. One minor inconsistency: the registry metadata in the provided summary lists no required binaries, but SKILL.md metadata and the instructions explicitly require the `docling` CLI to be installed (e.g., via `pipx`). This is plausibly a metadata sync issue and not a functional mismatch.
Instruction Scope
SKILL.md only instructs the agent to run `docling` against URLs or local files, read output files, and clean up. It does not ask the agent to read unrelated system files or environment variables. The doc explicitly warns about risky flags (`--enable-remote-services`, `--allow-external-plugins`, custom `--headers`) which, if used, could exfiltrate data—those flags are part of the CLI but are cautioned against in the instructions.
Install Mechanism
There is no install spec in the skill bundle (instruction-only). The SKILL.md advises installing `docling` via `pipx`, which is a reasonable, low-risk installation path; nothing in the bundle tries to download or run arbitrary code itself.
Credentials
The skill declares no required environment variables, no credentials, and no config paths. That fits a local CLI wrapper which relies on an installed binary. This is proportionate to the stated purpose.
Persistence & Privilege
The skill does not request always-on presence, does not modify other skills or system-wide settings, and allows autonomous invocation (default) which is normal. No elevated persistence or privilege is requested by the skill itself.
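How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install docling
  3. After installation, invoke the skill by name or trigger it with /docling.
  4. Provide the inputs required by the skill's parameter documentation to get structured output.
Version history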
v1.0.2
Added required bins metadata, security warnings for remote services and plugins, and best practices for output directory
v1.0.1
Removed specific hardware references
v1.0.0
Initial release
Metadata
Slug docling
Version 1.0.2
License
Total installs 5
Current installs 5
Versions 3
FAQ

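What is Docling?

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1367 total downloads to date.

How do I install Docling?

Run the command "/install docling" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself needs no further configuration, but the docling CLI must already be installed (see Prerequisites).

Is Docling free?

Yes. Docling is completely free (free and open source) to download, install, and use.

Which platforms does Docling support?

Docling is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Docling?

It is developed and maintained by Er3mit4 (@er3mit4); the current version is v1.0.2.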
