
content-extraction

Name: content-extraction
Author: halfmoon82

Author: halfmoon82 · GitHub · v1.1.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 155

Install in OpenClaw

/install content-extraction

Description

OpenClaw-native executable content extraction skill for URLs, Feishu, YouTube, and web pages.

Usage instructions (SKILL.md)

Content Extraction — Executable Skill

This skill is the local executable version. It keeps the source-aware routing design and restores a concrete extraction workflow.

What it does

Detects the input source
Selects the best extraction channel
Produces clean Markdown
Saves long content locally when needed
Explains fallback failures instead of hiding them

Main entrypoints

scripts/extract_router.py — classify input and build a route plan
scripts/extract.py — generate an executable extraction spec

Route priorities

WeChat → browser chain
Feishu doc/wiki → Feishu tools
YouTube → transcript chain
Generic URL → r.jina.ai → defuddle.md → web_fetch → browser fallback
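The route priorities above can be sketched as a small classifier plus a lookup table. This is a minimal illustration, not the contents of scripts/extract_router.py: the hostnames used to recognize each source, the channel labels, and the names ROUTES, classify, and route_plan are all assumptions made for the example. It also matches on hostname only and does not check for doc/wiki paths.

```python
from urllib.parse import urlparse

# Hypothetical route table. Source labels and channel names are assumptions
# chosen to mirror the priorities listed above.
ROUTES = {
    "wechat": ["browser"],
    "feishu": ["feishu_tools"],
    "youtube": ["transcript"],
    "generic": ["r.jina.ai", "defuddle.md", "web_fetch", "browser"],
}


def classify(url: str) -> str:
    """Return a source label for the URL based on its hostname (assumed hosts)."""
    host = (urlparse(url).hostname or "").lower()
    if host == "mp.weixin.qq.com":
        return "wechat"
    if host == "feishu.cn" or host.endswith(".feishu.cn"):
        return "feishu"
    if host in {"youtu.be", "youtube.com"} or host.endswith(".youtube.com"):
        return "youtube"
    return "generic"


def route_plan(url: str) -> dict:
    """Build a plan listing the channels to try, in priority order."""
    source = classify(url)
    return {"url": url, "source": source, "channels": list(ROUTES[source])}
```

Calling route_plan on an unrecognized host returns the generic chain, so every input gets some plan and the fallback order stays explicit in one place.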

Output contract

Always return:

title
author when available
source
url
summary
Markdown body
save path when content is long
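One way to make the output contract checkable is to model it as a dataclass. The sketch below is illustrative only: the class name ExtractionResult, the field names, and the to_dict helper are assumptions, and the skill's own scripts may represent the result differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExtractionResult:
    """Hypothetical container for the fields listed in the output contract."""

    title: str
    source: str
    url: str
    summary: str
    markdown: str                     # the Markdown body
    author: Optional[str] = None      # set only when available
    save_path: Optional[str] = None   # set only when content is long

    def to_dict(self) -> dict:
        """Return the populated fields, omitting optional ones left as None."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Keeping author and save_path optional mirrors the "when available" and "when content is long" wording, while the required fields fail fast at construction time if one is missing.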

Fallback rule

Never claim success when extraction is partial. If a layer fails, report:

where it failed
why it failed
what fallback was tried next
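A minimal sketch of the fallback rule, assuming each layer is a plain callable that takes a URL and returns Markdown or raises. The function name run_chain, the report keys where, why, and next, and the choice to treat an empty result as a failure are assumptions for illustration; detecting partial extraction would need checks specific to each source.

```python
from typing import Callable

# A layer is any callable that takes a URL and returns Markdown text.
Extractor = Callable[[str], str]


def run_chain(url: str, layers: list[tuple[str, Extractor]]) -> dict:
    """Try each layer in order and record every failure along the way."""
    failures = []
    for index, (name, extract) in enumerate(layers):
        try:
            body = extract(url)
            if not body.strip():
                # Assumption: an empty body counts as a failed layer.
                raise ValueError("empty result")
            return {"ok": True, "layer": name, "markdown": body, "failures": failures}
        except Exception as exc:
            has_next = index + 1 < len(layers)
            next_name = layers[index + 1][0] if has_next else None
            failures.append({"where": name, "why": str(exc), "next": next_name})
    # Every layer failed: report that explicitly rather than claiming success.
    return {"ok": False, "layer": None, "markdown": None, "failures": failures}
```

The failures list is returned even on success, so a caller can still show which earlier layers were skipped and why.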

Notes

The ClawHub abstracted package stays abstract.
This local version restores the executable workflow for OpenClaw use and ClawDex publishing.

Security recommendations

This skill appears internally consistent and focused on building extraction plans; the included scripts only classify URLs and emit specs (they do not perform network calls or exfiltrate data). Before installing, consider: 1) The actual extraction depends on OpenClaw tools (browser, feishu, web_fetch, transcript chains). Those tools may require credentials (e.g., Feishu tokens) and will perform network access — ensure you trust the platform and how those tool credentials are managed. 2) The skill suggests saving long outputs under extracted/ — check filesystem permissions and where files will be stored. 3) If you plan to extract private documents (Feishu/wiki) confirm the platform has appropriate access controls and audit logs. If any of those runtime tools are unfamiliar or untrusted in your environment, review their configurations before enabling the skill.

Functional analysis

Type: OpenClaw Skill
Name: content-extraction
Version: 1.1.0

The content-extraction skill is a well-structured routing and planning framework designed to help an AI agent convert various web sources (WeChat, Feishu, YouTube, and generic URLs) into clean Markdown. The Python scripts (extract.py and extract_router.py) contain logic for URL classification and workflow generation without any dangerous system calls, network exfiltration, or obfuscation. The instructions in SKILL.md and the supporting documentation are transparent and strictly aligned with the stated purpose of content extraction and formatting.

Capability assessment

✓ Purpose & Capability

Name/description (content extraction for URLs, Feishu, YouTube, web pages) matches the provided assets: router and executor scripts, mapping notes, and docs. No unrelated binaries, env vars, or config paths are requested.

ℹ Instruction Scope

SKILL.md instructs the agent to use OpenClaw tools (browser, feishu, web_fetch, transcript chains) and to save long content locally. The included Python scripts only classify URLs and emit extraction specs / save-path suggestions — they do not themselves call network endpoints or read secrets. Be aware the runtime will rely on platform tools (browser/feishu/etc.) to perform actual fetches, which is expected for this skill.

✓ Install Mechanism

No install spec or remote downloads. This is instruction- and script-only; nothing will be written to disk by an installer. The risk surface is limited to the scripts included in the skill bundle.

ℹ Credentials

The skill declares no required env vars or credentials. However, its runtime behavior (per SKILL.md) expects platform-provided tools that may themselves need credentials (e.g., Feishu API tokens or browser tooling with access). The absence of required env vars in the skill is reasonable but means it relies on the agent/platform to provide any needed credentials.

✓ Persistence & Privilege

The always setting is false and the skill does not request persistent/privileged presence. The scripts suggest saving to extracted/ by default (they only generate a save path), which is a modest local-write behavior and proportionate to the stated purpose.
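To illustrate what "only generate a save path" can look like, here is a hedged sketch of a pure function that proposes a path under extracted/ without touching the filesystem. The length threshold, the slug rule, and the name suggest_save_path are assumptions; the source does not state how the scripts decide that content is long or how they name files.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical threshold: the skill does not document what counts as "long".
LONG_CONTENT_CHARS = 8000


def suggest_save_path(title: str, markdown: str, base_dir: str = "extracted") -> Optional[str]:
    """Return a suggested path for long content, or None for short content.

    This only computes a string; it performs no file I/O.
    """
    if len(markdown) <= LONG_CONTENT_CHARS:
        return None
    # Reduce the title to lowercase ASCII letters and digits joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    return str(PurePosixPath(base_dir) / f"{slug}.md")
```

Because the function returns a string and never opens a file, the decision to write stays with whatever tool the platform uses to persist output.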

How to use

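Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install content-extraction
Once installation completes, invoke the skill by name or trigger it with /content-extraction
Provide the required inputs as described in the skill's parameter documentation to get structured output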

Version history

v1.1.0

Restore executable local extractor while keeping ClawHub abstract package unchanged.

v1.0.1

- Framework-only release: removes all executable router/extractor code and platform integration; keeps only routing logic and output contract as documentation.
- Updated SKILL.md to reflect a lightweight, reference-only version.
- Now intended purely as a template or documentation, not for runtime use.

v1.0.0

Executable router + executor scaffold solidified

Metadata

Slug content-extraction

Version 1.1.0

License MIT-0

Total installs 2

Current installs 2

Versions 3

FAQ