
content-extraction

Name: content-extraction
Author: halfmoon82

by halfmoon82 · GitHub ↗ · v1.1.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 155

Install in OpenClaw

/install content-extraction

Description

OpenClaw-native executable content extraction skill for URLs, Feishu, YouTube, and web pages.

README (SKILL.md)

Content Extraction — Executable Skill

This skill is the local executable version. It keeps the source-aware routing design and restores a concrete extraction workflow.

What it does

Detects the input source
Selects the best extraction channel
Produces clean Markdown
Saves long content locally when needed
Explains fallback failures instead of hiding them

Main entrypoints

scripts/extract_router.py — classify input and build a route plan
scripts/extract.py — generate an executable extraction spec

Route priorities

WeChat → browser chain
Feishu doc/wiki → Feishu tools
YouTube → transcript chain
Generic URL → r.jina.ai → defuddle.md → web_fetch → browser fallback
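The route priorities above can be sketched as a small classifier plus a lookup table. This is a minimal illustration, not the code in scripts/extract_router.py: the host patterns (mp.weixin.qq.com, feishu.cn, larksuite.com, youtube.com, youtu.be), the function names classify and route_plan, and the channel labels are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Order of the generic chain as listed in the route priorities.
GENERIC_CHAIN = ["r.jina.ai", "defuddle.md", "web_fetch", "browser"]

# Hypothetical route table; the channel labels are illustrative names only.
ROUTES = {
    "wechat": ["browser"],
    "feishu": ["feishu"],
    "youtube": ["transcript"],
    "generic": GENERIC_CHAIN,
}


def _is_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify(url: str) -> str:
    """Return a coarse source label for a URL (assumed host patterns)."""
    host = (urlparse(url).hostname or "").lower()
    if _is_domain(host, "mp.weixin.qq.com"):
        return "wechat"
    if _is_domain(host, "feishu.cn") or _is_domain(host, "larksuite.com"):
        return "feishu"
    if _is_domain(host, "youtube.com") or _is_domain(host, "youtu.be"):
        return "youtube"
    return "generic"


def route_plan(url: str) -> list[str]:
    """Return the ordered list of channels to try for this URL."""
    return list(ROUTES[classify(url)])
```

Keeping the table separate from the classifier means a new source only needs one extra label and one extra row.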

Output contract

Always return:

title
author when available
source
url
summary
Markdown body
save path when content is long
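One way to make the contract above explicit is a small record type with the optional fields marked as such. The class name ExtractionResult, the field names, and the rendering layout are hypothetical; the README does not show how the skill actually represents or formats its output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionResult:
    # Required fields from the output contract.
    title: str
    source: str
    url: str
    summary: str
    markdown: str
    # Optional fields: author "when available", save path "when content is long".
    author: Optional[str] = None
    save_path: Optional[str] = None

    def render(self) -> str:
        """Render the result as one Markdown document (illustrative layout)."""
        lines = [f"# {self.title}", ""]
        if self.author:
            lines.append(f"Author: {self.author}")
        lines.append(f"Source: {self.source}")
        lines.append(f"URL: {self.url}")
        if self.save_path:
            lines.append(f"Saved to: {self.save_path}")
        lines += ["", self.summary, "", self.markdown]
        return "\n".join(lines)
```

Optional fields are simply omitted from the rendered header when they are missing, so a short page with no known author still yields a complete result.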

Fallback rule

Never claim success when extraction is partial. If a layer fails, report:

where it failed
why it failed
what fallback was tried next
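The reporting rule above can be sketched as a chain runner that records every failed layer instead of swallowing it. The names LayerFailure, ChainOutcome, and run_chain are hypothetical, and the fetch callables stand in for whatever platform tools perform the real requests. The sketch treats empty output as a failure; detecting partial extraction would need a source-specific check that is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LayerFailure:
    layer: str                 # where it failed
    reason: str                # why it failed
    next_layer: Optional[str]  # fallback tried next, None if the chain is exhausted


@dataclass
class ChainOutcome:
    content: Optional[str] = None
    used_layer: Optional[str] = None
    failures: List[LayerFailure] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """True only if some layer returned non-empty content."""
        return self.content is not None


def run_chain(url: str, layers: List[Tuple[str, Callable[[str], str]]]) -> ChainOutcome:
    """Try each (name, fetch) layer in order and record every failure."""
    outcome = ChainOutcome()
    for i, (name, fetch) in enumerate(layers):
        next_name = layers[i + 1][0] if i + 1 < len(layers) else None
        try:
            text = fetch(url)
            if not text or not text.strip():
                raise ValueError("empty content")
        except Exception as exc:
            reason = f"{type(exc).__name__}: {exc}"
            outcome.failures.append(LayerFailure(name, reason, next_name))
            continue
        outcome.content = text
        outcome.used_layer = name
        break
    return outcome
```

The failures list is kept even when a later layer succeeds, so the final report can state which earlier layers were tried and why they were abandoned.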

Notes

The ClawHub abstracted package stays abstract.
This local version restores the executable workflow for OpenClaw use and ClawDex publishing.

Usage Guidance

This skill appears internally consistent and focused on building extraction plans; the included scripts only classify URLs and emit specs (they do not perform network calls or exfiltrate data). Before installing, consider:

1) The actual extraction depends on OpenClaw tools (browser, feishu, web_fetch, transcript chains). Those tools may require credentials (e.g., Feishu tokens) and will perform network access, so ensure you trust the platform and how those tool credentials are managed.
2) The skill suggests saving long outputs under extracted/, so check filesystem permissions and where files will be stored.
3) If you plan to extract private documents (Feishu/wiki), confirm the platform has appropriate access controls and audit logs.

If any of those runtime tools are unfamiliar or untrusted in your environment, review their configurations before enabling the skill.

Capability Analysis

Type: OpenClaw Skill
Name: content-extraction
Version: 1.1.0

The content-extraction skill is a well-structured routing and planning framework designed to help an AI agent convert various web sources (WeChat, Feishu, YouTube, and generic URLs) into clean Markdown. The Python scripts (extract.py and extract_router.py) contain logic for URL classification and workflow generation without any dangerous system calls, network exfiltration, or obfuscation. The instructions in SKILL.md and the supporting documentation are transparent and strictly aligned with the stated purpose of content extraction and formatting.

Capability Assessment

✓ Purpose & Capability

Name/description (content extraction for URLs, Feishu, YouTube, web pages) matches the provided assets: router and executor scripts, mapping notes, and docs. No unrelated binaries, env vars, or config paths are requested.

ℹ Instruction Scope

SKILL.md instructs the agent to use OpenClaw tools (browser, feishu, web_fetch, transcript chains) and to save long content locally. The included Python scripts only classify URLs and emit extraction specs / save-path suggestions — they do not themselves call network endpoints or read secrets. Be aware the runtime will rely on platform tools (browser/feishu/etc.) to perform actual fetches, which is expected for this skill.

✓ Install Mechanism

No install spec or remote downloads. This is instruction- and script-only; nothing will be written to disk by an installer. The risk surface is limited to the scripts included in the skill bundle.

ℹ Credentials

The skill declares no required env vars or credentials. However, its runtime behavior (per SKILL.md) expects platform-provided tools that may themselves need credentials (e.g., Feishu API tokens or browser tooling with access). The absence of required env vars in the skill is reasonable but means it relies on the agent/platform to provide any needed credentials.

✓ Persistence & Privilege

The always flag is false, and the skill does not request a persistent or privileged presence. The scripts suggest saving to extracted/ by default (they only generate a save path), which is a modest local-write behavior and proportionate to the stated purpose.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install content-extraction
After installation, invoke the skill by name or use /content-extraction
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.0

Restore executable local extractor while keeping ClawHub abstract package unchanged.

v1.0.1

- Framework-only release: removes all executable router/extractor code and platform integration; keeps only routing logic and output contract as documentation.
- Updated SKILL.md to reflect a lightweight, reference-only version.
- Now intended purely as a template or documentation, not for runtime use.

v1.0.0

Executable router + executor scaffold solidified

Metadata

Slug content-extraction

Version 1.1.0

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 3

Frequently Asked Questions

What is content-extraction?

OpenClaw-native executable content extraction skill for URLs, Feishu, YouTube, and web pages. It is an AI Agent Skill for Claude Code / OpenClaw, with 155 downloads so far.

How do I install content-extraction?

Run "/install content-extraction" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is content-extraction free?

Yes, content-extraction is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does content-extraction support?

content-extraction is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created content-extraction?

It is built and maintained by halfmoon82 (@halfmoon82); the current version is v1.1.0.
