← Back to Skills Marketplace
halfmoon82

content-extraction

by halfmoon82 · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ✓ Security Clean
155
Downloads
0
Stars
2
Active Installs
3
Versions
Install in OpenClaw
/install content-extraction
Description
OpenClaw-native executable content extraction skill for URLs, Feishu, YouTube, and web pages.
README (SKILL.md)

Content Extraction — Executable Skill

This skill is the local executable version. It keeps the source-aware routing design and restores a concrete extraction workflow.

What it does

  • Detects the input source
  • Selects the best extraction channel
  • Produces clean Markdown
  • Saves long content locally when needed
  • Explains fallback failures instead of hiding them

Main entrypoints

  • scripts/extract_router.py — classify input and build a route plan
  • scripts/extract.py — generate an executable extraction spec

Route priorities

  1. WeChat → browser chain
  2. Feishu doc/wiki → Feishu tools
  3. YouTube → transcript chain
  4. Generic URLr.jina.aidefuddle.mdweb_fetch → browser fallback

Output contract

Always return:

  • title
  • author when available
  • source
  • url
  • summary
  • Markdown body
  • save path when content is long

Fallback rule

Never claim success when extraction is partial. If a layer fails, report:

  • where it failed
  • why it failed
  • what fallback was tried next

Notes

  • The ClawHub abstracted package stays abstract.
  • This local version restores the executable workflow for OpenClaw use and ClawDex publishing.
Usage Guidance
This skill appears internally consistent and focused on building extraction plans; the included scripts only classify URLs and emit specs (they do not perform network calls or exfiltrate data). Before installing, consider: 1) The actual extraction depends on OpenClaw tools (browser, feishu, web_fetch, transcript chains). Those tools may require credentials (e.g., Feishu tokens) and will perform network access — ensure you trust the platform and how those tool credentials are managed. 2) The skill suggests saving long outputs under extracted/ — check filesystem permissions and where files will be stored. 3) If you plan to extract private documents (Feishu/wiki) confirm the platform has appropriate access controls and audit logs. If any of those runtime tools are unfamiliar or untrusted in your environment, review their configurations before enabling the skill.
Capability Analysis
Type: OpenClaw Skill Name: content-extraction Version: 1.1.0 The content-extraction skill is a well-structured routing and planning framework designed to help an AI agent convert various web sources (WeChat, Feishu, YouTube, and generic URLs) into clean Markdown. The Python scripts (extract.py and extract_router.py) contain logic for URL classification and workflow generation without any dangerous system calls, network exfiltration, or obfuscation. The instructions in SKILL.md and the supporting documentation are transparent and strictly aligned with the stated purpose of content extraction and formatting.
Capability Assessment
Purpose & Capability
Name/description (content extraction for URLs, Feishu, YouTube, web pages) matches the provided assets: router and executor scripts, mapping notes, and docs. No unrelated binaries, env vars, or config paths are requested.
Instruction Scope
SKILL.md instructs the agent to use OpenClaw tools (browser, feishu, web_fetch, transcript chains) and to save long content locally. The included Python scripts only classify URLs and emit extraction specs / save-path suggestions — they do not themselves call network endpoints or read secrets. Be aware the runtime will rely on platform tools (browser/feishu/etc.) to perform actual fetches, which is expected for this skill.
Install Mechanism
No install spec or remote downloads. This is instruction- and script-only; nothing will be written to disk by an installer. The risk surface is limited to the scripts included in the skill bundle.
Credentials
The skill declares no required env vars or credentials. However, its runtime behavior (per SKILL.md) expects platform-provided tools that may themselves need credentials (e.g., Feishu API tokens or browser tooling with access). The absence of required env vars in the skill is reasonable but means it relies on the agent/platform to provide any needed credentials.
Persistence & Privilege
always is false and the skill does not request persistent/privileged presence. The scripts suggest saving to extracted/ by default (they only generate a save path), which is a modest local-write behavior and proportionate to the stated purpose.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install content-extraction
  3. After installation, invoke the skill by name or use /content-extraction
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
Restore executable local extractor while keeping ClawHub abstract package unchanged.
v1.0.1
- Framework-only release: removes all executable router/extractor code and platform integration; keeps only routing logic and output contract as documentation. - Updated SKILL.md to reflect a lightweight, reference-only version. - Now intended purely as a template or documentation, not for runtime use.
v1.0.0
Executable router + executor scaffold solidified
Metadata
Slug content-extraction
Version 1.1.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 3
Frequently Asked Questions

What is content-extraction?

OpenClaw-native executable content extraction skill for URLs, Feishu, YouTube, and web pages. It is an AI Agent Skill for Claude Code / OpenClaw, with 155 downloads so far.

How do I install content-extraction?

Run "/install content-extraction" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is content-extraction free?

Yes, content-extraction is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does content-extraction support?

content-extraction is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created content-extraction?

It is built and maintained by halfmoon82 (@halfmoon82); the current version is v1.1.0.

💬 Comments