Content Parser

Name: Content Parser
Author: 0xfango

功能描述

Extract and parse content from URLs. Triggers on: user provides a URL to extract content from, another skill needs to parse source material, "parse this URL"...

使用说明 (SKILL.md)

When to Use

User provides a URL and wants to extract/read its content
Another skill needs to parse source material from a URL before generation
User says "parse this URL", "extract content from this link"
User says "解析链接", "提取内容"

When NOT to Use

User already has text content and doesn't need URL parsing
User wants to generate audio/video content (not content extraction)
User wants to read a local file (use standard file reading tools)

Purpose

Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.

Hard Constraints

No shell scripts. Construct curl commands from the API reference files listed in Resources
Always read shared/authentication.md for API key and headers
Follow shared/common-patterns.md for polling, errors, and interaction patterns
URL must be a valid HTTP(S) URL
Always read config following shared/config-pattern.md before any interaction
Never save files to ~/Downloads/ or .listenhub/ — save to the current working directory

\x3CHARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API. \x3C/HARD-GATE>

Step -1: API Key Check

Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.

If file doesn't exist — ask location, then create immediately:

mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
# (or $HOME/.listenhub/content-parser/config.json for global)

Then run Setup Flow below.

If file exists — read config, display summary, and confirm:

当前配置 (content-parser)：
  自动下载：{是 / 否}

Ask: "使用已保存的配置？" → 确认，直接继续 / 重新配置

Setup Flow (first run or reconfigure)

autoDownload: "自动保存提取的内容到当前目录？"
- "是（推荐）" → autoDownload: true
- "否" → autoDownload: false

Save immediately:

NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Interaction Flow

Step 1: URL Input

Free text input. Ask the user:

What URL would you like to extract content from?

Step 2: Options (optional)

Ask if the user wants to configure extraction options:

Question: "Do you want to configure extraction options?"
Options:
  - "No, use defaults" — Extract with default settings
  - "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count

If "Yes", ask follow-up questions:

Summarize: "Generate a summary of the content?" (Yes/No)
Max Length: "Set maximum content length?" (Free text, e.g., "5000")
Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)

Step 3: Confirm & Extract

Summarize:

Ready to extract content:

  URL: {url}
  Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default

  Proceed?

Wait for explicit confirmation before calling the API.

Workflow

Validate URL: Must be HTTP(S). Normalize if needed (see references/supported-platforms.md)

Build request body:

{
  "source": {
    "type": "url",
    "uri": "{url}"
  },
  "options": {
    "summarize": true/false,
    "maxLength": 5000,
    "twitter": {
      "count": 50
    }
  }
}

Omit options if user chose defaults.

Submit (foreground): POST /v1/content/extract → extract taskId
Tell the user extraction is in progress

Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: status field is .data.status (not processStatus), interval is 5s, values are processing/completed/failed:

TASK_ID="\x3Cid-from-step-3>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2

When notified, download and present result:

If autoDownload is true:

Write {taskId}-extracted.md to the current directory — full extracted content in markdown
Write {taskId}-extracted.json to the current directory — full raw API response data

echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
echo "$RESULT" > "${TASK_ID}-extracted.json"

Present:

内容提取完成！

来源：{url}
标题：{metadata.title}
长度：~{character count} 字符
消耗积分：{credits}

已保存到当前目录：
  {taskId}-extracted.md
  {taskId}-extracted.json

Show a preview of the extracted content (first ~500 chars)
Offer to use content in another skill (e.g. /podcast, /tts)

Estimated time: 10-30 seconds depending on content size and platform.

API Reference

Content extract: shared/api-content-extract.md
Supported platforms: references/supported-platforms.md
Polling: shared/common-patterns.md § Async Polling
Error handling: shared/common-patterns.md § Error Handling
Config pattern: shared/config-pattern.md

Example

User: "Parse this article: https://en.wikipedia.org/wiki/Topology"

Agent workflow:

URL: https://en.wikipedia.org/wiki/Topology
Options: defaults (omit options)
Submit extraction

curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'

Poll until complete:

curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"

Present extracted content preview and offer next actions.

User: "Extract recent tweets from @elonmusk, get 50 tweets"

Agent workflow:

URL: https://x.com/elonmusk
Options: {"twitter": {"count": 50}}
Submit extraction

curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'

Poll until complete, present results.

安全使用建议

Don't install blindly. Before proceeding, ask the skill author (or registry owner) to clarify: (1) which API host is authoritative (is LISTENHUB backed by marswave.ai?); (2) where config and downloaded files will actually be stored (the SKILL.md both creates and forbids .listenhub); and (3) provide the missing shared files referenced (authentication.md, config-pattern.md, common-patterns.md, api-content-extract.md). If you must try it, use a throwaway/limited API key with only needed scope, run the skill in an isolated environment or container, and verify all network endpoints (DNS/IP) and files the agent writes. If the author cannot explain the domain/credential mismatch and the contradictory config instructions, treat the skill as untrusted.

功能分析

Type: OpenClaw Skill Name: content-parser Version: 0.1.0 The 'content-parser' skill is a legitimate tool designed to extract and normalize content from various URLs (YouTube, Twitter, WeChat, etc.) using the Marswave API (api.marswave.ai). The skill follows a transparent workflow that includes API key validation, user confirmation for extraction options, and local storage of results in markdown and JSON formats. While it utilizes bash scripts for polling task status, the logic is restricted to the stated purpose and lacks any indicators of malicious intent, data exfiltration, or unauthorized system access.

能力评估

⚠ Purpose & Capability

The skill declares it needs a LISTENHUB_API_KEY which fits a content-extraction service, but the runtime instructions call an API at https://api.marswave.ai/... — the domain does not match the LISTENHUB name and there is no homepage or source to explain this. That mismatch (credential name vs. endpoint host) is unexpected and should be justified.

⚠ Instruction Scope

SKILL.md instructs the agent to call external extraction endpoints (via curl) and to create/read local config. It also mandates reading several shared files (shared/authentication.md, shared/config-pattern.md, shared/common-patterns.md) which are not included in the package. There is a direct contradiction: Setup Flow creates files under .listenhub/ but a Hard Constraint says 'Never save files to ... .listenhub/ — save to the current working directory.' These inconsistencies could cause incorrect behavior or unexpected writes.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no bundled code, so nothing will be downloaded or written by an installer step. That is the lowest install risk.

ℹ Credentials

The skill asks for a single API credential (LISTENHUB_API_KEY), which is reasonable for a hosted extraction service. However, the env var name does not match the explicit API host used in curl calls (marswave.ai), which is suspicious and should be clarified. The SKILL.md otherwise does not request additional unrelated secrets.

⚠ Persistence & Privilege

The skill instructs creating a local config file ('.listenhub/content-parser/config.json' or $HOME equivalent) and may write extracted content into the current directory. The contradictory guidance about never saving to .listenhub vs. creating .listenhub is alarming: the skill both instructs and forbids writing there. Persisting config and output files is plausible for this functionality but the inconsistency increases risk and should be resolved.

版本历史

v0.1.0

content-parser v0.1.0 - Initial release. - Extracts and parses content from URLs, supporting both user and skill-triggered flows. - Interactive setup with configurable auto-download and extraction options (summarize, maxLength, Twitter count). - Polling and background job handling for extraction tasks. - Saves results (`.md` and `.json`) to the current directory if autoDownload is enabled. - Presents extraction previews and offers next-step actions.

元数据

Slug content-parser

版本 0.1.0

许可证 MIT-0

累计安装 1

当前安装数 1

历史版本数 1

常见问题

Content Parser 是什么？

Extract and parse content from URLs. Triggers on: user provides a URL to extract content from, another skill needs to parse source material, "parse this URL"... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 288 次。

如何安装 Content Parser？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install content-parser」即可一键安装，无需额外配置。

Content Parser 是免费的吗？

是的，Content Parser 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Content Parser 支持哪些平台？

Content Parser 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Content Parser？

由 0xFango（@0xfango）开发并维护，当前版本 v0.1.0。