Web Extractor

by kukuxNd · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 323 · Stars: 1 · Active Installs: 1 · Versions: 1
Install in OpenClaw
/install web-extractor
Description
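Uses jina.ai to extract clean text from web pages and lets the Agent summarize it. Trigger phrases: extract web page, summarize news, extract article, get page content (original triggers: 提取网页、总结新闻、提取文章、获取页面内容).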
Usage Guidance
This skill behaves as advertised: it delegates extraction to r.jina.ai, then summarizes the returned markdown. As a result, every URL you request, and the content of the page behind it, is fetched and processed by a third-party service. Before installing or using it, consider:
- Do not send sensitive, private, or internal URLs (intranets, private docs, or cloud metadata endpoints such as 169.254.169.254). Doing so can leak secrets or enable SSRF via the remote extractor.
- Treat r.jina.ai as an external party: any content fetched for summarization is disclosed to them. Verify that you trust the service, or host an extractor locally.
- The skill writes to predictable /tmp filenames. If you must use it, change the workflow to use a secure temporary filename (e.g., mktemp) to avoid collisions or exposure.
- If you need to summarize protected content, fetch the page locally (handling credentials safely), remove sensitive headers or query parameters, and run a local extraction/parsing step instead of sending the raw URL to a public extractor.
If you want a safer alternative, ask for a version that accepts raw HTML you provide (so you control what is sent externally), or for instructions to run a local HTML-to-text tool rather than delegating fetching to r.jina.ai.
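One way to act on the first point is to screen URLs before they are handed to any remote extractor. The following is a minimal sketch, not part of the skill: the function name `is_safe_public_url` and the blocked suffixes `.local` and `.internal` are assumptions chosen for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_public_url(url: str, resolve: bool = True) -> bool:
    """Return True only for http(s) URLs that do not point at private or internal hosts.

    Hypothetical helper: the name and the blocked suffixes are illustrative.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    try:
        # Literal IP address in the URL, e.g. http://169.254.169.254/
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        if host == "localhost" or host.endswith((".local", ".internal")):
            return False
        if not resolve:
            return True
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return False
        # Drop any IPv6 scope id ("%eth0") before parsing the address.
        addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    # Reject loopback, private, link-local, and other non-global ranges.
    return all(addr.is_global for addr in addresses)
```

A check like this runs on your side only. The remote service resolves the hostname independently, so it reduces the risk of leaking internal URLs but does not eliminate it.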
Capability Analysis
Type: OpenClaw Skill
Name: web-extractor
Version: 1.0.0
The web-extractor skill is designed to fetch and clean web content using the r.jina.ai service for AI summarization. The workflow in SKILL.md uses standard curl commands to retrieve data and store it in temporary files (/tmp/web-content.md), which is consistent with its stated purpose and shows no signs of malicious intent or data exfiltration.
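The SKILL.md itself is not reproduced on this page, so the exact curl invocation is unknown. As a rough Python equivalent of the fetch step, assuming the reader is called by prefixing the target URL with `https://r.jina.ai/`, a sketch might look like this (the function names, the User-Agent string, and the timeout are illustrative assumptions):

```python
import urllib.request

# Assumed prefix form; the skill's actual command is not shown on this page.
READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(target_url: str) -> str:
    """Prefix the target URL with the reader endpoint."""
    return READER_PREFIX + target_url.strip()


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the extracted text for target_url through the reader service."""
    request = urllib.request.Request(
        build_reader_url(target_url),
        headers={"User-Agent": "web-extractor-sketch"},  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_markdown("https://example.com/article")` sends that URL to the third-party service, which is exactly the disclosure the usage guidance warns about.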
Capability Assessment
Purpose & Capability
The name/description match the instructions: the SKILL.md tells the agent to fetch a page via r.jina.ai and summarize the resulting markdown. No unrelated binaries, installs, or credentials are requested.
Instruction Scope
The instructions direct the agent to send the target page URL to an external service (https://r.jina.ai/...), save the result to /tmp, then read and summarize that file. This is within the stated function, but it has privacy and security implications that the skill does not address: arbitrary URLs (including internal intranet or metadata endpoints) will be fetched by the remote service, and page contents are disclosed to a third party. The instructions also use a predictable /tmp filename, which can expose information locally or create race conditions.
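The predictable-filename issue can be avoided by letting the operating system pick a unique name. A minimal sketch using Python's `tempfile.mkstemp` follows; the helper name `save_to_private_tempfile` and the `web-content-` prefix are assumptions for illustration, not something the skill defines.

```python
import os
import tempfile


def save_to_private_tempfile(content: str, suffix: str = ".md") -> str:
    """Write content to a uniquely named temp file and return its path.

    mkstemp creates the file atomically, and on POSIX systems it is
    readable and writable only by the creating user.
    """
    fd, path = tempfile.mkstemp(prefix="web-content-", suffix=suffix)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    return path
```

The caller is responsible for deleting the file (for example with `os.remove(path)`) once the summary has been produced.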
Install Mechanism
Instruction-only skill with no install spec and no code files — nothing is written to disk by an installer. Lowest install risk.
Credentials
The skill requests no environment variables, credentials, or config paths. There is no overbroad credential access declared.
Persistence & Privilege
The skill does not request permanent presence (always: false) and does not modify agent/system configs. Agent-autonomous invocation is allowed by default, which is expected and not by itself a red flag.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install web-extractor
  3. After installation, invoke the skill by name or use /web-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of web-extractor skill.
- Extracts clean text from web pages using r.jina.ai, removing scripts, navigation, ads, and unnecessary CSS.
- Allows easy summarization of core content by the Agent.
- Supports extracting from any news site, tech blog, or article page.
- Saved content is in pure text format for optimal AI processing.
- Default output path is /tmp/, with customizable file locations.
Metadata
Slug web-extractor
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Web Extractor?

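Web Extractor uses jina.ai to extract clean text from web pages and lets the Agent summarize it. Trigger phrases: extract web page, summarize news, extract article, get page content. It is an AI Agent Skill for Claude Code / OpenClaw, with 323 downloads so far.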

How do I install Web Extractor?

Run "/install web-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Web Extractor free?

Yes, Web Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Extractor support?

Web Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Extractor?

It is built and maintained by kukuxNd (@kukuxnd); the current version is v1.0.0.
