Web Extractor

Name: Web Extractor
Author: kukuxnd

by kukuxNd · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads 323

Install in OpenClaw

/install web-extractor

Description

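Uses jina.ai to extract clean text from web pages and has the Agent summarize it. Trigger phrases: 提取网页 (extract web page), 总结新闻 (summarize news), 提取文章 (extract article), 获取页面内容 (get page content)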

Usage Guidance

This skill behaves as advertised: it delegates extraction to r.jina.ai and then summarizes the returned markdown. As a result, every URL you request, and the content of the page at that URL, is fetched and processed by a third-party service. Before installing or using it, consider:

- Do not send sensitive, private, or internal URLs (intranets, private docs, or cloud metadata endpoints such as 169.254.169.254). Doing so can leak secrets or enable SSRF via the remote extractor.
- Treat r.jina.ai as an external party: any content fetched for summarization is disclosed to it. Verify that you trust the service, or host an extractor locally.
- The skill writes to predictable /tmp filenames. If you must use it, change the workflow to use a secure temporary filename (e.g., mktemp) to avoid collisions or exposure.
- If you need to summarize protected content, fetch the page locally (handling credentials safely), remove sensitive headers or query parameters, and run a local extraction/parsing step instead of sending the raw URL to a public extractor.

If you want a safer alternative, ask for a version that accepts raw HTML you provide (so you control what is sent externally), or for instructions to run a local HTML-to-text tool rather than delegating fetching to r.jina.ai.
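The first point in the guidance above can be partly enforced before a URL ever leaves the machine. The following is a minimal sketch, not part of the skill: the function name `is_public_url` and the `INTERNAL_SUFFIXES` list are hypothetical, and the check does not resolve DNS, so a public-looking hostname that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical suffixes treated as internal; adjust for your own network.
INTERNAL_SUFFIXES = (".local", ".internal", ".lan")


def is_public_url(url: str) -> bool:
    """Return True only for http(s) URLs that do not obviously target internal hosts."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False
    if parts.username or parts.password:
        return False  # embedded credentials would be disclosed to the extractor
    host = host.rstrip(".").lower()
    if host == "localhost" or host.endswith(INTERNAL_SUFFIXES):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    # Rejects private, loopback, link-local (169.254.0.0/16), and reserved ranges.
    return address.is_global
```

Calling `is_public_url(url)` before handing the URL to the extractor rejects literal metadata and intranet addresses; it is a first filter, not a complete SSRF defense.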

Capability Analysis

Type: OpenClaw Skill
Name: web-extractor
Version: 1.0.0

The web-extractor skill is designed to fetch and clean web content using the r.jina.ai service for AI summarization. The workflow in SKILL.md uses standard curl commands to retrieve data and store it in temporary files (/tmp/web-content.md), which is consistent with its stated purpose and shows no signs of malicious intent or data exfiltration.
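For readers who want to see the shape of that workflow, here is a rough Python equivalent. It is a sketch only: the listing does not reproduce SKILL.md, which uses curl, so the helper names `reader_url` and `fetch_markdown`, the User-Agent string, and the assumption that the page URL is appended directly to the `https://r.jina.ai/` prefix are all illustrative.

```python
from urllib.request import Request, urlopen

# Assumption: the extractor is addressed by prefixing the page URL.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Build the extractor address for a page by prefixing its URL."""
    return READER_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch the extracted text for page_url through the extractor (network call)."""
    request = Request(
        reader_url(page_url),
        headers={"User-Agent": "web-extractor-sketch"},  # hypothetical value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Note that `fetch_markdown` sends the page URL to the third-party service, which is exactly the disclosure the rest of this listing warns about.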

Capability Assessment

✓ Purpose & Capability

The name/description match the instructions: the SKILL.md tells the agent to fetch a page via r.jina.ai and summarize the resulting markdown. No unrelated binaries, installs, or credentials are requested.

⚠ Instruction Scope

The instructions tell the agent to send the target page URL to an external service (https://r.jina.ai/...), save the result to /tmp, and then read and summarize that file. This is within the stated function but has privacy and security implications that the skill does not address: arbitrary URLs (including internal intranet or metadata endpoints) will be fetched by the remote service, and page contents are disclosed to a third party. The instructions also use a predictable /tmp filename, which can cause local information exposure or race conditions.
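The predictable-filename concern is straightforward to avoid. As a sketch (the function name `save_extracted_text` and the `web-content-` prefix are hypothetical, not taken from the skill), Python's `tempfile.mkstemp` creates a uniquely named file that only the current user can read, much as `mktemp` does in a shell workflow:

```python
import os
import tempfile


def save_extracted_text(text: str, prefix: str = "web-content-") -> str:
    """Write text to a newly created, uniquely named temp file and return its path."""
    # mkstemp creates the file atomically with owner-only permissions,
    # so another process cannot pre-create or guess the name.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".md", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

The caller is responsible for deleting the file once the summary has been produced.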

✓ Install Mechanism

Instruction-only skill with no install spec and no code files — nothing is written to disk by an installer. Lowest install risk.

✓ Credentials

The skill requests no environment variables, credentials, or config paths. There is no overbroad credential access declared.

✓ Persistence & Privilege

The skill does not request permanent presence (always: false) and does not modify agent/system configs. Agent-autonomous invocation is allowed by default, which is expected and not by itself a red flag.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install web-extractor
After installation, invoke the skill by name or use /web-extractor
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of web-extractor skill.
- Extracts clean text from web pages using r.jina.ai, removing scripts, navigation, ads, and unnecessary CSS.
- Allows easy summarization of core content by the Agent.
- Supports extracting from any news site, tech blog, or article page.
- Saved content is in pure text format for optimal AI processing.
- Default output path is /tmp/, with customizable file locations.

Metadata

Slug web-extractor

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Web Extractor?

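Web Extractor uses jina.ai to extract clean text from web pages and has the Agent summarize it. Trigger phrases: 提取网页 (extract web page), 总结新闻 (summarize news), 提取文章 (extract article), 获取页面内容 (get page content). It is an AI Agent Skill for Claude Code / OpenClaw, with 323 downloads so far.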

How do I install Web Extractor?

Run "/install web-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Web Extractor free?

Yes, Web Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Extractor support?

Web Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Extractor?

It is built and maintained by kukuxNd (@kukuxnd); the current version is v1.0.0.
