CNBLOGS Featured Content Scraper

Name: CNBLOGS Featured Content Scraper
Author: againcrazycode

By XWork · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install cnbblogs-pick

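Description

Scrapes article titles and bodies from the CNBlogs featured section. Supports batch downloading a specified number of pages and saves each article as a plain-text file.

Usage Instructions (SKILL.md)

CNBLOGS Featured Content Scraper Skill

Description

Scrapes content from the featured ("pick") section of CNBlogs (cnblogs.com). Supports pagination and batch download of titles and bodies.

Usage

Basic usage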

# Fetch page 1 and save all of its articles to the given directory
openclaw cnblogs-pick --page 1 --output-dir /path/to/output

# Fetch the first 3 pages and save all articles
openclaw cnblogs-pick --pages 3 --output-dir /path/to/output

# Fetch the featured list from a specific URL
openclaw cnblogs-pick --url https://www.cnblogs.com/pick/ --pages 2

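Parameters

Parameter	Type	Required	Default	Description
`--url`	string	No	https://www.cnblogs.com/pick/	URL of the featured list page
`--page`	int	No	1	Page number to fetch in single-page mode (only applies when --pages is not specified)
`--pages`	int	No	1	Total number of pages to fetch (takes precedence over --page)
`--output-dir`	string	No	~/.openclaw/workspace/user_cnglobs/	Output directory
`--agent`	string	No	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0	User-Agent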
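
The precedence between --page and --pages can be sketched as a small shell helper. This is an illustration only: `resolve_pages` is a hypothetical name, and the sketch assumes that an empty string means the option was not passed and that --pages N covers pages 1 through N, as the usage examples suggest. SKILL.md does not show the skill's actual argument handling.

```shell
# Hypothetical helper: print the page numbers to fetch, one per line.
#   $1 = value of --page  (empty if not passed)
#   $2 = value of --pages (empty if not passed)
resolve_pages() {
  if [ -n "$2" ]; then
    # --pages was given: fetch pages 1..N and ignore --page
    seq 1 "$2"
  else
    # --pages was not given: fetch only the page named by --page (default 1)
    printf '%s\n' "${1:-1}"
  fi
}

resolve_pages 2 ""   # prints 2
resolve_pages 2 3    # prints 1, 2 and 3, one per line
```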

Output format

Each article is saved as a separate file, named:

{title}.txt

Special characters in the title are replaced with underscores.
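
One way to do this replacement with sed is sketched below. SKILL.md does not say which characters count as special, so the set used here (`/ \ : * ? " < > |`, the characters most filesystems reject) is an assumption, and `sanitize_title` is a hypothetical name.

```shell
# Hypothetical helper: turn an article title into a safe file name stem.
# Assumption: "special" means the characters / \ : * ? " < > |
sanitize_title() {
  # '#' is used as the sed delimiter so that '/' can appear in the bracket set
  printf '%s' "$1" | sed 's#[/\\:*?"<>|]#_#g'
}

sanitize_title 'a/b:c'   # prints a_b_c
```

The output file name would then be built as `"$(sanitize_title "$title").txt"`. Two different titles can map to the same name, so an existing file may be overwritten unless the caller checks for it first.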

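Workflow

Fetch list pages: use curl to download the requested number of featured list pages
Extract links: parse the HTML and extract every link with the post-item-title class
Download articles: open each article page in turn
Extract body: take the content of cnblogs_post_body and strip the HTML tags
Save files: save each article to the output directory, named after its title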
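
The link-extraction and tag-stripping steps can be sketched with the tools listed under the dependencies. This is a rough illustration, not the skill's actual commands: `extract_links` and `strip_tags` are hypothetical names, the regex assumes the `class` attribute comes before `href` inside the `<a>` tag, and `strip_tags` assumes its input has already been narrowed to the cnblogs_post_body element.

```shell
# Hypothetical helper: read list-page HTML on stdin, print one article URL per line.
# Assumption: class="post-item-title" appears before href="..." in the same tag.
# Requires a grep built with PCRE support (-P); \K drops the matched prefix.
extract_links() {
  grep -oP '<a[^>]*class="post-item-title"[^>]*href="\K[^"]+'
}

# Hypothetical helper: read body HTML on stdin, remove tags and blank lines.
# Only tags that open and close on the same line are removed.
strip_tags() {
  sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

printf '%s\n' '<p>Hello <b>world</b></p>' | strip_tags   # prints Hello world
```

Regex-based parsing like this breaks when the site changes its markup, so it is worth checking the output of `extract_links` on one saved list page before running a batch.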

Example

# Fetch the first 5 pages of featured content
openclaw cnblogs-pick --pages 5 --output-dir /tmp/cnb-pick

# Inspect the results
ls -lh /tmp/cnb-pick/

Dependencies

curl - HTTP requests
grep -oP - Perl-compatible regular expressions (the -P option requires GNU grep)
sed - text processing

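Notes

Some articles may fail to download because of anti-scraping measures
Large pages may exceed the token limit
Test with a single page before running a batch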
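
A fetch wrapper that fails loudly and retries transient errors makes blocked or failed downloads easier to spot. The sketch below uses only standard curl options. `fetch_page` is a hypothetical name, and the retry count, delay, and timeout are arbitrary illustrative values, not settings taken from the skill.

```shell
# Default User-Agent, copied from the parameter table.
UA='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0'

# Hypothetical helper: download one URL to stdout.
#   $1 = URL, $2 = optional User-Agent override
# -f  return a non-zero exit status on HTTP errors (4xx/5xx)
# -sS hide the progress meter but still print error messages
# -L  follow redirects
fetch_page() {
  curl -fsSL --retry 3 --retry-delay 2 --max-time 30 -A "${2:-$UA}" "$1"
}

# Usage (left as a comment so the sketch makes no network request when sourced):
#   fetch_page 'https://www.cnblogs.com/pick/' > /tmp/pick.html || echo 'fetch failed' >&2
```

Adding a short `sleep` between article downloads is a simple way to stay under rate limits during a batch run.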

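Changelog

v1.0.0: Initial version, supports single-page scraping
v1.1.0: Added multi-page batch scraping
v1.2.0: Improved error handling and log output

Safe usage recommendations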

This is an instruction-only scraper that will run shell commands (curl, grep, sed) to download and save CNBlogs 'pick' articles to a directory you choose. It does not ask for credentials or install code. Before running: (1) test on a single page and a non-sensitive output directory to confirm behavior; (2) avoid pointing output-dir at system or home configuration folders to prevent accidental overwrite; (3) be aware parsing uses regex (grep -oP) which may be brittle or incompatible with some grep builds — commands may need adjustment on your system; (4) consider site Terms of Service and rate limits — scraping can be blocked or disallowed; (5) if you want stronger safety, inspect or run the commands in a sandboxed environment first.

Functional analysis

Type: OpenClaw Skill
Name: cnbblogs-pick
Version: 1.0.0

The skill is a web scraper designed to fetch featured content from cnblogs.com. It uses standard command-line tools (curl, grep, sed) to extract article links and content, saving them to a local directory. The instructions in SKILL.md are transparent, align with the stated functionality, and contain no evidence of malicious intent, data exfiltration, or prompt injection.

Capability assessment

✓ Purpose & Capability

Name/description (scrape articles from the CNBlogs featured section) matches the SKILL.md: it describes using curl/grep/sed to fetch list pages, extract links, download articles, strip HTML and save them as text files. No unrelated services, credentials, or binaries are requested.

ℹ Instruction Scope

Instructions stay within the stated scraping purpose (download list pages, parse links, fetch article bodies, save to output dir). Caution: the skill prescribes HTML parsing via grep -oP/sed (regex-based parsing), which is brittle and may miss content or break on site changes. It will write files to the user's output directory (default under ~/.openclaw/workspace); ensure you don't point it at sensitive system directories. The SKILL.md indicates titles will be sanitized, but filename/overwrite handling is not fully specified.

✓ Install Mechanism

Instruction-only skill with no install spec and no code files — nothing is downloaded or written to disk by an installer. This is the lowest-risk install model.

✓ Credentials

The skill requests no environment variables or credentials. Declared runtime dependencies (curl, grep -oP, sed) are reasonable for command-line scraping. No unrelated secrets or config paths are asked for.

✓ Persistence & Privilege

always is false and the skill does not request persistent/system-wide changes. It does not modify other skills or agent settings. Normal autonomous invocation is allowed (default) but not exceptional here.

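How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install cnbblogs-pick
Once installed, invoke the Skill by name or trigger it with /cnbblogs-pick
Provide the required inputs as described in the Skill's parameter documentation to get structured output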

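Version history

v1.0.0

CNBLOGS Featured Content Scraper Skill v1.0.0
- Initial release
- Supports scraping a single page of the CNBlogs featured section
- Extracts the title and body of every article in the list
- Saves each article as a separate txt file
- Supports a custom output directory and User-Agent

Metadata

Slug cnbblogs-pick

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ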