CNBLOGS Featured Content Scraper (CNBLOGS 精华内容抓取)

Name: CNBLOGS Featured Content Scraper
Author: againcrazycode

by XWork · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install cnbblogs-pick

Description

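Scrapes article titles and bodies from the featured section of CNBlogs (博客园), with support for batch downloading a given number of pages and saving the results as plain-text files.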

README (SKILL.md)

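CNBLOGS Featured Content Scraper Skill

Feature Description

Scrapes content from the featured ("pick") section of CNBlogs (cnblogs.com), with support for pagination and batch download of titles and bodies.

Usage

Basic Usage

# Fetch page 1 and save all articles to the given directory
openclaw cnblogs-pick --page 1 --output-dir /path/to/output

# Fetch the first 3 pages and save all articles
openclaw cnblogs-pick --pages 3 --output-dir /path/to/output

# Fetch the featured list from a specific URL
openclaw cnblogs-pick --url https://www.cnblogs.com/pick/ --pages 2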

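Parameters

Parameter	Type	Required	Default	Description
`--url`	string	No	https://www.cnblogs.com/pick/	URL of the featured list page
`--page`	int	No	1	Page number to fetch in single-page mode (only applies when --pages is not given)
`--pages`	int	No	1	Total number of pages to fetch (takes precedence over --page)
`--output-dir`	string	No	~/.openclaw/workspace/user_cnglobs/	Output directory
`--agent`	string	No	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0	User-Agent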
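
The precedence between --page and --pages can be sketched as a small shell helper. This is only an illustration under stated assumptions: `resolve_pages` is a hypothetical name, and the reading that --pages N means "pages 1 through N" while --page N means "page N only" is inferred from the usage examples, not confirmed by the skill itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): prints the page numbers to fetch.
#   $1 = value of --page  (may be empty; assumed to default to 1)
#   $2 = value of --pages (may be empty; when set it takes precedence)
resolve_pages() {
  local page="${1:-1}" pages="${2:-}" i=1
  if [ -n "$pages" ]; then
    # --pages N: assumed to mean pages 1..N
    while [ "$i" -le "$pages" ]; do
      printf '%s\n' "$i"
      i=$((i + 1))
    done
  else
    # --pages not given: fetch only the single page named by --page
    printf '%s\n' "$page"
  fi
}
```

For example, `resolve_pages 4 ''` prints only `4`, while `resolve_pages 4 3` ignores the 4 and prints 1, 2 and 3 on separate lines.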

Output Format

Each article is saved as a separate file, named as follows:

{title}.txt

Special characters in the title are replaced with underscores.
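
The skill does not say which characters count as "special", so the following is a minimal sketch under an assumption: `sanitize_title` is a hypothetical helper that replaces whitespace and the characters commonly rejected by filesystems (/ \ : * ? " < > |) with underscores.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): turn an article title into a
# filename-safe string. The character set below is an assumption.
sanitize_title() {
  # '#' is used as the sed delimiter so that '/' and '|' can sit inside the
  # bracket expression without escaping.
  printf '%s' "$1" | sed -e 's#[/\\:*?"<>|[:space:]]#_#g'
}
```

With this sketch, `sanitize_title 'a/b'` prints `a_b`, and the result could then be used as `"$(sanitize_title "$title").txt"`. Consecutive special characters each become their own underscore; collapsing them is left out to keep the example short.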

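Workflow

Fetch list pages: use curl to download the requested number of featured list pages
Extract links: parse the HTML and extract every link with the post-item-title class
Download details: open each article's detail page in turn
Extract body: take the content of cnblogs_post_body and strip the HTML tags
Save files: write each article to the output directory, named after its title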
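
The link-extraction and tag-stripping steps above can be sketched with the listed tools. This is a hedged illustration, not the skill's actual commands: `extract_links` and `strip_tags` are hypothetical helper names, and the assumed anchor markup (`<a class="post-item-title" href="...">` with double-quoted attributes) has not been verified against the live site. It also requires a grep build with -P support.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the skill).

# Read list-page HTML on stdin and print one article URL per line.
# Assumes anchors carry class="post-item-title" and a double-quoted href.
extract_links() {
  grep -oP '<a[^>]*class="post-item-title"[^>]*>' \
    | grep -oP 'href="\K[^"]+'
}

# Read an HTML fragment on stdin and print it with all tags removed.
# Only handles tags that open and close on the same line.
strip_tags() {
  sed -e 's/<[^>]*>//g'
}
```

With network access, the helpers could be chained as `curl -s -A "$agent" "$url" | extract_links`, where `$agent` and `$url` stand for the --agent and --url values. Regex-based parsing like this is brittle and may need adjusting if the page markup changes.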

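Examples

# Fetch the first 5 pages of featured content
openclaw cnblogs-pick --pages 5 --output-dir /tmp/cnb-pick

# Inspect the results
ls -lh /tmp/cnb-pick/

Dependencies

curl - HTTP requests
grep -oP - Perl-compatible regular expressions
sed - text processing

Notes

Some articles may fail to download because of anti-scraping measures
Large pages may exceed the token limit
Test with a single page before running a batch

Changelog

v1.0.0: Initial release with single-page scraping
v1.1.0: Added multi-page batch scraping
v1.2.0: Improved error handling and log output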

Usage Guidance

This is an instruction-only scraper that will run shell commands (curl, grep, sed) to download and save CNBlogs 'pick' articles to a directory you choose. It does not ask for credentials or install code. Before running: (1) test on a single page and a non-sensitive output directory to confirm behavior; (2) avoid pointing output-dir at system or home configuration folders to prevent accidental overwrite; (3) be aware parsing uses regex (grep -oP) which may be brittle or incompatible with some grep builds — commands may need adjustment on your system; (4) consider site Terms of Service and rate limits — scraping can be blocked or disallowed; (5) if you want stronger safety, inspect or run the commands in a sandboxed environment first.

Capability Analysis

Type: OpenClaw Skill
Name: cnbblogs-pick
Version: 1.0.0

The skill is a web scraper designed to fetch featured content from cnblogs.com. It uses standard command-line tools (curl, grep, sed) to extract article links and content, saving them to a local directory. The instructions in SKILL.md are transparent, align with the stated functionality, and contain no evidence of malicious intent, data exfiltration, or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description (scraping articles from the CNBlogs featured section) matches the SKILL.md: it describes using curl/grep/sed to fetch list pages, extract links, download articles, strip HTML and save them as text files. No unrelated services, credentials, or binaries are requested.

ℹ Instruction Scope

Instructions stay within the stated scraping purpose (download list pages, parse links, fetch article bodies, save to output dir). Caution: the skill prescribes HTML parsing via grep -oP/sed (regex-based parsing), which is brittle and may miss content or break on site changes. It will write files to the user's output directory (default under ~/.openclaw/workspace); ensure you don't point it at sensitive system directories. The SKILL.md indicates titles will be sanitized, but filename/overwrite handling is not fully specified.

✓ Install Mechanism

Instruction-only skill with no install spec and no code files — nothing is downloaded or written to disk by an installer. This is the lowest-risk install model.

✓ Credentials

The skill requests no environment variables or credentials. Declared runtime dependencies (curl, grep -oP, sed) are reasonable for command-line scraping. No unrelated secrets or config paths are asked for.

✓ Persistence & Privilege

always is false and the skill does not request persistent/system-wide changes. It does not modify other skills or agent settings. Normal autonomous invocation is allowed (default) but not exceptional here.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install cnbblogs-pick
After installation, invoke the skill by name or use /cnbblogs-pick
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

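CNBLOGS Featured Content Scraper Skill v1.0.0
- Initial release
- Scrapes a single page of the CNBlogs featured section
- Extracts the title and body of every article in the list
- Saves each article as a separate txt file
- Supports a custom output directory and User-Agent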

Metadata

Slug cnbblogs-pick

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is CNBLOGS Featured Content Scraper?

It scrapes article titles and bodies from the CNBlogs featured section, with support for batch downloading a given number of pages and saving them as plain-text files. It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.

How do I install CNBLOGS Featured Content Scraper?

Run "/install cnbblogs-pick" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is CNBLOGS Featured Content Scraper free?

Yes, CNBLOGS Featured Content Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does CNBLOGS Featured Content Scraper support?

CNBLOGS Featured Content Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created CNBLOGS Featured Content Scraper?

It is built and maintained by XWork (@againcrazycode); the current version is v1.0.0.
