Total downloads: 97
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install cnbblogs-pick
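Description
Scrapes article titles and body text from the CNBlogs featured ("pick") section, with support for batch-downloading a specified number of pages and saving them as plain-text files.
Usage instructions (SKILL.md)
CNBLOGS Featured Content Scraper Skill
Description
Scrapes featured content from CNBlogs (cnblogs.com), with support for pagination and batch download of titles and body text.
Usage
Basic usage
# Fetch page 1 and save all articles to the given directory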
openclaw cnblogs-pick --page 1 --output-dir /path/to/output
# Fetch the first 3 pages and save all articles
openclaw cnblogs-pick --pages 3 --output-dir /path/to/output
# Fetch the featured list at a specific URL
openclaw cnblogs-pick --url https://www.cnblogs.com/pick/ --pages 2
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --url | string | No | https://www.cnblogs.com/pick/ | URL of the featured list page |
| --page | int | No | 1 | Page number to fetch in single-page mode (only effective when --pages is not specified) |
| --pages | int | No | 1 | Total number of pages to fetch (takes precedence over --page) |
| --output-dir | string | No | ~/.openclaw/workspace/user_cnglobs/ | Output directory |
| --agent | string | No | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:149.0) Gecko/20100101 Firefox/149.0 | User-Agent |
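The interaction between --page and --pages can be read in more than one way. The sketch below shows one plausible reading, consistent with the basic usage examples: a --pages value selects pages 1 through N, and without it only the single --page is fetched. The helper name resolve_pages is hypothetical and is not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating one reading of the parameter table:
# when a --pages value is given, pages 1..N are selected; otherwise only
# the single --page value is used.
# Usage: resolve_pages PAGE [PAGES]
resolve_pages() {
  local page="${1:-1}" pages="${2:-}"
  if [ -n "$pages" ]; then
    seq 1 "$pages"
  else
    printf '%s\n' "$page"
  fi
}

resolve_pages 4      # prints 4
resolve_pages 4 3    # prints 1, 2 and 3 on separate lines
```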
Output format
Each article is saved as a separate file, named as follows:
{title}.txt
Special characters in the title are replaced with underscores.
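As an illustration of the naming rule, a minimal sed-based sketch follows. The function name sanitize_title and the exact set of characters treated as special are assumptions; the skill does not list them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the skill does not document which characters count as
# "special", so this sketch replaces path separators, characters that are
# invalid on common filesystems, and whitespace with underscores.
sanitize_title() {
  printf '%s' "$1" | sed -e 's#[/:*?"<>|[:space:]]#_#g' -e 's#\\#_#g'
}

sanitize_title 'C++: what/why?'   # prints C++__what_why_
```

Alphanumeric and non-ASCII characters are left alone here, so Chinese titles would keep their original text.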
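Workflow
- Fetch list pages: use curl to download the specified number of featured list pages
- Extract links: parse the HTML and extract all links with the post-item-title class
- Download details: open each article detail page in turn
- Extract body: take the cnblogs_post_body content and strip the HTML tags
- Save files: save to the output directory, named by title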
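The link-extraction and tag-stripping steps might look roughly like the sketch below, which runs on an inline HTML sample instead of the live site. The function names, the sample markup, and the assumption that the class attribute comes before href inside each anchor tag are all hypothetical; the real page structure would need to be checked before relying on these patterns.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with -P support and that class precedes href
# inside the <a> tag (an assumption, not verified against the site).

# Print the href of every <a class="post-item-title" ...> link read from stdin.
extract_links() {
  grep -oP '<a[^>]*class="post-item-title"[^>]*href="\K[^"]+'
}

# Remove HTML tags from stdin and drop lines that end up empty.
strip_tags() {
  sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# Made-up sample markup for illustration.
sample_list='<a class="post-item-title" href="https://www.cnblogs.com/example/p/1.html">First</a>
<a class="other" href="https://www.cnblogs.com/skip">Skip</a>'

printf '%s\n' "$sample_list" | extract_links
printf '%s\n' '<div id="cnblogs_post_body"><p>Hello <b>world</b></p></div>' | strip_tags
```

Regex-based parsing like this is brittle: a change in attribute order or a tag spanning several lines would break it, which is why testing on a single page first is worthwhile.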
Example
# Fetch the first 5 pages of featured content
openclaw cnblogs-pick --pages 5 --output-dir /tmp/cnb-pick
# Inspect the results
ls -lh /tmp/cnb-pick/
Dependencies
- curl: HTTP requests
- grep -oP: Perl-compatible regular expressions
- sed: text processing
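Because grep -P is not available in every grep build, a quick pre-flight check can be useful before running the skill. The following is a minimal sketch; the helper names have_cmd and grep_supports_pcre are hypothetical.

```shell
#!/usr/bin/env bash
# Report whether a command is on PATH (uses the POSIX `command -v` builtin).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Probe for PCRE support: a grep built without it exits non-zero on -P.
grep_supports_pcre() {
  printf 'abc\n' | grep -oP 'a\Kb' >/dev/null 2>&1
}

for tool in curl grep sed; do
  have_cmd "$tool" || echo "missing: $tool"
done
grep_supports_pcre || echo "grep lacks -P support; the regex steps would need rewriting"
```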
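Notes
- Some articles may fail to download because of anti-scraping measures
- Large pages may exceed the token limit
- Test with a single page before running a batch
Changelog
- v1.0.0: Initial version, supports single-page fetching
- v1.1.0: Supports multi-page batch fetching
- v1.2.0: Improved error handling and log output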
Security recommendations
This is an instruction-only scraper that will run shell commands (curl, grep, sed) to download and save CNBlogs 'pick' articles to a directory you choose. It does not ask for credentials or install code. Before running: (1) test on a single page and a non-sensitive output directory to confirm behavior; (2) avoid pointing output-dir at system or home configuration folders to prevent accidental overwrite; (3) be aware parsing uses regex (grep -oP) which may be brittle or incompatible with some grep builds — commands may need adjustment on your system; (4) consider site Terms of Service and rate limits — scraping can be blocked or disallowed; (5) if you want stronger safety, inspect or run the commands in a sandboxed environment first.
Functional analysis
Type: OpenClaw Skill
Name: cnbblogs-pick
Version: 1.0.0
The skill is a web scraper designed to fetch featured content from cnblogs.com. It uses standard command-line tools (curl, grep, sed) to extract article links and content, saving them to a local directory. The instructions in SKILL.md are transparent, align with the stated functionality, and contain no evidence of malicious intent, data exfiltration, or prompt injection.
Capability assessment
Purpose & Capability
Name/description (抓取博客园精华区文章) matches the SKILL.md: it describes using curl/grep/sed to fetch list pages, extract links, download articles, strip HTML and save them as text files. No unrelated services, credentials, or binaries are requested.
Instruction Scope
Instructions stay within the stated scraping purpose (download list pages, parse links, fetch article bodies, save to output dir). Caution: the skill prescribes HTML parsing via grep -oP/sed (regex-based parsing), which is brittle and may miss content or break on site changes. It will write files to the user's output directory (default under ~/.openclaw/workspace); ensure you don't point it at sensitive system directories. The SKILL.md indicates titles will be sanitized, but filename/overwrite handling is not fully specified.
Install Mechanism
Instruction-only skill with no install spec and no code files — nothing is downloaded or written to disk by an installer. This is the lowest-risk install model.
Credentials
The skill requests no environment variables or credentials. Declared runtime dependencies (curl, grep -oP, sed) are reasonable for command-line scraping. No unrelated secrets or config paths are asked for.
Persistence & Privilege
always is false and the skill does not request persistent/system-wide changes. It does not modify other skills or agent settings. Normal autonomous invocation is allowed (default) but not exceptional here.
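How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install cnbblogs-pick
- After installation, call the Skill by name or trigger it with /cnbblogs-pick
- Provide the required inputs described in the Skill's parameter table to get structured output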
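Version history
v1.0.0
CNBLOGS Featured Content Scraper Skill v1.0.0
- Initial release
- Supports fetching a single page of the CNBlogs featured section
- Extracts the title and body text of every article in the list
- Saves each article as a separate txt file
- Supports a custom output directory and User-Agent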
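FAQ
What is CNBLOGS Featured Content Scraper?
It scrapes article titles and body text from the CNBlogs featured section, with support for batch-downloading a specified number of pages and saving them as plain-text files. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 97 times to date.
How do I install CNBLOGS Featured Content Scraper?
Run the command "/install cnbblogs-pick" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is CNBLOGS Featured Content Scraper free?
Yes. CNBLOGS Featured Content Scraper is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does CNBLOGS Featured Content Scraper support?
CNBLOGS Featured Content Scraper is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed CNBLOGS Featured Content Scraper?
It is developed and maintained by XWork (@againcrazycode); the current version is v1.0.0.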