
hn-crawler

Name: hn-crawler
Author: drowning-in-codes

By proanimer · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 110

Install in OpenClaw

/install hn-crawler-cn

Description

Crawls news from https://hn.aimaker.dev/ and runs the full crawl -> extract -> organize -> summarize pipeline. Invoke when the user wants to crawl news from hn.aimaker.dev or process web content through the full pipeline.

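Usage (SKILL.md)

HN News Crawler Skill

This Skill crawls news content from https://hn.aimaker.dev/ and runs it through a complete processing pipeline that turns the raw data into a structured summary report.

Workflow

The pipeline has four stages: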

┌─────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐
│  Crawl  │ -> │ Extract  │ -> │ Organize │ -> │ Summarize │
└─────────┘    └──────────┘    └──────────┘    └───────────┘

1. Crawl

Script: scripts/crawl.py
Purpose: fetch the page's raw HTML over HTTP
Output: data/raw/hn_aimaker_<timestamp>.html
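
The crawl stage could be sketched roughly as follows. This is a minimal illustration that uses the standard-library urllib so it has no dependencies; it is not the code in scripts/crawl.py. The function names, the timestamp format, and the User-Agent string are assumptions.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import Request, urlopen


def raw_output_path(output_dir: str, now: datetime) -> Path:
    """Build <output_dir>/raw/hn_aimaker_<timestamp>.html (timestamp format is assumed)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / "raw" / f"hn_aimaker_{stamp}.html"


def crawl(url: str = "https://hn.aimaker.dev/",
          output_dir: str = "data",
          timeout: float = 30.0) -> Path:
    """Fetch the page with a single HTTP GET and save the raw bytes to disk."""
    # Hypothetical User-Agent; identify your crawler honestly in real use.
    request = Request(url, headers={"User-Agent": "hn-crawler-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        html = response.read()
    path = raw_output_path(output_dir, datetime.now())
    path.parent.mkdir(parents=True, exist_ok=True)  # data/raw/ is created on demand
    path.write_bytes(html)
    return path
```

Calling crawl() with no arguments would fetch the default target and return the path of the saved file.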

2. Extract

Script: scripts/extract.py
Purpose: parse the HTML and extract each article's title, link, summary, publication time, and related fields
Output: data/extracted/articles_<timestamp>.json
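
As an illustration of the extraction step, the sketch below pulls title/URL pairs out of HTML with the standard-library html.parser. The page's real markup is not documented here, so treating every absolute link as an article is an assumption, and LinkExtractor and extract_articles are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class LinkExtractor(HTMLParser):
    """Collect {"title", "url"} for every <a> whose href starts with "http"."""

    def __init__(self) -> None:
        super().__init__()
        self.articles: List[Dict[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside a link we decided to keep.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.articles.append({"title": title, "url": self._href})
            self._href = None


def extract_articles(html: str) -> List[Dict[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.articles
```

A real extractor would target the site's specific elements and also read the summary, score, and publication time.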

3. Organize

Script: scripts/organize.py
Purpose: clean, deduplicate, categorize, and format the extracted data
Output: data/organized/articles_organized_<timestamp>.json
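
Cleaning, deduplication, and categorization might look something like the sketch below. The keyword table and the rule of deduplicating by URL are assumptions for illustration; the actual logic in scripts/organize.py is not described in this document.

```python
from typing import Dict, List

# Hypothetical keyword table; the real categories are not documented.
CATEGORY_KEYWORDS = {
    "AI": ("ai", "llm", "gpt", "model"),
    "Security": ("security", "vulnerability", "exploit"),
}


def classify(title: str) -> str:
    """Return the first category with a keyword that appears as a word in the title."""
    words = set(title.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & set(keywords):
            return category
    return "Other"


def organize(articles: List[Dict]) -> List[Dict]:
    """Normalize whitespace, drop incomplete entries, dedupe by URL, add a category."""
    seen = set()
    organized = []
    for article in articles:
        url = (article.get("url") or "").strip()
        title = " ".join((article.get("title") or "").split())
        if not url or not title or url in seen:
            continue  # skip empty or already-seen entries
        seen.add(url)
        organized.append({
            **article,
            "title": title,
            "url": url,
            "category": article.get("category") or classify(title),
        })
    return organized
```

Keeping the first occurrence of each URL preserves the page's original ordering, which is one reasonable choice among several.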

4. Summarize

Script: scripts/summarize.py
Purpose: generate a summary report, including hot-topic statistics and trend analysis
Output: data/summary/summary_<timestamp>.md
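
One possible shape for the report generator is sketched below: it counts articles per category and lists the highest-scoring titles as Markdown. The report layout and the summarize signature are assumptions, and trend analysis is left out because the document does not say how it is computed.

```python
from collections import Counter
from typing import Dict, List


def summarize(articles: List[Dict], top_n: int = 3) -> str:
    """Render a small Markdown report: counts per category and the top-scoring titles."""
    by_category = Counter(a.get("category", "Other") for a in articles)
    top = sorted(articles, key=lambda a: a.get("score", 0), reverse=True)[:top_n]

    lines = ["# HN summary", "", f"Total articles: {len(articles)}", "",
             "## Categories", ""]
    for category, count in by_category.most_common():
        lines.append(f"- {category}: {count}")
    lines += ["", "## Top articles", ""]
    for article in top:
        score = article.get("score", 0)
        lines.append(f"- [{article['title']}]({article['url']}) (score {score})")
    return "\n".join(lines) + "\n"
```

The returned string can be written directly to a file such as data/summary/summary_<timestamp>.md.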

Quick start

Install dependencies

cd .trae/skills/hn-crawler
pip install -r scripts/requirements.txt

Run the full pipeline

# Option 1: run each stage separately
python scripts/crawl.py
python scripts/extract.py
python scripts/organize.py
python scripts/summarize.py

# Option 2: run the whole pipeline with one command
python scripts/run_pipeline.py
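
The functional analysis on this page notes that run_pipeline.py orchestrates the stages with subprocess.run. A minimal sketch of that idea follows; build_commands and run_pipeline are hypothetical names, and the real script may handle errors and logging differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Stage scripts in execution order, as listed in the workflow.
STAGES = ["crawl.py", "extract.py", "organize.py", "summarize.py"]


def build_commands(scripts_dir: str = "scripts") -> List[List[str]]:
    """One argument list per stage, using the current Python interpreter."""
    return [[sys.executable, str(Path(scripts_dir) / name)] for name in STAGES]


def run_pipeline(scripts_dir: str = "scripts") -> None:
    """Run the stages in order; check=True raises on the first failing stage."""
    for command in build_commands(scripts_dir):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Passing each command as an argument list (rather than a shell string) avoids shell interpolation, and check=True prevents later stages from running on stale or missing input.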

Directory structure

.trae/skills/hn-crawler/
├── SKILL.md                    # This file
├── scripts/
│   ├── requirements.txt        # Python dependencies
│   ├── crawl.py               # Crawl script
│   ├── extract.py             # Extract script
│   ├── organize.py            # Organize script
│   ├── summarize.py           # Summarize script
│   └── run_pipeline.py        # Runs the full pipeline in one step
└── data/                      # Output directory (created automatically)
    ├── raw/                   # Raw HTML
    ├── extracted/             # Extracted JSON data
    ├── organized/             # Organized data
    └── summary/               # Summary reports

Data format

Extracted article format (JSON)

{
  "articles": [
    {
      "title": "Article title",
      "url": "https://example.com/article",
      "summary": "Article summary",
      "published_at": "2024-01-15T10:30:00",
      "source": "hn.aimaker.dev",
      "category": "AI",
      "score": 150
    }
  ],
  "metadata": {
    "crawled_at": "2024-01-15T12:00:00",
    "total_count": 30
  }
}
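
To show how a downstream stage might consume this format, here is a small loader with a few sanity checks. Which fields are mandatory is not stated in the schema, so REQUIRED_FIELDS is an assumption, and load_articles is a hypothetical helper.

```python
import json
from datetime import datetime
from typing import Any, Dict

# Assumption: the schema above does not say which fields are mandatory.
REQUIRED_FIELDS = ("title", "url")


def load_articles(text: str) -> Dict[str, Any]:
    """Parse the extracted JSON and run basic checks on each article."""
    payload = json.loads(text)
    for index, article in enumerate(payload["articles"]):
        missing = [field for field in REQUIRED_FIELDS if not article.get(field)]
        if missing:
            raise ValueError(f"article {index} is missing {missing}")
        if "published_at" in article:
            # fromisoformat raises ValueError on a malformed timestamp.
            datetime.fromisoformat(article["published_at"])
    return payload
```

Failing early on malformed input keeps errors close to the stage that produced them.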

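Configuration options

Each script accepts the following settings as environment variables or command-line arguments:

TARGET_URL: target URL (default: https://hn.aimaker.dev/)
OUTPUT_DIR: output directory (default: data/)
TIMEOUT: request timeout (default: 30 seconds)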
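
The settings above could be resolved as in the following sketch. The flag names (--target-url, --output-dir, --timeout) and the precedence of command-line arguments over environment variables are assumptions; the document only states that both mechanisms are supported.

```python
import argparse
import os
from typing import List, Mapping, Optional


def load_config(argv: Optional[List[str]] = None,
                env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Resolve settings: CLI flag first, then environment variable, then default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="hn-crawler settings (sketch)")
    # Environment values become the argparse defaults, so a flag overrides them.
    parser.add_argument("--target-url",
                        default=env.get("TARGET_URL", "https://hn.aimaker.dev/"))
    parser.add_argument("--output-dir", default=env.get("OUTPUT_DIR", "data/"))
    parser.add_argument("--timeout", type=float,
                        default=float(env.get("TIMEOUT", "30")))
    return parser.parse_args(argv)
```

For example, load_config(["--timeout", "5"]) yields a 5-second timeout regardless of the TIMEOUT environment variable.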

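Notes

Respect the site's robots.txt and crawling policy
Set a reasonable interval between requests to avoid putting load on the server
Crawled data is for personal study and research only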
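
The first two points can be sketched with the standard library alone: urllib.robotparser checks a URL against robots.txt rules, and a small helper enforces a minimum delay between requests. The user-agent string, the one-second default interval, and the RateLimiter class are illustrative assumptions, not part of the skill.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "hn-crawler-sketch"  # hypothetical user agent


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request, if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call wait() before each request and skip any URL for which allowed_by_robots returns False.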

Security recommendations

This skill appears internally consistent for crawling and processing hn.aimaker.dev content. Before installing or running:
1) Inspect the code locally (you already have the files); there are syntax/typing bugs (e.g., in organize.py) that must be fixed for the pipeline to run.
2) Follow robots.txt and rate-limit requests to avoid abusive crawling.
3) When running pip install -r requirements.txt, review which packages and versions will be installed (PyPI packages are common but carry supply-chain risk).
4) Run the skill in a sandbox or non-critical environment first (it writes files to data/).
5) If you need higher assurance, request the full, untruncated source for final review or ask the author to provide a fixed release with tests and an explicit provenance/homepage.

Functional analysis

Type: OpenClaw Skill
Name: hn-crawler-cn
Version: 1.0.0

The skill bundle implements a standard web crawling and data processing pipeline for the website hn.aimaker.dev. The scripts (crawl.py, extract.py, organize.py, summarize.py) use well-known libraries like requests and BeautifulSoup to fetch and parse content, and run_pipeline.py orchestrates the workflow using subprocess.run in a safe manner. There is no evidence of data exfiltration, malicious execution, or prompt injection intended to subvert the agent's behavior.

Capability assessment

✓ Purpose & Capability

Name/description match the provided scripts and SKILL.md. The package contains crawl/extract/organize/summarize scripts and a run_pipeline orchestrator which all operate on the stated site (default TARGET_URL is https://hn.aimaker.dev/). There are no unrelated required binaries or environment variables.

ℹ Instruction Scope

SKILL.md and the scripts limit actions to HTTP GET requests to the target site, parsing HTML, local file read/write under data/, and generating summaries. Declared environment variables (TARGET_URL, OUTPUT_DIR, TIMEOUT) are used. The code does not reference other system credentials, config paths, or external endpoints beyond normal HTTP requests to the target URL. Note: some source files (organize.py) contain syntax/typing errors that will prevent successful execution until fixed; this is a functionality issue rather than a security misdirection.

ℹ Install Mechanism

There is no automated install spec; SKILL.md instructs the user to run pip install -r requirements.txt. Installing packages from PyPI is normal but carries the usual supply-chain risk (verify package versions and trust). No downloads from arbitrary URLs or archive extraction steps are present in the skill itself.

✓ Credentials

The skill does not request credentials or secrets. The only environment variables used (TARGET_URL, OUTPUT_DIR, TIMEOUT) are proportional and documented. Scripts operate on local output directories and do not exfiltrate data to unlisted remote endpoints.

✓ Persistence & Privilege

The skill is not marked always:true and does not attempt to modify other skills or system-level agent configuration. It does not request permanent presence or elevated privileges.

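How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install hn-crawler-cn
Once installed, invoke the Skill by name or trigger it with /hn-crawler-cn
Provide the inputs described in the Skill's parameter documentation to get structured output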

Version history

v1.0.0

Initial release of hn-crawler skill.
- Implements a complete pipeline to crawl, extract, organize, and summarize news from https://hn.aimaker.dev/.
- Provides Python scripts for each processing stage and a one-click pipeline runner.
- Outputs structured JSON, cleaned/organized data, and markdown summary reports.
- Configurable via environment variables and CLI arguments for target URL, output directory, and timeout.
- Includes detailed documentation on workflow, data format, and usage.

Metadata

Slug hn-crawler-cn

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

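FAQ

What is hn-crawler?

It crawls news from https://hn.aimaker.dev/ and runs the full crawl -> extract -> organize -> summarize pipeline. Invoke it when the user wants to crawl news from hn.aimaker.dev or process web content through the full pipeline. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 110 total downloads to date.

How do I install hn-crawler?

Run the command "/install hn-crawler-cn" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is hn-crawler free?

Yes. hn-crawler is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does hn-crawler support?

hn-crawler is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed hn-crawler?

It is developed and maintained by proanimer (@drowning-in-codes); the current version is v1.0.0.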