flobo3

Yandex Archive Scraper

Author: Flo · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 108
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install yandex-archive-scraper
Description
Search and extract data from Yandex.Archive (Яндекс.Архив) — metric books, newspapers, directories. Bypasses bot protection via Scrapling.
Usage (SKILL.md)

yandex-archive-scraper

A powerful skill for searching and extracting data from Yandex.Archive (Яндекс.Архив) using Scrapling to bypass bot protection and Cloudflare Turnstile.

Features

  • Converts natural language queries into optimized Yandex.Archive search URLs.
  • Uses Scrapling (StealthyFetcher) to bypass Yandex bot protection.
  • Extracts search results (document titles, text snippets, and direct links).
  • Supports pagination to collect multiple pages of results.
  • Can search across all three Yandex.Archive indexes:
    • archive (Архивы): Metric books, revision tales, confessional statements.
    • mass_media (Периодика): Old newspapers (e.g., "Senate Gazette", "Provincial Gazette").
    • directories (Справочники): Address calendars, lists of residents, memorable books.
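
The query-to-URL and pagination features above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the endpoint `BASE_URL` and the parameter names `text`, `index`, and `page` are assumptions made for the example, and the real search URL format may differ.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint and parameter names, not confirmed by the skill.
BASE_URL = "https://yandex.ru/archive/search"


def build_search_urls(query: str, index: str = "archive", max_pages: int = 1) -> list[str]:
    """Return one search URL per result page, starting at page 1."""
    # Collapse runs of whitespace so the query is encoded consistently.
    cleaned = " ".join(query.split())
    urls = []
    for page in range(1, max_pages + 1):
        # urlencode percent-encodes Cyrillic text as UTF-8.
        params = urlencode({"text": cleaned, "index": index, "page": page})
        urls.append(f"{BASE_URL}?{params}")
    return urls
```

Calling `build_search_urls("Александр Пушкин Москва", index="mass_media", max_pages=2)` would yield two URLs that differ only in the hypothetical `page` parameter.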

Tools

yandex_archive_search

Search Yandex.Archive based on a natural language query.

Parameters:

  • query (string): The search query (e.g., "Александр Пушкин Москва").
  • index (string, optional): The index to search in. Options: archive (default), mass_media, directories.
  • max_pages (integer, optional): Maximum number of pages to scrape (default 1).
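
To make the parameter contract concrete, here is a small sketch of how the three arguments and their documented defaults could be validated before a search runs. The `SearchArgs` class and `parse_args` helper are hypothetical names introduced for illustration; the skill's own scripts may handle arguments differently.

```python
from dataclasses import dataclass

# The three index names listed in the skill's documentation.
ALLOWED_INDEXES = ("archive", "mass_media", "directories")


@dataclass(frozen=True)
class SearchArgs:
    """Hypothetical container for yandex_archive_search arguments."""

    query: str
    index: str = "archive"  # documented default
    max_pages: int = 1      # documented default

    def __post_init__(self) -> None:
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if self.index not in ALLOWED_INDEXES:
            raise ValueError(f"index must be one of {ALLOWED_INDEXES}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.max_pages, bool) or not isinstance(self.max_pages, int):
            raise ValueError("max_pages must be an integer")
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")


def parse_args(raw: dict) -> SearchArgs:
    """Build validated arguments from a plain dict, e.g. decoded JSON."""
    return SearchArgs(**raw)
```

For example, `parse_args({"query": "Александр Пушкин Москва"})` fills in `index="archive"` and `max_pages=1`, while an unknown index raises `ValueError`.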

Requirements

  • scrapling
  • playwright
  • curl_cffi
  • patchright
  • msgspec
  • browserforge
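
Before running the skill, it can help to confirm that the packages above are importable. The sketch below assumes that each package's import name matches the name listed here, which is not stated in the skill's documentation; `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumption: import names are the same as the package names listed above.
REQUIRED_MODULES = (
    "scrapling",
    "playwright",
    "curl_cffi",
    "patchright",
    "msgspec",
    "browserforge",
)


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is None]
```

An empty list from `missing_modules()` means every listed module was found. It does not check whether Playwright's browser binaries have been downloaded.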

Security recommendations
This skill appears internally consistent with its stated purpose: it fetches and parses Yandex.Archive pages and uses Scrapling/Playwright to avoid bot protections. Before installing, consider the following:

  1. Bypassing bot protection may violate Yandex's terms of service or local law. Make sure you have the right to scrape the target.
  2. Installing Playwright downloads browser binaries, and the listed Python packages are third-party code that will run on your system. Audit or sandbox the environment and verify package sources (PyPI project pages, authors).
  3. Run the skill in an isolated environment (container or VM) if you are concerned about third-party dependencies.
  4. The skill itself requires no secrets, but if you modify it to integrate other services, re-evaluate the credentials it requests.

Safer alternatives include site APIs, manual downloads, or permissioned data access.
Functional analysis
Type: OpenClaw Skill
Name: yandex-archive-scraper
Version: 1.0.0

The skill is a specialized web scraper designed to extract historical records from Yandex.Archive. The code in search.py and get_page.py uses the Scrapling library to bypass bot detection and parse search results from Yandex's internal JSON state or HTML. There is no evidence of data exfiltration, malicious execution, or prompt injection; the scripts are focused entirely on the stated purpose of archival research.
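
As a rough illustration of what parsing results from an embedded JSON state can look like, the sketch below pulls a JSON blob out of a script tag and keeps the three fields the skill says it extracts (title, snippet, link). The script tag id `__STATE__` and the keys `results`, `title`, `snippet`, and `url` are invented for this example; the page's real structure is not described in this listing.

```python
import json
import re

# Assumption: the tag id and JSON keys below are hypothetical placeholders.
STATE_RE = re.compile(
    r'<script[^>]*id="__STATE__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_results(html: str) -> list[dict]:
    """Return title/snippet/url dicts found in the embedded JSON state."""
    match = STATE_RE.search(html)
    if match is None:
        return []  # a real scraper might fall back to HTML parsing here
    try:
        state = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    if not isinstance(state, dict):
        return []
    return [
        {
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
            "url": item.get("url", ""),
        }
        for item in state.get("results", [])
        if isinstance(item, dict)
    ]
```

Returning an empty list on a missing or malformed state keeps the caller's control flow simple, at the cost of hiding why nothing was found.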
Capability assessment
Purpose & Capability
Name/description (Yandex.Archive scraping, bypassing bot protection) align with the included Python scripts and declared dependencies (scrapling, playwright, etc.). The code only targets Yandex.Archive URLs and extracts site-specific JSON/HTML.
Instruction Scope
SKILL.md and README instruct installing the listed Python packages and using StealthyFetcher to fetch archive pages. The runtime instructions and scripts stay focused on constructing search URLs, fetching pages, and parsing results; they do not read unrelated files or environment variables.
Install Mechanism
The package is instruction-first and contains code files but no formal install spec. README suggests pip installing several packages and running 'playwright install chromium' — this will download browser binaries and execute third-party code (scrapling, browserforge). That is expected for a scraper but increases runtime footprint and risk from third-party packages.
Credentials
The skill requests no environment variables, no credentials, and accesses no system config paths. The lack of secret access is proportionate to a public-web scraping task.
Persistence & Privilege
Skill does not request always:true, does not attempt to modify other skills or agent-wide settings, and requires no persistent credentials. Autonomous invocation is allowed (platform default) but not combined with additional privileges.
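How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install yandex-archive-scraper
  3. After installation, call the skill by name or trigger it with /yandex-archive-scraper.
  4. Provide the inputs described in the skill's parameter list to get structured output.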
Version history
v1.0.0
Initial ClawHub publication: search Yandex.Archive (metric books, newspapers, directories) with bot protection bypass.
Metadata
Slug: yandex-archive-scraper
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
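FAQ

What is Yandex Archive Scraper?

It searches and extracts data from Yandex.Archive (Яндекс.Архив): metric books, newspapers, and directories. It bypasses bot protection via Scrapling. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 108 times so far.

How do I install Yandex Archive Scraper?

Run the command "/install yandex-archive-scraper" in the OpenClaw or Claude Code chat box. The skill itself needs no further configuration, although per its README the Python packages listed under Requirements and a Chromium browser ('playwright install chromium') still have to be installed.

Is Yandex Archive Scraper free?

Yes. Yandex Archive Scraper is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Yandex Archive Scraper support?

Yandex Archive Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Yandex Archive Scraper?

It is developed and maintained by Flo (@flobo3). The current version is v1.0.0.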
