
Yandex Archive Scraper

Name: Yandex Archive Scraper
Author: flobo3

By Flo · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 108

Current installs: 0

Versions: 1

Install in OpenClaw

/install yandex-archive-scraper

Description

Search and extract data from Yandex.Archive (Яндекс.Архив) — metric books, newspapers, directories. Bypasses bot protection via Scrapling.

Usage (SKILL.md)


yandex-archive-scraper

A powerful skill for searching and extracting data from Yandex.Archive (Яндекс.Архив) using Scrapling to bypass bot protection and Cloudflare Turnstile.

Features

- Converts natural language queries into optimized Yandex.Archive search URLs.
- Uses Scrapling (StealthyFetcher) to bypass Yandex bot protection.
- Extracts search results (document titles, text snippets, and direct links).
- Supports pagination to collect multiple pages of results.
- Can search across all three Yandex.Archive indexes:
  - archive (Архивы): metric books, revision tales, confessional statements.
  - mass_media (Периодика): old newspapers (e.g., "Senate Gazette", "Provincial Gazette").
  - directories (Справочники): address calendars, lists of residents, memorable books.
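The query-to-URL and pagination features above can be sketched as a small URL builder. This is a minimal illustration, not the skill's actual code: SKILL.md does not document the real query format, so the base URL and the `text`, `index`, and `page` parameter names are assumptions, and `build_search_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint; the real search URL is not given in SKILL.md.
BASE_URL = "https://yandex.ru/archive/search"


def build_search_urls(query: str, index: str = "archive", max_pages: int = 1) -> list[str]:
    """Build one search URL per result page, from page 1 up to max_pages."""
    # Collapse repeated whitespace so the query is encoded consistently.
    text = " ".join(query.split())
    urls = []
    for page in range(1, max_pages + 1):
        # Assumption: "text", "index", and "page" are illustrative parameter names.
        params = {"text": text, "index": index, "page": page}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls("Александр Пушкин Москва", index="archive", max_pages=2):
        print(url)
```

Each generated URL would then be passed to the fetcher, one page at a time, until `max_pages` is reached.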

Tools

`yandex_archive_search`

Search Yandex.Archive based on a natural language query.

Parameters:

- query (string): The search query (e.g., "Александр Пушкин Москва").
- index (string, optional): The index to search in. Options: archive (default), mass_media, directories.
- max_pages (integer, optional): Maximum number of pages to scrape (default 1).
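The parameter contract above can be captured in a small validated container. The sketch below is illustrative only: `SearchArgs` and `ALLOWED_INDEXES` are hypothetical names, and the skill's own scripts may validate input differently or not at all. The defaults (`archive`, 1 page) and the three index options come from the parameter list.

```python
from dataclasses import dataclass

# The three index names listed in the tool description.
ALLOWED_INDEXES = ("archive", "mass_media", "directories")


@dataclass(frozen=True)
class SearchArgs:
    """Hypothetical container for yandex_archive_search arguments."""

    query: str
    index: str = "archive"  # documented default
    max_pages: int = 1      # documented default

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.index not in ALLOWED_INDEXES:
            raise ValueError(f"index must be one of {ALLOWED_INDEXES}, got {self.index!r}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.max_pages, bool) or not isinstance(self.max_pages, int):
            raise ValueError("max_pages must be an integer")
        if self.max_pages < 1:
            raise ValueError("max_pages must be at least 1")


if __name__ == "__main__":
    print(SearchArgs(query="Александр Пушкин Москва"))
    print(SearchArgs(query="Сенатские ведомости", index="mass_media", max_pages=3))
```

Rejecting bad input before any page is fetched avoids spending browser sessions on requests that cannot succeed.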

Requirements

- scrapling
- playwright
- curl_cffi
- patchright
- msgspec
- browserforge


Security recommendations

This skill appears internally consistent with its stated purpose: it fetches and parses Yandex.Archive pages and uses Scrapling/Playwright to avoid bot protections. Before installing, consider the following:

(1) Bypassing bot protection may violate Yandex's terms of service or local law; ensure you have the right to scrape the target.
(2) Installing Playwright downloads browser binaries, and the listed Python packages are third-party code that will run on your system; audit or sandbox the environment and verify package sources (PyPI project pages, authors).
(3) Run this skill in an isolated environment (container/VM) if you are concerned about third-party dependencies.
(4) The skill itself requires no secrets, but if you modify it to integrate other services, re-evaluate the credentials it requests.

Safer alternatives may include site APIs, manual downloads, or permissioned data access.

Functional analysis

Type: OpenClaw Skill
Name: yandex-archive-scraper
Version: 1.0.0

The skill is a specialized web scraper designed to extract historical records from Yandex.Archive. The code in search.py and get_page.py uses the Scrapling library to bypass bot detection and parse search results from Yandex's internal JSON state or HTML. There is no evidence of data exfiltration, malicious execution, or prompt injection; the scripts are focused entirely on the stated purpose of archival research.
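To illustrate what parsing results from an embedded JSON state can look like, here is a standard-library sketch. It does not reproduce search.py or get_page.py: the `<script type="application/json">` container and the `results`/`title`/`snippet`/`url` keys are a hypothetical schema chosen for the example, and the real page structure may differ.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collects the text of every <script type="application/json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.payloads: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.payloads.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.payloads[-1] += data


def extract_results(html: str) -> list[dict]:
    """Return title/snippet/url records from the first payload holding a "results" list."""
    collector = JsonScriptCollector()
    collector.feed(html)
    for raw in collector.payloads:
        try:
            state = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON, try the next script element
        # Hypothetical schema: {"results": [{"title": ..., "snippet": ..., "url": ...}]}
        if isinstance(state, dict) and isinstance(state.get("results"), list):
            return [
                {key: item.get(key, "") for key in ("title", "snippet", "url")}
                for item in state["results"]
                if isinstance(item, dict)
            ]
    return []
```

When no usable JSON state is found the function returns an empty list, which is the point where a scraper would fall back to parsing the rendered HTML instead.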

Capability assessment

✓ Purpose & Capability

Name/description (Yandex.Archive scraping, bypassing bot protection) align with the included Python scripts and declared dependencies (scrapling, playwright, etc.). The code only targets Yandex.Archive URLs and extracts site-specific JSON/HTML.

✓ Instruction Scope

SKILL.md and README instruct installing the listed Python packages and using StealthyFetcher to fetch archive pages. The runtime instructions and scripts stay focused on constructing search URLs, fetching pages, and parsing results; they do not read unrelated files or environment variables.

ℹ Install Mechanism

The package is instruction-first and contains code files but no formal install spec. README suggests pip installing several packages and running 'playwright install chromium' — this will download browser binaries and execute third-party code (scrapling, browserforge). That is expected for a scraper but increases runtime footprint and risk from third-party packages.
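Because there is no formal install spec, it can help to confirm that the dependencies are importable before running the skill. The sketch below only inspects the current environment and installs nothing. `missing_modules` is a hypothetical helper, and it assumes each package's import name matches the name listed under Requirements.

```python
from importlib.util import find_spec

# Assumption: import names are identical to the package names in the Requirements list.
REQUIRED = ("scrapling", "playwright", "curl_cffi", "patchright", "msgspec", "browserforge")


def missing_modules(names=REQUIRED) -> list[str]:
    """Return the names that cannot be located in the current Python environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This check does not cover the browser binaries that `playwright install chromium` downloads; those are fetched separately from the Python packages.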

✓ Credentials

The skill requests no environment variables, no credentials, and accesses no system config paths. The lack of secret access is proportionate to a public-web scraping task.

✓ Persistence & Privilege

Skill does not request always:true, does not attempt to modify other skills or agent-wide settings, and requires no persistent credentials. Autonomous invocation is allowed (platform default) but not combined with additional privileges.

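How to use

1. Make sure OpenClaw is installed (locally or via Docker).
2. Enter the install command in the chat box: /install yandex-archive-scraper
3. Once installed, call the skill by name or trigger it with /yandex-archive-scraper
4. Provide the inputs described in the skill's parameter list to receive structured output.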

Version history

v1.0.0

Initial ClawHub publication: search Yandex.Archive (metric books, newspapers, directories) with bot protection bypass.

Metadata

Slug: yandex-archive-scraper

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 1

FAQ