flobo3

Yandex Archive Scraper

by Flo · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
108
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install yandex-archive-scraper
Description
Search and extract data from Yandex.Archive (Яндекс.Архив) — metric books, newspapers, directories. Bypasses bot protection via Scrapling.
README (SKILL.md)


yandex-archive-scraper

A powerful skill for searching and extracting data from Yandex.Archive (Яндекс.Архив) using Scrapling to bypass bot protection and Cloudflare Turnstile.

Features

  • Converts natural language queries into optimized Yandex.Archive search URLs.
  • Uses Scrapling (StealthyFetcher) to bypass Yandex bot protection.
  • Extracts search results (document titles, text snippets, and direct links).
  • Supports pagination to collect multiple pages of results.
  • Can search across all three Yandex.Archive indexes:
    • archive (Архивы): metric books, revision tales, confessional statements.
    • mass_media (Периодика): old newspapers (e.g., "Senate Gazette", "Provincial Gazette").
    • directories (Справочники): address calendars, lists of residents, memorable books.
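
As a rough illustration of the first and fourth features, query-to-URL conversion with pagination might look like the sketch below. The base URL and the parameter names (`text`, `type`, `page`) are assumptions made for illustration and are not taken from the skill's source; only the three index names come from this README. The function name `build_search_urls` is likewise hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names, for illustration only.
# The URL format the skill actually builds may differ.
BASE_URL = "https://yandex.ru/archive/search"


def build_search_urls(query: str, index: str = "archive", max_pages: int = 1) -> list[str]:
    """Return one search URL per results page, starting at page 0."""
    normalized = " ".join(query.split())  # collapse stray whitespace
    urls = []
    for page in range(max_pages):
        params = {"text": normalized, "type": index, "page": page}
        # urlencode percent-encodes Cyrillic text as UTF-8.
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

Calling `build_search_urls("Александр Пушкин Москва", index="mass_media", max_pages=2)` would yield two URLs that differ only in their `page` value.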

Tools

yandex_archive_search

Search Yandex.Archive based on a natural language query.

Parameters:

  • query (string): The search query (e.g., "Александр Пушкин Москва").
  • index (string, optional): The index to search in. Options: archive (default), mass_media, directories.
  • max_pages (integer, optional): Maximum number of pages to scrape (default 1).
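
The parameter list above can be read as a small schema. The following is a minimal sketch of how a caller might apply the documented defaults and reject invalid values before invoking the tool. `SearchArgs` and `parse_args` are hypothetical names; the skill's own argument handling is not shown in this listing and may work differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The three index names documented for yandex_archive_search.
VALID_INDEXES = ("archive", "mass_media", "directories")


@dataclass(frozen=True)
class SearchArgs:
    """Hypothetical container mirroring the documented parameters."""
    query: str
    index: str = "archive"
    max_pages: int = 1


def parse_args(raw: Mapping[str, Any]) -> SearchArgs:
    """Apply the documented defaults and validate the raw arguments."""
    query = str(raw.get("query", "")).strip()
    if not query:
        raise ValueError("query is required")
    index = raw.get("index", "archive")
    if index not in VALID_INDEXES:
        raise ValueError(f"index must be one of {VALID_INDEXES}, got {index!r}")
    max_pages = int(raw.get("max_pages", 1))
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return SearchArgs(query=query, index=index, max_pages=max_pages)
```

For example, `parse_args({"query": "Александр Пушкин Москва"})` fills in `index="archive"` and `max_pages=1`.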

Requirements

  • scrapling
  • playwright
  • curl_cffi
  • patchright
  • msgspec
  • browserforge

---

yandex-archive-scraper (Russian)

A powerful skill for searching and extracting data from Yandex.Archive, using the Scrapling framework to bypass bot protection and Cloudflare Turnstile.

Features

  • Converts natural language queries into optimized Yandex.Archive search URLs.
  • Uses Scrapling (StealthyFetcher) to bypass Yandex's protection.
  • Extracts search results (document titles, text fragments/snippets, and direct links).
  • Supports pagination to collect multiple pages of results.
  • Can search across all three Yandex.Archive databases:
    • archive (Архивы): metric books, revision tales, confessional statements.
    • mass_media (Периодика): old newspapers (e.g., "Сенатские ведомости", "Губернские ведомости").
    • directories (Справочники): address calendars, lists of residents, memorable books.

Tools

yandex_archive_search

Search Yandex.Archive based on a text query.

Parameters:

  • query (string): The search query (e.g., "Александр Пушкин Москва").
  • index (string, optional): The section to search in. Options: archive (default), mass_media, directories.
  • max_pages (integer, optional): Maximum number of pages to parse (default 1).

Dependencies

  • scrapling
  • playwright
  • curl_cffi
  • patchright
  • msgspec
  • browserforge
Usage Guidance
This skill appears internally consistent with its stated purpose: it fetches and parses Yandex.Archive pages and uses Scrapling/Playwright to avoid bot protections. Before installing, consider that: (1) bypassing bot protection may violate Yandex's terms of service or local law, so make sure you have the right to scrape the target; (2) installing Playwright downloads browser binaries, and the listed Python packages are third-party code that will run on your system, so audit them and verify package sources (PyPI project pages, authors); (3) if you are concerned about third-party dependencies, run the skill in an isolated environment (container/VM); and (4) the skill itself requires no secrets, but if you modify it to integrate other services, re-evaluate the credentials it requests. Safer alternatives include site APIs, manual downloads, or permissioned data access.
Capability Analysis
Type: OpenClaw Skill
Name: yandex-archive-scraper
Version: 1.0.0

The skill is a specialized web scraper designed to extract historical records from Yandex.Archive. The code in search.py and get_page.py uses the Scrapling library to bypass bot detection and parse search results from Yandex's internal JSON state or HTML. There is no evidence of data exfiltration, malicious execution, or prompt injection; the scripts are focused entirely on the stated purpose of archival research.
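
To make "parse search results from Yandex's internal JSON state" concrete, here is a minimal sketch of the general technique: locate a JSON blob embedded in a script tag and map it to title/snippet/link records. The script tag id (`initial-state`), the `results` key, and the field names are all assumptions invented for this example. The real page structure and the logic in search.py are not shown in this listing and are likely different.

```python
import json
import re

# Hypothetical marker: assumes the page embeds its state as JSON inside
# <script id="initial-state">...</script>. The real markup may differ.
STATE_RE = re.compile(
    r'<script[^>]*id="initial-state"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_results(html: str) -> list[dict[str, str]]:
    """Pull title/snippet/url records out of an embedded JSON state blob."""
    match = STATE_RE.search(html)
    if match is None:
        return []  # a real scraper would fall back to parsing the HTML here
    state = json.loads(match.group(1))
    results = []
    for item in state.get("results", []):  # "results" is an assumed key
        results.append({
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
            "url": item.get("url", ""),
        })
    return results
```

Returning an empty list when no state blob is found keeps the function safe to call on pages that only carry rendered HTML.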
Capability Assessment
Purpose & Capability
Name/description (Yandex.Archive scraping, bypassing bot protection) align with the included Python scripts and declared dependencies (scrapling, playwright, etc.). The code only targets Yandex.Archive URLs and extracts site-specific JSON/HTML.
Instruction Scope
SKILL.md and README instruct installing the listed Python packages and using StealthyFetcher to fetch archive pages. The runtime instructions and scripts stay focused on constructing search URLs, fetching pages, and parsing results; they do not read unrelated files or environment variables.
Install Mechanism
The package is instruction-first and contains code files but no formal install spec. README suggests pip installing several packages and running 'playwright install chromium' — this will download browser binaries and execute third-party code (scrapling, browserforge). That is expected for a scraper but increases runtime footprint and risk from third-party packages.
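
Because there is no formal install spec, it can help to confirm that the listed packages are present before running the skill. The sketch below checks this without importing any of the third-party code. It assumes each package's import name matches the name in the Requirements list, and `missing_modules` is a hypothetical helper that is not part of the skill.

```python
from importlib.util import find_spec

# Import names assumed to match the names in the Requirements list.
REQUIRED_MODULES = (
    "scrapling",
    "playwright",
    "curl_cffi",
    "patchright",
    "msgspec",
    "browserforge",
)


def missing_modules(modules: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the top-level module names that cannot be located."""
    # find_spec only looks a top-level module up; it does not execute it.
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

This check covers only the Python packages; the Chromium binary that 'playwright install chromium' downloads has to be verified separately.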
Credentials
The skill requests no environment variables, no credentials, and accesses no system config paths. The lack of secret access is proportionate to a public-web scraping task.
Persistence & Privilege
Skill does not request always:true, does not attempt to modify other skills or agent-wide settings, and requires no persistent credentials. Autonomous invocation is allowed (platform default) but not combined with additional privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install yandex-archive-scraper
  3. After installation, invoke the skill by name or use /yandex-archive-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial ClawHub publication: search Yandex.Archive (metric books, newspapers, directories) with bot protection bypass.
Metadata
Slug yandex-archive-scraper
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Yandex Archive Scraper?

Search and extract data from Yandex.Archive (Яндекс.Архив) — metric books, newspapers, directories. Bypasses bot protection via Scrapling. It is an AI Agent Skill for Claude Code / OpenClaw, with 108 downloads so far.

How do I install Yandex Archive Scraper?

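Run "/install yandex-archive-scraper" in the OpenClaw or Claude Code chat. The skill's README additionally asks you to pip install the listed Python packages and run 'playwright install chromium', which downloads browser binaries.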

Is Yandex Archive Scraper free?

Yes, Yandex Archive Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Yandex Archive Scraper support?

Yandex Archive Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Yandex Archive Scraper?

It is built and maintained by Flo (@flobo3); the current version is v1.0.0.
