
Chinese Ebook Downloader

by lb1121 · GitHub ↗ · v2.0.0 · MIT-0
cross-platform · ⚠ suspicious
Downloads: 145 · Stars: 0 · Active Installs: 0 · Versions: 7
Install in OpenClaw
/install chinese-ebook-downloader
Description
Download Chinese-language ebooks from multiple sources with automatic A→B→C fallback. Primary source: online book library with ~100% coverage, no daily limit...
README (SKILL.md)

Chinese Ebook Downloader

Download Chinese ebooks from multiple sources with automatic fallback and format conversion.

Quick Start

# Single book download (multi-source fallback)
python scripts/download_book.py --title "超越百岁" --author "彼得·阿提亚"

# Multi-source batch download (A→B→C fallback + EPUB→PDF conversion)
python scripts/multi_source_download.py ~/Books/

# Search Anna's Archive directly
python scripts/search_source_c.py "书名" "作者"

# Convert EPUB to PDF
python scripts/epub_to_pdf.py book.epub book.pdf

Download Sources (Priority Order)

Source | Coverage | Limit | Notes
Source A (online book library) | ~100% | None | Primary; high coverage for popular Chinese books
Source B (secondary library) | ~8% | None | Fallback for missing titles
Source C (Anna's Archive) | Wide | Rate-limited | Last resort; uses libgen.li mirrors

Note: Z-Library is deprecated as a source because of its limit of 10 downloads per day.

Multi-Source Fallback

The multi_source_download.py script automatically tries sources in order:

Source A → Source B → Source C → EPUB→PDF Conversion

Workflow per book:

  1. Try Source A (ZIP → extract PDF/EPUB)
  2. If failed, try Source B (file host download)
  3. If failed, try Source C (Anna's Archive via libgen.li)
  4. If only EPUB found, auto-convert to PDF using weasyprint

Usage:

# Edit BOOKS list in script, then run:
python scripts/multi_source_download.py ~/Books/
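
The fallback order described above can be sketched as a small loop over source functions. This is a minimal illustration, not the code in multi_source_download.py: the `Fetcher` signature, the `download_with_fallback` name, and the convention of returning `None` for "not found" are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical fetcher signature (not the script's real API): takes
# (title, author, out_dir) and returns the saved path, or None if not found.
Fetcher = Callable[[str, str, Path], Optional[Path]]


def download_with_fallback(
    title: str,
    author: str,
    out_dir: Path,
    sources: Sequence[Tuple[str, Fetcher]],
) -> Tuple[str, Path]:
    """Try each (name, fetcher) pair in order and return the first success."""
    failures: List[str] = []
    for name, fetch in sources:
        try:
            path = fetch(title, author, out_dir)
        except Exception as exc:  # one failing source must not abort the chain
            failures.append(f"{name}: {exc}")
            continue
        if path is not None:
            return name, path
        failures.append(f"{name}: not found")
    # Every source failed: report what each one said.
    raise LookupError(f"no source had {title!r} ({'; '.join(failures)})")
```

Passing the sources as an ordered list keeps the A→B→C priority in one place, so adding or dropping a source does not change the loop.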

EPUB → PDF Conversion

When only EPUB format is available, auto-convert using weasyprint:

# Single file
python scripts/epub_to_pdf.py input.epub output.pdf

# Batch convert directory
python scripts/epub_to_pdf.py --batch ~/Books/

Requirements: ebooklib, weasyprint, CJK fonts installed.
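
Because these requirements are not installed automatically, a quick preflight check can save a failed run. The sketch below uses only the standard library; the function name `missing_prerequisites` is hypothetical and not part of the skill. It cannot detect whether CJK fonts are installed, so that still needs a manual check.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_prerequisites(modules: Iterable[str], binaries: Iterable[str]) -> List[str]:
    """Return a human-readable list of prerequisites that cannot be found."""
    missing = []
    for module in modules:
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(module) is None:
            missing.append(f"python module: {module}")
    for binary in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(binary) is None:
            missing.append(f"binary: {binary}")
    return missing
```

For the conversion step, calling `missing_prerequisites(["ebooklib", "weasyprint"], [])` before running epub_to_pdf.py would list whichever package is absent.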

Scripts Reference

Script | Purpose
download_book.py | Primary download from Source A
search_secondary_source.py | Source B search & download
search_source_c.py | Anna's Archive search & download
batch_download.py | Batch download from JSON list
multi_source_download.py | Multi-source A→B→C fallback
epub_to_pdf.py | EPUB/MOBI to PDF conversion
anna_iso_batch.sh | Anna's Archive isolated batch (one process per book)

Source A Workflow (Primary)

Search → Get file host link → Decrypt → Wait countdown → API fetch → curl download → Extract ZIP

Step 1: Search

Search the primary library for the book title. Navigate to download page, extract file host URL and password.

Step 2: Decrypt

Navigate to file host URL, enter password, click decrypt.

Step 3: Wait for countdown

The file hosting service requires a countdown before the download becomes available. Do not skip it.

Step 4: Fetch real download URL

Get page variables:

JSON.stringify({api_server, userid, file_id, share_id, file_chk, start_time, wait_seconds, verifycode})

Call API:

(async () => {
  var url = api_server + '/get_file_url.php?uid=' + userid
    + '&fid=' + file_id + '&folder_id=0&share_id=' + share_id
    + '&file_chk=' + file_chk + '&start_time=' + start_time
    + '&wait_seconds=' + wait_seconds + '&mb=0&app=0&acheck=0'
    + '&verifycode=' + verifycode + '&rd=' + Math.random();
  var headers = typeof getAjaxHeaders === 'function' ? getAjaxHeaders() : {};
  var resp = await fetch(url, {headers: headers});
  return JSON.stringify(await resp.json());
})()

Response: when code is 200, downurl is the real download URL.

Step 5: Download

curl -L -o "book.zip" "DOWNURL" \
  -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
  --max-time 1200

Step 6: Extract ZIP (GBK encoding)

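import os
import zipfile

with zipfile.ZipFile('book.zip', 'r') as z:
    for info in z.infolist():
        # zipfile decodes non-UTF-8 names as cp437; undo that and decode as GBK
        try:
            name = info.filename.encode('cp437').decode('gbk')
        except (UnicodeEncodeError, UnicodeDecodeError):
            name = info.filename  # name was not cp437/GBK: keep it unchanged
        ext = os.path.splitext(name)[1].lower()
        if ext in ('.epub', '.azw3', '.mobi', '.pdf', '.txt'):
            data = z.read(info.filename)
            # basename() drops directory parts, so files land in the current directory
            with open(os.path.basename(name), 'wb') as f:
                f.write(data)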

Book Name Matching Strategy

When a book title is long or contains multiple names (e.g. box sets):

  • Removes subtitles (after ":" or ":")
  • Removes parenthetical content ("(...)", "(...)")
  • Removes "套装共X册" bundle descriptions
  • Splits "+"-connected titles into individual books
  • Tries each keyword until match found
  • Falls back to full title + author

Examples:

  • "杨定一全部生命系列:真原医+静坐+好睡(套装3册)" → tries "真原医", "静坐", "好睡"
  • "超越百岁:长寿的科学与艺术" → tries "超越百岁", then "超越百岁 彼得·阿提亚"
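
The matching rules above can be sketched as a small keyword generator. This is an illustration under stated assumptions, not the logic in download_book.py: the function name `candidate_keywords` is hypothetical, and the handling of box sets (treating the text before the colon as a series name and splitting only the part after it) is inferred from the first example.

```python
import re
from typing import List

PARENS = re.compile(r"[((][^()()]*[))]")   # "(...)" and full-width "(...)"
BUNDLE = re.compile(r"套装共?\s*\d+\s*册")    # e.g. "套装共3册", "套装3册"
COLON = re.compile(r"[::]")
PLUS = re.compile(r"[++]")


def candidate_keywords(title: str, author: str = "") -> List[str]:
    """Return search keywords in the order they would be tried."""
    cleaned = BUNDLE.sub("", PARENS.sub("", title)).strip()
    parts = COLON.split(cleaned, maxsplit=1)
    if PLUS.search(cleaned):
        # Assumption: in a "+"-joined box set, the text before the colon
        # is a series name, so only the part after it is split.
        body = parts[1] if len(parts) == 2 else cleaned
        keys = [p.strip() for p in PLUS.split(body)]
        base = cleaned
    else:
        base = parts[0].strip()  # subtitle after the colon is dropped
        keys = [base]
    if author:
        keys.append(f"{base} {author}")  # last resort: title plus author
    # Drop empty strings and duplicates while keeping the order.
    seen, ordered = set(), []
    for key in keys:
        if key and key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered
```

With the two example titles this yields the keyword lists shown in the bullets: the three individual titles for the box set, and the short title followed by "title author" for the single book.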

Format Selection

Flag | Description
--format pdf | PDF only (default, preferred for NotebookLM)
--format epub | EPUB only
--format mobi | MOBI only
--format azw3 | AZW3 only
--format any | Accept any available format
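
A minimal sketch of how such a flag might be parsed and applied when picking files out of an archive. The names `build_parser`, `wanted_extensions`, and `select_members` are hypothetical, and treating "any" as the four listed ebook formats is an assumption; the real scripts may differ.

```python
import argparse
from typing import Iterable, List, Tuple

FORMATS = ("pdf", "epub", "mobi", "azw3")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="format selection sketch")
    # argparse rejects any value outside `choices`; pdf is the default.
    parser.add_argument("--format", dest="fmt", choices=FORMATS + ("any",), default="pdf")
    return parser


def wanted_extensions(fmt: str) -> Tuple[str, ...]:
    """Map a --format value to the file extensions it accepts."""
    if fmt == "any":
        return tuple("." + f for f in FORMATS)
    return ("." + fmt,)


def select_members(names: Iterable[str], fmt: str) -> List[str]:
    """Keep only the archive member names that match the requested format."""
    exts = wanted_extensions(fmt)
    return [n for n in names if n.lower().endswith(exts)]
```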

Batch Download

python scripts/batch_download.py --book-list books.json --output-dir ~/Books/

JSON format:

[
  {"title": "超越百岁", "file_url": "<file_host_url>", "password": "<password>"}
]

Features: resume via _progress.json, skip existing, rate limiting.
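
The resume and rate-limiting behaviour can be sketched as follows. The actual schema of _progress.json is not documented here, so this example assumes it is a JSON list of completed titles; `run_batch` and the `download_one` callback are hypothetical names, and skipping files that already exist on disk is left out.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Set


def run_batch(
    books: Iterable[Dict[str, str]],
    out_dir: str,
    download_one: Callable[[Dict[str, str], Path], bool],
    delay_seconds: float = 5.0,
) -> Set[str]:
    """Download each book once, recording finished titles so a rerun resumes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    progress_file = out / "_progress.json"
    done: Set[str] = set()
    if progress_file.exists():
        # Assumed format: a JSON list of titles that already succeeded.
        done = set(json.loads(progress_file.read_text(encoding="utf-8")))
    for book in books:
        title = book["title"]
        if title in done:
            continue  # finished in an earlier run
        if download_one(book, out):
            done.add(title)
            # Save after every success so an interruption loses nothing.
            progress_file.write_text(
                json.dumps(sorted(done), ensure_ascii=False), encoding="utf-8"
            )
        time.sleep(delay_seconds)  # simple rate limiting between requests
    return done
```

Writing the progress file after each success, not once at the end, is what makes an interrupted run resumable.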

Troubleshooting

Problem | Solution
IP blocking | Use browser tool, not web_fetch
Link 404 | Link expired, re-search
API non-200 | Re-navigate and re-decrypt
Download is HTML | URL expired, fresh API call needed
ZIP filenames garbled | Use Python cp437→gbk, not unzip
Timeout on large files | Increase --max-time to 1200
Anna's Archive blocked | Try different mirror, use anna_iso_batch.sh
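
The "Download is HTML" case can be detected before extraction by inspecting the file itself, not its name. A minimal sketch using only the standard library; the function name `classify_download` is hypothetical.

```python
import zipfile


def classify_download(path: str) -> str:
    """Return "zip", "html", or "unknown" based on the file's content."""
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        head = f.read(512)
    # Ignore a UTF-8 byte-order mark and leading whitespace before the markup.
    head = head.lstrip(b"\xef\xbb\xbf").lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

If the result is "html", the saved file is an error or expiry page, which matches the table's advice to make a fresh API call.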
Usage Guidance
Key points before you install/use:

  • Functional fit: The skill appears to do what it claims (automated ebook search/download + conversion). The included scripts perform browser automation, decrypt file-host pages, call APIs, and download files.
  • Missing/declarative mismatches: The registry says 'no required binaries' but the code needs: Playwright (and a browser runtime), Python packages (playwright, ebooklib, weasyprint), system tools (curl, unzip, file), and CJK fonts. Several scripts contain a hard-coded Python interpreter path (/opt/homebrew/...), which will likely fail on other systems. Expect to manually install dependencies and edit paths.
  • Security surface: The automation executes JavaScript extracted from third-party pages (page.evaluate), launches headless browsers, and executes shell commands (curl, unzip, subprocess.run). That is expected for this downloader but increases risk: a malicious or compromised download page could trigger unexpected network requests or server‑side interactions. To reduce risk, run this skill only in an isolated environment (container, VM, or dedicated machine), review the code paths that call page.evaluate and subprocess.run, and avoid running with elevated privileges.
  • Legal and policy: The skill is designed to retrieve ebooks from sites and file hosts (including Anna's Archive/libgen mirrors). That may conflict with copyright law or your organization's acceptable-use policy. Confirm legality and policy compliance before using.
  • Practical recommendations:
      • Install and test dependencies in a sandbox (virtualenv/conda, container). Follow README for Playwright setup.
      • Replace or remove hard-coded PYTHON paths and verify environment values (SOURCE_* variables) point to expected hosts.
      • Inspect and, if desired, restrict network access for the process (e.g., block outbound except to known sources) when testing.
      • If you want to use it as an OpenClaw skill, add an explicit install step and declare required binaries and env vars so the runtime can validate prerequisites.
Capability Analysis
Type: OpenClaw Skill
Name: chinese-ebook-downloader
Version: 2.0.0

The bundle provides automated ebook downloading from various Chinese sources using Playwright for browser automation and subprocess calls for file handling. It is classified as suspicious due to high-risk patterns: executing remote JavaScript via 'page.evaluate' to bypass file-host protections and passing externally sourced URLs and filenames to shell commands (curl, unzip) in 'download_book.py' and 'multi_source_download.py'. While these behaviors align with the stated purpose of scraping and downloading, they represent a significant attack surface for command injection if the targeted sites or search results are compromised. No explicit evidence of intentional data exfiltration or persistence was found.
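
One common way to narrow the command-injection surface described above is to validate external values and pass them to the subprocess as an argument list, never through a shell string. The sketch below is an illustration of that idea, not code from the bundle; `build_curl_command` is a hypothetical helper, and it uses only the curl flags that already appear in the README (-L, -o, --max-time).

```python
import os
from pathlib import Path
from typing import List
from urllib.parse import urlsplit


def build_curl_command(url: str, out_dir: str, filename: str, max_time: int = 1200) -> List[str]:
    """Build an argv list for curl from untrusted url and filename values."""
    parts = urlsplit(url)
    # Only plain web URLs are accepted; this also rejects values such as
    # "-o something" that would otherwise be read as curl options.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    # Keep only the final path component so "../" cannot escape out_dir.
    safe_name = os.path.basename(filename.replace("\\", "/"))
    if safe_name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {filename!r}")
    dest = Path(out_dir) / safe_name
    return ["curl", "-L", "-o", str(dest), "--max-time", str(max_time), url]
```

The returned list is meant to be passed to `subprocess.run(cmd, check=True)` without `shell=True`, so no part of the URL or filename is ever interpreted by a shell.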
Capability Assessment
Purpose & Capability
Name/description align with the included scripts: the code implements multi‑source search, browser automation (Playwright), decrypting file hosts, curl downloads, ZIP extraction and EPUB→PDF conversion. However the package metadata claims 'no required binaries' and 'no required env vars' while the code clearly depends on external tooling and libraries (Playwright, curl, unzip, file, weasyprint/ebooklib, CJK fonts) and even references a hard‑coded Python interpreter path. This mismatch between declared requirements and what the code needs is an incoherence the user must resolve.
Instruction Scope
SKILL.md and the scripts instruct the agent (and the user) to automate browser interactions: enter passwords, wait countdowns, extract JS variables from pages and run JS via page.evaluate to call file-host APIs, then download with curl and extract files. Those steps are within the downloader's purpose, but the automation intentionally executes extracted/constructed JS in a browser context and runs arbitrary downloads and subprocesses. That increases the attack surface (malicious remote pages could cause unexpected network activity). The instructions do not ask the agent to read unrelated system files or credentials.
Install Mechanism
There is no install spec (instruction-only install), but the bundle includes many Python scripts which require installing dependencies manually. README mentions Playwright and pip packages, but the registry metadata declared no required binaries. Several scripts assume system binaries exist (curl, unzip, file) and one shell script and multiple Python scripts hard-code an absolute PYTHON path (/opt/homebrew/.../env9/bin/python) and paths under ~/.openclaw/workspace — these are brittle and incoherent with a cross‑platform skill. Lack of an install step means users may run these scripts with missing dependencies or unexpected interpreter versions.
Credentials
The skill does not declare required environment variables in the registry manifest, but the README and code reference optional env vars (SOURCE_A_BASE_URL, SOURCE_B_BASE_URL, FILE_HOST_BASE_URL, EBOOK_DEFAULT_PASSWORD). These are reasonable for configuring source hosts and a default extraction password. The skill does not request unrelated secrets (AWS keys, tokens). Still, default passwords and host base URLs can be changed via env; ensure you don't accidentally set sensitive values there.
Persistence & Privilege
The skill is not always-enabled and will not autonomously be force‑included in all agent runs (always: false). It does not modify other skills or global agent settings. It does read and write files in user directories (/tmp and under the user's home) which is expected for a downloader.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install chinese-ebook-downloader
  3. After installation, invoke the skill by name or use /chinese-ebook-downloader
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
Major upgrade: adds multi-source fallback, EPUB→PDF conversion, Anna's Archive support, and new scripts.
  • Introduced automatic multi-source (A→B→C) download fallback with `multi_source_download.py`
  • Added direct Anna's Archive (libgen.li) search and download as last-resort source
  • Implemented EPUB to PDF auto-conversion using `epub_to_pdf.py` and weasyprint
  • Deprecated Z-Library due to strict download limits; Anna's Archive replaces it as tertiary source
  • Added new scripts: `multi_source_download.py`, `epub_to_pdf.py`, `search_source_c.py`, `anna_async_download.py`, `anna_batch_download.py`, `anna_iso_batch.sh`
  • Updated documentation: streamlined workflows, expanded troubleshooting, and modernized usage instructions
v1.2.0
Format selection support added for downloads.
  • Added `--format` flag to both download scripts for selecting file format (PDF, EPUB, MOBI, AZW3, or any).
  • Downloads extract only the requested format from ZIP archives; multi-source fallback when format unavailable.
  • Failure now reports which formats are available from each source.
  • Batch download supports global format selection via command-line argument.
  • Updated documentation to explain format flags and workflow enhancements.
v1.1.1
  • Minor updates to scripts/download_book.py with no user-facing changes documented.
  • No SKILL.md or feature updates in this version.
v1.1.0
Improved search and matching for books with complex titles or box sets.
  • Added intelligent book name extraction and matching: removes subtitles, parentheticals, bundle descriptions, and splits multi-book titles for more accurate results.
  • Updated documentation with new "Book Name Matching Strategy" and examples.
  • No breaking changes to batch or single download usage.
  • Internal logic improvements to scripts for improved searching with challenging book titles.
v1.0.2
Renamed search_yabook to search_secondary_source, removed hardcoded passwords, and cleaned git history.
v1.0.1
Generalization and terminology update for v1.0.1
  • Replaced hardcoded sources (domain names, passwords) with generic terms and descriptions throughout documentation.
  • Updated all instances of "ctfile" and "dushupai" to "file hosting service" and "online book library" for flexibility.
  • Adjusted example commands, code snippets, JSON inputs, and tables to use generalized parameter names.
  • Refined feature and troubleshooting descriptions to avoid single-source references.
  • No changes to core functionality or scripts beyond terminology updates.
v1.0.0
Initial release of Chinese Ebook Downloader.
  • Download Chinese-language ebooks from multiple free sources, primarily dushupai.com (supports ctfile.com/城通网盘, nearly full coverage of popular books, no daily limit).
  • Handles ctfile password decryption, 60-second countdown, JS API extraction for real download URL, and GBK-encoded ZIP extraction.
  • Supports batch and single-book download in PDF, EPUB, MOBI, and AZW3 formats.
  • Secondary sources: yabook.org and Z-Library as fallbacks.
  • Troubleshooting guidelines for common download and extraction issues included.
Metadata
Slug: chinese-ebook-downloader
Version: 2.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 7
Frequently Asked Questions

What is Chinese Ebook Downloader?

Download Chinese-language ebooks from multiple sources with automatic A→B→C fallback. Primary source: online book library with ~100% coverage, no daily limit... It is an AI Agent Skill for Claude Code / OpenClaw, with 145 downloads so far.

How do I install Chinese Ebook Downloader?

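Run "/install chinese-ebook-downloader" in the OpenClaw or Claude Code chat to install the skill. The bundle has no install step for its dependencies, so Playwright, the Python packages (ebooklib, weasyprint), the system tools (curl, unzip, file), and CJK fonts must be installed manually.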

Is Chinese Ebook Downloader free?

Yes, Chinese Ebook Downloader is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Chinese Ebook Downloader support?

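Chinese Ebook Downloader is listed as cross-platform and runs wherever OpenClaw / Claude Code is available, although several scripts hard-code an interpreter path (/opt/homebrew/...) that must be edited on other systems.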

Who created Chinese Ebook Downloader?

It is built and maintained by lb1121 (@lb1121); the current version is v2.0.0.
