
EPUB reader

Author: gaojizhou · GitHub · v1.0.0
cross-platform ✓ Security check passed
Total downloads: 467
Favorites: 0
Currently installed: 3
Versions: 1
Install in OpenClaw
/install epub
Description
Use this skill whenever the user wants to read, parse, extract content from, modify, or otherwise process an .epub file. Triggers include any mention of ".ep...
Usage instructions (SKILL.md)

EPUB Processing Guide

Core Insight: EPUB is a ZIP Archive

An .epub file is simply a ZIP archive with a specific internal structure. The most reliable way to process any epub is:

  1. Copy the file to the working directory
  2. Rename it from .epub to .zip
  3. Unzip it into a folder
  4. Find and read the navigation/TOC file first (e.g. nav.xhtml, nav.html, toc.ncx)
  5. Then read content files as needed

This approach works for any EPUB that is a valid, unencrypted ZIP archive, and it requires no special EPUB libraries.
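
Because an EPUB is a ZIP archive, Python's standard zipfile module can also open it directly, without the rename step. The following is a minimal sketch for inspecting an archive before extracting it; the function names list_epub_entries and read_mimetype are illustrative and not part of the skill.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the archive member names of an EPUB without extracting it."""
    with zipfile.ZipFile(epub_path) as zf:
        return zf.namelist()


def read_mimetype(epub_path):
    """Return the text of the 'mimetype' entry, or None if it is missing."""
    with zipfile.ZipFile(epub_path) as zf:
        try:
            return zf.read("mimetype").decode("ascii").strip()
        except KeyError:
            # ZipFile.read raises KeyError when the member does not exist
            return None
```

For a well-formed EPUB, read_mimetype should return "application/epub+zip". Listing entries first is also a cheap way to spot suspicious paths before anything is written to disk.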


Step-by-Step Workflow

Step 1: Extract the EPUB

# Copy uploaded file to working directory
cp /mnt/user-data/uploads/book.epub /home/claude/book.epub

# Rename to .zip and extract
cp /home/claude/book.epub /home/claude/book.zip
unzip -o /home/claude/book.zip -d /home/claude/book_extracted/

# List the extracted contents
find /home/claude/book_extracted/ -type f | sort

Step 2: Find the Navigation File (Highest Priority)

The navigation file is the table of contents — it tells you the book's structure, chapter order, and file layout. Always find and read this first.

# Look for nav files (in priority order)
find /home/claude/book_extracted/ -type f \( \
  -name "nav.xhtml" -o \
  -name "nav.html" -o \
  -name "toc.ncx" -o \
  -name "*nav*" -o \
  -name "*toc*" \
\) | sort

Nav file priority order:

  1. nav.xhtml or nav.html — EPUB3 navigation document (preferred)
  2. toc.ncx — EPUB2 navigation control file (older format)
  3. Any file with "nav" or "toc" in its name

# Read the nav file to understand structure
cat /home/claude/book_extracted/OEBPS/nav.xhtml
# or
cat /home/claude/book_extracted/EPUB/nav.html

Step 3: Find the OPF Package File

The .opf file (Open Packaging Format) contains metadata and the full reading order manifest.

# Find the OPF file
find /home/claude/book_extracted/ -name "*.opf" | head -5

# Read it for metadata and spine (reading order)
cat /home/claude/book_extracted/OEBPS/content.opf

The <spine> element in the OPF file defines chapter reading order. The <metadata> block has title, author, language, etc.
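
To turn the spine into an ordered list of content files, each itemref's idref has to be looked up in the manifest. A minimal sketch, assuming the OPF uses the standard http://www.idpf.org/2007/opf namespace; spine_hrefs is a hypothetical helper name, not part of the skill.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_hrefs(opf_xml):
    """Return manifest hrefs in spine (reading) order."""
    root = ET.fromstring(opf_xml)
    # Map each manifest item id to its href
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    hrefs = []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        href = manifest.get(itemref.get("idref"))
        if href is not None:  # skip itemrefs with no matching manifest item
            hrefs.append(href)
    return hrefs
```

The returned hrefs are relative to the directory that contains the OPF file, so join them with that directory before opening the chapters.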

Step 4: Read Content Files

# Find all HTML/XHTML content files
find /home/claude/book_extracted/ -type f \( -name "*.html" -o -name "*.xhtml" \) | sort

# Read a specific chapter
cat /home/claude/book_extracted/OEBPS/chapter01.xhtml

To extract clean text from HTML content:

from bs4 import BeautifulSoup

with open("/home/claude/book_extracted/OEBPS/chapter01.xhtml", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")
    
# Remove script/style tags
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator="\n", strip=True)
print(text)

Typical EPUB Directory Structure

book_extracted/
├── mimetype                    ← Must contain "application/epub+zip"
├── META-INF/
│   └── container.xml           ← Points to the OPF file
└── OEBPS/   (or EPUB/, or OPS/)
    ├── content.opf             ← Package manifest + metadata + spine
    ├── nav.xhtml               ← ★ TABLE OF CONTENTS (read this first!)
    ├── toc.ncx                 ← Older TOC format (EPUB2)
    ├── chapter01.xhtml
    ├── chapter02.xhtml
    ├── ...
    ├── images/
    │   └── cover.jpg
    ├── css/
    │   └── styles.css
    └── fonts/

Reading container.xml to find the OPF path

cat /home/claude/book_extracted/META-INF/container.xml

This file always points to the root OPF file via <rootfile full-path="...">.
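
The same lookup can be done programmatically instead of reading the file by eye. A minimal sketch, assuming the container file uses the standard OCF namespace urn:oasis:names:tc:opendocument:xmlns:container; find_opf_path is an illustrative name.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the full-path of the first rootfile, or None if absent."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        return None
    return rootfile.get("full-path")
```

The returned path is relative to the root of the extracted archive, for example OEBPS/content.opf in the layout shown above.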


Common Tasks

Extract All Text (Full Book)

import os
from bs4 import BeautifulSoup

extracted_dir = "/home/claude/book_extracted/OEBPS"
output_text = []

# Sort content files by name as an approximation of reading order
# (the OPF spine gives the true order)
html_files = sorted([
    f for f in os.listdir(extracted_dir)
    if f.endswith((".html", ".xhtml")) and "nav" not in f.lower()
])

for filename in html_files:
    filepath = os.path.join(extracted_dir, filename)
    with open(filepath, "r", encoding="utf-8", errors="ignore") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    for tag in soup(["script", "style", "head"]):
        tag.decompose()
    text = soup.get_text(separator="\n", strip=True)
    output_text.append(f"\n\n--- {filename} ---\n\n{text}")

full_text = "\n".join(output_text)
with open("/mnt/user-data/outputs/book_full_text.txt", "w", encoding="utf-8") as f:
    f.write(full_text)

Extract Metadata

import xml.etree.ElementTree as ET

tree = ET.parse("/home/claude/book_extracted/OEBPS/content.opf")
root = tree.getroot()

# Namespace handling
ns = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc":  "http://purl.org/dc/elements/1.1/"
}

metadata = root.find("opf:metadata", ns)
if metadata is not None:
    title   = metadata.findtext("dc:title",    namespaces=ns)
    author  = metadata.findtext("dc:creator",  namespaces=ns)
    lang    = metadata.findtext("dc:language", namespaces=ns)
    pub     = metadata.findtext("dc:publisher",namespaces=ns)
    date    = metadata.findtext("dc:date",     namespaces=ns)
    print(f"Title:     {title}")
    print(f"Author:    {author}")
    print(f"Language:  {lang}")
    print(f"Publisher: {pub}")
    print(f"Date:      {date}")

Parse Table of Contents from nav.xhtml

from bs4 import BeautifulSoup

with open("/home/claude/book_extracted/OEBPS/nav.xhtml", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Find the nav element with epub:type="toc"
nav = soup.find("nav", attrs={"epub:type": "toc"}) or soup.find("nav")

if nav:
    print("=== Table of Contents ===")
    for a in nav.find_all("a"):
        print(f"  {a.get_text(strip=True)}  →  {a.get('href', '')}")
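
The href values printed above are relative to the navigation file and may carry a #fragment, so they usually need resolving before the target chapter can be opened. A minimal sketch; resolve_href is a hypothetical helper, and it assumes archive-style paths with forward slashes.

```python
import posixpath
from urllib.parse import unquote, urldefrag


def resolve_href(nav_path, href):
    """Return (archive_path, fragment) for an href found in nav_path."""
    target, fragment = urldefrag(href)
    target = unquote(target)  # decode percent-escapes such as %20
    if not target:
        # A bare "#anchor" points back into the navigation file itself
        return nav_path, fragment
    base_dir = posixpath.dirname(nav_path)
    return posixpath.normpath(posixpath.join(base_dir, target)), fragment
```

For example, an href of text/ch01.xhtml#sec2 found in OEBPS/nav.xhtml resolves to the file OEBPS/text/ch01.xhtml with the fragment sec2.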

Parse TOC from toc.ncx (EPUB2)

import xml.etree.ElementTree as ET

tree = ET.parse("/home/claude/book_extracted/OEBPS/toc.ncx")
root = tree.getroot()
ns = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

print("=== Table of Contents (NCX) ===")
for navpoint in root.findall(".//ncx:navPoint", ns):
    label = navpoint.findtext("ncx:navLabel/ncx:text", namespaces=ns)
    src   = navpoint.find("ncx:content", ns)
    href  = src.get("src") if src is not None else ""
    print(f"  {label}  →  {href}")

Extract Cover Image

# Find the cover image
find /home/claude/book_extracted/ -type f \( \
  -name "cover*" -o -name "*cover*" \
\) | grep -iE "\.(jpg|jpeg|png|gif|webp)$"

import shutil

# Copy cover to output
shutil.copy(
    "/home/claude/book_extracted/OEBPS/images/cover.jpg",
    "/mnt/user-data/outputs/cover.jpg"
)
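
When the cover is not named cover.*, the OPF file usually identifies it. A minimal sketch that checks the EPUB3 properties="cover-image" manifest attribute first and then falls back to the common EPUB2 convention of a <meta name="cover"> element pointing at a manifest id; find_cover_href is an illustrative name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml):
    """Return the cover image href from an OPF document, or None."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    # EPUB3: manifest item whose properties list includes "cover-image"
    for item in items:
        if "cover-image" in (item.get("properties") or "").split():
            return item.get("href")
    # EPUB2 convention: <meta name="cover" content="manifest-item-id"/>
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
            for item in items:
                if item.get("id") == cover_id:
                    return item.get("href")
    return None
```

As with spine entries, the href is relative to the directory that holds the OPF file.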

Repack a Modified EPUB

If you've edited files inside the extracted folder and want to repack:

cd /home/claude/book_extracted/

# mimetype MUST be first and uncompressed
zip -0 -X /home/claude/modified_book.epub mimetype

# Add everything else
zip -r /home/claude/modified_book.epub . --exclude mimetype

# Copy to output
cp /home/claude/modified_book.epub /mnt/user-data/outputs/modified_book.epub
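
Where the zip command is unavailable, the same ordering rule (mimetype first and stored uncompressed) can be reproduced with Python's zipfile module. A minimal sketch; repack_epub is a hypothetical helper, and it assumes out_path lies outside src_dir so the archive does not try to include itself.

```python
import os
import zipfile


def repack_epub(src_dir, out_path):
    """Zip src_dir into out_path with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # mimetype must be the first entry and must not be compressed
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for folder, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Archive names always use forward slashes
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as repack_epub("/home/claude/book_extracted", "/home/claude/modified_book.epub") mirrors the two zip commands above.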

Quick Reference

| Goal | File to Read | Tool |
| --- | --- | --- |
| Understand structure | META-INF/container.xml → OPF path | cat / xml.etree |
| Table of contents | nav.xhtml or nav.html (EPUB3) | BeautifulSoup |
| Table of contents (old) | toc.ncx (EPUB2) | xml.etree |
| Book metadata | *.opf <metadata> block | xml.etree |
| Reading order | *.opf <spine> block | xml.etree |
| Chapter text | *.xhtml / *.html in OEBPS/ | BeautifulSoup |
| Cover image | images/cover.* or OPF <item properties="cover-image"> | shutil.copy |

Required Python Packages

pip install beautifulsoup4 lxml --break-system-packages

unzip is available by default on the system. No special epub library is needed.


Troubleshooting

"No nav file found" — Try find . -name "*.xhtml" -o -name "*.html" | xargs grep -l "epub:type" 2>/dev/null to locate the navigation doc.

Encoding errors — Always use encoding="utf-8", errors="ignore" when opening HTML/XML files from epubs.

Namespace issues in XML — EPUB uses multiple XML namespaces. When using xml.etree, always pass the ns dict to find/findall, or use {namespace_uri}tagname syntax directly.
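
The two lookup styles mentioned above are equivalent, and an unqualified tag name matches nothing in a namespaced document. A minimal sketch using the OPF and Dublin Core namespaces from the metadata example; get_title is an illustrative name.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"


def get_title(opf_xml):
    """Look up dc:title twice, once per namespace style."""
    root = ET.fromstring(opf_xml)
    # Style 1: prefix map passed through the namespaces argument
    via_prefix = root.findtext("opf:metadata/dc:title",
                               namespaces={"opf": OPF, "dc": DC})
    # Style 2: fully qualified {namespace_uri}tagname
    via_uri = root.findtext(f"{{{OPF}}}metadata/{{{DC}}}title")
    return via_prefix, via_uri
```

If a lookup unexpectedly returns None, a missing or mistyped namespace is the first thing to check.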

Unusual directory layout — Check META-INF/container.xml first; it always provides the canonical path to the root OPF file, regardless of directory naming conventions.

Security recommendations
This skill appears coherent for reading and extracting EPUBs, but check a few things before using it:

  - Ensure the runtime has BeautifulSoup (bs4) and any other Python dependencies the examples use.
  - Be careful when extracting untrusted EPUBs: the instructions use unzip without sanitizing paths, so malicious ZIP entries can write outside the target folder (zip-slip). Prefer a safe extraction routine (validate paths, use Python's zipfile with path checks, or inspect archive contents first).
  - Confirm that the hard-coded paths (/mnt/user-data/uploads, /home/claude, /mnt/user-data/outputs) match your environment, or update them before running.
  - The skill reads and writes files but does not request network access or credentials; if you plan to process sensitive books, ensure your host enforces appropriate access controls.

If you need higher assurance, ask the maintainer to add safe-extraction guidance and to list the required runtime dependencies in SKILL.md.
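
The path-validation approach recommended above can be sketched with the standard library. This is one possible routine, not the skill author's method; safe_extract is a hypothetical name. Python's zipfile already strips leading slashes and ".." components when extracting, so the explicit check here rejects a suspicious archive outright instead of silently rewriting its paths.

```python
import os
import zipfile


def safe_extract(epub_path, dest_dir):
    """Extract epub_path into dest_dir, refusing entries that escape it."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(epub_path) as zf:
        # Validate every entry before writing anything to disk
        for info in zf.infolist():
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {info.filename!r}")
        zf.extractall(dest_root)
    return dest_root
```

Because all entries are checked before extractall runs, an archive with even one escaping path leaves the destination untouched.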
Functional analysis
Type: OpenClaw Skill
Name: epub
Version: 1.0.0

The skill is designed to process EPUB files, which involves standard operations like unzipping, parsing XML/HTML, and re-packaging. All file system operations are confined to the expected input (`/mnt/user-data/uploads/`), working (`/home/claude/`), and output (`/mnt/user-data/outputs/`) directories. There is no evidence of data exfiltration, unauthorized network calls, persistence mechanisms, or malicious prompt injection attempts. The use of `unzip` on user-provided files, while potentially a source of vulnerabilities like Zip Slip in a less secure environment, is a necessary and transparent action for the skill's stated purpose and does not indicate malicious intent by the skill author.
Capability assessment
Purpose & Capability
The name/description (EPUB reader) match the instructions: copying an .epub, treating it as a zip, extracting nav/opf/content files, and using HTML parsing to extract text/metadata. Nothing requested (no env vars, no installs) is out of scope for EPUB processing.
Instruction Scope
Instructions are concrete and focused on EPUB extraction. However, they include hard-coded paths (/mnt/user-data/uploads, /home/claude, /mnt/user-data/outputs) which are platform-specific, and they recommend using unzip -o without addressing archive safety (e.g., zip-slip/path traversal). They also show Python examples using bs4 but do not document that the environment needs BeautifulSoup installed.
Install Mechanism
Instruction-only skill with no install spec and no external downloads—lowest install risk.
Credentials
The skill requests no environment variables or credentials. It does read and write filesystem locations (uploads and outputs) which is expected for file-processing, but those accesses should be validated by the host system's sandboxing/policy.
Persistence & Privilege
always is false and the skill does not request persistent or elevated platform presence. It does not modify other skills or system-wide settings.
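How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install epub
  3. Once installed, invoke the Skill by name or trigger it with /epub
  4. Provide the required inputs as described in the Skill's parameter notes to get structured output
Version history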
v1.0.0
Initial release of the epub skill.

  - Enables extraction, parsing, and modification of .epub ebook files.
  - Uses robust unzip-then-parse workflow for reliable processing.
  - Supports reading chapters, table of contents, metadata, text, and images.
  - Allows conversion, inspection, and editing of epub content.
  - Handles both EPUB2 and EPUB3 file structures.

Metadata
Slug epub
Version 1.0.0
License
Total installs 3
Current installs 3
Historical versions 1
FAQ

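What is EPUB reader?

Use this skill whenever the user wants to read, parse, extract content from, modify, or otherwise process an .epub file. Triggers include any mention of ".ep... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 467 total downloads to date.

How do I install EPUB reader?

Run the command "/install epub" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is EPUB reader free?

Yes, EPUB reader is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does EPUB reader support?

EPUB reader is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed EPUB reader?

It is developed and maintained by gaojizhou (@gaojizhou); the current version is v1.0.0.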
