sunfirehw

TXT电子书清洗修复 (TXT E-book Cleaning and Repair)

by SunYin · GitHub ↗ · v4.1.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 125
Stars: 0
Active Installs: 1
Versions: 5
Install in OpenClaw
/install good-txt-to-hwreader
Description
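Cleans and repairs mojibake, ads, and formatting problems in pirated TXT e-books. Supports an AI-enhanced mode that can detect non-standard ads, repair complex mojibake, and recognize non-standard chapter formats. Trigger phrases: txt清理, 电子书修复, 去广告, 修乱码, 排版修复, 清理txt, 修复电子书, txt乱码, txt广告.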
README (SKILL.md)

Good Txt To Hwreader

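Converts pirated TXT e-books into a clean, consistently formatted text for reading.

✨ New in v4.0: AI-enhanced features, including intelligent ad detection, complex mojibake repair, and non-standard chapter recognition.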

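Trigger Keywords

Users can trigger this skill with any of the following keywords:

| Keyword | Example |
|---------|---------|
| txt清理 (clean TXT) | Help me clean up this txt file |
| 电子书修复 (repair e-book) | Repair this e-book |
| 去广告 (remove ads) | Remove the ads from the txt |
| 修乱码 (fix mojibake) | Fix the mojibake in the txt |
| 排版修复 (fix formatting) | Fix the txt formatting |
| txt乱码 (TXT mojibake) | This txt has mojibake |
| txt广告 (TXT ads) | The txt is full of ads |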

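Processing Modes

This skill supports three processing modes; choose one according to your needs:

| Mode | Speed | Accuracy | AI features | Best for |
|------|-------|----------|-------------|----------|
| fast | ⚡ Fastest | 85% | All off | Quick previews, large batches |
| balanced | 🔄 Balanced | 92% | Ad detection + mojibake repair | Everyday use (default) |
| thorough | 🎯 Most accurate | 98% | All on | Important files, complex mojibake |

Usage

Clean this txt file (using thorough mode)
Clean this e-book in fast mode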
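
The mode table above can be read as a mapping from mode name to AI feature switches. The following is a minimal sketch of that mapping, not the skill's actual code: `ModeFeatures`, `MODE_FEATURES`, and `features_for` are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class ModeFeatures(NamedTuple):
    ad_detection: bool
    mojibake_fix: bool
    chapter_detection: bool


# Mirrors the mode table: fast = all AI off, balanced = ads + mojibake,
# thorough = all AI on. The names are illustrative, not the skill's own.
MODE_FEATURES = {
    "fast": ModeFeatures(False, False, False),
    "balanced": ModeFeatures(True, True, False),
    "thorough": ModeFeatures(True, True, True),
}


def features_for(mode: str = "balanced") -> ModeFeatures:
    """Return the AI feature switches for a mode name (hypothetical helper)."""
    if mode not in MODE_FEATURES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_FEATURES)}"
        )
    return MODE_FEATURES[mode]
```

Defaulting to `balanced` follows the table, which marks that mode as the default.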

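User Input Methods

Method 1: Specify the file path directly

Clean /path/to/book.txt
Repair e-book ~/Downloads/novel.txt

Method 2: Fuzzy search for txt files on the phone

When the user says:

  • Clean a txt (no specific file given)
  • Find a txt file and clean it for me
  • Repair some e-book

Steps

  1. Use the search_file tool to search for txt files on the user's device
  2. List the matching files for the user to choose from
  3. Run the cleanup after the user confirms

Method 3: Search by keyword

Clean the txt containing "斗破"
Repair the e-book whose file name contains "修仙"

Steps

  1. Use the search_file tool to search by keyword
  2. List the matching results
  3. Run the cleanup after the user chooses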

Dependencies

  • Python 3.6+
  • chardet library: pip install chardet
  • PyYAML library: pip install pyyaml (AI-enhanced mode)
  • requests library: pip install requests (AI-enhanced mode)

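Processing Pipeline

Stage 1: File acquisition

  1. Use search_file to search for txt files on the user's phone
  2. Use upload_file to upload the file to the cloud and obtain a URL
  3. Use curl to download it into the working directory

Stage 2: Cleaning and repair

Rule engine (all modes)

  1. Encoding detection: automatically identifies GBK, UTF-8, GB2312, and other encodings
  2. Ad removal: matches 40+ common ad patterns
  3. Mojibake repair: replaces 30+ common mojibake characters
  4. Formatting normalization: unifies chapter headings and paragraph formatting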
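
The ad-removal, mojibake-repair, and formatting steps of the rule engine can be sketched as plain string operations. This is an illustrative sketch only: the two ad patterns and three mojibake entries below are hypothetical samples, and the skill's real rule lists (references/ads_patterns.md, references/mojibake_patterns.md) are not reproduced here.

```python
import re

# Hypothetical sample rules, not the skill's actual lists.
AD_PATTERNS = [
    # Bare URLs; restricted to ASCII so adjacent Chinese text is not swallowed.
    re.compile(r"(?:https?://|www\.)[A-Za-z0-9./_%-]+", re.IGNORECASE),
    # "This book was compiled/provided/first published by ..." style credits.
    re.compile(r"本书由.{0,20}(?:整理|提供|首发)"),
]
MOJIBAKE_MAP = {
    "\ufffd": "",     # Unicode replacement character left by a lossy decode
    "锟斤拷": "",      # UTF-8 bytes of U+FFFD re-read as GBK
    "&nbsp;": " ",    # leftover HTML entity
}


def clean_line(line: str) -> str:
    """Strip ad patterns, then replace known mojibake, then trim whitespace."""
    for pattern in AD_PATTERNS:
        line = pattern.sub("", line)
    for bad, good in MOJIBAKE_MAP.items():
        line = line.replace(bad, good)
    return line.strip()


def clean_text(text: str) -> str:
    """Clean every line and drop lines that end up empty."""
    cleaned = (clean_line(line) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)
```

For instance, `clean_text("第一章 开始\n本书由某某网整理\n正文锟斤拷内容 www.example.com\n")` yields `"第一章 开始\n正文内容"`.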

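AI-enhanced processing (balanced/thorough modes)

  1. Intelligent ad detection: the LLM identifies ad variants, disguised ads, and ads for new platforms
  2. Complex mojibake repair: the LLM infers the correct characters from context
  3. Intelligent chapter recognition: the LLM recognizes non-standard chapter formats (thorough mode only)

Stage 3: Output

  1. Send the file to the user: use send_file_to_user to send the cleaned file
  2. Output the repair report: present the results as a concise Markdown table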

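Output Report

After the cleanup finishes, the assistant parses the script output and generates a concise table:

# TXT Cleanup Report

## Basic Information

| Item | Result |
|------|------|
| Original length | 199,044 characters |
| Cleaned length | 198,702 characters |
| Content removed | 342 characters (0.17%) |
| Processing mode | balanced |
| AI enhancement | Enabled |

## Cleanup Details

| Item | Count |
|------|------|
| Ads removed | 5 |
| Mojibake fixes | 12 |
| Chapters recognized | 50 |

## Performance

| Item | Value |
|------|------|
| Processing time | 2.35 s |
| LLM calls | 3 |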
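
The removed-content figure in the sample report follows directly from the two lengths. A small worked sketch, with `removal_stats` as a hypothetical helper name:

```python
from typing import Tuple


def removal_stats(original_len: int, cleaned_len: int) -> Tuple[int, float]:
    """Return (characters removed, percentage of the original removed)."""
    removed = original_len - cleaned_len
    percent = removed / original_len * 100 if original_len else 0.0
    return removed, percent


removed, percent = removal_stats(199044, 198702)
summary = f"{removed:,} characters ({percent:.2f}%)"
print(summary)  # 342 characters (0.17%)
```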

Resources

scripts/

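  • clean_txt.py: rule-engine cleaning script
  • ai_enhanced_cleaner.py: AI-enhanced cleaning script (main entry point)
  • ai_modules/: AI enhancement modules
    • ad_detector.py: ad detection module
    • mojibake_fixer.py: mojibake repair module
    • chapter_parser.py: chapter recognition module
  • utils/: utility modules
    • llm_client.py: LLM client wrapper

config/

  • ai_config.yaml: AI enhancement configuration file

references/

  • ads_patterns.md: list of common ad patterns
  • mojibake_patterns.md: mapping table of common mojibake
  • learned_mojibake_rules.json: learned mojibake rules (auto-generated)

assets/

  • chapter_template.txt: standard chapter format template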

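Complete Examples

Example 1: Rule-engine mode (fast)

User: Clean the 三体 txt file in fast mode

Execution flow:

1. search_file(query="三体 txt")
   → Found: /storage/.../三体.txt

2. upload_file(fileInfos=[{"mediaUri": "file://docs/..."}])
   → Obtain a public URL

3. curl -o "三体.txt" "URL"
   → Download into the working directory

4. python3 scripts/ai_enhanced_cleaner.py -m fast "三体.txt"
   → Produces: 三体_清理版.txt

5. send_file_to_user(fileLocalUrls=["三体_清理版.txt"])
   → Send the cleaned file to the user

Example 2: AI-enhanced mode (balanced)

User: Clean this txt file, it has mojibake

Execution flow:

1. search_file(query="txt")
   → List files for the user to choose from

2. upload_file + curl
   → Download the file

3. python3 scripts/ai_enhanced_cleaner.py -m balanced "book.txt"
   → Rule-engine pre-processing
   → AI ad detection
   → AI mojibake repair
   → Rule-engine post-processing
   → Produces: book_清理版.txt

4. send_file_to_user + report

Example 3: Deep-clean mode (thorough)

User: Clean this novel in thorough mode, the chapter formatting is a mess

Execution flow:

1. Obtain the file

2. python3 scripts/ai_enhanced_cleaner.py -m thorough "novel.txt"
   → Rule-engine pre-processing
   → AI ad detection
   → AI mojibake repair
   → AI chapter recognition and normalization
   → Rule-engine post-processing
   → Produces: novel_清理版.txt + novel_清理版_报告.md

3. Send the file and the report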

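AI Enhancement Features in Detail

1. Intelligent ad detection

| Feature | Description |
|---------|-------------|
| Ad variants | Detects ads that deliberately insert noise characters |
| Disguised ads | Detects promotional content disguised as body text |
| New-platform ads | Detected without predefined rules |
| Batch processing | Paragraphs are sent in batches of 10 to reduce API calls |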
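
Grouping paragraphs into batches of 10 means one LLM request covers up to ten paragraphs instead of one. A minimal sketch of the batching step alone, assuming paragraphs are already split into a list; `batch_paragraphs` is a hypothetical name and the LLM call itself is left out:

```python
from typing import Iterator, List


def batch_paragraphs(paragraphs: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size paragraphs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paragraphs), batch_size):
        yield paragraphs[start:start + batch_size]


paragraphs = [f"paragraph {i}" for i in range(25)]
batches = list(batch_paragraphs(paragraphs))
print([len(batch) for batch in batches])  # [10, 10, 5]
```

With 25 paragraphs this gives three requests rather than 25.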

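2. Complex mojibake repair

| Feature | Description |
|---------|-------------|
| Context inference | Infers the correct characters from the surrounding meaning |
| Rule learning | High-confidence fixes are saved automatically as new rules |
| Tiered processing | Rules first, AI as a supplement |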
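
The "rules first, AI as a supplement" tiering together with rule learning can be sketched as a lookup with a fallback. This is an assumption-laden sketch: `fix_fragment`, the `ai_fix` callback signature, and the choice to leave low-confidence fragments unchanged are all hypothetical, and the sketch does not persist anything to learned_mojibake_rules.json.

```python
from typing import Callable, Dict, Tuple


def fix_fragment(
    fragment: str,
    rules: Dict[str, str],
    ai_fix: Callable[[str], Tuple[str, float]],
    learn_threshold: float = 0.7,
) -> str:
    """Apply a known rule if one exists; otherwise ask the AI fallback.

    ai_fix is a stand-in for the LLM call and returns (fixed_text, confidence).
    Fixes at or above learn_threshold are stored in rules so the next
    occurrence is handled without another AI call (assumed behaviour).
    """
    if fragment in rules:
        return rules[fragment]
    fixed, confidence = ai_fix(fragment)
    if confidence >= learn_threshold:
        rules[fragment] = fixed
        return fixed
    # Assumption: a low-confidence suggestion is discarded, text stays as-is.
    return fragment
```

Because learned rules are consulted before the fallback, repeated garbled fragments cost only one AI call each.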

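3. Intelligent chapter recognition

| Feature | Description |
|---------|-------------|
| Non-standard formats | Recognizes chapter-heading variants |
| Structure analysis | Analyzes the full text and extracts a chapter list |
| Heading normalization | Unifies the chapter heading format |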
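
For headings that are only mildly irregular, normalization can be done with a regular expression before any LLM is involved. The sketch below is an illustration under assumptions: the pattern, the 30-character title cap, and the target form "第N章 title" are hypothetical choices, since the contents of assets/chapter_template.txt are not shown here.

```python
import re
from typing import Optional

# Matches lines such as "  第 12 章:风起" or "第十二回"; the title is capped at
# 30 characters so ordinary sentences starting with "第…章" are less likely to match.
HEADING_RE = re.compile(
    r"^\s*第\s*([0-9零〇一二两三四五六七八九十百千]+)\s*([章回节卷])"
    r"\s*[::、.\-]?\s*(.{0,30}?)\s*$"
)


def normalize_heading(line: str) -> Optional[str]:
    """Return a normalized heading, or None if the line is not a heading."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    number, unit, title = match.groups()
    return f"第{number}{unit} {title}".rstrip()
```

Here `normalize_heading("  第 12 章:风起")` returns `"第12章 风起"`, while a line that does not start with a chapter marker returns `None`.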

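Common Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| File too large | Larger than 10 MB | Process it in parts or notify the user |
| Encoding not recognized | Unusual encoding | Try several encodings; use errors='replace' |
| Too much mojibake | Wrong encoding | Use thorough mode for AI repair |
| Inaccurate chapter recognition | Irregular formatting | Use thorough mode for AI recognition |
| Slow processing | AI mode | Use fast mode or balanced mode |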
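
The first two rows of the table (size limit, trying several encodings with errors='replace' as the last resort) can be sketched with the standard library alone. Note that this sketch deliberately uses a fixed candidate list rather than chardet, and that `CANDIDATE_ENCODINGS`, `is_too_large`, and `decode_bytes` are hypothetical names.

```python
from typing import Tuple

MAX_BYTES = 10 * 1024 * 1024  # the 10 MB limit from the table
# Assumed candidate order; gb18030 is a superset of GBK and GB2312.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030")


def is_too_large(size_in_bytes: int) -> bool:
    """True when the file should be split or the user notified."""
    return size_in_bytes > MAX_BYTES


def decode_bytes(raw: bytes) -> Tuple[str, str]:
    """Try each candidate strictly, then fall back to a lossy UTF-8 decode."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Trying UTF-8 strictly first matters because almost any byte sequence decodes under gb18030 without error, so the more permissive codec has to come last.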

Configuration

The configuration file is config/ai_config.yaml and can be customized:

# Processing mode
mode: "balanced"  # fast / balanced / thorough

# AI feature switches
ai_enhancement:
  ad_detection:
    enabled: true
    confidence_threshold: 0.8
  mojibake_fix:
    enabled: true
    confidence_threshold: 0.7
    auto_learn: true
  chapter_detection:
    enabled: false

# LLM settings
llm:
  provider: "xiaoyi"
  model: "glm-4-flash"
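
One plausible reading of `enabled` and `confidence_threshold` is that a feature's AI suggestions are applied only when the feature is on and the suggestion's confidence reaches the threshold. The sketch below illustrates that reading with a plain dict standing in for the parsed `ai_enhancement` section; the `accepted` helper and the (item, confidence) input shape are assumptions, not the skill's API.

```python
from typing import Dict, List, Tuple

# Stand-in for the parsed ai_enhancement section of ai_config.yaml.
AI_ENHANCEMENT: Dict[str, dict] = {
    "ad_detection": {"enabled": True, "confidence_threshold": 0.8},
    "mojibake_fix": {"enabled": True, "confidence_threshold": 0.7, "auto_learn": True},
    "chapter_detection": {"enabled": False},
}


def accepted(
    feature: str,
    candidates: List[Tuple[str, float]],
    config: Dict[str, dict] = AI_ENHANCEMENT,
) -> List[str]:
    """Keep the items whose confidence meets the feature's threshold.

    Returns an empty list when the feature is disabled or unknown.
    """
    settings = config.get(feature, {})
    if not settings.get("enabled", False):
        return []
    threshold = settings.get("confidence_threshold", 0.0)
    return [item for item, confidence in candidates if confidence >= threshold]


print(accepted("ad_detection", [("ad line", 0.95), ("maybe ad", 0.6)]))  # ['ad line']
```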

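Important Notes

📖 One-tap import to your bookshelf

After receiving the cleaned file, you can:

  1. Long-press the file in the chat
  2. Choose "Share"
  3. Choose "华为阅读" (Huawei Books)

This imports the book to your bookshelf in one step, ready to read in its repaired form.


Skill version: 4.1.0 (comprehensive expansion of ad and mojibake rules; LLM sub-session integration)
Updated: 2026-03-29

Version History

See CHANGELOG.md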

Usage Guidance
This skill appears to implement the advertised TXT cleaning functionality, but it will: (1) search your device for TXT files, (2) upload selected files to get a public URL, and (3) send text to an LLM via a bundled llm_client. Before installing or running: review scripts/utils/llm_client.py to see exactly which endpoints and credentials are used; confirm where upload_file sends files (cloud provider/URL) and whether that meets your privacy/copyright constraints; if you must avoid sending content off-device, run only the rule-based 'fast' mode locally or run the Python scripts in a sandbox you control; ensure required Python and pip packages are installed from trusted sources; and be cautious about automatic learning/persistence files (learned_mojibake_rules.json, logs) which may store excerpts of processed text. If you need a safer setup, request a version that guarantees local-only processing (no uploads/subagent LLM calls) and documents dependency installation.
Capability Analysis
Type: OpenClaw Skill
Name: good-txt-to-hwreader
Version: 4.1.0
The skill bundle is a comprehensive tool for cleaning and formatting TXT e-books, utilizing both regex-based rules and AI-enhanced processing. It implements sophisticated logic to detect ads, fix encoding errors (mojibake), and normalize chapter structures. The AI features are powered by a sub-agent communication pattern, calling the 'openclaw' CLI via subprocess in 'llm_client.py' to access LLM capabilities. The code is well-structured, follows its stated purpose, and contains no evidence of malicious intent, data exfiltration, or unauthorized system access.
Capability Assessment
Purpose & Capability
Name/description match the included scripts: the repo contains rule-based cleaners and AI modules for ad detection, mojibake fixing, and chapter parsing — these are appropriate for a txt-cleaning skill. Minor inconsistency: SKILL metadata lists no required binaries/env, yet SKILL.md and scripts assume Python 3.x and third‑party libraries (chardet, pyyaml, requests).
Instruction Scope
Runtime instructions explicitly instruct the agent to search the user's device for TXT files (search_file), upload selected files to a cloud URL (upload_file) and then use curl to download them into the working directory, and to call an LLM for AI-enhanced processing. Those steps necessarily transmit user file content outside the local environment; while needed to process arbitrary user files, the instructions do not clearly document where uploads/LLM requests are sent or what privacy protections apply.
Install Mechanism
There is no install spec (instruction-only), which lowers supply-chain risk, but the skill includes many Python scripts and a non-trivial LLM client (scripts/utils/llm_client.py). Dependencies are listed in SKILL.md but not enforced by the registry metadata. The absence of an install step means the runtime environment (Python libs, correct versions) is assumed rather than provisioned — potential runtime failures or silent use of system Python.
Credentials
The skill declares no required environment variables or credentials, yet it will perform LLM calls (via a bundled llm_client) and upload files to a cloud URL. LLM client behavior and upload target are determined by config (ai_config.yaml uses 'openclaw-subagent' by default) and by the unshown llm_client implementation; this can lead to unadvertised network endpoints and possible exfiltration of file contents or sensitive text. No explicit API key or endpoint is declared in the skill metadata.
Persistence & Privilege
always:false and no modifications of other skills or global agent settings were observed. The skill writes learned rules and logs to local files (e.g., learned_mojibake_rules.json, ai_enhancement.log) which is expected for a learning/cleanup tool but should be considered when evaluating disk storage/privacy.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install good-txt-to-hwreader
  3. After installation, invoke the skill by name or use /good-txt-to-hwreader
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
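v4.1.0
v4.1.0 rule expansion + LLM integration release: 100+ ad rules, 150+ mojibake rules, OpenClaw sub-session LLM integration, a 5,000-word protected vocabulary, and a smart cache with a 66%+ hit rate
v1.3.0
Updated the skill package
v1.2.0
Added 华为阅读 (Huawei Books) import support; added reference documents for mojibake and ad rules; improved the chapter template
v1.1.0
Added 华为阅读 (Huawei Books) import support; added reference documents for mojibake and ad rules; improved the chapter template
v1.0.0
Initial release: fixes mojibake, removes ads, tidies formatting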
Metadata
Slug good-txt-to-hwreader
Version 4.1.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 5
Frequently Asked Questions

What is TXT电子书清洗修复?

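It cleans and repairs mojibake, ads, and formatting problems in pirated TXT e-books. It supports an AI-enhanced mode that can detect non-standard ads, repair complex mojibake, and recognize non-standard chapter formats. Trigger phrases: txt清理, 电子书修复, 去广告, 修乱码, 排版修复, 清理txt, 修复电子书, txt乱码, txt广告. It is an AI Agent Skill for Claude Code / OpenClaw, with 125 downloads so far.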

How do I install TXT电子书清洗修复?

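Run "/install good-txt-to-hwreader" in the OpenClaw or Claude Code chat to install it. The skill's scripts additionally assume Python 3.6+ and the chardet, PyYAML, and requests packages listed under Dependencies, which the install step does not provision.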

Is TXT电子书清洗修复 free?

Yes, TXT电子书清洗修复 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does TXT电子书清洗修复 support?

TXT电子书清洗修复 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created TXT电子书清洗修复?

It is built and maintained by SunYin (@sunfirehw); the current version is v4.1.0.
