54meteor

article-tts

by 退役前写代码的 · GitHub ↗ · v2.1.0 · MIT-0
cross-platform ✓ Security Clean
189
Downloads
1
Stars
0
Active Installs
8
Versions
Install in OpenClaw
/install article-tts
Description
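Photo or text to audio: extract text from article photos with OCR, or accept text directly, and generate Microsoft Edge TTS speech. Supports Chinese and English, automatic transcription, speed adjustment, and sentence-by-sentence splitting. | Capture article photos (OCR) or plain text, generate natural audio via Edge TT...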
README (SKILL.md)

Article TTS Skill

Default Configuration

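Parameter | Default | Description
lang | en | Language: en or zh
skipConfirmation | false | Whether to skip the text confirmation step
speed | 90% | TTS speaking rate (--rate=-10% yields 90%)
voice | en-US-EmmaNeural (English) / zh-CN-XiaoxiaoNeural (Chinese) | TTS voice
splitSentences | false | Whether to also generate per-sentence audio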
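
The relationship between the speed default and the --rate flag can be sketched as follows. This is a minimal illustration, not the skill's own code: the ArticleTtsConfig class, its field names, and the DEFAULT_VOICES mapping are hypothetical names introduced here, and the only facts taken from the table are the default values and the "90% equals --rate=-10%" rule.

```python
from dataclasses import dataclass

# Hypothetical mapping of the per-language default voices from the table above.
DEFAULT_VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


@dataclass
class ArticleTtsConfig:
    """Hypothetical container for the documented defaults."""
    lang: str = "en"
    skip_confirmation: bool = False
    speed: int = 90  # percent of normal speaking speed
    split_sentences: bool = False

    @property
    def voice(self) -> str:
        # Pick the default voice for the configured language.
        return DEFAULT_VOICES[self.lang]

    @property
    def rate_flag(self) -> str:
        # edge-tts takes a signed percentage relative to normal speed,
        # so 90% becomes -10% and 110% becomes +10%.
        return f"--rate={self.speed - 100:+d}%"


config = ArticleTtsConfig()
print(config.voice, config.rate_flag)
```

Keeping speed as an absolute percentage and deriving the flag in one place avoids mixing the two notations elsewhere in the workflow.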

Supported Languages

Language | OCR language pack | TTS Voice
en | eng (preinstalled) | en-US-EmmaNeural
zh | chi_sim (must be installed) | zh-CN-XiaoxiaoNeural

Installing the Chinese OCR language pack:

  • Linux (WSL/Debian/Ubuntu): apt-get install tesseract-ocr-chi-sim
  • macOS: brew install tesseract-lang (includes Chinese)
  • Windows: download chi_sim.traineddata and place it in the tessdata folder of the Tesseract installation directory
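
Before running Chinese OCR it can help to check whether the chi_sim pack is actually present. The sketch below relies on the tesseract --list-langs command; the helper names parse_langs and installed_langs are hypothetical, and the header-line format is an assumption based on typical Tesseract output.

```python
import subprocess


def parse_langs(output: str) -> set:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header (assumed to start with this phrase).
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def installed_langs() -> set:
    """Run Tesseract and return the installed language packs."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some Tesseract versions print the list on stderr, so read both streams.
    return parse_langs(result.stdout + result.stderr)


sample = 'List of available languages in "/usr/share/tessdata/" (2):\neng\nosd\n'
print("chi_sim" in parse_langs(sample))
```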

Workflow

Input Types

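  • Image: text is extracted with OCR (lang must specify the language)
  • Plain text: goes straight to TTS, no OCR needed

Standard Flow (default, confirmation required)

Image → OCR extracts text → show text to the user for confirmation → user confirms → generate TTS → send
Text → generate TTS directly → send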

Skip-Confirmation Flow ⚠️

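When the user says "no confirmation needed" or "just generate it", skip the confirmation step.

⚠️ Security note: skipConfirmation bypasses the text confirmation step, so OCR-extracted text (which may contain sensitive information) is converted to audio and sent directly. Use it only for trusted sources and low-sensitivity content. Keeping it off by default (skipConfirmation: false) is recommended.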
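
The two flows above reduce to a small decision rule, sketched here. The function name needs_confirmation and the "image" / "text" labels are hypothetical; the rule itself (only OCR output is confirmed, and only when skipConfirmation is false) comes from the flows described above.

```python
def needs_confirmation(input_type: str, skip_confirmation: bool = False) -> bool:
    """Return True when the extracted text must be shown to the user first."""
    if input_type not in ("image", "text"):
        raise ValueError(f"unknown input type: {input_type!r}")
    # Plain text was typed by the user, so it never goes through confirmation.
    # OCR output is confirmed unless the user explicitly opted out.
    return input_type == "image" and not skip_confirmation


print(needs_confirmation("image"))
print(needs_confirmation("image", skip_confirmation=True))
print(needs_confirmation("text"))
```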

OCR Step

# Image preprocessing (Python): grayscale, boost contrast, upscale 4x
from PIL import Image, ImageOps
img = Image.open(image_path)  # image_path: path to the input photo
img = ImageOps.autocontrast(img.convert('L'), cutoff=10)
w, h = img.size
img = img.resize((w*4, h*4), Image.LANCZOS)
img.save('/tmp/ocr_input.jpg', quality=99)

# OCR (shell), English
tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4

# OCR (shell), Chinese
tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4
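
The Tesseract call can also be driven from Python so that the language pack follows the lang setting. This is a sketch under assumptions: tesseract_command, run_ocr, and OCR_LANG_PACKS are hypothetical names, while the flags mirror the shell commands above.

```python
import subprocess

# Language code -> Tesseract language pack, as listed under Supported Languages.
OCR_LANG_PACKS = {"en": "eng", "zh": "chi_sim"}


def tesseract_command(image_path: str, lang: str = "en") -> list:
    """Build the argv for the OCR step (same flags as the shell examples)."""
    return ["tesseract", image_path, "stdout", "-l", OCR_LANG_PACKS[lang], "--psm", "4"]


def run_ocr(image_path: str, lang: str = "en") -> str:
    """Run Tesseract and return the recognized text; raises if Tesseract fails."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(tesseract_command("/tmp/ocr_input.jpg", "zh"))
```

Passing the arguments as a list avoids shell quoting problems when the image path contains spaces.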

TTS Step

Full-Article Audio

uvx edge-tts \
  -t "FULL TEXT" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3

# Chinese
uvx edge-tts \
  -t "中文文字内容" \
  -v zh-CN-XiaoxiaoNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3
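
For long articles, passing the whole text through -t can run into shell quoting issues. One possible alternative, sketched below, reads the text from the saved original_text.md through the edge-tts --file option instead. The function name full_article_command is hypothetical, and using original_text.md as the input file is an assumption, not something the skill documents.

```python
from pathlib import Path


def full_article_command(output_dir: str,
                         voice: str = "en-US-EmmaNeural",
                         rate: str = "-10%") -> list:
    """Build an edge-tts argv that reads text from a file instead of -t."""
    out = Path(output_dir)
    return [
        "uvx", "edge-tts",
        "--file", str(out / "original_text.md"),   # assumed to hold the confirmed text
        "--voice", voice,
        f"--rate={rate}",                          # one token, so -10% is not parsed as a flag
        "--write-media", str(out / "full_article.mp3"),
    ]


print(full_article_command("OUTPUT_DIR", voice="zh-CN-XiaoxiaoNeural"))
```

The resulting list can be handed to subprocess.run(..., check=True) in the same way as the per-sentence loop.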

Sentence Splitting (only when splitSentences=true)

import re
import subprocess

def split_sentences(text, lang='en'):
    if lang == 'zh':
        # Chinese: split after 。!? (full-width or ASCII), whitespace optional
        sentences = re.split(r'(?<=[。!?!?])\s*', text)
    else:
        # English: split after . ! ? when followed by whitespace
        sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]

# text and lang come from the earlier OCR / input step
sentences = split_sentences(text, lang=lang)
voice = 'zh-CN-XiaoxiaoNeural' if lang == 'zh' else 'en-US-EmmaNeural'
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)
    # One audio file per sentence; check=True stops on the first edge-tts failure
    subprocess.run([
        "uvx", "edge-tts",
        "-t", sentence,
        "-v", voice,
        "--rate=-10%",
        "--write-media", f"OUTPUT_DIR/sentence_{num}.mp3"
    ], check=True)
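
Two-digit numbering sorts correctly only up to 99 sentences. If longer articles are expected, the padding width could be derived from the sentence count, as in this sketch. The helper sentence_filenames is hypothetical and not part of the skill.

```python
def sentence_filenames(count: int) -> list:
    """Return sentence_NN.mp3 names, widening the padding for long articles."""
    # Keep at least two digits to match the documented sentence_01.mp3 pattern.
    width = max(2, len(str(count)))
    return [f"sentence_{str(i).zfill(width)}.mp3" for i in range(1, count + 1)]


print(sentence_filenames(3))
print(sentence_filenames(120)[0], sentence_filenames(120)[-1])
```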

Output Directory

/mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/
├── original_text.md
├── full_article.mp3
└── sentence_01.mp3 ...
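
A directory name in the YYYY-MM-DD-article-slug pattern could be built as sketched below. The slugify and article_dir helpers are hypothetical, and the fallback name "article" for titles without ASCII letters or digits (for example, Chinese titles) is an assumption made for this example.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Base path taken from the layout above.
BASE_DIR = PurePosixPath("/mnt/d/wslspace/workspace/articles")


def slugify(title: str, max_len: int = 40) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a generic name when nothing ASCII survives.
    return slug[:max_len].rstrip("-") or "article"


def article_dir(title: str, day: date) -> PurePosixPath:
    """Return BASE_DIR/YYYY-MM-DD-slug for the given title and date."""
    return BASE_DIR / f"{day.isoformat()}-{slugify(title)}"


print(article_dir("Hello, World!", date(2024, 1, 5)))
```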

Sending via Message Channel

The agent detects the active channel from the runtime context and calls message(...) accordingly. No hardcoded channel — the agent uses whichever channel the user is currently chatting through.

# Detect active channel automatically (from runtime inbound metadata)
# channel is inferred: feishu / telegram / discord / whatsapp / signal / imessage / openclaw-weixin

# Send the full article
message(action="send", channel="{active_channel}",
        message="📄 Full article audio",
        media="PATH/full_article.mp3",
        filename="full_article.mp3")

# Send each sentence
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)
    message(action="send", channel="{active_channel}",
            message=f"📝 {num}: {sentence}",
            media=f"PATH/sentence_{num}.mp3",
            filename=f"sentence_{num}.mp3")

Channel Behavior Notes

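Channel | Audio support | Notes
Feishu |  | Recommended: send voice messages with the feishu-voice-send skill
Telegram |  | Sends the mp3 directly
Discord |  | Sent as an attachment
WhatsApp |  | Sends the mp3 directly
Signal | ⚠️ | Depends on signal strength; may not be supported
iMessage | ⚠️ | Sent through macOS; mp3 compatibility is mediocre
WeChat Work |  | Same as Feishu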

If the channel does not support audio, the agent saves the file to OUTPUT_DIR and sends the file path as a text message instead.
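
The fallback described above can be sketched as a function that prepares the arguments for message(...). Everything here is illustrative: delivery_kwargs is a hypothetical helper, and the set of audio-capable channels is passed in by the caller because the skill does not define one.

```python
import os


def delivery_kwargs(channel: str, audio_path: str, audio_channels: set,
                    caption: str = "Full article audio") -> dict:
    """Build message(...) keyword arguments, falling back to a text-only notice."""
    if channel in audio_channels:
        # Channel accepts audio: attach the file.
        return {
            "action": "send",
            "channel": channel,
            "message": caption,
            "media": audio_path,
            "filename": os.path.basename(audio_path),
        }
    # No audio support: send only the saved file path as text.
    return {
        "action": "send",
        "channel": channel,
        "message": f"Audio saved to: {audio_path}",
    }


supported = {"telegram", "discord"}  # example value, not an authoritative list
print(delivery_kwargs("signal", "OUTPUT_DIR/full_article.mp3", supported))
```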


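How to Send as a Voice Message (Not an Attachment)

Important: OpenClaw's built-in Feishu media sending has a bug (a missing duration parameter) that sometimes causes .ogg files to appear as attachments instead of voice messages.

Recommended approach: use the feishu-voice-send skill

That skill calls the official Feishu API and passes the duration parameter correctly, so voice messages display properly.

Option 1: Send via the feishu-voice-send skill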

# Send an existing .ogg file
python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/send_voice.py \
    /path/to/audio.ogg \
    <recipient_open_id>

# Or generate TTS and send it in one step
python3 /mnt/d/wslspace/workspace/skills/feishu-voice-send/scripts/tts_and_send.py \
    "text to convert" \
    <recipient_open_id> \
    -v zh-CN-YunjianNeural \
    -r -10%

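Option 2: Manual invocation (not recommended)

If you must use OpenClaw's built-in message tool, you need to:

  1. Convert the mp3 to standard Ogg Opus format
  2. Always include the message parameter when sending
  3. Note: even with the message parameter, the file may still appear as an attachment because duration is missing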
# 1. Generate the mp3 with edge-tts
uvx edge-tts \
  -t "Your text here" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/voice.mp3

# 2. Convert to standard Ogg Opus with ffmpeg
ffmpeg -i OUTPUT_DIR/voice.mp3 \
  -c:a libopus \
  -b:a 32k \
  -ar 24000 \
  -ac 1 \
  OUTPUT_DIR/voice.ogg

# 3. Send with the message tool (may still appear as an attachment)
message(action="send", channel="feishu",
        message="📄 Voice message",
        media="OUTPUT_DIR/voice.ogg")
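
The conversion step, plus a way to read the clip duration that the built-in sender is said to omit, can be sketched in Python. The helper names are hypothetical; the ffmpeg flags match the command above, the ffprobe invocation is a common way to read a container's duration in seconds, and reporting it in milliseconds is an assumption about what a receiving API might expect.

```python
import subprocess


def opus_command(mp3_path: str, ogg_path: str) -> list:
    """Build the ffmpeg argv for the mp3 -> Ogg Opus conversion shown above."""
    return ["ffmpeg", "-i", mp3_path,
            "-c:a", "libopus", "-b:a", "32k", "-ar", "24000", "-ac", "1",
            ogg_path]


def parse_duration_ms(ffprobe_output: str) -> int:
    """Convert ffprobe's duration in seconds (e.g. '3.456000') to milliseconds."""
    return round(float(ffprobe_output.strip()) * 1000)


def probe_duration_ms(path: str) -> int:
    """Ask ffprobe for the duration of an audio file, in milliseconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return parse_duration_ms(result.stdout)


print(opus_command("OUTPUT_DIR/voice.mp3", "OUTPUT_DIR/voice.ogg"))
print(parse_duration_ms("3.456000\n"))
```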

Available TTS Voices

English

en-US-EmmaNeural, en-US-BrianNeural, en-GB-LibbyNeural, ...

Chinese

zh-CN-XiaoxiaoNeural (female), zh-CN-YunxiNeural (male), zh-CN-YunyangNeural (male, news style), ...

See the full list: uvx edge-tts -l | grep "zh-CN"

Notes

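  • Tesseract with the English pack is preinstalled; Chinese requires apt-get install tesseract-ocr-chi-sim
  • edge-tts runs through uvx, so no separate installation is needed
  • Image quality directly affects OCR results; keep the lighting bright and the camera angle straight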
Usage Guidance
This skill appears to do what it says: OCR images or accept text, then produce Edge TTS audio and send it over the active channel. Before installing or using it:

  1. Be cautious with skipConfirmation: don't enable it for images that may contain private data.
  2. Expect an apt-get step to install tesseract and a first-run network download (uvx will auto-fetch edge-tts); if you need stricter supply-chain control, preinstall and vet the edge-tts package source.
  3. The docs include absolute example paths and an example call to another skill/script (feishu-voice-send); verify those scripts exist and review them before executing.
  4. Run the skill in a sandbox or test environment first if you are uncertain about auto-downloaded components.

If you want stronger assurance, ask the publisher for explicit sources/URLs for uvx/edge-tts and the feishu helper script, or request a packaged release rather than instruction-only steps.
Capability Assessment
Purpose & Capability
The name/description (image OCR + Edge TTS) matches the declared runtime steps and required tools: Tesseract for OCR, Python + Pillow for image preprocessing, and uvx/edge-tts for TTS. Requiring tessdata and language packs for Chinese OCR is expected. No unexplained external credentials or unrelated binaries are requested.
Instruction Scope
Instructions stay within the stated task: preprocess image, run tesseract, produce text, optionally split into sentences, and call edge-tts via uvx. Two things to note: (1) skipConfirmation is explicitly warned as a privacy risk because it will convert OCR output (which may contain sensitive data) directly to audio; (2) the doc includes examples that run other scripts by absolute path (e.g., a feishu-voice-send script under /mnt/d/wslspace/...), which assumes local files/skills exist and could execute arbitrary code if present. The skill does not request extra env vars and relies on OpenClaw's message(...) tool for channel delivery.
Install Mechanism
This is instruction-only (no packaged install). The SKILL.md suggests apt-get to install tesseract and language packs (standard). It also relies on uvx auto-downloading edge-tts on first run — an implicit network fetch of code at runtime. That auto-download is reasonable for convenience but is a higher-risk action than purely using already-installed binaries because it pulls remote package(s) dynamically.
Credentials
No environment variables or credentials are requested; the skill defers to OpenClaw's channel authentication. This is proportionate for a messaging-forwarding TTS skill. Caveat: forwarding via other skills (e.g., feishu-voice-send) may require credentials/configuration outside this skill.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It is a runtime instruction-only skill and does not modify other skills or require persistent configuration changes.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install article-tts
  3. After installation, invoke the skill by name or use /article-tts
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.1.0
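Updated the Feishu voice-message sending instructions: the feishu-voice-api skill is recommended
v2.0.0
Added Feishu voice-message sending instructions: mp3 must be converted to ogg to display as a voice message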
v1.1.0
Add requires/install/credentials sections; clarify multi-channel support; add security warnings
v1.0.4
- Added support for direct text input: generate TTS audio from plain text without OCR. - Updated documentation to clarify input types (photo or text) and corresponding flows. - No code or functional changes; only SKILL.md was modified for clearer instructions.
v1.0.3
- Updated installation instructions for the Chinese Tesseract OCR language pack, adding steps for Linux, macOS, and Windows. - No functionality changes; documentation only.
v1.0.2
- Added bilingual support: now supports both English and Chinese (requires chi_sim language pack for Chinese OCR). - Users can select TTS voice, language, and speed; defaults are set for each language. - The workflow, OCR steps, and TTS instructions updated to include Chinese usage and configuration. - Documentation reorganized for clarity, including available TTS voices and language setup instructions. - Minor wording, formatting, and usability improvements.
v1.0.1
- Added Chinese explanation to the skill description for better accessibility. - No changes to code samples or functionality; documentation description improved only.
v1.0.0
Initial release of article-tts skill. - Extracts English article text from photos using OCR (Tesseract). - Supports configurable workflow: optional text confirmation step before generating audio. - Converts extracted text to natural TTS audio via Microsoft Edge TTS. - Allows splitting audio by sentences on user request. - Complete image pre-processing for improved OCR accuracy. - Works seamlessly with Feishu for both input (photo) and output (audio).
Metadata
Slug article-tts
Version 2.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 8
Frequently Asked Questions

What is article-tts?

article-tts turns photos or text into audio: it extracts text from article photos with OCR, or accepts text directly, and generates Microsoft Edge TTS speech, with support for Chinese and English, speed adjustment, and sentence-by-sentence splitting. It is an AI Agent Skill for Claude Code / OpenClaw, with 189 downloads so far.

How do I install article-tts?

Run "/install article-tts" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is article-tts free?

Yes, article-tts is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does article-tts support?

article-tts is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created article-tts?

It is built and maintained by 退役前写代码的 (@54meteor); the current version is v2.1.0.
