Description

Summarizes a topic or a web page into podcast audio. The final audio is generated through the Volcengine Doubao Speech podcast synthesis protocol.

README (SKILL.md)

Podcast Skill

Name: Byted Podcast Gen
Author: volcengine-skills

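Synthesizes a topic into podcast audio and saves it as a local file, using the Volcengine Doubao speech synthesis WebSocket protocol (PodcastTTS, /api/v3/sami/podcasttts). Supports:

Generating a podcast from a one-sentence topic or a web page URL (the URL may also be a file download link; pdf/word/txt formats are supported).
Returning the podcast audio download link verbatim (no truncation or other processing) together with the generated local file for download. Verify that the download link actually works: if it does, return it to the user; if it does not, return only the local file.
Outputting the podcast's segmented text (JSON).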
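
The link-verification rule above can be sketched as follows. This is a minimal illustration, not code from the skill: `is_downloadable` and `pick_outputs` are hypothetical helper names, and treating an HTTP 200 answer to a plain GET as "downloadable" is an assumption.

```python
import urllib.request
from typing import Callable, Dict, Optional


def is_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Return True if opening the URL with a GET request answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def pick_outputs(
    audio_url: Optional[str],
    audio_path: str,
    check: Callable[[str], bool] = is_downloadable,
) -> Dict[str, str]:
    """Always return the local file; add the link only when it can be fetched."""
    outputs = {"audio_path": audio_path}
    if audio_url and check(audio_url):
        outputs["audio_url"] = audio_url  # passed through verbatim, never rewritten
    return outputs
```

The checker is injectable so the decision logic can be exercised without network access.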

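When to use

The user mentions keywords such as 生成播客 (generate a podcast) or 播客合成 (podcast synthesis).
The user wants podcast-style audio generated for a topic.
The user wants podcast-style audio generated from a web page or from a file's content.
The user wants podcast-style audio generated from an uploaded file's content or from a long context.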

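Mandatory rules (highest priority)

When you receive a user request to generate a podcast:

You must use this Skill's script, and only this Skill's script, to generate the podcast.
Topic mode: the user wants podcast-style audio for a topic. Use action=4 and prompt_text = the topic text.
Web page mode: the user wants podcast-style audio from a web page or a downloadable file. Use action=0 and input_url = the web page URL or file download URL.
File mode: the user wants podcast-style audio from an uploaded file's content or a long context. Use action=0 and text = the content read from the uploaded file, or a fairly long passage of text (typically more than 200 characters).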
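
The three modes map onto command-line arguments roughly as shown here. This is a sketch only: `build_cli_args` is a hypothetical helper, and rejecting calls that supply zero or several inputs is an assumption about sensible behavior, not something the skill documents.

```python
from typing import List, Optional


def build_cli_args(
    prompt_text: Optional[str] = None,
    input_url: Optional[str] = None,
    text: Optional[str] = None,
) -> List[str]:
    """Translate exactly one kind of input into the documented flags."""
    supplied = [v for v in (prompt_text, input_url, text) if v]
    if len(supplied) != 1:
        raise ValueError("provide exactly one of prompt_text, input_url, text")
    if prompt_text:
        return ["--action", "4", "--prompt_text", prompt_text]  # topic mode
    if input_url:
        return ["--action", "0", "--input_url", input_url]  # web page mode
    return ["--action", "0", "--text", text]  # file / long-text mode
```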

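Usage steps

Analyze what the user wants turned into a podcast and prepare the input: prompt_text (the raw topic, typically no more than 20 characters), input_url (a web page URL or file download URL), or text (the content read from an uploaded file, or a fairly long passage of text, typically more than 200 characters).
Before running the script, cd into this skill's directory: skills/byted-podcast-gen.
Configure authentication (environment variable or command-line argument).
Run the script: python scripts/podcast.py [arguments]. See the Examples section below.
Use the audio_path / texts / audio_url fields of the script's JSON output. If audio_url is present, it is a URL with an expiry time; return it to the user unchanged. audio_path is the local file path and can be offered to the user for download.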
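
Steps 2 to 5 could be driven from Python along these lines. This is a sketch under stated assumptions: `run_podcast` is a hypothetical wrapper, and it assumes the script prints a single JSON document to stdout and nothing else, which the README implies but does not guarantee.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Any, Callable, Dict, List

SKILL_DIR = Path("skills/byted-podcast-gen")  # directory named in the steps above


def run_podcast(
    cli_args: List[str],
    skill_dir: Path = SKILL_DIR,
    runner: Callable[..., Any] = subprocess.run,
) -> Dict[str, Any]:
    """Run scripts/podcast.py from the skill directory and decode its JSON output."""
    completed = runner(
        [sys.executable, "scripts/podcast.py", *cli_args],
        cwd=skill_dir,  # equivalent to cd-ing into the skill directory first
        capture_output=True,
        text=True,
        check=False,  # errors are reported inside the JSON, per the README
    )
    return json.loads(completed.stdout)  # assumes stdout holds only the JSON result
```

Authentication is expected to be present in the environment already; the wrapper does not set it.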

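Script parameters

Parameter	Required	Description
`--text`	No	Raw long input text (used with `action=0`)
`--input_url`	No	URL of the input text (used with `action=0`; supply either this or `--text`)
`--prompt_text`	No	Prompt text (required when `action=4`)
`--action`	No	Podcast type: `0` (raw text/URL) or `4` (prompt); default `4`
`--speaker_info`	No	Speaker configuration JSON (default `{"random_order":false}`)
`--encoding`	No	Audio format: `mp3` (default), `wav`, `ogg_opus`
`--output`	No	Output file path for the final audio (auto-generated under `output/` by default)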
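
To make the table's constraints concrete, here is an `argparse` mirror of the documented flags. It is not the parser in `scripts/podcast.py`, whose source is not shown here; `make_parser` and `parse_and_check` are hypothetical names, and the cross-flag checks are one reading of the "required when" notes in the table.

```python
import argparse
import json
from typing import List


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags and defaults listed in the table."""
    p = argparse.ArgumentParser(prog="podcast.py")
    p.add_argument("--text")
    p.add_argument("--input_url")
    p.add_argument("--prompt_text")
    p.add_argument("--action", type=int, choices=[0, 4], default=4)
    p.add_argument("--speaker_info", default='{"random_order":false}')
    p.add_argument("--encoding", choices=["mp3", "wav", "ogg_opus"], default="mp3")
    p.add_argument("--output")
    return p


def parse_and_check(argv: List[str]) -> argparse.Namespace:
    """Parse argv, then enforce the per-action requirements."""
    parser = make_parser()
    args = parser.parse_args(argv)
    if args.action == 4 and not args.prompt_text:
        parser.error("--prompt_text is required when --action is 4")
    if args.action == 0 and bool(args.text) == bool(args.input_url):
        parser.error("--action 0 needs exactly one of --text or --input_url")
    json.loads(args.speaker_info)  # fail early on malformed speaker JSON
    return args
```

`parser.error` exits with status 2, so invalid combinations stop before any network call is made.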

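Return values

The script outputs JSON containing:

status: "success" or "error"
task_id: task identifier (used to locate a single generation task)
audio_path: local path of the final audio
texts: segmented text as a JSON string, holding the list of text segments for each speaker
audio_url: audio download URL returned by the server
error: error message on failure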
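
Because `texts` is described as a JSON string nested inside the JSON result, it needs a second decode. The sketch below shows one way to handle that; `summarize_result` is a hypothetical helper, and the shape of the decoded `texts` value is not specified by the README.

```python
import json
from typing import Any, Dict


def summarize_result(raw: str) -> Dict[str, Any]:
    """Decode the script output, raise on failure, and unwrap the nested texts JSON."""
    result = json.loads(raw)
    if result.get("status") != "success":
        raise RuntimeError(result.get("error") or "podcast generation failed")
    texts = result.get("texts")
    if isinstance(texts, str):
        texts = json.loads(texts)  # second decode: texts is itself a JSON string
    return {
        "task_id": result.get("task_id"),
        "audio_path": result.get("audio_path"),
        "audio_url": result.get("audio_url"),  # may be absent; keep verbatim if present
        "texts": texts,
    }
```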

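Error handling

If the error says MODEL_SPEECH_API_KEY is missing: check whether it is configured through an environment variable or command-line argument. If it is not, ask the user to provide it, then set it as an environment variable.
If a server-side error (MsgType.Error) is received: use the error message to check account permissions, the resource ID, the input content, and whether the service has been activated.
If a server-side error contains the keyword quota, the account has exceeded its quota and the Volcengine Doubao Speech podcast service needs to be upgraded.
If Python reports a missing package, install the dependencies first: pip install -r requirements.txt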
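
The cases above amount to a small lookup on the error text. The following sketch illustrates that; `suggest_fix` is a hypothetical helper, and apart from the `quota` keyword and the `MODEL_SPEECH_API_KEY` name, the matched substrings are assumptions about how the messages are worded.

```python
def suggest_fix(error_message: str) -> str:
    """Map an error message to the remediation described in the list above."""
    lowered = error_message.lower()
    if "model_speech_api_key" in lowered:
        return "Ask the user for MODEL_SPEECH_API_KEY and set it as an environment variable."
    if "quota" in lowered:
        return "Quota exceeded: upgrade the Volcengine Doubao Speech podcast service."
    if "no module named" in lowered:  # wording of Python's ModuleNotFoundError
        return "Install dependencies: pip install -r requirements.txt"
    return "Check account permissions, resource ID, input content, and service activation."
```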

Reference documentation

Doubao Podcast: Product Overview (豆包播客-产品简介)

Examples

# Generate podcast audio from a topic
prompt_text="豆包语音合成服务"
python scripts/podcast.py --prompt_text "$prompt_text" --action 4
# Generate podcast audio from web page content
url="https://www.volcengine.com/docs/6561/1668014?lang=zh"
python scripts/podcast.py --input_url "$url" --action 0
# Generate podcast audio from long text
text="欢迎收听本期节目，我们聊聊人工智能的关键拐点……"
python scripts/podcast.py --text "$text" --action 0

Usage Guidance

This skill implements the advertised podcast TTS flow, but if the MODEL_SPEECH_API_KEY env var is not present it will try to obtain one automatically by calling an ARK management API. Before installing or running: 1) Verify you trust the skill source (source/homepage unknown). 2) If you do not want the skill to call an external ARK API or create keys, set MODEL_SPEECH_API_KEY yourself and do not set ARK_SKILL_API_KEY/ARK_SKILL_API_BASE. 3) Be aware the script will write any found/created key to scripts/.env (it attempts to chmod 600); treat that file as sensitive or run in an isolated environment. 4) Inspect or run the code in a sandbox or container if you are unsure, and ensure the ARK key you provide (if any) has minimal permissions. If you want to proceed, prefer supplying a pre-created MODEL_SPEECH_API_KEY rather than giving the skill an ARK management key/base URL.
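
A pre-flight check along the lines of point 2 might look like this. It is a sketch only: `preflight_warnings` is a hypothetical helper that inspects an environment mapping and does not touch the skill's own code.

```python
from typing import List, Mapping


def preflight_warnings(env: Mapping[str, str]) -> List[str]:
    """List environment conditions under which the skill may call the ARK API."""
    warnings = []
    if not env.get("MODEL_SPEECH_API_KEY"):
        warnings.append("MODEL_SPEECH_API_KEY is not set; supply a pre-created key.")
    for name in ("ARK_SKILL_API_KEY", "ARK_SKILL_API_BASE"):
        if env.get(name):
            warnings.append(name + " is set; the skill may use it to list or create keys.")
    return warnings
```

Calling `preflight_warnings(os.environ)` before running the script gives an empty list only when the speech key is set and neither ARK variable is.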

Capability Analysis

Type: OpenClaw Skill
Name: byted-podcast-gen
Version: 1.0.0

The skill bundle is a legitimate integration for ByteDance's Volcengine (Doubao) PodcastTTS service. It uses a custom binary protocol over WebSockets (implemented in `scripts/protocols/`) to generate audio from text or URLs. The `scripts/api_key.py` file securely manages API keys by checking environment variables and optionally persisting them to a `.env` file with restricted permissions (0o600). No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the code logic is strictly aligned with the stated purpose of podcast generation.

Capability Assessment

ℹ Purpose & Capability

The name/description (synthesize podcast audio via Volcengine/Doubao TTS) matches the implementation: the scripts open a WebSocket to a ByteDance/Volcengine TTS endpoint and assemble received audio chunks into files. There are no obvious unrelated capabilities (no SSH, no cloud provider SDKs).

⚠ Instruction Scope

SKILL.md instructs running the included scripts and setting MODEL_SPEECH_API_KEY. The actual code will also attempt to use ARK_SKILL_API_KEY and ARK_SKILL_API_BASE (if MODEL_SPEECH_API_KEY is absent) to call remote APIs to list/create speech API keys and will persist any discovered/created key to a .env file under the skill's scripts directory. Those ARK env vars and the auto-create behavior are not documented in the top-level metadata and are scope-expanding (network calls to arbitrary ARK base + persistent storage of credentials).

✓ Install Mechanism

There is no install spec; requirements.txt only lists 'websockets'. The skill is instruction+script only and will not automatically download third-party archives or install binaries. Installing dependencies uses pip per SKILL.md which is expected.

⚠ Credentials

The package metadata declares no required env vars, but the code requires MODEL_SPEECH_API_KEY (documented in SKILL.md) or, alternatively, ARK_SKILL_API_KEY and ARK_SKILL_API_BASE to list/create API keys. Those ARK variables are not declared in the skill metadata. Using ARK_SKILL_API_KEY/BASE can give the script permission to create API keys via the provided base URL — a higher-privilege operation and potentially disproportionate if the user did not expect the skill to manage API keys.

ℹ Persistence & Privilege

The script persists discovered/created MODEL_SPEECH_API_KEY into a .env file located next to scripts/api_key.py (scripts/.env) and sets os.environ for the current process. It does not set system-wide settings or modify other skills, and 'always' is false. Persisting keys to a local .env is potentially surprising and should be considered when running in shared environments.
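
In a shared environment, the persisted file's permissions can be checked after a run. A minimal sketch follows, assuming a POSIX filesystem; `env_file_is_private` is a hypothetical helper, and the path `scripts/.env` is the location named above.

```python
import stat
from pathlib import Path
from typing import Union


def env_file_is_private(path: Union[str, Path]) -> bool:
    """Return True if the file is absent or grants no group/other permissions."""
    p = Path(path)
    if not p.exists():
        return True  # nothing persisted, nothing to leak
    mode = stat.S_IMODE(p.stat().st_mode)
    return (mode & 0o077) == 0  # 0o600 passes, 0o644 does not
```

For example, `env_file_is_private("scripts/.env")` returning False suggests tightening the mode or deleting the file.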

Version History

v1.0.0

byted-podcast-gen 1.0.0 initial release:

- Generate podcast audio from a topic, webpage, or document using Volcengine Doubao TTS via WebSocket.
- Accepts input as topic text, URL (including file URLs), or uploaded file (supports pdf, word, txt).
- Outputs downloadable audio link (if available) and local audio file, plus segmented podcast transcript in JSON.
- Applies strict action/routing rules for different user requests (topic, webpage, file/long text).
- Provides clear error handling and environment setup guidance.
- Includes usage examples, parameters, and return value explanations in documentation.

Metadata

Slug byted-podcast-gen

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Byted Podcast Gen?

Byted Podcast Gen summarizes a topic or a web page into podcast audio, generated through the Volcengine Doubao Speech podcast synthesis protocol. It is an AI Agent Skill for Claude Code / OpenClaw, with 89 downloads so far.

How do I install Byted Podcast Gen?

Run "/install byted-podcast-gen" in the OpenClaw or Claude Code chat to install it in one step. Generating audio additionally requires MODEL_SPEECH_API_KEY to be configured.

Is Byted Podcast Gen free?

Yes, Byted Podcast Gen is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Byted Podcast Gen support?

Byted Podcast Gen is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Byted Podcast Gen?

It is built and maintained by volcengine-skills (@volcengine-skills); the current version is v1.0.0.
