Byted Podcast Gen

Name: Byted Podcast Gen
Author: volcengine-skills

Author volcengine-skills · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install byted-podcast-gen

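Description

Summarizes a topic or the content of a web page and synthesizes it into podcast audio. The final audio is generated through the Volcengine Doubao Speech podcast synthesis protocol.

Usage (SKILL.md)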

Podcast Skill

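Synthesizes a topic into podcast audio and saves it as a local file, using the Volcengine Doubao Speech WebSocket protocol (PodcastTTS, /api/v3/sami/podcasttts). Supported features:

Generate a podcast from a one-sentence topic or a web page URL (a file download URL also works; pdf, word, and txt formats are supported).
Output the podcast audio download link exactly as returned (no truncation or other processing), together with the generated local file for download. Verify that the link can actually be downloaded: if it can, return it to the user; if it cannot, return only the local file.
Output the segmented podcast text (JSON).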
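
The link-verification rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the skill: the helper names `is_downloadable` and `pick_deliverables` are hypothetical, and probing the link with a plain GET that reads a single byte is one possible check among several.

```python
import http.client
import urllib.request


def is_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET on the URL yields a 2xx status and at least one byte."""
    if not url.lower().startswith(("http://", "https://")):
        return False
    try:
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300 and len(response.read(1)) == 1
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, network failures, timeouts, and malformed URLs.
        return False


def pick_deliverables(audio_url, audio_path, check=is_downloadable):
    """Return the link untouched when it works, otherwise only the local file."""
    deliverables = {"audio_path": audio_path}
    if audio_url and check(audio_url):
        deliverables["audio_url"] = audio_url  # passed through unmodified
    return deliverables
```

The `check` parameter is injectable so the decision logic can be exercised without network access.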

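When to use

The user mentions keywords such as "generate a podcast" (生成播客) or "podcast synthesis" (播客合成).
The user needs a podcast-style audio file for a topic.
The user needs a podcast-style audio file generated from a web page or file.
The user needs a podcast-style audio file generated from an uploaded file or a long context.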

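Mandatory rules (highest priority)

When you receive a user request to generate a podcast:

You must use this Skill's scripts, and only this Skill's scripts, to generate the podcast.
Topic mode: the user needs a podcast-style audio file for a topic. Use action=4 and prompt_text = the topic text.
Web page mode: the user needs a podcast-style audio file generated from a web page or a downloadable file. Use action=0 and input_url = the web page URL or file download URL.
File mode: the user needs a podcast-style audio file generated from an uploaded file or a long context. Use action=0 and text = the content read from the uploaded file, or a fairly long text, generally more than 200 characters.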
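
The routing rules above can be sketched as a small helper. This is an illustration only: `choose_mode` is a hypothetical function, and treating "generally more than 200 characters" as a hard threshold is an assumption made for the example.

```python
from urllib.parse import urlparse


def choose_mode(user_input: str, long_text_threshold: int = 200) -> dict:
    """Map a user request to script arguments (hypothetical helper)."""
    stripped = user_input.strip()
    parsed = urlparse(stripped)
    # Web page mode: the input is a single http(s) URL.
    if parsed.scheme in ("http", "https") and parsed.netloc and " " not in stripped:
        return {"action": 0, "input_url": stripped}
    # File mode: long text, for example the contents of an uploaded file.
    if len(stripped) > long_text_threshold:
        return {"action": 0, "text": stripped}
    # Topic mode: a short topic string.
    return {"action": 4, "prompt_text": stripped}
```

A short topic falls through to topic mode, so `choose_mode("豆包语音合成服务")` yields `action` 4 with the topic as `prompt_text`.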

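Steps

Analyze what the user wants turned into a podcast and prepare the input: prompt_text (the original topic, generally no more than 20 characters), input_url (a web page URL or file download URL), or text (the content read from an uploaded file, or a fairly long text, generally more than 200 characters).
Before running the script, cd into this skill's directory: skills/byted-podcast-gen.
Configure authentication (environment variable or command-line argument).
Run the script: python scripts/podcast.py [arguments]. See the Examples section below.
Use the result through audio_path, texts, and audio_url in the script's JSON output. If audio_url is present, it is a URL with an expiry time; return it to the user unchanged. audio_path is the local file path, which can be offered to the user for download.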
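
For callers that drive the script from Python rather than a shell, the steps above might look like the sketch below. It assumes the script writes nothing but the JSON document to stdout, which the source does not confirm; `run_podcast` and the 600-second timeout are hypothetical choices.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_podcast(skill_dir, script_args, timeout=600):
    """Run scripts/podcast.py from the skill directory and parse its JSON output."""
    completed = subprocess.run(
        [sys.executable, "scripts/podcast.py", *script_args],
        cwd=Path(skill_dir),  # same effect as `cd skills/byted-podcast-gen`
        capture_output=True,
        text=True,
        encoding="utf-8",
        timeout=timeout,
    )
    try:
        # Assumption: stdout holds only the JSON result.
        return json.loads(completed.stdout)
    except json.JSONDecodeError:
        message = completed.stderr.strip() or completed.stdout.strip()
        return {"status": "error", "error": message}
```

Passing the arguments as a list avoids shell quoting problems when the topic or text contains spaces or punctuation.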

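Script parameters

Parameter	Required	Description
`--text`	No	Raw long text input (used with `action=0`)
`--input_url`	No	URL of the input text (used with `action=0`; provide either this or `--text`)
`--prompt_text`	No	Prompt text (required when `action=4`)
`--action`	No	Podcast type: `0` (raw text/URL) or `4` (prompt); default `4`
`--speaker_info`	No	Speaker configuration JSON (default `{"random_order":false}`)
`--encoding`	No	Audio format: `mp3` (default), `wav`, `ogg_opus`
`--output`	No	Output path for the final audio file (by default, generated automatically under `output/`)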
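
The constraints in the table can be expressed as an argparse sketch. This is not the script's actual parser, which is not shown here; it only mirrors the documented flags, defaults, and the "required when" conditions, and `parse_and_validate` is a hypothetical name.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="podcast.py")
    parser.add_argument("--text", help="raw long text (action=0)")
    parser.add_argument("--input_url", help="URL of the input text (action=0)")
    parser.add_argument("--prompt_text", help="prompt text (required when action=4)")
    parser.add_argument("--action", type=int, choices=(0, 4), default=4)
    parser.add_argument("--speaker_info", type=json.loads,
                        default={"random_order": False})
    parser.add_argument("--encoding", choices=("mp3", "wav", "ogg_opus"),
                        default="mp3")
    parser.add_argument("--output", help="output audio path")
    return parser


def parse_and_validate(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The table marks every flag optional, but some become required per action.
    if args.action == 4 and not args.prompt_text:
        parser.error("--prompt_text is required when --action is 4")
    if args.action == 0 and bool(args.text) == bool(args.input_url):
        parser.error("with --action 0, pass exactly one of --text or --input_url")
    return args
```

`parser.error` prints a usage message and exits with status 2, so invalid combinations fail before any network call is made.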

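Return value

The script outputs JSON containing:

status: "success" or "error"
task_id: task identifier (used to locate a single generation task)
audio_path: local path of the final audio
texts: segmented text as a JSON string, listing the text for each speaker.
audio_url: audio download URL returned by the server
error: error message on failure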
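
Because texts is described as a JSON string nested inside the JSON output, it needs a second decode. A minimal sketch follows; `parse_result` is a hypothetical helper, and the inner structure of texts is not specified in this document, so the code leaves it as whatever the second decode returns.

```python
import json


def parse_result(stdout: str) -> dict:
    """Decode the script output and unpack the nested `texts` string."""
    result = json.loads(stdout)
    if result.get("status") != "success":
        raise RuntimeError(result.get("error") or "podcast generation failed")
    texts = result.get("texts")
    # `texts` is documented as a JSON string, so decode it a second time.
    if isinstance(texts, str) and texts:
        result["texts"] = json.loads(texts)
    return result
```

On failure the helper raises with the error field, so callers can separate the success path from error handling.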

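Error handling

If the error reports a missing MODEL_SPEECH_API_KEY: check whether it is configured in the environment variables or command-line arguments. If it is not, ask the user for it, then set it as an environment variable.
If a server error (MsgType.Error) is received: use the error message to check account permissions, the resource ID, the input content, and whether the service has been activated.
If a server error contains the keyword quota, the account has exceeded its quota and the Volcengine Doubao Speech podcast service needs to be upgraded.
If Python reports a missing package, install the dependencies first: pip install -r requirements.txt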
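
The four cases above amount to a keyword lookup on the error text. The sketch below is one way to encode them; `suggest_fix` is hypothetical, and matching on the substrings shown is an assumption about how the messages are worded.

```python
def suggest_fix(error_message: str) -> str:
    """Map an error message to the remediation described in this section."""
    message = error_message or ""
    lowered = message.lower()
    if "MODEL_SPEECH_API_KEY" in message:
        return "Ask the user for the key and set MODEL_SPEECH_API_KEY."
    if "quota" in lowered:
        return "Quota exceeded: upgrade the Doubao Speech podcast service."
    if "no module named" in lowered:
        return "Install dependencies: pip install -r requirements.txt"
    # Fallback for other server errors (MsgType.Error).
    return ("Check account permissions, resource ID, input content, "
            "and whether the service is activated.")
```

The quota check runs before the generic fallback because a quota error is also a server error.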

Reference documentation

Doubao Podcast: Product Overview

Examples

# Generate podcast audio from a topic
prompt_text="豆包语音合成服务"
python scripts/podcast.py --prompt_text "$prompt_text" --action 4
# Generate podcast audio from web page content
url="https://www.volcengine.com/docs/6561/1668014?lang=zh"
python scripts/podcast.py --input_url "$url" --action 0
# Generate podcast audio from long text
text="欢迎收听本期节目，我们聊聊人工智能的关键拐点……"
python scripts/podcast.py --text "$text" --action 0

Security recommendations

This skill implements the advertised podcast TTS flow, but it will try to obtain a MODEL_SPEECH_API_KEY automatically by calling an ARK management API if that env var is not present. Before installing or running: 1) Verify you trust the skill source (source/homepage unknown). 2) If you do not want the skill to call an external ARK API or create keys, set MODEL_SPEECH_API_KEY yourself and do not set ARK_SKILL_API_KEY/ARK_SKILL_API_BASE. 3) Be aware the script will write any found/created key to scripts/.env (it attempts to chmod 600); treat that file as sensitive or run in an isolated environment. 4) Inspect or run the code in a sandbox or container if you are unsure, and ensure the ARK key you provide (if any) has minimal permissions. If you want to proceed, prefer supplying a pre-created MODEL_SPEECH_API_KEY rather than giving the skill an ARK management key/base URL.
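
Recommendation 2 can be checked before launching the script. The following is a hedged sketch: `credential_preflight` is a hypothetical helper that only inspects the three variable names mentioned above and does not call any API.

```python
import os


def credential_preflight(env=None) -> list:
    """Return warnings about how the skill would obtain its speech API key."""
    env = os.environ if env is None else env
    has_speech_key = bool(env.get("MODEL_SPEECH_API_KEY"))
    ark_vars = [name for name in ("ARK_SKILL_API_KEY", "ARK_SKILL_API_BASE")
                if env.get(name)]
    warnings = []
    if not has_speech_key and ark_vars:
        warnings.append(
            "MODEL_SPEECH_API_KEY is unset and " + ", ".join(ark_vars)
            + " is set: the skill may call the ARK API to list or create a key."
        )
    elif not has_speech_key:
        warnings.append("MODEL_SPEECH_API_KEY is unset: set it before running.")
    elif ark_vars:
        warnings.append(
            "Unset " + ", ".join(ark_vars)
            + ": they are not needed when MODEL_SPEECH_API_KEY is provided."
        )
    return warnings
```

An empty list means a pre-created MODEL_SPEECH_API_KEY is present and no ARK management credentials are exposed to the script.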

Functional analysis

Type: OpenClaw Skill
Name: byted-podcast-gen
Version: 1.0.0

The skill bundle is a legitimate integration for ByteDance's Volcengine (Doubao) PodcastTTS service. It uses a custom binary protocol over WebSockets (implemented in `scripts/protocols/`) to generate audio from text or URLs. The `scripts/api_key.py` file securely manages API keys by checking environment variables and optionally persisting them to a `.env` file with restricted permissions (0o600). No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the code logic is strictly aligned with the stated purpose of podcast generation.

Capability assessment

ℹ Purpose & Capability

The name/description (synthesize podcast audio via 火山引擎/豆包 TTS) matches the implementation: the scripts open a WebSocket to a ByteDance/Volcengine TTS endpoint and assemble received audio chunks into files. There are no obvious unrelated capabilities (no SSH, no cloud provider SDKs).

⚠ Instruction Scope

SKILL.md instructs running the included scripts and setting MODEL_SPEECH_API_KEY. The actual code will also attempt to use ARK_SKILL_API_KEY and ARK_SKILL_API_BASE (if MODEL_SPEECH_API_KEY is absent) to call remote APIs to list/create speech API keys and will persist any discovered/created key to a .env file under the skill's scripts directory. Those ARK env vars and the auto-create behavior are not documented in the top-level metadata and are scope-expanding (network calls to arbitrary ARK base + persistent storage of credentials).

✓ Install Mechanism

There is no install spec; requirements.txt only lists 'websockets'. The skill is instruction+script only and will not automatically download third-party archives or install binaries. Installing dependencies uses pip per SKILL.md which is expected.

⚠ Credentials

The package metadata declares no required env vars, but the code requires MODEL_SPEECH_API_KEY (documented in SKILL.md) or, alternatively, ARK_SKILL_API_KEY and ARK_SKILL_API_BASE to list/create API keys. Those ARK variables are not declared in the skill metadata. Using ARK_SKILL_API_KEY/BASE can give the script permission to create API keys via the provided base URL — a higher-privilege operation and potentially disproportionate if the user did not expect the skill to manage API keys.

ℹ Persistence & Privilege

The script persists discovered/created MODEL_SPEECH_API_KEY into a .env file located next to scripts/api_key.py (scripts/.env) and sets os.environ for the current process. It does not set system-wide settings or modify other skills, and 'always' is false. Persisting keys to a local .env is potentially surprising and should be considered when running in shared environments.
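
In a shared environment, the permissions on the persisted file can be verified after a run. This is a minimal sketch for POSIX systems; `env_file_is_private` is a hypothetical helper, and the path to pass in is the scripts/.env location described above.

```python
import stat
from pathlib import Path


def env_file_is_private(path) -> bool:
    """True if the file is absent or grants no access to group and others."""
    env_file = Path(path)
    if not env_file.exists():
        return True
    mode = stat.S_IMODE(env_file.stat().st_mode)
    # 0o600 leaves every group and other permission bit cleared.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `env_file_is_private("scripts/.env")` returns False if the chmod attempt did not take effect and the file is readable by other users.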

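How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install byted-podcast-gen
Once installation completes, invoke the Skill by name or trigger it with /byted-podcast-gen.
Provide the required inputs according to the Skill's parameter description to get structured output.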

Version history

v1.0.0

byted-podcast-gen 1.0.0 initial release:
- Generate podcast audio from a topic, webpage, or document using Volcengine Doubao TTS via WebSocket.
- Accepts input as topic text, URL (including file URLs), or uploaded file (supports pdf, word, txt).
- Outputs downloadable audio link (if available) and local audio file, plus segmented podcast transcript in JSON.
- Applies strict action/routing rules for different user requests (topic, webpage, file/long text).
- Provides clear error handling and environment setup guidance.
- Includes usage examples, parameters, and return value explanations in documentation.

Metadata

Slug byted-podcast-gen

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ