功能描述

Generate audiobooks from novels and long-form text with chapter management and character voices. Use when users mention audiobooks, narrating books, or conve...

使用说明 (SKILL.md)

SenseAudio Audiobook Generator

Name: Audiobook
Author: scikkk

Convert novels, books, and long-form written content into professional audiobooks with natural narration and character voices.

What This Skill Does

This skill helps you create audiobooks by:

Converting long texts (novels, books, articles) into audio
Automatically handling chapter divisions
Supporting multiple character voices for dialogue
Distinguishing between narration and character speech
Managing long content with automatic chunking

When to Use This Skill

Use this skill when you need to:

Create audiobooks from novels or books
Convert long-form articles to audio
Generate narrated versions of written content
Produce audio for e-learning materials
Create accessible audio versions of text documents

How It Works

Basic Workflow

Prepare the text - Format with chapter markers and dialogue tags
Configure narration - Choose narrator voice and character voices
Process chapters - Generate audio for each chapter or section
Combine audio - Merge all segments into complete audiobook
Add metadata - Include chapter markers and timestamps

Text Format

Structure your content like this:

# Chapter 1: The Beginning

[Narrator]
It was a dark and stormy night. The old mansion stood alone on the hill.

[Character: John, Voice: male_0004_a]
"I don't think we should go in there," John whispered nervously.

[Character: Sarah, Voice: female_0006_a]
"Don't be silly. It's just an old house."

[Narrator]
They pushed open the creaky door and stepped inside.

Voice Configuration

Narrator voice: Choose a clear, neutral voice for narration

Recommended: male_0004_a (calm male) or female_0027_a (clear female)
Speed: 0.9-1.0 for comfortable listening
Pitch: 0 (neutral)

Character voices: Assign distinct voices to each character

Use different voice IDs for different characters
Adjust pitch slightly to differentiate (+2/-2)
Vary speed for character personality (excited: 1.1, calm: 0.9)

Chapter Management

For long books, process by chapters:

chapters = [
    {"title": "Chapter 1", "text": "...", "start_time": 0},
    {"title": "Chapter 2", "text": "...", "start_time": 1234},
    # ...
]

Benefits:

Easier to manage and debug
Allows chapter navigation in audio players
Can resume generation if interrupted
Enables parallel processing

Implementation Steps

Step 1: Parse and Structure Content

Extract structure from the text:

import re

def parse_audiobook_text(text):
    sections = []
    current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

    for line in text.split('\
'):
        # Chapter markers
        if line.startswith('# Chapter'):
            if current_section["text"]:
                sections.append(current_section)
            sections.append({"type": "chapter", "title": line[2:]})
            current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

        # Narrator
        elif line.startswith('[Narrator]'):
            if current_section["text"]:
                sections.append(current_section)
            current_section = {"type": "narrator", "text": "", "voice": "male_0004_a"}

        # Character dialogue
        elif match := re.match(r'\[Character: (\w+), Voice: ([\w_]+)\]', line):
            if current_section["text"]:
                sections.append(current_section)
            current_section = {
                "type": "character",
                "character": match.group(1),
                "voice": match.group(2),
                "text": ""
            }

        else:
            current_section["text"] += line + " "

    if current_section["text"]:
        sections.append(current_section)

    return sections

Step 2: Generate Audio Segments

For each section, call the TTS API:

import requests
import binascii

def generate_audio_segment(text, voice_id, output_file, speed=1.0):
    url = "https://api.senseaudio.cn/v1/t2a_v2"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "SenseAudio-TTS-1.0",
        "text": text,
        "stream": False,
        "voice_setting": {
            "voice_id": voice_id,
            "speed": speed,
            "vol": 1.0,
            "pitch": 0
        },
        "audio_setting": {
            "format": "mp3",
            "sample_rate": 32000,
            "bitrate": 128000,
            "channel": 2
        }
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    # Decode hex audio
    audio_hex = data['data']['audio']
    audio_binary = binascii.unhexlify(audio_hex)

    # Save to file
    with open(output_file, 'wb') as f:
        f.write(audio_binary)

    return data['extra_info']['audio_length']  # Duration in ms

Step 3: Handle Long Text

Split long chapters into manageable chunks:

def chunk_text(text, max_length=5000):
    """Split text at sentence boundaries"""
    sentences = re.split(r'([.!?]+\s+)', text)
    chunks = []
    current_chunk = ""

    for i in range(0, len(sentences), 2):
        sentence = sentences[i]
        separator = sentences[i+1] if i+1 \x3C len(sentences) else ""

        if len(current_chunk) + len(sentence) + len(separator) > max_length:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + separator
        else:
            current_chunk += sentence + separator

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

Step 4: Combine Audio Files

Merge all segments into a complete audiobook:

from pydub import AudioSegment

def create_audiobook(segments, output_file, chapter_markers=None):
    audiobook = AudioSegment.empty()
    chapter_times = []
    current_time = 0

    for i, segment_file in enumerate(segments):
        # Load segment
        segment = AudioSegment.from_mp3(segment_file)

        # Check if this is a chapter start
        if chapter_markers and i in chapter_markers:
            chapter_times.append({
                "chapter": chapter_markers[i],
                "time": current_time
            })

        # Add segment
        audiobook += segment
        current_time += len(segment)

        # Add pause between segments (300ms)
        if i \x3C len(segments) - 1:
            audiobook += AudioSegment.silent(duration=300)
            current_time += 300

    # Export with metadata
    audiobook.export(
        output_file,
        format="mp3",
        bitrate="128k",
        tags={
            "title": "Audiobook",
            "artist": "SenseAudio TTS",
            "album": "Generated Audiobook"
        }
    )

    return chapter_times

Advanced Features

Dialogue Enhancement

Make dialogue more natural:

Add slight pauses before character speech: \x3Cbreak time=200>
Vary character emotions based on context
Use different pitch for different characters

Narration Styles

Adjust narrator voice for different genres:

Fiction: Slightly expressive (speed: 1.0, varied pitch)
Non-fiction: Clear and steady (speed: 0.95, neutral pitch)
Children's books: Animated (speed: 1.1, varied emotions)
Technical: Slow and clear (speed: 0.9, neutral)

Chapter Markers

Add ID3 tags for chapter navigation:

from mutagen.mp3 import MP3
from mutagen.id3 import ID3, CHAP, TIT2, CTOCFlags

def add_chapter_markers(mp3_file, chapters):
    audio = MP3(mp3_file, ID3=ID3)

    # Add chapters
    for i, chapter in enumerate(chapters):
        audio.tags.add(
            CHAP(
                element_id=f"chp{i}",
                start_time=chapter["time"],
                end_time=chapters[i+1]["time"] if i+1 \x3C len(chapters) else len(audio.info.length * 1000),
                sub_frames=[TIT2(text=chapter["chapter"])]
            )
        )

    audio.save()

Progress Tracking

For long books, track progress:

def generate_audiobook_with_progress(sections, output_dir):
    total = len(sections)
    completed = 0

    for i, section in enumerate(sections):
        output_file = f"{output_dir}/segment_{i:04d}.mp3"

        # Skip if already generated
        if os.path.exists(output_file):
            completed += 1
            continue

        # Generate audio
        generate_audio_segment(
            section["text"],
            section["voice"],
            output_file=output_file
        )

        completed += 1
        print(f"Progress: {completed}/{total} ({completed*100//total}%)")

Error Handling

Common issues:

Text too long: Chunk into smaller segments (max 10,000 chars per request)

Voice consistency: Save voice mappings to ensure same character uses same voice

Memory issues: Process chapters separately, don't load entire book at once

API rate limits: Add delays between requests (e.g., 1 second)

Output Format

The skill produces:

Individual chapter audio files (MP3)
Complete audiobook file (MP3)
Chapter markers and timestamps (JSON)
Metadata file with voice mappings and settings

Example Usage

User request: "Convert this novel into an audiobook with different voices for each character"

Skill actions:

Parse the novel text and identify chapters
Extract character dialogue and assign voices
Generate audio for each section
Combine all audio with chapter markers
Export final audiobook with metadata
Provide playback instructions and chapter list

Tips for Best Results

Use consistent voice IDs for the same character throughout
Add natural pauses between paragraphs and chapters
Test a sample chapter before processing the entire book
Keep narrator voice neutral and clear
Use higher quality settings (44100 Hz) for final production
Consider adding background ambience for immersive experience

Reference

For detailed API documentation, see the SenseAudio TTS API reference in the references/ directory.

安全使用建议

This skill appears to do what it says: it parses text and sends chunks to SenseAudio to generate audio. Before installing, verify and be comfortable with these points: (1) You must provide a SenseAudio API key — confirm the exact environment variable name expected (SKILL.md indicates API_KEY / SENSEAUDIO_API_KEY). (2) Using the skill will upload full book text to a third-party service (api.senseaudio.cn); do not send copyrighted, confidential, or sensitive text unless you have rights and trust the service. (3) The SKILL.md uses Python + requests for examples but the skill does not install dependencies — ensure your environment has the needed runtime/libraries. (4) If you need stronger assurances, ask the publisher to correct the registry metadata to list the required env var and to document data retention/privacy for uploaded text. If any of these items worry you, test with non-sensitive sample text first.

功能分析

Type: OpenClaw Skill Name: audiobook-generator Version: 1.0.0 The audiobook-generator skill provides legitimate functionality for converting text to speech using the SenseAudio API (api.senseaudio.cn). The provided Python code snippets use standard libraries (requests, pydub, mutagen) to handle API requests, text chunking, and audio file merging, with no evidence of data exfiltration, unauthorized execution, or malicious prompt injection.

能力评估

✓ Purpose & Capability

The name and description match the runtime instructions: the SKILL.md shows parsing text, chunking, and calling a SenseAudio TTS API to produce chaptered audio. Requiring a SenseAudio API key is appropriate for this purpose. One inconsistency: the registry metadata at the top reported no required env vars, while SKILL.md declares a required credential (SENSEAUDIO_API_KEY / env_var API_KEY).

ℹ Instruction Scope

Runtime instructions are limited to parsing text, chunking it, calling the SenseAudio API, decoding returned audio, and writing audio files — all consistent with audiobook generation. Important operational scope: the skill will send full user text (entire chapters/books) to a third-party API (api.senseaudio.cn). There are no instructions that access unrelated system files, other credentials, or exfiltrate data elsewhere.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no bundled code files, so nothing is downloaded or installed by the skill itself. The SKILL.md includes Python examples that assume standard libraries (and requests), but the skill does not declare or install those dependencies.

⚠ Credentials

The SKILL.md correctly requires an API key for SenseAudio (Authorization header uses API_KEY). However, the registry metadata did not list required environment variables, creating a mismatch. The single credential requested (SenseAudio API key) is proportionate to the stated purpose, but confirm which env var name the agent expects (SKILL.md maps SENSEAUDIO_API_KEY -> API_KEY).

✓ Persistence & Privilege

The skill does not request always: true and does not declare elevated or persistent privileges. It is user-invocable and can be autonomously invoked by the agent (platform default), which is expected for skills of this type.

版本历史

v1.0.0

SenseAudio Audiobook Generator 1.0.0 – Initial Release - Generate audiobooks from novels, books, and long-form written content. - Supports chapter management and navigation for large texts. - Assign distinct voices to narrator and multiple characters for dialogue. - Automatically handles text chunking for lengthy content. - Provides detailed implementation steps and example code for parsing, audio generation, and combining audio segments. - Requires a SENSEAUDIO_API_KEY for integration with SenseAudio’s TTS API.

元数据

Slug audiobook-generator

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Audiobook 是什么？

Generate audiobooks from novels and long-form text with chapter management and character voices. Use when users mention audiobooks, narrating books, or conve... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 297 次。

如何安装 Audiobook？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install audiobook-generator」即可一键安装，无需额外配置。

Audiobook 是免费的吗？

是的，Audiobook 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Audiobook 支持哪些平台？

Audiobook 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Audiobook？

由 scikkk（@scikkk）开发并维护，当前版本 v1.0.0。

Audiobook