Description

Use when the user wants to generate speech, voiceover, or text-to-audio. Converts text to AI voice via Giggle.pro TTS API. Triggers: generate speech, text-to...

Usage Instructions (SKILL.md)


Text-to-Audio

Name: Giggle Generation Speech
Author: patches429

Synthesizes text into AI voice/voiceover via giggle.pro. Supports multiple voice tones, emotions, and speaking rates.

⚠️ Review Before Installing

Please review the following before installing. This skill will:

Write to ~/.openclaw/skills/giggle-generation-speech/logs/ – Task state files for Cron deduplication
Register Cron (30s interval) – Async polling when user initiates speech generation; removed when complete
Forward raw stdout – Script output (audio links, status) is passed to the user as-is

Requirements: python3, GIGGLE_API_KEY (system environment variable), pip packages: requests

API Key: Set system environment variable GIGGLE_API_KEY. The script will prompt if not configured.

No inline Python: All commands must be executed via the exec tool. Never use heredoc inline code.

No Retry on Error: If script execution encounters an error, do not retry. Report the error to the user directly and stop.

Execution Flow (Phase 1 Submit + Phase 2 Cron + Phase 3 Sync Fallback)

Speech generation typically takes 10–30 seconds. The skill uses a three-phase architecture: fast submit, Cron poll, and sync fallback.

Important: Never pass GIGGLE_API_KEY in exec's env parameter. The API key is read from the system environment variable.
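
The rule above can be illustrated with a minimal sketch. It assumes only that the key lives in the `GIGGLE_API_KEY` environment variable, as stated; the helper name `load_api_key` is hypothetical and is not taken from the bundled script.

```python
import os


def load_api_key(environ=None):
    """Return the API key from the environment, or None if it is not set.

    `environ` defaults to os.environ; passing a plain dict makes the helper
    easy to test. Hypothetical helper, not the bundled script's code.
    """
    env = os.environ if environ is None else environ
    key = env.get("GIGGLE_API_KEY", "").strip()
    return key or None
```

A caller would treat `None` as "not configured" and tell the user to set the variable, rather than accepting the key through exec parameters.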

Phase 0: Guide User to Select Voice and Emotion (required)

Before submitting, you must guide the user to select voice and emotion. Do not use defaults.

Run --list-voices to get available voices:

python3 scripts/text_to_audio_api.py --list-voices

Display the voice list to the user in a readable format (voice_id, name, style, gender, etc.) and guide them to pick one
Ask for the user's preferred emotion (e.g. joy, sad, neutral, angry, surprise). Use neutral if no preference
Only after the user confirms voice and emotion, proceed to Phase 1 submit

Phase 1: Submit Task (exec completes in ~10 seconds)

First send a message to the user: "Speech generation in progress, usually takes 10–30 seconds. Results will be sent automatically."

# Must specify user-selected voice and emotion
python3 scripts/text_to_audio_api.py \
  --text "The weather is nice today" \
  --voice-id "Calm_Woman" \
  --emotion "joy" \
  --speed 1.2 \
  --no-wait --json

# View available voices
python3 scripts/text_to_audio_api.py --list-voices

Response example:

{"status": "started", "task_id": "xxx"}

Immediately store task_id in memory (addMemory):

giggle-generation-speech task_id: xxx (submitted: YYYY-MM-DD HH:mm)
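
As a rough sketch of this step, the submit response can be parsed and turned into the memory line shown above. The function name `memory_entry` is hypothetical, and the sketch assumes the response is a single JSON object with the `status` and `task_id` fields from the example.

```python
import json
from datetime import datetime


def memory_entry(stdout, now=None):
    """Build the memory line from the Phase 1 JSON response (hypothetical helper)."""
    data = json.loads(stdout)
    if not isinstance(data, dict) or data.get("status") != "started" or not data.get("task_id"):
        raise ValueError(f"unexpected submit response: {stdout!r}")
    # Timestamp format follows the YYYY-MM-DD HH:mm pattern in the template above.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    return f"giggle-generation-speech task_id: {data['task_id']} (submitted: {stamp})"
```

Raising on anything other than a `started` response fits the "No Retry on Error" rule: the caller reports the problem instead of storing a bogus task ID.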

Phase 2: Register Cron (30 second interval)

Use the cron tool to register the polling job. Strictly follow the parameter format:

{
  "action": "add",
  "job": {
    "name": "giggle-generation-speech-<first 8 chars of task_id>",
    "schedule": {
      "kind": "every",
      "everyMs": 30000
    },
    "payload": {
      "kind": "systemEvent",
      "text": "Speech task poll: exec python3 scripts/text_to_audio_api.py --query --task-id <full task_id>, handle stdout per Cron logic. If stdout is non-JSON plain text, forward to user and remove Cron. If stdout is JSON, do not send message, keep waiting. If stdout is empty, remove Cron immediately."
    },
    "sessionTarget": "main"
  }
}
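
A minimal sketch of filling in this template for a given task ID. The function name `build_cron_job` is hypothetical; the field names and payload text are copied from the job definition above, with the job name using the first 8 characters of the task ID and the payload using the full ID.

```python
POLL_INTERVAL_MS = 30000  # 30 second interval, as in the job definition above


def build_cron_job(task_id):
    """Return the cron tool parameters for polling one speech task (hypothetical helper)."""
    text = (
        "Speech task poll: exec python3 scripts/text_to_audio_api.py "
        f"--query --task-id {task_id}, handle stdout per Cron logic. "
        "If stdout is non-JSON plain text, forward to user and remove Cron. "
        "If stdout is JSON, do not send message, keep waiting. "
        "If stdout is empty, remove Cron immediately."
    )
    return {
        "action": "add",
        "job": {
            "name": f"giggle-generation-speech-{task_id[:8]}",
            "schedule": {"kind": "every", "everyMs": POLL_INTERVAL_MS},
            "payload": {"kind": "systemEvent", "text": text},
            "sessionTarget": "main",
        },
    }
```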

Cron trigger handling (based on exec stdout):

stdout pattern	Action
Non-empty plain text (not starting with `{`)	Forward to user as-is, remove Cron
stdout empty	Already pushed, remove Cron immediately, do not send message
JSON (starts with `{`, has `"status"` field)	Do not send message, do not remove Cron, keep waiting
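
The table above can be sketched as a small decision function. The function name and the returned action labels are hypothetical; the sketch applies the table's first-character rule and does not validate that `{`-prefixed output is well-formed JSON with a `"status"` field.

```python
def cron_action(stdout):
    """Map the polling script's stdout to the action from the table above."""
    text = stdout.strip()
    if not text:
        # Empty stdout: the result was already pushed, so stay silent.
        return "remove_cron_silently"
    if text.startswith("{"):
        # JSON status output: the task is still pending.
        return "keep_waiting"
    # Any other non-empty output is a final message for the user.
    return "forward_and_remove_cron"
```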

Phase 3: Sync Wait (optimistic path, fallback when Cron hasn't fired)

Execute this step whether or not Cron registration succeeded.

python3 scripts/text_to_audio_api.py --query --task-id <task_id> --poll --max-wait 120

Handling logic:

Returns plain text (speech ready/failed message) → Forward to user as-is, remove Cron
stdout empty → Cron already pushed, remove Cron, do not send message
exec timeout → Cron continues polling
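
The three outcomes above can be sketched with Python's `subprocess` module, which raises `subprocess.TimeoutExpired` when a command outlives its timeout. This is an illustration only: in the skill the command runs through the exec tool, and the function name `sync_wait` and the action labels are hypothetical.

```python
import subprocess


def sync_wait(cmd, timeout_s):
    """Run a polling command and map its outcome to a Phase 3 action.

    Returns (action, message). Hypothetical helper for illustration.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Timeout: leave the Cron job in place so it keeps polling.
        return ("cron_continues", "")
    out = result.stdout.strip()
    if not out:
        # Empty stdout: Cron already pushed the result.
        return ("remove_cron_silently", "")
    # Plain-text ready/failed message: forward as-is.
    return ("forward_and_remove_cron", out)
```

For the command shown above, `cmd` would be the argument list `["python3", "scripts/text_to_audio_api.py", "--query", "--task-id", task_id, "--poll", "--max-wait", "120"]`.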

View Voice List

When the user wants to see available voices, run:

python3 scripts/text_to_audio_api.py --list-voices

The script calls GET /api/v1/project/preset_tones and displays voice_id, name, style, gender, age, language to the user.
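
A minimal sketch of rendering the voice list in a readable form. It assumes the voices arrive as a list of dictionaries keyed by the six field names above; the actual response shape of the endpoint is not documented here, and `format_voices` is a hypothetical helper.

```python
FIELDS = ("voice_id", "name", "style", "gender", "age", "language")


def format_voices(voices):
    """Render voice dicts as aligned text rows; missing fields show as '-'."""
    rows = [FIELDS] + [tuple(str(v.get(f, "-")) for f in FIELDS) for v in voices]
    # Column width is the longest cell in each column, header included.
    widths = [max(len(row[i]) for row in rows) for i in range(len(FIELDS))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```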

Link Return Rule

Audio links returned to the user must be full signed URLs (with Policy, Key-Pair-Id, Signature query params). Correct: https://assets.giggle.pro/...?Policy=...&Key-Pair-Id=...&Signature=.... Wrong: do not return unsigned URLs with only the base path (no query params). The script handles ~ encoding to %7E; keep as-is when forwarding.
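
One way to check this rule before forwarding a link, sketched with the standard library. The function name `is_signed_url` is hypothetical; the three parameter names come from the rule above.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("Policy", "Key-Pair-Id", "Signature")


def is_signed_url(url):
    """Return True if the URL carries all three non-empty signing parameters."""
    # parse_qs drops parameters with blank values by default.
    query = parse_qs(urlsplit(url).query)
    return all(query.get(name) for name in REQUIRED_PARAMS)
```

The check only inspects the query string; it does not modify the URL, so any `%7E` encoding produced by the script is left untouched.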

New Request vs Query Old Task

When the user initiates a new speech generation request, you must run Phase 1 to submit a new task. Do not reuse an old task_id from memory.

Only when the user explicitly asks about a previous task's progress should you query the old task_id from memory.

Parameter Reference

Parameter	Required	Default	Description
`--text`	yes	-	Text to synthesize
`--voice-id`	yes	-	Voice ID; must get via `--list-voices` and guide user to choose
`--emotion`	yes	-	Emotion: joy, sad, neutral, angry, surprise, etc. Guide user to choose
`--speed`	no	1	Speaking rate multiplier
`--list-voices`	-	-	Get available voice list
`--query`	-	-	Query task status
`--task-id`	required for query	-	Task ID
`--poll`	no	-	Sync poll with `--query`
`--max-wait`	no	120	Max wait seconds
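
The table above could be expressed with `argparse` roughly as follows. This is a sketch for illustration, not the bundled script's actual parser: it covers only the options listed in the table (the `--no-wait` and `--json` flags from the Phase 1 command are left out), and the validation rules are an interpretation of the Required column.

```python
import argparse


def build_parser():
    """Declare the options from the parameter table (hypothetical reconstruction)."""
    p = argparse.ArgumentParser(prog="text_to_audio_api.py")
    p.add_argument("--text", help="Text to synthesize")
    p.add_argument("--voice-id", help="Voice ID from --list-voices")
    p.add_argument("--emotion", help="joy, sad, neutral, angry, surprise, etc.")
    p.add_argument("--speed", type=float, default=1.0, help="Speaking rate multiplier")
    p.add_argument("--list-voices", action="store_true", help="Get available voice list")
    p.add_argument("--query", action="store_true", help="Query task status")
    p.add_argument("--task-id", help="Task ID (required for query)")
    p.add_argument("--poll", action="store_true", help="Sync poll with --query")
    p.add_argument("--max-wait", type=int, default=120, help="Max wait seconds")
    return p


def parse_cli(argv):
    """Parse argv and enforce which options each mode requires."""
    p = build_parser()
    args = p.parse_args(argv)
    if args.list_voices:
        return args
    if args.query:
        if not args.task_id:
            p.error("--task-id is required with --query")
        return args
    required = (("--text", args.text), ("--voice-id", args.voice_id), ("--emotion", args.emotion))
    missing = [flag for flag, value in required if not value]
    if missing:
        p.error("missing required options: " + ", ".join(missing))
    return args
```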

Interaction Guide

Before each speech generation, complete this interaction:

If the user did not provide text, ask: "Which text would you like to convert to speech?"
Must guide user to select voice: Run --list-voices, display list, have user choose. Do not use default voice
Must guide user to select emotion: Ask for the user's preferred emotion (joy, sad, neutral, angry, surprise, etc.)
After user confirms text, voice, and emotion, run Phase 1 submit → Phase 2 register Cron → Phase 3 sync wait
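
The order of these checks can be sketched as a small function that returns the next step given what the user has supplied so far. The function name and step labels are hypothetical.

```python
def next_step(text=None, voice_id=None, emotion=None):
    """Return the next interaction step before a task may be submitted."""
    if not text:
        return "ask_text"
    if not voice_id:
        # No default voice: list voices and have the user choose.
        return "list_voices_and_ask_voice"
    if not emotion:
        return "ask_emotion"
    # All three confirmed: Phase 1 submit, Phase 2 Cron, Phase 3 sync wait.
    return "submit"
```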

Security Recommendations

This skill appears to do what it says: it runs a bundled Python script that calls giggle.pro using the GIGGLE_API_KEY you provide. Before installing, consider:

(1) You must set GIGGLE_API_KEY as a system environment variable. Only provide a key you trust to be used with giggle.pro.
(2) The skill will create files under ~/.openclaw/skills/giggle-generation-speech/logs/ (task state, counters, prompt previews) and will register a cron job that polls the service every 30 seconds while a task is pending; these are removed when tasks complete.
(3) SKILL.md instructs forwarding raw stdout to users (including signed audio URLs). Verify you are comfortable with having those exact responses relayed.
(4) The script and instructions explicitly avoid passing the API key via exec env; ensure your agent/runtime keeps the system env secure.

If any of the above is unacceptable (cron activity, local log files, or forwarding raw responses), do not install the skill, or ask the skill author to modify its behavior.

Functional Analysis

Type: OpenClaw Skill
Name: giggle-generation-speech
Version: 0.0.10

The giggle-generation-speech skill provides text-to-speech functionality via the giggle.pro API. The Python script (scripts/text_to_audio_api.py) implements a robust three-phase execution model (submission, cron-based polling, and synchronous fallback) to handle asynchronous audio generation. It follows security best practices by requiring the API key via environment variables rather than command-line arguments and uses local state files in ~/.openclaw/ to manage task deduplication. No evidence of data exfiltration, malicious execution, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description match the actual code and runtime instructions. The only required env var is GIGGLE_API_KEY and the only required binary is python3, which are appropriate for a TTS client that calls giggle.pro.

ℹ Instruction Scope

SKILL.md instructs the agent to submit tasks, register a 30s Cron poller, and forward raw stdout to the user. Those behaviors are documented and aligned with the TTS workflow, but the Cron registration and 'forward raw stdout' policy are noteworthy: forwarded stdout can include API messages and signed URLs verbatim, and Cron payloads include commands that run the script with task IDs.

✓ Install Mechanism

No install spec; skill is instruction + a small Python script and requirements.txt. No downloads from third-party URLs or archive extraction; risk from install mechanism is low.

✓ Credentials

Only one secret (GIGGLE_API_KEY) is required and it's the declared primaryEnv. The script reads the API key from the system environment as documented; no unrelated credentials are requested.

ℹ Persistence & Privilege

The skill writes small state files under ~/.openclaw/skills/giggle-generation-speech/logs/ and registers short-lived Cron jobs when active. always:false (not force-included). This level of persistence is proportional to the described async polling behavior, but users should be aware of the created log files and Cron activity.

Version History

v0.0.10

- English documentation added, replacing original Chinese SKILL.md for international compatibility and clarity
- Now explicitly documents and enforces requirements: writes to logs, pip dependency on requests, cron registration for async polling
- Adds version, license, and clearer structured requirements fields for installer compatibility
- Introduces new section on review before install, with security and resource notes
- Usage/documentation clarified, with more concise step-by-step user guidance and requirements
- SKILL.zh-CN.md (Simplified Chinese version) added for Chinese users

v0.0.1

giggle-generation-speech v0.0.1 – Initial Release
- Enables text-to-speech (TTS) via the Giggle.pro API, supporting voiceover, narration, and audio synthesis from text.
- Requires user interaction to select voice and emotion before generating speech; no defaults allowed.
- Implements a three-stage execution flow: quick task submission, cron polling for results, and synchronous fallback.
- Provides clear guidance on listing available voices and handling user queries for past or current speech tasks.
- Ensures all API credentials are securely managed and prohibits inline Python; all tasks must run via exec.
- Clearly specifies that only fully signed audio URLs should be returned to users.

Metadata

Slug giggle-generation-speech

Version 0.0.10

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

FAQ

What is Giggle Generation Speech?

Use when the user wants to generate speech, voiceover, or text-to-audio. Converts text to AI voice via Giggle.pro TTS API. Triggers: generate speech, text-to... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 256 total downloads to date.

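How do I install Giggle Generation Speech?

Run the command "/install giggle-generation-speech" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but the skill requires python3, the requests package, and the GIGGLE_API_KEY system environment variable to run.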

Is Giggle Generation Speech free?

Yes. Giggle Generation Speech is completely free and is released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Giggle Generation Speech support?

Giggle Generation Speech is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Giggle Generation Speech?

It is developed and maintained by Parker (@patches429); the current version is v0.0.10.
