/install flowyaipc-herdsman-skill-1-0-2-en
Herdsman Skill
This directory is not a single script but an integration package for reuse by other agent platforms, enabling external agents to reliably access the Herdsman local model engine.
Use Cases
- Other agent platforms need to use Herdsman as an OpenAI-compatible backend
- Platforms want to access Herdsman via the Anthropic Messages-compatible interface
- Platforms that support the AG-UI protocol need to connect to
/agui - Need to reliably call text, image, OCR, embedding, and speech capabilities without writing long JSON in shell
Default Connection
- Service address:
http://127.0.0.1:8080 - OpenAI root path:
http://127.0.0.1:8080/v1 - Anthropic endpoint:
http://127.0.0.1:8080/v1/anthropic/messages - AGUI endpoint:
http://127.0.0.1:8080/agui - API Key: Empty by default; if configured, use
Authorization: Bearer \x3Ckey>
Mandatory Rules
1. Do not construct complex curl commands directly
Do not construct complex prompts, tools, base64 images, or long-timeout tasks directly in the shell. Prefer using Python scripts under scripts/, or generate temporary Python files following the same pattern.
2. Run model discovery first
Before calling any model, always run:
python headsman-skill/scripts/check_model.py
If you know the model name, you can also:
python headsman-skill/scripts/check_model.py "\x3Cmodel_id>"
3. Long tasks must explicitly set longer timeouts
- Image generation, editing, img2img: recommended
timeout >= 120 - OCR: recommended
timeout >= 120 - Speech synthesis, recognition, streaming: recommended
timeout >= 120 - Text chat: recommended
timeout >= 60
4. Save image results to disk
If results will be reused in subsequent conversations, save them to outputs/ and return the absolute path or cache URL to the user.
Protocol Priority
OpenAI Compatible
Preferred for:
- Chat completions
- Tool calls
- Embeddings
- Rerank
- Image generation / editing / img2img
- OCR text recognition
- Speech recognition / synthesis / streaming
Core endpoints:
GET /v1/modelsPOST /v1/chat/completionsPOST /v1/embeddingsPOST /v1/rerankPOST /v1/images/generationsPOST /v1/images/editsPOST /v1/images/img2imgGET /v1/images/cache/:filenamePOST /v1/ocrPOST /v1/audio/transcriptionsGET /v1/audio/transcriptions/stream?model=(WebSocket)POST /v1/audio/speechGET /v1/audio/speech/stream/:tokenGET /v1/audio/info?model=
Additional parameters for chat completions (OpenAI Chat Completions compatible extensions):
| Parameter | Type | Description |
|---|---|---|
reasoning_effort |
string | Reasoning level: low / medium / high; local llama.cpp maps to template parameters |
thinking_enabled |
boolean | Enable or disable thinking mode for supported models; local llama.cpp maps to enable_thinking |
thinking_tokens |
number | Thinking token budget; local llama.cpp maps to reasoning_budget |
Anthropic Compatible
For platforms that only support the Anthropic Messages style, the endpoint is:
POST /v1/anthropic/messages
Therefore:
- If the platform supports custom full endpoints, it can connect directly
- If the SDK hardcodes
/v1/messages, add a lightweight proxy on the platform side or use raw HTTP requests
AGUI
For platforms supporting the AG-UI protocol event stream:
POST /agui
AGUI is more suitable for protocol clients or SDKs; raw HTTP is not recommended. In the current state, state should at least provide model, and may optionally include webSearch, tools, task_type, pass_through.
Recommended Scripts
scripts/herdsman_client.py: General HTTP client wrapperscripts/check_model.py: Model discovery and filteringscripts/chat_completion.py: OpenAI chat completion (supports reasoning_effort / thinking)scripts/generate_image.py: Text-to-image generation with auto-downloadscripts/edit_image.py: Image editing with support for local files, URLs, masks, and additional reference imagesscripts/img2img.py: Image-to-image (style transfer, inpainting)scripts/ocr.py: OCR text recognition, supports direct local image recognitionscripts/transcribe_audio.py: Speech transcription, supports local files, URLs, and data URLsscripts/audio_speech.py: Text-to-speech (TTS), supports VoiceDesign, VoiceClone, and streamingscripts/anthropic_messages.py: Anthropic Messages compatible invocation
Directory Structure
references/api-examples.md: Capability-based call examplesreferences/platform-integration.md: OpenAI / Anthropic / AGUI integration guidereferences/error-codes.md: Common errors and agent-side handling strategiesreferences/model-capabilities.md: Model capabilities and endpoint mappingoutputs/: Recommended directory for saving generated images
Best Practices
- Use
check_model.pyfirst to get installed models - Choose OpenAI, Anthropic, or AGUI based on the platform protocol
- Use Python scripts instead of shell concatenation for long tasks
- Save image results as files or cache URLs, avoiding large base64 payloads
- When encountering
model_not_found,model_not_installed,invalid_model_capability, re-run model discovery - Speech transcription supports both JSON body (
audiofield) andmultipart/form-data(filefield) - Before OCR, use
check_model.pyto confirmpaddleocr-ppocrv5-serveror another OCR model is installed
Speech Extension: TTS Voice Clone + ASR Standalone Transcription
The following three scripts are advanced speech tools integrated with Herdsman, supporting a full workflow from audio conversion to ASR transcription to voice cloning.
Script Overview
| Script | Function | External Dependency |
|---|---|---|
scripts/convert_audio.py |
Audio format conversion (any format to 16kHz WAV) | ffmpeg |
scripts/transcribe_standalone.py |
ASR speech transcription (pure urllib, no herdsman_client dependency) | Herdsman ASR model |
scripts/tts_voice_clone.py |
Voice cloning TTS synthesis | Herdsman qwen3-tts-voiceclone |
convert_audio.py
Convert audio in any format (MP3/M4A/OGG, etc.) to 16kHz mono WAV. No Herdsman dependency.
uv run python scripts/convert_audio.py \x3Cinput_path> [output_path]
Parameters:
input_path— Path to the reference audio fileoutput_path— Optional, defaults to same directory as input with.wavextension
Examples:
uv run python scripts/convert_audio.py ref.mp3
uv run python scripts/convert_audio.py ref.mp3 ref.wav
transcribe_standalone.py
Standalone ASR transcription script (pure urllib, no dependency on herdsman_client.py). Dynamic model selection, supports absolute output paths.
uv run python scripts/transcribe_standalone.py \x3Caudio_path> --model \x3Cmodel_id> [--language \x3Clanguage>] [--output \x3Cabsolute_path>]
Parameters:
audio_path— Input audio file path (.wav/.mp3/.m4a, etc.)--model— ASR model ID (required, dynamic selection)--language— Language code (optional, auto-detect by default)--output/-o— Output file absolute path, writes both.txt+.json(optional, prints only if not specified)--timeout— Timeout in seconds (default 300)
Tested model recommendations:
| Model | Recommendation | Notes |
|---|---|---|
sherpa-onnx-paraformer-zh-small |
⭐ Preferred | Simplified Chinese, preserves filler words, ~5s fastest |
whisper-base |
Alternative | General high accuracy, Traditional Chinese output |
funasr |
⚠️ | WebSocket streaming only, HTTP not supported |
sherpa-onnx-streaming-zipformer-zh-14m |
⚠️ | Streaming only, HTTP does not support full transcription |
Examples:
# Recommended
uv run python scripts/transcribe_standalone.py audio.wav --model sherpa-onnx-paraformer-zh-small --output "D:/result.txt"
# Print only
uv run python scripts/transcribe_standalone.py audio.wav --model whisper-base
tts_voice_clone.py
Voice cloning TTS synthesis using qwen3-tts-voiceclone. Three dynamic parameters: reference audio WAV, original text, target script.
uv run python scripts/tts_voice_clone.py \x3Cref_audio_wav> \x3Cref_text> \x3Ctarget_text> [--output \x3Cpath>]
Parameters:
ref_audio_wav— 16kHz mono WAV pathref_text— Original text corresponding to the reference audiotarget_text— Target text to be synthesized with cloned voice--output/-o— Output audio path (defaultripple_tts_cloned.wav)--timeout— Timeout in seconds (default 180)
Examples:
uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o output.wav
Full Workflow
# 1. Convert to WAV
uv run python scripts/convert_audio.py source.mp3 ref.wav
# 2. ASR transcription (extract audio text for comparison)
uv run python scripts/transcribe_standalone.py ref.wav --model sherpa-onnx-paraformer-zh-small --output "D:/transcribed.txt"
# 3. Voice clone synthesis
uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o final.wav
Notes
- Reference audio recommended 10-60 seconds, low background noise, natural speech rate
- The original text must exactly match the audio content, otherwise cloning quality is affected
- ASR transcription supports absolute paths via
--outputfor cross-directory use - Error messages output to stderr, normal results output to stdout
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install flowyaipc-herdsman-skill-1-0-2-en - 安装完成后,直接呼叫该 Skill 的名称或使用
/flowyaipc-herdsman-skill-1-0-2-en触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Flowyaipc Herdsman Skill En 是什么?
Integration package for the Herdsman model engine. Used by other agent platforms to call scripts in this directory and protocol specifications when connectin... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 38 次。
如何安装 Flowyaipc Herdsman Skill En?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install flowyaipc-herdsman-skill-1-0-2-en」即可一键安装,无需额外配置。
Flowyaipc Herdsman Skill En 是免费的吗?
是的,Flowyaipc Herdsman Skill En 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Flowyaipc Herdsman Skill En 支持哪些平台?
Flowyaipc Herdsman Skill En 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Flowyaipc Herdsman Skill En?
由 JieJingKe(@jiejingke)开发并维护,当前版本 v1.0.0。