don068589

Douyin Transcriber

Author: Don Li · GitHub · v1.0.5 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 116 · Favorites: 0 · Current installs: 0 · Versions: 7
Install in OpenClaw
/install douyin-transcriber
Description
Transcribe speech from audio or video files, automatically extracting audio and converting to text using Docker Whisper ASR for Douyin/TikTok media.
Usage (SKILL.md)

Douyin Transcriber

Transcribe audio/video files to text using local Docker Whisper ASR.

Quick Start

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@/path/to/video.mp4"

The container has built-in ffmpeg for automatic audio extraction.

Prerequisites

| Tool   | Purpose          | Install                    |
|--------|------------------|----------------------------|
| Docker | Whisper ASR      | Docker Desktop             |
| ffmpeg | Audio extraction | winget install Gyan.FFmpeg |

Deploy Whisper ASR:

docker run -d -p PORT:PORT -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest

Workflow

Step 1: Extract Audio from Video

ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

Parameters:

  • -ar 16000: 16kHz sample rate
  • -ac 1: Mono channel
  • -c:a pcm_s16le: 16-bit PCM
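
The extraction command above can also be assembled from a script. The following is a minimal Python sketch, assuming ffmpeg is on the PATH; the helper names build_extract_cmd and extract_audio are hypothetical and not part of the skill.

```python
import subprocess


def build_extract_cmd(src, dst, sample_rate=16000, channels=1):
    """Return the ffmpeg argument list for a mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input video or audio file
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of audio channels
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def extract_audio(src, dst):
    """Run ffmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_extract_cmd(src, dst), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.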

Step 2: Transcribe

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"

Optional: specify language

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav" -F "language=zh"
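
For callers that prefer a script over curl, the same upload can be sketched in Python with only the standard library. This is an illustration under assumptions, not the skill's own code: build_multipart and transcribe are hypothetical helper names, the file field is named audio_file and the language is sent as a form field exactly as in the curl examples, and base_url must be supplied by the caller with the real port substituted for PORT.

```python
import os
import uuid
import urllib.request


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url, language=None, timeout=600):
    """POST the file to <base_url>/asr and return the raw response text."""
    fields = {"language": language} if language else {}
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            fields, "audio_file", os.path.basename(path), fh.read()
        )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/asr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call would look like transcribe("audio.wav", "http://localhost:PORT", language="zh") once PORT is replaced with the published port. The generous default timeout reflects the multi-minute transcription times listed under Model Selection.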

Step 3: Parse Result

Response format:

{
  "text": "Transcribed content...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "First sentence"},
    {"start": 2.5, "end": 5.0, "text": "Second sentence"}
  ],
  "language": "zh"
}
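
As one way to consume this response, the segments can be turned into SRT-style subtitle blocks. The sketch below assumes the response has exactly the shape shown above (a segments list with start, end, and text); format_timestamp and segments_to_srt are hypothetical helper names.

```python
import json


def format_timestamp(seconds):
    """Convert seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(response_text):
    """Turn the JSON response into numbered SRT subtitle blocks."""
    data = json.loads(response_text)
    blocks = []
    for index, seg in enumerate(data.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

If only the full transcript is needed, json.loads(response_text)["text"] is sufficient and the segments can be ignored.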

Model Selection

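| Model  | Size  | Time for 5-min video | Accuracy             |
|--------|-------|----------------------|----------------------|
| tiny   | 75MB  | ~30s                 | Fair                 |
| base   | 142MB | ~1min                | Good                 |
| small  | 466MB | ~3min                | Better (recommended) |
| medium | 1.5GB | ~8min                | Best                 |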

Change model via environment variable: -e ASR_MODEL=medium
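
To make the trade-off concrete, the sketch below picks the most accurate model whose estimated runtime fits a time budget. It uses the approximate timings from the table above and assumes, without confirmation from the source, that transcription time scales linearly with video length; MODELS and pick_model are hypothetical names.

```python
# (model name, approximate minutes to transcribe a 5-minute video),
# ordered from least to most accurate, taken from the table above.
MODELS = [("tiny", 0.5), ("base", 1.0), ("small", 3.0), ("medium", 8.0)]


def pick_model(video_minutes, budget_minutes):
    """Return the most accurate model that fits the budget, or None."""
    best = None
    for name, minutes_per_5 in MODELS:
        # Assumption: runtime grows linearly with the length of the video.
        estimate = minutes_per_5 * video_minutes / 5.0
        if estimate <= budget_minutes:
            best = name
    return best
```

The returned name is what would be passed as ASR_MODEL. Actual timings depend on hardware, so the table values are best treated as rough guides.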

Supported Formats

Video: mp4, mkv, avi, mov, flv, wmv, webm, m4v

Audio: wav, m4a, mp3, aac, ogg, flac, wma, opus
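
A caller may want to reject unsupported files before uploading them. The following sketch classifies a path by its extension using the two lists above; media_kind is a hypothetical helper, and the check looks only at the file name, not at the file contents.

```python
from pathlib import Path

VIDEO_EXTS = {"mp4", "mkv", "avi", "mov", "flv", "wmv", "webm", "m4v"}
AUDIO_EXTS = {"wav", "m4a", "mp3", "aac", "ogg", "flac", "wma", "opus"}


def media_kind(path):
    """Return "video", "audio", or None based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return None
```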

Troubleshooting

| Issue                 | Solution                         |
|-----------------------|----------------------------------|
| Docker not available  | Install Docker Desktop           |
| Container start fails | Check port availability          |
| Transcription timeout | Use smaller model or split audio |
| ffmpeg not found      | winget install Gyan.FFmpeg       |
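
Two of these fixes can be sketched in Python. The first helper checks whether a local port is already taken before the container is started; the second builds an ffmpeg command that splits a long recording into fixed-length chunks using ffmpeg's segment muxer. The names port_is_free and build_split_cmd, the 300-second chunk length, and the chunk file pattern are illustrative assumptions rather than part of the skill.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def build_split_cmd(src, chunk_seconds=300, pattern="chunk_%03d.wav"):
    """Return an ffmpeg command that cuts src into chunk_seconds pieces."""
    return [
        "ffmpeg",
        "-i", src,
        "-f", "segment",                      # segment muxer: many outputs
        "-segment_time", str(chunk_seconds),  # target length of each chunk
        "-c", "copy",                         # copy audio without re-encoding
        pattern,                              # chunk_000.wav, chunk_001.wav, ...
    ]
```

Each chunk can then be transcribed separately and the resulting texts concatenated in order.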

Related Modules

  • douyin-fetcher - Video download
  • douyin-analyzer - Content analysis
  • douyin-orchestrator - Workflow coordination
Security recommendations
This skill appears to do what it says (local transcription), but it has several practical and security gaps you should address before running it:

  • Metadata mismatch: the SKILL.md requires Docker and ffmpeg, but the skill metadata lists no requirements. Assume you need both.
  • Untrusted image: the instructions pull onerahmet/openai-whisper-asr-webservice:latest from Docker Hub. Prefer a well-known repository or a pinned digest (sha256), and inspect the Dockerfile or source before running. Avoid :latest.
  • Run safely: execute the container in an isolated VM or sandbox, not on a critical host. Use --rm, drop capabilities, run as a non-root user, bind-mount only the directory that holds the audio (read-only if possible), and restrict network access if you don't want the container to contact the internet.
  • Scan the image: use tools such as trivy, snyk, or clair to scan the image for vulnerabilities and malware signatures before running it.
  • Port and config: the SKILL.md uses a PORT placeholder. Confirm which port to expose, and avoid binding to privileged or widely routable host ports.
  • Ask the author for provenance: request a homepage or source repository, a specific release, tag, or digest, and the minimal runtime flags recommended for secure execution. If you cannot verify the image or source, run a locally built, audited ASR container instead.

Given these issues (metadata omissions and an unpinned third-party Docker image), treat the skill as suspicious until you can verify the container source and run it in a hardened environment.
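
Some of these recommendations can be expressed directly as docker run flags. The sketch below assembles such a command in Python; it is an untested illustration under assumptions, not a vetted configuration. build_docker_run is a hypothetical helper, the digest must be obtained and verified by the operator, both port numbers must be supplied by the caller, and the image may not start with all capabilities dropped. Network access is left enabled here because the service may need to download model files on first start.

```python
def build_docker_run(image, digest, host_port, container_port, model="small"):
    """Assemble a docker run command pinned to a digest, bound to loopback."""
    return [
        "docker", "run", "-d",
        "--rm",                                 # remove container on exit
        "--name", "whisper-asr",
        "-p", f"127.0.0.1:{host_port}:{container_port}",  # loopback only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "-e", f"ASR_MODEL={model}",
        "-e", "ASR_ENGINE=faster_whisper",
        f"{image}@{digest}",                    # pinned digest, not :latest
    ]
```

Binding the published port to 127.0.0.1 keeps the ASR endpoint unreachable from other machines, which addresses the concern about widely routable host ports.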
Functional analysis
Type: OpenClaw Skill
Name: douyin-transcriber
Version: 1.0.5

The skill bundle provides standard instructions for transcribing audio and video files using ffmpeg and a local Docker-based Whisper ASR service (onerahmet/openai-whisper-asr-webservice). All commands, including the ffmpeg parameters and curl requests to localhost, are consistent with the stated purpose of media transcription and do not exhibit any signs of malicious intent, data exfiltration, or prompt injection.
Capability assessment
Purpose & Capability
Name/description (Douyin Transcriber using Docker Whisper ASR) matches the SKILL.md workflow (ffmpeg -> Docker container ASR -> curl to localhost). However, the registry metadata claims no required binaries or env vars, while the instructions clearly require Docker and ffmpeg and recommend container env vars (ASR_MODEL/ASR_ENGINE). The metadata and the instructions are therefore inconsistent.
Instruction Scope
Instructions ask operators to run 'docker run' to pull and run an HTTP ASR service and to run ffmpeg locally and curl audio to localhost. They do not request unrelated system files or credentials, but they do (a) use an unspecified placeholder PORT, (b) assume ability to run Docker (which implies daemon/root access), and (c) direct pulling/execution of a remote image. The steps grant the container network/host-execution potential that isn't described in metadata.
Install Mechanism
No formal install spec (instruction-only), but the SKILL.md instructs pulling a Docker image 'onerahmet/openai-whisper-asr-webservice:latest' from Docker Hub. Pulling and running an unpinned, third‑party image (latest tag, unknown maintainer) is higher risk because images can contain arbitrary code. No guidance to pin a digest, verify source, or run the container with reduced privileges.
Credentials
The skill does not request credentials or secret environment variables. It recommends container env vars for model selection (ASR_MODEL, ASR_ENGINE) which are non-sensitive. However, running Docker implies access to the Docker daemon (privileged), which can be used to access the host; that privilege is disproportionate relative to a metadata claim of 'no required binaries'.
Persistence & Privilege
The skill is not marked always:true and has no install that forces persistent presence. It instructs running a container that exposes an HTTP port (user-controlled). The skill itself does not request elevated platform privileges beyond normal Docker usage, but the act of running arbitrary containers increases blast radius if the image is malicious.
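How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install douyin-transcriber
  3. Once installed, invoke the skill by name or trigger it with /douyin-transcriber
  4. Provide the required inputs described in the skill's parameters to get structured output.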
Version history
v1.0.5
  • Added clear usage instructions and workflow for audio/video transcription using Docker Whisper ASR.
  • Detailed prerequisite tools and installation steps.
  • Included command examples for extracting audio, transcribing, specifying language, and parsing results.
  • Provided table for model selection, supported formats, and troubleshooting common issues.
  • Listed related modules for extended Douyin/TikTok workflows.
v1.0.4
  • Added detailed usage instructions and quick start guide for transcribing media files with Docker Whisper ASR.
  • Included prerequisites, installation steps, and workflow for extracting and transcribing audio/video.
  • Provided model selection table and format support list.
  • Added troubleshooting section for common issues.
  • Linked related modules for an integrated workflow.
v1.0.3
  • Improved documentation for setup and usage, including quick start instructions and example commands.
  • Added details on model selection, supported formats, and configuration options.
  • Clarified integration with Docker Whisper ASR and automatic audio extraction using ffmpeg.
  • Listed related modules and expanded guidance for transcription workflows.
v1.0.2
  • Updated documentation to improve clarity and provide a concise English overview.
  • Added quick start guide and streamlined usage instructions.
  • Listed supported audio and video formats explicitly.
  • Provided model selection table and performance estimates.
  • Summarized prerequisite tools and deployment steps.
  • Removed redundant/obsolete information and improved configuration examples.
v1.0.1
  • Added comprehensive documentation for skill features, usage, and configuration.
  • Clarified support for both local Docker Whisper ASR and optional cloud APIs.
  • Provided detailed setup and deployment instructions, including Docker commands.
  • Included example commands for curl and Python usage.
  • Listed dependencies, related modules, and expected transcription times.
  • License information (MIT-0) clearly stated.
v2.0.0
  • Major upgrade: modular architecture, browser DOM extraction, DASH support, Docker Whisper, structured output format, extended troubleshooting guide.
v1.0.0
Initial release - Audio transcription module with Whisper support
Metadata
Slug: douyin-transcriber
Version: 1.0.5
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 7
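FAQ

What is Douyin Transcriber?

Transcribe speech from audio or video files, automatically extracting audio and converting to text using Docker Whisper ASR for Douyin/TikTok media. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 116 times in total.

How do I install Douyin Transcriber?

Run the command /install douyin-transcriber in the OpenClaw or Claude Code chat box to install it in one step. The skill itself needs no further configuration, although the transcription workflow depends on Docker and ffmpeg as listed under Prerequisites.

Is Douyin Transcriber free?

Yes. Douyin Transcriber is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Douyin Transcriber support?

Douyin Transcriber is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Douyin Transcriber?

It is developed and maintained by Don Li (@don068589). The current version is v1.0.5.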
