don068589

Douyin Transcriber

by Don Li · GitHub ↗ · v1.0.5 · MIT-0
cross-platform · ⚠ suspicious
116 Downloads · 0 Stars · 0 Active Installs · 7 Versions
Install in OpenClaw
/install douyin-transcriber
Description
Transcribe speech from audio or video files, automatically extracting audio and converting to text using Docker Whisper ASR for Douyin/TikTok media.
README (SKILL.md)

Douyin Transcriber

Transcribe audio/video files to text using local Docker Whisper ASR.

Quick Start

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@/path/to/video.mp4"

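The container bundles ffmpeg, so it extracts the audio track from an uploaded video automatically. A local ffmpeg install is only needed for the manual extraction in Step 1 of the workflow.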

Prerequisites

| Tool | Purpose | Install |
| --- | --- | --- |
| Docker | Whisper ASR | Docker Desktop |
| ffmpeg | Audio extraction | winget install Gyan.FFmpeg |

Deploy Whisper ASR:

docker run -d -p PORT:PORT -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest

Workflow

Step 1: Extract Audio from Video

ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

Parameters:

  • -ar 16000: 16kHz sample rate
  • -ac 1: Mono channel
  • -c:a pcm_s16le: 16-bit PCM
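
The extraction step can be wrapped in a small helper, sketched below under a few assumptions: the function names `wav_path_for` and `extract_audio` are hypothetical, the output file is written next to the input, and only the ffmpeg flags already listed above are used.

```shell
#!/bin/sh
# Hypothetical helpers; the ffmpeg flags are the ones listed above.

# Derive the WAV path next to the input: /tmp/clip.mp4 -> /tmp/clip.wav
wav_path_for() {
  case "${1##*/}" in
    *.*) printf '%s.wav\n' "${1%.*}" ;;   # strip the last extension
    *)   printf '%s.wav\n' "$1" ;;        # no extension: just append .wav
  esac
}

# Convert an input file to 16 kHz mono 16-bit PCM WAV and print the output path.
extract_audio() {
  src=$1
  dst=$(wav_path_for "$src")
  # Avoid asking ffmpeg to overwrite its own input when src is already a .wav
  if [ "$src" = "$dst" ]; then
    dst="${src%.*}.16k.wav"
  fi
  ffmpeg -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$dst" -y >/dev/null 2>&1 || return 1
  printf '%s\n' "$dst"
}
```

For example, `wav=$(extract_audio video.mp4)` would leave the converted file at `video.wav` and store that path in `wav`, provided ffmpeg is on the PATH.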

Step 2: Transcribe

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"

Optional: specify language

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav" -F "language=zh"
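
The two curl calls above can be folded into one function with an optional language argument. This is a sketch, not the skill's own tooling: the name `transcribe`, the argument order, and the 600-second `--max-time` limit are assumptions, and the `language` value is sent as a form field exactly as in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the curl commands shown above.
# Usage: transcribe FILE PORT [LANGUAGE]
transcribe() {
  file=$1
  port=$2
  lang=${3:-}

  # Fail early with a distinct status if the input cannot be read.
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 2; }

  # Build the curl argument list; --fail makes HTTP errors a non-zero exit.
  # The 600 s limit is an assumed value, adjust it to your model and media length.
  set -- -sS --fail --max-time 600 -X POST "http://localhost:${port}/asr" \
         -F "audio_file=@${file}"
  if [ -n "$lang" ]; then
    set -- "$@" -F "language=${lang}"
  fi

  curl "$@"
}
```

A call such as `transcribe audio.wav "$ASR_PORT" zh > result.json` would save the response, where `ASR_PORT` is a hypothetical variable holding whatever port you published for the container.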

Step 3: Parse Result

Response format:

{
  "text": "Transcribed content...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "First sentence"},
    {"start": 2.5, "end": 5.0, "text": "Second sentence"}
  ],
  "language": "zh"
}
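
Assuming the response really has the shape shown above, the segments can be flattened into tab-separated lines for further processing. The sketch below is one possible approach; the function name `segments_to_tsv` is hypothetical, and it relies on `python3` being on the PATH for JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON response on stdin and print one
# "start<TAB>end<TAB>text" line per segment. Requires python3 on the PATH.
segments_to_tsv() {
  python3 -c '
import json, sys

data = json.load(sys.stdin)
# A response without "segments" simply produces no output.
for seg in data.get("segments", []):
    print("{:.2f}\t{:.2f}\t{}".format(seg["start"], seg["end"], seg["text"].strip()))
'
}
```

If the response was saved to a file, `segments_to_tsv < result.json` prints the timed lines; the full transcript is still available in the top-level `text` field.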

Model Selection

| Model | Size | Time for 5-min video | Accuracy |
| --- | --- | --- | --- |
| tiny | 75MB | ~30s | Fair |
| base | 142MB | ~1min | Good |
| small | 466MB | ~3min | Better (recommended) |
| medium | 1.5GB | ~8min | Best |

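Change the model via the environment variable passed to docker run: -e ASR_MODEL=medium. Environment variables are fixed when a container is created, so remove the existing whisper-asr container and run it again to switch models.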
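
A sketch of switching models, assuming the container was created with the deploy command from the Prerequisites section. The function names are hypothetical, the accepted model names are limited to the four rows of the table, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers; model names are the four listed in the table above.
is_known_model() {
  case "$1" in
    tiny|base|small|medium) return 0 ;;
    *) return 1 ;;
  esac
}

# Print (do not run) the commands that recreate the container with another model.
# Usage: switch_model_cmds MODEL PORT
switch_model_cmds() {
  model=$1
  port=$2
  is_known_model "$model" || { echo "unknown model: $model" >&2; return 1; }
  # Container name and image reference are copied from the deploy command above.
  printf 'docker rm -f whisper-asr\n'
  printf 'docker run -d -p %s:%s -e ASR_MODEL=%s -e ASR_ENGINE=faster_whisper --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest\n' \
    "$port" "$port" "$model"
}
```

Printing the commands keeps the helper side-effect free; pipe the output to `sh` only after checking it.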

Supported Formats

Video: mp4, mkv, avi, mov, flv, wmv, webm, m4v

Audio: wav, m4a, mp3, aac, ogg, flac, wma, opus
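
Before uploading a file it can help to check its extension against the two lists above. The following is a minimal sketch with a hypothetical function name, `media_kind`; it looks only at the file name, not at the actual container or codec.

```shell
#!/bin/sh
# Hypothetical helper: print "video" or "audio" for a supported extension,
# return non-zero otherwise. Extensions are the ones listed above.
media_kind() {
  # A name without any dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|mkv|avi|mov|flv|wmv|webm|m4v)   echo video ;;
    wav|m4a|mp3|aac|ogg|flac|wma|opus)  echo audio ;;
    *) return 1 ;;
  esac
}
```

For instance, `media_kind clip.MP4` prints `video`, while `media_kind notes.txt` prints nothing and returns a non-zero status.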

Troubleshooting

| Issue | Solution |
| --- | --- |
| Docker not available | Install Docker Desktop |
| Container start fails | Check port availability |
| Transcription timeout | Use smaller model or split audio |
| ffmpeg not found | winget install Gyan.FFmpeg |
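
The first and last rows of the table can be checked up front. The sketch below uses hypothetical function names and assumes that `docker`, `ffmpeg`, and `curl` are the command names on your system.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list that is
# not found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeeds only if the docker CLI can reach a running daemon.
docker_ready() {
  docker info >/dev/null 2>&1
}
```

Running `missing_tools docker ffmpeg curl` prints nothing when all three are installed; `docker_ready || echo "Docker daemon not reachable"` separates a missing install from a daemon that is simply not running.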

Related Modules

  • douyin-fetcher - Video download
  • douyin-analyzer - Content analysis
  • douyin-orchestrator - Workflow coordination
Usage Guidance
This skill appears to do what it says (local transcription) but has several practical and security gaps you should address before running it:

  • Metadata mismatch: the SKILL.md requires Docker and ffmpeg but the skill metadata lists none. Assume you need Docker and ffmpeg.
  • Untrusted image: the instructions pull onerahmet/openai-whisper-asr-webservice:latest from Docker Hub. Prefer a well-known repo or a pinned digest (sha256) and inspect the Dockerfile/source before running. Avoid :latest.
  • Run safely: execute the container in an isolated VM or sandbox, not on a critical host. Use --rm, drop capabilities, run as non-root user, bind-mount only the directory with audio (read-only if possible), and restrict network access if you don't want the container to contact the internet.
  • Scan the image: use tools like trivy/snyk/clair to scan the image for vulnerabilities and malware signatures before running.
  • Port and config: the SKILL.md uses a PORT placeholder. Confirm what port to expose and avoid binding to privileged or widely routable host ports.
  • Ask the author for provenance: request a homepage or source repository, a specific release/tag or digest, and minimal runtime flags recommended for secure execution. If you cannot verify the image or source, run a locally built, audited ASR container instead.

Given these issues (metadata omissions and an unpinned third-party Docker image), treat the skill as suspicious until you can verify the container source and run it in a hardened environment.
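
Some of this guidance can be expressed as a stricter docker run command. The sketch below is an illustration only: the function name is hypothetical, the digest must come from a source you trust, and whether the image still starts with dropped capabilities has not been verified here. Running as a non-root user and cutting off network access are left out because the image may need root or may download model weights on first start.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a more restrictive docker run command.
# Usage: hardened_run_cmd IMAGE@sha256:DIGEST PORT
hardened_run_cmd() {
  image_ref=$1
  port=$2

  # Refuse tags such as :latest; only digest-pinned references are accepted.
  case "$image_ref" in
    *@sha256:*) ;;
    *) echo "refusing unpinned image reference: $image_ref" >&2; return 1 ;;
  esac

  # --rm removes the container on exit, --cap-drop ALL drops Linux capabilities,
  # no-new-privileges blocks privilege escalation, and 127.0.0.1 keeps the
  # published port off external interfaces.
  printf 'docker run --rm -d --name whisper-asr --cap-drop ALL --security-opt no-new-privileges -p 127.0.0.1:%s:%s -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper %s\n' \
    "$port" "$port" "$image_ref"
}
```

The command is printed so it can be reviewed and adapted before anything is pulled or started.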
Capability Analysis
Type: OpenClaw Skill
Name: douyin-transcriber
Version: 1.0.5

The skill bundle provides standard instructions for transcribing audio and video files using ffmpeg and a local Docker-based Whisper ASR service (onerahmet/openai-whisper-asr-webservice). All commands, including the ffmpeg parameters and curl requests to localhost, are consistent with the stated purpose of media transcription and do not exhibit any signs of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (Douyin Transcriber using Docker Whisper ASR) matches the SKILL.md workflow (ffmpeg -> Docker container ASR -> curl to localhost). However, the registry metadata claims no required binaries or env vars, while the instructions clearly require Docker and ffmpeg and recommend container env vars (ASR_MODEL/ASR_ENGINE). The metadata and the instructions are therefore inconsistent.
Instruction Scope
Instructions ask operators to run 'docker run' to pull and run an HTTP ASR service and to run ffmpeg locally and curl audio to localhost. They do not request unrelated system files or credentials, but they do (a) use an unspecified placeholder PORT, (b) assume ability to run Docker (which implies daemon/root access), and (c) direct pulling/execution of a remote image. The steps grant the container network/host-execution potential that isn't described in metadata.
Install Mechanism
No formal install spec (instruction-only), but the SKILL.md instructs pulling a Docker image 'onerahmet/openai-whisper-asr-webservice:latest' from Docker Hub. Pulling and running an unpinned, third‑party image (latest tag, unknown maintainer) is higher risk because images can contain arbitrary code. No guidance to pin a digest, verify source, or run the container with reduced privileges.
Credentials
The skill does not request credentials or secret environment variables. It recommends container env vars for model selection (ASR_MODEL, ASR_ENGINE) which are non-sensitive. However, running Docker implies access to the Docker daemon (privileged), which can be used to access the host; that privilege is disproportionate relative to a metadata claim of 'no required binaries'.
Persistence & Privilege
The skill is not marked always:true and has no install that forces persistent presence. It instructs running a container that exposes an HTTP port (user-controlled). The skill itself does not request elevated platform privileges beyond normal Docker usage, but the act of running arbitrary containers increases blast radius if the image is malicious.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install douyin-transcriber
  3. After installation, invoke the skill by name or use /douyin-transcriber
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.5
  • Added clear usage instructions and workflow for audio/video transcription using Docker Whisper ASR.
  • Detailed prerequisite tools and installation steps.
  • Included command examples for extracting audio, transcribing, specifying language, and parsing results.
  • Provided table for model selection, supported formats, and troubleshooting common issues.
  • Listed related modules for extended Douyin/TikTok workflows.
v1.0.4
  • Added detailed usage instructions and quick start guide for transcribing media files with Docker Whisper ASR.
  • Included prerequisites, installation steps, and workflow for extracting and transcribing audio/video.
  • Provided model selection table and format support list.
  • Added troubleshooting section for common issues.
  • Linked related modules for an integrated workflow.
v1.0.3
  • Improved documentation for setup and usage, including quick start instructions and example commands.
  • Added details on model selection, supported formats, and configuration options.
  • Clarified integration with Docker Whisper ASR and automatic audio extraction using ffmpeg.
  • Listed related modules and expanded guidance for transcription workflows.
v1.0.2
  • Updated documentation to improve clarity and provide a concise English overview.
  • Added quick start guide and streamlined usage instructions.
  • Listed supported audio and video formats explicitly.
  • Provided model selection table and performance estimates.
  • Summarized prerequisite tools and deployment steps.
  • Removed redundant/obsolete information and improved configuration examples.
v1.0.1
  • Added comprehensive documentation for skill features, usage, and configuration.
  • Clarified support for both local Docker Whisper ASR and optional cloud APIs.
  • Provided detailed setup and deployment instructions, including Docker commands.
  • Included example commands for curl and Python usage.
  • Listed dependencies, related modules, and expected transcription times.
  • License information (MIT-0) clearly stated.
v2.0.0
  • Major upgrade: Modular architecture, browser DOM extraction, DASH support, Docker Whisper, structured output format, extended troubleshooting guide
v1.0.0
Initial release - Audio transcription module with Whisper support
Metadata
Slug douyin-transcriber
Version 1.0.5
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 7
Frequently Asked Questions

What is Douyin Transcriber?

Douyin Transcriber transcribes speech from audio or video files, automatically extracting the audio and converting it to text with a local Docker Whisper ASR service. It targets Douyin/TikTok media. It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.

How do I install Douyin Transcriber?

Run "/install douyin-transcriber" in the OpenClaw or Claude Code chat to install the skill in one step. The transcription workflow itself still needs Docker and ffmpeg, as listed under Prerequisites in the README.

Is Douyin Transcriber free?

Yes, Douyin Transcriber is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Douyin Transcriber support?

Douyin Transcriber is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available. Note that the ffmpeg install command in the README (winget) is Windows-specific.

Who created Douyin Transcriber?

It is built and maintained by Don Li (@don068589); the current version is v1.0.5.
