
Douyin Transcriber

Name: Douyin Transcriber
Author: don068589

by Don Li · GitHub ↗ · v1.0.5 · MIT-0

cross-platform ⚠ suspicious

Downloads: 116

Install in OpenClaw

/install douyin-transcriber

Description

Transcribe speech from audio or video files, automatically extracting audio and converting to text using Docker Whisper ASR for Douyin/TikTok media.

README (SKILL.md)

Douyin Transcriber

Transcribe audio/video files to text using local Docker Whisper ASR.

Quick Start

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@/path/to/video.mp4"

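The container has ffmpeg built in, so a video file can be posted directly and its audio is extracted automatically. PORT is a placeholder for the host port published when the container was started.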

Prerequisites

Tool	Purpose	Install
Docker	Whisper ASR	Docker Desktop
ffmpeg	Local audio extraction (Step 1)	`winget install Gyan.FFmpeg` (Windows)

Deploy Whisper ASR:

docker run -d -p PORT:PORT -e ASR_MODEL=small -e ASR_ENGINE=faster_whisper --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest

Workflow

Step 1: Extract Audio from Video

ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

Parameters:

-ar 16000: 16 kHz sample rate
-ac 1: Mono channel
-c:a pcm_s16le: 16-bit PCM
-y: Overwrite the output file without prompting
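
The extraction step can also be driven from a script. The following is a minimal Python sketch, assuming ffmpeg is on PATH; the helper names `build_extract_command` and `extract_audio` are hypothetical and not part of the skill.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, wav_path):
    """Return the ffmpeg argument list used in Step 1."""
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        "-c:a", "pcm_s16le",     # 16-bit PCM
        str(wav_path),
        "-y",                    # overwrite the output if it exists
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    video_path = Path(video_path)
    # Default to the same file name with a .wav extension (an assumption).
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.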

Step 2: Transcribe

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"

Optional: specify language

curl -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav" -F "language=zh"
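
The same request can be sent without curl. The following is a rough Python sketch using only the standard library. The helper names `build_multipart` and `transcribe` are hypothetical, the port is passed in because the source leaves PORT unspecified, and the `language` value is sent as a form field only because the command above does so; whether the running service honors it in that position is an assumption worth checking.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(file_field, filename, data, fields=None):
    """Encode one file plus optional text fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in (fields or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, port, language=None, timeout=600):
    """POST a media file to the local /asr endpoint and return the response text."""
    path = Path(path)
    fields = {"language": language} if language else {}
    body, content_type = build_multipart(
        "audio_file", path.name, path.read_bytes(), fields
    )
    request = urllib.request.Request(
        f"http://localhost:{port}/asr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # A generous timeout, since larger models can take minutes per file.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```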

Step 3: Parse Result

Response format:

{
  "text": "Transcribed content...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "First sentence"},
    {"start": 2.5, "end": 5.0, "text": "Second sentence"}
  ],
  "language": "zh"
}
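
To illustrate parsing, here is a small Python sketch that turns a response of the shape shown above into timestamped lines. It assumes the service returns JSON with `segments` entries carrying `start`, `end`, and `text`; the function names `format_timestamp` and `format_segments` are hypothetical.

```python
import json


def format_timestamp(seconds):
    """Render a number of seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(response_text):
    """Return one '[start - end] text' line per segment in the response."""
    result = json.loads(response_text)
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


sample = """
{
  "text": "Transcribed content...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "First sentence"},
    {"start": 2.5, "end": 5.0, "text": "Second sentence"}
  ],
  "language": "zh"
}
"""
for line in format_segments(sample):
    print(line)
# [00:00.000 - 00:02.500] First sentence
# [00:02.500 - 00:05.000] Second sentence
```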

Model Selection

Model	Size	Time for 5-min video	Accuracy
tiny	75MB	~30s	Fair
base	142MB	~1min	Good
small	466MB	~3min	Better (recommended)
medium	1.5GB	~8min	Best

Change the model by setting the ASR_MODEL environment variable when starting the container, e.g. -e ASR_MODEL=medium
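
The table can be used to pick a model for a given time budget. The sketch below is only illustrative: it assumes transcription time scales linearly with media length, which the source does not state, and the names `estimate_seconds` and `pick_model` are hypothetical.

```python
# Approximate seconds to transcribe a 5-minute video, taken from the table above.
SECONDS_PER_5_MIN = {"tiny": 30, "base": 60, "small": 180, "medium": 480}

# Models ordered from most to least accurate, per the table's Accuracy column.
BY_ACCURACY = ["medium", "small", "base", "tiny"]


def estimate_seconds(model, media_seconds):
    """Scale the table's 5-minute figure linearly (an assumption) to another length."""
    return SECONDS_PER_5_MIN[model] * media_seconds / 300


def pick_model(media_seconds, budget_seconds):
    """Return the most accurate model whose estimate fits the budget, else None."""
    for model in BY_ACCURACY:
        if estimate_seconds(model, media_seconds) <= budget_seconds:
            return model
    return None
```

For example, `pick_model(300, 200)` returns `"small"`, since the table's estimate for medium (about 480 s) exceeds a 200-second budget. Actual timings depend on hardware, so treat the figures as rough guidance.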

Supported Formats

Video: mp4, mkv, avi, mov, flv, wmv, webm, m4v

Audio: wav, m4a, mp3, aac, ogg, flac, wma, opus
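
A script that wraps the workflow may want to check the extension before sending a file. The following is a minimal Python sketch based on the two lists above; the helper name `media_kind` is hypothetical, and the check looks only at the file extension, not the file contents.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {"mp4", "mkv", "avi", "mov", "flv", "wmv", "webm", "m4v"}
AUDIO_EXTENSIONS = {"wav", "m4a", "mp3", "aac", "ogg", "flac", "wma", "opus"}


def media_kind(path):
    """Return 'video', 'audio', or None for an unsupported extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return None
```

A result of `"video"` could be used to decide whether to run Step 1 locally before uploading.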

Troubleshooting

Issue	Solution
Docker not available	Install Docker Desktop
Container start fails	Check port availability
Transcription timeout	Use smaller model or split audio
ffmpeg not found	`winget install Gyan.FFmpeg`
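
The first and last rows of the table can be checked up front. Here is a small Python sketch, with a hypothetical helper name `missing_tools`, that reports which required commands are not on PATH. It only confirms that the `docker` command exists, not that the Docker daemon is running.

```python
import shutil


def missing_tools(tools=("docker", "ffmpeg"), which=shutil.which):
    """Return the names of required tools that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("Docker and ffmpeg were found on PATH.")
```

The `which` parameter exists so the lookup can be replaced in tests.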

Related Modules

douyin-fetcher - Video download
douyin-analyzer - Content analysis
douyin-orchestrator - Workflow coordination

Usage Guidance

This skill appears to do what it says (local transcription) but has several practical and security gaps you should address before running it:

- Metadata mismatch: the SKILL.md requires Docker and ffmpeg but the skill metadata lists none. Assume you need Docker and ffmpeg.
- Untrusted image: the instructions pull onerahmet/openai-whisper-asr-webservice:latest from Docker Hub. Prefer a well-known repo or a pinned digest (sha256) and inspect the Dockerfile/source before running. Avoid :latest.
- Run safely: execute the container in an isolated VM or sandbox, not on a critical host. Use --rm, drop capabilities, run as non-root user, bind-mount only the directory with audio (read-only if possible), and restrict network access if you don't want the container to contact the internet.
- Scan the image: use tools like trivy/snyk/clair to scan the image for vulnerabilities and malware signatures before running.
- Port and config: the SKILL.md uses a PORT placeholder; confirm what port to expose and avoid binding to privileged or widely routable host ports.
- Ask the author for provenance: request a homepage or source repository, a specific release/tag or digest, and minimal runtime flags recommended for secure execution.

If you cannot verify the image or source, run a locally built, audited ASR container instead. Given these issues (metadata omissions and an unpinned third-party Docker image), treat the skill as suspicious until you can verify the container source and run it in a hardened environment.
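
Some of the hardening advice above can be expressed as a command builder. The Python sketch below is an illustration, not a vetted configuration: `build_docker_command` is a hypothetical helper, the digest and both ports must be supplied by the caller because the source specifies neither, and it is an assumption that the image still works with all capabilities dropped. Non-root execution, bind mounts, and network restrictions are left out because they depend on how the image is built.

```python
def build_docker_command(image, digest, host_port, container_port, model="small"):
    """Return a `docker run` argument list with a few hardening flags applied."""
    return [
        "docker", "run", "-d",
        "--rm",                       # remove the container when it stops
        "--cap-drop", "ALL",          # drop all Linux capabilities
        # Publish on the loopback interface only, not on all host interfaces.
        "-p", f"127.0.0.1:{host_port}:{container_port}",
        "-e", f"ASR_MODEL={model}",
        "-e", "ASR_ENGINE=faster_whisper",
        "--name", "whisper-asr",
        f"{image}@sha256:{digest}",   # pinned by digest instead of :latest
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the digest has been verified against a source you trust.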

Capability Analysis

Type: OpenClaw Skill
Name: douyin-transcriber
Version: 1.0.5

The skill bundle provides standard instructions for transcribing audio and video files using ffmpeg and a local Docker-based Whisper ASR service (onerahmet/openai-whisper-asr-webservice). All commands, including the ffmpeg parameters and curl requests to localhost, are consistent with the stated purpose of media transcription and do not exhibit any signs of malicious intent, data exfiltration, or prompt injection.

Capability Assessment

ℹ Purpose & Capability

The name and description (Douyin Transcriber using Docker Whisper ASR) match the SKILL.md workflow (ffmpeg -> Docker container ASR -> curl to localhost). However, the registry metadata claims no required binaries or env vars, while the instructions clearly require Docker and ffmpeg and recommend container env vars (ASR_MODEL/ASR_ENGINE). The metadata is therefore inconsistent with the instructions.

⚠ Instruction Scope

Instructions ask operators to run 'docker run' to pull and run an HTTP ASR service and to run ffmpeg locally and curl audio to localhost. They do not request unrelated system files or credentials, but they do (a) use an unspecified placeholder PORT, (b) assume ability to run Docker (which implies daemon/root access), and (c) direct pulling/execution of a remote image. The steps grant the container network/host-execution potential that isn't described in metadata.

⚠ Install Mechanism

There is no formal install spec (the skill is instruction-only), but the SKILL.md instructs pulling the Docker image 'onerahmet/openai-whisper-asr-webservice:latest' from Docker Hub. Pulling and running an unpinned, third-party image (latest tag, unknown maintainer) is higher risk because images can contain arbitrary code. The instructions give no guidance on pinning a digest, verifying the source, or running the container with reduced privileges.

ℹ Credentials

The skill does not request credentials or secret environment variables. It recommends container env vars for model selection (ASR_MODEL, ASR_ENGINE) which are non-sensitive. However, running Docker implies access to the Docker daemon (privileged), which can be used to access the host; that privilege is disproportionate relative to a metadata claim of 'no required binaries'.

✓ Persistence & Privilege

The skill is not marked always:true and has no install that forces persistent presence. It instructs running a container that exposes an HTTP port (user-controlled). The skill itself does not request elevated platform privileges beyond normal Docker usage, but the act of running arbitrary containers increases blast radius if the image is malicious.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install douyin-transcriber
After installation, invoke the skill by name or use /douyin-transcriber
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.5

- Added clear usage instructions and workflow for audio/video transcription using Docker Whisper ASR.
- Detailed prerequisite tools and installation steps.
- Included command examples for extracting audio, transcribing, specifying language, and parsing results.
- Provided table for model selection, supported formats, and troubleshooting common issues.
- Listed related modules for extended Douyin/TikTok workflows.

v1.0.4

- Added detailed usage instructions and quick start guide for transcribing media files with Docker Whisper ASR.
- Included prerequisites, installation steps, and workflow for extracting and transcribing audio/video.
- Provided model selection table and format support list.
- Added troubleshooting section for common issues.
- Linked related modules for an integrated workflow.

v1.0.3

- Improved documentation for setup and usage, including quick start instructions and example commands.
- Added details on model selection, supported formats, and configuration options.
- Clarified integration with Docker Whisper ASR and automatic audio extraction using ffmpeg.
- Listed related modules and expanded guidance for transcription workflows.

v1.0.2

- Updated documentation to improve clarity and provide a concise English overview.
- Added quick start guide and streamlined usage instructions.
- Listed supported audio and video formats explicitly.
- Provided model selection table and performance estimates.
- Summarized prerequisite tools and deployment steps.
- Removed redundant/obsolete information and improved configuration examples.

v1.0.1

- Added comprehensive documentation for skill features, usage, and configuration.
- Clarified support for both local Docker Whisper ASR and optional cloud APIs.
- Provided detailed setup and deployment instructions, including Docker commands.
- Included example commands for curl and Python usage.
- Listed dependencies, related modules, and expected transcription times.
- License information (MIT-0) clearly stated.

v2.0.0

- Major upgrade: modular architecture, browser DOM extraction, DASH support, Docker Whisper, structured output format, extended troubleshooting guide.

v1.0.0

Initial release - Audio transcription module with Whisper support

Metadata

Slug: douyin-transcriber
Version: 1.0.5
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 7

Frequently Asked Questions

What is Douyin Transcriber?

Transcribe speech from audio or video files, automatically extracting audio and converting to text using Docker Whisper ASR for Douyin/TikTok media. It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.

How do I install Douyin Transcriber?

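Run "/install douyin-transcriber" in the OpenClaw or Claude Code chat to install the skill in one step. Running transcriptions additionally requires the tools listed under Prerequisites (Docker, plus ffmpeg for local audio extraction).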

Is Douyin Transcriber free?

Yes, Douyin Transcriber is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Douyin Transcriber support?

Douyin Transcriber is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Douyin Transcriber?

It is built and maintained by Don Li (@don068589); the current version is v1.0.5.
