Description

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...

README (SKILL.md)

Douyin Transcribe - Video Transcription Suite

Name: Douyin Video Transcribe
Author: don068589

A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.

Version History

Version	Changes
2.0.0	Modular architecture, improved workflow, browser DOM extraction
1.0.0	Initial release, basic transcription

Architecture

```
User Input (Douyin Link/File)
        │
        ▼
┌─────────────────────────────────────────┐
│ Workflow Orchestrator                   │
├─────────────────────────────────────────┤
│ Step 1: Fetcher → Get video file        │
│ Step 2: Transcriber → Extract & convert │
│ Step 3: Analyzer → Structure output     │
│ Step 4: Output → Save results           │
└─────────────────────────────────────────┘
```

Core Features

Video Fetching: Browser-based DOM extraction for CDN URLs
Audio Extraction: ffmpeg-powered audio conversion
Speech-to-Text: Whisper ASR with multiple model options
Content Analysis: Auto-structured transcripts with key points
Multi-format Support: Video links, local files, image notes

Prerequisites

Tool	Purpose	Install
curl	Download files	Built-in (Windows: `curl.exe`)
ffmpeg	Audio extraction/merge	`winget install Gyan.FFmpeg`
Whisper	Transcription	`pip install openai-whisper` or Docker
Browser	Video extraction	OpenClaw profile required

Docker Whisper (Recommended):

```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```

Workflow

Step 0: Input Classification

Input Type	Detection	Action
Video link (/video/)	URL pattern	Full workflow
Image note (/note/)	URL pattern	Snapshot only
Local video file	File path	Start from Step 2
Text input	Plain text	Start from Step 3
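The classification table above can be sketched as a small routing function. This is only an illustration: the function name, the returned labels, and the `VIDEO_SUFFIXES` set are assumptions, since the source says only that local files are detected by file path.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of extensions treated as local video files (not from the source).
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".webm"}


def classify_input(text: str) -> str:
    """Return a label for the workflow entry point of a raw user input."""
    value = text.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        if "/video/" in parsed.path:
            return "full_workflow"    # video link: run Steps 1-4
        if "/note/" in parsed.path:
            return "snapshot_only"    # image note: no transcription
        return "unknown_url"          # e.g. a short link that still needs resolving
    if Path(value).suffix.lower() in VIDEO_SUFFIXES:
        return "start_at_step_2"      # local video file
    return "start_at_step_3"          # plain text
```

Short links such as `https://v.douyin.com/xxx/` fall into `unknown_url` here and would be classified again after Step 1.1 resolves them.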

Step 1: Fetch Video

1.1 Resolve Short URL

```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```

Output: `https://www.douyin.com/video/7616020798351871284`
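Step 1.2 needs the numeric video ID from the resolved URL. A minimal sketch of pulling it out, assuming the ID is the run of digits that follows `/video/` as in the example output (the helper name `extract_video_id` is hypothetical):

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the digits after /video/ in a resolved Douyin URL, or None."""
    path = urlparse(resolved_url).path  # drops any query string or fragment
    match = re.search(r"/video/(\d+)", path)
    return match.group(1) if match else None
```

A `None` result signals that the link did not resolve to a video page, for example because it points to an image note.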

1.2 Open Video Page

```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```

Wait 10-15 seconds for the page to load completely.
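Instead of one fixed wait, an orchestrating script could poll a few times with a delay between attempts. The helper below is a generic sketch, not part of the skill: `probe` stands for any hypothetical callable that returns the extracted video URL or `None`, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_found(
    probe: Callable[[], Optional[T]],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        result = probe()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no sleep after the final attempt
    return None
```

The `sleep` parameter is injectable so that the retry logic can be exercised without real delays.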

1.3 Extract Video URL (Browser DOM Method)

```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => { const entries = performance.getEntriesByType('resource'); const videoEntries = entries.filter(e => { const name = e.name.toLowerCase(); return name.includes('douyinvod') && (name.includes('.mp4') || name.includes('video')); }); if (videoEntries.length > 0) { const video = videoEntries[videoEntries.length - 1]; return { url: video.name, type: video.name.includes('.mp4') ? 'mp4' : 'dash' }; } return null; })()"
})
```

Important Notes:

The `act` action requires a nested `request` object with `kind` and `fn`
Wrong: `browser(action='act', fn='...')`
Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`
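The selection rule inside the evaluated JavaScript can be restated in Python, which makes it easy to test against a captured list of resource URLs. This sketch mirrors the filter shown above (last `douyinvod` entry whose name contains `.mp4` or `video`); the function name is hypothetical.

```python
from typing import Dict, List, Optional


def pick_video_url(resource_names: List[str]) -> Optional[Dict[str, str]]:
    """Pick the last douyinvod resource that looks like a video stream."""
    candidates = [
        name
        for name in resource_names
        if "douyinvod" in name.lower()
        and (".mp4" in name.lower() or "video" in name.lower())
    ]
    if not candidates:
        return None  # same as the JS returning null
    url = candidates[-1]  # most recently loaded matching resource
    return {"url": url, "type": "mp4" if ".mp4" in url else "dash"}
```

Taking the last match follows the original snippet, which reads `videoEntries[videoEntries.length - 1]`.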

1.4 Download Video

```bash
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "<CDN_URL>"
```

The Referer header is required; without it the CDN responds with 403.
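The same download can be sketched with Python's standard library. This is an illustrative equivalent of the curl command, not the skill's own code; the function names are hypothetical, and whether the CDN applies checks beyond the Referer header is not covered by the source.

```python
import shutil
import urllib.request

DOUYIN_REFERER = "https://www.douyin.com/"


def build_download_request(cdn_url: str) -> urllib.request.Request:
    """Build a GET request carrying the Referer header the CDN expects."""
    return urllib.request.Request(cdn_url, headers={"Referer": DOUYIN_REFERER})


def download_video(cdn_url: str, dest_path: str, timeout: float = 60.0) -> None:
    """Stream the response body to dest_path (redirects are followed by urlopen)."""
    request = build_download_request(cdn_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

A call such as `download_video(cdn_url, "video.mp4")` raises `urllib.error.HTTPError` on a 403, which is the cue to re-fetch the URL from the browser.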

Step 2: Transcribe Audio

2.1 Extract Audio

```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# For DASH videos (need merge)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```

Parameters:

-ar 16000: 16kHz sample rate (Whisper requirement)
-ac 1: Mono channel
-c:a pcm_s16le: 16-bit PCM
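When the extraction is driven from Python, passing ffmpeg its arguments as a list avoids shell quoting problems with unusual file names. A minimal sketch using the same flags as above (the function names are hypothetical, and `ffmpeg` is assumed to be on `PATH`):

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Return the ffmpeg argv for 16 kHz, mono, 16-bit PCM WAV output."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        wav_path,
        "-y",                  # overwrite an existing output file
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg without a shell; raises CalledProcessError on failure."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
```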

2.2 Transcribe with Docker Whisper

```bash
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```
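For scripts that cannot shell out to curl, the upload can be sketched with the standard library alone. The sketch assumes the service accepts the WAV in a multipart form field named `audio_file` and returns a UTF-8 text body; the helper names, the timeout, and the `endpoint` argument are illustrative.

```python
import urllib.request
import uuid
from typing import Tuple


def encode_audio_form(filename: str, audio_bytes: bytes, boundary: str = "") -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path: str, endpoint: str, timeout: float = 600.0) -> str:
    """POST the WAV file to the ASR endpoint and return the response text."""
    with open(wav_path, "rb") as handle:
        body, content_type = encode_audio_form("audio.wav", handle.read())
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The `endpoint` would be the same `/asr` URL used in the curl command, with the port the container publishes.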

2.3 Alternative: Local Whisper

```bash
python -m whisper audio.wav --model small --language zh
```

Model Selection:

Model	Size	5-min Video (CPU)	Accuracy	Use Case
tiny	75MB	~30s	Fair	Quick preview
base	142MB	~1min	Good	Daily use
small	466MB	~3min	Better	Recommended
medium	1.5GB	~8min	Best	High accuracy
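The table can be turned into a rough planning helper. The sketch below assumes CPU time scales linearly with video length, which the source does not state; the per-model figures are the table's 5-minute estimates, and the function names are hypothetical.

```python
from typing import Optional

# CPU seconds for a 5-minute video, taken from the table above.
CPU_SECONDS_PER_5_MIN = {"tiny": 30, "base": 60, "small": 180, "medium": 480}
MODEL_ORDER = ("tiny", "base", "small", "medium")  # smallest to largest


def estimate_cpu_seconds(model: str, video_minutes: float) -> float:
    """Linear extrapolation of the table's 5-minute estimate."""
    return CPU_SECONDS_PER_5_MIN[model] * video_minutes / 5.0


def largest_model_within(video_minutes: float, budget_seconds: float) -> Optional[str]:
    """Return the largest model whose estimate fits the time budget, if any."""
    fitting = [
        model for model in MODEL_ORDER
        if estimate_cpu_seconds(model, video_minutes) <= budget_seconds
    ]
    return fitting[-1] if fitting else None
```

Under this assumption a 10-minute video with `medium` comes to about 16 minutes of CPU time.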

Step 3: Analyze Content

The agent processes the transcript in four passes:

1. Fix transcription errors
   - Correct homophones
   - Fix speaker names
   - Remove filler words
2. Structure content
   - Add paragraph breaks
   - Create sections
3. Extract key points
   - Main ideas
   - Important quotes
4. Generate tags
   - 3-5 topic tags

Step 4: Save Output

Transcript Format

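```markdown
# {Title}

作者: {Author}
来源: 抖音
日期: {Date}
转录时间: {Transcription Date}

## 摘要

{Summary}

## 正文

{Transcript content with paragraphs}

## 要点

- {Key point 1}
- {Key point 2}
- {Key point 3}
```

The template labels are 作者 (author), 来源 (source, here 抖音, i.e. Douyin), 日期 (date), 转录时间 (transcription time), 摘要 (summary), 正文 (body), and 要点 (key points).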
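Filling the template can be sketched as a plain string builder. The heading levels (`#`, `##`) and the `- ` bullet style are assumptions about the original markdown, and the function name and parameters are hypothetical; only the Chinese labels come from the template itself.

```python
from typing import List


def render_transcript(
    title: str,
    author: str,
    date: str,
    transcribed_on: str,
    summary: str,
    body: str,
    key_points: List[str],
) -> str:
    """Render a transcript document using the template's labels."""
    lines = [
        f"# {title}",
        "",
        f"作者: {author}",              # author
        "来源: 抖音",                    # source: Douyin
        f"日期: {date}",                # date
        f"转录时间: {transcribed_on}",  # transcription time
        "",
        "## 摘要",                      # summary
        "",
        summary,
        "",
        "## 正文",                      # body
        "",
        body,
        "",
        "## 要点",                      # key points
        "",
    ]
    lines.extend(f"- {point}" for point in key_points)
    return "\n".join(lines) + "\n"
```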

Troubleshooting

Stage	Issue	Solution
Step 1	Short URL fails	Check link completeness, remove share text
Step 1	JS returns null	Wait 15-20s and retry, increase timeout
Step 1	Download 403	URL expired, re-fetch from browser
Step 1	DASH no audio	Merge with `ffmpeg -i video -i audio -c copy`
Step 2	ffmpeg not installed	`winget install Gyan.FFmpeg`
Step 2	Whisper service down	`docker start whisper-asr`
Step 2	Transcription slow	10-min video takes 15-20 min on CPU
Step 2	Poor quality	Use larger model (medium)
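For the "Short URL fails" row, the surrounding share text can be stripped before resolving the link. A minimal sketch, assuming short links have the form `https://v.douyin.com/<code>/` with a code made of letters, digits, `_`, or `-` (the exact alphabet is an assumption, and the function name is hypothetical):

```python
import re
from typing import Optional

# Assumed shape of a Douyin short link; the code alphabet is a guess.
SHORT_LINK = re.compile(r"https?://v\.douyin\.com/[A-Za-z0-9_-]+/?")


def extract_short_link(share_text: str) -> Optional[str]:
    """Return the first v.douyin.com link found in pasted share text."""
    match = SHORT_LINK.search(share_text)
    return match.group(0) if match else None
```

The returned link can then be passed to the curl command in Step 1.1.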

Image Note Handling

Image notes (/note/) don't need transcription:

```
browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
browser(action='snapshot')
Extract content from snapshot
Save to output directory
```

Edge Cases

Article links (/article/): Use browser snapshot, no transcription
Douyin AI summary: Extract from page as supplement
Other platforms: Use yt-dlp for YouTube/Bilibili
Live streams: Not supported

Related Modules

This skill can be extended with standalone modules:

Module	Purpose
douyin-fetcher	Video fetching only
douyin-transcriber	Audio transcription only
douyin-analyzer	Content analysis only
douyin-orchestrator	Workflow coordination

License

MIT-0 License - Free to use, modify, and redistribute.

Usage Guidance

This skill generally implements Douyin -> audio -> Whisper transcribe as advertised, but there are a few red flags to consider before installing or running it:

- It reads config files in your home directory (~/.openclaw/skills/douyin-config.json and ~/.openclaw/config.json) to find API keys and paths. Inspect those files first; the skill may access keys you did not intend to expose.
- The skill can fall back to cloud ASR providers if keys are present. Only provide API keys for providers you trust, and prefer explicit configuration instead of leaving keys in global config files.
- If using the local Whisper path, the code will pull/run the Docker image onerahmet/openai-whisper-asr-webservice:latest. Treat that image as untrusted code: review its Docker Hub page or run it in a sandboxed environment (VM/container) first.
- The skill creates a Docker container named 'whisper-asr' and writes temporary files. Clean up containers/files after use if you are concerned about persistence.

Recommended actions: review the included Python files, inspect any ~/.openclaw config for sensitive data, run the skill in an isolated environment if you will allow it to pull/run the Docker image, or modify the code to avoid loading home config files (or to require explicit credentials via a separate declared config) before use.
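One way to inspect those config files without echoing secrets is to list only the names of fields that look sensitive. The sketch below is an assumption-laden aid, not part of the skill: the hint words, function names, and output format are illustrative, and the two paths are the ones named above.

```python
import json
from pathlib import Path
from typing import Any, List

CONFIG_PATHS = [
    Path.home() / ".openclaw" / "skills" / "douyin-config.json",
    Path.home() / ".openclaw" / "config.json",
]
# Assumed substrings that mark a field name as sensitive.
SENSITIVE_HINTS = ("key", "token", "secret", "password")


def find_sensitive_fields(data: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of fields whose names look sensitive (values are never returned)."""
    found: List[str] = []
    if isinstance(data, dict):
        for name, value in data.items():
            path = f"{prefix}.{name}" if prefix else str(name)
            if any(hint in str(name).lower() for hint in SENSITIVE_HINTS):
                found.append(path)
            found.extend(find_sensitive_fields(value, path))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            found.extend(find_sensitive_fields(item, f"{prefix}[{index}]"))
    return found


def audit_configs() -> None:
    """Print which sensitive-looking fields each config file contains."""
    for path in CONFIG_PATHS:
        if not path.is_file():
            print(f"{path}: not present")
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            print(f"{path}: could not be read ({exc})")
            continue
        for field in find_sensitive_fields(data):
            print(f"{path}: {field}")
```

Calling `audit_configs()` prints field names only, so the output is safe to paste into a bug report or review thread.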

Capability Analysis

Type: OpenClaw Skill
Name: douyin-video-transcribe
Version: 2.0.0

The skill bundle is a legitimate tool for transcribing Douyin videos using browser automation, ffmpeg, and Whisper ASR. It utilizes system utilities (curl, ffmpeg, docker) in a manner strictly aligned with its stated purpose of video fetching and audio processing. The Python scripts (transcriber.py, whisper_local.py) follow safe coding practices, such as using subprocess.run with argument lists to mitigate shell injection risks, and no evidence of data exfiltration, persistence, or malicious prompt injection was found.

Capability Assessment

ℹ Purpose & Capability

The stated purpose (fetch Douyin videos, extract audio, transcribe with Whisper) matches the instructions and included code. However, the Python code also supports two cloud ASR backends (named sili_flow_api and dashscope_api) and tries to load API keys from user config files. Those cloud fallbacks are not declared in the skill metadata or required env vars. Supporting remote ASR providers is plausible, but omitting any mention of the credentials or config they need is an inconsistency between the metadata and the code.

⚠ Instruction Scope

SKILL.md instructs the agent to use a browser DOM extraction and to run curl/ffmpeg/docker/whisper — all reasonable for this task. But the shipped code reads configuration files from the user's home (~/.openclaw/skills/douyin-config.json and ~/.openclaw/config.json) to find API keys and temp paths. The README does not require or show these config files; the code will therefore access user home config silently, which broadens the skill's runtime scope beyond what the SKILL.md declares.

ℹ Install Mechanism

There is no install spec (instruction-only + included scripts). The code will start or create a Docker container using the image onerahmet/openai-whisper-asr-webservice:latest if the local Whisper path is used. Pulling and running an external Docker image is expected for running Whisper but carries extra risk because the image is from an individual namespace (not an official vendor) and will execute third‑party code on the host.

⚠ Credentials

The registry metadata declares no required env vars or credentials, but the code attempts to read API keys (sili_flow_api_key, dashscope_api_key) from user config files. It also reads a fallback ~/.openclaw/config.json which could contain unrelated settings or secrets. Requesting no credentials in metadata while reading user config for keys is disproportionate and surprising.

ℹ Persistence & Privilege

The skill does not request always:true and does not alter other skills. It will create a Docker container named 'whisper-asr' (persistent on the host) and write temporary files to a configured temp directory (default is a path under /path/to/temp/douyin or overridden in config). Those are reasonable for this functionality but represent persistent artifacts the user should be aware of.

Version History

v2.0.0

v2.0.0 - Major upgrade: Modular architecture, browser DOM extraction, DASH support, Docker Whisper, structured output format, extended troubleshooting guide

v1.0.0

Initial release: Extract audio from Douyin videos and transcribe using Whisper. Cross-platform support.

Metadata

Slug douyin-video-transcribe

Version 2.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 2

Frequently Asked Questions

What is Douyin Video Transcribe?

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi... It is an AI Agent Skill for Claude Code / OpenClaw, with 437 downloads so far.

How do I install Douyin Video Transcribe?

Run "/install douyin-video-transcribe" in the OpenClaw or Claude Code chat to install it in one step. The tools listed under Prerequisites (curl, ffmpeg, Whisper, and a browser profile) still need to be set up separately.

Is Douyin Video Transcribe free?

Yes, Douyin Video Transcribe is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Douyin Video Transcribe support?

Douyin Video Transcribe is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Douyin Video Transcribe?

It is built and maintained by Don Li (@don068589); the current version is v2.0.0.
