don068589

Douyin Video Transcribe

by Don Li · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
437 Downloads · 0 Stars · 1 Active Installs · 2 Versions
Install in OpenClaw
/install douyin-video-transcribe
Description
Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...
README (SKILL.md)

Douyin Transcribe - Video Transcription Suite

A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.

Version History

| Version | Changes |
|---------|---------|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |

Architecture

```
User Input (Douyin Link/File)
         │
         ▼
┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│  Step 1: Fetcher → Get video file       │
│  Step 2: Transcriber → Extract & convert│
│  Step 3: Analyzer → Structure output    │
│  Step 4: Output → Save results          │
└─────────────────────────────────────────┘
```

Core Features

  • Video Fetching: Browser-based DOM extraction for CDN URLs
  • Audio Extraction: ffmpeg-powered audio conversion
  • Speech-to-Text: Whisper ASR with multiple model options
  • Content Analysis: Auto-structured transcripts with key points
  • Multi-format Support: Video links, local files, image notes

Prerequisites

| Tool | Purpose | Install |
|------|---------|---------|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |

Docker Whisper (Recommended):

```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```

Workflow

Step 0: Input Classification

| Input Type | Detection | Action |
|------------|-----------|--------|
| Video link (/video/) | URL pattern | Full workflow |
| Image note (/note/) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
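
The classification table above can be sketched as a small routing function. This is only an illustration: the return labels, the `VIDEO_EXTENSIONS` tuple, and the choice to send any other URL (such as an unresolved short link) through the full workflow are assumptions, not part of the skill's documented code.

```python
from urllib.parse import urlparse

# Hypothetical labels for the workflow entry points in the table above.
FULL_WORKFLOW = "full_workflow"
SNAPSHOT_ONLY = "snapshot_only"
FROM_STEP_2 = "start_from_step_2"
FROM_STEP_3 = "start_from_step_3"

# Assumed set of local video extensions; extend as needed.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".mkv", ".webm")


def classify_input(text: str) -> str:
    """Map raw user input to a workflow entry point."""
    value = text.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        if "/note/" in parsed.path:
            return SNAPSHOT_ONLY
        # /video/ links, and (by assumption) any other URL such as a
        # short link that still needs resolving, take the full workflow.
        return FULL_WORKFLOW
    if value.lower().endswith(VIDEO_EXTENSIONS):
        return FROM_STEP_2
    # Anything else is treated as plain text for the analyzer.
    return FROM_STEP_3
```

Checking `/note/` before falling back to the full workflow keeps image notes from being sent through the download and transcription steps they do not need.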

Step 1: Fetch Video

1.1 Resolve Short URL

```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```

Output: `https://www.douyin.com/video/7616020798351871284`
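
Step 1.2 needs the numeric video ID from the resolved URL. A minimal sketch of that extraction follows; the helper name `extract_video_id` is hypothetical and the pattern assumes the `/video/<digits>` form shown in the output above.

```python
import re
from typing import Optional

# Matches the numeric ID in URLs like https://www.douyin.com/video/7616020798351871284
_VIDEO_ID = re.compile(r"/video/(\d+)")


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the video ID from a resolved Douyin URL, or None if absent."""
    match = _VIDEO_ID.search(resolved_url)
    return match.group(1) if match else None
```

A `None` result usually means the link resolved to something other than a video page (for example an image note), which the input classification step should route elsewhere.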

1.2 Open Video Page

```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```

Wait 10-15 seconds for the page to finish loading.

1.3 Extract Video URL (Browser DOM Method)

```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => { const entries = performance.getEntriesByType('resource'); const videoEntries = entries.filter(e => { const name = e.name.toLowerCase(); return name.includes('douyinvod') && (name.includes('.mp4') || name.includes('video')); }); if (videoEntries.length > 0) { const video = videoEntries[videoEntries.length - 1]; return { url: video.name, type: video.name.includes('.mp4') ? 'mp4' : 'dash' }; } return null; })()"
})
```

Important Notes:

  • The `act` action requires a nested `request` object with `kind` and `fn`
  • Wrong: `browser(action='act', fn='...')`
  • Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`

1.4 Download Video

```bash
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "<CDN_URL>"
```

The Referer header is required; without it the request fails with HTTP 403.
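
For callers that prefer Python over curl, the same download can be sketched with the standard library. The function names are hypothetical; the only detail taken from the text above is the `Referer: https://www.douyin.com/` header.

```python
import shutil
import urllib.request


def build_download_request(cdn_url: str) -> urllib.request.Request:
    """Attach the Referer header that the download step above requires."""
    return urllib.request.Request(
        cdn_url, headers={"Referer": "https://www.douyin.com/"}
    )


def download_video(cdn_url: str, dest: str = "video.mp4") -> str:
    """Stream the CDN response to disk; urlopen follows redirects by default."""
    request = build_download_request(cdn_url)
    # urlopen raises urllib.error.HTTPError on a 403 (e.g. an expired URL).
    with urllib.request.urlopen(request, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Catching `urllib.error.HTTPError` with code 403 around `download_video` is a natural place to trigger a re-fetch of the CDN URL from the browser.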

Step 2: Transcribe Audio

2.1 Extract Audio

```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# For DASH videos (need merge)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```

Parameters:

  • -ar 16000: 16 kHz sample rate (the rate Whisper models expect)
  • -ac 1: Mono channel
  • -c:a pcm_s16le: 16-bit PCM
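
The MP4 extraction command and its parameters can be wrapped as follows. This is a sketch, not the skill's own `transcriber.py`; the function names are hypothetical. It passes an argument list to `subprocess.run` rather than a shell string, which avoids quoting problems with unusual file names.

```python
import subprocess
from typing import List


def ffmpeg_wav_args(src: str, dest: str = "audio.wav") -> List[str]:
    """Build the ffmpeg argument list for 16 kHz mono 16-bit PCM output."""
    return [
        "ffmpeg",
        "-i", src,
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        dest,
        "-y",                 # overwrite an existing output file
    ]


def extract_audio(src: str, dest: str = "audio.wav") -> str:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(ffmpeg_wav_args(src, dest), check=True)
    return dest
```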

2.2 Transcribe with Docker Whisper

```bash
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```

2.3 Alternative: Local Whisper

```bash
python -m whisper audio.wav --model small --language zh
```

Model Selection:

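| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|-------|------|-------------------|----------|----------|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |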
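
One way to apply the table is to pick the largest model whose estimated run time fits a time budget. The sketch below assumes CPU time scales linearly with video length, which the table does not state; the helper names are hypothetical and the timings are the approximate figures above.

```python
# (model, approximate CPU seconds for a 5-minute video), smallest to largest,
# taken from the table above.
MODEL_TIMES = [("tiny", 30), ("base", 60), ("small", 180), ("medium", 480)]


def estimate_seconds(model: str, video_minutes: float) -> float:
    """Rough estimate, assuming run time grows linearly with video length."""
    return dict(MODEL_TIMES)[model] * video_minutes / 5.0


def pick_model(video_minutes: float, budget_seconds: float) -> str:
    """Return the largest model that fits the budget, falling back to tiny."""
    best = "tiny"
    for name, _ in MODEL_TIMES:
        if estimate_seconds(name, video_minutes) <= budget_seconds:
            best = name
    return best
```

For example, `pick_model(5, 200)` selects `small`, since its estimate of 180 seconds fits the budget while `medium` at 480 seconds does not.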

Step 3: Analyze Content

The agent processes the transcript in four passes:

  1. Fix transcription errors
    • Correct homophones
    • Fix speaker names
    • Remove filler words
  2. Structure content
    • Add paragraph breaks
    • Create sections
  3. Extract key points
    • Main ideas
    • Important quotes
  4. Generate tags
    • 3-5 topic tags

Step 4: Save Output

Transcript Format

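```markdown
{Title}

作者: {Author} 来源: 抖音 日期: {Date} 转录时间: {Transcription Date}

摘要

{Summary}

正文

{Transcript content with paragraphs}

要点

  • {Key point 1}
  • {Key point 2}
  • {Key point 3}

标签

#{tag1} #{tag2} #{tag3}
```

Field labels: 作者 = author, 来源 = source (抖音 = Douyin), 日期 = date, 转录时间 = transcription time, 摘要 = summary, 正文 = body, 要点 = key points, 标签 = tags.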

File Naming Convention

```
{VIDEO_ID}-抖音转录.md
```
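
The transcript format and naming convention can be sketched as a pair of helpers. The function names and parameters are hypothetical. Any heading markup the original template may have carried is not visible in the listing, so this sketch emits the section labels as plain lines and uses `- ` for key-point bullets as an assumption.

```python
from datetime import date
from typing import List, Optional


def transcript_filename(video_id: str) -> str:
    """Apply the {VIDEO_ID}-抖音转录.md naming convention."""
    return f"{video_id}-抖音转录.md"


def render_transcript(
    title: str,
    author: str,
    published: str,
    summary: str,
    body: str,
    key_points: List[str],
    tags: List[str],
    transcribed: Optional[str] = None,
) -> str:
    """Fill the transcript template; section labels follow the format above."""
    transcribed = transcribed or date.today().isoformat()
    lines = [
        title, "",
        f"作者: {author} 来源: 抖音 日期: {published} 转录时间: {transcribed}", "",
        "摘要", "", summary, "",
        "正文", "", body, "",
        "要点", "",
        *[f"- {point}" for point in key_points], "",
        "标签", "",
        " ".join(f"#{tag}" for tag in tags),
    ]
    return "\n".join(lines) + "\n"
```

Writing the result is then a matter of `open(transcript_filename(video_id), "w", encoding="utf-8")`, with UTF-8 specified explicitly because the file name and labels contain Chinese characters.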

Troubleshooting

| Stage | Issue | Solution |
|-------|-------|----------|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
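
The "JS returns null" remedy (wait, then retry) can be expressed as a small generic helper. This is a sketch under assumptions: `retry_until_value` and its parameters are hypothetical, and `fetch` stands for whatever callable performs the browser evaluation in Step 1.3.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_value(
    fetch: Callable[[], Optional[T]],
    attempts: int = 3,
    wait_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Give the page time to load more resources before the next try.
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return None
```

The `sleep` parameter is injectable so the helper can be exercised without real delays.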

Image Note Handling

Image notes (/note/) don't need transcription:

```
1. browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
2. browser(action='snapshot')
3. Extract content from snapshot
4. Save to output directory
```

Edge Cases

  • Article links (/article/): Use browser snapshot, no transcription
  • Douyin AI summary: Extract from page as supplement
  • Other platforms: Use yt-dlp for YouTube/Bilibili
  • Live streams: Not supported

Related Modules

This skill can be extended with standalone modules:

| Module | Purpose |
|--------|---------|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |

License

MIT-0 License - Free to use, modify, and redistribute.

Usage Guidance
This skill generally implements the Douyin -> audio -> Whisper transcription pipeline as advertised, but there are a few red flags to consider before installing or running it:

  • It reads config files in your home directory (~/.openclaw/skills/douyin-config.json and ~/.openclaw/config.json) to find API keys and paths. Inspect those files first: the skill may access keys you did not intend to expose.
  • The skill can fall back to cloud ASR providers if keys are present. Only provide API keys for providers you trust, and prefer explicit configuration instead of leaving keys in global config files.
  • If using the local Whisper path, the code will pull/run the Docker image onerahmet/openai-whisper-asr-webservice:latest. Treat that image as untrusted code: review its Docker Hub page or run it in a sandboxed environment (VM/container) first.
  • The skill creates a Docker container named 'whisper-asr' and writes temporary files. Clean up containers/files after use if you are concerned about persistence.

Recommended actions: review the included Python files, inspect any ~/.openclaw config for sensitive data, run the skill in an isolated environment if you will allow it to pull/run the Docker image, or modify the code to avoid loading home config files (or to require explicit credentials via a separate declared config) before use.
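
To act on the advice about inspecting the config files, a short audit script can list which entries look like credentials. This is a sketch: the two paths come from the guidance above, while the `SENSITIVE_HINTS` heuristic and the function names are assumptions. It reports key names only and never prints values.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Config files named in the guidance above.
CONFIG_PATHS = [
    Path.home() / ".openclaw" / "skills" / "douyin-config.json",
    Path.home() / ".openclaw" / "config.json",
]

# Assumed heuristic: key names containing these substrings may hold secrets.
SENSITIVE_HINTS = ("key", "token", "secret", "password")


def find_sensitive_keys(data: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of keys whose names suggest a credential."""
    found: List[str] = []
    if isinstance(data, dict):
        for name, value in data.items():
            path = f"{prefix}.{name}" if prefix else str(name)
            if any(hint in str(name).lower() for hint in SENSITIVE_HINTS):
                found.append(path)
            found.extend(find_sensitive_keys(value, path))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            found.extend(find_sensitive_keys(item, f"{prefix}[{index}]"))
    return found


def audit_configs(paths: List[Path] = CONFIG_PATHS) -> Dict[str, List[str]]:
    """Scan each existing config file and report suspicious key names."""
    report: Dict[str, List[str]] = {}
    for path in paths:
        if not path.is_file():
            continue
        try:
            report[str(path)] = find_sensitive_keys(
                json.loads(path.read_text(encoding="utf-8"))
            )
        except json.JSONDecodeError:
            report[str(path)] = ["<not valid JSON>"]
    return report
```

Running `audit_configs()` before installing shows at a glance whether either file holds entries such as `sili_flow_api_key` or `dashscope_api_key` that the skill would pick up.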
Capability Analysis
Type: OpenClaw Skill
Name: douyin-video-transcribe
Version: 2.0.0

The skill bundle is a legitimate tool for transcribing Douyin videos using browser automation, ffmpeg, and Whisper ASR. It utilizes system utilities (curl, ffmpeg, docker) in a manner strictly aligned with its stated purpose of video fetching and audio processing. The Python scripts (transcriber.py, whisper_local.py) follow safe coding practices, such as using subprocess.run with argument lists to mitigate shell injection risks, and no evidence of data exfiltration, persistence, or malicious prompt injection was found.
Capability Assessment
Purpose & Capability
The stated purpose (fetch Douyin videos, extract audio, transcribe with Whisper) matches the instructions and included code. However, the Python code also supports two cloud ASR backends (named sili_flow_api and dashscope_api) and tries to load API keys from user config files. Those cloud fallbacks are not declared in the skill metadata or required env vars. Supporting remote ASR providers is plausible, but omitting any mention of the required credentials or config is an inconsistency.
Instruction Scope
SKILL.md instructs the agent to use a browser DOM extraction and to run curl/ffmpeg/docker/whisper — all reasonable for this task. But the shipped code reads configuration files from the user's home (~/.openclaw/skills/douyin-config.json and ~/.openclaw/config.json) to find API keys and temp paths. The README does not require or show these config files; the code will therefore access user home config silently, which broadens the skill's runtime scope beyond what the SKILL.md declares.
Install Mechanism
There is no install spec (instruction-only + included scripts). The code will start or create a Docker container using the image onerahmet/openai-whisper-asr-webservice:latest if the local Whisper path is used. Pulling and running an external Docker image is expected for running Whisper but carries extra risk because the image is from an individual namespace (not an official vendor) and will execute third‑party code on the host.
Credentials
The registry metadata declares no required env vars or credentials, but the code attempts to read API keys (sili_flow_api_key, dashscope_api_key) from user config files. It also reads a fallback ~/.openclaw/config.json which could contain unrelated settings or secrets. Requesting no credentials in metadata while reading user config for keys is disproportionate and surprising.
Persistence & Privilege
The skill does not request always:true and does not alter other skills. It will create a Docker container named 'whisper-asr' (persistent on the host) and write temporary files to a configured temp directory (default is a path under /path/to/temp/douyin or overridden in config). Those are reasonable for this functionality but represent persistent artifacts the user should be aware of.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install douyin-video-transcribe
  3. After installation, invoke the skill by name or use /douyin-video-transcribe
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
Major upgrade: modular architecture, browser DOM extraction, DASH support, Docker Whisper, structured output format, extended troubleshooting guide.
v1.0.0
Initial release: Extract audio from Douyin videos and transcribe using Whisper. Cross-platform support.
Metadata
Slug douyin-video-transcribe
Version 2.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 2
Frequently Asked Questions

What is Douyin Video Transcribe?

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi... It is an AI Agent Skill for Claude Code / OpenClaw, with 437 downloads so far.

How do I install Douyin Video Transcribe?

Run "/install douyin-video-transcribe" in the OpenClaw or Claude Code chat to install it in one step. The tools listed under Prerequisites (ffmpeg, Whisper, and a browser profile) must be set up separately.

Is Douyin Video Transcribe free?

Yes, Douyin Video Transcribe is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Douyin Video Transcribe support?

Douyin Video Transcribe is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Douyin Video Transcribe?

It is built and maintained by Don Li (@don068589); the current version is v2.0.0.
