Web Video Transcribe DOCX

by c-narcissus · GitHub ↗ · v1.0.2 · MIT-0
cross-platform ✓ Security Clean
Downloads 79
Stars 0
Active Installs 0
Versions 1
Install in OpenClaw
/install web-video-transcribe-docx
Description
Offline-first workflow for turning Chinese web page video or audio into text and Word deliverables. Use when Codex needs to (1) extract playable media stream...
Usage Guidance
The skill's behavior matches its stated purpose, but running it will (1) download a large ASR model from GitHub into a user cache on first run, (2) pip-install Python packages if you run scripts/bootstrap_env.py, and (3) use Playwright and a local Chrome/Edge to extract media from pages when needed. These actions require network access and write files to your user cache and to the output directory you choose. Before installing or running it: review the bootstrap step, run initial tests on non-sensitive, public pages, consider an isolated environment (virtualenv or container), and inspect the small agents/openai.yaml file to confirm there are no hidden external endpoints or API keys. The skill explicitly avoids capturing cookies and tokens, and it applies a safe-path check when extracting the model archive.
Capability Analysis
Type: OpenClaw Skill
Name: web-video-transcribe-docx
Version: 1.0.2
The skill provides a legitimate and well-documented workflow for extracting media from web pages, performing offline transcription using SenseVoice (via sherpa-onnx), and generating DOCX files. The scripts (e.g., `extract_web_media.py`, `transcribe_sensevoice.py`) align perfectly with the stated purpose. Security-wise, the code includes a path traversal check in `pipeline_common.py` during model extraction, and `SKILL.md` explicitly instructs the agent not to exfiltrate sensitive data like cookies or credentials. All external network calls are directed to user-provided URLs or the official GitHub repository for the ASR models.
Capability Assessment
Purpose & Capability
The skill's name and description (extract web media, download streams, run local SenseVoice ASR, produce TXT/DOCX) align with the included scripts. Declared runtime requirement (python) matches the Python scripts. Required actions such as browser automation, media downloading, ffmpeg usage, and model download are expected for this functionality.
Instruction Scope
SKILL.md and the scripts confine behavior to extracting media URLs from pages, downloading media, running local ASR, and producing DOCX. The runtime instructions explicitly state not to request/store cookies or tokens and not to bypass DRM or logins. Extractors capture request headers but sanitize them (only keeping Referer/Origin), and pipelines only operate on user-supplied or page-extracted URLs.
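The header sanitization described above can be illustrated with a small allow-list filter. This is a minimal sketch, not the skill's actual code: the function name `sanitize_headers` and the constant `ALLOWED_HEADERS` are hypothetical, and the only detail taken from the assessment is that Referer and Origin are the headers kept.

```python
from typing import Mapping

# Assumption: the allow-list holds exactly the two headers the assessment names.
ALLOWED_HEADERS = {"referer", "origin"}


def sanitize_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return a copy holding only allow-listed headers (case-insensitive match)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in ALLOWED_HEADERS
    }


# Example request headers as a browser might send them (illustrative values).
captured = {
    "Referer": "https://example.com/video/123",
    "Origin": "https://example.com",
    "Cookie": "session=secret",
    "Authorization": "Bearer secret",
    "User-Agent": "Mozilla/5.0",
}
clean = sanitize_headers(captured)
```

An allow-list is the conservative choice here: any header not explicitly named, including ones that carry credentials, is dropped by default.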
Install Mechanism
No marketplace install spec is present (instruction-only), but the bootstrap script will pip-install several Python packages and can run Playwright's browser installer if invoked. The SenseVoice model is downloaded from a GitHub releases URL and extracted with a safe-path check; these behaviors are appropriate for the task but do involve network downloads and installing Python packages on first run, which is expected but worth noting.
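A safe-path check of the kind mentioned above usually resolves each archive member against the destination directory before extracting anything. The sketch below shows one common way to do that with the standard library. It is an assumption about the general technique, not a copy of `pipeline_common.py`; the function name `safe_extract` is hypothetical, and rejecting link members is an extra conservative choice made for this sketch.

```python
import tarfile
from pathlib import Path


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive only if every member stays inside dest_dir."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Resolve the final location and reject anything outside dest,
            # which catches names such as "../evil.txt" and absolute paths.
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):  # Python 3.9+
                raise ValueError(f"Unsafe path in archive: {member.name}")
            # Sketch-level assumption: refuse links, since a link can point
            # outside dest even when its own name looks harmless.
            if member.issym() or member.islnk():
                raise ValueError(f"Link not allowed in archive: {member.name}")
        # All members were validated, so extraction can proceed.
        tar.extractall(dest)
```

Validating every member before extracting any of them means a malicious archive leaves nothing behind on disk.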
Credentials
The skill requests no environment variables or external credentials. It does look for a local Chrome/Edge executable and writes cache/model files to a per-user cache directory. The only remote endpoint used for code operation is a GitHub releases URL to download the ASR model (appropriate).
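For readers who want to see what "looks for a local Chrome/Edge executable" and "per-user cache directory" can mean in practice, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the candidate executable names, the function names `find_browser` and `user_cache_dir`, and the cache locations are common conventions, not details confirmed by the skill's source.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional

# Hypothetical executable names to probe on PATH; the skill's real list may differ.
CANDIDATE_NAMES = ["google-chrome", "chrome", "msedge", "microsoft-edge"]


def find_browser() -> Optional[str]:
    """Return the first Chrome/Edge executable found on PATH, or None."""
    for name in CANDIDATE_NAMES:
        path = shutil.which(name)
        if path:
            return path
    return None


def user_cache_dir(app_name: str) -> Path:
    """Return a conventional per-user cache directory for app_name."""
    if sys.platform == "win32":
        base = Path(os.environ.get("LOCALAPPDATA", Path.home() / "AppData" / "Local"))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Caches"
    else:
        base = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache"))
    return base / app_name
```

Calling `user_cache_dir("web-video-transcribe-docx")` would give a directory under the current user's cache root, which keeps downloaded model files out of system-wide locations.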
Persistence & Privilege
The skill is not 'always: true' and does not claim to modify other skills or global agent settings. It writes files to its own cache and output directories and installs packages only when the bootstrap script is run; this level of presence is appropriate for the described offline transcription workflow.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install web-video-transcribe-docx
  3. After installation, invoke the skill by name or use /web-video-transcribe-docx
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
Initial release of web-video-transcribe-docx.
- Provides an offline-first workflow for converting Chinese web page video/audio into text and Word documents.
- Supports extraction of media streams (video/audio) from web pages, including MP4, M3U8, MPD, and split audio streams.
- Includes pipelines and scripts for media extraction, download (with custom headers), offline transcription using SenseVoice ASR, and rendering to TXT/DOCX.
- Special handling for Toutiao pages alongside generic web sources.
- Raw and refined transcripts are managed separately for accuracy and auditability.
- Stays within ethical and legal boundaries (no DRM bypass, no credential handling, no unauthorized downloads).
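The stream formats named in the release notes (MP4, M3U8, MPD) can be told apart by the URL path suffix before choosing a download strategy. The sketch below is one simple way to do that. The function name `classify_media_url` and the labels it returns are hypothetical, and the skill's own extractor may rely on other signals such as response content types.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical labels: M3U8 playlists are HLS, MPD manifests are MPEG-DASH.
KIND_BY_SUFFIX = {
    ".mp4": "mp4",
    ".m3u8": "hls",
    ".mpd": "dash",
}


def classify_media_url(url: str) -> str:
    """Classify a media URL by its path suffix, ignoring the query string."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return KIND_BY_SUFFIX.get(suffix, "unknown")
```

Parsing the URL first matters because signed media URLs often carry long query strings that would otherwise hide the file extension.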
Metadata
Slug web-video-transcribe-docx
Version 1.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Web Video Transcribe DOCX?

Offline-first workflow for turning Chinese web page video or audio into text and Word deliverables. Use when Codex needs to (1) extract playable media stream... It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.

How do I install Web Video Transcribe DOCX?

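Run "/install web-video-transcribe-docx" in the OpenClaw or Claude Code chat to install it in one step. Per the Usage Guidance, the first run also downloads the ASR model, and scripts/bootstrap_env.py pip-installs Python packages if you run it.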

Is Web Video Transcribe DOCX free?

Yes, Web Video Transcribe DOCX is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Web Video Transcribe DOCX support?

Web Video Transcribe DOCX is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Web Video Transcribe DOCX?

It is built and maintained by c-narcissus (@c-narcissus); the current version is v1.0.2.
