tridefender

Glmv Caption Tunnel

Author: TriDefender · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 105 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw:
/install glmv-caption-tunnel
Description
Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to descr...
Usage Instructions (SKILL.md)

GLM-V Caption Skill

Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.

When to Use

  • Describe, caption, summarize, or interpret image/video/document content
  • User mentions "describe this image", "caption", "summarize this video", "图片描述", "视频摘要", "文档解读", "看图说话"
  • Extract visual or textual information from media files
  • Compare multiple images
  • User provides an image/video/file and asks what's in it

Supported Input Types

| Type | Formats | Max Size | Max Count | Base64 | Local Path |
|---|---|---|---|---|---|
| Image | jpg, png, jpeg | 5MB / 6000×6000px | 50 | | ✅ (→base64) |
| Video | mp4, mkv, mov | 200MB | | | ✅ (→tunnel) |
| File | pdf, docx, txt, xlsx, pptx, jsonl | | 50 | | ✅ (→tunnel) |

⚠️ file_url cannot mix with image_url or video_url in the same request.
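
The mixing restriction can be enforced before any request is sent. The following is a minimal sketch, not the skill's actual code: the helper name `build_content` is hypothetical, and the OpenAI-style content-part shape (`{"type": ..., <type>: {"url": ...}}`) is an assumption about the request payload.

```python
def build_content(prompt, images=(), videos=(), files=()):
    """Build the content list for one user message (assumed payload shape).

    Raises ValueError if file_url parts would be mixed with image/video parts.
    """
    if files and (images or videos):
        raise ValueError("file_url cannot be mixed with image_url or video_url")
    content = []
    groups = (("image_url", images), ("video_url", videos), ("file_url", files))
    for kind, urls in groups:
        for url in urls:
            # Each media item becomes one typed part, e.g. {"type": "image_url", ...}
            content.append({"type": kind, kind: {"url": url}})
    # The text prompt goes last, after all media parts
    content.append({"type": "text", "text": prompt})
    return content
```

Rejecting the combination locally gives the user a clear message instead of an API-side error.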

How Local Paths Work for Videos & Files

The GLM-V API requires public HTTPS URLs for videos and files. When you provide a local path, the script automatically:

  1. Creates a temporary directory and symlinks/copies the file into it (no other files are exposed)
  2. Starts a local Python HTTP server on a random port
  3. Launches cloudflared tunnel (trycloudflare.com quick tunnel) to expose the server to the public internet
  4. Uses the resulting HTTPS URL for the API call
  5. Cleans up (kills cloudflared, shuts down HTTP server, removes temp dir) when done
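
The five steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in scripts/tunnel_server.py: the function names are hypothetical, the file is copied rather than symlinked, and the tunnel is represented only by the command line that would be launched (`cloudflared tunnel --url ...` is the standard quick-tunnel invocation).

```python
import functools
import http.server
import shutil
import tempfile
import threading
from pathlib import Path


def stage_file(src: str) -> Path:
    """Step 1: copy one file into a fresh temp dir so nothing else is exposed."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="glmv-tunnel-"))
    shutil.copy2(src, tmp_dir / Path(src).name)
    return tmp_dir


def start_server(directory: Path) -> http.server.ThreadingHTTPServer:
    """Step 2: serve the directory on loopback; port 0 lets the OS pick a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def tunnel_command(port: int) -> list[str]:
    """Step 3: quick-tunnel command line; the public URL appears in cloudflared's log."""
    return ["cloudflared", "tunnel", "--url", f"http://127.0.0.1:{port}"]


def cleanup(server: http.server.ThreadingHTTPServer, tmp_dir: Path) -> None:
    """Step 5: stop the HTTP server and delete the staged copy."""
    server.shutdown()
    server.server_close()
    shutil.rmtree(tmp_dir, ignore_errors=True)
```

A caller would pass `tunnel_command(server.server_address[1])` to `subprocess.Popen`, read the trycloudflare.com URL from the process output, make the API call, then terminate the process and call `cleanup`.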

Prerequisite: cloudflared must be installed. Get it from Cloudflare Downloads.

Resource Links

| Resource | Link |
|---|---|
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs | Chat Completions (English and Chinese versions) |

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable and shares it with other Zhipu skills.

Get Key: Visit Zhipu Open Platform API Keys to create or copy your key.

Setup options (choose one):

  1. OpenClaw config (recommended): Set in openclaw.json under skills.entries.glmv-caption.env:

    "glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-api-key" } }
    
  2. Shell environment variable: Add to ~/.zshrc (or your shell's startup file):

    export ZHIPU_API_KEY="your-api-key"
    
  3. .env file: Create .env in this skill directory:

    ZHIPU_API_KEY=your-api-key
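
A key lookup that covers these options might look like the sketch below. It is illustrative only: the function name `load_api_key` is hypothetical, and the precedence (environment variable first, then the .env file) is an assumption rather than documented behavior of the script.

```python
import os
from pathlib import Path


def load_api_key(dotenv_path=".env", environ=None):
    """Return ZHIPU_API_KEY from the environment, else from a .env file, else None."""
    environ = os.environ if environ is None else environ
    key = environ.get("ZHIPU_API_KEY", "").strip()
    if key:
        return key
    path = Path(dotenv_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "ZHIPU_API_KEY":
                # Drop optional surrounding quotes around the value
                return value.strip().strip('"\'') or None
    return None
```

Returning `None` instead of raising lets the caller print the "not configured" message together with the key-management URL.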
    

Cloudflared (Required for local video/file paths)

Only needed when passing local file/video paths — not required if you only use URLs.

Install from Cloudflare Downloads.

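⛔ MANDATORY: cloudflared check and installation procedure (YOU MUST FOLLOW THIS EXACTLY) ⛔

When you (the AI agent) are about to pass a local file path via --videos or --files, you must first check whether cloudflared is installed before running the script. Follow this procedure:

Step 1: Check whether cloudflared exists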

# macOS / Linux
which cloudflared

# Windows
where cloudflared

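If the command returns a path (for example /usr/local/bin/cloudflared), cloudflared is installed and you can run the script directly.

If the command reports an error (not found), go to Step 2.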
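
The Step 1 check can also be done from Python without branching on the operating system, since `shutil.which` searches PATH on macOS, Linux, and Windows alike. The sketch below is an illustration; both function names are hypothetical, and treating anything that does not start with http:// or https:// as a local path is an assumption.

```python
import shutil


def find_cloudflared(which=shutil.which):
    """Return the path to cloudflared, or None if it is not on PATH."""
    return which("cloudflared")


def needs_install_prompt(paths, which=shutil.which):
    """True when at least one input is a local path and cloudflared is missing."""
    has_local = any(not p.startswith(("http://", "https://")) for p in paths)
    return has_local and find_cloudflared(which) is None
```

When `needs_install_prompt` returns True, the agent proceeds to Step 2 rather than running the script.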

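Step 2: Explain the situation to the user and ask for consent

You must first explain the following to the user in full, then ask whether they agree to download and install cloudflared. Do not download it yourself without the user's explicit consent.

Explanation to show the user (reproduce it in full, do not shorten it):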


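cloudflared was not detected on this system. Local video/file paths require this tool.

What is cloudflared?

cloudflared is the official open-source command-line tunneling tool from Cloudflare (Apache 2.0 license, 24000+ GitHub stars). It creates an encrypted temporary tunnel between this machine and Cloudflare's global edge network, exposing a local service as a public HTTPS URL. Open-source repository: https://github.com/cloudflare/cloudflared

Why it is needed here:

The ZhiPu GLM-V API requires videos and files to be passed as public HTTPS URLs (local uploads are not supported). When you provide a local file path, the script works as follows:

  1. Starts a temporary HTTP server on this machine (exposing only the files you specified, no other directories)
  2. Uses cloudflared to create a temporary tunnel (via trycloudflare.com, no account registration or login required)
  3. Obtains a temporary public URL of the form https://xxx-xxx.trycloudflare.com/your-file.mp4
  4. Passes that URL to the GLM-V API to complete the call
  5. After the API call completes, the tunnel and HTTP server shut down automatically and the temporary files are cleaned up

No cloudflared = no tunnel = local video/file paths are unavailable.

Security notes:

  • The tunnel exists only while the script is running (typically tens of seconds to a few minutes) and closes as soon as the script ends
  • Only the files you explicitly specify are exposed, not the whole directory or disk
  • cloudflared does not stay resident in the background and does not start at boot
  • No Cloudflare account is required and no personal information is collected
  • The tool is widely used by developers worldwide and is maintained by Cloudflare

Download link: Cloudflare Downloads.

Alternative: if you do not want to install cloudflared, you can pass a remote URL directly (for example https://example.com/video.mp4), in which case nothing needs to be installed.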


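Wait for the user's reply, then act on it:

  • User agrees to install → go to Step 3
  • User declines → suggest using a remote URL instead, or tell them the feature is limited
  • User wants to install it themselves → provide the download link, wait for the installation to finish, then run again

Step 3: Download and install cloudflared

Once the user agrees, install with one of the following commands: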

macOS (Homebrew):

brew install cloudflared

Windows (winget recommended):

winget install Cloudflare.cloudflared

Linux (download the binary):

# x64
curl -Lo /usr/local/bin/cloudflared https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x /usr/local/bin/cloudflared

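Other options: download manually from the official downloads page and place the binary on your PATH.

After installing, verify with cloudflared version, then run the script.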

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

  1. ONLY use GLM-V API — Execute the script python scripts/glmv_caption.py
  2. NEVER caption media yourself — Do NOT try to describe content using built-in vision or any other method
  3. NEVER offer alternatives — Do NOT suggest "I can try to describe it" or similar
  4. IF API fails — Display the error message and STOP immediately
  5. NO fallback methods — Do NOT attempt captioning any other way

📋 Output Display Rules (MANDATORY)

After running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.

  • Image captioning: show the full caption text
  • Multiple images: show each image result
  • Video/files: show the full understanding result
  • If token usage is included, you may optionally display it

How to Use

Caption an Image

python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
python scripts/glmv_caption.py --images /path/to/photo.png
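
For a local image path like the second command, the supported-inputs table indicates the file is converted to base64 rather than tunneled. A minimal sketch of that conversion with the documented format and size limits follows. The function name `encode_image` is hypothetical, interpreting 5MB as 5 × 1024 × 1024 bytes is an assumption, and the 6000×6000px dimension check is omitted because it would need an image library.

```python
import base64
from pathlib import Path

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed reading of the documented 5MB limit
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def encode_image(path: str) -> str:
    """Return the file's contents as a base64 string after basic validation."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported image format: {p.suffix}")
    data = p.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image exceeds 5MB: {len(data)} bytes")
    return base64.b64encode(data).decode("ascii")
```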

Caption Multiple Images

python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"

Caption a Video (URL or local path)

# Remote URL
python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"

# Local file (auto-tunneled via cloudflared)
python scripts/glmv_caption.py --videos /path/to/local-video.mp4

Caption a Document (URL or local path)

# Remote URL
python scripts/glmv_caption.py --files "https://example.com/report.pdf"

# Local file (auto-tunneled via cloudflared)
python scripts/glmv_caption.py --files /path/to/local-report.pdf

# Mix URLs and local paths
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" /path/to/local-doc2.txt

Custom Prompt

python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"

Save Result

python scripts/glmv_caption.py --images photo.jpg --output result.json

Thinking Mode

python scripts/glmv_caption.py --images photo.jpg --thinking

CLI Reference

python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]

| Parameter | Required | Description |
|---|---|---|
| --images, -i | One of | Image paths or URLs (supports multiple, base64 OK) |
| --videos, -v | One of | Video paths or URLs (supports multiple, mp4/mkv/mov, local paths auto-tunneled) |
| --files, -f | One of | Document paths or URLs (supports multiple, pdf/docx/txt/xlsx/pptx/jsonl, local paths auto-tunneled) |
| --prompt, -p | No | Custom prompt (default: "请详细描述这张图片的内容" / "Please describe this image in detail") |
| --model, -m | No | Model name (default: glm-5v-turbo) |
| --temperature, -t | No | Sampling temperature 0-1 (default: 0.8) |
| --top-p | No | Nucleus sampling 0.01-1.0 (default: 0.6) |
| --max-tokens | No | Max output tokens (default: 1024, max 32768) |
| --thinking | No | Enable thinking/reasoning mode |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
| --stream | No | Enable streaming output |

Note: --images, --videos, and --files are mutually exclusive per API limits.
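
One way to express the mutual exclusivity is an argparse mutually exclusive group. The sketch below is not taken from the script: the function name `build_parser` is hypothetical and only a subset of the options is shown. The flag names and defaults follow the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with exactly one required input source (sketch, subset of options)."""
    parser = argparse.ArgumentParser(prog="glmv_caption.py")
    # Exactly one of the three sources must be given; combining them is an error
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--images", "-i", nargs="+", metavar="IMG")
    source.add_argument("--videos", "-v", nargs="+", metavar="VID")
    source.add_argument("--files", "-f", nargs="+", metavar="FILE")
    parser.add_argument("--prompt", "-p")
    parser.add_argument("--model", "-m", default="glm-5v-turbo")
    parser.add_argument("--temperature", "-t", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.6)
    parser.add_argument("--max-tokens", type=int, default=1024)
    parser.add_argument("--thinking", action="store_true")
    return parser
```

With this setup, passing both --images and --files makes argparse exit with a usage error before any network call is made.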

Response Format

{
  "success": true,
  "caption": "A landscape photo showing a mountain range at sunset...",
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}

Key fields:

  • success — whether the request succeeded
  • caption — the generated caption text
  • usage — token usage statistics
  • warning — present when content was blocked by safety review
  • error — error details on failure
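
A consumer of this JSON might branch on the fields as in the sketch below. The function name `render_result` and the exact display strings are hypothetical; the caption text itself is passed through unmodified.

```python
import json


def render_result(raw: str) -> str:
    """Turn the script's JSON output into the text to show the user."""
    result = json.loads(raw)
    if not result.get("success"):
        return f"Error: {result.get('error', 'unknown error')}"
    lines = [result.get("caption", "")]
    if "warning" in result:
        # Present when content was blocked by safety review
        lines.append(f"Warning: {result['warning']}")
    usage = result.get("usage")
    if usage:
        lines.append(f"Tokens used: {usage.get('total_tokens')}")
    return "\n".join(lines)
```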

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path

Content filtered: warning field present → content blocked by safety review

Tunnel failure (local paths only):

Tunnel setup failed: cloudflared not found. Install it from: ...

→ Guide user to install cloudflared, or use a remote URL instead
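
The HTTP status cases above can be collected into a small lookup. This is a sketch with a hypothetical function name and message wording. It deliberately does not retry, in line with the rule to display the error and stop.

```python
def guidance_for_status(status: int) -> str:
    """Map an HTTP status code to the guidance listed above."""
    if status in (401, 403):
        return "API key invalid or expired: reconfigure ZHIPU_API_KEY"
    if status == 429:
        return "Quota exhausted: wait before trying again"
    return f"Unexpected HTTP status {status}: show the error and stop"
```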

Security Recommendations
This skill legitimately needs to reach the ZhiPu GLM-V API using your ZHIPU_API_KEY and will create a temporary public HTTPS URL for any local video/file by running a local HTTP server and a Cloudflare Quick Tunnel (trycloudflare.com). Before installing or using it:

  1. Do not pass sensitive or private videos/documents unless you accept temporary public exposure via the tunnel URL.
  2. Confirm cloudflared is installed from official sources (Cloudflare/GitHub); the skill's metadata omits cloudflared as a required binary even though SKILL.md and the code need it.
  3. Be prepared to provide your ZHIPU_API_KEY, and understand that it will be used for API calls and may be shared across Zhipu-related skills.
  4. If you require an explicit prompt before each tunnel creation, request that behavior from the agent, or avoid sending local files and use pre-hosted HTTPS URLs instead.
Capability Analysis
Type: OpenClaw Skill · Name: glmv-caption-tunnel · Version: 1.0.0

The skill implements a high-risk mechanism to expose local files to the public internet via 'cloudflared' tunnels (trycloudflare.com) to satisfy the ZhiPu GLM-V API's requirement for HTTPS URLs. While scripts/tunnel_server.py attempts to limit exposure by serving only specific files from a temporary directory and SKILL.md includes mandatory instructions for the agent to seek user consent before installing dependencies, the capability to programmatically install software and open public network tunnels to local data is inherently risky and could be abused. No clear evidence of intentional malice was found, but the architectural design presents a significant security trade-off.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Name/description, required ZHIPU_API_KEY, and Python usage align with a GLM-V captioning skill. Minor metadata mismatch: the SKILL.md and included code require the cloudflared binary for tunneling local videos/files, but registry metadata only lists 'python' as a required binary. This omission is inconsistent but plausibly an oversight rather than malicious.
Instruction Scope
SKILL.md and scripts explicitly instruct the agent to create a temporary HTTP server and launch cloudflared quick tunnels (trycloudflare.com) to produce public HTTPS URLs for local videos/files. That behavior is necessary for the stated purpose (the remote API requires public URLs) but it results in temporary public exposure of the specified files to the tunnel domain and Cloudflare's edge. The instructions do include a user-consent step for downloading cloudflared if missing, but they do not require explicit user confirmation each time a tunnel is created for local files. This is a meaningful privacy/security implication the user must accept.
Install Mechanism
The skill is instruction-only (no install spec). It ships Python scripts and relies on system binaries (python, and optionally cloudflared). No remote archive download/install spec is present in the manifest. The SKILL.md contains links to official cloudflared releases (GitHub/Cloudflare), which is reasonable; there is no automatic installer in the skill files themselves.
Credentials
Only ZHIPU_API_KEY is requested and is appropriate for calling the ZhiPu GLM-V API. The skill does not request unrelated secrets or config paths. The SKILL.md notes the API key is read from ZHIPU_API_KEY and may be shared with other Zhipu skills — this is expected but the user should be aware of cross-skill reuse of that variable.
Persistence & Privilege
The skill is not always-on, does not claim elevated platform privileges, and does not modify other skills or system-wide configs. It launches short-lived subprocesses (local HTTP server and cloudflared) when used; the included code attempts to clean up after itself.
How to Install and Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install glmv-caption-tunnel
  3. Once installed, invoke the Skill by name or trigger it with /glmv-caption-tunnel
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output
Version History
v1.0.0
GLM-V Caption Tunnel 1.0.0: Initial Release

  • Generate captions and descriptions for images, videos, and documents using the ZhiPu GLM-V multimodal model series.
  • Supports multiple input types: images (jpg, png, jpeg), videos (mp4, mkv, mov), and documents (pdf, docx, txt, xlsx, pptx, jsonl) via URL, local path, or base64 (images).
  • Includes automated tunneling of local video and file paths to the public internet using a temporary local HTTP server and Cloudflare tunnel.
  • Strictly requires ZHIPU_API_KEY for access and enforces specific output and security rules.
  • Provides detailed prerequisites and setup instructions for cloudflared, ensuring compliance and user consent before installation.
  • Comprehensive usage documentation and resource links included.
Metadata
Slug: glmv-caption-tunnel
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
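FAQ

What is Glmv Caption Tunnel?

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to descr... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 105 total downloads so far.

How do I install Glmv Caption Tunnel?

Run the command "/install glmv-caption-tunnel" in the OpenClaw or Claude Code chat box to install it in one step. The skill still needs ZHIPU_API_KEY to be configured, and cloudflared is required for local video/file paths.

Is Glmv Caption Tunnel free?

Yes. Glmv Caption Tunnel is free under the MIT-0 license and can be downloaded, installed, and used freely.

Which platforms does Glmv Caption Tunnel support?

Glmv Caption Tunnel is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Glmv Caption Tunnel?

It is developed and maintained by TriDefender (@tridefender). The current version is v1.0.0.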
