Bytedance Visual Recognition

Name: Bytedance Visual Recognition
Author: etmnb

by Etmnb · GitHub ↗ · v3.0.4 · MIT-0

cross-platform ⚠ pending

Install in OpenClaw

/install bytedance-visual-recognition

Description

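ByteDance Visual Recognition: calls the Doubao-Seed multimodal models to recognize images and videos. Supports image-to-text, video-to-text, image-to-code, and video-to-code, with fully automatic model fallback. Joining the Volcengine collaboration rewards program gives free access to top-tier multimodal models. Models are scheduled automatically; each model has a daily budget of 1.8 million tokens (180W), and the skill falls back to another model once that budget is exceeded.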

README (SKILL.md)

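ByteDance Visual Recognition: Doubao-Seed image and video recognition

Calls the Volcengine Ark (火山方舟) Doubao-Seed API to recognize images and videos, selecting the model automatically and capping usage. Joining the Volcengine collaboration rewards program gives free access to top-tier multimodal models.

⚠️ Configuration is needed only once! If the .env file already exists and contains ARK_API_KEY and the 6 model IDs, setup is already done: skip straight to "Usage" and run the command; do not reconfigure.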

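🚀 First-time setup

1. Get an API Key

Open https://console.volcengine.com/ark and sign up or log in
Left menu → API Key Management → Create API Key → copy and save it

2. Create model endpoints

In the same console, go to left menu → Online Inference → Create Inference Endpoint, and create one endpoint for each of the following 6 models: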

Environment variable	Model	Notes
`DOUBAO_VISION_20P_ID`	Doubao-Seed-2.0-Pro	Primary model, preferred in all modes
`DOUBAO_VISION_20C_ID`	Doubao-Seed-2.0-Code	Preferred in code mode
`DOUBAO_VISION_20L_ID`	Doubao-Seed-2.0-Lite	Lightweight fallback
`DOUBAO_VISION_16V_ID`	Doubao-Seed-1.6-Vision	Vision-specialized
`DOUBAO_VISION_18_ID`	Doubao-Seed-1.8	General-purpose fallback
`DOUBAO_VISION_10C_ID`	Doubao-Seed-Code	Code-specialized

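Each endpoint gets an ID in the form ep-xxxxx once it is created; copy and save it.

3. Configure the .env file

Create a .env file in the Skill directory and fill in your key and endpoint IDs: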

ARK_API_KEY=your-api-key
DOUBAO_VISION_20P_ID=ep-xxxxx
DOUBAO_VISION_20C_ID=ep-xxxxx
DOUBAO_VISION_20L_ID=ep-xxxxx
DOUBAO_VISION_16V_ID=ep-xxxxx
DOUBAO_VISION_18_ID=ep-xxxxx
DOUBAO_VISION_10C_ID=ep-xxxxx

Optional: add IAM keys to sync console usage automatically
VOLCENGINE_ACCESS_KEY=your-access-key
VOLCENGINE_SECRET_KEY=your-secret-key
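
The "already configured" check described earlier (ARK_API_KEY plus the 6 model IDs present in .env) can be sketched as below. This is a minimal illustration, not the skill's own code: `parse_env` and `missing_keys` are hypothetical helper names, and the parser assumes simple KEY=VALUE lines.

```python
# Hypothetical helper: checks a .env file's text for the required keys.
REQUIRED_KEYS = [
    "ARK_API_KEY",
    "DOUBAO_VISION_20P_ID",
    "DOUBAO_VISION_20C_ID",
    "DOUBAO_VISION_20L_ID",
    "DOUBAO_VISION_16V_ID",
    "DOUBAO_VISION_18_ID",
    "DOUBAO_VISION_10C_ID",
]


def parse_env(text):
    """Parse KEY=VALUE lines; skip blanks, comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty, in table order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    sample = "ARK_API_KEY=abc\nDOUBAO_VISION_20P_ID=ep-xxxxx\n"
    print(missing_keys(sample))
```

An empty list from `missing_keys` would mean the one-time setup can be skipped.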

4. Test

python doubao_vision_recognize.py --help
python doubao_vision_recognize.py status

If both commands respond without errors, the setup is complete.

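⚡ Usage: run the command immediately once triggered

This is an operating manual, not reference documentation. Once a trigger phrase is detected, copy the matching command and run it directly; replying with text only is prohibited.

After locating the Skill directory, cd into it first, then run the matching command. The Skill is installed at: .openclaw/workspace/skills/bytedance-visual-recognition/

On success, the script prints ✅ 成功! (Success!) followed by the recognition result. You must relay the result to the user.

⚠️ Strictly prohibited: replying only "Recognizing..." without running the command. You must run python doubao_vision_recognize.py ....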

Commands

Command	Purpose	Example
`rec <file> --image|--video --text|--code`	Recognize a file	`rec photo.jpg --image --text`
`rec <directory> --image|--video --text|--code --batch`	Batch processing	`rec ./images/ --batch --image --text`
`ask --text|--code --prompt "content"`	Follow up on the previous result	`ask --text --prompt "explain in more detail"`
`status`	Show today's usage
`sync`	Sync console data
`history`	Show the last 7 days of records

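Parameters

Parameter	Description
`--image`	Input is an image
`--video`	Input is a video
`--text`	Output as text
`--code`	Output as code
`--prompt` / `-p`	Extra instructions (optional for rec, required for ask)
`--batch`	Batch-process a directory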

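Exactly one of --image/--video and exactly one of --text/--code must be passed. The model is selected automatically; choosing one manually is not supported.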
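
The "exactly one of each pair" rule can be expressed with argparse's mutually exclusive groups. The sketch below only illustrates that rule; it is not the parser used by doubao_vision_recognize.py, and `build_parser` and the `rec-sketch` program name are made up for the example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented rec flags."""
    parser = argparse.ArgumentParser(prog="rec-sketch")
    parser.add_argument("path", help="file or directory to recognize")

    # Exactly one input type is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", action="store_true", help="input is an image")
    source.add_argument("--video", action="store_true", help="input is a video")

    # Exactly one output type is required.
    output = parser.add_mutually_exclusive_group(required=True)
    output.add_argument("--text", action="store_true", help="output as text")
    output.add_argument("--code", action="store_true", help="output as code")

    parser.add_argument("--prompt", "-p", default=None, help="extra instructions")
    parser.add_argument("--batch", action="store_true", help="process a directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["photo.jpg", "--image", "--text"])
    print(args.image, args.video, args.text, args.code)
```

With this layout, omitting a pair or passing both members of a pair makes argparse exit with a usage error, which matches the rule stated above.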

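Follow-up rules

No need to re-upload the file; the conversation continues from the previous rec result
A follow-up must use the same model as the previous rec and cannot switch
To switch models, run rec again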

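🚫 Behavior rules (highest priority, must be followed!)

1. Do not ask for confirmation of quota consumption

✅ Invoking the Skill means the user is aware of and consents to quota consumption; run directly without asking anything

2. Do not ask for confirmation before running

✅ Run on trigger; do not wait for a second confirmation from the user

Summary: trigger → infer parameters → run directly. No confirmation round-trips in between.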

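3. Parameter inference rules

User says "recognize / analyze / take a look at" an image → --image --text
User says "recognize / analyze" a video → --video --text
User says "convert to code / UI to code / design to code" → --code
User has extra requirements → add --prompt "content"
Input type unclear → ask the user whether it is an image or a video (ask only this once)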
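
The inference rules above could be sketched as a small function. This is an illustration under assumptions, not the skill's logic: `infer_flags` and `CODE_KEYWORDS` are hypothetical names, and the keyword list (an English phrase plus the original Chinese trigger 转代码) is a guess at how "to code" requests might be detected.

```python
# Assumed trigger phrases for code output; not taken from the skill itself.
CODE_KEYWORDS = ("to code", "转代码")


def infer_flags(request, media=None, extra=None):
    """Map a user request to CLI flags.

    media is "image", "video", or None when the input type is unknown.
    Returns None when the caller should ask the user for the input type.
    """
    if media not in ("image", "video"):
        return None
    flags = ["--" + media]
    lowered = request.lower()
    wants_code = any(keyword in lowered for keyword in CODE_KEYWORDS)
    flags.append("--code" if wants_code else "--text")
    if extra:
        flags += ["--prompt", extra]
    return flags


if __name__ == "__main__":
    print(infer_flags("convert this design to code", media="image"))
    print(infer_flags("analyze this clip", media="video", extra="list the steps"))
```

Returning None for an unknown input type keeps the single clarifying question outside the function, so the caller decides how to ask it.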

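Limits

Each model has 1.8 million tokens (180W) per day; beyond that, the skill falls back to another model automatically
Images up to 15MB, videos up to 50MB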
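
A rough sketch of how these limits might be enforced follows. Several details are assumptions and not stated in the README: the fallback order passed to `pick_model`, the reading of MB as 1024 × 1024 bytes, and the function names themselves.

```python
DAILY_TOKEN_LIMIT = 1_800_000          # 180W tokens per model per day
MAX_IMAGE_BYTES = 15 * 1024 * 1024     # assumes 1 MB = 1024 * 1024 bytes
MAX_VIDEO_BYTES = 50 * 1024 * 1024


def within_size_limit(size_bytes, media):
    """Check a file size against the image or video limit."""
    limit = MAX_IMAGE_BYTES if media == "image" else MAX_VIDEO_BYTES
    return size_bytes <= limit


def pick_model(priority, used_tokens):
    """Return the first model in priority order still under its daily budget.

    priority is a caller-supplied list of model names (the order is an
    assumption); used_tokens maps model name to tokens used today.
    Returns None when every model is exhausted.
    """
    for name in priority:
        if used_tokens.get(name, 0) < DAILY_TOKEN_LIMIT:
            return name
    return None


if __name__ == "__main__":
    order = ["Doubao-Seed-2.0-Pro", "Doubao-Seed-2.0-Lite", "Doubao-Seed-1.8"]
    usage = {"Doubao-Seed-2.0-Pro": 1_800_000}
    print(pick_model(order, usage))
    print(within_size_limit(20 * 1024 * 1024, "image"))
```

Keeping the priority list as a parameter lets text mode and code mode use different orders without changing the selection logic.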

Capability Tags

requires-sensitive-credentials

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install bytedance-visual-recognition
After installation, invoke the skill by name or use /bytedance-visual-recognition
Provide required inputs per the skill's parameter spec and get structured output

Version History

v3.0.4

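- Removed the description of and support for the "follow-up" feature (asking questions about the output result)
- Updated the documentation accordingly, deleting all follow-up content and examples
- Other usage, invocation rules, and the configuration flow are unchanged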

v3.0.2

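- Updated the skill description and docs to emphasize that joining the Volcengine collaboration rewards program grants free access to top-tier multimodal models
- Clarified support for image/video to text and code, and for follow-up questions without re-uploading
- Removed redundant and repeated content for a cleaner structure
- Kept the detailed first-time setup and command usage to help new users get started
- Behavior rules and parameter inference are unchanged; execution conventions stay the same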

v3.0.1

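Version 3.0.1
- Added documentation on free use of top-tier models, plus an entry point and links for the rewards program
- Other content and rules are unchanged; configuration and invocation stay the same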

v3.0.0

Version 3.0.0: Major update with expanded features and new usage rules.
- Adds support for both image and video recognition, including conversion to text and code.
- Implements automatic multi-model fallback with smart scheduling to manage token limits and maintain service.
- Introduces concise command-line usage for all recognition modes, batch processing, and follow-up questioning.
- Requires stricter compliance with action rules: auto-execution upon trigger with no confirmation prompts.
- Environment variables and setup steps updated for seamless integration with Doubao-Seed API endpoints.
- Command triggers and usage instructions greatly expanded and clarified.

Metadata

Slug bytedance-visual-recognition

Version 3.0.4

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 4

Frequently Asked Questions

What is Bytedance Visual Recognition?

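ByteDance Visual Recognition calls the Doubao-Seed multimodal models to recognize images and videos. It supports image-to-text, video-to-text, image-to-code, and video-to-code, with fully automatic model fallback when a model's daily budget of 1.8 million tokens (180W) is exceeded. Joining the Volcengine collaboration rewards program gives free access to top-tier multimodal models. It is an AI Agent Skill for Claude Code / OpenClaw, with 21 downloads so far.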

How do I install Bytedance Visual Recognition?

Run "/install bytedance-visual-recognition" in the OpenClaw or Claude Code chat to install it in one step. Before first use, complete the one-time .env configuration (ARK_API_KEY and the 6 endpoint IDs) described in the README.

Is Bytedance Visual Recognition free?

Yes, the skill itself is free and licensed under MIT-0: you can download, install, and use it at no cost. Recognition calls consume the token quota of your own Volcengine Ark account.

Which platforms does Bytedance Visual Recognition support?

Bytedance Visual Recognition is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Bytedance Visual Recognition?

It is built and maintained by Etmnb (@etmnb); the current version is v3.0.4.
