
Baidu Yijian Vision

Name: Baidu Yijian Vision
Author: linpower

by Power Lin · GitHub ↗ · v0.9.38 · MIT-0

cross-platform ✓ Security Clean

914 downloads

Install in OpenClaw

/install baidu-yijian-vision

Description

Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. Yijian achieves 95%+ professional accuracy with 50%+ lower inference cost th...

README (SKILL.md)

Baidu Yijian Vision Skill (百度一见视觉技能)

A skill for image and video analysis, object detection, safety monitoring, and industrial inspection, backed by the Baidu Yijian platform.

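⚠️ Requirements

YIJIAN_API_KEY environment variable (required), obtained from the Baidu Yijian platform:
1. Log in to the Baidu Yijian platform
2. Activate the trial package
3. Generate an API Key: System Management (系统管理) → Security Authentication (安全认证) → API Key
Node.js >= 16.0.0 (runtime dependency)

Set the environment variable: YIJIAN_API_KEY=your-api-key

🔒 Client-side tool: this is a local tool for interacting with the Baidu Yijian platform. All data processing follows security protocols.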
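As a quick check of the two requirements above, a small preflight script can help before running any of the skill's scripts. This is a minimal sketch: `checkPrerequisites` is a hypothetical helper, not something the skill ships.

```javascript
// Hypothetical preflight helper; not part of the skill's own scripts.
// Returns a list of human-readable problems (empty when everything is fine).
function checkPrerequisites(env, nodeVersion) {
  const problems = [];
  if (!env.YIJIAN_API_KEY) {
    problems.push("YIJIAN_API_KEY is not set");
  }
  // nodeVersion looks like "16.20.2"; only the major component matters here.
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (!(major >= 16)) {
    problems.push(`Node.js >= 16.0.0 is required, found ${nodeVersion}`);
  }
  return problems;
}

const problems = checkPrerequisites(process.env, process.versions.node);
if (problems.length > 0) {
  console.error(problems.join("\n"));
}
```

Running it with no output means both requirements look satisfied; it does not verify that the key itself is valid.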

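🎯 What this tool does

Baidu Yijian (yijian-next.cloud.baidu.com) is Baidu's vision understanding platform. This tool lets you:

Automatic intent matching: match the best skill from a natural-language description
Smart routing: call a specialized vision skill on a high-confidence match, and fall back to multimodal inference on a low-confidence one
Direct skill invocation: call a skill directly when its ID is known
Result visualization: draw bounding boxes, generate a grid reference, and preview ROIs and tripwires
Detection-region definition: define an ROI (electronic fence) or a tripwire (detection line) through an interactive workflow

Supported detection types: person detection, pedestrian counting, vehicle recognition, OCR, pose estimation, object tracking, and more.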

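📚 Usage guide

Intent-driven workflow (recommended)

When you can describe what you need but are not sure which skill to use, the system matches the best skill automatically. The example intent "检测是否有人摔倒" means "detect whether someone has fallen":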

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg

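The system automatically:

Queries the Yijian platform and matches the intent against the public skill list
If the match confidence is ≥ 0.7, calls the matching specialized skill (a full-image ROI is added automatically)
If no public skill matches or the call fails, searches the private workspace skills (you pick the best match from the list, then call it with invoke)
If the private workspace has no suitable skill either, falls back to direct multimodal inference

Automatic ROI: when the user supplies no ROI, the system generates one that covers the entire image. To restrict detection to a region, pass a custom ROI to invoke.mjs.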
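The routing order described above can be sketched as a small decision function. This is an illustration only: `chooseMode` and its parameter names are hypothetical, and the real intent-invoke.mjs may order or implement these checks differently.

```javascript
// Hypothetical sketch of the routing order; not the skill's actual code.
const DEFAULT_THRESHOLD = 0.7; // default match threshold stated in the README

function chooseMode(
  { publicMatch = null, skillCallFailed = false, hasWorkspaceCandidates = false },
  threshold = DEFAULT_THRESHOLD
) {
  // A public skill is used only when its match confidence reaches the threshold.
  const confident = publicMatch !== null && publicMatch.confidence >= threshold;
  if (confident && !skillCallFailed) return "skill";
  // Otherwise offer private workspace skills, if any look suitable.
  if (hasWorkspaceCandidates) return "workspace-search";
  // Last resort: direct multimodal inference.
  return "multimodal";
}

console.log(chooseMode({ publicMatch: { confidence: 0.92 } }));
console.log(chooseMode({ publicMatch: { confidence: 0.75 } }, 0.8));
```

The second call shows the effect of raising the threshold: a 0.75 match no longer qualifies for skill mode.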

Custom confidence threshold

# Use a skill only when the match confidence is ≥ 0.8; otherwise fall back to multimodal
node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/intent-invoke.mjs "检测是否有人摔倒" photo.jpg 0.8

Without an image (text-only intent query)

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/intent-invoke.mjs "检测是否有人摔倒"

Return format

{
  "success": true,
  "mode": "skill",
  "epId": "ep-public-xxxxx",
  "skillName": "人员摔倒检测",
  "confidence": 0.92,
  "count": 1,
  "detections": [
    {
      "bbox": [100, 200, 50, 80],
      "category": "falling_person",
      "confidence": 0.94
    }
  ]
}

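Field descriptions:

| Field | Type | Description |
| --- | --- | --- |
| `success` | boolean | Whether the call succeeded |
| `mode` | string | `"skill"` / `"workspace-search"` / `"multimodal"`; the inference mode that was used |
| `epId` | string \| null | Skill ID (set in skill mode) |
| `skillName` | string \| null | Skill name (set in skill mode) |
| `confidence` | number \| null | Skill match confidence (0-1) |
| `count` | number | Number of objects detected |
| `detections` | array | Array of detection results |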

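Modes:

"mode": "skill" - a specialized skill on the Baidu Yijian platform was used; high accuracy, low cost
"mode": "workspace-search" - no public skill matched; the private workspace skill list is returned for you to choose from
"mode": "multimodal" - a multimodal large model performed the inference directly; broadly applicable, no predefined skill needed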

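A caller will typically branch on `mode` after parsing the JSON. The sketch below uses only the fields documented above; `summarizeResult` is a hypothetical helper, and the sample object is the return-format example from this README.

```javascript
// Hypothetical helper that turns a parsed result into a one-line summary.
function summarizeResult(result) {
  if (!result.success) return "call failed";
  switch (result.mode) {
    case "skill":
      return `${result.skillName} (${result.epId}) found ${result.count} object(s)`;
    case "workspace-search":
      return "no public skill matched; choose a private workspace skill and call invoke.mjs";
    case "multimodal":
      return `multimodal inference found ${result.count} object(s)`;
    default:
      return `unknown mode: ${result.mode}`;
  }
}

// Sample taken from the return-format example in this README.
const sample = {
  success: true,
  mode: "skill",
  epId: "ep-public-xxxxx",
  skillName: "人员摔倒检测",
  confidence: 0.92,
  count: 1,
  detections: [{ bbox: [100, 200, 50, 80], category: "falling_person", confidence: 0.94 }],
};

console.log(summarizeResult(sample));
```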
Querying skills

Query public skills (matched by intent; the example query "人员检测" means "person detection"):

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/list.mjs "人员检测"

Query private workspace skills (tied to your API Key, cached for 1 hour):

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/workspace.mjs list-skills

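This returns the skill list (with epId, name, and description). When no public skill matches, pick the best match from the private list and call it with invoke.mjs: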

echo '{"input0":{"image":"photo.jpg"}}' | node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/invoke.mjs ep-wsnyqcdj-0xdpgbt4
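The one-hour cache mentioned for the workspace skill list comes down to a freshness check like the one below. This is a sketch under assumptions: `isCacheFresh` is hypothetical, and how the skill's cache files actually record their save time is not documented here.

```javascript
// Hypothetical freshness check for a cache entry with a 1-hour TTL.
const ONE_HOUR_MS = 60 * 60 * 1000;

function isCacheFresh(savedAtMs, nowMs = Date.now(), ttlMs = ONE_HOUR_MS) {
  const age = nowMs - savedAtMs;
  // A negative age means the clock moved backwards; treat that as stale.
  return age >= 0 && age < ttlMs;
}

console.log(isCacheFresh(Date.now() - 10 * 60 * 1000)); // saved 10 minutes ago
```

If a skill was just added to the workspace and does not show up in the list, a stale-but-unexpired cache is one possible cause worth ruling out.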

Direct skill invocation (known skill ID)

When you already know the skill ID, you can call it directly:

echo '{"input0":{"image":"photo.jpg"}}' | node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/invoke.mjs ep-xxxx-yyyy

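ROI (electronic fence) parameter format

An ROI restricts the detection region. It must contain all four fields id, name, kind, and points; if any one is missing, the API returns a 500 error.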

{
  "id": "1",
  "name": "zone",
  "kind": "ROI",
  "points": [x1,y1, x2,y2, x3,y3, x4,y4]
}

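id: any string identifier (for example "1")
name: region name (for example "zone" or "doorway")
kind: fixed value "ROI"
points: array of vertex coordinates, listed in a consistent clockwise or counterclockwise order; each [x,y] pair is one vertex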
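Because a missing field produces a 500 from the API, it can help to validate an ROI locally first. The sketch below is hypothetical: `validateRoi` and `fullImageRoi` are not part of the skill, the three-vertex minimum is an assumption (a polygon needs at least three corners), and the vertex order used for the full-image ROI is an illustration rather than the tool's documented behaviour.

```javascript
// Hypothetical local validator for the ROI shape described above.
function validateRoi(roi) {
  const errors = [];
  for (const field of ["id", "name", "kind", "points"]) {
    if (roi[field] === undefined || roi[field] === null) {
      errors.push(`missing field: ${field}`);
    }
  }
  if (roi.kind != null && roi.kind !== "ROI") {
    errors.push('kind must be "ROI"');
  }
  if (Array.isArray(roi.points)) {
    if (roi.points.length % 2 !== 0) errors.push("points must hold x,y pairs");
    // Assumption: a polygon needs at least 3 vertices (6 numbers).
    if (roi.points.length < 6) errors.push("points needs at least 3 vertices");
    if (!roi.points.every(Number.isFinite)) errors.push("points must be numbers");
  } else if (roi.points != null) {
    errors.push("points must be an array");
  }
  return errors;
}

// Builds an ROI covering the whole image, corners listed clockwise from top-left.
function fullImageRoi(width, height) {
  return {
    id: "1",
    name: "full-image",
    kind: "ROI",
    points: [0, 0, width, 0, width, height, 0, height],
  };
}

console.log(validateRoi(fullImageRoi(1920, 1080)));
```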

Tripwire parameter format

A tripwire detects crossing events. It must contain five fields: id, name, kind, points, and direction.

{
  "id": "1",
  "name": "line",
  "kind": "TripWire",
  "points": [p1_x,p1_y, p2_x,p2_y, p3_x,p3_y, p4_x,p4_y],
  "direction": "Forward"
}

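id: any string identifier
name: tripwire name
kind: fixed value "TripWire"
points: 4 points (8 numbers); p1→p2 is the main line, and p3→p4 marks the A/B sides
direction: detection direction, one of "Forward" | "Backward" | "TwoWay"

Tripwires are never generated automatically; the user must specify them. See the tripwire workflow for details.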
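The flat 8-number points array is easy to get wrong, so the sketch below checks a tripwire and splits the array into the main line and the A/B marker. Both functions are hypothetical helpers written for this illustration, not part of the skill.

```javascript
// Hypothetical helpers for the tripwire shape described above.
const DIRECTIONS = ["Forward", "Backward", "TwoWay"];

function validateTripwire(tw) {
  const errors = [];
  for (const field of ["id", "name", "kind", "points", "direction"]) {
    if (tw[field] === undefined || tw[field] === null) {
      errors.push(`missing field: ${field}`);
    }
  }
  if (tw.kind != null && tw.kind !== "TripWire") {
    errors.push('kind must be "TripWire"');
  }
  if (tw.points != null) {
    const ok =
      Array.isArray(tw.points) &&
      tw.points.length === 8 &&
      tw.points.every(Number.isFinite);
    if (!ok) errors.push("points must be 8 numbers (p1, p2, p3, p4)");
  }
  if (tw.direction != null && !DIRECTIONS.includes(tw.direction)) {
    errors.push(`direction must be one of ${DIRECTIONS.join(", ")}`);
  }
  return errors;
}

// Splits the flat array into the main line (p1→p2) and the A/B marker (p3→p4).
function splitTripwirePoints(points) {
  const [p1x, p1y, p2x, p2y, p3x, p3y, p4x, p4y] = points;
  return {
    mainLine: [[p1x, p1y], [p2x, p2y]],
    sideMarker: [[p3x, p3y], [p4x, p4y]],
  };
}

const tripwire = {
  id: "1",
  name: "line",
  kind: "TripWire",
  points: [0, 540, 1920, 540, 0, 500, 1920, 500],
  direction: "Forward",
};
console.log(validateTripwire(tripwire), splitTripwirePoints(tripwire.points));
```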

Invoke a skill with an ROI:

echo '{"input0":{"image":"photo.jpg","roi":{"id":"1","name":"zone","kind":"ROI","points":[100,100,500,100,500,400,100,400]}}}' | \
  node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/invoke.mjs ep-xxxx-yyyy

Invoke a skill with a tripwire:

echo '{"input0":{"image":"photo.jpg","tripwire":{"id":"1","name":"line","kind":"TripWire","points":[0,540,1920,540,0,500,1920,500],"direction":"Forward"}}}' | \
  node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/invoke.mjs ep-xxxx-yyyy
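Hand-writing nested JSON inside shell quotes is error-prone, so one option is to generate the stdin payload with a few lines of Node and pipe it into invoke.mjs. This is a minimal sketch: `buildPayload` is a hypothetical helper, and the `input0`, `roi`, and `tripwire` keys are taken from the echo examples above.

```javascript
// Hypothetical helper that builds the JSON payload invoke.mjs reads from stdin.
function buildPayload(image, { roi, tripwire } = {}) {
  const input0 = { image };
  if (roi) input0.roi = roi;
  if (tripwire) input0.tripwire = tripwire;
  return JSON.stringify({ input0 });
}

const payload = buildPayload("photo.jpg", {
  roi: {
    id: "1",
    name: "zone",
    kind: "ROI",
    points: [100, 100, 500, 100, 500, 400, 100, 400],
  },
});

// Print the payload so it can be piped into invoke.mjs.
console.log(payload);
```

Saved as, say, build-payload.js (a file name chosen here for illustration), its output can replace the echo in the commands above: node build-payload.js | node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/invoke.mjs ep-xxxx-yyyy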

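Defining detection regions

Need to define an electronic fence (ROI, also called a region of interest) or a tripwire (also called a detection line)?

ROI workflow: create an electronic fence so that detection runs only inside the specified region
Tripwire workflow: draw a detection line and count crossing events

Both workflows include complete interactive steps and sample dialogues.

Preview ROI/tripwire: preview the overlay on the image before invoking: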

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/visualize.mjs photo.jpg '[]' preview.png \
  --overlays '[{"kind":"ROI","name":"zone","points":[...]}]'

Generate a grid: helps users specify point positions using grid coordinates:

node ${CLAUDE_PLUGIN_ROOT}/skill/scripts/show-grid.mjs photo.jpg grid.png

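Full documentation

Type definitions: data structures such as Detection, Image, ROI (electronic fence), and Tripwire
Grid input system: specifying points with grid coordinates

Advanced: video frame processing and tracking

Scenario: process a 30-second surveillance video, detecting and tracking people frame by frame.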

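# Step 1: create the working directories, then extract one frame per second
mkdir -p frames results
ffmpeg -i surveillance_30sec.mp4 -vf fps=1 frames/frame_%04d.jpg

# Step 2: compute sourceId (the video identifier) from the first 64 KiB of the file
sourceId=$(head -c 65536 surveillance_30sec.mp4 | md5sum | awk '{print substr($1, 1, 16)}')

# Step 3: process each frame and track
for frame_file in frames/frame_*.jpg; do
  frame_num=$(basename "$frame_file" | grep -oE '[0-9]+' | head -1)
  frame_index=$((10#$frame_num - 1))   # 10# forces base 10, so "0008" is not parsed as octal
  timestamp=$((frame_index * 1000))    # milliseconds, given fps=1
  imageId="frame_${frame_num}"         # frame_num is already zero-padded by ffmpeg

  # Intent-driven call; the intent "检测人员" means "detect people"
  result=$(node "${CLAUDE_PLUGIN_ROOT}/skill/scripts/intent-invoke.mjs" "检测人员" "$frame_file")

  detections=$(echo "$result" | jq '.detections')
  echo "$detections" > "results/${imageId}_detections.json"
done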
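On systems without md5sum, the sourceId from Step 2 can be computed in Node using the same recipe: the MD5 of the first 65536 bytes, truncated to 16 hex characters. The function names below are hypothetical, and `frameMeta` simply mirrors the imageId and timestamp arithmetic of the shell loop.

```javascript
const crypto = require("node:crypto");

// MD5 of the first 64 KiB, truncated to 16 hex characters (same as Step 2).
function sourceIdFromBytes(buffer) {
  const head = buffer.subarray(0, 65536);
  return crypto.createHash("md5").update(head).digest("hex").slice(0, 16);
}

// Mirrors the loop: 1-based frame number → zero-padded imageId and ms timestamp.
function frameMeta(frameNumber, fps = 1) {
  const frameIndex = frameNumber - 1;
  return {
    imageId: `frame_${String(frameNumber).padStart(4, "0")}`,
    timestamp: Math.round((frameIndex * 1000) / fps),
  };
}

console.log(frameMeta(8));
```

For a real video, pass the file contents, for example sourceIdFromBytes(require("node:fs").readFileSync("surveillance_30sec.mp4")); reading the whole file is simple but wasteful for large videos, since only the first 64 KiB is hashed.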

Usage Guidance

This skill appears to be a legitimate local client for Baidu's Yijian vision service, but review a few things before installing or using it:

1) Origin/trust: the skill's source/homepage is unknown; only install from maintainers you trust.
2) API key scope: YIJIAN_API_KEY will be sent to the Yijian endpoints; ensure the key's permissions and lifetime are acceptable.
3) Network destinations: inspect utils.mjs (routerQueryUrl, workspacesGetUrl, etc.) to confirm the HTTP endpoints are the expected Baidu domains.
4) Local files & caching: the scripts read images and base64-encode them for upload, and they write cache files (workspace-cache.json, skills-cache.json) and generated preview images to disk; verify the cache directory location if you need to control where data is stored.
5) Dependencies: the code uses the sharp library for image processing; run npm install in a controlled environment and review package.json if you wish to vet dependencies.

If you want higher assurance, review the omitted utility/cache files (utils.mjs, cache.mjs) to confirm no unexpected outbound hosts or credential harvesting behavior. Overall, nothing in the provided files is disproportionate to the stated purpose, but validate endpoints and the key scope because the repository origin is unverified.

Capability Analysis

Type: OpenClaw Skill
Name: baidu-yijian-vision
Version: 0.9.38

The baidu-yijian-vision skill bundle is a legitimate integration for Baidu's Yijian visual AI platform. It provides a robust set of tools for image and video analysis, including intent-driven skill routing, multimodal inference, and interactive workflows for defining detection regions (ROI) and tripwires. The scripts (e.g., invoke.mjs, intent-invoke.mjs, and visualize.mjs) use standard Node.js modules and the 'sharp' library for image processing, communicating exclusively with official Baidu API endpoints (yijian.baidubce.com and yijian-next.cloud.baidu.com). No malicious behavior, data exfiltration, or harmful prompt injection instructions were detected.

Capability Tags

crypto, requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (Baidu Yijian vision) align with the code and instructions: the scripts read image files, build ROI/tripwire overlays, query a router for skill matching, list workspace skills, and call multimodal/skill APIs. Required binaries (node/npm) and a single API key (YIJIAN_API_KEY) are appropriate for this client.

✓ Instruction Scope

SKILL.md and the scripts instruct the agent to read local image files, generate grid/visual previews, cache workspace/skill lists, and send image payloads (often base64) to Yijian endpoints. Those actions stay inside the stated purpose. The skill does read and cache workspace/skill metadata locally (1 hour TTL) and will base64-encode image bytes for API calls — this is expected for an image analysis client.

ℹ Install Mechanism

There is no explicit install spec in registry metadata (no automatic download), but package.json and the scripts depend on node modules (notably sharp for image processing). The SKILL.md expects Node and npm to be available but does not explicitly instruct running npm install; the user will need to install dependencies locally (e.g., npm install) before running scripts. Nothing in the manifest pulls arbitrary binaries or unknown URLs.

✓ Credentials

Only one environment credential is declared: YIJIAN_API_KEY (primary). That matches the documented behavior: API calls are authenticated with a bearer token. The skill does not request unrelated credentials or system secrets in the manifest.

✓ Persistence & Privilege

No elevated privileges are requested. always is false, and the skill caches workspace/skill lists locally (workspace-cache.json, skills-cache.json). Caching and writing preview images to disk are reasonable for a CLI client and limited in scope.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install baidu-yijian-vision
After installation, invoke the skill by name or use /baidu-yijian-vision
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.9.39

v0.9.38

- Updated description for improved clarity and emphasis on Yijian's visual AI specialization and advantages.
- No code or functionality changes; all scripts and workflows remain the same.

v0.9.37

- Add English keywords to the description for improved search and discoverability.
- No functional or workflow changes; documentation updated only.

v0.9.36

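- Added scripts/cache.mjs, which caches data to make skill retrieval and list queries more efficient.
- The related interfaces now support local caching, which cuts down on redundant requests.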

v0.9.35

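- Major workflow update: added intent-driven automatic routing, which matches a skill from a natural-language description or falls back to multimodal inference.
- New scripts: intent-invoke.mjs, multimodal.mjs, query.mjs, and workspace.mjs, covering automatic skill filtering, switching between multimodal and skill modes, and private-workspace skill queries.
- Substantially simplified and restructured the documentation: removed redundant sections on installation, security, visualization, and video frame processing to focus on the one-step intent-driven, auto-routing flow.
- Removed redundant or outdated scripts and Markdown documents to sharpen the focus.
- Supports custom confidence thresholds, quick private-skill queries, and specifying or auto-generating ROI/Tripwire.
- Unified parameter formats and invocation style to fit the new intent workflow.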

v0.9.32

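- Minor documentation polish and terminology unification; terms such as "百度一见" and "Baidu Yijian" are now used consistently.
- Some English comments and explanatory text were tightened; no functional changes.
- Added "Baidu" (百度) to the platform name and to the API Key instructions for clarity.
- Unified the format and wording of the main description, purpose, system requirements, and feature overview; no substantive logic changes.
- Documentation-only release; no code or functional changes.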

v0.9.31

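- Trimmed the name and description wording to be more direct (for example, "vision AI Agent" became "vision skill").
- All existing skill features, usage instructions, and documentation structure are unchanged.
- Shortened the introduction and tidied some repeated or redundant terms.
- No script or functional updates; wording tweaks only.
- The end-to-end workflow for image and video analysis and related detection is unchanged.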

v0.9.30

- Added new utility script: `scripts/preset-utils.mjs`.
- Documentation update: now includes instructions for filtering skills by user intent before skill invocation.

v0.9.29

- Removed 4 files: package-lock.json, scripts/generate-presets.mjs, scripts/migrate.mjs, skill/skill.json.
- Skill manifest updated: added metadata for required binaries and environment variables.
- Documentation and workflow instructions in SKILL.md updated and clarified.
- English summary and key usage points introduced at the top of SKILL.md.

v0.9.28

- Added migration and preset generation scripts for improved skill setup and compatibility.
- Added machine-readable skill metadata file (`skill/skill.json`).
- Updated documentation (SKILL.md) for more concise usage instructions and new requirements format.
- Updated type guides and workflow guides (including ROI/grid guides) for clarity and structure.
- Added package-lock.json for consistent dependency management.

v0.9.27

- Added a concise English summary at the start describing Baidu Yijian Vision as a professional vision AI agent.
- Updated the main description field to include an English title and clarify product positioning.
- No functional or workflow changes; documentation improvements only.
- All usage guides, API instructions, and examples remain unchanged.

v0.9.26

- docs: improve data transmission disclosure in SECURITY.md

v0.9.25

- correct ROI/Tripwire examples in docs

v0.9.24

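- Corrected the API Key path in the documentation; it now points to "Yijian platform → System Management → Security Authentication → API Key" (一见平台 → 系统管理 → 安全认证 → API Key).
- Clarified terminology: "ROI" is glossed as electronic fence (电子围栏) and "Tripwire" as 绊线, with matching explanations added to the documentation.
- The type definitions section gained Chinese explanations for the Detection, Image, ROI, and Tripwire data structures.
- The rest of the documentation matches the previous version; no functional changes.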

v0.9.22

- Removed two scripts: generate-presets.mjs and migrate.mjs, simplifying the codebase.
- Updated requirements metadata: now explicitly lists node and npm as required binaries.
- No changes to end-user workflow or main documentation content.

v0.9.21

SECURITY.md changed

v0.9.20

- Added a metadata field (per the openclaw spec) that explicitly declares the required environment variable YIJIAN_API_KEY and the primary credential.
- No other documentation changes.

v0.9.19

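- Trimmed and refined the skill description to foreground professional vision AI capabilities and core use cases.
- Documentation structure unchanged; some references changed from "README.md" to "Installation Guide (INSTALL.md)".
- No new features or script changes; only description and explanatory text updated.
- Suited to getting started quickly in visual inspection, industrial, security monitoring, and similar scenarios.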

v0.9.18

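- Updated the description to highlight industry-grade applications and typical scenarios, including SOP compliance, quality inspection, red-line monitoring, business analytics, and material inventory.
- Stated the supported data types explicitly: images, video, and RTSP/RTMP live streams.
- Added the "required-env-vars" and "primary-credential" fields to standardize environment variable declarations.
- Emphasized the accuracy and inference-cost advantages, highlighting the performance gain over general-purpose foundation models.
- Other notes, feature structure, and usage unchanged.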

v0.9.17

- Removed README.md from the project.
- Environment variable, binary, and dependency specifications were removed from SKILL.md’s metadata header.
- All existing skill features and usage instructions remain unchanged.
- Installation and requirement information is still included within the body content.

Metadata

Slug baidu-yijian-vision

Version 0.9.38

License MIT-0

All-time Installs 3

Active Installs 3

Total Versions 22

Frequently Asked Questions

What is Baidu Yijian Vision?

Yijian (一见) is Baidu's specialized visual AI skill for image and video analysis. Yijian achieves 95%+ professional accuracy with 50%+ lower inference cost th... It is an AI Agent Skill for Claude Code / OpenClaw, with 914 downloads so far.

How do I install Baidu Yijian Vision?

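Run "/install baidu-yijian-vision" in the OpenClaw or Claude Code chat to install it in one step. To run the scripts you also need Node.js >= 16.0.0 and the YIJIAN_API_KEY environment variable, as listed in the README requirements.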

Is Baidu Yijian Vision free?

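The skill itself is free and licensed under MIT-0, so you can download, install, and use it at no cost. It does require an API Key from the Baidu Yijian platform, which the README says you obtain after activating a trial package.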

Which platforms does Baidu Yijian Vision support?

Baidu Yijian Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Baidu Yijian Vision?

It is built and maintained by Power Lin (@linpower); the current version is v0.9.38.
