
vlm-grounding

Name: vlm-grounding
Author: qijimrc

By Ji Qi · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 213

Install in OpenClaw

/install vlm-grounding

Description

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

Security recommendations

Treat this skill as potentially unsafe until you verify a few things:
1) Who published it, and do you trust that owner?
2) Inspect the bundled ssssss.json log; remove it, or understand why session/tool content and example authorization headers are included.
3) Confirm whether the skill actually needs to read /root/.openclaw/agents/main/agent/models.json or call internal IPs; if so, restrict it to an isolated environment and ensure no sensitive networks/configs are exposed.
4) Watch for prompt-injection patterns in SKILL.md (base64/unicode control chars); ask the author to remove hidden/encoded content and to explicitly declare any needed config paths or credentials.
If you cannot validate these points, run the skill only in a sandboxed agent or decline to install.
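As a rough aid for points 2 and 4, the sketch below scans a file's text for the kinds of artifacts mentioned above: bearer-style authorization headers, long base64-like runs, and invisible Unicode format characters. The function name scan_text, the two regular expressions, and the 40-character threshold are illustrative assumptions, not part of the skill or of OpenClaw; a hit is only a prompt for manual review, not proof of malicious content.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; tune them for your own review.
AUTH_HEADER = re.compile(r"authorization\W{1,5}bearer\s+\S+", re.IGNORECASE)
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def scan_text(text):
    """Return a list of (kind, snippet) findings for one file's text."""
    findings = []
    for match in AUTH_HEADER.finditer(text):
        findings.append(("auth-header", match.group(0)))
    for match in BASE64_RUN.finditer(text):
        # Keep only a short prefix so reports stay readable.
        findings.append(("base64-like", match.group(0)[:20] + "..."))
    for ch in text:
        # Category "Cf" covers invisible format characters such as
        # zero-width spaces and bidirectional overrides.
        if unicodedata.category(ch) == "Cf":
            findings.append(("invisible-char", "U+%04X" % ord(ch)))
    return findings
```

Running scan_text over the contents of SKILL.md and ssssss.json before installation gives a quick list of spots worth reading by hand.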

Functional analysis

Type: OpenClaw Skill
Name: vlm-grounding
Version: 1.0.0

The skill bundle provides instructions and code snippets for performing image grounding (object detection) using a GLM-4.7V model. It describes a standard workflow of calling an internal API (172.20.112.202), parsing bounding box coordinates from the response, and visualizing them on an image. No indicators of data exfiltration, malicious execution, or prompt injection were found in SKILL.md or _meta.json.

Capability assessment

ℹ Purpose & Capability

SKILL.md describes a reasonable grounding workflow (call model, parse boxes, draw visualizations). However the doc references a system config path (/root/.openclaw/agents/main/agent/models.json) and internal hosts (e.g., 172.20.112.202) without declaring that it needs access to those configs or network endpoints — this is an unexplained dependency on internal configuration.
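One way to surface undeclared internal endpoints like the one noted here is to list every private IPv4 address a skill's documentation mentions. The sketch below uses only Python's standard ipaddress module; the helper name private_hosts is hypothetical, and it only finds literal IPv4 addresses, not hostnames or IPv6.

```python
import ipaddress
import re

# Matches dotted-quad candidates; validity is checked by ipaddress below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def private_hosts(text):
    """Return private IPv4 addresses mentioned in text, in order, deduplicated."""
    seen = []
    for candidate in IPV4.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
        # is_private is true for RFC 1918 ranges and also for loopback.
        if addr.is_private and candidate not in seen:
            seen.append(candidate)
    return seen
```

Applied to the text of SKILL.md, any address it returns is a network dependency that the manifest should declare.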

⚠ Instruction Scope

Instructions tell the agent to contact an HTTP model API and to set NO_PROXY to bypass proxying (which affects network routing). They also include guidance that could cause the agent to read or use system-local config to locate model endpoints. The SKILL.md itself contains prompt-like material and the package contains a large session log (ssssss.json) with system/tool lists; combined with detected base64/unicode-control patterns, this raises concern about embedded prompt-injection or unintended privileged instructions.

✓ Install Mechanism

There is no install spec and no code files to be installed; this reduces disk-write risk. The skill is instruction-only, which is lower risk than an install that fetches and executes arbitrary archives.

⚠ Credentials

The manifest declares no env vars or credentials, but the instructions tell users to set NO_PROXY and point to internal hosts and a root-owned models.json path. That implies the skill expects access to internal network and possibly system config; those capabilities are not declared. The included session log also exposes an 'authorization' header (Bearer idonthaveakey in the sample)—an unexpected token-like artifact that could confuse or be misused.

✓ Persistence & Privilege

The skill is not marked always:true and does not request persistent privileges. It appears user-invocable only, which is appropriate for this type of helper.

How to use

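1. Make sure OpenClaw is installed (locally or via Docker).
2. Enter the install command in the chat box: /install vlm-grounding
3. Once installation finishes, invoke the skill by name or trigger it with /vlm-grounding.
4. Provide the required inputs according to the skill's parameter documentation to get structured output.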

Version history

v1.0.0

- Initial release of multimodal grounding skill using GLM-4.7V.
- Supports detecting and locating objects, text, and UI elements in images with bounding box outputs.
- Provides end-to-end workflow: model API call, bounding box parsing, and visualization on images.
- Includes robust parsing for various bracket styles and auto-renormalization of coordinates.
- Triggers automatically on grounding-related user requests in both English and Chinese.
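The listing does not show how the skill parses boxes or what "auto-renormalization" means in practice, so the following is only one plausible reading, not the skill's actual code. It assumes each box is four numbers (x1, y1, x2, y2) wrapped in [], (), {} or <>, and that coordinates are either fractions in 0..1 or values on a 0..1000 grid that must be mapped to pixels. The names parse_boxes and to_pixels and the 1000 scale are assumptions made for illustration.

```python
import re

# Four numbers separated by a comma or whitespace, inside any bracket style.
NUM = r"(-?\d+(?:\.\d+)?)"
BOX = re.compile(
    r"[\[\(\{<]\s*" + r"\s*[,\s]\s*".join([NUM] * 4) + r"\s*[\]\)\}>]"
)


def parse_boxes(text):
    """Extract every 4-number box from model output as a tuple of floats."""
    return [tuple(float(v) for v in m.groups()) for m in BOX.finditer(text)]


def to_pixels(box, width, height, scale=1000.0):
    """Map a box to integer pixel coordinates (assumed 0..scale grid)."""
    x1, y1, x2, y2 = box
    if max(box) <= 1.0:
        scale = 1.0  # heuristic: values within 0..1 are treated as fractions
    # Order the corners so the result is always (left, top, right, bottom).
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))

    def clamp(value, size):
        # Normalize to 0..1, clip out-of-range values, then scale to pixels.
        return round(min(max(value / scale, 0.0), 1.0) * size)

    return (clamp(left, width), clamp(top, height),
            clamp(right, width), clamp(bottom, height))
```

For example, to_pixels((100, 200, 300, 400), 2000, 1000) yields (200, 200, 600, 400). The fraction heuristic is ambiguous for tiny boxes near the origin on a 0..1000 grid, so a real implementation would need to know the model's coordinate convention.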

Metadata

Slug: vlm-grounding

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Versions: 1

FAQ

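What is vlm-grounding?

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif... It is an AI agent skill plugin for Claude Code / OpenClaw, with 213 total downloads to date.

How do I install vlm-grounding?

Run the command "/install vlm-grounding" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is vlm-grounding free?

Yes. vlm-grounding is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does vlm-grounding support?

vlm-grounding is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops vlm-grounding?

It is developed and maintained by Ji Qi (@qijimrc); the current version is v1.0.0.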