
visual-grounding

Name: visual-grounding
Author: qijimrc

By Ji Qi · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 213

Current installs: 0

Versions: 1

Install in OpenClaw

/install visual-grounding

Description

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

Security recommendations

Do not install blindly. Steps to take before proceeding:
- Verify the skill author/source. This package contains an oversized session log (ssssss.json) that is unnecessary for a grounding helper; inspect or remove it.
- Open SKILL.md and search for any base64 or invisible/unicode-control characters. If present, ask the author to explain them or provide a clean copy.
- Confirm the helper modules referenced (interface_http, utils_boxes) actually exist on the agent environment; the skill provides no implementation files.
- Be cautious setting NO_PROXY or pointing to internal IPs; avoid exposing network services or credentials. If you must test, run in an isolated/sandbox agent and do not provide sensitive creds.
- If you plan to use an internal model endpoint, verify models.json and endpoint addresses come from a trusted admin and that no secrets are embedded in skill files.
- If anything remains unclear (why the session log is included, what the obfuscated content is), contact the skill maintainer and request a minimal, clean SKILL.md and the missing helper modules before use.
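The check for base64 blocks and invisible characters in SKILL.md can be partly automated. The following is a minimal sketch, not a vetted scanner: the function name `find_suspicious` and the 40-character threshold for a base64-like run are assumptions chosen for illustration, and any hit still needs human review.

```python
import re
import unicodedata

# Heuristic: a run of 40+ base64-alphabet characters, optionally padded.
# The threshold of 40 is an arbitrary assumption, not a standard.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_suspicious(text):
    """Return (kind, position, detail) tuples for a human to review."""
    findings = []
    for i, ch in enumerate(text):
        # Cf = format characters (zero-width, bidi overrides);
        # Cc = control characters. Ordinary newlines and tabs are allowed.
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Cc" and ch not in "\n\r\t"):
            findings.append(("invisible-char", i, f"U+{ord(ch):04X}"))
    for match in BASE64_RUN.finditer(text):
        # Report only the first 20 characters of the run as a preview.
        findings.append(("base64-like", match.start(), match.group()[:20]))
    return findings
```

To use it, read the file as UTF-8 text and pass the contents to `find_suspicious`. Long hashes or URLs can also match the base64 heuristic, so a hit is a prompt to look closer rather than proof of obfuscation.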

Functional analysis

Type: OpenClaw Skill
Name: visual-grounding
Version: 1.0.0

The skill bundle provides a legitimate implementation for visual grounding (object detection and localization in images) using the GLM-4.7V model. The code and instructions in SKILL.md describe a standard workflow: calling an internal model API (using a private IP 172.20.112.202), parsing bounding box coordinates, and visualizing the results. No indicators of data exfiltration, malicious execution, or harmful prompt injection were found.

Capability assessment

ℹ Purpose & Capability

The SKILL.md describes grounding via an HTTP model API and visualization helpers — consistent with the skill name. However the doc references helper modules (interface_http, utils_boxes) that are not included in the package and references an internal config path (/root/.openclaw/agents/main/agent/models.json). The registry metadata declares no env vars or binaries required, but the instructions explicitly tell callers to set NO_PROXY and to contact an internal model host (e.g., 172.20.112.202). These are plausible for a local-model grounding skill but are not declared in metadata.
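Whether the referenced helper modules are present can be checked before the skill is invoked. This is a small sketch using the standard library's `importlib.util.find_spec`; the function name `missing_modules` is hypothetical, and the check only shows that a module with that name is importable, not that it is the implementation the skill author intended.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["interface_http", "utils_boxes"])` in the agent's Python environment returns the names that are absent; an empty list means both could be found on the import path.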

⚠ Instruction Scope

Instructions tell the agent to set NO_PROXY and call an internal HTTP model endpoint and to parse model responses for bounding boxes — behavior expected for grounding. However the SKILL.md also describes parsing/expanding truncated replies and contains obfuscation/prompt-injection signals (base64-block, unicode-control-chars). The document does not instruct arbitrary file reads, but it references internal config paths and helper modules not supplied, and the included guidance could be used to coax the agent to access internal resources. That ambiguity is concerning.

✓ Install Mechanism

No install spec and no code files (instruction-only) — lowest-risk distribution. Nothing in the package will be written to disk by an installer step.

⚠ Credentials

The skill declares no required credentials, which matches a local-model grounding use, but the SKILL.md instructs setting NO_PROXY to bypass proxies and contains examples with an internal IP. The package also contains a large session log (ssssss.json) that exposes a tool/system prompt and an Authorization header string (Bearer idonthaveakey). Including such an internal transcript in the skill bundle is unexpected and could leak sensitive run-time details or be used to manipulate behavior; this is disproportionate for a simple grounding skill.
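A bundled log such as ssssss.json can be scanned for Authorization-style strings before installation. The sketch below is illustrative only: the function name `find_bearer_tokens` and the token character set in the regular expression are assumptions, and it looks only for the "Bearer" pattern mentioned above, not for other kinds of secrets.

```python
import re

# Matches "Bearer <token>". The token character set is an assumption
# intended to cover common opaque and JWT-style tokens.
BEARER = re.compile(r"Bearer\s+([A-Za-z0-9._~+/=-]+)")


def find_bearer_tokens(text):
    """Return each distinct bearer token found in the text, in first-seen order."""
    seen = []
    for token in BEARER.findall(text):
        if token not in seen:
            seen.append(token)
    return seen
```

Run against the raw text of the log, this would surface a string such as the `Bearer idonthaveakey` header noted in the assessment. A placeholder value like that one is harmless by itself, but its presence shows that request headers were captured in the transcript.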

✓ Persistence & Privilege

always is false and there are no install hooks or instructions to modify other skills or global agent settings. The skill does not request persistent/autonomous privileges beyond normal invocation.

How to use

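Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install visual-grounding
Once installation completes, invoke the skill by name or trigger it with /visual-grounding
Provide the required inputs according to the skill's parameter documentation to get structured output.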

Version history

v1.0.0

Initial release of visual-grounding skill using GLM-4.7V's multimodal capability:
- Supports detection and localization of objects, text, and regions in images, with bounding box output.
- Activates automatically on user prompts related to finding, locating, or grounding visual elements.
- Provides step-by-step workflow: model API call, response parsing for bounding boxes, and result visualization with labeled boxes.
- Includes utility functions for API interaction, coordinate parsing, normalization, and image annotation.
- Offers quick reference and usage examples for easy integration.
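The coordinate parsing and normalization step might look roughly like the sketch below. The listing does not document the model's reply format, so everything here is an assumption: that boxes appear as four bracketed integers such as `[100, 200, 500, 800]`, that coordinates are normalized to a 0..1000 range, and the function name `parse_boxes` itself. It is not the skill's actual utils_boxes code, which is not included in the package.

```python
import re

# Assumed reply format: four integers in square brackets,
# e.g. "[120, 80, 640, 900]" meaning x1, y1, x2, y2.
BOX = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(reply, width, height, scale=1000):
    """Extract boxes from a reply and convert them to pixel coordinates.

    Assumes coordinates are normalized to the range 0..scale.
    """
    boxes = []
    for match in BOX.finditer(reply):
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        # Reorder corners so (x1, y1) is always the top-left corner.
        x1, x2 = sorted((x1, x2))
        y1, y2 = sorted((y1, y2))
        boxes.append((
            round(x1 * width / scale),
            round(y1 * height / scale),
            round(x2 * width / scale),
            round(y2 * height / scale),
        ))
    return boxes
```

Under these assumptions, a reply containing `[[100, 200, 500, 800]]` for a 2000×1000 image yields the pixel box `(200, 200, 1000, 800)`. If the real model emits absolute pixel coordinates or a different bracket format, the pattern and the `scale` argument would need to change.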

Metadata

Slug visual-grounding

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ

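What is visual-grounding?

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 213 times so far.

How do I install visual-grounding?

Run the command "/install visual-grounding" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is visual-grounding free?

Yes. visual-grounding is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does visual-grounding support?

visual-grounding is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed visual-grounding?

It is developed and maintained by Ji Qi (@qijimrc); the current version is v1.0.0.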