vlm-grounding

Name: vlm-grounding
Author: qijimrc

by Ji Qi · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 213

Install in OpenClaw

/install vlm-grounding

Description

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

Usage Guidance

Treat this skill as potentially unsafe until you verify a few things:

1) Who published it, and do you trust that owner?
2) Inspect the bundled ssssss.json log, and either remove the session/tool content and example authorization headers or understand why they are included.
3) Confirm whether the skill actually needs to read /root/.openclaw/agents/main/agent/models.json or call internal IPs; if so, restrict it to an isolated environment and ensure no sensitive networks or configs are exposed.
4) Watch for prompt-injection patterns in SKILL.md (base64/unicode control chars); ask the author to remove hidden or encoded content and to explicitly declare any needed config paths or credentials.

If you cannot validate these points, run the skill only in a sandboxed agent or decline to install.
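The check for hidden or encoded content in SKILL.md can be partly automated. The following is a minimal sketch, not the marketplace's own scanner: the function names and the 40-character threshold are assumptions chosen for illustration, and the base64 check is only a heuristic that will miss unpadded or split payloads.

```python
import base64
import re
import unicodedata

# Runs of 40+ base64-alphabet characters; the threshold is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_hidden_chars(text):
    """Return (offset, codepoint, name) for control/format characters.

    Tab, line feed, and carriage return are treated as ordinary whitespace.
    """
    hits = []
    for offset, ch in enumerate(text):
        if ch in "\t\n\r":
            continue
        # Cc = control characters, Cf = format characters
        # (zero-width spaces, bidirectional overrides, and similar).
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((offset, f"U+{ord(ch):04X}", name))
    return hits


def find_base64_runs(text):
    """Return base64-looking runs that decode cleanly (heuristic only)."""
    runs = []
    for match in BASE64_RUN.finditer(text):
        candidate = match.group(0)
        try:
            base64.b64decode(candidate, validate=True)
        except ValueError:
            # Not valid base64 (wrong padding or length); skip it.
            continue
        runs.append(candidate)
    return runs
```

Running both functions over the text of SKILL.md and reviewing every hit by hand is a reasonable first pass; an empty result does not prove the file is clean.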

Capability Analysis

Type: OpenClaw Skill
Name: vlm-grounding
Version: 1.0.0

The skill bundle provides instructions and code snippets for performing image grounding (object detection) using a GLM-4.7V model. It describes a standard workflow of calling an internal API (172.20.112.202), parsing bounding box coordinates from the response, and visualizing them on an image. No indicators of data exfiltration, malicious execution, or prompt injection were found in SKILL.md or _meta.json.

Capability Assessment

ℹ Purpose & Capability

SKILL.md describes a reasonable grounding workflow (call model, parse boxes, draw visualizations). However the doc references a system config path (/root/.openclaw/agents/main/agent/models.json) and internal hosts (e.g., 172.20.112.202) without declaring that it needs access to those configs or network endpoints — this is an unexplained dependency on internal configuration.
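One way to surface undeclared internal endpoints like the one above is to scan the instruction text for IPv4 literals in private ranges. This is a sketch using Python's standard `ipaddress` module; `find_private_hosts` is a hypothetical helper name, and it only catches literal addresses, not hostnames that resolve to internal hosts.

```python
import ipaddress
import re

# Dotted-quad candidates; validity is checked by ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_private_hosts(text):
    """Return unique IPv4 literals in text that are private or loopback."""
    found = []
    for token in IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            # Looks like an address but is not one (for example 999.1.1.1).
            continue
        if addr.is_private and token not in found:
            found.append(token)
    return found
```

Applied to text mentioning 172.20.112.202, the helper reports that address, since it falls inside the 172.16.0.0/12 private block.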

⚠ Instruction Scope

Instructions tell the agent to contact an HTTP model API and to set NO_PROXY to bypass proxying (which affects network routing). They also include guidance that could cause the agent to read or use system-local config to locate model endpoints. The SKILL.md itself contains prompt-like material and the package contains a large session log (ssssss.json) with system/tool lists; combined with detected base64/unicode-control patterns, this raises concern about embedded prompt-injection or unintended privileged instructions.

✓ Install Mechanism

There is no install spec and no code files to be installed; this reduces disk-write risk. The skill is instruction-only, which is lower risk than an install that fetches and executes arbitrary archives.

⚠ Credentials

The manifest declares no env vars or credentials, but the instructions tell users to set NO_PROXY and point to internal hosts and a root-owned models.json path. That implies the skill expects access to internal network and possibly system config; those capabilities are not declared. The included session log also exposes an 'authorization' header (Bearer idonthaveakey in the sample)—an unexpected token-like artifact that could confuse or be misused.
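To locate token-like artifacts such as the authorization header mentioned above, a bundled JSON log can be walked recursively. The sketch below assumes nothing about the real layout of ssssss.json beyond it being valid JSON; the key list and the `find_sensitive_fields` name are illustrative assumptions.

```python
import json

# Lowercase key names treated as credential-like; extend as needed (assumption).
SENSITIVE_KEYS = {"authorization", "api_key", "apikey", "x-api-key", "token"}


def find_sensitive_fields(node, path="$"):
    """Yield (json_path, value) for every credential-like key in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in SENSITIVE_KEYS:
                yield child, value
            # Keep descending: nested objects may hold further matches.
            yield from find_sensitive_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_sensitive_fields(item, f"{path}[{index}]")
```

A caller would load the file with `json.load` and iterate over the results, reviewing each reported path before deciding whether the value is a harmless placeholder or a real secret.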

✓ Persistence & Privilege

The skill is not marked always:true and does not request persistent privileges. It appears user-invocable only, which is appropriate for this type of helper.

How to Use

1) Make sure OpenClaw is installed (local or Docker).
2) Run the install command in chat: /install vlm-grounding
3) After installation, invoke the skill by name or use /vlm-grounding
4) Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

- Initial release of multimodal grounding skill using GLM-4.7V.
- Supports detecting and locating objects, text, and UI elements in images with bounding box outputs.
- Provides end-to-end workflow: model API call, bounding box parsing, and visualization on images.
- Includes robust parsing for various bracket styles and auto-renormalization of coordinates.
- Triggers automatically on grounding-related user requests in both English and Chinese.
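The release notes mention bracket-tolerant parsing and coordinate renormalization but the listing does not show the format. As a rough sketch of what such a step could look like, the code below assumes each box is four numbers (x1, y1, x2, y2) wrapped in (), [], {}, or <>, expressed on a 0 to 1000 grid; `parse_boxes`, `to_pixels`, and the `scale` parameter are hypothetical and not taken from the skill.

```python
import re

# One number, optionally negative or decimal.
NUM = r"(-?\d+(?:\.\d+)?)"
SEP = r"[\s,]+"
# Four numbers inside any one of the supported bracket styles.
BOX = re.compile(
    r"[\[\(\{<]\s*" + NUM + SEP + NUM + SEP + NUM + SEP + NUM + r"\s*[\]\)\}>]"
)


def parse_boxes(text):
    """Extract every (x1, y1, x2, y2) tuple found in the model's text output."""
    return [tuple(float(v) for v in m.groups()) for m in BOX.finditer(text)]


def to_pixels(box, width, height, scale=1000.0):
    """Map a box from the assumed 0..scale grid to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    # Put corners in top-left / bottom-right order in case they arrive swapped.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))

    def clamp(value, limit):
        # Keep the coordinate inside the image bounds.
        return max(0, min(limit, round(value)))

    return (
        clamp(x1 * width / scale, width),
        clamp(y1 * height / scale, height),
        clamp(x2 * width / scale, width),
        clamp(y2 * height / scale, height),
    )
```

For a 640x480 image, a parsed box of (100, 200, 300, 400) maps to pixel corners (64, 96) and (192, 192) under these assumptions. If the model actually emits coordinates on a different grid, only `scale` needs to change.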

Metadata

Slug: vlm-grounding
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1

Frequently Asked Questions

What is vlm-grounding?

vlm-grounding is an AI Agent Skill for Claude Code / OpenClaw that uses GLM-4.7V's multimodal grounding capability to detect and locate objects and text in images. It has 213 downloads so far.

How do I install vlm-grounding?

Run "/install vlm-grounding" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is vlm-grounding free?

Yes, vlm-grounding is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does vlm-grounding support?

vlm-grounding is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created vlm-grounding?

It is built and maintained by Ji Qi (@qijimrc); the current version is v1.0.0.
