visual-grounding

Name: visual-grounding
Author: qijimrc

by Ji Qi · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

213 Downloads

Install in OpenClaw

/install visual-grounding

Description

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

Usage Guidance

Do not install blindly. Steps to take before proceeding:
- Verify the skill author/source. This package contains an oversized session log (ssssss.json) that is unnecessary for a grounding helper; inspect or remove it.
- Open SKILL.md and search for any base64 or invisible/unicode-control characters. If present, ask the author to explain them or provide a clean copy.
- Confirm the helper modules referenced (interface_http, utils_boxes) actually exist on the agent environment; the skill provides no implementation files.
- Be cautious setting NO_PROXY or pointing to internal IPs; avoid exposing network services or credentials. If you must test, run in an isolated/sandbox agent and do not provide sensitive creds.
- If you plan to use an internal model endpoint, verify that models.json and the endpoint addresses come from a trusted admin and that no secrets are embedded in skill files.
- If anything remains unclear (why the session log is included, what the obfuscated content is), contact the skill maintainer and request a minimal, clean SKILL.md and the missing helper modules before use.
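The SKILL.md check described above can be approximated with a short script. This is a minimal sketch, not the marketplace scanner's actual logic: the function name `find_suspicious` and the 40-character threshold for a "base64-like" run are assumptions chosen for illustration, and long identifiers or hashes can produce false positives.

```python
import re
import unicodedata

# Runs of 40+ base64-alphabet characters. The threshold is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_suspicious(text):
    """Return (base64_like_runs, control_char_hits) found in text."""
    runs = BASE64_RUN.findall(text)
    hits = []
    for offset, ch in enumerate(text):
        # Cc = control, Cf = format (zero-width and bidi characters live here).
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\n\r\t":
            hits.append((offset, f"U+{ord(ch):04X}"))
    return runs, hits
```

To use it, read the file with `open("SKILL.md", encoding="utf-8").read()` and pass the text in. Anything reported is only a prompt for manual review, not proof of malicious content.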

Capability Analysis

Type: OpenClaw Skill Name: visual-grounding Version: 1.0.0 The skill bundle provides a legitimate implementation for visual grounding (object detection and localization in images) using the GLM-4.7V model. The code and instructions in SKILL.md describe a standard workflow: calling an internal model API (using a private IP 172.20.112.202), parsing bounding box coordinates, and visualizing the results. No indicators of data exfiltration, malicious execution, or harmful prompt injection were found.
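The parsing step in that workflow can be illustrated as follows. The package does not include the referenced utils_boxes helper, so this is a hypothetical stand-in, not the skill's code. It assumes the model reply contains boxes written as `[x1, y1, x2, y2]` on a 0 to 1000 normalized grid; both the reply format and the scale are assumptions and should be checked against real model output.

```python
import re

# Assumed reply format: integer boxes written as [x1, y1, x2, y2].
BOX = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(reply):
    """Extract every [x1, y1, x2, y2] group from a model reply as int tuples."""
    return [tuple(int(v) for v in m.groups()) for m in BOX.finditer(reply)]


def to_pixels(box, width, height, scale=1000):
    """Map a box from the assumed 0..scale grid to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / scale),
        round(y1 * height / scale),
        round(x2 * width / scale),
        round(y2 * height / scale),
    )
```

For a 640x480 image, `to_pixels(parse_boxes(reply)[0], 640, 480)` gives the rectangle to draw on the image.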

Capability Assessment

ℹ Purpose & Capability

The SKILL.md describes grounding via an HTTP model API and visualization helpers — consistent with the skill name. However the doc references helper modules (interface_http, utils_boxes) that are not included in the package and references an internal config path (/root/.openclaw/agents/main/agent/models.json). The registry metadata declares no env vars or binaries required, but the instructions explicitly tell callers to set NO_PROXY and to contact an internal model host (e.g., 172.20.112.202). These are plausible for a local-model grounding skill but are not declared in metadata.
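One way to flag undeclared internal endpoints such as the one above is to test whether a host is a literal private address. This is a small sketch using Python's standard `ipaddress` module; the function name `is_internal` is hypothetical, and hostnames are deliberately not resolved.

```python
import ipaddress


def is_internal(host):
    """True if host is a literal IP in a private, loopback, or link-local range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP (for example a hostname); this sketch does not resolve DNS.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

The example address from SKILL.md, 172.20.112.202, falls inside 172.16.0.0/12, so `is_internal("172.20.112.202")` returns `True`.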

⚠ Instruction Scope

Instructions tell the agent to set NO_PROXY and call an internal HTTP model endpoint and to parse model responses for bounding boxes — behavior expected for grounding. However the SKILL.md also describes parsing/expanding truncated replies and contains obfuscation/prompt-injection signals (base64-block, unicode-control-chars). The document does not instruct arbitrary file reads, but it references internal config paths and helper modules not supplied, and the included guidance could be used to coax the agent to access internal resources. That ambiguity is concerning.

✓ Install Mechanism

No install spec and no code files (instruction-only) — lowest-risk distribution. Nothing in the package will be written to disk by an installer step.

⚠ Credentials

The skill declares no required credentials, which matches a local-model grounding use, but the SKILL.md instructs setting NO_PROXY to bypass proxies and contains examples with an internal IP. The package also contains a large session log (ssssss.json) that exposes a tool/system prompt and an Authorization header string (Bearer idonthaveakey). Including such an internal transcript in the skill bundle is unexpected and could leak sensitive run-time details or be used to manipulate behavior; this is disproportionate for a simple grounding skill.
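A quick way to look for header strings like the one found in the session log is a pattern search over each bundled file. This is a minimal sketch: the function name `find_bearer_tokens` and the token character set are assumptions, and a match only shows that a Bearer-style string is present, not that it is a live credential.

```python
import re

# Matches "Bearer <token>" where the token uses common URL-safe token characters.
BEARER = re.compile(r"Bearer\s+([A-Za-z0-9._~+/=-]+)")


def find_bearer_tokens(text):
    """Return every token that follows the word 'Bearer' in text."""
    return BEARER.findall(text)
```

Running it over the contents of ssssss.json would surface the placeholder value quoted above, along with any other Bearer strings that warrant a closer look.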

✓ Persistence & Privilege

always is false and there are no install hooks or instructions to modify other skills or global agent settings. The skill does not request persistent/autonomous privileges beyond normal invocation.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install visual-grounding
3. After installation, invoke the skill by name or use /visual-grounding
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.0.0

Initial release of visual-grounding skill using GLM-4.7V's multimodal capability:
- Supports detection and localization of objects, text, and regions in images, with bounding box output.
- Activates automatically on user prompts related to finding, locating, or grounding visual elements.
- Provides step-by-step workflow: model API call, response parsing for bounding boxes, and result visualization with labeled boxes.
- Includes utility functions for API interaction, coordinate parsing, normalization, and image annotation.
- Offers quick reference and usage examples for easy integration.

Metadata

Slug visual-grounding

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is visual-grounding?

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif... It is an AI Agent Skill for Claude Code / OpenClaw, with 213 downloads so far.

How do I install visual-grounding?

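Run "/install visual-grounding" in the OpenClaw or Claude Code chat to install it in one step. The install itself needs no extra setup, but SKILL.md instructs callers to set NO_PROXY and reach a model endpoint, and it references helper modules (interface_http, utils_boxes) that are not included in the package.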

Is visual-grounding free?

Yes, visual-grounding is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does visual-grounding support?

visual-grounding is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created visual-grounding?

It is built and maintained by Ji Qi (@qijimrc); the current version is v1.0.0.
