← Back to Skills Marketplace

glm-grounding

Name: glm-grounding
Author: qijimrc

by Ji Qi · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

269

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install glm-grounding

Description

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

README (SKILL.md)

Grounding - 多模态目标定位

利用 GLM-4.7V 的 grounding 能力，在图片中定位目标对象或文字，输出带标注框的结果图。

工作流程

用户输入（图片 + prompt）
        │
        ▼
  HttpInterface() → 调用模型 API → 得到 response 文本
        │
        ▼
  parse_bboxes_from_response() → 从回复中解析出坐标框列表
        │
        ▼
  visualize_boxes(renormalize=True) → 反归一化 + 画框 → 保存结果图

Step 1: 调用模型获取坐标

使用 HttpInterface 调用模型 API：

import os
os.environ['NO_PROXY'] = '\x3Cmodel-host>'  # 跳过代理
os.environ['no_proxy'] = '\x3Cmodel-host>'

from interface_http import HttpInterface

url = 'http://\x3Chost>:\x3Cport>/v1/chat/completions'
prompt = '''请在这张图中找到所有"{target}"，并以 [xmin, ymin, xmax, ymax] 格式输出每个目标的边界框坐标，坐标值为 0-1000 的归一化整数。每个目标一行，格式如下：
目标名称: [xmin, ymin, xmax, ymax]'''

response = HttpInterface(url, prompt, images=[image_path], no_think=True)
# 返回: "目标名称: [xmin, ymin, xmax, ymax]"

注意： 调用前需设置 NO_PROXY 环境变量跳过代理，否则内网请求会被代理拦截。

Step 2: 解析坐标框

from utils_boxes import parse_bboxes_from_response

boxes = parse_bboxes_from_response(response)
# 返回: [[x1, y1, x2, y2], ...]  (0-1000 归一化)

parse_bboxes_from_response 会自动：

从回复尾部向前检查截断，拓展 context window
遍历所有括号风格（[], {}, (), \x3C>, \x3Cbbox>）提取坐标
扁平化嵌套列表，返回一维 box 列表

Step 3: 画框可视化

from utils_boxes import visualize_boxes

visualize_boxes(
    img_path=image_path,
    boxes=boxes,                    # parse_bboxes_from_response 的输出
    labels=['label1', 'label2'],    # 每个框的标签
    renormalize=True,               # 自动将 0-1000 归一化转为像素坐标
    save_path='output.jpg',
    colors=['red', 'blue'],         # 可选
    thickness=[2, 3],               # 可选
)

renormalize=True 时，内部自动调用 reverse_normalize_box：pixel = coord * img_dimension / 1000

完整示例

import os
os.environ['NO_PROXY'] = '172.20.112.202'
os.environ['no_proxy'] = '172.20.112.202'

from interface_http import HttpInterface
from utils_boxes import parse_bboxes_from_response, visualize_boxes

url = 'http://172.20.112.202:5002/v1/chat/completions'
img = '/path/to/image.jpg'

# 1. 调用模型
response = HttpInterface(
    url,
    '请在这张图中找到"红色圣诞帽"，以 [xmin, ymin, xmax, ymax] 格式输出坐标（0-1000归一化）',
    images=[img],
    no_think=True,
)

# 2. 解析坐标
boxes = parse_bboxes_from_response(response)

# 3. 画框
visualize_boxes(img_path=img, boxes=boxes, labels=['圣诞帽'], renormalize=True, save_path='out.jpg')

工具函数速查

函数	作用
`HttpInterface(url, prompt, images, no_think)`	调用模型 API，返回文本回复
`parse_bboxes_from_response(text)`	从模型回复中提取所有坐标框列表
`find_boxes_all(text, flat=True)`	提取文本中所有括号风格的坐标框
`reverse_normalize_box(box, w, h)`	0-1000 归一化 → 像素坐标
`visualize_boxes(..., renormalize=True)`	画框 + 自动反归一化

注意事项

模型 API 地址配置在 /root/.openclaw/agents/main/agent/models.json
调用内网模型时必须设置 NO_PROXY 环境变量
no_think=True 可关闭模型思考模式，加快响应

Usage Guidance

This skill appears to do what it claims (call a grounding model, parse boxes, draw them), but it contains surprising instructions: it tells the runtime to set NO_PROXY/no_proxy (bypassing proxies) and references an agent config file (/root/.openclaw/agents/main/agent/models.json). Before enabling it, consider: do you trust the model host it will call? Do you want code that bypasses your organization's proxy/logging? Are the helper modules (interface_http, utils_boxes) present and from a trustworthy source? If you need to use this, prefer running it in a controlled environment or ask the author to remove proxy-bypass steps and to explicitly declare dependencies and any config file reads. If you have security policies about network egress or exposing agent internals, treat this skill with caution.

Capability Analysis

Type: OpenClaw Skill Name: glm-grounding Version: 1.0.0 The skill bundle provides legitimate functionality for object grounding and image annotation using the GLM-4.7V model. The instructions in SKILL.md guide the agent through calling an internal API (using a private IP 172.20.112.202), parsing coordinates, and visualizing results, all of which are consistent with the stated purpose. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.

Capability Assessment

ℹ Purpose & Capability

The skill's name and description (multimodal grounding) align with the instructions to call a grounding model, parse bounding boxes, and visualize them. However, the skill assumes the presence of helper modules (interface_http, utils_boxes) and a local model HTTP endpoint; these dependencies are not declared in metadata and no code is bundled, which is an implementation gap (missing declared dependencies).

⚠ Instruction Scope

SKILL.md instructs setting NO_PROXY/no_proxy to a model host and points to an agent config path (/root/.openclaw/agents/main/agent/models.json). Asking the agent to change environment networking behavior (bypass proxy) and to rely on an agent-specific config file expands scope beyond simply 'call model and draw boxes' and could be used to bypass network controls or access agent internals.

✓ Install Mechanism

There is no install spec and no code files — lowest-risk delivery. The skill is instruction-only, so nothing is written to disk by installation.

⚠ Credentials

The metadata declares no required env vars, yet the instructions explicitly set NO_PROXY/no_proxy at runtime. Modifying proxy-related environment variables is unexpected for a grounding helper and may affect network routing/monitoring. Also the instruction expects access to a local model HTTP endpoint (host:port) without declaring or validating credentials or access scope.

ℹ Persistence & Privilege

always is false and the skill is user-invocable (standard). However, the SKILL.md refers to a specific agent config file path which implies knowledge of or reliance on agent internals; the skill does not request persistent presence but it does assume read access to agent configuration, which is noteworthy.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install glm-grounding
After installation, invoke the skill by name or use /glm-grounding
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of the GLM multimodal grounding skill. - Supports detection and localization of objects or text in images using GLM-4.7V’s capabilities. - Skill automatically activates on relevant keywords or phrases in both English and Chinese. - Provides simple API workflow: call model, parse bounding boxes from response, and visualize results with annotation. - Includes utility functions for model interaction, response parsing, bounding box normalization, and visualization.

Metadata

Slug glm-grounding

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is glm-grounding?

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif... It is an AI Agent Skill for Claude Code / OpenClaw, with 269 downloads so far.

How do I install glm-grounding?

Run "/install glm-grounding" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is glm-grounding free?

Yes, glm-grounding is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does glm-grounding support?

glm-grounding is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created glm-grounding?

It is built and maintained by Ji Qi (@qijimrc); the current version is v1.0.0.

More Skills