← 返回 Skills 市场

grounding-anything

Name: grounding-anything
Author: qijimrc

作者 Ji Qi · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

243

总下载

当前安装

版本数

在 OpenClaw 中安装

/install grounding-anything

功能描述

Use GLM-4.7V's multimodal grounding capability to detect and locate objects/text in images. Activate when user asks to find, locate, detect, or ground specif...

使用说明 (SKILL.md)

Grounding - 多模态目标定位

利用 GLM-4.7V 的 grounding 能力，在图片中定位目标对象或文字，输出带标注框的结果图。

工作流程

用户输入（图片 + prompt）
        │
        ▼
  HttpInterface() → 调用模型 API → 得到 response 文本
        │
        ▼
  parse_bboxes_from_response() → 从回复中解析出坐标框列表
        │
        ▼
  visualize_boxes(renormalize=True) → 反归一化 + 画框 → 保存结果图

Step 1: 调用模型获取坐标

使用 HttpInterface 调用模型 API：

import os
os.environ['NO_PROXY'] = '\x3Cmodel-host>'  # 跳过代理
os.environ['no_proxy'] = '\x3Cmodel-host>'

from interface_http import HttpInterface

url = 'http://\x3Chost>:\x3Cport>/v1/chat/completions'
prompt = '''请在这张图中找到所有"{target}"，并以 [xmin, ymin, xmax, ymax] 格式输出每个目标的边界框坐标，坐标值为 0-1000 的归一化整数。每个目标一行，格式如下：
目标名称: [xmin, ymin, xmax, ymax]'''

response = HttpInterface(url, prompt, images=[image_path], no_think=True)
# 返回: "目标名称: [xmin, ymin, xmax, ymax]"

注意： 调用前需设置 NO_PROXY 环境变量跳过代理，否则内网请求会被代理拦截。

Step 2: 解析坐标框

from utils_boxes import parse_bboxes_from_response

boxes = parse_bboxes_from_response(response)
# 返回: [[x1, y1, x2, y2], ...]  (0-1000 归一化)

parse_bboxes_from_response 会自动：

从回复尾部向前检查截断，拓展 context window
遍历所有括号风格（[], {}, (), \x3C>, \x3Cbbox>）提取坐标
扁平化嵌套列表，返回一维 box 列表

Step 3: 画框可视化

from utils_boxes import visualize_boxes

visualize_boxes(
    img_path=image_path,
    boxes=boxes,                    # parse_bboxes_from_response 的输出
    labels=['label1', 'label2'],    # 每个框的标签
    renormalize=True,               # 自动将 0-1000 归一化转为像素坐标
    save_path='output.jpg',
    colors=['red', 'blue'],         # 可选
    thickness=[2, 3],               # 可选
)

renormalize=True 时，内部自动调用 reverse_normalize_box：pixel = coord * img_dimension / 1000

完整示例

import os
os.environ['NO_PROXY'] = '172.20.112.202'
os.environ['no_proxy'] = '172.20.112.202'

from interface_http import HttpInterface
from utils_boxes import parse_bboxes_from_response, visualize_boxes

url = 'http://172.20.112.202:5002/v1/chat/completions'
img = '/path/to/image.jpg'

# 1. 调用模型
response = HttpInterface(
    url,
    '请在这张图中找到"红色圣诞帽"，以 [xmin, ymin, xmax, ymax] 格式输出坐标（0-1000归一化）',
    images=[img],
    no_think=True,
)

# 2. 解析坐标
boxes = parse_bboxes_from_response(response)

# 3. 画框
visualize_boxes(img_path=img, boxes=boxes, labels=['圣诞帽'], renormalize=True, save_path='out.jpg')

工具函数速查

函数	作用
`HttpInterface(url, prompt, images, no_think)`	调用模型 API，返回文本回复
`parse_bboxes_from_response(text)`	从模型回复中提取所有坐标框列表
`find_boxes_all(text, flat=True)`	提取文本中所有括号风格的坐标框
`reverse_normalize_box(box, w, h)`	0-1000 归一化 → 像素坐标
`visualize_boxes(..., renormalize=True)`	画框 + 自动反归一化

注意事项

模型 API 地址配置在 /root/.openclaw/agents/main/agent/models.json
调用内网模型时必须设置 NO_PROXY 环境变量
no_think=True 可关闭模型思考模式，加快响应

安全使用建议

This skill is instruction-only but incomplete: it expects helper modules (interface_http, utils_boxes) that are not bundled or listed as dependencies, and it tells you to bypass proxies and points at an agent config path under /root. Before using: (1) obtain and review the referenced helper code from a trusted source or ask the publisher to include them; (2) confirm the model endpoint URL and any required auth—do not call unknown internal IPs unless you trust them; (3) avoid setting NO_PROXY/no_proxy globally on shared systems—use it only in a confined environment or for a single process; (4) be cautious about accessing or modifying /root/.openclaw/agents/... unless you control the host; (5) if you need this capability but prefer safer integration, ask for a packaged implementation (with clear deps and an install step) or run the workflow in an isolated environment. I have medium confidence because the document plausibly describes a grounding workflow, but the missing code and system-level hints create significant ambiguity.

功能分析

Type: OpenClaw Skill Name: grounding-anything Version: 1.0.0 The skill bundle provides instructions and Python snippets for performing image grounding (object detection) using the GLM-4.7V model. It utilizes standard networking practices like setting 'NO_PROXY' for internal API calls (e.g., to 172.20.112.202) and includes utility functions for parsing coordinates and visualizing results. No malicious patterns such as data exfiltration, unauthorized execution, or harmful prompt injections were found in SKILL.md or _meta.json.

能力评估

⚠ Purpose & Capability

The SKILL.md describes calling a GLM-4.7V model API to return bounding boxes — that fits the stated purpose. However, the instructions assume the existence of helper modules (interface_http, utils_boxes) and a reachable internal model host, but the skill bundle includes no code or declared dependencies. It's unclear where those helpers come from or why no dependency or install steps are declared.

⚠ Instruction Scope

Runtime instructions tell the agent to set NO_PROXY/no_proxy and call an internal HTTP model endpoint (examples use 172.20.112.202). The doc also references an agent config file at /root/.openclaw/agents/main/agent/models.json. Those steps cross from mere API usage into system configuration and internal network access, and they reference a root-owned config path—this expands the scope beyond simple image processing.

ℹ Install Mechanism

There is no install specification (instruction-only), which is low risk in itself. However, because the instructions rely on helper modules that are not included or declared, the skill as provided is incomplete; installing or sourcing those helpers from an external location would introduce additional risk that is not specified.

ℹ Credentials

The skill declares no required env vars or credentials, but the instructions explicitly instruct users to set NO_PROXY/no_proxy to bypass proxies for the model host. Requiring proxy bypass is plausible for calling an on-prem model, but it's unusual that the skill doesn't document or require the model host URL or any credential/authorization mechanism; this mismatch is surprising and should be clarified.

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges and is user-invocable only. It does reference the agent models.json path, but it does not itself request to modify persistent agent configuration in the provided material.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install grounding-anything
安装完成后，直接呼叫该 Skill 的名称或使用 /grounding-anything 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

Grounding-anything 1.0.0 – initial release - Introduces multimodal grounding using GLM-4.7V to detect and locate objects or text in images. - Activates when prompted to find, locate, detect, or ground specific elements in an image, including Chinese language triggers. - Provides a step-by-step workflow: API call, bounding box parsing, and visual output with labeled boxes. - Includes utility functions for API interaction, response parsing, coordinate normalization, and visualization. - Emphasizes configuration steps and environment variable setup for internal network access.

元数据

Slug grounding-anything

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题