visual-understanding
by IsabellaZhangYM · GitHub · v0.0.5
Downloads: 428
Stars: 0
Active Installs: 1
Versions: 1
Install in OpenClaw
/install visual-understanding
Description
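Integration plugin for the Zhipu GLM-4.6V multimodal vision model. Supports local image parsing (Base64) and reading images from public URLs. Access through the zai SDK is preferred, with a native cURL fallback included.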
Usage Guidance
Before installing or using this skill:
(1) Confirm the publisher and source. The registry metadata omits environment variables and dependencies that SKILL.md requires, so verify the project's repo and release artifacts at the listed homepage.
(2) Inspect the 'zai' Python package (PyPI/GitHub) and review its code or provenance before running pip install; unvetted packages are a supply-chain risk.
(3) Understand the privacy impact: local images are base64-encoded and uploaded to open.bigmodel.cn. Do not send PII or confidential images unless you trust the service and your organization's policy allows it.
(4) Use a limited or ephemeral ZHIPUAI_API_KEY if possible, and test in an isolated environment or container.
(5) If the metadata remains inconsistent, ask the skill publisher to correct the registry fields (declare ZHIPUAI_API_KEY and the dependency) before trusting automated use.
Capability Analysis
Type: OpenClaw Skill
Name: visual-understanding
Version: 0.0.5
The skill involves local file system access (reading image files via `os.path.exists` and `open()` in the SKILL.md code examples) and network requests (via the `zai` SDK and `curl` to `open.bigmodel.cn`). These capabilities are necessary for the skill's stated purpose of image processing, but they are risky operations. If the OpenClaw agent executes these code examples with untrusted or unsanitized file paths or URLs, the result could be arbitrary file reads (path traversal) or Server-Side Request Forgery (SSRF). The skill is therefore classified as suspicious: it shows no clear malicious intent, but its capabilities pose a significant risk unless the consuming agent handles input securely.
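One way a consuming agent might reduce the path traversal and SSRF risks described above is to validate inputs before any file is opened or any URL is forwarded. The following is a minimal sketch, not part of the skill itself: the function names, the allowed-suffix list, and the idea of a single allowed root directory are all assumptions made for illustration.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlparse

# Assumption: only these image types are accepted; adjust as needed.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def resolve_image_path(user_path: str, allowed_root: str) -> Path:
    """Resolve user_path inside allowed_root, rejecting traversal attempts."""
    root = Path(allowed_root).resolve()
    candidate = (root / user_path).resolve()
    # After resolving symlinks and "..", the path must still sit under root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes allowed root: {user_path}")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected file type: {candidate.suffix}")
    return candidate


def is_public_http_url(url: str) -> bool:
    """Reject non-HTTP schemes and literal private or loopback addresses."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so a name that
        # points at an internal address would still pass.
        return True
    return ip.is_global
```

The URL check only covers literal IP addresses and `localhost`; a stricter guard would also resolve hostnames and re-check the resulting addresses.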
Capability Assessment
Purpose & Capability
The SKILL.md content matches the declared purpose (integrating Zhipu/GLM-4.6V for image understanding). However the registry metadata provided with the skill claims no required env vars and no install steps, while the SKILL.md explicitly requires ZHIPUAI_API_KEY and the 'zai' Python package. This metadata vs. instruction mismatch is an incoherence that should be resolved by the publisher.
Instruction Scope
Runtime instructions are scoped to the stated feature set: they read local image files, base64-encode them, and send them (or public image URLs) to bigmodel.cn via the zai SDK or cURL. That behavior is expected for an image-understanding connector, but it means user local images (potentially containing sensitive data) will be transmitted to an external service. The doc does not instruct reading unrelated system files or secrets beyond ZHIPUAI_API_KEY.
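To make the data flow concrete, here is a minimal sketch of the local encoding step, using only the Python standard library. The function name `encode_image` is hypothetical, and the sketch stops before any network call: how the encoded string is packaged for the zai SDK or the HTTP API is not shown in this listing and is not assumed here.

```python
import base64
from pathlib import Path


def encode_image(path: str) -> str:
    """Return the file's bytes as a Base64 string (ASCII text)."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


def encoded_size(raw_size: int) -> int:
    """Length of the Base64 text for raw_size input bytes (with padding)."""
    return 4 * ((raw_size + 2) // 3)
```

Base64 is an encoding, not encryption: the full image content leaves the machine, inflated by roughly one third in size.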
Install Mechanism
There is no install spec in the registry metadata, yet SKILL.md recommends 'pip install zai' and lists a python dependency. This inconsistency is concerning: the skill will not be installed automatically but expects you to pip-install an external package named 'zai' (unknown provenance here). Installing third-party packages adds supply-chain risk; you should verify the 'zai' package source (PyPI repo, GitHub) and review its code before installing.
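As a partial aid to that review, the sketch below prints what an installed distribution declares about itself. It assumes the package has already been installed in a throwaway virtual environment or container, and the helper name `describe_distribution` is hypothetical. Declared metadata is self-reported by the package author, so it complements a code review rather than replacing one.

```python
from importlib import metadata


def describe_distribution(name: str):
    """Return selected metadata fields for an installed package, or None."""
    try:
        md = metadata.metadata(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": md.get("Name"),
        "version": md.get("Version"),
        "home_page": md.get("Home-page"),
        # Project-URL may appear several times; collect every occurrence.
        "project_urls": md.get_all("Project-URL") or [],
        "author": md.get("Author") or md.get("Author-email"),
    }


if __name__ == "__main__":
    info = describe_distribution("zai")
    print(info if info is not None else "zai is not installed here")
```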
Credentials
The only credential the SKILL.md requires is ZHIPUAI_API_KEY, which is appropriate for a connector to Zhipu's Open API. The registry metadata, however, lists no required env vars — again a mismatch. The skill's use of an API key is proportionate, but granting that key will allow the skill to send images and data to the external bigmodel.cn service, so treat the key as sensitive and consider using a scoped/ephemeral key.
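A minimal sketch of handling the key as sensitive: read it from the environment, fail early when it is missing, and never log it in full. Only the variable name `ZHIPUAI_API_KEY` comes from the skill; the helper names `load_api_key` and `redact` are hypothetical.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Read ZHIPUAI_API_KEY from the environment, failing fast if unset."""
    key = env.get("ZHIPUAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ZHIPUAI_API_KEY is not set")
    return key


def redact(key: str) -> str:
    """Return a log-safe form that reveals at most the first four characters."""
    if len(key) <= 8:
        return "***"
    return key[:4] + "***"
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real environment.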
Persistence & Privilege
The skill does not request persistent or elevated platform privileges (always:false, no config paths, no code files). It is user-invocable and allows autonomous model invocation by default, which is normal. There is no evidence it modifies other skills or system-wide settings.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install visual-understanding
- After installation, invoke the skill by name or use /visual-understanding
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.0.5
- Added integration guide for connecting to Zhipu GLM-4.6V multimodal vision model.
- Provides two access methods: Python SDK (recommended, supports local image Base64 upload) and cURL (for public image URLs only, zero dependency).
- Documents secure API key usage via environment variable.
- Includes code samples, best practices, and security/data privacy reminders.
- Explains application scenarios, advantages, and limitations for each method.
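The split between the two access methods can be sketched as a small selection helper. It follows the description above (the SDK is preferred, and cURL handles public image URLs only); the function name and the method labels are hypothetical and do not come from the skill.

```python
from urllib.parse import urlparse


def viable_methods(image_ref: str) -> list[str]:
    """List access methods for an image reference, preferred method first."""
    scheme = urlparse(image_ref).scheme.lower()
    if scheme in ("http", "https"):
        # Public URL: SDK first, cURL available as the fallback.
        return ["python-sdk", "curl"]
    # Anything else is treated as a local file, which needs Base64 upload.
    return ["python-sdk"]
```

For example, `viable_methods("https://example.com/cat.png")` returns both methods, while a local path such as `./cat.png` returns only the SDK route.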
Frequently Asked Questions
What is visual-understanding?
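It is an integration plugin for the Zhipu GLM-4.6V multimodal vision model. It supports local image parsing (Base64) and reading images from public URLs, preferring the zai SDK and including a native cURL fallback. It is an AI Agent Skill for Claude Code / OpenClaw, with 428 downloads so far.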
How do I install visual-understanding?
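Run "/install visual-understanding" in the OpenClaw or Claude Code chat to install it. Per SKILL.md, using the skill additionally requires the ZHIPUAI_API_KEY environment variable and the 'zai' Python package, which the registry does not set up automatically.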
Is visual-understanding free?
Yes, visual-understanding is completely free (open-source). You can download, install and use it at no cost.
Which platforms does visual-understanding support?
visual-understanding is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created visual-understanding?
It is built and maintained by IsabellaZhangYM (@isabellazhangym); the current version is v0.0.5.