← 返回 Skills 市场

ifly-image-understanding

Name: ifly-image-understanding
Author: qingzhe2020

作者 Iflytek AIcloud · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

214

总下载

当前安装

版本数

在 OpenClaw 中安装

/install ifly-image-understanding

功能描述

iFlytek Image Understanding (图片理解) — analyze and answer questions about images using Spark Vision model. WebSocket API, pure Python stdlib, no pip dependencies.

使用说明 (SKILL.md)

ifly-image-understanding

Analyze images and answer questions about their content using iFlytek's Spark Vision model (图片理解).

Setup

Create an app at 讯飞控制台 with 图片理解 service enabled

Set environment variables:

export IFLY_APP_ID="your_app_id"
export IFLY_API_KEY="your_api_key"
export IFLY_API_SECRET="your_api_secret"

Usage

Describe an image

python3 scripts/image_understanding.py photo.jpg

Ask a question about an image

python3 scripts/image_understanding.py photo.jpg -q "图片里有什么动物？"

Use basic model (lower token cost)

python3 scripts/image_understanding.py photo.jpg --domain general

Options

Flag	Short	Description
`image`		Image file path (.jpg, .jpeg, .png)
`--question`	`-q`	Question about the image (default: describe)
`--domain`	`-d`	`imagev3` (advanced, default) or `general` (basic, fixed 273 tokens/image)
`--temperature`	`-t`	Sampling temperature (0,1], default 0.5
`--max-tokens`		Max response tokens 1-8192, default 2048
`--raw`		Output raw WebSocket JSON frames

Examples

# OCR a receipt
python3 scripts/image_understanding.py receipt.png -q "总金额是多少？"

# Identify objects
python3 scripts/image_understanding.py scene.jpg -q "图片中有哪些物体？"

# Low-cost basic model
python3 scripts/image_understanding.py chart.png -q "图表的趋势是什么？" -d general

Notes

Image formats: .jpg, .jpeg, .png
Max image size: 4MB
Max tokens: 8192 (input + output combined)
Auth: HMAC-SHA256 signed WebSocket URL
Endpoint: wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image
Pure stdlib: No pip dependencies — uses built-in socket + ssl for WebSocket
Model versions: imagev3 (advanced, dynamic token cost) vs general (basic, fixed 273 tokens/image)

错误码说明 😢

遇到错误先别慌～看看下面找到对应的解决方法吧！✨

错误码	错误信息	解决办法
0	🎉 成功	恭喜你！请求正常完成啦～
10003	用户的消息格式有错误	检查一下你的请求格式是否正确哦～确保发送的是合法的JSON格式呢！
10004	用户数据的schema错误	看起来数据结构有点问题～请检查一下字段名称和类型是否正确呀！
10005	用户参数值有错误	参数值可能不太对呢～仔细核对一下每个参数的有效范围吧！
10006	用户并发错误：同一用户不能多处同时连接	检测到重复连接啦！请确保只有一个客户端在连接同一个用户ID哦～
10013	用户问题涉及敏感信息，审核不通过	哎呀，你的问题可能包含了一些不太合适的内容～换个问题试试看吧！
10022	模型生产的图片涉及敏感信息，审核不通过	生成的图片没有通过审核呢...很抱歉，换张图片再试一下吧！
10029	图片任何一边的长度超过12800	图片尺寸太大啦！请确保图片宽高都不超过12800像素哦～
10041	图片分辨率不符合要求	图片尺寸不合适的呢～要求是：50×50 \x3C 图片总像素值 \x3C 6000×6000 哦！
10907	Token数量超过上限	内容太丰富啦！对话历史+问题的字数太多，需要精简一下输入哦～

💡 小贴士：如果还有其他问题，可以查看官方文档或者联系技术支持哦！

常见问题 🤔

图片理解的主要功能是什么呀？🐱

答：用户输入一张图片和问题，从而识别出图片中的对象、场景等信息，然后回答你的问题～是不是很方便呢！✨

图片理解支持什么应用平台呢？📱

答：目前支持 Web API 应用平台哦！直接在代码里调用就可以啦～

图片理解的文本大小限制多少呀？📝

答：有效内容不能超过 8192 Token 呢～如果超过了就要精简一下输入啦！

更多资源 📚

📖 使用文档：https://console.xfyun.cn/services/image
🛒 购买套餐：https://console.xfyun.cn/sale/buy?wareId=9046&packageId=9046002&serviceName=%E5%9B%BE%E7%89%87%E7%90%86%E8%A7%A3&businessId=image

有更多问题随时来问我哦～祝你使用愉快！🌸

安全使用建议

Before installing or using this skill: 1) Recognize that the script will send the full image bytes and your question to iFlytek's cloud (wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image). Do not use it with sensitive images or questions unless you trust the service and your agreement with it. 2) The SKILL.md and script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET — verify the registry listing or installer prompts these; do not paste secrets into unexpected places. 3) Remove or inspect the included .claude/settings.local.json before use: it contains a Read(...) permission pointing at a user's Desktop path and a zip command, which is unrelated to normal runtime and may be an accidental leak of local packaging settings. 4) Run the script in a controlled environment (isolated user account or container) if you have any doubt. 5) If you need higher assurance, ask the author to (a) update the registry metadata to declare required env vars, (b) remove any local .claude files from the published package, and (c) confirm the only network destination is the documented iFlytek WebSocket endpoint.

功能分析

Type: OpenClaw Skill Name: ifly-image-understanding Version: 1.0.0 The skill provides a legitimate interface to the iFlytek Spark Vision API for image analysis. The Python script (scripts/image_understanding.py) implements a standard HMAC-signed WebSocket client using only the Python standard library to communicate with the official iFlytek endpoint (spark-api.cn-huabei-1.xf-yun.com). The code is well-structured, follows the stated purpose, and contains no evidence of data exfiltration, malicious execution, or prompt injection.

能力评估

ℹ Purpose & Capability

The name/description (iFlytek Image Understanding) match the code: the script reads a local image, HMAC-signs requests, and connects to iFlytek's Spark Vision WebSocket endpoint. However, the skill registry metadata declares no required environment variables while both SKILL.md and the script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET — this metadata mismatch is unexpected and should be corrected or explained.

✓ Instruction Scope

The runtime instructions are narrowly scoped: set iFlytek credentials, supply a local image, and run the Python script which transmits the image and question to the documented iFlytek wss endpoint. There are no instructions to read arbitrary files or exfiltrate data to other endpoints. Note: transmitted data includes the raw image and any question text (so avoid sending sensitive images/queries).

✓ Install Mechanism

There is no install spec and the code uses only Python stdlib; no external downloads or package installs are requested. This is a low-risk install mechanism in itself.

⚠ Credentials

Requesting IFLY_APP_ID / IFLY_API_KEY / IFLY_API_SECRET is proportionate to authenticating to iFlytek and is expected. The concern is twofold: (1) the registry metadata does not declare these required env vars (inconsistency that could mislead users), and (2) the repository includes a .claude/settings.local.json file granting Read(...) permission to a specific user's Desktop path and a zip command — that config is unrelated to the skill's runtime needs and suggests accidental inclusion of local packaging settings that could disclose or request broader filesystem access.

✓ Persistence & Privilege

The skill does not request always:true, does not declare persistent installation steps, and the code does not modify other skills or system-wide settings. It performs one-off WebSocket calls during invocation.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install ifly-image-understanding
安装完成后，直接呼叫该 Skill 的名称或使用 /ifly-image-understanding 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

Initial release of iFlytek Image Understanding skill - Analyze and answer questions about images using Spark Vision model via WebSocket API. - Pure Python standard library; no pip dependencies required. - Supports flexible querying: general description, specific questions, and basic/advanced model selection. - Configurable options for domain, temperature, max tokens, and raw output. - Includes error code explanations and usage examples. - Max image size 4MB, supported formats: .jpg, .jpeg, .png.

元数据

Slug ifly-image-understanding

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题