qingzhe2020

ifly-image-understanding

Author: Iflytek AIcloud · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 214
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install ifly-image-understanding
Description
iFlytek Image Understanding (图片理解) — analyze and answer questions about images using Spark Vision model. WebSocket API, pure Python stdlib, no pip dependencies.
Usage instructions (SKILL.md)

ifly-image-understanding

Analyze images and answer questions about their content using iFlytek's Spark Vision model (图片理解).

Setup

  1. Create an app in the iFlytek console (讯飞控制台) with the Image Understanding (图片理解) service enabled
  2. Set environment variables:
    export IFLY_APP_ID="your_app_id"
    export IFLY_API_KEY="your_api_key"
    export IFLY_API_SECRET="your_api_secret"
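
Before running the script, it can help to confirm that all three variables are actually visible to Python. The following is a minimal sketch, not part of the published skill; the helper name `load_credentials` is hypothetical.

```python
import os

# The three variable names come from the Setup steps above.
REQUIRED_VARS = ("IFLY_APP_ID", "IFLY_API_KEY", "IFLY_API_SECRET")


def load_credentials(env=None):
    """Return the credentials as a dict, or raise if any variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` with no argument reads from the current process environment; passing a dict makes the check easy to test.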
    

Usage

Describe an image

python3 scripts/image_understanding.py photo.jpg

Ask a question about an image

python3 scripts/image_understanding.py photo.jpg -q "图片里有什么动物?"

Use basic model (lower token cost)

python3 scripts/image_understanding.py photo.jpg --domain general

Options

Flag | Short | Description
image | (none) | Image file path (.jpg, .jpeg, .png); positional argument
--question | -q | Question about the image (default: describe)
--domain | -d | imagev3 (advanced, default) or general (basic, fixed 273 tokens/image)
--temperature | -t | Sampling temperature in the range (0, 1], default 0.5
--max-tokens | (none) | Max response tokens, 1 to 8192, default 2048
--raw | (none) | Output raw WebSocket JSON frames
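
For readers who want to see how these options could map onto Python's `argparse`, here is a sketch that mirrors the table. It is an illustration under assumptions, not the actual source of `scripts/image_understanding.py`: the function names are hypothetical, and the default for `--question` is left as `None` because the listing does not state the exact default prompt.

```python
import argparse


def temperature(value):
    """Accept a float in the half-open range (0, 1]."""
    t = float(value)
    if not (0 < t <= 1):
        raise argparse.ArgumentTypeError("temperature must be in (0, 1]")
    return t


def max_tokens(value):
    """Accept an integer from 1 to 8192."""
    n = int(value)
    if not (1 <= n <= 8192):
        raise argparse.ArgumentTypeError("max-tokens must be between 1 and 8192")
    return n


def build_parser():
    parser = argparse.ArgumentParser(description="Ask a question about an image")
    parser.add_argument("image", help="Image file path (.jpg, .jpeg, .png)")
    # None here stands for "describe the image"; the real default prompt is unknown.
    parser.add_argument("-q", "--question", default=None, help="Question about the image")
    parser.add_argument("-d", "--domain", choices=("imagev3", "general"), default="imagev3")
    parser.add_argument("-t", "--temperature", type=temperature, default=0.5)
    parser.add_argument("--max-tokens", type=max_tokens, default=2048)
    parser.add_argument("--raw", action="store_true", help="Output raw WebSocket JSON frames")
    return parser
```

Validating the ranges in `type=` callbacks means an out-of-range value is rejected before any network request is made.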

Examples

# OCR a receipt
python3 scripts/image_understanding.py receipt.png -q "总金额是多少?"

# Identify objects
python3 scripts/image_understanding.py scene.jpg -q "图片中有哪些物体?"

# Low-cost basic model
python3 scripts/image_understanding.py chart.png -q "图表的趋势是什么?" -d general

Notes

  • Image formats: .jpg, .jpeg, .png
  • Max image size: 4MB
  • Max tokens: 8192 (input + output combined)
  • Auth: HMAC-SHA256 signed WebSocket URL
  • Endpoint: wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image
  • Pure stdlib: No pip dependencies — uses built-in socket + ssl for WebSocket
  • Model versions: imagev3 (advanced, dynamic token cost) vs general (basic, fixed 273 tokens/image)
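
The "HMAC-SHA256 signed WebSocket URL" note can be made concrete with a sketch. The listing does not spell out the signing details, so the layout below (a signature over `host`, `date`, and the request line, then a base64-encoded `authorization` query parameter) is an assumption based on the scheme iFlytek generally documents for its WebSocket APIs. Verify it against the official documentation before relying on it. The function name `build_signed_url` is hypothetical.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse

ENDPOINT = "wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image"


def build_signed_url(api_key, api_secret, endpoint=ENDPOINT, date=None):
    """Return the endpoint with authorization, date, and host query parameters."""
    parsed = urlparse(endpoint)
    host, path = parsed.netloc, parsed.path
    # RFC 1123 date in GMT, e.g. "Thu, 01 Jan 2026 00:00:00 GMT".
    date = date or formatdate(usegmt=True)

    # Assumed signing string: host, date, and the HTTP request line.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode(), signature_origin.encode(), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()

    # Assumed authorization layout, base64-encoded before going into the query string.
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode()).decode()

    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{endpoint}?{query}"
```

Because the date is part of the signed string, a URL built this way is only valid for a short window; build it immediately before connecting.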

Error codes 😢

Hit an error? Don't panic. Find the matching fix below. ✨

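Error code | Message | What to do
0 | 🎉 Success | The request completed normally.
10003 | Malformed user message | Check the request format and make sure you are sending valid JSON.
10004 | User data schema error | Check that field names and types are correct.
10005 | Invalid user parameter value | Check each parameter against its valid range.
10006 | User concurrency error: the same user cannot connect from multiple places at once | Make sure only one client is connected with the same user ID.
10013 | The user's question involves sensitive information and failed review | Try a different question.
10022 | The image produced by the model involves sensitive information and failed review | Try again with a different image.
10029 | An image side is longer than 12800 | Make sure both width and height are at most 12800 pixels.
10041 | Image resolution does not meet requirements | Required: 50×50 < total pixel count < 6000×6000.
10907 | Token count exceeds the limit | The conversation history plus question is too long; shorten the input.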

💡 Tip: for any other problem, check the official documentation or contact technical support.
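
If you wrap the script in your own tooling, the error codes listed in this section can be kept in a small lookup table. This is a sketch only: the name `explain_error` is hypothetical, and how the code is extracted from a response frame is not shown because the listing does not describe the frame layout.

```python
# Error codes and hints taken from the error code list in this section.
ERROR_HINTS = {
    0: "Success",
    10003: "Malformed message: make sure the request is valid JSON",
    10004: "Schema error: check field names and types",
    10005: "Invalid parameter value: check each parameter's valid range",
    10006: "Concurrency error: only one connection per user is allowed",
    10013: "Question failed content review: try a different question",
    10022: "Generated image failed content review: try a different image",
    10029: "Image side too long: width and height must not exceed 12800 pixels",
    10041: "Bad resolution: total pixels must be between 50x50 and 6000x6000",
    10907: "Token limit exceeded: shorten the history and question",
}


def explain_error(code):
    """Return a short hint for a known error code, or a generic fallback."""
    return ERROR_HINTS.get(code, f"Unknown error code {code}: see the official documentation")
```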


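FAQ 🤔

What does Image Understanding mainly do? 🐱

A: You provide an image and a question. The model identifies the objects, scenes, and other information in the image, then answers your question. ✨

Which application platforms does Image Understanding support? 📱

A: It currently supports the Web API platform, so you can call it directly from your code.

What is the text size limit for Image Understanding? 📝

A: The effective content cannot exceed 8192 tokens. If it does, shorten your input.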


More resources 📚

  • 📖 Documentation: https://console.xfyun.cn/services/image
  • 🛒 Purchase packages: https://console.xfyun.cn/sale/buy?wareId=9046&packageId=9046002&serviceName=%E5%9B%BE%E7%89%87%E7%90%86%E8%A7%A3&businessId=image

Feel free to ask if you have more questions. Enjoy! 🌸

Security recommendations
Before installing or using this skill:
  1. Recognize that the script will send the full image bytes and your question to iFlytek's cloud (wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image). Do not use it with sensitive images or questions unless you trust the service and your agreement with it.
  2. The SKILL.md and script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET. Verify that the registry listing or installer prompts for these; do not paste secrets into unexpected places.
  3. Remove or inspect the included .claude/settings.local.json before use: it contains a Read(...) permission pointing at a user's Desktop path and a zip command, which is unrelated to normal runtime and may be an accidental leak of local packaging settings.
  4. Run the script in a controlled environment (isolated user account or container) if you have any doubt.
  5. If you need higher assurance, ask the author to (a) update the registry metadata to declare required env vars, (b) remove any local .claude files from the published package, and (c) confirm the only network destination is the documented iFlytek WebSocket endpoint.
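
To act on the advice about the bundled .claude/settings.local.json, one option is to list every file that sits inside a `.claude` directory of the unpacked skill before running anything. The sketch below assumes you know where the skill was unpacked; `find_local_settings` is a hypothetical helper, and the example path in the usage note is a placeholder.

```python
from pathlib import Path


def find_local_settings(skill_dir):
    """Return all files under skill_dir that live inside a .claude directory."""
    root = Path(skill_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and ".claude" in path.relative_to(root).parts
    )
```

For example, `find_local_settings("path/to/ifly-image-understanding")` returns a list of paths you can review or delete by hand; an empty list means no such files were found.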
Functional analysis
Type: OpenClaw Skill
Name: ifly-image-understanding
Version: 1.0.0
The skill provides a legitimate interface to the iFlytek Spark Vision API for image analysis. The Python script (scripts/image_understanding.py) implements a standard HMAC-signed WebSocket client using only the Python standard library to communicate with the official iFlytek endpoint (spark-api.cn-huabei-1.xf-yun.com). The code is well-structured, follows the stated purpose, and contains no evidence of data exfiltration, malicious execution, or prompt injection.
Capability assessment
Purpose & Capability
The name/description (iFlytek Image Understanding) match the code: the script reads a local image, HMAC-signs requests, and connects to iFlytek's Spark Vision WebSocket endpoint. However, the skill registry metadata declares no required environment variables while both SKILL.md and the script require IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET — this metadata mismatch is unexpected and should be corrected or explained.
Instruction Scope
The runtime instructions are narrowly scoped: set iFlytek credentials, supply a local image, and run the Python script which transmits the image and question to the documented iFlytek wss endpoint. There are no instructions to read arbitrary files or exfiltrate data to other endpoints. Note: transmitted data includes the raw image and any question text (so avoid sending sensitive images/queries).
Install Mechanism
There is no install spec and the code uses only Python stdlib; no external downloads or package installs are requested. This is a low-risk install mechanism in itself.
Credentials
Requesting IFLY_APP_ID / IFLY_API_KEY / IFLY_API_SECRET is proportionate to authenticating to iFlytek and is expected. The concern is twofold: (1) the registry metadata does not declare these required env vars (inconsistency that could mislead users), and (2) the repository includes a .claude/settings.local.json file granting Read(...) permission to a specific user's Desktop path and a zip command — that config is unrelated to the skill's runtime needs and suggests accidental inclusion of local packaging settings that could disclose or request broader filesystem access.
Persistence & Privilege
The skill does not request always:true, does not declare persistent installation steps, and the code does not modify other skills or system-wide settings. It performs one-off WebSocket calls during invocation.
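How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install ifly-image-understanding
  3. After installation, call the Skill by name or trigger it with /ifly-image-understanding
  4. Provide the required inputs as described in the Skill's options to get structured output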
Version history
v1.0.0
Initial release of iFlytek Image Understanding skill
  • Analyze and answer questions about images using Spark Vision model via WebSocket API.
  • Pure Python standard library; no pip dependencies required.
  • Supports flexible querying: general description, specific questions, and basic/advanced model selection.
  • Configurable options for domain, temperature, max tokens, and raw output.
  • Includes error code explanations and usage examples.
  • Max image size 4MB, supported formats: .jpg, .jpeg, .png.
Metadata
Slug: ifly-image-understanding
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
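Frequently asked questions

What is ifly-image-understanding?

iFlytek Image Understanding (图片理解) analyzes and answers questions about images using the Spark Vision model. It uses a WebSocket API and the pure Python stdlib, with no pip dependencies. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 214 total downloads to date.

How do I install ifly-image-understanding?

Run the command "/install ifly-image-understanding" in the OpenClaw or Claude Code chat box to install it in one step. Before first use, set the IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET environment variables as described in Setup.

Is ifly-image-understanding free?

Yes. The skill itself is free under the MIT-0 license and can be freely downloaded, installed, and used. Note that it calls the iFlytek Image Understanding service, which counts tokens per image and offers purchasable packages.

Which platforms does ifly-image-understanding support?

ifly-image-understanding is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops ifly-image-understanding?

It is developed and maintained by Iflytek AIcloud (@qingzhe2020); the current version is v1.0.0.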
