Iflytek Image Understanding
/install iflytek-image-understanding
ifly-image-understanding
Analyze images and answer questions about their content using iFlytek's Spark Vision model (图片理解).
Setup
- Create an app at 讯飞控制台 with 图片理解 service enabled
- Set environment variables:
export IFLY_APP_ID="your_app_id" export IFLY_API_KEY="your_api_key" export IFLY_API_SECRET="your_api_secret"
Usage
Describe an image
python3 scripts/image_understanding.py photo.jpg
Ask a question about an image
python3 scripts/image_understanding.py photo.jpg -q "图片里有什么动物?"
Use basic model (lower token cost)
python3 scripts/image_understanding.py photo.jpg --domain general
Options
| Flag | Short | Description |
|---|---|---|
image |
Image file path (.jpg, .jpeg, .png) | |
--question |
-q |
Question about the image (default: describe) |
--domain |
-d |
imagev3 (advanced, default) or general (basic, fixed 273 tokens/image) |
--temperature |
-t |
Sampling temperature (0,1], default 0.5 |
--max-tokens |
Max response tokens 1-8192, default 2048 | |
--raw |
Output raw WebSocket JSON frames |
Examples
# OCR a receipt
python3 scripts/image_understanding.py receipt.png -q "总金额是多少?"
# Identify objects
python3 scripts/image_understanding.py scene.jpg -q "图片中有哪些物体?"
# Low-cost basic model
python3 scripts/image_understanding.py chart.png -q "图表的趋势是什么?" -d general
Notes
- Image formats: .jpg, .jpeg, .png
- Max image size: 4MB
- Max tokens: 8192 (input + output combined)
- Auth: HMAC-SHA256 signed WebSocket URL
- Endpoint:
wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image - Pure stdlib: No pip dependencies — uses built-in
socket+sslfor WebSocket - Model versions:
imagev3(advanced, dynamic token cost) vsgeneral(basic, fixed 273 tokens/image)
错误码说明 😢
遇到错误先别慌~看看下面找到对应的解决方法吧!✨
| 错误码 | 错误信息 | 解决办法 |
|---|---|---|
| 0 | 🎉 成功 | 恭喜你!请求正常完成啦~ |
| 10003 | 用户的消息格式有错误 | 检查一下你的请求格式是否正确哦~确保发送的是合法的JSON格式呢! |
| 10004 | 用户数据的schema错误 | 看起来数据结构有点问题~请检查一下字段名称和类型是否正确呀! |
| 10005 | 用户参数值有错误 | 参数值可能不太对呢~仔细核对一下每个参数的有效范围吧! |
| 10006 | 用户并发错误:同一用户不能多处同时连接 | 检测到重复连接啦!请确保只有一个客户端在连接同一个用户ID哦~ |
| 10013 | 用户问题涉及敏感信息,审核不通过 | 哎呀,你的问题可能包含了一些不太合适的内容~换个问题试试看吧! |
| 10022 | 模型生产的图片涉及敏感信息,审核不通过 | 生成的图片没有通过审核呢...很抱歉,换张图片再试一下吧! |
| 10029 | 图片任何一边的长度超过12800 | 图片尺寸太大啦!请确保图片宽高都不超过12800像素哦~ |
| 10041 | 图片分辨率不符合要求 | 图片尺寸不合适的呢~要求是:50×50 \x3C 图片总像素值 \x3C 6000×6000 哦! |
| 10907 | Token数量超过上限 | 内容太丰富啦!对话历史+问题的字数太多,需要精简一下输入哦~ |
💡 小贴士:如果还有其他问题,可以查看官方文档或者联系技术支持哦!
常见问题 🤔
图片理解的主要功能是什么呀?🐱
答:用户输入一张图片和问题,从而识别出图片中的对象、场景等信息,然后回答你的问题~是不是很方便呢!✨
图片理解支持什么应用平台呢?📱
答:目前支持 Web API 应用平台哦!直接在代码里调用就可以啦~
图片理解的文本大小限制多少呀?📝
答:有效内容不能超过 8192 Token 呢~如果超过了就要精简一下输入啦!
更多资源 📚
- 📖 使用文档:https://console.xfyun.cn/services/image
- 🛒 购买套餐:https://console.xfyun.cn/sale/buy?wareId=9046&packageId=9046002&serviceName=%E5%9B%BE%E7%89%87%E7%90%86%E8%A7%A3&businessId=image
有更多问题随时来问我哦~祝你使用愉快!🌸
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install iflytek-image-understanding - After installation, invoke the skill by name or use
/iflytek-image-understanding - Provide required inputs per the skill's parameter spec and get structured output
What is Iflytek Image Understanding?
Use when user asks to analyze an image, describe image contents, or answer questions about a picture. iFlytek Image Understanding (图片理解) — analyze and answer... It is an AI Agent Skill for Claude Code / OpenClaw, with 16 downloads so far.
How do I install Iflytek Image Understanding?
Run "/install iflytek-image-understanding" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Iflytek Image Understanding free?
Yes, Iflytek Image Understanding is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Iflytek Image Understanding support?
Iflytek Image Understanding is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Iflytek Image Understanding?
It is built and maintained by iFly-Skills by iFLYTEK (@iflytek.skills); the current version is v1.0.0.