← Back to Skills Marketplace

Iflytek Image Understanding

Name: Iflytek Image Understanding
Author: iflytek.skills

by iFly-Skills by iFLYTEK · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ pending

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install iflytek-image-understanding

Description

Use when user asks to analyze an image, describe image contents, or answer questions about a picture. iFlytek Image Understanding (图片理解) — analyze and answer...

README (SKILL.md)

ifly-image-understanding

Analyze images and answer questions about their content using iFlytek's Spark Vision model (图片理解).

Setup

Create an app at 讯飞控制台 with 图片理解 service enabled

Set environment variables:

export IFLY_APP_ID="your_app_id"
export IFLY_API_KEY="your_api_key"
export IFLY_API_SECRET="your_api_secret"

Usage

Describe an image

python3 scripts/image_understanding.py photo.jpg

Ask a question about an image

python3 scripts/image_understanding.py photo.jpg -q "图片里有什么动物？"

Use basic model (lower token cost)

python3 scripts/image_understanding.py photo.jpg --domain general

Options

Flag	Short	Description
`image`		Image file path (.jpg, .jpeg, .png)
`--question`	`-q`	Question about the image (default: describe)
`--domain`	`-d`	`imagev3` (advanced, default) or `general` (basic, fixed 273 tokens/image)
`--temperature`	`-t`	Sampling temperature (0,1], default 0.5
`--max-tokens`		Max response tokens 1-8192, default 2048
`--raw`		Output raw WebSocket JSON frames

Examples

# OCR a receipt
python3 scripts/image_understanding.py receipt.png -q "总金额是多少？"

# Identify objects
python3 scripts/image_understanding.py scene.jpg -q "图片中有哪些物体？"

# Low-cost basic model
python3 scripts/image_understanding.py chart.png -q "图表的趋势是什么？" -d general

Notes

Image formats: .jpg, .jpeg, .png
Max image size: 4MB
Max tokens: 8192 (input + output combined)
Auth: HMAC-SHA256 signed WebSocket URL
Endpoint: wss://spark-api.cn-huabei-1.xf-yun.com/v2.1/image
Pure stdlib: No pip dependencies — uses built-in socket + ssl for WebSocket
Model versions: imagev3 (advanced, dynamic token cost) vs general (basic, fixed 273 tokens/image)

错误码说明 😢

遇到错误先别慌～看看下面找到对应的解决方法吧！✨

错误码	错误信息	解决办法
0	🎉 成功	恭喜你！请求正常完成啦～
10003	用户的消息格式有错误	检查一下你的请求格式是否正确哦～确保发送的是合法的JSON格式呢！
10004	用户数据的schema错误	看起来数据结构有点问题～请检查一下字段名称和类型是否正确呀！
10005	用户参数值有错误	参数值可能不太对呢～仔细核对一下每个参数的有效范围吧！
10006	用户并发错误：同一用户不能多处同时连接	检测到重复连接啦！请确保只有一个客户端在连接同一个用户ID哦～
10013	用户问题涉及敏感信息，审核不通过	哎呀，你的问题可能包含了一些不太合适的内容～换个问题试试看吧！
10022	模型生产的图片涉及敏感信息，审核不通过	生成的图片没有通过审核呢...很抱歉，换张图片再试一下吧！
10029	图片任何一边的长度超过12800	图片尺寸太大啦！请确保图片宽高都不超过12800像素哦～
10041	图片分辨率不符合要求	图片尺寸不合适的呢～要求是：50×50 \x3C 图片总像素值 \x3C 6000×6000 哦！
10907	Token数量超过上限	内容太丰富啦！对话历史+问题的字数太多，需要精简一下输入哦～

💡 小贴士：如果还有其他问题，可以查看官方文档或者联系技术支持哦！

常见问题 🤔

图片理解的主要功能是什么呀？🐱

答：用户输入一张图片和问题，从而识别出图片中的对象、场景等信息，然后回答你的问题～是不是很方便呢！✨

图片理解支持什么应用平台呢？📱

答：目前支持 Web API 应用平台哦！直接在代码里调用就可以啦～

图片理解的文本大小限制多少呀？📝

答：有效内容不能超过 8192 Token 呢～如果超过了就要精简一下输入啦！

更多资源 📚

📖 使用文档：https://console.xfyun.cn/services/image
🛒 购买套餐：https://console.xfyun.cn/sale/buy?wareId=9046&packageId=9046002&serviceName=%E5%9B%BE%E7%89%87%E7%90%86%E8%A7%A3&businessId=image

有更多问题随时来问我哦～祝你使用愉快！🌸

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install iflytek-image-understanding
After installation, invoke the skill by name or use /iflytek-image-understanding
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of iflytek-image-understanding skill. - Supports image analysis, description, and visual question answering using the iFlytek Spark Vision model. - Pure Python implementation with no pip dependencies; uses stdlib for WebSocket communication. - Accepts .jpg, .jpeg, and .png images (max 4MB; 50×50 < pixels < 6000×6000). - Includes detailed instructions, usage examples, error code explanations, and troubleshooting tips. - Requires environment variables (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET) for authentication.

Metadata

Slug iflytek-image-understanding

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Iflytek Image Understanding?

Use when user asks to analyze an image, describe image contents, or answer questions about a picture. iFlytek Image Understanding (图片理解) — analyze and answer... It is an AI Agent Skill for Claude Code / OpenClaw, with 16 downloads so far.

How do I install Iflytek Image Understanding?

Run "/install iflytek-image-understanding" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Iflytek Image Understanding free?

Yes, Iflytek Image Understanding is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Iflytek Image Understanding support?

Iflytek Image Understanding is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Iflytek Image Understanding?

It is built and maintained by iFly-Skills by iFLYTEK (@iflytek.skills); the current version is v1.0.0.

More Skills