Multimodal Recognize Image
/install linkfox-multimodal-recognize-image
Image Recognition
This skill guides you on how to use the multimodal image recognition API to analyze images from URLs and extract meaningful information based on user intent.
Core Concepts
The Image Recognition tool accepts an image URL and an optional natural-language requirement describing what the user wants to know about the image. The backend uses a multimodal AI model to interpret the visual content and return a textual description or analysis.
Supported formats: JPG, JPEG, PNG, GIF, WebP, BMP.
How it works: You provide a publicly accessible image URL and a requirement (what you want to learn from the image). The service downloads the image, runs multimodal analysis, and returns a text-based result.
Parameter Guide
| Parameter | Required | Description |
|---|---|---|
| imageUrl | Yes | A publicly accessible URL pointing to the image. Must be JPG, JPEG, PNG, GIF, WebP, or BMP. Maximum 1000 characters. |
| requirement | No | A natural-language description of what to identify or analyze in the image. Defaults to "Describe the content of this image" when omitted. Maximum 1000 characters. |
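The constraints in the table above can be checked before a request is sent. The following is a minimal sketch, not the gateway's own validation: the helper name `build_payload` is hypothetical, the restriction to `http`/`https` schemes is an assumption about what "publicly accessible" means, and the extension check is only a heuristic (URLs without a file extension are passed through, since the actual format cannot be known from the URL alone).

```python
from posixpath import splitext
from urllib.parse import urlparse

# Formats and length limit as listed in the parameter table.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}
MAX_LENGTH = 1000


def build_payload(image_url, requirement=None):
    """Validate the inputs and return a request body (hypothetical helper)."""
    parsed = urlparse(image_url)
    # ASSUMPTION: a publicly accessible URL uses http or https and has a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("imageUrl must be a publicly accessible http(s) URL")
    if len(image_url) > MAX_LENGTH:
        raise ValueError("imageUrl exceeds 1000 characters")

    # Heuristic: reject only when the path carries a clearly unsupported extension.
    extension = splitext(parsed.path)[1].lower()
    if extension and extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image format: {extension}")

    payload = {"imageUrl": image_url}
    # requirement is optional; when omitted, the service applies its default.
    if requirement is not None:
        if len(requirement) > MAX_LENGTH:
            raise ValueError("requirement exceeds 1000 characters")
        payload["requirement"] = requirement
    return payload
```

Leaving `requirement` out of the payload, rather than sending an empty string, lets the documented default ("Describe the content of this image") apply.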
Tips for Writing the requirement Parameter
- Be specific: Instead of "analyze this image", say "List all products visible on the shelf and estimate their category."
- State the goal: If you need text extraction, say "Extract all visible text from the image." If you need object identification, say "Identify the main objects and their colors."
- Provide context when helpful: For product images, mention "This is an e-commerce product listing image" so the model can tailor its analysis.
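The tips above amount to combining optional context with a specific goal. One way to sketch that, assuming a hypothetical helper named `compose_requirement` (it is not part of the skill itself):

```python
MAX_LENGTH = 1000  # limit stated for the requirement parameter


def compose_requirement(goal, context=None):
    """Join optional context and a specific goal into one requirement string."""
    goal = goal.strip()
    if not goal:
        raise ValueError("goal must not be empty")
    parts = []
    # Context comes first so the model reads it before the task itself.
    if context and context.strip():
        parts.append(context.strip())
    parts.append(goal)
    text = " ".join(parts)
    if len(text) > MAX_LENGTH:
        raise ValueError("requirement exceeds 1000 characters")
    return text


requirement = compose_requirement(
    "Identify the product, key features, and selling points visible in the image.",
    context="This is an Amazon product listing image.",
)
```

Here `requirement` holds the context sentence followed by the goal, which is the same shape as the product-analysis example in this document.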
Usage Examples
1. General Image Description
- User says: "What is in this picture?"
- Set `imageUrl` to the provided URL and leave `requirement` at its default.
2. Product Image Analysis
- User says: "Analyze this Amazon product image and list the key selling points shown."
- Set `requirement` to: "This is an Amazon product listing image. Identify the product, key features, and selling points visible in the image."
3. Text Extraction from an Image
- User says: "Read the text in this screenshot."
- Set `requirement` to: "Extract all visible text from this image, preserving layout where possible."
4. A+ Page Image Review
- User says: "Describe what this A+ content image communicates."
- Set `requirement` to: "This is an Amazon A+ product description image. Describe the visual content, key messaging, and branding elements."
5. Comparison / Detail Inspection
- User says: "What differences can you spot between the product and its packaging?"
- Set `requirement` to: "Identify and describe any differences between the product and its packaging shown in the image."
API Usage
This tool calls the LinkFox tool gateway API. See references/api.md for calling conventions, request parameters, and response structure. You can also execute scripts/multimodal_recognize_image.py directly to run queries.
Display Rules
- Show the analysis result clearly: Present the returned text analysis in a readable format. Use bullet points or paragraphs as appropriate for the content.
- No fabrication: Only relay information that the API actually returned. Do not add visual details that were not in the response.
- Format support: If the image URL is invalid or the format is unsupported, explain the limitation and list the supported formats (JPG, JPEG, PNG, GIF, WebP, BMP).
- Error handling: When the API returns an error status, explain the issue based on the response and suggest corrective actions (e.g., check that the URL is publicly accessible, verify the image format).
- Token usage: If the user asks about cost, you may mention the `costToken` value from the response.
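The display rules above could be applied with a small formatter like the one below. This is only a sketch: the response structure is defined in references/api.md and is not reproduced here, so the field names `status`, `message`, and `result` are hypothetical placeholders. Only `costToken` is named in this document.

```python
SUPPORTED_FORMATS = "JPG, JPEG, PNG, GIF, WebP, BMP"


def render_result(response, show_cost=False):
    """Turn a response dict into text for the user (hypothetical helper)."""
    # ASSUMPTION: "status", "message" and "result" are placeholder field names;
    # check references/api.md for the real response structure.
    if response.get("status") != "ok":
        reason = response.get("message", "unknown error")
        return (
            f"The image could not be analyzed: {reason}\n"
            "Check that the URL is publicly accessible and that the image is "
            f"one of: {SUPPORTED_FORMATS}."
        )

    # Relay only what the API returned; nothing is added to the analysis text.
    lines = [response.get("result", "").strip()]
    # Mention token usage only when the user asked about cost.
    if show_cost and "costToken" in response:
        lines.append(f"Token usage: {response['costToken']}")
    return "\n".join(lines)
```

Keeping the cost line behind `show_cost` mirrors the rule that `costToken` is mentioned only when the user asks about it.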
User Expression & Scenario Quick Reference
Applicable -- Image analysis tasks:
| User Says | Scenario |
|---|---|
| "What's in this image/picture/photo" | General image description |
| "Analyze this product image" | Product visual analysis |
| "Read the text in this image" | OCR / text extraction |
| "Describe the A+ page images" | E-commerce content review |
| "What does this screenshot show" | Screenshot interpretation |
| "Identify objects in this photo" | Object detection / listing |
Not applicable -- Needs beyond image recognition:
- Generating or editing images
- Video analysis
- Analyzing images from local file paths (only URLs are supported)
- Image search or reverse image lookup
Feedback:
Auto-detect and report feedback via the Feedback API when any of the following apply:
- The functionality or purpose described in this skill does not match actual behavior
- The skill's results do not match the user's intent
- The user expresses dissatisfaction or praise about this skill
- Anything you believe could be improved
Call the feedback API as specified in references/api.md. Do not interrupt the user's flow.
For more high-quality, professional cross-border e-commerce skills, see LinkFox Skills.
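- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: `/install linkfox-multimodal-recognize-image`
- Once installation finishes, invoke the Skill by name or trigger it with `/linkfox-multimodal-recognize-image`.
- Provide the required inputs according to the Skill's parameter guide to get structured output.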
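What is Multimodal Recognize Image?
Image recognition and analysis based on multimodal AI. Triggered when the user wants to analyze, describe, or extract information from an image URL (image recognition, image analysis, image description, image content understanding, OCR text recognition, visual Q&A)... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 111 cumulative downloads to date.
How do I install Multimodal Recognize Image?
Run the command `/install linkfox-multimodal-recognize-image` in the OpenClaw or Claude Code chat box for a one-step install. No additional configuration is needed.
Is Multimodal Recognize Image free?
Yes. Multimodal Recognize Image is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does Multimodal Recognize Image support?
Multimodal Recognize Image is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops Multimodal Recognize Image?
It is developed and maintained by linkfox-ai (@linkfox-ai). The current version is v1.0.0.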