
Multimodal Recognize Image

Author: linkfox-ai · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

111

Total downloads

Current installs

Versions

Install in OpenClaw

/install linkfox-multimodal-recognize-image

Description

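Image recognition and analysis powered by multimodal AI. Triggered when the user wants to analyze or describe an image, or extract information from an image URL; trigger phrases include image recognition, image analysis, image description, image content understanding, OCR text recognition, and visual Q&A...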

Usage Instructions (SKILL.md)

Image Recognition

This skill guides you on how to use the multimodal image recognition API to analyze images from URLs and extract meaningful information based on user intent.

Core Concepts

The Image Recognition tool accepts an image URL and an optional natural-language requirement describing what the user wants to know about the image. The backend uses a multimodal AI model to interpret the visual content and return a textual description or analysis.

Supported formats: JPG, JPEG, PNG, GIF, WebP, BMP.

How it works: You provide a publicly accessible image URL and a requirement (what you want to learn from the image). The service downloads the image, runs multimodal analysis, and returns a text-based result.

Parameter Guide

Parameter	Required	Description
imageUrl	Yes	A publicly accessible URL pointing to the image. Must be JPG, JPEG, PNG, GIF, WebP, or BMP. Maximum 1000 characters.
requirement	No	A natural-language description of what to identify or analyze in the image. Defaults to "Describe the content of this image" when omitted. Maximum 1000 characters.
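The limits in the table above can be checked locally before any request is sent. The following is a minimal sketch, not the skill's own code: the function name build_params is hypothetical, and the extension check is only a heuristic, since the service presumably inspects the downloaded image rather than the URL text.

```python
from urllib.parse import urlparse

MAX_LEN = 1000  # character limit stated for both parameters
SUPPORTED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp", "bmp"}


def build_params(image_url, requirement=None):
    """Return the parameter dict, or raise ValueError (hypothetical helper)."""
    if not image_url:
        raise ValueError("imageUrl is required")
    if len(image_url) > MAX_LEN:
        raise ValueError("imageUrl exceeds 1000 characters")
    if requirement is not None and len(requirement) > MAX_LEN:
        raise ValueError("requirement exceeds 1000 characters")

    # Heuristic only: a URL without a file extension is allowed,
    # but an extension outside the supported list is rejected.
    last_segment = urlparse(image_url).path.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = last_segment.rsplit(".", 1)[-1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported image format: .{ext}")

    params = {"imageUrl": image_url}
    # Omit requirement entirely so the documented default applies.
    if requirement is not None:
        params["requirement"] = requirement
    return params
```

Leaving requirement out of the dict, rather than filling in the default text locally, keeps the default under the service's control.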

Tips for Writing the requirement Parameter

Be specific: Instead of "analyze this image", say "List all products visible on the shelf and estimate their category."
State the goal: If you need text extraction, say "Extract all visible text from the image." If you need object identification, say "Identify the main objects and their colors."
Provide context when helpful: For product images, mention "This is an e-commerce product listing image" so the model can tailor its analysis.
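One way to apply these tips consistently is to assemble the requirement from an optional context sentence and a specific goal. This is an illustrative sketch only; compose_requirement is a hypothetical helper and not part of the skill.

```python
def compose_requirement(goal, context=None, max_len=1000):
    """Join optional context and a specific goal into one requirement string."""
    goal = goal.strip()
    if not goal:
        raise ValueError("goal must not be empty")

    # Context comes first so the model reads the setting before the task.
    parts = [context.strip(), goal] if context and context.strip() else [goal]
    # Make sure each part ends with sentence punctuation before joining.
    parts = [p if p.endswith((".", "?", "!")) else p + "." for p in parts]

    text = " ".join(parts)
    if len(text) > max_len:
        raise ValueError("requirement exceeds the 1000-character limit")
    return text
```

For example, passing the goal "Identify the product, key features, and selling points visible in the image" with the context "This is an Amazon product listing image" yields a requirement in the same form as the product-analysis example in this document.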

Usage Examples

1. General Image Description

User says: "What is in this picture?"
Set imageUrl to the provided URL, leave requirement as default.

2. Product Image Analysis

User says: "Analyze this Amazon product image and list the key selling points shown."
Set requirement to: "This is an Amazon product listing image. Identify the product, key features, and selling points visible in the image."

3. Text Extraction from an Image

User says: "Read the text in this screenshot."
Set requirement to: "Extract all visible text from this image, preserving layout where possible."

4. A+ Page Image Review

User says: "Describe what this A+ content image communicates."
Set requirement to: "This is an Amazon A+ product description image. Describe the visual content, key messaging, and branding elements."

5. Comparison / Detail Inspection

User says: "What differences can you spot between the product and its packaging?"
Set requirement to: "Identify and describe any differences between the product and its packaging shown in the image."

API Usage

This tool calls the LinkFox tool gateway API. See references/api.md for calling conventions, request parameters, and response structure. You can also execute scripts/multimodal_recognize_image.py directly to run queries.

Display Rules

Show the analysis result clearly: Present the returned text analysis in a readable format. Use bullet points or paragraphs as appropriate for the content.
No fabrication: Only relay information that the API actually returned. Do not add visual details that were not in the response.
Format support: If the image URL is invalid or the format is unsupported, explain the limitation and list the supported formats (JPG, JPEG, PNG, GIF, WebP, BMP).
Error handling: When the API returns an error status, explain the issue based on the response and suggest corrective actions (e.g., check that the URL is publicly accessible, verify the image format).
Token usage: If the user asks about cost, you may mention the costToken value from the response.
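The display rules above could be sketched as a small formatter. The response structure is defined in references/api.md, which is not reproduced here, so the field names "success", "result", and "message" below are assumptions made for illustration; only costToken is named in this document.

```python
SUPPORTED_FORMATS = "JPG, JPEG, PNG, GIF, WebP, BMP"


def render_response(response, show_cost=False):
    """Turn a response dict into user-facing text without adding details.

    Assumed (hypothetical) fields: "success" (bool), "result" (str),
    "message" (str). Only "costToken" appears in the skill text.
    """
    if not response.get("success"):
        reason = response.get("message") or "The API returned an error."
        return (
            f"{reason}\n"
            "Check that the URL is publicly accessible and that the image "
            f"is in a supported format ({SUPPORTED_FORMATS})."
        )

    # Relay only what the API returned; nothing is inferred or appended.
    lines = [response.get("result", "").strip()]
    if show_cost and "costToken" in response:
        lines.append(f"Tokens used: {response['costToken']}")
    return "\n".join(line for line in lines if line)
```

The token count is shown only when show_cost is set, matching the rule that cost is mentioned only if the user asks.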

User Expression & Scenario Quick Reference

Applicable -- Image analysis tasks:

User Says	Scenario
"What's in this image/picture/photo"	General image description
"Analyze this product image"	Product visual analysis
"Read the text in this image"	OCR / text extraction
"Describe the A+ page images"	E-commerce content review
"What does this screenshot show"	Screenshot interpretation
"Identify objects in this photo"	Object detection / listing

Not applicable -- Needs beyond image recognition:

Generating or editing images
Video analysis
Analyzing images from local file paths (only URLs are supported)
Image search or reverse image lookup
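Because only URLs are supported, a quick local check can separate web URLs from local file paths before the skill is invoked. A minimal sketch using the standard library; is_remote_url is a hypothetical name, and the check cannot confirm that a URL is actually publicly accessible.

```python
from urllib.parse import urlparse


def is_remote_url(value):
    """Return True for http(s) URLs with a host, False for local paths."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Inputs such as /home/user/cat.png, C:\images\cat.png, or file:///tmp/cat.png return False, so the agent can explain the limitation instead of sending a request that cannot succeed.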

Feedback:

Auto-detect and report feedback via the Feedback API when any of the following apply:

The functionality or purpose described in this skill does not match actual behavior
The skill's results do not match the user's intent
The user expresses dissatisfaction or praise about this skill
Anything you believe could be improved

Call the feedback API as specified in references/api.md. Do not interrupt the user's flow.

For more high-quality, professional cross-border e-commerce skills, see LinkFox Skills.

Security Recommendations

Before installing, be aware that this skill will send any provided image URLs and the user’s analysis request to external LinkFox endpoints (tool-gateway.linkfox.com and skill-api.linkfox.com). The skill’s code and API docs require an API key held in LINKFOXAGENT_API_KEY, but the registry metadata incorrectly says no env vars are required — ask the publisher to correct that. Don’t provide private or sensitive images unless you trust LinkFox and have reviewed their privacy/security policies. Also confirm what the Feedback API sends (and whether it requires authentication). If you proceed, set the API key in a restricted environment variable, and consider monitoring outbound requests to verify behavior.
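Since the registry metadata does not declare the key, a caller may want to fail early with a clear message when LINKFOXAGENT_API_KEY is missing, and avoid printing the key in logs. The sketch below is an assumption about how one might do this, not the bundled script's code; load_api_key and mask are hypothetical helpers.

```python
import os


def load_api_key(env=None):
    """Return the LinkFox API key from the environment, or raise."""
    env = os.environ if env is None else env
    key = (env.get("LINKFOXAGENT_API_KEY") or "").strip()
    if not key:
        raise RuntimeError(
            "LINKFOXAGENT_API_KEY is not set; export it before running the skill"
        )
    return key


def mask(key):
    """Show only the last four characters, for safe logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Passing a dict as env makes the function easy to test without touching the real process environment.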

Functional Analysis

Type: OpenClaw Skill
Name: linkfox-multimodal-recognize-image
Version: 1.0.0

The skill bundle provides a legitimate interface for multimodal image recognition via the LinkFox API. The Python script (multimodal_recognize_image.py) safely handles parameters, uses standard libraries for network requests, and requires an API key from environment variables. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.

Capability Assessment

ℹ Purpose & Capability

Name/description align with the behavior: the script and SKILL.md call a LinkFox multimodal image-recognition API and expect image URLs for analysis. However, the registry metadata lists no required environment variables while the API reference and the provided script clearly require an API key (LINKFOXAGENT_API_KEY). This omission is an inconsistency (likely sloppy but important).

ℹ Instruction Scope

Runtime instructions are narrowly scoped to: accept a public image URL + optional requirement, call the LinkFox tool gateway API, and present the returned text. They explicitly say not to handle local files. One additional behavior to note: the SKILL.md instructs the agent to auto-send feedback to a separate Feedback API endpoint under certain conditions, which could transmit user content or metadata to another external service. The instructions do not request unrelated system files or other credentials.

✓ Install Mechanism

No install spec; this is instruction-only with a small helper script included. Nothing is downloaded from arbitrary URLs or installed to the system.

⚠ Credentials

The only credential the tool actually needs is LINKFOXAGENT_API_KEY (used for Authorization to https://tool-gateway.linkfox.com). That credential is proportionate to the described purpose, but the registry metadata incorrectly lists 'none' for required environment variables. This mismatch is important because users won't be warned up-front that an API key will be requested or used. Also the Feedback API is a separate endpoint (skill-api.linkfox.com) with no auth described — it's unclear what is sent and who can access feedback data.

✓ Persistence & Privilege

The skill does not request persistent or elevated privileges (always:false). It does not attempt to modify other skills or system settings and has no install-time setup that would add persistent agents.

How to Use

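Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install linkfox-multimodal-recognize-image
Once installed, invoke the Skill by name or trigger it with /linkfox-multimodal-recognize-image
Provide the required inputs as described in the Skill's parameter guide to get structured output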

Version History

v1.0.0

Initial release

Metadata

Slug linkfox-multimodal-recognize-image

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

FAQ