← 返回 Skills 市场

image-reader

Name: image-reader
Author: simonjoe246

作者 simonjoe246 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ 安全检测通过

335

总下载

当前安装

版本数

在 OpenClaw 中安装

/install image-reader

功能描述

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac...

使用说明 (SKILL.md)

Image Reader Skill

Image recognition and understanding tool that leverages Doubao multimodal models to analyze image content.

Features

Text Extraction (OCR): Extract text from images, suitable for documents, screenshots, posters, menus, etc.
Image Description: Generate detailed descriptions of images, suitable for photos, illustrations, memes, UI screens, etc.
General Analysis: Automatically choose the best analysis strategy based on the image type.

API Configuration

Item	Value
API Endpoint	`https://ark.cn-beijing.volces.com/api/coding/v3`
Model	`doubao-seed-2.0-pro`
Authentication	API Key (configured in config.yaml)

Usage

Command Line

# General analysis
python image_reader.py /path/to/image.png

# Extract text (OCR)
python image_reader.py /path/to/image.png -p "Extract all text from the image"

# Describe the image
python image_reader.py /path/to/image.png -p "Describe this image in detail"

OpenClaw Skill Invocation

Once installed, you can invoke it using natural language:

Analyze this image
Extract the text from the image
Describe this screenshot

Output

Text-heavy images: Returns all extracted text, preserving original formatting.
Non-text images: Returns a detailed scene description, including objects, people, colors, style, etc.
Mixed content: Provides both text extraction and a visual description.

Technical Details

Uses an OpenAI-compatible API to call Doubao multimodal models
Images are sent as base64-encoded data
The system prompt adapts to the image type to select the most appropriate analysis strategy

安全使用建议

This skill will upload the full image you provide (encoded as base64) to the API endpoint configured in config.yaml (default: https://ark.cn-beijing.volces.com/api/coding/v3). Before installing or using it: 1) Do not send sensitive images (passwords, government IDs, medical records, proprietary screenshots) unless you trust the remote provider and its privacy policy. 2) Store your API key securely (preferably not committed in plaintext config files); consider using an env var or secret manager and modifying the script to read it from there. 3) Verify the API endpoint and provider (ark.cn-beijing.volces.com / volces.com) and confirm you are comfortable with their data handling. 4) Note the script uses a very large max_tokens value (64000) — this may be unsupported or cause unexpected behavior/billing; consider lowering it. If you want higher confidence, ask the publisher for provenance (homepage, organization), a privacy statement, and explicit guidance on API key handling.

功能分析

Type: OpenClaw Skill Name: image-reader Version: 1.0.0 The 'image-reader' skill is a legitimate tool designed to perform OCR and image description using the Volcengine (Doubao) multimodal API. The Python script (image_reader.py) safely handles image encoding and API communication using the standard OpenAI library, and the configuration (config.yaml) points to a valid service endpoint (ark.cn-beijing.volces.com) without any signs of data exfiltration, malicious execution, or prompt injection.

能力评估

✓ Purpose & Capability

Name, description, SKILL.md, config.yaml, README, and image_reader.py all align: the skill encodes images and calls an OpenAI-compatible multimodal model endpoint to perform OCR/description.

ℹ Instruction Scope

Runtime instructions and the script only read the included config.yaml and the image file, then send the image (base64 data URI) to the configured API endpoint. This is within scope for an image-analysis tool, but it means user images (potentially sensitive) are uploaded to a remote service; SKILL.md does not warn about privacy/PII implications.

✓ Install Mechanism

No install spec is provided (instruction-only plus a Python script). Dependencies are limited to openai and pyyaml as declared. No arbitrary downloads or extract operations are present.

ℹ Credentials

No environment variables are required, but an API key is expected in config.yaml. Storing an API key in a plaintext config file is functional but may be undesirable; README's claim that "default configuration is built in and can be used directly" is ambiguous and could encourage accidental use of embedded credentials if present.

✓ Persistence & Privilege

The skill is user-invocable, not always-enabled, and does not request elevated system privileges or modify other skill configs. It does not persist beyond its files.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install image-reader
安装完成后，直接呼叫该 Skill 的名称或使用 /image-reader 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

- Initial release of the Image Reader Skill. - Supports OCR text extraction from images. - Generates detailed image descriptions for various image types. - Automatically selects the best analysis strategy based on image content. - Compatible with multimodal models(e.g. doubao-seed-2.0-pro, kimi-k2.5) via OpenAI-compatible API. - Offers both command-line usage and natural language skill invocation.

元数据

Slug image-reader

版本 1.0.0

许可证 MIT-0

累计安装 3

当前安装数 3

历史版本数 1

常见问题

image-reader 是什么？

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 335 次。

如何安装 image-reader？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install image-reader」即可一键安装，无需额外配置。

image-reader 是免费的吗？

是的，image-reader 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

image-reader 支持哪些平台？

image-reader 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 image-reader？

由 simonjoe246（@simonjoe246）开发并维护，当前版本 v1.0.0。