← Back to Skills Marketplace

image-reader

Name: image-reader
Author: simonjoe246

by simonjoe246 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

335

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install image-reader

Description

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac...

README (SKILL.md)

Image Reader Skill

Image recognition and understanding tool that leverages Doubao multimodal models to analyze image content.

Features

Text Extraction (OCR): Extract text from images, suitable for documents, screenshots, posters, menus, etc.
Image Description: Generate detailed descriptions of images, suitable for photos, illustrations, memes, UI screens, etc.
General Analysis: Automatically choose the best analysis strategy based on the image type.

API Configuration

Item	Value
API Endpoint	`https://ark.cn-beijing.volces.com/api/coding/v3`
Model	`doubao-seed-2.0-pro`
Authentication	API Key (configured in config.yaml)

Usage

Command Line

# General analysis
python image_reader.py /path/to/image.png

# Extract text (OCR)
python image_reader.py /path/to/image.png -p "Extract all text from the image"

# Describe the image
python image_reader.py /path/to/image.png -p "Describe this image in detail"

OpenClaw Skill Invocation

Once installed, you can invoke it using natural language:

Analyze this image
Extract the text from the image
Describe this screenshot

Output

Text-heavy images: Returns all extracted text, preserving original formatting.
Non-text images: Returns a detailed scene description, including objects, people, colors, style, etc.
Mixed content: Provides both text extraction and a visual description.

Technical Details

Uses an OpenAI-compatible API to call Doubao multimodal models
Images are sent as base64-encoded data
The system prompt adapts to the image type to select the most appropriate analysis strategy

Usage Guidance

This skill will upload the full image you provide (encoded as base64) to the API endpoint configured in config.yaml (default: https://ark.cn-beijing.volces.com/api/coding/v3). Before installing or using it: 1) Do not send sensitive images (passwords, government IDs, medical records, proprietary screenshots) unless you trust the remote provider and its privacy policy. 2) Store your API key securely (preferably not committed in plaintext config files); consider using an env var or secret manager and modifying the script to read it from there. 3) Verify the API endpoint and provider (ark.cn-beijing.volces.com / volces.com) and confirm you are comfortable with their data handling. 4) Note the script uses a very large max_tokens value (64000) — this may be unsupported or cause unexpected behavior/billing; consider lowering it. If you want higher confidence, ask the publisher for provenance (homepage, organization), a privacy statement, and explicit guidance on API key handling.

Capability Analysis

Type: OpenClaw Skill Name: image-reader Version: 1.0.0 The 'image-reader' skill is a legitimate tool designed to perform OCR and image description using the Volcengine (Doubao) multimodal API. The Python script (image_reader.py) safely handles image encoding and API communication using the standard OpenAI library, and the configuration (config.yaml) points to a valid service endpoint (ark.cn-beijing.volces.com) without any signs of data exfiltration, malicious execution, or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name, description, SKILL.md, config.yaml, README, and image_reader.py all align: the skill encodes images and calls an OpenAI-compatible multimodal model endpoint to perform OCR/description.

ℹ Instruction Scope

Runtime instructions and the script only read the included config.yaml and the image file, then send the image (base64 data URI) to the configured API endpoint. This is within scope for an image-analysis tool, but it means user images (potentially sensitive) are uploaded to a remote service; SKILL.md does not warn about privacy/PII implications.

✓ Install Mechanism

No install spec is provided (instruction-only plus a Python script). Dependencies are limited to openai and pyyaml as declared. No arbitrary downloads or extract operations are present.

ℹ Credentials

No environment variables are required, but an API key is expected in config.yaml. Storing an API key in a plaintext config file is functional but may be undesirable; README's claim that "default configuration is built in and can be used directly" is ambiguous and could encourage accidental use of embedded credentials if present.

✓ Persistence & Privilege

The skill is user-invocable, not always-enabled, and does not request elevated system privileges or modify other skill configs. It does not persist beyond its files.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install image-reader
After installation, invoke the skill by name or use /image-reader
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of the Image Reader Skill. - Supports OCR text extraction from images. - Generates detailed image descriptions for various image types. - Automatically selects the best analysis strategy based on image content. - Compatible with multimodal models(e.g. doubao-seed-2.0-pro, kimi-k2.5) via OpenAI-compatible API. - Offers both command-line usage and natural language skill invocation.

Metadata

Slug image-reader

Version 1.0.0

License MIT-0

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is image-reader?

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac... It is an AI Agent Skill for Claude Code / OpenClaw, with 335 downloads so far.

How do I install image-reader?

Run "/install image-reader" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is image-reader free?

Yes, image-reader is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image-reader support?

image-reader is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created image-reader?

It is built and maintained by simonjoe246 (@simonjoe246); the current version is v1.0.0.

More Skills