← Back to Skills Marketplace
simonjoe246

image-reader

by simonjoe246 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
335
Downloads
1
Stars
3
Active Installs
1
Versions
Install in OpenClaw
/install image-reader
Description
Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac...
README (SKILL.md)

Image Reader Skill

Image recognition and understanding tool that leverages Doubao multimodal models to analyze image content.


Features

  • Text Extraction (OCR): Extract text from images, suitable for documents, screenshots, posters, menus, etc.
  • Image Description: Generate detailed descriptions of images, suitable for photos, illustrations, memes, UI screens, etc.
  • General Analysis: Automatically choose the best analysis strategy based on the image type.

API Configuration

Item Value
API Endpoint https://ark.cn-beijing.volces.com/api/coding/v3
Model doubao-seed-2.0-pro
Authentication API Key (configured in config.yaml)

Usage

Command Line

# General analysis
python image_reader.py /path/to/image.png

# Extract text (OCR)
python image_reader.py /path/to/image.png -p "Extract all text from the image"

# Describe the image
python image_reader.py /path/to/image.png -p "Describe this image in detail"

OpenClaw Skill Invocation

Once installed, you can invoke it using natural language:

Analyze this image
Extract the text from the image
Describe this screenshot

Output

  • Text-heavy images: Returns all extracted text, preserving original formatting.
  • Non-text images: Returns a detailed scene description, including objects, people, colors, style, etc.
  • Mixed content: Provides both text extraction and a visual description.

Technical Details

  • Uses an OpenAI-compatible API to call Doubao multimodal models
  • Images are sent as base64-encoded data
  • The system prompt adapts to the image type to select the most appropriate analysis strategy
Usage Guidance
This skill will upload the full image you provide (encoded as base64) to the API endpoint configured in config.yaml (default: https://ark.cn-beijing.volces.com/api/coding/v3). Before installing or using it: 1) Do not send sensitive images (passwords, government IDs, medical records, proprietary screenshots) unless you trust the remote provider and its privacy policy. 2) Store your API key securely (preferably not committed in plaintext config files); consider using an env var or secret manager and modifying the script to read it from there. 3) Verify the API endpoint and provider (ark.cn-beijing.volces.com / volces.com) and confirm you are comfortable with their data handling. 4) Note the script uses a very large max_tokens value (64000) — this may be unsupported or cause unexpected behavior/billing; consider lowering it. If you want higher confidence, ask the publisher for provenance (homepage, organization), a privacy statement, and explicit guidance on API key handling.
Capability Analysis
Type: OpenClaw Skill Name: image-reader Version: 1.0.0 The 'image-reader' skill is a legitimate tool designed to perform OCR and image description using the Volcengine (Doubao) multimodal API. The Python script (image_reader.py) safely handles image encoding and API communication using the standard OpenAI library, and the configuration (config.yaml) points to a valid service endpoint (ark.cn-beijing.volces.com) without any signs of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name, description, SKILL.md, config.yaml, README, and image_reader.py all align: the skill encodes images and calls an OpenAI-compatible multimodal model endpoint to perform OCR/description.
Instruction Scope
Runtime instructions and the script only read the included config.yaml and the image file, then send the image (base64 data URI) to the configured API endpoint. This is within scope for an image-analysis tool, but it means user images (potentially sensitive) are uploaded to a remote service; SKILL.md does not warn about privacy/PII implications.
Install Mechanism
No install spec is provided (instruction-only plus a Python script). Dependencies are limited to openai and pyyaml as declared. No arbitrary downloads or extract operations are present.
Credentials
No environment variables are required, but an API key is expected in config.yaml. Storing an API key in a plaintext config file is functional but may be undesirable; README's claim that "default configuration is built in and can be used directly" is ambiguous and could encourage accidental use of embedded credentials if present.
Persistence & Privilege
The skill is user-invocable, not always-enabled, and does not request elevated system privileges or modify other skill configs. It does not persist beyond its files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install image-reader
  3. After installation, invoke the skill by name or use /image-reader
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of the Image Reader Skill. - Supports OCR text extraction from images. - Generates detailed image descriptions for various image types. - Automatically selects the best analysis strategy based on image content. - Compatible with multimodal models(e.g. doubao-seed-2.0-pro, kimi-k2.5) via OpenAI-compatible API. - Offers both command-line usage and natural language skill invocation.
Metadata
Slug image-reader
Version 1.0.0
License MIT-0
All-time Installs 3
Active Installs 3
Total Versions 1
Frequently Asked Questions

What is image-reader?

Image recognition and understanding tool. Uses a multimodal model (e.g. doubao-seed-2.0-pro, kimi-k2.5) to analyze image content and supports OCR text extrac... It is an AI Agent Skill for Claude Code / OpenClaw, with 335 downloads so far.

How do I install image-reader?

Run "/install image-reader" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is image-reader free?

Yes, image-reader is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image-reader support?

image-reader is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created image-reader?

It is built and maintained by simonjoe246 (@simonjoe246); the current version is v1.0.0.

💬 Comments