wu-uk

image-ocr

by wu-uk · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
68 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install jpg-ocr-stat-image-ocr
Description
Extract text content from images using Tesseract OCR via Python
README (SKILL.md)

Image OCR Skill

Purpose

This skill enables accurate text extraction from image files (JPG, PNG, etc.) using Tesseract OCR via the pytesseract Python library. It is suitable for scanned documents, screenshots, photos of text, receipts, forms, and other visual content containing text.

When to Use

  • Extracting text from scanned documents or photos
  • Reading text from screenshots or image captures
  • Processing batch image files that contain textual information
  • Converting visual documents to machine-readable text
  • Extracting structured data from forms, receipts, or tables in images

Required Libraries

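The following Python libraries are required. pytesseract and Pillow (imported as PIL) must be installed separately, and pytesseract in turn needs the Tesseract binary on PATH: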

import pytesseract
from PIL import Image
import json
import os

Input Requirements

  • File formats: JPG, JPEG, PNG, WEBP
  • Image quality: Minimum 300 DPI recommended for printed text; clear and legible text
  • File size: Under 5MB per image (resize if necessary)
  • Text language: Specify if non-English to improve accuracy
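
The format and size limits above can be checked before any OCR call. The following is a minimal sketch using only the standard library; the name check_input is hypothetical, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Limits taken from the Input Requirements list.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" read as binary megabytes


def check_input(image_path):
    """Return a list of problems; an empty list means the file passes."""
    path = Path(image_path)
    if not path.is_file():
        return [f"File not found: {path}"]
    problems = []
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"Unsupported extension: {path.suffix or '(none)'}")
    if path.stat().st_size >= MAX_BYTES:
        problems.append("File is 5MB or larger; resize before OCR")
    return problems
```

Any returned problems could be copied into the warnings array of the output.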

Output Schema

All extracted content must be returned as valid JSON conforming to this schema:

{
  "success": true,
  "filename": "example.jpg",
  "extracted_text": "Full raw text extracted from the image...",
  "confidence": "high|medium|low",
  "metadata": {
    "language_detected": "en",
    "text_regions": 3,
    "has_tables": false,
    "has_handwriting": false
  },
  "warnings": [
    "Text partially obscured in bottom-right corner",
    "Low contrast detected in header section"
  ]
}

Field Descriptions

  • success: Boolean indicating whether text extraction completed
  • filename: Original image filename
  • extracted_text: Complete text content in reading order (top-to-bottom, left-to-right)
  • confidence: Overall OCR confidence level based on image quality and text clarity
  • metadata.language_detected: ISO 639-1 language code (for example "en"); note that Tesseract's own lang parameter uses three-letter names such as eng
  • metadata.text_regions: Number of distinct text blocks identified
  • metadata.has_tables: Whether tabular data structures were detected
  • metadata.has_handwriting: Whether handwritten text was detected
  • warnings: Array of quality issues or potential errors
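
Because Tesseract language names are three letters (eng, fra, deu) while language_detected is described as an ISO 639-1 code, some mapping is needed. The sketch below is one possible approach: the table covers only a few common languages, the function name is hypothetical, and it reports the language that was requested rather than truly detecting one.

```python
# Tesseract language names -> ISO 639-1 codes (partial table; extend as needed).
TESSERACT_TO_ISO_639_1 = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "spa": "es",
    "ita": "it",
    "por": "pt",
}


def to_iso_639_1(lang_arg):
    """Map a Tesseract lang string such as 'eng+fra' to the ISO code of its first language."""
    first = lang_arg.split("+")[0].strip().lower()
    return TESSERACT_TO_ISO_639_1.get(first, "unknown")
```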

Code Examples

Basic OCR Extraction

import pytesseract
from PIL import Image

def extract_text_from_image(image_path):
    """Extract text from a single image using Tesseract OCR."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text.strip()

OCR with Confidence Data

import pytesseract
from PIL import Image

def extract_with_confidence(image_path):
    """Extract text with per-word confidence scores."""
    img = Image.open(image_path)

    # Get detailed OCR data including confidence
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    words = []
    confidences = []

    for i, word in enumerate(data['text']):
        if word.strip():  # Skip empty strings
            words.append(word)
            confidences.append(data['conf'][i])

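    # Calculate average confidence over recognised words only.
    # Tesseract reports -1 for non-text entries; float() guards against
    # pytesseract versions that return confidences as strings.
    valid = [float(c) for c in confidences if float(c) > 0]
    avg_confidence = sum(valid) / len(valid) if valid else 0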

    return {
        'text': ' '.join(words),
        'average_confidence': avg_confidence,
        'word_count': len(words)
    }

Full OCR with JSON Output

import pytesseract
from PIL import Image
import json
import os

def ocr_to_json(image_path):
    """Perform OCR and return results as JSON."""
    filename = os.path.basename(image_path)
    warnings = []

    try:
        img = Image.open(image_path)

        # Get detailed OCR data
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

        # Extract text preserving structure
        text = pytesseract.image_to_string(img)

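        # Calculate confidence (entries of -1 mark non-text boxes and are skipped;
        # float() guards against pytesseract versions that return strings)
        confidences = [float(c) for c in data['conf'] if float(c) > 0]
        avg_conf = sum(confidences) / len(confidences) if confidences else 0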

        # Determine confidence level
        if avg_conf >= 80:
            confidence = "high"
        elif avg_conf >= 50:
            confidence = "medium"
        else:
            confidence = "low"
            warnings.append(f"Low OCR confidence: {avg_conf:.1f}%")

        # Count text regions (blocks)
        block_nums = set(data['block_num'])
        text_regions = len([b for b in block_nums if b > 0])

        result = {
            "success": True,
            "filename": filename,
            "extracted_text": text.strip(),
            "confidence": confidence,
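            "metadata": {
                # Placeholders: this example does not detect the language,
                # tables, or handwriting, so fixed defaults are reported.
                "language_detected": "en",
                "text_regions": text_regions,
                "has_tables": False,
                "has_handwriting": False
            },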
            "warnings": warnings
        }

    except Exception as e:
        result = {
            "success": False,
            "filename": filename,
            "extracted_text": "",
            "confidence": "low",
            "metadata": {
                "language_detected": "unknown",
                "text_regions": 0,
                "has_tables": False,
                "has_handwriting": False
            },
            "warnings": [f"OCR failed: {str(e)}"]
        }

    return result

# Usage
result = ocr_to_json("document.jpg")
print(json.dumps(result, indent=2))

Batch Processing Multiple Images

import json
from pathlib import Path

# Relies on ocr_to_json() from the "Full OCR with JSON Output" example.

def process_image_directory(directory_path, output_file):
    """Process all images in a directory and save results."""
    image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
    results = []

    for file_path in sorted(Path(directory_path).iterdir()):
        if file_path.suffix.lower() in image_extensions:
            result = ocr_to_json(str(file_path))
            results.append(result)
            print(f"Processed: {file_path.name}")

    # Save results
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)

    return results

Tesseract Configuration Options

Language Selection

# Specify language (default is English)
text = pytesseract.image_to_string(img, lang='eng')

# Multiple languages
text = pytesseract.image_to_string(img, lang='eng+fra+deu')
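
A multi-language string only works if each language pack is installed. The sketch below builds the lang argument from a list of requested languages and a collection of available ones; the function name is hypothetical. In recent pytesseract releases the available list can be obtained with pytesseract.get_languages(), but that call is left out here so the helper runs without Tesseract.

```python
def build_lang_arg(requested, available):
    """Join the requested languages that are installed into an 'a+b' string.

    Raises ValueError if none of the requested languages is available.
    """
    usable = [lang for lang in requested if lang in available]
    if not usable:
        raise ValueError(
            f"None of {list(requested)} is installed; available: {sorted(available)}"
        )
    return "+".join(usable)


# Example with a hard-coded list of installed languages (an assumption).
lang = build_lang_arg(["eng", "fra"], ["eng", "fra", "deu", "osd"])
```

The result can be passed as lang=lang to image_to_string.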

Page Segmentation Modes (PSM)

Use --psm to control how Tesseract segments the image:

# PSM 3: Fully automatic page segmentation (default)
text = pytesseract.image_to_string(img, config='--psm 3')

# PSM 4: Assume single column of text
text = pytesseract.image_to_string(img, config='--psm 4')

# PSM 6: Assume uniform block of text
text = pytesseract.image_to_string(img, config='--psm 6')

# PSM 11: Sparse text - find as much text as possible
text = pytesseract.image_to_string(img, config='--psm 11')

Common PSM values:

  • 0: Orientation and script detection (OSD) only
  • 3: Fully automatic page segmentation (default)
  • 4: Single column of text of variable sizes
  • 6: Uniform block of text
  • 7: Single text line
  • 11: Sparse text
  • 13: Raw line
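
One way to avoid scattering PSM numbers through calling code is a small lookup keyed by layout. This is only a sketch: the layout names are hypothetical labels, while the numbers are the PSM values listed above.

```python
# Hypothetical layout labels mapped to the PSM values listed above.
PSM_BY_LAYOUT = {
    "auto": 3,
    "single_column": 4,
    "uniform_block": 6,
    "single_line": 7,
    "sparse": 11,
    "raw_line": 13,
}


def psm_config(layout="auto"):
    """Return a config string such as '--psm 6' for a named layout."""
    if layout not in PSM_BY_LAYOUT:
        raise ValueError(
            f"Unknown layout {layout!r}; expected one of {sorted(PSM_BY_LAYOUT)}"
        )
    return f"--psm {PSM_BY_LAYOUT[layout]}"
```

The returned string can be passed as the config argument, for example config=psm_config("single_line").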

Image Preprocessing

For better OCR accuracy, preprocess images:

import pytesseract
from PIL import Image, ImageFilter, ImageOps

def preprocess_image(image_path):
    """Preprocess image for better OCR results."""
    img = Image.open(image_path)

    # Convert to grayscale
    img = img.convert('L')

    # Increase contrast
    img = ImageOps.autocontrast(img)

    # Apply slight sharpening
    img = img.filter(ImageFilter.SHARPEN)

    return img

# Use preprocessed image for OCR
img = preprocess_image("document.jpg")
text = pytesseract.image_to_string(img)

Advanced Preprocessing Strategies

For difficult images (low contrast, faded text, dark backgrounds), try multiple preprocessing approaches:

  1. Grayscale + Autocontrast - Basic enhancement for most images
  2. Inverted - Use ImageOps.invert() for dark backgrounds with light text
  3. Scaling - Upscale small images (e.g., 2x) before OCR to improve character recognition
  4. Thresholding - Convert to binary using img.point(lambda p: 255 if p > threshold else 0) with different threshold values (e.g., 100, 128)
  5. Sharpening - Apply ImageFilter.SHARPEN to improve edge clarity
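
The inversion, scaling, and thresholding strategies above can be sketched as follows. This version expresses inversion and thresholding as 256-entry lookup tables for Image.point() rather than a lambda or ImageOps.invert(); that substitution, the function names, and the 2x factor are choices of this sketch. It assumes gray_img is a Pillow image already converted to mode "L".

```python
def threshold_table(threshold=128):
    """256-entry lookup table: pixels above `threshold` become white, the rest black."""
    return [255 if p > threshold else 0 for p in range(256)]


def invert_table():
    """256-entry lookup table that swaps light and dark values."""
    return [255 - p for p in range(256)]


def make_variants(gray_img, thresholds=(100, 128)):
    """Yield (label, image) pairs for a grayscale ('L' mode) Pillow image."""
    yield "original", gray_img
    yield "inverted", gray_img.point(invert_table())
    # Upscale 2x using Pillow's default resampling filter.
    width, height = gray_img.size
    yield "scaled_2x", gray_img.resize((width * 2, height * 2))
    for t in thresholds:
        yield f"threshold_{t}", gray_img.point(threshold_table(t))
```

Each variant can then be passed to pytesseract.image_to_string, with the best result chosen by confidence or length.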

Multi-Pass OCR Strategy

For challenging images, a single OCR pass may miss text. Use multiple passes with different configurations:

  1. Try multiple PSM modes - Different page segmentation modes work better for different layouts (e.g., --psm 6 for blocks, --psm 4 for columns, --psm 11 for sparse text)

  2. Try multiple preprocessing variants - Run OCR on several preprocessed versions of the same image

  3. Combine results - Aggregate text from all passes to maximize extraction coverage

import pytesseract
from PIL import Image, ImageFilter, ImageOps

def multi_pass_ocr(image_path):
    """Run OCR with multiple strategies and combine results."""
    img = Image.open(image_path)
    gray = ImageOps.grayscale(img)

    # Generate preprocessing variants
    variants = [
        ImageOps.autocontrast(gray),
        ImageOps.invert(ImageOps.autocontrast(gray)),
        gray.filter(ImageFilter.SHARPEN),
    ]

    # PSM modes to try
    psm_modes = ['--psm 6', '--psm 4', '--psm 11']

    all_text = []
    for variant in variants:
        for psm in psm_modes:
            try:
                text = pytesseract.image_to_string(variant, config=psm)
                if text.strip():
                    all_text.append(text)
            except Exception:
                pass

    # Combine all extracted text
    return "\n".join(all_text)

This approach improves extraction for receipts, faded documents, and images with varying quality.
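
Concatenating every pass repeats any line that several passes recognised. A possible refinement, sketched below under the assumption that exact duplicates (ignoring case and whitespace) should be dropped, keeps the first occurrence of each line. The function name is hypothetical, and the merged text follows pass order, so it may not preserve the page's reading order.

```python
def merge_passes(texts):
    """Merge OCR outputs line by line, keeping the first occurrence of each line."""
    seen = set()
    merged = []
    for text in texts:
        for line in text.splitlines():
            # Normalise whitespace and case so near-identical lines match.
            key = " ".join(line.split()).lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(line.strip())
    return "\n".join(merged)
```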

Error Handling

Common Issues and Solutions

Issue: Tesseract not found

# Verify Tesseract is installed
try:
    pytesseract.get_tesseract_version()
except pytesseract.TesseractNotFoundError:
    print("Tesseract is not installed or not in PATH")

Issue: Poor OCR quality

  • Preprocess image (grayscale, contrast, sharpen)
  • Use appropriate PSM mode for the document type
  • Ensure image resolution is sufficient (300+ DPI)

Issue: Empty or garbage output

  • Check if image contains actual text
  • Try different PSM modes
  • Verify image is not corrupted
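
Empty or garbage output can be flagged with a simple character-ratio heuristic before deciding whether to retry with another PSM mode. The sketch below is illustrative only: the function name and both thresholds are assumptions, and text made up mostly of punctuation or symbols would be misclassified.

```python
def looks_like_garbage(text, min_alnum_ratio=0.5, min_chars=3):
    """Heuristic: True when OCR output is near-empty or mostly non-alphanumeric."""
    visible = [ch for ch in text if not ch.isspace()]
    if len(visible) < min_chars:
        return True
    alnum = sum(ch.isalnum() for ch in visible)
    return alnum / len(visible) < min_alnum_ratio
```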

Quality Self-Check

Before returning results, verify:

  • Output is valid JSON (use json.loads() to validate)
  • All required fields are present (success, filename, extracted_text, confidence, metadata)
  • Text preserves logical reading order
  • Confidence level reflects actual OCR quality
  • Warnings array includes all detected issues
  • Special characters are properly escaped in JSON
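
The checks above that are mechanical (required fields, allowed confidence values, JSON validity) can be automated. The following is a minimal sketch; validate_result is a hypothetical name, and treating warnings as a required field is an assumption based on the Output Schema example.

```python
import json

# Field names and types taken from the Output Schema example.
REQUIRED_FIELDS = {
    "success": bool,
    "filename": str,
    "extracted_text": str,
    "confidence": str,
    "metadata": dict,
    "warnings": list,
}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def validate_result(result):
    """Return a list of schema problems; an empty list means the result passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    if "confidence" in result and result["confidence"] not in CONFIDENCE_LEVELS:
        problems.append("confidence must be high, medium, or low")
    try:
        # Round-trip through JSON to confirm the result is serialisable.
        json.loads(json.dumps(result))
    except (TypeError, ValueError) as exc:
        problems.append(f"not JSON-serialisable: {exc}")
    return problems
```

Reading order and whether the confidence level reflects real quality still need a human or a separate check.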

Limitations

  • Tesseract works best with printed text; handwriting recognition is limited
  • Accuracy decreases with decorative fonts, artistic text, or extreme stylization
  • Mathematical equations and special notation may not extract accurately
  • Redacted or watermarked text cannot be recovered
  • Severe image degradation (blur, noise, low resolution) reduces accuracy
  • Complex multi-column layouts may require custom PSM configuration

Version History

  • 1.0.0 (2026-01-13): Initial release with Tesseract/pytesseract OCR
Usage Guidance
This skill appears to be an OCR helper built on pytesseract, which requires the system Tesseract binary and appropriate language/data packs, but the skill metadata incorrectly lists no required binaries or install steps. Before installing or using it:
  1. Ensure the host has the tesseract executable installed and in PATH (plus language packs if you need non-English OCR).
  2. Install the Python dependencies (pytesseract, pillow) in a virtual environment.
  3. Be cautious when processing untrusted images (use a sandbox) and avoid sending sensitive images to external endpoints. The skill's instructions don't exfiltrate data, but your agent or surrounding automation might.
  4. If you expect batch processing or specific language support, ask the author to add explicit install and dependency instructions and to declare the tesseract binary as a required dependency.
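
The prerequisites named in this guidance can be checked up front with the standard library. This is a sketch: check_environment is a hypothetical helper, and it only tests whether the binary and packages can be located, not whether they work or which language packs are present.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the Tesseract binary and the Python packages can be found."""
    return {
        # Looks for an executable named "tesseract" on PATH.
        "tesseract_binary": shutil.which("tesseract") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Pillow is imported under the package name PIL.
        "pillow": importlib.util.find_spec("PIL") is not None,
    }


missing = [name for name, found in check_environment().items() if not found]
```

An empty missing list suggests the skill's examples have what they need to run.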
Capability Analysis
Type: OpenClaw Skill
Name: jpg-ocr-stat-image-ocr
Version: 0.1.0
The skill bundle provides standard Optical Character Recognition (OCR) functionality using the Tesseract engine and the Pillow library. The code in SKILL.md implements typical image preprocessing, batch processing, and multi-pass OCR strategies without any signs of malicious intent, data exfiltration, or unauthorized system access.
Capability Assessment
Purpose & Capability
The skill claims to perform Tesseract OCR via Python (pytesseract). That requires the Tesseract engine binary to be installed on the host, plus appropriate language/data packs for non-English OCR, but the registry metadata lists no required binaries or install steps. This mismatch (the instructions need a system binary but the metadata declares none) is an inconsistency.
Instruction Scope
The SKILL.md stays within the OCR scope: it shows example code reading image files, calling pytesseract, and returning JSON. It does not request unrelated files, credentials, or external endpoints. Minor omissions: it doesn't instruct installing or verifying the tesseract binary, nor does it detail handling language packs or potentially malicious/untrusted image inputs.
Install Mechanism
There is no install specification (instruction-only), which is low risk. However, because the instructions rely on external system software (Tesseract) and Python libraries, the absence of guidance on how to install or verify those components is a practical gap that could lead to failed runs or incorrect assumptions about required privileges.
Credentials
The skill requests no environment variables, credentials, or config paths — appropriate for a local OCR utility that processes image files. No excessive or unrelated secrets are requested.
Persistence & Privilege
The skill is not always-enabled and is user-invocable; it does not request persistent system presence or modify other skills. Autonomous invocation is allowed (platform default) but does not combine with other high-risk flags here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install jpg-ocr-stat-image-ocr
  3. After installation, invoke the skill by name or use /jpg-ocr-stat-image-ocr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Bulk publish from all-task-skills-dedup
Metadata
Slug jpg-ocr-stat-image-ocr
Version 0.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is image-ocr?

Extract text content from images using Tesseract OCR via Python. It is an AI Agent Skill for Claude Code / OpenClaw, with 68 downloads so far.

How do I install image-ocr?

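Run "/install jpg-ocr-stat-image-ocr" in the OpenClaw or Claude Code chat to install the skill in one step. The host also needs the Tesseract binary and the Python packages pytesseract and pillow, which the install command does not provide.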

Is image-ocr free?

Yes, image-ocr is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image-ocr support?

image-ocr is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created image-ocr?

It is built and maintained by wu-uk (@wu-uk); the current version is v0.1.0.
