Description

Extract text content from images using Tesseract OCR via Python

README (SKILL.md)

Image OCR Skill

Name: image-ocr
Author: wu-uk

Purpose

This skill enables accurate text extraction from image files (JPG, PNG, etc.) using Tesseract OCR via the pytesseract Python library. It is suitable for scanned documents, screenshots, photos of text, receipts, forms, and other visual content containing text.

When to Use

Extracting text from scanned documents or photos
Reading text from screenshots or image captures
Processing batch image files that contain textual information
Converting visual documents to machine-readable text
Extracting structured data from forms, receipts, or tables in images

Required Libraries

The following Python libraries are required:

import pytesseract
from PIL import Image
import json
import os

Input Requirements

File formats: JPG, JPEG, PNG, WEBP
Image quality: Minimum 300 DPI recommended for printed text; clear and legible text
File size: Under 5MB per image (resize if necessary)
Text language: Specify if non-English to improve accuracy

Output Schema

All extracted content must be returned as valid JSON conforming to this schema:

{
  "success": true,
  "filename": "example.jpg",
  "extracted_text": "Full raw text extracted from the image...",
  "confidence": "high|medium|low",
  "metadata": {
    "language_detected": "en",
    "text_regions": 3,
    "has_tables": false,
    "has_handwriting": false
  },
  "warnings": [
    "Text partially obscured in bottom-right corner",
    "Low contrast detected in header section"
  ]
}

Field Descriptions

success: Boolean indicating whether text extraction completed
filename: Original image filename
extracted_text: Complete text content in reading order (top-to-bottom, left-to-right)
confidence: Overall OCR confidence level based on image quality and text clarity
metadata.language_detected: ISO 639-1 language code
metadata.text_regions: Number of distinct text blocks identified
metadata.has_tables: Whether tabular data structures were detected
metadata.has_handwriting: Whether handwritten text was detected
warnings: Array of quality issues or potential errors

Code Examples

Basic OCR Extraction

import pytesseract
from PIL import Image

def extract_text_from_image(image_path):
    """Extract text from a single image using Tesseract OCR."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text.strip()

OCR with Confidence Data

import pytesseract
from PIL import Image

def extract_with_confidence(image_path):
    """Extract text with per-word confidence scores."""
    img = Image.open(image_path)

    # Get detailed OCR data including confidence
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    words = []
    confidences = []

    for i, word in enumerate(data['text']):
        if word.strip():  # Skip empty strings
            words.append(word)
            confidences.append(data['conf'][i])

    # Calculate average confidence
    avg_confidence = sum(c for c in confidences if c > 0) / len([c for c in confidences if c > 0]) if confidences else 0

    return {
        'text': ' '.join(words),
        'average_confidence': avg_confidence,
        'word_count': len(words)
    }

Full OCR with JSON Output

import pytesseract
from PIL import Image
import json
import os

def ocr_to_json(image_path):
    """Perform OCR and return results as JSON."""
    filename = os.path.basename(image_path)
    warnings = []

    try:
        img = Image.open(image_path)

        # Get detailed OCR data
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

        # Extract text preserving structure
        text = pytesseract.image_to_string(img)

        # Calculate confidence
        confidences = [c for c in data['conf'] if c > 0]
        avg_conf = sum(confidences) / len(confidences) if confidences else 0

        # Determine confidence level
        if avg_conf >= 80:
            confidence = "high"
        elif avg_conf >= 50:
            confidence = "medium"
        else:
            confidence = "low"
            warnings.append(f"Low OCR confidence: {avg_conf:.1f}%")

        # Count text regions (blocks)
        block_nums = set(data['block_num'])
        text_regions = len([b for b in block_nums if b > 0])

        result = {
            "success": True,
            "filename": filename,
            "extracted_text": text.strip(),
            "confidence": confidence,
            "metadata": {
                "language_detected": "en",
                "text_regions": text_regions,
                "has_tables": False,
                "has_handwriting": False
            },
            "warnings": warnings
        }

    except Exception as e:
        result = {
            "success": False,
            "filename": filename,
            "extracted_text": "",
            "confidence": "low",
            "metadata": {
                "language_detected": "unknown",
                "text_regions": 0,
                "has_tables": False,
                "has_handwriting": False
            },
            "warnings": [f"OCR failed: {str(e)}"]
        }

    return result

# Usage
result = ocr_to_json("document.jpg")
print(json.dumps(result, indent=2))

Batch Processing Multiple Images

import pytesseract
from PIL import Image
import json
import os
from pathlib import Path

def process_image_directory(directory_path, output_file):
    """Process all images in a directory and save results."""
    image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
    results = []

    for file_path in sorted(Path(directory_path).iterdir()):
        if file_path.suffix.lower() in image_extensions:
            result = ocr_to_json(str(file_path))
            results.append(result)
            print(f"Processed: {file_path.name}")

    # Save results
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)

    return results

Tesseract Configuration Options

Language Selection

# Specify language (default is English)
text = pytesseract.image_to_string(img, lang='eng')

# Multiple languages
text = pytesseract.image_to_string(img, lang='eng+fra+deu')

Page Segmentation Modes (PSM)

Use --psm to control how Tesseract segments the image:

# PSM 3: Fully automatic page segmentation (default)
text = pytesseract.image_to_string(img, config='--psm 3')

# PSM 4: Assume single column of text
text = pytesseract.image_to_string(img, config='--psm 4')

# PSM 6: Assume uniform block of text
text = pytesseract.image_to_string(img, config='--psm 6')

# PSM 11: Sparse text - find as much text as possible
text = pytesseract.image_to_string(img, config='--psm 11')

Common PSM values:

0: Orientation and script detection (OSD) only
3: Fully automatic page segmentation (default)
4: Single column of text of variable sizes
6: Uniform block of text
7: Single text line
11: Sparse text
13: Raw line

Image Preprocessing

For better OCR accuracy, preprocess images:

from PIL import Image, ImageFilter, ImageOps

def preprocess_image(image_path):
    """Preprocess image for better OCR results."""
    img = Image.open(image_path)

    # Convert to grayscale
    img = img.convert('L')

    # Increase contrast
    img = ImageOps.autocontrast(img)

    # Apply slight sharpening
    img = img.filter(ImageFilter.SHARPEN)

    return img

# Use preprocessed image for OCR
img = preprocess_image("document.jpg")
text = pytesseract.image_to_string(img)

Advanced Preprocessing Strategies

For difficult images (low contrast, faded text, dark backgrounds), try multiple preprocessing approaches:

Grayscale + Autocontrast - Basic enhancement for most images
Inverted - Use ImageOps.invert() for dark backgrounds with light text
Scaling - Upscale small images (e.g., 2x) before OCR to improve character recognition
Thresholding - Convert to binary using img.point(lambda p: 255 if p > threshold else 0) with different threshold values (e.g., 100, 128)
Sharpening - Apply ImageFilter.SHARPEN to improve edge clarity

Multi-Pass OCR Strategy

For challenging images, a single OCR pass may miss text. Use multiple passes with different configurations:

Try multiple PSM modes - Different page segmentation modes work better for different layouts (e.g., --psm 6 for blocks, --psm 4 for columns, --psm 11 for sparse text)
Try multiple preprocessing variants - Run OCR on several preprocessed versions of the same image
Combine results - Aggregate text from all passes to maximize extraction coverage

def multi_pass_ocr(image_path):
    """Run OCR with multiple strategies and combine results."""
    img = Image.open(image_path)
    gray = ImageOps.grayscale(img)

    # Generate preprocessing variants
    variants = [
        ImageOps.autocontrast(gray),
        ImageOps.invert(ImageOps.autocontrast(gray)),
        gray.filter(ImageFilter.SHARPEN),
    ]

    # PSM modes to try
    psm_modes = ['--psm 6', '--psm 4', '--psm 11']

    all_text = []
    for variant in variants:
        for psm in psm_modes:
            try:
                text = pytesseract.image_to_string(variant, config=psm)
                if text.strip():
                    all_text.append(text)
            except Exception:
                pass

    # Combine all extracted text
    return "\
".join(all_text)

This approach improves extraction for receipts, faded documents, and images with varying quality.

Error Handling

Common Issues and Solutions

Issue: Tesseract not found

# Verify Tesseract is installed
try:
    pytesseract.get_tesseract_version()
except pytesseract.TesseractNotFoundError:
    print("Tesseract is not installed or not in PATH")

Issue: Poor OCR quality

Preprocess image (grayscale, contrast, sharpen)
Use appropriate PSM mode for the document type
Ensure image resolution is sufficient (300+ DPI)

Issue: Empty or garbage output

Check if image contains actual text
Try different PSM modes
Verify image is not corrupted

Quality Self-Check

Before returning results, verify:

Output is valid JSON (use json.loads() to validate)
All required fields are present (success, filename, extracted_text, confidence, metadata)
Text preserves logical reading order
Confidence level reflects actual OCR quality
Warnings array includes all detected issues
Special characters are properly escaped in JSON

Limitations

Tesseract works best with printed text; handwriting recognition is limited
Accuracy decreases with decorative fonts, artistic text, or extreme stylization
Mathematical equations and special notation may not extract accurately
Redacted or watermarked text cannot be recovered
Severe image degradation (blur, noise, low resolution) reduces accuracy
Complex multi-column layouts may require custom PSM configuration

Version History

1.0.0 (2026-01-13): Initial release with Tesseract/pytesseract OCR

Usage Guidance

This skill appears to actually be an OCR helper using pytesseract, which requires the system Tesseract binary and appropriate language/data packs — but the skill metadata incorrectly lists no required binaries or install steps. Before installing or using it: 1) Ensure the host has the tesseract executable installed and in PATH (and language packs if you need non-English OCR). 2) Install the Python deps (pytesseract, pillow) in a virtual environment. 3) Be cautious processing untrusted images (use a sandbox) and avoid sending sensitive images to external endpoints — the skill's instructions don't exfiltrate data, but your agent or surrounding automation might. 4) If you expect batch processing or specific language support, ask the author to add explicit install and dependency instructions and to declare the tesseract binary as a required dependency.

Capability Analysis

Type: OpenClaw Skill Name: jpg-ocr-stat-image-ocr Version: 0.1.0 The skill bundle provides standard Optical Character Recognition (OCR) functionality using the Tesseract engine and the Pillow library. The code in SKILL.md implements typical image preprocessing, batch processing, and multi-pass OCR strategies without any signs of malicious intent, data exfiltration, or unauthorized system access.

Capability Assessment

⚠ Purpose & Capability

The skill claims to perform Tesseract OCR via Python (pytesseract). That requires the Tesseract engine binary to be installed on the host and appropriate language/data packs for non-English OCR, but the registry metadata lists no required binaries or install steps. This mismatch (instruction needs a system binary but metadata says none) is an incoherence.

ℹ Instruction Scope

The SKILL.md stays within the OCR scope: it shows example code reading image files, calling pytesseract, and returning JSON. It does not request unrelated files, credentials, or external endpoints. Minor omissions: it doesn't instruct installing or verifying the tesseract binary, nor does it detail handling language packs or potentially malicious/untrusted image inputs.

ℹ Install Mechanism

There is no install specification (instruction-only), which is low risk. However, because the instructions rely on external system software (Tesseract) and Python libraries, the absence of guidance on how to install or verify those components is a practical gap that could lead to failed runs or incorrect assumptions about required privileges.

✓ Credentials

The skill requests no environment variables, credentials, or config paths — appropriate for a local OCR utility that processes image files. No excessive or unrelated secrets are requested.

✓ Persistence & Privilege

The skill is not always-enabled and is user-invocable; it does not request persistent system presence or modify other skills. Autonomous invocation is allowed (platform default) but does not combine with other high-risk flags here.

Version History

v0.1.0

Bulk publish from all-task-skills-dedup

Metadata

Slug jpg-ocr-stat-image-ocr

Version 0.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is image-ocr?

Extract text content from images using Tesseract OCR via Python. It is an AI Agent Skill for Claude Code / OpenClaw, with 68 downloads so far.

How do I install image-ocr?

Run "/install jpg-ocr-stat-image-ocr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is image-ocr free?

Yes, image-ocr is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image-ocr support?

image-ocr is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created image-ocr?

It is built and maintained by wu-uk (@wu-uk); the current version is v0.1.0.

More Skills

image-ocr

Image OCR Skill

Purpose

When to Use

Required Libraries

Input Requirements

Output Schema

Field Descriptions

Code Examples

Basic OCR Extraction

OCR with Confidence Data

Full OCR with JSON Output

Batch Processing Multiple Images

Tesseract Configuration Options

Language Selection

Page Segmentation Modes (PSM)

Image Preprocessing

Advanced Preprocessing Strategies

Multi-Pass OCR Strategy

Error Handling

Common Issues and Solutions

Quality Self-Check

Limitations

Version History

What is image-ocr?

How do I install image-ocr?

Is image-ocr free?

Which platforms does image-ocr support?

Who created image-ocr?

💬 Comments