
OCR Locally

by ltryee · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 68 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install ocr-locally
Description
[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks...
README (SKILL.md)

Local OCR (macOS Only)

Overview

⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses macOS native frameworks:

  • Vision framework - For OCR text recognition
  • PDFKit framework - For PDF processing
  • Core Graphics - For image rendering

This skill provides OCR (Optical Character Recognition) using the native macOS Vision framework. It extracts text from images and PDFs without third-party libraries or an internet connection.

Platform Requirements

⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.

Required:

  • macOS 10.15+ (Catalina or later)
  • Vision framework (pre-installed on macOS)
  • PDFKit framework (pre-installed on macOS)

Why macOS Only?

  • Uses Vision framework for OCR (macOS/iOS only)
  • Uses PDFKit framework for PDF processing (macOS/iOS only)
  • Uses AppKit/Core Graphics for image handling (macOS only)
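
Because of these requirements, it can help to check the platform before invoking any script. The following is a minimal sketch, not part of the skill: the function names version_at_least and check_platform are hypothetical, and the only external commands it relies on are uname and the macOS sw_vers tool.

```shell
# Returns 0 when dotted version $1 is >= dotted version $2.
# Compares up to three numeric fields; missing fields count as 0.
# Note: POSIX sh has no local variables, so these names are global.
version_at_least() {
    have=$1
    need=$2
    i=0
    while [ "$i" -lt 3 ]; do
        h=${have%%.*}
        n=${need%%.*}
        [ "${h:-0}" -gt "${n:-0}" ] && return 0
        [ "${h:-0}" -lt "${n:-0}" ] && return 1
        # Drop the field just compared.
        case $have in *.*) have=${have#*.} ;; *) have=0 ;; esac
        case $need in *.*) need=${need#*.} ;; *) need=0 ;; esac
        i=$((i + 1))
    done
    return 0
}

# usage: check_platform [OS_NAME] [OS_VERSION]
# Both arguments default to the live system values; pass them to test.
check_platform() {
    os=${1:-$(uname -s)}
    if [ "$os" != "Darwin" ]; then
        echo "unsupported: $os (macOS required)"
        return 1
    fi
    ver=${2:-$(sw_vers -productVersion)}
    if ! version_at_least "$ver" 10.15; then
        echo "unsupported: macOS $ver (10.15+ required)"
        return 1
    fi
    echo "ok: macOS $ver"
}
```

Calling check_platform with no arguments inspects the current machine; a non-zero exit status means the OCR scripts should not be attempted.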

When to Use This Skill

Trigger this skill when the user:

  • Requests OCR or image text extraction
  • Mentions extracting text from images, screenshots, PDF files, or scanned documents
  • Uses keywords like: "识别图片", "OCR", "提取文字", "提取PDF文字", "识别PDF", "extract text from image", "PDF OCR"
  • Provides an image file or PDF file and asks to read or extract its content

Core Capabilities

1. Text Extraction from Images

Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:

  • Multi-language support (Chinese, English, Japanese, Korean, and more)
  • Two output modes (mutually exclusive):
    • Text Mode (-t): Output only extracted text (default)
    • JSON Mode (-j): Output complete raw info including text, position, and confidence as JSON
  • Confidence scores for each detected text block
  • Bounding box information (text position in image)
  • Output to console or file
  • Precise or fast recognition modes

Basic usage:

swift scripts/ocr_vision_pro.swift <image_path>

With options:

swift scripts/ocr_vision_pro.swift <image_path> -l zh-Hans,en -o output.txt -f

2. Text Extraction from PDF Files

Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:

  • Extract text from specific pages or all pages
  • Support page range specification (e.g., 1-5, 1,3,5)
  • Two output modes (mutually exclusive):
    • Text Mode (-t): Output only extracted text (default)
    • JSON Mode (-j): Output complete raw info as JSON
  • Same multi-language support as image OCR
  • Precise or fast recognition modes

Basic usage (all pages):

swift scripts/pdf_ocr.swift <pdf_path>

With page specification:

# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1

# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j
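
To validate a page specification before passing it to -p, a caller could expand it first. This is a sketch under assumptions: expand_pages is a hypothetical helper, pages are assumed to be 1-based, and mixing ranges with single pages (for example 1-3,5) is accepted here even though the README only shows the forms 1, 1,3,5, and 1-5.

```shell
# usage: expand_pages SPEC
# Prints the pages named by SPEC as a space-separated list,
# or returns 1 with a message on stderr if SPEC is malformed.
expand_pages() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        # Take the next comma-separated part.
        part=${rest%%,*}
        case $rest in *,*) rest=${rest#*,} ;; *) rest= ;; esac
        # Split "a-b" into its endpoints; a single page is its own range.
        case $part in
            *-*) first=${part%%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        # Each endpoint must be a positive integer without leading zeros.
        for v in "$first" "$last"; do
            case $v in
                ''|0*|*[!0-9]*)
                    echo "bad page spec: $part" >&2
                    return 1 ;;
            esac
        done
        if [ "$first" -gt "$last" ]; then
            echo "bad page range: $part" >&2
            return 1
        fi
        n=$first
        while [ "$n" -le "$last" ]; do
            out="$out${out:+ }$n"
            n=$((n + 1))
        done
    done
    printf '%s\n' "$out"
}
```

For example, expand_pages 1-3,5 prints 1 2 3 5, while expand_pages 5-1 fails.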

3. Output Modes (Mutually Exclusive)

The script supports two output modes that cannot be used simultaneously:

Text Mode (Default, -t)

Outputs only the extracted text:

  • Console output: Pure text
  • File output (-o or -t): Saves text to file, optionally with separate confidence file

JSON Mode (-j)

Outputs complete raw information as JSON:

  • Contains: image path, total blocks, average confidence, and per-block details
  • Per-block info: index, text, confidence, bounding box (x, y, width, height)
  • Outputs to stdout only (no file output options in JSON mode)

JSON output structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}

4. Supported File Formats

Image Formats (ocr_vision_pro.swift):

  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • TIFF (.tiff, .tif)
  • BMP (.bmp)

PDF Format (pdf_ocr.swift):

  • PDF (.pdf) - supports a single page, multiple pages, or page ranges
  • Specify pages with -p option: 1, 1,3,5, or 1-5

5. Command-Line Options

Option                     Description
-h, --help                 Show help information
-t, --text                 Text mode (default, output only extracted text)
-j, --json                 JSON mode (output complete raw info as JSON)
-l, --language <lang>      Specify recognition language (comma-separated)
-o, --output <file>        Output text to file, auto-generate confidence file (<file>_confidence.txt)
-t, --text <file>          Output only complete text to specified file (text mode)
-c, --confidence <file>    Output only confidence details to specified file (text mode)
-f, --fast                 Use fast mode (default: precise mode)

Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.

Supported languages:

  • zh-Hans - Simplified Chinese
  • zh-Hant - Traditional Chinese
  • en - English
  • ja - Japanese
  • ko - Korean
  • fr - French
  • de - German
  • es - Spanish
  • it - Italian
  • pt - Portuguese
  • ru - Russian

Workflow

Step 1: Identify the Image Path

When the user requests OCR:

  1. Ask for the image path if not provided
  2. Accept common path formats: absolute paths, ~/path, or relative paths
  3. Validate that the file exists before proceeding
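
A path that arrives as a quoted string (for example from a chat message) does not get tilde expansion from the shell, so the check in this step may need to do it explicitly. The sketch below assumes a hypothetical helper named resolve_input_path; it is not part of the skill's scripts.

```shell
# usage: resolve_input_path PATH
# Expands a leading "~" or "~/" to $HOME, then verifies that the
# result is an existing regular file. Prints the resolved path on
# success; prints an error on stderr and returns 1 otherwise.
resolve_input_path() {
    p=$1
    case $p in
        "~")   p=$HOME ;;
        "~/"*) p=$HOME/${p#"~/"} ;;
    esac
    if [ ! -f "$p" ]; then
        echo "error: file not found: $p" >&2
        return 1
    fi
    printf '%s\n' "$p"
}
```

A caller might write image=$(resolve_input_path "$user_path") || exit 1 and then pass "$image", always quoted, to the OCR script.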

Step 2: Determine Recognition Parameters

Based on user request or context:

  1. Language: Default to zh-Hans,en for Chinese users, or en for English users
  2. Mode: Use precise mode (default) for accuracy, fast mode (-f) for quick preview
  3. Output: Ask if user wants results saved to file (-o option)
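
The three choices above map directly onto command-line flags. As a rough sketch, a wrapper could assemble the argument list as follows; run_ocr and the OCR_RUNNER variable are hypothetical names introduced for illustration, and only flags documented in this README (-l, -f, -o) are used.

```shell
# usage: run_ocr IMAGE LANGS [MODE] [OUTPUT]
#   MODE   "fast" adds -f; anything else keeps the default precise mode
#   OUTPUT when non-empty, is passed to -o
# OCR_RUNNER defaults to "swift"; set it to "echo" for a dry run.
run_ocr() {
    image=$1
    langs=$2
    mode=${3:-precise}
    output=${4:-}
    # Rebuild the positional parameters as the script's argument list.
    set -- scripts/ocr_vision_pro.swift "$image" -l "$langs"
    if [ "$mode" = "fast" ]; then
        set -- "$@" -f
    fi
    if [ -n "$output" ]; then
        set -- "$@" -o "$output"
    fi
    ${OCR_RUNNER:-swift} "$@"
}
```

Building the list with set -- keeps each argument as a separate word, so image paths containing spaces survive intact.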

Step 3: Execute OCR

Run the OCR script with appropriate parameters:

swift scripts/ocr_vision_pro.swift "<image_path>" -l zh-Hans,en

For saving to separate files (recommended):

swift scripts/ocr_vision_pro.swift "<image_path>" -o "<output>"

This automatically creates two files:

  • <output>.txt - Complete extracted text (pure text, no formatting)
  • <output>_confidence.txt - Confidence details with statistics and per-block info

For separate text and confidence files with custom names:

swift scripts/ocr_vision_pro.swift "<image_path>" -t "text.txt" -c "confidence.txt"

Step 4: Present Results

After OCR completes:

  1. Display the extracted text to the user
  2. If saved to file, inform the user of the output file path
  3. Ask if user wants to:
    • Correct misrecognized characters
    • Process another image
    • Save results in a different format

Step 5: PDF Processing (if PDF file)

When processing a PDF file:

  1. Identify PDF path and pages:

    • Ask for PDF path if not provided
    • Ask which pages to process (default: all pages)
    • Support formats: 1, 1,3,5, or 1-5
  2. Determine recognition parameters:

    • Language: Default to zh-Hans,zh-Hant,en
    • Mode: Precise (default) or fast (-f)
    • Output: Text mode (default) or JSON mode (-j)
  3. Execute PDF OCR:

# All pages
swift scripts/pdf_ocr.swift "<pdf_path>"

# Specific pages
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1 -j

  4. Present results:
    • Text mode: Display text by page
    • JSON mode: Output complete JSON to stdout
    • Inform user of output format and options

Output Format

The script supports two mutually exclusive output modes:

Text Mode (Default, -t)

Outputs only the extracted text.

Console Output (without -o or -t <file>):

[Extracted text content]

File Output (-o option):

Creates <output>.txt with pure text.

With Confidence Details (console, when not using -j):

=== 置信度详情 ===

总识别块数: 25
平均置信度: 0.85

--- 逐块详情 ---

[1] Text content
    置信度: 0.95
    位置: x=0.10, y=0.20, w=0.30, h=0.05

--- 低置信度警告 (< 0.8) ---
[3] "Some text" - 置信度: 0.50

File Output with Confidence (-o option):

  • <output>.txt - Complete extracted text
  • <output>_confidence.txt - Confidence details

JSON Mode (-j)

Outputs complete raw information as JSON to stdout (no file output in JSON mode).

JSON Structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}

Fields:

  • imagePath: Path to the processed image
  • totalBlocks: Total number of recognized text blocks
  • averageConfidence: Average confidence score (0.0 - 1.0)
  • blocks: Array of recognized text blocks
    • index: Block index (1-based)
    • text: Recognized text content
    • confidence: Confidence score (0.0 - 1.0)
    • boundingBox: Normalized bounding box coordinates (0.0 - 1.0)
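
Since boundingBox values are normalized, they have to be scaled by the image dimensions before they can be used as pixel coordinates. The sketch below shows only that scaling; bbox_to_pixels is a hypothetical helper. The Vision framework itself reports boxes with the origin at the lower-left corner, and whether the script flips the y axis is not stated here, so no flip is applied.

```shell
# usage: bbox_to_pixels X Y WIDTH HEIGHT IMAGE_W IMAGE_H
# Scales a normalized box (0.0 - 1.0) to pixel units, rounding each
# value to the nearest integer. Prints "x y width height".
bbox_to_pixels() {
    awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v iw="$5" -v ih="$6" \
        'BEGIN {
            printf "%d %d %d %d\n",
                x * iw + 0.5, y * ih + 0.5, w * iw + 0.5, h * ih + 0.5
        }'
}
```

With the sample block above and an assumed 1000 x 800 pixel image, bbox_to_pixels 0.10 0.20 0.30 0.05 1000 800 prints 100 160 300 40.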

Tips and Best Practices

  1. macOS Only: This skill requires macOS. It will not work on Linux, Windows, or other operating systems.
  2. Image Quality: Higher resolution images produce better results
  3. Language Specification: Always specify language for better accuracy
  4. Precise vs Fast: Use precise mode for final results, fast mode for testing
  5. Batch Processing: For multiple images, use shell loop:
    for img in *.png; do
        swift scripts/ocr_vision_pro.swift "$img" -o "${img%.png}.txt"
    done
    
  6. Confidence Threshold: Results with confidence < 0.5 may need manual verification

References

For detailed usage instructions and examples, load references/usage.md.

Resources

scripts/

  • ocr_vision_pro.swift - Enhanced OCR script for images with full feature support
  • ocr_vision.swift - Basic OCR script for simple use cases
  • pdf_ocr.swift - PDF OCR script for extracting text from PDF files

references/

  • usage.md - Comprehensive usage guide with examples
Usage Guidance
This appears safe for its stated purpose if you are comfortable running local Swift scripts on macOS. Before installing, note that OCR may expose sensitive document text in the agent conversation or output files, and choose output paths carefully to avoid overwriting files.
Capability Analysis
Type: OpenClaw Skill
Name: ocr-locally
Version: 1.0.0

The skill provides local OCR (Optical Character Recognition) capabilities for macOS using native Apple frameworks (Vision, PDFKit, and AppKit). The Swift scripts (ocr_vision_pro.swift, pdf_ocr.swift) are well-documented and perform text extraction entirely on-device without any network requests, data exfiltration, or suspicious shell execution. The instructions in SKILL.md are strictly aligned with the stated purpose of processing images and PDFs provided by the user.
Capability Assessment
Purpose & Capability
The advertised purpose is local OCR for images and PDFs, and the provided Swift code uses macOS Vision/PDFKit frameworks to read user-specified image/PDF files and output recognized text.
Instruction Scope
The instructions are generally scoped to user-requested OCR, but they involve passing user-provided local file paths into command examples, so paths should be quoted/validated when run.
Install Mechanism
Registry/install metadata says the skill is instruction-only with no required binaries, while the manifest includes runnable Swift scripts and the docs invoke them with the swift command.
Credentials
The skill clearly says it is macOS-only, but registry metadata lists no OS restriction and no required binaries, which may cause compatibility surprises rather than indicating malicious behavior.
Persistence & Privilege
No credentials, network transmission, background persistence, or privilege escalation are shown, but the scripts can save OCR text and confidence output to user-specified files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ocr-locally
  3. After installation, invoke the skill by name or use /ocr-locally
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of local-ocr, providing offline OCR capabilities on macOS.

  • Supports image and PDF text extraction using macOS native Vision and PDFKit frameworks (macOS 10.15+ required)
  • Recognizes multiple languages (Chinese, English, Japanese, Korean, and more)
  • Two output modes: pure text or detailed JSON with confidence and bounding boxes
  • Handles a variety of image formats (PNG, JPEG, TIFF, BMP) and PDFs, with support for specific page selection
  • Command-line interface with flexible options for language, output mode, and file paths
Metadata
Slug ocr-locally
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is OCR Locally?

[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks... It is an AI Agent Skill for Claude Code / OpenClaw, with 68 downloads so far.

How do I install OCR Locally?

Run "/install ocr-locally" in the OpenClaw or Claude Code chat to install it in one step. The scripts are invoked with the swift command, so running them requires macOS 10.15+ with swift available.

Is OCR Locally free?

Yes, OCR Locally is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does OCR Locally support?

The listing is tagged cross-platform, but the skill itself is macOS only: it requires macOS 10.15+ (Catalina or later) and cannot run on Linux, Windows, or other operating systems.

Who created OCR Locally?

It is built and maintained by ltryee (@ltryee); the current version is v1.0.0.
