Description

[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks...

README (SKILL.md)

Local OCR (macOS Only)

Name: OCR Locally
Author: ltryee

Overview

⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses macOS native frameworks:

Vision framework - For OCR text recognition
PDFKit framework - For PDF processing
Core Graphics - For image rendering

This skill provides OCR (Optical Character Recognition) using the native macOS Vision framework. It extracts text from images and PDFs without third-party libraries or an internet connection.

Platform Requirements

⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.

Required:

macOS 10.15+ (Catalina or later)
Vision framework (pre-installed on macOS)
PDFKit framework (pre-installed on macOS)

Why macOS Only?

Uses Vision framework for OCR (macOS/iOS only)
Uses PDFKit framework for PDF processing (macOS/iOS only)
Uses AppKit/Core Graphics for image handling (macOS only)

When to Use This Skill

Trigger this skill when the user:

Requests OCR or image text extraction
Mentions extracting text from images, screenshots, PDF files, or scanned documents
Uses keywords like: "识别图片", "OCR", "提取文字", "提取PDF文字", "识别PDF", "extract text from image", "PDF OCR"
Provides an image file or PDF file and asks to read or extract its content

Core Capabilities

1. Text Extraction from Images

Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:

Multi-language support (Chinese, English, Japanese, Korean, and more)
Two output modes (mutually exclusive):
- Text Mode (-t): Output only extracted text (default)
- JSON Mode (-j): Output complete raw info including text, position, and confidence as JSON
Confidence scores for each detected text block
Bounding box information (text position in image)
Output to console or file
Precise or fast recognition modes

Basic usage:

swift scripts/ocr_vision_pro.swift <image_path>

With options:

swift scripts/ocr_vision_pro.swift <image_path> -l zh-Hans,en -o output.txt -f

2. Text Extraction from PDF Files

Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:

Extract text from specific pages or all pages
Support page range specification (e.g., 1-5, 1,3,5)
Two output modes (mutually exclusive):
- Text Mode (-t): Output only extracted text (default)
- JSON Mode (-j): Output complete raw info as JSON
Same multi-language support as image OCR
Precise or fast recognition modes

Basic usage (all pages):

swift scripts/pdf_ocr.swift <pdf_path>

With page specification:

# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1

# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j
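
The `-p` values above take three forms: a single page, a comma-separated list, and a range. As a rough illustration of how such a specification expands into page numbers, here is a minimal Python sketch. `parse_page_spec` is a hypothetical helper, not part of `pdf_ocr.swift`, and the script's own parsing may differ (for example, in whether it accepts mixed forms such as `1,3-5`).

```python
from typing import List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as "1", "1,3,5" or "1-5" into sorted 1-based page numbers.

    Hypothetical helper for illustration; not taken from pdf_ocr.swift.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-5" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"page spec {part!r} is outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_spec("1,3,5", page_count=10))
    print(parse_page_spec("1-5", page_count=10))
```

Rejecting out-of-range pages before calling the script gives the user a clear error instead of depending on how the script reports it.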

3. Output Modes (Mutually Exclusive)

The script supports two output modes that cannot be used simultaneously:

Text Mode (Default, `-t`)

Outputs only the extracted text:

Console output: Pure text
File output (-o or -t): Saves text to file, optionally with separate confidence file

JSON Mode (`-j`)

Outputs complete raw information as JSON:

Contains: image path, total blocks, average confidence, and per-block details
Per-block info: index, text, confidence, bounding box (x, y, width, height)
Outputs to stdout only (no file output options in JSON mode)

JSON output structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}
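
To show how a caller might consume this structure, the following Python sketch parses a JSON document of the shape above and derives the joined text and a mean confidence. The sample values and the `summarize` helper are illustrative assumptions; only the field names come from the structure shown.

```python
import json

# Illustrative sample in the documented shape; the values are made up.
SAMPLE_OUTPUT = """
{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 2,
  "averageConfidence": 0.725,
  "blocks": [
    {"index": 1, "text": "recognized text", "confidence": 0.95,
     "boundingBox": {"x": 0.10, "y": 0.20, "width": 0.30, "height": 0.05}},
    {"index": 2, "text": "second line", "confidence": 0.50,
     "boundingBox": {"x": 0.10, "y": 0.30, "width": 0.25, "height": 0.05}}
  ]
}
"""


def summarize(raw_json: str) -> dict:
    """Return the joined text, block count, and mean confidence (hypothetical helper)."""
    result = json.loads(raw_json)
    blocks = result["blocks"]
    mean = sum(b["confidence"] for b in blocks) / len(blocks) if blocks else 0.0
    return {
        "text": "\n".join(b["text"] for b in blocks),
        "block_count": len(blocks),
        "mean_confidence": mean,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_OUTPUT))
```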

4. Supported File Formats

Image Formats (ocr_vision_pro.swift):

PNG (.png)
JPEG (.jpg, .jpeg)
TIFF (.tiff, .tif)
BMP (.bmp)

PDF Format (pdf_ocr.swift):

PDF (.pdf) - supports a single page, multiple pages, or page ranges
Specify pages with -p option: 1, 1,3,5, or 1-5

5. Command-Line Options

Option	Description
`-h`, `--help`	Show help information
`-t`, `--text`	Text mode (default, output only extracted text)
`-j`, `--json`	JSON mode (output complete raw info as JSON)
`-l`, `--language <lang>`	Specify recognition language (comma-separated)
`-o`, `--output <file>`	Output text to file, auto-generate confidence file (`<file>_confidence.txt`)
`-t`, `--text <file>`	Output only complete text to specified file (text mode)
`-c`, `--confidence <file>`	Output only confidence details to specified file (text mode)
`-f`, `--fast`	Use fast mode (default: precise mode)

Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.

Supported languages:

zh-Hans - Simplified Chinese
zh-Hant - Traditional Chinese
en - English
ja - Japanese
ko - Korean
fr - French
de - German
es - Spanish
it - Italian
pt - Portuguese
ru - Russian
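
A caller may want to check language codes before building the `-l` argument. The Python sketch below does that against the list above. It treats that list as complete, which is an assumption (the feature summary mentions "and more"), and `build_language_argument` is a hypothetical helper rather than part of the skill.

```python
from typing import List

# Codes copied from the list above; assumed complete for this sketch.
SUPPORTED_LANGUAGES = {
    "zh-Hans": "Simplified Chinese",
    "zh-Hant": "Traditional Chinese",
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "it": "Italian",
    "pt": "Portuguese",
    "ru": "Russian",
}


def build_language_argument(codes: List[str]) -> str:
    """Join validated codes into the comma-separated value passed to -l."""
    unknown = [code for code in codes if code not in SUPPORTED_LANGUAGES]
    if unknown:
        raise ValueError(f"unsupported language code(s): {', '.join(unknown)}")
    return ",".join(codes)


if __name__ == "__main__":
    print(build_language_argument(["zh-Hans", "en"]))
```

The lookup is case-sensitive, so `zh-hans` is rejected while `zh-Hans` is accepted.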

Workflow

Step 1: Identify the Image Path

When the user requests OCR:

Ask for the image path if not provided
Accept common path formats: absolute paths, ~/path, or relative paths
Validate that the file exists before proceeding
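
The checks in this step can be sketched in Python as follows. `resolve_input` is a hypothetical helper, and the extension sets mirror the Supported File Formats list; whether the scripts themselves reject other extensions is not stated in this document.

```python
from pathlib import Path
from typing import Tuple

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp"}
PDF_EXTENSIONS = {".pdf"}


def resolve_input(path_text: str) -> Tuple[Path, str]:
    """Expand ~ and relative paths, confirm the file exists, and classify it.

    Returns the absolute path and either "image" or "pdf".
    Hypothetical helper for illustration only.
    """
    path = Path(path_text).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return path, "image"
    if suffix in PDF_EXTENSIONS:
        return path, "pdf"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
```

The returned kind can then decide between `ocr_vision_pro.swift` and `pdf_ocr.swift`.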

Step 2: Determine Recognition Parameters

Based on user request or context:

Language: Default to zh-Hans,en for Chinese users, or en for English users
Mode: Use precise mode (default) for accuracy, fast mode (-f) for quick preview
Output: Ask if user wants results saved to file (-o option)

Step 3: Execute OCR

Run the OCR script with appropriate parameters:

swift scripts/ocr_vision_pro.swift "<image_path>" -l zh-Hans,en

For saving to separate files (recommended):

swift scripts/ocr_vision_pro.swift "<image_path>" -o "<output>"

This automatically creates two files:

<output>.txt - Complete extracted text (pure text, no formatting)
<output>_confidence.txt - Confidence details with statistics and per-block info

For separate text and confidence files with custom names:

swift scripts/ocr_vision_pro.swift "<image_path>" -t "text.txt" -c "confidence.txt"
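
When these commands are launched from another program, passing the arguments as a list avoids shell-quoting problems with paths that contain spaces. The Python sketch below builds such a list using only the flags documented above. The helper names are hypothetical. It assumes the command runs from the skill's root directory, that the script exits with a non-zero status on failure, and that `-j` cannot be combined with `-o` because JSON mode writes to stdout only.

```python
import subprocess
from typing import List, Optional


def build_image_ocr_command(
    image_path: str,
    languages: str = "zh-Hans,en",
    fast: bool = False,
    output: Optional[str] = None,
    json_mode: bool = False,
) -> List[str]:
    """Assemble the argument list for ocr_vision_pro.swift (hypothetical helper)."""
    if json_mode and output is not None:
        raise ValueError("JSON mode writes to stdout only; do not combine -j with -o")
    command = ["swift", "scripts/ocr_vision_pro.swift", image_path, "-l", languages]
    if fast:
        command.append("-f")
    if json_mode:
        command.append("-j")
    if output is not None:
        command += ["-o", output]
    return command


def run_image_ocr(image_path: str, **options) -> str:
    """Run the script and return its stdout; requires macOS with swift on PATH."""
    completed = subprocess.run(
        build_image_ocr_command(image_path, **options),
        capture_output=True,
        text=True,
        check=True,  # Assumes a non-zero exit status signals failure.
    )
    return completed.stdout
```

For example, `run_image_ocr("~/Desktop/my scan.png", json_mode=True)` would pass the path as a single argument, though `~` is not expanded without a shell and should be resolved first.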

Step 4: Present Results

After OCR completes:

Display the extracted text to the user
If saved to file, inform the user of the output file path
Ask if user wants to:
- Correct misrecognized characters
- Process another image
- Save results in a different format

Step 5: PDF Processing (if PDF file)

When processing a PDF file:

Identify PDF path and pages:
- Ask for PDF path if not provided
- Ask which pages to process (default: all pages)
- Support formats: 1, 1,3,5, or 1-5
Determine recognition parameters:
- Language: Default to zh-Hans,zh-Hant,en
- Mode: Precise (default) or fast (-f)
- Output: Text mode (default) or JSON mode (-j)
Execute PDF OCR:

# All pages
swift scripts/pdf_ocr.swift "<pdf_path>"

# Specific pages
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift "<pdf_path>" -p 1 -j

Present results:
- Text mode: Display text by page
- JSON mode: Output complete JSON to stdout
- Inform user of output format and options

Output Format

The script supports two mutually exclusive output modes:

Text Mode (Default, `-t`)

Outputs only the extracted text.

Console Output (without `-o` or `-t <file>`):

[Extracted text content]

File Output (`-o` option):

Creates <output>.txt with pure text.

With Confidence Details (console, when not using `-j`):

=== 置信度详情 ===

总识别块数: 25
平均置信度: 0.85

--- 逐块详情 ---

[1] Text content
    置信度: 0.95
    位置: x=0.10, y=0.20, w=0.30, h=0.05

--- 低置信度警告 (< 0.8) ---
[3] "Some text" - 置信度: 0.50

File Output with Confidence (`-o` option):

<output>.txt - Complete extracted text
<output>_confidence.txt - Confidence details

JSON Mode (`-j`)

Outputs complete raw information as JSON to stdout (no file output in JSON mode).

JSON Structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}

Fields:

imagePath: Path to the processed image
totalBlocks: Total number of recognized text blocks
averageConfidence: Average confidence score (0.0 - 1.0)
blocks: Array of recognized text blocks
- index: Block index (1-based)
- text: Recognized text content
- confidence: Confidence score (0.0 - 1.0)
- boundingBox: Normalized bounding box coordinates (0.0 - 1.0)
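
Because the bounding box is normalized, it has to be scaled by the image size before it can be used to crop or highlight a region. The Python sketch below shows one way to do that. The Vision framework reports normalized boxes with the origin at the lower-left corner; whether the script passes those values through unchanged or flips them first is not stated here, so the `origin_bottom_left` switch is an assumption to verify against real output. `to_pixel_box` is a hypothetical helper.

```python
from typing import Dict, Tuple


def to_pixel_box(
    bounding_box: Dict[str, float],
    image_width: int,
    image_height: int,
    origin_bottom_left: bool = True,
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, width, height) in top-left pixel coordinates."""
    left = bounding_box["x"] * image_width
    width = bounding_box["width"] * image_width
    height = bounding_box["height"] * image_height
    if origin_bottom_left:
        # Flip the vertical axis: y measures the box's bottom edge from the image bottom.
        top = (1.0 - bounding_box["y"] - bounding_box["height"]) * image_height
    else:
        top = bounding_box["y"] * image_height
    return (round(left), round(top), round(width), round(height))


if __name__ == "__main__":
    box = {"x": 0.10, "y": 0.20, "width": 0.30, "height": 0.05}
    print(to_pixel_box(box, image_width=1000, image_height=800))
```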

Tips and Best Practices

macOS Only: This skill requires macOS. It will not work on Linux, Windows, or other operating systems.
Image Quality: Higher resolution images produce better results
Language Specification: Always specify language for better accuracy
Precise vs Fast: Use precise mode for final results, fast mode for testing

Batch Processing: For multiple images, use a shell loop:

for img in *.png; do
    swift scripts/ocr_vision_pro.swift "$img" -o "${img%.png}.txt"
done

Confidence Threshold: Results with confidence < 0.5 may need manual verification
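
The threshold check can be applied to the `blocks` array from JSON mode. The Python sketch below lists blocks that fall under a chosen threshold so they can be reviewed by hand. The helper names and sample blocks are hypothetical. The default of 0.5 follows the tip above, and the threshold is a parameter because the console warning shown earlier uses 0.8.

```python
from typing import Dict, List


def low_confidence_blocks(blocks: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Return blocks whose confidence is strictly below the threshold."""
    return [block for block in blocks if block["confidence"] < threshold]


def format_review_list(blocks: List[Dict], threshold: float = 0.5) -> str:
    """Render one line per low-confidence block for manual review."""
    lines = [
        f'[{b["index"]}] "{b["text"]}" (confidence {b["confidence"]:.2f})'
        for b in low_confidence_blocks(blocks, threshold)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample blocks in the documented JSON shape.
    sample = [
        {"index": 1, "text": "Clear heading", "confidence": 0.95},
        {"index": 2, "text": "Borderline", "confidence": 0.50},
        {"index": 3, "text": "Some text", "confidence": 0.30},
    ]
    print(format_review_list(sample))
```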

References

For detailed usage instructions and examples, load references/usage.md.

Resources

scripts/

ocr_vision_pro.swift - Enhanced OCR script for images with full feature support
ocr_vision.swift - Basic OCR script for simple use cases
pdf_ocr.swift - PDF OCR script for extracting text from PDF files

references/

usage.md - Comprehensive usage guide with examples

Usage Guidance

This appears safe for its stated purpose if you are comfortable running local Swift scripts on macOS. Before installing, note that OCR may expose sensitive document text in the agent conversation or output files, and choose output paths carefully to avoid overwriting files.

Capability Analysis

Type: OpenClaw Skill
Name: ocr-locally
Version: 1.0.0

The skill provides local OCR (Optical Character Recognition) capabilities for macOS using native Apple frameworks (Vision, PDFKit, and AppKit). The Swift scripts (ocr_vision_pro.swift, pdf_ocr.swift) are well-documented and perform text extraction entirely on-device without any network requests, data exfiltration, or suspicious shell execution. The instructions in SKILL.md are strictly aligned with the stated purpose of processing images and PDFs provided by the user.

Capability Assessment

✓ Purpose & Capability

The advertised purpose is local OCR for images and PDFs, and the provided Swift code uses macOS Vision/PDFKit frameworks to read user-specified image/PDF files and output recognized text.

ℹ Instruction Scope

The instructions are generally scoped to user-requested OCR, but they involve passing user-provided local file paths into command examples, so paths should be quoted/validated when run.

ℹ Install Mechanism

Registry/install metadata says the skill is instruction-only with no required binaries, while the manifest includes runnable Swift scripts and the docs invoke them with the swift command.

ℹ Credentials

The skill clearly says it is macOS-only, but registry metadata lists no OS restriction and no required binaries, which may cause compatibility surprises rather than indicating malicious behavior.

ℹ Persistence & Privilege

No credentials, network transmission, background persistence, or privilege escalation are shown, but the scripts can save OCR text and confidence output to user-specified files.

Version History

v1.0.0

Initial release of local-ocr, providing offline OCR capabilities on macOS.
- Supports image and PDF text extraction using macOS native Vision and PDFKit frameworks (macOS 10.15+ required)
- Recognizes multiple languages (Chinese, English, Japanese, Korean, and more)
- Two output modes: pure text or detailed JSON with confidence and bounding boxes
- Handles a variety of image formats (PNG, JPEG, TIFF, BMP) and PDFs, with support for specific page selection
- Command-line interface with flexible options for language, output mode, and file paths

Metadata

Slug ocr-locally

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is OCR Locally?

[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks... It is an AI Agent Skill for Claude Code / OpenClaw, with 68 downloads so far.

How do I install OCR Locally?

Run "/install ocr-locally" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is OCR Locally free?

Yes, OCR Locally is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does OCR Locally support?

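OCR Locally runs on macOS only (macOS 10.15 Catalina or later). It depends on the native Vision and PDFKit frameworks, so it does not work on Linux or Windows even where OpenClaw / Claude Code is available.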

Who created OCR Locally?

It is built and maintained by ltryee (@ltryee); the current version is v1.0.0.
