← 返回 Skills 市场
ltryee

OCR Locally

作者 ltryee · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ 安全检测通过
68
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install ocr-locally
功能描述
[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks...
使用说明 (SKILL.md)

Local OCR (macOS Only)

Overview

⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses macOS native frameworks:

  • Vision framework - For OCR text recognition
  • PDFKit framework - For PDF processing
  • Core Graphics - For image rendering

This skill provides OCR (Optical Character Recognition) capabilities using macOS native Vision framework. It extracts text from images and PDFs without requiring any third-party libraries or internet connection.

Platform Requirements

⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.

Required:

  • macOS 10.15+ (Catalina or later)
  • Vision framework (pre-installed on macOS)
  • PDFKit framework (pre-installed on macOS)

Why macOS Only?

  • Uses Vision framework for OCR (macOS/iOS only)
  • Uses PDFKit framework for PDF processing (macOS/iOS only)
  • Uses AppKit/Core Graphics for image handling (macOS only)

When to Use This Skill

Trigger this skill when the user:

  • Requests OCR or image text extraction
  • Mentions extracting text from images, screenshots, PDF files, or scanned documents
  • Uses keywords like: "识别图片", "OCR", "提取文字", "提取PDF文字", "识别PDF", "extract text from image", "PDF OCR"
  • Provides an image file or PDF file and asks to read or extract its content

Core Capabilities

1. Text Extraction from Images

Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:

  • Multi-language support (Chinese, English, Japanese, Korean, and more)
  • Two output modes (mutually exclusive):
    • Text Mode (-t): Output only extracted text (default)
    • JSON Mode (-j): Output complete raw info including text, position, and confidence as JSON
  • Confidence scores for each detected text block
  • Bounding box information (text position in image)
  • Output to console or file
  • Precise or fast recognition modes

Basic usage:

swift scripts/ocr_vision_pro.swift \x3Cimage_path>

With options:

swift scripts/ocr_vision_pro.swift \x3Cimage_path> -l zh-Hans,en -o output.txt -f

2. Text Extraction from PDF Files

Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:

  • Extract text from specific pages or all pages
  • Support page range specification (e.g., 1-5, 1,3,5)
  • Two output modes (mutually exclusive):
    • Text Mode (-t): Output only extracted text (default)
    • JSON Mode (-j): Output complete raw info as JSON
  • Same multi-language support as image OCR
  • Precise or fast recognition modes

Basic usage (all pages):

swift scripts/pdf_ocr.swift \x3Cpdf_path>

With page specification:

# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1

# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j

3. Output Modes (Mutually Exclusive)

The script supports two output modes that cannot be used simultaneously:

Text Mode (Default, -t)

Outputs only the extracted text:

  • Console output: Pure text
  • File output (-o or -t): Saves text to file, optionally with separate confidence file

JSON Mode (-j)

Outputs complete raw information as JSON:

  • Contains: image path, total blocks, average confidence, and per-block details
  • Per-block info: index, text, confidence, bounding box (x, y, width, height)
  • Outputs to stdout only (no file output options in JSON mode)

JSON output structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}

4. Supported File Formats

Image Formats (ocr_vision_pro.swift):

  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • TIFF (.tiff, .tif)
  • BMP (.bmp)

PDF Format (pdf_ocr.swift):

  • PDF (.pdf) - support single page, multiple pages, or page ranges
  • Specify pages with -p option: 1, 1,3,5, or 1-5

4. Command-Line Options

Option Description
-h, --help Show help information
-t, --text Text mode (default, output only extracted text)
-j, --json JSON mode (output complete raw info as JSON)
-l, --language \x3Clang> Specify recognition language (comma-separated)
-o, --output \x3Cfile> Output text to file, auto-generate confidence file (\x3Cfile>_confidence.txt)
-t, --text \x3Cfile> Output only complete text to specified file (text mode)
-c, --confidence \x3Cfile> Output only confidence details to specified file (text mode)
-f, --fast Use fast mode (default: precise mode)

Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.

Supported languages:

  • zh-Hans - Simplified Chinese
  • zh-Hant - Traditional Chinese
  • en - English
  • ja - Japanese
  • ko - Korean
  • fr - French
  • de - German
  • es - Spanish
  • it - Italian
  • pt - Portuguese
  • ru - Russian

Workflow

Step 1: Identify the Image Path

When the user requests OCR:

  1. Ask for the image path if not provided
  2. Accept common path formats: absolute paths, ~/path, or relative paths
  3. Validate that the file exists before proceeding

Step 2: Determine Recognition Parameters

Based on user request or context:

  1. Language: Default to zh-Hans,en for Chinese users, or en for English users
  2. Mode: Use precise mode (default) for accuracy, fast mode (-f) for quick preview
  3. Output: Ask if user wants results saved to file (-o option)

Step 3: Execute OCR

Run the OCR script with appropriate parameters:

swift scripts/ocr_vision_pro.swift "\x3Cimage_path>" -l zh-Hans,en

For saving to separate files (recommended):

swift scripts/ocr_vision_pro.swift "\x3Cimage_path>" -o "\x3Coutput>"

This automatically creates two files:

  • \x3Coutput>.txt - Complete extracted text (pure text, no formatting)
  • \x3Coutput>_confidence.txt - Confidence details with statistics and per-block info

For separate text and confidence files with custom names:

swift scripts/ocr_vision_pro.swift "\x3Cimage_path>" -t "text.txt" -c "confidence.txt"

Step 4: Present Results

After OCR completes:

  1. Display the extracted text to the user
  2. If saved to file, inform the user of the output file path
  3. Ask if user wants to:
    • Correct misrecognized characters
    • Process another image
    • Save results in a different format

Step 5: PDF Processing (if PDF file)

When processing a PDF file:

  1. Identify PDF path and pages:

    • Ask for PDF path if not provided
    • Ask which pages to process (default: all pages)
    • Support formats: 1, 1,3,5, or 1-5
  2. Determine recognition parameters:

    • Language: Default to zh-hans,zh-hant,en
    • Mode: Precise (default) or fast (-f)
    • Output: Text mode (default) or JSON mode (-j)
  3. Execute PDF OCR:

# All pages
swift scripts/pdf_ocr.swift "\x3Cpdf_path>"

# Specific pages
swift scripts/pdf_ocr.swift "\x3Cpdf_path>" -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift "\x3Cpdf_path>" -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift "\x3Cpdf_path>" -p 1 -j
  1. Present results:
    • Text mode: Display text by page
    • JSON mode: Output complete JSON to stdout
    • Inform user of output format and options

Output Format

The script supports two mutually exclusive output modes:

Text Mode (Default, -t)

Outputs only the extracted text.

Console Output (without -o or -t \x3Cfile>):

[Extracted text content]

File Output (-o option):

Creates \x3Coutput>.txt with pure text.

With Confidence Details (console, when not using -j):

=== 置信度详情 ===

总识别块数: 25
平均置信度: 0.85

--- 逐块详情 ---

[1] Text content
    置信度: 0.95
    位置: x=0.10, y=0.20, w=0.30, h=0.05

--- 低置信度警告 (\x3C 0.8) ---
[3] "Some text" - 置信度: 0.50

File Output with Confidence (-o option):

  • \x3Coutput>.txt - Complete extracted text
  • \x3Coutput>_confidence.txt - Confidence details

JSON Mode (-j)

Outputs complete raw information as JSON to stdout (no file output in JSON mode).

JSON Structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}

Fields:

  • imagePath: Path to the processed image
  • totalBlocks: Total number of recognized text blocks
  • averageConfidence: Average confidence score (0.0 - 1.0)
  • blocks: Array of recognized text blocks
    • index: Block index (1-based)
    • text: Recognized text content
    • confidence: Confidence score (0.0 - 1.0)
    • boundingBox: Normalized bounding box coordinates (0.0 - 1.0)

Tips and Best Practices

  1. macOS Only: This skill requires macOS. It will not work on Linux, Windows, or other operating systems.
  2. Image Quality: Higher resolution images produce better results
  3. Language Specification: Always specify language for better accuracy
  4. Precise vs Fast: Use precise mode for final results, fast mode for testing
  5. Batch Processing: For multiple images, use shell loop:
    for img in *.png; do
        swift scripts/ocr_vision_pro.swift "$img" -o "${img%.png}.txt"
    done
    
  6. Confidence Threshold: Results with confidence \x3C 0.5 may need manual verification

References

For detailed usage instructions and examples, load references/usage.md.

Resources

scripts/

  • ocr_vision_pro.swift - Enhanced OCR script for images with full feature support
  • ocr_vision.swift - Basic OCR script for simple use cases
  • pdf_ocr.swift - PDF OCR script for extracting text from PDF files

references/

  • usage.md - Comprehensive usage guide with examples
安全使用建议
This appears safe for its stated purpose if you are comfortable running local Swift scripts on macOS. Before installing, note that OCR may expose sensitive document text in the agent conversation or output files, and choose output paths carefully to avoid overwriting files.
功能分析
Type: OpenClaw Skill Name: ocr-locally Version: 1.0.0 The skill provides local OCR (Optical Character Recognition) capabilities for macOS using native Apple frameworks (Vision, PDFKit, and AppKit). The Swift scripts (ocr_vision_pro.swift, pdf_ocr.swift) are well-documented and perform text extraction entirely on-device without any network requests, data exfiltration, or suspicious shell execution. The instructions in SKILL.md are strictly aligned with the stated purpose of processing images and PDFs provided by the user.
能力评估
Purpose & Capability
The advertised purpose is local OCR for images and PDFs, and the provided Swift code uses macOS Vision/PDFKit frameworks to read user-specified image/PDF files and output recognized text.
Instruction Scope
The instructions are generally scoped to user-requested OCR, but they involve passing user-provided local file paths into command examples, so paths should be quoted/validated when run.
Install Mechanism
Registry/install metadata says the skill is instruction-only with no required binaries, while the manifest includes runnable Swift scripts and the docs invoke them with the swift command.
Credentials
The skill clearly says it is macOS-only, but registry metadata lists no OS restriction and no required binaries, which may cause compatibility surprises rather than indicating malicious behavior.
Persistence & Privilege
No credentials, network transmission, background persistence, or privilege escalation are shown, but the scripts can save OCR text and confidence output to user-specified files.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install ocr-locally
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /ocr-locally 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
Initial release of local-ocr, providing offline OCR capabilities on macOS. - Supports image and PDF text extraction using macOS native Vision and PDFKit frameworks (macOS 10.15+ required) - Recognizes multiple languages (Chinese, English, Japanese, Korean, and more) - Two output modes: pure text or detailed JSON with confidence and bounding boxes - Handles a variety of image formats (PNG, JPEG, TIFF, BMP) and PDFs, with support for specific page selection - Command-line interface with flexible options for language, output mode, and file paths
元数据
Slug ocr-locally
版本 1.0.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

OCR Locally 是什么?

[macOS only] Use this skill when the user requests OCR (Optical Character Recognition), image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 68 次。

如何安装 OCR Locally?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install ocr-locally」即可一键安装,无需额外配置。

OCR Locally 是免费的吗?

是的,OCR Locally 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

OCR Locally 支持哪些平台?

OCR Locally 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 OCR Locally?

由 ltryee(@ltryee)开发并维护,当前版本 v1.0.0。

💬 留言讨论