Description

Image OCR, text recognition, extract text from image, scan document, read image text, invoice OCR, receipt OCR, contract recognition, table extraction, busin...

README (SKILL.md)

Image OCR (Windows · GLM-OCR · llama.cpp Vulkan)

Name: image-ocr-local-AIPC
Author: violet17

Model: ggml-org/GLM-OCR-GGUF (Q8_0, HuggingFace / hf-mirror)
Inference: llama-cli (llama.cpp Vulkan prebuilt)
SKILL_VERSION: v1.0

Directory Structure (auto-created or user-specified)

<OCR_DIR>\                        ← auto-selected drive or user-specified (e.g. C:\image-ocr or D:\image-ocr)
├── llama.cpp\                    ← llama-cli.exe and related binaries
└── models\
    └── GLM-OCR-GGUF\
        ├── GLM-OCR-Q8_0.gguf        ← main model (~950 MB)
        └── mmproj-GLM-OCR-Q8_0.gguf ← vision projection layer (~484 MB, required)

Dependencies: Model files (GLM-OCR-Q8_0.gguf, mmproj-GLM-OCR-Q8_0.gguf) are downloaded via Python's huggingface_hub (hf download) or modelscope. If Python is not installed, Step 2 will automatically install Miniforge (recommended — lightweight, includes conda/pip, no admin rights required).

⚠️ AI Assistant Instructions

Execute one command at a time; wait for output before proceeding.
Stop immediately on error; refer to the Troubleshooting table at the end.
Wrap all paths in double quotes.
<OCR_DIR> is the absolute working directory path, determined after Pre-flight.
Single goal: Recognize image content and return text results.

Execution flow (do not skip steps):

Pre-flight: Check working dir + llama.cpp + models                   → LLAMA_STATUS / MODEL_STATUS
Step 1:     Install / update llama.cpp (only if MISSING or OUTDATED) → LLAMA_INSTALL=DONE
Step 2:     Download models (only if MISSING)                        → MODEL_DOWNLOAD=DONE
Step 3:     Run recognition + format output                          → Return result

Progress reporting: Announce each step before starting, e.g.: 🔍 Pre-flight: Checking environment…

Pre-flight: Check Environment

🔍 Pre-flight: Checking working directory, llama.cpp, and model files…

Locate Working Directory

# ── Fix encoding for non-ASCII paths (required at the start of every PowerShell script) ──
chcp 65001 | Out-Null
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8

# ── Optional: if you already have a path, fill it in; leave blank to auto-select drive ──
$customOcrDir = ""   # e.g. "C:\image-ocr" or "D:\image-ocr"
# ──────────────────────────────────────────────────────────────────────────────────────────

if ($customOcrDir -and (Test-Path (Split-Path $customOcrDir))) {
    $OCR_DIR = $customOcrDir
    New-Item -ItemType Directory -Force -Path $OCR_DIR | Out-Null
    Write-Host "OCR_DIR=$OCR_DIR (user-specified)"
} else {
    $best = Get-PSDrive -PSProvider FileSystem |
        Where-Object { $_.Free -gt 0 } |
        Sort-Object Free -Descending |
        Select-Object -First 1
    $OCR_DIR = Join-Path "$($best.Root)" "image-ocr"
    New-Item -ItemType Directory -Force -Path $OCR_DIR | Out-Null
    Write-Host "OCR_DIR=$OCR_DIR (auto-selected drive: $($best.Name))"
}
$env:OCR_DIR = $OCR_DIR

Success criteria: Output contains a line with OCR_DIR=. Record the path and substitute <OCR_DIR> in subsequent steps.
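
The drive-selection rule above (use the user's path if given, otherwise the drive with the most free space) can be sketched in Python for illustration. This is a minimal sketch, not part of the skill: the function names `free_bytes` and `pick_ocr_dir` are hypothetical, and unlike the PowerShell script it does not check that the parent of a custom path exists.

```python
import shutil

def free_bytes(roots):
    """Map each reachable drive root to its free space in bytes."""
    result = {}
    for root in roots:
        try:
            result[root] = shutil.disk_usage(root).free
        except OSError:
            continue  # drive letter not present or not ready
    return result

def pick_ocr_dir(free_by_root, custom_dir=""):
    """Return custom_dir if set, else '<root with most free space>\\image-ocr'."""
    if custom_dir:
        return custom_dir
    # Ignore drives that report no free space, as the PowerShell filter does
    usable = {root: free for root, free in free_by_root.items() if free > 0}
    if not usable:
        raise RuntimeError("no drive with free space found")
    best_root = max(usable, key=usable.get)
    return best_root.rstrip("\\/") + "\\image-ocr"
```

For example, `pick_ocr_dir(free_bytes(["C:\\", "D:\\"]))` picks whichever of the two drives currently has more free space.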

Check llama.cpp

$llamaDir = "<OCR_DIR>\llama.cpp"
$cliExe   = "$llamaDir\llama-cli.exe"

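if (Test-Path $cliExe) {
    # --version can return several lines (an array); join them so -match populates $Matches
    $ver = (& $cliExe --version 2>&1) -join "`n"
    if ($ver -match "version:\s*(\d+)") {
        $build = [int]$Matches[1]
        if ($build -ge 8400) {
            Write-Host "OK: llama.cpp build $build >= b8400, skip Step 1"
            Write-Host "LLAMA_STATUS=READY"
        } else {
            Write-Host "WARN: llama.cpp build $build < b8400, upgrade required"
            Write-Host "LLAMA_STATUS=OUTDATED"
        }
    } else {
        # Build number could not be parsed; reinstall rather than leave the status unset
        Write-Host "WARN: could not determine llama.cpp build number, reinstall recommended"
        Write-Host "LLAMA_STATUS=OUTDATED"
    }
} else {
    Write-Host "ERROR: llama-cli.exe not found"
    Write-Host "LLAMA_STATUS=MISSING"
    Write-Host "   Checked path: $llamaDir"
}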
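
The version check boils down to parsing a build number out of the `--version` text and comparing it with 8400. A minimal Python sketch of that logic follows; `classify_llama` is a hypothetical name, and treating unparseable output as OUTDATED is an assumption made here so that every input yields a status.

```python
import re

MIN_BUILD = 8400  # minimum llama.cpp build required by this skill

def classify_llama(version_output):
    """Return MISSING, OUTDATED, or READY for the given --version text.

    Pass None when llama-cli.exe does not exist.
    """
    if version_output is None:
        return "MISSING"
    match = re.search(r"version:\s*(\d+)", version_output)
    if match is None:
        return "OUTDATED"  # assumption: unknown build is treated as needing reinstall
    return "READY" if int(match.group(1)) >= MIN_BUILD else "OUTDATED"
```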

Check Model Files

$modelDir   = "<OCR_DIR>\models\GLM-OCR-GGUF"
$modelFile  = "$modelDir\GLM-OCR-Q8_0.gguf"
$mmprojFile = "$modelDir\mmproj-GLM-OCR-Q8_0.gguf"

$modelOk  = Test-Path $modelFile
$mmprojOk = Test-Path $mmprojFile

if ($modelOk -and $mmprojOk) {
    Write-Host "OK: GLM-OCR model files ready, skip Step 2"
    Write-Host "MODEL_STATUS=READY"
} else {
    if (-not $modelOk)  { Write-Host "ERROR: Missing GLM-OCR-Q8_0.gguf" }
    if (-not $mmprojOk) { Write-Host "ERROR: Missing mmproj-GLM-OCR-Q8_0.gguf" }
    Write-Host "MODEL_STATUS=MISSING"
    Write-Host "   Checked path: $modelDir"
}

Output	Action
Both `READY`	✅ Skip to Step 3
`LLAMA_STATUS=MISSING/OUTDATED`	⬇️ Execute Step 1
`MODEL_STATUS=MISSING`	⬇️ Execute Step 2

Announce: ✅ Environment check complete. Execute steps as needed.
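
The decision table above can be read as a small function from the two status values to the list of steps to run. The sketch below is illustrative only; `next_steps` is a hypothetical helper, not something the skill defines.

```python
def next_steps(llama_status, model_status):
    """Return the ordered steps to execute after Pre-flight."""
    steps = []
    if llama_status in ("MISSING", "OUTDATED"):
        steps.append("Step 1")  # install or update llama.cpp
    if model_status == "MISSING":
        steps.append("Step 2")  # download model files
    steps.append("Step 3")      # recognition always runs last
    return steps
```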

Step 1: Install / Update llama.cpp Vulkan

⬇️ Step 1: Downloading and installing llama.cpp Vulkan… (only when LLAMA_STATUS=MISSING/OUTDATED)

$tag      = "b8400"   # Replace with the latest tag from https://github.com/ggml-org/llama.cpp/releases/latest
$llamaDir = "<OCR_DIR>\llama.cpp"
$zip      = "$env:TEMP\llama-vulkan.zip"
$url      = "https://github.com/ggml-org/llama.cpp/releases/download/$tag/llama-$tag-bin-win-vulkan-x64.zip"

Write-Host "Downloading llama.cpp $tag ..."
Invoke-WebRequest -Uri $url -OutFile $zip

New-Item -ItemType Directory -Force -Path $llamaDir | Out-Null
Expand-Archive $zip -DestinationPath $llamaDir -Force
Remove-Item $zip
Write-Host "LLAMA_INSTALL=DONE"

Output	Action
`LLAMA_INSTALL=DONE`	✅ Continue to Step 2 if `MODEL_STATUS=MISSING`; otherwise go to Step 3
Download error	⛔ Check network, or manually download from browser and extract to `<OCR_DIR>\llama.cpp\`

Announce: ✅ llama.cpp installed. Continue to Step 2 if models are missing; otherwise continue to Step 3.

Step 2: Download GLM-OCR Models

📦 Step 2: Checking Python and downloading GLM-OCR models… (only when MODEL_STATUS=MISSING)

Note: Models are downloaded via Python's hf download (huggingface_hub) or modelscope. The script will auto-locate any existing Python installation; if none is found, Miniforge will be installed automatically to %USERPROFILE%\miniforge3 (no admin rights required).

First-time Download Notice (required reading when MODEL_STATUS=MISSING)

Announce the following to the user, then ask whether to proceed:

📥 First-time model download is approximately 1.5 GB
   (GLM-OCR-Q8_0.gguf ~950 MB + mmproj ~484 MB).
   Estimated download time:
   • 100 Mbps connection: ~2 minutes
   •  50 Mbps connection: ~4 minutes
   •  10 Mbps connection: ~20 minutes

   Downloads support resumption — if interrupted, re-running this step
   will automatically continue from where it left off.

   ✅ Ready — start automatic download
   📂 I prefer to download manually — skip automatic download

User chooses automatic download → continue with Python check and download commands below
User chooses manual download → jump to the "Manual Download Fallback" section at the end of this step
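
The download-time estimates in the notice follow from simple arithmetic on the two file sizes. A rough Python sketch, assuming 1 MB = 8 megabits and a link that sustains its nominal speed with no protocol overhead (`estimate_minutes` is a hypothetical helper):

```python
def estimate_minutes(size_mb, link_mbps):
    """Minutes needed to transfer size_mb megabytes at link_mbps megabits/s."""
    megabits = size_mb * 8
    return megabits / link_mbps / 60

total_mb = 950 + 484  # main model + mmproj, sizes taken from the notice above

for mbps in (100, 50, 10):
    print(f"{mbps} Mbps: ~{estimate_minutes(total_mb, mbps):.0f} min")
```

The results land close to the rounded figures quoted in the notice; real downloads are usually somewhat slower than the nominal link speed.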

Check Disk Space

$drive = Split-Path "<OCR_DIR>" -Qualifier
$free  = (Get-PSDrive ($drive.TrimEnd(':'))).Free / 1GB
Write-Host "DISK_FREE=$([math]::Round($free,1))GB"
if ($free -lt 2) {
    Write-Host "DISK_STATUS=LOW"
    Write-Host "[WARN] Less than 2 GB available — download may fail"
} else {
    Write-Host "DISK_STATUS=OK"
}

Output	Action
`DISK_STATUS=OK`	✅ Continue to Python check
`DISK_STATUS=LOW`	⚠️ Ask user to free space before continuing
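
For comparison, the same free-space check can be expressed with Python's standard library. This is a sketch under the same 2 GB threshold as the script above; `free_gb` and `disk_status` are hypothetical names, and 1 GB is taken as 1024³ bytes to match PowerShell's `1GB` constant.

```python
import shutil

def free_gb(path):
    """Free space, in GB (1024**3 bytes), on the volume containing path."""
    return shutil.disk_usage(path).free / 1024**3

def disk_status(free_gb_value, required_gb=2.0):
    """Return LOW when less than required_gb is available, else OK."""
    return "LOW" if free_gb_value < required_gb else "OK"
```

The 2 GB threshold leaves some headroom over the roughly 1.4 GB of model files.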

Check Python

# ── Optional: if you know the Python path, fill it in; leave blank to auto-search ──
$customPythonExe = ""   # e.g. "C:\Python311\python.exe"
# ──────────────────────────────────────────────────────────────────────────────────

$pythonExe = $null

# 1. User-specified path
if ($customPythonExe -and (Test-Path $customPythonExe)) {
    $ver = & $customPythonExe --version 2>&1
    Write-Host "OK: Using specified Python: $customPythonExe -> $ver"
    $pythonExe = $customPythonExe
}

# 2. Search PATH
if (-not $pythonExe) {
    foreach ($cmd in @("python", "python3", "py")) {
        if (Get-Command $cmd -ErrorAction SilentlyContinue) {
            $ver = & $cmd --version 2>&1
            Write-Host "OK: Found Python in PATH: $cmd -> $ver"
            $pythonExe = (Get-Command $cmd).Source
            break
        }
    }
}

# 3. Scan common install directories
if (-not $pythonExe) {
    $searchPaths = @(
        "$env:USERPROFILE\miniforge3\python.exe",
        "$env:USERPROFILE\miniconda3\python.exe",
        "$env:USERPROFILE\anaconda3\python.exe",
        "$env:LOCALAPPDATA\Programs\Python\Python3*\python.exe",
        "C:\Python3*\python.exe"
    )
    foreach ($pattern in $searchPaths) {
        $found = Get-Item $pattern -ErrorAction SilentlyContinue | Select-Object -First 1
        if ($found) {
            $ver = & $found.FullName --version 2>&1
            Write-Host "OK: Found Python in common directory: $($found.FullName) -> $ver"
            $pythonExe = $found.FullName
            break
        }
    }
}

if ($pythonExe) {
    $env:PYTHON_EXE = $pythonExe
    Write-Host "PYTHON_OK"
} else {
    Write-Host "ERROR: Python not found. Install Miniforge or set `$customPythonExe"
    Write-Host "PYTHON_MISSING"
}

If Python is not found, install Miniforge:

$mf = "$env:TEMP\Miniforge3-Windows-x86_64.exe"
Invoke-WebRequest `
  -Uri "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe" `
  -OutFile $mf
Start-Process $mf -ArgumentList "/S /D=$env:USERPROFILE\miniforge3" -Wait
Remove-Item $mf
$env:PYTHON_EXE = "$env:USERPROFILE\miniforge3\python.exe"
& $env:PYTHON_EXE --version
Write-Host "PYTHON_OK"

Download Models

Option A: hf download (recommended)

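& $env:PYTHON_EXE -m pip install huggingface_hub -q

# For users in China only: set the mirror (skip this line if outside China)
$env:HF_ENDPOINT = "https://hf-mirror.com"

$modelDir = "<OCR_DIR>\models\GLM-OCR-GGUF"
New-Item -ItemType Directory -Force -Path $modelDir | Out-Null

# File names are passed positionally after the repo ID.
# Note: hf must be on PATH (pip installs it into the Scripts folder of the Python environment).
hf download ggml-org/GLM-OCR-GGUF `
  "GLM-OCR-Q8_0.gguf" "mmproj-GLM-OCR-Q8_0.gguf" `
  --local-dir $modelDir

# Report success only if the command exited cleanly and both files are present
$bothPresent = (Test-Path "$modelDir\GLM-OCR-Q8_0.gguf") -and (Test-Path "$modelDir\mmproj-GLM-OCR-Q8_0.gguf")
if ($LASTEXITCODE -eq 0 -and $bothPresent) {
    Write-Host "MODEL_DOWNLOAD=DONE"
} else {
    Write-Host "ERROR: model download did not complete (exit code $LASTEXITCODE)"
}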

Option B: ModelScope (alternative for users in China)

& $env:PYTHON_EXE -m pip install modelscope -q
& $env:PYTHON_EXE -c "
from modelscope.hub.file_download import model_file_download
import os
dest = r'<OCR_DIR>\models\GLM-OCR-GGUF'
os.makedirs(dest, exist_ok=True)
model_file_download('ggml-org/GLM-OCR-GGUF', file_path='GLM-OCR-Q8_0.gguf', local_dir=dest)
model_file_download('ggml-org/GLM-OCR-GGUF', file_path='mmproj-GLM-OCR-Q8_0.gguf', local_dir=dest)
print('MODEL_DOWNLOAD=DONE')
"

Verify:

$modelDir = "<OCR_DIR>\models\GLM-OCR-GGUF"
Get-Item "$modelDir\GLM-OCR-Q8_0.gguf", "$modelDir\mmproj-GLM-OCR-Q8_0.gguf" |
  Select-Object Name, @{N='MB';E={[math]::Round($_.Length/1MB,0)}}
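
The Verify command only prints file sizes; deciding whether they look right is left to the reader. One way to automate that comparison is sketched below in Python. The expected sizes are the approximate figures quoted in this document, the 10% tolerance is an arbitrary assumption, and `verify_models` is a hypothetical helper. A size check of this kind catches truncated downloads but is not a substitute for a checksum.

```python
import os

# Approximate sizes in MB as stated in this document
EXPECTED_MB = {"GLM-OCR-Q8_0.gguf": 950, "mmproj-GLM-OCR-Q8_0.gguf": 484}

def verify_models(model_dir, tolerance=0.10, expected=EXPECTED_MB):
    """Return a list of problems; an empty list means both files look plausible."""
    problems = []
    for name, want_mb in expected.items():
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {name}")
            continue
        have_mb = os.path.getsize(path) / (1024 * 1024)
        if abs(have_mb - want_mb) > want_mb * tolerance:
            problems.append(
                f"size mismatch: {name} is {have_mb:.0f} MB, expected ~{want_mb} MB"
            )
    return problems
```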

Output	Action
`MODEL_DOWNLOAD=DONE`	✅ Continue to Step 3
Timeout / repeated failure	⚠️ Direct user to "Manual Download Fallback" section, or switch between Option A / B and retry

Announce: ✅ Model download complete.

Manual Download Fallback

If automatic download repeatedly fails, guide the user to download manually and place files in the correct directory:

⚠️ Automatic download failed. Please manually download the following two files:

1. GLM-OCR-Q8_0.gguf (~950 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/GLM-OCR-Q8_0.gguf

2. mmproj-GLM-OCR-Q8_0.gguf (~484 MB)
   HuggingFace: https://huggingface.co/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   HF Mirror:   https://hf-mirror.com/ggml-org/GLM-OCR-GGUF/resolve/main/mmproj-GLM-OCR-Q8_0.gguf
   ModelScope:  https://modelscope.cn/models/ggml-org/GLM-OCR-GGUF/resolve/master/mmproj-GLM-OCR-Q8_0.gguf

Once downloaded, place both files into:
   <OCR_DIR>\models\GLM-OCR-GGUF\

Then re-run the Verify command to confirm the files are intact before continuing to Step 3.

Step 3: Process Recognition Result

🔍 Step 3: Processing GLM-OCR recognition result…

Determine Input Source

Situation	Action
User message contains a local file path (e.g. `C:\Users\...\xxx.png`)	⬇️ Case A: extract path from message, call `llama-cli`
User uploaded an image via the interface; OpenClaw provides a temp path	⬇️ Case B: retrieve temp path from context, call `llama-cli`
Neither	⛔ Ask user to provide a local file path or upload an image
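
The routing in the table above can be sketched as follows. The regular expression, the list of image extensions, and the names `find_image_path` and `choose_case` are all assumptions for illustration; the skill itself leaves path extraction to the assistant.

```python
import re

# Assumed pattern: drive letter, backslash, then any characters that are legal
# in Windows paths, ending in a common image extension.
IMAGE_PATH_RE = re.compile(
    r'[A-Za-z]:\\[^":<>|?*\r\n]*?\.(?:png|jpe?g|bmp|webp|tiff?)',
    re.IGNORECASE,
)

def find_image_path(message):
    """Return the first Windows image path found in message, or None."""
    match = IMAGE_PATH_RE.search(message)
    return match.group(0) if match else None

def choose_case(message, uploaded_temp_path=None):
    """Return ('A', path), ('B', path), or ('ASK', None) per the table above."""
    path = find_image_path(message)
    if path:
        return ("A", path)
    if uploaded_temp_path:
        return ("B", uploaded_temp_path)
    return ("ASK", None)
```

Either way, the extracted path still has to pass the `Test-Path` check in the scripts below before `llama-cli` is called.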

Case A: User Provides a Local File Path

Extract the file path from the user's message, then call llama-cli directly:

# ── Fix encoding ──
chcp 65001 | Out-Null
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8

$imgPath = "<file path extracted from user message>"
$m       = "<OCR_DIR>\models\GLM-OCR-GGUF\GLM-OCR-Q8_0.gguf"
$mm      = "<OCR_DIR>\models\GLM-OCR-GGUF\mmproj-GLM-OCR-Q8_0.gguf"

if (-not (Test-Path $imgPath)) {
    Write-Host "ERROR: File not found: $imgPath"
    exit 1
}

$cliExe = "<OCR_DIR>\llama.cpp\llama-cli.exe"
$result = & $cliExe `
  -m $m `
  --mmproj $mm `
  --image $imgPath `
  -p "Please recognize and extract all text from this image. Output the text content line by line, preserving the original layout." `
  -ngl 99 `
  --device Vulkan0 `
  -c 12000 `
  2>$null

# $result is an array of lines; join with newlines so the original layout is preserved
Write-Host ($result -join "`n")

Success criteria: stdout contains the recognized text content.
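
For readers driving `llama-cli` from Python rather than PowerShell, the same invocation can be assembled as an argument list. This is a sketch only: it reuses exactly the flags shown above, and `build_llama_args` and `run_ocr` are hypothetical helper names. Unlike the PowerShell version, it keeps stderr so that errors can be inspected.

```python
import subprocess

PROMPT = (
    "Please recognize and extract all text from this image. "
    "Output the text content line by line, preserving the original layout."
)

def build_llama_args(cli_exe, model, mmproj, image, prompt=PROMPT):
    """Build the llama-cli argument list with the flags used in this skill."""
    return [
        cli_exe,
        "-m", model,
        "--mmproj", mmproj,
        "--image", image,
        "-p", prompt,
        "-ngl", "99",
        "--device", "Vulkan0",
        "-c", "12000",
    ]

def run_ocr(cli_exe, model, mmproj, image):
    """Run llama-cli and return (exit code, stdout text, stderr text)."""
    completed = subprocess.run(
        build_llama_args(cli_exe, model, mmproj, image),
        capture_output=True, text=True, encoding="utf-8", errors="replace",
    )
    return completed.returncode, completed.stdout, completed.stderr
```

Passing arguments as a list avoids quoting problems with paths that contain spaces.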

Case B: User Uploaded an Image via the Interface

OpenClaw saves uploaded images to a temporary path. Retrieve that path from context and call llama-cli the same way:

# ── Fix encoding ──
chcp 65001 | Out-Null
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8

# imgPath is the temporary image path provided by OpenClaw in context
$imgPath = "<temporary image path provided by OpenClaw>"
$m       = "<OCR_DIR>\models\GLM-OCR-GGUF\GLM-OCR-Q8_0.gguf"
$mm      = "<OCR_DIR>\models\GLM-OCR-GGUF\mmproj-GLM-OCR-Q8_0.gguf"

if (-not (Test-Path $imgPath)) {
    Write-Host "ERROR: File not found: $imgPath"
    exit 1
}

$cliExe = "<OCR_DIR>\llama.cpp\llama-cli.exe"
$result = & $cliExe `
  -m $m `
  --mmproj $mm `
  --image $imgPath `
  -p "Please recognize and extract all text from this image. Output the text content line by line, preserving the original layout." `
  -ngl 99 `
  --device Vulkan0 `
  -c 12000 `
  2>$null

# $result is an array of lines; join with newlines so the original layout is preserved
Write-Host ($result -join "`n")

Success criteria: stdout contains the recognized text content.

Format Output

Once the recognized text is obtained, process it according to the user's intent:

Scenario	Handling
General text extraction	Output the recognized text as-is, preserving original layout
Invoice / receipt	Extract structured fields from the text; output as JSON + human-readable format
Table	Reformat the recognized text as a Markdown table
Business card	Extract name, title, company, phone, email, address; output as JSON
ID / certificate	Output structured by original layout
Screenshot / document	Organize output by paragraph
User-defined	Process according to the user's stated requirements
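
As an illustration of the "Table" row, recognized text can be reshaped into a Markdown table once it has been split into cells. The sketch below assumes that columns in the OCR output are separated by a tab or by two or more spaces, which will not hold for every image; `split_row` and `to_markdown_table` are hypothetical helpers.

```python
import re

def split_row(line):
    """Split one recognized line into cells on tabs or runs of 2+ spaces."""
    return [cell for cell in re.split(r"\t|\s{2,}", line.strip()) if cell]

def to_markdown_table(rows):
    """Render rows (first row = header) as a Markdown table string."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of cells
    padded = [list(row) + [""] * (width - len(row)) for row in rows]

    def line(cells):
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, *body = padded
    out = [line(header), line(["---"] * width)]
    out.extend(line(row) for row in body)
    return "\n".join(out)
```

A typical use is `to_markdown_table([split_row(l) for l in text.splitlines() if l.strip()])`.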

Completion announcement:

✅ Recognition complete!
Let me know if you'd like to re-process, change the output format, or export to a file.

Situation	Handling
`ERROR: File not found`	File path does not exist — ask user to verify the path
Empty / garbled output	Low image quality — ask user to retake or rescan
Blurry / low-resolution image	Ask user to retake or zoom in before retrying
No text detected	Inform user that no recognizable text was found in the image

Troubleshooting

Error	Cause	Solution
`llama-cli` command not found	llama-cli.exe path not set correctly	Verify `<OCR_DIR>\llama.cpp\llama-cli.exe` exists
`ggml_vulkan: no devices found`	Vulkan driver not installed	Update GPU driver
`error: unable to open model`	Incorrect model path	Re-run Pre-flight model check to verify path
`MODEL_DOWNLOAD=` no output	Download interrupted	Switch between Option A / B, or configure proxy
`PYTHON_MISSING`	Python not installed	Install Miniforge (see Step 2)
Garbled / blank output	Low image quality	Improve image quality
VRAM insufficient / crash	Not enough GPU memory	Lower `-ngl` value, or use `--device none`
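
Part of the table above can be applied mechanically by searching captured output for the listed error strings. The sketch below covers only the rows that have a literal message to match, and it assumes the output (including stderr) was captured rather than discarded; `suggest_fix` is a hypothetical helper.

```python
# (substring to look for, suggested action), taken from the table above
TROUBLESHOOTING = [
    ("ggml_vulkan: no devices found", "Update GPU driver"),
    ("unable to open model", "Re-run Pre-flight model check to verify path"),
    ("PYTHON_MISSING", "Install Miniforge (see Step 2)"),
]

def suggest_fix(log_text):
    """Return the first matching suggestion, or None if nothing matches."""
    for needle, fix in TROUBLESHOOTING:
        if needle in log_text:
            return fix
    return None
```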

References

llama.cpp Releases: https://github.com/ggml-org/llama.cpp/releases
GLM-OCR GGUF: https://huggingface.co/ggml-org/GLM-OCR-GGUF

Usage Guidance

This skill appears to implement a local Windows OCR pipeline and will download and extract binaries and large model files into a directory you choose (or auto-selected). Before installing, verify you trust the GitHub release URL and the Hugging Face / ModelScope model source; confirm whether the model requires authentication (HUGGINGFACE_TOKEN) — the skill does not declare that but may prompt for or require a token. Expect the installer to create folders, place executables (llama-cli) and models on disk, and possibly install Miniforge/Python. If you need to hold downloads to known-good checksums or avoid automatic installers, review the PowerShell steps in SKILL.md and run them manually rather than granting autonomous execution.

Capability Analysis

Type: OpenClaw Skill
Name: image-ocr-local-aipc
Version: 1.0.0

The skill automates the setup of a local OCR environment by downloading and executing binaries (`llama-cli.exe`) and installers (`Miniforge`) from GitHub repositories (github.com/ggml-org and github.com/conda-forge) via PowerShell in `SKILL.md`. While these actions are plausibly required for the stated purpose and the sources are reputable, the automated fetching and execution of remote artifacts, combined with silent software installation on the host system, constitute high-risk behaviors that could be leveraged for exploitation if the sources were compromised.

Capability Assessment

✓ Purpose & Capability

The name/description (local OCR with GLM-OCR and llama.cpp Vulkan) matches the instructions: creating an OCR directory, downloading a pretrained GGUF model and a llama.cpp Vulkan binary, and running local inference. No unrelated capabilities or credentials are requested in the manifest.

ℹ Instruction Scope

SKILL.md instructs the agent to run PowerShell to create directories, set an environment variable, download/extract binaries, and run inference — all expected for a local OCR installer. It does not instruct the agent to read unrelated user files or secrets. However, the instructions reference downloading model files via huggingface_hub or modelscope but do not explain authentication or consent prompts if private models or rate limits apply.

ℹ Install Mechanism

The skill uses legitimate sources: a GitHub releases URL for llama.cpp and huggingface_hub/modelscope for model downloads. These are expected for this use case. Risk: the install will write and execute binaries and large model files to disk and can automatically install Miniforge if Python is missing — benign if you trust the sources, but carries the usual risks of executing downloaded binaries.

⚠ Credentials

The declared requirements list no credentials, but the runtime instructions rely on huggingface_hub or modelscope to download model artifacts. If the model is gated or large files require an HF token, the skill may implicitly require HUGGINGFACE_TOKEN or similar credentials (not declared). This is a proportionality mismatch and worth clarifying before use.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or global agent settings. It creates files and installs software under a user-specified or auto-selected directory (normal for a local tool). It sets an environment variable in-session only.

Version History

v1.0.0

image-ocr-local-aipc v1.0.0
- Initial release of local image OCR skill for Windows using the GLM-OCR model.
- Recognizes and extracts text from images, supporting mixed Chinese/English.
- Prioritizes Intel iGPU (Vulkan) for on-device inference, no cloud API needed.
- Automatic setup: checks for and installs llama.cpp, downloads models, and prepares the environment.
- Supports progress announcements, user opt-in for large initial model downloads, and manual model fallback.

Metadata

Slug image-ocr-local-aipc

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is image-ocr-local-AIPC?

Image OCR, text recognition, extract text from image, scan document, read image text, invoice OCR, receipt OCR, contract recognition, table extraction, busin... It is an AI Agent Skill for Claude Code / OpenClaw, with 176 downloads so far.

How do I install image-ocr-local-AIPC?

Run "/install image-ocr-local-aipc" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is image-ocr-local-AIPC free?

Yes, image-ocr-local-AIPC is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image-ocr-local-AIPC support?

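The listing describes image-ocr-local-AIPC as running anywhere OpenClaw / Claude Code is available, but the skill's own instructions target Windows: every step is a PowerShell script, and Step 1 downloads the Windows x64 Vulkan build of llama.cpp.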

Who created image-ocr-local-AIPC?

It is built and maintained by violet17 (@violet17); the current version is v1.0.0.
