Description

Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti...

README (SKILL.md)

GLM-OCR Text Extraction Skill

Name: GLM-OCR
Author: jaredforreal

Extract text from images and PDFs using the GLM-OCR layout parsing API.

When to Use

Extract text from images (PNG, JPG) and PDFs
Convert screenshots to text
Process scanned documents
OCR photos containing text (including handwritten text)
Recognize tables and formulas in documents
User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)

Key Features

Table recognition: Detects and converts tables to Markdown format
Formula extraction: LaTeX format output
Handwriting support: Strong recognition for handwritten text
Local file & URL: Supports both local files and remote URLs

Resource Links

Resource	Link
Get API Key	https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
GitHub	https://github.com/zai-org/GLM-OCR

Prerequisites

ZHIPU_API_KEY configured (see Setup below)

Security Notes

No runtime package installation is performed by the scripts.
OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
Only ZHIPU_API_KEY (and optional timeout) is read from environment variables.
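
To illustrate the last note, a script that reads only those two variables might load its settings roughly as follows. This is a sketch, not the skill's actual code: the `load_settings` helper and the 60-second default timeout are assumptions, and only the variable names `ZHIPU_API_KEY` and `GLM_OCR_TIMEOUT` come from this page.

```python
import os

# Assumed default; the real default timeout is not documented on this page.
DEFAULT_TIMEOUT_SECONDS = 60.0


def load_settings(env=None):
    """Return (api_key, timeout) read from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY not configured")
    raw_timeout = env.get("GLM_OCR_TIMEOUT", "").strip()
    # Fall back to the assumed default when no timeout is set.
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    return api_key, timeout
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without touching the real environment.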

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
NEVER parse documents directly - Do NOT try to extract text yourself
NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt text extraction any other way

Setup

Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

Configure:

python scripts/config_setup.py setup --api-key YOUR_KEY

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "URL provided by user"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/image.jpg

Save result to file (recommended)

python scripts/glm_ocr_cli.py --file-url "URL" --output result.json

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]

Parameter	Required	Description
`--file-url`	One of the two	URL to image/PDF
`--file`	One of the two	Local file path to image/PDF
`--output`, `-o`	No	Save result JSON to file
`--pretty`	No	Pretty-print JSON output
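
When the CLI is driven from another Python program, the flags in the table can be assembled into an argument list. The sketch below assumes a hypothetical `build_command` helper; it uses only the flags documented above and enforces that exactly one of `--file-url` and `--file` is given.

```python
import sys


def build_command(script, file_url=None, file=None, output=None, pretty=False):
    """Build the argv list for glm_ocr_cli.py from the documented flags."""
    # Exactly one input source must be supplied.
    if (file_url is None) == (file is None):
        raise ValueError("pass exactly one of file_url or file")
    cmd = [sys.executable, script]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file", file]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The returned list can be handed to `subprocess.run`, for example `subprocess.run(build_command("scripts/glm_ocr_cli.py", file="/path/to/image.jpg", output="result.json"))`. Passing a list rather than a shell string avoids quoting problems with paths and URLs.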

Response Format

{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}

Key fields:

ok — whether extraction succeeded
text — extracted text in Markdown (use this for display)
layout_details — layout analysis details
result — raw API response
error — error details on failure
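
A consumer of this output would typically check `ok` before using `text`. The following is a minimal sketch under that reading of the fields above; the `extract_text` helper is hypothetical and not part of the skill.

```python
import json


def extract_text(raw_json):
    """Return the Markdown text from a CLI result, or raise with the reported error."""
    result = json.loads(raw_json)
    if not result.get("ok"):
        # Surface the error as reported instead of attempting any fallback.
        raise RuntimeError(f"OCR failed: {result.get('error')}")
    return result.get("text", "")
```

For a result saved with `--output result.json`, the file contents can be read and passed to `extract_text` directly.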

Error Handling

API key not configured:

Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
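
The cases above can be summarized as a small lookup from error message to guidance. This sketch assumes the error text contains the HTTP status code or the phrases shown; the exact wording the CLI emits for each case is not documented here, so the substring checks and the `advise` helper are illustrative only.

```python
def advise(error_text):
    """Map an error message to the guidance listed above (substring heuristics)."""
    text = error_text.lower()
    if "zhipu_api_key not configured" in text:
        return "Configure the API key (see Setup)."
    if "401" in text or "403" in text:
        return "API key invalid or expired; reconfigure it."
    if "429" in text:
        return "Quota exhausted; wait before retrying."
    if "not found" in text:
        return "Local file missing; check the path."
    # Anything else: show the error and stop, as the restrictions require.
    return "Show the error to the user and stop."
```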

Reference

references/output_schema.md — detailed output format specification

Usage Guidance

This skill appears to do what it says: it uploads the image/PDF you provide (URL or local file, base64-encoded) to the official GLM-OCR endpoint (open.bigmodel.cn) using your ZHIPU_API_KEY. Before installing or using it:

(1) Do not pass sensitive documents you don't want uploaded to the remote service.
(2) Keep the .env file containing your API key out of version control (config_setup.py even reminds you to add .env to .gitignore).
(3) Confirm you trust the GLM provider (bigmodel.cn) with any data you send and rotate the API key if it may have been exposed.
(4) If you want extra assurance, inspect the full, untruncated copy of glm_ocr_cli.py in your environment to confirm there are no additional network endpoints or unexpected behaviors beyond the visible code.
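
The advice about keeping `.env` out of version control can be spot-checked with a few lines of Python. This is a rough sketch with a hypothetical `env_is_ignored` helper: it only looks for a literal `.env` or `/.env` line in `.gitignore` and does not implement full gitignore pattern matching, for which `git check-ignore .env` is the authoritative check.

```python
from pathlib import Path


def env_is_ignored(repo_dir):
    """Return True if .gitignore in repo_dir has a line that is '.env' or '/.env'."""
    gitignore = Path(repo_dir) / ".gitignore"
    if not gitignore.is_file():
        return False
    lines = gitignore.read_text(encoding="utf-8").splitlines()
    # Literal match only; wildcard patterns are deliberately not interpreted.
    return any(line.strip() in (".env", "/.env") for line in lines)
```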

Capability Analysis

Type: OpenClaw Skill
Name: glmocr
Version: 1.0.4

The skill is a well-implemented wrapper for the GLM-OCR API. It includes security-conscious features such as hardcoding the official API endpoint (https://open.bigmodel.cn) in `scripts/glm_ocr_cli.py` to prevent API key exfiltration via custom URLs and providing a configuration script (`scripts/config_setup.py`) that warns users to keep their `.env` files out of version control. The instructions in `SKILL.md` are focused on the stated task and do not contain any malicious prompt injection or redirection.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (OCR, table/formula/handwriting extraction) align with the code and declared requirements. The skill only requires the GLM provider API key (ZHIPU_API_KEY) and an optional timeout, and the code posts to the official GLM endpoint (open.bigmodel.cn). Required files (Python scripts and a small requirements.txt) are proportionate to the task.

✓ Instruction Scope

SKILL.md instructs the agent to run the provided CLI script(s) and to use only the official API. The code reads a .env in the skill directory (if present) and will base64-encode and upload either a user-supplied URL or a local file to the GLM endpoint — which is expected for an OCR skill. Note: because the skill uploads whichever file you point it at, do not supply paths to sensitive local files if you do not want them transmitted to the GLM service.

✓ Install Mechanism

No automatic install spec; the package is instruction-only with included scripts. A requirements.txt lists only 'requests>=2.31.0'. The CLI prints a clear error if 'requests' is missing and explains how to install it; it does not download arbitrary remote binaries or archives.

✓ Credentials

Only ZHIPU_API_KEY (primary credential) and GLM_OCR_TIMEOUT are required; both are justified. The config helper writes/reads a .env file inside the skill folder to persist the API key — this is expected but the user should avoid committing that file to version control.

✓ Persistence & Privilege

Skill is not always-enabled and does not modify other skills or global agent settings. The only persistent change the provided tooling makes is creating/updating a .env file in the skill directory when the user runs the config setup — a normal configuration behavior.

Version History

v1.0.4

Version 1.0.4 of the GLM-OCR skill
- Fixed metadata: removed the unused "bins: python" requirement.
- No functional changes to the code or CLI usage.
- All documented features, security restrictions, and usage guidelines remain unchanged.

v1.0.3

- The required environment variable for authentication changed from GLM_OCR_API_KEY to ZHIPU_API_KEY.
- References to GLM_OCR_API_KEY were updated to ZHIPU_API_KEY throughout the documentation, including in prerequisites and error handling.
- No functional or CLI changes; documentation reflects the new API key requirement.

v1.0.2

- Added support for the GLM_OCR_TIMEOUT environment variable for configurable request timeouts.
- Clarified that the scripts do not install packages at runtime.
- Updated security notes: requests use only the official endpoint and do not allow custom API URLs.
- Minor clarifications in the documentation to reflect these changes.

v1.0.1

- Added metadata section with requirements for environment variable and python binary.
- Set GLM_OCR_API_KEY as primary environment variable.
- Included an emoji ("📄") and a homepage link in metadata.
- No functional or CLI changes. Documentation and behavior remain unchanged.

v1.0.0

Initial release of GLM-OCR text extraction skill.
- Extracts text from images and PDFs using GLM-OCR API.
- Supports table recognition (Markdown), formula extraction (LaTeX), and handwriting OCR.
- Handles both local files and remote URLs for input.
- Enforces strict usage of the GLM-OCR API only; provides direct error messages if API fails or misconfigured.
- Setup guides and CLI usage examples included for easy integration.

Metadata

Slug glmocr

Version 1.0.4

License MIT-0

All-time Installs 3

Active Installs 3

Total Versions 5

Frequently Asked Questions

What is GLM-OCR?

Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti... It is an AI Agent Skill for Claude Code / OpenClaw, with 590 downloads so far.

How do I install GLM-OCR?

Run "/install glmocr" in the OpenClaw or Claude Code chat to install it in one step. Before first use, configure ZHIPU_API_KEY as described in Setup.

Is GLM-OCR free?

The skill itself is free, licensed under MIT-0: you can download, install, and use it at no cost. It calls the GLM-OCR API with your own ZHIPU_API_KEY, so requests are subject to that account's quota.

Which platforms does GLM-OCR support?

GLM-OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GLM-OCR?

It is built and maintained by Jared Wen (@jaredforreal); the current version is v1.0.4.
