GLM-OCR Text Extraction Skill
Extract text from images and PDFs using the GLM-OCR layout parsing API.
When to Use
- Extract text from images (PNG, JPG) and PDFs
- Convert screenshots to text
- Process scanned documents
- OCR photos containing text (including handwritten text)
- Recognize tables and formulas in documents
- User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)
Key Features
- Table recognition: Detects and converts tables to Markdown format
- Formula extraction: LaTeX format output
- Handwriting support: Strong recognition for handwritten text
- Local file & URL: Supports both local files and remote URLs
Resource Links
| Resource | Link |
|---|---|
| Get API Key | https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys |
| GitHub | https://github.com/zai-org/GLM-OCR |
Prerequisites
- ZHIPU_API_KEY configured (see Setup below)
Security Notes
- No runtime package installation is performed by the scripts.
- OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
- Only ZHIPU_API_KEY (and an optional timeout) is read from environment variables.
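A pre-flight check for the key can be sketched in Python as follows. This is a minimal illustration, assuming the key is visible as an environment variable at call time. The helper name `api_key_is_configured` is hypothetical and not part of the skill's scripts, and how `config_setup.py` persists the key is not stated here.

```python
import os


def api_key_is_configured(env=None):
    """Return True if ZHIPU_API_KEY is present and non-empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    Hypothetical helper, not part of the skill's own scripts.
    """
    env = os.environ if env is None else env
    return bool(env.get("ZHIPU_API_KEY", "").strip())
```

If the check returns False, the Setup step has to be completed before any OCR request is attempted.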
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
- ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
- NEVER parse documents directly - Do NOT try to extract text yourself
- NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt text extraction any other way
Setup
- Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
- Configure:
python scripts/config_setup.py setup --api-key YOUR_KEY
How to Use
Extract from URL
python scripts/glm_ocr_cli.py --file-url "URL provided by user"
Extract from Local File
python scripts/glm_ocr_cli.py --file /path/to/image.jpg
Save result to file (recommended)
python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
CLI Reference
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
| Parameter | Required | Description |
|---|---|---|
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
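The argument rules above (exactly one of --file-url or --file, plus the optional --output and --pretty) can be sketched as a small Python helper that assembles the command line. The function name `build_ocr_command` and its parameter names are hypothetical. Only the flags and the script path come from this document.

```python
import sys


def build_ocr_command(file_url=None, file_path=None, output=None,
                      pretty=False, script="scripts/glm_ocr_cli.py"):
    """Build the argument list for glm_ocr_cli.py (hypothetical helper).

    Exactly one of file_url / file_path must be given, mirroring the
    "One of" rows in the parameter table.
    """
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")

    cmd = [sys.executable, script]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file", file_path]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Building the command as a list rather than a shell string avoids quoting problems with user-supplied URLs and paths.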
Response Format
{
"ok": true,
"text": "# Extracted text in Markdown...",
"layout_details": [[...]],
"result": { "raw_api_response": "..." },
"error": null,
"source": "/path/to/file.jpg",
"source_type": "file"
}
Key fields:
- ok: whether extraction succeeded
- text: extracted text in Markdown (use this for display)
- layout_details: layout analysis details
- result: raw API response
- error: error details on failure
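Reading the result JSON can be sketched in Python as follows. This is a minimal illustration that assumes the output has the shape shown in the sample response. The function name `extract_text` is hypothetical.

```python
import json


def extract_text(result_json):
    """Return the Markdown text from a GLM-OCR result, or raise on failure.

    `result_json` is the JSON string printed by the CLI or saved via
    --output. Hypothetical helper based on the sample response only.
    """
    result = json.loads(result_json)
    if result.get("ok"):
        return result.get("text", "")
    # On failure, surface the error as-is instead of trying anything else.
    raise RuntimeError(f"GLM-OCR failed: {result.get('error')}")
```

Raising on `ok: false` keeps the caller in line with the restriction that a failed API call is reported and not worked around.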
Error Handling
API key not configured:
Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show exact error to user, guide them to configure
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
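The status-code cases above can be sketched as a lookup from HTTP status to the guidance shown to the user. This is an illustration only: the document does not say whether the CLI exposes the status code separately from the error message, and the function name `guidance_for_status` is hypothetical.

```python
def guidance_for_status(status_code):
    """Map an HTTP status code to user guidance (hypothetical helper).

    Covers only the cases listed in Error Handling; anything else falls
    back to the rule of displaying the error and stopping.
    """
    if status_code in (401, 403):
        return "API key invalid or expired: reconfigure ZHIPU_API_KEY."
    if status_code == 429:
        return "Quota exhausted: wait before retrying."
    return "Display the error message and stop."
```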
Reference
references/output_schema.md: detailed output format specification
Installation
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install glmocr
- After installation, invoke the skill by name or use /glmocr
- Provide required inputs per the skill's parameter spec and get structured output
What is GLM-OCR?
Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti... It is an AI Agent Skill for Claude Code / OpenClaw, with 590 downloads so far.
How do I install GLM-OCR?
Run "/install glmocr" in the OpenClaw or Claude Code chat to install it in one step. Before first use, configure ZHIPU_API_KEY as described in Setup.
Is GLM-OCR free?
Yes. GLM-OCR is licensed under MIT-0 and is free to download, install, and use. It does require a ZHIPU_API_KEY, and requests are subject to the API quota (a 429 response means the quota is exhausted).
Which platforms does GLM-OCR support?
GLM-OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created GLM-OCR?
It is built and maintained by Jared Wen (@jaredforreal); the current version is v1.0.4.