/install glmocr
GLM-OCR Text Extraction Skill
Extract text from images and PDFs using the GLM-OCR layout parsing API.
When to Use
- Extract text from images (PNG, JPG) and PDFs
- Convert screenshots to text
- Process scanned documents
- OCR photos containing text (including handwritten text)
- Recognize tables and formulas in documents
- User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)
Key Features
- Table recognition: Detects and converts tables to Markdown format
- Formula extraction: LaTeX format output
- Handwriting support: Strong recognition for handwritten text
- Local file & URL: Supports both local files and remote URLs
Resource Links
| Resource | Link |
|---|---|
| Get API Key | https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys |
| GitHub | https://github.com/zai-org/GLM-OCR |
Prerequisites
- ZHIPU_API_KEY configured (see Setup below)
Security Notes
- No runtime package installation is performed by the scripts.
- OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
- Only ZHIPU_API_KEY (and an optional timeout) is read from environment variables.
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
- ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
- NEVER parse documents directly - Do NOT try to extract text yourself
- NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt text extraction any other way
Setup
- Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
- Configure:
python scripts/config_setup.py setup --api-key YOUR_KEY
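Before calling the CLI, a wrapper can check that the key is actually visible to the process. The sketch below assumes the key is exposed as the ZHIPU_API_KEY environment variable, as the Security Notes state; how config_setup.py persists the key is not described here, and the helper name `check_api_key` is hypothetical.

```python
import os

API_KEY_URL = "https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys"


def check_api_key(env=None):
    """Return None if ZHIPU_API_KEY is set, else an error message string.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    if key:
        return None
    # Mirrors the wording of the documented "not configured" error.
    return ("Error: ZHIPU_API_KEY not configured. "
            f"Get your API key at: {API_KEY_URL}")


if __name__ == "__main__":
    problem = check_api_key()
    print(problem or "ZHIPU_API_KEY is set.")
```

If the function returns a message, show it to the user unchanged and point them at the Setup steps rather than continuing.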
How to Use
Extract from URL
python scripts/glm_ocr_cli.py --file-url "URL provided by user"
Extract from Local File
python scripts/glm_ocr_cli.py --file /path/to/image.jpg
Save result to file (recommended)
python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
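The three invocations above can also be driven from Python. This is a minimal sketch, not part of the skill: `build_command` and `run_ocr` are hypothetical helpers, the URL-versus-path detection is an assumption of the sketch, and it assumes the CLI exits with a nonzero status on failure, which the documentation does not confirm.

```python
import json
import subprocess
import sys


def build_command(source, output=None, script="scripts/glm_ocr_cli.py"):
    """Build the argv list for the documented CLI call.

    A source starting with http:// or https:// is passed via --file-url,
    anything else via --file (assumed rule, chosen for this sketch).
    """
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, script, "--file-url" if is_url else "--file", source]
    if output:
        cmd += ["--output", output]
    return cmd


def run_ocr(source, output="result.json"):
    """Run the CLI, then load the JSON it saved to `output`."""
    completed = subprocess.run(build_command(source, output),
                               capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the CLI's own message and stop; no fallback extraction.
        raise RuntimeError(completed.stderr.strip() or completed.stdout.strip())
    with open(output, encoding="utf-8") as fh:
        return json.load(fh)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.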
CLI Reference
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
| Parameter | Required | Description |
|---|---|---|
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
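The "One of" rule in the table means exactly one of --file-url or --file must be given. The sketch below shows how that rule can be expressed with argparse. It is an illustration of the documented interface only, not the actual source of glm_ocr_cli.py, and `make_parser` is a hypothetical name.

```python
import argparse


def make_parser():
    """Build a parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(prog="glm_ocr_cli.py")
    # Exactly one input source is required: a URL or a local path.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", help="URL to image/PDF")
    source.add_argument("--file", help="Local file path to image/PDF")
    parser.add_argument("--output", "-o", help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty-print JSON output")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(args)
```

With this layout, supplying neither source or both sources makes argparse print a usage error and exit before any request is sent.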
Response Format
{
"ok": true,
"text": "# Extracted text in Markdown...",
"layout_details": [[...]],
"result": { "raw_api_response": "..." },
"error": null,
"source": "/path/to/file.jpg",
"source_type": "file"
}
Key fields:
- ok: whether extraction succeeded
- text: extracted text in Markdown (use this for display)
- layout_details: layout analysis details
- result: raw API response
- error: error details on failure
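A consumer of this response usually needs only two things: the ok flag and the text field. The following sketch reads a result and returns the Markdown text, raising with the error field otherwise. `extract_text` is a hypothetical helper, and only the field names shown in the sample response are assumed.

```python
import json


def extract_text(payload):
    """Return the Markdown text from a result, or raise with its error.

    `payload` may be a JSON string or an already-parsed dict.
    """
    result = json.loads(payload) if isinstance(payload, str) else payload
    if not result.get("ok"):
        # Report the error as-is; do not attempt any other extraction.
        raise RuntimeError(f"GLM-OCR failed: {result.get('error')}")
    return result.get("text", "")


if __name__ == "__main__":
    sample = '{"ok": true, "text": "# Extracted text", "error": null}'
    print(extract_text(sample))
```

Raising on a false ok keeps the caller aligned with the restriction that a failed API call ends the task instead of triggering a fallback.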
Error Handling
API key not configured:
Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show exact error to user, guide them to configure
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
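The cases above amount to a small lookup from failure type to user guidance. This is a sketch under stated assumptions: the function name `advice_for`, its parameters, and the advice strings are hypothetical, and the CLI is not known to expose HTTP status codes in this form.

```python
# Guidance per HTTP status, following the cases listed above.
ADVICE = {
    401: "API key invalid or expired: reconfigure the key.",
    403: "API key invalid or expired: reconfigure the key.",
    429: "Quota exhausted: wait before retrying.",
}

DEFAULT_ADVICE = "Show the error message and stop."


def advice_for(status=None, message="", file_missing=False):
    """Map a failure to the guidance the user should see."""
    if "ZHIPU_API_KEY not configured" in message:
        return "Show the exact error and guide the user through Setup."
    if file_missing:
        return "Local file missing: check the path."
    # Unknown failures fall through to the stop-immediately rule.
    return ADVICE.get(status, DEFAULT_ADVICE)


if __name__ == "__main__":
    print(advice_for(status=429))
```

Any failure that is not listed falls back to displaying the error and stopping, which matches the mandatory restrictions.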
Reference
- references/output_schema.md: detailed output format specification
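Installation
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install glmocr
- Once installed, call the Skill by name or trigger it with /glmocr
- Provide the inputs described in the Skill's parameter reference to get structured output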
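What is GLM-OCR?
Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 590 downloads to date.
How do I install GLM-OCR?
Run the command "/install glmocr" in the OpenClaw or Claude Code chat box for a one-step install. The install itself needs no extra configuration, but the skill requires ZHIPU_API_KEY before use (see Setup).
Is GLM-OCR free?
Yes. The GLM-OCR skill is completely free under the MIT-0 license and may be freely downloaded, installed, and used.
Which platforms does GLM-OCR support?
GLM-OCR is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops GLM-OCR?
It is developed and maintained by Jared Wen (@jaredforreal). The current version is v1.0.4.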