jaredforreal

GLM-OCR

by Jared Wen · GitHub ↗ · v1.0.4 · MIT-0
cross-platform ✓ Security Clean
590 Downloads · 2 Stars · 3 Active Installs · 5 Versions
Install in OpenClaw
/install glmocr
Description
Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti...
README (SKILL.md)

GLM-OCR Text Extraction Skill

Extract text from images and PDFs using the GLM-OCR layout parsing API.

When to Use

  • Extract text from image and PDF files (PNG, JPG, PDF)
  • Convert screenshots to text
  • Process scanned documents
  • OCR photos containing text (including handwritten text)
  • Recognize tables and formulas in documents
  • User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)

Key Features

  • Table recognition: Detects and converts tables to Markdown format
  • Formula extraction: LaTeX format output
  • Handwriting support: Strong recognition for handwritten text
  • Local file & URL: Supports both local files and remote URLs

Resource Links

Resource | Link
Get API Key | https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
GitHub | https://github.com/zai-org/GLM-OCR

Prerequisites

  • ZHIPU_API_KEY configured (see Setup below)

Security Notes

  • No runtime package installation is performed by the scripts.
  • OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
  • Only ZHIPU_API_KEY (and optional timeout) is read from environment variables.
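
As a rough illustration of the last point, the sketch below reads the two settings from the environment. The `load_settings` helper and the 60-second default timeout are assumptions made for this example; they are not taken from the skill's scripts.

```python
import os

# Hypothetical default; the real script's default timeout is not documented here.
DEFAULT_TIMEOUT_SECONDS = 60.0


def load_settings(env=None):
    """Return (api_key, timeout) read from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY not configured")
    # GLM_OCR_TIMEOUT is optional; fall back to the assumed default when unset.
    raw_timeout = env.get("GLM_OCR_TIMEOUT", "").strip()
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    return api_key, timeout
```

Passing a plain dict instead of `os.environ` keeps the helper easy to check without touching real credentials.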

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

  1. ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
  2. NEVER parse documents directly - Do NOT try to extract text yourself
  3. NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
  4. IF API fails - Display the error message and STOP immediately
  5. NO fallback methods - Do NOT attempt text extraction any other way

Setup

  1. Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
  2. Configure:
    python scripts/config_setup.py setup --api-key YOUR_KEY
    

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "URL provided by user"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/image.jpg

Save result to file (recommended)

python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
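
If the CLI needs to be driven from another Python program, something like the following sketch may work. It uses only the flags documented here (`--file-url`, `--file`, `--output`). The helper names `build_command` and `run_ocr` are hypothetical, and the non-zero exit code on failure is an assumption about the script's behavior.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_command(source, output, script="scripts/glm_ocr_cli.py"):
    """Build the argv list: URLs go to --file-url, anything else to --file."""
    flag = "--file-url" if source.startswith(("http://", "https://")) else "--file"
    return [sys.executable, script, flag, source, "--output", str(output)]


def run_ocr(source, output="result.json"):
    """Run the CLI and load the saved JSON; stop on failure, with no fallback."""
    completed = subprocess.run(
        build_command(source, output), capture_output=True, text=True
    )
    if completed.returncode != 0:
        # Assumption: the CLI signals failure with a non-zero exit code.
        raise RuntimeError(completed.stderr.strip() or completed.stdout.strip())
    return json.loads(Path(output).read_text(encoding="utf-8"))
```

Reading the result back from the `--output` file avoids depending on what the script prints to stdout.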

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
Parameter | Required | Description
--file-url | One of --file-url / --file | URL to image/PDF
--file | One of --file-url / --file | Local file path to image/PDF
--output, -o | No | Save result JSON to file
--pretty | No | Pretty-print JSON output
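
For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that mirrors the documented flags. It is an illustration only: the real `glm_ocr_cli.py` may define its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags (a sketch, not the skill's code)."""
    parser = argparse.ArgumentParser(prog="glm_ocr_cli.py")
    # Exactly one input source must be given: a URL or a local path.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", help="URL to image/PDF")
    source.add_argument("--file", help="Local file path to image/PDF")
    parser.add_argument("--output", "-o", help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty-print JSON output")
    return parser
```

With a required mutually exclusive group, passing both `--file` and `--file-url`, or neither, makes `argparse` exit with a usage error.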

Response Format

{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}

Key fields:

  • ok — whether extraction succeeded
  • text — extracted text in Markdown (use this for display)
  • layout_details — layout analysis details
  • result — raw API response
  • error — error details on failure
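
The fields above suggest a simple handling pattern: check `ok`, then either use `text` or surface `error`. The sketch below assumes the result has already been loaded into a Python dict; `extract_text` is a hypothetical helper, not part of the skill.

```python
def extract_text(response):
    """Return the Markdown text from a result dict, or raise with the reported error."""
    if not response.get("ok"):
        # Surface the error as reported; do not attempt any fallback extraction.
        raise RuntimeError(f"GLM-OCR failed: {response.get('error')}")
    return response.get("text", "")
```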

Error Handling

API key not configured:

Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
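
The cases above can be summarized as a lookup from HTTP status to the suggested next step. This is a sketch under the assumption that the caller can see the HTTP status code; how the script actually reports it is not specified here, and `guidance_for` is a hypothetical name.

```python
# Guidance strings paraphrase the error cases listed above.
GUIDANCE = {
    401: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    403: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    429: "Quota exhausted: wait before retrying.",
}


def guidance_for(status_code):
    """Return the suggested next step for an HTTP status code."""
    # Anything not listed falls back to the mandatory rule: show the error and stop.
    return GUIDANCE.get(status_code, "Show the error message and stop.")
```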

Reference

  • references/output_schema.md — detailed output format specification
Usage Guidance
This skill appears to do what it says: it uploads the image/PDF you provide (URL or local file, base64-encoded) to the official GLM-OCR endpoint (open.bigmodel.cn) using your ZHIPU_API_KEY. Before installing or using it:
  1. Do not pass sensitive documents you don't want uploaded to the remote service.
  2. Keep the .env file containing your API key out of version control (config_setup.py even reminds you to add .env to .gitignore).
  3. Confirm you trust the GLM provider (bigmodel.cn) with any data you send and rotate the API key if it may have been exposed.
  4. If you want extra assurance, inspect the full, untruncated copy of glm_ocr_cli.py in your environment to confirm there are no additional network endpoints or unexpected behaviors beyond the visible code.
Capability Analysis
Type: OpenClaw Skill
Name: glmocr
Version: 1.0.4
The skill is a well-implemented wrapper for the GLM-OCR API. It includes security-conscious features such as hardcoding the official API endpoint (https://open.bigmodel.cn) in `scripts/glm_ocr_cli.py` to prevent API key exfiltration via custom URLs and providing a configuration script (`scripts/config_setup.py`) that warns users to keep their `.env` files out of version control. The instructions in `SKILL.md` are focused on the stated task and do not contain any malicious prompt injection or redirection.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Name/description (OCR, table/formula/handwriting extraction) align with the code and declared requirements. The skill only requires the GLM provider API key (ZHIPU_API_KEY) and an optional timeout, and the code posts to the official GLM endpoint (open.bigmodel.cn). Required files (Python scripts and a small requirements.txt) are proportionate to the task.
Instruction Scope
SKILL.md instructs the agent to run the provided CLI script(s) and to use only the official API. The code reads a .env in the skill directory (if present) and will base64-encode and upload either a user-supplied URL or a local file to the GLM endpoint — which is expected for an OCR skill. Note: because the skill uploads whichever file you point it at, do not supply paths to sensitive local files if you do not want them transmitted to the GLM service.
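
To make the upload step concrete, this is roughly what base64-encoding a local file looks like in Python. It is a generic sketch: the exact request format the script sends is not shown in this listing, and `encode_file` is a hypothetical helper.

```python
import base64
from pathlib import Path


def encode_file(path):
    """Read a local file and return its bytes as a base64 ASCII string."""
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("ascii")
```

Because the whole file is read and encoded, whatever path is supplied is transmitted in full, which is why sensitive files should not be passed in.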
Install Mechanism
No automatic install spec; the package is instruction-only with included scripts. A requirements.txt lists only 'requests>=2.31.0'. The CLI prints a clear error if 'requests' is missing and instructs how to install it; no remote arbitrary binaries or archive downloads are present.
Credentials
Only two environment variables are used: ZHIPU_API_KEY (the primary credential, required) and GLM_OCR_TIMEOUT (optional). Both are justified. The config helper writes/reads a .env file inside the skill folder to persist the API key. This is expected, but the user should avoid committing that file to version control.
Persistence & Privilege
Skill is not always-enabled and does not modify other skills or global agent settings. The only persistent change the provided tooling makes is creating/updating a .env file in the skill directory when the user runs the config setup — a normal configuration behavior.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install glmocr
  3. After installation, invoke the skill by name or use /glmocr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.4
Version 1.0.4 of the GLM-OCR skill
  • Fixed metadata: removed the unused "bins: python" requirement.
  • No functional changes to the code or CLI usage.
  • All documented features, security restrictions, and usage guidelines remain unchanged.
v1.0.3
  • The required environment variable for authentication changed from GLM_OCR_API_KEY to ZHIPU_API_KEY.
  • References to GLM_OCR_API_KEY were updated to ZHIPU_API_KEY throughout the documentation, including in prerequisites and error handling.
  • No functional or CLI changes; documentation reflects the new API key requirement.
v1.0.2
  • Added support for the GLM_OCR_TIMEOUT environment variable for configurable request timeouts.
  • Clarified that the scripts do not install packages at runtime.
  • Updated security notes: requests use only the official endpoint and do not allow custom API URLs.
  • Minor clarifications in the documentation to reflect these changes.
v1.0.1
  • Added metadata section with requirements for environment variable and python binary.
  • Set GLM_OCR_API_KEY as primary environment variable.
  • Included an emoji ("📄") and a homepage link in metadata.
  • No functional or CLI changes. Documentation and behavior remain unchanged.
v1.0.0
Initial release of GLM-OCR text extraction skill.
  • Extracts text from images and PDFs using GLM-OCR API.
  • Supports table recognition (Markdown), formula extraction (LaTeX), and handwriting OCR.
  • Handles both local files and remote URLs for input.
  • Enforces strict usage of the GLM-OCR API only; provides direct error messages if API fails or misconfigured.
  • Setup guides and CLI usage examples included for easy integration.
Metadata
Slug glmocr
Version 1.0.4
License MIT-0
All-time Installs 3
Active Installs 3
Total Versions 5
Frequently Asked Questions

What is GLM-OCR?

GLM-OCR is an AI Agent Skill for Claude Code / OpenClaw that extracts text from images and PDFs using the GLM-OCR layout parsing API. It has 590 downloads so far.

How do I install GLM-OCR?

Run "/install glmocr" in the OpenClaw or Claude Code chat to install it in one step. Before first use, configure ZHIPU_API_KEY as described under Setup.

Is GLM-OCR free?

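The skill itself is free: it is licensed under MIT-0, so you can download, install, and use it at no cost. It calls the GLM-OCR API with your own ZHIPU_API_KEY, so requests count against that key's quota.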

Which platforms does GLM-OCR support?

GLM-OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GLM-OCR?

It is built and maintained by Jared Wen (@jaredforreal); the current version is v1.0.4.
