
GLM-OCR

Name: GLM-OCR
Author: jaredforreal

Author: Jared Wen · GitHub · v1.0.4 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 590

Install in OpenClaw

/install glmocr

Description

Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti...

Usage instructions (SKILL.md)

GLM-OCR Text Extraction Skill

Extract text from images and PDFs using the GLM-OCR layout parsing API.

When to Use

Extract text from images (PNG, JPG) and PDFs
Convert screenshots to text
Process scanned documents
OCR photos containing text (including handwritten text)
Recognize tables and formulas in documents
User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)

Key Features

Table recognition: Detects and converts tables to Markdown format
Formula extraction: LaTeX format output
Handwriting support: Strong recognition for handwritten text
Local file & URL: Supports both local files and remote URLs

Resource Links

Resource	Link
Get API Key	https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
GitHub	https://github.com/zai-org/GLM-OCR

Prerequisites

ZHIPU_API_KEY configured (see Setup below)

Security Notes

No runtime package installation is performed by the scripts.
OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
Only ZHIPU_API_KEY (and optional timeout) is read from environment variables.
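A minimal sketch of reading only these two settings from the environment. It assumes the timeout variable is GLM_OCR_TIMEOUT (named in the version history) holding a number of seconds; the `load_settings` helper and the 60-second default are hypothetical and are not taken from glm_ocr_cli.py.

```python
import math
import os
from typing import Mapping, Optional, Tuple

# Hypothetical default (seconds); the real script's default is not documented here.
DEFAULT_TIMEOUT_S = 60.0


def load_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, float]:
    """Return (api_key, timeout_seconds), reading only environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "ZHIPU_API_KEY not configured. Get your API key at: "
            "https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys"
        )
    raw = env.get("GLM_OCR_TIMEOUT", "").strip()
    try:
        timeout = float(raw) if raw else DEFAULT_TIMEOUT_S
    except ValueError:
        timeout = DEFAULT_TIMEOUT_S  # malformed value: fall back instead of crashing
    if not (math.isfinite(timeout) and timeout > 0):
        timeout = DEFAULT_TIMEOUT_S  # reject zero, negative, inf, and NaN
    return api_key, timeout
```

Passing a plain dict instead of os.environ keeps the function easy to test without touching the real environment.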

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
NEVER parse documents directly - Do NOT try to extract text yourself
NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt text extraction any other way

Setup

Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

Configure:

python scripts/config_setup.py setup --api-key YOUR_KEY
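The assessment on this page says config_setup.py persists the key to a .env file in the skill folder. Its actual code is not shown here, so the following is only a hypothetical sketch of that kind of KEY=VALUE upsert; `upsert_env_var` and `read_env_var` are illustrative names.

```python
from pathlib import Path
from typing import Optional


def _key_of(line: str) -> Optional[str]:
    """Return the variable name on a KEY=VALUE line, or None for comments/blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    return stripped.partition("=")[0].strip()


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set key=value in a .env file, replacing the first existing entry for key."""
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    for i, line in enumerate(lines):
        if _key_of(line) == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")  # key not present yet
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_env_var(env_path: Path, key: str) -> Optional[str]:
    """Return the value for key from a .env file, or None if it is absent."""
    if not env_path.exists():
        return None
    for line in env_path.read_text(encoding="utf-8").splitlines():
        if _key_of(line) == key:
            value = line.partition("=")[2].strip()
            return value.strip('"').strip("'")
    return None
```

Because the file holds a secret, a real helper might also restrict its permissions (for example os.chmod(path, 0o600) on POSIX systems).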

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "URL provided by user"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/image.jpg

Save result to file (recommended)

python scripts/glm_ocr_cli.py --file-url "URL" --output result.json

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]

Parameter	Required	Description
`--file-url`	Exactly one of `--file-url` / `--file`	URL to image/PDF
`--file`	Exactly one of `--file-url` / `--file`	Local file path to image/PDF
`--output`, `-o`	No	Save result JSON to file
`--pretty`	No	Pretty-print JSON output
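To call the CLI from another Python program, a small helper can assemble the argument list and enforce that exactly one of `--file-url` and `--file` is given, as the usage line requires. The flags are the documented ones; `build_ocr_command` and its `base_dir` parameter (standing in for `{baseDir}`) are hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_ocr_command(
    base_dir: str,
    file_url: Optional[str] = None,
    file_path: Optional[str] = None,
    output: Optional[str] = None,
    pretty: bool = False,
) -> List[str]:
    """Assemble argv for scripts/glm_ocr_cli.py; exactly one input source is allowed."""
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    script = str(Path(base_dir) / "scripts" / "glm_ocr_cli.py")
    cmd = [sys.executable, script]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file", str(file_path)]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The list can be passed to subprocess.run(cmd, capture_output=True, text=True). Passing a list instead of a shell string avoids quoting problems with user-supplied URLs and paths.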

Response Format

{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}

Key fields:

ok — whether extraction succeeded
text — extracted text in Markdown (use this for display)
layout_details — layout analysis details
result — raw API response
error — error details on failure
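In line with the mandatory restrictions (show the error and stop, no fallback), a consumer of this JSON could return `text` on success and raise otherwise. This sketch relies only on the documented `ok`, `text`, and `error` fields; the exact structure of `error` is not specified on this page, so it is stringified as-is. `extract_text` and `OcrError` are illustrative names.

```python
import json
from typing import Any, Dict


class OcrError(RuntimeError):
    """Raised when the CLI result reports ok == false."""


def extract_text(raw_json: str) -> str:
    """Return the Markdown text from a glm_ocr_cli.py result, or raise OcrError."""
    payload: Dict[str, Any] = json.loads(raw_json)
    if not payload.get("ok"):
        # Surface the error verbatim; no fallback extraction is attempted.
        raise OcrError(str(payload.get("error") or "unknown error"))
    return payload.get("text") or ""
```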

Error Handling

API key not configured:

Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
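The cases above can be kept in a small lookup so the agent shows a consistent hint next to the raw error, plus a local check that catches a missing file before any upload. Only the status codes and hints come from this page; how the CLI exposes an HTTP status (if at all) is not documented, so `hint_for_status` and `preflight_local_file` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Hints from the Error Handling section, keyed by HTTP status code.
STATUS_HINTS = {
    401: "API key invalid/expired: reconfigure ZHIPU_API_KEY",
    403: "API key invalid/expired: reconfigure ZHIPU_API_KEY",
    429: "Quota exhausted: wait before retrying",
}


def hint_for_status(status: int) -> str:
    """Map an HTTP status code to user-facing guidance."""
    return STATUS_HINTS.get(status, "Show the exact error message to the user and stop")


def preflight_local_file(path: str) -> Optional[str]:
    """Return an error string if a local input file is missing, else None."""
    if not Path(path).is_file():
        return f"File not found: {path} (check the path)"
    return None
```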

Reference

references/output_schema.md — detailed output format specification

Security recommendations

This skill appears to do what it says: it uploads the image/PDF you provide (URL or local file, base64-encoded) to the official GLM-OCR endpoint (open.bigmodel.cn) using your ZHIPU_API_KEY. Before installing or using it:
(1) Do not pass sensitive documents you don't want uploaded to the remote service.
(2) Keep the .env file containing your API key out of version control (config_setup.py even reminds you to add .env to .gitignore).
(3) Confirm you trust the GLM provider (bigmodel.cn) with any data you send, and rotate the API key if it may have been exposed.
(4) If you want extra assurance, inspect the full, untruncated copy of glm_ocr_cli.py in your environment to confirm there are no additional network endpoints or unexpected behaviors beyond the visible code.
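Recommendation (2) can be checked mechanically. A minimal sketch, assuming the .env sits in the same directory as the .gitignore; `env_is_gitignored` is a hypothetical helper that only recognizes a plain `.env` or `/.env` line and does not implement full gitignore semantics (git check-ignore is the reliable tool for that).

```python
from pathlib import Path


def env_is_gitignored(directory: str) -> bool:
    """Return True if <directory>/.gitignore lists .env as a plain pattern."""
    gitignore = Path(directory) / ".gitignore"
    if not gitignore.is_file():
        return False
    for line in gitignore.read_text(encoding="utf-8").splitlines():
        # Only exact patterns are recognized; globs and negations are not evaluated.
        if line.strip() in (".env", "/.env"):
            return True
    return False
```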

Analysis

Type: OpenClaw Skill · Name: glmocr · Version: 1.0.4

The skill is a well-implemented wrapper for the GLM-OCR API. It includes security-conscious features such as hardcoding the official API endpoint (https://open.bigmodel.cn) in `scripts/glm_ocr_cli.py` to prevent API key exfiltration via custom URLs and providing a configuration script (`scripts/config_setup.py`) that warns users to keep their `.env` files out of version control. The instructions in `SKILL.md` are focused on the stated task and do not contain any malicious prompt injection or redirection.

Capability tags

requires-sensitive-credentials

Capability assessment

✓ Purpose & Capability

Name/description (OCR, table/formula/handwriting extraction) align with the code and declared requirements. The skill only requires the GLM provider API key (ZHIPU_API_KEY) and an optional timeout, and the code posts to the official GLM endpoint (open.bigmodel.cn). Required files (Python scripts and a small requirements.txt) are proportionate to the task.

✓ Instruction Scope

SKILL.md instructs the agent to run the provided CLI script(s) and to use only the official API. The code reads a .env in the skill directory (if present) and will base64-encode and upload either a user-supplied URL or a local file to the GLM endpoint — which is expected for an OCR skill. Note: because the skill uploads whichever file you point it at, do not supply paths to sensitive local files if you do not want them transmitted to the GLM service.
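The base64 step mentioned here amounts to reading the file's raw bytes and encoding them as ASCII text. This sketch shows only that step: the request body the real script sends to open.bigmodel.cn is not reproduced on this page, so no payload field names are assumed, and `encode_file` plus the MIME guess are illustrative.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Tuple


def encode_file(path: str) -> Tuple[str, str]:
    """Return (mime_type, base64_text) for a local image or PDF."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)  # guessed from the file extension only
    data = p.read_bytes()
    return mime or "application/octet-stream", base64.b64encode(data).decode("ascii")
```

Base64 turns every 3 input bytes into 4 output characters, so an encoded upload is roughly a third larger than the file itself, which matters for large PDFs.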

✓ Install Mechanism

No automatic install spec; the package is instruction-only with included scripts. A requirements.txt lists only 'requests>=2.31.0'. The CLI prints a clear error if 'requests' is missing and instructs how to install it; no remote arbitrary binaries or archive downloads are present.

✓ Credentials

Only ZHIPU_API_KEY (primary credential) is required, plus the optional GLM_OCR_TIMEOUT; both are justified. The config helper writes/reads a .env file inside the skill folder to persist the API key. This is expected, but the user should avoid committing that file to version control.

✓ Persistence & Privilege

Skill is not always-enabled and does not modify other skills or global agent settings. The only persistent change the provided tooling makes is creating/updating a .env file in the skill directory when the user runs the config setup — a normal configuration behavior.

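How to use in OpenClaw

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install glmocr
After installation, invoke the skill by name or trigger it with /glmocr.
Provide the required inputs as described in the skill's parameter documentation to get structured output.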

Version history

v1.0.4

Version 1.0.4 of the GLM-OCR skill:
- Fixed metadata: removed the unused "bins: python" requirement.
- No functional changes to the code or CLI usage.
- All documented features, security restrictions, and usage guidelines remain unchanged.

v1.0.3

- The required environment variable for authentication changed from GLM_OCR_API_KEY to ZHIPU_API_KEY.
- References to GLM_OCR_API_KEY were updated to ZHIPU_API_KEY throughout the documentation, including in prerequisites and error handling.
- No functional or CLI changes; documentation reflects the new API key requirement.

v1.0.2

- Added support for the GLM_OCR_TIMEOUT environment variable for configurable request timeouts.
- Clarified that the scripts do not install packages at runtime.
- Updated security notes: requests use only the official endpoint and do not allow custom API URLs.
- Minor clarifications in the documentation to reflect these changes.

v1.0.1

- Added a metadata section with requirements for the environment variable and the python binary.
- Set GLM_OCR_API_KEY as the primary environment variable.
- Included an emoji ("📄") and a homepage link in the metadata.
- No functional or CLI changes. Documentation and behavior remain unchanged.

v1.0.0

Initial release of the GLM-OCR text extraction skill.
- Extracts text from images and PDFs using the GLM-OCR API.
- Supports table recognition (Markdown), formula extraction (LaTeX), and handwriting OCR.
- Handles both local files and remote URLs as input.
- Enforces strict use of the GLM-OCR API only; shows direct error messages if the API fails or is misconfigured.
- Includes setup guides and CLI usage examples for easy integration.

Metadata

Slug: glmocr

Version: 1.0.4

License: MIT-0

Total installs: 3

Current installs: 3

Versions: 5

FAQ

What is GLM-OCR?

Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recogniti... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 590 times so far.

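How do I install GLM-OCR?

Run the command /install glmocr in the OpenClaw or Claude Code chat box to install it in one step. Before the first OCR request you still need to configure ZHIPU_API_KEY (see Setup).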

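Is GLM-OCR free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. OCR requests are made with your own ZHIPU_API_KEY and are subject to the provider's quota (see the rate-limit note under Error Handling).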

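Which platforms does GLM-OCR support?

GLM-OCR is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed GLM-OCR?

Developed and maintained by Jared Wen (@jaredforreal); the current version is v1.0.4.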