Description

Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using ZhiPu GLM-OCR API. Supports complex equation...

Usage Instructions (SKILL.md)

GLM-OCR Formula Recognition Skill

Name: GLM-OCR-Formula
Author: jaredforreal

Recognize mathematical formulas from images and PDFs and convert them to LaTeX format using the ZhiPu GLM-OCR layout parsing API.

When to Use

Extract mathematical formulas from images or scanned documents
Convert formula images to LaTeX
Recognize complex equations, integrals, and matrices
Parse scientific papers, textbooks, and exam papers that contain formulas
The user mentions "formula OCR", "extract formula", "公式识别" (formula recognition), "公式OCR" (formula OCR), "提取公式" (extract formula), or "图片转LaTeX" (image to LaTeX)

Key Features

Complex formula support: Handles integrals, summations, matrices, fractions, radicals
LaTeX output: Formulas are output in LaTeX format, ready for use in documents
Inline & block formulas: Recognizes both inline and display-style formulas
Mixed content: Can handle documents with both text and formulas
Local file & URL: Supports both local files and remote URLs

Resource Links

Resource	Link
Get API Key	Zhipu Open Platform API Keys
API Docs	Layout Parsing

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable. The same key can be reused across other Zhipu skills, but this is optional.

Get Key: Visit the Zhipu Open Platform API Keys page to create or copy your key.

Setup options (choose one):

Global config (recommended): Set once in openclaw.json under env.vars; all Zhipu skills will share it:
```
{
  "env": {
    "vars": {
      "ZHIPU_API_KEY": "your-api-key"
    }
  }
}
```

Skill-level config: Set for this skill only in openclaw.json:

{
  "skills": {
    "entries": {
      "glmocr-formula": {
        "env": {
          "ZHIPU_API_KEY": "your-api-key"
        }
      }
    }
  }
}

Shell environment variable: Add to ~/.zshrc:
```
export ZHIPU_API_KEY="your-api-key"
```

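💡 If you have already configured a key for another Zhipu skill (such as glmocr, glmv-caption, or glm-image-generation), they all share the same ZHIPU_API_KEY, so there is no need to configure it again.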

Security & Transparency

Environment variables used:
- ZHIPU_API_KEY (required)
- GLM_OCR_TIMEOUT (optional; request timeout in seconds)
Fixed endpoint: https://open.bigmodel.cn/api/paas/v4/layout_parsing
No custom API URL override: this avoids accidental key exfiltration via redirected endpoints.
Raw upstream response is not returned by default: use --include-raw only when needed for debugging.
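
As a rough illustration of the configuration described above, the sketch below reads the two environment variables and pins the endpoint as a constant. It is not the code of scripts/glm_ocr_cli.py: the function name `load_settings` and the 60-second fallback are assumptions made for this example, since the skill does not document a default timeout.

```python
import os

# Fixed endpoint quoted in this section; intentionally not configurable.
ENDPOINT = "https://open.bigmodel.cn/api/paas/v4/layout_parsing"

# Assumed fallback for illustration only; the skill does not document a default.
DEFAULT_TIMEOUT = 60.0


def load_settings(env=None):
    """Return (api_key, timeout_seconds) from an environment-style mapping.

    Hypothetical helper, not part of the skill's script.
    """
    env = os.environ if env is None else env
    api_key = (env.get("ZHIPU_API_KEY") or "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY not configured")

    raw_timeout = env.get("GLM_OCR_TIMEOUT")
    try:
        timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT
    except ValueError:
        # Fall back instead of failing on a malformed optional value.
        timeout = DEFAULT_TIMEOUT
    if timeout <= 0:
        timeout = DEFAULT_TIMEOUT
    return api_key, timeout
```

Keeping the endpoint as a module-level constant, with no environment variable able to replace it, mirrors the "no custom API URL override" rule.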

⛔ MANDATORY RESTRICTIONS ⛔

ONLY use GLM-OCR API — Execute the script python scripts/glm_ocr_cli.py
NEVER parse formulas yourself — Do NOT try to extract formulas using built-in vision or any other method
NEVER offer alternatives — Do NOT suggest "I can try to read it" or similar
IF API fails — Display the error message and STOP immediately
NO fallback methods — Do NOT attempt formula extraction any other way

📋 Output Display Rules

After running the script, present the OCR result clearly and safely.

Show the extracted text and formulas (the text field) in full
Summarization is allowed, but do not hide important extraction failures
If layout_details contains formula-related entries, you may highlight them
If the result file is saved, tell the user the file path
Show raw upstream response only when explicitly requested or debugging (--include-raw)

⚠️ LaTeX Rendering Note:

The OCR API returns formulas in LaTeX format (e.g., $\frac{1}{2}$, $\theta^{x+1}$). Since most chat platforms do not render LaTeX, ask the user once, on first use:

"The OCR result contains LaTeX formulas. Should I convert them to a readable Unicode format, or keep the raw LaTeX?"

Remember the user's choice for the rest of the session. Do NOT ask again on subsequent calls unless the user explicitly changes their preference.

User chooses readable format → convert LaTeX to Unicode/plain-text:

LaTeX	Unicode / plain text
$\frac{a}{b}$	a/b
$x^{n}$	x^n
$x_{i}$	xᵢ
$\sqrt{x}$	√x
$\theta$	θ
$\phi$	φ
$\therefore$	∴
$\Rightarrow$	⇒
$\left\{ \begin{array}{l} ... \end{array} \right.$	⎧ line1 ⎨ line2 ⎩
$\textcircled{1}$	①
$\in$	∈
$\infty$	∞
$\ln$	ln
$\leq$ / $\geq$	≤ / ≥

User chooses raw LaTeX → display the original LaTeX output directly, and remind them the raw data is also saved in the output file if --output was used.
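
The conversion table above can be approximated with a few regular-expression passes. The following is a minimal sketch, not the skill's implementation: `latex_to_unicode` and its helpers are hypothetical names. It handles only the non-nested constructs listed in the table and leaves unknown commands and the `array` environment untouched. Wrapping multi-character arguments in parentheses is a choice made here to keep the output unambiguous.

```python
import re

# Symbol commands from the table above, keyed by name without the backslash.
SYMBOLS = {
    "theta": "θ", "phi": "φ", "therefore": "∴", "Rightarrow": "⇒",
    "in": "∈", "infty": "∞", "ln": "ln", "leq": "≤", "geq": "≥",
}
# Characters that have a dedicated Unicode subscript form.
SUB_FROM = "0123456789aeijnox"
SUB_TO = "₀₁₂₃₄₅₆₇₈₉ₐₑᵢⱼₙₒₓ"
SUBSCRIPTS = str.maketrans(SUB_FROM, SUB_TO)


def _wrap(arg):
    """Parenthesize multi-character arguments so output such as (x+1)/2 stays unambiguous."""
    return arg if len(arg) <= 1 else f"({arg})"


def _subscript(match):
    body = match.group(1)
    if body and all(ch in SUB_FROM for ch in body):
        return body.translate(SUBSCRIPTS)
    return "_" + _wrap(body)  # no Unicode form available: keep x_(...) notation


def latex_to_unicode(src):
    """Convert a small subset of LaTeX to Unicode/plain text (illustrative only)."""
    # Match whole command names so \in never clobbers the start of \infty.
    out = re.sub(r"\\([A-Za-z]+)",
                 lambda m: SYMBOLS.get(m.group(1), m.group(0)), src)
    out = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}",
                 lambda m: f"{_wrap(m.group(1))}/{_wrap(m.group(2))}", out)
    out = re.sub(r"\\sqrt\{([^{}]*)\}", lambda m: "√" + _wrap(m.group(1)), out)
    out = re.sub(r"\\textcircled\{([1-9])\}",
                 lambda m: chr(0x2460 + int(m.group(1)) - 1), out)
    out = re.sub(r"_\{([^{}]*)\}", _subscript, out)
    out = re.sub(r"\^\{([^{}]*)\}", lambda m: "^" + _wrap(m.group(1)), out)
    return out.replace("$", "")  # drop math delimiters
```

For example, `latex_to_unicode(r"$x_{i} \leq \sqrt{x}$")` yields `xᵢ ≤ √x`. A command that is not in the table, such as `\alpha`, passes through unchanged rather than being guessed at.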

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "https://example.com/formula.png"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/equation.png

Save Result to File

python scripts/glm_ocr_cli.py --file formula.png --output result.json --pretty

Include Raw Upstream Response (Debug Only)

python scripts/glm_ocr_cli.py --file formula.png --output result.json --include-raw

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty] [--include-raw]

Parameter	Required	Description
`--file-url`	One of `--file-url` / `--file`	URL of the image or PDF
`--file`	One of `--file-url` / `--file`	Local path to the image or PDF
`--output`, `-o`	No	Save result JSON to file
`--pretty`	No	Pretty-print JSON output
`--include-raw`	No	Include raw upstream API response in `result` field (debug only)
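
For callers that drive the CLI from Python, the flags in the table can be assembled programmatically. The sketch below is illustrative only: `build_command` and `run_ocr` are hypothetical helpers, and `run_ocr` assumes that the script prints its result JSON to standard output, which this document does not state explicitly.

```python
import json
import subprocess
import sys


def build_command(source, output=None, pretty=False, include_raw=False,
                  script="scripts/glm_ocr_cli.py"):
    """Build the argv list for glm_ocr_cli.py from the documented flags."""
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, script, "--file-url" if is_url else "--file", source]
    if output:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    if include_raw:
        cmd.append("--include-raw")  # debug only, per the table above
    return cmd


def run_ocr(source, **options):
    """Run the CLI and parse its output.

    Assumption: the script writes the result JSON to stdout.
    """
    proc = subprocess.run(build_command(source, **options),
                          capture_output=True, text=True)
    return json.loads(proc.stdout)
```

Passing the arguments as a list, rather than a shell string, avoids quoting problems with paths or URLs that contain spaces or special characters.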

Response Format

{
  "ok": true,
  "text": "Extracted formulas and text in Markdown/LaTeX...",
  "layout_details": [...],
  "result": null,
  "error": null,
  "source": "/path/to/file",
  "source_type": "file",
  "raw_result_included": false
}

Key fields:

ok: whether extraction succeeded
text: extracted text in Markdown, with formulas in LaTeX
layout_details: layout analysis details
result: raw upstream API response; null unless --include-raw is passed
error: error details on failure; null on success
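
A consumer of this response could branch on `ok` before showing anything to the user. The following sketch assumes the response has already been parsed into a dictionary with the fields shown above. `render_result` is a hypothetical helper, not part of the skill.

```python
def render_result(payload, saved_to=None):
    """Turn a parsed response dict into the text shown to the user."""
    if not payload.get("ok"):
        # Surface the failure instead of hiding it, per the display rules.
        return f"OCR failed: {payload.get('error') or 'unknown error'}"
    lines = [payload.get("text") or ""]
    if saved_to:
        lines.append(f"Result saved to: {saved_to}")
    return "\n".join(lines)
```

For example, `render_result({"ok": False, "error": "File not found"})` returns `OCR failed: File not found`.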

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
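
The cases above amount to a small decision table. As a sketch only, with the hypothetical function name `advise` and message wording chosen for this example rather than taken from the script, the mapping could look like this:

```python
def advise(status_code=None, key_configured=True, file_found=True):
    """Map a failure condition to the action listed in this section."""
    if not key_configured:
        return "Show the exact error and guide the user to configure ZHIPU_API_KEY"
    if not file_found:
        return "Local file missing: ask the user to check the path"
    if status_code in (401, 403):
        return "API key invalid or expired: ask the user to reconfigure it"
    if status_code == 429:
        return "Quota exhausted: tell the user to wait before retrying"
    # Anything else: no fallback extraction, per the mandatory restrictions.
    return "Display the error message and stop"
```

The final branch never falls back to another extraction method, which matches the restriction that the agent must stop when the API fails.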

Security Recommendations

This skill appears to do what it says: it calls ZhiPu's GLM‑OCR service using the ZHIPU_API_KEY you provide. Before installing, consider: (1) only provide a Zhipu API key you trust to use with this skill (or a dedicated key for isolation); (2) the script will read any local file you give it and include its contents (base64) in the API request — do not pass sensitive local files (private keys, secrets, personal documents) you don't want sent to the OCR service; (3) the SKILL.md forbids local fallback parsing — if the API is down the skill will stop per instructions; (4) ensure the execution environment has Python and the 'requests' package available or install it yourself. If you want tighter control, create a Zhipu API key with limited scope/quota or only run the script on files you explicitly approve.
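
Because the contents of a local file are base64-encoded into the request, a caller may want a guard in front of that step. The sketch below is one possible approach, not the script's behavior: the allowlist of suffixes and the name `encode_if_allowed` are assumptions made for illustration.

```python
import base64
from pathlib import Path

# Assumed allowlist for illustration; the skill does not publish one.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def encode_if_allowed(path):
    """Base64-encode a file only if it looks like an image or PDF."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Refusing to upload {p.name}: not an image or PDF")
    return base64.b64encode(p.read_bytes()).decode("ascii")
```

A suffix check will not catch a sensitive document saved as a PDF, so it complements, rather than replaces, approving files explicitly before running the script.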

Functional Analysis

Type: OpenClaw Skill
Name: glmocr-formula
Version: 1.0.4

The skill is a legitimate tool for extracting mathematical formulas from images and PDFs using the ZhiPu GLM-OCR API. The Python script (scripts/glm_ocr_cli.py) implements secure practices by hardcoding the official API endpoint (open.bigmodel.cn) to prevent credential exfiltration and uses environment variables for API key management. The instructions in SKILL.md are well-defined, providing clear usage constraints and user-interaction guidelines without any signs of malicious prompt injection or unauthorized data access.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

The skill is explicitly an OCR→LaTeX wrapper around ZhiPu's GLM‑OCR API. The declared env vars (ZHIPU_API_KEY, GLM_OCR_TIMEOUT) and the primaryEnv match that purpose. No unrelated binaries, config paths, or extra credentials are requested.

✓ Instruction Scope

SKILL.md instructs the agent to run the included Python CLI and to only use the official GLM‑OCR API. It requires reading user-supplied local files (encoded as base64) or URLs — appropriate for OCR. The doc's strict 'no fallback' and 'only use API' rules are unusual but coherent with the author's intent and the code. The skill does not instruct reading other system files or environment variables beyond those declared.

✓ Install Mechanism

There is no install spec (instruction-only plus an included script). The bundled script uses the widely used 'requests' package and exits with an informative message if it's missing. No downloads from untrusted URLs or archive extraction are present.

✓ Credentials

Only ZHIPU_API_KEY (primary) and an optional GLM_OCR_TIMEOUT are required — both justified. The script sends the key to the documented official endpoint (https://open.bigmodel.cn/api/paas/v4/layout_parsing). No unrelated secrets or multiple service credentials are requested.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated or persistent system privileges. It does not modify other skills or system-wide configs. Autonomous invocation is allowed, but this is the platform default and is not combined with other red flags.

Version History

v1.0.4

- No user-visible changes; this version contains no updates to code or documentation.
- All features, behavior, and documentation remain unchanged from the previous version.

v1.0.3

glm-ocr-formula v1.0.3 changelog
- Clarified this is the "official" formula extraction skill in the description.
- Updated supported emoji from "📄" to "📐" for better formula context.
- Improved and reorganized API key setup instructions, emphasizing global and skill-level config options.
- Added a dedicated "Security & Transparency" section documenting the use of fixed official API endpoints and the prevention of custom overrides for safety.
- Added optional CLI flag `--include-raw` to include raw upstream API responses for debugging.
- Restated output display rules and LaTeX rendering prompt for clarity and safer downstream usage.
- Updated API docs/resource URLs for improved accuracy and access.

v1.0.2

Version 1.0.2 of glm-ocr-formula includes updated documentation and usage guidelines:
- Clarified and tightened mandatory output requirements: always display full extracted formulas and text, not summaries.
- Made restrictions more prominent and concise, emphasizing exclusive use of the GLM-OCR API, no fallbacks, and strict error handling.
- Improved instructions for LaTeX rendering choice (Unicode vs. raw LaTeX) and clarified user session behavior.
- Updated API key setup section for clarity and consistency.
- Removed obsolete options and references (such as `--include-raw`).
- Revised CLI parameter and response format documentation for accuracy and simplicity.

v1.0.1

Minor update: improves documentation, security notes, and output display rules.
- Clarifies API key usage and environment variable configuration.
- Adds a dedicated section on security and endpoint restrictions.
- Specifies raw upstream response handling and CLI option (--include-raw).
- Refines output display guidance for LaTeX and Unicode conversion.
- Improves response format documentation and error handling descriptions.

v1.0.0

Initial release: recognize and extract mathematical formulas from images or PDFs into LaTeX format using the ZhiPu GLM-OCR API.
- Supports complex formulas, including integrals, matrices, fractions, and both inline/block styles.
- Outputs formulas in LaTeX and can optionally convert results to Unicode/plain text on user request.
- Handles files from both local paths and URLs.
- Requires a ZHIPU_API_KEY environment variable for API access.
- Full OCR results are always displayed for user evaluation, with clear error messages on failure.

Metadata

Slug glmocr-formula

Version 1.0.4

License MIT-0

Total installs 1

Current installs 1

Historical versions 5

FAQ

What is GLM-OCR-Formula?

Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using ZhiPu GLM-OCR API. Supports complex equation... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 422 cumulative downloads to date.

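How do I install GLM-OCR-Formula?

Run the command "/install glmocr-formula" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but the skill requires ZHIPU_API_KEY to be set before use.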

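Is GLM-OCR-Formula free?

Yes. GLM-OCR-Formula is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does GLM-OCR-Formula support?

GLM-OCR-Formula is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed GLM-OCR-Formula?

It is developed and maintained by Jared Wen (@jaredforreal); the current version is v1.0.4.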
