jaredforreal

GLM-OCR-Table

by Jared Wen · GitHub ↗ · v1.0.3 · MIT-0
cross-platform ✓ Security Clean
476 Downloads · 1 Star · 1 Active Install · 4 Versions
Install in OpenClaw
/install glmocr-table
Description
Official skill for recognizing and extracting tables from images and PDFs into Markdown format using ZhiPu GLM-OCR API. Supports complex tables, merged cells...
README (SKILL.md)

GLM-OCR Table Recognition Skill

Extract tables from images and PDFs and convert them to Markdown format using the ZhiPu GLM-OCR layout parsing API.

When to Use

  • Extract tables from images or scanned documents
  • Convert table images to Markdown or another editable format
  • Recognize complex tables with merged cells
  • Parse financial statements, invoices, and reports that contain tables
  • User mentions "extract table", "recognize table", "表格识别", "提取表格", "表格OCR", or "表格转文字"

Key Features

  • Complex table support: Handles merged cells, nested tables, multi-row headers
  • Markdown output: Tables are output in clean Markdown format, easy to edit and convert
  • Multi-page PDF: Supports batch extraction from multi-page PDF documents
  • Local file & URL: Supports both local files and remote URLs
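
Because the output is plain Markdown, converting it to another format takes little code. The following is a minimal sketch, not part of the skill, that works only on Markdown text the API has already returned. The function names `markdown_table_to_rows` and `rows_to_csv` are hypothetical, and the parser assumes simple pipe tables with no escaped pipes, no empty trailing cells, and no data rows made up solely of dashes.

```python
import csv
import io


def markdown_table_to_rows(markdown):
    """Split a simple Markdown pipe table into a list of cell lists."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore text outside the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


table = "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |"
print(rows_to_csv(markdown_table_to_rows(table)))
```

Tables with merged cells or nested tables may need more careful handling than this sketch provides.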

Resource Links

| Resource | Link |
|---|---|
| Get API Key | Zhipu Open Platform API Keys |
| API Docs | Layout Parsing |

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable. Reusing the same key across Zhipu skills is optional.

Get Key: Visit Zhipu Open Platform API Keys to create or copy your key.

Setup options (choose one):

  1. Global config (recommended): Set once in openclaw.json under env.vars; all Zhipu skills will share it:

    {
      "env": {
        "vars": {
          "ZHIPU_API_KEY": "your-api-key"
        }
      }
    }
    
  2. Skill-level config: Set for this skill only in openclaw.json:

    {
      "skills": {
        "entries": {
          "glmocr-table": {
            "env": {
              "ZHIPU_API_KEY": "your-api-key"
            }
          }
        }
      }
    }
    
  3. Shell environment variable: Add to ~/.zshrc:

    export ZHIPU_API_KEY="your-api-key"
    

💡 If you have already configured a key for other Zhipu skills (such as glmocr, glmv-caption, or glm-image-generation), they share the same ZHIPU_API_KEY, so no additional configuration is needed.

Security & Transparency

  • Environment variables used:
    • ZHIPU_API_KEY (required)
    • GLM_OCR_TIMEOUT (optional; request timeout in seconds)
  • Fixed official endpoint: https://open.bigmodel.cn/api/paas/v4/layout_parsing
  • No custom API URL override: this avoids accidental key exfiltration via redirected endpoints.
  • Raw upstream response is not returned by default: use --include-raw only when needed for debugging.
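
The configuration rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `scripts/glm_ocr_cli.py`: the helper name `load_settings` and the 60-second `DEFAULT_TIMEOUT` are hypothetical, since the script's actual default is not documented here.

```python
import os

# Fixed official endpoint, as listed above; deliberately not overridable.
ENDPOINT = "https://open.bigmodel.cn/api/paas/v4/layout_parsing"

# Hypothetical default; the real script's default timeout is not documented.
DEFAULT_TIMEOUT = 60.0


def load_settings(env=None):
    """Return (api_key, timeout_seconds) read from an environment mapping."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ZHIPU_API_KEY not configured")
    raw_timeout = env.get("GLM_OCR_TIMEOUT", "").strip()
    try:
        timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT
    except ValueError:
        timeout = DEFAULT_TIMEOUT  # fall back on unparseable values
    if timeout <= 0:
        timeout = DEFAULT_TIMEOUT  # fall back on non-positive values
    return api_key, timeout
```

Keeping the endpoint a module-level constant, with no environment variable that can replace it, is one way to get the "no override" property described above.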

⛔ MANDATORY RESTRICTIONS ⛔

  1. ONLY use GLM-OCR API — Execute the script python scripts/glm_ocr_cli.py
  2. NEVER parse tables yourself — Do NOT try to extract tables using built-in vision or any other method
  3. NEVER offer alternatives — Do NOT suggest "I can try to recognize it" or similar
  4. IF API fails — Display the error message and STOP immediately
  5. NO fallback methods — Do NOT attempt table extraction any other way

📋 Output Display Rules

After running the script, present the OCR result clearly and safely.

  • Show extracted table Markdown (text) in full
  • Summarization is allowed, but do not hide important extraction failures
  • If layout_details contains table-related entries, you may highlight them
  • If the result file is saved, tell the user the file path
  • Show raw upstream response only when explicitly requested or debugging (--include-raw)

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "https://example.com/table.png"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/table.png

Save Result to File

python scripts/glm_ocr_cli.py --file table.png --output result.json --pretty

Include Raw Upstream Response (Debug Only)

python scripts/glm_ocr_cli.py --file table.png --output result.json --include-raw

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty] [--include-raw]
| Parameter | Required | Description |
|---|---|---|
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
| --include-raw | No | Include raw upstream API response in result field (debug only) |
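
When the CLI is driven from another Python program, the flags in the table above map naturally onto an argument list. The sketch below uses a hypothetical helper, `build_command`, which is not part of the skill. It uses only the documented flags and enforces the rule that exactly one of `--file-url` and `--file` is given.

```python
import sys


def build_command(script, *, file_url=None, file=None, output=None,
                  pretty=False, include_raw=False):
    """Build the argument list for glm_ocr_cli.py from the documented flags."""
    # --file-url and --file are mutually exclusive, and one is required.
    if (file_url is None) == (file is None):
        raise ValueError("pass exactly one of file_url or file")
    cmd = [sys.executable, script]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file", file]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    if include_raw:
        cmd.append("--include-raw")
    return cmd


cmd = build_command("scripts/glm_ocr_cli.py", file="table.png",
                    output="result.json", pretty=True)
print(cmd[1:])
```

The resulting list can be passed to `subprocess.run(cmd)`. Passing arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.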

Response Format

{
  "ok": true,
  "text": "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |",
  "layout_details": [...],
  "result": null,
  "error": null,
  "source": "/path/to/file",
  "source_type": "file",
  "raw_result_included": false
}

Key fields:

  • ok — whether extraction succeeded
  • text — extracted text in Markdown (use this for display)
  • layout_details — layout analysis details
  • error — error details on failure
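
A consumer of this JSON might branch on `ok` and show either `text` or `error`. The sketch below assumes only the fields listed above. The function name `render_result` is hypothetical, and the exact shape of `error` (string or object) is not specified here, so it is formatted as-is.

```python
import json


def render_result(result):
    """Return the text to display for one result object."""
    if result.get("ok"):
        # On success, "text" holds the extracted Markdown.
        return result.get("text") or ""
    # On failure, surface the error rather than hiding it.
    return f"Extraction failed: {result.get('error')}"


# Sample payloads shaped like the response format above.
success = json.loads(
    r'{"ok": true, "text": "| A | B |\n|---|---|\n| 1 | 2 |", "error": null}'
)
failure = json.loads('{"ok": false, "text": null, "error": "File not found"}')

print(render_result(success))
print(render_result(failure))
```

Note that the `\n` sequences inside the JSON string decode to real line breaks, so the rendered table spans three lines.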

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
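
The cases above amount to a small lookup from failure type to next step. The following sketch is illustrative only: `guidance_for` is a hypothetical helper, and it assumes the caller already knows the HTTP status code and error message, since how the script reports them in `error` is not documented here.

```python
def guidance_for(status_code=None, message=""):
    """Map a failure to the next step described in the error-handling list."""
    if "ZHIPU_API_KEY not configured" in message:
        return "Show the exact error and guide the user to configure ZHIPU_API_KEY."
    if status_code in (401, 403):
        return "API key invalid or expired: reconfigure ZHIPU_API_KEY."
    if status_code == 429:
        return "Quota exhausted: ask the user to wait before retrying."
    if "not found" in message.lower():
        return "Local file missing: check the path."
    # Per the mandatory restrictions: no fallback, just report and stop.
    return "Display the error message and stop."


print(guidance_for(status_code=429))
```

Every branch ends with the user being told what happened; none of them retries with a different extraction method.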

Usage Guidance
This skill appears to do what it says: it uploads images/PDFs to ZhiPu's GLM-OCR layout_parsing API and returns Markdown table output. Before installing:

  1. Be aware that image/document contents are sent to a third-party API (open.bigmodel.cn). Do not send sensitive data unless you're comfortable with that service.
  2. The script requires Python and the 'requests' package, but the registry metadata didn't list those dependencies. Ensure your environment has Python and install requests (pip install requests).
  3. The script encodes local files as base64 data URIs, so large files become large JSON payloads. Consider file size limits and timeouts (set GLM_OCR_TIMEOUT if needed).
  4. The skill recommends reusing a ZHIPU_API_KEY across Zhipu skills. Treat your key like any API secret (rotate, limit scope if possible).
  5. If you need offline/local OCR or different endpoints, this skill intentionally forbids fallbacks; choose a different tool if you require that behavior.

Finally, verify the skill's source/homepage and only provide your API key if you trust that endpoint.
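
To make the payload-size point concrete, here is a minimal sketch of base64 data-URI encoding. It is not the skill's own code: `to_data_uri` and `encoded_size` are hypothetical names, and guessing the MIME type from the filename is an assumption about how such a script might work.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URI, guessing the MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def encoded_size(n_bytes: int) -> int:
    """Length of the base64 text for n_bytes of input (with padding)."""
    return 4 * ((n_bytes + 2) // 3)


print(to_data_uri(b"abc", "table.png"))
print(encoded_size(10_000_000))
```

Base64 emits four characters for every three input bytes, so the request body is roughly a third larger than the file on disk. That overhead is worth factoring in when choosing a GLM_OCR_TIMEOUT value for large PDFs.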
Capability Analysis
Type: OpenClaw Skill · Name: glmocr-table · Version: 1.0.3

The skill is a legitimate tool for table extraction using the ZhiPu GLM-OCR API. It uses a hardcoded official API endpoint (open.bigmodel.cn) in `scripts/glm_ocr_cli.py` to prevent credential exfiltration via URL redirection and follows standard practices for handling environment variables and file encoding.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The name/description (table OCR via ZhiPu GLM-OCR) match the actual behavior: the script posts images/PDFs to the official open.bigmodel.cn layout_parsing endpoint using ZHIPU_API_KEY. One minor inconsistency: the registry metadata lists no required binaries/dependencies, but the skill requires a Python runtime and the 'requests' package (the CLI exits with an error if requests isn't installed).
Instruction Scope
SKILL.md instructs the agent to run the provided Python CLI against an official fixed endpoint and explicitly prohibits local fallback table-parsing or sending data to other endpoints. The instructions reference only relevant env vars (ZHIPU_API_KEY, GLM_OCR_TIMEOUT) and the script's CLI parameters; they do not instruct reading unrelated files or credentials.
Install Mechanism
No install spec or external downloads are used; the skill is instruction + small script. The script depends on the 'requests' package but there is no install step declared—this is low risk but a missing dependency declaration (pip/requirements) should be addressed.
Credentials
Only ZHIPU_API_KEY (primary credential) and an optional GLM_OCR_TIMEOUT are requested; both are directly required for calling the upstream API. No unrelated secrets or config paths are requested.
Persistence & Privilege
The skill does not request persistent/always installation and does not modify other skills or system-wide settings. Autonomous invocation is allowed (platform default) and is not combined with other red flags.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install glmocr-table
  3. After installation, invoke the skill by name or use /glmocr-table
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.3
No file changes detected in this version. Code, documentation, behavior, and usage are unchanged from the previous version.
v1.0.2
  • Clarified that this is the official skill for table extraction using ZhiPu GLM-OCR.
  • Updated resource links, including a new API documentation URL.
  • Expanded API key setup instructions: added a global config option and clarified sharing with other Zhipu skills.
  • Updated skill emoji from 📄 to 📊 in the metadata.
  • Minor wording and formatting improvements for greater clarity and consistency.
v1.0.1
  • Clarified API key sharing as optional; language improved for flexibility.
  • Added a new "Security & Transparency" section specifying used environment variables, fixed API endpoint, and disallowing custom URL overrides.
  • Introduced the `--include-raw` CLI option for debugging and updated output rules to mention raw response is optional and only returned on request.
  • Updated output display rules to clarify summarization is allowed but important errors must be shown.
  • Minor rewording for accuracy and safety throughout documentation.
v1.0.0
  • Initial release of glmocr-table skill for extracting tables from images and PDFs to Markdown using the ZhiPu GLM-OCR API
  • Supports complex table structures, merged cells, multi-row headers, and multi-page PDFs
  • Markdown output for easy table editing and conversion
  • Works with both local files and remote URLs
  • Key setup via ZHIPU_API_KEY environment variable required
  • Strictly uses the GLM-OCR API; no fallback or alternative extraction methods allowed
  • Full raw OCR output, including Markdown tables, is displayed to users as returned by the script
Metadata
Slug glmocr-table
Version 1.0.3
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 4
Frequently Asked Questions

What is GLM-OCR-Table?

Official skill for recognizing and extracting tables from images and PDFs into Markdown format using ZhiPu GLM-OCR API. Supports complex tables, merged cells... It is an AI Agent Skill for Claude Code / OpenClaw, with 476 downloads so far.

How do I install GLM-OCR-Table?

Run "/install glmocr-table" in the OpenClaw or Claude Code chat to install it in one step. Before first use, configure ZHIPU_API_KEY and make sure Python and the 'requests' package are available.

Is GLM-OCR-Table free?

Yes, GLM-OCR-Table is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GLM-OCR-Table support?

GLM-OCR-Table is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GLM-OCR-Table?

It is built and maintained by Jared Wen (@jaredforreal); the current version is v1.0.3.
