qingzhe2020

ifly-pdf-image-ocr

作者 Iflytek AIcloud · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 224 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install ifly-pdf-image-ocr
Description
ifly-pdf&image-ocr skill supporting both image OCR (AI-powered LLM OCR) and PDF document recognition. Use when user asks to OCR images, extract text from ima...
Usage (SKILL.md)

ifly-pdf&image-ocr

AI-powered OCR service for images and PDF documents using iFlytek's advanced recognition APIs.

Quick Start

Image OCR (LLM OCR)

# OCR an image and extract text
python3 scripts/image_ocr.py /path/to/image.jpg

# Save result to file
python3 scripts/image_ocr.py /path/to/image.jpg -o output.txt

# Specify output format
python3 scripts/image_ocr.py /path/to/image.jpg --format json
python3 scripts/image_ocr.py /path/to/image.jpg --format markdown

PDF OCR

# Convert PDF to Word (default)
python3 scripts/pdf_ocr.py document.pdf

# Convert PDF to Markdown
python3 scripts/pdf_ocr.py document.pdf --format markdown

# Convert PDF to JSON
python3 scripts/pdf_ocr.py document.pdf --format json

# From public URL
python3 scripts/pdf_ocr.py --pdf-url "https://example.com/doc.pdf" --format word

Setup

API Credentials

Get credentials from iFlytek Open Platform:

For Image OCR:

  • APP_ID: Application ID
  • API_KEY: API key for authentication
  • API_SECRET: API secret for signing requests

For PDF OCR:

  • APP_ID: Application ID
  • API_SECRET: Application secret (for signature generation)

Environment Variables

# Required for both Image OCR and PDF OCR
export IFLY_APP_ID="your_app_id"

# Required for Image OCR only
export IFLY_API_KEY="your_api_key"

# Required for both Image OCR (request signing) and PDF OCR (signature generation)
export IFLY_API_SECRET="your_api_secret"
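
Before running either script, it can help to confirm that the right variables are set for the mode you intend to use. The following is a minimal sketch, not part of the skill: the helper name `missing_credentials` and the `REQUIRED_VARS` mapping are hypothetical, and the per-mode grouping is taken from the API Credentials lists above.

```python
import os

# Variable names come from the Environment Variables section above.
# The per-mode grouping follows the API Credentials lists (an assumption
# about what each script reads, not taken from the scripts themselves).
REQUIRED_VARS = {
    "image": ("IFLY_APP_ID", "IFLY_API_KEY", "IFLY_API_SECRET"),
    "pdf": ("IFLY_APP_ID", "IFLY_API_SECRET"),
}


def missing_credentials(mode, environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED_VARS:
        missing = missing_credentials(mode)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{mode}: {status}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.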

Features

Image OCR (LLM OCR)

  • AI-powered: Advanced LLM-based OCR for high accuracy
  • Multi-format output: JSON, Markdown, or both
  • Layout understanding: Preserves document structure
  • Multi-language: Supports text extraction in multiple languages
  • Image preprocessing: Automatic rotation correction, noise removal

PDF OCR

  • AI-powered OCR: Advanced AI model for accurate text extraction
  • Multiple output formats:
    • Word (.docx) - Editable Word document
    • Markdown - Plain text with formatting
    • JSON - Structured data
  • Large PDF support: Up to 100 pages per document
  • Page-by-page results: Access individual page results
  • Download URLs: Direct links to processed files

API Parameters

Image OCR Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| image_path | string | Yes | Path to image file |
| --format | string | No | Output format: json, markdown, or json,markdown (default: json,markdown) |
| --output | string | No | Save result to file |

PDF OCR Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| pdf_path | string | Yes* | Path to PDF file |
| --pdf-url | string | No* | Public URL of PDF file |
| --format | string | No | Output format: word, markdown, json (default: word) |
| --no-poll | flag | No | Return task ID without polling |
| --poll-interval | int | No | Polling interval in seconds (min 5, default: 5) |
| --max-wait | int | No | Maximum wait time in seconds (default: 300) |

*Either pdf_path or --pdf-url must be provided
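
The PDF OCR parameters and their constraints can be expressed as a small `argparse` parser. This is an illustrative sketch only: `build_pdf_parser` and `parse_pdf_args` are hypothetical names, the real `pdf_ocr.py` may be structured differently, and rejecting the case where both `pdf_path` and `--pdf-url` are given is an assumption (the footnote only says one of them must be provided).

```python
import argparse


def build_pdf_parser():
    """Hypothetical parser mirroring the PDF OCR parameter table."""
    parser = argparse.ArgumentParser(prog="pdf_ocr.py")
    parser.add_argument("pdf_path", nargs="?", help="Path to PDF file")
    parser.add_argument("--pdf-url", help="Public URL of PDF file")
    parser.add_argument("--format", choices=["word", "markdown", "json"],
                        default="word", help="Output format")
    parser.add_argument("--no-poll", action="store_true",
                        help="Return task ID without polling")
    parser.add_argument("--poll-interval", type=int, default=5,
                        help="Polling interval in seconds (min 5)")
    parser.add_argument("--max-wait", type=int, default=300,
                        help="Maximum wait time in seconds")
    return parser


def parse_pdf_args(argv):
    """Parse argv and enforce the constraints stated in the table."""
    parser = build_pdf_parser()
    args = parser.parse_args(argv)
    # Assumption: exactly one input source; the table only requires at least one.
    if bool(args.pdf_path) == bool(args.pdf_url):
        parser.error("provide either pdf_path or --pdf-url")
    if args.poll_interval < 5:
        parser.error("--poll-interval must be at least 5 seconds")
    return args
```

For example, `parse_pdf_args(["document.pdf", "--format", "markdown"])` yields a namespace with `format == "markdown"`, while an empty argument list exits with a usage error.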

Authentication

Image OCR (HMAC-SHA256)

Uses HMAC-SHA256 signature authentication:

  1. Generate RFC1123 format date: EEE, dd MMM yyyy HH:mm:ss GMT
  2. Create signature origin (three lines separated by \n): host: {host}\ndate: {date}\nPOST {path} HTTP/1.1
  3. Calculate signature: HMAC-SHA256 over signature_origin, keyed with apiSecret
  4. Build authorization: hmac username="{apiKey}", algorithm="hmac-sha256", headers="host date request-line", signature="{signature}"
  5. Encode authorization in base64
  6. Send as query parameters: ?authorization={auth}&host={host}&date={date}
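
The six steps above can be sketched in Python with only the standard library. Treat this as a hedged illustration rather than the skill's actual code: `build_auth_url` is a hypothetical helper, the host and path in the usage note are placeholders, and Base64-encoding the raw HMAC digest before embedding it in the authorization string is an assumption that the steps do not spell out.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode


def build_auth_url(host, path, api_key, api_secret, date=None):
    """Return an https URL carrying the signed query parameters."""
    # Step 1: RFC 1123 date in GMT, e.g. "Mon, 01 Jan 2024 00:00:00 GMT".
    date = date or formatdate(usegmt=True)
    # Step 2: signature origin, three lines joined by "\n".
    origin = f"host: {host}\ndate: {date}\nPOST {path} HTTP/1.1"
    # Step 3: HMAC-SHA256 over the origin, keyed with the API secret.
    digest = hmac.new(api_secret.encode(), origin.encode(), hashlib.sha256).digest()
    # Assumption: the raw digest is Base64-encoded before being embedded.
    signature = base64.b64encode(digest).decode()
    # Step 4: authorization string in the documented layout.
    authorization = (
        f'hmac username="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    # Step 5: Base64-encode the whole authorization string.
    auth_b64 = base64.b64encode(authorization.encode()).decode()
    # Step 6: pass everything as URL-encoded query parameters.
    query = urlencode({"authorization": auth_b64, "host": host, "date": date})
    return f"https://{host}{path}?{query}"
```

A call such as `build_auth_url("ocr.example.com", "/v1/ocr", api_key, api_secret)` (placeholder host and path) returns the URL to POST the request body to. Passing an explicit `date` makes the output reproducible for testing.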

PDF OCR (MD5 + HMAC-SHA1)

Uses MD5 + HMAC-SHA1 signature authentication:

  1. Generate timestamp (Unix epoch in seconds)
  2. Calculate auth = MD5(appId + timestamp)
  3. Calculate signature = Base64(HMAC-SHA1(auth, apiSecret))
  4. Send headers:
    • appId: Application ID
    • timestamp: Timestamp in seconds
    • signature: Generated signature

Important: Timestamp must be within 5 minutes of server time.
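
As a rough sketch of the PDF OCR signing steps, the headers could be built as follows. `build_pdf_headers` is a hypothetical helper, and two details are assumptions not stated above: the MD5 result is taken as a lowercase hex string, and all strings are encoded as UTF-8.

```python
import base64
import hashlib
import hmac
import time


def build_pdf_headers(app_id, api_secret, timestamp=None):
    """Return the appId, timestamp, and signature headers."""
    # Step 1: Unix epoch in seconds, sent as a string.
    ts = str(int(time.time()) if timestamp is None else int(timestamp))
    # Step 2: auth = MD5(appId + timestamp).
    # Assumption: the lowercase hex digest is used.
    auth = hashlib.md5((app_id + ts).encode("utf-8")).hexdigest()
    # Step 3: signature = Base64(HMAC-SHA1(auth, apiSecret)), keyed with the secret.
    digest = hmac.new(api_secret.encode("utf-8"), auth.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # Step 4: the three headers named in the list above.
    return {"appId": app_id, "timestamp": ts, "signature": signature}
```

Because the server rejects timestamps more than 5 minutes off, the headers are best generated immediately before each request instead of being cached.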

Response Format

Image OCR Response

{
  "header": {
    "code": 0,
    "message": "success"
  },
  "payload": {
    "result": {
      "text": "Base64-encoded OCR text..."
    }
  }
}
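
Since `payload.result.text` arrives Base64-encoded, a caller has to decode it before use. The sketch below shows one way to do that; `extract_ocr_text` is a hypothetical helper, and decoding the bytes as UTF-8 is an assumption.

```python
import base64
import json


def extract_ocr_text(response_body):
    """Decode the OCR text from an Image OCR JSON response string."""
    data = json.loads(response_body)
    header = data.get("header", {})
    # A non-zero header code signals an error (see Error Codes).
    if header.get("code") != 0:
        raise RuntimeError(
            f"OCR failed: code={header.get('code')} message={header.get('message')}"
        )
    encoded = data["payload"]["result"]["text"]
    # Assumption: the decoded bytes are UTF-8 text.
    return base64.b64decode(encoded).decode("utf-8")
```

The decoded string is whatever the service produced for the requested format, so it may itself be JSON or Markdown that needs further parsing.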

PDF OCR Start Response

{
  "flag": true,
  "code": 0,
  "desc": "成功",
  "data": {
    "taskNo": "25082744936879",
    "status": "CREATE",
    "tip": "任务创建成功"
  }
}

PDF OCR Status Response

{
  "flag": true,
  "code": 0,
  "desc": "成功",
  "data": {
    "taskNo": "25082759289333",
    "exportFormat": "word",
    "status": "FINISH",
    "downUrl": "http://bjcdn.openstorage.cn/...",
    "tip": "已完成",
    "pageList": [...]
  }
}

Task Status (PDF OCR)

| Status | Description |
|---|---|
| CREATE | Task created successfully |
| WAITING | Waiting in queue |
| DOING | Processing |
| FINISH | Completed |
| FAILED | Failed |
| ANY_FAILED | Partially completed (some pages failed) |
| STOP | Paused |
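
A polling loop over these statuses might look like the sketch below. It is an illustration under stated assumptions, not the script's implementation: `wait_for_task` and its `fetch_status` callable (which should return the `data` object of a status response) are hypothetical, and treating FINISH, FAILED, and ANY_FAILED as terminal while continuing to wait on STOP is a design choice made here.

```python
import time

# Assumption: these statuses end the polling loop; every other status
# (CREATE, WAITING, DOING, STOP) keeps waiting until max_wait is reached.
TERMINAL = {"FINISH", "FAILED", "ANY_FAILED"}


def wait_for_task(fetch_status, poll_interval=5, max_wait=300, sleep=time.sleep):
    """Poll until the task reaches a terminal status or max_wait elapses.

    fetch_status is a hypothetical callable returning the "data" object
    of a PDF OCR status response, e.g. {"status": "DOING", ...}.
    """
    # Status queries are limited to once per 5 seconds.
    interval = max(5, poll_interval)
    waited = 0
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL:
            return data
        if waited + interval > max_wait:
            raise TimeoutError(f"task still {data['status']} after {waited}s")
        sleep(interval)
        waited += interval
```

On FINISH the returned object carries `downUrl`; callers should still check the status, because FAILED and ANY_FAILED are returned as data and not raised as exceptions.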

Error Codes

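(。・ω・。) Hi, ran into an error code? Let's see how to fix it ✧⁺⸜(●˙▾˙●)⸝⁺✧

Platform Common Error Codes

| Code | Description | Hint | Solution |
|---|---|---|---|
| 10009 | input invalid data | (◎_◎;) Oops, the data format looks wrong | Check that the input data meets the requirements |
| 10010 | service license not enough | (╯°□°)╯︵ ┻━┻ Not enough licenses, or the license has expired! | Submit a ticket to contact customer support |
| 10019 | service read buffer timeout | (。-`ω´-) The session timed out | Check whether all data was sent but the connection was left open |
| 10043 | Syscall AudioCodingDecode error | (◎_◎;) Audio decoding failed... | Check the aue parameter; if it is speex, make sure the audio is speex-encoded, compressed in segments, and consistent with the frame size |
| 10114 | session timeout | (。-`ω´-) The session ran out of time | Check whether sending the data took longer than 60 s |
| 10139 | invalid param | (◎_◎;) A parameter looks wrong | Check that the parameters are correct |
| 10160 | parse request json error | (◎_◎;) The request data is malformed | Check that the request data is valid JSON |
| 10161 | parse base64 string error | (◎_◎;) Base64 decoding failed | Check that the data sent is Base64-encoded |
| 10163 | param validate error | (◎_◎;) Parameter validation did not pass | See the detailed description for the specific cause |
| 10200 | read data timeout | (。-`ω´-) Reading data timed out | Check whether no data was sent for a cumulative 10 s while the connection stayed open |
| 10222 | context deadline exceeded | (╯°□°)╯︵ ┻━┻ Something went wrong! | 1. Check whether the uploaded data exceeds the API limit; 2. If the SSL certificate is invalid, submit a ticket |
| 10223 | RemoteLB: can't find valued addr | (◎_◎;) Cannot find a service node | Submit a ticket to contact technical staff |
| 10313 | invalid appid | (◎_◎;) The appid and apikey do not match | Check that the appid is valid |
| 10317 | invalid version | (◎_◎;) Something is wrong with the version number | Submit a ticket in the console to contact technical staff |
| 10700 | not authority | (╯°□°)╯︵ ┻━┻ Insufficient permissions! | Check against the developer documentation based on the reported reason; if it is still unresolved, submit a ticket with the sid and error message |
| 11200 | auth no license | (╯°□°)╯︵ ┻━┻ Feature not authorized! | Check that the appid is correct, confirm the relevant service has been added, and check whether usage exceeds the quota or the license has expired |
| 11201 | auth no enough license | (╯°□°)╯︵ ┻━┻ Daily interaction limit exceeded! | Submit the app for review to raise the quota, or contact sales to purchase the enterprise API |
| 11503 | server error: atmos return error | (。-`ω´-) The server returned bad data... | Submit a ticket |
| 11502 | server error: too many datas | (。-`ω´-) Something is wrong with the server configuration | Submit a ticket |
| 100001~100010 | WrapperInitErr | (◎_◎;) The engine call failed! | Look up the errno in message in the engine error code reference |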


Original API Error Codes

| Code | Description | Solution |
|---|---|---|
| 10000 | System error | Check auth info, request method, parameters |
| 10001 | Signature authentication failed | Check credentials |
| 10002 | Business processing error | Check error message |
| 10003 | Quota/insufficient balance | Check account balance |

Limitations

Image OCR

  • Format: Common image formats (JPG, PNG, etc.)
  • Size: Reasonable file sizes for web upload
  • Rate limiting: Follow API rate limits

PDF OCR

  • Max pages: 100 pages per PDF
  • Protected PDFs: Not supported (password/encrypted)
  • Rate limiting: Status query limited to once per 5 seconds
  • Time limit: Timestamp must be within ±5 minutes of server time

Tips

Image OCR

  1. High-quality images: Use clear, high-resolution images for best results
  2. Multiple formats: Use json,markdown to get both structured and formatted output
  3. Save results: Use -o flag to save OCR results to file

PDF OCR

  1. Math formulas: Use markdown format for PDFs with mathematical formulas
  2. Large PDFs: Split into sections if > 100 pages
  3. Polling interval: Minimum 5 seconds between status queries
  4. Network URLs: Ensure PDF URLs are publicly accessible
  5. Download URLs: Download files promptly as URLs may expire
Security Recommendations
This skill's code implements iFlytek OCR: it uploads images and PDFs to iFlytek servers and requires three environment variables (IFLY_APP_ID, IFLY_API_KEY, IFLY_API_SECRET), yet the registry metadata incorrectly lists no required credentials. Before installing, verify the skill's source and owner (the origin is unknown), confirm that you trust iFlytek and the specific endpoints in SKILL.md, and avoid sending sensitive or regulated documents unless you control the account and understand the provider's data retention and privacy policy. Set the declared environment variables only for a dedicated iFlytek account (do not reuse other secrets), and consider running the scripts manually in a sandbox to inspect their behavior before granting the skill to an autonomous agent.
Functional Analysis
Type: OpenClaw Skill
Name: ifly-pdf-image-ocr
Version: 1.0.0
The skill bundle provides legitimate OCR functionality for images and PDFs using official iFlytek APIs. The scripts (image_ocr.py and pdf_ocr.py) correctly implement the documented authentication protocols (HMAC-SHA256 and MD5+HMAC-SHA1) and communicate exclusively with verified iFlytek endpoints (xf-yun.com and xfyun.cn). No malicious behavior, data exfiltration, or prompt injection attempts were detected.
Capability Assessment
Purpose & Capability
The skill name/description (image and PDF OCR via iFlytek) matches the included scripts and runtime instructions: both scripts call iFlytek endpoints and implement the described HMAC/MD5 signing and result handling. The functionality requested (uploading PDFs/images to OCR service) is legitimate for this purpose.
Instruction Scope
SKILL.md and scripts instruct the agent to read local image/PDF files, read API credentials from environment variables, send files to iFlytek endpoints, and poll for results — all consistent with OCR. There is no evidence the instructions ask for unrelated system files or credentials, but the skill will transmit user files to external servers (iocr.xfyun.cn and cbm01.cn-huabei-1.xf-yun.com), which is expected for a cloud OCR service but has privacy implications.
Install Mechanism
No install spec (instruction-only + shipped scripts). Nothing is downloaded or executed automatically by an installer. This lowers risk, but the included scripts will be executed if run.
Credentials
Registry metadata claims no required env vars/credentials, but both SKILL.md and the scripts require IFLY_APP_ID and at least IFLY_API_SECRET; image OCR also requires IFLY_API_KEY. The metadata omission is an incoherence: the skill legitimately needs these secrets, but they were not declared in the registry entry. Requesting API credentials for the OCR provider itself is reasonable; asking for unrelated credentials is not present. The missing declaration and unknown source increase risk.
Persistence & Privilege
always is false and the skill does not request persistent system-wide privileges or modify other skills. It only requires environment variables and network access to the OCR endpoints.
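How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install ifly-pdf-image-ocr
  3. Once installed, call the Skill by name or trigger it with /ifly-pdf-image-ocr
  4. Provide the inputs described in the Skill's parameter reference to get structured output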
Version History
v1.0.0
Initial release of ifly-pdf&image-ocr skill.
  • Provides AI-powered OCR for both images and PDF documents via iFlytek APIs.
  • Supports multi-language text extraction with advanced document layout understanding.
  • Outputs can be in Word (.docx), Markdown, or JSON formats.
  • Allows conversion of PDF files to desired formats and extraction of text from images.
  • Includes authentication details, API parameters, example usage, and detailed error codes.
  • Supports both local files and public URL inputs for PDF processing.
Metadata
Slug: ifly-pdf-image-ocr
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
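FAQ

What is ifly-pdf-image-ocr?

ifly-pdf&image-ocr skill supporting both image OCR (AI-powered LLM OCR) and PDF document recognition. Use when user asks to OCR images, extract text from ima... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 224 total downloads so far.

How do I install ifly-pdf-image-ocr?

Run the command "/install ifly-pdf-image-ocr" in the OpenClaw or Claude Code chat box for a one-click install. The install step itself needs no extra configuration, but the scripts need the IFLY_APP_ID, IFLY_API_KEY, and IFLY_API_SECRET environment variables described under Setup.

Is ifly-pdf-image-ocr free?

Yes. ifly-pdf-image-ocr is free to download, install, and use under the MIT-0 license. The iFlytek API calls it makes still count against your account's quota and balance (see error codes 10003 and 11201).

Which platforms does ifly-pdf-image-ocr support?

ifly-pdf-image-ocr is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed ifly-pdf-image-ocr?

It is developed and maintained by Iflytek AIcloud (@qingzhe2020); the current version is v1.0.0.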
