li-hongmin

Azure Document OCR

Author: HONGMIN LI · GitHub · v1.0.0
Platform: cross-platform · Security scan: ⚠ suspicious
604 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw: /install azure-doc-ocr
Description
Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document...
Usage (SKILL.md)

Azure Document Intelligence OCR

Extract text and structured data from documents using Azure Document Intelligence REST API.

Quick Start

1. Environment Setup

Set your Azure Document Intelligence credentials:

export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
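The scripts read these two variables at startup. The sketch below shows one way to load and sanity-check them in Python so that a missing or non-HTTPS endpoint fails early. The helper name `load_azure_config` is hypothetical, and the skill's own scripts may read the variables differently.

```python
import os
from typing import Mapping, Tuple


def load_azure_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (endpoint, key), failing early if either variable is missing.

    Hypothetical helper, not taken from the skill's scripts.
    """
    endpoint = env.get("AZURE_DOC_INTEL_ENDPOINT", "").strip()
    key = env.get("AZURE_DOC_INTEL_KEY", "").strip()
    missing = [name for name, value in (("AZURE_DOC_INTEL_ENDPOINT", endpoint),
                                        ("AZURE_DOC_INTEL_KEY", key)) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    if not endpoint.startswith("https://"):
        # The key travels in a request header, so refuse plain-HTTP endpoints.
        raise ValueError("AZURE_DOC_INTEL_ENDPOINT must start with https://")
    # Drop any trailing slash so API paths can be appended with "/" safely.
    return endpoint.rstrip("/"), key
```

Calling `load_azure_config()` with no arguments reads the real process environment. Passing a plain dict, as the checks below do, makes the function easy to test without touching real credentials. Avoid printing the returned key.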

2. Single File OCR

# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf

# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown

# Process invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json

# OCR from URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"

# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt

# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
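Each run is an asynchronous REST exchange. The client submits the document, and the service answers with a polling URL in the `Operation-Location` header. The sketch below builds the submit request with only the standard library and sends nothing.

It rests on several assumptions:
  • It targets the v4.0 GA route (`/documentintelligence/documentModels/{modelId}:analyze`, `api-version=2024-11-30`). The skill's script may use a different API version; v3.1, for example, uses `/formrecognizer/...` with `2023-07-31`.
  • It sends a JSON body containing either `urlSource` or `base64Source`.
  • It passes the `--pages` value through as the `pages` query parameter.

`build_analyze_request` is a hypothetical name.

```python
import base64
import json
import re
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode

# Assumption: Document Intelligence v4.0 GA route and API version.
# (v3.1 uses "/formrecognizer/documentModels/..." with api-version 2023-07-31.)
API_PATH = "/documentintelligence/documentModels/{model_id}:analyze"
API_VERSION = "2024-11-30"

# Page specs such as "1-3,5" or "1,3-5,10".
PAGE_SPEC = re.compile(r"^\d+(-\d+)?(,\d+(-\d+)?)*$")


def build_analyze_request(endpoint: str, key: str, model_id: str, *,
                          url: Optional[str] = None,
                          file_bytes: Optional[bytes] = None,
                          pages: Optional[str] = None,
                          markdown: bool = False) -> Tuple[str, Dict[str, str], bytes]:
    """Hypothetical helper: return (request_url, headers, body) for the analyze POST."""
    if (url is None) == (file_bytes is None):
        raise ValueError("pass exactly one of url or file_bytes")
    params = {"api-version": API_VERSION}
    if pages:
        if not PAGE_SPEC.match(pages):
            raise ValueError(f"bad page spec: {pages!r}")
        params["pages"] = pages
    if markdown:
        params["outputContentFormat"] = "markdown"  # v4.0 query parameter
    request_url = (endpoint.rstrip("/")
                   + API_PATH.format(model_id=quote(model_id, safe=""))
                   + "?" + urlencode(params, safe=","))
    if url is not None:
        payload = {"urlSource": url}
    else:
        payload = {"base64Source": base64.b64encode(file_bytes).decode("ascii")}
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return request_url, headers, json.dumps(payload).encode("utf-8")
```

With `requests` (which the skill already depends on), you would send the request with `requests.post(request_url, headers=headers, data=body)`. A `202` response carries the URL to poll in its `Operation-Location` header.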

3. Batch Processing

# Process all documents in a folder
python scripts/batch_ocr.py ./documents/

# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown

# Use layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8

# Filter specific extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
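The batch flags map naturally onto a thread pool, because each document spends most of its time waiting on the network. The sketch below is a hypothetical reconstruction, not the skill's `batch_ocr.py`. It does three things:
  • filters a folder by extension,
  • runs a caller-supplied `ocr_one` function across `workers` threads,
  • writes one output file per input.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def collect_files(folder: Path, exts: Iterable[str]) -> List[Path]:
    """List files directly in `folder` whose extension is in `exts` (case-insensitive)."""
    wanted = {e.lower() if e.startswith(".") else "." + e.lower() for e in exts}
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


def run_batch(files: List[Path], out_dir: Path, ocr_one: Callable[[Path], str],
              workers: int = 4, suffix: str = ".md") -> Dict[Path, str]:
    """Hypothetical batch runner: OCR files concurrently, return {input: status}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    status: Dict[Path, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(ocr_one, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                text = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                status[path] = f"error: {exc}"
                continue
            (out_dir / (path.stem + suffix)).write_text(text, encoding="utf-8")
            status[path] = "ok"
    return status
```

For example, `collect_files(Path("./documents"), ".pdf,.png".split(","))` mirrors `--ext .pdf,.png`. In this sketch, inputs that share a stem (such as `a.pdf` and `a.png`) would overwrite each other's output, so a real tool would need to disambiguate them. Lowering `workers` is also the natural lever when the service starts rate-limiting.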

Model Selection Guide

| Document type | Recommended model | Use case |
|---|---|---|
| General text | prebuilt-read | Pure text extraction, any document |
| Structured docs | prebuilt-layout | Tables, forms, paragraphs, figures |
| Invoices | prebuilt-invoice | Vendor info, line items, totals |
| Receipts | prebuilt-receipt | Merchant, items, totals, dates |
| IDs/Passports | prebuilt-idDocument | Identity documents |
| Business cards | prebuilt-businessCard | Contact information |
| W-2 forms | prebuilt-tax.us.w2 | US tax documents |
| Insurance cards | prebuilt-healthInsuranceCard.us | Health insurance info |

See references/models.md for detailed model documentation.

Supported Input Formats

  • PDF: .pdf (including scanned PDFs)
  • Images: .png, .jpg, .jpeg, .tiff, .bmp
  • URLs: Direct links to documents
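Checking the input can be as simple as two steps. First, look for an `http(s)` scheme. Otherwise, match the file extension against the list above. In the hedged sketch below, `classify_input` is a hypothetical helper, and the real script may validate differently.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions taken from the supported-formats list above.
PDF_EXTS = {".pdf"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def classify_input(source: str) -> str:
    """Hypothetical helper: return "url", "pdf", or "image"; raise ValueError otherwise."""
    if urlparse(source).scheme.lower() in ("http", "https"):
        return "url"
    ext = Path(source).suffix.lower()
    if ext in PDF_EXTS:
        return "pdf"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unsupported input: {source!r} (extension {ext or 'none'})")
```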

Output Formats

  • text: Plain text concatenation of all extracted content
  • markdown: Structured output with headers and tables (best with layout model)
  • json: Raw API response with full extraction details
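Both the `text` and `markdown` formats can be derived from the JSON response. The sketch below assumes the response shape documented for the REST API: `analyzeResult.content` holds the full text, and each entry in `analyzeResult.tables` has `rowCount`, `columnCount`, and `cells` with `rowIndex`, `columnIndex`, and `content`. It renders one layout table as a Markdown table, ignoring merged cells (`rowSpan`/`columnSpan`) for brevity. Both function names are hypothetical.

```python
from typing import Any, Dict, List


def result_text(response: Dict[str, Any]) -> str:
    """Plain-text output: analyzeResult.content already holds all text in reading order."""
    return response["analyzeResult"].get("content", "")


def table_to_markdown(table: Dict[str, Any]) -> str:
    """Render one analyzeResult table as Markdown, using the first row as the header."""
    rows: List[List[str]] = [["" for _ in range(table["columnCount"])]
                             for _ in range(table["rowCount"])]
    if not rows:
        return ""
    for cell in table["cells"]:
        # Escape pipes and flatten newlines so each cell stays on one line.
        text = cell.get("content", "").replace("|", "\\|").replace("\n", " ")
        rows[cell["rowIndex"]][cell["columnIndex"]] = text
    lines = ["| " + " | ".join(rows[0]) + " |",
             "|" + "---|" * len(rows[0])]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)
```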

Features

  • Handwriting Recognition: Extracts handwritten text alongside printed text
  • CJK Support: Full support for Chinese, Japanese, Korean characters
  • Table Extraction: Preserves table structure (use layout model)
  • Multi-page Processing: Handles documents with multiple pages
  • Concurrent Processing: Batch script supports parallel processing
  • URL Input: Process documents directly from URLs

Environment Variables

| Variable | Required | Description |
|---|---|---|
| AZURE_DOC_INTEL_ENDPOINT | Yes | Azure Document Intelligence endpoint URL |
| AZURE_DOC_INTEL_KEY | Yes | API subscription key |

Error Handling

  • Invalid credentials: Check endpoint URL and API key
  • Unsupported format: Ensure file extension matches supported types
  • Timeout: Large documents may need longer processing (max 300s)
  • Rate limiting: Reduce concurrent workers for batch processing
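Both the timeout and the rate-limit cases come down to how the client polls the operation URL. The sketch below does the following:
  • polls until the operation's `status` is `succeeded` or `failed`,
  • waits for the `Retry-After` value on HTTP 429,
  • gives up after 300 s, matching the limit above.

To keep it testable offline, the HTTP GET and the clock are injected. `poll_operation` and its parameters are hypothetical names, not the skill's code.

```python
import time
from typing import Any, Callable, Dict, Mapping, Tuple

# Hypothetical GET signature: returns (status_code, headers, json_body).
Getter = Callable[[str], Tuple[int, Mapping[str, str], Dict[str, Any]]]


def poll_operation(operation_url: str, get: Getter, timeout: float = 300.0,
                   interval: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll an analyze operation until it succeeds, fails, or times out."""
    deadline = clock() + timeout
    while True:
        code, headers, body = get(operation_url)
        wait = interval
        if code == 429:
            # Rate limited: wait as long as the service asks before retrying.
            wait = float(headers.get("Retry-After", interval))
        elif code != 200:
            raise RuntimeError(f"polling failed with HTTP {code}")
        else:
            status = body.get("status")
            if status == "succeeded":
                return body
            if status == "failed":
                raise RuntimeError(f"analysis failed: {body.get('error')}")
        if clock() + wait > deadline:
            raise TimeoutError(f"operation not finished after {timeout:.0f}s")
        sleep(wait)
```

With `requests`, `get` could wrap `requests.get(url, headers={"Ocp-Apim-Subscription-Key": key})` and return `(r.status_code, r.headers, r.json())`. Injecting `sleep` and `clock` is what lets the checks below run without real waiting.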

Examples

Extract text from scanned PDF

python scripts/ocr_extract.py scanned_contract.pdf --model prebuilt-read

Process invoices with structured output

python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json --output invoice_data.json

Batch process with layout analysis

python scripts/batch_ocr.py ./reports/ --model prebuilt-layout --format markdown --workers 4

Extract specific pages from large document

python scripts/ocr_extract.py large_doc.pdf --pages 1,3-5,10 --format text
Security recommendations
This skill's code implements Azure Document Intelligence OCR and takes no obvious malicious actions, but the package metadata is inconsistent and incomplete. Before installing or using it:
  • Expect to set two sensitive environment variables: AZURE_DOC_INTEL_ENDPOINT (your Azure resource URL) and AZURE_DOC_INTEL_KEY (your subscription key). The registry did not declare these, so verify that you set them yourself, and never paste your key into untrusted UIs.
  • Verify that the 'requests' Python package and a compatible Python interpreter are available; the skill provides no install steps for dependencies.
  • Confirm that the endpoint you configure is your own Azure resource (check the domain and subscription), because documents and extracted data will be sent to that endpoint. Do not point the endpoint at unknown third-party domains.
  • If you need stricter governance, ask the publisher to update the registry metadata to declare the required env vars and dependencies, and to add an install spec (e.g., pip requirements). Also consider using a dedicated Azure key with minimal permissions, and rotate it after testing.
  • If you are processing sensitive documents, ensure that sending them to Azure complies with your data handling and privacy policies.
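A small allowlist check is one way to confirm the endpoint before any key is sent. This is a hypothetical hardening step, not part of the skill. The suffix list assumes two public-cloud Azure AI services endpoint styles: custom-subdomain endpoints (`*.cognitiveservices.azure.com`) and regional endpoints (`*.api.cognitive.microsoft.com`). Adjust it for sovereign clouds or private endpoints.

```python
from urllib.parse import urlparse

# Assumption: typical public-cloud Azure AI services endpoint suffixes.
ALLOWED_SUFFIXES = (".cognitiveservices.azure.com", ".api.cognitive.microsoft.com")


def is_trusted_endpoint(endpoint: str) -> bool:
    """Hypothetical check: True only for https URLs whose host ends with an allowed suffix."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)
```

The check inspects the parsed hostname rather than the raw string. This means a look-alike such as `https://cognitiveservices.azure.com.evil.example` fails, and so does a suffix smuggled into the query string.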
Functional analysis
Type: OpenClaw Skill · Name: azure-doc-ocr · Version: 1.0.0

The skill is classified as suspicious due to several critical vulnerabilities in `scripts/ocr_extract.py` that could be exploited via prompt injection against the OpenClaw agent. Specifically, the script directly uses `file_path` and `--output` arguments from `argparse` without sanitization, enabling potential arbitrary file read/write. Additionally, the `--url` argument allows fetching documents from arbitrary URLs, posing a Server-Side Request Forgery (SSRF) risk. While the code's stated purpose is benign (OCR), these capabilities, if misused by a compromised agent, could lead to unauthorized data access, modification, or network reconnaissance.
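Standard-library checks can narrow both risks named above: unrestricted file paths and arbitrary `--url` targets. The sketch below is hypothetical hardening that a caller or a patched script could apply; it is not in `ocr_extract.py`. It makes two checks:
  • `--output` must resolve to a path inside a chosen base directory.
  • The URL must use https, and its host must not be a private, loopback, link-local, or reserved IP literal.

It does not resolve DNS names, so a hostname that resolves to an internal address would still pass.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlparse


def safe_output_path(base_dir: str, user_path: str) -> Path:
    """Hypothetical check: resolve user_path under base_dir, refusing anything that escapes."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()  # absolute user_path replaces base here
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"output path escapes {base}: {user_path!r}")
    return target


def check_document_url(url: str) -> str:
    """Hypothetical check: allow only https URLs whose host is not an internal IP literal."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"only https URLs are allowed: {url!r}")
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return url  # a DNS name: not resolved here, so DNS-based SSRF is not covered
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"refusing internal address: {ip}")
    return url
```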
Capability assessment
Purpose & Capability
The code and SKILL.md align with the described purpose (submitting documents to Azure Document Intelligence and returning the results). However, the registry metadata declares no required environment variables or primary credential, while the scripts and SKILL.md require AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY. The package also assumes a Python runtime and the 'requests' library but lists no dependencies. These omissions leave the metadata inconsistent with the code.
Instruction Scope
Runtime instructions tell the agent to read two Azure env vars and to call the Azure Document Intelligence REST API; the included scripts only read provided files/URLs and those env vars and poll the Azure operation endpoint. The instructions do not attempt to read unrelated system files or unknown environment variables and do not post data to third-party endpoints other than the configured Azure endpoint. Still, the instructions require credentials that the registry metadata did not declare.
Install Mechanism
There is no install specification despite shipping Python scripts. The package assumes invocation via 'python' and the presence of the 'requests' package; there are no pip/install instructions or declared dependencies. This mismatch may cause runtime failures and is an omission that reduces transparency about what's needed and what will be present on disk.
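Until the package declares its dependencies, a quick preflight check can catch a missing `requests` install before a run fails midway. The sketch below uses `importlib.util.find_spec`. The module list is an assumption based on the analysis above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the top-level modules in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, if `missing_modules(["requests"])` returns a non-empty list, installing the package with `python -m pip install requests` should fix it.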
Credentials
The two environment values used (AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY) are proportional to the skill's purpose (authenticating to Azure). However, the registry metadata fails to declare them (primaryEnv is unset). The API key is sensitive: the skill legitimately needs it, but users should be aware that the key will be sent with requests to whichever endpoint they configure.
Persistence & Privilege
The skill does not request elevated persistence (always is false) and does not modify other skills or system-wide configuration. It runs as scripts invoked by the user/agent and performs network calls only to the configured Azure endpoint.
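How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install azure-doc-ocr
  3. After installation, invoke the skill by name, or trigger it with /azure-doc-ocr.
  4. Provide the inputs required by the skill's parameter documentation to get structured output.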
Version history
v1.0.0
Initial release: Azure Document Intelligence OCR with single-file and batch processing, supports PDF/images, CJK, handwriting, tables, invoices
Metadata
Slug azure-doc-ocr
Version 1.0.0
License
Total installs 0
Current installs 0
Version count 1
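FAQ

What is Azure Document OCR?

Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 604 times so far.

How do I install Azure Document OCR?

Run the command "/install azure-doc-ocr" in the OpenClaw or Claude Code chat box to install it in one step. Before first use, set the AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY environment variables.

Is Azure Document OCR free?

Yes. The skill itself is free and open source, and you can download, install, and use it freely. Calls to Azure Document Intelligence are billed separately under your Azure subscription.

Which platforms does Azure Document OCR support?

Azure Document OCR is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Azure Document OCR?

It is developed and maintained by HONGMIN LI (@li-hongmin). The current version is v1.0.0.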
