
Azure Document OCR

Name: Azure Document OCR
Author: li-hongmin

By HONGMIN LI · GitHub · v1.0.0

cross-platform ⚠ suspicious

604 total downloads

Install in OpenClaw

/install azure-doc-ocr

Description

Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document...

Usage (SKILL.md)

Azure Document Intelligence OCR

Extract text and structured data from documents using the Azure Document Intelligence REST API.

Quick Start

1. Environment Setup

Set your Azure Document Intelligence credentials:

export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
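A script that depends on these two variables should fail early with a clear message when either one is missing, rather than later with an opaque HTTP error. The following is a minimal Python sketch of such a check. `load_credentials` is a hypothetical helper and is not known to exist in the skill's scripts.

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read the endpoint and key from the environment, failing fast if either is unset."""
    endpoint = os.environ.get("AZURE_DOC_INTEL_ENDPOINT", "").strip()
    key = os.environ.get("AZURE_DOC_INTEL_KEY", "").strip()
    missing = [name for name, value in (
        ("AZURE_DOC_INTEL_ENDPOINT", endpoint),
        ("AZURE_DOC_INTEL_KEY", key),
    ) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    # Drop a trailing slash so request paths can be appended with a single "/".
    return endpoint.rstrip("/"), key
```

`load_credentials()` returns the endpoint ready for building request URLs. Log the endpoint if useful, but never the key.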

2. Single File OCR

# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf

# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown

# Process an invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json

# OCR from a URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"

# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt

# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
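Each command above ultimately submits one analyze request to the service. The sketch below shows roughly what such a request looks like, using only the Python standard library. It assumes the Document Intelligence v4.0 REST API (`api-version=2024-11-30`, path `/documentintelligence/documentModels/{model}:analyze`). The skill's `ocr_extract.py` may target a different API version, and `build_analyze_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: Document Intelligence v4.0 GA REST API. The skill's scripts may use
# another api-version or the older /formrecognizer/... path from v3.x.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint: str, key: str, model: str,
                          file_bytes: Optional[bytes] = None,
                          url: Optional[str] = None,
                          pages: Optional[str] = None,
                          markdown: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an analyze operation."""
    if (file_bytes is None) == (url is None):
        raise ValueError("pass exactly one of file_bytes or url")
    params = {"api-version": API_VERSION}
    if pages:
        params["pages"] = pages  # same "1-3,5" syntax as --pages
    if markdown:
        params["outputContentFormat"] = "markdown"
    full_url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
                f"{urllib.parse.quote(model)}:analyze?{urllib.parse.urlencode(params)}")
    headers = {"Ocp-Apim-Subscription-Key": key}  # the key travels in this header
    if file_bytes is not None:
        # Local file: upload the raw bytes.
        headers["Content-Type"] = "application/octet-stream"
        body = file_bytes
    else:
        # URL input: the service downloads the document itself.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"urlSource": url}).encode("utf-8")
    return urllib.request.Request(full_url, data=body, headers=headers, method="POST")
```

Sending the request with `urllib.request.urlopen(req)` should return HTTP 202 with an `Operation-Location` header. That header holds the URL to poll for the finished result.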

3. Batch Processing

# Process all documents in a folder
python scripts/batch_ocr.py ./documents/

# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown

# Use the layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8

# Process only the given file extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
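The batch options map naturally onto a thread pool, because each document spends most of its time waiting on the network. This is a sketch under several assumptions:

- `find_documents` and `run_batch` are hypothetical names.
- The folder scan is non-recursive.
- The default extension set mirrors this document's supported-format list rather than anything confirmed in `batch_ocr.py`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Optional, Union

# Assumed default: the extensions listed under "Supported Input Formats".
DEFAULT_EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def find_documents(folder: Union[str, Path], exts: Optional[str] = None) -> list[Path]:
    """List files directly inside `folder` whose extension matches `exts` (e.g. ".pdf,.png")."""
    if exts is None:
        wanted = DEFAULT_EXTS
    else:
        wanted = {e.strip().lower() for e in exts.split(",") if e.strip()}
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


def run_batch(files: list[Path], process: Callable[[Path], str],
              workers: int = 4) -> dict[Path, Union[str, Exception]]:
    """Run process(file) on a thread pool, keeping either the result or the error."""
    results: dict[Path, Union[str, Exception]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not abort the batch
                results[path] = exc
    return results
```

With `process` set to a function that OCRs one file, `run_batch(find_documents("./documents/", ".pdf,.png"), process, workers=8)` roughly corresponds to `--ext .pdf,.png --workers 8`. Recording exceptions per file keeps one bad document from aborting the whole batch.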

Model Selection Guide

Document Type	Recommended Model	Use Case
General text	`prebuilt-read`	Pure text extraction, any document
Structured docs	`prebuilt-layout`	Tables, forms, paragraphs, figures
Invoices	`prebuilt-invoice`	Vendor info, line items, totals
Receipts	`prebuilt-receipt`	Merchant, items, totals, dates
IDs/Passports	`prebuilt-idDocument`	Identity documents
Business cards	`prebuilt-businessCard`	Contact information
W-2 forms	`prebuilt-tax.us.w2`	US tax documents
Insurance cards	`prebuilt-healthInsuranceCard.us`	Health insurance info

See references/models.md for detailed model documentation.

Supported Input Formats

PDF: .pdf (including scanned PDFs)
Images: .png, .jpg, .jpeg, .tiff, .bmp
URLs: Direct links to documents
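A script can decide up front whether an argument is a URL or a local file, and reject unsupported extensions before uploading anything. This is a minimal sketch. It assumes the extension list above is the complete set the scripts accept; the service itself may support more formats. `classify_input` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse

# The extensions listed under "Supported Input Formats".
SUPPORTED_EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def classify_input(source: str) -> str:
    """Return "url" or "file"; raise ValueError for an unsupported file type."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(source).suffix.lower()  # case-insensitive: ".PDF" is accepted
    if suffix not in SUPPORTED_EXTS:
        raise ValueError(f"Unsupported format {suffix or '(none)'!r}; "
                         f"expected one of {sorted(SUPPORTED_EXTS)}")
    return "file"
```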

Output Formats

text: Plain text concatenation of all extracted content
markdown: Structured output with headers and tables (best with the layout model)
json: Raw API response with full extraction details
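The three output formats are different views of the same JSON response. The sketch below assumes the documented `analyzeResult` schema:

- a top-level `content` string;
- `tables[]` entries with `rowCount`, `columnCount`, and `cells[]`.

It shows one plausible way to derive the text and markdown views and is not necessarily how `ocr_extract.py` does it. Newer API versions can also return Markdown directly (`outputContentFormat=markdown`). `table_to_markdown` and `result_to_text` are hypothetical helpers.

```python
def table_to_markdown(table: dict) -> str:
    """Render one entry of analyzeResult["tables"] as a Markdown table.

    Merged cells (rowSpan/columnSpan) are not expanded in this sketch,
    and the first row is treated as the header.
    """
    rows = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Escape pipes and flatten newlines so each cell stays on one Markdown line.
        text = cell.get("content", "").replace("|", "\\|").replace("\n", " ")
        rows[cell["rowIndex"]][cell["columnIndex"]] = text
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def result_to_text(result: dict) -> str:
    """Plain-text view: the service's already-concatenated analyzeResult.content."""
    return result["analyzeResult"].get("content", "")
```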

Features

Handwriting Recognition: Extracts handwritten text alongside printed text
CJK Support: Full support for Chinese, Japanese, and Korean characters
Table Extraction: Preserves table structure (use the layout model)
Multi-page Processing: Handles documents with multiple pages
Concurrent Processing: The batch script supports parallel processing
URL Input: Process documents directly from URLs
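Handwritten and printed text come back in the same `content` string, and the response marks handwriting separately in `analyzeResult.styles`. This is a minimal sketch. It assumes each style entry carries `isHandwritten`, `confidence`, and `spans` with `offset`/`length` into `content`. `handwritten_text` is a hypothetical helper.

```python
def handwritten_text(result: dict, min_confidence: float = 0.0) -> list[str]:
    """Return the substrings of content that the service marked as handwritten."""
    analyze = result["analyzeResult"]
    content = analyze.get("content", "")
    found = []
    for style in analyze.get("styles", []):
        if style.get("isHandwritten") and style.get("confidence", 1.0) >= min_confidence:
            for span in style.get("spans", []):
                # Each span indexes into the full content string.
                start = span["offset"]
                found.append(content[start:start + span["length"]])
    return found
```

Offsets are counted in the unit selected by the request's `stringIndexType` parameter. Requesting `unicodeCodePoint` makes them line up with Python string indexing, which can matter for some non-Latin text.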

Environment Variables

Variable	Required	Description
`AZURE_DOC_INTEL_ENDPOINT`	Yes	Azure Document Intelligence endpoint URL
`AZURE_DOC_INTEL_KEY`	Yes	API subscription key

Error Handling

Invalid credentials: Check the endpoint URL and API key
Unsupported format: Make sure the file extension is one of the supported types
Timeout: Large documents can take longer to process (maximum wait: 300 s)
Rate limiting: Reduce the number of concurrent workers for batch processing
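Analysis is asynchronous: the service returns an `Operation-Location` URL, and the client polls it until the status is `succeeded` or `failed`. The sketch below shows a polling loop that stops at the 300 s limit mentioned above and backs off on HTTP 429. Its design assumptions:

- The `fetch` callable is a hypothetical interface, injected so the loop can be tested without network access.
- `Retry-After` is assumed to be given in seconds.
- `canceled` is handled defensively as a failure.

```python
import time
from typing import Callable, Tuple

# Hypothetical interface: fetch(url) -> (HTTP status, response headers, parsed JSON body).
Fetch = Callable[[str], Tuple[int, dict, dict]]


def poll_operation(operation_url: str, fetch: Fetch, timeout: float = 300.0,
                   interval: float = 2.0, sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll an Operation-Location URL until the analysis succeeds, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status, headers, body = fetch(operation_url)
        if status == 429:
            # Rate limited: wait as long as the service asks.
            wait = float(headers.get("Retry-After", interval))
        elif status != 200:
            raise RuntimeError(f"polling failed with HTTP {status}: {body}")
        elif body.get("status") == "succeeded":
            return body
        elif body.get("status") in ("failed", "canceled"):
            raise RuntimeError(f"analysis {body['status']}: {body.get('error')}")
        else:
            # "notStarted" or "running": try again after the suggested delay.
            wait = float(headers.get("Retry-After", interval))
        if clock() + wait > deadline:
            raise TimeoutError(f"no result after {timeout:.0f} s")
        sleep(wait)
```

Injecting `sleep` and `clock` keeps the timeout logic deterministic under test. In real use the defaults (`time.sleep`, `time.monotonic`) apply.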

Examples

Extract text from a scanned PDF

python scripts/ocr_extract.py scanned_contract.pdf --model prebuilt-read

Process invoices with structured output

python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json --output invoice_data.json

Batch process with layout analysis

python scripts/batch_ocr.py ./reports/ --model prebuilt-layout --format markdown --workers 4

Extract specific pages from a large document

python scripts/ocr_extract.py large_doc.pdf --pages 1,3-5,10 --format text

Security Recommendations

This skill's code implements Azure Document Intelligence OCR and does not perform any obviously malicious actions, but the package metadata is inconsistent and incomplete. Before installing or using it:

- Expect to set two sensitive environment variables: AZURE_DOC_INTEL_ENDPOINT (your Azure resource URL) and AZURE_DOC_INTEL_KEY (your subscription key). The registry does not declare them, so set them yourself, and never paste your key into untrusted UIs.
- Verify that a compatible Python interpreter and the 'requests' package are available; the skill provides no install steps for its dependencies.
- Confirm that the endpoint you configure is your own Azure resource (check the domain and subscription), because your documents and the extracted data are sent to that endpoint. Do not point the endpoint at unknown third-party domains.
- For stricter governance, ask the publisher to update the registry metadata to declare the required environment variables and dependencies, and to add an install spec (e.g., pip requirements). Also consider using a dedicated Azure key with minimal permissions, and rotate it after testing.
- If you process sensitive documents, make sure that sending them to Azure complies with your data-handling and privacy policies.
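The advice to confirm the endpoint can be enforced in code, so that the key is never sent to an unexpected host. This is a minimal sketch. The allowlisted domain suffixes are an assumption based on common Azure AI services endpoints and may need adjusting, for example for sovereign clouds or custom domains. `check_endpoint` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumption: typical Azure AI services host names; extend for your environment.
ALLOWED_SUFFIXES = (".cognitiveservices.azure.com", ".api.cognitive.microsoft.com")


def check_endpoint(endpoint: str) -> str:
    """Refuse to use any endpoint that is not HTTPS on an allowlisted Azure domain."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https":
        raise ValueError("endpoint must use https")
    # Suffix match on the parsed host, so "azure.com.evil.io" style hosts fail.
    if not host.endswith(ALLOWED_SUFFIXES):
        raise ValueError(f"unexpected endpoint host {host!r}")
    return endpoint.rstrip("/")
```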

Code Analysis

Type: OpenClaw Skill
Name: azure-doc-ocr
Version: 1.0.0

The skill is classified as suspicious because of several critical vulnerabilities in `scripts/ocr_extract.py` that could be exploited via prompt injection against the OpenClaw agent. Specifically, the script uses the `file_path` and `--output` arguments from `argparse` directly, without sanitization, which could allow arbitrary file reads and writes. In addition, the `--url` argument allows fetching documents from arbitrary URLs, which poses a Server-Side Request Forgery (SSRF) risk. Although the code's stated purpose (OCR) is benign, a compromised agent could misuse these capabilities for unauthorized data access, data modification, or network reconnaissance.
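The two issues flagged here suggest straightforward mitigations:

- Confine file arguments such as `file_path` and `--output` to a working directory.
- Restrict `--url` to public HTTPS addresses.

The following sketch shows both. `confine_path` and `check_document_url` are hypothetical helpers, not part of the skill. The URL check inspects only IP literals and does not resolve host names, so it is a first line of defense rather than complete SSRF protection.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlparse


def confine_path(user_path: str, base_dir: str) -> Path:
    """Resolve user_path and refuse anything outside base_dir (blocks "../" and absolute paths)."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{user_path!r} escapes {base}")
    return target


def check_document_url(url: str) -> str:
    """Accept only https URLs whose host is not localhost or a private/loopback IP literal."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host:
        raise ValueError("only https URLs are accepted")
    if host == "localhost":
        raise ValueError("localhost is not allowed")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # a host name, not an IP literal; not resolved in this sketch
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"address {ip} is not allowed")
    return url
```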

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md align with the described purpose (submitting documents to Azure Document Intelligence and returning the results). However, the registry metadata declares no required environment variables or primary credential, while the scripts and SKILL.md require AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY. The package also assumes a Python runtime and the 'requests' library but lists no dependencies. These omissions make the metadata inconsistent with the code.

ℹ Instruction Scope

Runtime instructions tell the agent to read two Azure env vars and to call the Azure Document Intelligence REST API; the included scripts only read provided files/URLs and those env vars and poll the Azure operation endpoint. The instructions do not attempt to read unrelated system files or unknown environment variables and do not post data to third-party endpoints other than the configured Azure endpoint. Still, the instructions require credentials that the registry metadata did not declare.

⚠ Install Mechanism

There is no install specification despite shipping Python scripts. The package assumes invocation via 'python' and the presence of the 'requests' package; there are no pip/install instructions or declared dependencies. This mismatch may cause runtime failures and is an omission that reduces transparency about what's needed and what will be present on disk.
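Until the publisher declares dependencies, a short preflight check can surface a missing package before a script fails partway through. In this sketch, the required module (`requests`) comes from this assessment. The minimum Python version of 3.9 is an assumption, since none is declared. `preflight` is a hypothetical helper.

```python
import importlib.util
import sys


def preflight(modules: tuple = ("requests",), min_python: tuple = (3, 9)) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    for name in modules:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    return problems
```

If the list is not empty, install what is missing (for example, `pip install requests`) before running the scripts.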

⚠ Credentials

The two environment values used (AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY) are proportional to the skill's purpose (authenticating to Azure). However, the registry metadata fails to declare them (primaryEnv is unset). The API key is sensitive: the skill legitimately needs it, but users should be aware that the key is sent with requests to whichever endpoint they configure.

✓ Persistence & Privilege

The skill does not request elevated persistence (always is false) and does not modify other skills or system-wide configuration. It runs as scripts invoked by the user/agent and performs network calls only to the configured Azure endpoint.

How to Use

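Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install azure-doc-ocr
After installation, call the skill by name or trigger it with /azure-doc-ocr.
Provide the required inputs as described in the skill's parameter documentation to get structured output.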

Version History

v1.0.0

Initial release: Azure Document Intelligence OCR with single-file and batch processing, supports PDF/images, CJK, handwriting, tables, invoices

Metadata

Slug azure-doc-ocr

Version 1.0.0

License not specified

Total installs 0

Current installs 0

Versions 1

FAQ