
Azure Document OCR

Name: Azure Document OCR
Author: li-hongmin

by HONGMIN LI · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

604 Downloads

Install in OpenClaw

/install azure-doc-ocr

Description

Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document...

README (SKILL.md)

Azure Document Intelligence OCR

Extract text and structured data from documents using Azure Document Intelligence REST API.

Quick Start

1. Environment Setup

Set your Azure Document Intelligence credentials:

export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
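
Before running the scripts, it can help to confirm that both variables are set and that the endpoint looks like an Azure resource. The following is a minimal sketch, not part of the skill: `check_azure_env` and `EXPECTED_SUFFIX` are hypothetical names, and the domain suffix is an assumption based on the example endpoint above (custom domains or sovereign clouds would need a different check).

```python
import os
from urllib.parse import urlparse

# Assumption: the resource is served from this domain suffix, as in the
# example endpoint. Adjust for custom domains or sovereign clouds.
EXPECTED_SUFFIX = ".cognitiveservices.azure.com"


def check_azure_env(env=os.environ):
    """Return a list of problems found in the Azure credential variables."""
    problems = []
    endpoint = env.get("AZURE_DOC_INTEL_ENDPOINT", "")
    key = env.get("AZURE_DOC_INTEL_KEY", "")
    if not endpoint:
        problems.append("AZURE_DOC_INTEL_ENDPOINT is not set")
    else:
        parsed = urlparse(endpoint)
        if parsed.scheme != "https":
            problems.append("endpoint should use https")
        if not (parsed.hostname or "").endswith(EXPECTED_SUFFIX):
            problems.append("endpoint host is not under " + EXPECTED_SUFFIX)
    if not key:
        problems.append("AZURE_DOC_INTEL_KEY is not set")
    return problems
```

An empty list means the basic checks passed; it does not prove the key is valid for that resource.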

2. Single File OCR

# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf

# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown

# Process invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json

# OCR from URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"

# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt

# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
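
The `--pages` value combines single pages and ranges separated by commas. As an illustration of how such a spec could be validated before it is sent, here is a small sketch. `parse_pages` is a hypothetical helper and is not taken from `scripts/ocr_extract.py`, which may simply pass the string through to the API.

```python
def parse_pages(spec):
    """Expand a page spec like '1-3,5' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Under this sketch, `parse_pages("1-3,5")` yields pages 1, 2, 3 and 5, and a reversed range such as `3-1` raises `ValueError`.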

3. Batch Processing

# Process all documents in a folder
python scripts/batch_ocr.py ./documents/

# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown

# Use layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8

# Filter specific extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
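
The batch options above (extension filter, worker count) map naturally onto a thread pool. The sketch below shows one way this could look; it is an assumption about the general approach, not the code in `scripts/batch_ocr.py`. `find_documents`, `run_batch`, and the `process_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def find_documents(folder, extensions=(".pdf", ".png")):
    """Return files in `folder` whose suffix matches one of `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


def run_batch(paths, process_one, workers=4):
    """Apply `process_one` to every path in parallel.

    Returns (results, errors): dicts keyed by path, so one failed
    document does not abort the rest of the batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report the failure later
                errors[path] = exc
    return results, errors
```

Threads suit this workload because each document spends most of its time waiting on the network. Lowering `workers` is the simplest way to reduce pressure on the service's rate limits.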

Model Selection Guide

Document Type	Recommended Model	Use Case
General text	`prebuilt-read`	Pure text extraction, any document
Structured docs	`prebuilt-layout`	Tables, forms, paragraphs, figures
Invoices	`prebuilt-invoice`	Vendor info, line items, totals
Receipts	`prebuilt-receipt`	Merchant, items, totals, dates
IDs/Passports	`prebuilt-idDocument`	Identity documents
Business cards	`prebuilt-businessCard`	Contact information
W-2 forms	`prebuilt-tax.us.w2`	US tax documents
Insurance cards	`prebuilt-healthInsuranceCard.us`	Health insurance info

See references/models.md for detailed model documentation.

Supported Input Formats

PDF: .pdf (including scanned PDFs)
Images: .png, .jpg, .jpeg, .tiff, .bmp
URLs: Direct links to documents

Output Formats

text: Plain text concatenation of all extracted content
markdown: Structured output with headers and tables (best with layout model)
json: Raw API response with full extraction details
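
To make the difference between the formats concrete, the sketch below derives text and a Markdown table from a JSON result. It assumes the response shape documented for the layout model: an `analyzeResult` object with a `content` string and `tables` whose `cells` carry `rowIndex`, `columnIndex`, and `content`. The function names are hypothetical, and the skill's own converter may differ.

```python
def result_to_text(result):
    """Return the concatenated text content of an analyze response."""
    return result.get("analyzeResult", {}).get("content", "")


def table_to_markdown(table):
    """Render one table from an analyze result as a Markdown table.

    Assumes the table shape `rowCount`, `columnCount`, and `cells`
    entries with `rowIndex`, `columnIndex`, and `content`.
    """
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Newlines and pipes would break the Markdown row, so neutralise them.
        text = cell.get("content", "").replace("\n", " ").replace("|", "\\|")
        grid[cell["rowIndex"]][cell["columnIndex"]] = text
    if not grid:
        return ""
    lines = ["| " + " | ".join(grid[0]) + " |",
             "| " + " | ".join("---" for _ in grid[0]) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)
```

The first row is treated as the header here, which is a simplification; merged cells (row or column spans) are not handled.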

Features

Handwriting Recognition: Extracts handwritten text alongside printed text
CJK Support: Full support for Chinese, Japanese, Korean characters
Table Extraction: Preserves table structure (use layout model)
Multi-page Processing: Handles documents with multiple pages
Concurrent Processing: Batch script supports parallel processing
URL Input: Process documents directly from URLs

Environment Variables

Variable	Required	Description
`AZURE_DOC_INTEL_ENDPOINT`	Yes	Azure Document Intelligence endpoint URL
`AZURE_DOC_INTEL_KEY`	Yes	API subscription key

Error Handling

Invalid credentials: Check endpoint URL and API key
Unsupported format: Ensure file extension matches supported types
Timeout: Large documents may need longer processing (max 300s)
Rate limiting: Reduce concurrent workers for batch processing
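
The timeout and rate-limiting notes above both relate to how the analyze operation is polled. A minimal sketch of a polling loop with a deadline and a growing delay follows. `poll_until_done` and its `fetch_status` callback are hypothetical; the status strings follow the analyze operation response, and the 300-second default mirrors the limit mentioned above. How the skill's scripts actually poll is not shown in this listing.

```python
import time


def poll_until_done(fetch_status, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until it reports a terminal state.

    `fetch_status` is a hypothetical callable returning a dict with a
    "status" key such as "notStarted", "running", "succeeded" or "failed".
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError(f"analysis failed: {body.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(interval)
        interval = min(interval * 1.5, 30.0)  # back off to ease rate limits
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.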

Examples

Extract text from scanned PDF

python scripts/ocr_extract.py scanned_contract.pdf --model prebuilt-read

Process invoices with structured output

python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json --output invoice_data.json

Batch process with layout analysis

python scripts/batch_ocr.py ./reports/ --model prebuilt-layout --format markdown --workers 4

Extract specific pages from large document

python scripts/ocr_extract.py large_doc.pdf --pages 1,3-5,10 --format text

Usage Guidance

This skill's code implements Azure Document Intelligence OCR and does not perform any obviously malicious actions, but the package metadata is inconsistent and incomplete. Before installing or using it:

- Expect to set two sensitive environment variables: AZURE_DOC_INTEL_ENDPOINT (your Azure resource URL) and AZURE_DOC_INTEL_KEY (your subscription key). The registry did not declare these, so verify that you set them yourself and never paste your key into untrusted UIs.
- Verify that the 'requests' Python package and a compatible Python interpreter are available; the skill provides no install steps for dependencies.
- Confirm that the endpoint you configure is your Azure resource (check domain and subscription), because documents and extracted data will be sent to that endpoint. Do not point the endpoint to unknown third-party domains.
- If you need stricter governance, ask the publisher to update the registry metadata to declare required env vars and dependencies, and to add an install spec (e.g., pip requirements). Also consider using a dedicated Azure key with minimal permissions, and rotate it after testing.
- If you are processing sensitive documents, ensure that sending them to Azure complies with your data handling and privacy policies.

Capability Analysis

Type: OpenClaw Skill
Name: azure-doc-ocr
Version: 1.0.0

The skill is classified as suspicious due to several critical vulnerabilities in `scripts/ocr_extract.py` that could be exploited via prompt injection against the OpenClaw agent. Specifically, the script directly uses `file_path` and `--output` arguments from `argparse` without sanitization, enabling potential arbitrary file read/write. Additionally, the `--url` argument allows fetching documents from arbitrary URLs, posing a Server-Side Request Forgery (SSRF) risk. While the code's stated purpose is benign (OCR), these capabilities, if misused by a compromised agent, could lead to unauthorized data access, modification, or network reconnaissance.
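
The path and URL risks described above can be reduced by validating arguments before they are used. The sketch below illustrates one possible mitigation; it is not present in the skill. `is_inside` and `is_public_https_url` are hypothetical helpers, and the URL check only covers IP literals and `localhost`, so it is a partial defence.

```python
import ipaddress
from pathlib import Path
from urllib.parse import urlparse


def is_inside(base_dir, candidate):
    """True if `candidate` resolves to a location under `base_dir`."""
    base = Path(base_dir).resolve()
    target = (base / candidate).resolve()
    return target == base or base in target.parents


def is_public_https_url(url):
    """Reject non-HTTPS URLs and literal loopback/private/link-local hosts.

    Limitation: a hostname that resolves to an internal address via DNS
    still passes; a fuller check would resolve and re-validate the IP.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme != "https" or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal; see limitation above
    return not (ip.is_private or ip.is_loopback
                or ip.is_link_local or ip.is_reserved)
```

A wrapper could call `is_inside` on both the input file and the `--output` path against an agreed working directory, and refuse `--url` values that fail `is_public_https_url`.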

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md align with the described purpose (submitting docs to Azure Document Intelligence and returning results). However, the registry metadata declares no required environment variables or primary credential, while the scripts and SKILL.md require AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY. The package also assumes a Python runtime and the 'requests' library but lists no dependencies. These omissions leave the metadata inconsistent with what the code actually needs.

ℹ Instruction Scope

Runtime instructions tell the agent to read two Azure env vars and to call the Azure Document Intelligence REST API; the included scripts only read provided files/URLs and those env vars and poll the Azure operation endpoint. The instructions do not attempt to read unrelated system files or unknown environment variables and do not post data to third-party endpoints other than the configured Azure endpoint. Still, the instructions require credentials that the registry metadata did not declare.
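
For orientation, the REST call described above can be sketched as a function that only builds the request, without sending it. This is an illustration under assumptions: the route and `api-version` shown are those of the v4.0 GA REST API, and the skill's scripts may target a different version (older resources use a `/formrecognizer/documentModels/...` route with `2023-07-31`). `build_analyze_request` is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumption: v4.0 GA route and version string; the skill may use another.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint, key, model="prebuilt-read",
                          url_source=None, pages=None):
    """Return (url, headers, json_body) for an analyze-from-URL call.

    For a local file, the same URL is used but the body is the raw bytes
    with Content-Type "application/octet-stream" instead of JSON.
    """
    query = {"api-version": API_VERSION}
    if pages:
        query["pages"] = pages  # e.g. "1-3,5"
    url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
           f"{model}:analyze?{urlencode(query)}")
    headers = {"Ocp-Apim-Subscription-Key": key,
               "Content-Type": "application/json"}
    body = {"urlSource": url_source} if url_source else None
    return url, headers, body
```

The service answers a successful submission with HTTP 202 and an `Operation-Location` header; that URL is the operation endpoint that is then polled with the same key header until the status is terminal.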

⚠ Install Mechanism

There is no install specification despite shipping Python scripts. The package assumes invocation via 'python' and the presence of the 'requests' package; there are no pip/install instructions or declared dependencies. This mismatch may cause runtime failures and is an omission that reduces transparency about what's needed and what will be present on disk.
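
Because no dependencies are declared, a quick pre-flight check can confirm that the interpreter has what the scripts import. The sketch below is an assumption about what such a check could look like; `missing_modules` is a hypothetical helper, and `requests` is the only third-party package this listing mentions.

```python
import importlib.util


def missing_modules(names=("requests",)):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If the returned list is non-empty, installing the named packages (for example with pip) before invoking the skill avoids an import error at runtime.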

⚠ Credentials

The two environment values used (AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY) are proportional to the skill's purpose (authenticating to Azure). However the registry metadata fails to declare them (primaryEnv is unset). The API key is sensitive — the skill legitimately needs it, but users should be aware the key will be sent in requests to whichever endpoint they configure.

✓ Persistence & Privilege

The skill does not request elevated persistence (always is false) and does not modify other skills or system-wide configuration. It runs as scripts invoked by the user/agent and performs network calls only to the configured Azure endpoint.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install azure-doc-ocr
After installation, invoke the skill by name or use /azure-doc-ocr
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: Azure Document Intelligence OCR with single-file and batch processing, supports PDF/images, CJK, handwriting, tables, invoices

Metadata

Slug azure-doc-ocr

Version 1.0.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Azure Document OCR?

Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document... It is an AI Agent Skill for Claude Code / OpenClaw, with 604 downloads so far.

How do I install Azure Document OCR?

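Run "/install azure-doc-ocr" in the OpenClaw or Claude Code chat to install it. Before first use, set AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY, and make sure a Python interpreter with the 'requests' package is available, since the skill does not install its dependencies.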

Is Azure Document OCR free?

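The skill itself can be downloaded and installed at no cost, although the listing does not state a license. Calls to Azure Document Intelligence are made with your own Azure key and are subject to your Azure subscription's pricing.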

Which platforms does Azure Document OCR support?

Azure Document OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Azure Document OCR?

It is built and maintained by HONGMIN LI (@li-hongmin); the current version is v1.0.0.
