Azure Document OCR
/install azure-doc-ocr
Azure Document Intelligence OCR
Extract text and structured data from documents using Azure Document Intelligence REST API.
Quick Start
1. Environment Setup
Set your Azure Document Intelligence credentials:
export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
2. Single File OCR
# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf
# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown
# Process invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json
# OCR from URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"
# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt
# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
3. Batch Processing
# Process all documents in a folder
python scripts/batch_ocr.py ./documents/
# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown
# Use layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8
# Filter specific extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
Model Selection Guide
| Document Type | Recommended Model | Use Case |
|---|---|---|
| General text | prebuilt-read |
Pure text extraction, any document |
| Structured docs | prebuilt-layout |
Tables, forms, paragraphs, figures |
| Invoices | prebuilt-invoice |
Vendor info, line items, totals |
| Receipts | prebuilt-receipt |
Merchant, items, totals, dates |
| IDs/Passports | prebuilt-idDocument |
Identity documents |
| Business cards | prebuilt-businessCard |
Contact information |
| W-2 forms | prebuilt-tax.us.w2 |
US tax documents |
| Insurance cards | prebuilt-healthInsuranceCard.us |
Health insurance info |
See references/models.md for detailed model documentation.
Supported Input Formats
- PDF:
.pdf(including scanned PDFs) - Images:
.png,.jpg,.jpeg,.tiff,.bmp - URLs: Direct links to documents
Output Formats
- text: Plain text concatenation of all extracted content
- markdown: Structured output with headers and tables (best with layout model)
- json: Raw API response with full extraction details
Features
- Handwriting Recognition: Extracts handwritten text alongside printed text
- CJK Support: Full support for Chinese, Japanese, Korean characters
- Table Extraction: Preserves table structure (use layout model)
- Multi-page Processing: Handles documents with multiple pages
- Concurrent Processing: Batch script supports parallel processing
- URL Input: Process documents directly from URLs
Environment Variables
| Variable | Required | Description |
|---|---|---|
AZURE_DOC_INTEL_ENDPOINT |
Yes | Azure Document Intelligence endpoint URL |
AZURE_DOC_INTEL_KEY |
Yes | API subscription key |
Error Handling
- Invalid credentials: Check endpoint URL and API key
- Unsupported format: Ensure file extension matches supported types
- Timeout: Large documents may need longer processing (max 300s)
- Rate limiting: Reduce concurrent workers for batch processing
Examples
Extract text from scanned PDF
python scripts/ocr_extract.py scanned_contract.pdf --model prebuilt-read
Process invoices with structured output
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json --output invoice_data.json
Batch process with layout analysis
python scripts/batch_ocr.py ./reports/ --model prebuilt-layout --format markdown --workers 4
Extract specific pages from large document
python scripts/ocr_extract.py large_doc.pdf --pages 1,3-5,10 --format text
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install azure-doc-ocr - After installation, invoke the skill by name or use
/azure-doc-ocr - Provide required inputs per the skill's parameter spec and get structured output
What is Azure Document OCR?
Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document... It is an AI Agent Skill for Claude Code / OpenClaw, with 604 downloads so far.
How do I install Azure Document OCR?
Run "/install azure-doc-ocr" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Azure Document OCR free?
Yes, Azure Document OCR is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Azure Document OCR support?
Azure Document OCR is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Azure Document OCR?
It is built and maintained by HONGMIN LI (@li-hongmin); the current version is v1.0.0.