/install paddleocr-doc-parsing-radeon
PaddleOCR Document Parsing — AMD Radeon Cloud Edition
FREE PaddleOCR-VL 1.5 document parsing, powered by AMD Radeon Cloud. No API key required.
This skill extracts structured Markdown/JSON from PDFs and document images using PaddleOCR-VL 1.5 running on AMD Radeon Cloud — completely free, with no authentication or token needed.
Security Notice
- This skill does not read or transmit any API keys or tokens. The Radeon Cloud endpoint is free and requires no authentication.
- By default, results are printed to stdout. Use --output to save to a file when needed. No temporary files are created unless you explicitly choose to save.
- Use PADDLEOCR_DOC_PARSING_API_URL to configure a custom endpoint if needed.
When to Use This Skill
Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML description above — use that field for discovery and routing.
Use this skill for:
- Documents with tables (invoices, financial reports, spreadsheets)
- Documents with mathematical formulas (academic papers, scientific documents)
- Documents with charts and diagrams
- Multi-column layouts (newspapers, magazines, brochures)
- Complex document structures requiring layout analysis
- Any document requiring structured understanding
Do not use for:
- Simple text-only extraction
- Quick OCR tasks where speed is critical
- Screenshots or simple images with clear text
Installation
Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:
uv run scripts/layout_caller.py --help
How to Use This Skill
Working directory: All uv run scripts/... commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
Basic Workflow
- Identify the input source:
  - User provides URL: Use the --file-url parameter
  - User provides local file path: Use the --file-path parameter
- Execute document parsing:
  uv run scripts/layout_caller.py --file-url "URL provided by user" --pretty
  Or for local files:
  uv run scripts/layout_caller.py --file-path "file path" --pretty
  Optional: explicitly set file type:
  uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
  - --file-type 0: PDF
  - --file-type 1: image
  - If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.
Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
Default behavior: output JSON to stdout:
  - By default, the script prints JSON to stdout; no files are created on disk
  - Use --output FILE to save the result to a specific file path
  - This avoids leaving sensitive document data in temp directories
- Parse JSON response:
  - Check the ok field: true means success, false means error
  - The output contains complete document data: text, tables, formulas (LaTeX), figures, seals, headers/footers, and reading order
  - Use the appropriate field based on what the user needs:
    - text: full document text across all pages
    - result.result.layoutParsingResults[n].markdown.text: page-level markdown
    - result.result.layoutParsingResults[n].prunedResult: structured layout data with positions and confidence
  - Handle errors: If ok is false, display error.message
- Present results to user:
  - Display content based on what the user requested (see "Complete Output Display" below)
  - If the content is empty, the document may contain no extractable text
  - In save mode, always tell the user the saved file path and that full raw JSON is available there
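The extension-based type detection described in the "Execute document parsing" step can be sketched as follows. This is a minimal illustration of the documented rule, not the script's actual code: guess_file_type is a hypothetical helper name, and only the extension list and the 0/1 values come from the text above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions listed in the workflow above; 0 and 1 are the --file-type values.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def guess_file_type(path_or_url: str) -> Optional[int]:
    """Return 0 for PDF, 1 for image, or None when the extension is not recognized.

    Hypothetical helper: it mirrors the documented rule only.
    """
    # For URLs, look only at the path component so query strings are ignored.
    path = urlparse(path_or_url).path or path_or_url
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return 0
    if suffix in IMAGE_EXTENSIONS:
        return 1
    return None
```

When the helper returns None for a local file, the documented behavior is to pass --file-type explicitly.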
What to Do After Parsing
Common next steps once you have the structured output:
- Save as Markdown: Write the text field to a .md file; tables, headings, and formulas are preserved
- Extract specific tables: Navigate result.result.layoutParsingResults[n].prunedResult to access individual layout elements with position and confidence data
- Feed to RAG / search pipeline: The text field is structured markdown, ready for chunking and indexing
- Poor results: See "Tips for Better Results" below before retrying
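As a rough sketch of the first and third steps above, the snippet below writes the text field to a Markdown file and splits it into heading-delimited chunks for indexing. The function names are hypothetical, and the chunking assumes the markdown uses ATX-style headings (lines starting with #), which this document does not confirm.

```python
import re
from pathlib import Path


def save_markdown(envelope: dict, out_path: str) -> Path:
    """Write the top-level `text` field to a Markdown file and return its path."""
    path = Path(out_path)
    path.write_text(envelope.get("text") or "", encoding="utf-8")
    return path


def chunk_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading line.

    Assumption: headings look like "# Title" or "## Section".
    """
    chunks, current = [], []
    for line in markdown.splitlines():
        # Flush the running chunk whenever a new heading begins.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Heading-based chunks keep each table or formula together with the section it belongs to, which is usually a reasonable starting point before applying a size limit.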
Complete Output Display
Display the COMPLETE extracted content based on what the user asked for. The parsed output is only useful if the user receives all of it — truncation silently drops data.
- If user asks for "all text", show the entire text field
- If user asks for "tables", show ALL tables in the document
- If user asks for "main content", filter out headers/footers but show ALL body text
- Do not truncate with "..." unless content is excessively long (>10,000 chars)
- Do not say "Here's a preview" when user expects complete output
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
Understanding the Output
The script returns an envelope with ok, text, result, and error. Use text for the full document content; navigate result.result.layoutParsingResults[n] for per-page structured data.
For the complete schema and field-level details, see references/output_schema.md.
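A minimal sketch of consuming the envelope, assuming the script's JSON output has already been captured as a string. Only the ok, text, and error.message fields named above are used; extract_text is a hypothetical helper name, and the sample payload in the usage lines is made up for illustration.

```python
import json


def extract_text(raw_json: str) -> str:
    """Return the full document text, or raise with the reported error message."""
    envelope = json.loads(raw_json)
    if not envelope.get("ok"):
        # On failure the envelope carries an error object with a message.
        error = envelope.get("error") or {}
        raise RuntimeError(error.get("message", "unknown error"))
    return envelope.get("text") or ""


# Illustrative payload only; real output contains the full result object.
sample = '{"ok": true, "text": "# Title", "result": {}, "error": null}'
print(extract_text(sample))
```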
Usage Examples
Example 1: Extract Full Document Text (stdout)
uv run scripts/layout_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Then use:
- Top-level text for quick full-text output
- result.result.layoutParsingResults[n].markdown when page-level output is needed
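For page-level output, the path result.result.layoutParsingResults[n].markdown.text can be walked as in the sketch below. It assumes the envelope has already been parsed into a dict; page_markdown is a hypothetical name, and missing keys are treated as empty rather than as errors.

```python
def page_markdown(envelope: dict) -> list[str]:
    """Collect markdown.text from each entry of result.result.layoutParsingResults."""
    outer = envelope.get("result") or {}
    inner = outer.get("result") or {}
    pages = inner.get("layoutParsingResults") or []
    # One string per page, in the order the service returned them.
    return [(page.get("markdown") or {}).get("text", "") for page in pages]
```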
Example 2: Extract Structured Page Data
uv run scripts/layout_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Then use:
- result.result.layoutParsingResults[n].prunedResult for structured parsing data (layout/content/confidence)
Example 3: Save result to a file
uv run scripts/layout_caller.py \
--file-url "URL" \
--output "./result.json" \
--pretty
By default the script prints JSON to stdout. Use --output to save to a file.
Configuration
No configuration required! This skill uses the AMD Radeon Cloud free PaddleOCR-VL 1.5 endpoint by default. It works out of the box — no API key, no token, no sign-up needed.
Optional overrides (via environment variables):
- PADDLEOCR_DOC_PARSING_API_URL: Custom API endpoint URL (overrides the default Radeon Cloud URL)
- PADDLEOCR_DOC_PARSING_TIMEOUT: Request timeout in seconds (default: 600)
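How a caller might read these two overrides can be sketched as below. This is an illustration, not the script's own logic: resolve_settings is a hypothetical name, the default endpoint URL is left as None because this document does not state it, and falling back to 600 on a malformed timeout is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_TIMEOUT_SECONDS = 600.0  # documented default


def resolve_settings(env: Mapping[str, str] = os.environ) -> Tuple[Optional[str], float]:
    """Return (custom_api_url_or_None, timeout_seconds) from environment variables."""
    # None means "no override"; the built-in Radeon Cloud URL would apply.
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL") or None
    raw_timeout = env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "")
    try:
        timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    except ValueError:
        # Assumption: ignore malformed values and keep the documented default.
        timeout = DEFAULT_TIMEOUT_SECONDS
    return url, timeout
```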
Handling Large Files
For PDFs, the maximum is 100 pages per request.
Optimize Large Images Before Parsing
For large image files, compress before uploading — this reduces upload time and can improve processing stability:
uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty
--quality controls JPEG/WebP lossy compression (1-100, default 85); it has no effect on PNG output. Use --target-size (in MB, default 20) to set the max file size — the script iteratively downscales until the target is met.
Use URL for Large Local Files (Recommended)
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"
Process Specific Pages (PDF Only)
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"
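For PDFs above the 100-page limit, one possible approach is to compute page ranges of at most 100 pages and pass each one to split_pdf.py via --pages. The sketch below only builds the range strings; page_batches is a hypothetical helper, and the total page count is assumed to be known from elsewhere.

```python
def page_batches(total_pages: int, batch_size: int = 100) -> list[str]:
    """Return --pages range strings such as "1-100" covering pages 1..total_pages."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        # A single trailing page is written as "101" rather than "101-101".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(page_batches(230))  # ['1-100', '101-200', '201-230']
```

Each resulting string can be used as the --pages argument shown above, producing one smaller PDF per request.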
Error Handling
All errors return JSON with ok: false. Show the error message and stop — do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:
API service error (5xx) — error.message contains "API service error"
- Temporary server issue; retry after a moment
Rate limit exceeded (429) — error.message contains "API rate limit exceeded"
- Wait and retry
Unsupported format — error.message contains "Unsupported file format"
- File format not supported, convert to PDF/PNG/JPG
No content detected:
- text field is empty
- Document may be blank, image-only, or contain no extractable text
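The cases above can be summarized as a small decision function. This is a sketch under assumptions: next_action and its return labels are hypothetical, and matching is done on the error.message substrings quoted above rather than on error.code, whose values this document does not list.

```python
def next_action(envelope: dict) -> str:
    """Map a parsed envelope to 'ok', 'empty', 'retry', 'convert', or 'stop'."""
    if envelope.get("ok"):
        # Success with an empty text field is the "no content detected" case.
        return "ok" if (envelope.get("text") or "").strip() else "empty"
    message = (envelope.get("error") or {}).get("message", "")
    if "API service error" in message or "API rate limit exceeded" in message:
        return "retry"  # temporary condition: wait, then retry
    if "Unsupported file format" in message:
        return "convert"  # convert to PDF/PNG/JPG first
    return "stop"  # show error.message and stop
```

In every non-ok case the error message should still be shown to the user; the label only indicates whether a retry or conversion is worth suggesting.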
Tips for Better Results
If parsing quality is poor:
- Large or high-resolution images: Compress with optimize_file.py before parsing; oversized inputs can degrade layout detection:
  uv run scripts/optimize_file.py input.png optimized.jpg --quality 85
- Check confidence: result.result.layoutParsingResults[n].prunedResult includes confidence scores per layout element; low values indicate regions worth reviewing
Reference Documentation
references/output_schema.md— Full output schema, field descriptions, and command examples
Note: This skill uses PaddleOCR-VL 1.5 on AMD Radeon Cloud. Model version and capabilities are determined by the AMD Radeon Cloud endpoint.
Testing the Skill
To verify the skill is working properly:
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample document URL.
About
This skill is a fork of paddleocr-doc-parsing, modified for AMD Radeon Cloud which provides free PaddleOCR-VL 1.5 document parsing inference. No API key or registration is required.
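- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install paddleocr-doc-parsing-radeon
- After installation, call the skill by name or trigger it with /paddleocr-doc-parsing-radeon
- Provide the required inputs according to the skill's parameter description to get structured output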
What is Paddleocr Doc Parsing Radeon?
FREE document parsing powered by AMD Radeon Cloud running PaddleOCR-VL 1.5. Extract structured Markdown/JSON from PDFs and document images: tables with cell... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 95 downloads to date.
How do I install Paddleocr Doc Parsing Radeon?
Run the command "/install paddleocr-doc-parsing-radeon" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is Paddleocr Doc Parsing Radeon free?
Yes. Paddleocr Doc Parsing Radeon is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does Paddleocr Doc Parsing Radeon support?
Paddleocr Doc Parsing Radeon is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Paddleocr Doc Parsing Radeon?
It is developed and maintained by AIwork4me (@aiwork4me); the current version is v1.1.1.