← Back to Skills Marketplace
aiwork4me

Paddleocr Doc Parsing Radeon

by AIwork4me · GitHub ↗ · v1.1.1 · MIT-0
cross-platform ⚠ suspicious
95
Downloads
0
Stars
0
Active Installs
5
Versions
Install in OpenClaw
/install paddleocr-doc-parsing-radeon
Description
FREE document parsing powered by AMD Radeon Cloud running PaddleOCR-VL 1.5. Extract structured Markdown/JSON from PDFs and document images — tables with cell...
README (SKILL.md)

PaddleOCR Document Parsing — AMD Radeon Cloud Edition

FREE PaddleOCR-VL 1.5 document parsing, powered by AMD Radeon Cloud. No API key required.

This skill extracts structured Markdown/JSON from PDFs and document images using PaddleOCR-VL 1.5 running on AMD Radeon Cloud — completely free, with no authentication or token needed.

Security Notice

  • This skill does not read or transmit any API keys or tokens. The Radeon Cloud endpoint is free and requires no authentication.
  • By default, results are printed to stdout. Use --output to save to a file when needed. No temporary files are created unless you explicitly choose to save.
  • Use PADDLEOCR_DOC_PARSING_API_URL to configure a custom endpoint if needed.

When to Use This Skill

Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML description above — use that field for discovery and routing.

Use this skill for:

  • Documents with tables (invoices, financial reports, spreadsheets)
  • Documents with mathematical formulas (academic papers, scientific documents)
  • Documents with charts and diagrams
  • Multi-column layouts (newspapers, magazines, brochures)
  • Complex document structures requiring layout analysis
  • Any document requiring structured understanding

Do not use for:

  • Simple text-only extraction
  • Quick OCR tasks where speed is critical
  • Screenshots or simple images with clear text

Installation

Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:

uv run scripts/layout_caller.py --help

How to Use This Skill

Working directory: All uv run scripts/... commands below should be run from this skill's root directory (the directory containing this SKILL.md file).

Basic Workflow

  1. Identify the input source:

    • User provides URL: Use the --file-url parameter
    • User provides local file path: Use the --file-path parameter
  2. Execute document parsing:

    uv run scripts/layout_caller.py --file-url "URL provided by user" --pretty
    

    Or for local files:

    uv run scripts/layout_caller.py --file-path "file path" --pretty
    

    Optional: explicitly set file type:

    uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
    
    • --file-type 0: PDF
    • --file-type 1: image
    • If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.

    Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.

    Default behavior: output JSON to stdout:

    • By default, the script prints JSON to stdout — no files are created on disk
    • Use --output FILE to save the result to a specific file path
    • This avoids leaving sensitive document data in temp directories
  3. Parse JSON response:

    • Check the ok field: true means success, false means error
    • The output contains complete document data: text, tables, formulas (LaTeX), figures, seals, headers/footers, and reading order
    • Use the appropriate field based on what the user needs:
      • text — full document text across all pages
      • result.result.layoutParsingResults[n].markdown.text — page-level markdown
      • result.result.layoutParsingResults[n].prunedResult — structured layout data with positions and confidence
    • Handle errors: If ok is false, display error.message
  4. Present results to user:

    • Display content based on what the user requested (see "Complete Output Display" below)
    • If the content is empty, the document may contain no extractable text
    • In save mode, always tell the user the saved file path and that full raw JSON is available there

What to Do After Parsing

Common next steps once you have the structured output:

  • Save as Markdown: Write the text field to a .md file — tables, headings, and formulas are preserved
  • Extract specific tables: Navigate result.result.layoutParsingResults[n].prunedResult to access individual layout elements with position and confidence data
  • Feed to RAG / search pipeline: The text field is structured markdown, ready for chunking and indexing
  • Poor results: See "Tips for Better Results" below before retrying

Complete Output Display

Display the COMPLETE extracted content based on what the user asked for. The parsed output is only useful if the user receives all of it — truncation silently drops data.

  • If user asks for "all text", show the entire text field
  • If user asks for "tables", show ALL tables in the document
  • If user asks for "main content", filter out headers/footers but show ALL body text
  • Do not truncate with "..." unless content is excessively long (>10,000 chars)
  • Do not say "Here's a preview" when user expects complete output

Example - Correct:

User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:

[Display entire text field or concatenated regions in reading order]

Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)

Example - Incorrect:

User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"

Understanding the Output

The script returns an envelope with ok, text, result, and error. Use text for the full document content; navigate result.result.layoutParsingResults[n] for per-page structured data.

For the complete schema and field-level details, see references/output_schema.md.

Usage Examples

Example 1: Extract Full Document Text (stdout)

uv run scripts/layout_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty

Then use:

  • Top-level text for quick full-text output
  • result.result.layoutParsingResults[n].markdown when page-level output is needed

Example 2: Extract Structured Page Data

uv run scripts/layout_caller.py \
  --file-path "./financial_report.pdf" \
  --pretty

Then use:

  • result.result.layoutParsingResults[n].prunedResult for structured parsing data (layout/content/confidence)

Example 3: Save result to a file

uv run scripts/layout_caller.py \
  --file-url "URL" \
  --output "./result.json" \
  --pretty

By default the script prints JSON to stdout. Use --output to save to a file.

Configuration

No configuration required! This skill uses the AMD Radeon Cloud free PaddleOCR-VL 1.5 endpoint by default. It works out of the box — no API key, no token, no sign-up needed.

Optional overrides (via environment variables):

  • PADDLEOCR_DOC_PARSING_API_URL — Custom API endpoint URL (overrides the default Radeon Cloud URL)
  • PADDLEOCR_DOC_PARSING_TIMEOUT — Request timeout in seconds (default: 600)

Handling Large Files

For PDFs, the maximum is 100 pages per request.

Optimize Large Images Before Parsing

For large image files, compress before uploading — this reduces upload time and can improve processing stability:

uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty

--quality controls JPEG/WebP lossy compression (1-100, default 85); it has no effect on PNG output. Use --target-size (in MB, default 20) to set the max file size — the script iteratively downscales until the target is met.

Use URL for Large Local Files (Recommended)

For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:

uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"

Process Specific Pages (PDF Only)

If you only need certain pages from a large PDF, extract them first:

# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"

# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"

# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"

Error Handling

All errors return JSON with ok: false. Show the error message and stop — do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:

API service error (5xx)error.message contains "API service error"

  • Temporary server issue; retry after a moment

Rate limit exceeded (429)error.message contains "API rate limit exceeded"

  • Wait and retry

Unsupported formaterror.message contains "Unsupported file format"

  • File format not supported, convert to PDF/PNG/JPG

No content detected:

  • text field is empty
  • Document may be blank, image-only, or contain no extractable text

Tips for Better Results

If parsing quality is poor:

  • Large or high-resolution images: Compress with optimize_file.py before parsing — oversized inputs can degrade layout detection:
    uv run scripts/optimize_file.py input.png optimized.jpg --quality 85
    
  • Check confidence: result.result.layoutParsingResults[n].prunedResult includes confidence scores per layout element — low values indicate regions worth reviewing

Reference Documentation

  • references/output_schema.md — Full output schema, field descriptions, and command examples

Note: This skill uses PaddleOCR-VL 1.5 on AMD Radeon Cloud. Model version and capabilities are determined by the AMD Radeon Cloud endpoint.

Testing the Skill

To verify the skill is working properly:

uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."

The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample document URL.

About

This skill is a fork of paddleocr-doc-parsing, modified for AMD Radeon Cloud which provides free PaddleOCR-VL 1.5 document parsing inference. No API key or registration is required.

Usage Guidance
Install only if you are comfortable uploading selected documents to an external cloud parser. Avoid sensitive PDFs or images with the default endpoint, or configure a trusted HTTPS endpoint before use.
Capability Analysis
Type: OpenClaw Skill Name: paddleocr-doc-parsing-radeon Version: 1.1.1 The skill sends document content to a hardcoded IP address (http://134.199.132.159/layout-parsing) over unencrypted HTTP, as seen in scripts/lib.py. While this aligns with the stated purpose of providing a free OCR service, the use of a raw IP and lack of TLS encryption for a service handling potentially sensitive user documents is a significant security risk. No evidence of intentional malice, such as exfiltrating environment variables or establishing persistence, was found in the code logic.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
Cloud OCR is coherent with the stated purpose, but the implementation sends full local or URL-provided document contents to a default raw-IP HTTP endpoint.
Instruction Scope
The instructions are mostly scoped to user-provided URLs or file paths, but users should ensure the agent only processes documents they explicitly intend to upload.
Install Mechanism
There is no separate install spec; the skill uses uv with inline, pinned Python dependencies, which is purpose-aligned but still requires running local scripts and resolving packages.
Credentials
Internet access is expected for cloud parsing, but the default endpoint is an unauthenticated HTTP URL rather than a clearly identified HTTPS provider endpoint.
Persistence & Privilege
No background service, credential storage, or autonomous persistence is shown; output files are written only when an output path is supplied.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install paddleocr-doc-parsing-radeon
  3. After installation, invoke the skill by name or use /paddleocr-doc-parsing-radeon
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.1
- Security notice updated: removed warnings about default HTTP endpoint. - Clarified API key and endpoint configuration section for accuracy and simplicity. - Added a new "Configuration" section noting no setup is required for default use. - No other feature, compatibility, or usage changes.
v1.1.0
paddleocr-doc-parsing-radeon v1.1.0 - Default output now prints to stdout for improved security — result files are only created if --output is specified. - Removed temp file usage unless explicitly requested. - The skill no longer reads/transmits any tokens; authentication is fully disabled for Radeon Cloud endpoint. - Updated documentation to reflect new output defaults and security improvements. - Minor code refactoring and documentation cleanup for clarity.
v1.0.2
paddleocr-doc-parsing-radeon 1.0.2 - Added explicit security note emphasizing that using --stdout skips saving sensitive content to disk, and encourages its use for confidential documents - Clarified that --stdout prevents any file from being written, whereas save mode always prints the location of the temp file - Updated documentation for improved instructions and security best practices - No core functional changes to parsing logic or parameters
v1.0.1
Version 1.0.1 - Added a security notice to the documentation, warning users that the default AMD Radeon Cloud endpoint uses HTTP (not HTTPS), and explaining handling of access tokens and temporary result files. - Clarified recommended practices for handling confidential documents, encouraging use of HTTPS endpoints and secure result output management. - No changes were made to core functionality or API usage. - Improved documentation for user safety and configuration transparency.
v1.0.0
Initial release of PaddleOCR Document Parsing — AMD Radeon Cloud Edition: - Enables FREE document parsing (no API key needed) using PaddleOCR-VL 1.5 on AMD Radeon Cloud. - Extracts structured Markdown/JSON from PDFs and document images, including tables (cell-level), formulas (LaTeX), multi-column layouts, charts, and correct reading order. - Supports both file URLs and local file paths with customizable output options (JSON file, pretty output, or stdout). - Provides full usage instructions, workflow, output schema details, and practical usage examples. - Ideal for complex documents needing advanced layout parsing; not intended for simple or quick OCR tasks. - Compatible with Python 3.9+, requires uv and internet access.
Metadata
Slug paddleocr-doc-parsing-radeon
Version 1.1.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 5
Frequently Asked Questions

What is Paddleocr Doc Parsing Radeon?

FREE document parsing powered by AMD Radeon Cloud running PaddleOCR-VL 1.5. Extract structured Markdown/JSON from PDFs and document images — tables with cell... It is an AI Agent Skill for Claude Code / OpenClaw, with 95 downloads so far.

How do I install Paddleocr Doc Parsing Radeon?

Run "/install paddleocr-doc-parsing-radeon" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Paddleocr Doc Parsing Radeon free?

Yes, Paddleocr Doc Parsing Radeon is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Paddleocr Doc Parsing Radeon support?

Paddleocr Doc Parsing Radeon is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Paddleocr Doc Parsing Radeon?

It is built and maintained by AIwork4me (@aiwork4me); the current version is v1.1.1.

💬 Comments