
LiteParse Document Parser

by ricanwarfare · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

112 Downloads

Install in OpenClaw

/install liteparse-docs

Description

Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...

README (SKILL.md)

LiteParse

Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.

Installation

Install via Homebrew (skip if already installed):

brew install llamaindex-liteparse

Verify:

lit --version

Supported Formats

Category	Formats
PDF	`.pdf`
Word	`.doc`, `.docx`, `.docm`, `.odt`, `.rtf`
PowerPoint	`.ppt`, `.pptx`, `.pptm`, `.odp`
Spreadsheets	`.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv`
Images	`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.svg`

Dependencies:

Office documents → LibreOffice (brew install --cask libreoffice)
Images → ImageMagick (brew install imagemagick)
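
As a rough illustration of the dependency rules above, the following shell sketch maps a file extension to the helper it would need. The function name `required_helper` is hypothetical and not part of the `lit` CLI. Treating every Word, PowerPoint, and spreadsheet extension (including `.csv` and `.tsv`) as needing LibreOffice is an assumption read off the format table, not confirmed behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which optional dependency a file would need,
# based only on the Supported Formats table and the Dependencies list.
required_helper() {
  local ext="${1##*.}"
  # Lowercase the extension portably (also works on older bash and on dash).
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    doc|docx|docm|odt|rtf|ppt|pptx|pptm|odp|xls|xlsx|xlsm|ods|csv|tsv)
      echo "libreoffice" ;;   # assumption: all office-style formats go through LibreOffice
    jpg|jpeg|png|gif|bmp|tiff|webp|svg)
      echo "imagemagick" ;;
    pdf)
      echo "none" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `required_helper slides.pptx` prints `libreoffice`, which a script could use to warn about a missing helper before calling `lit parse`.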

Usage

Parse a Single File

# Basic text extraction
lit parse document.pdf

# JSON output with bounding boxes
lit parse document.pdf --format json -o output.json

# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"

# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr

# Higher DPI for better quality
lit parse document.pdf --dpi 300
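
The `--target-pages` value in the example above combines single pages and ranges separated by commas. A small pre-flight check can catch typos before `lit` is invoked. This is a sketch under the assumption that the accepted grammar is exactly comma-separated page numbers and `start-end` ranges, as in the example; `valid_page_spec` and `parse_pages` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumed grammar (inferred from the "1-5,10,15-20" example, not from LiteParse docs):
#   spec := item ("," item)*
#   item := N | N "-" M
valid_page_spec() {
  # Reject anything containing characters other than digits, commas, and hyphens.
  case "$1" in
    *[!0-9,-]*) return 1 ;;
  esac
  printf '%s' "$1" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

# Hypothetical wrapper: refuse to call lit when the spec looks malformed.
parse_pages() {
  local file="$1" spec="$2"
  if ! valid_page_spec "$spec"; then
    echo "invalid page spec: $spec" >&2
    return 2
  fi
  lit parse "$file" --target-pages "$spec"
}
```

The check is deliberately strict (no spaces, no open-ended ranges), so it may reject specs that `lit` itself would accept.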

Batch Parse a Directory

lit batch-parse ./input-directory ./output-directory

# Only PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
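
Where per-file control is wanted (for instance, JSON output for every PDF), a loop over `lit parse` is one alternative to `batch-parse`. The sketch below uses only the `--format json` and `-o` options listed in this document. The helper names `json_output_path` and `parse_dir_to_json` are hypothetical, and naming each output after its input with a `.json` suffix is an arbitrary choice, not a description of what `batch-parse` produces.

```shell
#!/usr/bin/env bash
# Map an input file to an output path: ./in/report.pdf -> OUTDIR/report.json
json_output_path() {
  local input="$1" outdir="$2" base
  base="$(basename "$input")"
  printf '%s/%s.json\n' "${outdir%/}" "${base%.*}"
}

# Parse every PDF directly inside INDIR (non-recursive) into OUTDIR as JSON.
parse_dir_to_json() {
  local indir="$1" outdir="$2" f
  mkdir -p "$outdir"
  for f in "$indir"/*.pdf; do
    [ -e "$f" ] || continue   # the glob matched nothing
    lit parse "$f" --format json -o "$(json_output_path "$f" "$outdir")"
  done
}
```

Calling `parse_dir_to_json ./input ./output` would then write one JSON file per PDF found in `./input`.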

Generate Page Screenshots

# All pages
lit screenshot document.pdf -o ./screenshots

# Specific pages
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots

# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots

Key Options

Option	Description
`--format json`	Structured JSON with bounding boxes
`--format text`	Plain text (default)
`--target-pages "1-5,10"`	Parse specific pages
`--dpi 300`	Higher rendering quality
`--no-ocr`	Disable OCR (faster for text PDFs)
`--ocr-language fra`	Set OCR language
`-o output.json`	Save to file

Config File

For repeated use, create liteparse.config.json:

{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true
}

Use with:

lit parse document.pdf --config liteparse.config.json
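
A config file like the one above can also be generated from a script, for example to vary the DPI per job. This sketch writes the same keys shown in this section and nothing else; `write_liteparse_config` is a hypothetical helper, and it does not check that `lit` accepts the resulting values.

```shell
#!/usr/bin/env bash
# Write a liteparse.config.json containing the keys from the example above.
# Usage: write_liteparse_config PATH [DPI] [OCR_LANGUAGE]
# OCR_LANGUAGE is inserted verbatim, so it is assumed to be a plain language code.
write_liteparse_config() {
  local path="$1" dpi="${2:-150}" lang="${3:-en}"
  case "$dpi" in
    ''|*[!0-9]*|0*)
      echo "dpi must be a positive integer without leading zeros: $dpi" >&2
      return 2 ;;
  esac
  cat > "$path" <<EOF
{
  "ocrLanguage": "$lang",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": $dpi,
  "outputFormat": "json",
  "preciseBoundingBox": true
}
EOF
}
```

For instance, `write_liteparse_config liteparse.config.json 300` produces a file that can be passed with `--config` as shown above.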

When to Use

PDF text extraction — fast local parsing
Document conversion — Office docs to text/JSON
Screenshot generation — for LLM visual analysis
Batch processing — multiple files at once
Offline/air-gapped — no cloud required

Usage Guidance

This skill looks internally consistent for local document parsing, but the package provenance is unclear. Before installing: 1) verify the Homebrew package origin (which tap/repo provides 'llamaindex-liteparse') and inspect its homepage/source; 2) run 'lit --version' and check which binary was installed and where; 3) consider installing in a sandbox or VM if you want to inspect behavior first; 4) ensure LibreOffice and ImageMagick are installed from official sources; and 5) review output files (and any logs), and watch for unexpected network activity or external uploads.
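
The first two checks above can be partly scripted. The sketch below is a minimal example: `locate_binary` is a hypothetical helper that prints where a command resolves on `PATH`, and the commented `brew` commands are standard Homebrew subcommands for inspecting a formula and listing configured taps.

```shell
#!/usr/bin/env bash
# Print the path a command resolves to, or fail if it is not on PATH.
locate_binary() {
  local resolved
  resolved="$(command -v "$1")" || {
    echo "$1: not found on PATH" >&2
    return 1
  }
  printf '%s\n' "$resolved"
}

# Typical manual checks (left commented out so the sketch has no side effects):
#   brew info llamaindex-liteparse   # shows where the formula comes from and its homepage
#   brew tap                         # lists taps configured on this machine
#   locate_binary lit                # shows which lit binary would actually run
#   lit --version
```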

Capability Analysis

Type: OpenClaw Skill
Name: liteparse-docs
Version: 1.0.0

The skill bundle contains documentation (SKILL.md) for a local document parsing tool called LiteParse. It provides standard usage instructions for an AI agent to perform text extraction and batch processing of various file formats (PDF, DOCX, etc.) using the 'lit' CLI. No malicious code, data exfiltration patterns, or prompt injection attempts were identified.

Capability Assessment

✓ Purpose & Capability

Name, description, and runtime instructions all describe local parsing of PDFs, Office docs, spreadsheets, and images. Required helpers (LibreOffice, ImageMagick) are plausible for the stated features (conversion, rendering, OCR). No unrelated resources or credentials are requested.

✓ Instruction Scope

SKILL.md only instructs running local CLI commands (lit parse, batch-parse, screenshot) and using a local config file; it does not ask the agent to read unrelated system files, access secrets, or transmit data to external endpoints. Outputs are written to local files.

ℹ Install Mechanism

No install spec is included in the registry (instruction-only). SKILL.md tells the user to use Homebrew (brew install llamaindex-liteparse, brew install --cask libreoffice, and brew install imagemagick). Using Homebrew is common, but the specific brew package ('llamaindex-liteparse') and the overall lack of source/homepage metadata weaken provenance; the package could come from a third-party tap. Verify the package origin before installing.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. The local liteparse.config.json is reasonable and limited to tool options (OCR language, DPI, etc.).

✓ Persistence & Privilege

Skill is instruction-only, does not request persistent presence, and registry flags are default (always:false). There are no instructions to modify other skills or system-wide agent settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install liteparse-docs
After installation, invoke the skill by name or use /liteparse-docs
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of LiteParse: local document parsing for PDFs, DOCX, PPTX, XLSX, and images.
- Supports text extraction, JSON output with bounding boxes, and page-level screenshots.
- Enables batch processing of directories and selective page parsing.
- No cloud or LLM dependencies; works offline with Homebrew installation.
- Supports popular office and image formats with optional dependencies (LibreOffice, ImageMagick).
- Includes configurable options and support for reusable config files.

Metadata

Slug liteparse-docs

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is LiteParse Document Parser?

LiteParse Document Parser is an AI Agent Skill for Claude Code / OpenClaw, intended for parsing PDFs, DOCX, PPTX, XLSX, or images locally. It supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots... It has 112 downloads so far.

How do I install LiteParse Document Parser?

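Run "/install liteparse-docs" in the OpenClaw or Claude Code chat to install the skill in one step. The skill is instruction-only, so the 'lit' CLI and its optional helpers (LibreOffice, ImageMagick) are installed separately via Homebrew as described in the README.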

Is LiteParse Document Parser free?

Yes, LiteParse Document Parser is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LiteParse Document Parser support?

LiteParse Document Parser is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created LiteParse Document Parser?

It is built and maintained by ricanwarfare (@ricanwarfare); the current version is v1.0.0.
