
LiteParse

Name: LiteParse
Author: alfred-intel-handler-source

by alfred-intel-handler-source · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 198

Install in OpenClaw

/install liteparse

Description

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W...

README (SKILL.md)

LiteParse

Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.

Binary: lit (installed globally via npm)
Docs: https://developers.llamaindex.ai/liteparse/

Quick Reference

# Parse a PDF to text (stdout)
lit parse document.pdf

# Parse to file
lit parse document.pdf -o output.txt

# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json

# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"

# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr

# Batch parse a directory
lit batch-parse ./input-dir ./output-dir

# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots
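
The commands above can be combined into a small wrapper. The following is a minimal sketch, assuming `lit` is on PATH; the function name `parse_pages` is hypothetical and only flags already listed in this section are used.

```shell
# Hypothetical helper: parse selected pages of a PDF into a text file.
# Uses only flags shown in the Quick Reference (--target-pages, -o).
parse_pages() {
    input=$1    # path to the PDF
    pages=$2    # page spec, e.g. "1-5,10"
    output=$3   # destination text file

    if [ ! -f "$input" ]; then
        echo "error: $input not found" >&2
        return 1
    fi
    if ! command -v lit >/dev/null 2>&1; then
        echo "error: lit is not on PATH" >&2
        return 127
    fi
    lit parse "$input" --target-pages "$pages" -o "$output"
}

# Usage (not run here):
#   parse_pages document.pdf "1-5,10" output.txt
```

Checking for the input file and the binary up front gives a clearer error than a failed `lit` invocation would.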

Output Formats

Format	Use case
`text` (default)	Plain text extraction, feeding into prompts
`json`	Structured output with bounding boxes, useful for layout-aware tasks

OCR Behavior

OCR is on by default via Tesseract.js (downloads ~10MB English data on first run)
First run will be slow; subsequent runs use cached data
Use --no-ocr for pure text-layer PDFs (faster, no network needed)
For multi-language documents, pass --ocr-language, e.g. --ocr-language fra+eng
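
The three OCR choices above can be captured in one place. This is a sketch only: `ocr_flags` is a hypothetical helper name, and the mode names `none` and `default` are illustrative labels, not part of the `lit` CLI.

```shell
# Hypothetical helper: print the OCR-related flags for a given mode.
#   none      -> --no-ocr (text-layer PDFs, skips OCR entirely)
#   default   -> no extra flags (OCR stays on with its default settings)
#   <langs>   -> --ocr-language <langs>, e.g. fra+eng
ocr_flags() {
    case "$1" in
        none)        echo "--no-ocr" ;;
        ""|default)  echo "" ;;
        *)           echo "--ocr-language $1" ;;
    esac
}

# Usage (not run here). The unquoted expansion is intentional so that the
# flag and its value are passed as separate arguments:
#   lit parse scan.pdf $(ocr_flags fra+eng) -o scan.txt
```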

Supported File Types

Works natively: PDF

Requires LibreOffice (brew install --cask libreoffice): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv

Requires ImageMagick (brew install imagemagick): .jpg, .png, .gif, .bmp, .tiff, .webp
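
Before parsing a mixed set of files, it can help to know which external tool each one needs. The sketch below encodes the extension lists above; `required_tool` and its return labels are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: report which extra dependency a file type needs,
# based on the extension lists in this section.
required_tool() {
    # Lowercase the extension so REPORT.DOCX and report.docx match.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pdf)                                 echo "none" ;;
        docx|doc|xlsx|xls|pptx|ppt|odt|csv)  echo "libreoffice" ;;
        jpg|png|gif|bmp|tiff|webp)           echo "imagemagick" ;;
        *)                                   echo "unlisted" ;;
    esac
}

# Usage:
#   required_tool slides.pptx   # prints: libreoffice
```

An `unlisted` result only means the extension does not appear in this README, not that `lit` is known to reject it.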

Installation Notes

Installed via npm: npm install -g @llamaindex/liteparse
Brew formula exists (brew tap run-llama/liteparse) but requires up-to-date macOS Command Line Tools (CLT); use npm as the primary install path on this machine
Binary path: /opt/homebrew/bin/lit

Workflow Tips

For VA forms, job description PDFs, military docs: lit parse file.pdf -o /tmp/output.txt then read into context
For scanned PDFs (no text layer): OCR is required; complex layouts may degrade — consider LlamaParse cloud for critical docs
For vision model workflows: use lit screenshot to generate page images, then pass to image tool or similar
For batch jobs: use lit batch-parse — it reuses the PDF engine across files for efficiency
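
When it is not known in advance whether a PDF has a text layer, one possible approach is to try the fast path first and fall back to OCR. This is a sketch under a stated assumption: that `lit parse --no-ocr` on a scanned PDF exits successfully and writes empty or whitespace-only output. That behavior is not confirmed by this README, and `parse_with_fallback` is a hypothetical name.

```shell
# Hypothetical helper: try text-layer extraction first, then OCR.
# ASSUMPTION: --no-ocr on a scanned PDF yields empty/whitespace output.
parse_with_fallback() {
    input=$1
    output=$2

    # First pass: text layer only, which is the faster path.
    lit parse "$input" --no-ocr -o "$output" || return 1

    # If anything other than whitespace was extracted, stop here.
    if grep -q '[^[:space:]]' "$output" 2>/dev/null; then
        return 0
    fi

    # Second pass: default settings, with OCR enabled.
    lit parse "$input" -o "$output"
}

# Usage (not run here):
#   parse_with_fallback file.pdf /tmp/output.txt
```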

Limitations

Complex tables, multi-column layouts, and scanned government forms may produce imperfect output
LlamaParse (cloud) handles the hard cases: https://cloud.llamaindex.ai
Max recommended DPI for screenshots: 300 (higher = slower, larger files)
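
The DPI recommendation can be enforced in scripts with a small guard. This is a minimal sketch; `clamp_dpi` and `MAX_DPI` are hypothetical names, and treating the recommendation as a hard cap is a design choice, not something `lit` itself requires.

```shell
# Recommended ceiling from the Limitations list above.
MAX_DPI=300

# Hypothetical helper: print the requested DPI, capped at MAX_DPI.
clamp_dpi() {
    case "$1" in
        ''|*[!0-9]*)
            echo "error: DPI must be a whole number" >&2
            return 1 ;;
    esac
    if [ "$1" -gt "$MAX_DPI" ]; then
        echo "$MAX_DPI"
    else
        echo "$1"
    fi
}

# Usage (not run here):
#   lit screenshot document.pdf --dpi "$(clamp_dpi 600)" -o ./screenshots
```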

Reference

See references/output-examples.md for sample JSON/text output structure.

Usage Guidance

This skill appears to do what it says: run a local CLI to extract text/screenshots from documents. Before installing:

(1) Confirm the npm package identity and publisher (search the npm registry and repository), because the registry metadata here lacks a homepage.
(2) Be aware that the first install/run will fetch packages and Tesseract language data over the network, so it is not strictly offline until that completes.
(3) npm global installs may run install scripts; review the package contents or run in a sandbox/container if you are unsure.
(4) Installing LibreOffice/ImageMagick via brew is optional but required for some file types and may require macOS-specific tooling.
(5) If provenance is important, ask the publisher for the source repo or checksum and verify the package before global installation.

Overall the skill is coherent with its purpose, but verify the package origin and consider running in an isolated environment if you have security concerns.
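
The provenance checks suggested above could look like the following. This is a sketch, assuming a standard npm CLI; the function names are hypothetical. `npm view` and `--ignore-scripts` are standard npm features, but whether LiteParse works correctly when its install scripts are skipped is not confirmed here.

```shell
PKG="@llamaindex/liteparse"

# Hypothetical helper: show registry metadata for the package, including
# the declared repository, maintainers, tarball URL, and integrity hash.
inspect_package() {
    npm view "$PKG" repository.url maintainers dist.tarball dist.integrity
}

# Hypothetical helper: install globally without running lifecycle
# scripts such as postinstall.
install_without_scripts() {
    npm install -g --ignore-scripts "$PKG"
}

# Usage (requires network access; not run here):
#   inspect_package
#   install_without_scripts
```

Comparing the reported repository and maintainers against the publisher you expect is a quick first check before any global install.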

Capability Analysis

Type: OpenClaw Skill
Name: liteparse
Version: 1.0.0

The liteparse skill is a legitimate tool wrapper for the LiteParse CLI (associated with the LlamaIndex ecosystem) used for local document parsing and OCR. The instructions in SKILL.md and the examples in references/output-examples.md are consistent with the stated purpose of extracting text and generating screenshots from PDFs and office documents. There are no indicators of data exfiltration, malicious command execution, or prompt injection.

Capability Assessment

✓ Purpose & Capability

The name/description claim a local CLI-based document parser and the SKILL.md consistently describes using the `lit` CLI (npm package @llamaindex/liteparse) to parse PDFs, Office files, images and produce text/JSON/screenshots. Requiring LibreOffice/ImageMagick for some file types is reasonable. Small inconsistency: SKILL.md and references alternate between “LiteParse” and “LlamaParse/LlamaIndex” branding, and the registry metadata lacks a homepage—this is a minor provenance concern but not a functional mismatch.

ℹ Instruction Scope

Instructions focus on running the `lit` CLI against user-supplied documents (parse, batch-parse, screenshot). They do not instruct reading unrelated system files or exfiltrating data. The SKILL.md claims "Runs entirely offline — no cloud, no API key," but also documents that Tesseract.js will download ~10MB of language data on first run and that installation uses npm/brew; those steps require network access on first-run/install even though runtime parsing is local afterwards.

ℹ Install Mechanism

No install spec is embedded in the skill bundle (instruction-only). SKILL.md instructs installing via npm (`npm install -g @llamaindex/liteparse`) or a brew tap. NPM and brew are common but npm global installs can run postinstall scripts and fetch remote artifacts (Tesseract data). There are no direct downloads from obscure URLs in the instructions.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. The runtime instructions only reference optional external tools (LibreOffice, ImageMagick) and local files provided by the user—this is proportionate to the stated purpose.

✓ Persistence & Privilege

The skill is not forced-always, does not request persistent privileges, and does not propose modifying other skills or global agent settings. It is user-invocable and can be run autonomously by the agent (platform default) which is expected for skills.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install liteparse
After installation, invoke the skill by name or use /liteparse
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: local PDF/doc parser skill using LiteParse CLI

Metadata

Slug liteparse

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is LiteParse?

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W... It is an AI Agent Skill for Claude Code / OpenClaw, with 198 downloads so far.

How do I install LiteParse?

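Run "/install liteparse" in the OpenClaw or Claude Code chat to install the skill in one step. The skill bundle is instruction-only, so the `lit` CLI itself must also be installed (npm install -g @llamaindex/liteparse).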

Is LiteParse free?

Yes, LiteParse is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LiteParse support?

LiteParse is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created LiteParse?

It is built and maintained by alfred-intel-handler-source (@alfred-intel-handler-source); the current version is v1.0.0.
