LiteParse
/install liteparse
Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.
Binary: lit (installed globally via npm)
Docs: https://developers.llamaindex.ai/liteparse/
Quick Reference
# Parse a PDF to text (stdout)
lit parse document.pdf
# Parse to file
lit parse document.pdf -o output.txt
# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json
# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"
# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr
# Batch parse a directory
lit batch-parse ./input-dir ./output-dir
# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots
Output Formats
| Format | Use case |
|---|---|
| text (default) | Plain text extraction, feeding into prompts |
| json | Structured output with bounding boxes, useful for layout-aware tasks |
OCR Behavior
- OCR is on by default via Tesseract.js (downloads ~10MB English data on first run)
- First run will be slow; subsequent runs use cached data
- Use --no-ocr for pure text-layer PDFs (faster, no network needed)
- For multi-language: --ocr-language fra+eng
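One way to combine the two modes is to try the fast text-layer path first and fall back to OCR only when nothing comes back. The sketch below is illustrative: the function name parse_fast_then_ocr is hypothetical, and it assumes that lit parse with --no-ocr prints nothing (or only whitespace) for a scanned PDF, which has not been verified against the tool.

```shell
# Try the fast text-layer path first; fall back to OCR only if it yields no text.
# Assumption: `lit parse --no-ocr` prints nothing (or only whitespace) for
# scanned PDFs that lack a text layer.
parse_fast_then_ocr() {
  pdf=$1
  text=$(lit parse "$pdf" --no-ocr) || return 1
  # Strip all whitespace to see whether any real text came back.
  if [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
    printf '%s\n' "$text"
  else
    # Nothing usable from the text layer: rerun with OCR (the default mode).
    lit parse "$pdf"
  fi
}
```

Calling parse_fast_then_ocr document.pdf > output.txt would then pay the OCR cost only for documents that appear to need it.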
Supported File Types
Works natively: PDF
Requires LibreOffice (brew install --cask libreoffice): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv
Requires ImageMagick (brew install imagemagick): .jpg, .png, .gif, .bmp, .tiff, .webp
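The mapping above can be written as a small lookup that reports which extra dependency a file needs before it is handed to lit. This is a sketch only: the function name required_dependency and the "unlisted" label are made up for illustration, and the extension list is copied from this section rather than read from the tool.

```shell
# Report which extra dependency a file needs, based on its extension.
# The extension lists mirror the "Supported File Types" section above.
required_dependency() {
  # Lower-case the extension so "Report.DOCX" matches too.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf) echo "none" ;;
    docx|doc|xlsx|xls|pptx|ppt|odt|csv) echo "LibreOffice" ;;
    jpg|png|gif|bmp|tiff|webp) echo "ImageMagick" ;;
    *) echo "unlisted" ;;  # not named in this document
  esac
}
```

For example, required_dependency slides.pptx prints LibreOffice, which a script could use to warn the user before a parse fails.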
Installation Notes
- Installed via npm: npm install -g @llamaindex/liteparse
- Brew formula exists (brew tap run-llama/liteparse) but requires current macOS Command Line Tools (CLT); use npm as the primary install path on this machine
- Binary path: /opt/homebrew/bin/lit
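Because the binary path is machine-specific, a script may want to confirm that lit is reachable before running anything else. A minimal sketch, with require_lit as a hypothetical helper name; it relies only on the standard command -v lookup and the npm install command given above.

```shell
# Verify that the lit binary is reachable before running any parse commands.
require_lit() {
  if command -v lit >/dev/null 2>&1; then
    return 0
  fi
  # Point the user at the npm install path recommended in this document.
  echo "lit not found on PATH; try: npm install -g @llamaindex/liteparse" >&2
  return 1
}
```

A script could start with require_lit || exit 1 so that a missing install fails early with a useful hint.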
Workflow Tips
- For VA forms, job description PDFs, military docs: lit parse file.pdf -o /tmp/output.txt, then read the file into context
- For scanned PDFs (no text layer): OCR is required; complex layouts may degrade, so consider LlamaParse cloud for critical docs
- For vision model workflows: use lit screenshot to generate page images, then pass them to the image tool or similar
- For batch jobs: use lit batch-parse; it reuses the PDF engine across files for efficiency
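The first and third tips can be combined into one step that produces both a text dump and page images for a single document. The sketch below uses only the lit parse and lit screenshot invocations shown in the Quick Reference; the function name extract_text_and_pages and the output layout (output.txt plus a screenshots directory) are assumptions chosen for illustration.

```shell
# Produce both a text dump and page images for one document.
# Prints the path of the text file on success.
extract_text_and_pages() {
  doc=$1
  out_dir=$2
  mkdir -p "$out_dir/screenshots" || return 1
  # Text for reading into context.
  lit parse "$doc" -o "$out_dir/output.txt" || return 1
  # Page images for a vision model.
  lit screenshot "$doc" -o "$out_dir/screenshots" || return 1
  printf '%s\n' "$out_dir/output.txt"
}
```

For instance, extract_text_and_pages file.pdf /tmp/file-out would leave the text in /tmp/file-out/output.txt and the images under /tmp/file-out/screenshots.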
Limitations
- Complex tables, multi-column layouts, and scanned government forms may produce imperfect output
- LlamaParse (cloud) handles the hard cases: https://cloud.llamaindex.ai
- Max recommended DPI for screenshots: 300 (higher = slower, larger files)
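The DPI recommendation can be enforced in a thin wrapper so that callers cannot accidentally request very large images. A sketch under stated assumptions: screenshot_capped is a hypothetical name, the 300 ceiling is the recommendation above rather than a limit enforced by lit, and the DPI argument is expected to be an integer.

```shell
# Take screenshots, capping the requested DPI at the recommended maximum of 300.
screenshot_capped() {
  doc=$1
  dpi=$2
  out=$3
  # Clamp anything above the recommended ceiling.
  if [ "$dpi" -gt 300 ]; then
    dpi=300
  fi
  lit screenshot "$doc" --dpi "$dpi" -o "$out"
}
```

With this wrapper, screenshot_capped document.pdf 600 ./screenshots runs lit screenshot at 300 DPI instead of 600.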
Reference
See references/output-examples.md for sample JSON/text output structure.
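- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install liteparse
- Once installation finishes, invoke the Skill by name or trigger it with /liteparse
- Provide the inputs described in the Skill's parameter notes to get structured output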
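What is LiteParse?
Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 198 downloads to date.
How do I install LiteParse?
Run the command "/install liteparse" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is LiteParse free?
Yes. LiteParse is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.
Which platforms does LiteParse support?
LiteParse is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops LiteParse?
It is developed and maintained by alfred-intel-handler-source (@alfred-intel-handler-source); the current version is v1.0.0.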