
LiteParse

Author: alfred-intel-handler-source · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 198 · Favorites: 0 · Current installs: 1 · Versions: 1
Install in OpenClaw
/install liteparse
Description
Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W...
Usage (SKILL.md)

LiteParse

Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.

Binary: lit (installed globally via npm)
Docs: https://developers.llamaindex.ai/liteparse/

Quick Reference

# Parse a PDF to text (stdout)
lit parse document.pdf

# Parse to file
lit parse document.pdf -o output.txt

# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json

# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"

# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr

# Batch parse a directory
lit batch-parse ./input-dir ./output-dir

# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots

Output Formats

Format          Use case
text (default)  Plain text extraction, feeding into prompts
json            Structured output with bounding boxes, useful for layout-aware tasks

OCR Behavior

  • OCR is on by default via Tesseract.js (downloads ~10MB English data on first run)
  • First run will be slow; subsequent runs use cached data
  • --no-ocr for pure text-layer PDFs (faster, no network needed)
  • For multi-language: --ocr-language fra+eng
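
The OCR trade-off above can be scripted. The following is a minimal sketch, not part of LiteParse itself: it tries the faster --no-ocr path first and falls back to the default OCR path when almost no text comes back. The function name parse_with_fallback, the MIN_CHARS threshold, and the assumption that a scanned PDF yields (near-)empty output under --no-ocr are all illustrative assumptions.

```shell
# Hypothetical helper (not part of LiteParse): try the text layer first,
# then fall back to OCR only if almost no text was extracted.
MIN_CHARS="${MIN_CHARS:-20}"   # assumed threshold; tune for your documents

parse_with_fallback() {
  local pdf
  local text
  local stripped
  pdf="$1"
  # Fast path: text-layer extraction only, no OCR.
  text="$(lit parse "$pdf" --no-ocr 2>/dev/null)"
  # Drop all whitespace so blank pages do not count as content.
  stripped="$(printf '%s' "$text" | tr -d '[:space:]')"
  if [ "${#stripped}" -lt "$MIN_CHARS" ]; then
    # Slow path: default behavior, OCR enabled.
    text="$(lit parse "$pdf")"
  fi
  printf '%s\n' "$text"
}
```

Usage would look like parse_with_fallback document.pdf > output.txt. The design choice here is to pay the OCR cost only when the text layer looks empty.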

Supported File Types

Works natively: PDF

Requires LibreOffice (brew install --cask libreoffice): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv

Requires ImageMagick (brew install imagemagick): .jpg, .png, .gif, .bmp, .tiff, .webp
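
The extension lists above can be turned into a quick pre-flight check. This is a sketch under stated assumptions: required_tool is a hypothetical helper name, and the labels it prints ("none", "libreoffice", "imagemagick", "unlisted") are illustrative, not LiteParse output. "unlisted" only means the extension does not appear in the lists above.

```shell
# Hypothetical helper: report which extra tool a file needs before calling lit.
# The extension lists are copied from the "Supported File Types" section.
required_tool() {
  local ext
  # Take the text after the last dot and lowercase it ("Slides.PPTX" -> "pptx").
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf) echo "none" ;;
    docx|doc|xlsx|xls|pptx|ppt|odt|csv) echo "libreoffice" ;;
    jpg|png|gif|bmp|tiff|webp) echo "imagemagick" ;;
    *) echo "unlisted" ;;
  esac
}
```

For example, required_tool report.docx prints libreoffice, which signals that LibreOffice should be installed before running lit parse on that file.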

Installation Notes

  • Installed via npm: npm install -g @llamaindex/liteparse
  • Brew formula exists (brew tap run-llama/liteparse) but requires up-to-date macOS Command Line Tools (CLT); use npm as the primary install path on this machine
  • Binary path: /opt/homebrew/bin/lit

Workflow Tips

  • For VA forms, job description PDFs, military docs: lit parse file.pdf -o /tmp/output.txt then read into context
  • For scanned PDFs (no text layer): OCR is required; output quality may degrade on complex layouts, so consider LlamaParse cloud for critical docs
  • For vision model workflows: use lit screenshot to generate page images, then pass to image tool or similar
  • For batch jobs: use lit batch-parse — it reuses the PDF engine across files for efficiency
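
The first and third tips can be combined into one step. The sketch below is an illustration only: prepare_doc, the default output directory, and the output.txt / screenshots names are assumptions; the lit invocations use only the subcommands and the -o flag shown in Quick Reference.

```shell
# Hypothetical wrapper: produce both text and page images for one document.
prepare_doc() {
  local pdf
  local out_dir
  pdf="$1"
  out_dir="${2:-/tmp/liteparse-out}"   # assumed default location
  mkdir -p "$out_dir/screenshots" || return 1
  # Text for reading into prompt context.
  lit parse "$pdf" -o "$out_dir/output.txt" || return 1
  # Page images for a vision model.
  lit screenshot "$pdf" -o "$out_dir/screenshots" || return 1
  # Print the directory so callers can locate the results.
  echo "$out_dir"
}
```

A call such as prepare_doc form.pdf /tmp/form-out leaves the text in /tmp/form-out/output.txt and the page images under /tmp/form-out/screenshots.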

Limitations

  • Complex tables, multi-column layouts, and scanned government forms may produce imperfect output
  • LlamaParse (cloud) handles the hard cases: https://cloud.llamaindex.ai
  • Max recommended DPI for screenshots: 300 (higher = slower, larger files)
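
The DPI recommendation above can be enforced with a small guard. This is a sketch: clamp_dpi is a hypothetical helper, and 300 is simply the recommended maximum quoted above, not a limit enforced by lit.

```shell
# Hypothetical helper: cap a requested DPI at the recommended maximum of 300.
clamp_dpi() {
  local dpi
  dpi="$1"
  if [ "$dpi" -gt 300 ]; then
    echo 300
  else
    echo "$dpi"
  fi
}
```

It can then be used inline, for example: lit screenshot document.pdf --dpi "$(clamp_dpi 600)" -o ./screenshots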

Reference

See references/output-examples.md for sample JSON/text output structure.

Security recommendations
This skill appears to do what it says: run a local CLI to extract text/screenshots from documents. Before installing:
  1. Confirm the npm package identity and publisher (search the npm registry and repository), because the registry metadata here lacks a homepage.
  2. Be aware that the first install/run will fetch packages and Tesseract language data over the network, so it’s not strictly offline until that completes.
  3. npm global installs may run install scripts; review the package contents or run in a sandbox/container if you’re unsure.
  4. Installing LibreOffice/ImageMagick via brew is optional but required for some file types, and may require macOS-specific tooling.
  5. If provenance is important, ask the publisher for the source repo or checksum and verify the package before global installation.
Overall, the skill is coherent with its purpose, but verify the package origin and consider running in an isolated environment if you have security concerns.
Functional analysis
Type: OpenClaw Skill · Name: liteparse · Version: 1.0.0
The liteparse skill is a legitimate tool wrapper for the LiteParse CLI (associated with the LlamaIndex ecosystem) used for local document parsing and OCR. The instructions in SKILL.md and the examples in references/output-examples.md are consistent with the stated purpose of extracting text and generating screenshots from PDFs and office documents. There are no indicators of data exfiltration, malicious command execution, or prompt injection.
Capability assessment
Purpose & Capability
The name/description claim a local CLI-based document parser and the SKILL.md consistently describes using the `lit` CLI (npm package @llamaindex/liteparse) to parse PDFs, Office files, images and produce text/JSON/screenshots. Requiring LibreOffice/ImageMagick for some file types is reasonable. Small inconsistency: SKILL.md and references alternate between “LiteParse” and “LlamaParse/LlamaIndex” branding, and the registry metadata lacks a homepage—this is a minor provenance concern but not a functional mismatch.
Instruction Scope
Instructions focus on running the `lit` CLI against user-supplied documents (parse, batch-parse, screenshot). They do not instruct reading unrelated system files or exfiltrating data. The SKILL.md claims "Runs entirely offline — no cloud, no API key," but also documents that Tesseract.js will download ~10MB of language data on first run and that installation uses npm/brew; those steps require network access on first-run/install even though runtime parsing is local afterwards.
Install Mechanism
No install spec is embedded in the skill bundle (instruction-only). SKILL.md instructs installing via npm (`npm install -g @llamaindex/liteparse`) or a brew tap. NPM and brew are common but npm global installs can run postinstall scripts and fetch remote artifacts (Tesseract data). There are no direct downloads from obscure URLs in the instructions.
Credentials
The skill declares no required environment variables, credentials, or config paths. The runtime instructions only reference optional external tools (LibreOffice, ImageMagick) and local files provided by the user—this is proportionate to the stated purpose.
Persistence & Privilege
The skill is not forced-always, does not request persistent privileges, and does not propose modifying other skills or global agent settings. It is user-invocable and can be run autonomously by the agent (platform default) which is expected for skills.
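How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install liteparse
  3. Once installed, invoke the Skill by name or trigger it with /liteparse
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history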
v1.0.0
Initial release: local PDF/doc parser skill using LiteParse CLI
Metadata
Slug: liteparse
Version: 1.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 1
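FAQ

What is LiteParse?

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a W... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 198 total downloads so far.

How do I install LiteParse?

Run the command "/install liteparse" in the OpenClaw or Claude Code chat box for one-step installation; no extra configuration is required.

Is LiteParse free?

Yes. LiteParse is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does LiteParse support?

LiteParse is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops LiteParse?

It is developed and maintained by alfred-intel-handler-source (@alfred-intel-handler-source); the current version is v1.0.0.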
