
LiteParse Document Parser

Author: ricanwarfare · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 112 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install liteparse-docs
Description
Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...
Usage instructions (SKILL.md)

LiteParse

Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, no cloud dependencies or LLM required.

Installation

Install via Homebrew (skip this step if it is already installed):

brew install llamaindex-liteparse

Verify:

lit --version

Supported Formats

Category      Formats
PDF           .pdf
Word          .doc, .docx, .docm, .odt, .rtf
PowerPoint    .ppt, .pptx, .pptm, .odp
Spreadsheets  .xls, .xlsx, .xlsm, .ods, .csv, .tsv
Images        .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .svg

Dependencies:

  • Office documents → LibreOffice (brew install --cask libreoffice)
  • Images → ImageMagick (brew install imagemagick)
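
The format table and dependency notes above can be turned into a small lookup, for example to check which helper program a file needs before parsing it. This is a minimal Python sketch, not part of LiteParse: the function name `required_helper` is hypothetical, and treating every Word, PowerPoint, and Spreadsheets row as an "Office document" is an assumption drawn from the dependency list.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the Supported Formats table.
# Assumption: all three office rows count as "Office documents".
OFFICE_EXTENSIONS = {
    ".doc", ".docx", ".docm", ".odt", ".rtf",           # Word
    ".ppt", ".pptx", ".pptm", ".odp",                   # PowerPoint
    ".xls", ".xlsx", ".xlsm", ".ods", ".csv", ".tsv",   # Spreadsheets
}
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".svg",
}


def required_helper(path: str) -> Optional[str]:
    """Return the helper the dependency list names for this file type.

    Hypothetical helper: returns "LibreOffice", "ImageMagick", or None
    for PDFs (no helper is listed for them). Raises ValueError for an
    extension that does not appear in the table.
    """
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return None
    if ext in OFFICE_EXTENSIONS:
        return "LibreOffice"
    if ext in IMAGE_EXTENSIONS:
        return "ImageMagick"
    raise ValueError(f"extension not in the supported list: {ext!r}")
```

Matching is case-insensitive here, so `report.DOCX` maps to LibreOffice; whether the `lit` CLI itself treats upper-case extensions the same way is not stated in the source.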

Usage

Parse a Single File

# Basic text extraction
lit parse document.pdf

# JSON output with bounding boxes
lit parse document.pdf --format json -o output.json

# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"

# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr

# Higher DPI for better quality
lit parse document.pdf --dpi 300
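
When an agent or script calls these commands, it can help to assemble the argument list in one place. The sketch below is one possible way to do that in Python; `build_parse_command` is a hypothetical helper, it only uses the flags shown above, and the page-spec pattern is inferred from the single example "1-5,10,15-20", so the real CLI may accept more or less than this.

```python
import re
from typing import List, Optional

# Comma-separated pages or ranges, e.g. "1-5,10,15-20".
# Assumption: this grammar is inferred from the example, not from a spec.
_PAGE_SPEC = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def build_parse_command(
    path: str,
    json_output: Optional[str] = None,
    target_pages: Optional[str] = None,
    ocr: bool = True,
    dpi: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `lit parse` from the options shown above."""
    cmd = ["lit", "parse", path]
    if json_output is not None:
        # JSON with bounding boxes, saved to a file.
        cmd += ["--format", "json", "-o", json_output]
    if target_pages is not None:
        if not _PAGE_SPEC.fullmatch(target_pages):
            raise ValueError(f"unexpected page spec: {target_pages!r}")
        cmd += ["--target-pages", target_pages]
    if not ocr:
        cmd.append("--no-ocr")
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file names that contain spaces.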

Batch Parse a Directory

lit batch-parse ./input-directory ./output-directory

# Only PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
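
Before running a batch over a large tree, it can be useful to preview which files an extension filter would pick up. The following Python sketch only approximates the `--extension` and `--recursive` options; `preview_batch_inputs` is a hypothetical helper, and the CLI's actual matching rules (for instance, case sensitivity) are not documented here and may differ.

```python
from pathlib import Path
from typing import List


def preview_batch_inputs(
    input_dir: str, extension: str = ".pdf", recursive: bool = False
) -> List[Path]:
    """List files under input_dir whose suffix matches extension.

    Assumption: matching is case-insensitive and based on the final
    suffix only; the real CLI may behave differently.
    """
    root = Path(input_dir)
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    wanted = extension.lower()
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() == wanted
    )
```

For example, `preview_batch_inputs("./input", ".pdf", recursive=True)` returns the sorted PDF paths, which can be counted or inspected before invoking `lit batch-parse`.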

Generate Page Screenshots

# All pages
lit screenshot document.pdf -o ./screenshots

# Specific pages
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots

# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots

Key Options

Option                   Description
--format json            Structured JSON with bounding boxes
--format text            Plain text (default)
--target-pages "1-5,10"  Parse specific pages
--dpi 300                Higher rendering quality
--no-ocr                 Disable OCR (faster for text PDFs)
--ocr-language fra       Set OCR language
-o output.json           Save to file

Config File

For repeated use, create liteparse.config.json:

{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true
}

Use with:

lit parse document.pdf --config liteparse.config.json
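
If the config file is generated by a script rather than written by hand, a small helper can keep the keys consistent. This is a hedged Python sketch: `write_config` is a hypothetical function, the default values are copied from the example above, and rejecting unknown keys is a choice made for this sketch (LiteParse may well accept keys not shown here).

```python
import json
from pathlib import Path

# Keys and values copied from the example config above.
DEFAULT_CONFIG = {
    "ocrLanguage": "en",
    "ocrEnabled": True,
    "maxPages": 1000,
    "dpi": 150,
    "outputFormat": "json",
    "preciseBoundingBox": True,
}


def write_config(path: str = "liteparse.config.json", **overrides) -> dict:
    """Write the example config to path, with optional per-key overrides."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        # Guard against typos such as "DPI" or "ocr_language".
        raise KeyError(f"keys not in the example config: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `write_config(dpi=300, ocrEnabled=False)` would produce a file that can then be passed to `lit parse` with `--config`.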

When to Use

  • PDF text extraction — fast local parsing
  • Document conversion — Office docs to text/JSON
  • Screenshot generation — for LLM visual analysis
  • Batch processing — multiple files at once
  • Offline/air-gapped — no cloud required
Security recommendations
This skill looks internally consistent for local document parsing, but the package provenance is unclear. Before installing: 1) verify the Homebrew package origin (which tap/repo provides 'llamaindex-liteparse') and inspect its homepage/source; 2) run 'lit --version' and check what binary was installed and where; 3) consider installing in a sandbox or VM if you want to inspect behavior first; 4) ensure LibreOffice and ImageMagick are installed from official sources; and 5) review the output files (and any logs) to confirm no unexpected network activity or external uploads.
Functional analysis
Type: OpenClaw Skill
Name: liteparse-docs
Version: 1.0.0
The skill bundle contains documentation (SKILL.md) for a local document parsing tool called LiteParse. It provides standard usage instructions for an AI agent to perform text extraction and batch processing of various file formats (PDF, DOCX, etc.) using the 'lit' CLI. No malicious code, data exfiltration patterns, or prompt injection attempts were identified.
Capability assessment
Purpose & Capability
Name, description, and runtime instructions all describe local parsing of PDFs, Office docs, spreadsheets, and images. Required helpers (LibreOffice, ImageMagick) are plausible for the stated features (conversion, rendering, OCR). No unrelated resources or credentials are requested.
Instruction Scope
SKILL.md only instructs running local CLI commands (lit parse, batch-parse, screenshot) and using a local config file; it does not ask the agent to read unrelated system files, access secrets, or transmit data to external endpoints. Outputs are written to local files.
Install Mechanism
No install spec is included in the registry (instruction-only). SKILL.md tells the user to use Homebrew (brew install llamaindex-liteparse and brew install --cask libreoffice, imagemagick). Using Homebrew is common, but the specific brew package ('llamaindex-liteparse') and overall lack of source/homepage metadata reduce provenance; the package could come from a third-party tap. Recommend verifying the package origin before installing.
Credentials
The skill declares no required environment variables, credentials, or config paths. The local liteparse.config.json is reasonable and limited to tool options (OCR language, DPI, etc.).
Persistence & Privilege
Skill is instruction-only, does not request persistent presence, and registry flags are default (always:false). There are no instructions to modify other skills or system-wide agent settings.
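How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install liteparse-docs
  3. Once installed, invoke the Skill by name or trigger it with /liteparse-docs
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version history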
v1.0.0
  • Initial release of LiteParse: local document parsing for PDFs, DOCX, PPTX, XLSX, and images.
  • Supports text extraction, JSON output with bounding boxes, and page-level screenshots.
  • Enables batch processing of directories and selective page parsing.
  • No cloud or LLM dependencies; works offline with Homebrew installation.
  • Supports popular office and image formats with optional dependencies (LibreOffice, ImageMagick).
  • Includes configurable options and support for reusable config files.
Metadata
Slug liteparse-docs
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is LiteParse Document Parser?

Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 112 total downloads to date.

How do I install LiteParse Document Parser?

Run the command "/install liteparse-docs" in an OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is LiteParse Document Parser free?

Yes. LiteParse Document Parser is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does LiteParse Document Parser support?

LiteParse Document Parser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops LiteParse Document Parser?

It is developed and maintained by ricanwarfare (@ricanwarfare); the current version is v1.0.0.
