mrchenkuan

large-document-reader

Author: 陈宽同学 · GitHub · v1.0.2
cross-platform ⚠ suspicious
Total downloads: 535
Favorites: 0
Current installs: 4
Versions: 3
Install in OpenClaw
/install large-document-reader
Description
Intelligently splits long academic or technical documents into chapters, generates structured JSON summaries for each, and creates a file system with a globa...
Usage Instructions (SKILL.md)

Literature Structuring Expert

Automatically decompose long documents (papers, reports, books) into a structured, AI-friendly knowledge base. Splits by chapter, generates machine-readable summaries, and builds a navigable index to overcome context limits.

When to Use This Skill

Use this skill when the user:

  • Has a document that is too long for the AI's context window.
  • Needs to perform cross-chapter analysis or get a high-level overview of a long text.
  • Wants to build a reusable, queryable knowledge base from a PDF, Markdown, or text file.
  • Asks: "How can I get my AI to read this whole book/paper?"

Quick Reference

| Situation | Action |
| --- | --- |
| User provides a long document | 1. Analyze and split it into chapters. 2. Generate a JSON summary for each chapter. 3. Create a master index file. |
| User asks a high-level, cross-chapter question | Provide the content of the MASTER_INDEX.md file to the AI. |
| User asks a detailed, chapter-specific question | Provide the corresponding single file from the ./chapters/ directory to the AI. |
| Task completed | Present the generated file tree and MASTER_INDEX.md preview to the user. |

Core Workflow

Phase 1: Intelligent Splitting

  1. Analyze Input: Receive the long document text or file path.
  2. Identify Structure: Automatically analyze the document to identify heading hierarchies (e.g., #, ##, 1., 1.1) to determine chapter boundaries. Prioritize user-specified splitting preferences.
  3. Execute Split: Split the document into independent plain-text files by chapter.
    • Naming Convention: {sequence_number}_{chapter_title}.md (e.g., 01_Introduction.md).
    • Storage Location: All chapter files are saved in the ./chapters/ directory.
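
The splitting phase could be sketched roughly as follows. This is a minimal illustration and not the skill's shipped code: the function names `split_into_chapters` and `write_chapters` are hypothetical, only Markdown `#` and `##` headings are treated as chapter boundaries (numbered headings such as `1.` or `1.1` would need an extra pattern), and files get flat sequence numbers instead of hierarchical ones like `02_1`.

```python
import re
from pathlib import Path

# Assumption: only Markdown headings of level 1-2 mark chapter boundaries.
HEADING = re.compile(r"^(#{1,2})\s+(.+?)\s*$")


def split_into_chapters(text):
    """Return (title, body) pairs; text before the first heading becomes 'Preamble'."""
    chapters = []
    title, lines = "Preamble", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            body = "\n".join(lines).strip()
            if body:
                chapters.append((title, body))
            # Start a new chapter and keep the heading line in its body.
            title, lines = match.group(2), [line]
        else:
            lines.append(line)
    body = "\n".join(lines).strip()
    if body:
        chapters.append((title, body))
    return chapters


def write_chapters(chapters, out_dir="chapters"):
    """Write each chapter to {sequence_number}_{chapter_title}.md under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, (title, body) in enumerate(chapters, start=1):
        # Replace runs of non-word characters so the title is filename-safe.
        safe = re.sub(r"[^\w]+", "_", title).strip("_") or "Untitled"
        path = out / f"{number:02d}_{safe}.md"
        path.write_text(body + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

A caller would read the document text from a user-supplied path, pass it to `split_into_chapters`, and hand the result to `write_chapters`; a user-specified splitting preference could be supported by swapping in a different `HEADING` pattern.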

Phase 2: Summary Generation & Structuring

  1. Generate Summary per Chapter: For each file in ./chapters/, generate a corresponding JSON summary file.
    • Structured Fields (JSON format):
      {
        "chapter_id": "Unique identifier matching the filename, e.g., 02_1",
        "chapter_title": "Chapter Title",
        "abstract": "Core summary of the chapter, 200-300 words.",
        "keywords": ["Keyword1", "Keyword2", "Keyword3"],
        "key_points": ["Key point one", "Key point two"],
        "related_sections": ["IDs of other chapters strongly related to this one"]
      }
      
    • Storage Location: JSON summary files are saved in the ./summaries/ directory (e.g., 01_Introduction.summary.json).
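
As a hedged sketch of the storage side of this phase, the snippet below checks a summary dictionary against the fields listed above and writes it next to the chapter's name under `./summaries/`. The helper names `validate_summary` and `save_summary` are hypothetical, and producing the abstract, keywords, and key points themselves (presumably by the AI model) is outside the sketch.

```python
import json
from pathlib import Path

# Field names taken from the JSON structure described above.
REQUIRED_FIELDS = (
    "chapter_id", "chapter_title", "abstract",
    "keywords", "key_points", "related_sections",
)
LIST_FIELDS = ("keywords", "key_points", "related_sections")


def validate_summary(summary):
    """Return a list of problems; an empty list means the summary looks well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in summary]
    for name in LIST_FIELDS:
        if name in summary and not isinstance(summary[name], list):
            problems.append(f"{name} must be a list")
    return problems


def save_summary(summary, chapter_file, out_dir="summaries"):
    """Write the summary as <chapter file stem>.summary.json under out_dir."""
    problems = validate_summary(summary)
    if problems:
        raise ValueError("; ".join(problems))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{Path(chapter_file).stem}.summary.json"
    # ensure_ascii=False keeps non-ASCII titles and keywords readable on disk.
    path.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Validating before writing keeps malformed model output from reaching the index step, which reads these files back.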

Phase 3: Create Global Index

  1. Aggregate Information: Collect data from all JSON files in ./summaries/.
  2. Generate Index: Create a global index file, MASTER_INDEX.md.
    • Content: Lists all chapters' IDs, titles, a short abstract preview, and keywords in a Markdown list or table.
    • Purpose: Provides a "bird's-eye view" for quick navigation and high-level Q&A.
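
One way the aggregation step might look, assuming the summary files follow the JSON fields from Phase 2. The function name `build_master_index`, the table layout, and the 120-character preview length are illustrative choices, not something the skill specifies.

```python
import json
from pathlib import Path


def _cell(value):
    """Escape pipe characters so a value cannot break the Markdown table."""
    return str(value).replace("|", "\\|")


def build_master_index(summaries_dir="summaries", index_path="MASTER_INDEX.md", preview_chars=120):
    """Collect every *.summary.json file into one Markdown table and write it out."""
    rows = []
    for path in sorted(Path(summaries_dir).glob("*.summary.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Collapse whitespace, then truncate the abstract to a short preview.
        abstract = " ".join(data.get("abstract", "").split())
        preview = abstract[:preview_chars] + ("..." if len(abstract) > preview_chars else "")
        keywords = ", ".join(data.get("keywords", []))
        rows.append(
            f"| {_cell(data.get('chapter_id', ''))} | {_cell(data.get('chapter_title', ''))} "
            f"| {_cell(preview)} | {_cell(keywords)} |"
        )
    lines = [
        "# Master Index",
        "",
        "| ID | Title | Abstract preview | Keywords |",
        "| --- | --- | --- | --- |",
        *rows,
    ]
    content = "\n".join(lines) + "\n"
    Path(index_path).write_text(content, encoding="utf-8")
    return content
```

Sorting by file name keeps the index in chapter order as long as the files carry zero-padded sequence numbers.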

Final Deliverables & File Structure

Upon completion, the following file tree is generated:

Project_Root/
├── chapters/           # 【Source Repository】Contains all split chapter texts (.md files)
│   ├── 01_Introduction.md
│   ├── 02_1_Experimental_Methods.md
│   └── ...
├── summaries/          # 【Summary Repository】Contains all structured JSON summaries
│   ├── 01_Introduction.summary.json
│   ├── 02_1_Experimental_Methods.summary.json
│   └── ...
└── MASTER_INDEX.md     # 【Global Navigation】Core document summary index

Usage Instructions for the User

For Global, Cross-Chapter Queries (e.g., “What is the paper's main thesis?”):

  • Provide the content of the MASTER_INDEX.md file to the AI. This is token-efficient.

For Specific, In-Depth Queries Within a Chapter (e.g., “What were the parameters in the 'Methods' section?”):

  • Provide the corresponding single chapter file from the chapters/ directory to the AI for full context.
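
The two usage patterns above amount to a small routing rule, sketched below under the assumption that chapter files start with their chapter ID followed by an underscore. The function `select_context_file` is hypothetical; deciding whether a question is global or chapter-specific is left to the caller.

```python
from pathlib import Path
from typing import Optional


def select_context_file(root: Path, chapter_id: Optional[str] = None) -> Path:
    """Return MASTER_INDEX.md for global queries, or the single matching chapter file."""
    if chapter_id is None:
        return root / "MASTER_INDEX.md"
    matches = sorted((root / "chapters").glob(f"{chapter_id}_*.md"))
    if not matches:
        raise FileNotFoundError(f"no chapter file for id {chapter_id!r}")
    if len(matches) > 1:
        # A prefix such as "02" also matches "02_1_..."; ask for a more specific ID.
        raise ValueError(f"chapter id {chapter_id!r} is ambiguous: {[p.name for p in matches]}")
    return matches[0]
```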
Security Recommendations
Do not run this skill as-is. The package contains developer-specific absolute paths and does not implement the summary/index functionality claimed in the README. Before using: (1) Review and edit scripts to accept a user-supplied file path (avoid hard-coded paths), (2) change output locations to a safe, documented directory (relative to the current working directory), (3) verify there are no network calls or other unexpected I/O, and (4) test in a sandboxed environment with non-sensitive documents. If you don't want to edit code yourself, ask the author for a corrected release that implements summaries and index generation and removes hard-coded paths. The behavior looks like sloppy packaging rather than overtly malicious, but the inconsistencies and absolute paths are a privacy and operational risk.
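
Recommendations (1) and (2) could be approached along these lines. The shipped scripts are not reproduced on this page, so this is a generic sketch and not a patch: the argument names, the `chapters` default, and the helper `resolve_output_dir` are all assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Take the input document from the command line instead of a hard-coded path."""
    parser = argparse.ArgumentParser(description="Split a document into chapter files.")
    parser.add_argument("input", type=Path, help="path to the document to split")
    parser.add_argument(
        "--out-dir", type=Path, default=Path("chapters"),
        help="output directory, relative to the current working directory",
    )
    return parser.parse_args(argv)


def resolve_output_dir(out_dir: Path, base: Path) -> Path:
    """Resolve out_dir under base and refuse any path that escapes it."""
    base = base.resolve()
    resolved = (base / out_dir).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not resolved.is_relative_to(base):
        raise ValueError(f"output directory {resolved} is outside {base}")
    return resolved


if __name__ == "__main__":
    args = parse_args()
    target = resolve_output_dir(args.out_dir, Path.cwd())
    print(f"would read {args.input} and write chapters to {target}")
```

Rejecting output paths outside the working directory is stricter than the recommendation requires, but it makes the write location easy to audit when testing in a sandbox.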
Functional Analysis
Type: OpenClaw Skill
Name: large-document-reader
Version: 1.0.2

The skill is classified as suspicious due to critical vulnerabilities in the Python scripts. Both `scripts/extract_chapters.py` and `scripts/save_chapters.py` contain hardcoded absolute file paths (e.g., `/Users/chenkuan/Desktop/毕业论文/...` and `/Users/chenkuan/.openclaw/workspace/...`). While there is no evidence of malicious intent like data exfiltration or unauthorized execution, these hardcoded paths make the skill non-portable and non-functional outside a specific developer's environment. This represents a significant design flaw and a potential security vulnerability if the agent's execution environment is not strictly sandboxed or if these paths could be manipulated by an attacker.
Capability Assessment
Purpose & Capability
SKILL.md claims splitting, per-chapter JSON summaries, and a MASTER_INDEX.md. The shipped Python scripts only extract chapter boundaries and write chapter files; there is no implementation of summary generation or index creation. Also, the scripts use absolute paths specific to a developer's machine (/Users/chenkuan/...), which is inconsistent with a generic document-processing skill.
Instruction Scope
The runtime instructions describe taking a user-provided document path and producing ./chapters/ and ./summaries/. The actual code ignores any external input: it reads a hard-coded file from the developer's Desktop, then writes chapters_info.json to a hard-coded workspace path. This is scope creep and unexpected file access not described in SKILL.md.
Install Mechanism
No install spec is provided (instruction-only plus two scripts). Nothing is downloaded or executed from remote URLs, which reduces supply-chain risk.
Credentials
No environment variables or credentials are requested (appropriate). However, the code directly accesses absolute local filesystem paths (a specific user's Desktop and home .openclaw workspace), which could inadvertently read sensitive local files if those paths exist in the runtime environment.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It writes files to disk (chapters_info.json and chapter .md files) within paths referenced in the scripts; it does not attempt to modify other skills or global agent settings.
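How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install large-document-reader
  3. Once installed, invoke the Skill by name or trigger it with /large-document-reader.
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output.
Version History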
v1.0.2
- Added new scripts: extract_chapters.py and save_chapters.py, enabling chapter extraction and saving functionality.
- No changes made to documentation or user-facing features.
v1.0.1
rename
v1.0.0
Intelligently splits long academic or technical documents into chapters, generates structured JSON summaries for each, and creates a file system with a global index. This enables efficient AI retrieval and analysis, perfectly solving context window limitations by enabling “overview via summaries, deep-dive on demand” workflows.
Metadata
Slug large-document-reader
Version 1.0.2
License
Total installs 4
Current installs 4
Versions 3
FAQ

What is large-document-reader?

Intelligently splits long academic or technical documents into chapters, generates structured JSON summaries for each, and creates a file system with a globa... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 535 total downloads to date.

How do I install large-document-reader?

Run the command "/install large-document-reader" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is large-document-reader free?

Yes, large-document-reader is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does large-document-reader support?

large-document-reader is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed large-document-reader?

It is developed and maintained by 陈宽同学 (@mrchenkuan); the current version is v1.0.2.
