功能描述

Convert academic PDF papers into structured Chinese reading reports with original figures. Use when the user asks to summarize, read, analyze, or create a re...

使用说明 (SKILL.md)

Paper Report

Name: paper-report
Author: nuaalixu

将学术论文 PDF 转化为结构化的中文阅读报告，保留论文原文中的关键图表。

依赖

运行前确保安装 PyMuPDF：

{PYTHON} -m pip install PyMuPDF --quiet

Python 路径约定：本文档中 {PYTHON} 代表系统 Python3 解释器路径。

macOS（有 CommandLineTools）：/Library/Developer/CommandLineTools/usr/bin/python3
其他环境：python3

首次执行时确定可用路径，后续步骤统一使用。

工作流程

复制以下清单并在每一步完成后勾选：

Task Progress:
- [ ] Step 1: 获取 PDF 文件
- [ ] Step 2: 将 PDF 每页转为图片
- [ ] Step 3: 逐页阅读理解，规划图表截取
- [ ] Step 4: 裁剪关键图表
- [ ] Step 5: 校验截图质量
- [ ] Step 6: 撰写中文报告并生成 HTML
- [ ] Step 7: 校验 HTML 文件结构、资源引用与展示效果

Step 1: 获取 PDF 文件

用户提供本地文件路径？ -> 直接使用该路径。

用户提供 arXiv 链接或 ID？ -> 用 curl 下载：

# 从 arXiv ID 下载（如 2401.12345）
curl -L -o {workspace}/paper-report/paper.pdf "https://arxiv.org/pdf/{ARXIV_ID}"

将 PDF 存储到工作目录 {workspace}/paper-report/paper.pdf。验证下载成功后继续。

Step 2: PDF 转图片

使用 scripts/pdf_to_images.py 将每一页渲染为 PNG 图片：

{PYTHON} scripts/pdf_to_images.py {workspace}/paper-report/paper.pdf {workspace}/paper-report/pages

脚本以 2x 分辨率渲染，输出到 {workspace}/paper-report/pages/page_1.png, page_2.png, ...

注意：此步骤的目的是让大模型能以图片形式"看到"每一页的完整布局，包括公式、图表和排版。

Step 3: 逐页阅读理解 & 规划截图

使用 Read 工具逐页查看 {workspace}/paper-report/pages/page_N.png 图片。

每页阅读时，同时完成两个任务：

任务 A — 内容理解：

提取该页的核心论点、方法描述、实验结果等关键信息
用中文在心中记录摘要，为最终报告准备素材

任务 B — 图表定位：

识别页面中的所有 Figure、Table、重要公式或算法框
记录每个图表在 PDF 原始坐标系中的裁剪区域（Rect 坐标）

坐标确定方法：

PDF 页面原始尺寸通常约 595 x 842 pt（A4）
渲染图片为 2x，因此图片像素约 1190 x 1684
在图片中目测位置后，将像素坐标除以 2 得到 PDF 原始坐标
记录格式：(page_index, x0, y0, x1, y1, label) 其中 page_index 从 0 开始

输出：整理出一份截图计划列表，例如：

截图计划：
1. Figure 1 - 系统架构图 → page 3, Rect(30, 38, 565, 290)
2. Figure 2 - 实验结果对比 → page 3, Rect(30, 295, 565, 485)
3. Table 1 - 性能指标 → page 5, Rect(30, 50, 565, 200)
...

Step 4: 裁剪关键图表

根据 Step 3 的截图计划，使用 scripts/crop_figures.py 进行批量裁剪：

{PYTHON} scripts/crop_figures.py \
  {workspace}/paper-report/paper.pdf \
  {workspace}/paper-report/figures \
  '{crop_spec_json}'

其中 {crop_spec_json} 是 JSON 格式的裁剪规格：

[
  {"page": 3, "rect": [30, 38, 565, 290], "name": "fig1_architecture"},
  {"page": 3, "rect": [30, 295, 565, 485], "name": "fig2_results"},
  {"page": 5, "rect": [30, 50, 565, 200], "name": "table1_metrics"}
]

脚本以 3x 分辨率裁剪以确保清晰度，输出到 {workspace}/paper-report/figures/ 目录。

Step 5: 校验截图质量

使用 Read 工具逐一查看 {workspace}/paper-report/figures/ 中的每张截图：

图表内容是否完整，没有被截断？
是否包含了图表标题和标注？
是否有多余的文字区域混入？

如果某张截图有问题：调整对应的 Rect 坐标（扩大或缩小范围），然后重新裁剪该图。反复调整直到所有截图质量合格。

Step 6: 生成中文阅读报告（HTML）

使用以下模板结构，生成一个自包含的 HTML 文件。图片使用 base64 内嵌以确保可移植性。

报告结构：

1. 论文基本信息（标题、作者、机构、发表信息）
2. 研究背景与动机
3. 核心方法 / 技术方案（配架构图）
4. 实验设计
5. 实验结果与分析（配结果图表）
6. 主要贡献与创新点
7. 局限性与未来方向
8. 个人点评与总结

图片内嵌方法：将裁剪好的图片转为 base64 嵌入 HTML：

import base64

def img_to_base64(path):
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    ext = path.rsplit(".", 1)[-1]
    mime = {"png": "image/png", "jpg": "image/jpeg"}.get(ext, "image/png")
    return f"data:{mime};base64,{data}"

在 HTML 中使用：\x3Cimg src="{base64_data}" alt="Figure 1" style="max-width:100%;">

HTML 模板参考：详见 report-template.html

关键写作要求：

全文使用中文撰写，术语首次出现时附英文原文，如"注意力机制（Attention Mechanism）"
图表引用格式："如图 1 所示，..." 或 "表 1 汇总了..."
保持学术严谨性，不添加原文未涉及的推测
每个章节应有实质内容，避免泛泛而谈

输出文件命名：最终 HTML 文件以论文标题命名，格式为 {论文简短标题}-阅读报告.html。从论文标题中提取核心关键词作为简短标题（去除特殊字符，空格替换为短横线），例如论文标题为 "Attention Is All You Need" 则文件名为 Attention-Is-All-You-Need-阅读报告.html。

将最终 HTML 文件保存到 {workspace}/paper-report/outputs/{论文简短标题}-阅读报告.html。

Step 7: 校验 HTML 文件结构、资源引用与展示效果

使用 Read 工具查看生成的 HTML 文件，逐项检查以下内容：

结构完整性：

HTML 基本结构是否完整（\x3C!DOCTYPE html>、\x3Chtml>、\x3Chead>、\x3Cbody> 标签齐全）
所有章节标题（h2）是否按报告结构依次出现，没有遗漏
每个章节是否有实质内容，不存在空段落或占位符文本（如 {{...}}）

资源引用：

所有 \x3Cimg> 标签的 src 属性是否为有效的 base64 data URI（以 data:image/png;base64, 或 data:image/jpeg;base64, 开头）
不存在任何外部图片链接或本地文件路径引用（确保自包含可移植性）
每张图片的 alt 属性和 \x3Cfigcaption> 是否有描述性文字

展示效果：

CSS 样式是否内嵌在 \x3Cstyle> 标签中，无外部样式表依赖
图表与其所属章节的上下文是否对应（如架构图在方法章节，结果图在实验章节）
中文排版是否正常，术语后是否附有英文原文

如果发现问题：直接修复 HTML 文件中的对应内容，修复后重新保存到同一路径。

特殊情况处理

双栏排版论文：大多数学术论文为双栏排版。图表可能跨栏或单栏，注意调整裁剪宽度。单栏图通常宽约 30-280 或 300-565，跨栏图宽约 30-565。

扫描版 PDF：如果 PDF 页面渲染后文字模糊或为扫描件，依然可通过图片方式阅读理解，但截图质量可能受限，在报告中注明来源质量。

超长论文（>20 页）：分批处理页面图片，每批 5-8 页，避免上下文溢出。先通读全文结构，再聚焦重点章节。

论文含附录：附录中的补充图表如有价值也应截取，在报告中设置独立的"附录内容"章节。

安全使用建议

This skill appears to do what it says, but check a few practical points before use: (1) Ensure python3 and curl are available on the host and install PyMuPDF as instructed. (2) Be careful with page indexing: the crop script expects 0-based page numbers while some examples use human 1-based counts — test on a small PDF first. (3) The skill will write files under {workspace}/paper-report; review those outputs before sharing. (4) Downloading PDFs from arXiv is fine for research copies, but confirm copyright/usage terms for other sources. If you need the skill to run in an environment without curl or with custom Python paths, adapt the commands or declare those binaries before invoking the skill.

功能分析

Type: OpenClaw Skill Name: paper-report Version: 1.0.3 The skill bundle is a legitimate tool for converting academic PDFs into structured Chinese reports. It uses standard libraries like PyMuPDF for PDF processing and includes scripts (scripts/pdf_to_images.py and scripts/crop_figures.py) that perform their stated functions without any hidden malicious logic, data exfiltration, or unauthorized execution. The workflow in SKILL.md is transparent and aligns with the tool's purpose.

能力评估

ℹ Purpose & Capability

The skill’s name/description match the included scripts (PDF→PNG, figure crop) and HTML template. Minor inconsistencies: the SKILL.md uses {PYTHON} and a curl command but the registry metadata does not declare required binaries (python3 / curl) — this is an operational omission rather than malicious. Otherwise the requested capabilities are appropriate for the stated purpose.

ℹ Instruction Scope

Instructions stay within scope: they read local PDF files or download PDFs from arXiv, render pages to images, crop figure regions, and generate a self-contained HTML report. No instructions direct data to external endpoints other than arxiv.org. Minor practical issues: examples in the documentation mix human-friendly 1-indexed page descriptions and the crop script expects 0-indexed page numbers (possible off-by-one confusion); SKILL.md assumes an agent 'Read' tool to view files which is expected for agent workflows.

✓ Install Mechanism

No install spec (instruction-only). The only runtime dependency is PyMuPDF, which SKILL.md tells the user to pip install; code files are local and small. No external downloads or obscure installers are used.

✓ Credentials

The skill does not request any environment variables, secrets, or credentials. It operates on local files and optionally downloads PDFs from arXiv — proportional to its function.

✓ Persistence & Privilege

Skill is user-invocable, not always-enabled, and does not request elevated/system-wide persistence or modify other skills. It reads/writes only under the declared workspace paths.

版本历史

v1.0.3

- Updated all file path references and workspace conventions to consistently use the subdirectory `{workspace}/paper-report/`. - Adjusted all related command examples, image/figure directories, and output instructions to reflect the new workspace layout. - No code or logic changes; this update clarifies directory structure for easier organization and future maintenance.

v1.0.2

No user-visible changes in this version (1.0.2); documentation and workflow remain unchanged.

v1.0.1

No changes detected in this version. - No file or documentation changes were made from the previous version. - The skill functionality and workflow remain unchanged.

v1.0.0

- Major update: The skill now converts academic PDF papers into structured Chinese reading reports with original figures, focusing on HTML report generation and figure extraction. - New workflow and task checklist guiding users step-by-step from PDF acquisition to HTML report validation. - Supports both local PDF files and direct download from arXiv links. - Converts PDFs to high-res page images and extracts annotated figure/table crops. - Generates a comprehensive Chinese reading report in HTML, embedding figures as base64 images. - Output adheres to strict formatting and quality checks to ensure completeness, portability, and academic rigor.

元数据

Slug paper-report

版本 1.0.3

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 4

常见问题

paper-report 是什么？

Convert academic PDF papers into structured Chinese reading reports with original figures. Use when the user asks to summarize, read, analyze, or create a re... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 146 次。

如何安装 paper-report？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install paper-report」即可一键安装，无需额外配置。

paper-report 是免费的吗？

是的，paper-report 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

paper-report 支持哪些平台？

paper-report 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 paper-report？

由 xu.li（@nuaalixu）开发并维护，当前版本 v1.0.3。

paper-report