
GLM-V-PDF-to-WEB

Author: zai-org · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 334 · Favorites: 0 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install glmv-pdf-to-web
Description
Convert a PDF (research paper, technical report, or project document) into a beautiful single-page academic/project website with a structured outline JSON. T...
Usage (SKILL.md)

PDF → Academic Project Website Skill

Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.

Scripts are in: {SKILL_DIR}/scripts/

Dependencies

Python packages (install once):

pip install pymupdf pillow

System tools: curl (pre-installed on macOS/Linux).

When to Use

Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.

Output Directory Convention

All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:

web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured web plan (WebPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html          ← the website
  • <pdf_stem> = PDF filename without extension
  • <timestamp> = format YYYYMMDD_HHMMSS
  • HTML references images via relative path crops/<name>_crop.png

Input

$ARGUMENTS is the path to the PDF file (local) or an HTTP/HTTPS URL.

  • If user provides a URL: download with curl first, then convert
  • If user provides a local PDF path: convert directly
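
The URL-versus-local decision above can be sketched in Python as follows. This is a minimal illustration, not part of the skill's scripts: the helper names `classify_input` and `pdf_stem_of` are hypothetical, and the choice to ignore URL query strings when deriving the stem is an assumption.

```python
import os
from urllib.parse import urlparse


def classify_input(argument: str) -> str:
    """Return 'url' for HTTP/HTTPS inputs and 'local' for everything else."""
    scheme = urlparse(argument).scheme.lower()
    return "url" if scheme in ("http", "https") else "local"


def pdf_stem_of(argument: str) -> str:
    """PDF filename without extension (hypothetical helper).

    For URLs only the path component is used, so a query string such as
    '?download=1' does not leak into the output directory name.
    """
    if classify_input(argument) == "url":
        path = urlparse(argument).path
    else:
        path = argument
    return os.path.splitext(os.path.basename(path))[0]


print(classify_input("https://example.com/papers/paper.pdf?download=1"))
print(pdf_stem_of("https://example.com/papers/paper.pdf?download=1"))
print(classify_input("/home/user/report.pdf"))
```

A URL input would then be downloaded with curl before conversion, while a local path goes straight to Phase 1.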

Workflow

Phase 0 — Create Output Directory

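import os, datetime

# pdf_path and workspace are assumed to be set already (the input PDF and {WORKSPACE})
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
# Create the output directory and its crops/ subdirectory (equivalent to mkdir -p)
os.makedirs(os.path.join(out_dir, "crops"), exist_ok=True)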

Phase 1 — Convert PDF Pages to Images (DPI 120)

If the input is a URL, download it first:

pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"

Then convert (pass either the downloaded path or the original local path):

python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120

Outputs JSON to stdout:

[{"page": 1, "path": "/abs/path/page_001.png"}, ...]

Parse and store the full page → path map.
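
As a rough sketch of that parsing step, assuming the script's stdout has already been captured as a string and has exactly the shape shown above (the function name `parse_page_map` is illustrative, not part of the skill):

```python
import json
from typing import Dict


def parse_page_map(stdout_text: str) -> Dict[int, str]:
    """Turn the JSON list printed by pdf_to_images.py into {page: path}."""
    entries = json.loads(stdout_text)
    return {int(entry["page"]): entry["path"] for entry in entries}


# Sample input mirroring the documented output format.
sample = (
    '[{"page": 1, "path": "/abs/path/page_001.png"},'
    ' {"page": 2, "path": "/abs/path/page_002.png"}]'
)
page_map = parse_page_map(sample)
for page in sorted(page_map):
    print(page, page_map[page])
```

Keeping the map keyed by page number makes it easy to fill in the url field of each required image in Phase 3.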


Phase 2 — Read All Pages in Order

View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.

While reading, note:

  • Title, authors, affiliations, venue, year
  • Abstract text (verbatim)
  • Key contributions
  • Paper/Code/Dataset links (arXiv, GitHub, etc.)
  • Figures, tables, diagrams — which pages, rough regions
  • Teaser/hero figure if present

Do NOT plan sections yet — read everything first.


Phase 3 — Plan Sections & Save outline.json

Plan the website sections. Standard structure for academic papers (adapt as needed):

| section_id | Purpose |
|---|---|
| hero | Title, authors, venue badge, link buttons |
| abstract | Full abstract text |
| contributions | 3–5 key contribution cards |
| method | Architecture figure + method explanation |
| results | Quantitative table + qualitative figures |
| conclusion | Brief conclusion |
| citation | BibTeX block |

For each section that needs an image, identify:

  • Which page it comes from (the local page path from Phase 1)
  • A description of what the visual shows and why it belongs in this section

Save as <out_dir>/outline.json using exactly this schema:

{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}

Field notes:

  • lang: "Chinese" or "English" — match the PDF language
  • required_images: empty array [] if section needs no images
  • url: the local file path of the source page (from Phase 1 path field)
  • For images that need cropping, note the approximate region — exact crop boxes are determined in Phase 4

Write outline.json using the Write tool to <out_dir>/outline.json.
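
Before moving on, it may help to sanity-check the saved file against the schema above. The following is a minimal sketch, not a validator shipped with the skill: `validate_outline` and its error messages are hypothetical, and it only checks the keys visible in the example schema.

```python
REQUIRED_TOP = {"project_title": str, "lang": str, "authors": list, "sections_plan": list}
SECTION_KEYS = ("section_index", "section_id", "title", "content", "required_images")
IMAGE_KEYS = ("url", "visual_description", "usage_reason")


def validate_outline(outline: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(outline.get(key), expected):
            problems.append(f"{key}: expected {expected.__name__}")
    if outline.get("lang") not in ("Chinese", "English"):
        problems.append('lang: must be "Chinese" or "English"')
    sections = outline.get("sections_plan")
    if not isinstance(sections, list):
        return problems
    for i, section in enumerate(sections):
        missing = [k for k in SECTION_KEYS if k not in section]
        if missing:
            problems.append(f"sections_plan[{i}]: missing {missing}")
        # An empty required_images list is valid and simply yields no checks.
        for j, image in enumerate(section.get("required_images") or []):
            absent = [k for k in IMAGE_KEYS if k not in image]
            if absent:
                problems.append(f"sections_plan[{i}].required_images[{j}]: missing {absent}")
    return problems


example = {
    "project_title": "Paper Title",
    "lang": "English",
    "authors": ["Author One", "Author Two"],
    "sections_plan": [
        {"section_index": 1, "section_id": "hero", "title": "Hero",
         "content": "Title, authors, venue", "required_images": []},
    ],
}
print(validate_outline(example))
```

In practice the dictionary would come from `json.load` on the saved outline.json.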


Phase 4 — Crop Required Images (Grounding + Subagent)

IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.

IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.

Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.

Use the Agent tool like this:

Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]

Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and crop details with actual values from your context.

The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}

Collect results from all subagents and build the mapping: section_id → [crop filename, ...] to reference in HTML.

Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
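
To make the 0–999 coordinate convention concrete, here is one way normalized boxes could map to pixels. This is only an illustration for reasoning about coordinates: the internals of crop.py are not shown in this document, so the scaling rule (999 maps to the full image edge) and the function name are assumptions, and all actual cropping should still go through crop.py as required above.

```python
def normalized_box_to_pixels(box, width, height):
    """Map [x1, y1, x2, y2] in 0-999 units to a pixel box (assumed scaling)."""
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999):
        raise ValueError(f"box must satisfy 0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999: {box}")

    def scale(value, size):
        # 0 maps to the left/top edge, 999 to the right/bottom edge.
        return round(value * size / 999)

    return (scale(x1, width), scale(y1, height), scale(x2, width), scale(y2, height))


# A full-image box on a hypothetical 1000x800 page image.
print(normalized_box_to_pixels([0, 0, 999, 999], 1000, 800))
```

A box given in percentages (0–100) or in raw pixels would pass or fail this range check for the wrong reasons, which is why the subagent prompt spells out that the units are thousandths.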


Phase 5 — Measure Cropped Image Dimensions

python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"

| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | max-width: 400–500px, centered |
| 0.7 – 1.3 (square-ish) | max-width: 600–700px |
| > 1.3 (wide) | Full-width, max-width: 100% |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
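
The layout recommendations above can be expressed as a small lookup. This is a sketch under assumptions: the function name `layout_for_aspect` and the returned label strings are illustrative, and aspect ratios above 2.0 are treated as very wide rather than merely wide.

```python
def layout_for_aspect(aspect: float) -> str:
    """Pick a layout hint from a width/height ratio (hypothetical helper)."""
    if aspect < 0.7:
        return "narrow: max-width 400-500px, centered"
    if aspect <= 1.3:
        return "square-ish: max-width 600-700px"
    if aspect <= 2.0:
        return "wide: full-width, max-width 100%"
    return "very wide: full-width with horizontal scroll fallback"


# Example sizes in the shape produced by the measurement step.
sizes = {
    "fig_arch_crop.png": {"width": 1200, "height": 500, "aspect": 2.4},
    "table_results_crop.png": {"width": 800, "height": 700, "aspect": 1.14},
}
for name, info in sizes.items():
    print(name, "->", layout_for_aspect(info["aspect"]))
```

The sample dimensions are made up for the example; real values come from the Pillow measurement in this phase.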

Phase 6 — Generate the Single-Page HTML

Step A — Write HTML to /tmp/website.html

  • All <img src="..."> must use relative paths: crops/<name>_crop.png
  • Do NOT use absolute paths
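
One way to check the relative-path rule before saving is to scan the HTML for img tags with the standard library parser. This is a minimal sketch and not part of generate_web.py; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def non_conforming_images(html_text: str) -> list:
    """Return img sources that are not of the form crops/<name>_crop.png."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    return [
        src for src in parser.sources
        if not (src.startswith("crops/") and src.endswith("_crop.png"))
    ]


page = (
    '<figure><img src="crops/fig_arch_crop.png" alt="Architecture"></figure>'
    '<figure><img src="/tmp/out/crops/table_results_crop.png" alt="Results"/></figure>'
)
print(non_conforming_images(page))
```

An empty result means every image reference follows the convention; anything listed should be rewritten before running Step B.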

Step B — Save:

python {SKILL_DIR}/scripts/generate_web.py \
    --html-file /tmp/website.html \
    --title "<paper title>" \
    --out-dir "<out_dir>/"

HTML Spec

A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.

Page layout:

  • Max content width: 900px, centered, comfortable side padding
  • Sticky top nav with section anchor links + smooth scroll
  • Looks good at 1200px wide; readable at 768px

Typography:

  • Two Google Fonts: one for headings, one for body/UI
  • Body: 17–18px, line-height 1.7
  • Strong heading hierarchy (h1 >> h2 >> h3)

Visual style:

  • If the user specifies a style, follow it exactly
  • Otherwise, infer an appropriate aesthetic from the paper's domain and tone (e.g. CV/ML paper → clean modern academic; systems paper → dark technical; humanities → warm editorial serif)
  • Define colors and fonts as CSS variables; no fixed palette or font choices are required

Section guidelines:

hero:

  • Large title (2–3rem), authors list with affiliation superscripts, venue badge pill
  • Link buttons: [📄 Paper] [💻 Code] [🗄️ Dataset] — grey out if no URL
  • Teaser figure below (if found)

abstract:

  • Verbatim text with subtle left border accent

contributions:

  • Cards in a 2–3 column CSS grid, each with Unicode symbol + heading + description

method:

  • Full-width architecture figure (<figure><img><figcaption>) + prose explanation

results:

  • Quantitative table as real <table>; use actual numbers from the PDF, best numbers bolded
  • Qualitative figures in a grid (2–4 images with captions)

conclusion:

  • 2–3 paragraphs

citation:

  • <pre><code> BibTeX block reconstructed from PDF metadata
  • "Copy" button using navigator.clipboard vanilla JS

Images:

  • All <img> use relative paths: crops/<name>_crop.png
  • Add loading="lazy" and descriptive alt
  • Wrap in <figure> with <figcaption>

Animations (subtle only):

  • Fade-in on scroll via IntersectionObserver + CSS transitions
  • Hover states on buttons/cards


Quality Checklist

  • Output directory named <pdf_stem>_<timestamp>/
  • outline.json saved with valid WebPlan schema
  • All crops saved to crops/ (local only)
  • All metadata (title, authors, venue, year) from the PDF
  • Abstract is verbatim
  • Quantitative table has real numbers from the paper
  • All crop images referenced via crops/<name>_crop.png
  • BibTeX block accurate and copyable
  • Nav anchors scroll to correct sections
  • generate_web.py called and confirmed success

Language

Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.

Security Recommendations
This skill appears to do what it says: it converts PDFs to page images, identifies content, crops figures, and writes an outline.json plus index.html. Before installing/using it consider: (1) it will download PDFs with curl if you pass a URL — ensure you trust the source; (2) the workflow deliberately runs a fresh subagent to do cropping, which means page images are sent to that subagent (possible privacy exposure for sensitive documents); (3) you need to install pymupdf and pillow locally for rasterization and cropping; (4) there are no requested credentials or hidden network endpoints in the code files included here. If you plan to process private or confidential PDFs, avoid sending them to remote/third-party models or ensure the subagent runs in an environment you control. If you want greater assurance, inspect the full (non‑truncated) SKILL.md Agent tool call details to confirm how the subagent is invoked and where image data is routed.
Functional Analysis
Type: OpenClaw Skill · Name: glmv-pdf-to-web · Version: 1.0.1
The skill bundle provides legitimate functionality for converting PDFs into academic websites, but the instructions in SKILL.md introduce shell injection vulnerabilities. Specifically, the workflow suggests using `curl` and Python scripts with unsanitized user-provided input (e.g., `$ARGUMENTS` and `<pdf_path>`) in shell commands. While the included Python scripts (`pdf_to_images.py`, `crop.py`, and `generate_web.py`) appear benign and focused on their stated tasks, the instruction-level pattern of executing shell commands with external data is a high-risk vulnerability.
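
One common mitigation for this class of problem is to pass arguments as a list so that no shell ever interprets them. The sketch below illustrates the idea only; it is not how the skill currently runs its commands, and the helper names are hypothetical. The curl flags `-L` and `-o` and the `--dpi` option are the ones used in the workflow.

```python
import subprocess
import sys


def build_download_command(url: str, dest: str) -> list:
    """Argv list for curl; rejects anything that is not HTTP/HTTPS."""
    if not url.lower().startswith(("http://", "https://")):
        raise ValueError("only HTTP/HTTPS URLs are accepted")
    return ["curl", "-L", "-o", dest, url]


def build_convert_command(script_path: str, pdf_path: str, dpi: int = 120) -> list:
    """Argv list for pdf_to_images.py; pdf_path is passed as one literal argument."""
    return [sys.executable, script_path, pdf_path, "--dpi", str(dpi)]


# Demonstration with a harmless child process: shell metacharacters in the
# argument are delivered verbatim instead of being executed.
hostile = 'paper.pdf"; echo injected; "'
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip() == hostile)
```

Because `subprocess.run` receives a list and `shell=True` is not set, quoting and semicolons in a filename or URL cannot break out into a second command.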
Capability Assessment
Purpose & Capability
Name/description match the provided scripts and instructions: pdf_to_images.py converts pages to PNG, crop.py performs local crops, and generate_web.py writes the final HTML. Required tools (PyMuPDF, Pillow, curl) are proportional to the task.
Instruction Scope
SKILL.md stays within scope (download PDF if URL, rasterize pages, read pages, produce outline.json, crop images, produce HTML). It requires delegating cropping to a fresh subagent via the Agent tool (explicitly), which means page images will be processed by that subagent; this is coherent for grounding accuracy but has privacy implications for sensitive documents.
Install Mechanism
Instruction-only skill with no install spec (low risk). Dependency installation is limited to pip-installable Python packages (pymupdf, pillow) and uses standard, well-known tools; no network downloads of arbitrary executables are present.
Credentials
No environment variables, credentials, or config paths are requested. All file I/O is limited to workspace/output directories and /tmp when downloading PDFs — consistent with the stated functionality.
Persistence & Privilege
always:false and user-invocable true. The skill writes files only to the declared output directory and uses a subagent for cropping; it does not request persistent privileges or modify other skills/configurations.
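How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install glmv-pdf-to-web
  3. Once installed, invoke the Skill by name or trigger it with /glmv-pdf-to-web
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version History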
v1.0.1
  • Removed the requirement for specific system binaries in metadata.
  • No changes to features or functionality; behavior remains the same.
  • Internal metadata was streamlined for clarity.
v1.0.0
  • Initial release of glmv-pdf-to-web skill.
  • Converts a PDF (in Chinese or English) into a single-page academic or project website, structured like conference paper homepages.
  • Processes research papers, technical reports, or project documents as local files or URLs.
  • Generates an HTML site, a structured outline.json, and locally-cropped image assets.
  • Includes a clear workflow for PDF image conversion, content planning, and precise figure cropping via subagents.
  • Stores all outputs in an organized directory under workspace/web.
Metadata
Slug: glmv-pdf-to-web
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

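What is GLM-V-PDF-to-WEB?

Convert a PDF (research paper, technical report, or project document) into a beautiful single-page academic/project website with a structured outline JSON. T... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 334 total downloads to date.

How do I install GLM-V-PDF-to-WEB?

Run the command /install glmv-pdf-to-web in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is GLM-V-PDF-to-WEB free?

Yes. GLM-V-PDF-to-WEB is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does GLM-V-PDF-to-WEB support?

GLM-V-PDF-to-WEB is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops GLM-V-PDF-to-WEB?

It is developed and maintained by zai-org (@zai-org); the current version is v1.0.1.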
