GLM-V-PDF-to-WEB

by zai-org · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
334 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install glmv-pdf-to-web
Description
Convert a PDF (research paper, technical report, or project document) into a beautiful single-page academic/project website with a structured outline JSON. T...
README (SKILL.md)

PDF → Academic Project Website Skill

Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.

Scripts are in: {SKILL_DIR}/scripts/

Dependencies

Python packages (install once):

pip install pymupdf pillow

System tools: curl (usually pre-installed on macOS and most Linux distributions).

When to Use

Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.

Output Directory Convention

All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:

web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured web plan (WebPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html          ← the website

  • <pdf_stem> = PDF filename without extension
  • <timestamp> = format YYYYMMDD_HHMMSS
  • HTML references images via relative path crops/<name>_crop.png

Input

$ARGUMENTS is the path to the PDF file (local) or an HTTP/HTTPS URL.

  • If user provides a URL: download with curl first, then convert
  • If user provides a local PDF path: convert directly
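The URL-versus-local-path decision above can be sketched in Python. This is a minimal illustration, not part of the skill's scripts; the helper names `classify_input` and `pdf_stem_of` are hypothetical.

```python
import os
from urllib.parse import urlparse


def classify_input(argument: str) -> str:
    """Return "url" for HTTP/HTTPS inputs and "local" for anything else."""
    scheme = urlparse(argument).scheme.lower()
    return "url" if scheme in ("http", "https") else "local"


def pdf_stem_of(argument: str) -> str:
    """PDF filename without extension, for either a URL or a local path."""
    if classify_input(argument) == "url":
        # Use only the path component so query strings do not leak into the name.
        path = urlparse(argument).path
    else:
        path = argument
    return os.path.splitext(os.path.basename(path))[0]


print(classify_input("https://example.com/papers/paper.pdf"))  # url
print(pdf_stem_of("/home/user/docs/paper.pdf"))                # paper
```

Deriving the stem from the URL's path component keeps query parameters out of the output directory name.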

Workflow

Phase 0 — Create Output Directory

import os, datetime
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")

mkdir -p "<out_dir>/crops"

Phase 1 — Convert PDF Pages to Images (DPI 120)

If the input is a URL, download it first:

pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"

Then convert (pass either the downloaded path or the original local path):

python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120

Outputs JSON to stdout:

[{"page": 1, "path": "/abs/path/page_001.png"}, ...]

Parse and store the full page → path map.
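One way to parse that stdout into a page → path map is sketched below. The function name `parse_page_map` is hypothetical, and the example string simply mirrors the JSON shape shown above.

```python
import json
from typing import Dict


def parse_page_map(stdout_text: str) -> Dict[int, str]:
    """Turn the JSON list printed by pdf_to_images.py into {page_number: image_path}."""
    pages = json.loads(stdout_text)
    return {int(entry["page"]): entry["path"] for entry in pages}


# Example input in the documented shape (paths are placeholders).
example_stdout = (
    '[{"page": 1, "path": "/abs/path/page_001.png"},'
    ' {"page": 2, "path": "/abs/path/page_002.png"}]'
)
page_map = parse_page_map(example_stdout)
print(page_map[1])  # /abs/path/page_001.png
```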


Phase 2 — Read All Pages in Order

View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.

While reading, note:

  • Title, authors, affiliations, venue, year
  • Abstract text (verbatim)
  • Key contributions
  • Paper/Code/Dataset links (arXiv, GitHub, etc.)
  • Figures, tables, diagrams — which pages, rough regions
  • Teaser/hero figure if present

Do NOT plan sections yet — read everything first.


Phase 3 — Plan Sections & Save outline.json

Plan the website sections. Standard structure for academic papers (adapt as needed):

| section_id | Purpose |
|---|---|
| hero | Title, authors, venue badge, link buttons |
| abstract | Full abstract text |
| contributions | 3–5 key contribution cards |
| method | Architecture figure + method explanation |
| results | Quantitative table + qualitative figures |
| conclusion | Brief conclusion |
| citation | BibTeX block |

For each section that needs an image, identify:

  • Which page it comes from (the local page path from Phase 1)
  • A description of what the visual shows and why it belongs in this section

Save as <out_dir>/outline.json using exactly this schema:

{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}

Field notes:

  • lang: "Chinese" or "English" — match the PDF language
  • required_images: empty array [] if section needs no images
  • url: the local file path of the source page (from Phase 1 path field)
  • For images that need cropping, note the approximate region — exact crop boxes are determined in Phase 4

Write outline.json using the Write tool to <out_dir>/outline.json.
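Before moving on, it can help to check the saved outline against the fields listed above. The following is a minimal sketch, assuming the schema is exactly the one shown in this section; `validate_outline` is a hypothetical helper and is not shipped with the skill.

```python
def validate_outline(outline: dict) -> list:
    """Return a list of human-readable problems; an empty list means the outline looks valid."""
    errors = []
    for key in ("project_title", "lang", "authors", "sections_plan"):
        if key not in outline:
            errors.append(f"missing top-level field: {key}")
    if "lang" in outline and outline["lang"] not in ("Chinese", "English"):
        errors.append('lang must be "Chinese" or "English"')

    section_keys = ("section_index", "section_id", "title", "content", "required_images")
    image_keys = ("url", "visual_description", "usage_reason")
    for i, section in enumerate(outline.get("sections_plan", [])):
        for key in section_keys:
            if key not in section:
                errors.append(f"sections_plan[{i}] missing field: {key}")
        images = section.get("required_images", [])
        if not isinstance(images, list):
            errors.append(f"sections_plan[{i}].required_images must be a list")
            continue
        for j, image in enumerate(images):
            for key in image_keys:
                if key not in image:
                    errors.append(f"sections_plan[{i}].required_images[{j}] missing field: {key}")
    return errors


outline = {
    "project_title": "Paper Title",
    "lang": "English",
    "authors": ["Author One", "Author Two"],
    "sections_plan": [
        {
            "section_index": 1,
            "section_id": "hero",
            "title": "Hero",
            "content": "Title, authors, venue, teaser figure description",
            "required_images": [],
        }
    ],
}
print(validate_outline(outline))  # []
```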


Phase 4 — Crop Required Images (Grounding + Subagent)

IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.

IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.

Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.

Use the Agent tool like this:

Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]

Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and crop details with actual values from your context.

The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}

Collect results from all subagents and build the mapping: section_id → [crop filename, ...] to reference in HTML.

Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
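To make the normalized 0–999 convention concrete, the sketch below maps a box to pixel coordinates. It is illustrative only: the internals of crop.py are not shown here, so the divisor of 1000 (chosen to match the "thousandths" wording and the `[0, 0, 500, 500]` top-left-quarter example) is an assumption, and `normalized_box_to_pixels` is a hypothetical name.

```python
def normalized_box_to_pixels(box, width, height, scale=1000):
    """Map [x1, y1, x2, y2] in the 0-999 range to pixel (left, top, right, bottom).

    Assumption: coordinates are thousandths of the image size. The real crop.py
    may round or scale differently.
    """
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999):
        raise ValueError(f"box must satisfy 0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999: {box}")
    left = x1 * width // scale          # round down so the crop never starts late
    top = y1 * height // scale
    right = -(-x2 * width // scale)     # ceiling division so the crop never ends early
    bottom = -(-y2 * height // scale)
    return left, top, min(right, width), min(bottom, height)


# The top-left quarter of a 1000x800 page image.
print(normalized_box_to_pixels([0, 0, 500, 500], 1000, 800))  # (0, 0, 500, 400)
```

Rounding the left/top edges down and the right/bottom edges up errs on the side of a slightly larger crop, which is in keeping with the small margin the prompt asks for.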


Phase 5 — Measure Cropped Image Dimensions

python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"

| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | max-width: 400–500px, centered |
| 0.7 – 1.3 (square-ish) | max-width: 600–700px |
| > 1.3 (wide) | Full-width, max-width: 100% |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
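The aspect-ratio recommendations can be expressed as a small lookup. This is a sketch with a hypothetical function name, `layout_for_aspect`; because the "> 2.0" band overlaps "> 1.3", the widest band is checked first.

```python
def layout_for_aspect(aspect: float) -> str:
    """Map a width/height ratio to the layout recommendation for that band."""
    if aspect > 2.0:
        return "full-width with horizontal scroll fallback"
    if aspect > 1.3:
        return "full-width, max-width: 100%"
    if aspect >= 0.7:
        return "max-width: 600-700px"
    return "max-width: 400-500px, centered"


for ratio in (0.5, 1.0, 1.5, 2.5):
    print(ratio, "->", layout_for_aspect(ratio))
```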

Phase 6 — Generate the Single-Page HTML

Step A — Write HTML to /tmp/website.html

  • All <img src="..."> must use relative paths: crops/<name>_crop.png
  • Do NOT use absolute paths
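A quick way to confirm the relative-path rule before saving is to scan the generated HTML for image sources. The sketch below uses only the Python standard library; `find_bad_img_sources` is a hypothetical helper, and the pattern assumes every image follows the crops/<name>_crop.png convention.

```python
import re
from html.parser import HTMLParser

# Accept only sources of the form crops/<name>_crop.png (no directories in <name>).
CROP_SRC = re.compile(r"^crops/[^/]+_crop\.png$")


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src") or "")


def find_bad_img_sources(html_text: str) -> list:
    """Return every <img> src that does not follow the crops/<name>_crop.png convention."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    return [src for src in collector.sources if not CROP_SRC.match(src)]


sample = (
    '<figure><img src="crops/fig_arch_crop.png" alt="Architecture"></figure>'
    '<img src="/abs/path/page_001.png" alt="Absolute path">'
)
print(find_bad_img_sources(sample))  # ['/abs/path/page_001.png']
```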

Step B — Save:

python {SKILL_DIR}/scripts/generate_web.py \
    --html-file /tmp/website.html \
    --title "<paper title>" \
    --out-dir "<out_dir>/"

HTML Spec

A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.

Page layout:

  • Max content width: 900px, centered, comfortable side padding
  • Sticky top nav with section anchor links + smooth scroll
  • Looks good at 1200px wide; readable at 768px

Typography:

  • Two Google Fonts: one for headings, one for body/UI
  • Body: 17–18px, line-height 1.7
  • Strong heading hierarchy (h1 >> h2 >> h3)

Visual style:

  • If the user specifies a style, follow it exactly
  • Otherwise, infer an appropriate aesthetic from the paper's domain and tone (e.g. CV/ML paper → clean modern academic; systems paper → dark technical; humanities → warm editorial serif)
  • Define colors and fonts as CSS variables; no fixed palette or font choices are required

Section guidelines:

hero:

  • Large title (2–3rem), authors list with affiliation superscripts, venue badge pill
  • Link buttons: [📄 Paper] [💻 Code] [🗄️ Dataset] — grey out if no URL
  • Teaser figure below (if found)

abstract:

  • Verbatim text with subtle left border accent

contributions:

  • Cards in a 2–3 column CSS grid, each with Unicode symbol + heading + description

method:

  • Full-width architecture figure (<figure><img><figcaption>) + prose explanation

results:

  • Quantitative table as real <table> — use actual numbers from the PDF, best numbers bolded
  • Qualitative figures in a grid (2–4 images with captions)

conclusion:

  • 2–3 paragraphs

citation:

  • <pre><code> BibTeX block reconstructed from PDF metadata
  • "Copy" button using navigator.clipboard vanilla JS

Images:

  • All <img> use relative paths: crops/<name>_crop.png
  • Add loading="lazy" and descriptive alt
  • Wrap in <figure> with <figcaption>

Animations (subtle only):

  • Fade-in on scroll via IntersectionObserver + CSS transitions
  • Hover states on buttons/cards


Quality Checklist

  • Output directory named <pdf_stem>_<timestamp>/
  • outline.json saved with valid WebPlan schema
  • All crops saved to crops/ (local only)
  • All metadata (title, authors, venue, year) from the PDF
  • Abstract is verbatim
  • Quantitative table has real numbers from the paper
  • All crop images referenced via crops/<name>_crop.png
  • BibTeX block accurate and copyable
  • Nav anchors scroll to correct sections
  • generate_web.py called and confirmed success

Language

Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.

Usage Guidance
This skill appears to do what it says: it converts PDFs to page images, identifies content, crops figures, and writes an outline.json plus index.html. Before installing/using it consider: (1) it will download PDFs with curl if you pass a URL — ensure you trust the source; (2) the workflow deliberately runs a fresh subagent to do cropping, which means page images are sent to that subagent (possible privacy exposure for sensitive documents); (3) you need to install pymupdf and pillow locally for rasterization and cropping; (4) there are no requested credentials or hidden network endpoints in the code files included here. If you plan to process private or confidential PDFs, avoid sending them to remote/third-party models or ensure the subagent runs in an environment you control. If you want greater assurance, inspect the full (non‑truncated) SKILL.md Agent tool call details to confirm how the subagent is invoked and where image data is routed.
Capability Analysis
Type: OpenClaw Skill
Name: glmv-pdf-to-web
Version: 1.0.1

The skill bundle provides legitimate functionality for converting PDFs into academic websites, but the instructions in SKILL.md introduce shell injection vulnerabilities. Specifically, the workflow suggests using `curl` and Python scripts with unsanitized user-provided input (e.g., `$ARGUMENTS` and `<pdf_path>`) in shell commands. While the included Python scripts (`pdf_to_images.py`, `crop.py`, and `generate_web.py`) appear benign and focused on their stated tasks, the instruction-level pattern of executing shell commands with external data is a high-risk vulnerability.
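One common mitigation for this class of issue is to pass arguments to the child process as a list rather than interpolating them into a shell string. The sketch below illustrates that pattern and is not code from the skill; `build_curl_command` and `download_pdf` are hypothetical helpers, and the scheme check is an added assumption about what inputs should be accepted.

```python
import subprocess
from urllib.parse import urlparse


def build_curl_command(url: str, dest_path: str) -> list:
    """Build an argument list for curl; no shell is involved, so metacharacters stay literal."""
    if urlparse(url).scheme.lower() not in ("http", "https"):
        raise ValueError(f"only http(s) URLs are accepted: {url!r}")
    return ["curl", "-L", "--fail", "-o", dest_path, url]


def download_pdf(url: str, dest_path: str) -> None:
    """Run curl without shell=True; raises CalledProcessError on a failed download."""
    subprocess.run(build_curl_command(url, dest_path), check=True)


# The suspicious suffix stays inside a single argument instead of becoming a second command.
print(build_curl_command("https://example.com/p.pdf; rm -rf ~", "/tmp/p.pdf"))
```

Because the URL is validated to start with an http(s) scheme, it also cannot be mistaken for a curl option beginning with a dash.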
Capability Assessment
Purpose & Capability
Name/description match the provided scripts and instructions: pdf_to_images.py converts pages to PNG, crop.py performs local crops, and generate_web.py writes the final HTML. Required tools (PyMuPDF, Pillow, curl) are proportional to the task.
Instruction Scope
SKILL.md stays within scope (download PDF if URL, rasterize pages, read pages, produce outline.json, crop images, produce HTML). It requires delegating cropping to a fresh subagent via the Agent tool (explicitly), which means page images will be processed by that subagent; this is coherent for grounding accuracy but has privacy implications for sensitive documents.
Install Mechanism
Instruction-only skill with no install spec (low risk). Dependency installation is limited to pip-installable Python packages (pymupdf, pillow) and uses standard, well-known tools; no network downloads of arbitrary executables are present.
Credentials
No environment variables, credentials, or config paths are requested. All file I/O is limited to workspace/output directories and /tmp when downloading PDFs — consistent with the stated functionality.
Persistence & Privilege
always:false and user-invocable true. The skill writes files only to the declared output directory and uses a subagent for cropping; it does not request persistent privileges or modify other skills/configurations.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install glmv-pdf-to-web
  3. After installation, invoke the skill by name or use /glmv-pdf-to-web
  4. Provide the PDF as a local path or an HTTP/HTTPS URL; the skill writes outline.json, crops/, and index.html to the output directory
Version History
v1.0.1
  • Removed the requirement for specific system binaries in metadata.
  • No changes to features or functionality; behavior remains the same.
  • Internal metadata was streamlined for clarity.
v1.0.0
  • Initial release of glmv-pdf-to-web skill.
  • Converts a PDF (in Chinese or English) into a single-page academic or project website, structured like conference paper homepages.
  • Processes research papers, technical reports, or project documents as local files or URLs.
  • Generates an HTML site, a structured outline.json, and locally-cropped image assets.
  • Includes a clear workflow for PDF image conversion, content planning, and precise figure cropping via subagents.
  • Stores all outputs in an organized directory under workspace/web.
Metadata
Slug glmv-pdf-to-web
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is GLM-V-PDF-to-WEB?

GLM-V-PDF-to-WEB is an AI Agent Skill for Claude Code / OpenClaw that converts a PDF (research paper, technical report, or project document) into a single-page academic/project website with a structured outline JSON. It has 334 downloads so far.

How do I install GLM-V-PDF-to-WEB?

Run "/install glmv-pdf-to-web" in the OpenClaw or Claude Code chat to install it in one step. The Python packages pymupdf and pillow must be installed separately with pip.

Is GLM-V-PDF-to-WEB free?

Yes, GLM-V-PDF-to-WEB is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GLM-V-PDF-to-WEB support?

GLM-V-PDF-to-WEB is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GLM-V-PDF-to-WEB?

It is built and maintained by zai-org (@zai-org); the current version is v1.0.1.
