yjr-123456

Document Workflow

Author: Jiarong Yu · GitHub · v1.3.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 465
Favorites: 1
Current installs: 2
Versions: 10
Install in OpenClaw
/install document-workflow
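Description
A one-step workflow for academic papers: search, download, chunked text extraction, and structured summarization, with filtering by year and citation count.
Usage (SKILL.md)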

Document Workflow

Academic paper research: Search → Download LaTeX → Read & Summarize


Quick Start

1. Search Papers

python -m skills.document-workflow.scripts.search_papers --query "world model" --max_results 5 --year_from 2024

2. Download LaTeX Source

python -m skills.document-workflow.scripts.latex_reader "2301.07088" --keep

3. Read & Summarize

Read the LaTeX source files and summarize them by following the Reading Guide below.

Reading Guide

After downloading LaTeX source to arxiv_{id}/, read the .tex files in this order:

Step 1: Get Metadata

Read the main .tex file (usually main.tex, root.tex, or {paper-id}.tex) for:

  • \title{} - Paper title
  • \author{} - Authors
  • \begin{abstract}...\end{abstract} - Abstract
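
The three fields above can be pulled out of a main .tex file with a small helper. This is a minimal sketch, not part of the skill's scripts: the names `read_braced` and `extract_metadata` are hypothetical, and it assumes the braces in `\title{}` and `\author{}` are balanced and contain no escaped `\{` or `\}`.

```python
import re


def read_braced(text: str, start: int) -> str:
    """Return the content of the brace group that opens at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i]
    raise ValueError("unbalanced braces")


def extract_metadata(tex: str) -> dict:
    """Pull title, author block, and abstract out of a main .tex string."""
    meta = {}
    for field in ("title", "author"):
        # Allow an optional short form such as \title[Short]{Long}.
        m = re.search(r"\\" + field + r"\s*(?:\[[^\]]*\])?\s*\{", tex)
        if m:
            meta[field] = read_braced(tex, m.end() - 1).strip()
    m = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", tex, re.DOTALL)
    if m:
        meta["abstract"] = m.group(1).strip()
    return meta
```

Brace counting is used instead of a plain regex so that nested commands inside the title, such as `\textbf{...}`, do not cut the match short.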

Step 2: Understand the Problem

Read the Introduction section (usually intro.tex, 1-introduction.tex, or first \section):

  • What problem does this paper solve?
  • What are the key contributions?
  • How does it relate to prior work?

Step 3: Understand the Method

Read the Method/Approach section:

  • What is the proposed approach?
  • Key equations in \begin{equation}...\end{equation} or \begin{align}...\end{align}
  • Algorithm pseudocode in \begin{algorithm}...\end{algorithm}
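
To collect the equation, align, and algorithm blocks mentioned above in one pass, a regex-based extractor is usually enough. The sketch below is illustrative only: `extract_environments` is a hypothetical name, and it assumes environments of the same name are not nested inside each other.

```python
import re


def extract_environments(tex: str, names=("equation", "align", "algorithm")) -> dict:
    """Map each environment name to the list of bodies found in tex."""
    found = {}
    for name in names:
        escaped = re.escape(name)
        # \*? also matches starred variants such as align*.
        pattern = r"\\begin\{" + escaped + r"\*?\}(.*?)\\end\{" + escaped + r"\*?\}"
        found[name] = [body.strip() for body in re.findall(pattern, tex, re.DOTALL)]
    return found
```

Passing `names=("table",)` reuses the same function for the results tables in the Experiments section.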

Step 4: Check Experiments

Read the Experiments section:

  • Datasets used
  • Baselines compared
  • Metrics in \begin{table}...\end{table} with results
  • Key findings

Step 5: Check References

Read the .bib or .bbl file for:

  • Related work citations
  • Key papers in the field
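
For a quick overview of what a .bib file contains, the entry types and citation keys can be listed without a full BibTeX parser. This is a rough sketch with a hypothetical helper name, `bib_entries`; it assumes each entry starts with `@type{key,` and does not read the fields inside an entry.

```python
import re


def bib_entries(bib_text: str) -> list:
    """Return (entry_type, citation_key) pairs from BibTeX text."""
    pairs = re.findall(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", bib_text)
    # @string, @comment, and @preamble are not bibliography entries.
    skipped = {"string", "comment", "preamble"}
    return [(kind.lower(), key) for kind, key in pairs if kind.lower() not in skipped]
```

The returned keys can then be matched against `\cite{key}` occurrences in the .tex files to see which references the paper relies on most.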

Output Schema

Summarize the paper in the following JSON format (see ./references/output_schema.json for details):

{
  "paper_title": "Full title",
  "authors": ["Author 1", "Author 2"],
  "source": "arXiv:XXXX.XXXXX",
  "task_definition": {
    "domain": "Research domain",
    "task": "Specific task",
    "problem_statement": "What problem this paper solves",
    "key_contributions": ["Contribution 1", "Contribution 2"]
  },
  "experiments": {
    "datasets": ["Dataset 1", "Dataset 2"],
    "baselines": ["Baseline 1", "Baseline 2"],
    "metrics": [
      {"name": "Metric name", "description": "What it measures","definition":"Mathematical definition or formula for the metric"}
    ],
    "results": [
      {"setting": "Dataset", "metric": "Metric", "proposed_method": "Score", "best_baseline": "Score"}
    ],
    "key_findings": ["Finding 1", "Finding 2"]
  }
}
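
A summary produced in this format can be sanity-checked before it is saved. The following is a minimal sketch: `validate_summary` is a hypothetical helper, and treating every key shown in the example above as required is an assumption, since the authoritative rules live in ./references/output_schema.json.

```python
import json

# Top-level keys from the example above and the JSON type each should have.
REQUIRED = {
    "paper_title": str,
    "authors": list,
    "source": str,
    "task_definition": dict,
    "experiments": dict,
}
# Keys expected inside the two nested objects.
NESTED = {
    "task_definition": ("domain", "task", "problem_statement", "key_contributions"),
    "experiments": ("datasets", "baselines", "metrics", "results", "key_findings"),
}


def validate_summary(raw: str) -> list:
    """Return a list of problems found in a JSON summary string (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for key, expected in REQUIRED.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for parent, children in NESTED.items():
        section = data.get(parent)
        if isinstance(section, dict):
            for child in children:
                if child not in section:
                    problems.append(f"missing key: {parent}.{child}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to ask for a corrected summary in a single round.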

Scripts

Script              Function
search_papers.py    Search papers (Tavily + Semantic Scholar)
download_paper.py   Download PDF (for human reading)
latex_reader.py     Download LaTeX source (for AI reading)

Tips for Reading LaTeX

LaTeX Command                        Meaning
\section{Title}                      Section heading
\subsection{Title}                   Subsection heading
\textbf{text}                        Bold text (often important)
\cite{key}                           Citation reference
\begin{equation}...\end{equation}    Numbered equation
\begin{table}...\end{table}          Table
\begin{figure}...\end{figure}        Figure
\input{file} or \subfile{file}       Include another .tex file

Config

# Optional: Semantic Scholar API key
export SEMANTIC_SCHOLAR_API_KEY="your-key"

# Default download path
C:\Users\Lenovo\Desktop\papers
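
The default path above is specific to one Windows machine, so a portable setup would resolve the download directory at run time. This is a hedged sketch only: the environment variable `DOCUMENT_WORKFLOW_DOWNLOAD_DIR` and the function `download_dir` are hypothetical and are not read by the skill's scripts as published.

```python
import os
from pathlib import Path


def download_dir(environ=os.environ) -> Path:
    """Pick the download directory from a (hypothetical) override, else ~/papers."""
    override = environ.get("DOCUMENT_WORKFLOW_DOWNLOAD_DIR", "").strip()
    if override:
        return Path(override).expanduser()
    # Fall back to a folder under the current user's home directory.
    return Path.home() / "papers"
```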
Security recommendations
This skill appears to do what it says (search arXiv/Semantic Scholar, download PDFs, fetch LaTeX source, and parse .tex files), but there are a few red flags you should consider before installing or running it:

  • Undeclared dependency: search_papers.py will try to call a local 'mcporter' binary to use Tavily. The skill metadata lists no required binaries. If you don't have mcporter, the script falls back, but if mcporter is present the skill will execute it. Verify what 'mcporter' is and only allow it if you trust that binary.
  • Hard-coded API key: a Semantic Scholar API key is embedded as a fallback in the script. Hard-coded keys can be abused or invalid; set your own SEMANTIC_SCHOLAR_API_KEY in the environment instead of relying on the embedded value.
  • File writes and downloads: the scripts will download remote archives and PDFs and extract/write files to disk (default path is C:\Users\Lenovo\Desktop\papers). Run in a sandbox or adjust the download directory to a safe location and ensure you trust the sources of downloaded URLs.
  • Executing bundled code: although there is no installer, the agent will run the included Python modules. Audit the scripts (you already have them) and consider running them manually in a controlled environment before granting the skill autonomous invocation.

Recommended actions:

  • If you want to use this skill, set SEMANTIC_SCHOLAR_API_KEY yourself and remove or replace the hard-coded fallback from a trusted copy of the code.
  • Inspect or block use of 'mcporter' unless you know its origin and trust it. Consider editing the skill to remove the mcporter/Tavily path if you won't use it.
  • Run the scripts in a sandboxed environment (container or VM) or with limited filesystem/network permissions to observe behavior before enabling in production.
  • If you are not comfortable auditing code, avoid installing the skill or ask the maintainer for clarity on mcporter and the embedded API key.
Functional analysis
Type: OpenClaw Skill
Name: document-workflow
Version: 1.3.1

The skill bundle contains a hardcoded Semantic Scholar API key in `scripts/search_papers.py` and hardcoded absolute Windows file paths (e.g., `C:\Users\Lenovo\Desktop\papers`) in `SKILL.md` and `scripts/download_paper.py`, which are significant security and privacy red flags. Additionally, `scripts/latex_reader.py` uses `tarfile.extractall` without sanitization, posing a path traversal risk, and `scripts/search_papers.py` executes external commands via `subprocess.run` to call the `mcporter` utility. While these appear to be development oversights rather than intentional malware, they represent high-risk behaviors and potential credential leakage.
Capability assessment
Purpose & Capability
Name/description align with the code: searching, downloading PDFs, and extracting LaTeX from arXiv. However, the search script expects an external 'mcporter' binary (used to call Tavily) but the skill's manifest/requirements do not declare this dependency. That mismatch (undeclared binary dependency that spawns a local executable) is incoherent with the 'no required binaries' metadata.
Instruction Scope
SKILL.md instructs running the included Python scripts to search, download, and parse LaTeX — which is consistent. The scripts perform network I/O (HTTP GETs to arxiv.org and Semantic Scholar/openAccessPdf URLs) and write files to disk (default path is a Windows Desktop folder). SKILL.md does not clearly warn about executing included code, reliance on local binaries, or the security implications of downloading arbitrary PDF/source URLs.
Install Mechanism
There is no install spec (instruction-only), but the skill bundles runnable Python scripts that the agent will execute. Because execution happens without an explicit install step, the agent may run these scripts directly — this is lower friction but means the provided code will be executed on the host. Additionally, the scripts call external binaries (mcporter) via subprocess when using Tavily; that external dependency is not declared.
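
One way to make an undeclared binary dependency explicit is to look it up only after the user has opted in. This is a sketch of that idea under stated assumptions: `resolve_optional_binary` and its `allowed` flag are hypothetical, and the actual mcporter command line used by search_papers.py is not shown here because it is not documented in this listing.

```python
import shutil


def resolve_optional_binary(name: str, allowed: bool):
    """Return the absolute path of `name` only if the user opted in and it is on PATH."""
    if not allowed:
        return None
    # shutil.which returns None when the executable cannot be found.
    return shutil.which(name)
```

A caller would fall back to the non-Tavily search path whenever this returns None, and can log the resolved path before spawning it.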
Credentials
The skill declares no required env vars, and SKILL.md mentions an optional SEMANTIC_SCHOLAR_API_KEY. The code uses os.environ.get for that key but also supplies a hard-coded fallback API key embedded in the script. Hard-coded keys are a concern (they may be stale, abused, or belong to someone else). Otherwise, the skill does not request broad credentials or unrelated environment access.
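
Reading the key from the environment without any embedded fallback could look like the sketch below. It is illustrative rather than the script's actual code: `semantic_scholar_headers` is a hypothetical helper, and it assumes the key is sent in the `x-api-key` request header, with unauthenticated requests used when no key is set.

```python
import os


def semantic_scholar_headers(environ=os.environ) -> dict:
    """Build request headers, adding the API key only when the user has set one."""
    key = environ.get("SEMANTIC_SCHOLAR_API_KEY", "").strip()
    # No hard-coded fallback: without a user-supplied key, send no key at all.
    return {"x-api-key": key} if key else {}
```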
Persistence & Privilege
The skill is not marked 'always', and it doesn't request persistent elevated privileges or modify other skills. It writes files to a user-specified download directory (defaulting to a Windows Desktop path), which is normal for a downloader but worth noting as file-system write activity.
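How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install document-workflow
  3. Once installation finishes, invoke the skill by name or trigger it with /document-workflow.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.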
Version history
v1.3.1
  • Added a reference to `./references/output_schema.json` for more details on the output schema.
  • Enhanced the metrics schema by including a "definition" field for mathematical definitions or formulas.
  • Minor formatting and clarification updates in the reading guide and output schema instructions.
v1.3.0
  • Major update: streamlined workflow to focus on LaTeX source reading and manual summarization.
  • Removed scripts for automated research and summarization (`research.py`, `summarize.py`).
  • Updated documentation to provide a clear, step-by-step reading guide for LaTeX source files.
  • New JSON output schema defined for consistent, structured summaries.
  • Updated script list; `latex_reader.py` now used for downloading LaTeX sources for AI/manual reading.
v1.2.2
  • Enhanced documentation in SKILL.md with a new section detailing the output folder structure of the one-click research workflow.
  • Added clear bilingual (Chinese & English) instructions on how to ask AI to summarize parsed paper output.
  • No changes to core functionality or scripts, only improved workflow guidance and usability clarity.
v1.2.1
  • SKILL.md is now primarily in English; previously used Chinese for key workflow steps and schemas.
  • All code comments, workflow explanations, and summary schema labels are now in English for broader accessibility.
  • No functional or code-level changes; documentation update only.
  • Table headers in the scripts section and descriptions have been translated and clarified.
v1.2.0
Version 1.2.0 introduces workflow improvements and enhanced AI summarization.
  • Added "read paper" as a new workflow trigger.
  • Expanded workflow to explicitly include AI-based paper summarization.
  • Updated documentation with step-by-step tasks in both English and Chinese, emphasizing structured AI output formats.
  • Added detailed output schema for AI-generated summaries.
  • Clarified script purposes, outputs, and improved process clarity.
v1.1.0
  • Adds LaTeX source parsing support with the new latex_reader.py script.
  • Removes PDF-based text extraction and related scripts (pdf_reader.py, auto_research.py, search_tavily.py).
  • Simplifies workflow: focus is now "Search → Download → Parse LaTeX" instead of PDF chunk extraction.
  • Updates documentation and command examples for streamlined usage and new triggers.
  • One-click research workflow now supports LaTeX parsing with --parse_latex option.
v1.0.3
Remove redundant SKILL.en.md (SKILL.md is now English)
v1.0.2
Rewrite SKILL.md following skill-creator standard format
v1.0.1
Translate SKILL.md to English
v1.0.0
  • Initial release of the "document-workflow" skill for academic papers.
  • Supports one-click workflow: search, download, extract, and summarize papers.
  • Provides separate command-line scripts for each workflow stage (search, download, extract, summarize).
  • Outputs organized folder structure with metadata, PDFs, and extracted JSON chunks.
  • Includes daily automated paper search, summary, and Telegram push via cron job.
  • Troubleshooting and best practice guidance included in documentation.
Metadata
Slug: document-workflow
Version: 1.3.1
License: MIT-0
Total installs: 2
Current installs: 2
Versions: 10
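FAQ

What is Document Workflow?

It is a one-step workflow for academic papers: search, download, chunked text extraction, and structured summarization, with filtering by year and citation count. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 465 times so far.

How do I install Document Workflow?

Run the command "/install document-workflow" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Document Workflow free?

Yes. Document Workflow is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Document Workflow support?

Document Workflow is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Document Workflow?

It is developed and maintained by Jiarong Yu (@yjr-123456). The current version is v1.3.1.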
