
Deep Research Pro v2.2

by xueylee-dotcom · GitHub ↗ · v2.2.0 · MIT-0
cross-platform ✓ Security Clean
147 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install deep-research-v22
Description
Conducts thorough research by downloading full PDFs, extracting structured data with original text quotes, verifying sources, and generating cross-validated...
README (SKILL.md)

Skill: Deep Research Pro (v2.2 - True Depth)

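Version: 2.2.0
Description: A true deep-research skill that enforces full-text parsing, structured extraction, and provenance verification.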

Core Principle

Without actually reading the original text, there is no deep research.


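🔴 Mandatory Workflow (new in v2.2)

Step 1: Research Planning (must produce a file)

  • Generate research/plan.md
  • List at least 5 specific search queries
  • Continue only after the user confirms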
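
The planning gate above can be sketched as a small check. This is only an illustration: the README does not define the layout of research/plan.md, so the sketch assumes one query per bullet or numbered line, and `count_queries` and `plan_is_ready` are hypothetical helper names that are not part of the skill.

```python
import re
from pathlib import Path

MIN_QUERIES = 5  # Step 1 asks for at least 5 specific search queries

# Assumption: each query sits on its own bullet ("-", "*", "•") or numbered line.
QUERY_LINE = re.compile(r"^[ \t]*(?:[-*•]|\d+[.)])[ \t]+\S", re.MULTILINE)


def count_queries(plan_text: str) -> int:
    """Count list items in the plan text, treating each one as a search query."""
    return len(QUERY_LINE.findall(plan_text))


def plan_is_ready(plan_path: Path, user_confirmed: bool) -> bool:
    """Return True only if the plan file exists, lists enough queries,
    and the user has explicitly confirmed it."""
    if not plan_path.is_file():
        return False
    enough = count_queries(plan_path.read_text(encoding="utf-8")) >= MIN_QUERIES
    return bool(user_confirmed and enough)
```

A caller would typically stop the workflow when `plan_is_ready(Path("research/plan.md"), user_confirmed)` returns False.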

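Step 2: Full-Text Parsing + Structured Extraction (the core step!)

Do not skip this step!

For every valid source, you must:

  1. Obtain the full text

    # Use the extract-from-pdf.py script
    python3 scripts/extract-from-pdf.py card-001 "https://arxiv.org/pdf/xxx.pdf"
    
    • If a DOI or URL is available, try to download the PDF
    • If the full text cannot be obtained, mark the card full_text: false and skip the source
  2. Structured extraction (taken from the PDF's original text)

    • Sample size: the exact number
    • Main results: exact value + unit + statistical significance
    • Cost impact: exact amount or percentage
    • Confidence interval: 95% CI
    • Original quote: at least 50 characters copied verbatim from the body text
  3. Update the card

    • Replace every "待提取" ("to be extracted") placeholder with the real extracted data
    • Mark full_text: true/false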
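
One way to picture a card after extraction is as a small record with a completeness check. This is a sketch under assumptions: the README does not specify the card schema, so the field names below are hypothetical, and `card_problems` is an illustrative helper rather than anything shipped with the skill. Only the "待提取" placeholder, the `full_text` flag, and the 50-character quote rule come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PLACEHOLDER = "待提取"   # the "to be extracted" marker used in the cards
MIN_QUOTE_CHARS = 50     # Step 2: copy at least 50 characters from the body text


@dataclass
class SourceCard:
    # Field names are assumptions made for this sketch.
    card_id: str
    full_text: bool
    sample_size: Optional[int] = None
    main_result: str = PLACEHOLDER
    cost_impact: str = PLACEHOLDER
    ci_95: Optional[Tuple[float, float]] = None
    quote: str = PLACEHOLDER


def card_problems(card: SourceCard) -> List[str]:
    """List the reasons a card does not yet satisfy the Step 2 requirements."""
    problems = []
    if not card.full_text:
        problems.append("full text not obtained")
    if card.sample_size is None:
        problems.append("sample size missing")
    if card.ci_95 is None:
        problems.append("95% CI missing")
    for name in ("main_result", "cost_impact", "quote"):
        if PLACEHOLDER in getattr(card, name):
            problems.append(f"{name} still contains {PLACEHOLDER}")
    if PLACEHOLDER not in card.quote and len(card.quote) < MIN_QUOTE_CHARS:
        problems.append("quote shorter than 50 characters")
    return problems
```

An empty list from `card_problems` means the card passes this sketch's checks. Sources that report no confidence interval would need a looser rule than the one shown here.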

Minimum requirements

  • deep mode: at least 10 cards with full-text extraction
  • Quality threshold: post-extraction score ≥ 6/10

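Step 3: Provenance Verification (mandatory check!)

Before generating the report, you must run:

bash scripts/check-sourcing.sh reports/final-report.md sources/
  • Check that the data behind every [[card-xxx]] citation exists in the cited card
  • If any data cannot be traced to a source, refuse to generate the report
  • Fix the problems and re-run the verification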
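
The idea behind the provenance check can be illustrated in a few lines. This sketch is not the logic of check-sourcing.sh, whose contents are not shown here. It assumes cards live at sources/card-NNN.md (the naming used by the quality gate), treats any number on a citing line as a data point, and uses a hypothetical function name, `untraceable_claims`.

```python
import re
from pathlib import Path
from typing import List

CITATION = re.compile(r"\[\[(card-\d+)\]\]")
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def untraceable_claims(report_text: str, sources_dir: Path) -> List[str]:
    """Return one message per cited number that is missing from its cited cards,
    and one per citation whose card file does not exist."""
    problems = []
    for line_no, line in enumerate(report_text.splitlines(), start=1):
        card_ids = CITATION.findall(line)
        if not card_ids:
            continue
        # Remove the citation markers first so "001" in card-001 is not a data point.
        numbers = NUMBER.findall(CITATION.sub("", line))
        cited_text = ""
        for card_id in card_ids:
            card_path = sources_dir / f"{card_id}.md"
            if card_path.is_file():
                cited_text += card_path.read_text(encoding="utf-8") + "\n"
            else:
                problems.append(f"line {line_no}: {card_id} has no card file")
        for number in numbers:
            # Crude substring match: "12" would also be found inside "412".
            if number not in cited_text:
                problems.append(f"line {line_no}: {number} not found in cited cards")
    return problems
```

A non-empty result corresponds to the "refuse to generate the report" branch described above.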

Step 4: Cross-Analysis

  • Generate analysis/synthesis.md
  • Identify at least 3 sets of conflicting data
  • Label every claim with its source card
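
The README does not say how conflicting data should be detected. As one possible reading, the sketch below flags two cards as conflicting when they report the same metric with 95% confidence intervals that do not overlap. That criterion, the input shape, and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
MIN_CONFLICTS = 3  # Step 4 asks for at least 3 sets of conflicting data


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap when each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_contradictions(
    estimates: Dict[str, Dict[str, Interval]]
) -> List[Tuple[str, str, str]]:
    """estimates maps metric name -> {card_id: (ci_low, ci_high)}.
    Returns (metric, card_a, card_b) for every pair with disjoint intervals."""
    conflicts = []
    for metric, by_card in estimates.items():
        for (id_a, ci_a), (id_b, ci_b) in combinations(sorted(by_card.items()), 2):
            if not intervals_overlap(ci_a, ci_b):
                conflicts.append((metric, id_a, id_b))
    return conflicts
```

Comparing `len(find_contradictions(estimates))` against `MIN_CONFLICTS` gives a rough signal of whether the synthesis meets the requirement. Each returned tuple already names the two cards to cite.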

Step 5: Report Generation

  • Generate reports/final-report.md
  • Tag every data point with [[card-xxx]]
  • Append the provenance check results to the end of the report
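
A quick way to spot data points that lack a card tag is to look for report lines that contain digits but no [[card-xxx]] marker. This is a rough heuristic written for illustration, not part of the skill. It assumes headings start with "#" and that any digit signals a data point, and `uncited_lines` is a hypothetical name.

```python
import re
from typing import List

CITATION = re.compile(r"\[\[card-\d+\]\]")
DIGIT = re.compile(r"\d")


def uncited_lines(report_text: str) -> List[int]:
    """Return 1-based line numbers that contain a digit but no [[card-xxx]] tag.
    Blank lines and lines starting with '#' (assumed headings) are ignored."""
    flagged = []
    for line_no, line in enumerate(report_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if DIGIT.search(stripped) and not CITATION.search(stripped):
            flagged.append(line_no)
    return flagged
```

Dates and section numbers will also be flagged, so the output is a review list rather than a hard failure.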

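🔧 Tool Dependencies

| Tool | Purpose | Status |
| --- | --- | --- |
| pdfplumber | Full-text PDF parsing | ✅ Installed |
| pdftotext | Fallback PDF parsing | ✅ Installed |
| extract-from-pdf.py | Structured data extraction | ✅ Created |
| check-sourcing.sh | Provenance verification | ✅ Created |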
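
The primary/fallback relationship between pdfplumber and pdftotext might look like the sketch below. It is not the contents of extract-from-pdf.py. The function names are hypothetical, and the sketch assumes a PDF that has already been downloaded to a local path.

```python
import subprocess
from typing import Callable, Optional, Sequence


def extract_with_pdfplumber(pdf_path: str) -> str:
    """Primary parser: concatenate the text of every page."""
    import pdfplumber  # imported lazily so the fallback still works without it

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pdftotext(pdf_path: str) -> str:
    """Fallback parser: call the pdftotext CLI and read the text from stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_text(
    pdf_path: str,
    extractors: Sequence[Callable[[str], str]] = (
        extract_with_pdfplumber,
        extract_with_pdftotext,
    ),
) -> Optional[str]:
    """Try each extractor in order and return the first non-empty text.
    Returns None when every extractor fails or yields nothing."""
    for extractor in extractors:
        try:
            text = extractor(pdf_path)
        except Exception:
            continue  # missing dependency, unreadable file, or parser error
        if text and text.strip():
            return text
    return None
```

A None result maps naturally onto marking the card full_text: false.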

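📋 Commands

Full workflow

# Step 1: Planning
# Edit research/plan.md and confirm the search queries

# Step 2: Search + extraction (repeat in a loop)
# For each source:
python3 scripts/extract-from-pdf.py card-001 "URL"
# Check the extraction result and fill in the card

# Step 3: Provenance verification
bash scripts/check-sourcing.sh reports/final-report.md sources/

# Steps 4-5: Analysis and report
# Generate the final report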

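⚠️ Limitations

If the full text cannot be obtained (paywalled papers or reports):

  1. Mark the card full_text: false
  2. Treat that source's data in the report as reference only, not as a basis for core conclusions
  3. Recommend manual review of key data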
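
Separating core sources from reference-only ones can be done by reading the full_text flag in each card. The sketch below assumes cards are stored as sources/card-*.md and carry a literal "full_text: true" line, as the quality gate's grep suggests. `split_cards` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def split_cards(sources_dir: Path) -> Tuple[List[str], List[str]]:
    """Return (core, reference_only) card ids based on the full_text flag.
    Cards without a literal 'full_text: true' are treated as reference only."""
    core: List[str] = []
    reference_only: List[str] = []
    for card_path in sorted(Path(sources_dir).glob("card-*.md")):
        text = card_path.read_text(encoding="utf-8")
        if "full_text: true" in text:
            core.append(card_path.stem)
        else:
            reference_only.append(card_path.stem)
    return core, reference_only
```

Only the first list would feed core conclusions. The second list marks the data that needs manual review.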

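📊 Version Comparison

| Dimension | v2.1 | v2.2 |
| --- | --- | --- |
| PDF parsing | | ✅ Mandatory |
| Data extraction | "待提取" ("to be extracted") placeholders | ✅ Real extraction |
| Original quotes | Boilerplate template text | ✅ Copied from the body text |
| Provenance check | | ✅ Mandatory verification |
| Report quality | Citations without verification | Citations + verification |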

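Quality Gate (strengthened in v2.2)

# 1. Check the card count (at least 10 with full text)
FULLTEXT_COUNT=$(grep -l "full_text: true" sources/card-*.md 2>/dev/null | wc -l)
if [ "$FULLTEXT_COUNT" -lt 10 ]; then
 echo "❌ Error: fewer than 10 full-text cards, currently $FULLTEXT_COUNT"
 exit 1
fi

# 2. Check provenance
if ! bash scripts/check-sourcing.sh reports/final-report.md sources/; then
 echo "❌ Error: the report contains data that cannot be traced to a source"
 exit 1
fi

# 3. Check for leftover "待提取" ("to be extracted") placeholders
if grep -q "待提取" sources/card-*.md 2>/dev/null; then
 echo "❌ Error: some cards still contain '待提取' placeholders"
 exit 1
fi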

Skill version: 2.2.0 | Last updated: 2026-03-19

Usage Guidance
This skill appears coherent with its stated purpose, but take these precautions before installing or executing it:

  • Ensure your environment has Python3, pdfplumber (pip) or pdftotext + poppler installed; the package/dependency requirements are not declared by the registry.
  • Run the workflow in an isolated environment (VM/container) because it downloads and opens arbitrary PDFs; malformed PDFs can exploit local parsers.
  • Review the code (extract-from-pdf.py and check-sourcing.sh) yourself if you can; they perform network downloads and local file writes but contain no hidden exfiltration. Verify the User-Agent and URL handling if you have network-policy constraints.
  • Be aware the skill enforces copying verbatim 50+ character quotes from sources; consider copyright/privacy rules when including such text in generated reports.
  • Confirm grep -P availability (the shell script uses PCRE patterns) or adjust scripts for your environment.

If you need higher assurance, request the author add an explicit install spec (pip/apt/poppler), declare required binaries, and add input validation/sanitization for PDF URLs before running.
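
The suggested input validation for PDF URLs could start from something like the sketch below. It is an illustration only, not code from the skill. The policy (http/https only, no localhost, no private, loopback, or link-local IP literals) and the name `is_acceptable_pdf_url` are assumptions, and hostnames are not resolved, so a DNS name that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_pdf_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is not obviously local or private."""
    parsed = urlparse(url.strip())
    host = parsed.hostname
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

Running this check before handing a URL to the extraction script rejects file:// paths and loopback addresses early. Sandboxing is still needed for the PDF parsing itself.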
Capability Analysis
Type: OpenClaw Skill
Name: deep-research-v22
Version: 2.2.0

The skill bundle provides a structured framework for automated deep research, including PDF data extraction, quality scoring, and citation verification. The included scripts (extract-from-pdf.py, check-sourcing.sh, quality-score.py) perform tasks consistent with the stated purpose, and the instructions in SKILL.md use authoritative language to enforce a rigorous research methodology rather than to facilitate prompt injection or unauthorized actions.
Capability Assessment
Purpose & Capability
The name/description (full‑text extraction, quoting, source verification) align with the included scripts (extract-from-pdf.py, check-sourcing.sh, quality-score.py) and templates. There are no unrelated credentials or unusual binaries requested.
Instruction Scope
Instructions explicitly require downloading arbitrary PDF URLs, extracting text, copying 50+ character verbatim quotes from source, and running a provenance check; these are coherent with the stated goal. Notes of caution: downloading/parsing arbitrary PDFs can expose the host to malformed/malicious PDFs; the workflow promotes copying verbatim excerpts which may have copyright implications; check-sourcing.sh assumes report and source file layout and relies on grep patterns.
Install Mechanism
There is no install spec even though the code depends on Python and either pdfplumber (a pip package) or pdftotext (and likely the poppler system library). SKILL.md lists those tools as "installed" (已安装), but the registry metadata declares no required binaries/dependencies. This mismatch means the agent or user must ensure the environment has the needed packages. Also check that grep -P (PCRE) is available where the scripts run.
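
Because nothing declares these dependencies, a preflight check is one way to confirm them before running the workflow. The sketch below is an assumption about how that could be done, not part of the skill. `check_environment` is a hypothetical name, and the grep probe simply tries a PCRE pattern on a one-line input.

```python
import importlib.util
import shutil
import subprocess
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report which of the tools named in the review are available locally."""
    status = {
        "python3": shutil.which("python3") is not None,
        "pdfplumber": importlib.util.find_spec("pdfplumber") is not None,
        "pdftotext": shutil.which("pdftotext") is not None,
    }
    grep = shutil.which("grep")
    if grep is None:
        status["grep -P"] = False
    else:
        # grep exits 0 on a match; builds without PCRE support exit non-zero.
        probe = subprocess.run(
            [grep, "-P", r"\d"], input="1\n", capture_output=True, text=True
        )
        status["grep -P"] = probe.returncode == 0
    return status


if __name__ == "__main__":
    for tool, available in check_environment().items():
        print(f"{tool}: {'ok' if available else 'missing'}")
```

Extraction needs at least one of pdfplumber or pdftotext, so a caller might treat `status["pdfplumber"] or status["pdftotext"]` as the real requirement.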
Credentials
The skill requests no environment variables or credentials and the scripts do not read secrets or other env vars. Network access to arbitrary URLs is required for the intended purpose; no external endpoints or hidden exfiltration channels are present in the code.
Persistence & Privilege
The skill is not always-on and does not request persistent system privileges or alter other skills. It writes temporary files (uses /tmp) and deletes them; that is normal for its function.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install deep-research-v22
  3. After installation, invoke the skill by name or use /deep-research-v22
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.2.0
Deep Research Pro v2.2.0 introduces strict full-text extraction, structured data requirements, and enforced source verification:

  • Mandatory full-text parsing and structured data extraction for all sources; sources without full text are excluded from conclusions.
  • New planning step: users must confirm at least 5 concrete search queries (`research/plan.md`) before proceeding.
  • Only sources with confirmed full-text extraction (≥10) count towards results; each must include copied quotes, concrete data, and scoring.
  • Automatic provenance check script is now required before report generation; reports with untraceable data are blocked.
  • Enhanced cross-analysis: find and document at least 3 sets of conflicting data, clearly citing source cards.
  • Strengthened quality gates: rejects incomplete extractions, missing provenance, or insufficient card count.
Metadata
Slug deep-research-v22
Version 2.2.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Deep Research Pro v2.2?

Conducts thorough research by downloading full PDFs, extracting structured data with original text quotes, verifying sources, and generating cross-validated... It is an AI Agent Skill for Claude Code / OpenClaw, with 147 downloads so far.

How do I install Deep Research Pro v2.2?

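Run "/install deep-research-v22" in the OpenClaw or Claude Code chat to install it in one step. The registry declares no dependencies, so make sure Python 3 and either pdfplumber or pdftotext are already available in the environment.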

Is Deep Research Pro v2.2 free?

Yes, Deep Research Pro v2.2 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Deep Research Pro v2.2 support?

Deep Research Pro v2.2 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Deep Research Pro v2.2?

It is built and maintained by xueylee-dotcom (@xueylee-dotcom); the current version is v2.2.0.
