
Deep Research Pro v2.2

Name: Deep Research Pro v2.2
Author: xueylee-dotcom

By xueylee-dotcom · GitHub · v2.2.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 147

Install in OpenClaw

/install deep-research-v22

Description

Conducts thorough research by downloading full PDFs, extracting structured data with original text quotes, verifying sources, and generating cross-validated...

Usage (SKILL.md)

Skill: Deep Research Pro (v2.2 - True Depth)

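Version: 2.2.0 Description: A true deep-research skill that enforces full-text parsing, structured extraction, and source verification

Core principle

Without actually reading the original text, there is no deep research

🔴 Mandatory workflow (new in v2.2)

Step 1: Research planning (must produce a file)

Generate research/plan.md
List at least 5 specific search queries
Continue only after the user confirms the plan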

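Step 2: Full-text parsing + structured extraction (the core step!)

Do not skip this step!

For every valid source, you must do the following:

Obtain the full text

# Use the extract-from-pdf.py script
python3 scripts/extract-from-pdf.py card-001 "https://arxiv.org/pdf/xxx.pdf"

If a DOI or URL is available, try to download the PDF
If the full text cannot be obtained, mark full_text: false and skip the source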

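Structured extraction (from the PDF body text)
- Sample size: the exact number
- Main results: exact value + unit + statistical significance
- Cost impact: exact amount or percentage
- Confidence interval: 95% CI
- Verbatim quote: at least 50 characters copied from the body text
Update the card
- Replace every "待提取" (pending extraction) placeholder with the extracted data
- Set full_text: true/false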
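
The extraction rules above could be checked mechanically. The following is a minimal sketch, not the skill's own extract-from-pdf.py: the card field names (`sample_size`, `main_result`, `quote`) and the `validate_card` helper are assumptions made for illustration. Only the 50-character quote minimum, the "待提取" placeholder, and the `full_text` flag come from SKILL.md.

```python
from dataclasses import dataclass

PLACEHOLDER = "待提取"  # "pending extraction" marker named in SKILL.md
MIN_QUOTE_CHARS = 50    # minimum verbatim quote length required by SKILL.md


@dataclass
class Card:
    # Hypothetical card layout; the real card template is not shown in SKILL.md.
    card_id: str
    full_text: bool
    sample_size: str = PLACEHOLDER
    main_result: str = PLACEHOLDER
    quote: str = PLACEHOLDER


def validate_card(card: Card) -> list[str]:
    """Return a list of problems; an empty list means the card passes."""
    problems = []
    if not card.full_text:
        problems.append("full text not available")
    for name in ("sample_size", "main_result", "quote"):
        if PLACEHOLDER in getattr(card, name):
            problems.append(f"{name} still contains the placeholder")
    if len(card.quote) < MIN_QUOTE_CHARS:
        problems.append(f"quote shorter than {MIN_QUOTE_CHARS} characters")
    return problems


if __name__ == "__main__":
    draft = Card(card_id="card-001", full_text=True)
    for problem in validate_card(draft):
        print(f"{draft.card_id}: {problem}")
```

Returning a list of problems rather than raising on the first one lets a reviewer see everything that still needs filling in on a card in a single pass.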

Minimum requirements:

deep mode: at least 10 cards with full-text extraction
Quality threshold: post-extraction score ≥ 6/10

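Step 3: Source verification (mandatory check!)

Before generating the report, you must run:

bash scripts/check-sourcing.sh reports/final-report.md sources/

Check that the data cited by each [[card-xxx]] reference actually exists in that card
If any data cannot be traced to a card, refuse to generate the report
Fix the problems, then verify again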
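
To illustrate what such a check involves, here is a rough Python sketch. It is not the contents of check-sourcing.sh, which are not shown here. It assumes that each card lives at `sources/<card-id>.md` and that "data" means the numeric tokens on the same report line as the citation; both are assumptions, as is the `find_untraceable` name. The substring match is deliberately crude (for example, "12" would be found inside "120").

```python
import re
from pathlib import Path

CITATION = re.compile(r"\[\[(card-[A-Za-z0-9_-]+)\]\]")
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def find_untraceable(report_text: str, sources_dir: Path) -> list[tuple[int, str, str]]:
    """Return (line number, card id, reason) for every claim that cannot be traced."""
    issues = []
    for line_no, line in enumerate(report_text.splitlines(), start=1):
        for card_id in CITATION.findall(line):
            card_path = sources_dir / f"{card_id}.md"  # assumed file layout
            if not card_path.is_file():
                issues.append((line_no, card_id, "card file missing"))
                continue
            card_text = card_path.read_text(encoding="utf-8")
            # Strip the citations so digits in card ids are not treated as data.
            claim = CITATION.sub("", line)
            for number in NUMBER.findall(claim):
                if number not in card_text:
                    issues.append((line_no, card_id, f"{number} not found in card"))
    return issues
```

A caller would read reports/final-report.md, pass its text together with `Path("sources")`, and refuse to publish the report while the returned list is non-empty.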

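Step 4: Cross-analysis

Generate analysis/synthesis.md
Identify at least 3 sets of conflicting data
Label each claim with its source card

Step 5: Report generation

Generate reports/final-report.md
Every data point must be tagged with [[card-xxx]]
Append the source verification results to the end of the report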

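🔧 Tool dependencies

Tool	Purpose	Status
pdfplumber	Full-text PDF parsing	✅ Installed
pdftotext	Fallback PDF parsing	✅ Installed
extract-from-pdf.py	Structured data extraction	✅ Created
check-sourcing.sh	Source verification	✅ Created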

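📋 Commands

Full workflow

# Step 1: Planning
# Edit research/plan.md and confirm the search queries

# Step 2: Search + extraction (repeat for each source)
python3 scripts/extract-from-pdf.py card-001 "URL"
# Review the extraction output and fill in the card

# Step 3: Source verification
bash scripts/check-sourcing.sh reports/final-report.md sources/

# Steps 4-5: Analysis and report
# Generate the final report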

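⚠️ Limitations

If the full text cannot be obtained (paywalled papers or reports):

Mark the card full_text: false
Treat data from that source as reference only in the report, not as a basis for core conclusions
Recommend manual review of key data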
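
One way to apply this split when assembling a report is sketched below. It assumes each card is a file named `card-*.md` under a sources directory that carries a line of the form `full_text: true` or `full_text: false`. The function name is hypothetical, and treating a card with no flag as reference-only is a conservative choice made for this sketch rather than something SKILL.md specifies.

```python
import re
from pathlib import Path

FULL_TEXT_FLAG = re.compile(r"^full_text:\s*(true|false)\s*$", re.MULTILINE)


def split_by_full_text(sources_dir: Path) -> tuple[list[str], list[str]]:
    """Return (core card ids, reference-only card ids)."""
    core, reference_only = [], []
    for path in sorted(sources_dir.glob("card-*.md")):
        match = FULL_TEXT_FLAG.search(path.read_text(encoding="utf-8"))
        if match and match.group(1) == "true":
            core.append(path.stem)
        else:
            # No flag, or full_text: false: never used for core conclusions.
            reference_only.append(path.stem)
    return core, reference_only
```

The reference-only list doubles as the set of sources whose key figures should be flagged for manual review.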

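📊 Version comparison

Dimension	v2.1	v2.2
PDF parsing	❌	✅ Mandatory
Data extraction	"待提取" placeholders	✅ Actual extraction
Verbatim quotes	Template text	✅ Copied from the body text
Source verification	❌	✅ Mandatory
Report quality	Citations without verification	Citations with verification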

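Quality gate (strengthened in v2.2)

# 1. Check the card count (at least 10 with full text)
FULLTEXT_COUNT=$(grep -l "full_text: true" sources/card-*.md 2>/dev/null | wc -l | tr -d ' ')
if [ "$FULLTEXT_COUNT" -lt 10 ]; then
 echo "❌ Error: fewer than 10 full-text cards (found $FULLTEXT_COUNT)"
 exit 1
fi

# 2. Check sourcing
if ! bash scripts/check-sourcing.sh reports/final-report.md sources/; then
 echo "❌ Error: the report contains data that cannot be traced to a card"
 exit 1
fi

# 3. Check for leftover "待提取" (pending extraction) placeholders
if grep -q "待提取" sources/card-*.md; then
 echo "❌ Error: some cards still contain '待提取' placeholders"
 exit 1
fi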
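
For environments where a Python check is more convenient, checks 1 and 3 of the gate might be ported roughly as follows. This is a sketch under the same file-layout assumption as the shell version (`card-*.md` files under a sources directory). The `gate_failures` name is hypothetical, and check 2 is intentionally left to scripts/check-sourcing.sh.

```python
from pathlib import Path

PLACEHOLDER = "待提取"  # "pending extraction" marker used in the cards


def gate_failures(sources_dir: Path, min_full_text: int = 10) -> list[str]:
    """Collect every gate failure instead of stopping at the first one."""
    cards = sorted(sources_dir.glob("card-*.md"))
    texts = {path.name: path.read_text(encoding="utf-8") for path in cards}
    failures = []

    # Check 1: enough cards flagged as having full text (same match as grep -l).
    full_text_count = sum("full_text: true" in text for text in texts.values())
    if full_text_count < min_full_text:
        failures.append(f"only {full_text_count} full-text cards, need {min_full_text}")

    # Check 3: no card may still carry the placeholder.
    pending = [name for name, text in texts.items() if PLACEHOLDER in text]
    if pending:
        failures.append("placeholder still present in: " + ", ".join(pending))
    return failures
```

Unlike the shell gate, which exits on the first failed check, this version reports all failures together, which can save a round trip when several cards need attention.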

Skill version: 2.2.0 | Last updated: 2026-03-19

Security recommendations

This skill appears coherent with its stated purpose, but take these precautions before installing or executing it:
- Ensure your environment has Python3, pdfplumber (pip) or pdftotext + poppler installed; the package/dependency requirements are not declared by the registry.
- Run the workflow in an isolated environment (VM/container) because it downloads and opens arbitrary PDFs, and malformed PDFs can exploit local parsers.
- Review the code (extract-from-pdf.py and check-sourcing.sh) yourself if you can; they perform network downloads and local file writes but contain no hidden exfiltration. Verify the User-Agent and URL handling if you have network-policy constraints.
- Be aware the skill enforces copying verbatim 50+ character quotes from sources; consider copyright/privacy rules when including such text in generated reports.
- Confirm grep -P availability (the shell script uses PCRE patterns) or adjust scripts for your environment.
If you need higher assurance, request the author add an explicit install spec (pip/apt/poppler), declare required binaries, and add input validation/sanitization for PDF URLs before running.

Functional analysis

Type: OpenClaw Skill
Name: deep-research-v22
Version: 2.2.0

The skill bundle provides a structured framework for automated deep research, including PDF data extraction, quality scoring, and citation verification. The included scripts (extract-from-pdf.py, check-sourcing.sh, quality-score.py) perform tasks consistent with the stated purpose, and the instructions in SKILL.md use authoritative language to enforce a rigorous research methodology rather than to facilitate prompt injection or unauthorized actions.

Capability assessment

✓ Purpose & Capability

The name/description (full‑text extraction, quoting, source verification) align with the included scripts (extract-from-pdf.py, check-sourcing.sh, quality-score.py) and templates. There are no unrelated credentials or unusual binaries requested.

ℹ Instruction Scope

Instructions explicitly require downloading arbitrary PDF URLs, extracting text, copying 50+ character verbatim quotes from source, and running a provenance check; these are coherent with the stated goal. Notes of caution: downloading/parsing arbitrary PDFs can expose the host to malformed/malicious PDFs; the workflow promotes copying verbatim excerpts which may have copyright implications; check-sourcing.sh assumes report and source file layout and relies on grep patterns.

⚠ Install Mechanism

There is no install spec even though the code depends on Python and either pdfplumber (a pip package) or pdftotext (and likely the poppler system library). SKILL.md lists those tools as '已安装' (installed), but the registry metadata declares no required binaries/dependencies; this mismatch means the agent or user must ensure the environment has the needed packages. Also check that grep -P (PCRE) is available where the scripts run.

✓ Credentials

The skill requests no environment variables or credentials and the scripts do not read secrets or other env vars. Network access to arbitrary URLs is required for the intended purpose; no external endpoints or hidden exfiltration channels are present in the code.

✓ Persistence & Privilege

The skill is not always-on and does not request persistent system privileges or alter other skills. It writes temporary files (uses /tmp) and deletes them; that is normal for its function.

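How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install deep-research-v22
Once installed, invoke the Skill by name or trigger it with /deep-research-v22
Provide the inputs described in the Skill's parameter notes to get structured output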

Version history

v2.2.0

Deep Research Pro v2.2.0 introduces strict full-text extraction, structured data requirements, and enforced source verification:
- Mandatory full-text parsing and structured data extraction for all sources; sources without full text are excluded from conclusions.
- New planning step: users must confirm at least 5 concrete search queries (`research/plan.md`) before proceeding.
- Only sources with confirmed full-text extraction (≥10) count towards results; each must include copied quotes, concrete data, and scoring.
- Automatic provenance check script is now required before report generation; reports with untraceable data are blocked.
- Enhanced cross-analysis: find and document at least 3 sets of conflicting data, clearly citing source cards.
- Strengthened quality gates: rejects incomplete extractions, missing provenance, or insufficient card count.

Metadata

Slug deep-research-v22

Version 2.2.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ