Description

Multi-stage deep research with reflection loops, multi-query retrieval, LLM chunk selection, and citation integrity. Use when: deep research, literature revi...

Usage Instructions (SKILL.md)

Deep Research Pipeline

Name: Deep Research Pipeline
Author: vardhineediganesh877-ui

Deep Research Pipeline turns broad questions into cited, publication-quality reports through a staged research workflow: planning, multi-query retrieval, chunk selection, analysis, reflection, writing, and optional verification.

It is designed for research that should not be answered from memory or a single search result. The pipeline keeps claims tied to sources, surfaces contradictions, tracks gaps, and can resume from checkpoints.

Why Use It

Multi-stage research, not one-shot summarization — separate researcher, analyst, reflection, and writer stages.
Citation integrity — findings and final claims trace back to URLs/sources.
Reflection loops — the pipeline checks coverage and decides whether another cycle is needed.
Portable LLM config — supports LLM_API_KEY/LLM_API_BASE, OpenAI-compatible endpoints, or Z.AI GLM.
Operational controls — checkpoint/resume, time limits, token budgets, output formats, and mock mode.

When to Use

Deep research, comprehensive analysis, literature reviews, competitive analysis, fact-checking, technology deep-dives — anything needing multiple sources, synthesis, and verified citations.

Quick Start

cd skills/deep-research

# Optional: configure any OpenAI-compatible provider
export LLM_API_KEY="your-key"
export LLM_API_BASE="https://api.example.com/v1"
export LLM_MODEL="your-model"

# Or use OpenAI-compatible env names
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.example.com/v1"

# Run a report
python3 scripts/research_pipeline.py \
  "Compare Vercel, Netlify, and Cloudflare Pages in 2026" \
  --max-cycles 2 \
  --format report \
  --output report.md

# Test without API calls
python3 scripts/research_pipeline.py "test question" --mock --output report.md

If no universal/OpenAI-compatible variables are set, the skill still supports Z.AI via ZAI_API_KEY and ZAI_API_ENDPOINT.

Architecture

ORCHESTRATOR (you)
    │
    ├── Plan → Decompose question into research dimensions
    │
    ├── REFLECTION LOOP (0-8 cycles)
    │   ├── Researcher Agent (parallel) → multi-query search + chunk selection
    │   ├── Analyst Agent → dedupe + themes + contradictions
    │   └── Reflection → coverage check, gap analysis, continue decision
    │
    ├── Writer Agent → polished report (report/summary/brief/json)
    │
    └── Verify (optional) → adversarial fact-check

Key principle: Orchestrator NEVER searches directly. Clean output flows between stages only.

Two Modes

Mode 1: Full Pipeline CLI (Recommended)

Use the enhanced research_pipeline.py for automated end-to-end research:

# Full research with all features
python3 scripts/research_pipeline.py "What is the state of quantum computing in 2026?" \
    --max-cycles 3 \
    --output report.md \
    --format report

# Mock mode (no API calls, for testing)
python3 scripts/research_pipeline.py "test question" --mock --output report.md

# With budget limits
python3 scripts/research_pipeline.py "question" \
    --max-cycles 3 --time-limit 300 --token-limit 40000

# Resume from checkpoint
python3 scripts/research_pipeline.py "question" \
    --resume checkpoint.json --output report.md

# Explicit dimensions
python3 scripts/research_pipeline.py "question" \
    --dimensions architecture benchmarks limitations \
    --output report.md --format summary

CLI Flags:

Flag	Default	Description
`--max-cycles`	3	Max research cycles (1-8)
`--mock`	false	Use mock data, no API calls
`--output` / `-o`	stdout	Output file path
`--format` / `-f`	report	Output format: `report`, `summary`, `brief`, `json`
`--time-limit`	900	Max seconds for entire pipeline
`--token-limit`	60000	Max estimated tokens
`--checkpoint`	none	Save checkpoints to path
`--resume`	none	Resume from checkpoint file
`--dimensions`	auto	Explicit research dimensions
`--no-parallel`	false	Research dimensions sequentially
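
For orientation, the flag table above could be expressed with argparse roughly as follows. This is a minimal sketch reconstructed from the table alone; the actual parser in research_pipeline.py may differ, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="research_pipeline.py")
    parser.add_argument("question", help="Research question")
    parser.add_argument("--max-cycles", type=int, default=3,
                        choices=range(1, 9), metavar="N",
                        help="Max research cycles (1-8)")
    parser.add_argument("--mock", action="store_true",
                        help="Use mock data, no API calls")
    parser.add_argument("--output", "-o", default=None,
                        help="Output file path (stdout when omitted)")
    parser.add_argument("--format", "-f", default="report",
                        choices=["report", "summary", "brief", "json"])
    parser.add_argument("--time-limit", type=int, default=900,
                        help="Max seconds for entire pipeline")
    parser.add_argument("--token-limit", type=int, default=60000,
                        help="Max estimated tokens")
    parser.add_argument("--checkpoint", default=None,
                        help="Save checkpoints to path")
    parser.add_argument("--resume", default=None,
                        help="Resume from checkpoint file")
    parser.add_argument("--dimensions", nargs="+", default=None,
                        help="Explicit research dimensions (auto when omitted)")
    parser.add_argument("--no-parallel", action="store_true",
                        help="Research dimensions sequentially")
    return parser
```

Because --dimensions takes one or more values, it should be followed by another flag (as in the examples above) rather than by the positional question.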

Output formats:

report — Full markdown: Executive Summary → Key Findings → Detailed Analysis → Contradictions → Gaps → Sources → Methodology
summary — Executive summary + top 5 findings + sources
brief — Bullet-point format for quick scanning
json — Structured JSON with annotated findings and metadata

Mode 2: Orchestrated Sub-Agents (For complex research)

Use when you need fine-grained control over each stage or parallel dimension research with sub-agents.

Workflow (Orchestrated Mode)

Phase 1: Planning

Analyze question, create slug, make memory/research/<slug>/ directory
Generate research plan with dimensions and questions
Save to plan.md
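
The planning steps above can be sketched as follows. The slug rules (lowercase, hyphen-separated, 60-character cap) and the names make_slug and init_research_dir are assumptions for illustration; the skill does not specify how slugs are derived.

```python
import re
from pathlib import Path


def make_slug(question: str, max_len: int = 60) -> str:
    """Turn a question into a filesystem-safe slug (assumed convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")
    # Fall back to a fixed name if nothing usable remains.
    return slug[:max_len].rstrip("-") or "research"


def init_research_dir(question: str, root: Path = Path("memory/research")) -> Path:
    """Create memory/research/<slug>/ and return its path."""
    target = root / make_slug(question)
    target.mkdir(parents=True, exist_ok=True)
    return target
```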

Phase 2: Research Cycle (repeat up to 8 times)

Step A: Spawn Researcher Agent(s)

Use sessions_spawn with a task brief (NOT the full query):

{
  "dimension": "technical architecture",
  "specific_questions": ["How does X work?", "What are Y's components?"],
  "context_limit": 5000,
  "max_sources": 10
}

Researcher agent does:

Multi-query generation — scripts/query_generator.py produces 3-5 variants
Parallel search — web_search for each variant
Content fetching — web_fetch for top results
LLM chunk selection — scripts/chunk_selector.py scores each chunk (≥0.7)
Context expansion — scripts/context_expander.py fetches surrounding content
Output: JSON findings with citations

Can spawn 2-3 researcher agents in parallel for different dimensions.
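
To make the chunk selection step concrete, here is a minimal sketch of the threshold filter, assuming each chunk has already been scored by the LLM. The chunk shape (a dict with "text", "url", and "score" keys) and the max_chunks cap are assumptions; chunk_selector.py may use a different structure.

```python
from typing import Dict, List


def select_chunks(scored_chunks: List[Dict], threshold: float = 0.7,
                  max_chunks: int = 10) -> List[Dict]:
    """Keep chunks scoring at or above the threshold, best first."""
    kept = [c for c in scored_chunks if c.get("score", 0.0) >= threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:max_chunks]
```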

Step B: Spawn Analyst Agent

After researcher(s) complete, spawn analyst with their combined output:

Deduplicate overlapping findings
Flag contradictions (explicit + implicit)
Group into thematic clusters
Identify gaps
Output: Cleaned JSON + gap list
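
As an illustration of the deduplication step, the sketch below merges findings whose claims match after normalization and combines their source URLs. The finding shape ("claim" and "sources" keys) is an assumption, and exact-match normalization is a deliberately simple stand-in; analyst.py may detect overlap differently.

```python
import re
from typing import Dict, List


def normalize(claim: str) -> str:
    """Lowercase and collapse punctuation/whitespace for comparison."""
    return re.sub(r"\W+", " ", claim.lower()).strip()


def dedupe_findings(findings: List[Dict]) -> List[Dict]:
    """Merge findings with the same normalized claim, keeping all sources."""
    merged: Dict[str, Dict] = {}
    for finding in findings:
        key = normalize(finding["claim"])
        entry = merged.setdefault(key, {"claim": finding["claim"], "sources": []})
        for url in finding.get("sources", []):
            if url not in entry["sources"]:
                entry["sources"].append(url)
    return list(merged.values())
```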

Step C: Run Reflection

After analyst completes, run scripts/reflection.py:

What's covered? (themes + confidence scores)
What gaps remain? (unanswered questions)
What contradictions emerged?
New directions discovered?
Should continue? (coverage ≥ 0.8 + minor gaps → stop)

Save reflection to memory/research/<slug>/reflection-cycle-N.md

Continue Decision

Coverage ≥ 0.8 AND gaps minor → proceed to Phase 3
Major contradictions → spawn targeted researcher
Significant gaps → another researcher cycle
Hard stop at cycle 8
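
The continue decision can be written as a small function. This is a sketch only: the rules are applied in the order listed above, with the hard stop checked first, and that precedence is an assumption, as are the name next_step and its return labels.

```python
def next_step(coverage: float, gaps_minor: bool, major_contradictions: bool,
              cycle: int, max_cycles: int = 8) -> str:
    """Return 'write', 'targeted_research', or 'research' for the next phase."""
    if cycle >= max_cycles:
        return "write"              # hard stop at the cycle limit
    if coverage >= 0.8 and gaps_minor:
        return "write"              # enough coverage: proceed to Phase 3
    if major_contradictions:
        return "targeted_research"  # resolve the disagreement first
    return "research"               # significant gaps: run another cycle
```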

Phase 3: Write Report

Use the Writer Agent (scripts/writer.py) for publication-quality output:

# From Python
from writer import WriterAgent, OutputFormat, write_report, save_report

question = "What is RAG?"
# analyst_output is the analyst stage result (from analyst or run_analyst())

# Generate report using WriterAgent
agent = WriterAgent(use_llm=True)
result = agent.write_report(
    analyst_output,
    question=question,
    fmt=OutputFormat.REPORT,
)

# Or use the convenience function
result = write_report(analyst_output, question, fmt="report")

# Save to file
save_report(result, "output/report.md")

Report features:

🟢🟡🟠🔴 Confidence indicators on every finding
[source_url] inline citations throughout
⚠️ Contradiction callout boxes where sources disagree
Structured sections: Summary → Findings → Analysis → Contradictions → Gaps → Sources → Methodology
Template-based fallback when no LLM available

Phase 4: Verify (optional sub-agent)

Spawn adversarial verifier:

Anchor every claim to source
Verify URLs with web_fetch
Remove unsourced claims
Save to review.md
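
A minimal sketch of the "remove unsourced claims" step follows. It only checks that a claim carries at least one syntactically valid http(s) URL; confirming that the URL resolves and supports the claim still requires web_fetch. The claim shape ("text" and "sources" keys) and the function names are assumptions.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def has_valid_source(claim: Dict) -> bool:
    """True if the claim lists at least one well-formed http(s) URL."""
    for url in claim.get("sources", []):
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return True
    return False


def split_by_sourcing(claims: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (kept, removed) so that removals can be logged in review.md."""
    kept, removed = [], []
    for claim in claims:
        (kept if has_valid_source(claim) else removed).append(claim)
    return kept, removed
```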

Phase 5: Deliver

Fix any FATAL issues from review
Copy to final.md
Write provenance.md (date, cycles, sources, verification status)
Send summary to user

Python API

import sys, os
sys.path.insert(0, os.path.expanduser("~/.openclaw/workspace/skills/deep-research/scripts"))

from research_pipeline import run_enhanced_pipeline

result = run_enhanced_pipeline(
    question="What is the state of quantum computing in 2026?",
    max_cycles=3,
    dimensions=["hardware", "algorithms", "applications", "challenges"],
    mock_mode=False,
    output_format="report",
    time_limit=900,
    token_limit=60000,
    checkpoint_path="checkpoint.json",    # auto-saves progress
    parallel_dimensions=True,             # parallel research per dimension
)

# result["report"] = markdown string
# result["cycles_completed"] = int
# result["final_coverage"] = float (0.0-1.0)
# result["metadata"] = dict with timing, findings count, etc.
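
A short sketch of consuming the returned dictionary, using only the keys listed above. The helper names summarize_run and save_report_text are hypothetical and not part of the skill.

```python
from pathlib import Path


def summarize_run(result: dict) -> str:
    """One-line status built from the documented result keys."""
    return (f"{result['cycles_completed']} cycle(s), "
            f"coverage {result['final_coverage']:.0%}")


def save_report_text(result: dict, path: str) -> Path:
    """Write result['report'] (a markdown string) to disk."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(result["report"], encoding="utf-8")
    return out
```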

Scripts

Script	Purpose	Usage
`research_pipeline.py`	Full pipeline orchestration	`python3 scripts/research_pipeline.py "question" --max-cycles 3`
`query_generator.py`	Generate 3-5 search query variants	`python3 scripts/query_generator.py -q "..."`
`chunk_selector.py`	LLM scores chunks, filters by threshold	`python3 scripts/chunk_selector.py -q "..." -c chunks.json`
`context_expander.py`	Fetch surrounding context for incomplete chunks	`python3 scripts/context_expander.py -s selected.json -q "..."`
`reflection.py`	Mandatory gap/contradiction check	`python3 scripts/reflection.py -q "..." -f findings.json -c 1`
`writer.py`	Publication-quality report generation	`from writer import WriterAgent, write_report`
`analyst.py`	Dedup + themes + contradictions (no API needed)	`from analyst import analyze_findings`
`researcher.py`	Multi-source research orchestration	`from researcher import research, research_dimension`
`research_sources.py`	Search adapters (web, GitHub, docs)	`from research_sources import WebSearchSource`
`fact-checker.py`	Claim extraction + source ranking	`python3 scripts/fact-checker.py "text" --sources '["url1"]'`

All LLM-enabled scripts use the shared provider-agnostic llm_client.py.

Provider resolution order:

LLM_API_KEY + LLM_API_BASE + optional LLM_MODEL
OPENAI_API_KEY + OPENAI_API_BASE / OPENAI_BASE_URL + optional OPENAI_MODEL
ZAI_API_KEY + optional ZAI_API_ENDPOINT / GLM_MODEL

If no key is configured, use --mock for local pipeline testing or rely on scripts with rule-based fallbacks where available.
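
The resolution order above can be sketched as follows. This is an illustration, not the contents of llm_client.py: it assumes the first two tiers need both a key and a base URL, that OPENAI_API_BASE takes precedence over OPENAI_BASE_URL, and that the Z.AI tier needs only a key. The function name and the returned dict shape are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return the first configured provider, or None if no key is set."""
    env = os.environ if env is None else env

    if env.get("LLM_API_KEY") and env.get("LLM_API_BASE"):
        return {"source": "LLM_*", "api_key": env["LLM_API_KEY"],
                "base_url": env["LLM_API_BASE"], "model": env.get("LLM_MODEL")}

    openai_base = env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL")
    if env.get("OPENAI_API_KEY") and openai_base:
        return {"source": "OPENAI_*", "api_key": env["OPENAI_API_KEY"],
                "base_url": openai_base, "model": env.get("OPENAI_MODEL")}

    if env.get("ZAI_API_KEY"):
        return {"source": "ZAI_*", "api_key": env["ZAI_API_KEY"],
                "base_url": env.get("ZAI_API_ENDPOINT"),
                "model": env.get("GLM_MODEL")}

    return None  # caller should fall back to --mock or rule-based paths
```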

Examples

Example 1: Quick Competitive Analysis

python3 scripts/research_pipeline.py \
    "Compare Vercel vs Netlify vs Cloudflare Pages features and pricing 2026" \
    --max-cycles 2 \
    --dimensions features pricing performance ecosystem \
    --format summary \
    --output competitive-analysis.md

Example 2: Deep Technology Research

python3 scripts/research_pipeline.py \
    "What is the current state of AI agent frameworks?" \
    --max-cycles 4 \
    --time-limit 600 \
    --token-limit 80000 \
    --checkpoint /tmp/ai-agents-checkpoint.json \
    --format report \
    --output ai-agents-research.md

Example 3: Literature Review (mock mode for testing)

python3 scripts/research_pipeline.py \
    "What does the research say about transformer architecture efficiency?" \
    --mock \
    --max-cycles 3 \
    --format report \
    --output literature-review.md

Example 4: Bullet Brief for Quick Scanning

python3 scripts/research_pipeline.py \
    "What are the latest developments in Rust web frameworks?" \
    --max-cycles 2 \
    --format brief \
    --output rust-web-brief.md

Example 5: JSON Output for Programmatic Use

python3 scripts/research_pipeline.py \
    "What is the market size of edge computing?" \
    --max-cycles 2 \
    --format json \
    --output edge-computing-data.json

Integration with Night Shift

To queue research plans for Night Shift execution:

Create a research plan file:

// memory/research/queued/<slug>.json
{
  "question": "What is the state of quantum computing in 2026?",
  "max_cycles": 3,
  "dimensions": ["hardware", "algorithms", "applications"],
  "output_format": "report",
  "output_path": "memory/research/quantum-2026/final.md",
  "time_limit": 600,
  "created_at": "2026-04-25T06:00:00Z"
}

Night Shift picks up queued plans and runs them via:

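# Quote variables so values with spaces survive; $DIMENSIONS stays unquoted
# so that it splits into one argument per dimension.
python3 scripts/research_pipeline.py "$QUESTION" \
    --max-cycles "$MAX_CYCLES" \
    --dimensions $DIMENSIONS \
    --format "$FORMAT" \
    --output "$OUTPUT_PATH" \
    --time-limit "$TIME_LIMIT"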

Results are saved to memory/research/<slug>/final.md with provenance metadata.
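
For callers that launch the pipeline from Python rather than a shell, a queued plan can be mapped to an argument list as sketched below. The field names come from the plan file example above; build_command is a hypothetical helper, and how Night Shift actually performs this mapping is not documented here.

```python
from typing import Dict, List


def build_command(plan: Dict) -> List[str]:
    """Translate a queued plan dict into an argv list for the pipeline CLI."""
    cmd = ["python3", "scripts/research_pipeline.py", plan["question"],
           "--max-cycles", str(plan["max_cycles"])]
    if plan.get("dimensions"):
        # Each dimension becomes its own argument, so no shell splitting is needed.
        cmd += ["--dimensions", *plan["dimensions"]]
    cmd += ["--format", plan["output_format"],
            "--output", plan["output_path"],
            "--time-limit", str(plan["time_limit"])]
    return cmd
```

Passing the resulting list to subprocess.run (without shell=True) avoids quoting problems with questions or paths that contain spaces.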

File Layout

memory/research/<slug>/
├── plan.md                    # Research plan with dimensions
├── reflection-cycle-1.md      # Reflection after each cycle
├── reflection-cycle-2.md
├── researcher-output-*.json   # Raw researcher findings
├── analyst-output.json        # Merged/deduped findings
├── draft.md                   # First draft
├── brief.md                   # Verified brief
├── review.md                  # Adversarial review (optional)
├── final.md                   # Final report
├── provenance.md              # Metadata + source verification status
└── checkpoint.json            # Pipeline checkpoint (auto-saved)

Quick Mode

Skip sub-agents and the full pipeline. Do 5-10 searches yourself. Still use evidence tables, verify URLs, cite sources. Shorter, inline in chat.

Integrity Commandments

Never fabricate a source — no URL = don't mention it
Never claim existence without checking
Never extrapolate unread details
Read before summarizing
No fake certainty — never say "verified" unless checked
Never invent numbers/benchmarks/comparisons
Separate observations from inferences
Every claim traces to a source — citation integrity is mandatory
Reflection is not optional — run it after every cycle
Stage separation — orchestrator never searches, researchers never see full plan

Scale Decision

Single fact → Quick Mode (3-10 tool calls, no sub-agents)
2-3 item comparison → 2 parallel researcher sub-agents, 2-3 cycles
Broad/multi-faceted → 3-4 researcher sub-agents, 3-5 cycles
PhD-level deep dive → 4+ researchers, 5-8 cycles

This skill appears to implement the research pipeline it advertises, but note two practical risks before installing. (1) The registry metadata omits that an LLM API key and endpoint are required for normal (non-mock) operation: you will need to provide LLM_API_KEY, OPENAI_API_KEY, or ZAI_API_KEY and the corresponding base URL. (2) The scripts perform broad outbound network activity (searches, arbitrary page fetches, and LLM API calls) and write checkpoints and reports to disk. Recommended precautions: run first in --mock mode to inspect outputs; review llm_client.py and the web-fetching code to confirm that the endpoints and User-Agent are acceptable; run in an isolated environment (container or VM) if you are concerned about network or file access; avoid supplying high-privilege or unrelated credentials; and audit any files that were omitted or truncated from this review before trusting the skill with sensitive data. The skill cannot perform real research without network access, because it depends on both an LLM endpoint and web access.

Functional Analysis

Type: OpenClaw Skill
Name: deep-research-pipeline
Version: 2.0.1

The 'deep-research-pipeline' skill is a sophisticated research tool designed to perform multi-stage information gathering and synthesis. It utilizes standard Python libraries (primarily 'urllib.request', 'json', and 're') to execute web searches via DuckDuckGo, fetch content from arbitrary URLs, and interact with OpenAI-compatible LLM APIs. The codebase is well-structured, featuring modular components for query generation, relevance filtering, and automated reflection loops. No indicators of malicious intent, such as data exfiltration, unauthorized command execution, or persistence mechanisms, were found; all network and file operations are consistent with the stated purpose of deep web research and report generation.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The name/description (deep research pipeline) align with the included Python modules (query generation, web search/fetch, chunk scoring, reflection, writer). However the registry lists 'Required env vars: none' while SKILL.md and llm_client.py clearly expect LLM credentials (LLM_API_KEY / OPENAI_API_KEY / ZAI_API_KEY) or a mock flag. The need for LLM keys is reasonable for an LLM-driven research pipeline, but the metadata omission is an inconsistency the user should note.

✓ Instruction Scope

SKILL.md and the scripts instruct the agent to: generate queries, perform web searches and fetch page content, call LLM endpoints, score chunks, expand context by refetching URLs, analyze locally, write checkpoints/reports to memory/research/<slug>/, and optionally run adversarial fact-checks. All of these actions are coherent with its stated purpose. No instructions ask the agent to read arbitrary local user files or unrelated credentials, but the pipeline does write to disk (checkpoints, plan.md, report files) and performs arbitrary outbound HTTP(S) fetches to sources discovered in searches.

✓ Install Mechanism

There is no install spec (instruction-only in registry) and all code is provided as Python scripts. No external download or package install steps are present in the manifest. This is lower-risk than a skill that downloads and executes remote binaries. Users will need a Python runtime and any required libraries, but the repository appears self-contained and uses only stdlib modules in the provided files.

⚠ Credentials

The registry claims no required environment variables, but the runtime documentation and llm_client.py require one of several LLM API keys (LLM_API_KEY, OPENAI_API_KEY, or ZAI_API_KEY) unless running in --mock mode. Requesting LLM credentials is proportionate to an LLM-driven research tool, but the metadata omission is misleading. Additionally, the skill will use those credentials to make network requests to the configured LLM endpoint and will fetch arbitrary web URLs discovered during searches; there are no other unrelated credential requests in code.

✓ Persistence & Privilege

The skill does not request persistent platform privileges (always:false) and does not declare editing other skills or global agent settings. It writes checkpoints, plan files, and reports to a project-specific path (memory/research/<slug>/) which is normal for a pipeline that supports resume/checkpointing.

Version History

v2.0.1

v2.0.1: polish ClawHub-facing docs with quick start, clearer value proposition, provider-agnostic LLM configuration, and remove local test cache artifacts.

v2.0.0

v2.0.0: rebuilt as a multi-stage research pipeline with reflection loops, shared provider-agnostic LLM client, checkpoint/resume, parallel dimension research, and publication-quality output.

Metadata

Slug deep-research-pipeline

Version 2.0.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

FAQ