Description

Multi-stage deep research with reflection loops, multi-query retrieval, LLM chunk selection, and citation integrity. Use when: deep research, literature revi...

README (SKILL.md)

Deep Research Pipeline

Name: Deep Research Pipeline
Author: vardhineediganesh877-ui

Deep Research Pipeline turns broad questions into cited, publication-quality reports through a staged research workflow: planning, multi-query retrieval, chunk selection, analysis, reflection, writing, and optional verification.

It is designed for research that should not be answered from memory or a single search result. The pipeline keeps claims tied to sources, surfaces contradictions, tracks gaps, and can resume from checkpoints.

Why Use It

Multi-stage research, not one-shot summarization — separate researcher, analyst, reflection, and writer stages.
Citation integrity — findings and final claims trace back to URLs/sources.
Reflection loops — the pipeline checks coverage and decides whether another cycle is needed.
Portable LLM config — supports LLM_API_KEY/LLM_API_BASE, OpenAI-compatible endpoints, or Z.AI GLM.
Operational controls — checkpoint/resume, time limits, token budgets, output formats, and mock mode.

When to Use

Deep research, comprehensive analysis, literature reviews, competitive analysis, fact-checking, technology deep-dives — anything needing multiple sources, synthesis, and verified citations.

Quick Start

cd skills/deep-research

# Optional: configure any OpenAI-compatible provider
export LLM_API_KEY="your-key"
export LLM_API_BASE="https://api.example.com/v1"
export LLM_MODEL="your-model"

# Or use OpenAI-compatible env names
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.example.com/v1"

# Run a report
python3 scripts/research_pipeline.py \
  "Compare Vercel, Netlify, and Cloudflare Pages in 2026" \
  --max-cycles 2 \
  --format report \
  --output report.md

# Test without API calls
python3 scripts/research_pipeline.py "test question" --mock --output report.md

If no universal/OpenAI-compatible variables are set, the skill still supports Z.AI via ZAI_API_KEY and ZAI_API_ENDPOINT.

Architecture

ORCHESTRATOR (you)
    │
    ├── Plan → Decompose question into research dimensions
    │
    ├── REFLECTION LOOP (0-8 cycles)
    │   ├── Researcher Agent (parallel) → multi-query search + chunk selection
    │   ├── Analyst Agent → dedupe + themes + contradictions
    │   └── Reflection → coverage check, gap analysis, continue decision
    │
    ├── Writer Agent → polished report (report/summary/brief/json)
    │
    └── Verify (optional) → adversarial fact-check

Key principle: the orchestrator NEVER searches directly; only clean output flows between stages.
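The stage-separation principle can be sketched as a function that only routes each stage's output to the next. This is a minimal illustration, not the pipeline's orchestrator: `run_stages` and the four stage callables are hypothetical, and the reflection loop is omitted for brevity.

```python
from typing import Callable


def run_stages(question: str,
               plan: Callable[[str], list],
               research: Callable[[str], list],
               analyze: Callable[[list], dict],
               write: Callable[[dict], str]) -> str:
    """Hypothetical orchestrator sketch: it routes data between stages but never searches."""
    dimensions = plan(question)
    findings: list = []
    for dim in dimensions:
        # Each researcher receives a single dimension, not the full question or plan.
        findings.extend(research(dim))
    analysis = analyze(findings)  # the analyst sees only researcher output
    return write(analysis)        # the writer sees only analyst output
```

Because every stage is injected, each one can be swapped for a mock, which is the same idea behind the pipeline's `--mock` mode.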

Two Modes

Mode 1: Full Pipeline CLI (Recommended)

Use the enhanced research_pipeline.py for automated end-to-end research:

# Full research with all features
python3 scripts/research_pipeline.py "What is the state of quantum computing in 2026?" \
    --max-cycles 3 \
    --output report.md \
    --format report

# Mock mode (no API calls, for testing)
python3 scripts/research_pipeline.py "test question" --mock --output report.md

# With budget limits
python3 scripts/research_pipeline.py "question" \
    --max-cycles 3 --time-limit 300 --token-limit 40000

# Resume from checkpoint
python3 scripts/research_pipeline.py "question" \
    --resume checkpoint.json --output report.md

# Explicit dimensions
python3 scripts/research_pipeline.py "question" \
    --dimensions architecture benchmarks limitations \
    --output report.md --format summary

CLI Flags:

Flag	Default	Description
`--max-cycles`	3	Max research cycles (1-8)
`--mock`	false	Use mock data, no API calls
`--output` / `-o`	stdout	Output file path
`--format` / `-f`	report	Output format: `report`, `summary`, `brief`, `json`
`--time-limit`	900	Max seconds for entire pipeline
`--token-limit`	60000	Max estimated tokens
`--checkpoint`	none	Save checkpoints to path
`--resume`	none	Resume from checkpoint file
`--dimensions`	auto	Explicit research dimensions
`--no-parallel`	false	Research dimensions sequentially
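To make the flag defaults and allowed values concrete, here is a hypothetical `argparse` parser that mirrors the table. It is a sketch reconstructed from the documented flags, not the actual parser in `research_pipeline.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented CLI flags (not the script's own code)."""
    p = argparse.ArgumentParser(prog="research_pipeline.py")
    p.add_argument("question")
    p.add_argument("--max-cycles", type=int, default=3,
                   choices=range(1, 9), metavar="{1..8}")
    p.add_argument("--mock", action="store_true")
    p.add_argument("--output", "-o", default=None)            # None means stdout
    p.add_argument("--format", "-f", default="report",
                   choices=["report", "summary", "brief", "json"])
    p.add_argument("--time-limit", type=int, default=900)     # seconds
    p.add_argument("--token-limit", type=int, default=60000)  # estimated tokens
    p.add_argument("--checkpoint", default=None)
    p.add_argument("--resume", default=None)
    p.add_argument("--dimensions", nargs="+", default=None)   # None means auto
    p.add_argument("--no-parallel", action="store_true")
    return p
```

With `nargs="+"`, `--dimensions` consumes words until the next flag, which is why the examples list dimensions space-separated.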

Output formats:

report — Full markdown: Executive Summary → Key Findings → Detailed Analysis → Contradictions → Gaps → Sources → Methodology
summary — Executive summary + top 5 findings + sources
brief — Bullet-point format for quick scanning
json — Structured JSON with annotated findings and metadata
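As an illustration of the `summary` selection (top 5 findings plus their sources), the sketch below ranks findings by confidence. The finding fields `claim`, `confidence`, and `url` are assumed for the example and may not match the pipeline's JSON schema.

```python
def summarize(findings: list, top_n: int = 5) -> dict:
    """Sketch: keep the top findings by confidence and list their sources once each.
    The fields claim/confidence/url are assumptions, not the pipeline's schema."""
    top = sorted(findings, key=lambda f: f["confidence"], reverse=True)[:top_n]
    sources: list = []
    for f in top:
        if f["url"] not in sources:  # preserve first-seen order, skip duplicates
            sources.append(f["url"])
    return {"findings": [f["claim"] for f in top], "sources": sources}
```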

Mode 2: Orchestrated Sub-Agents (For complex research)

Use when you need fine-grained control over each stage or parallel dimension research with sub-agents.

Workflow (Orchestrated Mode)

Phase 1: Planning

Analyze question, create slug, make memory/research/<slug>/ directory
Generate research plan with dimensions and questions
Save to plan.md
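A minimal way to derive the slug and create the working directory is sketched below. The slug rules (lowercase, hyphen-separated, capped at 60 characters) and the helper name `make_research_dir` are assumptions for illustration; the skill does not specify them.

```python
import re
from pathlib import Path


def make_research_dir(question: str, root: Path = Path("memory/research"),
                      max_len: int = 60) -> Path:
    """Hypothetical slug helper: lowercase, non-alphanumerics collapsed to '-', length-capped."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")[:max_len].rstrip("-")
    path = root / (slug or "untitled")
    path.mkdir(parents=True, exist_ok=True)  # safe to call again when resuming
    return path
```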

Phase 2: Research Cycle (repeat up to 8 times)

Step A: Spawn Researcher Agent(s)

Use sessions_spawn with a task brief (NOT the full query):

{
  "dimension": "technical architecture",
  "specific_questions": ["How does X work?", "What are Y's components?"],
  "context_limit": 5000,
  "max_sources": 10
}
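One way to keep briefs consistent is to build them from a small dataclass whose fields copy the example above. `TaskBrief` and `brief_for` are hypothetical names, and the defaults (5000, 10) are simply the example's values.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskBrief:
    """Fields mirror the example brief above; defaults are taken from that example."""
    dimension: str
    specific_questions: list = field(default_factory=list)
    context_limit: int = 5000
    max_sources: int = 10


def brief_for(dimension: str, questions: list) -> str:
    # Only this dimension's questions are passed on, never the full user query or plan.
    return json.dumps(asdict(TaskBrief(dimension, questions)), indent=2)
```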

Researcher agent does:

Multi-query generation — scripts/query_generator.py produces 3-5 variants
Parallel search — web_search for each variant
Content fetching — web_fetch for top results
LLM chunk selection — scripts/chunk_selector.py scores each chunk and keeps those scoring ≥ 0.7
Context expansion — scripts/context_expander.py fetches surrounding content
Output: JSON findings with citations

Can spawn 2-3 researcher agents in parallel for different dimensions.
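The chunk-selection step amounts to "score, then keep anything at or above the threshold". The sketch below shows that filter with a pluggable scorer; `overlap_score` is a deliberately simple keyword-overlap stand-in for the LLM scorer in `chunk_selector.py`, which works differently.

```python
from typing import Callable


def select_chunks(chunks: list, query: str,
                  score: Callable[[str, str], float],
                  threshold: float = 0.7) -> list:
    """Return (score, chunk) pairs with score >= threshold, best first."""
    scored = [(score(query, c), c) for c in chunks]
    kept = [sc for sc in scored if sc[0] >= threshold]
    return sorted(kept, key=lambda sc: sc[0], reverse=True)


def overlap_score(query: str, chunk: str) -> float:
    """Stand-in for the LLM scorer: fraction of query terms that appear in the chunk."""
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / len(q) if q else 0.0
```

Injecting the scorer keeps the threshold logic testable offline; in the real pipeline the score would come from an LLM call.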

Step B: Spawn Analyst Agent

After researcher(s) complete, spawn analyst with their combined output:

Deduplicate overlapping findings
Flag contradictions (explicit + implicit)
Group into thematic clusters
Identify gaps
Output: Cleaned JSON + gap list
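The deduplication step can be approximated by merging findings whose normalized claim text matches and unioning their sources. This exact-match sketch is simpler than whatever `analyst.py` does, and the `{"claim", "urls"}` finding shape is an assumption.

```python
import re


def normalize(claim: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial variants compare equal."""
    return re.sub(r"\W+", " ", claim.lower()).strip()


def dedupe(findings: list) -> list:
    """Merge findings with the same normalized claim, unioning their source URLs.
    Assumes each finding is {"claim": str, "urls": list} (hypothetical schema)."""
    merged: dict = {}
    for f in findings:
        key = normalize(f["claim"])
        if key in merged:
            merged[key]["urls"] = sorted(set(merged[key]["urls"]) | set(f["urls"]))
        else:
            merged[key] = {"claim": f["claim"], "urls": sorted(set(f["urls"]))}
    return list(merged.values())
```

Keeping every URL on the merged finding preserves citation integrity: no source is lost when duplicates collapse.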

Step C: Run Reflection

After analyst completes, run scripts/reflection.py:

What's covered? (themes + confidence scores)
What gaps remain? (unanswered questions)
What contradictions emerged?
New directions discovered?
Should continue? (coverage ≥ 0.8 + minor gaps → stop)

Save reflection to memory/research/<slug>/reflection-cycle-N.md

Continue Decision

Coverage ≥ 0.8 AND gaps minor → proceed to Phase 3
Major contradictions → spawn targeted researcher
Significant gaps → another researcher cycle
Hard stop at cycle 8
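The continue decision can be written as a small function. Two points are assumptions, since the rules above do not settle them: major contradictions are checked before the coverage rule, and "gaps minor" is modeled as zero significant gaps.

```python
from enum import Enum


class Next(Enum):
    WRITE = "proceed to Phase 3"
    TARGETED = "spawn targeted researcher"
    CYCLE = "another researcher cycle"


def decide(cycle: int, coverage: float, major_contradictions: int,
           significant_gaps: int, max_cycles: int = 8) -> Next:
    """Sketch of the continue decision; rule ordering is an assumption."""
    if cycle >= max_cycles:
        return Next.WRITE       # hard stop
    if major_contradictions:
        return Next.TARGETED
    if coverage >= 0.8 and significant_gaps == 0:
        return Next.WRITE
    return Next.CYCLE
```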

Phase 3: Write Report

Use the Writer Agent (scripts/writer.py) for publication-quality output:

# From Python (run from scripts/ or add it to sys.path)
from writer import WriterAgent, OutputFormat, write_report, save_report

question = "What is RAG?"
# analyst_output: the analyst stage's result (from analyst or run_analyst())

# Generate report using WriterAgent
agent = WriterAgent(use_llm=True)
result = agent.write_report(
    analyst_output,
    question=question,
    fmt=OutputFormat.REPORT,
)

# Or use the convenience function
result = write_report(analyst_output, question, fmt="report")

# Save to file
save_report(result, "output/report.md")

Report features:

🟢🟡🟠🔴 Confidence indicators on every finding
[source_url] inline citations throughout
⚠️ Contradiction callout boxes where sources disagree
Structured sections: Summary → Findings → Analysis → Contradictions → Gaps → Sources → Methodology
Template-based fallback when no LLM available
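The confidence indicators map a score to one of four badges. The cut-offs below (0.8, 0.6, 0.4) are assumptions chosen for illustration; the writer's actual thresholds are not documented here.

```python
def confidence_badge(confidence: float) -> str:
    """Map a 0.0-1.0 confidence to a badge; the cut-offs are illustrative assumptions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    for cutoff, badge in ((0.8, "🟢"), (0.6, "🟡"), (0.4, "🟠")):
        if confidence >= cutoff:
            return badge
    return "🔴"


def render_finding(claim: str, confidence: float, url: str) -> str:
    # Badge first, then the claim, then the inline [source_url] citation.
    return f"- {confidence_badge(confidence)} {claim} [{url}]"
```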

Phase 4: Verify (optional sub-agent)

Spawn adversarial verifier:

Anchor every claim to source
Verify URLs with web_fetch
Remove unsourced claims
Save to review.md
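A strict version of the verifier's filtering rule is sketched below: a claim survives only if it cites at least one URL and every cited URL checks out. `url_ok` uses `urllib.request` as a stand-in for the agent's `web_fetch` tool, and the claim shape is an assumption.

```python
import urllib.request
from typing import Callable


def url_ok(url: str, timeout: float = 10.0) -> bool:
    """Stand-in for web_fetch: True if the URL responds without an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False


def verify_claims(claims: list, check: Callable[[str], bool] = url_ok) -> tuple:
    """Split claims into (kept, removed); unsourced or unverifiable claims are removed."""
    kept, removed = [], []
    for c in claims:
        urls = c.get("urls", [])
        (kept if urls and all(check(u) for u in urls) else removed).append(c)
    return kept, removed
```

Requiring every URL to pass (rather than any) is a conservative choice; it matches "never say verified unless checked" at the cost of dropping some partly supported claims.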

Phase 5: Deliver

Fix any FATAL issues from review
Copy to final.md
Write provenance.md (date, cycles, sources, verification status)
Send summary to user
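The provenance file records the date, cycle count, sources, and verification status. The markdown layout and the `provenance_md` helper below are hypothetical; only the four fields come from the list above.

```python
from datetime import date
from typing import Optional


def provenance_md(cycles: int, sources: dict, verification: str,
                  on: Optional[date] = None) -> str:
    """Render provenance as markdown. `sources` maps URL -> status string
    (e.g. "fetched"); the layout is an assumption."""
    lines = [
        "# Provenance",
        f"- Date: {(on or date.today()).isoformat()}",
        f"- Cycles: {cycles}",
        f"- Verification: {verification}",
        "- Sources:",
    ]
    lines += [f"  - {url} ({status})" for url, status in sources.items()]
    return "\n".join(lines) + "\n"
```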

Python API

import sys, os
sys.path.insert(0, os.path.expanduser("~/.openclaw/workspace/skills/deep-research/scripts"))

from research_pipeline import run_enhanced_pipeline

result = run_enhanced_pipeline(
    question="What is the state of quantum computing in 2026?",
    max_cycles=3,
    dimensions=["hardware", "algorithms", "applications", "challenges"],
    mock_mode=False,
    output_format="report",
    time_limit=900,
    token_limit=60000,
    checkpoint_path="checkpoint.json",    # auto-saves progress
    parallel_dimensions=True,             # parallel research per dimension
)

# result["report"] = markdown string
# result["cycles_completed"] = int
# result["final_coverage"] = float (0.0-1.0)
# result["metadata"] = dict with timing, findings count, etc.
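A small, hypothetical helper that consumes the documented result keys (`report`, `cycles_completed`, `final_coverage`) might look like this. The 0.8 bar is borrowed from the reflection stop rule and is only an illustrative default.

```python
from pathlib import Path


def save_result(result: dict, path: str, min_coverage: float = 0.8) -> bool:
    """Write result["report"] to `path`; return whether coverage met `min_coverage`."""
    Path(path).write_text(result["report"], encoding="utf-8")
    print(f"cycles={result['cycles_completed']} coverage={result['final_coverage']:.2f}")
    return result["final_coverage"] >= min_coverage
```

For example, `save_result(result, "report.md")` after the `run_enhanced_pipeline` call above saves the markdown and flags runs that stopped short of full coverage, such as those cut off by `time_limit`.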

Scripts

Script	Purpose	Usage
`research_pipeline.py`	Full pipeline orchestration	`python3 scripts/research_pipeline.py "question" --max-cycles 3`
`query_generator.py`	Generate 3-5 search query variants	`python3 scripts/query_generator.py -q "..."`
`chunk_selector.py`	LLM scores chunks, filters by threshold	`python3 scripts/chunk_selector.py -q "..." -c chunks.json`
`context_expander.py`	Fetch surrounding context for incomplete chunks	`python3 scripts/context_expander.py -s selected.json -q "..."`
`reflection.py`	Mandatory gap/contradiction check	`python3 scripts/reflection.py -q "..." -f findings.json -c 1`
`writer.py`	Publication-quality report generation	`from writer import WriterAgent, write_report`
`analyst.py`	Dedup + themes + contradictions (no API needed)	`from analyst import analyze_findings`
`researcher.py`	Multi-source research orchestration	`from researcher import research, research_dimension`
`research_sources.py`	Search adapters (web, GitHub, docs)	`from research_sources import WebSearchSource`
`fact-checker.py`	Claim extraction + source ranking	`python3 scripts/fact-checker.py "text" --sources '["url1"]'`

All LLM-enabled scripts use the shared provider-agnostic llm_client.py.

Provider resolution order:

LLM_API_KEY + LLM_API_BASE + optional LLM_MODEL
OPENAI_API_KEY + OPENAI_API_BASE / OPENAI_BASE_URL + optional OPENAI_MODEL
ZAI_API_KEY + optional ZAI_API_ENDPOINT / GLM_MODEL

If no key is configured, use --mock for local pipeline testing or rely on scripts with rule-based fallbacks where available.
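The resolution order above can be sketched as a first-match lookup over the environment. The real `llm_client.py` may behave differently; in particular, this sketch assumes the OpenAI-compatible branch needs a base URL, and it returns `None` when nothing is configured (the case where `--mock` is the fallback).

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_provider(env: Mapping = os.environ) -> Optional[dict]:
    """Return the first configured provider in the documented order, else None."""
    if env.get("LLM_API_KEY") and env.get("LLM_API_BASE"):
        return {"key": env["LLM_API_KEY"], "base": env["LLM_API_BASE"],
                "model": env.get("LLM_MODEL")}
    base = env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL")
    if env.get("OPENAI_API_KEY") and base:
        return {"key": env["OPENAI_API_KEY"], "base": base,
                "model": env.get("OPENAI_MODEL")}
    if env.get("ZAI_API_KEY"):
        # Endpoint and model are optional for Z.AI; None means "use the client default".
        return {"key": env["ZAI_API_KEY"], "base": env.get("ZAI_API_ENDPOINT"),
                "model": env.get("GLM_MODEL")}
    return None
```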

Examples

Example 1: Quick Competitive Analysis

python3 scripts/research_pipeline.py \
    "Compare Vercel vs Netlify vs Cloudflare Pages features and pricing 2026" \
    --max-cycles 2 \
    --dimensions features pricing performance ecosystem \
    --format summary \
    --output competitive-analysis.md

Example 2: Deep Technology Research

python3 scripts/research_pipeline.py \
    "What is the current state of AI agent frameworks?" \
    --max-cycles 4 \
    --time-limit 600 \
    --token-limit 80000 \
    --checkpoint /tmp/ai-agents-checkpoint.json \
    --format report \
    --output ai-agents-research.md

Example 3: Literature Review (mock mode for testing)

python3 scripts/research_pipeline.py \
    "What does the research say about transformer architecture efficiency?" \
    --mock \
    --max-cycles 3 \
    --format report \
    --output literature-review.md

Example 4: Bullet Brief for Quick Scanning

python3 scripts/research_pipeline.py \
    "What are the latest developments in Rust web frameworks?" \
    --max-cycles 2 \
    --format brief \
    --output rust-web-brief.md

Example 5: JSON Output for Programmatic Use

python3 scripts/research_pipeline.py \
    "What is the market size of edge computing?" \
    --max-cycles 2 \
    --format json \
    --output edge-computing-data.json

Integration with Night Shift

To queue research plans for Night Shift execution:

Create a research plan file:

// memory/research/queued/<slug>.json
{
  "question": "What is the state of quantum computing in 2026?",
  "max_cycles": 3,
  "dimensions": ["hardware", "algorithms", "applications"],
  "output_format": "report",
  "output_path": "memory/research/quantum-2026/final.md",
  "time_limit": 600,
  "created_at": "2026-04-25T06:00:00Z"
}

Night Shift picks up queued plans and runs them via:

python3 scripts/research_pipeline.py "$QUESTION" \
    --max-cycles $MAX_CYCLES \
    --dimensions $DIMENSIONS \
    --format $FORMAT \
    --output $OUTPUT_PATH \
    --time-limit $TIME_LIMIT

Results are saved to memory/research/<slug>/final.md with provenance metadata.
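Building the command as an argument list avoids shell-quoting problems with questions that contain spaces or quotes. The sketch below turns a queued plan into the CLI call shown above; `plan_to_argv` and `run_queued` are hypothetical, since how Night Shift itself launches plans is not documented here.

```python
import json
import subprocess
import sys
from pathlib import Path


def plan_to_argv(plan: dict, script: str = "scripts/research_pipeline.py") -> list:
    """Hypothetical: map a queued plan's fields onto the documented CLI flags."""
    argv = [sys.executable, script, plan["question"],
            "--max-cycles", str(plan.get("max_cycles", 3)),
            "--format", plan.get("output_format", "report")]
    if plan.get("dimensions"):
        argv += ["--dimensions", *plan["dimensions"]]
    if plan.get("output_path"):
        argv += ["--output", plan["output_path"]]
    if plan.get("time_limit"):
        argv += ["--time-limit", str(plan["time_limit"])]
    return argv


def run_queued(plan_file: Path) -> int:
    plan = json.loads(plan_file.read_text(encoding="utf-8"))
    return subprocess.run(plan_to_argv(plan)).returncode
```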

File Layout

memory/research/<slug>/
├── plan.md                    # Research plan with dimensions
├── reflection-cycle-1.md      # Reflection after each cycle
├── reflection-cycle-2.md
├── researcher-output-*.json   # Raw researcher findings
├── analyst-output.json        # Merged/deduped findings
├── draft.md                   # First draft
├── brief.md                   # Verified brief
├── review.md                  # Adversarial review (optional)
├── final.md                   # Final report
├── provenance.md              # Metadata + source verification status
└── checkpoint.json            # Pipeline checkpoint (auto-saved)

Quick Mode

Skip sub-agents and the full pipeline, and run 5-10 searches yourself. Still use evidence tables, verify URLs, and cite sources; the result is shorter and delivered inline in chat.

Integrity Commandments

Never fabricate a source — no URL = don't mention it
Never claim existence without checking
Never extrapolate unread details
Read before summarizing
No fake certainty — never say "verified" unless checked
Never invent numbers/benchmarks/comparisons
Separate observations from inferences
Every claim traces to a source — citation integrity is mandatory
Reflection is not optional — run it after every cycle
Stage separation — orchestrator never searches, researchers never see full plan

Scale Decision

Single fact → Quick Mode (3-10 tool calls, no sub-agents)
2-3 item comparison → 2 parallel researcher sub-agents, 2-3 cycles
Broad/multi-faceted → 3-4 researcher sub-agents, 3-5 cycles
PhD-level deep dive → 4+ researchers, 5-8 cycles
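The scale table can be encoded as data so a caller can pick a `--max-cycles` value. The scope keys are hypothetical labels; the numbers come from the list above, with `None` marking the open-ended "4+" researchers.

```python
from typing import NamedTuple, Optional


class Scale(NamedTuple):
    min_researchers: int
    max_researchers: Optional[int]  # None = no stated upper bound
    min_cycles: int
    max_cycles: int


# Scope keys are hypothetical labels; numbers follow the Scale Decision list.
SCALE = {
    "single_fact": Scale(0, 0, 0, 0),  # Quick Mode: 3-10 tool calls, no sub-agents
    "comparison": Scale(2, 2, 2, 3),
    "broad": Scale(3, 4, 3, 5),
    "deep_dive": Scale(4, None, 5, 8),
}


def cli_cycles(scope: str) -> int:
    """Upper end of the scope's cycle range; 0 means use Quick Mode instead."""
    return SCALE[scope].max_cycles
```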

This skill appears to implement the research pipeline it advertises, but note two practical risks before installing: (1) the registry metadata omits that an LLM API key and endpoint are required for normal (non-mock) operation, so you will need to provide LLM_API_KEY, OPENAI_API_KEY, or ZAI_API_KEY and, where needed, the corresponding base URL; (2) the scripts perform broad outbound network activity (searches, arbitrary page fetches, and LLM API calls) and write checkpoints and reports to disk. Recommended precautions: run first in --mock mode to inspect outputs; review llm_client.py and the web-fetching code to ensure the endpoints and User-Agent are acceptable; run in an isolated environment (container or VM) if you are concerned about network or file access; avoid supplying high-privilege or unrelated credentials; and audit any files that were truncated or omitted from this review before trusting the skill with sensitive data. Without network access the skill cannot perform real research, because it needs both an LLM endpoint and web access.

Capability Analysis

Type: OpenClaw Skill
Name: deep-research-pipeline
Version: 2.0.1

The 'deep-research-pipeline' skill is a sophisticated research tool designed to perform multi-stage information gathering and synthesis. It utilizes standard Python libraries (primarily 'urllib.request', 'json', and 're') to execute web searches via DuckDuckGo, fetch content from arbitrary URLs, and interact with OpenAI-compatible LLM APIs. The codebase is well-structured, featuring modular components for query generation, relevance filtering, and automated reflection loops. No indicators of malicious intent, such as data exfiltration, unauthorized command execution, or persistence mechanisms, were found; all network and file operations are consistent with the stated purpose of deep web research and report generation.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The name/description (deep research pipeline) align with the included Python modules (query generation, web search/fetch, chunk scoring, reflection, writer). However, the registry lists 'Required env vars: none' while SKILL.md and llm_client.py clearly expect LLM credentials (LLM_API_KEY / OPENAI_API_KEY / ZAI_API_KEY) or a mock flag. The need for LLM keys is reasonable for an LLM-driven research pipeline, but the metadata omission is an inconsistency the user should note.

✓ Instruction Scope

SKILL.md and the scripts instruct the agent to: generate queries, perform web searches and fetch page content, call LLM endpoints, score chunks, expand context by refetching URLs, analyze locally, write checkpoints/reports to memory/research/<slug>/, and optionally run adversarial fact-checks. All of these actions are coherent with its stated purpose. No instructions ask the agent to read arbitrary local user files or unrelated credentials, but the pipeline does write to disk (checkpoints, plan.md, report files) and performs arbitrary outbound HTTP(S) fetches to sources discovered in searches.

✓ Install Mechanism

There is no install spec (instruction-only in registry) and all code is provided as Python scripts. No external download or package install steps are present in the manifest. This is lower-risk than a skill that downloads and executes remote binaries. Users will need a Python runtime and any required libraries, but the repository appears self-contained and uses only stdlib modules in the provided files.

⚠ Credentials

The registry claims no required environment variables, but the runtime documentation and llm_client.py require one of several LLM API keys (LLM_API_KEY, OPENAI_API_KEY, or ZAI_API_KEY) unless running in --mock mode. Requesting LLM credentials is proportionate to an LLM-driven research tool, but the metadata omission is misleading. Additionally, the skill will use those credentials to make network requests to the configured LLM endpoint and will fetch arbitrary web URLs discovered during searches; there are no other unrelated credential requests in code.

✓ Persistence & Privilege

The skill does not request persistent platform privileges (always:false) and does not declare editing other skills or global agent settings. It writes checkpoints, plan files, and reports to a project-specific path (memory/research/<slug>/) which is normal for a pipeline that supports resume/checkpointing.

Version History

v2.0.1

v2.0.1: polish ClawHub-facing docs with quick start, clearer value proposition, provider-agnostic LLM configuration, and remove local test cache artifacts.

v2.0.0

v2.0.0: rebuilt as a multi-stage research pipeline with reflection loops, shared provider-agnostic LLM client, checkpoint/resume, parallel dimension research, and publication-quality output.

Metadata

Slug deep-research-pipeline

Version 2.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Deep Research Pipeline?

Multi-stage deep research with reflection loops, multi-query retrieval, LLM chunk selection, and citation integrity. Use when: deep research, literature revi... It is an AI Agent Skill for Claude Code / OpenClaw, with 53 downloads so far.

How do I install Deep Research Pipeline?

Run "/install deep-research-pipeline" in the OpenClaw or Claude Code chat to install it in one step. Non-mock runs also need an LLM API key (LLM_API_KEY, OPENAI_API_KEY, or ZAI_API_KEY).

Is Deep Research Pipeline free?

Yes, Deep Research Pipeline is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Deep Research Pipeline support?

Deep Research Pipeline is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Deep Research Pipeline?

It is built and maintained by vardhineediganesh877-ui (@vardhineediganesh877-ui); the current version is v2.0.1.
