Description

Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily...

Usage Instructions (SKILL.md)

InfoSeek - Deep Web Search & Archival

Name: InfoSeek
Author: expeditionhub

Overview

InfoSeek performs comprehensive web research on any subject (person, organization, product) across multiple search engines, deduplicates results, extracts clean content, and archives everything with full metadata in organized folders.

Prerequisites

Before executing a search task, verify these skills are installed:

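import os
from pathlib import Path

# OPENCLAW_WORKSPACE must point at the workspace root; stop early if it is unset,
# because Path(None) would raise a TypeError.
workspace = os.environ.get('OPENCLAW_WORKSPACE')
if not workspace:
    raise SystemExit('OPENCLAW_WORKSPACE is not set')
skills_dir = Path(workspace) / 'skills'

# Skill directories expected under {workspace}/skills
required = ['baidu-search', 'tavily', 'Multi-Search-Engine', 'agent-browser-clawdbot-0.1.0']
missing = [s for s in required if not (skills_dir / s).exists()]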

If any are missing, instruct the user to install them:

openclaw skills install baidu-search
openclaw skills install tavily-search
openclaw skills install multi-search-engine

Workflow

Phase 0: Task Setup

Confirm the search subject — name, organization, or product
Collect optional context — background info, time range, output format (default: .md), special requirements
Check dependencies — run the prerequisite check above

Create archive folder — run:

python scripts/infoseek_helper.py create-folder "<subject_name>"

Phase 1: Multi-Engine Deep Search

Execute searches across all available engines. Each engine runs independently.

1.1 Baidu Search (100+ pages)

Use the baidu-search skill:

Query: "<subject> <background_context>"
Depth: 100+ pages
Record: URL, title, website name, publish date for each result

1.2 Tavily Search

Use tavily_search tool:

query: "<subject> <background_context>"
search_depth: advanced
max_results: 50

1.3 Multi-Search-Engine

Use the multi-search-engine skill across multiple engines simultaneously.

1.4 Browser Deep-Crawl

For discovered URLs, use the browser tool to:

Open each page
Extract body content (filter ads, sidebars, comments)
Extract metadata: title, author, editor, date, website name
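The metadata step above can be sketched with the standard-library HTML parser. This is only an illustration, not the skill's actual extraction logic: the function name `extract_metadata` and the choice of `<meta>` keys (`author`, `article:published_time`, `og:site_name`, `og:title`) are assumptions, and real pages often need site-specific rules. There is no widely used tag for an editor, so that field is left out here.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and <meta name/property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attr.get("name") or attr.get("property") or "").lower()
            content = attr.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key, content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(html: str) -> dict:
    """Hypothetical helper: return title, author, date and website name."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    return {
        "title": meta.get("og:title") or "".join(parser.title_parts).strip(),
        "author": meta.get("author", ""),
        "date": meta.get("article:published_time", ""),
        "website": meta.get("og:site_name", ""),
    }
```

Fields that a page does not declare come back as empty strings, so the caller can decide how to fill them.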

Phase 2: Deduplication

Run URL deduplication on all collected results:

python scripts/infoseek_helper.py deduplicate "<temp_results_file>"

The script normalizes URLs (removes www, strips tracking parameters, unifies http/https, removes trailing slashes) and checks them against the SQLite database to skip duplicates.
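A minimal sketch of the normalization rules listed above, using only `urllib.parse`. It is not the code in `infoseek_helper.py`: the function name `normalize_url`, the `TRACKING_PARAMS` list, the choice of https as the canonical scheme, and dropping the fragment are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters; the helper script's real list may differ.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` for duplicate detection."""
    parts = urlsplit(url.strip())
    # Unify http/https by always emitting https (an assumption).
    scheme = "https"
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep a port only when it is not the default for http or https.
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/")
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    # The fragment is dropped (an assumption; the source does not mention it).
    return urlunsplit((scheme, host, path, urlencode(kept), ""))
```

Two URLs that differ only in scheme, a `www.` prefix, a trailing slash, or tracking parameters then map to the same string, which is what the database lookup compares.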

Phase 3: Content Extraction & Storage

For each unique URL:

Extract content using the browser tool — get title, body, metadata
Filter content — remove ads, sidebars, navigation, comments, related articles, footers

Generate filename:

python scripts/infoseek_helper.py generate-filename \
  --date "<YYYYMMDD>" --title "<title>" --website "<site>" --format "<ext>"

Format: YYYYMMDD-title-website.ext

Save the file:

python scripts/infoseek_helper.py save-content \
  --folder "<archive_path>" --filename "<name>" --url "<url>" \
  --website "<site>" --source "<source>" --date "<date>" \
  --title "<title>" --author "<author>" --editor "<editor>" \
  --content "<body>" --task "<subject>"

Record in database:

python scripts/infoseek_helper.py add-url \
  --url "<normalized_url>" --task "<subject>" --filename "<name>"
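To illustrate how recording a URL and skipping duplicates might work together, here is a small sketch with Python's `sqlite3` module. The table name `urls`, its columns, and the function names are hypothetical; the actual schema of `infoseek.db` is not shown in this document.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the dedup database and create the (assumed) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " url TEXT PRIMARY KEY,"
        " task TEXT NOT NULL,"
        " filename TEXT NOT NULL)"
    )
    return conn


def is_known(conn: sqlite3.Connection, url: str) -> bool:
    """True if the normalized URL has already been archived."""
    row = conn.execute("SELECT 1 FROM urls WHERE url = ?", (url,)).fetchone()
    return row is not None


def add_url(conn: sqlite3.Connection, url: str, task: str, filename: str) -> bool:
    """Record a URL; return False (and change nothing) if it is a duplicate."""
    if is_known(conn, url):
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO urls (url, task, filename) VALUES (?, ?, ?)",
            (url, task, filename),
        )
    return True
```

A duplicate is only skipped, never deleted, which matches the "Dedup skip" row of the deletion policy.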

Phase 4: Task Report

Output a summary when complete:

InfoSeek Task Report
====================
Subject: {query}
Engines used: {engines}
Total found: {total} | Duplicates skipped: {dupes} | New archived: {new}
Files saved: {count}
Location: {path}
Database records: {db_total}
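As a sketch of how the template above could be filled in, the following uses a dataclass whose fields mirror the placeholders. The class name `TaskReport` and its `render` method are illustrative assumptions, not part of the helper script.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskReport:
    """Values for the report placeholders; field names follow the template."""
    query: str
    engines: List[str]
    total: int
    dupes: int
    new: int
    count: int
    path: str
    db_total: int

    def render(self) -> str:
        return "\n".join([
            "InfoSeek Task Report",
            "====================",
            f"Subject: {self.query}",
            f"Engines used: {', '.join(self.engines)}",
            f"Total found: {self.total} | Duplicates skipped: {self.dupes}"
            f" | New archived: {self.new}",
            f"Files saved: {self.count}",
            f"Location: {self.path}",
            f"Database records: {self.db_total}",
        ])
```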

File Naming

Format: YYYYMMDD-title-website.ext

Date: 8 digits (YYYYMMDD) from page metadata
Title: page title (strip special chars <>:"/\|?*)
Website: domain or media name
Extension: md (default), json, txt, csv, xlsx, html, docx

If the filename already exists, append an 8-character hash to prevent overwrites.
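The naming rules can be sketched as follows. The function names are hypothetical, and because the source does not say what the 8-character hash is computed from, this sketch assumes the first 8 hex digits of the SHA-256 of the page URL. The `00000000` fallback for an unknown date follows the Troubleshooting table.

```python
import hashlib
import re
from pathlib import Path

# Characters the naming rules say to strip: < > : " / \ | ? *
FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def make_filename(date: str, title: str, website: str, ext: str = "md") -> str:
    """Build YYYYMMDD-title-website.ext from page metadata."""
    if not re.fullmatch(r"\d{8}", date or ""):
        date = "00000000"  # unknown date
    clean_title = FORBIDDEN.sub("", title).strip()
    clean_site = FORBIDDEN.sub("", website).strip()
    return f"{date}-{clean_title}-{clean_site}.{ext.lstrip('.')}"


def unique_filename(folder: Path, name: str, url: str) -> str:
    """Append an 8-char hash before the extension if `name` is already taken."""
    if not (folder / name).exists():
        return name
    stem, dot, ext = name.rpartition(".")
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]  # assumed hash input
    return f"{stem}-{digest}{dot}{ext}"
```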

Output Formats

All formats include full metadata (URL, website, source, date, title, author, editor) plus body content.

.md — Markdown with metadata table
.json — Structured JSON with metadata object and content field
.txt — Plain text with header metadata
.csv — One row per article, all metadata as columns
.xlsx — Excel spreadsheet with metadata columns
.html — Styled HTML page with metadata table
.docx — Word document with metadata paragraph
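As an illustration of the first two formats, the sketch below renders one article as JSON (a metadata object plus a content field) and as Markdown with a metadata table. The function names and the exact layout are assumptions; the helper script's real output may be arranged differently.

```python
import json

# Metadata fields listed above, in the order they are written out.
FIELDS = ("url", "website", "source", "date", "title", "author", "editor")


def render_json(meta: dict, body: str) -> str:
    """Structured JSON: a metadata object and a content field."""
    record = {"metadata": {k: meta.get(k, "") for k in FIELDS}, "content": body}
    return json.dumps(record, ensure_ascii=False, indent=2)


def render_markdown(meta: dict, body: str) -> str:
    """Markdown: title heading, metadata table, then the body text."""
    lines = [f"# {meta.get('title', '')}", "", "| Field | Value |", "| --- | --- |"]
    for key in FIELDS:
        # Escape pipes so a value cannot break the table row.
        value = str(meta.get(key, "")).replace("|", "\\|")
        lines.append(f"| {key} | {value} |")
    lines += ["", body]
    return "\n".join(lines)
```

`ensure_ascii=False` keeps non-ASCII text such as Chinese titles readable in the saved file instead of escaping it.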

Storage Structure

{workspace}/
├── infoseek-archives/
│   ├── <subject_1>/
│   │   ├── 20260404-title-website.md
│   │   └── ...
│   └── <subject_2>/
└── infoseek/
    ├── infoseek.db          # SQLite dedup database
    ├── infoseek.log         # Operation log
    └── backups/

Deletion Policy

Strict data retention — no permanent deletes without confirmation.

Operation	Confirmation	Method
Bulk folder delete	Required	Move to recycle bin
Single file delete	Required	Move to recycle bin
Dedup skip	Automatic	Skip only, no delete
Database cleanup	Required	Mark as deleted

Process:

List files to delete (name, URL, date)
Ask user: "Confirm deletion? Files go to recycle bin and can be recovered."
On confirmation, move to recycle bin (Windows: PowerShell, Mac/Linux: system trash)
Update database, log the deletion, confirm to user

Never:

Delete without user consent
Permanently delete (bypass recycle bin)
Delete without logging
Delete without updating database
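The process and the "Never" rules can be expressed as a confirmation-gated function. This is a sketch under stated assumptions: `confirm`, `move_to_trash`, and `mark_deleted` are hypothetical callables supplied by the caller (for example a prompt, a platform-specific recycle-bin mover, and a database update), since the platform details are not shown here.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, List

log = logging.getLogger("infoseek")

PROMPT = "Confirm deletion? Files go to recycle bin and can be recovered."


def delete_with_confirmation(
    files: Iterable[Path],
    confirm: Callable[[str], bool],
    move_to_trash: Callable[[Path], None],
    mark_deleted: Callable[[Path], None],
) -> List[Path]:
    """List the files, ask once, then trash, record and log each one."""
    files = list(files)
    listing = "\n".join(str(f) for f in files)
    if not confirm(f"{listing}\n{PROMPT}"):
        return []  # no consent, nothing is touched
    removed = []
    for f in files:
        move_to_trash(f)   # recoverable move, never a permanent delete
        mark_deleted(f)    # keep the database in step with the filesystem
        log.info("Moved to recycle bin: %s", f)
        removed.append(f)
    return removed
```

Passing the three steps in as callables keeps the policy (confirm, move, record, log) in one place while leaving the Windows, macOS, and Linux specifics to the caller.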

Configuration

Override defaults in task instructions:

Search depth: default 100 pages, specify e.g. "150 pages"
Time range: default unlimited, specify e.g. "2020-01-01 to 2026-04-07"
Output format: default md, specify e.g. "xlsx"
Storage path: default {workspace}/infoseek-archives/, specify custom path
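One way to model these defaults and overrides is a frozen dataclass. The class and function names below are hypothetical and only illustrate the idea; the skill itself takes overrides from the task instructions rather than from code.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

ALLOWED_FORMATS = {"md", "json", "txt", "csv", "xlsx", "html", "docx"}


@dataclass(frozen=True)
class SearchConfig:
    """Defaults as listed above; time_range=None means unlimited."""
    search_depth: int = 100
    time_range: Optional[Tuple[str, str]] = None
    output_format: str = "md"
    storage_path: str = "{workspace}/infoseek-archives/"


def with_overrides(base: SearchConfig, **overrides) -> SearchConfig:
    """Return a copy of `base` with the given fields replaced and validated."""
    cfg = replace(base, **overrides)
    if cfg.output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {cfg.output_format}")
    if cfg.search_depth < 1:
        raise ValueError("search depth must be at least 1 page")
    return cfg
```

For example, `with_overrides(SearchConfig(), search_depth=150, output_format="xlsx")` keeps the default time range and storage path while changing the other two settings.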

Troubleshooting

Problem	Solution
Missing search skill	`openclaw skills install <name>`
Date extraction fails	Check page metadata; use `00000000` for unknown
Encoding errors	Ensure UTF-8; on Windows enable Unicode UTF-8 in region settings
Database corruption	`python scripts/infoseek_helper.py restore-backup`

Security & Privacy

All searches use public channels only
No personal data stored — only search results
SQLite database is local, never uploaded
Deletions use system recycle bin (recoverable)
All operations logged and auditable
No telemetry, no external data transmission

Version History

Version	Date	Notes
2.0.0	2026-04-07	Full rewrite: SQLite dedup, URL normalization, HTML parsing, multi-engine integration
1.0.0	2026-04-06	Initial version (deprecated)

Security Recommendations

This skill appears to do what it says: it expects a workspace path, a readable and writable folder to store archives, and an included Python helper script to manage deduplication and file storage. Before installing, consider:

1) Trust/source: the package has no homepage and an unknown source; review the full helper script yourself (it is included) and confirm you trust the publisher.
2) Dependencies: the skill expects other search/browser skills to exist in {workspace}/skills; ensure those are genuine and named exactly as SKILL.md expects (there are some naming mismatches in the instructions).
3) Legal & operational risk: the workflow encourages high-volume crawling (e.g., 100+ pages); ensure you comply with target sites' terms of service and robots.txt, and avoid overloading sites.
4) Workspace safety: the skill will create infoseek-archives/ and an SQLite DB under OPENCLAW_WORKSPACE; point OPENCLAW_WORKSPACE to an isolated location if you don't want data mixed with other agent state.
5) Rate limiting & secrets: the helper script does not exfiltrate data or call remote endpoints, but other search/browser skills might. Verify those dependent skills before use.

If you want higher assurance, ask the publisher for a homepage or repository, or run the skill in a sandboxed workspace first.

Functional Analysis

Type: OpenClaw Skill
Name: infoseek-en
Version: 2.0.0

The infoseek-en skill bundle is a comprehensive web research and archival tool. The Python helper script (infoseek_helper.py) manages a local SQLite database for URL deduplication and provides structured storage in multiple formats (Markdown, JSON, Excel, etc.). While the script uses subprocess.run to execute a PowerShell command for moving files to the Windows Recycle Bin, this is a legitimate functional implementation for the stated 'strict deletion policy' and includes basic path escaping. The SKILL.md instructions are well-aligned with the code logic and emphasize user confirmation for destructive actions, showing no signs of malicious intent or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description (deep web search + archival) align with the included helper script (URL normalization, SQLite deduplication, file storage) and the declared requirement of python3 and OPENCLAW_WORKSPACE. The script explicitly handles local file and DB operations and does not perform network searches itself, which fits the model where the agent or other 'search' skills perform crawling.

ℹ Instruction Scope

SKILL.md instructs the agent to use external search/browser skills to fetch pages and to run the local helper script for normalization, deduplication, and saving. It does not instruct the agent to read arbitrary unrelated files or extra environment variables beyond OPENCLAW_WORKSPACE. Minor issues: inconsistent naming for required skills (e.g., 'tavily' vs 'tavily-search', 'Multi-Search-Engine' vs 'multi-search-engine') and a reliance on other skills being present in workspace/skills; these appear to be sloppy bookkeeping rather than malicious scope creep. Also, the workflow encourages high-volume scraping (e.g., '100+ pages' on Baidu) — a functional concern (rate limits, TOS, IP blocking, legal/ethical risk), not a code/credential mismatch.

✓ Install Mechanism

No install spec is provided (instruction-only skill with one helper script included). That is low-risk: nothing is downloaded from remote URLs and the script will only be written to the agent environment when this skill is installed. The helper script is plain Python, readable, and contains no obfuscated code or hidden remote endpoints.

✓ Credentials

The only declared primary credential is OPENCLAW_WORKSPACE (a workspace path used to store archives and check for other skills). No API keys or unrelated secrets are requested. The workspace access is necessary and proportionate for saving archives and database files.

✓ Persistence & Privilege

always is false (no forced always-on presence). The skill writes files and an SQLite DB under the workspace (expected for an archival tool) but does not request elevated system-wide configuration changes or access to other skills' configs.

Version History

v2.0.0

Initial English release: multi-engine deep search, URL deduplication with SQLite, structured archival, multiple output formats

Metadata

Slug infoseek

Version 2.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

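FAQ

What is InfoSeek?

Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 88 times to date.

How do I install InfoSeek?

Run the command "/install infoseek" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is InfoSeek free?

Yes. InfoSeek is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does InfoSeek support?

InfoSeek is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed InfoSeek?

It is developed and maintained by ExpeditionHub (@expeditionhub); the current version is v2.0.0.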
