expeditionhub

InfoSeek

Author: ExpeditionHub · GitHub · v2.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 88
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install infoseek
Description
Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily...
Usage Instructions (SKILL.md)

InfoSeek - Deep Web Search & Archival

Overview

InfoSeek performs comprehensive web research on any subject (person, organization, product) across multiple search engines, deduplicates results, extracts clean content, and archives everything with full metadata in organized folders.

Prerequisites

Before executing a search task, verify these skills are installed:

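import os
from pathlib import Path

# OPENCLAW_WORKSPACE must point at the workspace root; fail early if it is unset,
# because Path(None) would otherwise raise a less helpful TypeError.
workspace = os.environ.get('OPENCLAW_WORKSPACE')
if not workspace:
    raise RuntimeError('OPENCLAW_WORKSPACE is not set')
skills_dir = Path(workspace) / 'skills'

# A skill counts as installed if its folder exists under {workspace}/skills.
required = ['baidu-search', 'tavily', 'Multi-Search-Engine', 'agent-browser-clawdbot-0.1.0']
missing = [s for s in required if not (skills_dir / s).exists()]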

If any are missing, instruct the user to install them:

openclaw skills install baidu-search
openclaw skills install tavily-search
openclaw skills install multi-search-engine

Workflow

Phase 0: Task Setup

  1. Confirm the search subject — name, organization, or product
  2. Collect optional context — background info, time range, output format (default: .md), special requirements
  3. Check dependencies — run the prerequisite check above
  4. Create archive folder — run:
    python scripts/infoseek_helper.py create-folder "<subject_name>"
    

Phase 1: Multi-Engine Deep Search

Execute searches across all available engines. Each engine runs independently.

1.1 Baidu Search (100+ pages)

Use the baidu-search skill:

  • Query: "<subject> <background_context>"
  • Depth: 100+ pages
  • Record: URL, title, website name, publish date for each result

1.2 Tavily Search

Use the tavily_search tool:

query: "<subject> <background_context>"
search_depth: advanced
max_results: 50

1.3 Multi-Search-Engine

Use the multi-search-engine skill across multiple engines simultaneously.

1.4 Browser Deep-Crawl

For discovered URLs, use the browser tool to:

  1. Open each page
  2. Extract body content (filter ads, sidebars, comments)
  3. Extract metadata: title, author, editor, date, website name

Phase 2: Deduplication

Run URL deduplication on all collected results:

python scripts/infoseek_helper.py deduplicate "<temp_results_file>"

The script normalizes each URL (removes the www prefix, strips tracking parameters, unifies http/https, and removes trailing slashes), then checks the normalized URL against the SQLite database and skips any that are already recorded.
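The normalization and lookup described above can be sketched roughly as follows. This is an illustration, not the helper script's actual code: the tracking-parameter list, the choice of https as the unified scheme, and the `urls` table schema are all assumptions made for the example.

```python
import sqlite3
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the real script's list is not shown in SKILL.md.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Apply the four normalization rules named in the text."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # remove www
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_KEYS and not k.startswith(TRACKING_PREFIXES)
    ]  # strip tracking params
    path = parts.path.rstrip("/")  # remove trailing slashes
    # Unify http/https by always emitting https (an assumption).
    return urlunsplit(("https", host, path, urlencode(query), parts.fragment))


def open_dedup_db(path: str) -> sqlite3.Connection:
    """Open the dedup database with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, task TEXT)")
    return conn


def is_new_url(conn: sqlite3.Connection, url: str, task: str) -> bool:
    """Record the normalized URL; return False if it was already present."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO urls (url, task) VALUES (?, ?)",
        (normalize_url(url), task),
    )
    conn.commit()
    return cur.rowcount == 1
```

With these rules, `http://www.Example.com/news/?utm_source=x&id=7` and `https://example.com/news?id=7` normalize to the same key, so the second one is skipped.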

Phase 3: Content Extraction & Storage

For each unique URL:

  1. Extract content using the browser tool — get title, body, metadata
  2. Filter content — remove ads, sidebars, navigation, comments, related articles, footers
  3. Generate filename:
    python scripts/infoseek_helper.py generate-filename \
      --date "<YYYYMMDD>" --title "<title>" --website "<site>" --format "<ext>"
    
    Format: YYYYMMDD-title-website.ext
  4. Save the file:
    python scripts/infoseek_helper.py save-content \
      --folder "<archive_path>" --filename "<name>" --url "<url>" \
      --website "<site>" --source "<source>" --date "<date>" \
      --title "<title>" --author "<author>" --editor "<editor>" \
      --content "<body>" --task "<subject>"
    
  5. Record in database:
    python scripts/infoseek_helper.py add-url \
      --url "<normalized_url>" --task "<subject>" --filename "<name>"
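
When an agent drives the save step from Python instead of a shell, passing the arguments as a list avoids quoting problems with titles and body text. The sketch below only assembles the save-content flags listed above; the function names and the `meta` dictionary layout are hypothetical, and the helper script's behavior beyond those flags is not assumed.

```python
import subprocess
import sys

# Metadata flags in the order shown for save-content.
META_FLAGS = ["url", "website", "source", "date", "title", "author", "editor"]


def build_save_command(archive_path, filename, meta, body, subject):
    """Build the argv list for `infoseek_helper.py save-content`."""
    cmd = [
        sys.executable, "scripts/infoseek_helper.py", "save-content",
        "--folder", archive_path,
        "--filename", filename,
    ]
    for key in META_FLAGS:
        # Missing metadata is passed as an empty string (an assumption).
        cmd += [f"--{key}", str(meta.get(key, ""))]
    cmd += ["--content", body, "--task", subject]
    return cmd


def save_content(archive_path, filename, meta, body, subject):
    """Run the helper without a shell; raises CalledProcessError on failure."""
    subprocess.run(
        build_save_command(archive_path, filename, meta, body, subject),
        check=True,
    )
```

Very long page bodies can exceed the operating system's command-line length limit, so this approach is best suited to short or truncated content.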
    

Phase 4: Task Report

Output a summary when complete:

InfoSeek Task Report
====================
Subject: {query}
Engines used: {engines}
Total found: {total} | Duplicates skipped: {dupes} | New archived: {new}
Files saved: {count}
Location: {path}
Database records: {db_total}

File Naming

Format: YYYYMMDD-title-website.ext

  • Date: 8 digits (YYYYMMDD) from page metadata
  • Title: page title (strip special chars <>:"/\|?*)
  • Website: domain or media name
  • Extension: md (default), json, txt, csv, xlsx, html, docx

If the filename already exists, append an 8-character hash to prevent overwrites.
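
A minimal sketch of these naming rules, for illustration only. The function name, its parameters, and the choice to derive the 8-character hash from the page URL with SHA-256 are assumptions; SKILL.md does not say what the helper script hashes. The `00000000` fallback for an unknown date follows the Troubleshooting section.

```python
import hashlib
import re
from pathlib import Path

# The special characters the text says to strip: < > : " / \ | ? *
FORBIDDEN = r'[<>:"/\\|?*]'


def generate_filename(date, title, website, ext="md", folder=None, url=""):
    """Build YYYYMMDD-title-website.ext, adding a hash suffix on collision."""
    # Fall back to 00000000 when the date is missing or not 8 digits.
    date = date if re.fullmatch(r"\d{8}", date or "") else "00000000"
    clean_title = re.sub(FORBIDDEN, "", title).strip()
    clean_site = re.sub(FORBIDDEN, "", website).strip()
    name = f"{date}-{clean_title}-{clean_site}.{ext}"
    if folder is not None and (Path(folder) / name).exists():
        # Assumed hash source: the page URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
        name = f"{date}-{clean_title}-{clean_site}-{digest}.{ext}"
    return name
```

For example, `generate_filename("20260404", 'A/B: "Test"?', "example.com")` yields `20260404-AB Test-example.com.md`.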

Output Formats

All formats include full metadata (URL, website, source, date, title, author, editor) plus body content.

  • .md — Markdown with metadata table
  • .json — Structured JSON with metadata object and content field
  • .txt — Plain text with header metadata
  • .csv — One row per article, all metadata as columns
  • .xlsx — Excel spreadsheet with metadata columns
  • .html — Styled HTML page with metadata table
  • .docx — Word document with metadata paragraph
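
To make the first two formats concrete, here is a rough sketch of how one article might be rendered as Markdown with a metadata table and as JSON with a metadata object and a content field. The function names and the exact table layout are assumptions; the helper script's real output may differ.

```python
import json

# Metadata fields that every format is said to include.
FIELDS = ["url", "website", "source", "date", "title", "author", "editor"]


def _cell(value) -> str:
    """Keep a value on one table row by escaping pipes and newlines."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def render_markdown(meta: dict, body: str) -> str:
    """Markdown document: title, metadata table, then the body."""
    rows = "\n".join(f"| {key} | {_cell(meta.get(key, ''))} |" for key in FIELDS)
    return (
        f"# {meta.get('title', '')}\n\n"
        "| Field | Value |\n| --- | --- |\n"
        f"{rows}\n\n{body}\n"
    )


def render_json(meta: dict, body: str) -> str:
    """JSON document with a metadata object and a content field."""
    payload = {
        "metadata": {key: meta.get(key, "") for key in FIELDS},
        "content": body,
    }
    # ensure_ascii=False keeps Chinese text readable in the saved file.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```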

Storage Structure

{workspace}/
├── infoseek-archives/
│   ├── <subject_1>/
│   │   ├── 20260404-title-website.md
│   │   └── ...
│   └── <subject_2>/
└── infoseek/
    ├── infoseek.db          # SQLite dedup database
    ├── infoseek.log         # Operation log
    └── backups/

Deletion Policy

Strict data retention — no permanent deletes without confirmation.

| Operation | Confirmation | Method |
| --- | --- | --- |
| Bulk folder delete | Required | Move to recycle bin |
| Single file delete | Required | Move to recycle bin |
| Dedup skip | Automatic | Skip only, no delete |
| Database cleanup | Required | Mark as deleted |

Process:

  1. List files to delete (name, URL, date)
  2. Ask user: "Confirm deletion? Files go to recycle bin and can be recovered."
  3. On confirmation, move to recycle bin (Windows: PowerShell, Mac/Linux: system trash)
  4. Update database, log the deletion, confirm to user

Never:

  • Delete without user consent
  • Permanently delete (bypass recycle bin)
  • Delete without logging
  • Delete without updating database
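
The process and the "Never" rules above could be enforced by a single function like the sketch below. Everything here is illustrative: the `urls` table with `filename` and `deleted` columns is a hypothetical schema, and the confirmation prompt and recycle-bin move are injected as callables because their real implementations are platform-specific and not shown in SKILL.md.

```python
import logging
import sqlite3
from typing import Callable, Iterable, List

log = logging.getLogger("infoseek")


def delete_files(
    conn: sqlite3.Connection,
    filenames: Iterable[str],
    confirm: Callable[[List[str]], bool],
    move_to_trash: Callable[[str], None],
) -> List[str]:
    """Move files to the recycle bin only after explicit confirmation."""
    names = list(filenames)
    # Steps 1-2: show the list and ask; do nothing without consent.
    if not names or not confirm(names):
        return []
    removed = []
    for name in names:
        move_to_trash(name)  # step 3: recoverable move, never a permanent delete
        # Step 4: mark the record as deleted instead of dropping the row.
        conn.execute("UPDATE urls SET deleted = 1 WHERE filename = ?", (name,))
        log.info("moved to recycle bin: %s", name)
        removed.append(name)
    conn.commit()
    return removed
```

Injecting `confirm` and `move_to_trash` keeps the policy testable without touching real files or prompting a user.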

Configuration

Override defaults in task instructions:

  • Search depth: default 100 pages, specify e.g. "150 pages"
  • Time range: default unlimited, specify e.g. "2020-01-01 to 2026-04-07"
  • Output format: default md, specify e.g. "xlsx"
  • Storage path: default {workspace}/infoseek-archives/, specify custom path

Troubleshooting

| Problem | Solution |
| --- | --- |
| Missing search skill | openclaw skills install <name> |
| Date extraction fails | Check page metadata; use 00000000 for unknown |
| Encoding errors | Ensure UTF-8; on Windows enable Unicode UTF-8 in region settings |
| Database corruption | python scripts/infoseek_helper.py restore-backup |

Security & Privacy

  • All searches use public channels only
  • No personal data stored — only search results
  • SQLite database is local, never uploaded
  • Deletions use system recycle bin (recoverable)
  • All operations logged and auditable
  • No telemetry, no external data transmission

Version History

| Version | Date | Notes |
| --- | --- | --- |
| 2.0.0 | 2026-04-07 | Full rewrite: SQLite dedup, URL normalization, HTML parsing, multi-engine integration |
| 1.0.0 | 2026-04-06 | Initial version (deprecated) |
Security Recommendations
This skill appears to do what it says: it expects a workspace path, a readable and writable folder for archives, and an included Python helper script that manages deduplication and file storage. Before installing, consider:

  1. Trust/source: the package has no homepage and an unknown source; review the full helper script yourself (it is included) and confirm you trust the publisher.
  2. Dependencies: the skill expects other search/browser skills to exist in {workspace}/skills; ensure those are genuine and named exactly as SKILL.md expects (there are some naming mismatches in the instructions).
  3. Legal & operational risk: the workflow encourages high-volume crawling (e.g., 100+ pages); comply with target sites' terms of service and robots.txt, and avoid overloading sites.
  4. Workspace safety: the skill will create infoseek-archives/ and an SQLite DB under OPENCLAW_WORKSPACE; point OPENCLAW_WORKSPACE to an isolated location if you don't want data mixed with other agent state.
  5. Rate limiting & secrets: the helper script does not exfiltrate data or call remote endpoints, but other search/browser skills might. Verify those dependent skills before use.

If you want higher assurance, ask the publisher for a homepage or repository, or run the skill in a sandboxed workspace first.
Functional Analysis
Type: OpenClaw Skill · Name: infoseek-en · Version: 2.0.0
The infoseek-en skill bundle is a comprehensive web research and archival tool. The Python helper script (infoseek_helper.py) manages a local SQLite database for URL deduplication and provides structured storage in multiple formats (Markdown, JSON, Excel, etc.). The script uses subprocess.run to execute a PowerShell command that moves files to the Windows Recycle Bin; this is a legitimate implementation of the stated strict deletion policy and includes basic path escaping. The SKILL.md instructions are well aligned with the code logic and emphasize user confirmation for destructive actions, showing no signs of malicious intent or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (deep web search + archival) align with the included helper script (URL normalization, SQLite deduplication, file storage) and the declared requirement of python3 and OPENCLAW_WORKSPACE. The script explicitly handles local file and DB operations and does not perform network searches itself, which fits the model where the agent or other 'search' skills perform crawling.
Instruction Scope
SKILL.md instructs the agent to use external search/browser skills to fetch pages and to run the local helper script for normalization, deduplication, and saving. It does not instruct the agent to read arbitrary unrelated files or extra environment variables beyond OPENCLAW_WORKSPACE. Minor issues: inconsistent naming for required skills (e.g., 'tavily' vs 'tavily-search', 'Multi-Search-Engine' vs 'multi-search-engine') and a reliance on other skills being present in workspace/skills; these appear to be sloppy bookkeeping rather than malicious scope creep. Also, the workflow encourages high-volume scraping (e.g., '100+ pages' on Baidu) — a functional concern (rate limits, TOS, IP blocking, legal/ethical risk), not a code/credential mismatch.
Install Mechanism
No install spec is provided (instruction-only skill with one helper script included). That is low-risk: nothing is downloaded from remote URLs and the script will only be written to the agent environment when this skill is installed. The helper script is plain Python, readable, and contains no obfuscated code or hidden remote endpoints.
Credentials
The only declared primary credential is OPENCLAW_WORKSPACE (a workspace path used to store archives and check for other skills). No API keys or unrelated secrets are requested. The workspace access is necessary and proportionate for saving archives and database files.
Persistence & Privilege
always is false (no forced always-on presence). The skill writes files and an SQLite DB under the workspace (expected for an archival tool) but does not request elevated system-wide configuration changes or access to other skills' configs.
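How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install infoseek
  3. Once installed, invoke the skill by name or trigger it with /infoseek
  4. Provide the inputs described in the skill's parameter notes to get structured output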
Version History
v2.0.0
Initial English release: multi-engine deep search, URL deduplication with SQLite, structured archival, multiple output formats
Metadata
Slug: infoseek
Version: 2.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 1
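FAQ

What is InfoSeek?

Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 88 total downloads so far.

How do I install InfoSeek?

Run the command "/install infoseek" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is required.

Is InfoSeek free?

Yes. InfoSeek is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does InfoSeek support?

InfoSeek is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops InfoSeek?

It is developed and maintained by ExpeditionHub (@expeditionhub); the current version is v2.0.0.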
