expeditionhub

InfoSeek

by ExpeditionHub · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ✓ Security Clean
Downloads: 88 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install infoseek
Description
Deep web information search and archival skill for comprehensive research on persons, organizations, or products. Uses multiple search engines (Baidu, Tavily...
README (SKILL.md)

InfoSeek - Deep Web Search & Archival

Overview

InfoSeek performs comprehensive web research on any subject (person, organization, product) across multiple search engines, deduplicates results, extracts clean content, and archives everything with full metadata in organized folders.

Prerequisites

Before executing a search task, verify these skills are installed:

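import os
from pathlib import Path

# OPENCLAW_WORKSPACE must be set; Path(None) would raise a TypeError.
workspace = os.environ.get('OPENCLAW_WORKSPACE')
if not workspace:
    raise SystemExit('OPENCLAW_WORKSPACE is not set')
skills_dir = Path(workspace) / 'skills'

# Each name must match an installed skill folder exactly.
required = ['baidu-search', 'tavily', 'Multi-Search-Engine', 'agent-browser-clawdbot-0.1.0']
missing = [s for s in required if not (skills_dir / s).exists()]
if missing:
    print('Missing skills:', ', '.join(missing))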

If any are missing, instruct the user to install them:

openclaw skills install baidu-search
openclaw skills install tavily-search
openclaw skills install multi-search-engine

Workflow

Phase 0: Task Setup

  1. Confirm the search subject — name, organization, or product
  2. Collect optional context — background info, time range, output format (default: .md), special requirements
  3. Check dependencies — run the prerequisite check above
  4. Create archive folder — run:
    python scripts/infoseek_helper.py create-folder "<subject_name>"
    

Phase 1: Multi-Engine Deep Search

Execute searches across all available engines. Each engine runs independently.

1.1 Baidu Search (100+ pages)

Use the baidu-search skill:

  • Query: "<subject> <background_context>"
  • Depth: 100+ pages
  • Record: URL, title, website name, publish date for each result

1.2 Tavily Search

Use tavily_search tool:

query: "<subject> <background_context>"
search_depth: advanced
max_results: 50

1.3 Multi-Search-Engine

Use the multi-search-engine skill across multiple engines simultaneously.

1.4 Browser Deep-Crawl

For discovered URLs, use the browser tool to:

  1. Open each page
  2. Extract body content (filter ads, sidebars, comments)
  3. Extract metadata: title, author, editor, date, website name

Phase 2: Deduplication

Run URL deduplication on all collected results:

python scripts/infoseek_helper.py deduplicate "<temp_results_file>"

The script normalizes URLs (remove www, tracking params, unify http/https, remove trailing slashes) and checks against the SQLite database to skip duplicates.
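
The normalization rules listed above could look roughly like the following. This is a minimal sketch, not the helper script's actual code: the function name `normalize_url`, the list of tracking parameters, and the decision to drop URL fragments are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters; the helper script's real list is not shown here.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}


def normalize_url(url: str) -> str:
    """Return a canonical form of url for duplicate detection (illustrative)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop tracking parameters, keep the rest in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    path = parts.path.rstrip("/")
    # Unify http/https by always emitting https; the fragment is discarded (assumption).
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

With these rules, `http://www.Example.com/news/?utm_source=x&id=7` and `https://example.com/news?id=7` map to the same key, so the second one would be skipped as a duplicate.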

Phase 3: Content Extraction & Storage

For each unique URL:

  1. Extract content using the browser tool — get title, body, metadata
  2. Filter content — remove ads, sidebars, navigation, comments, related articles, footers
  3. Generate filename:
    python scripts/infoseek_helper.py generate-filename \
      --date "<YYYYMMDD>" --title "<title>" --website "<site>" --format "<ext>"
    
    Format: YYYYMMDD-title-website.ext
  4. Save the file:
    python scripts/infoseek_helper.py save-content \
      --folder "<archive_path>" --filename "<name>" --url "<url>" \
      --website "<site>" --source "<source>" --date "<date>" \
      --title "<title>" --author "<author>" --editor "<editor>" \
      --content "<body>" --task "<subject>"
    
  5. Record in database:
    python scripts/infoseek_helper.py add-url \
      --url "<normalized_url>" --task "<subject>" --filename "<name>"
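
The final step, recording a URL so later runs can skip it, can be sketched with the standard `sqlite3` module. The table name `urls` and its columns are hypothetical; the real schema of infoseek.db is not documented in this README.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the dedup database and create the (hypothetical) urls table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " url TEXT PRIMARY KEY, task TEXT NOT NULL, filename TEXT NOT NULL)"
    )
    return conn


def add_url(conn: sqlite3.Connection, url: str, task: str, filename: str) -> bool:
    """Insert a normalized URL; return False if it was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO urls (url, task, filename) VALUES (?, ?, ?)",
        (url, task, filename),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the primary key already existed.
    return cur.rowcount == 1
```

Using the normalized URL as the primary key lets the database enforce uniqueness, so a duplicate is skipped rather than deleted or overwritten.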
    

Phase 4: Task Report

Output a summary when complete:

InfoSeek Task Report
====================
Subject: {query}
Engines used: {engines}
Total found: {total} | Duplicates skipped: {dupes} | New archived: {new}
Files saved: {count}
Location: {path}
Database records: {db_total}
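
One way to fill this template is shown below. It is a sketch only: the `TaskStats` dataclass and `render_report` function are hypothetical names, and the field meanings in the comments are inferred from the template labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskStats:
    query: str
    engines: List[str]
    total: int      # all results returned by the engines
    dupes: int      # results skipped as duplicates
    new: int        # results archived in this run
    count: int      # files written to disk
    path: str       # archive folder
    db_total: int   # records in the dedup database after the run


def render_report(s: TaskStats) -> str:
    """Fill the report template with one task's numbers."""
    title = "InfoSeek Task Report"
    return "\n".join([
        title,
        "=" * len(title),
        f"Subject: {s.query}",
        f"Engines used: {', '.join(s.engines)}",
        f"Total found: {s.total} | Duplicates skipped: {s.dupes} | New archived: {s.new}",
        f"Files saved: {s.count}",
        f"Location: {s.path}",
        f"Database records: {s.db_total}",
    ])
```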

File Naming

Format: YYYYMMDD-title-website.ext

  • Date: 8 digits (YYYYMMDD) from page metadata
  • Title: page title (strip special chars <>:"/\|?*)
  • Website: domain or media name
  • Extension: md (default), json, txt, csv, xlsx, html, docx

If a file with that name already exists, append an 8-character hash to prevent overwrites.
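
The naming rules can be sketched as follows. This is an illustration under stated assumptions, not the helper's `generate-filename` implementation: the function signature is hypothetical, and hashing the URL with SHA-256 is a guess, since the README does not say what the 8-character hash is derived from. The `00000000` fallback for unknown dates follows the Troubleshooting section.

```python
import hashlib
import re
from pathlib import Path

# Characters the README says to strip from titles: <>:"/\|?*
FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def generate_filename(folder: Path, date: str, title: str, website: str,
                      ext: str = "md", url: str = "") -> str:
    """Build YYYYMMDD-title-website.ext, adding a hash suffix on collision."""
    if not re.fullmatch(r"\d{8}", date or ""):
        date = "00000000"  # unknown or malformed date
    clean_title = FORBIDDEN.sub("", title).strip()
    clean_site = FORBIDDEN.sub("", website).strip()
    name = f"{date}-{clean_title}-{clean_site}.{ext}"
    if (folder / name).exists():
        # Assumption: the suffix is the first 8 hex digits of the URL's SHA-256.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
        name = f"{date}-{clean_title}-{clean_site}-{digest}.{ext}"
    return name
```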

Output Formats

All formats include full metadata (URL, website, source, date, title, author, editor) plus body content.

  • .md — Markdown with metadata table
  • .json — Structured JSON with metadata object and content field
  • .txt — Plain text with header metadata
  • .csv — One row per article, all metadata as columns
  • .xlsx — Excel spreadsheet with metadata columns
  • .html — Styled HTML page with metadata table
  • .docx — Word document with metadata paragraph
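
To make the first two formats concrete, here is a minimal sketch of a Markdown and a JSON renderer. The function names, the JSON keys `metadata` and `content`, and the exact table layout are assumptions; only the list of metadata fields comes from this README.

```python
import json

# Metadata fields named in this README.
FIELDS = ("url", "website", "source", "date", "title", "author", "editor")


def render_markdown(meta: dict, body: str) -> str:
    """Render a title heading, a metadata table, then the body."""
    rows = ["| Field | Value |", "| --- | --- |"]
    rows += [f"| {f} | {meta.get(f, '')} |" for f in FIELDS]
    return f"# {meta.get('title', '')}\n\n" + "\n".join(rows) + f"\n\n{body}\n"


def render_json(meta: dict, body: str) -> str:
    """Render a metadata object plus a content field."""
    doc = {"metadata": {f: meta.get(f, "") for f in FIELDS}, "content": body}
    # ensure_ascii=False keeps non-ASCII text (for example Chinese titles) readable.
    return json.dumps(doc, ensure_ascii=False, indent=2)
```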

Storage Structure

{workspace}/
├── infoseek-archives/
│   ├── <subject_1>/
│   │   ├── 20260404-title-website.md
│   │   └── ...
│   └── <subject_2>/
└── infoseek/
    ├── infoseek.db          # SQLite dedup database
    ├── infoseek.log         # Operation log
    └── backups/
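
The layout above could be created on first use with something like the following. The function name `ensure_layout` and the returned dictionary are hypothetical; only the folder and file names come from the tree.

```python
from pathlib import Path


def ensure_layout(workspace: str, subject: str) -> dict:
    """Create the archive and state folders and return the relevant paths."""
    root = Path(workspace)
    paths = {
        "archive": root / "infoseek-archives" / subject,
        "database": root / "infoseek" / "infoseek.db",
        "log": root / "infoseek" / "infoseek.log",
        "backups": root / "infoseek" / "backups",
    }
    paths["archive"].mkdir(parents=True, exist_ok=True)
    # Creating backups/ also creates its parent infoseek/ folder.
    paths["backups"].mkdir(parents=True, exist_ok=True)
    return paths
```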

Deletion Policy

Strict data retention — no permanent deletes without confirmation.

| Operation | Confirmation | Method |
| --- | --- | --- |
| Bulk folder delete | Required | Move to recycle bin |
| Single file delete | Required | Move to recycle bin |
| Dedup skip | Automatic | Skip only, no delete |
| Database cleanup | Required | Mark as deleted |

Process:

  1. List files to delete (name, URL, date)
  2. Ask user: "Confirm deletion? Files go to recycle bin and can be recovered."
  3. On confirmation, move to recycle bin (Windows: PowerShell, Mac/Linux: system trash)
  4. Update database, log the deletion, confirm to user

Never:

  • Delete without user consent
  • Permanently delete (bypass recycle bin)
  • Delete without logging
  • Delete without updating database
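
The process and the four "never" rules can be expressed as a single guarded function. This is a sketch under assumptions: the `urls` table with a `deleted` column is hypothetical, and `confirm` and `move_to_trash` are caller-supplied placeholders for the user prompt and the platform-specific recycle-bin call.

```python
import logging
import sqlite3
from typing import Callable, Iterable, List

log = logging.getLogger("infoseek")


def delete_with_confirmation(
    conn: sqlite3.Connection,
    filenames: Iterable[str],
    confirm: Callable[[List[str]], bool],
    move_to_trash: Callable[[str], None],
) -> int:
    """Move files to the recycle bin only after consent; return how many were moved."""
    files = list(filenames)
    # Rule 1: nothing happens without explicit user consent.
    if not files or not confirm(files):
        return 0
    for name in files:
        # Rule 2: recoverable move only, never a permanent unlink.
        move_to_trash(name)
        # Rule 4: mark the record as deleted instead of dropping the row.
        conn.execute("UPDATE urls SET deleted = 1 WHERE filename = ?", (name,))
        # Rule 3: every deletion is logged.
        log.info("Moved to recycle bin: %s", name)
    conn.commit()
    return len(files)
```

Passing the confirmation and trash functions in as arguments keeps the policy testable without touching real files.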

Configuration

Override defaults in task instructions:

  • Search depth: default 100 pages, specify e.g. "150 pages"
  • Time range: default unlimited, specify e.g. "2020-01-01 to 2026-04-07"
  • Output format: default md, specify e.g. "xlsx"
  • Storage path: default {workspace}/infoseek-archives/, specify custom path
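
As a rough illustration, the defaults and overrides above can be modeled as a small dataclass plus a parser for free-text task instructions. The names `TaskConfig` and `parse_overrides` and the regular expressions are assumptions; how the skill actually interprets instructions is left to the agent.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class TaskConfig:
    search_depth: int = 100                            # pages
    time_range: Optional[Tuple[date, date]] = None     # None means unlimited
    output_format: str = "md"
    storage_path: str = "{workspace}/infoseek-archives/"


def parse_overrides(instructions: str) -> TaskConfig:
    """Start from the defaults and apply any overrides found in the text."""
    cfg = TaskConfig()
    m = re.search(r"(\d+)\s*pages", instructions)
    if m:
        cfg.search_depth = int(m.group(1))
    m = re.search(r"(\d{4}-\d{2}-\d{2})\s+to\s+(\d{4}-\d{2}-\d{2})", instructions)
    if m:
        cfg.time_range = (date.fromisoformat(m.group(1)), date.fromisoformat(m.group(2)))
    m = re.search(r"\b(md|json|txt|csv|xlsx|html|docx)\b", instructions, re.IGNORECASE)
    if m:
        cfg.output_format = m.group(1).lower()
    return cfg
```

A custom storage path is not parsed here because the README gives no syntax for it.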

Troubleshooting

| Problem | Solution |
| --- | --- |
| Missing search skill | openclaw skills install <name> |
| Date extraction fails | Check page metadata; use 00000000 for unknown |
| Encoding errors | Ensure UTF-8; on Windows enable Unicode UTF-8 in region settings |
| Database corruption | python scripts/infoseek_helper.py restore-backup |

Security & Privacy

  • All searches use public channels only
  • No personal data stored — only search results
  • SQLite database is local, never uploaded
  • Deletions use system recycle bin (recoverable)
  • All operations logged and auditable
  • No telemetry, no external data transmission

Version History

| Version | Date | Notes |
| --- | --- | --- |
| 2.0.0 | 2026-04-07 | Full rewrite: SQLite dedup, URL normalization, HTML parsing, multi-engine integration |
| 1.0.0 | 2026-04-06 | Initial version (deprecated) |
Usage Guidance
This skill appears to do what it says: it expects a workspace path, a readable and writable folder to store archives, and an included Python helper script to manage deduplication and file storage. Before installing, consider:

  1. Trust/source: the package has no homepage and an unknown source; review the full helper script yourself (it is included) and confirm you trust the publisher.
  2. Dependencies: the skill expects other search/browser skills to exist in {workspace}/skills; ensure those are genuine and named exactly as SKILL.md expects (there are some naming mismatches in the instructions).
  3. Legal & operational risk: the workflow encourages high-volume crawling (e.g., 100+ pages); ensure you comply with target sites' terms of service and robots.txt, and avoid overloading sites.
  4. Workspace safety: the skill will create infoseek-archives/ and an SQLite DB under OPENCLAW_WORKSPACE; point OPENCLAW_WORKSPACE to an isolated location if you don't want data mixed with other agent state.
  5. Rate limiting & secrets: the helper script does not exfiltrate data or call remote endpoints, but other search/browser skills might. Verify those dependent skills before use.

If you want higher assurance, ask the publisher for a homepage or repository, or run the skill in a sandboxed workspace first.
Capability Analysis
Type: OpenClaw Skill
Name: infoseek-en
Version: 2.0.0

The infoseek-en skill bundle is a comprehensive web research and archival tool. The Python helper script (infoseek_helper.py) manages a local SQLite database for URL deduplication and provides structured storage in multiple formats (Markdown, JSON, Excel, etc.). While the script uses subprocess.run to execute a PowerShell command for moving files to the Windows Recycle Bin, this is a legitimate functional implementation for the stated 'strict deletion policy' and includes basic path escaping. The SKILL.md instructions are well-aligned with the code logic and emphasize user confirmation for destructive actions, showing no signs of malicious intent or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (deep web search + archival) align with the included helper script (URL normalization, SQLite deduplication, file storage) and the declared requirement of python3 and OPENCLAW_WORKSPACE. The script explicitly handles local file and DB operations and does not perform network searches itself, which fits the model where the agent or other 'search' skills perform crawling.
Instruction Scope
SKILL.md instructs the agent to use external search/browser skills to fetch pages and to run the local helper script for normalization, deduplication, and saving. It does not instruct the agent to read arbitrary unrelated files or extra environment variables beyond OPENCLAW_WORKSPACE. Minor issues: inconsistent naming for required skills (e.g., 'tavily' vs 'tavily-search', 'Multi-Search-Engine' vs 'multi-search-engine') and a reliance on other skills being present in workspace/skills; these appear to be sloppy bookkeeping rather than malicious scope creep. Also, the workflow encourages high-volume scraping (e.g., '100+ pages' on Baidu) — a functional concern (rate limits, TOS, IP blocking, legal/ethical risk), not a code/credential mismatch.
Install Mechanism
No install spec is provided (instruction-only skill with one helper script included). That is low-risk: nothing is downloaded from remote URLs and the script will only be written to the agent environment when this skill is installed. The helper script is plain Python, readable, and contains no obfuscated code or hidden remote endpoints.
Credentials
The only declared primary credential is OPENCLAW_WORKSPACE (a workspace path used to store archives and check for other skills). No API keys or unrelated secrets are requested. The workspace access is necessary and proportionate for saving archives and database files.
Persistence & Privilege
always is false (no forced always-on presence). The skill writes files and an SQLite DB under the workspace (expected for an archival tool) but does not request elevated system-wide configuration changes or access to other skills' configs.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install infoseek
  3. After installation, invoke the skill by name or use /infoseek
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
Initial English release: multi-engine deep search, URL deduplication with SQLite, structured archival, multiple output formats
Metadata
Slug infoseek
Version 2.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is InfoSeek?

InfoSeek is an AI agent skill for Claude Code / OpenClaw that performs deep web information search and archival for comprehensive research on persons, organizations, or products. It uses multiple search engines, including Baidu and Tavily, and has 88 downloads so far.

How do I install InfoSeek?

Run "/install infoseek" in the OpenClaw or Claude Code chat to install it in one step. The search and browser skills listed under Prerequisites must be installed separately.

Is InfoSeek free?

Yes, InfoSeek is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does InfoSeek support?

InfoSeek is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created InfoSeek?

It is built and maintained by ExpeditionHub (@expeditionhub); the current version is v2.0.0.
