← 返回 Skills 市场

Doc Scraper

Name: Doc Scraper
Author: kikikari

作者 KikiKari · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

总下载

当前安装

版本数

在 OpenClaw 中安装

/install doc-scraper

功能描述

Documentation extraction and indexing. Extracts information from markdown files and syncs to workspace-db. Works alongside workspace-db which handles synchro...

使用说明 (SKILL.md)

Doc Scraper

Dokumentations-Extraktion und Indexierung - arbeitet mit workspace-db zusammen.

Zusammenspiel mit workspace-db

Skill	Aufgabe	Datenbank
workspace-db	Synchronisation & Organisation	docs.db
doc-scraper (dieser)	Informationsextraktion	Nutzt docs.db

Aufgaben

1. Markdown-Extraktion

// Extrahiert aus SKILL.md:
// - Name, Version, Beschreibung
// - Nutzungsbeispiele
// - Konfigurationsoptionen

const docInfo = await docScraper.extractMarkdown({
  file: "skills/my-skill/SKILL.md",
  extract: ["title", "description", "usage", "config"]
});

2. Indexierung in docs.db

// Speichert extrahierte Daten in docs.db
// (workspace-db verwaltet die DB)

await docScraper.index({
  source: "skills/my-skill/SKILL.md",
  data: docInfo,
  tags: ["skill", "api"]
});

3. Auto-Update bei Änderungen

# Überwacht .md Dateien
# Extrahiert bei Änderung neu
# Aktualisiert docs.db

doc-scraper watch --dir skills/ --ext .md

Extraktions-Templates

Skill-Dokumentation

# Aus SKILL.md extrahiert:
name: "skill-name"
description: "Beschreibung"
version: "1.0.0"
category: "database"
usage_examples:
  - command: "openclaw skill"
    result: "..."

API-Dokumentation

# Aus API.md extrahiert:
endpoints:
  - path: "/api/v1/search"
    method: "GET"
    params:
      - query: string
    response: json

System-Dokumentation

# Aus SYSTEM.md extrahiert:
components:
  - databases:
      - docs.db
      - tree.db
cron_jobs:
  - db-maintainer: "*/30"

Workflow

skill.md geändert
    ↓
doc-scraper erkennt Änderung
    ↓
Extrahiert: name, desc, usage, config
    ↓
Speichert in docs.db
    ↓
workspace-db synchronisiert

Nutzung

Einmalig

doc-scraper index --dir skills/ --recursive
doc-scraper index --dir docs/ --ext .md

Watch-Modus

# Kontinuierlich überwachen
doc-scraper watch --dir workspace/

# Einzelne Datei
doc-scraper watch --file README.md

Suche

# Direkt in extrahierten Daten suchen
doc-scraper search --query "database"
doc-scraper search --tag "api" --format json

Integration mit workspace-db

// doc-scraper extrahiert
// workspace-db speichert/organisiert

const extracted = await docScraper.extract('skills/my/SKILL.md');

// Übergabe an workspace-db
await workspaceDb.syncDocument({
  id: extracted.name,
  category: extracted.category,
  data: extracted,
  source_file: 'skills/my/SKILL.md'
});

Konfiguration

{
  "doc-scraper": {
    "watch_dirs": ["skills/", "docs/"],
    "extensions": [".md", ".mdx"],
    "extract_headers": ["##", "###"],
    "auto_index": true,
    "workspace_db_integration": true
  }
}

Links

安全使用建议

This skill is instruction-only and does not include the doc-scraper implementation it describes. Before installing or enabling it: 1) Ask the author for the implementation or an install spec (how is doc-scraper provided?). 2) Verify workspace-db: where docs.db is stored and whether any credentials/network endpoints are used. 3) Limit scanning scope—do not allow the skill to watch your entire workspace; specify allowed directories only. 4) Prefer manual invocation until you can review the underlying code; avoid granting always:true or leaving autonomous long-running watchers enabled. 5) If you must run it, run in a sandboxed/least-privilege environment and audit the files that get indexed to ensure no secrets are captured.

功能分析

Type: OpenClaw Skill Name: doc-scraper Version: 1.0.0 The doc-scraper skill is designed for extracting metadata and indexing documentation from Markdown files into a workspace database. The files (SKILL.md, package.json, and _meta.json) contain functional instructions, usage examples, and configuration templates that align strictly with its stated purpose of documentation management. No indicators of malicious intent, data exfiltration, or prompt injection were found.

能力评估

⚠ Purpose & Capability

SKILL.md describes a CLI (doc-scraper) and JS API (docScraper.extract*/workspaceDb.sync*) and shows commands like doc-scraper watch --dir, but the skill bundle has no install spec, no binaries, and no code implementing those commands. That mismatch (declared purpose vs. provided artifacts) is incoherent: either the author assumed external tools exist on the host or omitted the implementation.

⚠ Instruction Scope

Instructions instruct recursively scanning and watching directories (skills/, docs/, workspace/) and indexing arbitrary markdown files into docs.db. While consistent with a documentation indexer, this scope can capture sensitive files (README, config snippets, credentials if present in docs) and the SKILL.md gives the agent discretion to extract various headers and configs without constraints. There are no limits or filters described.

ℹ Install Mechanism

No install spec is provided (instruction-only), which is lowest-risk from a supply-chain perspective—but it's unexpected given the CLI/API usage in the instructions. If the agent tries to run the named CLI and it does not exist, behavior will depend on the agent's error-handling; if the CLI does exist in the environment (from elsewhere), the skill will call it without presenting its code here.

ℹ Credentials

The skill declares no required environment variables or credentials (which is reasonable), but it explicitly integrates with workspace-db and writes to docs.db. That integration may implicitly require workspace-db credentials or network access (not declared). The absence of declared credentials makes it unclear how workspace sync is authenticated—this is an information gap worth clarifying.

ℹ Persistence & Privilege

always is false (good). However, the SKILL.md encourages long-running watch processes (doc-scraper watch) which would give the agent persistent access to file-system changes while running. Autonomous invocation is allowed by default; combined with watch-style behavior this increases blast radius if the agent is permitted to run watchers without restrictions.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install doc-scraper
安装完成后，直接呼叫该 Skill 的名称或使用 /doc-scraper 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

- Initial release of doc-scraper for documentation extraction and indexing. - Extracts structured information from markdown files, including SKILL.md, API.md, and SYSTEM.md. - Indexes extracted data into docs.db and integrates with workspace-db for synchronization and organization. - Supports auto-update: watches for markdown file changes and updates docs.db automatically. - Provides CLI commands for indexing, watching, and searching documentation. - Offers flexible configuration options for extraction and integration.

元数据

Slug doc-scraper

版本 1.0.0

许可证 MIT-0

累计安装 1

当前安装数 1

历史版本数 1

常见问题

Doc Scraper 是什么？

Documentation extraction and indexing. Extracts information from markdown files and syncs to workspace-db. Works alongside workspace-db which handles synchro... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 66 次。

如何安装 Doc Scraper？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install doc-scraper」即可一键安装，无需额外配置。

Doc Scraper 是免费的吗？

是的，Doc Scraper 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Doc Scraper 支持哪些平台？

Doc Scraper 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Doc Scraper？

由 KikiKari（@kikikari）开发并维护，当前版本 v1.0.0。

Doc Scraper

Doc Scraper

Zusammenspiel mit workspace-db

Aufgaben

1. Markdown-Extraktion

2. Indexierung in docs.db

3. Auto-Update bei Änderungen

Extraktions-Templates

Skill-Dokumentation

API-Dokumentation

System-Dokumentation

Workflow

Nutzung

Einmalig

Watch-Modus

Suche

Integration mit workspace-db

Konfiguration

Links

Doc Scraper 是什么？

如何安装 Doc Scraper？

Doc Scraper 是免费的吗？

Doc Scraper 支持哪些平台？

谁开发了 Doc Scraper？

💬 留言讨论