Doc Scraper

by KikiKari · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
66 Downloads · 0 Stars · 1 Active Installs · 1 Versions
Install in OpenClaw
/install doc-scraper
Description
Documentation extraction and indexing. Extracts information from markdown files and syncs to workspace-db. Works alongside workspace-db which handles synchro...
README (SKILL.md)

Doc Scraper

Documentation extraction and indexing; works together with workspace-db.

Interplay with workspace-db

Skill | Task | Database
workspace-db | Synchronization & organization | docs.db
doc-scraper (this skill) | Information extraction | Uses docs.db

Tasks

1. Markdown extraction

// Extracts from SKILL.md:
// - Name, version, description
// - Usage examples
// - Configuration options

const docInfo = await docScraper.extractMarkdown({
  file: "skills/my-skill/SKILL.md",
  extract: ["title", "description", "usage", "config"]
});
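
The bundle ships no implementation of this call, so the following is only a minimal sketch of what such an extraction might do. It assumes the title is the first `# ` heading, the description is the first paragraph after it, and `usage` and `config` come from sections headed "Usage" and "Configuration". The name `extractFields` and those heading names are hypothetical, not doc-scraper's API. The sketch does not skip fenced code blocks, so a `#` comment inside a code sample would be misread as a heading.

```javascript
// Hypothetical sketch, not the doc-scraper implementation.
// Splits a markdown string into sections keyed by lowercased heading text.
function extractFields(markdown, fields) {
  const sections = {}; // heading text -> body lines
  const intro = [];    // lines between the title and the first subheading
  let title = null;
  let current = null;  // key of the section currently being filled

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.*?)\s*$/.exec(line);
    if (heading) {
      if (heading[1].length === 1 && title === null) {
        title = heading[2];
      } else {
        current = heading[2].toLowerCase();
        sections[current] = [];
      }
      continue;
    }
    if (current !== null) sections[current].push(line);
    else if (title !== null) intro.push(line);
  }

  const available = {
    title,
    description: firstParagraph(intro),
    usage: sectionText(sections, "usage"),
    config: sectionText(sections, "configuration"),
  };
  // Return only the fields the caller asked for.
  const result = {};
  for (const f of fields) if (f in available) result[f] = available[f];
  return result;
}

function firstParagraph(lines) {
  const text = lines.join("\n").trim();
  return text === "" ? null : text.split(/\n\s*\n/)[0].trim();
}

function sectionText(sections, name) {
  return name in sections ? sections[name].join("\n").trim() : null;
}
```

Missing sections come back as `null` rather than throwing, which keeps a batch run going when one file lacks a "Usage" heading.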

2. Indexing into docs.db

// Stores the extracted data in docs.db
// (workspace-db manages the DB)

await docScraper.index({
  source: "skills/my-skill/SKILL.md",
  data: docInfo,
  tags: ["skill", "api"]
});
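
The storage format of docs.db is not documented here, so the sketch below uses an in-memory `Map` as a stand-in to illustrate one plausible behaviour: indexing the same source twice replaces the earlier entry instead of duplicating it. The class `DocIndex` and its `byTag` method are hypothetical.

```javascript
// Hypothetical in-memory stand-in for docs.db; the real schema is not documented.
class DocIndex {
  constructor() {
    this.entries = new Map(); // source path -> { source, data, tags }
  }

  // Upsert: indexing the same source again overwrites the earlier entry.
  // Returns the number of entries currently held.
  index({ source, data, tags = [] }) {
    if (typeof source !== "string" || source === "") {
      throw new TypeError("source must be a non-empty string");
    }
    // Deduplicate tags so repeated tags do not inflate the record.
    this.entries.set(source, { source, data, tags: [...new Set(tags)] });
    return this.entries.size;
  }

  // All entries carrying the given tag.
  byTag(tag) {
    return [...this.entries.values()].filter((e) => e.tags.includes(tag));
  }
}
```

Keying on the source path is a design assumption; it makes re-indexing after a file change idempotent.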

3. Auto-update on changes

# Watches .md files
# Re-extracts on change
# Updates docs.db

doc-scraper watch --dir skills/ --ext .md
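
How the watcher detects changes is not specified. One simple approach, sketched here under that caveat, is to compare two snapshots that map each path to its modification time and re-extract only the files that differ. The function `diffSnapshots` and the snapshot shape are assumptions for illustration; a real watcher in Node.js would more likely build on `fs.watch`.

```javascript
// Hypothetical helper: compares two { path: mtimeMs } snapshots and reports
// which files with the given extension are new/modified or have disappeared.
function diffSnapshots(previous, current, ext = ".md") {
  const changed = [];
  const removed = [];
  for (const [path, mtime] of Object.entries(current)) {
    if (!path.endsWith(ext)) continue;
    if (previous[path] !== mtime) changed.push(path); // new or modified
  }
  for (const path of Object.keys(previous)) {
    if (path.endsWith(ext) && !(path in current)) removed.push(path);
  }
  return { changed: changed.sort(), removed: removed.sort() };
}
```

Files in `changed` would be re-extracted and re-indexed; files in `removed` would presumably be dropped from docs.db.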

Extraction templates

Skill documentation

# Extracted from SKILL.md:
name: "skill-name"
description: "Description"
version: "1.0.0"
category: "database"
usage_examples:
  - command: "openclaw skill"
    result: "..."

API documentation

# Extracted from API.md:
endpoints:
  - path: "/api/v1/search"
    method: "GET"
    params:
      - query: string
    response: json

System documentation

# Extracted from SYSTEM.md:
components:
  - databases:
      - docs.db
      - tree.db
cron_jobs:
  - db-maintainer: "*/30"

Workflow

SKILL.md changed
    ↓
doc-scraper detects the change
    ↓
Extracts: name, desc, usage, config
    ↓
Stores in docs.db
    ↓
workspace-db synchronizes

Usage

One-off

doc-scraper index --dir skills/ --recursive
doc-scraper index --dir docs/ --ext .md

Watch mode

# Watch continuously
doc-scraper watch --dir workspace/

# Single file
doc-scraper watch --file README.md

Search

# Search directly in the extracted data
doc-scraper search --query "database"
doc-scraper search --tag "api" --format json
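
The matching rules behind `search` are not described. As a rough sketch, assuming each indexed record looks like `{ source, tags, data }`, a query could be a case-insensitive substring match over the serialized data, optionally narrowed by tag. `searchRecords` and the record shape are hypothetical.

```javascript
// Hypothetical record shape: { source: string, tags: string[], data: object }.
function searchRecords(records, { query, tag } = {}) {
  const needle = query ? query.toLowerCase() : null;
  return records.filter((rec) => {
    if (tag && !rec.tags.includes(tag)) return false;
    if (needle === null) return true;
    // Case-insensitive substring match over the serialized data.
    return JSON.stringify(rec.data).toLowerCase().includes(needle);
  });
}
```

Passing the result through `JSON.stringify(result, null, 2)` would give output comparable to what `--format json` suggests.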

Integration with workspace-db

// doc-scraper extracts
// workspace-db stores/organizes

const extracted = await docScraper.extract('skills/my/SKILL.md');

// Hand-off to workspace-db
await workspaceDb.syncDocument({
  id: extracted.name,
  category: extracted.category,
  data: extracted,
  source_file: 'skills/my/SKILL.md'
});

Configuration

{
  "doc-scraper": {
    "watch_dirs": ["skills/", "docs/"],
    "extensions": [".md", ".mdx"],
    "extract_headers": ["##", "###"],
    "auto_index": true,
    "workspace_db_integration": true
  }
}
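
How these keys are interpreted is not documented. A plausible reading is that a file is processed only if it lies under one of `watch_dirs` and ends with one of `extensions`. The helper below sketches that check; `isInScope` is a hypothetical name, and rejecting `..` path segments is an added precaution rather than documented behaviour.

```javascript
// Hypothetical helper: decides whether a relative path falls inside the
// configured scope (directory allowlist plus extension allowlist).
function isInScope(filePath, config) {
  const normalized = filePath.replace(/\\/g, "/");
  // Refuse paths that climb out of a directory via "..".
  if (normalized.split("/").includes("..")) return false;
  const inDir = config.watch_dirs.some((dir) => {
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    return normalized.startsWith(prefix);
  });
  const hasExt = config.extensions.some((ext) => normalized.endsWith(ext));
  return inDir && hasExt;
}
```

Comparing against the directory name with a trailing slash keeps `skillset/notes.md` from matching a `skills` entry.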

Usage Guidance
This skill is instruction-only and does not include the doc-scraper implementation it describes. Before installing or enabling it:
  1. Ask the author for the implementation or an install spec (how is doc-scraper provided?).
  2. Verify workspace-db: where docs.db is stored and whether any credentials/network endpoints are used.
  3. Limit scanning scope: do not allow the skill to watch your entire workspace; specify allowed directories only.
  4. Prefer manual invocation until you can review the underlying code; avoid granting always:true or leaving autonomous long-running watchers enabled.
  5. If you must run it, run in a sandboxed/least-privilege environment and audit the files that get indexed to ensure no secrets are captured.
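
The audit in the last point can be partly automated. The sketch below flags lines that look like credentials before a file is indexed; the two patterns are illustrative only and will miss many real secrets, and `findSuspectLines` is a hypothetical helper rather than part of any shipped tool.

```javascript
// Illustrative patterns only; a real audit would need a far broader rule set.
const SECRET_PATTERNS = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+/i,
];

// Returns the 1-based numbers of lines that match any pattern.
function findSuspectLines(text) {
  const hits = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (SECRET_PATTERNS.some((re) => re.test(line))) hits.push(i + 1);
  });
  return hits;
}
```

A file with any hits could be skipped or held for manual review instead of being written to docs.db.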
Capability Analysis
Type: OpenClaw Skill
Name: doc-scraper
Version: 1.0.0

The doc-scraper skill is designed for extracting metadata and indexing documentation from Markdown files into a workspace database. The files (SKILL.md, package.json, and _meta.json) contain functional instructions, usage examples, and configuration templates that align strictly with its stated purpose of documentation management. No indicators of malicious intent, data exfiltration, or prompt injection were found.
Capability Assessment
Purpose & Capability
SKILL.md describes a CLI (doc-scraper) and JS API (docScraper.extract*/workspaceDb.sync*) and shows commands like doc-scraper watch --dir, but the skill bundle has no install spec, no binaries, and no code implementing those commands. That mismatch (declared purpose vs. provided artifacts) is incoherent: either the author assumed external tools exist on the host or omitted the implementation.
Instruction Scope
The instructions call for recursively scanning and watching directories (skills/, docs/, workspace/) and indexing arbitrary markdown files into docs.db. While consistent with a documentation indexer, this scope can capture sensitive files (README, config snippets, credentials if present in docs), and the SKILL.md gives the agent discretion to extract various headers and configs without constraints. No limits or filters are described.
Install Mechanism
No install spec is provided (instruction-only), which is lowest-risk from a supply-chain perspective—but it's unexpected given the CLI/API usage in the instructions. If the agent tries to run the named CLI and it does not exist, behavior will depend on the agent's error-handling; if the CLI does exist in the environment (from elsewhere), the skill will call it without presenting its code here.
Credentials
The skill declares no required environment variables or credentials (which is reasonable), but it explicitly integrates with workspace-db and writes to docs.db. That integration may implicitly require workspace-db credentials or network access (not declared). The absence of declared credentials makes it unclear how workspace sync is authenticated—this is an information gap worth clarifying.
Persistence & Privilege
always is false (good). However, the SKILL.md encourages long-running watch processes (doc-scraper watch) which would give the agent persistent access to file-system changes while running. Autonomous invocation is allowed by default; combined with watch-style behavior this increases blast radius if the agent is permitted to run watchers without restrictions.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-scraper
  3. After installation, invoke the skill by name or use /doc-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of doc-scraper for documentation extraction and indexing.
- Extracts structured information from markdown files, including SKILL.md, API.md, and SYSTEM.md.
- Indexes extracted data into docs.db and integrates with workspace-db for synchronization and organization.
- Supports auto-update: watches for markdown file changes and updates docs.db automatically.
- Provides CLI commands for indexing, watching, and searching documentation.
- Offers flexible configuration options for extraction and integration.
Metadata
Slug doc-scraper
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Doc Scraper?

Documentation extraction and indexing. Extracts information from markdown files and syncs to workspace-db. Works alongside workspace-db which handles synchro... It is an AI Agent Skill for Claude Code / OpenClaw, with 66 downloads so far.

How do I install Doc Scraper?

Run "/install doc-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Doc Scraper free?

Yes, Doc Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Doc Scraper support?

Doc Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Doc Scraper?

It is built and maintained by KikiKari (@kikikari); the current version is v1.0.0.
