Description

Build and maintain a personal Obsidian-based knowledge base from articles, papers, repositories, datasets, spreadsheets, and local files. Use when the user w...

README (SKILL.md)

Duru Obsidian KB

Name: Duru Obsidian KB
Author: durugy

Overview

Use this skill to operate a local knowledge-base workflow inspired by "raw → compiled wiki → outputs". Keep the system markdown-first, Obsidian-friendly, incremental, and auditable. Prefer producing files in the knowledge base over chat-only answers when the user is doing research or building long-lived notes.

Current scope

Implement and use the current workflow:

Ingest source material into raw/
Normalize metadata into a manifest
Extract web article text when available
Download and ingest papers, including arXiv URLs and PDF-backed sources, when possible
Route incoming content to the most appropriate configured KB repository using rules-first routing with a default fallback, and optionally a local model hook for low-confidence cases
Clone and summarize repositories when requested or when the source is a GitHub repo
Build or refresh source pages, concept pages, backlinks, and indexes in wiki/
Generate output documents in outputs/
Run lightweight lint checks over structure and metadata

Treat the system as an incremental scaffold with growing intelligence. Do not claim high-quality extraction or synthesis when the data is incomplete.

Directory layout

Use this layout inside the chosen knowledge-base root:

\x3Ckb-root>/
  raw/
    articles/
    papers/
    repos/
    files/
  assets/
  wiki/
    concepts/
    sources/
    indexes/
    _meta/
  outputs/
  logs/
  manifest.json
  config.json

If the user does not specify a KB root, default to a folder inside the workspace such as knowledge-bases/\x3Cname>. Keep generated files deterministic and easy to diff.

Workflow

1. Initialize a knowledge base

When starting a new KB:

Create the standard directory tree
Create config.json if missing
Create manifest.json if missing
Create starter index files in wiki/indexes/
Keep all paths relative to the KB root where practical

Use scripts/kb_init.py for initialization.

Example:

python3 scripts/kb_init.py --root /path/to/kb

2. Ingest content

When the user shares a URL or local file:

Detect whether it is a web article, PDF, repo URL, or local file
Copy or record the source into the correct raw/ subdirectory
Generate a slug
Append an entry to manifest.json
Preserve the original source URL/path in metadata
Prefer storing content as markdown when already available; otherwise store a stub entry with metadata and acquisition details

Prefer scripts/kb_add.py as the unified entrypoint when multiple KB repositories are configured. It performs route → ingest → build → summarize. Use scripts/kb_route.py separately when you want a route-only decision or explanation before ingestion.

Example:

python3 scripts/kb_ingest.py --root /path/to/kb --source "https://example.com/article"
python3 scripts/kb_ingest.py --root /path/to/kb --source "/path/to/local.pdf" --type paper

If the source is a normal web article, attempt deterministic extraction first and store the extracted markdown in raw/articles/. Run prompt-shield-lite over extracted text before trusting it. Use prompt-shield-lite as a security/injection scan, then use local heuristics for segment-level noise detection so the pipeline does not over-trigger on rate limits. If extraction fails, is partial, or is flagged as suspicious, record that clearly and preserve suspicious segments for review instead of hallucinating content. If the source is an arXiv abstract URL, normalize it to the PDF URL, try to pull title/abstract metadata from the abs page, download the PDF, and extract a text preview with the available local tools. If the source is a direct PDF or local PDF file, route it through the KB paper path and record both the preferred processor (vendor-anthropic/pdf) and the current fallback used locally. If the source is a local spreadsheet (.xlsx, .xlsm, .csv, .tsv), route it through the KB spreadsheet path and record both the preferred processor (vendor-anthropic/xlsx) and the current fallback used locally. If the source is a GitHub repo, clone it into raw/repos/\x3Cslug>/repo/ when possible and generate a repository summary stub.

3. Build the wiki

When the user asks to build or refresh the KB:

Read manifest.json
Create or refresh wiki/sources/\x3Cslug>.md pages from manifest entries
Create or refresh wiki/concepts/\x3Ctag>.md pages from tags
Add backlinks from concept pages to source pages and from source pages to concept pages
Refresh wiki/indexes/sources.md
Refresh wiki/indexes/tags.md
Refresh wiki/indexes/timeline.md
Refresh wiki/indexes/concepts.md
Keep generation incremental and idempotent

Use scripts/kb_build.py.

The build step should summarize known metadata, link raw items, and create deterministic concept pages from existing tags. Do not fabricate deep conceptual synthesis. If a source lacks extracted body text, say so explicitly in the generated page.

4. Ask questions and generate outputs

When the user asks a question against the KB:

Prefer creating a markdown deliverable in outputs/
Base the answer on manifest entries and existing wiki pages
Cite source pages by relative path when possible
Distinguish confirmed facts from hypotheses or gaps

Use scripts/kb_ask.py to generate a structured prompt/output scaffold for the agent.

In Phase 1, this script prepares a research brief shell from the current KB state. The agent may then refine it.

5. Lint the KB

When the user wants a health check:

Validate required directories and files
Detect manifest entries with missing fields
Detect source pages missing from wiki/sources/
Detect wiki pages whose manifest entry no longer exists
Detect duplicate slugs

Use scripts/kb_lint.py.

Output rules

Prefer these output forms:

Research brief: outputs/YYYY-MM-DD-topic.md
Marp deck draft: outputs/YYYY-MM-DD-topic.marp.md
Topic memo: wiki/concepts/\x3Ctopic>.md only when the user wants the result filed back into the KB

Keep frontmatter lightweight. Recommended fields:

---
title: ...
slug: ...
source_type: article|paper|repo|file
source_url: ...
ingested_at: ...
tags: []
status: raw|stub|indexed|reviewed
---

Safety and quality

Never silently invent extracted content. If retrieval failed, store a stub and mark it clearly. If prompt-shield-lite flags extracted text or the cleaner detects suspicious/noisy segments, mark the source as suspicious and keep the flagged snippets visible for review. Prefer small deterministic updates over large rewrites. When improving generated wiki pages, preserve provenance and links back to raw sources. Keep the KB usable in Obsidian without requiring proprietary tooling.

Resources

scripts/

Use these scripts as the current backbone:

kb_init.py — create folder structure and starter files
kb_add.py — unified add flow: route to the best KB repo, ingest content, then optionally build and summarize
kb_route.py — route content to the best KB repo using rules-first scoring with default fallback and optional local-model hook
kb_ingest.py — register URLs or local files into a chosen KB root, attempt article extraction, ingest arXiv/PDF papers, ingest spreadsheets, and support repo ingest
kb_build.py — generate source pages, concept pages, backlinks, and indexes from the manifest
kb_summarize_concepts.py — generate first-pass topic memo scaffolds for concept pages
kb_ask.py — generate output scaffolds for research questions
kb_search.py — run lightweight local relevance search over manifest/wiki sources and return scored snippets
kb_chart.py — generate chart artifacts (png + markdown note) from CSV/XLSX data into outputs/ and optionally file back to concept pages
kb_lint.py — run structural checks and emit a report
kb_healthcheck.py — run lint checks across configured KB repositories from repos.json
kb_smoke.py — run end-to-end smoke tests (init/ingest/build/search/ask/lint/chart) in a temporary KB

references/

Read these references when refining the skill:

references/layout.md — canonical folder layout and file contracts
references/phase-plan.md — roadmap from MVP to richer wiki compilation
workspace knowledge-bases/config/repos.json — multi-repo routing configuration with default fallback and optional local-model hook

Usage Guidance

This skill appears to be what it claims: a local, markdown-first KB tool. Before installing or running it, consider the following: - Run in a dedicated workspace (set OPENCLAW_WORKSPACE to a non-sensitive directory) so downloaded files and cloned repos don't mix with other files. - Inspect and vet optional external dependencies the README suggests (prompt-shield-lite repo and the 'uv' workflow). The skill will call a prompt-shield script if present; that script comes from a separate GitHub repo and should be audited before use. - The ingest pipeline fetches URLs, downloads PDFs/binaries, and may clone GitHub repositories. These actions are normal for this use case but can pull untrusted content onto disk. Do not point the KB root at locations that contain secrets or system configs. - The scripts use subprocess.run and will execute other local scripts (e.g., vendor processors if present). Avoid running as a privileged user and avoid auto-running code from ingested repositories. - There are no required API keys or credentials, but the KB config file path (KB_CONFIG_PATH) controls which repos are managed; review that config before running healthchecks or bulk ingest. If you want greater assurance, run the scripts manually in a sandbox: create an empty KB root, add one test source, and inspect the created manifest/wiki/output files before using it against real data.

Capability Analysis

Type: OpenClaw Skill Name: duru-obsidian-kb Version: 0.1.1 The skill bundle implements a sophisticated Obsidian-based knowledge management system with capabilities for web ingestion, PDF processing, and GitHub repository cloning in `kb_ingest.py`. It utilizes `subprocess.run` to interface with local tools like `git` and `ollama` (`kb_route.py`), and performs arbitrary network requests to fetch user-supplied content. Although the code demonstrates a strong security posture by integrating a 'Prompt Shield' to scan for injection attacks in ingested data and provides transparent logging of 'suspicious segments,' the inherent risks associated with automated repository cloning and arbitrary network access classify it as suspicious under the provided criteria.

Capability Assessment

✓ Purpose & Capability

Name/description match the code and SKILL.md. Scripts implement ingest → build → ask → outputs workflows that are appropriate for an Obsidian-style local KB. References to PDF/spreadsheet processors and a local prompt-shield scanner are consistent with extracting and vetting remote content.

ℹ Instruction Scope

Instructions and scripts download web pages, fetch binaries (PDFs), clone repos and write files under a configured KB root — which is expected for ingestion. The skill also runs a local prompt-shield script to flag suspicious text. This behavior is within scope, but it does perform network I/O and writes to local filesystem paths derived from OPENCLAW_WORKSPACE / repo config, so run in a controlled workspace.

✓ Install Mechanism

There is no install spec (instruction-only skill) so nothing is automatically downloaded by the platform. The README recommends installing external utilities (prompt-shield-lite and 'uv' / a venv) from external sources; those are optional instructions and not automatically run by the skill, but they should be reviewed before installing.

✓ Credentials

The skill declares no required environment variables or credentials in the registry. The code uses environment-derived defaults (OPENCLAW_WORKSPACE, KB_CONFIG_PATH, PROMPT_SHIELD_SCRIPT, KB_VENV_PYTHON) for locating configuration and optional helpers. These are reasonable for a local KB; no API keys or secrets are requested.

✓ Persistence & Privilege

always:false and user-invocable:true. The skill does not request elevated system-wide privileges and does not modify other skills' configs. It can be invoked autonomously (platform default), which is expected; nothing in the repo suggests it needs permanent, system-level presence.

Version History

v0.1.1

Add env-driven runtime config, document prompt-shield-lite dependency, and reduce hardcoded path assumptions.

v0.1.0

Initial public release: ingest/build/search/ask/lint/chart, daily ops, uv setup, MIT license

Metadata

Slug duru-obsidian-kb

Version 0.1.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Duru Obsidian KB?

Build and maintain a personal Obsidian-based knowledge base from articles, papers, repositories, datasets, spreadsheets, and local files. Use when the user w... It is an AI Agent Skill for Claude Code / OpenClaw, with 92 downloads so far.

How do I install Duru Obsidian KB?

Run "/install duru-obsidian-kb" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Duru Obsidian KB free?

Yes, Duru Obsidian KB is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Duru Obsidian KB support?

Duru Obsidian KB is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Duru Obsidian KB?

It is built and maintained by Duru (@durugy); the current version is v0.1.1.

More Skills

Duru Obsidian KB

Duru Obsidian KB

Overview

Current scope

Directory layout

Workflow

1. Initialize a knowledge base

2. Ingest content

3. Build the wiki

4. Ask questions and generate outputs

5. Lint the KB

Output rules

Safety and quality

Resources

scripts/

references/

What is Duru Obsidian KB?

How do I install Duru Obsidian KB?

Is Duru Obsidian KB free?

Which platforms does Duru Obsidian KB support?

Who created Duru Obsidian KB?

💬 Comments