Description

LLM-powered personal knowledge base. Raw documents in, an LLM compiles them into a structured interlinked wiki with trilingual articles, emergent taxonomy, a...

README (SKILL.md)

llmwiki

Name: Llmwiki
Author: hosuke

A personal knowledge base that an LLM compiles, not just stores. Raw documents go in, an LLM writes trilingual (EN / 中文 / 日本語) wiki articles with [[wiki-links]], backlinks, and an emergent taxonomy. The MCP server dispatches every tool through llmwiki/operations.py; the CLI exposes the same registry via llmbase ops call; individual HTTP/CLI wrappers are being migrated onto the registry over time.

PyPI: pip install llmwiki
CLI command: llmbase (the package name and the command differ)
GitHub: https://github.com/Hosuke/llmbase
Demo: https://huazangge-production.up.railway.app

Setup

pip install llmwiki

mkdir my-kb && cd my-kb

cat > .env << 'EOF'
LLMBASE_API_KEY=sk-your-key
LLMBASE_BASE_URL=https://your-endpoint/v1
LLMBASE_MODEL=your-model
# Optional: LLMBASE_FALLBACK_MODELS=backup-1,backup-2
EOF

cat > config.yaml << 'EOF'
llm:
  max_tokens: 16384
paths:
  raw: "./raw"
  wiki: "./wiki"
EOF

Commands

Command	Description
`llmbase ingest url <url>`	Ingest a web article
`llmbase ingest pdf <file>`	Ingest a PDF (auto-chunks)
`llmbase ingest file <file>`	Ingest any local file
`llmbase ingest dir <dir>`	Ingest all files from a directory
`llmbase ingest cbeta-learn --batch 10`	Corpus plugin: Buddhist canon
`llmbase ingest ctext-book 论语 /analects/zh`	Corpus plugin: Chinese classics
`llmbase compile new`	Compile new raw docs incrementally (3-layer dedup)
`llmbase compile all`	Full rebuild
`llmbase compile index`	Rebuild index + aliases
`llmbase query "<q>"`	Ask a question (single-pass; add `--deep` for multi-step research)
`llmbase query "<q>" --tone wenyan`	📜 classical Chinese voice
`llmbase query "<q>" --tone scholar`	🎓 academic voice
`llmbase query "<q>" --tone eli5`	👶 simple voice
`llmbase query "<q>" --tone caveman`	🦴 primitive voice
`llmbase query "<q>" --file-back`	File answer back into the wiki
`llmbase lint check`	8-category structural health check
`llmbase lint heal`	Check → fix → re-check → report
`llmbase lint deep`	LLM deep quality analysis
`llmbase web`	Web UI at :5555
`llmbase serve`	Agent HTTP API at :5556
`llmbase mcp`	Start MCP server (stdio)
`llmbase stats`	KB statistics

MCP Integration (for AI clients)

{
  "mcpServers": {
    "llmwiki": {
      "command": "python",
      "args": ["-m", "llmwiki", "--base-dir", "/path/to/my-kb"]
    }
  }
}
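If the KB directory differs from the client's working directory, an absolute --base-dir removes any ambiguity about which KB the server opens (an assumption about how MCP clients launch the process, not something SKILL.md states). A minimal Python sketch that emits the same JSON for any directory; the helper name mcp_config is hypothetical and not part of llmwiki:

```python
import json
from pathlib import Path

def mcp_config(base_dir: str, server_name: str = "llmwiki") -> dict:
    """Return the mcpServers block shown above with an absolute --base-dir."""
    kb_dir = Path(base_dir).expanduser().resolve()
    return {
        "mcpServers": {
            server_name: {
                "command": "python",
                "args": ["-m", "llmwiki", "--base-dir", str(kb_dir)],
            }
        }
    }

print(json.dumps(mcp_config("~/my-kb"), indent=2))
```

If llmwiki is installed in a virtualenv, "command" may need to point at that environment's interpreter (for example the value of sys.executable inside it) rather than a bare "python".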

Tools exposed by the MCP server:

Tool	Purpose
`kb_search`	Full-text search over compiled concepts
`kb_search_raw`	Verbatim full-text fallback over raw/ sources (v0.6.2+)
`kb_ask`	Deep-research Q&A with tone modes
`kb_get`	Get article by slug or alias (`空`, `kong`, `emptiness` all work)
`kb_list`	List articles, filter by tag
`kb_backlinks`	Find articles citing a given article
`kb_taxonomy`	Multilingual category tree
`kb_stats`	Article count, word count
`kb_xici`	Guided reading (导读)
`kb_ingest`	Ingest a URL
`kb_compile`	Compile raw → wiki
`kb_lint`	Health check / auto-fix
`kb_export` / `kb_export_article` / `kb_export_tag` / `kb_export_graph`	Structured export for downstream projects

All tools are declared in llmwiki/operations.py — downstream projects register custom ops via operations.register(...) and they become available on CLI + MCP automatically.
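The registry idea can be illustrated with a toy dispatcher: one table of named handlers, with every front end calling through the same call path. This is a sketch of the pattern only; the Registry class, its decorator-style register, and the kb_word_count op are hypothetical and do not reproduce the actual signature of llmwiki's operations.register(...).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Operation:
    name: str
    handler: Callable[..., Any]
    description: str = ""

@dataclass
class Registry:
    ops: Dict[str, Operation] = field(default_factory=dict)

    def register(self, name: str, description: str = ""):
        """Decorator that records a handler under a unique name."""
        def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
            if name in self.ops:
                raise ValueError(f"operation {name!r} already registered")
            self.ops[name] = Operation(name, fn, description)
            return fn
        return wrap

    def call(self, name: str, **kwargs: Any) -> Any:
        """Single dispatch path that a CLI, MCP, or HTTP front end would share."""
        if name not in self.ops:
            raise KeyError(f"unknown operation {name!r}")
        return self.ops[name].handler(**kwargs)

registry = Registry()

@registry.register("kb_word_count", "Count words in a text")  # hypothetical op
def kb_word_count(text: str) -> int:
    return len(text.split())

print(registry.call("kb_word_count", text="raw documents in, wiki out"))  # → 5
```

Because front ends only know op names, adding an op to the table is enough to make it reachable everywhere, which is the property the paragraph above describes.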

Agents mounted on this server can answer from compiled concepts, fall back to raw sources with kb_search_raw when compilation glossed over a detail, ingest new material mid-session, and trigger healing.

Workflows

Build a KB from scratch

llmbase ingest url https://example.com/topic
llmbase ingest pdf ./paper.pdf
llmbase compile new
llmbase query "What are the key concepts?"
llmbase lint heal

Autonomous mode (deploy once, server keeps learning)

# config.yaml
worker:
  enabled: true
  learn_source: cbeta         # built-in: cbeta | wikisource | both; custom via register_learn_source()
  learn_interval_hours: 6
  compile_interval_hours: 1
  health_check_interval_hours: 24

health:
  auto_fix_broken_links: true
  max_stubs_per_run: 10

The worker starts under the production WSGI entrypoint (wsgi.py → start_worker_thread). Deploy with gunicorn wsgi:app; llmbase web alone does not self-start the worker.
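The three *_interval_hours keys suggest a simple "is it due yet?" check per task. The sketch below is a hypothetical scheduler using the interval values from the config above; it is not the worker's actual loop, and the task names learn, compile, and health_check are assumptions derived from the key names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Values copied from the worker: block above; the scheduling logic is hypothetical.
WORKER = {
    "learn_interval_hours": 6,
    "compile_interval_hours": 1,
    "health_check_interval_hours": 24,
}
INTERVALS = {
    task: timedelta(hours=WORKER[f"{task}_interval_hours"])
    for task in ("learn", "compile", "health_check")
}

def due_tasks(last_run: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Return the tasks whose interval has elapsed since their last run (or that never ran)."""
    due = []
    for task, interval in INTERVALS.items():
        previous = last_run.get(task)
        if previous is None or now - previous >= interval:
            due.append(task)
    return due

now = datetime(2026, 1, 1, 12, 0)
print(due_tasks({"learn": now - timedelta(hours=2),
                 "compile": now - timedelta(hours=1),
                 "health_check": None}, now))
# → ['compile', 'health_check']
```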

Daily use as agent memory

Agent receives a task → calls kb_search for relevant concepts
If the compiled answer is too abstract → calls kb_search_raw for verbatim detail
Learns something new → calls kb_ingest with the URL
Optionally kb_compile to fold it into concepts for next session
Periodically kb_lint heals the graph
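The first two steps amount to a two-layer lookup. A minimal sketch of that escalation, assuming in-memory dicts and naive substring matching as stand-ins for kb_search and kb_search_raw; it falls back only when the concept layer returns nothing, a simpler trigger than "too abstract", which is a judgment the agent makes.

```python
from typing import Dict, List, Tuple

def search(corpus: Dict[str, str], query: str) -> List[str]:
    """Naive full-text match: ids whose text contains every query term."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in corpus.items()
            if all(term in text.lower() for term in terms)]

def recall(concepts: Dict[str, str], raw: Dict[str, str], query: str) -> Tuple[str, List[str]]:
    """Search compiled concepts first; fall back to verbatim raw sources on a miss."""
    hits = search(concepts, query)
    if hits:
        return "concepts", hits
    return "raw", search(raw, query)

concepts = {"emptiness": "Emptiness (sunyata): the absence of inherent existence."}
raw = {"raw/heart-sutra.md": "Form is emptiness, emptiness is form."}
print(recall(concepts, raw, "inherent existence"))  # → ('concepts', ['emptiness'])
print(recall(concepts, raw, "form is emptiness"))   # → ('raw', ['raw/heart-sutra.md'])
```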

Key Concepts

Synthesis, not archiving — LLM reads raw material and writes composed articles; storage is the cheap part
Two-layer recall — kb_search (concepts) + kb_search_raw (verbatim raw sources)
Trilingual default — every article has EN / 中文 / 日本語 sections
Additive evolution (叠加进化) — new data merges into existing concepts and never overwrites them
Domain-agnostic — taxonomy emerges per-domain, nothing hardcoded
Self-healing — 7-step auto-fix pipeline repairs drift
Alias resolution — [[参禅]] → can-chan.md across scripts (characters, pinyin, English) and simplified/traditional forms
Registry-backed ops — MCP dispatches every tool through operations.py; CLI exposes the same registry via llmbase ops list / llmbase ops call; direct HTTP/CLI wrappers are being migrated onto the registry
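Alias resolution can be pictured as normalizing a link target and looking it up in an alias table. A minimal sketch under stated assumptions: the ALIASES table and the kong slug are hypothetical, and simplified/traditional pairs are listed explicitly because Unicode normalization does not convert between them.

```python
import re
import unicodedata
from typing import Dict, Optional

# Hypothetical alias table; simplified and traditional forms need explicit
# entries (or a converter such as OpenCC), since NFKC does not map one to the other.
ALIASES: Dict[str, str] = {
    "参禅": "can-chan",
    "參禪": "can-chan",
    "can chan": "can-chan",
    "空": "kong",
    "kong": "kong",
    "emptiness": "kong",
}

def normalize(term: str) -> str:
    """NFKC-normalize, lowercase, and treat spaces, hyphens, underscores alike."""
    term = unicodedata.normalize("NFKC", term).strip().lower()
    return re.sub(r"[\s_-]+", " ", term)

_LOOKUP = {normalize(alias): slug for alias, slug in ALIASES.items()}

def resolve(link: str) -> Optional[str]:
    """Map a [[wiki-link]] target to an article filename, or None if unknown."""
    slug = _LOOKUP.get(normalize(link.strip("[]")))
    return f"{slug}.md" if slug else None
```

For example, resolve("[[參禪]]") and resolve("Can-Chan") both return "can-chan.md".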

Tips

--file-back saves Q&A answers into the wiki so future queries benefit
--tone wenyan for Chinese users (classical Chinese responses)
Run llmbase lint heal after large ingestion batches
Web UI /health has buttons for every repair op
Knowledge graph at /graph — density slider for large KBs
Timeline at /explore — requires entities: { enabled: true } in config

Security & Privacy

All data stays local — wiki files are plain markdown on your filesystem
LLM API key — user-supplied, loaded from .env
Network access — user-initiated (URL ingest, SSRF-protected) plus corpus plugins (cbeta-learn, wikisource-learn, ctext-book) and the autonomous worker when enabled
Web server — optional; binds 0.0.0.0, so it is reachable from the LAN by default. Put it behind a reverse proxy or override the bind address before exposing it publicly
API secret — cloud deployments (those with a PORT environment variable set) gate most mutating endpoints behind LLMBASE_API_SECRET, which is auto-generated if unset. Note: /api/ask is open by default and writes Q&A back into the wiki via file_back; only promotion to concepts requires the secret
Autonomous worker — opt-in via config, disabled by default
No telemetry — nothing is sent anywhere except the configured LLM API
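The API-secret rule can be sketched as two small checks: decide whether a gate applies, then compare the presented secret in constant time. This is a hypothetical illustration; the header name, where the server stores an auto-generated secret, and the exact PORT detection are not documented here and are assumptions.

```python
import hmac
import secrets
from typing import Mapping, Optional

def api_secret(env: Mapping[str, str]) -> Optional[str]:
    """Secret for mutating endpoints, or None when no gate applies.

    Assumption from the note above: a set PORT variable marks a cloud deployment.
    """
    if "PORT" not in env:
        return None
    # Auto-generate when unset; a real server would need to log or persist it.
    return env.get("LLMBASE_API_SECRET") or secrets.token_urlsafe(32)

def authorized(secret: Optional[str], presented: Optional[str]) -> bool:
    """Constant-time check so response timing does not leak the secret."""
    if secret is None:
        return True
    if presented is None:
        return False
    return hmac.compare_digest(secret.encode("utf-8"), presented.encode("utf-8"))
```

In use, api_secret(os.environ) would be computed once at startup and authorized(secret, value_from_request) checked on each mutating request.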

Usage Guidance

This skill appears to be what it claims (a local, LLM-backed personal wiki), but check two things before installing:
1) The SKILL.md asks you to 'pip install llmwiki' and to provide an LLM API key plus an optional base URL/model. Verify that the PyPI package matches the GitHub repo (review both the repo and the PyPI page) so you know what code will run.
2) The registry metadata did not list these env vars or install steps even though SKILL.md does. Treat that as an inconsistency: prefer the SKILL.md, but verify sources.

Operational cautions: run the package in a sandboxed environment or VM if possible; don't enable the autonomous worker or public web UI until you have confirmed the configuration (bind to localhost, require auth); avoid ingesting sensitive local files unless you understand where the wiki stores and transmits data; and restrict the LLM API key's scope and usage and monitor network activity.

If you want to proceed, audit the GitHub repo (especially any setup/operations scripts) or run tests in a controlled environment first.

Capability Analysis

Type: OpenClaw Skill
Name: llmwiki
Version: 0.8.0

The llmwiki skill is a personal knowledge base tool that uses LLMs to compile documents into a structured wiki. It includes features for ingesting web content, PDFs, and specific historical corpora (CBETA, Wikisource). While it requests network and filesystem permissions, these are consistent with its stated functionality of fetching content and storing markdown files locally. The documentation in SKILL.md includes security notes regarding SSRF protection and web server exposure, and there are no indicators of data exfiltration, unauthorized execution, or malicious prompt injection.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The SKILL.md describes an LLM-powered personal KB (ingest, compile, query, web UI, MCP). The declared requirements in SKILL.md (LLM API key, optional base URL/model, network/filesystem/server permissions) are coherent with that purpose. However, the registry metadata above lists no required env vars and no install spec, which conflicts with the SKILL.md.

ℹ Instruction Scope

Runtime instructions include pip install, setting LLMBASE_* env vars, ingesting arbitrary URLs and local files/dirs, compiling to local wiki paths, and starting optional servers and an autonomous worker. Those actions are expected for a KB but permit broad operations (fetching remote URLs, reading local files, and writing under raw/ and wiki/). The worker auto‑fetch behavior is opt‑in but could fetch external sources if enabled.

ℹ Install Mechanism

SKILL.md lists 'pip install llmwiki' (a standard PyPI install). Installing from PyPI is common but runs third‑party code on the host. The registry previously indicated 'No install spec', which is inconsistent with the instruction; that mismatch should be resolved (confirm package identity and provenance on PyPI/GitHub before installing).

ℹ Credentials

SKILL.md requires an LLM API key (LLMBASE_API_KEY) and offers optional LLMBASE_BASE_URL, LLMBASE_MODEL, and fallback list — these are proportionate for an LLM-driven tool. Again, registry metadata earlier reported no required env vars, so there's an inconsistency between declared runtime requirements and the registry record.

ℹ Persistence & Privilege

The skill can run a web UI, an agent HTTP API, and an MCP server and supports an autonomous worker when enabled. 'always' is false and autonomous invocation is platform default; the skill does not request forced/global persistence. Still, starting network services and a persistent worker are privileged actions the user should intentionally enable and configure (e.g., bind interfaces, firewall, authentication).

Version History

v0.8.0

BREAKING: rename tools/ → llmwiki/ package namespace. All imports change from 'from tools.xxx' to 'from llmwiki.xxx'. MCP invocation: python -m llmwiki. CLI (llmbase) and PyPI name (llmwiki) unchanged.
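Downstream code can be migrated mechanically. A minimal regex-based codemod sketch, assuming imports use the plain "from tools..." or "import tools..." forms at the start of a line; it is not a tool shipped with llmwiki and does not handle multi-module imports such as "import os, tools.x" or dynamic imports.

```python
import re

# "from tools.xxx import ..." / "from tools import ..." at statement start
_FROM = re.compile(r"^(\s*from\s+)tools(\.|\s)", re.MULTILINE)
# "import tools.xxx" / "import tools" at statement start
_IMPORT = re.compile(r"^(\s*import\s+)tools(\.|\s|$)", re.MULTILINE)

def migrate_imports(source: str) -> str:
    """Rewrite pre-0.8.0 'tools' imports to the 'llmwiki' namespace."""
    source = _FROM.sub(r"\1llmwiki\2", source)
    return _IMPORT.sub(r"\1llmwiki\2", source)

print(migrate_imports("from tools.anchor import locate_span\nimport tools.pipeline as p\n"))
```

Requiring a dot, whitespace, or end of line after "tools" keeps unrelated modules such as toolsmith untouched.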

v0.7.10

v0.7.10: wikisource — preserve {{*|content}} small-note template (王弼注/河上公章句/七家注 content was being stripped at ingest)

v0.7.9

tools.anchor: locate_span + normalize_text, an annotation→annotated-span alignment primitive for kepan, citations, and targeted comments. Pure string algorithm; offsets point into the ORIGINAL content; normalize_text is exposed so the JS frontend can mirror it. siwen proposal 己 (议己).
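The key idea, matching on normalized text but reporting offsets into the original, can be shown with an index map built during normalization. This sketch is hypothetical: normalize_with_map and find_span are illustrative names, and the real normalize_text may normalize more (or differently) than case and whitespace.

```python
from typing import List, Optional, Tuple

def normalize_with_map(text: str) -> Tuple[str, List[int]]:
    """Lowercase and collapse whitespace runs to one space.

    Also returns, for every normalized character, the index of the original
    character it came from, so matches can be mapped back to the source.
    """
    out: List[str] = []
    origin: List[int] = []
    previous_was_space = False
    for i, ch in enumerate(text):
        if ch.isspace():
            if previous_was_space:
                continue
            out.append(" ")
            origin.append(i)
            previous_was_space = True
        else:
            lowered = ch.lower()  # may be longer than one char, e.g. "İ"
            out.append(lowered)
            origin.extend([i] * len(lowered))
            previous_was_space = False
    return "".join(out), origin

def find_span(content: str, quote: str) -> Optional[Tuple[int, int]]:
    """Locate quote in content, tolerating case and whitespace differences.

    Returns (start, end) offsets into the ORIGINAL content, or None.
    """
    norm_content, origin = normalize_with_map(content)
    norm_quote, _ = normalize_with_map(quote.strip())
    if not norm_quote:
        return None
    pos = norm_content.find(norm_quote)
    if pos < 0:
        return None
    start = origin[pos]
    end = origin[pos + len(norm_quote) - 1] + 1
    return start, end
```

For example, find_span("Hello   World", "hello world") returns (0, 13), covering the original spacing.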

v0.7.8

v0.7.8: chat_with_meta + reasoning_budget (proposals 5-甲/乙, 议 5-甲乙). Siwen 5th-batch post-mortem: the root cause of the 11h wenguan failure was the absence of finish_reason='length' detection. chat_with_meta surfaces finish_reason, usage (including reasoning_tokens), attempts, and truncated. reasoning_budget(max_tokens, tokens_per_char, safety=0.8) is a pure calculator with no upstream model table. chat() is a thin wrapper, so nothing breaks for v0.7.x callers. docs/pipelines.md adds 'Choosing the cid' and 'Sizing chunks' pattern sections (proposal 戊). Proposal 丙 (aggregate_and_fallback) is deferred; proposal 丁 is handled on the siwen side. Three Codex review rounds fixed dict-shaped usage, inf overflow, and int-too-large-for-float. 419 tests.
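The two ideas in this release, sizing output so it fits max_tokens and detecting when a response was cut off, can be sketched as follows. The formula in char_budget is an assumption (the release notes do not give reasoning_budget's formula), and was_truncated assumes an OpenAI-style response dict whose choices carry finish_reason.

```python
import math
from typing import Any, Mapping

def char_budget(max_tokens: int, tokens_per_char: float, safety: float = 0.8) -> int:
    """Hypothetical calculator: how many output characters fit in max_tokens.

    Assumes output tokens ~= chars * tokens_per_char, then keeps a safety margin.
    """
    if max_tokens <= 0 or tokens_per_char <= 0 or not 0 < safety <= 1:
        raise ValueError("max_tokens and tokens_per_char must be positive; 0 < safety <= 1")
    try:
        budget = max_tokens * safety / tokens_per_char
    except OverflowError as exc:  # e.g. an int too large to convert to float
        raise ValueError("max_tokens too large") from exc
    if not math.isfinite(budget):
        raise ValueError("budget is not finite")
    return int(budget)

def was_truncated(response: Mapping[str, Any]) -> bool:
    """True when any choice stopped because it hit the token limit."""
    choices = response.get("choices") or []
    return any(c.get("finish_reason") == "length" for c in choices)

print(char_budget(1000, 2.0))  # → 400
```

Checking finish_reason explicitly matters because a truncated response otherwise looks like a normal, merely short, answer.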

v0.7.7

tools.pipeline, proposal D (议 D): composable multi-stage primitives. run_stage context manager (the driver guarantees an ok/failed/partial terminal state per run); rebuild_state (the log is the source of truth, state is a view); StageLock (atomic tempfile + os.link, fcntl.flock breaker, strict TTL). Opaque stage/key/meta; no DAG, no scheduler. 13 Codex review rounds. 80 pipeline tests; 391 total.
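The tempfile + os.link trick works because os.link refuses to overwrite an existing target, so exactly one process can install the lock file, and the file is fully written before it becomes visible. A minimal hypothetical sketch (LinkLock is not llmwiki's StageLock): it treats a lock older than ttl seconds as stale, and it omits the fcntl.flock breaker, so breaking a stale lock here is racy.

```python
import os
import tempfile
import time

class LinkLock:
    """Hypothetical stage lock: atomic create via os.link, stale after ttl seconds."""

    def __init__(self, path: str, ttl: float = 300.0) -> None:
        self.path = path
        self.ttl = ttl

    def _is_stale(self) -> bool:
        try:
            return time.time() - os.path.getmtime(self.path) > self.ttl
        except FileNotFoundError:
            return True

    def acquire(self) -> bool:
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the lock
        try:
            with os.fdopen(fd, "w") as f:
                f.write(f"{os.getpid()}\n")  # fully written before it becomes visible
            for _ in range(2):  # retry once, only after breaking a stale lock
                try:
                    os.link(tmp, self.path)  # raises FileExistsError if held
                    return True
                except FileExistsError:
                    if not self._is_stale():
                        return False
                    try:
                        os.unlink(self.path)  # racy without an flock breaker
                    except FileNotFoundError:
                        pass
            return False
        finally:
            os.unlink(tmp)

    def release(self) -> None:
        try:
            os.unlink(self.path)
        except FileNotFoundError:
            pass
```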

v0.7.2

/api/articles/lite gains ?tag=<slug> server-side filter (index.json-backed, no frontmatter parse) + opt-in browser cache via LLMBASE_LITE_CACHE_MAX_AGE env var. ETag now keyed on tag param so distinct slices never share a 304. Driven by siwen.ink (~13k articles) sidebar payload pain. Default Cache-Control behaviour unchanged.
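Keying the ETag on the tag parameter means each filtered slice gets its own validator. A hypothetical sketch, assuming the ETag is derived from the index.json bytes plus the tag; the server's actual hashing scheme is not documented here. The If-None-Match check follows the RFC 7232 weak-comparison rule in simplified form.

```python
import hashlib
from typing import Optional

def lite_etag(index_bytes: bytes, tag: Optional[str]) -> str:
    """ETag over the index contents plus the ?tag= filter value."""
    h = hashlib.sha256(index_bytes)
    h.update(b"\0tag=")
    h.update((tag or "").encode("utf-8"))
    return '"' + h.hexdigest()[:16] + '"'

def not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """Simplified If-None-Match check: True means the server may answer 304."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    return "*" in candidates or etag in candidates or f"W/{etag}" in candidates
```

Without the tag folded in, a client holding the ETag for ?tag=a could receive a 304 for ?tag=b and keep showing the wrong slice.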

v0.6.8

Web-UI compile button survives navigation (closes #7): GET /api/worker/status reports {busy:bool}; Ingest.tsx polls on mount, recovers in-flight compile state, mounted-ref guards on every post-await setState; typed ApiError in lib/api.ts. (Backfilled to PyPI 2026-04-18.)

v0.6.7

Hardened /api/ask model override (require raw API_SECRET when secret set); fixed URL-slug corruption (#5) + heal_urly_slugs pass; LLMBASE_HTTP_TIMEOUT/CONNECT_TIMEOUT env vars (#6); LLMBASE_MODEL_ALLOWLIST; llmbase -v/-vv/-vvv CLI verbosity. (Backfilled to PyPI 2026-04-18.)

v0.7.1

Section-slicing API for long articles: tools/sections.py + kb_get_sections op + GET /api/articles/{slug}/sections + kb_get section= subtree extraction. Anchor format h{level}-{slug-short}-{hash6}, stable across cosmetic title edits + sibling reorder. Codex pre-commit caught 5 issues (HIGH: path-traversal x2; MEDIUM: fence-close, ATX heading edge cases; LOW: hash collisions) — all fixed.
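An anchor of the shape h{level}-{slug-short}-{hash6} can be built as below. This is a sketch under assumptions: the release note does not say what the 6-character hash covers, so here it hashes the heading title after normalizing away case, punctuation, and spacing (which keeps cosmetic edits stable), and it does not disambiguate duplicate headings, which the real implementation handles.

```python
import hashlib
import re
import unicodedata

def _normalize_title(title: str) -> str:
    """Drop case, punctuation, and extra whitespace so cosmetic edits don't change the hash."""
    title = unicodedata.normalize("NFKC", title).casefold()
    title = re.sub(r"[^\w\s]", "", title)
    return " ".join(title.split())

def section_anchor(level: int, title: str, short_len: int = 24) -> str:
    """Build an anchor shaped like h{level}-{slug-short}-{hash6}."""
    norm = _normalize_title(title)
    slug = re.sub(r"\s+", "-", norm)[:short_len].strip("-") or "section"
    hash6 = hashlib.sha1(norm.encode("utf-8")).hexdigest()[:6]
    return f"h{level}-{slug}-{hash6}"
```

Because position is not part of the input, reordering sibling sections leaves each anchor unchanged.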

v0.6.9

Mermaid rendering in the Markdown component (lazy-loaded, theme-aware) + deep-nest CSS for ul/ol up to 8 levels (bullet rotation + outline rail). Frontend-only release driven by 斯文·太虛間, a reading library for the Complete Works of Master Taixu (太虛大師全書).

v0.6.6

v0.6.6: per-request model override on /api/ask + UTF-8 surrogate sanitize. (A) /api/ask body now accepts `model` field, threaded through kb_ask Operation → query()/query_with_search() → chat(). (C) Lone surrogates (U+D800-U+DFFF) sneaking in via half-decoded HTML/PDF ingest no longer crash deep RAG — sanitized at chat_with_context and at ingest write.
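Lone surrogates cannot be encoded as UTF-8, so one stray code point can make a whole request fail. A minimal sketch of the kind of sanitizing step described, replacing them with U+FFFD; the function name is hypothetical and the real code may use a different replacement character.

```python
import re

# In a Python str, any code point in this range is a lone surrogate: valid
# astral characters are stored as single code points, never as pairs.
_SURROGATE = re.compile(r"[\ud800-\udfff]")

def sanitize_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace lone surrogates, which cannot be encoded as UTF-8."""
    return _SURROGATE.sub(replacement, text)

# e.g. bytes decoded with errors="surrogateescape" leave such code points behind
broken = b"caf\xe9".decode("utf-8", errors="surrogateescape")
fixed = sanitize_surrogates(broken)
fixed.encode("utf-8")  # would raise UnicodeEncodeError without the sanitize step
```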

v0.6.5

v0.6.5: fix #4 — .env now discovered correctly under pipx/PyPI installs. New lookup order: LLMBASE_ENV_FILE → $PWD/.env (when config.yaml declares llmbase paths) → ~/.config/llmbase/.env → package dir. Shell exports still win.
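The lookup order reads as "first existing file wins, then shell exports override its values". A hypothetical sketch of that order: the cwd_is_kb flag stands in for "config.yaml declares llmbase paths" (detecting that would require parsing YAML), and package_dir is whatever directory the package is installed in.

```python
from pathlib import Path
from typing import Dict, List, Mapping, Optional

def env_candidates(environ: Mapping[str, str], cwd: Path, home: Path,
                   package_dir: Path, cwd_is_kb: bool) -> List[Path]:
    """Candidate .env files in the v0.6.5 lookup order."""
    candidates = []
    if environ.get("LLMBASE_ENV_FILE"):
        candidates.append(Path(environ["LLMBASE_ENV_FILE"]))
    if cwd_is_kb:  # stands in for "config.yaml declares llmbase paths"
        candidates.append(cwd / ".env")
    candidates.append(home / ".config" / "llmbase" / ".env")
    candidates.append(package_dir / ".env")
    return candidates

def find_env_file(candidates: List[Path]) -> Optional[Path]:
    return next((p for p in candidates if p.is_file()), None)

def parse_env(text: str) -> Dict[str, str]:
    """Very small KEY=VALUE parser: skips blanks and # comments, strips quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

def effective_env(environ: Mapping[str, str], file_values: Mapping[str, str]) -> Dict[str, str]:
    """Shell exports override anything read from the .env file."""
    return {**file_values, **environ}
```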

v0.6.4

v0.6.4: /api/articles scales — pagination (limit/cursor/tag/q/fields), new /api/articles/lite endpoint, RFC 7232 ETag + 304 on articles/taxonomy. 12k-article sidebar load: 3.66MB/3.5s → 500KB/0.3s.
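The release notes do not describe how the cursor is encoded. A common approach, assumed here, is keyset pagination: the cursor records the last slug returned and the next page starts strictly after it. Function names are hypothetical.

```python
import base64
import json
from bisect import bisect_right
from typing import List, Optional, Tuple

def encode_cursor(last_slug: str) -> str:
    """Opaque cursor: base64 of the last slug already returned."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_slug}).encode("utf-8")).decode("ascii")

def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after"]

def paginate(slugs: List[str], limit: int,
             cursor: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Keyset pagination over sorted slugs.

    Resuming after the last slug seen, instead of at a numeric offset, means
    articles added or removed before the cursor don't shift later pages.
    """
    ordered = sorted(slugs)
    start = bisect_right(ordered, decode_cursor(cursor)) if cursor else 0
    page = ordered[start:start + limit]
    has_more = start + limit < len(ordered)
    next_cursor = encode_cursor(page[-1]) if has_more and page else None
    return page, next_cursor
```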

v0.6.3

v0.6.3: TF-IDF prefilter in query_with_search caps the LLM selector prompt at O(top_k), unblocking kb_ask deep=true and promote=true on KBs above ~10k articles (observed: 11,625-article KB previously exceeded upstream context windows). Configurable via config.yaml::query.prefilter_threshold (500) and .prefilter_top_k (200).
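A TF-IDF prefilter scores every article cheaply against the query and passes only the top few to the expensive LLM selector. A minimal sketch using the documented defaults (threshold 500, top_k 200); it assumes the prefilter applies only above the threshold, and its \w+ tokenizer treats a run of CJK characters as one token, so Chinese text would likely need character n-grams instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List

_TOKEN = re.compile(r"\w+")

def tokenize(text: str) -> List[str]:
    return _TOKEN.findall(text.lower())

def tfidf_prefilter(query: str, docs: Dict[str, str],
                    threshold: int = 500, top_k: int = 200) -> List[str]:
    """Shortlist at most top_k doc ids for the LLM selector prompt.

    KBs at or below `threshold` articles pass through unchanged; defaults
    mirror query.prefilter_threshold and query.prefilter_top_k.
    """
    if len(docs) <= threshold:
        return list(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq: Counter = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    n_docs = len(docs)
    query_terms = set(tokenize(query))

    def score(c: Counter) -> float:
        total = sum(c.values()) or 1
        # term frequency × inverse document frequency, summed over query terms
        return sum((c[t] / total) * math.log(n_docs / doc_freq[t])
                   for t in query_terms if c[t])

    return sorted(docs, key=lambda d: score(counts[d]), reverse=True)[:top_k]
```

The selector prompt then grows with top_k rather than with the number of articles, which is the O(top_k) cap the release describes.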

v0.6.2

v0.6.2: kb_search_raw raw-source fallback + lint perf cache; SKILL.md corrected for CLI command name (llmbase not llmwiki), accurate security posture, and up-to-date MCP tool list

v0.1.1

Fix security metadata: declare credentials, permissions, install steps. Add Security section.

v0.1.0

Initial release: CLI commands, workflows, MCP integration, self-healing KB

Metadata

Slug llmwiki

Version 0.8.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 17

Frequently Asked Questions

What is Llmwiki?

Llmwiki is an LLM-powered personal knowledge base packaged as an AI Agent Skill for Claude Code / OpenClaw, with 297 downloads so far.

How do I install Llmwiki?

Run "/install llmwiki" in the OpenClaw or Claude Code chat to install the skill in one step. You still need the Setup steps above (pip install llmwiki, an LLM API key in .env, and a config.yaml).

Is Llmwiki free?

Yes, Llmwiki is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Llmwiki support?

Llmwiki is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Llmwiki?

It is built and maintained by Huang Geyang (@hosuke); the current version is v0.8.0.
