Description

Karpathy's llm-wiki pattern implementation — cumulative knowledge management for AI agents

Usage (SKILL.md)

CLI Reference

Name: llm-wiki SKILL inspired by Karpathy
Author: nemo4110

Protocol Mode (Recommended)

Use natural language with your agent:

"Please ingest sources/paper.pdf into wiki"
"Query wiki: What is the difference between Transformer and RNN?"
"Check wiki health"

CLI Mode (Optional)

After installing dependencies:

# Show wiki status overview
python -m src.llm_wiki status

# Run health check
python -m src.llm_wiki lint

# Show help
python -m src.llm_wiki --help

Note: ingest and query commands in CLI only provide auxiliary functions (like listing pages). Actual content processing requires natural language interaction with the agent.

LLM-Wiki

Karpathy's llm-wiki pattern implementation — cumulative knowledge management for AI agents.

Core Philosophy: LLM as programmer, Wiki as codebase, User as product manager.

Why SKILL Form?

Dimension	Standalone App (e.g. Sage-Wiki)	This SKILL Implementation
Architecture	Go + SQLite + Embedded Frontend	Pure Markdown
Deployment	Requires running service	Zero deployment
Integration	Indirect via MCP	Native commands
Code Size	~10k lines	~500 lines
Data Format	Proprietary	Plain text Markdown
Editor	Locked in app	Obsidian/VSCode/Any

Features

Protocol-driven: Works with natural language (no installation required)
Pure Markdown: No database, no lock-in, git-native
Wiki-style links: [[PageName]] format, Obsidian-compatible
Cumulative learning: Every query can create new knowledge
Health checks: Orphan pages, dead links, stale content detection
Optional CLI: Python scripts for automation and batch operations
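
The orphan-page and dead-link checks listed above can be illustrated with a short sketch. This is not the project's lint implementation; `extract_links` and `check_health` are hypothetical names, pages are passed in as an in-memory dict, and stale-content detection is left out.

```python
import re

# Matches [[PageName]], [[PageName|alias]] and [[PageName#heading]];
# only the page name is captured.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

def extract_links(text):
    """Return the set of page names referenced by [[...]] links in text."""
    return {m.group(1).strip() for m in WIKI_LINK.finditer(text)}

def check_health(pages, entry="index"):
    """pages maps page name -> Markdown body (hypothetical in-memory form)."""
    dead_links = {}      # page -> sorted link targets that have no page
    linked_to = set()    # every name that is the target of some link
    for name, body in pages.items():
        targets = extract_links(body)
        linked_to |= targets - {name}
        missing = sorted(t for t in targets if t not in pages)
        if missing:
            dead_links[name] = missing
    # An orphan is a page (other than the entry index) that nothing links to.
    orphans = sorted(n for n in pages if n != entry and n not in linked_to)
    return {"dead_links": dead_links, "orphans": orphans}

pages = {
    "index": "Topics: [[LoRA]], [[Transformer]]",
    "LoRA": "See [[Fine-tuning]] and [[Transformer]].",
    "Transformer": "Uses [[Self-Attention]].",
    "Self-Attention": "Core of the [[Transformer]].",
    "Scratch": "Unlinked notes.",
}
report = check_health(pages)
print(report)
```

In this toy wiki, LoRA points at a page that does not exist yet and Scratch is reachable from nowhere, so both show up in the report.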

Quick Start

# 1. Clone
git clone https://github.com/Nemo4110/llm-wiki.git
cd llm-wiki

# 2. Add source material
cp ~/Downloads/paper.pdf sources/

# 3. Tell your agent
"Please ingest sources/paper.pdf into wiki"

Installation

Protocol Mode (Recommended)

No installation needed. Agent reads CLAUDE.md and operates directly.

CLI Mode (Optional)

Using uv (Fastest)

# Create virtual environment and install dependencies
uv venv
# Windows
uv pip install -r src/requirements.txt --python .venv/Scripts/python.exe
# Linux/macOS
uv pip install -r src/requirements.txt --python .venv/bin/python

# Activate environment (Windows)
.venv\Scripts\activate
# Or Linux/macOS
source .venv/bin/activate

Using conda

# Create environment
conda create -n llm-wiki python=3.11

# Activate environment
conda activate llm-wiki

# Install dependencies
pip install -r src/requirements.txt

Using pip

# Create virtual environment
python -m venv .venv

# Activate environment
source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r src/requirements.txt

Verify Installation

python -c "from src.llm_wiki.core import WikiManager; print('✓ Installation successful')"

Important Dependency Notes:

Dependency	Version	Purpose	Notes
`click`	>=8.0.0	CLI framework	-
`pyyaml`	>=6.0	YAML parsing	-
`pymupdf`	>=1.25.0	PDF processing	Primary PDF engine, best for CJK

Optional dependencies (for enhanced features):

numpy >=1.24.0 — Vector operations for embedding retrieval
httpx >=0.27.0 — HTTP client for Ollama/local services
openai >=1.0.0 — OpenAI embedding API
mcp >=1.0.0 — MCP SDK for remote embedding providers

Fallback PDF dependencies:

pdfplumber >=0.11.8 — Table extraction fallback (security version required for CVE-2025-64512)
pdfminer.six >=20251107 — PDF underlying library fallback

Project Structure

llm-wiki/
├── CLAUDE.md           # ⭐ Core protocol: Agent behavior guidelines
├── AGENTS.md           # Agent implementation guide (CLI usage)
├── SKILL.md            # This file, machine-readable specification
├── log.md              # Timeline log (append-only)
├── sources/            # Raw materials (user-managed + tool-fetched; Agent forbidden from writing LLM-generated content)
│   └── README.md
├── wiki/               # Generated knowledge pages (Agent-managed)
│   ├── index.md        # Entry index
│   └── *.md            # Topic pages
├── assets/             # Templates and configuration
│   ├── page_template.md
│   └── ingest_rules.md
├── src/                # SKILL implementation (optional, for CLI)
│   ├── llm_wiki/
│   └── requirements.txt
├── scripts/            # Auxiliary scripts
├── hooks/              # Platform hooks (optional)
└── examples/           # Example wiki

About sources/: Excluded from git by default to avoid repository bloat. Wiki only retains extracted knowledge; original files are managed separately (cloud storage, Zotero, etc.). See sources/README.md for tracking specific files.

How It Works

Data Flow

+----------+     +--------------------+     +--------------+
| sources/ |---->|   LLM Processing   |---->|    wiki/     |
|  (Raw)   |     | (Extract + Link)   |     | (Structured) |
+----------+     +--------------------+     +--------------+
                          |
                          v
                    +----------+
                    |  log.md  |
                    | (Record) |
                    +----------+
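
The "Record" step in the diagram can be sketched as an append-only write to log.md. The entry layout (timestamp, action, detail separated by `|`) and the `append_log` helper are assumptions for illustration; the actual log format is defined by the project's protocol files.

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile

def append_log(log_path, action, detail, now=None):
    """Append one timestamped entry; earlier lines are never rewritten."""
    now = now or datetime.now(timezone.utc)
    entry = f"- {now:%Y-%m-%d %H:%M} UTC | {action} | {detail}\n"
    with open(log_path, "a", encoding="utf-8") as f:  # "a" opens in append mode
        f.write(entry)
    return entry

# Demo in a temporary directory so no real log.md is touched.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "log.md"
    fixed = datetime(2026, 4, 10, 9, 30, tzinfo=timezone.utc)
    append_log(log, "ingest", "sources/paper.pdf -> [[Transformer]]", now=fixed)
    append_log(log, "query", "What is LoRA?", now=fixed)
    log_lines = log.read_text(encoding="utf-8").splitlines()

print(log_lines)
```

Opening the file in append mode keeps the timeline intact even if several operations write to it in sequence.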

Key Design

CLAUDE.md as Protocol: Defines Agent behavior standards that any person or Agent can follow
Pure Markdown: No database, no lock-in, native git version control
Bidirectional Links: [[PageName]] format, compatible with Obsidian
Cumulative Learning: Each query can generate new wiki pages, so knowledge accumulates over time

Query Mechanism

Current Implementation: Symbolic Navigation + LLM Synthesis (Default)

By default, this SKILL does not require Embedding/vector retrieval. Queries are completed through:

User asks question
         |
         v
+-------------------------------+
|  1. Read index.md             |  <-- Human/Agent-maintained category index
|     Locate relevant topics    |
+-------------------------------+
         |
         v
+-------------------------------+
|  2. Read relevant pages       |  <-- Discover associations through [[links]]
|     and their link neighbors  |
+-------------------------------+
         |
         v
+-------------------------------+
|  3. LLM Synthesis             |  <-- Generate answers based on read content
|     Generate with citations   |  Citation format: [[PageName]]
+-------------------------------+

Optional Enhancement: After enabling config.yaml embedding settings, CLI query --semantic adds hybrid search (Keyword Match + Vector Search + Link Traversal) for faster, more accurate retrieval.
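
One way such a hybrid ranking could combine the three signals is sketched below. It is an illustration only, not the code behind `query --semantic`: the weights, the `hybrid_rank` name, the page record layout, and the toy two-dimensional vectors are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    """Fraction of query terms that occur as whole words in text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_rank(query, query_vec, pages, w_kw=0.4, w_vec=0.5, w_link=0.1):
    """pages: name -> {"text": str, "vec": list[float], "links": list[str]}."""
    base = {
        name: w_kw * keyword_score(query, p["text"]) + w_vec * cosine(query_vec, p["vec"])
        for name, p in pages.items()
    }
    best = max(base, key=base.get)
    # Link traversal: pages linked from the top hit receive a small boost.
    scores = {
        name: s + (w_link if name in pages[best]["links"] else 0.0)
        for name, s in base.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "LoRA": {"text": "low-rank adaptation for fine-tuning", "vec": [1.0, 0.0], "links": ["Adapter"]},
    "Adapter": {"text": "small trainable modules", "vec": [0.6, 0.8], "links": []},
    "RNN": {"text": "recurrent sequence model", "vec": [0.0, 1.0], "links": []},
}
ranking = hybrid_rank("low-rank adaptation", [1.0, 0.0], pages)
print(ranking)
```

In a real setup the vectors would come from the configured embedding provider; here they are hand-picked so the arithmetic is easy to follow.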

Example Flow:

User asks: "What is LoRA?"

Agent reads wiki/index.md, finds [[LoRA]] under "AI/ML" topic
Agent reads wiki/LoRA.md, discovers links to [[Fine-tuning]], [[Adapter]]
Agent synthesizes answer:

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method — see [[LoRA]]. Compared to traditional [[Fine-tuning]], it only trains low-rank matrices...
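
Steps 1 and 2 of the flow above (locate topics in the index, then read link neighbors) can be sketched as follows. In protocol mode the agent makes these choices by reading the pages itself; the substring match and the `gather_context` helper here are simplified stand-ins, and step 3 (LLM synthesis) is not shown.

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def links_in(text):
    """Page names referenced by [[...]] links, in order of appearance."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(text)]

def gather_context(pages, question, max_hops=1):
    """Return the page names an agent would read before answering."""
    q = question.lower()
    # Step 1: index entries whose name appears in the question.
    selected = [name for name in links_in(pages["index"]) if name.lower() in q]
    # Step 2: breadth-first expansion to link neighbors that exist as pages.
    seen, frontier = list(selected), list(selected)
    for _ in range(max_hops):
        next_frontier = []
        for name in frontier:
            for target in links_in(pages.get(name, "")):
                if target in pages and target not in seen:
                    seen.append(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen

pages = {
    "index": "## AI/ML\n- [[LoRA]]\n- [[Transformer]]",
    "LoRA": "Low-rank method. See [[Fine-tuning]], [[Adapter]].",
    "Fine-tuning": "Updates all weights.",
    "Adapter": "Small inserted modules.",
    "Transformer": "Attention architecture.",
}
context = gather_context(pages, "What is LoRA?")
print(context)
```

The returned list is the reading order: the matched topic first, then its direct neighbors.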

Why is Embedding Optional?

Consideration	Current Solution	Embedding Solution
Dependencies	Zero external dependencies	Requires Embedding API or local model
Cost	No additional fees	Charged per token/request
Privacy	Data not uploaded	Must send content to external service
Accuracy	Precise links, explainable	Approximate similarity, may retrieve irrelevant content
Scale	Suitable for 0-500 pages	Essential for large scale (1000+ pages)

Conclusion: For personal/small team knowledge bases, maintaining index.md and page links is simpler and more effective than introducing Embedding. Embedding is available as an opt-in CLI enhancement when scale demands it.

When to Use Embedding?

Consider enabling config.yaml embedding settings when your wiki shows:

Page count exceeds 500 and manual indexing becomes hard to maintain
Queries are mostly fuzzy and semantic ("that paper about optimization...")
You need cross-topic associations (the query shares no keywords with a page but is semantically related)

See ROADMAP.md for details.

Example Workflows

Scenario 1: Researching New Domain

# Day 1: Add materials
sources/
├── 2026-04-10-transformer-paper.pdf
└── 2026-04-10-lora-blog.md

User: Please ingest new materials
Claude: Processed 2 materials
  - Created [[Transformer]] — Attention mechanism architecture
  - Created [[LoRA]] — Low-rank adaptation
  - Created [[Self-Attention]] — Self-attention details
  - Updated wiki/index.md

Scenario 2: Deep Exploration

User: What's the difference between LoRA and full fine-tuning?

Claude (reads wiki/index.md → finds [[LoRA]] → synthesizes answer):

Based on our wiki:

**LoRA** — [[LoRA]]
- Freezes pre-trained weights, only trains low-rank matrices
- Trainable parameter count reduced by up to ~10,000x

**Full Fine-tuning**
- Updates all parameters
- High computational cost

Key differences:
1. Parameters: LoRA only trains 0.1%~1% of parameters [[Parameter Efficiency]]
2. Storage: LoRA only saves adapters, not full model
3. Composition: Multiple LoRAs can be stacked [[LoRA Composition]]

Was this helpful? Should I archive this answer to wiki?

Scenario 3: Knowledge Accumulation

User: Yes, archive this answer

Claude: Created [[LoRA vs Full Fine-tuning]]
- Extracted comparison points from conversation
- Linked to [[LoRA]] and [[Fine-tuning]]
- Added to FAQ section in wiki/index.md
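
The archiving step can be sketched as two file writes: a new page, plus an entry in the index. The page layout, the `archive_answer` helper, and the assumption that the FAQ section sits at the end of index.md are all illustrative; the real layout comes from the project's templates.

```python
from pathlib import Path
import tempfile

def archive_answer(wiki_dir, title, body, related):
    """Write a new page and list it under the FAQ section of index.md."""
    wiki_dir = Path(wiki_dir)
    see_also = ", ".join(f"[[{name}]]" for name in related)
    page = f"# {title}\n\n{body}\n\nRelated: {see_also}\n"
    (wiki_dir / f"{title}.md").write_text(page, encoding="utf-8")

    index = wiki_dir / "index.md"
    text = index.read_text(encoding="utf-8") if index.exists() else "# Index\n"
    if "## FAQ" not in text:
        text = text.rstrip("\n") + "\n\n## FAQ\n"
    entry = f"- [[{title}]]"
    if entry not in text:  # skip if the page is already listed
        # Simplification: assumes the FAQ section is the last one in the file.
        text = text.rstrip("\n") + "\n" + entry + "\n"
    index.write_text(text, encoding="utf-8")

# Demo in a temporary directory standing in for wiki/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.md").write_text("# Index\n\n## AI/ML\n- [[LoRA]]\n", encoding="utf-8")
    archive_answer(
        tmp,
        "LoRA vs Full Fine-tuning",
        "LoRA trains low-rank matrices; full fine-tuning updates all parameters.",
        ["LoRA", "Fine-tuning"],
    )
    index_text = (Path(tmp) / "index.md").read_text(encoding="utf-8")
    page_text = (Path(tmp) / "LoRA vs Full Fine-tuning.md").read_text(encoding="utf-8")

print(index_text)
```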

Using with Obsidian

Open wiki/ directory in Obsidian
Enjoy graph view, quick navigation, beautiful rendering
Claude Code handles maintenance, Obsidian handles reading and thinking

Comparison with Alternatives

Solution	Characteristics	Best For
This SKILL	Zero dependencies, pure text, Claude Code native	Personal knowledge management, research notes
Sage-Wiki	Full-featured, multimodal, standalone app	Team knowledge base, enterprise deployment
Obsidian + Plugins	Strong visualization, rich community	Existing Obsidian workflow
Notion/Logseq	Collaborative, real-time sync	Multi-user collaboration, mobile access

Documentation

CLAUDE.md — User-facing protocol (read this first)
AGENTS.md — Implementation guide for agent developers
SKILL.md — This file, machine-readable specification
ROADMAP.md — Future plans

Contributing

Issues and PRs welcome!

Current TODO

MCP server wrapper (for other Agents)
Obsidian plugin (one-click sync)
Incremental embedding for faster retrieval
Multi-language support

License

MIT — free to use, modify, and distribute.

Inspired by Karpathy's llm-wiki

Security Recommendations

This skill appears to implement the advertised LLM-wiki functionality, but it also instructs agents to fetch web pages, run Playwright/curl, and optionally call remote embedding providers (OpenAI/Ollama/MCP). Before installing or enabling it:

- Verify network policy: the skill's spec did not declare network capability; confirm whether your agent environment will allow outbound network requests and whether you are comfortable with that.
- Check config.yaml and defaults: embedding.enabled defaults to false, but if you enable embeddings you may need to provide API keys. Do NOT store sensitive secrets in repository files; prefer runtime environment variables or a secure secret store. The skill does not declare required env vars in registry metadata, so manually inspect config.yaml and any env interpolation before use.
- Review scripts and provider code (create_provider / mcp integration): MCP can be configured to use stdio/command transports; confirm there is no command execution path you do not intend to allow.
- Limit write scope: the agent will write to wiki/ and log.md and may write fetched files to sources/. Keep sensitive files out of these directories or run the skill in an isolated sandbox/repo copy.
- Avoid external network calls if you do not want them: keep embedding.enabled: false and do not invoke ingest flows that require web fetch. Prefer dry-run or CLI read-only modes first to observe behavior.

Capability Tags

crypto, can-make-purchases, requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

Name/description, code, and SKILL.md consistently implement a local markdown-based wiki (ingest, link, relink, query, lint). Requested host capabilities (filesystem-read, filesystem-write, llm-completion) match the core purpose. However, the skill's documented workflows and code also assume the ability to fetch web resources and call remote embedding providers (httpx, OpenAI, Ollama, MCP), which is not declared in the top-level capabilities — an omission that should be clarified.

⚠ Instruction Scope

SKILL.md / AGENTS.md explicitly instruct agents to run network fetches (curl, Playwright, httpx), to write downloaded files into sources/, to create and back-update wiki pages, and to run CLI scripts. These instructions give the agent permission to download arbitrary URLs, spawn Playwright browsers, and modify the repo's wiki and log files. That scope is appropriate for an ingesting wiki, but the skill fails to explicitly declare or limit network usage and does not enumerate the external endpoints/providers it may contact, which is a transparency concern.

ℹ Install Mechanism

No registry-level install spec was provided (instruction-only skill), but the repository includes installation instructions and a requirements.txt. The declared dependencies are common (click, pyyaml, pymupdf, httpx, openai, mcp). There are no downloads from arbitrary URLs in the install spec. Overall install risk is moderate and typical for a Python CLI project; nothing obviously malicious in the provided install guidance.

⚠ Credentials

The skill lists providers (openai, ollama, mcp) and config.yaml interpolation supports environment variables (config.py), yet the registry metadata declares no required env vars or primary credential. That mismatch means the skill can be configured to use powerful remote APIs (which require API keys) without those needs being surfaced during installation. Users may accidentally enable/expose external providers. Also, MCP transport can be configured to run commands/stdio — this deserves attention before enabling.
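
To make the credential concern concrete, here is a minimal sketch of how environment interpolation in a config might work. The `${NAME}` placeholder syntax, the `interpolate` helper, and the `provider` / `api_key` keys are assumptions for illustration; check the project's config.py for the syntax it actually accepts.

```python
import os
import re

ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(value, env=os.environ):
    """Recursively replace ${NAME} in strings; fail loudly if NAME is unset."""
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if isinstance(value, str):
        def repl(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return ENV_REF.sub(repl, value)
    return value  # booleans, numbers, None pass through unchanged

# Hypothetical parsed config; the secret lives in the environment, not the file.
config = {
    "embedding": {
        "enabled": False,
        "provider": "openai",
        "api_key": "${OPENAI_API_KEY}",
    }
}
resolved = interpolate(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["embedding"]["enabled"])
```

Raising on an unset variable, rather than silently substituting an empty string, makes a missing credential visible before any provider call is attempted.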

✓ Persistence & Privilege

The skill does not request always:true and does not appear to modify other skills or system-wide agent settings. It writes to local wiki/, log.md, and sources/ (when fetching) which are normal for its purpose. Model invocation is allowed (disable-model-invocation: false) — this is the platform default and expected for skills.

Version History

v1.3.0

llm-wiki v1.1.4 Changelog
- Added a new file: `docs/README.cn.md` (Chinese README).
- No changes to code or functionality; documentation improvement only.

v1.1.3

Version 1.3.0 introduces relationship linking and merging capabilities for wiki pages.
- Added new CLI commands and core functions for dynamic linking (`link`) and batch relationship discovery (`relink`) between wiki pages.
- Introduced `src/llm_wiki/linker.py` and `src/llm_wiki/merge.py` modules to support relationship discovery and merge strategies.
- Enhanced `ingest` workflow to automatically suggest and update page relationships, with options for light or deep linking.
- Updated function documentation to reflect new workflows, triggers, and relationship management steps.
- Increased version number to 1.3.0 to reflect these significant new knowledge graph and linking features.

v1.1.2

- Added `log.md` for timeline/changelog tracking.
- Added `wiki/index.md` as a structured index entry point.
- Updated and reorganized dependency list to include `pymupdf`, `numpy`, `httpx`, `openai`, and `mcp` for enhanced PDF and embedding support.
- Clarified and expanded CLI and installation instructions in documentation.
- Updated security notes about fallback PDF handling in the README.
- Improved documentation across AGENTS.md, ROADMAP.md, and new file handling details.

v1.1.0

- Introduced stub page creation for any new [[Dead Link]] encountered during content ingestion.
- Added new source files: config.py, embeddings.py, and retrieval.py to enhance functionality.
- Removed redundant files: log.md and wiki/index.md as part of project restructuring.
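
Stub page creation for dead links can be sketched as below. The `create_stubs` helper and the stub text are hypothetical, and the sketch assumes a flat wiki directory where each page name maps directly to a file name.

```python
import re
from pathlib import Path
import tempfile

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def create_stubs(wiki_dir):
    """Create a placeholder page for every [[link]] whose target file is missing."""
    wiki_dir = Path(wiki_dir)
    existing = {p.stem for p in wiki_dir.glob("*.md")}
    targets = set()
    for page in wiki_dir.glob("*.md"):
        text = page.read_text(encoding="utf-8")
        targets |= {m.group(1).strip() for m in WIKI_LINK.finditer(text)}
    created = sorted(targets - existing)
    for name in created:
        stub = f"# {name}\n\n> Stub: created automatically; no content ingested yet.\n"
        (wiki_dir / f"{name}.md").write_text(stub, encoding="utf-8")
    return created

# Demo in a temporary directory standing in for wiki/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LoRA.md").write_text("See [[Fine-tuning]] and [[Adapter]].", encoding="utf-8")
    (Path(tmp) / "Adapter.md").write_text("Small inserted modules.", encoding="utf-8")
    first_run = create_stubs(tmp)
    second_run = create_stubs(tmp)  # nothing left to create

print(first_run, second_run)
```

Running it twice shows the operation is idempotent: once a stub exists, the link is no longer dead.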

v1.0.4

- Updated the repository URL in metadata from "https://github.com/yourname/llm-wiki" to "https://github.com/Nemo4110/llm-wiki.git".
- No functional, interface, or code changes were made.

v1.0.2

No changes detected in this version.
- Version bumped to 1.0.2, but no code or documentation changes from previous release.
- All features, structure, and instructions remain unchanged.

v1.0.1

Version 1.0.1
- Added `src/llm_wiki/__main__.py` to support direct CLI invocation via `python -m src.llm_wiki`.
- Expanded documentation in SKILL.md to include a detailed CLI reference and installation notes.
- Added security warnings and dependency version requirements for PDF processing libraries.
- No changes to core functions or protocol; this is a documentation and usability update.

v1.0.0

Initial release of the llm-wiki skill: cumulative knowledge management for AI agents.
- Implements Karpathy's llm-wiki pattern for managing and querying knowledge.
- Supports ingestion of source material, wiki querying, and lint/health checks.
- Compatible with Claude Code, OpenClaw, and generic LLM-agent platforms.
- Offers both protocol (no dependencies) and CLI (Python + click/pyyaml) modes.
- Provides clear separation of sources, wiki content, scripts, and examples.
- Detailed installation options and accessible documentation included.

Metadata

Slug 041-llm-wiki

Version 1.3.0

License MIT-0

Total installs 0

Current installs 0

Versions released 8

FAQ

What is llm-wiki SKILL inspired by Karpathy?

An implementation of Karpathy's llm-wiki pattern: cumulative knowledge management for AI agents. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 243 downloads to date.

How do I install llm-wiki SKILL inspired by Karpathy?

Run the command "/install 041-llm-wiki" in an OpenClaw or Claude Code chat to install it in one step; no extra configuration is needed.

Is llm-wiki SKILL inspired by Karpathy free?

Yes. llm-wiki SKILL inspired by Karpathy is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does llm-wiki SKILL inspired by Karpathy support?

llm-wiki SKILL inspired by Karpathy is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed llm-wiki SKILL inspired by Karpathy?

It is developed and maintained by T0M0R1N (@nemo4110); the current version is v1.3.0.
