Description

Operational tooling for teams running local LLM infrastructure. Request tracing with full scoring breakdowns, per-application usage analytics via request tag...

Usage (SKILL.md)

AI DevOps Toolkit — Observability for Local AI Fleets

Name: Ai Devops Toolkit
Author: twinsgeeks

DevOps tooling for running local LLM inference at production quality. This skill provides the observability, tracing, and health-monitoring layer for an Ollama Herd fleet. Every workflow, from request tracing to capacity planning, runs against a single SQLite-backed observability stack.

DevOps Prerequisites

pip install ollama-herd
herd              # start the router (serves the observability endpoints on port 11435)
herd-node         # start the node agent on each monitored node

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

DevOps Scope

This DevOps toolkit assumes you have an Ollama Herd router running at http://localhost:11435 with one or more node agents reporting in. It focuses on the operational questions: Are requests succeeding? What is slow? Which apps consume the most tokens? Are nodes healthy? Is capacity adequate?

DevOps Observability Stack

Everything in this DevOps observability layer is backed by SQLite at ~/.fleet-manager/latency.db. No external databases, no time-series infrastructure. Query DevOps traces with standard sqlite3.

~/.fleet-manager/
├── latency.db          # DevOps traces, latency history, usage stats
└── logs/
    └── herd.jsonl      # DevOps structured logs, daily rotation, 30-day retention
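Python's standard sqlite3 module can run the same queries as the sqlite3 CLI. The sketch below assumes the default path shown above. It opens the file with SQLite's mode=ro URI parameter, so any accidental write fails; this fits the guardrail against modifying ~/.fleet-manager/. The helper names open_readonly and list_tables are illustrative and not part of ollama-herd.

```python
import sqlite3
from pathlib import Path

# Default location named in this skill; adjust if your install differs.
DEFAULT_DB = Path.home() / ".fleet-manager" / "latency.db"

def open_readonly(db_path=DEFAULT_DB):
    """Open the SQLite file read-only so analysis cannot modify it.

    mode=ro makes SQLite reject writes; opening fails if the file does not exist.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def list_tables(conn):
    """Names of all tables in the database, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [name for (name,) in rows]
```

For example, list_tables(open_readonly()) should include request_traces, the table every query in this skill reads.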

DevOps Health Checks

Automated DevOps fleet health analysis

devops_health=$(curl -s http://localhost:11435/dashboard/api/health)
echo "$devops_health" | python3 -m json.tool

Fifteen DevOps checks, each returning a severity (info/warning/critical) and a recommendation. The main ones are:

DevOps Check	What it detects
Offline nodes	Nodes that stopped sending heartbeats
Degraded nodes	Nodes reporting errors or high memory pressure
Memory pressure	Nodes approaching memory limits
Underutilized nodes	Healthy nodes not receiving traffic
VRAM fallbacks	Requests rerouted to loaded alternatives to avoid cold loads
Version mismatch	Nodes running different versions than the router
Context protection	num_ctx values stripped or models upgraded to prevent reloads
Zombie reaper	Stuck in-flight requests cleaned up
Model thrashing	Models loading/unloading frequently (memory contention)
Request timeouts	Requests exceeding expected DevOps latency thresholds
Error rates	Elevated failure rates per model or per node
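To triage the health response programmatically, a sketch like the following tallies checks by severity. The response shape it expects is an assumption for illustration: a top-level "checks" list whose items carry a "severity" field. Confirm it against the actual JSON from /dashboard/api/health before relying on it.

```python
import json
import urllib.request
from collections import Counter

HEALTH_URL = "http://localhost:11435/dashboard/api/health"
SEVERITIES = ("critical", "warning", "info")

def fetch_health(url=HEALTH_URL, timeout=5):
    """GET the health report and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def summarize_health(payload):
    """Count checks per severity.

    ASSUMPTION: payload looks like {"checks": [{"severity": "warning", ...}, ...]}.
    This shape is hypothetical; inspect the real response first.
    """
    counts = Counter(c.get("severity", "unknown") for c in payload.get("checks", []))
    return {sev: counts[sev] for sev in SEVERITIES}
```

Usage: print(summarize_health(fetch_health())) prints one count per severity, with zero for any severity that did not occur.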

DevOps node-level status

devops_fleet_status=$(curl -s http://localhost:11435/fleet/status)
echo "$devops_fleet_status" | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f\"DevOps Fleet: {d['fleet']['nodes_online']}/{d['fleet']['nodes_total']} online, {d['fleet']['requests_active']} active requests\")
for n in d['nodes']:
    mem = n.get('memory', {})
    cpu = n.get('cpu', {})
    print(f\"  {n['node_id']:20s} {n['status']:10s} CPU={cpu.get('utilization_pct',0):.0f}% MEM={mem.get('used_gb',0):.0f}/{mem.get('total_gb',0):.0f}GB pressure={mem.get('pressure','?')}\")
"

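The one-liner above only prints node status. The sketch below flags nodes that need attention, reusing the fields it reads: status, memory.used_gb, memory.total_gb, and memory.pressure. Two parts are assumptions rather than documented ollama-herd values: that a healthy node reports "online" and "normal", and the 0.9 memory ratio threshold.

```python
import json
import urllib.request

STATUS_URL = "http://localhost:11435/fleet/status"

def fetch_status(url=STATUS_URL, timeout=5):
    """GET /fleet/status and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def nodes_needing_attention(status, mem_ratio_limit=0.9):
    """Return (node_id, reason) pairs.

    ASSUMPTIONS: a healthy node reports status "online" and pressure "normal";
    0.9 is an example memory threshold, not an ollama-herd default.
    """
    flagged = []
    for n in status.get("nodes", []):
        mem = n.get("memory", {})
        used = mem.get("used_gb") or 0
        total = mem.get("total_gb") or 0
        if n.get("status") != "online":
            flagged.append((n["node_id"], f"status={n.get('status')}"))
        elif total and used / total > mem_ratio_limit:
            flagged.append((n["node_id"], f"memory {used:.0f}/{total:.0f}GB"))
        elif mem.get("pressure", "normal") != "normal":
            flagged.append((n["node_id"], f"pressure={mem['pressure']}"))
    return flagged
```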
DevOps Request Tracing

Every DevOps routing decision is recorded with full observability context.

Recent DevOps traces

devops_traces=$(curl -s "http://localhost:11435/dashboard/api/traces?limit=20")
echo "$devops_traces" | python3 -m json.tool

Each DevOps trace includes: request_id, model, original_model (before fallback), node_id, score, scores_breakdown (all 7 signals), status, latency_ms, time_to_first_token_ms, prompt_tokens, completion_tokens, retry_count, fallback_used, tags.
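Given a list of trace objects with the fields listed above, a short aggregation gives a quick read on failures, fallbacks, and retries. The sketch assumes the caller has already pulled the list out of the /dashboard/api/traces response, whose exact envelope is not documented here. summarize_traces is a hypothetical helper.

```python
from collections import Counter

def summarize_traces(traces):
    """Aggregate trace dicts by status, fallback use, and retries.

    Uses only fields named in this skill: request_id, status, fallback_used, retry_count.
    """
    statuses = Counter(t.get("status") for t in traces)
    fallbacks = sum(1 for t in traces if t.get("fallback_used"))
    retried = [t["request_id"] for t in traces if (t.get("retry_count") or 0) > 0]
    return {"by_status": dict(statuses), "fallbacks": fallbacks, "retried": retried}
```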

DevOps failure investigation

# Recent DevOps failures with error details
sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, node_id, error_message, latency_ms/1000.0 as secs, datetime(timestamp, 'unixepoch', 'localtime') as time FROM request_traces WHERE status='failed' ORDER BY timestamp DESC LIMIT 20"

# DevOps retry frequency — which nodes need attention?
sqlite3 ~/.fleet-manager/latency.db "SELECT node_id, SUM(retry_count) as retries, COUNT(*) as total, ROUND(100.0 * SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END) / COUNT(*), 1) as fail_pct FROM request_traces GROUP BY node_id ORDER BY fail_pct DESC"

# DevOps fallback frequency — which models are unreliable?
sqlite3 ~/.fleet-manager/latency.db "SELECT original_model, model as fell_back_to, COUNT(*) as n FROM request_traces WHERE fallback_used=1 GROUP BY original_model, model ORDER BY n DESC"
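The retry and fail_pct queries above scan all history. When only recent behavior matters, a parameterized time window is easy to add from Python. The sketch assumes timestamp stores Unix seconds, which is how the datetime(timestamp, 'unixepoch') calls above already treat it. failure_rates is a hypothetical helper name.

```python
import sqlite3
import time

FAIL_RATE_SQL = """
SELECT node_id,
       COUNT(*) AS total,
       ROUND(100.0 * SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) / COUNT(*), 1) AS fail_pct
FROM request_traces
WHERE timestamp >= ?
GROUP BY node_id
ORDER BY fail_pct DESC
"""

def failure_rates(conn, hours=24, now=None):
    """Per-node (node_id, total, fail_pct) over the last `hours`, worst first."""
    cutoff = (now if now is not None else time.time()) - hours * 3600
    return conn.execute(FAIL_RATE_SQL, (cutoff,)).fetchall()
```

For example, failure_rates(sqlite3.connect(path), hours=6) limits the breakdown to the last six hours.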

DevOps Latency Analysis

# DevOps P50/P75/P99 latency by model
sqlite3 ~/.fleet-manager/latency.db "
WITH ranked AS (
  SELECT model, latency_ms,
    PERCENT_RANK() OVER (PARTITION BY model ORDER BY latency_ms) as pct
  FROM request_traces WHERE status='completed'
)
SELECT model,
  ROUND(MIN(CASE WHEN pct >= 0.5 THEN latency_ms END)/1000.0, 1) as p50_s,
  ROUND(MIN(CASE WHEN pct >= 0.75 THEN latency_ms END)/1000.0, 1) as p75_s,
  ROUND(MIN(CASE WHEN pct >= 0.99 THEN latency_ms END)/1000.0, 1) as p99_s,
  COUNT(*) as n
FROM ranked GROUP BY model HAVING n > 10 ORDER BY p75_s DESC
"

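PERCENT_RANK() gives tied values the rank of their first occurrence. As a result, MIN(CASE WHEN pct >= p ...) can skip past a heavily repeated latency to the next distinct value. The sketch below re-implements exactly that rule in Python and checks it against SQLite's own window function on an in-memory table. Window functions need SQLite 3.25 or newer. Both helper names are illustrative.

```python
import sqlite3
from bisect import bisect_left

def sql_style_percentile(values, p):
    """Smallest value whose PERCENT_RANK() is >= p, as in the query above."""
    s = sorted(values)
    n = len(s)
    for v in s:
        # PERCENT_RANK = (rank - 1) / (n - 1); ties share the rank of their first occurrence.
        pct = bisect_left(s, v) / (n - 1) if n > 1 else 0.0
        if pct >= p:
            return v
    return None

def sqlite_percentile(values, p):
    """Same computation done by SQLite, for cross-checking."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (latency_ms REAL)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        "WITH ranked AS (SELECT latency_ms, PERCENT_RANK() OVER (ORDER BY latency_ms) AS pct FROM t) "
        "SELECT MIN(CASE WHEN pct >= ? THEN latency_ms END) FROM ranked",
        (p,),
    ).fetchone()
    conn.close()
    return row[0]
```

With the values [1, 2, 2, 2, 3], the reported p50 is 3 rather than 2. All three 2s share a rank of 0.25.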
# DevOps time-to-first-token observability (cold load detection)
sqlite3 ~/.fleet-manager/latency.db "SELECT node_id, model, ROUND(AVG(time_to_first_token_ms), 0) as avg_ttft_ms, ROUND(MAX(time_to_first_token_ms), 0) as max_ttft_ms, COUNT(*) as n FROM request_traces WHERE time_to_first_token_ms IS NOT NULL GROUP BY node_id, model HAVING n > 5 ORDER BY avg_ttft_ms DESC"

# DevOps outlier detection — slowest requests
sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, node_id, ROUND(latency_ms/1000.0, 1) as secs, prompt_tokens, completion_tokens, retry_count, datetime(timestamp, 'unixepoch', 'localtime') as time FROM request_traces WHERE status='completed' ORDER BY latency_ms DESC LIMIT 10"
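Averages hide individual cold loads. One way to flag them is to compare each request's time to first token with the median for the same node and model. The 5x factor is an arbitrary example threshold, not something ollama-herd defines. The input rows come from a query like the one in the docstring.

```python
from collections import defaultdict
from statistics import median

def cold_load_suspects(rows, factor=5.0):
    """Flag requests whose TTFT exceeds `factor` x the median for their node/model.

    rows: (request_id, node_id, model, ttft_ms) tuples, e.g. from
      SELECT request_id, node_id, model, time_to_first_token_ms
      FROM request_traces WHERE time_to_first_token_ms IS NOT NULL
    """
    groups = defaultdict(list)
    for req_id, node, model, ttft in rows:
        groups[(node, model)].append((req_id, ttft))
    suspects = []
    for (node, model), items in groups.items():
        med = median(t for _, t in items)
        suspects.extend((req_id, node, model, t) for req_id, t in items if t > factor * med)
    return suspects
```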

DevOps Per-Application Analytics

Tag requests to track DevOps usage per application, team, or environment.

DevOps request tagging

# DevOps tag via request body
curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello"}],"metadata":{"tags":["devops-prod","devops-code-review"]}}'

# DevOps tag via header
curl -s -H "X-Herd-Tags: devops-prod, devops-code-review" \
  -H "Content-Type: application/json" \
  http://localhost:11435/v1/chat/completions \
  -d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello"}]}'
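The same tagged request can be built from Python with only the standard library. This sketch mirrors the request-body variant, with tags under metadata.tags, and sets Content-Type explicitly. It builds the request without sending it, so it can be inspected first. build_tagged_request is a hypothetical helper.

```python
import json
import urllib.request

def build_tagged_request(model, prompt, tags, base_url="http://localhost:11435"):
    """Build (without sending) a chat completion request tagged via metadata.tags."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "metadata": {"tags": list(tags)},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen(req, timeout=120) and decode the response with json.load.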

DevOps per-tag dashboards

curl -s http://localhost:11435/dashboard/api/apps | python3 -m json.tool
curl -s http://localhost:11435/dashboard/api/apps/daily | python3 -m json.tool

DevOps token consumption by tag

sqlite3 ~/.fleet-manager/latency.db "SELECT j.value as devops_tag, COUNT(*) as requests, SUM(COALESCE(prompt_tokens,0)) as prompt_tok, SUM(COALESCE(completion_tokens,0)) as completion_tok, SUM(COALESCE(prompt_tokens,0)+COALESCE(completion_tokens,0)) as total_tok FROM request_traces, json_each(tags) j WHERE tags IS NOT NULL GROUP BY j.value ORDER BY total_tok DESC"
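Because json_each expands each tag into its own row, a request carrying two tags counts under both. Per-tag totals can therefore add up to more than the fleet's real token usage. The Python equivalent makes that explicit. It assumes the tags column holds a JSON array as text, which is what the json_each(tags) query implies.

```python
import json
from collections import defaultdict

def tokens_by_tag(rows):
    """Aggregate (tags_json, prompt_tokens, completion_tokens) rows per tag.

    A request with several tags is counted once under each tag, as json_each does.
    Result is sorted by total tokens, highest first.
    """
    totals = defaultdict(lambda: {"requests": 0, "prompt": 0, "completion": 0})
    for tags_json, prompt, completion in rows:
        for tag in json.loads(tags_json):
            t = totals[tag]
            t["requests"] += 1
            t["prompt"] += prompt or 0          # NULL tokens count as 0, like COALESCE
            t["completion"] += completion or 0
    return sorted(totals.items(), key=lambda kv: kv[1]["prompt"] + kv[1]["completion"], reverse=True)
```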

DevOps Traffic Patterns

# DevOps requests per hour (find peak load times)
sqlite3 ~/.fleet-manager/latency.db "SELECT CAST((timestamp % 86400) / 3600 AS INTEGER) as hour_utc, COUNT(*) as requests, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs FROM request_traces GROUP BY hour_utc ORDER BY hour_utc"

# DevOps daily request volume
sqlite3 ~/.fleet-manager/latency.db "SELECT date(timestamp, 'unixepoch') as day, COUNT(*) as requests, SUM(COALESCE(prompt_tokens,0)+COALESCE(completion_tokens,0)) as tokens FROM request_traces GROUP BY day ORDER BY day DESC LIMIT 14"
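The hourly query buckets by UTC because timestamp % 86400 ignores time zones. The sketch below does the same arithmetic in Python and adds a local-time variant for comparison with the 'localtime' modifier used elsewhere. It assumes Unix-second timestamps.

```python
from collections import Counter
from datetime import datetime

def hour_utc(ts):
    """UTC hour, matching CAST((timestamp % 86400) / 3600 AS INTEGER) for non-negative ts."""
    return int(ts % 86400) // 3600

def hour_local(ts):
    """Local-time hour, comparable to strftime('%H', timestamp, 'unixepoch', 'localtime')."""
    return datetime.fromtimestamp(ts).hour

def peak_hours(timestamps, top=3, bucket=hour_utc):
    """Most common hours of day as (hour, request_count) pairs."""
    return Counter(bucket(ts) for ts in timestamps).most_common(top)
```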

DevOps Capacity Planning

DevOps model recommendations per node

devops_recommendations=$(curl -s http://localhost:11435/dashboard/api/recommendations)
echo "$devops_recommendations" | python3 -m json.tool

Returns DevOps recommendations based on hardware capabilities, current usage, and curated benchmark data. Use for DevOps capacity planning: which models fit on which machines, and what's the optimal mix.

DevOps usage statistics

curl -s http://localhost:11435/dashboard/api/usage | python3 -m json.tool

DevOps Configuration

# View all DevOps settings
curl -s http://localhost:11435/dashboard/api/settings | python3 -m json.tool

# Toggle DevOps runtime settings
curl -s -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"auto_pull": false}'
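Before toggling a setting, it helps to show the user exactly what would change. This sketch assumes the settings GET returns a flat JSON object. auto_pull is the only key this skill documents, and the helper names are hypothetical. The sketch only builds the POST and leaves the decision to send it to the user.

```python
import json
import urllib.request

SETTINGS_URL = "http://localhost:11435/dashboard/api/settings"

def fetch_settings(url=SETTINGS_URL, timeout=5):
    """GET current settings as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def settings_diff(current, changes):
    """{key: (current_value, requested_value)} for keys that would actually change.

    ASSUMPTION: settings are a flat JSON object; nested values would need a deeper compare.
    """
    return {k: (current.get(k), v) for k, v in changes.items() if current.get(k) != v}

def build_settings_update(changes, url=SETTINGS_URL):
    """Same POST as the curl example; send with urlopen only after the user confirms."""
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```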

DevOps Log Analysis

Structured JSONL logs at ~/.fleet-manager/logs/herd.jsonl — the DevOps log layer:

# Recent DevOps errors
grep '"level": "ERROR"' ~/.fleet-manager/logs/herd.jsonl | tail -10 | python3 -m json.tool --json-lines

# DevOps context protection events
grep "Context protection" ~/.fleet-manager/logs/herd.jsonl | tail -10

# DevOps stream errors
grep "Stream error" ~/.fleet-manager/logs/herd.jsonl | tail -10
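The ERROR grep above matches "level": "ERROR" only when the log writer puts exactly one space after the colon. Parsing each line as JSON avoids that dependency. The sketch assumes each JSONL record is an object with a "level" field, as the grep implies. Lines that fail to parse are skipped, and recent_log_entries is a hypothetical helper.

```python
import json
from collections import deque
from pathlib import Path

LOG_PATH = Path.home() / ".fleet-manager" / "logs" / "herd.jsonl"

def recent_log_entries(path=LOG_PATH, level="ERROR", contains=None, limit=10):
    """Last `limit` records matching `level` (None = any) and a raw-text substring filter."""
    recent = deque(maxlen=limit)  # keeps only the newest matches, like tail
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or (contains and contains not in line):
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written or non-JSON lines
            if not isinstance(record, dict):
                continue
            if level is None or record.get("level") == level:
                recent.append(record)
    return list(recent)
```

For example, recent_log_entries(level=None, contains="Context protection") replaces the context-protection grep.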

DevOps Dashboard

Web dashboard at http://localhost:11435/dashboard. Key DevOps tabs:

Trends — DevOps requests/hour, latency, token throughput over 24h–7d
Apps — DevOps per-tag analytics with daily breakdowns
Health — automated DevOps health checks with severity and recommendations
Model Insights — per-model DevOps latency and throughput comparison

Guardrails

Never restart DevOps services without explicit user confirmation.
Never delete or modify ~/.fleet-manager/ contents.
Do not pull or delete models without user confirmation.
Report DevOps issues to the user rather than attempting automated fixes.
If the router isn't running, suggest herd or uv run herd.

Security Recommendations

This skill appears to be an observability helper for a local Ollama Herd deployment and largely does what it says, but take these precautions before installing or running it:

- Verify the package/repo: SKILL.md points to a PyPI package and GitHub repo; inspect the repository and PyPI package (code, maintainer, recent activity) before running pip install.
- Use isolation: install ollama-herd inside a dedicated virtualenv, container, or sandbox so install-time scripts can't affect your primary environment.
- Audit network exposure: the daemon listens on http://localhost:11435; confirm this port and service are acceptable for your environment and restrict access if necessary.
- Inspect local data: the skill reads and writes ~/.fleet-manager/latency.db and logs; ensure those files don't contain sensitive data you don't want read or exposed, and check file permissions.
- Confirm registry mismatch: the registry metadata provided to the platform lists no required binaries/config paths, but SKILL.md does; treat SKILL.md as authoritative and be wary of incomplete registry declarations.

If you want to proceed safely: clone the GitHub repo, review the code, run the service in a container/VM, and avoid running pip install globally without inspection.

Capability Assessment

ℹ Purpose & Capability

The SKILL.md describes an Ollama Herd observability tool and the actions it requires (curl, sqlite3, pip/ollama-herd, reading ~/.fleet-manager/latency.db and logs) are appropriate for that purpose. However the registry-level metadata supplied with the skill lists no required binaries/config paths while the embedded SKILL.md metadata explicitly requires curl and sqlite3 and lists configPaths — this mismatch is an inconsistency to be aware of.

ℹ Instruction Scope

Instructions are focused on running an observability daemon (herd), querying local HTTP endpoints (http://localhost:11435) via curl, and reading a local SQLite DB and JSONL logs. These actions are within the stated scope, but the skill explicitly tells the operator to pip install a third-party package (ollama-herd) and to run background services that will listen on localhost port 11435 — both of which have operational and security implications (package install runs arbitrary install-time code, running a local daemon increases attack surface).

⚠ Install Mechanism

The registry contains no formal install spec, yet the SKILL.md instructs 'pip install ollama-herd'. Installing from PyPI is a moderate-risk install mechanism because package install scripts may execute arbitrary code; the registry provides no pinned version, checksums, or verification guidance. Recommend verifying the package and repository before installing and using a virtualenv or sandbox.

✓ Credentials

The skill does not request any environment variables or external credentials and only references local config paths (~/.fleet-manager/*). This is proportionate to the described observability purpose. There are no requests for unrelated secrets or cloud credentials.

ℹ Persistence & Privilege

The skill is not marked always:true and does not request elevated platform privileges, which is good. However, it instructs running persistent components (herd and herd-node) that create a local observability service listening on a fixed localhost port (11435) and create/read ~/.fleet-manager artifacts; these imply a persistent presence and a local network endpoint you should validate and monitor.

Version History

v1.2.1

Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.

v1.2.0

- Updated description and documentation to emphasize "DevOps" branding throughout.
- Added multilingual keywords (Chinese and Spanish) in the description for broader reach.
- Changed version from 1.0.0 to 1.0.1.
- No functional changes to code or CLI; only text and instructional updates.
- All usage examples, headings, and comments now highlight DevOps observability and analytics.

v1.1.0

- Metadata updated: The "os" key in the "metadata.openclaw.requires" section is now correctly nested within "requires".
- No functional or user-facing changes; content and commands remain the same.

v1.0.1

- Added metadata entries for optional binaries (`python3`, `pip`) and config paths in SKILL.md.
- No functional or code changes, documentation update only.
- Improves system introspection and integration by clarifying dependencies and config file locations.

v1.0.0

- Initial release of ai-devops-toolkit, providing operational tooling for teams running local LLM infrastructure.
- Features request tracing with full scoring breakdowns and per-application usage analytics via request tagging.
- Includes automated health checks with severity levels, latency percentile tracking, error rate monitoring, and capacity planning via model recommendations.
- Uses a SQLite-backed observability stack; no Prometheus, Grafana, or other external dependencies are required.
- Supports analytics and troubleshooting via command-line queries and API endpoints, all focused on production-quality monitoring for Ollama Herd deployments.

Metadata

Slug ai-devops-toolkit

Version 1.2.1

License MIT-0

Total installs 3

Current installs 3

Versions published 5

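FAQ

What is Ai Devops Toolkit?

Operational tooling for teams running local LLM infrastructure. Request tracing with full scoring breakdowns, per-application usage analytics via request tag... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 317 times so far.

How do I install Ai Devops Toolkit?

Run the command /install ai-devops-toolkit in an OpenClaw or Claude Code chat to install it in one step. The skill itself needs no extra configuration, although the ollama-herd prerequisites listed above still apply.

Is Ai Devops Toolkit free?

Yes. Ai Devops Toolkit is completely free under the MIT-0 license and may be freely downloaded, installed, and used.

Which platforms does Ai Devops Toolkit support?

Ai Devops Toolkit is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ai Devops Toolkit?

It is developed and maintained by Twin Geeks (@twinsgeeks); the current version is v1.2.1.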
