Chapter 55: Monitoring and Observability Plugins: Tracking Every Claude Decision
55.1 Why Claude Needs Observability
Deploying Claude in production presents a unique challenge: Claude's reasoning process is a black box. When Claude makes an incorrect decision (calling the wrong tool, generating a low-quality response, or consuming more tokens than expected), without observability infrastructure you can only see the input and output. Everything that happened in between is invisible.
Observability is a concept from control theory: the degree to which you can infer a system's internal state from its external outputs. For Claude, the core questions an observability Plugin must answer are:
- What: Which decisions did Claude make? Which tools were called?
- Why: Why did Claude choose this tool rather than another?
- How long: How much time did each step take? Where are the bottlenecks?
- How much: How many tokens were consumed? What was the cost?
- What went wrong: Which step failed? What was the root cause?
55.2 The Three Pillars of Observability
Following the OpenTelemetry design philosophy, Claude's observability encompasses three dimensions:
Metrics
Time-series data reflecting aggregated system state:
claude.session.count      New sessions per minute
claude.tool.calls_total   Total tool calls (grouped by tool name)
claude.tool.latency_p99   Tool call P99 latency
claude.tokens.input       Cumulative input tokens
claude.tokens.output      Cumulative output tokens
claude.cost.usd           API call cost in USD
claude.error.rate         Tool call failure rate
claude.response.latency   Time to first token
Logs
Structured event records capturing individual operation details:
{
  "timestamp": "2026-04-28T10:23:45.123Z",
  "level": "INFO",
  "event": "tool_call",
  "sessionId": "sess_abc123",
  "toolName": "query_database",
  "input": { "sql": "SELECT COUNT(*) FROM users WHERE...", "limit": 100 },
  "output": { "rows": 1, "data": [{ "count": 42891 }] },
  "latencyMs": 234,
  "inputTokens": 1847,
  "outputTokens": 312
}
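A record like the one above can be given a type and written as one JSON object per line, which keeps the log easy to parse. The following is only a sketch: the `ToolCallLogRecord` interface and the `truncate` and `toLogLine` helpers are hypothetical names, not part of any SDK, and the 200-character truncation limit is an arbitrary assumption.

```typescript
// Hypothetical record shape mirroring the sample log entry above.
interface ToolCallLogRecord {
  timestamp: string;
  level: "DEBUG" | "INFO" | "WARN" | "ERROR";
  event: "tool_call";
  sessionId: string;
  toolName: string;
  input: unknown;
  output: unknown;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
}

// Recursively shorten long strings so one oversized payload cannot bloat the log.
function truncate(value: unknown, maxLen: number = 200): unknown {
  if (typeof value === "string") {
    return value.length > maxLen ? value.slice(0, maxLen) + "..." : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => truncate(item, maxLen));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      result[key] = truncate(source[key], maxLen);
    }
    return result;
  }
  return value;
}

// Serialize a record as a single JSON line (no embedded newlines).
function toLogLine(record: ToolCallLogRecord): string {
  return JSON.stringify({
    ...record,
    input: truncate(record.input),
    output: truncate(record.output),
  });
}
```

Because `JSON.stringify` escapes newlines inside strings, each call to `toLogLine` yields exactly one line, so the file can be processed record by record.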
Traces
Complete execution chains across multiple tool calls, capturing causal relationships:
Session sess_abc123 [3.4s]
├─ LLM Inference #1 [1.2s] → decision: call query_database
│  └─ tool: query_database [234ms]
├─ LLM Inference #2 [0.8s] → decision: call analyze_results
│  └─ tool: analyze_results [412ms]
└─ LLM Inference #3 [1.1s] → generate final response
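A trace like this is a tree that can be rebuilt from a flat list of spans, as long as each span records its parent. The sketch below illustrates the idea under simplifying assumptions: `FlatSpan` and `renderTrace` are hypothetical names, durations are precomputed, and a span whose parent is missing from the batch is treated as a root.

```typescript
// Simplified, hypothetical span shape used only for this illustration.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  operation: string;
  durationMs: number;
}

// Render spans as an indented tree; children appear under their parent in input order.
function renderTrace(spans: FlatSpan[]): string[] {
  const known: Record<string, boolean> = {};
  for (const span of spans) known[span.spanId] = true;

  const children: Record<string, FlatSpan[]> = {};
  const roots: FlatSpan[] = [];
  for (const span of spans) {
    const parent = span.parentSpanId;
    if (parent !== undefined && known[parent]) {
      (children[parent] = children[parent] || []).push(span);
    } else {
      roots.push(span); // no parent, or parent not present in this batch
    }
  }

  const lines: string[] = [];
  const visit = (span: FlatSpan, depth: number): void => {
    let indent = "";
    for (let i = 0; i < depth; i++) indent += "  ";
    lines.push(indent + span.operation + " [" + span.durationMs + "ms]");
    for (const child of children[span.spanId] || []) visit(child, depth + 1);
  };
  for (const root of roots) visit(root, 0);
  return lines;
}
```

The parent link is what carries the causal relationship: without it, the same spans would only be a list of timed events.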
55.3 Monitoring Plugin Architecture
monitoring-plugin/
├── plugin.json
├── hooks/
│   ├── pre-tool.ts         # record tool call start
│   ├── post-tool.ts        # record tool call end, compute latency
│   ├── pre-response.ts     # record inference start
│   └── post-response.ts    # record inference end, token usage
├── monitor/
│   ├── session-tracker.ts  # span and session management
│   └── metrics.ts          # metric collection and buffering
└── exporters/
    ├── opentelemetry.ts
    ├── datadog.ts
    └── file.ts
55.4 Implementing the Monitoring Plugin
plugin.json
{
  "name": "claude-observability",
  "version": "1.0.0",
  "description": "Full observability plugin: metrics, logs, and traces",
  "config": {
    "schema": {
      "exportTarget": {
        "type": "string",
        "enum": ["console", "file", "opentelemetry", "datadog"],
        "default": "console"
      },
      "otlpEndpoint": {
        "type": "string",
        "default": "http://localhost:4318"
      },
      "datadogApiKey": {
        "type": "string",
        "secret": true
      },
      "samplingRate": {
        "type": "number",
        "default": 1.0,
        "minimum": 0,
        "maximum": 1
      }
    }
  },
  "hooks": {
    "preToolCall": "./dist/hooks/pre-tool.js",
    "postToolCall": "./dist/hooks/post-tool.js",
    "preResponse": "./dist/hooks/pre-response.js",
    "postResponse": "./dist/hooks/post-response.js",
    "sessionStart": "./dist/hooks/session-start.js",
    "sessionEnd": "./dist/hooks/session-end.js"
  },
  "monitor": {
    "collector": "./dist/monitor/collector.js",
    "sampling": 1.0
  }
}
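The `samplingRate` setting implies a keep-or-drop decision for telemetry, but how that decision is made is not shown in this chapter. One common approach, sketched below as an assumption, is to decide deterministically from the session ID so that every hook in a session reaches the same answer. `fnv1a` and `shouldSample` are hypothetical helpers, and the FNV-1a hash is just one reasonable choice.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units. Non-cryptographic;
// used here only to map an ID to a stable number.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Deterministic decision: the same sessionId always yields the same answer,
// so all hooks within a session are recorded or skipped together.
function shouldSample(sessionId: string, samplingRate: number): boolean {
  if (samplingRate >= 1) return true;
  if (samplingRate <= 0) return false;
  return fnv1a(sessionId) / 0x100000000 < samplingRate;
}
```

A per-call `Math.random()` check would be simpler, but it could record the start of a tool call and drop its end, leaving half-open spans.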
Session Tracker
// monitor/session-tracker.ts
import { randomUUID } from "crypto";

interface SpanContext {
  spanId: string;
  parentSpanId?: string;
  operation: string;
  startTime: Date;
  endTime?: Date;
  attributes: Record<string, string | number | boolean>;
  events: Array<{ timestamp: Date; name: string; attributes?: Record<string, unknown> }>;
  status: "ok" | "error" | "running";
  errorMessage?: string;
}

interface SessionContext {
  sessionId: string;
  userId?: string;
  model: string;
  startTime: Date;
  spans: SpanContext[];
}

class SessionTracker {
  private sessions = new Map<string, SessionContext>();

  startSession(sessionId: string, model: string, userId?: string): SessionContext {
    const session: SessionContext = {
      sessionId, userId, model, startTime: new Date(), spans: [],
    };
    this.sessions.set(sessionId, session);
    return session;
  }

  endSession(sessionId: string): SessionContext | undefined {
    const session = this.sessions.get(sessionId);
    this.sessions.delete(sessionId);
    return session;
  }

  startSpan(sessionId: string, operation: string, parentSpanId?: string): SpanContext {
    const span: SpanContext = {
      spanId: randomUUID(),
      parentSpanId,
      operation,
      startTime: new Date(),
      attributes: {},
      events: [],
      status: "running",
    };
    this.sessions.get(sessionId)?.spans.push(span);
    return span;
  }

  endSpan(span: SpanContext, status: "ok" | "error", attrs?: Record<string, string | number | boolean>): void {
    span.endTime = new Date();
    span.status = status;
    if (attrs) Object.assign(span.attributes, attrs);
  }
}

export const sessionTracker = new SessionTracker();
Pre-Tool Hook
// hooks/pre-tool.ts
import type { PreToolCallHook, HookContext } from "@claude/plugin-sdk";
import { sessionTracker } from "../monitor/session-tracker.js";

export const activeSpans = new Map<string, ReturnType<typeof sessionTracker.startSpan>>();

export const preToolCall: PreToolCallHook = async (toolName, toolInput, context) => {
  const span = sessionTracker.startSpan(
    context.session.id,
    `tool:${toolName}`,
    context.currentSpanId
  );
  span.attributes.toolName = toolName;
  span.attributes.inputSize = JSON.stringify(toolInput).length;
  span.events.push({
    timestamp: new Date(),
    name: "tool.input",
    attributes: { toolName, inputKeys: Object.keys(toolInput).join(",") },
  });
  activeSpans.set(`${context.session.id}:${span.spanId}`, span);
  context.setMetadata("currentSpanId", span.spanId);
  return { action: "allow" };
};
Post-Tool Hook
// hooks/post-tool.ts
import type { PostToolCallHook } from "@claude/plugin-sdk";
import { sessionTracker } from "../monitor/session-tracker.js";
import { activeSpans } from "./pre-tool.js";
import { metricsCollector } from "../monitor/metrics.js";

export const postToolCall: PostToolCallHook = async (toolName, _input, toolResult, context) => {
  const spanId = context.getMetadata("currentSpanId") as string;
  if (!spanId) return { action: "allow" };
  const span = activeSpans.get(`${context.session.id}:${spanId}`);
  if (!span) return { action: "allow" };

  const latencyMs = Date.now() - span.startTime.getTime();
  const success = !(toolResult as Record<string, unknown>)?.isError;
  sessionTracker.endSpan(span, success ? "ok" : "error", {
    latencyMs, success: String(success),
  });
  activeSpans.delete(`${context.session.id}:${spanId}`);

  metricsCollector.increment("claude.tool.calls_total", { tool: toolName, success: String(success) });
  metricsCollector.histogram("claude.tool.latency_ms", latencyMs, { tool: toolName });
  if (!success) {
    metricsCollector.increment("claude.tool.errors_total", { tool: toolName });
  }
  return { action: "allow" };
};
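The hooks import `metricsCollector` from `monitor/metrics.ts`, but that module is not listed in this chapter. The following in-memory sketch shows one shape it could take, supporting the `increment`, `add`, and `histogram` calls the hooks make. The class name, the `counterValue` and `percentile` helpers, and the choice to keep raw samples are all assumptions made for illustration; a real collector would buffer and export rather than hold everything in memory.

```typescript
type Labels = Record<string, string>;

// Build a stable series key so label order does not create duplicate series.
function seriesKey(name: string, labels: Labels = {}): string {
  const parts = Object.keys(labels).sort().map((k) => k + "=" + labels[k]);
  return name + "{" + parts.join(",") + "}";
}

class InMemoryMetricsCollector {
  private counters: Record<string, number> = {};
  private histograms: Record<string, number[]> = {};

  // Add an arbitrary amount to a counter (for example tokens or cost).
  add(name: string, value: number, labels?: Labels): void {
    const key = seriesKey(name, labels);
    this.counters[key] = (this.counters[key] || 0) + value;
  }

  // Count a single occurrence.
  increment(name: string, labels?: Labels): void {
    this.add(name, 1, labels);
  }

  // Record one observation. Raw samples are kept here; an exporter would bucket them.
  histogram(name: string, value: number, labels?: Labels): void {
    const key = seriesKey(name, labels);
    (this.histograms[key] = this.histograms[key] || []).push(value);
  }

  counterValue(name: string, labels?: Labels): number {
    return this.counters[seriesKey(name, labels)] || 0;
  }

  // Nearest-rank percentile over recorded samples; undefined when there are none.
  percentile(name: string, p: number, labels?: Labels): number | undefined {
    const samples = (this.histograms[seriesKey(name, labels)] || [])
      .slice()
      .sort((a, b) => a - b);
    if (samples.length === 0) return undefined;
    const rank = Math.max(1, Math.ceil((p / 100) * samples.length));
    return samples[rank - 1];
  }
}

const metricsCollector = new InMemoryMetricsCollector();
```

Keying each series by metric name plus sorted labels is what lets a backend later group by `tool` or `model`.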
Post-Response Hook: Token Cost Tracking
// hooks/post-response.ts
import type { PostResponseHook } from "@claude/plugin-sdk";
import { metricsCollector } from "../monitor/metrics.js";

// Prices in USD per one million tokens, keyed by model name.
const PRICING_USD_PER_1M: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "claude-3-5-haiku": { input: 0.8, output: 4.0 },
  "claude-3-opus": { input: 15.0, output: 75.0 },
};

export const postResponse: PostResponseHook = async (_response, context) => {
  const usage = context.usage;
  const model = context.session.model;
  if (!usage) return { action: "allow" };

  // Exact-match lookup: a model missing from the table is recorded with a cost of 0.
  const pricing = PRICING_USD_PER_1M[model];
  const costUsd = pricing
    ? (usage.inputTokens / 1_000_000) * pricing.input +
      (usage.outputTokens / 1_000_000) * pricing.output
    : 0;

  metricsCollector.add("claude.tokens.input", usage.inputTokens, { model });
  metricsCollector.add("claude.tokens.output", usage.outputTokens, { model });
  metricsCollector.add("claude.cost.usd", costUsd, { model });
  return { action: "allow" };
};
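The pricing lookup in this hook is an exact match on the model name, so any model string that is not a key in the table is costed at 0. If the runtime reports suffixed identifiers such as `claude-3-5-sonnet-20241022` (an assumption; the format of `context.session.model` is not specified here), a longest-prefix match is one way to avoid silently under-reporting spend. `findPricing` and `estimateCostUsd` are hypothetical helpers, and the prices are copied from the table above.

```typescript
interface Price {
  input: number;  // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

const PRICING_USD_PER_1M: Record<string, Price> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "claude-3-5-haiku": { input: 0.8, output: 4.0 },
  "claude-3-opus": { input: 15.0, output: 75.0 },
};

// Return the entry whose key is the longest prefix of the model ID, if any.
function findPricing(model: string): Price | undefined {
  let bestKey: string | undefined;
  for (const key of Object.keys(PRICING_USD_PER_1M)) {
    const isPrefix = model.indexOf(key) === 0;
    if (isPrefix && (bestKey === undefined || key.length > bestKey.length)) {
      bestKey = key;
    }
  }
  return bestKey === undefined ? undefined : PRICING_USD_PER_1M[bestKey];
}

// Returns undefined (not 0) for unknown models so the caller can count the gap.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number | undefined {
  const price = findPricing(model);
  if (!price) return undefined;
  return (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output;
}
```

As a worked example, the sample log record earlier (1,847 input tokens and 312 output tokens) on `claude-3-5-sonnet` comes to 0.005541 + 0.00468, or about $0.0102.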
55.5 OpenTelemetry Export
// exporters/opentelemetry.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";

export function createOpenTelemetrySDK(endpoint: string, serviceName: string) {
  return new NodeSDK({
    traceExporter: new OTLPTraceExporter({ url: `${endpoint}/v1/traces` }),
    metricReader: new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({ url: `${endpoint}/v1/metrics` }),
      exportIntervalMillis: 10_000,
    }),
    serviceName,
  });
}
With OpenTelemetry, the collected data flows to any compatible backend: Grafana Tempo + Prometheus for self-hosted setups, Datadog APM, AWS X-Ray, or Jaeger. The choice of backend does not require any code changes in the Plugin; only the OTLP endpoint configuration differs.
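Since the endpoint is the only thing that changes between backends, it helps to resolve and normalize it in one place. The sketch below is illustrative: `resolveEndpoint` and `otlpUrls` are hypothetical helpers, and the precedence (plugin setting, then the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable, then the default from `plugin.json`) is an assumption. The environment is passed in as a plain object so the functions stay easy to test.

```typescript
const DEFAULT_OTLP_ENDPOINT = "http://localhost:4318"; // default from plugin.json

// Pick the endpoint: explicit setting first, then the environment, then the default.
function resolveEndpoint(
  configured: string | undefined,
  env: Record<string, string | undefined>
): string {
  return configured || env["OTEL_EXPORTER_OTLP_ENDPOINT"] || DEFAULT_OTLP_ENDPOINT;
}

// Build the per-signal OTLP/HTTP URLs, dropping trailing slashes so the
// result never contains "//v1/traces".
function otlpUrls(endpoint: string): { traces: string; metrics: string } {
  const base = endpoint.replace(/\/+$/, "");
  return { traces: base + "/v1/traces", metrics: base + "/v1/metrics" };
}
```

The resulting `traces` and `metrics` URLs correspond to the two `url` options passed to the exporters in `createOpenTelemetrySDK`.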
55.6 Cost Tracking Dashboard
Grafana Query Examples
# Daily cost by model (PromQL)
sum(increase(claude_cost_usd_total[24h])) by (model)
# Tool call P99 latency
histogram_quantile(0.99, sum(rate(claude_tool_latency_ms_bucket[5m])) by (le, tool))
# Error rate by tool
sum(rate(claude_tool_errors_total[1h])) by (tool)
/ sum(rate(claude_tool_calls_total[1h])) by (tool)
# Token usage trend (input tokens per hour)
sum(rate(claude_tokens_input_total[1h])) * 3600
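These queries power a real-time Claude usage dashboard that answers questions like "Which models are driving the most Claude API spend?" and "Which tool is causing the most latency?" Breaking spend down by team would additionally require a team label on the cost metric, which the hooks above do not emit.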
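The hooks record dotted metric names such as `claude.cost.usd`, while the queries use underscore names such as `claude_cost_usd_total`. The exact translation depends on the exporter and its configuration. The sketch below approximates one common convention, in which characters that are invalid in Prometheus names become underscores and counters gain a `_total` suffix if they lack one. `toPrometheusName` is a hypothetical helper, not an OpenTelemetry or Prometheus API, and it ignores the unit suffixes that some exporters append.

```typescript
type MetricKind = "counter" | "histogram" | "gauge";

// Approximate the dotted-name to Prometheus-name translation.
function toPrometheusName(name: string, kind: MetricKind): string {
  // Prometheus metric names allow letters, digits, underscores, and colons.
  let result = name.replace(/[^a-zA-Z0-9_:]/g, "_");
  // Names may not start with a digit.
  if (/^[0-9]/.test(result)) result = "_" + result;
  // Counters conventionally end in _total.
  if (kind === "counter" && !/_total$/.test(result)) result += "_total";
  return result;
}
```

Histograms are exposed in Prometheus as several series (`_bucket`, `_sum`, and `_count`), which is why the latency query reads `claude_tool_latency_ms_bucket`.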
55.7 Decision Audit Logging
In compliance scenarios, it's not enough to track metrics and traces: why Claude made a particular decision must also be auditable.
// monitor/audit-logger.ts
import { createHash } from "crypto";
import { promises as fs } from "fs";

interface AuditRecord {
  timestamp: string;
  sessionId: string;
  userId: string;
  decisionType: "tool_call" | "tool_skip" | "response_generated" | "safety_block";
  toolName?: string;
  reasoning?: string; // Summary of Claude's thinking (if extended thinking is enabled)
  inputHash: string; // Hash of input (not raw content, to protect privacy)
  outputHash?: string;
  metadata: Record<string, unknown>;
}

export class AuditLogger {
  constructor(private readonly auditLogPath: string) {}

  async logDecision(event: {
    type: AuditRecord["decisionType"];
    toolName?: string;
    thinking?: string;
    input: unknown;
    output?: unknown;
  }, session: { id: string; userId?: string; model: string; turnIndex: number }): Promise<void> {
    const record: AuditRecord = {
      timestamp: new Date().toISOString(),
      sessionId: session.id,
      userId: session.userId ?? "anonymous",
      decisionType: event.type,
      toolName: event.toolName,
      reasoning: event.thinking?.substring(0, 500),
      inputHash: this.hashContent(JSON.stringify(event.input)),
      // Compare against undefined so falsy outputs such as 0 or "" are still hashed.
      outputHash: event.output !== undefined ? this.hashContent(JSON.stringify(event.output)) : undefined,
      metadata: { model: session.model, turnIndex: session.turnIndex },
    };
    // Append one JSON line per record. appendFile alone does not make the log
    // immutable; pair it with append-only or write-once storage.
    await fs.appendFile(this.auditLogPath, JSON.stringify(record) + "\n", "utf8");
  }

  // First 16 hex characters (64 bits) of the SHA-256 digest.
  private hashContent(content: string): string {
    return createHash("sha256").update(content).digest("hex").substring(0, 16);
  }
}
The audit log stores content hashes rather than raw content. This keeps sensitive inputs out of the log in plain text while still providing a fingerprint that can be correlated with the original data if needed under legal authority. Because the hash is truncated to 16 hex characters (64 bits), it works as a correlation fingerprint rather than as collision-resistant proof.
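Correlating a stored fingerprint with original data means recomputing the hash over a candidate value and comparing. The sketch below reuses the same scheme as `hashContent` (SHA-256 over `JSON.stringify`, truncated to 16 hex characters); `fingerprint` and `matchesFingerprint` are hypothetical helper names.

```typescript
import { createHash } from "crypto";

// Same scheme as AuditLogger.hashContent: first 16 hex chars of SHA-256.
function fingerprint(content: string): string {
  return createHash("sha256").update(content).digest("hex").substring(0, 16);
}

// Check whether a candidate original value matches a stored audit fingerprint.
function matchesFingerprint(candidate: unknown, storedHash: string): boolean {
  return fingerprint(JSON.stringify(candidate)) === storedHash;
}

// Usage: the stored hash would come from an AuditRecord's inputHash field.
const storedHash = fingerprint(JSON.stringify({ sql: "SELECT 1", limit: 100 }));
const sameContent = matchesFingerprint({ sql: "SELECT 1", limit: 100 }, storedHash);
const reorderedKeys = matchesFingerprint({ limit: 100, sql: "SELECT 1" }, storedHash);
```

One design consequence is that `JSON.stringify` preserves key order, so `reorderedKeys` is false even though the two objects carry the same data. The original value therefore has to be retained in the same serialized form for correlation to succeed.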
55.8 Alerting Rules
Configure alerting to catch anomalies before they become incidents:
# alerting/rules.yaml (Prometheus alerting rule file; Alertmanager routes the resulting alerts)
groups:
  - name: claude_alerts
    rules:
      - alert: HighToolErrorRate
        expr: |
          sum(rate(claude_tool_errors_total[5m])) by (tool)
            / sum(rate(claude_tool_calls_total[5m])) by (tool) > 0.1
        for: 2m
        annotations:
          summary: "Tool {{ $labels.tool }} has error rate above 10%"
      - alert: UnexpectedCostSpike
        # rate() is per second, so multiply by 86400 to project a full day
        expr: |
          sum(rate(claude_cost_usd_total[1h])) * 86400 > 100
        for: 10m
        annotations:
          summary: "Projected daily Claude cost exceeds $100"
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum(rate(claude_tool_latency_ms_bucket[5m])) by (le, tool)
          ) > 5000
        for: 5m
        annotations:
          summary: "Tool {{ $labels.tool }} P99 latency exceeds 5 seconds"
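The ratio in `HighToolErrorRate` can be checked offline against sample counter deltas before a rule is deployed. The sketch below mirrors that expression in plain TypeScript. `toolsOverErrorThreshold` is a hypothetical helper, and the counts in the usage lines are made up for illustration.

```typescript
interface ToolCounts {
  calls: number;  // increase in the calls counter over the window
  errors: number; // increase in the errors counter over the window
}

// Return the tools whose error ratio over the window exceeds the threshold.
// Tools with zero calls are skipped: 0/0 is NaN, which never satisfies "> threshold".
function toolsOverErrorThreshold(
  countsByTool: Record<string, ToolCounts>,
  threshold: number = 0.1
): string[] {
  const firing: string[] = [];
  for (const tool of Object.keys(countsByTool).sort()) {
    const counts = countsByTool[tool];
    if (!counts || counts.calls <= 0) continue;
    if (counts.errors / counts.calls > threshold) firing.push(tool);
  }
  return firing;
}

// Usage with illustrative numbers: 11% errors fires, exactly 10% does not.
const firingTools = toolsOverErrorThreshold({
  query_database: { calls: 100, errors: 11 },
  analyze_results: { calls: 50, errors: 5 },
  idle_tool: { calls: 0, errors: 0 },
});
```

The comparison is strict, so a tool sitting at exactly the threshold does not fire, which matches the `> 0.1` in the rule.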
Summary
Monitoring and observability Plugins transform Claude's black-box decision process into something transparent and traceable. The three core pillars, metrics (aggregated state), logs (event details), and traces (causal relationships), jointly build a complete observability system. Telemetry collection via the Hooks layer is non-invasive: the core Claude Code codebase requires zero modifications. OpenTelemetry's industry-standard interface ensures seamless integration with major monitoring backends (Grafana, Datadog, Jaeger). Cost tracking and decision audit logging are indispensable compliance infrastructure in enterprise deployments. The final chapter in this Part covers deploying private Plugin registries in enterprise environments.