Chapter 55

Chapter 55: Monitoring and Observability Plugins: Tracking Every Claude Decision

55.1 Why Claude Needs Observability

Deploying Claude in production presents a unique challenge: Claude's reasoning process is a black box. When Claude makes an incorrect decision — calling the wrong tool, generating a low-quality response, consuming more tokens than expected — without observability infrastructure you can only see the input and output. Everything that happened in between is invisible.

Observability is a concept from control theory: the degree to which you can infer a system's internal state from its external outputs. For Claude, the core questions an observability Plugin must answer are:

What: Which decisions did Claude make? Which tools were called?
Why: Why did Claude choose this tool rather than another?
How long: How much time did each step take? Where are the bottlenecks?
How much: How many tokens were consumed? What was the cost?
What went wrong: Which step failed? What was the root cause?

55.2 The Three Pillars of Observability

Following the OpenTelemetry design philosophy, Claude's observability encompasses three dimensions:

Metrics

Time-series data reflecting aggregated system state:

claude.session.count          New sessions per minute
claude.tool.calls_total       Total tool calls (grouped by tool name)
claude.tool.latency_ms        Tool call latency histogram (P99 derived at query time)
claude.tokens.input           Cumulative input tokens
claude.tokens.output          Cumulative output tokens
claude.cost.usd               API call cost in USD
claude.error.rate             Tool call failure rate
claude.response.latency       Time to first token

Logs

Structured event records capturing individual operation details:

{
  "timestamp": "2026-04-28T10:23:45.123Z",
  "level": "INFO",
  "event": "tool_call",
  "sessionId": "sess_abc123",
  "toolName": "query_database",
  "input": { "sql": "SELECT COUNT(*) FROM users WHERE...", "limit": 100 },
  "output": { "rows": 1, "data": [{ "count": 42891 }] },
  "latencyMs": 234,
  "inputTokens": 1847,
  "outputTokens": 312
}

Traces

Complete execution chains across multiple tool calls, capturing causal relationships:

Session sess_abc123 [3.4s]
  ├─ LLM Inference #1 [1.2s] — decision: call query_database
  │   └─ tool: query_database [234ms]
  ├─ LLM Inference #2 [0.8s] — decision: call analyze_results
  │   └─ tool: analyze_results [412ms]
  └─ LLM Inference #3 [1.1s] — generate final response
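
A tree like the one above can be rebuilt from a flat list of spans as long as each span records its parent. The following is a minimal sketch under that assumption; TraceSpan and renderTrace are illustrative names, not part of any SDK, and the span shape is simplified.

```typescript
// Hypothetical, simplified span shape used only for this illustration.
interface TraceSpan {
  spanId: string;
  parentSpanId?: string;
  operation: string;
  durationMs: number;
}

// Render flat spans as an indented tree, with children listed under their parent.
// Spans whose parent is missing from the input are not rendered.
function renderTrace(spans: TraceSpan[]): string[] {
  const children = new Map<string | undefined, TraceSpan[]>();
  for (const span of spans) {
    const siblings = children.get(span.parentSpanId) ?? [];
    siblings.push(span);
    children.set(span.parentSpanId, siblings);
  }
  const lines: string[] = [];
  const visit = (parentId: string | undefined, depth: number): void => {
    for (const span of children.get(parentId) ?? []) {
      lines.push(`${"  ".repeat(depth)}${span.operation} [${span.durationMs}ms]`);
      visit(span.spanId, depth + 1);
    }
  };
  visit(undefined, 0); // root spans have no parentSpanId
  return lines;
}

const traceLines = renderTrace([
  { spanId: "a", operation: "LLM Inference #1", durationMs: 1200 },
  { spanId: "b", parentSpanId: "a", operation: "tool: query_database", durationMs: 234 },
  { spanId: "c", operation: "LLM Inference #2", durationMs: 800 },
]);
console.log(traceLines.join("\n"));
```

The parent link is what separates a trace from a log: without it, the three records would be unrelated events with no causal ordering.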

55.3 Monitoring Plugin Architecture

monitoring-plugin/
├── plugin.json
├── hooks/
│   ├── pre-tool.ts         ← record tool call start
│   ├── post-tool.ts        ← record tool call end, compute latency
│   ├── pre-response.ts     ← record inference start
│   ├── post-response.ts    ← record inference end, token usage
│   ├── session-start.ts    ← open a session in the tracker
│   └── session-end.ts      ← close the session
├── monitor/
│   ├── session-tracker.ts  ← span and session management
│   ├── metrics.ts          ← metric collection and buffering
│   └── audit-logger.ts     ← decision audit records
└── exporters/
    ├── opentelemetry.ts
    ├── datadog.ts
    └── file.ts

55.4 Implementing the Monitoring Plugin

plugin.json

{
  "name": "claude-observability",
  "version": "1.0.0",
  "description": "Full observability plugin: metrics, logs, and traces",
  
  "config": {
    "schema": {
      "exportTarget": {
        "type": "string",
        "enum": ["console", "file", "opentelemetry", "datadog"],
        "default": "console"
      },
      "otlpEndpoint": {
        "type": "string",
        "default": "http://localhost:4318"
      },
      "datadogApiKey": {
        "type": "string",
        "secret": true
      },
      "samplingRate": {
        "type": "number",
        "default": 1.0,
        "minimum": 0,
        "maximum": 1
      }
    }
  },
  
  "hooks": {
    "preToolCall": "./dist/hooks/pre-tool.js",
    "postToolCall": "./dist/hooks/post-tool.js",
    "preResponse": "./dist/hooks/pre-response.js",
    "postResponse": "./dist/hooks/post-response.js",
    "sessionStart": "./dist/hooks/session-start.js",
    "sessionEnd": "./dist/hooks/session-end.js"
  },
  
  "monitor": {
    "collector": "./dist/monitor/collector.js",
    "sampling": 1.0
  }
}
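
The samplingRate setting implies a keep-or-drop decision somewhere in the plugin, but the chapter does not show how it is made. One common approach is deterministic sampling keyed on the session ID, so that every span of a session is kept or dropped together. The sketch below assumes that approach; shouldSample is a hypothetical helper, not an SDK function.

```typescript
import { createHash } from "node:crypto";

// Map a session ID to a stable number in [0, 1) and compare it to the rate.
// The same session ID always yields the same decision.
function shouldSample(sessionId: string, samplingRate: number): boolean {
  if (samplingRate >= 1) return true;
  if (samplingRate <= 0) return false;
  const digest = createHash("sha256").update(sessionId).digest();
  // Interpret the first 4 bytes as an unsigned 32-bit integer, scaled into [0, 1).
  const bucket = digest.readUInt32BE(0) / 2 ** 32;
  return bucket < samplingRate;
}

console.log(shouldSample("sess_abc123", 1.0)); // true: a rate of 1.0 keeps every session
console.log(shouldSample("sess_abc123", 0.0)); // false: a rate of 0 drops every session
```

Hashing rather than calling Math.random() keeps traces whole: a random per-span decision would leave sessions with some spans recorded and others missing.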

Session Tracker

// monitor/session-tracker.ts
import { randomUUID } from "crypto";

interface SpanContext {
  spanId: string;
  parentSpanId?: string;
  operation: string;
  startTime: Date;
  endTime?: Date;
  attributes: Record<string, string | number | boolean>;
  events: Array<{ timestamp: Date; name: string; attributes?: Record<string, unknown> }>;
  status: "ok" | "error" | "running";
  errorMessage?: string;
}

interface SessionContext {
  sessionId: string;
  userId?: string;
  model: string;
  startTime: Date;
  spans: SpanContext[];
}

class SessionTracker {
  private sessions = new Map<string, SessionContext>();
  
  startSession(sessionId: string, model: string, userId?: string): SessionContext {
    const session: SessionContext = {
      sessionId, userId, model, startTime: new Date(), spans: [],
    };
    this.sessions.set(sessionId, session);
    return session;
  }
  
  endSession(sessionId: string): SessionContext | undefined {
    const session = this.sessions.get(sessionId);
    this.sessions.delete(sessionId);
    return session;
  }
  
  startSpan(sessionId: string, operation: string, parentSpanId?: string): SpanContext {
    const span: SpanContext = {
      spanId: randomUUID(),
      parentSpanId,
      operation,
      startTime: new Date(),
      attributes: {},
      events: [],
      status: "running",
    };
    this.sessions.get(sessionId)?.spans.push(span);
    return span;
  }
  
  endSpan(span: SpanContext, status: "ok" | "error", attrs?: Record<string, string | number | boolean>): void {
    span.endTime = new Date();
    span.status = status;
    if (attrs) Object.assign(span.attributes, attrs);
  }
}

export const sessionTracker = new SessionTracker();

Pre-Tool Hook

// hooks/pre-tool.ts
import type { PreToolCallHook } from "@claude/plugin-sdk";
import { sessionTracker } from "../monitor/session-tracker.js";

export const activeSpans = new Map<string, ReturnType<typeof sessionTracker.startSpan>>();

export const preToolCall: PreToolCallHook = async (toolName, toolInput, context) => {
  const span = sessionTracker.startSpan(
    context.session.id,
    `tool:${toolName}`,
    context.currentSpanId
  );
  
  span.attributes.toolName = toolName;
  span.attributes.inputSize = JSON.stringify(toolInput).length;
  span.events.push({
    timestamp: new Date(),
    name: "tool.input",
    attributes: { toolName, inputKeys: Object.keys(toolInput).join(",") },
  });
  
  activeSpans.set(`${context.session.id}:${span.spanId}`, span);
  context.setMetadata("currentSpanId", span.spanId);
  
  return { action: "allow" };
};

Post-Tool Hook

// hooks/post-tool.ts
import type { PostToolCallHook } from "@claude/plugin-sdk";
import { sessionTracker } from "../monitor/session-tracker.js";
import { activeSpans } from "./pre-tool.js";
import { metricsCollector } from "../monitor/metrics.js";

export const postToolCall: PostToolCallHook = async (toolName, _input, toolResult, context) => {
  const spanId = context.getMetadata("currentSpanId") as string;
  if (!spanId) return { action: "allow" };
  
  const span = activeSpans.get(`${context.session.id}:${spanId}`);
  if (!span) return { action: "allow" };
  
  const latencyMs = Date.now() - span.startTime.getTime();
  const success = !(toolResult as Record<string, unknown>)?.isError;
  
  sessionTracker.endSpan(span, success ? "ok" : "error", {
    latencyMs, success, // span attributes accept booleans; only metric labels need strings
  });
  activeSpans.delete(`${context.session.id}:${spanId}`);
  
  metricsCollector.increment("claude.tool.calls_total", { tool: toolName, success: String(success) });
  metricsCollector.histogram("claude.tool.latency_ms", latencyMs, { tool: toolName });
  
  if (!success) {
    metricsCollector.increment("claude.tool.errors_total", { tool: toolName });
  }
  
  return { action: "allow" };
};
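
The hook above calls metricsCollector.increment and metricsCollector.histogram, and the post-response hook calls metricsCollector.add, but monitor/metrics.ts itself is not shown. The following is a minimal in-memory sketch of a collector with those three methods. The class name, the counterValue and percentile helpers, and the label-key format are assumptions for illustration; a real collector would hand its buffers to an exporter.

```typescript
type Labels = Record<string, string>;

// In-memory collector exposing the three methods the hooks call.
class InMemoryMetricsCollector {
  private counters = new Map<string, number>();
  private histograms = new Map<string, number[]>();

  // Build a stable series key; labels are sorted so their order does not matter.
  private key(name: string, labels: Labels = {}): string {
    const parts = Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`);
    return `${name}{${parts.join(",")}}`;
  }

  increment(name: string, labels?: Labels): void {
    this.add(name, 1, labels);
  }

  add(name: string, value: number, labels?: Labels): void {
    const k = this.key(name, labels);
    this.counters.set(k, (this.counters.get(k) ?? 0) + value);
  }

  histogram(name: string, value: number, labels?: Labels): void {
    const k = this.key(name, labels);
    const values = this.histograms.get(k) ?? [];
    values.push(value);
    this.histograms.set(k, values);
  }

  counterValue(name: string, labels?: Labels): number {
    return this.counters.get(this.key(name, labels)) ?? 0;
  }

  // Nearest-rank percentile over the raw observations, with p in (0, 100].
  percentile(name: string, p: number, labels?: Labels): number | undefined {
    const values = [...(this.histograms.get(this.key(name, labels)) ?? [])].sort((a, b) => a - b);
    if (values.length === 0) return undefined;
    const rank = Math.max(1, Math.ceil((p / 100) * values.length));
    return values[rank - 1];
  }
}

const demoCollector = new InMemoryMetricsCollector();
demoCollector.increment("claude.tool.calls_total", { tool: "query_database", success: "true" });
demoCollector.histogram("claude.tool.latency_ms", 234, { tool: "query_database" });
demoCollector.histogram("claude.tool.latency_ms", 412, { tool: "query_database" });
console.log(demoCollector.percentile("claude.tool.latency_ms", 99, { tool: "query_database" })); // 412
```

Keeping raw observations is only workable for small volumes. Production collectors typically use fixed histogram buckets, which is also what the histogram_quantile queries later in the chapter expect.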

Post-Response Hook: Token Cost Tracking

// hooks/post-response.ts
import type { PostResponseHook } from "@claude/plugin-sdk";
import { metricsCollector } from "../monitor/metrics.js";

const PRICING_USD_PER_1M: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "claude-3-5-haiku": { input: 0.8, output: 4.0 },
  "claude-3-opus": { input: 15.0, output: 75.0 },
};

export const postResponse: PostResponseHook = async (_response, context) => {
  const usage = context.usage;
  const model = context.session.model;
  if (!usage) return { action: "allow" };
  
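  // Model IDs often carry a date suffix (e.g. "claude-3-5-sonnet-20241022"),
  // so match on the family prefix instead of requiring an exact key.
  const family = Object.keys(PRICING_USD_PER_1M).find((key) => model.startsWith(key));
  const pricing = family ? PRICING_USD_PER_1M[family] : undefined;
  // Unknown models fall through to a cost of 0, so keep the table current.
  const costUsd = pricing
    ? (usage.inputTokens / 1_000_000) * pricing.input +
      (usage.outputTokens / 1_000_000) * pricing.output
    : 0;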
  
  metricsCollector.add("claude.tokens.input", usage.inputTokens, { model });
  metricsCollector.add("claude.tokens.output", usage.outputTokens, { model });
  metricsCollector.add("claude.cost.usd", costUsd, { model });
  
  return { action: "allow" };
};
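
To make the cost formula concrete, here is the same arithmetic applied to the token counts from the sample log record in 55.2 (1,847 input tokens, 312 output tokens) at the claude-3-5-sonnet prices from the table above. The function and constant names are illustrative only.

```typescript
// Prices in USD per one million tokens, copied from the claude-3-5-sonnet row above.
const SONNET_PRICE_PER_1M = { input: 3.0, output: 15.0 };

// cost = (input tokens / 1M) * input price + (output tokens / 1M) * output price
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * SONNET_PRICE_PER_1M.input +
    (outputTokens / 1_000_000) * SONNET_PRICE_PER_1M.output
  );
}

// 1847 * 3 / 1e6 = 0.005541 and 312 * 15 / 1e6 = 0.004680
console.log(estimateCostUsd(1847, 312).toFixed(6)); // "0.010221"
```

At roughly one cent per call, per-request cost looks negligible. That is why it is tracked as a cumulative counter: the aggregate over thousands of sessions is the number that matters.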

55.5 OpenTelemetry Export

// exporters/opentelemetry.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";

// Builds the SDK but does not start it; the caller invokes sdk.start() once at plugin load.
export function createOpenTelemetrySDK(endpoint: string, serviceName: string) {
  return new NodeSDK({
    traceExporter: new OTLPTraceExporter({ url: `${endpoint}/v1/traces` }),
    metricReader: new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({ url: `${endpoint}/v1/metrics` }),
      exportIntervalMillis: 10_000,
    }),
    serviceName,
  });
}

With OpenTelemetry, the collected data flows to any compatible backend: Grafana Tempo + Prometheus for self-hosted setups, Datadog APM, AWS X-Ray, or Jaeger. The choice of backend does not require any code changes in the Plugin — only the OTLP endpoint configuration differs.
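
Because the endpoint is the only thing that changes between backends, it is worth building the per-signal URLs defensively. The exporter code above concatenates the endpoint and path directly, which produces a double slash if the configured value ends in "/". A small sketch of one way to normalize it follows; otlpSignalUrl is a hypothetical helper, and the /v1/traces, /v1/metrics, and /v1/logs paths are the OTLP/HTTP defaults.

```typescript
type OtlpSignal = "traces" | "metrics" | "logs";

// Join a base OTLP/HTTP endpoint with the default path for a signal,
// tolerating trailing slashes in the configured endpoint.
function otlpSignalUrl(endpoint: string, signal: OtlpSignal): string {
  return `${endpoint.replace(/\/+$/, "")}/v1/${signal}`;
}

console.log(otlpSignalUrl("http://localhost:4318", "traces"));  // http://localhost:4318/v1/traces
console.log(otlpSignalUrl("http://localhost:4318/", "metrics")); // http://localhost:4318/v1/metrics
```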

55.6 Cost Tracking Dashboard

Grafana Query Examples

# Daily cost by model (PromQL)
sum(increase(claude_cost_usd_total[24h])) by (model)

# Tool call P99 latency
histogram_quantile(0.99, sum(rate(claude_tool_latency_ms_bucket[5m])) by (le, tool))

# Error rate by tool
sum(rate(claude_tool_errors_total[1h])) by (tool)
/ sum(rate(claude_tool_calls_total[1h])) by (tool)

# Input token usage trend (tokens per hour)
sum(rate(claude_tokens_input_total[1h])) * 3600

These queries power a real-time Claude usage dashboard that answers questions like "Which teams are driving the most Claude API spend?" and "Which tool is causing the most latency?"

55.7 Decision Audit Logging

In compliance scenarios, it's not enough to track metrics and traces — why Claude made a particular decision must also be auditable.

// monitor/audit-logger.ts
import { createHash } from "crypto";
import { promises as fs } from "fs";

interface AuditRecord {
  timestamp: string;
  sessionId: string;
  userId: string;
  decisionType: "tool_call" | "tool_skip" | "response_generated" | "safety_block";
  toolName?: string;
  reasoning?: string;       // Summary of Claude's thinking (if extended thinking is enabled)
  inputHash: string;        // Hash of input (not raw content, to protect privacy)
  outputHash?: string;
  metadata: Record<string, unknown>;
}

export class AuditLogger {
  constructor(private readonly auditLogPath: string) {}
  
  async logDecision(event: {
    type: AuditRecord["decisionType"];
    toolName?: string;
    thinking?: string;
    input: unknown;
    output?: unknown;
  }, session: { id: string; userId?: string; model: string; turnIndex: number }): Promise<void> {
    const record: AuditRecord = {
      timestamp: new Date().toISOString(),
      sessionId: session.id,
      userId: session.userId ?? "anonymous",
      decisionType: event.type,
      toolName: event.toolName,
      reasoning: event.thinking?.substring(0, 500),
      inputHash: this.hashContent(JSON.stringify(event.input)),
      // Compare against undefined so falsy outputs (0, "", false) are still hashed.
      outputHash: event.output !== undefined ? this.hashContent(JSON.stringify(event.output)) : undefined,
      metadata: { model: session.model, turnIndex: session.turnIndex },
    };
    
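    // Append one JSON line per record. Appending alone does not make the file immutable;
    // pair it with write-once storage or an OS-level append-only flag.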
    await fs.appendFile(this.auditLogPath, JSON.stringify(record) + "\n", "utf8");
  }
  
  private hashContent(content: string): string {
    return createHash("sha256").update(content).digest("hex").substring(0, 16);
  }
}

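The audit log stores content hashes rather than raw content for two reasons. First, sensitive inputs and outputs never appear in the log in plain text. Second, each hash still acts as a fingerprint that can be matched against the original data if that data must be produced under legal authority. The reasoning field is the exception: it stores up to 500 characters of raw thinking text, so it should be reviewed for sensitive content before the log is shared.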
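
Matching a fingerprint against original data means recomputing the hash over a candidate payload and comparing it with the stored value. The sketch below assumes the same scheme as the AuditLogger above (SHA-256 over JSON.stringify output, truncated to 16 hex characters); fingerprint and matchesAuditHash are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Same fingerprint as the AuditLogger: the first 16 hex characters of SHA-256.
function fingerprint(content: unknown): string {
  return createHash("sha256").update(JSON.stringify(content)).digest("hex").substring(0, 16);
}

// True when a candidate payload reproduces the hash stored in an audit record.
function matchesAuditHash(candidate: unknown, storedHash: string): boolean {
  return fingerprint(candidate) === storedHash;
}

const storedHash = fingerprint({ sql: "SELECT 1", limit: 100 });
console.log(matchesAuditHash({ sql: "SELECT 1", limit: 100 }, storedHash)); // true
console.log(matchesAuditHash({ limit: 100, sql: "SELECT 1" }, storedHash)); // false: key order changes the JSON
```

The second call shows a practical caveat: JSON.stringify preserves key insertion order, so the payload must be serialized exactly as it was at logging time. If records need to be matched after round-tripping through other systems, a canonical serialization with sorted keys is the safer choice.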

55.8 Alerting Rules

Configure alerting to catch anomalies before they become incidents:

# alerting/rules.yaml (Prometheus alerting rule file format)

groups:
  - name: claude_alerts
    rules:
      - alert: HighToolErrorRate
        expr: |
          sum(rate(claude_tool_errors_total[5m])) by (tool)
          / sum(rate(claude_tool_calls_total[5m])) by (tool) > 0.1
        for: 2m
        annotations:
          summary: "Tool {{ $labels.tool }} has error rate above 10%"
          
      - alert: UnexpectedCostSpike
        expr: |
          sum(increase(claude_cost_usd_total[1h])) * 24 > 100
        for: 10m
        annotations:
          summary: "Projected daily Claude cost exceeds $100"
          
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum(rate(claude_tool_latency_ms_bucket[5m])) by (le, tool)
          ) > 5000
        for: 5m
        annotations:
          summary: "Tool {{ $labels.tool }} P99 latency exceeds 5 seconds"
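
The cost-spike rule projects a daily figure from a one-hour window, and the units matter. In PromQL, rate() returns a per-second value while increase() returns the total growth over the window, so a daily projection is either 86,400 times the rate or 24 times the hourly increase. The arithmetic can be sketched as follows, with illustrative function names and made-up counter readings.

```typescript
// Project a daily cost from two counter readings taken one hour apart.
// The difference is what increase(...[1h]) approximates.
function projectedDailyFromIncrease(costAtStartUsd: number, costAtEndUsd: number): number {
  const hourlyIncreaseUsd = costAtEndUsd - costAtStartUsd;
  return hourlyIncreaseUsd * 24;
}

// The same projection from a per-second rate, which is what rate() returns.
function projectedDailyFromRate(usdPerSecond: number): number {
  return usdPerSecond * 86_400;
}

// The counter grew from $10 to $15 within the hour: $5/hour, or $120/day.
console.log(projectedDailyFromIncrease(10, 15)); // 120
console.log(projectedDailyFromRate(5 / 3600));
```

Both forms describe the same spend. Multiplying a per-second rate by 24 instead of 86,400 would understate the projection by a factor of 3,600, and the alert would effectively never fire.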

Summary

Monitoring and observability Plugins transform Claude's black-box decision process into something transparent and traceable. The three core pillars — metrics (aggregated state), logs (event details), and traces (causal relationships) — jointly build a complete observability system. Telemetry collection via the Hooks layer is non-invasive: the core Claude Code codebase requires zero modifications. OpenTelemetry's industry-standard interface ensures seamless integration with major monitoring backends (Grafana, Datadog, Jaeger). Cost tracking and decision audit logging are indispensable compliance infrastructure in enterprise deployments. The final chapter in this Part covers deploying private Plugin registries in enterprise environments.
