Description

Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based...

README (SKILL.md)

Skill: langfuse-trace-logger

Name: Langfuse Trace Logger
Author: nissan

Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py

⚠️ CRITICAL: Python Version

Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)

The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running with system Python (python3) or pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. Miss this and expect to lose 30+ minutes debugging.

# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong — silent failure on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
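One way to make the failure loud instead of silent is a version guard at the top of the logging script. This is a minimal sketch assuming you can edit the script (its actual contents are not shown on this page); `check_python` and `FIRST_UNSUPPORTED` are hypothetical names.

```python
import sys

# Assumption taken from the warning above: langfuse's pydantic v1 dependency
# fails silently on Python 3.14+, while the 3.11 chatterbox venv works.
FIRST_UNSUPPORTED = (3, 14)


def check_python(version_info=sys.version_info) -> None:
    """Exit with a clear message instead of failing silently."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) >= FIRST_UNSUPPORTED:
        raise SystemExit(
            f"Python {major}.{minor} is not supported by the langfuse SDK here; "
            "run with ~/.chatterbox-venv/bin/python3"
        )
```

Calling `check_python()` before `import langfuse` turns the silent failure into a non-zero exit with a readable message on stderr.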

Basic Invocation

~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"
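If the call is assembled from another Python process, a small wrapper keeps the 4000-character input cap and the correct interpreter in one place. This is a sketch: `build_logger_argv` and `log_trace` are hypothetical helpers, while the flags and paths are the ones documented on this page.

```python
import os
import subprocess

# subprocess does not expand "~" in an argv list, so expand it explicitly.
PYTHON = os.path.expanduser("~/.chatterbox-venv/bin/python3")
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"
INPUT_LIMIT = 4000  # the schema below logs only the first 4000 chars of --input


def build_logger_argv(session_id, agent, task, model, status, prompt, output,
                      duration_s, tokens, project, skills, parent_id="agent:main"):
    """Return the argv for one trace; a list avoids shell-quoting problems."""
    return [
        PYTHON, SCRIPT,
        "--session-id", session_id,
        "--parent-id", parent_id,
        "--agent", agent.lower(),
        "--task", task,
        "--model", model,
        "--status", status,
        "--input", prompt[:INPUT_LIMIT],
        "--output", output,
        "--duration", str(int(duration_s)),
        "--tokens", str(int(tokens)),
        "--project", project,
        "--skills", ",".join(skills),
    ]


def log_trace(**fields) -> None:
    # check=True surfaces a non-zero exit code. The silent Python 3.14
    # failure described above raises no error, so it would not be caught here.
    subprocess.run(build_logger_argv(**fields), check=True)
```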

Trace Schema

Field	Type	Purpose	Notes
`--session-id`	string	Subagent session key	Use actual subagent session key — enables lineage tracing
`--parent-id`	string	Parent session reference	Always `"agent:main"` unless nested subagent
`--agent`	string	Agent name	Lowercase: kit, archie, sara, finn, quill, etc.
`--task`	string	Task label (kebab-case)	Used for replay grouping: `replay-judge.py --tag "task:kit-setup-rebuild"`
`--model`	string	Model used	e.g. `anthropic/claude-sonnet-4-6`, `anthropic/claude-haiku-4-5`
`--status`	string	Outcome	`completed` / `partial` / `failed`
`--input`	string	Full task prompt	First 4000 chars — this is what gets replayed against other models in judge runs
`--output`	string	Result summary	Agent's output/result — this is what the judge scores
`--duration`	int	Time in seconds	Used for efficiency analysis and agent routing decisions
`--tokens`	int	Total tokens used	Used for cost analysis and budget governance
`--project`	string	Project slug	Must match `projects/<slug>/STATUS.md`; enables project-level filtering
`--skills`	string	Comma-separated skills	e.g. `"product-tour-capture,ffmpeg-studio"` — enables skill effectiveness filtering
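The Notes column implies a few checks that can run before the logger is called. The following is a hypothetical pre-flight validator, not part of the script; the kebab-case pattern and the `projects/<slug>/STATUS.md` lookup under a workspace root you pass in are assumptions drawn from the table.

```python
import re
from pathlib import Path

KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
STATUSES = {"completed", "partial", "failed"}


def validate_fields(fields: dict, workspace: Path) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    if not KEBAB.match(fields.get("task", "")):
        problems.append("task must be kebab-case")
    agent = fields.get("agent", "")
    if not agent or agent != agent.lower():
        problems.append("agent must be a non-empty lowercase name")
    if fields.get("status") not in STATUSES:
        problems.append(f"status must be one of {sorted(STATUSES)}")
    for key in ("duration", "tokens"):
        if not isinstance(fields.get(key), int) or fields[key] < 0:
            problems.append(f"{key} must be a non-negative int")
    slug = fields.get("project", "")
    # Assumed layout: <workspace>/projects/<slug>/STATUS.md
    if not slug or not (workspace / "projects" / slug / "STATUS.md").is_file():
        problems.append(f"no projects/{slug}/STATUS.md under {workspace}")
    return problems
```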

Tag Taxonomy

The logger automatically generates these tags from the fields above:

agent:kit — from --agent
model_family:claude-sonnet — derived from --model
project:reddi-agent-protocol — from --project
skill:product-tour-capture — one tag per skill in --skills
task:kit-setup-rebuild — from --task
status:completed — from --status

These tags power the replay-judge filter syntax.
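The field-to-tag mapping above can be reproduced in a few lines. `build_tags` and `model_family` are hypothetical names, and the `model_family` rule (drop the provider prefix, then drop trailing numeric version segments) is inferred from the single example above rather than taken from the logger's source.

```python
import re


def model_family(model: str) -> str:
    """Assumed rule: 'anthropic/claude-sonnet-4-6' -> 'claude-sonnet'."""
    name = model.split("/")[-1]
    return re.sub(r"(?:-\d+)+$", "", name)


def build_tags(agent, model, project, skills, task, status) -> list[str]:
    tags = [
        f"agent:{agent}",
        f"model_family:{model_family(model)}",
        f"project:{project}",
    ]
    # One skill:* tag per comma-separated entry in --skills.
    tags += [f"skill:{s.strip()}" for s in skills.split(",") if s.strip()]
    tags += [f"task:{task}", f"status:{status}"]
    return tags
```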

Backfill Pattern

For retroactive logging when a session wrap was skipped or traces are missing.

Idempotent: uses deterministic trace IDs built from the date, agent, and task slug (see the backfill ID format below). Safe to re-run; it won't create duplicates.

# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24

Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.
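The date-range step of a backfill can be sketched as expanding `--from-date`/`--to-date` into the memory files that would be read, which is roughly what a dry run should let you check. The `memory/` root and the inclusive end date are assumptions, and the helper names are hypothetical. Parsing the task outcome blocks is not shown because their format is not documented here.

```python
from datetime import date, timedelta
from pathlib import Path


def memory_files(from_date: str, to_date: str, root: Path = Path("memory")) -> list[Path]:
    """List memory/YYYY-MM-DD.md paths for every day in [from_date, to_date]."""
    start, end = date.fromisoformat(from_date), date.fromisoformat(to_date)
    if end < start:
        raise ValueError("--to-date is earlier than --from-date")
    days = (end - start).days + 1
    return [root / f"{start + timedelta(days=d):%Y-%m-%d}.md" for d in range(days)]


def preview(from_date: str, to_date: str, root: Path = Path("memory")) -> None:
    # A missing file means there is nothing to backfill for that date.
    for path in memory_files(from_date, to_date, root):
        print(("found   " if path.is_file() else "missing ") + str(path))
```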

Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> (deterministic, so there is no duplicate risk).
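Building an ID in this format is short. `backfill_trace_id` and `slugify` are hypothetical names, and the slug normalisation is an assumption about how a free-text label becomes `<task-slug>`. With agent `kit` and task slug `setup-rebuild`, it yields the trace ID used in the replay example below.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Assumed rule: lowercase, collapse runs of non-alphanumerics to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def backfill_trace_id(day: date, agent: str, task: str) -> str:
    # Same inputs always give the same ID; the idempotency note above relies on this.
    return f"backfill-{day:%Y-%m-%d}-{slugify(agent)}-{slugify(task)}"
```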

Replay and Judge

# Report on all Kit traces (past 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models "claude-haiku-4-5" --judge "claude-haiku-4-5" --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id "backfill-2026-03-24-kit-setup-rebuild" \
  --models "claude-haiku-4-5" --judge "claude-haiku-4-5"

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report
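Each `--tag` filter above selects traces carrying that tag. If you export traces and want to combine filters locally, the following sketch works on plain dicts. Requiring every tag (AND semantics) is an assumption here, not documented behaviour of replay-judge.py, and both function names are hypothetical.

```python
from collections import Counter


def filter_traces(traces: list[dict], *required: str) -> list[dict]:
    """Keep traces whose 'tags' list contains every required tag."""
    need = set(required)
    return [t for t in traces if need <= set(t.get("tags", []))]


def status_breakdown(traces: list[dict]) -> Counter:
    """Count status:* tags, e.g. for a quick completed-vs-failed ratio."""
    return Counter(tag for t in traces for tag in t.get("tags", [])
                   if tag.startswith("status:"))
```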

Verify Traces Appeared

After logging, verify in Langfuse UI: http://localhost:3100

Or check programmatically:

~/.chatterbox-venv/bin/python3 -c "
import subprocess
from langfuse import Langfuse

# Fetch the secret key from 1Password; check=True fails loudly if op errors out.
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True, check=True,
).stdout.strip()

lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
traces = lf.client.trace.list(limit=5)
for t in traces.data:
    print(t.name, t.id[:12])
"

Expected output: the names and truncated (12-character) IDs of the last 5 traces. If nothing prints, suspect the Python version issue (see the warning above).

Langfuse Connection Details

Setting	Value
UI	http://localhost:3100
Public key	`pk-lf-openclaw-local`
Secret key	`op://OpenClaw/Langfuse (Local)/credential` (1Password)
Also in 1Password	`op://OpenClaw/Langfuse (Local)/Secret Key`
Docker	Always running (daemon service)
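The skill's metadata lists `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` as required environment variables. The following minimal sketch fills them, plus `LANGFUSE_HOST`, which the Langfuse Python SDK also reads, from the values in this table. `op_read` and `langfuse_env` are hypothetical helpers, and the secret reader is injectable so the function can be tested without the 1Password CLI.

```python
import subprocess

SECRET_REF = "op://OpenClaw/Langfuse (Local)/credential"


def op_read(ref: str) -> str:
    """Read one secret via the 1Password CLI (`op read <ref>`)."""
    result = subprocess.run(["op", "read", ref], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def langfuse_env(read_secret=op_read) -> dict[str, str]:
    secret = read_secret(SECRET_REF)
    if not secret:
        # An empty key would otherwise surface later as a confusing auth failure.
        raise RuntimeError(f"empty secret from {SECRET_REF}")
    return {
        "LANGFUSE_PUBLIC_KEY": "pk-lf-openclaw-local",
        "LANGFUSE_SECRET_KEY": secret,
        "LANGFUSE_HOST": "http://localhost:3100",
    }
```

Calling `os.environ.update(langfuse_env())` before constructing `Langfuse()` lets the client pick these values up without passing keys explicitly.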

When to Call This Skill

This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).

Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory — read what you just wrote.

Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
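The threshold can be expressed as a predicate over a task outcome. The `Outcome` fields (`files_written`, `api_calls`, `analysis`, `output`) are hypothetical. Map them to whatever your task outcome blocks actually record.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    # Hypothetical shape of one task outcome block.
    status: str
    output: str = ""
    files_written: list[str] = field(default_factory=list)
    api_calls: int = 0
    analysis: bool = False


def should_log(o: Outcome) -> bool:
    if o.status == "failed" and not o.output.strip():
        return False  # skip: failed attempt with no output
    # Simple lookups and one-off tool calls produce no deliverable,
    # so they fall through to False here.
    return bool(o.files_written) or o.api_calls > 0 or o.analysis
```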

Troubleshooting

Symptom	Cause	Fix
Trace doesn't appear in UI	Wrong Python version	Use `~/.chatterbox-venv/bin/python3`
No output, no error	Same — Python 3.14 pydantic v1 incompatibility	Same fix
`ModuleNotFoundError: No module named 'langfuse'`	Wrong interpreter or venv	Same fix
Duplicate traces on backfill	Should not happen, since backfill is idempotent	Check whether the same task was logged both by the live logger and by backfill
`op: command not found`	1Password CLI not in PATH	Run from shell with OP_SERVICE_ACCOUNT_TOKEN set, or source `~/.zshrc` first
Langfuse UI empty after logging	Docker daemon down	`docker ps` — restart Langfuse container if needed
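Most rows in this table can be checked up front. The following hypothetical preflight script is not shipped with the skill. It checks the interpreter version, whether `langfuse` is importable, whether `op` is on PATH, and whether the UI at http://localhost:3100 answers.

```python
import importlib.util
import shutil
import sys
import urllib.request


def has_module(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def has_command(name: str) -> bool:
    return shutil.which(name) is not None


def url_up(url: str, timeout: float = 3.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # URLError, HTTPError and timeouts all subclass OSError
        return False


def preflight() -> dict[str, bool]:
    return {
        "python < 3.14": sys.version_info[:2] < (3, 14),
        "langfuse importable": has_module("langfuse"),
        "op on PATH": has_command("op"),
        "Langfuse UI reachable": url_up("http://localhost:3100"),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(("PASS " if ok else "FAIL ") + check)
```

Run it with the chatterbox interpreter; running it with the system `python3` would test the wrong environment.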

Usage Guidance

This skill appears to be a wrapper around existing local scripts that send traces to Langfuse. The credential requests match that purpose, but the skill bundle contains no code and assumes that the scripts and a specific Python venv already exist. Before installing or enabling it:
(1) Verify that the referenced scripts actually exist at the stated paths, and inspect their contents to see exactly which files they read and where they send data.
(2) Prefer a self-hosted Langfuse endpoint (localhost:3100) for sensitive logs, or supply keys scoped with minimal write permissions.
(3) Confirm that the chatterbox venv Python (3.11) is used; SKILL.md warns about silent failure on other Python versions.
(4) Be aware that the backfill feature parses memory/YYYY-MM-DD.md files, which may be sensitive. If you don't want that data exported, do not run backfill, or audit the parser first.
(5) If you cannot inspect the scripts or do not trust the source (homepage and source are both unknown), do not provide LANGFUSE_SECRET_KEY; consider creating a dedicated, limited-permission key or testing in an isolated environment.
Additional information (script contents, where traces are posted) would raise confidence and could change this assessment.
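Step (1) can be partly automated. This sketch assumes the logger path given on this page and assumes the backfill script lives in the same `scripts/` directory. For each file, it reports whether the file exists, its SHA-256 for later comparison, and any literal http(s) URLs it contains as a first hint of where data is sent. `audit` is a hypothetical helper, and a URL scan is no substitute for reading the code.

```python
import hashlib
import re
from pathlib import Path

SCRIPTS_DIR = Path("/Users/loki/.openclaw/workspace/scripts")
SCRIPTS = ["langfuse-trace-logger.py", "langfuse-backfill-historical.py"]
URL_RE = re.compile(r"https?://[^\s'\"\)]+")


def audit(path: Path) -> dict:
    if not path.is_file():
        return {"path": str(path), "exists": False}
    data = path.read_bytes()
    text = data.decode("utf-8", errors="replace")
    return {
        "path": str(path),
        "exists": True,
        "sha256": hashlib.sha256(data).hexdigest(),
        # Only literal URLs are found; hosts built at runtime will not show up.
        "urls": sorted(set(URL_RE.findall(text))),
    }


if __name__ == "__main__":
    for name in SCRIPTS:
        print(audit(SCRIPTS_DIR / name))
```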

Capability Analysis

Type: OpenClaw Skill
Name: langfuse-trace-logger
Version: 1.0.0

The skill instructions in SKILL.md include a code snippet that directs the agent to programmatically access the 1Password CLI ('op') to retrieve credentials, which is a high-risk behavior. While the stated purpose is to log traces to Langfuse, the skill is designed to exfiltrate agent conversation history (inputs and outputs) to an external or local service, and the actual implementation scripts (e.g., langfuse-trace-logger.py and langfuse-backfill-historical.py) are not provided in the bundle, preventing a full audit of the data-handling logic.

Capability Assessment

ℹ Purpose & Capability

The name/description (logging traces to Langfuse) align with the required env vars LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY and the need for python. However, the SKILL.md expects specific scripts (e.g., /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py) and a chatterbox venv to already exist; the skill bundle includes no code or install steps to create those scripts or the venv, which is a coherence gap.

⚠ Instruction Scope

Instructions direct the agent to run local scripts and to parse memory/YYYY-MM-DD.md files for backfill. Reading local 'memory' files can expose sensitive user data; the backfill behavior and file paths are outside the skill's code and may access private information. The README also references runtime env vars (e.g., SESSION_ID examples) and absolute home paths (/Users/loki/...) that may not exist for other users — the agent could be instructed to read or transmit data the user wouldn't expect.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files, so it does not download or write code. That lowers installation risk but also means it assumes preexisting scripts and environments; there's no bundled code to inspect or validate.

ℹ Credentials

Requesting the two Langfuse keys is proportional to the described function (sending traces). Still: LANGFUSE_SECRET_KEY is sensitive and would allow writing traces to a Langfuse account; ensure the keys are scoped to the intended account/project. The SKILL.md references other local state (memory files, SESSION_ID) that are not declared as required envs but are used by the scripts, which broadens the effective access.

✓ Persistence & Privilege

The `always` flag is false, and the skill does not request any persistent platform privileges. It does not modify other skills' configs or ask to be force-enabled; autonomous invocation is allowed (the platform default) but is not an added privilege here.

Version History

v1.0.0

New skill: Langfuse trace logging and observability for agent pipelines

Metadata

Slug langfuse-trace-logger

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Langfuse Trace Logger?

Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based... It is an AI Agent Skill for Claude Code / OpenClaw, with 91 downloads so far.

How do I install Langfuse Trace Logger?

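Run "/install langfuse-trace-logger" in the OpenClaw or Claude Code chat to install it in one step. This installs only the instructions: the referenced scripts and the ~/.chatterbox-venv Python environment must already exist (see Capability Assessment).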

Is Langfuse Trace Logger free?

Yes, Langfuse Trace Logger is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Langfuse Trace Logger support?

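Langfuse Trace Logger runs anywhere OpenClaw / Claude Code is available, although the documented script and interpreter paths (/Users/loki/.openclaw/workspace/scripts, ~/.chatterbox-venv) are specific to the author's machine and need adjusting elsewhere.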

Who created Langfuse Trace Logger?

It is built and maintained by Nissan Dookeran (@nissan); the current version is v1.0.0.
