Langfuse Trace Logger
/install langfuse-trace-logger
Skill: langfuse-trace-logger
Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py
⚠️ CRITICAL: Python Version
Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)
The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running with system Python (python3) or pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. This failure mode can easily cost 30+ minutes of debugging.
# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...
# ❌ Wrong — silent failure on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
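Because the failure is silent, a version guard at the top of any script that imports langfuse can turn it into a loud one. The following is a minimal sketch, not part of the shipped logger: the helper names and the assumption that every interpreter from 3.14 upward is affected are illustrative.

```python
import sys

# Assumption: any interpreter at or above this version hits the
# pydantic v1 incompatibility described above.
FIRST_UNSUPPORTED = (3, 14)


def is_supported_python(version=None):
    """Return True if the interpreter version is below the unsupported cutoff."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor < FIRST_UNSUPPORTED


def require_supported_python(version=None):
    """Exit with an explicit message instead of failing silently."""
    if not is_supported_python(version):
        found = ".".join(str(part) for part in (version or sys.version_info)[:3])
        sys.exit(
            f"Python {found} is not supported here; "
            "use ~/.chatterbox-venv/bin/python3 instead."
        )
```

Calling require_supported_python() before the langfuse import would stop the run immediately on the wrong interpreter.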
Basic Invocation
~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
--session-id "$SESSION_ID" \
--parent-id "agent:main" \
--agent "kit" \
--task "task-label-kebab-case" \
--model "anthropic/claude-sonnet-4-6" \
--status "completed" \
--input "full task prompt given to agent (first 4000 chars)..." \
--output "what the agent returned or accomplished..." \
--duration 278 \
--tokens 16900 \
--project "reddi-agent-protocol" \
--skills "product-tour-capture"
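When the call is made from Python rather than a shell, the same invocation can be assembled as an argument list, which avoids shell quoting problems with long prompts. This is a sketch under assumptions: build_logger_command and log_trace are hypothetical helpers, and only the flags shown above are used.

```python
import os
import subprocess

# Interpreter and script paths as given in this document.
PYTHON = os.path.expanduser("~/.chatterbox-venv/bin/python3")
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"


def build_logger_command(fields):
    """Turn a dict such as {"agent": "kit"} into the logger's argv list."""
    command = [PYTHON, SCRIPT]
    for name, value in fields.items():
        # Keys use underscores in Python; the CLI flags use hyphens.
        command += [f"--{name.replace('_', '-')}", str(value)]
    return command


def log_trace(fields):
    """Run the logger; check=True raises on a non-zero exit code."""
    return subprocess.run(
        build_logger_command(fields), check=True, capture_output=True, text=True
    )
```

Note that check=True only catches non-zero exit codes. It would not detect the silent Python-version failure described earlier, so verifying the trace afterwards is still worthwhile.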
Trace Schema
| Field | Type | Purpose | Notes |
|---|---|---|---|
| --session-id | string | Subagent session key | Use actual subagent session key; enables lineage tracing |
| --parent-id | string | Parent session reference | Always "agent:main" unless nested subagent |
| --agent | string | Agent name | Lowercase: kit, archie, sara, finn, quill, etc. |
| --task | string | Task label (kebab-case) | Used for replay grouping: replay-judge.py --tag "task:kit-setup-rebuild" |
| --model | string | Model used | e.g. anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5 |
| --status | string | Outcome | completed / partial / failed |
| --input | string | Full task prompt | First 4000 chars; this is what gets replayed against other models in judge runs |
| --output | string | Result summary | Agent's output/result; this is what the judge scores |
| --duration | int | Time in seconds | Used for efficiency analysis and agent routing decisions |
| --tokens | int | Total tokens used | Used for cost analysis and budget governance |
| --project | string | Project slug | Must match projects/<slug>/STATUS.md; enables project-level filtering |
| --skills | string | Comma-separated skills | e.g. "product-tour-capture,ffmpeg-studio"; enables skill effectiveness filtering |
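The constraints in the table can be checked before the logger is called. The sketch below is illustrative only: normalize_trace_fields is a hypothetical helper, and whether the logger itself truncates --input or validates --status is not stated here, so the checks are done defensively on the caller's side.

```python
ALLOWED_STATUSES = {"completed", "partial", "failed"}
MAX_INPUT_CHARS = 4000


def normalize_trace_fields(fields):
    """Return a cleaned copy of the fields, raising ValueError on a bad status."""
    cleaned = dict(fields)
    if cleaned["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {cleaned['status']!r}")
    cleaned["agent"] = cleaned["agent"].lower()
    # Only the first 4000 characters of the prompt are kept.
    cleaned["input"] = cleaned["input"][:MAX_INPUT_CHARS]
    cleaned["duration"] = int(cleaned["duration"])
    cleaned["tokens"] = int(cleaned["tokens"])
    # --skills is a single comma-separated string on the command line.
    skills = cleaned.get("skills", "")
    if not isinstance(skills, str):
        cleaned["skills"] = ",".join(skills)
    return cleaned
```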
Tag Taxonomy
The logger automatically generates these tags from the fields above:
- agent:kit (from --agent)
- model_family:claude-sonnet (derived from --model)
- project:reddi-agent-protocol (from --project)
- skill:product-tour-capture (one tag per skill in --skills)
- task:kit-setup-rebuild (from --task)
- status:completed (from --status)
These tags power the replay-judge filter syntax.
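To make the mapping concrete, here is one way the tags could be derived from the fields. The logger's actual rule for model_family is not documented here; this sketch assumes it drops the provider prefix and any trailing numeric version segments, which reproduces the claude-sonnet example. Function names are hypothetical.

```python
import re


def model_family(model):
    """Assumed rule: 'anthropic/claude-sonnet-4-6' becomes 'claude-sonnet'."""
    name = model.split("/")[-1]
    # Strip trailing "-<digits>" segments such as "-4-6".
    return re.sub(r"(-\d+)+$", "", name)


def derive_tags(agent, model, project, skills, task, status):
    """Build the tag list in the order used in this section."""
    tags = [
        f"agent:{agent}",
        f"model_family:{model_family(model)}",
        f"project:{project}",
    ]
    # One tag per entry in the comma-separated --skills value.
    tags += [f"skill:{s.strip()}" for s in skills.split(",") if s.strip()]
    tags += [f"task:{task}", f"status:{status}"]
    return tags
```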
Backfill Pattern
For retroactive logging when a session wrap was skipped or traces are missing.
Idempotent: Uses deterministic trace IDs based on date+agent+task hash. Safe to re-run — won't create duplicates.
# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
--from-date 2026-03-24 \
--to-date 2026-03-24 \
--dry-run
# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
--from-date 2026-03-24 \
--to-date 2026-03-24
Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.
Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> (deterministic, no duplicate risk).
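The idempotency follows directly from the ID being a pure function of its inputs. A minimal sketch, assuming the ID is a plain concatenation of the three parts (backfill_trace_id is a hypothetical helper, and the split of the example into agent "kit" and slug "setup-rebuild" is an assumption):

```python
from datetime import date


def backfill_trace_id(day, agent, task_slug):
    """Build the ID in the documented backfill-YYYY-MM-DD-<agent>-<task-slug> form."""
    return f"backfill-{day.isoformat()}-{agent}-{task_slug}"


# The same inputs always yield the same ID, so a re-run targets
# the same trace instead of creating a second one.
first = backfill_trace_id(date(2026, 3, 24), "kit", "setup-rebuild")
second = backfill_trace_id(date(2026, 3, 24), "kit", "setup-rebuild")
```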
Replay and Judge
# Report on all Kit traces (past 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
--tag "agent:kit" --report
# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
--tag "agent:kit" --models "claude-haiku-4-5" --judge "claude-haiku-4-5" --report
# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
--trace-id "backfill-2026-03-24-kit-setup-rebuild" \
--models "claude-haiku-4-5" --judge "claude-haiku-4-5"
# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
--tag "project:reddi-agent-protocol" --report
# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
--tag "skill:product-tour-capture" --report
Verify Traces Appeared
After logging, verify in Langfuse UI: http://localhost:3100
Or check programmatically:
~/.chatterbox-venv/bin/python3 -c "
import subprocess
sk = subprocess.run(
['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
capture_output=True, text=True
).stdout.strip()
from langfuse import Langfuse
lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
traces = lf.client.trace.list(limit=5)
[print(t.name, t.id[:12]) for t in traces.data]
"
Expected output: the last 5 trace names with truncated IDs. If the output is blank, suspect a Python version issue (see the warning above).
Langfuse Connection Details
| Setting | Value |
|---|---|
| UI | http://localhost:3100 |
| Public key | pk-lf-openclaw-local |
| Secret key | op://OpenClaw/Langfuse (Local)/credential (1Password) |
| Also in 1Password | op://OpenClaw/Langfuse (Local)/Secret Key |
| Docker | Always running (daemon service) |
When to Call This Skill
This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).
Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory — read what you just wrote.
Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Trace doesn't appear in UI | Wrong Python version | Use ~/.chatterbox-venv/bin/python3 |
| No output, no error | Same — Python 3.14 pydantic v1 incompatibility | Same fix |
| ImportError: langfuse not found | Wrong venv | Same fix |
| Duplicate traces on backfill | Shouldn't happen — backfill is idempotent | Check if running logger + backfill both for same trace |
| op: command not found | 1Password CLI not in PATH | Run from shell with OP_SERVICE_ACCOUNT_TOKEN set, or source ~/.zshrc first |
| Langfuse UI empty after logging | Docker daemon down | docker ps — restart Langfuse container if needed |
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install langfuse-trace-logger
- After installation, invoke the skill by name or use /langfuse-trace-logger
- Provide required inputs per the skill's parameter spec and get structured output
What is Langfuse Trace Logger?
Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based... It is an AI Agent Skill for Claude Code / OpenClaw, with 91 downloads so far.
How do I install Langfuse Trace Logger?
Run "/install langfuse-trace-logger" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Langfuse Trace Logger free?
Yes, Langfuse Trace Logger is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Langfuse Trace Logger support?
Langfuse Trace Logger is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Langfuse Trace Logger?
It is built and maintained by Nissan Dookeran (@nissan); the current version is v1.0.0.