
Langfuse Trace Logger

by Nissan Dookeran · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 91 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install langfuse-trace-logger
Description
Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based...
README (SKILL.md)

Skill: langfuse-trace-logger

Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py


⚠️ CRITICAL: Python Version

Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)

The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running with system Python (python3) or pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. Missing this will waste 30+ minutes of debugging.

# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong — silent failure on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
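
Because the failure is silent, a guard that fails loudly can save the debugging time. The following is a minimal sketch, not part of the shipped script: the `check_interpreter` helper is hypothetical, and it assumes that only the 3.11 interpreter from the venv is considered safe.

```python
import sys

# Major.minor version this README says to use (Python 3.11.15 in ~/.chatterbox-venv).
REQUIRED = (3, 11)


def check_interpreter(version_info=sys.version_info, required=REQUIRED):
    """Return None if the interpreter matches, otherwise an error message."""
    current = (version_info[0], version_info[1])
    if current != required:
        return (
            f"Python {current[0]}.{current[1]} detected; expected "
            f"{required[0]}.{required[1]} (~/.chatterbox-venv/bin/python3)"
        )
    return None


# Hypothetical usage at the top of a script:
#     problem = check_interpreter()
#     if problem:
#         sys.exit(problem)
```

Exiting with a message turns the silent failure into an explicit one before any trace is sent.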

Basic Invocation

~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"

Trace Schema

| Field | Type | Purpose | Notes |
|---|---|---|---|
| --session-id | string | Subagent session key | Use the actual subagent session key; enables lineage tracing |
| --parent-id | string | Parent session reference | Always "agent:main" unless nested subagent |
| --agent | string | Agent name | Lowercase: kit, archie, sara, finn, quill, etc. |
| --task | string | Task label (kebab-case) | Used for replay grouping: replay-judge.py --tag "task:kit-setup-rebuild" |
| --model | string | Model used | e.g. anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5 |
| --status | string | Outcome | completed / partial / failed |
| --input | string | Full task prompt | First 4000 chars; this is what gets replayed against other models in judge runs |
| --output | string | Result summary | Agent's output/result; this is what the judge scores |
| --duration | int | Time in seconds | Used for efficiency analysis and agent routing decisions |
| --tokens | int | Total tokens used | Used for cost analysis and budget governance |
| --project | string | Project slug | Must match projects/<slug>/STATUS.md; enables project-level filtering |
| --skills | string | Comma-separated skills | e.g. "product-tour-capture,ffmpeg-studio"; enables skill effectiveness filtering |
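
When the logger is called from another Python process, the schema above can be turned into an argument list before handing it to `subprocess.run`. This is a sketch under assumptions: `build_logger_command` is a hypothetical helper, and truncating `--input` on the caller side is an assumption (the script may already do this itself).

```python
import os
from typing import Dict, List

PYTHON = "~/.chatterbox-venv/bin/python3"
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"
MAX_INPUT_CHARS = 4000  # "first 4000 chars" per the schema
VALID_STATUSES = {"completed", "partial", "failed"}

# Flags in the order shown under Basic Invocation.
FLAGS = [
    "session-id", "parent-id", "agent", "task", "model", "status",
    "input", "output", "duration", "tokens", "project", "skills",
]


def build_logger_command(fields: Dict[str, object]) -> List[str]:
    """Validate the fields and return an argv list for the logger script."""
    missing = [name for name in FLAGS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if fields["status"] not in VALID_STATUSES:
        raise ValueError(f"invalid status: {fields['status']!r}")

    values = dict(fields)
    # Assumption: truncate on the caller side to match the documented limit.
    values["input"] = str(values["input"])[:MAX_INPUT_CHARS]

    cmd = [os.path.expanduser(PYTHON), SCRIPT]
    for name in FLAGS:
        cmd.extend([f"--{name}", str(values[name])])
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` avoids shell quoting problems with long prompts in `--input`.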

Tag Taxonomy

The logger automatically generates these tags from the fields above:

  • agent:kit — from --agent
  • model_family:claude-sonnet — derived from --model
  • project:reddi-agent-protocol — from --project
  • skill:product-tour-capture — one tag per skill in --skills
  • task:kit-setup-rebuild — from --task
  • status:completed — from --status

These tags power the replay-judge filter syntax.
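
The mapping from fields to tags can be sketched as follows. The logger's real derivation code is not included in this skill, so `build_tags` and `model_family` are hypothetical; in particular, stripping the provider prefix and trailing numeric version parts is an assumption that happens to reproduce the `claude-sonnet` example above.

```python
import re
from typing import List


def model_family(model: str) -> str:
    """'anthropic/claude-sonnet-4-6' -> 'claude-sonnet' (assumed rule)."""
    name = model.split("/")[-1]           # drop the provider prefix
    return re.sub(r"(-\d+)+$", "", name)  # drop trailing numeric version parts


def build_tags(agent: str, model: str, project: str,
               skills: str, task: str, status: str) -> List[str]:
    """Build the tag list in the order given in the taxonomy."""
    tags = [
        f"agent:{agent}",
        f"model_family:{model_family(model)}",
        f"project:{project}",
    ]
    # One tag per comma-separated skill; empty entries are skipped.
    tags += [f"skill:{s.strip()}" for s in skills.split(",") if s.strip()]
    tags += [f"task:{task}", f"status:{status}"]
    return tags
```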


Backfill Pattern

For retroactive logging when a session wrap was skipped or traces are missing.

Idempotent: Uses deterministic trace IDs based on date+agent+task hash. Safe to re-run — won't create duplicates.

# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24

Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.

Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug>; deterministic, so no duplicate risk.
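
To illustrate why re-runs are safe, here is a minimal sketch of building an ID in the documented format. `backfill_trace_id` and `slugify` are hypothetical names; the actual backfill script may normalise or hash its inputs differently.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def backfill_trace_id(day: date, agent: str, task: str) -> str:
    """Same (day, agent, task) always yields the same ID."""
    return f"backfill-{day.isoformat()}-{slugify(agent)}-{slugify(task)}"


print(backfill_trace_id(date(2026, 3, 24), "kit", "setup-rebuild"))
# prints: backfill-2026-03-24-kit-setup-rebuild
```

Because the ID is a pure function of its inputs, writing the same trace twice targets the same ID rather than creating a second record.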


Replay and Judge

# Report on all Kit traces (past 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models "claude-haiku-4-5" --judge "claude-haiku-4-5" --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id "backfill-2026-03-24-kit-setup-rebuild" \
  --models "claude-haiku-4-5" --judge "claude-haiku-4-5"

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report

Verify Traces Appeared

After logging, verify in Langfuse UI: http://localhost:3100

Or check programmatically:

~/.chatterbox-venv/bin/python3 -c "
import subprocess
from langfuse import Langfuse

# Read the secret key from 1Password; check=True fails loudly if op errors
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True, check=True
).stdout.strip()

lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
# Print the name and truncated ID of the 5 most recent traces
for t in lf.client.trace.list(limit=5).data:
    print(t.name, t.id[:12])
"

Expected output: the names and truncated IDs of the last 5 traces. If the output is blank, suspect the Python version issue (see the warning above).


Langfuse Connection Details

| Setting | Value |
|---|---|
| UI | http://localhost:3100 |
| Public key | pk-lf-openclaw-local |
| Secret key | op://OpenClaw/Langfuse (Local)/credential (1Password) |
| Also in 1Password | op://OpenClaw/Langfuse (Local)/Secret Key |
| Docker | Always running (daemon service) |

When to Call This Skill

This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).

Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory — read what you just wrote.

Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
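
The threshold can be read as a simple predicate. This is one interpretation only: the `SubagentRun` record and its fields are hypothetical and do not correspond to any structure in the memory files.

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:
    """Hypothetical summary of one subagent run."""
    files_written: int = 0
    api_calls: int = 0
    produced_analysis: bool = False
    status: str = "completed"
    output: str = ""


def should_log(run: SubagentRun) -> bool:
    """Log only runs that produced a deliverable."""
    # Failed attempts with no output are skipped outright.
    if run.status == "failed" and not run.output.strip():
        return False
    # Simple lookups and 1-line tool calls produce none of these.
    return run.files_written > 0 or run.api_calls > 0 or run.produced_analysis
```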


Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Trace doesn't appear in UI | Wrong Python version | Use ~/.chatterbox-venv/bin/python3 |
| No output, no error | Same: Python 3.14 pydantic v1 incompatibility | Same fix |
| ImportError: langfuse not found | Wrong venv | Same fix |
| Duplicate traces on backfill | Shouldn't happen; backfill is idempotent | Check if running logger + backfill both for same trace |
| op: command not found | 1Password CLI not in PATH | Run from shell with OP_SERVICE_ACCOUNT_TOKEN set, or source ~/.zshrc first |
| Langfuse UI empty after logging | Docker daemon down | docker ps, then restart Langfuse container if needed |
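
Two of the rows above (missing `op`, Langfuse container down) can be checked up front. The sketch below uses only the standard library; `missing_tools` and `port_open` are hypothetical helpers, and checking port 3100 only shows that something is listening there, not that it is Langfuse.

```python
import shutil
import socket
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("op", "docker")) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def port_open(host: str = "localhost", port: int = 3100,
              timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Running both checks before logging separates environment problems from the Python version issue.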
Usage Guidance
This skill appears to be a wrapper around existing local scripts that send traces to Langfuse. The credential requests match that purpose, but the skill bundle contains no code and assumes scripts and a specific Python venv exist. Before installing or enabling it:
  1. Verify the referenced scripts actually exist at the stated paths and inspect their contents to see exactly what files they read and where they send data.
  2. Prefer using a self-hosted Langfuse endpoint (localhost:3100) for sensitive logs, or supply keys scoped with minimal write permissions.
  3. Confirm the chatterbox venv Python (3.11) is used; the SKILL.md warns about silent failure on other Python versions.
  4. Be aware the backfill feature parses memory/YYYY-MM-DD.md files (potentially sensitive). If you don't want that data exported, do not run backfill, or audit the parser first.
  5. If you cannot inspect the scripts or do not trust the source (homepage unknown, source unknown), do not provide LANGFUSE_SECRET_KEY; consider creating a dedicated, limited-permission key or testing in an isolated environment.
Additional info (script contents, where traces are posted) would raise confidence and could change this assessment.
Capability Analysis
Type: OpenClaw Skill
Name: langfuse-trace-logger
Version: 1.0.0
The skill instructions in SKILL.md include a code snippet that directs the agent to programmatically access the 1Password CLI ('op') to retrieve credentials, which is a high-risk behavior. While the stated purpose is to log traces to Langfuse, the skill is designed to exfiltrate agent conversation history (inputs and outputs) to an external or local service, and the actual implementation scripts (e.g., langfuse-trace-logger.py and langfuse-backfill-historical.py) are not provided in the bundle, preventing a full audit of the data-handling logic.
Capability Assessment
Purpose & Capability
The name/description (logging traces to Langfuse) align with the required env vars LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY and the need for python. However, the SKILL.md expects specific scripts (e.g., /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py) and a chatterbox venv to already exist; the skill bundle includes no code or install steps to create those scripts or the venv, which is a coherence gap.
Instruction Scope
Instructions direct the agent to run local scripts and to parse memory/YYYY-MM-DD.md files for backfill. Reading local 'memory' files can expose sensitive user data; the backfill behavior and file paths are outside the skill's code and may access private information. The README also references runtime env vars (e.g., SESSION_ID examples) and absolute home paths (/Users/loki/...) that may not exist for other users — the agent could be instructed to read or transmit data the user wouldn't expect.
Install Mechanism
This is an instruction-only skill with no install spec and no code files, so it does not download or write code. That lowers installation risk but also means it assumes preexisting scripts and environments; there's no bundled code to inspect or validate.
Credentials
Requesting the two Langfuse keys is proportional to the described function (sending traces). Still: LANGFUSE_SECRET_KEY is sensitive and would allow writing traces to a Langfuse account; ensure the keys are scoped to the intended account/project. The SKILL.md references other local state (memory files, SESSION_ID) that are not declared as required envs but are used by the scripts, which broadens the effective access.
Persistence & Privilege
always is false and the skill does not request any persistent platform privileges. It does not modify other skills' configs nor ask to be force-enabled; autonomous invocation is allowed (platform default) but not an added privilege here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install langfuse-trace-logger
  3. After installation, invoke the skill by name or use /langfuse-trace-logger
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
New skill: Langfuse trace logging and observability for agent pipelines
Metadata
Slug: langfuse-trace-logger
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Langfuse Trace Logger?

Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis. Called during session-wrap Phase 4. Supports backfill, tag-based... It is an AI Agent Skill for Claude Code / OpenClaw, with 91 downloads so far.

How do I install Langfuse Trace Logger?

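Run "/install langfuse-trace-logger" in the OpenClaw or Claude Code chat to install it in one step. The skill itself ships no code: it assumes the scripts and the ~/.chatterbox-venv Python environment referenced in the README already exist.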

Is Langfuse Trace Logger free?

Yes, Langfuse Trace Logger is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Langfuse Trace Logger support?

Langfuse Trace Logger is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Langfuse Trace Logger?

It is built and maintained by Nissan Dookeran (@nissan); the current version is v1.0.0.
