Clean Log Toolkit
/install clean-log-toolkit
clean-log-toolkit
v0.1.0
A small, honest local toolkit for the work agents end up doing constantly: read a log someone sent you, figure out the format, find the actual problems, and produce a summary you can paste into a ticket. Built on Python 3 standard library only. No awk/sed/jq wrappers, no pip installs, no remote calls.
This skill is the third of the "clean-*" trio:
- clean-csv-toolkit: structured tabular data
- clean-text-toolkit: unstructured text
- clean-log-toolkit: semi-structured timestamped logs
What this skill does
- scripts/parse.py: parse a log file into structured rows. Auto-detects apache-common, apache-combined, nginx-access, syslog, and json-line formats by sniffing the first ~50 lines. Falls back to a generic timestamp + level + message extractor when nothing matches. Pass --regex PATTERN with named groups to define a custom format. Output as .csv, .tsv, or .jsonl.
- scripts/errors.py: aggregate the errors in a log file. Counts by level (WARN / ERROR / FATAL by default), buckets the timeline by minute / hour / day, normalizes each message into a "fingerprint" (replacing numbers, UUIDs, hex tokens, file:line pairs, and embedded timestamps with placeholders), and surfaces the top-N most frequent error groups. Writes a JSON / Markdown / CSV report or prints a one-screen summary.
- scripts/grep.py: grep, but log-aware. Combine --pattern REGEX, --not-pattern REGEX, --level LVL[,LVL2...], --since TIMESTAMP, --until TIMESTAMP, and -B / -A / -C context lines into one filter pass. Output goes to stdout or to a file. Returns exit 0 on at least one match, 1 on zero matches.
- scripts/check_deps.sh: verify python3 is available.
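To make the auto-detection idea concrete, here is a minimal sketch of sniffing a format from the first lines of a file. It is an illustration only, not the code in scripts/parse.py: the regexes, the `sniff_format` and `looks_like_json` names, and the "format with the most matching lines wins" rule are all assumptions, and nginx-access is left out for brevity.

```python
import json
import re

# Hypothetical, simplified patterns; real detectors are likely stricter.
# Order matters: the more specific combined format is tried first.
CANDIDATES = {
    "apache-combined": re.compile(
        r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "[^"]*"$'
    ),
    "apache-common": re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+$'),
    "syslog": re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ "),
}


def looks_like_json(line):
    """True when the line parses as a single JSON object."""
    try:
        return isinstance(json.loads(line), dict)
    except ValueError:
        return False


def sniff_format(lines, sample_size=50):
    """Return the best-scoring format name, or 'generic' if nothing matches."""
    sample = [ln.strip() for ln in lines[:sample_size] if ln.strip()]
    scores = {name: 0 for name in CANDIDATES}
    scores["json-line"] = 0
    for line in sample:
        if looks_like_json(line):
            scores["json-line"] += 1
            continue
        for name, rx in CANDIDATES.items():
            if rx.match(line):
                scores[name] += 1
                break  # the first matching pattern claims this line
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "generic"
```

Scoring a sample rather than trusting the first line keeps a stray banner or blank line at the top of the file from deciding the format.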
What this skill does not do
- It does not tail/follow live log files (yet — possible v0.2 feature if there's demand).
- It does not call any LLM, web service, or remote API.
- It does not write outside the input/output paths the caller provides.
Quick start
1. Parse an unknown log file
# Auto-detect the format
python3 scripts/parse.py app.log app.csv
# Or be explicit
python3 scripts/parse.py access.log out.jsonl --format apache-combined
python3 scripts/parse.py syslog.txt out.csv --format syslog
python3 scripts/parse.py events.log out.csv --format json-line --fields ts,level,msg
2. Custom format via named-group regex
python3 scripts/parse.py app.log structured.csv \
--regex '^(?P<ts>\S+)\s+(?P<level>\S+)\s+(?P<message>.*)$'
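If named groups are new to you, this short sketch shows how such a pattern turns one line into a row. It uses only Python's `re` module; the `parse_line` helper and the sample line are hypothetical and say nothing about how parse.py is written internally.

```python
import re

# One named group per output column: ts, level, message.
PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\S+)\s+(?P<message>.*)$")


def parse_line(line):
    """Return a dict keyed by group name, or None when the line does not match."""
    match = PATTERN.match(line.rstrip("\n"))
    return match.groupdict() if match else None


row = parse_line("2026-05-10T10:00:00Z ERROR Database connection lost")
print(row)
```

Each group name becomes a column name, so choosing names like ts, level, and message keeps the output readable.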
3. Aggregate errors and produce a report
# One-screen summary
python3 scripts/errors.py app.log
# Bucket by minute, top 20 message groups
python3 scripts/errors.py app.log --bucket minute --top 20
# Only count specific levels
python3 scripts/errors.py app.log --level ERROR,FATAL
# Write a Markdown report ready to paste into a ticket
python3 scripts/errors.py app.log --output report.md
# Or a JSON report for downstream tooling
python3 scripts/errors.py app.log --output report.json --bucket hour
# Or a CSV of the timeline only
python3 scripts/errors.py app.log --output timeline.csv --bucket minute
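Bucketing a timeline amounts to truncating each timestamp to the start of its minute, hour, or day and counting. The sketch below is one way to do that with the standard library; the `bucket_start` and `timeline` names are hypothetical and the real errors.py may work differently.

```python
from collections import Counter
from datetime import datetime

# Fields to zero out for each bucket size.
TRUNCATE = {
    "minute": {"second": 0, "microsecond": 0},
    "hour": {"minute": 0, "second": 0, "microsecond": 0},
    "day": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}


def bucket_start(ts, bucket):
    """Round a datetime down to the start of its bucket."""
    return ts.replace(**TRUNCATE[bucket])


def timeline(timestamps, bucket="hour"):
    """Return sorted (bucket_start, count) pairs."""
    counts = Counter(bucket_start(ts, bucket) for ts in timestamps)
    return sorted(counts.items())


stamps = [
    datetime(2026, 5, 10, 10, 0, 5),
    datetime(2026, 5, 10, 10, 0, 59),
    datetime(2026, 5, 10, 10, 1, 0),
]
for start, count in timeline(stamps, bucket="minute"):
    print(start.isoformat(), count)
```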
errors.py fingerprints messages so repeated errors that only differ in numbers / UUIDs / file-line refs collapse to one group with a count. Example: 50 occurrences of Connection timeout to 10.0.0.5 after 1234ms and Connection timeout to 10.0.0.7 after 567ms collapse into one group Connection timeout to <N>.<N>.<N>.<N> after <N>ms with count 50.
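A rough sketch of this kind of fingerprinting, for readers who want to see the mechanics. Only the `<N>` placeholder comes from the example above; the `<UUID>` and `<HEX>` placeholders, the rule list, and the `fingerprint` name are assumptions, and the real normalizer also handles file:line pairs and timestamps.

```python
import re
from collections import Counter

# Order matters: specific shapes are replaced before the generic number rule.
RULES = [
    (
        re.compile(
            r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
            r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
        ),
        "<UUID>",  # hypothetical placeholder name
    ),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),  # hypothetical placeholder name
    (re.compile(r"\d+"), "<N>"),
]


def fingerprint(message):
    """Replace the variable parts of a message with placeholders."""
    for rx, placeholder in RULES:
        message = rx.sub(placeholder, message)
    return message


messages = [
    "Connection timeout to 10.0.0.5 after 1234ms",
    "Connection timeout to 10.0.0.7 after 567ms",
    "Request 123e4567-e89b-12d3-a456-426614174000 failed",
]
groups = Counter(fingerprint(m) for m in messages)
for text, count in groups.most_common():
    print(count, text)
```

Running the UUID rule before the digit rule matters: otherwise the digits inside a UUID would be rewritten first and the UUID pattern would no longer match.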
4. Log-aware grep
# Pattern + level filter
python3 scripts/grep.py app.log --pattern "Database" --level ERROR,FATAL
# Time window
python3 scripts/grep.py app.log \
--since "2026-05-10T10:00:00Z" \
--until "2026-05-10T11:00:00Z"
# Context lines (1 before + 1 after each match)
python3 scripts/grep.py app.log --pattern "FATAL" -C 1 --with-line
# Exclude noisy lines while keeping the rest
python3 scripts/grep.py app.log --level ERROR --not-pattern "heartbeat"
# Invert: keep everything that does NOT match
python3 scripts/grep.py app.log --pattern "INFO" --invert
--since and --until accept the same timestamp formats parse.py understands: ISO 8601 (2026-05-10T10:00:00Z, 2026-05-10 10:00:00, with or without microseconds / timezone), apache-style (10/May/2026:10:00:00 +0000), and syslog (May 10 10:00:00 — current year assumed).
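The time-window filter can be pictured as below. This is a simplified sketch under several assumptions that are not taken from grep.py: it only handles ISO 8601 timestamps, expects the timestamp to be the first space-delimited token, treats timestamps without an offset as UTC, and treats both bounds as inclusive. The `parse_iso` and `in_window` names are hypothetical.

```python
from datetime import datetime, timezone


def parse_iso(text):
    """Parse an ISO 8601 string; a trailing 'Z' is read as UTC."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are treated as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def in_window(line, since=None, until=None):
    """True when the line's leading timestamp falls inside [since, until]."""
    token = line.split(" ", 1)[0]
    try:
        ts = parse_iso(token)
    except ValueError:
        return False  # no parseable timestamp: drop the line
    if since is not None and ts < since:
        return False
    if until is not None and ts > until:
        return False
    return True


since = parse_iso("2026-05-10T10:00:00Z")
until = parse_iso("2026-05-10T11:00:00Z")
lines = [
    "2026-05-10T09:59:59Z INFO warming up",
    "2026-05-10T10:30:00Z ERROR Database connection lost",
    "2026-05-10T11:30:00Z ERROR Database connection lost",
]
kept = [ln for ln in lines if in_window(ln, since, until)]
print(kept)
```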
Exit codes
| Code | Meaning |
|---|---|
| 0 | success / one or more rows / one or more matches |
| 1 | parse produced zero rows / grep found zero matches / errors found zero matching log entries |
| 2 | bad arguments / unsafe path / missing input / bad regex / unknown format / unsupported output extension |
This 0 / 1 / 2 split is consistent across all three scripts so they slot into shell pipelines cleanly:
# Parse to JSONL, then summarize errors, then print the report for pasting into a ticket
python3 scripts/parse.py raw.log structured.jsonl \
&& python3 scripts/errors.py raw.log --output ticket.md \
&& cat ticket.md
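For anyone writing a companion script that should follow the same 0 / 1 / 2 convention, a minimal sketch looks like this. The constant names and the `run` function are hypothetical and are not taken from the toolkit's source.

```python
import re
import sys

# Exit codes matching the table above (names are illustrative).
EXIT_OK, EXIT_EMPTY, EXIT_USAGE = 0, 1, 2


def run(lines, pattern):
    """Print matching lines and return the exit code for the outcome."""
    try:
        rx = re.compile(pattern)
    except re.error as exc:
        print(f"bad regex: {exc}", file=sys.stderr)
        return EXIT_USAGE
    matches = [ln for ln in lines if rx.search(ln)]
    for ln in matches:
        print(ln)
    return EXIT_OK if matches else EXIT_EMPTY


if __name__ == "__main__":
    # Usage: python3 this_script.py PATTERN < app.log
    if len(sys.argv) != 2:
        sys.exit(EXIT_USAGE)
    sys.exit(run(sys.stdin.read().splitlines(), sys.argv[1]))
```

Keeping "nothing found" (1) separate from "bad input" (2) lets a pipeline stop on real mistakes while still treating an empty result as a normal, checkable outcome.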
Safety properties
- Pure Python 3 standard library. No third-party dependencies.
- No subprocess calls. No shell invocation.
- All file paths are validated against a strict allowlist regex that rejects shell metacharacters. The same safe_path() helper used in clean-csv-toolkit and clean-text-toolkit.
- Scripts only read the input paths the caller provides and write to the output paths the caller provides.
- All inputs default to UTF-8; reads fall back through utf-8-sig, cp1252, latin-1 if needed. Writes are always UTF-8.
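Two of the properties above, allowlist path validation and the encoding fallback chain, can be sketched in a few lines. The character set in `SAFE_PATH_RE` and the `decode_with_fallback` name are assumptions; the real safe_path() may accept a different set of characters and perform extra checks.

```python
import re

# Hypothetical allowlist: letters, digits, underscore, dot, slash, hyphen.
SAFE_PATH_RE = re.compile(r"[A-Za-z0-9_./-]+")

# Fallback order as described in the list above.
ENCODINGS = ("utf-8", "utf-8-sig", "cp1252", "latin-1")


def safe_path(path):
    """Return the path unchanged, or raise ValueError on disallowed characters."""
    if not SAFE_PATH_RE.fullmatch(path):
        raise ValueError(f"unsafe path: {path!r}")
    return path


def decode_with_fallback(data):
    """Decode bytes with the first encoding that works; return (text, encoding)."""
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this line is never reached.
    raise AssertionError("unreachable")


print(safe_path("logs/app.log"))
print(decode_with_fallback(b"caf\xe9"))
```

An allowlist is the conservative choice here: anything not explicitly permitted (spaces, semicolons, backticks, `$`) is rejected rather than escaped.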
Timestamp + level detection
_common.py ships a pragmatic timestamp parser that tries the following formats in order, picking the first that matches:
2026-05-10T10:00:00.123456+00:00 (ISO 8601 with TZ + microseconds)
2026-05-10T10:00:00+00:00 (ISO 8601 with TZ)
2026-05-10T10:00:00.123Z (ISO 8601 UTC Zulu)
2026-05-10T10:00:00Z (ISO 8601 UTC Zulu)
2026-05-10T10:00:00 (ISO 8601 no TZ)
2026-05-10 10:00:00 (space-separated)
2026/05/10 10:00:00
10/May/2026:10:00:00 +0000 (apache common log)
May 10 10:00:00 (syslog, no year)
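A "try each format in order" parser along these lines might look like the sketch below. It is not the parser in _common.py: the `try_parse_timestamp` name, the strptime patterns, and the `default_year` parameter are assumptions. Note that `%b` depends on the locale for month names, and `%z` accepts both `Z` and `+00:00` on Python 3.7 and later.

```python
from datetime import datetime

# strptime patterns mirroring the list above, most specific first.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%d/%b/%Y:%H:%M:%S %z",
]
SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"


def try_parse_timestamp(text, default_year=None):
    """Return a datetime for the first format that fits, else None."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # syslog lines carry no year, so one is supplied before parsing.
    year = default_year or datetime.now().year
    try:
        return datetime.strptime(f"{year} {text}", SYSLOG_FORMAT)
    except ValueError:
        return None


for sample in ("2026-05-10T10:00:00.123Z", "10/May/2026:10:00:00 +0000", "May 10 10:00:00"):
    print(sample, "->", try_parse_timestamp(sample, default_year=2026))
```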
Levels are detected case-insensitively from these tokens and folded to canonical names: TRACE, DEBUG, INFO, NOTICE, WARN (from WARN/WARNING), ERROR (from ERROR/ERR), FATAL (from FATAL/CRITICAL/CRIT/EMERG/EMERGENCY).
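The folding rule can be expressed as a lookup table plus one case-insensitive regex. The sketch below uses the alias list from the paragraph above; the `find_level` name and the choice to return the first level token found in the line are assumptions, not a description of extract_level.

```python
import re

# Every recognized token mapped to its canonical level name.
ALIASES = {
    "TRACE": "TRACE", "DEBUG": "DEBUG", "INFO": "INFO", "NOTICE": "NOTICE",
    "WARN": "WARN", "WARNING": "WARN",
    "ERROR": "ERROR", "ERR": "ERROR",
    "FATAL": "FATAL", "CRITICAL": "FATAL", "CRIT": "FATAL",
    "EMERG": "FATAL", "EMERGENCY": "FATAL",
}

# Whole-word match, longest alternatives first, case-insensitive.
LEVEL_RE = re.compile(
    r"\b(" + "|".join(sorted(ALIASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def find_level(line):
    """Return the canonical level for the first level token, or None."""
    match = LEVEL_RE.search(line)
    return ALIASES[match.group(1).upper()] if match else None


for sample in ("2026-05-10 10:00:00 warning: disk almost full", "[crit] out of memory"):
    print(find_level(sample))
```

The word boundaries keep ordinary words such as "information" or "terror" from being mistaken for levels.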
Known limitations
- The regex-based parsers are pragmatic, not strict — they accept slightly malformed Apache / nginx / syslog lines as long as the structure is close enough.
- errors.py fingerprint normalization is a best-effort heuristic. Two semantically different errors that happen to differ only in numbers / hashes will be collapsed; if that matters, use --top with a larger N and inspect the samples.
- parse.py does not follow a live log file. For tail-follow, pipe tail -F file | ... into your own tool. If there's enough demand for a built-in follower, it will land in v0.2.
Pairs well with
- clean-csv-toolkit: pipe parse.py output (CSV / JSONL) into inspect, validate, pivot, or transform to turn raw logs into reportable tables.
- clean-text-toolkit: pair parse.py with text-toolkit/redact.py to scrub PII before sharing log dumps.
v0.1.0 changes
- First public release of clean-log-toolkit.
- Three scripts: parse.py, errors.py, grep.py.
- Shared _common.py with safe_path, iter_lines, parse_timestamp, extract_timestamp, extract_level helpers (mirrors the design of clean-csv-toolkit/scripts/_common.py and clean-text-toolkit/scripts/_common.py).
- Auto-detects 5 log formats by sniffing the first 50 lines.
- Zero third-party dependencies; works on any system that ships Python 3.
License
MIT
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install clean-log-toolkit
- After installation, invoke the skill by name or use /clean-log-toolkit
- Provide required inputs per the skill's parameter spec and get structured output
Is Clean Log Toolkit free?
Yes, Clean Log Toolkit is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Clean Log Toolkit support?
Clean Log Toolkit runs anywhere OpenClaw or Claude Code is available.
Who created Clean Log Toolkit?
It is built and maintained by gopendrasharma89-tech (@gopendrasharma89-tech); the current version is v0.1.0.