/install extract-error-patterns
Core Position
This skill extracts structured error and log data from unstructured text using pattern matching, regex extraction, and classification rules. It handles multiple log formats (JSON, syslog, Apache, Nginx, Python tracebacks, Node.js stack traces, Docker, Kubernetes) and produces structured output grouped by error type, severity, and frequency.
Key responsibilities:
- Auto-detect log format by examining structure (JSON lines, timestamp patterns, key=value patterns)
- Apply format-specific regex patterns to extract: timestamp, level, error message, stack trace, source file, line number, request ID, session ID
- Classify errors into categories: CRITICAL, ERROR, WARNING, INFO (by severity level mapping)
- Group duplicate/near-duplicate errors and report frequency counts
- Flag low-confidence extractions and ambiguous formats for manual review
Modes
/extract-error-patterns --verbose
Verbose mode. Returns every match with:
- matched_text: the exact matched text
- confidence: 0.0-1.0 based on pattern strength
- position: line number in source
- context: 2 lines before and after (surrounding log entries)
- pattern_used: which regex/rule matched this entry
Use when: debugging, doing root cause analysis, or auditing extraction quality.
/extract-error-patterns --summary
Summary mode. Returns aggregated results:
- total_errors: count
- by_severity: {CRITICAL: N, ERROR: N, WARNING: N, INFO: N}
- by_type: grouped error messages with count and examples
- top_5_errors: most frequent errors with fingerprint and first occurrence
- time_range: first to last log entry
Use when: getting a high-level overview for monitoring dashboards.
/extract-error-patterns --json
JSON output mode. Expects JSON-formatted log input. Extracts structured fields from each JSON object: timestamp, level, message, error, stack_trace, and any custom fields present.
/extract-error-patterns --stacktrace
Stack trace only mode. Focuses exclusively on extracting stack traces (exceptions) from application logs. Parses multi-line stack traces into structured records with: exception type, message, frames (file, line, function), and git commit hash if present.
/extract-error-patterns --custom
Custom pattern mode. Accepts user-provided regex patterns via --patterns flag:
--patterns "ERROR.*connection refused|CRITICAL.*out of memory" --fields error_type,detail
Extracts using the provided patterns, reports which pattern matched each result.
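As a rough illustration of custom pattern mode, the sketch below splits the --patterns string on "|" and records which alternative matched each line. The function name match_custom_patterns is hypothetical, and splitting on "|" is an assumption that breaks for patterns containing their own alternation. How --fields maps onto a match is not specified here, so the sketch leaves it out.

```python
import re


def match_custom_patterns(lines, patterns_arg):
    """Apply each '|'-separated pattern and report the first one that matches a line."""
    # Assumption: alternatives are separated by a top-level "|".
    compiled = [(source, re.compile(source)) for source in patterns_arg.split("|")]
    results = []
    for number, line in enumerate(lines, start=1):
        for source, regex in compiled:
            match = regex.search(line)
            if match:
                results.append({
                    "position": number,
                    "matched_text": match.group(0),
                    "pattern_used": source,
                })
                break  # one result per line: the first matching pattern wins
    return results
```

Reporting the pattern source string alongside each match is what lets the user audit which rule fired.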
Execution Steps
Step 1: Detect log format
Read the first 20 lines of input. Examine patterns to identify format:
JSON Lines (NDJSON):
{"timestamp":"2024-01-15T10:23:45Z","level":"error","message":"Connection refused"}
{"timestamp":"2024-01-15T10:23:46Z","level":"warn","message":"Retrying..."}
Pattern: lines start with { and are valid JSON when parsed individually.
Fields to extract: all keys from each JSON object.
Syslog (RFC 3164 / RFC 5424):
Jan 15 10:23:45 hostname sshd[1234]: Failed password for invalid user admin from 1.2.3.4 port 54321 ssh2
Pattern: ^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}
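RFC 3164 with priority prefix: <priority>timestamp hostname process[pid]: message
RFC 5424: <priority>version timestamp hostname app-name procid msgid structured-data message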
Fields: timestamp, hostname, process, pid, message.
Apache/Nginx error log:
[Thu Jan 15 10:23:45.123456 2024] [error] [client 1.2.3.4] PHP Fatal error: Uncaught Exception: DB connection failed in /var/www/html/db.php:12
[Thu Jan 15 10:23:46.789012 2024] [warn] [pid 1234] Request timeout from 1.2.3.4
Pattern: \[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]
Fields: timestamp, level (error/warn/critical), client IP, message.
Python traceback:
Traceback (most recent call last):
File "app.py", line 123, in main
db.query("SELECT * FROM users")
RuntimeError: DB connection failed
During handling of the above exception, another exception occurred:
...
Pattern: starts with Traceback (most recent call last):, contains File "..." lines.
Fields: exception type, exception message, frames (file, line, function).
Node.js stack trace:
Error: ENOENT: no such file or directory, open '/tmp/data.json'
at Object.openSync (node:fs:914:3)
at Object.readFileSync (node:fs:555:3)
at main (app.js:45:10)
Pattern: starts with Error: or ReferenceError:, contains at lines with node: or .js references.
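A minimal sketch of how a Node.js trace like the one above could be turned into a structured record. The names HEADER_RE, FRAME_RE and parse_node_stack are hypothetical, and the frame regex is an assumption covering only the `at function (file:line:column)` and `at file:line:column` shapes.

```python
import re

# Header such as "Error: ..." or "ReferenceError: ..."
HEADER_RE = re.compile(r"^(?P<type>\w*Error):\s+(?P<message>.*)$")
# Frame such as "at main (app.js:45:10)" or "at /srv/app.js:45:10"
FRAME_RE = re.compile(
    r"^\s*at\s+(?:(?P<function>.+?)\s+\()?(?P<file>.+?):(?P<line>\d+):(?P<column>\d+)\)?$"
)


def parse_node_stack(text):
    """Return {type, message, frames} for a Node.js stack trace, or None."""
    lines = text.splitlines()
    if not lines:
        return None
    header = HEADER_RE.match(lines[0])
    if header is None:
        return None
    frames = []
    for raw in lines[1:]:
        frame = FRAME_RE.match(raw)
        if frame is None:
            break  # first non-frame line ends the trace
        frames.append({
            "function": frame["function"],
            "file": frame["file"],
            "line": int(frame["line"]),
            "column": int(frame["column"]),
        })
    return {"type": header["type"], "message": header["message"], "frames": frames}
```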
Docker/Kubernetes log:
2024-01-15T10:23:45.123456789Z stdout F Application started on port 3000
2024-01-15T10:23:46.123456789Z stderr F Error: Cannot connect to database
Pattern: CRI container log format used by Kubernetes container runtimes: timestamp, stream (stdout/stderr), tag (F for a full line, P for a partial line), message.
Plain text with timestamps:
2024-01-15 10:23:45 ERROR [app] Connection to db failed: timeout after 30s
2024-01-15 10:23:46 WARN [api] Request timeout from client=1.2.3.4
Pattern: ^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → detect timestamp prefix.
If no known format detected, treat as plain text and apply generic extraction (see Step 3).
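The detection step could be sketched as a per-line vote over the first 20 lines. This is only an illustration: the function names, the format labels, and the majority-vote rule are assumptions, while the regexes mirror the patterns listed above.

```python
import json
import re
from collections import Counter

SYSLOG_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")
APACHE_RE = re.compile(r"^\[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]")
CRI_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z (stdout|stderr) [FP] ")
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def _is_json_object(line):
    if not line.startswith("{"):
        return False
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False


def classify_line(line):
    """Label a single line with the first format whose pattern it matches."""
    line = line.strip()
    if _is_json_object(line):
        return "json"
    if line.startswith("Traceback (most recent call last):"):
        return "python_traceback"
    if CRI_RE.match(line):  # checked before the generic timestamp prefix
        return "kubernetes"
    if SYSLOG_RE.match(line):
        return "syslog"
    if APACHE_RE.match(line):
        return "apache"
    if TIMESTAMP_RE.match(line):
        return "plain_timestamp"
    return "unknown"


def detect_format(lines, sample_size=20):
    """Vote over the first sample_size non-empty lines; fall back to plain text."""
    votes = Counter(classify_line(l) for l in lines[:sample_size] if l.strip())
    votes.pop("unknown", None)
    if not votes:
        return "plain_text"
    return votes.most_common(1)[0][0]
```

A vote rather than a first-line check keeps a single stray line (a banner, a blank separator) from deciding the format for the whole input.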
Step 2: Define regex patterns for format-specific extraction
For the detected format, use the matching patterns:
JSON Lines extraction:
import json, re
# Each line is a JSON object — parse directly
# Extract timestamp: try fields "timestamp", "time", "@timestamp", "date"
# Extract level: try fields "level", "severity", "log_level", "loglevel"
# Extract message: try fields "message", "msg", "text", "log"
# Extract stack_trace: try fields "stack_trace", "stack", "error.stack", "exception"
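The comments above describe a fallback lookup across alternative key names. A minimal sketch of that idea follows; the helper names are hypothetical, and treating "error.stack" as either a literal key or a dotted path into a nested object is an assumption.

```python
import json

TIMESTAMP_KEYS = ("timestamp", "time", "@timestamp", "date")
LEVEL_KEYS = ("level", "severity", "log_level", "loglevel")
MESSAGE_KEYS = ("message", "msg", "text", "log")
STACK_KEYS = ("stack_trace", "stack", "error.stack", "exception")


def _lookup(obj, key):
    """Return obj[key], or walk a dotted path such as 'error.stack'."""
    if key in obj:
        return obj[key]
    current = obj
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def _first_present(obj, keys):
    for key in keys:
        value = _lookup(obj, key)
        if value is not None:
            return value
    return None


def extract_json_line(line):
    """Parse one NDJSON line into the common fields, or report why it failed."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return {"parse_error": f"JSON parse error at char {exc.pos}", "raw": line}
    if not isinstance(obj, dict):
        return {"parse_error": "not a JSON object", "raw": line}
    return {
        "timestamp": _first_present(obj, TIMESTAMP_KEYS),
        "level": _first_present(obj, LEVEL_KEYS),
        "message": _first_present(obj, MESSAGE_KEYS),
        "stack_trace": _first_present(obj, STACK_KEYS),
    }
```

Returning a parse_error record instead of raising keeps unparseable lines visible for parsing_warnings.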
Syslog extraction:
^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$
# timestamp: month + day + time (assuming current year)
# Extract severity: look for "error", "fail", "warn" in message
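Applied in Python, the syslog pattern could be used roughly as follows. The function name parse_syslog_line, the explicit year parameter, and the keyword-to-severity mapping (error/fail to ERROR, warn to WARNING, otherwise INFO) are assumptions made for illustration.

```python
import re
from datetime import datetime

SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)
# Explicit month table avoids locale-dependent parsing of "Jan", "Feb", ...
MONTHS = {name: i for i, name in enumerate(
    ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), start=1)}


def parse_syslog_line(line, year=None):
    """Parse an RFC 3164 style line; the year is assumed because syslog omits it."""
    m = SYSLOG_RE.match(line)
    if m is None or m["month"] not in MONTHS:
        return None
    year = year if year is not None else datetime.now().year
    hour, minute, second = (int(part) for part in m["time"].split(":"))
    timestamp = datetime(year, MONTHS[m["month"]], int(m["day"]), hour, minute, second)
    lowered = m["message"].lower()
    if "error" in lowered or "fail" in lowered:
        severity = "ERROR"
    elif "warn" in lowered:
        severity = "WARNING"
    else:
        severity = "INFO"
    return {
        "timestamp": timestamp.isoformat(),
        "host": m["host"],
        "process": m["process"],
        "pid": int(m["pid"]) if m["pid"] else None,
        "message": m["message"],
        "severity": severity,
    }
```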
Python traceback extraction:
^(?P<exception_type>[\w.]*(?:Error|Exception)):\s+(?P<message>.*)$
File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?
(?:\s+\d+\s+chars\.\s+(?P<code_context>.*))?$
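To show how the frame and exception patterns combine into one record, here is a minimal sketch. The function name parse_traceback is hypothetical; it assumes standard indentation (frame and source lines indented, the exception line not) and handles only the first traceback in the text, not chained exceptions.

```python
import re

FRAME_RE = re.compile(
    r'^\s*File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?$'
)
# Assumption: exception names end in Error or Exception and may be dotted.
EXC_RE = re.compile(r"^(?P<exception_type>[A-Za-z_][\w.]*(?:Error|Exception)):\s+(?P<message>.*)$")


def parse_traceback(text):
    """Return {exception_type, message, frames} for the first traceback, or None."""
    frames = []
    in_traceback = False
    for raw in text.splitlines():
        if raw.startswith("Traceback (most recent call last):"):
            in_traceback = True
            continue
        if not in_traceback:
            continue
        frame = FRAME_RE.match(raw)
        if frame:
            frames.append({
                "file": frame["file"],
                "line": int(frame["line"]),
                "function": frame["function"],
            })
            continue
        exc = EXC_RE.match(raw)  # unindented "SomeError: message" closes the trace
        if exc:
            return {
                "exception_type": exc["exception_type"],
                "message": exc["message"],
                "frames": frames,
            }
    return None
```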
Apache error log extraction:
\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\](?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$
PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)\s+in\s+(?P<php_file>.*?)\s+on\s+line\s+(?P<php_line>\d+)
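The two Apache patterns can be chained: the first splits the line, the second looks inside the message for a PHP error. The sketch below assumes that arrangement; parse_apache_line is a hypothetical name. Note that the PHP pattern expects the `on line N` form, so a message ending in `file.php:12` yields no PHP fields.

```python
import re

APACHE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\]"
    r"(?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$"
)
PHP_RE = re.compile(
    r"PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)"
    r"\s+in\s+(?P<php_file>.*?)\s+on\s+line\s+(?P<php_line>\d+)"
)


def parse_apache_line(line):
    """Split an Apache error-log line, then enrich it with PHP fields if present."""
    m = APACHE_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    php = PHP_RE.search(record["message"])
    if php:
        record.update(php.groupdict())
    return record
```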
Generic timestamp + level + message:
^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s+(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$
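A short sketch of the generic pattern in use, with a hypothetical helper name parse_generic_line that returns the named groups as a dict.

```python
import re

GENERIC_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s+"
    r"(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+"
    r"(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$"
)


def parse_generic_line(line):
    """Return timestamp, level, component and message, or None if the line does not match."""
    m = GENERIC_RE.match(line)
    return m.groupdict() if m else None
```

The component group is optional, so lines without a bracketed component still parse and report component as None.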
Step 3: Extract structured data
For each log line:
- Apply format-specific pattern
- If no match, try next format pattern (in order of detection confidence)
- If no pattern matches, try generic multi-pattern extraction:
  - Look for ERROR|FATAL|CRITICAL|WARN → severity
  - Look for \d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → timestamp
  - Look for at [module.]function [(]file:line[)] → stack trace frame
  - Use remaining text as message
Collect: line_number, raw_line, extracted_fields, pattern_matched, confidence.
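The generic fallback could look something like the sketch below. The function name extract_generic and the exact frame regex are assumptions; confidence is left out because it is assigned in Step 4.

```python
import re

SEVERITY_RE = re.compile(r"\b(ERROR|FATAL|CRITICAL|WARN(?:ING)?)\b")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
# Assumed shape: "at [module.]function (file:line" with the parenthesis optional
FRAME_RE = re.compile(r"\bat\s+(?P<function>[\w.$<>]+)\s*\(?(?P<file>[^\s():]+):(?P<line>\d+)")


def extract_generic(line_number, raw_line):
    """Best-effort extraction for a line that no format-specific pattern matched."""
    fields = {}
    remaining = raw_line
    timestamp = TIMESTAMP_RE.search(remaining)
    if timestamp:
        fields["timestamp"] = timestamp.group(0)
        remaining = remaining.replace(timestamp.group(0), "", 1)
    severity = SEVERITY_RE.search(remaining)
    if severity:
        fields["severity"] = severity.group(1)
        remaining = remaining.replace(severity.group(0), "", 1)
    frame = FRAME_RE.search(remaining)
    if frame:
        fields["stack_frame"] = {
            "function": frame["function"],
            "file": frame["file"],
            "line": int(frame["line"]),
        }
    # Whatever is left after removing timestamp and severity becomes the message.
    fields["message"] = " ".join(remaining.split())
    return {
        "line_number": line_number,
        "raw_line": raw_line,
        "extracted_fields": fields,
        "pattern_matched": "generic",
    }
```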
Step 4: Classify and assign confidence
Severity classification:
- CRITICAL: CRITICAL, FATAL, EMERGENCY, ALERT (or exit code 1 + stack trace)
- ERROR: ERROR, ERR, SEVERE
- WARNING: WARNING, WARN, CAUTION
- INFO: INFO, INFORMATION, NOTICE
- DEBUG: DEBUG, TRACE, VERBOSE
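The severity mapping lends itself to a lookup table. In this sketch, classify_severity and its parameters are hypothetical names; returning None for an unknown level is an assumption, as is reading "exit code 1 + stack trace" as an override that applies regardless of the raw level.

```python
SEVERITY_MAP = {
    "CRITICAL": {"CRITICAL", "FATAL", "EMERGENCY", "ALERT"},
    "ERROR": {"ERROR", "ERR", "SEVERE"},
    "WARNING": {"WARNING", "WARN", "CAUTION"},
    "INFO": {"INFO", "INFORMATION", "NOTICE"},
    "DEBUG": {"DEBUG", "TRACE", "VERBOSE"},
}
# Invert to raw token -> canonical severity for O(1) lookup
_LOOKUP = {raw: canonical for canonical, raws in SEVERITY_MAP.items() for raw in raws}


def classify_severity(raw_level, exit_code=None, has_stack_trace=False):
    """Map a raw level token to a canonical severity, or None if unrecognized."""
    if exit_code == 1 and has_stack_trace:
        return "CRITICAL"
    if raw_level is None:
        return None
    return _LOOKUP.get(raw_level.strip().upper())
```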
Confidence scoring:
| Signal | Score modifier |
|---|---|
| Matches format-specific regex perfectly | +0.3 |
| Has structured fields (timestamp, level, message all present) | +0.2 |
| Has stack trace (multi-line exception) | +0.2 |
| Has request/trace ID for correlation | +0.1 |
| Single-line generic (no structured fields) | -0.2 |
| Ambiguous severity (e.g., "Error" in user text, not log level) | -0.3 |
| Cannot determine timestamp | -0.2 |
Final confidence: base 0.5 + modifiers, clamped to [0.0, 1.0].
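The scoring rule above reduces to a sum and a clamp. A minimal sketch, where the signal key names are hypothetical labels for the table rows and rounding to two decimals is an added assumption:

```python
MODIFIERS = {
    "format_regex_match": 0.3,
    "structured_fields": 0.2,
    "stack_trace": 0.2,
    "correlation_id": 0.1,
    "single_line_generic": -0.2,
    "ambiguous_severity": -0.3,
    "missing_timestamp": -0.2,
}


def score_confidence(signals, base=0.5):
    """Base score plus the modifier of each distinct signal, clamped to [0.0, 1.0]."""
    total = base + sum(MODIFIERS[name] for name in set(signals))
    return round(min(1.0, max(0.0, total)), 2)
```

For example, a line that matches its format regex and carries a request ID scores 0.5 + 0.3 + 0.1 = 0.9, while a generic single-line match with no timestamp scores 0.5 - 0.2 - 0.2 = 0.1.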
Step 5: Group and deduplicate
Error fingerprinting: Create a fingerprint for each error to group duplicates:
fingerprint = hash(exception_type + first line of message (first 100 chars) + file name from stack trace)
Group errors with same fingerprint. Report:
- count: number of occurrences
- first_seen: timestamp of first occurrence
- last_seen: timestamp of last occurrence
- examples: first 3 raw error messages
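Fingerprinting and grouping might be sketched as follows. Several choices here are assumptions: SHA-256 as the hash, a "|" separator between the parts, a 12-character digest, the input dict keys, and keying groups on severity plus fingerprint so that a WARNING and an ERROR with the same message stay separate. Timestamps are assumed to be ISO 8601 strings in one consistent format, so string comparison orders them.

```python
import hashlib


def fingerprint(exception_type, message, file_name):
    """Hash of exception type, first 100 chars of the first message line, and file name."""
    first_line = message.splitlines()[0][:100] if message else ""
    key = "|".join([exception_type or "", first_line, file_name or ""])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def group_errors(errors):
    """Group error dicts by (severity, fingerprint) and summarize each group."""
    groups = {}
    for error in errors:
        fp = fingerprint(error.get("exception_type"), error.get("message", ""), error.get("file"))
        group = groups.setdefault((error["severity"], fp), {
            "fingerprint": fp,
            "severity": error["severity"],
            "count": 0,
            "first_seen": error["timestamp"],
            "last_seen": error["timestamp"],
            "examples": [],
        })
        group["count"] += 1
        group["first_seen"] = min(group["first_seen"], error["timestamp"])
        group["last_seen"] = max(group["last_seen"], error["timestamp"])
        if len(group["examples"]) < 3:  # keep only the first 3 raw messages
            group["examples"].append(error["raw"])
    return list(groups.values())
```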
Near-duplicate detection: If two errors have > 90% similarity in message text (Levenshtein distance), flag as related:
{"related_to": "error_id_123", "similarity": 0.94, "difference": "timestamp value changed"}
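One way to compute the similarity is to normalize Levenshtein distance by the longer message's length. That normalization, the helper names, and the pairwise comparison against earlier errors are assumptions in this sketch; the pairwise loop is quadratic, so it suits grouped messages rather than raw lines.

```python
def levenshtein(a, b):
    """Edit distance between two strings using a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_related(errors, threshold=0.9):
    """Link each error to the first earlier error whose message is > threshold similar."""
    related = []
    for index, later in enumerate(errors):
        for earlier in errors[:index]:
            score = similarity(earlier["message"], later["message"])
            if threshold < score < 1.0:  # exact duplicates are handled by fingerprinting
                related.append({
                    "id": later["id"],
                    "related_to": earlier["id"],
                    "similarity": round(score, 2),
                })
                break
    return related
```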
Step 6: Produce output
Verbose mode output:
{
"total_lines": 4821,
"total_extracted": 127,
"by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
"extractions": [
{
"id": "err_001",
"line_number": 142,
"severity": "ERROR",
"timestamp": "2024-01-15T10:23:45Z",
"message": "Connection refused: ECONNREFUSED 1.2.3.4:5432",
"fingerprint": "a3f7c...",
"confidence": 0.92,
"pattern_used": "syslog_generic",
"context": {
"before": ["<raw text of line 140>", "<raw text of line 141>"],
"after": ["<raw text of line 143>", "<raw text of line 144>"]
},
"stack_trace": null,
"request_id": "req_abc123",
"source": "app.log"
}
],
"grouped_errors": [
{
"fingerprint": "a3f7c...",
"count": 12,
"first_seen": "2024-01-15T10:23:00Z",
"last_seen": "2024-01-15T11:45:00Z",
"type": "Connection refused",
"examples": ["...", "...", "..."]
}
],
"format_detected": "syslog",
"parsing_warnings": [
{"line": 87, "issue": "JSON parse error at char 234 — truncated", "raw": "{\"timestamp\":"}
]
}
Summary mode output:
{
"total_errors": 127,
"by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
"by_type": [
{"type": "Connection refused", "count": 23, "severity": "ERROR", "example": "ECONNREFUSED 1.2.3.4:5432"},
{"type": "Timeout", "count": 18, "severity": "ERROR", "example": "Request timeout after 30s"},
{"type": "Out of memory", "count": 3, "severity": "CRITICAL", "example": "FATAL: out of memory (OOM)"}
],
"top_5_errors": [...],
"time_range": {"first": "2024-01-15T10:00:00Z", "last": "2024-01-15T12:00:00Z"},
"format_detected": "syslog"
}
Mandatory Rules
Do not
- Do not suppress parsing failures: if a line cannot be parsed, report it in parsing_warnings with the raw content
- Do not assign high confidence to errors extracted from ambiguous or inconsistent patterns (e.g., "Error: foo" in user text vs log level)
- Do not extract PII (IP addresses, usernames, email addresses in error messages) without explicit user confirmation
- Do not apply patterns that were not detected in the input format — do not assume JSON format if input is plain text
- Do not silently truncate long input — if input exceeds 10MB or 50K lines, report truncation with count
- Do not deduplicate errors across different severity levels — a WARNING and an ERROR with same message are different
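As one possible way to honor the PII rule, the sketch below masks IPv4 addresses and email addresses unless the caller opts in. Everything here is an assumption: the function name, the include_pii parameter (modeled on the --include-pii flag mentioned in the examples table), the placeholder tokens, and the two simple regexes. Usernames are not covered.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact_pii(message, include_pii=False):
    """Return (message, kinds_found); values are masked unless include_pii is True."""
    if include_pii:
        return message, []
    found = []

    def _mask(kind):
        def replace(match):
            found.append(kind)  # record only the kind, never the matched value
            return f"[REDACTED_{kind}]"
        return replace

    message = EMAIL_RE.sub(_mask("EMAIL"), message)
    message = IPV4_RE.sub(_mask("IP"), message)
    return message, found
```

The list of kinds found can feed a warning to the user without echoing the sensitive value itself.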
Do
- Report format_detected with every output so the user knows how the log was interpreted
- Include line_number and context (surrounding lines) for every extraction so the user can verify
- Group duplicate errors and report frequency; a repeated error is more likely to be the root cause
- Flag low-confidence extractions (confidence < 0.6) with needs_manual_review: true
- Detect and report mixed formats (e.g., JSON lines + plain text) and handle each line with the appropriate parser
- Include the pattern/rule used for each extraction in verbose mode so the user can audit
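The review flag from the list above is a one-line check per extraction. A minimal sketch, with flag_for_review as a hypothetical name and each extraction assumed to be a dict carrying a confidence value:

```python
def flag_for_review(extractions, threshold=0.6):
    """Return copies of the extractions, marking those below the confidence threshold."""
    flagged = []
    for extraction in extractions:
        item = dict(extraction)  # copy so the caller's records are not mutated
        if item["confidence"] < threshold:
            item["needs_manual_review"] = True
        flagged.append(item)
    return flagged
```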
Quality Bar
| Criterion | Minimum | Ideal |
|---|---|---|
| Format detection accuracy | Correct format detected for known log types | Auto-detect and parse mixed-format logs |
| Extraction coverage | >= 95% of identifiable error lines | 100% with confidence score per line |
| False positive rate | < 5% of extractions wrong or misclassified | < 1% with manual review flagging |
| Confidence calibration | Low (< 0.6) items flagged for review | All extractions pre-screened, high confidence only |
| Context preserved | Every extraction has source line_number | Source line + 2 context lines before and after |
| Deduplication | Exact duplicates removed | Near-duplicates (90%+) flagged as related |
| Error grouping | Errors grouped by fingerprint | Grouped by root cause type + severity |
A good extraction result identifies the log format correctly, extracts every error with a confidence score, preserves source location, groups duplicate errors, and clearly flags low-confidence or ambiguous entries for manual review.
Good vs. Bad Examples
| Scenario | Bad | Good |
|---|---|---|
| Format detection | Assumes all logs are JSON, fails on plain text | Auto-detects syslog/plain/json and applies appropriate parser |
| Low confidence match | Marks as high confidence without flagging | Reports "Extraction 23: confidence 0.45 — flagged for manual review" |
| Mixed format | Stops at first parse error | Continues with warnings: "12 lines parsed as syslog, 3 lines skipped (unrecognized format)" |
| Duplicate errors | Reports each occurrence separately | Groups 50 identical "Connection refused" errors with count=50 and first/last timestamps |
| Stack trace split | Only extracts first line of 20-line trace | Extracts entire trace as one record: type + message + all 20 frames |
| Truncation | Silently stops at 10K lines | Reports "Input truncated at line 50000 (limit) — processed 50K of 120K lines" |
| PII in error | Extracts IP addresses without warning | Flags: "IP address 1.2.3.4 detected in error message — redact unless --include-pii is set" |
| Unparseable line | Drops line silently | Reports "Line 847: unparseable — raw: {broken json..." in parsing_warnings |
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install extract-error-patterns
- After installation, invoke the skill by name or use /extract-error-patterns
- Provide required inputs per the skill's parameter spec and get structured output
What is Extract Error Patterns?
Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col... It is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far.
How do I install Extract Error Patterns?
Run "/install extract-error-patterns" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Extract Error Patterns free?
Yes, Extract Error Patterns is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Extract Error Patterns support?
Extract Error Patterns is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Extract Error Patterns?
It is built and maintained by 王继鹏 (@wangjipeng977); the current version is v1.0.0.