Extract Error Patterns

Name: Extract Error Patterns
Author: wangjipeng977

Description

Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col...

README (SKILL.md)

Core Position

This skill extracts structured error and log data from unstructured text using pattern matching, regex extraction, and classification rules. It handles multiple log formats (JSON, syslog, Apache, Nginx, Python tracebacks, Node.js stack traces, Docker, Kubernetes) and produces structured output grouped by error type, severity, and frequency.

Key responsibilities:

Auto-detect log format by examining structure (JSON lines, timestamp patterns, key=value patterns)
Apply format-specific regex patterns to extract: timestamp, level, error message, stack trace, source file, line number, request ID, session ID
Classify errors into categories: CRITICAL, ERROR, WARNING, INFO (by severity level mapping)
Group duplicate/near-duplicate errors and report frequency counts
Flag low-confidence extractions and ambiguous formats for manual review

Modes

`/extract-error-patterns --verbose`

Verbose mode. Returns every match with:

matched_text: the exact matched text
confidence: 0.0-1.0 based on pattern strength
position: line number in source
context: 2 lines before and after (surrounding log entries)
pattern_used: which regex/rule matched this entry

Use when: debugging, doing root cause analysis, or auditing extraction quality.

`/extract-error-patterns --summary`

Summary mode. Returns aggregated results:

total_errors: count
by_severity: {CRITICAL: N, ERROR: N, WARNING: N, INFO: N}
by_type: grouped error messages with count and examples
top_5_errors: most frequent errors with fingerprint and first occurrence
time_range: first to last log entry

Use when: getting a high-level overview for monitoring dashboards.

`/extract-error-patterns --json`

JSON output mode. Expects JSON-formatted log input. Extracts structured fields from each JSON object: timestamp, level, message, error, stack_trace, and any custom fields present.

`/extract-error-patterns --stacktrace`

Stack trace only mode. Focuses exclusively on extracting stack traces (exceptions) from application logs. Parses multi-line stack traces into structured records with: exception type, message, frames (file, line, function), and git commit hash if present.

`/extract-error-patterns --custom`

Custom pattern mode. Accepts user-provided regex patterns via the --patterns flag:

--patterns "ERROR.*connection refused|CRITICAL.*out of memory" --fields error_type,detail

Extracts with the provided patterns and reports which pattern matched each result.

Execution Steps

Step 1: Detect log format

Read the first 20 lines of input. Examine patterns to identify format:

JSON Lines (NDJSON):

{"timestamp":"2024-01-15T10:23:45Z","level":"error","message":"Connection refused"}
{"timestamp":"2024-01-15T10:23:46Z","level":"warn","message":"Retrying..."}

Pattern: lines start with { and are valid JSON when parsed individually. Fields to extract: all keys from each JSON object.

Syslog (RFC 3164 / RFC 5424):

Jan 15 10:23:45 hostname sshd[1234]: Failed password for invalid user admin from 1.2.3.4 port 54321 ssh2

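Pattern (RFC 3164): ^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}
RFC 5424 lines instead begin with <priority>version and an ISO 8601 timestamp, followed by hostname, app-name, procid, msgid, structured data, and the message.
Fields: timestamp, hostname, process, pid, message.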

Apache/Nginx error log:

[Thu Jan 15 10:23:45.123456 2024] [error] [client 1.2.3.4] PHP Fatal error: Uncaught Exception: DB connection failed in /var/www/html/db.php:12
[Thu Jan 15 10:23:46.789012 2024] [warn] [pid 1234] Request timeout from 1.2.3.4

Pattern: \[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]
Fields: timestamp, level (error/warn/critical), client IP, message.

Python traceback:

Traceback (most recent call last):
  File "app.py", line 123, in main
    db.query("SELECT * FROM users")
RuntimeError: DB connection failed

During handling of the above exception, another exception occurred:
...

Pattern: starts with Traceback (most recent call last):, contains File "..." lines. Fields: exception type, exception message, frames (file, line, function).

Node.js stack trace:

Error: ENOENT: no such file or directory, open '/tmp/data.json'
    at Object.openSync (node:fs:914:3)
    at Object.readFileSync (node:fs:555:3)
    at main (app.js:45:10)

Pattern: starts with an error class name and a colon (Error:, TypeError:, ReferenceError:, and so on), followed by indented at lines that reference node: modules or .js files.
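A Node.js trace like the one above can be turned into a structured record roughly as follows. This is a minimal sketch, not the skill's actual implementation: `HEAD_RE`, `NODE_FRAME_RE`, and `parse_node_trace` are hypothetical names, and the frame regex only covers the `at function (file:line:column)` and `at file:line:column` shapes.

```python
import re

# Header line: "<ErrorName>: <message>" (hypothetical pattern for illustration).
HEAD_RE = re.compile(r"^(?P<type>[A-Za-z_]\w*):\s+(?P<message>.*)$")
# Frame line: "    at fn (file:line:col)" or "    at file:line:col".
NODE_FRAME_RE = re.compile(
    r"^\s+at\s+(?:(?P<function>.+?)\s+\()?"
    r"(?P<file>[^()\s]+?):(?P<line>\d+):(?P<column>\d+)\)?$"
)

def parse_node_trace(lines):
    """Return {type, message, frames} for one trace, or None if it does not look like one."""
    head = HEAD_RE.match(lines[0]) if lines else None
    if head is None:
        return None
    frames = []
    for line in lines[1:]:
        m = NODE_FRAME_RE.match(line)
        if m is None:
            break  # first non-frame line ends the trace
        frames.append({
            "function": m["function"],
            "file": m["file"],
            "line": int(m["line"]),
            "column": int(m["column"]),
        })
    if not frames:
        return None
    return {"type": head["type"], "message": head["message"], "frames": frames}
```

Requiring at least one frame keeps ordinary `Name: text` lines from being reported as stack traces.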

Docker/Kubernetes log:

2024-01-15T10:23:45.123456789Z stdout F Application started on port 3000
2024-01-15T10:23:46.123456789Z stderr F Error: Cannot connect to database

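Pattern: the CRI container log format used by Kubernetes: timestamp, stream (stdout or stderr), tag (F for a full line, P for a partial line), message. Docker's default json-file driver writes JSON objects instead, which the JSON Lines detection covers.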

Plain text with timestamps:

2024-01-15 10:23:45 ERROR [app] Connection to db failed: timeout after 30s
2024-01-15 10:23:46 WARN  [api] Request timeout from client=1.2.3.4

Pattern: ^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → detect timestamp prefix.

If no known format detected, treat as plain text and apply generic extraction (see Step 3).
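The detection step can be sketched as a vote over the first 20 non-empty lines. This is an illustrative sketch under assumptions: the format names, the `LINE_PATTERNS` table, and `detect_format` are hypothetical, and the regexes are simplified versions of the patterns listed above.

```python
import json
import re

# Hypothetical detector table, checked in order; the Kubernetes entry must come
# before the plain timestamp entry because its lines also start with a timestamp.
LINE_PATTERNS = [
    ("syslog", re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")),
    ("apache_error", re.compile(r"^\[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[\w+\]")),
    ("kubernetes", re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z (?:stdout|stderr) [FP] ")),
    ("python_traceback", re.compile(r"^Traceback \(most recent call last\):")),
    ("plain_timestamp", re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")),
]

def is_json_object(line):
    """True when the line parses on its own as a JSON object."""
    if not line.startswith("{"):
        return False
    try:
        return isinstance(json.loads(line), dict)
    except json.JSONDecodeError:
        return False

def detect_format(text, sample_size=20):
    """Return the most common format among the first sample_size non-empty lines."""
    votes = {}
    sample = [ln for ln in text.splitlines() if ln.strip()][:sample_size]
    for line in sample:
        if is_json_object(line):
            name = "json_lines"
        else:
            name = next((n for n, p in LINE_PATTERNS if p.match(line)), None)
        if name:
            votes[name] = votes.get(name, 0) + 1
    if not votes:
        return "plain_text"  # fall back to generic extraction
    return max(votes, key=votes.get)
```

Voting rather than trusting the first line makes the result less sensitive to a stray banner or blank header at the top of the file.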

Step 2: Define regex patterns for format-specific extraction

For the detected format, apply the matching patterns:

JSON Lines extraction:

import json, re
# Each line is a JSON object — parse directly
# Extract timestamp: try fields "timestamp", "time", "@timestamp", "date"
# Extract level: try fields "level", "severity", "log_level", "loglevel"
# Extract message: try fields "message", "msg", "text", "log"
# Extract stack_trace: try fields "stack_trace", "stack", "error.stack", "exception"
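The alias lookup described in those comments could look like the sketch below. The helper names (`FIELD_ALIASES`, `lookup`, `extract_json_line`) and the shape of the returned warning are assumptions for illustration; treating `error.stack` as a dotted path into a nested object is also an assumption.

```python
import json

# Aliases taken from the comments above; the first alias present wins.
FIELD_ALIASES = {
    "timestamp": ["timestamp", "time", "@timestamp", "date"],
    "level": ["level", "severity", "log_level", "loglevel"],
    "message": ["message", "msg", "text", "log"],
    "stack_trace": ["stack_trace", "stack", "error.stack", "exception"],
}

def lookup(record, key):
    """Return record[key], or follow a dotted path such as 'error.stack' into nested objects."""
    if key in record:
        return record[key]
    node = record
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def extract_json_line(line, line_number):
    """Parse one NDJSON line. Returns (fields, None) on success or (None, warning) on failure."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        issue = f"JSON parse error at char {exc.pos}"
        return None, {"line": line_number, "issue": issue, "raw": line}
    if not isinstance(record, dict):
        return None, {"line": line_number, "issue": "not a JSON object", "raw": line}
    fields = {}
    for name, aliases in FIELD_ALIASES.items():
        values = (lookup(record, alias) for alias in aliases)
        fields[name] = next((v for v in values if v is not None), None)
    return fields, None
```

Returning the warning instead of raising keeps unparseable lines visible in parsing_warnings rather than dropping them.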

Syslog extraction:

^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$
# timestamp: month + day + time (assuming current year)
# Extract severity: look for "error", "fail", "warn" in message
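Applied in Python, the syslog pattern and the two comment rules might be combined as below. This is a sketch under assumptions: `parse_syslog` is a hypothetical helper, the severity keywords are mapped to ERROR/WARNING/INFO as one plausible reading of the comment, and the year is passed in because RFC 3164 timestamps do not carry one.

```python
import re
from datetime import datetime

SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)

def parse_syslog(line, year=None):
    """Parse one RFC 3164 style line into a dict, or return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    f = m.groupdict()
    year = year or datetime.now().year  # the log line itself has no year
    ts = datetime.strptime(f"{year} {f['month']} {f['day']} {f['time']}", "%Y %b %d %H:%M:%S")
    lowered = f["message"].lower()
    if "error" in lowered or "fail" in lowered:
        severity = "ERROR"
    elif "warn" in lowered:
        severity = "WARNING"
    else:
        severity = "INFO"
    return {
        "timestamp": ts.isoformat(),
        "host": f["host"],
        "process": f["process"],
        "pid": int(f["pid"]) if f["pid"] else None,
        "severity": severity,
        "message": f["message"],
    }
```

Note that `%b` parses month abbreviations according to the active locale, so this assumes English month names and an English or C locale.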

Python traceback extraction:

^(?P<exception_type>(?:\w+\.)*\w*(?:Error|Exception)):\s+(?P<message>.*)$
File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?
^\s{4}(?P<code_context>\S.*)$
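A multi-line traceback can be folded into one record along these lines. This is a sketch, not the skill's own parser: `parse_traceback` is a hypothetical helper, the exception regex only recognises type names ending in Error or Exception (optionally module-qualified), and it stops at the first exception line, so chained exceptions would need a second pass.

```python
import re

FRAME_RE = re.compile(
    r'^\s*File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?$'
)
# Assumption: exception types end in "Error" or "Exception", optionally module-qualified.
EXC_RE = re.compile(r"^(?P<exception_type>(?:\w+\.)*\w*(?:Error|Exception)):\s+(?P<message>.*)$")

def parse_traceback(lines):
    """Collect frames until the first exception line; return one structured record or None."""
    frames = []
    for line in lines:
        frame = FRAME_RE.match(line)
        if frame:
            frames.append({
                "file": frame["file"],
                "line": int(frame["line"]),
                "function": frame["function"],
                "code": None,
            })
        elif line.startswith("    ") and frames and frames[-1]["code"] is None:
            # The indented line after a File line is the source code context.
            frames[-1]["code"] = line.strip()
        else:
            exc = EXC_RE.match(line)
            if exc and frames:
                return {
                    "exception_type": exc["exception_type"],
                    "message": exc["message"],
                    "frames": frames,
                }
    return None
```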

Apache error log extraction:

\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\](?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$
PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)\s+in\s+(?P<php_file>\S+?)(?::|\s+on\s+line\s+)(?P<php_line>\d+)
# php_file / php_line: accepts both "in FILE:LINE" (as in the sample line in Step 1) and "in FILE on line LINE"
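The two Apache patterns are meant to be applied in sequence: first the line-level pattern, then the PHP pattern against the extracted message. A minimal sketch follows; `parse_apache_error` is a hypothetical helper, and the PHP regex here is widened (an assumption) to accept both the `in FILE:LINE` form from the sample line in Step 1 and the `in FILE on line LINE` form.

```python
import re

APACHE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\]"
    r"(?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$"
)
# Assumption: accept "in FILE:LINE" as well as "in FILE on line LINE".
PHP_RE = re.compile(
    r"PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)"
    r"\s+in\s+(?P<php_file>\S+?)(?::|\s+on\s+line\s+)(?P<php_line>\d+)"
)

def parse_apache_error(line):
    """Return the line-level fields, plus php_* fields when the message is a PHP error."""
    m = APACHE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    php = PHP_RE.search(fields["message"])
    if php:
        fields.update(php.groupdict())
    return fields
```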

Generic timestamp + level + message:

^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s+(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$

Step 3: Extract structured data

For each log line:

Apply format-specific pattern
If no match, try next format pattern (in order of detection confidence)
If no pattern matches, try generic multi-pattern extraction:
- Look for ERROR|FATAL|CRITICAL|WARN → severity
- Look for \d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → timestamp
- Look for at [module.]function [(]file:line[)] → stack trace frame
- Use remaining text as message

Collect: line_number, raw_line, extracted_fields, pattern_matched, confidence.
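The per-line loop in this step, including the generic fallback, can be sketched as follows. Everything here is illustrative: `extract_line` and the three fallback regexes are hypothetical, and `patterns` is assumed to be an ordered list of `(name, compiled_regex)` pairs sorted by detection confidence.

```python
import re

SEVERITY_RE = re.compile(r"\b(?:ERROR|FATAL|CRITICAL|WARN(?:ING)?)\b")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
# Simplified frame heuristic: "at function (file:line" with an optional parenthesis.
FRAME_RE = re.compile(r"\bat\s+(?P<function>[\w.$<>]+)\s*\(?(?P<file>[^\s():]+):(?P<line>\d+)")

def extract_line(line, line_number, patterns):
    """Try each format pattern in order, then fall back to generic multi-pattern extraction."""
    base = {"line_number": line_number, "raw_line": line}
    for name, pattern in patterns:
        m = pattern.match(line)
        if m:
            return {**base, "extracted_fields": m.groupdict(), "pattern_matched": name}
    fields, message = {}, line
    for key, regex in (("timestamp", TIMESTAMP_RE), ("severity", SEVERITY_RE)):
        m = regex.search(message)
        if m:
            fields[key] = m.group(0)
            # Cut the matched token out so that only the message text remains.
            message = message[:m.start()] + message[m.end():]
    frame = FRAME_RE.search(line)
    if frame:
        fields["frame"] = {
            "function": frame["function"],
            "file": frame["file"],
            "line": int(frame["line"]),
        }
    fields["message"] = " ".join(message.split())
    return {**base, "extracted_fields": fields, "pattern_matched": "generic_fallback"}
```

The confidence value is left out here; it would be attached in Step 4 once the signals for the line are known.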

Step 4: Classify and assign confidence

Severity classification:

CRITICAL: CRITICAL, FATAL, EMERGENCY, ALERT (or exit code 1 + stack trace)
ERROR: ERROR, ERR, SEVERE
WARNING: WARNING, WARN, CAUTION
INFO: INFO, INFORMATION, NOTICE
DEBUG: DEBUG, TRACE, VERBOSE

Confidence scoring:

| Signal | Score modifier |
| --- | --- |
| Matches format-specific regex perfectly | +0.3 |
| Has structured fields (timestamp, level, message all present) | +0.2 |
| Has stack trace (multi-line exception) | +0.2 |
| Has request/trace ID for correlation | +0.1 |
| Single-line generic (no structured fields) | -0.2 |
| Ambiguous severity (e.g., "Error" in user text, not log level) | -0.3 |
| Cannot determine timestamp | -0.2 |

Final confidence: base 0.5 + modifiers, clamped to [0.0, 1.0].
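The severity mapping and the scoring rule translate fairly directly into code. The sketch below uses hypothetical names (`SEVERITY_MAP`, `MODIFIERS`, `classify`, `confidence`) and hypothetical signal keys; the numeric values are the ones from the table above.

```python
SEVERITY_MAP = {
    "CRITICAL": {"CRITICAL", "FATAL", "EMERGENCY", "ALERT"},
    "ERROR": {"ERROR", "ERR", "SEVERE"},
    "WARNING": {"WARNING", "WARN", "CAUTION"},
    "INFO": {"INFO", "INFORMATION", "NOTICE"},
    "DEBUG": {"DEBUG", "TRACE", "VERBOSE"},
}

# Signal keys are made up for this sketch; the modifiers come from the table.
MODIFIERS = {
    "format_regex_match": 0.3,
    "structured_fields": 0.2,
    "stack_trace": 0.2,
    "correlation_id": 0.1,
    "single_line_generic": -0.2,
    "ambiguous_severity": -0.3,
    "no_timestamp": -0.2,
}

def classify(level):
    """Map a raw level token to a severity class, or None if it is unknown."""
    token = (level or "").strip().upper()
    for severity, aliases in SEVERITY_MAP.items():
        if token in aliases:
            return severity
    return None

def confidence(signals, base=0.5):
    """Base score plus the modifier of each observed signal, clamped to [0.0, 1.0]."""
    score = base + sum(MODIFIERS[s] for s in signals)
    return round(min(1.0, max(0.0, score)), 2)
```

For example, a line that matches its format regex and carries a request ID scores 0.5 + 0.3 + 0.1 = 0.9, while a generic single-line match with no timestamp scores 0.5 - 0.2 - 0.2 = 0.1 and would be flagged for review.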

Step 5: Group and deduplicate

Error fingerprinting: Create a fingerprint for each error to group duplicates:

fingerprint = hash(exception_type + first line of message (first 100 chars) + file name from stack trace)

Group errors with same fingerprint. Report:

count: number of occurrences
first_seen: timestamp of first occurrence
last_seen: timestamp of last occurrence
examples: first 3 raw error messages

Near-duplicate detection: If two errors have message texts that are more than 90% similar (for example, Levenshtein distance normalized by the length of the longer message), flag them as related:

{"related_to": "error_id_123", "similarity": 0.94, "difference": "timestamp value changed"}
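Fingerprinting, grouping, and the similarity check could be implemented roughly as below. This is a sketch under assumptions: SHA-256 truncated to 12 hex characters stands in for the unspecified hash, similarity is taken as 1 minus the edit distance divided by the longer length, timestamps are assumed to be ISO 8601 strings in one timezone so that string comparison orders them, and groups are keyed by severity as well as fingerprint so that entries of different severity are never merged.

```python
import hashlib

def fingerprint(exception_type, message, file_name):
    """Hash of exception type + first 100 chars of the first message line + file name."""
    first_line = message.splitlines()[0][:100] if message else ""
    key = "|".join([exception_type or "", first_line, file_name or ""])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

def group_errors(errors):
    """Group by (severity, fingerprint); keep count, first/last seen, and up to 3 examples."""
    groups = {}
    for e in errors:
        fp = fingerprint(e.get("exception_type"), e["message"], e.get("file"))
        g = groups.setdefault((e["severity"], fp), {
            "fingerprint": fp, "severity": e["severity"], "count": 0,
            "first_seen": e["timestamp"], "last_seen": e["timestamp"], "examples": [],
        })
        g["count"] += 1
        g["first_seen"] = min(g["first_seen"], e["timestamp"])
        g["last_seen"] = max(g["last_seen"], e["timestamp"])
        if len(g["examples"]) < 3:
            g["examples"].append(e["message"])
    return list(groups.values())
```

The pairwise similarity check is quadratic in the number of distinct messages, so in practice it would likely be run on group representatives rather than on every raw line.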

Step 6: Produce output

Verbose mode output:

{
  "total_lines": 4821,
  "total_extracted": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "extractions": [
    {
      "id": "err_001",
      "line_number": 142,
      "severity": "ERROR",
      "timestamp": "2024-01-15T10:23:45Z",
      "message": "Connection refused: ECONNREFUSED 1.2.3.4:5432",
      "fingerprint": "a3f7c...",
      "confidence": 0.92,
      "pattern_used": "syslog_generic",
      "context": {
        "before": ["<raw text of line 140>", "<raw text of line 141>"],
        "after": ["<raw text of line 143>", "<raw text of line 144>"]
      },
      "stack_trace": null,
      "request_id": "req_abc123",
      "source": "app.log"
    }
  ],
  "grouped_errors": [
    {
      "fingerprint": "a3f7c...",
      "count": 12,
      "first_seen": "2024-01-15T10:23:00Z",
      "last_seen": "2024-01-15T11:45:00Z",
      "type": "Connection refused",
      "examples": ["...", "...", "..."]
    }
  ],
  "format_detected": "syslog",
  "parsing_warnings": [
    {"line": 87, "issue": "JSON parse error at char 234 — truncated", "raw": "{\"timestamp\":"}
  ]
}

Summary mode output:

{
  "total_errors": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "by_type": [
    {"type": "Connection refused", "count": 23, "severity": "ERROR", "example": "ECONNREFUSED 1.2.3.4:5432"},
    {"type": "Timeout", "count": 18, "severity": "ERROR", "example": "Request timeout after 30s"},
    {"type": "Out of memory", "count": 3, "severity": "CRITICAL", "example": "FATAL: out of memory (OOM)"}
  ],
  "top_5_errors": [...],
  "time_range": {"first": "2024-01-15T10:00:00Z", "last": "2024-01-15T12:00:00Z"},
  "format_detected": "syslog"
}
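A summary of this shape can be aggregated from the per-line extractions, as in the sketch below. `summarize` is a hypothetical helper, the `first_occurrence` key name is an assumption, `by_type` is omitted because its grouping rule is not specified, and timestamps are again assumed to be ISO 8601 strings that sort correctly as text.

```python
from collections import Counter

def summarize(extractions, format_detected):
    """Aggregate extractions (dicts with severity, fingerprint, timestamp) into a summary."""
    by_severity = Counter(e["severity"] for e in extractions)
    counts = Counter(e["fingerprint"] for e in extractions)
    first_seen = {}
    for e in extractions:
        fp = e["fingerprint"]
        if fp not in first_seen or e["timestamp"] < first_seen[fp]:
            first_seen[fp] = e["timestamp"]
    timestamps = [e["timestamp"] for e in extractions]
    return {
        "total_errors": len(extractions),
        "by_severity": dict(by_severity),
        "top_5_errors": [
            {"fingerprint": fp, "count": n, "first_occurrence": first_seen[fp]}
            for fp, n in counts.most_common(5)
        ],
        "time_range": {"first": min(timestamps), "last": max(timestamps)} if timestamps else None,
        "format_detected": format_detected,
    }
```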

Mandatory Rules

Do not

Do not suppress parsing failures — if a line cannot be parsed, report it in parsing_warnings with the raw content
Do not assign high confidence to errors extracted from ambiguous or inconsistent patterns (e.g., "Error: foo" in user text vs log level)
Do not extract PII (IP addresses, usernames, email addresses in error messages) without explicit user confirmation
Do not apply patterns that were not detected in the input format — do not assume JSON format if input is plain text
Do not silently truncate long input — if input exceeds 10MB or 50K lines, report truncation with count
Do not deduplicate errors across different severity levels — a WARNING and an ERROR with same message are different

Do

Report format_detected with every output so the user knows how the log was interpreted
Include line_number and context (surrounding lines) for every extraction so the user can verify
Group duplicate errors and report frequency — a repeated error is more likely to be the root cause
Flag low-confidence extractions (confidence < 0.6) with needs_manual_review: true
Detect and report mixed formats (e.g., JSON lines + plain text) — handle each line with appropriate parser
Include the pattern/rule used for each extraction in verbose mode so the user can audit
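Two of these rules, the review flag and the PII rule, lend themselves to a small post-processing step. The sketch below is illustrative only: `finalize`, the `pii_detected` key, and the redaction placeholders are hypothetical, the regexes catch only IPv4 addresses and simple email addresses, and `include_pii` stands in for the user's explicit confirmation.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
REVIEW_THRESHOLD = 0.6

def finalize(extraction, include_pii=False):
    """Add needs_manual_review and redact IPs/emails unless the user opted in."""
    result = dict(extraction)
    result["needs_manual_review"] = result["confidence"] < REVIEW_THRESHOLD
    message = result["message"]
    found = IPV4_RE.findall(message) + EMAIL_RE.findall(message)
    result["pii_detected"] = bool(found)
    if found and not include_pii:
        message = IPV4_RE.sub("[REDACTED_IP]", message)
        message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
        result["message"] = message
    return result
```

Usernames are not covered here because they have no reliable surface pattern; they would need format-specific field knowledge.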

Quality Bar

| Criterion | Minimum | Ideal |
| --- | --- | --- |
| Format detection accuracy | Correct format detected for known log types | Auto-detect and parse mixed-format logs |
| Extraction coverage | >= 95% of identifiable error lines | 100% with confidence score per line |
| False positive rate | < 5% of extractions wrong or misclassified | < 1% with manual review flagging |
| Confidence calibration | Low (< 0.6) items flagged for review | All extractions pre-screened, high confidence only |
| Context preserved | Every extraction has source line_number | Source line + 2 context lines before and after |
| Deduplication | Exact duplicates removed | Near-duplicates (90%+) flagged as related |
| Error grouping | Errors grouped by fingerprint | Grouped by root cause type + severity |

A good extraction result identifies the log format correctly, extracts every error with a confidence score, preserves source location, groups duplicate errors, and clearly flags low-confidence or ambiguous entries for manual review.


Good vs. Bad Examples

| Scenario | Bad | Good |
| --- | --- | --- |
| Format detection | Assumes all logs are JSON, fails on plain text | Auto-detects syslog/plain/json and applies appropriate parser |
| Low confidence match | Marks as high confidence without flagging | Reports "Extraction 23: confidence 0.45, flagged for manual review" |
| Mixed format | Stops at first parse error | Continues with warnings: "12 lines parsed as syslog, 3 lines skipped (unrecognized format)" |
| Duplicate errors | Reports each occurrence separately | Groups 50 identical "Connection refused" errors with count=50 and first/last timestamps |
| Stack trace split | Only extracts first line of 20-line trace | Extracts entire trace as one record: type + message + all 20 frames |
| Truncation | Silently stops at 10K lines | Reports "Input truncated at line 50000 (limit): processed 50K of 120K lines" |
| PII in error | Extracts IP addresses without warning | Flags: "IP address 1.2.3.4 detected in error message, redact unless --include-pii is set" |
| Unparseable line | Drops line silently | Reports "Line 847: unparseable, raw: `{broken json...`" in parsing_warnings |

Capability Tags

requires-sensitive-credentials

Version History

v1.0.0

Initial release

Metadata

Slug extract-error-patterns

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Extract Error Patterns?

Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col... It is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far.

How do I install Extract Error Patterns?

Run "/install extract-error-patterns" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Extract Error Patterns free?

Yes, Extract Error Patterns is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Extract Error Patterns support?

Extract Error Patterns is cross-platform: it runs anywhere OpenClaw or Claude Code is available.

Who created Extract Error Patterns?

It is built and maintained by 王继鹏 (@wangjipeng977); the current version is v1.0.0.
