wangjipeng977

Extract Error Patterns

by 王继鹏 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ pending
Downloads: 36 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install extract-error-patterns
Description
Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col...
README (SKILL.md)

Core Position

This skill extracts structured error and log data from unstructured text using pattern matching, regex extraction, and classification rules. It handles multiple log formats (JSON, syslog, Apache, Nginx, Python tracebacks, Node.js stack traces, Docker, Kubernetes) and produces structured output grouped by error type, severity, and frequency.

Key responsibilities:

  • Auto-detect log format by examining structure (JSON lines, timestamp patterns, key=value patterns)
  • Apply format-specific regex patterns to extract: timestamp, level, error message, stack trace, source file, line number, request ID, session ID
  • Classify errors into categories: CRITICAL, ERROR, WARNING, INFO (by severity level mapping)
  • Group duplicate/near-duplicate errors and report frequency counts
  • Flag low-confidence extractions and ambiguous formats for manual review

Modes

/extract-error-patterns --verbose

Verbose mode. Returns every match with:

  • matched_text: the exact matched text
  • confidence: 0.0-1.0 based on pattern strength
  • position: line number in source
  • context: 2 lines before and after (surrounding log entries)
  • pattern_used: which regex/rule matched this entry

Use when: debugging, doing root cause analysis, or auditing extraction quality.

/extract-error-patterns --summary

Summary mode. Returns aggregated results:

  • total_errors: count
  • by_severity: {CRITICAL: N, ERROR: N, WARNING: N, INFO: N}
  • by_type: grouped error messages with count and examples
  • top_5_errors: most frequent errors with fingerprint and first occurrence
  • time_range: first to last log entry

Use when: getting a high-level overview for monitoring dashboards.

/extract-error-patterns --json

JSON mode. Expects JSON-formatted log input (for example, JSON Lines) and extracts structured fields from each JSON object: timestamp, level, message, error, stack_trace, and any custom fields present.

/extract-error-patterns --stacktrace

Stack trace only mode. Focuses exclusively on extracting stack traces (exceptions) from application logs. Parses multi-line stack traces into structured records with: exception type, message, frames (file, line, function), and git commit hash if present.

/extract-error-patterns --custom

Custom pattern mode. Accepts user-provided regex patterns via the --patterns flag:

--patterns "ERROR.*connection refused|CRITICAL.*out of memory" --fields error_type,detail

Extracts using the provided patterns and reports which pattern matched each result.
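
As a rough illustration of custom pattern mode, the sketch below applies each alternative from the --patterns value and records which one matched. It assumes the alternatives are separated by top-level "|" characters, as in the example above; the function name match_custom_patterns and the record keys are hypothetical, and the handling of --fields is not specified in this README, so it is left out.

```python
import re

def match_custom_patterns(lines, patterns_arg):
    """Return one record per matching line, noting which pattern matched."""
    # Assumption: alternatives are separated by top-level "|" characters.
    patterns = [re.compile(p) for p in patterns_arg.split("|")]
    results = []
    for line_number, line in enumerate(lines, start=1):
        for pattern in patterns:
            match = pattern.search(line)
            if match:
                results.append({
                    "line_number": line_number,
                    "matched_text": match.group(0),
                    "pattern_used": pattern.pattern,
                })
                break  # the first matching pattern wins for this line
    return results

sample = [
    "2024-01-15 10:23:45 ERROR db: connection refused",
    "2024-01-15 10:23:46 INFO  heartbeat ok",
    "2024-01-15 10:23:47 CRITICAL worker: out of memory",
]
hits = match_custom_patterns(
    sample, "ERROR.*connection refused|CRITICAL.*out of memory"
)
```

A pattern that itself contains "|" inside a group would be split incorrectly by this approach, so a real implementation would probably need a safer way to pass multiple patterns.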

Execution Steps

Step 1: Detect log format

Read the first 20 lines of input. Examine patterns to identify format:

JSON Lines (NDJSON):

{"timestamp":"2024-01-15T10:23:45Z","level":"error","message":"Connection refused"}
{"timestamp":"2024-01-15T10:23:46Z","level":"warn","message":"Retrying..."}

Pattern: lines start with { and are valid JSON when parsed individually. Fields to extract: all keys from each JSON object.

Syslog (RFC 3164 / RFC 5424):

Jan 15 10:23:45 hostname sshd[1234]: Failed password for invalid user admin from 1.2.3.4 port 54321 ssh2

Pattern: ^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}

RFC 5424: <priority>timestamp hostname process[pid]: message

Fields: timestamp, hostname, process, pid, message.

Apache/Nginx error log:

[Thu Jan 15 10:23:45.123456 2024] [error] [client 1.2.3.4] PHP Fatal error: Uncaught Exception: DB connection failed in /var/www/html/db.php:12
[Thu Jan 15 10:23:46.789012 2024] [warn] [pid 1234] Request timeout from 1.2.3.4

Pattern: \[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]

Fields: timestamp, level (error/warn/critical), client IP, message.

Python traceback:

Traceback (most recent call last):
  File "app.py", line 123, in main
    db.query("SELECT * FROM users")
RuntimeError: DB connection failed

During handling of the above exception, another exception occurred:
...

Pattern: starts with Traceback (most recent call last):, contains File "..." lines. Fields: exception type, exception message, frames (file, line, function).

Node.js stack trace:

Error: ENOENT: no such file or directory, open '/tmp/data.json'
    at Object.openSync (node:fs:914:3)
    at Object.readFileSync (node:fs:555:3)
    at main (app.js:45:10)

Pattern: starts with Error: or ReferenceError:, contains at lines with node: or .js references.

Docker/Kubernetes log:

2024-01-15T10:23:45.123456789Z stdout F Application started on port 3000
2024-01-15T10:23:46.123456789Z stderr F Error: Cannot connect to database

Pattern: Kubernetes structured log format: timestamp stdout/stderr stream_flag message.

Plain text with timestamps:

2024-01-15 10:23:45 ERROR [app] Connection to db failed: timeout after 30s
2024-01-15 10:23:46 WARN  [api] Request timeout from client=1.2.3.4

Pattern: ^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → detect timestamp prefix.

If no known format detected, treat as plain text and apply generic extraction (see Step 3).
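
The detection step can be sketched as a per-line vote over the first 20 lines. This is a minimal sketch, not the skill's actual implementation: the label names, the classify_line and detect_format helpers, and the Node.js frame regex are assumptions, while the syslog, Apache, and timestamp regexes are the ones listed above.

```python
import json
import re
from collections import Counter

# (label, regex) pairs tried in order; labels are hypothetical names.
LINE_PATTERNS = [
    ("node_stack", re.compile(r"^\s+at\s.+:\d+:\d+\)?$")),
    ("syslog", re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")),
    ("apache", re.compile(r"\[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]")),
    ("container", re.compile(r"^\d{4}-\d{2}-\d{2}T\S+ (?:stdout|stderr) \S+ ")),
    ("plain_timestamp", re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")),
]

def classify_line(line):
    """Label a single line with the first format whose pattern matches."""
    if line.startswith("{"):
        try:
            json.loads(line)
            return "json_lines"
        except json.JSONDecodeError:
            pass  # fall through to the other checks
    if line.startswith("Traceback (most recent call last):"):
        return "python_traceback"
    for label, pattern in LINE_PATTERNS:
        if pattern.match(line):
            return label
    return "plain_text"

def detect_format(text, sample_size=20):
    """Pick the most common non-plain label among the first sample_size lines."""
    sample = [line for line in text.splitlines()[:sample_size] if line.strip()]
    counts = Counter(classify_line(line) for line in sample)
    counts.pop("plain_text", None)
    return counts.most_common(1)[0][0] if counts else "plain_text"
```

The container pattern is tried before the plain timestamp pattern because a Docker/Kubernetes line also starts with a timestamp. Counting labels per line, rather than stopping at the first match, also gives a simple way to notice mixed-format input.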

Step 2: Define regex patterns for format-specific extraction

For the detected format, use the appropriate patterns:

JSON Lines extraction:

import json, re
# Each line is a JSON object — parse directly
# Extract timestamp: try fields "timestamp", "time", "@timestamp", "date"
# Extract level: try fields "level", "severity", "log_level", "loglevel"
# Extract message: try fields "message", "msg", "text", "log"
# Extract stack_trace: try fields "stack_trace", "stack", "error.stack", "exception"
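
The field fallbacks described in the comments above could look roughly like this. The FIELD_ALIASES table mirrors those comments; the lookup and extract_json_line helpers are hypothetical, and treating "error.stack" as either a literal key or a nested path is an assumption.

```python
import json

# Candidate keys per output field, in the order listed above.
FIELD_ALIASES = {
    "timestamp": ["timestamp", "time", "@timestamp", "date"],
    "level": ["level", "severity", "log_level", "loglevel"],
    "message": ["message", "msg", "text", "log"],
    "stack_trace": ["stack_trace", "stack", "error.stack", "exception"],
}

def lookup(record, key):
    """Resolve key as a literal key first, then as a dotted nested path."""
    if key in record:
        return record[key]
    value = record
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def extract_json_line(line):
    """Parse one JSON log line and return the first alias found per field."""
    record = json.loads(line)
    extracted = {}
    for field, keys in FIELD_ALIASES.items():
        candidates = (lookup(record, key) for key in keys)
        extracted[field] = next((v for v in candidates if v is not None), None)
    return extracted
```

Fields that are absent come back as None rather than raising, which keeps one incomplete record from stopping the run.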

Syslog extraction:

^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$
# timestamp: month + day + time (assuming current year)
# Extract severity: look for "error", "fail", "warn" in message
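
A minimal sketch of applying the syslog pattern in Python follows. The parse_syslog helper is hypothetical, the year is passed in explicitly rather than read from the clock, and mapping the "error", "fail", and "warn" keywords to ERROR and WARNING (with INFO as the default) is an assumption about how the keyword hint above would be used.

```python
import re

SYSLOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)

def parse_syslog(line, year):
    """Return the named groups plus a timestamp and severity, or None."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # RFC 3164 timestamps carry no year, so the caller supplies one.
    fields["timestamp"] = (
        f"{year} {fields['month']} {int(fields['day']):02d} {fields['time']}"
    )
    lowered = fields["message"].lower()
    if "error" in lowered or "fail" in lowered:
        fields["severity"] = "ERROR"
    elif "warn" in lowered:
        fields["severity"] = "WARNING"
    else:
        fields["severity"] = "INFO"  # assumed default when no keyword is found
    return fields
```

For logs that span a year boundary, a fixed year would be wrong for part of the file, so the caller may need to infer it from file metadata.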

Python traceback extraction:

^(?P<exception_type>(?:\w+\.)*\w*Error):\s+(?P<message>.*)$
File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?
  (?:\s+\d+\s+chars\.\s+(?P<code_context>.*))?$
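
To show how a multi-line traceback might become one structured record, here is a small sketch. It uses the frame pattern above and an exception pattern that also accepts dotted names such as json.decoder.JSONDecodeError; parse_traceback is a hypothetical helper, and keeping only the last exception line of a chained traceback is an assumption.

```python
import re

FRAME_RE = re.compile(
    r'File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?'
)
EXCEPTION_RE = re.compile(
    r"^(?P<exception_type>(?:\w+\.)*\w*Error):\s+(?P<message>.*)$"
)

def parse_traceback(text):
    """Collect frames and the final exception line from a Python traceback."""
    frames = []
    exception = None
    for line in text.splitlines():
        frame = FRAME_RE.search(line)
        if frame:
            frames.append({
                "file": frame.group("file"),
                "line": int(frame.group("line")),
                "function": frame.group("function"),
            })
            continue
        exc = EXCEPTION_RE.match(line)
        if exc:
            exception = exc.groupdict()  # a later exception replaces an earlier one
    if exception is None:
        return None
    return {**exception, "frames": frames}

example = '''Traceback (most recent call last):
  File "app.py", line 123, in main
    db.query("SELECT * FROM users")
RuntimeError: DB connection failed'''
record = parse_traceback(example)
```

Exception classes whose names do not end in "Error" (for example KeyboardInterrupt) are not matched by this pattern and would need an extra rule.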

Apache error log extraction:

\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\](?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$
PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)\s+in\s+(?P<php_file>.*?)(?::|\s+on\s+line\s+)(?P<php_line>\d+)
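
The two Apache patterns can be chained: the first splits the line into its bracketed parts, the second looks inside the message for a PHP error. In this sketch the PHP pattern accepts both the "on line N" form and the "file:N" suffix used by the sample line in Step 1; that loosening and the parse_apache_error helper are assumptions.

```python
import re

APACHE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\]"
    r"(?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$"
)
PHP_RE = re.compile(
    r"PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)"
    r"\s+in\s+(?P<php_file>.*?)(?::|\s+on\s+line\s+)(?P<php_line>\d+)"
)

def parse_apache_error(line):
    """Return Apache fields, plus PHP fields when the message contains them."""
    match = APACHE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    php = PHP_RE.search(fields["message"])
    if php:
        fields.update(php.groupdict())
    return fields
```

Lines without a [client ...] block still match; client_ip is then None and any other bracketed block, such as [pid 1234], stays in the message.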

Generic timestamp + level + message:

^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s+(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$
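
Applied in Python, the generic pattern yields a dictionary of named groups. The parse_generic wrapper below is a hypothetical helper shown only to make the group names concrete.

```python
import re

GENERIC_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?"
    r"(?:Z|[+-]\d{2}:?\d{2})?)\s+"
    r"(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+"
    r"(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$"
)

def parse_generic(line):
    """Return timestamp, level, component, and message, or None if no match."""
    match = GENERIC_RE.match(line)
    return match.groupdict() if match else None

parsed = parse_generic(
    "2024-01-15 10:23:46 WARN  [api] Request timeout from client=1.2.3.4"
)
```

The component group is optional, so lines without a bracketed component still parse and report component as None.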

Step 3: Extract structured data

For each log line:

  1. Apply format-specific pattern
  2. If no match, try next format pattern (in order of detection confidence)
  3. If no pattern matches, try generic multi-pattern extraction:
    • Look for ERROR|FATAL|CRITICAL|WARN → severity
    • Look for \d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → timestamp
    • Look for at [module.]function [(]file:line[)] → stack trace frame
    • Use remaining text as message

Collect: line_number, raw_line, extracted_fields, pattern_matched, confidence.
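
The generic fallback in item 3 might be sketched as below. The frame regex is only one possible reading of "at [module.]function [(]file:line[)]", and the extract_generic helper and the "generic_fallback" label are hypothetical. Confidence is left out here because it is assigned in Step 4.

```python
import re

SEVERITY_RE = re.compile(r"ERROR|FATAL|CRITICAL|WARN")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
# Assumed concrete form of the stack frame hint.
FRAME_RE = re.compile(
    r"\bat\s+(?P<function>[\w.$<>]+)\s+\(?(?P<file>[^\s():]+):(?P<line>\d+)"
)

def extract_generic(line_number, raw_line):
    """Pull out timestamp, severity, and frame; the rest becomes the message."""
    fields = {}
    remaining = raw_line
    timestamp = TIMESTAMP_RE.search(remaining)
    if timestamp:
        fields["timestamp"] = timestamp.group(0)
        remaining = remaining.replace(timestamp.group(0), "", 1)
    severity = SEVERITY_RE.search(remaining)
    if severity:
        fields["severity"] = severity.group(0)
        remaining = remaining.replace(severity.group(0), "", 1)
    frame = FRAME_RE.search(remaining)
    if frame:
        fields["frame"] = {
            "function": frame.group("function"),
            "file": frame.group("file"),
            "line": int(frame.group("line")),
        }
    fields["message"] = " ".join(remaining.split())  # collapse leftover spaces
    return {
        "line_number": line_number,
        "raw_line": raw_line,
        "extracted_fields": fields,
        "pattern_matched": "generic_fallback",
    }
```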

Step 4: Classify and assign confidence

Severity classification:

  • CRITICAL: CRITICAL, FATAL, EMERGENCY, ALERT (or exit code 1 + stack trace)
  • ERROR: ERROR, ERR, SEVERE
  • WARNING: WARNING, WARN, CAUTION
  • INFO: INFO, INFORMATION, NOTICE
  • DEBUG: DEBUG, TRACE, VERBOSE
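
The level mapping above translates directly into a lookup table. This sketch covers only the keyword mapping; the "exit code 1 + stack trace" rule for CRITICAL needs information from outside the log line and is not modeled. The classify_severity name is hypothetical.

```python
SEVERITY_MAP = {
    "CRITICAL": ("CRITICAL", "FATAL", "EMERGENCY", "ALERT"),
    "ERROR": ("ERROR", "ERR", "SEVERE"),
    "WARNING": ("WARNING", "WARN", "CAUTION"),
    "INFO": ("INFO", "INFORMATION", "NOTICE"),
    "DEBUG": ("DEBUG", "TRACE", "VERBOSE"),
}

# Invert the table once so each raw level resolves with a single lookup.
LEVEL_LOOKUP = {
    alias: severity
    for severity, aliases in SEVERITY_MAP.items()
    for alias in aliases
}

def classify_severity(raw_level):
    """Map a raw level string to a severity class, or None if unknown."""
    return LEVEL_LOOKUP.get(raw_level.strip().upper())
```

Returning None for an unknown level lets the caller treat it as ambiguous instead of guessing a class.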

Confidence scoring:

Signal | Score modifier
Matches format-specific regex perfectly | +0.3
Has structured fields (timestamp, level, message all present) | +0.2
Has stack trace (multi-line exception) | +0.2
Has request/trace ID for correlation | +0.1
Single-line generic (no structured fields) | -0.2
Ambiguous severity (e.g., "Error" in user text, not log level) | -0.3
Cannot determine timestamp | -0.2

Final confidence: base 0.5 + modifiers, clamped to [0.0, 1.0].
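
The scoring rule can be written as a base value plus a sum of modifiers, clamped at both ends. The signal names used as dictionary keys below are hypothetical labels for the table rows; the numeric values come from the table.

```python
BASE_CONFIDENCE = 0.5

MODIFIERS = {
    "format_regex_match": 0.3,
    "structured_fields": 0.2,
    "stack_trace": 0.2,
    "correlation_id": 0.1,
    "single_line_generic": -0.2,
    "ambiguous_severity": -0.3,
    "no_timestamp": -0.2,
}

def confidence(signals):
    """Sum the modifiers for the observed signals and clamp to [0.0, 1.0]."""
    score = BASE_CONFIDENCE + sum(MODIFIERS[name] for name in signals)
    return round(min(1.0, max(0.0, score)), 2)
```

For example, a line that matches its format regex and has all structured fields reaches 0.5 + 0.3 + 0.2 = 1.0, while a single-line generic match with no timestamp drops to 0.5 - 0.2 - 0.2 = 0.1.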

Step 5: Group and deduplicate

Error fingerprinting: Create a fingerprint for each error to group duplicates:

fingerprint = hash(exception_type + first line of message (first 100 chars) + file name from stack trace)

Group errors with same fingerprint. Report:

  • count: number of occurrences
  • first_seen: timestamp of first occurrence
  • last_seen: timestamp of last occurrence
  • examples: first 3 raw error messages
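
Fingerprinting and grouping could be sketched as follows. The README does not name a hash function, so SHA-1 truncated to 12 hex characters is an assumption, as are the "|" separator, the input record keys, and the helper names. Groups are keyed by severity as well as fingerprint so that a WARNING and an ERROR with the same message stay separate, in line with the Mandatory Rules. Timestamps are compared as strings, which assumes a uniform ISO 8601 format.

```python
import hashlib

def fingerprint(exception_type, message, file_name):
    """Hash the exception type, first 100 chars of the first line, and file."""
    first_line = message.splitlines()[0][:100] if message else ""
    key = "|".join([exception_type or "", first_line, file_name or ""])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

def group_errors(errors):
    """Group error records by (severity, fingerprint) and summarize each group."""
    groups = {}
    for err in errors:
        fp = fingerprint(
            err.get("exception_type"), err.get("message", ""), err.get("file")
        )
        group = groups.setdefault((err.get("severity"), fp), {
            "fingerprint": fp,
            "severity": err.get("severity"),
            "count": 0,
            "first_seen": err["timestamp"],
            "last_seen": err["timestamp"],
            "examples": [],
        })
        group["count"] += 1
        group["first_seen"] = min(group["first_seen"], err["timestamp"])
        group["last_seen"] = max(group["last_seen"], err["timestamp"])
        if len(group["examples"]) < 3:  # keep only the first 3 examples
            group["examples"].append(err.get("message", ""))
    return list(groups.values())
```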

Near-duplicate detection: If two errors have > 90% similarity in message text (Levenshtein distance), flag as related:

{"related_to": "error_id_123", "similarity": 0.94, "difference": "timestamp value changed"}
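
One way to turn Levenshtein distance into the similarity ratio used above is to divide by the length of the longer string. That normalization is an assumption, since the README gives only the 90% threshold; the helper names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b using a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def is_related(a, b, threshold=0.9):
    """True when two messages are similar enough to flag as near-duplicates."""
    return similarity(a, b) > threshold
```

Comparing every pair of messages is quadratic in the number of distinct errors, so in practice this check would likely run on one representative per fingerprint group rather than on every raw line.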

Step 6: Produce output

Verbose mode output:

{
  "total_lines": 4821,
  "total_extracted": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "extractions": [
    {
      "id": "err_001",
      "line_number": 142,
      "severity": "ERROR",
      "timestamp": "2024-01-15T10:23:45Z",
      "message": "Connection refused: ECONNREFUSED 1.2.3.4:5432",
      "fingerprint": "a3f7c...",
      "confidence": 0.92,
      "pattern_used": "syslog_generic",
      "context": {
        "before": ["<raw line 140>", "<raw line 141>"],
        "after": ["<raw line 143>", "<raw line 144>"]
      },
      "stack_trace": null,
      "request_id": "req_abc123",
      "source": "app.log"
    }
  ],
  "grouped_errors": [
    {
      "fingerprint": "a3f7c...",
      "count": 12,
      "first_seen": "2024-01-15T10:23:00Z",
      "last_seen": "2024-01-15T11:45:00Z",
      "type": "Connection refused",
      "examples": ["...", "...", "..."]
    }
  ],
  "format_detected": "syslog",
  "parsing_warnings": [
    {"line": 87, "issue": "JSON parse error at char 234 — truncated", "raw": "{\"timestamp\":"}
  ]
}

Summary mode output:

{
  "total_errors": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "by_type": [
    {"type": "Connection refused", "count": 23, "severity": "ERROR", "example": "ECONNREFUSED 1.2.3.4:5432"},
    {"type": "Timeout", "count": 18, "severity": "ERROR", "example": "Request timeout after 30s"},
    {"type": "Out of memory", "count": 3, "severity": "CRITICAL", "example": "FATAL: out of memory (OOM)"}
  ],
  "top_5_errors": [...],
  "time_range": {"first": "2024-01-15T10:00:00Z", "last": "2024-01-15T12:00:00Z"},
  "format_detected": "syslog"
}

Mandatory Rules

Do not

  • Do not suppress parsing failures — if a line cannot be parsed, report it in parsing_warnings with the raw content
  • Do not assign high confidence to errors extracted from ambiguous or inconsistent patterns (e.g., "Error: foo" in user text vs log level)
  • Do not extract PII (IP addresses, usernames, email addresses in error messages) without explicit user confirmation
  • Do not apply patterns that were not detected in the input format — do not assume JSON format if input is plain text
  • Do not silently truncate long input — if input exceeds 10MB or 50K lines, report truncation with count
  • Do not deduplicate errors across different severity levels — a WARNING and an ERROR with same message are different
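
The truncation rule above could be enforced with a small guard like this one. Reading "10MB" as 10 MiB, counting one byte per newline, and the apply_limits name and warning wording are all assumptions.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MAX_LINES = 50_000

def apply_limits(text):
    """Return (kept_lines, warning); warning is None when nothing was cut."""
    lines = text.splitlines()
    kept = []
    size = 0
    for line in lines:
        size += len(line.encode("utf-8")) + 1  # +1 for the newline
        if len(kept) >= MAX_LINES or size > MAX_BYTES:
            break
        kept.append(line)
    if len(kept) == len(lines):
        return kept, None
    warning = (
        f"Input truncated at line {len(kept)} (limit): "
        f"processed {len(kept)} of {len(lines)} lines"
    )
    return kept, warning
```

The warning string would go into the output alongside parsing_warnings so the truncation is never silent.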

Do

  • Report format_detected with every output so the user knows how the log was interpreted
  • Include line_number and context (surrounding lines) for every extraction so the user can verify
  • Group duplicate errors and report frequency, since a frequently repeated error is often a useful lead toward the root cause
  • Flag low-confidence extractions (confidence < 0.6) with needs_manual_review: true
  • Detect and report mixed formats (e.g., JSON lines + plain text) — handle each line with appropriate parser
  • Include the pattern/rule used for each extraction in verbose mode so the user can audit

Quality Bar

Criterion | Minimum | Ideal
Format detection accuracy | Correct format detected for known log types | Auto-detect and parse mixed-format logs
Extraction coverage | >= 95% of identifiable error lines | 100% with confidence score per line
False positive rate | < 5% of extractions wrong or misclassified | < 1% with manual review flagging
Confidence calibration | Low (< 0.6) items flagged for review | All extractions pre-screened, high confidence only
Context preserved | Every extraction has source line_number | Source line + 2 context lines before and after
Deduplication | Exact duplicates removed | Near-duplicates (90%+) flagged as related
Error grouping | Errors grouped by fingerprint | Grouped by root cause type + severity

A good extraction result identifies the log format correctly, extracts every error with a confidence score, preserves source location, groups duplicate errors, and clearly flags low-confidence or ambiguous entries for manual review.

Good vs. Bad Examples

Scenario | Bad | Good
Format detection | Assumes all logs are JSON, fails on plain text | Auto-detects syslog/plain/json and applies appropriate parser
Low confidence match | Marks as high confidence without flagging | Reports "Extraction 23: confidence 0.45 - flagged for manual review"
Mixed format | Stops at first parse error | Continues with warnings: "12 lines parsed as syslog, 3 lines skipped (unrecognized format)"
Duplicate errors | Reports each occurrence separately | Groups 50 identical "Connection refused" errors with count=50 and first/last timestamps
Stack trace split | Only extracts first line of 20-line trace | Extracts entire trace as one record: type + message + all 20 frames
Truncation | Silently stops at 10K lines | Reports "Input truncated at line 50000 (limit) - processed 50K of 120K lines"
PII in error | Extracts IP addresses without warning | Flags: "IP address 1.2.3.4 detected in error message - redact unless --include-pii is set"
Unparseable line | Drops line silently | Reports "Line 847: unparseable - raw: {broken json..." in parsing_warnings
Capability Tags
requires-sensitive-credentials
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install extract-error-patterns
  3. After installation, invoke the skill by name or use /extract-error-patterns
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release
Metadata
Slug: extract-error-patterns
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Extract Error Patterns?

Extract Error Patterns is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far. Its listing description reads: "Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col..."

How do I install Extract Error Patterns?

Run "/install extract-error-patterns" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Extract Error Patterns free?

Yes, Extract Error Patterns is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Extract Error Patterns support?

Extract Error Patterns is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Extract Error Patterns?

It is built and maintained by 王继鹏 (@wangjipeng977); the current version is v1.0.0.
