wangjipeng977

Extract Error Patterns

Author: 王继鹏 · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ pending
Total downloads: 36
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install extract-error-patterns
Description
Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col...
Usage (SKILL.md)

Core Position

This skill extracts structured error and log data from unstructured text using pattern matching, regex extraction, and classification rules. It handles multiple log formats (JSON, syslog, Apache, Nginx, Python tracebacks, Node.js stack traces, Docker, Kubernetes) and produces structured output grouped by error type, severity, and frequency.

Key responsibilities:

  • Auto-detect log format by examining structure (JSON lines, timestamp patterns, key=value patterns)
  • Apply format-specific regex patterns to extract: timestamp, level, error message, stack trace, source file, line number, request ID, session ID
  • Classify errors into categories: CRITICAL, ERROR, WARNING, INFO (by severity level mapping)
  • Group duplicate/near-duplicate errors and report frequency counts
  • Flag low-confidence extractions and ambiguous formats for manual review

Modes

/extract-error-patterns --verbose

Verbose mode. Returns every match with:

  • matched_text: the exact matched text
  • confidence: 0.0-1.0 based on pattern strength
  • position: line number in source
  • context: 2 lines before and after (surrounding log entries)
  • pattern_used: which regex/rule matched this entry

Use when: debugging, doing root cause analysis, or auditing extraction quality.

/extract-error-patterns --summary

Summary mode. Returns aggregated results:

  • total_errors: count
  • by_severity: {CRITICAL: N, ERROR: N, WARNING: N, INFO: N}
  • by_type: grouped error messages with count and examples
  • top_5_errors: most frequent errors with fingerprint and first occurrence
  • time_range: first to last log entry

Use when: getting a high-level overview for monitoring dashboards.

/extract-error-patterns --json

JSON output mode. Expects JSON-formatted log input. Extracts structured fields from each JSON object: timestamp, level, message, error, stack_trace, and any custom fields present.

/extract-error-patterns --stacktrace

Stack trace only mode. Focuses exclusively on extracting stack traces (exceptions) from application logs. Parses multi-line stack traces into structured records with: exception type, message, frames (file, line, function), and git commit hash if present.

/extract-error-patterns --custom

Custom pattern mode. Accepts user-provided regex patterns via --patterns flag:

--patterns "ERROR.*connection refused|CRITICAL.*out of memory" --fields error_type,detail

Extracts using the provided patterns, reports which pattern matched each result.
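
A minimal sketch of how custom mode could report which pattern matched each result. It assumes the --patterns value is a "|"-separated list in which no alternative contains its own "|"; the function name match_custom_patterns and the record keys are hypothetical, not part of the Skill.

```python
import re

def match_custom_patterns(lines, patterns):
    """Return one record per line that matches any user-supplied pattern."""
    # Assumption: alternatives are separated by '|' and contain no nested '|'.
    compiled = [(source, re.compile(source)) for source in patterns.split("|")]
    results = []
    for line_number, line in enumerate(lines, start=1):
        for source, regex in compiled:
            match = regex.search(line)
            if match:
                results.append({
                    "line_number": line_number,
                    "matched_text": match.group(0),
                    "pattern_used": source,
                })
                break  # the first matching pattern wins
    return results

sample = [
    "2024-01-15 10:23:45 ERROR db connection refused",
    "2024-01-15 10:23:46 INFO heartbeat ok",
    "2024-01-15 10:23:47 CRITICAL worker out of memory",
]
hits = match_custom_patterns(
    sample, "ERROR.*connection refused|CRITICAL.*out of memory"
)
```

Compiling each alternative separately, rather than as one combined regex, is what makes it possible to name the pattern responsible for each hit.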

Execution Steps

Step 1: Detect log format

Read the first 20 lines of input. Examine patterns to identify format:

JSON Lines (NDJSON):

{"timestamp":"2024-01-15T10:23:45Z","level":"error","message":"Connection refused"}
{"timestamp":"2024-01-15T10:23:46Z","level":"warn","message":"Retrying..."}

Pattern: lines start with { and are valid JSON when parsed individually. Fields to extract: all keys from each JSON object.

Syslog (RFC 3164 / RFC 5424):

Jan 15 10:23:45 hostname sshd[1234]: Failed password for invalid user admin from 1.2.3.4 port 54321 ssh2

Pattern: ^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}
RFC 5424: <priority>timestamp hostname process[pid]: message
Fields: timestamp, hostname, process, pid, message.

Apache/Nginx error log:

[Thu Jan 15 10:23:45.123456 2024] [error] [client 1.2.3.4] PHP Fatal error: Uncaught Exception: DB connection failed in /var/www/html/db.php:12
[Thu Jan 15 10:23:46.789012 2024] [warn] [pid 1234] Request timeout from 1.2.3.4

Pattern: \[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]
Fields: timestamp, level (error/warn/critical), client IP, message.

Python traceback:

Traceback (most recent call last):
  File "app.py", line 123, in main
    db.query("SELECT * FROM users")
RuntimeError: DB connection failed

During handling of the above exception, another exception occurred:
...

Pattern: starts with Traceback (most recent call last):, contains File "..." lines. Fields: exception type, exception message, frames (file, line, function).

Node.js stack trace:

Error: ENOENT: no such file or directory, open '/tmp/data.json'
    at Object.openSync (node:fs:914:3)
    at Object.readFileSync (node:fs:555:3)
    at main (app.js:45:10)

Pattern: starts with Error: or ReferenceError:, contains at lines with node: or .js references.

Docker/Kubernetes log:

2024-01-15T10:23:45.123456789Z stdout F Application started on port 3000
2024-01-15T10:23:46.123456789Z stderr F Error: Cannot connect to database

Pattern: container runtime (CRI) log format used by Kubernetes: timestamp, stream (stdout/stderr), tag (F = full line, P = partial line), message.

Plain text with timestamps:

2024-01-15 10:23:45 ERROR [app] Connection to db failed: timeout after 30s
2024-01-15 10:23:46 WARN  [api] Request timeout from client=1.2.3.4

Pattern: ^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → detect timestamp prefix.

If no known format detected, treat as plain text and apply generic extraction (see Step 3).
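
The detection step above can be sketched as a vote over the sampled lines. This is a simplified illustration, not the Skill's actual implementation: the function names, the returned format labels, and the container-log regex are assumptions, and Node.js detection is omitted for brevity.

```python
import json
import re

SYSLOG_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")
APACHE_RE = re.compile(r"^\[\w+ \w+ \d+ [\d:.]+ \d{4}\] \[(\w+)\]")
CRI_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z (stdout|stderr) [FP] ")
PLAIN_TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

def is_json_line(line):
    """True when the line parses on its own as a JSON object."""
    if not line.startswith("{"):
        return False
    try:
        return isinstance(json.loads(line), dict)
    except ValueError:
        return False

def detect_format(lines, sample_size=20):
    sample = [ln for ln in lines[:sample_size] if ln.strip()]
    if not sample:
        return "unknown"
    if any(ln.startswith("Traceback (most recent call last):") for ln in sample):
        return "python_traceback"
    # Order matters: on a tie, the earlier (more specific) format wins.
    checks = [
        ("json_lines", is_json_line),
        ("syslog", SYSLOG_RE.match),
        ("apache_error", APACHE_RE.match),
        ("docker_k8s", CRI_RE.match),
        ("plain_timestamp", PLAIN_TS_RE.match),
    ]
    counts = {name: sum(1 for ln in sample if check(ln)) for name, check in checks}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "plain_text"
```

Counting matches per format, instead of stopping at the first hit, also gives a cheap signal for mixed-format input: several formats with non-zero counts suggest per-line parsing is needed.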

Step 2: Define regex patterns for format-specific extraction

For detected format, use appropriate patterns:

JSON Lines extraction:

import json, re
# Each line is a JSON object — parse directly
# Extract timestamp: try fields "timestamp", "time", "@timestamp", "date"
# Extract level: try fields "level", "severity", "log_level", "loglevel"
# Extract message: try fields "message", "msg", "text", "log"
# Extract stack_trace: try fields "stack_trace", "stack", "error.stack", "exception"
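
One way the field fallbacks listed above might be implemented is shown here. The helper names (lookup, first_present, extract_json_line) are hypothetical, and treating "error.stack" as a nested path is an assumption about what that field name means.

```python
import json

TIMESTAMP_KEYS = ("timestamp", "time", "@timestamp", "date")
LEVEL_KEYS = ("level", "severity", "log_level", "loglevel")
MESSAGE_KEYS = ("message", "msg", "text", "log")
STACK_KEYS = ("stack_trace", "stack", "error.stack", "exception")

def lookup(record, key):
    """Resolve a key; fall back to treating 'a.b' as a nested path."""
    if key in record:
        return record[key]
    current = record
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def first_present(record, keys):
    """Return the value of the first candidate key that is present."""
    for key in keys:
        value = lookup(record, key)
        if value is not None:
            return value
    return None

def extract_json_line(line):
    record = json.loads(line)  # raises ValueError on malformed JSON
    return {
        "timestamp": first_present(record, TIMESTAMP_KEYS),
        "level": first_present(record, LEVEL_KEYS),
        "message": first_present(record, MESSAGE_KEYS),
        "stack_trace": first_present(record, STACK_KEYS),
    }
```

A caller would catch the ValueError and record the line in parsing_warnings rather than dropping it.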

Syslog extraction:

^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$
# timestamp: month + day + time (assuming current year)
# Extract severity: look for "error", "fail", "warn" in message
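
As a rough illustration, the syslog pattern can be applied like this. The year is passed in explicitly so the result is reproducible (the notes above assume the current year); the function name, the returned keys, and the mapping of keywords to severities are assumptions.

```python
import re
from datetime import datetime

SYSLOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>\S+?)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)

def parse_syslog_line(line, year):
    """Parse one RFC 3164-style line; return None when it does not match."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # RFC 3164 timestamps carry no year, so the caller has to supply one.
    stamp = f"{year} {fields['month']} {fields['day']} {fields['time']}"
    timestamp = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    lowered = fields["message"].lower()
    if "error" in lowered or "fail" in lowered:
        severity = "ERROR"
    elif "warn" in lowered:
        severity = "WARNING"
    else:
        severity = "INFO"
    return {
        "timestamp": timestamp.isoformat(),
        "host": fields["host"],
        "process": fields["process"],
        "pid": fields["pid"],
        "severity": severity,
        "message": fields["message"],
    }
```

Note that %b depends on the process locale, so month abbreviations such as "Jan" parse only under an English or C locale.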

Python traceback extraction:

^(?P<exception_type>\w+(?:\.\w+)*Error):\s+(?P<message>.*)$
File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)(?:,\s+in\s+(?P<function>.+))?
  (?:\s+\d+\s+chars\.\s+(?P<code_context>.*))?$
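
A small sketch of turning one traceback into a structured record with these patterns. The exception pattern used here allows dotted module prefixes and requires an "Error" suffix, which is an assumption and misses names such as KeyboardInterrupt; the function name and record layout are hypothetical.

```python
import re

FRAME_RE = re.compile(
    r'^\s*File\s+"(?P<file>.+?)",\s+line\s+(?P<line>\d+)'
    r"(?:,\s+in\s+(?P<function>.+))?$"
)
EXCEPTION_RE = re.compile(
    r"^(?P<exception_type>\w+(?:\.\w+)*Error):\s+(?P<message>.*)$"
)

def parse_traceback(text):
    """Collect frames and the final exception line from one traceback."""
    frames = []
    exception = {"exception_type": None, "message": None}
    for line in text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            frames.append({
                "file": frame["file"],
                "line": int(frame["line"]),
                "function": frame["function"],
            })
            continue
        # Exception lines are unindented; indented lines are source context.
        if not line.startswith(" "):
            found = EXCEPTION_RE.match(line)
            if found:
                exception = found.groupdict()
    return {**exception, "frames": frames}

sample_traceback = (
    "Traceback (most recent call last):\n"
    '  File "app.py", line 123, in main\n'
    '    db.query("SELECT * FROM users")\n'
    "RuntimeError: DB connection failed\n"
)
parsed = parse_traceback(sample_traceback)
```

For chained tracebacks ("During handling of the above exception..."), this sketch keeps all frames but only the last exception line; splitting the chain into separate records would need an extra pass.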

Apache error log extraction:

\[(?P<timestamp>[^\]]+)\]\s+\[(?P<level>\w+)\](?:\s+\[client\s+(?P<client_ip>[^\]]+)\])?\s+(?P<message>.*)$
PHP\s+(?P<php_level>Fatal|Parse|Recoverable)\s+error:\s+(?P<php_message>.*?)\s+in\s+(?P<php_file>.*?)(?::|\s+on\s+line\s+)(?P<php_line>\d+)

Generic timestamp + level + message:

^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s+(?P<level>ERROR|WARN|WARNING|INFO|DEBUG|CRITICAL|FATAL|TRACE)\s+(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$

Step 3: Extract structured data

For each log line:

  1. Apply format-specific pattern
  2. If no match, try next format pattern (in order of detection confidence)
  3. If no pattern matches, try generic multi-pattern extraction:
    • Look for ERROR|FATAL|CRITICAL|WARN → severity
    • Look for \d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} → timestamp
    • Look for at [module.]function [(]file:line[)] → stack trace frame
    • Use remaining text as message

Collect: line_number, raw_line, extracted_fields, pattern_matched, confidence.
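
The fallback chain above can be sketched as follows. It assumes each format parser is a callable that returns a dict of fields or None; the function names and the "generic_fallback" label are hypothetical. Stack-frame detection and confidence scoring are left out to keep the example short.

```python
import re

SEVERITY_RE = re.compile(r"\b(ERROR|FATAL|CRITICAL|WARN(?:ING)?)\b")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

def generic_extract(raw_line):
    """Fallback: pull out timestamp and severity, keep the rest as message."""
    fields = {}
    remaining = raw_line
    for name, regex in (("timestamp", TIMESTAMP_RE), ("severity", SEVERITY_RE)):
        match = regex.search(remaining)
        if match:
            fields[name] = match.group(0)
            # Cut the matched span out so it does not reappear in the message.
            remaining = remaining[:match.start()] + remaining[match.end():]
    fields["message"] = " ".join(remaining.split())
    return fields

def extract_line(line_number, raw_line, format_parsers):
    """Try each (name, parser) pair in order, then the generic fallback."""
    for name, parser in format_parsers:
        fields = parser(raw_line)
        if fields is not None:
            return {"line_number": line_number, "raw_line": raw_line,
                    "extracted_fields": fields, "pattern_matched": name}
    return {"line_number": line_number, "raw_line": raw_line,
            "extracted_fields": generic_extract(raw_line),
            "pattern_matched": "generic_fallback"}
```

Passing format_parsers sorted by detection confidence reproduces the "try next format pattern" ordering from step 2 of the list.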

Step 4: Classify and assign confidence

Severity classification:

  • CRITICAL: CRITICAL, FATAL, EMERGENCY, ALERT (or exit code 1 + stack trace)
  • ERROR: ERROR, ERR, SEVERE
  • WARNING: WARNING, WARN, CAUTION
  • INFO: INFO, INFORMATION, NOTICE
  • DEBUG: DEBUG, TRACE, VERBOSE
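
The level mapping above is small enough to express as a lookup table. This sketch covers only the keyword aliases, not the "exit code 1 + stack trace" rule; the names SEVERITY_ALIASES and classify_severity are hypothetical.

```python
SEVERITY_ALIASES = {
    "CRITICAL": ("CRITICAL", "FATAL", "EMERGENCY", "ALERT"),
    "ERROR": ("ERROR", "ERR", "SEVERE"),
    "WARNING": ("WARNING", "WARN", "CAUTION"),
    "INFO": ("INFO", "INFORMATION", "NOTICE"),
    "DEBUG": ("DEBUG", "TRACE", "VERBOSE"),
}

# Invert the table once so each raw level resolves with a single lookup.
LEVEL_TO_SEVERITY = {
    alias: severity
    for severity, aliases in SEVERITY_ALIASES.items()
    for alias in aliases
}

def classify_severity(raw_level):
    """Map a raw level string to a severity, or None when unrecognized."""
    if raw_level is None:
        return None
    return LEVEL_TO_SEVERITY.get(raw_level.strip().upper())
```

Returning None for unknown levels, instead of guessing, leaves room for the ambiguity penalty in the confidence score.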

Confidence scoring:

| Signal | Score modifier |
| --- | --- |
| Matches format-specific regex perfectly | +0.3 |
| Has structured fields (timestamp, level, message all present) | +0.2 |
| Has stack trace (multi-line exception) | +0.2 |
| Has request/trace ID for correlation | +0.1 |
| Single-line generic (no structured fields) | -0.2 |
| Ambiguous severity (e.g., "Error" in user text, not log level) | -0.3 |
| Cannot determine timestamp | -0.2 |

Final confidence: base 0.5 + modifiers, clamped to [0.0, 1.0].
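
The scoring rule translates directly into a few lines of code. The signal keys below are hypothetical labels for the rows of the modifier table, and the 0.6 review threshold is the one this document uses for needs_manual_review.

```python
MODIFIERS = {
    "format_regex_match": 0.3,
    "structured_fields": 0.2,
    "stack_trace": 0.2,
    "correlation_id": 0.1,
    "single_line_generic": -0.2,
    "ambiguous_severity": -0.3,
    "no_timestamp": -0.2,
}

def confidence(signals, base=0.5):
    """Base score plus the modifiers for each signal, clamped to [0.0, 1.0]."""
    score = base + sum(MODIFIERS[signal] for signal in signals)
    return round(min(1.0, max(0.0, score)), 2)

def needs_manual_review(score, threshold=0.6):
    """Flag extractions whose confidence falls below the review threshold."""
    return score < threshold
```

For example, a line that matches a format-specific regex and has all structured fields scores 0.5 + 0.3 + 0.2 = 1.0, while a generic single-line match with no timestamp scores 0.5 - 0.2 - 0.2 = 0.1 and is flagged for review.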

Step 5: Group and deduplicate

Error fingerprinting: Create a fingerprint for each error to group duplicates:

fingerprint = hash(exception_type + first line of message (first 100 chars) + file name from stack trace)

Group errors with same fingerprint. Report:

  • count: number of occurrences
  • first_seen: timestamp of first occurrence
  • last_seen: timestamp of last occurrence
  • examples: first 3 raw error messages
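
A minimal sketch of fingerprinting and grouping, under several assumptions: SHA-256 stands in for the unspecified hash, timestamps are ISO 8601 strings in one timezone (so string comparison orders them chronologically), and each error is a dict with the hypothetical keys used below. Severity is part of the group key so that a WARNING and an ERROR with the same message stay separate.

```python
import hashlib

def fingerprint(exception_type, message, file_name):
    """Hash exception type + first 100 chars of the first message line + file."""
    first_line = message.splitlines()[0][:100] if message else ""
    key = "|".join([exception_type or "", first_line, file_name or ""])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

def group_errors(errors):
    """Group errors by (severity, fingerprint) and summarize each group."""
    groups = {}
    for err in errors:
        fp = fingerprint(err.get("exception_type"), err.get("message"), err.get("file"))
        key = (err["severity"], fp)
        if key not in groups:
            groups[key] = {
                "fingerprint": fp,
                "severity": err["severity"],
                "count": 0,
                "first_seen": err["timestamp"],
                "last_seen": err["timestamp"],
                "examples": [],
            }
        group = groups[key]
        group["count"] += 1
        group["first_seen"] = min(group["first_seen"], err["timestamp"])
        group["last_seen"] = max(group["last_seen"], err["timestamp"])
        if len(group["examples"]) < 3:  # keep only the first 3 raw messages
            group["examples"].append(err["raw"])
    return list(groups.values())
```

Truncating the digest to 12 hex characters keeps fingerprints readable in reports; a longer prefix lowers the collision risk on very large inputs.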

Near-duplicate detection: If two errors have > 90% similarity in message text (Levenshtein distance), flag as related:

{"related_to": "error_id_123", "similarity": 0.94, "difference": "timestamp value changed"}
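
Near-duplicate detection could look like the sketch below. It assumes similarity is defined as 1 minus the Levenshtein distance divided by the longer string's length; the text above does not specify the normalization, so treat that choice and the function names as assumptions.

```python
def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Normalized similarity in [0.0, 1.0]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def is_near_duplicate(a, b, threshold=0.9):
    """True for distinct messages whose similarity exceeds the threshold."""
    return a != b and similarity(a, b) > threshold
```

The comparison is quadratic in message length and pairwise across errors, so in practice it would likely run only on the representative message of each fingerprint group.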

Step 6: Produce output

Verbose mode output:

{
  "total_lines": 4821,
  "total_extracted": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "extractions": [
    {
      "id": "err_001",
      "line_number": 142,
      "severity": "ERROR",
      "timestamp": "2024-01-15T10:23:45Z",
      "message": "Connection refused: ECONNREFUSED 1.2.3.4:5432",
      "fingerprint": "a3f7c...",
      "confidence": 0.92,
      "pattern_used": "syslog_generic",
      "context": {
        "before": ["<raw text of line 140>", "<raw text of line 141>"],
        "after": ["<raw text of line 143>", "<raw text of line 144>"]
      },
      "stack_trace": null,
      "request_id": "req_abc123",
      "source": "app.log"
    }
  ],
  "grouped_errors": [
    {
      "fingerprint": "a3f7c...",
      "count": 12,
      "first_seen": "2024-01-15T10:23:00Z",
      "last_seen": "2024-01-15T11:45:00Z",
      "type": "Connection refused",
      "examples": ["...", "...", "..."]
    }
  ],
  "format_detected": "syslog",
  "parsing_warnings": [
    {"line": 87, "issue": "JSON parse error at char 234 — truncated", "raw": "{\"timestamp\":"}
  ]
}

Summary mode output:

{
  "total_errors": 127,
  "by_severity": {"CRITICAL": 3, "ERROR": 54, "WARNING": 41, "INFO": 29},
  "by_type": [
    {"type": "Connection refused", "count": 23, "severity": "ERROR", "example": "ECONNREFUSED 1.2.3.4:5432"},
    {"type": "Timeout", "count": 18, "severity": "ERROR", "example": "Request timeout after 30s"},
    {"type": "Out of memory", "count": 3, "severity": "CRITICAL", "example": "FATAL: out of memory (OOM)"}
  ],
  "top_5_errors": [...],
  "time_range": {"first": "2024-01-15T10:00:00Z", "last": "2024-01-15T12:00:00Z"},
  "format_detected": "syslog"
}

Mandatory Rules

Do not

  • Do not suppress parsing failures — if a line cannot be parsed, report it in parsing_warnings with the raw content
  • Do not assign high confidence to errors extracted from ambiguous or inconsistent patterns (e.g., "Error: foo" in user text vs log level)
  • Do not extract PII (IP addresses, usernames, email addresses in error messages) without explicit user confirmation
  • Do not apply patterns that were not detected in the input format — do not assume JSON format if input is plain text
  • Do not silently truncate long input — if input exceeds 10MB or 50K lines, report truncation with count
  • Do not deduplicate errors across different severity levels — a WARNING and an ERROR with same message are different
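
As an illustration of the PII rule, a message can be scanned and redacted before it is reported. The regexes below are deliberately simple (IPv4 addresses and basic email shapes only), and the function name, placeholder strings, and include_pii parameter are assumptions made for this sketch.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def redact_pii(message, include_pii=False):
    """Return (message, findings); redact findings unless include_pii is set."""
    findings = IPV4_RE.findall(message) + EMAIL_RE.findall(message)
    if include_pii or not findings:
        return message, findings
    redacted = IPV4_RE.sub("[REDACTED_IP]", message)
    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", redacted)
    return redacted, findings
```

Returning the findings alongside the redacted text lets the caller warn the user about what was detected without echoing it in the main output.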

Do

  • Report format_detected with every output so the user knows how the log was interpreted
  • Include line_number and context (surrounding lines) for every extraction so the user can verify
  • Group duplicate errors and report frequency — a repeated error is more likely to be the root cause
  • Flag low-confidence extractions (confidence < 0.6) with needs_manual_review: true
  • Detect and report mixed formats (e.g., JSON lines + plain text) — handle each line with appropriate parser
  • Include the pattern/rule used for each extraction in verbose mode so the user can audit

Quality Bar

| Criterion | Minimum | Ideal |
| --- | --- | --- |
| Format detection accuracy | Correct format detected for known log types | Auto-detect and parse mixed-format logs |
| Extraction coverage | >= 95% of identifiable error lines | 100% with confidence score per line |
| False positive rate | < 5% of extractions wrong or misclassified | < 1% with manual review flagging |
| Confidence calibration | Low (< 0.6) items flagged for review | All extractions pre-screened, high confidence only |
| Context preserved | Every extraction has source line_number | Source line + 2 context lines before and after |
| Deduplication | Exact duplicates removed | Near-duplicates (90%+) flagged as related |
| Error grouping | Errors grouped by fingerprint | Grouped by root cause type + severity |

A good extraction result identifies the log format correctly, extracts every error with a confidence score, preserves source location, groups duplicate errors, and clearly flags low-confidence or ambiguous entries for manual review.

Good vs. Bad Examples

| Scenario | Bad | Good |
| --- | --- | --- |
| Format detection | Assumes all logs are JSON, fails on plain text | Auto-detects syslog/plain/json and applies appropriate parser |
| Low confidence match | Marks as high confidence without flagging | Reports "Extraction 23: confidence 0.45 - flagged for manual review" |
| Mixed format | Stops at first parse error | Continues with warnings: "12 lines parsed as syslog, 3 lines skipped (unrecognized format)" |
| Duplicate errors | Reports each occurrence separately | Groups 50 identical "Connection refused" errors with count=50 and first/last timestamps |
| Stack trace split | Only extracts first line of 20-line trace | Extracts entire trace as one record: type + message + all 20 frames |
| Truncation | Silently stops at 10K lines | Reports "Input truncated at line 50000 (limit) - processed 50K of 120K lines" |
| PII in error | Extracts IP addresses without warning | Flags: "IP address 1.2.3.4 detected in error message - redact unless --include-pii is set" |
| Unparseable line | Drops line silently | Reports "Line 847: unparseable - raw: {broken json..." in parsing_warnings |
Capability tags
requires-sensitive-credentials
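How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install extract-error-patterns
  3. Once installed, invoke the Skill by name or trigger it with /extract-error-patterns
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history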
v1.0.0
Initial release
Metadata
Slug extract-error-patterns
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Historical versions 1
FAQ

What is Extract Error Patterns?

Use when (1) user pastes a log file (application logs, server logs, error traces) and asks to extract error patterns or stack traces. (2) user provides a col... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 36 total downloads to date.

How do I install Extract Error Patterns?

Run the command "/install extract-error-patterns" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Extract Error Patterns free?

Yes. Extract Error Patterns is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Extract Error Patterns support?

Extract Error Patterns is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Extract Error Patterns?

It is developed and maintained by 王继鹏 (@wangjipeng977); the current version is v1.0.0.
