fuzzy-match

Author: wu-uk · GitHub · v0.1.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 87 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install invoice-fraud-detection-fuzzy-match
Description
A toolkit for fuzzy string matching and data reconciliation. Useful for matching entity names (companies, people) across different datasets where spelling va...
Usage (SKILL.md)

Fuzzy Matching Guide

Overview

This skill provides methods to compare strings and find the best matches using similarity metrics such as Levenshtein (edit) distance and the matching-block ratio computed by difflib. It is essential when joining datasets on string keys that refer to the same entity but are not spelled identically.
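
Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch for illustration; the function name levenshtein is hypothetical and not part of the skill, and a library implementation would normally be preferred for large inputs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))      # 3
print(levenshtein("Microsft", "Microsoft"))  # 1
```

A small distance relative to the string lengths suggests a likely typo rather than a different entity.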

Quick Start

from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

print(similarity("Apple Inc.", "Apple Incorporated"))
# Output: 0.6428...

Python Libraries

difflib (Standard Library)

The difflib module provides classes and functions for comparing sequences.

Basic Similarity

from difflib import SequenceMatcher

def get_similarity(str1, str2):
    """Returns a ratio between 0 and 1."""
    return SequenceMatcher(None, str1, str2).ratio()

# Example
s1 = "Acme Corp"
s2 = "Acme Corporation"
print(f"Similarity: {get_similarity(s1, s2)}")
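
To see where the ratio comes from, it can help to recompute it by hand. Per the difflib documentation, ratio() is 2*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below reuses the example strings from above; the variable names are illustrative.

```python
from difflib import SequenceMatcher

a, b = "Acme Corp", "Acme Corporation"
matcher = SequenceMatcher(None, a, b)

# Each block is (start in a, start in b, size); the last one is a zero-size sentinel.
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)

manual_ratio = 2 * matched / (len(a) + len(b))
print(matched)       # 9
print(manual_ratio)  # 0.72
```

Because the ratio is normalized by total length, a short name that is fully contained in a longer one still scores well below 1.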

Finding Best Match in a List

from difflib import get_close_matches

word = "appel"
possibilities = ["ape", "apple", "peach", "puppy"]
matches = get_close_matches(word, possibilities, n=1, cutoff=0.6)
print(matches)
# Output: ['apple']
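
Note that get_close_matches compares characters exactly, so it is case-sensitive. One possible workaround, sketched here with a hypothetical helper named closest_original, is to match on lowercased strings and map the hit back to its original spelling.

```python
from difflib import get_close_matches

def closest_original(query, candidates, cutoff=0.6):
    """Return the candidate closest to query, ignoring case, or None."""
    # Map lowercased form -> original form; later duplicates overwrite earlier ones.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(closest_original("APPEL", ["Ape", "Apple", "Peach", "Puppy"]))  # Apple
print(closest_original("zzz", ["Ape", "Apple"]))                      # None
```

Raising cutoff makes the helper stricter; returning None for weak matches lets the caller flag the record for manual review.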

rapidfuzz (Recommended for Performance)

If rapidfuzz is available (pip install rapidfuzz), it is much faster and offers more metrics.

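from rapidfuzz import fuzz, process, utils

# Simple Ratio (scores are on a 0-100 scale)
score = fuzz.ratio("this is a test", "this is a test!")
print(score)
# Output: 96.55...

# Partial Ratio (good for substrings)
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)
# Output: 100.0

# Extraction (rapidfuzz 3.x does not preprocess strings by default,
# so pass a processor to make the comparison case-insensitive)
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
best_match = process.extractOne("new york jets", choices, processor=utils.default_process)
print(best_match)
# Output: ('New York Jets', 100.0, 1)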
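
Since rapidfuzz is optional, code that should run in both situations can fall back to difflib. The sketch below is one way to do that, assuming a hypothetical wrapper named score; it also rescales the difflib result, because rapidfuzz reports 0-100 while difflib reports 0-1. The two backends use different algorithms, so their scores can differ slightly on some inputs.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz  # optional dependency

    def score(a: str, b: str) -> float:
        """Similarity on a 0-100 scale using rapidfuzz."""
        return fuzz.ratio(a, b)
except ImportError:
    def score(a: str, b: str) -> float:
        """Similarity on a 0-100 scale using the standard library."""
        return 100.0 * SequenceMatcher(None, a, b).ratio()

print(score("Acme Corp", "Acme Corporation"))
```

Keeping a single scale behind one function means thresholds elsewhere in the code do not need to change when the optional package is installed.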

Common Patterns

Normalization before Matching

Always normalize strings before comparing to improve accuracy.

import re

def normalize(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'[^\w\s]', '', text)
    # Normalize whitespace
    text = " ".join(text.split())
    # Common abbreviations (whole words only, so "unlimited" is left alone)
    text = re.sub(r'\blimited\b', 'ltd', text)
    text = re.sub(r'\bcorporation\b', 'corp', text)
    return text

s1 = "Acme  Corporation, Inc."
s2 = "acme corp inc"
print(normalize(s1) == normalize(s2))
# Output: True
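
Names that differ only by accents are another common source of mismatches. As an optional extra normalization step (an assumption on top of the guide, not something it prescribes), accents can be folded with the standard unicodedata module. The helper name fold_accents is hypothetical.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Strip combining marks so that 'é' compares equal to 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("Société Générale"))        # Societe Generale
print(fold_accents("Müller GmbH").casefold())  # muller gmbh
```

Whether folding is appropriate depends on the data: in some languages the accented and unaccented forms are genuinely different names.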

Entity Resolution

When matching a list of dirty names to a clean database:

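from difflib import get_close_matches

clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
dirty_names = ["google", "Microsft", "Apple"]

results = {}
for dirty in dirty_names:
    # Simple case-insensitive containment check first
    match = None
    for clean in clean_names:
        if dirty.lower() in clean.lower():
            match = clean
            break

    # Fall back to fuzzy matching when no clean name contains the dirty one
    if match is None:
        matches = get_close_matches(dirty, clean_names, n=1, cutoff=0.6)
        if matches:
            match = matches[0]

    results[dirty] = match

print(results)
# Output: {'google': 'Google LLC', 'Microsft': 'Microsoft Corp', 'Apple': 'Apple Inc'}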
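
In reconciliation work it is often useful to keep the score alongside each match so that borderline cases can be reviewed by hand. The following variation is a sketch under that assumption: the function name resolve and the 0.6 cutoff are illustrative, and it scores every candidate with SequenceMatcher instead of using the containment shortcut above.

```python
from difflib import SequenceMatcher

def resolve(dirty_names, clean_names, cutoff=0.6):
    """Map each dirty name to (best clean name or None, best score)."""
    results = {}
    for dirty in dirty_names:
        best_name, best_score = None, 0.0
        for clean in clean_names:
            # Compare case-insensitively; keep the highest-scoring candidate.
            s = SequenceMatcher(None, dirty.lower(), clean.lower()).ratio()
            if s > best_score:
                best_name, best_score = clean, s
        if best_score < cutoff:
            best_name = None  # too weak to trust; leave unmatched
        results[dirty] = (best_name, round(best_score, 2))
    return results

clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
resolved = resolve(["google", "Microsft", "Apple", "Initech"], clean_names)
for dirty, (name, s) in resolved.items():
    print(f"{dirty!r} -> {name!r} (score {s})")
```

Names with no sufficiently close candidate, such as "Initech" here, map to None instead of being forced onto the nearest entry.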
Safe Usage Advice
This guide is internally consistent and low-risk: it only describes local string-matching techniques. Before using, confirm where the example code will run (your local machine or a hosted agent) and avoid feeding sensitive data into untrusted environments. If you want rapidfuzz performance, install it from the official PyPI package (pip install rapidfuzz) in a controlled environment.
Functional Analysis
Type: OpenClaw Skill
Name: invoice-fraud-detection-fuzzy-match
Version: 0.1.0
The skill bundle provides standard Python code snippets and documentation for fuzzy string matching using the built-in 'difflib' library and the common 'rapidfuzz' library. There are no indicators of data exfiltration, malicious execution, or prompt injection; the content is entirely consistent with its stated purpose of entity resolution and string normalization.
Capability Assessment
Purpose & Capability
The name and description (fuzzy string matching / data reconciliation) match the SKILL.md examples (difflib, normalization, rapidfuzz). There are no unrelated requirements (no cloud credentials, no unrelated binaries).
Instruction Scope
The instructions are limited to string-similarity algorithms, normalization, and example code. They do not direct the agent to read arbitrary files, environment variables, system paths, or send data to external endpoints.
Install Mechanism
There is no install spec (instruction-only). The doc mentions optionally installing rapidfuzz via pip, which is reasonable for a performance library and does not by itself create risk in the skill bundle.
Credentials
No environment variables, credentials, or config paths are requested. The examples run in-process and only require standard Python libraries or an optional third-party package (rapidfuzz).
Persistence & Privilege
The always setting is false, and the skill is user-invocable. The skill does not request persistent presence or modify other skills or system-wide configs.
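How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install invoice-fraud-detection-fuzzy-match
  3. After installation, invoke the Skill by name or trigger it with /invoice-fraud-detection-fuzzy-match
  4. Provide the required input according to the Skill's parameter description to get structured output.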
Version History
v0.1.0
Bulk publish from all-task-skills-dedup
Metadata
Slug: invoice-fraud-detection-fuzzy-match
Version: 0.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
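FAQ

What is fuzzy-match?

A toolkit for fuzzy string matching and data reconciliation. Useful for matching entity names (companies, people) across different datasets where spelling va... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 87 total downloads to date.

How do I install fuzzy-match?

Run the command "/install invoice-fraud-detection-fuzzy-match" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is fuzzy-match free?

Yes. fuzzy-match is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does fuzzy-match support?

fuzzy-match is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed fuzzy-match?

It is developed and maintained by wu-uk (@wu-uk). The current version is v0.1.0.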
