/install invoice-fraud-detection-fuzzy-match
Fuzzy Matching Guide
Overview
This skill provides methods to compare strings and find the best matches using Levenshtein distance and other similarity metrics. It is useful when joining datasets on string keys that refer to the same entity but are not spelled identically.
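For readers who have not met Levenshtein distance, the following is a minimal pure-Python sketch of the classic dynamic-programming formulation. The function names levenshtein and levenshtein_similarity are illustrative and not part of the skill. Note that difflib's SequenceMatcher, used throughout this guide, computes a different, matching-block-based score rather than Levenshtein distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))      # 3
print(levenshtein("Microsft", "Microsoft"))  # 1
```

This version is quadratic in the string lengths, which is fine for short names; for large joins a compiled library is likely a better fit.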
Quick Start
from difflib import SequenceMatcher
def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()
print(similarity("Apple Inc.", "Apple Incorporated"))
# Output: 0.6428...
Python Libraries
difflib (Standard Library)
The difflib module provides classes and functions for comparing sequences.
Basic Similarity
from difflib import SequenceMatcher
def get_similarity(str1, str2):
    """Returns a ratio between 0 and 1."""
    return SequenceMatcher(None, str1, str2).ratio()
# Example
s1 = "Acme Corp"
s2 = "Acme Corporation"
print(f"Similarity: {get_similarity(s1, s2)}")
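It can help to see where the ratio comes from. The sketch below recomputes it by hand as 2 * M / T, where M is the total length of the matching blocks and T is the combined length of both strings. The helper name explain_ratio is hypothetical and exists only for illustration.

```python
from difflib import SequenceMatcher


def explain_ratio(a: str, b: str) -> float:
    """Recompute SequenceMatcher.ratio() from its matching blocks."""
    matcher = SequenceMatcher(None, a, b)
    # The final block is a zero-length sentinel, so it adds nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0


# "Acme Corp" (9 chars) matches in full inside "Acme Corporation" (16 chars),
# so the score is 2 * 9 / 25.
print(explain_ratio("Acme Corp", "Acme Corporation"))  # 0.72
```

Because the score depends on the combined length, a short name fully contained in a long one still scores well below 1.0.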
Finding Best Match in a List
from difflib import get_close_matches
word = "appel"
possibilities = ["ape", "apple", "peach", "puppy"]
matches = get_close_matches(word, possibilities, n=1, cutoff=0.6)
print(matches)
# Output: ['apple']
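get_close_matches returns only the matching strings, not their scores. When the scores are needed, for example to log how close a match was, a small ranking helper along these lines may be enough. The name ranked_matches is an assumption made for this sketch, and its scores can differ slightly from the ones get_close_matches uses internally because SequenceMatcher is sensitive to argument order.

```python
from difflib import SequenceMatcher


def ranked_matches(word, possibilities, n=3, cutoff=0.6):
    """Return up to n (candidate, score) pairs at or above cutoff, best first."""
    scored = [
        (candidate, SequenceMatcher(None, word, candidate).ratio())
        for candidate in possibilities
    ]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:n]


for candidate, score in ranked_matches("appel", ["ape", "apple", "peach", "puppy"]):
    print(f"{candidate}: {score:.2f}")
```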
rapidfuzz (Recommended for Performance)
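rapidfuzz is a third-party package (pip install rapidfuzz). If it is available, prefer it: it is much faster than difflib and offers more metrics. Its fuzz scores are on a 0-100 scale rather than 0-1.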
from rapidfuzz import fuzz, process, utils
# Simple Ratio
score = fuzz.ratio("this is a test", "this is a test!")
print(score)
# Output: 96.55...
# Partial Ratio (good for substrings)
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)
# Output: 100.0
# Extraction
# rapidfuzz 3.x applies no preprocessing by default, so pass a processor
# to lowercase the strings before scoring.
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
best_match = process.extractOne("new york jets", choices, processor=utils.default_process)
print(best_match)
# Output: ('New York Jets', 100.0, 1)
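Since rapidfuzz is optional, code that must run in both environments can fall back to difflib. The following is one possible sketch; HAVE_RAPIDFUZZ and ratio_0_100 are names invented here. The two backends use different algorithms, so scores for the same pair will not always agree.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz  # optional third-party dependency
    HAVE_RAPIDFUZZ = True
except ImportError:
    HAVE_RAPIDFUZZ = False


def ratio_0_100(a: str, b: str) -> float:
    """Similarity on a 0-100 scale, using rapidfuzz when it is installed."""
    if HAVE_RAPIDFUZZ:
        return fuzz.ratio(a, b)
    # Fallback: difflib reports 0-1, so rescale to match rapidfuzz's range.
    return SequenceMatcher(None, a, b).ratio() * 100


print(ratio_0_100("this is a test", "this is a test!"))
```

Keeping one scale across both backends means a single threshold can be tuned once, although it is worth re-checking that threshold whenever the backend changes.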
Common Patterns
Normalization before Matching
Always normalize strings before comparing to improve accuracy.
import re
def normalize(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'[^\w\s]', '', text)
    # Normalize whitespace
    text = " ".join(text.split())
    # Common abbreviations
    text = text.replace("limited", "ltd").replace("corporation", "corp")
    return text
s1 = "Acme Corporation, Inc."
s2 = "acme corp inc"
print(normalize(s1) == normalize(s2))
# Output: True
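One caveat with plain str.replace is that it also rewrites substrings, so "unlimited" would become "unltd". A token-based variant avoids that and can also strip accents. The sketch below is one way to do it; the ABBREVIATIONS table and the name normalize_tokens are hypothetical and should be adapted to your data.

```python
import re
import unicodedata

# Hypothetical abbreviation table; extend it for your own data.
ABBREVIATIONS = {
    "limited": "ltd",
    "corporation": "corp",
    "incorporated": "inc",
}


def normalize_tokens(text: str) -> str:
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = text.casefold()
    # Remove punctuation, keeping word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Replace whole tokens only, so "unlimited" is left untouched.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_tokens("Caf\u00e9 Unlimited Corporation, Inc."))
# cafe unlimited corp inc
```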
Entity Resolution
When matching a list of dirty names to a clean database:
from difflib import get_close_matches
clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
dirty_names = ["google", "Microsft", "Apple"]
results = {}
for dirty in dirty_names:
    # simple containment check first
    match = None
    for clean in clean_names:
        if dirty.lower() in clean.lower():
            match = clean
            break
    # fallback to fuzzy
    if not match:
        matches = get_close_matches(dirty, clean_names, n=1, cutoff=0.6)
        if matches:
            match = matches[0]
    results[dirty] = match
print(results)
# Output: {'google': 'Google LLC', 'Microsft': 'Microsoft Corp', 'Apple': 'Apple Inc'}
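For invoice or vendor reconciliation it is often useful to keep the score next to each match, so that borderline cases can be reviewed by hand. The function below is a minimal sketch of that idea, not the skill's own implementation; resolve and its threshold parameter are names assumed for illustration.

```python
from difflib import SequenceMatcher


def resolve(dirty, clean_names, threshold=0.6):
    """Return (best_clean_name, score), or (None, score) below threshold."""
    best_name, best_score = None, 0.0
    for clean in clean_names:
        # Compare case-insensitively so "google" can match "Google LLC".
        score = SequenceMatcher(None, dirty.lower(), clean.lower()).ratio()
        if score > best_score:
            best_name, best_score = clean, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
for dirty in ["google", "Microsft", "Apple", "Zebra Holdings"]:
    print(dirty, "->", resolve(dirty, clean_names))
```

Returning None for names that fall below the threshold makes unmatched records explicit instead of silently attaching them to the nearest candidate.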
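- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install invoice-fraud-detection-fuzzy-match
- Once installation finishes, invoke the Skill by name or trigger it with /invoice-fraud-detection-fuzzy-match
- Provide the inputs described in the Skill's parameter notes to get structured output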
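What is fuzzy-match?
A toolkit for fuzzy string matching and data reconciliation. Useful for matching entity names (companies, people) across different datasets where spelling va... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 87 downloads to date.
How do I install fuzzy-match?
Run the command "/install invoice-fraud-detection-fuzzy-match" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is fuzzy-match free?
Yes. fuzzy-match is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does fuzzy-match support?
fuzzy-match is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops fuzzy-match?
It is developed and maintained by wu-uk (@wu-uk). The current version is v0.1.0.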