Dedupe
/install dedupe
Dedupe — Data Deduplication Reference
Quick-reference skill for deduplication strategies, algorithms, and data quality patterns.
When to Use
- Removing duplicate rows from datasets or databases
- Deduplicating files in storage systems
- Implementing fuzzy matching for near-duplicate detection
- Choosing between exact and probabilistic dedup methods
- Building ETL pipelines with deduplication stages
Commands
intro
scripts/script.sh intro
Overview of deduplication — types, strategies, and tradeoffs.
exact
scripts/script.sh exact
Exact deduplication — hash-based, key-based, and sorting approaches.
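The three approaches named above can be sketched in a few lines of Python. This is a minimal illustration, not the output of `scripts/script.sh exact`; the function names and the sample `rows` are hypothetical.

```python
import hashlib
from itertools import groupby

def dedupe_by_key(records, key_fields):
    """Key-based: keep the first record seen for each key, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def row_hash(rec):
    """Hash-based: SHA-256 of a canonical form that ignores field order."""
    canonical = "\x1f".join(f"{k}={rec[k]!r}" for k in sorted(rec))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dedupe_by_hash(records):
    """Keep the first record for each distinct whole-row hash."""
    seen = set()
    unique = []
    for rec in records:
        digest = row_hash(rec)
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique

def dedupe_sorted(values):
    """Sort-based: after sorting, duplicates are adjacent and collapse to one."""
    return [value for value, _group in groupby(sorted(values))]

# Hypothetical sample data.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"email": "a@example.com", "id": 1},  # same row, different field order
    {"id": 3, "email": "a@example.com"},  # same email, different id
]
```

Key-based dedup on `email` keeps ids 1 and 2, while whole-row hashing only removes the reordered copy of the first row. Which behavior is right depends on what counts as "the same record" in your data.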
fuzzy
scripts/script.sh fuzzy
Fuzzy deduplication — similarity measures, blocking, and record linkage.
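As a rough sketch of how a similarity measure and blocking fit together, the example below uses only the Python standard library. The blocking rule (first character of the normalized name), the 0.85 threshold, and the sample names are assumptions chosen for illustration; real record linkage usually needs better blocking keys and tuned thresholds.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(name.lower().split())

def similarity(a, b):
    """Similarity ratio in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def block_key(name):
    """Hypothetical blocking rule: first character of the normalized name."""
    return normalize(name)[:1]

def candidate_pairs(names):
    """Only compare records that share a block, avoiding all-pairs comparison."""
    blocks = defaultdict(list)
    for index, name in enumerate(names):
        blocks[block_key(name)].append(index)
    for members in blocks.values():
        yield from combinations(members, 2)

def find_near_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity meets the threshold."""
    return [
        (names[i], names[j])
        for i, j in candidate_pairs(names)
        if similarity(names[i], names[j]) >= threshold
    ]

# Hypothetical sample data.
names = ["Jon Smith", "John Smith", "Jane Doe", "jon  smith", "Alice Wong"]
pairs = find_near_duplicates(names)
```

Blocking trades recall for speed: two records that land in different blocks are never compared, so a blocking key that is too strict silently misses duplicates.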
files
scripts/script.sh files
File-level deduplication — fdupes, jdupes, rdfind, and storage dedup.
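One common way to find duplicate files is to group by size first and hash only the files whose sizes collide. The sketch below illustrates that general idea with the standard library; it is not a description of how fdupes, jdupes, or rdfind are implemented, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicate_files(root):
    """Return groups of paths under root whose contents hash identically."""
    # Pass 1: group regular files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

The function only reports groups. Deciding which copy to keep, and whether to delete or hard-link the rest, is left to the caller, and a byte-for-byte comparison before deleting is a reasonable extra safeguard.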
algorithms
scripts/script.sh algorithms
Dedup algorithms — bloom filters, HyperLogLog, MinHash, SimHash.
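Of the algorithms listed, a Bloom filter is the simplest to sketch. The version below is a minimal illustration that assumes string items, uses the standard sizing formulas, and derives its probe positions from a single SHA-256 digest (double hashing). The class and function names are hypothetical.

```python
import hashlib
import math

class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
        ))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

def dedupe_stream(items, bloom):
    """Yield items not yet seen. A false positive can drop a unique item."""
    for item in items:
        if item not in bloom:
            bloom.add(item)
            yield item
```

Because a Bloom filter can report an unseen item as seen, this kind of dedup may occasionally discard a unique item. It suits cases where bounded memory matters more than exactness.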
sql
scripts/script.sh sql
SQL deduplication patterns — ROW_NUMBER, DISTINCT, GROUP BY strategies.
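A typical `ROW_NUMBER` pattern ranks the rows within each duplicate group and deletes everything but the first. The sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module. The `customers` table, its columns, and the "keep the most recently updated row" rule are assumptions for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with two rows sharing the same email.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT);
INSERT INTO customers (id, email, updated_at) VALUES
    (1, 'a@example.com', '2024-01-01'),
    (2, 'a@example.com', '2024-03-01'),
    (3, 'b@example.com', '2024-02-01');
""")

# Rank rows within each email, newest first, then delete every row ranked > 1.
DEDUPE_SQL = """
DELETE FROM customers
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY email
                   ORDER BY updated_at DESC, id DESC
               ) AS rn
        FROM customers
    ) AS ranked
    WHERE rn > 1
)
"""
conn.execute(DEDUPE_SQL)
conn.commit()

remaining = conn.execute(
    "SELECT id, email FROM customers ORDER BY id"
).fetchall()
```

The `ORDER BY` inside the window decides which duplicate survives, so it is worth adding a tie-breaker (here `id DESC`) to keep the result deterministic.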
cli
scripts/script.sh cli
Command-line dedup tools — sort, uniq, awk, and stream processing.
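The two behaviors behind these tools can be sketched in Python. `unique_lines` mirrors the order-preserving idea of the common `awk '!seen[$0]++'` idiom, and `collapse_adjacent` mirrors how `uniq` only removes consecutive repeats, which is why it is usually paired with `sort`. The function names are hypothetical.

```python
from itertools import groupby

def unique_lines(lines):
    """Order-preserving dedupe of a stream; memory grows with distinct lines."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

def collapse_adjacent(lines):
    """Drop only consecutive repeats, so unsorted input can keep duplicates."""
    for line, _group in groupby(lines):
        yield line
```

To use `unique_lines` as a filter, one option is `sys.stdout.writelines(unique_lines(sys.stdin))` in a small script, after importing `sys`.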
checklist
scripts/script.sh checklist
Deduplication quality checklist and validation steps.
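The checklist's contents are not reproduced here, so the following is only a guess at the kind of validation such a step might include: confirming that no duplicate keys remain and that no key was lost or invented by the dedup pass. The function name, the check names, and the record shape are all hypothetical.

```python
from collections import Counter

def validate_dedup(before, after, key_fields):
    """Compare input and output of a dedup pass on the chosen key fields."""
    def key(rec):
        return tuple(rec[f] for f in key_fields)

    before_keys = {key(rec) for rec in before}
    after_counts = Counter(key(rec) for rec in after)
    after_keys = set(after_counts)

    return {
        # Every key should appear exactly once in the output.
        "no_duplicates_remain": all(n == 1 for n in after_counts.values()),
        # Dedup should never drop a key entirely.
        "no_keys_lost": before_keys <= after_keys,
        # Dedup should never create keys that were not in the input.
        "no_keys_invented": after_keys <= before_keys,
        # Useful to log and compare against expectations.
        "rows_removed": len(before) - len(after),
    }
```

For example, `validate_dedup(raw_rows, clean_rows, ["email"])` returns a dictionary of boolean checks plus the number of rows removed, which can be asserted on in a pipeline test.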
help
scripts/script.sh help
version
scripts/script.sh version
Configuration
| Variable | Description |
|---|---|
| `DEDUPE_DIR` | Data directory (default: `~/.dedupe/`) |
Powered by BytesAgain | bytesagain.com
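- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install dedupe
- Once installation finishes, invoke the Skill by name or trigger it with /dedupe.
- Supply the inputs described in the Skill's parameter documentation to get structured output.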
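What is Dedupe?
Deduplication reference: exact matching, fuzzy matching, hash-based dedup, bloom filters, and data quality. Use when removing duplicate records, files, or d... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 178 downloads to date.
How do I install Dedupe?
Run the command "/install dedupe" in the OpenClaw or Claude Code chat box. It installs in one step and needs no extra configuration.
Is Dedupe free?
Yes. Dedupe is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Dedupe support?
Dedupe is cross-platform and runs in any environment where OpenClaw or Claude Code is deployed.
Who develops Dedupe?
Dedupe is developed and maintained by bytesagain4 (@xueyetianya). The current version is v1.0.0.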