xueyetianya

Dedupe

by bytesagain4 · GitHub · v1.0.0 · MIT-0
cross-platform ✓ · Security Clean
178 downloads · 0 stars · 2 active installs · 1 version
Install in OpenClaw: /install dedupe
Description
Deduplication reference — exact matching, fuzzy matching, hash-based dedup, bloom filters, and data quality. Use when removing duplicate records, files, or d...
README (SKILL.md)

Dedupe — Data Deduplication Reference

Quick-reference skill for deduplication strategies, algorithms, and data quality patterns.

When to Use

  • Removing duplicate rows from datasets or databases
  • Deduplicating files in storage systems
  • Implementing fuzzy matching for near-duplicate detection
  • Choosing between exact and probabilistic dedup methods
  • Building ETL pipelines with deduplication stages

Commands

intro

scripts/script.sh intro

Overview of deduplication — types, strategies, and tradeoffs.

exact

scripts/script.sh exact

Exact deduplication — hash-based, key-based, and sorting approaches.
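Here is a minimal Python sketch of the three exact approaches named above. The helper names (`dedupe_by_key`, `content_hash`, `dedupe_sorted`) and the normalization rule (strip and lowercase every value) are illustrative assumptions. They are not taken from what `scripts/script.sh exact` prints.

```python
import hashlib
import json
from itertools import groupby
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def dedupe_by_key(records: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Key-based dedup: yield the first record seen for each key, in input order."""
    seen = set()
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            yield rec


def content_hash(row: dict) -> str:
    """Hash-based dedup key: SHA-256 of the row after normalizing every value.

    Assumption: values are compared as stripped, lowercased strings.
    """
    normalized = {k: str(v).strip().lower() for k, v in row.items()}
    # sort_keys makes the digest independent of column order
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe_sorted(items: Iterable[T]) -> list[T]:
    """Sort-based dedup: after sorting, duplicates are adjacent, so keep one per run."""
    return [value for value, _run in groupby(sorted(items))]


# Hypothetical sample rows: the first two differ only in case and whitespace.
rows = [
    {"email": "Ann@example.com", "name": "Ann"},
    {"email": "ann@example.com ", "name": "ann"},
    {"email": "bob@example.com", "name": "Bob"},
]
unique_rows = list(dedupe_by_key(rows, content_hash))
```

Hashing normalized content removes rows that differ only in case or whitespace. A key function on one column, for example `lambda r: r["email"].strip().lower()`, instead treats rows as duplicates whenever that column alone matches. The sorting approach needs no `seen` set, but it reorders the output.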

fuzzy

scripts/script.sh fuzzy

Fuzzy deduplication — similarity measures, blocking, and record linkage.
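Below is a small sketch of fuzzy matching with blocking. It uses the ratio from Python's standard-library `difflib.SequenceMatcher` as the similarity measure. The blocking key (ZIP code), the 0.85 threshold, and the function names are illustrative assumptions. Real record-linkage work often uses dedicated string metrics such as Jaro-Winkler, with thresholds tuned on labeled data.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Hashable, Iterator


def normalize(s: str) -> str:
    """Lowercase and collapse runs of whitespace before comparing."""
    return " ".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on difflib's matching-subsequence ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(
    records: list[dict], block_key: Callable[[dict], Hashable]
) -> Iterator[tuple[int, int]]:
    """Blocking: only records that share a block key are ever compared."""
    blocks: dict[Hashable, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def fuzzy_matches(records, field, block_key, threshold=0.85):
    """Return (i, j, score) for candidate pairs whose field similarity meets the threshold."""
    matches = []
    for i, j in candidate_pairs(records, block_key):
        score = similarity(records[i][field], records[j][field])
        if score >= threshold:
            matches.append((i, j, score))
    return matches


# Hypothetical records: 0 and 1 are near-duplicates; 3 repeats 0 in another ZIP.
people = [
    {"name": "Jonathan Smith", "zip": "10001"},
    {"name": "Jonathon  Smith", "zip": "10001"},
    {"name": "Jane Doe", "zip": "10001"},
    {"name": "Jonathan Smith", "zip": "94105"},
]
pairs = fuzzy_matches(people, "name", block_key=lambda r: r["zip"])
```

Only records 0 and 1 are reported. Record 3 has the same name as record 0, but it falls in a different block and is never compared. That is the blocking tradeoff: some recall is lost in exchange for far fewer pairwise comparisons.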

files

scripts/script.sh files

File-level deduplication — fdupes, jdupes, rdfind, and storage dedup.
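The core idea of file-level dedup can be sketched in Python. Files are first grouped by size, which is cheap, and only same-size candidates are confirmed with a content hash. This is a simplified illustration that assumes matching SHA-256 digests are good enough. It is not how fdupes, jdupes, or rdfind are implemented. It does no final byte-by-byte comparison, and it only reports groups without deleting anything. `find_duplicate_files` and `file_sha256` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root: str) -> list[list[Path]]:
    """Return groups of regular files under root that have identical content."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):  # does not follow symlinked dirs
        for name in files:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing it
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[file_sha256(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The size check rules out most files without reading them. Hashing is only paid for files that share a size with at least one other file.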

algorithms

scripts/script.sh algorithms

Dedup algorithms — bloom filters, HyperLogLog, MinHash, SimHash.
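A Bloom filter fits streaming dedup naturally. In fixed memory, it answers either "definitely new" or "probably seen". The sketch below sizes the bit array with the standard formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln 2, where n is the expected item count and p the target false-positive rate. It derives the k bit positions from a single SHA-256 digest by double hashing. `BloomFilter` and `first_seen` are hypothetical names, not an existing library API.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size set approximation: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Optimal bit count m and hash count k for `capacity` items at `error_rate`.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, both halves from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def first_seen(stream: Iterable[str], bloom: BloomFilter) -> Iterator[str]:
    """Yield items the filter has (probably) not seen before, then remember them."""
    for item in stream:
        if item not in bloom:
            bloom.add(item)
            yield item
```

For n = 1000 and p = 0.01, this gives m = 9586 bits (about 1.2 KB) and k = 7. A false positive makes `first_seen` silently drop a genuinely new item, so choose `error_rate` with that loss in mind.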

sql

scripts/script.sh sql

SQL deduplication patterns — ROW_NUMBER, DISTINCT, GROUP BY strategies.
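The ROW_NUMBER pattern can be tried end to end with Python's built-in `sqlite3` module. Window functions require SQLite 3.25 or newer. The `customers` table, its columns, and the rule of keeping the most recently updated row are hypothetical. The `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` query shape carries over to other databases with window functions, but DELETE syntax and restrictions vary by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT);
INSERT INTO customers (id, email, updated_at) VALUES
  (1, 'ann@example.com', '2024-01-01'),
  (2, 'ANN@example.com', '2024-03-01'),
  (3, 'bob@example.com', '2024-02-01');
""")

# Number the rows within each normalized email, newest first; delete every row after the first.
conn.execute("""
DELETE FROM customers
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (
             PARTITION BY LOWER(TRIM(email))
             ORDER BY updated_at DESC, id DESC
           ) AS rn
    FROM customers
  )
  WHERE rn > 1
)
""")
conn.commit()

remaining = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
```

`ORDER BY updated_at DESC` decides which copy survives. `id DESC` breaks ties so the result is deterministic. Running the inner SELECT on its own first is a cheap way to preview which rows the DELETE will remove.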

cli

scripts/script.sh cli

Command-line dedup tools — sort, uniq, awk, and stream processing.
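The two most common shell idioms behave differently. `sort | uniq` (or `sort -u`) reorders lines, while `awk '!seen[$0]++'` keeps the first occurrence of each line in its original position. A rough Python equivalent of both follows. The function names are hypothetical. Note that Python's `sorted` compares by code point, which matches `sort` only under `LC_ALL=C`.

```python
from collections import Counter
from typing import Iterable, Iterator


def unique_in_order(lines: Iterable[str]) -> Iterator[str]:
    """Like awk '!seen[$0]++': emit each distinct line once, in first-seen order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def count_sorted(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Like sort | uniq -c: (count, line) pairs, ordered by line."""
    counts = Counter(lines)
    return [(counts[line], line) for line in sorted(counts)]
```

To filter a stream, use `sys.stdout.writelines(unique_in_order(sys.stdin))`. Like the awk version, it holds only the distinct lines in memory, never the full input.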

checklist

scripts/script.sh checklist

Deduplication quality checklist and validation steps.
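The checklist itself is printed by the script and is not reproduced on this page. The sketch below is an assumed example of the kind of validation step such a checklist covers. It compares a dedup result against its input on a chosen key and checks three things: no key remains duplicated, no key is lost, and no key appears that was not in the input. `validate_dedup` is a hypothetical helper.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def validate_dedup(before: Sequence, after: Sequence, key: Callable[..., Hashable]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    keys_before = {key(r) for r in before}
    counts_after = Counter(key(r) for r in after)

    still_duplicated = [k for k, c in counts_after.items() if c > 1]
    if still_duplicated:
        problems.append(f"{len(still_duplicated)} key(s) still duplicated")

    lost = keys_before - set(counts_after)
    if lost:
        problems.append(f"{len(lost)} key(s) lost")

    invented = set(counts_after) - keys_before
    if invented:
        problems.append(f"{len(invented)} key(s) not present in the input")

    return problems
```

Running a check like this after every dedup stage in an ETL pipeline catches both under-merging (duplicates left behind) and over-merging (distinct keys collapsed away).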

help

scripts/script.sh help

version

scripts/script.sh version

Configuration

Variable      Description
DEDUPE_DIR    Data directory (default: ~/.dedupe/)

Powered by BytesAgain | bytesagain.com

Usage Guidance
This skill is a local documentation/reference tool implemented as a shell script and appears coherent with its description. Before installing, you can: (1) quickly inspect the full scripts/script.sh file to confirm it only prints documentation and does not execute network commands or remove files; (2) ensure you are comfortable allowing the agent to run the included script when you invoke the skill. If you see commands that read arbitrary paths, call curl/wget, or run delete operations, treat those as a potential risk and ask for clarification.
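Step (1) can be partly automated with a rough line scanner. This helper is hypothetical and not part of the skill. It flags lines that mention a few commands associated with network access, deletion, or dynamic execution; the pattern list is an assumption and far from exhaustive. Because `scripts/script.sh` prints its reference text through heredocs, expect hits in documentation that merely mentions these commands. Treat every hit as a pointer for manual reading, not a verdict.

```python
import re
from pathlib import Path

# Assumed, non-exhaustive list of commands worth a closer look in a shell script.
RISKY = re.compile(r"\b(curl|wget|nc|rm|eval|ssh|scp)\b")


def flag_risky_lines(script_path: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line that mentions a risky command."""
    text = Path(script_path).read_text(encoding="utf-8", errors="replace")
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if RISKY.search(line)
    ]
```

For example, `for lineno, line in flag_risky_lines("scripts/script.sh"): print(lineno, line)` lists the lines to read first.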
Capability Analysis
Type: OpenClaw Skill · Name: dedupe · Version: 1.0.0
The 'dedupe' skill is a purely educational reference tool for data deduplication strategies. The primary script, `scripts/script.sh`, contains only static text output (via heredocs) covering algorithms, SQL patterns, and CLI tools; it performs no actual data manipulation or network activity. No indicators of malicious intent, data exfiltration, or prompt injection were found.
Capability Assessment
Purpose & Capability
Name/description match the provided artifacts: a reference skill for deduplication. It does not ask for unrelated credentials, binaries, or system access.
Instruction Scope
SKILL.md instructs running the included scripts/script.sh commands to show reference content (intro, exact, fuzzy, files, etc.). The instructions do not ask the agent to read unrelated files, contact external endpoints, or exfiltrate data. The only optional configuration variable is DEDUPE_DIR (a local data dir), which is reasonable for a local reference tool.
Install Mechanism
No install spec — instruction-only plus an included script. Nothing is downloaded or written to disk at install time beyond the skill files themselves.
Credentials
The skill declares no required environment variables or credentials. The SKILL.md mentions an optional DEDUPE_DIR, which is proportional to the purpose.
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It does not modify other skills or system-wide agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install dedupe
  3. After installation, invoke the skill by name or use /dedupe
  4. Pass one of the commands listed under Commands (for example, sql or checklist) and read the reference text the script prints
Version History
v1.0.0
publish v1.0.0
Metadata
Slug dedupe
Version 1.0.0
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 1
Frequently Asked Questions

What is Dedupe?

Deduplication reference — exact matching, fuzzy matching, hash-based dedup, bloom filters, and data quality. Use when removing duplicate records, files, or d... It is an AI Agent Skill for Claude Code / OpenClaw, with 178 downloads so far.

How do I install Dedupe?

Run "/install dedupe" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Dedupe free?

Yes, Dedupe is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Dedupe support?

Dedupe is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Dedupe?

It is built and maintained by bytesagain4 (@xueyetianya); the current version is v1.0.0.
