michael-laffin

File Deduplicator

Author: Michael-laffin · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
Total downloads: 2776
Favorites: 7
Current installs: 13
Versions: 1
Install in OpenClaw
/install file-deduplicator
Description
Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.
Usage (SKILL.md)

File-Deduplicator - Find and Remove Duplicates

Vernox Utility Skill - Clean up your digital hoard.

Overview

File-Deduplicator is an intelligent duplicate-file finder and remover. It uses content hashing to identify identical files across directories, then offers options for removing the duplicates safely.

Features

✅ Duplicate Detection

  • Content-based hashing (MD5) for fast comparison
  • Size-based detection (exact match, near match)
  • Name-based detection (similar filenames)
  • Directory scanning (recursive)
  • Exclude patterns (.git, node_modules, etc.)

✅ Removal Options

  • Auto-delete duplicates (keep newest/oldest)
  • Interactive review before deletion
  • Move to archive instead of delete
  • Preserve permissions and metadata
  • Dry-run mode (preview changes)

✅ Analysis Tools

  • Duplicate count summary
  • Space savings estimation
  • Largest duplicate files
  • Most common duplicate patterns
  • Detailed report generation

✅ Safety Features

  • Confirmation prompts before deletion
  • Backup to archive folder
  • Size threshold (don't remove huge files by mistake)
  • Whitelist important directories
  • Undo functionality (log for recovery)

Installation

clawhub install file-deduplicator

Quick Start

Find Duplicates in Directory

const result = await findDuplicates({
  directories: ['./documents', './downloads', './projects'],
  options: {
    method: 'content',  // content-based comparison
    includeSubdirs: true
  }
});

console.log(`Found ${result.duplicateCount} duplicate groups`);
console.log(`Potential space savings: ${result.spaceSaved}`);

Remove Duplicates Automatically

const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',  // keep newest, delete oldest
    action: 'delete',  // or 'move' to archive
    autoConfirm: false  // show confirmation for each
  }
});

console.log(`Removed ${result.filesRemoved} duplicates`);
console.log(`Space saved: ${result.spaceSaved}`);

Dry-Run Preview

const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',
    action: 'delete',
    dryRun: true  // Preview without actual deletion
  }
});

console.log('Would remove:');
result.duplicates.forEach((dup, i) => {
  console.log(`${i+1}. ${dup.file}`);
});

Tool Functions

findDuplicates

Find duplicate files across directories.

Parameters:

  • directories (array|string, required): Directory paths to scan
  • options (object, optional):
    • method (string): 'content' | 'size' | 'name' - comparison method
    • includeSubdirs (boolean): Scan recursively (default: true)
    • minSize (number): Minimum size in bytes (default: 0)
    • maxSize (number): Maximum size in bytes (default: 0, which presumably means no upper limit)
    • excludePatterns (array): Glob patterns to exclude (default: ['.git', 'node_modules'])
    • whitelist (array): Directories to never scan (default: [])

Returns:

  • duplicates (array): Array of duplicate groups
  • duplicateCount (number): Number of duplicate groups found
  • totalFiles (number): Total files scanned
  • scanDuration (number): Time taken to scan (ms)
  • spaceWasted (number): Total bytes wasted by duplicates
  • spaceSaved (number): Potential savings in bytes if duplicates are removed
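
How spaceWasted is computed is not stated. One plausible reading is that every copy beyond the first in each group counts as wasted. A minimal sketch under that assumption (summarize and the { path, size } record shape are hypothetical, not the skill's actual code):

```javascript
// groups: array of duplicate groups, each an array of { path, size } records
// that are assumed to hold identical content.
function summarize(groups) {
  let spaceWasted = 0;
  for (const group of groups) {
    // Keep one copy; every additional copy is redundant.
    spaceWasted += group.slice(1).reduce((sum, file) => sum + file.size, 0);
  }
  return { duplicateCount: groups.length, spaceWasted };
}
```

Under this reading, a group of three 10-byte files contributes 20 bytes.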

removeDuplicates

Remove duplicate files based on findings.

Parameters:

  • directories (array|string, required): Same as findDuplicates
  • options (object, optional):
    • keep (string): 'newest' | 'oldest' | 'smallest' | 'largest' - which to keep
    • action (string): 'delete' | 'move' | 'archive'
    • archivePath (string): Where to move files when action='move'
    • dryRun (boolean): Preview without actual action
    • autoConfirm (boolean): Auto-confirm deletions
    • sizeThreshold (number): Don't remove files larger than this

Returns:

  • filesRemoved (number): Number of files removed/moved
  • spaceSaved (number): Bytes saved
  • groupsProcessed (number): Number of duplicate groups handled
  • logPath (string): Path to action log
  • errors (array): Any errors encountered
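
The keep option decides which file in a duplicate group survives. A rough sketch of how the four strategies could be applied, assuming each file is described by a { path, size, mtimeMs } record (chooseKeeper and that record shape are hypothetical):

```javascript
// Pick the file to keep from one duplicate group and return the rest as
// removal candidates. `keep` mirrors the documented option values.
function chooseKeeper(files, keep = 'newest') {
  const comparators = {
    newest: (a, b) => b.mtimeMs - a.mtimeMs,
    oldest: (a, b) => a.mtimeMs - b.mtimeMs,
    smallest: (a, b) => a.size - b.size,
    largest: (a, b) => b.size - a.size,
  };
  const compare = comparators[keep];
  if (!compare) throw new Error(`Unknown keep strategy: ${keep}`);
  // Sort a copy so the caller's array is left untouched.
  const [kept, ...rest] = [...files].sort(compare);
  return { keep: kept, remove: rest };
}
```

For byte-identical files 'smallest' and 'largest' only differ from each other when the group was built by size or name matching rather than by content.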

analyzeDirectory

Analyze a single directory for duplicates.

Parameters:

  • directory (string, required): Path to directory
  • options (object, optional): Same as findDuplicates options

Returns:

  • fileCount (number): Total files in directory
  • totalSize (number): Total bytes in directory
  • duplicateSize (number): Bytes in duplicate files
  • duplicateRatio (number): Percentage of files that are duplicates

Use Cases

Digital Hoarder Cleanup

  • Find duplicate photos/videos
  • Identify wasted storage space
  • Remove old duplicates, keep newest
  • Clean up download folders

Document Management

  • Find duplicate PDFs, docs, reports
  • Keep latest version, archive old versions
  • Prevent version confusion
  • Reduce backup bloat

Project Cleanup

  • Find duplicate source files
  • Remove duplicate build artifacts
  • Clean up node_modules duplicates
  • Save storage on SSD/HDD

Backup Optimization

  • Find duplicate backup files
  • Remove redundant backups
  • Identify what's actually duplicated
  • Save space on backup drives

Configuration

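Edit config.json. The // comments below are annotations only; standard JSON does not allow comments, so leave them out of the real file unless the loader strips them: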

{
  "detection": {
    "defaultMethod": "content",
    "sizeTolerancePercent": 0,  // exact match only
    "nameSimilarity": 0.7,  // 0-1, lower = more similar
    "includeSubdirs": true
  },
  "removal": {
    "defaultAction": "delete",
    "defaultKeep": "newest",
    "archivePath": "./archive",
    "sizeThreshold": 10485760,  // 10MB threshold
    "autoConfirm": false,
    "dryRunDefault": false
  },
  "exclude": {
    "patterns": [".git", "node_modules", ".vscode", ".idea"],
    "whitelist": ["important", "work", "projects"]
  }
}
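
How config.json defaults combine with per-call options is not spelled out. A minimal sketch, assuming per-call options simply override the config values; resolveOptions and the mapping from config keys to option names are assumptions for illustration:

```javascript
// Merge config.json defaults with per-call options (per-call wins).
function resolveOptions(config = {}, options = {}) {
  const detection = config.detection || {};
  const removal = config.removal || {};
  const fromConfig = {
    method: detection.defaultMethod,
    includeSubdirs: detection.includeSubdirs,
    keep: removal.defaultKeep,
    action: removal.defaultAction,
    archivePath: removal.archivePath,
    sizeThreshold: removal.sizeThreshold,
    autoConfirm: removal.autoConfirm,
    dryRun: removal.dryRunDefault,
  };
  const merged = {};
  for (const layer of [fromConfig, options]) {
    for (const [key, value] of Object.entries(layer)) {
      // Skip undefined so a missing key never erases an earlier value.
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}
```

With the sample config above, calling resolveOptions(config, { dryRun: true }) would keep "newest" and "delete" from the file while switching the run to preview mode.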

Methods

Content-Based (Recommended)

  • Fast MD5 hashing
  • Detects exact duplicates regardless of filename
  • Works across renamed files
  • Perfect for documents, code, archives
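
To make the idea concrete, here is a small sketch of content-based grouping with Node's built-in crypto module. It is an illustration of the technique, not the skill's implementation; hashFile and groupByContent are hypothetical names, and streaming is used here so that large files are not read into memory at once.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Compute the MD5 digest of a file's bytes by streaming it.
function hashFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('md5');
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Group paths by digest; only groups with two or more paths are duplicates.
async function groupByContent(paths) {
  const groups = new Map();
  for (const filePath of paths) {
    const digest = await hashFile(filePath); // must be awaited before use as a key
    if (!groups.has(digest)) groups.set(digest, []);
    groups.get(digest).push(filePath);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}
```

Because the filename never enters the hash, renamed copies land in the same group.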

Size-Based

  • Compares file sizes
  • Faster than content hashing
  • Good for media files where content hashing is slow
  • Finds near-duplicates (similar but not exact)
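
A sketch of size-based grouping with an optional tolerance, loosely modelled on the sizeTolerancePercent setting. The clustering rule (each group is anchored to its smallest file) and the groupBySize name are assumptions:

```javascript
// Group { path, size } records whose sizes fall within `tolerancePercent`
// of the smallest size in the group. A tolerance of 0 means exact match.
function groupBySize(files, tolerancePercent = 0) {
  const sorted = [...files].sort((a, b) => a.size - b.size);
  const groups = [];
  let current = [];
  for (const file of sorted) {
    const limit = current.length
      ? current[0].size * (1 + tolerancePercent / 100)
      : null;
    if (limit !== null && file.size <= limit) {
      current.push(file);
    } else {
      if (current.length > 1) groups.push(current);
      current = [file]; // start a new candidate group
    }
  }
  if (current.length > 1) groups.push(current);
  return groups;
}
```

Equal size does not prove equal content, so groups produced this way are candidates that still deserve a content check before anything is deleted.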

Name-Based

  • Compares filenames
  • Detects similar named files
  • Good for finding version duplicates (file_v1, file_v2)
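
The skill's name-matching algorithm is not documented (the config only exposes a nameSimilarity threshold). As a simpler stand-in, the sketch below normalizes filenames by stripping common version and copy suffixes, so that files sharing a key can be treated as name-based candidates. nameKey and the suffix rules are illustrative assumptions:

```javascript
const path = require('path');

// Reduce a filename to a comparison key: lowercase, with trailing
// "(1)", " copy", and "_v2"-style markers removed from the stem.
function nameKey(filePath) {
  const ext = path.extname(filePath);
  const stem = path.basename(filePath, ext).toLowerCase();
  const cleaned = stem
    .replace(/[ _-]*\(\d+\)$/, '') // "report (1)" -> "report"
    .replace(/[ _-]+copy$/, '')    // "report copy" -> "report"
    .replace(/[ _-]+v\d+$/, '');   // "file_v2" -> "file"
  return cleaned + ext.toLowerCase();
}
```

Files matched this way are often different versions rather than identical copies, so archiving is a safer default than deleting.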

Examples

Find Duplicates in Documents

const result = await findDuplicates({
  directories: '~/Documents',
  options: {
    method: 'content',
    includeSubdirs: true
  }
});

console.log(`Found ${result.duplicateCount} duplicate sets`);
result.duplicates.slice(0, 5).forEach((set, i) => {
  console.log(`Set ${i+1}: ${set.files.length} files`);
  console.log(`  Total size: ${set.totalSize} bytes`);
});
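
Note that Node's fs functions treat a leading ~ as a literal directory name; only shells expand it. Whether the skill expands it internally is not documented, so it may be safer to resolve the home directory before passing paths in. A small helper sketch (expandHome is a hypothetical name):

```javascript
const os = require('os');
const path = require('path');

// Replace a leading "~" with the current user's home directory.
function expandHome(inputPath) {
  if (inputPath === '~') return os.homedir();
  if (inputPath.startsWith('~/') || inputPath.startsWith('~\\')) {
    return path.join(os.homedir(), inputPath.slice(2));
  }
  return inputPath; // "~user" forms and ordinary paths are left alone
}
```

For example, expandHome('~/Documents') yields the Documents folder under os.homedir().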

Remove Duplicates, Keep Newest

const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    keep: 'newest',
    action: 'delete'
  }
});

console.log(`Removed ${result.filesRemoved} files`);
console.log(`Saved ${result.spaceSaved} bytes`);

Move to Archive Instead of Delete

const result = await removeDuplicates({
  directories: '~/Downloads',
  options: {
    keep: 'newest',
    action: 'move',
    archivePath: '~/Documents/Archive'
  }
});

console.log(`Archived ${result.filesRemoved} files`);
console.log(`Safe in: ~/Documents/Archive`);

Dry-Run Preview Changes

const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    dryRun: true  // Just show what would happen
  }
});

console.log('=== Dry Run Preview ===');
result.duplicates.forEach((set, i) => {
  console.log(`Would delete: ${set.toDelete.join(', ')}`);
});
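
The dry-run idea can be sketched independently of the skill: the same plan is walked in both modes, and only the non-preview mode touches the disk. applyPlan and its result shape are hypothetical; the sketch defaults to preview mode as the safer choice, which differs from the dryRunDefault of false shown in the configuration.

```javascript
const fs = require('fs');

// Walk a list of paths slated for deletion. In dry-run mode nothing is
// removed; otherwise each file is unlinked and failures are recorded
// instead of aborting the whole run.
function applyPlan(toDelete, { dryRun = true } = {}) {
  const results = [];
  for (const filePath of toDelete) {
    if (dryRun) {
      results.push({ path: filePath, status: 'would-delete' });
      continue;
    }
    try {
      fs.unlinkSync(filePath);
      results.push({ path: filePath, status: 'deleted' });
    } catch (err) {
      results.push({ path: filePath, status: 'error', error: err.code });
    }
  }
  return results;
}
```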

Performance

Scanning Speed

  • Small directories (<1000 files): <1s
  • Medium directories (1000-10000 files): 1-5s
  • Large directories (10000+ files): 5-20s

Detection Accuracy

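  • Content-based: matches byte-identical files only (an accidental MD5 collision is theoretically possible but extremely unlikely)
  • Size-based: Fast, but files of equal size are not necessarily identical, so false positives are possible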
  • Name-based: Detects naming patterns only

Memory Usage

  • Hash cache: ~1MB per 100,000 files
  • Batch processing: Processes 1000 files at a time
  • Peak memory: ~200MB for 1M files
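
Batch processing of this kind can be sketched with a generator that hands out fixed-size slices, so only one batch of work is in flight at a time. inBatches is a hypothetical helper; the default of 1000 simply mirrors the figure above.

```javascript
// Yield successive slices of `items`, each at most `batchSize` long.
function* inBatches(items, batchSize = 1000) {
  for (let start = 0; start < items.length; start += batchSize) {
    yield items.slice(start, start + batchSize);
  }
}
```

A scanner would typically loop with for (const batch of inBatches(paths)) and hash each batch before moving on.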

Safety Features

Size Thresholding

Won't remove files larger than a configurable threshold (default: 10 MB, i.e. 10485760 bytes). This prevents accidental deletion of important large files.
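
A sketch of how such a threshold could be applied to removal candidates, reporting what was skipped rather than dropping it silently. applySizeThreshold and the { path, size } records are assumptions:

```javascript
// Split removal candidates into files that may be removed and files that
// are protected because they exceed the threshold (in bytes).
function applySizeThreshold(candidates, sizeThreshold = 10 * 1024 * 1024) {
  const removable = [];
  const skipped = [];
  for (const file of candidates) {
    (file.size > sizeThreshold ? skipped : removable).push(file);
  }
  return { removable, skipped };
}
```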

Archive Mode

Move files to archive directory instead of deleting. No data loss, full recoverability.

Action Logging

All deletions/moves are logged to file for recovery and audit.

Undo Functionality

Log file can be used to restore accidentally deleted files (limited undo window).
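
The log format is not documented. As an illustration of how a log can support undo, the sketch below appends one JSON object per line for each move and replays the log in reverse to restore files. All function names and the log shape are assumptions. A log alone can only reverse moves; a file that was truly deleted cannot be recovered from a log entry.

```javascript
const fs = require('fs');
const path = require('path');

// Append one JSON line per action.
function logAction(logPath, entry) {
  const record = { ...entry, at: new Date().toISOString() };
  fs.appendFileSync(logPath, JSON.stringify(record) + '\n');
}

// Move a file into the archive directory and record where it came from.
// Note: renameSync fails across filesystems, and a same-named file already
// in the archive would be overwritten; a real tool must handle both cases.
function moveWithLog(from, archiveDir, logPath) {
  fs.mkdirSync(archiveDir, { recursive: true });
  const to = path.join(archiveDir, path.basename(from));
  fs.renameSync(from, to);
  logAction(logPath, { action: 'move', from, to });
  return to;
}

// Replay the log backwards, restoring moved files whose original path is free.
function undoMoves(logPath) {
  const entries = fs.readFileSync(logPath, 'utf8')
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line));
  let restored = 0;
  for (const entry of entries.reverse()) {
    if (entry.action === 'move' && fs.existsSync(entry.to) && !fs.existsSync(entry.from)) {
      fs.renameSync(entry.to, entry.from);
      restored += 1;
    }
  }
  return restored;
}
```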

Error Handling

Permission Errors

  • Clear error message
  • Suggest running with sudo
  • Skip files that can't be accessed

File Lock Errors

  • Detect locked files
  • Skip and report
  • Suggest closing applications using files

Space Errors

  • Check available disk space before deletion
  • Warn if space is critically low
  • Prevent disk-full scenarios

Troubleshooting

Not Finding Expected Duplicates

  • Check detection method (content vs size vs name)
  • Verify exclude patterns aren't too broad
  • Check if files are in whitelisted directories
  • Make sure includeSubdirs is true (the default) so that nested directories are scanned

Deletion Not Working

  • Check write permissions on directories
  • Check the action and autoConfirm settings (with autoConfirm: false, each deletion waits for confirmation)
  • Check size threshold isn't blocking all deletions
  • Check file locks (is another program using files?)

Slow Scanning

  • Reduce includeSubdirs scope
  • Use size-based detection (faster)
  • Exclude large directories (node_modules, .git)
  • Process directories individually instead of batch
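
Excluding directories pays off most when the walker never descends into them. A minimal sketch of such a scan, assuming exclusions are matched as exact entry names rather than the glob patterns the skill documents (listFiles is a hypothetical helper):

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list regular files under `root`, skipping any entry whose
// name is in `excludeNames`. Symlinks are not followed.
function listFiles(root, { excludeNames = ['.git', 'node_modules'], includeSubdirs = true } = {}) {
  const found = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (excludeNames.includes(entry.name)) continue; // pruned before descending
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (includeSubdirs) {
        for (const nested of listFiles(fullPath, { excludeNames, includeSubdirs })) {
          found.push(nested);
        }
      }
    } else if (entry.isFile()) {
      found.push(fullPath);
    }
  }
  return found;
}
```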

Tips

Best Results

  • Use content-based detection for documents (100% accurate)
  • Run dry-run first to preview changes
  • Archive instead of delete for important files
  • Check the logs if anything unexpected was deleted

Performance Optimization

  • Process frequently used directories first
  • Use size threshold to skip large media files
  • Exclude hidden directories from scan
  • Process directories in parallel when possible

Space Management

  • Regular duplicate cleanup prevents storage bloat
  • Delete temp directories regularly
  • Clear download folders of installers
  • Empty trash before large scans

Roadmap

  • Duplicate detection by image similarity
  • Near-duplicate detection (similar but not exact)
  • Duplicate detection across network drives
  • Cloud storage integration (S3, Google Drive)
  • Automatic scheduling of scans
  • Heuristic duplicate detection (ML-based)
  • Recover deleted files from backup
  • Duplicate detection by file content similarity (not just hash)

License

MIT


Find duplicates. Save space. Keep your system clean. 🔮

Security guidance
Key things to consider before installing or running:

  • Metadata mismatch: the registry says 'instruction-only', but this package includes executable source files (index.js, test.js). Ask the author why code is included and request a published homepage or repo for verification.
  • Code quality and bugs: index.js contains several coding problems that make its behavior unpredictable and potentially dangerous:
    • computeHash returns a Promise, but the scanning code uses the hash synchronously (no await), so hash-based grouping may not work as intended.
    • Several places treat file path strings as file-stat objects (e.g., accessing .size or .mtime on strings). That will throw or produce incorrect groupings.
    • The logic that groups by size/name appears to rely on properties that do not exist on the stored values, causing incorrect duplicate detection or skipped files.
    • The code does not consistently enforce the configured max file size before hashing; very large files could be hashed despite the config.
  • Deletion risk: SKILL.md promises safety features (confirmation prompts, archive, undo log). Because the implementation is sloppy and untested in places, those safeguards may not actually work or may be bypassed. Treat any non-trivial deletion operation as potentially destructive until you audit the code and run it in dry-run mode.
  • Test in an isolated environment: run the skill only on a non-production/test directory first (use dryRun:true and a small, controlled dataset). Do not enable any auto-confirm or delete action until you verify the code paths that perform deletions. Prefer running inside a VM or container and keep backups.
  • Code review suggestions: before trusting this skill, inspect the remainder of index.js (the provided snippet is truncated) and confirm:
    • Exports actually expose async functions matching the SKILL.md examples (await findDuplicates etc.).
    • Deletion/move code checks autoConfirm and dryRun properly, honors sizeThreshold and whitelist, and logs actions to a safe location.
    • All file operations handle errors safely and avoid races that could delete the wrong files.
  • Provenance: there's no homepage or repository, and the owner/slug give little context. Consider contacting the publisher for a source repository and clarifications, or prefer a deduplication tool from a known source.

If you want, I can: (a) list the exact lines in index.js/test.js that are problematic, (b) suggest concrete code fixes to make the tool safer (proper async/await, robust stat handling, stronger whitelisting), or (c) produce a safe audit checklist to run before invoking removal operations.
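
The missing-await problem mentioned in this guidance is easy to reproduce in isolation. In the sketch below, fakeHash is a stand-in for any async hash function (it is not the skill's computeHash). Without await, the Map key is a Promise object, and since every Promise is a distinct object, no two files ever share a group.

```javascript
// Stand-in async hash: identical content yields identical strings.
async function fakeHash(content) {
  return `h:${content}`;
}

// Buggy variant: uses the Promise itself as the grouping key.
function countGroupsWithoutAwait(items) {
  const groups = new Map();
  for (const item of items) {
    const key = fakeHash(item); // a Promise, never equal to another Promise
    groups.set(key, [...(groups.get(key) || []), item]);
  }
  return groups.size;
}

// Fixed variant: waits for the digest before grouping.
async function countGroupsWithAwait(items) {
  const groups = new Map();
  for (const item of items) {
    const key = await fakeHash(item);
    groups.set(key, [...(groups.get(key) || []), item]);
  }
  return groups.size;
}
```

For the input ['a', 'a', 'b'] the buggy variant reports three groups (no duplicates detected), while the awaited variant reports two.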
Functional analysis
Type: OpenClaw Skill Name: file-deduplicator Version: 1.0.0 The OpenClaw AgentSkills skill bundle 'file-deduplicator' is classified as benign. The code in `index.js` uses standard Node.js file system operations (`fs.readFileSync`, `fs.unlinkSync`, `fs.renameSync`, etc.) which are necessary for its stated purpose of finding and removing duplicate files. The skill includes several safety features such as a `dryRun` mode, a configurable `sizeThreshold` for deletions, an `archivePath` option to move files instead of permanently deleting them, and logging of all actions. Neither `SKILL.md` nor `README.md` contain any prompt injection attempts or instructions that would lead an AI agent to perform unauthorized actions or deviate from the skill's intended functionality. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, or obfuscation.
Capability assessment
Purpose & Capability
The declared purpose (file deduplication and removal) matches the included code and README: the package contains functions to scan, hash, and remove files. However the registry metadata states 'No install spec — instruction-only' while actual source files (index.js, test.js, config.json) are packaged — this mismatch is unexpected and worth questioning the maintainer. The code's header includes an odd phrase ('Autonomous Revenue Agent') which doesn't align with the stated utility purpose (likely benign but surprising).
Instruction Scope
SKILL.md shows the agent will scan arbitrary directories and can delete or move files. That is expected for this tool, but the actual index.js implementation contains multiple coding errors (see user guidance) that show mismatches between the documented safety features (confirmation prompts, size thresholding, archive, undo log) and what's implemented. Because the skill is permitted to delete/move user files, any bugs in the implementation increase the risk of accidental data loss. The SKILL.md examples rely on awaiting async functions (await findDuplicates), but the code handling of async operations appears inconsistent — this increases chance of unexpected behavior at runtime.
Install Mechanism
No install spec or external downloads are used; the skill is distributed as source files. This is lower risk than remote installers, but it means the shipped code will run on the user's system and should be reviewed.
Credentials
The skill does not request any environment variables, credentials, or config paths. There are no explicit requests to access unrelated services or secrets.
Persistence & Privilege
The skill does not request always:true and does not modify other skills' configurations in the provided files. It has normal, limited presence (just code executed when invoked).
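How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install file-deduplicator
  3. Once installed, invoke the Skill by name or trigger it with /file-deduplicator
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output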
Version history
v1.0.0
Initial release: Find and remove duplicate files intelligently to save storage space
Metadata
Slug: file-deduplicator
Version: 1.0.0
License:
Total installs: 13
Current installs: 13
Version count: 1
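FAQ

What is File Deduplicator?

Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 2776 total downloads to date.

How do I install File Deduplicator?

Run the command "/install file-deduplicator" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is File Deduplicator free?

Yes. File Deduplicator is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does File Deduplicator support?

File Deduplicator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed File Deduplicator?

It is developed and maintained by Michael-laffin (@michael-laffin); the current version is v1.0.0.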
