tobewin

Data Leak Detector

by ToBeWin · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
171 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install data-leak-detector
Description
Data leak detection tool. Use when user wants to scan skills, files, or folders for potential data leaks, privacy risks, or suspicious behavior. Detects network calls, file a...
README (SKILL.md)

Data Leak Detector

Scan skills, files, and folders for potential data leaks and privacy risks.

Features

  • 🔍 Static Analysis: Scan SKILL.md for suspicious patterns
  • 🌐 Network Detection: Detect external API calls
  • 📁 File Access: Detect file read/write operations
  • 🔄 Process Detection: Detect subprocess spawning
  • 🔐 Env Access: Detect environment variable access
  • 📊 Risk Scoring: 0-100 risk score with recommendations

Risk Levels

| Level | Color | Meaning |
|---|---|---|
| 🟢 Low | Green | Safe, no concerns |
| 🟡 Medium | Yellow | Review recommended |
| 🔴 High | Red | Caution required |
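
As a minimal sketch, the mapping from score to level can be written as a small helper. The thresholds (20 and 50) are taken from the Python code in this README; the function name `risk_level_for` is hypothetical and not part of the skill.

```python
def risk_level_for(score: int) -> str:
    """Map a 0-100 risk score to a level label (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 20:
        return "LOW"
    if score <= 50:
        return "MEDIUM"
    return "HIGH"


for score in (0, 20, 21, 50, 51, 100):
    print(score, risk_level_for(score))
```

Both boundaries are inclusive on the lower level: a score of exactly 20 is still Low, and exactly 50 is still Medium.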

Detection Patterns

Network Risks

  • curl/wget calls
  • requests/httpx usage
  • External API endpoints
  • WebSocket connections

File Risks

  • File read/write operations
  • Directory traversal
  • Sensitive file access
  • Temporary file creation

Process Risks

  • subprocess calls
  • os.system usage
  • Shell command execution
  • Process spawning

Environment Risks

  • Environment variable access
  • Config file reading
  • Credential access
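
To illustrate how pattern-based detection of this kind works, here is a minimal sketch that applies three regular expressions to a short sample and reports the line of each hit. `SAMPLE_PATTERNS`, `find_matches`, and the sample text are assumptions made up for this example; the full pattern set lives in the Python code in this README.

```python
import re

# A small subset of patterns, chosen for illustration only.
SAMPLE_PATTERNS = {
    "network": r"requests\.(get|post|put|delete)",
    "process": r"os\.system",
    "env_access": r"os\.environ",
}


def find_matches(text: str) -> list[tuple[str, int, str]]:
    """Return (category, line_number, matched_text) for every pattern hit."""
    hits = []
    for category, pattern in SAMPLE_PATTERNS.items():
        for match in re.finditer(pattern, text, re.IGNORECASE):
            # Line number = newlines before the match start, plus one.
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((category, line_number, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


sample = (
    "import os, requests\n"
    "token = os.environ['API_TOKEN']\n"
    "requests.post('https://example.invalid/upload', data=token)\n"
)
for hit in find_matches(sample):
    print(hit)
```

Because matching is purely textual, a pattern that appears inside a comment or a string literal is reported the same way as one in executable code.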

Trigger Conditions

  • "检查这个skill安全吗" / "Check if this skill is safe"
  • "扫描数据泄露" / "Scan for data leaks"
  • "这个skill有没有风险" / "Does this skill have risks"
  • "data-leak-detector"

Python Code

import os
import re
import json
from pathlib import Path

class DataLeakDetector:
    def __init__(self):
        self.patterns = {
            'network': {
                'high': [
                    r'curl\s+',
                    r'wget\s+',
                    r'requests\.(get|post|put|delete)',
                    r'http[s]?://',
                    r'urllib\.request',
                    r'httpx\.',
                    r'websocket',
                ],
                'medium': [
                    r'fetch\(',
                    r'axios\.',
                ]
            },
            'file_access': {
                'high': [
                    r'open\s*\(',
                    r'os\.remove',
                    r'os\.rmdir',
                    r'shutil\.rmtree',
                ],
                'medium': [
                    r'readFile',
                    r'writeFile',
                    r'os\.path\.exists',
                    r'glob\.',
                ]
            },
            'process': {
                'high': [
                    r'subprocess\.',
                    r'os\.system',
                    r'os\.popen',
                    r'exec\(',
                    r'eval\(',
                ],
                'medium': [
                    r'Popen',
                    r'call\(',
                ]
            },
            'env_access': {
                'high': [
                    r'os\.environ',
                    r'os\.getenv',
                    r'\$[A-Z_]+',
                ],
                'medium': [
                    r'config\[',
                    r'secrets\[',
                ]
            }
        }
    
    def scan_file(self, filepath):
        """Scan a single file for risks"""
        
        risks = []
        
        try:
            with open(filepath, 'r', encoding='utf-8') as f:
                content = f.read()
        except (OSError, UnicodeDecodeError):
            # Skip files that cannot be opened or decoded as UTF-8
            return risks
        
        for category, levels in self.patterns.items():
            for level, patterns in levels.items():
                for pattern in patterns:
                    matches = re.finditer(pattern, content, re.IGNORECASE)
                    for match in matches:
                        line_num = content[:match.start()].count('\n') + 1
                        risks.append({
                            'category': category,
                            'level': level,
                            'pattern': pattern,
                            'line': line_num,
                            'match': match.group()[:50]
                        })
        
        return risks
    
    def scan_skill(self, skill_path):
        """Scan entire skill for risks"""
        
        skill_path = Path(skill_path)
        
        all_risks = []
        files_scanned = 0
        
        for ext in ['.md', '.py', '.js', '.ts']:
            for filepath in skill_path.rglob(f'*{ext}'):
                risks = self.scan_file(str(filepath))
                for risk in risks:
                    risk['file'] = str(filepath.relative_to(skill_path))
                all_risks.extend(risks)
                files_scanned += 1
        
        return all_risks, files_scanned
    
    def calculate_risk_score(self, risks):
        """Calculate overall risk score (0-100)"""
        
        if not risks:
            return 0
        
        score = 0
        for risk in risks:
            if risk['level'] == 'high':
                score += 20
            elif risk['level'] == 'medium':
                score += 10
        
        return min(score, 100)
    
    def generate_report(self, skill_path, risks, files_scanned):
        """Generate risk assessment report"""
        
        risk_score = self.calculate_risk_score(risks)
        
        if risk_score <= 20:
            risk_level = "🟢 LOW"
            recommendation = "Safe to use"
        elif risk_score <= 50:
            risk_level = "🟡 MEDIUM"
            recommendation = "Review before installing"
        else:
            risk_level = "🔴 HIGH"
            recommendation = "Caution required"
        
        # Group by category
        by_category = {}
        for risk in risks:
            cat = risk['category']
            if cat not in by_category:
                by_category[cat] = []
            by_category[cat].append(risk)
        
        report = []
        report.append(f"{'='*60}")
        report.append(f"DATA LEAK DETECTOR - SECURITY REPORT")
        report.append(f"{'='*60}")
        report.append(f"")
        report.append(f"Skill: {os.path.basename(skill_path)}")
        report.append(f"Files Scanned: {files_scanned}")
        report.append(f"Total Risks Found: {len(risks)}")
        report.append(f"")
        report.append(f"RISK SCORE: {risk_score}/100 ({risk_level})")
        report.append(f"RECOMMENDATION: {recommendation}")
        report.append(f"")
        
        # Category breakdown
        report.append(f"{'='*60}")
        report.append(f"RISK BREAKDOWN")
        report.append(f"{'='*60}")
        
        for category, category_risks in by_category.items():
            high = len([r for r in category_risks if r['level'] == 'high'])
            medium = len([r for r in category_risks if r['level'] == 'medium'])
            report.append(f"")
            report.append(f"{category.upper()}:")
            report.append(f"  High: {high} | Medium: {medium}")
            
            for risk in category_risks[:3]:  # Show top 3
                report.append(f"  - [{risk['level'].upper()}] {risk['match']} (line {risk['line']})")
        
        # Recommendations
        report.append(f"")
        report.append(f"{'='*60}")
        report.append(f"RECOMMENDATIONS")
        report.append(f"{'='*60}")
        
        if 'network' in by_category:
            report.append(f"- Review network calls: verify destinations")
        if 'file_access' in by_category:
            report.append(f"- Review file access: check for sensitive files")
        if 'process' in by_category:
            report.append(f"- Review subprocess calls: verify commands")
        if 'env_access' in by_category:
            report.append(f"- Review env access: check for credential access")
        
        return '\n'.join(report)

# Example usage
detector = DataLeakDetector()

# Scan skill
risks, files_scanned = detector.scan_skill('/path/to/skill')
report = detector.generate_report('/path/to/skill', risks, files_scanned)
print(report)
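
The scoring rule in `calculate_risk_score` adds 20 points per high-level match and 10 per medium-level match, capped at 100. The sketch below restates that arithmetic on match counts alone; `score_from_counts` is a hypothetical helper, not part of the class above.

```python
def score_from_counts(high: int, medium: int) -> int:
    """Apply the 20/10 weighting from calculate_risk_score, capped at 100."""
    return min(high * 20 + medium * 10, 100)


for high, medium in [(1, 0), (2, 1), (3, 0), (10, 4)]:
    print(f"high={high} medium={medium} -> {score_from_counts(high, medium)}")
```

Every match counts separately, so scores rise quickly: three matches of a high-level pattern such as `http[s]?://` already give 60, which is above the 50-point boundary for the HIGH level.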

Usage Examples

User: "检查这个skill安全吗" / "Check if this skill is safe"
Agent: Scan SKILL.md and generate risk report

User: "扫描我的skills有没有数据泄露" / "Scan my skills for data leaks"
Agent: Scan all installed skills

User: "这个skill有没有网络访问" / "Does this skill access the network"
Agent: Focus on network risks
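
One way to focus on a single risk type is to filter the list of risk dictionaries returned by `scan_skill` by its `category` key. This is a minimal sketch: `filter_by_category` is a hypothetical helper and the sample entries are invented for illustration.

```python
def filter_by_category(risks: list[dict], category: str) -> list[dict]:
    """Keep only the risks whose 'category' equals the given name."""
    return [risk for risk in risks if risk["category"] == category]


# Invented sample data in the same shape that scan_skill produces.
sample_risks = [
    {"category": "network", "level": "high", "line": 3,
     "match": "requests.post", "file": "SKILL.md"},
    {"category": "env_access", "level": "high", "line": 2,
     "match": "os.environ", "file": "SKILL.md"},
    {"category": "network", "level": "medium", "line": 9,
     "match": "fetch(", "file": "index.js"},
]

for risk in filter_by_category(sample_risks, "network"):
    print(f"{risk['file']}:{risk['line']} [{risk['level']}] {risk['match']}")
```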

Notes

  • Static analysis only (no runtime monitoring)
  • Fast scanning (seconds)
  • No external API calls
  • Cross-platform compatible
Usage Guidance
This skill appears to be what it says: a static scanner that reads files under whatever path you give it and reports pattern matches. Before using it, review the SKILL.md (which contains the scanner code), and run scans only on directories you permit (don't point it at your full home or production directories unless you're comfortable). Because it's instruction-only, nothing is installed automatically, but if you run the provided Python code it will read file contents — run it in a sandbox or on copies of sensitive files if you want to avoid accidental disclosure. Note the markdown mentions installing 'watchdog' (filesystem watcher) though the displayed code doesn't use it; that mismatch looks like benign incompleteness but you may want to confirm which dependencies the implementer expects.
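A minimal sketch of one way to enforce the "only scan directories you permit" advice is to resolve the requested path and confirm it sits under an allowed root before scanning. The helper name `is_within` and the example paths are assumptions for illustration; the skill itself does not include such a check.

```python
from pathlib import Path


def is_within(target: str, allowed_root: str) -> bool:
    """Return True if target resolves to allowed_root or a path beneath it."""
    resolved_target = Path(target).resolve()
    resolved_root = Path(allowed_root).resolve()
    return (
        resolved_target == resolved_root
        or resolved_root in resolved_target.parents
    )


# '..' segments are collapsed by resolve() before the comparison.
print(is_within("skills/demo/../demo/SKILL.md", "skills"))
print(is_within("skills/../secrets.txt", "skills"))
```

Resolving both paths first means a request such as `skills/../secrets.txt` is judged by where it actually points, not by how it is spelled.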
Capability Analysis
Type: OpenClaw Skill · Name: data-leak-detector · Version: 1.0.0
The 'data-leak-detector' skill is a static analysis tool designed to scan other files for security risks using regex patterns. The Python code in SKILL.md performs read-only operations to identify potentially dangerous functions (e.g., network calls, file deletions, or environment variable access) and generates a risk report. There is no evidence of data exfiltration, malicious execution, or harmful instructions.
Capability Assessment
Purpose & Capability
Name/description (data-leak detection) align with the declared requirement of python3 and the included scanning logic. The metadata lists a pip dependency (watchdog) which is plausible for filesystem monitoring, though the visible code snippet does not reference it (likely benign/incomplete documentation).
Instruction Scope
SKILL.md contains concrete Python code that statically scans files and skill directories for patterns (network calls, file access, subprocess usage, env access). The code reads files under the target path you ask it to scan — this is expected for this purpose. It does not instruct the agent to exfiltrate data or access unrelated system secrets; however, because it reads file contents, you should only ask it to scan paths you permit.
Install Mechanism
No install spec is provided (instruction-only), which is low risk. The markdown mentions 'pip install watchdog' as a dependency; that is a benign, standard package, and there are no downloads from arbitrary URLs or archive extraction steps.
Credentials
The skill declares no required environment variables or credentials. The detector looks for environment-access patterns when scanning target files, but it does not itself require or request your environment secrets.
Persistence & Privilege
always:false and no install scripts or config paths are present. The skill does not request persistent/system-wide privileges or modify other skills.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-leak-detector
  3. After installation, invoke the skill by name or use /data-leak-detector
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
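Data leak detection tool: static analysis that detects network, file, process, and environment variable risks and generates a security report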
Metadata
Slug data-leak-detector
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Data Leak Detector?

Data leak detection tool. Use when user wants to scan skills, files, or folders for potential data leaks, privacy risks, or suspicious behavior. Detects network calls, file a... It is an AI Agent Skill for Claude Code / OpenClaw, with 171 downloads so far.

How do I install Data Leak Detector?

Run "/install data-leak-detector" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Data Leak Detector free?

Yes, Data Leak Detector is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Leak Detector support?

Data Leak Detector is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Leak Detector?

It is built and maintained by ToBeWin (@tobewin); the current version is v1.0.0.
