功能描述

Extract contact information from a website using a 3-tier approach. Direct HTML scraping, WHOIS lookup, then Hunter.io API domain search for verified busines...

使用说明 (SKILL.md)

Contact Enrichment Skill

Name: Contact Enrichment
Author: maverick-software

3-tier contact extraction. Runs cheapest method first, escalates only when needed.

Extraction Hierarchy

Tier 1: Direct HTML Scrape (free, always try first)
   → Check: homepage, /contact, /about, /about-us, /contact-us
   → Extract: all emails, all phone numbers, business name
   ↓ (if no email found)
Tier 2: WHOIS Lookup (free)
   → Extract: registrant org, registrant email
   ↓ (if still no email)
Tier 3: Hunter.io API (paid, 25 free/month)
   → Domain search → verified emails + first/last name + job title

Output Format

{
    "business_name": "Green Valley Landscaping",
    "emails": ["[email protected]", "[email protected]"],
    "primary_email": "[email protected]",
    "phones": ["(503) 555-0123"],
    "primary_phone": "(503) 555-0123",
    "owner_name": "John Smith",           # from Hunter.io or About page
    "whois_org": "Green Valley LLC",
    "whois_email": "[email protected]",
    "source": "scrape",                   # "scrape" | "whois" | "hunter"
    "confidence": "high"                  # "high" | "medium" | "low"
}

Tier 1: Direct HTML Scrape

import re, requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}

EMAIL_REGEX = re.compile(
    r'\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)

PHONE_REGEX = re.compile(
    r'(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
)

# Emails that are never real contacts
EMAIL_BLACKLIST = {
    "example.com", "sentry.io", "schema.org", "w3.org",
    "wix.com", "wordpress.com", "squarespace.com", "shopify.com",
    "google.com", "jquery.com", "bootstrap.com", "cloudflare.com"
}

def is_valid_email(email: str) -> bool:
    domain = email.split("@")[-1].lower()
    return domain not in EMAIL_BLACKLIST and len(email) \x3C 100

def scrape_page(url: str) -> tuple[str, str]:
    """Fetch a page. Returns (html, final_url)."""
    try:
        r = requests.get(url, headers=HEADERS, timeout=8, allow_redirects=True)
        if r.status_code \x3C 400:
            return r.text, r.url
    except Exception:
        pass
    return "", url

def extract_from_html(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")
    text = soup.get_text(separator=" ")
    
    # Extract emails
    emails = list({e for e in EMAIL_REGEX.findall(text) if is_valid_email(e)})
    
    # Extract phones
    phones_raw = PHONE_REGEX.findall(text)
    phones = list({p[0] + p[1] if isinstance(p, tuple) else p for p in phones_raw})
    # Clean up: remove very short matches
    phones = [p.strip() for p in phones if len(re.sub(r'\D', '', p)) >= 10]
    
    # Extract business name from title
    title = soup.find("title")
    business_name = ""
    if title:
        t = title.text.strip()
        # Take the first meaningful segment
        for sep in ["|", "–", "-", "·", "—", ":"]:
            if sep in t:
                t = t.split(sep)[0].strip()
                break
        business_name = t
    
    return {"emails": emails, "phones": phones, "business_name": business_name}

def scrape_contact_pages(base_url: str, existing_html: str = "") -> dict:
    """
    Scrape multiple pages for contact info.
    existing_html: pass HTML already fetched by website-auditor to avoid re-fetching.
    """
    from urllib.parse import urljoin
    
    all_emails = set()
    all_phones = set()
    business_name = ""
    
    # Try homepage first (using already-fetched HTML if available)
    if existing_html:
        data = extract_from_html(existing_html)
        all_emails.update(data["emails"])
        all_phones.update(data["phones"])
        business_name = data["business_name"]
    
    # Try contact/about pages
    contact_paths = ["/contact", "/about", "/contact-us", "/about-us",
                     "/get-in-touch", "/reach-us", "/find-us"]
    
    for path in contact_paths:
        if all_emails:
            break  # Found emails — no need to keep crawling
        url = urljoin(base_url, path)
        html, _ = scrape_page(url)
        if html:
            data = extract_from_html(html)
            all_emails.update(data["emails"])
            all_phones.update(data["phones"])
            if not business_name:
                business_name = data["business_name"]
    
    emails = list(all_emails)
    phones = list(all_phones)
    
    return {
        "emails": emails,
        "primary_email": emails[0] if emails else None,
        "phones": phones,
        "primary_phone": phones[0] if phones else None,
        "business_name": business_name,
        "source": "scrape",
        "confidence": "high" if emails else "low"
    }

Tier 2: WHOIS Lookup

import whois  # pip install python-whois

def whois_lookup(domain: str) -> dict:
    """
    Look up WHOIS registration data.
    Returns registrant org + email if publicly available.
    """
    try:
        w = whois.whois(domain)
        
        # Extract emails (some registrars redact these)
        emails = w.emails if isinstance(w.emails, list) else ([w.emails] if w.emails else [])
        emails = [e for e in emails if e and is_valid_email(e) and "whoisprotect" not in e.lower()
                  and "privacy" not in e.lower() and "proxy" not in e.lower()]
        
        org = w.get("org") or w.get("registrant_organization") or ""
        creation = w.get("creation_date")
        if isinstance(creation, list):
            creation = creation[0]
        
        return {
            "whois_org": org,
            "whois_email": emails[0] if emails else None,
            "whois_emails": emails,
            "domain_created": str(creation)[:10] if creation else None,
            "registrar": w.get("registrar")
        }
    except Exception as e:
        return {"whois_org": None, "whois_email": None, "whois_error": str(e)}

Tier 3: Hunter.io API

import requests, os

def hunter_domain_search(domain: str, limit: int = 5) -> dict:
    """
    Search Hunter.io for emails associated with a domain.
    Returns verified business emails + contact names.
    Requires HUNTER_API_KEY env var.
    Free: 25 searches/month | Starter: $34/mo for 500
    """
    api_key = os.environ.get("HUNTER_API_KEY")
    if not api_key:
        return {"hunter_emails": [], "hunter_source": "no_key"}
    
    params = {
        "domain": domain,
        "api_key": api_key,
        "limit": limit,
        "type": "personal"  # personal | generic
    }
    
    try:
        r = requests.get("https://api.hunter.io/v2/domain-search", params=params, timeout=10)
        data = r.json().get("data", {})
        
        contacts = []
        for email_obj in data.get("emails", []):
            contacts.append({
                "email": email_obj.get("value"),
                "first_name": email_obj.get("first_name"),
                "last_name": email_obj.get("last_name"),
                "position": email_obj.get("position"),
                "confidence": email_obj.get("confidence"),
                "linkedin": email_obj.get("linkedin")
            })
        
        primary = contacts[0] if contacts else {}
        
        return {
            "hunter_emails": [c["email"] for c in contacts if c["email"]],
            "primary_email": primary.get("email"),
            "owner_name": f"{primary.get('first_name', '')} {primary.get('last_name', '')}".strip(),
            "owner_title": primary.get("position"),
            "owner_linkedin": primary.get("linkedin"),
            "hunter_contacts": contacts,
            "organization": data.get("organization"),
            "source": "hunter",
            "confidence": "high" if contacts else "low"
        }
    except Exception as e:
        return {"hunter_emails": [], "hunter_error": str(e)}

def hunter_email_verify(email: str) -> dict:
    """Verify if an email is deliverable."""
    api_key = os.environ.get("HUNTER_API_KEY")
    if not api_key:
        return {"valid": None}
    params = {"email": email, "api_key": api_key}
    r = requests.get("https://api.hunter.io/v2/email-verifier", params=params)
    data = r.json().get("data", {})
    return {"valid": data.get("status") == "valid", "score": data.get("score")}

Full Enrichment Runner

import tldextract

def enrich_contact(url: str, existing_html: str = "") -> dict:
    """
    Run all 3 tiers. Returns the best contact data found.
    Pass existing_html from website-auditor to avoid re-fetching.
    """
    ext = tldextract.extract(url)
    domain = f"{ext.domain}.{ext.suffix}"
    
    result = {"url": url, "domain": domain}
    
    # Tier 1: Scrape
    scrape_data = scrape_contact_pages(url, existing_html)
    result.update(scrape_data)
    
    if not result.get("primary_email"):
        # Tier 2: WHOIS
        whois_data = whois_lookup(domain)
        result.update(whois_data)
        if whois_data.get("whois_email"):
            result["primary_email"] = whois_data["whois_email"]
            result["emails"] = [whois_data["whois_email"]] + result.get("emails", [])
            result["source"] = "whois"
            result["confidence"] = "medium"
    
    if not result.get("primary_email"):
        # Tier 3: Hunter.io
        hunter_data = hunter_domain_search(domain)
        result.update(hunter_data)
        if hunter_data.get("primary_email"):
            result["emails"] = hunter_data["hunter_emails"]
            result["source"] = "hunter"
            result["confidence"] = "high"
    
    return result

Privacy & Ethics Notes

Only scrape publicly visible contact info
Respect robots.txt for large-scale operations
Do not scrape pages behind login walls
WHOIS data is public by design — legal to use
Hunter.io only indexes publicly available emails
Always include opt-out mechanism in outreach
Comply with CAN-SPAM and GDPR when emailing

安全使用建议

This skill appears coherent with its description, but consider the following before installing or running it: - Privacy & legality: The skill collects email addresses and phone numbers (personal data). Make sure you have the right to harvest or store this data and that your use complies with laws and the target site's terms of service. - Network scope & SSRF risk: The skill will fetch arbitrary URLs. Avoid feeding it attacker-controlled or internal-only URLs (localhost, 169.254.x.x, private IP ranges) to prevent server-side request forgery or leaking internal resources. - Robots, throttling, and politeness: The instructions do not mention obeying robots.txt or rate limits. If you will use this at scale, add rate-limiting, retries/backoff, and respect site crawl policies. - Dependencies: The skill lists Python packages (requests, beautifulsoup4, lxml, python-whois). Ensure these are installed from trusted sources (PyPI) in an isolated environment to reduce supply-chain risk. - Hunter.io key: If you provide HUNTER_API_KEY, treat it as a secret and restrict its scope. The key is optional; the skill works with free tiers or without it if WHOIS/scraping succeed. If you need stricter controls, require an allowlist of domains, add internal-address blocking, and add explicit instructions to respect robots.txt and rate limits before running this skill.

功能分析

Type: OpenClaw Skill Name: contact-enrichment Version: 1.0.0 The OpenClaw skill 'contact-enrichment' is benign. It performs its stated purpose of extracting contact information from websites using web scraping, WHOIS lookups, and the Hunter.io API. The code uses standard libraries like `requests`, `beautifulsoup4`, `python-whois`, and `tldextract` for network communication and data parsing. API keys are securely retrieved from environment variables. There is no evidence of malicious intent, such as unauthorized data exfiltration, shell command injection, persistence mechanisms, or prompt injection attempts against the AI agent in the `SKILL.md` file. All operations are consistent with the described functionality.

能力评估

✓ Purpose & Capability

The name/description (contact enrichment via scraping, WHOIS, Hunter.io) matches the instructions and the declared Python packages (requests, beautifulsoup4, lxml, python-whois). No unrelated environment variables, binaries, or config paths are requested.

ℹ Instruction Scope

The SKILL.md stays within the stated purpose: it scrapes common contact pages, runs WHOIS lookups, and optionally calls Hunter.io. It does not instruct accessing unrelated system files or secrets. Note: it does not mention obeying robots.txt, rate-limiting, or domain allowlists, and it will fetch arbitrary URLs supplied to it — which raises operational/privacy considerations (see guidance).

✓ Install Mechanism

Instruction-only skill with no install spec and no archive downloads. Declared Python packages are reasonable for the described tasks. There is no high-risk download or executable install step.

✓ Credentials

No required environment variables or secrets are requested. Hunter.io API key is listed as optional (appropriate and proportional). The declared packages and optional API key are consistent with functionality.

✓ Persistence & Privilege

always is false and the skill does not request persistent or elevated platform privileges. It doesn't modify other skills or system-wide settings.

版本历史

v1.0.0

3-tier contact extraction: direct HTML scraping, WHOIS registrant lookup, Hunter.io API domain search. Returns emails, phones, owner name, and source confidence level for any business domain.

元数据

Slug contact-enrichment

版本 1.0.0

许可证 —

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Contact Enrichment 是什么？

Extract contact information from a website using a 3-tier approach. Direct HTML scraping, WHOIS lookup, then Hunter.io API domain search for verified busines... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 331 次。

如何安装 Contact Enrichment？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install contact-enrichment」即可一键安装，无需额外配置。

Contact Enrichment 是免费的吗？

是的，Contact Enrichment 完全免费（开源免费），可自由下载、安装和使用。

Contact Enrichment 支持哪些平台？

Contact Enrichment 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Contact Enrichment？

由 maverick-software（@maverick-software）开发并维护，当前版本 v1.0.0。

Contact Enrichment