← 返回 Skills 市场
Contact Enrichment
作者
maverick-software
· GitHub ↗
· v1.0.0
331
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install contact-enrichment
功能描述
Extract contact information from a website using a 3-tier approach. Direct HTML scraping, WHOIS lookup, then Hunter.io API domain search for verified busines...
使用说明 (SKILL.md)
Contact Enrichment Skill
3-tier contact extraction. Runs cheapest method first, escalates only when needed.
Extraction Hierarchy
Tier 1: Direct HTML Scrape (free, always try first)
→ Check: homepage, /contact, /about, /about-us, /contact-us
→ Extract: all emails, all phone numbers, business name
↓ (if no email found)
Tier 2: WHOIS Lookup (free)
→ Extract: registrant org, registrant email
↓ (if still no email)
Tier 3: Hunter.io API (paid, 25 free/month)
→ Domain search → verified emails + first/last name + job title
Output Format
{
"business_name": "Green Valley Landscaping",
"emails": ["[email protected]", "[email protected]"],
"primary_email": "[email protected]",
"phones": ["(503) 555-0123"],
"primary_phone": "(503) 555-0123",
"owner_name": "John Smith", # from Hunter.io or About page
"whois_org": "Green Valley LLC",
"whois_email": "[email protected]",
"source": "scrape", # "scrape" | "whois" | "hunter"
"confidence": "high" # "high" | "medium" | "low"
}
Tier 1: Direct HTML Scrape
import re, requests
from bs4 import BeautifulSoup
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
EMAIL_REGEX = re.compile(
r'\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)
PHONE_REGEX = re.compile(
r'(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
)
# Emails that are never real contacts
EMAIL_BLACKLIST = {
"example.com", "sentry.io", "schema.org", "w3.org",
"wix.com", "wordpress.com", "squarespace.com", "shopify.com",
"google.com", "jquery.com", "bootstrap.com", "cloudflare.com"
}
def is_valid_email(email: str) -> bool:
domain = email.split("@")[-1].lower()
return domain not in EMAIL_BLACKLIST and len(email) \x3C 100
def scrape_page(url: str) -> tuple[str, str]:
"""Fetch a page. Returns (html, final_url)."""
try:
r = requests.get(url, headers=HEADERS, timeout=8, allow_redirects=True)
if r.status_code \x3C 400:
return r.text, r.url
except Exception:
pass
return "", url
def extract_from_html(html: str) -> dict:
soup = BeautifulSoup(html, "lxml")
text = soup.get_text(separator=" ")
# Extract emails
emails = list({e for e in EMAIL_REGEX.findall(text) if is_valid_email(e)})
# Extract phones
phones_raw = PHONE_REGEX.findall(text)
phones = list({p[0] + p[1] if isinstance(p, tuple) else p for p in phones_raw})
# Clean up: remove very short matches
phones = [p.strip() for p in phones if len(re.sub(r'\D', '', p)) >= 10]
# Extract business name from title
title = soup.find("title")
business_name = ""
if title:
t = title.text.strip()
# Take the first meaningful segment
for sep in ["|", "–", "-", "·", "—", ":"]:
if sep in t:
t = t.split(sep)[0].strip()
break
business_name = t
return {"emails": emails, "phones": phones, "business_name": business_name}
def scrape_contact_pages(base_url: str, existing_html: str = "") -> dict:
"""
Scrape multiple pages for contact info.
existing_html: pass HTML already fetched by website-auditor to avoid re-fetching.
"""
from urllib.parse import urljoin
all_emails = set()
all_phones = set()
business_name = ""
# Try homepage first (using already-fetched HTML if available)
if existing_html:
data = extract_from_html(existing_html)
all_emails.update(data["emails"])
all_phones.update(data["phones"])
business_name = data["business_name"]
# Try contact/about pages
contact_paths = ["/contact", "/about", "/contact-us", "/about-us",
"/get-in-touch", "/reach-us", "/find-us"]
for path in contact_paths:
if all_emails:
break # Found emails — no need to keep crawling
url = urljoin(base_url, path)
html, _ = scrape_page(url)
if html:
data = extract_from_html(html)
all_emails.update(data["emails"])
all_phones.update(data["phones"])
if not business_name:
business_name = data["business_name"]
emails = list(all_emails)
phones = list(all_phones)
return {
"emails": emails,
"primary_email": emails[0] if emails else None,
"phones": phones,
"primary_phone": phones[0] if phones else None,
"business_name": business_name,
"source": "scrape",
"confidence": "high" if emails else "low"
}
Tier 2: WHOIS Lookup
import whois # pip install python-whois
def whois_lookup(domain: str) -> dict:
"""
Look up WHOIS registration data.
Returns registrant org + email if publicly available.
"""
try:
w = whois.whois(domain)
# Extract emails (some registrars redact these)
emails = w.emails if isinstance(w.emails, list) else ([w.emails] if w.emails else [])
emails = [e for e in emails if e and is_valid_email(e) and "whoisprotect" not in e.lower()
and "privacy" not in e.lower() and "proxy" not in e.lower()]
org = w.get("org") or w.get("registrant_organization") or ""
creation = w.get("creation_date")
if isinstance(creation, list):
creation = creation[0]
return {
"whois_org": org,
"whois_email": emails[0] if emails else None,
"whois_emails": emails,
"domain_created": str(creation)[:10] if creation else None,
"registrar": w.get("registrar")
}
except Exception as e:
return {"whois_org": None, "whois_email": None, "whois_error": str(e)}
Tier 3: Hunter.io API
import requests, os
def hunter_domain_search(domain: str, limit: int = 5) -> dict:
"""
Search Hunter.io for emails associated with a domain.
Returns verified business emails + contact names.
Requires HUNTER_API_KEY env var.
Free: 25 searches/month | Starter: $34/mo for 500
"""
api_key = os.environ.get("HUNTER_API_KEY")
if not api_key:
return {"hunter_emails": [], "hunter_source": "no_key"}
params = {
"domain": domain,
"api_key": api_key,
"limit": limit,
"type": "personal" # personal | generic
}
try:
r = requests.get("https://api.hunter.io/v2/domain-search", params=params, timeout=10)
data = r.json().get("data", {})
contacts = []
for email_obj in data.get("emails", []):
contacts.append({
"email": email_obj.get("value"),
"first_name": email_obj.get("first_name"),
"last_name": email_obj.get("last_name"),
"position": email_obj.get("position"),
"confidence": email_obj.get("confidence"),
"linkedin": email_obj.get("linkedin")
})
primary = contacts[0] if contacts else {}
return {
"hunter_emails": [c["email"] for c in contacts if c["email"]],
"primary_email": primary.get("email"),
"owner_name": f"{primary.get('first_name', '')} {primary.get('last_name', '')}".strip(),
"owner_title": primary.get("position"),
"owner_linkedin": primary.get("linkedin"),
"hunter_contacts": contacts,
"organization": data.get("organization"),
"source": "hunter",
"confidence": "high" if contacts else "low"
}
except Exception as e:
return {"hunter_emails": [], "hunter_error": str(e)}
def hunter_email_verify(email: str) -> dict:
"""Verify if an email is deliverable."""
api_key = os.environ.get("HUNTER_API_KEY")
if not api_key:
return {"valid": None}
params = {"email": email, "api_key": api_key}
r = requests.get("https://api.hunter.io/v2/email-verifier", params=params)
data = r.json().get("data", {})
return {"valid": data.get("status") == "valid", "score": data.get("score")}
Full Enrichment Runner
import tldextract
def enrich_contact(url: str, existing_html: str = "") -> dict:
"""
Run all 3 tiers. Returns the best contact data found.
Pass existing_html from website-auditor to avoid re-fetching.
"""
ext = tldextract.extract(url)
domain = f"{ext.domain}.{ext.suffix}"
result = {"url": url, "domain": domain}
# Tier 1: Scrape
scrape_data = scrape_contact_pages(url, existing_html)
result.update(scrape_data)
if not result.get("primary_email"):
# Tier 2: WHOIS
whois_data = whois_lookup(domain)
result.update(whois_data)
if whois_data.get("whois_email"):
result["primary_email"] = whois_data["whois_email"]
result["emails"] = [whois_data["whois_email"]] + result.get("emails", [])
result["source"] = "whois"
result["confidence"] = "medium"
if not result.get("primary_email"):
# Tier 3: Hunter.io
hunter_data = hunter_domain_search(domain)
result.update(hunter_data)
if hunter_data.get("primary_email"):
result["emails"] = hunter_data["hunter_emails"]
result["source"] = "hunter"
result["confidence"] = "high"
return result
Privacy & Ethics Notes
- Only scrape publicly visible contact info
- Respect
robots.txtfor large-scale operations - Do not scrape pages behind login walls
- WHOIS data is public by design — legal to use
- Hunter.io only indexes publicly available emails
- Always include opt-out mechanism in outreach
- Comply with CAN-SPAM and GDPR when emailing
安全使用建议
This skill appears coherent with its description, but consider the following before installing or running it:
- Privacy & legality: The skill collects email addresses and phone numbers (personal data). Make sure you have the right to harvest or store this data and that your use complies with laws and the target site's terms of service.
- Network scope & SSRF risk: The skill will fetch arbitrary URLs. Avoid feeding it attacker-controlled or internal-only URLs (localhost, 169.254.x.x, private IP ranges) to prevent server-side request forgery or leaking internal resources.
- Robots, throttling, and politeness: The instructions do not mention obeying robots.txt or rate limits. If you will use this at scale, add rate-limiting, retries/backoff, and respect site crawl policies.
- Dependencies: The skill lists Python packages (requests, beautifulsoup4, lxml, python-whois). Ensure these are installed from trusted sources (PyPI) in an isolated environment to reduce supply-chain risk.
- Hunter.io key: If you provide HUNTER_API_KEY, treat it as a secret and restrict its scope. The key is optional; the skill works with free tiers or without it if WHOIS/scraping succeed.
If you need stricter controls, require an allowlist of domains, add internal-address blocking, and add explicit instructions to respect robots.txt and rate limits before running this skill.
功能分析
Type: OpenClaw Skill
Name: contact-enrichment
Version: 1.0.0
The OpenClaw skill 'contact-enrichment' is benign. It performs its stated purpose of extracting contact information from websites using web scraping, WHOIS lookups, and the Hunter.io API. The code uses standard libraries like `requests`, `beautifulsoup4`, `python-whois`, and `tldextract` for network communication and data parsing. API keys are securely retrieved from environment variables. There is no evidence of malicious intent, such as unauthorized data exfiltration, shell command injection, persistence mechanisms, or prompt injection attempts against the AI agent in the `SKILL.md` file. All operations are consistent with the described functionality.
能力评估
Purpose & Capability
The name/description (contact enrichment via scraping, WHOIS, Hunter.io) matches the instructions and the declared Python packages (requests, beautifulsoup4, lxml, python-whois). No unrelated environment variables, binaries, or config paths are requested.
Instruction Scope
The SKILL.md stays within the stated purpose: it scrapes common contact pages, runs WHOIS lookups, and optionally calls Hunter.io. It does not instruct accessing unrelated system files or secrets. Note: it does not mention obeying robots.txt, rate-limiting, or domain allowlists, and it will fetch arbitrary URLs supplied to it — which raises operational/privacy considerations (see guidance).
Install Mechanism
Instruction-only skill with no install spec and no archive downloads. Declared Python packages are reasonable for the described tasks. There is no high-risk download or executable install step.
Credentials
No required environment variables or secrets are requested. Hunter.io API key is listed as optional (appropriate and proportional). The declared packages and optional API key are consistent with functionality.
Persistence & Privilege
always is false and the skill does not request persistent or elevated platform privileges. It doesn't modify other skills or system-wide settings.
如何使用
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install contact-enrichment - 安装完成后,直接呼叫该 Skill 的名称或使用
/contact-enrichment触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
3-tier contact extraction: direct HTML scraping, WHOIS registrant lookup, Hunter.io API domain search. Returns emails, phones, owner name, and source confidence level for any business domain.
元数据
常见问题
Contact Enrichment 是什么?
Extract contact information from a website using a 3-tier approach. Direct HTML scraping, WHOIS lookup, then Hunter.io API domain search for verified busines... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 331 次。
如何安装 Contact Enrichment?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install contact-enrichment」即可一键安装,无需额外配置。
Contact Enrichment 是免费的吗?
是的,Contact Enrichment 完全免费(开源免费),可自由下载、安装和使用。
Contact Enrichment 支持哪些平台?
Contact Enrichment 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Contact Enrichment?
由 maverick-software(@maverick-software)开发并维护,当前版本 v1.0.0。
推荐 Skills