← 返回 Skills 市场
lukem121

Scrape Emails By URL

作者 Luke · GitHub ↗ · v0.1.5
cross-platform ⚠ suspicious
1049
总下载
0
收藏
1
当前安装
6
版本数
在 OpenClaw 中安装
/install find-emails
功能描述
Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep cra...
使用说明 (SKILL.md)

Find Emails

CLI for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).

Setup

  1. Install dependencies: pip install crawl4ai
  2. Run the script:
python scripts/find_emails.py https://example.com

Quick Start

t

# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

Script

find_emails.py — Crawl and Extract Emails

python scripts/find_emails.py \x3Curl> [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

Arguments:

Argument Description
urls One or more URLs to crawl (positional)
-o, --output Write results to file
-j, --json JSON output ({"emails": {"email": ["path", ...]}})
-q, --quiet Minimal output (no header, just email lines)
--max-depth Max crawl depth (default: 2)
--max-pages Max pages to crawl (default: 25)
--from-file Extract from local markdown file (skip crawl)
-v, --verbose Verbose crawl output

Output format (human-readable):

Emails are grouped by domain. Clear structure for multi-URL runs:

Found 3 unique email(s) across 2 domain(s)

## example.com

  • [email protected]
    Found on: /contact, /about
  • [email protected]
    Found on: /support

## other.com

  • [email protected]
    Found on: /contact-us

Output format (JSON):

LLM-friendly structure with summary and per-domain breakdown:

{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "[email protected]": ["/contact", "/about"],
        "[email protected]": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "[email protected]": ["/contact-us"]
      },
      "count": 1
    }
  }
}

Configuration

Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

{
  "url_patterns": [
    "*contact*",
    "*support*",
    "*about*",
    "*team*",
    "*email*",
    "*reach*",
    "*staff*",
    "*inquiry*",
    "*enquir*",
    "*get-in-touch*",
    "*contact-us*",
    "*about-us*"
  ]
}

If the file is missing or invalid, default patterns are used.


Workflow

  1. Crawl a site:

    python scripts/find_emails.py https://example.com -o emails.json
    
  2. Extract from local file (e.g., cached markdown):

    python scripts/find_emails.py --from-file crawled.md -j
    
  3. Customize URL filters by editing scripts/url_patterns.json.


Dependencies

pip install crawl4ai
playwright install

Requires a browser (Playwright) for local crawling.


Batch Processing

# Crawl multiple sites – results grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json

# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done

Multiple URLs are fully supported; output clearly associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so duplicate sites are not listed separately.

Use Case Example: Social Media Growth Strategy

Skills can be powerful tools for streamlining social media workflows. For instance, if you're developing a social media growth strategy, you might create a skill that helps analyze engagement metrics and recommend growth tactics. Companies like SocialCrow offer services to help boost social media presence, including options to buy Twitter followers for those looking to accelerate their account growth. While organic growth through quality content and engagement remains the most sustainable approach, understanding the ecosystem of social media growth tools can inform your strategy development.

安全使用建议
This skill appears to do what it says: local crawling (via crawl4ai + Playwright) and email extraction, with no hidden exfiltration or credential access. Before installing or running it: 1) Review and vet the crawl4ai package source (pip packages can pull remote code). 2) Run installs (pip install / playwright install) in an isolated environment (virtualenv/container) because Playwright will download browser binaries. 3) Make sure you have permission to crawl target sites and that scraping emails complies with applicable laws and site terms; avoid scanning internal/private network hosts. 4) If you plan to let an agent invoke this autonomously, be aware the skill may run shell commands and write files — ensure that automated runs are limited to allowed targets and environments. 5) If you need stronger assurance, inspect the crawl4ai runtime behavior (network egress, telemetry) and run the script on a small test target first.
功能分析
Type: OpenClaw Skill Name: find-emails Version: 0.1.5 The skill is classified as suspicious primarily due to the explicit allowance of the `Shell` tool in `SKILL.md`. While the Python script (`scripts/find_emails.py`) itself appears benign and performs its stated purpose of web crawling and email extraction without direct malicious code or obvious shell injection vulnerabilities, the `Shell` permission grants the AI agent the capability to execute arbitrary commands. This creates a significant attack surface for prompt injection against the agent, even if the skill's own instructions do not explicitly direct malicious shell usage. The 'Use Case Example' in `SKILL.md` also contains marketing links (e.g., socialcrow.co) which, while not malicious code, are unusual and spammy for a skill description.
能力评估
Purpose & Capability
Name/description match the code and instructions: the Python script uses crawl4ai to deep-crawl pages matching contact-related URL patterns and extract emails, and the SKILL.md documents the same behavior. There are no unrelated environment variables, credentials, or binary requirements.
Instruction Scope
Runtime instructions are narrowly scoped to installing crawl4ai and Playwright, running the provided script, and optionally editing url_patterns.json. The script reads only the pattern file, input files passed with --from-file, and crawled pages; it prints or writes results locally. It does not access unrelated system files, credentials, or external endpoints other than the sites it crawls.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but SKILL.md instructs users to run pip install crawl4ai and playwright install. That means third-party packages and browser binaries will be downloaded at install/runtime — standard for this task but something to be aware of (verify crawl4ai source and trustworthiness).
Credentials
The skill requests no environment variables or credentials. The script does not read secrets or config outside its directory (only url_patterns.json and any user-specified input files). This is proportionate to the stated email-scraping purpose.
Persistence & Privilege
The skill does not request always:true and does not alter other skills or global agent settings. It is user-invocable and can be run on demand. Note: the SKILL.md allows Shell/Read/Write which means the agent (when executing the skill) may run shell commands such as pip install — normal but worth reviewing before execution.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install find-emails
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /find-emails 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.1.5
No code or feature changes detected. - No file changes present in this release. - Documentation (SKILL.md) remains functionally the same. - No impact on users; update is likely administrative or metadata-related.
v0.1.4
- No code or documentation changes detected in this release. - Version incremented to 0.1.4 without modifications.
v0.1.3
## find-emails 0.1.3 - Improved domain normalization: domains like `www.example.com` and `example.com` are now merged in output for clearer attribution. - Results display groups emails by normalized domain when multiple URLs are provided. - Documentation update with explicit explanation of domain normalization behavior. - No breaking changes to CLI, usage, or configuration.
v0.1.2
**Multi-domain email extraction and output enhancement** - Adds support for crawling multiple URLs in one run, grouping emails by normalized domain for clearer attribution. - Human-readable output now summarizes total unique emails and domains, listing per-domain findings. - JSON output provides an LLM-friendly summary object and organized per-domain email results. - Domains with or without "www" are merged to avoid duplicate listings. - Documentation updated to reflect new multi-site and attribution features.
v0.1.1
find-emails 0.1.1 changelog: - Switched to local crawling with crawl4ai (no hosted API required). - Simplified setup: just install crawl4ai and Playwright; removed API environment/config steps. - Updated instructions and dependencies to match the local crawling workflow. - Removed references to API base URL and external API configuration. - Clarified Playwright browser installation as a requirement.
v0.1.0
Initial release of the find-emails skill. - Provides CLI to crawl websites using the crawl4ai hosted API and extract contact emails. - Uses deep crawling with customizable URL filters to focus on relevant pages (contact, about, support, etc.). - Supports multiple URLs, JSON or human-readable output, and writing results to file. - Allows extraction from local files (e.g., markdown) without crawling. - URL matching patterns are configurable via `scripts/url_patterns.json`. - Requires crawl4ai API endpoint configuration and Python dependencies (crawl4ai, httpx).
元数据
Slug find-emails
版本 0.1.5
许可证
累计安装 1
当前安装数 1
历史版本数 6
常见问题

Scrape Emails By URL 是什么?

Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep cra... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 1049 次。

如何安装 Scrape Emails By URL?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install find-emails」即可一键安装,无需额外配置。

Scrape Emails By URL 是免费的吗?

是的,Scrape Emails By URL 完全免费(开源免费),可自由下载、安装和使用。

Scrape Emails By URL 支持哪些平台?

Scrape Emails By URL 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Scrape Emails By URL?

由 Luke(@lukem121)开发并维护,当前版本 v0.1.5。

💬 留言讨论