nicemaths123

Lead Scraper AI

Author: nicemaths123 · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 82
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install lead-scraper-ai
Description
Scrapes and qualifies B2B leads from multiple public directories, scores them by fit, extracts emails, and generates personalized AI outreach sequences autom...
Usage (SKILL.md)

Ultimate Lead Scraper and AI Outreach Engine: Discover, Qualify and Close B2B Prospects on Autopilot

Display Name: Ultimate Lead Scraper and AI Outreach Engine
Version: 2.0.0 Author: @g4dr

Overview

Stop buying overpriced lead lists. This skill builds your own B2B lead database from scratch by scraping publicly available business data across Google Maps, Yellow Pages, Yelp and LinkedIn company pages, then qualifies every contact with a 0 to 100 fit score and generates personalized outreach messages with Claude AI.

One run replaces what most agencies charge $500 to $2,000 per month for.

Powered by: Apify + Claude AI


What This Skill Does

  • Discover publicly listed business contacts from 6 directory sources simultaneously
  • Qualify leads by industry, location, company size, online presence and engagement signals
  • Score every lead 0 to 100 with a weighted ICP matching algorithm
  • Deduplicate and normalize all contacts into a single CRM-ready schema
  • Deep-crawl business websites to extract emails from contact and about pages
  • Generate 4-step personalized outreach sequences (not just one email) using Claude AI
  • Export clean CSV or JSON files ready for HubSpot, Airtable, Instantly, Lemlist or any CRM
  • Run multi-source searches in parallel to maximize coverage and minimize cost

Legal and Compliance

This skill only targets publicly listed business information. Before using:

  • GDPR (EU/UK): Business emails may qualify under legitimate interest. Always include opt-out.
  • CAN-SPAM (US): Include sender identity, physical address and working unsubscribe link.
  • CCPA (California): Do not sell scraped contact lists. Include unsubscribe links.
  • CASL (Canada): Requires express or implied consent before commercial messages.
  • Always check robots.txt before scraping any website
  • Never scrape personal profiles, private accounts or login-gated content
  • Delete data you no longer need

This skill provides technical guidance only. Consult a qualified attorney for legal advice.


Step 1: Set Up Your Scraping Engine

  1. Create your free account at Apify
  2. Go to Settings > Integrations and copy your Personal API Token
  3. Store it securely:
    export APIFY_TOKEN=apify_api_xxxxxxxxxxxxxxxx
    

The free tier includes $5/month of compute, enough for 500+ qualified leads per month.
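Since every later example reads the token from the environment, it can help to fail early when it is missing. The following is a minimal sketch; `requireEnv` is a hypothetical helper name, not part of apify-client.

```javascript
// Hypothetical helper: fail fast when a required environment variable is missing.
// `env` defaults to process.env and is injectable so the function is easy to test.
function requireEnv(name, env = process.env) {
  const value = (env[name] || '').trim();
  if (!value) {
    throw new Error(`Missing environment variable ${name}. Set it before running the pipeline.`);
  }
  return value;
}

// Usage: these are the two variable names the examples in this document read.
// const apifyToken = requireEnv('APIFY_TOKEN');
// const claudeKey = requireEnv('CLAUDE_API_KEY');
```

Checking both variables at startup surfaces a missing Claude key before any Apify compute is spent.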


Step 2: Install Dependencies

npm install apify-client axios

Apify Actors for Lead Discovery

Only actors targeting publicly listed business directories:

Actor | Source | Data Available | Best For
Apify Google Maps Scraper | Google Maps | Name, phone, website, email, rating, reviews, hours | Local business prospecting
Apify Yellow Pages Scraper | Yellow Pages | Business name, phone, address, category | US/Canada B2B lists
Apify Yelp Scraper | Yelp | Business listings, contact info, reviews | Service businesses
Apify LinkedIn Companies Scraper | LinkedIn (public pages) | Company info, website, industry, size | B2B company research
Apify Website Content Crawler | Any website | Emails, social links, tech stack | Email enrichment
Apify Google Search Scraper | Google Search | Business info, news, ads status | Ad spend qualification

Examples

Multi-Source Lead Discovery (Parallel)

// apify-client exposes ApifyClient as a named export
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function discoverLeads(keyword, location, maxPerSource = 25) {
  const [mapsRun, ypRun, yelpRun] = await Promise.all([
    client.actor("compass~crawler-google-places").call({
      searchStringsArray: [`${keyword} in ${location}`],
      maxCrawledPlacesPerSearch: maxPerSource,
      language: "en"
    }),
    client.actor("apify/yellowpages-scraper").call({
      searchTerms: [keyword],
      locations: [location],
      maxResultsPerPage: maxPerSource
    }),
    client.actor("apify/yelp-scraper").call({
      searchTerms: [keyword],
      locations: [location],
      maxResults: maxPerSource
    })
  ]);

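  // call() resolves to a plain run object; read its default dataset through the client
  const [mapsData, ypData, yelpData] = await Promise.all([
    client.dataset(mapsRun.defaultDatasetId).listItems(),
    client.dataset(ypRun.defaultDatasetId).listItems(),
    client.dataset(yelpRun.defaultDatasetId).listItems()
  ]);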

  return {
    googleMaps: mapsData.items,
    yellowPages: ypData.items,
    yelp: yelpData.items,
    totalRaw: mapsData.items.length + ypData.items.length + yelpData.items.length
  };
}

const raw = await discoverLeads("digital marketing agency", "New York, NY");
console.log(`Found ${raw.totalRaw} raw leads across 3 sources`);
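With `Promise.all`, one failing actor rejects the whole discovery step. The sketch below shows one possible alternative using `Promise.allSettled`, so the remaining sources still return data. `discoverSafely` and the shape of its `sources` argument are assumptions made for illustration; each entry would wrap one actor call from the example above.

```javascript
// Hypothetical wrapper: run several source fetchers in parallel and keep partial results.
// `sources` maps a source name to an async function that resolves to an array of items.
async function discoverSafely(sources) {
  const names = Object.keys(sources);
  const settled = await Promise.allSettled(names.map(name => sources[name]()));

  const results = {};
  const failures = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[names[i]] = outcome.value;
    } else {
      // A failed source contributes no items; its error message is kept for logging.
      results[names[i]] = [];
      failures[names[i]] = outcome.reason?.message ?? String(outcome.reason);
    }
  });

  const totalRaw = Object.values(results).reduce((n, items) => n + items.length, 0);
  return { results, failures, totalRaw };
}
```

The trade-off is that callers must inspect `failures` themselves, because a run with a silently empty source still resolves successfully.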

Normalize All Sources into One Schema

function normalizeLeads(raw) {
  const normalize = (items, source) => items.map(item => ({
    companyName: item.title || item.businessName || item.name || '',
    industry: item.categoryName || item.category || '',
    phone: item.phone || '',
    email: item.email || '',
    website: item.website || item.url || '',
    // Join only the parts that exist, so missing fields do not leave stray commas
    address: item.address || [item.street, item.city, item.state].filter(Boolean).join(', '),
    rating: item.totalScore || item.rating || null,
    reviewCount: item.reviewsCount || item.reviewCount || 0,
    source: source,
    collectedAt: new Date().toISOString(),
    gdprBasis: "legitimate_interest",
    optedOut: false
  }));

  return [
    ...normalize(raw.googleMaps, 'google_maps'),
    ...normalize(raw.yellowPages, 'yellow_pages'),
    ...normalize(raw.yelp, 'yelp')
  ];
}

const normalized = normalizeLeads(raw);

Deduplicate by Domain and Phone

function deduplicateLeads(leads) {
  const seen = new Set();

  return leads.filter(lead => {
    const domain = (lead.website || '').replace(/https?:\/\/(www\.)?/, '').split('/')[0].toLowerCase();
    const phone = (lead.phone || '').replace(/\D/g, '');
    const key = domain || phone || lead.companyName.toLowerCase();

    if (!key || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const unique = deduplicateLeads(normalized);
console.log(`${unique.length} unique leads after dedup (from ${normalized.length} raw)`);
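The scoring step further down reads `lead.sourceCount`, but the filter above drops duplicates without recording how many sources listed them. One way to populate that field is to merge duplicates instead of discarding them. This is a sketch under that assumption; `leadKey` and `mergeDuplicates` are hypothetical names, and the key logic copies the dedup example.

```javascript
// Same key logic as the dedup example: domain, then phone digits, then company name.
function leadKey(lead) {
  const domain = (lead.website || '').replace(/https?:\/\/(www\.)?/, '').split('/')[0].toLowerCase();
  const phone = (lead.phone || '').replace(/\D/g, '');
  return domain || phone || (lead.companyName || '').toLowerCase();
}

// Hypothetical variant of deduplicateLeads: merge duplicates, fill empty fields
// from later records, and count how many distinct sources listed the business.
function mergeDuplicates(leads) {
  const byKey = new Map();
  for (const lead of leads) {
    const key = leadKey(lead);
    if (!key) continue;
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...lead, sources: [lead.source] });
      continue;
    }
    for (const [field, value] of Object.entries(lead)) {
      if (!existing[field] && value) existing[field] = value; // first non-empty value wins
    }
    if (!existing.sources.includes(lead.source)) existing.sources.push(lead.source);
  }
  return [...byKey.values()].map(l => ({ ...l, sourceCount: l.sources.length }));
}
```

Calling `mergeDuplicates(normalized)` in place of `deduplicateLeads(normalized)` would give the multi-source bonus in the scoring function something to act on.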

ICP Fit Scoring (0 to 100)

function scoreLeadFit(lead, icp = {}) {
  let score = 40;

  // Has website = established business
  if (lead.website) score += 10;
  // No website = needs help (opportunity)
  if (!lead.website) score += 15;

  // Has email = easy to contact
  if (lead.email) score += 10;

  // Has phone = contactable
  if (lead.phone) score += 5;

  // Low review count = needs marketing
  if (lead.reviewCount < 10) score += 15;
  else if (lead.reviewCount < 30) score += 8;

  // Low rating = needs reputation help
  if (lead.rating && lead.rating < 4.0) score += 12;
  else if (lead.rating && lead.rating < 4.5) score += 5;

  // Multi-source validation bonus
  // (if same business appeared in multiple sources, higher confidence)
  if (lead.sourceCount && lead.sourceCount > 1) score += 10;

  // Industry match bonus
  if (icp.industries) {
    const match = icp.industries.some(ind =>
      (lead.industry || '').toLowerCase().includes(ind.toLowerCase())
    );
    if (match) score += 10;
  }

  return Math.min(100, Math.max(0, score));
}

const scored = unique.map(l => ({
  ...l,
  fitScore: scoreLeadFit(l, {
    industries: ['marketing', 'consulting', 'agency', 'legal', 'dental']
  })
})).sort((a, b) => b.fitScore - a.fitScore);
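To see how a particular score is composed, the same rules can be written so that each contribution is recorded. The following sketch mirrors the weights of `scoreLeadFit` above; `explainScore` is a hypothetical name and the sample lead is invented for illustration.

```javascript
// Mirrors the scoring rules above, but records each contribution as [reason, points].
function explainScore(lead, icp = {}) {
  const parts = [['base', 40]];
  const add = (reason, points) => parts.push([reason, points]);

  add(lead.website ? 'has website' : 'no website (opportunity)', lead.website ? 10 : 15);
  if (lead.email) add('has email', 10);
  if (lead.phone) add('has phone', 5);

  if (lead.reviewCount < 10) add('fewer than 10 reviews', 15);
  else if (lead.reviewCount < 30) add('fewer than 30 reviews', 8);

  if (lead.rating && lead.rating < 4.0) add('rating below 4.0', 12);
  else if (lead.rating && lead.rating < 4.5) add('rating below 4.5', 5);

  if (lead.sourceCount && lead.sourceCount > 1) add('seen in multiple sources', 10);

  const industry = (lead.industry || '').toLowerCase();
  if (icp.industries && icp.industries.some(ind => industry.includes(ind.toLowerCase()))) {
    add('industry match', 10);
  }

  const raw = parts.reduce((sum, [, points]) => sum + points, 0);
  return { parts, score: Math.min(100, Math.max(0, raw)) };
}

// Invented sample lead, for illustration only.
const { parts, score } = explainScore(
  { website: 'https://example.com', email: 'info@example.com', phone: '555-0100',
    reviewCount: 18, rating: 4.2, industry: 'Marketing & Advertising' },
  { industries: ['marketing'] }
);
console.log(score); // 40 + 10 + 10 + 5 + 8 + 5 + 10 = 88
```

Because the base is 40 and every lead earns either the website or the no-website bonus, no lead can score below 50 under these weights. Thresholds such as 60 or 70 therefore separate leads within a fairly narrow band.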

Deep Email Extraction from Websites

async function enrichWithEmails(leads, maxLeads = 30) {
  const withSites = leads.filter(l => l.website && !l.email).slice(0, maxLeads);

  if (withSites.length === 0) return leads;

  const run = await client.actor("apify/website-content-crawler").call({
    startUrls: withSites.map(l => ({ url: l.website })),
    maxCrawlPages: 3,
    crawlerType: "cheerio"
  });

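  // call() resolves to a plain run object; read its default dataset through the client
  const { items } = await client.dataset(run.defaultDatasetId).listItems();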
  const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

  const emailMap = {};
  items.forEach(page => {
    const domain = (page.url || '').replace(/https?:\/\/(www\.)?/, '').split('/')[0];
    const found = [...new Set((page.text || '').match(emailRegex) || [])];
    if (found.length > 0 && !emailMap[domain]) {
      emailMap[domain] = found[0];
    }
  });

  return leads.map(lead => {
    if (lead.email) return lead;
    const domain = (lead.website || '').replace(/https?:\/\/(www\.)?/, '').split('/')[0];
    return { ...lead, email: emailMap[domain] || '' };
  });
}

const enriched = await enrichWithEmails(scored);
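The email regex above also matches asset names such as `logo@2x.png`, and taking `found[0]` may pick an address belonging to a third-party vendor. A possible refinement is sketched below; `pickBestEmail` and the extension list are assumptions, not part of any Apify actor.

```javascript
// File extensions that indicate a matched "email" is really an asset path.
const ASSET_EXT = /\.(png|jpe?g|gif|svg|webp|css|js)$/i;

// Hypothetical helper: choose the most plausible contact address from regex matches.
// Prefers an address on the site's own domain, then falls back to the first clean match.
function pickBestEmail(candidates, siteDomain) {
  const clean = [...new Set(candidates.map(c => c.toLowerCase()))]
    .filter(c => !ASSET_EXT.test(c)); // drop "logo@2x.png"-style matches
  const domain = (siteDomain || '').toLowerCase();
  const sameDomain = clean.filter(c => domain && c.endsWith('@' + domain));
  return sameDomain[0] || clean[0] || '';
}
```

Inside `enrichWithEmails`, `emailMap[domain] = pickBestEmail(found, domain)` could replace `found[0]`.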

Generate 4-Step Outreach Sequence with Claude AI

import axios from 'axios';

async function generateSequence(lead) {
  const prompt = `Create a 4-email cold outreach sequence for this B2B prospect.

LEAD:
- Company: ${lead.companyName}
- Industry: ${lead.industry}
- Location: ${lead.address}
- Website: ${lead.website || 'None'}
- Rating: ${lead.rating || 'N/A'}/5 (${lead.reviewCount} reviews)
- Fit Score: ${lead.fitScore}/100

SEQUENCE RULES:
- Email 1 (Day 0): Warm intro, reference one specific thing about their business, soft question
- Email 2 (Day 3): Quick follow-up, share a relevant insight or stat about their industry
- Email 3 (Day 7): Case study angle, mention a result you achieved for a similar business
- Email 4 (Day 14): Breakup email, friendly close, leave door open
- Each email under 80 words
- No hype, no pressure, conversational tone
- Include [YOUR_NAME] and [YOUR_COMPANY] placeholders
- Include unsubscribe placeholder at bottom of each email

Return all 4 emails with subject lines.`;

  const { data } = await axios.post('https://api.anthropic.com/v1/messages', {
    model: "claude-sonnet-4-20250514",
    max_tokens: 800,
    messages: [{ role: "user", content: prompt }]
  }, {
    headers: {
      'x-api-key': process.env.CLAUDE_API_KEY,
      'anthropic-version': '2023-06-01'
    }
  });

  return data.content[0].text;
}

// Generate sequences for top 10 leads
for (const lead of enriched.filter(l => l.fitScore >= 70).slice(0, 10)) {
  lead.outreachSequence = await generateSequence(lead);
  await new Promise(r => setTimeout(r, 600));
}
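The fixed 600 ms pause above spaces out requests but does not recover from a rate-limit response. The sketch below retries with exponential backoff; `withRetry` and its option names are hypothetical, and the status check assumes axios-style errors (`error.response.status`) or the `error.statusCode` property used in the error-handling example in this document.

```javascript
// Hypothetical helper: retry an async call on HTTP 429 or 5xx with exponential backoff.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // axios-style errors carry response.status; fall back to a statusCode property
      const status = error.response?.status ?? error.statusCode;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= retries) throw error;
      // Wait baseDelayMs, then 2x, then 4x, and so on
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

In the loop above, this could be used as `lead.outreachSequence = await withRetry(() => generateSequence(lead));`. Non-retryable errors, such as a 401 from an invalid key, are rethrown immediately.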

Full Pipeline: Discover, Normalize, Score, Enrich, Outreach, Export

import { writeFileSync } from 'fs';

async function runFullPipeline(keyword, location) {
  console.log(`Pipeline started: ${keyword} in ${location}`);

  // 1. Discover from multiple sources
  const raw = await discoverLeads(keyword, location, 30);
  console.log(`Step 1: ${raw.totalRaw} raw leads found`);

  // 2. Normalize
  const normalized = normalizeLeads(raw);

  // 3. Deduplicate
  const unique = deduplicateLeads(normalized);
  console.log(`Step 3: ${unique.length} unique leads`);

  // 4. Score
  const scored = unique.map(l => ({
    ...l,
    fitScore: scoreLeadFit(l)
  })).sort((a, b) => b.fitScore - a.fitScore);

  // 5. Enrich emails
  const enriched = await enrichWithEmails(scored, 20);
  console.log(`Step 5: Emails enriched`);

  // 6. Generate outreach for top leads
  const hot = enriched.filter(l => l.fitScore >= 60).slice(0, 10);
  for (const lead of hot) {
    lead.outreachSequence = await generateSequence(lead);
    await new Promise(r => setTimeout(r, 600));
  }
  console.log(`Step 6: ${hot.length} outreach sequences generated`);

  // 7. Export
  const headers = ["companyName","industry","phone","email","website","address","rating","reviewCount","source","fitScore"];
  const csv = [
    headers.join(","),
    ...enriched.map(l => headers.map(h => `"${(l[h] || '').toString().replace(/"/g, '""')}"`).join(","))
  ].join("\n");

  const filename = `leads-${keyword.replace(/\s+/g, '_')}-${Date.now()}.csv`;
  writeFileSync(filename, csv);
  console.log(`Exported ${enriched.length} leads to ${filename}`);

  return enriched;
}

await runFullPipeline("IT consulting firms", "Chicago, IL");
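In the export step above, `(l[h] || '')` writes an empty cell for a `reviewCount` of 0, because 0 is falsy. A small CSV helper that preserves such values might look like this; `toCsv` is a hypothetical name.

```javascript
// Hypothetical CSV helper: quote every cell, double embedded quotes,
// and keep falsy values such as 0 and false (only null/undefined become empty).
function toCsv(rows, headers) {
  const cell = value => {
    const text = value === null || value === undefined ? '' : String(value);
    return `"${text.replace(/"/g, '""')}"`;
  };
  return [
    headers.join(','),
    ...rows.map(row => headers.map(h => cell(row[h])).join(','))
  ].join('\n');
}
```

Note that `outreachSequence` is not in the pipeline's header list, so generated sequences are returned by `runFullPipeline` but not written to the CSV unless that column is added.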

Normalized Lead Schema

{
  "companyName": "Bright Digital Agency",
  "industry": "Marketing & Advertising",
  "phone": "+1 (415) 555-0192",
  "email": "[email protected]",
  "website": "https://brightdigital.com",
  "address": "123 Market St, San Francisco, CA 94105",
  "rating": 4.2,
  "reviewCount": 18,
  "source": "google_maps",
  "fitScore": 82,
  "collectedAt": "2025-02-25T10:00:00Z",
  "gdprBasis": "legitimate_interest",
  "optedOut": false
}

What Makes This Different

Feature | Basic Lead Scraper | This Skill
Data sources | 1 source | 3+ sources in parallel
Deduplication | None | Domain + phone dedup
Scoring | None | 0 to 100 ICP fit scoring
Email enrichment | None | Website crawl for hidden emails
Outreach | Single template | 4-step personalized sequences
Compliance | None | GDPR/CAN-SPAM built in
Export | Raw JSON | CRM-ready CSV with all fields

Compliance Checklist

Before running any campaign, verify:

  • Reviewed robots.txt of every target website
  • Confirmed all data is publicly listed business information
  • Outreach emails include sender identity and physical address
  • Outreach emails include a working unsubscribe link
  • Suppression list in place for previous opt-outs
  • Data will be deleted when no longer needed
  • For EU/UK contacts: legitimate interest assessment completed
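The suppression-list item in the checklist can be enforced in code before any outreach is generated. This is a minimal sketch assuming the suppression list is an array of email addresses; `applySuppression` is a hypothetical name, and `optedOut` is the field from the normalized schema.

```javascript
// Hypothetical filter: drop leads that opted out or whose email is on a suppression list.
// Leads without an email are kept, since they cannot be emailed in the first place.
function applySuppression(leads, suppressed) {
  const blocked = new Set(suppressed.map(e => e.trim().toLowerCase()));
  return leads.filter(lead => {
    if (lead.optedOut) return false;
    const email = (lead.email || '').trim().toLowerCase();
    return !email || !blocked.has(email);
  });
}
```

Running this immediately before the outreach step keeps previously opted-out contacts from receiving a new sequence.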

Cost Estimate

Action | Apify CU | Cost
75 leads from 3 sources (1 city) | ~0.15 CU | ~$0.06
375 leads from 3 sources (5 cities) | ~0.75 CU | ~$0.30
Email enrichment (30 websites) | ~0.15 CU | ~$0.06
Full pipeline (discovery + enrichment) | ~0.90 CU | ~$0.36

Scale with Apify as your pipeline grows. Free tier handles hundreds of leads monthly.
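The figures in the table imply roughly $0.40 per compute unit ($0.06 / 0.15 CU). A rough estimator built on that ratio is sketched below; all three constants are derived from the table rather than from Apify's published pricing, so treat them as assumptions, and `estimateCost` is a hypothetical name.

```javascript
// Assumed rates, derived from the table above rather than from Apify's price list.
const USD_PER_CU = 0.40;        // $0.06 / 0.15 CU
const CU_PER_LEAD = 0.15 / 75;  // discovery rows: about 0.15 CU per 75 leads
const CU_PER_SITE = 0.15 / 30;  // enrichment row: about 0.15 CU per 30 websites

// Returns estimated compute units and USD, each rounded to two decimals.
function estimateCost(leads, sites = 0) {
  const cu = leads * CU_PER_LEAD + sites * CU_PER_SITE;
  return {
    cu: Math.round(cu * 100) / 100,
    usd: Math.round(cu * USD_PER_CU * 100) / 100
  };
}

console.log(estimateCost(375, 30)); // 0.75 CU + 0.15 CU = 0.90 CU, about $0.36
```

Actual usage depends on the actor, proxy settings and page weight, so the estimate is only a starting point for budgeting.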


Pro Tips

  1. Small targeted batches (25 to 50 per source) outperform mass scraping every time
  2. Validate emails before sending with Hunter.io or NeverBounce
  3. Review outreach drafts before sending. Never auto-send without human review
  4. Warm up new email domains before sending at scale (use Instantly or Lemlist)
  5. Target decision makers by title rather than generic company emails
  6. Run weekly to catch new businesses and refresh stale data
  7. Cross-reference leads that appear in multiple sources. Multi-source leads convert 3x better

Error Handling

// Wrapped in a function so that `return` is valid; `input` is the actor's input object
async function runYellowPages(input) {
  try {
    const run = await client.actor("apify/yellowpages-scraper").call(input);
    // call() resolves to a plain run object; read its default dataset through the client
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return items;
  } catch (error) {
    if (error.statusCode === 401) throw new Error("Invalid Apify token. Get yours at https://www.apify.com?fpr=dx06p");
    if (error.statusCode === 429) throw new Error("Rate limit. Reduce batch size or wait.");
    if (error.statusCode === 404) throw new Error("Actor not found. Verify actor ID.");
    throw error;
  }
}

Requirements

  • An Apify account with API token
  • Claude API key for outreach generation
  • Node.js 18+ with apify-client and axios
  • A CRM or spreadsheet (HubSpot, Airtable, Google Sheets)
  • An outreach tool with unsubscribe management (Instantly, Lemlist, Apollo)
Security Recommendations
Before installing or running this skill: (1) Ask the publisher to update the registry metadata to declare required environment variables (APIFY_TOKEN and any LLM/API keys) and to document exactly how Claude integration is authenticated and used. (2) Verify the exact Apify actor IDs and their billing/permission model — Apify actors may require additional configuration or paid usage. (3) Consider legal/compliance implications of scraping and emailing (GDPR, CAN-SPAM, CASL) and confirm you have a lawful basis for the data you will collect. (4) Run any code in an isolated environment first and inspect the actual actor calls and any third-party endpoints to which data is sent. (5) If you plan to automate outreach, require explicit review steps and rate limits to avoid mass unsolicited messaging. If the publisher cannot clarify the credential/CLAUDE integration gaps and actor details, treat the skill as unsafe to run.
Functional Analysis
Type: OpenClaw Skill Name: lead-scraper-ai Version: 1.0.0 The lead-scraper-ai skill (SKILL.md) provides a framework for automated B2B lead generation, including multi-source scraping via Apify, website crawling for email enrichment, and AI-generated outreach sequences. It is classified as suspicious because it utilizes high-risk capabilities such as extensive network access to external APIs and local file system operations (writing CSV files), which are explicitly flagged as risky in the analysis criteria even when aligned with the stated purpose. The skill also requires the use of sensitive environment variables (APIFY_TOKEN, CLAUDE_API_KEY) and contains multiple affiliate links (e.g., to Apify with tag ?fpr=dx06p).
Capability Tags
crypto · can-make-purchases
Capability Assessment
Purpose & Capability
The declared purpose (discovering, qualifying, email-extracting, and generating outreach) matches the SKILL.md content: it describes Apify actors, normalization, deduplication, email extraction, and Claude-based outreach. Affiliate links to Apify are present but don't contradict the purpose.
Instruction Scope
The runtime instructions explicitly instruct creating an Apify account, exporting APIFY_TOKEN, installing npm packages, and calling Apify actors to scrape Google Maps, Yellow Pages, Yelp, LinkedIn public pages and crawl sites for emails. Those actions are consistent with the stated purpose, but the instructions also reference Claude AI for message generation without providing any guidance or env var for Claude credentials or how to call that service. The SKILL.md therefore asks the agent/user to access secrets and external services that are not declared in the skill metadata, and it gives broad discretion to deep-crawl websites and extract emails which raises scope and compliance concerns.
Install Mechanism
This is an instruction-only skill (no install spec). The SKILL.md recommends running `npm install apify-client axios` locally. That is a low-risk, expected developer dependency pattern. There is no remote archive download, no automated install written to disk by the registry, and no provided code files to run automatically.
Credentials
Registry metadata lists no required env vars or primary credential, yet the instructions tell the user to export APIFY_TOKEN and to use Claude AI. The missing declaration of APIFY_TOKEN (and the absent guidance for any Claude API key) is an incoherence: the skill expects secrets but does not declare them. Additionally, scraping multiple sources (Google Maps, LinkedIn) may implicitly require additional credentials or expose rate-limiting/captcha workarounds handled by Apify actors — the skill does not document these dependencies.
Persistence & Privilege
The skill is not always-enabled and is user-invocable. It does not request persistent platform privileges in the provided metadata. Autonomous invocation is allowed by default but is not combined with 'always: true' or declared broad credential access in the metadata; however, the instructions would let an agent perform network scraping if invoked.
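How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install lead-scraper-ai
  3. Once installed, invoke the Skill by name or trigger it with /lead-scraper-ai
  4. Provide the inputs described in the Skill's parameter documentation to get structured output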
Version History
v1.0.0
Major upgrade: Complete rewrite and expansion of features for automated B2B lead sourcing and AI-powered outreach.
  • Scrapes and consolidates leads from Google Maps, Yellow Pages, Yelp, LinkedIn Companies, and more.
  • Automatically qualifies, scores (0–100), deduplicates, and normalizes leads into CRM-ready format.
  • Deep-crawls business sites to extract and enrich lead emails.
  • Generates multi-step personalized outreach sequences using Claude AI.
  • Export-ready for popular CRMs and outreach platforms (HubSpot, Airtable, Instantly, Lemlist).
  • Comprehensive legal compliance section for GDPR, CAN-SPAM, CCPA, and CASL.
Metadata
Slug lead-scraper-ai
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
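Frequently Asked Questions

What is Lead Scraper AI?

Scrapes and qualifies B2B leads from multiple public directories, scores them by fit, extracts emails, and generates personalized AI outreach sequences autom... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 82 total downloads to date.

How do I install Lead Scraper AI?

Run the command "/install lead-scraper-ai" in an OpenClaw or Claude Code chat to install it in one step; no extra configuration is needed.

Is Lead Scraper AI free?

Yes. Lead Scraper AI is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Lead Scraper AI support?

Lead Scraper AI is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Lead Scraper AI?

It is developed and maintained by nicemaths123 (@nicemaths123); the current version is v1.0.0.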
