futurizerush

Apify Google News Scraper

Author: Futurize Rush · GitHub · v0.1.1 · MIT-0
cross-platform · ⚠ suspicious
Total downloads: 104 · Favorites: 0 · Current installs: 0 · Versions: 2
Install in OpenClaw: /install apify-google-news
Description
This skill should be used when the user asks to "scrape Google News", "get news articles", "search for news", "extract news data", "monitor news topics", "ge...
Usage (SKILL.md)

Google News Scraper with Apify

Search and extract news articles from Google News with full article content, enriched descriptions, and metadata. Supports region and language filtering.

Actor: futurizerush/google-news-scraper

Prerequisites

Set APIFY_API_TOKEN in environment. Get a token at console.apify.com/account/integrations.
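
A minimal sketch of checking the token up front, so that a missing value fails with a clear message instead of surfacing later as a 401. The helper name `load_apify_token` is hypothetical and not part of the skill.

```python
import os


def load_apify_token(env=None):
    """Return the Apify token from the environment, or raise a clear error.

    `load_apify_token` is an illustrative helper, not part of the skill.
    Pass a dict as `env` to override os.environ (handy in tests).
    """
    env = os.environ if env is None else env
    token = (env.get("APIFY_API_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_API_TOKEN is not set; create a token at "
            "console.apify.com/account/integrations"
        )
    return token
```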

Execution Flow

Apify runs are asynchronous. Every request follows 3 steps:

  1. Start a run -- POST to the actor API, receive a run ID and dataset ID
  2. Poll until done -- GET the run status, wait for SUCCEEDED
  3. Fetch results -- GET the dataset items (returns a JSON array)

Typical run time: 30-90 seconds depending on query count and article enrichment.
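
The polling step can be sketched as a small helper that also gives up after a deadline, which is useful because a run that never reaches a terminal status would otherwise loop forever. This is a sketch under assumptions: `wait_for_run` is a hypothetical name, and the 300-second default is an arbitrary margin above the 30-90 second typical run time, not a documented limit.

```python
import time

# Terminal failure statuses, matching the ones checked in the Python example.
TERMINAL_FAILURES = {"FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_status, timeout_s=300, interval_s=5,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until SUCCEEDED, a failure status, or the deadline.

    get_status is any zero-argument callable returning the run status string.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "SUCCEEDED":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Run failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Run still {status} after {timeout_s}s")
        sleep(interval_s)
```

In practice `get_status` would wrap the GET request on the run, returning the `data.status` value from its JSON response.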

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| searchQueries | array of strings | Yes | Search queries (e.g. ["AI"], ["climate change"]) |
| region | string | No | Region code. Default: "us". Examples: "us", "tw", "jp" |
| language | string | No | Language code. Default: "en". Examples: "en", "zh-TW", "ja" |
| dateFilter | string | No | Time range: "1h", "1d", "1w", "1m", or "" (any time). Default: "" |
| maxResults | integer | No | Max articles per query. Default: 20. Min: 10 |
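
Validating the input locally can catch mistakes such as a `maxResults` below 10 before a run is started. The following is a sketch only: `build_input` is a hypothetical helper, and its checks mirror the parameter list above rather than the actor's actual schema, which may accept or reject more than shown here.

```python
# Values listed for dateFilter in the parameter list; "" means any time.
ALLOWED_DATE_FILTERS = {"", "1h", "1d", "1w", "1m"}


def build_input(search_queries, region="us", language="en",
                date_filter="", max_results=20):
    """Validate parameters locally and return the actor input dict."""
    if isinstance(search_queries, str):
        search_queries = [search_queries]  # accept a single query string
    queries = [q.strip() for q in search_queries if q and q.strip()]
    if not queries:
        raise ValueError("searchQueries needs at least one non-empty string")
    if date_filter not in ALLOWED_DATE_FILTERS:
        raise ValueError(
            f"dateFilter must be one of {sorted(ALLOWED_DATE_FILTERS)}"
        )
    if max_results < 10:
        raise ValueError("maxResults must be >= 10")
    return {
        "searchQueries": queries,
        "region": region,
        "language": language,
        "dateFilter": date_filter,
        "maxResults": max_results,
    }
```

The returned dict can be passed directly as the `json=` argument of the POST request that starts the run.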

Complete Example (Python)

import requests, os, time

TOKEN = os.environ["APIFY_API_TOKEN"]
BASE = "https://api.apify.com/v2"

# Step 1: Start the run
response = requests.post(
    f"{BASE}/acts/futurizerush~google-news-scraper/runs?token={TOKEN}",
    json={
        "searchQueries": ["AI"],
        "region": "us",
        "language": "en",
        "dateFilter": "1d",
        "maxResults": 10,
    },
)
response.raise_for_status()
run = response.json()["data"]
run_id = run["id"]
dataset_id = run["defaultDatasetId"]

# Step 2: Poll until done
while True:
    status = requests.get(
        f"{BASE}/actor-runs/{run_id}?token={TOKEN}"
    ).json()["data"]["status"]
    if status == "SUCCEEDED":
        break
    if status in ("FAILED", "ABORTED", "TIMED-OUT"):
        raise RuntimeError(f"Run failed: {status}")
    time.sleep(5)

# Step 3: Fetch results (JSON array)
items = requests.get(
    f"{BASE}/datasets/{dataset_id}/items?token={TOKEN}"
).json()
for article in items:
    print(f"[{article['source']}] {article['title']}")
    print(f"  URL: {article['articleUrl']}")
    print(f"  Published: {article['pubDate']}")
    if article.get("enrichedDescription"):
        print(f"  Summary: {article['enrichedDescription'][:100]}")

Taiwan news in Chinese

requests.post(
    f"{BASE}/acts/futurizerush~google-news-scraper/runs?token={TOKEN}",
    json={
        "searchQueries": ["台灣"],
        "region": "tw",
        "language": "zh-TW",
        "dateFilter": "1d",
        "maxResults": 10,
    },
)

Multiple queries

requests.post(
    f"{BASE}/acts/futurizerush~google-news-scraper/runs?token={TOKEN}",
    json={
        "searchQueries": ["AI", "climate", "crypto"],
        "region": "us",
        "dateFilter": "1w",
        "maxResults": 10,
    },
)
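
Because each result item carries a `searchQuery` field (see the output format), the items from a multi-query run can be split per query after fetching. A minimal sketch, with `group_by_query` as a hypothetical helper name:

```python
from collections import defaultdict


def group_by_query(items):
    """Group dataset items by the searchQuery that produced them."""
    grouped = defaultdict(list)
    for article in items:
        # Items without a searchQuery field are collected under "".
        grouped[article.get("searchQuery", "")].append(article)
    return dict(grouped)
```

Calling `group_by_query(items)` on the fetched dataset returns a dict keyed by query string, for example one list each for "AI", "climate", and "crypto".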

Complete Example (bash)

# Step 1: Start the run
RUN_RESPONSE=$(curl -s -X POST \
  "https://api.apify.com/v2/acts/futurizerush~google-news-scraper/runs?token=$APIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"searchQueries": ["AI"], "region": "us", "language": "en", "dateFilter": "1d", "maxResults": 10}')

RUN_ID=$(echo "$RUN_RESPONSE" | jq -r '.data.id')
DATASET_ID=$(echo "$RUN_RESPONSE" | jq -r '.data.defaultDatasetId')

# Step 2: Poll until done
while true; do
  STATUS=$(curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?token=$APIFY_API_TOKEN" \
    | jq -r '.data.status')
  [ "$STATUS" = "SUCCEEDED" ] && break
  # Stop on any terminal failure status, including TIMED-OUT
  case "$STATUS" in
    FAILED|ABORTED|TIMED-OUT) echo "Failed: $STATUS" >&2; exit 1 ;;
  esac
  sleep 5
done

# Step 3: Fetch results
curl -s "https://api.apify.com/v2/datasets/$DATASET_ID/items?token=$APIFY_API_TOKEN" | jq '.'

Output Format

Each item in the results array (field names verified from real API output on 2026-04-11):

{
  "title": "Vance, Bessent questioned tech giants on AI security...",
  "articleUrl": "https://www.cnbc.com/2026/04/10/...",
  "googleNewsUrl": "https://news.google.com/rss/articles/...",
  "pubDate": "Fri, 10 Apr 2026 20:06:08 GMT",
  "timestamp": "2026-04-10T20:06:08.000Z",
  "source": "CNBC",
  "websiteName": "CNBC",
  "websiteUrl": "https://www.cnbc.com",
  "imageUrl": "https://image.cnbcfm.com/...",
  "description": "Raw RSS description with related headlines...",
  "enrichedDescription": "Bessent and Fed Chair Jerome Powell separately met with...",
  "excerpt": "Bessent and Fed Chair Jerome Powell separately met with...",
  "articleContent": {
    "content": "Full article text (truncated to ~2000 chars)...",
    "characterCount": 2000,
    "tokenCount": 325
  },
  "enrichmentTime": 9848,
  "guid": "unique-article-id",
  "searchQuery": "AI",
  "region": "us",
  "language": "en",
  "scrapedAt": "2026-04-11T06:06:59.618Z"
}

Note: Field names use camelCase. The articleContent object holds the article text (truncated to roughly 2000 characters) together with its character count and token count. Use enrichedDescription or excerpt for summaries.
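
Since not every item is guaranteed to have every summary field populated, a fallback chain is one way to pick a summary. This sketch assumes that fields may be missing or empty, which the source does not state either way; `best_summary` and the 200-character default are illustrative choices.

```python
def best_summary(article, limit=200):
    """Return the first non-empty summary-like field, cut to `limit` chars.

    Preference order: enrichedDescription, excerpt, raw description,
    then the start of articleContent.content as a last resort.
    """
    for field in ("enrichedDescription", "excerpt", "description"):
        text = (article.get(field) or "").strip()
        if text:
            return text[:limit]
    content = (article.get("articleContent") or {}).get("content") or ""
    return content.strip()[:limit]
```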

Error Handling

| Error | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing API token | Check APIFY_API_TOKEN |
| invalid-input: "must be >= 10" | maxResults below minimum | Set maxResults to at least 10 |
| No results | Query too specific or region has no news | Broaden the query or try a different region |
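
The table above can be turned into a small lookup that maps a failed response to a hint. This is a sketch under a stated assumption: it expects the error body to look like `{"error": {"type": ..., "message": ...}}`, which should be checked against the real response before relying on it. `explain_failure` is a hypothetical helper.

```python
def explain_failure(status_code, body=None):
    """Map an HTTP status and parsed JSON error body to a fix hint.

    ASSUMPTION: body has the shape {"error": {"type": ..., "message": ...}}.
    """
    error = (body or {}).get("error") or {}
    err_type = error.get("type", "")
    message = error.get("message", "")
    if status_code == 401:
        return "Check APIFY_API_TOKEN (invalid or missing token)"
    if err_type == "invalid-input" and "must be >= 10" in message:
        return "Set maxResults to at least 10"
    return f"Unexpected error {status_code}: {message or 'no message'}"
```

An empty result list is not an HTTP error, so it needs a separate check on the fetched items (for example `if not items:`) before suggesting a broader query or a different region.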

Tips

  • Use dateFilter: "1h" for real-time news monitoring and alerting.
  • Use dateFilter: "1d" for daily news digests.
  • articleContent.content provides the article text, truncated to roughly 2000 characters, which is useful for summarization.
  • enrichedDescription is a cleaner summary than description (which contains raw RSS data with related headlines).
  • timestamp is ISO 8601 format, easier to parse than pubDate.
  • Multiple search queries run in a single actor execution.
  • No Google News login or API key is required; only the Apify token is needed.
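
To illustrate the tip about `timestamp` versus `pubDate`, here is a sketch that parses both fields with the standard library. The helper names are hypothetical; the sample values come from the output format above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timestamp(value):
    """Parse the ISO 8601 `timestamp` field; a trailing Z means UTC."""
    # Replacing Z keeps this working on Python versions before 3.11,
    # where fromisoformat does not accept the Z suffix.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_pub_date(value):
    """Parse the RFC 2822-style `pubDate` field."""
    return parsedate_to_datetime(value)


ts = parse_timestamp("2026-04-10T20:06:08.000Z")
pd = parse_pub_date("Fri, 10 Apr 2026 20:06:08 GMT")
print(ts == pd)  # both describe the same instant in UTC
```

Both helpers return timezone-aware datetimes, so the values can be compared or sorted directly.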

Security recommendations
Before installing or enabling this skill:
  1. SKILL.md requires an Apify API token, but the registry metadata does not list it. Treat that as a metadata bug and assume you will need to provide APIFY_API_TOKEN.
  2. Only provide an Apify token if you trust the actor and its owner. Verify the actor name (futurizerush/google-news-scraper) on Apify, and ideally use a token scoped to a dedicated, minimal Apify account.
  3. Ask the skill publisher to update the registry metadata to declare APIFY_API_TOKEN as a required credential, so automated tooling and reviewers can see the dependency.
  4. If you have doubts about the owner (source unknown, no homepage), avoid supplying your main Apify credentials and consider running the actor manually in a sandboxed account to inspect its outputs first.
Functional analysis
Type: OpenClaw Skill · Name: apify-google-news · Version: 0.1.1
The skill bundle provides legitimate instructions and code examples (Python and Bash) for interacting with the Apify Google News Scraper actor. It uses standard API patterns (POST to start, polling for status, GET for results) and handles the API token via environment variables, with no signs of data exfiltration, malicious execution, or prompt injection.
Capability tags
crypto
Capability assessment
Purpose & Capability
The name/description (Apify Google News Scraper) matches the runtime instructions: the SKILL.md instructs calling the Apify actor futurizerush/google-news-scraper and fetching dataset items from api.apify.com. That functionality is coherent with the stated purpose. However, the registry metadata lists no required environment variables or primary credential while the SKILL.md explicitly requires APIFY_API_TOKEN — this mismatch is unexpected.
Instruction Scope
The instructions are focused: they show how to start an Apify actor run, poll for completion, and fetch dataset items from https://api.apify.com. They only reference an API token (APIFY_API_TOKEN) and standard network calls; they do not ask the agent to read unrelated files, system paths, or other credentials. No unexpected external endpoints are used beyond Apify.
Install Mechanism
This is an instruction-only skill with no install spec and no code files, so nothing is written to disk or downloaded by the skill itself. That is the lowest-risk install mechanism.
Credentials
The SKILL.md requires APIFY_API_TOKEN (sensitive credential) for API access, but the registry metadata declares no required env vars or primary credential. The token request is legitimate for Apify usage, but the metadata omission is misleading and could cause automated reviewers or users to miss that a secret is needed. The skill should declare APIFY_API_TOKEN as a required credential/primaryEnv.
Persistence & Privilege
The skill does not request persistent/always-on inclusion and does not modify other skills or agent-wide configs. Autonomous invocation is allowed (platform default) but is not combined with other high-privilege requests here.
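How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install apify-google-news
  3. Once installed, invoke the skill by name or trigger it with /apify-google-news.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.
Version history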
v0.1.1
Add discovery tags for SEO
v0.1.0
Initial release. Search and extract news articles from Google News with full content. All output fields verified.
Metadata
Slug: apify-google-news
Version: 0.1.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

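What is Apify Google News Scraper?

This skill should be used when the user asks to "scrape Google News", "get news articles", "search for news", "extract news data", "monitor news topics", "ge... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 104 total downloads to date.

How do I install Apify Google News Scraper?

Run /install apify-google-news in the OpenClaw or Claude Code chat box to install it in one step. Installation needs no further setup, but SKILL.md requires APIFY_API_TOKEN to be set in the environment before the skill can be used.

Is Apify Google News Scraper free?

Yes. The skill is released under the MIT-0 license and can be downloaded, installed, and used freely. Running it still requires an Apify API token.

Which platforms does Apify Google News Scraper support?

Apify Google News Scraper is cross-platform and can be used in any environment where OpenClaw or Claude Code is deployed.

Who develops Apify Google News Scraper?

It is developed and maintained by Futurize Rush (@futurizerush). The current version is v0.1.1.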
