ai-mrscraper

MrScraper

Author: MrScraper · GitHub · v1.0.4
cross-platform ✓ Security check passed
Total downloads: 647
Favorites: 2
Current installs: 0
Versions: 5
Install in OpenClaw
/install mrscraper
Description
Run AI-powered, unblockable web scraping and data extraction with natural language via the MrScraper API.
Usage instructions (SKILL.md)

MrScraper

Run AI-powered, unblockable web scraping and data extraction with natural language via the MrScraper API.

Actions

This skill supports:

  • Opening blocked pages through unblocker (stealth browser + IP rotation)
  • Starting AI scraper runs from natural-language instructions
  • Rerunning existing scraper configurations on one or multiple URLs
  • Running manual workflow-based reruns
  • Fetching paginated results and detailed results by ID

This skill is API-only and does not depend on bundled local scripts.

Base URLs

  • Unblocker API: https://api.mrscraper.com
  • Platform API: https://api.app.mrscraper.com

Authentication

Unblocker API auth

Use query-parameter auth on the unblocker endpoint:

  • token=<MRSCRAPER_API_TOKEN>

Platform API auth

Use header-based auth on platform endpoints:

x-api-token: <MRSCRAPER_API_TOKEN>
accept: application/json
content-type: application/json

How to get MRSCRAPER_API_TOKEN?

An API token lets your applications securely interact with MrScraper APIs and rerun scrapers created in the dashboard.

Follow these steps in the dashboard:

  1. Click your User Profile at the top-right corner.
  2. Select API Tokens.
  3. Click New Token.
  4. Enter a name and set an expiration date.
  5. Click Create.
  6. Copy the new token and store it securely as MRSCRAPER_API_TOKEN.
  7. Use it in requests through the x-api-token header.

Security rules:

  • Never expose tokens in client-side code (browser/mobile app bundles).
  • Store tokens in environment variables or server-side secret managers.
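
The rules above can be sketched as a small helper that reads the token from the environment and builds the Platform API headers listed earlier. The function name `platform_headers` and its `env` parameter are illustrative assumptions, not part of the MrScraper API.

```python
import os


def platform_headers(env=None):
    """Build Platform API headers from the MRSCRAPER_API_TOKEN variable.

    `env` is a hypothetical injection point (defaults to os.environ) so the
    helper can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("MRSCRAPER_API_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("MRSCRAPER_API_TOKEN is not set")
    return {
        "x-api-token": token,
        "accept": "application/json",
        "content-type": "application/json",
    }
```

Keeping the token lookup in one place makes it easier to avoid printing or logging the value elsewhere.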

Notes from the auth docs:

  • The API key works for all V3 Platform endpoints.
  • The same key can be used for endpoints on sync.scraper.mrscraper.com.
  • For access to endpoints on other hosts, contact [email protected].

Install and Runtime

  • No local install step is required by this skill document.
  • No bundled scripts/ are required.
  • Calls are direct HTTPS requests to the two base URLs above.

Data and Scope

  • Data is sent only to api.app.mrscraper.com and api.mrscraper.com.
  • Responses may contain extracted page content and scrape metadata.
  • This skill does not define hidden persistence or background jobs.
  • Never expose tokens in logs, commits, or output.

Endpoints

1. Unblocker

  • Method: GET
  • URL: https://api.mrscraper.com
  • Auth: token query parameter

Opens a target URL through stealth browsing and IP rotation, then returns HTML. Use this when direct access is blocked by captcha or anti-bot protections.

Query parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| token | string | Yes | | Unblocker token (MRSCRAPER_API_TOKEN) |
| url | string | Yes | | URL-encoded target URL |
| timeout | number | No | 60 | Max wait in seconds (example: 120) |
| geoCode | string | No | None | Geographic routing code (example: SG) |
| blockResources | boolean | No | false | Block non-essential resources |

Request example:

curl --location 'https://api.mrscraper.com?token=<MRSCRAPER_API_TOKEN>&timeout=120&geoCode=SG&url=https%3A%2F%2Fwww.lazada.sg%2Fproducts%2Fpdp-i111650098-s23209659764.html&blockResources=false'

Response example:

<!doctype html>
<html>
  <head>...</head>
  <body>...</body>
</html>

Notes:

  • Prefer explicit geoCode and practical timeouts for repeatable behavior.
  • Only pass cookies when session-specific content is required.
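
Because the target URL must be URL-encoded, it can help to build the unblocker request URL programmatically rather than by hand. The following is a minimal sketch using only the Python standard library; the function name `unblocker_url` and its keyword names are assumptions made for illustration, while the query parameter names come from the table above.

```python
from urllib.parse import urlencode

UNBLOCKER_BASE = "https://api.mrscraper.com"


def unblocker_url(token, target_url, timeout=None, geo_code=None,
                  block_resources=None):
    """Return the full GET URL for the unblocker endpoint.

    Optional parameters are omitted when None so the service defaults apply.
    """
    params = {"token": token, "url": target_url}
    if timeout is not None:
        params["timeout"] = timeout
    if geo_code is not None:
        params["geoCode"] = geo_code
    if block_resources is not None:
        # Send lowercase true/false, matching the request example above.
        params["blockResources"] = "true" if block_resources else "false"
    # urlencode percent-encodes the target URL (":" and "/" included).
    return UNBLOCKER_BASE + "?" + urlencode(params)
```

The resulting URL contains the token, so it should be treated as a secret and kept out of logs and shared output.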

2. Create AI Scraper

  • Method: POST
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/scrapers-ai
  • Auth: x-api-token

Create a new AI scraper run from natural-language instructions.

Payload parameters (for agent: general or agent: listing):

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | | Target URL |
| message | string | Yes | | Extraction instruction |
| agent | string | No | general | The AI agent type to use for scraping: general, listing, or map |
| proxyCountry | string | No | None | ISO country code for proxy-based scraping |

Payload parameters (for agent: map):

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | | Target URL |
| agent | string | No | map | The AI agent type to use for scraping (in this case, map) |
| maxDepth | number | No | 2 | Maximum depth level for crawling links from the starting URL. 0 = only the starting URL, 1 = starting URL plus direct links |
| maxPages | number | No | 50 | Maximum number of pages to scrape during the crawling process |
| limit | number | No | 1000 | Maximum number of data records to extract across all pages. Scraping stops when this limit is reached |
| includePatterns | string | No | "" | Regex patterns to include (separate multiple with \|\|) |
| excludePatterns | string | No | "" | Regex patterns to exclude (separate multiple with \|\|) |

Request example:

curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-ai" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "message": "Extract title, price, stocks, and rating",
    "agent": "general"
  }'

Response example:

{
  "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
  "createdAt": "2019-08-24T14:15:22Z",
  "createdById": "e13e432a-5323-4484-a91d-b5969bc564d9",
  "updatedAt": "2019-08-24T14:15:22Z",
  "updatedById": "d8bc6076-4141-4a88-80b9-0eb31643066f",
  "deletedAt": "2019-08-24T14:15:22Z",
  "deletedById": "8ef578ad-7f1e-4656-b48b-b1b4a9aaa1cb",
  "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
  "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
  "type": "AI",
  "url": "http://example.com",
  "status": "Finished",
  "error": "string",
  "tokenUsage": 0,
  "runtime": 0,
  "data": {}, // MAIN SCRAPED DATA
  "htmlPath": "string",
  "recordingPath": "string",
  "screenshotPath": "string",
  "dataPath": "string"
}

Notes:

  • Choose the agent type carefully; each agent is specialized for a particular use case:
      • general: the default for most standard web scraping tasks. Use it when the user does not specify a type or the connected LLM is not confident about the type of page. It is mostly used for product pages but handles other page types well.
      • listing: for listing pages such as product listings or job listings. Choose it when the connected LLM can confidently identify the given URL as a listing page.
      • map: for crawling a website and collecting its subdomains or subpages. Choose it when the user specifies that the given URL is a website rather than a specific page.
  • The map agent accepts special arguments that control crawling:
      • maxDepth: use lower values (1–2) for focused scraping; a maximum of 3 is recommended.
      • maxPages: limits total pages regardless of depth.
      • limit: caps total records extracted.
      • includePatterns / excludePatterns: regex patterns separated by || that specify which URLs to crawl or skip, e.g. */products/*||*/blog/* or */cart/*||*.pdf. If includePatterns is an empty string, all URLs are included. If excludePatterns is an empty string, no URLs are excluded.
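
To illustrate how the two parameter tables differ by agent type, here is a hedged sketch of a payload builder. The function `create_payload` and its validation rules are assumptions derived from the tables above (for example, that `message` is sent only for general and listing, and that map options are rejected for the other agents); the API itself may be more permissive.

```python
AGENTS = ("general", "listing", "map")
MAP_OPTIONS = {"maxDepth", "maxPages", "limit",
               "includePatterns", "excludePatterns"}
PAGE_OPTIONS = {"proxyCountry"}


def create_payload(url, message=None, agent="general", **options):
    """Build a JSON-serializable body for POST /api/v1/scrapers-ai."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent!r}")
    allowed = MAP_OPTIONS if agent == "map" else PAGE_OPTIONS
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(
            f"options not documented for agent {agent!r}: {sorted(unknown)}")
    payload = {"url": url, "agent": agent}
    if agent != "map":
        if not message:
            raise ValueError(
                "message is required for the general and listing agents")
        payload["message"] = message
    for key, value in options.items():
        # Accept a list of patterns and join it with the documented "||".
        if key in ("includePatterns", "excludePatterns") and not isinstance(value, str):
            value = "||".join(value)
        payload[key] = value
    return payload
```

For example, `create_payload("https://example.com", agent="map", maxDepth=1, includePatterns=["*/products/*", "*/blog/*"])` produces a body whose `includePatterns` value is the single string `*/products/*||*/blog/*`.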

3. Rerun AI Scraper

  • Method: POST
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/scrapers-ai-rerun
  • Auth: x-api-token

Reruns an existing scraper configuration on a new URL.

Payload parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scraperId | string | Yes | | Scraper ID retrieved from the created AI scraper |
| url | string | Yes | | Target URL |

Optional payload parameters for map agent:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| maxDepth | number | No | 2 | Crawl depth |
| maxPages | number | No | 50 | Maximum pages to crawl |
| limit | number | No | 1000 | Result limit |
| includePatterns | string | No | "" | Regex patterns to include (separate multiple with \|\|) |
| excludePatterns | string | No | "" | Regex patterns to exclude (separate multiple with \|\|) |

Request example:

curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-ai-rerun" \
  -H "accept: application/json" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "url": "https://shopee.sg/"
  }'

Response example:

{
  "message": "Successful operation!",
  "data": {
    "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
    "createdAt": "2019-08-24T14:15:22Z",
    "createdById": "e13e432a-5323-4484-a91d-b5969bc564d9",
    "updatedAt": "2019-08-24T14:15:22Z",
    "updatedById": "d8bc6076-4141-4a88-80b9-0eb31643066f",
    "deletedAt": "2019-08-24T14:15:22Z",
    "deletedById": "8ef578ad-7f1e-4656-b48b-b1b4a9aaa1cb",
    "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "type": "Rerun-AI",
    "url": "http://example.com",
    "status": "Finished",
    "error": "string",
    "tokenUsage": 0,
    "runtime": 0,
    "data": {}, // MAIN SCRAPED DATA
    "htmlPath": "string",
    "recordingPath": "string",
    "screenshotPath": "string",
    "dataPath": "string",
    "htmlContent": "string"
  }
}

4. Bulk Rerun AI Scraper

  • Method: POST
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/scrapers-ai-rerun/bulk
  • Auth: x-api-token

Runs one scraper configuration over multiple URLs.

Payload parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scraperId | string | Yes | | Existing AI scraper configuration ID |
| urls | array[string] | Yes | | Target URLs to run |

Request example:

curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-ai-rerun/bulk" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "urls": [
      "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
      "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
      "https://books.toscrape.com/catalogue/soumission_998/index.html"
    ]
  }'

Response example:

{
  "message": "Bulk rerun started successfully",
  "data": {
    "bulkResultId": "f89f8f58-3c9a-42e5-a72e-59fa6c389f09",
    "status": "Running",
    "totalUrls": 3
  }
}
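
Since both fields are required, a bulk request body can be checked before it is sent. The sketch below is one possible pre-flight check; the function name `bulk_rerun_payload`, the http(s)-only rule, and the de-duplication step are illustrative choices, not documented API behavior.

```python
from urllib.parse import urlparse


def bulk_rerun_payload(scraper_id, urls):
    """Build the body for POST /api/v1/scrapers-ai-rerun/bulk."""
    if not scraper_id:
        raise ValueError("scraperId is required")
    cleaned = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        # Drop exact duplicates while preserving order (an assumption:
        # the API is not documented to require unique URLs).
        if url not in cleaned:
            cleaned.append(url)
    if not cleaned:
        raise ValueError("urls must contain at least one URL")
    return {"scraperId": scraper_id, "urls": cleaned}
```

The returned dictionary can be serialized with `json.dumps` and sent with the platform headers.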

5. Rerun Manual Scraper

  • Method: POST
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/scrapers-manual-rerun
  • Auth: x-api-token

Executes a rerun using a manual browser workflow.

Creating a Manual Scraper

Before calling the manual rerun endpoint, you need to create and save a manual scraper from the dashboard. Follow these steps:

  1. Open the MrScraper dashboard and go to Scraper.
  2. Click New Manual Scraper +.
  3. Enter your target URL.
  4. Add workflow steps that match your site's behavior (e.g., Input, Click, Delay, Extract, Inject JavaScript).
  5. Configure pagination if needed (using options like Query Pagination, Directory Pagination, or Next Page Link).
  6. Test and save the scraper, then copy its scraperId to use in API reruns.

Payload parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scraperId | string | Yes | | ID of the manual scraper to rerun |
| url | string | Yes | | Target URL for the rerun |
| workflow | array<object> | No | None | Overrides the saved workflow steps. By default, the workflow saved during manual creation is used |

Request example:

curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-manual-rerun" \
  -H "accept: application/json" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "url": "https://books.toscrape.com/",
    "workflow": [
      {
        "type": "extract",
        "data": {
          "extraction_type": "text",
          "attribute": null,
          "name": "book",
          "selector": "h3 a"
        }
      }
    ],
    "record": false,
    "paginator": {
      "type": "query_pagination",
      "max_page": 1,
      "enabled": false
    }
  }'

Response example:

{
  "message": "Successful operation!",
  "data": {
    "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
    "createdAt": "2019-08-24T14:15:22Z",
    "createdById": "e13e432a-5323-4484-a91d-b5969bc564d9",
    "updatedAt": "2019-08-24T14:15:22Z",
    "updatedById": "d8bc6076-4141-4a88-80b9-0eb31643066f",
    "deletedAt": "2019-08-24T14:15:22Z",
    "deletedById": "8ef578ad-7f1e-4656-b48b-b1b4a9aaa1cb",
    "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "type": "Rerun-AI",
    "url": "http://example.com",
    "status": "Finished",
    "error": "string",
    "tokenUsage": 0,
    "runtime": 0,
    "data": {}, // MAIN SCRAPED DATA
    "htmlPath": "string",
    "recordingPath": "string",
    "screenshotPath": "string",
    "dataPath": "string",
    "htmlContent": "string"
  }
}

6. Bulk Rerun Manual Scraper

  • Method: POST
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/scrapers-manual-rerun/bulk
  • Auth: x-api-token

Runs one scraper configuration over multiple URLs.

Payload parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scraperId | string | Yes | | Existing manual scraper configuration ID |
| urls | array[string] | Yes | | Target URLs to run |

Request example:

curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-manual-rerun/bulk" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
    "urls": [
      "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
      "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
      "https://books.toscrape.com/catalogue/soumission_998/index.html"
    ]
  }'

Response example:

{
  "message": "Bulk rerun started successfully",
  "data": {
    "bulkResultId": "f89f8f58-3c9a-42e5-a72e-59fa6c389f09",
    "status": "Running",
    "totalUrls": 3
  }
}

7. Fetch Results

  • Method: GET
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/results
  • Auth: x-api-token

Returns paginated scrape results.

Query parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sortField | string | Yes | updatedAt | Sort column |
| sortOrder | string | Yes | DESC | Sort direction |
| page | number | Yes | 1 | Page number |
| pageSize | number | Yes | 10 | Items per page |
| search | string | No | None | Search keyword |
| dateRangeColumn | string | No | createdAt | Date field to filter |
| startAt | string | No | None | Date range start (ISO) |
| endAt | string | No | None | Date range end (ISO) |

Notes:

  • sortField options: createdAt, updatedAt, id, type, url, status, error, tokenUsage, runtime
  • sortOrder options: ASC, DESC
  • dateRangeColumn options: createdAt, updatedAt

Request example:

curl -X GET "https://api.app.mrscraper.com/api/v1/results?sortField=updatedAt&sortOrder=DESC&pageSize=10&page=1" \
  -H "accept: application/json" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>"

Response example:

{
  "message": "Successful fetch",
  "data": [
    {
      "createdAt": "2025-11-11T09:50:09.722Z",
      "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
      "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
      "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
      "type": "AI",
      "url": "http://example.com",
      "status": "Finished",
      "error": "string",
      "tokenUsage": 5,
      "runtime": 0,
      "data": "{ \"title\": \"Product A\", \"price\": \"$10\" }",
      "htmlPath": "string",
      "recordingPath": "string",
      "screenshotPath": "string",
      "dataPath": "string"
    }
  ],
  "meta": {
    "page": 1,
    "pageSize": 10,
    "total": 1,
    "totalPage": 1
  }
}
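
The `meta.totalPage` field in the response above suggests a simple way to walk all pages. The following sketch assumes a caller-supplied `fetch_page(page)` function (hypothetical) that performs the HTTP GET and returns the decoded JSON body; it also assumes, based on the response example, that the `data` field of each item may arrive as a JSON-encoded string.

```python
import json
from urllib.parse import urlencode

RESULTS_URL = "https://api.app.mrscraper.com/api/v1/results"


def results_url(page=1, page_size=10, sort_field="updatedAt", sort_order="DESC"):
    """Return the GET URL for one page of results."""
    query = {"sortField": sort_field, "sortOrder": sort_order,
             "pageSize": page_size, "page": page}
    return RESULTS_URL + "?" + urlencode(query)


def iter_results(fetch_page):
    """Yield every result item, requesting pages until meta.totalPage."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["data"]
        if page >= body["meta"]["totalPage"]:
            break
        page += 1


def decoded_data(item):
    """Return the scraped data, decoding it when it is a JSON string."""
    data = item.get("data")
    if isinstance(data, str):
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            return data  # Not JSON; hand back the raw string.
    return data
```

Passing the HTTP call in as `fetch_page` keeps the paging logic independent of any particular HTTP client.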

8. Fetch Detailed Result by ID

  • Method: GET
  • Host: https://api.app.mrscraper.com
  • Path: /api/v1/results/{id}
  • Auth: x-api-token

Returns one detailed result object for a specific result ID.

Path parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | | Result ID |

Request example:

curl -X GET "https://api.app.mrscraper.com/api/v1/results/497f6eca-6276-4993-bfeb-53cbbbba6f08" \
  -H "accept: application/json" \
  -H "x-api-token: <MRSCRAPER_API_TOKEN>"

Response example:

{
  "message": "Successful fetch",
  "data": [
    {
      "createdAt": "2025-11-11T09:50:09.722Z",
      "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
      "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
      "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
      "type": "AI",
      "url": "http://example.com",
      "status": "Finished",
      "error": "string",
      "tokenUsage": 5,
      "runtime": 0,
      "data": "string",
      "htmlPath": "string",
      "recordingPath": "string",
      "screenshotPath": "string",
      "dataPath": "string"
    }
  ]
}

Errors

Standard platform API errors:

| Status | Meaning |
| --- | --- |
| 400 | Invalid request payload |
| 401 | Missing or invalid API token |
| 404 | Scraper or result not found |
| 429 | Rate limit exceeded |
| 500 | Internal scraper error |

Error format:

{
  "message": "string",
  "error": "string",
  "statusCode": "number"
}
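
As an illustration of how the status table and the error format might be combined on the client side, here is a small sketch. The helper `describe_error` is hypothetical; it only assumes the `message` and `error` fields shown above and tolerates bodies that are not valid JSON.

```python
import json

STATUS_MEANINGS = {
    400: "Invalid request payload",
    401: "Missing or invalid API token",
    404: "Scraper or result not found",
    429: "Rate limit exceeded",
    500: "Internal scraper error",
}


def describe_error(status, body_text):
    """Return a one-line description of a failed platform API call."""
    try:
        body = json.loads(body_text)
    except (json.JSONDecodeError, TypeError):
        body = {}
    if not isinstance(body, dict):
        body = {}
    meaning = STATUS_MEANINGS.get(status, "Unexpected status")
    detail = body.get("message") or body.get("error") or ""
    return f"{status} {meaning}" + (f": {detail}" if detail else "")
```

The returned string is built from the response body only, so it does not include the API token.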

Operating Rules

  • Validate required fields before every call.
  • Use pagination for large result sets.
  • Retry on 429 with exponential backoff.
  • Never expose credentials in outputs.
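
The retry rule above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `send` is a hypothetical callable that performs one request and returns `(status, body)`, and the base delay, cap, retry count, and jitter range are arbitrary assumptions because the skill document does not specify them.

```python
import random
import time


def backoff_delays(retries=5, base=1.0, cap=30.0):
    """Return the exponential delay schedule: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(send, retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call send() and retry while it returns HTTP 429.

    Makes at most retries + 1 attempts and returns the last (status, body).
    """
    for delay in backoff_delays(retries, base, cap):
        status, body = send()
        if status != 429:
            return status, body
        # Add up to 50% random jitter so concurrent clients do not retry
        # in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    return send()
```

Only 429 is retried here; other errors such as 400 or 401 are returned immediately because repeating the same request is unlikely to help.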
Security recommendations
This skill appears to do what it says: it calls MrScraper APIs and only needs an API token. Before installing, consider the following: (1) Tokens in URL query parameters are easy to leak — prefer using server-side requests and avoid putting the token in logs or including unblocker URLs in content that may be shared; (2) 'Unblocker' and IP-rotation features can enable evasion of site protections — ensure your use complies with target sites' terms of service and the law; (3) Follow least-privilege: create tokens with limited scope and expiration, store them in a secret manager, and rotate regularly; (4) Verify the vendor (mrscraper.com) and support email before trusting a production token; (5) If you need stronger assurance, request the vendor's API documentation or an allowlist of exact endpoints and consider reviewing network traffic in a controlled environment.
Functional analysis
Type: OpenClaw Skill
Name: mrscraper
Version: 1.0.4
The skill bundle is benign. It defines an API-only web scraping skill for the MrScraper service, explicitly listing allowed network hosts and requiring an API token from environment variables. The `SKILL.md` provides clear instructions and security best practices for token handling, and explicitly states that no local scripts or persistence mechanisms are involved. There are no prompt injection attempts against the AI agent, nor any indicators of intentional malicious behavior such as data exfiltration to unauthorized endpoints, backdoors, or obfuscation. Potential vulnerabilities related to user-supplied input (e.g., arbitrary JavaScript injection via the `workflow` parameter or ReDoS via regex patterns) are issues for the AI agent's input sanitization or the MrScraper platform itself, not evidence of malice within this skill definition.
Capability assessment
Purpose & Capability
Name, description, declared network hosts, and required environment variable (MRSCRAPER_API_TOKEN) align with an API-only scraping service. The skill is instruction-only and does not request unrelated credentials or system access.
Instruction Scope
SKILL.md instructs the agent to perform direct HTTPS requests to MrScraper hosts and to use the API token. It does not direct the agent to read local files or other environment variables. However, the unblocker endpoint requires the token as a URL query parameter — this is an insecure pattern (tokens in URLs can be logged, cached, leaked via referrers) and the doc itself warns about token exposure.
Install Mechanism
No install spec and no bundled code — instruction-only skill. This minimizes on-disk footprint and is proportional to an API-only integration.
Credentials
Only a single API token (MRSCRAPER_API_TOKEN) is required and declared as the primary credential, which is appropriate for a hosted scraping API. There are no additional unrelated secrets requested.
Persistence & Privilege
The skill does not request always:true, does not declare background jobs or hidden persistence, and does not attempt to modify other skill/system settings. Autonomous invocation is allowed (platform default) but not combined with elevated persistence.
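How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install mrscraper
  3. Once installed, invoke the skill by name or trigger it with /mrscraper.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.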
Version history
v1.0.4
- Added Bulk Rerun Manual Scraper
v1.0.3
- Added vendor info: homepage, vendor name, and support email now included in metadata.
- No functional or API changes; documentation updated only.
v1.0.2
- Updated environment variable and network configuration metadata for improved platform compatibility.
- Required environment variable and allowed hosts now use more structured formats.
- Minor internal metadata additions; no changes to user-visible features or API usage.
v1.0.1
- Added required environment variable documentation for `MRSCRAPER_API_TOKEN`, including secret handling guidelines.
- Declared `MRSCRAPER_API_TOKEN` as the primary credential used for both Unblocker and Platform API authentication.
- Introduced an explicit `network` policy listing allowed API hosts.
- No functional or endpoint changes. Documentation and configuration structure enhanced for clarity and secure usage.
v1.0.0
Initial release of the mrscraper skill.
- Enables AI-powered, unblockable web scraping and data extraction with natural language via the MrScraper API.
- Supports scraping blocked pages using stealth browser and IP rotation.
- Allows creating and rerunning AI scraper configurations on single or multiple URLs.
- Provides detailed, paginated results and advanced workflow options.
- Secure API authentication required; no local scripts or installation needed.
- Data only sent to official MrScraper endpoints.
Metadata
Slug mrscraper
Version 1.0.4
License
Total installs 0
Current installs 0
Versions 5
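FAQ

What is MrScraper?

Run AI-powered, unblockable web scraping and data extraction with natural language via the MrScraper API. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 647 total downloads to date.

How do I install MrScraper?

Run the command "/install mrscraper" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is MrScraper free?

Yes. MrScraper is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does MrScraper support?

MrScraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops MrScraper?

It is developed and maintained by MrScraper (@ai-mrscraper). The current version is v1.0.4.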
