DataLens Scraping Skill
How Tool Calls Work
Every DataLens tool is invoked by running a terminal command. No MCP client configuration is required.
The datalens-mcp-call binary handles the MCP stdio handshake and returns the tool result as YAML/JSON to stdout.
run_in_terminal: datalens-mcp-call <tool_name> '<args_json>'
If datalens-mcp-call is not on PATH (e.g. not globally installed), use npx:
run_in_terminal: npx datalens-mcp-call <tool_name> '<args_json>'
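When an agent or script makes several calls, a small wrapper can choose between the global binary and npx automatically. This is a minimal POSIX shell sketch; the function name `dl` is hypothetical and not part of DataLens.

```sh
#!/bin/sh
# Hypothetical wrapper: prefer the globally installed datalens-mcp-call,
# otherwise fall back to running it through npx.
dl() {
  if command -v datalens-mcp-call >/dev/null 2>&1; then
    datalens-mcp-call "$@"
  else
    npx datalens-mcp-call "$@"
  fi
}

# Usage: dl browser_list_tabs
#        dl scrape_status '{"jobId":"<jobId>","waitMs":3000}'
```

Sourcing this once lets later commands use the same syntax however DataLens was installed.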
Prerequisites
- datalens-mcp-server npm package installed: npm install -g datalens-mcp-server (or use npx).
- DataLens Chrome extension installed and active in Chrome.
- Chrome open with the target page loaded (or provide url in the tool args; the extension will open it).
- Node.js ≥ 18 available in the terminal.
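The terminal-side prerequisites can be checked before the first call. This is a sketch under assumptions: the function names are hypothetical, and it only checks Node.js and command availability. Whether the Chrome extension is installed and active still has to be confirmed in Chrome.

```sh
#!/bin/sh
# Succeeds if a Node.js version string such as "v20.11.1" has major version >= 18.
node_version_ok() {
  local v="${1#v}"          # strip the leading "v"
  local major="${v%%.*}"    # keep everything before the first dot
  case $major in
    ''|*[!0-9]*) return 1 ;;   # not a plain number
  esac
  [ "$major" -ge 18 ]
}

# Checks Node.js >= 18 and that datalens-mcp-call (or npx) can be run.
check_prereqs() {
  local status=0
  if ! command -v node >/dev/null 2>&1 || ! node_version_ok "$(node --version)"; then
    echo "Node.js >= 18 is required" >&2
    status=1
  fi
  if ! command -v datalens-mcp-call >/dev/null 2>&1 && ! command -v npx >/dev/null 2>&1; then
    echo "Neither datalens-mcp-call nor npx is on PATH" >&2
    status=1
  fi
  return "$status"
}
```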
How This Works
datalens-mcp-call spawns the DataLens MCP proxy as a child process, performs the MCP initialization handshake over stdio, calls the requested tool, and prints the result.
AI Agent
↓ run_in_terminal
datalens-mcp-call <tool> <args>
↓ stdio JSON-RPC
DataLens MCP Proxy (datalens-mcp-proxy)
↓ WebSocket (localhost:17373)
Chrome Extension
↓
Browser Tab
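To make the stdio JSON-RPC leg concrete: in the standard MCP stdio transport, the client writes newline-delimited JSON-RPC 2.0 messages, starting with an `initialize` request, then a `notifications/initialized` notification (sent after the server has answered `initialize`), then one `tools/call` request per tool call. The sketch below only prints those messages for a given tool. It assumes datalens-mcp-call follows the standard MCP handshake; the `protocolVersion` and `clientInfo` values are illustrative, not taken from DataLens, and agents never need to send these by hand.

```sh
#!/bin/sh
# Print the three JSON-RPC 2.0 messages of a standard MCP stdio session, one per line.
# $2 must already be a JSON object; it defaults to {}.
mcp_messages() {
  local tool="$1" args_json="${2:-}"
  [ -n "$args_json" ] || args_json='{}'
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.1"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  printf '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' "$tool" "$args_json"
}

# Example: mcp_messages scrape_status '{"jobId":"abc","waitMs":3000}'
```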
Standard Scraping Workflow
Follow these steps in order. Steps 2 and 2b are optional; do not skip any other step, and never call scrape_start before scrape_analyze_columns has completed.
Step 1 — Detect tables
datalens-mcp-call scrape_detect_tables '{"url":"https://example.com","prompt":"article list"}'
Returns a list of detected table structures, each with rootSelector, itemSelector, and documentInfoPath. Pick the best-matching table and copy those three values for the subsequent steps.
If the page requires login, ask the user to log in to the site in Chrome first, then re-run this command.
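Selectors returned by Step 1 may contain double quotes (attribute selectors, for example), which break hand-typed JSON arguments. A minimal sketch of building the three-selector object reused in Steps 2, 2b, and 3; it assumes all three values are strings and escapes only backslashes and double quotes, so it is not a general JSON encoder. The function names are hypothetical.

```sh
#!/bin/sh
# Quote a value as a JSON string: escape backslashes first, then double quotes.
# Control characters (newlines, tabs) are NOT handled.
json_str() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Build {"rootSelector":...,"itemSelector":...,"documentInfoPath":...}
selector_args() {
  printf '{"rootSelector":%s,"itemSelector":%s,"documentInfoPath":%s}' \
    "$(json_str "$1")" "$(json_str "$2")" "$(json_str "$3")"
}

# Usage:
#   datalens-mcp-call scrape_get_table_tree "$(selector_args "$ROOT" "$ITEM" "$DOC")"
```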
Step 2 (optional) — Inspect tree for expand buttons
datalens-mcp-call scrape_get_table_tree '{"rootSelector":"<from step 1>","itemSelector":"<from step 1>","documentInfoPath":"<from step 1>"}'
Use when the data has nested replies, collapsed rows, or "load more" buttons. Inspect the _uid-annotated tree in the output to identify expand button UIDs.
Step 2b (optional) — Expand and re-detect
datalens-mcp-call scrape_click_expand_and_redetect '{"rootSelector":"...","itemSelector":"...","documentInfoPath":"...","expandButtonUids":[{"type":"reply","uids":["uid1","uid2"]}]}'
The extension clicks the buttons, waits for new content, then re-detects. Use the updated rootSelector/itemSelector/documentInfoPath from this output in Step 3.
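The expandButtonUids argument is an array of {type, uids} objects. A hypothetical helper that builds a one-element array from a button type and the UIDs read from the Step 2 tree; it assumes the type and UIDs contain no quotes or backslashes, and several button types would need several such objects joined into one array.

```sh
#!/bin/sh
# expand_uids reply uid1 uid2  ->  [{"type":"reply","uids":["uid1","uid2"]}]
expand_uids() {
  local btype="$1" uids="" uid
  shift
  for uid in "$@"; do
    # Add a comma separator only after the first UID.
    uids="${uids}${uids:+,}\"$uid\""
  done
  printf '[{"type":"%s","uids":[%s]}]' "$btype" "$uids"
}
```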
Step 3 — Analyze columns
datalens-mcp-call scrape_analyze_columns '{"rootSelector":"...","itemSelector":"...","documentInfoPath":"...","url":"https://example.com","prompt":"article list"}'
Calls the backend AI to identify fields, data types, and pagination. Returns a scraperConfig and jobDraft. Confirm the field list looks correct before proceeding.
Step 4 — Start scraping
# Pass the jobDraft object returned by scrape_analyze_columns
datalens-mcp-call scrape_start '{"jobDraft":<paste jobDraft here>,"maxRecords":10}'
Returns a jobId. Use maxRecords: 10 for a preview run first.
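Pasting a large jobDraft inline is error-prone. One option, sketched here under assumptions, is to save the jobDraft object from the scrape_analyze_columns response to a file (jobdraft.json is a hypothetical name) and build the scrape_start arguments from it, defaulting to the 10-row preview. The jobId extraction assumes the response contains a `jobId` key whose value uses only letters, digits, `_`, or `-`, in JSON or YAML form; adjust the pattern if the real output differs.

```sh
#!/bin/sh
dl() { datalens-mcp-call "$@"; }   # or: npx datalens-mcp-call "$@"

# Usage: start_scrape jobdraft.json [maxRecords]   (maxRecords defaults to 10)
start_scrape() {
  local draft_file="$1" max="${2:-10}" draft out
  # Never start without a jobDraft produced by scrape_analyze_columns.
  if [ ! -s "$draft_file" ]; then
    echo "missing or empty jobDraft file: $draft_file" >&2
    return 1
  fi
  draft=$(cat "$draft_file")
  out=$(dl scrape_start "{\"jobDraft\":$draft,\"maxRecords\":$max}") || return 1
  # Print the first jobId value found in JSON ("jobId":"x") or YAML (jobId: x) output.
  printf '%s\n' "$out" |
    sed -nE 's/.*"?jobId"?[[:space:]]*:[[:space:]]*"?([[:alnum:]_-]+)"?.*/\1/p' |
    head -n 1
}
```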
Step 5 — Poll for status
datalens-mcp-call scrape_status '{"jobId":"<jobId>","waitMs":3000}'
Re-run until status is COMPLETED, FAILED, or STOPPED.
Key status fields:
- status: QUEUED → PREPARING → RUNNING → COMPLETED / FAILED / STOPPED
- scrapedCount: rows collected so far
- error: present only on failure
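Step 5 can be scripted as a polling loop. This sketch assumes the status appears as `"status":"VALUE"` (JSON) or `status: VALUE` (YAML) in the scrape_status output and that `waitMs` makes each call wait on the server side; the one-second sleep is only a safety gap. `dl` is a hypothetical wrapper and the poll limit is an arbitrary guard.

```sh
#!/bin/sh
dl() { datalens-mcp-call "$@"; }   # or: npx datalens-mcp-call "$@"

# Print the first status value found on stdin (JSON or YAML form).
job_status() {
  sed -nE 's/.*"?status"?[[:space:]]*:[[:space:]]*"?([[:upper:]]+)"?.*/\1/p' | head -n 1
}

# Poll until a terminal status; print it and return 0 only for COMPLETED.
wait_for_job() {
  local job_id="$1" max_polls="${2:-200}" i=0 status
  while [ "$i" -lt "$max_polls" ]; do
    status=$(dl scrape_status "{\"jobId\":\"$job_id\",\"waitMs\":3000}" | job_status)
    case $status in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED|STOPPED) echo "$status"; return 1 ;;
    esac
    i=$((i + 1))
    sleep 1   # waitMs is assumed to wait server-side; this is only a safety gap
  done
  echo "gave up after $max_polls polls" >&2
  return 2
}
```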
Step 6 — Retrieve results
Save to file (recommended for large results):
datalens-mcp-call scrape_export_to_file '{"jobId":"<jobId>","outputDir":"/tmp/datalens","format":"json"}'
Returns the saved file path.
Inline preview (small result sets):
datalens-mcp-call scrape_result '{"jobId":"<jobId>","limit":50}'
Use the cursor field from each response to fetch the next page.
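Paging through scrape_result can be automated with a loop. The request argument name `cursor` and the response layout (a string cursor that is empty or `null` on the last page) are assumptions to verify against real output; `dl`, `fetch_all_results`, and `max_pages` are hypothetical.

```sh
#!/bin/sh
dl() { datalens-mcp-call "$@"; }   # or: npx datalens-mcp-call "$@"

# Print every page of scrape_result for a job, one response per page.
fetch_all_results() {
  local job_id="$1" limit="${2:-50}" max_pages="${3:-1000}" page=0 cursor="" args out
  while [ "$page" -lt "$max_pages" ]; do
    args="{\"jobId\":\"$job_id\",\"limit\":$limit"
    if [ -n "$cursor" ]; then args="$args,\"cursor\":\"$cursor\""; fi
    args="$args}"
    out=$(dl scrape_result "$args") || return 1
    printf '%s\n' "$out"
    # Assumed field: "cursor":"<token>" (JSON) or cursor: <token> (YAML).
    cursor=$(printf '%s\n' "$out" |
      sed -nE 's/.*"?cursor"?[[:space:]]*:[[:space:]]*"?([^",}[:space:]]+)"?.*/\1/p' |
      head -n 1)
    if [ -z "$cursor" ] || [ "$cursor" = null ]; then return 0; fi
    page=$((page + 1))
  done
  echo "stopped after $max_pages pages" >&2
  return 2
}
```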
In-memory export:
datalens-mcp-call scrape_export '{"jobId":"<jobId>","format":"csv"}'
Returns base64-encoded file content.
Job Control
datalens-mcp-call scrape_pause '{"jobId":"<jobId>"}'
datalens-mcp-call scrape_resume '{"jobId":"<jobId>"}'
datalens-mcp-call scrape_stop '{"jobId":"<jobId>"}'
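A hypothetical helper that wraps the three job-control tools and rejects unknown actions, plus an optional trap that stops the job if a script is interrupted. JOB_ID is an assumed variable holding the jobId returned by scrape_start.

```sh
#!/bin/sh
dl() { datalens-mcp-call "$@"; }   # or: npx datalens-mcp-call "$@"

# Usage: job_ctl pause|resume|stop <jobId>
job_ctl() {
  case $1 in
    pause|resume|stop) dl "scrape_$1" "{\"jobId\":\"$2\"}" ;;
    *) echo "usage: job_ctl pause|resume|stop <jobId>" >&2; return 2 ;;
  esac
}

# Optional: stop the running job if the script receives Ctrl-C.
# trap 'job_ctl stop "$JOB_ID"; exit 130' INT
```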
Browser Tab Management
datalens-mcp-call browser_list_tabs
datalens-mcp-call browser_open_tab '{"url":"https://example.com"}'
datalens-mcp-call browser_use_tab '{"tabId":123}'
datalens-mcp-call browser_close_tab '{"tabId":123}'
Tab management is usually not needed — scrape_detect_tables with a url arg handles tab opening automatically.
Agent Decision Rules
- Never call scrape_start without a jobDraft or scraperConfig from a prior scrape_analyze_columns response. Fabricating a scraperConfig will produce wrong results.
- Never skip scrape_analyze_columns and jump straight to scrape_start. The analyze step is required to build the config.
- If scrape_detect_tables returns an empty list, the page may need login or may be dynamically loaded. Ask the user to open the target URL in Chrome and scroll to load content, then retry.
- If scrape_status stays at QUEUED for more than 30 seconds, check that the Chrome extension is active and that a tab for the target URL is open.
- Use maxRecords: 10 for a preview scrape to confirm the config is correct before running a full job.
- The default export format is JSON. Use CSV or XLSX when the user asks for spreadsheet output.
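The QUEUED rule can be checked mechanically inside a polling loop. A sketch of a pure check: it takes the current status, the epoch second when QUEUED was first seen, the current epoch second, and an optional limit (default 30 s, matching the rule above). The function name and the calling pattern in the comments are hypothetical.

```sh
#!/bin/sh
# Returns 1 and prints a hint when the job has been QUEUED longer than the limit.
queued_watchdog() {
  local status="$1" since="$2" now="$3" limit="${4:-30}"
  if [ "$status" = QUEUED ] && [ $((now - since)) -gt "$limit" ]; then
    echo "Job QUEUED for more than ${limit}s: check that the DataLens Chrome extension is active and a tab for the target URL is open." >&2
    return 1
  fi
  return 0
}

# Possible use inside a polling loop:
#   if [ "$status" = QUEUED ] && [ -z "$queued_since" ]; then queued_since=$(date +%s); fi
#   queued_watchdog "$status" "${queued_since:-0}" "$(date +%s)" || break
```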
End-to-End Example: Scrape Toutiao Headlines
# 1. Detect tables on the homepage
datalens-mcp-call scrape_detect_tables '{"url":"https://www.toutiao.com/?is_new_connect=0&is_new_user=0","prompt":"article list"}'
# 2. Analyze columns (fill in selectors from step 1 output)
datalens-mcp-call scrape_analyze_columns '{"rootSelector":"<from step 1>","itemSelector":"<from step 1>","documentInfoPath":"<from step 1>","url":"https://www.toutiao.com/?is_new_connect=0&is_new_user=0","prompt":"article list"}'
# 3. Preview run: first 10 rows (paste the full jobDraft JSON object from step 2)
datalens-mcp-call scrape_start '{"jobDraft":<paste jobDraft>,"maxRecords":10}'
# 4. Poll until status is COMPLETED
datalens-mcp-call scrape_status '{"jobId":"<jobId>","waitMs":3000}'
# 5. Save results to file
datalens-mcp-call scrape_export_to_file '{"jobId":"<jobId>","outputDir":"/tmp/datalens","format":"json"}'
If a tool call takes longer than the default 120 s, set DATALENS_TIMEOUT (in milliseconds) before running it, e.g. 180000 for 180 s:
DATALENS_TIMEOUT=180000 datalens-mcp-call scrape_analyze_columns '...'
Debug Tools
These are for troubleshooting only. Do not use in normal scraping workflows.
datalens-mcp-call debug_get_logs '{"levels":["error"]}'
datalens-mcp-call debug_clear_logs '{}'
datalens-mcp-call debug_export_logs_to_file '{"outputDir":"/tmp/datalens"}'
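Installing and Using the Skill
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install datalens-web-scraper
- After installation, invoke the Skill by name or trigger it with /datalens-web-scraper.
- Provide the inputs described in the Skill's parameter documentation to get structured output.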
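What is web-scraper?
Use DataLens MCP tools to scrape structured data from any website open in Chrome. Triggers when the user wants to extract lists, tables, comments, products,... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 68 times to date.
How do I install web-scraper?
Run the command /install datalens-web-scraper in the OpenClaw or Claude Code chat box to install it in one step; no additional Skill configuration is needed (the prerequisites above still apply).
Is web-scraper free?
Yes. web-scraper is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does web-scraper support?
web-scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops web-scraper?
It is developed and maintained by weird94 (@weird94). The current version is v1.0.0.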