icestorms

HTML-to-Selenium: Web Page Element Recognition and Selenium Automation

Author: 张 庆 (Zhang Qing) · GitHub ↗ · v2.0.7 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 156
Favorites: 0
Current installs: 0
Versions: 9
Install in OpenClaw
/install html-element-to-selenium-automation
Description
EN: Convert any webpage into a runnable Python Selenium automation script. Triggers: "analyze page", "generate selenium", "web automation", "help me do xxx"....
Usage instructions (SKILL.md)

html-to-selenium

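Overview

Takes a URL → captures a screenshot and the HTML DOM with Playwright → has the AI analyze the page type and structure → outputs:

  1. A description of the page structure and its operation logic
  2. Key information extracted from the page (form action, API endpoints, session state)
  3. Complete, runnable Python Selenium code
  4. Worked operation steps (for each valid operation: locator strategy, Selenium method, and result)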

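Workflow

User provides a URL
      │
      ▼
  Step 1: Fetch the page
  ┌────────────────────────────────────────────────────┐
  │  fetch_page.py                                     │
  │  - Open the URL with Playwright                    │
  │  - Wait for the page to finish rendering           │
  │  - Full-page screenshot + complete HTML DOM        │
  │  - Output: screenshot.png, html.html               │
  └──────────────┬─────────────────────────────────────┘
                 │
                 ▼
  Step 2: AI classifies the page type
  ┌────────────────────────────────────────────────────┐
  │  Input: screenshot + HTML + URL + title            │
  │                                                    │
  │  A. Target page is a login page                    │
  │     → Analyze the login form directly              │
  │     (this is the analysis task itself;             │
  │      no login is triggered)                        │
  │                                                    │
  │  B. Redirected to a login page midway              │
  │     → AI outputs: login logic + credentials needed │
  │     → Wait for the user to supply credentials      │
  │     → Log in → return to the target page           │
  │     → Continue the analysis                        │
  │                                                    │
  │  C. Public page (no login required)                │
  │     → Go straight to the analysis flow             │
  └──────────────┬─────────────────────────────────────┘
                 │
                 ▼
  Step 3: Generate the output
  ┌────────────────────────────────────────────────────┐
  │  Page structure description                        │
  │  Operation logic (click chains, form submits, AJAX)│
  │  Useful info (APIs, URL parameters, session)       │
  │  Python Selenium code                              │
  │  Worked operation steps (one record per action)    │
  └────────────────────────────────────────────────────┘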

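Step 1: Fetch the page

python scripts/fetch_page.py <url> --output <directory> --wait <seconds>
  • --output: directory to save into; defaults to the current directory. A meaningful name such as temp/example_com is recommended
  • --wait: seconds to wait after the page loads; defaults to 3. Increase it (5-8) for dynamic pages
  • --login / -l: attempt an automatic login when a login page is detected
  • --username / -u + --password / -p: login credentials (if needed)

Output files:

  • screenshot.png: full-page screenshot
  • screenshot_viewport.png: viewport screenshot
  • html.html: the complete rendered HTML DOM
  • meta.json: page metadata (URL, title, element counts, page_type)

Important: capture the HTML only after the page has fully rendered (wait_until="networkidle" plus an extra wait); otherwise you may get an empty, half-loaded DOM.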
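The rendering rule above can be sketched with Playwright's synchronous API. This is a minimal illustration, not the contents of fetch_page.py: the function names `output_paths` and `fetch_page` are hypothetical, and only two of the four output files are produced.

```python
from pathlib import Path


def output_paths(out_dir: str) -> dict:
    """Map two of the documented output files to paths inside out_dir."""
    base = Path(out_dir)
    return {
        "screenshot": base / "screenshot.png",
        "html": base / "html.html",
    }


def fetch_page(url: str, out_dir: str = ".", wait_seconds: float = 3.0) -> dict:
    """Render url, then save a full-page screenshot and the rendered DOM."""
    # Imported lazily so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    paths = output_paths(out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" returns once the network has gone quiet.
            page.goto(url, wait_until="networkidle")
            # Extra settle time for late client-side rendering (milliseconds).
            page.wait_for_timeout(int(wait_seconds * 1000))
            page.screenshot(path=str(paths["screenshot"]), full_page=True)
            paths["html"].write_text(page.content(), encoding="utf-8")
            return {"url": page.url, "title": page.title()}
        finally:
            browser.close()
```

Calling `fetch_page("https://example.com/page", "temp/example")` would write both files under temp/example. Comparing the returned `url` with the requested one is one way to notice a redirect.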

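Step 2: AI page-type classification

The AI receives the following inputs and weighs them together:

Input             Purpose
screenshot.png    Visual check: pure login page / login dialog overlay / normal content page
html.html         DOM check: number of forms, input types, class/id patterns
meta.json         URL change: whether the URL is the same before and after the visit
title             Supporting check: keywords in the page title

Page-type criteria

A. The target page is a login page (page_type = "login_required", and the URL itself contains login/signin/auth)

  • Handling: treat the login form itself as the object of analysis; do not trigger an automatic login
  • Focus: username/password field characteristics, remember-me, multi-factor authentication, form action

B. Redirected to a login page midway (page_type = "login_required", but the URL is not a login page)

  • Handling: the AI outputs a "login required" notice plus a description of the credentials needed, then waits for the user or the upstream Agent to supply them
  • Once credentials are provided: read selenium-patterns.md, perform the login with real clicks → wait to land back on the target page → continue

C. Public page (page_type = "public")

  • Handling: go straight to the analysis flow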
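The three cases can be expressed as a small decision function. The sketch below rests on two assumptions: the URL being tested is the one the user originally requested, and the keyword test is a plain substring match on host and path. The name `classify_case` is hypothetical and not part of the skill.

```python
from urllib.parse import urlparse

# Keywords the criteria list for a URL that is itself a login page.
LOGIN_KEYWORDS = ("login", "signin", "auth")


def classify_case(page_type: str, requested_url: str) -> str:
    """Return "A", "B", or "C" for the three page-type cases."""
    if page_type == "public":
        return "C"
    if page_type == "login_required":
        parts = urlparse(requested_url)
        haystack = (parts.netloc + parts.path).lower()
        # Case A: the user asked for the login page itself.
        if any(keyword in haystack for keyword in LOGIN_KEYWORDS):
            return "A"
        # Case B: a non-login URL ended up behind a login wall.
        return "B"
    raise ValueError(f"unknown page_type: {page_type!r}")
```

A substring match is deliberately crude: a path such as /authors would also count as a login URL, so a real classifier would likely combine this with the screenshot and DOM signals listed in the table.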

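Login flow (case B)

When a login is required, follow the conventions in selenium-patterns.md:

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is an open WebDriver session and `credentials` is a dict
# with "username" and "password" supplied by the user.

# Fill in the username
username = driver.find_element(By.NAME, "username")  # or another locator
username.click()
username.send_keys(Keys.CONTROL + "a")  # select all (use Keys.COMMAND on macOS)
username.send_keys(Keys.DELETE)
username.send_keys(credentials["username"])

# Fill in the password
password = driver.find_element(By.NAME, "password")
password.click()
password.send_keys(Keys.CONTROL + "a")
password.send_keys(Keys.DELETE)
password.send_keys(credentials["password"])

# Click the login button (a real user click; JS click is forbidden)
submit = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//button[@type='submit']"))
)
ActionChains(driver).move_to_element(submit).click().perform()

# Wait until the browser has left the login page
WebDriverWait(driver, 15).until(
    lambda d: "/login" not in d.current_url
)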

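Step 3: Output

3.1 Page structure description

Describe the page layout in natural language:

  • Semantic regions (top navigation / sidebar / main content area / footer)
  • Where the key elements are and what they do
  • Form structure (which fields exist, which are required)

3.2 Operation logic

Describe the operation flow:

  • Click chains: which button leads to which page or dialog
  • Form submission: which button submits, and how input is validated
  • AJAX / dynamic behavior: which actions trigger asynchronous requests

3.3 Key information extraction

# Example
INFO = {
    "form action": "https://example.com/api/submit",
    "API endpoint": "https://example.com/api/users/search",
    # Session state: extracted only when the page DOM exposes it explicitly; fetch_page.py does not extract or transmit it on its own
    "Session Token": "xxx (extracted when visible in the DOM; not automatic)",
    "key elements": {
        "username": "#username (id)",
        "password": "#password (id)",
        "submit": "button[type=submit] (xpath text='Login')",
    },
    "URL parameters": "?redirect=/dashboard",
}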
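Part of this extraction can be done mechanically on the saved html.html before the AI looks at it. The sketch below uses only the standard library's `html.parser` to collect form actions and input locators. The names `FormInfoParser` and `extract_form_info` are hypothetical, and the sketch deliberately leaves session tokens alone.

```python
from html.parser import HTMLParser


class FormInfoParser(HTMLParser):
    """Collect form actions and input locators from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.form_actions = []
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            # A form without an action submits to the current URL.
            self.form_actions.append(attr.get("action") or "")
        elif tag == "input":
            self.inputs.append({
                "type": attr.get("type", "text"),
                "id": attr.get("id"),
                "name": attr.get("name"),
            })


def extract_form_info(html: str) -> dict:
    """Return the form actions and input descriptors found in html."""
    parser = FormInfoParser()
    parser.feed(html)
    parser.close()
    return {"form actions": parser.form_actions, "inputs": parser.inputs}
```

The `id` and `name` values map directly onto `By.ID` and `By.NAME` locators in the generated Selenium code.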

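3.4 Python Selenium code

The code must follow the conventions in references/selenium-patterns.md. Key points:

  • Real user clicks: ActionChains.move_to_element().click().perform(); execute_script("click") is forbidden
  • Explicit waits: WebDriverWait + expected_conditions; time.sleep() must not be the primary waiting mechanism
  • Safe input: click() → Ctrl+A → Delete → send_keys()
  • Screenshot comparison: save a screenshot before and after each key step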
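The screenshot rule is the one convention in this list that the login snippet does not demonstrate. A minimal sketch, assuming a Selenium WebDriver (whose `save_screenshot` method writes a PNG to a path); the helper name `run_with_screenshots` and the file-naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Callable, Tuple


def run_with_screenshots(driver, step_name: str, action: Callable[[], None],
                         out_dir: str = "screenshots") -> Tuple[Path, Path]:
    """Save a screenshot before and after running action()."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    before = folder / f"{step_name}_before.png"
    after = folder / f"{step_name}_after.png"
    driver.save_screenshot(str(before))
    # The action should include its own explicit wait, so that the
    # second screenshot shows the settled page state.
    action()
    driver.save_screenshot(str(after))
    return before, after
```

A call might wrap a real click, for example `run_with_screenshots(driver, "login", lambda: ActionChains(driver).move_to_element(submit).click().perform())`.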

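3.5 Worked operation steps (required output)

Every valid operation must be recorded as a step in the format below; split complex operations into sub-steps 1, 2, 3...:

# ✅ Operation 1: [operation name]
# Locator: [locator strategy] → [concrete selector]
# Method: [Selenium call chain]
# Result: [description of the page state afterwards]

# Example:
# ✅ Operation 1: Click "User Management" in the left navigation
# Locator: CSS selector → ".sidebar .nav-item[data-tab='users']"
# Method: ActionChains.move_to_element().click().perform()
# Result: The user management view loads; the URL becomes /admin/users

# ✅ Operation 2: Type a keyword into the search box
# Locator: XPath → "//input[@placeholder='Search...']"
# 2.1: Click to focus → click()
# 2.2: Clear the content → send_keys(Keys.CONTROL+"a") → send_keys(Keys.DELETE)
# 2.3: Type the text → send_keys("admin")
# Result: The input shows "admin" and the front-end search filter fires

# ✅ Operation 3: Click the "Edit" button in the search results
# Locator: XPath → "//button[contains(text(),'Edit')]"
# Method: ActionChains.move_to_element().click().perform()
# Result: The edit dialog opens; screenshot saved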
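Quick reference

Triggers

Load this skill when the user says any of these key phrases:

  • "analyze page"
  • "generate selenium"
  • "web automation"
  • "help me operate the xxx page"
  • "help me do xxx"
  • "automate xxx"
  • Provides a URL and asks for automation code

File paths (inside the skill)

File                               Purpose                                  When it is read
scripts/fetch_page.py              Page fetch (screenshot + HTML)           Step 1
references/selenium-patterns.md    Selenium conventions and code snippets   When generating code
references/examples.md             Operation step templates                 When generating examples
SKILL.md                           Main workflow description                On trigger

Quick command templates

# Fetch a public page
python scripts/fetch_page.py https://example.com/page --output temp/example

# Fetch and log in automatically (if a login wall is hit)
python scripts/fetch_page.py https://example.com/protected \
    --output temp/example \
    --login \
    --username "<username>" \
    --password "password123"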

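Important limitations

  • Credentials are never hardcoded. For a case B login, they must be supplied in the conversation by the user or the upstream Agent
  • Complex CAPTCHAs (slider / click-to-select / text) are not handled. If one appears, the skill outputs a notice and waits for manual intervention
  • MFA (SMS / authenticator app codes) is not handled. Again, the skill outputs a notice and waits for intervention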
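One way to keep credentials out of generated code is to resolve them at run time. The sketch below uses the environment variable names quoted in this page's capability assessment (ROUTER_USERNAME / ROUTER_PASSWORD, with aliases ROUTER_USER / ROUTER_PASS). The precedence shown (explicit arguments first, then environment) is an assumption, not the documented behavior of fetch_page.py, and `resolve_credentials` is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple

USERNAME_VARS = ("ROUTER_USERNAME", "ROUTER_USER")
PASSWORD_VARS = ("ROUTER_PASSWORD", "ROUTER_PASS")


def _first_set(env: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_credentials(cli_username: Optional[str] = None,
                        cli_password: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None
                        ) -> Optional[Tuple[str, str]]:
    """Return (username, password), or None when either part is missing."""
    env = os.environ if env is None else env
    # Assumed precedence: explicit arguments win over environment variables.
    username = cli_username or _first_set(env, USERNAME_VARS)
    password = cli_password or _first_set(env, PASSWORD_VARS)
    if username and password:
        return username, password
    return None
```

A `None` result is the cue to output the "login required" notice and wait, not to guess or fall back to stored values.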
Security recommendations
This skill appears to do what it says (capture a page and generate Selenium scripts), but there are a few things to check before installing or running it:

  • Metadata mismatch: the registry claims no env vars required, but the skill and its scripts read ROUTER_USERNAME / ROUTER_PASSWORD (and aliases). Treat that as a warning and verify which env vars the deployed skill will actually read.
  • Credentials: do not provide high-privilege or production credentials. If you must test auto-login, use a throwaway/test account. Prefer passing credentials via CLI on a trusted machine rather than putting them in long-lived environment variables.
  • Sensitive page content: the skill fetches full rendered HTML and screenshots; those can contain tokens, CSRF values, or other secrets. Review outputs (html.html, meta.json, screenshots) in a safe environment and avoid processing sensitive internal pages unless you trust the environment.
  • Installation: the skill expects Playwright + Selenium; installing Playwright will download browser binaries. Run installation in an isolated virtualenv/container.
  • Run-time behavior: disable automatic login (do not pass --login) unless you explicitly want it and understand where credentials come from. Inspect generated Selenium scripts before executing them.
  • Clarify with the skill author or vendor: ask why registry metadata omits the env vars and confirm how credentials are handled/retained/logged.

If any of these concerns are unacceptable (unexpected env reads, automatic credential usage, or running against sensitive sites), do not enable the skill, or run it only in an isolated/test environment.
Functional analysis
Type: OpenClaw Skill
Name: html-element-to-selenium-automation
Version: 2.0.7

The skill bundle provides powerful web automation and analysis capabilities, including automated login and the extraction of session tokens from the DOM. It handles sensitive credentials through environment variables (e.g., ROUTER_USERNAME, ROUTER_PASSWORD) and command-line arguments in `scripts/fetch_page.py`. While these features are aligned with the stated purpose of generating Selenium scripts and analyzing page structures, the automated handling of credentials and instructions in `SKILL.md` to extract session data represent significant security risks if used on untrusted or sensitive pages. No clear evidence of intentional data exfiltration to third-party domains was found, but the high-privilege nature of the automation warrants caution.
Capability assessment
Purpose & Capability
The skill's stated purpose (fetch a URL with Playwright, analyze DOM, and generate Selenium code) is coherent with the included code and docs. However the registry metadata at the top claims 'Required env vars: none' / 'Primary credential: none' while SKILL.md, MANIFEST.md and fetch_page.py explicitly reference environment variables (ROUTER_USERNAME / ROUTER_PASSWORD / ROUTER_USER / ROUTER_PASS) and support auto-login. That metadata mismatch is an incoherence that can mislead users about what secrets the skill will access.
Instruction Scope
Runtime instructions direct Playwright to capture full page HTML and screenshots and instruct the AI to analyze DOM and extract 'key info' including 'Session Token' if visible. The skill also supports automated login flows (using credentials from CLI, env, or chat) and will attempt to click/submit forms. Those behaviors are consistent with the purpose but broaden the data the agent will touch (rendered HTML, possibly cookies/embedded tokens). The SKILL.md claims 'fetch_page.py does not actively extract or transmit cookies/session' but the AI analysis is allowed to extract data present in the DOM — this is a sensitive-data handling area and the instructions give the agent discretion to extract credentials-like values from page content.
Install Mechanism
No install spec in the registry (skill is instruction-only) and all code is bundled locally (no remote downloads). The documentation instructs users to 'pip install playwright selenium' and run 'playwright install chromium' — installing Playwright will download browser binaries. There are no obfuscated or remote download URLs in the bundle, but the skill will require installing packages that themselves download runtime artifacts (Playwright browsers).
Credentials
The code and SKILL.md accept credentials via CLI, chat, or environment variables (ROUTER_USERNAME/ROUTER_PASSWORD and aliases). These env vars are reasonable for an auto-login feature, but the registry claims none are required — a mismatch. The skill's credential-handling behavior is under-specified: it may read env vars automatically (fetch_page.py does so) and will accept credentials provided in chat, which increases risk of accidental exposure of real credentials. The number and naming of env vars is modest and relevant to the stated function, but the metadata inconsistency is a red flag.
Persistence & Privilege
The skill does not request 'always: true' and does not modify other skills or system-wide settings. It runs locally (Playwright/Selenium) and reads environment variables and writes files (screenshots, html.html, meta.json) into the chosen output directory — these are appropriate for its purpose and properly scoped.
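How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install html-element-to-selenium-automation
  3. Once installed, invoke the Skill by name or trigger it with /html-element-to-selenium-automation
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history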
v2.0.7
Version 2.0.7
  • Added initial documentation, reference, and script files for the skill.
  • Introduced `fetch_page.py` for Playwright-based page fetching (screenshot + HTML).
  • Added reference guides: skill manifest, README, selenium patterns, and operation examples.
  • Improved modularization by separating examples and conventions into dedicated files.
  • No breaking changes to workflow or API; documentation focus for this update.
v2.0.6
Changelog for html-element-to-selenium-automation v2.0.6
  • Added MANIFEST.md and _meta.json for enhanced metadata and manifest support.
  • Updated SKILL.md: clarified how session state is collected, making explicit that fetch_page.py does not extract the session on its own; the main workflow extracts it only when it is visible in the DOM.
  • Minor improvements in workflow and restriction documentation.
  • references/selenium-patterns.md and scripts/fetch_page.py updated for consistency with the new session-information handling logic.
v2.0.5
Summary: Refined documentation, clarified workflow and usage, simplified triggers, and improved maintainability.
  • Updated SKILL.md to provide a more concise overview, focusing on core workflow and outputs.
  • Clarified page type logic and step-by-step process for page analysis and Selenium code generation.
  • Deprecated/removed legacy references and streamlined explanations of triggers and credential handling.
  • Improved formatting and included quick reference guides and command templates for easier use.
  • Removed legacy MANIFEST.md for a slimmer skill package.
v2.0.4
Changelog for v2.0.4:
  • SKILL.md has been significantly shortened and streamlined for clarity.
  • Overview and instructions are now more concise in both Chinese and English.
  • Step-by-step technical workflow split into brief sections, removing verbose detail.
  • Key usage triggers, credential handling, and security notes are preserved in a condensed format.
  • Maintains essential commands, limitations, and sample code while omitting intricate elaborations.
v2.0.3
html-element-to-selenium-automation v2.0.3 Changelog
  • Updated and expanded skill documentation for clarity in both English and Chinese.
  • Added quick installation instructions and enhanced environment variable support in SKILL.md.
  • Improved description of output formats and usage scenarios.
  • Clarified credential handling, page analysis workflow, and output requirements.
  • No code or logic changes; documentation only.
v2.0.2
Version 2.0.2 Changelog
  • Added `MANIFEST.md` file to declare dependencies and include security notes.
  • Updated SKILL.md:
      • Added clear multi-language (EN/中文) description and trigger phrase summary.
      • Documented credential handling: environment variables (`ROUTER_USERNAME`/`ROUTER_PASSWORD`) are now preferred over dialog input.
      • Clarified security recommendations and credential sourcing priority.
      • Improved file structure explanation to reflect the new manifest file.
      • Enhanced distinction of page type handling and login flow for clarity.
      • Refined quick commands and usage notes for better onboarding.
  • No logic or code changes to core functionality.
v2.0.1
html-element-to-selenium-automation v1.0.5
  • Major update: Overhauled documentation to focus on a streamlined workflow for automatic HTML-to-Selenium script generation, including detailed step breakdowns and actionable examples.
  • Added quick reference files: `references/examples.md` (operation step templates) and `references/selenium-patterns.md` (Selenium best practices/code patterns).
  • Removed outdated `analysis_report.md` file to reduce redundancy.
  • Clarified login flow handling and emphasized safe code patterns (real user clicks, explicit waits, credential input, screenshot before/after).
  • Clearly described output requirements: structure description, operation logic, key info extraction, executable Selenium code, and step-by-step operation records.
v1.0.4
Version 1.0.
v1.0.3
  • Updated skill name and description to "page-analyzer" with clearer use cases and detailed security warnings.
  • Added bilingual (English/Chinese) documentation for broader accessibility.
  • Clarified workflow steps for webpage analysis and Selenium script generation.
  • Outlined recommended output format, including analysis overview, element table, full code, and notes.
  • Emphasized security, authorization, and dependency requirements.
Metadata
Slug: html-element-to-selenium-automation
Version: 2.0.7
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 9
FAQ

What is HTML-to-Selenium?

EN: Convert any webpage into a runnable Python Selenium automation script. Triggers: "analyze page", "generate selenium", "web automation", "help me do xxx".... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 156 total downloads to date.

How do I install HTML-to-Selenium?

Run the command "/install html-element-to-selenium-automation" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is HTML-to-Selenium free?

Yes. HTML-to-Selenium is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does HTML-to-Selenium support?

HTML-to-Selenium is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops HTML-to-Selenium?

It is developed and maintained by 张 庆 (Zhang Qing) (@icestorms); the current version is v2.0.7.
