Playwright Ocr
Author
cgxxxxxxxxxxxx
· GitHub
· v3.0.0
· MIT-0
Total downloads: 80
Favorites: 0
Current installs: 1
Versions: 2
Install in OpenClaw
/install playwright-ocr
Description
Automated web data extraction using Playwright for browser automation and OCR for text recognition. Use when you need to extract data from dynamic web pages,...
Security Recommendations
This skill appears to implement Playwright-based screenshots plus OCR and largely does what it claims, but check the following red flags before running it:
1) Treat the token in config.example.json as suspicious; do NOT assume it is a harmless placeholder. Remove it, or replace it with your own secrets stored securely.
2) SKILL.md mentions upload_csv.py and a Feishu upload step, but the upload script is missing. Search the repo or contact the author before enabling any upload.
3) The scripts use hard-coded absolute paths (/root/.openclaw/...). Change OUTPUT_DIR, or run the skill in an isolated container or VM so it cannot overwrite important files.
4) run_pipeline.py calls subprocess.run with shell=True. This is acceptable for fixed commands, but never pass untrusted input through it.
5) Install the declared dependencies (playwright, tesseract, pillow, paddleocr) from official sources, and make sure you understand which network endpoints the scraper will visit (the default TARGET_URL points at openrouter.ai).
6) If you plan to use Feishu or cloud OCR, provision credentials properly and never commit them to a repository.
If you need higher confidence, ask the author for: (a) the missing upload_csv.py, or confirmation that it was intentionally omitted; (b) clarification of whether the token in config.example.json is a placeholder; and (c) an updated SKILL.md or config that uses relative or configurable paths.
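Regarding point 4: a common way to remove the shell-injection risk entirely is to pass the command as a list and leave shell at its default of False. The sketch below is illustrative and not taken from run_pipeline.py; the helper name run_step is hypothetical. The demonstration echoes back a value containing a semicolon to show that it arrives as one literal argument instead of being executed.

```python
import subprocess
import sys
from typing import Sequence


def run_step(args: Sequence[str], timeout: float = 600.0) -> str:
    """Run one pipeline step without a shell (hypothetical helper).

    Because args is a list and shell=False (the default), each element is
    handed to the child process as a literal argument, so a value taken from
    TARGET_URL cannot smuggle in extra shell commands.
    """
    result = subprocess.run(
        list(args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Demonstration: the suspicious-looking value is printed verbatim, not executed.
    url = "https://example.com/page; echo injected"
    print(run_step([sys.executable, "-c", "import sys; print(sys.argv[1])", url]))
```

The trade-off is that shell features such as pipes and globbing are no longer available, which is usually what you want for a pipeline that forwards external input.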
Capability Assessment
Purpose & Capability
The skill's code (Node Playwright scripts plus Python OCR) matches the description of browser automation plus OCR and legitimately needs node and python3. However, the included config.example.json contains a Feishu app_token and table_id (with feishu_upload.enabled: true) even though neither upload_csv.py nor any Feishu upload implementation is present in the repository. Shipping a token in an example file is unexpected and out of proportion to the code that is actually present.
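One way to act on this before publishing or running the skill is to rewrite the example config so that uploads are disabled and the credential fields hold an obvious placeholder. This is a sketch under an assumption the source does not confirm: that app_token and table_id sit inside the same "feishu_upload" object as the enabled flag. The placeholder string is hypothetical.

```python
import copy
import json
from typing import Any

# Assumed layout (not confirmed by the source): the credential fields live
# inside a "feishu_upload" object next to the "enabled" flag.
SECRET_KEYS = ("app_token", "table_id")
PLACEHOLDER = "<set-via-environment>"  # hypothetical placeholder value


def sanitize_example_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy that is safe to ship as config.example.json."""
    clean = copy.deepcopy(config)  # never mutate the caller's dict
    upload = clean.get("feishu_upload")
    if isinstance(upload, dict):
        upload["enabled"] = False  # no upload implementation ships with the skill
        for key in SECRET_KEYS:
            if key in upload:
                upload[key] = PLACEHOLDER
    return clean


if __name__ == "__main__":
    example = {"feishu_upload": {"enabled": True, "app_token": "abc123", "table_id": "tbl1"}}
    print(json.dumps(sanitize_example_config(example), indent=2))
```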
Instruction Scope
SKILL.md instructs running the provided scripts and references an upload_csv.py in the architecture, but upload_csv.py is not present in the file manifest. The README and scripts use absolute paths under /root/.openclaw/workspace/skills/playwright_ocr and suggest adding cron jobs. These are reasonable for a pipeline, but the hard-coded paths reduce portability and could cause accidental writes in privileged directories. The instructions also reference optional cloud OCR and Feishu upload, yet the code bundle contains no Feishu upload implementation, which is an inconsistency to investigate.
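The mismatch between SKILL.md and the file manifest can be checked mechanically. The sketch below scans SKILL.md for script names ending in .py, .js, or .mjs and reports the ones that have no matching file anywhere under the skill directory. The function name and the regex are hypothetical and heuristic: a name such as "node.js" in prose would be a false positive.

```python
import re
from pathlib import Path

# Heuristic: bare script names such as "run_pipeline.py" or "upload_csv.py".
SCRIPT_RE = re.compile(r"\b[\w-]+\.(?:py|js|mjs)\b")


def missing_scripts(skill_md: str, skill_dir: Path) -> list[str]:
    """List script names mentioned in SKILL.md with no matching file under skill_dir."""
    referenced = sorted(set(SCRIPT_RE.findall(skill_md)))
    present = {p.name for p in skill_dir.rglob("*") if p.is_file()}
    return [name for name in referenced if name not in present]


if __name__ == "__main__":
    skill_dir = Path(".")
    skill_md = skill_dir / "SKILL.md"
    if skill_md.exists():
        for name in missing_scripts(skill_md.read_text(encoding="utf-8"), skill_dir):
            print(f"referenced but missing: {name}")
```

Run from the skill's root directory, this would be expected to flag upload_csv.py for the bundle described here.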
Install Mechanism
This is an instruction-only skill (no install spec). That is low risk compared to arbitrary remote downloads. The SKILL.md lists npm/pip packages required (playwright, pytesseract, pillow, paddleocr) which align with the code; however the skill does not provide an automated install step — users must install these themselves.
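Because there is no automated install step, a quick preflight check can confirm the Python side and the external binaries before the first run. This is a sketch: the module names are inferred from the packages listed in SKILL.md (pillow imports as PIL), and it deliberately does not check the npm playwright package, since the Playwright scripts run under Node rather than Python.

```python
import importlib.util
import shutil

# pip package -> importable module name (pillow is imported as PIL).
PY_MODULES = {"pytesseract": "pytesseract", "pillow": "PIL", "paddleocr": "paddleocr"}
# pytesseract wraps the tesseract binary; the Playwright scripts need node.
BINARIES = ("tesseract", "node", "python3")


def missing_dependencies() -> dict[str, list[str]]:
    """Report Python packages and executables that cannot be found."""
    missing_py = [pkg for pkg, mod in PY_MODULES.items()
                  if importlib.util.find_spec(mod) is None]
    missing_bin = [name for name in BINARIES if shutil.which(name) is None]
    return {"python": missing_py, "binaries": missing_bin}


if __name__ == "__main__":
    report = missing_dependencies()
    for kind, names in report.items():
        print(f"{kind}: {', '.join(names) if names else 'all found'}")
```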
Credentials
The skill declares no required environment variables, which is consistent with the code using environment fallbacks. However config.example.json includes a Feishu app_token value which looks like a real credential. Example files should not contain valid tokens; this is disproportionate and risky because it may leak credentials if real. The code otherwise only reads TARGET_URL and OUTPUT_DIR from env, which is proportional.
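A proportionate alternative to keeping a token in a config file is to read it from the environment and treat a missing value as "upload disabled". In this sketch the variable names FEISHU_APP_TOKEN and FEISHU_TABLE_ID and the FeishuCredentials type are hypothetical; the skill itself declares no environment variables for Feishu.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class FeishuCredentials:
    app_token: str = field(repr=False)  # keep the secret out of logs and tracebacks
    table_id: str


def load_feishu_credentials(env: Mapping[str, str] = os.environ) -> Optional[FeishuCredentials]:
    """Read credentials from the environment; return None (upload disabled) if any is missing."""
    token = env.get("FEISHU_APP_TOKEN", "").strip()
    table = env.get("FEISHU_TABLE_ID", "").strip()
    if not token or not table:
        return None
    return FeishuCredentials(app_token=token, table_id=table)
```

Accepting the mapping as a parameter keeps the function testable without touching the real process environment.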
Persistence & Privilege
The skill is not always-enabled and is user-invocable. It does not request persistent platform privileges or modify other skills. It writes output files to a workspace output directory (hard-coded defaults under /root), so run it in an isolated environment or change OUTPUT_DIR if you want to avoid writing to system or privileged directories.
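If you keep the skill outside a container, one option is to resolve OUTPUT_DIR at startup with a relative default and refuse obviously dangerous targets. The sketch below assumes only what the source states (that OUTPUT_DIR is read from the environment); the function name, the "output" default under the working directory, and the deny-list of system directories are hypothetical and should be adjusted per platform.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Illustrative deny-list, resolved so that symlinks (e.g. /etc on macOS) compare correctly.
SYSTEM_DIRS = tuple(Path(p).resolve() for p in ("/etc", "/usr", "/bin", "/sbin", "/boot"))


def resolve_output_dir(env: Mapping[str, str] = os.environ,
                       base: Optional[Path] = None) -> Path:
    """Pick OUTPUT_DIR from the environment, else <base or cwd>/output, and create it."""
    raw = env.get("OUTPUT_DIR", "").strip()
    path = Path(raw).expanduser() if raw else (base or Path.cwd()) / "output"
    path = path.resolve()
    if path.parent == path:
        raise ValueError(f"refusing to use the filesystem root as OUTPUT_DIR: {path}")
    for sys_dir in SYSTEM_DIRS:
        if path == sys_dir or sys_dir in path.parents:
            raise ValueError(f"refusing to write under a system directory: {path}")
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Setting OUTPUT_DIR explicitly when invoking the scripts then overrides the default without editing any file.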
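How to Use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install playwright-ocr
- After installation, call the skill by name or trigger it with /playwright-ocr.
- Provide the required inputs as described in the skill's parameter documentation to get structured output.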
Version History
v3.0.0
**v3.0.0 Summary:**
Major cleanup: output data and report files removed.
- Removed all files in the output and output_batch_* directories.
- All previously extracted CSV, JSON, and report files are no longer included in the repository.
- No functional or configuration changes to the core skill.
v2.0.0
Playwright_OCR v2.0.0 introduces major enhancements for automated web data extraction and OCR:
- Added batch processing support with parallelization and deduplication for faster and more efficient OCR workflows.
- Introduced data validation features, including confidence checks, threshold filtering, review queue generation, and integrity checks.
- Implemented robust error recovery: resume from breakpoints, automatic retries, detailed logging, and persistent state saving.
- Integrated PaddleOCR for improved multi-language (Chinese/English) recognition and automatic OCR engine selection.
- Achieved significant performance gains: 3–5x faster batch processing and over 95% accuracy for Chinese text recognition.
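The v2.0.0 notes mention deduplication, confidence checks, threshold filtering, and review-queue generation without showing how they fit together. The following is a minimal sketch of that idea, not the skill's actual code: the OcrLine type, the normalization rule, and the 0.8 threshold are hypothetical, and confidence is assumed to be on a 0.0 to 1.0 scale (Tesseract reports 0 to 100, so such scores would need rescaling first).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class OcrLine:
    text: str
    confidence: float  # assumed 0.0-1.0 score from the OCR engine


def split_by_confidence(lines: Iterable[OcrLine], threshold: float = 0.8
                        ) -> tuple[list[OcrLine], list[OcrLine]]:
    """Deduplicate by normalized text, then split into accepted lines and a review queue."""
    seen: set[str] = set()
    accepted: list[OcrLine] = []
    review: list[OcrLine] = []
    for line in lines:
        key = " ".join(line.text.split()).lower()  # collapse whitespace, ignore case
        if not key or key in seen:
            continue  # drop blank lines and later duplicates (first occurrence wins)
        seen.add(key)
        (accepted if line.confidence >= threshold else review).append(line)
    return accepted, review
```

Keeping the first occurrence is the simplest policy; a pipeline could instead keep the highest-confidence duplicate at the cost of buffering.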
FAQ
What is Playwright Ocr?
Automated web data extraction using Playwright for browser automation and OCR for text recognition. Use when you need to extract data from dynamic web pages,... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 80 times so far.
How do I install Playwright Ocr?
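Run the command "/install playwright-ocr" in the OpenClaw or Claude Code chat box to install it in one step. The npm/pip dependencies listed in SKILL.md must still be installed manually.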
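Is Playwright Ocr free?
Yes. Playwright Ocr is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does Playwright Ocr support?
Playwright Ocr is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (linux, darwin, win32).
Who developed Playwright Ocr?
It is developed and maintained by cgxxxxxxxxxxxx (@cgxxxxxxxxxxxx). The current version is v3.0.0.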