/install datacrawl-debug
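DataProcess Debug: End-to-End Data Processing Toolkit
Processes it · Fixes it · Cleans it · Keeps it running
Core positioning
An "ER + gym" for data processing: when something breaks, go to the ER (DebugRunner); for routine training, go to the gym (IterateOptimizer); and a nutritionist (DataCleaner) looks after your data throughout.
Five core modules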
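1. ProcessEngine: processing-config generation + result parsing
scripts/process-engine.py config --url URL --fields FIELD1 FIELD2 --mode static|dynamic|api
scripts/process-engine.py extract --html "HTML_CONTENT" --fields FIELD1 FIELD2
- Automatic site-type detection (e-commerce / B2B / social media / content / government / developer)
- Tool recommendations for each of the 3 modes, plus CSS/XPath selector suggestions
- Structured HTML extraction (text / links / images / tables / lists)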
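To illustrate what structured extraction means, here is a minimal sketch built only on the standard library's html.parser. It is not ProcessEngine's implementation: the class name StructuredExtractor, the extract() helper, its output keys, and the choice to skip <script>/<style> content are assumptions made for illustration.

```python
from html.parser import HTMLParser


class StructuredExtractor(HTMLParser):
    """Collect visible text, links, images, list items and table rows (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text, self.links, self.images = [], [], []
        self.list_items, self.table_rows = [], []
        self._skip = 0     # nesting depth inside <script>/<style>
        self._li = None    # text chunks of the current <li>
        self._row = None   # cells of the current <tr>
        self._cell = None  # text chunks of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "li":
            self._li = []
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "li" and self._li is not None:
            self.list_items.append(" ".join(self._li))
            self._li = None
        elif tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.table_rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk or self._skip:  # ignore whitespace and script/style bodies
            return
        self.text.append(chunk)
        if self._li is not None:
            self._li.append(chunk)
        if self._cell is not None:
            self._cell.append(chunk)


def extract(html):
    """Return a dict with text, links, images, lists and tables found in html."""
    parser = StructuredExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text),
        "links": parser.links,
        "images": parser.images,
        "lists": parser.list_items,
        "tables": parser.table_rows,
    }
```

A real extractor would also resolve relative URLs and handle nested lists; this sketch keeps only the outermost structure it sees.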
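2. CodeGenerator: automatic generation of data processing code
scripts/code-generator.py --name PROJECT_NAME --url URL --fields FIELD1 FIELD2 --mode requests_bs4|playwright|api_client
- Automatically picks one of 3 templates: static page / dynamic rendering / API
- Produces complete, runnable code plus dependency-installation commands and usage steps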
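As a rough picture of template-based generation, the sketch below renders a static-page (requests_bs4) skeleton with string.Template. The template text, generate_static(), and the convention of leaving selectors empty for the user to fill in are hypothetical; the templates shipped with code-generator.py are not shown in this document. The generator itself uses only the standard library, while the generated file needs requests and beautifulsoup4 to run.

```python
from string import Template

# Hypothetical skeleton for the static-page template (not the bundled one).
STATIC_TEMPLATE = Template('''\
"""$name: generated scraper for $url"""
import requests
from bs4 import BeautifulSoup

URL = $url_literal
SELECTORS = $selectors  # field name -> CSS selector (fill these in)


def run():
    resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    row = {}
    for field, selector in SELECTORS.items():
        node = soup.select_one(selector) if selector else None
        row[field] = node.get_text(strip=True) if node else None
    return row


if __name__ == "__main__":
    print(run())
''')


def generate_static(name, url, fields):
    """Render the static template; every field starts with an empty selector."""
    selectors = {field: "" for field in fields}
    return STATIC_TEMPLATE.substitute(
        name=name, url=url, url_literal=repr(url), selectors=repr(selectors)
    )
```

Running compile() on the result is a cheap way to confirm the generated file is at least syntactically valid before writing it to disk.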
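3. DebugRunner: code debugging and repair
scripts/debug-runner.py --error "ERROR_MESSAGE"
- Library of 8 error patterns: connection/http_error/timeout/selector_error/encoding/json_parse/selenium_playwright/rate_limit
- Targeted diagnosis of HTTP status subtypes, each with its own remedy (403 Forbidden / 429 Too Many Requests / 503 Service Unavailable, etc.)
- Code-snippet scanning that automatically flags missing exception handling, timeouts, request delays, and User-Agent headers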
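Error-pattern matching of this kind can be sketched with ordered regular expressions: the first rule that matches decides the category, and HTTP errors get a status-specific hint. The regexes, the HTTP_HINTS wording, and the four snippet checks below are illustrative assumptions, not the rules inside debug-runner.py.

```python
import re

# Hypothetical rules for the eight categories; the first match wins, so
# timeouts and connection errors are checked before the generic HTTP rule.
ERROR_PATTERNS = [
    ("rate_limit", r"\b429\b|too many requests|rate.?limit"),
    ("timeout", r"timed out|timeout"),
    ("connection", r"connectionerror|connection refused|connection reset|"
                   r"name or service not known|max retries exceeded"),
    ("encoding", r"unicodedecodeerror|unicodeencodeerror|codec can't"),
    ("json_parse", r"jsondecodeerror|expecting value"),
    ("selector_error", r"'nonetype' object has no attribute|list index out of range|no such element"),
    ("selenium_playwright", r"playwright|selenium|webdriver"),
    ("http_error", r"httperror|\b[45]\d\d (?:client|server) error|status(?: code)?[:= ]+[45]\d\d"),
]
STATUS_RE = re.compile(r"\b([45]\d\d) (?:client|server) error|status(?: code)?[:= ]+([45]\d\d)",
                       re.IGNORECASE)
# Illustrative remedies per status code.
HTTP_HINTS = {
    403: "Forbidden: the request is being blocked; check User-Agent, cookies and login state.",
    429: "Too Many Requests: add delays, back off, and honor Retry-After if present.",
    503: "Service Unavailable: the server is overloaded; retry later with backoff.",
}


def diagnose(message):
    """Classify an error message into one category, plus an HTTP status when known."""
    for category, pattern in ERROR_PATTERNS:
        if re.search(pattern, message, re.IGNORECASE):
            status = None
            if category == "rate_limit":
                status = 429
            elif category == "http_error":
                m = STATUS_RE.search(message)
                status = int(m.group(1) or m.group(2)) if m else None
            return {"category": category, "status": status, "hint": HTTP_HINTS.get(status)}
    return {"category": "unknown", "status": None, "hint": None}


# Naive text checks standing in for a real static analysis of the snippet.
SNIPPET_CHECKS = [
    ("missing exception handling", lambda code: "try:" not in code),
    ("missing timeout", lambda code: re.search(r"requests\.(get|post)\(", code) is not None
                                     and "timeout=" not in code),
    ("missing delay between requests", lambda code: "sleep(" not in code),
    ("missing User-Agent header", lambda code: "user-agent" not in code.lower()),
]


def scan_snippet(code):
    """Return the labels of every check the snippet fails."""
    return [label for label, check in SNIPPET_CHECKS if check(code)]
```

Ordering the rules is the key design choice: a requests timeout message also mentions a port such as 443, so a generic "4xx/5xx number" rule placed first would misclassify it.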
4. DataCleaner: data cleaning and formatting
scripts/data-cleaner.py clean --input DATA --remove-html --remove-duplicates
scripts/data-cleaner.py normalize --input DATA --schema TYPE_SCHEMA
scripts/data-cleaner.py format --input DATA --format json|csv|jsonl --fields FIELD_LIST
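The three subcommands map naturally onto three small functions. This is a sketch under stated assumptions: records are a list of dicts, the schema is a hypothetical dict of field name to Python type, and HTML is stripped with a simple regex rather than a full parser. None of these are data-cleaner.py's documented formats.

```python
import csv
import html
import io
import json
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, fine for simple fragments


def clean(records, remove_html=True, remove_duplicates=True):
    """Strip tags/entities, collapse whitespace, and drop exact duplicate records."""
    out, seen = [], set()
    for rec in records:
        row = {}
        for key, value in rec.items():
            if isinstance(value, str):
                if remove_html:
                    value = html.unescape(TAG_RE.sub(" ", value))
                value = " ".join(value.split())
            row[key] = value
        fingerprint = json.dumps(row, sort_keys=True, ensure_ascii=False)
        if remove_duplicates and fingerprint in seen:
            continue
        seen.add(fingerprint)
        out.append(row)
    return out


def normalize(records, schema):
    """Coerce each schema field to its type; unparseable or empty values become None."""
    out = []
    for rec in records:
        row = {}
        for field, typ in schema.items():
            value = rec.get(field)
            if typ in (int, float) and isinstance(value, str):
                value = value.replace(",", "").strip()  # "1,299" -> "1299"
            try:
                row[field] = typ(value) if value not in (None, "") else None
            except (TypeError, ValueError):
                row[field] = None
        out.append(row)
    return out


def format_records(records, fmt, fields):
    """Serialize only the requested fields as json, jsonl or csv text."""
    if fmt == "json":
        return json.dumps([{f: r.get(f) for f in fields} for r in records], ensure_ascii=False)
    if fmt == "jsonl":
        return "\n".join(json.dumps({f: r.get(f) for f in fields}, ensure_ascii=False)
                         for r in records)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

Deduplicating after cleaning matters: two rows that differ only in markup or whitespace collapse into the same fingerprint.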
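5. IterateOptimizer: self-iterating optimization
scripts/iterate-optimizer.py analyze --input run_history.json
scripts/iterate-optimizer.py improve --config CURRENT_CONFIG --analysis ANALYSIS_RESULT
- Success-rate trends / error clustering / field coverage / optimization suggestions
- Automatic tuning of delays, timeouts, retries, and mode switching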
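One way to picture the analyze/improve loop is below. The run-history record shape ({"success", "error_type", "fields"}), the config keys, and every adjustment rule (doubling the delay on rate limits, switching static to dynamic on selector errors, and so on) are hypothetical heuristics chosen for illustration, not iterate-optimizer.py's actual logic.

```python
from collections import Counter


def analyze(runs, expected_fields):
    """Summarize runs (assumed oldest first): success rate, trend, error clusters, coverage."""
    n = len(runs)
    if n == 0:
        return {"success_rate": 0.0, "trend": 0.0, "error_clusters": {}, "field_coverage": {}}
    ok = [1 if r.get("success") else 0 for r in runs]
    half = n // 2
    if half:
        # Newer half's success rate minus older half's: positive means improving.
        trend = sum(ok[half:]) / (n - half) - sum(ok[:half]) / half
    else:
        trend = 0.0
    clusters = Counter(r["error_type"] for r in runs
                       if not r.get("success") and r.get("error_type"))
    coverage = {
        f: sum(1 for r in runs if (r.get("fields") or {}).get(f) not in (None, "")) / n
        for f in expected_fields
    }
    return {"success_rate": sum(ok) / n, "trend": trend,
            "error_clusters": dict(clusters), "field_coverage": coverage}


def improve(config, analysis):
    """Return an adjusted copy of config based on the dominant error types."""
    new = dict(config)
    clusters = analysis["error_clusters"]
    if clusters.get("rate_limit"):
        new["delay"] = round(config.get("delay", 1.0) * 2, 2)      # slow down
    if clusters.get("timeout"):
        new["timeout"] = config.get("timeout", 10) + 10            # wait longer
    if clusters.get("connection"):
        new["retries"] = min(config.get("retries", 2) + 1, 5)      # retry more, capped
    if clusters.get("selector_error") and config.get("mode") == "static":
        new["mode"] = "dynamic"  # content may be rendered by JavaScript
    return new
```

Returning a new dict instead of mutating the old config keeps each iteration's settings available for comparison in the next run history.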
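Case study: processing data on foreign-trade influencers
Bundled script scripts/trade-contact-scorer.py:
- Five-dimension follower-quality score (engagement rate / save ratio / comment activity / follower count / foreign-trade relevance)
- Five tiers: S/A/B/C/D
- Follower persona inference (factory owner / cross-border seller / SOHO / business operator / beginner)
- Batch processing (deduplication + foreign-trade filtering + scoring + persona inference)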
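The shape of a five-dimension score with tiers and rule-based personas can be sketched as follows. This is explicitly not trade-contact-scorer.py's model, which this document does not describe: the input field names, the normalization caps, the weights, the tier thresholds, and the keyword lists are all hypothetical, and the combination step is a plain weighted average of capped metrics.

```python
import math

# Hypothetical weights (sum to 1.0); the bundled scorer's real model is not documented here.
WEIGHTS = {"engagement": 0.30, "save_ratio": 0.15, "comments": 0.15,
           "followers": 0.15, "relevance": 0.25}
# Hypothetical keywords: "外贸" = foreign trade, "工厂" = factory, "跨境" = cross-border.
TRADE_KEYWORDS = ("外贸", "工厂", "跨境", "export", "b2b", "factory", "oem", "alibaba")
TIERS = [(80, "S"), (65, "A"), (50, "B"), (35, "C")]  # anything lower is "D"
PERSONA_RULES = [  # first matching rule wins; "公司" = company
    ("factory owner", ("factory", "工厂", "oem", "manufacturer")),
    ("cross-border seller", ("amazon", "shopify", "cross-border", "跨境")),
    ("SOHO", ("soho",)),
    ("business operator", ("ceo", "founder", "company", "公司")),
]


def capped(value, cap):
    """Scale value into [0, 1]; anything at or above cap counts as 1."""
    return min(max(value, 0.0) / cap, 1.0)


def score_blogger(b):
    """Score one profile dict with hypothetical keys followers/avg_likes/avg_saves/avg_comments/bio."""
    followers = max(b.get("followers", 0), 1)  # avoid division by zero and log10(0)
    likes = b.get("avg_likes", 0)
    saves = b.get("avg_saves", 0)
    comments = b.get("avg_comments", 0)
    bio = b.get("bio", "").lower()
    parts = {
        "engagement": capped((likes + saves + comments) / followers, 0.10),
        "save_ratio": capped(saves / likes, 0.5) if likes else 0.0,
        "comments": capped(comments / followers, 0.01),
        "followers": capped(math.log10(followers), 6),  # 1M followers -> 1.0
        "relevance": capped(sum(k in bio for k in TRADE_KEYWORDS), 3),
    }
    score = 100 * sum(WEIGHTS[k] * v for k, v in parts.items())
    tier = next((t for threshold, t in TIERS if score >= threshold), "D")
    persona = next((p for p, kws in PERSONA_RULES if any(k in bio for k in kws)), "beginner")
    return {"score": round(score, 1), "tier": tier, "persona": persona, "parts": parts}
```

Capping each metric before weighting keeps a single extreme value (for example a viral post) from dominating the total.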
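Diagnosing common processing problems
Requesting the API directly → you will almost certainly hit access restrictions. Recommended approach:
- Open the web version with Playwright
- Log in manually and save the cookies
- Extract data through the search pages
- Use this skill's scoring model instead of a simple weighted sum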
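The first three steps can be sketched with Playwright's sync API, whose storage_state saves cookies and local storage to a JSON file that a later browser context can reload. Playwright is imported lazily so the module loads without it; running the browser functions needs `pip install playwright` and `playwright install chromium`. The search URL shape and the keyword/page query parameters in build_search_url() are hypothetical placeholders for whatever the target site uses.

```python
from urllib.parse import urlencode


def build_search_url(base, keyword, page=1):
    """Hypothetical query parameter names; adapt them to the target site's search page."""
    return f"{base}?{urlencode({'keyword': keyword, 'page': page})}"


def save_login_state(login_url, state_path="state.json"):
    """Open a visible browser, let the user log in by hand, then save cookies to state_path."""
    from playwright.sync_api import sync_playwright  # lazy import: optional dependency
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(login_url)
        input("Log in in the browser window, then press Enter here...")
        context.storage_state(path=state_path)  # cookies + localStorage
        browser.close()


def fetch_search_html(search_url, state_path="state.json"):
    """Reuse the saved login state to load a search page and return its rendered HTML."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(storage_state=state_path)
        page = context.new_page()
        page.goto(search_url)
        page.wait_for_load_state("networkidle")  # let client-side rendering finish
        html = page.content()
        browser.close()
        return html
```

Typical use: call save_login_state() once, then loop fetch_search_html(build_search_url(...)) over keywords and pages, with delays between requests, and pass each page's HTML to the extraction step.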
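Workflow
- Configure: process-engine.py config → understand the target site and get a recommended approach
- Generate code: code-generator.py → get a starter code template
- Debug: on error → debug-runner.py → diagnosis in seconds
- Clean: data-cleaner.py → deduplication + normalization + formatting
- Iterate: iterate-optimizer.py → continuous improvement based on run data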
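The workflow can be driven from a small script that assembles each step's command from the flags listed in the module descriptions. This sketch assumes the scripts live under scripts/ and are run with the current Python interpreter, that --input takes a file path, and that the static/dynamic/api modes map onto the requests_bs4/playwright/api_client templates; it also leaves out the step where you actually run the generated scraper to produce the raw data file.

```python
import shlex
import subprocess
import sys

# Assumed mapping from process-engine.py --mode values to code-generator.py templates.
MODE_TO_TEMPLATE = {"static": "requests_bs4", "dynamic": "playwright", "api": "api_client"}


def build_plan(name, url, fields, mode, raw_data, history):
    """Return the configure -> generate -> clean -> iterate commands as argv lists."""
    py = sys.executable
    return [
        [py, "scripts/process-engine.py", "config", "--url", url,
         "--fields", *fields, "--mode", mode],
        [py, "scripts/code-generator.py", "--name", name, "--url", url,
         "--fields", *fields, "--mode", MODE_TO_TEMPLATE[mode]],
        [py, "scripts/data-cleaner.py", "clean", "--input", raw_data,
         "--remove-html", "--remove-duplicates"],
        [py, "scripts/iterate-optimizer.py", "analyze", "--input", history],
    ]


def debug_command(error_message):
    """The Debug step: hand a captured error message to debug-runner.py."""
    return [sys.executable, "scripts/debug-runner.py", "--error", error_message]


def run_plan(plan, dry_run=True):
    """Print each step; with dry_run=False, run it and route a failure to debug-runner."""
    for cmd in plan:
        print(shlex.join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Pass the tail of stderr as the error message, then stop the pipeline.
            subprocess.run(debug_command(result.stderr.strip()[-500:]))
            break
```

Keeping commands as argv lists rather than shell strings avoids quoting problems when field names or error messages contain spaces.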
- Make sure OpenClaw is installed (locally or via Docker).
- Run the install command in chat: /install datacrawl-debug
- After installation, invoke the skill by name or with /datacrawl-debug.
- Provide the required inputs per the skill's parameter spec to get structured output.
What is Datacrawl Debug?
Use when the user needs to process web data, debug data collection code, clean processed data, or iterate on data processing strategies. Use when generating data... It is an AI Agent Skill for Claude Code / OpenClaw, with 71 downloads so far.
How do I install Datacrawl Debug?
Run "/install datacrawl-debug" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Datacrawl Debug free?
Yes, Datacrawl Debug is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Datacrawl Debug support?
Datacrawl Debug is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Datacrawl Debug?
It is built and maintained by WangM-A3 (@wangm-a3); the current version is v1.1.0.