Description

Data Science CV Repro Lab is a public ClawHub CV repro-lab skill. Use it when the user says "cv repro lab", "computer vision reproducibility", "CV experiment...

README (SKILL.md)

Data Science CV Repro Lab

Name: Data Science CV Repro Lab
Author: zack-dev-cm

Search intent: cv repro lab, computer vision reproducibility, cv experiment evidence, colab kaggle cv workflow

Goal

Turn CV work into a reproducible decision loop:

fixed inputs
explicit metrics
durable artifacts
bounded browser automation
long-run health monitoring
promotion only on verified benchmark wins

This skill is the execution and evidence layer, not the full score-improvement stack. If the real ask is "beat the baseline", "escape a plateau", or "find a better recipe", pair it with sota-agent and a fixed improvement harness before spending more compute.

Use This Skill When

the user asks to debug CV training, segmentation, detection, or runtime behavior
the workflow includes OpenClaw, Colab, Kaggle, or browser-only notebook actions
you need preprocessing, augmentation, or label-alignment review
the task requires checkpoint comparisons, export comparisons, or promotion gating
the user wants VM or GPU watchdog logic, heartbeat files, or auto-stop behavior
the user wants a general third-party CV workflow, not only repo-specific advice

If the primary goal is benchmark improvement rather than clean execution, route the search loop through sota-agent and use this skill as the execution lane.

Quick Start

Lock the objective before touching code.
- Write the product problem in one sentence.
- Name the primary metric.
- Name the non-regression surfaces.
- State what blocks promotion.
Initialize the durable records immediately.
- Use python3 {baseDir}/scripts/init_cv_dataset_manifest.py --out <json> --dataset-id <id>.
- Use python3 {baseDir}/scripts/init_cv_run_card.py --out <json> --candidate-id <id> --task-id <task> --baseline-id <baseline>.
- If the task is plateau recovery or benchmark improvement, use python3 {baseDir}/scripts/init_cv_improvement_harness.py --out <json> --task-id <task> --candidate-family <family>.
- If the workflow mixes runtime sweeps, QA runs, benchmark panels, or synced VM artifacts, use python3 {baseDir}/scripts/init_cv_review_dashboard_manifest.py --out <json> --dashboard-id <id> --title <title>.
- If a browser lane matters, use python3 {baseDir}/scripts/init_cv_browser_run_card.py --out <json> --target-url <url>.
- If browser-visible overlays or prompt variants are part of the hypothesis, use python3 {baseDir}/scripts/init_cv_validation_scorecard.py --out <json> --scorecard-id <id> --surface <surface>.
- If a long VM run is involved, use python3 {baseDir}/scripts/init_cv_vm_bootstrap_manifest.py --out <json> --output-root <run_root> --model-family <name> --command python train.py --epochs 40.
Capture the current state immediately.
- Use python3 {baseDir}/scripts/capture_cv_run_context.py --repo-root <repo> --out <json> --markdown-out <md> --param key=value.
- Record git state, module versions, GPU state, and experiment params before launch.
- Use the dataset, artifact, and browser manifest helpers for any additional evidence instead of broad host inspection.
Pick the right orchestration lane.
- Local debug lane: tiny overfit, transform audits, shape and dtype checks.
- Browser notebook lane: Colab or Kaggle steps that must happen in a real browser or notebook UI.
- Colab GPU lane: runtime selection, smoke validation, artifact export, and browser evidence.
- Custom VM or cluster lane: long runs with heartbeats, watchdogs, stall detection, sync, and auto-stop.
- Review dashboard lane: one local or synced surface for runtime sweeps, QA runs, curated comparisons, and benchmark panels.
- Promotion lane: fixed benchmark matrix plus customer-facing surface checks.
Work the debug ladder in order.
- Harness review: fixed split, primary metric, slice table, rerun rule, and stop condition.
- Validation scorecard: browser or notebook visual QA with per-image pass or fail notes when the UI is part of the release story.
- Data audit: split integrity, label normalization, image-mask pairing, resize geometry.
- Preview audit: at least one augmentation preview and one transformed batch preview.
- Failure-set review: keep 20-50 representative overlays with short notes instead of trusting one scalar metric.
- Tiny overfit: 4-16 shared samples with no_aug.
- Short resumed run: continue from the best trusted checkpoint.
- Long run: only after the short loop is healthy.
Keep agentic work bounded.
- External browser LLM output is hypothesis generation, not release evidence.
- Keep the main thread on the benchmark contract and improvement harness.
- Use bounded Codex subagents for scouting, data audits, patch proposals, and per-case review.
- For repeated case review, batch over a manifest or CSV instead of free-form chat drift.
- Browser steps must emit screenshots, machine-readable scores, and explicit success markers.
- Hard-fail on unavailable browser modes, dead CDP sessions, or ambiguous notebook state.
- Keep planner, executor, reviewer, and promoter responsibilities distinct even if one agent performs more than one role.
Promote only on full-surface wins.
- Raw checkpoint quality
- Exported or runtime quality
- User-facing render, service, or product surface
- Runtime cost or throughput if deployment matters
- Adjacent-seed or rerun stability if the claimed delta is small
- Generate a promotion bundle with python3 {baseDir}/scripts/init_cv_promotion_bundle.py --out <json> --candidate-id <id> before the final decision.
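
As an illustration of the "Capture the current state" step above, the following is a minimal sketch of a pre-launch snapshot. It is not the bundled capture_cv_run_context.py. The function names, the output keys, and the default module list are assumptions made for this example. It reads package versions from installed metadata instead of importing the packages, and it returns None for any tool that is missing.

```python
import json
import shutil
import subprocess
from importlib import metadata


def run_quiet(argv):
    """Run a command and return stripped stdout, or None if it is missing or fails."""
    if shutil.which(argv[0]) is None:
        return None
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def module_version(name):
    """Read an installed package version from metadata without importing the package."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_context(params, modules=("torch", "numpy")):
    """Collect a small snapshot; the keys are hypothetical, not the bundle's schema."""
    status = run_quiet(["git", "status", "--porcelain"])
    return {
        "git_commit": run_quiet(["git", "rev-parse", "HEAD"]),
        "git_dirty": None if status is None else bool(status),
        "gpu": run_quiet(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]
        ),
        "modules": {name: module_version(name) for name in modules},
        "params": dict(params),
    }


snapshot = capture_context({"lr": "3e-4", "img_size": "512"})
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Writing the snapshot before launch, not after, means a crashed run still leaves a record of what was attempted.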

Operating Rules

Research before edits

Keep separate files or sections for research, plan, journal, and evidence.
Summaries are not evidence. Preserve the artifact paths.
If a workflow uses both code changes and browser actions, record both.

Agentic orchestration rules

Planner: defines the question, benchmark, stop condition, and chosen execution lane.
Executor: runs the browser, notebook, local, or VM steps and preserves artifacts.
Reviewer: checks whether the evidence actually answers the question and catches regressions.
Promoter: makes the final hold or promote decision from the run card, not from memory.
If one agent performs all roles, keep the outputs separated anyway.

Browser automation rules

Prefer stable URLs over uploads.
Start with a short smoke run before full training.
When the hypothesis depends on visible overlays, grids, or prompt variants, capture a validation scorecard before the long run.
Capture at least two screenshots when the browser UI is part of the validation path.
Pull artifacts back locally as files, not only screenshots.
Use explicit timeout and marker logic; do not rely on visual guesswork.
Record browser profile aliases and session aliases in durable artifacts; keep raw CDP URLs in ephemeral local debug logs only.
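
The "explicit timeout and marker logic" rule can be sketched as a polling loop that hard-fails when the marker never appears. This is an illustrative sketch only. The marker string, the log-file convention, and the names wait_for_marker and MarkerTimeout are assumptions, not part of the bundle.

```python
import time
from pathlib import Path


class MarkerTimeout(RuntimeError):
    """Raised when the success marker does not appear before the deadline."""


def wait_for_marker(log_path, marker, timeout_s=600.0, poll_s=2.0):
    """Poll a text file until `marker` appears; raise MarkerTimeout on timeout."""
    deadline = time.monotonic() + timeout_s
    path = Path(log_path)
    while True:
        # A missing file is treated the same as "marker not yet written".
        if path.exists() and marker in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            raise MarkerTimeout(
                f"marker {marker!r} not seen in {path} after {timeout_s}s"
            )
        time.sleep(poll_s)
```

A smoke cell could print a line such as SMOKE_OK into a pulled log, and the executor would call wait_for_marker(log, "SMOKE_OK", timeout_s=120) before starting the full run.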

Colab GPU rules

Select the accelerator explicitly before running expensive cells.
Verify GPU readiness from inside the notebook before the long run.
Use a smoke cell that proves the runtime, imports, and data mounts all work.
Export all required artifacts to one stable bundle directory.
Create an artifact manifest for that export bundle before pulling it back locally.
Pull the artifact manifest plus at least one preview image back to local storage.
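
To make the export-bundle manifest concrete, here is a minimal sketch that hashes every file in a bundle directory and re-checks the hashes after the pull. The schema (path, bytes, sha256) and the function names are assumptions for illustration. The bundled init_cv_artifact_manifest.py may record different fields.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_root):
    """List every file under the bundle with a relative path, size, and SHA-256."""
    root = Path(bundle_root)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(root).as_posix(),  # relative paths only
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {"file_count": len(files), "files": files}


def verify_manifest(bundle_root, manifest):
    """Return relative paths that are missing or whose hash changed after the pull."""
    root = Path(bundle_root)
    bad = []
    for entry in manifest["files"]:
        target = root / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            bad.append(entry["path"])
    return bad
```

Building the manifest on the remote side and verifying it locally turns "the download looked fine" into a checkable claim.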

Custom VM and cluster rules

Create a named run root before launch.
Write a machine-readable bootstrap manifest with commit, dataset, env, and command details.
Run long jobs under a session, heartbeat, or supervisor so liveness is explicit.
Track GPU utilization, epoch movement, and log freshness.
Sync summaries and checkpoints back to local storage on a schedule.
Auto-stop or downgrade to a debug path when the run is clearly unhealthy.
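
One way to make liveness explicit is a heartbeat file written by the training loop and a separate check that classifies the run. The sketch below is a hypothetical example. The file layout, the status names, and the thresholds are assumptions, not the bundle's watchdog.

```python
import json
import time
from pathlib import Path


def write_heartbeat(path, epoch, step, now=None):
    """Write progress via a temp file plus rename so readers never see a partial file."""
    target = Path(path)
    payload = {"ts": time.time() if now is None else now, "epoch": epoch, "step": step}
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(target)


def check_health(path, max_age_s, last_seen_step=None, now=None):
    """Return 'missing', 'stale', 'stalled', or 'healthy' for a heartbeat file."""
    target = Path(path)
    if not target.is_file():
        return "missing"
    beat = json.loads(target.read_text())
    now = time.time() if now is None else now
    if now - beat["ts"] > max_age_s:
        return "stale"      # log freshness failed: the process may be dead or hung
    if last_seen_step is not None and beat["step"] <= last_seen_step:
        return "stalled"    # alive but not advancing since the previous check
    return "healthy"
```

A supervisor could call check_health on a schedule and trigger the auto-stop or debug path on any status other than "healthy".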

Review dashboard rules

Use one review dashboard manifest when the program spans runtime sweeps, QA runs, benchmark panels, and synced VM artifacts.
Track summary roots, benchmark roots, allowed roots, and sync targets explicitly instead of relying on memory.
Keep source-audit, leakage-audit, progress-snapshot, and comparison-summary paths next to the dashboard manifest.
Count runtime groups, QA runs, curated comparisons, and benchmark panels so the review surface stays legible as the program grows.
Promote from synced artifacts and run cards, not from a live dashboard alone.

CV training rules

Do not change architecture first.
Prove learning on a tiny shared subset before scaling.
Save previews in the same run folder as metrics and summaries.
Do not compare candidates on different benchmark sets.
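
The last rule can be enforced mechanically by fingerprinting the benchmark set and refusing to compute a delta when fingerprints differ. This is a sketch under assumptions: the field name benchmark_fingerprint and the record layout are hypothetical, and samples are identified by string IDs.

```python
import hashlib


def benchmark_fingerprint(sample_ids):
    """Order-independent SHA-256 over the sample identifiers of a benchmark set."""
    digest = hashlib.sha256()
    for sample_id in sorted(set(sample_ids)):
        digest.update(sample_id.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def compare_candidates(baseline, candidate, metric="dice"):
    """Return candidate minus baseline, refusing to compare across benchmark sets."""
    if baseline["benchmark_fingerprint"] != candidate["benchmark_fingerprint"]:
        raise ValueError("candidates were scored on different benchmark sets")
    return candidate[metric] - baseline[metric]
```

Storing the fingerprint next to each metric makes an accidental split change visible at comparison time instead of after promotion.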

Plateau recovery rules

If Dice or another primary metric is stuck, freeze the benchmark contract and write the improvement harness before another long run.
Require per-slice metrics, a short failure taxonomy, and a rerun rule before claiming a real win.
Keep a small, reviewable change set for each serious candidate. If several knobs move together, mark it as a package change instead of an ablation.
Cut a recipe family after a few non-winning serious candidates instead of rerunning the same idea with cosmetic churn.
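
A rerun rule and a per-slice check might look like the sketch below. The specific rule, that the worst candidate rerun must beat the best baseline rerun, is one conservative choice assumed here for illustration. The source does not prescribe a particular rule, and the tolerance value is likewise hypothetical.

```python
def is_real_win(baseline_runs, candidate_runs, min_margin=0.0):
    """Hypothetical rule: worst candidate rerun must beat best baseline rerun."""
    return min(candidate_runs) - max(baseline_runs) > min_margin


def slice_regressions(baseline_slices, candidate_slices, tolerance=0.005):
    """Return slice names where the candidate is worse than baseline beyond tolerance."""
    return sorted(
        name
        for name, base in baseline_slices.items()
        if candidate_slices[name] < base - tolerance
    )


# Example with made-up numbers: the global metric improves, but one slice regresses.
win = is_real_win([0.80, 0.81], [0.82, 0.83])
regressed = slice_regressions(
    {"small_lesion": 0.70, "large_lesion": 0.90},
    {"small_lesion": 0.66, "large_lesion": 0.91},
)
```

Here win is True while regressed names the small_lesion slice, which is the kind of hidden failure the per-slice requirement is meant to surface.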

Derm and segmentation rules

Before architecture changes, audit mask geometry, resize policy, interpolation, empty-mask prevalence, and overlay alignment.
For derm or lesion segmentation, slice the benchmark by lesion size, border difficulty, artifact-heavy images, and background-dominant images.
Global Dice is not enough. Keep boundary-sensitive or slice-specific diagnostics so a hidden failure mode does not look like a flat plateau.
Preserve a 20-50 case review set with saved overlays and short reviewer notes.
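
The slice-level diagnostics above can be sketched in plain Python for binary masks stored as flat 0/1 sequences. The slice names and the convention that two empty masks score 1.0 are assumptions of this example. A real pipeline would typically use array libraries and its own empty-mask policy.

```python
def dice(pred, target):
    """Dice coefficient for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("mask shapes differ; check resize policy before scoring")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def per_slice_dice(cases):
    """Average Dice per slice label; each case is (slice_name, pred, target)."""
    buckets = {}
    for slice_name, pred, target in cases:
        buckets.setdefault(slice_name, []).append(dice(pred, target))
    return {name: sum(scores) / len(scores) for name, scores in buckets.items()}


def empty_mask_prevalence(targets):
    """Fraction of ground-truth masks with no positive pixels."""
    return sum(1 for t in targets if not any(t)) / len(targets)
```

Reporting empty_mask_prevalence alongside per_slice_dice matters because a high share of empty masks can inflate or flatten a global Dice depending on the empty-mask convention.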

Codex and auth rules

Use ChatGPT or Codex OAuth-backed sessions as the default and preferred path.
Prefer Codex multi-agent or app-server workflows over third-party orchestrators that require paid API keys.
Do not require or recommend OPENAI_API_KEY, other vendor API keys, or paid inference APIs as the default runtime path.
If a third-party framework only works through paid API keys, treat it as reference material unless you can run it fully through local tools and OAuth-backed Codex sessions.

Promotion rules

Keep the last trusted baseline intact until the candidate clears agreed gates.
Separate semantic, runtime, and product-surface gates when deployment or export changes are involved.
If the semantic model improves but the deployed overlay or service output regresses, fix the downstream path before promotion.
Prefer a machine-readable run card plus a short markdown summary.
Initialize that run card before or at launch time so later steps append to one canonical record.
Render the markdown summary from the run card instead of hand-writing it when possible.
Keep the default redacted-public markdown rendering in place.
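
A hold-or-promote decision driven by the run card, with the summary rendered from the same record, could be sketched as follows. The run-card fields (candidate_id, gates) and the gate names are hypothetical stand-ins. The real run-card schema is defined by the bundled scripts.

```python
REQUIRED_GATES = ("semantic", "runtime", "product_surface")


def promotion_decision(run_card):
    """Return (decision, failed_gates); a missing gate counts as a failure."""
    gates = run_card.get("gates", {})
    failed = [name for name in REQUIRED_GATES if gates.get(name) is not True]
    return ("hold" if failed else "promote"), failed


def render_summary(run_card):
    """Render a short plain-text summary from the run card instead of hand-writing it."""
    decision, failed = promotion_decision(run_card)
    lines = [f"Candidate: {run_card['candidate_id']}", f"Decision: {decision}"]
    if failed:
        lines.append("Failed or missing gates: " + ", ".join(failed))
    return "\n".join(lines)


# Example: the semantic gate passes but the product surface regressed.
card = {
    "candidate_id": "cand-042",
    "gates": {"semantic": True, "runtime": True, "product_surface": False},
}
summary = render_summary(card)
```

Treating an absent gate as a failure keeps the baseline intact when evidence was simply never collected.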

Public distribution rules

Use {baseDir} when pointing at bundled scripts or references.
Keep secrets, tokens, private dataset identifiers, browser profile names, and internal URLs out of the skill bundle.
Do not publish repo-specific absolute paths.
Keep private specialization in a local override skill, not the public package.
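
As a rough illustration of keeping secrets and absolute paths out of published text, the sketch below redacts credential-like key=value pairs and a few common absolute path prefixes. It is not the bundle's cv_public_safety.py. The patterns, the prefix list, and the placeholder strings are assumptions, and a pattern-based pass like this cannot guarantee complete protection.

```python
import re

# Absolute paths under a few common roots; the prefix list is illustrative only.
ABS_PATH = re.compile(r"(?<![\w.])/(?:home|Users|root|mnt|data)/[^\s\"']+")
# key=value pairs whose key looks credential-like, for example OPENAI_API_KEY=...
SECRET_KV = re.compile(r"(?i)(\w*(?:token|api[_-]?key|secret|password)\w*)=\S+")


def redact(text):
    """Replace credential-like values and absolute paths with placeholders."""
    text = SECRET_KV.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return ABS_PATH.sub("[REDACTED_PATH]", text)


example = redact("--param token=abc123 out=/home/alice/runs/r1.json")
```

Running a pass like this over rendered summaries is a last line of defense. The stronger control is to avoid passing secrets as parameters in the first place.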

References

Read only the reference that matches the task:

references/official-repro-guidance.md
- Official PyTorch, Albumentations, MLflow, and DVC guidance.
references/agentic-research-patterns.md
- How to adapt karpathy/autoresearch style loops to DS and CV work.
references/improvement-harness-and-oauth-stack.md
- What to reuse from Codex subagents, harness engineering, OpenEvolve, Symphony, Paperclip, and OptiLLM under an OAuth-only rule.
references/openclaw-browser-lane.md
- OpenClaw, CDP, Colab, screenshot, artifact-pull, and timeout patterns.
references/colab-vm-operations.md
- Google Colab GPU management and custom VM lifecycle guidance.
references/kaggle-2026-practices.md
- Current Kaggle platform habits for reproducibility, versioning, and notebook execution.
references/cross-repo-cv-patterns.md
- Generic patterns for benchmark, trainer, and deploy repos split across one program.
references/publication-security.md
- Publication checklist for OpenClaw or ClawHub and leak-prevention rules.
references/runtime-serving-change-gates.md
- How to separate semantic, runtime, and product-surface gates for deployment-shaped releases.

Bundled Scripts

scripts/capture_cv_run_context.py
- Capture a compact git, module, GPU, and experiment-param snapshot.
scripts/init_cv_task_scaffold.py
- Create a reusable research, harness, ablation, agent, plan, journal, and evidence scaffold for a new CV task.
scripts/init_cv_run_card.py
- Create a machine-readable candidate run card for training, benchmark, and promotion evidence.
scripts/init_cv_improvement_harness.py
- Create a machine-readable benchmark, slice, rerun, and auth contract for plateau recovery and score-improvement work.
scripts/init_cv_review_dashboard_manifest.py
- Create a machine-readable review dashboard manifest for runtime sweeps, QA runs, benchmark panels, sync targets, and audit surfaces.
scripts/init_cv_dataset_manifest.py
- Create a reusable dataset identity manifest for shared CV benchmarks and training runs.
scripts/init_cv_browser_run_card.py
- Create a sanitized browser evidence record for Colab, Kaggle, or other notebook UI runs.
scripts/init_cv_validation_scorecard.py
- Create a machine-readable pre-training QA scorecard for browser or notebook hypothesis checks.
scripts/render_cv_run_summary.py
- Render a concise markdown release summary from the machine-readable run card with public-release redaction.
scripts/init_cv_artifact_manifest.py
- Create a machine-readable export-bundle manifest for Colab, Kaggle, or VM artifact pulls with redacted public path metadata.
scripts/init_cv_vm_bootstrap_manifest.py
- Create a machine-readable bootstrap manifest for long VM or cluster training runs with public-release command redaction.
scripts/init_cv_promotion_bundle.py
- Create one promotion entry point that joins semantic, runtime, browser, and product-surface evidence.

Usage Guidance

This bundle looks coherent and targeted at reproducibility: it runs local Python scripts that capture git state, package versions, and GPU details, and that generate sanitized JSON and markdown artifacts. Before running or publishing, consider the following:

1) Provide explicit --out paths you control (avoid writing into shared or public directories).
2) Do not pass secrets or API tokens as --param key=value pairs; the sanitizers try to redact credential-like values but do not guarantee perfect protection.
3) The scripts call local tools (git, nvidia-smi); run them in a sandbox or an environment you trust if you are worried about sensitive repo or host data.
4) Review any omitted files (the listing noted 8 files truncated) if you want full assurance.
5) When publishing run artifacts, follow the publication-security reference included in the bundle to avoid leaking private URLs, CDP endpoints, or tokens.

Capability Analysis

Type: OpenClaw Skill
Name: data-science-cv-repro-lab
Version: 1.9.2

The bundle is a comprehensive framework for managing computer vision experiment reproducibility and provenance. It is notably security-conscious, featuring a dedicated utility script (cv_public_safety.py) and extensive documentation (publication-security.md) designed to prevent the accidental leakage of credentials, private network hostnames, and absolute file paths in generated reports. The included Python scripts perform standard experiment tracking tasks, such as logging git state, GPU status via nvidia-smi, and package versions. They use safe subprocess handling and path sanitization so that evidence captured from environments like Colab or Kaggle remains public-safe.

Capability Tags

crypto
can-make-purchases
requires-oauth-token
requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (CV reproducibility lab) match the bundled artifacts and runtime instructions. The SKILL.md and scripts focus on run cards, dataset manifests, browser run cards, git state, module versions, and GPU snapshots — all expected for a reproducibility/evidence capture tool.

✓ Instruction Scope

Runtime instructions tell the agent to run local Python scripts (init_*, capture_*, render_*) that read repo state (git), module/package versions, and GPU info (nvidia-smi) and write machine-readable manifests. These actions are coherent with the stated purpose. The code includes explicit sanitization helpers (cv_public_safety) to redact credential-like values before creating durable artifacts.

✓ Install Mechanism

No install spec is provided (instruction-only skill); required runtime is Python (python3/python), which is reasonable. No external downloads, package installs, or archive extraction are requested in the metadata or SKILL.md.

✓ Credentials

The skill does not declare or require any environment variables or credentials. Its scripts run local commands (git, nvidia-smi) and accept explicit --out / --repo-root / --bundle-root arguments; nothing requires unrelated secrets or cloud credentials.

✓ Persistence & Privilege

Flags show always:false and user-invocable:true. The skill does not request permanent presence or attempt to modify other skills or system-wide agent settings. It writes only to paths explicitly supplied via arguments (the scripts create output files at provided --out locations).

Version History

v1.9.2

Harden public release safety wording and refresh skill metadata.

v1.9.1

Polish public wording in published skill docs.

v1.9.0

Add review dashboard manifests for synced QA runs, benchmark panels, and audit surfaces.

v1.8.0

Add improvement harnesses, richer run-card schemas, and cleaner public packaging.

v1.7.2

Stop importing ML packages for version capture and reduce public artifact metadata.

v1.7.1

Remove broad env, path-hash, and pip-freeze capture from the public execution bundle.

v1.7.0

Tighten the public CLI to keep path, URL, env, and markdown redaction always on.

v1.6.1

Tighten public summary and point homepage to the public portfolio.

v1.6.0

Add public-safe manifest defaults, browser validation scorecards, and promotion bundles.

Metadata

Slug data-science-cv-repro-lab

Version 1.9.2

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 9

Frequently Asked Questions

What is Data Science CV Repro Lab?

Data Science CV Repro Lab is a public ClawHub CV repro-lab skill. Use it when the user says "cv repro lab", "computer vision reproducibility", "CV experiment... It is an AI Agent Skill for Claude Code / OpenClaw, with 276 downloads so far.

How do I install Data Science CV Repro Lab?

Run "/install data-science-cv-repro-lab" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Data Science CV Repro Lab free?

Yes, Data Science CV Repro Lab is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Science CV Repro Lab support?

Data Science CV Repro Lab is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Data Science CV Repro Lab?

It is built and maintained by Zakhar Pashkin (@zack-dev-cm); the current version is v1.9.2.
