Description

Manages end-to-end LoRA training: collects and verifies photos, scrapes datasets, applies quality checks, captions, and trains the LoRA model on RunPod.

README (SKILL.md)

LoRA Pipeline

Name: Lora Pipeline
Author: iskwang

Orchestrates the full LoRA dataset-to-model pipeline. Each phase is self-contained and can be delegated to a sub-agent independently.

Pipeline Overview

Phase 1: Collect reference photos  → collect 3–6 reference face photos
Phase 2: Confirm face identity     → user confirms refs; DeepFace cross-check
Phase 3: Collect datasets          → scrape web sources guided by face features
Phase 4: Verify photos             → face verify + dedup + quality filter + crop
Phase 5: Start captioning          → WD14 local tagging + trigger word
Phase 6: LoRA training             → RunPod Kohya training → retrieve outputs

Phase Index

| Phase | File | Can Sub-Agent | Model | Est. Time |
|---|---|---|---|---|
| 01 - Reference Collection | `phases/01-reference.md` | ✅ | Haiku (Worker) | 5–10 min |
| 02 - Scraping | `phases/02-scraping.md` | ✅ | Haiku (Worker) | 10–30 min |
| 03 - Verify & Clean | `phases/03-verify.md` | ✅ | Haiku (Worker) | 2–5 min |
| 04 - Caption | `phases/04-caption.md` | ✅ | Haiku (Worker) | 1–3 min |
| 05 - Training | `phases/05-training.md` | ✅ | Haiku (Worker) + Sentry | 15–30 min |

To load a specific phase, read `skills/lora-pipeline/phases/<phase-file>`; each file is independently readable.

Directory Structure

~/.openclaw/workspace/
└── datasets/
    ├── face_references/
    │   └── <lora_name>/          # Phase 1–2: Gold standard refs (3–6 photos)
    │       ├── ref_01.jpg
    │       └── ...
    ├── <lora_name>_raw/          # Phase 3: Raw scraped images (pre-verification)
    │   └── ...
    └── <lora_name>/              # Phase 4–5: Verified + captioned training set
        ├── image001.png
        ├── image001.txt
        └── ...
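
The layout above can be derived from a single LoRA name. The following is a minimal sketch, not part of the skill: `dataset_paths` is a hypothetical helper, and the workspace root is copied from the tree above.

```python
from pathlib import Path

# Workspace root taken from the tree above; pass another root if yours differs.
WORKSPACE = Path.home() / ".openclaw" / "workspace"


def dataset_paths(lora_name: str, workspace: Path = WORKSPACE) -> dict:
    """Return the three per-LoRA directories shown in the tree (hypothetical helper)."""
    # Reject empty names and anything that would escape the datasets directory.
    if lora_name in {"", ".", ".."} or Path(lora_name).name != lora_name:
        raise ValueError(f"invalid lora_name: {lora_name!r}")
    datasets = workspace / "datasets"
    return {
        "references": datasets / "face_references" / lora_name,  # Phase 1-2
        "raw": datasets / f"{lora_name}_raw",                     # Phase 3
        "training": datasets / lora_name,                         # Phase 4-5
    }
```

Building every path from one function keeps the phases in agreement about where inputs and outputs live, and avoids absolute paths tied to one machine.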

Privacy Rules (CRITICAL — All Phases)

NO DATA INSPECTION: Do NOT cat, read, or analyze image file contents or .txt caption files.
NO CLOUD UPLOAD: All face verification (DeepFace) must run locally. Never send images to cloud APIs.
NO DATA LEAKAGE: Do not describe dataset details (person names, attributes) to the LLM unnecessarily.
Treat datasets as opaque binary blobs except when running local scripts.

Quality Standards (SDXL)

Resolution: 1024×1024 minimum after crop
Format: Convert all to PNG before training
No black borders: Run autocrop before final save
Dataset diversity: ≥30% clothed/natural skin shots
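
The resolution and format rules can be checked locally without decoding any pixels. The sketch below is an illustration under assumptions, not the skill's own checker: `png_size` and `undersized_pngs` are hypothetical names, and the code reads only the fixed-size PNG header (signature plus IHDR chunk) using the standard library.

```python
import struct
from pathlib import Path

MIN_SIDE = 1024  # SDXL minimum from the standards above
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple:
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    with open(path, "rb") as fh:
        header = fh.read(24)  # 8-byte signature + chunk length + type + width + height
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def undersized_pngs(dataset_dir: Path, min_side: int = MIN_SIDE) -> list:
    """List PNG file names whose width or height is below min_side."""
    failures = []
    for path in sorted(dataset_dir.glob("*.png")):
        width, height = png_size(path)
        if min(width, height) < min_side:
            failures.append(path.name)
    return failures
```

Because only header metadata is read and only file names are returned, a check like this stays within the privacy rules when run as a local script.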

Scripts

Script	Location	Purpose
`tag_batch.py`	`skills/lora-pipeline/scripts/tag_batch.py`	Local WD14 ONNX tagger for a directory
`smart_crop.py`	`skills/lora-pipeline/scripts/smart_crop.py`	Interactive or automated single-subject cropping
`batch_lora_train.py`	`skills/lora-pipeline/scripts/batch_lora_train.py`	Kohya batch training runner for RunPod

Sub-Agent Protocol

Each phase file contains:

Input Contract — what must already exist before this phase starts
Output Contract — what this phase produces
Completion Signal — how to report back (sessions_send + status file fallback)
Error Escalation — sub-agent reports to parent, never self-escalates model tier
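
The status file fallback in the Completion Signal could look roughly like the sketch below. The file name, JSON fields, and `write_status` helper are all assumptions made for illustration; the phase files define the real contract, and `sessions_send` is not modeled here.

```python
import json
import os
import time
from pathlib import Path


def write_status(status_dir: Path, phase: str, ok: bool, detail: str = "") -> Path:
    """Write <phase>.status.json atomically so a parent never reads a partial file."""
    status_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "phase": phase,
        "status": "done" if ok else "error",
        "detail": detail,  # keep this free of dataset details (privacy rules)
        "finished_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    target = status_dir / f"{phase}.status.json"
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    os.replace(tmp, target)  # atomic rename when both paths are on the same filesystem
    return target
```

Writing to a temporary file and renaming it means a parent agent polling the directory sees either no status file or a complete one.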

Usage Guidance

This skill implements a full LoRA training pipeline but is sloppy: it doesn't declare the system tools and Python libraries it needs, contains hardcoded paths (e.g., /Users/mini/...), and assumes you have runpodctl, SSH keys, and local model files. Before installing or running:

1) Do not run it blindly: inspect and fix the absolute paths in tag_batch.py and other scripts.
2) Ensure you understand and consent to uploading datasets to remote RunPod pods, and that you control the SSH keys used.
3) Verify that the required Python packages and ONNX/WD14 models are installed in known locations, or change the scripts to use configurable paths.
4) Confirm you have permission to scrape and use the images (privacy and legal risk).
5) If you expect a small, local-only helper, this skill is overprivileged; if you intend cloud training, validate the runpodctl configuration and review the SCP/SSH commands carefully.

If you want, provide the missing dependency list and replace hardcoded paths and I'll re-evaluate.
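
One way to replace a hardcoded model path with a configurable one is sketched below. This is not the actual code of tag_batch.py: the `--model-dir` flag, the `LORA_TAGGER_MODEL_DIR` environment variable, and both function names are hypothetical, chosen only to show the pattern.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional

ENV_VAR = "LORA_TAGGER_MODEL_DIR"  # hypothetical variable name


def resolve_model_dir(cli_value: Optional[str] = None) -> Path:
    """Pick the tagger model directory: CLI flag first, then the environment variable."""
    raw = cli_value or os.environ.get(ENV_VAR)
    if not raw:
        # Fail loudly instead of falling back to a path tied to one machine.
        raise ValueError(f"set --model-dir or {ENV_VAR}")
    return Path(raw).expanduser()


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the image directory and the optional model directory override."""
    parser = argparse.ArgumentParser(description="Tag a directory of images (sketch).")
    parser.add_argument("image_dir", type=Path)
    parser.add_argument("--model-dir", default=None,
                        help="directory holding the WD14 ONNX model")
    args = parser.parse_args(argv)
    args.model_dir = resolve_model_dir(args.model_dir)
    return args
```

Refusing to guess a default makes a missing configuration visible immediately, rather than letting the script read from an unexpected location.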

Capability Analysis

Type: OpenClaw Skill
Name: lora-pipeline
Version: 1.0.0

The skill bundle provides a complex LoRA training pipeline involving web scraping, local face verification (DeepFace), and remote training on RunPod. It is classified as suspicious because it requires high-risk capabilities, including remote command execution via SSH, automated cloud instance management (runpodctl), and web scraping with JavaScript execution. These are plausibly needed for the stated purpose, but they represent a significant attack surface. Additionally, 'scripts/tag_batch.py' contains hardcoded absolute file paths ('/Users/mini/...') that would cause failures or unexpected behavior on other systems. Despite these risks, the bundle includes explicit privacy-preserving instructions for the agent, such as 'NO CLOUD UPLOAD' for face data and 'NO DATA INSPECTION' of dataset contents.

Capability Assessment

⚠ Purpose & Capability

The skill's description (end-to-end LoRA pipeline) matches the instructions and included scripts. However, the registry metadata declares no required binaries or environment variables, while the SKILL.md explicitly depends on runpodctl, ssh/scp, unzip, Python with many packages (deepface, opencv, onnxruntime, pandas, PIL), and local ONNX/WD14 tagger models. That mismatch between no declared dependencies and a heavy required toolchain is inconsistent and will cause failures or implicit network activity to fetch models and tools.

⚠ Instruction Scope

Runtime instructions include web scraping (browser JS snippets and instructions to bypass SNS login via mirrors), extensive filesystem operations, spawning sub-agents, scp/ssh upload to remote RunPod pods, and automated remote training. The SKILL.md's 'NO DATA INSPECTION/NO CLOUD UPLOAD' guidance is contradictory in places (e.g., it forbids sending images to cloud APIs for verification but instructs uploading datasets to remote pods for training). The agent is instructed to perform network transfers (scp/ssh) and spawn long-running sub-agents which are beyond simple local helper behavior — these are appropriate for training but require clear declared permissions and user consent.

ℹ Install Mechanism

There is no install spec (instruction-only), which lowers install risk. But included scripts assume many preinstalled binaries and libraries (accelerate path '/venv/bin/accelerate', runpodctl, system Python packages) and expect model files to exist locally. No mechanism is provided to install or verify those dependencies; this is an operational risk (failures or implicit downloads at runtime).
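
Since the bundle ships no dependency check, a preflight script along these lines could be run before any phase. It is a sketch under assumptions: the binary and package lists are taken from this assessment, the import names `cv2` and `PIL` are the usual ones for opencv and Pillow rather than something confirmed from the skill's scripts, and `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Names taken from the assessment above. The import names cv2 and PIL are an
# assumption about what the scripts actually import.
REQUIRED_BINARIES = ["runpodctl", "ssh", "scp", "unzip"]
REQUIRED_MODULES = ["deepface", "cv2", "onnxruntime", "pandas", "PIL"]


def missing_dependencies(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES):
    """Return (missing_binaries, missing_modules) without importing or installing anything."""
    # shutil.which looks the name up on PATH; None means it was not found.
    missing_bins = [name for name in binaries if shutil.which(name) is None]
    # find_spec locates a top-level module without executing it.
    missing_mods = [name for name in modules if importlib.util.find_spec(name) is None]
    return missing_bins, missing_mods
```

Calling `missing_dependencies()` and refusing to continue when either list is non-empty turns a silent runtime failure or implicit download into an explicit, early error.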

⚠ Credentials

The skill requests no declared environment variables or credentials, yet the workflow requires access to the user's SSH key, runpodctl configuration, and possibly local model directories (e.g., tag_batch.py hardcodes '/Users/mini/.openclaw/...'). Hardcoded absolute paths and implicit reliance on SSH keys / known_hosts files are disproportionate to a clean, portable skill design and risk accidental use of personal files or keys. The skill also requires RunPod credits / account access (implied) but doesn't declare or request credentials explicitly.

✓ Persistence & Privilege

The skill is not force-installed (always:false) and follows the normal model-invocation defaults. It uses sub-agents and sessions_spawn as part of its design; this autonomous behavior is expected for long-running training tasks. Nothing in the package attempts to modify other skills or grant itself permanent system-wide privileges.

Version History

v1.0.0

Initial release of the lora-pipeline skill: end-to-end LoRA model training pipeline.
- Automates the full process: photo collection, face verification, dataset scraping and cleaning, captioning, and LoRA training.
- Each phase is modular and can be delegated to a sub-agent independently.
- Includes strict privacy rules: no cloud uploads, all verifications run locally, never read or leak dataset contents.
- Provides scripts for captioning, smart cropping, and batch training.
- Ensures high-quality datasets: PNG format, 1024×1024 resolution minimum, no black borders, and enforced diversity.
- Detailed directory structure and phase documentation for transparency and reproducibility.

Metadata

Slug lora-pipeline

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Lora Pipeline?

Manages end-to-end LoRA training: collects and verifies photos, scrapes datasets, applies quality checks, captions, and trains the LoRA model on RunPod. It is an AI Agent Skill for Claude Code / OpenClaw, with 265 downloads so far.

How do I install Lora Pipeline?

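Run "/install lora-pipeline" in the OpenClaw or Claude Code chat to install it in one step. The install itself needs no extra setup, but the skill's scripts assume that runpodctl, ssh/scp, unzip, the required Python packages, and the WD14 tagger models are already present.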

Is Lora Pipeline free?

Yes, Lora Pipeline is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Lora Pipeline support?

Lora Pipeline is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Lora Pipeline?

It is built and maintained by iskWang (@iskwang); the current version is v1.0.0.
