
Data Generator

Name: Data Generator
Author: huaibuer

By HuaiBuer · GitHub · v2.2.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 166

Install in OpenClaw

/install data-generator

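Description

A training-data generation skill. Given a tool name and a list of user instructions, it generates JSONL training data in multi-turn dialogue format. Trigger scenarios: (1) a tool name and a list of user instructions are passed in to generate complete training data; (2) batch generation of labeled data for a specified tool; (3) given a list of instructions, output JSONL dialogue samples.

Usage (SKILL.md)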

Data Generator

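Converts a list of user instructions into standard JSONL training data.

Input

Two required parameters:

Parameter	Type	Description
`tool_name`	string	Tool name, e.g. `dev_control`, `scene_generator`, `weather`
`user_instructions`	string[]	List of user instructions, e.g. `["5分钟后打开空调", "3分钟后关灯"]` ("turn on the air conditioner in 5 minutes", "turn off the lights in 3 minutes")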

Output JSONL format

{"conversations":[
  {"from":"human","value":"<当前用户指令>打开客厅空调</当前用户指令>\n<本地设备>格力冷静王(空调)</本地设备>\n<当前时间>2026-03-15 14:22:47</当前时间>\n<用户场景列表>[{\"scene_id\":1001,\"scene_name\":\"回家模式\",\"room_name\":\"全屋\"},{\"scene_id\":1002,\"scene_name\":\"睡眠模式\",\"room_name\":\"主卧\"}]</用户场景列表>\n<用户设备列表>{\"客厅\":[\"格力冷静王(空调)\",\"洗碗机A1(洗碗机)\"],\"主卧\":[\"美的舒省风(空调)\"]}</用户设备列表>"},
  {"from":"assistant","value":"<tool_call>{\"tool_name\":\"dev_control\",\"query\":\"打开客厅空调\"}</tool_call>"},
  {"from":"observation","value":"<tool_response>客厅空调已打开</tool_response>"},
  {"from":"assistant","value":"好的，客厅空调已经打开啦~"}
],"system":"","history":[]}
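To make the nesting in this sample concrete, the following is a minimal Python sketch that parses one record and pulls the tagged context fields out of the human turn. The function names `parse_context` and `load_record`, and the extra `context` key, are hypothetical helpers for illustration; they are not part of the skill's scripts.

```python
import json
import re

# Matches <tag>content</tag>; the tag names are the literal Chinese labels
# used in the sample record.
TAG_PATTERN = re.compile(r"<([^<>/]+)>(.*?)</\1>", re.DOTALL)


def parse_context(human_value: str) -> dict:
    """Split a human turn into {tag_name: raw_text}."""
    return {name: text for name, text in TAG_PATTERN.findall(human_value)}


def load_record(line: str) -> dict:
    """Parse one JSONL line and decode the JSON-valued context fields."""
    record = json.loads(line)
    context = parse_context(record["conversations"][0]["value"])
    # The scene list and device list are JSON documents embedded in the string.
    for key in ("用户场景列表", "用户设备列表"):
        if key in context:
            context[key] = json.loads(context[key])
    # Hypothetical convenience field; it is not part of the output format.
    record["context"] = context
    return record
```

Note that the scene and device lists are JSON embedded inside a JSON string, which is why they need a second `json.loads` pass.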

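Format rules

human value = the full context, in a fixed format (the tag names are literal and stay in Chinese: 当前用户指令 = current user instruction, 本地设备 = local device, 当前时间 = current time, 用户场景列表 = user scene list, 用户设备列表 = user device list):

<当前用户指令>original user instruction</当前用户指令>
<本地设备>device name(type)</本地设备>
<当前时间>YYYY-MM-DD HH:mm:ss</当前时间>
<用户场景列表>[{"scene_id":xxx,"scene_name":"scene name","room_name":"room name"},...]</用户场景列表>
<用户设备列表>{"room":["device name(type)",...]}</用户设备列表>

assistant tool_call = output the tool_call tag directly, with no filler prefix
observation = <tool_response>...</tool_response>, or <tool_call>{...}</tool_call> (for tools such as dev_info/weather)
assistant final reply = the reply content directly, with no filler prefix
system = "", history = []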
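These rules can be sketched as a small record builder. This is an illustration under stated assumptions, not the skill's actual `gen_data.py`: the function names `build_human_value` and `build_record` are hypothetical, and the sketch only covers the `<tool_response>` form of the observation turn.

```python
import json
from datetime import datetime

# Compact JSON with raw (non-escaped) Chinese text, as in the sample record.
COMPACT = {"ensure_ascii": False, "separators": (",", ":")}


def build_human_value(instruction, local_device, now: datetime, scenes, devices) -> str:
    """Assemble the five context tags, one per line, in the fixed order."""
    return "\n".join([
        f"<当前用户指令>{instruction}</当前用户指令>",
        f"<本地设备>{local_device}</本地设备>",
        f"<当前时间>{now.strftime('%Y-%m-%d %H:%M:%S')}</当前时间>",
        f"<用户场景列表>{json.dumps(scenes, **COMPACT)}</用户场景列表>",
        f"<用户设备列表>{json.dumps(devices, **COMPACT)}</用户设备列表>",
    ])


def build_record(human_value: str, tool_call: dict, tool_response: str, reply: str) -> dict:
    """Build one four-turn record; assistant turns carry no filler prefix."""
    conversations = [
        {"from": "human", "value": human_value},
        {"from": "assistant",
         "value": f"<tool_call>{json.dumps(tool_call, **COMPACT)}</tool_call>"},
        {"from": "observation",
         "value": f"<tool_response>{tool_response}</tool_response>"},
        {"from": "assistant", "value": reply},
    ]
    return {"conversations": conversations, "system": "", "history": []}
```

Serializing the result with `json.dumps(record, ensure_ascii=False)` yields a single line, because the newlines inside the human value are escaped as `\n`.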

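Workflow

1. Receive tool_name + user_instructions[]
         ↓
2. Load the prompt: general requirements + tool-specific requirements (references/tools/{tool}.txt)
         ↓
3. Inject user_instructions into the prompt
         ↓
4. Generate JSONL (each record independent)
         ↓
5. Output the .jsonl file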
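Steps 3 to 5 above can be sketched as a short loop. The generation step itself is left as a pluggable callable, since the source does not describe how it is implemented; `write_jsonl`, `stub_generate`, and the way the instruction is appended to the prompt are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def write_jsonl(
    tool_name: str,
    user_instructions: List[str],
    base_prompt: str,
    generate: Callable[[str, str, str], Dict],
    out_path: str,
) -> Path:
    """Generate one independent record per instruction and write one per line."""
    target = Path(out_path)
    with target.open("w", encoding="utf-8") as handle:
        for instruction in user_instructions:
            # Step 3: inject the instruction (this injection format is an assumption).
            prompt = f"{base_prompt}\n{instruction}"
            # Step 4: each record is generated independently of the others.
            record = generate(tool_name, prompt, instruction)
            # Step 5: JSONL means exactly one JSON object per line.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return target


def stub_generate(tool_name: str, prompt: str, instruction: str) -> Dict:
    """Placeholder for the real generation step; echoes the instruction back."""
    return {
        "conversations": [{"from": "human", "value": instruction}],
        "system": "",
        "history": [],
    }
```

Passing the generator in as a parameter keeps the file-writing logic testable without a model call.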

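Prompt assembly

Assembly rule (the two header lines are literal prompt text: 【工具特定要求】 means "tool-specific requirements" and 本次只调用 means "only call this tool in this run"):

[general requirements]
# ═══════════════════════════════════════════════════════════════════
# 【工具特定要求】
# 本次只调用：{TOOL_NAME}
# ────────────────────────────────────────────────────────────────
[contents of references/tools/{TOOL_NAME}.txt]

Assembly script: scripts/build_prompt.py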
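As a rough illustration of the assembly rule, the sketch below joins the general requirements, the separator header, and the tool-specific file. It is not the contents of `scripts/build_prompt.py`, which the source does not show; the function signature and the choice to pass the general requirements in as a string are assumptions.

```python
from pathlib import Path

# Separator lines copied from the assembly rule.
HEAVY_RULE = "# ═══════════════════════════════════════════════════════════════════"
LIGHT_RULE = "# ────────────────────────────────────────────────────────────────"


def build_prompt(general_text: str, tool_name: str, tools_dir: Path) -> str:
    """Concatenate general requirements, the header block, and the tool file."""
    tool_text = (tools_dir / f"{tool_name}.txt").read_text(encoding="utf-8")
    return "\n".join([
        general_text.rstrip("\n"),
        HEAVY_RULE,
        "# 【工具特定要求】",
        f"# 本次只调用：{tool_name}",
        LIGHT_RULE,
        tool_text.rstrip("\n"),
    ])
```

A call such as `build_prompt(general, "weather", Path("references/tools"))` would read `references/tools/weather.txt` and raise `FileNotFoundError` if the tool has no requirements file.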

Tool-to-file mapping

Tool	Requirements file
`dev_control`	`references/tools/dev_control.txt`
`scene_generator`	`references/tools/scene_generator.txt`
`alarm_remind`	`references/tools/alarm_remind.txt`
`weather`	`references/tools/weather.txt`
`scene_control`	`references/tools/scene_control.txt`
`dev_info`	`references/tools/dev_info.txt`
`exit_dialog`	`references/tools/exit_dialog.txt`
`GreeQA`	`references/tools/GreeQA.txt`
`scene_guide`	`references/tools/scene_guide.txt`
`chat`	`references/tools/chat.txt`
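Because every entry in the table follows the same `references/tools/{tool}.txt` pattern, the lookup can be sketched as a name check plus a path join. The helper name `requirements_path` is hypothetical; the tool names are the ten listed above and are case-sensitive (note `GreeQA`).

```python
from pathlib import Path

# The ten tool names from the mapping table; matching is case-sensitive.
SUPPORTED_TOOLS = (
    "dev_control", "scene_generator", "alarm_remind", "weather",
    "scene_control", "dev_info", "exit_dialog", "GreeQA",
    "scene_guide", "chat",
)


def requirements_path(tool_name: str, root: Path = Path("references/tools")) -> Path:
    """Return the requirements file for a tool, rejecting unknown names."""
    if tool_name not in SUPPORTED_TOOLS:
        raise ValueError(f"unsupported tool_name: {tool_name!r}")
    return root / f"{tool_name}.txt"
```

Rejecting unknown names up front also keeps a caller-supplied `tool_name` from being used to build an arbitrary file path.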

Usage example

Input:

tool_name: "dev_info"
user_instructions: ["家里空调数量", "有几个空调"]

Output fields:

Field	Description
`conversations[0].value`	Contains `<当前用户指令>` plus the full context
`conversations[1].value`	`<tool_call>{"tool_name":"dev_info"}</tool_call>`
`conversations[2].value`	Result returned by dev_info (device list)
`conversations[3].value`	Final text reply
`system`	Empty string `""`
`history`	Empty array `[]`
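The field table lends itself to a simple structural check on generated records. The sketch below is one possible validator, assuming the four-turn layout described in the table; `validate_record` and its messages are hypothetical, and it does not inspect the observation turn because that turn's wrapper varies by tool.

```python
from typing import List

EXPECTED_ROLES = ["human", "assistant", "observation", "assistant"]


def validate_record(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record looks well formed."""
    problems = []
    turns = record.get("conversations", [])
    roles = [turn.get("from") for turn in turns]
    if roles != EXPECTED_ROLES:
        problems.append(f"unexpected turn order: {roles}")
    else:
        if "<当前用户指令>" not in turns[0].get("value", ""):
            problems.append("human turn lacks the <当前用户指令> tag")
        call = turns[1].get("value", "")
        if not (call.startswith("<tool_call>") and call.endswith("</tool_call>")):
            problems.append("first assistant turn is not a bare tool_call")
        if not turns[3].get("value"):
            problems.append("final reply is empty")
    if record.get("system") != "":
        problems.append("system must be an empty string")
    if record.get("history") != []:
        problems.append("history must be an empty array")
    return problems
```

Requiring the first assistant turn to start with `<tool_call>` is one way to enforce the no-filler-prefix rule.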

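Bug-fix data

When the tool_name passed in is the corrected tool after a fix, the generated data should reflect:

The tool call is correctly formatted (it conforms to the tool's requirements file)
The query field is correctly formatted (e.g. delayed instructions include a timing field)
The text reply matches expectations (including a description of the delay)

For the exact format, see references/tools/scene_generator.txt.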

Safety recommendations

This skill appears to do only what it claims: generate JSONL training samples from tool-specific templates and user instruction lists. Before running: (1) inspect a sample output to ensure it doesn't accidentally encode any sensitive content you might include in the input file; (2) do not pass paths to sensitive system files to the --file option (the script will embed file contents into the generated data); (3) if you plan to include real user data or production logs, sanitize or anonymize them first. Otherwise it is safe to install from a provenance perspective (no network calls, no secrets requested).
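Point (2) can be enforced on the caller's side before anything reaches the `--file` option. The sketch below is a hypothetical guard, not part of the skill: it assumes the instruction file holds one instruction per line (the source does not document the file format) and refuses any path that resolves outside a directory you designate.

```python
from pathlib import Path
from typing import List


def read_instruction_file(path, allowed_root) -> List[str]:
    """Read instructions only from files located under allowed_root."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()
    # resolve() follows symlinks and collapses "..", so the check applies to
    # the real location of the file.
    if root != target and root not in target.parents:
        raise PermissionError(f"{target} is outside {root}")
    lines = target.read_text(encoding="utf-8").splitlines()
    # Assumed format: one instruction per line, blank lines ignored.
    return [line.strip() for line in lines if line.strip()]
```

The returned list can then be reviewed or anonymized before it is passed on as `user_instructions`.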

Functional analysis

Type: OpenClaw Skill
Name: data-generator
Version: 2.2.0

The data-generator skill bundle is a utility for creating synthetic JSONL training datasets for a smart home AI assistant. The included Python scripts (scripts/build_prompt.py and scripts/gen_data.py) generate simulated multi-turn dialogues by combining user instructions with randomized device and scene contexts. The logic is entirely focused on data formatting and string manipulation based on tool definitions found in the references/ directory. No indicators of malicious intent, such as data exfiltration, unauthorized system access, or harmful prompt injection, were detected.

Capability assessment

✓ Purpose & Capability

The name and description (generating training data) match the included artifacts: prompt templates, per-tool rule files under references/tools/, and two generator scripts. All required files and behavior (creating JSONL from user instructions) are coherent with the stated purpose.

ℹ Instruction Scope

SKILL.md and scripts constrain behavior to assembling prompts and producing JSONL records. The generator reads local reference files and, if invoked with --file, will read a user-supplied file of instructions — this is expected for the task but means you should not point it at sensitive system files because their contents would be embedded into generated data.

✓ Install Mechanism

No install spec; instruction-only skill with bundled Python scripts. No downloads, no external package installs, and no code executed from remote URLs.

✓ Credentials

The skill requests no environment variables, credentials, or config paths. The scripts operate on provided inputs and bundled reference files only, which is proportionate to the stated functionality.

✓ Persistence & Privilege

Skill is not always-enabled and uses normal model invocation semantics. It does write output files (the generated .jsonl) as expected, but does not modify other skills or request elevated/system-wide privileges.

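How to use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install data-generator
Once installed, invoke the Skill by name or trigger it with /data-generator
Provide the inputs described in the Skill's parameter documentation to get structured output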

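Version history

v2.2.0

Updated the scene_generator prompt: added a supplementary rule that datetime must not be empty when repeat_type=holidays; updated the GreeQA prompt

v2.1.0

Updated the GreeQA prompt: corrected "格力小智" to "格仔", refined the return-distribution rule (2 items 50% / 1 item 25% / 0 items 25%), and added a description of the two-method structure plus sample model replies

v2.0.0

New v2 format: conversations[0].value contains the full context, system is an empty string, history is an empty array, and assistant turns have no filler prefix

v1.2.0

SKILL.md refinement: input is tool_name plus a user_instructions list; focused on JSONL generation

v1.1.0

Added gen_data.py, which takes tool_name plus an instruction list and generates JSONL; improved the SKILL.md description of the input and output formats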

v1.0.0

Data Generator skill initial release.
- Enables generating multi-turn dialogue training data for smart home scenarios in JSONL format based on natural language instructions.
- Supports 10 different tool types, each with specific data generation requirements.
- Provides a prompt-building script for composing tool-specific generation requests.
- Allows batch generation of specified tool training data for standard, special, and bug-fix scenarios.
- Includes comprehensive documentation and usage examples for both manual and automated workflows.

Metadata

Slug data-generator

Version 2.2.0

License MIT-0

Total installs 0

Current installs 0

Version count 6

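FAQ

What is Data Generator?

A training-data generation skill. Given a tool name and a list of user instructions, it generates JSONL training data in multi-turn dialogue format. Trigger scenarios: (1) a tool name and a list of user instructions are passed in to generate complete training data; (2) batch generation of labeled data for a specified tool; (3) given a list of instructions, output JSONL dialogue samples. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 166 total downloads to date.

How do I install Data Generator?

Run the command "/install data-generator" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is Data Generator free?

Yes. Data Generator is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Data Generator support?

Data Generator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Data Generator?

It is developed and maintained by HuaiBuer (@huaibuer); the current version is v2.2.0.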
