Description

Provides tools for implementing feedback loops to fine-tune LLM agents using user feedback for continuous personalization and improvement, including training...

Usage Instructions (SKILL.md)

Feedback Loop Fine-Tuner

Name: feedback-loop-fine-tuner
Author: jpengcheng523-netizen

Implement feedback loops to fine-tune LLM agents using user feedback for continuous personalization and improvement.

When to Use

Collecting user feedback from agent interactions
Generating training datasets for fine-tuning
Optimizing prompts based on feedback
Tracking improvement metrics over time
Running A/B tests for prompt variants
Implementing RLHF preference learning

Usage

const fineTuner = require('./skills/feedback-loop-fine-tuner');

// Collect feedback
const feedback = fineTuner.collectFeedback({
  conversationId: 'conv_123',
  messageId: 'msg_456',
  query: 'What is machine learning?',
  response: 'Machine learning is...',
  rating: 'positive',
  model: 'llama-3',
  temperature: 0.7
});

// Generate training data
const dataset = fineTuner.generateTrainingData(feedbackHistory, {
  format: 'openai',
  includeCorrections: true
});

// Optimize prompts
const optimization = fineTuner.optimizePrompts(feedbackHistory, {
  'default': 'You are a helpful assistant.',
  'detailed': 'You are a detailed, thorough assistant.'
});

// Track improvement
const improvement = fineTuner.trackImprovement(beforeMetrics, afterMetrics);

// Create A/B test
const experiment = fineTuner.createABTest('prompt_test', [
  { name: 'control', template: 'You are helpful.' },
  { name: 'variant', template: 'You are a detailed, helpful assistant.' }
]);

API

`collectFeedback(interaction)`

Collect feedback from a user interaction.

const feedback = collectFeedback({
  conversationId: 'conv_123',
  messageId: 'msg_456',
  query: 'Explain quantum computing',
  response: 'Quantum computing uses...',
  rating: 'positive', // 'positive', 'negative', 'neutral', 'correction'
  userCorrection: null, // Optional corrected response
  model: 'llama-3-70b',
  temperature: 0.7,
  promptTemplate: 'default',
  responseTime: 1500,
  tokensUsed: 256
});
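Since `rating` accepts only four values, it may help to check an interaction before handing it to `collectFeedback`. The following is a minimal sketch and not part of the skill: `validateInteraction` is a hypothetical helper, and the rule that a `'correction'` rating needs a non-empty `userCorrection` is an assumption rather than documented behavior.

```javascript
// Hypothetical pre-check for interactions; not part of the skill's API.
const ALLOWED_RATINGS = ['positive', 'negative', 'neutral', 'correction'];

function validateInteraction(interaction) {
  const problems = [];
  for (const field of ['query', 'response']) {
    if (typeof interaction[field] !== 'string' || interaction[field].trim() === '') {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  if (!ALLOWED_RATINGS.includes(interaction.rating)) {
    problems.push(`rating must be one of: ${ALLOWED_RATINGS.join(', ')}`);
  }
  // Assumption: a 'correction' rating is only useful with a corrected response.
  if (interaction.rating === 'correction' && !interaction.userCorrection) {
    problems.push("userCorrection is required when rating is 'correction'");
  }
  return problems; // an empty array means the interaction looks usable
}
```

An empty result means the record can be passed on; otherwise the returned messages can be logged or shown to the caller.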

`generateTrainingData(feedbackHistory, options)`

Generate fine-tuning dataset from feedback history.

const dataset = generateTrainingData(feedbackHistory, {
  includeNegative: false,
  includeCorrections: true,
  minRating: 'neutral',
  format: 'jsonl' // 'jsonl', 'openai', 'llama', 'alpaca'
});

Supported formats:

jsonl: { "prompt": "...", "completion": "..." }
openai: { "messages": [{ "role": "user", "content": "..." }, ...] }
llama: Llama 3 chat format with special tokens
alpaca: { "instruction": "...", "input": "", "output": "..." }
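To make the formats listed above concrete, here is a rough sketch of how a single feedback record might be mapped to each one. `toTrainingRecord` is a hypothetical function, not the skill's implementation; the choice to substitute `userCorrection` for the original response, and the exact Llama 3 string layout the skill emits, are assumptions.

```javascript
// Hypothetical formatter; field names follow the format list above,
// but the skill's actual output may differ in detail.
function toTrainingRecord(fb, format = 'jsonl') {
  // Assumption: a user correction, when present, replaces the original response.
  const answer = fb.userCorrection || fb.response;
  switch (format) {
    case 'jsonl':
      return JSON.stringify({ prompt: fb.query, completion: answer });
    case 'openai':
      return JSON.stringify({
        messages: [
          { role: 'user', content: fb.query },
          { role: 'assistant', content: answer }
        ]
      });
    case 'alpaca':
      return JSON.stringify({ instruction: fb.query, input: '', output: answer });
    case 'llama':
      // Llama 3 chat template: header tokens around each role, <|eot_id|> after each turn.
      return (
        '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n' + fb.query + '<|eot_id|>' +
        '<|start_header_id|>assistant<|end_header_id|>\n\n' + answer + '<|eot_id|>'
      );
    default:
      throw new Error(`Unsupported format: ${format}`);
  }
}

const line = toTrainingRecord(
  { query: 'What is DL?', response: 'DL is...', userCorrection: 'Deep learning is...' },
  'alpaca'
);
console.log(line);
```

Each call yields one line of output, so a full dataset would be the records joined with newline characters.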

`optimizePrompts(feedbackHistory, templates)`

Optimize prompts based on feedback analysis.

const result = optimizePrompts(feedbackHistory, {
  'concise': 'Answer briefly.',
  'detailed': 'Answer with full details.',
  'friendly': 'Answer in a friendly tone.'
});

console.log(result.bestTemplate); // 'detailed'
console.log(result.suggestions); // [{ type: 'length', suggestion: '...' }]
console.log(result.optimizedVariant); // Optimized prompt template
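The source does not say how `bestTemplate` is chosen. One plausible approach, sketched here under that caveat, is to rank templates by their share of positive ratings. `rankTemplates` is a hypothetical helper; it relies on the `promptTemplate` and `rating` fields shown in the `collectFeedback` example.

```javascript
// Hypothetical ranking of templates by share of positive ratings.
function rankTemplates(feedbackHistory) {
  const stats = new Map();
  for (const fb of feedbackHistory) {
    // Assumption: records without promptTemplate count toward 'default'.
    const name = fb.promptTemplate || 'default';
    const s = stats.get(name) || { total: 0, positive: 0 };
    s.total += 1;
    if (fb.rating === 'positive') s.positive += 1;
    stats.set(name, s);
  }
  return [...stats.entries()]
    .map(([name, s]) => ({ name, total: s.total, positiveRate: s.positive / s.total }))
    .sort((a, b) => b.positiveRate - a.positiveRate);
}
```

With few samples per template the ranking is noisy, so a minimum sample count before trusting the top entry would be a sensible addition.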

`trackImprovement(before, after)`

Track improvement metrics between two snapshots.

const improvement = trackImprovement(
  { qualityScore: 0.65, positiveRate: 0.70 },
  { qualityScore: 0.82, positiveRate: 0.85 }
);

console.log(improvement.qualityScore);
// { baseline: 0.65, current: 0.82, change: 0.17, percentChange: 26.15, improved: true }
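The numbers in the output above are consistent with a simple difference and relative change, rounded to two decimals. The sketch below reproduces that arithmetic; `compareMetric` and `round2` are hypothetical names, and treating any positive change as an improvement is an assumption that would need inverting for metrics where lower is better.

```javascript
// Hypothetical comparison of one metric between two snapshots.
const round2 = (x) => Math.round(x * 100) / 100;

function compareMetric(baseline, current) {
  const change = current - baseline;
  return {
    baseline,
    current,
    change: round2(change),
    // Percent change is undefined for a zero baseline, so report null.
    percentChange: baseline === 0 ? null : round2((change / baseline) * 100),
    // Assumption: higher is better; invert for metrics such as response time.
    improved: change > 0
  };
}

console.log(JSON.stringify(compareMetric(0.65, 0.82)));
// {"baseline":0.65,"current":0.82,"change":0.17,"percentChange":26.15,"improved":true}
```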

`generateImprovementReport(metricsHistory)`

Generate comprehensive improvement report.

const report = generateImprovementReport([
  { qualityScore: 0.65, positiveRate: 0.70 },
  { qualityScore: 0.72, positiveRate: 0.75 },
  { qualityScore: 0.82, positiveRate: 0.85 }
]);

console.log(report.trends.qualityScore.direction); // 'improving'
console.log(report.summary.latestQualityScore); // 0.82
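How the report derives a trend direction is not documented. A least-squares slope over the snapshot order is one common way to do it, sketched here with a hypothetical `trendDirection` function. Only the `'improving'` label appears in the source; the other labels and the tolerance value are assumptions.

```javascript
// Hypothetical trend classifier using a least-squares slope over snapshot order.
function trendDirection(values, tolerance = 0.005) {
  const n = values.length;
  if (n < 2) return 'insufficient_data';
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  const slope = num / den; // average change per snapshot
  // Labels other than 'improving' are assumptions; the source shows only that one.
  if (slope > tolerance) return 'improving';
  if (slope < -tolerance) return 'declining';
  return 'stable';
}

console.log(trendDirection([0.65, 0.72, 0.82])); // 'improving'
```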

`createABTest(name, variants, config)`

Create an A/B test experiment for prompt variants.

const experiment = createABTest('tone_test', [
  { name: 'formal', template: 'You are a formal assistant.' },
  { name: 'casual', template: 'You are a friendly, casual assistant.' }
], {
  trafficSplit: [0.5, 0.5],
  minSamples: 100,
  confidenceLevel: 0.95
});
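A `trafficSplit` such as `[0.5, 0.5]` suggests weighted random assignment of each request to a variant. The following sketch shows one way that could work; `pickVariantIndex` is a hypothetical helper, not the skill's `assignVariant`, and the injectable `random` parameter is added only to make the behavior testable.

```javascript
// Hypothetical weighted assignment; `random` is injectable for testing.
function pickVariantIndex(trafficSplit, random = Math.random) {
  const total = trafficSplit.reduce((sum, w) => sum + w, 0);
  if (!(total > 0)) throw new Error('trafficSplit must contain a positive weight');
  const r = random() * total;
  let cumulative = 0;
  for (let i = 0; i < trafficSplit.length; i++) {
    cumulative += trafficSplit[i];
    if (r < cumulative) return i;
  }
  return trafficSplit.length - 1; // guard against floating-point rounding
}
```

Purely random assignment can give the same user different variants across requests; hashing a stable user or conversation ID into the `[0, 1)` range instead of calling `Math.random` would keep assignment sticky.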

Classes

FeedbackCollector

Collect and aggregate user feedback.

const collector = new FeedbackCollector();

// Collect individual feedback
const fb = collector.collectFeedback(interaction);

// Batch collect
collector.batchCollect(interactions);

// Aggregate by category
const aggregation = collector.aggregateByCategory({
  start: Date.now() - 7 * 24 * 60 * 60 * 1000, // Last 7 days
  end: Date.now()
});

// Export for analysis
const csv = collector.exportFeedback('csv');
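Feedback text often contains commas, quotes, and line breaks, so any CSV export needs careful quoting. The sketch below shows the usual quoting rules with a hypothetical `toCsv` function; the column set and the behavior of the skill's own `exportFeedback('csv')` are not documented here and may differ.

```javascript
// Hypothetical CSV serializer with RFC 4180-style quoting.
function toCsv(records, columns) {
  const escape = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote fields containing commas, quotes, or line breaks; double embedded quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const header = columns.map(escape).join(',');
  const rows = records.map((rec) => columns.map((col) => escape(rec[col])).join(','));
  return [header, ...rows].join('\n');
}
```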

TrainingDatasetGenerator

Generate fine-tuning datasets from feedback.

const generator = new TrainingDatasetGenerator();

// Generate training data
const dataset = generator.generateTrainingData(feedbackHistory, { format: 'openai' });

// Generate preference pairs for RLHF
const pairs = generator.generatePreferencePairs(feedbackHistory);

// Split into train/validation
const { train, validation } = generator.splitDataset(examples, 0.8);
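A train/validation split is normally done on shuffled data so that both parts reflect the same distribution. The sketch below illustrates this with a seeded shuffle for reproducibility. `splitExamples` and `seededRandom` are hypothetical; whether the skill's `splitDataset` shuffles at all, or accepts a seed, is an assumption not confirmed by the source.

```javascript
// Hypothetical seeded split. The PRNG is a small mulberry32-style generator.
function seededRandom(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function splitExamples(examples, trainRatio = 0.8, seed = 42) {
  const shuffled = [...examples];
  const random = seededRandom(seed);
  // Fisher-Yates shuffle on a copy so the input array is left untouched.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.floor(shuffled.length * trainRatio);
  return { train: shuffled.slice(0, cut), validation: shuffled.slice(cut) };
}
```

Using the same seed yields the same split on every run, which keeps validation metrics comparable between experiments.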

PromptOptimizer

Optimize prompts based on feedback.

const optimizer = new PromptOptimizer();

// Register templates
optimizer.registerTemplate('default', 'You are helpful.');
optimizer.registerTemplate('detailed', 'You are detailed and thorough.');

// Update performance
optimizer.updatePerformance('default', feedback);

// Get best template
const best = optimizer.getBestTemplate();

// Get improvement suggestions
const suggestions = optimizer.suggestImprovements(feedbackHistory, 'default');

// Generate optimized variant
const variant = optimizer.generateVariant('default', suggestions);

ImprovementTracker

Track improvement metrics over time.

const tracker = new ImprovementTracker();

// Set baseline
tracker.setBaseline('initial', { qualityScore: 0.5 });

// Record snapshots
tracker.recordSnapshot({ qualityScore: 0.6 });
tracker.recordSnapshot({ qualityScore: 0.7 });

// Calculate improvement
const improvement = tracker.calculateImprovement({ qualityScore: 0.8 }, 'initial');

// Get trend
const trend = tracker.getTrend('qualityScore', 10);

// Generate report
const report = tracker.generateReport();

ABTester

Run A/B tests for prompt variants.

const tester = new ABTester();

// Create experiment
tester.createExperiment('tone_test', [
  { name: 'formal', template: 'Be formal.' },
  { name: 'casual', template: 'Be casual.' }
]);

// Assign variant
const variant = tester.assignVariant('tone_test');

// Record result
tester.recordResult('tone_test', variant.variantIndex, {
  rating: 'positive',
  responseTime: 1200
});

// Analyze results
const analysis = tester.analyzeResults('tone_test');

// Stop experiment
tester.stopExperiment('tone_test');

Example: Complete Feedback Loop

const fineTuner = require('./skills/feedback-loop-fine-tuner');

// 1. Initialize components
const collector = new fineTuner.FeedbackCollector();
const generator = new fineTuner.TrainingDatasetGenerator();
const optimizer = new fineTuner.PromptOptimizer();
const tracker = new fineTuner.ImprovementTracker();

// 2. Register prompt templates
optimizer.registerTemplate('v1', 'You are a helpful assistant.');
optimizer.registerTemplate('v2', 'You are a detailed, helpful assistant.');

// 3. Set baseline
tracker.setBaseline('initial', {
  qualityScore: 0.5,
  positiveRate: 0.5,
  avgResponseTime: 2000
});

// 4. Collect feedback (simulated)
const interactions = [
  { conversationId: 'c1', query: 'What is AI?', response: 'AI is...', rating: 'positive' },
  { conversationId: 'c2', query: 'Explain ML', response: 'ML is...', rating: 'negative' },
  { conversationId: 'c3', query: 'What is DL?', response: 'DL is...', rating: 'positive', userCorrection: 'Deep learning is a subset of ML that uses neural networks...' }
];

for (const interaction of interactions) {
  const feedback = collector.collectFeedback(interaction);
  optimizer.updatePerformance(interaction.promptTemplate || 'v1', feedback);
}

// 5. Generate training data
const feedbackHistory = collector.feedbackStore;
const trainingData = generator.generateTrainingData(feedbackHistory, {
  format: 'openai',
  includeCorrections: true
});

console.log('Training examples:', trainingData.split('\n').length);

// 6. Optimize prompts
const optimization = fineTuner.optimizePrompts(feedbackHistory, {
  'v1': 'You are a helpful assistant.',
  'v2': 'You are a detailed, helpful assistant.'
});

console.log('Best template:', optimization.bestTemplate);
console.log('Suggestions:', optimization.suggestions);

// 7. Track improvement
const aggregation = collector.aggregateByCategory();
tracker.recordSnapshot({
  qualityScore: aggregation.qualityScore,
  positiveRate: aggregation.byRating.positive?.length / aggregation.total || 0,
  avgResponseTime: aggregation.avgResponseTime,
  totalFeedback: aggregation.total
});

const report = tracker.generateReport();
console.log('Improvement trend:', report.trends.qualityScore?.direction);

Example: RLHF Preference Learning

const fineTuner = require('./skills/feedback-loop-fine-tuner');
const generator = new fineTuner.TrainingDatasetGenerator();

// Collect feedback with comparisons
const feedbackHistory = [
  { query: 'Explain AI', rating: 'positive', response: 'AI is artificial intelligence...' },
  { query: 'Explain AI', rating: 'negative', response: 'AI means artificial intelligence.' }
];

// Generate preference pairs
const pairs = generator.generatePreferencePairs(feedbackHistory);

console.log('Preference pairs:');
for (const pair of pairs) {
  console.log(`Prompt: ${pair.prompt}`);
  console.log(`Chosen: ${pair.chosen.substring(0, 50)}...`);
  console.log(`Rejected: ${pair.rejected.substring(0, 50)}...`);
}
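For readers curious how such pairs could be derived, here is a minimal sketch. `buildPreferencePairs` is a hypothetical stand-in for `generatePreferencePairs`; it reduces "similar queries" to an exact match after lowercasing and whitespace normalization, which is an assumption, since the skill's similarity rule is not documented. The `prompt`, `chosen`, and `rejected` field names follow the example above.

```javascript
// Hypothetical pairing: "similar" is reduced to an exact match after
// lowercasing and whitespace normalization.
function buildPreferencePairs(feedbackHistory) {
  const normalize = (q) => q.trim().toLowerCase().replace(/\s+/g, ' ');
  const groups = new Map();
  for (const fb of feedbackHistory) {
    const key = normalize(fb.query);
    const group = groups.get(key) || { prompt: fb.query, positive: [], negative: [] };
    if (fb.rating === 'positive') group.positive.push(fb.response);
    if (fb.rating === 'negative') group.negative.push(fb.response);
    groups.set(key, group);
  }
  const pairs = [];
  for (const { prompt, positive, negative } of groups.values()) {
    // Pair every positive response with every negative one for the same query.
    for (const chosen of positive) {
      for (const rejected of negative) {
        pairs.push({ prompt, chosen, rejected });
      }
    }
  }
  return pairs;
}
```

Queries that have only positive or only negative feedback produce no pairs, so the number of pairs can be much smaller than the feedback history.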

Notes

Feedback ratings: 'positive', 'negative', 'neutral', 'correction'
User corrections are treated as high-quality training examples
Preference pairs are generated from positive/negative feedback on similar queries
A/B testing uses simplified statistical significance (use proper libraries for production)
Training data formats support OpenAI, Llama 3, and Alpaca fine-tuning
All metrics are calculated locally without external dependencies
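As an illustration of what a simplified significance check for an A/B test might look like, the sketch below runs a two-proportion z-test on positive-rating rates. `compareVariants` is hypothetical and is not claimed to match the skill's `analyzeResults`; it relies on the normal approximation, which is unreliable for small samples, and supports only the three confidence levels in its lookup table.

```javascript
// Hypothetical two-proportion z-test on positive-rating rates.
const Z_CRITICAL = { 0.9: 1.645, 0.95: 1.96, 0.99: 2.576 }; // two-sided critical values

function compareVariants(a, b, confidenceLevel = 0.95) {
  const critical = Z_CRITICAL[confidenceLevel];
  if (critical === undefined) throw new Error('Unsupported confidence level');
  const rateA = a.positive / a.total;
  const rateB = b.positive / b.total;
  // Pooled rate under the null hypothesis that both variants perform the same.
  const pooled = (a.positive + b.positive) / (a.total + b.total);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total));
  const z = standardError === 0 ? 0 : (rateA - rateB) / standardError;
  return { rateA, rateB, z, significant: Math.abs(z) >= critical };
}

const result = compareVariants({ positive: 60, total: 100 }, { positive: 45, total: 100 });
console.log(result.significant, result.z.toFixed(2));
```

Checking significance repeatedly while an experiment is still running inflates the false-positive rate, which is one reason to wait for a fixed sample size such as `minSamples` before reading the result.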

Security Recommendations

This skill appears to do what it says: local collection, analysis, and formatting of user feedback for dataset preparation. Before installing or using it, consider: (1) Privacy — the skill will aggregate user interactions and can export datasets (JSON/CSV/jsonl) that may include PII or sensitive conversation content; ensure you filter or redact data before training or sharing. (2) Scope — the module prepares data but does not perform model training or upload to external services, so plan how/where you'll run fine-tuning or RLHF steps. (3) Code review — although included code shows no network calls or secret access, review the full (non-truncated) index.js to confirm there are no hidden endpoints or telemetry. (4) Test in a sandboxed environment and enforce policies about what feedback may be captured (e.g., do not collect credentials). If you need automatic cloud training integrations, prefer a skill that explicitly requests and documents the required credentials and endpoints.
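As a starting point for the privacy recommendation above, feedback text can be passed through a redaction step before export or training. The sketch below is illustrative only: `redactText` and `redactFeedback` are hypothetical helpers, and the two patterns catch only common email and phone formats, so they are no substitute for a proper PII review.

```javascript
// Hypothetical redaction pass; the patterns are deliberately simple and will
// miss many kinds of PII (names, addresses, account numbers, and so on).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function redactText(text) {
  return String(text).replace(EMAIL, '[EMAIL]').replace(PHONE, '[PHONE]');
}

function redactFeedback(fb) {
  const copy = { ...fb }; // leave the stored record untouched
  for (const field of ['query', 'response', 'userCorrection']) {
    if (typeof copy[field] === 'string') copy[field] = redactText(copy[field]);
  }
  return copy;
}

console.log(redactText('Mail jane@example.com or call +1 (555) 123-4567 now'));
// Mail [EMAIL] or call [PHONE] now
```

Applying `redactFeedback` to every record before dataset generation keeps the redaction in one place.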

Functional Analysis

Type: OpenClaw Skill
Name: jpeng-feedback-loop-fine-tuner
Version: 1.0.0

The skill provides a comprehensive set of tools for managing LLM feedback loops, including feedback collection, training dataset generation (JSONL, OpenAI, Llama formats), and prompt optimization. Analysis of index.js and SKILL.md reveals no network activity, filesystem access, or use of dangerous functions like eval or exec. The code logic is transparent, aligns perfectly with the stated purpose, and contains no indicators of malicious intent or prompt injection vulnerabilities.

Capability Assessment

ℹ Purpose & Capability

The name/description (feedback-loop fine-tuner) matches the included SKILL.md and index.js: the code implements feedback collection, aggregation, dataset generation (jsonl/openai/llama/alpaca), preference-pair generation, prompt optimization, and metrics tracking. One note: the skill describes 'fine-tuning' and 'RLHF' workflows but the implementation focuses on data preparation and analysis (no built-in training calls or cloud upload). That is a legitimate design choice for a local library, but users expecting automated model training integrations should not assume those are present.

✓ Instruction Scope

SKILL.md instructions are narrowly scoped to collecting feedback, generating datasets, optimizing prompts, tracking metrics, and running A/B tests. They do not instruct reading arbitrary system files, contacting external endpoints, or accessing environment variables beyond what the module exposes. The example usage assumes requiring the module from a local path, which is normal for a Node library.

✓ Install Mechanism

No install spec is provided (instruction-only plus a local index.js), so nothing will be downloaded or installed by the platform. The package.json is minimal and the code is included in the bundle. This is low-risk from an install/execution vector perspective.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths and the code does not reference process.env or external secrets. That matches the stated purpose (local data processing) and is proportionate.

✓ Persistence & Privilege

The skill does not request always:true or other privileged persistent presence. It keeps feedback in an in-memory store (feedbackStore) and provides export functions; it does not modify other skills or system-wide agent settings. Autonomous invocation is allowed by platform default but there's no additional persistence or privilege escalation requested by the skill.

Version History

v1.0.0

Initial release of Feedback Loop Fine-Tuner.

- Introduces tools for collecting and aggregating user feedback to improve LLM agents.
- Supports generation of fine-tuning datasets from feedback history in multiple formats.
- Enables prompt optimization using feedback data and analysis.
- Provides improvement tracking and reporting functionality over time.
- Adds A/B testing for prompt template variants with experiment management.
- Includes modular classes: FeedbackCollector, TrainingDatasetGenerator, PromptOptimizer, ImprovementTracker, and ABTester.

Metadata

Slug jpeng-feedback-loop-fine-tuner

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ