Description

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, R...

Usage Instructions (SKILL.md)

Senior ML Engineer

Name: Senior Ml Engineer
Author: alirezarezvani

Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.

Model Deployment Workflow

Deploy a trained model to production with monitoring:

Export model to standardized format (ONNX, TorchScript, SavedModel)
Package model with dependencies in Docker container
Deploy to staging environment
Run integration tests against staging
Deploy canary (5% traffic) to production
Monitor latency and error rates for 1 hour
Promote to full production if metrics pass
Validation: p95 latency < 100ms, error rate < 0.1%
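The promotion decision in the last two steps can be sketched as a small gate function. This is a minimal illustration only: the `CanaryMetrics` class and `should_promote` function are hypothetical names, and only the two thresholds come from the validation line above.

```python
from dataclasses import dataclass

# Thresholds taken from the validation criteria of the workflow.
P95_LATENCY_LIMIT_MS = 100.0
ERROR_RATE_LIMIT = 0.001  # 0.1%


@dataclass
class CanaryMetrics:
    """Hypothetical container for metrics collected during the canary window."""
    p95_latency_ms: float
    total_requests: int
    failed_requests: int

    @property
    def error_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.failed_requests / self.total_requests


def should_promote(metrics: CanaryMetrics) -> bool:
    """Return True only if the canary saw traffic and met both thresholds."""
    if metrics.total_requests == 0:
        return False  # no traffic means no evidence, so do not promote
    return (
        metrics.p95_latency_ms < P95_LATENCY_LIMIT_MS
        and metrics.error_rate < ERROR_RATE_LIMIT
    )
```

Refusing to promote when the canary received no requests is a design choice of this sketch, since an empty window would otherwise pass trivially.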

Container Template

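FROM python:3.11-slim

# Run from /app so that "src.server:app" resolves as a module path
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model/ /app/model/
COPY src/ /app/src/

# python:3.11-slim does not ship curl, so probe the endpoint with the standard library
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1

EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]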
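The container expects an ASGI application at `src.server:app` that answers `/health`. The source does not show that file, so the following is only a sketch of what it might contain, written as a bare ASGI callable to avoid assuming a particular web framework; a real server would add a prediction route.

```python
import json


async def app(scope, receive, send):
    """Bare ASGI callable; an ASGI server such as uvicorn can serve it directly."""
    if scope["type"] != "http":
        return  # ignore non-HTTP scopes (e.g. lifespan) in this sketch

    if scope["path"] == "/health":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```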

Serving Options

Option	Latency	Throughput	Use Case
FastAPI + Uvicorn	Low	Medium	REST APIs, small models
Triton Inference Server	Very Low	Very High	GPU inference, batching
TensorFlow Serving	Low	High	TensorFlow models
TorchServe	Low	High	PyTorch models
Ray Serve	Medium	High	Complex pipelines, multi-model

MLOps Pipeline Setup

Establish automated training and deployment:

Configure feature store (Feast, Tecton) for training data
Set up experiment tracking (MLflow, Weights & Biases)
Create training pipeline with hyperparameter logging
Register model in model registry with version metadata
Configure staging deployment triggered by registry events
Set up A/B testing infrastructure for model comparison
Enable drift monitoring with alerting
Validation: New models automatically evaluated against baseline
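One way to read the validation step is as a metric-by-metric comparison between a candidate model and the current baseline. The sketch below assumes every metric is "higher is better" and uses a hypothetical function name, `evaluate_against_baseline`; the source does not prescribe a specific comparison rule.

```python
def evaluate_against_baseline(candidate, baseline, tolerance=0.0):
    """Compare metric dicts (higher is better) and report any regressions.

    A metric counts as a regression if it is missing from the candidate or
    falls more than `tolerance` below the baseline value.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in candidate:
            regressions[name] = None  # metric was not reported at all
        elif candidate[name] < base_value - tolerance:
            regressions[name] = candidate[name] - base_value
    return {"promote": not regressions, "regressions": regressions}
```

For example, `evaluate_against_baseline({"accuracy": 0.88}, {"accuracy": 0.90})` reports `accuracy` as a regression and sets `promote` to `False`.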

Feature Store Pattern

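from datetime import timedelta

# Written against the older Feast API (Feature / ValueType); newer releases
# define schemas with Field, so check the installed version before reuse
from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Entity that feature rows are keyed on
user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
    ],
    online=True,
    source=FileSource(path="data/user_features.parquet"),
)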

Retraining Triggers

Trigger	Detection	Action
Scheduled	Cron (weekly/monthly)	Full retrain
Performance drop	Accuracy < threshold	Immediate retrain
Data drift	PSI > 0.2	Evaluate, then retrain
New data volume	X new samples	Incremental update
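The data drift trigger relies on PSI (Population Stability Index). As a rough illustration, PSI for a single numeric feature can be computed by bucketing a reference sample and a current sample and summing `(current - reference) * ln(current / reference)` over the buckets. The equal-width binning and the `eps` floor below are assumptions of this sketch, not choices stated in the source.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two numeric samples, using equal-width bins from the reference."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins

    def bucket_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor empty buckets at eps so the logarithm stays defined
        return [max(c / len(sample), eps) for c in counts]

    ref = bucket_fractions(reference)
    cur = bucket_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_retrain_evaluation(reference, current, threshold=0.2):
    """Apply the PSI > 0.2 trigger from the table above."""
    return population_stability_index(reference, current) > threshold
```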

LLM Integration Workflow

Integrate LLM APIs into production applications:

Create provider abstraction layer for vendor flexibility
Implement retry logic with exponential backoff
Configure fallback to secondary provider
Set up token counting and context truncation
Add response caching for repeated queries
Implement cost tracking per request
Add structured output validation with Pydantic
Validation: Response parses correctly, cost within budget

Provider Abstraction

from abc import ABC, abstractmethod
from tenacity import retry, stop_after_attempt, wait_exponential

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        pass

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
    return provider.complete(prompt)
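The fallback step of the workflow can be layered on top of the same idea. The sketch below uses only the standard library rather than tenacity, and `complete_with_fallback` is a hypothetical helper: it walks an ordered list of providers, retrying each with exponential backoff before moving to the next.

```python
import time


def complete_with_fallback(providers, prompt, attempts=3, base_delay=1.0,
                           sleep=time.sleep):
    """Try each provider in order, retrying each with exponential backoff.

    `providers` is any sequence of objects exposing complete(prompt) -> str.
    `sleep` is injectable so tests can run without real delays.
    """
    last_error = None
    for provider in providers:
        for attempt in range(attempts):
            try:
                return provider.complete(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                if attempt < attempts - 1:
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("all providers failed") from last_error
```

Catching bare `Exception` keeps the sketch short; in practice, only transient errors (timeouts, rate limits) are usually worth retrying.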

Cost Management

Provider	Input Cost (per 1K tokens)	Output Cost (per 1K tokens)
GPT-4	$0.03/1K	$0.06/1K
GPT-3.5	$0.0005/1K	$0.0015/1K
Claude 3 Opus	$0.015/1K	$0.075/1K
Claude 3 Haiku	$0.00025/1K	$0.00125/1K
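Per-request cost tracking follows directly from the table: multiply token counts by the per-1K prices. The sketch below copies the prices from the table; the dictionary keys and the `CostTracker` class are hypothetical names chosen for illustration.

```python
# (input, output) price in USD per 1K tokens, copied from the table above.
# The keys are illustrative labels, not official model identifiers.
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5": (0.0005, 0.0015),
    "claude-3-opus": (0.015, 0.075),
    "claude-3-haiku": (0.00025, 0.00125),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request, given its token counts."""
    input_price, output_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


class CostTracker:
    """Accumulates spend across requests and checks it against a budget."""

    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, model, input_tokens, output_tokens):
        cost = request_cost(model, input_tokens, output_tokens)
        self.spent_usd += cost
        return cost

    @property
    def within_budget(self):
        return self.spent_usd <= self.budget_usd
```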

RAG System Implementation

Build retrieval-augmented generation pipeline:

Choose vector database (Pinecone, Qdrant, Weaviate)
Select embedding model based on quality/cost tradeoff
Implement document chunking strategy
Create ingestion pipeline with metadata extraction
Build retrieval with query embedding
Add reranking for relevance improvement
Format context and send to LLM
Validation: Response references retrieved context, no hallucinations
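The retrieval and context-formatting steps can be sketched without a vector database by scoring an in-memory list with cosine similarity. This is a toy stand-in under stated assumptions: embeddings are plain lists of floats supplied by the caller, and the `retrieve` and `build_prompt` helpers and the prompt wording are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=3):
    """Return the k (score, text) pairs most similar to the query.

    `index` is a list of (text, embedding) pairs held in memory.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, retrieved):
    """Number the retrieved chunks so the answer can reference them."""
    context = "\n\n".join(
        f"[{i + 1}] {text}" for i, (_, text) in enumerate(retrieved)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the chunks makes it easier to check the validation criterion, since a response can be inspected for references to `[1]`, `[2]`, and so on.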

Vector Database Selection

Database	Hosting	Scale	Latency	Best For
Pinecone	Managed	High	Low	Production, managed
Qdrant	Both	High	Very Low	Performance-critical
Weaviate	Both	High	Low	Hybrid search
Chroma	Self-hosted	Medium	Low	Prototyping
pgvector	Self-hosted	Medium	Medium	Existing Postgres

Chunking Strategies

Strategy	Chunk Size	Overlap	Best For
Fixed	500-1000 tokens	50-100	General text
Sentence	3-5 sentences	1 sentence	Structured text
Semantic	Variable	Based on meaning	Research papers
Recursive	Hierarchical	Parent-child	Long documents
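The fixed strategy in the first row amounts to a sliding window with overlap. The sketch below operates on an already-tokenized list; how text is tokenized (for example with a model-specific tokenizer) is left open, and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks
```

With `chunk_size=4` and `overlap=1`, ten tokens produce three chunks, each beginning with the last token of the previous one.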

Model Monitoring

Monitor production models for drift and degradation:

Set up latency tracking (p50, p95, p99)
Configure error rate alerting
Implement input data drift detection
Track prediction distribution shifts
Log ground truth when available
Compare model versions with A/B metrics
Set up automated retraining triggers
Validation: Alerts fire before user-visible degradation
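Latency tracking at p50, p95, and p99 can be illustrated with a nearest-rank percentile over a window of recorded request times. This is a simplified sketch; production systems typically rely on a metrics backend with histograms rather than sorting raw samples, and the function names here are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize a window of request latencies (milliseconds)."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```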

Drift Detection

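from scipy.stats import ks_2samp

def detect_drift(reference, current, threshold=0.05):
    # Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
    # two samples were drawn from different distributions
    statistic, p_value = ks_2samp(reference, current)
    return {
        "drift_detected": p_value < threshold,
        "ks_statistic": statistic,
        "p_value": p_value
    }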

Alert Thresholds

Metric	Warning	Critical
p95 latency	> 100ms	> 200ms
Error rate	> 0.1%	> 1%
PSI (drift)	> 0.1	> 0.2
Accuracy drop	> 2%	> 5%
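The table maps naturally onto a lookup that turns a metric reading into a severity level. In the sketch below the threshold values come from the table, with percentages expressed as fractions; the metric keys and the `classify` function are hypothetical names.

```python
# (warning, critical) thresholds from the table; percentages as fractions.
THRESHOLDS = {
    "p95_latency_ms": (100.0, 200.0),
    "error_rate": (0.001, 0.01),
    "psi": (0.1, 0.2),
    "accuracy_drop": (0.02, 0.05),
}


def classify(metric, value):
    """Return 'critical', 'warning', or 'ok' for a single metric reading."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def active_alerts(readings):
    """Map each metric that is not 'ok' to its severity level."""
    levels = {name: classify(name, value) for name, value in readings.items()}
    return {name: level for name, level in levels.items() if level != "ok"}
```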

Reference Documentation

MLOps Production Patterns

references/mlops_production_patterns.md contains:

Model deployment pipeline with Kubernetes manifests
Feature store architecture with Feast examples
Model monitoring with drift detection code
A/B testing infrastructure with traffic splitting
Automated retraining pipeline with MLflow

LLM Integration Guide

references/llm_integration_guide.md contains:

Provider abstraction layer pattern
Retry and fallback strategies with tenacity
Prompt engineering templates (few-shot, CoT)
Token optimization with tiktoken
Cost calculation and tracking

RAG System Architecture

references/rag_system_architecture.md contains:

RAG pipeline implementation with code
Vector database comparison and integration
Chunking strategies (fixed, semantic, recursive)
Embedding model selection guide
Hybrid search and reranking patterns

Tools

Model Deployment Pipeline

python scripts/model_deployment_pipeline.py --model model.pkl --target staging

Generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.

RAG System Builder

python scripts/rag_system_builder.py --config rag_config.yaml --analyze

Scaffolds RAG pipeline with vector store integration and retrieval logic.

ML Monitoring Suite

python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy

Sets up drift detection, alerting, and performance dashboards.

Tech Stack

Category	Tools
ML Frameworks	PyTorch, TensorFlow, Scikit-learn, XGBoost
LLM Frameworks	LangChain, LlamaIndex, DSPy
MLOps	MLflow, Weights & Biases, Kubeflow
Data	Spark, Airflow, dbt, Kafka
Deployment	Docker, Kubernetes, Triton
Databases	PostgreSQL, BigQuery, Pinecone, Redis

Security Recommendations

This skill is primarily documentation and example code for MLOps and LLM/RAG systems. The included scripts are scaffolding (they parse CLI args and return simple JSON) and do not themselves call remote APIs or read secrets. If you plan to run or adapt the examples, you will need to supply your own provider credentials (e.g., OpenAI, Pinecone), and you should: 1) review any code you run and supply credentials only to trusted runtime environments; 2) avoid pasting production keys into untrusted places; 3) run examples in a sandbox or test project first; and 4) ensure any external dependencies you install (clients, libraries) come from trusted package sources.

Functional Analysis

Type: OpenClaw Skill
Name: senior-ml-engineer
Version: 2.1.1

The skill bundle is a comprehensive set of documentation and boilerplate scripts for a Senior ML Engineer persona, covering MLOps, RAG systems, and LLM integration. The Python scripts in the scripts/ directory (ml_monitoring_suite.py, model_deployment_pipeline.py, and rag_system_builder.py) are non-functional skeletons containing only logging and argument parsing logic with no dangerous operations. The documentation provides standard industry patterns and code snippets that are educational and lack any signs of malicious intent or prompt injection.

Capability Assessment

ℹ Purpose & Capability

Name and description match the provided SKILL.md, references, and example code (model deployment, MLOps patterns, RAG, LLM integration). The reference docs include code that assumes external provider clients (OpenAI, Pinecone, Anthropic) but the skill does not request API keys — this is reasonable for an instruction-only skill but worth noting because to actually use the examples the user will need provider credentials.

✓ Instruction Scope

SKILL.md and the reference files stay on-topic: they describe deployment pipelines, monitoring, RAG designs, and LLM integration patterns. Instructions do not direct the agent to read unrelated system files or to exfiltrate data; example snippets reference provider APIs but don't instruct the agent to call unknown external endpoints beyond normal vendor APIs.

✓ Install Mechanism

No install spec is provided (instruction-only plus example scripts), so nothing is downloaded or written to disk by the installer. This is lowest-risk from an install-mechanism perspective.

ℹ Credentials

The skill declares no required environment variables or credentials, which is proportionate for a documentation/instruction skill. However, many examples reference external services (OpenAI, Pinecone, embedding clients) that in practice require API keys; the skill does not request or store those keys itself — the user must supply them when running code.

✓ Persistence & Privilege

always is false and the skill is user-invocable with normal autonomous invocation allowed. The skill does not request persistent system privileges, nor does it attempt to modify other skills or system-wide agent settings.

Version History

v2.1.1

v2.1.1: optimization, reference splits

v1.0.0

Initial release: Comprehensive skillset for production Machine Learning engineering, MLOps automation, and LLM integration.
- Covers model deployment workflows, feature store setup, drift monitoring, A/B testing, and automated retraining.
- Provides reference code and decision tables for containerization, serving options, and model monitoring.
- Includes production-ready templates for RAG (retrieval-augmented generation) pipelines, LLM API integration, and cost tracking.
- Supplies reference documentation links and Python snippets for core patterns.
- Lists practical tool and command-line usage for rapid workflow setup.

Metadata

Slug senior-ml-engineer

Version 2.1.1

License MIT-0

Total installs 19

Current installs 18

Number of versions 2

FAQ