Description

Predict construction project costs using Machine Learning. Use Linear Regression, K-Nearest Neighbors, and Random Forest models on historical project data. Train, evaluate, and deploy cost prediction models.

Usage Instructions (SKILL.md)


# Construction Cost Prediction with Machine Learning

Name: Cost Prediction
Author: datadrivenconstruction

## Overview

Based on DDC methodology (Chapter 4.5), this skill enables predicting construction project costs using historical data and machine learning algorithms. The approach transforms traditional expert-based estimation into data-driven prediction.

Book Reference: "Будущее: прогнозы и машинное обучение" / "Future: Predictions and Machine Learning"

> "Predictions and forecasts based on historical data enable companies to make more accurate decisions about project costs and timelines."
> — DDC Book, Chapter 4.5

## Core Concepts

```
Historical Data → Feature Engineering → ML Model → Cost Prediction
    │                    │                │              │
    ▼                    ▼                ▼              ▼
Past projects      Prepare data      Train model    New project
with costs         for ML            on history     cost forecast
```

## Quick Start

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Load historical project data
df = pd.read_csv("historical_projects.csv")

# Features and target
X = df[['area_m2', 'floors', 'complexity_score']]
y = df['total_cost']

# Split data (fixed random_state so results are reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print(f"R² Score: {r2_score(y_test, predictions):.2f}")
print(f"MAE: ${mean_absolute_error(y_test, predictions):,.0f}")

# Predict a new project (reuse the training column names to avoid a feature-name warning)
new_project = pd.DataFrame([[5000, 10, 3]], columns=X.columns)  # area, floors, complexity
cost = model.predict(new_project)
print(f"Predicted cost: ${cost[0]:,.0f}")
```

## Data Preparation

### Prepare Historical Dataset

```python
import pandas as pd
import numpy as np

def prepare_cost_dataset(df, current_year=2024, inflation_rate=0.03):
    """Prepare historical project data for ML.

    current_year and inflation_rate (3% annual) are assumptions; adjust them
    or replace the flat rate with a construction cost index.
    """
    # Select relevant features (plus the target, total_cost)
    features = [
        'area_m2',
        'floors',
        'building_type',
        'location',
        'year_completed',
        'complexity_score',
        'material_quality',
        'total_cost'
    ]

    df = df[features].copy()

    # Handle missing values: drop rows without a target, impute complexity
    df = df.dropna(subset=['total_cost'])
    df['complexity_score'] = df['complexity_score'].fillna(df['complexity_score'].median())

    # Encode categorical variables as one-hot columns
    df = pd.get_dummies(df, columns=['building_type', 'location'])

    # Derived cost ratios, for analysis only: they are computed from the target,
    # so they must be dropped before training or the model will leak the answer
    df['cost_per_m2'] = df['total_cost'] / df['area_m2']
    df['cost_per_floor'] = df['total_cost'] / df['floors']

    # Adjust for inflation (restate historical costs in current-year prices)
    df['years_ago'] = current_year - df['year_completed']
    df['adjusted_cost'] = df['total_cost'] * (1 + inflation_rate) ** df['years_ago']

    return df

# Usage
df = pd.read_csv("projects_history.csv")
df_prepared = prepare_cost_dataset(df)

# Use adjusted_cost as the target; exclude target-derived columns from the features
X = df_prepared.drop(columns=['total_cost', 'adjusted_cost', 'cost_per_m2', 'cost_per_floor'])
y = df_prepared['adjusted_cost']
```
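The inflation step above compounds a flat annual rate: `adjusted_cost = total_cost × (1 + inflation_rate) ** years_ago`. The helper `inflation_adjust` below is hypothetical and shows that arithmetic for a single project. It uses the same assumed 3% rate and 2024 base year as `prepare_cost_dataset`, not a published cost index.

```python
def inflation_adjust(cost: float, year_completed: int,
                     current_year: int = 2024, rate: float = 0.03) -> float:
    """Compound a historical cost forward to current-year prices.

    Hypothetical helper mirroring prepare_cost_dataset(); the flat annual
    rate is an assumption, not an official construction cost index.
    """
    years_ago = current_year - year_completed
    return cost * (1 + rate) ** years_ago

# A project completed in 2020 for $1,000,000, restated in 2024 prices:
# 1,000,000 * 1.03**4 = 1,000,000 * 1.12550881 = 1,125,508.81
adjusted = inflation_adjust(1_000_000, 2020)
print(f"${adjusted:,.2f}")
```

Four years of 3% growth add about 12.6%, not 12%, because each year's increase is applied to the already-inflated amount.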

### Feature Engineering

```python
import numpy as np
import pandas as pd

def engineer_features(df):
    """Create additional features for better predictions"""
    df = df.copy()  # avoid mutating the caller's DataFrame

    # Interaction features
    df['area_x_floors'] = df['area_m2'] * df['floors']
    df['area_x_complexity'] = df['area_m2'] * df['complexity_score']

    # Polynomial features
    df['area_squared'] = df['area_m2'] ** 2

    # Log transforms (for skewed features)
    df['log_area'] = np.log1p(df['area_m2'])

    # Binned features (categorical: one-hot encode before passing to a model)
    df['size_category'] = pd.cut(
        df['area_m2'],
        bins=[0, 1000, 5000, 10000, float('inf')],
        labels=['small', 'medium', 'large', 'xlarge']
    )

    return df
```
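Scikit-learn models need numeric inputs. Categorical columns such as `building_type`, `location`, or the binned `size_category` must therefore be encoded, and a new project must produce exactly the same columns as the training data. One alternative to calling `pd.get_dummies` by hand is scikit-learn's `ColumnTransformer` with `OneHotEncoder(handle_unknown="ignore")`. The encoding is learned once during `fit` and reapplied at prediction time. This is a sketch under that assumption. The small DataFrame is synthetic, for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic illustration data (not real project costs)
train = pd.DataFrame({
    "area_m2": [800, 2500, 6000, 12000, 3000, 9000],
    "floors": [2, 4, 8, 15, 5, 10],
    "building_type": ["residential", "commercial", "industrial",
                      "commercial", "residential", "industrial"],
    "total_cost": [1.2e6, 4.0e6, 9.5e6, 21.0e6, 4.4e6, 14.0e6],
})

numeric = ["area_m2", "floors"]
categorical = ["building_type"]

preprocess = ColumnTransformer([
    ("num", "passthrough", numeric),
    # handle_unknown="ignore": a category unseen in training encodes as all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
])
model.fit(train[numeric + categorical], train["total_cost"])

# A new project with a building type never seen in training still gets a prediction
new = pd.DataFrame([{"area_m2": 4000, "floors": 6, "building_type": "hospital"}])
pred = model.predict(new)[0]
print(f"${pred:,.0f}")
```

Because the encoder lives inside the pipeline, saving this one object also saves the column mapping.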

## Machine Learning Models

### Linear Regression

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def train_linear_model(X_train, y_train):
    """Train Linear Regression model with scaling"""
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])

    pipeline.fit(X_train, y_train)

    # Standardized coefficients: effect of a one-standard-deviation change in
    # each feature, comparable across features because the inputs are scaled
    coefficients = pd.DataFrame({
        'feature': X_train.columns,
        'coefficient': pipeline.named_steps['regressor'].coef_
    }).sort_values('coefficient', key=abs, ascending=False)

    return pipeline, coefficients

# Usage
linear_model, lr_importance = train_linear_model(X_train, y_train)
print("Feature Importance:")
print(lr_importance)
```

### K-Nearest Neighbors (KNN)

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

def train_knn_model(X_train, y_train):
    """Train KNN model with optimal k"""
    # Scaling inside the pipeline: each CV fold fits its own scaler, and the
    # returned model scales new data automatically at predict time
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('knn', KNeighborsRegressor())
    ])

    # Find optimal k using cross-validation (step name prefix "knn__")
    param_grid = {'knn__n_neighbors': range(3, 20)}
    grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_absolute_error')
    grid_search.fit(X_train, y_train)

    print(f"Best k: {grid_search.best_params_['knn__n_neighbors']}")
    print(f"Best MAE: ${-grid_search.best_score_:,.0f}")

    return grid_search.best_estimator_

# Usage
knn_model = train_knn_model(X_train, y_train)
```
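With default settings (uniform weights, Euclidean distance), `KNeighborsRegressor` predicts a new project's cost as the plain average cost of its k nearest past projects. The sketch below reproduces that by hand with NumPy and compares it to scikit-learn. The feature values are synthetic and already on comparable scales. Real area and floor counts should be scaled as in `train_knn_model`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic, already-scaled features: [scaled_area, scaled_complexity]
X_hist = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_hist = np.array([100.0, 120.0, 110.0, 500.0, 520.0])  # costs, e.g. in $k
x_new = np.array([[0.2, 0.1]])
k = 3

# Manual KNN: Euclidean distance to every past project, then average the k closest costs
dists = np.linalg.norm(X_hist - x_new, axis=1)
nearest = np.argsort(dists)[:k]
manual_pred = y_hist[nearest].mean()

# Same prediction from scikit-learn
knn = KNeighborsRegressor(n_neighbors=k).fit(X_hist, y_hist)
sk_pred = knn.predict(x_new)[0]
print(manual_pred, sk_pred)
```

The three small projects near the origin are selected, so both predictions are 110. This is also why unscaled features distort KNN: whichever column has the largest numeric range dominates the distance.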

### Random Forest

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_random_forest(X_train, y_train):
    """Train Random Forest model"""
    rf = RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        min_samples_split=5,
        random_state=42
    )

    rf.fit(X_train, y_train)

    # Impurity-based feature importance (normalized to sum to 1)
    importance = pd.DataFrame({
        'feature': X_train.columns,
        'importance': rf.feature_importances_
    }).sort_values('importance', ascending=False)

    return rf, importance

# Usage
rf_model, importance = train_random_forest(X_train, y_train)
print("Feature Importance:")
print(importance.head(10))
```

### Gradient Boosting

```python
from sklearn.ensemble import GradientBoostingRegressor

def train_gradient_boosting(X_train, y_train):
    """Train Gradient Boosting model"""
    gb = GradientBoostingRegressor(
        n_estimators=200,
        learning_rate=0.1,
        max_depth=5,
        random_state=42
    )

    gb.fit(X_train, y_train)
    return gb

# Usage
gb_model = train_gradient_boosting(X_train, y_train)
```

## Model Evaluation

### Comprehensive Evaluation

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

def evaluate_model(model, X_test, y_test, model_name="Model"):
    """Comprehensive model evaluation"""
    predictions = model.predict(X_test)

    metrics = {
        'MAE': mean_absolute_error(y_test, predictions),
        'RMSE': np.sqrt(mean_squared_error(y_test, predictions)),
        'R²': r2_score(y_test, predictions),
        # MAPE is undefined if any actual cost in y_test is zero
        'MAPE': np.mean(np.abs((y_test - predictions) / y_test)) * 100
    }

    print(f"\n{model_name} Evaluation:")
    print(f"  MAE:  ${metrics['MAE']:,.0f}")
    print(f"  RMSE: ${metrics['RMSE']:,.0f}")
    print(f"  R²:   {metrics['R²']:.3f}")
    print(f"  MAPE: {metrics['MAPE']:.1f}%")

    return metrics, predictions

# Usage
metrics, predictions = evaluate_model(model, X_test, y_test, "Linear Regression")
```
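The worked example below makes the four metrics concrete with two invented projects that can be checked by hand. The actual costs are $100k and $200k, and the predictions are $110k and $180k.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Two hypothetical projects (illustrative values only)
y_true = np.array([100_000.0, 200_000.0])
y_pred = np.array([110_000.0, 180_000.0])

mae = mean_absolute_error(y_true, y_pred)                  # (10k + 20k) / 2 = 15k
rmse = np.sqrt(mean_squared_error(y_true, y_pred))         # sqrt((10k² + 20k²) / 2) ≈ 15.8k
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # (10% + 10%) / 2 = 10%
r2 = r2_score(y_true, y_pred)                              # 1 - 5e8 / 5e9 = 0.9

print(f"MAE ${mae:,.0f} | RMSE ${rmse:,.0f} | MAPE {mape:.1f}% | R² {r2:.2f}")
```

RMSE comes out above MAE because squaring penalizes the larger $20k miss more. MAPE treats both misses equally because each is 10% of its project's cost.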

### Compare Multiple Models

```python
import pandas as pd

def compare_models(models, X_test, y_test):
    """Compare multiple models on the same test set"""
    results = []

    for name, model in models.items():
        metrics, _ = evaluate_model(model, X_test, y_test, name)
        metrics['Model'] = name
        results.append(metrics)

    comparison = pd.DataFrame(results)
    comparison = comparison.set_index('Model')

    print("\nModel Comparison:")
    print(comparison.round(2))

    return comparison

# Usage
models = {
    'Linear Regression': linear_model,
    'KNN': knn_model,
    'Random Forest': rf_model,
    'Gradient Boosting': gb_model
}
comparison = compare_models(models, X_test, y_test)
```

### Cross-Validation

```python
from sklearn.model_selection import cross_val_score

def cross_validate_model(model, X, y, cv=5):
    """Perform cross-validation"""
    # cross_val_score clones the model and refits it on each fold,
    # so passing an already-fitted model is fine
    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_absolute_error')
    mae_scores = -scores  # scikit-learn negates error metrics so that higher is better

    print(f"Cross-Validation MAE: ${mae_scores.mean():,.0f} (+/- ${mae_scores.std():,.0f})")
    return mae_scores

# Usage
cv_scores = cross_validate_model(rf_model, X, y)
```
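With an integer `cv`, `cross_val_score` splits regression data with `KFold` and does not shuffle. If the historical CSV is sorted, for example by completion year, each fold is therefore a contiguous block of rows. Passing an explicit splitter makes that choice visible. `TimeSeriesSplit` is one option when the question is how well older projects predict newer ones. The data below is synthetic, and the year-sorted row order is an assumption about how such a file might look.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Synthetic projects, rows ordered by completion year (oldest first),
# mimicking a CSV sorted by year_completed
rng = np.random.default_rng(0)
n = 60
year = np.sort(rng.integers(2010, 2024, n))
area = rng.uniform(500, 10_000, n)
X = np.column_stack([area, year])
y = area * 1_500 * 1.03 ** (year - 2010) + rng.normal(0, 200_000, n)

model = RandomForestRegressor(n_estimators=50, random_state=42)

# Shuffled folds: every fold mixes older and newer projects
shuffled = KFold(n_splits=5, shuffle=True, random_state=42)
mae_shuffled = -cross_val_score(model, X, y, cv=shuffled,
                                scoring="neg_mean_absolute_error")

# Forward-chaining folds: always train on earlier rows, test on later ones
temporal = TimeSeriesSplit(n_splits=5)
mae_temporal = -cross_val_score(model, X, y, cv=temporal,
                                scoring="neg_mean_absolute_error")

print(f"Shuffled KFold MAE:  ${mae_shuffled.mean():,.0f}")
print(f"TimeSeriesSplit MAE: ${mae_temporal.mean():,.0f}")
```

The forward-chaining score is usually the more honest estimate when the model will be applied to future projects.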

## Prediction Pipeline

### Complete Prediction Function

```python
import pandas as pd

def create_prediction_pipeline(model, feature_names, scaler=None):
    """Create a reusable prediction pipeline"""

    def predict_cost(project_data):
        """
        Predict cost for new project

        Args:
            project_data: dict with project features

        Returns:
            Predicted cost with a heuristic ±15% range
        """
        # Create DataFrame from input
        df = pd.DataFrame([project_data])

        # Add any missing training columns as 0 (correct for one-hot columns;
        # numeric features should be supplied rather than defaulted to 0)
        for col in feature_names:
            if col not in df.columns:
                df[col] = 0

        # Keep only the training columns, in training order (extra keys are dropped)
        df = df[feature_names]

        # Scale if a separately fitted scaler was supplied
        if scaler is not None:
            df = scaler.transform(df)

        # Predict
        prediction = model.predict(df)[0]

        # Fixed ±15% margin: a rule-of-thumb range, not a statistical confidence interval
        margin = 0.15
        lower = prediction * (1 - margin)
        upper = prediction * (1 + margin)

        return {
            'predicted_cost': prediction,
            'lower_bound': lower,
            'upper_bound': upper,
            'margin': f"±{margin * 100:.0f}%"
        }

    return predict_cost

# Usage
predictor = create_prediction_pipeline(rf_model, X.columns.tolist())

# Predict new project
new_project = {
    'area_m2': 5000,
    'floors': 8,
    'complexity_score': 3,
    'material_quality': 2
}

result = predictor(new_project)
print(f"Predicted Cost: ${result['predicted_cost']:,.0f}")
print(f"Range: ${result['lower_bound']:,.0f} - ${result['upper_bound']:,.0f}")
```
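A fixed percentage margin gives every project the same relative uncertainty, whatever the data says. If a data-driven range is wanted, one option is quantile regression: fit `GradientBoostingRegressor` with `loss="quantile"` at a lower and an upper quantile. This is a swapped-in technique, not part of the DDC method described above. The sketch uses synthetic data. The 10th and 90th percentiles are an assumed choice that gives a nominal 80% range, and its real coverage should be checked on held-out projects.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
# Synthetic training data: cost roughly proportional to area, with multiplicative noise
area = rng.uniform(500, 10_000, 300)
X = area.reshape(-1, 1)
y = area * 1_500 * rng.lognormal(0, 0.15, 300)

def fit_quantile(alpha):
    """Fit a gradient-boosting model that predicts the given quantile of cost."""
    return GradientBoostingRegressor(loss="quantile", alpha=alpha,
                                     n_estimators=200, max_depth=3,
                                     random_state=42).fit(X, y)

lower_model = fit_quantile(0.10)
median_model = fit_quantile(0.50)
upper_model = fit_quantile(0.90)

x_new = np.array([[5_000]])
low, mid, high = (m.predict(x_new)[0] for m in (lower_model, median_model, upper_model))
print(f"Median: ${mid:,.0f}  nominal 80% range: ${low:,.0f} - ${high:,.0f}")

# Empirical coverage on the training data (optimistic; use a test set in practice)
inside = (lower_model.predict(X) <= y) & (y <= upper_model.predict(X))
print(f"Share of training projects inside the band: {inside.mean():.0%}")
```

The band widens or narrows with the data instead of staying at a fixed percentage. The cost is training one extra model per bound.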

### Save and Load Model

```python
import joblib

# Save model
def save_model(model, filepath):
    """Save trained model to file"""
    joblib.dump(model, filepath)
    print(f"Model saved to {filepath}")

# Load model. Only load files you created or trust (unpickling can execute
# arbitrary code), and use the same scikit-learn version that trained the model
def load_model(filepath):
    """Load model from file"""
    model = joblib.load(filepath)
    print(f"Model loaded from {filepath}")
    return model

# Usage
save_model(rf_model, "cost_prediction_model.pkl")
loaded_model = load_model("cost_prediction_model.pkl")
```
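A prediction function like `predict_cost` needs the training column order as well as the fitted model. One way to keep the two together is to dump a dictionary bundle with `joblib`. This is an assumption, not something the skill prescribes, and the bundle keys are hypothetical names. The sketch uses a toy `LinearRegression` on synthetic numbers in place of `rf_model`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for rf_model (synthetic data)
feature_names = ["area_m2", "floors", "complexity_score"]
X = np.array([[1000, 2, 1], [3000, 5, 2], [5000, 8, 3], [8000, 12, 4]])
y = np.array([1.5e6, 4.6e6, 7.8e6, 12.4e6])
model = LinearRegression().fit(X, y)

# Hypothetical bundle layout: the model plus metadata needed at prediction time
bundle = {
    "model": model,
    "feature_names": feature_names,
    "trained_rows": len(X),
}

path = os.path.join(tempfile.mkdtemp(), "cost_prediction_bundle.pkl")
joblib.dump(bundle, path)

# Reload and predict with the stored column order
loaded = joblib.load(path)
new_row = np.array([[4000, 6, 2]])  # values in loaded["feature_names"] order
pred = loaded["model"].predict(new_row)[0]
print(f"Predicted cost: ${pred:,.0f}")
```

The reloaded `loaded["feature_names"]` can be passed straight to `create_prediction_pipeline`. This avoids relying on a separately remembered column list.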

## Using with ChatGPT

```python
# Prompt for ChatGPT to help with cost prediction

prompt = """
I have historical construction project data with these columns:
- area_m2: Building area in square meters
- floors: Number of floors
- building_type: residential, commercial, industrial
- total_cost: Total project cost in USD

Write Python code using scikit-learn to:
1. Prepare the data for machine learning
2. Train a Random Forest model
3. Evaluate the model
4. Predict cost for a new 3000 m² commercial building with 5 floors
"""
```

## Quick Reference

| Task | Code |
|------|------|
| Split data | `train_test_split(X, y, test_size=0.2)` |
| Linear Regression | `LinearRegression().fit(X, y)` |
| KNN | `KNeighborsRegressor(n_neighbors=5)` |
| Random Forest | `RandomForestRegressor(n_estimators=100)` |
| Predict | `model.predict(X_new)` |
| MAE | `mean_absolute_error(y_true, y_pred)` |
| R² Score | `r2_score(y_true, y_pred)` |
| Cross-validate | `cross_val_score(model, X, y, cv=5)` |
| Save model | `joblib.dump(model, 'file.pkl')` |

## Best Practices

1. **Data Quality**: More clean, comparable historical projects generally mean better predictions
2. **Feature Selection**: Include relevant project characteristics
3. **Inflation Adjustment**: Normalize costs to current prices
4. **Regular Retraining**: Update model with new completed projects
5. **Ensemble Methods**: Combine multiple models for robustness
6. **Confidence Intervals**: Always provide prediction ranges
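For the ensemble practice, scikit-learn's `VotingRegressor` averages the predictions of several regressors. The sketch below combines the three model types used in this skill on synthetic data. The component settings are illustrative, not tuned, and equal weighting is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic features: [area_m2, floors]
X = np.column_stack([rng.uniform(500, 10_000, 80), rng.integers(1, 20, 80)])
y = X[:, 0] * 1_400 + X[:, 1] * 50_000 + rng.normal(0, 300_000, 80)

# Scale-sensitive models get their own scaler inside a pipeline
ensemble = VotingRegressor([
    ("lr", make_pipeline(StandardScaler(), LinearRegression())),
    ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
])
ensemble.fit(X, y)

x_new = np.array([[5_000, 8]])
print(f"Ensemble prediction: ${ensemble.predict(x_new)[0]:,.0f}")
```

With no `weights` argument, the ensemble prediction is the simple mean of the three fitted models' predictions. This damps the error of any single model.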

## Resources

- **Book**: "Data-Driven Construction" by Artem Boiko, Chapter 4.5
- **Website**: https://datadrivenconstruction.io
- **scikit-learn**: https://scikit-learn.org

## Next Steps

- See `duration-prediction` for project duration forecasting
- See `ml-model-builder` for custom ML workflows
- See `kpi-dashboard` for visualization
- See `big-data-analysis` for large dataset processing

Safe Usage Recommendations

This skill appears coherent and local-only. Before using it, confirm you trust the CSVs or other local data you will load (they may contain sensitive project or personnel information). Ensure the Python environment has pandas, numpy, and scikit-learn installed (the SKILL.md examples assume these). If you prefer stricter isolation, run the code in a controlled environment (container/VM) so filesystem access is limited. Finally, clarify what 'deploy' means for your workflow — the skill saves models locally but does not include steps to publish to a remote service.
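Before running the examples, a quick pre-flight check can confirm the required libraries are importable. The sketch below uses only the standard library. The package list is taken from the imports in SKILL.md; note that scikit-learn is imported as `sklearn`. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names used by the SKILL.md examples (scikit-learn imports as "sklearn")
REQUIRED = ["pandas", "numpy", "sklearn", "joblib"]

def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```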

Functional Analysis

Type: OpenClaw Skill
Name: cost-prediction
Version: 2.0.0

The skill is designed for construction cost prediction using machine learning, involving standard data processing and model training/evaluation with scikit-learn. All file operations (reading CSVs, saving/loading models with joblib) are local and explicitly permitted by the 'filesystem' permission in claw.json. Crucially, the instructions.md file explicitly states, 'All computation is local (filesystem only, no external APIs),' which strongly indicates a lack of malicious intent and prevents data exfiltration or unauthorized network communication. There is no evidence of prompt injection with malicious objectives, obfuscation, or attempts at persistence.

Capability Assessment

✓ Purpose & Capability

Name and description (train/evaluate/deploy cost-prediction models) match the SKILL.md and instructions.md. The declared filesystem permission aligns with reading historical CSVs and saving models. The only mild ambiguity is the word 'deploy' — the instructions constrain computation to local filesystem (no external APIs), so 'deploy' appears to mean saving models locally rather than deploying to a remote service.

✓ Instruction Scope

Instructions focus on preparing local datasets, feature engineering, training LinearRegression/KNN/RandomForest, evaluating metrics, and saving models. They do not instruct reading unrelated system files, accessing environment secrets, or contacting external endpoints. The guidance to 'gather project parameters' means soliciting user input, not scanning system data.

ℹ Install Mechanism

There is no install spec (instruction-only), which is low risk because nothing is written or downloaded by the skill itself. However, runtime Python libraries (pandas, scikit-learn, numpy) are used in the examples but not declared; users must ensure those dependencies are present. This is a minor coherence/usability gap but not a security red flag.

✓ Credentials

The skill requests no environment variables or external credentials. The filesystem permission declared in claw.json is proportionate to the stated need to read historical datasets and save trained models. Note: filesystem permission inherently allows reading any local files the agent process can access, so users should be aware of privacy of local data used for training.

✓ Persistence & Privilege

always:false and normal autonomous-invocation settings are appropriate. The skill does not request to modify other skills or system-wide settings and has no install step that would create persistent background components.

Version History

v2.0.0

Major update: Adds comprehensive guidance and code for construction cost prediction using multiple ML models.

- Introduces an in-depth SKILL.md with project overview, methodology, and book reference.
- Provides detailed code snippets for data preparation, feature engineering, and handling inflation adjustments.
- Offers step-by-step instructions for training and evaluating Linear Regression, K-Nearest Neighbors, Random Forest, and Gradient Boosting models.
- Includes model evaluation metrics (MAE, RMSE, R², MAPE) and feature importance analysis.
- Enables easy model comparison for optimal selection.

v1.0.0

Initial release of the cost-prediction skill for construction projects.

- Predicts construction project costs using historical data and machine learning.
- Supports Linear Regression, K-Nearest Neighbors, Random Forest, and Gradient Boosting models.
- Includes data preparation and feature engineering utilities for effective modeling.
- Provides example code snippets for training, evaluating, and deploying models.
- Follows DDC methodology and integrates comprehensive model evaluation tools.

Metadata

Slug cost-prediction

Version 2.0.0

License —

Total installs 7

Current installs 7

Number of versions 2

FAQ

What is Cost Prediction?

Predict construction project costs using Machine Learning. Use Linear Regression, K-Nearest Neighbors, and Random Forest models on historical project data. Train, evaluate, and deploy cost prediction models. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 1,352 times to date.

How do I install Cost Prediction?

Run the command `/install cost-prediction` in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Cost Prediction free?

Yes. Cost Prediction is completely free (open source) and can be freely downloaded, installed, and used.

Which platforms does Cost Prediction support?

Cost Prediction is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Cost Prediction?

It is developed and maintained by datadrivenconstruction (@datadrivenconstruction). The current version is v2.0.0.
