Description

Automate construction data processing using LLM (ChatGPT, Claude, LLaMA). Generate Python/Pandas scripts, extract data from documents, and create automated p...

Usage Instructions (SKILL.md)


# LLM Data Automation for Construction

Name: Llm Data Automation
Author: datadrivenconstruction


## Overview

Based on DDC methodology (Chapter 2.3), this skill enables automation of construction data processing using Large Language Models (LLMs). Instead of manually coding data transformations, you describe what you need in natural language, and the LLM generates the necessary Python/Pandas code.

Book Reference: "Pandas DataFrame and LLM ChatGPT" (original title: "Pandas DataFrame и LLM ChatGPT")

"LLM models such as ChatGPT and LLaMA allow specialists without deep programming knowledge to contribute to the automation and improvement of a company's business processes." (DDC Book, Chapter 2.3)

## Quick Start


### Option 1: Use ChatGPT/Claude Online

Simply describe your data processing task in natural language:

```
Prompt: "Write Python code to read an Excel file with construction materials,
filter rows where quantity > 100, and save to CSV."
```

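For the prompt above, an LLM would typically return something along these lines. This is a minimal sketch, not the output of any particular model: the column names (`material`, `quantity`), the helper name `filter_large_quantities`, and the file names in the comments are assumptions, and an in-memory DataFrame stands in for the Excel file so the example runs without input data.

```python
import io

import pandas as pd


def filter_large_quantities(df: pd.DataFrame, min_quantity: float = 100) -> pd.DataFrame:
    """Keep rows whose quantity is strictly greater than min_quantity."""
    return df[df["quantity"] > min_quantity].reset_index(drop=True)


# Stand-in for: materials = pd.read_excel("materials.xlsx")  (hypothetical file)
materials = pd.DataFrame({
    "material": ["Concrete", "Brick", "Steel", "Timber"],
    "quantity": [250, 80, 120, 100],
})

filtered = filter_large_quantities(materials)

# Stand-in for: filtered.to_csv("filtered_materials.csv", index=False)
buffer = io.StringIO()
filtered.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that `quantity > 100` excludes rows with exactly 100; if the threshold should be inclusive, say so in the prompt.
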
### Option 2: Run Local LLM (Ollama)
```bash
# Install Ollama from ollama.com
ollama pull mistral

# Run a query
ollama run mistral "Write Pandas code to calculate total cost from quantity * unit_price"
```

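The same query can also be sent from Python through Ollama's local HTTP API instead of the CLI. A minimal sketch, assuming the Ollama server is running on its default address `http://localhost:11434` and that `mistral` has already been pulled; the helper names `build_generate_request` and `ask_ollama` are illustrative and not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def build_generate_request(prompt: str, model: str = "mistral") -> dict:
    """Build the JSON payload for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt: str, model: str = "mistral", timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    payload = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Building the payload needs no server, so it can be inspected first
payload = build_generate_request(
    "Write Pandas code to calculate total cost from quantity * unit_price"
)
print(json.dumps(payload, indent=2))

# With the server running, the actual call would be:
# print(ask_ollama(payload["prompt"]))
```
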
### Option 3: Use LM Studio (GUI)
1. Download from lmstudio.ai
2. Install and select a model (e.g., Mistral, LLaMA)
3. Start chatting with your local AI

## Core Concepts

### DataFrame as Universal Format
```python
import pandas as pd

# Construction project as DataFrame
# Rows = elements, Columns = attributes
df = pd.DataFrame({
    'element_id': ['W001', 'W002', 'C001'],
    'category': ['Wall', 'Wall', 'Column'],
    'material': ['Concrete', 'Brick', 'Steel'],
    'volume_m3': [45.5, 32.0, 8.2],
    'cost_per_m3': [150, 80, 450]
})

# Calculate total cost
df['total_cost'] = df['volume_m3'] * df['cost_per_m3']
print(df)
```

### LLM Prompts for Construction Tasks

**Data Import:**
```
"Write code to import Excel file with construction schedule,
parse dates, and create a Pandas DataFrame"
```

**Data Filtering:**
```
"Filter construction elements where category is 'Structural'
and cost exceeds budget limit of 50000"
```

**Data Aggregation:**
```
"Group construction data by floor level,
calculate total volume and cost for each floor"
```

**Report Generation:**
```
"Create summary report with material quantities grouped by category,
export to Excel with formatting"
```

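As an illustration of what the filtering and aggregation prompts might produce, here is a small sketch on invented data. The column names (`category`, `floor`, `volume_m3`, `cost`) and the sample rows are assumptions; real exports will use different names, which is why it helps to mention the actual column names in the prompt.

```python
import pandas as pd

# Invented sample data standing in for a real element export
elements = pd.DataFrame({
    "element_id": ["B001", "B002", "W001", "C001", "C002"],
    "category": ["Structural", "Structural", "Architectural", "Structural", "Structural"],
    "floor": [1, 1, 1, 2, 2],
    "volume_m3": [12.0, 30.0, 5.5, 8.0, 20.0],
    "cost": [42000, 65000, 9000, 51000, 30000],
})

BUDGET_LIMIT = 50000

# Data Filtering: structural elements whose cost exceeds the budget limit
over_budget = elements[
    (elements["category"] == "Structural") & (elements["cost"] > BUDGET_LIMIT)
]

# Data Aggregation: total volume and cost per floor
per_floor = elements.groupby("floor").agg(
    total_volume_m3=("volume_m3", "sum"),
    total_cost=("cost", "sum"),
)

print(over_budget)
print(per_floor)
```
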
## Common Use Cases

### 1. Extract Data from PDF Documents
```python
# Prompt to ChatGPT:
# "Write code to extract tables from PDF and convert to DataFrame"

import pdfplumber
import pandas as pd

def pdf_to_dataframe(pdf_path):
    """Extract all tables from a PDF file into one DataFrame."""
    all_tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                # Skip empty tables and tables that contain only a header row
                if table and len(table) > 1:
                    # The first row is assumed to hold the column names
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)

    if all_tables:
        return pd.concat(all_tables, ignore_index=True)
    return pd.DataFrame()

# Usage
df = pdf_to_dataframe("construction_spec.pdf")
df.to_excel("extracted_data.xlsx", index=False)
```

### 2. Process BIM Element Data
```python
# Prompt: "Analyze BIM elements, group by category, calculate volumes"

import pandas as pd

def analyze_bim_elements(csv_path):
    """Analyze BIM element data from CSV export"""
    df = pd.read_csv(csv_path)

    # Group by category
    summary = df.groupby('Category').agg({
        'Volume': 'sum',
        'Area': 'sum',
        'ElementId': 'count'
    }).rename(columns={'ElementId': 'Count'})

    return summary

# Usage
summary = analyze_bim_elements("revit_export.csv")
print(summary)
```

### 3. Cost Estimation Pipeline
```python
# Prompt: "Create cost estimation from quantities and unit prices"

import pandas as pd

def calculate_cost_estimate(quantities_df, prices_df):
    """
    Calculate project cost estimate

    Args:
        quantities_df: DataFrame with columns [item_code, quantity]
        prices_df: DataFrame with columns [item_code, unit_price, unit]

    Returns:
        DataFrame with cost calculations
    """
    # Merge quantities with prices
    result = quantities_df.merge(prices_df, on='item_code', how='left')

    # Calculate costs
    result['total_cost'] = result['quantity'] * result['unit_price']

    # Share of each item in the overall total, in percent
    result['cost_percentage'] = (result['total_cost'] /
                                  result['total_cost'].sum() * 100).round(2)

    return result

# Usage
quantities = pd.DataFrame({
    'item_code': ['C001', 'S001', 'W001'],
    'quantity': [150, 2000, 500]
})

prices = pd.DataFrame({
    'item_code': ['C001', 'S001', 'W001'],
    'unit_price': [120, 45, 85],
    'unit': ['m3', 'kg', 'm2']
})

estimate = calculate_cost_estimate(quantities, prices)
print(estimate)
```

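Because the merge uses `how='left'`, an item with no matching price ends up with a missing `unit_price` and silently drops out of the total. One way to surface such rows is pandas' `indicator` option, sketched below; the function name `find_unpriced_items` and the sample item `X999` are hypothetical.

```python
import pandas as pd


def find_unpriced_items(quantities_df: pd.DataFrame, prices_df: pd.DataFrame) -> pd.DataFrame:
    """Return quantity rows whose item_code has no entry in prices_df."""
    merged = quantities_df.merge(prices_df, on="item_code", how="left", indicator=True)
    # "_merge" is "left_only" for rows that found no partner in prices_df
    return merged.loc[merged["_merge"] == "left_only", ["item_code", "quantity"]]


quantities = pd.DataFrame({
    "item_code": ["C001", "S001", "X999"],
    "quantity": [150, 2000, 10],
})
prices = pd.DataFrame({
    "item_code": ["C001", "S001"],
    "unit_price": [120, 45],
    "unit": ["m3", "kg"],
})

missing = find_unpriced_items(quantities, prices)
print(missing)
```

Running a check like this before computing the estimate makes an incomplete price list visible instead of letting it understate the total.
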
### 4. Schedule Data Processing
```python
# Prompt: "Parse construction schedule, calculate durations, identify delays"

import pandas as pd

def analyze_schedule(schedule_path):
    """Analyze construction schedule for delays"""
    df = pd.read_excel(schedule_path)

    # Parse dates
    df['start_date'] = pd.to_datetime(df['start_date'])
    df['end_date'] = pd.to_datetime(df['end_date'])
    df['actual_end'] = pd.to_datetime(df['actual_end'])

    # Calculate durations
    df['planned_duration'] = (df['end_date'] - df['start_date']).dt.days
    df['actual_duration'] = (df['actual_end'] - df['start_date']).dt.days

    # Identify delays
    df['delay_days'] = df['actual_duration'] - df['planned_duration']
    df['is_delayed'] = df['delay_days'] > 0

    return df

# Usage
schedule = analyze_schedule("project_schedule.xlsx")
delayed_tasks = schedule[schedule['is_delayed']]
print(f"Delayed tasks: {len(delayed_tasks)}")
```

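Tasks that are still in progress have no `actual_end`, so their `delay_days` comes out as missing and `is_delayed` as `False`. The sketch below applies the same calculation to an in-memory table to make that behaviour visible. The task names and dates are invented, the `is_open` column is an addition for illustration, and the function takes a DataFrame instead of a file path so it runs without an Excel file.

```python
import pandas as pd


def add_delay_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Same duration/delay logic as above, plus a flag for unfinished tasks."""
    out = df.copy()
    for col in ("start_date", "end_date", "actual_end"):
        out[col] = pd.to_datetime(out[col])
    out["planned_duration"] = (out["end_date"] - out["start_date"]).dt.days
    out["actual_duration"] = (out["actual_end"] - out["start_date"]).dt.days
    out["delay_days"] = out["actual_duration"] - out["planned_duration"]
    out["is_delayed"] = out["delay_days"] > 0
    # Missing actual_end means the task has not finished yet
    out["is_open"] = out["actual_end"].isna()
    return out


tasks = pd.DataFrame({
    "task": ["Excavation", "Foundation", "Framing"],
    "start_date": ["2024-01-08", "2024-01-22", "2024-02-12"],
    "end_date": ["2024-01-19", "2024-02-09", "2024-03-15"],
    "actual_end": ["2024-01-19", "2024-02-14", None],
})

result = add_delay_columns(tasks)
print(result[["task", "delay_days", "is_delayed", "is_open"]])
```

Reporting open tasks separately avoids reading "not delayed" into rows that simply have no completion date yet.
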
## Local LLM Setup (Runs Offline After Download)

### Using Ollama
```bash
# Install (Linux; on Windows and macOS use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download models (requires internet once; afterwards they run offline)
ollama pull mistral      # General purpose, 7B params
ollama pull codellama    # Code-focused
ollama pull deepseek-coder  # Code-focused alternative

# Run
ollama run mistral "Write Pandas code to merge two DataFrames on project_id"
```
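
### Using LlamaIndex for Company Documents
```python
# Index company documents and query them with a local LLM.
# Assumes llama-index >= 0.10 (imports live under llama_index.core) plus the
# llama-index-llms-ollama and llama-index-embeddings-huggingface packages.
# Without the two Settings lines, LlamaIndex defaults to OpenAI and needs an API key.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Local models (names are examples; the embedding model is downloaded on first use)
Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Read all supported files (PDF, DOCX, TXT, ...) from the folder
reader = SimpleDirectoryReader("company_documents/")
documents = reader.load_data()

# Create searchable index
index = VectorStoreIndex.from_documents(documents)

# Query your documents
query_engine = index.as_query_engine()
response = query_engine.query(
    "What are the standard concrete mix specifications?"
)
print(response)
```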

## IDE Recommendations

| IDE | Best For | Features |
|-----|----------|----------|
| **Jupyter Notebook** | Learning, experiments | Interactive cells, visualizations |
| **Google Colab** | Free GPU, quick start | Cloud-based, pre-installed libs |
| **VS Code** | Professional development | Extensions, GitHub Copilot |
| **PyCharm** | Large projects | Advanced debugging, refactoring |

### Quick Setup with Jupyter
```bash
pip install jupyter pandas openpyxl pdfplumber
jupyter notebook
```

## Best Practices

1. **Start Simple**: Begin with clear, specific prompts
2. **Iterate**: Refine prompts based on results
3. **Validate**: Always check generated code before running
4. **Document**: Save working prompts for reuse
5. **Secure**: Use local LLM for sensitive company data

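The "Validate" step can be partly mechanised. The sketch below uses Python's standard `ast` module to parse a generated script without running it, listing the modules it imports and flagging a few calls that deserve a closer look. The set of flagged names and the helper name `review_generated_code` are arbitrary illustrations, not a complete safety check, and this does not replace reading the code.

```python
import ast

# Illustrative, deliberately short list; extend it to suit your environment.
SUSPICIOUS_CALLS = {"eval", "exec", "system", "rmtree", "remove"}


def review_generated_code(source: str) -> dict:
    """Parse source without executing it; report imports and flagged calls."""
    tree = ast.parse(source)  # raises SyntaxError if the code is not valid Python
    imports, flagged = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module.split(".")[0])
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain calls such as eval(...) or attribute calls such as os.system(...)
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in SUSPICIOUS_CALLS:
                flagged.add(name)
    return {"imports": sorted(imports), "flagged_calls": sorted(flagged)}


# Example of LLM output that should be questioned before running
generated = '''
import os
import pandas as pd

df = pd.read_csv("materials.csv")
os.system("del old_report.xlsx")
'''

report = review_generated_code(generated)
print(report)
```
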
## Common Prompts Library

### Data Import
- "Read Excel file and show first 10 rows"
- "Import CSV with custom delimiter and encoding"
- "Load multiple Excel sheets into dictionary of DataFrames"

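For the CSV prompt in the list above, a typical answer relies on the `sep` and `encoding` parameters of `pd.read_csv`. In this sketch a byte buffer stands in for a file on disk; the semicolon delimiter, the Latin-1 encoding, and the column names are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Simulated semicolon-delimited, Latin-1 encoded file contents
raw_bytes = "Pos;Beschreibung;Menge\n1;Stahlträger;12\n2;Dämmung;40\n".encode("latin-1")

# With a real file: pd.read_csv("export.csv", sep=";", encoding="latin-1")
df = pd.read_csv(io.BytesIO(raw_bytes), sep=";", encoding="latin-1")
print(df.head(10))
```

For the multi-sheet prompt, `pd.read_excel(path, sheet_name=None)` returns a dictionary that maps sheet names to DataFrames.
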
### Data Cleaning
- "Remove duplicate rows based on element_id"
- "Fill missing values with column mean"
- "Convert column to numeric, handling errors"

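The three cleaning prompts above usually map onto `drop_duplicates`, `pd.to_numeric`, and `fillna`. A minimal sketch on invented data; the column names and values are assumptions, and the numeric conversion is done before the fill because a mean cannot be computed on text.

```python
import pandas as pd

raw = pd.DataFrame({
    "element_id": ["W001", "W001", "W002", "C001"],
    "volume_m3": ["45.5", "45.5", "n/a", "8.5"],
})

# Remove duplicate rows based on element_id (keeps the first occurrence)
clean = raw.drop_duplicates(subset="element_id").copy()

# Convert column to numeric; unparseable values become NaN instead of raising
clean["volume_m3"] = pd.to_numeric(clean["volume_m3"], errors="coerce")

# Fill missing values with the column mean
clean["volume_m3"] = clean["volume_m3"].fillna(clean["volume_m3"].mean())

print(clean)
```

Whether a mean is a sensible substitute depends on the data; for quantities that feed a cost estimate, flagging the gap may be safer than filling it.
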
### Data Analysis
- "Calculate descriptive statistics for numeric columns"
- "Find correlation between cost and duration"
- "Identify outliers using IQR method"

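To show what the statistics and outlier prompts above might return, here is a short sketch of the IQR rule: values more than 1.5 times the interquartile range below the first quartile or above the third quartile are treated as outliers. The task labels and costs are invented.

```python
import pandas as pd

costs = pd.DataFrame({
    "task": ["A", "B", "C", "D", "E", "F"],
    "cost": [100, 110, 105, 95, 102, 400],
})

# Descriptive statistics for the numeric column
print(costs["cost"].describe())

# IQR fences
q1 = costs["cost"].quantile(0.25)
q3 = costs["cost"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = costs[(costs["cost"] < lower) | (costs["cost"] > upper)]
print(outliers)
```
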
### Data Export
- "Export to Excel with multiple sheets"
- "Save to CSV with specific encoding"
- "Generate formatted PDF report"

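For the CSV export prompt above, the `encoding` argument of `to_csv` does the work. The sketch below writes to a temporary directory so it leaves no files behind; the sample rows and the choice of `utf-8-sig` are assumptions. That encoding prepends a byte-order mark, which helps spreadsheet programs recognise the file as UTF-8.

```python
import os
import tempfile

import pandas as pd

report = pd.DataFrame({
    "material": ["Beton C30/37", "Stahlträger"],
    "quantity": [120, 14],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "report.csv")
    report.to_csv(csv_path, index=False, encoding="utf-8-sig")

    # Inspect the first three bytes: the UTF-8 byte-order mark
    with open(csv_path, "rb") as fh:
        first_bytes = fh.read(3)

    # Read the file back to confirm the non-ASCII text survived
    round_trip = pd.read_csv(csv_path, encoding="utf-8-sig")

print(first_bytes)
print(round_trip)
```

For Excel output with several sheets, the usual route is `pd.ExcelWriter` with one `to_excel(writer, sheet_name=...)` call per sheet, which requires `openpyxl`.
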
## Resources

- **Book**: "Data-Driven Construction" by Artem Boiko, Chapter 2.3
- **Website**: https://datadrivenconstruction.io
- **Pandas Documentation**: https://pandas.pydata.org/docs/
- **Ollama**: https://ollama.com
- **LM Studio**: https://lmstudio.ai
- **Google Colab**: https://colab.research.google.com

## Next Steps

- See `pandas-construction-analysis` for advanced Pandas operations
- See `pdf-to-structured` for document processing
- See `etl-pipeline` for automated data pipelines
- See `rag-construction` for RAG implementation with construction documents

Safe Usage Recommendations

This skill appears coherent with its stated purpose, but take these precautions before installing or using it:

- Note the OS restriction: the metadata lists win32; ensure it matches your environment.
- The skill expects python3 and filesystem access. Run generated scripts in a safe environment (virtualenv/container) and review LLM-generated code before executing it, because LLMs can produce buggy or unsafe commands.
- The SKILL.md suggests using online LLM services (ChatGPT/Claude) or installing local tools (Ollama, LM Studio). Avoid sending sensitive or proprietary project data to online models unless you trust the provider and understand its data retention policies.
- If you install third-party LLM runtimes, download them only from official sites and verify checksums where available.
- The skill does not ask for API keys or other credentials, which reduces exfiltration risk; nevertheless, watch for any prompts that ask you to paste credentials into the tool or chat.

If you want a stricter safety posture, run the skill's workflows in an isolated VM/container and review all generated scripts before running them.

Functional Analysis

Type: OpenClaw Skill
Name: llm-data-automation
Version: 2.1.0

The skill bundle is designed for LLM-driven construction data automation, generating Python/Pandas scripts for data processing. The `claw.json` requests `filesystem` permission, which is justified by the Python code examples in `SKILL.md` that involve reading and writing various file formats (CSV, Excel, PDF) using `pandas` and `pdfplumber`. While `SKILL.md` includes a `curl | sh` command for installing Ollama, it is presented as a manual user instruction, not an instruction for the OpenClaw agent to execute. Neither `SKILL.md` nor `instructions.md` contains any directives for prompt injection, data exfiltration, persistence, or other malicious activities. All code examples and instructions align with the stated purpose and lack high-risk behaviors or suspicious external communication.

Capability Assessment

✓ Purpose & Capability

Name/description match the instructions and manifest: examples show generating Python/Pandas code, extracting PDFs, processing CSV/Excel/BIM exports. Declared requirement (python3) and filesystem permission are appropriate for a file-processing, code-generation skill.

✓ Instruction Scope

SKILL.md instructs the agent to gather user-provided data/files, generate or run Python code (pandas, pdfplumber), and optionally use local LLMs (Ollama/LM Studio) or online LLMs. All referenced actions are within the stated purpose and limited to user-supplied data; there are no instructions to read unrelated system files or to exfiltrate data to hidden endpoints.

✓ Install Mechanism

There is no install spec (instruction-only). The skill recommends third-party tools (Ollama, LM Studio) but does not bundle downloads or run installers itself. This is low-risk as long as the user installs those tools from official sources.

✓ Credentials

The skill requests no environment variables or credentials. The only declared permission is filesystem access in claw.json, which is appropriate for reading/writing user-supplied data files. No unrelated secrets or config paths are requested.

✓ Persistence & Privilege

The `always` flag is false, and the skill is user-invocable with normal autonomous invocation allowed. It does not request persistent special privileges or attempt to modify other skills or system-wide settings.

Version History

v2.1.0

llm-data-automation v2.1.0

- Expanded documentation with practical examples for common construction data tasks, including PDF extraction, BIM data analysis, cost estimation, and schedule processing.
- Added clear quick start guides for using LLMs online (ChatGPT, Claude), locally (Ollama, LM Studio), and with company documents (LlamaIndex).
- Highlighted core concepts and best practices for non-programmers to use Python/Pandas via natural language prompts.
- Provided IDE recommendations and setup instructions for popular development environments.
- Updated local LLM setup instructions with supported models and use cases for offline automation.

v1.0.0

LLM Data Automation 1.0.0 – Initial Release

- Launches an automation skill to generate Python/Pandas scripts for construction data processing with LLMs such as ChatGPT, Claude, and LLaMA.
- Enables data extraction from documents, BIM analysis, cost estimation, and schedule processing via natural language prompts.
- Provides detailed quick-start guides for ChatGPT/Claude, local LLMs (Ollama, LM Studio), and company document indexing with LlamaIndex.
- Includes practical examples and ready-to-use code snippets for common construction data automation tasks.
- Recommends modern IDEs and offers best practices for safe and effective use.

Metadata

Slug: llm-data-automation

Version: 2.1.0

License: not specified

Total installs: 5

Current installs: 4

Number of versions: 2

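Frequently Asked Questions

What is Llm Data Automation?

Automate construction data processing using LLM (ChatGPT, Claude, LLaMA). Generate Python/Pandas scripts, extract data from documents, and create automated p... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 936 cumulative downloads to date.

How do I install Llm Data Automation?

Run the command "/install llm-data-automation" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Llm Data Automation free?

Yes, Llm Data Automation is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Llm Data Automation support?

Llm Data Automation runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (win32).

Who developed Llm Data Automation?

It is developed and maintained by datadrivenconstruction (@datadrivenconstruction); the current version is v2.1.0.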
