Description

Expert pandas skill for data manipulation, cleaning, analysis, and transformation. Use this skill when working with tabular data, CSV/Excel files, data analy...

Usage Instructions (SKILL.md)

# Pandas Data Processing Skill

Name: Pandas Skill
Author: yangruihan

This skill provides comprehensive pandas data processing capabilities through executable scripts and reference documentation. Use this skill whenever tasks involve data manipulation, cleaning, analysis, or transformation of tabular data.

## When to Use This Skill

Activate this skill when the user requests:

- Data cleaning operations (handling missing values, duplicates, outliers)
- Data analysis and statistical summaries
- Format conversions (CSV ↔ Excel ↔ JSON ↔ Parquet)
- Data transformation (filtering, sorting, aggregation, pivoting)
- Merging or combining multiple datasets
- Generating data quality reports
- Any pandas DataFrame operations

## Core Capabilities

### 1. Data Cleaning (`scripts/data_cleaner.py`)

Handles common data cleaning tasks with a single command:

**Usage:**
```bash
python scripts/data_cleaner.py input.csv output.csv [options]
```

**Available Options:**
- `--remove-duplicates`: Remove duplicate rows
- `--handle-missing [strategy]`: Handle missing values
  - Strategies: `drop`, `fill`, `forward`, `backward`, `mean`, `median`
- `--fill-value [value]`: Custom fill value for missing data
- `--remove-outliers`: Remove outliers using IQR or Z-score method
- `--outlier-method [method]`: Choose `iqr` or `zscore` (default: iqr)
- `--standardize-columns`: Standardize column names (lowercase, underscores)

**Example:**
```bash
python scripts/data_cleaner.py data.csv cleaned_data.csv \
    --remove-duplicates \
    --handle-missing mean \
    --remove-outliers \
    --standardize-columns
```
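For readers who want to see what these flags roughly correspond to in pandas, here is a minimal sketch. It is not the code in `scripts/data_cleaner.py`: the helper name `clean_frame`, the sample data, the order of operations, and the 1.5 × IQR fence are all assumptions made for illustration.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the example flags; not the script's code."""
    out = df.copy()
    # --standardize-columns: lowercase names, spaces to underscores
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # --remove-duplicates
    out = out.drop_duplicates()
    # --handle-missing mean: fill numeric gaps with the column mean
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    # --remove-outliers (IQR): keep rows within 1.5 * IQR of the quartiles
    q1 = out[numeric_cols].quantile(0.25)
    q3 = out[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    in_range = (out[numeric_cols] >= q1 - 1.5 * iqr) & (out[numeric_cols] <= q3 + 1.5 * iqr)
    return out[in_range.all(axis=1)].reset_index(drop=True)


raw = pd.DataFrame({
    "Order ID": [1, 2, 2, 3, 4, 5],
    "Unit Price": [10.0, 12.0, 12.0, None, 11.0, 500.0],
})
cleaned = clean_frame(raw)
print(cleaned)
```

In this sample the extreme value 500.0 inflates the mean used to fill the missing price before the outlier row is dropped, which is one reason `median` can be the safer strategy when outliers are present.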

### 2. Data Analysis (`scripts/data_analyzer.py`)

Generates comprehensive data analysis reports:

**Usage:**
```bash
python scripts/data_analyzer.py input.csv [options]
```

**Available Options:**
- `--output, -o [file]`: Save report to file
- `--format [format]`: Output format (`json` or `text`, default: json)

**Report Includes:**
- Basic information (rows, columns, memory usage)
- Data type distribution
- Missing values analysis
- Numeric column statistics (mean, std, min, max, quartiles, skewness, kurtosis)
- Categorical column statistics (unique values, value counts)
- Correlation analysis
- Outlier detection

**Example:**
```bash
python scripts/data_analyzer.py sales_data.csv -o report.json --format json
```
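The report contents listed above map onto a handful of standard pandas calls. The sketch below assumes a hypothetical `summarize` function and made-up dictionary keys; the actual schema produced by `scripts/data_analyzer.py` is not documented here and may differ. Outlier detection is left out for brevity.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical report builder; keys are illustrative, not the script's schema."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        "basic": {
            "rows": len(df),
            "columns": df.shape[1],
            "memory_bytes": int(df.memory_usage(deep=True).sum()),
        },
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
        "missing": df.isna().sum().to_dict(),
        "numeric": numeric.describe().to_dict(),      # mean, std, min, max, quartiles
        "skewness": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),
        "categorical": {c: df[c].value_counts().to_dict() for c in categorical.columns},
        "correlation": numeric.corr().to_dict(),
    }


sales = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "units": [10, 12, None, 8],
    "revenue": [100.0, 130.0, 95.0, 70.0],
})
report = summarize(sales)
print(report["missing"])
```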

### 3. Data Transformation (`scripts/data_transformer.py`)

Performs various data transformation operations through subcommands:

#### Convert Format
```bash
python scripts/data_transformer.py convert input.csv output.xlsx
```
Supports: CSV, Excel (.xlsx/.xls), JSON, Parquet, HTML
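One plausible way to implement such a converter is to dispatch on the file extension. The sketch below is an assumption about the approach, not the script's code: the `READERS` and `WRITERS` tables and the `convert` function are hypothetical, and HTML is omitted. The demonstration uses CSV to JSON so it runs without `openpyxl` or `pyarrow`.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical extension-to-function tables; the script's real mapping is not shown.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,       # needs openpyxl when called
    ".parquet": pd.read_parquet,  # needs pyarrow when called
}
WRITERS = {
    ".csv": lambda df, p: df.to_csv(p, index=False),
    ".json": lambda df, p: df.to_json(p, orient="records"),
    ".xlsx": lambda df, p: df.to_excel(p, index=False),
    ".parquet": lambda df, p: df.to_parquet(p, index=False),
}


def convert(src: str, dst: str) -> None:
    """Read src with the reader for its suffix, write dst with the matching writer."""
    src_path, dst_path = Path(src), Path(dst)
    df = READERS[src_path.suffix.lower()](src_path)
    WRITERS[dst_path.suffix.lower()](df, dst_path)


# Round-trip a small CSV through JSON in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "input.csv"
    json_path = Path(tmp) / "output.json"
    pd.DataFrame({"name": ["a", "b"], "age": [30, 41]}).to_csv(csv_path, index=False)
    convert(str(csv_path), str(json_path))
    converted = pd.read_json(json_path)

print(converted)
```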

#### Merge Files
```bash
python scripts/data_transformer.py merge file1.csv file2.csv file3.csv \
    --output merged.csv \
    --how outer \
    --on key_column
```
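In pandas terms, merging several files on a shared key can be expressed as a pairwise fold over `pd.merge`. Whether the script merges exactly this way is an assumption; `merge_frames` and the three sample frames are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, on, how="outer"):
    """Fold a list of DataFrames into one by merging pairwise on a shared key."""
    return reduce(lambda left, right: pd.merge(left, right, on=on, how=how), frames)


customers = pd.DataFrame({"key_column": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({"key_column": [2, 3, 4], "total": [20.0, 35.5, 12.0]})
regions = pd.DataFrame({"key_column": [1, 4], "region": ["north", "south"]})

merged = merge_frames([customers, orders, regions], on="key_column", how="outer")
print(merged)
```

With `how="outer"` every key from any input survives, and columns missing for a given key are filled with `NaN`; `how="inner"` would keep only keys present in all inputs.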

#### Filter Data
```bash
python scripts/data_transformer.py filter data.csv \
    --query "age > 18 and city == 'Beijing'" \
    --output filtered.csv
```
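The `--query` string looks like a `DataFrame.query` expression. Assuming it is passed through to that method (not confirmed by this document), the equivalent pandas code on made-up sample data would be:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Li", "Wang", "Zhao", "Chen"],
    "age": [17, 25, 32, 40],
    "city": ["Beijing", "Beijing", "Shanghai", "Beijing"],
})

expr = "age > 18 and city == 'Beijing'"
filtered = people.query(expr)

# The same filter written with boolean indexing, for comparison.
same = people[(people["age"] > 18) & (people["city"] == "Beijing")]

print(filtered)
```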

#### Sort Data
```bash
python scripts/data_transformer.py sort data.csv \
    --by sales quantity \
    --descending \
    --output sorted.csv
```

#### Select Columns
```bash
python scripts/data_transformer.py select data.csv \
    --columns name age city \
    --output selected.csv
```
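The sort and select subcommands correspond to `sort_values` and column indexing in pandas. The sketch below uses made-up data and column names; it illustrates the operations and is not taken from the script.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "sales": [100, 250, 250, 80],
    "quantity": [5, 3, 7, 9],
    "city": ["X", "Y", "Z", "X"],
})

# sort --by sales quantity --descending: sort on both keys, largest first.
sorted_df = df.sort_values(by=["sales", "quantity"], ascending=False)

# select --columns ...: keep a subset of columns in the given order.
selected = sorted_df[["name", "sales", "city"]]

print(selected)
```

Ties on the first key (`sales`) are broken by the second (`quantity`), so row `c` comes before row `b` here.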

## Reference Documentation

The `references/` directory contains detailed documentation:

### `references/common_operations.md`

Comprehensive reference covering:
- Data reading/saving (CSV, Excel, JSON, SQL, Parquet)
- Data exploration (head, info, describe, dtypes)
- Data selection and filtering (loc, iloc, boolean indexing, query)
- Data cleaning (handling missing/duplicate values, type conversion)
- Data transformation (apply, map, sorting, column operations)
- Groupby and aggregation operations
- Pivot tables
- Merging and joining (concat, merge, join)
- Time series operations
- String operations
- Performance optimization tips

**When to use:** When Claude needs to understand pandas syntax or find the right method for a specific operation.

### `references/data_cleaning_best_practices.md`

Best practices guide covering:
- Data quality checklist
- Missing value handling strategies with decision tree
- Outlier detection methods (IQR, Z-Score, percentile)
- Data type optimization for memory efficiency
- String cleaning techniques
- Date/time standardization
- Complete cleaning pipeline template
- Common problems and solutions
- Data validation methods

**When to use:** When designing a data cleaning workflow or deciding on the best approach for specific data quality issues.

## Workflow Guidelines

### Step 1: Initial Assessment
Always start by analyzing the data:
```bash
python scripts/data_analyzer.py input_file.csv -o analysis_report.json
```
Review the report to understand data quality, types, missing values, and potential issues.

### Step 2: Plan Cleaning Strategy
Based on the analysis report:
- Identify missing value strategy (reference: `data_cleaning_best_practices.md`)
- Determine if duplicates should be removed
- Decide on outlier handling approach
- Plan any necessary type conversions

### Step 3: Execute Cleaning
Run the data cleaner with appropriate options:
```bash
python scripts/data_cleaner.py input.csv cleaned.csv [options]
```

### Step 4: Transform as Needed
Apply any transformations (filtering, sorting, format conversion, merging):
```bash
python scripts/data_transformer.py [subcommand] [options]
```

### Step 5: Validate Results
Re-run analysis on the cleaned data to verify improvements:
```bash
python scripts/data_analyzer.py cleaned.csv -o final_report.json
```
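Validation amounts to comparing quality metrics before and after cleaning. As a rough illustration in plain pandas (the `quality_metrics` helper and its three metrics are hypothetical, and the analyzer's report contains more than this), the comparison could look like:

```python
import pandas as pd


def quality_metrics(df: pd.DataFrame) -> dict:
    """Hypothetical metrics; the analyzer's report contains more than this."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


raw = pd.DataFrame({"id": [1, 2, 2, 3], "score": [0.5, None, None, 0.9]})
cleaned = raw.drop_duplicates().dropna()

before, after = quality_metrics(raw), quality_metrics(cleaned)
delta = {k: after[k] - before[k] for k in before}
print(delta)
```

A drop in `missing_cells` and `duplicate_rows` confirms the cleaning did something; the accompanying drop in `rows` shows what it cost.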

## Common Patterns

### Pattern 1: Quick Data Quality Report
```bash
python scripts/data_analyzer.py data.csv --format text
```

### Pattern 2: Standard Cleaning Pipeline
```bash
python scripts/data_cleaner.py raw_data.csv clean_data.csv \
    --standardize-columns \
    --remove-duplicates \
    --handle-missing median \
    --remove-outliers
```

### Pattern 3: Excel to CSV with Filtering
```bash
# Convert
python scripts/data_transformer.py convert data.xlsx data.csv

# Filter
python scripts/data_transformer.py filter data.csv \
    --query "status == 'active'" \
    --output filtered.csv
```

### Pattern 4: Merge Multiple CSVs
```bash
python scripts/data_transformer.py merge *.csv \
    --output combined.csv
```

## Dependencies

Ensure pandas is installed:
```bash
pip install pandas numpy openpyxl
```

Optional for specific formats:
```bash
pip install pyarrow  # For Parquet support
pip install xlrd     # For older Excel files (.xls)
```

## Tips for Effective Use

1. **Start with analysis:** Always run the analyzer first to understand the data
2. **Incremental cleaning:** Apply cleaning operations step by step, verify each step
3. **Preserve originals:** Never overwrite original data files
4. **Check references:** Consult reference docs for complex operations or best practices
5. **Validate results:** Use the analyzer to verify cleaning effectiveness
6. **Memory efficiency:** For large files, consider using the data type optimization techniques in the reference docs
7. **Combine operations:** Chain multiple transformer commands for complex workflows
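As an illustration of the data type optimization mentioned in tip 6, the sketch below downcasts numeric columns and converts low-cardinality text columns to `category`. The `shrink` helper and its `max_unique_ratio` threshold are assumptions for this example; the techniques in the reference docs may differ.

```python
import pandas as pd


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Downcast numeric columns and convert low-cardinality text to category."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_string_dtype(s) and s.nunique() / max(len(s), 1) <= max_unique_ratio:
            out[col] = s.astype("category")
    return out


big = pd.DataFrame({
    "count": list(range(1000)),
    "price": [i * 0.5 for i in range(1000)],
    "status": ["active", "inactive"] * 500,
})
small = shrink(big)
saved = big.memory_usage(deep=True).sum() - small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"bytes saved: {saved}")
```

Downcasting floats trades precision for space, so it is worth checking that the smaller type is acceptable for the column before applying it broadly.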

## Limitations

- Scripts work with single-machine memory constraints (for very large datasets, consider Dask)
- Time series resampling and rolling operations require custom pandas code
- Complex statistical modeling beyond basic descriptive statistics requires additional libraries
- For advanced visualizations, use matplotlib/seaborn directly
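Since resampling and rolling windows are not covered by the scripts, here is a minimal sketch of the custom pandas code involved. The hourly series is synthetic and exists only for illustration.

```python
import pandas as pd

# Synthetic hourly readings for illustration only.
idx = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
readings = pd.Series(range(48), index=idx, name="value")

# Resampling: collapse hourly data to one mean value per calendar day.
daily_mean = readings.resample("D").mean()

# Rolling: trailing average over the current and two previous samples.
rolling_3h = readings.rolling(window=3).mean()

print(daily_mean)
print(rolling_3h.tail(3))
```

Both operations need a `DatetimeIndex` (or an `on=` column) to be meaningful, so parse date columns before applying them.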

## Troubleshooting

- **Import errors:** Ensure pandas and dependencies are installed
- **Memory errors:** Process data in chunks or optimize dtypes (see references)
- **Encoding issues:** Add `encoding='utf-8'` parameter when loading CSVs
- **Date parsing issues:** Use `pd.to_datetime()` with explicit format string

For detailed pandas operations and troubleshooting, always refer to `references/common_operations.md` and `references/data_cleaning_best_practices.md`.
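The troubleshooting hints above can be tried directly in pandas. This sketch uses an in-memory CSV with made-up contents so that it runs without any files on disk; with real data, pass a file path instead of the `io` buffers.

```python
import io

import pandas as pd

csv_text = "date,amount\n2024-01-05,10\n2024-02-10,20\n2024-03-15,30\n"

# Memory errors: read in chunks and aggregate instead of loading everything.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += int(chunk["amount"].sum())

# Encoding issues: pass the encoding explicitly when reading bytes or files.
df = pd.read_csv(io.BytesIO(csv_text.encode("utf-8")), encoding="utf-8")

# Date parsing issues: give pd.to_datetime an explicit format string.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

print(total)
print(df.dtypes)
```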

Security Recommendations

This skill appears coherent and only performs local data processing with pandas. Before using it: (1) Run it in an isolated environment (virtualenv/container) and install dependencies from PyPI to limit supply-chain risk. (2) Back up any original data, because the scripts create or overwrite the output files you name. (3) Review or grep the scripts yourself if you plan to run them on highly sensitive files (they perform only local I/O and pandas operations, but reviewing them is best practice). (4) For very large datasets, follow the documentation's advice about chunked processing or use a tool designed for big data. If you need stronger assurances, request the publisher's provenance (homepage/repo) or scan package dependencies for known vulnerabilities.

Functional Analysis

Type: OpenClaw Skill
Name: pandas-skill
Version: 1.0.0

The pandas-skill bundle is a legitimate and well-documented toolkit for data analysis, cleaning, and transformation using the pandas library. It includes functional Python scripts (data_analyzer.py, data_cleaner.py, data_transformer.py) that implement standard data science operations such as handling missing values, outlier detection, and format conversion. The SKILL.md instructions provide a safe and logical workflow for an AI agent, and no evidence of malicious intent, data exfiltration, or harmful prompt injection was found across the code or documentation.

Capability Assessment

✓ Purpose & Capability

Name/description match the shipped artifacts: three pandas scripts (analyzer, cleaner, transformer), examples, and reference docs. No unexpected environment variables, binaries, or cloud credentials are requested.

✓ Instruction Scope

SKILL.md instructs running the included scripts against local files (CSV/Excel/JSON/Parquet). The instructions do not direct the agent to read unrelated system files, call external endpoints, or exfiltrate data. All referenced operations (analyze → clean → transform → validate) are within the stated purpose.

✓ Install Mechanism

No install spec; repository is instruction+script based with a requirements.txt listing typical Python libs (pandas, numpy, openpyxl, pyarrow, xlrd). No downloads from arbitrary URLs or archive extraction; install is standard pip usage which is proportionate to the functionality.

✓ Credentials

The skill declares no required environment variables, no credentials, and no config paths. The code accesses only files supplied by the user and standard Python modules; no secret-like env variables or unrelated service tokens are requested.

✓ Persistence & Privilege

Skill does not request always:true and does not attempt to modify agent/system-wide configuration. It runs as CLI scripts operating on local files and writes outputs to user-specified paths.

Version History

v1.0.0

pandas-skill 1.0.0

- Initial release providing expert scripts and documentation for pandas-based data manipulation, cleaning, analysis, and transformation.
- Includes command-line scripts for data cleaning (missing values, duplicates, outliers), analysis (summary reports), and transformation (conversion, merging, filtering).
- Adds comprehensive reference documentation for common pandas operations and data cleaning best practices.
- Outlines recommended workflows and common usage patterns.
- Lists required and optional dependencies for full functionality.

Metadata

Slug pandas-skill

Version 1.0.0

License MIT-0

Total installs 8

Current installs 8

Number of versions 1

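FAQ

What is Pandas Skill?

Expert pandas skill for data manipulation, cleaning, analysis, and transformation. Use this skill when working with tabular data, CSV/Excel files, data analy... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1262 cumulative downloads to date.

How do I install Pandas Skill?

Run the command "/install pandas-skill" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Pandas Skill free?

Yes. Pandas Skill is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Pandas Skill support?

Pandas Skill is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Pandas Skill?

It is developed and maintained by Ryan (@yangruihan). The current version is v1.0.0.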
