功能描述

为OpenClaw提供中文文本处理、翻译、OCR、语音识别等功能的综合工具包。支持中文分词、拼音转换、中英文翻译、关键词提取、文本分析等功能。

使用说明 (SKILL.md)

中文工具包技能

Name: 中文工具包
Author: utopia013-droid

OpenClaw中文处理和支持工具

🎯 技能概述

功能描述：

为OpenClaw提供中文文本处理、翻译、OCR、语音识别等功能的综合工具包。

适用场景：

• 处理中文文本内容
• 中英文翻译需求
• 中文语音识别和合成
• 中文文档处理和分析
• 中文内容创作和优化

📋 核心功能

1. 中文文本处理

基础处理：

• 中文分词 (jieba)
• 词性标注
• 命名实体识别
• 关键词提取
• 文本摘要

高级处理：

• 情感分析
• 文本分类
• 相似度计算
• 文本纠错
• 风格转换

2. 中英文翻译

翻译服务：

• 百度翻译API集成
• 谷歌翻译API集成
• 腾讯翻译API集成
• 本地翻译模型

翻译功能：

• 文本翻译
• 文档翻译
• 实时翻译
• 批量翻译

3. 中文OCR识别

图像文字识别：

• 图片中文文字提取
• PDF文档文字识别
• 手写文字识别
• 表格识别

支持格式：

• 图片: JPG, PNG, BMP
• 文档: PDF, Word, Excel
• 扫描件: 各种扫描格式

4. 中文语音处理

语音识别：

• 中文语音转文字
• 方言识别支持
• 实时语音识别
• 音频文件处理

语音合成：

• 文字转中文语音
• 多种音色选择
• 情感语音合成
• 批量语音生成

🔧 技术实现

1. 依赖库和工具

Python库：

基础库:
• jieba: 中文分词
• pypinyin: 拼音转换
• opencc: 简繁转换
• snowland: 中文NLP

高级库:
• transformers: 预训练模型
• paddlepaddle: 百度飞桨
• torch: PyTorch深度学习

命令行工具：

• curl: API调用
• tesseract: OCR识别
• ffmpeg: 音频处理
• pandoc: 文档转换

2. API服务集成

免费API：

• 百度翻译API (免费额度)
• 腾讯云AI (试用额度)
• 阿里云智能语音 (试用)
• 讯飞开放平台 (试用)

本地服务：

• 本地OCR服务
• 本地翻译模型
• 本地语音识别
• 本地文本分析

🚀 使用方法

1. 基础使用示例

中文分词：

# 使用技能进行中文分词
openclaw技能调用 chinese-toolkit --function segment --text "今天天气真好"

中英翻译：

# 中译英
openclaw技能调用 chinese-toolkit --function translate --text "你好世界" --from zh --to en

# 英译中
openclaw技能调用 chinese-toolkit --function translate --text "Hello World" --from en --to zh

2. 高级使用示例

文档处理：

# 提取PDF中的中文文字
openclaw技能调用 chinese-toolkit --function ocr --file document.pdf --language zh

# 中文文档摘要
openclaw技能调用 chinese-toolkit --function summarize --file report.txt --language zh --length 200

语音处理：

# 中文语音识别
openclaw技能调用 chinese-toolkit --function speech2text --audio recording.wav --language zh

# 文字转语音
openclaw技能调用 chinese-toolkit --function text2speech --text "欢迎使用中文工具包" --output welcome.mp3

📁 文件结构

技能目录结构：

chinese-toolkit/
├── SKILL.md                    # 技能说明文档 (本文件)
├── requirements.txt           # Python依赖库
├── chinese_tools.py          # 核心Python模块
├── config.json               # 配置文件
├── scripts/                  # 脚本目录
│   ├── install_deps.sh      # 安装依赖脚本
│   ├── test_functions.sh    # 功能测试脚本
│   └── update_models.sh     # 模型更新脚本
├── models/                   # 模型文件目录
│   ├── segmentation/        # 分词模型
│   ├── translation/         # 翻译模型
│   └── speech/             # 语音模型
└── examples/                # 使用示例
    ├── basic_usage.py      # 基础使用示例
    ├── advanced_usage.py   # 高级使用示例
    └── api_integration.py  # API集成示例

配置文件示例：

{
  "api_keys": {
    "baidu_translate": {
      "app_id": "YOUR_APP_ID",
      "app_key": "YOUR_APP_KEY"
    },
    "tencent_cloud": {
      "secret_id": "YOUR_SECRET_ID",
      "secret_key": "YOUR_SECRET_KEY"
    }
  },
  "local_services": {
    "ocr_enabled": true,
    "translation_enabled": true,
    "speech_enabled": false
  },
  "performance": {
    "cache_enabled": true,
    "cache_ttl": 3600,
    "parallel_processing": true
  }
}

🔄 安装和配置

1. 自动安装

# 通过clawhub安装
npx clawhub install chinese-toolkit

# 或手动安装
git clone https://github.com/openclaw/chinese-toolkit.git
cp -r chinese-toolkit ~/.openclaw/workspace/skills/

2. 依赖安装

# 安装Python依赖
pip install -r requirements.txt

# 安装系统依赖 (Ubuntu/Debian)
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim ffmpeg

# 安装系统依赖 (macOS)
brew install tesseract tesseract-lang ffmpeg

3. API配置

# 设置百度翻译API
export BAIDU_TRANSLATE_APP_ID="your_app_id"
export BAIDU_TRANSLATE_APP_KEY="your_app_key"

# 设置腾讯云API
export TENCENT_CLOUD_SECRET_ID="your_secret_id"
export TENCENT_CLOUD_SECRET_KEY="your_secret_key"

📊 性能优化

1. 缓存策略

• 翻译结果缓存: 减少API调用
• 分词结果缓存: 加速文本处理
• OCR结果缓存: 避免重复识别
• 语音结果缓存: 提高响应速度

2. 并行处理

• 多文档并行处理
• 批量翻译优化
• 并发API调用
• 分布式计算支持

3. 资源管理

• 内存使用优化
• 磁盘空间管理
• 网络带宽控制
• 计算资源分配

🛡️ 安全和隐私

1. 数据安全

• 本地处理优先: 敏感数据本地处理
• 加密传输: API调用使用HTTPS
• 数据清理: 处理完成后清理临时数据
• 访问控制: API密钥安全存储

2. 隐私保护

• 用户数据保护: 不存储用户原始数据
• 匿名化处理: 去除个人识别信息
• 合规使用: 遵守数据保护法规
• 透明操作: 明确数据处理流程

3. 安全审计

• 代码安全审查
• 依赖库安全检查
• API使用监控
• 异常行为检测

🔍 故障排除

常见问题：

1. 分词不准确
   • 原因: 词典不完整或模型过时
   • 解决: 更新分词词典和模型

2. 翻译质量差
   • 原因: API限制或网络问题
   • 解决: 更换翻译服务或检查网络

3. OCR识别错误
   • 原因: 图片质量差或语言设置错误
   • 解决: 优化图片质量，正确设置语言

4. 语音识别失败
   • 原因: 音频质量差或方言不支持
   • 解决: 提高音频质量，使用标准普通话

调试方法：

# 启用调试模式
export CHINESE_TOOLKIT_DEBUG=true

# 查看详细日志
tail -f ~/.openclaw/logs/chinese-toolkit.log

# 运行测试套件
python -m pytest tests/

📈 性能指标

处理速度：

• 中文分词: 1000字/秒
• 中英翻译: 500字/秒 (API)
• OCR识别: 1页/秒
• 语音识别: 实时 (1x速度)

准确率：

• 中文分词: >95%
• 命名实体识别: >90%
• 翻译质量: >85% (专业翻译对比)
• OCR识别: >98% (清晰文档)
• 语音识别: >95% (标准普通话)

资源使用：

• 内存占用: \x3C500MB
• 磁盘空间: \x3C2GB (含模型)
• CPU使用: 中等
• 网络带宽: 按需使用

🚀 未来发展

短期计划 (2026年Q2)：

1. 增加更多方言支持
2. 优化本地模型性能
3. 扩展API服务集成
4. 改进用户体验

中期计划 (2026年Q3-Q4)：

1. 深度学习模型优化
2. 实时处理能力提升
3. 多模态处理支持
4. 生态系统建设

长期计划 (2027年)：

1. 自主AI模型训练
2. 边缘计算支持
3. 全球化扩展
4. 开源社区建设

🤝 贡献指南

如何贡献：

1. 报告问题: GitHub Issues
2. 提交代码: Pull Requests
3. 改进文档: 文档更新
4. 测试反馈: 使用反馈

开发规范：

• 代码风格: PEP 8
• 文档标准: Google风格
• 测试要求: 单元测试覆盖>80%
• 提交规范: Conventional Commits

社区支持：

• 讨论区: GitHub Discussions
• 即时聊天: Discord中文频道
• 邮件列表: 开发组邮件
• 线下活动: 技术分享会

中文工具包技能版本: 1.0.0 最后更新: 2026-02-23 维护者: OpenClaw中文社区

让OpenClaw更好地理解和处理中文！ 🇨🇳🔧🤖

中文智能，全球共享！ 🌍🚀🌟

安全使用建议

This repository appears to implement a Python-based Chinese NLP toolkit (jieba, pypinyin, OCR, translate via requests) and contains many helpful docs and scripts. Before installing or running anything, please: 1) Review the included install scripts (install.sh, scripts/*.sh, and PowerShell release scripts) — they contain git/publish commands that could push tags or publish code if executed. 2) Treat API credentials as sensitive: the code reads BAIDU/TENCENT/ALIYUN environment variables but the skill metadata did not declare them — do not paste secrets into your environment without understanding where they will be sent. 3) Note the documentation inconsistencies (Node/npm packaging and an unrelated GitHub repo name) — these could be copy-paste leftovers; verify repository origin and maintainer identity before trusting release automation. 4) If you only need runtime features, prefer inspecting and running the toolkit Python modules locally in an isolated environment (virtualenv) and avoid running release/publishing scripts. If you want this skill enabled for autonomous agents, consider asking the maintainer to update metadata to list required env vars and to remove or clearly label publish scripts so automated or inexperienced users are not misled.

功能分析

Type: OpenClaw Skill Name: chinese-toolkit Version: 1.0.1 The skill is suspicious due to a significant supply chain risk identified in `install.sh`, which clones the project from `https://github.com/utopia013-droid/luxyoo.git` instead of the declared `https://github.com/openclaw/chinese-toolkit` (or `openclaw-cn/chinese-toolkit` as seen in other documentation). This inconsistency in the source repository is a critical vulnerability. Additionally, `openclaw_integration.py` exposes `_handle_ocr` and `_handle_convert` methods that likely wrap command-line tools (like `tesseract` and `opencc` mentioned in `SKILL.md` and `scripts/install_deps.sh`). Without the full implementation of these methods, there is a high risk of shell injection if input parameters are not properly sanitized, leading to a potential Remote Code Execution vulnerability.

能力评估

ℹ Purpose & Capability

Name/description and the Python modules (jieba, pypinyin, OCR, ffmpeg, requests, etc.) align with a Chinese text/voice/OCR toolkit. However documentation and release guides repeatedly reference Node/npm packaging (package.json, index.js/main.js, npm installs) and a different GitHub repo (utopia013-droid/luxyoo) which do not match the Python-centric codebase present. This mismatch between packaging targets (Python vs Node) is an incoherence in the bundle.

ℹ Instruction Scope

SKILL.md focuses on expected functionality (segment, translate, ocr, speech). It also instructs installing Python and system binaries and setting translation provider env vars (BAIDU/TENCENT), which is reasonable. But the docs also include full release/publish instructions (git push, npx clawhub publish, one-click release scripts) and commands that would push tags or upload releases if executed — these are outside run-time text-processing scope and elevate risk if users run the provided scripts without review.

✓ Install Mechanism

There is no platform install spec (instruction-only), which is lower risk. The repository includes install.sh and scripts/install_deps.sh, but no automatic downloader or remote extract URL was found in the provided content. The install approach appears to rely on standard package installation (pip, apt/brew) and manual invocation of included scripts.

⚠ Credentials

The manifest declares no required env vars, yet the code and SKILL.md reference provider credentials and environment variables (BAIDU_TRANSLATE_APP_ID/KEY, TENCENT_CLOUD_SECRET_ID/KEY, and examples of exporting API credentials). The skill expects external API keys to enable translation/speech features but fails to declare them in the skill metadata; that mismatch is a red flag because the agent or user may not be warned that secrets are needed or used. Additionally, release/publishing docs instruct use of GitHub tokens for clawhub/npm operations — these are sensitive and unrelated to runtime text processing.

✓ Persistence & Privilege

The skill does not request always:true and does not declare elevated platform privileges. Autonomous invocation (disable-model-invocation=false) is the platform default and present here, which is normal. The package contains scripts that, if run manually, perform Git operations and publishing; but nothing in the skill forces persistent system-wide changes by itself.

版本历史

v1.0.1

- No user-facing changes in this release. - Version number updated to 1.0.1; documentation and features remain unchanged.

v1.0.0

Initial release of chinese-toolkit skill for OpenClaw: - Provides comprehensive Chinese text processing, translation, OCR, and speech recognition capabilities. - Supports key features such as Chinese word segmentation, pinyin conversion, keyword extraction, sentiment analysis, text classification, and error correction. - Integrates multiple translation APIs (Baidu, Google, Tencent) and supports both API and local model operation. - Enables Chinese OCR for images, PDFs, handwriting, and tables, as well as speech-to-text and text-to-speech functionalities. - Includes performance optimizations, caching, parallel processing, and robust security/privacy measures. - Offers detailed documentation, usage examples, troubleshooting guidance, and community contribution guidelines.

元数据

Slug chinese-toolkit

版本 1.0.1

许可证 —

累计安装 10

当前安装数 8

历史版本数 2

常见问题

中文工具包是什么？

为OpenClaw提供中文文本处理、翻译、OCR、语音识别等功能的综合工具包。支持中文分词、拼音转换、中英文翻译、关键词提取、文本分析等功能。它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 1110 次。

如何安装中文工具包？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install chinese-toolkit」即可一键安装，无需额外配置。

中文工具包是免费的吗？

是的，中文工具包完全免费（开源免费），可自由下载、安装和使用。

中文工具包支持哪些平台？

中文工具包跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了中文工具包？

由 utopia013-droid（@utopia013-droid）开发并维护，当前版本 v1.0.1。

中文工具包