Chinese NLP Toolkit

Author: 371166758-qq · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 360 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install chinese-nlp-toolkit
Description
Specialized natural language processing for Chinese text. Covers segmentation (jieba), sentiment analysis, keyword extraction, text summarization, tone detec...
Usage (SKILL.md)

Chinese NLP Toolkit

Process and analyze Chinese text with specialized NLP capabilities.

Core Capabilities

1. Text Segmentation (分词)

Chinese has no word boundaries. Segmentation is the foundation of all Chinese NLP.

Approach: Use rule-based heuristics when no library is available:

  • Dictionary matching (maximum forward/backward matching)
  • Context-aware: "南京市长江大桥" → ["南京市", "长江大桥"] not ["南京", "市长", "江大桥"]
  • Domain-specific terms should be added as custom dictionary entries
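
As a minimal sketch of the dictionary-matching approach listed above, the following implements forward and backward maximum matching. The dictionary `TOY_DICT` and the function names are illustrative assumptions, not part of the skill; a real dictionary would be far larger and would include the domain-specific entries mentioned above.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Scan left to right, taking the longest dictionary word at each position."""
    max_len = max_len or max((len(w) for w in dictionary), default=1)
    tokens, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len=None):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    max_len = max_len or max((len(w) for w in dictionary), default=1)
    tokens, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    return tokens[::-1]


# Toy dictionary, purely illustrative.
TOY_DICT = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "结婚", "和尚", "尚未"}

print("/".join(forward_max_match("南京市长江大桥", TOY_DICT)))       # 南京市/长江大桥
print("/".join(forward_max_match("结婚的和尚未结婚的", TOY_DICT)))   # 结婚/的/和尚/未/结婚/的
print("/".join(backward_max_match("结婚的和尚未结婚的", TOY_DICT)))  # 结婚/的/和/尚未/结婚/的
```

With this toy dictionary the forward pass greedily picks "和尚" and mis-segments the second sentence, while the backward pass recovers "尚未". Comparing both directions is one simple way to flag ambiguous spans for closer inspection.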

Common Ambiguities:

| Text | Wrong Split | Correct Split |
|------|-------------|---------------|
| 雨伞 | 雨/伞 | 雨伞 (compound) |
| 结婚的和尚未结婚的 | 结婚/的/和尚/未/结婚/的 | 结婚/的/和/尚未/结婚/的 |
| 项目部 | 项目/部 | 项目部 (compound) |

2. Sentiment Analysis (情感分析)

Beyond positive/negative — Chinese sentiment is nuanced:

Intensity levels: 强烈负面 < 偏负面 < 中性 < 偏正面 < 强烈正面

Chinese-specific signals:

  • Rhetorical questions often indicate negative sentiment: "这也算好?"
  • Sarcasm markers: "呵呵", "厉害了", "也是醉了", "你开心就好"
  • Intensifiers: "非常", "特别", "简直了", "超级"
  • Diminishers: "还行吧", "马马虎虎", "凑合"

Emoji contribution (critical for social media):

  • 😊👍❤️ = positive amplification
  • 😤👎💔 = negative amplification
  • 🙄🙄🙄 = sarcasm/disdain (intensity scales with repetition)
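
The markers above lend themselves to a simple lexicon lookup. The sketch below only collects the signals named in this section and maps a numeric score onto the five intensity levels. The function names and the score cut-offs are assumptions made for illustration; the skill does not specify how signals are combined into a score.

```python
SARCASM = ("呵呵", "厉害了", "也是醉了", "你开心就好")
INTENSIFIERS = ("非常", "特别", "简直了", "超级")
DIMINISHERS = ("还行吧", "马马虎虎", "凑合")
# "\u2764" is the heart without its variation selector, so ❤ and ❤️ both match.
POSITIVE_EMOJI = ("😊", "👍", "\u2764")
NEGATIVE_EMOJI = ("😤", "👎", "💔")
EYE_ROLL = "🙄"

LABELS = ("强烈负面", "偏负面", "中性", "偏正面", "强烈正面")


def collect_signals(text):
    """Return the sentiment signals found in text, grouped by category."""
    return {
        "sarcasm": [m for m in SARCASM if m in text],
        "intensifiers": [m for m in INTENSIFIERS if m in text],
        "diminishers": [m for m in DIMINISHERS if m in text],
        "positive_emoji": sum(text.count(e) for e in POSITIVE_EMOJI),
        "negative_emoji": sum(text.count(e) for e in NEGATIVE_EMOJI),
        # Repetition is kept as a count because intensity scales with it.
        "eye_rolls": text.count(EYE_ROLL),
    }


def intensity_label(score):
    """Map a score in [-1, 1] to the five-level scale (cut-offs are assumed)."""
    cutoffs = (-0.6, -0.2, 0.2, 0.6)
    return LABELS[sum(score > c for c in cutoffs)]


signals = collect_signals("呵呵,你开心就好🙄🙄🙄")
print(signals["sarcasm"], signals["eye_rolls"])
print(intensity_label(-0.8))
```

Substring lookup cannot tell sincere "厉害了" from sarcastic use, so the collected signals are best treated as evidence for a later judgment rather than as a verdict.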

3. Keyword Extraction (关键词提取)

For Chinese text, prioritize:

  • Noun phrases (名词短语)
  • Domain-specific terminology
  • Named entities (人名、地名、机构名)

Method: TF-IDF adapted for Chinese + positional weighting (first/last sentences carry more weight in Chinese writing).
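
One possible reading of this method, as a hedged sketch: term frequency is counted over already-segmented sentences, tokens in the first and last sentence receive a boost, and the result is multiplied by a smoothed IDF. The function name, the `edge_boost` value, and the reference document frequencies are assumptions for illustration.

```python
import math
from collections import Counter


def extract_keywords(sentences, doc_freq, n_docs, top_k=5, edge_boost=1.5):
    """Rank tokens by position-weighted TF times smoothed IDF.

    sentences: list of token lists (text already segmented)
    doc_freq:  token -> number of reference documents containing it
    n_docs:    size of the reference corpus
    """
    weighted_tf = Counter()
    last = len(sentences) - 1
    for i, tokens in enumerate(sentences):
        # First and last sentences carry more weight.
        weight = edge_boost if i in (0, last) else 1.0
        for tok in tokens:
            weighted_tf[tok] += weight
    scores = {
        tok: tf * math.log((1 + n_docs) / (1 + doc_freq.get(tok, 0)))
        for tok, tf in weighted_tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Illustrative input: three segmented sentences and made-up corpus statistics.
sentences = [["长江大桥", "通车"], ["的", "市民", "的"], ["长江大桥", "的", "维护"]]
doc_freq = {"的": 100, "通车": 5, "市民": 20, "长江大桥": 2, "维护": 10}
for word, score in extract_keywords(sentences, doc_freq, n_docs=100):
    print(word, round(score, 2))
```

A token that appears in every reference document, such as "的" here, gets an IDF of zero and drops to the bottom, which is the behaviour that makes a separate stop-word list less critical.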

4. Text Summarization (文本摘要)

Chinese-specific rules:

  • Summarize to 20-30% of original length
  • Preserve key numbers, names, and claims
  • Chinese articles often "bury the lead" — the conclusion may be more important than the introduction
  • Extract key sentences using positional + keyword scoring
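
The rules above can be sketched as a small extractive summarizer: split on sentence terminators, score each sentence by position plus keyword hits, and keep the best sentences within a length budget. The function names, the scoring weights, and the default `ratio` are assumptions for illustration, not the skill's prescribed method.

```python
import re


def split_sentences(text):
    """Split on Chinese sentence terminators, keeping each terminator."""
    return [s for s in re.findall(r"[^。!?]+[。!?]?", text) if s.strip()]


def summarize(text, keywords, ratio=0.3):
    """Pick high-scoring sentences until roughly `ratio` of the length is used."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    budget = max(1, int(len(text) * ratio))
    last = len(sentences) - 1

    def score(i):
        # Positional bonus for the opening and closing sentences,
        # plus one point per keyword occurrence.
        positional = 1.0 if i in (0, last) else 0.0
        return positional + sum(sentences[i].count(k) for k in keywords)

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen, used = [], 0
    for i in ranked:
        # Always keep at least one sentence, even if it exceeds the budget.
        if not chosen or used + len(sentences[i]) <= budget:
            chosen.append(i)
            used += len(sentences[i])
    # Restore the original order so the summary reads naturally.
    return "".join(sentences[i] for i in sorted(chosen))


text = "大桥今日通车。天气晴朗。市民纷纷前往参观。官方称大桥日均通行量将达八万辆。"
print(summarize(text, keywords=["大桥"]))
```

Because the final sentence gets the same positional bonus as the first, a conclusion placed at the end of an article competes on equal terms with the introduction.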

5. Readability Scoring (可读性评分)

Rate Chinese text on a 1-10 scale considering:

  • Average sentence length (characters per sentence)
  • Vocabulary difficulty (HSK level estimate)
  • Clause density (commas per sentence)
  • Use of classical Chinese elements
  • Technical jargon density

| Score | Level | Target Audience |
|-------|-------|-----------------|
| 1-3 | Easy | General public |
| 4-6 | Moderate | Educated readers |
| 7-8 | Hard | Domain experts |
| 9-10 | Very Hard | Academic specialists |
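
Two of the factors above, sentence length and clause density, can be measured directly from the text. The sketch below computes them and looks up the level for a given score. How the factors combine into a 1-10 score is not specified here, so the sketch deliberately stops short of that; the function names are illustrative assumptions.

```python
import re

# (upper bound of the score band, level, target audience)
LEVELS = [
    (3, "Easy", "General public"),
    (6, "Moderate", "Educated readers"),
    (8, "Hard", "Domain experts"),
    (10, "Very Hard", "Academic specialists"),
]


def surface_features(text):
    """Return average characters and average commas per sentence."""
    sentences = [s for s in re.split(r"[。!?]", text) if s.strip()]
    if not sentences:
        return {"avg_sentence_length": 0.0, "commas_per_sentence": 0.0}
    n = len(sentences)
    return {
        "avg_sentence_length": sum(len(s) for s in sentences) / n,
        "commas_per_sentence": sum(s.count(",") for s in sentences) / n,
    }


def level_for(score):
    """Look up the level and audience for a 1-10 readability score."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    for upper, level, audience in LEVELS:
        if score <= upper:
            return level, audience


print(surface_features("今天天气好,我们去公园。你去吗?"))
print(level_for(7))
```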

6. Format Conversion

| Conversion | Example |
|------------|---------|
| Simplified → Traditional | 体验 → 體驗 |
| Traditional → Simplified | 體驗 → 体验 |
| Chinese → Pinyin | 你好 → nǐ hǎo |
| Chinese → Zhuyin | 你好 → ㄋㄧˇ ㄏㄠˇ |
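
To show the shape of these conversions without depending on any library, here is a toy sketch using character lookup tables. The tables cover only the characters in the examples above and are purely illustrative. Real conversion needs large phrase-level dictionaries, because some simplified characters map to several traditional ones and many characters have more than one reading.

```python
# Deliberately tiny, hypothetical tables covering only the examples above.
S2T = {"体": "體", "验": "驗"}
T2S = {trad: simp for simp, trad in S2T.items()}
PINYIN = {"你": "nǐ", "好": "hǎo"}
ZHUYIN = {"你": "ㄋㄧˇ", "好": "ㄏㄠˇ"}


def convert_script(text, table):
    """Replace each character found in the table; leave the rest untouched."""
    return text.translate(str.maketrans(table))


def annotate(text, table):
    """Return a space-separated reading, keeping unknown characters as-is."""
    return " ".join(table.get(ch, ch) for ch in text)


print(convert_script("体验", S2T))   # 體驗
print(convert_script("體驗", T2S))   # 体验
print(annotate("你好", PINYIN))      # nǐ hǎo
print(annotate("你好", ZHUYIN))      # ㄋㄧˇ ㄏㄠˇ
```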

Workflow

When Processing Chinese Text:

  1. Detect variant: Simplified (简体) or Traditional (繁体)?
  2. Segment: Break into meaningful units
  3. Analyze: Apply the requested analysis type(s)
  4. Report: Present results with Chinese annotations

Output Format

原文:[original text]
分词:[segmented text with / separators]
关键词:[top 5-10 keywords with relevance scores]
情感:[sentiment label + confidence + key signals]
摘要:[summarized text]
可读性:[score/10 + brief explanation]
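
If the results are produced programmatically, the template above could be filled in by a small formatter like the sketch below. The dictionary keys (`text`, `tokens`, `keywords`, and so on) are hypothetical names chosen for this example.

```python
def format_report(result):
    """Render an analysis result dict using the output template."""
    keywords = "、".join(f"{word}({score:.2f})" for word, score in result["keywords"])
    label, confidence, signals = result["sentiment"]
    score, explanation = result["readability"]
    lines = [
        f"原文:{result['text']}",
        f"分词:{'/'.join(result['tokens'])}",
        f"关键词:{keywords}",
        f"情感:{label}(置信度 {confidence:.0%};信号:{'、'.join(signals)})",
        f"摘要:{result['summary']}",
        f"可读性:{score}/10,{explanation}",
    ]
    return "\n".join(lines)


example = {
    "text": "这个bug太坑了",
    "tokens": ["这个", "bug", "太", "坑", "了"],
    "keywords": [("bug", 0.9)],
    "sentiment": ("偏负面", 0.8, ["太坑了"]),
    "summary": "这个bug太坑了",
    "readability": (2, "短句,口语化"),
}
print(format_report(example))
```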

Edge Cases

  • Mixed-language text: Handle code-switching naturally ("这个bug太坑了") — don't force Chinese segmentation on English words
  • Internet slang: Recognize common abbreviations (yyds, xswl, nbcs, awsl) and expand for formal analysis
  • Poetry/classical Chinese: Flag as special case — modern NLP rules don't apply; use classical grammar patterns
  • Dialectal text: Flag non-Mandarin text (Cantonese, Shanghainese written forms) — analysis may be unreliable
  • Zero-width characters: Chinese text sometimes contains invisible characters (U+200B, U+FEFF) that affect processing
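
Three of these edge cases are easy to handle with a preprocessing pass. The sketch below strips the two zero-width characters named above, expands the listed abbreviations, and splits text into Latin and Chinese runs so that English words are never passed to the Chinese segmenter. The function names are assumptions for illustration, and the expansions are the commonly understood meanings of these abbreviations.

```python
import re

ZERO_WIDTH = re.compile("[\u200b\ufeff]")

# Commonly understood expansions of the abbreviations listed above.
SLANG = {
    "yyds": "永远的神",
    "xswl": "笑死我了",
    "nbcs": "nobody cares",
    "awsl": "啊我死了",
}
# Match an abbreviation only when it is not part of a longer Latin word.
SLANG_PATTERN = re.compile(
    r"(?<![A-Za-z])(?:" + "|".join(map(re.escape, SLANG)) + r")(?![A-Za-z])",
    re.IGNORECASE,
)
# Runs of Latin letters/digits, runs of CJK ideographs, or other non-space runs.
SCRIPT_RUNS = re.compile(
    r"[A-Za-z0-9]+|[\u4e00-\u9fff]+|[^\sA-Za-z0-9\u4e00-\u9fff]+"
)


def strip_zero_width(text):
    """Remove U+200B and U+FEFF before any further processing."""
    return ZERO_WIDTH.sub("", text)


def expand_slang(text):
    """Replace known abbreviations with their expanded form."""
    return SLANG_PATTERN.sub(lambda m: SLANG[m.group(0).lower()], text)


def split_scripts(text):
    """Split mixed text into same-script runs; only the Chinese runs need segmenting."""
    return SCRIPT_RUNS.findall(text)


print(strip_zero_width("你\u200b好\ufeff"))   # 你好
print(expand_slang("这首歌YYDS"))             # 这首歌永远的神
print(split_scripts("这个bug太坑了"))         # ['这个', 'bug', '太坑了']
```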

Common Tasks & Prompts

  • "Analyze the sentiment of this Chinese review"
  • "Extract keywords from this article"
  • "Summarize this Chinese news article in 100 characters"
  • "Rate the readability of this document"
  • "Convert this to Traditional Chinese with pinyin annotation"
  • "Segment this Chinese text and identify named entities"
Security Advice
This skill is an instruction-only guide for Chinese NLP and appears internally consistent and low-risk because it requests no installs or secrets. Important things to consider before using or implementing it: (1) There is no code here — if you or the agent installs libraries (jieba, pypinyin, zhconv, third-party sentiment APIs), make sure those packages come from trusted sources and review them before installing. (2) The skill will be used to process text; if that text is sensitive, confirm any implementation does not send data to external services you don't trust. (3) The skill owner/source is unknown and no homepage is provided — if you plan to use a packaged implementation labeled with this skill, inspect the implementation for network calls, credentials usage, or unexpected file I/O. (4) If you need production reliability (tokenization, NER, domain dictionaries), prefer vetted libraries and explicitly review their dependencies.
Functional Analysis
Type: OpenClaw Skill
Name: chinese-nlp-toolkit
Version: 1.0.0

The skill bundle provides legitimate instructions and heuristics for Chinese natural language processing tasks such as segmentation, sentiment analysis, and format conversion. The content in SKILL.md is purely educational and functional for an AI agent, with no evidence of malicious intent, data exfiltration, or unauthorized command execution.
Capability Assessment
Purpose & Capability
The name/description (Chinese NLP: segmentation, sentiment, keywords, summarization, format conversion) match the SKILL.md content. The instructions describe algorithms and heuristics appropriate for these tasks and do not request unrelated resources.
Instruction Scope
SKILL.md stays on-topic: it provides step-by-step guidance for Chinese text processing, edge cases, and output formats. It does not instruct the agent to read system files, environment variables, or transmit data to external endpoints. Note: the skill is purely guidance (no implementation), so actual runtime behavior depends on any concrete implementation the agent or user builds from these instructions.
Install Mechanism
No install spec or code files are present. Nothing will be downloaded or written by the skill itself (lowest risk install profile).
Credentials
No required environment variables, credentials, or config paths are declared or referenced in the instructions. Requested privileges are proportional (none).
Persistence & Privilege
always is false and the skill is user-invocable; it does not request permanent presence or system configuration changes.
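How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install chinese-nlp-toolkit
  3. Once installed, call the Skill by name or trigger it with /chinese-nlp-toolkit
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version History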
v1.0.0
Specialized Chinese NLP: segmentation, sentiment analysis, keyword extraction, summarization, readability scoring, format conversion (simplified/traditional/pinyin).
Metadata
Slug: chinese-nlp-toolkit
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

What is Chinese NLP Toolkit?

Specialized natural language processing for Chinese text. Covers segmentation (jieba), sentiment analysis, keyword extraction, text summarization, tone detec... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 360 total downloads to date.

How do I install Chinese NLP Toolkit?

Run the command "/install chinese-nlp-toolkit" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.

Is Chinese NLP Toolkit free?

Yes. Chinese NLP Toolkit is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Chinese NLP Toolkit support?

Chinese NLP Toolkit is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Chinese NLP Toolkit?

It is developed and maintained by 371166758-qq (@371166758-qq); the current version is v1.0.0.
