
Information Extraction

Name: Information Extraction
Author: quqxui

By quqxui · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 139

Install in OpenClaw

/install information-extraction

Description

Extract structured information from unstructured text through a semi-automatic pipeline. Support entity extraction, relation extraction, attribute extraction...

Usage instructions (SKILL.md)

Information Extraction

Extract entity, relation, attribute, and event information from text into a normalized intermediate structure, then export triples in JSON, JSONL, or TSV.

Core workflow

Define extraction scope and output granularity.
Segment input text into sentences and paragraphs.
Extract entities with evidence.
Extract relations, attributes, and events.
Normalize aliases, predicates, and duplicated records.
Export triples. Default output is JSON.
Review ambiguities before treating output as final.
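
The segmentation step can be sketched as follows. This is a minimal illustration, not the logic in extract.py: the function name `segment` and the splitting rules (blank lines separate paragraphs; `.`, `!`, or `?` followed by whitespace ends a sentence) are assumptions made for the example.

```python
import re


def segment(text: str) -> list[list[str]]:
    """Split text into paragraphs, each represented as a list of sentences.

    Hypothetical helper: paragraphs are separated by blank lines, and a
    sentence is assumed to end at '.', '!' or '?' followed by whitespace.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for para in paragraphs:
        sentences = [
            s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()
        ]
        result.append(sentences)
    return result
```

A rule this naive will mis-split abbreviations such as "Dr." or "e.g.", which is one reason the final review step matters.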

Input scope

Prefer this skill for:

Plain text strings
Markdown text
Text copied from webpages, notes, reports, transcripts, or documents

If the user provides a file in another format, convert it to text first, then use this skill.

Output contract

Default output should contain:

{
  "triples": [],
  "entities": [],
  "attributes": [],
  "events": [],
  "ambiguities": []
}
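
A small check against this contract might look like the sketch below. The helper names `empty_output` and `missing_keys` are hypothetical and not part of the skill's scripts; only the five key names come from the contract above.

```python
REQUIRED_KEYS = ("triples", "entities", "attributes", "events", "ambiguities")


def empty_output() -> dict:
    """Return a fresh result with every contract key set to an empty list."""
    return {key: [] for key in REQUIRED_KEYS}


def missing_keys(result: dict) -> list[str]:
    """List contract keys that are absent or not lists in `result`."""
    return [key for key in REQUIRED_KEYS if not isinstance(result.get(key), list)]
```

Running `missing_keys` on a loaded output file before passing it downstream is a cheap way to catch a malformed result early.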

Supported export formats:

JSON (default)
JSONL
TSV
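
To make the three formats concrete, here is a hedged sketch of how one list of triples could be serialized each way. It is not the code in export_triples.py; the function name `serialize_triples` and the assumption that each triple is a dict with `subject`, `predicate`, and `object` keys are illustrative only.

```python
import csv
import io
import json

FIELDS = ("subject", "predicate", "object")  # assumed triple shape


def serialize_triples(triples: list[dict], fmt: str = "json") -> str:
    """Serialize triples as JSON (default), JSONL, or TSV."""
    rows = [{f: t[f] for f in FIELDS} for t in triples]
    if fmt == "json":
        return json.dumps({"triples": rows}, ensure_ascii=False, indent=2)
    if fmt == "jsonl":
        # One JSON object per line.
        return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)
    if fmt == "tsv":
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow(FIELDS)  # header row
        for r in rows:
            writer.writerow([r[f] for f in FIELDS])
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

JSONL suits streaming and line-by-line processing, while TSV is convenient for spreadsheets; the csv module handles quoting if a value happens to contain a tab.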

Extraction principles

Extract explicit facts before inference.
Preserve evidence spans for important records.
Prefer controlled predicates from references/relation-taxonomy.md.
Keep attributes and events separate internally, even when final output is triples.
Do not flatten complex events too early.
Normalize before exporting.
Record unresolved ambiguity instead of pretending certainty.
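
The last two principles can be illustrated together. The sketch below assumes a hypothetical alias table mapping a lowercased mention to its candidate canonical names; the function name `resolve_mention` and the shape of the ambiguity record are assumptions, not the skill's actual schema.

```python
def resolve_mention(mention: str, alias_table: dict, ambiguities: list):
    """Return the canonical name for a mention, or None if it is not certain.

    When several canonical names match, the case is appended to
    `ambiguities` rather than resolved by guessing.
    """
    candidates = alias_table.get(mention.strip().lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        ambiguities.append(
            {
                "mention": mention,
                "candidates": list(candidates),
                "reason": "multiple canonical names match",
            }
        )
    return None
```

Returning `None` for the ambiguous case keeps the uncertainty visible to the reviewer instead of silently picking the first candidate.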

Minimal internal schema

Use these record shapes during extraction.

Entity

{
  "id": "ent_001",
  "mention": "OpenAI",
  "canonical_name": "OpenAI",
  "type": "Organization",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.95
}

Relation

{
  "subject": "ent_001",
  "predicate": "published",
  "object": "ent_002",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.93
}

Attribute

{
  "entity_id": "ent_002",
  "attribute": "year",
  "value": "2023",
  "evidence": "The report was released in 2023.",
  "confidence": 0.87
}

Event

{
  "id": "ev_001",
  "type": "Publication",
  "trigger": "published",
  "participants": {
    "agent": "ent_001",
    "object": "ent_002"
  },
  "time": "2023",
  "location": null,
  "evidence": "OpenAI published the GPT-4 Technical Report in 2023.",
  "confidence": 0.92
}
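
When an event record is finally exported, one possible flattening is to treat the event id as the subject and emit one triple per slot. This is only a sketch of that idea: the mapping actually prescribed by references/triple-mapping.md is not shown here, and the function name `event_to_triples` and the slot-to-predicate naming are assumptions.

```python
def event_to_triples(event: dict) -> list[tuple[str, str, str]]:
    """Flatten one event record into (subject, predicate, object) triples.

    Assumed mapping: the event id is the subject; the type, each
    participant role, and any non-null time/location become predicates.
    """
    event_id = event["id"]
    triples = [(event_id, "type", event["type"])]
    for role, entity_id in event.get("participants", {}).items():
        triples.append((event_id, role, entity_id))
    for slot in ("time", "location"):
        value = event.get(slot)
        if value is not None:  # skip empty slots such as "location": null
            triples.append((event_id, slot, value))
    return triples
```

Keeping the event id as the shared subject preserves the link between agent, object, and time, which would be lost if the event were reduced to a single agent-predicate-object triple.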

How to use references

Read references/pipeline.md for the end-to-end procedure.
Read references/schema.md for types and intermediate record structure.
Read references/relation-taxonomy.md before inventing new predicates.
Read references/triple-mapping.md when exporting final triples.
Read references/event-modeling.md when text describes complex events.
Read references/quality-checklist.md before final delivery.

Scripts

Extract

python3 skills/information-extraction/scripts/extract.py --text "OpenAI published GPT-4." --output out.json

Or read from stdin:

echo "OpenAI published GPT-4." | python3 skills/information-extraction/scripts/extract.py --stdin --output out.json

Normalize

python3 skills/information-extraction/scripts/normalize.py --input out.json --output normalized.json

Export triples

python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format json --output triples.json
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format jsonl --output triples.jsonl
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format tsv --output triples.tsv
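
Because the location of the scripts may differ between installations, it can help to build the three commands from a single configurable directory. The sketch below only assembles argument lists using the flags shown in the commands above; the function name `pipeline_commands` and the `scripts_dir` parameter are hypothetical.

```python
from pathlib import Path


def pipeline_commands(scripts_dir: str, text: str, fmt: str = "json") -> list[list[str]]:
    """Build the extract -> normalize -> export commands as argument lists."""
    scripts = Path(scripts_dir)
    return [
        ["python3", str(scripts / "extract.py"),
         "--text", text, "--output", "out.json"],
        ["python3", str(scripts / "normalize.py"),
         "--input", "out.json", "--output", "normalized.json"],
        ["python3", str(scripts / "export_triples.py"),
         "--input", "normalized.json", "--format", fmt,
         "--output", f"triples.{fmt}"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, so that a failing step stops the pipeline instead of feeding a stale file to the next one.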

Notes on automation

This is a semi-automatic pipeline, not a claim of perfect extraction. The scripts provide scaffolding, normalization, and export. For high-stakes outputs, keep evidence and perform manual review.

Safe usage recommendations

This skill appears to implement a simple semi-automatic IE pipeline and contains only local Python scripts (no network calls or secrets). However, there are a few things to check before using it on important data:

- Bug to fix: run the extractor once and open the produced JSON. If you do not see a top-level "relations" key, relations discovered by the extractor will be lost by normalize.py. Either add 'relations' to the extractor's output or modify normalize.py to read relations from the extractor output.
- Path note: the SKILL.md usage examples use 'skills/information-extraction/scripts/...' while the files live under 'scripts/...'; ensure the runtime path matches where the skill is installed.
- Quality caution: the scripts use simple regex heuristics and low default confidences; expect false positives/negatives. Always manually review outputs (the documentation already recommends this).
- Safety: there is no network or secret access in the code, so the immediate exfiltration risk is low. Still, run the code on non-sensitive sample data first and inspect outputs.

If you plan to integrate this into automated pipelines, patch the relations omission and consider improving extraction logic and confidence handling before processing high-stakes documents.
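
The check for the missing "relations" key can be automated with a few lines. This is a sketch only: the function name `relations_status` is hypothetical, and it assumes the extractor output has already been loaded into a dict, for example with `json.load`.

```python
def relations_status(data: dict) -> str:
    """Classify extractor output by what normalization would see.

    "missing": no top-level "relations" key, so a lookup with an empty
               default (as described for normalize.py) yields no relations.
    "empty":   the key exists but holds no records.
    "present": at least one relation record is available.
    """
    if "relations" not in data:
        return "missing"
    if not data["relations"]:
        return "empty"
    return "present"
```

A result of "missing" on a text that clearly contains relations is the signal to patch the extractor or the normalizer before relying on the exported triples.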

Functional analysis

Type: OpenClaw Skill
Name: information-extraction
Version: 1.0.0

The information-extraction skill bundle is a legitimate toolset for processing unstructured text into structured data. The included Python scripts (extract.py, normalize.py, and export_triples.py) perform local data processing using standard libraries and regex, with no evidence of network access, data exfiltration, or malicious execution. The instructions in SKILL.md and the reference documentation are clearly focused on the stated task and do not contain any prompt-injection attacks or suspicious commands.

Capability assessment

ℹ Purpose & Capability

Name, description, and included scripts align with an information-extraction pipeline: extract.py, normalize.py, and export_triples.py implement extraction, normalization, and export. The heuristics are simple and consistent with a scaffold rather than a full production extractor. However, the pipeline's data contract is inconsistent: extract.py does not include a top-level "relations" key in its output even though normalization expects one, which will cause relations to be lost when following the documented workflow.

⚠ Instruction Scope

SKILL.md instructs running extract.py -> normalize.py -> export_triples.py, but extract.py's JSON output omits a 'relations' field (it returns triples, entities, attributes, events, ambiguities). normalize.py expects data.get('relations', []) and will therefore receive an empty list — so relations discovered by extract.py will not be preserved through normalization. Also, the usage examples reference a path (skills/information-extraction/scripts/...) while the repository layout shows scripts/..., which may cause confusion depending on installation layout. Aside from these mismatches, the instructions do not attempt to read unrelated system files, environment variables, or contact external endpoints.

✓ Install Mechanism

This is an instruction-only skill with included Python scripts and no install spec. Nothing is downloaded from external URLs and no packages are installed by the skill itself, so filesystem and network risks from installation are minimal. The scripts use only the standard library.

✓ Credentials

No environment variables, credentials, or config paths are requested. The scripts operate on local input text and local files only; there is no network or secret access.

✓ Persistence & Privilege

The skill does not request always:true and does not modify system or other skills' configuration. It is user-invocable and may be invoked autonomously by the agent (platform default), which is expected for skills. There is no evidence of persistent privilege escalation.

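How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install information-extraction
Once installation completes, invoke the Skill by name or trigger it with /information-extraction
Provide the required input according to the Skill's parameter description to get structured output.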

Version history

v1.0.0

Initial release: semi-automatic information extraction pipeline for entities, relations, attributes, events, and triple export (JSON/JSONL/TSV).

Metadata

Slug: information-extraction

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 1
