
Information Extraction

Name: Information Extraction
Author: quqxui

by quqxui · v1.0.0 · MIT-0

cross-platform · ⚠ suspicious

139 downloads

Install in OpenClaw

/install information-extraction

Description

Extract structured information from unstructured text through a semi-automatic pipeline. Supports entity extraction, relation extraction, attribute extraction...

README (SKILL.md)

Information Extraction

Extract entity, relation, attribute, and event information from text into a normalized intermediate structure, then export triples in JSON, JSONL, or TSV.

Core workflow

Define extraction scope and output granularity.
Segment input text into sentences and paragraphs.
Extract entities with evidence.
Extract relations, attributes, and events.
Normalize aliases, predicates, and duplicated records.
Export triples. Default output is JSON.
Review ambiguities before treating output as final.
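Step 2 of the workflow (segmentation) can be illustrated with a naive, standard-library segmenter. This is a minimal sketch, not the skill's own code; `segment` is a hypothetical helper, and its rules (blank lines separate paragraphs; `.`, `!` or `?` followed by whitespace ends a sentence) are assumptions.

```python
import re


def segment(text: str) -> list[list[str]]:
    """Split text into paragraphs (blank-line separated), then into naive sentences."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Break after '.', '!' or '?' followed by whitespace. Abbreviations such as
    # "e.g." will be over-split; a real segmenter needs more rules than this.
    return [[s for s in re.split(r"(?<=[.!?])\s+", p) if s] for p in paragraphs]


doc = ("OpenAI published the GPT-4 Technical Report. It was released in 2023.\n\n"
       "A second paragraph.")
for index, sentences in enumerate(segment(doc)):
    print(index, sentences)
```

Keeping the paragraph level around each sentence list makes it easier to attach evidence spans to their source location later.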

Input scope

Prefer this skill for:

Plain text strings
Markdown text
Text copied from webpages, notes, reports, transcripts, or documents

If the user provides a file in another format, convert it to text first, then use this skill.
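For HTML input, one way to do that conversion with only the standard library's `html.parser` module is sketched below. `html_to_text` is a hypothetical helper: it keeps visible text and drops `<script>`/`<style>` contents, but it does not preserve layout, and inline tags such as `<b>` will introduce extra line breaks.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return "\n".join(collector.parts)


print(html_to_text("<p>OpenAI published GPT-4.</p><script>var x = 1;</script>"))
```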

Output contract

Default output should contain:

{
  "triples": [],
  "entities": [],
  "attributes": [],
  "events": [],
  "ambiguities": []
}
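A quick way to check that a result honors this contract is a key-and-type check. The helpers below (`empty_output`, `check_contract`) are hypothetical; they only verify that each of the five keys exists and holds a list, and they do not validate the records inside.

```python
import json

# The five top-level keys listed in the output contract.
REQUIRED_KEYS = ("triples", "entities", "attributes", "events", "ambiguities")


def empty_output() -> dict:
    """Return a result skeleton with every contract key set to an empty list."""
    return {key: [] for key in REQUIRED_KEYS}


def check_contract(output: dict) -> list[str]:
    """List contract violations; an empty list means the shape is valid."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in output:
            problems.append(f"missing key: {key}")
        elif not isinstance(output[key], list):
            problems.append(f"{key} should be a list")
    return problems


print(json.dumps(empty_output(), indent=2))
print(check_contract({"triples": [], "entities": {}}))
```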

Supported export formats:

JSON (default)
JSONL
TSV

Extraction principles

Extract explicit facts before inference.
Preserve evidence spans for important records.
Prefer controlled predicates from references/relation-taxonomy.md.
Keep attributes and events separate internally, even when final output is triples.
Do not flatten complex events too early.
Normalize before exporting.
Record unresolved ambiguity instead of pretending certainty.
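The last principle, recording ambiguity instead of guessing, can be made concrete for alias resolution. In this sketch, `resolve_mention`, the lowercase alias index, and the shape of the ambiguity record are all assumptions, not part of the skill's documented schema.

```python
def resolve_mention(mention: str, alias_index: dict, ambiguities: list):
    """Return a single entity id, or None after logging why it could not be resolved."""
    candidates = alias_index.get(mention.casefold(), [])
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several candidates: record the problem instead of picking one.
    ambiguities.append({
        "mention": mention,
        "candidates": list(candidates),
        "reason": "no match" if not candidates else "multiple candidates",
    })
    return None


alias_index = {"openai": ["ent_001"], "gpt-4": ["ent_002", "ent_003"]}
ambiguities = []
print(resolve_mention("OpenAI", alias_index, ambiguities))
print(resolve_mention("GPT-4", alias_index, ambiguities))
print(ambiguities)
```

Returning `None` forces the caller to handle the unresolved case explicitly, which matches the "review ambiguities before treating output as final" step.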

Minimal internal schema

Use these record shapes during extraction.

Entity

{
  "id": "ent_001",
  "mention": "OpenAI",
  "canonical_name": "OpenAI",
  "type": "Organization",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.95
}
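The entity record maps naturally onto a Python dataclass. The sketch below assumes exactly the six fields shown above and adds a hypothetical `validate` method that checks two things the extraction principles imply: the mention should appear in its evidence span, and confidence should lie in [0, 1].

```python
from dataclasses import asdict, dataclass


@dataclass
class Entity:
    id: str
    mention: str
    canonical_name: str
    type: str
    evidence: str
    confidence: float

    def validate(self) -> list[str]:
        """Return a list of problems; empty means the record looks consistent."""
        problems = []
        if self.mention not in self.evidence:
            problems.append(f"{self.id}: mention not found in evidence")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append(f"{self.id}: confidence outside [0, 1]")
        return problems


ent = Entity("ent_001", "OpenAI", "OpenAI", "Organization",
             "OpenAI published the GPT-4 Technical Report.", 0.95)
print(ent.validate())
print(asdict(ent))
```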

Relation

{
  "subject": "ent_001",
  "predicate": "published",
  "object": "ent_002",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.93
}
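Because relations refer to entities by id, a cheap consistency check is to look for relations pointing at ids that were never declared as entities. `dangling_relations` is a hypothetical helper illustrating that check.

```python
def dangling_relations(relations: list[dict], entities: list[dict]) -> list[dict]:
    """Return relations whose subject or object is not a known entity id."""
    known_ids = {entity["id"] for entity in entities}
    return [rel for rel in relations
            if rel["subject"] not in known_ids or rel["object"] not in known_ids]


entities = [{"id": "ent_001", "canonical_name": "OpenAI"}]
relations = [{"subject": "ent_001", "predicate": "published", "object": "ent_002",
              "evidence": "OpenAI published the GPT-4 Technical Report.", "confidence": 0.93}]
# ent_002 was never declared, so the relation is reported as dangling.
print(dangling_relations(relations, entities))
```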

Attribute

{
  "entity_id": "ent_002",
  "attribute": "year",
  "value": "2023",
  "evidence": "The report was released in 2023.",
  "confidence": 0.87
}
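Attributes stay separate internally and become triples only at export time. The exact mapping is defined in references/triple-mapping.md, which is not reproduced here, so the triple shape below (canonical entity name as subject, attribute name as predicate, value as object) is an assumption; `attribute_to_triple` is hypothetical.

```python
def attribute_to_triple(attribute: dict, entities_by_id: dict) -> dict:
    """Map an attribute record to a subject/predicate/object triple (assumed shape)."""
    entity = entities_by_id.get(attribute["entity_id"], {})
    return {
        # Fall back to the raw id if the entity is unknown, so nothing is silently dropped.
        "subject": entity.get("canonical_name", attribute["entity_id"]),
        "predicate": attribute["attribute"],
        "object": attribute["value"],
        "evidence": attribute["evidence"],
    }


entities_by_id = {"ent_002": {"id": "ent_002", "canonical_name": "GPT-4 Technical Report"}}
attribute = {"entity_id": "ent_002", "attribute": "year", "value": "2023",
             "evidence": "The report was released in 2023.", "confidence": 0.87}
print(attribute_to_triple(attribute, entities_by_id))
```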

Event

{
  "id": "ev_001",
  "type": "Publication",
  "trigger": "published",
  "participants": {
    "agent": "ent_001",
    "object": "ent_002"
  },
  "time": "2023",
  "location": null,
  "evidence": "OpenAI published the GPT-4 Technical Report in 2023.",
  "confidence": 0.92
}
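When an event finally has to be exported as triples, one common approach is reification: the event id becomes the subject and each role or slot becomes a predicate. This is a sketch of that general technique, not the procedure from references/event-modeling.md; `event_to_triples` and its predicate names are hypothetical.

```python
def event_to_triples(event: dict) -> list[tuple]:
    """Flatten one event record into (event_id, predicate, value) triples."""
    event_id = event["id"]
    triples = [(event_id, "type", event["type"]), (event_id, "trigger", event["trigger"])]
    # Each participant role (agent, object, ...) becomes its own triple.
    for role, participant in event["participants"].items():
        triples.append((event_id, role, participant))
    # Optional slots are emitted only when present; null location is skipped.
    for slot in ("time", "location"):
        if event.get(slot) is not None:
            triples.append((event_id, slot, event[slot]))
    return triples


event = {
    "id": "ev_001", "type": "Publication", "trigger": "published",
    "participants": {"agent": "ent_001", "object": "ent_002"},
    "time": "2023", "location": None,
    "evidence": "OpenAI published the GPT-4 Technical Report in 2023.", "confidence": 0.92,
}
for triple in event_to_triples(event):
    print(triple)
```

Keeping the event id as a hub preserves which agent, object, and time belong together, information that would be lost if the event were flattened into unrelated binary relations too early.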

How to use references

Read references/pipeline.md for the end-to-end procedure.
Read references/schema.md for types and intermediate record structure.
Read references/relation-taxonomy.md before inventing new predicates.
Read references/triple-mapping.md when exporting final triples.
Read references/event-modeling.md when text describes complex events.
Read references/quality-checklist.md before final delivery.

Scripts

Extract

python3 skills/information-extraction/scripts/extract.py --text "OpenAI published GPT-4." --output out.json

Or read from stdin:

echo "OpenAI published GPT-4." | python3 skills/information-extraction/scripts/extract.py --stdin --output out.json
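The capability notes describe the bundled scripts as simple regex heuristics with low default confidences. The toy extractor below is written in that spirit to show what such output can look like; it is not extract.py. `toy_extract`, the capitalized-run pattern, the `Unknown` type, and the 0.5 default confidence are all assumptions.

```python
import re

# Runs of capitalized tokens, e.g. "OpenAI", "GPT-4", "Technical Report".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*")


def toy_extract(text: str, confidence: float = 0.5) -> dict:
    """Heuristic entity candidates only; sentence-initial words like 'The' become false positives."""
    output = {"triples": [], "entities": [], "attributes": [], "events": [], "ambiguities": []}
    ids = {}
    for match in CAPITALIZED_RUN.finditer(text):
        mention = match.group(0)
        if mention in ids:
            continue  # keep only the first occurrence of each surface form
        ids[mention] = f"ent_{len(ids) + 1:03d}"
        output["entities"].append({
            "id": ids[mention],
            "mention": mention,
            "canonical_name": mention,
            "type": "Unknown",
            "evidence": text,
            "confidence": confidence,
        })
    return output


print(toy_extract("OpenAI published GPT-4.")["entities"])
```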

Normalize

python3 skills/information-extraction/scripts/normalize.py --input out.json --output normalized.json
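A minimal sketch of what normalization can involve: merge entities whose canonical names differ only in case, point relations at the surviving id, and drop duplicate (subject, predicate, object) records. `normalize_records` is hypothetical, not normalize.py. Like the bundled script as described in the capability notes, it reads a top-level `relations` key and carries nothing over when that key is missing.

```python
def normalize_records(data: dict) -> dict:
    """Merge case-variant entities and deduplicate relations (illustrative only)."""
    survivor_by_name = {}  # casefolded canonical name -> id of the first entity seen
    remap = {}             # every entity id -> id of the entity that survives
    entities = []
    for entity in data.get("entities", []):
        key = entity["canonical_name"].casefold()
        if key not in survivor_by_name:
            survivor_by_name[key] = entity["id"]
            entities.append(entity)
        remap[entity["id"]] = survivor_by_name[key]

    relations, seen = [], set()
    # A missing "relations" key silently yields an empty list here.
    for relation in data.get("relations", []):
        subject = remap.get(relation["subject"], relation["subject"])
        obj = remap.get(relation["object"], relation["object"])
        triple_key = (subject, relation["predicate"], obj)
        if triple_key not in seen:
            seen.add(triple_key)
            relations.append({**relation, "subject": subject, "object": obj})
    return {**data, "entities": entities, "relations": relations}


data = {
    "entities": [{"id": "ent_001", "canonical_name": "OpenAI"},
                 {"id": "ent_002", "canonical_name": "openai"},
                 {"id": "ent_003", "canonical_name": "GPT-4"}],
    "relations": [{"subject": "ent_001", "predicate": "published", "object": "ent_003"},
                  {"subject": "ent_002", "predicate": "published", "object": "ent_003"}],
}
print(normalize_records(data))
```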

Export triples

python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format json --output triples.json
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format jsonl --output triples.jsonl
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format tsv --output triples.tsv
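The three export formats differ only in serialization. The sketch below assumes each triple is a dict with `subject`, `predicate`, and `object` keys and that TSV output has a header row; both are assumptions, since the exact layout produced by export_triples.py is not documented here. `serialize_triples` is hypothetical.

```python
import csv
import io
import json


def serialize_triples(triples: list[dict], fmt: str = "json") -> str:
    """Serialize triples as one JSON array, JSON Lines, or tab-separated values."""
    if fmt == "json":
        return json.dumps(triples, indent=2, ensure_ascii=False)
    if fmt == "jsonl":
        # One JSON object per line, each line newline-terminated.
        return "".join(json.dumps(t, ensure_ascii=False) + "\n" for t in triples)
    if fmt == "tsv":
        buffer = io.StringIO()
        # csv handles quoting if a value ever contains a tab or newline.
        writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
        writer.writerow(["subject", "predicate", "object"])
        for t in triples:
            writer.writerow([t["subject"], t["predicate"], t["object"]])
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


triples = [{"subject": "OpenAI", "predicate": "published", "object": "GPT-4"}]
for fmt in ("json", "jsonl", "tsv"):
    print(serialize_triples(triples, fmt))
```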

Notes on automation

This is a semi-automatic pipeline, not a claim of perfect extraction. The scripts provide scaffolding, normalization, and export. For high-stakes outputs, keep evidence and perform manual review.
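Manual review is easier when low-confidence or unsupported records are separated up front. `split_for_review` is a hypothetical helper, and its 0.8 threshold is an arbitrary placeholder to tune per task.

```python
def split_for_review(records: list[dict], threshold: float = 0.8) -> tuple[list[dict], list[dict]]:
    """Partition records into (accepted, needs_review)."""
    accepted, needs_review = [], []
    for record in records:
        # A record passes only if it is confident enough AND carries an evidence span.
        if record.get("confidence", 0.0) >= threshold and record.get("evidence"):
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


records = [
    {"id": "ent_001", "evidence": "OpenAI published the GPT-4 Technical Report.", "confidence": 0.95},
    {"id": "ent_002", "evidence": "The report was released in 2023.", "confidence": 0.5},
    {"id": "ent_003", "evidence": "", "confidence": 0.9},
]
accepted, needs_review = split_for_review(records)
print([r["id"] for r in accepted], [r["id"] for r in needs_review])
```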

Usage Guidance

This skill appears to implement a simple semi-automatic IE pipeline and contains only local Python scripts (no network calls or secrets). However, there are a few things to check before using it on important data:

- Bug to fix: run the extractor once and open the produced JSON. If you do not see a top-level "relations" key, relations discovered by the extractor will be lost by normalize.py. Either add 'relations' to the extractor's output or modify normalize.py to read relations from the extractor output.
- Path note: the SKILL.md usage examples use 'skills/information-extraction/scripts/...' while the files live under 'scripts/...'; ensure the runtime path matches where the skill is installed.
- Quality caution: the scripts use simple regex heuristics and low default confidences; expect false positives/negatives. Always manually review outputs (the documentation already recommends this).
- Safety: there is no network or secret access in the code, so the immediate exfiltration risk is low. Still, run the code on non-sensitive sample data first and inspect outputs.

If you plan to integrate this into automated pipelines, patch the relations omission and consider improving extraction logic and confidence handling before processing high-stakes documents.
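The relations check in the first bullet can be scripted. This sketch assumes only what the notes state: normalize.py reads a top-level `relations` key, which extract.py's output may lack. `has_relations_key` and `check_extractor_output` are hypothetical names.

```python
import json
import sys


def has_relations_key(data: dict) -> bool:
    """True if extractor output carries the top-level key normalize.py reads."""
    return "relations" in data


def check_extractor_output(path: str) -> bool:
    """Load an extractor JSON file and warn on stderr if 'relations' is missing."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    ok = has_relations_key(data)
    if not ok:
        print(f"{path}: no top-level 'relations' key; "
              "relations will be dropped by normalize.py", file=sys.stderr)
    return ok
```

For example, `check_extractor_output("out.json")` returns `False` and prints a warning when the key is missing, which is a cheap gate to run before normalize.py in an automated pipeline.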

Capability Analysis

Type: OpenClaw Skill
Name: information-extraction
Version: 1.0.0

The information-extraction skill bundle is a legitimate toolset for processing unstructured text into structured data. The included Python scripts (extract.py, normalize.py, and export_triples.py) perform local data processing using standard libraries and regex, with no evidence of network access, data exfiltration, or malicious execution. The instructions in SKILL.md and the reference documentation are clearly focused on the stated task and do not contain any prompt-injection attacks or suspicious commands.

Capability Assessment

ℹ Purpose & Capability

Name, description, and included scripts align with an information-extraction pipeline: extract.py, normalize.py, and export_triples.py implement extraction, normalization, and export. The heuristics are simple and consistent with a scaffold rather than a full production extractor. However, the pipeline's data contract is inconsistent: extract.py does not include a top-level "relations" key in its output even though normalization expects one, which will cause relations to be lost when following the documented workflow.

⚠ Instruction Scope

SKILL.md instructs running extract.py -> normalize.py -> export_triples.py, but extract.py's JSON output omits a 'relations' field (it returns triples, entities, attributes, events, ambiguities). normalize.py expects data.get('relations', []) and will therefore receive an empty list — so relations discovered by extract.py will not be preserved through normalization. Also, the usage examples reference a path (skills/information-extraction/scripts/...) while the repository layout shows scripts/..., which may cause confusion depending on installation layout. Aside from these mismatches, the instructions do not attempt to read unrelated system files, environment variables, or contact external endpoints.
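For the path mismatch, a small resolver can look in both locations. The two candidate directories come from the notes above; `find_script` is a hypothetical helper, and `root` should point wherever the skill is actually installed.

```python
from pathlib import Path

# Layouts mentioned in the notes: the documented path and the repository layout.
CANDIDATE_DIRS = ("skills/information-extraction/scripts", "scripts")


def find_script(name: str, root: str = ".") -> Path:
    """Return the first existing script path under root, trying each layout in order."""
    for rel in CANDIDATE_DIRS:
        candidate = Path(root) / rel / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{name} not found under {root} in any of {CANDIDATE_DIRS}")
```

Failing loudly with `FileNotFoundError` is deliberate: silently falling back to a wrong path would be harder to debug than a clear error.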

✓ Install Mechanism

This is an instruction-only skill with included Python scripts and no install spec. Nothing is downloaded from external URLs and no packages are installed by the skill itself, so filesystem and network risks from installation are minimal. The scripts use only the standard library.

✓ Credentials

No environment variables, credentials, or config paths are requested. The scripts operate on local input text and local files only; there is no network or secret access.

✓ Persistence & Privilege

The skill does not request always:true and does not modify system or other skills' configuration. It is user-invocable and may be invoked autonomously by the agent (platform default), which is expected for skills. There is no evidence of persistent privilege escalation.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install information-extraction
After installation, invoke the skill by name or use /information-extraction
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: semi-automatic information extraction pipeline for entities, relations, attributes, events, and triple export (JSON/JSONL/TSV).

Metadata

Slug information-extraction

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Information Extraction?

Extract structured information from unstructured text through a semi-automatic pipeline. Supports entity extraction, relation extraction, attribute extraction... It is an AI Agent Skill for Claude Code / OpenClaw, with 139 downloads so far.

How do I install Information Extraction?

Run "/install information-extraction" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Information Extraction free?

Yes, Information Extraction is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Information Extraction support?

Information Extraction is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Information Extraction?

It is built and maintained by quqxui (@quqxui); the current version is v1.0.0.
