Description

Map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10) for terminology normalization and semantic interoperability. Extracts m...

README (SKILL.md)

Bio-Ontology Mapper

Name: Bio Ontology Mapper
Author: renhaosu2024

Overview

Biomedical terminology normalization tool that maps free-text clinical and scientific concepts to standardized ontologies for semantic interoperability and data harmonization.

Key Capabilities:

Multi-Ontology Support: SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC
Entity Extraction: NER for diseases, symptoms, procedures, drugs
Fuzzy Matching: Handle typos, abbreviations, and synonyms
Confidence Scoring: Reliability metrics for each mapping
Batch Processing: Normalize large datasets efficiently
Cross-Mapping: Translate between ontology systems

When to Use

✅ Use this skill when:

Normalizing clinical notes for EHR integration
Standardizing terminology for multi-site studies
Mapping legacy data to modern ontologies
Preparing data for clinical data warehouses
Converting free-text to coded data for analysis
Building semantic search for biomedical literature
Teaching biomedical informatics principles

❌ Do NOT use when:

Clinical diagnosis or decision support → Use clinical decision tools
Real-time patient care → Latency too high for acute settings
Replacing expert coding → Use for pre-coding only; a final expert review is still needed
Processing PHI without de-identification → Ensure HIPAA compliance

Integration:

Upstream: clinical-data-cleaner (data preparation), ehr-semantic-compressor (text extraction)
Downstream: clinical-data-cleaner (SDTM mapping), unstructured-medical-text-miner (NLP pipelines)

Core Capabilities

1. Entity Recognition and Mapping

Extract and map biomedical entities to ontologies:

from scripts.mapper import BioOntologyMapper

mapper = BioOntologyMapper()

# Map clinical text
result = mapper.map_text(
    text="Patient has diabetes and hypertension, taking metformin",
    ontologies=["snomed", "mesh", "rxnorm"],
    confidence_threshold=0.7
)

for entity in result.entities:
    print(f"{entity.text} → {entity.concept_id} ({entity.ontology})")
    print(f"  Preferred: {entity.preferred_term}")
    print(f"  Confidence: {entity.confidence:.2f}")

Supported Ontologies:

Ontology	Domain	Use Case
SNOMED CT	Clinical	EHR interoperability
MeSH	Literature	PubMed indexing
ICD-10	Billing	Diagnosis codes
LOINC	Labs	Test result standardization
RxNorm	Drugs	Medication normalization
HGNC	Genes	Gene name standardization

2. Cross-Ontology Translation

Map concepts between different ontologies:

# Cross-map SNOMED to ICD-10
translation = mapper.cross_map(
    source_id="22298006",  # SNOMED: Myocardial infarction
    source_ontology="snomed",
    target_ontology="icd10"
)

print(f"ICD-10: {translation.target_id} - {translation.target_term}")
# Output: I21.9 - Acute myocardial infarction, unspecified

Cross-Mapping Coverage:

SNOMED CT ↔ ICD-10-CM (clinical modifications)
MeSH ↔ SNOMED CT (literature to clinical)
RxNorm ↔ ATC (drug classifications)
LOINC ↔ SNOMED CT (lab to clinical)
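
Cross-maps are often one-to-many, so a lookup is easier to reason about when it returns a list rather than a single code. The following is a minimal sketch of that shape, not the skill's cross_mapper.py: the `CROSSWALK` table, the `Target` type, and this standalone `cross_map` function are hypothetical, and the only entry is the SNOMED CT to ICD-10 example shown above.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    target_id: str
    target_term: str


# Hypothetical in-memory crosswalk keyed by (source ontology, target ontology,
# source concept ID). A real crosswalk would be loaded from reference files.
CROSSWALK = {
    ("snomed", "icd10", "22298006"): [
        Target("I21.9", "Acute myocardial infarction, unspecified"),
    ],
}


def cross_map(source_id: str, source_ontology: str,
              target_ontology: str) -> List[Target]:
    """Return every recorded target; an empty list means no mapping is known."""
    key = (source_ontology.lower(), target_ontology.lower(), source_id)
    return list(CROSSWALK.get(key, []))
```

Returning an empty list for an unknown concept lets callers log the unmapped ID instead of handling an exception.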

3. Batch Normalization

Process large datasets:

# Batch process CSV
results = mapper.batch_map(
    input_file="clinical_terms.csv",
    text_column="diagnosis_description",
    ontologies=["snomed", "icd10"],
    output_format="csv",
    max_workers=4
)

# Results include:
# - Original term
# - Mapped concept ID
# - Confidence score
# - Alternative mappings (if ambiguous)

Performance:

~100 terms/second (with caching)
~20 terms/second (API lookup)
Parallel processing for large datasets
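
The gap between cached and API-backed throughput comes from skipping repeated lookups. The sketch below shows one way to combine memoization with a worker pool using only the standard library. It is an assumption about how such a pipeline could be built, not the skill's caching.py or batch_processor.py: `_slow_lookup`, `cached_lookup`, and `batch_lookup` are hypothetical names, and the reference table holds only the SNOMED CT example from this document.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical stand-in for a slow remote lookup.
_REFERENCE = {"heart attack": "22298006"}


def _slow_lookup(term: str):
    """Pretend this is an API call; returns None when the term is unknown."""
    return _REFERENCE.get(term)


@lru_cache(maxsize=10_000)
def cached_lookup(term: str):
    """Memoize lookups so repeated terms skip the slow path."""
    return _slow_lookup(term)


def batch_lookup(terms, max_workers: int = 4):
    """Normalize terms, then look them up in parallel, preserving input order."""
    normalized = [t.strip().lower() for t in terms]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cached_lookup, normalized))
```

Normalizing before the cache lookup matters: "Heart attack " and "heart attack" then share one cache entry instead of triggering two lookups.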

4. Confidence Scoring and Validation

Assess mapping reliability:

scoring = mapper.score_mapping(
    term="heart attack",
    candidate="22298006",  # Myocardial infarction
    factors=["string_similarity", "context_match", "frequency"]
)

print(f"Overall confidence: {scoring.confidence:.2f}")
print(f"Breakdown: {scoring.factors}")

Scoring Factors:

String similarity: Levenshtein distance, n-grams
Context match: Surrounding words alignment
Frequency: Common usage in corpus
Semantic similarity: Vector embeddings
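
The string-similarity factor can be illustrated with the standard library alone. The sketch below is not the skill's scorer.py. It assumes a hypothetical equal-weight blend of `difflib.SequenceMatcher.ratio()` (used here as a stand-in for a normalized edit-distance measure, not true Levenshtein distance) and character-trigram Jaccard overlap; the weights are arbitrary.

```python
from difflib import SequenceMatcher


def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two character n-gram sets, from 0.0 to 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def string_similarity(term: str, candidate: str,
                      w_ratio: float = 0.5, w_ngram: float = 0.5) -> float:
    """Hypothetical blend of an edit-based ratio and n-gram overlap."""
    ratio = SequenceMatcher(None, term.lower(), candidate.lower()).ratio()
    return w_ratio * ratio + w_ngram * ngram_jaccard(term, candidate)
```

On this measure a typo such as "myocardial infraction" scores far higher against "myocardial infarction" than the lay synonym "heart attack" does, which is why string similarity is only one of the factors listed above.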

Common Patterns

Pattern 1: Clinical Note Normalization

Scenario: Convert free-text diagnoses to SNOMED codes.

# Normalize clinical notes
python scripts/main.py \
  --input notes.csv \
  --column diagnosis_text \
  --ontology snomed \
  --threshold 0.8 \
  --output coded_diagnoses.csv

# Results: "heart attack" → 22298006 (Myocardial infarction)

Post-Processing:

Review low-confidence mappings (<0.8)
Handle ambiguous terms manually
Validate against clinical context
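
The review step can be sketched as a simple triage that separates mappings at or above the threshold from those that need a human look. The `Mapping` record and `triage` function are hypothetical and do not reflect the skill's actual output schema; the confidence values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    term: str
    concept_id: str
    confidence: float


def triage(mappings, threshold: float = 0.8):
    """Split mappings into auto-accepted and needs-review lists.

    The review list is sorted so the least certain mappings come first.
    """
    accepted = [m for m in mappings if m.confidence >= threshold]
    review = sorted(
        (m for m in mappings if m.confidence < threshold),
        key=lambda m: m.confidence,
    )
    return accepted, review


# Illustrative values only; these confidences are invented.
sample = [
    Mapping("heart attack", "22298006", 0.93),
    Mapping("MI", "22298006", 0.55),
]
accepted, review = triage(sample)
```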

Pattern 2: Literature Indexing

Scenario: Map research paper keywords to MeSH.

# Map keywords to MeSH
mesh_terms = mapper.map_to_mesh(
    keywords=["cancer immunotherapy", "checkpoint inhibitors", "PD-1"],
    include_tree_numbers=True,
    include_qualifiers=True
)

for term in mesh_terms:
    print(f"{term.input} → {term.descriptor}")
    print(f"  Tree: {term.tree_numbers}")
    print(f"  Entry terms: {term.synonyms}")

Pattern 3: Drug Name Normalization

Scenario: Standardize medication names across datasets.

# Normalize drug names
drugs = ["Tylenol", "Advil", "Motrin", "acetaminophen"]

for drug in drugs:
    result = mapper.map_to_rxnorm(drug)
    print(f"{drug} → {result.rxcui}: {result.name}")
    # Tylenol → 161: Acetaminophen
    # Advil → 5640: Ibuprofen
    # Motrin → 5640: Ibuprofen
    # acetaminophen → 161: Acetaminophen

Pattern 4: EHR Data Harmonization

Scenario: Merge data from multiple hospital systems.

# Harmonize diagnoses from 3 hospitals
python scripts/main.py \
  --batch \
  --inputs "hospital_a.csv,hospital_b.csv,hospital_c.csv" \
  --target-ontology snomed \
  --cross-map-to icd10 \
  --output harmonized_data.csv

Complete Workflow Example

From free-text to coded database:

from scripts.mapper import BioOntologyMapper
from scripts.validator import MappingValidator

# Initialize
mapper = BioOntologyMapper()
validator = MappingValidator()

# Step 1: Extract entities from text
clinical_note = "Patient has Type 2 diabetes and hypertension..."
entities = mapper.extract_entities(clinical_note)

# Step 2: Map to SNOMED
mappings = []
for entity in entities:
    mapping = mapper.map_to_snomed(
        entity.text,
        context=clinical_note,
        top_n=3
    )
    mappings.append(mapping)

# Step 3: Validate mappings
for mapping in mappings:
    validation = validator.validate(
        mapping,
        check_clinical_plausibility=True
    )
    if not validation.is_valid:
        print(f"Review needed: {mapping}")

# Step 4: Export to database format
db_records = [m.to_database_record() for m in mappings]

Quality Checklist

Pre-Mapping:

Text preprocessed (lowercase, punctuation handled)
Abbreviations expanded where possible
Language identified (multilingual support)
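
A minimal sketch of the first two pre-mapping checks follows. It assumes a hypothetical, site-curated `ABBREVIATIONS` table and a deliberately crude punctuation rule (every punctuation character becomes a space). That rule would also split codes such as "ICD-10" or values such as "7.5", so a real pipeline would need something more careful.

```python
import string

# Hypothetical abbreviation table. Only unambiguous abbreviations belong here;
# ambiguous ones (for example "MI") should go to context-based handling.
ABBREVIATIONS = {
    "htn": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
}

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand abbreviations."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)
```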

During Mapping:

Confidence threshold appropriate (>0.7 for clinical)
Multiple candidates considered for ambiguous terms
Context used for disambiguation

Post-Mapping:

Low-confidence mappings flagged for review
Unmapped terms logged
CRITICAL: Clinical expert validation for high-stakes use

Before Production:

Mapping accuracy validated on gold standard
False positive rate acceptable (<5%)
Recall acceptable for use case (>90%)
API rate limits respected
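
The accuracy checks above can be computed against a gold standard with a few lines of code. This sketch takes one possible reading of "false positive rate": the share of emitted mappings whose concept ID disagrees with the gold standard. The `evaluate` function and its input shape (plain term-to-ID dictionaries) are assumptions for illustration.

```python
def evaluate(predicted: dict, gold: dict) -> dict:
    """Compare predicted term -> concept ID mappings against a gold standard.

    predicted: term -> concept ID (unmapped terms are simply absent)
    gold:      term -> correct concept ID
    """
    correct = sum(1 for term, cid in predicted.items() if gold.get(term) == cid)
    wrong = len(predicted) - correct
    return {
        # Share of emitted mappings that are right.
        "precision": correct / len(predicted) if predicted else 0.0,
        # Share of gold-standard terms that were mapped correctly.
        "recall": correct / len(gold) if gold else 0.0,
        # Share of emitted mappings that are wrong (1 - precision).
        "wrong_mapping_rate": wrong / len(predicted) if predicted else 0.0,
    }
```

With these definitions, a run would pass the checklist when `wrong_mapping_rate` is below 0.05 and `recall` is above 0.90.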

Common Pitfalls

Mapping Errors:

❌ Abbreviation ambiguity → "MI" = Myocardial infarction OR Michigan
- ✅ Use context; flag for manual review
❌ Outdated terms → Old terminology not in current ontology
- ✅ Use historical mappings; update terminology
❌ False confidence → High score for wrong concept
- ✅ Always review top-3 candidates
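
The "use context; flag for manual review" advice for ambiguous abbreviations can be sketched with a cue-word count. The `SENSES` inventory and its cue words are hypothetical, and real disambiguation would need far richer evidence. The point is the control flow: return a sense only when the context clearly favors one, otherwise return `None` so the term goes to manual review.

```python
import re

# Hypothetical sense inventory: each sense lists cue words that support it.
SENSES = {
    "MI": {
        "myocardial infarction": {"chest", "troponin", "ecg", "cardiac", "stemi"},
        "Michigan": {"state", "detroit", "address", "resident"},
    }
}


def disambiguate(abbrev: str, context: str):
    """Return the best-supported sense, or None to flag for manual review."""
    senses = SENSES.get(abbrev)
    if not senses:
        return None
    tokens = set(re.findall(r"[a-z0-9]+", context.lower()))
    scores = {sense: len(cues & tokens) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # no evidence, or a tie: leave it to a reviewer
    return winners[0]
```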

Technical Issues:

❌ API failures → No local fallback
- ✅ Implement caching; use local reference files
❌ Version mismatches → Different ontology versions
- ✅ Track ontology version used
❌ PHI exposure → Sending patient data to external APIs
- ✅ De-identify before API calls; use local processing when possible
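
The first two remedies above (a local fallback and version tracking) fit naturally into one small wrapper. This is a sketch under stated assumptions, not the skill's implementation: `LookupResult` and `lookup_with_fallback` are hypothetical, the remote lookup is passed in as a plain callable, and the exception types a real HTTP client raises may differ from the built-in ones caught here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LookupResult:
    concept_id: Optional[str]
    source: str            # "api" or "local"
    ontology_version: str  # stored with each result for reproducibility


def lookup_with_fallback(
    term: str,
    remote: Callable[[str], Optional[str]],
    local: dict,
    api_version: str,
    local_version: str,
) -> LookupResult:
    """Try the remote lookup first; fall back to local reference data on failure."""
    try:
        return LookupResult(remote(term), "api", api_version)
    except (ConnectionError, TimeoutError):
        return LookupResult(local.get(term), "local", local_version)
```

Recording `source` and `ontology_version` on every result makes it possible to tell later which mappings came from which release of the ontology.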

References

Available in references/ directory:

snomed_ct_guide.md - SNOMED CT hierarchy and relationships
mesh_structure.md - MeSH tree structure and qualifiers
ontology_mappings.md - Crosswalks between systems
nlp_best_practices.md - Biomedical text processing
api_documentation.md - External service integration
validation_datasets.md - Gold standard test sets

Scripts

Located in scripts/ directory:

main.py - CLI interface for mapping
mapper.py - Core ontology mapping engine
extractor.py - Named entity recognition
cross_mapper.py - Ontology-to-ontology translation
scorer.py - Confidence calculation
batch_processor.py - Large dataset handling
validator.py - Mapping quality checks
caching.py - Local storage for frequent lookups

Limitations

Ambiguity: Many-to-many mappings common; context required
Coverage: Rare diseases and new concepts may not be in ontologies
Versioning: Ontology updates can change mappings over time
Language: Best support for English; other languages limited
Real-time: Not suitable for time-critical clinical applications
API Dependency: Requires internet for most lookups (caching helps)

⚠️ Critical: Ontology mapping is for research and data integration, not clinical decision-making. Always validate mappings with domain experts before use in patient care contexts. Never process PHI without appropriate de-identification and compliance measures.

Parameters

Parameter	Type	Default	Description
`--term`	str	Required	Single term to map
`--input`	str	Required	Input file path
`--output`	str	Required	Output file path
`--ontology`	str	'both'
`--threshold`	float	0.7
`--format`	str	'json'
`--use-api`	str	Required	Use UMLS/MeSH APIs
`--api-key`	str	Required

Usage Guidance

This skill appears to implement local SNOMED/MeSH mapping and optional API-backed lookups, but it has a few inconsistencies you should address before using it on real or sensitive data:

Confirm UMLS API usage: the code reads UMLS_API_KEY from the environment but the registry metadata does not declare this requirement. Only set UMLS_API_KEY if you trust the code and intend to allow outbound API calls.
Expect internet access: the tool calls the official UMLS UTS and NLM MeSH endpoints. If you need offline processing, use only the provided local reference files and disable API use.
Verify claimed ontology coverage: SKILL.md advertises RxNorm, LOINC, ICD-10, HGNC, and cross-mapping. The bundled reference files are for SNOMED and MeSH; confirm (by reviewing the rest of scripts/main.py or other source) whether the other ontologies are actually implemented or rely on UMLS API lookups.
Protect PHI: do not run this on protected health information unless data are de-identified and usage complies with your legal/regulatory obligations. Network calls could leak sensitive context if used with live PHI.
Trust and provenance: the skill owner and homepage are unknown. If you plan to run it in production, audit the full code (ensure no hidden endpoints or telemetry) and run in an isolated environment first.

If you want higher confidence, request the author to (1) declare UMLS_API_KEY (and any other env vars) in the registry metadata, (2) document which ontologies require API access vs. local data, and (3) provide a small README describing network access and data-handling guarantees.
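
The UMLS_API_KEY point can be made explicit with a small guard that runs before any lookup. The environment variable name comes from the assessment above; the `api_mode_enabled` function and its behavior (fail loudly rather than silently fall back) are assumptions for illustration, not part of the skill.

```python
import os


def api_mode_enabled(use_api: bool, env=None) -> bool:
    """Return True only when API use was requested and a key is present.

    Raises RuntimeError when API use is requested without UMLS_API_KEY, so a
    missing credential is noticed instead of silently changing behavior.
    """
    env = os.environ if env is None else env
    if not use_api:
        return False  # offline mode: no outbound calls should be made
    if not env.get("UMLS_API_KEY"):
        raise RuntimeError("API use requested but UMLS_API_KEY is not set")
    return True
```

Passing the environment in as a parameter keeps the check testable without touching the real process environment.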

Capability Analysis

Type: OpenClaw Skill
Name: bio-ontology-mapper
Version: 0.1.0

The bio-ontology-mapper skill bundle is a legitimate tool for normalizing biomedical terminology. The core logic in scripts/main.py facilitates mapping clinical terms to SNOMED CT and MeSH ontologies using local reference data and official National Library of Medicine (NLM) APIs (uts-ws.nlm.nih.gov and id.nlm.nih.gov). The code follows security best practices for API interaction, including rate limiting and input encoding, and contains no evidence of data exfiltration, malicious execution, or prompt injection.

Capability Assessment

⚠ Purpose & Capability

SKILL.md claims multi-ontology support (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC) and cross-mapping, but the included references and visible code primarily provide local data for SNOMED and MeSH only. Broader coverage would rely on external APIs (UMLS) but that dependency and required credential are not declared in the registry metadata or requirements. This mismatch between claimed capabilities and provided assets is concerning.

⚠ Instruction Scope

Runtime instructions and examples reference local files and API-backed lookups. The code performs network requests to the official UMLS UTS API and the NLM MeSH API; UMLS usage depends on an API key read from the environment. SKILL.md/examples do not explicitly state the need to supply UMLS_API_KEY or require internet access, which is an omission that affects operational safety and privacy (especially for PHI). Otherwise the instructions stay within the stated mapping purpose and do not attempt broad system file access.

✓ Install Mechanism

There is no install spec (instruction-only skill) and requirements.txt only lists standard-library-like packages ('dataclasses', 'difflib'). No arbitrary downloads or extract/install steps are present. This is low-risk from an installer perspective.

⚠ Credentials

The code reads UMLS_API_KEY from the environment (os.getenv) to call the UMLS UTS API, but the skill registry lists no required environment variables. Requesting an API key for an external clinical terminology service is reasonable for functionality, but failing to declare it in the metadata is an omission that could lead users to unintentionally expose credentials or run the skill with insufficient privileges. The skill also performs outbound network calls (MeSH and UMLS endpoints).

✓ Persistence & Privilege

The skill is not always-enabled and has no install-time persistence mechanism. There is no evidence it modifies other skills or system-wide configuration. No elevated privileges or persistent presence were requested.

Version History

v0.1.0

Initial release of bio-ontology-mapper: a tool for mapping unstructured biomedical text to standardized ontologies.

Supports multi-ontology mapping: SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC.
Extracts medical entities and maps them to ontology codes with confidence scores.
Includes batch processing for large datasets, cross-ontology translation, and confidence scoring/validation.
Provides patterns and examples for clinical note normalization, literature indexing, drug name normalization, and EHR harmonization.
Quality checklist and common pitfalls documented for reliable deployment.

Metadata

Slug bio-ontology-mapper

Version 0.1.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Bio Ontology Mapper?

Map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10) for terminology normalization and semantic interoperability. Extracts m... It is an AI Agent Skill for Claude Code / OpenClaw, with 307 downloads so far.

How do I install Bio Ontology Mapper?

Run "/install bio-ontology-mapper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Bio Ontology Mapper free?

Yes, Bio Ontology Mapper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Bio Ontology Mapper support?

Bio Ontology Mapper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Bio Ontology Mapper?

It is built and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.0.
