
Bio Ontology Mapper

by renhaosu2024 · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 307 · Stars: 0 · Active Installs: 1 · Versions: 1
Install in OpenClaw
/install bio-ontology-mapper
Description
Map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10) for terminology normalization and semantic interoperability. Extracts m...
README (SKILL.md)

Bio-Ontology Mapper

Overview

Biomedical terminology normalization tool that maps free-text clinical and scientific concepts to standardized ontologies for semantic interoperability and data harmonization.

Key Capabilities:

  • Multi-Ontology Support: SNOMED CT, MeSH, ICD-10, LOINC, RxNorm
  • Entity Extraction: NER for diseases, symptoms, procedures, drugs
  • Fuzzy Matching: Handle typos, abbreviations, and synonyms
  • Confidence Scoring: Reliability metrics for each mapping
  • Batch Processing: Normalize large datasets efficiently
  • Cross-Mapping: Translate between ontology systems

When to Use

✅ Use this skill when:

  • Normalizing clinical notes for EHR integration
  • Standardizing terminology for multi-site studies
  • Mapping legacy data to modern ontologies
  • Preparing data for clinical data warehouses
  • Converting free-text to coded data for analysis
  • Building semantic search for biomedical literature
  • Teaching biomedical informatics principles

❌ Do NOT use when:

  • Clinical diagnosis or decision support → Use clinical decision tools
  • Real-time patient care → Latency too high for acute settings
  • Replacing expert coding → Use for pre-coding, final review needed
  • Processing PHI without de-identification → Ensure HIPAA compliance

Integration:

  • Upstream: clinical-data-cleaner (data preparation), ehr-semantic-compressor (text extraction)
  • Downstream: clinical-data-cleaner (SDTM mapping), unstructured-medical-text-miner (NLP pipelines)

Core Capabilities

1. Entity Recognition and Mapping

Extract and map biomedical entities to ontologies:

from scripts.mapper import BioOntologyMapper

mapper = BioOntologyMapper()

# Map clinical text
result = mapper.map_text(
    text="Patient has diabetes and hypertension, taking metformin",
    ontologies=["snomed", "mesh", "rxnorm"],
    confidence_threshold=0.7
)

for entity in result.entities:
    print(f"{entity.text} → {entity.concept_id} ({entity.ontology})")
    print(f"  Preferred: {entity.preferred_term}")
    print(f"  Confidence: {entity.confidence:.2f}")

Supported Ontologies:

| Ontology | Domain | Use Case |
| --- | --- | --- |
| SNOMED CT | Clinical | EHR interoperability |
| MeSH | Literature | PubMed indexing |
| ICD-10 | Billing | Diagnosis codes |
| LOINC | Labs | Test result standardization |
| RxNorm | Drugs | Medication normalization |
| HGNC | Genes | Gene name standardization |

2. Cross-Ontology Translation

Map concepts between different ontologies:

# Cross-map SNOMED to ICD-10
translation = mapper.cross_map(
    source_id="22298006",  # SNOMED: Myocardial infarction
    source_ontology="snomed",
    target_ontology="icd10"
)

print(f"ICD-10: {translation.target_id} - {translation.target_term}")
# Output: I21.9 - Acute myocardial infarction, unspecified

Cross-Mapping Coverage:

  • SNOMED CT ↔ ICD-10-CM (Clinical Modification)
  • MeSH ↔ SNOMED CT (literature to clinical)
  • RxNorm ↔ ATC (drug classifications)
  • LOINC ↔ SNOMED (lab to clinical)
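
Crosswalks like the ones listed above are often one-to-many, so a lookup is easier to reason about when it returns every candidate instead of a single code. The following is a minimal sketch, assuming a small in-memory table; `CROSSWALK` and `cross_map_local` are hypothetical names and not part of the skill's API, and the single entry reuses the myocardial infarction example from this section.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory crosswalk:
# (source ontology, source ID, target ontology) -> [(target ID, target term)].
# Real crosswalk files are far larger and are versioned.
CROSSWALK: Dict[Tuple[str, str, str], List[Tuple[str, str]]] = {
    ("snomed", "22298006", "icd10"): [
        ("I21.9", "Acute myocardial infarction, unspecified"),
    ],
}


def cross_map_local(source_id: str, source_ontology: str,
                    target_ontology: str) -> List[Tuple[str, str]]:
    """Return all known target codes, or an empty list when none exists."""
    key = (source_ontology.lower(), source_id, target_ontology.lower())
    return list(CROSSWALK.get(key, []))


matches = cross_map_local("22298006", "SNOMED", "ICD10")
unmapped = cross_map_local("00000000", "snomed", "icd10")
```

Returning a list keeps ambiguous crosswalks visible to the caller, and an empty list makes unmapped concepts easy to log.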

3. Batch Normalization

Process large datasets:

# Batch process CSV
results = mapper.batch_map(
    input_file="clinical_terms.csv",
    text_column="diagnosis_description",
    ontologies=["snomed", "icd10"],
    output_format="csv",
    max_workers=4
)

# Results include:
# - Original term
# - Mapped concept ID
# - Confidence score
# - Alternative mappings (if ambiguous)

Performance:

  • ~100 terms/second (with caching)
  • ~20 terms/second (API lookup)
  • Parallel processing for large datasets
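
The gap between cached and API-backed throughput suggests that repeated terms should be resolved only once. Here is a rough sketch of that idea using only the standard library; `lookup`, `batch_lookup`, and the one-entry `_REFERENCE` table are hypothetical stand-ins, not the skill's actual batch processor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Optional

# Hypothetical stand-in for a slow lookup (local file scan or remote API call).
_REFERENCE = {"heart attack": "22298006"}


@lru_cache(maxsize=10_000)
def lookup(term: str) -> Optional[str]:
    """Resolve one normalized term; later calls for it are served from cache."""
    return _REFERENCE.get(term)


def batch_lookup(terms: List[str], max_workers: int = 4) -> List[Optional[str]]:
    """Map terms in parallel, preserving input order."""
    normalized = [t.strip().lower() for t in terms]
    # De-duplicate first so each distinct term is dispatched exactly once.
    unique = list(dict.fromkeys(normalized))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        resolved = dict(zip(unique, pool.map(lookup, unique)))
    return [resolved[t] for t in normalized]


codes = batch_lookup(["Heart attack", "heart attack ", "unknown term"])
```

De-duplicating before dispatch matters because `lru_cache` alone does not stop two threads from computing the same missing key at the same time.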

4. Confidence Scoring and Validation

Assess mapping reliability:

scoring = mapper.score_mapping(
    term="heart attack",
    candidate="22298006",  # Myocardial infarction
    factors=["string_similarity", "context_match", "frequency"]
)

print(f"Overall confidence: {scoring.confidence:.2f}")
print(f"Breakdown: {scoring.factors}")

Scoring Factors:

  • String similarity: Levenshtein distance, n-grams
  • Context match: Surrounding words alignment
  • Frequency: Common usage in corpus
  • Semantic similarity: Vector embeddings
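
To make the string-similarity factor concrete, the sketch below combines a normalized Levenshtein similarity with a character-trigram Jaccard score. The equal weights and the function names are assumptions for illustration; how scorer.py actually combines its factors is not documented here.

```python
from typing import Dict, Optional, Set


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the text padded with one space on each side."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams."""
    grams_a, grams_b = ngrams(a), ngrams(b)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


def string_confidence(term: str, candidate: str,
                      weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Weighted mean of the two string factors (weights are illustrative)."""
    weights = weights or {"edit": 0.5, "ngram": 0.5}
    t, c = term.lower().strip(), candidate.lower().strip()
    factors = {"edit": edit_similarity(t, c), "ngram": ngram_similarity(t, c)}
    total = sum(weights.values())
    factors["confidence"] = sum(weights[k] * factors[k] for k in weights) / total
    return factors


typo = string_confidence("diabtes", "diabetes")
synonym = string_confidence("heart attack", "myocardial infarction")
```

A typo scores well on string similarity alone, while a true synonym such as "heart attack" scores poorly, which is why the context, frequency, and semantic factors are needed alongside it.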

Common Patterns

Pattern 1: Clinical Note Normalization

Scenario: Convert free-text diagnoses to SNOMED codes.

# Normalize clinical notes
python scripts/main.py \
  --input notes.csv \
  --column diagnosis_text \
  --ontology snomed \
  --threshold 0.8 \
  --output coded_diagnoses.csv

# Results: "heart attack" → 22298006 (Myocardial infarction)

Post-Processing:

  • Review low-confidence mappings (<0.8)
  • Handle ambiguous terms manually
  • Validate against clinical context
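
The review step above amounts to splitting results at the confidence threshold. A minimal sketch, assuming each result carries a text, a concept ID, and a confidence; the `Mapping` dataclass, the `triage` helper, and the confidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Mapping:
    text: str
    concept_id: str
    confidence: float


def triage(mappings: Iterable[Mapping],
           threshold: float = 0.8) -> Tuple[List[Mapping], List[Mapping]]:
    """Split mappings into (accepted, needs_review) at the given threshold."""
    accepted: List[Mapping] = []
    review: List[Mapping] = []
    for m in mappings:
        (accepted if m.confidence >= threshold else review).append(m)
    return accepted, review


# Illustrative scores only; real values come from the mapper.
accepted, review = triage([
    Mapping("heart attack", "22298006", 0.93),
    Mapping("MI", "22298006", 0.55),
])
```

Keeping the review queue as data, not just printing warnings, makes it straightforward to hand off to a human coder.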

Pattern 2: Literature Indexing

Scenario: Map research paper keywords to MeSH.

# Map keywords to MeSH
mesh_terms = mapper.map_to_mesh(
    keywords=["cancer immunotherapy", "checkpoint inhibitors", "PD-1"],
    include_tree_numbers=True,
    include_qualifiers=True
)

for term in mesh_terms:
    print(f"{term.input} → {term.descriptor}")
    print(f"  Tree: {term.tree_numbers}")
    print(f"  Entry terms: {term.synonyms}")

Pattern 3: Drug Name Normalization

Scenario: Standardize medication names across datasets.

# Normalize drug names
drugs = ["Tylenol", "Advil", "Motrin", "acetaminophen"]

for drug in drugs:
    result = mapper.map_to_rxnorm(drug)
    print(f"{drug} → {result.rxcui}: {result.name}")
    # Tylenol → 161: Acetaminophen
    # Advil → 5640: Ibuprofen
    # Motrin → 5640: Ibuprofen

Pattern 4: EHR Data Harmonization

Scenario: Merge data from multiple hospital systems.

# Harmonize diagnoses from 3 hospitals
python scripts/main.py \
  --batch \
  --inputs "hospital_a.csv,hospital_b.csv,hospital_c.csv" \
  --target-ontology snomed \
  --cross-map-to icd10 \
  --output harmonized_data.csv

Complete Workflow Example

From free-text to coded database:

from scripts.mapper import BioOntologyMapper
from scripts.validator import MappingValidator

# Initialize
mapper = BioOntologyMapper()
validator = MappingValidator()

# Step 1: Extract entities from text
clinical_note = "Patient has Type 2 diabetes and hypertension..."
entities = mapper.extract_entities(clinical_note)

# Step 2: Map to SNOMED
mappings = []
for entity in entities:
    mapping = mapper.map_to_snomed(
        entity.text,
        context=clinical_note,
        top_n=3
    )
    mappings.append(mapping)

# Step 3: Validate mappings
for mapping in mappings:
    validation = validator.validate(
        mapping,
        check_clinical_plausibility=True
    )
    if not validation.is_valid:
        print(f"Review needed: {mapping}")

# Step 4: Export to database format
db_records = [m.to_database_record() for m in mappings]

Quality Checklist

Pre-Mapping:

  • Text preprocessed (lowercase, punctuation handled)
  • Abbreviations expanded where possible
  • Language identified (multilingual support)
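
The pre-mapping items can be sketched as a small normalization function. This assumes whitespace tokenization and a tiny hand-written abbreviation table; `ABBREVIATIONS` and `preprocess` are hypothetical names, and ambiguous abbreviations such as "MI" are deliberately left unexpanded.

```python
import re
import string

# Hypothetical abbreviation table; only unambiguous entries belong here.
ABBREVIATIONS = {"htn": "hypertension", "t2dm": "type 2 diabetes mellitus"}

# Matches any single ASCII punctuation character.
_PUNCT = re.compile(f"[{re.escape(string.punctuation)}]")


def preprocess(text: str) -> str:
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    no_punct = _PUNCT.sub(" ", text.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in no_punct.split()]
    return " ".join(tokens)


cleaned = preprocess("Pt. has HTN, T2DM!")
```

Blanket punctuation removal also splits tokens such as "ICD-10" or decimal lab values, so a production version would likely need exceptions for those.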

During Mapping:

  • Confidence threshold appropriate (>0.7 for clinical)
  • Multiple candidates considered for ambiguous terms
  • Context used for disambiguation

Post-Mapping:

  • Low-confidence mappings flagged for review
  • Unmapped terms logged
  • CRITICAL: Clinical expert validation for high-stakes use

Before Production:

  • Mapping accuracy validated on gold standard
  • False positive rate acceptable (<5%)
  • Recall acceptable for use case (>90%)
  • API rate limits respected
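
Checking the thresholds above requires comparing mapper output against a gold standard. The sketch below is one way to do that; it interprets "false positive rate" as the share of emitted mappings that are wrong, which is an assumption, and the term keys and concept IDs are placeholders, not real codes.

```python
from typing import Dict, Optional


def evaluate(predicted: Dict[str, Optional[str]],
             gold: Dict[str, str]) -> Dict[str, float]:
    """Compute recall and the share of emitted mappings that are wrong."""
    # Only terms that have a gold label and a non-empty prediction count as emitted.
    emitted = {t: c for t, c in predicted.items() if c is not None and t in gold}
    correct = sum(1 for t, c in emitted.items() if gold[t] == c)
    wrong = len(emitted) - correct
    return {
        "recall": correct / len(gold) if gold else 0.0,
        "wrong_mapping_rate": wrong / len(emitted) if emitted else 0.0,
    }


# Placeholder terms and concept IDs for illustration.
gold = {"t1": "A", "t2": "B", "t3": "C", "t4": "D"}
predicted = {"t1": "A", "t2": "B", "t3": "X", "t4": None}
metrics = evaluate(predicted, gold)
```

In this toy run two of four gold terms are mapped correctly (recall 0.5) and one of three emitted mappings is wrong, so neither checklist threshold would be met.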

Common Pitfalls

Mapping Errors:

  • Abbreviation ambiguity → "MI" = Myocardial infarction OR Michigan

    • ✅ Use context; flag for manual review
  • Outdated terms → Old terminology not in current ontology

    • ✅ Use historical mappings; update terminology
  • False confidence → High score for wrong concept

    • ✅ Always review top-3 candidates
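
The "use context; flag for manual review" advice for abbreviations can be illustrated with a simple cue-word vote. This is a sketch under strong assumptions: the `SENSES` inventory and its cue words are invented for the example, and a real system would likely use a trained disambiguation model instead.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical sense inventory: each sense of an abbreviation lists cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "mi": {
        "myocardial infarction": {"chest", "troponin", "ecg", "cardiac", "stent"},
        "michigan": {"state", "resident", "address", "detroit"},
    }
}


def disambiguate(abbrev: str, context: str) -> Optional[str]:
    """Return the sense best supported by context, or None to flag for review."""
    senses = SENSES.get(abbrev.lower())
    if not senses:
        return None
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(cues & words) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    # No supporting cue, or a tie between senses: leave it to a human.
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


clear = disambiguate("MI", "Chest pain with elevated troponin, MI suspected")
unclear = disambiguate("MI", "Patient seen for MI follow-up")
```

Returning `None` when the context offers no cue keeps the ambiguous case out of the coded output until someone has looked at it.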

Technical Issues:

  • API failures → No local fallback

    • ✅ Implement caching; use local reference files
  • Version mismatches → Different ontology versions

    • ✅ Track ontology version used
  • PHI exposure → Sending patient data to external APIs

    • ✅ De-identify before API calls; use local processing when possible
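
The first two fixes above (a local fallback and version tracking) can be combined in one lookup wrapper. A minimal sketch, assuming the remote client raises the built-in `ConnectionError` or `TimeoutError`; a real HTTP client would need its own exception types caught. All names here, including `LOCAL_ONTOLOGY_VERSION`, are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical label for the ontology release behind the local reference file.
LOCAL_ONTOLOGY_VERSION = "local-snapshot"

Record = Dict[str, Optional[str]]


def lookup_with_fallback(term: str,
                         remote: Callable[[str], Optional[str]],
                         local: Dict[str, str],
                         cache: Dict[str, Record]) -> Record:
    """Try the cache, then the remote service, then the local reference data."""
    if term in cache:
        return cache[term]
    try:
        record: Record = {"concept_id": remote(term), "source": "api"}
    except (ConnectionError, TimeoutError):
        # Service unreachable: degrade to local data and record which
        # snapshot answered so the result can be audited later.
        record = {
            "concept_id": local.get(term),
            "source": "local",
            "ontology_version": LOCAL_ONTOLOGY_VERSION,
        }
    if record["concept_id"] is not None:
        cache[term] = record  # only cache successful resolutions
    return record


def unreachable_api(term: str) -> Optional[str]:
    """Stand-in for a remote lookup that always fails."""
    raise ConnectionError("service unavailable")


cache: Dict[str, Record] = {}
record = lookup_with_fallback(
    "heart attack", unreachable_api, {"heart attack": "22298006"}, cache
)
```

Tagging each record with its source and version makes it possible to re-run only the locally resolved terms once the service is back.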

References

Available in references/ directory:

  • snomed_ct_guide.md - SNOMED CT hierarchy and relationships
  • mesh_structure.md - MeSH tree structure and qualifiers
  • ontology_mappings.md - Crosswalks between systems
  • nlp_best_practices.md - Biomedical text processing
  • api_documentation.md - External service integration
  • validation_datasets.md - Gold standard test sets

Scripts

Located in scripts/ directory:

  • main.py - CLI interface for mapping
  • mapper.py - Core ontology mapping engine
  • extractor.py - Named entity recognition
  • cross_mapper.py - Ontology-to-ontology translation
  • scorer.py - Confidence calculation
  • batch_processor.py - Large dataset handling
  • validator.py - Mapping quality checks
  • caching.py - Local storage for frequent lookups

Limitations

  • Ambiguity: Many-to-many mappings common; context required
  • Coverage: Rare diseases and new concepts may not be in ontologies
  • Versioning: Ontology updates can change mappings over time
  • Language: Best support for English; other languages limited
  • Real-time: Not suitable for time-critical clinical applications
  • API Dependency: Requires internet for most lookups (caching helps)

⚠️ Critical: Ontology mapping is for research and data integration, not clinical decision-making. Always validate mappings with domain experts before use in patient care contexts. Never process PHI without appropriate de-identification and compliance measures.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --term | str | Required | Single term to map |
| --input | str | Required | Input file path |
| --output | str | Required | Output file path |
| --ontology | str | 'both' | |
| --threshold | float | 0.7 | |
| --format | str | 'json' | |
| --use-api | str | Required | Use UMLS/MeSH APIs |
| --api-key | str | Required | |
Usage Guidance
This skill appears to implement local SNOMED/MeSH mapping and optional API-backed lookups, but it has a few inconsistencies you should address before using it on real or sensitive data:
  • Confirm UMLS API usage: the code reads UMLS_API_KEY from the environment, but the registry metadata does not declare this requirement. Only set UMLS_API_KEY if you trust the code and intend to allow outbound API calls.
  • Expect internet access: the tool calls the official UMLS UTS and NLM MeSH endpoints. If you need offline processing, use only the provided local reference files and disable API use.
  • Verify claimed ontology coverage: SKILL.md advertises RxNorm, LOINC, ICD-10, HGNC, and cross-mapping. The bundled reference files are for SNOMED and MeSH; confirm (by reviewing the rest of scripts/main.py or other source) whether the other ontologies are actually implemented or rely on UMLS API lookups.
  • Protect PHI: do not run this on protected health information unless data are de-identified and usage complies with your legal/regulatory obligations. Network calls could leak sensitive context if used with live PHI.
  • Trust and provenance: the skill owner and homepage are unknown. If you plan to run it in production, audit the full code (ensure no hidden endpoints or telemetry) and run it in an isolated environment first.
If you want higher confidence, ask the author to:
  1. Declare UMLS_API_KEY (and any other env vars) in the registry metadata
  2. Document which ontologies require API access vs. local data
  3. Provide a small README describing network access and data-handling guarantees
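One way to keep outbound calls explicit is to gate API use on both an opt-in flag and the presence of UMLS_API_KEY. The sketch below illustrates that guard; `resolve_api_key` and `lookup_mode` are hypothetical helpers, not functions from the skill's code.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the UMLS key from the environment, or None if unset or blank."""
    key = env.get("UMLS_API_KEY", "").strip()
    return key or None


def lookup_mode(use_api: bool, env: Mapping[str, str] = os.environ) -> str:
    """Decide between local-only and API-backed lookups, failing loudly."""
    if not use_api:
        return "local"  # no outbound network calls
    if resolve_api_key(env) is None:
        raise RuntimeError("API use requested but UMLS_API_KEY is not set")
    return "api"


mode = lookup_mode(use_api=False)
```

Defaulting to local mode means nothing leaves the machine unless the caller both asks for API lookups and supplies a key.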
Capability Analysis
Type: OpenClaw Skill
Name: bio-ontology-mapper
Version: 0.1.0
The bio-ontology-mapper skill bundle is a legitimate tool for normalizing biomedical terminology. The core logic in scripts/main.py facilitates mapping clinical terms to SNOMED CT and MeSH ontologies using local reference data and official National Library of Medicine (NLM) APIs (uts-ws.nlm.nih.gov and id.nlm.nih.gov). The code follows security best practices for API interaction, including rate limiting and input encoding, and contains no evidence of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
SKILL.md claims multi-ontology support (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC) and cross-mapping, but the included references and visible code primarily provide local data for SNOMED and MeSH only. Broader coverage would rely on external APIs (UMLS) but that dependency and required credential are not declared in the registry metadata or requirements. This mismatch between claimed capabilities and provided assets is concerning.
Instruction Scope
Runtime instructions and examples reference local files and API-backed lookups. The code performs network requests to the official UMLS UTS API and the NLM MeSH API; UMLS usage depends on an API key read from the environment. SKILL.md/examples do not explicitly state the need to supply UMLS_API_KEY or require internet access, which is an omission that affects operational safety and privacy (especially for PHI). Otherwise the instructions stay within the stated mapping purpose and do not attempt broad system file access.
Install Mechanism
There is no install spec (instruction-only skill), and requirements.txt lists only 'dataclasses' and 'difflib', both of which are Python standard-library modules. No arbitrary downloads or extract/install steps are present. This is low-risk from an installer perspective.
Credentials
The code reads UMLS_API_KEY from the environment (os.getenv) to call the UMLS UTS API, but the skill registry lists no required environment variables. Requesting an API key for an external clinical terminology service is reasonable for functionality, but failing to declare it in the metadata is an omission that could lead users to unintentionally expose credentials or run the skill with insufficient privileges. The skill also performs outbound network calls (MeSH and UMLS endpoints).
Persistence & Privilege
The skill is not always-enabled and has no install-time persistence mechanism. There is no evidence it modifies other skills or system-wide configuration. No elevated privileges or persistent presence were requested.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install bio-ontology-mapper
  3. After installation, invoke the skill by name or use /bio-ontology-mapper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release of bio-ontology-mapper: a tool for mapping unstructured biomedical text to standardized ontologies.
  • Supports multi-ontology mapping: SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC.
  • Extracts medical entities and maps them to ontology codes with confidence scores.
  • Includes batch processing for large datasets, cross-ontology translation, and confidence scoring/validation.
  • Provides patterns and examples for clinical note normalization, literature indexing, drug name normalization, and EHR harmonization.
  • Documents a quality checklist and common pitfalls for reliable deployment.
Metadata
Slug: bio-ontology-mapper
Version: 0.1.0
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 1
Frequently Asked Questions

What is Bio Ontology Mapper?

Map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10) for terminology normalization and semantic interoperability. Extracts m... It is an AI Agent Skill for Claude Code / OpenClaw, with 307 downloads so far.

How do I install Bio Ontology Mapper?

Run "/install bio-ontology-mapper" in the OpenClaw or Claude Code chat to install it in one step. API-backed UMLS lookups additionally require the UMLS_API_KEY environment variable, as noted under Usage Guidance.

Is Bio Ontology Mapper free?

Yes, Bio Ontology Mapper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Bio Ontology Mapper support?

Bio Ontology Mapper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Bio Ontology Mapper?

It is built and maintained by renhaosu2024 (@renhaosu2024); the current version is v0.1.0.
