Description

Map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC).

README (SKILL.md)

# Bio-Ontology Mapper

Name: Bio-Ontology Mapper
Author: aipoch-ai

## When to Use

- Use this skill when the task is to map unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC).
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.

## Key Features

- Scope-focused workflow aligned to mapping unstructured biomedical text to standardized ontologies.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.

## Dependencies

- Python: 3.10+. Repository baseline for current packaged skills.
- `dataclasses`: version unspecified. Declared in `requirements.txt`, although it is part of the Python standard library and needs no separate install.
- `difflib`: version unspecified. Declared in `requirements.txt`, although it is part of the Python standard library and needs no separate install.

## Example Usage

```bash
cd "20260318/scientific-skills/Evidence Insight/bio-ontology-mapper"
python -m py_compile scripts/main.py
python scripts/main.py --help
```

Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.

## Implementation Details

See `## Workflow` below for related details.

- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

## Quick Check

Use this command to verify that the packaged script entry point can be parsed before deeper execution.

```bash
python -m py_compile scripts/main.py
```

## Audit-Ready Commands

Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.

```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```

## Workflow

1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.

## Overview

Biomedical terminology normalization tool that maps free-text clinical and scientific concepts to standardized ontologies for semantic interoperability and data harmonization.

**Key Capabilities:**
- **Multi-Ontology Support**: SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC
- **Entity Extraction**: NER for diseases, symptoms, procedures, drugs
- **Fuzzy Matching**: Handle typos, abbreviations, and synonyms
- **Confidence Scoring**: Reliability metrics for each mapping
- **Batch Processing**: Normalize large datasets efficiently
- **Cross-Mapping**: Translate between ontology systems

## Core Capabilities

### 1. Entity Recognition and Mapping

Extract and map biomedical entities to ontologies:

```python
from scripts.mapper import BioOntologyMapper

mapper = BioOntologyMapper()

# Map clinical text
result = mapper.map_text(
    text="Patient has diabetes and hypertension, taking metformin",
    ontologies=["snomed", "mesh", "rxnorm"],
    confidence_threshold=0.7
)

for entity in result.entities:
    print(f"{entity.text} → {entity.concept_id} ({entity.ontology})")
    print(f"  Preferred: {entity.preferred_term}")
    print(f"  Confidence: {entity.confidence:.2f}")
```

**Supported Ontologies:**
| Ontology | Domain | Use Case |
|----------|--------|----------|
| **SNOMED CT** | Clinical | EHR interoperability |
| **MeSH** | Literature | PubMed indexing |
| **ICD-10** | Billing | Diagnosis codes |
| **LOINC** | Labs | Test result standardization |
| **RxNorm** | Drugs | Medication normalization |
| **HGNC** | Genes | Gene name standardization |

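As a rough illustration of the fuzzy-matching step, the sketch below scores a free-text term against a tiny in-memory vocabulary using the standard-library `difflib`. The `VOCAB` table and the `best_match` helper are hypothetical and are not the packaged mapper's API; only the concept ID 22298006 comes from this README, and the second ID is a made-up placeholder.

```python
from difflib import SequenceMatcher

# Hypothetical mini-vocabulary: concept ID -> accepted labels.
# "EXAMPLE-0001" is a placeholder, not a real ontology code.
VOCAB = {
    "22298006": ["myocardial infarction", "heart attack"],
    "EXAMPLE-0001": ["hypertension", "high blood pressure"],
}


def best_match(term, vocab=VOCAB, threshold=0.7):
    """Return (concept_id, label, score) for the closest label, or None."""
    query = term.strip().lower()
    best = None
    for concept_id, labels in vocab.items():
        for label in labels:
            score = SequenceMatcher(None, query, label).ratio()
            if best is None or score > best[2]:
                best = (concept_id, label, score)
    # Reject the best candidate if it does not reach the threshold.
    if best is not None and best[2] >= threshold:
        return best
    return None


print(best_match("hart attack"))  # typo still resolves to the heart attack label
print(best_match("zzz"))          # nothing similar enough, so None
```

A character-level ratio like this handles typos reasonably well but not abbreviations or true synonyms, which is why the synonym labels are listed explicitly in the table.
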
### 2. Cross-Ontology Translation

Map concepts between different ontologies:

```python
# Reuses the `mapper` instance created in section 1.
# Cross-map SNOMED to ICD-10
translation = mapper.cross_map(
    source_id="22298006",  # SNOMED: Myocardial infarction
    source_ontology="snomed",
    target_ontology="icd10"
)

print(f"ICD-10: {translation.target_id} - {translation.target_term}")
# Example output: ICD-10: I21.9 - Acute myocardial infarction, unspecified
```

**Cross-Mapping Coverage:**
- SNOMED CT ↔ ICD-10-CM (clinical modifications)
- MeSH ↔ SNOMED CT (literature to clinical)
- RxNorm ↔ ATC (drug classifications)
- LOINC ↔ SNOMED CT (lab to clinical)

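One simple way to picture cross-mapping is a lookup in a crosswalk table keyed by source ontology, target ontology, and source ID. The sketch below is a minimal stand-in under that assumption; `CROSSWALK`, `CrossMapResult`, and this `cross_map` function are hypothetical names, and the single row only repeats the SNOMED-to-ICD-10 pair from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrossMapResult:
    source_id: str
    target_id: str
    target_term: str


# Hypothetical crosswalk: (source ontology, target ontology, source ID) -> target.
# A real table would be loaded from a published map release, not hard-coded.
CROSSWALK = {
    ("snomed", "icd10", "22298006"): (
        "I21.9",
        "Acute myocardial infarction, unspecified",
    ),
}


def cross_map(source_id, source_ontology, target_ontology) -> Optional[CrossMapResult]:
    """Look up a single source concept; return None when no map row exists."""
    key = (source_ontology.lower(), target_ontology.lower(), source_id)
    hit = CROSSWALK.get(key)
    if hit is None:
        return None
    return CrossMapResult(source_id=source_id, target_id=hit[0], target_term=hit[1])


print(cross_map("22298006", "SNOMED", "icd10"))
print(cross_map("00000000", "snomed", "icd10"))  # unmapped -> None
```

Returning `None` for a missing row, instead of guessing a nearby code, keeps unmapped concepts visible for later review.
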
### 3. Batch Normalization

Process large datasets:

```python
# Reuses the `mapper` instance created in section 1.
# Batch process CSV
results = mapper.batch_map(
    input_file="clinical_terms.csv",
    text_column="diagnosis_description",
    ontologies=["snomed", "icd10"],
    output_format="csv",
    max_workers=4
)

# Results include:
# - Original term
# - Mapped concept ID
# - Confidence score
# - Alternative mappings (if ambiguous)
```

**Performance:**
- ~100 terms/second (with caching)
- ~20 terms/second (API lookup)
- Parallel processing for large datasets

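The combination of caching and parallel workers mentioned above could look roughly like the following. This is a sketch only: `map_term` is a hypothetical placeholder for a single-term lookup, the CSV is read from a string to keep the example self-contained, and none of it is taken from `batch_processor.py`.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=10_000)
def map_term(term):
    """Placeholder single-term lookup; repeated terms are served from the cache."""
    known = {"heart attack": "22298006"}
    return known.get(term.strip().lower())


def batch_map(csv_text, text_column, max_workers=4):
    """Map every value in `text_column`, keeping the input row order."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    terms = [row[text_column] for row in rows]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        concept_ids = list(pool.map(map_term, terms))  # map() preserves order
    return [{"term": t, "concept_id": c} for t, c in zip(terms, concept_ids)]


sample = "diagnosis_description\nHeart attack\nunknown thing\nHeart attack\n"
results = batch_map(sample, "diagnosis_description")
for row in results:
    print(row)
```

Threads suit this shape of work when each lookup waits on I/O such as an API call; for purely CPU-bound matching a process pool may be the better fit.
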
### 4. Confidence Scoring and Validation

Assess mapping reliability:

```python
scoring = mapper.score_mapping(
    term="heart attack",
    candidate="22298006",  # Myocardial infarction
    factors=["string_similarity", "context_match", "frequency"]
)

print(f"Overall confidence: {scoring.confidence:.2f}")
print(f"Breakdown: {scoring.factors}")
```

**Scoring Factors:**
- **String similarity**: Levenshtein distance, n-grams
- **Context match**: Surrounding words alignment
- **Frequency**: Common usage in corpus
- **Semantic similarity**: Vector embeddings

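To make the idea of combining factors concrete, here is a minimal sketch that blends two of them into one score with a weighted sum. The weights, the keyword-overlap definition of context match, and the use of `difflib` in place of Levenshtein distance are all assumptions for illustration; they are not taken from the packaged scorer.

```python
from difflib import SequenceMatcher

# Assumed weights for this sketch only.
WEIGHTS = {"string_similarity": 0.6, "context_match": 0.4}


def string_similarity(term, label):
    """Character-level similarity in [0, 1] (difflib ratio, not Levenshtein)."""
    return SequenceMatcher(None, term.lower(), label.lower()).ratio()


def context_match(context, concept_keywords):
    """Share of the concept's keywords that occur in the surrounding text."""
    tokens = set(context.lower().split())
    keywords = {k.lower() for k in concept_keywords}
    if not keywords:
        return 0.0
    return len(tokens & keywords) / len(keywords)


def score_mapping(term, label, context, concept_keywords):
    """Return (overall confidence, per-factor breakdown)."""
    factors = {
        "string_similarity": string_similarity(term, label),
        "context_match": context_match(context, concept_keywords),
    }
    confidence = sum(WEIGHTS[name] * value for name, value in factors.items())
    return confidence, factors


confidence, factors = score_mapping(
    term="heart attack",
    label="heart attack",
    context="patient admitted with chest pain and elevated troponin",
    concept_keywords=["chest", "troponin", "ecg", "cardiac"],
)
print(f"Overall confidence: {confidence:.2f}")
print(f"Breakdown: {factors}")
```

Keeping the per-factor breakdown next to the overall score makes it easier to see why a mapping scored high or low when it is reviewed.
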
## Quality Checklist

**Pre-Mapping:**
- [ ] Text preprocessed (lowercase, punctuation handled)
- [ ] Abbreviations expanded where possible
- [ ] Language identified (multilingual support)

**During Mapping:**
- [ ] Confidence threshold appropriate (>0.7 for clinical)
- [ ] Multiple candidates considered for ambiguous terms
- [ ] Context used for disambiguation

**Post-Mapping:**
- [ ] Low-confidence mappings flagged for review
- [ ] Unmapped terms logged
- [ ] **CRITICAL**: Clinical expert validation for high-stakes use

**Before Production:**
- [ ] Mapping accuracy validated on gold standard
- [ ] False positive rate acceptable (<5%)
- [ ] Recall acceptable for use case (>90%)
- [ ] API rate limits respected

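The "Before Production" checks can be turned into numbers with a small evaluation against a gold standard. The sketch below assumes one particular reading of the checklist: recall is the share of gold terms mapped to the correct concept, and the false positive figure is the share of emitted mappings that are wrong. The `evaluate` helper and the sample data are hypothetical.

```python
def evaluate(predicted, gold):
    """Compare predicted term -> concept ID mappings with a gold standard.

    `predicted` may contain None for terms the mapper declined to map.
    `gold` lists only terms that have a correct concept.
    """
    emitted = {term: cid for term, cid in predicted.items() if cid is not None}
    correct = sum(1 for term, cid in emitted.items() if gold.get(term) == cid)
    wrong = len(emitted) - correct
    recall = correct / len(gold) if gold else 0.0
    wrong_share = wrong / len(emitted) if emitted else 0.0
    return {"recall": recall, "wrong_share": wrong_share}


gold = {"heart attack": "22298006", "mi": "22298006"}
predicted = {
    "heart attack": "22298006",  # correct
    "mi": None,                  # missed, lowers recall
    "michigan": "22298006",      # emitted but wrong
}
metrics = evaluate(predicted, gold)
print(metrics)
```

If the checklist intends the classical false positive rate (false positives over all true negatives), the denominator would need to change accordingly.
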
## Common Pitfalls

**Mapping Errors:**
- ❌ **Abbreviation ambiguity** → "MI" = Myocardial infarction OR Michigan
  - ✅ Use context; flag for manual review

- ❌ **Outdated terms** → Old terminology not in current ontology
  - ✅ Use historical mappings; update terminology

- ❌ **False confidence** → High score for wrong concept
  - ✅ Always review top-3 candidates

**Technical Issues:**
- ❌ **API failures** → No local fallback
  - ✅ Implement caching; use local reference files

- ❌ **Version mismatches** → Different ontology versions
  - ✅ Track ontology version used

- ❌ **PHI exposure** → Sending patient data to external APIs
  - ✅ De-identify before API calls; use local processing when possible

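The "API failures" remedy above (cache plus local fallback) might be structured along these lines. Everything here is illustrative: `remote_lookup` is a hypothetical stand-in that always fails, and `LOCAL_CACHE` represents whatever local reference data is available.

```python
# Stand-in for data loaded from a local reference file.
LOCAL_CACHE = {"heart attack": "22298006"}


def remote_lookup(term):
    """Hypothetical API client; here it always fails to simulate an outage."""
    raise ConnectionError("remote service unavailable")


def lookup(term, remote=remote_lookup, cache=LOCAL_CACHE):
    """Try the remote service first, then fall back to the local cache."""
    key = term.strip().lower()
    try:
        concept_id = remote(key)
        cache[key] = concept_id  # remember successful lookups
        return {"concept_id": concept_id, "source": "api"}
    except (ConnectionError, TimeoutError) as exc:
        if key in cache:
            return {"concept_id": cache[key], "source": "cache", "note": str(exc)}
        # Report the term as unmapped instead of guessing.
        return {"concept_id": None, "source": "unmapped", "note": str(exc)}


print(lookup("Heart attack"))
print(lookup("unknown term"))
```

Recording the `source` of each result lets a reviewer tell fresh API answers apart from cached or missing ones.
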
## References

Available in `references/` directory:

- `snomed_ct_guide.md` - SNOMED CT hierarchy and relationships
- `mesh_structure.md` - MeSH tree structure and qualifiers
- `ontology_mappings.md` - Crosswalks between systems
- `nlp_best_practices.md` - Biomedical text processing
- `api_documentation.md` - External service integration
- `validation_datasets.md` - Gold standard test sets

## Scripts

Located in `scripts/` directory:

- `main.py` - CLI interface for mapping
- `mapper.py` - Core ontology mapping engine
- `extractor.py` - Named entity recognition
- `cross_mapper.py` - Ontology-to-ontology translation
- `scorer.py` - Confidence calculation
- `batch_processor.py` - Large dataset handling
- `validator.py` - Mapping quality checks
- `caching.py` - Local storage for frequent lookups

## Limitations

- **Ambiguity**: Many-to-many mappings common; context required
- **Coverage**: Rare diseases and new concepts may not be in ontologies
- **Versioning**: Ontology updates can change mappings over time
- **Language**: Best support for English; other languages limited
- **Real-time**: Not suitable for time-critical clinical applications
- **API Dependency**: Requires internet for most lookups (caching helps)

---

**⚠️ Critical: Ontology mapping is for research and data integration, not clinical decision-making. Always validate mappings with domain experts before use in patient care contexts. Never process PHI without appropriate de-identification and compliance measures.**

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--term` | str | Required | Single term to map |
| `--input` | str | Required | Input file path |
| `--output` | str | Required | Output file path |
| `--ontology` | str | 'both' | Target ontology selection |
| `--threshold` | float | 0.7 | Minimum confidence score for a mapping |
| `--format` | str | 'json' | Output format |
| `--use-api` | str | Required | Use UMLS/MeSH APIs |
| `--api-key` | str | Required | API key for the external services |

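For orientation, a command-line interface with these parameters could be declared with `argparse` roughly as follows. This is a guess at the shape, not the contents of `scripts/main.py`: it assumes `--term` and `--input` are alternatives, that `--use-api` is a boolean switch, and that `--output` and `--api-key` are optional. Check `python scripts/main.py --help` for the real behaviour.

```python
import argparse


def build_parser():
    """Build a parser mirroring the parameter table (assumed semantics)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Map biomedical terms to ontologies (illustrative sketch)",
    )
    # Assumption: exactly one of --term / --input is supplied per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--term", help="Single term to map")
    source.add_argument("--input", help="Input file path")
    parser.add_argument("--output", help="Output file path")
    parser.add_argument("--ontology", default="both")
    parser.add_argument("--threshold", type=float, default=0.7)
    parser.add_argument("--format", default="json")
    # Assumption: a switch rather than a string value.
    parser.add_argument("--use-api", action="store_true", help="Use UMLS/MeSH APIs")
    parser.add_argument("--api-key", default=None)
    return parser


args = build_parser().parse_args(["--term", "heart attack", "--threshold", "0.8"])
print(args)
```
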
## Output Requirements

Every final response should make these items explicit when they are relevant:

- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks

## Error Handling

- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.

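The first rule above, naming exactly which fields are missing, can be sketched as a small pre-flight check. The field names in `REQUIRED_FIELDS` and the `check_request` helper are hypothetical and only illustrate the pattern.

```python
# Hypothetical required fields for a batch run; adjust to the actual task.
REQUIRED_FIELDS = ("input", "output")


def check_request(request, required=REQUIRED_FIELDS):
    """Return a status dict that names every missing or empty required field."""
    missing = [name for name in required if not request.get(name)]
    if missing:
        return {
            "status": "blocked",
            "missing": missing,
            "message": "Missing required input(s): " + ", ".join(missing),
        }
    return {"status": "ok", "missing": [], "message": ""}


print(check_request({"input": "clinical_terms.csv"}))
print(check_request({"input": "clinical_terms.csv", "output": "mapped.csv"}))
```
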
## Input Validation

This skill accepts requests that match the documented purpose of `bio-ontology-mapper` and include enough context to complete the workflow safely.

Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:

> `bio-ontology-mapper` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.

## Response Template

Use the following fixed structure for non-trivial requests:

1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks

If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
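
One way to keep responses in this fixed shape is to hold the seven sections in a small dataclass and render them in order. The `SkillResponse` class below is a hypothetical helper written for illustration, not part of the packaged scripts.

```python
from dataclasses import dataclass, field


@dataclass
class SkillResponse:
    objective: str
    inputs_received: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    workflow: list = field(default_factory=list)
    deliverable: str = ""
    risks_and_limits: list = field(default_factory=list)
    next_checks: list = field(default_factory=list)

    def render(self):
        """Render all seven sections in template order as plain text."""
        sections = [
            ("Objective", [self.objective]),
            ("Inputs Received", self.inputs_received),
            ("Assumptions", self.assumptions),
            ("Workflow", self.workflow),
            ("Deliverable", [self.deliverable] if self.deliverable else []),
            ("Risks and Limits", self.risks_and_limits),
            ("Next Checks", self.next_checks),
        ]
        lines = []
        for number, (title, items) in enumerate(sections, start=1):
            lines.append(f"{number}. {title}")
            # Empty sections stay visible so omissions are explicit.
            lines.extend(f"   - {item}" for item in (items or ["None stated"]))
        return "\n".join(lines)


response = SkillResponse(
    objective="Map 'heart attack' to SNOMED CT",
    assumptions=["Input text is English"],
)
print(response.render())
```
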

Usage Guidance

This skill appears to implement local SNOMED/MeSH mapping and optionally calls UMLS and MeSH web APIs. Before installing or running it:

- Review scripts/main.py fully (it performs outbound HTTP requests to UMLS UTS and NLM MeSH).
- If you will provide an API key, be aware the code reads UMLS_API_KEY from the environment. The registry metadata does NOT list this, so treat providing the key as granting the skill access to that service. Use a scoped, auditable key if possible.
- Verify whether you need the remote API: you can run the mapper in offline/local mode (the code supports local reference files) if you want to avoid network calls and exposing sensitive text externally.
- Confirm whether the broader ontology features advertised (ICD-10, LOINC, RxNorm, HGNC, cross-mapping) are actually implemented for your workflow; the visible code mainly covers SNOMED and MeSH.
- Because the source/homepage are unknown, prefer running the script in a sandboxed environment and testing with non-sensitive sample data (the repository includes sample reference files) before using it on real clinical data.

Capability Analysis

Type: OpenClaw Skill

Name: bio-ontology-mapper-1

Version: 1.0.0

The bio-ontology-mapper skill is a legitimate tool for mapping biomedical text to SNOMED CT and MeSH ontologies. The core logic in scripts/main.py uses standard libraries for fuzzy matching and interacts with official National Library of Medicine (NLM) APIs (UMLS and MeSH) for data retrieval. No evidence of malicious behavior, data exfiltration, or unauthorized execution was found; the use of the UMLS_API_KEY environment variable is consistent with the tool's documented purpose.

Capability Assessment

ℹ Purpose & Capability

The name/description and SKILL.md describe an ontology mapper (SNOMED, MeSH, ICD-10, LOINC, RxNorm, HGNC). The packaged code implements local SNOMED/MeSH matching and includes clients for UMLS (UTS) and MeSH APIs. However the skill metadata declared no required env vars or credentials, yet the code will use an environment variable UMLS_API_KEY if present. Also the broad multi-ontology claims (LOINC, RxNorm, HGNC, ICD-10 cross-mapping) are described in SKILL.md but the visible code primarily implements SNOMED and MeSH (UMLS could enable other cross-maps but that requires API access). This mismatch between advertised capabilities and the actual implementation is inconsistent and unexplained.

ℹ Instruction Scope

SKILL.md is focused and instructs to run the packaged script, compile it, and validate inputs/CONFIG before execution. It does not instruct the agent to read unrelated system files. However it omits mentioning that the script can make outbound calls to external APIs (UMLS UTS and NLM MeSH), and it does not declare the UMLS_API_KEY env var which the script will read if available. The runtime instructions are otherwise scoped to the mapping task.

✓ Install Mechanism

No install spec — the skill is instruction-plus-code and does not download external installers. This is the lower-risk model for installation. The code is packaged in the skill repository rather than fetched at runtime.

⚠ Credentials

Registry/metadata declare no required environment variables or primary credential, but scripts/main.py reads UMLS_API_KEY from the environment and will use it if present. The skill will make network requests to UMLS and MeSH APIs; providing the UMLS_API_KEY grants it access to the UMLS service. requirements.txt also lists 'dataclasses' and 'difflib' (both standard library in Python 3.10), which is odd but harmless. The undeclared credential requirement (UMLS_API_KEY) and network access are disproportionate to the metadata/manifest and should be made explicit before use.

✓ Persistence & Privilege

The skill does not request persistent privileges, does not set always:true, and does not include install steps that modify system-wide settings. Autonomous invocation is allowed (platform default) but is not combined with elevated persistence in this package.

Version History

v1.0.0

Bio-Ontology Mapper v1.0.0

- Initial release of bio-ontology-mapper for mapping unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC).
- Supports entity recognition, fuzzy matching, confidence scoring, batch normalization, cross-ontology translation, and reliability validation.
- Includes workflow guidance, audit-ready commands, quick checks for entry points, and a detailed quality checklist.
- Provides structured results, fallback paths for missing inputs or errors, and emphasizes reproducibility and explicit assumptions.
- Example Python interface and CLI usage included for mapping, scoring, and batch processing tasks.

Metadata

Slug bio-ontology-mapper-1

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Bio-Ontology Mapper?

Bio-Ontology Mapper maps unstructured biomedical text to standardized ontologies (SNOMED CT, MeSH, ICD-10, LOINC, RxNorm, HGNC). It is an AI Agent Skill for Claude Code / OpenClaw, with 90 downloads so far.

How do I install Bio-Ontology Mapper?

Run "/install bio-ontology-mapper-1" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Bio-Ontology Mapper free?

Yes, Bio-Ontology Mapper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Bio-Ontology Mapper support?

Bio-Ontology Mapper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Bio-Ontology Mapper?

It is built and maintained by AIpoch (@aipoch-ai); the current version is v1.0.0.
