bahuia

HeteroMind - Unified Knowledge QA

by Yongrui Chen · GitHub ↗ · v0.3.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 81 · Stars: 0 · Active Installs: 0 · Versions: 3
Install in OpenClaw
/install heteromind
Description
Unified heterogeneous knowledge QA system. Automatically routes natural language queries to SQL databases, Knowledge Graphs, or table files using 4-layer det...
README (SKILL.md)

HeteroMind

Unified heterogeneous knowledge QA system with automatic source detection and multi-stage reasoning.

Core Concept

Natural language queries are automatically routed to the appropriate knowledge source (SQL, Knowledge Graph, or Table files) without requiring users to specify the data source. A 4-layer detection architecture ensures accurate source identification, followed by multi-stage query generation with self-revision and voting.

User Query → Source Detection (4 layers) → Query Generation → Self-Revision → Voting → Execution → Answer

When to Use

Trigger | Action
"How many employees in X?" | NL2SQL engine
"Who is the founder of X?" | NL2SPARQL engine (KG)
"Which quarter had highest sales?" | TableQA engine
"Show average salary by department" | Auto-detect SQL
Queries with aggregations, filters, joins | Route to SQL
Entity relationship queries | Route to KG
Questions about CSV/Excel files | Route to TableQA
Multi-hop queries across sources | Decompose + fuse

Architecture

4-Layer Source Detection

Layer 1 (15%): Rule-Based
  - 20+ keywords per source type
  - 7 regex patterns (aggregation, comparison, relation)
  - Fast pre-filtering

Layer 2 (35%): LLM Semantic
  - Intent classification
  - Entity/predicate detection
  - Multi-hop identification

Layer 3a (25%, weight shared with Layer 3b): SQL Schema Match
  - Inverted index on tables/columns
  - Automatic JOIN inference
  - Confidence scoring

Layer 3b (25%, weight shared with Layer 3a): KG Entity Link
  - Entity mention extraction
  - SPARQL endpoint lookup
  - Predicate pattern matching

Layer 3c (25%, plus verification boost): Entity Verification
  - Cross-source entity existence check
  - 30% score boost for verified entities

Layer 4: Multi-Source Fusion
  - Weighted aggregation
  - Execution plan generation
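
As an illustration of the Layer 1 idea (keyword and regex pre-filtering), here is a minimal sketch. The keyword sets, patterns, and the `rule_scores` function are hypothetical stand-ins; the skill's actual lists and scoring are not shown in this document.

```python
import re

# Hypothetical keyword lists and patterns, far smaller than the "20+ keywords" described above.
KEYWORDS = {
    "sql": {"how many", "average", "count", "total", "department"},
    "kg": {"founder", "who is", "born", "capital of", "related to"},
    "table": {"csv", "excel", "spreadsheet", "quarter", "file"},
}
PATTERNS = {
    "sql": [re.compile(r"\b(sum|avg|average|count|max|min)\b", re.I)],
    "kg": [re.compile(r"\bwho (is|was|founded)\b", re.I)],
    "table": [re.compile(r"\.(csv|xlsx?)\b", re.I)],
}


def rule_scores(query: str) -> dict[str, float]:
    """Return a normalized score per source type from keyword and regex hits."""
    text = query.lower()
    hits = {}
    for source in KEYWORDS:
        keyword_hits = sum(1 for kw in KEYWORDS[source] if kw in text)
        pattern_hits = sum(1 for p in PATTERNS[source] if p.search(query))
        hits[source] = keyword_hits + pattern_hits
    total = sum(hits.values())
    if total == 0:
        # No rule fired; leave the decision to the later layers.
        return {source: 0.0 for source in hits}
    return {source: n / total for source, n in hits.items()}


scores = rule_scores("Who is the founder of Microsoft?")
```

Substring keyword matching is crude ("count" also matches "country"), which is one reason a rule layer like this would plausibly carry the smallest weight.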

Query Generation Pipeline

1. Schema/Entity Linking     → Identify relevant tables/columns/entities
2. Parallel Generation       → Generate 3 candidates concurrently
3. Multi-Round Revision      → 2 rounds of self-review
4. Validation               → Syntax and semantic checks
5. Voting                   → Select best candidate
6. Execution                → Run query
7. Result Verification      → Validate reasonableness
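
Steps 2 and 5 above (parallel generation and voting) can be sketched with `asyncio.gather` and a majority vote. This is only an illustration: `generate_candidate` is a hypothetical stand-in for an LLM call that returns canned text, the revision and validation steps are omitted, and the skill's real voting rule may differ.

```python
import asyncio
from collections import Counter


async def generate_candidate(question: str, seed: int) -> str:
    """Stand-in for an LLM call; returns a canned query per seed (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    canned = [
        "SELECT COUNT(*) FROM employees",
        "select count(*) from employees;",
        "SELECT COUNT(id) FROM employees",
    ]
    return canned[seed % len(canned)]


def normalize(query: str) -> str:
    """Collapse case, whitespace, and a trailing semicolon so equivalent text votes together."""
    return " ".join(query.lower().rstrip("; ").split())


async def generate_and_vote(question: str, num_candidates: int = 3) -> str:
    # Step 2: generate candidates concurrently.
    candidates = await asyncio.gather(
        *(generate_candidate(question, i) for i in range(num_candidates))
    )
    # Step 5: majority vote over normalized text; ties go to the earliest candidate.
    counts = Counter(normalize(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return next(c for c in candidates if normalize(c) == winner)


best = asyncio.run(generate_and_vote("How many employees?"))
```

Voting on normalized text is the simplest possible rule; voting on execution results would group semantically equivalent queries more reliably, at the cost of running every candidate.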

Engines

NL2SQL Engine

from src.engines.nl2sql.multi_stage_engine import MultiStageNL2SQLEngine

engine = MultiStageNL2SQLEngine({
    "name": "sql_engine",
    "schema": schema,
    "llm_config": {
        "model": "deepseek-chat",
        "api_key": "sk-...",
    },
    "generation_config": {
        "num_candidates": 3,
        "max_revisions": 2,
        "parallel_generation": True,
    },
})

result = await engine.execute("How many employees in Engineering?", {})

Features:

  • Schema linking (rule-based + LLM)
  • Parallel SQL candidate generation
  • Multi-round self-revision
  • Voting mechanism
  • Result verification

NL2SPARQL Engine

from src.engines.nl2sparql.multi_stage_engine import MultiStageNL2SPARQLEngine

engine = MultiStageNL2SPARQLEngine({
    "name": "sparql_engine",
    "endpoint_url": "https://dbpedia.org/sparql",
    "ontology": ontology,
    "llm_config": {"model": "gpt-4", "api_key": "sk-..."},
})

result = await engine.execute("Who founded Microsoft?", {})

Features:

  • Entity linking to KG
  • Ontology retrieval
  • SPARQL generation with revision
  • Multi-endpoint support

TableQA Engine

from src.engines.table_qa.multi_stage_engine import MultiStageTableQAEngine

engine = MultiStageTableQAEngine({
    "name": "table_engine",
    "table_path": "data/sales.csv",
    "llm_config": {"model": "deepseek-chat", "api_key": "sk-..."},
})

result = await engine.execute("Which quarter had highest sales?", {})

Features:

  • Table schema analysis
  • Query intent interpretation
  • Pandas code generation
  • Safe execution sandbox
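
One way to pre-screen generated Pandas code before it is executed is a syntax-tree allowlist built on the standard `ast` module. The sketch below is an assumption about how such a check could look, not the project's sandbox; `is_safe_expression` and `ALLOWED_NAMES` are hypothetical names, and a check like this reduces risk without being a complete sandbox.

```python
import ast

ALLOWED_NAMES = {"df"}  # assumption: generated code may only reference the DataFrame


def is_safe_expression(code: str) -> bool:
    """Return True if `code` is one expression with no unknown names or underscore attributes."""
    try:
        # mode="eval" rejects statements such as `import os` outright.
        tree = ast.parse(code, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Lambda):
            return False
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


ok = is_safe_expression("df.groupby('quarter')['sales'].sum().idxmax()")
blocked = is_safe_expression("__import__('os').system('id')")
```

The check is deliberately conservative: it also rejects comprehensions and any helper names, so legitimate generated code may need the allowlist widened.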

Multi-LLM Support

Override model and API key at runtime:

# Initialize with default
engine = MultiStageNL2SQLEngine({
    "llm_config": {"model": "deepseek-chat", "api_key": "sk-deepseek-key"},
})

# Override per-call
result = await engine.execute(
    query="Complex query",
    context={},
    model="gpt-4-turbo",      # Override model
    api_key="sk-openai-key",  # Override API key
)
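
The override behavior can be pictured as merging per-call arguments over the defaults. The sketch below is an assumption about the semantics, not the engine's code; `LLMConfig` and `resolve_config` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    model: str
    api_key: str
    base_url: Optional[str] = None


def resolve_config(default: LLMConfig, model: Optional[str] = None,
                   api_key: Optional[str] = None) -> LLMConfig:
    """Return the config for one call: overrides win, defaults fill the rest."""
    candidates = {"model": model, "api_key": api_key}
    overrides = {k: v for k, v in candidates.items() if v is not None}
    return replace(default, **overrides)


default = LLMConfig(model="deepseek-chat", api_key="sk-deepseek-key")
per_call = resolve_config(default, model="gpt-4-turbo", api_key="sk-openai-key")
```

Keeping the default config immutable means an override applies to a single call only. When the override switches provider, the `base_url` would presumably need to change as well.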

Supported Providers

Provider | Models | Configuration
DeepSeek | deepseek-chat | base_url: https://api.deepseek.com/v1
OpenAI | gpt-4, gpt-3.5-turbo | Default endpoint
Azure OpenAI | gpt-4 | base_url: https://{resource}.openai.azure.com
Local (Ollama) | llama2, mistral | base_url: http://localhost:11434/v1

Configuration

LLM Configuration

llm_config:
  model: deepseek-chat
  api_key: sk-...
  base_url: https://api.deepseek.com/v1  # Optional
  temperature: 0.1
  max_tokens: 500
  timeout: 30

Generation Configuration

generation_config:
  num_candidates: 3           # SQL/SPARQL candidates to generate
  max_revisions: 2            # Self-revision rounds
  parallel_generation: true   # Concurrent candidate generation
  voting_enabled: true        # Multi-candidate voting

Source Detection Weights

weights:
  rule_based: 0.15      # Layer 1
  llm_based: 0.35       # Layer 2
  schema_based: 0.25    # Layer 3a/3b
  verification: 0.25    # Layer 3c
verification_boost: 0.3  # 30% boost for verified entities
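
To make the weights concrete, here is a minimal sketch of a weighted aggregation that uses the values above. How the skill applies the boost is not stated; this sketch assumes it multiplies a verified source's score by 1.3 and caps the result at 1.0. The `fuse` function and the per-layer scores are illustrative only.

```python
WEIGHTS = {"rule_based": 0.15, "llm_based": 0.35, "schema_based": 0.25, "verification": 0.25}
VERIFICATION_BOOST = 0.3


def fuse(layer_scores: dict[str, dict[str, float]], verified: set[str]) -> dict[str, float]:
    """Combine per-layer scores into one confidence per source.

    layer_scores maps layer name -> {source: score in [0, 1]}.
    Assumption: the boost multiplies a verified source's score by 1.3, capped at 1.0.
    """
    sources = {s for scores in layer_scores.values() for s in scores}
    fused = {}
    for source in sources:
        total = sum(WEIGHTS[layer] * scores.get(source, 0.0)
                    for layer, scores in layer_scores.items())
        if source in verified:
            total = min(1.0, total * (1 + VERIFICATION_BOOST))
        fused[source] = round(total, 4)
    return fused


# Illustrative layer outputs, not measurements from the skill.
fused = fuse(
    {
        "rule_based": {"sql": 1.0, "kg": 0.0},
        "llm_based": {"sql": 0.8, "kg": 0.2},
        "schema_based": {"sql": 0.6, "kg": 0.0},
        "verification": {"sql": 0.0, "kg": 0.0},
    },
    verified={"sql"},
)
```

For SQL the unboosted score is 0.15·1.0 + 0.35·0.8 + 0.25·0.6 = 0.58, and the assumed boost raises it to 0.754.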

Workflows

Complete Query Flow

from src.orchestrator import HeteroMindOrchestrator

orchestrator = HeteroMindOrchestrator({
    "source_detection": {
        "layer2": {"api_key": "sk-...", "model": "gpt-4"},
        "layer3": {"schemas": [schema], "kg_endpoints": [...]},
    },
    "engines": {
        "sql": [{"name": "default", "enabled": True}],
        "sparql": [{"name": "default", "enabled": True}],
        "table_qa": [{"name": "default", "enabled": True}],
    },
})

response = await orchestrator.query("How many employees in Engineering?")
print(f"Answer: {response.answer}")
print(f"Source: {response.sources}")
print(f"Confidence: {response.confidence:.2f}")

Source Detection Only

from src.classifier import SourceDetectorOrchestrator

detector = SourceDetectorOrchestrator({
    "layer2": {"api_key": "sk-...", "model": "gpt-4"},
    "layer3": {"schemas": [schema]},
})

decision = await detector.detect("How many employees?")
print(f"Primary Source: {decision.primary_source.value}")
print(f"Confidence: {decision.confidence:.2f}")
print(f"Execution Plan: {decision.execution_plan}")

Test Results

Engine | Tests | Passed | Accuracy | Avg Confidence | Avg Time
SQL (NL2SQL) | 3 | 3 | 100.0% | 0.60 | 22.5s
SPARQL (NL2SPARQL) | 2 | 2 | 100.0% | 0.20 | 36.3s
TableQA | 3 | 3 | 100.0% | 0.62 | 24.2s
Overall | 8 | 8 | 100.0% | 0.51 | 26.6s

Environment Variables

Required (for LLM-based generation)

Variable | Description | Example
DEEPSEEK_API_KEY | DeepSeek API key | sk-...
OPENAI_API_KEY | OpenAI API key | sk-...

Optional (for specific features)

Variable | Description | Example
MYSQL_CONNECTION_STRING | MySQL database connection | mysql://user:pass@host/db
CUSTOM_KG_ENDPOINT | Custom KG SPARQL endpoint | https://example.com/sparql
WORKSPACE | Base path for table file scanning | /path/to/workspace

Setup

# Copy example env file
cp .env.example .env

# Edit with your credentials
nano .env

# Load environment (auto-export every variable defined in .env; tolerates comments and quoted values)
set -a; . ./.env; set +a
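
Before starting an engine, it can help to fail fast when a required variable is missing. A minimal sketch, assuming both keys listed as required above are needed; `missing_env_vars` is a hypothetical helper, and a deployment that uses a single provider may only need one of them.

```python
import os
from typing import Mapping

REQUIRED = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


# Pass a plain dict to check a hypothetical environment without touching os.environ.
missing = missing_env_vars({"DEEPSEEK_API_KEY": "sk-..."})
```

Calling `missing_env_vars()` with no argument checks the real process environment.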

Installation

cd HeteroMind
pip install -r requirements.txt

Requirements

  • Python 3.10+
  • aiohttp, pandas, openpyxl
  • OpenAI-compatible API key (required for the LLM-based generation stages)

Project Structure

HeteroMind/
├── src/
│   ├── classifier/          # 4-layer source detection
│   │   ├── rule_detector.py      # Layer 1
│   │   ├── llm_detector.py       # Layer 2
│   │   ├── sql_schema_matcher.py # Layer 3a
│   │   ├── kg_entity_linker.py   # Layer 3b
│   │   ├── entity_verifier.py    # Layer 3c
│   │   └── source_fusion.py      # Layer 4
│   ├── engines/             # Query engines
│   │   ├── nl2sql/
│   │   ├── nl2sparql/
│   │   └── table_qa/
│   ├── decomposer/          # Task decomposition
│   ├── fusion/              # Result fusion
│   ├── generator/           # Answer generation
│   └── orchestrator.py      # Main orchestrator
├── config/
│   └── source_detection.yaml
├── tests/
│   └── test_data/
├── comprehensive_tests.py
└── SKILL.md

Examples

SQL: Aggregation with Filter

Query: "How many employees are in the Engineering department?"

Generated SQL:

SELECT COUNT(*) FROM employees e 
JOIN departments d ON e.department_id = d.id 
WHERE d.name = 'Engineering'
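
To check a generated query like this one safely, it can be run against a throwaway in-memory database first. The sketch below uses the standard `sqlite3` module; the two-table schema and the sample rows are assumptions inferred from the query, not the skill's test data.

```python
import sqlite3

# Throwaway in-memory database with an assumed schema matching the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1), (3, 'Linus', 2);
""")

generated_sql = """
    SELECT COUNT(*) FROM employees e
    JOIN departments d ON e.department_id = d.id
    WHERE d.name = 'Engineering'
"""
# COUNT(*) always yields exactly one row with one column.
(count,) = conn.execute(generated_sql).fetchone()
conn.close()
```

With the sample rows above, two employees belong to Engineering, so `count` is 2.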

SPARQL: Entity Relationship

Query: "Who is the founder of Microsoft?"

Generated SPARQL:

SELECT ?founder WHERE {
    <http://dbpedia.org/resource/Microsoft> 
    <http://dbpedia.org/ontology/founder> ?founder
}
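
A query like this is typically sent to the endpoint as a SPARQL Protocol GET request with the text in a `query` parameter. The sketch below only builds the request with the standard library and does not send it; `build_sparql_request` is a hypothetical helper, and the skill's own client (the requirements mention aiohttp) may work differently.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


query = """
SELECT ?founder WHERE {
    <http://dbpedia.org/resource/Microsoft>
    <http://dbpedia.org/ontology/founder> ?founder
}
"""
request = build_sparql_request("https://dbpedia.org/sparql", query)
```

Passing `request` to `urllib.request.urlopen` would execute it against the live endpoint.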

TableQA: Aggregation

Query: "Which quarter had the highest sales in 2024?"

Generated Code:

result = df.groupby('quarter')['sales'].sum().idxmax()

Skill Contract

Skills that use HeteroMind should declare:

heteromind:
  reads: [Database Schema, KG Ontology, Table Files]
  writes: [Generated SQL, SPARQL, Pandas Code]
  requires:
    - LLM API key (for generation stages)
    - Schema metadata (for source detection)
  postconditions:
    - Generated query passes validation
    - Result verified for reasonableness

Integration Patterns

With Agent Memory

Log query execution for audit:

from src.orchestrator import HeteroMindOrchestrator

orchestrator = HeteroMindOrchestrator(config)
response = await orchestrator.query(query)

# Log to agent memory
memory.record({
    "action": "knowledge_query",
    "query": query,
    "source": response.sources,
    "confidence": response.confidence,
    "answer": response.answer,
})

Multi-Source Fusion

For queries requiring multiple sources:

# Query automatically detects hybrid need
response = await orchestrator.query(
    "Show employees who published papers"
)
# Routes to: SQL (employees) + KG (papers) + Fusion
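
The fusion step for a hybrid query like this can be pictured as a join of the two result sets on a shared entity key. This is a minimal sketch under that assumption; `fuse_results` and the sample rows are hypothetical, and the skill's fusion module may match entities differently.

```python
def fuse_results(sql_rows: list[dict], kg_rows: list[dict], key: str) -> list[dict]:
    """Inner-join two result sets on a shared key, merging matching rows."""
    kg_by_key: dict[str, list[dict]] = {}
    for row in kg_rows:
        kg_by_key.setdefault(row[key], []).append(row)
    fused = []
    for row in sql_rows:
        # Rows without a KG match are dropped, as in an inner join.
        for match in kg_by_key.get(row[key], []):
            fused.append({**row, **match})
    return fused


# Illustrative rows only; not output from the skill.
sql_rows = [
    {"name": "Ada", "department": "Engineering"},
    {"name": "Linus", "department": "Sales"},
]
kg_rows = [{"name": "Ada", "paper": "Paper A"}]
fused = fuse_results(sql_rows, kg_rows, key="name")
```

Joining on an exact name match is fragile in practice; a real system would more likely join on a resolved entity identifier.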

References

  • README.md — Full documentation and API reference
  • USAGE.md — Detailed usage guide with multi-LLM examples
  • config/source_detection.yaml — Detection configuration
  • tests/test_data/ — Example schemas and test data

Version: 0.3.0
Last Updated: 2026-04-12
Test Results: 8 of 8 test cases passed (100.0% accuracy)

Usage Guidance
Do not install or supply credentials until the metadata mismatch is resolved. Specific steps to consider before enabling this skill:
  1. Ask the publisher/registry why registry metadata lists no required env vars while SKILL.md requires DEEPSEEK_API_KEY and OPENAI_API_KEY.
  2. If you proceed, provide only least-privilege credentials (read-only DB users, scoped API keys) and explicit table paths rather than wildcard/mounted workspaces.
  3. Keep auto_execute disabled and require confirmation; review and test generated queries in a safe/non-production database.
  4. Be aware that logging/debug options can capture intermediate queries/results; disable verbose logging if data sensitivity is a concern.
  5. Inspect SECURITY.md and source files (especially src/utils/api_security.py and any logging code) for how secrets and outputs are handled.
  6. Consider running the package in an isolated environment first (no production credentials) to verify behavior.
Capability Analysis
Type: OpenClaw Skill
Name: heteromind
Version: 0.3.0
The HeteroMind skill bundle implements a complex QA system with high-risk execution patterns, most notably the use of `exec()` in `src/engines/table_qa/multi_stage_engine.py` to run LLM-generated Python code. It also executes generated SQL and SPARQL queries, which are susceptible to injection attacks if the LLM is manipulated. While the bundle includes defensive utilities in `src/utils/api_security.py` and a detailed `SECURITY.md` acknowledging these risks, the 'sandbox' provided for Python execution is a simple dictionary scope that is easily bypassed. No clear evidence of intentional malice was found, but the architecture presents a significant RCE (Remote Code Execution) attack surface.
Capability Tags
requires-oauth-token
Capability Assessment
Purpose & Capability
The functionality (NL→SQL/NL→SPARQL/TableQA, multi-LLM support) justifies requesting LLM API keys and optional DB connection strings; however the registry metadata claims no required environment variables or credentials while SKILL.md lists required_env_vars (DEEPSEEK_API_KEY, OPENAI_API_KEY) and optional connection strings. This mismatch between metadata and the runtime instructions/code is an incoherence that should be resolved before trusting the package.
Instruction Scope
SKILL.md instructs the agent to use LLM API keys, to connect to SQL/PG/MySQL endpoints and optional custom KG endpoints, and to read explicitly-specified table files. That behavior is in-scope for a heterogeneous QA engine. Two things to flag: (1) default config enables detailed logging (log_layer_outputs, log_verification_details) which may record intermediate query text / schema / results (potentially sensitive), and (2) per-call API key/endpoint overrides permit the skill to be directed to use arbitrary keys/endpoints at runtime. SKILL.md does not instruct reading unrelated system files or exfiltrating data to hidden endpoints.
Install Mechanism
No install spec (instruction-only) is present, and the package includes code and a plain requirements.txt referencing common PyPI packages (openai, pandas, sqlalchemy, rdflib, etc.). There are no downloads from arbitrary URLs or extract/install steps in the provided metadata. Nothing in the install footprint indicates hidden remote installers or unusual persistence.
Credentials
The environment variables referenced in SKILL.md (DEEPSEEK_API_KEY, OPENAI_API_KEY, MYSQL_CONNECTION_STRING, POSTGRES_CONNECTION_STRING, CUSTOM_KG_ENDPOINT, TABLE_PATHS) are plausible for the stated purpose. The concern is the mismatch: registry metadata declared no required env vars while SKILL.md marks two API keys as required and several sensitive optional values. That mismatch can lead to surprise credential prompts. Also, the skill can be configured to query databases and read files — supply only least-privilege credentials and explicit table paths.
Persistence & Privilege
The skill does not request 'always: true' and does not declare modifications to other skills or system-wide configuration. SKILL.md and config default to auto_execute=false and require_confirmation=true for safety, which is appropriate for a tool that runs queries against user data.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install heteromind
  3. After installation, invoke the skill by name or use /heteromind
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.3.0
  • Added required and optional environment variables to SKILL.md for easier setup and integration.
  • Listed DEEPSEEK_API_KEY and OPENAI_API_KEY as required variables for LLM-based features.
  • Introduced optional variables for database and table source configuration (e.g., MYSQL_CONNECTION_STRING, TABLE_PATHS).
  • Added SECURITY.md and RESPONSE_TO_REVIEW.md files for improved documentation and security practices.
v0.2.0
Major update: Multi-stage engine core and comprehensive detection/classification modules added.
  • Introduced core query engines (`nl2sql`, `nl2sparql`, `table_qa`) with multi-stage orchestration and self-revision workflows.
  • Added new source detection modules using a detailed 4-layer approach (rule-based, LLM, schema/entity matching, verification).
  • Implemented support for multi-LLM providers and per-query model override.
  • Enhanced system documentation with architecture, configuration, and usage workflows.
  • Comprehensive test results and detailed project structure included.
v0.1.0
Initial release of heteromind, a unified QA system for heterogeneous knowledge sources.
  • Automatically routes natural language queries to SQL databases, knowledge graphs, or table files
  • Uses a 4-layer detection architecture for smart source selection and verification
  • Supports multi-hop queries and result fusion across data types
  • Handles both Chinese and English queries
  • Demonstrated 100% accuracy in initial benchmark tests
  • Provides NL-to-SQL, NL-to-SPARQL, and table question answering capabilities
Metadata
Slug heteromind
Version 0.3.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is HeteroMind - Unified Knowledge QA?

Unified heterogeneous knowledge QA system. Automatically routes natural language queries to SQL databases, Knowledge Graphs, or table files using 4-layer det... It is an AI Agent Skill for Claude Code / OpenClaw, with 81 downloads so far.

How do I install HeteroMind - Unified Knowledge QA?

Run "/install heteromind" in the OpenClaw or Claude Code chat to install it. LLM-based features additionally need the API keys listed under Environment Variables (DEEPSEEK_API_KEY, OPENAI_API_KEY).

Is HeteroMind - Unified Knowledge QA free?

Yes, HeteroMind - Unified Knowledge QA is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does HeteroMind - Unified Knowledge QA support?

HeteroMind - Unified Knowledge QA is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created HeteroMind - Unified Knowledge QA?

It is built and maintained by Yongrui Chen (@bahuia); the current version is v0.3.0.
