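Description

A skill built on ByteHouse MCP Server that retrieves database table schemas, generates a data asset catalog, and analyzes lineage relationships between tables. Use this Skill when a user needs any of these for a ByteHouse database.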

Usage (SKILL.md)

ByteHouse Data Asset and Lineage Analysis Skill

Name: Byted Bytehouse Data Asset Analyzer
Author: volcengine-skills

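🔵 ByteHouse Brand Identity

"ByteHouse": Volcengine's cloud-native data warehouse, fast, stable, secure, and easy to use

This Skill is built on ByteHouse MCP Server and provides complete data asset inventory and lineage analysis capabilities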

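Description

A skill built on ByteHouse MCP Server that generates data asset catalogs and performs lineage analysis.

Use this Skill when: (1) table schemas and column information need to be retrieved from a database (2) a data asset catalog needs to be generated (3) lineage relationships between tables need to be analyzed (4) the user mentions "data asset" (数据资产), "lineage analysis" (血缘分析), "table schema" (表结构), or "column analysis" (字段分析)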

Prerequisites

Python 3.8+
uv (installed at /root/.local/bin/uv)
ByteHouse MCP Server Skill - this skill depends on the bytehouse-mcp skill for access to ByteHouse

Dependencies

This skill depends on the bytehouse-mcp skill and uses the MCP Server it provides to access ByteHouse.

Make sure the bytehouse-mcp skill is correctly configured and working.

📁 Files

SKILL.md - this file, the main skill document
data_asset_analyzer.py - main program for data asset and lineage analysis
README.md - quick start guide

Configuration

ByteHouse Connection Settings

This skill reuses the configuration of the bytehouse-mcp skill. Make sure the following are already set for the bytehouse-mcp skill:

export BYTEHOUSE_HOST="<ByteHouse-host>"
export BYTEHOUSE_PORT="<ByteHouse-port>"
export BYTEHOUSE_USER="<ByteHouse-user>"
export BYTEHOUSE_PASSWORD="<ByteHouse-password>"
export BYTEHOUSE_SECURE="true"
export BYTEHOUSE_VERIFY="true"
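
Before running the analyzer it can help to confirm that the connection settings are actually present. The following is a minimal sketch, not part of the skill itself: the helper name `missing_bytehouse_vars` is hypothetical, and treating exactly these four variables as mandatory is an assumption based on the export list above.

```python
import os

# Variable names come from the export list above; treating these four as
# mandatory is an assumption made for this sketch.
REQUIRED_VARS = (
    "BYTEHOUSE_HOST",
    "BYTEHOUSE_PORT",
    "BYTEHOUSE_USER",
    "BYTEHOUSE_PASSWORD",
)


def missing_bytehouse_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"BYTEHOUSE_HOST": "bh.example.internal", "BYTEHOUSE_PORT": "19000"}
print(missing_bytehouse_vars(example_env))
# → ['BYTEHOUSE_USER', 'BYTEHOUSE_PASSWORD']
```

Calling `missing_bytehouse_vars()` with no argument checks the current process environment instead of the example mapping.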

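🎯 Features

1. Full Schema Retrieval

Lists all tables in the specified database
Lists all columns of each table
Extracts table engine, comments, and other metadata
Parses CREATE TABLE statements

2. Data Asset Catalog Generation

Table statistics (total tables, total columns)
Engine distribution statistics
Automatic tag generation
Per-table asset details

3. Lineage Analysis

Table relationship detection (Distributed → Local)
Column similarity analysis
Relationship visualization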

🚀 Quick Start

Method 1: Run the data asset and lineage analysis

cd /root/.openclaw/workspace/skills/data-asset-analyzer

# Set the environment variables first (reusing the bytehouse-mcp configuration)
export BYTEHOUSE_HOST="<ByteHouse-host>"
export BYTEHOUSE_PORT="<ByteHouse-port>"
export BYTEHOUSE_USER="<ByteHouse-user>"
export BYTEHOUSE_PASSWORD="<ByteHouse-password>"
export BYTEHOUSE_SECURE="true"
export BYTEHOUSE_VERIFY="true"

# Run the analysis tool
uv run data_asset_analyzer.py

The analysis covers:

Full database schema (all tables and columns)
Data asset catalog (table statistics, engine distribution, automatic tags)
Lineage analysis (table relationships, column similarity)

Output files (saved in the output/ directory):

schema_{database}_{timestamp}.json - full database schema
catalog_{database}_{timestamp}.json - data asset catalog
lineage_{database}_{timestamp}.json - lineage analysis report
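
Because each run writes new timestamped files, a caller usually wants the most recent one. The sketch below picks it by file modification time, since the exact `{timestamp}` format is not documented here. The helper name `latest_output` is hypothetical, and the glob pattern assumes no other database name starts with the same prefix followed by an underscore.

```python
from pathlib import Path


def latest_output(kind, database, output_dir="output"):
    """Return the newest {kind}_{database}_*.json file, or None if absent.

    kind is one of "schema", "catalog", or "lineage". Files are ranked by
    modification time because the timestamp format is an unknown here.
    """
    candidates = Path(output_dir).glob(f"{kind}_{database}_*.json")
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)
```

For example, `latest_output("catalog", "default")` would return the path of the newest catalog file for the `default` database, or `None` when the analyzer has not been run yet.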

💻 Programmatic Use

Using the analyzer module

#!/usr/bin/env python3
# /// script
# dependencies = [
#   "mcp>=1.0.0",
# ]
# ///

import asyncio
import sys
import os

# Add the bytehouse-mcp skill directory to the import path
BYTEHOUSE_MCP_PATH = os.path.join(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
    "bytehouse-mcp"
)
sys.path.insert(0, BYTEHOUSE_MCP_PATH)

from data_asset_analyzer import DataAssetAnalyzer

async def main():
    analyzer = DataAssetAnalyzer()
    await analyzer.connect()

    # Analyze the database
    result = await analyzer.analyze_database("default")

    # result contains:
    # - schema: full database schema
    # - catalog: data asset catalog
    # - lineage: lineage analysis
    # - files: paths of the generated files

asyncio.run(main())

📊 Output Files

1. Schema file (`schema_*.json`)

Contains the full structure of the database:

{
  "database": "default",
  "analyzed_at": "2026-03-12T19:50:00",
  "tables": [
    {
      "name": "conversation_feedback",
      "comment": "",
      "engine": "Distributed",
      "columns": [
        {
          "name": "session_id",
          "type": "String",
          "comment": ""
        }
      ],
      "create_table_query": "CREATE TABLE ..."
    }
  ]
}
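
To illustrate how this layout can be consumed, here is a small sketch that flattens a schema document into one row per column. It relies only on the keys shown in the sample above; the helper name `iter_columns` is hypothetical and not part of the analyzer.

```python
import json

# A trimmed copy of the sample document shown above.
SAMPLE_SCHEMA = json.loads("""
{
  "database": "default",
  "tables": [
    {
      "name": "conversation_feedback",
      "engine": "Distributed",
      "columns": [{"name": "session_id", "type": "String", "comment": ""}]
    }
  ]
}
""")


def iter_columns(schema):
    """Yield (table name, column name, column type) for every column."""
    for table in schema.get("tables", []):
        for column in table.get("columns", []):
            yield table["name"], column["name"], column["type"]


for table_name, column_name, column_type in iter_columns(SAMPLE_SCHEMA):
    print(f"{table_name}.{column_name}: {column_type}")
# prints "conversation_feedback.session_id: String"
```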

2. Data asset catalog (`catalog_*.json`)

Contains statistics about the data assets:

{
  "database": "default",
  "generated_at": "2026-03-12T19:50:00",
  "summary": {
    "total_tables": 8,
    "total_columns": 45,
    "engines": {
      "Distributed": 4,
      "HaMergeTree": 3,
      "MergeTree": 1
    }
  },
  "tables": [
    {
      "name": "conversation_feedback",
      "comment": "",
      "engine": "Distributed",
      "column_count": 10,
      "columns": [...],
      "tags": ["distributed", "user-feedback"]
    }
  ]
}
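
The `summary` block can in principle be derived from the `tables` list of a schema file. The sketch below shows one way to compute the same three fields; it is an illustration under that assumption, not the analyzer's actual code, and the function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(tables):
    """Build a summary dict with the same keys as the sample catalog."""
    engines = Counter(table.get("engine", "") for table in tables)
    return {
        "total_tables": len(tables),
        "total_columns": sum(len(table.get("columns", [])) for table in tables),
        "engines": dict(engines),
    }


# Two made-up tables, used only to exercise the function.
example_tables = [
    {"name": "t1", "engine": "Distributed", "columns": [{"name": "a"}, {"name": "b"}]},
    {"name": "t1_local", "engine": "HaMergeTree", "columns": [{"name": "a"}]},
]
print(summarize(example_tables))
# → {'total_tables': 2, 'total_columns': 3, 'engines': {'Distributed': 1, 'HaMergeTree': 1}}
```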

3. Lineage analysis (`lineage_*.json`)

Contains table relationships and column similarities:

{
  "database": "default",
  "generated_at": "2026-03-12T19:50:00",
  "table_relationships": [
    {
      "source_table": "conversation_feedback",
      "relationships": [
        {
          "type": "distributed_to_local",
          "target_table": "conversation_feedback_local",
          "description": "Distributed table points to Local table"
        }
      ]
    }
  ],
  "column_similarities": [
    {
      "column_name": "session_id",
      "column_type": "String",
      "found_in_tables": [
        "conversation_feedback",
        "conversation_feedback_local"
      ]
    }
  ]
}
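
The two lineage sections can be approximated from schema data alone. The sketch below is one possible reading, with assumptions stated plainly: a Distributed table is linked to a Local table purely by the `_local` name suffix (the real analyzer may instead parse the CREATE TABLE statement), and two columns count as similar when both name and type match. Both function names are hypothetical.

```python
from collections import defaultdict


def distributed_to_local(tables):
    """Link each Distributed table to a same-named table ending in _local."""
    names = {table["name"] for table in tables}
    links = []
    for table in tables:
        target = table["name"] + "_local"  # assumption: suffix-based matching
        if table.get("engine") == "Distributed" and target in names:
            links.append({
                "source_table": table["name"],
                "relationships": [
                    {"type": "distributed_to_local", "target_table": target}
                ],
            })
    return links


def column_similarities(tables):
    """Report (name, type) pairs that occur in more than one table."""
    seen = defaultdict(list)
    for table in tables:
        for column in table.get("columns", []):
            seen[(column["name"], column["type"])].append(table["name"])
    return [
        {"column_name": name, "column_type": ctype, "found_in_tables": found}
        for (name, ctype), found in seen.items()
        if len(found) > 1
    ]


# Two tables mirroring the sample above.
example = [
    {"name": "conversation_feedback", "engine": "Distributed",
     "columns": [{"name": "session_id", "type": "String"}]},
    {"name": "conversation_feedback_local", "engine": "HaMergeTree",
     "columns": [{"name": "session_id", "type": "String"}]},
]
print(distributed_to_local(example))
print(column_similarities(example))
```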

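🏷️ Automatic Tag Generation

The analyzer generates tags automatically from the table name and engine:

Tag	Description
`merge-tree`	Uses the MergeTree engine
`distributed`	Uses the Distributed engine
`high-availability`	Uses HaMergeTree or HaUniqueMergeTree
`log-table`	Table name contains "log"
`user-feedback`	Table name contains "feedback"
`local-table`	Table name ends with "_local"
`test-table`	Table name contains "test"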
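
The rule table translates fairly directly into code. The following sketch is an interpretation rather than the analyzer's implementation: it assumes `merge-tree` applies only to an engine named exactly `MergeTree`, that name matching ignores case, and that engine tags come before name tags. The function name `generate_tags` is hypothetical.

```python
def generate_tags(table_name, engine):
    """Return tags for one table, following the rule table above."""
    name = table_name.lower()  # assumption: name matching ignores case
    tags = []
    # Engine-based tags.
    if engine == "MergeTree":  # assumption: exact engine name only
        tags.append("merge-tree")
    if engine == "Distributed":
        tags.append("distributed")
    if engine in ("HaMergeTree", "HaUniqueMergeTree"):
        tags.append("high-availability")
    # Name-based tags.
    if "log" in name:
        tags.append("log-table")
    if "feedback" in name:
        tags.append("user-feedback")
    if name.endswith("_local"):
        tags.append("local-table")
    if "test" in name:
        tags.append("test-table")
    return tags


print(generate_tags("conversation_feedback", "Distributed"))
# → ['distributed', 'user-feedback']
```

Under these assumptions the result for `conversation_feedback` matches the `tags` value in the sample catalog.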

📚 More Information

For detailed usage instructions, see the bytehouse-mcp skill

Last updated: 2026-03-12

Security Recommendations

This skill appears to implement the advertised ByteHouse analysis, but it has several red flags you should consider before installing or running it:

- Metadata vs runtime mismatch: The skill metadata declares no required env vars, but the documentation and code require ByteHouse credentials (BYTEHOUSE_HOST, BYTEHOUSE_PORT, BYTEHOUSE_USER, BYTEHOUSE_PASSWORD, etc.). Expect to provide those credentials for the tool to work.
- Environment exposure: The script forwards the entire environment to a spawned subprocess. Any other secrets in your environment (AWS keys, GitHub tokens, etc.) could be exposed to that subprocess. Run only in an environment that contains no secrets you don't want shared, or modify the script to pass only the required ByteHouse variables.
- Dynamic code execution from GitHub main: At runtime the tool invokes a local runner (uvx) and asks it to fetch and run code from the GitHub repo 'volcengine/mcp-server' on the main branch (subdirectory). This effectively executes upstream code at runtime unpinned to a release. Prefer a pinned release or vendored dependency and inspect that code before use.
- Inconsistencies to fix: SKILL.md and the script reference different runner binaries ('uv' vs 'uvx'); metadata should declare required env vars; the code should avoid passing the full environment and should pin external references.

Recommendations:

1) Do not run this in a production environment until you review/mitigate the issues above.
2) Inspect the referenced mcp-server code (the exact commit/subdirectory) and consider vendoring/pinning rather than pulling main at runtime.
3) Run in an isolated container or sandbox with only necessary env vars present.
4) Consider patching the script to pass a minimal env (only ByteHouse connection vars) and to use a pinned mcp-server release.
5) Verify and/or configure the dependent bytehouse-mcp skill separately and ensure credentials are stored and scoped appropriately.
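
The advice to pass a minimal environment to the child process can be sketched as follows. This is an illustration of the idea, not a patch for the actual script: the helper name `minimal_env` is hypothetical, and passing `PATH` and `HOME` through is an assumption about what the child needs in order to locate its binaries and cache.

```python
import os

ALLOWED_PREFIX = "BYTEHOUSE_"
# Assumption: the child still needs these to find executables and caches.
PASSTHROUGH = ("PATH", "HOME")


def minimal_env(env=None):
    """Return only ByteHouse settings plus a few basic variables."""
    env = os.environ if env is None else env
    return {
        key: value
        for key, value in env.items()
        if key.startswith(ALLOWED_PREFIX) or key in PASSTHROUGH
    }


# Example: an unrelated secret is dropped from the child's environment.
parent_env = {
    "BYTEHOUSE_HOST": "bh.example.internal",
    "AWS_SECRET_ACCESS_KEY": "not-for-the-child",
    "PATH": "/usr/bin",
}
print(minimal_env(parent_env))
# → {'BYTEHOUSE_HOST': 'bh.example.internal', 'PATH': '/usr/bin'}
```

The returned dict would be handed to whatever launches the child, for example as the `env` argument of `subprocess.run`, in place of a full copy of `os.environ`.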

Functional Analysis

Type: OpenClaw Skill
Name: byted-bytehouse-data-asset-analyzer
Version: 1.0.0

The skill bundle facilitates ByteHouse database analysis by downloading and executing a remote MCP server directly from a GitHub repository using 'uvx' in 'scripts/data_asset_analyzer.py'. While this functionality is aligned with the stated purpose and the repository (volcengine/mcp-server) appears related to the product, the practice of fetching and running remote payloads at runtime constitutes a significant supply chain risk and a high-risk execution capability. No evidence of intentional malice, such as unauthorized data exfiltration or backdoors, was found.

Capability Assessment

ℹ Purpose & Capability

The code and SKILL.md implement ByteHouse schema discovery, catalog generation, and lineage analysis and explicitly depend on a separate bytehouse-mcp skill — that is coherent with the description. However, the package metadata declares no required environment variables or primary credential while the documentation and code clearly require ByteHouse connection env vars (BYTEHOUSE_HOST, BYTEHOUSE_PORT, BYTEHOUSE_USER, BYTEHOUSE_PASSWORD, etc.). This mismatch between declared requirements and actual usage is an incoherence.

⚠ Instruction Scope

SKILL.md instructs running the analyzer and to reuse bytehouse-mcp configuration; the runtime script starts a stdio MCP client by spawning an external command and uses env vars to configure connections. The script passes a complete copy of os.environ into the subprocess it starts, which means any secrets in the agent/process environment (not just the ByteHouse vars) will be exposed to that subprocess. The SKILL.md and script also refer to different helper binaries ('/root/.local/bin/uv' vs '/root/.local/bin/uvx'), an inconsistency that can cause unexpected behavior.

⚠ Install Mechanism

There is no declared install spec, but the runtime code invokes a local binary ('/root/.local/bin/uvx') and supplies a git+https URL referencing the GitHub repo's main branch with a subdirectory. That causes dynamic fetching and execution of external code from GitHub at runtime (using the main branch rather than pinned release), which increases risk because unreviewed upstream changes could be pulled and executed. Using GitHub is more reputable than arbitrary paste sites, but running code from the repo's main branch at runtime is higher risk than a pinned release or packaged dependency.
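
One way to act on the pinning concern is to build the source reference from a full commit SHA instead of a branch name. The sketch below uses the `git+<url>@<ref>#subdirectory=<path>` form understood by pip-style installers. The function name `pinned_git_source` is hypothetical, and the SHA and subdirectory in the usage lines are placeholders, not the real values used by this skill.

```python
import re

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_git_source(repo_url, commit, subdirectory):
    """Build a git+https reference that is pinned to one exact commit."""
    if not _FULL_SHA.fullmatch(commit):
        raise ValueError(f"expected a full 40-character commit SHA, got {commit!r}")
    return f"git+{repo_url}@{commit}#subdirectory={subdirectory}"


# Placeholder values for illustration only.
PLACEHOLDER_SHA = "0" * 40
print(pinned_git_source(
    "https://github.com/volcengine/mcp-server",
    PLACEHOLDER_SHA,
    "path/to/server",  # hypothetical subdirectory
))
```

Rejecting branch names such as `main` up front means an upstream change cannot silently alter what gets executed; the commit has to be reviewed and updated deliberately.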

⚠ Credentials

The tool requires ByteHouse credentials in practice (documented in SKILL.md) but the skill metadata did not declare these required env vars. More importantly, the script forwards a full copy of the process environment to the spawned MCP server client process, which can expose unrelated secrets present in the environment (cloud credentials, tokens, etc.). The number and sensitivity of env values passed is disproportionate unless the environment is tightly controlled.

ℹ Persistence & Privilege

The skill does not request permanent/always-on presence and does not modify other skills or system-wide settings. Autonomous invocation is allowed by default (platform behavior) — combined with the other concerns (dynamic code fetch and broad env exposure) this increases the blast radius if the spawned code is malicious or compromised.

Version History

v1.0.0

byted-bytehouse-data-asset-analyzer 1.0.0: Initial Release

- Provides data asset catalog and lineage analysis for ByteHouse via MCP Server.
- Supports extraction of full table and field schema, asset catalog generation, and table lineage analysis.
- Outputs include: schema, catalog, and lineage JSON files with detailed metadata and statistics.
- Automatic tagging for tables based on naming patterns and engine types.
- Requires the bytehouse-mcp skill for ByteHouse server connectivity.

Metadata

Slug byted-bytehouse-data-asset-analyzer

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Historical versions 1

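FAQ

What is Byted Bytehouse Data Asset Analyzer?

A skill built on ByteHouse MCP Server that retrieves database table schemas, generates a data asset catalog, and analyzes lineage relationships between tables. Use this Skill when a user needs any of these for a ByteHouse database. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 95 cumulative downloads to date.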

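How do I install Byted Bytehouse Data Asset Analyzer?

Run the command "/install byted-bytehouse-data-asset-analyzer" in an OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra configuration; running the analyzer still requires the ByteHouse connection variables and a working bytehouse-mcp skill, as described in the prerequisites.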

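Is Byted Bytehouse Data Asset Analyzer free?

Yes. Byted Bytehouse Data Asset Analyzer is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Byted Bytehouse Data Asset Analyzer support?

Byted Bytehouse Data Asset Analyzer is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Byted Bytehouse Data Asset Analyzer?

It is developed and maintained by volcengine-skills (@volcengine-skills). The current version is v1.0.0.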
