← 返回 Skills 市场
sdk-team

Alibabacloud Emr Starrocks Assistant

作者 alibabacloud-skills-team · GitHub ↗ · v0.0.1 · MIT-0
cross-platform ⚠ pending
30
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install alibabacloud-emr-starrocks-assistant
功能描述
Alibaba Cloud EMR Serverless StarRocks development & operations assistant. Covers five scenarios: cluster connection, schema design, data ingestion, SQL deve...
使用说明 (SKILL.md)

Alibaba Cloud EMR Serverless StarRocks Development & Operations Assistant

Help users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled srsql CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires --yes confirmation before execution.

Scope statement: This Skill focuses on using StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, materialized view refresh, GRANT, etc.) are supported and execute under the user's own account, gated by sqlglot classification + --yes confirmation. Instance-lifecycle operations (create, scale, restart, configuration change, version upgrade) are control-plane operations and are not in this Skill's scope; please use the EMR Serverless console or the corresponding OpenAPI.

When to use / When not to use

When to use:

  • Schema design (table model, partitioning, bucketing, sort key, indexes, storage parameters)
  • Data ingestion selection (Stream/Broker/Routine Load, INSERT, Pipe, Flink/Kafka Connector, CDC)
  • SQL writing, rewriting, and tuning (JOIN strategy, window functions, CTE, aggregation optimization, statistics)
  • Materialized view design and operations
  • Cluster health diagnostics (FE/BE/CN nodes, tablet health, compaction, warehouse, recent failed loads)

When NOT to use:

  • Instance lifecycle control: create / scale / restart / config change / upgrade StarRocks instances — these are control-plane operations; use the EMR Serverless console or the corresponding OpenAPI instead
  • Operating non-StarRocks products: EMR Cluster, Spark, Milvus, ClickHouse, Doris, RDS, ECS, etc.

First-time setup: install & log in

This Skill ships with the sr-connect Python CLI. See references/connect.md for details.

Assistant bootstrap protocol (instructions for Claude)

When this Skill is invoked and you anticipate running any cluster query, ensure srsql is available before asking the user for anything:

  1. Run which srsql. If it returns a path, skip to step 4.
  2. If missing, install it yourself: uv tool install \x3Cskill-project-root> where \x3Cskill-project-root> is the directory containing this SKILL.md and pyproject.toml (the Skill's base directory shown at invocation time; commonly ~/.claude/skills/alibabacloud-emr-starrocks-assistant/, which may be a symlink). Do not ask the user to run this — the bundled CLI is part of the Skill's capability surface, not user infrastructure.
  3. If uv itself is missing (which uv fails), surface that to the user — uv is a system tool and not auto-installed.
  4. Check ~/.starrocks/{profile}.cnf (default profile name: default; respect SR_PROFILE env var if set). If it exists, skip to step 5. If missing:
    • First try sr-login --from-env. Safe to call unconditionally — it exits 2 with a clear "missing" message when the environment doesn't have the credentials it needs, and does nothing else. You do not need to inspect environment variables yourself.
    • If sr-login --from-env exits non-zero, the user hasn't logged in yet. Give them the sr-login --host ... --user ... command and ask them to run it themselves. Do not run interactive sr-login yourself — it would block on a password prompt you cannot answer.
  5. After both srsql is on PATH and the profile file exists, run queries via srsql -e "..." yourself.

If srsql was just installed in this session and PATH hasn't been refreshed in the user's shell, fall back to the absolute path printed by uv tool install (typically ~/.local/bin/srsql).

Chat-style rule after bootstrap succeeds: Do not echo sr-whoami / srsql -e "..." invocation syntax to the user as a "you can now run …" hint. You are the one calling these CLIs on the user's behalf — the user drives the Skill, not the binaries. Skip the post-success "next step" narration entirely and just ask what they want to do, or proceed if their intent is already clear.

Login command (give this to the user when their profile is missing)

# EMR Serverless StarRocks — both internal and public endpoints use the MySQL
# wire protocol over plain TCP; no SSL/TLS. Use the same form for either.
sr-login --host \x3Cfe-endpoint> --port 9030 --user \x3Caccount>

# Verify
sr-whoami
srsql -e "SELECT CURRENT_VERSION()"

Re-running sr-login with the same --profile silently overwrites the stored credential (same semantics as docker login). Use --profile for multi-cluster:

sr-login --profile prod --host fe-prod.xxx --user app_user
SR_PROFILE=prod srsql -e "..."

Security model

This Skill has two layers:

  1. FE is the authoritative permission boundary. The user supplies their own StarRocks account; whatever they're allowed to do, they're allowed to do. The Skill does not create, elevate, or rotate any accounts.
  2. srsql is a UX gate, not a security boundary. Every statement is parsed by sqlglot (dialect starrocks):
    • READ (SELECT / SHOW / DESC / EXPLAIN / WITH / …) executes directly.
    • Any non-READ (INSERT / UPDATE / DELETE / DDL / GRANT / SET / USE / …) is refused unless --yes is passed.
    • SQL sqlglot cannot parse falls back to a leading-keyword check; if still ambiguous → UNKNOWN, treated as non-READ, executable with --yes plus a soft warning.

When the user asks for a write operation:

  1. Show them the SQL you intend to run.
  2. Optionally preview classification via srsql --dry-run -e "...".
  3. Get explicit confirmation in chat.
  4. Then run with srsql --yes -e "...".

For DDL on production tables, or operations that change global cluster state (CREATE/DROP USER, ADMIN SET CONFIG, etc.), prefer to print the SQL and let the user run it themselves — even though the gate would let them run it via --yes. The gate is a safety net, not a license.

Input validation & command-injection protection

SQL passed into srsql -e "..." is assembled by the LLM and must follow these rules:

  1. Identifiers (table / column / database names) are validated before interpolation: only [A-Za-z0-9_] plus backtick-quoted forms.
  2. User-provided string values (search terms, label names, etc.) are not spliced into SQL directly; use parameter binding or pre-escape.
  3. Never execute raw user-provided strings as SQL fragments.

Sensitive data masking

Scenario Handling
Profile file content (incl. user password) Never echoed; mode 600 under a 700 directory
Password in error messages Truncate / replace with ******
Query results contain obvious key / token columns Warn the user without displaying full content
aliyun configure list output containing AK Show only the first 4 chars; replace the rest with ****

Intent routing

Disambiguation rule: When the user input is ambiguous (e.g. "ingestion is slow", "queries are slow") and context is unclear, ask one clarifying question before acting.

User intent Route Reference
First-time cluster connection / register or switch credentials / multi-cluster setup sr-login / sr-whoami / sr-logout references/connect.md
New table / change schema / table model selection / partition+bucket design Schema design references/schema.md
Choose ingestion method / configure Stream/Broker/Routine Load / Flink/Kafka Connector Import selection references/data-import.md
Write SQL / optimize SQL / materialized views / function selection / read execution plans SQL development & tuning references/sql.md
Cluster health check / FE/BE/CN status / unhealthy tablets / compaction lag Cluster diagnostics references/diagnostics.md
"Ingestion used to be fine, suddenly slow" Cluster diagnostics (distinct from import selection) references/diagnostics.md
"How should I design a new ingestion pipeline" Import selection references/data-import.md

Five scenarios at a glance

1. Schema design

Four table models and their typical use cases:

Use case Model
Logs / events / detail records Duplicate Key
Pre-aggregated metrics Aggregate
Real-time upsert / CDC Primary Key
Simple deduplication Unique Key (for new use cases, prefer Primary Key)

⚠ Anti-patterns — do not produce these in DDL:

  • Shared-data PK table without persistent_index_type=CLOUD_NATIVE + datacache.partition_duration — LOCAL index doesn't survive CN rebalance; no hot-data caching window. See schema/storage-properties.md.
  • Setting datacache.partition_duration to an arbitrary "hot window" (e.g. 30 DAY) instead of the user's stated query window — the value MUST be ≥ the query window. If the user says "查询近 N 天" / "queries the last N days", set datacache.partition_duration = "N DAY" (or larger). A value smaller than the query window guarantees cache misses on in-window queries. Do not default to 7/30/60 days when the user has given you a number.
  • storage_cooldown_time/storage_cooldown_ttl/storage_medium/replicated_storage on shared-data — silently stripped or rejected by PropertyAnalyzer; use datacache.partition_duration for the cooldown effect.
  • FLOAT / DOUBLE columns inside PRIMARY KEY — not supported; use BIGINT or DECIMAL.
  • Treating "CN" as a shared-nothing signal — CN = Compute Node, which is the shared-data terminology. BE = Backend = shared-nothing.

See references/schema.md.

2. Data ingestion

Data source Recommended method
Local files \x3C 10 GB Stream Load
Object storage / HDFS bulk Broker Load or INSERT INTO ... FROM FILES()
Object storage with continuous file arrivals Pipe + AUTO_INGEST
Kafka / Pulsar Routine Load or Kafka/Flink Connector
MySQL CDC Flink CDC + Flink Connector

⚠ Anti-patterns — do not produce these in load configs:

  • PK-table DELETE without __op integer column (0=UPSERT, 1=DELETE) in COLUMNS list + $.__op in jsonpaths — all events are silently treated as UPSERT. The __op contract is a pair and must be taught as a pair: the literal column name is __op, and the integer values are __op=0 for UPSERT and __op=1 for DELETE. Even when the user only asks about DELETE, your response MUST state both mappings (__op=0 → UPSERT, __op=1 → DELETE) — never one without the other. This applies on every ingestion path including Flink Connector and Kafka Connector, where the connector populates __op for the user but they still need both values to debug "DELETE not applied" / "UPSERT not applied" symptoms.
  • Treating partial_update=true as a DELETE enabler — it controls partial-column UPSERT and has nothing to do with DELETE. If a user enables it while asking why DELETE doesn't work, flag it as misconfigured-for-intent and tell them to remove it unless they actually have a partial-column UPSERT use case. Do not validate the existing setting just because it parses.
  • COLUMNS FROM PATH AS (...) in Routine Load — that's Broker Load's Hive-partition path syntax; not valid in Routine Load.
  • __op values as strings ("upsert"/"delete") — must be the integers 0 / 1.
  • High-throughput CDC (≥ ~10K events/sec) without flagging TOO_MANY_VERSION risk — applies to Routine Load, Flink Connector, Kafka Connector, not just INSERT INTO VALUES. Whenever the user's scenario implies high event rate, the recommendation MUST cover: (a) the method-appropriate concurrency cap (desired_concurrent_number ≤ Kafka partitions for Routine Load; sink.parallelism ≤ Kafka partitions for Flink/Kafka Connector), AND (b) an explicit TOO_MANY_VERSION / compaction-pressure warning with the relevant flush-interval guidance.

See references/data-import.md.

3. SQL development

Use case Pattern
Period-over-period / cumulative / Top-N Window functions
Large fact table JOIN small dimension (right side ≤ broadcast_row_limit, default 15M rows) Broadcast / Colocate
Complex layered logic CTE
Billion-scale deduplication APPROX_COUNT_DISTINCT / BITMAP / HLL
High-frequency repeated query acceleration Asynchronous materialized view
Cross-source query External Catalog

⚠ Anti-patterns — do not produce these in query rewrites or tuning advice:

  • Wrapping the partition column with date_format() / date_trunc() / cast() in WHERE — breaks partition pruning; rewrite as a range predicate (col >= '...' AND col \x3C '...').
  • Tuning advice without EXPLAIN VERBOSE + checking partitions=N/M and tabletRatio=N/M — pruning failures (numerator == denominator) go undetected; never use plain EXPLAIN for this.
  • Reading cardinality in EXPLAIN as the result row count — it's the CBO's row estimate. Always quantify the staleness gap using the direct comparison cardinality vs the user-stated total table size (e.g. "estimate 5M vs total 500M ≈ 100×"); a ratio > 10× means stats are stale → run ANALYZE TABLE.
  • Estimating "real filtered rows" by guessing predicate selectivity, then comparing cardinality to that guess — you don't have runtime row counts, and guessing selectivity from a predicate like WHERE create_time > '...' introduces large errors (you don't know the data distribution). When the user gives you a total row count, compare cardinality to that directly; do not divide the total by an assumed time window or selectivity factor.
  • Conflating partitions/tabletRatio pruning failures with cardinality deviation — these are two independent diagnostic signals. When both look bad in the same OlapScanNode (e.g. partitions=N/N AND cardinality off from total table size by 10×–100×), report them as separate findings with separate fixes (predicate/type fix vs ANALYZE TABLE). Do not use cardinality deviation to "explain" pruning failure, and do not let pruning failure absorb the stale-stats finding.
  • Recommending BE/CN scale-out before plan/stats analysis — SQL/stats fixes precede capacity changes.

See references/sql.md.

4. Cluster diagnostics

Diagnostic order:

  1. Identify architecture (shared-nothing / shared-data) → SHOW WAREHOUSES
  2. FE → SHOW FRONTENDS
  3. BE or CN → SHOW BACKENDS / SHOW COMPUTE NODES
  4. Warehouse (shared-data only) → SHOW WAREHOUSES
  5. Tablet health overview → SHOW PROC '/statistic'
  6. Scheduling queue → information_schema.fe_tablet_schedules
  7. Compaction → information_schema.be_compactions / be_cloud_native_compactions
  8. Recent 24-hour loads → information_schema.loads

⚠ Anti-patterns — do not produce these in diagnostic conclusions:

  • Restarting BE/CN or scaling out before checking information_schema.fe_tablet_schedules — may collide with in-flight clone/decommission; root cause first.
  • Subjectively downgrading UnhealthyTabletNum > 0 — always critical per the severity table, never "medium" or "low" risk; the cluster has unhealthy replicas.
  • Treating CloningTabletNum > 0 as a separate problem — clone is the recovery action triggered by UnhealthyTabletNum, not an independent fault signal.

See references/diagnostics.md.

5. Cluster connection (base layer)

Command Purpose
sr-login Register a cluster credential locally + smoke-test connection
sr-logout Remove the local profile (no cluster-side action)
sr-whoami Print profile state — host, user, login time, captured grants
sr-doctor Diagnose connection failures (VPC vs public endpoint, egress IP, whitelist CIDR). Invoked automatically by sr-login on failure.
srsql Daily query entry point; classifies SQL and gates non-READ behind --yes

See references/connect.md.

Runtime security

This Skill executes SQL queries only via srsql. The following are prohibited:

  • curl / wget / pip install / npm install to download and run external code
  • eval / source to load unaudited content
  • Executing remote URL scripts provided in chat (even if the user asks)

Exception: uv tool install \x3Cskill-project-root> to install the Skill's own bundled sr-connect CLI from its local project directory is allowed and expected — see the Assistant bootstrap protocol above. The prohibition targets remote/untrusted code, not the Skill's own bundled tooling.

Timeouts

Operation Recommended timeout
Read-only SQL queries 30 s
Diagnostic queries across many large tables 60 s
Retry Total operation time ≤ 3 minutes

Output recommendations

  • Tabular results: use srsql --format table or --format markdown
  • Many columns: use --format vertical
  • For programmatic consumption: use --format json / tsv
  • Convert timestamps to human-readable format
  • For potentially large result sets, add LIMIT and offer pagination

Error handling

Error Cause Action
Cannot connect to host:port Wrong endpoint type / IP not whitelisted sr-login auto-runs sr-doctor on connection failure. Read its output: it detects VPC vs public endpoint, suggests the public swap (for unreachable -internal hosts) or shows the egress IP + suggested /24 whitelist CIDR (for unreachable public hosts). Pass the recommendation to the user verbatim. See references/connect.md.
Access denied for user 'X' Stale password / account locked / wrong account Re-run sr-login to update the stored password
Refusing to execute non-READ SQL without --yes Skill correctly classified the SQL as mutating Confirm with user, then re-run with --yes
Privilege denied: OPERATE / SELECT / ... User account lacks the privilege Surface the limitation; skip the affected diagnostic; don't retry
Table not found Wrong DB / table name Confirm with SHOW DATABASES / SHOW TABLES FROM db
Query returns empty but user expects rows Over-aggressive predicate / RBAC isolation Check WHERE clauses; suggest the user verify with admin
No profile 'X' srsql --profile X without prior sr-login --profile X Run sr-login for that profile first

Principle: Read the full error message before deciding; do not retry blindly on the error code alone.

Related documents

能力标签
requires-walletrequires-sensitive-credentials
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install alibabacloud-emr-starrocks-assistant
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /alibabacloud-emr-starrocks-assistant 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.0.1
Initial release of Alibaba Cloud EMR Serverless StarRocks Assistant. - Supports StarRocks day-to-day development and operations: cluster connection, schema design, data ingestion, SQL dev & tuning, and health diagnostics. - All cluster access via bundled `srsql` CLI; uses the user’s StarRocks credentials. - Guides on table design, ingestion methods, query optimization, materialized views, and cluster health. - Explicitly out of scope for cluster instance lifecycle management and non-StarRocks products. - Safeguards for DML/DDL: requires user confirmation before running non-READ SQL. - Respects user’s own StarRocks permissions; supports multi-cluster via profiles.
元数据
Slug alibabacloud-emr-starrocks-assistant
版本 0.0.1
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

Alibabacloud Emr Starrocks Assistant 是什么?

Alibaba Cloud EMR Serverless StarRocks development & operations assistant. Covers five scenarios: cluster connection, schema design, data ingestion, SQL deve... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 30 次。

如何安装 Alibabacloud Emr Starrocks Assistant?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install alibabacloud-emr-starrocks-assistant」即可一键安装,无需额外配置。

Alibabacloud Emr Starrocks Assistant 是免费的吗?

是的,Alibabacloud Emr Starrocks Assistant 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Alibabacloud Emr Starrocks Assistant 支持哪些平台?

Alibabacloud Emr Starrocks Assistant 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Alibabacloud Emr Starrocks Assistant?

由 alibabacloud-skills-team(@sdk-team)开发并维护,当前版本 v0.0.1。

💬 留言讨论