Alibabacloud Emr Starrocks Assistant
/install alibabacloud-emr-starrocks-assistant
Alibaba Cloud EMR Serverless StarRocks Development & Operations Assistant
Help users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled srsql CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires --yes confirmation before execution.
Scope statement: This Skill focuses on using StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, materialized view refresh, GRANT, etc.) are supported and execute under the user's own account, gated by sqlglot classification +
--yesconfirmation. Instance-lifecycle operations (create, scale, restart, configuration change, version upgrade) are control-plane operations and are not in this Skill's scope; please use the EMR Serverless console or the corresponding OpenAPI.
When to use / When not to use
When to use:
- Schema design (table model, partitioning, bucketing, sort key, indexes, storage parameters)
- Data ingestion selection (Stream/Broker/Routine Load, INSERT, Pipe, Flink/Kafka Connector, CDC)
- SQL writing, rewriting, and tuning (JOIN strategy, window functions, CTE, aggregation optimization, statistics)
- Materialized view design and operations
- Cluster health diagnostics (FE/BE/CN nodes, tablet health, compaction, warehouse, recent failed loads)
When NOT to use:
- Instance lifecycle control: create / scale / restart / config change / upgrade StarRocks instances — these are control-plane operations; use the EMR Serverless console or the corresponding OpenAPI instead
- Operating non-StarRocks products: EMR Cluster, Spark, Milvus, ClickHouse, Doris, RDS, ECS, etc.
First-time setup: install & log in
This Skill ships with the sr-connect Python CLI. See references/connect.md for details.
Assistant bootstrap protocol (instructions for Claude)
When this Skill is invoked and you anticipate running any cluster query, ensure srsql is available before asking the user for anything:
- Run
which srsql. If it returns a path, skip to step 4. - If missing, install it yourself:
uv tool install \x3Cskill-project-root>where\x3Cskill-project-root>is the directory containing thisSKILL.mdandpyproject.toml(the Skill's base directory shown at invocation time; commonly~/.claude/skills/alibabacloud-emr-starrocks-assistant/, which may be a symlink). Do not ask the user to run this — the bundled CLI is part of the Skill's capability surface, not user infrastructure. - If
uvitself is missing (which uvfails), surface that to the user —uvis a system tool and not auto-installed. - Check
~/.starrocks/{profile}.cnf(default profile name:default; respectSR_PROFILEenv var if set). If it exists, skip to step 5. If missing:- First try
sr-login --from-env. Safe to call unconditionally — it exits 2 with a clear "missing" message when the environment doesn't have the credentials it needs, and does nothing else. You do not need to inspect environment variables yourself. - If
sr-login --from-envexits non-zero, the user hasn't logged in yet. Give them thesr-login --host ... --user ...command and ask them to run it themselves. Do not run interactivesr-loginyourself — it would block on a password prompt you cannot answer.
- First try
- After both
srsqlis on PATH and the profile file exists, run queries viasrsql -e "..."yourself.
If srsql was just installed in this session and PATH hasn't been refreshed in the user's shell, fall back to the absolute path printed by uv tool install (typically ~/.local/bin/srsql).
Chat-style rule after bootstrap succeeds: Do not echo sr-whoami / srsql -e "..." invocation syntax to the user as a "you can now run …" hint. You are the one calling these CLIs on the user's behalf — the user drives the Skill, not the binaries. Skip the post-success "next step" narration entirely and just ask what they want to do, or proceed if their intent is already clear.
Login command (give this to the user when their profile is missing)
# EMR Serverless StarRocks — both internal and public endpoints use the MySQL
# wire protocol over plain TCP; no SSL/TLS. Use the same form for either.
sr-login --host \x3Cfe-endpoint> --port 9030 --user \x3Caccount>
# Verify
sr-whoami
srsql -e "SELECT CURRENT_VERSION()"
Re-running sr-login with the same --profile silently overwrites the stored credential (same semantics as docker login). Use --profile for multi-cluster:
sr-login --profile prod --host fe-prod.xxx --user app_user
SR_PROFILE=prod srsql -e "..."
Security model
This Skill has two layers:
- FE is the authoritative permission boundary. The user supplies their own StarRocks account; whatever they're allowed to do, they're allowed to do. The Skill does not create, elevate, or rotate any accounts.
srsqlis a UX gate, not a security boundary. Every statement is parsed by sqlglot (dialectstarrocks):READ(SELECT / SHOW / DESC / EXPLAIN / WITH / …) executes directly.- Any non-READ (INSERT / UPDATE / DELETE / DDL / GRANT / SET / USE / …) is refused unless
--yesis passed. - SQL sqlglot cannot parse falls back to a leading-keyword check; if still ambiguous →
UNKNOWN, treated as non-READ, executable with--yesplus a soft warning.
When the user asks for a write operation:
- Show them the SQL you intend to run.
- Optionally preview classification via
srsql --dry-run -e "...". - Get explicit confirmation in chat.
- Then run with
srsql --yes -e "...".
For DDL on production tables, or operations that change global cluster state (CREATE/DROP USER, ADMIN SET CONFIG, etc.), prefer to print the SQL and let the user run it themselves — even though the gate would let them run it via --yes. The gate is a safety net, not a license.
Input validation & command-injection protection
SQL passed into srsql -e "..." is assembled by the LLM and must follow these rules:
- Identifiers (table / column / database names) are validated before interpolation: only
[A-Za-z0-9_]plus backtick-quoted forms. - User-provided string values (search terms, label names, etc.) are not spliced into SQL directly; use parameter binding or pre-escape.
- Never execute raw user-provided strings as SQL fragments.
Sensitive data masking
| Scenario | Handling |
|---|---|
| Profile file content (incl. user password) | Never echoed; mode 600 under a 700 directory |
| Password in error messages | Truncate / replace with ****** |
| Query results contain obvious key / token columns | Warn the user without displaying full content |
aliyun configure list output containing AK |
Show only the first 4 chars; replace the rest with **** |
Intent routing
Disambiguation rule: When the user input is ambiguous (e.g. "ingestion is slow", "queries are slow") and context is unclear, ask one clarifying question before acting.
| User intent | Route | Reference |
|---|---|---|
| First-time cluster connection / register or switch credentials / multi-cluster setup | sr-login / sr-whoami / sr-logout | references/connect.md |
| New table / change schema / table model selection / partition+bucket design | Schema design | references/schema.md |
| Choose ingestion method / configure Stream/Broker/Routine Load / Flink/Kafka Connector | Import selection | references/data-import.md |
| Write SQL / optimize SQL / materialized views / function selection / read execution plans | SQL development & tuning | references/sql.md |
| Cluster health check / FE/BE/CN status / unhealthy tablets / compaction lag | Cluster diagnostics | references/diagnostics.md |
| "Ingestion used to be fine, suddenly slow" | Cluster diagnostics (distinct from import selection) | references/diagnostics.md |
| "How should I design a new ingestion pipeline" | Import selection | references/data-import.md |
Five scenarios at a glance
1. Schema design
Four table models and their typical use cases:
| Use case | Model |
|---|---|
| Logs / events / detail records | Duplicate Key |
| Pre-aggregated metrics | Aggregate |
| Real-time upsert / CDC | Primary Key |
| Simple deduplication | Unique Key (for new use cases, prefer Primary Key) |
⚠ Anti-patterns — do not produce these in DDL:
- Shared-data PK table without
persistent_index_type=CLOUD_NATIVE+datacache.partition_duration— LOCAL index doesn't survive CN rebalance; no hot-data caching window. See schema/storage-properties.md. - Setting
datacache.partition_durationto an arbitrary "hot window" (e.g.30 DAY) instead of the user's stated query window — the value MUST be ≥ the query window. If the user says "查询近 N 天" / "queries the last N days", setdatacache.partition_duration = "N DAY"(or larger). A value smaller than the query window guarantees cache misses on in-window queries. Do not default to 7/30/60 days when the user has given you a number. storage_cooldown_time/storage_cooldown_ttl/storage_medium/replicated_storageon shared-data — silently stripped or rejected byPropertyAnalyzer; usedatacache.partition_durationfor the cooldown effect.- FLOAT / DOUBLE columns inside
PRIMARY KEY— not supported; use BIGINT or DECIMAL. - Treating "CN" as a shared-nothing signal — CN = Compute Node, which is the shared-data terminology. BE = Backend = shared-nothing.
See references/schema.md.
2. Data ingestion
| Data source | Recommended method |
|---|---|
| Local files \x3C 10 GB | Stream Load |
| Object storage / HDFS bulk | Broker Load or INSERT INTO ... FROM FILES() |
| Object storage with continuous file arrivals | Pipe + AUTO_INGEST |
| Kafka / Pulsar | Routine Load or Kafka/Flink Connector |
| MySQL CDC | Flink CDC + Flink Connector |
⚠ Anti-patterns — do not produce these in load configs:
- PK-table DELETE without
__opinteger column (0=UPSERT,1=DELETE) in COLUMNS list +$.__opinjsonpaths— all events are silently treated as UPSERT. The__opcontract is a pair and must be taught as a pair: the literal column name is__op, and the integer values are__op=0for UPSERT and__op=1for DELETE. Even when the user only asks about DELETE, your response MUST state both mappings (__op=0→ UPSERT,__op=1→ DELETE) — never one without the other. This applies on every ingestion path including Flink Connector and Kafka Connector, where the connector populates__opfor the user but they still need both values to debug "DELETE not applied" / "UPSERT not applied" symptoms. - Treating
partial_update=trueas a DELETE enabler — it controls partial-column UPSERT and has nothing to do with DELETE. If a user enables it while asking why DELETE doesn't work, flag it as misconfigured-for-intent and tell them to remove it unless they actually have a partial-column UPSERT use case. Do not validate the existing setting just because it parses. COLUMNS FROM PATH AS (...)in Routine Load — that's Broker Load's Hive-partition path syntax; not valid in Routine Load.__opvalues as strings ("upsert"/"delete") — must be the integers0/1.- High-throughput CDC (≥ ~10K events/sec) without flagging TOO_MANY_VERSION risk — applies to Routine Load, Flink Connector, Kafka Connector, not just
INSERT INTO VALUES. Whenever the user's scenario implies high event rate, the recommendation MUST cover: (a) the method-appropriate concurrency cap (desired_concurrent_number≤ Kafka partitions for Routine Load;sink.parallelism≤ Kafka partitions for Flink/Kafka Connector), AND (b) an explicit TOO_MANY_VERSION / compaction-pressure warning with the relevant flush-interval guidance.
See references/data-import.md.
3. SQL development
| Use case | Pattern |
|---|---|
| Period-over-period / cumulative / Top-N | Window functions |
Large fact table JOIN small dimension (right side ≤ broadcast_row_limit, default 15M rows) |
Broadcast / Colocate |
| Complex layered logic | CTE |
| Billion-scale deduplication | APPROX_COUNT_DISTINCT / BITMAP / HLL |
| High-frequency repeated query acceleration | Asynchronous materialized view |
| Cross-source query | External Catalog |
⚠ Anti-patterns — do not produce these in query rewrites or tuning advice:
- Wrapping the partition column with
date_format()/date_trunc()/cast()in WHERE — breaks partition pruning; rewrite as a range predicate (col >= '...' AND col \x3C '...'). - Tuning advice without
EXPLAIN VERBOSE+ checkingpartitions=N/MandtabletRatio=N/M— pruning failures (numerator == denominator) go undetected; never use plainEXPLAINfor this. - Reading
cardinalityin EXPLAIN as the result row count — it's the CBO's row estimate. Always quantify the staleness gap using the direct comparisoncardinalityvs the user-stated total table size (e.g. "estimate 5M vs total 500M ≈ 100×"); a ratio > 10× means stats are stale → runANALYZE TABLE. - Estimating "real filtered rows" by guessing predicate selectivity, then comparing cardinality to that guess — you don't have runtime row counts, and guessing selectivity from a predicate like
WHERE create_time > '...'introduces large errors (you don't know the data distribution). When the user gives you a total row count, comparecardinalityto that directly; do not divide the total by an assumed time window or selectivity factor. - Conflating
partitions/tabletRatiopruning failures withcardinalitydeviation — these are two independent diagnostic signals. When both look bad in the same OlapScanNode (e.g.partitions=N/NANDcardinalityoff from total table size by 10×–100×), report them as separate findings with separate fixes (predicate/type fix vsANALYZE TABLE). Do not use cardinality deviation to "explain" pruning failure, and do not let pruning failure absorb the stale-stats finding. - Recommending BE/CN scale-out before plan/stats analysis — SQL/stats fixes precede capacity changes.
See references/sql.md.
4. Cluster diagnostics
Diagnostic order:
- Identify architecture (shared-nothing / shared-data) →
SHOW WAREHOUSES - FE →
SHOW FRONTENDS - BE or CN →
SHOW BACKENDS/SHOW COMPUTE NODES - Warehouse (shared-data only) →
SHOW WAREHOUSES - Tablet health overview →
SHOW PROC '/statistic' - Scheduling queue →
information_schema.fe_tablet_schedules - Compaction →
information_schema.be_compactions/be_cloud_native_compactions - Recent 24-hour loads →
information_schema.loads
⚠ Anti-patterns — do not produce these in diagnostic conclusions:
- Restarting BE/CN or scaling out before checking
information_schema.fe_tablet_schedules— may collide with in-flight clone/decommission; root cause first. - Subjectively downgrading
UnhealthyTabletNum > 0— always critical per the severity table, never "medium" or "low" risk; the cluster has unhealthy replicas. - Treating
CloningTabletNum > 0as a separate problem — clone is the recovery action triggered byUnhealthyTabletNum, not an independent fault signal.
See references/diagnostics.md.
5. Cluster connection (base layer)
| Command | Purpose |
|---|---|
sr-login |
Register a cluster credential locally + smoke-test connection |
sr-logout |
Remove the local profile (no cluster-side action) |
sr-whoami |
Print profile state — host, user, login time, captured grants |
sr-doctor |
Diagnose connection failures (VPC vs public endpoint, egress IP, whitelist CIDR). Invoked automatically by sr-login on failure. |
srsql |
Daily query entry point; classifies SQL and gates non-READ behind --yes |
Runtime security
This Skill executes SQL queries only via srsql. The following are prohibited:
curl/wget/pip install/npm installto download and run external codeeval/sourceto load unaudited content- Executing remote URL scripts provided in chat (even if the user asks)
Exception: uv tool install \x3Cskill-project-root> to install the Skill's own bundled sr-connect CLI from its local project directory is allowed and expected — see the Assistant bootstrap protocol above. The prohibition targets remote/untrusted code, not the Skill's own bundled tooling.
Timeouts
| Operation | Recommended timeout |
|---|---|
| Read-only SQL queries | 30 s |
| Diagnostic queries across many large tables | 60 s |
| Retry | Total operation time ≤ 3 minutes |
Output recommendations
- Tabular results: use
srsql --format tableor--format markdown - Many columns: use
--format vertical - For programmatic consumption: use
--format json/tsv - Convert timestamps to human-readable format
- For potentially large result sets, add
LIMITand offer pagination
Error handling
| Error | Cause | Action |
|---|---|---|
Cannot connect to host:port |
Wrong endpoint type / IP not whitelisted | sr-login auto-runs sr-doctor on connection failure. Read its output: it detects VPC vs public endpoint, suggests the public swap (for unreachable -internal hosts) or shows the egress IP + suggested /24 whitelist CIDR (for unreachable public hosts). Pass the recommendation to the user verbatim. See references/connect.md. |
Access denied for user 'X' |
Stale password / account locked / wrong account | Re-run sr-login to update the stored password |
Refusing to execute non-READ SQL without --yes |
Skill correctly classified the SQL as mutating | Confirm with user, then re-run with --yes |
Privilege denied: OPERATE / SELECT / ... |
User account lacks the privilege | Surface the limitation; skip the affected diagnostic; don't retry |
Table not found |
Wrong DB / table name | Confirm with SHOW DATABASES / SHOW TABLES FROM db |
| Query returns empty but user expects rows | Over-aggressive predicate / RBAC isolation | Check WHERE clauses; suggest the user verify with admin |
No profile 'X' |
srsql --profile X without prior sr-login --profile X |
Run sr-login for that profile first |
Principle: Read the full error message before deciding; do not retry blindly on the error code alone.
Related documents
- references/connect.md — sr-connect CLI, install, security model, troubleshooting
- references/ram-policies.md — RAM permission declaration (none required; StarRocks-internal auth only)
- references/schema.md — schema design flow: table models, partitioning, bucketing, sort key, indexes, storage parameters
- references/data-import.md — ingestion method selection, parameters, performance tuning, Primary Key updates
- references/sql.md — query writing, window functions, materialized views, functions, SQL tuning, advanced features
- references/diagnostics.md — cluster health inspection flow, severity classification, synthesis template
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install alibabacloud-emr-starrocks-assistant - 安装完成后,直接呼叫该 Skill 的名称或使用
/alibabacloud-emr-starrocks-assistant触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Alibabacloud Emr Starrocks Assistant 是什么?
Alibaba Cloud EMR Serverless StarRocks development & operations assistant. Covers five scenarios: cluster connection, schema design, data ingestion, SQL deve... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 30 次。
如何安装 Alibabacloud Emr Starrocks Assistant?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install alibabacloud-emr-starrocks-assistant」即可一键安装,无需额外配置。
Alibabacloud Emr Starrocks Assistant 是免费的吗?
是的,Alibabacloud Emr Starrocks Assistant 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Alibabacloud Emr Starrocks Assistant 支持哪些平台?
Alibabacloud Emr Starrocks Assistant 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Alibabacloud Emr Starrocks Assistant?
由 alibabacloud-skills-team(@sdk-team)开发并维护,当前版本 v0.0.1。