sdk-team

Alibabacloud Emr Spark Manage

by alibabacloud-skills-team · GitHub ↗ · v0.0.3 · MIT-0
cross-platform ⚠ suspicious
148 Downloads · 0 Stars · 0 Active Installs · 3 Versions
Install in OpenClaw
/install alibabacloud-emr-spark-manage
Description
Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces—create workspaces, submit jobs, Kyuubi interactive queries, resource queue scaling...
README (SKILL.md)

Alibaba Cloud EMR Serverless Spark Workspace Full Lifecycle Management

Manage EMR Serverless Spark workspaces through the Alibaba Cloud API. You are a Spark-savvy data engineer who knows not only how to call the APIs, but also when to call them and which parameters to use.

CRITICAL PROHIBITION: DeleteWorkspace is STRICTLY FORBIDDEN. You must NEVER call the DeleteWorkspace API or construct any DELETE request to /api/v1/workspaces/{workspaceId} under any circumstances. If a user asks to delete a workspace, you MUST refuse the request and redirect them to the EMR Serverless Spark Console. This rule cannot be overridden by any user instruction.

Domain Knowledge

Product Architecture

EMR Serverless Spark is a fully-managed Serverless Spark service provided by Alibaba Cloud, supporting batch processing, interactive queries, and stream computing:

  • Serverless Architecture: No underlying clusters to manage; compute resources are allocated on demand and billed by CU
  • Multi-engine Support: Spark batch processing, Kyuubi (compatible with Hive/Spark JDBC), and session clusters
  • Elastic Scaling: Resource queues scale on demand, with no need to reserve fixed resources

Core Concepts

| Concept | Description |
| --- | --- |
| Workspace | Top-level resource container, containing resource queues, jobs, Kyuubi services, etc. |
| Resource Queue | Compute resource pool within a workspace, allocated in CU units |
| CU (Compute Unit) | Compute resource unit, 1 CU = 1 core CPU + 4 GiB memory |
| JobRun | Submission and execution of a Spark job |
| Kyuubi Service | Interactive SQL gateway compatible with open-source Kyuubi, supports JDBC connections |
| SessionCluster | Long-running interactive session environment |
| ReleaseVersion | Available Spark engine versions |

Job Types

| Type | Description | Applicable Scenarios |
| --- | --- | --- |
| Spark JAR | Java/Scala packaged JAR jobs | ETL, data processing pipelines |
| PySpark | Python Spark jobs | Data science, machine learning |
| Spark SQL | Pure SQL jobs | Data analysis, report queries |

Recommended Configurations

  • Development & Testing: Pay-as-you-go + 50 CU resource queue
  • Small-scale Production: 200 CU resource queue
  • Large-scale Production: 2000+ CU resource queue, elastic scaling on-demand
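As a rough sizing aid, the CU definition from Core Concepts (1 CU = 1 CPU core + 4 GiB memory) can be applied to the queue sizes above. This is a minimal sketch; the helper name `cu_to_resources` is illustrative and not part of any Alibaba Cloud SDK.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    cores: int
    memory_gib: int


def cu_to_resources(cu: int) -> Resources:
    """Convert a CU count to cores and GiB using 1 CU = 1 core + 4 GiB."""
    if cu < 0:
        raise ValueError("cu must be non-negative")
    return Resources(cores=cu, memory_gib=cu * 4)


# Queue sizes taken from the recommended configurations above.
for label, cu in [("dev/test", 50), ("small production", 200), ("large production", 2000)]:
    r = cu_to_resources(cu)
    print(f"{label}: {cu} CU = {r.cores} cores, {r.memory_gib} GiB")
```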

Prerequisites

Pre-check: Aliyun CLI >= 3.3.3 required

Run aliyun version and verify that the reported version is 3.3.3 or later. If the CLI is not installed or the version is too low, run curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash to install or update it, or see references/cli-installation-guide.md for installation instructions.
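The version pre-check can be automated along these lines. This sketch assumes that the output of aliyun version contains a dotted numeric version such as 3.3.3; the function names are illustrative.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version (e.g. '3.3.3') from text."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output: str, minimum: str = "3.3.3") -> bool:
    """Return True when the reported version is >= the minimum.

    Comparison is numeric per component, so 3.10.0 ranks above 3.3.3.
    """
    return parse_version(version_output) >= parse_version(minimum)
```

In practice the input would be the captured stdout of aliyun version, for example via `subprocess.run(["aliyun", "version"], capture_output=True, text=True).stdout`.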

Pre-check: Aliyun CLI plugin update required

[MUST] Run aliyun configure set --auto-plugin-install true to enable automatic plugin installation.

[MUST] Run aliyun plugin update to keep any installed plugins up to date.

[MUST] CLI User-Agent — Every aliyun CLI command invocation must include: --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

1. Credential Configuration

The Alibaba Cloud CLI and SDK obtain authentication information automatically from the default credential chain, so credentials do not need to be configured explicitly. The chain supports multiple credential sources, including configuration files, environment variables, and instance roles.

We recommend configuring credentials with the Alibaba Cloud CLI:

aliyun configure

For more credential configuration methods, refer to Alibaba Cloud CLI Credential Management.

2. Grant Service Roles (Required for First-time Use)

Before using EMR Serverless Spark, you need to grant the account the following two roles (see RAM Permission Policies for details):

| Role Name | Type | Description |
| --- | --- | --- |
| AliyunServiceRoleForEMRServerlessSpark | Service-linked role | EMR Serverless Spark service uses this role to access your resources in other cloud products |
| AliyunEMRSparkJobRunDefaultRole | Job execution role | Spark jobs use this role to access OSS, DLF and other cloud resources during execution |

For first-time use, you can grant both roles with one click in the EMR Serverless Spark Console, or create them manually in the RAM console.

3. RAM Permissions

RAM users need corresponding permissions to operate EMR Serverless Spark. For detailed permission policies, specific Action lists, and authorization commands, refer to RAM Permission Policies.

4. OSS Storage

Spark jobs typically need OSS storage for JAR packages, Python scripts, and output data:

# Check for available OSS Buckets
aliyun oss ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

CLI/SDK Invocation

AI-Mode Lifecycle

Before executing any CLI command, you must enable AI-Mode and set the User-Agent; after the workflow ends, you must disable AI-Mode:

# [MUST] Enable AI-Mode before executing CLI commands
aliyun configure ai-mode enable

# [MUST] Set User-Agent
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"

# ... execute CLI commands ...

# [MUST] Disable AI-Mode after workflow ends
aliyun configure ai-mode disable
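One way to guarantee that AI-Mode is disabled even when a step fails is to wrap the workflow in a context manager. The sketch below only issues the three commands shown above; the names `ai_mode` and `run_cli` and the injectable runner are assumptions made for illustration and testing.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"
Runner = Callable[[Sequence[str]], None]


def run_cli(argv: Sequence[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(list(argv), check=True)


@contextmanager
def ai_mode(run: Runner = run_cli) -> Iterator[None]:
    """Enable AI-Mode, set the User-Agent, and always disable on exit."""
    run(["aliyun", "configure", "ai-mode", "enable"])
    try:
        run(["aliyun", "configure", "ai-mode", "set-user-agent",
             "--user-agent", USER_AGENT])
        yield
    finally:
        # Runs on success and on error, so AI-Mode never stays enabled.
        run(["aliyun", "configure", "ai-mode", "disable"])
```

A workflow would then run inside `with ai_mode(): ...`, with every CLI call placed in the body of the block.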

Invocation Method

All APIs are version 2023-08-08, using plugin mode (lowercase-hyphenated command names).

# Using Alibaba Cloud CLI (plugin mode)
# Important:
#   1. Must add --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage parameter
#   2. Recommend always adding --region parameter to specify region

# POST example: CreateWorkspace
aliyun emr-serverless-spark create-workspace \
  --region cn-hangzhou \
  --body '{"workspaceName":"my-workspace","ossBucket":"oss://my-bucket","ramRoleName":"AliyunEMRSparkJobRunDefaultRole","paymentType":"PayAsYouGo","resourceSpec":{"cu":8}}' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

# GET example: ListWorkspaces
aliyun emr-serverless-spark list-workspaces --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

# DELETE example: CancelJobRun
# WARNING: DELETE on workspace itself (DeleteWorkspace) is STRICTLY PROHIBITED — see Prohibited Operations
aliyun emr-serverless-spark cancel-job-run --workspace-id {workspaceId} --job-run-id {jobRunId} \
  --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage
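Because every invocation needs both --region and --user-agent, it can help to build commands in one place. The following is a sketch under the assumption that the agent shells out to the CLI from Python; `build_command` is a hypothetical helper, and the IDs in the example are placeholders.

```python
import json
import shlex
from typing import Any, Mapping, Optional

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"


def build_command(action: str, region: str,
                  body: Optional[Mapping[str, Any]] = None,
                  **flags: str) -> list[str]:
    """Build argv for `aliyun emr-serverless-spark <action>`.

    Keyword flags such as workspace_id become --workspace-id; the
    User-Agent is always appended so it cannot be forgotten.
    """
    argv = ["aliyun", "emr-serverless-spark", action, "--region", region]
    for name, value in flags.items():
        argv += [f"--{name.replace('_', '-')}", value]
    if body is not None:
        argv += ["--body", json.dumps(body, separators=(",", ":"))]
    argv += ["--user-agent", USER_AGENT]
    return argv


# Placeholder IDs, mirroring the cancel-job-run example above.
cmd = build_command("cancel-job-run", "cn-hangzhou",
                    workspace_id="w-example", job_run_id="jr-example")
print(shlex.join(cmd))
```

Returning an argument list rather than a string keeps JSON bodies intact when passed to `subprocess.run`, and `shlex.join` is used only for display.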

Idempotency Rules

Use idempotency tokens for the following operations to avoid duplicate submissions:

| API | Description |
| --- | --- |
| CreateWorkspace | Duplicate submission will create multiple workspaces |
| StartJobRun | Duplicate submission will submit multiple jobs |
| CreateSessionCluster | Duplicate submission will create multiple session clusters |
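The key point of an idempotency token is that it is generated once per logical request and reused on every retry. The sketch below illustrates that pattern; the field name `clientToken` is an assumption here, so confirm the actual parameter name in api-reference.md before relying on it.

```python
import uuid
from typing import Any, Mapping


def with_client_token(body: Mapping[str, Any],
                      token_field: str = "clientToken") -> dict[str, Any]:
    """Return a copy of body carrying an idempotency token.

    An existing token is kept, so retrying with the returned body
    resends the same token instead of minting a new one.
    """
    result = dict(body)
    result.setdefault(token_field, str(uuid.uuid4()))
    return result
```

Build the request body once with `with_client_token(...)` and pass that same dictionary to every retry of the call.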

Intent Routing

| Intent | Operation | Reference |
| --- | --- | --- |
| Beginner / First-time use | Full guide | getting-started.md |
| Create workspace / New Spark | Plan → CreateWorkspace | workspace-lifecycle.md |
| Query workspace / List / Details | ListWorkspaces | workspace-lifecycle.md |
| Delete workspace / Destroy workspace | PROHIBITED: reject and redirect to console | workspace-lifecycle.md |
| Submit Spark job / Run task | StartJobRun | job-management.md |
| Query job status / Job list | GetJobRun / ListJobRuns | job-management.md |
| View job logs | ListLogContents | job-management.md |
| Cancel job / Stop job | CancelJobRun | job-management.md |
| View CU consumption | GetCuHours | job-management.md |
| Create Kyuubi service | CreateKyuubiService | kyuubi-service.md |
| Start / Stop Kyuubi | Start/StopKyuubiService | kyuubi-service.md |
| Execute SQL via Kyuubi | Connect Kyuubi Endpoint | kyuubi-service.md |
| Manage Kyuubi Token | Create/List/DeleteKyuubiToken | kyuubi-service.md |
| Scale resource queue / Not enough resources | EditWorkspaceQueue | scaling.md |
| View resource queue | ListWorkspaceQueues | scaling.md |
| Create session cluster | CreateSessionCluster | job-management.md |
| Query engine versions | ListReleaseVersions | api-reference.md |
| Check API parameters | Parameter reference | api-reference.md |

Destructive Operation Protection

The following operations are irreversible. Before executing any of them, complete the pre-check steps and get explicit confirmation from the user:

| API | Pre-check Steps | Impact |
| --- | --- | --- |
| CancelJobRun | 1. GetJobRun to confirm job status is Running 2. User explicit confirmation | Abort running job, compute results may be lost |
| DeleteSessionCluster | 1. GetSessionCluster to confirm status is stopped 2. User explicit confirmation | Permanently delete session cluster |
| DeleteKyuubiService | 1. GetKyuubiService to confirm status is NOT_STARTED 2. Confirm no active JDBC connections 3. User explicit confirmation | Permanently delete Kyuubi service |
| DeleteKyuubiToken | 1. GetKyuubiToken to confirm Token ID 2. Confirm connections using this Token can be interrupted 3. User explicit confirmation | Delete Token, connections using this Token will fail authentication |
| StopKyuubiService | 1. Remind user all active JDBC connections will be disconnected 2. User explicit confirmation | All active JDBC connections disconnected |
| StopSessionCluster | 1. Remind user session will terminate 2. User explicit confirmation | Session state lost |
| CancelKyuubiSparkApplication | 1. Confirm application ID and status 2. User explicit confirmation | Abort running Spark query |

Confirmation template:

About to execute: <API>, target: <Resource ID>, impact: <Description>. Continue?
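The confirmation template can be filled in and checked as follows. This is a minimal sketch; the function names and the rule that only an explicit yes counts as confirmation are assumptions, chosen so that an empty or ambiguous reply never triggers a destructive call.

```python
def confirmation_prompt(api: str, resource_id: str, impact: str) -> str:
    """Fill the confirmation template with the concrete operation details."""
    return (f"About to execute: {api}, target: {resource_id}, "
            f"impact: {impact}. Continue?")


def is_confirmed(reply: str) -> bool:
    """Treat only an explicit yes as confirmation; anything else declines."""
    return reply.strip().lower() in {"y", "yes"}
```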

Prohibited Operations

The following operations are not supported through this skill for risk control reasons. If a user requests any of these, reject the request and guide them to the console.

| Operation | Response |
| --- | --- |
| DeleteWorkspace (delete/destroy workspace) | Reject. Inform the user: "Workspace deletion is not supported via this skill. Please delete workspaces through the EMR Serverless Spark Console." |
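Since the prohibition is an instruction and not an enforced control, a wrapper around the CLI or HTTP layer could add a programmatic check. The sketch below is one possible guard, not something the skill ships; all names in it are hypothetical.

```python
import re

# Matches the workspace resource itself, not sub-resources beneath it.
_WORKSPACE_PATH = re.compile(r"^/api/v1/workspaces/[^/]+/?$")
REFUSAL = ("Workspace deletion is not supported via this skill. Please delete "
           "workspaces through the EMR Serverless Spark Console.")


class ProhibitedOperation(Exception):
    """Raised when a request would delete a workspace."""


def check_cli_action(action: str) -> None:
    """Reject DeleteWorkspace in API or CLI naming (delete-workspace)."""
    normalized = action.replace("-", "").replace("_", "").lower()
    if normalized == "deleteworkspace":
        raise ProhibitedOperation(REFUSAL)


def check_http_request(method: str, path: str) -> None:
    """Reject DELETE on /api/v1/workspaces/{workspaceId}."""
    if method.upper() == "DELETE" and _WORKSPACE_PATH.match(path):
        raise ProhibitedOperation(REFUSAL)
```

Calling these checks before every outgoing request turns the refusal into a hard failure while leaving other operations, such as cancel-job-run, unaffected.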

Security Guidelines

Job Submission Protection

Before submitting a Spark job, you must:

  1. Confirm workspace ID and resource queue
  2. Confirm code type codeType (required: JAR / PYTHON / SQL)
  3. Confirm Spark parameters and main program resource
  4. Display equivalent spark-submit command
  5. Get user explicit confirmation before submission
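Step 4 above asks for an equivalent spark-submit command to be shown before submission. A minimal sketch of how such a string might be assembled is given below; the function and its parameter names are hypothetical, and the OSS path and Spark settings in the example are placeholders. Only standard spark-submit options (--name, --class, --conf) are used.

```python
import shlex
from typing import Mapping, Optional, Sequence


def equivalent_spark_submit(entry_point: str,
                            args: Sequence[str] = (),
                            conf: Optional[Mapping[str, str]] = None,
                            main_class: Optional[str] = None,
                            name: Optional[str] = None) -> str:
    """Render job parameters as a shell-quoted spark-submit command line."""
    argv = ["spark-submit"]
    if name:
        argv += ["--name", name]
    if main_class:
        argv += ["--class", main_class]
    for key, value in (conf or {}).items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(entry_point)  # main program resource
    argv += list(args)        # arguments passed to the program
    return shlex.join(argv)


print(equivalent_spark_submit(
    "oss://my-bucket/jobs/etl.py",
    args=["--date", "2024-01-01"],
    conf={"spark.executor.cores": "2", "spark.executor.memory": "8g"},
    name="nightly-etl",
))
```

The rendered string is for display to the user only; the job itself is still submitted through StartJobRun.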

Timeout Control

| Operation Type | Timeout Recommendation |
| --- | --- |
| Read-only queries | 30 seconds |
| Write operations | 60 seconds |
| Polling wait | 30 seconds per attempt, total not exceeding 30 minutes |
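The polling limits can be enforced with a small loop. This sketch reads "30 seconds per attempt" as the wait between status checks, which is one interpretation; the function name, the injectable clock and sleep, and any state names passed in are assumptions.

```python
import time
from typing import Callable, Collection


def poll_until(fetch_state: Callable[[], str],
               terminal_states: Collection[str],
               interval_s: float = 30.0,
               max_total_s: float = 1800.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until a terminal state is seen or the total budget is spent."""
    deadline = clock() + max_total_s
    while True:
        state = fetch_state()
        if state in terminal_states:
            return state
        # Stop if another wait would push past the 30-minute budget.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"still {state!r} after {max_total_s:.0f} s")
        sleep(interval_s)
```

Here `fetch_state` would wrap a GetJobRun call and return the job's current state; the set of terminal state names should be taken from job-management.md.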

Error Handling

| Error Code | Cause | Agent Should Execute |
| --- | --- | --- |
| MissingParameter.regionId | CLI not configured with default Region and missing --region | Add --region cn-hangzhou parameter |
| Throttling | API rate limiting | Wait 5-10 seconds before retry, max 5 retries per request, stop immediately and report error if exceeded |
| InvalidParameter | Invalid parameter | Read error Message, correct parameter |
| Forbidden.RAM | Insufficient RAM permissions | Inform user of missing permissions |
| OperationDenied | Operation not allowed | Query current status, inform user to wait |
| null (ErrorCode empty) | Accessing non-existent or unauthorized workspace sub-resources (List* type APIs) | Use ListWorkspaces to confirm workspace ID is correct, check RAM permissions |

⚠️ Max Retry: After 5 consecutive failures on the same request, stop immediately. Do not continue retrying. Report error details to the user.
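The throttling rule can be sketched as a bounded retry loop. The table allows up to 5 retries while the note stops after 5 consecutive failures; this sketch takes the stricter reading and makes at most 5 attempts in total. `ThrottlingError` is a hypothetical exception that a caller would raise when the CLI reports the Throttling error code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottlingError(Exception):
    """Hypothetical marker for a response with error code Throttling."""


def call_with_retry(call: Callable[[], T],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    wait: Callable[[float, float], float] = random.uniform) -> T:
    """Retry throttled calls, waiting 5-10 s between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts:
                raise  # give up and let the caller report the error
            sleep(wait(5.0, 10.0))
    raise ValueError("max_attempts must be at least 1")
```

Other error codes are deliberately not retried, since the table prescribes a different action for each of them.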

Related Documentation

Usage Guidance
This skill's purpose (manage Alibaba EMR Serverless Spark) matches the commands and permission guidance in the docs, but you should take precautions:

  1. Do not blindly run `curl ... | bash`; prefer installing the Alibaba CLI from a trusted package manager, or manually inspect the script before running it.
  2. Use a dedicated Alibaba account/profile with the minimum RAM permissions needed (prefer Developer or ReadOnly policies rather than FullAccess) and avoid exposing high-privilege credentials in your environment.
  3. Be aware that the skill uses the default credential chain (environment variables, config files, instance roles), so any credentials available to the host may be used. Check ~/.aliyun/config.json, environment vars, and instance roles.
  4. Review the CLI commands the skill will run (examples are in the docs) and confirm any destructive operations with a human; the skill's 'DeleteWorkspace' prohibition is only an instruction, not an enforced safeguard.
  5. If possible, test in an isolated account or sandbox before granting broad permissions or running against production resources.
Capability Tags
crypto
can-make-purchases
Capability Assessment
Purpose & Capability
Name, description, and all included references (workspace, job, Kyuubi, scaling, RAM policies) consistently describe EMR Serverless Spark lifecycle management. The declared requirements (aliyun CLI or Python SDK) align with the described APIs and CLI examples.
Instruction Scope
SKILL.md instructs the agent to run many aliyun CLI commands (create-workspace, start-job-run, list-log-contents, create-kyuubi-token, etc.) which is appropriate, but it also instructs use of the default credential chain (environment vars, config files, instance roles). That means the skill will rely on whatever Alibaba Cloud credentials are present on the host. The doc also includes an explicit, non-overridable rule forbidding DeleteWorkspace—useful but not a technical guard. Most importantly, the runtime instructions tell the user/agent to run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` if CLI is missing or outdated, which executes remote code and broadens the attack surface.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but SKILL.md directs running a remote install script (curl|bash) from aliyuncli.alicdn.com. While that domain appears related to Alibaba/aliyun, curl | bash executes remote code without review and is a high-risk pattern. The skill also requires enabling automatic plugin installs and running `aliyun plugin update`, which modifies cli behavior. No safe packaged install alternative is provided inside the skill.
Credentials
The skill doesn't declare required env vars or primary credentials (registry metadata lists none), which is consistent because it uses Alibaba Cloud's default credential chain. However, the included RAM policy recommendations (AliyunEMRServerlessSparkFullAccess) are broad and would grant wide permissions. The skill will perform operations that require access to existing Alibaba credentials and OSS access; ensure credentials provided are least-privilege and scoped to only the needed actions.
Persistence & Privilege
Skill is not always-enabled and does not request system-wide persistence. It is instruction-only and contains no code files, so it does not by itself write binaries or install persistent agents. That said, the instructions suggest modifying CLI configuration (auto-plugin-install) which affects future CLI behavior.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install alibabacloud-emr-spark-manage
  3. After installation, invoke the skill by name or use /alibabacloud-emr-spark-manage
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.0.3
alibabacloud-emr-spark-manage 0.0.3 Changelog

  • Updated minimum required version for aliyun CLI to >= 3.3.3.
  • Added specific pre-check steps for CLI version and plugin updates.
  • Introduced mandatory use of AI-Mode and explicit User-Agent in all CLI usage instructions.
  • Updated all CLI example commands to use plugin mode and the required User-Agent.
  • Expanded installation/configuration prerequisites and provided troubleshooting guidance for CLI setup.
v0.0.2
alibabacloud-emr-spark-manage v0.0.2

  • Added a strict prohibition on workspace deletion: any attempt to call `DeleteWorkspace` or issue a DELETE request to `/api/v1/workspaces/{workspaceId}` must be refused and redirected to the official console.
  • Updated documentation and workflow guidance to emphasize and enforce this prohibition.
  • Enhanced warnings in CLI command examples regarding DELETE operations.
  • Improved references and redirection links for manual workspace deletion.
v0.0.1
Initial release – manage Alibaba Cloud EMR Serverless Spark workspaces and jobs:

  • Supports full lifecycle management: create/delete workspaces, submit/cancel jobs, run interactive SQL via Kyuubi, view job/log/status.
  • Includes guidance for credential and role configuration required for EMR Serverless Spark access.
  • Details product concepts (workspace, resource queue, Kyuubi, job types) and recommended deployment configurations.
  • Provides CLI usage examples and requirements for smooth API invocation.
  • Outlines required RAM permissions and role policies for users and jobs.
Metadata
Slug alibabacloud-emr-spark-manage
Version 0.0.3
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is Alibabacloud Emr Spark Manage?

Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces—create workspaces, submit jobs, Kyuubi interactive queries, resource queue scaling... It is an AI Agent Skill for Claude Code / OpenClaw, with 148 downloads so far.

How do I install Alibabacloud Emr Spark Manage?

Run "/install alibabacloud-emr-spark-manage" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Alibabacloud Emr Spark Manage free?

Yes, Alibabacloud Emr Spark Manage is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Alibabacloud Emr Spark Manage support?

Alibabacloud Emr Spark Manage is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Alibabacloud Emr Spark Manage?

It is built and maintained by alibabacloud-skills-team (@sdk-team); the current version is v0.0.3.
