Description

Query and control Databricks jobs from plain text: check status, list recent runs, find failures, and trigger pipelines through the REST API.

Usage Instructions (SKILL.md)

databricks-helper

Name: databricks-helper
Author: nerikko

Query, inspect, and control your Databricks workspace from plain text. Check job status, rerun/cancel runs, inspect logs, explore Unity Catalog, and run read-only SQL without opening the UI.

Triggers

Use this skill when the user says things like:

"check my databricks jobs"
"what failed in databricks today"
"what failed this morning"
"show me recent databricks runs"
"run pipeline [name]"
"trigger job [name]"
"databricks job status"
"what's running in databricks"
"any failures in databricks"
"retry databricks run 123"
"cancel my stuck databricks job"
"show detailed logs for run 123"
"list only running jobs"
"list jobs tagged env=prod"
"did any runs breach SLA"
"databricks success summary"
"top failing jobs"
"list catalogs/schemas/tables"
"preview table main.bronze.events"
"run SQL select ..." (read-only)

Requirements

Databricks REST API access (no CLI required)
Environment variables to set:
- DATABRICKS_HOST: workspace URL, e.g. https://adb-1234567890.12.azuredatabricks.net
- DATABRICKS_TOKEN: personal access token
- DATABRICKS_SQL_WAREHOUSE_ID: SQL warehouse ID; required for table previews and SQL queries
Optional safety tuning:
- DATABRICKS_SLA_MINUTES: SLA alert threshold in minutes (default 60)
- DATABRICKS_MAX_ROWS: row cap for SQL output (default 200)
- DATABRICKS_SQL_TIMEOUT_SEC: SQL wait timeout in seconds (default 60)
- DATABRICKS_ALLOW_WRITE_SQL: set to true only if DDL/DML statements should be allowed (unset by default)
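A minimal sketch of how these variables might be loaded and validated, assuming the documented defaults (60, 200, 60) and that only the string "true" (case-insensitive) enables write SQL. `HelperConfig`, `load_config`, and `_int_env` are hypothetical names, not taken from scripts/databricks_helper.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HelperConfig:
    """Hypothetical container for the settings listed under Requirements."""
    host: str
    token: str
    warehouse_id: Optional[str]
    sla_minutes: int
    max_rows: int
    sql_timeout_sec: int
    allow_write_sql: bool


def _int_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse a positive integer variable, falling back to the documented default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


def load_config(env: Mapping[str, str] = os.environ) -> HelperConfig:
    missing = [n for n in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(n)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return HelperConfig(
        host=env["DATABRICKS_HOST"].rstrip("/"),
        token=env["DATABRICKS_TOKEN"],
        # Only needed for table previews and SQL, so it may be absent.
        warehouse_id=env.get("DATABRICKS_SQL_WAREHOUSE_ID") or None,
        sla_minutes=_int_env(env, "DATABRICKS_SLA_MINUTES", 60),
        max_rows=_int_env(env, "DATABRICKS_MAX_ROWS", 200),
        sql_timeout_sec=_int_env(env, "DATABRICKS_SQL_TIMEOUT_SEC", 60),
        # Write SQL stays disabled unless the variable is exactly "true" (any case).
        allow_write_sql=env.get("DATABRICKS_ALLOW_WRITE_SQL", "").strip().lower() == "true",
    )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to test without touching the real environment.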

Installation

npx clawhub@latest install databricks-helper

Usage Examples

Check recent jobs

"check my databricks jobs"

Lists the last 10 job runs with status, duration, and run URLs.
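To illustrate the underlying call, here is a sketch of an authenticated request to the Jobs API 2.1 endpoint `GET /api/2.1/jobs/runs/list`, using only urllib in keeping with the stdlib-only design noted in the assessment. `build_get_request` and `list_recent_runs` are hypothetical helpers; the bundled script may structure this differently.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional


def build_get_request(host: str, token: str, path: str,
                      params: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build an authenticated GET request against the workspace REST API."""
    url = host.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def list_recent_runs(host: str, token: str, limit: int = 10) -> List[dict]:
    """Fetch the most recent job runs (performs a network call)."""
    req = build_get_request(host, token, "/api/2.1/jobs/runs/list", {"limit": limit})
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # Use .get() in case the "runs" key is absent when there are no runs.
    return payload.get("runs", [])
```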

Find failures

"what failed in databricks today"

Filters runs from the last 24 hours and prints failed ones with error snippets.
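A sketch of the 24-hour failure filter, applied client-side to runs returned by runs/list. It assumes a run counts as failed when `state.result_state` is FAILED or TIMEDOUT, or `state.life_cycle_state` is INTERNAL_ERROR; the script's exact criteria are not documented. Jobs API timestamps are epoch milliseconds, and runs/list also accepts a `start_time_from` parameter if you prefer to filter server-side.

```python
import time
from typing import List, Optional

# Assumed definition of "failed"; adjust to taste.
FAILED_RESULT_STATES = {"FAILED", "TIMEDOUT"}


def failed_runs(runs: List[dict], hours: int = 24, now_ms: Optional[int] = None) -> List[dict]:
    """Return failed runs that started within the last `hours`, with a short error snippet."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - hours * 3_600_000  # Jobs API timestamps are epoch milliseconds
    failures = []
    for run in runs:
        if run.get("start_time", 0) < cutoff_ms:
            continue
        state = run.get("state", {})
        if (state.get("result_state") in FAILED_RESULT_STATES
                or state.get("life_cycle_state") == "INTERNAL_ERROR"):
            failures.append({
                "run_id": run.get("run_id"),
                "name": run.get("run_name", "?"),
                # state_message carries the human-readable reason, truncated here.
                "error": (state.get("state_message") or "")[:200],
                "url": run.get("run_page_url", ""),
            })
    return failures
```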

Trigger or retry pipelines

"run pipeline customer_ingestion" "retry databricks run 123"

Starts a new run of the named job, or reruns the failed tasks of an existing run via the Jobs Repair API.
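A sketch of the two write calls: resolving a job name to a `job_id` from a `jobs/list` response, then building the bodies for `POST /api/2.1/jobs/run-now` and `POST /api/2.1/jobs/runs/repair`. The helper names are hypothetical, and raising on ambiguous names is an assumed safety choice, not documented behavior.

```python
from typing import List, Optional, Tuple


def find_job_id(jobs: List[dict], name: str) -> int:
    """Resolve a job name (case-insensitive) against entries from jobs/list."""
    matches = [job["job_id"] for job in jobs
               if job.get("settings", {}).get("name", "").lower() == name.lower()]
    if not matches:
        raise LookupError(f"no job named {name!r}")
    if len(matches) > 1:
        # Refuse to guess when several jobs share a name (assumed safety choice).
        raise LookupError(f"job name {name!r} is ambiguous: {matches}")
    return matches[0]


def run_now_request(job_id: int) -> Tuple[str, str, dict]:
    """Method, path, and body that start a new run of a job."""
    return "POST", "/api/2.1/jobs/run-now", {"job_id": job_id}


def repair_request(run_id: int, latest_repair_id: Optional[int] = None) -> Tuple[str, str, dict]:
    """Method, path, and body that rerun only the failed tasks of a run."""
    body = {"run_id": run_id, "rerun_all_failed_tasks": True}
    # A run that has been repaired before also needs the ID of its latest repair.
    if latest_repair_id is not None:
        body["latest_repair_id"] = latest_repair_id
    return "POST", "/api/2.1/jobs/runs/repair", body
```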

Cancel a run

"cancel databricks run 123"

Calls jobs/runs/cancel with safety checks and prints confirmation.
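The listing does not say what the safety checks are. One plausible check, sketched here as an assumption, is to refuse the cancel unless the run's `life_cycle_state` is still active; `ACTIVE_STATES` and `cancel_payload` are hypothetical.

```python
# Assumed set of life-cycle states in which a cancel still makes sense.
ACTIVE_STATES = {"QUEUED", "PENDING", "RUNNING", "BLOCKED", "WAITING_FOR_RETRY"}


def cancel_payload(run: dict) -> dict:
    """Return the body for POST /api/2.1/jobs/runs/cancel, or raise if the run is not active."""
    state = run.get("state", {}).get("life_cycle_state", "UNKNOWN")
    if state not in ACTIVE_STATES:
        raise ValueError(f"run {run.get('run_id')} is {state}; refusing to cancel")
    return {"run_id": run["run_id"]}
```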

Live monitoring + analytics

"what's running now" "databricks sla watch" "databricks success summary"

Shows active runs with elapsed time, highlights SLA breaches, and prints 24h/7d success/failure counts plus top failing jobs (with adjustable time ranges).
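A sketch of these analytics over a list of run objects, assuming an active run reports `end_time` as 0 (so elapsed time is measured against "now") and that SLA breaches are checked only for RUNNING runs. The function names are hypothetical; the 24h/7d windows would come from filtering runs by `start_time` first.

```python
from collections import Counter
from typing import List, Tuple


def elapsed_minutes(run: dict, now_ms: int) -> float:
    # Assumption: end_time is 0 while a run is still active.
    end = run.get("end_time") or now_ms
    return (end - run.get("start_time", end)) / 60000


def sla_breaches(runs: List[dict], sla_minutes: int, now_ms: int) -> List[Tuple[str, float]]:
    """Active runs whose elapsed time exceeds the SLA, as (name, minutes)."""
    breaches = []
    for run in runs:
        if run.get("state", {}).get("life_cycle_state") != "RUNNING":
            continue
        minutes = elapsed_minutes(run, now_ms)
        if minutes > sla_minutes:
            breaches.append((run.get("run_name", "?"), round(minutes, 1)))
    return breaches


def success_summary(runs: List[dict]) -> Counter:
    """Count runs by result_state; runs without one are treated as in progress."""
    return Counter(run.get("state", {}).get("result_state", "IN_PROGRESS") for run in runs)


def top_failing(runs: List[dict], n: int = 5) -> List[Tuple[str, int]]:
    """Job names with the most FAILED runs, most frequent first."""
    failed = Counter(run.get("run_name", "?") for run in runs
                     if run.get("state", {}).get("result_state") == "FAILED")
    return failed.most_common(n)
```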

Catalog + SQL exploration

"list catalogs" "list tables in main bronze" "preview table main.bronze.events" "run sql select * from main.bronze.events"

Uses the Unity Catalog API for discovery and runs read-only SQL through the configured warehouse with enforced row limits.
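A sketch of the request shapes involved, assuming the Unity Catalog list endpoints under `/api/2.1/unity-catalog/` and the SQL Statement Execution endpoint `POST /api/2.0/sql/statements`. That API accepts a `wait_timeout` of only 0s or 5s to 50s, so a 60-second DATABRICKS_SQL_TIMEOUT_SEC would need polling after the first wait. The clamping, the backtick quoting, and the helper names are illustrative choices, not the script's confirmed behavior.

```python
import urllib.parse
from typing import Optional

UC_BASE = "/api/2.1/unity-catalog"


def uc_list_path(catalog: Optional[str] = None, schema: Optional[str] = None) -> str:
    """Pick the catalogs, schemas, or tables list endpoint based on what is given."""
    if catalog is None:
        return f"{UC_BASE}/catalogs"
    if schema is None:
        return f"{UC_BASE}/schemas?" + urllib.parse.urlencode({"catalog_name": catalog})
    return f"{UC_BASE}/tables?" + urllib.parse.urlencode(
        {"catalog_name": catalog, "schema_name": schema})


def preview_sql(full_name: str, limit: int = 20) -> str:
    """Build a preview query for catalog.schema.table with each part backtick-quoted."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected catalog.schema.table")
    quoted = ".".join("`" + p.replace("`", "``") + "`" for p in parts)
    return f"SELECT * FROM {quoted} LIMIT {int(limit)}"


def statement_body(warehouse_id: str, sql: str, max_rows: int = 200, timeout_sec: int = 60) -> dict:
    """Body for POST /api/2.0/sql/statements with a row cap and a bounded first wait."""
    # wait_timeout must be 0s or 5s..50s; any remaining budget is left to polling
    # GET /api/2.0/sql/statements/{statement_id}.
    wait = min(max(timeout_sec, 5), 50)
    return {
        "warehouse_id": warehouse_id,
        "statement": sql,
        "wait_timeout": f"{wait}s",
        "on_wait_timeout": "CONTINUE",
        "disposition": "INLINE",
        "format": "JSON_ARRAY",
        "row_limit": max_rows,
    }
```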

Implementation

python scripts/databricks_helper.py list-runs
python scripts/databricks_helper.py failures --hours 24
python scripts/databricks_helper.py run-job "job name"
python scripts/databricks_helper.py retry-run 123
python scripts/databricks_helper.py cancel-run 123
python scripts/databricks_helper.py run-details 123
python scripts/databricks_helper.py running-jobs --pattern nightly
python scripts/databricks_helper.py jobs --tag env=prod
python scripts/databricks_helper.py sla-watch --minutes 90
python scripts/databricks_helper.py summary
python scripts/databricks_helper.py top-failures --hours 48
python scripts/databricks_helper.py list-catalogs
python scripts/databricks_helper.py list-schemas --catalog main
python scripts/databricks_helper.py list-tables --catalog main --schema bronze
python scripts/databricks_helper.py preview-table main.bronze.events --limit 20
python scripts/databricks_helper.py run-sql --query "SELECT * FROM main.bronze.events" --limit 50
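These commands map naturally onto argparse subcommands. The hypothetical parser below mirrors the flags shown; only `failures --hours` gets a default (the 24 hours stated in the usage examples), and other omitted flags stay None so they can fall back to environment variables. It also adds a `KEY=VALUE` parser for `--tag`, matched against each job's `settings.tags`.

```python
import argparse
from typing import Tuple


def parse_tag(value: str) -> Tuple[str, str]:
    """Parse a KEY=VALUE tag filter such as env=prod."""
    key, sep, val = value.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {value!r}")
    return key, val


def job_matches_tag(job: dict, tag: Tuple[str, str]) -> bool:
    """True if a jobs/list entry carries the given tag in settings.tags."""
    key, val = tag
    return job.get("settings", {}).get("tags", {}).get(key) == val


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="databricks_helper.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("list-runs", "summary", "list-catalogs"):
        sub.add_parser(name)
    sub.add_parser("failures").add_argument("--hours", type=int, default=24)
    sub.add_parser("top-failures").add_argument("--hours", type=int)
    sub.add_parser("run-job").add_argument("name")
    for name in ("retry-run", "cancel-run", "run-details"):
        sub.add_parser(name).add_argument("run_id", type=int)
    sub.add_parser("running-jobs").add_argument("--pattern")
    sub.add_parser("jobs").add_argument("--tag", type=parse_tag)
    # None means "fall back to DATABRICKS_SLA_MINUTES".
    sub.add_parser("sla-watch").add_argument("--minutes", type=int)
    sub.add_parser("list-schemas").add_argument("--catalog", required=True)
    tables = sub.add_parser("list-tables")
    tables.add_argument("--catalog", required=True)
    tables.add_argument("--schema", required=True)
    preview = sub.add_parser("preview-table")
    preview.add_argument("table")
    preview.add_argument("--limit", type=int)
    sql = sub.add_parser("run-sql")
    sql.add_argument("--query", required=True)
    # None means "fall back to DATABRICKS_MAX_ROWS".
    sql.add_argument("--limit", type=int)
    return parser
```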

Output

Plain text. Each run: job name, status (SUCCESS/FAILED/RUNNING), start/end, duration, SLA status, error (if failed). Catalog + SQL commands return textual lists or tabular results.
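A sketch of one way to render a run as a single line with the fields listed above. The separator layout, UTC timestamps, and the hypothetical `format_run` helper are assumptions; the script's exact format is not shown in this listing.

```python
from datetime import datetime, timezone


def _fmt_ts(ms: int) -> str:
    """Render an epoch-millisecond timestamp in UTC; 0 means 'not set'."""
    if not ms:
        return "-"
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


def format_run(run: dict, sla_minutes: int, now_ms: int) -> str:
    state = run.get("state", {})
    # Completed runs report result_state; active runs only have life_cycle_state.
    status = state.get("result_state") or state.get("life_cycle_state", "UNKNOWN")
    start = run.get("start_time", 0)
    end = run.get("end_time") or now_ms
    minutes = (end - start) / 60000 if start else 0.0
    sla = "BREACH" if minutes > sla_minutes else "ok"
    line = (f"{run.get('run_name', '?')} | {status} | {_fmt_ts(start)} -> "
            f"{_fmt_ts(run.get('end_time', 0))} | {minutes:.1f} min | SLA {sla}")
    if status == "FAILED" and state.get("state_message"):
        line += f" | error: {state['state_message']}"
    return line
```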

Notes

Uses the Databricks Jobs API v2.1, the Unity Catalog API, and the SQL Statement Execution API (SQL is restricted to read-only statements by default).
Requires CAN_VIEW for read operations, CAN_MANAGE_RUN to trigger/cancel/repair runs, and SQL warehouse access.
SQL commands enforce read-only queries unless DATABRICKS_ALLOW_WRITE_SQL=true. Limits/timeouts are applied to avoid runaway scans.
SLA alerts default to 60 minutes but may be overridden via DATABRICKS_SLA_MINUTES or per-command flags.
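The script reportedly uses a best-effort `is_sql_safe` check. The sketch below is an independent illustration of that idea, not a copy of it: it blanks out string literals, quoted identifiers, and comments in a single pass, rejects multiple statements, requires a read-only leading keyword, and rejects any write keyword. Keyword heuristics like this can be bypassed or can reject legitimate queries, so least-privilege tokens and warehouse permissions still matter.

```python
import re

# One alternation means whichever construct starts first wins,
# e.g. "--" inside a string literal stays part of the string.
_LITERAL_OR_COMMENT = re.compile(
    r"'(?:[^'\\]|\\.)*'"      # single-quoted string
    r'|"(?:[^"\\]|\\.)*"'     # double-quoted string
    r"|`(?:[^`]|``)*`"        # backtick-quoted identifier
    r"|--[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.S,
)
READ_ONLY_STARTS = {"SELECT", "WITH", "SHOW", "DESCRIBE", "DESC", "EXPLAIN"}
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "DROP", "ALTER",
                  "TRUNCATE", "GRANT", "REVOKE", "COPY", "OPTIMIZE", "VACUUM", "RESTORE"}


def is_read_only(sql: str) -> bool:
    """Conservative check: a single statement that starts read-only and has no write keywords."""
    cleaned = _LITERAL_OR_COMMENT.sub(" ", sql).strip().rstrip(";").strip()
    if ";" in cleaned:
        return False  # more than one statement
    words = re.findall(r"[A-Za-z_]+", cleaned.upper())
    if not words or words[0] not in READ_ONLY_STARTS:
        return False
    return not (WRITE_KEYWORDS & set(words))
```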

CHANGELOG

1.1.0 — Adds run retry/cancel/details, running-job lists, job filtering by tags, SLA watch, success/failure summaries, top failing jobs, Unity Catalog discovery, table previews, and safe read-only SQL execution with row limits plus new docs/tests.

Security Recommendations

This skill appears to implement the Databricks functionality it claims, but the registry metadata does not declare the sensitive environment variables it needs. Before installing:

1) Ask the publisher to update the registry entry to declare DATABRICKS_HOST, DATABRICKS_TOKEN (as the primary credential), and DATABRICKS_SQL_WAREHOUSE_ID so you can see which secrets will be used.
2) Provide only a least-privilege Databricks personal access token: use separate tokens for read-only and run/manage operations, and avoid broad admin scopes.
3) Keep DATABRICKS_ALLOW_WRITE_SQL unset (default false) unless you intentionally need DDL/DML and trust the code.
4) Review or run the bundled tests locally and, if possible, audit the full databricks_helper.py to confirm there are no hidden network calls to unexpected domains.
5) Consider running the skill in a sandboxed agent or with a short-lived token first, and monitor Databricks audit logs for unexpected activity.

These steps address the main transparency gap and reduce risk before you trust the skill with production credentials.

Functional Analysis

Type: OpenClaw Skill
Name: databricks-helper
Version: 1.1.0

The OpenClaw skill 'databricks-helper' is designed to interact with Databricks APIs for monitoring and control. The Python script (`scripts/databricks_helper.py`) correctly uses environment variables for authentication and directs all network traffic to the user-configured Databricks host. Crucially, it includes a 'best-effort' `is_sql_safe` function to prevent DDL/DML operations by default and enforces row limits on SQL queries, demonstrating a clear intent for secure, read-only functionality. Write operations are only permitted if `DATABRICKS_ALLOW_WRITE_SQL` is explicitly set to true. The `SKILL.md` and `README.md` files contain no prompt injection attempts or instructions for the AI agent to perform malicious actions. All functionalities align with the stated purpose without evidence of data exfiltration to unauthorized endpoints, persistence mechanisms, or obfuscation.

Capability Assessment

ℹ Purpose & Capability

The skill's name, description, SKILL.md, README, and bundled Python code all align: they call Databricks REST APIs to list/run/retry/cancel jobs and to run SQL and Unity Catalog queries. The requested permissions (CAN_VIEW, CAN_MANAGE_RUN, SQL warehouse access) are appropriate for the claimed features.

✓ Instruction Scope

Runtime instructions and the code show only Databricks API calls (host/api/...), SQL execution against a configured warehouse, and local output formatting. There are no instructions to read unrelated files, contact external endpoints outside the configured Databricks host, or exfiltrate data to third-party hosts in the provided materials.

✓ Install Mechanism

There is no remote download/install step; the package includes Python scripts that use only the stdlib (urllib). No extract-from-arbitrary-URL or third-party package installs are present, which is low-risk for install-time code execution.

⚠ Credentials

The SKILL.md, README, and code clearly require DATABRICKS_HOST, DATABRICKS_TOKEN, and (for SQL) DATABRICKS_SQL_WAREHOUSE_ID plus optional safety vars. However, the registry metadata lists no required environment variables and no primary credential. That mismatch is a transparency/integrity problem: the skill will need a secret token to function but the registry does not declare it or mark the token as the primary credential.

✓ Persistence & Privilege

The skill is not forced-always (always:false), does not request system config paths, and contains no code that modifies other skills or global agent settings. Default autonomous invocation is allowed (normal).

Version History

v1.1.0

Add run retry/cancel, logs, running filters, SLA analytics, summaries, catalog exploration, safe read-only SQL tools.

v1.0.0

Initial release of databricks-helper:
- Query Databricks job status and recent runs from plain text.
- Find failed job runs within specified time windows.
- Trigger pipelines or specific jobs by name.
- Return concise job details: name, status, start time, duration, and error (if applicable).
- Use the Databricks REST API; requires environment variables for the host and token.

Metadata

Slug databricks-helper

Version 1.1.0

License (not specified)

Total installs 0

Current installs 0

Versions published 2

FAQ