← Back to Skills Marketplace
wenbingyu

Hologres Schema Generator

by francis · GitHub ↗ · v0.2.0 · MIT-0
cross-platform ✓ Security Clean
50
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install hologres-schema-generator
Description
Hologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuri...
README (SKILL.md)

Prerequisites

This skill requires hologres-cli to be installed first:

pip install hologres-cli
export HOLOGRES_SKILL=hologres-schema-generator

All SQL execution depends on hologres-cli commands (hologres sql run --write, hologres table create).

Hologres Schema Design & Table Creation

Generate optimized Hologres DDL statements with proper storage format, indexing, and distribution strategies.

Information Gathering (IMPORTANT)

Before generating DDL, check whether the user has provided enough context. Many Hologres table properties are immutable after creation (orientation, distribution_key, clustering_key, event_time_column, primary key, partition column) — getting them wrong means recreating the table.

Required Information

If ANY of the following are unclear, ask the user before generating DDL:

Dimension Why Critical Example Question
Query pattern Determines orientation (column/row/row-column) "This table is mainly used for OLAP analytics, KV point lookups, or both?"
Column definitions Core schema structure "Can you list the main columns and their rough types (text, number, timestamp, etc.)?"
Primary key / unique key Determines UPSERT capability; immutable "Does this table need deduplication or updates by a unique key? If so, which column(s)?"
Main query conditions Determines distribution_key, clustering_key, bitmap "What columns are most often used in WHERE, JOIN, or GROUP BY?"

Conditionally Required

Ask these only when relevant signals appear:

Signal Follow-up Question
Table looks like time-series or log data "What is the estimated daily data volume? (determines whether to use partitioning)"
User mentions JOIN with other tables "Which column will this table JOIN on, and what is the distribution_key of the other table?"
Large data volume or mentions retention "How long should data be retained? (recommend dynamic partition management for lifecycle)"
User mentions both analytics and serving "What is the read pattern — batch scans, point lookups by ID, or both?"

Can Use Defaults (No Need to Ask)

Property Default Behavior
bitmap_columns Auto-enabled for TEXT columns; can ALTER later
dictionary_encoding_columns Use :auto mode; can ALTER later
storage_mode Default hot; can ALTER later
time_to_live_in_seconds Not recommended. Deletion time is non-deterministic. Use dynamic partition management instead

Gathering Flow

User request → Check available info
  |
  +-- Columns + query pattern + key info all clear?
  |     → Generate DDL directly
  |
  +-- Missing critical info?
  |     → Ask 1-3 focused questions (batch them, don't ask one at a time)
  |
  +-- Very vague request (e.g., "help me create a user table")?
        → Ask: 1) main columns  2) query pattern  3) key/dedup needs

Principle: Batch questions into a single round. Never ask more than 3-4 questions at once. If the user provides partial info, fill in reasonable defaults for the rest and explain your assumptions in comments.

Storage Formats

Choose the storage format based on your primary query pattern.

Format Syntax Best For Primary Key
Column store orientation = 'column' OLAP, aggregation, scan-heavy queries Optional
Row store orientation = 'row' Point lookups, high-QPS KV queries Required
Row-column store orientation = 'row,column' Mixed workloads (OLAP + point lookup) Required

Default recommendation: Use row,column (row-column store) when the workload is unclear. It handles both OLAP and point queries well.

Quick Start

-- Column store table (OLAP / analytics)
CREATE TABLE orders (
  order_id BIGINT NOT NULL,
  user_id BIGINT,
  amount DECIMAL(18,2),
  status TEXT,
  order_time TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (order_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'order_id',
  clustering_key = 'order_time:asc',
  event_time_column = 'order_time',
  bitmap_columns = 'status,user_id'
);

-- Row store table (high-QPS point lookup)
CREATE TABLE user_profile (
  user_id BIGINT NOT NULL,
  name TEXT,
  email TEXT,
  tags JSONB,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY (user_id)
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id',
  clustering_key = 'user_id'
);

CREATE TABLE Syntax (V2.1+)

CREATE TABLE [IF NOT EXISTS] [schema.]table_name (
  column_name data_type [NOT NULL] [DEFAULT expr],
  ...
  [PRIMARY KEY (col1 [, col2, ...])]
)
[PARTITION BY LIST (partition_column)]
WITH (
  orientation = '{column | row | row,column}',
  distribution_key = 'col1[,col2]',
  clustering_key = 'col1[:asc|:desc][,col2[:asc|:desc]]',
  event_time_column = 'col',
  bitmap_columns = 'col1[,col2,...]',
  dictionary_encoding_columns = 'col1[:auto|:on|:off][,...]',
  time_to_live_in_seconds = 'N',          -- NOT recommended, see below
  storage_mode = '{hot | cold}',
  table_group = 'group_name'
);

Legacy Syntax (All Versions)

BEGIN;
CREATE TABLE table_name (...);
CALL set_table_property('table_name', 'orientation', 'column');
CALL set_table_property('table_name', 'distribution_key', 'col1');
CALL set_table_property('table_name', 'clustering_key', 'col1:asc');
CALL set_table_property('table_name', 'event_time_column', 'col1');
CALL set_table_property('table_name', 'bitmap_columns', 'col1,col2');
COMMIT;

Table Properties Reference

distribution_key (Distribution Key)

Controls how data is hash-distributed across shards. Critical for JOIN and GROUP BY performance.

Rule Description
Choose high-cardinality columns Avoid data skew
Use JOIN/GROUP BY columns Enable local computation, avoid shuffle
Max 2 columns More columns reduce distribution effectiveness
Must be subset of PK If table has a primary key
Immutable after creation Cannot ALTER, must recreate table
-- Single column
distribution_key = 'user_id'

-- Two columns (join on both)
distribution_key = 'order_id,user_id'

clustering_key (Clustering Key / Sorted Index)

Physically sorts data within files. Accelerates range queries and filters.

Rule Description
Use range-query columns e.g., timestamp, date columns
Max 2-3 columns More columns dilute sort benefit
Put high-selectivity column first Most-filtered column goes first
Specify sort order :asc (default) or :desc
Column store only Row store uses PK as sort key
clustering_key = 'order_time:asc'
clustering_key = 'ds:asc,order_time:asc'

event_time_column (Segment Key)

Organizes data files by time ranges. Enables file-level pruning for time-range queries.

Rule Description
Use time/date columns Timestamp of data ingestion or event time
At most 1 column Only one segment key per table
Column store only Not applicable to row store
Combine with partition Partition for coarse pruning, segment key for fine pruning
event_time_column = 'event_time'

bitmap_columns (Bitmap Index)

Builds bitmap indexes for fast equality filtering on low-to-medium cardinality columns.

Rule Description
Use filter columns Columns frequently in WHERE clause
Low-medium cardinality status, type, region — NOT user_id
TEXT columns auto-enabled Default bitmap for text columns
Can be added after creation ALTER TABLE ... SET (bitmap_columns = ...)
bitmap_columns = 'status,payment_type,region'

dictionary_encoding_columns (Dictionary Encoding)

Compresses text columns by mapping values to integers. Speeds up GROUP BY and aggregations.

Rule Description
Use :auto mode Let Hologres decide based on cardinality
Good for low-cardinality text country, status, category
Avoid high-cardinality columns user_id, order_id — no compression benefit
Don't set if unsure Incorrect setting may hurt performance
dictionary_encoding_columns = 'country:auto,status:auto'

time_to_live_in_seconds (TTL) — NOT RECOMMENDED

Do NOT use time_to_live_in_seconds. The actual deletion time is non-deterministic — data will be deleted at an arbitrary time after the specified TTL, not at a precise point. This makes it unreliable for data lifecycle management.

Recommended alternative: Use dynamic partition management — create daily/hourly partitions and drop old partitions on a schedule (via cron or scheduling system). This gives you precise, predictable data lifecycle control.

-- BAD: TTL-based lifecycle (deletion time unpredictable)
-- time_to_live_in_seconds = '2592000'

-- GOOD: Partition-based lifecycle (precise control)
-- 1. Create table with daily partitions
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
) PARTITION BY LIST (ds)
WITH (orientation = 'column', distribution_key = 'event_id');

-- 2. Drop old partitions on schedule (e.g., retain 30 days)
DROP TABLE IF EXISTS events_20251101;  -- drop partition older than 30 days

Primary Key Design

Storage Format PK Requirement Notes
Column store Optional Add PK only if needed for UPSERT
Row store Required PK drives the row-store index
Row-column store Required PK serves both point lookup and analytics

Rules:

  • Max 32 columns in composite PK
  • PK columns must be NOT NULL and UNIQUE
  • Prohibited types: FLOAT, DOUBLE, NUMERIC, ARRAY, JSON, JSONB
  • PK cannot be altered after creation — must recreate table
  • Avoid SERIAL as PK — causes table-level locks on write

Partition Table Design

Use partitions for large tables with time-based or categorical data.

-- Parent table
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  user_id BIGINT,
  event_type TEXT,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'event_id',
  clustering_key = 'ds:asc',
  event_time_column = 'ds'
);

-- Child partitions
CREATE TABLE events_20251201 PARTITION OF events FOR VALUES IN ('20251201');
CREATE TABLE events_20251202 PARTITION OF events FOR VALUES IN ('20251202');

Rules:

  • Only LIST partitioning is supported
  • Partition column must be part of PK (if PK exists)
  • Supported partition column types: TEXT, VARCHAR, INT, DATE (V1.3.22+)
  • Skip daily partitions if daily data \x3C 100M rows — use event_time_column instead
  • Always filter on partition column in queries for pruning

Scenario-Based Templates

1. High-QPS Point Lookup (KV)

CREATE TABLE user_kv (
  user_id BIGINT NOT NULL PRIMARY KEY,
  profile JSONB,
  updated_at TIMESTAMPTZ
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id'
);

2. Prefix Range Scan

CREATE TABLE order_lines (
  order_id BIGINT NOT NULL,
  line_no INT NOT NULL,
  product_id BIGINT,
  quantity INT,
  PRIMARY KEY (order_id, line_no)
)
WITH (
  orientation = 'row',
  distribution_key = 'order_id',
  clustering_key = 'order_id'
);

3. Time-Range Analytics (Partitioned)

CREATE TABLE page_views (
  view_id BIGINT NOT NULL,
  user_id BIGINT,
  page_url TEXT,
  view_time TIMESTAMPTZ,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, view_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'view_id',
  clustering_key = 'view_time:asc',
  event_time_column = 'view_time',
  bitmap_columns = 'user_id,page_url'
);

4. Dimension Filtering (Non-Time)

CREATE TABLE product_stats (
  product_id BIGINT NOT NULL PRIMARY KEY,
  category TEXT,
  brand TEXT,
  sales_count BIGINT,
  revenue DECIMAL(18,2)
)
WITH (
  orientation = 'column',
  distribution_key = 'product_id',
  clustering_key = 'category:asc',
  bitmap_columns = 'category,brand'
);

5. JOIN-Optimized Fact + Dimension

-- Fact table: distribute by join key
CREATE TABLE fact_orders (
  order_id BIGINT NOT NULL PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL(18,2),
  order_date TEXT
)
WITH (
  orientation = 'column',
  distribution_key = 'customer_id'
);

-- Dimension table: same distribution key
CREATE TABLE dim_customers (
  customer_id BIGINT NOT NULL PRIMARY KEY,
  name TEXT,
  region TEXT
)
WITH (
  orientation = 'row,column',
  distribution_key = 'customer_id'
);
-- JOIN on customer_id → local join, no shuffle

6. Mixed Workload (OLAP + Serving)

CREATE TABLE realtime_metrics (
  metric_id BIGINT NOT NULL,
  device_id BIGINT NOT NULL,
  value DOUBLE PRECISION,
  ts TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, device_id, metric_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'row,column',
  distribution_key = 'device_id',
  clustering_key = 'ts:asc',
  event_time_column = 'ts',
  bitmap_columns = 'device_id'
);

References

Document Content
data-types.md Complete data type reference
table-properties.md Detailed table property guide with selection flowcharts
partition-guide.md Partition table design and management

Best Practices

  1. Choose storage format first — column for OLAP, row for KV, row-column when unsure
  2. Set distribution_key to JOIN/GROUP BY columns — avoids cross-shard shuffle
  3. Set clustering_key for range-query columns — improves time-range and filter queries
  4. Set event_time_column for time-series data — enables file-level pruning
  5. Use bitmap_columns for low-cardinality filter columns — status, type, region
  6. Partition by date only if daily data > 100M rows — otherwise use segment key alone
  7. PK must include partition column — required by Hologres for partition tables
  8. Align distribution_key across JOINed tables — same column enables local join
  9. Use WITH syntax (V2.1+) — cleaner than CALL set_table_property in transactions
  10. Avoid SERIAL as primary key — causes table-level write locks
Usage Guidance
Before installing, confirm you trust the hologres-cli package and understand which Hologres account it will use. Before running any generated SQL, especially WRITE, ALTER, DROP, or scheduled partition cleanup commands, review the target database, schema, and table names and keep backups for important data.
Capability Analysis
Type: OpenClaw Skill Name: hologres-schema-generator Version: 0.2.0 The skill bundle is a legitimate tool designed to assist users in generating optimized Hologres DDL schemas. The instructions in SKILL.md and the supporting documentation in the references/ directory provide detailed, technically accurate guidance on Hologres-specific features like distribution keys, clustering keys, and partitioning strategies. There are no signs of malicious intent, data exfiltration, or harmful prompt injection; the logic is entirely focused on schema design and best practices (e.g., recommending partition-based lifecycles over non-deterministic TTL). The mention of 'hologres-cli' is consistent with the tool's purpose for interacting with Alibaba Cloud's Hologres service.
Capability Assessment
Purpose & Capability
The skill is coherent with its stated purpose of generating Hologres DDL and table-design guidance. It also includes database-changing examples such as table creation and partition deletion, which are expected for this purpose but should be treated carefully.
Instruction Scope
The visible instructions emphasize asking for schema requirements before generating DDL. They do not show hidden or unrelated behavior, but users should ensure explicit approval before running write or delete SQL.
Install Mechanism
Although the registry says there is no install spec and no required binaries, SKILL.md tells users to install hologres-cli with pip. This is purpose-aligned but under-declared and unpinned.
Credentials
No credentials or environment variables are declared. If hologres-cli is configured locally, SQL would run with that user's Hologres permissions, which is proportionate but potentially high-impact.
Persistence & Privilege
The references include scheduled partition-lifecycle examples such as dropping old partitions via cron. This is disclosed and database-administration-related, but persistent deletion jobs should be user-controlled.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install hologres-schema-generator
  3. After installation, invoke the skill by name or use /hologres-schema-generator
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.2.0
- Improved schema design workflow: now requires clear info on query patterns, key columns, and main query conditions before generating DDL. - Added comprehensive information-gathering guidelines, including critical and conditional questions, to ensure correct and efficient table creation. - Expanded documentation on recommended storage formats (column, row, row-column) and when to choose each based on workload. - Enhanced examples and quick-start usage for creating optimized, production-ready Hologres tables. - Detailed the rationale and default behaviors for properties such as bitmap_columns, dictionary_encoding, storage_mode, and partition management.
Metadata
Slug hologres-schema-generator
Version 0.2.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Hologres Schema Generator?

Hologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuri... It is an AI Agent Skill for Claude Code / OpenClaw, with 50 downloads so far.

How do I install Hologres Schema Generator?

Run "/install hologres-schema-generator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Hologres Schema Generator free?

Yes, Hologres Schema Generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Hologres Schema Generator support?

Hologres Schema Generator is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Hologres Schema Generator?

It is built and maintained by francis (@wenbingyu); the current version is v0.2.0.

💬 Comments