
Metaxy + ClickHouse

Metaxy implements ClickHouseMetadataStore, which uses ClickHouse as a metadata storage and versioning engine.

Installation

pip install 'metaxy[clickhouse]'

Metaxy's Versioning Struct Columns

Metaxy uses struct columns (metaxy_provenance_by_field, metaxy_data_version_by_field) to track field-level versioning. In Python, these correspond to dict[str, str]. ClickHouse offers several ways to represent these columns.

How ClickHouse Handles Structs

ClickHouse offers multiple approaches to represent Metaxy's structured versioning columns:

| Type | Description | Use Case |
|------|-------------|----------|
| Map(String, String) | Native key-value map | Recommended for Metaxy because of dynamic keys |
| JSON | Native JSON with typed subcolumns | Less performant than Map(String, String) but more flexible than Nested |
| Nested(field_1 String, ...) | Static struct with named fields | More performant than Map(String, String) but keys are static |

Recommended: Map(String, String)

For Metaxy's metaxy_provenance_by_field and metaxy_data_version_by_field columns, use Map(String, String):

  • No migrations required when feature fields change

  • Good performance for key-value lookups
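
As a rough illustration of the points above, the sketch below builds the column definitions for the two versioning columns and a lookup query that reads a single field with ClickHouse's map subscript syntax. The table name and the helper functions are hypothetical and are not part of Metaxy; only the two column names and the Map(String, String) type come from this page.

```python
# Hypothetical table name, used for illustration only.
TABLE = "my_feature_metadata"

VERSIONING_COLUMNS = (
    "metaxy_provenance_by_field",
    "metaxy_data_version_by_field",
)


def map_columns_ddl() -> str:
    """Return column definitions for the two versioning columns."""
    return ",\n".join(
        f"    {name} Map(String, String)" for name in VERSIONING_COLUMNS
    )


def field_lookup_sql(table: str, field: str) -> str:
    """Build a query that reads one field's provenance from the Map column."""
    # Interpolating into SQL is only acceptable here because the field
    # name is restricted to a plain identifier.
    if not field.isidentifier():
        raise ValueError(f"unexpected field name: {field!r}")
    # ClickHouse reads a Map value with the m['key'] subscript syntax.
    return (
        f"SELECT metaxy_provenance_by_field['{field}'] AS provenance "
        f"FROM {table}"
    )


print(map_columns_ddl())
print(field_lookup_sql(TABLE, "field_a"))
```

Because the keys live in the data rather than in the schema, adding a new feature field only adds a new key to the map; the column definitions stay the same.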

Special Map columns handling

Metaxy transforms its system columns (metaxy_provenance_by_field, metaxy_data_version_by_field):

  • Reading: System Map columns are converted into Ibis Structs (e.g., Struct[{"field_a": str, "field_b": str}])

  • Writing: If the input comes from Polars, Polars Structs are converted into the expected ClickHouse Map format

User-defined Map columns are not transformed. They remain as List[Struct[{"key": str, "value": str}]] (Arrow's Map representation). Make sure to use the right format when providing a Polars DataFrame for writing.
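
To make the key/value layout concrete, here is a minimal pure-Python sketch that converts a plain dict into the list-of-entries shape described above, and back. The helper names and the sample values are hypothetical; Metaxy does not provide these functions, and the sketch only illustrates the shape of the data, not how to build the Polars column itself.

```python
def dict_to_map_entries(mapping: dict[str, str]) -> list[dict[str, str]]:
    """Convert {"k": "v"} into [{"key": "k", "value": "v"}]."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def map_entries_to_dict(entries: list[dict[str, str]]) -> dict[str, str]:
    """Inverse of dict_to_map_entries."""
    return {entry["key"]: entry["value"] for entry in entries}


# Hypothetical value of a user-defined Map column for a single row.
tags = {"lang": "en", "source": "crawl"}

entries = dict_to_map_entries(tags)
print(entries)
print(map_entries_to_dict(entries) == tags)
```

Each row of a user-defined Map column would then hold one such list of entries rather than a struct with one field per key.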

SQLAlchemy and Alembic Migrations

For SQLAlchemy and Alembic migrations support, use the clickhouse-sqlalchemy driver with the native protocol:

pip install clickhouse-sqlalchemy

Use Native ClickHouse Protocol

The HTTP protocol has limited reflection support. Always use the native protocol (clickhouse+native://) for full SQLAlchemy/Alembic compatibility:

connection_string = "clickhouse+native://user:pass@localhost:9000/default"
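
When credentials contain characters such as @ or /, they need to be percent-encoded before being placed in the URL. The sketch below assembles a native-protocol connection string from its parts; the helper name and its defaults are assumptions made for illustration (9000 is ClickHouse's default native TCP port, as in the example above).

```python
from urllib.parse import quote_plus


def native_clickhouse_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 9000,
    database: str = "default",
) -> str:
    """Assemble a clickhouse+native:// URL with escaped credentials."""
    # quote_plus percent-encodes reserved characters such as "@" and "/".
    return (
        f"clickhouse+native://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


print(native_clickhouse_url("user", "pass"))
print(native_clickhouse_url("user", "p@ss/word"))
```

With clickhouse-sqlalchemy installed, the resulting string can be passed to sqlalchemy.create_engine or used as the Alembic sqlalchemy.url setting.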

The ClickHouseMetadataStore.sqlalchemy_url property returns the native-protocol variant of the connection string.

Alternative: ClickHouse Connect

Alternatively, use the official clickhouse-connect driver.

Alembic Integration

See Alembic setup guide for additional instructions on how to use Alembic with Metaxy.

Performance Optimization

Table Design

For optimal query performance, create your ClickHouse tables with:

  • Partitioning: Partition your tables so that queries can skip partitions that do not match their filters.
  • Primary Key: A reasonable starting point is (metaxy_feature_version, <id_columns>, metaxy_created_at)
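
One way to apply both recommendations is sketched below as a small DDL builder. It is a trimmed, hypothetical schema: the column types, the String type for id columns, the monthly partition key, and the helper itself are assumptions, not Metaxy's actual table layout. In a MergeTree table the primary key defaults to the ORDER BY expression when no PRIMARY KEY clause is given, which is what this sketch relies on.

```python
def create_table_ddl(table: str, id_columns: list[str]) -> str:
    """Build a CREATE TABLE statement with the suggested sorting key."""
    # Suggested key: (metaxy_feature_version, <id_columns>, metaxy_created_at)
    order_by = ", ".join(
        ["metaxy_feature_version", *id_columns, "metaxy_created_at"]
    )
    # Assumption: id columns are strings; adjust to the real types.
    id_defs = "".join(f"    {col} String,\n" for col in id_columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"{id_defs}"
        "    metaxy_feature_version String,\n"
        "    metaxy_created_at DateTime64(6),\n"
        "    metaxy_provenance_by_field Map(String, String),\n"
        "    metaxy_data_version_by_field Map(String, String)\n"
        ") ENGINE = MergeTree\n"
        # Assumption: monthly partitions on the creation timestamp.
        "PARTITION BY toYYYYMM(metaxy_created_at)\n"
        f"ORDER BY ({order_by})"
    )


print(create_table_ddl("my_feature_metadata", ["sample_id"]))
```

The partition expression is the part most worth adapting: a coarse key such as a month usually keeps the number of partitions manageable, but the right choice depends on data volume and query patterns.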

Reference