Metaxy + ClickHouse
Metaxy implements ClickHouseMetadataStore, which uses ClickHouse as a metadata storage and versioning engine.
Installation
Metaxy's Versioning Struct Columns
Metaxy uses struct columns (metaxy_provenance_by_field, metaxy_data_version_by_field) to track field-level versioning. In Python, these correspond to dict[str, str]. ClickHouse offers several ways to represent such columns.
How ClickHouse Handles Structs
ClickHouse offers multiple approaches to represent Metaxy's structured versioning columns:
| Type | Description | Use Case |
|---|---|---|
| Map(String, String) | Native key-value map | Recommended for Metaxy because of dynamic keys |
| JSON | Native JSON with typed subcolumns | Less performant than Map(String, String) but more flexible than Nested |
| Nested(field_1 String, ...) | Static struct with named fields | More performant than Map(String, String) but keys are static |
Recommended: Map(String, String)
For Metaxy's metaxy_provenance_by_field and metaxy_data_version_by_field columns, use Map(String, String):
- No migrations required when feature fields change
- Good performance for key-value lookups
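The "no migrations" point can be illustrated with a small sketch. The helper names below (map_column_type, nested_column_type) and the field names are hypothetical and are not part of Metaxy. They only render the ClickHouse column type that each approach would need for a given set of feature fields.

```python
def map_column_type(fields: list[str]) -> str:
    """Column type for the Map approach: independent of the field names."""
    return "Map(String, String)"


def nested_column_type(fields: list[str]) -> str:
    """Column type for the Nested approach: every field is part of the schema."""
    return "Nested(" + ", ".join(f"{name} String" for name in fields) + ")"


# Hypothetical feature fields before and after adding a new field.
fields_before = ["audio", "frames"]
fields_after = ["audio", "frames", "captions"]

# The Map type is unchanged, so the table schema does not need to be altered.
map_needs_migration = map_column_type(fields_before) != map_column_type(fields_after)

# The Nested type changes, so the table schema would have to be migrated.
nested_needs_migration = nested_column_type(fields_before) != nested_column_type(fields_after)
```

With Map(String, String), new field names simply become new keys in the stored values, whereas a static struct encodes the field names in the table schema itself.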
Special Map columns handling
Metaxy transforms its system columns (metaxy_provenance_by_field, metaxy_data_version_by_field):
- Reading: System Map columns are converted into Ibis Structs (e.g., Struct[{"field_a": str, "field_b": str}])
- Writing: If the input comes from Polars, Polars Structs are converted into the expected ClickHouse Map format
User-defined Map columns are not transformed. They remain as List[Struct[{"key": str, "value": str}]] (Arrow's Map representation), so provide them in this format when passing a Polars DataFrame for writing.
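The two shapes can be compared with a plain-Python sketch. This is a conceptual illustration only, not Metaxy's conversion code, and the helper names (struct_to_map_entries, map_entries_to_struct) are hypothetical. A struct-like value is a dict keyed by field name, while Arrow's Map representation is a list of key/value entries.

```python
def struct_to_map_entries(struct: dict[str, str]) -> list[dict[str, str]]:
    """Turn a struct-like dict into the list-of-entries shape used for Map columns."""
    return [{"key": key, "value": value} for key, value in struct.items()]


def map_entries_to_struct(entries: list[dict[str, str]]) -> dict[str, str]:
    """Turn a list of key/value entries back into a struct-like dict."""
    return {entry["key"]: entry["value"] for entry in entries}


# Hypothetical value of a user-defined Map column for one row.
tags = {"source": "camera_1", "split": "train"}

entries = struct_to_map_entries(tags)
# entries has the shape List[Struct[{"key": str, "value": str}]]
round_tripped = map_entries_to_struct(entries)
```

In Polars, a column holding such entries would presumably have the dtype pl.List(pl.Struct({"key": pl.String, "value": pl.String})), which matches the format described above.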
SQLAlchemy and Alembic Migrations
For SQLAlchemy and Alembic migrations support, use the clickhouse-sqlalchemy driver with the native protocol.
Use Native ClickHouse Protocol
The HTTP protocol has limited reflection support. Always use the native protocol (clickhouse+native://) for full SQLAlchemy/Alembic compatibility.
The ClickHouseMetadataStore.sqlalchemy_url property returns the native-protocol variant of the connection string.
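As a rough sketch of what such a conversion involves, the hypothetical helper below rewrites an HTTP-style SQLAlchemy URL into a clickhouse+native:// one. It is not Metaxy's implementation. The port mapping assumes ClickHouse's default ports (8123 and 8443 for HTTP/HTTPS, 9000 and 9440 for the native protocol), and the credentials, host and database in the example are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the server uses ClickHouse's default ports.
HTTP_TO_NATIVE_PORT = {8123: 9000, 8443: 9440}


def to_native_url(url: str) -> str:
    """Return the URL with the clickhouse+native scheme and a native port."""
    parts = urlsplit(url)
    netloc = parts.netloc
    port = parts.port
    if port in HTTP_TO_NATIVE_PORT:
        # Replace only the trailing port digits; keep user info and host as-is.
        netloc = netloc[: -len(str(port))] + str(HTTP_TO_NATIVE_PORT[port])
    return urlunsplit(parts._replace(scheme="clickhouse+native", netloc=netloc))


native_url = to_native_url("clickhouse+http://user:secret@localhost:8123/default")
```

Non-default ports are left unchanged, so a deployment with custom ports would need its native port supplied explicitly.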
Alternative: ClickHouse Connect
Alternatively, use the official clickhouse-connect driver.
Alembic Integration
See the Alembic setup guide for additional instructions on how to use Alembic with Metaxy.
Performance Optimization
Table Design
For optimal query performance, create your ClickHouse tables with:
- Partitioning: Partition your tables!
- Primary Key: It's probably a good idea to use (metaxy_feature_version, <id_columns>, metaxy_created_at)
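The table design above could be sketched as a small DDL builder. Everything beyond the column names taken from this page is an assumption: the helper build_create_table, the table name, the sample_id column, the column types, the MergeTree engine and the monthly partition expression are illustrative only, and a real Metaxy table may contain additional system columns. In a MergeTree table the primary key defaults to the ORDER BY expression when no PRIMARY KEY clause is given.

```python
def build_create_table(
    table: str,
    id_columns: dict[str, str],
    partition_by: str = "toYYYYMM(metaxy_created_at)",  # assumption: monthly partitions
) -> str:
    """Render a CREATE TABLE statement following the suggested key layout."""
    columns = {
        **id_columns,
        "metaxy_feature_version": "String",  # assumed type
        "metaxy_provenance_by_field": "Map(String, String)",
        "metaxy_data_version_by_field": "Map(String, String)",
        "metaxy_created_at": "DateTime64(6)",  # assumed type
    }
    # Suggested key: (metaxy_feature_version, <id_columns>, metaxy_created_at)
    order_by = ["metaxy_feature_version", *id_columns, "metaxy_created_at"]
    column_sql = ",\n    ".join(f"{name} {type_}" for name, type_ in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n    {column_sql}\n)\n"
        "ENGINE = MergeTree\n"
        f"PARTITION BY {partition_by}\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


# Hypothetical feature table with a single ID column.
ddl = build_create_table("my_feature", {"sample_id": "String"})
print(ddl)
```

The partition expression is a parameter because a suitable partitioning scheme depends on data volume and query patterns.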