ClickHouse Metadata Store API
metaxy.metadata_store.clickhouse
This module implements IbisMetadataStore for ClickHouse.
It handles ClickHouse-specific logic, such as converting nw.Struct columns to and from ClickHouse types like Map(K,V).
metaxy.metadata_store.clickhouse.ClickHouseMetadataStore
ClickHouseMetadataStore(connection_string: str | None = None, *, connection_params: dict[str, Any] | None = None, fallback_stores: list[MetadataStore] | None = None, **kwargs: Any)
Bases: IbisMetadataStore
ClickHouse metadata store using Ibis backend.
Connection Parameters
Parameters:
- connection_string (str | None, default: None) – ClickHouse connection string. Format: clickhouse://[user[:password]@]host[:port]/database[?param=value]. Example: "clickhouse://localhost:8443/default"
- connection_params (dict[str, Any] | None, default: None) – Alternative to connection_string; specify the parameters as a dict:
    - host: Server host
    - port: Server port (default: 8443)
    - database: Database name
    - user: Username
    - password: Password
    - secure: Use secure connection (default: False)
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores.
- **kwargs (Any, default: {}) – Passed to metaxy.metadata_store.ibis.IbisMetadataStore
Raises:
- ImportError – If ibis-clickhouse is not installed
- ValueError – If neither connection_string nor connection_params is provided
Source code in src/metaxy/metadata_store/clickhouse.py
def __init__(
    self,
    connection_string: str | None = None,
    *,
    connection_params: dict[str, Any] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    **kwargs: Any,
):
    """
    Initialize [ClickHouse](https://clickhouse.com/) metadata store.

    Args:
        connection_string: ClickHouse connection string.
            Format: `clickhouse://[user[:password]@]host[:port]/database[?param=value]`
            Example:
                ```
                "clickhouse://localhost:8443/default"
                ```
        connection_params: Alternative to connection_string, specify params as dict:
            - host: Server host
            - port: Server port (default: `8443`)
            - database: Database name
            - user: Username
            - password: Password
            - secure: Use secure connection (default: `False`)
        fallback_stores: Ordered list of read-only fallback stores.
        **kwargs: Passed to [metaxy.metadata_store.ibis.IbisMetadataStore][]

    Raises:
        ImportError: If ibis-clickhouse is not installed
        ValueError: If neither connection_string nor connection_params is provided
    """
    if connection_string is None and connection_params is None:
        raise ValueError(
            "Must provide either connection_string or connection_params. "
            "Example: connection_string='clickhouse://localhost:8443/default'"
        )

    # Cache for ClickHouse table schemas (cleared on close)
    self._ch_schema_cache: dict[str, IbisSchema] = {}

    # Initialize Ibis store with ClickHouse backend
    super().__init__(
        connection_string=connection_string,
        backend="clickhouse" if connection_string is None else None,
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )
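The connection_string format documented above carries the same information as the keys of connection_params. The standard-library sketch below shows that correspondence. parse_clickhouse_url is a hypothetical helper, not part of metaxy; it assumes extra ?param=value pairs can be copied through as plain strings and leaves out any key the URL does not specify rather than filling in defaults.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_clickhouse_url(url: str) -> dict[str, object]:
    """Hypothetical helper: split a clickhouse:// URL into connection_params-style keys."""
    parts = urlsplit(url)
    if parts.scheme != "clickhouse":
        raise ValueError(f"Expected a clickhouse:// URL, got scheme {parts.scheme!r}")

    params: dict[str, object] = {}
    if parts.hostname:
        params["host"] = parts.hostname
    if parts.port is not None:
        params["port"] = parts.port
    database = parts.path.lstrip("/")
    if database:
        params["database"] = database
    # urlsplit does not percent-decode credentials, so decode them here.
    if parts.username:
        params["user"] = unquote(parts.username)
    if parts.password:
        params["password"] = unquote(parts.password)
    # Extra ?param=value pairs (assumption: copied as strings, last value wins).
    for key, values in parse_qs(parts.query).items():
        params[key] = values[-1]
    return params
```

For example, parse_clickhouse_url("clickhouse://localhost:8443/default") yields host, port, and database entries only, since the URL carries no credentials.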
Attributes
metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.sqlalchemy_url
property
sqlalchemy_url: str
Get SQLAlchemy-compatible connection URL for ClickHouse.
Overrides the base implementation to return the native protocol format (clickhouse+native://), which clickhouse-sqlalchemy needs for reliable SQLAlchemy/Alembic reflection.
The HTTP protocol used by Ibis has limited reflection capabilities.
Port mapping (assumes default ports):
- HTTP 8123 (non-secure) → Native 9000
- HTTP 8443 (secure) → Native 9440
For secure connections, adds secure=True query parameter.
Returns:
- str – SQLAlchemy-compatible URL string with native protocol
Raises:
- ValueError – If connection_string is not available
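A minimal standard-library sketch of the URL rewrite described above. It assumes the input has the clickhouse:// form, that the caller passes the secure flag explicitly, and that ports outside the documented mapping pass through unchanged; to_native_sqlalchemy_url and HTTP_TO_NATIVE_PORT are hypothetical names, not the property's actual code.

```python
from urllib.parse import urlsplit, urlunsplit

# Documented default-port mapping: HTTP port -> native-protocol port.
HTTP_TO_NATIVE_PORT = {8123: 9000, 8443: 9440}


def to_native_sqlalchemy_url(http_url: str, secure: bool = False) -> str:
    """Hypothetical helper: rewrite clickhouse:// (HTTP) as clickhouse+native://."""
    parts = urlsplit(http_url)
    if parts.scheme != "clickhouse":
        raise ValueError(f"Expected a clickhouse:// URL, got scheme {parts.scheme!r}")

    # Fall back to the default HTTP port for the chosen security mode.
    http_port = parts.port if parts.port is not None else (8443 if secure else 8123)
    # Assumption: non-default ports are kept as-is.
    native_port = HTTP_TO_NATIVE_PORT.get(http_port, http_port)

    # Keep any "user:password@" prefix; only the host:port part changes.
    userinfo, _, _ = parts.netloc.rpartition("@")
    netloc = f"{parts.hostname}:{native_port}"
    if userinfo:
        netloc = f"{userinfo}@{netloc}"

    query = parts.query
    if secure:
        query = f"{query}&secure=True" if query else "secure=True"
    return urlunsplit(("clickhouse+native", netloc, parts.path, query, ""))
```

Passing secure explicitly keeps the sketch simple; a real implementation might instead read it from the store's connection parameters.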
Functions
metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.transform_after_read
transform_after_read(table: Table, feature_key: FeatureKey) -> Table
Transform ClickHouse-specific column types for PyArrow compatibility.
Handles:
- JSON columns: Cast to String (ClickHouse driver returns dict, PyArrow expects bytes)
- Map(String, String) metaxy columns: Convert to named Struct by extracting keys
For metaxy Map columns (metaxy_provenance_by_field, metaxy_data_version_by_field), we build a named Struct from map key accesses using known field names from the feature spec.
User-defined Map columns are left as-is and will appear in, e.g., Polars as List[Struct{key, value}] (the standard Arrow Map representation).
Source code in src/metaxy/metadata_store/clickhouse.py
def transform_after_read(
    self, table: "ibis.Table", feature_key: "FeatureKey"
) -> "ibis.Table":
    """Transform ClickHouse-specific column types for PyArrow compatibility.

    Handles:
    - `JSON` columns: Cast to String (ClickHouse driver returns dict, PyArrow expects bytes)
    - `Map(String, String)` metaxy columns: Convert to named Struct by extracting keys

    For metaxy Map columns (`metaxy_provenance_by_field`, `metaxy_data_version_by_field`),
    we build a named Struct from map key accesses using known field names from the
    feature spec.

    User-defined Map columns are left as-is and will appear in e.g. Polars as
    `List[Struct{key, value}]` (the standard Arrow Map representation).
    """
    import ibis.expr.datatypes as dt

    from metaxy.models.constants import (
        METAXY_DATA_VERSION_BY_FIELD,
        METAXY_PROVENANCE_BY_FIELD,
    )

    # Only convert these metaxy system Map columns to Struct
    metaxy_map_columns = {METAXY_PROVENANCE_BY_FIELD, METAXY_DATA_VERSION_BY_FIELD}

    schema = table.schema()
    mutations: dict[str, Any] = {}

    for col_name, dtype in schema.items():
        if isinstance(dtype, dt.JSON):
            # JSON → String (can't convert to Struct due to ClickHouse CAST limitations)
            mutations[col_name] = table[col_name].cast("string")
        elif isinstance(dtype, dt.Map) and col_name in metaxy_map_columns:
            # Only convert metaxy system Map(String, String) columns to Struct
            # User-defined Map columns are left as-is
            mutations[col_name] = self._map_to_struct_expr(
                table, col_name, dtype, feature_key
            )

    if not mutations:
        return table

    return table.mutate(**mutations)
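The method itself works on Ibis expressions through _map_to_struct_expr, whose body is not shown here. The row-level sketch below mirrors only its dispatch rules on plain Python values, assuming a Map column arrives as a list of (key, value) pairs and that the constants resolve to the column names listed above; transform_row and METAXY_MAP_COLUMNS are hypothetical names.

```python
import json
from typing import Any

# Assumed values of METAXY_PROVENANCE_BY_FIELD / METAXY_DATA_VERSION_BY_FIELD.
METAXY_MAP_COLUMNS = {"metaxy_provenance_by_field", "metaxy_data_version_by_field"}


def transform_row(
    row: dict[str, Any], field_names: list[str], json_columns: set[str]
) -> dict[str, Any]:
    """Hypothetical row-level analogue of transform_after_read's per-column dispatch."""
    out: dict[str, Any] = {}
    for col, value in row.items():
        if col in json_columns:
            # JSON -> string, so downstream Arrow code sees text instead of a dict.
            out[col] = json.dumps(value)
        elif col in METAXY_MAP_COLUMNS:
            # Map as (key, value) pairs -> named struct with one entry per known
            # field; fields missing from the map become None.
            lookup = dict(value)
            out[col] = {name: lookup.get(name) for name in field_names}
        else:
            # User-defined Map columns and all other columns pass through unchanged.
            out[col] = value
    return out
```

Because field_names comes from the feature spec, every row gets the same struct shape even when a map lacks some keys, which is what lets the result form a single typed Struct column.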
metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.transform_before_write
transform_before_write(df: Frame, feature_key: FeatureKey, table_name: str) -> Frame
Transform Polars Struct columns to Map format for ClickHouse.
If the ClickHouse table has Map(K,V) columns but the DataFrame has Struct columns, convert the Struct to Map format before inserting.
Source code in src/metaxy/metadata_store/clickhouse.py
def transform_before_write(
    self, df: Frame, feature_key: "FeatureKey", table_name: str
) -> Frame:
    """Transform Polars Struct columns to Map format for ClickHouse.

    If the ClickHouse table has Map(K,V) columns but the DataFrame has Struct
    columns, convert the Struct to Map format before inserting.
    """
    # Check if table exists and get its schema
    if table_name not in self.conn.list_tables():
        return df

    ch_schema = self._get_cached_schema(table_name)
    return self._transform_struct_to_map(df, ch_schema)
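The write path reverses the read-side conversion and looks up the target schema through a per-table cache. The sketch below models both on plain Python rows. StructToMapWriter, fetch_schema, and the string-typed schema are hypothetical stand-ins for the Ibis connection, _get_cached_schema, and _transform_struct_to_map, whose bodies are not shown here; detecting Map columns by a "Map(" type prefix is likewise an assumption.

```python
from typing import Any, Callable

Row = dict[str, Any]


class StructToMapWriter:
    """Hypothetical stand-in: caches table schemas and converts struct values to map entries."""

    def __init__(self, fetch_schema: Callable[[str], dict[str, str]]) -> None:
        # fetch_schema(table_name) -> {column: ClickHouse type string} (assumed interface).
        self._fetch_schema = fetch_schema
        self._schema_cache: dict[str, dict[str, str]] = {}

    def _schema(self, table_name: str) -> dict[str, str]:
        # Fetch each table's schema once and reuse it until close().
        if table_name not in self._schema_cache:
            self._schema_cache[table_name] = self._fetch_schema(table_name)
        return self._schema_cache[table_name]

    def transform_before_write(
        self, rows: list[Row], table_name: str, existing_tables: set[str]
    ) -> list[Row]:
        # No existing table means no Map columns to match, so rows pass through.
        if table_name not in existing_tables:
            return rows
        schema = self._schema(table_name)
        map_columns = {c for c, ch_type in schema.items() if ch_type.startswith("Map(")}
        # Struct (dict) values headed for Map columns become (key, value) pairs.
        return [
            {
                col: (list(value.items()) if col in map_columns and isinstance(value, dict) else value)
                for col, value in row.items()
            }
            for row in rows
        ]

    def close(self) -> None:
        # Mirrors "cleared on close" for the schema cache.
        self._schema_cache.clear()
```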
metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.config_model
classmethod
config_model() -> type[ClickHouseMetadataStoreConfig]
Return the configuration model class for this store type.
Subclasses must override this to return their specific config class.
Returns:
- type[MetadataStoreConfig] – The config class type (e.g., DuckDBMetadataStoreConfig)
Note
Subclasses override this with a more specific return type. Type checkers may warn about an incompatible override, but this is intentional: each store returns its own config type.
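The override pattern described above can be sketched with dataclasses. StoreConfig, ClickHouseStoreConfig, BaseStore, and ClickHouseStore are hypothetical, simplified stand-ins, not metaxy's actual classes; the point is only that each store class hands back its own config class, which callers can then instantiate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoreConfig:
    # Hypothetical shared setting for all stores.
    fallback_stores: list[str] | None = None


@dataclass
class ClickHouseStoreConfig(StoreConfig):
    # Hypothetical store-specific setting.
    connection_string: str | None = None


class BaseStore:
    @classmethod
    def config_model(cls) -> type[StoreConfig]:
        # The base class has no concrete config; subclasses must override.
        raise NotImplementedError(f"{cls.__name__} must override config_model()")


class ClickHouseStore(BaseStore):
    @classmethod
    def config_model(cls) -> type[ClickHouseStoreConfig]:
        # Narrower return type: this store's own config class.
        return ClickHouseStoreConfig
```

A caller can then build a config without knowing the concrete store type, e.g. ClickHouseStore.config_model()(connection_string="clickhouse://localhost:8443/default").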