ClickHouse Metadata Store API

metaxy.metadata_store.clickhouse

This module implements IbisMetadataStore for ClickHouse.

It handles ClickHouse-specific logic, such as converting between nw.Struct columns and ClickHouse types like Map(K,V).

metaxy.metadata_store.clickhouse.ClickHouseMetadataStore

ClickHouseMetadataStore(connection_string: str | None = None, *, connection_params: dict[str, Any] | None = None, fallback_stores: list[MetadataStore] | None = None, **kwargs: Any)

Bases: IbisMetadataStore

ClickHouse metadata store using Ibis backend.

Connection Parameters
store = ClickHouseMetadataStore(
    connection_params={
        "host": "localhost",
        "port": 8443,
        "database": "default",
        "user": "default",
        "password": ""
    },
    hash_algorithm=HashAlgorithm.XXHASH64
)

Parameters:

  • connection_string (str | None, default: None ) –

    ClickHouse connection string.

    Format: clickhouse://[user[:password]@]host[:port]/database[?param=value]

    Example:

    "clickhouse://localhost:8443/default"
    

  • connection_params (dict[str, Any] | None, default: None ) –

    Alternative to connection_string, specify params as dict:

    • host: Server host

    • port: Server port (default: 8443)

    • database: Database name

    • user: Username

    • password: Password

    • secure: Use secure connection (default: False)

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • **kwargs (Any, default: {} ) –

    Passed to metaxy.metadata_store.ibis.IbisMetadataStore.

Raises:

  • ImportError

    If ibis-clickhouse is not installed

  • ValueError

    If neither connection_string nor connection_params is provided
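
The connection string format above maps onto the same keys that connection_params accepts. As a rough illustration only, the sketch below splits such a URL with the standard library. `parse_clickhouse_url` is a hypothetical helper, not part of metaxy or Ibis, and the way the real store parses the URL may differ.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def parse_clickhouse_url(url: str) -> dict[str, object]:
    """Split a clickhouse:// URL into connection_params-style keys (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "clickhouse":
        raise ValueError(f"expected a clickhouse:// URL, got scheme {parts.scheme!r}")
    params: dict[str, object] = {
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "database": parts.path.lstrip("/") or None,
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }
    # Query parameters such as ?secure=true are kept as plain strings.
    params.update(parse_qsl(parts.query))
    return params


example = parse_clickhouse_url(
    "clickhouse://default:secret@localhost:8443/default?secure=true"
)
```

Here `example` holds the host, port, database, user, and password from the URL, plus a `secure` entry taken from the query string.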

Source code in src/metaxy/metadata_store/clickhouse.py
def __init__(
    self,
    connection_string: str | None = None,
    *,
    connection_params: dict[str, Any] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    **kwargs: Any,
):
    """
    Initialize [ClickHouse](https://clickhouse.com/) metadata store.

    Args:
        connection_string: ClickHouse connection string.

            Format: `clickhouse://[user[:password]@]host[:port]/database[?param=value]`

            Example:
                ```
                "clickhouse://localhost:8443/default"
                ```

        connection_params: Alternative to connection_string, specify params as dict:

            - host: Server host

            - port: Server port (default: `8443`)

            - database: Database name

            - user: Username

            - password: Password

            - secure: Use secure connection (default: `False`)

        fallback_stores: Ordered list of read-only fallback stores.

        **kwargs: Passed to [metaxy.metadata_store.ibis.IbisMetadataStore][]

    Raises:
        ImportError: If ibis-clickhouse is not installed
        ValueError: If neither connection_string nor connection_params is provided
    """
    if connection_string is None and connection_params is None:
        raise ValueError(
            "Must provide either connection_string or connection_params. "
            "Example: connection_string='clickhouse://localhost:8443/default'"
        )

    # Cache for ClickHouse table schemas (cleared on close)
    self._ch_schema_cache: dict[str, IbisSchema] = {}

    # Initialize Ibis store with ClickHouse backend
    super().__init__(
        connection_string=connection_string,
        backend="clickhouse" if connection_string is None else None,
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )
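
The constructor above creates `_ch_schema_cache`, a per-table schema cache that is cleared on close. A minimal sketch of that memoization pattern follows. `SchemaCache` and the `fetch` callable are hypothetical names for illustration, and schemas are modelled as plain dicts rather than Ibis schema objects.

```python
from collections.abc import Callable


class SchemaCache:
    """Memoize one schema lookup per table name (illustrative, not metaxy's class)."""

    def __init__(self, fetch: Callable[[str], dict[str, str]]) -> None:
        self._fetch = fetch  # assumed to query the database for a table's schema
        self._cache: dict[str, dict[str, str]] = {}

    def get(self, table_name: str) -> dict[str, str]:
        # Only hit the database the first time a table is requested.
        if table_name not in self._cache:
            self._cache[table_name] = self._fetch(table_name)
        return self._cache[table_name]

    def clear(self) -> None:
        # Called on close so that a reopened store sees schema changes.
        self._cache.clear()


calls: list[str] = []


def fake_fetch(table_name: str) -> dict[str, str]:
    calls.append(table_name)
    return {"metaxy_provenance_by_field": "Map(String, String)"}


cache = SchemaCache(fake_fetch)
cache.get("features")
cache.get("features")  # served from the cache, no second fetch
```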

Attributes

metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.sqlalchemy_url property

sqlalchemy_url: str

Get SQLAlchemy-compatible connection URL for ClickHouse.

Overrides the base implementation to return the native protocol format (clickhouse+native://), which gives better SQLAlchemy/Alembic reflection support in clickhouse-sqlalchemy.

The HTTP protocol used by Ibis has limited reflection capabilities.

Port mapping (assumes default ports):

  • HTTP 8123 (non-secure) → Native 9000

  • HTTP 8443 (secure) → Native 9440

For secure connections, adds secure=True query parameter.

Returns:

  • str

    SQLAlchemy-compatible URL string with native protocol

Raises:

  • ValueError

    If connection_string is not available
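
The port mapping described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the property's actual implementation: `to_native_url` is a hypothetical function, and raising ValueError for non-default ports is a choice made for this sketch only.

```python
from urllib.parse import urlsplit, urlunsplit

# Default-port mapping from the description: HTTP port -> (native port, secure?)
HTTP_TO_NATIVE = {8123: (9000, False), 8443: (9440, True)}


def to_native_url(http_url: str) -> str:
    """Rewrite a clickhouse:// HTTP URL as a clickhouse+native:// URL (hypothetical helper)."""
    parts = urlsplit(http_url)
    if parts.port not in HTTP_TO_NATIVE:
        # Assumption of this sketch: only the two default HTTP ports are mapped.
        raise ValueError(f"no native-port mapping for HTTP port {parts.port!r}")
    native_port, secure = HTTP_TO_NATIVE[parts.port]

    # Keep any user[:password] prefix and swap only the port.
    userinfo, _, _ = parts.netloc.rpartition("@")
    prefix = f"{userinfo}@" if userinfo else ""
    netloc = f"{prefix}{parts.hostname}:{native_port}"

    query = parts.query
    if secure:
        query = f"{query}&secure=True" if query else "secure=True"
    return urlunsplit(("clickhouse+native", netloc, parts.path, query, ""))


secure_url = to_native_url("clickhouse://localhost:8443/default")
plain_url = to_native_url("clickhouse://user:pw@db.example:8123/analytics")
```

With these inputs, `secure_url` targets port 9440 with `secure=True` appended, and `plain_url` targets port 9000 with no extra query parameter.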

Functions

metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.transform_after_read

transform_after_read(table: Table, feature_key: FeatureKey) -> Table

Transform ClickHouse-specific column types for PyArrow compatibility.

Handles:

  • JSON columns: Cast to String (ClickHouse driver returns dict, PyArrow expects bytes)

  • Map(String, String) metaxy columns: Convert to named Struct by extracting keys

For metaxy Map columns (metaxy_provenance_by_field, metaxy_data_version_by_field), we build a named Struct from map key accesses using known field names from the feature spec.

User-defined Map columns are left as-is and will appear in e.g. Polars as List[Struct{key, value}] (the standard Arrow Map representation).

Source code in src/metaxy/metadata_store/clickhouse.py
def transform_after_read(
    self, table: "ibis.Table", feature_key: "FeatureKey"
) -> "ibis.Table":
    """Transform ClickHouse-specific column types for PyArrow compatibility.

    Handles:

    - `JSON` columns: Cast to String (ClickHouse driver returns dict, PyArrow expects bytes)

    - `Map(String, String)` metaxy columns: Convert to named Struct by extracting keys

    For metaxy Map columns (`metaxy_provenance_by_field`, `metaxy_data_version_by_field`),
    we build a named Struct from map key accesses using known field names from the
    feature spec.

    User-defined Map columns are left as-is and will appear in e.g. Polars as
    `List[Struct{key, value}]` (the standard Arrow Map representation).
    """
    import ibis.expr.datatypes as dt

    from metaxy.models.constants import (
        METAXY_DATA_VERSION_BY_FIELD,
        METAXY_PROVENANCE_BY_FIELD,
    )

    # Only convert these metaxy system Map columns to Struct
    metaxy_map_columns = {METAXY_PROVENANCE_BY_FIELD, METAXY_DATA_VERSION_BY_FIELD}

    schema = table.schema()
    mutations: dict[str, Any] = {}

    for col_name, dtype in schema.items():
        if isinstance(dtype, dt.JSON):
            # JSON → String (can't convert to Struct due to ClickHouse CAST limitations)
            mutations[col_name] = table[col_name].cast("string")

        elif isinstance(dtype, dt.Map) and col_name in metaxy_map_columns:
            # Only convert metaxy system Map(String, String) columns to Struct
            # User-defined Map columns are left as-is
            mutations[col_name] = self._map_to_struct_expr(
                table, col_name, dtype, feature_key
            )

    if not mutations:
        return table

    return table.mutate(**mutations)

metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.transform_before_write

transform_before_write(df: Frame, feature_key: FeatureKey, table_name: str) -> Frame

Transform Polars Struct columns to Map format for ClickHouse.

If the ClickHouse table has Map(K,V) columns but the DataFrame has Struct columns, convert the Struct to Map format before inserting.

Source code in src/metaxy/metadata_store/clickhouse.py
def transform_before_write(
    self, df: Frame, feature_key: "FeatureKey", table_name: str
) -> Frame:
    """Transform Polars Struct columns to Map format for ClickHouse.

    If the ClickHouse table has Map(K,V) columns but the DataFrame has Struct
    columns, convert the Struct to Map format before inserting.
    """
    # Check if table exists and get its schema
    if table_name not in self.conn.list_tables():
        return df

    ch_schema = self._get_cached_schema(table_name)
    return self._transform_struct_to_map(df, ch_schema)
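
The write-side conversion can be pictured at the level of a single row. The sketch below is an assumption-laden illustration, not the body of `_transform_struct_to_map`: rows and schemas are plain dicts, ClickHouse types are strings, and dropping null struct fields is a choice made for this example only.

```python
def structs_to_maps(
    row: dict[str, object], table_schema: dict[str, str]
) -> dict[str, object]:
    """Convert struct-like values to string maps for columns the table declares as Map."""
    out = dict(row)
    for column, ch_type in table_schema.items():
        value = out.get(column)
        if ch_type.startswith("Map(") and isinstance(value, dict):
            # Assumption: null struct fields are omitted from the resulting map.
            out[column] = {
                str(key): str(val) for key, val in value.items() if val is not None
            }
    return out


schema = {"id": "String", "metaxy_provenance_by_field": "Map(String, String)"}
row = {"id": "x", "metaxy_provenance_by_field": {"audio": "a1", "frames": None}}
converted = structs_to_maps(row, schema)
```

Columns that the table does not declare as Map pass through unchanged, which mirrors the early return above when the table does not exist yet.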

metaxy.metadata_store.clickhouse.ClickHouseMetadataStore.config_model classmethod

config_model() -> type[ClickHouseMetadataStoreConfig]

Return the configuration model class for this store type.

Subclasses must override this to return their specific config class.

Returns:

  • type[ClickHouseMetadataStoreConfig]

    The configuration model class for this store type

Note

Subclasses override this with a more specific return type. Type checkers may show a warning about an incompatible override, but this is intentional: each store returns its own config type.

Source code in src/metaxy/metadata_store/clickhouse.py
@classmethod
def config_model(cls) -> type[ClickHouseMetadataStoreConfig]:
    return ClickHouseMetadataStoreConfig