Ibis API Reference
metaxy.metadata_store.ibis.IbisMetadataStore
IbisMetadataStore(versioning_engine: VersioningEngineOptions = 'auto', connection_string: str | None = None, *, backend: str | None = None, connection_params: dict[str, Any] | None = None, table_prefix: str | None = None, **kwargs: Any)
Bases: MetadataStore, ABC
Generic SQL metadata store using Ibis.
Works with any Ibis backend that has native struct types, such as DuckDB, PostgreSQL, and ClickHouse.
Warning
Backends without native struct support (e.g., SQLite) are NOT supported.
Storage layout:
- Each feature gets its own table: {feature}__{key}
- System tables: metaxy__system__feature_versions, metaxy__system__migrations
- Uses Ibis for cross-database compatibility
Note: Uses MD5 hash by default for cross-database compatibility. DuckDBMetadataStore overrides this with dynamic algorithm detection. For other backends, override the calculator instance variable with backend-specific implementations.
Example
# ClickHouse
store = IbisMetadataStore("clickhouse://user:pass@host:9000/db")
# PostgreSQL
store = IbisMetadataStore("postgresql://user:pass@host:5432/db")
# DuckDB (use DuckDBMetadataStore instead for better hash support)
store = IbisMetadataStore("duckdb:///metadata.db")
with store:
    store.write_metadata(MyFeature, df)
Parameters:
- versioning_engine (VersioningEngineOptions, default: 'auto') – Which versioning engine to use.
    - "auto": Prefer the store's native engine, fall back to Polars if needed
    - "native": Always use the store's native engine, raise VersioningEngineMismatchError if provided dataframes are incompatible
    - "polars": Always use the Polars engine
- connection_string (str | None, default: None) – Ibis connection string (e.g., "clickhouse://host:9000/db"). If provided, backend and connection_params are ignored.
- backend (str | None, default: None) – Ibis backend name (e.g., "clickhouse", "postgres", "duckdb"). Used with connection_params for more control.
- connection_params (dict[str, Any] | None, default: None) – Backend-specific connection parameters, e.g., {"host": "localhost", "port": 9000, "database": "default"}
- table_prefix (str | None, default: None) – Optional prefix applied to all feature and system table names. Useful for logically separating environments (e.g., "prod_"). Must form a valid SQL identifier when combined with the generated table name.
- **kwargs (Any, default: {}) – Passed to MetadataStore.__init__ (e.g., fallback_stores, hash_algorithm)
Raises:
- ValueError – If neither connection_string nor backend is provided
- ImportError – If Ibis or the required backend driver is not installed
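The precedence between connection_string and backend plus connection_params can be sketched in isolation. The helper below is hypothetical (resolve_connection_args is not part of metaxy) and only mirrors the rules documented above; the exact error message is an assumption.

```python
from typing import Any, Optional


def resolve_connection_args(
    connection_string: Optional[str] = None,
    *,
    backend: Optional[str] = None,
    connection_params: Optional[dict] = None,
) -> tuple:
    """Hypothetical helper mirroring the documented argument rules."""
    if connection_string:
        # A connection string wins; backend and connection_params are ignored.
        return ("connection_string", connection_string)
    if backend is not None:
        # Otherwise use the backend name plus keyword parameters.
        return ("backend", (backend, connection_params or {}))
    # Assumption: this is the case the documented ValueError covers.
    raise ValueError("Provide either connection_string or backend")


by_url = resolve_connection_args("duckdb:///metadata.db", backend="clickhouse")
by_backend = resolve_connection_args(backend="clickhouse")
```

Passing both forms is therefore not an error in this sketch: the connection string simply takes priority.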
Source code in src/metaxy/metadata_store/ibis.py
def __init__(
    self,
    versioning_engine: VersioningEngineOptions = "auto",
    connection_string: str | None = None,
    *,
    backend: str | None = None,
    connection_params: dict[str, Any] | None = None,
    table_prefix: str | None = None,
    **kwargs: Any,
):
    """
    Initialize Ibis metadata store.

    Args:
        versioning_engine: Which versioning engine to use.
            - "auto": Prefer the store's native engine, fall back to Polars if needed
            - "native": Always use the store's native engine, raise `VersioningEngineMismatchError`
                if provided dataframes are incompatible
            - "polars": Always use the Polars engine
        connection_string: Ibis connection string (e.g., "clickhouse://host:9000/db")
            If provided, backend and connection_params are ignored.
        backend: Ibis backend name (e.g., "clickhouse", "postgres", "duckdb")
            Used with connection_params for more control.
        connection_params: Backend-specific connection parameters
            e.g., {"host": "localhost", "port": 9000, "database": "default"}
        table_prefix: Optional prefix applied to all feature and system table names.
            Useful for logically separating environments (e.g., "prod_"). Must form a valid SQL
            identifier when combined with the generated table name.
        **kwargs: Passed to MetadataStore.__init__ (e.g., fallback_stores, hash_algorithm)

    Raises:
        ValueError: If neither connection_string nor backend is provided
        ImportError: If Ibis or required backend driver not installed

    Example:
        ```py
        # Using connection string
        store = IbisMetadataStore("clickhouse://user:pass@host:9000/db")
        # Using backend + params
        store = IbisMetadataStore(
            backend="clickhouse",
            connection_params={"host": "localhost", "port": 9000}
        )
        ```
    """
    import ibis

    self.connection_string = connection_string
    self.backend = backend
    self.connection_params = connection_params or {}
    self._conn: ibis.BaseBackend | None = None
    self._table_prefix = table_prefix or ""
    super().__init__(
        **kwargs,
        versioning_engine=versioning_engine,
        versioning_engine_cls=IbisVersioningEngine,
    )
Attributes
metaxy.metadata_store.ibis.IbisMetadataStore.ibis_conn
property
Get Ibis backend connection.
Returns:
- BaseBackend – Active Ibis backend connection
Raises:
- StoreNotOpenError – If store is not open
metaxy.metadata_store.ibis.IbisMetadataStore.conn
property
Get connection (alias for ibis_conn, provided for consistency).
Returns:
- BaseBackend – Active Ibis backend connection
Raises:
- StoreNotOpenError – If store is not open
metaxy.metadata_store.ibis.IbisMetadataStore.sqlalchemy_url
property
sqlalchemy_url: str
Get SQLAlchemy-compatible connection URL for tools like Alembic.
Returns the connection string if available. If the store was initialized with backend + connection_params instead of a connection string, raises an error, since constructing a proper URL is backend-specific.
Returns:
- str – SQLAlchemy-compatible URL string
Raises:
- ValueError – If connection_string is not available
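The contract described above can be illustrated with a small stand-in class. UrlHolder is hypothetical and not part of metaxy; it only shows the documented behaviour that a URL is available when a connection string was given and that a ValueError is raised otherwise.

```python
from typing import Optional


class UrlHolder:
    """Hypothetical stand-in showing the documented sqlalchemy_url contract."""

    def __init__(
        self,
        connection_string: Optional[str] = None,
        backend: Optional[str] = None,
    ) -> None:
        self.connection_string = connection_string
        self.backend = backend

    @property
    def sqlalchemy_url(self) -> str:
        if not self.connection_string:
            # Building a URL from backend + params is backend-specific, so refuse.
            raise ValueError("sqlalchemy_url requires a connection_string")
        return self.connection_string


url = UrlHolder("postgresql://user:pass@host:5432/db").sqlalchemy_url
```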
Functions
metaxy.metadata_store.ibis.IbisMetadataStore.get_table_name
get_table_name(key: FeatureKey) -> str
Generate the storage table name for a feature or system table.
Applies the configured table_prefix (if any) to the feature key's table name. Subclasses can override this method to implement custom naming logic.
Parameters:
- key (FeatureKey) – Feature key to convert to storage table name.
Returns:
- str – Storage table name with optional prefix applied.
Source code in src/metaxy/metadata_store/ibis.py
def get_table_name(
    self,
    key: FeatureKey,
) -> str:
    """Generate the storage table name for a feature or system table.

    Applies the configured table_prefix (if any) to the feature key's table name.
    Subclasses can override this method to implement custom naming logic.

    Args:
        key: Feature key to convert to storage table name.

    Returns:
        Storage table name with optional prefix applied.
    """
    base_name = key.table_name
    return f"{self._table_prefix}{base_name}" if self._table_prefix else base_name
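The prefixing rule can be tried outside the store with a stand-in key type. FakeFeatureKey and prefixed_table_name are hypothetical names, and joining the key parts with a double underscore is an assumption based on the {feature}__{key} layout described earlier, not the confirmed FeatureKey implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeFeatureKey:
    """Hypothetical stand-in for FeatureKey; only table_name is modelled."""

    parts: tuple

    @property
    def table_name(self) -> str:
        # Assumption: parts are joined with "__", as in {feature}__{key}.
        return "__".join(self.parts)


def prefixed_table_name(key: FakeFeatureKey, table_prefix: str = "") -> str:
    # Same expression as get_table_name, with the prefix passed explicitly.
    base_name = key.table_name
    return f"{table_prefix}{base_name}" if table_prefix else base_name


key = FakeFeatureKey(("video", "frames"))
plain = prefixed_table_name(key)
prod = prefixed_table_name(key, "prod_")
```

Because the prefix is concatenated verbatim, a trailing separator such as the underscore in "prod_" has to be part of the prefix itself.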
metaxy.metadata_store.ibis.IbisMetadataStore.open
open(mode: AccessMode = 'read') -> Iterator[Self]
Open connection to database via Ibis.
Subclasses should override this to add backend-specific initialization (e.g., loading extensions) and must call this method via super().open(mode).
Parameters:
- mode (AccessMode, default: 'read') – Access mode. Subclasses may use this to set backend-specific connection parameters (e.g., read_only for DuckDB).
Yields:
- Self – The store instance with connection open
@contextmanager
def open(self, mode: AccessMode = "read") -> Iterator[Self]:
    """Open connection to database via Ibis.

    Subclasses should override this to add backend-specific initialization
    (e.g., loading extensions) and must call this method via super().open(mode).

    Args:
        mode: Access mode. Subclasses may use this to set backend-specific connection
            parameters (e.g., `read_only` for DuckDB).

    Yields:
        Self: The store instance with connection open
    """
    import ibis

    # Increment context depth to support nested contexts
    self._context_depth += 1
    try:
        # Only perform actual open on first entry
        if self._context_depth == 1:
            # Setup: Connect to database
            if self.connection_string:
                # Use connection string
                self._conn = ibis.connect(self.connection_string)
            else:
                # Use backend + params
                # Get backend-specific connect function
                assert self.backend is not None, (
                    "backend must be set if connection_string is None"
                )
                backend_module = getattr(ibis, self.backend)
                self._conn = backend_module.connect(**self.connection_params)
            # Mark store as open and validate
            self._is_open = True
            self._validate_after_open()
        yield self
    finally:
        # Decrement context depth
        self._context_depth -= 1
        # Only perform actual close on last exit
        if self._context_depth == 0:
            # Teardown: Close connection
            if self._conn is not None:
                # Ibis connections may not have explicit close method
                # but setting to None releases resources
                self._conn = None
            self._is_open = False
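The depth counter in open() is what makes nested contexts safe: only the outermost entry connects and only the outermost exit tears down. A minimal sketch of that pattern, using a toy class (DepthCountedResource is hypothetical and stores a placeholder object instead of a real Ibis connection):

```python
from contextlib import contextmanager
from typing import Iterator


class DepthCountedResource:
    """Toy resource that connects once, however deeply open() is nested."""

    def __init__(self) -> None:
        self._context_depth = 0
        self._conn = None
        self.connect_calls = 0

    @contextmanager
    def open(self) -> Iterator["DepthCountedResource"]:
        self._context_depth += 1
        try:
            if self._context_depth == 1:
                # Only the outermost entry connects.
                self.connect_calls += 1
                self._conn = object()  # placeholder for a real connection
            yield self
        finally:
            self._context_depth -= 1
            if self._context_depth == 0:
                # Only the outermost exit tears down.
                self._conn = None


resource = DepthCountedResource()
with resource.open():
    with resource.open():
        inner_conn = resource._conn
    # Leaving the inner context must not close the connection.
    still_open = resource._conn is not None
closed_after = resource._conn is None
```

Putting the decrement in the finally clause keeps the counter balanced even when the body of the with block raises.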
metaxy.metadata_store.ibis.IbisMetadataStore.write_metadata_to_store
write_metadata_to_store(feature_key: FeatureKey, df: Frame, **kwargs: Any) -> None
Internal write implementation using Ibis.
Parameters:
- feature_key (FeatureKey) – Feature key to write to
- df (Frame) – DataFrame with metadata (already validated)
- **kwargs (Any, default: {}) – Backend-specific parameters (currently unused)
Raises:
- TableNotFoundError – If table doesn't exist and auto_create_tables is False
def write_metadata_to_store(
    self,
    feature_key: FeatureKey,
    df: Frame,
    **kwargs: Any,
) -> None:
    """
    Internal write implementation using Ibis.

    Args:
        feature_key: Feature key to write to
        df: DataFrame with metadata (already validated)
        **kwargs: Backend-specific parameters (currently unused)

    Raises:
        TableNotFoundError: If table doesn't exist and auto_create_tables is False
    """
    if df.implementation == nw.Implementation.IBIS:
        df_to_insert = df.to_native()  # Ibis expression
    else:
        from metaxy._utils import collect_to_polars

        df_to_insert = collect_to_polars(df)  # Polars DataFrame
    table_name = self.get_table_name(feature_key)
    try:
        self.conn.insert(table_name, obj=df_to_insert)  # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue]
    except Exception as e:
        import ibis.common.exceptions

        if not isinstance(e, ibis.common.exceptions.TableNotFound):
            raise
        if self.auto_create_tables:
            # Warn about auto-create (first time only)
            if self._should_warn_auto_create_tables:
                import warnings

                warnings.warn(
                    f"AUTO_CREATE_TABLES is enabled - automatically creating table '{table_name}'. "
                    "Do not use in production! "
                    "Use proper database migration tools like Alembic for production deployments.",
                    UserWarning,
                    stacklevel=4,
                )
            # Note: create_table(table_name, obj=df) both creates the table AND inserts the data
            # No separate insert needed - the data from df is already written
            self.conn.create_table(table_name, obj=df_to_insert)
        else:
            raise TableNotFoundError(
                f"Table '{table_name}' does not exist for feature {feature_key.to_string()}. "
                f"Enable auto_create_tables=True to automatically create tables, "
                f"or use proper database migration tools like Alembic to create the table first."
            ) from e
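The write path follows an "insert first, create on missing table" pattern. The sketch below reproduces that control flow against an in-memory fake so it can run without a database. FakeBackend, TableNotFound, and write_rows are hypothetical stand-ins, and LookupError is used here in place of the store's TableNotFoundError.

```python
import warnings


class TableNotFound(Exception):
    """Stand-in for the backend's missing-table error."""


class FakeBackend:
    """Hypothetical in-memory backend exposing insert and create_table."""

    def __init__(self) -> None:
        self.tables = {}

    def insert(self, name, obj):
        if name not in self.tables:
            raise TableNotFound(name)
        self.tables[name].extend(obj)

    def create_table(self, name, obj):
        # Creates the table and stores the rows in one step.
        self.tables[name] = list(obj)


def write_rows(conn, name, rows, auto_create_tables):
    try:
        conn.insert(name, obj=rows)
    except TableNotFound as e:
        if not auto_create_tables:
            raise LookupError(f"Table '{name}' does not exist") from e
        warnings.warn(f"automatically creating table '{name}'", UserWarning, stacklevel=2)
        # No separate insert: create_table already wrote the rows.
        conn.create_table(name, obj=rows)


conn = FakeBackend()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    write_rows(conn, "feat", [{"id": 1}], auto_create_tables=True)  # creates the table
write_rows(conn, "feat", [{"id": 2}], auto_create_tables=True)  # plain insert
```

Trying the insert first keeps the common case to a single call; the table lookup cost is only paid when the insert fails.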
metaxy.metadata_store.ibis.IbisMetadataStore.read_metadata_in_store
read_metadata_in_store(feature: CoercibleToFeatureKey, *, feature_version: str | None = None, filters: Sequence[Expr] | None = None, columns: Sequence[str] | None = None, **kwargs: Any) -> LazyFrame[Any] | None
Read metadata from this store only (no fallback).
Parameters:
- feature (CoercibleToFeatureKey) – Feature to read
- feature_version (str | None, default: None) – Filter by specific feature_version (applied as SQL WHERE clause)
- filters (Sequence[Expr] | None, default: None) – List of Narwhals filter expressions (converted to SQL WHERE clauses)
- columns (Sequence[str] | None, default: None) – Optional list of columns to select
- **kwargs (Any, default: {}) – Backend-specific parameters (currently unused)
Returns:
- LazyFrame[Any] | None – Narwhals LazyFrame with metadata, or None if not found
Source code in src/metaxy/metadata_store/ibis.py
def read_metadata_in_store(
    self,
    feature: CoercibleToFeatureKey,
    *,
    feature_version: str | None = None,
    filters: Sequence[nw.Expr] | None = None,
    columns: Sequence[str] | None = None,
    **kwargs: Any,
) -> nw.LazyFrame[Any] | None:
    """
    Read metadata from this store only (no fallback).

    Args:
        feature: Feature to read
        feature_version: Filter by specific feature_version (applied as SQL WHERE clause)
        filters: List of Narwhals filter expressions (converted to SQL WHERE clauses)
        columns: Optional list of columns to select
        **kwargs: Backend-specific parameters (currently unused)

    Returns:
        Narwhals LazyFrame with metadata, or None if not found
    """
    feature_key = self._resolve_feature_key(feature)
    table_name = self.get_table_name(feature_key)
    # Check if table exists
    existing_tables = self.conn.list_tables()
    if table_name not in existing_tables:
        return None
    # Get Ibis table reference
    table = self.conn.table(table_name)
    # Wrap Ibis table with Narwhals (stays lazy in SQL)
    nw_lazy: nw.LazyFrame[Any] = nw.from_native(table, eager_only=False)
    # Apply feature_version filter (stays in SQL via Narwhals)
    if feature_version is not None:
        nw_lazy = nw_lazy.filter(
            nw.col("metaxy_feature_version") == feature_version
        )
    # Apply generic Narwhals filters (stays in SQL)
    if filters is not None:
        for filter_expr in filters:
            nw_lazy = nw_lazy.filter(filter_expr)
    # Select columns (stays in SQL)
    if columns is not None:
        nw_lazy = nw_lazy.select(columns)
    # Return Narwhals LazyFrame wrapping Ibis table (stays lazy in SQL)
    return nw_lazy
metaxy.metadata_store.ibis.IbisMetadataStore.display
display() -> str
Display string for this store.
Source code in src/metaxy/metadata_store/ibis.py
def display(self) -> str:
    """Display string for this store."""
    from metaxy.metadata_store.utils import sanitize_uri

    backend_info = self.connection_string or f"{self.backend}"
    # Sanitize connection strings that may contain credentials
    sanitized_info = sanitize_uri(backend_info)
    return f"{self.__class__.__name__}(backend={sanitized_info})"
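The source of sanitize_uri is not shown here, so the following is only a sketch of what credential masking for a display string might look like. mask_credentials is a hypothetical function built on the standard library, and the "***" placeholder format is an assumption rather than metaxy's actual output.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_credentials(uri: str) -> str:
    """Hypothetical sanitizer: hide the userinfo part of a URI."""
    parts = urlsplit(uri)
    if "@" not in parts.netloc:
        # No userinfo (or not a URI at all, e.g. a bare backend name).
        return uri
    # Keep everything after the last "@" (host and optional port).
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"***@{host}"))


masked = mask_credentials("clickhouse://user:pass@host:9000/db")
bare = mask_credentials("duckdb")
local = mask_credentials("duckdb:///metadata.db")
```

Inputs without credentials pass through unchanged, which matters because display() may also receive a bare backend name instead of a connection string.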
metaxy.metadata_store.ibis.IbisMetadataStore.config_model
classmethod
config_model() -> type[IbisMetadataStoreConfig]
Return the configuration model class for this store type.
Subclasses must override this to return their specific config class.
Returns:
- type[MetadataStoreConfig] – The config class type (e.g., DuckDBMetadataStoreConfig)
Note
Subclasses override this with a more specific return type. Type checkers may warn about an incompatible override, but this is intentional: each store returns its own config type.