LanceDB API Reference
metaxy.metadata_store.lancedb.LanceDBMetadataStore
LanceDBMetadataStore(uri: str | Path, *, fallback_stores: list[MetadataStore] | None = None, connect_kwargs: dict[str, Any] | None = None, **kwargs: Any)
Bases: MetadataStore
LanceDB metadata store for vector and structured data.
LanceDB is a columnar database optimized for vector search and multimodal data. Each feature is stored in its own Lance table within the database directory. The store uses Polars components for data processing and does not execute SQL natively.
Storage layout:
- Each feature gets its own table: {namespace}__{feature_name}
- Tables are stored as Lance format in the directory specified by the URI
- LanceDB handles schema evolution, transactions, and compaction automatically
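The naming scheme in the layout above can be sketched in a couple of lines. `lance_table_name` is a hypothetical helper written for illustration; the store's own `_table_name` method is not shown on this page and may differ.

```python
def lance_table_name(namespace: str, feature_name: str) -> str:
    """Hypothetical helper: join namespace and feature name with a double underscore."""
    return f"{namespace}__{feature_name}"


# One table per feature, so two features never share a table.
print(lance_table_name("video", "embeddings"))  # prints video__embeddings
```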
Supported storage locations: local directory, object storage (S3, GCS, Azure), and LanceDB Cloud.
The database directory is created automatically if it doesn't exist (local paths only). Tables are created on-demand when features are first written.
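As a rough illustration of the "local paths only" rule, the sketch below creates a directory only when the URI has no remote scheme. `looks_local`, `prepare_location`, and `REMOTE_SCHEMES` are assumptions made for this example; the store relies on its own `is_local_path` helper, whose implementation is not shown here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of remote schemes, taken from the URI forms listed on this page.
REMOTE_SCHEMES = {"s3", "gs", "az", "db", "http", "https"}


def looks_local(uri: str) -> bool:
    """Treat anything without a known remote scheme as a filesystem path."""
    return urlparse(uri).scheme not in REMOTE_SCHEMES


def prepare_location(uri: str | Path) -> str:
    """Create the directory for local URIs; leave remote URIs untouched."""
    uri = str(uri)
    if looks_local(uri):
        Path(uri).mkdir(parents=True, exist_ok=True)
    return uri


with tempfile.TemporaryDirectory() as tmp:
    local = prepare_location(Path(tmp) / "metaxy" / "lancedb")
    print(Path(local).is_dir())  # the nested directory now exists

print(looks_local("s3://bucket/metadata"))  # remote: no directory is created
```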
Parameters:
- uri (str | Path) – Directory path or URI for LanceDB tables. Supports:
  - Local path: "./metadata" or Path("/data/metaxy/lancedb")
  - Object stores: s3://, gs://, az:// (requires cloud credentials)
  - LanceDB Cloud: "db://database-name" (requires API key)
  - Remote HTTP/HTTPS: Any URI supported by LanceDB
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores. When reading features not found in this store, Metaxy searches fallback stores in order. Useful for local dev → staging → production chains.
- connect_kwargs (dict[str, Any] | None, default: None) – Extra keyword arguments passed directly to lancedb.connect(). Useful for LanceDB Cloud credentials (api_key, region) when you cannot rely on environment variables.
- **kwargs (Any, default: {}) – Passed to metaxy.metadata_store.base.MetadataStore (e.g., hash_algorithm, hash_truncation_length, prefer_native)
Note
Unlike SQL stores, LanceDB doesn't require explicit table creation. Tables are created automatically when writing metadata.
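The fallback behavior described for fallback_stores can be pictured with a small in-memory stand-in. `DictStore` and its `read` method are hypothetical and only model the lookup order; they are not part of Metaxy, and the real resolution logic lives in the base MetadataStore.

```python
from __future__ import annotations


class DictStore:
    """Hypothetical stand-in for a metadata store backed by a plain dict."""

    def __init__(self, name, tables, fallback_stores=None):
        self.name = name
        self.tables = tables
        self.fallback_stores = fallback_stores or []

    def read(self, feature):
        # Prefer the local copy; only consult fallbacks when it is missing.
        if feature in self.tables:
            return self.name, self.tables[feature]
        for store in self.fallback_stores:  # searched in the given order
            found = store.read(feature)
            if found is not None:
                return found
        return None


production = DictStore("production", {"video__embeddings": [1, 2, 3]})
staging = DictStore("staging", {"audio__features": [4]})
dev = DictStore("dev", {"text__tokens": [5]}, fallback_stores=[staging, production])

print(dev.read("text__tokens"))       # served by dev itself
print(dev.read("video__embeddings"))  # found in production via the chain
print(dev.read("missing"))            # None: no store has it
```

Writes are not modeled here because the fallback stores are described as read-only.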
Source code in src/metaxy/metadata_store/lancedb.py
def __init__(
    self,
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
):
    """
    Initialize [LanceDB](https://lancedb.com/docs/) metadata store.

    The database directory is created automatically if it doesn't exist (local paths only).
    Tables are created on-demand when features are first written.

    Args:
        uri: Directory path or URI for LanceDB tables. Supports:

            - **Local path**: `"./metadata"` or `Path("/data/metaxy/lancedb")`
            - **Object stores**: `s3://`, `gs://`, `az://` (requires cloud credentials)
            - **LanceDB Cloud**: `"db://database-name"` (requires API key)
            - **Remote HTTP/HTTPS**: Any URI supported by LanceDB
        fallback_stores: Ordered list of read-only fallback stores.
            When reading features not found in this store, Metaxy searches
            fallback stores in order. Useful for local dev → staging → production chains.
        connect_kwargs: Extra keyword arguments passed directly to
            [lancedb.connect()](https://lancedb.github.io/lancedb/python/python/#lancedb.connect).
            Useful for LanceDB Cloud credentials (api_key, region) when you cannot
            rely on environment variables.
        **kwargs: Passed to [metaxy.metadata_store.base.MetadataStore][]
            (e.g., hash_algorithm, hash_truncation_length, prefer_native)

    Note:
        Unlike SQL stores, LanceDB doesn't require explicit table creation.
        Tables are created automatically when writing metadata.
    """
    self.uri: str = str(uri)
    self._conn: Any | None = None
    self._connect_kwargs = connect_kwargs or {}
    super().__init__(
        fallback_stores=fallback_stores,
        auto_create_tables=True,
        versioning_engine_cls=PolarsVersioningEngine,
        **kwargs,
    )
Attributes
metaxy.metadata_store.lancedb.LanceDBMetadataStore.conn
property
conn: Any
Get LanceDB connection.
Returns:
- Any – Active LanceDB connection
Raises:
- StoreNotOpenError – If store is not open
Functions
metaxy.metadata_store.lancedb.LanceDBMetadataStore.open
open(mode: AccessMode = 'read') -> Iterator[Self]
Open LanceDB connection.
For local filesystem paths, creates the directory if it doesn't exist. For remote URIs (S3, LanceDB Cloud, etc.), connects directly. Tables are created on-demand when features are first written.
Parameters:
- mode (AccessMode, default: 'read') – Access mode (READ or WRITE). Accepted for consistency but not used by LanceDB (LanceDB handles concurrent access internally).
Yields:
- Self (Self) – The store instance
Raises:
- ConnectionError – If remote connection fails (e.g., invalid credentials)
Source code in src/metaxy/metadata_store/lancedb.py
@contextmanager
def open(self, mode: AccessMode = "read") -> Iterator[Self]:
    """Open LanceDB connection.

    For local filesystem paths, creates the directory if it doesn't exist.
    For remote URIs (S3, LanceDB Cloud, etc.), connects directly.
    Tables are created on-demand when features are first written.

    Args:
        mode: Access mode (READ or WRITE). Accepted for consistency but not used
            by LanceDB (LanceDB handles concurrent access internally).

    Yields:
        Self: The store instance

    Raises:
        ConnectionError: If remote connection fails (e.g., invalid credentials)
    """
    # Increment context depth to support nested contexts
    self._context_depth += 1
    try:
        # Only perform actual open on first entry
        if self._context_depth == 1:
            import lancedb

            if is_local_path(self.uri):
                Path(self.uri).mkdir(parents=True, exist_ok=True)
            self._conn = lancedb.connect(self.uri, **self._connect_kwargs)
            self._is_open = True
            self._validate_after_open()
        yield self
    finally:
        # Decrement context depth
        self._context_depth -= 1
        # Only perform actual close on last exit
        if self._context_depth == 0:
            self._conn = None
            self._is_open = False
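The depth counter in open() is what makes nested `with` blocks safe: only the outermost entry connects and only the outermost exit disconnects. A minimal standalone sketch of the same pattern follows; `ReentrantResource` is a hypothetical class, and a plain `object()` stands in for the LanceDB connection.

```python
from contextlib import contextmanager


class ReentrantResource:
    """Hypothetical resource that tolerates nested open() calls."""

    def __init__(self):
        self._context_depth = 0
        self._conn = None
        self.open_calls = 0  # counts real connects, for demonstration only

    @contextmanager
    def open(self):
        self._context_depth += 1
        try:
            if self._context_depth == 1:
                # First entry: perform the real connect.
                self._conn = object()
                self.open_calls += 1
            yield self
        finally:
            self._context_depth -= 1
            if self._context_depth == 0:
                # Last exit: perform the real close.
                self._conn = None


resource = ReentrantResource()
with resource.open():
    with resource.open():
        pass
    # The inner exit must not close the connection the outer block still uses.
    print(resource._conn is not None)
print(resource._conn is None, resource.open_calls)
```

Because the decrement sits in `finally`, the counter stays balanced even when the body of the `with` block raises.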
metaxy.metadata_store.lancedb.LanceDBMetadataStore.write_metadata_to_store
write_metadata_to_store(feature_key: FeatureKey, df: Frame, **kwargs: Any) -> None
Append metadata to Lance table.
Creates the table if it doesn't exist, otherwise appends to existing table. Uses LanceDB's native Polars/Arrow integration for efficient storage.
Parameters:
- feature_key (FeatureKey) – Feature key to write to
- df (Frame) – Narwhals Frame with metadata (already validated by base class)
Source code in src/metaxy/metadata_store/lancedb.py
def write_metadata_to_store(
    self,
    feature_key: FeatureKey,
    df: Frame,
    **kwargs: Any,
) -> None:
    """Append metadata to Lance table.

    Creates the table if it doesn't exist, otherwise appends to existing table.
    Uses LanceDB's native Polars/Arrow integration for efficient storage.

    Args:
        feature_key: Feature key to write to
        df: Narwhals Frame with metadata (already validated by base class)
    """
    # Convert Narwhals frame to Polars DataFrame
    df_polars = collect_to_polars(df)
    table_name = self._table_name(feature_key)
    # LanceDB supports both Polars DataFrames and Arrow tables directly
    # Try Polars first (native integration), fall back to Arrow if needed
    try:
        if self._table_exists(table_name):
            table = self._get_table(table_name)
            # Use Polars DataFrame directly - LanceDB handles conversion
            table.add(df_polars)  # type: ignore[attr-defined]
        else:
            # Create table from Polars DataFrame - LanceDB handles schema
            self.conn.create_table(table_name, data=df_polars)  # type: ignore[attr-defined]
    except TypeError as exc:
        if not self._should_fallback_to_arrow(exc):
            raise
        # Defensive fallback: Modern LanceDB (>=0.3) accepts Polars DataFrames natively,
        # but fall back to Arrow if an older version or edge case doesn't support it.
        # This ensures compatibility across LanceDB versions.
        logger.debug("Falling back to Arrow format for LanceDB write: %s", exc)
        arrow_table = df_polars.to_arrow()
        if self._table_exists(table_name):
            table = self._get_table(table_name)
            table.add(arrow_table)  # type: ignore[attr-defined]
        else:
            self.conn.create_table(table_name, data=arrow_table)  # type: ignore[attr-defined]
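The write path follows a "try the rich format, retry with a plainer one" pattern. The sketch below reproduces that control flow with stand-ins only: `Rows` plays the role of the Polars DataFrame, `PlainOnlyTable` plays a table that accepts only the plainer format, and `should_fallback` is a hypothetical check, since the real `_should_fallback_to_arrow` is not shown on this page.

```python
class Rows:
    """Stand-in for a rich frame type that can convert itself to a plain list."""

    def __init__(self, records):
        self.records = records

    def to_plain(self):
        return list(self.records)


class PlainOnlyTable:
    """Stand-in for a table that rejects anything but a list of records."""

    def __init__(self):
        self.rows = []

    def add(self, data):
        if not isinstance(data, list):
            raise TypeError(f"unsupported data type: {type(data).__name__}")
        self.rows.extend(data)


def should_fallback(exc: TypeError) -> bool:
    # Assumption: only retry when the error is about the input type.
    return "unsupported" in str(exc).lower()


def add_with_fallback(table, data) -> str:
    """Try the native format first; convert and retry on a matching TypeError."""
    try:
        table.add(data)
        return "native"
    except TypeError as exc:
        if not should_fallback(exc):
            raise  # unrelated TypeError: do not mask it
        table.add(data.to_plain())
        return "fallback"


table = PlainOnlyTable()
print(add_with_fallback(table, Rows([{"id": 1}])))  # takes the fallback path
print(add_with_fallback(table, [{"id": 2}]))        # accepted natively
```

Re-raising unrelated errors keeps the fallback from hiding genuine bugs, which is presumably why the original guards it with a check before converting.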
metaxy.metadata_store.lancedb.LanceDBMetadataStore.read_metadata_in_store
read_metadata_in_store(feature: CoercibleToFeatureKey, *, filters: Sequence[Expr] | None = None, columns: Sequence[str] | None = None, **kwargs: Any) -> LazyFrame[Any] | None
Read metadata from Lance table.
Loads data from Lance, converts to Polars, and returns as Narwhals LazyFrame. Applies filters and column selection in memory.
Parameters:
- feature (CoercibleToFeatureKey) – Feature to read
- filters (Sequence[Expr] | None, default: None) – List of Narwhals filter expressions
- columns (Sequence[str] | None, default: None) – Optional list of columns to select
- **kwargs (Any, default: {}) – Backend-specific parameters (unused)
Returns:
- LazyFrame[Any] | None – Narwhals LazyFrame with metadata, or None if table not found
Source code in src/metaxy/metadata_store/lancedb.py
def read_metadata_in_store(
    self,
    feature: CoercibleToFeatureKey,
    *,
    filters: Sequence[nw.Expr] | None = None,
    columns: Sequence[str] | None = None,
    **kwargs: Any,
) -> nw.LazyFrame[Any] | None:
    """Read metadata from Lance table.

    Loads data from Lance, converts to Polars, and returns as Narwhals LazyFrame.
    Applies filters and column selection in memory.

    Args:
        feature: Feature to read
        filters: List of Narwhals filter expressions
        columns: Optional list of columns to select
        **kwargs: Backend-specific parameters (unused)

    Returns:
        Narwhals LazyFrame with metadata, or None if table not found
    """
    self._check_open()
    feature_key = self._resolve_feature_key(feature)
    table_name = self._table_name(feature_key)
    if not self._table_exists(table_name):
        return None
    table = self._get_table(table_name)
    # https://github.com/lancedb/lancedb/issues/1539
    # Fall back to eager Arrow conversion until LanceDB issue #1539 is resolved.
    arrow_table = table.to_arrow()
    pl_lazy = pl.DataFrame(arrow_table).lazy()
    nw_lazy = nw.from_native(pl_lazy)
    if filters:
        nw_lazy = nw_lazy.filter(*filters)
    if columns is not None:
        nw_lazy = nw_lazy.select(columns)
    return nw_lazy
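The read path loads the whole table and then narrows it in memory: filters first, column selection second. A dependency-free sketch of that ordering is shown below, using lists of dicts in place of Arrow/Polars data; `read_rows` and the sample table are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

Row = dict[str, Any]


def read_rows(
    tables: dict[str, list[Row]],
    name: str,
    *,
    filters: Sequence[Callable[[Row], bool]] | None = None,
    columns: Sequence[str] | None = None,
) -> list[Row] | None:
    """Return matching rows, or None when the table does not exist."""
    if name not in tables:
        return None
    rows = list(tables[name])  # everything is loaded before any narrowing
    for predicate in filters or ():
        rows = [row for row in rows if predicate(row)]
    if columns is not None:
        rows = [{col: row[col] for col in columns} for row in rows]
    return rows


tables = {
    "video__embeddings": [
        {"sample_id": "a", "size": 10},
        {"sample_id": "b", "size": 25},
    ]
}

# Filtering happens before selection, so a filter may use a dropped column.
print(read_rows(tables, "video__embeddings",
                filters=[lambda r: r["size"] > 20], columns=["sample_id"]))
print(read_rows(tables, "missing"))
```

Since nothing is pushed down to storage in this arrangement, the cost of a read grows with the table size rather than with the size of the result.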
metaxy.metadata_store.lancedb.LanceDBMetadataStore.config_model
classmethod
config_model() -> type[LanceDBMetadataStoreConfig]
Return the configuration model class for this store type.
Subclasses must override this to return their specific config class.
Returns:
- type[MetadataStoreConfig] – The config class type (e.g., DuckDBMetadataStoreConfig)
Note
Subclasses override this with a more specific return type. Type checkers may show a warning about incompatible override, but this is intentional - each store returns its own config type.
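The override pattern in the note can be sketched with placeholder classes. `BaseConfig`, `LanceConfig`, `BaseStore`, and `LanceStore` are hypothetical names standing in for the real config and store classes, whose definitions are not shown on this page.

```python
from __future__ import annotations


class BaseConfig:
    """Placeholder for the shared configuration base class."""


class LanceConfig(BaseConfig):
    """Placeholder for a store-specific configuration class."""

    uri: str = "./metadata"


class BaseStore:
    @classmethod
    def config_model(cls) -> type[BaseConfig]:
        raise NotImplementedError("subclasses must return their config class")


class LanceStore(BaseStore):
    @classmethod
    def config_model(cls) -> type[LanceConfig]:
        # Narrower return annotation than the base method declares.
        return LanceConfig


config_cls = LanceStore.config_model()
print(config_cls.__name__, issubclass(config_cls, BaseConfig))
```

Callers that only know about the base store still receive a class compatible with the base config, while callers holding the concrete store get the precise type.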