
LanceDB API Reference

metaxy.metadata_store.lancedb.LanceDBMetadataStore

LanceDBMetadataStore(uri: str | Path, *, fallback_stores: list[MetadataStore] | None = None, connect_kwargs: dict[str, Any] | None = None, **kwargs: Any)

Bases: MetadataStore

LanceDB metadata store for vector and structured data.

LanceDB is a columnar database optimized for vector search and multimodal data. Each feature is stored in its own Lance table within the database directory. Data processing uses Polars components; the store does not execute native SQL.

Storage layout:

  • Each feature gets its own table: {namespace}__{feature_name}

  • Tables are stored as Lance format in the directory specified by the URI

  • LanceDB handles schema evolution, transactions, and compaction automatically
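
The table naming rule above can be sketched as a small helper. This is only an illustration of the `{namespace}__{feature_name}` pattern: the function names `table_name` and `split_table_name` are hypothetical and are not part of Metaxy, which derives the name from a `FeatureKey` internally.

```python
def table_name(namespace: str, feature_name: str) -> str:
    """Join namespace and feature name with a double underscore (hypothetical helper)."""
    return f"{namespace}__{feature_name}"


def split_table_name(name: str) -> tuple[str, str]:
    """Split at the first double underscore.

    Assumption: the namespace itself contains no "__"; otherwise the
    split is ambiguous and this sketch returns the shortest namespace.
    """
    namespace, _, feature_name = name.partition("__")
    return namespace, feature_name


example = table_name("video", "embeddings")
```

With this layout, every feature maps to exactly one Lance table, so listing the tables in the database directory also lists the stored features.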

Local Directory

from pathlib import Path
from metaxy.metadata_store.lancedb import LanceDBMetadataStore

# Local filesystem
store = LanceDBMetadataStore(Path("/path/to/featuregraph"))

Object Storage (S3, GCS, Azure)

# Object store (requires cloud credentials); "my-bucket" is a placeholder bucket name
store = LanceDBMetadataStore("s3://my-bucket/path/to/featuregraph")

LanceDB Cloud

import os

# Option 1: Environment variable
os.environ["LANCEDB_API_KEY"] = "your-api-key"
store = LanceDBMetadataStore("db://my-database")

# Option 2: Explicit credentials
store = LanceDBMetadataStore(
    "db://my-database",
    connect_kwargs={"api_key": "your-api-key", "region": "us-east-1"}
)

The database directory is created automatically if it doesn't exist (local paths only). Tables are created on-demand when features are first written.
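
The "local paths only" rule can be illustrated with a rough sketch. The store relies on an internal `is_local_path` helper whose implementation is not shown on this page, so the functions `looks_local` and `ensure_local_dir` and the scheme list below are assumptions made for illustration only.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Assumed set of schemes treated as remote; taken from the URI forms listed on this page.
REMOTE_SCHEMES = {"s3", "gs", "az", "db", "http", "https"}


def looks_local(uri: str) -> bool:
    """Return True when the URI has no known remote scheme (hypothetical check)."""
    scheme = urlsplit(uri).scheme.lower()
    return scheme not in REMOTE_SCHEMES


def ensure_local_dir(uri: str) -> bool:
    """Create the directory for local URIs only; report whether anything was attempted."""
    if not looks_local(uri):
        return False  # remote stores are never created on the local filesystem
    Path(uri).mkdir(parents=True, exist_ok=True)
    return True
```

Checking against a list of remote schemes, rather than requiring an empty scheme, keeps Windows paths such as `C:\data` classified as local. A `file://` URI is not handled by this sketch.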

Parameters:

  • uri (str | Path) –

    Directory path or URI for LanceDB tables. Supports:

    • Local path: "./metadata" or Path("/data/metaxy/lancedb")

    • Object stores: s3://, gs://, az:// (requires cloud credentials)

    • LanceDB Cloud: "db://database-name" (requires API key)

    • Remote HTTP/HTTPS: Any URI supported by LanceDB

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores. When reading features not found in this store, Metaxy searches fallback stores in order. Useful for local dev → staging → production chains.

  • connect_kwargs (dict[str, Any] | None, default: None ) –

    Extra keyword arguments passed directly to lancedb.connect(). Useful for LanceDB Cloud credentials (api_key, region) when you cannot rely on environment variables.

  • **kwargs (Any, default: {} ) –

    Passed to metaxy.metadata_store.base.MetadataStore (e.g., hash_algorithm, hash_truncation_length, prefer_native)

Note

Unlike SQL stores, LanceDB doesn't require explicit table creation. Tables are created automatically when writing metadata.
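
The ordered lookup described for `fallback_stores` can be sketched with plain dictionaries standing in for stores. This is a simplified model, not Metaxy's implementation: `read_with_fallback` is a hypothetical name, and real stores return frames rather than dictionary values.

```python
from typing import Any, Mapping, Optional, Sequence


def read_with_fallback(
    feature: str,
    primary: Mapping[str, Any],
    fallbacks: Sequence[Mapping[str, Any]],
) -> Optional[Any]:
    """Return the first hit, checking the primary store before each fallback in order."""
    for store in (primary, *fallbacks):
        if feature in store:
            return store[feature]
    return None  # not found anywhere in the chain


# Stand-ins for a local dev -> staging -> production chain.
dev = {"a": "dev-a"}
staging = {"b": "staging-b"}
production = {"a": "prod-a", "b": "prod-b", "c": "prod-c"}

found_a = read_with_fallback("a", dev, [staging, production])
found_c = read_with_fallback("c", dev, [staging, production])
```

The primary store always wins when it holds the feature, which is what lets a local store shadow production data during development.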

Source code in src/metaxy/metadata_store/lancedb.py
def __init__(
    self,
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
):
    """
    Initialize [LanceDB](https://lancedb.com/docs/) metadata store.

    The database directory is created automatically if it doesn't exist (local paths only).
    Tables are created on-demand when features are first written.

    Args:
        uri: Directory path or URI for LanceDB tables. Supports:

            - **Local path**: `"./metadata"` or `Path("/data/metaxy/lancedb")`

            - **Object stores**: `s3://`, `gs://`, `az://` (requires cloud credentials)

            - **LanceDB Cloud**: `"db://database-name"` (requires API key)

            - **Remote HTTP/HTTPS**: Any URI supported by LanceDB

        fallback_stores: Ordered list of read-only fallback stores.
            When reading features not found in this store, Metaxy searches
            fallback stores in order. Useful for local dev → staging → production chains.
        connect_kwargs: Extra keyword arguments passed directly to
            [lancedb.connect()](https://lancedb.github.io/lancedb/python/python/#lancedb.connect).
            Useful for LanceDB Cloud credentials (api_key, region) when you cannot
            rely on environment variables.
        **kwargs: Passed to [metaxy.metadata_store.base.MetadataStore][]
            (e.g., hash_algorithm, hash_truncation_length, prefer_native)

    Note:
        Unlike SQL stores, LanceDB doesn't require explicit table creation.
        Tables are created automatically when writing metadata.
    """
    self.uri: str = str(uri)
    self._conn: Any | None = None
    self._connect_kwargs = connect_kwargs or {}
    super().__init__(
        fallback_stores=fallback_stores,
        auto_create_tables=True,
        versioning_engine_cls=PolarsVersioningEngine,
        **kwargs,
    )

Attributes

metaxy.metadata_store.lancedb.LanceDBMetadataStore.conn property

conn: Any

Get LanceDB connection.

Returns:

  • Any

    Active LanceDB connection

Raises:

Functions

metaxy.metadata_store.lancedb.LanceDBMetadataStore.open

open(mode: AccessMode = 'read') -> Iterator[Self]

Open LanceDB connection.

For local filesystem paths, creates the directory if it doesn't exist. For remote URIs (S3, LanceDB Cloud, etc.), connects directly. Tables are created on-demand when features are first written.

Parameters:

  • mode (AccessMode, default: 'read' ) –

    Access mode (READ or WRITE). Accepted for consistency but not used by LanceDB (LanceDB handles concurrent access internally).

Yields:

  • Self ( Self ) –

    The store instance

Raises:

  • ConnectionError

    If remote connection fails (e.g., invalid credentials)

Source code in src/metaxy/metadata_store/lancedb.py
@contextmanager
def open(self, mode: AccessMode = "read") -> Iterator[Self]:
    """Open LanceDB connection.

    For local filesystem paths, creates the directory if it doesn't exist.
    For remote URIs (S3, LanceDB Cloud, etc.), connects directly.
    Tables are created on-demand when features are first written.

    Args:
        mode: Access mode (READ or WRITE). Accepted for consistency but not used
            by LanceDB (LanceDB handles concurrent access internally).

    Yields:
        Self: The store instance

    Raises:
        ConnectionError: If remote connection fails (e.g., invalid credentials)
    """
    # Increment context depth to support nested contexts
    self._context_depth += 1

    try:
        # Only perform actual open on first entry
        if self._context_depth == 1:
            import lancedb

            if is_local_path(self.uri):
                Path(self.uri).mkdir(parents=True, exist_ok=True)

            self._conn = lancedb.connect(self.uri, **self._connect_kwargs)
            self._is_open = True
            self._validate_after_open()

        yield self
    finally:
        # Decrement context depth
        self._context_depth -= 1

        # Only perform actual close on last exit
        if self._context_depth == 0:
            self._conn = None
            self._is_open = False
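
The depth counter in `open` makes the context manager reentrant: only the outermost entry connects and only the outermost exit disconnects. A minimal standalone sketch of the same pattern follows, using a hypothetical `ReentrantResource` class and a placeholder object in place of a real LanceDB connection.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class ReentrantResource:
    """Toy resource that opens once no matter how deeply contexts are nested."""

    def __init__(self) -> None:
        self._depth = 0
        self.connection: Optional[object] = None
        self.open_count = 0  # how many times a real open happened

    @contextmanager
    def open(self) -> Iterator["ReentrantResource"]:
        self._depth += 1
        try:
            if self._depth == 1:
                # First entry: perform the actual open.
                self.connection = object()
                self.open_count += 1
            yield self
        finally:
            self._depth -= 1
            if self._depth == 0:
                # Last exit: perform the actual close.
                self.connection = None


res = ReentrantResource()
with res.open():
    with res.open():
        inner_connected = res.connection is not None
    # Leaving the inner context must not close the connection.
    still_connected = res.connection is not None
closed_after = res.connection is None
```

Putting the decrement in `finally` keeps the counter balanced even when the body of the `with` block raises.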

metaxy.metadata_store.lancedb.LanceDBMetadataStore.write_metadata_to_store

write_metadata_to_store(feature_key: FeatureKey, df: Frame, **kwargs: Any) -> None

Append metadata to Lance table.

Creates the table if it doesn't exist, otherwise appends to existing table. Uses LanceDB's native Polars/Arrow integration for efficient storage.

Parameters:

  • feature_key (FeatureKey) –

    Feature key to write to

  • df (Frame) –

    Narwhals Frame with metadata (already validated by base class)

Source code in src/metaxy/metadata_store/lancedb.py
def write_metadata_to_store(
    self,
    feature_key: FeatureKey,
    df: Frame,
    **kwargs: Any,
) -> None:
    """Append metadata to Lance table.

    Creates the table if it doesn't exist, otherwise appends to existing table.
    Uses LanceDB's native Polars/Arrow integration for efficient storage.

    Args:
        feature_key: Feature key to write to
        df: Narwhals Frame with metadata (already validated by base class)
    """
    # Convert Narwhals frame to Polars DataFrame
    df_polars = collect_to_polars(df)

    table_name = self._table_name(feature_key)

    # LanceDB supports both Polars DataFrames and Arrow tables directly
    # Try Polars first (native integration), fall back to Arrow if needed
    try:
        if self._table_exists(table_name):
            table = self._get_table(table_name)
            # Use Polars DataFrame directly - LanceDB handles conversion
            table.add(df_polars)  # type: ignore[attr-defined]
        else:
            # Create table from Polars DataFrame - LanceDB handles schema
            self.conn.create_table(table_name, data=df_polars)  # type: ignore[attr-defined]
    except TypeError as exc:
        if not self._should_fallback_to_arrow(exc):
            raise
        # Defensive fallback: Modern LanceDB (>=0.3) accepts Polars DataFrames natively,
        # but fall back to Arrow if an older version or edge case doesn't support it.
        # This ensures compatibility across LanceDB versions.
        logger.debug("Falling back to Arrow format for LanceDB write: %s", exc)
        arrow_table = df_polars.to_arrow()
        if self._table_exists(table_name):
            table = self._get_table(table_name)
            table.add(arrow_table)  # type: ignore[attr-defined]
        else:
            self.conn.create_table(table_name, data=arrow_table)  # type: ignore[attr-defined]
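
The create-or-append branch at the core of this method can be modeled without LanceDB. In the sketch below a dictionary of row lists stands in for the database, and `write_rows` is a hypothetical name; the Arrow fallback from the source above is deliberately left out.

```python
from typing import Any

Row = dict[str, Any]


def write_rows(tables: dict[str, list[Row]], name: str, rows: list[Row]) -> None:
    """Append to an existing table, or create the table from the first batch."""
    if name in tables:
        tables[name].extend(rows)  # analogous to table.add(...)
    else:
        tables[name] = list(rows)  # analogous to conn.create_table(...)


tables: dict[str, list[Row]] = {}
write_rows(tables, "video__embeddings", [{"sample_uid": 1}])
write_rows(tables, "video__embeddings", [{"sample_uid": 2}])
```

Because the first write defines the table, the schema of the first batch becomes the table schema in this model; later batches are expected to match it.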

metaxy.metadata_store.lancedb.LanceDBMetadataStore.read_metadata_in_store

read_metadata_in_store(feature: CoercibleToFeatureKey, *, filters: Sequence[Expr] | None = None, columns: Sequence[str] | None = None, **kwargs: Any) -> LazyFrame[Any] | None

Read metadata from Lance table.

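Loads the whole table from Lance as an Arrow table, converts it to a Polars LazyFrame, and returns it wrapped as a Narwhals LazyFrame. Filters and column selection are applied on the Polars side after loading; they are not pushed down to Lance (see LanceDB issue #1539).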

Parameters:

  • feature (CoercibleToFeatureKey) –

    Feature to read

  • filters (Sequence[Expr] | None, default: None ) –

    List of Narwhals filter expressions

  • columns (Sequence[str] | None, default: None ) –

    Optional list of columns to select

  • **kwargs (Any, default: {} ) –

    Backend-specific parameters (unused)

Returns:

  • LazyFrame[Any] | None

    Narwhals LazyFrame with metadata, or None if table not found

Source code in src/metaxy/metadata_store/lancedb.py
def read_metadata_in_store(
    self,
    feature: CoercibleToFeatureKey,
    *,
    filters: Sequence[nw.Expr] | None = None,
    columns: Sequence[str] | None = None,
    **kwargs: Any,
) -> nw.LazyFrame[Any] | None:
    """Read metadata from Lance table.

    Loads data from Lance, converts to Polars, and returns as Narwhals LazyFrame.
    Applies filters and column selection in memory.

    Args:
        feature: Feature to read
        filters: List of Narwhals filter expressions
        columns: Optional list of columns to select
        **kwargs: Backend-specific parameters (unused)

    Returns:
        Narwhals LazyFrame with metadata, or None if table not found
    """
    self._check_open()
    feature_key = self._resolve_feature_key(feature)
    table_name = self._table_name(feature_key)
    if not self._table_exists(table_name):
        return None

    table = self._get_table(table_name)
    # https://github.com/lancedb/lancedb/issues/1539
    # Fall back to eager Arrow conversion until LanceDB issue #1539 is resolved.
    arrow_table = table.to_arrow()
    pl_lazy = pl.DataFrame(arrow_table).lazy()
    nw_lazy = nw.from_native(pl_lazy)

    if filters:
        nw_lazy = nw_lazy.filter(*filters)

    if columns is not None:
        nw_lazy = nw_lazy.select(columns)

    return nw_lazy
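
The read path has three observable behaviours: a missing table yields `None`, filters run before column selection, and both operate on data that has already been loaded. A plain-Python sketch of that ordering follows, with lists of dictionaries standing in for frames and callables standing in for Narwhals expressions; `read_rows` is a hypothetical name.

```python
from typing import Any, Callable, Optional, Sequence

Row = dict[str, Any]


def read_rows(
    tables: dict[str, list[Row]],
    name: str,
    filters: Optional[Sequence[Callable[[Row], bool]]] = None,
    columns: Optional[Sequence[str]] = None,
) -> Optional[list[Row]]:
    """Return filtered, projected rows, or None when the table does not exist."""
    if name not in tables:
        return None
    rows = tables[name]
    # Filter first, so predicates may use columns that are dropped afterwards.
    for predicate in filters or []:
        rows = [row for row in rows if predicate(row)]
    if columns is not None:
        rows = [{col: row[col] for col in columns} for row in rows]
    return list(rows)


data = {"t": [{"id": 1, "score": 0.2}, {"id": 2, "score": 0.9}]}
high = read_rows(data, "t", filters=[lambda r: r["score"] > 0.5], columns=["id"])
```

Applying the projection last is what allows a filter on `score` to work even though only `id` is returned.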

metaxy.metadata_store.lancedb.LanceDBMetadataStore.display

display() -> str

Human-readable representation with sanitized credentials.

Source code in src/metaxy/metadata_store/lancedb.py
def display(self) -> str:
    """Human-readable representation with sanitized credentials."""
    path = sanitize_uri(self.uri)
    return f"LanceDBMetadataStore(path={path})"
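
The behaviour of `sanitize_uri` is not documented on this page. As a rough idea of what masking credentials in a URI can look like, the sketch below hides the password component with the standard library; `mask_credentials` is a hypothetical function and may differ from what Metaxy actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_credentials(uri: str) -> str:
    """Replace the password part of a URI with '***'; leave other URIs untouched."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # nothing to hide (local paths, credential-free URIs)
    # Keep host and port exactly as written, after the last '@'.
    host = parts.netloc.rsplit("@", 1)[1]
    user = parts.username or ""
    return urlunsplit(parts._replace(netloc=f"{user}:***@{host}"))


masked = mask_credentials("s3://user:secret@bucket/path")
```

Secrets passed through `connect_kwargs` never appear in the URI, so they would not show up in the `display()` output in the first place.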

metaxy.metadata_store.lancedb.LanceDBMetadataStore.config_model classmethod

config_model() -> type[LanceDBMetadataStoreConfig]

Return the configuration model class for this store type.

Subclasses must override this to return their specific config class.

Returns:

Note

Subclasses override this with a more specific return type. Type checkers may show a warning about incompatible override, but this is intentional - each store returns its own config type.

Source code in src/metaxy/metadata_store/lancedb.py
@classmethod
def config_model(cls) -> type[LanceDBMetadataStoreConfig]:  # pyright: ignore[reportIncompatibleMethodOverride]
    return LanceDBMetadataStoreConfig