DuckDB API Reference

metaxy.metadata_store.duckdb.DuckDBMetadataStore

DuckDBMetadataStore(database: str | Path, *, config: dict[str, str] | None = None, extensions: Sequence[ExtensionInput] | None = None, fallback_stores: list[MetadataStore] | None = None, ducklake: DuckLakeConfigInput | None = None, **kwargs)

Bases: IbisMetadataStore

DuckDB metadata store using Ibis backend.

Local file
store = DuckDBMetadataStore("metadata.db")

In-memory database
store = DuckDBMetadataStore(":memory:")

MotherDuck
store = DuckDBMetadataStore("md:my_database")

With extensions
store = DuckDBMetadataStore(
    "metadata.db",
    hash_algorithm=HashAlgorithm.XXHASH64,
    extensions=["hashfuncs"]
)

Parameters:

  • database (str | Path) –

    Database connection string or path.

    • File path: "metadata.db" or Path("metadata.db")

    • In-memory: ":memory:"

    • MotherDuck: "md:my_database" or "md:my_database?motherduck_token=..."

    • S3: "s3://bucket/path/database.duckdb" (read-only via ATTACH)

    • HTTPS: "https://example.com/database.duckdb" (read-only via ATTACH)

    • Any valid DuckDB connection string

  • config (dict[str, str] | None, default: None ) –

    Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})

  • extensions (Sequence[ExtensionInput] | None, default: None ) –

    List of DuckDB extensions to install and load on open. Supports strings (community repo), mapping-like objects with name/repository keys, or metaxy.metadata_store.duckdb.ExtensionSpec instances.

  • ducklake (DuckLakeConfigInput | None, default: None ) –

    Optional DuckLake attachment configuration. Provide either a mapping with 'metadata_backend' and 'storage_backend' entries or a DuckLakeAttachmentConfig instance. When supplied, the DuckDB connection is configured to ATTACH the DuckLake catalog after open().

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • **kwargs –

    Passed to metaxy.metadata_store.ibis.IbisMetadataStore

Warning

Parent directories are NOT created automatically. Ensure paths exist before initializing the store.
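The config mapping is merged flat into the connection parameters handed to the Ibis DuckDB backend (settings sit alongside the database path rather than under a nested 'config' key). A minimal sketch of that merge; the helper name is illustrative, not part of metaxy's API:

```python
from pathlib import Path

def build_connection_params(database, config=None):
    # Mirror of the constructor's merge: the database path first,
    # then any DuckDB settings flattened in alongside it.
    params = {"database": str(database)}
    if config:
        params.update(config)
    return params

params = build_connection_params(
    Path("metadata.db"), {"threads": "4", "memory_limit": "4GB"}
)
# params: {"database": "metadata.db", "threads": "4", "memory_limit": "4GB"}
```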

Source code in src/metaxy/metadata_store/duckdb.py
def __init__(
    self,
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
):
    """
    Initialize [DuckDB](https://duckdb.org/) metadata store.

    Args:
        database: Database connection string or path.
            - File path: `"metadata.db"` or `Path("metadata.db")`

            - In-memory: `":memory:"`

            - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`

            - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)

            - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)

            - Any valid DuckDB connection string

        config: Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
        extensions: List of DuckDB extensions to install and load on open.
            Supports strings (community repo), mapping-like objects with
            ``name``/``repository`` keys, or [metaxy.metadata_store.duckdb.ExtensionSpec][] instances.

        ducklake: Optional DuckLake attachment configuration. Provide either a
            mapping with 'metadata_backend' and 'storage_backend' entries or a
            DuckLakeAttachmentConfig instance. When supplied, the DuckDB
            connection is configured to ATTACH the DuckLake catalog after open().
        fallback_stores: Ordered list of read-only fallback stores.

        **kwargs: Passed to [metaxy.metadata_store.ibis.IbisMetadataStore][]

    Warning:
        Parent directories are NOT created automatically. Ensure paths exist
        before initializing the store.
    """
    database_str = str(database)

    # Build connection params for Ibis DuckDB backend
    # Ibis DuckDB backend accepts config params directly (not nested under 'config')
    connection_params = {"database": database_str}
    if config:
        connection_params.update(config)

    self.database = database_str
    base_extensions: list[NormalisedExtension] = _normalise_extensions(
        extensions or []
    )

    self._ducklake_config: DuckLakeAttachmentConfig | None = None
    self._ducklake_attachment: DuckLakeAttachmentManager | None = None
    if ducklake is not None:
        attachment_config, manager = build_ducklake_attachment(ducklake)
        ensure_extensions_with_plugins(base_extensions, attachment_config.plugins)
        self._ducklake_config = attachment_config
        self._ducklake_attachment = manager

    self.extensions = base_extensions

    # Auto-add hashfuncs extension if not present (needed for default XXHASH64)
    # But we'll fall back to MD5 if hashfuncs is not available
    extension_names: list[str] = []
    for ext in self.extensions:
        if isinstance(ext, str):
            extension_names.append(ext)
        elif isinstance(ext, ExtensionSpec):
            extension_names.append(ext.name)
        else:
            # After _normalise_extensions, this should not happen
            # But keep defensive check for type safety
            raise TypeError(
                f"Extension must be str or ExtensionSpec after normalization; got {type(ext)}"
            )
    if "hashfuncs" not in extension_names:
        self.extensions.append("hashfuncs")

    # Initialize Ibis store with DuckDB backend
    super().__init__(
        backend="duckdb",
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )

Attributes

metaxy.metadata_store.duckdb.DuckDBMetadataStore.sqlalchemy_url property

sqlalchemy_url: str

Get SQLAlchemy-compatible connection URL for DuckDB.

Constructs a DuckDB SQLAlchemy URL from the database parameter.

Returns:

  • str

    SQLAlchemy-compatible URL string (e.g., "duckdb:///path/to/db.db")

Example
store = DuckDBMetadataStore(":memory:")
print(store.sqlalchemy_url)  # duckdb:///:memory:

store = DuckDBMetadataStore("metadata.db")
print(store.sqlalchemy_url)  # duckdb:///metadata.db
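Judging by the examples above, the property reduces to prefixing the duckdb:/// scheme onto the database string; a sketch under that assumption (the function name is illustrative):

```python
def duckdb_sqlalchemy_url(database: str) -> str:
    # SQLAlchemy's DuckDB dialect addresses databases as duckdb:///
    # followed by the path (or :memory: for an in-memory database).
    return f"duckdb:///{database}"
```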

metaxy.metadata_store.duckdb.DuckDBMetadataStore.ducklake_attachment property

ducklake_attachment: DuckLakeAttachmentManager

DuckLake attachment manager (raises if not configured).

metaxy.metadata_store.duckdb.DuckDBMetadataStore.ducklake_attachment_config property

ducklake_attachment_config: DuckLakeAttachmentConfig

DuckLake attachment configuration (raises if not configured).

Functions

metaxy.metadata_store.duckdb.DuckDBMetadataStore.open

open(mode: AccessMode = 'read') -> Iterator[Self]

Open DuckDB connection with specified access mode.

Parameters:

  • mode (AccessMode, default: 'read' ) –

    Access mode (READ or WRITE). Defaults to READ. READ mode sets read_only=True for concurrent access.

Yields:

  • Self ( Self ) –

    The store instance with connection open

Source code in src/metaxy/metadata_store/duckdb.py
@contextmanager
def open(self, mode: AccessMode = "read") -> Iterator[Self]:
    """Open DuckDB connection with specified access mode.

    Args:
        mode: Access mode (READ or WRITE). Defaults to READ.
            READ mode sets read_only=True for concurrent access.

    Yields:
        Self: The store instance with connection open
    """
    # Setup: Configure connection params based on mode
    if mode == "read":
        self.connection_params["read_only"] = True
    else:
        # Remove read_only if present (switching to WRITE)
        self.connection_params.pop("read_only", None)

    # Call parent context manager to establish connection
    with super().open(mode):
        try:
            # Configure DuckLake if needed (only on first entry)
            if self._ducklake_attachment is not None and self._context_depth == 1:
                duckdb_conn = self._duckdb_raw_connection()
                self._ducklake_attachment.configure(duckdb_conn)

            yield self
        finally:
            # Cleanup is handled by parent's finally block
            pass
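The read_only toggle in the setup step above can be summarised as a small pure function; this mirrors the logic shown in the source, with an illustrative name:

```python
def apply_access_mode(connection_params: dict, mode: str) -> dict:
    # READ: request a read-only connection so concurrent readers can
    # attach the same database file. WRITE: drop the flag entirely.
    if mode == "read":
        connection_params["read_only"] = True
    else:
        connection_params.pop("read_only", None)
    return connection_params
```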

metaxy.metadata_store.duckdb.DuckDBMetadataStore.preview_ducklake_sql

preview_ducklake_sql() -> list[str]

Return DuckLake attachment SQL if configured.

Source code in src/metaxy/metadata_store/duckdb.py
def preview_ducklake_sql(self) -> list[str]:
    """Return DuckLake attachment SQL if configured."""
    return self.ducklake_attachment.preview_sql()
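What preview_sql() returns depends on the configured backends. As a purely hypothetical illustration, an attachment over a local catalog might preview to statements along these lines; the exact SQL metaxy emits may differ:

```python
def hypothetical_ducklake_preview(metadata_path: str, data_path: str,
                                  alias: str = "ducklake") -> list[str]:
    # Illustrative only: install/load the ducklake extension, then
    # ATTACH the catalog under the configured alias.
    return [
        "INSTALL ducklake;",
        "LOAD ducklake;",
        f"ATTACH 'ducklake:{metadata_path}' AS {alias} (DATA_PATH '{data_path}');",
    ]
```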

metaxy.metadata_store.duckdb.DuckDBMetadataStore.config_model classmethod

config_model() -> type[DuckDBMetadataStoreConfig]

Return the configuration model class for this store type.

Subclasses must override this to return their specific config class.

Returns:

  • type[DuckDBMetadataStoreConfig] –

    The configuration model class for this store.
Note

Subclasses override this with a more specific return type. Type checkers may show a warning about incompatible override, but this is intentional - each store returns its own config type.

Source code in src/metaxy/metadata_store/duckdb.py
@classmethod
def config_model(cls) -> type[DuckDBMetadataStoreConfig]:  # pyright: ignore[reportIncompatibleMethodOverride]
    return DuckDBMetadataStoreConfig

metaxy.metadata_store.duckdb.ExtensionSpec pydantic-model

Bases: BaseModel

DuckDB extension specification accepted by DuckDBMetadataStore.

Supports additional keys for forward compatibility.

Show JSON schema:
{
  "additionalProperties": true,
  "description": "DuckDB extension specification accepted by DuckDBMetadataStore.\n\nSupports additional keys for forward compatibility.",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "repository": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Repository"
    }
  },
  "required": [
    "name"
  ],
  "title": "ExtensionSpec",
  "type": "object"
}

Config:

  • extra: allow

Fields:

  • name (str)
  • repository (str | None)

metaxy.metadata_store.duckdb.DuckLakeConfigInput module-attribute

DuckLakeConfigInput = DuckLakeAttachmentConfig | Mapping[str, Any]

metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig pydantic-model

Bases: BaseModel

Configuration payload used to attach DuckLake to a DuckDB connection.

Show JSON schema:
{
  "additionalProperties": true,
  "description": "Configuration payload used to attach DuckLake to a DuckDB connection.",
  "properties": {
    "metadata_backend": {
      "additionalProperties": true,
      "title": "Metadata Backend",
      "type": "object"
    },
    "storage_backend": {
      "additionalProperties": true,
      "title": "Storage Backend",
      "type": "object"
    },
    "alias": {
      "default": "ducklake",
      "title": "Alias",
      "type": "string"
    },
    "plugins": {
      "items": {
        "type": "string"
      },
      "title": "Plugins",
      "type": "array"
    },
    "attach_options": {
      "additionalProperties": true,
      "title": "Attach Options",
      "type": "object"
    }
  },
  "required": [
    "metadata_backend",
    "storage_backend"
  ],
  "title": "DuckLakeAttachmentConfig",
  "type": "object"
}

Config:

  • arbitrary_types_allowed: True
  • extra: allow

Fields:

  • metadata_backend (DuckLakeBackend)
  • storage_backend (DuckLakeBackend)
  • alias (str)
  • plugins (tuple[str, ...])
  • attach_options (dict[str, Any])

Validators:

  • _coerce_backends → metadata_backend, storage_backend
  • _coerce_alias → alias
  • _coerce_plugins → plugins
  • _coerce_attach_options → attach_options
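Per the schema above, a mapping passed as ducklake needs at least metadata_backend and storage_backend. Only the top-level keys below come from the schema; the backend payload contents are hypothetical:

```python
ducklake_config = {
    "metadata_backend": {"type": "sqlite", "path": "catalog.sqlite"},
    "storage_backend": {"type": "local", "path": "lake-data/"},
    "alias": "ducklake",      # default shown in the schema
    "plugins": ["ducklake"],  # extensions to ensure before ATTACH
}
```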

Functions

metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig.metadata_sql_parts

metadata_sql_parts() -> tuple[str, str]

Pre-computed metadata SQL components for DuckLake attachments.

Source code in src/metaxy/metadata_store/_ducklake_support.py
@computed_field(return_type=tuple[str, str])
def metadata_sql_parts(self) -> tuple[str, str]:
    """Pre-computed metadata SQL components for DuckLake attachments."""
    return resolve_metadata_backend(self.metadata_backend, self.alias)

metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig.storage_sql_parts

storage_sql_parts() -> tuple[str, str]

Pre-computed storage SQL components for DuckLake attachments.

Source code in src/metaxy/metadata_store/_ducklake_support.py
@computed_field(return_type=tuple[str, str])
def storage_sql_parts(self) -> tuple[str, str]:
    """Pre-computed storage SQL components for DuckLake attachments."""
    return resolve_storage_backend(self.storage_backend, self.alias)