DuckDB API Reference¶
metaxy.metadata_store.duckdb.DuckDBMetadataStore
¶
DuckDBMetadataStore(database: str | Path, *, config: dict[str, str] | None = None, extensions: Sequence[ExtensionInput] | None = None, fallback_stores: list[MetadataStore] | None = None, ducklake: DuckLakeConfigInput | None = None, **kwargs)
Bases: IbisMetadataStore
DuckDB metadata store using Ibis backend.
Parameters:
- database (str | Path) – Database connection string or path.
  - File path: `"metadata.db"` or `Path("metadata.db")`
  - In-memory: `":memory:"`
  - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`
  - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)
  - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)
  - Any valid DuckDB connection string
- config (dict[str, str] | None, default: None) – Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
- extensions (Sequence[ExtensionInput] | None, default: None) – List of DuckDB extensions to install and load on open. Supports strings (community repo), mapping-like objects with name/repository keys, or metaxy.metadata_store.duckdb.ExtensionSpec instances.
- ducklake (DuckLakeConfigInput | None, default: None) – Optional DuckLake attachment configuration. Provide either a mapping with 'metadata_backend' and 'storage_backend' entries or a DuckLakeAttachmentConfig instance. When supplied, the DuckDB connection is configured to ATTACH the DuckLake catalog after open().
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores.
- **kwargs – Passed to metaxy.metadata_store.ibis.IbisMetadataStore
Warning
Parent directories are NOT created automatically. Ensure paths exist before initializing the store.
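A minimal usage sketch with extensions and config (the `h3` community extension and the local path are illustrative, and the `try`/`except` guard only makes the snippet importable without `metaxy` installed):

```python
from pathlib import Path

# DuckDB settings and extensions to install/load on open().
config = {"threads": "4", "memory_limit": "4GB"}
extensions = ["hashfuncs", {"name": "h3", "repository": "community"}]

try:
    from metaxy.metadata_store.duckdb import DuckDBMetadataStore

    # Parent directories are not created automatically; the path must exist.
    store = DuckDBMetadataStore(
        Path("metadata.db"),
        config=config,
        extensions=extensions,
    )
except ImportError:
    store = None  # metaxy not installed; the dicts above still show the shapes
```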
Source code in src/metaxy/metadata_store/duckdb.py
def __init__(
    self,
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
):
    """Initialize [DuckDB](https://duckdb.org/) metadata store.

    Args:
        database: Database connection string or path.

            - File path: `"metadata.db"` or `Path("metadata.db")`
            - In-memory: `":memory:"`
            - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`
            - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)
            - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)
            - Any valid DuckDB connection string
        config: Optional DuckDB configuration settings
            (e.g., {'threads': '4', 'memory_limit': '4GB'})
        extensions: List of DuckDB extensions to install and load on open.
            Supports strings (community repo), mapping-like objects with
            ``name``/``repository`` keys, or
            [metaxy.metadata_store.duckdb.ExtensionSpec][] instances.
        ducklake: Optional DuckLake attachment configuration. Provide either a
            mapping with 'metadata_backend' and 'storage_backend' entries or a
            DuckLakeAttachmentConfig instance. When supplied, the DuckDB
            connection is configured to ATTACH the DuckLake catalog after open().
        fallback_stores: Ordered list of read-only fallback stores.
        **kwargs: Passed to [metaxy.metadata_store.ibis.IbisMetadataStore][]

    Warning:
        Parent directories are NOT created automatically. Ensure paths exist
        before initializing the store.
    """
    database_str = str(database)

    # Build connection params for the Ibis DuckDB backend.
    # The backend accepts config params directly (not nested under 'config').
    connection_params = {"database": database_str}
    if config:
        connection_params.update(config)

    self.database = database_str

    base_extensions: list[NormalisedExtension] = _normalise_extensions(
        extensions or []
    )

    self._ducklake_config: DuckLakeAttachmentConfig | None = None
    self._ducklake_attachment: DuckLakeAttachmentManager | None = None
    if ducklake is not None:
        attachment_config, manager = build_ducklake_attachment(ducklake)
        ensure_extensions_with_plugins(base_extensions, attachment_config.plugins)
        self._ducklake_config = attachment_config
        self._ducklake_attachment = manager

    self.extensions = base_extensions

    # Auto-add the hashfuncs extension if not present (needed for the default
    # XXHASH64); we fall back to MD5 if hashfuncs is not available.
    extension_names: list[str] = []
    for ext in self.extensions:
        if isinstance(ext, str):
            extension_names.append(ext)
        elif isinstance(ext, ExtensionSpec):
            extension_names.append(ext.name)
        else:
            # After _normalise_extensions this should not happen, but keep a
            # defensive check for type safety.
            raise TypeError(
                f"Extension must be str or ExtensionSpec after normalization; got {type(ext)}"
            )
    if "hashfuncs" not in extension_names:
        self.extensions.append("hashfuncs")

    # Initialize the Ibis store with the DuckDB backend.
    super().__init__(
        backend="duckdb",
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )
Attributes¶
metaxy.metadata_store.duckdb.DuckDBMetadataStore.sqlalchemy_url
property
¶
sqlalchemy_url: str
Get SQLAlchemy-compatible connection URL for DuckDB.
Constructs a DuckDB SQLAlchemy URL from the database parameter.
Returns:
- str – SQLAlchemy-compatible URL string (e.g., "duckdb:///path/to/db.db")
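The mapping is simple enough to sketch as a standalone function (an illustration of the URL shape documented above, not the store's actual implementation):

```python
def duckdb_sqlalchemy_url(database: str) -> str:
    # "duckdb:///" + database; the in-memory token ":memory:" passes through.
    return f"duckdb:///{database}"

assert duckdb_sqlalchemy_url("path/to/db.db") == "duckdb:///path/to/db.db"
assert duckdb_sqlalchemy_url(":memory:") == "duckdb:///:memory:"
```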
metaxy.metadata_store.duckdb.DuckDBMetadataStore.ducklake_attachment
property
¶
ducklake_attachment: DuckLakeAttachmentManager
DuckLake attachment manager (raises if not configured).
metaxy.metadata_store.duckdb.DuckDBMetadataStore.ducklake_attachment_config
property
¶
ducklake_attachment_config: DuckLakeAttachmentConfig
DuckLake attachment configuration (raises if not configured).
Functions¶
metaxy.metadata_store.duckdb.DuckDBMetadataStore.open
¶
open(mode: AccessMode = 'read') -> Iterator[Self]
Open DuckDB connection with specified access mode.
Parameters:
- mode (AccessMode, default: 'read') – Access mode (READ or WRITE). Defaults to READ. READ mode sets read_only=True for concurrent access.
Yields:
- Self – The store instance with connection open
Source code in src/metaxy/metadata_store/duckdb.py
@contextmanager
def open(self, mode: AccessMode = "read") -> Iterator[Self]:
    """Open DuckDB connection with specified access mode.

    Args:
        mode: Access mode (READ or WRITE). Defaults to READ.
            READ mode sets read_only=True for concurrent access.

    Yields:
        Self: The store instance with connection open
    """
    # Setup: configure connection params based on mode.
    if mode == "read":
        self.connection_params["read_only"] = True
    else:
        # Remove read_only if present (switching to WRITE).
        self.connection_params.pop("read_only", None)

    # Call the parent context manager to establish the connection.
    with super().open(mode):
        try:
            # Configure DuckLake if needed (only on first entry).
            if self._ducklake_attachment is not None and self._context_depth == 1:
                duckdb_conn = self._duckdb_raw_connection()
                self._ducklake_attachment.configure(duckdb_conn)
            yield self
        finally:
            # Cleanup is handled by the parent's finally block.
            pass
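The read/write toggle can be exercised in isolation; this is a standalone mirror of the connection-param logic shown in the source, not the store API itself:

```python
def apply_mode(connection_params: dict, mode: str) -> dict:
    """READ pins read_only=True for concurrent access; WRITE drops the
    flag so the connection can mutate the database."""
    if mode == "read":
        connection_params["read_only"] = True
    else:
        connection_params.pop("read_only", None)
    return connection_params

params = apply_mode({"database": "metadata.db"}, "read")
# params == {"database": "metadata.db", "read_only": True}
params = apply_mode(params, "write")
# params == {"database": "metadata.db"}
```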
metaxy.metadata_store.duckdb.DuckDBMetadataStore.preview_ducklake_sql
¶
metaxy.metadata_store.duckdb.DuckDBMetadataStore.config_model
classmethod
¶
config_model() -> type[DuckDBMetadataStoreConfig]
Return the configuration model class for this store type.
Subclasses must override this to return their specific config class.
Returns:
- type[MetadataStoreConfig] – The config class type (e.g., DuckDBMetadataStoreConfig)
Note
Subclasses override this with a more specific return type. Type checkers may show a warning about incompatible override, but this is intentional - each store returns its own config type.
metaxy.metadata_store.duckdb.ExtensionSpec
pydantic-model
¶
Bases: BaseModel
DuckDB extension specification accepted by DuckDBMetadataStore.
Supports additional keys for forward compatibility.
Show JSON schema:
{
"additionalProperties": true,
"description": "DuckDB extension specification accepted by DuckDBMetadataStore.\n\nSupports additional keys for forward compatibility.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"repository": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Repository"
}
},
"required": [
"name"
],
"title": "ExtensionSpec",
"type": "object"
}
Config:
- extra: allow
Fields:
- name (str)
- repository (str | None)
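Per the schema above, a spec needs at least `name`; `repository` is optional and extra keys are tolerated. A sketch of the two accepted forms and how they normalise to a name (plain dicts stand in for `ExtensionSpec` instances here):

```python
# String form: extension name, resolved from the community repository.
spatial = "spatial"

# Mapping form: matches the ExtensionSpec schema (name required,
# repository optional, extra keys allowed for forward compatibility).
h3 = {"name": "h3", "repository": "community"}

def spec_name(ext) -> str:
    # Normalisation as in DuckDBMetadataStore.__init__: strings are used
    # as-is; mappings/specs expose a "name" field.
    return ext if isinstance(ext, str) else ext["name"]

assert spec_name(spatial) == "spatial"
assert spec_name(h3) == "h3"
```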
metaxy.metadata_store.duckdb.DuckLakeConfigInput
module-attribute
¶
DuckLakeConfigInput = DuckLakeAttachmentConfig | Mapping[str, Any]
metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig
pydantic-model
¶
Bases: BaseModel
Configuration payload used to attach DuckLake to a DuckDB connection.
Show JSON schema:
{
"additionalProperties": true,
"description": "Configuration payload used to attach DuckLake to a DuckDB connection.",
"properties": {
"metadata_backend": {
"additionalProperties": true,
"title": "Metadata Backend",
"type": "object"
},
"storage_backend": {
"additionalProperties": true,
"title": "Storage Backend",
"type": "object"
},
"alias": {
"default": "ducklake",
"title": "Alias",
"type": "string"
},
"plugins": {
"items": {
"type": "string"
},
"title": "Plugins",
"type": "array"
},
"attach_options": {
"additionalProperties": true,
"title": "Attach Options",
"type": "object"
}
},
"required": [
"metadata_backend",
"storage_backend"
],
"title": "DuckLakeAttachmentConfig",
"type": "object"
}
Config:
- arbitrary_types_allowed: True
- extra: allow
Fields:
- metadata_backend (DuckLakeBackend)
- storage_backend (DuckLakeBackend)
- alias (str)
- plugins (tuple[str, ...])
- attach_options (dict[str, Any])
Validators:
- _coerce_backends → metadata_backend, storage_backend
- _coerce_alias → alias
- _coerce_plugins → plugins
- _coerce_attach_options → attach_options
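A mapping with the two required entries can be passed as `ducklake=` to `DuckDBMetadataStore`. The nested backend keys below are placeholders only; their exact shape depends on your DuckLake metadata and storage backends:

```python
# Required: metadata_backend and storage_backend (contents are backend-specific).
ducklake_config = {
    "metadata_backend": {"type": "sqlite"},  # placeholder keys
    "storage_backend": {"type": "local"},    # placeholder keys
    "alias": "ducklake",                     # optional; defaults to "ducklake"
    "plugins": [],                           # optional extension plugins
    "attach_options": {},                    # optional ATTACH options
}

required = {"metadata_backend", "storage_backend"}
assert required <= ducklake_config.keys()
```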
Functions¶
metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig.metadata_sql_parts
¶
Pre-computed metadata SQL components for DuckLake attachments.
metaxy.metadata_store._ducklake_support.DuckLakeAttachmentConfig.storage_sql_parts
¶
Pre-computed storage SQL components for DuckLake attachments.