Metadata Stores

Metaxy abstracts interactions with metadata stored in external systems such as databases, files, or object stores, through a unified interface: MetadataStore.

Metadata stores expose methods for reading, writing, and deleting metadata, plus the most important one: resolve_update, which returns a metadata increment.

It looks more or less like this:

Example

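# Read the metadata stored for a feature.
with store:
    df = store.read_metadata("/my/feature/key")

# Open the store in write mode before writing metadata.
with store.open("write"):
    store.write_metadata("/another/key", df)

# Resolve the metadata increment for a feature.
with store:
    increment = store.resolve_update("and/another/key")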

Metadata stores implement an append-only storage model and rely on Metaxy system columns.

Deletes are not required during normal operations, but they are still supported because users will eventually want to delete stale metadata and data.

Note

Metaxy does not mutate metadata in place unless explicitly requested. (1)

  1. 🔥 for performance reasons

Forget About ACID

Metadata reads and writes are not guaranteed to be ACID: Metaxy is designed to interact with analytical databases, which typically do not provide full ACID guarantees by design. (1)

  1. for (you guessed it) 🔥 performance reasons

However, Metaxy never retrieves the same sample version twice, and performs read-time deduplication (1) by the combination of the feature version, ID columns, and metaxy_created_at.

  1. also known as merge-on-read
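
The idea behind read-time deduplication can be illustrated with a minimal, store-agnostic sketch. It is not Metaxy's implementation: the `feature_version` and `sample_id` column names, the `deduplicate_on_read` helper, and the rule that the row with the newest `metaxy_created_at` wins are all assumptions made for illustration.

```python
from datetime import datetime


def deduplicate_on_read(rows, id_columns):
    """Keep one row per (feature version, ID columns).

    Assumption: among duplicates, the row with the newest
    metaxy_created_at is the one that is kept.
    """
    latest = {}
    for row in rows:
        # "feature_version" is a hypothetical column name for this sketch.
        key = (row["feature_version"], *(row[c] for c in id_columns))
        kept = latest.get(key)
        if kept is None or row["metaxy_created_at"] > kept["metaxy_created_at"]:
            latest[key] = row
    return list(latest.values())


# Append-only storage: sample 1 was written twice, nothing was overwritten.
rows = [
    {"feature_version": "v1", "sample_id": 1,
     "metaxy_created_at": datetime(2024, 1, 1), "value": "old"},
    {"feature_version": "v1", "sample_id": 1,
     "metaxy_created_at": datetime(2024, 1, 2), "value": "new"},
    {"feature_version": "v1", "sample_id": 2,
     "metaxy_created_at": datetime(2024, 1, 1), "value": "only"},
]
deduped = deduplicate_on_read(rows, id_columns=["sample_id"])
```

Because duplicates are resolved when reading, writers can keep appending rows without updating existing ones.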

When resolving incremental updates for a feature, Metaxy attempts to perform all computations, such as sample version calculations, within the metadata store. This includes joining upstream features, hashing their versions, and filtering out samples that have already been processed: everything is pushed down into the database.
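
The three steps named above (join, hash, filter) can be sketched in plain Python to show what the store is asked to compute. This is a conceptual sketch, not Metaxy's code: the `sample_version` and `resolve_increment` helpers, the inner join on sample IDs, the dictionary layout, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def sample_version(upstream_versions):
    """Hash one sample's upstream versions into a single version string."""
    digest = hashlib.sha256()
    # Sort by feature key so the hash does not depend on dict ordering.
    for feature_key in sorted(upstream_versions):
        digest.update(f"{feature_key}={upstream_versions[feature_key]};".encode())
    return digest.hexdigest()


def resolve_increment(upstream, stored):
    """Return {sample_id: version} for samples that are new or changed.

    upstream: {feature_key: {sample_id: version}}
    stored:   {sample_id: version} already recorded for this feature
    """
    if not upstream:
        return {}
    # 1. Join: keep samples present in every upstream feature (assumed inner join).
    common_ids = set.intersection(*(set(v) for v in upstream.values()))
    increment = {}
    for sample_id in sorted(common_ids):
        # 2. Hash the upstream versions of this sample.
        version = sample_version({k: v[sample_id] for k, v in upstream.items()})
        # 3. Filter out samples whose stored version is already up to date.
        if stored.get(sample_id) != version:
            increment[sample_id] = version
    return increment


upstream = {
    "video": {"a": "v1", "b": "v1"},
    "audio": {"a": "v1", "b": "v2", "c": "v1"},
}
stored = {"a": sample_version({"video": "v1", "audio": "v1"})}
increment = resolve_increment(upstream, stored)  # only sample "b" remains
```

In a database-backed store, the same logic would be expressed as a query so that no rows have to leave the database.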

When can local computations happen instead

Metaxy's versioning engine runs locally instead:

  1. If the metadata store does not have a compute engine at all: for example, DeltaLake is just a storage format.

  2. If the user explicitly requested to keep the computations local by setting versioning_engine="polars" when instantiating the metadata store.

  3. If a fallback store had to be used to retrieve one of the parent features missing in the current store.

Info

The local versioning engine is implemented with polars-hash and benefits from parallelism, predicate pushdown, and other features of Polars.

None of these three cases can occur by accident: each requires preconfigured settings or explicit user action. In the third case, Metaxy also issues a warning, in case the user has unintentionally configured a fallback store in production.
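
The decision described above can be summarized as a small function. This is a hypothetical sketch of the rules as stated here, not Metaxy's API: the `choose_versioning_engine` helper, its parameters other than the `"polars"` value, and the `"auto"` and `"native"` labels are assumptions.

```python
import warnings


def choose_versioning_engine(store_has_compute_engine,
                             versioning_engine="auto",
                             used_fallback_store=False):
    """Return "polars" for local computation, "native" for in-store computation."""
    # Case 1: the store is only a storage format and cannot compute anything.
    if not store_has_compute_engine:
        return "polars"
    # Case 2: the user explicitly asked for local computation.
    if versioning_engine == "polars":
        return "polars"
    # Case 3: a parent feature had to be read from a fallback store.
    if used_fallback_store:
        warnings.warn("Fallback store used; versioning runs locally.")
        return "polars"
    # Otherwise everything is pushed down into the store.
    return "native"


engine = choose_versioning_engine(store_has_compute_engine=True)
```

Only the third branch emits a warning, mirroring the text: it is the one case where a production misconfiguration could go unnoticed.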

Metadata Store Implementations

Metaxy provides ready-made MetadataStore implementations for popular databases and storage systems.