Quickstart¶
Installation¶
Install Metaxy with the `deltalake` extra, an easy way to set up a `MetadataStore` locally:
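The install command itself is missing from this extract; a minimal sketch, assuming Metaxy is published on PyPI with a `deltalake` extra (check the project's installation docs for the exact package name):

```shell
pip install "metaxy[deltalake]"
```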
Create a minimal Metaxy config¶
```toml
entrypoints = ["features.py"]

[stores.dev]
type = "metaxy.metadata_store.deltalake.DeltaMetadataStore"
config = { path = "/tmp/metaxy.delta" }
```
Define a root feature¶
Every Metaxy project must define at least one root feature. Such features do not have upstream dependencies and act as inputs to the feature graph.
```python
import metaxy as mx
from pydantic import Field


class Video(
    mx.BaseFeature,
    spec=mx.FeatureSpec(
        key="video",
        id_columns=["video_id"],
        fields=[
            "audio",
            "frames",
        ],
    ),
):
    # define DB columns
    video_id: str = Field(description="Unique identifier for the video")
    path: str = Field(description="Path to the video file")
    duration: float = Field(description="Duration of the video in seconds")
```
Create a feature materialization script¶
Use `MetadataStore.resolve_update` to compute an increment for materialization:
```python
import metaxy as mx

from .features import Video

# discover and load Metaxy features
cfg = mx.init_metaxy()

# instantiate the MetadataStore
store = cfg.get_store("dev")

# somehow prepare a DataFrame with incoming metadata
# this can be a Pandas, Polars, Ibis, or any other DataFrame supported by Narwhals
samples = ...

with store:
    increment = store.resolve_update(Video, samples=samples)
```
Run user-defined computation over the increment¶
Metaxy is not involved in this step at all.
```python
if (len(increment.added) + len(increment.changed)) > 0:
    # run your computation; this can be done in a distributed manner
    results = run_custom_pipeline(increment, ...)
```
Record metadata for processed samples¶
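The write call itself is not shown in this extract; a minimal sketch of the recording step, assuming a hypothetical `write_metadata` method on the store (the actual method name may differ, see the Metaxy API reference):

```python
# continuing the materialization script above;
# `write_metadata` is a hypothetical method name, check the
# MetadataStore API for the actual call
with store:
    store.write_metadata(Video, results)
```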
Once the metadata for the computed samples has been recorded, processed samples will no longer be returned by `MetadataStore.resolve_update` during future pipeline runs.
No Write-Time Uniqueness Checks!
For performance reasons, Metaxy doesn't enforce deduplication or uniqueness checks at write time.
While `MetadataStore.resolve_update` is guaranteed to never return the same versioned sample twice, it's up to the user to ensure that samples are not written to the metadata store multiple times.
Configuring deduplication or uniqueness checks in the store (database) itself is a good idea.
For example, the SQLModel integration can inject a composite primary key over `metaxy_data_version`, `metaxy_created_at`, and the user-defined ID columns.
Note, however, that Metaxy only uses the latest version (by `metaxy_created_at`) at read time.
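To illustrate the latest-version-wins rule, here is a small self-contained sketch in plain Python (not Metaxy code; the dict keys mirror the column names above) that keeps only the newest row per ID by `metaxy_created_at`:

```python
from operator import itemgetter

# duplicate writes for video "a": same data version, two created_at timestamps
rows = [
    {"video_id": "a", "metaxy_data_version": "v1", "metaxy_created_at": 1},
    {"video_id": "a", "metaxy_data_version": "v1", "metaxy_created_at": 2},
    {"video_id": "b", "metaxy_data_version": "v1", "metaxy_created_at": 1},
]

def latest_per_id(rows):
    """Keep only the newest row (by metaxy_created_at) for each video_id."""
    latest = {}
    for row in sorted(rows, key=itemgetter("metaxy_created_at")):
        latest[row["video_id"]] = row  # later timestamps overwrite earlier ones
    return list(latest.values())

deduped = latest_per_id(rows)
# one row per video_id survives; for "a" it is the row with metaxy_created_at == 2
```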
Next Steps¶
Continue to the next section to learn how to add more features and define feature dependencies.
Additional info¶
- Learn more about feature definitions or versioning
- Explore Metaxy integrations
- Use Metaxy from the command line
- Learn how to configure Metaxy
- Get lost in our API Reference