Skip to content

Recompute

View Example Source on GitHub

This example demonstrates how Metaxy automatically detects when upstream features change and recomputes downstream dependencies.

Warning

This examples is a WIP

How It Works

When a feature's code_version changes, Metaxy:

  1. Detects the change in the feature definition
  2. Identifies all downstream features that depend on it
  3. Automatically recomputes those features with the new upstream data

Plan

1. Setup upstream data

Create initial upstream data for the pipeline

  • Generate raw data samples
    python -m example_recompute.setup_data
    

2. Initial pipeline run

First execution with version 1 features

  • Run pipeline with initial feature definitions
    python -m example_recompute.pipeline
    

3. Idempotent rerun

Second execution should detect no changes

  • Rerun pipeline without any code changes
    python -m example_recompute.pipeline
    

4. Code evolution

Apply patch to change parent feature code_version from 1 to 2, triggering child recomputation

  • Change parent embedding code_version from 1 to 2:
    patch -p1 -i patches/01_update_parent_algorithm.patch
    
  • Run pipeline with updated parent feature
    python -m example_recompute.pipeline
    

Feature Definitions

Initial Code

The parent feature starts with code_version="1":

src/example_recompute/features.py
src/example_recompute/features.py
"""Feature definitions for recompute example."""

from metaxy import (
    Feature,
    FeatureDep,
    FeatureKey,
    FeatureSpec,
    FieldDep,
    FieldKey,
    FieldSpec,
)


class ParentFeature(
    Feature,
    spec=FeatureSpec(
        key=FeatureKey(["examples", "parent"]),
        fields=[
            FieldSpec(
                key=FieldKey(["embeddings"]),
                code_version="1",
            ),
        ],
        id_columns=("sample_uid",),
    ),
):
    """Parent feature that generates embeddings from raw data."""

    pass


class ChildFeature(
    Feature,
    spec=FeatureSpec(
        key=FeatureKey(["examples", "child"]),
        deps=[FeatureDep(feature=ParentFeature.spec().key)],
        fields=[
            FieldSpec(
                key=FieldKey(["predictions"]),
                code_version="1",
                deps=[
                    FieldDep(
                        feature=ParentFeature.spec().key,
                        fields=[FieldKey(["embeddings"])],
                    )
                ],
            ),
        ],
        id_columns=("sample_uid",),
    ),
):
    """Child feature that uses parent embeddings to generate predictions."""

    pass

The Change

Let's change the code_version:

patches/01_update_parent_algorithm.patch
patches/01_update_parent_algorithm.patch
--- a/src/example_recompute/features.py
+++ b/src/example_recompute/features.py
@@ -20,7 +20,7 @@ class ParentFeature(
         fields=[
             FieldSpec(
                 key=FieldKey(["embeddings"]),
-                code_version="1",
+                code_version="2",
             ),
         ],
     ),

Updated Code

Error

Failed to process metaxy-example file

Failed to apply patch patches/01_update_parent_algorithm.patch: error: patch failed: src/example_recompute/features.py:20
error: src/example_recompute/features.py: patch does not apply

Key Takeaway

Metaxy ensures all features remain consistent with their dependencies. When ParentFeature.code_version changes from "1" to "2", ChildFeature automatically recomputes—no manual tracking required.