Types

Shared types used across Metaxy.

LazyIncrement

Bases: NamedTuple

Result of resolving an incremental update with lazy Narwhals LazyFrames.

Attributes:

  • added (LazyFrame[Any]) –

    New samples that appear upstream and haven't been processed yet.

    Columns: [*user_defined_columns, "metaxy_provenance_by_field"]

  • changed (LazyFrame[Any]) –

    Samples with new field provenance that should be re-processed.

    Columns: [*user_defined_columns, "metaxy_provenance_by_field"]

  • removed (LazyFrame[Any]) –

    Samples that were previously processed but have since been removed from upstream.

    Columns: [*id_columns, "metaxy_provenance_by_field"]

Note

added and changed contain all the user-defined columns, but removed only contains the ID columns.

Functions

collect

collect() -> Increment

Materialize all lazy frames to create an Increment.

Returns:

  • Increment

    Increment with all frames materialized to eager DataFrames.

Source code in src/metaxy/data_versioning/diff/base.py
def collect(self) -> "Increment":
    """Materialize all lazy frames to create an Increment.

    Returns:
        Increment with all frames materialized to eager DataFrames.
    """
    return Increment(
        added=self.added.collect(),
        changed=self.changed.collect(),
        removed=self.removed.collect(),
    )
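
The lazy-to-eager pattern used by collect() can be illustrated without Narwhals. The following is a minimal sketch using only the standard library: LazyRows, LazyResult, EagerResult, and make are hypothetical stand-ins invented for this example, not Metaxy or Narwhals classes. It only shows that no work runs until collect() is called, and that each field is then materialized independently.

```python
from typing import Callable, NamedTuple

Rows = list[dict[str, object]]


class LazyRows:
    """Hypothetical stand-in for a lazy frame: wraps a zero-argument producer."""

    def __init__(self, producer: Callable[[], Rows]) -> None:
        self._producer = producer

    def collect(self) -> Rows:
        # The producer runs only here, mirroring a lazy frame's collect().
        return self._producer()


class EagerResult(NamedTuple):
    added: Rows
    changed: Rows
    removed: Rows


class LazyResult(NamedTuple):
    added: LazyRows
    changed: LazyRows
    removed: LazyRows

    def collect(self) -> EagerResult:
        # Materialize each field, following the shape of the source listing.
        return EagerResult(
            added=self.added.collect(),
            changed=self.changed.collect(),
            removed=self.removed.collect(),
        )


calls: list[str] = []  # records the order in which producers actually run


def make(name: str, rows: Rows) -> LazyRows:
    def producer() -> Rows:
        calls.append(name)
        return rows

    return LazyRows(producer)


lazy = LazyResult(
    added=make("added", [{"sample_uid": 1}]),
    changed=make("changed", []),
    removed=make("removed", [{"sample_uid": 7}]),
)
calls_before_collect = list(calls)  # still empty: nothing has been evaluated
eager = lazy.collect()
```

With real Narwhals frames the same shape applies: a LazyIncrement can be filtered or joined further before collect() is called, so only the rows that are actually needed get materialized.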

Increment

Bases: NamedTuple

Result of resolving an incremental update with eager Narwhals DataFrames.

Contains materialized Narwhals DataFrames.

Users can convert to their preferred format:

  • Polars: result.added.to_native()

Attributes:

  • added (DataFrame[Any]) –

    New samples that appear upstream and haven't been processed yet.

    Columns: [*user_defined_columns, "metaxy_provenance_by_field"]

  • changed (DataFrame[Any]) –

    Samples with new field provenance that should be re-processed.

    Columns: [*user_defined_columns, "metaxy_provenance_by_field"]

  • removed (DataFrame[Any]) –

    Samples that were previously processed but have since been removed from upstream.

    Columns: [*id_columns, "metaxy_provenance_by_field"]

Note

added and changed contain all the user-defined columns, but removed only contains the ID columns.
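
The meaning of the three frames can be illustrated with plain dictionaries. This is a rough sketch, not Metaxy's implementation: DiffSketch and diff_by_provenance are hypothetical names, and provenance is simplified to a single string per sample rather than the per-field structure stored in "metaxy_provenance_by_field".

```python
from typing import NamedTuple


class DiffSketch(NamedTuple):
    """Hypothetical stand-in that carries sample IDs instead of DataFrames."""

    added: list[str]
    changed: list[str]
    removed: list[str]


def diff_by_provenance(upstream: dict[str, str], stored: dict[str, str]) -> DiffSketch:
    """Classify sample IDs by comparing provenance values (illustrative only)."""
    # Present upstream, never processed.
    added = sorted(k for k in upstream if k not in stored)
    # Present on both sides, but the provenance differs.
    changed = sorted(k for k in upstream if k in stored and upstream[k] != stored[k])
    # Processed earlier, no longer present upstream.
    removed = sorted(k for k in stored if k not in upstream)
    return DiffSketch(added, changed, removed)


upstream = {"a": "h1", "b": "h2-new", "d": "h4"}
stored = {"a": "h1", "b": "h2-old", "c": "h3"}
result = diff_by_provenance(upstream, stored)
```

Here sample "d" is added, "b" is changed, and "c" is removed. Sample "a" appears in none of the three because its provenance is unchanged.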

HashAlgorithm

Bases: Enum

Supported hash algorithms for field provenance calculation.

These algorithms are chosen for:

  • Speed (non-cryptographic hashes preferred)
  • Cross-database availability
  • Good collision resistance for field provenance calculation

Attributes

XXHASH64 class-attribute instance-attribute

XXHASH64 = 'xxhash64'

XXHASH32 class-attribute instance-attribute

XXHASH32 = 'xxhash32'

WYHASH class-attribute instance-attribute

WYHASH = 'wyhash'

SHA256 class-attribute instance-attribute

SHA256 = 'sha256'

MD5 class-attribute instance-attribute

MD5 = 'md5'

FARMHASH class-attribute instance-attribute

FARMHASH = 'farmhash'
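
As a sketch of how such an enum is typically used, the block below redeclares the members with the values listed above and looks one up by its string value. The hash_text helper and the _STDLIB mapping are hypothetical and not part of Metaxy. They cover only SHA256 and MD5, because those are the only members with implementations in Python's standard library; the others would need third-party packages.

```python
import hashlib
from enum import Enum


class HashAlgorithm(Enum):
    # Member names and values copied from the listing above.
    XXHASH64 = "xxhash64"
    XXHASH32 = "xxhash32"
    WYHASH = "wyhash"
    SHA256 = "sha256"
    MD5 = "md5"
    FARMHASH = "farmhash"


# Hypothetical mapping: only these two members exist in hashlib.
_STDLIB = {
    HashAlgorithm.SHA256: hashlib.sha256,
    HashAlgorithm.MD5: hashlib.md5,
}


def hash_text(value: str, algorithm: HashAlgorithm) -> str:
    """Hypothetical helper: hex digest of a UTF-8 encoded string."""
    factory = _STDLIB.get(algorithm)
    if factory is None:
        raise NotImplementedError(
            f"{algorithm.value} needs a third-party package in this sketch"
        )
    return factory(value.encode("utf-8")).hexdigest()


# Enum members can be looked up by value, e.g. when read from configuration.
algo = HashAlgorithm("sha256")
digest = hash_text("abc", algo)
```

Looking a member up by value raises ValueError for an unknown name, which makes a misspelled algorithm in configuration fail early.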

SnapshotPushResult

Bases: NamedTuple

Result of recording a feature graph snapshot.

Attributes:

  • snapshot_version (str) –

    The deterministic hash of the graph snapshot

  • already_recorded (bool) –

    True if computational changes were already recorded

  • metadata_changed (bool) –

    True if metadata-only changes were detected

  • features_with_spec_changes (list[str]) –

    List of feature keys with spec version changes
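
A caller might branch on these fields roughly as follows. This is a sketch under assumptions: the tuple is redeclared from the attribute list above, summarize is a hypothetical helper, and the way the two flags are combined is one plausible reading of their descriptions rather than documented behavior. The snapshot version and feature key are made-up values.

```python
from typing import NamedTuple


class SnapshotPushResult(NamedTuple):
    # Field names and types copied from the attribute list above.
    snapshot_version: str
    already_recorded: bool
    metadata_changed: bool
    features_with_spec_changes: list[str]


def summarize(result: SnapshotPushResult) -> str:
    """Hypothetical helper: one-line summary suitable for a log message."""
    version = result.snapshot_version
    if result.already_recorded and not result.metadata_changed:
        return f"{version}: nothing new to record"
    if result.already_recorded:
        return f"{version}: metadata-only changes"
    changed = ", ".join(result.features_with_spec_changes) or "none"
    return f"{version}: recorded (spec changes: {changed})"


fresh = SnapshotPushResult("abc123", False, False, ["video/frames"])
repeat = SnapshotPushResult("abc123", True, False, [])
fresh_summary = summarize(fresh)
repeat_summary = summarize(repeat)
```

Because the result is a NamedTuple, it can also be unpacked positionally, although attribute access stays readable if fields are added later.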

IDColumns module-attribute

IDColumns: TypeAlias = Sequence[str]