Feature

BaseFeature is the most important class in Metaxy. Features are defined by extending it.

metaxy.BaseFeature pydantic-model

Bases: BaseModel

JSON schema:
{
  "properties": {
    "metaxy_provenance_by_field": {
      "additionalProperties": {
        "type": "string"
      },
      "description": "Field-level provenance hashes (maps field names to hashes)",
      "title": "Metaxy Provenance By Field",
      "type": "object"
    },
    "metaxy_provenance": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash of metaxy_provenance_by_field",
      "title": "Metaxy Provenance"
    },
    "metaxy_feature_version": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash of the feature definition (dependencies + fields + code_versions)",
      "title": "Metaxy Feature Version"
    },
    "metaxy_snapshot_version": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash of the entire feature graph snapshot",
      "title": "Metaxy Snapshot Version"
    },
    "metaxy_data_version_by_field": {
      "anyOf": [
        {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Field-level data version hashes (maps field names to version hashes)",
      "title": "Metaxy Data Version By Field"
    },
    "metaxy_data_version": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash of metaxy_data_version_by_field",
      "title": "Metaxy Data Version"
    },
    "metaxy_created_at": {
      "anyOf": [
        {
          "format": "date-time",
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Timestamp when the metadata row was created (UTC)",
      "title": "Metaxy Created At"
    },
    "metaxy_materialization_id": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "External orchestration run ID (e.g., Dagster Run ID)",
      "title": "Metaxy Materialization Id"
    }
  },
  "title": "BaseFeature",
  "type": "object"
}

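Fields:

  • metaxy_provenance_by_field (dict[str, str])
  • metaxy_provenance (str | None)
  • metaxy_feature_version (str | None)
  • metaxy_snapshot_version (str | None)
  • metaxy_data_version_by_field (dict[str, str] | None)
  • metaxy_data_version (str | None)
  • metaxy_created_at (AwareDatetime | None)
  • metaxy_materialization_id (str | None)
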
Validators:

  • _validate_id_columns_exist

Attributes

metaxy.BaseFeature.metaxy_provenance_by_field pydantic-field

metaxy_provenance_by_field: dict[str, str]

Field-level provenance hashes (maps field names to hashes)

metaxy.BaseFeature.metaxy_provenance pydantic-field

metaxy_provenance: str | None = None

Hash of metaxy_provenance_by_field

metaxy.BaseFeature.metaxy_feature_version pydantic-field

metaxy_feature_version: str | None = None

Hash of the feature definition (dependencies + fields + code_versions)

metaxy.BaseFeature.metaxy_snapshot_version pydantic-field

metaxy_snapshot_version: str | None = None

Hash of the entire feature graph snapshot

metaxy.BaseFeature.metaxy_data_version_by_field pydantic-field

metaxy_data_version_by_field: dict[str, str] | None = None

Field-level data version hashes (maps field names to version hashes)

metaxy.BaseFeature.metaxy_data_version pydantic-field

metaxy_data_version: str | None = None

Hash of metaxy_data_version_by_field

metaxy.BaseFeature.metaxy_created_at pydantic-field

metaxy_created_at: AwareDatetime | None = None

Timestamp when the metadata row was created (UTC)

metaxy.BaseFeature.metaxy_materialization_id pydantic-field

metaxy_materialization_id: str | None = None

External orchestration run ID (e.g., Dagster Run ID)

Functions

metaxy.BaseFeature.table_name classmethod

table_name() -> str

Get the SQL-like table name for this feature.

Converts the feature key to a SQL-compatible table name by joining its parts with double underscores, consistent with IbisMetadataStore.

Returns:

  • str

    Table name string (e.g., "my_namespace__my_feature")

Example
class VideoFeature(Feature, spec=FeatureSpec(
    key=FeatureKey(["video", "processing"]),
    ...
)):
    pass
VideoFeature.table_name()
# 'video__processing'
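
The joining rule described above can be sketched without Metaxy installed. `table_name_from_parts` is a hypothetical helper written for illustration, not a library function; it only assumes that key parts are plain strings.

```python
def table_name_from_parts(parts: list[str]) -> str:
    """Join feature key parts with double underscores (hypothetical helper)."""
    return "__".join(parts)


# Same key parts as the VideoFeature example above.
name = table_name_from_parts(["video", "processing"])
print(name)
```

A double underscore is used as the separator in this sketch because it is what the documented output shows; how the library treats parts that already contain underscores is not covered here.
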
Source code in src/metaxy/models/feature.py
@classmethod
def table_name(cls) -> str:
    """Get SQL-like table name for this feature.

    Converts feature key to SQL-compatible table name by joining
    parts with double underscores, consistent with IbisMetadataStore.

    Returns:
        Table name string (e.g., "my_namespace__my_feature")

    Example:
        ```py
        class VideoFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["video", "processing"]),
            ...
        )):
            pass
        VideoFeature.table_name()
        # 'video__processing'
        ```
    """
    return cls.spec().table_name()

metaxy.BaseFeature.feature_version classmethod

feature_version() -> str

Get hash of feature specification.

Returns a hash representing the feature's complete configuration:

  • Feature key
  • Field definitions and code versions
  • Dependencies (feature-level and field-level)

This hash changes when you modify:

  • Field code versions
  • Dependencies
  • Field definitions

Used to distinguish current vs historical metafield provenance hashes. Stored in the 'metaxy_feature_version' column of metadata DataFrames.

Returns:

  • str

    SHA256 hex digest (like git short hashes)

Example
class MyFeature(Feature, spec=FeatureSpec(
    key=FeatureKey(["my", "feature"]),
    fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
)):
    pass
MyFeature.feature_version()
# 'a3f8b2c1...'
Source code in src/metaxy/models/feature.py
@classmethod
def feature_version(cls) -> str:
    """Get hash of feature specification.

    Returns a hash representing the feature's complete configuration:
    - Feature key
    - Field definitions and code versions
    - Dependencies (feature-level and field-level)

    This hash changes when you modify:
    - Field code versions
    - Dependencies
    - Field definitions

    Used to distinguish current vs historical metafield provenance hashes.
    Stored in the 'metaxy_feature_version' column of metadata DataFrames.

    Returns:
        SHA256 hex digest (like git short hashes)

    Example:
        ```py
        class MyFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["my", "feature"]),
            fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
        )):
            pass
        MyFeature.feature_version()
        # 'a3f8b2c1...'
        ```
    """
    return cls.graph.get_feature_version(cls.spec().key)

metaxy.BaseFeature.feature_spec_version classmethod

feature_spec_version() -> str

Get hash of the complete feature specification.

Returns a hash representing ALL specification properties including:

  • Feature key
  • Dependencies
  • Fields
  • Code versions
  • Any future metadata, tags, or other properties

Unlike feature_version, which hashes only computational properties (for migration triggering), feature_spec_version captures the entire specification for complete reproducibility and audit purposes.

Stored in the 'metaxy_feature_spec_version' column of metadata DataFrames.

Returns:

  • str

    SHA256 hex digest of the complete specification

Example
class MyFeature(Feature, spec=FeatureSpec(
    key=FeatureKey(["my", "feature"]),
    fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
)):
    pass
MyFeature.feature_spec_version()
# 'def456...'  # Different from feature_version
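
The difference between the two hashes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the dictionary layout, the `description` property standing in for non-computational metadata, and the choice of canonical JSON as hash input. Metaxy's actual hashing inputs are not shown on this page.

```python
import hashlib
import json

# Properties assumed (for this sketch only) to affect computation.
COMPUTATIONAL_KEYS = ("key", "deps", "fields")


def _digest(payload: dict) -> str:
    # Canonical JSON (sorted keys) so dict ordering does not change the hash.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def computational_hash(spec: dict) -> str:
    """Hash only the properties that would trigger recomputation."""
    return _digest({k: spec[k] for k in COMPUTATIONAL_KEYS})


def full_spec_hash(spec: dict) -> str:
    """Hash every property of the specification."""
    return _digest(spec)


# Hypothetical specs that differ only in a non-computational property.
spec_a = {
    "key": ["my", "feature"],
    "deps": [],
    "fields": [{"key": ["default"], "code_version": "1"}],
    "description": "first draft",
}
spec_b = {**spec_a, "description": "reworded docs"}

print(computational_hash(spec_a) == computational_hash(spec_b))
print(full_spec_hash(spec_a) == full_spec_hash(spec_b))
```

In this sketch, editing only the description leaves the computational hash unchanged while the full specification hash changes, which mirrors the split between migration triggering and audit described above.
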
Source code in src/metaxy/models/feature.py
@classmethod
def feature_spec_version(cls) -> str:
    """Get hash of the complete feature specification.

    Returns a hash representing ALL specification properties including:
    - Feature key
    - Dependencies
    - Fields
    - Code versions
    - Any future metadata, tags, or other properties

    Unlike feature_version which only hashes computational properties
    (for migration triggering), feature_spec_version captures the entire specification
    for complete reproducibility and audit purposes.

    Stored in the 'metaxy_feature_spec_version' column of metadata DataFrames.

    Returns:
        SHA256 hex digest of the complete specification

    Example:
        ```py
        class MyFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["my", "feature"]),
            fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
        )):
            pass
        MyFeature.feature_spec_version()
        # 'def456...'  # Different from feature_version
        ```
    """
    return cls.spec().feature_spec_version

metaxy.BaseFeature.full_definition_version classmethod

full_definition_version() -> str

Get hash of the complete feature definition including Pydantic schema.

This method computes a hash of the entire feature class definition, including:

  • Pydantic model schema
  • Feature specification version (see feature_spec_version)
  • Project name

Used in the metaxy_full_definition_version column of system tables.

Returns:

  • str

    SHA256 hex digest of the complete definition
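
The hashing steps can be sketched standalone as follows. `definition_hash` is a hypothetical function, and the `length=8` truncation is an assumption: the length that `truncate_hash` actually uses is not shown on this page.

```python
import hashlib
import json


def definition_hash(schema: dict, spec_version: str, project: str, length: int = 8) -> str:
    """Combine schema, spec version, and project name into one truncated digest."""
    hasher = hashlib.sha256()
    # sort_keys=True makes the serialized schema independent of dict ordering.
    hasher.update(json.dumps(schema, sort_keys=True).encode())
    hasher.update(spec_version.encode())
    hasher.update(project.encode())
    # Assumed truncation length; the library's truncate_hash may differ.
    return hasher.hexdigest()[:length]


schema = {"title": "BaseFeature", "type": "object"}
reordered = {"type": "object", "title": "BaseFeature"}

same = definition_hash(schema, "abc123", "demo") == definition_hash(reordered, "abc123", "demo")
other_project = definition_hash(schema, "abc123", "other")
print(same, other_project)
```

Serializing with sorted keys means two schemas with identical content but different key order hash to the same value, while a change to any of the three inputs produces a different digest.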

Source code in src/metaxy/models/feature.py
@classmethod
def full_definition_version(cls) -> str:
    """Get hash of the complete feature definition including Pydantic schema.

    This method computes a hash of the entire feature class definition, including:
    - Pydantic model schema
    - Project name

    Used in the `metaxy_full_definition_version` column of system tables.

    Returns:
        SHA256 hex digest of the complete definition
    """
    import json

    hasher = hashlib.sha256()

    # Hash the Pydantic schema (includes field types, descriptions, validators, etc.)
    schema = cls.model_json_schema()
    schema_json = json.dumps(schema, sort_keys=True)
    hasher.update(schema_json.encode())

    # Hash the feature specification
    hasher.update(cls.feature_spec_version().encode())

    # Hash the project name
    hasher.update(cls.project.encode())

    return truncate_hash(hasher.hexdigest())

metaxy.BaseFeature.provenance_by_field classmethod

provenance_by_field() -> dict[str, str]

Get the code-level field provenance for this feature.

This returns one static hash per field, based on code versions and dependencies, not sample-level field provenance computed from upstream data.

Returns:

  • dict[str, str]

    Dictionary mapping field keys to their provenance hashes.

Source code in src/metaxy/models/feature.py
@classmethod
def provenance_by_field(cls) -> dict[str, str]:
    """Get the code-level field provenance for this feature.

    This returns a static hash based on code versions and dependencies,
    not sample-level field provenance computed from upstream data.

    Returns:
        Dictionary mapping field keys to their provenance hashes.
    """
    return cls.graph.get_feature_version_by_field(cls.spec().key)

metaxy.BaseFeature.load_input classmethod

load_input(joiner: Any, upstream_refs: dict[str, LazyFrame[Any]]) -> tuple[LazyFrame[Any], dict[str, str]]

Join upstream feature metadata.

Override for custom join logic (1:many, different keys, filtering, etc.).

Parameters:

  • joiner (Any) –

    UpstreamJoiner from MetadataStore

  • upstream_refs (dict[str, LazyFrame[Any]]) –

    Upstream feature metadata references (lazy where possible)

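Returns:

  • tuple[LazyFrame[Any], dict[str, str]]

    (joined_upstream, upstream_column_mapping), where joined_upstream is all upstream data joined together and upstream_column_mapping maps upstream_key -> column name
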
Source code in src/metaxy/models/feature.py
@classmethod
def load_input(
    cls,
    joiner: Any,
    upstream_refs: dict[str, "nw.LazyFrame[Any]"],
) -> tuple["nw.LazyFrame[Any]", dict[str, str]]:
    """Join upstream feature metadata.

    Override for custom join logic (1:many, different keys, filtering, etc.).

    Args:
        joiner: UpstreamJoiner from MetadataStore
        upstream_refs: Upstream feature metadata references (lazy where possible)

    Returns:
        (joined_upstream, upstream_column_mapping)
        - joined_upstream: All upstream data joined together
        - upstream_column_mapping: Maps upstream_key -> column name
    """
    from metaxy.models.feature_spec import FeatureDep

    # Extract columns and renames from deps
    upstream_columns: dict[str, tuple[str, ...] | None] = {}
    upstream_renames: dict[str, dict[str, str] | None] = {}

    deps = cls.spec().deps
    if deps:
        for dep in deps:
            if isinstance(dep, FeatureDep):
                dep_key_str = dep.feature.to_string()
                upstream_columns[dep_key_str] = dep.columns
                upstream_renames[dep_key_str] = dep.rename

    return joiner.join_upstream(
        upstream_refs=upstream_refs,
        feature_spec=cls.spec(),
        feature_plan=cls.graph.get_feature_plan(cls.spec().key),
        upstream_columns=upstream_columns,
        upstream_renames=upstream_renames,
    )

metaxy.BaseFeature.resolve_data_version_diff classmethod

resolve_data_version_diff(diff_resolver: Any, target_provenance: LazyFrame[Any], current_metadata: LazyFrame[Any] | None, *, lazy: bool = False) -> Increment | LazyIncrement

Resolve differences between target and current field provenance.

Override for custom diff logic (ignore certain fields, custom rules, etc.).

Parameters:

  • diff_resolver (Any) –

    MetadataDiffResolver from MetadataStore

  • target_provenance (LazyFrame[Any]) –

    Calculated target field provenance (Narwhals LazyFrame)

  • current_metadata (LazyFrame[Any] | None) –

    Current metadata for this feature (Narwhals LazyFrame, or None). Should be pre-filtered by feature_version at the store level.

  • lazy (bool, default: False ) –

    If True, return LazyIncrement. If False, return Increment.

Returns:

  • Increment | LazyIncrement

    Increment (eager) or LazyIncrement (lazy) with added, changed, removed

Example (default):

class MyFeature(Feature, spec=...):
    pass  # Uses diff resolver's default implementation

Example (ignore certain field changes):

class MyFeature(Feature, spec=...):
    @classmethod
    def resolve_data_version_diff(cls, diff_resolver, target_provenance, current_metadata, **kwargs):
        # Get standard diff
        result = diff_resolver.find_changes(target_provenance, current_metadata, cls.spec().id_columns)

        # Custom: Only consider 'frames' field changes, ignore 'audio'
        # Users can filter/modify the increment here

        return result  # Return modified Increment
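
The `lazy` flag follows a compute-lazily, materialize-on-request pattern, which can be sketched with stand-in types. `LazyRows`, `LazyDiff`, `EagerDiff`, and `resolve` are hypothetical names for this illustration only; they are not Metaxy or Narwhals classes, and plain lists stand in for data frames.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class LazyRows:
    """Stand-in for a lazy frame: work is deferred until collect() is called."""
    compute: Callable[[], list]

    def collect(self) -> list:
        return self.compute()


@dataclass
class LazyDiff:
    added: LazyRows
    changed: LazyRows
    removed: LazyRows


@dataclass
class EagerDiff:
    added: list
    changed: list
    removed: list


def resolve(lazy_result: LazyDiff, *, lazy: bool = False) -> Union[LazyDiff, EagerDiff]:
    """Return the lazy result as-is, or collect each part when lazy=False."""
    if lazy:
        return lazy_result
    return EagerDiff(
        added=lazy_result.added.collect(),
        changed=lazy_result.changed.collect(),
        removed=lazy_result.removed.collect(),
    )


diff = LazyDiff(
    added=LazyRows(lambda: ["sample_3"]),
    changed=LazyRows(lambda: ["sample_1"]),
    removed=LazyRows(lambda: []),
)
eager = resolve(diff)
still_lazy = resolve(diff, lazy=True)
print(eager.added, type(still_lazy).__name__)
```

An override that calls the diff resolver directly would likewise need to honor `lazy` if callers rely on receiving a materialized result.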

Source code in src/metaxy/models/feature.py
@classmethod
def resolve_data_version_diff(
    cls,
    diff_resolver: Any,
    target_provenance: "nw.LazyFrame[Any]",
    current_metadata: "nw.LazyFrame[Any] | None",
    *,
    lazy: bool = False,
) -> "Increment | LazyIncrement":
    """Resolve differences between target and current field provenance.

    Override for custom diff logic (ignore certain fields, custom rules, etc.).

    Args:
        diff_resolver: MetadataDiffResolver from MetadataStore
        target_provenance: Calculated target field provenance (Narwhals LazyFrame)
        current_metadata: Current metadata for this feature (Narwhals LazyFrame, or None).
            Should be pre-filtered by feature_version at the store level.
        lazy: If True, return LazyIncrement. If False, return Increment.

    Returns:
        Increment (eager) or LazyIncrement (lazy) with added, changed, removed

    Example (default):
        ```py
        class MyFeature(Feature, spec=...):
            pass  # Uses diff resolver's default implementation
        ```

    Example (ignore certain field changes):
        ```py
        class MyFeature(Feature, spec=...):
            @classmethod
            def resolve_data_version_diff(cls, diff_resolver, target_provenance, current_metadata, **kwargs):
                # Get standard diff
                result = diff_resolver.find_changes(target_provenance, current_metadata, cls.spec().id_columns)

                # Custom: Only consider 'frames' field changes, ignore 'audio'
                # Users can filter/modify the increment here

                return result  # Return modified Increment
        ```
    """
    # Diff resolver always returns LazyIncrement - materialize if needed
    lazy_result = diff_resolver.find_changes(
        target_provenance=target_provenance,
        current_metadata=current_metadata,
        id_columns=cls.spec().id_columns,  # Pass ID columns from feature spec
    )

    # Materialize to Increment if lazy=False
    if not lazy:
        from metaxy.versioning.types import Increment

        return Increment(
            added=lazy_result.added.collect(),
            changed=lazy_result.changed.collect(),
            removed=lazy_result.removed.collect(),
        )

    return lazy_result

Code Version Access

Retrieve a feature's code version from its spec: MyFeature.spec().code_version.

metaxy.get_feature_by_key

get_feature_by_key(key: CoercibleToFeatureKey) -> type[BaseFeature]

Get a feature class by its key from the active graph.

Convenience function that retrieves a Metaxy feature class from the currently active feature graph. This is useful when a feature key is received from storage or across process boundaries.

Parameters:

  • key (CoercibleToFeatureKey) –

    Feature key to look up. Accepts types that can be converted into a feature key.

Returns:

  • type[BaseFeature]

    Feature class

Raises:

  • KeyError

    If no feature with the given key is registered

Example
from metaxy import get_feature_by_key, FeatureKey
parent_key = FeatureKey(["examples", "parent"])
ParentFeature = get_feature_by_key(parent_key)

# Or use string notation
ParentFeature = get_feature_by_key("examples/parent")
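
The lookup behaviour (key coercion plus KeyError on a miss) can be sketched with a toy registry. `coerce_key` and `Registry` are hypothetical and only mimic the documented behaviour; the real coercion rules behind CoercibleToFeatureKey and the internals of FeatureGraph are not shown on this page.

```python
def coerce_key(key) -> tuple:
    """Assumed coercion: accept 'a/b' strings or a sequence of parts."""
    if isinstance(key, str):
        return tuple(key.split("/"))
    return tuple(key)


class Registry:
    """Toy stand-in for a feature graph that maps keys to classes."""

    def __init__(self) -> None:
        self._features = {}

    def register(self, key, feature: type) -> None:
        self._features[coerce_key(key)] = feature

    def get_feature_by_key(self, key) -> type:
        normalized = coerce_key(key)
        if normalized not in self._features:
            raise KeyError(f"No feature registered under {'/'.join(normalized)}")
        return self._features[normalized]


class ParentFeature:
    pass


registry = Registry()
registry.register(["examples", "parent"], ParentFeature)

# Both notations resolve to the same class in this sketch.
by_parts = registry.get_feature_by_key(["examples", "parent"])
by_string = registry.get_feature_by_key("examples/parent")
print(by_parts is by_string)
```

Normalizing every key before both registration and lookup keeps the two notations interchangeable.
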
Source code in src/metaxy/models/feature.py
def get_feature_by_key(key: CoercibleToFeatureKey) -> type["BaseFeature"]:
    """Get a feature class by its key from the active graph.

    Convenience function that retrieves Metaxy feature class from the currently active [feature graph][metaxy.FeatureGraph]. Can be useful when receiving a feature key from storage or across process boundaries.

    Args:
        key: Feature key to look up. Accepts types that can be converted into a feature key.

    Returns:
        Feature class

    Raises:
        KeyError: If no feature with the given key is registered

    Example:
        ```py
        from metaxy import get_feature_by_key, FeatureKey
        parent_key = FeatureKey(["examples", "parent"])
        ParentFeature = get_feature_by_key(parent_key)

        # Or use string notation
        ParentFeature = get_feature_by_key("examples/parent")
        ```
    """
    graph = FeatureGraph.get_active()
    return graph.get_feature_by_key(key)