Feature
BaseFeature is the central class in Metaxy: every feature is defined by subclassing it.
metaxy.BaseFeature
pydantic-model
Bases: BaseModel
JSON schema:
{
"properties": {
"metaxy_provenance_by_field": {
"additionalProperties": {
"type": "string"
},
"description": "Field-level provenance hashes (maps field names to hashes)",
"title": "Metaxy Provenance By Field",
"type": "object"
},
"metaxy_provenance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Hash of metaxy_provenance_by_field",
"title": "Metaxy Provenance"
},
"metaxy_feature_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Hash of the feature definition (dependencies + fields + code_versions)",
"title": "Metaxy Feature Version"
},
"metaxy_snapshot_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Hash of the entire feature graph snapshot",
"title": "Metaxy Snapshot Version"
},
"metaxy_data_version_by_field": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Field-level data version hashes (maps field names to version hashes)",
"title": "Metaxy Data Version By Field"
},
"metaxy_data_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Hash of metaxy_data_version_by_field",
"title": "Metaxy Data Version"
},
"metaxy_created_at": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Timestamp when the metadata row was created (UTC)",
"title": "Metaxy Created At"
},
"metaxy_materialization_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "External orchestration run ID (e.g., Dagster Run ID)",
"title": "Metaxy Materialization Id"
}
},
"title": "BaseFeature",
"type": "object"
}
Fields:
- metaxy_provenance_by_field (dict[str, str])
- metaxy_provenance (str | None)
- metaxy_feature_version (str | None)
- metaxy_snapshot_version (str | None)
- metaxy_data_version_by_field (dict[str, str] | None)
- metaxy_data_version (str | None)
- metaxy_created_at (AwareDatetime | None)
- metaxy_materialization_id (str | None)
Validators:
- _validate_id_columns_exist
Attributes
metaxy.BaseFeature.metaxy_provenance_by_field
pydantic-field
metaxy_provenance_by_field: dict[str, str]
Field-level provenance hashes (maps field names to hashes)
metaxy.BaseFeature.metaxy_provenance
pydantic-field
metaxy_provenance: str | None = None
Hash of metaxy_provenance_by_field
metaxy.BaseFeature.metaxy_feature_version
pydantic-field
metaxy_feature_version: str | None = None
Hash of the feature definition (dependencies + fields + code_versions)
metaxy.BaseFeature.metaxy_snapshot_version
pydantic-field
metaxy_snapshot_version: str | None = None
Hash of the entire feature graph snapshot
metaxy.BaseFeature.metaxy_data_version_by_field
pydantic-field
metaxy_data_version_by_field: dict[str, str] | None = None
Field-level data version hashes (maps field names to version hashes)
metaxy.BaseFeature.metaxy_data_version
pydantic-field
metaxy_data_version: str | None = None
Hash of metaxy_data_version_by_field
metaxy.BaseFeature.metaxy_created_at
pydantic-field
metaxy_created_at: AwareDatetime | None = None
Timestamp when the metadata row was created (UTC)
metaxy.BaseFeature.metaxy_materialization_id
pydantic-field
metaxy_materialization_id: str | None = None
External orchestration run ID (e.g., Dagster Run ID)
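The relationship between metaxy_provenance_by_field and metaxy_provenance (a single hash derived from the per-field map) can be pictured with a small standard-library sketch. The helper name `combine_field_hashes`, the choice of SHA-256, and the sorted `name=hash` encoding are assumptions made for illustration; Metaxy's actual hashing scheme may differ.

```python
import hashlib


def combine_field_hashes(by_field: dict[str, str]) -> str:
    """Collapse a field-name -> hash mapping into one hex digest (illustrative only)."""
    hasher = hashlib.sha256()
    # Sort by field name so the result does not depend on dict insertion order.
    for name in sorted(by_field):
        hasher.update(f"{name}={by_field[name]};".encode())
    return hasher.hexdigest()


# Hypothetical per-field hashes for a feature with two fields.
provenance_by_field = {"frames": "abc123", "audio": "def456"}
provenance = combine_field_hashes(provenance_by_field)
```

Under these assumptions, changing any single field hash changes the combined value, while reordering the mapping does not.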
Functions
metaxy.BaseFeature.table_name
classmethod
table_name() -> str
Get SQL-like table name for this feature.
Converts feature key to SQL-compatible table name by joining parts with double underscores, consistent with IbisMetadataStore.
Returns:
- str: Table name string (e.g., "my_namespace__my_feature")
Example
Source code in src/metaxy/models/feature.py
@classmethod
def table_name(cls) -> str:
    """Get SQL-like table name for this feature.

    Converts feature key to SQL-compatible table name by joining
    parts with double underscores, consistent with IbisMetadataStore.

    Returns:
        Table name string (e.g., "my_namespace__my_feature")

    Example:
        ```py
        class VideoFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["video", "processing"]),
            ...
        )):
            pass

        VideoFeature.table_name()
        # 'video__processing'
        ```
    """
    return cls.spec().table_name()
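The naming rule described above (key parts joined with double underscores) can be reproduced without Metaxy. This is a minimal sketch; `table_name_from_key` is a hypothetical helper, not part of the library.

```python
from collections.abc import Sequence


def table_name_from_key(parts: Sequence[str]) -> str:
    """Join feature key parts with double underscores (hypothetical helper)."""
    return "__".join(parts)


name = table_name_from_key(["video", "processing"])
print(name)  # video__processing
```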
metaxy.BaseFeature.feature_version
classmethod
feature_version() -> str
Get hash of feature specification.
Returns a hash representing the feature's complete configuration:
- Feature key
- Field definitions and code versions
- Dependencies (feature-level and field-level)
This hash changes when you modify:
- Field code versions
- Dependencies
- Field definitions
Used to distinguish current vs historical metafield provenance hashes. Stored in the 'metaxy_feature_version' column of metadata DataFrames.
Returns:
- str: SHA256 hex digest (like git short hashes)
Example
Source code in src/metaxy/models/feature.py
@classmethod
def feature_version(cls) -> str:
    """Get hash of feature specification.

    Returns a hash representing the feature's complete configuration:
    - Feature key
    - Field definitions and code versions
    - Dependencies (feature-level and field-level)

    This hash changes when you modify:
    - Field code versions
    - Dependencies
    - Field definitions

    Used to distinguish current vs historical metafield provenance hashes.
    Stored in the 'metaxy_feature_version' column of metadata DataFrames.

    Returns:
        SHA256 hex digest (like git short hashes)

    Example:
        ```py
        class MyFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["my", "feature"]),
            fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
        )):
            pass

        MyFeature.feature_version()
        # 'a3f8b2c1...'
        ```
    """
    return cls.graph.get_feature_version(cls.spec().key)
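To see why bumping a field's code version produces a new feature version, consider this standard-library sketch. The function `spec_hash` and its JSON payload layout are assumptions for illustration only; the real value is computed by the feature graph.

```python
import hashlib
import json


def spec_hash(key: list[str], fields: dict[str, str], deps: list[str]) -> str:
    """Hash a simplified feature spec: key, field code versions, and dependencies."""
    payload = {"key": key, "fields": fields, "deps": sorted(deps)}
    # Canonical JSON (sorted keys) keeps the digest stable across runs.
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


v1 = spec_hash(["my", "feature"], {"default": "1"}, [])
v1_again = spec_hash(["my", "feature"], {"default": "1"}, [])
v2 = spec_hash(["my", "feature"], {"default": "2"}, [])  # bumped code_version
```

In this sketch `v1` and `v1_again` are identical, while `v2` differs because one input to the hash changed.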
metaxy.BaseFeature.feature_spec_version
classmethod
feature_spec_version() -> str
Get hash of the complete feature specification.
Returns a hash representing ALL specification properties including:
- Feature key
- Dependencies
- Fields
- Code versions
- Any future metadata, tags, or other properties
Unlike feature_version, which only hashes computational properties (for migration triggering), feature_spec_version captures the entire specification for complete reproducibility and audit purposes.
Stored in the 'metaxy_feature_spec_version' column of metadata DataFrames.
Returns:
- str: SHA256 hex digest of the complete specification
Example
Source code in src/metaxy/models/feature.py
@classmethod
def feature_spec_version(cls) -> str:
    """Get hash of the complete feature specification.

    Returns a hash representing ALL specification properties including:
    - Feature key
    - Dependencies
    - Fields
    - Code versions
    - Any future metadata, tags, or other properties

    Unlike feature_version which only hashes computational properties
    (for migration triggering), feature_spec_version captures the entire specification
    for complete reproducibility and audit purposes.

    Stored in the 'metaxy_feature_spec_version' column of metadata DataFrames.

    Returns:
        SHA256 hex digest of the complete specification

    Example:
        ```py
        class MyFeature(Feature, spec=FeatureSpec(
            key=FeatureKey(["my", "feature"]),
            fields=[FieldSpec(key=FieldKey(["default"]), code_version="1")],
        )):
            pass

        MyFeature.feature_spec_version()
        # 'def456...'  # Different from feature_version
        ```
    """
    return cls.spec().feature_spec_version
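The distinction between a hash over computational properties and a hash over the whole specification can be sketched as follows. The key names, the `metadata` entry, and the choice of which keys count as computational are assumptions made for this example, not Metaxy's actual rules.

```python
import hashlib
import json
from typing import Any

# Assumption: the properties treated as "computational" in this sketch.
COMPUTATIONAL_KEYS = ("key", "deps", "fields")


def digest(payload: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def computational_version(spec: dict[str, Any]) -> str:
    """Hash only the properties that affect computation."""
    return digest({k: spec[k] for k in COMPUTATIONAL_KEYS})


def full_spec_version(spec: dict[str, Any]) -> str:
    """Hash every property of the spec, including non-computational ones."""
    return digest(spec)


spec = {
    "key": ["my", "feature"],
    "deps": [],
    "fields": {"default": "1"},
    "metadata": {"owner": "team-a"},
}
retagged = {**spec, "metadata": {"owner": "team-b"}}
```

Here, editing only `metadata` leaves `computational_version` unchanged but alters `full_spec_version`, which mirrors the stated purpose of the two methods.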
metaxy.BaseFeature.full_definition_version
classmethod
full_definition_version() -> str
Get hash of the complete feature definition including Pydantic schema.
This method computes a hash of the entire feature class definition, including:
- Pydantic model schema
- Feature specification version (the value of feature_spec_version())
- Project name
Used in the metaxy_full_definition_version column of system tables.
Returns:
- str: SHA256 hex digest of the complete definition (shortened by truncate_hash)
Source code in src/metaxy/models/feature.py
@classmethod
def full_definition_version(cls) -> str:
    """Get hash of the complete feature definition including Pydantic schema.

    This method computes a hash of the entire feature class definition, including:
    - Pydantic model schema
    - Project name

    Used in the `metaxy_full_definition_version` column of system tables.

    Returns:
        SHA256 hex digest of the complete definition
    """
    import json

    hasher = hashlib.sha256()

    # Hash the Pydantic schema (includes field types, descriptions, validators, etc.)
    schema = cls.model_json_schema()
    schema_json = json.dumps(schema, sort_keys=True)
    hasher.update(schema_json.encode())

    # Hash the feature specification
    hasher.update(cls.feature_spec_version().encode())

    # Hash the project name
    hasher.update(cls.project.encode())

    return truncate_hash(hasher.hexdigest())
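The layering in the source above (schema JSON, then spec version, then project name fed into one SHA-256 hasher) can be tried in isolation. In this sketch `definition_hash` is a hypothetical stand-alone function with made-up inputs, and the final truncation step is omitted because the behaviour of `truncate_hash` is not shown here.

```python
import hashlib
import json
from typing import Any


def definition_hash(schema: dict[str, Any], spec_version: str, project: str) -> str:
    """Hash a schema dict, a spec version string, and a project name together."""
    hasher = hashlib.sha256()
    # sort_keys=True yields the same JSON string regardless of dict key order.
    hasher.update(json.dumps(schema, sort_keys=True).encode())
    hasher.update(spec_version.encode())
    hasher.update(project.encode())
    return hasher.hexdigest()


a = definition_hash({"title": "F", "type": "object"}, "abc", "proj")
b = definition_hash({"type": "object", "title": "F"}, "abc", "proj")  # keys reordered
c = definition_hash({"title": "F", "type": "object"}, "abc", "other")  # new project
```

Serializing with sorted keys is what makes `a` and `b` equal, while any change to the project name, as in `c`, yields a different digest.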
metaxy.BaseFeature.provenance_by_field
classmethod
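provenance_by_field() -> dict[str, str]
Get the code-level field provenance for this feature.
This returns a static hash based on code versions and dependencies, not sample-level field provenance computed from upstream data.
Returns:
- dict[str, str]: Dictionary mapping field keys to their provenance hashes.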
Source code in src/metaxy/models/feature.py
@classmethod
def provenance_by_field(cls) -> dict[str, str]:
    """Get the code-level field provenance for this feature.

    This returns a static hash based on code versions and dependencies,
    not sample-level field provenance computed from upstream data.

    Returns:
        Dictionary mapping field keys to their provenance hashes.
    """
    return cls.graph.get_feature_version_by_field(cls.spec().key)
metaxy.BaseFeature.load_input
classmethod
load_input(joiner: Any, upstream_refs: dict[str, LazyFrame[Any]]) -> tuple[LazyFrame[Any], dict[str, str]]
Join upstream feature metadata.
Override for custom join logic (1:many, different keys, filtering, etc.).
Parameters:
- joiner (Any): UpstreamJoiner from MetadataStore
- upstream_refs (dict[str, LazyFrame[Any]]): Upstream feature metadata references (lazy where possible)
Returns:
- tuple[LazyFrame[Any], dict[str, str]]: (joined_upstream, upstream_column_mapping)
  - joined_upstream: All upstream data joined together
  - upstream_column_mapping: Maps upstream_key -> column name
Source code in src/metaxy/models/feature.py
@classmethod
def load_input(
    cls,
    joiner: Any,
    upstream_refs: dict[str, "nw.LazyFrame[Any]"],
) -> tuple["nw.LazyFrame[Any]", dict[str, str]]:
    """Join upstream feature metadata.

    Override for custom join logic (1:many, different keys, filtering, etc.).

    Args:
        joiner: UpstreamJoiner from MetadataStore
        upstream_refs: Upstream feature metadata references (lazy where possible)

    Returns:
        (joined_upstream, upstream_column_mapping)
        - joined_upstream: All upstream data joined together
        - upstream_column_mapping: Maps upstream_key -> column name
    """
    from metaxy.models.feature_spec import FeatureDep

    # Extract columns and renames from deps
    upstream_columns: dict[str, tuple[str, ...] | None] = {}
    upstream_renames: dict[str, dict[str, str] | None] = {}

    deps = cls.spec().deps
    if deps:
        for dep in deps:
            if isinstance(dep, FeatureDep):
                dep_key_str = dep.feature.to_string()
                upstream_columns[dep_key_str] = dep.columns
                upstream_renames[dep_key_str] = dep.rename

    return joiner.join_upstream(
        upstream_refs=upstream_refs,
        feature_spec=cls.spec(),
        feature_plan=cls.graph.get_feature_plan(cls.spec().key),
        upstream_columns=upstream_columns,
        upstream_renames=upstream_renames,
    )
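The first half of the default implementation gathers, per dependency, which columns to select and how to rename them. The sketch below mirrors only that bookkeeping step; the `Dep` dataclass is a hypothetical stand-in for FeatureDep (with a plain string key instead of a feature key object), and `collect_selection` is not a Metaxy function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dep:
    """Hypothetical stand-in for FeatureDep with only the attributes used here."""

    feature: str
    columns: tuple[str, ...] | None = None  # None is passed through unchanged
    rename: dict[str, str] | None = None


def collect_selection(
    deps: list[Dep],
) -> tuple[dict[str, tuple[str, ...] | None], dict[str, dict[str, str] | None]]:
    """Build per-dependency column selections and rename maps keyed by feature."""
    columns: dict[str, tuple[str, ...] | None] = {}
    renames: dict[str, dict[str, str] | None] = {}
    for dep in deps:
        columns[dep.feature] = dep.columns
        renames[dep.feature] = dep.rename
    return columns, renames


columns, renames = collect_selection(
    [
        Dep("video/raw", columns=("path",), rename={"path": "video_path"}),
        Dep("audio/raw"),
    ]
)
```

An override of load_input could adjust these two mappings before delegating to the joiner, for example to restrict the columns pulled from one upstream feature.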
metaxy.BaseFeature.resolve_data_version_diff
classmethod
resolve_data_version_diff(diff_resolver: Any, target_provenance: LazyFrame[Any], current_metadata: LazyFrame[Any] | None, *, lazy: bool = False) -> Increment | LazyIncrement
Resolve differences between target and current field provenance.
Override for custom diff logic (ignore certain fields, custom rules, etc.).
Parameters:
- diff_resolver (Any): MetadataDiffResolver from MetadataStore
- target_provenance (LazyFrame[Any]): Calculated target field provenance (Narwhals LazyFrame)
- current_metadata (LazyFrame[Any] | None): Current metadata for this feature (Narwhals LazyFrame, or None). Should be pre-filtered by feature_version at the store level.
- lazy (bool, default: False): If True, return LazyIncrement. If False, return Increment.
Returns:
- Increment | LazyIncrement: Increment (eager) or LazyIncrement (lazy) with added, changed, removed
Example (default):
    class MyFeature(Feature, spec=...):
        pass  # Uses diff resolver's default implementation
Example (ignore certain field changes):
    class MyFeature(Feature, spec=...):
        @classmethod
        def resolve_data_version_diff(cls, diff_resolver, target_provenance, current_metadata, **kwargs):
            # Get standard diff
            result = diff_resolver.find_changes(target_provenance, current_metadata, cls.spec().id_columns)
            # Custom: Only consider 'frames' field changes, ignore 'audio'
            # Users can filter/modify the increment here
            return result  # Return modified Increment
Source code in src/metaxy/models/feature.py
@classmethod
def resolve_data_version_diff(
    cls,
    diff_resolver: Any,
    target_provenance: "nw.LazyFrame[Any]",
    current_metadata: "nw.LazyFrame[Any] | None",
    *,
    lazy: bool = False,
) -> "Increment | LazyIncrement":
    """Resolve differences between target and current field provenance.

    Override for custom diff logic (ignore certain fields, custom rules, etc.).

    Args:
        diff_resolver: MetadataDiffResolver from MetadataStore
        target_provenance: Calculated target field provenance (Narwhals LazyFrame)
        current_metadata: Current metadata for this feature (Narwhals LazyFrame, or None).
            Should be pre-filtered by feature_version at the store level.
        lazy: If True, return LazyIncrement. If False, return Increment.

    Returns:
        Increment (eager) or LazyIncrement (lazy) with added, changed, removed

    Example (default):
        ```py
        class MyFeature(Feature, spec=...):
            pass  # Uses diff resolver's default implementation
        ```

    Example (ignore certain field changes):
        ```py
        class MyFeature(Feature, spec=...):
            @classmethod
            def resolve_data_version_diff(cls, diff_resolver, target_provenance, current_metadata, **kwargs):
                # Get standard diff
                result = diff_resolver.find_changes(target_provenance, current_metadata, cls.spec().id_columns)
                # Custom: Only consider 'frames' field changes, ignore 'audio'
                # Users can filter/modify the increment here
                return result  # Return modified Increment
        ```
    """
    # Diff resolver always returns LazyIncrement - materialize if needed
    lazy_result = diff_resolver.find_changes(
        target_provenance=target_provenance,
        current_metadata=current_metadata,
        id_columns=cls.spec().id_columns,  # Pass ID columns from feature spec
    )

    # Materialize to Increment if lazy=False
    if not lazy:
        from metaxy.versioning.types import Increment

        return Increment(
            added=lazy_result.added.collect(),
            changed=lazy_result.changed.collect(),
            removed=lazy_result.removed.collect(),
        )

    return lazy_result
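The three parts of an increment (added, changed, removed) can be understood through a plain-dictionary sketch. It assumes that "added" means rows present only in the target, "changed" means rows present in both with a different provenance hash, and "removed" means rows present only in the current metadata; `diff_provenance` is a hypothetical helper that works on dicts keyed by sample ID rather than on Narwhals frames.

```python
def diff_provenance(
    target: dict[str, str], current: dict[str, str]
) -> tuple[set[str], set[str], set[str]]:
    """Compare id -> provenance-hash mappings and classify each id."""
    added = {k for k in target if k not in current}
    changed = {k for k in target if k in current and target[k] != current[k]}
    removed = {k for k in current if k not in target}
    return added, changed, removed


target = {"s1": "h1", "s2": "h2-new", "s3": "h3"}
current = {"s1": "h1", "s2": "h2-old", "s4": "h4"}
added, changed, removed = diff_provenance(target, current)
```

A custom override could post-process these sets, for instance dropping entries from `changed` when only an ignored field differs.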
Code Version Access
Retrieve a feature's code version from its spec: MyFeature.spec().code_version.
metaxy.get_feature_by_key
get_feature_by_key(key: CoercibleToFeatureKey) -> type[BaseFeature]
Get a feature class by its key from the active graph.
Convenience function that retrieves a Metaxy feature class from the currently active feature graph. Can be useful when receiving a feature key from storage or across process boundaries.
Parameters:
- key (CoercibleToFeatureKey): Feature key to look up. Accepts types that can be converted into a feature key.
Returns:
- type[BaseFeature]: Feature class
Raises:
- KeyError: If no feature with the given key is registered
Example
Source code in src/metaxy/models/feature.py
def get_feature_by_key(key: CoercibleToFeatureKey) -> type["BaseFeature"]:
    """Get a feature class by its key from the active graph.

    Convenience function that retrieves a Metaxy feature class from the currently active [feature graph][metaxy.FeatureGraph]. Can be useful when receiving a feature key from storage or across process boundaries.

    Args:
        key: Feature key to look up. Accepts types that can be converted into a feature key.

    Returns:
        Feature class

    Raises:
        KeyError: If no feature with the given key is registered

    Example:
        ```py
        from metaxy import get_feature_by_key, FeatureKey

        parent_key = FeatureKey(["examples", "parent"])
        ParentFeature = get_feature_by_key(parent_key)

        # Or use string notation
        ParentFeature = get_feature_by_key("examples/parent")
        ```
    """
    graph = FeatureGraph.get_active()
    return graph.get_feature_by_key(key)
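The lookup behaviour (key coercion, then a registry lookup that raises KeyError for unknown keys) can be modelled with a toy registry. Everything here is hypothetical: `Registry`, `coerce_key`, and the rule of splitting strings on "/" are illustrative assumptions based on the string notation shown in the docstring, not Metaxy's implementation.

```python
from __future__ import annotations

from collections.abc import Sequence


def coerce_key(key: str | Sequence[str]) -> tuple[str, ...]:
    """Normalize "a/b" strings and part sequences to a tuple of parts."""
    if isinstance(key, str):
        return tuple(key.split("/"))
    return tuple(key)


class Registry:
    """Toy stand-in for a feature graph keyed by feature key parts."""

    def __init__(self) -> None:
        self._features: dict[tuple[str, ...], type] = {}

    def register(self, key: str | Sequence[str], cls: type) -> None:
        self._features[coerce_key(key)] = cls

    def get(self, key: str | Sequence[str]) -> type:
        # A plain dict lookup raises KeyError when the key is not registered.
        return self._features[coerce_key(key)]


class ParentFeature:
    """Placeholder class standing in for a registered feature."""


registry = Registry()
registry.register(["examples", "parent"], ParentFeature)
found = registry.get("examples/parent")
```

In this sketch both spellings of the key resolve to the same class, and an unregistered key surfaces as a KeyError that the caller can handle.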