Feature Spec¶

Feature specs act as the source of truth for all metadata related to features: their dependencies, fields, code versions, and so on.

metaxy.FeatureSpec `pydantic-model` ¶

FeatureSpec(*, key: CoercibleToFeatureKey, id_columns: IDColumns, deps: list[FeatureDep] | None = None, fields: Sequence[str | FieldSpec] | None = None, metadata: dict[str, Any] | None = None)

FeatureSpec(*, key: CoercibleToFeatureKey, id_columns: IDColumns, deps: list[CoercibleToFeatureDep] | None = None, fields: Sequence[str | FieldSpec] | None = None, metadata: dict[str, Any] | None = None)

FeatureSpec(*, key: CoercibleToFeatureKey, id_columns: IDColumns, deps: list[FeatureDep] | list[CoercibleToFeatureDep] | None = None, fields: Sequence[str | FieldSpec] | None = None, metadata: dict[str, Any] | None = None)

Bases: FrozenBaseModel

Show JSON schema:

{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "AllFieldsMapping": {
      "description": "Field mapping that explicitly depends on all upstream fields.",
      "properties": {
        "type": {
          "const": "all",
          "default": "all",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "AllFieldsMapping",
      "type": "object"
    },
    "DefaultFieldsMapping": {
      "description": "Default automatic field mapping configuration.\n\nWhen used, automatically maps fields to matching upstream fields based on field keys.\n\nAttributes:\n    match_suffix: If True, allows suffix matching (e.g., \"french\" matches \"audio/french\")\n    exclude_fields: List of field keys to exclude from auto-mapping",
      "properties": {
        "type": {
          "const": "default",
          "default": "default",
          "title": "Type",
          "type": "string"
        },
        "match_suffix": {
          "default": false,
          "title": "Match Suffix",
          "type": "boolean"
        },
        "exclude_fields": {
          "items": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Exclude Fields",
          "type": "array"
        }
      },
      "title": "DefaultFieldsMapping",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "FeatureDep": {
      "additionalProperties": false,
      "description": "Feature dependency specification with optional column selection, renaming, and lineage.\n\nAttributes:\n    feature: The feature key to depend on. Accepts string (\"a/b/c\"), list ([\"a\", \"b\", \"c\"]),\n        FeatureKey instance, or BaseFeature class.\n    columns: Optional tuple of column names to select from upstream feature.\n        - None (default): Keep all columns from upstream\n        - Empty tuple (): Keep only system columns (sample_uid, provenance_by_field, etc.)\n        - Tuple of names: Keep only specified columns (plus system columns)\n    rename: Optional mapping of old column names to new names.\n        Applied after column selection.\n    fields_mapping: Optional field mapping configuration for automatic field dependency resolution.\n        When provided, fields without explicit deps will automatically map to matching upstream fields.\n        Defaults to using `[FieldsMapping.default()][metaxy.models.fields_mapping.DefaultFieldsMapping]`.\n    filters: Optional SQL-like filter strings applied to this dependency. Automatically parsed into\n        Narwhals expressions (accessible via the `filters` property). Filters are automatically\n        applied by FeatureDepTransformer after renames during all FeatureDep operations (including\n        resolve_update and version computation).\n    lineage: The lineage relationship between this upstream dependency and the downstream feature.\n        - `LineageRelationship.identity()` (default): 1:1 relationship, same cardinality\n        - `LineageRelationship.aggregation(on=...)`: N:1, multiple upstream rows aggregate to one downstream\n        - `LineageRelationship.expansion(on=...)`: 1:N, one upstream row expands to multiple downstream rows\n    optional: Whether individual samples of the downstream feature can be computed without\n        the corresponding samples of the upstream feature. If upstream samples are missing,\n        they are going to be represented as NULL values in the joined upstream metadata.\n        Defaults to False (required dependency).\n\nExample: Basic Usage\n    ```py\n    # Keep all columns with default field mapping (1:1 lineage)\n    FeatureDep(feature=\"upstream\")\n\n    # Keep only specific columns\n    FeatureDep(\n        feature=\"upstream/feature\",\n        columns=(\"col1\", \"col2\")\n    )\n\n    # Rename columns to avoid conflicts\n    FeatureDep(\n        feature=\"upstream/feature\",\n        rename={\"old_name\": \"new_name\"}\n    )\n\n    # SQL filters\n    FeatureDep(\n        feature=\"upstream\",\n        filters=[\"age >= 25\", \"status = 'active'\"]\n    )\n\n    # Optional dependency (left join - samples preserved even if no match)\n    FeatureDep(\n        feature=\"enrichment/data\",\n        optional=True\n    )\n    ```\n\nExample: Lineage Relationships\n    ```py\n    # Aggregation: many sensor readings aggregate to one hourly stat\n    FeatureDep(\n        feature=\"sensor_readings\",\n        lineage=LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"])\n    )\n\n    # Expansion: one video expands to many frames\n    FeatureDep(\n        feature=\"video\",\n        lineage=LineageRelationship.expansion(on=[\"video_id\"])\n    )\n\n    # Mixed lineage: aggregate from one parent, identity from another\n    # In FeatureSpec:\n    deps=[\n        FeatureDep(feature=\"readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\"])),\n        FeatureDep(feature=\"sensor_info\", lineage=LineageRelationship.identity()),\n    ]\n    ```",
      "properties": {
        "feature": {
          "$ref": "#/$defs/FeatureKey",
          "description": "Feature key. Accepts a slashed string ('a/b/c'), a sequence of strings, a FeatureKey instance, or a child class of BaseFeature"
        },
        "columns": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Columns"
        },
        "rename": {
          "anyOf": [
            {
              "additionalProperties": {
                "type": "string"
              },
              "type": "object"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Rename"
        },
        "fields_mapping": {
          "$ref": "#/$defs/FieldsMapping"
        },
        "filters": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "SQL-like filter strings applied to this dependency.",
          "title": "Filters"
        },
        "lineage": {
          "$ref": "#/$defs/LineageRelationship",
          "description": "Lineage relationship between this upstream dependency and the downstream feature."
        },
        "optional": {
          "default": false,
          "description": "Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.",
          "title": "Optional",
          "type": "boolean"
        }
      },
      "required": [
        "feature"
      ],
      "title": "FeatureDep",
      "type": "object"
    },
    "FeatureKey": {
      "description": "Feature key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FeatureKey(\"a/b/c\")  # String format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey([\"a\", \"b\", \"c\"])  # List format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey(FeatureKey([\"a\", \"b\", \"c\"]))  # FeatureKey copy\n    # FeatureKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FeatureKey",
      "type": "array"
    },
    "FieldDep": {
      "additionalProperties": false,
      "properties": {
        "feature": {
          "$ref": "#/$defs/FeatureKey"
        },
        "fields": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/FieldKey"
              },
              "type": "array"
            },
            {
              "const": "__METAXY_ALL_DEP__",
              "type": "string"
            }
          ],
          "default": "__METAXY_ALL_DEP__",
          "title": "Fields"
        }
      },
      "required": [
        "feature"
      ],
      "title": "FieldDep",
      "type": "object"
    },
    "FieldKey": {
      "description": "Field key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FieldKey(\"a/b/c\")  # String format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey([\"a\", \"b\", \"c\"])  # List format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey(FieldKey([\"a\", \"b\", \"c\"]))  # FieldKey copy\n    # FieldKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FieldKey",
      "type": "array"
    },
    "FieldsMapping": {
      "description": "Base class for field mapping configurations.\n\nField mappings define how a field automatically resolves its dependencies\nbased on upstream feature fields. This is separate from explicit field\ndependencies which are defined directly.",
      "properties": {
        "mapping": {
          "discriminator": {
            "mapping": {
              "all": "#/$defs/AllFieldsMapping",
              "default": "#/$defs/DefaultFieldsMapping",
              "none": "#/$defs/NoneFieldsMapping",
              "specific": "#/$defs/SpecificFieldsMapping"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/AllFieldsMapping"
            },
            {
              "$ref": "#/$defs/SpecificFieldsMapping"
            },
            {
              "$ref": "#/$defs/NoneFieldsMapping"
            },
            {
              "$ref": "#/$defs/DefaultFieldsMapping"
            }
          ],
          "title": "Mapping"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "FieldsMapping",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    },
    "LineageRelationship": {
      "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
      "properties": {
        "relationship": {
          "discriminator": {
            "mapping": {
              "1:1": "#/$defs/IdentityRelationship",
              "1:N": "#/$defs/ExpansionRelationship",
              "N:1": "#/$defs/AggregationRelationship"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/IdentityRelationship"
            },
            {
              "$ref": "#/$defs/AggregationRelationship"
            },
            {
              "$ref": "#/$defs/ExpansionRelationship"
            }
          ],
          "title": "Relationship"
        }
      },
      "required": [
        "relationship"
      ],
      "title": "LineageRelationship",
      "type": "object"
    },
    "NoneFieldsMapping": {
      "description": "Field mapping that never matches any upstream fields.",
      "properties": {
        "type": {
          "const": "none",
          "default": "none",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "NoneFieldsMapping",
      "type": "object"
    },
    "SpecialFieldDep": {
      "enum": [
        "__METAXY_ALL_DEP__"
      ],
      "title": "SpecialFieldDep",
      "type": "string"
    },
    "SpecificFieldsMapping": {
      "description": "Field mapping that explicitly depends on specific upstream fields.",
      "properties": {
        "type": {
          "const": "specific",
          "default": "specific",
          "title": "Type",
          "type": "string"
        },
        "mapping": {
          "additionalProperties": {
            "items": {
              "$ref": "#/$defs/FieldKey"
            },
            "type": "array",
            "uniqueItems": true
          },
          "propertyNames": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Mapping",
          "type": "object"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "SpecificFieldsMapping",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "properties": {
    "key": {
      "$ref": "#/$defs/FeatureKey"
    },
    "id_columns": {
      "description": "Columns that uniquely identify a sample in this feature.",
      "items": {
        "type": "string"
      },
      "title": "Id Columns",
      "type": "array"
    },
    "deps": {
      "items": {
        "$ref": "#/$defs/FeatureDep"
      },
      "title": "Deps",
      "type": "array"
    },
    "fields": {
      "items": {
        "additionalProperties": false,
        "properties": {
          "key": {
            "$ref": "#/$defs/FieldKey"
          },
          "code_version": {
            "default": "__metaxy_initial__",
            "title": "Code Version",
            "type": "string"
          },
          "deps": {
            "anyOf": [
              {
                "$ref": "#/$defs/SpecialFieldDep"
              },
              {
                "items": {
                  "$ref": "#/$defs/FieldDep"
                },
                "type": "array"
              }
            ],
            "title": "Deps"
          }
        },
        "title": "FieldSpec",
        "type": "object"
      },
      "title": "Fields",
      "type": "array"
    },
    "metadata": {
      "additionalProperties": true,
      "description": "Metadata attached to this feature.",
      "title": "Metadata",
      "type": "object"
    }
  },
  "required": [
    "key",
    "id_columns"
  ],
  "title": "FeatureSpec",
  "type": "object"
}
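
The schema above models LineageRelationship and FieldsMapping as discriminated unions: the `type` property selects the variant. The following is a minimal standard-library sketch of that dispatch for lineage payloads. It relies only on what the schema states (the `type` constants, and that `on` is required for `1:N`). The helper name `describe_lineage` is hypothetical and is not part of metaxy.

```python
from typing import Any

# Variant names keyed by the `type` discriminator, as listed in the schema.
LINEAGE_VARIANTS = {
    "1:1": "IdentityRelationship",
    "N:1": "AggregationRelationship",
    "1:N": "ExpansionRelationship",
}


def describe_lineage(payload: dict[str, Any]) -> str:
    """Hypothetical helper: resolve the variant of a serialized LineageRelationship."""
    relationship = payload["relationship"]  # the schema marks "relationship" as required
    kind = relationship["type"]  # discriminator property
    if kind not in LINEAGE_VARIANTS:
        raise ValueError(f"Unknown lineage type: {kind!r}")
    # The schema lists "on" as required only for the expansion variant.
    if kind == "1:N" and "on" not in relationship:
        raise ValueError("ExpansionRelationship requires 'on'")
    return LINEAGE_VARIANTS[kind]


print(describe_lineage({"relationship": {"type": "N:1", "on": ["sensor_id", "hour"]}}))
print(describe_lineage({"relationship": {"type": "1:N", "on": ["video_id"]}}))
```

In practice the Pydantic model performs this resolution itself; the sketch only makes the role of the discriminator visible.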

Fields:

key (FeatureKey)
id_columns (tuple[str, ...])
deps (list[FeatureDep])
fields (list[FieldSpec])
metadata (dict[str, Any])

Validators:

validate_unique_field_keys
validate_id_columns

Source code in src/metaxy/models/feature_spec.py

def __init__(
    self,
    *,
    key: CoercibleToFeatureKey,
    id_columns: IDColumns,
    deps: list[FeatureDep] | list[CoercibleToFeatureDep] | None = None,
    fields: Sequence[str | FieldSpec] | None = None,
    metadata: dict[str, Any] | None = None,
) -> None: ...

Attributes¶

metaxy.FeatureSpec.id_columns `pydantic-field` ¶

id_columns: tuple[str, ...]

Columns that uniquely identify a sample in this feature.

metaxy.FeatureSpec.metadata `pydantic-field` ¶

metadata: dict[str, Any]

Metadata attached to this feature.

metaxy.FeatureSpec.deps_by_key `cached` `property` ¶

deps_by_key: Mapping[FeatureKey, FeatureDep]

Get dependencies indexed by their feature key.

metaxy.FeatureSpec.code_version `cached` `property` ¶

code_version: str

Hash of this feature's field code_versions only (no dependencies).
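
As a rough illustration of what "field code_versions only" means, the sketch below hashes just the (field key, code_version) pairs, so nothing about dependencies enters the digest. The function name `code_version_sketch`, the choice of SHA256, and the serialization format are assumptions made for illustration; metaxy's actual hashing scheme may differ.

```python
import hashlib


def code_version_sketch(fields: dict[str, str]) -> str:
    """Hypothetical: hash field keys and their code_versions, ignoring dependencies."""
    hasher = hashlib.sha256()
    # Sort by field key so the digest does not depend on declaration order.
    for field_key, code_version in sorted(fields.items()):
        hasher.update(f"{field_key}={code_version};".encode())
    return hasher.hexdigest()


v1 = code_version_sketch({"audio": "1", "frames": "1"})
v2 = code_version_sketch({"frames": "1", "audio": "1"})  # same fields, different order
v3 = code_version_sketch({"audio": "2", "frames": "1"})  # bumped code_version
print(v1 == v2, v1 == v3)
```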

metaxy.FeatureSpec.feature_spec_version `property` ¶

feature_spec_version: str

Compute SHA256 hash of the complete feature specification.

This property provides a deterministic hash of ALL specification properties, including key, deps, fields, and any metadata/tags. It is used for audit trails and for tracking specification changes.

Unlike feature_version, which hashes only computational properties (to trigger migrations), feature_spec_version captures the entire specification for complete reproducibility and auditing.

Returns:

str – SHA256 hex digest of the specification

Example

spec = FeatureSpec(
    key=FeatureKey(["my", "feature"]),
    id_columns=("sample_uid",),  # required keyword argument
    fields=[FieldSpec(key=FieldKey(["default"]))],
)
spec.feature_spec_version
# 'abc123...'  # 64-character hex string
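
The idea of a deterministic digest over every specification property can be sketched with the standard library alone. This is an illustration under stated assumptions, not metaxy's implementation: the spec is represented as a plain dict, the canonical form is sorted-key JSON, and `spec_version_sketch` is a hypothetical name.

```python
import hashlib
import json
from typing import Any


def spec_version_sketch(spec: dict[str, Any]) -> str:
    """Hypothetical: SHA256 over a canonical JSON dump of the whole spec."""
    # sort_keys gives a canonical ordering; compact separators avoid whitespace noise.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"key": ["my", "feature"], "id_columns": ["sample_uid"], "metadata": {}}
tagged = {**base, "metadata": {"owner": "data-team"}}  # only metadata differs

digest = spec_version_sketch(base)
print(len(digest))
print(digest == spec_version_sketch(tagged))
```

Because metadata is part of the hashed payload, the two digests differ even though nothing computational changed, which is the distinction the description draws against feature_version.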

Functions¶

metaxy.FeatureSpec.table_name ¶

table_name() -> str

Get SQL-like table name for this feature spec.

Source code in src/metaxy/models/feature_spec.py

def table_name(self) -> str:
    """Get SQL-like table name for this feature spec."""
    return self.key.table_name
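
The source shows only that table_name() delegates to self.key.table_name. One plausible reading, given that key parts may not contain "/" or "__", is that the parts are joined with a double underscore. The sketch below illustrates that assumption; the separator and the function name `table_name_sketch` are not confirmed by this page.

```python
def table_name_sketch(parts: list[str]) -> str:
    """Hypothetical: join key parts with "__" (an assumed separator)."""
    for part in parts:
        # Mirror the documented restriction on key parts.
        if "/" in part or "__" in part:
            raise ValueError(f"Invalid key part: {part!r}")
    return "__".join(parts)


print(table_name_sketch(["my", "feature"]))  # my__feature
```

Forbidding the separator inside parts keeps such a name unambiguous: it can be split back into the original parts.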

metaxy.FeatureSpec.validate_unique_field_keys `pydantic-validator` ¶

validate_unique_field_keys() -> Self

Validate that all fields have unique keys.

Source code in src/metaxy/models/feature_spec.py

@pydantic.model_validator(mode="after")
def validate_unique_field_keys(self) -> Self:
    """Validate that all fields have unique keys."""
    seen_keys: set[tuple[str, ...]] = set()
    for field in self.fields:
        # Convert to tuple for hashability in case it's a plain list
        key_tuple = tuple(field.key)
        if key_tuple in seen_keys:
            raise ValueError(
                f"Duplicate field key found: {field.key}. "
                f"All fields must have unique keys."
            )
        seen_keys.add(key_tuple)
    return self
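
The same check can be tried outside Pydantic. The standalone sketch below (the helper name `check_unique_keys` is hypothetical) shows why each key is converted to a tuple first: lists are unhashable and cannot be stored in a set.

```python
def check_unique_keys(keys: list[list[str]]) -> None:
    """Standalone version of the duplicate-key check (hypothetical helper)."""
    seen: set[tuple[str, ...]] = set()
    for key in keys:
        key_tuple = tuple(key)  # lists are unhashable; tuples can go into a set
        if key_tuple in seen:
            raise ValueError(f"Duplicate field key found: {key}.")
        seen.add(key_tuple)


check_unique_keys([["audio"], ["audio", "french"]])  # passes: the keys differ
try:
    check_unique_keys([["audio"], ["audio"]])
except ValueError as exc:
    print(exc)
```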

metaxy.FeatureSpec.validate_id_columns `pydantic-validator` ¶

validate_id_columns() -> Self

Validate that id_columns is non-empty if specified.

Source code in src/metaxy/models/feature_spec.py

@pydantic.model_validator(mode="after")
def validate_id_columns(self) -> Self:
    """Validate that id_columns is non-empty if specified."""
    if self.id_columns is not None and len(self.id_columns) == 0:
        raise ValueError(
            "id_columns must be non-empty if specified. Use None for default."
        )
    return self

Feature Dependencies¶

metaxy.FeatureDep `pydantic-model` ¶

FeatureDep(*, feature: str | Sequence[str] | FeatureKey | type[BaseFeature], columns: tuple[str, ...] | None = None, rename: dict[str, str] | None = None, fields_mapping: FieldsMapping | None = None, filters: Sequence[str] | None = None, lineage: LineageRelationship | None = None, optional: bool = False)

Bases: BaseModel

Feature dependency specification with optional column selection, renaming, and lineage.

Attributes:

feature (ValidatedFeatureKey) – The feature key to depend on. Accepts a string ("a/b/c"), a list (["a", "b", "c"]), a FeatureKey instance, or a BaseFeature class.

columns (tuple[str, ...] | None) – Optional tuple of column names to select from the upstream feature:
- None (default): keep all columns from upstream
- Empty tuple (): keep only system columns (sample_uid, provenance_by_field, etc.)
- Tuple of names: keep only the specified columns (plus system columns)

rename (dict[str, str] | None) – Optional mapping of old column names to new names. Applied after column selection.

fields_mapping (FieldsMapping) – Optional field mapping configuration for automatic field dependency resolution. When provided, fields without explicit deps automatically map to matching upstream fields. Defaults to FieldsMapping.default() (see metaxy.models.fields_mapping.DefaultFieldsMapping).

filters (tuple[Expr, ...]) – Optional SQL-like filter strings applied to this dependency. They are automatically parsed into Narwhals expressions (accessible via the filters property). FeatureDepTransformer applies the filters after renames during all FeatureDep operations (including resolve_update and version computation).

lineage (LineageRelationship) – The lineage relationship between this upstream dependency and the downstream feature:
- LineageRelationship.identity() (default): 1:1 relationship, same cardinality
- LineageRelationship.aggregation(on=...): N:1, multiple upstream rows aggregate to one downstream row
- LineageRelationship.expansion(on=...): 1:N, one upstream row expands to multiple downstream rows

optional (bool) – Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are represented as NULL values in the joined upstream metadata. Defaults to False (required dependency).

Basic Usage

# Keep all columns with default field mapping (1:1 lineage)
FeatureDep(feature="upstream")

# Keep only specific columns
FeatureDep(
    feature="upstream/feature",
    columns=("col1", "col2")
)

# Rename columns to avoid conflicts
FeatureDep(
    feature="upstream/feature",
    rename={"old_name": "new_name"}
)

# SQL filters
FeatureDep(
    feature="upstream",
    filters=["age >= 25", "status = 'active'"]
)

# Optional dependency (left join - samples preserved even if no match)
FeatureDep(
    feature="enrichment/data",
    optional=True
)

Lineage Relationships

# Aggregation: many sensor readings aggregate to one hourly stat
FeatureDep(
    feature="sensor_readings",
    lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
)

# Expansion: one video expands to many frames
FeatureDep(
    feature="video",
    lineage=LineageRelationship.expansion(on=["video_id"])
)

# Mixed lineage: aggregate from one parent, identity from another
# In FeatureSpec:
deps=[
    FeatureDep(feature="readings", lineage=LineageRelationship.aggregation(on=["sensor_id"])),
    FeatureDep(feature="sensor_info", lineage=LineageRelationship.identity()),
]
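
To make the N:1 case concrete, the sketch below groups parent rows by the `on` columns, so that every group corresponds to one downstream row. It uses plain dicts and a hypothetical helper, `group_parents`; how metaxy actually joins and tracks provenance is not shown on this page.

```python
from collections import defaultdict
from typing import Any


def group_parents(
    rows: list[dict[str, Any]], on: list[str]
) -> dict[tuple[Any, ...], list[dict[str, Any]]]:
    """Hypothetical: group parent rows by the `on` columns (N:1 lineage)."""
    groups: dict[tuple[Any, ...], list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in on)].append(row)
    return dict(groups)


readings = [
    {"sensor_id": "a", "hour": 0, "minute": 5},
    {"sensor_id": "a", "hour": 0, "minute": 40},
    {"sensor_id": "a", "hour": 1, "minute": 10},
]
hourly = group_parents(readings, on=["sensor_id", "hour"])
print({key: len(members) for key, members in hourly.items()})
```

Expansion (1:N) is the mirror image: child rows that share the same values in the `on` columns belong to one parent row.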

Show JSON schema:

{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "AllFieldsMapping": {
      "description": "Field mapping that explicitly depends on all upstream fields.",
      "properties": {
        "type": {
          "const": "all",
          "default": "all",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "AllFieldsMapping",
      "type": "object"
    },
    "DefaultFieldsMapping": {
      "description": "Default automatic field mapping configuration.\n\nWhen used, automatically maps fields to matching upstream fields based on field keys.\n\nAttributes:\n    match_suffix: If True, allows suffix matching (e.g., \"french\" matches \"audio/french\")\n    exclude_fields: List of field keys to exclude from auto-mapping",
      "properties": {
        "type": {
          "const": "default",
          "default": "default",
          "title": "Type",
          "type": "string"
        },
        "match_suffix": {
          "default": false,
          "title": "Match Suffix",
          "type": "boolean"
        },
        "exclude_fields": {
          "items": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Exclude Fields",
          "type": "array"
        }
      },
      "title": "DefaultFieldsMapping",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "FeatureKey": {
      "description": "Feature key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FeatureKey(\"a/b/c\")  # String format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey([\"a\", \"b\", \"c\"])  # List format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey(FeatureKey([\"a\", \"b\", \"c\"]))  # FeatureKey copy\n    # FeatureKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FeatureKey",
      "type": "array"
    },
    "FieldKey": {
      "description": "Field key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FieldKey(\"a/b/c\")  # String format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey([\"a\", \"b\", \"c\"])  # List format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey(FieldKey([\"a\", \"b\", \"c\"]))  # FieldKey copy\n    # FieldKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FieldKey",
      "type": "array"
    },
    "FieldsMapping": {
      "description": "Base class for field mapping configurations.\n\nField mappings define how a field automatically resolves its dependencies\nbased on upstream feature fields. This is separate from explicit field\ndependencies which are defined directly.",
      "properties": {
        "mapping": {
          "discriminator": {
            "mapping": {
              "all": "#/$defs/AllFieldsMapping",
              "default": "#/$defs/DefaultFieldsMapping",
              "none": "#/$defs/NoneFieldsMapping",
              "specific": "#/$defs/SpecificFieldsMapping"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/AllFieldsMapping"
            },
            {
              "$ref": "#/$defs/SpecificFieldsMapping"
            },
            {
              "$ref": "#/$defs/NoneFieldsMapping"
            },
            {
              "$ref": "#/$defs/DefaultFieldsMapping"
            }
          ],
          "title": "Mapping"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "FieldsMapping",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    },
    "LineageRelationship": {
      "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
      "properties": {
        "relationship": {
          "discriminator": {
            "mapping": {
              "1:1": "#/$defs/IdentityRelationship",
              "1:N": "#/$defs/ExpansionRelationship",
              "N:1": "#/$defs/AggregationRelationship"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/IdentityRelationship"
            },
            {
              "$ref": "#/$defs/AggregationRelationship"
            },
            {
              "$ref": "#/$defs/ExpansionRelationship"
            }
          ],
          "title": "Relationship"
        }
      },
      "required": [
        "relationship"
      ],
      "title": "LineageRelationship",
      "type": "object"
    },
    "NoneFieldsMapping": {
      "description": "Field mapping that never matches any upstream fields.",
      "properties": {
        "type": {
          "const": "none",
          "default": "none",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "NoneFieldsMapping",
      "type": "object"
    },
    "SpecificFieldsMapping": {
      "description": "Field mapping that explicitly depends on specific upstream fields.",
      "properties": {
        "type": {
          "const": "specific",
          "default": "specific",
          "title": "Type",
          "type": "string"
        },
        "mapping": {
          "additionalProperties": {
            "items": {
              "$ref": "#/$defs/FieldKey"
            },
            "type": "array",
            "uniqueItems": true
          },
          "propertyNames": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Mapping",
          "type": "object"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "SpecificFieldsMapping",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Feature dependency specification with optional column selection, renaming, and lineage.\n\nAttributes:\n    feature: The feature key to depend on. Accepts string (\"a/b/c\"), list ([\"a\", \"b\", \"c\"]),\n        FeatureKey instance, or BaseFeature class.\n    columns: Optional tuple of column names to select from upstream feature.\n        - None (default): Keep all columns from upstream\n        - Empty tuple (): Keep only system columns (sample_uid, provenance_by_field, etc.)\n        - Tuple of names: Keep only specified columns (plus system columns)\n    rename: Optional mapping of old column names to new names.\n        Applied after column selection.\n    fields_mapping: Optional field mapping configuration for automatic field dependency resolution.\n        When provided, fields without explicit deps will automatically map to matching upstream fields.\n        Defaults to using `[FieldsMapping.default()][metaxy.models.fields_mapping.DefaultFieldsMapping]`.\n    filters: Optional SQL-like filter strings applied to this dependency. Automatically parsed into\n        Narwhals expressions (accessible via the `filters` property). Filters are automatically\n        applied by FeatureDepTransformer after renames during all FeatureDep operations (including\n        resolve_update and version computation).\n    lineage: The lineage relationship between this upstream dependency and the downstream feature.\n        - `LineageRelationship.identity()` (default): 1:1 relationship, same cardinality\n        - `LineageRelationship.aggregation(on=...)`: N:1, multiple upstream rows aggregate to one downstream\n        - `LineageRelationship.expansion(on=...)`: 1:N, one upstream row expands to multiple downstream rows\n    optional: Whether individual samples of the downstream feature can be computed without\n        the corresponding samples of the upstream feature. 
If upstream samples are missing,\n        they are going to be represented as NULL values in the joined upstream metadata.\n        Defaults to False (required dependency).\n\nExample: Basic Usage\n    ```py\n    # Keep all columns with default field mapping (1:1 lineage)\n    FeatureDep(feature=\"upstream\")\n\n    # Keep only specific columns\n    FeatureDep(\n        feature=\"upstream/feature\",\n        columns=(\"col1\", \"col2\")\n    )\n\n    # Rename columns to avoid conflicts\n    FeatureDep(\n        feature=\"upstream/feature\",\n        rename={\"old_name\": \"new_name\"}\n    )\n\n    # SQL filters\n    FeatureDep(\n        feature=\"upstream\",\n        filters=[\"age >= 25\", \"status = 'active'\"]\n    )\n\n    # Optional dependency (left join - samples preserved even if no match)\n    FeatureDep(\n        feature=\"enrichment/data\",\n        optional=True\n    )\n    ```\n\nExample: Lineage Relationships\n    ```py\n    # Aggregation: many sensor readings aggregate to one hourly stat\n    FeatureDep(\n        feature=\"sensor_readings\",\n        lineage=LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"])\n    )\n\n    # Expansion: one video expands to many frames\n    FeatureDep(\n        feature=\"video\",\n        lineage=LineageRelationship.expansion(on=[\"video_id\"])\n    )\n\n    # Mixed lineage: aggregate from one parent, identity from another\n    # In FeatureSpec:\n    deps=[\n        FeatureDep(feature=\"readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\"])),\n        FeatureDep(feature=\"sensor_info\", lineage=LineageRelationship.identity()),\n    ]\n    ```",
  "properties": {
    "feature": {
      "$ref": "#/$defs/FeatureKey",
      "description": "Feature key. Accepts a slashed string ('a/b/c'), a sequence of strings, a FeatureKey instance, or a child class of BaseFeature"
    },
    "columns": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Columns"
    },
    "rename": {
      "anyOf": [
        {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Rename"
    },
    "fields_mapping": {
      "$ref": "#/$defs/FieldsMapping"
    },
    "filters": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "SQL-like filter strings applied to this dependency.",
      "title": "Filters"
    },
    "lineage": {
      "$ref": "#/$defs/LineageRelationship",
      "description": "Lineage relationship between this upstream dependency and the downstream feature."
    },
    "optional": {
      "default": false,
      "description": "Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.",
      "title": "Optional",
      "type": "boolean"
    }
  },
  "required": [
    "feature"
  ],
  "title": "FeatureDep",
  "type": "object"
}
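
The `FeatureKey` and `FieldKey` entries in the schema describe keys as sequences of string parts that can be given either as a slashed string or as a list, with parts that may not contain `/` or `__`. As a rough illustration only, the standalone sketch below mimics that coercion and restriction. `coerce_key_parts` is a hypothetical helper written for this example and is not part of metaxy.

```python
from __future__ import annotations

from collections.abc import Sequence


def coerce_key_parts(value: str | Sequence[str]) -> tuple[str, ...]:
    """Hypothetical helper: normalize a key given as 'a/b/c' or ['a', 'b', 'c']."""
    # A slashed string is split into parts; any other sequence is taken as-is.
    parts = tuple(value.split("/")) if isinstance(value, str) else tuple(value)
    for part in parts:
        # The schema description states that parts cannot contain '/' or '__'.
        if "/" in part or "__" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return parts


# Both input formats normalize to the same tuple of parts.
from_string = coerce_key_parts("a/b/c")
from_list = coerce_key_parts(["a", "b", "c"])
```

Both calls yield the same parts, which matches the behavior shown in the `FeatureKey` docstring examples.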

Config:

extra: forbid

Fields:

feature (ValidatedFeatureKey)
columns (tuple[str, ...] | None)
rename (dict[str, str] | None)
fields_mapping (FieldsMapping)
sql_filters (tuple[str, ...] | None)
lineage (LineageRelationship)
optional (bool)

Source code in src/metaxy/models/feature_spec.py

def __init__(
    self,
    *,
    feature: str | Sequence[str] | FeatureKey | type[BaseFeature],
    columns: tuple[str, ...] | None = None,
    rename: dict[str, str] | None = None,
    fields_mapping: FieldsMapping | None = None,
    filters: Sequence[str] | None = None,
    lineage: LineageRelationship | None = None,
    optional: bool = False,
) -> None: ...
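
The schema description states that `rename` is applied after column selection, and that system columns are always kept. A minimal sketch of that ordering on a plain dict row follows. The names `select_then_rename` and `SYSTEM_COLUMNS` are hypothetical, the set of system columns is simplified to the two named in the description, and the code does not reproduce metaxy's own transformer.

```python
from __future__ import annotations

# Assumed for illustration: the description names these two among the system columns.
SYSTEM_COLUMNS = ("sample_uid", "provenance_by_field")


def select_then_rename(row, columns=None, rename=None):
    """Hypothetical helper: select columns first, then rename the survivors."""
    # columns=None keeps everything; a tuple keeps the named columns plus system columns.
    if columns is not None:
        keep = set(columns) | set(SYSTEM_COLUMNS)
        row = {k: v for k, v in row.items() if k in keep}
    # Renaming happens after selection, so `rename` refers to the original names.
    rename = rename or {}
    return {rename.get(k, k): v for k, v in row.items()}


row = {"sample_uid": 1, "provenance_by_field": {}, "col1": "x", "col2": "y"}
selected = select_then_rename(row, columns=("col1",), rename={"col1": "c1"})
system_only = select_then_rename(row, columns=())
```

Here `selected` keeps the system columns plus `col1` under its new name, while the empty tuple leaves only the system columns.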

Attributes¶

metaxy.FeatureDep.sql_filters `pydantic-field` ¶

sql_filters: tuple[str, ...] | None = None

SQL-like filter strings applied to this dependency.

metaxy.FeatureDep.lineage `pydantic-field` ¶

lineage: LineageRelationship

Lineage relationship between this upstream dependency and the downstream feature.
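
To make the cardinalities concrete, here is a standalone sketch of the N:1 case described in the schema: parent rows are grouped on a subset of their ID columns, so several parent rows map to one child row. The helper `group_parents` and the sample data are hypothetical and only illustrate the grouping, not how metaxy computes provenance.

```python
from collections import defaultdict


def group_parents(parent_rows, on):
    """Hypothetical helper: group parent rows by the `on` columns (N:1 lineage)."""
    groups = defaultdict(list)
    for row in parent_rows:
        # The group key is the tuple of values in the `on` columns.
        groups[tuple(row[col] for col in on)].append(row)
    return dict(groups)


# Parent has: sensor_id, hour, minute. Child has: sensor_id, hour.
readings = [
    {"sensor_id": "s1", "hour": 0, "minute": 5},
    {"sensor_id": "s1", "hour": 0, "minute": 35},
    {"sensor_id": "s1", "hour": 1, "minute": 10},
]
hourly = group_parents(readings, on=["sensor_id", "hour"])
# Three parent rows collapse into two child keys: ("s1", 0) and ("s1", 1).
```

An expansion (1:N) relationship runs the other way: child rows that share the same parent ID values in `on` share provenance.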

metaxy.FeatureDep.optional `pydantic-field` ¶

optional: bool = False

Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.
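
The effect of `optional=True` resembles a left join: downstream samples are kept even when the upstream side has no matching row, and the missing values appear as NULL. The plain-Python sketch below contrasts the two modes under that reading. The helper `join_upstream` and its data are hypothetical.

```python
def join_upstream(sample_ids, upstream_by_id, optional):
    """Hypothetical helper contrasting required and optional dependencies."""
    joined = {}
    for uid in sample_ids:
        if uid in upstream_by_id:
            joined[uid] = upstream_by_id[uid]
        elif optional:
            # Missing upstream sample: keep the row and represent its metadata as None (NULL).
            joined[uid] = None
        # For a required dependency the sample is dropped instead.
    return joined


upstream = {1: {"score": 0.9}}
as_optional = join_upstream([1, 2], upstream, optional=True)
as_required = join_upstream([1, 2], upstream, optional=False)
```

With `optional=True` sample `2` survives with a `None` value; with `optional=False` only sample `1` remains.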

metaxy.FeatureDep.filters `cached` `property` ¶

filters: tuple[Expr, ...]

Parse sql_filters into Narwhals expressions.
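
The parsing step itself is not shown in this section. As a simplified stand-in, the sketch below turns filter strings of the form `column op literal` into plain Python predicates rather than Narwhals expressions. The function `parse_filter` and the supported grammar are assumptions made for illustration; the real parser presumably handles a much richer SQL subset.

```python
import operator
import re

# Map SQL comparison operators to Python callables.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_filter(text):
    """Hypothetical stand-in: turn 'col op literal' into a row predicate."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported filter: {text!r}")
    column, op, literal = match.groups()
    # Single-quoted literals are strings; everything else is parsed as a number.
    if len(literal) >= 2 and literal.startswith("'") and literal.endswith("'"):
        value = literal[1:-1]
    else:
        value = float(literal)
    return lambda row: _OPS[op](row[column], value)


predicates = [parse_filter(f) for f in ("age >= 25", "status = 'active'")]
row = {"age": 30, "status": "active"}
# A row passes only if every filter accepts it.
keep = all(p(row) for p in predicates)
```

Multiple filters are combined with a logical AND here, which is an assumption of this sketch rather than something this section confirms.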

Functions¶

metaxy.FeatureDep.table_name ¶

table_name() -> str

Get SQL-like table name for this feature spec.

Source code in src/metaxy/models/feature_spec.py

def table_name(self) -> str:
    """Get SQL-like table name for this feature spec."""
    return self.feature.table_name
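
`table_name` delegates to the feature key, and the exact format of the name is not shown in this section. One plausible scheme, given that key parts may not contain `__`, is to join the parts with a double underscore. The sketch below assumes that scheme; `table_name_from_parts` is a hypothetical function, not the library's implementation.

```python
def table_name_from_parts(parts):
    """Hypothetical helper: derive a SQL-like table name from key parts."""
    # Assumed format: parts joined by "__" (not confirmed by this section).
    return "__".join(parts)


name = table_name_from_parts(("upstream", "feature"))
```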
