Lineage Relationships

metaxy.models.lineage.LineageRelationship `pydantic-model`

Bases: BaseModel

Wrapper class for lineage relationship configurations with convenient constructors.

This provides a cleaner API for creating lineage relationships while preserving type safety: the `relationship` field is a discriminated union keyed on its `type` value ("1:1", "1:N" or "N:1").

JSON schema:

{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    }
  },
  "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
  "properties": {
    "relationship": {
      "discriminator": {
        "mapping": {
          "1:1": "#/$defs/IdentityRelationship",
          "1:N": "#/$defs/ExpansionRelationship",
          "N:1": "#/$defs/AggregationRelationship"
        },
        "propertyName": "type"
      },
      "oneOf": [
        {
          "$ref": "#/$defs/IdentityRelationship"
        },
        {
          "$ref": "#/$defs/AggregationRelationship"
        },
        {
          "$ref": "#/$defs/ExpansionRelationship"
        }
      ],
      "title": "Relationship"
    }
  },
  "required": [
    "relationship"
  ],
  "title": "LineageRelationship",
  "type": "object"
}

Config:

frozen: True

Fields:

relationship (LineageRelationshipUnion)
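The `relationship` field is resolved through the discriminator mapping in the schema above: the `type` value selects which model validates the payload. The sketch below mimics that dispatch with stdlib dataclasses. The classes are simplified stand-ins rather than the real metaxy models, and `parse_relationship` is a hypothetical helper; in metaxy, pydantic performs this step itself during validation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


# Simplified stand-ins for the three metaxy relationship models (not the real classes).
@dataclass(frozen=True)
class IdentityRelationship:
    type: str = "1:1"


@dataclass(frozen=True)
class AggregationRelationship:
    on: Optional[Sequence[str]] = None
    type: str = "N:1"


@dataclass(frozen=True)
class ExpansionRelationship:
    on: Sequence[str]  # required, as in the schema
    id_generation_pattern: Optional[str] = None
    type: str = "1:N"


Relationship = Union[IdentityRelationship, AggregationRelationship, ExpansionRelationship]

# Mirrors the schema's discriminator mapping: "type" value -> model class.
_MAPPING = {
    "1:1": IdentityRelationship,
    "1:N": ExpansionRelationship,
    "N:1": AggregationRelationship,
}


def parse_relationship(payload: dict) -> Relationship:
    """Hypothetical helper: pick the model class from the "type" discriminator."""
    kind = payload.get("type")
    if kind not in _MAPPING:
        raise ValueError(f"unknown relationship type: {kind!r}")
    fields = {k: v for k, v in payload.items() if k != "type"}
    return _MAPPING[kind](**fields)


rel = parse_relationship({"type": "1:N", "on": ["video_id"]})
print(type(rel).__name__, rel.on)  # ExpansionRelationship ['video_id']
```

A tagged union like this lets a serialized spec round-trip without guessing which relationship it describes: the `type` string alone decides the model.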

Functions

metaxy.models.lineage.LineageRelationship.identity `classmethod`

identity() -> Self

Create an identity (1:1) relationship.

Returns:

Self – Configured LineageRelationship for a 1:1 relationship.

Examples:

>>> spec = FeatureSpec(
...     key="feature",
...     lineage=LineageRelationship.identity()
... )

Source code in src/metaxy/models/lineage.py

@classmethod
def identity(cls) -> Self:
    """Create an identity (1:1) relationship.

    Returns:
        Configured LineageRelationship for 1:1 relationship.

    Examples:
        >>> spec = FeatureSpec(
        ...     key="feature",
        ...     lineage=LineageRelationship.identity()
        ... )
    """
    return cls(relationship=IdentityRelationship())

metaxy.models.lineage.LineageRelationship.aggregation `classmethod`

aggregation(on: Sequence[str] | None = None) -> Self

Create an aggregation (N:1) relationship.

Parameters:

on (Sequence[str] | None, default: None) – Columns to group by for aggregation. If None, uses all target ID columns.

Returns:

Self – Configured LineageRelationship for an N:1 relationship.

Examples:

>>> # Aggregate on specific columns
>>> spec = FeatureSpec(
...     key="hourly_stats",
...     id_columns=["sensor_id", "hour"],
...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
... )

>>> # Aggregate on all ID columns (default)
>>> spec = FeatureSpec(
...     key="user_summary",
...     id_columns=["user_id"],
...     lineage=LineageRelationship.aggregation()
... )

Source code in src/metaxy/models/lineage.py

@classmethod
def aggregation(cls, on: Sequence[str] | None = None) -> Self:
    """Create an aggregation (N:1) relationship.

    Args:
        on: Columns to group by for aggregation. If None, uses all target ID columns.

    Returns:
        Configured LineageRelationship for N:1 relationship.

    Examples:
        >>> # Aggregate on specific columns
        >>> spec = FeatureSpec(
        ...     key="hourly_stats",
        ...     id_columns=["sensor_id", "hour"],
        ...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
        ... )

        >>> # Aggregate on all ID columns (default)
        >>> spec = FeatureSpec(
        ...     key="user_summary",
        ...     id_columns=["user_id"],
        ...     lineage=LineageRelationship.aggregation()
        ... )
    """
    return cls(relationship=AggregationRelationship(on=on))

metaxy.models.lineage.LineageRelationship.expansion `classmethod`

expansion(on: Sequence[str], id_generation_pattern: str | None = None) -> Self

Create an expansion (1:N) relationship.

Parameters:

on (Sequence[str]) – Parent ID columns that identify the parent record. Child records with the same parent IDs share the same upstream provenance. Required: you must specify explicitly which columns link parent to child.
id_generation_pattern (str | None, default: None) – Pattern for generating child IDs. Can be "sequential", "hash", or a custom pattern. If None, the feature's load_input() method handles ID generation.

Returns:

Self – Configured LineageRelationship for a 1:N relationship.

Examples:

>>> # Sequential ID generation with explicit parent ID
>>> spec = FeatureSpec(
...     key="video_frames",
...     id_columns=["video_id", "frame_id"],
...     lineage=LineageRelationship.expansion(
...         on=["video_id"],
...         id_generation_pattern="sequential"
...     )
... )

>>> # Custom ID generation in load_input()
>>> spec = FeatureSpec(
...     key="text_chunks",
...     id_columns=["doc_id", "chunk_id"],
...     lineage=LineageRelationship.expansion(on=["doc_id"])
... )

Source code in src/metaxy/models/lineage.py

@classmethod
def expansion(
    cls,
    on: Sequence[str],
    id_generation_pattern: str | None = None,
) -> Self:
    """Create an expansion (1:N) relationship.

    Args:
        on: Parent ID columns that identify the parent record. Child records with
            the same parent IDs will share the same upstream provenance.
            Required - must explicitly specify which columns link parent to child.
        id_generation_pattern: Pattern for generating child IDs.
            Can be "sequential", "hash", or custom. If None, handled by load_input().

    Returns:
        Configured LineageRelationship for 1:N relationship.

    Examples:
        >>> # Sequential ID generation with explicit parent ID
        >>> spec = FeatureSpec(
        ...     key="video_frames",
        ...     id_columns=["video_id", "frame_id"],
        ...     lineage=LineageRelationship.expansion(
        ...         on=["video_id"],
        ...         id_generation_pattern="sequential"
        ...     )
        ... )

        >>> # Custom ID generation in load_input()
        >>> spec = FeatureSpec(
        ...     key="text_chunks",
        ...     id_columns=["doc_id", "chunk_id"],
        ...     lineage=LineageRelationship.expansion(on=["doc_id"])
        ... )
    """
    return cls(
        relationship=ExpansionRelationship(
            on=on, id_generation_pattern=id_generation_pattern
        )
    )

metaxy.models.lineage.LineageRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

Get columns to aggregate on for this relationship.

Parameters:

target_id_columns (Sequence[str]) – The target feature's ID columns.

Returns:

Sequence[str] | None – Columns to group by for aggregation, or None if no aggregation is needed.

Source code in src/metaxy/models/lineage.py

def get_aggregation_columns(
    self, target_id_columns: Sequence[str]
) -> Sequence[str] | None:
    """Get columns to aggregate on for this relationship.

    Args:
        target_id_columns: The target feature's ID columns.

    Returns:
        Columns to group by for aggregation, or None if no aggregation needed.
    """
    return self.relationship.get_aggregation_columns(target_id_columns)
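The wrapper simply delegates to the wrapped relationship. Combining the three implementations documented on this page gives the behaviour summarised below. This is a standalone sketch: `aggregation_columns` is a hypothetical function keyed on the `type` string, not metaxy code.

```python
from typing import Optional, Sequence


def aggregation_columns(
    rel_type: str,
    target_id_columns: Sequence[str],
    on: Optional[Sequence[str]] = None,
) -> Optional[Sequence[str]]:
    """Hypothetical summary of get_aggregation_columns for each relationship type."""
    if rel_type == "1:1":
        # IdentityRelationship: no aggregation at all.
        return None
    if rel_type == "1:N":
        # ExpansionRelationship: no aggregation during the join phase.
        return None
    if rel_type == "N:1":
        # AggregationRelationship: explicit `on`, else every target ID column.
        return on if on is not None else target_id_columns
    raise ValueError(f"unknown relationship type: {rel_type!r}")


print(aggregation_columns("N:1", ["user_id"]))  # ['user_id']
print(aggregation_columns("N:1", ["sensor_id", "hour"], on=["sensor_id"]))  # ['sensor_id']
print(aggregation_columns("1:N", ["doc_id", "chunk_id"], on=["doc_id"]))  # None
```

Only the N:1 case ever yields grouping columns; both other types return None, which is why callers can treat None as "no group-by needed".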

metaxy.models.lineage.LineageRelationshipType

Bases: str, Enum

Type of lineage relationship between features.
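This page does not list the enum's members. Because it subclasses `str`, its members compare equal to their plain string values, which is what lets them act as the `type` discriminator. The member names and values below are inferred from the field annotations (`Literal[IDENTITY]`, `Literal[EXPANSION]`, `Literal[AGGREGATION]`) and the schema constants on this page; treat this as a reconstruction, not the actual source.

```python
from enum import Enum


class LineageRelationshipType(str, Enum):
    # Reconstructed from the field annotations and JSON schema constants on this page.
    IDENTITY = "1:1"
    EXPANSION = "1:N"
    AGGREGATION = "N:1"


# A str-based enum member compares equal to its raw value...
print(LineageRelationshipType.EXPANSION == "1:N")  # True
# ...and a raw value looks up the matching member.
print(LineageRelationshipType("N:1").name)  # AGGREGATION
```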

metaxy.models.lineage.IdentityRelationship `pydantic-model`

Bases: BaseLineageRelationship

One-to-one relationship where each child row maps to exactly one parent row.

This is the default relationship type. Parent and child features share the same ID columns and have the same cardinality. No aggregation is performed.

Examples:

>>> # Default 1:1 relationship
>>> IdentityRelationship()

>>> # Or use the classmethod
>>> LineageRelationship.identity()
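With an identity relationship, parent and child share the same ID columns, so each child row is matched to the one parent row with the identical ID tuple. The sketch below illustrates that 1:1 pairing on plain dicts; the `pair_identity` helper and the sample rows are hypothetical and not part of metaxy.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def pair_identity(
    parents: List[Row], children: List[Row], id_columns: Sequence[str]
) -> List[Tuple[Row, Row]]:
    """Pair each child row with the single parent row that has the same IDs."""
    index: Dict[Tuple, Row] = {}
    for parent in parents:
        key = tuple(parent[c] for c in id_columns)
        if key in index:
            # Two parents with one key would break the 1:1 cardinality.
            raise ValueError(f"duplicate parent ID {key}: not a 1:1 relationship")
        index[key] = parent
    # A child whose IDs have no parent raises KeyError here.
    return [(index[tuple(child[c] for c in id_columns)], child) for child in children]


parents = [{"sample_id": 1, "raw": "a"}, {"sample_id": 2, "raw": "b"}]
children = [{"sample_id": 2, "clean": "B"}, {"sample_id": 1, "clean": "A"}]
for parent, child in pair_identity(parents, children, ["sample_id"]):
    print(parent["raw"], "->", child["clean"])  # b -> B, then a -> A
```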

JSON schema:

{
  "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
  "properties": {
    "type": {
      "const": "1:1",
      "default": "1:1",
      "title": "Type",
      "type": "string"
    }
  },
  "title": "IdentityRelationship",
  "type": "object"
}

Fields:

type (Literal[IDENTITY])

Functions

metaxy.models.lineage.IdentityRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

No aggregation needed for identity relationships.

Source code in src/metaxy/models/lineage.py

def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str] | None:
    """No aggregation needed for identity relationships."""
    return None

metaxy.models.lineage.ExpansionRelationship `pydantic-model`

Bases: BaseLineageRelationship

One-to-many relationship where one parent row expands to multiple child rows.

Child features have more granular ID columns than the parent. Each parent row generates multiple child rows with additional ID columns.

Attributes:

on (Sequence[str]) – Parent ID columns that identify the parent record. Child records with the same parent IDs share the same upstream provenance. Required: the field has no default, so the columns must be given explicitly.
id_generation_pattern (str | None) – Optional pattern for generating child IDs. Can be "sequential", "hash", or a custom pattern. If not specified, the feature's load_input() method is responsible for ID generation.

Examples:

>>> # Video frames from video
>>> ExpansionRelationship(
...     on=["video_id"],  # Parent ID
...     id_generation_pattern="sequential"
... )
>>> # Parent has: video_id
>>> # Child has: video_id, frame_id (generated)

>>> # Text chunks from document
>>> ExpansionRelationship(on=["doc_id"])
>>> # Parent has: doc_id
>>> # Child has: doc_id, chunk_id (generated in load_input)
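An expansion fans each parent row out into several child rows that keep the parent's `on` columns and add a new, more granular ID column. The sketch below does this for the video-frames case. Reading `"sequential"` as "number the children 0, 1, 2, ... within each parent" is an assumption (this page does not define the patterns), and the `expand` helper and `n_frames` column are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def expand(
    parents: List[Row], on: Sequence[str], child_id: str, count_column: str
) -> List[Row]:
    """Fan each parent row out into child rows with a sequential child ID (assumed semantics)."""
    children = []
    for parent in parents:
        for i in range(int(parent[count_column])):
            child = {c: parent[c] for c in on}  # carry the parent ID columns over
            child[child_id] = i  # new, more granular ID column
            children.append(child)
    return children


videos = [{"video_id": "v1", "n_frames": 3}, {"video_id": "v2", "n_frames": 2}]
frames = expand(videos, on=["video_id"], child_id="frame_id", count_column="n_frames")
print(len(frames))  # 5
print(frames[0])  # {'video_id': 'v1', 'frame_id': 0}
```

Every child keeps `video_id`, which is what lets all frames of one video be traced back to the same parent record.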

JSON schema:

{
  "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
  "properties": {
    "type": {
      "const": "1:N",
      "default": "1:N",
      "title": "Type",
      "type": "string"
    },
    "on": {
      "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
      "items": {
        "type": "string"
      },
      "title": "On",
      "type": "array"
    },
    "id_generation_pattern": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Pattern for generating child IDs. If None, handled by load_input().",
      "title": "Id Generation Pattern"
    }
  },
  "required": [
    "on"
  ],
  "title": "ExpansionRelationship",
  "type": "object"
}

Fields:

type (Literal[EXPANSION])
on (Sequence[str])
id_generation_pattern (str | None)

Attributes

metaxy.models.lineage.ExpansionRelationship.on `pydantic-field`

on: Sequence[str]

Parent ID columns for grouping. Child records with the same parent IDs share provenance. Required for expansion relationships.

metaxy.models.lineage.ExpansionRelationship.id_generation_pattern `pydantic-field`

id_generation_pattern: str | None = None

Pattern for generating child IDs. If None, ID generation is handled by load_input().

Functions

metaxy.models.lineage.ExpansionRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

Get aggregation columns for the joiner phase.

For expansion relationships, returns None because aggregation happens during diff resolution, not during joining. The joiner should pass through all parent records without aggregation.

Parameters:

target_id_columns (Sequence[str]) – The target (child) feature's ID columns.

Returns:

Sequence[str] | None – Always None: expansion relationships perform no aggregation during the join phase.

Source code in src/metaxy/models/lineage.py

def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str] | None:
    """Get aggregation columns for the joiner phase.

    For expansion relationships, returns None because aggregation
    happens during diff resolution, not during joining. The joiner
    should pass through all parent records without aggregation.

    Args:
        target_id_columns: The target (child) feature's ID columns.

    Returns:
        None - no aggregation during join phase for expansion relationships.
    """
    # Expansion relationships don't aggregate during join phase
    # Aggregation happens later during diff resolution
    return None
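Since every child of an expansion shares its parent's upstream provenance, a later stage can work per parent by grouping child rows on the `on` columns. The sketch below only illustrates that grouping idea: how metaxy's diff resolution actually aggregates is not described on this page, and `stale_parents` and the `provenance` column are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def stale_parents(
    children: List[Row], parent_provenance: Dict[Tuple, str], on: Sequence[str]
) -> Set[Tuple]:
    """Return parent keys whose stored child provenance no longer matches the parent."""
    seen: Dict[Tuple, Set[object]] = defaultdict(set)
    for child in children:
        key = tuple(child[c] for c in on)
        seen[key].add(child["provenance"])
    # One comparison per parent group rather than per child row.
    return {
        key
        for key, values in seen.items()
        if values != {parent_provenance.get(key)}
    }


chunks = [
    {"doc_id": "d1", "chunk_id": 0, "provenance": "p1"},
    {"doc_id": "d1", "chunk_id": 1, "provenance": "p1"},
    {"doc_id": "d2", "chunk_id": 0, "provenance": "p2"},
]
current = {("d1",): "p1", ("d2",): "p2-new"}
print(stale_parents(chunks, current, on=["doc_id"]))  # {('d2',)}
```

Parents that have no stored children yet do not appear in the result; a complete diff would also have to report those as new.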

metaxy.models.lineage.AggregationRelationship `pydantic-model`

Bases: BaseLineageRelationship

Many-to-one relationship where multiple parent rows aggregate to one child row.

Parent features have more granular ID columns than the child. The child aggregates multiple parent rows by grouping on a subset of the parent's ID columns.

Attributes:

on (Sequence[str] | None) – Columns to group by for aggregation. These should be a subset of the target feature's ID columns. If not specified, all target ID columns are used.

Examples:

>>> # Aggregate sensor readings by hour
>>> AggregationRelationship(on=["sensor_id", "hour"])
>>> # Parent has: sensor_id, hour, minute
>>> # Child has: sensor_id, hour

>>> # Or use the classmethod
>>> LineageRelationship.aggregation(on=["user_id", "session_id"])
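In an N:1 relationship the parent rows are grouped on the `on` columns and each group becomes one child row. The sketch below applies that to the sensor example (parent IDs `sensor_id`, `hour`, `minute`; child IDs `sensor_id`, `hour`). Averaging the `value` column is an arbitrary choice for illustration, and `aggregate` is a hypothetical helper, not a metaxy function.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence

Row = Dict[str, object]


def aggregate(parents: List[Row], on: Sequence[str], value: str) -> List[Row]:
    """Group parent rows on `on` and collapse each group into one child row."""
    groups: Dict[tuple, List[float]] = defaultdict(list)
    for row in parents:
        groups[tuple(row[c] for c in on)].append(row[value])
    return [
        {**dict(zip(on, key)), f"mean_{value}": mean(values)}
        for key, values in groups.items()
    ]


readings = [
    {"sensor_id": "s1", "hour": 9, "minute": 0, "value": 1.0},
    {"sensor_id": "s1", "hour": 9, "minute": 30, "value": 3.0},
    {"sensor_id": "s1", "hour": 10, "minute": 0, "value": 5.0},
]
print(aggregate(readings, on=["sensor_id", "hour"], value="value"))
# [{'sensor_id': 's1', 'hour': 9, 'mean_value': 2.0}, {'sensor_id': 's1', 'hour': 10, 'mean_value': 5.0}]
```

The `minute` column disappears from the output because it is not in `on`: that is exactly the "parent has more granular IDs than the child" situation described above.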

JSON schema:

{
  "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
  "properties": {
    "type": {
      "const": "N:1",
      "default": "N:1",
      "title": "Type",
      "type": "string"
    },
    "on": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
      "title": "On"
    }
  },
  "title": "AggregationRelationship",
  "type": "object"
}

Fields:

type (Literal[AGGREGATION])
on (Sequence[str] | None)

Attributes

metaxy.models.lineage.AggregationRelationship.on `pydantic-field`

on: Sequence[str] | None = None

Columns to group by for aggregation. Defaults to all target ID columns.

Functions

metaxy.models.lineage.AggregationRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str]

Get columns to aggregate on: `on` if it is set, otherwise all of the target feature's ID columns.

Source code in src/metaxy/models/lineage.py

def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str]:
    """Get columns to aggregate on."""
    return self.on if self.on is not None else target_id_columns
