Lineage Relationships

metaxy.models.lineage.LineageRelationship pydantic-model

Bases: BaseModel

Wrapper class for lineage relationship configurations with convenient constructors.

This provides a cleaner API for creating lineage relationships while maintaining type safety through discriminated unions.

Show JSON schema:
{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    }
  },
  "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
  "properties": {
    "relationship": {
      "discriminator": {
        "mapping": {
          "1:1": "#/$defs/IdentityRelationship",
          "1:N": "#/$defs/ExpansionRelationship",
          "N:1": "#/$defs/AggregationRelationship"
        },
        "propertyName": "type"
      },
      "oneOf": [
        {
          "$ref": "#/$defs/IdentityRelationship"
        },
        {
          "$ref": "#/$defs/AggregationRelationship"
        },
        {
          "$ref": "#/$defs/ExpansionRelationship"
        }
      ],
      "title": "Relationship"
    }
  },
  "required": [
    "relationship"
  ],
  "title": "LineageRelationship",
  "type": "object"
}
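The `discriminator` block in the schema above means the `type` tag alone decides which relationship model a payload is validated against. The sketch below imitates that dispatch with standard-library dataclasses. The class names `Identity`, `Aggregation`, `Expansion` and the helper `parse_relationship` are hypothetical stand-ins, not part of metaxy, and the real models are pydantic models that perform full validation.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


# Hypothetical stand-ins for illustration only; not the metaxy classes.
@dataclass(frozen=True)
class Identity:
    type: str = "1:1"


@dataclass(frozen=True)
class Aggregation:
    on: Optional[Sequence[str]] = None  # optional, as in the schema
    type: str = "N:1"


@dataclass(frozen=True)
class Expansion:
    on: Sequence[str]  # required, as in the schema
    id_generation_pattern: Optional[str] = None
    type: str = "1:N"


# Same tag-to-model mapping as the schema's discriminator block.
BY_TYPE = {"1:1": Identity, "N:1": Aggregation, "1:N": Expansion}


def parse_relationship(payload: dict[str, Any]):
    """Pick the stand-in class from the "type" tag, then build it."""
    fields = dict(payload)
    tag = fields.pop("type")
    cls = BY_TYPE[tag]  # raises KeyError for an unknown tag
    return cls(**fields)  # raises TypeError if a required field is missing


agg = parse_relationship({"type": "N:1", "on": ["sensor_id", "hour"]})
exp = parse_relationship({"type": "1:N", "on": ["video_id"]})
ident = parse_relationship({"type": "1:1"})
```

Because the tag selects the class up front, a payload such as `{"type": "1:N"}` fails on the missing `on` field instead of being silently matched against another relationship type.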

Config:

  • frozen: True

Fields:

  • relationship (LineageRelationshipUnion)

Functions

metaxy.models.lineage.LineageRelationship.identity classmethod

identity() -> Self

Create an identity (1:1) relationship.

Returns:

  • Self

    Configured LineageRelationship for 1:1 relationship.

Examples:

>>> spec = FeatureSpec(
...     key="feature",
...     lineage=LineageRelationship.identity()
... )

Source code in src/metaxy/models/lineage.py
@classmethod
def identity(cls) -> Self:
    """Create an identity (1:1) relationship.

    Returns:
        Configured LineageRelationship for 1:1 relationship.

    Examples:
        >>> spec = FeatureSpec(
        ...     key="feature",
        ...     lineage=LineageRelationship.identity()
        ... )
    """
    return cls(relationship=IdentityRelationship())

metaxy.models.lineage.LineageRelationship.aggregation classmethod

aggregation(on: Sequence[str] | None = None) -> Self

Create an aggregation (N:1) relationship.

Parameters:

  • on (Sequence[str] | None, default: None ) –

    Columns to group by for aggregation. If None, uses all target ID columns.

Returns:

  • Self

    Configured LineageRelationship for N:1 relationship.

Examples:

>>> # Aggregate on specific columns
>>> spec = FeatureSpec(
...     key="hourly_stats",
...     id_columns=["sensor_id", "hour"],
...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
... )

>>> # Aggregate on all ID columns (default)
>>> spec = FeatureSpec(
...     key="user_summary",
...     id_columns=["user_id"],
...     lineage=LineageRelationship.aggregation()
... )

Source code in src/metaxy/models/lineage.py
@classmethod
def aggregation(cls, on: Sequence[str] | None = None) -> Self:
    """Create an aggregation (N:1) relationship.

    Args:
        on: Columns to group by for aggregation. If None, uses all target ID columns.

    Returns:
        Configured LineageRelationship for N:1 relationship.

    Examples:
        >>> # Aggregate on specific columns
        >>> spec = FeatureSpec(
        ...     key="hourly_stats",
        ...     id_columns=["sensor_id", "hour"],
        ...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
        ... )

        >>> # Aggregate on all ID columns (default)
        >>> spec = FeatureSpec(
        ...     key="user_summary",
        ...     id_columns=["user_id"],
        ...     lineage=LineageRelationship.aggregation()
        ... )
    """
    return cls(relationship=AggregationRelationship(on=on))

metaxy.models.lineage.LineageRelationship.expansion classmethod

expansion(on: Sequence[str], id_generation_pattern: str | None = None) -> Self

Create an expansion (1:N) relationship.

Parameters:

  • on (Sequence[str]) –

    Parent ID columns that identify the parent record. Child records with the same parent IDs will share the same upstream provenance. Required: you must explicitly specify which columns link parent to child.

  • id_generation_pattern (str | None, default: None ) –

    Pattern for generating child IDs. Can be "sequential", "hash", or custom. If None, handled by load_input().

Returns:

  • Self

    Configured LineageRelationship for 1:N relationship.

Examples:

>>> # Sequential ID generation with explicit parent ID
>>> spec = FeatureSpec(
...     key="video_frames",
...     id_columns=["video_id", "frame_id"],
...     lineage=LineageRelationship.expansion(
...         on=["video_id"],
...         id_generation_pattern="sequential"
...     )
... )

>>> # Custom ID generation in load_input()
>>> spec = FeatureSpec(
...     key="text_chunks",
...     id_columns=["doc_id", "chunk_id"],
...     lineage=LineageRelationship.expansion(on=["doc_id"])
... )

Source code in src/metaxy/models/lineage.py
@classmethod
def expansion(
    cls,
    on: Sequence[str],
    id_generation_pattern: str | None = None,
) -> Self:
    """Create an expansion (1:N) relationship.

    Args:
        on: Parent ID columns that identify the parent record. Child records with
            the same parent IDs will share the same upstream provenance.
            Required - must explicitly specify which columns link parent to child.
        id_generation_pattern: Pattern for generating child IDs.
            Can be "sequential", "hash", or custom. If None, handled by load_input().

    Returns:
        Configured LineageRelationship for 1:N relationship.

    Examples:
        >>> # Sequential ID generation with explicit parent ID
        >>> spec = FeatureSpec(
        ...     key="video_frames",
        ...     id_columns=["video_id", "frame_id"],
        ...     lineage=LineageRelationship.expansion(
        ...         on=["video_id"],
        ...         id_generation_pattern="sequential"
        ...     )
        ... )

        >>> # Custom ID generation in load_input()
        >>> spec = FeatureSpec(
        ...     key="text_chunks",
        ...     id_columns=["doc_id", "chunk_id"],
        ...     lineage=LineageRelationship.expansion(on=["doc_id"])
        ... )
    """
    return cls(
        relationship=ExpansionRelationship(
            on=on, id_generation_pattern=id_generation_pattern
        )
    )

metaxy.models.lineage.LineageRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

Get columns to aggregate on for this relationship.

Parameters:

  • target_id_columns (Sequence[str]) –

    The target feature's ID columns.

Returns:

  • Sequence[str] | None

    Columns to group by for aggregation, or None if no aggregation needed.

Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self, target_id_columns: Sequence[str]
) -> Sequence[str] | None:
    """Get columns to aggregate on for this relationship.

    Args:
        target_id_columns: The target feature's ID columns.

    Returns:
        Columns to group by for aggregation, or None if no aggregation needed.
    """
    return self.relationship.get_aggregation_columns(target_id_columns)
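To make the delegation concrete, here is a minimal standard-library sketch of the wrapper pattern documented above: classmethod constructors build one of three relationship objects, and `get_aggregation_columns` forwards to whichever one is held. The names `Lineage`, `IdentityRel`, `AggregationRel`, and `ExpansionRel` are hypothetical stand-ins, and frozen dataclasses are used only as an analogue of the `frozen: True` pydantic config.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional, Sequence, Union


# Hypothetical stand-ins for illustration only; not the metaxy classes.
@dataclass(frozen=True)
class IdentityRel:
    def get_aggregation_columns(self, target_id_columns):
        return None  # 1:1, nothing to aggregate


@dataclass(frozen=True)
class AggregationRel:
    on: Optional[Sequence[str]] = None

    def get_aggregation_columns(self, target_id_columns):
        # Fall back to the target's ID columns when `on` is not given.
        return self.on if self.on is not None else target_id_columns


@dataclass(frozen=True)
class ExpansionRel:
    on: Sequence[str]
    id_generation_pattern: Optional[str] = None

    def get_aggregation_columns(self, target_id_columns):
        return None  # 1:N, no aggregation during the join phase


@dataclass(frozen=True)
class Lineage:
    relationship: Union[IdentityRel, AggregationRel, ExpansionRel]

    @classmethod
    def identity(cls):
        return cls(IdentityRel())

    @classmethod
    def aggregation(cls, on=None):
        return cls(AggregationRel(on=on))

    @classmethod
    def expansion(cls, on, id_generation_pattern=None):
        return cls(ExpansionRel(on=on, id_generation_pattern=id_generation_pattern))

    def get_aggregation_columns(self, target_id_columns):
        # Delegate to whichever relationship object is wrapped.
        return self.relationship.get_aggregation_columns(target_id_columns)


target_ids = ["sensor_id", "hour"]
results = {
    "identity": Lineage.identity().get_aggregation_columns(target_ids),
    "aggregation_default": Lineage.aggregation().get_aggregation_columns(target_ids),
    "aggregation_on": Lineage.aggregation(on=["sensor_id"]).get_aggregation_columns(target_ids),
    "expansion": Lineage.expansion(on=["video_id"]).get_aggregation_columns(target_ids),
}

# Frozen instances reject attribute assignment after construction.
wrapper = Lineage.identity()
try:
    wrapper.relationship = AggregationRel()
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Only the aggregation case yields grouping columns; identity and expansion both return `None`, which matches the per-class behaviour documented further down this page.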

metaxy.models.lineage.LineageRelationshipType

Bases: str, Enum

Type of lineage relationship between features.


metaxy.models.lineage.IdentityRelationship pydantic-model

Bases: BaseLineageRelationship

One-to-one relationship where each child row maps to exactly one parent row.

This is the default relationship type. Parent and child features share the same ID columns and have the same cardinality. No aggregation is performed.

Examples:

>>> # Default 1:1 relationship
>>> IdentityRelationship()

>>> # Or use the classmethod
>>> LineageRelationship.identity()

Show JSON schema:
{
  "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n    >>> # Default 1:1 relationship\n    >>> IdentityRelationship()\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.identity()",
  "properties": {
    "type": {
      "const": "1:1",
      "default": "1:1",
      "title": "Type",
      "type": "string"
    }
  },
  "title": "IdentityRelationship",
  "type": "object"
}

Functions

metaxy.models.lineage.IdentityRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

No aggregation needed for identity relationships.

Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str] | None:
    """No aggregation needed for identity relationships."""
    return None

metaxy.models.lineage.ExpansionRelationship pydantic-model

Bases: BaseLineageRelationship

One-to-many relationship where one parent row expands to multiple child rows.

Child features have more granular ID columns than the parent. Each parent row generates multiple child rows with additional ID columns.

Attributes:

  • on (Sequence[str]) –

    Parent ID columns that identify the parent record. Child records with the same parent IDs will share the same upstream provenance. Required: the schema lists on as a required field, so it must be specified explicitly.

  • id_generation_pattern (str | None) –

    Optional pattern for generating child IDs. Can be "sequential", "hash", or a custom pattern. If not specified, the feature's load_input() method is responsible for ID generation.

Examples:

>>> # Video frames from video
>>> ExpansionRelationship(
...     on=["video_id"],  # Parent ID
...     id_generation_pattern="sequential"
... )
>>> # Parent has: video_id
>>> # Child has: video_id, frame_id (generated)

>>> # Text chunks from document
>>> ExpansionRelationship(on=["doc_id"])
>>> # Parent has: doc_id
>>> # Child has: doc_id, chunk_id (generated in load_input)

Show JSON schema:
{
  "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExamples:\n    >>> # Video frames from video\n    >>> ExpansionRelationship(\n    ...     on=[\"video_id\"],  # Parent ID\n    ...     id_generation_pattern=\"sequential\"\n    ... )\n    >>> # Parent has: video_id\n    >>> # Child has: video_id, frame_id (generated)\n\n    >>> # Text chunks from document\n    >>> ExpansionRelationship(on=[\"doc_id\"])\n    >>> # Parent has: doc_id\n    >>> # Child has: doc_id, chunk_id (generated in load_input)",
  "properties": {
    "type": {
      "const": "1:N",
      "default": "1:N",
      "title": "Type",
      "type": "string"
    },
    "on": {
      "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
      "items": {
        "type": "string"
      },
      "title": "On",
      "type": "array"
    },
    "id_generation_pattern": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Pattern for generating child IDs. If None, handled by load_input().",
      "title": "Id Generation Pattern"
    }
  },
  "required": [
    "on"
  ],
  "title": "ExpansionRelationship",
  "type": "object"
}

Attributes

metaxy.models.lineage.ExpansionRelationship.on pydantic-field

on: Sequence[str]

Parent ID columns for grouping. Child records with the same parent IDs share provenance. Required for expansion relationships.

metaxy.models.lineage.ExpansionRelationship.id_generation_pattern pydantic-field

id_generation_pattern: str | None = None

Pattern for generating child IDs. If None, handled by load_input().
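As a rough illustration of what an expansion looks like at the row level, the sketch below splits each parent document into child chunks and then regroups the children by the `on` columns. Everything here is hypothetical: the `expand` helper, the row dictionaries, and the per-parent numbering are one plausible way a `load_input()` implementation or a sequential pattern might assign child IDs, not a description of what metaxy actually does.

```python
from collections import defaultdict

# Hypothetical parent rows, one per document.
parents = [
    {"doc_id": "d1", "text": "alpha beta gamma"},
    {"doc_id": "d2", "text": "delta"},
]


def expand(parent_rows):
    """Emit one child row per word, numbering chunk_id within each parent."""
    children = []
    for row in parent_rows:
        for i, word in enumerate(row["text"].split()):
            children.append({"doc_id": row["doc_id"], "chunk_id": i, "chunk": word})
    return children


children = expand(parents)

# Grouping children by the `on` columns recovers the parent each one came from.
on = ["doc_id"]
by_parent = defaultdict(list)
for child in children:
    parent_key = tuple(child[col] for col in on)
    by_parent[parent_key].append(child["chunk_id"])
by_parent = dict(by_parent)
```

Every child carries the parent's `doc_id`, so rows that agree on the `on` columns trace back to the same parent record, which is the sense in which they share upstream provenance.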

Functions

metaxy.models.lineage.ExpansionRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str] | None

Get aggregation columns for the joiner phase.

For expansion relationships, returns None because aggregation happens during diff resolution, not during joining. The joiner should pass through all parent records without aggregation.

Parameters:

  • target_id_columns (Sequence[str]) –

    The target (child) feature's ID columns.

Returns:

  • Sequence[str] | None

    None - no aggregation during join phase for expansion relationships.

Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str] | None:
    """Get aggregation columns for the joiner phase.

    For expansion relationships, returns None because aggregation
    happens during diff resolution, not during joining. The joiner
    should pass through all parent records without aggregation.

    Args:
        target_id_columns: The target (child) feature's ID columns.

    Returns:
        None - no aggregation during join phase for expansion relationships.
    """
    # Expansion relationships don't aggregate during join phase
    # Aggregation happens later during diff resolution
    return None

metaxy.models.lineage.AggregationRelationship pydantic-model

Bases: BaseLineageRelationship

Many-to-one relationship where multiple parent rows aggregate to one child row.

Parent features have more granular ID columns than the child. The child aggregates multiple parent rows by grouping on a subset of the parent's ID columns.

Attributes:

  • on (Sequence[str] | None) –

    Columns to group by for aggregation. These should be a subset of the target feature's ID columns. If not specified, uses all target ID columns.

Examples:

>>> # Aggregate sensor readings by hour
>>> AggregationRelationship(on=["sensor_id", "hour"])
>>> # Parent has: sensor_id, hour, minute
>>> # Child has: sensor_id, hour

>>> # Or use the classmethod
>>> LineageRelationship.aggregation(on=["user_id", "session_id"])

Show JSON schema:
{
  "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n    >>> # Aggregate sensor readings by hour\n    >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n    >>> # Parent has: sensor_id, hour, minute\n    >>> # Child has: sensor_id, hour\n\n    >>> # Or use the classmethod\n    >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
  "properties": {
    "type": {
      "const": "N:1",
      "default": "N:1",
      "title": "Type",
      "type": "string"
    },
    "on": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
      "title": "On"
    }
  },
  "title": "AggregationRelationship",
  "type": "object"
}

Attributes

metaxy.models.lineage.AggregationRelationship.on pydantic-field

on: Sequence[str] | None = None

Columns to group by for aggregation. Defaults to all target ID columns.

Functions

metaxy.models.lineage.AggregationRelationship.get_aggregation_columns

get_aggregation_columns(target_id_columns: Sequence[str]) -> Sequence[str]

Get columns to aggregate on.

Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str]:
    """Get columns to aggregate on."""
    return self.on if self.on is not None else target_id_columns
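The fallback in the method above can be seen on toy data. This sketch reuses the same `on if on is not None else target_id_columns` rule and then groups some sensor readings by the resolved columns. The helpers `resolve_columns` and `group_rows` and the sample rows are hypothetical and only illustrate the grouping semantics, not how metaxy executes an aggregation.

```python
from collections import defaultdict


def resolve_columns(on, target_id_columns):
    """Same rule as the method above: explicit `on`, else the target's ID columns."""
    return on if on is not None else target_id_columns


def group_rows(rows, columns):
    """Collect each row's value under the tuple of its grouping-column values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in columns)].append(row["value"])
    return dict(groups)


# Parent rows are keyed by (sensor_id, hour, minute); the child by (sensor_id, hour).
readings = [
    {"sensor_id": "s1", "hour": 0, "minute": 0, "value": 1.0},
    {"sensor_id": "s1", "hour": 0, "minute": 30, "value": 3.0},
    {"sensor_id": "s1", "hour": 1, "minute": 0, "value": 5.0},
]
child_ids = ["sensor_id", "hour"]

# on=None falls back to the child's ID columns.
hourly = group_rows(readings, resolve_columns(None, child_ids))

# An explicit `on` overrides the fallback.
per_sensor = group_rows(readings, resolve_columns(["sensor_id"], child_ids))
```

With the default, the two readings in hour 0 collapse into one group and hour 1 forms another; with `on=["sensor_id"]`, all three readings land in a single group.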