Lineage Relationships
metaxy.models.lineage.LineageRelationship
pydantic-model
Bases: BaseModel
Wrapper class for a lineage relationship configuration, with convenience classmethod constructors.
It provides a cleaner API for creating lineage relationships while keeping them type-safe through a discriminated union keyed on the relationship's type field ("1:1", "1:N" or "N:1").
JSON schema:
{
"$defs": {
"AggregationRelationship": {
"description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n on: Columns to group by for aggregation. These should be a subset of the\n target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n >>> # Aggregate sensor readings by hour\n >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n >>> # Parent has: sensor_id, hour, minute\n >>> # Child has: sensor_id, hour\n\n >>> # Or use the classmethod\n >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
"properties": {
"type": {
"const": "N:1",
"default": "N:1",
"title": "Type",
"type": "string"
},
"on": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "Columns to group by for aggregation. Defaults to all target ID columns.",
"title": "On"
}
},
"title": "AggregationRelationship",
"type": "object"
},
"ExpansionRelationship": {
"description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n on: Parent ID columns that identify the parent record. Child records with\n the same parent IDs will share the same upstream provenance.\n If not specified, will be inferred from the available columns.\n id_generation_pattern: Optional pattern for generating child IDs.\n Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n the feature's load_input() method is responsible for ID generation.\n\nExamples:\n >>> # Video frames from video\n >>> ExpansionRelationship(\n ... on=[\"video_id\"], # Parent ID\n ... id_generation_pattern=\"sequential\"\n ... )\n >>> # Parent has: video_id\n >>> # Child has: video_id, frame_id (generated)\n\n >>> # Text chunks from document\n >>> ExpansionRelationship(on=[\"doc_id\"])\n >>> # Parent has: doc_id\n >>> # Child has: doc_id, chunk_id (generated in load_input)",
"properties": {
"type": {
"const": "1:N",
"default": "1:N",
"title": "Type",
"type": "string"
},
"on": {
"description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
"items": {
"type": "string"
},
"title": "On",
"type": "array"
},
"id_generation_pattern": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Pattern for generating child IDs. If None, handled by load_input().",
"title": "Id Generation Pattern"
}
},
"required": [
"on"
],
"title": "ExpansionRelationship",
"type": "object"
},
"IdentityRelationship": {
"description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n >>> # Default 1:1 relationship\n >>> IdentityRelationship()\n\n >>> # Or use the classmethod\n >>> LineageRelationship.identity()",
"properties": {
"type": {
"const": "1:1",
"default": "1:1",
"title": "Type",
"type": "string"
}
},
"title": "IdentityRelationship",
"type": "object"
}
},
"description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
"properties": {
"relationship": {
"discriminator": {
"mapping": {
"1:1": "#/$defs/IdentityRelationship",
"1:N": "#/$defs/ExpansionRelationship",
"N:1": "#/$defs/AggregationRelationship"
},
"propertyName": "type"
},
"oneOf": [
{
"$ref": "#/$defs/IdentityRelationship"
},
{
"$ref": "#/$defs/AggregationRelationship"
},
{
"$ref": "#/$defs/ExpansionRelationship"
}
],
"title": "Relationship"
}
},
"required": [
"relationship"
],
"title": "LineageRelationship",
"type": "object"
}
Config:
- frozen: True
Fields:
- relationship (LineageRelationshipUnion)
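The relationship field is a discriminated union: the schema's discriminator mapping selects IdentityRelationship, ExpansionRelationship or AggregationRelationship from the value of "type". The sketch below imitates that dispatch with plain dataclasses so it runs without pydantic or metaxy installed. The classes Identity, Aggregation and Expansion and the function parse_relationship are illustrative stand-ins, not the library's implementation; pydantic performs this selection itself when it validates a LineageRelationship.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


# Illustrative stand-ins for the three variants (NOT metaxy's own classes).
@dataclass(frozen=True)
class Identity:
    type: str = "1:1"


@dataclass(frozen=True)
class Aggregation:
    on: Optional[Sequence[str]] = None  # None means "all target ID columns"
    type: str = "N:1"


@dataclass(frozen=True)
class Expansion:
    on: Sequence[str]  # required, mirroring "required": ["on"] in the schema
    id_generation_pattern: Optional[str] = None
    type: str = "1:N"


# Mirrors the schema's discriminator mapping: the "type" value picks the variant.
VARIANTS = {"1:1": Identity, "N:1": Aggregation, "1:N": Expansion}


def parse_relationship(payload: dict):
    """Build the variant named by payload["type"] from the remaining keys."""
    data = dict(payload)
    tag = data.pop("type", None)
    if tag not in VARIANTS:
        raise ValueError(f"unknown relationship type: {tag!r}")
    # A missing required field (e.g. "on" for 1:N) surfaces as a TypeError here.
    return VARIANTS[tag](**data)


rel = parse_relationship({"type": "N:1", "on": ["sensor_id", "hour"]})
```

Because the tag alone decides the class, an unknown "type" fails fast instead of being matched against every variant in turn, which is the main benefit a discriminated union gives over a plain union.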
Functions
metaxy.models.lineage.LineageRelationship.identity
classmethod
Create an identity (1:1) relationship.
Returns:
- Self: Configured LineageRelationship for a 1:1 relationship.
Examples:
Source code in src/metaxy/models/lineage.py
metaxy.models.lineage.LineageRelationship.aggregation
classmethod
Create an aggregation (N:1) relationship.
Parameters:
- on (Sequence[str] | None, default: None): Columns to group by for aggregation. If None, all target ID columns are used.
Returns:
- Self: Configured LineageRelationship for an N:1 relationship.
Examples:
>>> # Aggregate on specific columns
>>> spec = FeatureSpec(
...     key="hourly_stats",
...     id_columns=["sensor_id", "hour"],
...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
... )
>>> # Aggregate on all ID columns (default)
>>> spec = FeatureSpec(
...     key="user_summary",
...     id_columns=["user_id"],
...     lineage=LineageRelationship.aggregation()
... )
Source code in src/metaxy/models/lineage.py
@classmethod
def aggregation(cls, on: Sequence[str] | None = None) -> Self:
    """Create an aggregation (N:1) relationship.

    Args:
        on: Columns to group by for aggregation. If None, uses all target ID columns.

    Returns:
        Configured LineageRelationship for N:1 relationship.

    Examples:
        >>> # Aggregate on specific columns
        >>> spec = FeatureSpec(
        ...     key="hourly_stats",
        ...     id_columns=["sensor_id", "hour"],
        ...     lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"])
        ... )
        >>> # Aggregate on all ID columns (default)
        >>> spec = FeatureSpec(
        ...     key="user_summary",
        ...     id_columns=["user_id"],
        ...     lineage=LineageRelationship.aggregation()
        ... )
    """
    return cls(relationship=AggregationRelationship(on=on))
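The defaulting rule for on can be sketched as a small function. This sketch assumes that AggregationRelationship.get_aggregation_columns (whose source is not shown on this page) returns on when it is given and otherwise falls back to the target ID columns, as its docstring describes. The name resolve_aggregation_columns is hypothetical, and the subset check is an added assumption based on the docstring's "should be a subset" wording.

```python
from typing import Optional, Sequence


def resolve_aggregation_columns(
    on: Optional[Sequence[str]], target_id_columns: Sequence[str]
) -> list[str]:
    """Hypothetical helper: columns that parent rows are grouped by for an N:1 relationship."""
    # An explicit `on` wins; otherwise group by every ID column of the target (child) feature.
    columns = list(on) if on is not None else list(target_id_columns)
    # Assumption: `on` must be a subset of the target ID columns (the docstring says "should").
    missing = [c for c in columns if c not in target_id_columns]
    if missing:
        raise ValueError(f"aggregation columns not in target ID columns: {missing}")
    return columns
```

For the user_summary example above, resolve_aggregation_columns(None, ["user_id"]) yields ["user_id"], the same grouping an explicit on=["user_id"] would give.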
metaxy.models.lineage.LineageRelationship.expansion
classmethod
Create an expansion (1:N) relationship.
Parameters:
- on (Sequence[str]): Parent ID columns that identify the parent record. Child records with the same parent IDs share the same upstream provenance. Required: you must explicitly specify which columns link parent to child.
- id_generation_pattern (str | None, default: None): Pattern for generating child IDs: "sequential", "hash", or custom. If None, ID generation is handled by load_input().
Returns:
- Self: Configured LineageRelationship for a 1:N relationship.
Examples:
>>> # Sequential ID generation with explicit parent ID
>>> spec = FeatureSpec(
...     key="video_frames",
...     id_columns=["video_id", "frame_id"],
...     lineage=LineageRelationship.expansion(
...         on=["video_id"],
...         id_generation_pattern="sequential"
...     )
... )
>>> # Custom ID generation in load_input()
>>> spec = FeatureSpec(
...     key="text_chunks",
...     id_columns=["doc_id", "chunk_id"],
...     lineage=LineageRelationship.expansion(on=["doc_id"])
... )
Source code in src/metaxy/models/lineage.py
@classmethod
def expansion(
    cls,
    on: Sequence[str],
    id_generation_pattern: str | None = None,
) -> Self:
    """Create an expansion (1:N) relationship.

    Args:
        on: Parent ID columns that identify the parent record. Child records with
            the same parent IDs will share the same upstream provenance.
            Required - must explicitly specify which columns link parent to child.
        id_generation_pattern: Pattern for generating child IDs.
            Can be "sequential", "hash", or custom. If None, handled by load_input().

    Returns:
        Configured LineageRelationship for 1:N relationship.

    Examples:
        >>> # Sequential ID generation with explicit parent ID
        >>> spec = FeatureSpec(
        ...     key="video_frames",
        ...     id_columns=["video_id", "frame_id"],
        ...     lineage=LineageRelationship.expansion(
        ...         on=["video_id"],
        ...         id_generation_pattern="sequential"
        ...     )
        ... )
        >>> # Custom ID generation in load_input()
        >>> spec = FeatureSpec(
        ...     key="text_chunks",
        ...     id_columns=["doc_id", "chunk_id"],
        ...     lineage=LineageRelationship.expansion(on=["doc_id"])
        ... )
    """
    return cls(
        relationship=ExpansionRelationship(
            on=on, id_generation_pattern=id_generation_pattern
        )
    )
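To illustrate what id_generation_pattern="sequential" could mean for the video_frames example, here is a sketch that expands each parent row into numbered child rows. This page does not document the exact semantics of "sequential", so numbering from 0 within each parent, the count_of callback, and the helper name expand_sequential are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Sequence


def expand_sequential(
    parents: Iterable[dict],
    on: Sequence[str],
    child_id: str,
    count_of: Callable[[dict], int],
) -> list[dict]:
    """Hypothetical 1:N expansion: emit count_of(parent) child rows per parent row."""
    children = []
    for parent in parents:
        # Each child inherits the parent ID columns listed in `on` ...
        parent_ids = {col: parent[col] for col in on}
        # ... and gains one new ID column, numbered 0, 1, 2, ... within that parent.
        for i in range(count_of(parent)):
            children.append({**parent_ids, child_id: i})
    return children


videos = [{"video_id": "a", "n_frames": 2}, {"video_id": "b", "n_frames": 3}]
frames = expand_sequential(
    videos, on=["video_id"], child_id="frame_id", count_of=lambda row: row["n_frames"]
)
```

The child ID columns end up as the parent's on columns plus the generated column, matching id_columns=["video_id", "frame_id"] in the example spec.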
metaxy.models.lineage.LineageRelationship.get_aggregation_columns
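Get columns to aggregate on for this relationship.
Parameters:
- target_id_columns (Sequence[str]): The target feature's ID columns.
Returns:
- Sequence[str] | None: Columns to group by for aggregation, or None if no aggregation is needed.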
Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self, target_id_columns: Sequence[str]
) -> Sequence[str] | None:
    """Get columns to aggregate on for this relationship.

    Args:
        target_id_columns: The target feature's ID columns.

    Returns:
        Columns to group by for aggregation, or None if no aggregation needed.
    """
    return self.relationship.get_aggregation_columns(target_id_columns)
metaxy.models.lineage.LineageRelationshipType
metaxy.models.lineage.IdentityRelationship
pydantic-model
Bases: BaseLineageRelationship
One-to-one relationship where each child row maps to exactly one parent row.
This is the default relationship type. Parent and child features share the same ID columns and have the same cardinality. No aggregation is performed.
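Examples:
>>> # Default 1:1 relationship
>>> IdentityRelationship()
>>> # Or use the classmethod
>>> LineageRelationship.identity()
JSON schema: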
{
"description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality. No aggregation is performed.\n\nExamples:\n >>> # Default 1:1 relationship\n >>> IdentityRelationship()\n\n >>> # Or use the classmethod\n >>> LineageRelationship.identity()",
"properties": {
"type": {
"const": "1:1",
"default": "1:1",
"title": "Type",
"type": "string"
}
},
"title": "IdentityRelationship",
"type": "object"
}
Fields:
- type (Literal[IDENTITY])
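Under an identity relationship, parent and child share the same ID columns, so lineage reduces to an exact key match. The sketch below uses a hypothetical identity_lineage helper over plain dicts (no metaxy calls) to pair each child row with its single parent. It rejects duplicate parent keys, because a 1:1 mapping cannot hold if a key repeats; that check is an illustrative assumption, not documented library behaviour.

```python
from typing import Sequence


def identity_lineage(
    parents: list[dict], children: list[dict], id_columns: Sequence[str]
) -> list[tuple[dict, dict]]:
    """Hypothetical helper: pair each child row with the one parent row sharing its IDs."""
    index: dict[tuple, dict] = {}
    for parent in parents:
        key = tuple(parent[c] for c in id_columns)
        if key in index:
            raise ValueError(f"duplicate parent key {key}; 1:1 lineage needs unique IDs")
        index[key] = parent
    # Same ID columns on both sides, so no grouping or aggregation is needed.
    # A child without a matching parent raises KeyError.
    return [(child, index[tuple(child[c] for c in id_columns)]) for child in children]


parents = [{"user_id": 1, "x": "a"}, {"user_id": 2, "x": "b"}]
children = [{"user_id": 2, "y": 10}, {"user_id": 1, "y": 20}]
pairs = identity_lineage(parents, children, id_columns=["user_id"])
```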
metaxy.models.lineage.ExpansionRelationship
pydantic-model
Bases: BaseLineageRelationship
One-to-many relationship where one parent row expands to multiple child rows.
Child features have more granular ID columns than the parent. Each parent row generates multiple child rows with additional ID columns.
Attributes:
- on (Sequence[str]): Parent ID columns that identify the parent record. Child records with the same parent IDs share the same upstream provenance. This field is required and is not inferred; you must specify which columns link parent to child.
- id_generation_pattern (str | None): Optional pattern for generating child IDs: "sequential", "hash", or a custom pattern. If not specified, the feature's load_input() method is responsible for ID generation.
Examples:
>>> # Video frames from video
>>> ExpansionRelationship(
...     on=["video_id"],  # Parent ID
...     id_generation_pattern="sequential"
... )
>>> # Parent has: video_id
>>> # Child has: video_id, frame_id (generated)
>>> # Text chunks from document
>>> ExpansionRelationship(on=["doc_id"])
>>> # Parent has: doc_id
>>> # Child has: doc_id, chunk_id (generated in load_input)
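The rule that child records with the same parent IDs share upstream provenance amounts to grouping children by the on columns. In this sketch, which uses a hypothetical group_children_by_parent helper over plain dicts and makes no metaxy calls, every chunk of one document lands in the same group, so all of them would trace back to that document's single upstream record.

```python
from collections import defaultdict
from typing import Sequence


def group_children_by_parent(
    children: list[dict], on: Sequence[str]
) -> dict[tuple, list[dict]]:
    """Hypothetical helper: bucket child rows by the parent ID columns in `on`."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for child in children:
        # The parent key ignores child-only columns such as chunk_id.
        groups[tuple(child[c] for c in on)].append(child)
    return dict(groups)


chunks = [
    {"doc_id": "d1", "chunk_id": 0},
    {"doc_id": "d1", "chunk_id": 1},
    {"doc_id": "d2", "chunk_id": 0},
]
by_doc = group_children_by_parent(chunks, on=["doc_id"])
```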
JSON schema:
{
"description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nAttributes:\n on: Parent ID columns that identify the parent record. Child records with\n the same parent IDs will share the same upstream provenance.\n If not specified, will be inferred from the available columns.\n id_generation_pattern: Optional pattern for generating child IDs.\n Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n the feature's load_input() method is responsible for ID generation.\n\nExamples:\n >>> # Video frames from video\n >>> ExpansionRelationship(\n ... on=[\"video_id\"], # Parent ID\n ... id_generation_pattern=\"sequential\"\n ... )\n >>> # Parent has: video_id\n >>> # Child has: video_id, frame_id (generated)\n\n >>> # Text chunks from document\n >>> ExpansionRelationship(on=[\"doc_id\"])\n >>> # Parent has: doc_id\n >>> # Child has: doc_id, chunk_id (generated in load_input)",
"properties": {
"type": {
"const": "1:N",
"default": "1:N",
"title": "Type",
"type": "string"
},
"on": {
"description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
"items": {
"type": "string"
},
"title": "On",
"type": "array"
},
"id_generation_pattern": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Pattern for generating child IDs. If None, handled by load_input().",
"title": "Id Generation Pattern"
}
},
"required": [
"on"
],
"title": "ExpansionRelationship",
"type": "object"
}
Fields:
Attributes
metaxy.models.lineage.ExpansionRelationship.on
pydantic-field
Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.
metaxy.models.lineage.ExpansionRelationship.id_generation_pattern
pydantic-field
id_generation_pattern: str | None = None
Pattern for generating child IDs. If None, handled by load_input().
Functions
metaxy.models.lineage.ExpansionRelationship.get_aggregation_columns
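Get aggregation columns for the joiner phase.
For expansion relationships, this returns None because aggregation happens during diff resolution, not during joining. The joiner passes all parent records through without aggregation.
Parameters:
- target_id_columns (Sequence[str]): The target (child) feature's ID columns.
Returns:
- Sequence[str] | None: Always None; expansion relationships perform no aggregation during the join phase.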
Source code in src/metaxy/models/lineage.py
def get_aggregation_columns(
    self,
    target_id_columns: Sequence[str],
) -> Sequence[str] | None:
    """Get aggregation columns for the joiner phase.

    For expansion relationships, returns None because aggregation
    happens during diff resolution, not during joining. The joiner
    should pass through all parent records without aggregation.

    Args:
        target_id_columns: The target (child) feature's ID columns.

    Returns:
        None - no aggregation during join phase for expansion relationships.
    """
    # Expansion relationships don't aggregate during join phase
    # Aggregation happens later during diff resolution
    return None
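Because get_aggregation_columns returns None, the joiner hands parent rows through untouched, and the collapse to parent level is deferred to diff resolution. The following is a hypothetical sketch of that later step over plain dicts: child rows are collapsed to one provenance value per parent key and compared with the current parent provenance to find parents whose children must be regenerated. The provenance column and the stale_parent_keys name are illustrative assumptions; this page does not show metaxy's actual diff-resolution code.

```python
from typing import Sequence


def stale_parent_keys(
    parents: list[dict],
    children: list[dict],
    on: Sequence[str],
    provenance_col: str = "provenance",
) -> list[tuple]:
    """Hypothetical diff step: parent keys whose child rows are missing or outdated."""
    # Collapse child rows to one provenance value per parent key; children of the
    # same parent are assumed to carry the same upstream provenance.
    child_provenance: dict[tuple, str] = {}
    for child in children:
        child_provenance[tuple(child[c] for c in on)] = child[provenance_col]
    # A parent needs (re)expansion if it has no children yet, or if they were
    # built from a different parent provenance.
    stale = []
    for parent in parents:
        key = tuple(parent[c] for c in on)
        if child_provenance.get(key) != parent[provenance_col]:
            stale.append(key)
    return stale


parents = [
    {"video_id": "a", "provenance": "h1"},
    {"video_id": "b", "provenance": "h2"},
    {"video_id": "c", "provenance": "h3"},
]
children = [
    {"video_id": "a", "frame_id": 0, "provenance": "h1"},
    {"video_id": "a", "frame_id": 1, "provenance": "h1"},
    {"video_id": "b", "frame_id": 0, "provenance": "h0"},
]
stale = stale_parent_keys(parents, children, on=["video_id"])
```

Deferring the collapse this way keeps the join cheap (no group-by) while still letting the diff work at parent granularity, which is the split the docstring describes.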
metaxy.models.lineage.AggregationRelationship
pydantic-model
Bases: BaseLineageRelationship
Many-to-one relationship where multiple parent rows aggregate to one child row.
Parent features have more granular ID columns than the child. The child aggregates multiple parent rows by grouping on a subset of the parent's ID columns.
Attributes:
- on (Sequence[str] | None): Columns to group by for aggregation. These should be a subset of the target feature's ID columns. If not specified, all target ID columns are used.
Examples:
>>> # Aggregate sensor readings by hour
>>> AggregationRelationship(on=["sensor_id", "hour"])
>>> # Parent has: sensor_id, hour, minute
>>> # Child has: sensor_id, hour
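For the sensor example, many per-minute parent rows collapse into one per-hour child row. The sketch below groups parent rows by the on columns and combines each group's provenance values into a single digest with hashlib.sha256. Hashing the sorted parent provenance strings is an assumption chosen for illustration, not necessarily how metaxy combines provenance, and aggregate_provenance is a hypothetical name.

```python
import hashlib
from collections import defaultdict
from typing import Sequence


def aggregate_provenance(
    parents: list[dict], on: Sequence[str], provenance_col: str = "provenance"
) -> dict[tuple, str]:
    """Hypothetical N:1 step: one combined provenance digest per group of parent rows."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for row in parents:
        # Group by the child's ID columns; the extra parent column (minute) is dropped.
        groups[tuple(row[c] for c in on)].append(row[provenance_col])
    combined = {}
    for key, values in groups.items():
        # Sort first so the digest does not depend on the order parent rows arrive in.
        combined[key] = hashlib.sha256("|".join(sorted(values)).encode()).hexdigest()
    return combined


readings = [
    {"sensor_id": "s1", "hour": 9, "minute": 0, "provenance": "p0"},
    {"sensor_id": "s1", "hour": 9, "minute": 1, "provenance": "p1"},
    {"sensor_id": "s1", "hour": 10, "minute": 0, "provenance": "p2"},
]
hourly = aggregate_provenance(readings, on=["sensor_id", "hour"])
```

A change to any single minute-level reading changes the digest of its hour, so only that hourly child would need recomputing.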
JSON schema:
{
"description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nAttributes:\n on: Columns to group by for aggregation. These should be a subset of the\n target feature's ID columns. If not specified, uses all target ID columns.\n\nExamples:\n >>> # Aggregate sensor readings by hour\n >>> AggregationRelationship(on=[\"sensor_id\", \"hour\"])\n >>> # Parent has: sensor_id, hour, minute\n >>> # Child has: sensor_id, hour\n\n >>> # Or use the classmethod\n >>> LineageRelationship.aggregation(on=[\"user_id\", \"session_id\"])",
"properties": {
"type": {
"const": "N:1",
"default": "N:1",
"title": "Type",
"type": "string"
},
"on": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "Columns to group by for aggregation. Defaults to all target ID columns.",
"title": "On"
}
},
"title": "AggregationRelationship",
"type": "object"
}
Fields: