Optional Dependencies¶
By default, feature dependencies use inner joins - a sample only exists in the downstream feature if it exists in all upstream dependencies. Optional dependencies allow you to preserve samples even when some upstream data is missing by using left joins instead.
Overview¶
When building features with multiple dependencies, you often want certain dependencies to be "optional" - meaning the downstream feature should still include samples even if the optional dependency has no matching data.
This is controlled by the optional parameter on FeatureDep.
from metaxy import FeatureSpec, FeatureDep
spec = FeatureSpec(
key="enriched/video",
id_columns=["video_id"],
deps=[
FeatureDep(feature=RawVideo), # Required (default)
FeatureDep(feature=AudioTranscript, optional=True), # Optional
],
fields=["analysis"],
)
Join Behavior¶
| Dependency Type | Join Type | Behavior |
|---|---|---|
Required (optional=False) |
Inner join | Sample must exist in both features |
Optional (optional=True) |
Left join | Sample preserved even if no match in optional dep |
| All optional | Outer join | Any row from any dependency can pass through |
Required dependencies are joined first (inner join), then optional dependencies are joined (left join). If all dependencies are optional, outer joins are used instead to allow any row to pass through.
When to Use Optional Dependencies¶
Enrichment Pattern¶
Use optional dependencies when you have a base feature that should always be present, with optional enrichment data that may or may not exist:
class VideoAnalysis(
BaseFeature,
spec=FeatureSpec(
key="video/analysis",
id_columns=["video_id"],
deps=[
FeatureDep(feature=RawVideo), # Required - defines sample universe
FeatureDep(
feature=AudioTranscription, optional=True
), # Optional enrichment
FeatureDep(feature=ManualAnnotations, optional=True), # Optional enrichment
],
fields=["analysis"],
),
):
pass
In this example:
- All samples from
RawVideoare preserved - If
AudioTranscriptionexists for a sample, it's included; otherwise, those columns areNULL - If
ManualAnnotationsexists for a sample, it's included; otherwise, those columns areNULL
Multi-Source Fusion¶
Use optional dependencies when combining data from multiple sources where not all sources have all samples:
class UnifiedUserProfile(
BaseFeature,
spec=FeatureSpec(
key="user/unified",
id_columns=["user_id"],
deps=[
FeatureDep(feature=CoreUserData), # Required base
FeatureDep(feature=SocialMediaData, optional=True),
FeatureDep(feature=PurchaseHistory, optional=True),
],
fields=["profile"],
),
):
pass
Provenance with Optional Dependencies¶
Metaxy tracks provenance correctly even when optional dependencies have missing data. When an optional dependency has no match, an empty string is used in provenance calculation. This means samples with vs without optional data will have different provenance hashes, which is correct behavior since the sample's lineage genuinely differs based on what upstream data contributed to it.