kumoai.trainer.DistillationPlan

class kumoai.trainer.DistillationPlan

Bases: object

Defines attributes that affect the features/interactions used to train the online serving model.

Variables:
  • embedding_keys – (list[str]) Key column(s) in the entity table used to extract embeddings from the deep model during distillation. A primary key extracts from the entity table itself; foreign key(s) extract from their corresponding 1-hop neighbor table(s).

  • max_embedding_offset – (TimeOffset) Maximum staleness of deep model embeddings relative to the anchor time. Defines the upper bound on the offset between the embedding seed time and the anchor time.

  • min_embedding_offset – (TimeOffset) Minimum staleness of deep model embeddings relative to the anchor time. Defines the lower bound on the offset between the embedding seed time and the anchor time. Models the latency between embedding generation and its availability at serving.

  • real_time_interactions – (dict[str, int]) Real-time interaction key paths mapped to the maximum number of recent interactions to incorporate at inference time. For entity-level predictions, formatted as 'entityTable.pkeyCol->interactionTable.fkeyCol'. For fact-level predictions, formatted as 'factTable.fkeyCol->entityTable.pkeyCol->interactionTable.fkeyCol'. Examples: {'users.id->orders.user_id': 32}, {'orders.user_id->users.id->views.user_id': 16}. (default: {}).

  • real_time_offset – (TimeOffset) Minimum offset between the anchor time and the most recent real-time interaction available at serving. Models the end-to-end ingestion latency of the real-time data pipeline; interactions arriving within this window of the anchor time are excluded.
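The three offsets above are easiest to read as time windows measured back from the anchor time. The following is a minimal sketch of that reading, not the library's implementation: it uses plain `datetime.timedelta` values in place of `TimeOffset`, the helper names `embedding_seed_window` and `visible_interactions` are hypothetical, and treating an interaction exactly at the cutoff as available is an assumption.

```python
from datetime import datetime, timedelta


def embedding_seed_window(anchor, min_offset, max_offset):
    """Return the (earliest, latest) embedding seed times allowed for an anchor time.

    Hypothetical helper: max_offset bounds how stale an embedding may be,
    min_offset models the delay before a fresh embedding is servable.
    """
    if min_offset > max_offset:
        raise ValueError("min_offset must not exceed max_offset")
    return anchor - max_offset, anchor - min_offset


def visible_interactions(timestamps, anchor, real_time_offset, max_count):
    """Return up to max_count of the most recent interactions usable at serving.

    Hypothetical helper: interactions newer than anchor - real_time_offset are
    dropped (assumed not yet ingested); the boundary itself is kept by assumption.
    """
    cutoff = anchor - real_time_offset
    eligible = sorted(t for t in timestamps if t <= cutoff)
    # Guard against max_count == 0, where eligible[-0:] would return everything.
    return eligible[-max_count:] if max_count > 0 else []


anchor = datetime(2024, 1, 10, 12, 0)
earliest, latest = embedding_seed_window(
    anchor, min_offset=timedelta(hours=1), max_offset=timedelta(days=1)
)

events = [anchor - timedelta(minutes=m) for m in (1, 10, 20, 30)]
recent = visible_interactions(events, anchor, timedelta(minutes=5), max_count=2)
```

Here the interaction one minute before the anchor falls inside the five-minute ingestion window and is excluded, so `recent` holds the interactions from 20 and 10 minutes before the anchor.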

__init__(embedding_keys=???, max_embedding_offset=???, min_embedding_offset=???, real_time_interactions={}, real_time_offset=???)

Field metadata: embedding_keys has min_length=1. Every field has tunable=False and hidden=False, with valid_task_types=[binary_classification, multiclass_classification, multilabel_classification, multilabel_ranking, regression, temporal_link_prediction, static_link_prediction, forecasting, link_prediction] and valid_query_types=[static, temporal].
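Before building a DistillationPlan, it can help to sanity-check the real_time_interactions mapping against the key path formats described above. The sketch below is an illustration only and is not part of kumoai: `parse_key_path` and `check_real_time_interactions` are hypothetical names, and requiring each limit to be a positive integer is an assumption the documentation does not state.

```python
def parse_key_path(path):
    """Split 'table.col->table.col[->table.col]' into (table, column) hops.

    Hypothetical helper: accepts the two-hop entity-level form and the
    three-hop fact-level form, and rejects anything else.
    """
    hops = []
    for part in path.split("->"):
        table, sep, column = part.strip().partition(".")
        if not sep or not table or not column or "." in column:
            raise ValueError(f"malformed hop {part!r} in key path {path!r}")
        hops.append((table, column))
    if len(hops) not in (2, 3):
        raise ValueError(f"expected 2 or 3 hops in {path!r}, got {len(hops)}")
    return hops


def check_real_time_interactions(mapping):
    """Validate every key path and limit, returning the mapping unchanged."""
    for path, limit in mapping.items():
        parse_key_path(path)
        # Assumption: a limit below 1 is never meaningful.
        if not isinstance(limit, int) or limit < 1:
            raise ValueError(f"limit for {path!r} must be a positive int")
    return mapping


entity_level = parse_key_path("users.id->orders.user_id")
fact_level = parse_key_path("orders.user_id->users.id->views.user_id")
checked = check_real_time_interactions({"users.id->orders.user_id": 32})
```

The check is purely syntactic: it does not confirm that the tables or columns exist in a graph, which only the service itself can verify.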