kumoai.pquery.TrainingTable
- class kumoai.pquery.TrainingTable
Bases: object
A training table in the Kumo platform. A training table can be initialized from a job ID of a completed training table generation job.
    import kumoai

    # Note: TrainingTableExportConfig, SourceTable, and TrainingTableSpec
    # must also be imported from the SDK.

    # Create a training table from a training table generation job. Note
    # that the job ID passed here must be in a completed state:
    training_table = kumoai.TrainingTable("gen-traintable-job-...")

    # Read the training table as a Pandas DataFrame:
    training_df = training_table.data_df()

    # Get URLs to download the training table:
    training_download_urls = training_table.data_urls()

    # Add a weight column to the training table; see
    # `kumo-sdk.examples.datasets.weighted_train_table.py`
    # for a more detailed example.

    # 1. Export the training table:
    connector = kumoai.S3Connector("s3_path")
    training_table.export(TrainingTableExportConfig(
        output_types={'training_table'},
        output_connector=connector,
        output_table_name="<any_name>"))

    # 2. Assume the weight column was added to the training table
    # and it was saved to the same S3 path as "<mod_table>":
    training_table.update(SourceTable("<mod_table>", connector),
                          TrainingTableSpec(weight_col="weight"))
- Parameters:
job_id (str) – ID of the training table generation job that generated this training table.
- data_urls()
Returns a list of URLs that can be used to view generated training table data. The list will contain more than one element if the table is partitioned; paths will be relative to the location of the Kumo data plane.
- data_df()
Returns a DataFrame object representing the generated training data.
- Return type:
DataFrame
Warning
This method will load the full training table into memory as a DataFrame object. If you are working on a machine with limited resources, please use data_urls() instead to download the data and perform analysis per partition.
- export(output_config, non_blocking=True)
Exports the training table to the connector specified in the output config. Use the exported table to add a weight column, then call update() to update the training table.
- Parameters:
output_config (TrainingTableExportConfig) – The output configuration used to write the training table.
non_blocking (bool) – If True, the method returns a future object, ArtifactExportJob, representing the export job. If False, the method blocks until the export job is complete and returns an ArtifactExportResult.
- Return type:
Union[ArtifactExportJob, ArtifactExportResult]
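Because export returns either a job handle or a finished result depending on non_blocking, calling code may want to normalize the two cases. The sketch below uses stand-in classes (`FakeExportJob`, `FakeExportResult`) rather than the real ArtifactExportJob and ArtifactExportResult, and it assumes the job handle exposes a blocking `result()` method; check the SDK reference for the actual interface before relying on it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeExportResult:
    """Stand-in for ArtifactExportResult (hypothetical fields)."""
    job_id: str


class FakeExportJob:
    """Stand-in for ArtifactExportJob; result() is an assumed method."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id

    def result(self) -> FakeExportResult:
        # A real job handle would block here until the export finishes.
        return FakeExportResult(job_id=self.job_id)


def wait_for_export(
    outcome: Union[FakeExportJob, FakeExportResult],
) -> FakeExportResult:
    """Return a finished result whether or not the call was blocking."""
    if isinstance(outcome, FakeExportResult):
        return outcome  # non_blocking=False: already complete
    return outcome.result()  # non_blocking=True: wait on the job handle


blocking = wait_for_export(FakeExportResult(job_id="export-job-a"))
deferred = wait_for_export(FakeExportJob(job_id="export-job-b"))
```

Normalizing at one place keeps the rest of a pipeline independent of which mode was used for the export call.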
- validate_custom_table(source_table_type, train_table_mod)
Validates the modified training table.
- Parameters:
source_table (SourceTable) – The source table to be used as the modified training table.
train_table_mod (TrainTableSpec) – The modification specification.
- Raises:
ValueError – If the modified training table is invalid.
- Return type:
- update(source_table, train_table_mod, validate=True)
Sets the source_table as the modified training table.
Note
The only allowed modification is the addition of a weight column. Any other modification might lead to unintended errors downstream.
The custom training table is ingested during trainer.fit() and is used as the training table.
- Parameters:
source_table (SourceTable) – The source table to be used as the modified training table.
train_table_mod (TrainTableSpec) – The modification specification.
validate (bool) – Whether to validate the modified training table. This can be slow for large tables.
- Return type:
Self
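As an illustration of the kind of modification update expects, the sketch below adds a class-balancing weight column to a few in-memory rows and checks that nothing else changed. It is a plain-Python stand-in for work you would normally do on the exported table; `add_balanced_weights`, `only_weight_added`, and the `ENTITY` and `TARGET` columns are hypothetical, and the check is a local sanity test, not a reimplementation of the validation Kumo performs.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def add_balanced_weights(
    rows: List[Row], label_col: str, weight_col: str = "weight"
) -> List[Row]:
    """Return copies of rows with an inverse-class-frequency weight."""
    counts = Counter(row[label_col] for row in rows)
    n_classes = len(counts)
    total = len(rows)
    return [
        {**row, weight_col: total / (n_classes * counts[row[label_col]])}
        for row in rows
    ]


def only_weight_added(
    original: List[Row], modified: List[Row], weight_col: str = "weight"
) -> bool:
    """Check that modified equals original plus exactly one new column."""
    if len(original) != len(modified):
        return False
    for before, after in zip(original, modified):
        if weight_col in before or weight_col not in after:
            return False
        rest = {k: v for k, v in after.items() if k != weight_col}
        if rest != before:
            return False
    return True


rows = [
    {"ENTITY": "a", "TARGET": 1},
    {"ENTITY": "b", "TARGET": 0},
    {"ENTITY": "c", "TARGET": 0},
    {"ENTITY": "d", "TARGET": 0},
]
weighted = add_balanced_weights(rows, label_col="TARGET")
```

Running a check like `only_weight_added(rows, weighted)` before saving the modified table is a cheap way to catch accidental column or row changes, which the note above warns can cause downstream errors.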