kumoai.pquery.TrainingTable#
- class kumoai.pquery.TrainingTable[source]#
Bases:
object
A training table in the Kumo platform. A training table can be initialized from a job ID of a completed training table generation job.
import kumoai

# Create a Training Table from a training table generation job. Note
# that the job ID passed here must be in a completed state:
training_table = kumoai.TrainingTable("gen-traintable-job-...")

# Read the training table as a Pandas DataFrame:
training_df = training_table.data_df()

# Get URLs to download the training table:
training_download_urls = training_table.data_urls()

# Add a weight column to the training table; see
# `kumo-sdk.examples.datasets.weighted_train_table.py`
# for a more detailed example.

# 1. Export the training table:
connector = kumoai.S3Connector("s3_path")
training_table.export(TrainingTableExportConfig(
    output_types={'training_table'},
    output_connector=connector,
    output_table_name="<any_name>",
))

# 2. Assume the weight column was added to the exported table
# and it was saved to the same S3 path as "<mod_name>":
training_table.update(
    SourceTable("<mod_table>", connector),
    TrainingTableSpec(weight_col="weight"),
)
- Parameters:
job_id (
str
) – ID of the training table generation job which generated this training table.
- data_urls()[source]#
Returns a list of URLs that can be used to view the generated training table data. The list will contain more than one element if the table is partitioned; paths are relative to the location of the Kumo data plane.
- data_df()[source]#
Returns a DataFrame object representing the generated training data.
- Return type:
DataFrame
Warning
This method will load the full training table into memory as a DataFrame object. If you are working on a machine with limited resources, please use data_urls() instead to download the data and perform analysis per-partition.
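The per-partition workflow that this warning recommends can be sketched as follows. This is a hypothetical illustration, not part of the Kumo SDK: it assumes each URL returned by data_urls() has already been downloaded to a local file in a format pandas can read (CSV here, with local temp files standing in for the real downloads), and the TARGET column name is invented.

```python
import os
import tempfile
import pandas as pd

def aggregate_partitions(paths):
    """Accumulate statistics one partition at a time, so only a single
    partition is ever held in memory."""
    total_rows, target_sum = 0, 0.0
    for path in paths:
        part = pd.read_csv(path)  # load a single partition
        total_rows += len(part)
        target_sum += part["TARGET"].sum()
    return total_rows, target_sum

# Local CSV files stand in for the downloaded partition URLs:
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(2):
        path = os.path.join(tmp, f"part-{i}.csv")
        pd.DataFrame({"TARGET": [1.0, 2.0]}).to_csv(path, index=False)
        paths.append(path)
    result = aggregate_partitions(paths)  # (4, 6.0)
```

Processing one partition at a time keeps peak memory bounded by the largest partition rather than the full table.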
- export(output_config, non_blocking=True)[source]#
Export the training table to the connector specified in the output config. Use the exported table to add a weight column, then call update() to update the training table.
- Parameters:
- output_config (TrainingTableExportConfig) – The output configuration to write the training table.
- non_blocking (bool) – If True, the method will return a future object (ArtifactExportJob) representing the export job. If False, the method will block until the export job is complete and return an ArtifactExportResult.
- Return type:
Union[ArtifactExportJob, ArtifactExportResult]
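The non_blocking contract above follows the standard future pattern: return a handle to an in-flight job, or block and return the finished result. A minimal generic sketch using Python's concurrent.futures illustrates the design choice (this is illustrative only; ArtifactExportJob is the Kumo SDK's own job handle, not a concurrent.futures.Future):

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def export_job(table_name: str) -> str:
    # Stand-in for a slow server-side export.
    time.sleep(0.1)
    return f"exported:{table_name}"

def export(table_name: str, non_blocking: bool = True):
    """Mimic the export() contract: return a future when non_blocking,
    otherwise block until the job finishes and return its result."""
    executor = ThreadPoolExecutor(max_workers=1)
    future: Future = executor.submit(export_job, table_name)
    executor.shutdown(wait=False)  # let the job run in the background
    if non_blocking:
        return future          # caller decides when to wait
    return future.result()     # block until the job is done

job = export("train_table", non_blocking=True)       # returns immediately
blocking_result = export("train_table", non_blocking=False)  # waits
```

Non-blocking export is useful when you want to kick off the export and continue other work, checking on the job later.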
- validate_custom_table(source_table_type, train_table_mod)[source]#
Validates the modified training table.
- Parameters:
- source_table_type (Union[S3SourceTable, SnowflakeSourceTable, DatabricksSourceTable, BigQuerySourceTable, GlueSourceTable]) – The source table to be used as the modified training table.
- train_table_mod (TrainingTableSpec) – The modification specification.
- Raises:
ValueError – If the modified training table is invalid.
- Return type:
None
- update(source_table, train_table_mod, validate=True)[source]#
Sets the source_table as the modified training table.
Note
The only allowed modification is the addition of a weight column; any other modification may lead to unintended errors during training. Furthermore, negative or NA weight values are not supported.
The custom training table is ingested during trainer.fit() and is used as the training table.
- Parameters:
- source_table (SourceTable) – The source table to be used as the modified training table.
- train_table_mod (TrainingTableSpec) – The modification specification.
- validate (bool) – Whether to validate the modified training table. This can be slow for large tables.
- Return type:
Self
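The step between export() and update() — computing the weight column on the exported table — might look like the following pandas sketch. The column names (ENTITY_ID, CLASS_LABEL) and the upweighting rule are hypothetical; only the requirement that weights be non-negative and non-NA comes from the note above.

```python
import pandas as pd

# Hypothetical exported training table; in practice it would be read
# back from the connector (e.g. S3) rather than built in memory.
train_df = pd.DataFrame({
    "ENTITY_ID": [1, 2, 3, 4],
    "CLASS_LABEL": [0, 0, 0, 1],
})

# Upweight the rare positive class. Weights must be non-negative and
# must not contain NA values.
train_df["weight"] = train_df["CLASS_LABEL"].map({0: 1.0, 1: 3.0})

assert (train_df["weight"] >= 0).all()
assert train_df["weight"].notna().all()
```

The modified table would then be written back to the connector and registered via update() with TrainingTableSpec(weight_col="weight"), as in the class-level example.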