kumoai.pquery.PredictionTable

class kumoai.pquery.PredictionTable

Bases: object

A prediction table in the Kumo platform. A prediction table can be initialized either from the job ID of a completed prediction table generation job, or from a path on a supported object store (S3 for a SaaS or Databricks deployment, and Snowflake session storage for a Snowflake deployment).

Warning

Custom prediction tables are an experimental feature; please work with your Kumo point of contact (POC) to ensure you are using them correctly!

import kumoai

# Create a Prediction Table from a prediction table generation job.
# Note that the job ID passed here must be in a completed state:
prediction_table = kumoai.PredictionTable("gen-predtable-job-...")

# Read the prediction table as a Pandas DataFrame:
prediction_df = prediction_table.data_df()

# Get URLs to download the prediction table:
prediction_download_urls = prediction_table.data_urls()

Parameters:
  • job_id (Optional[str]) – ID of the prediction table generation job which generated this prediction table. If a custom table data path is specified, this parameter should be left as None.

  • table_data_path (Optional[str]) – S3 path of the table data location, for which Kumo must at least have read access. If a job ID is specified, this parameter should be left as None.
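
The two parameters above are documented as mutually exclusive. A minimal sketch of how calling code might check this before constructing the table follows. `resolve_table_source` is a hypothetical helper and not part of kumoai, and rejecting the case where both values are None is an assumption rather than documented behavior.

```python
from typing import Dict, Optional


def resolve_table_source(
    job_id: Optional[str] = None,
    table_data_path: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper (not part of kumoai).

    Returns the single keyword argument that should be forwarded to
    PredictionTable, or raises if the combination is ambiguous.
    """
    if job_id is not None and table_data_path is not None:
        raise ValueError("Pass either job_id or table_data_path, not both.")
    if job_id is None and table_data_path is None:
        # Assumption: a table with neither source is not meaningful.
        raise ValueError("Pass one of job_id or table_data_path.")
    if job_id is not None:
        return {"job_id": job_id}
    return {"table_data_path": table_data_path}
```

The returned dictionary could then be unpacked into the constructor, for example `kumoai.PredictionTable(**resolve_table_source(job_id=my_job_id))`, where `my_job_id` is a placeholder for a real completed job ID.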

__init__(job_id=None, table_data_path=None)

data_urls()

Returns a list of URLs that can be used to view generated prediction table data; if a custom data path was passed, this path is simply returned.

The list will contain more than one element if the table is partitioned; paths will be relative to the location of the Kumo data plane.

Return type:

List[str]

data_df()

Returns a Pandas DataFrame object representing the generated or custom-specified prediction table data.

Return type:

DataFrame

Warning

This method will load the full prediction table into memory as a DataFrame object. If you are working on a machine with limited resources, please use data_urls() instead to download the data and perform analysis per-partition.
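
The per-partition approach mentioned in the warning can be sketched as a loop that holds only one partition in memory at a time. This is an illustration under assumptions, not a Kumo-provided utility: `summarize_partitions` is a hypothetical helper, and because the file format behind each URL is not stated here, loading is left to a caller-supplied function.

```python
from typing import Callable, Iterable, List, TypeVar

Partition = TypeVar("Partition")
Result = TypeVar("Result")


def summarize_partitions(
    urls: Iterable[str],
    load_partition: Callable[[str], Partition],
    summarize: Callable[[Partition], Result],
) -> List[Result]:
    """Hypothetical helper: reduce each partition to a small summary.

    Only one loaded partition is referenced at a time, so peak memory is
    bounded by the largest partition rather than by the whole table.
    """
    results: List[Result] = []
    for url in urls:
        partition = load_partition(url)  # caller decides how to read the URL
        results.append(summarize(partition))
        del partition  # drop the reference before loading the next one
    return results
```

With a prediction table in hand, the list returned by `data_urls()` would be passed as `urls`, together with a loader suited to the actual storage format and a summary function such as `len`.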

property anchor_time: Optional[datetime]

Returns the anchor time corresponding to the generated prediction table data, if the data was not custom-specified.
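
Because the property is typed `Optional[datetime]`, calling code should be prepared for a missing value. The sketch below assumes that `None` is what a custom-specified table yields, which the type hint suggests but the description does not confirm. `describe_anchor_time` is a hypothetical helper, not part of kumoai.

```python
from datetime import datetime
from typing import Optional


def describe_anchor_time(anchor_time: Optional[datetime]) -> str:
    """Hypothetical helper: render an anchor time for logs or reports."""
    if anchor_time is None:
        # Assumed to be the case for custom-specified table data.
        return "no anchor time available"
    return anchor_time.isoformat()
```

It would be called as `describe_anchor_time(prediction_table.anchor_time)`.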