kumoai.pquery.PredictiveQuery

class kumoai.pquery.PredictiveQuery

Bases: object

The Kumo predictive query is a declarative syntax for describing a machine learning task. Predictive queries are written using the predictive query language (PQL), a concise SQL-like syntax that allows you to define a model for a new business problem.

A predictive query object can be created from a Graph and a query string. For information on the construction of a query string, please visit the Kumo documentation.

import kumoai

# See `Graph` documentation for more information:
graph = kumoai.Graph(...)

# Create a predictive query representing a machine learning problem
# over this Graph:
pquery = kumoai.PredictiveQuery(
    graph=graph,
    query=(
        "PREDICT MAX(transaction.Quantity, 0, 30) "
        "FOR EACH customer.CustomerID"
    ),
)

# Validate the predictive query configuration, for syntax and
# correctness:
pquery.validate(verbose=True)

# Get the machine learning task type corresponding to this predictive
# query (e.g. binary classification, regression, link prediction, etc.)
print(pquery.get_task_type())

# Suggest a training table generation plan and use it to generate a
# training table from this query, to be used in `Trainer.fit`:
training_table_plan = pquery.suggest_training_table_plan()
training_table = pquery.generate_training_table(training_table_plan)

# Suggest a prediction table generation plan and use it to generate a
# prediction table from this query, to be used in `Trainer.predict`:
pred_table_plan = pquery.suggest_prediction_table_plan()
pred_table = pquery.generate_prediction_table(pred_table_plan)
Parameters:
  • graph (Graph) – The Graph object which the predictive query is defined over.

  • query (str) – A string representation of the predictive query.

__init__(graph, query)
property id: str

Returns the unique ID for this predictive query, determined from its schema and the schema of its associated graph. Two queries that differ in either their syntax or their graph will have different IDs.
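
The exact scheme Kumo uses to derive this ID is not described here. As a minimal sketch, assuming a content-addressed design, an ID with the stated property can be obtained by hashing the query string together with a canonical serialization of the graph schema. The function hypothetical_pquery_id, the dictionary form of the schema, and the "pquery-" prefix (borrowed from the save() example) are all assumptions for illustration.

```python
import hashlib
import json


def hypothetical_pquery_id(query: str, graph_schema: dict) -> str:
    """Hypothetical content-addressed ID; not Kumo's actual algorithm."""
    # Serialize the schema with sorted keys so equal schemas yield equal bytes.
    schema_blob = json.dumps(graph_schema, sort_keys=True)
    digest = hashlib.sha256()
    digest.update(query.encode("utf-8"))
    # A separator byte keeps the query and schema parts distinct in the hash input.
    digest.update(b"\x00")
    digest.update(schema_blob.encode("utf-8"))
    return "pquery-" + digest.hexdigest()[:16]
```

Because the schema is serialized with sorted keys, two equal schemas hash identically regardless of dictionary key order, while changing either the query text or the schema changes the ID.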

property train_table: Union[TrainingTable, TrainingTableJob]

Returns the training table that was last generated by this predictive query. If the predictive query has not yet generated a training table, raises a ValueError.

Note that the training table may be of type TrainingTable or TrainingTableJob, depending on whether the training table was generated with or without waiting for its completion, respectively.
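
The "last generated table, else ValueError" behavior of train_table (and, symmetrically, prediction_table) can be sketched with a plain Python property. LastTableTracker is a hypothetical stand-in, not the kumoai class; it only shows the caching and error pattern.

```python
from typing import Optional


class LastTableTracker:
    """Hypothetical stand-in showing how a query object might remember
    its most recently generated training table."""

    def __init__(self) -> None:
        self._last_train_table: Optional[object] = None

    def generate_training_table(self, table: object) -> object:
        # In this sketch the "generated" table is passed in directly;
        # each call replaces the previously stored table.
        self._last_train_table = table
        return table

    @property
    def train_table(self) -> object:
        # Mirror the documented behavior: fail until something was generated.
        if self._last_train_table is None:
            raise ValueError("No training table has been generated yet.")
        return self._last_train_table
```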

property prediction_table: Union[PredictionTable, PredictionTableJob]

Returns the prediction table that was last generated by this predictive query. If the predictive query has not yet generated a prediction table, raises a ValueError.

Note that the prediction table may be of type PredictionTable or PredictionTableJob, depending on whether the prediction table was generated with or without waiting for its completion, respectively.

get_task_type()

Returns the task type of this predictive query. The task type of the query corresponds to the machine learning problem that this query translates to in the Kumo platform; for more information about possible task types, please visit the Kumo documentation.

Return type:

TaskType

validate(verbose=True)

Validates the syntax of this predictive query, ensuring that the query is formulated correctly in Kumo’s Predictive Query Language and that the query makes semantic sense (defines a suitable predictive problem) on this Graph.

Parameters:

verbose (bool) – Whether to log non-error output of this validation.

Raises:

ValueError – if validation fails.

Return type:

Self

Example

>>> import kumoai
>>> query = kumoai.PredictiveQuery(...)  
>>> query.validate()  
ValidationResponse(warnings=[], errors=[])
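
To make the documented contract concrete (errors cause a ValueError, while warnings are non-error output controlled by verbose), here is a minimal sketch. The ValidationResponse dataclass below is a local stand-in whose fields mirror the repr in the example above; check_response is a hypothetical helper, not part of kumoai.

```python
import logging
from dataclasses import dataclass, field
from typing import List

logger = logging.getLogger(__name__)


@dataclass
class ValidationResponse:
    # Local stand-in; field names copied from the example output above.
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def check_response(response: ValidationResponse, verbose: bool = True) -> ValidationResponse:
    """Hypothetical helper: raise on errors, optionally log warnings."""
    if response.errors:
        # Any error means validation failed.
        raise ValueError("Validation failed: " + "; ".join(response.errors))
    if verbose:
        # Warnings are non-error output, logged only when verbose is set.
        for warning in response.warnings:
            logger.warning(warning)
    return response
```
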
save()

Saves a predictive query to Kumo, returning a unique ID for this query. The unique ID can later be used to load the predictive query object.

Return type:

str

Example

>>> import kumoai
>>> query = kumoai.PredictiveQuery(...)  
>>> query.save()  
pquery-xxx
save_as_template(name)

Saves a predictive query to Kumo as a named, reusable template and returns the saved name. Use this method to give a query configuration a name so that it can easily be reused later.

Parameters:

name (str) – The name of the template to save the query as. If the name is already associated with another query, that query will be overwritten.

Return type:

str

Example

>>> import kumoai
>>> query = kumoai.PredictiveQuery(...)  
>>> query.save_as_template("name")  
>>> loaded = kumoai.PredictiveQuery.load("name")  
>>> loaded == query  
True
classmethod load(pq_id_or_template)

Loads a predictive query from either a predictive query ID or a named template. Returns a PredictiveQuery object that contains the loaded query along with its associated graph, tables, etc.

Return type:

PredictiveQuery
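
The relationship between save(), save_as_template(), and load() can be sketched with an in-memory store: save() returns an ID, save_as_template() binds a name (overwriting any earlier binding), and load() accepts either. InMemoryQueryStore, its hash-derived IDs, and the lookup order (IDs before template names) are assumptions for illustration; the real methods persist to Kumo and load() returns a PredictiveQuery rather than a string.

```python
import hashlib
from typing import Dict


class InMemoryQueryStore:
    """Hypothetical in-memory stand-in for Kumo's query storage."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}
        self._templates: Dict[str, str] = {}

    def save(self, query: str) -> str:
        # Derive the ID from the query text, so saving the same query
        # twice returns the same ID.
        query_id = "pquery-" + hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
        self._by_id[query_id] = query
        return query_id

    def save_as_template(self, name: str, query: str) -> str:
        # Saving under an existing name overwrites the earlier query.
        self._templates[name] = query
        return name

    def load(self, id_or_template: str) -> str:
        # Resolve IDs first, then template names.
        if id_or_template in self._by_id:
            return self._by_id[id_or_template]
        if id_or_template in self._templates:
            return self._templates[id_or_template]
        raise KeyError(id_or_template)
```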

classmethod load_from_training_job(training_job_id)

Loads a predictive query from a training job, regardless of the training job’s status. Returns a PredictiveQuery object that contains the loaded query along with its associated graph, tables, etc.

Return type:

PredictiveQuery

generate_training_table(plan=None, *, non_blocking=False, custom_tags={})

Generates a training table from the specified query string.

Parameters:
  • plan (Optional[TrainingTableGenerationPlan]) – A specification of the parameters for training table generation. If not provided, a default plan is inferred from the query and graph; it is equivalent to the plan returned by suggest_training_table_plan(run_mode=RunMode.NORMAL).

  • non_blocking (bool) – Whether this operation should return immediately after launching the training table generation job, or await completion of the generated training table.

  • custom_tags (Mapping[str, str]) – Additional customer-defined key-value tags to associate with the launched job. Job tags are useful for grouping and searching jobs.

Returns:

If non_blocking=False, returns a TrainingTable. If non_blocking=True, returns a TrainingTableJob, a future for the in-progress training table generation job.

Return type:

Union[TrainingTable, TrainingTableJob]
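
The non_blocking flag follows a familiar future pattern. As a rough analogy using only the standard library, concurrent.futures.Future stands in for TrainingTableJob and the returned value stands in for TrainingTable; launch is a hypothetical helper and says nothing about how kumoai actually schedules jobs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Union

_executor = ThreadPoolExecutor(max_workers=2)


def launch(job: Callable[[], str], non_blocking: bool = False) -> Union[str, Future]:
    """Hypothetical launcher illustrating blocking vs. non-blocking returns."""
    # Start the work in the background in either case.
    future = _executor.submit(job)
    if non_blocking:
        # Return immediately; the caller can wait on the future later.
        return future
    # Otherwise wait for completion and return the finished result.
    return future.result()
```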

generate_prediction_table(plan=None, *, non_blocking=False, custom_tags={})

Generates a prediction table from the predictive query string.

Parameters:
  • plan (Optional[PredictionTableGenerationPlan]) – A specification of the parameters for prediction table generation. If not provided, a default plan is inferred from the query and graph; it is equivalent to the plan returned by suggest_prediction_table_plan(run_mode=RunMode.NORMAL).

  • non_blocking (bool) – Whether this operation should return immediately after launching the prediction table generation job, or await completion of the generated prediction table.

  • custom_tags (Mapping[str, str]) – Additional customer-defined key-value tags to associate with the launched job. Job tags are useful for grouping and searching jobs.

Returns:

If non_blocking=False, returns a PredictionTable. If non_blocking=True, returns a PredictionTableJob, a future for the in-progress prediction table generation job.

Return type:

Union[PredictionTable, PredictionTableJob]
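
Job tags are plain string key-value pairs, so grouping or searching jobs by tag amounts to matching on those pairs. The job records and filter_jobs_by_tags below are hypothetical and do not represent a Kumo search API; they only illustrate subset matching on tags such as custom_tags={"team": "growth"}.

```python
from typing import Dict, List, Mapping


def filter_jobs_by_tags(jobs: List[Dict], tags: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: return IDs of jobs carrying every requested tag."""
    return [
        job["id"]
        for job in jobs
        # A job matches only if each requested key maps to the requested value.
        if all(job["tags"].get(key) == value for key, value in tags.items())
    ]
```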

suggest_training_table_plan(run_mode=RunMode.FAST)

Suggests a training table generation plan given the predictive query and graph. This training table generation plan can be used to alter the approach Kumo uses to generate the training table for your predictive query.

Parameters:

run_mode (RunMode) – A representation of how quickly you would like your predictive query to complete. Faster run modes correspond to lower training times, at the cost of potentially lower performance.

Return type:

TrainingTableGenerationPlan

suggest_prediction_table_plan()

Suggests a prediction table generation plan given the predictive query and graph. This prediction table generation plan can be used to alter the approach Kumo uses to generate the prediction table for your predictive query.

Return type:

PredictionTableGenerationPlan

suggest_model_plan(run_mode=RunMode.FAST)

Suggests a modeling plan given the predictive query and graph. This model plan can be used to alter the approach Kumo uses to train your machine learning model.

Parameters:

run_mode (RunMode) – A representation of how quickly you would like your predictive query to complete. Faster run modes correspond to lower training times, at the cost of potentially lower performance.

Return type:

ModelPlan

fit(training_table_plan=None, model_plan=None, *, non_blocking=False)

Trains a Kumo model on this predictive query, given optional additional specifications of the training table generation plan and the model plan.

Parameters:
  • training_table_plan (Optional[TrainingTableGenerationPlan]) – A specification of the parameters for training table generation. If not provided, a default plan is inferred from the query and graph; it is equivalent to the plan returned by suggest_training_table_plan(run_mode=RunMode.NORMAL).

  • model_plan (Optional[ModelPlan]) – A specification of the parameters for model training. If not provided, a default plan is inferred from the query and graph; it is equivalent to the plan returned by suggest_model_plan(run_mode=RunMode.NORMAL).

  • non_blocking (bool) – Whether this operation should return immediately after launching the training job, or await completion of the training job.

Returns:

A tuple with two elements. The first element is the Trainer object used to launch the training job. The second element is either a TrainingJobResult (if non_blocking=False) or a TrainingJob, a future for the in-progress training job (if non_blocking=True).

Return type:

Tuple[Trainer, Union[TrainingJobResult, TrainingJob]]
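
Note the asymmetry in defaults: the suggest_* methods default to RunMode.FAST, whereas an omitted plan in fit() (and in the generate_* methods) is equivalent to a suggestion made with RunMode.NORMAL. The sketch below models that fallback with a stand-in RunMode enum containing only the two members mentioned in this section; suggest_plan and resolve_plan are hypothetical and return placeholder dictionaries rather than real plan objects.

```python
from enum import Enum
from typing import Optional


class RunMode(Enum):
    # Stand-in for kumoai's RunMode; only the members referenced above.
    FAST = "fast"
    NORMAL = "normal"


def suggest_plan(run_mode: RunMode = RunMode.FAST) -> dict:
    # Hypothetical suggester: a placeholder plan tagged with its run mode.
    return {"run_mode": run_mode}


def resolve_plan(plan: Optional[dict]) -> dict:
    # An explicit plan wins; otherwise fall back to the NORMAL-mode
    # suggestion, not the FAST default of the suggest method itself.
    return plan if plan is not None else suggest_plan(run_mode=RunMode.NORMAL)
```

In other words, calling a suggest_* method with no arguments and passing its result in does not reproduce the default behavior; pass run_mode=RunMode.NORMAL explicitly if that is the intent.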

generate_baseline(metrics, train_table, *, non_blocking=False)

Runs a baseline model on this predictive query, given a list of metrics and a training table (or an in-progress training table job).

Parameters:
  • metrics (List[str]) – A list of metrics on which the baseline model will be evaluated.

  • train_table (Union[TrainingTable, TrainingTableJob]) – The TrainingTable, or in-progress TrainingTableJob, representing the training data produced by a PredictiveQuery on a Graph.

  • non_blocking (bool) – Whether this operation should return immediately after launching the baseline job, or await completion of the baseline job. Defaults to False.

Returns:

Either a BaselineJobResult (if non_blocking=False) or a BaselineJob, a future for the in-progress baseline job (if non_blocking=True).

Return type:

Union[BaselineJob, BaselineJobResult]
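
This section does not specify which baseline model Kumo runs. Purely to illustrate what evaluating a baseline on a list of metric names means, the toy sketch below predicts the training mean for every test example and computes only the requested metrics. The function name and the metric strings "mae" and "rmse" are assumptions, not a list of metrics Kumo accepts.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def evaluate_mean_baseline(
    train: Sequence[float], test: Sequence[float], metrics: List[str]
) -> Dict[str, float]:
    """Toy baseline (hypothetical): always predict the training mean."""
    prediction = mean(train)
    errors = [y - prediction for y in test]
    available = {
        "mae": lambda: mean(abs(e) for e in errors),
        "rmse": lambda: math.sqrt(mean(e * e for e in errors)),
    }
    # Compute only the requested metrics; unknown names raise KeyError.
    return {name: available[name]() for name in metrics}
```

For example, with training targets [1, 2, 3] the baseline predicts 2, so test targets [2, 4] give an MAE of 1.0.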