Writing Predictive Queries#

Here, we discuss the PredictiveQuery object, its usage, and how to frame your business problem in Kumo’s predictive query language (PQL). Predictive queries can be created on a Graph and a query string; the query should reference the names of tables in the graph, and should represent a predictive problem over these tables.

Note

For full documentation on the predictive query language and in-depth examples, please refer to the predictive query tutorial.

How do I start writing a predictive query?#

You can start writing a predictive query in one line of code: simply create a PredictiveQuery object on a graph, and fill in the query string as below:

graph = kumoai.Graph(...)
pquery = kumoai.PredictiveQuery(graph=graph, query="<your query here>")

Please refer to the predictive query tutorial for more information on the query language, to learn how to use PQL to solve your business problem.

How do I confirm my predictive query is correct?#

There are often many ways to represent the same business problem, each of which translates to its own predictive query and machine learning task. With the Kumo platform, you can experiment with all of these formulations, with minimal changes to your end-to-end flow.

For any given predictive query, the SDK offers multiple quick ways to ensure that the query matches your expectations.

  • For syntax validation, validate() will return any errors with query formulation that can be used to guide your query writing.

  • For functionality validation, get_task_type() will return the task type of a predictive query, to confirm that it matches the machine learning problem you are expecting to solve.

For a more in-depth look at the training and prediction data your query produces, you can leverage the kumoai.pquery.TrainingTable.data_df() and the kumoai.pquery.PredictionTable.data_df() methods after generating the corresponding data (see below), which allow you to inspect the generated data that Kumo will train its graph machine learning models on.

How do I use a predictive query for model training?#

Once you’ve defined a predictive query, you can leverage two methods to generate a training table associated with this predictive query, which is used in fit() to fit a model.

First, you can (optionally) suggest a training table generation plan with suggest_training_table_plan(), which returns a training table generation plan that can be customized for advanced use-cases (e.g. to change the split). Detailed documentation for these options is here. If you do not require a custom training table generation plan, the default (Kumo intelligently inferred) will be used when generating a training table.

Next, you can generate a training table with generate_training_table(). This method can be called with non_blocking=True (in which case it produces a Future object and schedules the task to run in the background), or non_blocking=False (in which case it waits until the training table is generated). Once a training table is generated, it can be viewed with methods on the TrainingTable object.

Finally, you can train a model on this training dataset with fit(); see Training Models and Generating Predictions for more details.

How do I use a predictive query for generating predictions?#

A predictive query can generate a prediction table in an identical manner to its use for generating training tables. A generated prediction table can be used in fit() to predict on a fitted model.

First, you can (optionally) suggest a prediction table generation plan with suggest_prediction_table_plan(), which returns a prediction table generation plan that can be customized for advanced use-cases (e.g. to change the anchor time). If you do not require a custom prediction table generation plan, the default (Kumo intelligently inferred) will be used when generating a prediction table.

Next, you can generate a prediction table with generate_prediction_table(). This method can be called with non_blocking=True (in which case it produces a Future object and schedules the task to run in the background), or non_blocking=False (in which case it waits until the prediction table is generated). Once a prediction table is generated, it can be viewed with methods on the PredictionTable object.

Finally, you can generate predictions on this prediction table with predict(); see Training Models and Generating Predictions for more details.