Querying KumoRFM#
Predictive Query (PQL) is a query language that lets you define a predictive problem by specifying:
The target aggregation expression
The entities to predict for
Optional filters that can be defined to refine the context data.
For a thorough introduction to Predictive Query, refer to the predictive query tutorial.
This page describes how to use Predictive Query to interact with KumoRFM.
Note
KumoRFM is currently in an experimental phase. Some Predictive Query features are not yet fully supported.
Writing Queries in Kumo#
In general, follow these five steps to author a PQL query:
Choose your entity – a table and its primary key you predict for.
Define the target – a raw column or an aggregation over a future window.
Pin the entity list – pass a single ID or multiple IDs to make predictions for.
(Optional) Refine the context – filters to restrict which historical rows are used for feature generation.
Run & fetch – run `KumoRFM.predict()` or `KumoRFM.evaluate()`.
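The first four steps can be sketched as plain string assembly. The helper below is a minimal illustration, not part of the Kumo SDK: `build_pql` and its parameter names are hypothetical, and it only handles integer IDs.

```python
from typing import Optional, Sequence


def build_pql(target: str, entity_column: str, entity_ids: Sequence[int],
              context_filter: Optional[str] = None) -> str:
    """Assemble a PQL string from the choices made in steps 1-4.

    Hypothetical helper for illustration only; not a Kumo SDK function.
    """
    if not entity_ids:
        raise ValueError("at least one entity ID is required")
    # Step 3: a single ID uses '=', several IDs use an IN tuple.
    if len(entity_ids) == 1:
        entity = f"{entity_column}={entity_ids[0]}"
    else:
        entity = f"{entity_column} IN ({', '.join(str(i) for i in entity_ids)})"
    # Steps 1-2: the target expression, predicted for the chosen entity.
    query = f"PREDICT {target} FOR {entity}"
    # Step 4 (optional): restrict which historical rows form the context.
    if context_filter:
        query += f" WHERE {context_filter}"
    return query


query = build_pql(
    target="COUNT(orders.*, 0, 30, days) > 0",
    entity_column="users.user_id",
    entity_ids=[1, 2, 3],
    context_filter="COUNT(orders.*, -30, 0, days) > 0",
)
print(query)
```

In step 5, the resulting string would be passed to `KumoRFM.predict()` or `KumoRFM.evaluate()`.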
Defining entities#
The general PQL structure is:
PREDICT <aggregation_expression> FOR <entity_specification> WHERE <optional_filters>
Component | Purpose |
---|---|
`PREDICT <aggregation_expression>` | Declares the value or aggregate the model should predict |
`FOR <entity_specification>` | Specifies the single ID or list of IDs to predict for |
`WHERE <optional_filters>` | Filters which historical rows are used to generate features |
Unlike the enterprise product, KumoRFM makes a prediction for a handful of selected entities at a time. As such, entities for each query can be specified in one of two ways:
By specifying a single entity ID, e.g. `users.user_id=1`
By specifying a tuple of entity IDs, e.g. `users.user_id IN (1, 2, 3)`
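Because each query covers only a handful of entities, a longer ID list may need to be split across several queries. The sketch below shows one way to do that; `batched_entity_specs` is a hypothetical helper, and the batch size of 3 is an arbitrary assumption rather than a documented KumoRFM limit.

```python
from typing import Iterator, Sequence


def batched_entity_specs(column: str, ids: Sequence[int],
                         batch_size: int = 3) -> Iterator[str]:
    """Yield one entity specification per batch of IDs.

    Hypothetical helper; batch_size is an illustrative choice,
    not a documented KumoRFM limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        if len(batch) == 1:
            # Single entity ID form.
            yield f"{column}={batch[0]}"
        else:
            # Tuple-of-IDs form.
            yield f"{column} IN ({', '.join(map(str, batch))})"


specs = list(batched_entity_specs("users.user_id", [1, 2, 3, 4], batch_size=3))
print(specs)
# ['users.user_id IN (1, 2, 3)', 'users.user_id=4']
```

Each yielded string can be placed after `FOR` in its own query.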
Improving the context through entity filters#
KumoRFM makes its entity-specific predictions based on context examples collected from the database. Just as entity filters let you control the training data in the Kumo enterprise product, they give you more control over the context examples that KumoRFM uses.
For example, to exclude users without recent activity from the context, we
can write:
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=1 WHERE COUNT(orders.*, -30, 0, days) > 0
This restricts the context to examples from active users (those with at least one order in the previous 30 days), which keeps the context relevant to your use case and can improve prediction quality. These filters are NOT applied to the provided entity list.
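To make the effect of such a filter concrete, the sketch below mimics it on toy data in plain Python. It is only an illustration of the idea, not how KumoRFM samples context internally. It assumes a single anchor time and windows that exclude the start and include the end; the user names and dates are made up.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, anchor, start_days, end_days):
    """Count events in (anchor + start_days, anchor + end_days].

    Assumption: exclusive start, inclusive end.
    """
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(1 for t in timestamps if lo < t <= hi)


# Made-up order timestamps per user.
orders = {
    "active_user": [datetime(2024, 5, 20), datetime(2024, 6, 10)],
    "dormant_user": [datetime(2024, 1, 5), datetime(2024, 6, 15)],
}
anchor = datetime(2024, 6, 1)

context = {}
for user, stamps in orders.items():
    # WHERE COUNT(orders.*, -30, 0, days) > 0: keep recently active users only.
    if count_in_window(stamps, anchor, -30, 0) > 0:
        # PREDICT COUNT(orders.*, 0, 30, days) > 0: label from the future window.
        context[user] = count_in_window(stamps, anchor, 0, 30) > 0

print(context)
# {'active_user': True}
```

The dormant user has no order in the 30 days before the anchor, so it contributes no context example even though it orders again later.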
Evaluation mode#
Besides making predictions, KumoRFM also provides an evaluation mode that automatically evaluates a sample of predictions.
>>> query = "EVALUATE PREDICT COUNT(orders.*, 0, 30, days) FOR users.user_id=1"
>>> metrics = rfm.evaluate(query)
>>> print(metrics)
Unsupported features#
Due to the experimental nature of KumoRFM, some features are not yet fully supported and will be added soon.
Only numerical and categorical columns are valid target columns, except for the `LIST_DISTINCT()` aggregation, where only foreign key targets are supported.
The `ASSUMING` clause is not permitted.
Filtering by column value (e.g., `WHERE users.age > 21`) is only supported for columns within the same table. The same goes for predicting a single non-aggregated value, e.g., `PREDICT users.age`.
`LIST_DISTINCT()` without a time interval is not supported.
`LAST()` and `FIRST()` aggregations are not supported.
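Some of these restrictions can be caught with a quick check before a query is sent. The sketch below is a rough regex-based heuristic, not a PQL parser and not an SDK feature. `lint_pql` is a hypothetical name, and it covers only the `ASSUMING`, `LAST()`/`FIRST()`, and interval-less `LIST_DISTINCT()` cases.

```python
import re
from typing import List

# (pattern, message) pairs for restrictions that can be spotted textually.
_CHECKS = [
    (re.compile(r"\bASSUMING\b", re.IGNORECASE),
     "ASSUMING clause is not permitted"),
    (re.compile(r"\b(?:LAST|FIRST)\s*\(", re.IGNORECASE),
     "LAST() and FIRST() aggregations are not supported"),
    # LIST_DISTINCT with a single argument, i.e. no time interval.
    (re.compile(r"\bLIST_DISTINCT\s*\(\s*[^,()]+\)", re.IGNORECASE),
     "LIST_DISTINCT() without a time interval is not supported"),
]


def lint_pql(query: str) -> List[str]:
    """Return a message for each unsupported feature found (heuristic only)."""
    return [message for pattern, message in _CHECKS if pattern.search(query)]


print(lint_pql("PREDICT COUNT(orders.*, 0, 30, days) FOR users.user_id=1"))
print(lint_pql("PREDICT LAST(orders.amount, 0, 30, days) FOR users.user_id=1"))
```

An empty list means none of the checked patterns were found. It does not guarantee that the query is valid.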