kumoai.experimental.rfm#
KumoRFM (Kumo Relational Foundation Model) is an experimental feature that provides a powerful interface to query relational data using a pre-trained foundation model. Unlike traditional machine learning approaches that require feature engineering and model training, KumoRFM can generate predictions directly from raw relational data using PQL queries.
Note
KumoRFM is currently in an experimental phase. The API may change in future releases.
Overview#
KumoRFM consists of three main components:
LocalTable: A pandas.DataFrame wrapper that manages metadata including data types, semantic types, primary keys, and time columns
LocalGraph: A collection of related LocalTable objects with edges defining relationships between tables
KumoRFM: The main interface to query the relational foundation model
Workflow#
The typical KumoRFM workflow follows these steps:
Data Preparation: Load your relational data into pandas.DataFrame objects
Table Creation: Create LocalTable objects from your data frames
Graph Construction: Build a LocalGraph that defines relationships between tables
Model Initialization: Initialize KumoRFM with your graph
Querying: Execute PQL queries to get predictions
Quick Example#
Here’s a simple example showing how to use KumoRFM with e-commerce data:
import pandas as pd
from kumoai.experimental.rfm import LocalTable, LocalGraph, KumoRFM

# Load your data
users_df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'created_at': pd.date_range('2023-01-01', periods=5),
    'age': [25, 30, 35, 40, 45]
})
orders_df = pd.DataFrame({
    'order_id': [101, 102, 103, 104, 105],
    'user_id': [1, 2, 1, 3, 4],
    'amount': [100.0, 250.0, 75.0, 300.0, 150.0],
    'order_date': pd.date_range('2023-02-01', periods=5)
})

# Create LocalGraph from data
graph = LocalGraph.from_data({
    'users': users_df,
    'orders': orders_df
})

# Initialize KumoRFM
rfm = KumoRFM(graph)

# Query the model
result = rfm.query(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1"
)

# Result is a pandas DataFrame with prediction probabilities
print(result)
# user_id  COUNT(orders.*, 0, 30, days) > 0
#       1                              0.85
Query Language#
KumoRFM uses the predictive query language (PQL) for making predictions. For a broader introduction to PQL, see Writing Predictive Queries. The KumoRFM PQL syntax differs slightly from the general PQL syntax: the user must specify the entity (or entities) to make predictions for.
The general PQL structure stays the same:
PREDICT <aggregation_expression> FOR <entity_specification>
The entities for each query can be specified in one of two ways:
- By specifying a single entity id, e.g. users.user_id=1
- By specifying a tuple of entity ids, e.g. users.user_id IN (1, 2, 3)
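The two entity forms can be produced from a list of ids with a small string helper. This is a minimal sketch: entity_spec and build_query are hypothetical names, not part of the KumoRFM API, and only integer ids are handled because the examples here show no other id type.

```python
from typing import Sequence


def entity_spec(key: str, ids: Sequence[int]) -> str:
    """Render the entity part of a query for one or several integer ids."""
    if not ids:
        raise ValueError("at least one entity id is required")
    if len(ids) == 1:
        # Single entity form: users.user_id=1
        return f"{key}={ids[0]}"
    # Tuple form: users.user_id IN (1, 2, 3)
    return f"{key} IN ({', '.join(str(i) for i in ids)})"


def build_query(target: str, key: str, ids: Sequence[int]) -> str:
    """Assemble PREDICT <aggregation_expression> FOR <entity_specification>."""
    return f"PREDICT {target} FOR {entity_spec(key, ids)}"


target = "COUNT(orders.*, 0, 30, days) > 0"
single = build_query(target, "users.user_id", [1])
batch = build_query(target, "users.user_id", [1, 2, 3])
print(single)
print(batch)
```

The resulting strings have the same shape as the queries passed to rfm.query elsewhere on this page.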
Classes#
LocalTable#
A LocalTable represents a single table backed by a pandas DataFrame with rich metadata support.
Key features:
Metadata Management: Automatic inference of data types and semantic types
Primary Key Support: Specify or auto-detect primary keys
Time Column Support: Handle temporal data with designated time columns
Validation: Comprehensive validation of table structure and metadata
Example usage:
# Create from DataFrame with explicit metadata
table = LocalTable(
    df=df,
    table_name="users",
    primary_key="user_id",
    time_column="created_at"
)
# Infer metadata automatically
table.infer_metadata()
# Access column metadata
column = table.column("user_id")
print(column.stype) # Stype.ID
LocalGraph#
A LocalGraph represents relationships between multiple LocalTable objects, similar to a relational database schema.
Key features:
Multiple Construction Methods: Create from tables or directly from DataFrames
Relationship Management: Define and manage edges between tables
Automatic Link Inference: Intelligent detection of foreign key relationships
Graph Validation: Ensure the graph structure meets requirements before using it with KumoRFM
Example usage:
# Create from tables
graph = LocalGraph(tables=[users_table, orders_table])
# Or create directly from data
graph = LocalGraph.from_data({
    'users': users_df,
    'orders': orders_df
})
# Manual relationship management
graph.link('orders', 'user_id', 'users')
graph.unlink('orders', 'user_id', 'users')
# Validation
graph.validate()
KumoRFM#
The main KumoRFM class provides the interface to query the relational foundation model.
Key features:
Model Initialization: Automatic setup of serving endpoints
Query Interface: Execute PQL queries to get predictions
Async Operations: Non-blocking operations with status monitoring
Resource Management: Automatic cleanup of cloud resources
Example usage:
# Initialize with local graph
rfm = KumoRFM(graph)
# Query the model
result = rfm.query(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1"
)
print(result)
# user_id  COUNT(orders.*, 0, 30, days) > 0
#       1                              0.85
Best Practices#
Data Preparation#
Clean Data: Ensure your DataFrames are clean with no duplicate column names
Consistent Types: Use consistent data types across related columns
Consistent Column Names: Ensure column names are consistent across related tables
Primary Keys: Include a primary key column in each table if possible
Time Columns: Each table should have at most one time column
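Several of these checks can be run with plain pandas before any table is created. The following is a minimal sketch, not part of the KumoRFM API: check_table and check_link are hypothetical helper names, and the sample frames are illustrative (the string-typed user_id is deliberate, to trigger the type check).

```python
from typing import List

import pandas as pd


def check_table(df: pd.DataFrame, primary_key: str) -> List[str]:
    """Return human-readable problems found in a single table."""
    duplicated = df.columns[df.columns.duplicated()].tolist()
    if duplicated:
        # Stop early: selecting a duplicated name yields a DataFrame, not a Series.
        return [f"duplicate column names: {duplicated}"]
    if primary_key not in df.columns:
        return [f"missing primary key column: {primary_key!r}"]
    problems = []
    if df[primary_key].isna().any():
        problems.append(f"primary key {primary_key!r} has missing values")
    if not df[primary_key].is_unique:
        problems.append(f"primary key {primary_key!r} has duplicate values")
    return problems


def check_link(child: pd.DataFrame, foreign_key: str,
               parent: pd.DataFrame, primary_key: str) -> List[str]:
    """Check that a foreign key column has the same dtype as the key it references."""
    child_dtype = child[foreign_key].dtype
    parent_dtype = parent[primary_key].dtype
    if child_dtype != parent_dtype:
        return [f"dtype mismatch: {foreign_key} is {child_dtype}, "
                f"{primary_key} is {parent_dtype}"]
    return []


users_df = pd.DataFrame({"user_id": [1, 2, 3], "age": [25, 30, 35]})
# user_id is stored as strings here on purpose, to show a type mismatch.
orders_df = pd.DataFrame({"order_id": [101, 102, 103], "user_id": ["1", "2", "1"]})

table_problems = check_table(users_df, "user_id")
link_problems = check_link(orders_df, "user_id", users_df, "user_id")
print(table_problems)
print(link_problems)
```

An empty list means no problem was detected by these particular checks; it does not replace the library's own validation.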
Graph Design#
Metapath lengths: Keep metapath lengths reasonable (ideally 2-3 hops).
- Longer paths may lead to performance issues and less interpretable results.
- If your relational schema is very complex, consider splitting it into multiple graphs.
Meaningful Relationships: Ensure that the inferred relationships are meaningful and correct.
Validation: Always validate your graph before using it with KumoRFM.
Size Limits: There is a 10 GB limit on the total size of the graph.
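One way to see how many hops separate tables is a breadth-first search over the schema links. This is a rough sketch under two assumptions: a metapath's length is taken to be the number of links traversed from the entity table, and links are treated as traversable in both directions. The schema below (order_items, products, suppliers) and the helper name hop_distances are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def hop_distances(links: Iterable[Tuple[str, str]], start: str) -> Dict[str, int]:
    """Return the minimum number of links between `start` and each reachable table."""
    neighbors: Dict[str, Set[str]] = {}
    for src, dst in links:
        # Treat each foreign key link as undirected for hop counting.
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for other in neighbors.get(table, ()):
            if other not in distances:
                distances[other] = distances[table] + 1
                queue.append(other)
    return distances


# Hypothetical schema: (table holding the foreign key, referenced table)
links = [
    ("orders", "users"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("products", "suppliers"),
]

distances = hop_distances(links, "users")
too_far: List[str] = sorted(t for t, d in distances.items() if d > 3)
print(distances["products"])  # 3
print(too_far)                # ['suppliers']
```

Tables that land beyond the 2-3 hop range are candidates for moving into a separate graph.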
Querying#
Start Simple: Begin with basic COUNT queries before moving to complex aggregations.
Time Windows: Use appropriate time windows for temporal queries.
Entity Specification: Be specific about which entities you’re predicting for.
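To judge whether a time window is appropriate, it can help to compute the same aggregate on historical data. The sketch below assumes that the two numbers in COUNT(orders.*, 0, 30, days) are start and end offsets in days from the prediction time, and that the window excludes its start and includes its end; both points are assumptions to confirm against the PQL reference. The helper name count_in_window and the chosen anchor date are illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable


def count_in_window(timestamps: Iterable[datetime], anchor: datetime,
                    start_days: int, end_days: int) -> int:
    """Count events with anchor + start_days < timestamp <= anchor + end_days."""
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    return sum(1 for ts in timestamps if lower < ts <= upper)


# Order dates of user 1 in the quick example (orders 101 and 103).
user_1_orders = [datetime(2023, 2, 1), datetime(2023, 2, 3)]
anchor = datetime(2023, 1, 15)  # illustrative prediction time

next_30_days = count_in_window(user_1_orders, anchor, 0, 30)
next_7_days = count_in_window(user_1_orders, anchor, 0, 7)
print(next_30_days, next_30_days > 0)  # 2 True
print(next_7_days, next_7_days > 0)    # 0 False
```

Under these assumptions a 7-day window would label this user negative at that anchor while a 30-day window would label them positive, which is why the window should match how often the event actually occurs.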
Limitations#
Graph Size: Maximum graph size is 10GB
Experimental Status: API may change in future releases
See Also#
kumoai.graph - Core graph functionality
kumoai.trainer - Traditional ML training approaches