RelBench Datasets
RelBench is a benchmark suite from Stanford for evaluating machine learning methods on relational data. KumoRFM can load RelBench datasets directly for benchmarking and experimentation.
Installation
RelBench dataset loading requires the pooch package for downloading and
caching:
pip install pooch
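Before loading a dataset, it can help to confirm that pooch is importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `has_package` is hypothetical and is not part of KumoRFM or RelBench.

```python
from importlib.util import find_spec


def has_package(name: str) -> bool:
    """Return True if a top-level package called `name` can be imported."""
    # find_spec returns None when no top-level module with this name is found.
    return find_spec(name) is not None


if not has_package("pooch"):
    print("pooch is not installed; run `pip install pooch` first.")
```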
Loading a Dataset
import kumoai.experimental.rfm as rfm
graph = rfm.Graph.from_relbench("f1")
This downloads the dataset (if not cached), extracts the tables and
relationships, and returns a fully configured Graph ready for use
with KumoRFM.
Usage
Once loaded, the graph can be used like any other Graph:
model = rfm.KumoRFM(graph)
# Explore the graph
graph.print_metadata()
graph.print_links()
# Make predictions or evaluate
result = model.predict("PREDICT ...")
metrics = model.evaluate("PREDICT ...")
Caching
Datasets are cached locally using pooch and stored in the system cache
directory (typically ~/.cache/relbench/ on Linux; the exact location is
platform-dependent). Subsequent calls to Graph.from_relbench() with the
same dataset name reuse the cached version instead of downloading again.
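To see how much disk space the cached datasets occupy, the cache directory can be inspected with the standard library. This is a sketch under the assumption that the cache lives at ~/.cache/relbench/, as mentioned above; `cache_summary` is a hypothetical helper, not a KumoRFM or pooch function.

```python
from pathlib import Path


def cache_summary(cache_dir: Path) -> tuple[int, int]:
    """Return (number of files, total size in bytes) under `cache_dir`."""
    # A missing directory simply means nothing has been cached yet.
    if not cache_dir.is_dir():
        return 0, 0
    files = [p for p in cache_dir.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Assumed location, taken from the text above; adjust it for your system.
cache_dir = Path.home() / ".cache" / "relbench"
n_files, n_bytes = cache_summary(cache_dir)
print(f"{cache_dir}: {n_files} files, {n_bytes / 1e6:.1f} MB")
```

If the directory is removed, the next call to Graph.from_relbench() would presumably download the dataset again, since no cached copy remains.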
Available Datasets
Pass any valid RelBench dataset name to Graph.from_relbench(). If an
invalid name is provided, the error message will suggest available datasets.
For the full list of available datasets, visit the RelBench website.
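As an illustration of how name suggestions for a mistyped dataset can work, the sketch below uses `difflib.get_close_matches` from the standard library. It is not KumoRFM's actual implementation, and both the `suggest_names` helper and the `KNOWN_NAMES` list are hypothetical placeholders rather than the real set of datasets.

```python
from difflib import get_close_matches

# Hypothetical name list for illustration only; see the RelBench website
# for the real set of datasets.
KNOWN_NAMES = ["f1", "amazon", "stack", "trial"]


def suggest_names(name: str, known: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` known names that closely match `name`."""
    # cutoff=0.6 is difflib's default similarity threshold.
    return get_close_matches(name, known, n=limit, cutoff=0.6)


print(suggest_names("amazn", KNOWN_NAMES))
```

In practice, the error message raised by Graph.from_relbench() is the authoritative source for the names it accepts.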