RelBench Datasets

RelBench is a benchmark suite from Stanford for evaluating machine learning methods on relational data. KumoRFM can load RelBench datasets directly for benchmarking and experimentation.

Installation

RelBench dataset loading requires the pooch package for downloading and caching:

pip install pooch
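Before loading a dataset, it can be useful to confirm that pooch is importable in the current environment. The check below is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and not part of kumoai or pooch.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no module of that name is found.
    return importlib.util.find_spec(package) is not None


if not is_installed("pooch"):
    print("pooch is missing; install it with: pip install pooch")
```

The check only looks the package up and does not import it, so it has no side effects if pooch is present.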

Loading a Dataset

import kumoai.experimental.rfm as rfm

graph = rfm.Graph.from_relbench("f1")

This downloads the dataset (if not cached), extracts the tables and relationships, and returns a fully configured Graph ready for use with KumoRFM.

Usage

Once loaded, use the graph just like any other:

model = rfm.KumoRFM(graph)

# Explore the graph
graph.print_metadata()
graph.print_links()

# Make predictions or evaluate
result = model.predict("PREDICT ...")
metrics = model.evaluate("PREDICT ...")

Caching

Datasets are cached locally with pooch in the system cache directory (typically ~/.cache/relbench/ on Linux/macOS). Subsequent calls to Graph.from_relbench() with the same dataset name reuse the cached copy instead of downloading it again.
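To see what is currently cached, the directory can be inspected with the standard library. The sketch below assumes the typical location named above; the actual path may differ by platform or configuration, and the helper list_cached_files is hypothetical rather than part of kumoai or pooch.

```python
from pathlib import Path


def list_cached_files(cache_dir: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under cache_dir."""
    if not cache_dir.is_dir():
        # Nothing has been downloaded yet, or the cache lives elsewhere.
        return []
    return sorted(
        (p.relative_to(cache_dir).as_posix(), p.stat().st_size)
        for p in cache_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumed default location; adjust if your system cache directory differs.
    default_cache = Path.home() / ".cache" / "relbench"
    for name, size in list_cached_files(default_cache):
        print(f"{size:>12,d}  {name}")
```

An empty listing means either that no dataset has been downloaded yet or that the cache is stored in a different directory on this system.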

Available Datasets

Pass any valid RelBench dataset name to Graph.from_relbench(). If an invalid name is provided, the error message will suggest available datasets.
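Because the error message carries the suggested dataset names, a script that loads datasets by user-supplied name may want to surface that message rather than crash. The wrapper below is a minimal sketch: the helper try_load is hypothetical, and since the exception type raised by Graph.from_relbench() is not documented here, it catches Exception broadly as an assumption.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_load(loader: Callable[[str], T], name: str) -> Optional[T]:
    """Call loader(name); on failure print the error message and return None."""
    try:
        return loader(name)
    except Exception as exc:  # assumption: the concrete exception type is unknown
        # The message is expected to suggest available dataset names.
        print(f"Could not load {name!r}: {exc}")
        return None


# Intended usage (requires kumoai and pooch to be installed):
#   import kumoai.experimental.rfm as rfm
#   graph = try_load(rfm.Graph.from_relbench, "f1")
```

Passing the loader in as an argument keeps the helper independent of kumoai, so it can be exercised with a stand-in function in tests.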

For the full list of available datasets, visit the RelBench website.