kumoai.experimental.rfm.LocalGraph#

class kumoai.experimental.rfm.LocalGraph[source]#

Bases: object

A graph of LocalTable objects, akin to relationships between tables in a relational database.

Creating a graph is the final step of data definition; after a LocalGraph is created, you can use it to initialize the Kumo Relational Foundation Model (KumoRFM).

import pandas as pd
import kumoai.experimental.rfm as rfm

# Load data frames into memory:
df1 = pd.DataFrame(...)
df2 = pd.DataFrame(...)
df3 = pd.DataFrame(...)

# Define tables from data frames:
table1 = rfm.LocalTable(name="table1", data=df1)
table2 = rfm.LocalTable(name="table2", data=df2)
table3 = rfm.LocalTable(name="table3", data=df3)

# Create a graph from a dictionary of tables:
graph = rfm.LocalGraph({
    "table1": table1,
    "table2": table2,
    "table3": table3,
})

# Infer table metadata:
graph.infer_metadata()

# Infer links/edges:
graph.infer_links()

# Inspect table metadata:
for table in graph.tables.values():
    table.print_metadata()

# Visualize graph (if graphviz is installed):
graph.visualize()

# Add/Remove edges between tables:
graph.link(src_table="table1", fkey="id1", dst_table="table2")
graph.unlink(src_table="table1", fkey="id1", dst_table="table2")

# Validate graph:
graph.validate()
__init__(tables, edges=None)[source]#
classmethod from_data(df_dict, edges=None, infer_metadata=True, verbose=True)[source]#

Creates a LocalGraph from a dictionary of pandas.DataFrame objects.

Automatically infers table metadata and links.

import pandas as pd
import kumoai.experimental.rfm as rfm

# Load data frames into memory:
df1 = pd.DataFrame(...)
df2 = pd.DataFrame(...)
df3 = pd.DataFrame(...)

# Create a graph from a dictionary of data frames:
graph = rfm.LocalGraph.from_data({
    "table1": df1,
    "table2": df2,
    "table3": df3,
})

# Inspect table metadata:
for table in graph.tables.values():
    table.print_metadata()

# Visualize graph (if graphviz is installed):
graph.visualize()
Parameters:
  • df_dict (Dict[str, DataFrame]) – A dictionary of data frames, where the keys are the names of the tables and the values hold table data.

  • infer_metadata (bool) – Whether to infer metadata for all tables in the graph.

  • edges (Optional[List[Edge]]) – An optional list of Edge objects to add to the graph. If not provided, edges will be automatically inferred from the data.

  • verbose (bool) – Whether to print verbose output.

Return type:

Self

Note

This method automatically infers table metadata (unless infer_metadata is False) and, when edges is not provided, infers the links of the graph.

Example

>>> import pandas as pd
>>> import kumoai.experimental.rfm as rfm
>>> df1 = pd.DataFrame(...)
>>> df2 = pd.DataFrame(...)
>>> df3 = pd.DataFrame(...)
>>> graph = rfm.LocalGraph.from_data({
...     "table1": df1,
...     "table2": df2,
...     "table3": df3,
... })
>>> graph.validate()
has_table(name)[source]#

Returns True if the graph has a table with name name; False otherwise.

Return type:

bool

table(name)[source]#

Returns the table with name name in the graph.

Raises:

KeyError – If name is not present in the graph.

Return type:

LocalTable

property tables: Dict[str, LocalTable]#

Returns the dictionary of table objects.

add_table(table)[source]#

Adds a table to the graph.

Parameters:

table (LocalTable) – The table to add.

Raises:

KeyError – If a table with the same name already exists in the graph.

Return type:

Self

remove_table(name)[source]#

Removes the table with name name from the graph.

Parameters:

name (str) – The name of the table to remove.

Raises:

KeyError – If no such table is present in the graph.

Return type:

Self
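The table-management contract described above (has_table, table, add_table, remove_table) can be sketched with a small stand-in class. This is an illustrative toy, not the library's implementation: ToyTable and ToyGraph are hypothetical names, and only the documented behavior (boolean lookup, KeyError on missing or duplicate names, returning the graph itself) is modeled.

```python
from typing import Dict


class ToyTable:
    """Hypothetical stand-in for LocalTable; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyGraph:
    """Hypothetical stand-in that mirrors the documented table contract."""

    def __init__(self) -> None:
        self._tables: Dict[str, ToyTable] = {}

    def has_table(self, name: str) -> bool:
        return name in self._tables

    def table(self, name: str) -> ToyTable:
        if name not in self._tables:
            raise KeyError(f"Table '{name}' not found in graph")
        return self._tables[name]

    def add_table(self, table: ToyTable) -> "ToyGraph":
        if table.name in self._tables:
            raise KeyError(f"Table '{table.name}' already exists in graph")
        self._tables[table.name] = table
        return self  # returning the graph allows call chaining

    def remove_table(self, name: str) -> "ToyGraph":
        if name not in self._tables:
            raise KeyError(f"Table '{name}' not found in graph")
        del self._tables[name]
        return self


graph = ToyGraph().add_table(ToyTable("users")).add_table(ToyTable("orders"))
print(graph.has_table("users"))  # True
graph.remove_table("users")
print(graph.has_table("users"))  # False
```

Because the mutating methods return the graph, calls can be chained, which is the same pattern the documented methods with return type Self make possible.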

property metadata: DataFrame#

Returns a pandas.DataFrame object containing metadata information about the tables in this graph.

The returned dataframe has columns name, primary_key, and time_column, which provide an aggregate view of the properties of the tables of this graph.

Example

>>> import kumoai.experimental.rfm as rfm
>>> graph = rfm.LocalGraph(tables=...).infer_metadata()
>>> graph.metadata
    name  primary_key time_column
0   users     user_id           -
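As a rough illustration of what such an aggregate view contains, the sketch below builds one row per table with the three documented columns. It is not the library's code: metadata_rows and the input layout are hypothetical, and rendering an unset value as "-" is an assumption based on the example output above.

```python
from typing import Dict, List, Optional


def metadata_rows(
    tables: Dict[str, Dict[str, Optional[str]]],
) -> List[Dict[str, str]]:
    """Builds one summary row per table (hypothetical helper)."""
    rows = []
    for name, props in tables.items():
        rows.append({
            "name": name,
            # Assumption: an unset property is displayed as "-".
            "primary_key": props.get("primary_key") or "-",
            "time_column": props.get("time_column") or "-",
        })
    return rows


tables = {
    "users": {"primary_key": "user_id", "time_column": None},
    "orders": {"primary_key": "order_id", "time_column": "created_at"},
}
for row in metadata_rows(tables):
    print(row)
```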
print_metadata()[source]#

Prints the metadata of the graph.

Return type:

None

infer_metadata(verbose=True)[source]#

Infers metadata for all tables in the graph.

Parameters:

verbose (bool) – Whether to print verbose output.

Return type:

Self

Note

For more information, please see kumoai.experimental.rfm.LocalTable.infer_metadata().

property edges: List[Edge]#

Returns the edges of the graph.

print_links()[source]#

Prints the edges of the graph.

Return type:

None

link(src_table, fkey, dst_table)[source]#

Links two tables (src_table and dst_table) from the foreign key fkey in the source table to the primary key in the destination table.

The link is treated as bidirectional.

Parameters:
  • src_table (Union[str, LocalTable]) – The name of the source table of the edge. This table must have a foreign key with name fkey that links to the primary key in the destination table.

  • fkey (str) – The name of the foreign key in the source table.

  • dst_table (Union[str, LocalTable]) – The name of the destination table of the edge. This table must have a primary key that links to the source table’s foreign key.

Raises:

ValueError – if the edge is already present in the graph, if the source or destination table does not exist in the graph, or if the foreign key fkey does not exist in the source table.

Return type:

Self

unlink(src_table, fkey, dst_table)[source]#

Removes an Edge from the graph.

Parameters:
  • src_table (Union[str, LocalTable]) – The name of the source table of the edge.

  • fkey (str) – The name of the foreign key in the source table.

  • dst_table (Union[str, LocalTable]) – The name of the destination table of the edge.

Raises:

ValueError – if the edge is not present in the graph.

Return type:

Self
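The error conditions documented for graph.link and graph.unlink can be made concrete with a toy model. The sketch below is an assumption-laden stand-in, not the library's implementation: ToyGraph and ToyEdge are hypothetical, tables are reduced to sets of column names, and only the documented ValueError cases are modeled.

```python
from typing import Dict, List, Set, Tuple

ToyEdge = Tuple[str, str, str]  # (src_table, fkey, dst_table)


class ToyGraph:
    """Hypothetical stand-in that models the documented edge rules."""

    def __init__(self, tables: Dict[str, Set[str]]) -> None:
        self.tables = tables  # table name -> set of column names
        self.edges: List[ToyEdge] = []

    def link(self, src_table: str, fkey: str, dst_table: str) -> "ToyGraph":
        edge = (src_table, fkey, dst_table)
        if edge in self.edges:
            raise ValueError(f"Edge {edge} already exists")
        for name in (src_table, dst_table):
            if name not in self.tables:
                raise ValueError(f"Table '{name}' does not exist")
        if fkey not in self.tables[src_table]:
            raise ValueError(f"Column '{fkey}' not found in '{src_table}'")
        self.edges.append(edge)
        return self

    def unlink(self, src_table: str, fkey: str, dst_table: str) -> "ToyGraph":
        edge = (src_table, fkey, dst_table)
        if edge not in self.edges:
            raise ValueError(f"Edge {edge} is not present")
        self.edges.remove(edge)
        return self


graph = ToyGraph({"orders": {"order_id", "user_id"}, "users": {"user_id"}})
graph.link("orders", "user_id", "users")
print(graph.edges)  # [('orders', 'user_id', 'users')]
graph.unlink("orders", "user_id", "users")
print(graph.edges)  # []
```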

infer_links(verbose=True)[source]#

Infers links for the tables and adds them as edges to the graph.

Parameters:

verbose (bool) – Whether to print verbose output.

Return type:

Self

Note

This method expects that no edges have been defined on the graph beforehand.
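This page does not describe how links are inferred. To give a feel for what link inference means, the sketch below uses one naive heuristic: a non-primary-key column whose name equals another table's primary key is proposed as a foreign key. This heuristic, the ToyTable class, and infer_links_by_name are all illustrative assumptions and should not be read as the library's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table with a known primary key."""
    primary_key: str
    columns: List[str]


def infer_links_by_name(
    tables: Dict[str, ToyTable],
) -> List[Tuple[str, str, str]]:
    """Proposes (src_table, fkey, dst_table) edges by column-name matching."""
    edges = []
    for src_name, src in tables.items():
        for column in src.columns:
            if column == src.primary_key:
                continue  # a table's own primary key is not a foreign key
            for dst_name, dst in tables.items():
                if dst_name != src_name and dst.primary_key == column:
                    edges.append((src_name, column, dst_name))
    return edges


tables = {
    "users": ToyTable("user_id", ["user_id", "age"]),
    "orders": ToyTable("order_id", ["order_id", "user_id", "amount"]),
}
print(infer_links_by_name(tables))  # [('orders', 'user_id', 'users')]
```

Inferred links are best treated as suggestions; the documented link and unlink methods are the way to correct them by hand.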

validate()[source]#

Validates the graph to ensure that all relevant metadata is specified for its tables and edges.

Concretely, validation ensures that edges properly link foreign keys to primary keys between valid tables. It additionally ensures that primary and foreign keys between tables in an Edge are of the same data type.

Raises:

ValueError – if validation fails.

Return type:

Self
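The checks described above can be sketched over a toy representation. This is a minimal illustration under stated assumptions, not the library's validation logic: ToyTable and toy_validate are hypothetical, and data types are reduced to plain string labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyTable:
    """Hypothetical table: column name -> dtype label, plus a primary key."""
    columns: Dict[str, str]
    primary_key: Optional[str] = None


def toy_validate(
    tables: Dict[str, ToyTable],
    edges: List[Tuple[str, str, str]],
) -> None:
    """Raises ValueError if an edge does not link a foreign key to a
    primary key of the same data type between existing tables."""
    for src, fkey, dst in edges:
        if src not in tables or dst not in tables:
            raise ValueError(f"Edge ({src}, {fkey}, {dst}) uses unknown table")
        if fkey not in tables[src].columns:
            raise ValueError(f"Foreign key '{fkey}' missing in '{src}'")
        pkey = tables[dst].primary_key
        if pkey is None:
            raise ValueError(f"Table '{dst}' has no primary key")
        if tables[src].columns[fkey] != tables[dst].columns[pkey]:
            raise ValueError(f"Type mismatch: {src}.{fkey} vs {dst}.{pkey}")


tables = {
    "users": ToyTable({"user_id": "int"}, primary_key="user_id"),
    "orders": ToyTable({"order_id": "int", "user_id": "string"},
                       primary_key="order_id"),
}
try:
    toy_validate(tables, [("orders", "user_id", "users")])
except ValueError as error:
    print(f"Validation failed: {error}")
```

In this toy data the foreign key is a string while the primary key is an integer, so validation fails on the type check.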

visualize(path=None, show_columns=True)[source]#

Visualizes the tables and edges in this graph using the graphviz library.

Parameters:
  • path (Union[str, BytesIO, None]) – A path to write the produced image to. If None, the image will not be written to disk.

  • show_columns (bool) – Whether to show all columns of every table in the graph. If False, will only show the primary key, foreign key(s), and time column of each table.

Return type:

Graph

Returns:

A graphviz.Graph instance representing the visualized graph.
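The effect of show_columns can be illustrated without graphviz. The helper below is a hypothetical sketch (visible_columns and its arguments are not part of the library) that selects which columns a table node would list under each setting, following the parameter description above.

```python
from typing import List, Optional


def visible_columns(
    columns: List[str],
    primary_key: Optional[str],
    foreign_keys: List[str],
    time_column: Optional[str],
    show_columns: bool = True,
) -> List[str]:
    """Returns the columns to display for one table (hypothetical helper)."""
    if show_columns:
        return list(columns)
    # Keep only the structural columns, preserving the original order.
    keep = {primary_key, time_column, *foreign_keys} - {None}
    return [column for column in columns if column in keep]


columns = ["order_id", "user_id", "amount", "created_at"]
print(visible_columns(columns, "order_id", ["user_id"], "created_at",
                      show_columns=False))
# ['order_id', 'user_id', 'created_at']
```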