kumoai.experimental.rfm.LocalTable

class kumoai.experimental.rfm.LocalTable

Bases: object

A table backed by a pandas.DataFrame.

A LocalTable fully specifies the relevant metadata, i.e. selected columns, column semantic types, primary keys and time columns. LocalTable is used to create a LocalGraph.

import kumoai.experimental.rfm as rfm
import pandas as pd

# Load data from a CSV file:
df = pd.read_csv("data.csv")

# Create a table from a `pandas.DataFrame` and infer its metadata:
table = rfm.LocalTable(df, table_name="my_table").infer_metadata()

# Create a table explicitly:
table = rfm.LocalTable(
    df=df,
    table_name="my_table",
    primary_key="id",
    time_column="time",
)

# Change the semantic type of a column:
table[column].stype = "text"

Parameters:
  • df (DataFrame) – The data frame to create the table from.

  • table_name (str) – The name of the table.

  • primary_key (Optional[str]) – The name of the primary key of this table, if it exists.

  • time_column (Optional[str]) – The name of the time column of this table, if it exists.

__init__(df, table_name, primary_key=None, time_column=None)

has_column(name)

Returns True if this table holds a column with name name; False otherwise.

Return type:

bool

column(name)

Returns the column named name in this table.

Raises:

KeyError – If name is not present in this table.

Return type:

Column
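Because column raises a KeyError for a missing name, callers that cannot be sure a column exists may prefer to check has_column first. The following is a minimal sketch that relies only on the two methods documented above. The helper name get_column_or_none and the _StubTable class are hypothetical: the stub is a stand-in that mimics the documented lookup behavior so that the snippet runs without kumoai installed, and it is not the library's implementation.

```python
from typing import Any, Optional


def get_column_or_none(table: Any, name: str) -> Optional[Any]:
    """Return table.column(name), or None if the table has no such column."""
    if table.has_column(name):
        return table.column(name)
    return None


class _StubTable:
    """Hypothetical stand-in for a table; not part of kumoai."""

    def __init__(self, names):
        # Map each column name to a placeholder object.
        self._columns = {n: f"<Column {n}>" for n in names}

    def has_column(self, name):
        return name in self._columns

    def column(self, name):
        # A plain dict lookup raises KeyError for unknown names,
        # matching the documented behavior of column().
        return self._columns[name]


table = _StubTable(["id", "time"])
found = get_column_or_none(table, "id")
missing = get_column_or_none(table, "price")
```

With a real LocalTable, the same helper should apply unchanged, since it only calls has_column and column.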

property columns: List[Column]

Returns a list of Column objects that represent the columns in this table.

has_primary_key()

Returns True if this table has a primary key; False otherwise.

Return type:

bool

property primary_key: Optional[Column]

The primary key column of this table.

The getter returns the primary key column of this table, or None if no such primary key is present.

The setter sets a column as a primary key on this table, and raises a ValueError if the primary key has a non-ID semantic type or if the column name does not match a column in the underlying data frame.

has_time_column()

Returns True if this table has a time column; False otherwise.

Return type:

bool

property time_column: Optional[Column]

The time column of this table.

The getter returns the time column of this table, or None if no such time column is present.

The setter sets a column as a time column on this table, and raises a ValueError if the time column has a non-timestamp semantic type or if the column name does not match a column in the underlying data frame.
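Both the primary_key and time_column setters raise a ValueError on an unsuitable column, so an assignment can be wrapped in a small guard. The sketch below assumes that the setters accept a column name as a string, which the text above does not state explicitly. The helper try_assign and the _StubTable class are hypothetical; the stub only imitates the "name does not match a column" failure and, for simplicity, stores the name rather than a Column object.

```python
from typing import Any


def try_assign(table: Any, attr: str, column_name: str) -> bool:
    """Assign column_name to table.<attr>; return False on ValueError."""
    try:
        setattr(table, attr, column_name)
    except ValueError:
        return False
    return True


class _StubTable:
    """Hypothetical stand-in for a table; not part of kumoai."""

    def __init__(self, names):
        self._names = set(names)
        self._primary_key = None

    @property
    def primary_key(self):
        return self._primary_key

    @primary_key.setter
    def primary_key(self, name):
        # Imitate the documented failure for an unknown column name.
        if name not in self._names:
            raise ValueError(f"column '{name}' does not exist")
        self._primary_key = name


table = _StubTable(["id", "time"])
accepted = try_assign(table, "primary_key", "id")
rejected = try_assign(table, "primary_key", "missing")
```

After a rejected assignment the previous value stays in place in this stub; whether the real setter behaves the same way is not specified here.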

property metadata: DataFrame

Returns a pandas.DataFrame object containing metadata information about the columns in this table.

The returned dataframe has columns name, dtype, stype, is_primary_key, and is_time_column, which provide an aggregate view of the properties of the columns of this table.

Example

>>> import kumoai.experimental.rfm as rfm
>>> table = rfm.LocalTable(df=..., table_name=...).infer_metadata()
>>> table.metadata
    name        dtype       stype    is_primary_key is_time_column
0   CustomerID  float64     ID       True           False

infer_metadata(verbose=False)

Infers metadata for all columns in the table.

Parameters:

verbose (bool) – Whether to print verbose output.

Return type:

Self

validate()

Validates the table configuration.

Raises:

ValueError – If validation fails.

Return type:

Self
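Since infer_metadata and validate both return Self, the two calls can be chained, with the ValueError from validate handled in one place. The following is a minimal sketch of that pattern. The helper prepare_table and the _StubTable class are hypothetical: the stub only mirrors the documented return types and the documented exception so that the snippet runs without kumoai installed.

```python
from typing import Any, Optional, Tuple


def prepare_table(table: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Infer metadata, then validate.

    Returns (table, None) on success or (None, error message) on failure.
    """
    try:
        return table.infer_metadata().validate(), None
    except ValueError as exc:
        return None, str(exc)


class _StubTable:
    """Hypothetical stand-in for a table; not part of kumoai."""

    def __init__(self, valid):
        self._valid = valid
        self.inferred = False

    def infer_metadata(self, verbose=False):
        self.inferred = True
        return self  # returns Self, as documented

    def validate(self):
        if not self._valid:
            raise ValueError("invalid table configuration")
        return self  # returns Self, as documented


ok_table, ok_error = prepare_table(_StubTable(valid=True))
bad_table, bad_error = prepare_table(_StubTable(valid=False))
```

Returning the error message instead of re-raising is a design choice of this sketch; callers that prefer exceptions can call table.infer_metadata().validate() directly.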