kumoai.experimental.rfm.LocalTable

class kumoai.experimental.rfm.LocalTable

Bases: object

A table backed by a pandas.DataFrame.

A LocalTable fully specifies the relevant metadata, i.e., the selected columns, their semantic types, the primary key, and the time column. LocalTable objects are used to create a LocalGraph.

import pandas as pd
import kumoai.experimental.rfm as rfm

# Load data from a CSV file:
df = pd.read_csv("data.csv")

# Create a table from a `pandas.DataFrame` and infer its metadata ...
table = rfm.LocalTable(df, name="my_table").infer_metadata()

# ... or create a table explicitly:
table = rfm.LocalTable(
    df=df,
    name="my_table",
    primary_key="id",
    time_column="time",
)

# Verify metadata:
table.print_metadata()

# Change the semantic type of a column ("my_column" is a placeholder name):
table.column("my_column").stype = "text"

Parameters:
  • df (DataFrame) – The data frame to create the table from.

  • name (str) – The name of the table.

  • primary_key (Optional[str]) – The name of the primary key of this table, if it exists.

  • time_column (Optional[str]) – The name of the time column of this table, if it exists.

__init__(df, name, primary_key=None, time_column=None)

property name: str

The name of the table.

has_column(name)

Returns True if this table holds a column with the given name; False otherwise.

Return type:

bool

column(name)

Returns the column with the given name in this table.

Parameters:

name (str) – The name of the column.

Raises:

KeyError – If name is not present in this table.

Return type:

Column
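Because column(name) raises a KeyError for unknown names, a caller that is unsure whether a column exists can check has_column first. The sketch below wraps that pattern in a hypothetical helper, get_column_or_none, and exercises it against FakeTable, a hypothetical stand-in that only mimics the documented has_column/column contract (it is not part of kumoai). Assuming the documented semantics, the helper should behave the same way with a real LocalTable.

```python
from typing import Any, Dict, Optional


def get_column_or_none(table: Any, name: str) -> Optional[Any]:
    """Hypothetical helper: return table.column(name) if it exists, else None.

    Uses only has_column() and column(), so the KeyError that column()
    raises for unknown names is never triggered.
    """
    if table.has_column(name):
        return table.column(name)
    return None


class FakeTable:
    """Hypothetical stand-in for LocalTable (NOT the kumoai implementation)."""

    def __init__(self, columns: Dict[str, Any]) -> None:
        self._columns = dict(columns)

    def has_column(self, name: str) -> bool:
        return name in self._columns

    def column(self, name: str) -> Any:
        # Mirrors the documented behavior: unknown names raise KeyError.
        if name not in self._columns:
            raise KeyError(name)
        return self._columns[name]


# Plain strings stand in for Column objects here.
table = FakeTable({"id": "id-column", "time": "time-column"})
print(get_column_or_none(table, "id"))       # id-column
print(get_column_or_none(table, "missing"))  # None
```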

property columns: List[Column]

Returns a list of Column objects that represent the columns in this table.

remove_column(name)

Removes a column from this table.

Parameters:

name (str) – The name of the column.

Raises:

KeyError – If name is not present in this table.

Return type:

Self
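Since remove_column is documented to return Self, removals can be chained, and since it raises a KeyError for unknown names, a bulk removal should check has_column first. The following is a minimal sketch of that pattern using a hypothetical helper, drop_columns, demonstrated on a hypothetical FakeTable stand-in (not the kumoai implementation) that mimics only the documented contract.

```python
from typing import Any, Iterable, List


def drop_columns(table: Any, names: Iterable[str]) -> Any:
    """Hypothetical helper: remove every listed column that exists.

    Names that are not present are skipped instead of raising KeyError.
    The return value of remove_column() (documented as Self) is reassigned
    so the helper works whether or not the table is modified in place.
    """
    for name in names:
        if table.has_column(name):
            table = table.remove_column(name)
    return table


class FakeTable:
    """Hypothetical stand-in for LocalTable (NOT the kumoai implementation)."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)

    def has_column(self, name: str) -> bool:
        return name in self.names

    def remove_column(self, name: str) -> "FakeTable":
        if name not in self.names:
            raise KeyError(name)
        self.names.remove(name)
        return self  # mirrors the documented "Self" return type


table = drop_columns(FakeTable(["id", "time", "notes", "debug"]),
                     ["notes", "debug", "absent"])
print(table.names)  # ['id', 'time']

# Chaining works because each call returns the table itself.
print(FakeTable(["a", "b", "c"]).remove_column("a").remove_column("b").names)  # ['c']
```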

has_primary_key()

Returns True if this table has a primary key; False otherwise.

Return type:

bool

property primary_key: Column | None

The primary key column of this table.

The getter returns the primary key column of this table, or None if no such primary key is present.

The setter marks the column with the given name as the primary key of this table. It raises a ValueError if that column has a non-ID semantic type or if the name does not match a column in the data frame.
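To illustrate the setter's two failure modes, the sketch below uses a hypothetical FakeTable (not the kumoai implementation) whose primary_key setter mimics the documented ValueError checks, plus a hypothetical helper, try_set_primary_key, that reports success as a bool. As a simplification, the fake's getter returns the column name rather than a Column object, and "numerical" is just an illustrative non-ID stype string.

```python
from typing import Any, Dict, Optional


class FakeTable:
    """Hypothetical stand-in for LocalTable (NOT the kumoai implementation).

    Columns are modeled as a name -> semantic type mapping.
    """

    def __init__(self, stypes: Dict[str, str]) -> None:
        self.stypes = dict(stypes)
        self._primary_key: Optional[str] = None

    @property
    def primary_key(self) -> Optional[str]:
        return self._primary_key

    @primary_key.setter
    def primary_key(self, name: str) -> None:
        # Mimics the documented checks: unknown name or non-ID stype -> ValueError.
        if name not in self.stypes:
            raise ValueError(f"'{name}' is not a column of this table")
        if self.stypes[name] != "ID":
            raise ValueError(f"'{name}' has non-ID semantic type {self.stypes[name]!r}")
        self._primary_key = name


def try_set_primary_key(table: Any, name: str) -> bool:
    """Hypothetical helper: set the primary key, returning False on ValueError."""
    try:
        table.primary_key = name
    except ValueError:
        return False
    return True


table = FakeTable({"id": "ID", "price": "numerical"})
print(try_set_primary_key(table, "price"))  # False (non-ID semantic type)
print(try_set_primary_key(table, "nope"))   # False (no such column)
print(try_set_primary_key(table, "id"))     # True
print(table.primary_key)                    # id
```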

has_time_column()

Returns True if this table has a time column; False otherwise.

Return type:

bool

property time_column: Column | None

The time column of this table.

The getter returns the time column of this table, or None if no such time column is present.

The setter marks the column with the given name as the time column of this table. It raises a ValueError if that column has a non-timestamp semantic type or if the name does not match a column in the data frame.
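Because the time_column setter rejects non-timestamp semantic types, one plausible pattern is to change the column's stype first (as the class example does for "text") and then assign the time column. This is a sketch under assumptions: it presumes the library accepts "timestamp" as the stype for that column's dtype, and it runs against hypothetical FakeColumn/FakeTable stand-ins (not the kumoai implementation) together with a hypothetical helper, set_time_column.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeColumn:
    """Hypothetical stand-in for a Column: a name and a mutable semantic type."""
    name: str
    stype: str


class FakeTable:
    """Hypothetical stand-in for LocalTable (NOT the kumoai implementation)."""

    def __init__(self, stypes: Dict[str, str]) -> None:
        self._columns = {n: FakeColumn(n, s) for n, s in stypes.items()}
        self._time_column: Optional[FakeColumn] = None

    def column(self, name: str) -> FakeColumn:
        return self._columns[name]  # raises KeyError for unknown names

    @property
    def time_column(self) -> Optional[FakeColumn]:
        return self._time_column

    @time_column.setter
    def time_column(self, name: str) -> None:
        # Mimics the documented checks on the time_column setter.
        if name not in self._columns:
            raise ValueError(f"'{name}' is not a column of this table")
        col = self._columns[name]
        if col.stype != "timestamp":
            raise ValueError(f"'{name}' has non-timestamp semantic type {col.stype!r}")
        self._time_column = col


def set_time_column(table: Any, name: str) -> None:
    """Hypothetical helper: force the timestamp stype, then mark the time column."""
    table.column(name).stype = "timestamp"
    table.time_column = name


table = FakeTable({"id": "ID", "created_at": "text"})
set_time_column(table, "created_at")
print(table.time_column.name)   # created_at
print(table.time_column.stype)  # timestamp
```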

property metadata: DataFrame

Returns a pandas.DataFrame containing metadata about the columns in this table.

The returned dataframe has columns name, dtype, stype, is_primary_key, and is_time_column, which provide an aggregate view of the properties of the columns of this table.
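Since the metadata is an ordinary pandas.DataFrame with the columns listed above, standard pandas selection can answer questions such as "which column is the primary key?". The sketch below builds a hypothetical metadata frame by hand (only the CustomerID row mirrors the example on this page; the other rows are invented for illustration); with a real table you would start from table.metadata instead.

```python
import pandas as pd

# Hypothetical metadata frame with the documented columns. With a real table,
# this would be: metadata = table.metadata
metadata = pd.DataFrame(
    {
        "name": ["CustomerID", "SignupDate", "Bio"],
        "dtype": ["float64", "datetime64[ns]", "object"],
        "stype": ["ID", "timestamp", "text"],
        "is_primary_key": [True, False, False],
        "is_time_column": [False, True, False],
    }
)

# Boolean indexing on the flag columns picks out the key columns by name.
primary_keys = metadata.loc[metadata["is_primary_key"], "name"].tolist()
time_columns = metadata.loc[metadata["is_time_column"], "name"].tolist()

# Group column names by semantic type for a quick overview.
by_stype = metadata.groupby("stype")["name"].agg(list).to_dict()

print(primary_keys)  # ['CustomerID']
print(time_columns)  # ['SignupDate']
print(by_stype)
```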

Example

>>> import kumoai.experimental.rfm as rfm
>>> table = rfm.LocalTable(df=..., name=...).infer_metadata()
>>> table.metadata
         name    dtype  stype  is_primary_key  is_time_column
0  CustomerID  float64     ID            True           False

print_metadata()

Prints the metadata of this table (see the metadata property).

Return type:

None

infer_metadata(verbose=True)

Infers metadata, i.e., primary keys and time columns, for this table. Returns this table, so the call can be chained onto the LocalTable constructor.

Parameters:

verbose (bool) – Whether to print verbose output.

Return type:

Self
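Inference may not find a primary key for every table, so a cautious workflow infers quietly and then fills in anything missing explicitly. The following is a minimal sketch of that workflow using a hypothetical helper, prepare_table, which relies only on the documented infer_metadata(verbose=...), has_primary_key(), and primary_key setter. It is exercised on a hypothetical FakeTable whose toy "inference" only recognizes a column literally named "id"; the real inference rules are not described on this page.

```python
from typing import Any, List, Optional


class FakeTable:
    """Hypothetical stand-in for LocalTable (NOT the kumoai implementation)."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)
        self.primary_key: Optional[str] = None  # plain attribute, no validation

    def has_primary_key(self) -> bool:
        return self.primary_key is not None

    def infer_metadata(self, verbose: bool = True) -> "FakeTable":
        # Toy inference rule, for illustration only.
        if "id" in self.names:
            self.primary_key = "id"
        if verbose:
            print(f"inferred primary key: {self.primary_key}")
        return self  # documented return type is Self, enabling chaining


def prepare_table(table: Any, fallback_primary_key: str) -> Any:
    """Hypothetical helper: infer metadata quietly, then fill a missing primary key."""
    table = table.infer_metadata(verbose=False)
    if not table.has_primary_key():
        table.primary_key = fallback_primary_key
    return table


print(prepare_table(FakeTable(["id", "amount"]), "amount").primary_key)          # id
print(prepare_table(FakeTable(["user_key", "amount"]), "user_key").primary_key)  # user_key
```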