Understanding Table Definitions
A LocalTable object wraps a pandas.DataFrame with metadata about its columns, primary key, and time column. Semantic types (stypes) are required metadata, while the primary key and time column are optional.
Each table can have at most one primary key and at most one time column, but it can contain many foreign keys (primary keys of other tables).
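To make the key terminology concrete, here is a minimal pandas-only sketch that does not use LocalTable itself. The `users` and `transactions` frames and their column names are hypothetical; the point is that `user_id` acts as the primary key of one table and as a foreign key in the other.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
users = pd.DataFrame({"user_id": [1, 2, 3], "age": [34, 27, 45]})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13],
    "user_id": [1, 1, 3, 2],  # foreign key: references users.user_id
    "timestamp": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-09"]
    ),
})

# A primary key identifies each row of its own table exactly once.
pk_is_unique = users["user_id"].is_unique

# A foreign key may repeat, but every value should exist in the referenced table.
fk_repeats = transactions["user_id"].duplicated().any()
fk_is_valid = transactions["user_id"].isin(users["user_id"]).all()

print(pk_is_unique, fk_repeats, fk_is_valid)
```

Here `transactions` has its own primary key (`transaction_id`) and one time column (`timestamp`), and it carries `user_id` as a foreign key.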
Dtype and Metadata Inference
When creating a LocalTable, column dtypes and stypes are automatically inferred from the underlying data based on the pandas data type and heuristics. The key metadata that needs to be properly set includes:
Stypes: Semantic types that determine model processing behavior
Primary key: Unique identifier for the table (optional but recommended)
Time column: Temporal column for time-based operations (optional)
The LocalTable.infer_metadata() method automates much of this process:
Primary key detection: Uses heuristics to suggest potential primary keys based on column names, uniqueness, and data patterns
Time column detection: Identifies columns with temporal data types or time-related naming patterns
import kumoai.experimental.rfm as rfm
from kumoai import Stype  # assumed import path for the Stype enum

# `df` is an existing pandas.DataFrame that holds the user records
# Automatic inference (recommended for initial setup)
table = rfm.LocalTable(df, "users").infer_metadata()
# Manual override of inferred metadata when needed
table['user_id'].stype = Stype.ID  # Override inferred stype
table.primary_key = "user_id"  # Override inferred primary key
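The exact rules used by infer_metadata() are not spelled out here. As a rough illustration of the kind of heuristics listed above, the following pandas-only sketch guesses a primary key and a time column. The helper names `guess_primary_key` and `guess_time_column` and the specific rules (an ID-like name that is unique and non-null; a datetime dtype or a time-like name) are assumptions made for this example, not the library's implementation.

```python
from typing import Optional

import pandas as pd


def guess_primary_key(df: pd.DataFrame) -> Optional[str]:
    """Return the first column that is named like an ID and is unique and non-null."""
    for name in df.columns:
        lowered = str(name).lower()
        looks_like_id = lowered == "id" or lowered.endswith("_id")
        column = df[name]
        if looks_like_id and column.notna().all() and column.is_unique:
            return name
    return None


def guess_time_column(df: pd.DataFrame) -> Optional[str]:
    """Prefer datetime-typed columns, then fall back to time-like column names."""
    for name in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[name]):
            return name
    for name in df.columns:
        if any(hint in str(name).lower() for hint in ("time", "date")):
            return name
    return None


# Hypothetical data for illustration only.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country_id": [7, 7, 9],  # ID-like name but not unique, so not a primary key
    "signup_date": pd.to_datetime(["2023-05-01", "2023-06-12", "2023-06-30"]),
})

print(guess_primary_key(df))
print(guess_time_column(df))
```

Heuristics of this kind can misfire (for example, a unique `country_id` in a small sample), which is why reviewing and overriding the inferred metadata remains useful.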
Basic Table Creation
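import pandas as pd
import kumoai.experimental.rfm as rfm

# Placeholder data for illustration; replace with your own DataFrames
df_users = pd.DataFrame({"user_id": [1, 2], "age": [34, 27]})
df_transactions = pd.DataFrame({
    "transaction_id": [10, 11],
    "user_id": [1, 2],
    "timestamp": pd.to_datetime(["2024-01-05", "2024-01-06"]),
})

# Create table with automatic metadata inference
users_table = rfm.LocalTable(
    df=df_users,
    table_name="users",
).infer_metadata()

# Create table with explicit metadata
transactions_table = rfm.LocalTable(
    df=df_transactions,
    table_name="transactions",
    primary_key="transaction_id",
    time_column="timestamp",
)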
Inspecting Table Metadata
# Access table metadata
print(f"Primary key: {users_table.primary_key}")
print(f"Time column: {users_table.time_column}")
print(f"Columns: {[col.name for col in users_table.columns]}")
# View metadata summary
metadata_df = users_table.metadata
print(metadata_df)
# Check column information
print(f"Column: {users_table['age'].name}")
print(f"Dtype: {users_table['age'].dtype}")
print(f"Stype: {users_table['age'].stype}")
What Makes a Good Table
A good LocalTable should have:
Clean dtypes: Set proper pandas dtypes at the DataFrame level before table creation
Meaningful stypes: ID columns use Stype.ID, categorical data uses Stype.categorical, text uses Stype.text, etc.
Unique primary key: Non-null, no duplicates, uniquely identifies each row, preferably stored as an integer
Consistent naming: Foreign keys match their referenced primary key names
Single time column: One temporal column when temporal data is available
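Parts of this checklist can be verified with plain pandas before a table is created. The sketch below uses a hypothetical helper, `check_table_frame`, which is not part of the SDK; the sample data and column names are likewise assumptions. It reports primary key and time column problems, then shows the same frame after its dtypes are set at the DataFrame level.

```python
import pandas as pd


def check_table_frame(df: pd.DataFrame, primary_key: str, time_column: str) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if df[primary_key].isna().any():
        problems.append(f"{primary_key} contains nulls")
    if df[primary_key].duplicated().any():
        problems.append(f"{primary_key} contains duplicates")
    if not pd.api.types.is_integer_dtype(df[primary_key]):
        problems.append(f"{primary_key} is not stored as an integer")
    if not pd.api.types.is_datetime64_any_dtype(df[time_column]):
        problems.append(f"{time_column} is not a datetime column")
    return problems


# Hypothetical raw data as it might arrive from a CSV export: everything is text.
raw = pd.DataFrame({
    "transaction_id": ["10", "11", "12"],
    "amount": ["9.99", "20.00", "5.50"],
    "timestamp": ["2024-01-05", "2024-01-06", "2024-01-09"],
})
problems_before = check_table_frame(raw, "transaction_id", "timestamp")

# Set proper dtypes at the DataFrame level before creating the table.
clean = raw.assign(
    transaction_id=raw["transaction_id"].astype("int64"),
    amount=raw["amount"].astype("float64"),
    timestamp=pd.to_datetime(raw["timestamp"]),
)
problems_after = check_table_frame(clean, "transaction_id", "timestamp")

print(problems_before)
print(problems_after)
```

Running the check before and after the dtype conversion makes it easy to see which items on the checklist the cleanup actually addressed.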