Working With Tables#
Here, we discuss common patterns for working with tables in the Kumo SDK.
Tables are created from SourceTable objects; while SourceTables simply
represent a view of data behind a backing connector, Tables contain additional
metadata and information for the Kumo machine learning platform. Concretely,
this additional information includes:
Each included column’s name, dtype (data type), and stype (semantic type). For information about what types of columns to select, please reference this guide. For information on how to choose data and semantic types, please reference this guide.
The primary key of the table, if present
The time column of the table, if present
The end time column of the table, if present
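To make the list above concrete, the following minimal sketch models the same pieces of information with plain Python dataclasses. ColumnSpec and TableSpec are hypothetical stand-ins invented for illustration; they are not classes from the Kumo SDK, and the column names and types are simply the ones used in the examples later on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for one column's metadata (not a Kumo SDK class)."""
    name: str
    dtype: Optional[str] = None  # data type, e.g. "int" or "string"
    stype: Optional[str] = None  # semantic type, e.g. "text" or "categorical"


@dataclass
class TableSpec:
    """Hypothetical stand-in for the table-level metadata listed above."""
    columns: List[ColumnSpec] = field(default_factory=list)
    primary_key: Optional[str] = None      # "if present"
    time_column: Optional[str] = None      # "if present"
    end_time_column: Optional[str] = None  # "if present"


spec = TableSpec(
    columns=[
        ColumnSpec("string_col", dtype="string", stype="text"),
        ColumnSpec("int_col", dtype="int"),  # stype not yet specified
    ],
    primary_key="int_col",
)
print([column.name for column in spec.columns])
```

Every table-level field is optional in this sketch, mirroring the "if present" wording in the list.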
How do I create a table?#
Creating tables requires a SourceTable object, which can be obtained from any
Connector, either with Python indexing semantics (e.g. connector[table_name])
or with table(). After inspecting the source table (e.g. with head()) to verify
its data matches your expectations, you can either create a table implicitly
with from_source_table() or explicitly by specifying each field in the Table
constructor. We show both methods below:
Implicit Creation. Implicit creation lets you create a Table from a
SourceTable in one line:
table = kumoai.Table.from_source_table(source_table)
which will use all columns in the source table by default. You can customize
this and additionally specify any further metadata as part of this method
call; please see the documentation of from_source_table() for more details.
After this call, table will be of type Table, but it will not have all metadata
specified for its constituent columns (e.g. dtype and stype). You can either
explicitly specify this metadata later (see “How do I edit a table?”), or let
Kumo infer it with infer_metadata().
Explicit Creation. If you want to be more precise about table creation,
you can choose to manually create a table with the Table constructor. This lets
you specify (partially or fully) any of the attributes that a Table specifies:
table = kumoai.Table(
    source_table=source_table,
    columns=[
        kumoai.Column('string_col', 'string', 'text'),
        # Columns can also be specified as dictionaries. Note here that the
        # stype is left unspecified: this is OK, as long as we specify it
        # later before using the Table in a Predictive Query:
        dict(name='int_col', dtype='int'),
    ],
    # The name of the primary key column, if it exists:
    primary_key='int_col',
)
Similar to implicit creation, a table created this way may not fully specify
all of its constituent elements (e.g. the semantic type of int_col was left
unspecified above). You can either explicitly specify this metadata later
(see “How do I edit a table?”), or let Kumo infer it with infer_metadata().
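Because both creation paths can leave metadata partially specified, it can help to check which columns still need attention before moving on. The sketch below works on column definitions written in the dictionary form shown above. The helper columns_missing_metadata is hypothetical and written only for this illustration; it does not call or reproduce any Kumo SDK function.

```python
def columns_missing_metadata(columns):
    """Return {column name: [missing fields]} for partially specified columns.

    `columns` is a list of dicts with a "name" key and optional "dtype" and
    "stype" keys. This is a hypothetical helper, not part of the Kumo SDK.
    """
    missing = {}
    for col in columns:
        absent = [key for key in ("dtype", "stype") if col.get(key) is None]
        if absent:
            missing[col["name"]] = absent
    return missing


columns = [
    dict(name="string_col", dtype="string", stype="text"),
    dict(name="int_col", dtype="int"),  # stype left unspecified
]
result = columns_missing_metadata(columns)
print(result)  # {'int_col': ['stype']}
```

Any column reported here would need its metadata set explicitly or inferred before the table is used in a Predictive Query.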
How do I view the metadata of a table?#
Table provides a convenience property for you to view its metadata: metadata,
which outputs a DataFrame object containing a summary of every included
column’s name, type, and role.
Individual methods are also provided to access column and table-level metadata; please see the package reference for more details.
How do I edit a table?#
Editing a Table is simple and Pythonic: every property is modifiable with the
typical Python style, for both column and table-level attributes. We share some
examples below:
Editing a table’s primary key (note: the primary key must already be a column of the table):
# Set the primary key:
table.primary_key = 'new_primary_key'
# Unset (remove) the primary key:
table.primary_key = None
# Check if a table has a primary key:
print(f"Table has primary key? {table.has_primary_key()}")
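The set, unset, and check pattern above, together with the rule that a primary key must already be a column of the table, can be sketched with an ordinary Python property. TableSketch is a hypothetical stand-in written for this illustration; it is not kumoai.Table, and the KeyError it raises is an assumption rather than the error the SDK actually reports.

```python
class TableSketch:
    """Hypothetical stand-in for a table with an editable primary key."""

    def __init__(self, column_names):
        self._columns = set(column_names)
        self._primary_key = None

    @property
    def primary_key(self):
        return self._primary_key

    @primary_key.setter
    def primary_key(self, name):
        # Assumption: reject names that are not existing columns.
        if name is not None and name not in self._columns:
            raise KeyError(f"Column {name!r} is not in the table")
        self._primary_key = name

    def has_primary_key(self):
        return self._primary_key is not None


sketch = TableSketch(["int_col", "string_col"])
sketch.primary_key = "int_col"   # set
was_set = sketch.has_primary_key()
sketch.primary_key = None        # unset
was_unset = not sketch.has_primary_key()
print(was_set, was_unset)  # True True
```

Assigning a name that is not in the column set raises in this sketch, which is one simple way to enforce the precondition at assignment time.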
Adding a column to a table, and editing its metadata:
# Adding a new column named 'col':
table.add_column(name="col", dtype="int")
# Editing the column's semantic type:
table.column("col").stype = "categorical"
# Removing the column altogether:
table.remove_column("col")
How do I save a table for future usage?#
Tables do not have names in the Kumo SDK; a table is fully specified by its configuration in code. That is, if you use the same table configuration in two different notebooks, they will refer to the same table object in the Kumo backend. And if you edit a table, it will refer to a new object in the Kumo backend, independent of other tables.
Note
We encourage users to fully specify their tables in production code, to avoid unexpected re-inferrals of metadata.
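One way to picture a table that is identified by its configuration rather than by a name is a content fingerprint: equal configurations produce equal fingerprints, and any edit produces a different one. The sketch below is only an analogy built with the Python standard library. The config_fingerprint helper and the dictionary layout are hypothetical, and nothing here describes how the Kumo backend actually identifies tables.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a configuration dict; equal configurations give equal digests."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


config_a = {
    "source": "my_source_table",  # hypothetical source table name
    "columns": [{"name": "int_col", "dtype": "int", "stype": "categorical"}],
    "primary_key": "int_col",
}
config_b = json.loads(json.dumps(config_a))  # same configuration, separate object
config_c = dict(config_a, primary_key=None)  # edited configuration

same = config_fingerprint(config_a) == config_fingerprint(config_b)
changed = config_fingerprint(config_a) != config_fingerprint(config_c)
print(same, changed)  # True True
```

Under this picture, leaving metadata to be re-inferred risks a different configuration, and therefore a different table, from one run to the next, which is why fully specifying tables in production code is the safer choice.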