kumoai.connector.SourceTable

class kumoai.connector.SourceTable

Bases: object

A source table is a reference to a table stored in a backing Connector. Use it to inspect raw data connected to Kumo, including a sample of the table’s rows, basic statistics, and column data type information.

Once you are ready to use the table in a Graph, create a Table object from this source table. A Table adds further specification, such as column semantic types and column constraints.

Parameters:
  • name (str) – The name of this table in the backing connector.

  • connector (Connector) – The connector containing this table.

Note

Source tables can also be augmented with large language models (LLMs) to add contextual embeddings of text columns. To do so, see add_llm().

Example

>>> import kumoai
>>> connector = kumoai.S3Connector(root_dir='s3://...')
>>> # Either index the connector by table name...
>>> articles_src = connector['articles']
>>> # ...or construct the SourceTable directly:
>>> articles_src = kumoai.SourceTable('articles', connector)

__init__(name, connector)
property column_dict: Dict[str, SourceColumn]

Returns a dictionary mapping the name of each column in this table to its SourceColumn metadata.

property columns: List[SourceColumn]

Returns a list of the SourceColumn metadata of the columns in this table.
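The two properties expose the same column metadata in different shapes: columns is a list, while column_dict is keyed by column name. The sketch below illustrates that relationship, assuming (as its description suggests) that column_dict maps each column's name to the same object found in columns. FakeSourceColumn and its fields name and dtype are hypothetical stand-ins, not the real SourceColumn class; with a real table you would simply read articles_src.columns or articles_src.column_dict.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FakeSourceColumn:
    """Hypothetical stand-in for kumoai's SourceColumn (fields are assumed)."""
    name: str
    dtype: str


def to_column_dict(columns: List[FakeSourceColumn]) -> Dict[str, FakeSourceColumn]:
    """Build a name -> column mapping from a list of column metadata."""
    return {col.name: col for col in columns}


columns = [
    FakeSourceColumn(name="article_id", dtype="int"),
    FakeSourceColumn(name="title", dtype="string"),
]
column_dict = to_column_dict(columns)

# Look up one column's metadata by name instead of scanning the list.
print(column_dict["title"].dtype)  # string
```

The dictionary form is convenient for lookups by name; the list form is convenient for iterating over every column.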

head(num_rows=5)

Returns the first num_rows rows of this source table by reading data from the backing connector.

Parameters:

num_rows (int) – The number of rows to select. If num_rows is larger than the number of available rows, all rows will be returned.

Return type:

DataFrame

Returns:

The first num_rows rows of the source table as a DataFrame.
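The clamping rule above means that requesting more rows than exist is not an error; you simply get every available row. The pure-Python sketch below mimics that contract on an in-memory list of rows. head_rows is a hypothetical helper, not part of kumoai; the real call is articles_src.head(num_rows), which reads from the backing connector and returns a DataFrame.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def head_rows(rows: List[Row], num_rows: int = 5) -> List[Row]:
    """Hypothetical mimic of SourceTable.head(): return the first num_rows rows.

    Slicing never raises when num_rows exceeds len(rows); it returns
    every available row, matching the documented behavior.
    """
    return rows[:num_rows]


rows = [{"article_id": i, "title": f"Article {i}"} for i in range(3)]

print(len(head_rows(rows)))     # 3 (the default of 5 exceeds the 3 available rows)
print(len(head_rows(rows, 2)))  # 2
```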

add_llm(model, api_key, template, output_dir, output_column_name, output_table_name, dimensions=None, *, non_blocking=False)

Returns a new source table that includes a column computed via an LLM.

Warning

This method is still experimental; please consult your Kumo point of contact (POC) before using it.

Parameters:
  • model (str) – The LLM model name, e.g., OpenAI’s "text-embedding-3-small".

  • api_key (str) – The API key used to call the LLM service.

  • template (str) – A template string that is filled with column values to build the LLM input for each row. For example, "{A1} and {A2}" combines the values of columns A1 and A2 into a single string.

  • output_dir (str) – The S3 directory in which to store the output.

  • output_column_name (str) – The name of the column that stores the LLM output.

  • output_table_name (str) – The output table name.

  • dimensions (Optional[int]) – The desired LLM embedding dimension.

  • non_blocking (bool) – Whether to run this method without blocking. If True, a SourceTableFuture is returned instead of waiting for the resulting SourceTable.

Return type:

Union[SourceTable, SourceTableFuture]
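The template parameter turns several columns into one input string per row. Assuming its placeholders behave like Python's str.format fields (an assumption; the exact templating rules of add_llm are not specified here), the sketch below shows how "{A1} and {A2}" would render each row into the text sent to the model. render_template, render_column, and the sample rows are hypothetical illustrations.

```python
from typing import Any, Dict, List


def render_template(template: str, row: Dict[str, Any]) -> str:
    """Hypothetical: fill {column} placeholders with one row's values.

    Assumes str.format-style placeholders; kumoai's actual templating may differ.
    """
    return template.format_map(row)


def render_column(template: str, rows: List[Dict[str, Any]]) -> List[str]:
    """Render the template for every row, giving one LLM input per row."""
    return [render_template(template, row) for row in rows]


rows = [
    {"A1": "red shoes", "A2": "size 42"},
    {"A1": "blue scarf", "A2": "wool"},
]
print(render_column("{A1} and {A2}", rows))
# ['red shoes and size 42', 'blue scarf and wool']
```

Under this reading, each rendered string is what the model embeds, and the resulting embedding (of size dimensions, if given) is stored in output_column_name of the new table.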