kumoai.connector.SourceTable
- class kumoai.connector.SourceTable
Bases: object
A source table is a reference to a table stored behind a backing Connector. It can be used to examine basic information about raw data connected to Kumo, including a sample of the table’s rows, basic statistics, and column data type information.
Once you are ready to use a table as part of a Graph, you may create a Table object from this source table. The Table object carries additional information, including column semantic types and column constraints.
- Parameters:
name (str) – The name of this table in the backing connector.
connector (Connector) – The connector containing this table.
Note
Source tables can also be augmented with large language models to introduce contextual embeddings for language features. To do so, please consult add_llm().
Example
>>> import kumoai
>>> connector = kumoai.S3Connector(root_dir='s3://...')
>>> # Look up the source table by name on the connector:
>>> articles_src = connector['articles']
>>> # Alternatively, construct the source table explicitly:
>>> articles_src = kumoai.SourceTable('articles', connector)
- property column_dict: Dict[str, SourceColumn]
Returns a dictionary that maps the name of each column in this table to its SourceColumn information.
- property columns: List[SourceColumn]
Returns a list of the SourceColumn metadata for the columns in this table.
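The relationship between the two properties can be sketched with a stand-in class. StandInColumn and its fields are hypothetical, since this section does not describe what a SourceColumn holds. The sketch also assumes that column_dict is keyed by column name, as its Dict[str, SourceColumn] type and description suggest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StandInColumn:
    """Hypothetical stand-in for SourceColumn (fields are assumptions)."""
    name: str
    dtype: str


# What `columns` might look like: one metadata entry per column.
columns: List[StandInColumn] = [
    StandInColumn(name="article_id", dtype="int"),
    StandInColumn(name="title", dtype="string"),
]

# What `column_dict` might look like: the same entries, keyed by name.
column_dict: Dict[str, StandInColumn] = {col.name: col for col in columns}

print(list(column_dict))
print(column_dict["title"].dtype)
```

Under these assumptions, the list form suits iteration over every column, while the dictionary form suits looking up a single column by name.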
- head(num_rows=5)
Returns the first num_rows rows of this source table by reading data from the backing connector.
- add_llm(model, api_key, template, output_dir, output_column_name, output_table_name, dimensions=None, *, non_blocking=False)
Returns a new source table that includes a column computed via an LLM.
Warning
This method is still experimental; please consult your Kumo point of contact (POC) before using it.
- Parameters:
model (str) – The LLM model name, e.g., OpenAI’s "text-embedding-3-small".
api_key (str) – The API key used to call the LLM service.
template (str) – A template string that is filled in and passed to the LLM. For example, "{A1} and {A2}" will fuse columns A1 and A2 into a single string.
output_dir (str) – The S3 directory in which to store the output.
output_column_name (str) – The name of the column that stores the LLM output.
output_table_name (str) – The output table name.
dimensions (Optional[int]) – The desired LLM embedding dimension.
non_blocking (bool) – Whether to make this function non-blocking.
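The template parameter can be pictured with plain Python string formatting. This is only an analogy: the section does not say how Kumo performs the substitution, and the row values below are invented for illustration.

```python
import string

# Template taken from the parameter description above.
template = "{A1} and {A2}"

# Hypothetical row; the column values are made up for this sketch.
row = {"A1": "red", "A2": "cotton shirt"}

# Column names referenced by the template, in order of appearance.
referenced = [
    field for _, field, _, _ in string.Formatter().parse(template) if field
]

# Fuse the referenced columns into one string, as the description suggests.
fused = template.format(**row)

print(referenced)
print(fused)
```

Listing the referenced names first is a cheap way to check that every placeholder corresponds to a real column before any LLM call is made.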
- Return type: