kumoai.connector.SourceTable
- class kumoai.connector.SourceTable
- Bases: object
A source table is a reference to a table stored behind a backing Connector. It can be used to examine basic information about raw data connected to Kumo, including a sample of the table’s rows, basic statistics, and column data type information.
Once you are ready to use a table as part of a Graph, you may create a Table object from this source table, which includes additional specifying information (including column semantic types and column constraint information).
- Parameters:
- name (str) – The name of this table in the backing connector.
- connector (Connector) – The connector containing this table.
 
Note
Source tables can also be augmented with large language models to introduce contextual embeddings for language features. To do so, please consult add_llm().
Example
>>> import kumoai
>>> connector = kumoai.S3Connector(root_dir='s3://...')
>>> # Look up the source table by name through the connector ...
>>> articles_src = connector['articles']
>>> # ... or construct it directly from the table name and the connector.
>>> articles_src = kumoai.SourceTable('articles', connector)
- property column_dict: Dict[str, SourceColumn]
- Returns the names of the columns in this table along with their SourceColumn information.
- property columns: List[SourceColumn]
- Returns a list of the SourceColumn metadata of the columns in this table.
- head(num_rows=5)
- Returns the first num_rows rows of this source table by reading data from the backing connector.
- add_llm(model, api_key, template, output_dir, output_column_name, output_table_name, dimensions=None, *, non_blocking=False)
- Experimental method which returns a new source table that includes a column computed via an LLM, such as an OpenAI embedding model. Please refer to the example script for more details.
Note
LLM embedding currently works only for a SourceTable in S3.
Note
Your api_key is encrypted once we receive it, and it is decrypted only just before the call to the OpenAI text embeddings.
Note
Please keep track of the token usage in the OpenAI Dashboard. If the number of tokens in the data exceeds the limit, the backend will raise an error and no result will be produced.
Warning
This method only supports text embedding with data that has fewer than ~6 million tokens. The number of tokens is estimated by following this guide.
Warning
This method is still experimental. Please consult with your Kumo POC before using it.
- Parameters:
- model (str) – The LLM model name, e.g., OpenAI’s "text-embedding-3-small".
- api_key (str) – The API key used to call the LLM service.
- template (str) – A template string used to build the text passed to the LLM. For example, "{A1} and {A2}" will fuse columns A1 and A2 into a single string.
- output_dir (str) – The S3 directory in which to store the output.
- output_column_name (str) – The name of the output column produced by the LLM.
- output_table_name (str) – The output table name.
- dimensions (Optional[int]) – The desired LLM embedding dimension.
- non_blocking (bool) – Whether to make this function non-blocking.
 
- Return type:
- Union[SourceTable, LLMSourceTableFuture]
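To illustrate how a template such as "{A1} and {A2}" can fuse several columns into a single string per row, here is a minimal sketch built on Python's built-in str.format. The render_template helper and the sample rows are hypothetical and only mimic the documented behavior; this is not Kumo's implementation, and the real backend may handle missing values or escaping differently.

```python
from typing import Dict, List


def render_template(template: str, rows: List[Dict[str, object]]) -> List[str]:
    """Fill `template` once per row, replacing each `{column}` with that row's value.

    Hypothetical helper for illustration only; not part of the kumoai SDK.
    """
    return [template.format(**row) for row in rows]


# Hypothetical sample data: each dict stands in for one table row.
rows = [
    {"A1": "red", "A2": "cotton"},
    {"A1": "blue", "A2": "wool"},
]

fused = render_template("{A1} and {A2}", rows)
print(fused)  # ['red and cotton', 'blue and wool']
```

When a long template is split across adjacent string literals, Python concatenates them with no separator, so each piece needs its own trailing space.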
Example
>>> import kumoai
>>> connector = kumoai.S3Connector(root_dir='s3://...')
>>> articles_src = connector['articles']
>>> # Start the embedding job without blocking; a future is returned.
>>> articles_src_future = connector["articles"].add_llm(
...     model="text-embedding-3-small",
...     api_key=YOUR_OPENAI_API_KEY,
...     template=("The product {prod_name} in the {section_name} section "
...               "is categorized as {product_type_name} "
...               "and has the following description: {detail_desc}"),
...     output_dir=YOUR_OUTPUT_DIR,
...     output_column_name="embedding_column",
...     output_table_name="articles_emb",
...     dimensions=256,
...     non_blocking=True,
... )
>>> # The future exposes status(), cancel(), and result().
>>> articles_src_future.status()
>>> articles_src_future.cancel()
>>> articles_src = articles_src_future.result()
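Because the warning above caps the input at roughly 6 million tokens, it can help to estimate the token count of the rendered template strings before submitting a job. The sketch below assumes the common rule of thumb of about 4 characters per token for English text. That ratio is an assumption and may not match the guide referenced above. The estimate_tokens helper and the sample texts are hypothetical.

```python
from typing import Iterable

CHARS_PER_TOKEN = 4      # assumption: rough rule of thumb for English text
TOKEN_LIMIT = 6_000_000  # approximate limit quoted in the warning above


def estimate_tokens(texts: Iterable[str], chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Roughly estimate the total token count from the total character count.

    Hypothetical helper for illustration only; not part of the kumoai SDK.
    """
    total_chars = sum(len(text) for text in texts)
    # Ceiling division, so a partial token counts as a whole one.
    return -(-total_chars // chars_per_token)


# Hypothetical rendered template strings, one per table row.
texts = ["red and cotton", "blue and wool"]

estimated = estimate_tokens(texts)
within_limit = estimated < TOKEN_LIMIT
print(estimated, within_limit)
```

A character-based estimate is only a coarse check; an actual tokenizer for the chosen model would give a more reliable count.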