kumoai.connector.FileUploadConnector

class kumoai.connector.FileUploadConnector

Bases: Connector

Defines a connector to files uploaded directly to Kumo as either ‘parquet’ or ‘csv’ (non-partitioned) data.

To get started with file upload, first upload a table with the upload() method of the FileUploadConnector class. You can then access the table through the file upload connector as follows:

import kumoai

# Create the file upload connector:
connector = kumoai.FileUploadConnector(file_type="parquet")

# Upload the table; assume it is stored at `/data/users.parquet`
connector.upload(name="users", path="/data/users.parquet")

# Check that the file upload connector has a `users` table:
assert connector.has_table("users")

Parameters:

file_type (str) – The file type of uploaded data. Can be either "csv" or "parquet".

__init__(file_type)

Creates the connector to uploaded files of type file_type.

property name: str

Returns the name of the connector.

Note

If the connector does not support naming, the name refers to an internal specifier.

property source_type: DataSourceType

Returns the data source type accessible by this connector.

upload(name, path, auto_partition=True, partition_size_mb=250)

Synchronously uploads a table located on your local machine to the Kumo data plane.

Tables uploaded in this way can be accessed with this FileUploadConnector using the provided name, for example: connector_obj["my_table"]

For files larger than 1GB (when auto_partition is True), the table is automatically split into smaller chunks that are uploaded under a common prefix, which allows the FileUploadConnector to union them when reading.
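The chunking rule above can be pictured with a short sketch. It only computes how many chunks a file would be split into and what prefixed names they might receive. The `plan_partitions` helper, the `part_NNNNN` naming scheme, and the reading of "MB" as 1024 × 1024 bytes are all assumptions made for illustration, not how kumoai actually names or splits objects.

```python
# Hypothetical sketch: kumoai's real chunk naming and sizing may differ.
BYTES_PER_MB = 1024 * 1024  # assumption: "MB" means 1024 * 1024 bytes


def plan_partitions(table_name: str, file_size_bytes: int,
                    partition_size_mb: int = 250) -> list[str]:
    """Return illustrative names for the chunks of one uploaded file.

    Every name shares the prefix f"{table_name}/", so a reader can list
    that prefix and union the chunks back into a single table.
    """
    if partition_size_mb <= 0:
        raise ValueError("partition_size_mb must be positive")
    chunk_bytes = partition_size_mb * BYTES_PER_MB
    # Integer ceiling division: a trailing partial chunk still gets a part.
    num_parts = max(1, (file_size_bytes + chunk_bytes - 1) // chunk_bytes)
    return [f"{table_name}/part_{i:05d}" for i in range(num_parts)]


# A 1,300 MB file with the default 250 MB chunk size yields 6 parts.
names = plan_partitions("transactions", 1300 * BYTES_PER_MB)
print(len(names), names[0], names[-1])
# prints: 6 transactions/part_00000 transactions/part_00005
```

Sharing one prefix is what lets a reader treat the chunks as a single table without a separate manifest.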

Warning

Uploaded tables must be single files in either Parquet or CSV format, matching the connector's file_type. Tables that are already partitioned across multiple files are not currently supported.
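A caller could check the constraints from this warning locally before calling upload(). The helper below, `check_upload_path`, is hypothetical and not part of kumoai. It assumes a file's format can be judged by its extension, which the real connector may not do.

```python
import os

# Hypothetical helper, not a kumoai API. Assumption: the format is judged
# by file extension; kumoai may validate the file contents instead.
_EXTENSIONS = {"parquet": (".parquet",), "csv": (".csv",)}


def check_upload_path(path: str, file_type: str) -> None:
    """Raise ValueError if `path` breaks the single-file, matching-type rule."""
    if file_type not in _EXTENSIONS:
        raise ValueError(f"file_type must be 'csv' or 'parquet', got {file_type!r}")
    # A directory of part files (an already-partitioned table) is rejected,
    # as is a path that does not exist.
    if not os.path.isfile(path):
        raise ValueError(f"{path!r} is not a single regular file")
    # The extension must match the connector's file_type.
    if not path.lower().endswith(_EXTENSIONS[file_type]):
        raise ValueError(f"{path!r} does not look like a {file_type} file")


# Example: check_upload_path("/data/users.parquet", "parquet")
```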

import kumoai

connector = kumoai.FileUploadConnector(file_type="parquet")

# Upload a small table
connector.upload(name="users", path="/data/users.parquet")

# Upload a large parquet table (automatically partitioned if >1GB)
connector.upload(name="transactions",
                 path="/data/large_transactions.parquet")

# Disable auto-partitioning (raises ValueError for files >1GB)
connector.upload(name="users", path="/data/users.parquet",
                 auto_partition=False)

# Create a file upload connector for CSV files
csv_connector = kumoai.FileUploadConnector(file_type="csv")

# Upload a large CSV table (automatically partitioned if >1GB)
csv_connector.upload(name="sales", path="/data/large_sales.csv")

Parameters:
  • name (str) – The name of the table to be uploaded. The uploaded table can be accessed from the FileUploadConnector with this name.

  • path (str) – The full path of the table to be uploaded, on the local machine. The file type must match the connector's file_type.

  • auto_partition (bool) – Whether to automatically partition large files (>1GB). If False and the file is larger than 1GB, a ValueError is raised. Supports both Parquet and CSV files.

  • partition_size_mb (int) – The size of each partition in MB. Only used if auto_partition is True.

Return type:

None
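The auto_partition rule can be written as a small decision function. This is a sketch of the documented behavior only: the names `needs_partitioning` and `needs_partitioning_for_path` are hypothetical, and treating "1GB" as 1024³ bytes is an assumption, since this page does not say which definition kumoai uses.

```python
import os

# Assumption: "1GB" means 1024**3 bytes; kumoai may use a different cutoff.
AUTO_PARTITION_THRESHOLD_BYTES = 1024 ** 3


def needs_partitioning(file_size_bytes: int, auto_partition: bool = True) -> bool:
    """Sketch of the documented auto_partition rule (hypothetical helper).

    Returns False for files of at most 1GB, True for larger files when
    auto_partition is on, and raises ValueError when it is off.
    """
    if file_size_bytes <= AUTO_PARTITION_THRESHOLD_BYTES:
        return False  # small enough to upload as one object
    if not auto_partition:
        raise ValueError(
            f"file is {file_size_bytes} bytes (>1GB); "
            "pass auto_partition=True to upload it in chunks"
        )
    return True


def needs_partitioning_for_path(path: str, auto_partition: bool = True) -> bool:
    """Apply the same rule to a local file using its size on disk."""
    return needs_partitioning(os.path.getsize(path), auto_partition)
```

Note that partition_size_mb plays no role in this decision; per the parameter description it only matters once partitioning is already happening.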

delete(name, file_type)

Synchronously deletes a previously uploaded table from the Kumo data plane.

# Assumes `connector` is a FileUploadConnector through which a
# `.parquet` table named `users` was previously uploaded.
# Delete that table from Kumo:
connector.delete(name="users", file_type="parquet")

Parameters:
  • name (str) – The name of the table to be deleted. This table must have previously been uploaded with a call to upload().

  • file_type (str) – The file type of the table to be deleted; this can either be "parquet" or "csv", and must match the connector file_type.

Return type:

None
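Because delete() is documented only for tables that were previously uploaded, cleanup code may want to check has_table() first. The helper below is a hypothetical convenience built solely on the documented has_table() and delete() methods. It accepts any object that provides those two methods, so it can be exercised against a stand-in before being pointed at a real connector.

```python
def delete_if_present(connector, name: str, file_type: str) -> bool:
    """Delete `name` through `connector` only if the table exists.

    Hypothetical helper, not part of kumoai. Relies only on the
    has_table() and delete() methods documented for FileUploadConnector.
    Returns True if a delete was issued, False if there was nothing to do.
    """
    if not connector.has_table(name):
        return False
    connector.delete(name=name, file_type=file_type)
    return True


# Usage, assuming `connector` is an existing parquet FileUploadConnector:
#   delete_if_present(connector, "users", "parquet")
```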

has_table(name)

Returns True if the table exists in this connector, False otherwise.

Parameters:

name (str) – The table name.

Return type:

bool

table(name)

Returns a SourceTable object corresponding to a source table behind this connector. A source table is a view into the raw data of the table called name. To use a source table in Kumo, you need to construct a Table from it.

Parameters:

name (str) – The table name.

Raises:

ValueError – if name does not exist in the backing connector.

Return type:

SourceTable

table_names()

Returns a list of table names accessible through this connector.

Return type:

List[str]
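Since table() raises ValueError for an unknown name, code that needs several uploaded tables may prefer to check them all up front and report every missing one at once. The helper below is hypothetical and uses only table_names() and table() from this page; its name and error message are illustrative.

```python
def source_tables(connector, names):
    """Return {name: connector.table(name)} for every requested name.

    Hypothetical helper, not part of kumoai. Checks all names against
    connector.table_names() first, so missing uploads are reported once,
    together with the tables that are actually available.
    """
    available = set(connector.table_names())
    missing = [n for n in names if n not in available]
    if missing:
        raise ValueError(
            f"tables not found: {missing}; available: {sorted(available)}"
        )
    return {n: connector.table(n) for n in names}


# Usage, assuming `connector` already holds these uploads:
#   tables = source_tables(connector, ["users", "transactions"])
```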