kumoai.connector.FileUploadConnector

class kumoai.connector.FileUploadConnector

Bases: Connector

Defines a connector to files directly uploaded to Kumo, either as ‘parquet’ or ‘csv’ (non-partitioned) data.

To get started, first upload a table with the upload() method of FileUploadConnector. You can then access that table through the connector as follows:

import kumoai

# Create the file upload connector:
connector = kumoai.FileUploadConnector(file_type="parquet")

# Upload the table; assume it is stored at `/data/users.parquet`
connector.upload(name="users", path="/data/users.parquet")

# Check that the file upload connector has a `users` table:
assert connector.has_table("users")

Parameters:

file_type (str) – The file type of uploaded data. Can be either "csv" or "parquet".

__init__(file_type)

Creates the connector to uploaded files of type file_type.

property name: str

Returns the name of the connector.

Note

If the connector does not support naming, the name refers to an internal specifier.

property source_type: DataSourceType

Returns the data source type accessible by this connector.

upload(name, path, auto_partition=True, partition_size_mb=250)

Upload a table to Kumo from a local or remote path.

Supported remote URL schemes are s3://, gs://, abfs://, abfss://, and az://.

Tables uploaded this way can be accessed from this FileUploadConnector using the provided name, e.g., connector_obj["my_table"].

Return type:

None

Local files

  • Accepts one .parquet or .csv file (must match this connector’s file_type).

  • If the file is > 1 GiB and auto_partition=True, it is split into parts of approximately partition_size_mb MiB each and uploaded under a common prefix so the connector can read them as one table.
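
The partitioning rule above is simple arithmetic, sketched below. estimate_part_count is a hypothetical helper and not part of kumoai. It assumes that "1 GiB" means 1024**3 bytes, that parts are cut at exactly partition_size_mb MiB (the real splitter only targets that size approximately), and it uses ValueError as a stand-in for whatever error the library actually raises.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_part_count(file_size_bytes: int,
                        auto_partition: bool = True,
                        partition_size_mb: int = 250) -> int:
    """Estimate how many parts a local upload would produce (illustrative)."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must be non-negative")
    # Files at or below the 1 GiB threshold are uploaded as a single file.
    if file_size_bytes <= GIB:
        return 1
    # Above the threshold, an upload without auto-partitioning is an error.
    # ValueError is an assumption; the library's exception type is not documented here.
    if not auto_partition:
        raise ValueError("file exceeds 1 GiB and auto_partition is False")
    # Round up so the final, smaller part is counted as well.
    return math.ceil(file_size_bytes / (partition_size_mb * MIB))
```

A call such as estimate_part_count(os.path.getsize(path)) gives a rough idea of how many parts to expect before starting a large upload.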

Remote paths

  • Single file (.parquet/.csv): validated and uploaded via multipart PUT. Files > 1 GiB are rejected; re-shard them into files of roughly 200 MiB and upload the directory instead.

  • Directory: must contain only one format (all Parquet or all CSV) matching this connector’s file_type. Files are validated (consistent schema; CSV headers sanitized) and uploaded in parallel with memory-safe budgeting.

Warning

For local uploads, input must be a single CSV or Parquet file (matching the connector type). For remote uploads, mixed CSV/Parquet directories are not supported. Remote single files larger than 1 GiB are not supported.
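
A client-side pre-flight check can catch some of these constraints before an upload starts. The sketch below is illustrative only: classify_upload_path is a hypothetical helper, not part of kumoai, and it assumes that a remote path without the expected file extension is a directory. It cannot check file sizes or directory contents.

```python
from urllib.parse import urlparse

# Remote URL schemes listed in the upload() documentation.
REMOTE_SCHEMES = {"s3", "gs", "abfs", "abfss", "az"}


def classify_upload_path(path: str, file_type: str) -> str:
    """Return 'local-file', 'remote-file', or 'remote-directory'."""
    if file_type not in ("csv", "parquet"):
        raise ValueError(f"unsupported file_type: {file_type!r}")

    lowered = path.lower()
    other_suffix = ".csv" if file_type == "parquet" else ".parquet"
    if lowered.endswith(other_suffix):
        raise ValueError(f"{path!r} does not match file_type {file_type!r}")

    is_file = lowered.endswith("." + file_type)
    scheme = urlparse(path).scheme.lower()

    if scheme in REMOTE_SCHEMES:
        # Assumption: a remote path without the file suffix is a directory.
        return "remote-file" if is_file else "remote-directory"
    # Single-letter "schemes" are treated as Windows drive letters.
    if len(scheme) > 1:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    if not is_file:
        raise ValueError("local uploads must be a single .csv or .parquet file")
    return "local-file"
```

For instance, classify_upload_path("gs://mybkt/events_csv/", "csv") classifies the path as a remote directory, while passing a .csv path together with file_type="parquet" raises before any data is transferred.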

Examples:

import kumoai
conn = kumoai.FileUploadConnector(file_type="parquet")

# Local: small file
conn.upload(name="users", path="/data/users.parquet")

# Local: large file (auto-partitions)
conn.upload(
    name="txns",
    path="/data/large_txns.parquet",
)

# Local: disable auto-partitioning (raises if > 1 GiB)
conn.upload(
    name="users",
    path="/data/users.parquet",
    auto_partition=False,
)

# CSV connector
csv_conn = kumoai.FileUploadConnector(file_type="csv")
csv_conn.upload(name="sales", path="/data/sales.csv")

# Remote: single file (<= 1 GiB)
conn.upload(name="logs", path="s3://bkt/path/logs.parquet")

# Remote: directory of shards (uniform format)
csv_conn.upload(name="events", path="gs://mybkt/events_csv/")

Parameters:

name (str) – Table name to create in Kumo; access later via this connector.

path (str) – Local path or remote URL to a .parquet/.csv file or a directory (uniform format). The format must match this connector’s file_type.

auto_partition (bool) – Local-only. If True and the local file is > 1 GiB, split it into parts of approximately partition_size_mb MiB each.

partition_size_mb (int) – Local-only. Target partition size (100–1000 MiB) when auto_partition is True.

delete(name)

Synchronously deletes a previously uploaded table from the Kumo data plane.

# Assume we have uploaded a `.parquet` table named `users`, and a
# `FileUploadConnector` has been created called `connector`, and
# we want to delete this table from Kumo:
connector.delete(name="users")

Parameters:

name (str) – The name of the table to be deleted. This table must have previously been uploaded with a call to upload().

Return type:

None
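
Because delete() expects a table that was previously uploaded, a guard with has_table() avoids calling it on a missing name. The following is a minimal sketch: delete_if_exists is a hypothetical helper, not part of kumoai, and it only assumes the has_table() and delete() methods documented on this page.

```python
def delete_if_exists(connector, name: str) -> bool:
    """Delete table `name` if the connector has it.

    Returns True if a delete was issued, False if the table was absent.
    """
    if not connector.has_table(name):
        return False
    connector.delete(name=name)
    return True
```

With a FileUploadConnector called connector, delete_if_exists(connector, "users") can be called repeatedly without having to check for the table first.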

has_table(name)

Returns True if the table exists in this connector, False otherwise.

Parameters:

name (str) – The table name.

Return type:

bool

table(name)

Returns a SourceTable object corresponding to a source table behind this connector. A source table is a view into the raw data of table name. To use a source table in Kumo, you will need to construct a Table from the source table.

Parameters:

name (str) – The table name.

Raises:

ValueError – if name does not exist in the backing connector.

Return type:

SourceTable
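
Since table() raises ValueError for an unknown name, callers that prefer a fallback value can wrap the lookup. This is a sketch only: get_source_table_or_none is a hypothetical helper, not part of kumoai, and it relies solely on the table() behaviour described above.

```python
from typing import Any, Optional


def get_source_table_or_none(connector, name: str) -> Optional[Any]:
    """Return connector.table(name), or None if the table does not exist."""
    try:
        return connector.table(name)
    except ValueError:
        # table() is documented to raise ValueError for unknown names.
        return None
```

Returning None trades the explicit error for a value that the caller must check, which suits optional tables but can hide typos in required ones.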

table_names()

Returns a list of table names accessible through this connector.

Return type:

List[str]
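
One use of table_names() is checking that every table a workflow needs has been uploaded. The sketch below assumes only the table_names() method documented above; missing_tables is a hypothetical helper, not part of kumoai.

```python
from typing import Iterable, List


def missing_tables(connector, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that the connector does not expose."""
    available = set(connector.table_names())
    # Preserve the caller's ordering so the result is easy to report.
    return [name for name in required if name not in available]
```

An empty result means every required table is available through the connector.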