kumoai.connector.FileUploadConnector
- class kumoai.connector.FileUploadConnector
Bases: Connector
Defines a connector to files directly uploaded to Kumo, either as ‘parquet’ or ‘csv’ (non-partitioned) data.
To get started with file upload, first upload a table with the upload() method of the FileUploadConnector class. You can then access this table behind the file upload connector as follows:

    import kumoai

    # Create the file upload connector:
    connector = kumoai.FileUploadConnector(file_type="parquet")

    # Upload the table; assume it is stored at `/data/users.parquet`
    connector.upload(name="users", path="/data/users.parquet")

    # Check that the file upload connector has a `users` table:
    assert connector.has_table("users")
- Parameters:
file_type (str) – The file type of uploaded data. Can be either "csv" or "parquet".
- property name: str
Returns the name of the connector.
Note
If the connector does not support naming, the name refers to an internal specifier.
- property source_type: DataSourceType
Returns the data source type accessible by this connector.
- upload(name, path, auto_partition=True, partition_size_mb=250)
Upload a table to Kumo from a local or remote path.
Supported remote schemes are s3://, gs://, abfs://, abfss://, and az://.
Tables uploaded this way can be accessed from this FileUploadConnector using the provided name, e.g., connector_obj["my_table"].
- Return type:
Local files
Accepts one .parquet or .csv file (must match this connector’s file_type).
If the file is > 1 GiB and auto_partition=True, it is split into parts of roughly partition_size_mb MiB and uploaded under a common prefix so the connector can read them as one table.
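The local-file rule above is simple arithmetic, so a rough sketch can show how many parts to expect. The helper name `estimate_partitions` is hypothetical and not part of the kumoai SDK; the exception type and the exact rounding are assumptions, and the real partitioner may split differently (for example, on row-group boundaries).

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_partitions(file_size_bytes: int,
                        auto_partition: bool = True,
                        partition_size_mb: int = 250) -> int:
    """Hypothetical helper (not part of kumoai): estimate the number of parts."""
    if file_size_bytes <= GIB:
        # At or below the 1 GiB threshold the file is uploaded as-is.
        return 1
    if not auto_partition:
        # The docs say this case raises; ValueError is an assumption.
        raise ValueError("file is larger than 1 GiB and auto_partition=False")
    if not 100 <= partition_size_mb <= 1000:
        # Documented target range; whether the SDK enforces it is an assumption.
        raise ValueError("partition_size_mb must be between 100 and 1000")
    # Round up so the last, smaller part is counted too.
    return math.ceil(file_size_bytes / (partition_size_mb * MIB))
```

Under these assumptions, a 3 GiB file with the default partition_size_mb=250 gives ceil(3072 / 250) = 13 parts.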
Remote paths
Single file (.parquet/.csv): validated and uploaded via multipart PUT. Files > 1 GiB are rejected; re-shard to ~200 MiB and upload the directory instead.
Directory: must contain only one format (all Parquet or all CSV) matching this connector’s file_type. Files are validated (consistent schema; CSV headers sanitized) and uploaded in parallel with memory-safe budgeting.
Warning
For local uploads, input must be a single CSV or Parquet file (matching the connector type). For remote uploads, mixed CSV/Parquet directories are not supported. Remote single files larger than 1 GiB are not supported.
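The path constraints above can be checked before calling upload(). The following is a minimal sketch, assuming that paths are classified only by URL scheme and file extension; `classify_upload_path` is a hypothetical helper, not an SDK function, and it does not inspect file sizes or directory contents.

```python
from typing import Tuple
from urllib.parse import urlparse

# Remote schemes listed in the upload() documentation.
REMOTE_SCHEMES = {"s3", "gs", "abfs", "abfss", "az"}


def classify_upload_path(path: str, file_type: str) -> Tuple[str, str]:
    """Hypothetical pre-flight check; returns (location, kind)."""
    if file_type not in ("csv", "parquet"):
        raise ValueError(f"unsupported file_type: {file_type!r}")
    location = "remote" if urlparse(path).scheme in REMOTE_SCHEMES else "local"
    expected = "." + file_type
    other = ".csv" if file_type == "parquet" else ".parquet"
    lowered = path.lower()
    if lowered.endswith(expected):
        return location, "file"
    if lowered.endswith(other):
        raise ValueError(f"{path!r} does not match file_type={file_type!r}")
    if location == "remote":
        # Assumption: a remote path without a data-file suffix is a directory.
        return location, "directory"
    raise ValueError("local uploads must be a single .csv or .parquet file")
```

For example, `classify_upload_path("gs://mybkt/events_csv/", "csv")` is treated as a remote directory, while a local `.csv` path checked against a Parquet connector raises ValueError.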
Examples:

    import kumoai

    conn = kumoai.FileUploadConnector(file_type="parquet")

    # Local: small file
    conn.upload(name="users", path="/data/users.parquet")

    # Local: large file (auto-partitions)
    conn.upload(
        name="txns",
        path="/data/large_txns.parquet",
    )

    # Local: disable auto-partitioning (raises if > 1 GiB)
    conn.upload(
        name="users",
        path="/data/users.parquet",
        auto_partition=False,
    )

    # CSV connector
    csv_conn = kumoai.FileUploadConnector(file_type="csv")
    csv_conn.upload(name="sales", path="/data/sales.csv")

    # Remote: single file (<= 1 GiB)
    conn.upload(name="logs", path="s3://bkt/path/logs.parquet")

    # Remote: directory of shards (uniform format)
    csv_conn.upload(name="events", path="gs://mybkt/events_csv/")
- Parameters:
name – Table name to create in Kumo; access later via this connector.
path – Local path or remote URL to a .parquet/.csv file or a directory (uniform format). The format must match this connector’s file_type.
auto_partition – Local-only. If True and the local file is > 1 GiB, split into parts of roughly partition_size_mb MiB.
partition_size_mb – Local-only. Target partition size (100–1000 MiB) when auto_partition is True.
- delete(name)
Synchronously deletes a previously uploaded table from the Kumo data plane.

    # Assume we have uploaded a `.parquet` table named `users`, and a
    # `FileUploadConnector` has been created called `connector`, and
    # we want to delete this table from Kumo:
    connector.delete(name="users")
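Deleting and uploading can be combined when a table needs to be refreshed. The sketch below uses only the has_table(), delete(), and upload() calls shown on this page. `replace_table` is a hypothetical helper, and whether upload() by itself overwrites an existing table of the same name is not stated here, so the explicit delete is an assumption made for safety.

```python
def replace_table(connector, name: str, path: str) -> None:
    """Hypothetical helper: delete `name` if it exists, then upload it again.

    `connector` is expected to behave like a FileUploadConnector.
    """
    if connector.has_table(name):
        # delete() is synchronous, so the table is gone before re-upload.
        connector.delete(name=name)
    connector.upload(name=name, path=path)
```

With a real connector this would be called as `replace_table(connector, "users", "/data/users.parquet")`.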
- table(name)
Returns a SourceTable object corresponding to a source table behind this connector. A source table is a view into the raw data of table name. To use a source table in Kumo, you will need to construct a Table from the source table.
- Parameters:
name (str) – The table name.
- Raises:
ValueError – if name does not exist in the backing connector.
- Return type: