kumoai.connector.FileUploadConnector
- class kumoai.connector.FileUploadConnector
Bases: Connector
Defines a connector to files directly uploaded to Kumo, either as ‘parquet’ or ‘csv’ (non-partitioned) data.
To get started with file upload, first upload a table with the `upload()` method of the `FileUploadConnector` class. You can then access this table behind the file upload connector as follows:

```python
import kumoai

# Create the file upload connector:
connector = kumoai.FileUploadConnector(file_type="parquet")

# Upload the table; assume it is stored at `/data/users.parquet`:
connector.upload(name="users", path="/data/users.parquet")

# Check that the file upload connector has a `users` table:
assert connector.has_table("users")
```
- Parameters:
file_type (str) – The file type of uploaded data. Can be either "csv" or "parquet".
- property name: str
Returns the name of the connector.
Note
If the connector does not support naming, the name refers to an internal specifier.
- property source_type: DataSourceType
Returns the data source type accessible by this connector.
- upload(name, path, auto_partition=True, partition_size_mb=250)
Synchronously uploads a table located on your local machine to the Kumo data plane.
Tables uploaded in this way can be accessed with this `FileUploadConnector` using the provided name, for example: `connector_obj["my_table"]`.
When `auto_partition` is True (the default), files larger than 1GB are automatically split into smaller chunks of `partition_size_mb` MB each and uploaded under a common prefix, which allows the `FileUploadConnector` to union them when reading.
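The partitioning rule above can be sketched in plain Python to estimate how many chunks a given file would produce. `plan_partitions` is a hypothetical helper, not part of kumoai; it assumes "1GB" means 1024**3 bytes and that each chunk holds roughly `partition_size_mb` megabytes of the original file. The SDK's actual chunk boundaries may differ.

```python
import math

# Assumption: "1GB" is interpreted as 1024**3 bytes; the SDK's exact cutoff may differ.
PARTITION_THRESHOLD_BYTES = 1024 ** 3


def plan_partitions(file_size_bytes: int, auto_partition: bool = True,
                    partition_size_mb: int = 250) -> int:
    """Estimate how many chunks a file of the given size is uploaded in.

    Hypothetical helper mirroring the documented upload() rules; it is not
    part of the kumoai API and only approximates the real chunking.
    """
    if file_size_bytes <= PARTITION_THRESHOLD_BYTES:
        return 1  # files up to 1GB are uploaded as a single object
    if not auto_partition:
        # Documented behavior: large files with auto_partition=False raise ValueError.
        raise ValueError(
            f"File is {file_size_bytes} bytes (>1GB); set auto_partition=True to upload it."
        )
    if partition_size_mb <= 0:
        raise ValueError("partition_size_mb must be positive")
    partition_bytes = partition_size_mb * 1024 ** 2
    # Round up so the last, possibly smaller, chunk is counted.
    return math.ceil(file_size_bytes / partition_bytes)


print(plan_partitions(500 * 1024 ** 2))  # 1
print(plan_partitions(3 * 1024 ** 3))    # 13
```

With the default `partition_size_mb=250`, a 3GB file comes out at 13 chunks because the final partial chunk is rounded up.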
Warning
Uploaded tables must be single files in either Parquet or CSV format (matching the connector's file type). Pre-partitioned (multi-file) input tables are not currently supported.
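A local pre-flight check for the constraints in this warning might look like the sketch below. `check_upload_path` is a hypothetical helper, not a kumoai API; it assumes the file extension reflects the file format and treats a directory (the usual on-disk layout of a partitioned dataset) as unsupported input.

```python
from pathlib import Path

SUPPORTED_FILE_TYPES = {"csv", "parquet"}


def check_upload_path(path: str, file_type: str) -> Path:
    """Hypothetical pre-upload check mirroring the documented constraints:
    the input must be a single file whose extension matches the connector's
    file_type ("csv" or "parquet")."""
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"file_type must be 'csv' or 'parquet', got {file_type!r}")
    p = Path(path)
    if p.is_dir():
        # A directory usually indicates a partitioned dataset, which is unsupported.
        raise ValueError(f"{path} is a directory; upload a single file instead")
    if not p.is_file():
        raise FileNotFoundError(path)
    # Assumption: the extension (case-insensitive) reflects the actual format.
    if p.suffix.lower() != f".{file_type}":
        raise ValueError(f"{p.name} does not match the connector file_type {file_type!r}")
    return p
```

Running such a check before `upload()` surfaces a format mismatch locally instead of after a transfer has started.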
```python
import kumoai

connector = kumoai.FileUploadConnector(file_type="parquet")

# Upload a small table:
connector.upload(name="users", path="/data/users.parquet")

# Upload a large Parquet table (automatically partitioned if larger than 1GB):
connector.upload(name="transactions", path="/data/large_transactions.parquet")

# Disable auto-partitioning (raises ValueError if the file is larger than 1GB):
connector.upload(name="users", path="/data/users.parquet", auto_partition=False)

# Create a file upload connector for CSV files:
csv_connector = kumoai.FileUploadConnector(file_type="csv")

# Upload a large CSV table (automatically partitioned if larger than 1GB):
csv_connector.upload(name="sales", path="/data/large_sales.csv")
```
- Parameters:
name (str) – The name of the table to be uploaded. The uploaded table can be accessed from the `FileUploadConnector` with this name.
path (str) – The full path of the table to be uploaded, on the local machine. The file type must match the connector type.
auto_partition (bool) – Whether to automatically partition large files (>1GB). If False and the file is larger than 1GB, a ValueError is raised. Supports both Parquet and CSV files.
partition_size_mb (int) – The size of each partition in MB. Only used if `auto_partition` is True.
- Return type:
- delete(name, file_type)
Synchronously deletes a previously uploaded table from the Kumo data plane.
```python
# Assume we have uploaded a `.parquet` table named `users`, and a
# `FileUploadConnector` has been created called `connector`, and
# we want to delete this table from Kumo:
connector.delete(name="users", file_type="parquet")
```
- table(name)
Returns a `SourceTable` object corresponding to a source table behind this connector. A source table is a view into the raw data of table `name`. To use a source table in Kumo, you will need to construct a `Table` from the source table.
- Parameters:
name (str) – The table name.
- Raises:
ValueError – if `name` does not exist in the backing connector.
- Return type:
SourceTable