public final class Table extends Object implements AutoCloseable
| Modifier and Type | Class and Description |
|---|---|
| static class | Table.AggregateOperation: Class representing aggregate operations. |
| static class | Table.OrderByArg |
| static class | Table.TableOperation |
| static class | Table.TestBuilder: Create a table on the GPU with data from the CPU. |
| Constructor and Description |
|---|
| Table(ColumnVector... columns): The Table class makes a copy of the array of ColumnVectors passed to it. |
| Modifier and Type | Method and Description |
|---|---|
| static Table.OrderByArg | asc(int index) |
| static Table.OrderByArg | asc(int index, boolean isNullSmallest) |
| void | close() |
| static Table | concatenate(Table... tables): Concatenate multiple tables together to form a single table. |
| ContiguousTable[] | contiguousSplit(int... indices): Split a table at the given boundaries; each split result is laid out in a single contiguous range of memory. |
| static Aggregate | count(int index): Returns a count aggregation over valid values only. |
| static Aggregate | count(int index, boolean include_nulls): Returns a count aggregation. |
| static Table.OrderByArg | desc(int index) |
| static Table.OrderByArg | desc(int index, boolean isNullSmallest) |
| Table | filter(ColumnVector mask): Filters this table using a column of boolean values as a mask, returning a new table. |
| static Aggregate | first(int index, boolean includeNulls): Returns a first aggregation. |
| ColumnVector | getColumn(int index): Returns the ColumnVector at the specified index. |
| long | getDeviceMemorySize(): Returns the device memory buffer size. |
| int | getNumberOfColumns() |
| long | getRowCount() |
| Table.AggregateOperation | groupBy(GroupByOptions groupByOptions, int... indices): Returns aggregate operations grouped by the columns provided in indices. |
| Table.AggregateOperation | groupBy(int... indices): Returns aggregate operations grouped by the columns provided in indices; null is considered a key while grouping. |
| ColumnVector | interleaveColumns(): Interleave all columns into a single column. |
| static Aggregate | last(int index, boolean includeNulls): Returns a last aggregation. |
| ColumnVector | lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags): Given a sorted table, return the lower bound. |
| static Aggregate | max(int index): Returns a max aggregation. |
| static Aggregate | mean(int index): Returns a mean aggregation. |
| static Aggregate | median(int index): Returns a median aggregation. |
| static Aggregate | min(int index): Returns a min aggregation. |
| Table.TableOperation | onColumns(int... indices) |
| Table | orderBy(Table.OrderByArg... args): Orders the table using the sort keys, returning a newly allocated table. |
| static Table | readCSV(Schema schema, byte[] buffer): Read CSV formatted data using the default CSVOptions. |
| static Table | readCSV(Schema schema, CSVOptions opts, byte[] buffer): Read CSV formatted data. |
| static Table | readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len): Read CSV formatted data. |
| static Table | readCSV(Schema schema, CSVOptions opts, File path): Read a CSV file. |
| static Table | readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len): Read CSV formatted data. |
| static Table | readCSV(Schema schema, File path): Read a CSV file using the default CSVOptions. |
| static Table | readORC(byte[] buffer): Read ORC formatted data. |
| static Table | readORC(File path): Read an ORC file using the default ORCOptions. |
| static Table | readORC(ORCOptions opts, byte[] buffer): Read ORC formatted data. |
| static Table | readORC(ORCOptions opts, byte[] buffer, long offset, long len): Read ORC formatted data. |
| static Table | readORC(ORCOptions opts, File path): Read an ORC file. |
| static Table | readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len): Read ORC formatted data. |
| static Table | readParquet(byte[] buffer): Read Parquet formatted data. |
| static Table | readParquet(File path): Read a Parquet file using the default ParquetOptions. |
| static Table | readParquet(ParquetOptions opts, byte[] buffer): Read Parquet formatted data. |
| static Table | readParquet(ParquetOptions opts, byte[] buffer, long offset, long len): Read Parquet formatted data. |
| static Table | readParquet(ParquetOptions opts, File path): Read a Parquet file. |
| static Table | readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len): Read Parquet formatted data. |
| PartitionedTable | roundRobinPartition(int numberOfPartitions, int startPartition): Round-robin partition a table into the specified number of partitions. |
| static Aggregate | sum(int index): Returns a sum aggregation. |
| String | toString() |
| ColumnVector | upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags): Given a sorted table, return the upper bound. |
| void | writeORC(File outputFile): Deprecated; please use writeORCChunked instead. |
| void | writeORC(ORCWriterOptions options, File outputFile): Deprecated; please use writeORCChunked instead. |
| static TableWriter | writeORCChunked(ORCWriterOptions options, File outputFile): Get a table writer to write ORC data to a file. |
| static TableWriter | writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer): Get a table writer to write ORC data and handle each chunk with a callback. |
| void | writeParquet(File outputFile): Deprecated; please use writeParquetChunked instead. |
| void | writeParquet(ParquetWriterOptions options, File outputFile): Deprecated; please use writeParquetChunked instead. |
| static TableWriter | writeParquetChunked(ParquetWriterOptions options, File outputFile): Get a table writer to write Parquet data to a file. |
| static TableWriter | writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer): Get a table writer to write Parquet data and handle each chunk with a callback. |
public Table(ColumnVector... columns)

The Table class makes a copy of the array of ColumnVectors passed to it. The class will decrease the refcount on itself and all its contents when closed, and free resources if the refcount reaches zero.

Parameters:
columns - Array of ColumnVectors

public ColumnVector getColumn(int index)

Returns the ColumnVector at the specified index. If you want to keep a reference to the column around past the lifetime of the table, you will need to increment the reference count on the column yourself.

public final long getRowCount()

public final int getNumberOfColumns()

public void close()

Specified by: close in interface AutoCloseable

public long getDeviceMemorySize()
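The ownership rules above (the table copies the column array, takes its own reference to each column, and releases everything on close) can be sketched without cudf. RefCountedColumn and TableSketch below are illustrative stand-ins, not cudf classes:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the Table ownership model: the table takes a
// reference to each column it is given and releases it on close(),
// freeing a column when its refcount reaches zero.
public class RefCountSketch {
    static class RefCountedColumn {
        int refCount = 1;          // the caller's initial reference
        boolean freed = false;
        void incRefCount() { refCount++; }
        void close() {
            refCount--;
            if (refCount == 0) freed = true;  // resources released
        }
    }

    static class TableSketch implements AutoCloseable {
        private final List<RefCountedColumn> columns = new ArrayList<>();
        TableSketch(RefCountedColumn... cols) {
            for (RefCountedColumn c : cols) {
                c.incRefCount();   // table takes its own reference
                columns.add(c);
            }
        }
        @Override public void close() {
            for (RefCountedColumn c : columns) c.close();
        }
    }

    public static void main(String[] args) {
        RefCountedColumn col = new RefCountedColumn();
        try (TableSketch t = new TableSketch(col)) {
            // while the table is open the column holds two references
        }
        col.close();               // caller releases its own reference
        System.out.println(col.freed);  // prints "true"
    }
}
```

This is why you must increment the reference count yourself if you want a column from getColumn to outlive the table.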
public static Table readCSV(Schema schema, File path)

Read a CSV file using the default CSVOptions.

Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.

public static Table readCSV(Schema schema, CSVOptions opts, File path)

Read a CSV file.

Parameters:
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
path - the local file to read.

public static Table readCSV(Schema schema, byte[] buffer)

Read CSV formatted data using the default CSVOptions.

Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.

public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer)

Read CSV formatted data.

Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.

public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)

Read CSV formatted data.

Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)

Read CSV formatted data.

Parameters:
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readParquet(File path)

Read a Parquet file using the default ParquetOptions.

Parameters:
path - the local file to read.

public static Table readParquet(ParquetOptions opts, File path)

Read a Parquet file.

Parameters:
opts - various Parquet parsing options.
path - the local file to read.

public static Table readParquet(byte[] buffer)

Read Parquet formatted data.

Parameters:
buffer - raw Parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer)

Read Parquet formatted data.

Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)

Read Parquet formatted data.

Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)

Read Parquet formatted data.

Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(File path)

Read an ORC file using the default ORCOptions.

Parameters:
path - the local file to read.

public static Table readORC(ORCOptions opts, File path)

Read an ORC file.

Parameters:
opts - ORC parsing options.
path - the local file to read.

public static Table readORC(byte[] buffer)

Read ORC formatted data.

Parameters:
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer)

Read ORC formatted data.

Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len)

Read ORC formatted data.

Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)

Read ORC formatted data.

Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, File outputFile)

Get a table writer to write Parquet data to a file.

Parameters:
options - the Parquet writer options.
outputFile - where to write the file.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)

Get a table writer to write Parquet data and handle each chunk with a callback.

Parameters:
options - the Parquet writer options.
consumer - a class that will be called when host buffers are ready with Parquet formatted data in them.

@Deprecated public void writeParquet(File outputFile)

Deprecated. Please use writeParquetChunked instead.

Parameters:
outputFile - file to write the table to

@Deprecated public void writeParquet(ParquetWriterOptions options, File outputFile)

Deprecated. Please use writeParquetChunked instead.

Parameters:
options - parameters for the writer
outputFile - file to write the table to

public static TableWriter writeORCChunked(ORCWriterOptions options, File outputFile)

Get a table writer to write ORC data to a file.

Parameters:
options - the ORC writer options.
outputFile - where to write the file.

public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)

Get a table writer to write ORC data and handle each chunk with a callback.

Parameters:
options - the ORC writer options.
consumer - a class that will be called when host buffers are ready with ORC formatted data in them.

@Deprecated public void writeORC(File outputFile)

Deprecated. Please use writeORCChunked instead.

Parameters:
outputFile - file to write the table to

@Deprecated public void writeORC(ORCWriterOptions options, File outputFile)

Deprecated. Please use writeORCChunked instead.

Parameters:
outputFile - file to write the table to

public static Table concatenate(Table... tables)

Concatenate multiple tables together to form a single table.
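All of the (buffer, offset, len) read overloads above parse only the len bytes starting at offset. The slicing convention can be sketched in plain Java without cudf; CsvSliceSketch is illustrative and the CSV splitting is deliberately naive (no quoting support), just enough to show which bytes are consumed:

```java
import java.nio.charset.StandardCharsets;

// Plain-Java sketch of the (buffer, offset, len) convention: decode only
// the requested byte range as UTF-8 and parse rows out of that slice.
public class CsvSliceSketch {
    public static String[][] parse(byte[] buffer, long offset, long len) {
        String text = new String(buffer, (int) offset, (int) len, StandardCharsets.UTF_8);
        String[] lines = text.split("\n");
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            rows[i] = lines[i].split(",");
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] buf = "skip me\n1,2\n3,4\n".getBytes(StandardCharsets.UTF_8);
        // parse only the slice holding "1,2\n3,4": offset 8, length 7
        String[][] rows = parse(buf, 8, 7);
        System.out.println(rows.length + " rows, first cell " + rows[0][0]);
        // prints "2 rows, first cell 1"
    }
}
```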
public ColumnVector interleaveColumns()

Interleave all columns into a single column.
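A plain-Java sketch of the interleaving, assuming row-major order (row 0 of every column, then row 1, and so on); InterleaveSketch is illustrative, not a cudf class:

```java
// Plain-Java sketch of interleaveColumns(): values from the input columns
// are emitted in row-major order into one flat output column.
public class InterleaveSketch {
    public static int[] interleave(int[][] columns) {
        int rows = columns[0].length;
        int cols = columns.length;
        int[] out = new int[rows * cols];
        int k = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[k++] = columns[c][r];   // row r of every column, in order
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two columns {1,2,3} and {4,5,6} interleave to {1,4,2,5,3,6}
        System.out.println(java.util.Arrays.toString(
            interleave(new int[][]{{1, 2, 3}, {4, 5, 6}})));
        // prints "[1, 4, 2, 5, 3, 6]"
    }
}
```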
public ColumnVector lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)

Given a sorted table, return the lower bound.

Parameters:
areNullsSmallest - true if nulls are assumed smallest
valueTable - the table of values that need to be inserted
descFlags - indicates the ordering of the column(s); true if descending

public ColumnVector upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)

Given a sorted table, return the upper bound.

Parameters:
areNullsSmallest - true if nulls are assumed smallest
valueTable - the table of values that need to be inserted
descFlags - indicates the ordering of the column(s); true if descending

public Table orderBy(Table.OrderByArg... args)

Orders the table using the sort keys, returning a newly allocated table.
ColumnVector returned as part of the output Table.
Example usage: orderBy(Table.asc(0), Table.desc(3));

Parameters:
args - suppliers to initialize the sort keys.

public static Table.OrderByArg asc(int index)
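The multi-key ordering that orderBy performs with asc/desc sort keys can be mimicked in plain Java with chained Comparators. OrderBySketch is an illustrative stand-in, not a cudf class (cudf performs the sort on the GPU), and here isNullSmallest simply chooses nullsFirst versus nullsLast before any descending reversal:

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java sketch of orderBy(Table.asc(i), Table.desc(j), ...): each
// sort key contributes one comparator level over a row of boxed values.
public class OrderBySketch {
    static Comparator<Integer[]> key(int index, boolean ascending, boolean isNullSmallest) {
        Comparator<Integer> nulls = isNullSmallest
            ? Comparator.nullsFirst(Comparator.<Integer>naturalOrder())
            : Comparator.nullsLast(Comparator.<Integer>naturalOrder());
        Comparator<Integer[]> cmp = Comparator.comparing(row -> row[index], nulls);
        return ascending ? cmp : cmp.reversed();
    }

    public static void main(String[] args) {
        Integer[][] rows = {{2, 9}, {null, 0}, {1, 7}, {1, 3}};
        // ascending on column 0 (nulls smallest), then descending on column 1
        Arrays.sort(rows, key(0, true, true).thenComparing(key(1, false, true)));
        System.out.println(Arrays.deepToString(rows));
        // prints "[[null, 0], [1, 7], [1, 3], [2, 9]]"
    }
}
```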
public static Table.OrderByArg desc(int index)
public static Table.OrderByArg asc(int index, boolean isNullSmallest)
public static Table.OrderByArg desc(int index, boolean isNullSmallest)
public static Aggregate count(int index)

Returns a count aggregation with only valid values.

Parameters:
index - column on which the aggregation is to be performed

public static Aggregate count(int index, boolean include_nulls)

Returns a count aggregation.

Parameters:
index - column on which the aggregation is to be performed
include_nulls - include nulls if set to true

public static Aggregate max(int index)

Returns a max aggregation.

Parameters:
index - column on which the max aggregation is to be performed

public static Aggregate min(int index)

Returns a min aggregation.

Parameters:
index - column on which the min aggregation is to be performed

public static Aggregate sum(int index)

Returns a sum aggregation.

Parameters:
index - column on which the sum aggregation is to be performed

public static Aggregate mean(int index)

Returns a mean aggregation.

Parameters:
index - column on which the mean aggregation is to be performed

public static Aggregate median(int index)

Returns a median aggregation.

Parameters:
index - column on which the median aggregation is to be performed

public static Aggregate first(int index, boolean includeNulls)

Returns a first aggregation.

Parameters:
index - column on which the first aggregation is to be performed
includeNulls - specifies whether null values are included in the aggregate operation

public static Aggregate last(int index, boolean includeNulls)

Returns a last aggregation.

Parameters:
index - column on which the last aggregation is to be performed
includeNulls - specifies whether null values are included in the aggregate operation

public Table.AggregateOperation groupBy(GroupByOptions groupByOptions, int... indices)

Returns aggregate operations grouped by the columns provided in indices.

Parameters:
groupByOptions - options provided in the builder
indices - columns to be considered for groupBy

public Table.AggregateOperation groupBy(int... indices)

Returns aggregate operations grouped by the columns provided in indices. Null is considered a key while grouping.

Parameters:
indices - columns to be considered for groupBy

public PartitionedTable roundRobinPartition(int numberOfPartitions, int startPartition)

Round-robin partition a table into the specified number of partitions.

Parameters:
numberOfPartitions - number of partitions to use
startPartition - starting partition index (i.e. where the first row is placed)

Returns:
PartitionedTable - a table that exposes limited functionality of the Table class

public Table.TableOperation onColumns(int... indices)
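What a grouped aggregation such as groupBy(keyColumn) followed by a sum computes can be sketched in plain Java. GroupBySketch is illustrative, not part of cudf; note that a null key still forms its own group, matching the "null is considered a key while grouping" behavior above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java sketch of a grouped sum: rows are bucketed by the key column
// (a null key gets its own bucket) and the value column is summed per bucket.
public class GroupBySketch {
    public static Map<Optional<Integer>, Integer> groupSum(Integer[] keys, int[] values) {
        Map<Optional<Integer>, Integer> result = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Optional.empty() represents the null-key group
            result.merge(Optional.ofNullable(keys[i]), values[i], Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[] keys = {1, null, 1, null, 2};
        int[] values = {10, 20, 30, 40, 50};
        // group 1 sums to 40, group 2 to 50, and the null group to 60
        System.out.println(groupSum(keys, values));
    }
}
```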
public Table filter(ColumnVector mask)

Filters this table using a column of boolean values as a mask, returning a new table.

Given a mask column, each element i of the input columns is copied to the output columns if the corresponding element i in the mask is non-null and true. This operation is stable: the input order is preserved.

This table and the mask column must have the same number of rows.

The output table has as many rows as there are mask elements that are both non-null and true.

If the original table's row count is zero, there is no error and an empty table is returned.
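The mask semantics above can be mimicked in plain Java; FilterSketch is an illustrative stand-in (a null mask entry behaves like false, and order is preserved):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of filter(mask): element i of the input is kept only
// when mask[i] is non-null and true; the input order is preserved (stable).
public class FilterSketch {
    public static List<Integer> filter(int[] column, Boolean[] mask) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < column.length; i++) {
            if (mask[i] != null && mask[i]) {
                out.add(column[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] column = {10, 20, 30, 40};
        Boolean[] mask = {true, null, false, true};
        System.out.println(filter(column, mask));  // prints "[10, 40]"
    }
}
```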
Parameters:
mask - column of type DType.BOOL8 used as a mask to filter the input column

public ContiguousTable[] contiguousSplit(int... indices)

Split a table at the given boundaries; each split result is laid out in a single contiguous range of memory.
Example:
input:  [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
         {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}},
         {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]

Parameters:
indices - a vector of indices where to make the split

Copyright © 2020. All rights reserved.