public final class Table extends Object implements AutoCloseable
| Modifier and Type | Class and Description |
|---|---|
| static class | `Table.DuplicateKeepOption` Enum to specify which of duplicate rows/elements will be copied to the output. |
| static class | `Table.GroupByOperation` Class representing groupby operations. |
| static class | `Table.TableOperation` |
| static class | `Table.TestBuilder` Create a table on the GPU with data from the CPU. |
| Constructor and Description |
|---|
| `Table(ColumnVector... columns)` Table class makes a copy of the array of ColumnVectors passed to it. |
| `Table(long[] cudfColumns)` Create a Table from an array of existing on device cudf::column pointers. |
| Modifier and Type | Method and Description |
|---|---|
| void | `close()` |
| static Table | `concatenate(Table... tables)` Concatenate multiple tables together to form a single table. |
| GatherMap[] | `conditionalFullJoinGatherMaps(Table rightTable, CompiledExpression condition)` Computes the gather maps that can be used to manifest the result of a full join between two tables when a conditional expression is true. |
| GatherMap[] | `conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition)` Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. |
| GatherMap[] | `conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)` Computes the gather maps that can be used to manifest the result of an inner join between two tables when a conditional expression is true. |
| long | `conditionalInnerJoinRowCount(Table rightTable, CompiledExpression condition)` Computes the number of rows from the result of an inner join between two tables when a conditional expression is true. |
| GatherMap | `conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition)` Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. |
| GatherMap | `conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)` Computes the gather map that can be used to manifest the result of a left anti join between two tables when a conditional expression is true. |
| long | `conditionalLeftAntiJoinRowCount(Table rightTable, CompiledExpression condition)` Computes the number of rows from the result of a left anti join between two tables when a conditional expression is true. |
| GatherMap[] | `conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition)` Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. |
| GatherMap[] | `conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)` Computes the gather maps that can be used to manifest the result of a left join between two tables when a conditional expression is true. |
| long | `conditionalLeftJoinRowCount(Table rightTable, CompiledExpression condition)` Computes the number of rows from the result of a left join between two tables when a conditional expression is true. |
| GatherMap | `conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition)` Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. |
| GatherMap | `conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)` Computes the gather map that can be used to manifest the result of a left semi join between two tables when a conditional expression is true. |
| long | `conditionalLeftSemiJoinRowCount(Table rightTable, CompiledExpression condition)` Computes the number of rows from the result of a left semi join between two tables when a conditional expression is true. |
| ContiguousTable[] | `contiguousSplit(int... indices)` Split a table at given boundaries, but the result of each split has memory that is laid out in a contiguous range of memory. |
| static Table | `convertFromRows(ColumnView vec, DType... schema)` Convert a column of list of bytes that is formatted like the output from `convertToRows` and convert it back to a table. |
| static Table | `convertFromRowsFixedWidthOptimized(ColumnView vec, DType... schema)` Convert a column of list of bytes that is formatted like the output from `convertToRows` and convert it back to a table. |
| ColumnVector[] | `convertToRows()` For details about how this method functions refer to convertToRowsFixedWidthOptimized(). |
| ColumnVector[] | `convertToRowsFixedWidthOptimized()` Convert this table of columns into a row major format that is useful for interacting with other systems that do row major processing of the data. |
| Table | `crossJoin(Table right)` Joins two tables all of the left against all of the right. |
| Table | `dropDuplicates(int[] keyColumns, Table.DuplicateKeepOption keep, boolean nullsEqual)` Copy rows of the current table to an output table such that duplicate rows in the key columns are ignored (i.e., only one row from the duplicate ones will be copied). |
| Table | `explode(int index)` Explodes a list column's elements. |
| Table | `explodeOuter(int index)` Explodes a list column's elements. |
| Table | `explodeOuterPosition(int index)` Explodes a list column's elements retaining any null entries or empty lists and includes a position column. |
| Table | `explodePosition(int index)` Explodes a list column's elements and includes a position column. |
| Table | `filter(ColumnView mask)` Filters this table using a column of boolean values as a mask, returning a new one. |
| static Table | `fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data)` Construct a table from a packed representation. |
| GatherMap[] | `fullJoinGatherMaps(HashJoin rightHash)` Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. |
| GatherMap[] | `fullJoinGatherMaps(HashJoin rightHash, long outputRowCount)` Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. |
| GatherMap[] | `fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)` Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. |
| long | `fullJoinRowCount(HashJoin rightHash)` Computes the number of rows resulting from a full equi-join between two tables. |
| Table | `gather(ColumnView gatherMap)` Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. |
| Table | `gather(ColumnView gatherMap, OutOfBoundsPolicy outOfBoundsPolicy)` Gathers the rows of this table according to `gatherMap` such that row "i" in the resulting table's columns will contain row "gatherMap[i]" from this table. |
| ColumnVector | `getColumn(int index)` Return the ColumnVector at the specified index. |
| static TableWriter | `getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer)` |
| long | `getDeviceMemorySize()` Returns the Device memory buffer size. |
| long | `getNativeView()` Return the native table view handle for this table. |
| int | `getNumberOfColumns()` |
| long | `getRowCount()` |
| Table.GroupByOperation | `groupBy(GroupByOptions groupByOptions, int... indices)` Returns aggregate operations grouped by columns provided in indices. |
| Table.GroupByOperation | `groupBy(int... indices)` Returns aggregate operations grouped by columns provided in indices, with the default options: null is considered as key while grouping. |
| GatherMap[] | `innerJoinGatherMaps(HashJoin rightHash)` Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. |
| GatherMap[] | `innerJoinGatherMaps(HashJoin rightHash, long outputRowCount)` Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. |
| GatherMap[] | `innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)` Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. |
| long | `innerJoinRowCount(HashJoin otherHash)` Computes the number of rows resulting from an inner equi-join between two tables. |
| ColumnVector | `interleaveColumns()` Interleave all columns into a single column. |
| GatherMap | `leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)` Computes the gather map that can be used to manifest the result of a left anti-join between two tables. |
| GatherMap[] | `leftJoinGatherMaps(HashJoin rightHash)` Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. |
| GatherMap[] | `leftJoinGatherMaps(HashJoin rightHash, long outputRowCount)` Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. |
| GatherMap[] | `leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)` Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. |
| long | `leftJoinRowCount(HashJoin rightHash)` Computes the number of rows resulting from a left equi-join between two tables. |
| GatherMap | `leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)` Computes the gather map that can be used to manifest the result of a left semi-join between two tables. |
| ColumnVector | `lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)` Find smallest indices in a sorted table where values should be inserted to maintain order. |
| ColumnVector | `lowerBound(Table valueTable, OrderByArg... args)` Find smallest indices in a sorted table where values should be inserted to maintain order. |
| static Table | `merge(List<Table> tables, OrderByArg... args)` Merge multiple already sorted tables keeping the sort order the same. |
| static Table | `merge(Table[] tables, OrderByArg... args)` Merge multiple already sorted tables keeping the sort order the same. |
| static GatherMap[] | `mixedFullJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes the gather maps that can be used to manifest the result of a full join between two tables using a mix of equality and inequality conditions. |
| static GatherMap[] | `mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. |
| static GatherMap[] | `mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)` Computes the gather maps that can be used to manifest the result of an inner join between two tables using a mix of equality and inequality conditions. |
| static MixedJoinSize | `mixedInnerJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes output size information for an inner join between two tables using a mix of equality and inequality conditions. |
| static GatherMap | `mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes the gather map that can be used to manifest the result of a left anti join between two tables using a mix of equality and inequality conditions. |
| static GatherMap | `mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)` Computes the gather map that can be used to manifest the result of a left anti join between two tables using a mix of equality and inequality conditions. |
| static MixedJoinSize | `mixedLeftAntiJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes output size information for a left anti join between two tables using a mix of equality and inequality conditions. |
| static GatherMap[] | `mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. |
| static GatherMap[] | `mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)` Computes the gather maps that can be used to manifest the result of a left join between two tables using a mix of equality and inequality conditions. |
| static MixedJoinSize | `mixedLeftJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes output size information for a left join between two tables using a mix of equality and inequality conditions. |
| static GatherMap | `mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes the gather map that can be used to manifest the result of a left semi join between two tables using a mix of equality and inequality conditions. |
| static GatherMap | `mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)` Computes the gather map that can be used to manifest the result of a left semi join between two tables using a mix of equality and inequality conditions. |
| static MixedJoinSize | `mixedLeftSemiJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)` Computes output size information for a left semi join between two tables using a mix of equality and inequality conditions. |
| Table.TableOperation | `onColumns(int... indices)` |
| Table | `orderBy(OrderByArg... args)` Orders the table using the sortkeys returning a new allocated table. |
| PartitionedTable | `partition(ColumnView partitionMap, int numberOfPartitions)` Partition this table using the mapping in partitionMap. |
| static StreamedTableReader | `readArrowIPCChunked(ArrowIPCOptions options, File inputFile)` Get a reader that will return tables. |
| static StreamedTableReader | `readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider)` Get a reader that will return tables. |
| static StreamedTableReader | `readArrowIPCChunked(File inputFile)` Get a reader that will return tables. |
| static StreamedTableReader | `readArrowIPCChunked(HostBufferProvider provider)` Get a reader that will return tables. |
| static Table | `readAvro(AvroOptions opts, byte[] buffer)` Read Avro formatted data. |
| static Table | `readAvro(AvroOptions opts, byte[] buffer, long offset, long len)` Read Avro formatted data. |
| static Table | `readAvro(AvroOptions opts, File path)` Read an Avro file. |
| static Table | `readAvro(AvroOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read Avro formatted data. |
| static Table | `readAvro(byte[] buffer)` Read Avro formatted data. |
| static Table | `readAvro(File path)` Read an Avro file using the default AvroOptions. |
| static Table | `readCSV(Schema schema, byte[] buffer)` Read CSV formatted data using the default CSVOptions. |
| static Table | `readCSV(Schema schema, CSVOptions opts, byte[] buffer)` Read CSV formatted data. |
| static Table | `readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)` Read CSV formatted data. |
| static Table | `readCSV(Schema schema, CSVOptions opts, File path)` Read a CSV file. |
| static Table | `readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read CSV formatted data. |
| static Table | `readCSV(Schema schema, File path)` Read a CSV file using the default CSVOptions. |
| static TableWithMeta | `readJSON(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read JSON formatted data and infer the column names and schema. |
| static Table | `readJSON(Schema schema, byte[] buffer)` Read JSON formatted data using the default JSONOptions. |
| static Table | `readJSON(Schema schema, File path)` Read a JSON file using the default JSONOptions. |
| static Table | `readJSON(Schema schema, JSONOptions opts, byte[] buffer)` Read JSON formatted data. |
| static Table | `readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len)` Read JSON formatted data. |
| static Table | `readJSON(Schema schema, JSONOptions opts, File path)` Read a JSON file. |
| static Table | `readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read JSON formatted data. |
| static Table | `readORC(byte[] buffer)` Read ORC formatted data. |
| static Table | `readORC(File path)` Read an ORC file using the default ORCOptions. |
| static Table | `readORC(ORCOptions opts, byte[] buffer)` Read ORC formatted data. |
| static Table | `readORC(ORCOptions opts, byte[] buffer, long offset, long len)` Read ORC formatted data. |
| static Table | `readORC(ORCOptions opts, File path)` Read an ORC file. |
| static Table | `readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read ORC formatted data. |
| static Table | `readParquet(byte[] buffer)` Read Parquet formatted data. |
| static Table | `readParquet(File path)` Read a Parquet file using the default ParquetOptions. |
| static Table | `readParquet(ParquetOptions opts, byte[] buffer)` Read Parquet formatted data. |
| static Table | `readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)` Read Parquet formatted data. |
| static Table | `readParquet(ParquetOptions opts, File path)` Read a Parquet file. |
| static Table | `readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)` Read Parquet formatted data. |
| Table | `repeat(ColumnView counts)` Create a new table by repeating each row of this table. |
| Table | `repeat(int count)` Repeat each row of this table count times. |
| PartitionedTable | `roundRobinPartition(int numberOfPartitions, int startPartition)` Round-robin partition a table into the specified number of partitions. |
| ColumnVector | `rowBitCount()` Returns an approximate cumulative size in bits of all columns in the `table_view` for each row. |
| Table | `sample(long n, boolean replacement, long seed)` Gather `n` samples from the table randomly. Note: does not preserve the ordering. Example: input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}; n: 3; replacement: false; output: {col1: {3, 1, 4}, col2: {8, 6, 9}}; replacement: true; output: {col1: {3, 1, 1}, col2: {8, 6, 6}}. Throws "logic_error" if `n` > table rows and `replacement` == FALSE. |
| Table | `scatter(ColumnView scatterMap, Table target)` Scatters values from the source table into the target table out-of-place, returning a new result table. |
| static Table | `scatter(Scalar[] source, ColumnView scatterMap, Table target)` Scatters values from the source rows into the target table out-of-place, returning a new result table. |
| ColumnVector | `sortOrder(OrderByArg... args)` Get back a gather map that can be used to sort the data. |
| String | `toString()` |
| ColumnVector | `upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)` Find largest indices in a sorted table where values should be inserted to maintain order. |
| ColumnVector | `upperBound(Table valueTable, OrderByArg... args)` Find largest indices in a sorted table where values should be inserted to maintain order. |
| static TableWriter | `writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile)` Get a table writer to write arrow IPC data to a file. |
| static TableWriter | `writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer)` Get a table writer to write arrow IPC data and handle each chunk with a callback. |
| static void | `writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews)` This is an evolving API and will most likely be removed in future releases. |
| void | `writeCSVToFile(CSVWriterOptions options, String outputPath)` |
| static TableWriter | `writeORCChunked(ORCWriterOptions options, File outputFile)` Get a table writer to write ORC data to a file. |
| static TableWriter | `writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)` Get a table writer to write ORC data and handle each chunk with a callback. |
| static TableWriter | `writeParquetChunked(ParquetWriterOptions options, File outputFile)` Get a table writer to write parquet data to a file. |
| static TableWriter | `writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)` Get a table writer to write parquet data and handle each chunk with a callback. |
public Table(ColumnVector... columns)
Table class makes a copy of the array of ColumnVectors passed to it. The class will decrease the refcount on itself and all its contents when closed, and free resources if the refcount is zero.
columns - Array of ColumnVectors

public Table(long[] cudfColumns)
cudfColumns - Array of native handles

public long getNativeView()

public ColumnVector getColumn(int index)
Return the ColumnVector at the specified index. If you want to keep a reference to the column around past the lifetime of the table, you will need to increment the reference count on the column yourself.

public final long getRowCount()

public final int getNumberOfColumns()

public void close()
Specified by: close in interface AutoCloseable

public long getDeviceMemorySize()
public static Table readCSV(Schema schema, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.

public static Table readCSV(Schema schema, CSVOptions opts, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
path - the local file to read.

public static Table readCSV(Schema schema, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.

public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.

public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various CSV parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public void writeCSVToFile(CSVWriterOptions options, String outputPath)

public static TableWriter getCSVBufferWriter(CSVWriterOptions options, HostBufferConsumer bufferConsumer)
public static Table readJSON(Schema schema, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
path - the local file to read.

public static Table readJSON(Schema schema, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
buffer - raw UTF8 formatted bytes.

public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.

public static Table readJSON(Schema schema, JSONOptions opts, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
path - the local file to read.

public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static TableWithMeta readJSON(JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readJSON(Schema schema, JSONOptions opts, HostMemoryBuffer buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.
opts - various JSON parsing options.
buffer - raw UTF8 formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readParquet(File path)
path - the local file to read.

public static Table readParquet(ParquetOptions opts, File path)
opts - various parquet parsing options.
path - the local file to read.

public static Table readParquet(byte[] buffer)
buffer - raw parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various parquet parsing options.
buffer - raw parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readAvro(File path)
path - the local file to read.

public static Table readAvro(AvroOptions opts, File path)
opts - various Avro parsing options.
path - the local file to read.

public static Table readAvro(byte[] buffer)
buffer - raw Avro formatted bytes.

public static Table readAvro(AvroOptions opts, byte[] buffer)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.

public static Table readAvro(AvroOptions opts, byte[] buffer, long offset, long len)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readAvro(AvroOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various Avro parsing options.
buffer - raw Avro formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(File path)
path - the local file to read.

public static Table readORC(ORCOptions opts, File path)
opts - ORC parsing options.
path - the local file to read.

public static Table readORC(byte[] buffer)
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, File outputFile)
options - the parquet writer options.
outputFile - where to write the file.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)
options - the parquet writer options.
consumer - a class that will be called when host buffers are ready with parquet formatted data in them.

public static void writeColumnViewsToParquet(ParquetWriterOptions options, HostBufferConsumer consumer, ColumnView... columnViews)
options - the Parquet writer options.
consumer - a class that will be called when host buffers are ready with Parquet formatted data in them.
columnViews - ColumnViews to write to Parquet

public static TableWriter writeORCChunked(ORCWriterOptions options, File outputFile)
options - the ORC writer options.
outputFile - where to write the file.

public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)
options - the ORC writer options.
consumer - a class that will be called when host buffers are ready with ORC formatted data in them.

public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile)
options - the arrow IPC writer options.
outputFile - where to write the file.

public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer)
options - the arrow IPC writer options.
consumer - a class that will be called when host buffers are ready with arrow IPC formatted data in them.

public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, File inputFile)
options - options for reading.
inputFile - the file to read the Arrow IPC formatted data from

public static StreamedTableReader readArrowIPCChunked(File inputFile)
inputFile - the file to read the Arrow IPC formatted data from

public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider)
options - options for reading.
provider - what will provide the data being read.

public static StreamedTableReader readArrowIPCChunked(HostBufferProvider provider)
provider - what will provide the data being read.

public static Table concatenate(Table... tables)
public ColumnVector interleaveColumns()

public Table repeat(int count)
count - the number of times to repeat each row.

public Table repeat(ColumnView counts)
counts - the number of times to repeat each row. Cannot have nulls, must be an Integer type, and must have one entry for each row in the table.
Throws: CudfException - on any error.

public PartitionedTable partition(ColumnView partitionMap, int numberOfPartitions)
partitionMap - the partitions for each row.
numberOfPartitions - number of partitions
Returns: PartitionedTable, a Table that exposes a limited functionality of the Table class

public ColumnVector lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx 0 1 2 3 4
inputTable = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result = { 1 }
Multi Column:
idx 0 1 2 3 4
inputTable = {{ 10, 20, 20, 20, 20 },
{ 5.0, .5, .5, .7, .7 },
{ 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 },
{ .7 },
{ 61 }}
result = { 3 }
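The lower-bound semantics of the example above can be sketched in plain Java. This is an illustration only and does not call cudf: the table is stored as a column-major `double[][]`, every column is assumed to be sorted ascending, nulls are ignored, and the names `LowerBoundSketch`, `compareRow`, and `lowerBound` are hypothetical.

```java
final class LowerBoundSketch {
    // Lexicographically compares row `row` of a column-major table with `value`,
    // which holds one entry per column. Assumption: ascending order, no nulls.
    static int compareRow(double[][] table, int row, double[] value) {
        for (int c = 0; c < table.length; c++) {
            int cmp = Double.compare(table[c][row], value[c]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Returns the smallest index at which `value` could be inserted
    // while keeping the table sorted.
    static int lowerBound(double[][] table, double[] value) {
        int lo = 0;
        int hi = table[0].length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRow(table, mid, value) < 0) {
                lo = mid + 1; // row sorts before value: search to the right
            } else {
                hi = mid;     // row is equal or greater: keep it as a candidate
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[][] single = {{10, 20, 20, 30, 50}};
        System.out.println(lowerBound(single, new double[] {20})); // prints 1

        double[][] multi = {
            {10, 20, 20, 20, 20},
            {5.0, .5, .5, .7, .7},
            {90, 77, 78, 61, 61}
        };
        System.out.println(lowerBound(multi, new double[] {20, .7, 61})); // prints 3
    }
}
```

Both calls reproduce the results listed in the example: rows are compared column by column, so the multi-column case only reaches index 3 once the second column also matches.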
The input table and the values table need to be non-empty (row count > 0).
areNullsSmallest - per column, true if nulls are assumed smallest
valueTable - the table of values to find insertion locations for
descFlags - per column indicates the ordering, true if descending.

public ColumnVector lowerBound(Table valueTable, OrderByArg... args)
valueTable - the table of values to find insertion locations for
args - the sort order used to sort this table.

public ColumnVector upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx 0 1 2 3 4
inputTable = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result = { 3 }
Multi Column:
idx 0 1 2 3 4
inputTable = {{ 10, 20, 20, 20, 20 },
{ 5.0, .5, .5, .7, .7 },
{ 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 },
{ .7 },
{ 61 }}
result = { 5 }
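A plain-Java sketch of the upper-bound search, extended with a per-column descending flag to show how `descFlags` could affect the comparison. It does not call cudf; the column-major `double[][]` layout, the absence of nulls, and the names `UpperBoundSketch`, `compareRow`, and `upperBound` are assumptions made for illustration.

```java
final class UpperBoundSketch {
    // Compares row `row` of a column-major table with `value`, column by column.
    // A column flagged as descending reverses the result of its comparison.
    static int compareRow(double[][] table, int row, double[] value, boolean[] descFlags) {
        for (int c = 0; c < table.length; c++) {
            int cmp = Double.compare(table[c][row], value[c]);
            if (cmp != 0) {
                return descFlags[c] ? -cmp : cmp;
            }
        }
        return 0;
    }

    // Returns the largest index at which `value` could be inserted
    // while keeping the table sorted.
    static int upperBound(double[][] table, double[] value, boolean[] descFlags) {
        int lo = 0;
        int hi = table[0].length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRow(table, mid, value, descFlags) <= 0) {
                lo = mid + 1; // row sorts before or equal to value: skip past it
            } else {
                hi = mid;     // row sorts after value: candidate insertion point
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[][] ascending = {{10, 20, 20, 30, 50}};
        System.out.println(upperBound(ascending, new double[] {20}, new boolean[] {false})); // prints 3

        double[][] descending = {{50, 30, 20, 20, 10}};
        System.out.println(upperBound(descending, new double[] {20}, new boolean[] {true})); // prints 4
    }
}
```

The only difference from a lower-bound search is the `<= 0` test, which moves past rows equal to the searched value instead of stopping at the first one.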
The input table and the values table need to be non-empty (row count > 0).
areNullsSmallest - per column, true if nulls are assumed smallest
valueTable - the table of values to find insertion locations for
descFlags - per column indicates the ordering, true if descending.

public ColumnVector upperBound(Table valueTable, OrderByArg... args)
valueTable - the table of values to find insertion locations for
args - the sort order used to sort this table.

public Table crossJoin(Table right)
right - the right table

public ColumnVector sortOrder(OrderByArg... args)
args - what order to sort the data by

public Table orderBy(OrderByArg... args)
ColumnVector returned as part of the output Table
Example usage: orderBy(OrderByArg.asc(0), OrderByArg.desc(3)...);
args - Suppliers to initialize sortKeys.
public static Table merge(Table[] tables, OrderByArg... args)
tables - the tables that should be merged.
args - the ordering of the tables. Should match how they were sorted initially.
public static Table merge(List<Table> tables, OrderByArg... args)
tables - the tables that should be merged.
args - the ordering of the tables. Should match how they were sorted initially.
public Table.GroupByOperation groupBy(GroupByOptions groupByOptions, int... indices)
groupByOptions - Options provided in the builder
indices - columns to be considered for groupBy
public Table.GroupByOperation groupBy(int... indices)
indices - columns to be considered for groupBy
public PartitionedTable roundRobinPartition(int numberOfPartitions, int startPartition)
numberOfPartitions - number of partitions to use
startPartition - starting partition index (i.e., where the first row is placed).
PartitionedTable - Table that exposes a limited functionality of the Table class
public Table.TableOperation onColumns(int... indices)
public Table filter(ColumnView mask)
Given a mask column, each element `i` from the input columns is copied to the output columns if the corresponding element `i` in the mask is non-null and `true`. This operation is stable: the input order is preserved.
This table and the mask column must have the same number of rows.
The output table has size equal to the number of elements in mask that are both non-null and `true`.
If the original table row count is zero, there is no error, and an empty table is returned.
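As a rough illustration of the rule above, the following plain-Java sketch applies a nullable boolean mask to a single column. It is CPU-only and does not use the cudf API; `FilterSketch` and its `filter` helper are hypothetical names, and a `Boolean[]` with `null` entries stands in for a nullable BOOL8 column.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of mask filtering on one column. */
final class FilterSketch {
    /** Keeps values[i] only when mask[i] is non-null and true; input order is preserved. */
    static List<Integer> filter(int[] values, Boolean[] mask) {
        if (values.length != mask.length) {
            throw new IllegalArgumentException("column and mask must have the same number of rows");
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (mask[i] != null && mask[i]) {
                out.add(values[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        Boolean[] mask = {true, null, false, true};
        System.out.println(filter(values, mask)); // prints [1, 4]
    }
}
```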
mask - column of type DType.BOOL8 used as a mask to filter the input column
public Table dropDuplicates(int[] keyColumns, Table.DuplicateKeepOption keep, boolean nullsEqual)
keyColumns - Array of indices representing key columns from the current table.
keep - Option specifying to keep any, first, last, or none of the found duplicates.
nullsEqual - Flag to denote whether nulls are treated as equal when comparing rows of the key columns to check for uniqueness.
public ContiguousTable[] contiguousSplit(int... indices)
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
{50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}},
{{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
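The split semantics in this example can be mimicked for one column with a plain-Java sketch. It copies array ranges on the CPU and is not the cudf implementation; `SplitSketch.split` is a hypothetical helper, and it assumes the split indices are ascending and within bounds.

```java
import java.util.Arrays;

/** Plain-Java illustration of splitting one column at a list of row indices. */
final class SplitSketch {
    /** Produces indices.length + 1 pieces; piece i covers rows [previous index, indices[i]). */
    static int[][] split(int[] column, int... indices) {
        int[][] pieces = new int[indices.length + 1][];
        int start = 0;
        for (int i = 0; i <= indices.length; i++) {
            int end = (i < indices.length) ? indices[i] : column.length;
            pieces[i] = Arrays.copyOfRange(column, start, end);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        int[] column = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28};
        // prints [[10, 12], [14, 16, 18], [20, 22, 24, 26], [28]]
        System.out.println(Arrays.deepToString(split(column, 2, 5, 9)));
    }
}
```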
indices - A vector of indices where to make the split
public Table explode(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100]
Note that null lists are completely removed from the output
and nulls inside lists are pulled out and remain.
index - Column index to explode inside the table.
public Table explodePosition(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate in different ways depending on what is null or empty.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100]
Note that null lists are not included in the resulting table, but nulls inside
lists and empty lists will be represented with a null entry for that column in that row.
index - Column index to explode inside the table.
public Table explodeOuter(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100],
[null, 200]
Note that null lists are kept in the output as a single row with a null entry,
and nulls inside lists are pulled out and remain.
index - Column index to explode inside the table.
public Table explodeOuterPosition(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate as null entries in the result.
input: [[5,null,15], 100],
[null, 200],
[[], 300]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100],
[0, null, 200],
[0, null, 300]
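The four explode variants differ only in whether null or empty lists survive and whether a position column is added. The plain-Java sketch below mirrors the examples shown for these methods; it is not the cudf implementation. `ExplodeSketch` and its flags are hypothetical, and dropping empty lists in the non-outer case is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of explode on a list column paired with one plain column. */
final class ExplodeSketch {
    /**
     * outer = false drops null lists (and, by assumption, empty lists);
     * outer = true keeps them as a single row with a null entry.
     * withPosition = true prepends the element index within its list (0 for a kept null or empty list).
     */
    static List<List<Integer>> explode(Integer[][] lists, int[] other, boolean outer, boolean withPosition) {
        List<List<Integer>> out = new ArrayList<>();
        for (int row = 0; row < lists.length; row++) {
            Integer[] list = lists[row];
            if (list == null || list.length == 0) {
                if (outer) {
                    out.add(row(withPosition, 0, null, other[row]));
                }
                continue;
            }
            for (int pos = 0; pos < list.length; pos++) {
                out.add(row(withPosition, pos, list[pos], other[row]));
            }
        }
        return out;
    }

    private static List<Integer> row(boolean withPosition, int pos, Integer value, int other) {
        return withPosition ? Arrays.asList(pos, value, other) : Arrays.asList(value, other);
    }

    public static void main(String[] args) {
        Integer[][] lists = {{5, null, 15}, null, {}};
        int[] other = {100, 200, 300};
        // prints [[0, 5, 100], [1, null, 100], [2, 15, 100], [0, null, 200], [0, null, 300]]
        System.out.println(explode(lists, other, true, true));
    }
}
```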
index - Column index to explode inside the table.
public ColumnVector rowBitCount()
public Table gather(ColumnView gatherMap)
gatherMap - the map of indexes. Must be non-nullable and integral type.
public Table gather(ColumnView gatherMap, OutOfBoundsPolicy outOfBoundsPolicy)
gatherMap - the map of indexes. Must be non-nullable and integral type.
outOfBoundsPolicy - policy to use when an out-of-range value is in `gatherMap`.
public Table scatter(ColumnView scatterMap, Table target)
scatterMap - The map of indexes. Must be non-nullable and integral type.
target - The table into which rows from the current table are to be scattered out-of-place.
public static Table scatter(Scalar[] source, ColumnView scatterMap, Table target)
source - The input scalars containing values to be scattered into the target table.
scatterMap - The map of indexes. Must be non-nullable and integral type.
target - The table into which the values from source are to be scattered out-of-place.
public GatherMap[] leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables,
respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match, otherwise false
public long leftJoinRowCount(HashJoin rightHash)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
rightHash - hash table built from join key columns from the right table
public GatherMap[] leftJoinGatherMaps(HashJoin rightHash)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash - hash table built from join key columns from the right table
public GatherMap[] leftJoinGatherMaps(HashJoin rightHash, long outputRowCount)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from
leftJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightHash - hash table built from join key columns from the right table
outputRowCount - number of output rows in the join result
public long conditionalLeftJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap[] conditionalLeftJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from
conditionalLeftJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
public static MixedJoinSize mixedLeftJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap[] mixedLeftJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing the size result from
mixedLeftJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality)
when the output size was computed previously.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
public GatherMap[] innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables,
respectively, to produce the result of the inner join.
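To make the role of the two gather maps concrete, here is a plain-Java sketch of an inner join on a single integer key column. It uses a nested loop on the CPU instead of a GPU hash join and does not call the cudf API; `GatherMapSketch`, `innerJoinMaps`, and `gather` are hypothetical names. The row order of the maps produced here is an artifact of the loop and should not be read as the order cudf returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of a pair of inner-join gather maps. */
final class GatherMapSketch {
    /** Returns {leftMap, rightMap}: entry i pairs a left row index with a matching right row index. */
    static int[][] innerJoinMaps(int[] leftKeys, int[] rightKeys) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            for (int r = 0; r < rightKeys.length; r++) {
                if (leftKeys[l] == rightKeys[r]) {
                    left.add(l);
                    right.add(r);
                }
            }
        }
        return new int[][] {toIntArray(left), toIntArray(right)};
    }

    /** Builds an output column by picking column[map[i]] for each map entry. */
    static int[] gather(int[] column, int[] map) {
        int[] out = new int[map.length];
        for (int i = 0; i < map.length; i++) {
            out[i] = column[map[i]];
        }
        return out;
    }

    private static int[] toIntArray(List<Integer> values) {
        int[] out = new int[values.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = values.get(i);
        }
        return out;
    }
}
```

Gathering every left column with the first map and every right column with the second map yields the joined rows side by side.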
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match, otherwise false
public long innerJoinRowCount(HashJoin otherHash)
otherHash - hash table built from join key columns from the other table
public GatherMap[] innerJoinGatherMaps(HashJoin rightHash)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash - hash table built from join key columns from the right table
public GatherMap[] innerJoinGatherMaps(HashJoin rightHash, long outputRowCount)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from
innerJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightHash - hash table built from join key columns from the right table
outputRowCount - number of output rows in the join result
public long conditionalInnerJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap[] conditionalInnerJoinGatherMaps(Table rightTable, CompiledExpression condition, long outputRowCount)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from
conditionalInnerJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
public static MixedJoinSize mixedInnerJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap[] mixedInnerJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing the size result from
mixedInnerJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality)
when the output size was computed previously.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
public GatherMap[] fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables,
respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match, otherwise false
public long fullJoinRowCount(HashJoin rightHash)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Note that unlike leftJoinRowCount(HashJoin) and innerJoinRowCount(HashJoin),
this will perform some redundant calculations compared to
fullJoinGatherMaps(HashJoin, long).
rightHash - hash table built from join key columns from the right table
public GatherMap[] fullJoinGatherMaps(HashJoin rightHash)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightHash - hash table built from join key columns from the right table
public GatherMap[] fullJoinGatherMaps(HashJoin rightHash, long outputRowCount)
The HashJoin argument is assumed to have been constructed from the key columns from the right table.
Two GatherMap instances will be returned that can be used to gather the left and right
tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
This interface allows passing an output row count that was previously computed from
fullJoinRowCount(HashJoin).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightHash - hash table built from join key columns from the right table
outputRowCount - number of output rows in the join result
public GatherMap[] conditionalFullJoinGatherMaps(Table rightTable, CompiledExpression condition)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public static GatherMap[] mixedFullJoinGatherMaps(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
Two GatherMap instances will be returned that can be used to gather
the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public GatherMap leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
The GatherMap instance returned can be used to gather the left table to produce the result of the
left semi-join.
It is the responsibility of the caller to close the resulting gather map instance.
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match, otherwise false
public long conditionalLeftSemiJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition)
The GatherMap instance returned can be used to gather the left table
to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap conditionalLeftSemiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
The GatherMap instance returned can be used to gather the left table
to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing an output row count that was previously computed from
conditionalLeftSemiJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
public static MixedJoinSize mixedLeftSemiJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
A GatherMap instance will be returned that can be used to gather
the left table to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap mixedLeftSemiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
A GatherMap instance will be returned that can be used to gather
the left table to produce the result of the left semi join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing the size result from
mixedLeftSemiJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality)
when the output size was computed previously.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
public GatherMap leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
The GatherMap instance returned can be used to gather the left table to produce the result of the
left anti-join.
It is the responsibility of the caller to close the resulting gather map instance.
rightKeys - join key columns from the right table
compareNullsEqual - true if null key values should match, otherwise false
public long conditionalLeftAntiJoinRowCount(Table rightTable, CompiledExpression condition)
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition)
The GatherMap instance returned can be used to gather the left table
to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
public GatherMap conditionalLeftAntiJoinGatherMap(Table rightTable, CompiledExpression condition, long outputRowCount)
The GatherMap instance returned can be used to gather the left table
to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing an output row count that was previously computed from
conditionalLeftAntiJoinRowCount(Table, CompiledExpression).
WARNING: Passing a row count that is smaller than the actual row count will result
in undefined behavior.
rightTable - the right side table of the join
condition - conditional expression to evaluate during the join
outputRowCount - number of output rows in the join result
public static MixedJoinSize mixedLeftAntiJoinSize(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality)
A GatherMap instance will be returned that can be used to gather
the left table to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
public static GatherMap mixedLeftAntiJoinGatherMap(Table leftKeys, Table rightKeys, Table leftConditional, Table rightConditional, CompiledExpression condition, NullEquality nullEquality, MixedJoinSize joinSize)
A GatherMap instance will be returned that can be used to gather
the left table to produce the result of the left anti join.
It is the responsibility of the caller to close the resulting gather map instance.
This interface allows passing the size result from
mixedLeftAntiJoinSize(Table, Table, Table, Table, CompiledExpression, NullEquality)
when the output size was computed previously.
leftKeys - the left table's key columns for the equality condition
rightKeys - the right table's key columns for the equality condition
leftConditional - the left table's columns needed to evaluate the inequality condition
rightConditional - the right table's columns needed to evaluate the inequality condition
condition - the inequality condition of the join
nullEquality - whether nulls should compare as equal
joinSize - mixed join size result
public ColumnVector[] convertToRows()
convertToRowsFixedWidthOptimized().
The only difference between this method and convertToRowsFixedWidthOptimized()
is that this method can handle roughly 250M columns, while convertToRowsFixedWidthOptimized()
can only handle fewer than 100 columns.
public ColumnVector[] convertToRowsFixedWidthOptimized()
result[0]: | row 0 | validity for row 0 | padding | ... | row N | validity for row N | padding |
result[1]: | row N+1 | validity for row N+1 | padding | ...
The format of each row is similar in layout to a C struct where each column will have padding in front of it to align it properly. Each row has padding inserted at the end so the next row is aligned to a 64-bit boundary. This is so that the first column will always start at the beginning (first byte) of the list of bytes and each row has a consistent layout for fixed width types.
Validity bytes are added to the end of the row. There will be one byte for each 8 columns in a row. Because the validity is byte aligned there is no padding between it and the last column in the row.
For example, a table consisting of the following columns A, B, C with the corresponding types
| A - BOOL8 (8-bit) | B - INT16 (16-bit) | C - DURATION_DAYS (32-bit) |
will have a layout that looks like
| A_0 | P | B_0 | B_1 | C_0 | C_1 | C_2 | C_3 | V0 | P | P | P | P | P | P | P |
In this P corresponds to a byte of padding, [LETTER]_[NUMBER] represents the NUMBER byte of the corresponding LETTER column, and V[NUMBER] is a validity byte for the `NUMBER * 8` to `(NUMBER + 1) * 8` columns.
The order of the columns will not be changed, but to reduce the total amount of padding it is recommended to order the columns in the following way.
| C_0 | C_1 | C_2 | C_3 | B_0 | B_1 | A_0 | V0 |
This would have reduced the overall size of the data transferred by half.
One of the main motivations for doing a row conversion on the GPU is to avoid cache problems when walking through columnar data on the CPU in a row wise manner. If you are not transferring very many columns it is likely to be more efficient to just pull back the columns and walk through them. This is especially true of a single column of fixed width data. The extra padding will slow down the transfer and looking at only a handful of buffers is not likely to cause cache issues.
There are some limits on the size of a single row. If the row is larger than 1KB this will throw an exception.
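The row sizes in the A, B, C example can be reproduced with a little arithmetic. The plain-Java sketch below is not part of the cudf API; `RowLayoutSketch` is a hypothetical helper that assumes each fixed-width column is aligned to its own byte width, which is how the layout above reads.

```java
/** Plain-Java illustration of the fixed-width row size calculation. */
final class RowLayoutSketch {
    /** Bytes per row for fixed-width columns of the given byte widths, in column order. */
    static int rowSize(int... columnWidths) {
        int offset = 0;
        for (int width : columnWidths) {
            offset = alignUp(offset, width); // padding in front of the column
            offset += width;
        }
        offset += (columnWidths.length + 7) / 8; // one validity byte per 8 columns
        return alignUp(offset, 8);               // next row starts on a 64-bit boundary
    }

    /** Rounds value up to the next multiple of alignment. */
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        System.out.println(rowSize(1, 2, 4)); // A, B, C order: prints 16
        System.out.println(rowSize(4, 2, 1)); // C, B, A order: prints 8
    }
}
```

Ordering columns from widest to narrowest removes the internal padding, which is where the halving in the example comes from.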
public static Table convertFromRows(ColumnView vec, DType... schema)
vec - the row data to process.
schema - the types of each column.
public static Table convertFromRowsFixedWidthOptimized(ColumnView vec, DType... schema)
vec - the row data to process.
schema - the types of each column.
public static Table fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data)
metadata - host-based metadata for the table
data - GPU data buffer for the table
public Table sample(long n, boolean replacement, long seed)
n - non-negative number of samples expected from table
replacement - Allow or disallow sampling of the same row more than once.
seed - Seed value to initiate random number generator.
Copyright © 2023. All rights reserved.