public final class Table extends Object implements AutoCloseable
| Modifier and Type | Class and Description |
|---|---|
| static class | Table.GroupByOperation - Class representing groupby operations |
| static class | Table.TableOperation |
| static class | Table.TestBuilder - Create a table on the GPU with data from the CPU. |
| Constructor and Description |
|---|
| Table(ColumnVector... columns) - The Table class makes a copy of the array of ColumnVectors passed to it. |
| Table(long[] cudfColumns) - Create a Table from an array of existing on-device cudf::column pointers. |
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| static Table | concatenate(Table... tables) - Concatenate multiple tables together to form a single table. |
| ContiguousTable[] | contiguousSplit(int... indices) - Split a table at the given boundaries, such that the result of each split is laid out in a single contiguous range of memory. |
| static Table | convertFromRows(ColumnView vec, DType... schema) - Convert a column of lists of bytes, formatted like the output from convertToRows, back into a table. |
| ColumnVector[] | convertToRows() - Convert this table of columns into a row-major format that is useful for interacting with other systems that do row-major processing of the data. |
| Table | crossJoin(Table right) - Cross-joins two tables, pairing every row of the left table with every row of the right. |
| Table | explode(int index) - Explodes a list column's elements. |
| Table | explodeOuter(int index) - Explodes a list column's elements, retaining any null entries or empty lists. |
| Table | explodeOuterPosition(int index) - Explodes a list column's elements, retaining any null entries or empty lists, and includes a position column. |
| Table | explodePosition(int index) - Explodes a list column's elements and includes a position column. |
| Table | filter(ColumnView mask) - Filters this table using a column of boolean values as a mask, returning a new table. |
| static Table | fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data) - Construct a table from a packed representation. |
| GatherMap[] | fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual) - Computes the gather maps that can be used to manifest the result of a full equi-join between two tables. |
| Table | gather(ColumnView gatherMap) - Gathers the rows of this table according to gatherMap, such that row i in the resulting table's columns will contain row gatherMap[i] from this table. |
| Table | gather(ColumnView gatherMap, boolean checkBounds) - Gathers the rows of this table according to gatherMap, such that row i in the resulting table's columns will contain row gatherMap[i] from this table. |
| ColumnVector | getColumn(int index) - Return the ColumnVector at the specified index. |
| long | getDeviceMemorySize() - Returns the device memory buffer size. |
| int | getNumberOfColumns() |
| long | getRowCount() |
| Table.GroupByOperation | groupBy(GroupByOptions groupByOptions, int... indices) - Returns aggregate operations grouped by the columns provided in indices. |
| Table.GroupByOperation | groupBy(int... indices) - Returns aggregate operations grouped by the columns provided in indices, with default options (null is considered a valid key while grouping). |
| GatherMap[] | innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual) - Computes the gather maps that can be used to manifest the result of an inner equi-join between two tables. |
| ColumnVector | interleaveColumns() - Interleave all columns into a single column. |
| GatherMap | leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual) - Computes the gather map that can be used to manifest the result of a left anti-join between two tables. |
| GatherMap[] | leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual) - Computes the gather maps that can be used to manifest the result of a left equi-join between two tables. |
| GatherMap | leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual) - Computes the gather map that can be used to manifest the result of a left semi-join between two tables. |
| ColumnVector | lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags) - Find the smallest indices in a sorted table where values should be inserted to maintain order. |
| ColumnVector | lowerBound(Table valueTable, OrderByArg... args) - Find the smallest indices in a sorted table where values should be inserted to maintain order. |
| static Table | merge(List<Table> tables, OrderByArg... args) - Merge multiple already sorted tables, keeping the sort order the same. |
| static Table | merge(Table[] tables, OrderByArg... args) - Merge multiple already sorted tables, keeping the sort order the same. |
| Table.TableOperation | onColumns(int... indices) |
| Table | orderBy(OrderByArg... args) - Orders the table using the sort keys, returning a newly allocated table. |
| PartitionedTable | partition(ColumnView partitionMap, int numberOfPartitions) - Partition this table using the mapping in partitionMap. |
| static StreamedTableReader | readArrowIPCChunked(ArrowIPCOptions options, File inputFile) - Get a reader that will return tables. |
| static StreamedTableReader | readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider) - Get a reader that will return tables. |
| static StreamedTableReader | readArrowIPCChunked(File inputFile) - Get a reader that will return tables. |
| static StreamedTableReader | readArrowIPCChunked(HostBufferProvider provider) - Get a reader that will return tables. |
| static Table | readCSV(Schema schema, byte[] buffer) - Read CSV formatted data using the default CSVOptions. |
| static Table | readCSV(Schema schema, CSVOptions opts, byte[] buffer) - Read CSV formatted data. |
| static Table | readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len) - Read CSV formatted data. |
| static Table | readCSV(Schema schema, CSVOptions opts, File path) - Read a CSV file. |
| static Table | readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len) - Read CSV formatted data. |
| static Table | readCSV(Schema schema, File path) - Read a CSV file using the default CSVOptions. |
| static Table | readORC(byte[] buffer) - Read ORC formatted data. |
| static Table | readORC(File path) - Read an ORC file using the default ORCOptions. |
| static Table | readORC(ORCOptions opts, byte[] buffer) - Read ORC formatted data. |
| static Table | readORC(ORCOptions opts, byte[] buffer, long offset, long len) - Read ORC formatted data. |
| static Table | readORC(ORCOptions opts, File path) - Read an ORC file. |
| static Table | readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len) - Read ORC formatted data. |
| static Table | readParquet(byte[] buffer) - Read Parquet formatted data. |
| static Table | readParquet(File path) - Read a Parquet file using the default ParquetOptions. |
| static Table | readParquet(ParquetOptions opts, byte[] buffer) - Read Parquet formatted data. |
| static Table | readParquet(ParquetOptions opts, byte[] buffer, long offset, long len) - Read Parquet formatted data. |
| static Table | readParquet(ParquetOptions opts, File path) - Read a Parquet file. |
| static Table | readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len) - Read Parquet formatted data. |
| Table | repeat(ColumnView counts) - Create a new table by repeating each row of this table. |
| Table | repeat(ColumnView counts, boolean checkCount) - Create a new table by repeating each row of this table. |
| Table | repeat(int count) - Repeat each row of this table count times. |
| PartitionedTable | roundRobinPartition(int numberOfPartitions, int startPartition) - Round-robin partition a table into the specified number of partitions. |
| ColumnVector | rowBitCount() - Returns an approximate cumulative size in bits of all columns in the `table_view` for each row. |
| ColumnVector | sortOrder(OrderByArg... args) - Get back a gather map that can be used to sort the data. |
| String | toString() |
| ColumnVector | upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags) - Find the largest indices in a sorted table where values should be inserted to maintain order. |
| ColumnVector | upperBound(Table valueTable, OrderByArg... args) - Find the largest indices in a sorted table where values should be inserted to maintain order. |
| static TableWriter | writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile) - Get a table writer to write Arrow IPC data to a file. |
| static TableWriter | writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer) - Get a table writer to write Arrow IPC data and handle each chunk with a callback. |
| void | writeORC(File outputFile) - Deprecated. Please use writeORCChunked instead. |
| void | writeORC(ORCWriterOptions options, File outputFile) - Deprecated. Please use writeORCChunked instead. |
| static TableWriter | writeORCChunked(ORCWriterOptions options, File outputFile) - Get a table writer to write ORC data to a file. |
| static TableWriter | writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer) - Get a table writer to write ORC data and handle each chunk with a callback. |
| void | writeParquet(ParquetWriterOptions options, File outputFile) - Deprecated. Please use writeParquetChunked instead. |
| static TableWriter | writeParquetChunked(ParquetWriterOptions options, File outputFile) - Get a table writer to write Parquet data to a file. |
| static TableWriter | writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer) - Get a table writer to write Parquet data and handle each chunk with a callback. |
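As a rough illustration of how the entry points above fit together, the following sketch reads a CSV file (inferring the schema) and sorts it by its first column. It assumes the `ai.rapids.cudf` package is on the classpath and a CUDA-capable GPU is available; the file path is hypothetical.

```java
import ai.rapids.cudf.OrderByArg;
import ai.rapids.cudf.Schema;
import ai.rapids.cudf.Table;
import java.io.File;

public class ReadAndSortSketch {
    // Reads a CSV file with the default CSVOptions, letting cudf infer the
    // column types, then returns a new table sorted ascending by column 0.
    // Both tables own GPU memory; the caller must close the returned table.
    public static Table readAndSort(File csvFile) {
        try (Table table = Table.readCSV(Schema.INFERRED, csvFile)) {
            return table.orderBy(OrderByArg.asc(0));
        }
    }
}
```

Note that `orderBy` allocates a new table, so the input table can be closed as soon as the sorted result is produced.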
public Table(ColumnVector... columns)
The Table class makes a copy of the array of ColumnVectors passed to it. The class will decrease the refcount on itself and all its contents when closed, and free resources if the refcount reaches zero.
Parameters:
columns - Array of ColumnVectors

public Table(long[] cudfColumns)
Create a Table from an array of existing on-device cudf::column pointers.
Parameters:
cudfColumns - Array of native handles

public ColumnVector getColumn(int index)
Return the ColumnVector at the specified index. If you want to keep a reference to the column around past the lifetime of the table, you will need to increment the reference count on the column yourself.

public final long getRowCount()

public final int getNumberOfColumns()

public void close()
Specified by: close in interface AutoCloseable

public long getDeviceMemorySize()
Returns the device memory buffer size.
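A minimal sketch of the ownership rules described above, assuming the `ai.rapids.cudf` package and a CUDA-capable GPU: the constructor copies the column array and takes its own references, so both the table and the original columns are closed by the caller, most conveniently with try-with-resources.

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Table;

public class TableLifecycleSketch {
    // Builds a two-column table on the GPU and returns its row count.
    // The Table holds its own references to the columns, so closing the
    // original ColumnVectors here does not invalidate the table while
    // it is still open.
    public static long rowCount() {
        try (ColumnVector ints = ColumnVector.fromInts(1, 2, 3);
             ColumnVector moreInts = ColumnVector.fromInts(4, 5, 6);
             Table table = new Table(ints, moreInts)) {
            return table.getRowCount();
        }
    }
}
```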
public static Table readCSV(Schema schema, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.path - the local file to read.public static Table readCSV(Schema schema, CSVOptions opts, File path)
schema - the schema of the file. You may use Schema.INFERRED to infer the schema.opts - various CSV parsing options.path - the local file to read.public static Table readCSV(Schema schema, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.buffer - raw UTF8 formatted bytes.public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.opts - various CSV parsing options.buffer - raw UTF8 formatted bytes.public static Table readCSV(Schema schema, CSVOptions opts, byte[] buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.opts - various CSV parsing options.buffer - raw UTF8 formatted bytes.offset - the starting offset into buffer.len - the number of bytes to parse.public static Table readCSV(Schema schema, CSVOptions opts, HostMemoryBuffer buffer, long offset, long len)
schema - the schema of the data. You may use Schema.INFERRED to infer the schema.opts - various CSV parsing options.buffer - raw UTF8 formatted bytes.offset - the starting offset into buffer.len - the number of bytes to parse.public static Table readParquet(File path)
Parameters:
path - the local file to read.

public static Table readParquet(ParquetOptions opts, File path)
Parameters:
opts - various Parquet parsing options.
path - the local file to read.

public static Table readParquet(byte[] buffer)
Parameters:
buffer - raw Parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer)
Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.

public static Table readParquet(ParquetOptions opts, byte[] buffer, long offset, long len)
Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readParquet(ParquetOptions opts, HostMemoryBuffer buffer, long offset, long len)
Parameters:
opts - various Parquet parsing options.
buffer - raw Parquet formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(File path)
Parameters:
path - the local file to read.

public static Table readORC(ORCOptions opts, File path)
Parameters:
opts - ORC parsing options.
path - the local file to read.

public static Table readORC(byte[] buffer)
Parameters:
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer)
Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.

public static Table readORC(ORCOptions opts, byte[] buffer, long offset, long len)
Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static Table readORC(ORCOptions opts, HostMemoryBuffer buffer, long offset, long len)
Parameters:
opts - various ORC parsing options.
buffer - raw ORC formatted bytes.
offset - the starting offset into buffer.
len - the number of bytes to parse.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, File outputFile)
Parameters:
options - the Parquet writer options.
outputFile - where to write the file.

public static TableWriter writeParquetChunked(ParquetWriterOptions options, HostBufferConsumer consumer)
Parameters:
options - the Parquet writer options.
consumer - a class that will be called when host buffers are ready with Parquet formatted data in them.

@Deprecated public void writeParquet(ParquetWriterOptions options, File outputFile)
Deprecated. Please use writeParquetChunked instead.
Parameters:
options - parameters for the writer.
outputFile - file to write the table to.

public static TableWriter writeORCChunked(ORCWriterOptions options, File outputFile)
Parameters:
options - the ORC writer options.
outputFile - where to write the file.

public static TableWriter writeORCChunked(ORCWriterOptions options, HostBufferConsumer consumer)
Parameters:
options - the ORC writer options.
consumer - a class that will be called when host buffers are ready with ORC formatted data in them.

@Deprecated public void writeORC(File outputFile)
Deprecated. Please use writeORCChunked instead.
Parameters:
outputFile - file to write the table to.

@Deprecated public void writeORC(ORCWriterOptions options, File outputFile)
Deprecated. Please use writeORCChunked instead.
Parameters:
outputFile - file to write the table to.

public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, File outputFile)
Parameters:
options - the Arrow IPC writer options.
outputFile - where to write the file.

public static TableWriter writeArrowIPCChunked(ArrowIPCWriterOptions options, HostBufferConsumer consumer)
Parameters:
options - the Arrow IPC writer options.
consumer - a class that will be called when host buffers are ready with Arrow IPC formatted data in them.

public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, File inputFile)
Parameters:
options - options for reading.
inputFile - the file to read the Arrow IPC formatted data from.

public static StreamedTableReader readArrowIPCChunked(File inputFile)
Parameters:
inputFile - the file to read the Arrow IPC formatted data from.

public static StreamedTableReader readArrowIPCChunked(ArrowIPCOptions options, HostBufferProvider provider)
Parameters:
options - options for reading.
provider - what will provide the data being read.

public static StreamedTableReader readArrowIPCChunked(HostBufferProvider provider)
Parameters:
provider - what will provide the data being read.

public static Table concatenate(Table... tables)
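A small sketch of `concatenate`, under the same environment assumptions as the earlier sketches: the input tables must have matching schemas, and the result owns new device memory, so inputs and output are closed independently.

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Table;

public class ConcatenateSketch {
    // Concatenates a 2-row table and a 3-row table with the same schema
    // (a single INT32 column) and returns the combined row count.
    public static long concatenatedRows() {
        try (ColumnVector a = ColumnVector.fromInts(1, 2);
             ColumnVector b = ColumnVector.fromInts(3, 4, 5);
             Table first = new Table(a);
             Table second = new Table(b);
             Table combined = Table.concatenate(first, second)) {
            return combined.getRowCount();
        }
    }
}
```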
public ColumnVector interleaveColumns()
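`interleaveColumns` flattens a table row by row into a single column, so the columns must share a type. A sketch with two INT32 columns of equal length, assuming the same environment as the sketches above:

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Table;

public class InterleaveSketch {
    // Interleaving columns [1, 2, 3] and [10, 20, 30] yields a single
    // column [1, 10, 2, 20, 3, 30], i.e. rows * columns elements.
    public static long interleavedRows() {
        try (ColumnVector a = ColumnVector.fromInts(1, 2, 3);
             ColumnVector b = ColumnVector.fromInts(10, 20, 30);
             Table t = new Table(a, b);
             ColumnVector interleaved = t.interleaveColumns()) {
            return interleaved.getRowCount();
        }
    }
}
```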
public Table repeat(int count)
Parameters:
count - the number of times to repeat each row.

public Table repeat(ColumnView counts)
Parameters:
counts - the number of times to repeat each row. Cannot have nulls, must be an integer type, and must have one entry for each row in the table.
Throws:
CudfException - on any error.

public Table repeat(ColumnView counts, boolean checkCount)
Parameters:
counts - the number of times to repeat each row. Cannot have nulls, must be an integer type, and must have one entry for each row in the table.
checkCount - whether counts should be checked for errors before processing. Be careful if you disable this, because if you pass in bad data you might just get back an empty table or bad data.
Throws:
CudfException - on any error.

public PartitionedTable partition(ColumnView partitionMap, int numberOfPartitions)
Parameters:
partitionMap - the partitions for each row.
numberOfPartitions - the number of partitions.
Returns:
PartitionedTable - Table that exposes a limited functionality of the Table class.

public ColumnVector lowerBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx 0 1 2 3 4
inputTable = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result = { 1 }
Multi Column:
idx 0 1 2 3 4
inputTable = {{ 10, 20, 20, 20, 20 },
{ 5.0, .5, .5, .7, .7 },
{ 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 },
{ .7 },
{ 61 }}
result = { 3 }
The input table and the values table need to be non-empty (row count > 0).
Parameters:
areNullsSmallest - per column, true if nulls are assumed smallest.
valueTable - the table of values to find insertion locations for.
descFlags - per column, indicates the ordering; true if descending.

public ColumnVector lowerBound(Table valueTable, OrderByArg... args)
Parameters:
valueTable - the table of values to find insertion locations for.
args - the sort order used to sort this table.

public ColumnVector upperBound(boolean[] areNullsSmallest, Table valueTable, boolean[] descFlags)
Example:
Single column:
idx 0 1 2 3 4
inputTable = { 10, 20, 20, 30, 50 }
valuesTable = { 20 }
result = { 3 }
Multi Column:
idx 0 1 2 3 4
inputTable = {{ 10, 20, 20, 20, 20 },
{ 5.0, .5, .5, .7, .7 },
{ 90, 77, 78, 61, 61 }}
valuesTable = {{ 20 },
{ .7 },
{ 61 }}
result = { 5 }
The input table and the values table need to be non-empty (row count > 0).
Parameters:
areNullsSmallest - per column, true if nulls are assumed smallest.
valueTable - the table of values to find insertion locations for.
descFlags - per column, indicates the ordering; true if descending.

public ColumnVector upperBound(Table valueTable, OrderByArg... args)
Parameters:
valueTable - the table of values to find insertion locations for.
args - the sort order used to sort this table.

public Table crossJoin(Table right)
Parameters:
right - the right table.

public ColumnVector sortOrder(OrderByArg... args)
Parameters:
args - what order to sort the data by.

public Table orderBy(OrderByArg... args)
Orders the table using the sort keys, returning a newly allocated table.
Example usage: orderBy(OrderByArg.asc(0), OrderByArg.desc(3));
Parameters:
args - suppliers to initialize the sort keys.

public static Table merge(Table[] tables, OrderByArg... args)
Parameters:
tables - the tables that should be merged.
args - the ordering of the tables. Should match how they were sorted initially.

public static Table merge(List<Table> tables, OrderByArg... args)
Parameters:
tables - the tables that should be merged.
args - the ordering of the tables. Should match how they were sorted initially.

public Table.GroupByOperation groupBy(GroupByOptions groupByOptions, int... indices)
Parameters:
groupByOptions - options provided in the builder.
indices - columns to be considered for groupBy.

public Table.GroupByOperation groupBy(int... indices)
Parameters:
indices - columns to be considered for groupBy.

public PartitionedTable roundRobinPartition(int numberOfPartitions, int startPartition)
Parameters:
numberOfPartitions - number of partitions to use.
startPartition - starting partition index (i.e. where the first row is placed).
Returns:
PartitionedTable - Table that exposes a limited functionality of the Table class.

public Table.TableOperation onColumns(int... indices)
public Table filter(ColumnView mask)
Given a mask column, each element `i` from the input columns is copied to the output columns if the corresponding element `i` in the mask is non-null and `true`. This operation is stable: the input order is preserved.
This table and the mask column must have the same number of rows.
The output table has size equal to the number of elements in the mask that are both non-null and `true`.
If the original table row count is zero, there is no error, and an empty table is returned.
Parameters:
mask - column of type DType.BOOL8 used as a mask to filter the input columns.

public ContiguousTable[] contiguousSplit(int... indices)
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
{50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}},
{{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
Parameters:
indices - the indices at which to make the splits.

public Table explode(int index)
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100]
Note that null lists are completely removed from the output
and nulls inside lists are pulled out and remain.
Parameters:
index - Column index to explode inside the table.

public Table explodePosition(int index)
Explodes a list column's elements and includes a position column.
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300]
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate in different ways depending on what is null or empty.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100]
Note that null lists are not included in the resulting table, but nulls inside
lists and empty lists will be represented with a null entry for that column in that row.
Parameters:
index - Column index to explode inside the table.

public Table explodeOuter(int index)
Explodes a list column's elements, retaining any null entries or empty lists.
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [5, 100],
[10, 100],
[15, 100],
[20, 200],
[25, 200],
[30, 300]
Nulls propagate in different ways depending on what is null.
input: [[5,null,15], 100],
[null, 200]
index: 0
output: [5, 100],
[null, 100],
[15, 100],
[null, 200]
Note that null lists are retained in the output as null entries, and nulls inside lists are pulled out and remain.
Parameters:
index - Column index to explode inside the table.

public Table explodeOuterPosition(int index)
Explodes a list column's elements, retaining any null entries or empty lists, and includes a position column.
Example:
input: [[5,10,15], 100],
[[20,25], 200],
[[30], 300],
index: 0
output: [0, 5, 100],
[1, 10, 100],
[2, 15, 100],
[0, 20, 200],
[1, 25, 200],
[0, 30, 300]
Nulls and empty lists propagate as null entries in the result.
input: [[5,null,15], 100],
[null, 200],
[[], 300]
index: 0
output: [0, 5, 100],
[1, null, 100],
[2, 15, 100],
[0, null, 200],
[0, null, 300]
Parameters:
index - Column index to explode inside the table.

public ColumnVector rowBitCount()
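The explode family above can be sketched as follows, mirroring the documented example of exploding column 0 of [[5,10,15], 100], [[20,25], 200], [[30], 300]. This assumes `ColumnVector.fromLists` with a `HostColumnVector.ListType` descriptor is available for building the LIST column (the exact list-building API varies between cudf versions), plus the same GPU environment as the earlier sketches.

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.DType;
import ai.rapids.cudf.HostColumnVector;
import ai.rapids.cudf.Table;
import java.util.Arrays;

public class ExplodeSketch {
    // Builds a table of (LIST<INT32>, INT32) and explodes the list column,
    // producing one output row per list element: 3 + 2 + 1 = 6 rows.
    public static long explodedRows() {
        HostColumnVector.DataType listOfInts = new HostColumnVector.ListType(
                true, new HostColumnVector.BasicType(true, DType.INT32));
        try (ColumnVector lists = ColumnVector.fromLists(listOfInts,
                     Arrays.asList(5, 10, 15),
                     Arrays.asList(20, 25),
                     Arrays.asList(30));
             ColumnVector other = ColumnVector.fromInts(100, 200, 300);
             Table t = new Table(lists, other);
             Table exploded = t.explode(0)) {
            return exploded.getRowCount();
        }
    }
}
```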
public Table gather(ColumnView gatherMap)
Parameters:
gatherMap - the map of indexes. Must be non-nullable and of an integral type.

public Table gather(ColumnView gatherMap, boolean checkBounds)
Parameters:
gatherMap - the map of indexes. Must be non-nullable and of an integral type.
checkBounds - if true, bounds checking is performed on the values. Be very careful when setting this to false.

public GatherMap[] leftJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the left join.
It is the responsibility of the caller to close the resulting gather map instances.
Parameters:
rightKeys - join key columns from the right table.
compareNullsEqual - true if null key values should match, otherwise false.

public GatherMap[] innerJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the inner join.
It is the responsibility of the caller to close the resulting gather map instances.
Parameters:
rightKeys - join key columns from the right table.
compareNullsEqual - true if null key values should match, otherwise false.

public GatherMap[] fullJoinGatherMaps(Table rightKeys, boolean compareNullsEqual)
Two GatherMap instances will be returned that can be used to gather the left and right tables, respectively, to produce the result of the full join.
It is the responsibility of the caller to close the resulting gather map instances.
Parameters:
rightKeys - join key columns from the right table.
compareNullsEqual - true if null key values should match, otherwise false.

public GatherMap leftSemiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
The GatherMap instance returned can be used to gather the left table to produce the result of the left semi-join.
It is the responsibility of the caller to close the resulting gather map instance.
Parameters:
rightKeys - join key columns from the right table.
compareNullsEqual - true if null key values should match, otherwise false.

public GatherMap leftAntiJoinGatherMap(Table rightKeys, boolean compareNullsEqual)
The GatherMap instance returned can be used to gather the left table to produce the result of the left anti-join.
It is the responsibility of the caller to close the resulting gather map instance.
Parameters:
rightKeys - join key columns from the right table.
compareNullsEqual - true if null key values should match, otherwise false.

public ColumnVector[] convertToRows()
The output is an array of columns, each holding packed rows:
result[0]: | row 0 | validity for row 0 | padding | ... | row N | validity for row N | padding |
result[1]: | row N+1 | validity for row N+1 | padding | ...
The format of each row is similar in layout to a C struct, where each column has padding in front of it to align it properly. Each row has padding inserted at the end so the next row is aligned to a 64-bit boundary. This is so that the first column always starts at the beginning (first byte) of the list of bytes, and each row has a consistent layout for fixed-width types. Validity bytes are added to the end of the row. There is one byte for every 8 columns in a row. Because the validity is byte aligned, there is no padding between it and the last column in the row.
For example, a table consisting of columns A, B, and C with the corresponding types
| A - BOOL8 (8-bit) | B - INT16 (16-bit) | C - DURATION_DAYS (32-bit) |
will have a layout that looks like
| A_0 | P | B_0 | B_1 | C_0 | C_1 | C_2 | C_3 | V0 | P | P | P | P | P | P | P |
Here P corresponds to a byte of padding, [LETTER]_[NUMBER] represents byte NUMBER of the corresponding LETTER column, and V[NUMBER] is a validity byte covering columns NUMBER * 8 through (NUMBER + 1) * 8.
The order of the columns is not changed, but to reduce the total amount of padding it is recommended to order the columns widest first:
| C_0 | C_1 | C_2 | C_3 | B_0 | B_1 | A_0 | V0 |
This would reduce the overall size of the transferred data by half. One of the main motivations for doing a row conversion on the GPU is to avoid cache problems when walking through columnar data on the CPU in a row-wise manner. If you are not transferring very many columns, it is likely more efficient to just pull back the columns and walk through them; this is especially true of a single column of fixed-width data. The extra padding will slow down the transfer, and looking at only a handful of buffers is not likely to cause cache issues.
There are some limits on the size of a single row: if the row is larger than 1KB this will throw an exception.
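A sketch of a round trip through the row-major format, using a single fixed-width INT32 column and the same environment assumptions as the earlier sketches:

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.DType;
import ai.rapids.cudf.Table;

public class RowConversionSketch {
    // Converts a 3-row table to packed rows and back, returning the row
    // count of the reconstructed table. convertToRows may split the rows
    // across several list columns; each must be closed by the caller.
    public static long roundTripRows() {
        try (ColumnVector a = ColumnVector.fromInts(1, 2, 3);
             Table t = new Table(a)) {
            ColumnVector[] rowBlobs = t.convertToRows();
            try (Table back = Table.convertFromRows(rowBlobs[0], DType.INT32)) {
                return back.getRowCount();
            } finally {
                for (ColumnVector blob : rowBlobs) {
                    blob.close();
                }
            }
        }
    }
}
```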
public static Table convertFromRows(ColumnView vec, DType... schema)
Parameters:
vec - the row data to process.
schema - the types of each column.

public static Table fromPackedTable(ByteBuffer metadata, DeviceMemoryBuffer data)
Parameters:
metadata - host-based metadata for the table.
data - GPU data buffer for the table.

Copyright © 2021. All rights reserved.