public final class ColumnVector extends Object implements AutoCloseable, BinaryOperable
| Modifier and Type | Class and Description |
|---|---|
protected static class |
ColumnVector.OffHeapState
Holds the off heap state of the column vector so we can clean it up, even if it is leaked.
|
| Constructor and Description |
|---|
ColumnVector(DType type,
long rows,
Optional<Long> nullCount,
DeviceMemoryBuffer dataBuffer,
DeviceMemoryBuffer validityBuffer,
DeviceMemoryBuffer offsetBuffer)
Create a new column vector based off of data already on the device.
|
| Modifier and Type | Method and Description |
|---|---|
ColumnVector |
abs()
Calculate the abs, output is the same type as input.
|
Scalar |
all()
Returns a boolean scalar that is true if all of the elements in
the column are true or non-zero otherwise false.
|
Scalar |
all(DType outType)
Returns a scalar that is true or 1, depending on the specified type,
if all of the elements in the column are true or non-zero
otherwise false or 0.
|
Scalar |
any()
Returns a boolean scalar that is true if any of the elements in
the column are true or non-zero otherwise false.
|
Scalar |
any(DType outType)
Returns a scalar that is true or 1, depending on the specified type,
if any of the elements in the column are true or non-zero
otherwise false or 0.
|
ColumnVector |
arccos()
Calculate the arccos, output is the same type as input.
|
ColumnVector |
arccosh()
Calculate the hyperbolic arccos, output is the same type as input.
|
ColumnVector |
arcsin()
Calculate the arcsin, output is the same type as input.
|
ColumnVector |
arcsinh()
Calculate the hyperbolic arcsin, output is the same type as input.
|
ColumnVector |
arctan()
Calculate the arctan, output is the same type as input.
|
ColumnVector |
arctanh()
Calculate the hyperbolic arctan, output is the same type as input.
|
ColumnVector |
asBytes()
Cast to Byte - ColumnVector
This method takes the value provided by the ColumnVector and casts to byte
When casting from a Date, Timestamp, or Boolean to a byte type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asDoubles()
Cast to Double - ColumnVector
This method takes the value provided by the ColumnVector and casts to double
When casting from a Date, Timestamp, or Boolean to a double type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asFloats()
Cast to Float - ColumnVector
This method takes the value provided by the ColumnVector and casts to float
When casting from a Date, Timestamp, or Boolean to a float type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asInts()
Cast to Int - ColumnVector
This method takes the value provided by the ColumnVector and casts to int
When casting from a Date, Timestamp, or Boolean to an int type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asLongs()
Cast to Long - ColumnVector
This method takes the value provided by the ColumnVector and casts to long
When casting from a Date, Timestamp, or Boolean to a long type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asShorts()
Cast to Short - ColumnVector
This method takes the value provided by the ColumnVector and casts to short
When casting from a Date, Timestamp, or Boolean to a short type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
asStrings()
Cast to Strings.
|
ColumnVector |
asStrings(String format)
Method to parse and convert a timestamp column vector to string column vector.
|
ColumnVector |
asTimestamp(DType timestampType,
String format)
Parse a string to a timestamp.
|
ColumnVector |
asTimestampDays()
Cast to TIMESTAMP_DAYS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_DAYS
|
ColumnVector |
asTimestampDays(String format)
Cast to TIMESTAMP_DAYS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_DAYS
|
ColumnVector |
asTimestampMicroseconds()
Cast to TIMESTAMP_MICROSECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS
|
ColumnVector |
asTimestampMicroseconds(String format)
Cast to TIMESTAMP_MICROSECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MICROSECONDS
|
ColumnVector |
asTimestampMilliseconds()
Cast to TIMESTAMP_MILLISECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
|
ColumnVector |
asTimestampMilliseconds(String format)
Cast to TIMESTAMP_MILLISECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_MILLISECONDS.
|
ColumnVector |
asTimestampNanoseconds()
Cast to TIMESTAMP_NANOSECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
|
ColumnVector |
asTimestampNanoseconds(String format)
Cast to TIMESTAMP_NANOSECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_NANOSECONDS.
|
ColumnVector |
asTimestampSeconds()
Cast to TIMESTAMP_SECONDS - ColumnVector
This method takes the value provided by the ColumnVector and casts to TIMESTAMP_SECONDS
|
ColumnVector |
asTimestampSeconds(String format)
Cast to TIMESTAMP_SECONDS - ColumnVector
This method takes the string value provided by the ColumnVector and casts to TIMESTAMP_SECONDS
|
ColumnVector |
binaryOp(BinaryOp op,
BinaryOperable rhs,
DType outType)
Multiple different binary operations.
|
ColumnVector |
bitInvert()
Invert the bits, output is the same type as input.
|
static ColumnVector |
boolFromBytes(byte... values)
Create a new vector from the given values.
|
static ColumnVector |
build(DType type,
int rows,
java.util.function.Consumer<HostColumnVector.Builder> init)
Create a new vector.
|
static ColumnVector |
build(int rows,
long stringBufferSize,
java.util.function.Consumer<HostColumnVector.Builder> init) |
ColumnVector |
castTo(DType type)
Generic method to cast ColumnVector
When casting from a Date, Timestamp, or Boolean to a numerical type the underlying numerical
representation of the data will be used for the cast.
|
ColumnVector |
cbrt()
Calculate the cube root, output is the same type as input.
|
ColumnVector |
ceil()
Calculate the ceil, output is the same type as input.
|
ColumnVector |
clamp(Scalar lo,
Scalar hi)
Replaces values less than `lo` in this column with `lo`,
and values greater than `hi` with `hi`.
|
ColumnVector |
clamp(Scalar lo,
Scalar loReplace,
Scalar hi,
Scalar hiReplace)
Replaces values less than `lo` in this column with `loReplace`,
and values greater than `hi` with `hiReplace`.
|
void |
close()
Close this Vector and free memory allocated for HostMemoryBuffer and DeviceMemoryBuffer
|
static ColumnVector |
concatenate(ColumnVector... columns)
Create a new vector by concatenating multiple columns together.
|
ColumnVector |
contains(ColumnVector needles)
Returns a new ColumnVector of
DType.BOOL8 elements containing true if the corresponding
entry in haystack is contained in needles and false if it is not. |
boolean |
contains(Scalar needle)
Find if the `needle` is present in this col
example:
Single Column:
idx 0 1 2 3 4
col = { 10, 20, 20, 30, 50 }
Scalar:
value = { 20 }
result = true
|
ColumnVector |
containsRe(String pattern)
Returns a boolean ColumnVector identifying rows which
match the given regex pattern starting at any location.
|
HostColumnVector |
copyToHost()
Copy the data to the host.
|
ColumnVector |
cos()
Calculate the cos, output is the same type as input.
|
ColumnVector |
cosh()
Calculate the hyperbolic cos, output is the same type as input.
|
ColumnVector |
day()
Get day from a timestamp.
|
static ColumnVector |
daysFromInts(int... values)
Create a new vector from the given values.
|
ColumnVector |
endsWith(Scalar pattern)
Checks if each string in a column ends with a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
exp()
Calculate the exp, output is the same type as input.
|
Table |
extractRe(String pattern)
For each captured group specified in the given regular expression
return a column in the table.
|
ColumnVector |
findAndReplaceAll(ColumnVector oldValues,
ColumnVector newValues)
Returns a vector with all values "oldValues[i]" replaced with "newValues[i]".
|
ColumnVector |
floor()
Calculate the floor, output is the same type as input.
|
static ColumnVector |
fromBoxedBooleans(Boolean... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedBytes(Byte... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedDoubles(Double... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedFloats(Float... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedInts(Integer... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedLongs(Long... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBoxedShorts(Short... values)
Create a new vector from the given values.
|
static ColumnVector |
fromBytes(byte... values)
Create a new vector from the given values.
|
static ColumnVector |
fromDoubles(double... values)
Create a new vector from the given values.
|
static ColumnVector |
fromFloats(float... values)
Create a new vector from the given values.
|
static ColumnVector |
fromInts(int... values)
Create a new vector from the given values.
|
static ColumnVector |
fromLongs(long... values)
Create a new vector from the given values.
|
static ColumnVector |
fromScalar(Scalar scalar,
int rows)
Create a new vector of length rows, where each row is filled with the Scalar's
value
|
static ColumnVector |
fromShorts(short... values)
Create a new vector from the given values.
|
static ColumnVector |
fromStrings(String... values)
Create a new string vector from the given values.
|
ColumnVector |
getByteCount()
Retrieve the number of bytes for each string.
|
ColumnVector |
getCharLengths()
Retrieve the number of characters in each string.
|
BaseDeviceMemoryBuffer |
getDeviceBufferFor(BufferType type)
Get access to the raw device buffer for this column.
|
long |
getDeviceMemorySize()
Returns the amount of device memory used.
|
long |
getNativeView()
USE WITH CAUTION: This method exposes the address of the native cudf::column_view.
|
long |
getNullCount()
Returns the number of nulls in the data.
|
long |
getRowCount()
Returns the number of rows in this vector.
|
DType |
getType()
Returns the type of this vector.
|
boolean |
hasNulls()
Returns if the vector has nulls.
|
boolean |
hasValidityVector()
Returns if the vector has a validity vector allocated or not.
|
ColumnVector |
hour()
Get hour from a timestamp with time resolution.
|
ColumnVector |
ifElse(ColumnVector trueValues,
ColumnVector falseValues)
For a BOOL8 vector, computes a vector whose rows are selected from two other vectors
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(ColumnVector trueValues,
Scalar falseValue)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(Scalar trueValue,
ColumnVector falseValues)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
ifElse(Scalar trueValue,
Scalar falseValue)
For a BOOL8 vector, computes a vector whose rows are selected from two other inputs
based on the boolean value of this vector in the corresponding row.
|
ColumnVector |
incRefCount()
Increment the reference count for this column.
|
ColumnVector |
isFloat()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is a float, and FALSE if it is not a float.
|
ColumnVector |
isInteger()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is an integer, and FALSE if it is not an integer.
|
ColumnVector |
isNan()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is NaN, and FALSE if null or a valid floating point value
|
ColumnVector |
isNotNan()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is null or a valid floating point value, FALSE otherwise
|
ColumnVector |
isNotNull()
Returns a Boolean vector with the same number of rows as this instance, that has
TRUE for any entry that is not null, and FALSE for any null entry (as per the validity mask)
|
ColumnVector |
isNull()
Returns a Boolean vector with the same number of rows as this instance, that has
FALSE for any entry that is not null, and TRUE for any null entry (as per the validity mask)
|
ColumnVector |
log()
Calculate the log, output is the same type as input.
|
ColumnVector |
log10()
Calculate the log with base 10, output is the same type as input.
|
ColumnVector |
log2()
Calculate the log with base 2, output is the same type as input.
|
ColumnVector |
lower()
Convert a string to lower case.
|
ColumnVector |
lstrip()
Removes whitespace from the beginning of a string.
|
ColumnVector |
lstrip(Scalar toStrip)
Removes the specified characters from the beginning of each string.
|
ColumnVector |
matchesRe(String pattern)
Returns a boolean ColumnVector identifying rows which
match the given regex pattern but only at the beginning of the string.
|
Scalar |
max()
Returns the maximum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
max(DType outType)
Returns the maximum of all values in the column, returning a scalar
of the specified type.
|
Scalar |
mean()
Returns the arithmetic mean of all values in the column, returning a
FLOAT64 scalar, or a FLOAT32 scalar if the column type is FLOAT32.
|
Scalar |
mean(DType outType)
Returns the arithmetic mean of all values in the column, returning a
scalar of the specified type.
|
Scalar |
min()
Returns the minimum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
min(DType outType)
Returns the minimum of all values in the column, returning a scalar
of the specified type.
|
ColumnVector |
minute()
Get minute from a timestamp with time resolution.
|
ColumnVector |
month()
Get month from a timestamp.
|
ColumnVector |
nansToNulls()
Returns a new ColumnVector with NaNs converted to nulls, preserving the existing null values.
|
ColumnVector |
normalizeNANsAndZeros()
Create a new vector of "normalized" values, where:
1.
|
ColumnVector |
not()
Returns a vector of the logical `not` of each value in the input
column (this)
|
void |
noWarnLeakExpected()
This is a really ugly API, but it is possible that a column of
data may not have a clear lifecycle thanks to Java and GC.
|
Scalar |
product()
Returns the product of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
product(DType outType)
Returns the product of all values in the column, returning a scalar
of the specified type.
|
ColumnVector |
quantile(QuantileMethod method,
double[] quantiles)
Calculate various quantiles of this ColumnVector.
|
Scalar |
reduce(ai.rapids.cudf.AggregateOp op)
Computes the reduction of the values in all rows of a column.
|
Scalar |
reduce(ai.rapids.cudf.AggregateOp op,
DType outType)
Computes the reduction of the values in all rows of a column.
|
ColumnVector |
replaceNulls(Scalar scalar)
Returns a ColumnVector with any null values replaced with a scalar.
|
ColumnVector |
rint()
Rounds a floating-point argument to the closest integer value, but returns it as a float.
|
ColumnVector |
rollingWindow(ai.rapids.cudf.AggregateOp op,
WindowOptions options)
This function aggregates values in a window around each element i of the input
column.
|
ColumnVector |
rstrip()
Removes whitespace from the end of a string.
|
ColumnVector |
rstrip(Scalar toStrip)
Removes the specified characters from the end of each string.
|
ColumnVector |
second()
Get second from a timestamp with time resolution.
|
static ColumnVector |
sequence(Scalar initialValue,
int rows)
Create a new vector of length rows, starting at the initialValue and going by 1 each time.
|
static ColumnVector |
sequence(Scalar initialValue,
Scalar step,
int rows)
Create a new vector of length rows, starting at the initialValue and going by step each time.
|
ColumnVector |
sin()
Calculate the sin, output is the same type as input.
|
ColumnVector |
sinh()
Calculate the hyperbolic sin, output is the same type as input.
|
ColumnVector[] |
slice(int... indices)
Slices a column (including null values) into a set of columns
according to a set of indices.
|
ColumnVector[] |
split(int... indices)
Splits a column (including null values) into a set of columns
according to a set of indices.
|
ColumnVector |
sqrt()
Calculate the sqrt, output is the same type as input.
|
Scalar |
standardDeviation()
Returns the sample standard deviation of all values in the column,
returning a FLOAT64 scalar, or a FLOAT32 scalar if the column type
is FLOAT32.
|
Scalar |
standardDeviation(DType outType)
Returns the sample standard deviation of all values in the column,
returning a scalar of the specified type.
|
ColumnVector |
startsWith(Scalar pattern)
Checks if each string in a column starts with a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
stringConcatenate(ColumnVector[] columns)
Concatenate columns of strings together, combining a corresponding row from each column
into a single string row of a new column with no separator string inserted between each
combined string and maintaining null values in combined rows.
|
static ColumnVector |
stringConcatenate(Scalar separator,
Scalar narep,
ColumnVector[] columns)
Concatenate columns of strings together, combining a corresponding row from each column into
a single string row of a new column.
|
ColumnVector |
stringContains(Scalar compString)
Checks if each string in a column contains a specified comparison string, resulting in a
parallel column of the boolean results.
|
ColumnVector |
stringLocate(Scalar substring)
Locates the starting index of the first instance of the given string in each row of a column.
|
ColumnVector |
stringLocate(Scalar substring,
int start)
Locates the starting index of the first instance of the given string in each row of a column.
|
ColumnVector |
stringLocate(Scalar substring,
int start,
int end)
Locates the starting index of the first instance of the given string in each row of a column.
|
ColumnVector |
stringReplace(Scalar target,
Scalar replace)
Returns a new strings column where the target string within each string is replaced with the specified
replacement string.
|
ColumnVector |
stringReplaceWithBackrefs(String pattern,
String replace)
For each string, replaces any character sequence matching the given pattern
using the replace template for back-references.
|
Table |
stringSplit()
Returns a list of columns by splitting each string using whitespace as the delimiter.
|
Table |
stringSplit(Scalar delimiter)
Returns a list of columns by splitting each string using the specified delimiter.
|
ColumnVector |
strip()
Removes whitespace from the beginning and end of a string.
|
ColumnVector |
strip(Scalar toStrip)
Removes the specified characters from the beginning and end of each string.
|
ColumnVector |
substring(ColumnVector start,
ColumnVector end)
Returns a new strings column that contains substrings of the strings in the provided column
which uses unique ranges for each string
|
ColumnVector |
substring(int start)
Returns a new strings column that contains substrings of the strings in the provided column.
|
ColumnVector |
substring(int start,
int end)
Returns a new strings column that contains substrings of the strings in the provided column.
|
ColumnVector |
subVector(int start)
Return a subVector from start inclusive to the end of the vector.
|
ColumnVector |
subVector(int start,
int end)
Return a subVector.
|
Scalar |
sum()
Computes the sum of all values in the column, returning a scalar
of the same type as this column.
|
Scalar |
sum(DType outType)
Computes the sum of all values in the column, returning a scalar
of the specified type.
|
Scalar |
sumOfSquares()
Returns the sum of squares of all values in the column, returning a
scalar of the same type as this column.
|
Scalar |
sumOfSquares(DType outType)
Returns the sum of squares of all values in the column, returning a
scalar of the specified type.
|
ColumnVector |
tan()
Calculate the tan, output is the same type as input.
|
ColumnVector |
tanh()
Calculate the hyperbolic tan, output is the same type as input.
|
static ColumnVector |
timestampDaysFromBoxedInts(Integer... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampMicroSecondsFromBoxedLongs(Long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampMicroSecondsFromLongs(long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampMilliSecondsFromBoxedLongs(Long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampMilliSecondsFromLongs(long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampNanoSecondsFromBoxedLongs(Long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampNanoSecondsFromLongs(long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampSecondsFromBoxedLongs(Long... values)
Create a new vector from the given values.
|
static ColumnVector |
timestampSecondsFromLongs(long... values)
Create a new vector from the given values.
|
String |
toString() |
ColumnVector |
toTitle()
Returns a column of strings where, for each string row in the input,
the first character after spaces is modified to upper-case,
while all the remaining characters in a word are modified to lower-case.
|
ColumnVector |
transform(String udf,
boolean isPtx)
Transform a vector using a custom function.
|
ColumnVector |
unaryOp(UnaryOp op)
Multiple different unary operations.
|
ColumnVector |
upper()
Convert a string to upper case.
|
Scalar |
variance()
Returns the variance of all values in the column, returning a
FLOAT64 scalar, or a FLOAT32 scalar if the column type is FLOAT32.
|
Scalar |
variance(DType outType)
Returns the variance of all values in the column, returning a
scalar of the specified type.
|
ColumnVector |
year()
Get year from a timestamp.
|
Methods inherited from class Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface BinaryOperable:
add, add, and, and, arctan2, arctan2, bitAnd, bitAnd, bitOr, bitOr, bitXor, bitXor, div, div, equalTo, equalTo, equalToNullAware, equalToNullAware, floorDiv, floorDiv, greaterOrEqualTo, greaterOrEqualTo, greaterThan, greaterThan, implicitConversion, lessOrEqualTo, lessOrEqualTo, lessThan, lessThan, log, log, maxNullAware, maxNullAware, minNullAware, minNullAware, mod, mod, mul, mul, notEqualTo, notEqualTo, or, or, pmod, pmod, pow, pow, shiftLeft, shiftLeft, shiftRight, shiftRight, shiftRightUnsigned, shiftRightUnsigned, sub, sub, trueDiv, trueDiv
public ColumnVector(DType type, long rows, Optional<Long> nullCount, DeviceMemoryBuffer dataBuffer, DeviceMemoryBuffer validityBuffer, DeviceMemoryBuffer offsetBuffer)
type - the type of the vector.
rows - the number of rows in this vector.
nullCount - the number of nulls in the dataset.
dataBuffer - the data stored on the device. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
validityBuffer - an optional validity buffer. Must be provided if nullCount != 0. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
offsetBuffer - a host buffer required for strings and string categories. The column vector takes ownership of the buffer. Do not use the buffer after calling this.
public ColumnVector toTitle()
public void noWarnLeakExpected()
public void close()
close in interface AutoCloseable
public ColumnVector incRefCount()
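The pairing of incRefCount() with close() suggests reference-counted ownership. The following is a minimal plain-Java model of that idea, not the cudf implementation: the class `RefCountedColumnSketch` and its `isReleased` helper are hypothetical, and it assumes that each close() drops one reference and that memory is released only when the last reference is dropped.

```java
/** Hypothetical stand-in for a reference-counted column; this is not the cudf class. */
final class RefCountedColumnSketch implements AutoCloseable {
    private int refCount = 1;          // the creator holds the first reference
    private boolean released = false;  // true once the (pretend) device memory is freed

    /** Same shape as incRefCount(): bump the count and return this for chaining. */
    synchronized RefCountedColumnSketch incRefCount() {
        if (released) {
            throw new IllegalStateException("column already released");
        }
        refCount++;
        return this;
    }

    /** Assumption: each close() drops one reference; memory is freed at zero. */
    @Override
    public synchronized void close() {
        if (released) {
            throw new IllegalStateException("close called too many times");
        }
        refCount--;
        if (refCount == 0) {
            released = true;
        }
    }

    synchronized boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        RefCountedColumnSketch col = new RefCountedColumnSketch();
        RefCountedColumnSketch shared = col.incRefCount(); // a second owner
        col.close();
        System.out.println(shared.isReleased()); // false: one reference remains
        shared.close();
        System.out.println(shared.isReleased()); // true: last reference dropped
    }
}
```

Under this model every owner, including one obtained through incRefCount(), is expected to call close() exactly once.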
public ColumnVector nansToNulls()
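As a rough host-side illustration of what nansToNulls() is documented to do (NaNs become nulls, existing nulls are preserved), here is a plain-Java sketch. It does not call cudf; `NansToNullsSketch` is a hypothetical name, and a `Double[]` with `null` entries stands in for a nullable column.

```java
import java.util.Arrays;

final class NansToNullsSketch {
    /** Host-side model: null marks a null row; NaN rows become null, others are copied. */
    static Double[] nansToNulls(Double[] column) {
        Double[] out = new Double[column.length];
        for (int i = 0; i < column.length; i++) {
            Double v = column[i];
            out[i] = (v == null || v.isNaN()) ? null : v;
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] in = {1.0, Double.NaN, null, -0.5};
        System.out.println(Arrays.toString(nansToNulls(in))); // [1.0, null, null, -0.5]
    }
}
```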
public long getRowCount()
public long getDeviceMemorySize()
public DType getType()
getType in interface BinaryOperable
public long getNullCount()
public boolean hasValidityVector()
public boolean hasNulls()
public HostColumnVector copyToHost()
public BaseDeviceMemoryBuffer getDeviceBufferFor(BufferType type)
type - the type of buffer to get access to.
public ColumnVector getCharLengths()
public ColumnVector getByteCount()
public ColumnVector isNotNull()
public ColumnVector isNull()
public ColumnVector isInteger()
public ColumnVector isFloat()
public ColumnVector isNan()
public ColumnVector isNotNan()
public ColumnVector findAndReplaceAll(ColumnVector oldValues, ColumnVector newValues)
oldValues - A vector containing values that should be replaced
newValues - A vector containing new values
public ColumnVector replaceNulls(Scalar scalar)
scalar - Scalar value to use as replacement
public ColumnVector ifElse(ColumnVector trueValues, ColumnVector falseValues)
trueValues - the values to select if a row in this column is true
falseValues - the values to select if a row in this column is not true
public ColumnVector ifElse(ColumnVector trueValues, Scalar falseValue)
trueValues - the values to select if a row in this column is true
falseValue - the value to select if a row in this column is not true
public ColumnVector ifElse(Scalar trueValue, ColumnVector falseValues)
trueValue - the value to select if a row in this column is true
falseValues - the values to select if a row in this column is not true
public ColumnVector ifElse(Scalar trueValue, Scalar falseValue)
trueValue - the value to select if a row in this column is true
falseValue - the value to select if a row in this column is not true
public ColumnVector[] slice(int... indices)
indices -
public ColumnVector subVector(int start)
start - the index to start at.
public ColumnVector subVector(int start, int end)
start - the index to start at (inclusive).
end - the index to end at (exclusive).
public ColumnVector[] split(int... indices)
indices - the indexes to split with
public static ColumnVector fromScalar(Scalar scalar, int rows)
scalar - Scalar to use to fill rows
rows - Number of rows in the new ColumnVector
public static ColumnVector sequence(Scalar initialValue, Scalar step, int rows)
initialValue - the initial value to start at.
step - the step to add to each subsequent row.
rows - the total number of rows
public static ColumnVector sequence(Scalar initialValue, int rows)
initialValue - the initial value to start at.
rows - the total number of rows
public static ColumnVector concatenate(ColumnVector... columns)
public ColumnVector normalizeNANsAndZeros()
Double.longBitsToDouble(long) describes how equivalent values of NaN/-NaN might have different bitwise representations.
This method may be used to compare different bitwise values of 0.0 or NaN as logically equivalent. For instance, if these values appear in a groupby key column, without normalization 0.0 and -0.0 would be erroneously treated as distinct groups, as would each representation of NaN.
public ColumnVector year()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public ColumnVector month()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public ColumnVector day()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public ColumnVector hour()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public ColumnVector minute()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
public ColumnVector second()
Postconditions - A new vector is allocated with the result. The caller owns the vector and is responsible for its lifecycle.
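To make the year(), month(), day(), hour(), minute() and second() extractions above concrete, here is a small host-side sketch using java.time. It is not cudf code: `TimestampFieldsSketch` is a hypothetical name, and it assumes a seconds-resolution timestamp counted from the Unix epoch and interpreted in UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Arrays;

final class TimestampFieldsSketch {
    /** Splits seconds since the Unix epoch (UTC) into {year, month, day, hour, minute, second}. */
    static int[] fields(long epochSeconds) {
        LocalDateTime t = LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC);
        return new int[] {
            t.getYear(), t.getMonthValue(), t.getDayOfMonth(),
            t.getHour(), t.getMinute(), t.getSecond()
        };
    }

    public static void main(String[] args) {
        // 1,000,000,000 seconds after the epoch is 2001-09-09T01:46:40Z.
        System.out.println(Arrays.toString(fields(1_000_000_000L))); // [2001, 9, 9, 1, 46, 40]
    }
}
```

On the device each of these fields would be produced as its own column, one value per row, which the caller then owns as the postconditions state.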
public ColumnVector transform(String udf, boolean isPtx)
udf - This function will be applied to every element in the vector
isPtx - true if the function code is PTX, false if it is C/C++.
public ColumnVector unaryOp(UnaryOp op)
op - the operation to perform
public ColumnVector sin()
public ColumnVector cos()
public ColumnVector tan()
public ColumnVector arcsin()
public ColumnVector arccos()
public ColumnVector arctan()
public ColumnVector sinh()
public ColumnVector cosh()
public ColumnVector tanh()
public ColumnVector arcsinh()
public ColumnVector arccosh()
public ColumnVector arctanh()
public ColumnVector exp()
public ColumnVector log()
public ColumnVector log2()
public ColumnVector log10()
public ColumnVector sqrt()
public ColumnVector cbrt()
public ColumnVector ceil()
public ColumnVector floor()
public ColumnVector abs()
public ColumnVector rint()
public ColumnVector bitInvert()
public ColumnVector binaryOp(BinaryOp op, BinaryOperable rhs, DType outType)
binaryOp in interface BinaryOperable
op - the operation to perform
rhs - the rhs of the operation
outType - the type of output you want.
public Scalar sum()
public Scalar sum(DType outType)
public Scalar min()
public Scalar min(DType outType)
public Scalar max()
public Scalar max(DType outType)
public Scalar product()
public Scalar product(DType outType)
public Scalar sumOfSquares()
public Scalar sumOfSquares(DType outType)
public Scalar mean()
public Scalar mean(DType outType)
public Scalar variance()
public Scalar variance(DType outType)
public Scalar standardDeviation()
public Scalar standardDeviation(DType outType)
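The summary describes standardDeviation() as the sample standard deviation. A minimal plain-Java sketch of that formula follows; it is not the cudf implementation. `SampleStdDevSketch` is a hypothetical name, the divisor n - 1 for the variance is an assumption (the summary does not say whether variance() is the sample or population form), and nulls are not modelled.

```java
final class SampleStdDevSketch {
    /** Sample variance: squared deviations from the mean, divided by n - 1 (assumed). */
    static double sampleVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq / (values.length - 1);
    }

    /** Sample standard deviation is the square root of the sample variance. */
    static double sampleStdDev(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }

    public static void main(String[] args) {
        double[] column = {1.0, 2.0, 3.0, 4.0};
        // Mean is 2.5 and the squared deviations sum to 5, so the variance is 5/3.
        System.out.println(sampleVariance(column));
        System.out.println(sampleStdDev(column));
    }
}
```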
public Scalar any()
public Scalar any(DType outType)
public Scalar all()
public Scalar all(DType outType)
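The any() and all() reductions treat non-zero values as true. A plain-Java sketch of that rule over an int array is shown below; `AnyAllSketch` is a hypothetical name and this is not cudf code. How an empty column or null rows are handled on the device is not covered here, so the empty-input results in this model are an assumption.

```java
final class AnyAllSketch {
    /** True if at least one element is non-zero. */
    static boolean any(int[] column) {
        for (int v : column) {
            if (v != 0) {
                return true;
            }
        }
        return false;
    }

    /** True if every element is non-zero (vacuously true for an empty array in this model). */
    static boolean all(int[] column) {
        for (int v : column) {
            if (v == 0) {
                return false;
            }
        }
        return true;
    }

    /** Models the outType variants: the same answer expressed as 1 or 0. */
    static int asInt(boolean b) {
        return b ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] column = {3, 0, -1};
        System.out.println(any(column));        // true
        System.out.println(all(column));        // false
        System.out.println(asInt(any(column))); // 1
    }
}
```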
public Scalar reduce(ai.rapids.cudf.AggregateOp op)
op - The reduction operation to perform
Scalar.isValid() method of the result will return false.
public Scalar reduce(ai.rapids.cudf.AggregateOp op, DType outType)
op - The reduction operation to perform
outType - The type of scalar value to return
Scalar.isValid() method of the result will return false.
public ColumnVector quantile(QuantileMethod method, double[] quantiles)
method - the method used to calculate the quantiles
quantiles - the quantile values [0,1]
public ColumnVector rollingWindow(ai.rapids.cudf.AggregateOp op, WindowOptions options)
op - the operation to perform.
options - various window function arguments.
IllegalArgumentException - if an unsupported window specification (i.e. other than WindowOptions.FrameType.ROWS) is used.
public ColumnVector not()
public boolean contains(Scalar needle)
needle -
public ColumnVector contains(ColumnVector needles)
DType.BOOL8 elements containing true if the corresponding
entry in haystack is contained in needles and false if it is not. The caller will be responsible
for the lifecycle of the new vector.
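The membership test described here can be modelled on the host in plain Java. The sketch below is not the cudf implementation: `ContainsSketch` and its `contains` helper are hypothetical names, int arrays stand in for device columns, and nulls are not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ContainsSketch {
    /** For each haystack row, reports whether its value appears anywhere in needles. */
    static boolean[] contains(int[] haystack, int[] needles) {
        Set<Integer> lookup = new HashSet<>();
        for (int n : needles) {
            lookup.add(n);
        }
        boolean[] out = new boolean[haystack.length];
        for (int i = 0; i < haystack.length; i++) {
            out[i] = lookup.contains(haystack[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] haystack = {10, 20, 30, 40, 50};
        int[] needles = {20, 40, 60, 80};
        // Prints [false, true, false, true, false]
        System.out.println(Arrays.toString(contains(haystack, needles)));
    }
}
```

The result always has one entry per haystack row, regardless of how many needles are supplied.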
example:
haystack = { 10, 20, 30, 40, 50 }
needles = { 20, 40, 60, 80 }
result = { false, true, false, true, false }
needles - DType.BOOL8
public ColumnVector castTo(DType type)
asTimestamp(DType, String)
and asStrings(String) for casting string to timestamp when the format
is known
Float values when converted to String could be different from the expected default behavior in
Java
e.g.
12.3 => "12.30000019" instead of "12.3"
Double.POSITIVE_INFINITY => "Inf" instead of "Infinity"
Double.NEGATIVE_INFINITY => "-Inf" instead of "-Infinity"
type - type of the resulting ColumnVector
public ColumnVector asBytes()
public ColumnVector asShorts()
public ColumnVector asInts()
public ColumnVector asLongs()
public ColumnVector asFloats()
public ColumnVector asDoubles()
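The numeric casts above use the underlying numerical representation of Boolean and date values, and the castTo notes warn that float-to-string output can differ from Java's. The plain-Java sketch below illustrates both points without calling cudf. `CastSketch` and its helpers are hypothetical, and it assumes booleans are stored as 1/0 and that a day-resolution timestamp counts days since the Unix epoch.

```java
import java.time.LocalDate;
import java.util.Locale;

final class CastSketch {
    /** Assumption: a boolean's underlying representation is 1 for true and 0 for false. */
    static byte boolToByte(boolean b) {
        return (byte) (b ? 1 : 0);
    }

    /** Assumption: a day-resolution timestamp is stored as days since 1970-01-01. */
    static int dateToDays(LocalDate d) {
        return Math.toIntExact(d.toEpochDay());
    }

    /** Shows the extra digits hidden in a float by widening it and printing 8 decimals. */
    static String widened(float f) {
        return String.format(Locale.ROOT, "%.8f", (double) f);
    }

    public static void main(String[] args) {
        System.out.println(boolToByte(true));                    // 1
        System.out.println(dateToDays(LocalDate.of(1970, 1, 2))); // 1
        System.out.println(Float.toString(12.3f));               // 12.3
        System.out.println(widened(12.3f));                      // 12.30000019
    }
}
```

The last two lines show why a float 12.3 can surface as "12.30000019": the nearest float to 12.3 is slightly larger, and Java's default conversion simply prints the shortest string that round-trips.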
public ColumnVector asTimestampDays()
public ColumnVector asTimestampDays(String format)
format - timestamp string format specifier, ignored if the column type is not string
public ColumnVector asTimestampSeconds()
public ColumnVector asTimestampSeconds(String format)
format - timestamp string format specifier, ignored if the column type is not string
public ColumnVector asTimestampMicroseconds()
public ColumnVector asTimestampMicroseconds(String format)
format - timestamp string format specifier, ignored if the column type is not string
public ColumnVector asTimestampMilliseconds()
public ColumnVector asTimestampMilliseconds(String format)
format - timestamp string format specifier, ignored if the column type is not string
public ColumnVector asTimestampNanoseconds()
public ColumnVector asTimestampNanoseconds(String format)
format - timestamp string format specifier, ignored if the column type is not string
public ColumnVector asTimestamp(DType timestampType, String format)
timestampType - timestamp DType that includes the time unit to parse the timestamp into.
format - strptime format specifier string used to parse and convert the timestamp.
Supports the %Y, %y, %m, %d, %H, %I, %p, %M, %S, %f, %z format specifiers.
See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md
for the full parsing format specification and documentation.
public ColumnVector asStrings()
DType.TIMESTAMP_DAYS - "%Y-%m-%d"
DType.TIMESTAMP_SECONDS - "%Y-%m-%d %H:%M:%S"
DType.TIMESTAMP_MICROSECONDS - "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_MILLISECONDS - "%Y-%m-%d %H:%M:%S.%f"
DType.TIMESTAMP_NANOSECONDS - "%Y-%m-%d %H:%M:%S.%f"
public ColumnVector asStrings(String format)
format - strftime format specifier string used to convert the timestamp to a string.
Supports the %m, %j, %d, %H, %M, %S, %y, %Y, %f format specifiers.
%d Day of the month: 01-31
%m Month of the year: 01-12
%y Year without century: 00-99
%Y Year with century: 0001-9999
%H 24-hour of the day: 00-23
%M Minute of the hour: 00-59
%S Second of the minute: 00-59
%f 6-digit microsecond: 000000-999999
See https://github.com/rapidsai/custrings/blob/branch-0.10/docs/source/datetime.md
Reported bugs:
https://github.com/rapidsai/cudf/issues/4160. After the bug is fixed this method should
also support
%I 12-hour of the day: 01-12
%p Only 'AM', 'PM'
%j day of the year
public ColumnVector upper()
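To make the specifier table for asStrings(String format) concrete, here is a minimal plain-Java sketch that expands the listed %d, %m, %y, %Y, %H, %M, %S and %f specifiers against a java.time.LocalDateTime. The class `StrftimeSketch` is hypothetical and is not how the library performs the conversion; it only illustrates the documented field widths, and specifiers outside that list are rejected.

```java
import java.time.LocalDateTime;

// Hypothetical illustration only; this does not call into cudf.
final class StrftimeSketch {
    static String format(LocalDateTime t, String fmt) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fmt.length(); i++) {
            char c = fmt.charAt(i);
            // Copy literal characters (and a trailing lone '%') unchanged.
            if (c != '%' || i + 1 >= fmt.length()) {
                out.append(c);
                continue;
            }
            char spec = fmt.charAt(++i);
            switch (spec) {
                case 'd': out.append(String.format("%02d", t.getDayOfMonth())); break;
                case 'm': out.append(String.format("%02d", t.getMonthValue())); break;
                case 'y': out.append(String.format("%02d", t.getYear() % 100)); break;
                case 'Y': out.append(String.format("%04d", t.getYear())); break;
                case 'H': out.append(String.format("%02d", t.getHour())); break;
                case 'M': out.append(String.format("%02d", t.getMinute())); break;
                case 'S': out.append(String.format("%02d", t.getSecond())); break;
                // 6-digit microseconds, derived from the nanosecond field.
                case 'f': out.append(String.format("%06d", t.getNano() / 1000)); break;
                default:
                    throw new IllegalArgumentException("Unsupported specifier: %" + spec);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2020, 2, 3, 4, 5, 6, 7000);
        System.out.println(format(t, "%Y-%m-%d %H:%M:%S.%f")); // prints 2020-02-03 04:05:06.000007
    }
}
```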
public ColumnVector lower()
public ColumnVector stringConcatenate(ColumnVector[] columns)
columns - array of columns containing strings.
public static ColumnVector stringConcatenate(Scalar separator, Scalar narep, ColumnVector[] columns)
separator - string scalar inserted between each string being merged.
narep - string scalar indicating null behavior. If set to null and any string in the row
is null the resulting string will be null. If not null, null values in any column
will be replaced by the specified string.
columns - array of columns containing strings, must be more than 2 columns
public ColumnVector stringLocate(Scalar substring)
substring - scalar containing the string to locate within each row.
public ColumnVector stringLocate(Scalar substring, int start)
substring - scalar containing the string to locate within each row.
start - character index to start the search from (inclusive).
public ColumnVector stringLocate(Scalar substring, int start, int end)
substring - scalar containing the string to locate within each row.
start - character index to start the search from (inclusive).
end - character index to end the search on (exclusive).
public Table stringSplit(Scalar delimiter)
delimiter - UTF-8 encoded string identifying the split points in each string.
Default of empty string indicates split on whitespace.
public Table stringSplit()
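The stringSplit delimiter rule (an empty delimiter means split on whitespace) can be sketched in plain Java as follows. The column-major layout, with one output column per token position and null padding for shorter rows, is an assumption about how the returned Table is organized. `StringSplitSketch` is a hypothetical class, and null input rows are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this does not call into cudf.
final class StringSplitSketch {
    // Returns columns[c][r]: the c-th token of row r, or null if the row has fewer tokens.
    static String[][] split(String[] rows, String delimiter) {
        List<String[]> tokens = new ArrayList<>();
        int width = 0;
        for (String row : rows) {
            String[] parts;
            if (delimiter.isEmpty()) {
                // Empty delimiter: split on runs of whitespace (assumed to ignore the ends).
                String trimmed = row.trim();
                parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            } else {
                // Literal delimiter; -1 keeps empty tokens.
                parts = row.split(Pattern.quote(delimiter), -1);
            }
            tokens.add(parts);
            width = Math.max(width, parts.length);
        }
        String[][] columns = new String[width][rows.length];
        for (int r = 0; r < rows.length; r++) {
            String[] parts = tokens.get(r);
            for (int c = 0; c < parts.length; c++) {
                columns[c][r] = parts[c];
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[][] columns = split(new String[] {"a b", " c "}, "");
        for (String[] column : columns) {
            System.out.println(String.join(" | ", java.util.Arrays.asList(column)));
        }
    }
}
```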
public ColumnVector substring(int start)
start - first character index to begin the substring (inclusive).
public ColumnVector substring(int start, int end)
start - first character index to begin the substring (inclusive).
end - last character index to stop the substring (exclusive).
public ColumnVector substring(ColumnVector start, ColumnVector end)
start - Vector containing start indices of each string.
end - Vector containing end indices of each string. -1 indicates to read until the end of the string.
public ColumnVector stringReplace(Scalar target, Scalar replace)
target - String to search for within each string.
replace - Replacement string if target is found.
public ColumnVector stringReplaceWithBackrefs(String pattern, String replace)
pattern - The regular expression pattern to search for within each string.
replace - The replacement template for creating the output string.
public ColumnVector startsWith(Scalar pattern)
pattern - scalar containing the string being searched for at the beginning of the column's strings.
public ColumnVector endsWith(Scalar pattern)
pattern - scalar containing the string being searched for at the end of the column's strings.
public ColumnVector strip()
public ColumnVector strip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public ColumnVector lstrip()
public ColumnVector lstrip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public ColumnVector rstrip()
public ColumnVector rstrip(Scalar toStrip)
toStrip - UTF-8 encoded characters to strip from each string.
public ColumnVector stringContains(Scalar compString)
compString - scalar containing the string being searched for.
public ColumnVector clamp(Scalar lo, Scalar hi)
lo - Minimum clamp value. All elements less than `lo` will be replaced by `lo`. Ignored if null.
hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hi`. Ignored if null.
public ColumnVector clamp(Scalar lo, Scalar loReplace, Scalar hi, Scalar hiReplace)
lo - Minimum clamp value. All elements less than `lo` will be replaced by `loReplace`. Ignored if null.
loReplace - All elements less than `lo` will be replaced by `loReplace`.
hi - Maximum clamp value. All elements greater than `hi` will be replaced by `hiReplace`. Ignored if null.
hiReplace - All elements greater than `hi` will be replaced by `hiReplace`.
public ColumnVector matchesRe(String pattern)
pattern - Regex pattern to match against each string.
public ColumnVector containsRe(String pattern)
pattern - Regex pattern to match against each string.
public Table extractRe(String pattern) throws CudfException
pattern - the pattern to use.
CudfException - if any error happens, including if the RE does not contain any capture groups.
public long getNativeView()
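The capture-group requirement of extractRe can be illustrated with java.util.regex. In this minimal sketch each capture group becomes one output column and a pattern without groups is rejected up front. `ExtractReSketch` is a hypothetical class. Using search semantics (find), returning null for non-matching rows, and throwing IllegalArgumentException instead of CudfException are all assumptions. The regex dialect accepted by the library may also differ from Java's.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this does not call into cudf.
final class ExtractReSketch {
    // Returns columns[g][r]: the text of capture group g+1 in row r, or null if row r did not match.
    static String[][] extract(String[] rows, String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        int groups = compiled.matcher("").groupCount();
        if (groups == 0) {
            throw new IllegalArgumentException("pattern contains no capture groups: " + pattern);
        }
        String[][] columns = new String[groups][rows.length];
        for (int r = 0; r < rows.length; r++) {
            Matcher m = compiled.matcher(rows[r]);
            if (m.find()) {
                for (int g = 1; g <= groups; g++) {
                    columns[g - 1][r] = m.group(g);
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[][] columns = extract(new String[] {"a1", "b2", "c"}, "([a-z])(\\d)");
        System.out.println(java.util.Arrays.deepToString(columns));
    }
}
```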
public static ColumnVector build(DType type, int rows, java.util.function.Consumer<HostColumnVector.Builder> init)
type - the type of vector to build.
rows - maximum number of rows that the vector can hold.
init - what will initialize the vector.
public static ColumnVector build(int rows, long stringBufferSize, java.util.function.Consumer<HostColumnVector.Builder> init)
public static ColumnVector boolFromBytes(byte... values)
public static ColumnVector fromBytes(byte... values)
public static ColumnVector fromShorts(short... values)
public static ColumnVector fromInts(int... values)
public static ColumnVector fromLongs(long... values)
public static ColumnVector fromFloats(float... values)
public static ColumnVector fromDoubles(double... values)
public static ColumnVector daysFromInts(int... values)
public static ColumnVector timestampSecondsFromLongs(long... values)
public static ColumnVector timestampMilliSecondsFromLongs(long... values)
public static ColumnVector timestampMicroSecondsFromLongs(long... values)
public static ColumnVector timestampNanoSecondsFromLongs(long... values)
public static ColumnVector fromStrings(String... values)
public static ColumnVector fromBoxedBooleans(Boolean... values)
public static ColumnVector fromBoxedBytes(Byte... values)
public static ColumnVector fromBoxedShorts(Short... values)
public static ColumnVector fromBoxedInts(Integer... values)
public static ColumnVector fromBoxedLongs(Long... values)
public static ColumnVector fromBoxedFloats(Float... values)
public static ColumnVector fromBoxedDoubles(Double... values)
public static ColumnVector timestampDaysFromBoxedInts(Integer... values)
public static ColumnVector timestampSecondsFromBoxedLongs(Long... values)
public static ColumnVector timestampMilliSecondsFromBoxedLongs(Long... values)
public static ColumnVector timestampMicroSecondsFromBoxedLongs(Long... values)
public static ColumnVector timestampNanoSecondsFromBoxedLongs(Long... values)
Copyright © 2020. All rights reserved.