class ParquetCachedBatchSerializer extends GpuCachedBatchSerializer
This class assumes the data is columnar and that the plugin is enabled. Note: this class should not be referenced directly in source code. It should be loaded by reflection using ShimLoader.newInstanceOf; see ./docs/dev/shims.md.
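Because the class is loaded by reflection rather than referenced directly, it is normally enabled through Spark configuration. A sketch of launching a session with it (the jar path is hypothetical; the two config keys follow the spark-rapids documentation):

```shell
# Hypothetical jar location -- substitute the actual spark-rapids jar on your system.
spark-shell \
  --jars /path/to/rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
```

With `spark.sql.cache.serializer` set, `dataset.cache()` stores data in the Parquet-based format this class implements instead of Spark's default cached-batch format.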
Linear Supertypes:
- ParquetCachedBatchSerializer
- GpuCachedBatchSerializer
- CachedBatchSerializer
- Serializable
- Serializable
- AnyRef
- Any
Instance Constructors
- new ParquetCachedBatchSerializer()
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def buildFilter(predicates: Seq[Expression], cachedAttributes: Seq[Attribute]): (Int, Iterator[CachedBatch]) ⇒ Iterator[CachedBatch]
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def convertCachedBatchToColumnarBatch(input: RDD[CachedBatch], cacheAttributes: Seq[Attribute], selectedAttributes: Seq[Attribute], conf: SQLConf): RDD[ColumnarBatch]
  Convert the cached data into a ColumnarBatch, taking the result data back to the host.
  - input: the cached batches that should be converted.
  - cacheAttributes: the attributes of the data in the batch.
  - selectedAttributes: the fields that should be loaded from the data and the order they should appear in the output batch.
  - conf: the configuration for the job.
  - returns: an RDD of the input cached batches transformed into the ColumnarBatch format.
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- def convertCachedBatchToInternalRow(input: RDD[CachedBatch], cacheAttributes: Seq[Attribute], selectedAttributes: Seq[Attribute], conf: SQLConf): RDD[InternalRow]
  Convert the cached batch into InternalRows.
  - input: the cached batches that should be converted.
  - cacheAttributes: the attributes of the data in the batch.
  - selectedAttributes: the fields that should be loaded from the data and the order they should appear in the output rows.
  - conf: the configuration for the job.
  - returns: an RDD of the rows that were stored in the cached batches.
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- def convertColumnarBatchToCachedBatch(input: RDD[ColumnarBatch], schema: Seq[Attribute], storageLevel: StorageLevel, conf: SQLConf): RDD[CachedBatch]
  Convert an RDD[ColumnarBatch] into an RDD[CachedBatch] in preparation for caching the data. This method uses the Parquet writer on the GPU to write the cached batch.
  - input: the input RDD to be converted.
  - schema: the schema of the data being stored.
  - storageLevel: where the data will be stored.
  - conf: the config for the query.
  - returns: the data converted into a format more suitable for caching.
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
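This method is not called directly by applications; an ordinary `cache()`/`persist()` call reaches it once the serializer is configured. A hedged sketch of the caller's view (the app name and data are illustrative, and a GPU plus the spark-rapids jars on the classpath are assumed):

```scala
import org.apache.spark.sql.SparkSession

// Assumes the spark-rapids plugin jar is on the classpath and a GPU is available.
val spark = SparkSession.builder()
  .appName("pcbs-cache-sketch")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.sql.cache.serializer",
    "com.nvidia.spark.ParquetCachedBatchSerializer")
  .getOrCreate()

val df = spark.range(0, 1000000L).selectExpr("id", "id % 10 AS bucket")

// cache() stores the data as Parquet-encoded CachedBatches; with columnar
// input, Spark invokes convertColumnarBatchToCachedBatch under the hood.
df.cache()
df.count() // materializes the cache
```

Subsequent actions on `df` then read back through convertCachedBatchToColumnarBatch (or the row-based variant), as described below.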
- def convertInternalRowToCachedBatch(input: RDD[InternalRow], schema: Seq[Attribute], storageLevel: StorageLevel, conf: SQLConf): RDD[CachedBatch]
  Convert an RDD[InternalRow] into an RDD[CachedBatch] in preparation for caching the data. This method uses the RowToColumnarIterator and converts one batch at a time.
  - input: the input RDD to be converted.
  - schema: the schema of the data being stored.
  - storageLevel: where the data will be stored.
  - conf: the config for the query.
  - returns: the data converted into a format more suitable for caching.
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def getBytesAllowedPerBatch(conf: SQLConf): Long
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getParquetWriterOptions(useCompression: Boolean, schema: StructType): ParquetWriterOptions
- def gpuConvertCachedBatchToColumnarBatch(input: RDD[CachedBatch], cacheAttributes: Seq[Attribute], selectedAttributes: Seq[Attribute], conf: SQLConf): RDD[ColumnarBatch]
  This method decodes the CachedBatch, leaving it on the GPU to avoid the extra copy back to the host.
  - input: the cached batches that should be converted.
  - cacheAttributes: the attributes of the data in the batch.
  - selectedAttributes: the fields that should be loaded from the data and the order they should appear in the output batch.
  - conf: the configuration for the job.
  - returns: an RDD of the input cached batches transformed into the ColumnarBatch format.
  - Definition Classes: ParquetCachedBatchSerializer → GpuCachedBatchSerializer
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def isSchemaSupportedByCudf(schema: Seq[Attribute]): Boolean
- def isSupportedByCudf(dataType: DataType): Boolean
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def supportsColumnarInput(schema: Seq[Attribute]): Boolean
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- def supportsColumnarOutput(schema: StructType): Boolean
  - Definition Classes: ParquetCachedBatchSerializer → CachedBatchSerializer
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- def vectorTypes(attributes: Seq[Attribute], conf: SQLConf): Option[Seq[String]]
  - Definition Classes: CachedBatchSerializer
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- object PcbToRowsIterator