`public abstract static class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> extends Object`

Type parameters:

- `T` - the type of objects written by the constructed ParquetWriter.
- `SELF` - the type of this builder that is returned by builder methods.
Object models should extend this builder to provide writer configuration options.
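
The `SELF` type parameter lets a subclass's setters be chained without casts. The sketch below reproduces this self-typed builder pattern using hypothetical classes (`WriterBuilder`, `StringWriterBuilder`, `withSchemaName`) in place of the real Parquet types, so it compiles without Parquet on the classpath. It illustrates the design only and is not the library's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for ParquetWriter.Builder<T, SELF>. Shared options live in
// the abstract base, and every setter returns SELF so that chained calls keep the
// concrete subclass type. T mirrors the record type; this sketch does not use it.
abstract class WriterBuilder<T, SELF extends WriterBuilder<T, SELF>> {
    protected final String path;
    protected final Map<String, String> conf = new LinkedHashMap<>();
    protected boolean dictionaryEnabled = true;

    protected WriterBuilder(String path) {
        this.path = path;
    }

    // Each subclass returns "this"; the base class cannot produce SELF on its own.
    protected abstract SELF self();

    public SELF withDictionaryEncoding(boolean enable) {
        this.dictionaryEnabled = enable;
        return self();
    }

    public SELF config(String property, String value) {
        conf.put(property, value);
        return self();
    }
}

// Hypothetical object-model builder that adds its own option on top of the base ones.
final class StringWriterBuilder extends WriterBuilder<String, StringWriterBuilder> {
    String schemaName = "default";

    StringWriterBuilder(String path) {
        super(path);
    }

    @Override
    protected StringWriterBuilder self() {
        return this;
    }

    // Subclass-only option, reachable after base-class setters because they return SELF.
    StringWriterBuilder withSchemaName(String name) {
        this.schemaName = name;
        return this;
    }
}
```

Because `withDictionaryEncoding` and `config` return `SELF`, a chain such as `new StringWriterBuilder(path).withDictionaryEncoding(false).withSchemaName("events")` type-checks. If those setters returned the plain base type, the subclass-only setter would be unreachable later in the chain.
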

| Modifier | Constructor and Description |
|---|---|
| protected | `Builder(OutputFile path)` |
| protected | `Builder(org.apache.hadoop.fs.Path path)` |

| Modifier and Type | Method | Description |
|---|---|---|
| `ParquetWriter<T>` | `build()` | Build a ParquetWriter with the accumulated configuration. |
| `SELF` | `config(String property, String value)` | Set a property that will be available to the read path. |
| `SELF` | `enableDictionaryEncoding()` | Enables dictionary encoding for the constructed writer. |
| `SELF` | `enablePageWriteChecksum()` | Enables writing page-level checksums for the constructed writer. |
| `SELF` | `enableValidation()` | Enables validation for the constructed writer. |
| `protected abstract WriteSupport<T>` | `getWriteSupport(org.apache.hadoop.conf.Configuration conf)` | Deprecated. Use `getWriteSupport(ParquetConfiguration)` instead. |
| `protected WriteSupport<T>` | `getWriteSupport(ParquetConfiguration conf)` | |
| `protected abstract SELF` | `self()` | |
| `SELF` | `withAdaptiveBloomFilterEnabled(boolean enabled)` | When the NDV (number of distinct values) for a column is not set, whether to use `AdaptiveBloomFilter` to adjust the bloom filter size automatically according to `parquet.bloom.filter.max.bytes`. |
| `SELF` | `withAllocator(ByteBufferAllocator allocator)` | Sets the ByteBuffer allocator instance used to allocate memory for writing. |
| `SELF` | `withBloomFilterCandidateNumber(String columnPath, int number)` | When `AdaptiveBloomFilter` is enabled, sets how many bloom filter candidates to use for the specified column. |
| `SELF` | `withBloomFilterEnabled(boolean enabled)` | Enables or disables bloom filters by default, for columns without a column-specific setting. |
| `SELF` | `withBloomFilterEnabled(String columnPath, boolean enabled)` | Enables or disables the bloom filter for the specified column. |
| `SELF` | `withBloomFilterFPP(String columnPath, double fpp)` | Sets the bloom filter false positive probability (FPP) for the specified column. |
| `SELF` | `withBloomFilterNDV(String columnPath, long ndv)` | Sets the NDV (number of distinct values) for the specified column. |
| `SELF` | `withByteStreamSplitEncoding(boolean enableByteStreamSplit)` | Enables or disables BYTE_STREAM_SPLIT encoding for the constructed writer. |
| `SELF` | `withCodecFactory(CompressionCodecFactory codecFactory)` | Set the codec factory used by the constructed writer. |
| `SELF` | `withColumnIndexTruncateLength(int length)` | Sets the length to which binary values in a column index are truncated. |
| `SELF` | `withCompressionCodec(CompressionCodecName codecName)` | Set the compression codec used by the constructed writer. |
| `SELF` | `withConf(org.apache.hadoop.conf.Configuration conf)` | Set the Configuration used by the constructed writer. |
| `SELF` | `withConf(ParquetConfiguration conf)` | Set the ParquetConfiguration used by the constructed writer. |
| `SELF` | `withDictionaryEncoding(boolean enableDictionary)` | Enable or disable dictionary encoding for the constructed writer. |
| `SELF` | `withDictionaryEncoding(String columnPath, boolean enableDictionary)` | Enable or disable dictionary encoding of the specified column for the constructed writer. |
| `SELF` | `withDictionaryPageSize(int dictionaryPageSize)` | Set the Parquet format dictionary page size used by the constructed writer. |
| `SELF` | `withEncryption(FileEncryptionProperties encryptionProperties)` | Set the file encryption properties used by the constructed writer. |
| `SELF` | `withExtraMetaData(Map<String,String> extraMetaData)` | Sets additional metadata entries to be included in the file footer. |
| `SELF` | `withMaxBloomFilterBytes(int maxBloomFilterBytes)` | Sets the maximum size, in bytes, of a column's Bloom filter bitset. |
| `SELF` | `withMaxPaddingSize(int maxPaddingSize)` | Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem. |
| `SELF` | `withMaxRowCountForPageSizeCheck(int max)` | Sets the maximum number of rows to write before a page size check is done. |
| `SELF` | `withMinRowCountForPageSizeCheck(int min)` | Sets the minimum number of rows to write before a page size check is done. |
| `SELF` | `withPageRowCountLimit(int rowCount)` | Sets the Parquet format page row count limit used by the constructed writer. |
| `SELF` | `withPageSize(int pageSize)` | Set the Parquet format page size used by the constructed writer. |
| `SELF` | `withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)` | Enables or disables writing page-level checksums for the constructed writer. |
| `SELF` | `withRowGroupSize(int rowGroupSize)` | Deprecated. Use `withRowGroupSize(long)` instead. |
| `SELF` | `withRowGroupSize(long rowGroupSize)` | Set the Parquet format row group size used by the constructed writer. |
| `SELF` | `withStatisticsTruncateLength(int length)` | Sets the length to which the min/max binary values in row group statistics are truncated. |
| `SELF` | `withValidation(boolean enableValidation)` | Enable or disable validation for the constructed writer. |
| `SELF` | `withWriteMode(ParquetFileWriter.Mode mode)` | Set the write mode used when creating the backing file for this writer. |
| `SELF` | `withWriterVersion(ParquetProperties.WriterVersion version)` | Set the format version used by the constructed writer. |
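
The NDV, FPP, and maximum-size options interact. A higher NDV or a lower FPP calls for a larger bitset, and `withMaxBloomFilterBytes` caps the result. The sketch below estimates the size with the split block bloom filter sizing formula as we understand it from the Parquet format specification: m = -8 * NDV / ln(1 - FPP^(1/8)) bits. Treat that formula as an assumption. The library may also round the size, which this sketch omits. `BloomFilterSizing` is a hypothetical helper, not Parquet API.

```java
// Hypothetical helper: estimates a bloom filter bitset size from NDV and FPP,
// then clamps it to a maximum byte size (the role played by withMaxBloomFilterBytes).
final class BloomFilterSizing {
    private BloomFilterSizing() {}

    /**
     * Bits for a split block bloom filter (8 bits set per inserted value) holding
     * ndv distinct values at false positive probability fpp.
     * Assumed formula: m = -8 * ndv / ln(1 - fpp^(1/8)).
     */
    static double optimalBits(long ndv, double fpp) {
        if (fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("fpp must be in (0, 1)");
        }
        return -8.0 * ndv / Math.log(1.0 - Math.pow(fpp, 1.0 / 8.0));
    }

    /** Estimated size in bytes, rounded up, then clamped to maxBytes. */
    static long sizeInBytes(long ndv, double fpp, long maxBytes) {
        long bytes = (long) Math.ceil(optimalBits(ndv, fpp) / 8.0);
        return Math.min(bytes, maxBytes);
    }
}
```

For 1,000,000 distinct values at an FPP of 0.01, the uncapped estimate is about 1.2 MB, so a 1 MiB cap would take effect.
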

`protected Builder(org.apache.hadoop.fs.Path path)`

`protected Builder(OutputFile path)`

`protected abstract SELF self()`

`@Deprecated protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)`

Deprecated. Use `getWriteSupport(ParquetConfiguration)` instead.

- `conf` - a configuration

`protected WriteSupport<T> getWriteSupport(ParquetConfiguration conf)`

- `conf` - a configuration

`public SELF withConf(org.apache.hadoop.conf.Configuration conf)`

Set the Configuration used by the constructed writer.

- `conf` - a Configuration

`public SELF withConf(ParquetConfiguration conf)`

Set the ParquetConfiguration used by the constructed writer.

- `conf` - a ParquetConfiguration

`public SELF withWriteMode(ParquetFileWriter.Mode mode)`

Set the write mode used when creating the backing file for this writer.

- `mode` - a ParquetFileWriter.Mode

`public SELF withCompressionCodec(CompressionCodecName codecName)`

Set the compression codec used by the constructed writer.

- `codecName` - a CompressionCodecName

`public SELF withCodecFactory(CompressionCodecFactory codecFactory)`

Set the codec factory used by the constructed writer.

- `codecFactory` - a CompressionCodecFactory

`public SELF withEncryption(FileEncryptionProperties encryptionProperties)`

Set the file encryption properties used by the constructed writer.

- `encryptionProperties` - a FileEncryptionProperties

`@Deprecated public SELF withRowGroupSize(int rowGroupSize)`

Deprecated. Use `withRowGroupSize(long)` instead.

- `rowGroupSize` - an integer size in bytes

`public SELF withRowGroupSize(long rowGroupSize)`

- `rowGroupSize` - an integer size in bytes

`public SELF withPageSize(int pageSize)`

- `pageSize` - an integer size in bytes

`public SELF withPageRowCountLimit(int rowCount)`

- `rowCount` - limit for the number of rows stored in a page

`public SELF withDictionaryPageSize(int dictionaryPageSize)`

- `dictionaryPageSize` - an integer size in bytes

`public SELF withMaxPaddingSize(int maxPaddingSize)`

- `maxPaddingSize` - an integer size in bytes

`public SELF enableDictionaryEncoding()`
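
Dictionary encoding replaces repeated values with indices into a per-column dictionary. The following is a simplified, hypothetical illustration of the idea; `DictionaryEncodingSketch` is not a Parquet class. In Parquet the dictionary is written as a dictionary page, whose size relates to `withDictionaryPageSize`, the indices are further RLE/bit-packed, and the writer falls back to a non-dictionary encoding when the dictionary grows too large. This sketch omits all of that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified dictionary encoder: each distinct value is stored once in the
// dictionary, and the column data becomes a list of integer indices into it.
final class DictionaryEncodingSketch {
    final List<String> dictionary = new ArrayList<>();
    final List<Integer> indices = new ArrayList<>();
    private final Map<String, Integer> lookup = new HashMap<>();

    void add(String value) {
        Integer id = lookup.get(value);
        if (id == null) {
            id = dictionary.size(); // next free dictionary slot
            dictionary.add(value);
            lookup.put(value, id);
        }
        indices.add(id);
    }

    // Decoding reverses the mapping: row -> index -> value.
    String get(int row) {
        return dictionary.get(indices.get(row));
    }
}
```

Adding `us, de, us, us, fr` yields the dictionary `[us, de, fr]` and the indices `[0, 1, 0, 0, 2]`. The fewer distinct values a column has, the more this saves.
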

`public SELF withDictionaryEncoding(boolean enableDictionary)`

- `enableDictionary` - whether dictionary encoding should be enabled

`public SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)`
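
BYTE_STREAM_SPLIT, which the Parquet format specification defines for floating-point data, scatters byte k of every value into stream k. The sketch below shows the transform and its inverse, assuming 32-bit floats in little-endian byte order and using a hypothetical `ByteStreamSplitSketch` class. The encoding does not shrink the data on its own. It reorders bytes so that similar bytes (for example, sign and exponent bytes) sit together, which tends to help the compression codec.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of BYTE_STREAM_SPLIT for 32-bit floats: with n values there are 4 streams
// of n bytes each, and stream k holds byte k (little-endian) of every value.
final class ByteStreamSplitSketch {
    private ByteStreamSplitSketch() {}

    static byte[] encode(float[] values) {
        int n = values.length;
        ByteBuffer le = ByteBuffer.allocate(n * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            le.putFloat(v);
        }
        byte[] plain = le.array();
        byte[] out = new byte[n * 4];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 4; k++) {
                // Stream k occupies out[k * n] .. out[k * n + n - 1].
                out[k * n + i] = plain[i * 4 + k];
            }
        }
        return out;
    }

    static float[] decode(byte[] encoded) {
        int n = encoded.length / 4;
        ByteBuffer le = ByteBuffer.allocate(n * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 4; k++) {
                // Gather byte k of value i back from stream k.
                le.put(encoded[k * n + i]);
            }
        }
        le.flip();
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            values[i] = le.getFloat();
        }
        return values;
    }
}
```
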

`public SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)`

- `columnPath` - the path of the column (dot-string)
- `enableDictionary` - whether dictionary encoding should be enabled

`public SELF enableValidation()`
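
Several options take a `columnPath` as a dot-string (for example, `address.city` for a nested field) in addition to a builder-wide default. The hypothetical `ColumnSetting` class below sketches the assumed lookup rule: an explicit entry for a column wins, and all other columns use the default. It illustrates those semantics and is not Parquet's internal implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-column setting keyed by dot-string column path. An explicit
// column entry takes precedence; any other column falls back to the default.
final class ColumnSetting<V> {
    private V defaultValue;
    private final Map<String, V> perColumn = new HashMap<>();

    ColumnSetting(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Analogous in role to withDictionaryEncoding(boolean).
    void setDefault(V value) {
        this.defaultValue = value;
    }

    // Analogous in role to withDictionaryEncoding(String, boolean).
    void set(String columnPath, V value) {
        perColumn.put(columnPath, value);
    }

    // Resolves a column given its path segments, e.g. get("address", "city").
    V get(String... path) {
        return perColumn.getOrDefault(String.join(".", path), defaultValue);
    }
}
```
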

`public SELF withValidation(boolean enableValidation)`

- `enableValidation` - whether validation should be enabled

`public SELF withWriterVersion(ParquetProperties.WriterVersion version)`

Set the format version used by the constructed writer.

- `version` - a WriterVersion

`public SELF enablePageWriteChecksum()`
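
The Parquet format specification describes the page checksum as a standard CRC-32 (the same polynomial used by gzip). It is computed over the page bytes that follow the page header, exactly as written to disk. The sketch below computes such a value with `java.util.zip.CRC32`. `PageChecksumSketch` and its methods are hypothetical helpers, not Parquet API.

```java
import java.util.zip.CRC32;

// Sketch: page-level checksum as a standard CRC-32 over the serialized page bytes
// (everything after the page header, compressed if compression is used).
final class PageChecksumSketch {
    private PageChecksumSketch() {}

    static int pageCrc(byte[] pageBytes) {
        CRC32 crc = new CRC32();
        crc.update(pageBytes, 0, pageBytes.length);
        // The page header stores a 32-bit int, so keep the low 32 bits of the value.
        return (int) crc.getValue();
    }

    // A reader recomputes the CRC and compares it with the stored one.
    static boolean verify(byte[] pageBytes, int expectedCrc) {
        return pageCrc(pageBytes) == expectedCrc;
    }
}
```

The standard CRC-32 check value for the ASCII bytes `123456789` is `0xCBF43926`, which makes a convenient sanity test.
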

`public SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)`

- `enablePageWriteChecksum` - whether page checksums should be written out

`public SELF withMaxBloomFilterBytes(int maxBloomFilterBytes)`

- `maxBloomFilterBytes` - the maximum size, in bytes, of a column's Bloom filter bitset

`public SELF withBloomFilterNDV(String columnPath, long ndv)`

- `columnPath` - the path of the column (dot-string)
- `ndv` - the NDV of the column

`public SELF withAdaptiveBloomFilterEnabled(boolean enabled)`

- `enabled` - whether to use an adaptive bloom filter for columns whose NDV is not set

`public SELF withBloomFilterCandidateNumber(String columnPath, int number)`

- `columnPath` - the path of the column (dot-string)
- `number` - the number of candidates

`public SELF withBloomFilterEnabled(boolean enabled)`

- `enabled` - whether to write bloom filters

`public SELF withBloomFilterEnabled(String columnPath, boolean enabled)`

Enables or disables the bloom filter for the specified column. Columns without a column-specific setting use the default set by `withBloomFilterEnabled(boolean)`.

- `columnPath` - the path of the column (dot-string)
- `enabled` - whether to write a bloom filter for the column

`public SELF withMinRowCountForPageSizeCheck(int min)`

- `min` - writes at least `min` rows before invoking a page size check

`public SELF withMaxRowCountForPageSizeCheck(int max)`

- `max` - makes a page size check after `max` rows have been written

`public SELF withColumnIndexTruncateLength(int length)`

- `length` - the length to truncate to

`public SELF withStatisticsTruncateLength(int length)`

- `length` - the length to truncate to

`public SELF withExtraMetaData(Map<String,String> extraMetaData)`

- `extraMetaData` - a Map of additional string key/value metadata entries

`public SELF withAllocator(ByteBufferAllocator allocator)`

- `allocator` - the allocator instance

`public SELF config(String property, String value)`

- `property` - a String property name
- `value` - a String property value

`public ParquetWriter<T> build() throws IOException`

Build a ParquetWriter with the accumulated configuration.

- Returns: a ParquetWriter instance
- Throws: `IOException` - if there is an error while creating the writer

Copyright © 2023 The Apache Software Foundation. All rights reserved.