public static class RewriteOptions.Builder extends Object

| Constructor | Description |
|---|---|
| `Builder(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> inputFiles, org.apache.hadoop.fs.Path outputFile)` | Create a builder for RewriteOptions. |
| `Builder(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inputFile, org.apache.hadoop.fs.Path outputFile)` | Create a builder for RewriteOptions. |
| `Builder(ParquetConfiguration conf, InputFile inputFile, OutputFile outputFile)` | Create a builder for RewriteOptions. |
| `Builder(ParquetConfiguration conf, List<InputFile> inputFiles, OutputFile outputFile)` | Create a builder for RewriteOptions. |

| Modifier and Type | Method | Description |
|---|---|---|
| `RewriteOptions.Builder` | `addInputFile(InputFile inputFile)` | Add an input file to read from. |
| `RewriteOptions.Builder` | `addInputFile(org.apache.hadoop.fs.Path path)` | Add an input file to read from. |
| `RewriteOptions` | `build()` | Build the RewriteOptions. |
| `RewriteOptions.Builder` | `encrypt(List<String> encryptColumns)` | Set the columns to encrypt. |
| `RewriteOptions.Builder` | `encryptionProperties(FileEncryptionProperties fileEncryptionProperties)` | Set the encryption properties to use for the output file. |
| `RewriteOptions.Builder` | `indexCacheStrategy(IndexCache.CacheStrategy cacheStrategy)` | Set the cache strategy for the indexes (ColumnIndex, OffsetIndex and BloomFilter). |
| `RewriteOptions.Builder` | `mask(Map<String,MaskMode> maskColumns)` | Set the columns to mask. |
| `RewriteOptions.Builder` | `prune(List<String> columns)` | Set the columns to prune. |
| `RewriteOptions.Builder` | `transform(CompressionCodecName newCodecName)` | Set the compression codec to use for the output file. |

`public Builder(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inputFile, org.apache.hadoop.fs.Path outputFile)`

Parameters:
- `conf`: configuration for reading from input files and writing to the output file
- `inputFile`: input file path to read from
- `outputFile`: output file path to rewrite to

`public Builder(ParquetConfiguration conf, InputFile inputFile, OutputFile outputFile)`

Parameters:
- `conf`: configuration for reading from input files and writing to the output file
- `inputFile`: input file to read from
- `outputFile`: output file to rewrite to

`public Builder(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> inputFiles, org.apache.hadoop.fs.Path outputFile)`

When merging more than one file, all input files must have the same schema; otherwise, the rewrite fails.
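A minimal sketch of the same-schema precondition, assuming each input's schema is available as a comparable string (for example, its printed message type). `requireSameSchema` is a hypothetical helper, not a Parquet method, and the real rewriter may detect a mismatch differently or at a different point.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, JDK-only illustration: each input file's schema is represented by a
// plain string. This is not Parquet's API.
final class SameSchemaCheck {

  /** Throws IllegalArgumentException if any input schema differs from the first one. */
  static void requireSameSchema(List<String> inputSchemas) {
    // With zero or one input there is nothing to compare, so the loop does not run.
    for (int i = 1; i < inputSchemas.size(); i++) {
      if (!Objects.equals(inputSchemas.get(0), inputSchemas.get(i))) {
        throw new IllegalArgumentException(
            "input " + i + " has a different schema than input 0; cannot merge");
      }
    }
  }

  public static void main(String[] args) {
    String full = "message m { required int64 id; optional binary name (STRING); }";
    String narrow = "message m { required int64 id; }";
    requireSameSchema(List.of(full, full)); // identical schemas: no exception
    try {
      requireSameSchema(List.of(full, narrow));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```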

The rewrite keeps the original row groups from all input files. This may not be optimal when the row groups are very small, and it does not solve the small-file problem. Instead, it can make matters worse by producing a large footer in the output file. TODO: support rewriting by record to split the original row groups into reasonably sized ones.
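The footer growth follows from simple arithmetic. In the Parquet footer, each row group carries one column-chunk metadata entry per leaf column, so keeping every input row group makes the merged footer grow with total row groups times columns. The class, its helpers, and the input numbers below are illustrative only, not Parquet APIs or measurements.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical arithmetic model, not a Parquet API: it counts footer entries under the
// documented behavior that the rewrite keeps every input row group unchanged.
final class MergedFooterSize {

  /** Row groups in the output = sum of row groups in all inputs (none are combined). */
  static int outputRowGroups(List<Integer> rowGroupsPerInput) {
    int total = 0;
    for (int n : rowGroupsPerInput) {
      total += n;
    }
    return total;
  }

  /** The footer holds one column-chunk metadata entry per leaf column per row group. */
  static long columnChunkEntries(int rowGroups, int columns) {
    return (long) rowGroups * columns;
  }

  public static void main(String[] args) {
    // Illustrative input: 100 small files, each with a single row group, and 50 columns.
    List<Integer> inputs = Collections.nCopies(100, 1);
    int rowGroups = outputRowGroups(inputs);
    System.out.println(rowGroups + " row groups, "
        + columnChunkEntries(rowGroups, 50) + " column-chunk entries in the footer");
  }
}
```

With these numbers, `main` prints `100 row groups, 5000 column-chunk entries in the footer`: the data is merged into one file, but none of the per-row-group metadata goes away.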

Parameters:
- `conf`: configuration for reading from input files and writing to the output file
- `inputFiles`: list of input file paths to read from
- `outputFile`: output file path to rewrite to

`public Builder(ParquetConfiguration conf, List<InputFile> inputFiles, OutputFile outputFile)`

When merging more than one file, all input files must have the same schema; otherwise, the rewrite fails.

The rewrite keeps the original row groups from all input files. This may not be optimal when the row groups are very small, and it does not solve the small-file problem. Instead, it can make matters worse by producing a large footer in the output file. TODO: support rewriting by record to split the original row groups into reasonably sized ones.

Parameters:
- `conf`: configuration for reading from input files and writing to the output file
- `inputFiles`: list of input files to read from
- `outputFile`: output file to rewrite to

`public RewriteOptions.Builder prune(List<String> columns)`

By default, all columns are kept.
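The following is a hypothetical model of the prune setting that uses flat column names. With no prune list, every column survives; otherwise, the listed columns are dropped. `keptColumns` is illustrative only, and how Parquet matches nested column paths is not covered here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the prune(...) setting, not Parquet code. Columns are flat
// names for simplicity; a null prune list means prune(...) was never called.
final class PruneSketch {

  static List<String> keptColumns(List<String> inputColumns, List<String> pruneColumns) {
    if (pruneColumns == null) {
      return new ArrayList<>(inputColumns); // default: every column is kept
    }
    Set<String> pruned = new HashSet<>(pruneColumns);
    List<String> kept = new ArrayList<>();
    for (String column : inputColumns) {
      if (!pruned.contains(column)) {
        kept.add(column); // surviving columns keep their input order
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> columns = List.of("id", "name", "email");
    System.out.println(keptColumns(columns, null));             // [id, name, email]
    System.out.println(keptColumns(columns, List.of("email"))); // [id, name]
  }
}
```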

Parameters:
- `columns`: list of columns to prune

`public RewriteOptions.Builder transform(CompressionCodecName newCodecName)`

By default, the output uses the same codec as the input file.
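The default behaves like a fallback. In this hypothetical sketch, codec names are plain strings standing in for CompressionCodecName constants (SNAPPY and ZSTD are real constants of that enum), and an empty Optional means transform(...) was never called. `outputCodec` is not a Parquet method.

```java
import java.util.Optional;

// Hypothetical model, not Parquet code: if no new codec was set, the input file's
// codec is reused for the output.
final class CodecChoice {

  static String outputCodec(String inputCodec, Optional<String> newCodec) {
    return newCodec.orElse(inputCodec);
  }

  public static void main(String[] args) {
    System.out.println(outputCodec("SNAPPY", Optional.empty()));    // SNAPPY
    System.out.println(outputCodec("SNAPPY", Optional.of("ZSTD"))); // ZSTD
  }
}
```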

Parameters:
- `newCodecName`: compression codec to use

`public RewriteOptions.Builder mask(Map<String,MaskMode> maskColumns)`

By default, no columns are masked.
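The mask map pairs each column name with a masking mode. The sketch below is a hypothetical, JDK-only model. Its single `Mode.NULLIFY` constant stands in for a MaskMode value that replaces a column's values with null. This page does not specify which modes MaskMode offers or which of them the rewriter supports.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a mask map (column name -> masking mode), not Parquet code.
// Mode.NULLIFY is an assumed stand-in for a MaskMode that replaces values with null.
final class MaskSketch {

  enum Mode { NULLIFY }

  /** Applies the mask map to one row; columns not in the map pass through unchanged. */
  static Map<String, Object> applyMask(Map<String, Object> row, Map<String, Mode> maskColumns) {
    Map<String, Object> out = new HashMap<>(row);
    for (Map.Entry<String, Mode> e : maskColumns.entrySet()) {
      if (e.getValue() == Mode.NULLIFY && out.containsKey(e.getKey())) {
        out.put(e.getKey(), null); // the column stays, its value is nulled
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("id", 42L);
    row.put("ssn", "123-45-6789");
    System.out.println(applyMask(row, Map.of("ssn", Mode.NULLIFY)).get("ssn")); // null
    System.out.println(applyMask(row, Map.of()).equals(row)); // true: default masks nothing
  }
}
```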

Parameters:
- `maskColumns`: map from each column to mask to its masking mode

`public RewriteOptions.Builder encrypt(List<String> encryptColumns)`

By default, no columns are encrypted.

Parameters:
- `encryptColumns`: list of columns to encrypt

`public RewriteOptions.Builder encryptionProperties(FileEncryptionProperties fileEncryptionProperties)`

These properties are required if the list of columns to encrypt is not empty.
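The requirement above can be modeled as a small check. This is a hypothetical sketch. `validate` is not a Parquet method, and `Object` stands in for FileEncryptionProperties. This page does not say at which point the real builder enforces the rule.

```java
import java.util.List;

// Hypothetical check mirroring the documented rule, not Parquet code: encryption
// properties are required whenever the list of columns to encrypt is non-empty.
final class EncryptionRule {

  static void validate(List<String> encryptColumns, Object fileEncryptionProperties) {
    boolean encrypting = encryptColumns != null && !encryptColumns.isEmpty();
    if (encrypting && fileEncryptionProperties == null) {
      throw new IllegalArgumentException(
          "encryption properties are required when columns are selected for encryption");
    }
  }

  public static void main(String[] args) {
    validate(null, null);                   // default: nothing encrypted, nothing required
    validate(List.of("ssn"), new Object()); // OK: properties supplied
    try {
      validate(List.of("ssn"), null);       // columns selected but no properties
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```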

Parameters:
- `fileEncryptionProperties`: encryption properties to use

`public RewriteOptions.Builder addInputFile(org.apache.hadoop.fs.Path path)`

Parameters:
- `path`: input file path to read from

`public RewriteOptions.Builder addInputFile(InputFile inputFile)`

Parameters:
- `inputFile`: input file to read from

`public RewriteOptions.Builder indexCacheStrategy(IndexCache.CacheStrategy cacheStrategy)`

The PREFETCH_BLOCK strategy can reduce random seeks during the rewrite. The default is NONE.
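A toy model can show why prefetching helps. It only counts jumps in the read position. For illustration, it assumes that index data sits in a region separate from the column data. It also assumes that NONE fetches each column's index right after that column's data, while PREFETCH_BLOCK fetches all of a block's indexes first. Both orders are assumptions, not a description of how IndexCache is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Parquet's IndexCache: it only counts how often the read position jumps.
// Positions 0..n-1 hold column data and INDEX_BASE..INDEX_BASE+n-1 hold the matching
// index data (an illustrative layout assumption).
final class IndexPrefetchModel {
  static final int INDEX_BASE = 1000;

  /** Counts reads whose position is not exactly one past the previous read. */
  static int countSeeks(List<Integer> readPositions) {
    int seeks = 0;
    int previous = -1; // so that a first read at position 0 counts as sequential
    for (int pos : readPositions) {
      if (pos != previous + 1) {
        seeks++;
      }
      previous = pos;
    }
    return seeks;
  }

  /** Assumed NONE-like order: each column's index is fetched right after its data. */
  static List<Integer> onDemand(int columns) {
    List<Integer> reads = new ArrayList<>();
    for (int c = 0; c < columns; c++) {
      reads.add(c);
      reads.add(INDEX_BASE + c);
    }
    return reads;
  }

  /** Assumed PREFETCH_BLOCK-like order: all indexes of the block first, then the data. */
  static List<Integer> prefetched(int columns) {
    List<Integer> reads = new ArrayList<>();
    for (int c = 0; c < columns; c++) {
      reads.add(INDEX_BASE + c);
    }
    for (int c = 0; c < columns; c++) {
      reads.add(c);
    }
    return reads;
  }

  public static void main(String[] args) {
    System.out.println("on demand:  " + countSeeks(onDemand(10)) + " seeks");   // 19
    System.out.println("prefetched: " + countSeeks(prefetched(10)) + " seeks"); // 2
  }
}
```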

Parameters:
- `cacheStrategy`: the index cache strategy; supported values are `IndexCache.CacheStrategy.NONE` and `IndexCache.CacheStrategy.PREFETCH_BLOCK`

`public RewriteOptions build()`
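build() ends a chain of the builder methods listed above. The JDK-only stand-in below mirrors that fluent shape and the documented defaults: all columns kept, the input codec reused, nothing masked, and the NONE cache strategy. It is hypothetical. Types are simplified to strings, the encryption setters are omitted, and it performs none of the validation the real build() may do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, JDK-only stand-in that mirrors the shape and documented defaults of
// RewriteOptions.Builder. Files, codecs and mask modes are plain strings, and the
// encryption setters are left out; this is not the Parquet class.
final class RewriteOptionsSketch {
  enum CacheStrategy { NONE, PREFETCH_BLOCK }

  final List<String> inputFiles;
  final String outputFile;
  final List<String> pruneColumns;        // empty: keep all columns
  final String newCodec;                  // null: keep the input file's codec
  final Map<String, String> maskColumns;  // empty: mask nothing
  final CacheStrategy cacheStrategy;      // NONE unless set

  private RewriteOptionsSketch(Builder b) {
    this.inputFiles = Collections.unmodifiableList(new ArrayList<>(b.inputFiles));
    this.outputFile = b.outputFile;
    this.pruneColumns = b.pruneColumns;
    this.newCodec = b.newCodec;
    this.maskColumns = b.maskColumns;
    this.cacheStrategy = b.cacheStrategy;
  }

  static final class Builder {
    private final List<String> inputFiles = new ArrayList<>();
    private final String outputFile;
    private List<String> pruneColumns = Collections.emptyList();
    private String newCodec = null;
    private Map<String, String> maskColumns = Collections.emptyMap();
    private CacheStrategy cacheStrategy = CacheStrategy.NONE;

    Builder(List<String> inputFiles, String outputFile) {
      this.inputFiles.addAll(inputFiles);
      this.outputFile = outputFile;
    }

    // Each setter records its value and returns this builder so calls can be chained.
    Builder addInputFile(String path) { inputFiles.add(path); return this; }
    Builder prune(List<String> columns) { pruneColumns = columns; return this; }
    Builder transform(String codec) { newCodec = codec; return this; }
    Builder mask(Map<String, String> columns) { maskColumns = columns; return this; }
    Builder indexCacheStrategy(CacheStrategy strategy) { cacheStrategy = strategy; return this; }

    RewriteOptionsSketch build() { return new RewriteOptionsSketch(this); }
  }

  public static void main(String[] args) {
    RewriteOptionsSketch opts = new Builder(List.of("part-0.parquet"), "merged.parquet")
        .addInputFile("part-1.parquet")
        .prune(List.of("email"))
        .transform("ZSTD")
        .indexCacheStrategy(CacheStrategy.PREFETCH_BLOCK)
        .build();
    System.out.println(opts.inputFiles + " -> " + opts.outputFile + ", codec " + opts.newCodec);
  }
}
```

Running `main` prints `[part-0.parquet, part-1.parquet] -> merged.parquet, codec ZSTD`. Keeping every setting in the builder and copying the input list in build() means later calls on the same builder do not change an already-built options object's input list.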
Copyright © 2023 The Apache Software Foundation. All rights reserved.