public class ParquetUtils extends FileFormatUtils
| Constructor and Description |
|---|
| ParquetUtils() |
| Modifier and Type | Method and Description |
|---|---|
| List<HoodieKey> | fetchHoodieKeys(HoodieStorage storage, StoragePath filePath) - Fetch HoodieKeys from the given parquet file. |
| List<HoodieKey> | fetchHoodieKeys(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt) - Fetch HoodieKeys from the given parquet file. |
| Set<String> | filterRowKeys(HoodieStorage storage, StoragePath filePath, Set<String> filter) - Read the rowKey list matching the given filter, from the given parquet file. |
| static org.apache.parquet.hadoop.metadata.CompressionCodecName | getCompressionCodecName(String codecName) |
| HoodieFileFormat | getFormat() |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath) |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt) - Returns a closable iterator for reading the given parquet file. |
| long | getRowCount(HoodieStorage storage, StoragePath filePath) - Returns the number of records in the parquet file. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(HoodieStorage storage, StoragePath filePath) - NOTE: This literally reads the entire file contents, thus should be used with caution. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(HoodieStorage storage, StoragePath filePath, org.apache.avro.Schema schema) |
| org.apache.avro.Schema | readAvroSchema(HoodieStorage storage, StoragePath filePath) |
| List<HoodieColumnRangeMetadata<Comparable>> | readColumnStatsFromMetadata(HoodieStorage storage, StoragePath filePath, List<String> columnList) |
| Map<String,String> | readFooter(HoodieStorage storage, boolean required, StoragePath filePath, String... footerNames) |
| static org.apache.parquet.hadoop.metadata.ParquetMetadata | readMetadata(HoodieStorage storage, StoragePath parquetFilePath) |
| org.apache.parquet.schema.MessageType | readSchema(HoodieStorage storage, StoragePath parquetFilePath) - Get the schema of the given parquet file. |
| byte[] | serializeRecordsToLogBlock(HoodieStorage storage, List<HoodieRecord> records, org.apache.avro.Schema writerSchema, org.apache.avro.Schema readerSchema, String keyFieldName, Map<String,String> paramsMap) |
| void | writeMetaFile(HoodieStorage storage, StoragePath filePath, Properties props) |
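As a usage sketch for the methods summarized above, assuming a Hudi runtime on the classpath; the storage handle, the file path, and the import paths follow the usual Hudi package layout but are assumptions, not part of this page:

```java
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;
import org.apache.parquet.schema.MessageType;

public class ParquetUtilsSketch {
    // Inspect a parquet file's schema and record count using footer
    // metadata only, without scanning the data pages.
    static void inspect(HoodieStorage storage, StoragePath filePath) {
        ParquetUtils parquetUtils = new ParquetUtils();
        MessageType schema = parquetUtils.readSchema(storage, filePath);
        long rows = parquetUtils.getRowCount(storage, filePath);
        System.out.println("schema=" + schema + ", rows=" + rows);
    }
}
```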
Methods inherited from class FileFormatUtils: readBloomFilterFromMetadata, readMinMaxRecordKeys, readRowKeys

public Set<String> filterRowKeys(HoodieStorage storage, StoragePath filePath, Set<String> filter)
Read the rowKey list matching the given filter, from the given parquet file.
Overrides: filterRowKeys in class FileFormatUtils
Parameters: storage - HoodieStorage instance. filePath - The parquet file path. filter - record keys filter

public static org.apache.parquet.hadoop.metadata.ParquetMetadata readMetadata(HoodieStorage storage, StoragePath parquetFilePath)
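A sketch of a key-presence probe with filterRowKeys; the candidate keys and the storage/path arguments are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;

public class FilterRowKeysSketch {
    // Returns only the candidate record keys that are actually present
    // in the given parquet file; absent candidates are dropped.
    static Set<String> presentKeys(HoodieStorage storage, StoragePath filePath) {
        Set<String> candidates = new HashSet<>();
        candidates.add("key-001"); // hypothetical record keys
        candidates.add("key-002");
        return new ParquetUtils().filterRowKeys(storage, filePath, candidates);
    }
}
```

Because only the record-key column needs to be read, this shape of call is presumably cheaper than materializing full records when all you need is key membership.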
public static org.apache.parquet.hadoop.metadata.CompressionCodecName getCompressionCodecName(String codecName)
Parameters: codecName - codec name in String.
Returns: CompressionCodecName Enum.

public List<HoodieKey> fetchHoodieKeys(HoodieStorage storage, StoragePath filePath)
Fetch HoodieKeys from the given parquet file.
Overrides: fetchHoodieKeys in class FileFormatUtils
Parameters: storage - HoodieStorage instance. filePath - The parquet file path.
Returns: List of HoodieKeys fetched from the parquet file

public ClosableIterator<HoodieKey> getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath)
Overrides: getHoodieKeyIterator in class FileFormatUtils
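A sketch of materializing all keys from one file with fetchHoodieKeys; the getRecordKey()/getPartitionPath() accessors on HoodieKey are assumed from Hudi's model classes:

```java
import java.util.List;
import org.apache.hudi.common.model.HoodieKey;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;

public class FetchHoodieKeysSketch {
    // Reads every (recordKey, partitionPath) pair in the file into memory.
    // For very large files, prefer getHoodieKeyIterator to stream instead.
    static void printKeys(HoodieStorage storage, StoragePath filePath) {
        List<HoodieKey> keys = new ParquetUtils().fetchHoodieKeys(storage, filePath);
        for (HoodieKey key : keys) {
            System.out.println(key.getPartitionPath() + "/" + key.getRecordKey());
        }
    }
}
```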
public ClosableIterator<HoodieKey> getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt)
Returns a closable iterator for reading the given parquet file.
Overrides: getHoodieKeyIterator in class FileFormatUtils
Parameters: storage - HoodieStorage instance. filePath - The parquet file path. keyGeneratorOpt - instance of KeyGenerator.
Returns: ClosableIterator of HoodieKeys for reading the parquet file

public List<HoodieKey> fetchHoodieKeys(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt)
Fetch HoodieKeys from the given parquet file.
Overrides: fetchHoodieKeys in class FileFormatUtils
Parameters: storage - HoodieStorage instance. filePath - The parquet file path. keyGeneratorOpt - instance of KeyGenerator.
Returns: List of HoodieKeys fetched from the parquet file

public org.apache.parquet.schema.MessageType readSchema(HoodieStorage storage, StoragePath parquetFilePath)
Get the schema of the given parquet file.
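A streaming alternative via getHoodieKeyIterator; this sketch assumes ClosableIterator is AutoCloseable (so try-with-resources applies) and that passing an empty Option falls back to the file's populated meta fields rather than a custom key generator:

```java
import org.apache.hudi.common.model.HoodieKey;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.common.util.collection.ClosableIterator;
import org.apache.hudi.keygen.BaseKeyGenerator;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;

public class KeyIteratorSketch {
    // Streams keys one at a time instead of materializing the whole list.
    static long countKeys(HoodieStorage storage, StoragePath filePath) {
        Option<BaseKeyGenerator> keyGenOpt = Option.empty(); // no custom key generator
        long count = 0;
        try (ClosableIterator<HoodieKey> it =
                 new ParquetUtils().getHoodieKeyIterator(storage, filePath, keyGenOpt)) {
            while (it.hasNext()) {
                it.next();
                count++;
            }
        }
        return count;
    }
}
```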
public Map<String,String> readFooter(HoodieStorage storage, boolean required, StoragePath filePath, String... footerNames)
Overrides: readFooter in class FileFormatUtils

public org.apache.avro.Schema readAvroSchema(HoodieStorage storage, StoragePath filePath)
Overrides: readAvroSchema in class FileFormatUtils

public List<HoodieColumnRangeMetadata<Comparable>> readColumnStatsFromMetadata(HoodieStorage storage, StoragePath filePath, List<String> columnList)
Overrides: readColumnStatsFromMetadata in class FileFormatUtils
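A sketch combining the footer-oriented readers; the footer name and column list here are illustrative placeholders, not keys defined by this class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.hudi.common.model.HoodieColumnRangeMetadata;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;

public class FooterSketch {
    // Pulls named footer entries and per-column range stats from parquet
    // metadata without reading any data pages.
    static void readFooters(HoodieStorage storage, StoragePath filePath) {
        ParquetUtils utils = new ParquetUtils();
        // required=false: assumed to tolerate a missing footer name
        // instead of failing.
        Map<String, String> footers =
            utils.readFooter(storage, false, filePath, "hoodie_min_record_key"); // illustrative footer name
        List<HoodieColumnRangeMetadata<Comparable>> stats =
            utils.readColumnStatsFromMetadata(storage, filePath, Arrays.asList("ts", "price")); // hypothetical columns
        System.out.println(footers + " " + stats);
    }
}
```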
public HoodieFileFormat getFormat()
Overrides: getFormat in class FileFormatUtils

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(HoodieStorage storage, StoragePath filePath)
NOTE: This literally reads the entire file contents, thus should be used with caution.
Overrides: readAvroRecords in class FileFormatUtils

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(HoodieStorage storage, StoragePath filePath, org.apache.avro.Schema schema)
Overrides: readAvroRecords in class FileFormatUtils

public long getRowCount(HoodieStorage storage, StoragePath filePath)
Returns the number of records in the parquet file.
Overrides: getRowCount in class FileFormatUtils
Parameters: storage - HoodieStorage instance. filePath - path of the file
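Given the caution on readAvroRecords above, a sketch that gates the full read behind getRowCount; the threshold is arbitrary:

```java
import java.util.Collections;
import java.util.List;
import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.common.util.ParquetUtils;
import org.apache.hudi.storage.HoodieStorage;
import org.apache.hudi.storage.StoragePath;

public class ReadRecordsSketch {
    // readAvroRecords loads the entire file into memory, so check the
    // footer-derived row count first and skip oversized files.
    static List<GenericRecord> readSmallFile(HoodieStorage storage, StoragePath filePath) {
        ParquetUtils utils = new ParquetUtils();
        long rows = utils.getRowCount(storage, filePath);
        if (rows > 100_000) { // arbitrary safety threshold
            return Collections.emptyList();
        }
        return utils.readAvroRecords(storage, filePath);
    }
}
```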
public void writeMetaFile(HoodieStorage storage, StoragePath filePath, Properties props) throws IOException
Overrides: writeMetaFile in class FileFormatUtils
Throws: IOException

public byte[] serializeRecordsToLogBlock(HoodieStorage storage, List<HoodieRecord> records, org.apache.avro.Schema writerSchema, org.apache.avro.Schema readerSchema, String keyFieldName, Map<String,String> paramsMap) throws IOException
Overrides: serializeRecordsToLogBlock in class FileFormatUtils
Throws: IOException

Copyright © 2024 The Apache Software Foundation. All rights reserved.