public class ParquetUtils extends BaseFileUtils
| Constructor and Description |
|---|
| ParquetUtils() |
| Modifier and Type | Method and Description |
|---|---|
| List<HoodieKey> | fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)<br>Fetch HoodieKeys from the given parquet file. |
| List<HoodieKey> | fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)<br>Fetch HoodieKeys from the given parquet file. |
| Set<String> | filterRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Set<String> filter)<br>Read the rowKey list matching the given filter, from the given parquet file. |
| HoodieFileFormat | getFormat() |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)<br>Provides a closable iterator for reading the given data file. |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)<br>Returns a closable iterator for reading the given parquet file. |
| long | getRowCount(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parquetFilePath)<br>Returns the number of records in the parquet file. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)<br>NOTE: This literally reads the entire file contents, thus should be used with caution. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema schema)<br>Read the data file using the given schema. NOTE: This literally reads the entire file contents, thus should be used with caution. |
| org.apache.avro.Schema | readAvroSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path parquetFilePath)<br>Read the Avro schema of the data file. |
| Map<String,String> | readFooter(org.apache.hadoop.conf.Configuration configuration, boolean required, org.apache.hadoop.fs.Path parquetFilePath, String... footerNames)<br>Read the footer data of the given data file. |
| static org.apache.parquet.hadoop.metadata.ParquetMetadata | readMetadata(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parquetFilePath) |
| List<HoodieColumnRangeMetadata<Comparable>> | readRangeFromParquetMetadata(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parquetFilePath, List<String> cols)<br>Parse min/max statistics stored in parquet footers for all columns. |
| org.apache.parquet.schema.MessageType | readSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path parquetFilePath)<br>Get the schema of the given parquet file. |
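The required flag and the footerNames varargs of readFooter may be easier to picture with a small model. The sketch below is not Hudi code. It uses a plain Map as a stand-in for a file's footer key/value metadata, and it assumes that a missing entry is an error when required is true and is skipped otherwise. The names FooterLookupSketch and selectFooterEntries, the example keys, and the exception type are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: models a footer lookup with a plain Map, not the Hudi API.
final class FooterLookupSketch {

    /**
     * Returns the requested entries from a footer key/value map.
     * Assumption: when required is true a missing name is an error;
     * when it is false the missing name is left out of the result.
     */
    static Map<String, String> selectFooterEntries(Map<String, String> footer,
                                                   boolean required,
                                                   String... footerNames) {
        Map<String, String> selected = new HashMap<>();
        for (String name : footerNames) {
            if (footer.containsKey(name)) {
                selected.put(name, footer.get(name));
            } else if (required) {
                throw new IllegalStateException("Footer entry not found: " + name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> footer = new HashMap<>();
        footer.put("example.min", "a"); // illustrative key names only
        footer.put("example.max", "z");

        // Optional lookup: the unknown name is skipped rather than failing.
        System.out.println(selectFooterEntries(footer, false, "example.min", "example.missing"));
    }
}
```

Under this model, a caller that cannot proceed without the footer entries would pass required as true and let the failure surface early.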
Methods inherited from class BaseFileUtils: getInstance, getInstance, getInstance, readBloomFilterFromMetadata, readMinMaxRecordKeys, readRowKeys

public Set<String> filterRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Set<String> filter)
Read the rowKey list matching the given filter, from the given parquet file.
Specified by: filterRowKeys in class BaseFileUtils
Parameters:
filePath - The parquet file path.
configuration - configuration to build fs object
filter - record keys filter

public static org.apache.parquet.hadoop.metadata.ParquetMetadata readMetadata(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parquetFilePath)
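To illustrate what filterRowKeys returns, here is a minimal sketch of the filtering step on its own. It is not the Hudi implementation. The "file" is an in-memory list of record keys, and the class and method names (FilterRowKeysSketch, filterKeys) are hypothetical. The handling of an empty filter, which here returns every key, is an assumption and should be checked against the BaseFileUtils documentation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: the "file" is an in-memory list of record keys.
final class FilterRowKeysSketch {

    /**
     * Returns the keys from fileKeys that also appear in filter.
     * Assumption: an empty filter means "no filtering", so every key is returned.
     */
    static Set<String> filterKeys(List<String> fileKeys, Set<String> filter) {
        Set<String> matched = new HashSet<>();
        for (String key : fileKeys) {
            if (filter.isEmpty() || filter.contains(key)) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> fileKeys = Arrays.asList("key1", "key2", "key3");
        Set<String> candidates = new HashSet<>(Arrays.asList("key2", "key9"));

        // Only candidates that exist in the file survive the filter.
        System.out.println(filterKeys(fileKeys, candidates));
        // With an empty filter, this sketch returns all keys in the file.
        System.out.println(filterKeys(fileKeys, Collections.<String>emptySet()));
    }
}
```

The result is a Set, so the order in which keys appear in the file is not preserved.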
public List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)
Fetch HoodieKeys from the given parquet file.
Specified by: fetchHoodieKeys in class BaseFileUtils
Parameters:
filePath - The parquet file path.
configuration - configuration to build fs object
Returns: List of HoodieKeys fetched from the parquet file

public ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)
Description copied from class: BaseFileUtils
Provides a closable iterator for reading the given data file.
Specified by: getHoodieKeyIterator in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The data file path
Returns: ClosableIterator of HoodieKeys for reading the file

public ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)
Returns a closable iterator for reading the given parquet file.
Specified by: getHoodieKeyIterator in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The parquet file path
keyGeneratorOpt - instance of KeyGenerator
Returns: ClosableIterator of HoodieKeys for reading the parquet file

public List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)
Fetch HoodieKeys from the given parquet file.
Specified by: fetchHoodieKeys in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The parquet file path.
keyGeneratorOpt - instance of KeyGenerator.
Returns: List of HoodieKeys fetched from the parquet file

public org.apache.parquet.schema.MessageType readSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path parquetFilePath)
Get the schema of the given parquet file.
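Because getHoodieKeyIterator hands back a closable iterator, the caller is responsible for closing it. The sketch below shows one way to do that with try-with-resources. It does not use the Hudi classes: the nested ClosableIterator interface is a hypothetical stand-in that only assumes the real type is an Iterator that can also be closed, and ListBackedIterator and drain are illustrative names. Treating the list-returning fetchHoodieKeys as "drain the iterator into a list" is an assumption about how the two methods relate, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ClosableIteratorSketch {

    /** Hypothetical stand-in for a closable iterator: an Iterator that can be closed. */
    interface ClosableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so callers need no checked-exception handling
    }

    /** Wraps an in-memory list and records whether close() was called. */
    static final class ListBackedIterator<T> implements ClosableIterator<T> {
        private final Iterator<T> delegate;
        private boolean closed = false;

        ListBackedIterator(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Drains the iterator into a list, closing it even if iteration fails. */
    static <T> List<T> drain(ClosableIterator<T> iterator) {
        List<T> result = new ArrayList<>();
        try (ClosableIterator<T> it = iterator) {
            while (it.hasNext()) {
                result.add(it.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ListBackedIterator<String> keys = new ListBackedIterator<>(Arrays.asList("key1", "key2"));
        System.out.println(drain(keys) + " closed=" + keys.isClosed());
    }
}
```

The iterator form suits large files, since keys can be processed one at a time instead of being held in a list.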
public Map<String,String> readFooter(org.apache.hadoop.conf.Configuration configuration, boolean required, org.apache.hadoop.fs.Path parquetFilePath, String... footerNames)
Description copied from class: BaseFileUtils
Read the footer data of the given data file.
Specified by: readFooter in class BaseFileUtils
Parameters:
configuration - Configuration
required - require the footer data to be in data file
parquetFilePath - The data file path
footerNames - The footer names to read

public org.apache.avro.Schema readAvroSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path parquetFilePath)
Description copied from class: BaseFileUtils
Read the Avro schema of the data file.
Specified by: readAvroSchema in class BaseFileUtils
Parameters:
configuration - Configuration
parquetFilePath - The data file path

public HoodieFileFormat getFormat()
Specified by: getFormat in class BaseFileUtils
Returns: HoodieFileFormat.

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)
NOTE: This literally reads the entire file contents, thus should be used with caution.
Specified by: readAvroRecords in class BaseFileUtils
Parameters:
configuration - Configuration
filePath - The data file path

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema schema)
Description copied from class: BaseFileUtils
Read the data file using the given schema. NOTE: This literally reads the entire file contents, thus should be used with caution.
Specified by: readAvroRecords in class BaseFileUtils
Parameters:
configuration - Configuration
filePath - The data file path

public long getRowCount(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parquetFilePath)
Returns the number of records in the parquet file.
Specified by: getRowCount in class BaseFileUtils
Parameters:
conf - Configuration
parquetFilePath - path of the file

public List<HoodieColumnRangeMetadata<Comparable>> readRangeFromParquetMetadata(@Nonnull org.apache.hadoop.conf.Configuration conf, @Nonnull org.apache.hadoop.fs.Path parquetFilePath, @Nonnull List<String> cols)
Parse min/max statistics stored in parquet footers for all columns.
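Parquet footers record min/max statistics per column within each row group, so producing one range per column for a whole file means folding several ranges together. The sketch below illustrates that folding step only. It is a simplified model, not the Hudi implementation: ColumnRange and mergeRanges are hypothetical names, values are restricted to long for brevity, and the cols list is assumed to select which columns to report.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnRangeSketch {

    /** Hypothetical holder for one column's min/max within one row group (or a whole file). */
    static final class ColumnRange {
        final String column;
        final long min;
        final long max;

        ColumnRange(String column, long min, long max) {
            this.column = column;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Folds per-row-group ranges into one range per requested column:
     * the smallest min and the largest max seen for that column.
     */
    static Map<String, ColumnRange> mergeRanges(List<ColumnRange> perRowGroup, List<String> cols) {
        Map<String, ColumnRange> merged = new LinkedHashMap<>();
        for (ColumnRange range : perRowGroup) {
            if (!cols.contains(range.column)) {
                continue; // column was not requested
            }
            ColumnRange current = merged.get(range.column);
            if (current == null) {
                merged.put(range.column, range);
            } else {
                merged.put(range.column, new ColumnRange(range.column,
                        Math.min(current.min, range.min),
                        Math.max(current.max, range.max)));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<ColumnRange> stats = Arrays.asList(
                new ColumnRange("ts", 10L, 50L),  // row group 1
                new ColumnRange("ts", 5L, 40L),   // row group 2
                new ColumnRange("id", 1L, 9L));
        ColumnRange ts = mergeRanges(stats, Arrays.asList("ts")).get("ts");
        System.out.println(ts.column + " min=" + ts.min + " max=" + ts.max);
    }
}
```

Ranges merged this way are useful for skipping files whose min/max cannot match a query predicate, which is why they can be read from the footer without scanning the data pages.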
Copyright © 2022 The Apache Software Foundation. All rights reserved.