public class OrcUtils extends BaseFileUtils
| Constructor and Description |
|---|
| `OrcUtils()` |

| Modifier and Type | Method and Description |
|---|---|
| `List<HoodieKey>` | `fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)` Fetch `HoodieKey`s from the given ORC file. |
| `List<HoodieKey>` | `fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)` Fetch `HoodieKey`s from the given data file. |
| `Set<String>` | `filterRowKeys(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath, Set<String> filter)` Read the rowKey list matching the given filter, from the given ORC file. |
| `HoodieFileFormat` | `getFormat()` |
| `ClosableIterator<HoodieKey>` | `getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)` Provides a closable iterator for reading the given ORC file. |
| `ClosableIterator<HoodieKey>` | `getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)` Provides a closable iterator for reading the given data file. |
| `long` | `getRowCount(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path orcFilePath)` Returns the number of records in the data file. |
| `List<org.apache.avro.generic.GenericRecord>` | `readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)` NOTE: This literally reads the entire file contents, thus should be used with caution. |
| `List<org.apache.avro.generic.GenericRecord>` | `readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema avroSchema)` NOTE: This literally reads the entire file contents, thus should be used with caution. |
| `org.apache.avro.Schema` | `readAvroSchema(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path orcFilePath)` Read the Avro schema of the data file. |
| `Map<String,String>` | `readFooter(org.apache.hadoop.conf.Configuration conf, boolean required, org.apache.hadoop.fs.Path orcFilePath, String... footerNames)` Read the footer data of the given data file. |
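
The method summary offers both list-returning calls (`fetchHoodieKeys`, `readAvroRecords`) and closable-iterator variants (`getHoodieKeyIterator`). The following is a minimal, self-contained sketch of how a closable iterator is typically consumed with try-with-resources, so that the underlying reader is released even when iteration stops early. It does not call Hudi itself: the nested `ClosableIterator` interface is a stand-in that assumes Hudi's type is an `Iterator` that is also `AutoCloseable`, and `ListBackedIterator` and `takeKeys` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ClosableIteratorSketch {

    /** Stand-in for Hudi's ClosableIterator (assumption: an Iterator that is also AutoCloseable). */
    interface ClosableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // declared without "throws" so try-with-resources needs no catch block
    }

    /** Hypothetical in-memory source that records whether close() was called. */
    static final class ListBackedIterator<T> implements ClosableIterator<T> {
        private final Iterator<T> delegate;
        private boolean closed = false;

        ListBackedIterator(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override public boolean hasNext() { return !closed && delegate.hasNext(); }
        @Override public T next() { return delegate.next(); }
        @Override public void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Streams keys one at a time and stops once `limit` keys have been collected. */
    static List<String> takeKeys(ClosableIterator<String> keys, int limit) {
        List<String> out = new ArrayList<>();
        // try-with-resources closes the iterator on normal exit, early exit, or exception.
        try (ClosableIterator<String> it = keys) {
            while (it.hasNext() && out.size() < limit) {
                out.add(it.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ListBackedIterator<String> source =
            new ListBackedIterator<>(Arrays.asList("key-1", "key-2", "key-3"));
        List<String> firstTwo = takeKeys(source, 2);
        // prints "[key-1, key-2] closed=true"
        System.out.println(firstTwo + " closed=" + source.isClosed());
    }
}
```

With a real `OrcUtils` instance, the iterator returned by `getHoodieKeyIterator` would take the place of `ListBackedIterator`. The list-returning methods hold every element in memory at once, which is the concern raised in the `readAvroRecords` note.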

Methods inherited from class BaseFileUtils: getInstance, getInstance, getInstance, readBloomFilterFromMetadata, readMinMaxRecordKeys, readRowKeys

`public ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)`
Provides a closable iterator for reading the given ORC file.
Implements or overrides: getHoodieKeyIterator in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The ORC file path
Returns: ClosableIterator of HoodieKeys for reading the ORC file

`public List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)`
Fetch HoodieKeys from the given ORC file.
Implements or overrides: fetchHoodieKeys in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The ORC file path.
Returns: List of HoodieKeys fetched from the ORC file

`public List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)`
Description copied from class: BaseFileUtils
Fetch HoodieKeys from the given data file.
Implements or overrides: fetchHoodieKeys in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The data file path
keyGeneratorOpt - instance of KeyGenerator.
Returns: List of HoodieKeys fetched from the data file

`public ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)`
Description copied from class: BaseFileUtils
Provides a closable iterator for reading the given data file.
Implements or overrides: getHoodieKeyIterator in class BaseFileUtils
Parameters:
configuration - configuration to build fs object
filePath - The data file path
keyGeneratorOpt - instance of KeyGenerator.
Returns: ClosableIterator of HoodieKeys for reading the file

`public List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)`
NOTE: This literally reads the entire file contents, thus should be used with caution.
Implements or overrides: readAvroRecords in class BaseFileUtils
Parameters:
configuration - Configuration
filePath - The data file path

`public List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema avroSchema)`
NOTE: This literally reads the entire file contents, thus should be used with caution.
Implements or overrides: readAvroRecords in class BaseFileUtils
Parameters:
configuration - Configuration
filePath - The data file path

`public Set<String> filterRowKeys(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath, Set<String> filter) throws HoodieIOException`
Read the rowKey list matching the given filter, from the given ORC file.
Implements or overrides: filterRowKeys in class BaseFileUtils
Parameters:
conf - configuration to build fs object.
filePath - The ORC file path.
filter - record keys filter
Throws: HoodieIOException

`public Map<String,String> readFooter(org.apache.hadoop.conf.Configuration conf, boolean required, org.apache.hadoop.fs.Path orcFilePath, String... footerNames)`
Description copied from class: BaseFileUtils
Read the footer data of the given data file.
Implements or overrides: readFooter in class BaseFileUtils
Parameters:
conf - Configuration
required - require the footer data to be in data file
orcFilePath - The data file path
footerNames - The footer names to read

`public org.apache.avro.Schema readAvroSchema(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path orcFilePath)`
Description copied from class: BaseFileUtils
Read the Avro schema of the data file.
Implements or overrides: readAvroSchema in class BaseFileUtils
Parameters:
conf - Configuration
orcFilePath - The data file path

`public HoodieFileFormat getFormat()`
Implements or overrides: getFormat in class BaseFileUtils
Returns: HoodieFileFormat.

`public long getRowCount(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path orcFilePath)`
Description copied from class: BaseFileUtils
Returns the number of records in the data file.
Implements or overrides: getRowCount in class BaseFileUtils
Parameters:
conf - Configuration
orcFilePath - The data file path

Copyright © 2022 The Apache Software Foundation. All rights reserved.