public abstract class BaseFileUtils extends Object
| Constructor and Description |
|---|
| BaseFileUtils() |
| Modifier and Type | Method | Description |
|---|---|---|
| abstract List<HoodieKey> | fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Fetch HoodieKeys from the given data file. |
| abstract List<HoodieKey> | fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt) | Fetch HoodieKeys from the given data file. |
| abstract Set<String> | filterRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Set<String> filter) | Read the rowKey list matching the given filter from the given data file. |
| abstract HoodieFileFormat | getFormat() |  |
| abstract ClosableIterator<HoodieKey> | getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Provides a closable iterator for reading the given data file. |
| abstract ClosableIterator<HoodieKey> | getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt) | Provides a closable iterator for reading the given data file. |
| static BaseFileUtils | getInstance(HoodieFileFormat fileFormat) |  |
| static BaseFileUtils | getInstance(HoodieTableMetaClient metaClient) |  |
| static BaseFileUtils | getInstance(String path) |  |
| abstract long | getRowCount(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Returns the number of records in the data file. |
| abstract List<org.apache.avro.generic.GenericRecord> | readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Read the data file. NOTE: This reads the entire file contents into the returned list, so use it with caution. |
| abstract List<org.apache.avro.generic.GenericRecord> | readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema schema) | Read the data file using the given schema. NOTE: This reads the entire file contents into the returned list, so use it with caution. |
| abstract org.apache.avro.Schema | readAvroSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Read the Avro schema of the data file. |
| BloomFilter | readBloomFilterFromMetadata(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Read the bloom filter from the metadata of the given data file. |
| abstract Map<String,String> | readFooter(org.apache.hadoop.conf.Configuration configuration, boolean required, org.apache.hadoop.fs.Path filePath, String... footerNames) | Read the footer data of the given data file. |
| String[] | readMinMaxRecordKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Read the min and max record key from the metadata of the given data file. |
| Set<String> | readRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath) | Read the rowKey list from the given data file. |
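
The summary lists both list-returning readers (`fetchHoodieKeys`, `readAvroRecords`) and iterator-returning ones (`getHoodieKeyIterator`). The sketch below illustrates, under stated assumptions, why a closable iterator can be preferable for large files: keys are consumed one at a time, and try-with-resources releases the underlying reader even when iteration stops early. `ClosableIter` and `InMemoryKeyIterator` are hypothetical stand-ins written for this example, not Hudi's `ClosableIterator`; plain strings stand in for `HoodieKey`s, and it is assumed that the real iterator can be used in the same way.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an iterator that owns a resource (e.g. an open file reader).
interface ClosableIter<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed so that close() throws no checked exception
}

// Stand-in implementation backed by an in-memory list; a real one would wrap a file reader.
class InMemoryKeyIterator implements ClosableIter<String> {
    private final Iterator<String> delegate;
    private boolean closed = false;

    InMemoryKeyIterator(List<String> keys) {
        this.delegate = keys.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && delegate.hasNext();
    }

    @Override
    public String next() {
        if (closed) {
            throw new NoSuchElementException("iterator already closed");
        }
        return delegate.next();
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class KeyIteratorSketch {
    // Counts keys with the given prefix, stopping after `limit` matches.
    // The full key list is never materialized by this method.
    static int countWithPrefix(ClosableIter<String> keys, String prefix, int limit) {
        int matches = 0;
        try (ClosableIter<String> it = keys) {
            while (it.hasNext() && matches < limit) {
                if (it.next().startsWith(prefix)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        InMemoryKeyIterator it =
            new InMemoryKeyIterator(Arrays.asList("a1", "b1", "a2", "a3"));
        System.out.println(countWithPrefix(it, "a", 2)); // prints 2
        System.out.println(it.isClosed());               // prints true
    }
}
```

The early exit at `limit` is the point of the design: a list-returning call would have read every key before the caller could stop.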
public static BaseFileUtils getInstance(String path)
public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)
public static BaseFileUtils getInstance(HoodieTableMetaClient metaClient)
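
The `getInstance` overloads are static factories that hand back a format-specific subclass of the abstract class. As a rough illustration of that pattern only, the sketch below uses stand-in types (`SketchFormat`, `SketchFileUtils`, `SketchParquetUtils`, `SketchOrcUtils`), none of which are Hudi classes. Choosing the format from the file extension in the path-based overload is an assumption made for this example; this page does not state how the real lookup works.

```java
// Hypothetical stand-in for a file-format enum; not Hudi's HoodieFileFormat.
enum SketchFormat {
    PARQUET(".parquet"),
    ORC(".orc");

    final String extension;

    SketchFormat(String extension) {
        this.extension = extension;
    }
}

// Hypothetical stand-in for the abstract utility class with static factories.
abstract class SketchFileUtils {
    abstract SketchFormat getFormat();

    // Factory keyed directly by format.
    static SketchFileUtils getInstance(SketchFormat format) {
        switch (format) {
            case PARQUET:
                return new SketchParquetUtils();
            case ORC:
                return new SketchOrcUtils();
            default:
                throw new UnsupportedOperationException(format + " is not supported");
        }
    }

    // ASSUMPTION: the path-based factory infers the format from the file extension.
    static SketchFileUtils getInstance(String path) {
        for (SketchFormat format : SketchFormat.values()) {
            if (path.endsWith(format.extension)) {
                return getInstance(format);
            }
        }
        throw new UnsupportedOperationException("No known format for " + path);
    }
}

class SketchParquetUtils extends SketchFileUtils {
    @Override
    SketchFormat getFormat() {
        return SketchFormat.PARQUET;
    }
}

class SketchOrcUtils extends SketchFileUtils {
    @Override
    SketchFormat getFormat() {
        return SketchFormat.ORC;
    }
}

class FactorySketch {
    public static void main(String[] args) {
        // Callers depend only on the abstract type; the factory picks the subclass.
        SketchFileUtils byPath = SketchFileUtils.getInstance("/tmp/part-0001.parquet");
        SketchFileUtils byFormat = SketchFileUtils.getInstance(SketchFormat.ORC);
        System.out.println(byPath.getFormat());   // prints PARQUET
        System.out.println(byFormat.getFormat()); // prints ORC
    }
}
```
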
public Set<String> readRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- filePath - The data file path
- configuration - configuration to build fs object

public BloomFilter readBloomFilterFromMetadata(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - Configuration
- filePath - The data file path

public String[] readMinMaxRecordKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - Configuration
- filePath - The data file path

public abstract List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - Configuration
- filePath - The data file path

public abstract List<org.apache.avro.generic.GenericRecord> readAvroRecords(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, org.apache.avro.Schema schema)

Parameters:
- configuration - Configuration
- filePath - The data file path

public abstract Map<String,String> readFooter(org.apache.hadoop.conf.Configuration configuration, boolean required, org.apache.hadoop.fs.Path filePath, String... footerNames)

Parameters:
- configuration - Configuration
- required - require the footer data to be in the data file
- filePath - The data file path
- footerNames - The footer names to read

public abstract long getRowCount(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - Configuration
- filePath - The data file path

public abstract Set<String> filterRowKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Set<String> filter)

Parameters:
- filePath - The data file path
- configuration - configuration to build fs object
- filter - record keys filter

public abstract List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Fetch HoodieKeys from the given data file.

public abstract ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)

Parameters:
- configuration - configuration to build fs object
- filePath - The data file path
- keyGeneratorOpt - instance of KeyGenerator.

Returns: ClosableIterator of HoodieKeys for reading the file

public abstract ClosableIterator<HoodieKey> getHoodieKeyIterator(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - configuration to build fs object
- filePath - The data file path

Returns: ClosableIterator of HoodieKeys for reading the file

public abstract List<HoodieKey> fetchHoodieKeys(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath, Option<BaseKeyGenerator> keyGeneratorOpt)

Fetch HoodieKeys from the given data file.

public abstract org.apache.avro.Schema readAvroSchema(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path filePath)

Parameters:
- configuration - Configuration
- filePath - The data file path

public abstract HoodieFileFormat getFormat()

Returns: HoodieFileFormat.

Copyright © 2022 The Apache Software Foundation. All rights reserved.