public class DFSAvroDeltaInputReader extends DFSDeltaInputReader
Input reader for DeltaOutputMode.DFS and DeltaInputType.AVRO.

| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.hadoop.fs.PathFilter | filter |
| Constructor and Description |
|---|
| DFSAvroDeltaInputReader(org.apache.spark.sql.SparkSession sparkSession, String schemaStr, String basePath, org.apache.hudi.common.util.Option<String> structName, org.apache.hudi.common.util.Option<String> nameSpace) |
| Modifier and Type | Method and Description |
|---|---|
| protected long | analyzeSingleFile(String filePath): implementation for DeltaInputReaders to read a single file on DFS and provide an average number of records across N files. |
| org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> | read(int numPartitions, int numFiles, double percentageRecordsPerFile) |
| org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> | read(int numPartitions, int numFiles, long approxNumRecords) |
| org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> | read(int numPartitions, long approxNumRecords) |
| org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> | read(long totalRecordsToRead): attempts to read an approximate number of records close to the requested total. |
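The multi-argument read overloads size their work by dividing a record budget across the selected files (step 2 in each method's description below). A minimal sketch of that arithmetic, assuming integer division for the absolute-count variant and truncation for the percentage variant; the class and helper names here are illustrative, not part of the Hudi API:

```java
public class ReadBudgetSketch {

    // Step 2 of read(int numPartitions, int numFiles, long approxNumRecords):
    // split the total record budget evenly across the chosen files.
    static long recordsPerFile(long approxNumRecords, int numFiles) {
        return approxNumRecords / numFiles;
    }

    // Step 2 of read(int numPartitions, int numFiles, double percentageRecordsPerFile):
    // take a fraction of the average record count observed per file
    // (the average itself would come from analyzeSingleFile over N files).
    static long recordsPerFile(long approxNumRecordsPerFile, double percentageRecordsPerFile) {
        return (long) (approxNumRecordsPerFile * percentageRecordsPerFile);
    }

    public static void main(String[] args) {
        // 1,000,000 records spread over 40 files -> 25,000 records per file
        System.out.println(recordsPerFile(1_000_000L, 40));
        // 50,000 records per file on average, read 10% -> 5,000 records per file
        System.out.println(recordsPerFile(50_000L, 0.10));
    }
}
```

Because the division is integer and the percentage is truncated, the total actually read is approximate, which matches the "approx" wording in the method contracts.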
Methods inherited from class DFSDeltaInputReader: getFilePathsToRead, getFileStatusIndexRange

public DFSAvroDeltaInputReader(org.apache.spark.sql.SparkSession sparkSession,
                               String schemaStr,
                               String basePath,
                               org.apache.hudi.common.util.Option<String> structName,
                               org.apache.hudi.common.util.Option<String> nameSpace)
public org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> read(long totalRecordsToRead)
throws IOException
Attempts to read an approximate number of records close to the requested total.
Specified by: read in interface DeltaInputReader
Throws: IOException

public org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> read(int numPartitions,
long approxNumRecords)
throws IOException
Attempts to read an approximate number of records (exact if equal or more records are available) across the requested number of partitions.
Throws: IOException

public org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> read(int numPartitions,
int numFiles,
long approxNumRecords)
throws IOException
Attempts to read an approximate number of records (exact if equal or more records are available) across the requested number of partitions and files:
1. Find numFiles across numPartitions
2. numRecordsToReadPerFile = approxNumRecords / numFiles
Throws: IOException

public org.apache.spark.api.java.JavaRDD<org.apache.avro.generic.GenericRecord> read(int numPartitions,
int numFiles,
double percentageRecordsPerFile)
throws IOException
Attempts to read a percentage of records per file across the requested number of partitions and files:
1. Find numFiles across numPartitions
2. numRecordsToReadPerFile = approxNumRecordsPerFile * percentageRecordsPerFile
Throws: IOException

protected long analyzeSingleFile(String filePath)
Description copied from class DFSDeltaInputReader: implementation for DeltaInputReaders to read a single file on DFS and provide an average number of records across N files.
Overrides: analyzeSingleFile in class DFSDeltaInputReader

Copyright © 2023 The Apache Software Foundation. All rights reserved.