Package org.apache.parquet.avro
Class AvroParquetOutputFormat<T>
- java.lang.Object
  - org.apache.hadoop.mapreduce.OutputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Void,T>
      - org.apache.parquet.hadoop.ParquetOutputFormat<T>
        - org.apache.parquet.avro.AvroParquetOutputFormat<T>
-
public class AvroParquetOutputFormat<T>
extends org.apache.parquet.hadoop.ParquetOutputFormat<T>

A Hadoop OutputFormat for Parquet files.
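A minimal driver sketch showing how this class is typically wired into a MapReduce job. The schema, job name, and output path below are illustrative assumptions, not part of this API; parquet-avro and hadoop-mapreduce-client are assumed to be on the classpath.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetJobDriver {
  public static void main(String[] args) throws Exception {
    // Hypothetical Avro record schema for illustration only
    Schema schema = SchemaBuilder.record("User").fields()
        .requiredString("name")
        .requiredInt("age")
        .endRecord();

    Job job = Job.getInstance(new Configuration(), "write-parquet");
    // Keys are ignored (Void); the values of type T are the records written
    job.setOutputFormatClass(AvroParquetOutputFormat.class);
    AvroParquetOutputFormat.setSchema(job, schema);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The reducer (or mapper, in a map-only job) would then emit `(null, record)` pairs, where each record conforms to the schema registered via setSchema.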
-
-
Field Summary
-
Fields inherited from class org.apache.parquet.hadoop.ParquetOutputFormat
BLOCK_SIZE, BLOOM_FILTER_ENABLED, BLOOM_FILTER_EXPECTED_NDV, BLOOM_FILTER_FPP, BLOOM_FILTER_MAX_BYTES, COLUMN_INDEX_TRUNCATE_LENGTH, COMPRESSION, DICTIONARY_PAGE_SIZE, ENABLE_DICTIONARY, ENABLE_JOB_SUMMARY, ESTIMATE_PAGE_SIZE_CHECK, JOB_SUMMARY_LEVEL, MAX_PADDING_BYTES, MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, MEMORY_POOL_RATIO, MIN_MEMORY_ALLOCATION, MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, PAGE_ROW_COUNT_LIMIT, PAGE_SIZE, PAGE_WRITE_CHECKSUM_ENABLED, STATISTICS_TRUNCATE_LENGTH, VALIDATION, WRITE_SUPPORT_CLASS, WRITER_VERSION
-
-
Constructor Summary
AvroParquetOutputFormat()
-
Method Summary
static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)
Sets the AvroDataSupplier class that will be used.

static void setSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
Set the Avro schema to use for writing.
Methods inherited from class org.apache.parquet.hadoop.ParquetOutputFormat
createEncryptionProperties, getBlockSize, getBlockSize, getBloomFilterEnabled, getBloomFilterMaxBytes, getCompression, getCompression, getDictionaryPageSize, getDictionaryPageSize, getEnableDictionary, getEnableDictionary, getEstimatePageSizeCheck, getJobSummaryLevel, getLongBlockSize, getMaxRowCountForPageSizeCheck, getMemoryManager, getMinRowCountForPageSizeCheck, getOutputCommitter, getPageSize, getPageSize, getPageWriteChecksumEnabled, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getValidation, getValidation, getWriterVersion, getWriteSupport, getWriteSupportClass, isCompressionSet, isCompressionSet, setBlockSize, setColumnIndexTruncateLength, setColumnIndexTruncateLength, setCompression, setDictionaryPageSize, setEnableDictionary, setMaxPaddingSize, setMaxPaddingSize, setPageRowCountLimit, setPageRowCountLimit, setPageSize, setPageWriteChecksumEnabled, setPageWriteChecksumEnabled, setStatisticsTruncateLength, setValidation, setValidation, setWriteSupportClass, setWriteSupportClass
-
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
-
Method Detail
-
setSchema
public static void setSchema(org.apache.hadoop.mapreduce.Job job,
                             org.apache.avro.Schema schema)

Set the Avro schema to use for writing. The schema is translated into a Parquet schema so that the records can be written in Parquet format. It is also stored in the Parquet metadata so that records can be reconstructed as Avro objects at read time without specifying a read schema.

Parameters:
job - a job
schema - a schema for the data that will be written
See Also:
AvroParquetInputFormat.setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
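In practice the write schema is often loaded from an .avsc file rather than built in code. A sketch, where the file name user.avsc and the helper class are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class SchemaSetup {
  // Parses a schema definition from a hypothetical user.avsc file
  // and registers it on the job before submission.
  static void configure(Job job) throws IOException {
    Schema schema = new Schema.Parser().parse(new File("user.avsc"));
    AvroParquetOutputFormat.setSchema(job, schema);
  }
}
```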
-
setAvroDataSupplier
public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job,
                                       Class<? extends AvroDataSupplier> supplierClass)

Sets the AvroDataSupplier class that will be used. The data supplier provides instances of GenericData that are used to deconstruct records.

Parameters:
job - a Job to configure
supplierClass - a supplier class
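A supplier implementation is a class with a no-argument constructor that returns a GenericData instance. The sketch below (the class name is an illustrative assumption) swaps in Avro's reflection model so that plain Java objects, rather than GenericRecords, can be written:

```java
import org.apache.avro.generic.GenericData;
import org.apache.avro.reflect.ReflectData;
import org.apache.parquet.avro.AvroDataSupplier;

// Hypothetical supplier that makes the Parquet writer use Avro's
// reflection-based data model to deconstruct records.
public class ReflectDataSupplier implements AvroDataSupplier {
  @Override
  public GenericData get() {
    return ReflectData.get();
  }
}
```

It is registered by class, and instantiated by the framework at write time:

```java
AvroParquetOutputFormat.setAvroDataSupplier(job, ReflectDataSupplier.class);
```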
-
-