Class AvroParquetOutputFormat<T>


  • public class AvroParquetOutputFormat<T>
    extends org.apache.parquet.hadoop.ParquetOutputFormat<T>
    A Hadoop OutputFormat for writing Avro records to Parquet files.
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.parquet.hadoop.ParquetOutputFormat

        org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel
      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
    • Field Summary

      • Fields inherited from class org.apache.parquet.hadoop.ParquetOutputFormat

        BLOCK_SIZE, BLOOM_FILTER_ENABLED, BLOOM_FILTER_EXPECTED_NDV, BLOOM_FILTER_FPP, BLOOM_FILTER_MAX_BYTES, COLUMN_INDEX_TRUNCATE_LENGTH, COMPRESSION, DICTIONARY_PAGE_SIZE, ENABLE_DICTIONARY, ENABLE_JOB_SUMMARY, ESTIMATE_PAGE_SIZE_CHECK, JOB_SUMMARY_LEVEL, MAX_PADDING_BYTES, MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, MEMORY_POOL_RATIO, MIN_MEMORY_ALLOCATION, MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, PAGE_ROW_COUNT_LIMIT, PAGE_SIZE, PAGE_WRITE_CHECKSUM_ENABLED, STATISTICS_TRUNCATE_LENGTH, VALIDATION, WRITE_SUPPORT_CLASS, WRITER_VERSION
      • Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

        BASE_OUTPUT_NAME, COMPRESS, COMPRESS_CODEC, COMPRESS_TYPE, OUTDIR, PART
    • Method Summary

      Modifier and Type Method Description
      static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)
      Sets the AvroDataSupplier class that will be used.
      static void setSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
      Set the Avro schema to use for writing.
      • Methods inherited from class org.apache.parquet.hadoop.ParquetOutputFormat

        createEncryptionProperties, getBlockSize, getBlockSize, getBloomFilterEnabled, getBloomFilterMaxBytes, getCompression, getCompression, getDictionaryPageSize, getDictionaryPageSize, getEnableDictionary, getEnableDictionary, getEstimatePageSizeCheck, getJobSummaryLevel, getLongBlockSize, getMaxRowCountForPageSizeCheck, getMemoryManager, getMinRowCountForPageSizeCheck, getOutputCommitter, getPageSize, getPageSize, getPageWriteChecksumEnabled, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getRecordWriter, getValidation, getValidation, getWriterVersion, getWriteSupport, getWriteSupportClass, isCompressionSet, isCompressionSet, setBlockSize, setColumnIndexTruncateLength, setColumnIndexTruncateLength, setCompression, setDictionaryPageSize, setEnableDictionary, setMaxPaddingSize, setMaxPaddingSize, setPageRowCountLimit, setPageRowCountLimit, setPageSize, setPageWriteChecksumEnabled, setPageWriteChecksumEnabled, setStatisticsTruncateLength, setValidation, setValidation, setWriteSupportClass, setWriteSupportClass
      • Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

        checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
    • Constructor Detail

      • AvroParquetOutputFormat

        public AvroParquetOutputFormat()
    • Method Detail

      • setSchema

        public static void setSchema(org.apache.hadoop.mapreduce.Job job,
                                     org.apache.avro.Schema schema)
        Set the Avro schema to use for writing. The schema is translated into a Parquet schema so that the records can be written in Parquet format. It is also stored in the Parquet metadata so that records can be reconstructed as Avro objects at read time without specifying a read schema.
        Parameters:
        job - a job
        schema - a schema for the data that will be written
        See Also:
        AvroParquetInputFormat.setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
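        A minimal driver sketch showing how setSchema fits into job configuration. The record layout, job name, and output path are illustrative assumptions, not part of this API; the compression call uses the inherited ParquetOutputFormat setter mentioned in the summary above.

        ```java
        import org.apache.avro.Schema;
        import org.apache.avro.SchemaBuilder;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.parquet.avro.AvroParquetOutputFormat;
        import org.apache.parquet.hadoop.ParquetOutputFormat;
        import org.apache.parquet.hadoop.metadata.CompressionCodecName;

        public class ParquetWriteDriver {
          public static void main(String[] args) throws Exception {
            // Build an Avro schema programmatically; a schema parsed
            // from an .avsc file works equally well.
            Schema schema = SchemaBuilder.record("User")
                .fields()
                .requiredString("name")
                .requiredInt("age")
                .endRecord();

            Job job = Job.getInstance(new Configuration(), "avro-to-parquet");
            job.setOutputFormatClass(AvroParquetOutputFormat.class);

            // Translates the Avro schema to a Parquet schema and stores it
            // in the Parquet metadata, so files can later be read back as
            // Avro objects without supplying a read schema.
            AvroParquetOutputFormat.setSchema(job, schema);

            // Optional tuning via the inherited ParquetOutputFormat setters.
            ParquetOutputFormat.setCompression(job, CompressionCodecName.SNAPPY);

            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            // ... configure mapper/reducer and input, then:
            // job.waitForCompletion(true);
          }
        }
        ```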
      • setAvroDataSupplier

        public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job,
                                               Class<? extends AvroDataSupplier> supplierClass)
        Sets the AvroDataSupplier class that will be used. The data supplier provides instances of GenericData that are used to deconstruct records.
        Parameters:
        job - a Job to configure
        supplierClass - a supplier class
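        One use of the supplier is to switch the GenericData model used to deconstruct records, for example to Avro's ReflectData so that plain POJOs can be written. The supplier class below is a sketch under that assumption; the commented driver wiring shows where it plugs in.

        ```java
        import org.apache.avro.generic.GenericData;
        import org.apache.avro.reflect.ReflectData;
        import org.apache.parquet.avro.AvroDataSupplier;

        // A supplier returning ReflectData, so records are deconstructed via
        // reflection rather than the default data model.
        public class ReflectBasedSupplier implements AvroDataSupplier {
          @Override
          public GenericData get() {
            return ReflectData.get();
          }
        }

        // In the job driver:
        // Job job = Job.getInstance(conf, "write-pojos");
        // AvroParquetOutputFormat.setAvroDataSupplier(job, ReflectBasedSupplier.class);
        ```

        Note that the supplier must have a public no-argument constructor, since it is instantiated reflectively from the class registered on the job configuration.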