Class AvroParquetOutputFormat<T>


  • public class AvroParquetOutputFormat<T>
    extends ParquetOutputFormat<T>
    A Hadoop OutputFormat that writes Avro records to Parquet files.
    • Constructor Detail

      • AvroParquetOutputFormat

        public AvroParquetOutputFormat()
    • Method Detail

      • setSchema

        public static void setSchema(org.apache.hadoop.mapreduce.Job job,
                                     org.apache.avro.Schema schema)
        Set the Avro schema to use for writing. The schema is translated into a Parquet schema so that the records can be written in Parquet format. It is also stored in the Parquet metadata so that records can be reconstructed as Avro objects at read time without specifying a read schema.
        Parameters:
        job - the Job to configure
        schema - a schema for the data that will be written
        See Also:
        AvroParquetInputFormat.setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
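        Example (a minimal sketch, not taken from the library's own documentation): configuring a job to write Parquet output with an Avro schema. The record schema, job name, output path, and the class name WriteSchemaExample are hypothetical, and the org.apache.parquet.avro package name is assumed (older releases may use a different package).

        ```java
        import java.io.IOException;

        import org.apache.avro.Schema;
        import org.apache.avro.SchemaBuilder;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.parquet.avro.AvroParquetOutputFormat;

        public class WriteSchemaExample {
            public static Job configure(Configuration conf) throws IOException {
                // Hypothetical record schema, used only for illustration.
                Schema schema = SchemaBuilder.record("User")
                    .fields()
                    .requiredString("name")
                    .requiredInt("age")
                    .endRecord();

                Job job = Job.getInstance(conf, "avro-to-parquet");
                job.setOutputFormatClass(AvroParquetOutputFormat.class);
                // Hypothetical output location.
                FileOutputFormat.setOutputPath(job, new Path("/tmp/users-parquet"));

                // Register the Avro write schema on the job.
                AvroParquetOutputFormat.setSchema(job, schema);
                return job;
            }
        }
        ```

        Because the schema is stored in the Parquet metadata, a reader only needs AvroParquetInputFormat.setAvroReadSchema when it wants a schema different from the one written.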
      • setAvroDataSupplier

        public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job,
                                               Class<? extends AvroDataSupplier> supplierClass)
        Sets the AvroDataSupplier class that will be used. The data supplier provides the GenericData instance that is used to deconstruct records into their fields when writing.
        Parameters:
        job - the Job to configure
        supplierClass - the AvroDataSupplier implementation class to use
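        Example (a sketch under stated assumptions): selecting a data supplier for a job. It assumes the ReflectDataSupplier class shipped in the org.apache.parquet.avro package is available in your version; any other AvroDataSupplier implementation could be passed instead. The class name DataSupplierExample and the job name are hypothetical.

        ```java
        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.parquet.avro.AvroParquetOutputFormat;
        import org.apache.parquet.avro.ReflectDataSupplier;

        public class DataSupplierExample {
            public static Job configure(Configuration conf) throws IOException {
                Job job = Job.getInstance(conf, "reflect-to-parquet");
                job.setOutputFormatClass(AvroParquetOutputFormat.class);

                // Assumption: ReflectDataSupplier exists in this parquet-avro version.
                // It supplies a reflection-based GenericData for deconstructing records.
                AvroParquetOutputFormat.setAvroDataSupplier(job, ReflectDataSupplier.class);
                return job;
            }
        }
        ```

        The supplier is passed as a class rather than an instance, so the choice can be carried in the job configuration.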