Class AvroUtils


  • public class AvroUtils
    extends java.lang.Object
    Utilities for converting between Avro records and Beam rows. Imposes a mapping between common Avro types and Beam portable schemas (https://s.apache.org/beam-schemas):
       Avro                Beam Field Type
       INT         <-----> INT32
       LONG        <-----> INT64
       FLOAT       <-----> FLOAT
       DOUBLE      <-----> DOUBLE
       BOOLEAN     <-----> BOOLEAN
       STRING      <-----> STRING
       BYTES       <-----> BYTES
                   <------ LogicalType(urn="beam:logical_type:var_bytes:v1")
       FIXED       <-----> LogicalType(urn="beam:logical_type:fixed_bytes:v1")
       ARRAY       <-----> ARRAY
       ENUM        <-----> LogicalType(EnumerationType)
       MAP         <-----> MAP
       RECORD      <-----> ROW
       UNION       <-----> LogicalType(OneOfType)
       LogicalTypes.Date              <-----> LogicalType(DATE)
                                      <------ LogicalType(urn="beam:logical_type:date:v1")
       LogicalTypes.TimestampMillis   <-----> DATETIME
       LogicalTypes.Decimal           <-----> DECIMAL
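The table can be read as a lookup in each direction. The sketch below is a plain-Java illustration only: it does not call Beam or Avro, the class name `AvroBeamTypeTable` and its methods are hypothetical, and the type names are strings copied from the table. Rows marked `<------` are treated as one-way, meaning the Beam type on the right maps to the Avro type on the left but not back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup mirroring the mapping table; not part of Beam. */
public class AvroBeamTypeTable {
    // Two-way rows (<----->): Avro type -> Beam field type.
    private static final Map<String, String> TWO_WAY = new LinkedHashMap<>();
    // One-way rows (<------): Beam logical type URN -> Avro type.
    private static final Map<String, String> BEAM_TO_AVRO_ONLY = new LinkedHashMap<>();

    static {
        TWO_WAY.put("INT", "INT32");
        TWO_WAY.put("LONG", "INT64");
        TWO_WAY.put("FLOAT", "FLOAT");
        TWO_WAY.put("DOUBLE", "DOUBLE");
        TWO_WAY.put("BOOLEAN", "BOOLEAN");
        TWO_WAY.put("STRING", "STRING");
        TWO_WAY.put("BYTES", "BYTES");
        TWO_WAY.put("FIXED", "beam:logical_type:fixed_bytes:v1");
        TWO_WAY.put("ARRAY", "ARRAY");
        TWO_WAY.put("ENUM", "EnumerationType");
        TWO_WAY.put("MAP", "MAP");
        TWO_WAY.put("RECORD", "ROW");
        TWO_WAY.put("UNION", "OneOfType");
        TWO_WAY.put("LogicalTypes.Date", "DATE");
        TWO_WAY.put("LogicalTypes.TimestampMillis", "DATETIME");
        TWO_WAY.put("LogicalTypes.Decimal", "DECIMAL");

        BEAM_TO_AVRO_ONLY.put("beam:logical_type:var_bytes:v1", "BYTES");
        BEAM_TO_AVRO_ONLY.put("beam:logical_type:date:v1", "LogicalTypes.Date");
    }

    /** Avro -> Beam; returns null for an Avro type the table does not list. */
    public static String toBeam(String avroType) {
        return TWO_WAY.get(avroType);
    }

    /** Beam -> Avro; checks the one-way rows first, then inverts the two-way rows. */
    public static String toAvro(String beamType) {
        String oneWay = BEAM_TO_AVRO_ONLY.get(beamType);
        if (oneWay != null) {
            return oneWay;
        }
        for (Map.Entry<String, String> e : TWO_WAY.entrySet()) {
            if (e.getValue().equals(beamType)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toBeam("LONG"));                            // INT64
        System.out.println(toAvro("beam:logical_type:var_bytes:v1"));  // BYTES
        System.out.println(toAvro("ROW"));                             // RECORD
    }
}
```

Keeping the one-way rows in a separate map makes the asymmetry explicit: `toBeam("BYTES")` yields `BYTES`, never the `var_bytes` logical type.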
     
    SQL CHAR and VARCHAR types are represented by an Avro string schema that carries a logical type and a maximum length:
       LogicalType({"type":"string","logicalType":"char","maxLength":MAX_LENGTH}) or
       LogicalType({"type":"string","logicalType":"varchar","maxLength":MAX_LENGTH})
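As a concrete reading of the two CHAR/VARCHAR forms above, the sketch below renders the schema JSON for a given maximum length using only string concatenation. `SqlStringSchemas` and its methods are hypothetical names, and no Avro or Beam call is made. It only shows what the JSON looks like once MAX_LENGTH is filled in.

```java
/** Hypothetical helper that renders the CHAR/VARCHAR schema JSON; not part of Beam. */
public class SqlStringSchemas {
    // Builds the JSON object with the given logical type name and maxLength.
    private static String stringSchema(String logicalType, int maxLength) {
        return "{\"type\":\"string\",\"logicalType\":\"" + logicalType
                + "\",\"maxLength\":" + maxLength + "}";
    }

    public static String charSchema(int maxLength) {
        return stringSchema("char", maxLength);
    }

    public static String varcharSchema(int maxLength) {
        return stringSchema("varchar", maxLength);
    }

    public static void main(String[] args) {
        System.out.println(charSchema(3));
        System.out.println(varcharSchema(255));
    }
}
```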
    • Method Summary

      Modifier and Type Method Description
      static void addLogicalTypeConversions​(org.apache.avro.generic.GenericData data)  
      static @PolyNull java.lang.Object convertAvroFieldStrict​(@PolyNull java.lang.Object value, org.apache.avro.Schema avroSchema, org.apache.beam.sdk.schemas.Schema.FieldType fieldType)
      Strict conversion from Avro to Beam: no widening or narrowing is performed during conversion.
      static org.apache.beam.sdk.transforms.SimpleFunction<byte[],​org.apache.beam.sdk.values.Row> getAvroBytesToRowFunction​(org.apache.beam.sdk.schemas.Schema beamSchema)
      Returns a function mapping encoded AVRO GenericRecords to Beam Rows.
      static <T> org.apache.beam.sdk.schemas.SchemaUserTypeCreator getCreator​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor, org.apache.beam.sdk.schemas.Schema schema)
      Get an object creator for an AVRO-generated SpecificRecord.
      static <T> java.util.List<org.apache.beam.sdk.schemas.FieldValueTypeInformation> getFieldTypes​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor, org.apache.beam.sdk.schemas.Schema schema)
      Get field types for an AVRO-generated SpecificRecord or a POJO.
      static <T> org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.Row,​T> getFromRowFunction​(java.lang.Class<T> clazz)  
      static org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,​org.apache.beam.sdk.values.Row> getGenericRecordToRowFunction​(@Nullable org.apache.beam.sdk.schemas.Schema schema)
      Returns a function mapping Avro GenericRecords to Beam Rows for use in PCollection.setSchema(Schema, TypeDescriptor<T>, SerializableFunction<T, Row>, SerializableFunction<Row, T>).
      static <T> java.util.List<org.apache.beam.sdk.schemas.FieldValueGetter<@NonNull T,​java.lang.Object>> getGetters​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor, org.apache.beam.sdk.schemas.Schema schema)
      Get generated getters for an AVRO-generated SpecificRecord or a POJO.
      static org.apache.beam.sdk.transforms.SimpleFunction<org.apache.beam.sdk.values.Row,​byte[]> getRowToAvroBytesFunction​(org.apache.beam.sdk.schemas.Schema beamSchema)
      Returns a function mapping Beam Rows to encoded AVRO GenericRecords.
      static org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.Row,​org.apache.avro.generic.GenericRecord> getRowToGenericRecordFunction​(@Nullable org.apache.avro.Schema avroSchema)
      Returns a function mapping Beam Rows to Avro GenericRecords for use in PCollection.setSchema(Schema, TypeDescriptor<T>, SerializableFunction<T, Row>, SerializableFunction<Row, T>).
      static <T> @Nullable org.apache.beam.sdk.schemas.Schema getSchema​(java.lang.Class<T> clazz, @Nullable org.apache.avro.Schema schema)  
      static <T> org.apache.beam.sdk.transforms.SerializableFunction<T,​org.apache.beam.sdk.values.Row> getToRowFunction​(java.lang.Class<T> clazz, org.apache.avro.Schema schema)  
      static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(java.lang.Class<T> clazz)
      Returns a SchemaCoder instance for the provided element class.
      static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(java.lang.Class<T> clazz, org.apache.avro.Schema schema)
      Returns a SchemaCoder instance for the provided element type using the provided Avro schema.
      static org.apache.beam.sdk.schemas.SchemaCoder<org.apache.avro.generic.GenericRecord> schemaCoder​(org.apache.avro.Schema schema)
      Returns a SchemaCoder instance for the Avro schema.
      static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(AvroCoder<T> avroCoder)
      Returns a SchemaCoder instance based on the provided AvroCoder for the element type.
      static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(org.apache.beam.sdk.values.TypeDescriptor<T> type)
      Returns a SchemaCoder instance for the provided element type.
      static org.apache.avro.Schema.Field toAvroField​(org.apache.beam.sdk.schemas.Schema.Field field, java.lang.String namespace)
      Get Avro Field from Beam Field.
      static org.apache.avro.Schema toAvroSchema​(org.apache.beam.sdk.schemas.Schema beamSchema)  
      static org.apache.avro.Schema toAvroSchema​(org.apache.beam.sdk.schemas.Schema beamSchema, @Nullable java.lang.String name, @Nullable java.lang.String namespace)
      Converts a Beam Schema into an AVRO schema.
      static org.apache.beam.sdk.schemas.Schema.Field toBeamField​(org.apache.avro.Schema.Field field)
      Get Beam Field from Avro Field.
      static org.apache.beam.sdk.values.Row toBeamRowStrict​(org.apache.avro.generic.GenericRecord record, @Nullable org.apache.beam.sdk.schemas.Schema schema)
      Strict conversion from Avro to Beam: no widening or narrowing is performed during conversion.
      static org.apache.beam.sdk.schemas.Schema toBeamSchema​(java.lang.Class<?> clazz)
      Converts the Avro schema of the given class to a Beam row schema.
      static org.apache.beam.sdk.schemas.Schema toBeamSchema​(org.apache.avro.Schema schema)
      Converts AVRO schema to Beam row schema.
      static org.apache.avro.generic.GenericRecord toGenericRecord​(org.apache.beam.sdk.values.Row row)
      Convert from a Beam Row to an AVRO GenericRecord.
      static org.apache.avro.generic.GenericRecord toGenericRecord​(org.apache.beam.sdk.values.Row row, @Nullable org.apache.avro.Schema avroSchema)
      Convert from a Beam Row to an AVRO GenericRecord.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • addLogicalTypeConversions

        public static void addLogicalTypeConversions​(org.apache.avro.generic.GenericData data)
      • toBeamField

        public static org.apache.beam.sdk.schemas.Schema.Field toBeamField​(org.apache.avro.Schema.Field field)
        Get Beam Field from Avro Field.
      • toAvroField

        public static org.apache.avro.Schema.Field toAvroField​(org.apache.beam.sdk.schemas.Schema.Field field,
                                                               java.lang.String namespace)
        Get Avro Field from Beam Field.
      • toBeamSchema

        public static org.apache.beam.sdk.schemas.Schema toBeamSchema​(java.lang.Class<?> clazz)
        Converts the Avro schema of the given class to a Beam row schema.
        Parameters:
        clazz - Avro class
      • toBeamSchema

        public static org.apache.beam.sdk.schemas.Schema toBeamSchema​(org.apache.avro.Schema schema)
        Converts AVRO schema to Beam row schema.
        Parameters:
        schema - Avro schema of type RECORD
      • toAvroSchema

        public static org.apache.avro.Schema toAvroSchema​(org.apache.beam.sdk.schemas.Schema beamSchema,
                                                          @Nullable java.lang.String name,
                                                          @Nullable java.lang.String namespace)
        Converts a Beam Schema into an AVRO schema.
      • toAvroSchema

        public static org.apache.avro.Schema toAvroSchema​(org.apache.beam.sdk.schemas.Schema beamSchema)
      • toBeamRowStrict

        public static org.apache.beam.sdk.values.Row toBeamRowStrict​(org.apache.avro.generic.GenericRecord record,
                                                                     @Nullable org.apache.beam.sdk.schemas.Schema schema)
        Strict conversion from Avro to Beam: no widening or narrowing is performed during conversion. If a Beam schema is not provided, one is inferred from the record's Avro schema.
      • toGenericRecord

        public static org.apache.avro.generic.GenericRecord toGenericRecord​(org.apache.beam.sdk.values.Row row)
        Convert from a Beam Row to an AVRO GenericRecord. The Avro Schema is inferred from the Beam schema on the row.
      • toGenericRecord

        public static org.apache.avro.generic.GenericRecord toGenericRecord​(org.apache.beam.sdk.values.Row row,
                                                                            @Nullable org.apache.avro.Schema avroSchema)
        Convert from a Beam Row to an Avro GenericRecord. If an Avro schema is not provided, one is inferred from the Beam schema on the row.
      • getToRowFunction

        public static <T> org.apache.beam.sdk.transforms.SerializableFunction<T,​org.apache.beam.sdk.values.Row> getToRowFunction​(java.lang.Class<T> clazz,
                                                                                                                                       org.apache.avro.Schema schema)
      • getFromRowFunction

        public static <T> org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.Row,​T> getFromRowFunction​(java.lang.Class<T> clazz)
      • getSchema

        public static <T> @Nullable org.apache.beam.sdk.schemas.Schema getSchema​(java.lang.Class<T> clazz,
                                                                                 @Nullable org.apache.avro.Schema schema)
      • getAvroBytesToRowFunction

        public static org.apache.beam.sdk.transforms.SimpleFunction<byte[],​org.apache.beam.sdk.values.Row> getAvroBytesToRowFunction​(org.apache.beam.sdk.schemas.Schema beamSchema)
        Returns a function mapping encoded AVRO GenericRecords to Beam Rows.
      • getRowToAvroBytesFunction

        public static org.apache.beam.sdk.transforms.SimpleFunction<org.apache.beam.sdk.values.Row,​byte[]> getRowToAvroBytesFunction​(org.apache.beam.sdk.schemas.Schema beamSchema)
        Returns a function mapping Beam Rows to encoded AVRO GenericRecords.
      • getGenericRecordToRowFunction

        public static org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,​org.apache.beam.sdk.values.Row> getGenericRecordToRowFunction​(@Nullable org.apache.beam.sdk.schemas.Schema schema)
        Returns a function mapping Avro GenericRecords to Beam Rows for use in PCollection.setSchema(Schema, TypeDescriptor<T>, SerializableFunction<T, Row>, SerializableFunction<Row, T>).
      • getRowToGenericRecordFunction

        public static org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.Row,​org.apache.avro.generic.GenericRecord> getRowToGenericRecordFunction​(@Nullable org.apache.avro.Schema avroSchema)
        Returns a function mapping Beam Rows to Avro GenericRecords for use in PCollection.setSchema(Schema, TypeDescriptor<T>, SerializableFunction<T, Row>, SerializableFunction<Row, T>).
      • schemaCoder

        public static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(org.apache.beam.sdk.values.TypeDescriptor<T> type)
        Returns a SchemaCoder instance for the provided element type.
        Type Parameters:
        T - the element type
      • schemaCoder

        public static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(java.lang.Class<T> clazz)
        Returns a SchemaCoder instance for the provided element class.
        Type Parameters:
        T - the element type
      • schemaCoder

        public static org.apache.beam.sdk.schemas.SchemaCoder<org.apache.avro.generic.GenericRecord> schemaCoder​(org.apache.avro.Schema schema)
        Returns a SchemaCoder instance for the Avro schema. The implicit type is GenericRecord.
      • schemaCoder

        public static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(java.lang.Class<T> clazz,
                                                                                 org.apache.avro.Schema schema)
        Returns a SchemaCoder instance for the provided element type using the provided Avro schema.

        If the type argument is GenericRecord, the schema may be arbitrary. Otherwise, the schema must correspond to the type provided.

        Type Parameters:
        T - the element type
      • schemaCoder

        public static <T> org.apache.beam.sdk.schemas.SchemaCoder<T> schemaCoder​(AvroCoder<T> avroCoder)
        Returns a SchemaCoder instance based on the provided AvroCoder for the element type.
        Type Parameters:
        T - the element type
      • getFieldTypes

        public static <T> java.util.List<org.apache.beam.sdk.schemas.FieldValueTypeInformation> getFieldTypes​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor,
                                                                                                              org.apache.beam.sdk.schemas.Schema schema)
        Get field types for an AVRO-generated SpecificRecord or a POJO.
      • getGetters

        public static <T> java.util.List<org.apache.beam.sdk.schemas.FieldValueGetter<@NonNull T,​java.lang.Object>> getGetters​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor,
                                                                                                                                     org.apache.beam.sdk.schemas.Schema schema)
        Get generated getters for an AVRO-generated SpecificRecord or a POJO.
      • getCreator

        public static <T> org.apache.beam.sdk.schemas.SchemaUserTypeCreator getCreator​(org.apache.beam.sdk.values.TypeDescriptor<T> typeDescriptor,
                                                                                       org.apache.beam.sdk.schemas.Schema schema)
        Get an object creator for an AVRO-generated SpecificRecord.
      • convertAvroFieldStrict

        public static @PolyNull java.lang.Object convertAvroFieldStrict​(@PolyNull java.lang.Object value,
                                                                        @Nonnull
                                                                        org.apache.avro.Schema avroSchema,
                                                                        @Nonnull
                                                                        org.apache.beam.sdk.schemas.Schema.FieldType fieldType)
        Strict conversion from Avro to Beam: no widening or narrowing is performed during conversion.
        Parameters:
        value - GenericRecord or any nested value
        avroSchema - schema for value
        fieldType - target Beam field type
        Returns:
        value converted for Row
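To make "no widening or narrowing" concrete, the sketch below shows one possible reading of strictness in plain Java. A value is accepted only when its runtime type is exactly the expected type, so an Integer is not silently widened to a Long and a Long is not narrowed to an Integer. This illustrates the idea and is not Beam's implementation: `StrictConversionSketch`, `convertStrict`, the exception type, and the use of Java classes in place of Avro schemas and Beam field types are all assumptions.

```java
/** Hypothetical illustration of strict (no widening, no narrowing) conversion; not part of Beam. */
public class StrictConversionSketch {
    /**
     * Returns the value unchanged if it is null or exactly of the expected type;
     * otherwise throws instead of widening or narrowing.
     */
    public static Object convertStrict(Object value, Class<?> expected) {
        if (value == null) {
            return null; // null in, null out, in the spirit of @PolyNull
        }
        if (value.getClass() != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected.getSimpleName()
                            + " but got " + value.getClass().getSimpleName());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(convertStrict(42L, Long.class)); // exact match is accepted
        try {
            convertStrict(42, Long.class); // Integer is not widened to Long
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A lenient converter would instead call something like `((Number) value).longValue()`. The strict form surfaces schema mismatches early rather than masking them.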