Class AvroIO

java.lang.Object
  org.apache.beam.sdk.extensions.avro.io.AvroIO

public class AvroIO extends java.lang.Object

PTransforms for reading and writing Avro files.

Reading Avro files
To read a PCollection from one or more Avro files with the same schema known at pipeline construction time, use read(java.lang.Class<T>), using AvroIO.Read.from(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) to specify the filename or filepattern to read from. If the filepatterns to be read are themselves in a PCollection, you can use FileIO to match them and readFiles(java.lang.Class<T>) to read them. If the schema is unknown at pipeline construction time, use parseGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord, T>) or parseFilesGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord, T>). Many of the configuration options below apply to several or all of these transforms.
See FileSystems for information on supported file systems and filepatterns.

Filepattern expansion and watching
By default, the filepatterns are expanded only once. AvroIO.Read.watchForNewFiles(org.joda.time.Duration, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String, ?>, boolean) or the combination of FileIO.Match.continuously(Duration, TerminationCondition) and readFiles(Class) allow streaming of new files matching the filepattern(s).

By default, read(java.lang.Class<T>) prohibits filepatterns that match no files, and readFiles(Class) allows them in case the filepattern contains a glob wildcard character. Use AvroIO.Read.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment) or FileIO.Match.withEmptyMatchTreatment(EmptyMatchTreatment) plus readFiles(Class) to configure this behavior.

Reading records of a known schema
To read specific records, such as Avro-generated classes, use read(Class). To read GenericRecords, use readGenericRecords(Schema), which takes a Schema object, or readGenericRecords(String), which takes an Avro schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema. Likewise, to read a PCollection of filepatterns, apply FileIO matching plus readFilesGenericRecords(org.apache.avro.Schema).

For example:
    Pipeline p = ...;

    // Read Avro-generated classes from files on GCS
    PCollection<AvroAutoGenClass> records =
        p.apply(AvroIO.read(AvroAutoGenClass.class)
            .from("gs://my_bucket/path/to/records-*.avro"));

    // Read GenericRecords of the given schema from files on GCS
    Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
    PCollection<GenericRecord> records =
        p.apply(AvroIO.readGenericRecords(schema)
            .from("gs://my_bucket/path/to/records-*.avro"));

Reading records of an unknown schema
To read records from files whose schema is unknown at pipeline construction time or differs between files, use parseGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord, T>); in this case, you will need to specify a parsing function for converting each GenericRecord into a value of your custom type. Likewise, to read a PCollection of filepatterns with unknown schema, use FileIO matching plus parseFilesGenericRecords(SerializableFunction).

For example:
    Pipeline p = ...;

    PCollection<Foo> records =
        p.apply(AvroIO.parseGenericRecords(new SerializableFunction<GenericRecord, Foo>() {
          public Foo apply(GenericRecord record) {
            // If needed, access the schema of the record using record.getSchema()
            return ...;
          }
        }));

Reading from a PCollection of filepatterns

    Pipeline p = ...;

    PCollection<String> filepatterns = p.apply(...);

    PCollection<AvroAutoGenClass> records =
        filepatterns.apply(AvroIO.readAll(AvroAutoGenClass.class));
    PCollection<AvroAutoGenClass> records =
        filepatterns
            .apply(FileIO.matchAll())
            .apply(FileIO.readMatches())
            .apply(AvroIO.readFiles(AvroAutoGenClass.class));
    PCollection<GenericRecord> genericRecords =
        filepatterns.apply(AvroIO.readGenericRecords(schema));
    PCollection<Foo> records =
        filepatterns
            .apply(FileIO.matchAll())
            .apply(FileIO.readMatches())
            .apply(AvroIO.parseFilesGenericRecords(new SerializableFunction...));

Streaming new files matching a filepattern
    Pipeline p = ...;

    PCollection<AvroAutoGenClass> lines = p.apply(AvroIO
        .read(AvroAutoGenClass.class)
        .from("gs://my_bucket/path/to/records-*.avro")
        .watchForNewFiles(
            // Check for new files every minute
            Duration.standardMinutes(1),
            // Stop watching the filepattern if no new files appear within an hour
            afterTimeSinceNewOutput(Duration.standardHours(1))));

Reading a very large number of files
If it is known that the filepattern will match a very large number of files (e.g. tens of thousands or more), use AvroIO.Read.withHintMatchesManyFiles() for better performance and scalability. Note that it may decrease performance if the filepattern matches only a small number of files.

Inferring Beam schemas from Avro files
If you want to use SQL or schema-based operations on an Avro-based PCollection, you must configure the read transform to infer the Beam schema and automatically set up the Beam-related coders:

    PCollection<AvroAutoGenClass> records =
        p.apply(AvroIO.read(...).from(...).withBeamSchemas(true));

Inferring Beam schemas from Avro PCollections
If you created an Avro-based PCollection by other means, e.g. reading records from Kafka or as the output of another PTransform, you may be interested in making your PCollection schema-aware so you can use the schema-based APIs or Beam's SqlTransform.
If you are using Avro specific records (generated classes from an Avro schema), you can register a schema provider for the specific Avro class to make any PCollection of these objects schema-aware:

    pipeline.getSchemaRegistry().registerSchemaProvider(
        AvroAutoGenClass.class, AvroAutoGenClass.getClassSchema());

You can also manually set an Avro-backed Schema coder for a PCollection using AvroUtils.schemaCoder(Class, Schema) to make it schema-aware:

    PCollection<AvroAutoGenClass> records = ...
    AvroCoder<AvroAutoGenClass> coder = (AvroCoder<AvroAutoGenClass>) records.getCoder();
    records.setCoder(AvroUtils.schemaCoder(coder.getType(), coder.getSchema()));

If you are using GenericRecords, you may need to set a specific Beam schema coder for each PCollection to match its internal Avro schema:

    org.apache.avro.Schema avroSchema = ...
    PCollection<GenericRecord> records = ...
    records.setCoder(AvroUtils.schemaCoder(avroSchema));

Writing Avro files
To write a PCollection to one or more Avro files, use AvroIO.Write, using AvroIO.write().to(String) to specify the output filename prefix. The default DefaultFilenamePolicy will use this prefix, in conjunction with a ShardNameTemplate (set via AvroIO.Write.withShardNameTemplate(String)) and an optional filename suffix (set via AvroIO.Write.withSuffix(String)), to generate output filenames in a sharded way. You can override this default write filename policy using AvroIO.Write.to(FilenamePolicy) to specify a custom file naming policy.

By default, AvroIO.Write produces output files that are compressed using CodecFactory.snappyCodec(). This default can be changed or overridden using AvroIO.Write.withCodec(org.apache.avro.file.CodecFactory).

Writing specific or generic records
To write specific records, such as Avro-generated classes, use write(Class). To write GenericRecords, use either writeGenericRecords(Schema), which takes a Schema object, or writeGenericRecords(String), which takes a schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema.

For example:
    // A simple Write to a local file (only runs locally):
    PCollection<AvroAutoGenClass> records = ...;
    records.apply(AvroIO.write(AvroAutoGenClass.class).to("/path/to/file.avro"));

    // A Write to a sharded GCS file (runs locally and using remote execution):
    Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
    PCollection<GenericRecord> records = ...;
    records.apply("WriteToAvro", AvroIO.writeGenericRecords(schema)
        .to("gs://my_bucket/path/to/numbers")
        .withSuffix(".avro"));

Writing windowed or unbounded data
By default, all input is put into the global window before writing. If per-window writes are desired, for example when using a streaming runner, AvroIO.Write.withWindowedWrites() will cause windowing and triggering to be preserved. When producing windowed writes with a streaming runner that supports triggers, the number of output shards must be set explicitly using AvroIO.Write.withNumShards(int); some runners may set this for you to a runner-chosen value, so you may not need to set it yourself. A FileBasedSink.FilenamePolicy must be set, and unique windows and triggers must produce unique filenames.

Writing data to multiple destinations
The following shows a more complex example of AvroIO.Write usage, generating dynamic file destinations as well as a dynamic Avro schema per file. In this example, a PCollection of user events (e.g. actions on a website) is written out to Avro files. Each event contains the user id as an integer field. We want events for each user to go into a specific directory for that user, and each user's data should be written with a specific schema for that user; a side input is used, so the schema can be calculated in a different stage.
    // This is the user class that controls dynamic destinations for this Avro write. The input to
    // AvroIO.Write will be UserEvent, and we will be writing GenericRecords to the file (in order
    // to have dynamic schemas). Everything is per userid, so we define a dynamic destination type
    // of Integer.
    class UserDynamicAvroDestinations
        extends DynamicAvroDestinations<UserEvent, Integer, GenericRecord> {
      private final PCollectionView<Map<Integer, String>> userToSchemaMap;

      public UserDynamicAvroDestinations(
          PCollectionView<Map<Integer, String>> userToSchemaMap) {
        this.userToSchemaMap = userToSchemaMap;
      }

      public GenericRecord formatRecord(UserEvent record) {
        return formatUserRecord(record, getSchema(record.getUserId()));
      }

      public Schema getSchema(Integer userId) {
        return new Schema.Parser().parse(sideInput(userToSchemaMap).get(userId));
      }

      public Integer getDestination(UserEvent record) {
        return record.getUserId();
      }

      public Integer getDefaultDestination() {
        return 0;
      }

      public FilenamePolicy getFilenamePolicy(Integer userId) {
        return DefaultFilenamePolicy.fromParams(
            new Params().withBaseFilename(baseDir + "/user-" + userId + "/events"));
      }

      public List<PCollectionView<?>> getSideInputs() {
        return ImmutableList.<PCollectionView<?>>of(userToSchemaMap);
      }
    }

    PCollection<UserEvent> events = ...;
    PCollectionView<Map<Integer, String>> userToSchemaMap = events.apply(
        "ComputePerUserSchemas", new ComputePerUserSchemas());
    events.apply("WriteAvros", AvroIO.<Integer>writeCustomTypeToGenericRecords()
        .to(new UserDynamicAvroDestinations(userToSchemaMap)));

Errors for malformed records during writing can be handled using AvroIO.TypedWrite.withBadRecordErrorHandler(ErrorHandler). See the documentation in FileIO for details on usage.
Nested Class Summary

Nested Classes (Modifier and Type / Class / Description):

- static class AvroIO.Parse<T>
- static class AvroIO.ParseAll<T>: Deprecated. See parseAllGenericRecords(SerializableFunction) for details.
- static class AvroIO.ParseFiles<T>
- static class AvroIO.Read<T>: Implementation of read(java.lang.Class<T>) and readGenericRecords(org.apache.avro.Schema).
- static class AvroIO.ReadAll<T>: Deprecated. See readAll(Class) for details.
- static class AvroIO.ReadFiles<T>: Implementation of readFiles(java.lang.Class<T>).
- static interface AvroIO.RecordFormatter<ElementT>: Deprecated. Users can achieve the same by providing this transform in a ParDo before using write in AvroIO write(Class).
- static class AvroIO.Sink<ElementT>
- static class AvroIO.TypedWrite<UserT, DestinationT, OutputT>: Implementation of write(java.lang.Class<T>).
- static class AvroIO.Write<T>: This class is used as the default return value of write(java.lang.Class<T>).
Method Summary

All Methods / Static Methods / Concrete Methods / Deprecated Methods (Modifier and Type / Method / Description):

- static <UserT,OutputT> DynamicAvroDestinations<UserT,java.lang.Void,OutputT> constantDestinations(org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy filenamePolicy, org.apache.avro.Schema schema, java.util.Map<java.lang.String,java.lang.Object> metadata, org.apache.avro.file.CodecFactory codec, org.apache.beam.sdk.transforms.SerializableFunction<UserT,OutputT> formatFunction): Returns a DynamicAvroDestinations that always returns the same FileBasedSink.FilenamePolicy, schema, metadata, and codec.
- static <UserT,OutputT> DynamicAvroDestinations<UserT,java.lang.Void,OutputT> constantDestinations(org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy filenamePolicy, org.apache.avro.Schema schema, java.util.Map<java.lang.String,java.lang.Object> metadata, org.apache.avro.file.CodecFactory codec, org.apache.beam.sdk.transforms.SerializableFunction<UserT,OutputT> formatFunction, @Nullable AvroSink.DatumWriterFactory<OutputT> datumWriterFactory): Returns a DynamicAvroDestinations that always returns the same FileBasedSink.FilenamePolicy, schema, metadata, and codec.
- static <T> AvroIO.ParseAll<T> parseAllGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn): Deprecated. You can achieve the functionality of parseAllGenericRecords(SerializableFunction) using FileIO matching plus parseFilesGenericRecords(SerializableFunction).
- static <T> AvroIO.ParseFiles<T> parseFilesGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn): Like parseGenericRecords(SerializableFunction), but reads each FileIO.ReadableFile in the input PCollection.
- static <T> AvroIO.Parse<T> parseGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn): Reads Avro file(s) containing records of an unspecified schema and converts each record to a custom type.
- static <T> AvroIO.Read<T> read(java.lang.Class<T> recordClass): Reads records of the given type from an Avro file (or multiple Avro files matching a pattern).
- static <T> AvroIO.ReadAll<T> readAll(java.lang.Class<T> recordClass): Deprecated. You can achieve the functionality of readAll(java.lang.Class<T>) using FileIO matching plus readFiles(Class).
- static AvroIO.ReadAll<org.apache.avro.generic.GenericRecord> readAllGenericRecords(java.lang.String schema): Deprecated. You can achieve the functionality of readAllGenericRecords(String) using FileIO matching plus readFilesGenericRecords(String).
- static AvroIO.ReadAll<org.apache.avro.generic.GenericRecord> readAllGenericRecords(org.apache.avro.Schema schema): Deprecated. You can achieve the functionality of readAllGenericRecords(Schema) using FileIO matching plus readFilesGenericRecords(Schema).
- static <T> AvroIO.ReadFiles<T> readFiles(java.lang.Class<T> recordClass): Like read(java.lang.Class<T>), but reads each file in a PCollection of FileIO.ReadableFile, returned by FileIO.readMatches().
- static AvroIO.ReadFiles<org.apache.avro.generic.GenericRecord> readFilesGenericRecords(java.lang.String schema): Like readGenericRecords(String), but for FileIO.ReadableFile collections.
- static AvroIO.ReadFiles<org.apache.avro.generic.GenericRecord> readFilesGenericRecords(org.apache.avro.Schema schema): Like readGenericRecords(Schema), but for a PCollection of FileIO.ReadableFile, for example, returned by FileIO.readMatches().
- static AvroIO.Read<org.apache.avro.generic.GenericRecord> readGenericRecords(java.lang.String schema): Reads Avro file(s) containing records of the specified schema.
- static AvroIO.Read<org.apache.avro.generic.GenericRecord> readGenericRecords(org.apache.avro.Schema schema): Reads Avro file(s) containing records of the specified schema.
- static <ElementT> AvroIO.Sink<ElementT> sink(java.lang.Class<ElementT> clazz): An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements of the given generated class, like write(Class).
- static <ElementT extends org.apache.avro.generic.IndexedRecord> AvroIO.Sink<ElementT> sink(java.lang.String jsonSchema): An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements with a given (common) schema, like writeGenericRecords(String).
- static <ElementT extends org.apache.avro.generic.IndexedRecord> AvroIO.Sink<ElementT> sink(org.apache.avro.Schema schema): An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements with a given (common) schema, like writeGenericRecords(Schema).
- static <ElementT> AvroIO.Sink<ElementT> sinkViaGenericRecords(org.apache.avro.Schema schema, AvroIO.RecordFormatter<ElementT> formatter): Deprecated. RecordFormatter will be removed in future versions.
- static <T> AvroIO.Write<T> write(java.lang.Class<T> recordClass): Writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern).
- static <UserT> AvroIO.TypedWrite<UserT,java.lang.Void,java.lang.Object> writeCustomType(): Deprecated. Use writeCustomType(Class) instead and provide the custom record class.
- static <UserT,OutputT> AvroIO.TypedWrite<UserT,java.lang.Void,OutputT> writeCustomType(java.lang.Class<OutputT> recordClass): A PTransform that writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern), with each element of the input collection encoded into its own record of type OutputT.
- static <UserT> AvroIO.TypedWrite<UserT,java.lang.Void,org.apache.avro.generic.GenericRecord> writeCustomTypeToGenericRecords(): Similar to writeCustomType(), but specialized for the case where the output type is GenericRecord.
- static AvroIO.Write<org.apache.avro.generic.GenericRecord> writeGenericRecords(java.lang.String schema): Writes Avro records of the specified schema.
- static AvroIO.Write<org.apache.avro.generic.GenericRecord> writeGenericRecords(org.apache.avro.Schema schema): Writes Avro records of the specified schema.
Method Detail

read

public static <T> AvroIO.Read<T> read(java.lang.Class<T> recordClass)

Reads records of the given type from an Avro file (or multiple Avro files matching a pattern). The schema must be specified using one of the withSchema functions.
readFiles

public static <T> AvroIO.ReadFiles<T> readFiles(java.lang.Class<T> recordClass)

Like read(java.lang.Class<T>), but reads each file in a PCollection of FileIO.ReadableFile, returned by FileIO.readMatches().
readAll

@Deprecated public static <T> AvroIO.ReadAll<T> readAll(java.lang.Class<T> recordClass)

Deprecated. You can achieve the functionality of readAll(java.lang.Class<T>) using FileIO matching plus readFiles(Class). This is the preferred method to make composition explicit. AvroIO.ReadAll will not receive upgrades and will be removed in a future version of Beam.

Like read(java.lang.Class<T>), but reads each filepattern in the input PCollection.
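The recommended replacement composes FileIO matching with readFiles(Class); for example (a sketch following the class-level examples above, with `...` placeholders for pipeline setup):

    PCollection<String> filepatterns = p.apply(...);
    PCollection<AvroAutoGenClass> records = filepatterns
        .apply(FileIO.matchAll())
        .apply(FileIO.readMatches())
        .apply(AvroIO.readFiles(AvroAutoGenClass.class));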
readGenericRecords
public static AvroIO.Read<org.apache.avro.generic.GenericRecord> readGenericRecords(org.apache.avro.Schema schema)
Reads Avro file(s) containing records of the specified schema.
readFilesGenericRecords

public static AvroIO.ReadFiles<org.apache.avro.generic.GenericRecord> readFilesGenericRecords(org.apache.avro.Schema schema)

Like readGenericRecords(Schema), but for a PCollection of FileIO.ReadableFile, for example, returned by FileIO.readMatches().
readAllGenericRecords

@Deprecated public static AvroIO.ReadAll<org.apache.avro.generic.GenericRecord> readAllGenericRecords(org.apache.avro.Schema schema)

Deprecated. You can achieve the functionality of readAllGenericRecords(Schema) using FileIO matching plus readFilesGenericRecords(Schema). This is the preferred method to make composition explicit. AvroIO.ReadAll will not receive upgrades and will be removed in a future version of Beam.

Like readGenericRecords(Schema), but for a PCollection of FileIO.ReadableFile, for example, returned by FileIO.readMatches().
readGenericRecords
public static AvroIO.Read<org.apache.avro.generic.GenericRecord> readGenericRecords(java.lang.String schema)
Reads Avro file(s) containing records of the specified schema. The schema is specified as a JSON-encoded string.
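For example (a sketch; the inline record schema here is illustrative, not taken from the library's documentation):

    PCollection<GenericRecord> records = p.apply(
        AvroIO.readGenericRecords(
                "{\"type\": \"record\", \"name\": \"TestRecord\", \"fields\": ["
                + "{\"name\": \"name\", \"type\": \"string\"}]}")
            .from("gs://my_bucket/path/to/records-*.avro"));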
readFilesGenericRecords

public static AvroIO.ReadFiles<org.apache.avro.generic.GenericRecord> readFilesGenericRecords(java.lang.String schema)

Like readGenericRecords(String), but for FileIO.ReadableFile collections.
readAllGenericRecords

@Deprecated public static AvroIO.ReadAll<org.apache.avro.generic.GenericRecord> readAllGenericRecords(java.lang.String schema)

Deprecated. You can achieve the functionality of readAllGenericRecords(String) using FileIO matching plus readFilesGenericRecords(String). This is the preferred method to make composition explicit. AvroIO.ReadAll will not receive upgrades and will be removed in a future version of Beam.

Like readGenericRecords(String), but reads each filepattern in the input PCollection.
parseGenericRecords

public static <T> AvroIO.Parse<T> parseGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn)

Reads Avro file(s) containing records of an unspecified schema and converts each record to a custom type.
parseFilesGenericRecords

public static <T> AvroIO.ParseFiles<T> parseFilesGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn)

Like parseGenericRecords(SerializableFunction), but reads each FileIO.ReadableFile in the input PCollection.
parseAllGenericRecords

@Deprecated public static <T> AvroIO.ParseAll<T> parseAllGenericRecords(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.avro.generic.GenericRecord,T> parseFn)

Deprecated. You can achieve the functionality of parseAllGenericRecords(SerializableFunction) using FileIO matching plus parseFilesGenericRecords(SerializableFunction). This is the preferred method to make composition explicit. AvroIO.ParseAll will not receive upgrades and will be removed in a future version of Beam.

Like parseGenericRecords(SerializableFunction), but reads each filepattern in the input PCollection.
write

public static <T> AvroIO.Write<T> write(java.lang.Class<T> recordClass)

Writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern).
writeGenericRecords
public static AvroIO.Write<org.apache.avro.generic.GenericRecord> writeGenericRecords(org.apache.avro.Schema schema)
Writes Avro records of the specified schema.
writeCustomType

@Deprecated public static <UserT> AvroIO.TypedWrite<UserT,java.lang.Void,java.lang.Object> writeCustomType()

Deprecated. Use writeCustomType(Class) instead and provide the custom record class.

A PTransform that writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern), with each element of the input collection encoded into its own record of type OutputT.

This version allows you to apply AvroIO writes to a PCollection of a custom type UserT. A format mechanism that converts the input type UserT to the output type that will be written to the file must be specified. If using a custom DynamicAvroDestinations object, this is done using FileBasedSink.DynamicDestinations.formatRecord(UserT); otherwise, AvroIO.TypedWrite.withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<UserT, OutputT>) can be used to specify a format function.

The advantage of using a custom type is that it allows a user-provided DynamicAvroDestinations object, set via AvroIO.Write.to(DynamicAvroDestinations), to examine the custom type when choosing a destination.

If the output type is GenericRecord, use writeCustomTypeToGenericRecords() instead.
writeCustomType

public static <UserT,OutputT> AvroIO.TypedWrite<UserT,java.lang.Void,OutputT> writeCustomType(java.lang.Class<OutputT> recordClass)

A PTransform that writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern), with each element of the input collection encoded into its own record of type OutputT.

This version allows you to apply AvroIO writes to a PCollection of a custom type UserT. A format mechanism that converts the input type UserT to the output type that will be written to the file must be specified. If using a custom DynamicAvroDestinations object, this is done using FileBasedSink.DynamicDestinations.formatRecord(UserT); otherwise, AvroIO.TypedWrite.withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<UserT, OutputT>) can be used to specify a format function.

The advantage of using a custom type is that it allows a user-provided DynamicAvroDestinations object, set via AvroIO.Write.to(DynamicAvroDestinations), to examine the custom type when choosing a destination.

If the output type is GenericRecord, use writeCustomTypeToGenericRecords() instead.
writeCustomTypeToGenericRecords

public static <UserT> AvroIO.TypedWrite<UserT,java.lang.Void,org.apache.avro.generic.GenericRecord> writeCustomTypeToGenericRecords()

Similar to writeCustomType(), but specialized for the case where the output type is GenericRecord. A schema must be specified either in DynamicAvroDestinations.getSchema(DestinationT) or, if not using dynamic destinations, by using AvroIO.TypedWrite.withSchema(Schema).
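For example (a sketch; MyEvent and the toGenericRecord helper are hypothetical stand-ins for a user type and its conversion logic, and are not part of this API):

    Schema schema = ...;
    PCollection<MyEvent> events = ...;
    events.apply(AvroIO.<MyEvent>writeCustomTypeToGenericRecords()
        .withFormatFunction(event -> toGenericRecord(event, schema))
        .withSchema(schema)
        .to("gs://my_bucket/path/to/events")
        .withSuffix(".avro"));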
writeGenericRecords
public static AvroIO.Write<org.apache.avro.generic.GenericRecord> writeGenericRecords(java.lang.String schema)
Writes Avro records of the specified schema. The schema is specified as a JSON-encoded string.
constantDestinations

public static <UserT,OutputT> DynamicAvroDestinations<UserT,java.lang.Void,OutputT> constantDestinations(org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy filenamePolicy, org.apache.avro.Schema schema, java.util.Map<java.lang.String,java.lang.Object> metadata, org.apache.avro.file.CodecFactory codec, org.apache.beam.sdk.transforms.SerializableFunction<UserT,OutputT> formatFunction)

Returns a DynamicAvroDestinations that always returns the same FileBasedSink.FilenamePolicy, schema, metadata, and codec.
constantDestinations

public static <UserT,OutputT> DynamicAvroDestinations<UserT,java.lang.Void,OutputT> constantDestinations(org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy filenamePolicy, org.apache.avro.Schema schema, java.util.Map<java.lang.String,java.lang.Object> metadata, org.apache.avro.file.CodecFactory codec, org.apache.beam.sdk.transforms.SerializableFunction<UserT,OutputT> formatFunction, @Nullable AvroSink.DatumWriterFactory<OutputT> datumWriterFactory)

Returns a DynamicAvroDestinations that always returns the same FileBasedSink.FilenamePolicy, schema, metadata, and codec.
sink

public static <ElementT> AvroIO.Sink<ElementT> sink(java.lang.Class<ElementT> clazz)

An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements of the given generated class, like write(Class).
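Such a sink is typically passed to FileIO's via(); for example (a sketch; the output path is illustrative):

    PCollection<AvroAutoGenClass> records = ...;
    records.apply(FileIO.<AvroAutoGenClass>write()
        .via(AvroIO.sink(AvroAutoGenClass.class))
        .to("gs://my_bucket/path/to/output")
        .withSuffix(".avro"));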
sink

public static <ElementT extends org.apache.avro.generic.IndexedRecord> AvroIO.Sink<ElementT> sink(org.apache.avro.Schema schema)

An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements with a given (common) schema, like writeGenericRecords(Schema).
sink

public static <ElementT extends org.apache.avro.generic.IndexedRecord> AvroIO.Sink<ElementT> sink(java.lang.String jsonSchema)

An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements with a given (common) schema, like writeGenericRecords(String).
sinkViaGenericRecords

@Deprecated public static <ElementT> AvroIO.Sink<ElementT> sinkViaGenericRecords(org.apache.avro.Schema schema, AvroIO.RecordFormatter<ElementT> formatter)

Deprecated. RecordFormatter will be removed in future versions.

An AvroIO.Sink for use with FileIO.write() and FileIO.writeDynamic(), writing elements by converting each one to a GenericRecord with a given (common) schema, like writeCustomTypeToGenericRecords().