Class AvroIO.Read<T>
- java.lang.Object
-
- org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
-
- org.apache.beam.sdk.extensions.avro.io.AvroIO.Read<T>
-
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
- Enclosing class:
- AvroIO
public abstract static class AvroIO.Read<T> extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
Implementation of AvroIO.read(java.lang.Class<T>) and AvroIO.readGenericRecords(org.apache.avro.Schema).
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructor | Description
Read() |
-
Method Summary
Modifier and Type | Method | Description
org.apache.beam.sdk.values.PCollection<T> | expand(org.apache.beam.sdk.values.PBegin input) |
AvroIO.Read<T> | from(java.lang.String filepattern) | Like from(ValueProvider).
AvroIO.Read<T> | from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern) | Reads from the given filename or filepattern.
void | populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder) |
AvroIO.Read<T> | watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition) | Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
AvroIO.Read<T> | watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles) | Continuously watches for new files matching the filepattern, polling it at the given interval, until the given termination condition is reached.
AvroIO.Read<T> | withBeamSchemas(boolean withBeamSchemas) | If set to true, a Beam schema will be inferred from the Avro schema.
AvroIO.Read<T> | withCoder(org.apache.beam.sdk.coders.Coder<T> coder) | Sets a coder for the result of the read function.
AvroIO.Read<T> | withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory) | Sets a custom AvroSource.DatumReaderFactory for reading.
AvroIO.Read<T> | withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment) | Configures whether or not a filepattern matching no files is allowed.
AvroIO.Read<T> | withHintMatchesManyFiles() | Hints that the filepattern specified in from(String) matches a very large number of files.
AvroIO.Read<T> | withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration) | Sets the FileIO.MatchConfiguration.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
-
-
-
Method Detail
-
from
public AvroIO.Read<T> from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
Reads from the given filename or filepattern.
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
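As a rough illustration of what a filepattern selects, the sketch below filters a list of candidate names with a glob. It uses the JDK's java.nio glob matcher as a stand-in, which is an assumption: Beam resolves filepatterns through its own file system layer, and the exact glob rules may differ. The class FilepatternSketch and its match method are hypothetical names used only for this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for filepattern matching; uses java.nio globs, not Beam's file systems. */
final class FilepatternSketch {

    /** Returns the candidates that match the glob, preserving input order. */
    static List<String> match(String filepattern, List<String> candidates) {
        // Assumption: java.nio glob syntax approximates the filepattern syntax.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + filepattern);
        List<String> matched = new ArrayList<>();
        for (String candidate : candidates) {
            if (matcher.matches(Paths.get(candidate))) {
                matched.add(candidate);
            }
        }
        return matched;
    }
}
```

In this model, a single `*` does not cross directory boundaries, so `data/part-*.avro` selects files directly under `data/` only.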
-
from
public AvroIO.Read<T> from(java.lang.String filepattern)
Like from(ValueProvider).
-
withMatchConfiguration
public AvroIO.Read<T> withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
Sets the FileIO.MatchConfiguration.
-
withEmptyMatchTreatment
public AvroIO.Read<T> withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
Configures whether or not a filepattern matching no files is allowed.
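The decision this setting controls can be modeled as follows. The three constant names mirror those of Beam's EmptyMatchTreatment enum (ALLOW, DISALLOW, ALLOW_IF_WILDCARD); everything else here is a hypothetical stand-in, including the class name EmptyMatchSketch and the assumption that `*`, `?`, and `[` are the characters that make a filepattern count as a wildcard pattern.

```java
/** Illustrative model of empty-match handling; not Beam's implementation. */
final class EmptyMatchSketch {

    /** Local stand-in whose constants mirror Beam's EmptyMatchTreatment. */
    enum Treatment { ALLOW, DISALLOW, ALLOW_IF_WILDCARD }

    /** Assumption: these three characters mark a glob wildcard. */
    static boolean hasWildcard(String filepattern) {
        return filepattern.indexOf('*') >= 0
            || filepattern.indexOf('?') >= 0
            || filepattern.indexOf('[') >= 0;
    }

    /** Returns true if a filepattern that matched no files is acceptable. */
    static boolean emptyMatchAllowed(String filepattern, Treatment treatment) {
        switch (treatment) {
            case ALLOW:
                return true;
            case DISALLOW:
                return false;
            case ALLOW_IF_WILDCARD:
                // A pattern naming one exact file is expected to exist.
                return hasWildcard(filepattern);
            default:
                throw new AssertionError("unknown treatment: " + treatment);
        }
    }
}
```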
-
watchForNewFiles
public AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles)
Continuously watches for new files matching the filepattern, polling it at the given interval, until the given termination condition is reached. The returned PCollection is unbounded. If matchUpdatedFiles is set, also watches for files whose timestamp changes.
This works only in runners supporting splittable DoFn.
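The polling behavior described above can be approximated with a small simulation. This is a sketch under stated assumptions, not Beam's implementation: each poll is represented as a snapshot mapping file name to last-modified timestamp, and the termination condition is modeled by a hypothetical maxIdlePolls parameter that stops watching after that many consecutive polls produce no output. The class name WatchSketch and the method emitted are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of watching for new files; not Beam's implementation. */
final class WatchSketch {

    /**
     * Replays a sequence of polls and returns the file names emitted, in order.
     * Each snapshot maps a file name to its last-modified timestamp.
     */
    static List<String> emitted(
            List<Map<String, Long>> polls, int maxIdlePolls, boolean matchUpdatedFiles) {
        Map<String, Long> lastSeen = new HashMap<>();
        List<String> out = new ArrayList<>();
        int idlePolls = 0;
        for (Map<String, Long> snapshot : polls) {
            int before = out.size();
            // Sort names so the output order is deterministic within a poll.
            List<String> names = new ArrayList<>(snapshot.keySet());
            Collections.sort(names);
            for (String name : names) {
                long timestamp = snapshot.get(name);
                Long previous = lastSeen.get(name);
                boolean isNew = previous == null;
                boolean isUpdated = !isNew && previous.longValue() != timestamp;
                if (isNew || (matchUpdatedFiles && isUpdated)) {
                    out.add(name);
                }
                lastSeen.put(name, timestamp);
            }
            // Hypothetical termination condition: too many polls without output.
            idlePolls = (out.size() == before) ? idlePolls + 1 : 0;
            if (idlePolls >= maxIdlePolls) {
                break;
            }
        }
        return out;
    }
}
```

With matchUpdatedFiles set to false, a file is emitted only the first time it appears. With it set to true, the same file is emitted again whenever its timestamp differs from the previously observed one.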
-
watchForNewFiles
public AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition)
Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
-
withHintMatchesManyFiles
public AvroIO.Read<T> withHintMatchesManyFiles()
Hints that the filepattern specified in from(String) matches a very large number of files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).
-
withBeamSchemas
public AvroIO.Read<T> withBeamSchemas(boolean withBeamSchemas)
If set to true, a Beam schema will be inferred from the Avro schema. This allows the output to be used by SQL and by the schema-transform library.
-
withCoder
public AvroIO.Read<T> withCoder(org.apache.beam.sdk.coders.Coder<T> coder)
Sets a coder for the result of the read function.
-
withDatumReaderFactory
public AvroIO.Read<T> withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory)
Sets a custom AvroSource.DatumReaderFactory for reading. Pass an AvroDatumFactory to also use the factory for the default output AvroCoder.
-
expand
public org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
- Specified by:
expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
-
populateDisplayData
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- Specified by:
populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
- Overrides:
populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
-
-