Class AvroIO.Read<T>

    • Field Summary

      • Fields inherited from class org.apache.beam.sdk.transforms.PTransform

        annotations, displayData, name, resourceHints
    • Constructor Summary

      Constructors 
      Constructor Description
      Read()  
    • Method Summary

      Modifier and Type Method Description
      org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
      AvroIO.Read<T> from(java.lang.String filepattern)
      AvroIO.Read<T> from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
      Reads from the given filename or filepattern.
      void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
      AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition)
      Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
      AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles)
      Continuously watches for new files matching the filepattern, polling it at the given interval, until the given termination condition is reached.
      AvroIO.Read<T> withBeamSchemas(boolean withBeamSchemas)
      If set to true, a Beam schema will be inferred from the Avro schema.
      AvroIO.Read<T> withCoder(org.apache.beam.sdk.coders.Coder<T> coder)
      Sets a coder for the result of the read function.
      AvroIO.Read<T> withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory)
      Sets a custom AvroSource.DatumReaderFactory for reading.
      AvroIO.Read<T> withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
      Configures whether or not a filepattern matching no files is allowed.
      AvroIO.Read<T> withHintMatchesManyFiles()
      Hints that the filepattern specified in from(String) matches a very large number of files.
      AvroIO.Read<T> withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
      Sets the FileIO.MatchConfiguration.
      • Methods inherited from class org.apache.beam.sdk.transforms.PTransform

        addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Read

        public Read()
    • Method Detail

      • from

        public AvroIO.Read<T> from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
        Reads from the given filename or filepattern.

        If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
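        A minimal sketch of configuring a read with from(String), optionally combined with withHintMatchesManyFiles(). It assumes the AvroIO class from the Beam Avro extension module (org.apache.beam.sdk.extensions.avro.io; older releases may expose it under org.apache.beam.sdk.io). The User schema, the class name, and the method names are hypothetical and exist only for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;

public class FromExample {
  // Hypothetical Avro schema, used only for illustration.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\","
          + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

  /** Builds a read of GenericRecords from the given filename or filepattern. */
  static AvroIO.Read<GenericRecord> buildRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema).from(filepattern);
  }

  /** Same read, with the hint for patterns expected to match very many files. */
  static AvroIO.Read<GenericRecord> buildLargeRead(String filepattern) {
    return buildRead(filepattern).withHintMatchesManyFiles();
  }
}
```

        The returned transform would typically be applied to a pipeline root, for example pipeline.apply(FromExample.buildRead("/data/users-*.avro")), where the path is a placeholder.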

      • withMatchConfiguration

        public AvroIO.Read<T> withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
        Sets the FileIO.MatchConfiguration.
      • withEmptyMatchTreatment

        public AvroIO.Read<T> withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
        Configures whether or not a filepattern matching no files is allowed.
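        As an illustration, the sketch below allows a filepattern that matches no files by passing EmptyMatchTreatment.ALLOW. The package of AvroIO (org.apache.beam.sdk.extensions.avro.io) is an assumption about the Beam release in use, and the Event schema and class name are hypothetical.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;
import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;

public class EmptyMatchExample {
  // Hypothetical Avro schema, used only for illustration.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\","
          + "\"fields\":[{\"name\":\"ts\",\"type\":\"long\"}]}";

  /** Builds a read that tolerates a filepattern matching zero files. */
  static AvroIO.Read<GenericRecord> buildLenientRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW);
  }
}
```

        The other enum constants, DISALLOW and ALLOW_IF_WILDCARD, can be passed in the same way when an empty match should be rejected or accepted only for wildcard patterns.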
      • watchForNewFiles

        public AvroIO.Read<T> watchForNewFiles(org.joda.time.Duration pollInterval,
                                               org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition,
                                               boolean matchUpdatedFiles)
        Continuously watches for new files matching the filepattern, polling it at the given interval, until the given termination condition is reached. The returned PCollection is unbounded. If matchUpdatedFiles is true, also watches for files whose timestamp has changed.

        This works only in runners supporting splittable DoFn.
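        A hedged sketch of a continuously watching read: it polls every 30 seconds, stops after one hour without new output, and also picks up files whose timestamp changes. The poll interval, the termination condition, the Event schema, and the class name are illustrative choices, and the AvroIO package (org.apache.beam.sdk.extensions.avro.io) is an assumption about the Beam release in use.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;
import org.apache.beam.sdk.transforms.Watch;
import org.joda.time.Duration;

public class WatchExample {
  // Hypothetical Avro schema, used only for illustration.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\","
          + "\"fields\":[{\"name\":\"ts\",\"type\":\"long\"}]}";

  /** Builds an unbounded read that keeps polling the filepattern for new files. */
  static AvroIO.Read<GenericRecord> buildWatchingRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .watchForNewFiles(
            Duration.standardSeconds(30), // pollInterval
            Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1)),
            true); // matchUpdatedFiles
  }
}
```

        Watch.Growth.never() is an alternative termination condition when the read should keep watching indefinitely.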

      • withHintMatchesManyFiles

        public AvroIO.Read<T> withHintMatchesManyFiles()
        Hints that the filepattern specified in from(String) matches a very large number of files.

        This hint may cause a runner to execute the transform differently, in a way that improves performance for this case. It may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).

      • withBeamSchemas

        public AvroIO.Read<T> withBeamSchemas(boolean withBeamSchemas)
        If set to true, a Beam schema will be inferred from the Avro schema. This allows the output to be used by SQL and by the schema-transform library.
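        A short sketch of enabling schema inference on a GenericRecord read, so that the resulting PCollection carries a Beam schema. The User schema and class name are hypothetical, and the AvroIO package (org.apache.beam.sdk.extensions.avro.io) is an assumption about the Beam release in use.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;

public class BeamSchemaExample {
  // Hypothetical Avro schema, used only for illustration.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\","
          + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}";

  /** Builds a read whose output has a Beam schema inferred from the Avro schema. */
  static AvroIO.Read<GenericRecord> buildSchemaAwareRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .withBeamSchemas(true);
  }
}
```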
      • withCoder

        public AvroIO.Read<T> withCoder(org.apache.beam.sdk.coders.Coder<T> coder)
        Sets a coder for the result of the read function.
      • expand

        public org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
      • populateDisplayData

        public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>