@DoFn.UnboundedPerElement public class DetectNewPartitionsDoFn extends org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
Detects partitions in state PartitionMetadata.State.CREATED, updates their state to PartitionMetadata.State.SCHEDULED and outputs them to the next stage in the pipeline.

Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn:
org.apache.beam.sdk.transforms.DoFn.AlwaysFetched, org.apache.beam.sdk.transforms.DoFn.BoundedPerElement, org.apache.beam.sdk.transforms.DoFn.BundleFinalizer, org.apache.beam.sdk.transforms.DoFn.Element, org.apache.beam.sdk.transforms.DoFn.FieldAccess, org.apache.beam.sdk.transforms.DoFn.FinishBundle, org.apache.beam.sdk.transforms.DoFn.FinishBundleContext, org.apache.beam.sdk.transforms.DoFn.GetInitialRestriction, org.apache.beam.sdk.transforms.DoFn.GetInitialWatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder, org.apache.beam.sdk.transforms.DoFn.GetSize, org.apache.beam.sdk.transforms.DoFn.GetWatermarkEstimatorStateCoder, org.apache.beam.sdk.transforms.DoFn.Key, org.apache.beam.sdk.transforms.DoFn.MultiOutputReceiver, org.apache.beam.sdk.transforms.DoFn.NewTracker, org.apache.beam.sdk.transforms.DoFn.NewWatermarkEstimator, org.apache.beam.sdk.transforms.DoFn.OnTimer, org.apache.beam.sdk.transforms.DoFn.OnTimerContext, org.apache.beam.sdk.transforms.DoFn.OnTimerFamily, org.apache.beam.sdk.transforms.DoFn.OnWindowExpiration, org.apache.beam.sdk.transforms.DoFn.OnWindowExpirationContext, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<T>, org.apache.beam.sdk.transforms.DoFn.ProcessContext, org.apache.beam.sdk.transforms.DoFn.ProcessContinuation, org.apache.beam.sdk.transforms.DoFn.ProcessElement, org.apache.beam.sdk.transforms.DoFn.RequiresStableInput, org.apache.beam.sdk.transforms.DoFn.RequiresTimeSortedInput, org.apache.beam.sdk.transforms.DoFn.Restriction, org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.transforms.DoFn.SideInput, org.apache.beam.sdk.transforms.DoFn.SplitRestriction, org.apache.beam.sdk.transforms.DoFn.StartBundle, org.apache.beam.sdk.transforms.DoFn.StartBundleContext, org.apache.beam.sdk.transforms.DoFn.StateId, org.apache.beam.sdk.transforms.DoFn.Teardown, org.apache.beam.sdk.transforms.DoFn.TimerFamily, org.apache.beam.sdk.transforms.DoFn.TimerId, org.apache.beam.sdk.transforms.DoFn.Timestamp, org.apache.beam.sdk.transforms.DoFn.TruncateRestriction, org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement, org.apache.beam.sdk.transforms.DoFn.WatermarkEstimatorState, org.apache.beam.sdk.transforms.DoFn.WindowedContext

| Constructor and Description |
|---|
| DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics) This class needs a DaoFactory to build DAOs to access the partition metadata tables. |

| Modifier and Type | Method and Description |
|---|---|
| org.joda.time.Instant | getInitialWatermarkEstimatorState(PartitionMetadata partition) |
| double | getSize(TimestampRange restriction) |
| TimestampRange | initialRestriction(PartitionMetadata partition) Uses a TimestampRange with a max range. |
| DetectNewPartitionsRangeTracker | newTracker(TimestampRange restriction) |
| org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> | newWatermarkEstimator(org.joda.time.Instant watermarkEstimatorState) |
| org.apache.beam.sdk.transforms.DoFn.ProcessContinuation | processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator) Main processing function for the DetectNewPartitionsDoFn. |
| void | setAveragePartitionBytesSize(long averagePartitionBytesSize) Sets the average partition bytes size to estimate the backlog of this DoFn. |
| void | setup() Obtains the instance of DetectNewPartitionsAction. |

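The scheduling behaviour documented for this class (find partitions in state CREATED, move them to SCHEDULED, emit them downstream) can be pictured with a small in-memory model. This is only a sketch: `PartitionSchedulingSketch`, its `Partition` type and `detectAndSchedule` are hypothetical names invented for illustration, the list stands in for the partition metadata table, and none of this is the connector's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of one detection pass; not the real connector code. */
class PartitionSchedulingSketch {
    /** Only the two states named in the class description are modelled. */
    enum State { CREATED, SCHEDULED }

    /** Hypothetical stand-in for a row of the partition metadata table. */
    static final class Partition {
        final String token;
        State state;

        Partition(String token, State state) {
            this.token = token;
            this.state = state;
        }
    }

    /**
     * Scans the table, flips every CREATED partition to SCHEDULED,
     * and returns the partitions that would be output to the next stage.
     */
    static List<Partition> detectAndSchedule(List<Partition> table) {
        List<Partition> emitted = new ArrayList<>();
        for (Partition p : table) {
            if (p.state == State.CREATED) {
                p.state = State.SCHEDULED;
                emitted.add(p);
            }
        }
        return emitted;
    }
}
```

Running a second pass over the same table emits nothing, because every partition seen in the first pass has already left the CREATED state.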
public DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics)

This class needs a DaoFactory to build DAOs to access the partition metadata tables. It uses mappers to transform database rows into the PartitionMetadata model. It builds the delegating action class using the ActionFactory. It emits metrics for the partitions read using the ChangeStreamMetrics. It re-schedules the process element function to be executed according to the default resume interval, DEFAULT_RESUME_DURATION (best effort).

Parameters:
daoFactory - the DaoFactory to construct PartitionMetadataDaos
mapperFactory - the MapperFactory to construct PartitionMetadataMappers
actionFactory - the ActionFactory to construct actions
metrics - the ChangeStreamMetrics to emit partition related metrics

@DoFn.GetInitialWatermarkEstimatorState public org.joda.time.Instant getInitialWatermarkEstimatorState(@DoFn.Element PartitionMetadata partition)

@DoFn.NewWatermarkEstimator public org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> newWatermarkEstimator(@DoFn.WatermarkEstimatorState org.joda.time.Instant watermarkEstimatorState)

@DoFn.GetInitialRestriction public TimestampRange initialRestriction(@DoFn.Element PartitionMetadata partition)
Uses a TimestampRange with a max range. This is because it does not know beforehand how many partitions it will schedule.

@DoFn.GetSize public double getSize(@DoFn.Restriction TimestampRange restriction)
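To illustrate why initialRestriction uses a maximal range: if the amount of future work is unknown, a restriction that spans every possible position never runs out before the work does. The sketch below models that idea with plain `long` positions. `MaxRangeTrackerSketch` and its methods are hypothetical names, and the logic is an assumption made for illustration; it does not reproduce TimestampRange or DetectNewPartitionsRangeTracker.

```java
/** Hypothetical tracker over a half-open range [from, to); not Beam's API. */
class MaxRangeTrackerSketch {
    private final long from;
    private final long to; // exclusive upper bound
    private long lastClaimed;
    private boolean claimedAny;

    MaxRangeTrackerSketch(long from, long to) {
        this.from = from;
        this.to = to;
    }

    /** Builds the widest range representable with long positions. */
    static MaxRangeTrackerSketch maxRange() {
        return new MaxRangeTrackerSketch(0L, Long.MAX_VALUE);
    }

    /** Claims a position if it lies inside the range; returns false otherwise. */
    boolean tryClaim(long position) {
        if (position < from || position >= to) {
            return false;
        }
        lastClaimed = position;
        claimedAny = true;
        return true;
    }

    /** Returns the most recently claimed position, or -1 if nothing was claimed. */
    long lastClaimedOrMinusOne() {
        return claimedAny ? lastClaimed : -1L;
    }
}
```

With a narrow range such as `new MaxRangeTrackerSketch(0, 10)`, a claim at position 10 is rejected; with `maxRange()`, any position below `Long.MAX_VALUE` is accepted, so processing can continue for as long as new partitions keep appearing.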
@DoFn.NewTracker public DetectNewPartitionsRangeTracker newTracker(@DoFn.Restriction TimestampRange restriction)
@DoFn.Setup public void setup()
Obtains the instance of DetectNewPartitionsAction.

@DoFn.ProcessElement public org.apache.beam.sdk.transforms.DoFn.ProcessContinuation processElement(org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, org.apache.beam.sdk.transforms.DoFn.OutputReceiver<PartitionMetadata> receiver, org.apache.beam.sdk.transforms.splittabledofn.ManualWatermarkEstimator<org.joda.time.Instant> watermarkEstimator)

Main processing function for the DetectNewPartitionsDoFn. It delegates to the DetectNewPartitionsAction class.

public void setAveragePartitionBytesSize(long averagePartitionBytesSize)

Sets the average partition bytes size to estimate the backlog of this DoFn.

Parameters:
averagePartitionBytesSize - the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize)
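
The page states that the average partition size feeds the backlog bytes calculation but does not give the formula. As a rough sketch only, assuming the backlog is approximated as the average size multiplied by the number of partitions still waiting to be scheduled, the relationship could look like the following. `BacklogEstimateSketch`, `estimateBacklogBytes` and the `pendingPartitions` parameter are hypothetical and are not part of the documented API.

```java
/** Hypothetical backlog estimator; the multiplication rule is an assumption. */
class BacklogEstimateSketch {
    private long averagePartitionBytesSize;

    /** Mirrors the documented setter name; stores the average record size in bytes. */
    void setAveragePartitionBytesSize(long averagePartitionBytesSize) {
        this.averagePartitionBytesSize = averagePartitionBytesSize;
    }

    /**
     * Estimates the backlog in bytes as average size times pending partition count.
     * pendingPartitions is a made-up input used only for this illustration.
     */
    double estimateBacklogBytes(long pendingPartitions) {
        return (double) averagePartitionBytesSize * pendingPartitions;
    }
}
```

The result is a double, matching the return type of getSize, so a runner could compare it against other size estimates without further conversion.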