Class StreamAppenderatorDriver

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class StreamAppenderatorDriver
    extends BaseAppenderatorDriver
    This class is specialized for streaming ingestion. In streaming ingestion, the segment lifecycle is like:
     APPENDING -> APPEND_FINISHED -> PUBLISHED
     
    • APPENDING: Segment is available for appending.
    • APPEND_FINISHED: Segment can no longer be updated (no more data can be added) and is waiting to be published.
    • PUBLISHED: Segment is pushed to deep storage, its metadata is published to the metastore, and finally the segment is dropped from local storage.
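    The lifecycle above can be sketched as a simple state machine. The enum and transition check below are illustrative only (SegmentState and canTransitionTo are hypothetical names, not part of the Druid API):

    ```java
    // Illustrative sketch of the streaming segment lifecycle described above.
    enum SegmentState {
        APPENDING,        // segment is available for appending
        APPEND_FINISHED,  // segment is closed for writes, awaiting publish
        PUBLISHED;        // segment pushed to deep storage and registered in the metastore

        // Only a single forward step along the lifecycle is legal.
        boolean canTransitionTo(SegmentState next) {
            return next.ordinal() == this.ordinal() + 1;
        }
    }

    class LifecycleDemo {
        public static void main(String[] args) {
            SegmentState s = SegmentState.APPENDING;
            System.out.println(s.canTransitionTo(SegmentState.APPEND_FINISHED)); // true
            System.out.println(s.canTransitionTo(SegmentState.PUBLISHED));       // false
        }
    }
    ```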
    • Constructor Detail

      • StreamAppenderatorDriver

        public StreamAppenderatorDriver​(Appenderator appenderator,
                                        SegmentAllocator segmentAllocator,
                                        SegmentHandoffNotifierFactory handoffNotifierFactory,
                                        UsedSegmentChecker usedSegmentChecker,
                                        org.apache.druid.segment.loading.DataSegmentKiller dataSegmentKiller,
                                        com.fasterxml.jackson.databind.ObjectMapper objectMapper,
                                        FireDepartmentMetrics metrics)
        Create a driver.
        Parameters:
        appenderator - appenderator
        segmentAllocator - segment allocator
        handoffNotifierFactory - handoff notifier factory
        usedSegmentChecker - used segment checker
        dataSegmentKiller - data segment killer
        objectMapper - object mapper, used for serde of commit metadata
        metrics - FireDepartment metrics
    • Method Detail

      • add

        public AppenderatorDriverAddResult add​(org.apache.druid.data.input.InputRow row,
                                               String sequenceName,
                                               com.google.common.base.Supplier<org.apache.druid.data.input.Committer> committerSupplier,
                                               boolean skipSegmentLineageCheck,
                                               boolean allowIncrementalPersists)
                                        throws IOException
        Add a row. Must not be called concurrently from multiple threads.
        Parameters:
        row - the row to add
        sequenceName - sequenceName for this row's segment
        committerSupplier - supplier of a committer associated with all data that has been added so far, including this row. Not used if allowIncrementalPersists is set to false.
        skipSegmentLineageCheck - if false, lineage validation is performed using the previousSegmentId for this sequence; if true, lineage validation is skipped. For Kafka streaming ingestion this check should be disabled by setting this parameter to true.
        allowIncrementalPersists - whether to allow persist to happen when maxRowsInMemory or intermediate persist period threshold is hit
        Returns:
        AppenderatorDriverAddResult
        Throws:
        IOException - if there is an I/O error while allocating or writing to a segment
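        As a rough analogue of the add() contract (single-threaded appends, with an intermediate persist triggered when an in-memory row threshold is reached), consider this self-contained sketch. MiniDriver, its maxRowsInMemory behavior, and the persist bookkeeping are assumptions for illustration, not Druid code:

        ```java
        import java.util.ArrayList;
        import java.util.List;

        // Hypothetical, simplified analogue of StreamAppenderatorDriver.add():
        // rows accumulate in memory, and when allowIncrementalPersists is true and
        // the maxRowsInMemory threshold is hit, an intermediate persist runs.
        class MiniDriver {
            private final int maxRowsInMemory;
            private final List<String> rowsInMemory = new ArrayList<>();
            private int persistCount = 0;

            MiniDriver(int maxRowsInMemory) {
                this.maxRowsInMemory = maxRowsInMemory;
            }

            // Must not be called concurrently from multiple threads.
            void add(String row, boolean allowIncrementalPersists) {
                rowsInMemory.add(row);
                if (allowIncrementalPersists && rowsInMemory.size() >= maxRowsInMemory) {
                    persist();
                }
            }

            private void persist() {
                persistCount++;
                rowsInMemory.clear(); // persisted rows no longer consume heap
            }

            int getPersistCount() { return persistCount; }
            int getRowsInMemory() { return rowsInMemory.size(); }

            public static void main(String[] args) {
                MiniDriver d = new MiniDriver(3);
                for (int i = 0; i < 7; i++) {
                    d.add("row-" + i, true);
                }
                System.out.println(d.getPersistCount()); // 2
                System.out.println(d.getRowsInMemory()); // 1
            }
        }
        ```

        With allowIncrementalPersists set to false, no intermediate persists occur and rows stay in memory until an explicit persist, mirroring the parameter's description above.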
      • moveSegmentOut

        public void moveSegmentOut​(String sequenceName,
                                   List<SegmentIdWithShardSpec> identifiers)
        Move a set of identifiers out of "active", making way for newer segments. This method exists to support KafkaIndexTask's legacy mode and will be removed in the future. See KafkaIndexTask.runLegacy().
      • persistAsync

        public com.google.common.util.concurrent.ListenableFuture<Object> persistAsync​(org.apache.druid.data.input.Committer committer)
        Persist all data indexed through this driver so far. Returns a future of persisted commitMetadata.

        Should be called after all data has been added through add(InputRow, String, Supplier, boolean, boolean).

        Parameters:
        committer - committer representing all data that has been added so far
        Returns:
        future containing commitMetadata persisted
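        persistAsync returns a future that completes with the persisted commit metadata. The driver itself uses Guava's ListenableFuture, but the asynchronous pattern can be sketched with the JDK's CompletableFuture; the commit-metadata map below is an illustrative stand-in (e.g. stream partition offsets), not the driver's actual metadata type:

        ```java
        import java.util.Map;
        import java.util.concurrent.CompletableFuture;

        class PersistSketch {
            // Hypothetical analogue: persist all in-memory data asynchronously,
            // then complete the future with the committer's metadata.
            static CompletableFuture<Map<String, Long>> persistAsync(Map<String, Long> commitMetadata) {
                return CompletableFuture.supplyAsync(() -> {
                    // ... write in-memory segments to disk here ...
                    return commitMetadata;
                });
            }

            public static void main(String[] args) throws Exception {
                Map<String, Long> offsets = Map.of("partition-0", 42L);
                // Callers typically wait on (or chain from) the returned future.
                Map<String, Long> persisted = persistAsync(offsets).get();
                System.out.println(persisted);
            }
        }
        ```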
      • publish

        public com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> publish​(TransactionalSegmentPublisher publisher,
                                                                                                     org.apache.druid.data.input.Committer committer,
                                                                                                     Collection<String> sequenceNames)
        Execute a task in background to publish all segments corresponding to the given sequence names. The task internally pushes the segments to the deep storage first, and then publishes the metadata to the metadata storage.
        Parameters:
        publisher - segment publisher
        committer - committer
        sequenceNames - a collection of sequence names to be published
        Returns:
        a ListenableFuture for the submitted task which removes published sequenceNames from activeSegments and publishPendingSegments
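        The publish path described above (push segments to deep storage first, then publish metadata, then retire the published sequence names) can be sketched as a staged asynchronous pipeline. PublishSketch and its stage names are hypothetical, and the JDK's CompletableFuture stands in for Guava's ListenableFuture:

        ```java
        import java.util.ArrayList;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Set;
        import java.util.concurrent.CompletableFuture;

        class PublishSketch {
            private final Set<String> activeSequences = new HashSet<>();

            PublishSketch(Set<String> sequences) {
                activeSequences.addAll(sequences);
            }

            // Hypothetical analogue of publish(): push to deep storage, then
            // publish metadata, and finally drop the published sequence names
            // from the active set, mirroring the order described above.
            CompletableFuture<List<String>> publish(Set<String> sequenceNames) {
                List<String> log = new ArrayList<>(); // records stage order
                return CompletableFuture
                    .runAsync(() -> log.add("pushedToDeepStorage"))  // stage 1
                    .thenRun(() -> log.add("publishedMetadata"))     // stage 2
                    .thenApply(v -> {
                        activeSequences.removeAll(sequenceNames);    // stage 3
                        return log;
                    });
            }

            Set<String> getActiveSequences() { return activeSequences; }

            public static void main(String[] args) throws Exception {
                PublishSketch p = new PublishSketch(Set.of("seq-0", "seq-1"));
                System.out.println(p.publish(Set.of("seq-0")).get());
                System.out.println(p.getActiveSequences());
            }
        }
        ```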