Interface IndexerMetadataStorageCoordinator

    • Method Detail

      • retrieveUsedSegmentsForInterval

        default Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForInterval(String dataSource,
                                                                                                  org.joda.time.Interval interval,
                                                                                                  Segments visibility)
        Retrieve from the metadata store all published segments that are marked as used and may include any data in the interval. The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in the collection only once.
        Parameters:
        dataSource - The data source to query
        interval - The interval for which all applicable and used segments are requested.
        visibility - Whether only visible or visible as well as overshadowed segments should be returned. The visibility is considered within the specified interval: that is, a segment which is visible outside of the specified interval, but overshadowed within the specified interval, will not be returned if Segments.ONLY_VISIBLE is passed. See the documentation of Segments for a more precise description.
        Returns:
        The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
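A minimal sketch of the interval semantics described above, assuming half-open intervals (as Joda-Time uses). The records `Span` and `Seg` are hypothetical stand-ins for `org.joda.time.Interval` and `DataSegment`. It shows why returned segments may extend beyond the requested interval: a used segment qualifies as soon as its interval overlaps the query. Visibility filtering (`Segments.ONLY_VISIBLE`) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class UsedOverlapSketch {
  // Keeps used segments whose interval overlaps the query interval, i.e. segments
  // that "may include any data in the interval"; they can extend past its bounds.
  static List<Seg> usedOverlapping(List<Seg> all, Span query) {
    List<Seg> out = new ArrayList<>();
    for (Seg s : all) {
      if (s.used() && s.interval().overlaps(query)) {
        out.add(s);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Seg> all = List.of(
        new Seg("a", new Span(0, 10), true),   // overlaps the query and starts before it
        new Seg("b", new Span(10, 20), true),  // starts at the query's exclusive end: no overlap
        new Seg("c", new Span(6, 8), false));  // overlaps but is marked unused
    for (Seg s : usedOverlapping(all, new Span(5, 10))) {
      System.out.println(s.id());
    }
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end) in millis.
record Span(long start, long end) {
  boolean overlaps(Span other) {
    return start < other.end && other.start < end;
  }
}

// Hypothetical, heavily simplified stand-in for DataSegment.
record Seg(String id, Span interval, boolean used) {}
```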
      • retrieveAllUsedSegments

        Collection<org.apache.druid.timeline.DataSegment> retrieveAllUsedSegments(String dataSource,
                                                                                  Segments visibility)
        Retrieve all published used segments in the data source from the metadata store.
        Parameters:
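        dataSource - The data source to query
        visibility - Whether only visible or visible as well as overshadowed segments should be returned.
        Returns:
        all used segments belonging to the given data source
        See Also:
        similar to this method but also accepts a data interval.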
      • retrieveUsedSegmentsAndCreatedDates

        Collection<org.apache.druid.java.util.common.Pair<org.apache.druid.timeline.DataSegment,String>> retrieveUsedSegmentsAndCreatedDates(String dataSource,
                                                                                                                                                   List<org.joda.time.Interval> intervals)
        Retrieve from the metadata store all published segments that are marked as used and belong to the given data source and list of intervals, together with the created_date of each segment. Unlike other similar methods in this interface, this method doesn't accept a Segments "visibility" parameter. The returned collection may include overshadowed segments and their created_dates, as if Segments.INCLUDING_OVERSHADOWED was passed. It's the responsibility of the caller to filter out overshadowed ones if needed.
        Parameters:
        dataSource - The data source to query
        intervals - The list of intervals to query
        Returns:
        The DataSegments paired with their created_date values
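Because the caller must filter out overshadowed segments itself, here is a minimal sketch of such a filter. It assumes a deliberately simplified overshadowing rule: a segment is overshadowed when another segment with a strictly greater version string fully contains its interval. Druid's real rule also accounts for partitions and is not reproduced here. `Map.Entry` stands in for `org.apache.druid.java.util.common.Pair`; `Span` and `Seg` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class OvershadowFilterSketch {
  // Simplified rule (assumption): overshadowed if another segment with a strictly
  // greater version contains this segment's whole interval.
  static boolean isOvershadowed(Seg s, List<Seg> all) {
    for (Seg other : all) {
      if (other.version().compareTo(s.version()) > 0
          && other.interval().contains(s.interval())) {
        return true;
      }
    }
    return false;
  }

  // Keeps only (segment, created_date) pairs whose segment is not overshadowed.
  static List<Map.Entry<Seg, String>> dropOvershadowed(List<Map.Entry<Seg, String>> pairs) {
    List<Seg> segments = new ArrayList<>();
    for (Map.Entry<Seg, String> p : pairs) {
      segments.add(p.getKey());
    }
    List<Map.Entry<Seg, String>> out = new ArrayList<>();
    for (Map.Entry<Seg, String> p : pairs) {
      if (!isOvershadowed(p.getKey(), segments)) {
        out.add(p);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Seg older = new Seg("older", new Span(0, 10), "2024-01-01T00:00:00.000Z");
    Seg newer = new Seg("newer", new Span(0, 10), "2024-02-01T00:00:00.000Z");
    List<Map.Entry<Seg, String>> pairs = List.of(
        Map.entry(older, "2024-01-01T00:05:00.000Z"),
        Map.entry(newer, "2024-02-01T00:05:00.000Z"));
    for (Map.Entry<Seg, String> p : dropOvershadowed(pairs)) {
      System.out.println(p.getKey().id() + " created " + p.getValue());
    }
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end).
record Span(long start, long end) {
  boolean contains(Span other) {
    return start <= other.start && other.end <= end;
  }
}

// Hypothetical stand-in for DataSegment with a comparable version string.
record Seg(String id, Span interval, String version) {}
```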
      • retrieveUsedSegmentsForIntervals

        Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForIntervals(String dataSource,
                                                                                           List<org.joda.time.Interval> intervals,
                                                                                           Segments visibility)
        Retrieve from the metadata store all published segments that are marked as used and may include any data in the given intervals. The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in the collection only once.
        Parameters:
        dataSource - The data source to query
        intervals - The intervals for which all applicable and used segments are requested.
        visibility - Whether only visible or visible as well as overshadowed segments should be returned. The visibility is considered within the specified intervals: that is, a segment which is visible outside of the specified intervals, but overshadowed within the specified intervals, will not be returned if Segments.ONLY_VISIBLE is passed. See the documentation of Segments for a more precise description.
        Returns:
        The DataSegments which include data in the requested intervals. These segments may contain data outside the requested intervals.
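A minimal sketch of the "each segment appears only once" guarantee when a segment overlaps several requested intervals. `Span` and `Seg` are hypothetical stand-ins; a `LinkedHashSet` performs the deduplication, and its insertion order is an artifact of the sketch, since the documented order is unspecified.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MultiIntervalSketch {
  // Collects used segments overlapping any query interval; the Set removes
  // duplicates because records have value-based equals/hashCode.
  static Set<Seg> overlappingAny(List<Seg> used, List<Span> intervals) {
    Set<Seg> out = new LinkedHashSet<>();
    for (Span q : intervals) {
      for (Seg s : used) {
        if (s.interval().overlaps(q)) {
          out.add(s);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Seg> used = List.of(new Seg("month", new Span(0, 100)), new Seg("late", new Span(90, 100)));
    // "month" overlaps both query intervals but is returned once; "late" overlaps neither.
    System.out.println(overlappingAny(used, List.of(new Span(0, 10), new Span(50, 60))).size()); // prints 1
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end).
record Span(long start, long end) {
  boolean overlaps(Span other) {
    return start < other.end && other.start < end;
  }
}

// Hypothetical stand-in for a used DataSegment.
record Seg(String id, Span interval) {}
```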
      • retrieveUnusedSegmentsForInterval

        List<org.apache.druid.timeline.DataSegment> retrieveUnusedSegmentsForInterval(String dataSource,
                                                                                      org.joda.time.Interval interval,
                                                                                      @Nullable
                                                                                      Integer limit,
                                                                                      @Nullable
                                                                                      org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
        Retrieve from the metadata store all published segments that are marked as unused and include ONLY data within the given interval.
        Parameters:
        dataSource - The data source the segments belong to
        interval - Filter the data segments to ones that include data in this interval exclusively.
        limit - The maximum number of unused segments to retrieve. If null, no limit is applied.
        maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in the interval with a used_status_last_updated no later than this time will be included in the kill task. Segments without a used_status_last_updated time (due to an upgrade from legacy Druid) are not filtered by maxUsedStatusLastUpdatedTime.
        Returns:
        DataSegments which include ONLY data within the requested interval and are marked as unused. Segments NOT returned here may include data in the interval.
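The filtering rules above can be sketched in memory as follows. This is an illustration, not the SQL the metadata store runs. The `Seg` record is hypothetical; its nullable `usedStatusLastUpdated` (epoch millis) stands for legacy rows lacking the column. Note the containment test (the segment must lie entirely inside the interval), in contrast to the overlap test used for used segments.

```java
import java.util.ArrayList;
import java.util.List;

final class UnusedSegmentsSketch {
  static List<Seg> unusedWithin(List<Seg> all, Span interval, Integer limit, Long maxUsedStatusLastUpdated) {
    List<Seg> out = new ArrayList<>();
    for (Seg s : all) {
      if (limit != null && out.size() >= limit) {
        break; // a null limit means no limit
      }
      if (s.used() || !interval.contains(s.interval())) {
        continue; // must be unused and hold ONLY data inside the interval
      }
      Long lastUpdated = s.usedStatusLastUpdated();
      if (maxUsedStatusLastUpdated != null && lastUpdated != null && lastUpdated > maxUsedStatusLastUpdated) {
        continue; // updated too recently; legacy rows (null) are not filtered
      }
      out.add(s);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Seg> all = List.of(
        new Seg("a", new Span(0, 5), false, 100L),
        new Seg("b", new Span(5, 15), false, 100L),  // extends past the interval
        new Seg("c", new Span(5, 10), false, 500L),  // too recent for the bound below
        new Seg("d", new Span(2, 3), false, null),   // legacy row without the timestamp
        new Seg("e", new Span(1, 2), true, 100L));   // still used
    for (Seg s : unusedWithin(all, new Span(0, 10), null, 200L)) {
      System.out.println(s.id());
    }
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end).
record Span(long start, long end) {
  boolean contains(Span other) {
    return start <= other.start && other.end <= end;
  }
}

// Hypothetical segment row; usedStatusLastUpdated may be null for legacy rows.
record Seg(String id, Span interval, boolean used, Long usedStatusLastUpdated) {}
```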
      • markSegmentsAsUnusedWithinInterval

        int markSegmentsAsUnusedWithinInterval(String dataSource,
                                               org.joda.time.Interval interval)
        Mark as unused segments which include ONLY data within the given interval.
        Parameters:
        dataSource - The data source the segments belong to
        interval - Filter the data segments to ones that include data in this interval exclusively.
        Returns:
        number of segments marked unused
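An in-memory sketch of the same containment rule applied to marking. The mutable `Seg` class is hypothetical, and counting only segments that were previously used is an assumption of this sketch, not something the documentation states.

```java
import java.util.List;

final class MarkUnusedSketch {
  // Flips used -> unused for segments lying entirely inside [start, end) and
  // returns how many changed (assumption: already-unused segments are not counted).
  static int markUnusedWithin(List<Seg> segments, long start, long end) {
    int marked = 0;
    for (Seg s : segments) {
      if (s.used && start <= s.start && s.end <= end) {
        s.used = false;
        marked++;
      }
    }
    return marked;
  }

  public static void main(String[] args) {
    List<Seg> segments = List.of(new Seg(0, 5), new Seg(5, 10), new Seg(8, 12));
    System.out.println(markUnusedWithin(segments, 0, 10)); // prints 2
    System.out.println(markUnusedWithin(segments, 0, 10)); // prints 0
  }
}

// Hypothetical mutable segment row: half-open [start, end) plus a used flag.
final class Seg {
  final long start;
  final long end;
  boolean used = true;

  Seg(long start, long end) {
    this.start = start;
    this.end = end;
  }
}
```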
      • commitSegments

        Set<org.apache.druid.timeline.DataSegment> commitSegments(Set<org.apache.druid.timeline.DataSegment> segments)
                                                           throws IOException
        Attempts to insert a set of segments into the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added).
        Parameters:
        segments - set of segments to add
        Returns:
        set of segments actually added
        Throws:
        IOException
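The "set of segments actually added" contract can be sketched with an in-memory set of identifiers, a hypothetical stand-in for the segments table: identifiers already present are skipped and left out of the result.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class CommitSegmentsSketch {
  private final Set<String> storedIds = new HashSet<>(); // hypothetical segments table

  // Returns only the identifiers that were newly inserted.
  Set<String> commitSegments(Set<String> segmentIds) {
    Set<String> added = new LinkedHashSet<>();
    for (String id : segmentIds) {
      if (storedIds.add(id)) { // add() returns false when the id already exists
        added.add(id);
      }
    }
    return added;
  }

  public static void main(String[] args) {
    CommitSegmentsSketch store = new CommitSegmentsSketch();
    System.out.println(store.commitSegments(Set.of("seg1", "seg2")).size()); // prints 2
    System.out.println(store.commitSegments(Set.of("seg2", "seg3")));        // prints [seg3]
  }
}
```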
      • allocatePendingSegments

        Map<SegmentCreateRequest,SegmentIdWithShardSpec> allocatePendingSegments(String dataSource,
                                                                                       org.joda.time.Interval interval,
                                                                                       boolean skipSegmentLineageCheck,
                                                                                       List<SegmentCreateRequest> requests)
        Allocates pending segments for the given requests in the pending segments table. The segment id allocated for a request will not be given out again unless a request is made with the same SegmentCreateRequest.
        Parameters:
        dataSource - dataSource for which to allocate a segment
        interval - interval for which to allocate a segment
        skipSegmentLineageCheck - if true, skip lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in the same order
        requests - Requests for which to allocate segments. All the requests must share the same partition space.
        Returns:
        Map from request to allocated segment id. The map does not contain entries for failed requests.
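A minimal sketch of the idempotence guarantee: repeating an identical request yields the identifier already allocated for it, while distinct requests in the shared partition space get distinct partition numbers. The `Request` record is a hypothetical, reduced stand-in for `SegmentCreateRequest`; the string identifiers are illustrative, not Druid's segment id format; failed requests are not modeled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchAllocatorSketch {
  private final Map<Request, String> allocated = new HashMap<>(); // hypothetical pending segments table
  private int nextPartition = 0; // one partition space shared by all requests

  Map<Request, String> allocatePendingSegments(String dataSource, String interval, List<Request> requests) {
    Map<Request, String> result = new LinkedHashMap<>();
    for (Request r : requests) {
      // An identical request gets back the id it was given before.
      String id = allocated.computeIfAbsent(r, k -> dataSource + "_" + interval + "_" + nextPartition++);
      result.put(r, id);
    }
    return result;
  }

  public static void main(String[] args) {
    BatchAllocatorSketch allocator = new BatchAllocatorSketch();
    Request r1 = new Request("seq-a", null);
    Request r2 = new Request("seq-b", null);
    System.out.println(allocator.allocatePendingSegments("ds", "2024-01-01/2024-01-02", List.of(r1, r2)));
    System.out.println(allocator.allocatePendingSegments("ds", "2024-01-01/2024-01-02", List.of(r1)));
  }
}

// Hypothetical reduced stand-in for SegmentCreateRequest.
record Request(String sequenceName, String previousSegmentId) {}
```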
      • allocatePendingSegment

        SegmentIdWithShardSpec allocatePendingSegment(String dataSource,
                                                      String sequenceName,
                                                      @Nullable
                                                      String previousSegmentId,
                                                      org.joda.time.Interval interval,
                                                      org.apache.druid.timeline.partition.PartialShardSpec partialShardSpec,
                                                      String maxVersion,
                                                      boolean skipSegmentLineageCheck)
        Allocate a new pending segment in the pending segments table. This segment identifier will never be given out again, unless another call is made with the same dataSource, sequenceName, and previousSegmentId.

        The sequenceName and previousSegmentId parameters are meant to make it easy for two independent ingestion tasks to produce the same series of segments.

        Note that a segment sequence may include segments with a variety of different intervals and versions.

        Parameters:
        dataSource - dataSource for which to allocate a segment
        sequenceName - name of the group of ingestion tasks producing a segment series
        previousSegmentId - previous segment in the series; may be null or empty, meaning this is the first segment
        interval - interval for which to allocate a segment
        partialShardSpec - partialShardSpec containing all necessary information to create a shardSpec for the new segmentId
        maxVersion - use this version if we have no better version to use. The returned segment identifier may have a version lower than this one, but will not have one higher.
        skipSegmentLineageCheck - if true, skip lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in the same order
        Returns:
        the pending segment identifier, or null if it was impossible to allocate a new segment
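A minimal sketch of how keying on (dataSource, sequenceName, previousSegmentId) lets two replica tasks obtain the same identifier, and how maxVersion caps the version. Treating null and empty previousSegmentId alike follows the parameter description above. The `existingVersion` argument, the rule that returns null when it exceeds maxVersion, and the id format are hypothetical simplifications; partialShardSpec and the lineage check are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

final class PendingAllocatorSketch {
  private final Map<Key, String> pending = new HashMap<>(); // hypothetical pending segments table
  private int counter = 0;

  // existingVersion is hypothetical: a version already in use for the interval, or null.
  String allocatePendingSegment(String dataSource, String sequenceName, String previousSegmentId,
                                String interval, String maxVersion, String existingVersion) {
    // Null and empty both mean "first segment of the sequence".
    String previous = previousSegmentId == null ? "" : previousSegmentId;
    Key key = new Key(dataSource, sequenceName, previous);
    String known = pending.get(key);
    if (known != null) {
      return known; // same dataSource, sequenceName and previousSegmentId: same id
    }
    String version;
    if (existingVersion == null) {
      version = maxVersion; // no better version to use
    } else if (existingVersion.compareTo(maxVersion) <= 0) {
      version = existingVersion; // a lower or equal version is allowed
    } else {
      return null; // would exceed maxVersion: treated as impossible (assumption)
    }
    String id = dataSource + "_" + interval + "_" + version + "_" + counter++;
    pending.put(key, id);
    return id;
  }

  public static void main(String[] args) {
    PendingAllocatorSketch allocator = new PendingAllocatorSketch();
    String first = allocator.allocatePendingSegment("ds", "seq", null, "I", "v2", null);
    String replica = allocator.allocatePendingSegment("ds", "seq", "", "I", "v2", null);
    System.out.println(first.equals(replica)); // prints true
  }
}

// Hypothetical idempotence key.
record Key(String dataSource, String sequenceName, String previousSegmentId) {}
```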
      • deletePendingSegmentsCreatedInInterval

        int deletePendingSegmentsCreatedInInterval(String dataSource,
                                                   org.joda.time.Interval deleteInterval)
        Delete pending segments created in the given interval belonging to the given data source from the pending segments table. The created_date field of the pending segments table is checked to find segments to be deleted. Note that the semantics of the interval (for `created_date`s) differ from those of the interval parameters in some other methods in this interface, such as retrieveUsedSegmentsForInterval(java.lang.String, org.joda.time.Interval, org.apache.druid.indexing.overlord.Segments) (where the interval refers to the values of the time column in rows belonging to the segment).
        Parameters:
        dataSource - the data source whose pending segments are deleted
        deleteInterval - interval against which the created_date of pending segments is checked
        Returns:
        number of deleted pending segments
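A minimal in-memory sketch of the created_date filter. The `Pending` record is a hypothetical reduced row, and treating the interval as half-open [start, end), as Joda-Time intervals are, is an assumption about the boundary handling.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class DeletePendingSketch {
  // Removes pending segments of dataSource whose created_date lies in [start, end).
  // Only the creation time is checked, not the data interval of the segment.
  static int deleteCreatedIn(List<Pending> table, String dataSource, Instant start, Instant end) {
    int before = table.size();
    table.removeIf(p -> p.dataSource().equals(dataSource)
        && !p.createdDate().isBefore(start)
        && p.createdDate().isBefore(end));
    return before - table.size();
  }

  public static void main(String[] args) {
    List<Pending> table = new ArrayList<>(List.of(
        new Pending("ds", Instant.parse("2024-01-01T10:00:00Z")),
        new Pending("ds", Instant.parse("2024-01-03T10:00:00Z")),
        new Pending("other", Instant.parse("2024-01-01T11:00:00Z"))));
    int deleted = deleteCreatedIn(table, "ds",
        Instant.parse("2024-01-01T00:00:00Z"), Instant.parse("2024-01-02T00:00:00Z"));
    System.out.println(deleted); // prints 1
  }
}

// Hypothetical pending segments row, reduced to the columns the filter needs.
record Pending(String dataSource, Instant createdDate) {}
```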
      • commitSegmentsAndMetadata

        SegmentPublishResult commitSegmentsAndMetadata(Set<org.apache.druid.timeline.DataSegment> segments,
                                                       @Nullable
                                                       DataSourceMetadata startMetadata,
                                                       @Nullable
                                                       DataSourceMetadata endMetadata)
                                                throws IOException
        Attempts to insert a set of segments into the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added).

        If startMetadata and endMetadata are set, this insertion will be atomic with a compare-and-swap on dataSource commit metadata.

        Parameters:
        segments - set of segments to add, must all be from the same dataSource
        startMetadata - dataSource metadata pre-insert must match this startMetadata according to DataSourceMetadata.matches(DataSourceMetadata). If null, this insert will not involve a metadata transaction
        endMetadata - dataSource metadata post-insert will have this endMetadata merged in with DataSourceMetadata.plus(DataSourceMetadata). If null, this insert will not involve a metadata transaction
        Returns:
        segment publish result indicating transaction success or failure, and set of segments actually published. This method must only return a failure code if it is sure that the transaction did not happen. If it is not sure, it must throw an exception instead.
        Throws:
        IllegalArgumentException - if startMetadata and endMetadata are not either both null or both non-null
        RuntimeException - if the state of metadata storage after this call is unknown
        IOException
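A minimal in-memory sketch of the compare-and-swap contract. `Map<String, Long>` is a hypothetical stand-in for DataSourceMetadata (for example, stream partition offsets). "Matches" is simplified to equality and "plus" to a key-wise overwrite, and `PublishResult` is a hypothetical stand-in for SegmentPublishResult. `synchronized` makes the check and both writes one atomic step.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class CasCommitSketch {
  private final Set<String> segmentIds = new HashSet<>();
  private Map<String, Long> committed = new HashMap<>(); // hypothetical: partition -> offset

  synchronized PublishResult commit(Set<String> segments, Map<String, Long> start, Map<String, Long> end) {
    if ((start == null) != (end == null)) {
      throw new IllegalArgumentException("start and end metadata must be both null or both non-null");
    }
    if (start != null) {
      if (!committed.equals(start)) { // simplified "matches"
        return new PublishResult(false, Set.of()); // failure: nothing was written
      }
      Map<String, Long> merged = new HashMap<>(committed);
      merged.putAll(end); // simplified "plus"
      committed = merged;
    }
    Set<String> added = new LinkedHashSet<>();
    for (String id : segments) {
      if (segmentIds.add(id)) {
        added.add(id);
      }
    }
    return new PublishResult(true, added);
  }

  public static void main(String[] args) {
    CasCommitSketch store = new CasCommitSketch();
    System.out.println(store.commit(Set.of("s1"), Map.of(), Map.of("p0", 5L)).success());          // prints true
    System.out.println(store.commit(Set.of("s2"), Map.of(), Map.of("p0", 9L)).success());          // prints false
    System.out.println(store.commit(Set.of("s2"), Map.of("p0", 5L), Map.of("p0", 9L)).success()); // prints true
  }
}

// Hypothetical stand-in for SegmentPublishResult.
record PublishResult(boolean success, Set<String> published) {}
```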
      • commitAppendSegments

        SegmentPublishResult commitAppendSegments(Set<org.apache.druid.timeline.DataSegment> appendSegments,
                                                  Map<org.apache.druid.timeline.DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock)
        Commits segments created by an APPEND task. This method also handles segment upgrade scenarios that may result from concurrent append and replace.
        • If a REPLACE task committed a segment that overlaps with any of the appendSegments while this APPEND task was in progress, the appendSegments are upgraded to the version of the replace segment.
        • If an appendSegment is covered by a currently active REPLACE lock, then an entry is created for it in the upgrade_segments table, so that when the REPLACE task finishes, it can upgrade the appendSegment as required.
        Parameters:
        appendSegments - All segments created by an APPEND task that must be committed in a single transaction.
        appendSegmentToReplaceLock - Map from append segment to the currently active REPLACE lock (if any) covering it
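The two upgrade rules above can be sketched separately with two hypothetical helpers. `targetVersion` picks the highest version among overlapping REPLACE segments when it exceeds the append segment's own. `needUpgradeEntry` lists append segments with a covering REPLACE lock. The `String` lock value stands in for `ReplaceTaskLock`; comparing versions as strings is an assumption; the transaction and the upgrade_segments table are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AppendUpgradeSketch {
  // Version an append segment is upgraded to if a REPLACE segment committed
  // meanwhile overlaps it with a higher version; otherwise its own version.
  static String targetVersion(Seg append, List<Seg> committedReplaceSegments) {
    String best = append.version();
    for (Seg r : committedReplaceSegments) {
      if (r.interval().overlaps(append.interval()) && r.version().compareTo(best) > 0) {
        best = r.version();
      }
    }
    return best;
  }

  // Append segments covered by an active REPLACE lock need an upgrade_segments entry.
  static List<String> needUpgradeEntry(Set<Seg> appendSegments, Map<Seg, String> appendSegmentToReplaceLock) {
    List<String> ids = new ArrayList<>();
    for (Seg s : appendSegments) {
      if (appendSegmentToReplaceLock.containsKey(s)) {
        ids.add(s.id());
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Seg append = new Seg("append", new Span(0, 10), "v1");
    List<Seg> replaced = List.of(new Seg("replace", new Span(5, 15), "v2"));
    System.out.println(targetVersion(append, replaced)); // prints v2
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end).
record Span(long start, long end) {
  boolean overlaps(Span other) {
    return start < other.end && other.start < end;
  }
}

// Hypothetical stand-in for DataSegment.
record Seg(String id, Span interval, String version) {}
```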
      • upgradePendingSegmentsOverlappingWith

        Map<SegmentIdWithShardSpec,SegmentIdWithShardSpec> upgradePendingSegmentsOverlappingWith(Set<org.apache.druid.timeline.DataSegment> replaceSegments,
                                                                                                       Set<String> activeRealtimeSequencePrefixes)
        Creates and inserts new IDs for the pending segments that overlap with the given replace segments being committed. The newly created pending segment IDs:
        • Have the same interval and version as that of an overlapping segment committed by the REPLACE task.
        • Cannot be committed but are only used to serve realtime queries against those versions.
        Parameters:
        replaceSegments - Segments being committed by a REPLACE task
        activeRealtimeSequencePrefixes - Set of sequence prefixes of active and pending completion task groups of the supervisor (if any) for this datasource
        Returns:
        Map from originally allocated pending segment to its new upgraded ID.
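A minimal sketch of the mapping this method returns, under stated assumptions. A pending segment is taken to belong to an active task group when its sequence name starts with one of the given prefixes. When several REPLACE segments overlap, the first one wins. The string id format is illustrative. `Pending`, `Seg` and `Span` are hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PendingUpgradeSketch {
  static Map<String, String> upgrade(List<Pending> pendingSegments, List<Seg> replaceSegments,
                                     Set<String> activeRealtimeSequencePrefixes) {
    Map<String, String> upgraded = new LinkedHashMap<>();
    int counter = 0;
    for (Pending p : pendingSegments) {
      boolean active = activeRealtimeSequencePrefixes.stream()
          .anyMatch(prefix -> p.sequenceName().startsWith(prefix)); // assumption: prefix match
      if (!active) {
        continue;
      }
      for (Seg r : replaceSegments) {
        if (r.interval().overlaps(p.interval())) {
          // The new id takes the interval and version of the overlapping REPLACE segment.
          upgraded.put(p.id(), r.interval().start() + "/" + r.interval().end() + "_" + r.version() + "_" + counter++);
          break;
        }
      }
    }
    return upgraded;
  }

  public static void main(String[] args) {
    List<Pending> pending = List.of(
        new Pending("p1", "index_kafka_ds_abc_0", new Span(0, 10)),
        new Pending("p2", "other_0", new Span(0, 10)));
    List<Seg> replace = List.of(new Seg(new Span(0, 20), "v2"));
    System.out.println(upgrade(pending, replace, Set.of("index_kafka_ds_abc"))); // prints {p1=0/20_v2_0}
  }
}

// Hypothetical stand-in for org.joda.time.Interval: half-open [start, end).
record Span(long start, long end) {
  boolean overlaps(Span other) {
    return start < other.end && other.start < end;
  }
}

// Hypothetical reduced pending segment and REPLACE segment.
record Pending(String id, String sequenceName, Span interval) {}
record Seg(Span interval, String version) {}
```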
      • retrieveDataSourceMetadata

        @Nullable
        DataSourceMetadata retrieveDataSourceMetadata(String dataSource)
        Retrieves data source's metadata from the metadata store. Returns null if there is no metadata.
      • deleteDataSourceMetadata

        boolean deleteDataSourceMetadata(String dataSource)
        Removes the entry for 'dataSource' from the dataSource metadata table.
        Parameters:
        dataSource - identifier
        Returns:
        true if the entry was deleted, false otherwise
      • resetDataSourceMetadata

        boolean resetDataSourceMetadata(String dataSource,
                                        DataSourceMetadata dataSourceMetadata)
                                 throws IOException
        Resets the dataSourceMetadata entry for 'dataSource' to the one supplied.
        Parameters:
        dataSource - identifier
        dataSourceMetadata - value to set
        Returns:
        true if the entry was reset, false otherwise
        Throws:
        IOException
      • insertDataSourceMetadata

        boolean insertDataSourceMetadata(String dataSource,
                                         DataSourceMetadata dataSourceMetadata)
        Inserts the dataSourceMetadata entry for 'dataSource'.
        Parameters:
        dataSource - identifier
        dataSourceMetadata - value to set
        Returns:
        true if the entry was inserted, false otherwise
      • removeDataSourceMetadataOlderThan

        int removeDataSourceMetadataOlderThan(long timestamp,
                                              @NotNull Set<String> excludeDatasources)
        Remove datasource metadata created before the given timestamp and not in given excludeDatasources set.
        Parameters:
        timestamp - timestamp in milliseconds
        excludeDatasources - set of datasource names to exclude from removal
        Returns:
        number of datasource metadata removed
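A minimal in-memory sketch of the removal rule. The map of data source names to creation times in epoch milliseconds is a hypothetical stand-in for the data source metadata table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MetadataCleanupSketch {
  // createdMillis: hypothetical dataSource -> creation time (epoch millis) of its metadata entry.
  // Removes entries created before timestamp unless the data source is excluded.
  static int removeOlderThan(Map<String, Long> createdMillis, long timestamp, Set<String> excludeDatasources) {
    int before = createdMillis.size();
    createdMillis.entrySet().removeIf(e -> e.getValue() < timestamp && !excludeDatasources.contains(e.getKey()));
    return before - createdMillis.size();
  }

  public static void main(String[] args) {
    Map<String, Long> created = new HashMap<>(Map.of("old", 1000L, "oldButActive", 1000L, "recent", 5000L));
    System.out.println(removeOlderThan(created, 2000L, Set.of("oldButActive"))); // prints 1
  }
}
```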
      • updateSegmentMetadata

        void updateSegmentMetadata(Set<org.apache.druid.timeline.DataSegment> segments)
      • deleteSegments

        void deleteSegments(Set<org.apache.druid.timeline.DataSegment> segments)
      • retrieveSegmentForId

        org.apache.druid.timeline.DataSegment retrieveSegmentForId(String id,
                                                                   boolean includeUnused)
        Retrieve the segment for a given id from the metadata store. Return null if no such segment exists.
        If includeUnused is set, the segment id retrieval should also consider the set of unused segments in the metadata store. Unused segments could be deleted by a kill task at any time and might lead to unexpected behaviour. This option exists mainly to provide a consistent view of the metadata, for example, in calls from the MSQ controller and workers, and would generally not be required.
        Parameters:
        id - The segment id to retrieve
        includeUnused - whether unused segments should also be considered
        Returns:
        the used DataSegment corresponding to the given id (or, if includeUnused is set, a possibly unused one), or null if no such segment exists
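A minimal in-memory sketch of how includeUnused widens the lookup. The `Seg` record and the map-backed table are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

final class SegmentLookupSketch {
  private final Map<String, Seg> segmentsById = new HashMap<>(); // hypothetical segments table

  void put(Seg s) {
    segmentsById.put(s.id(), s);
  }

  Seg retrieveSegmentForId(String id, boolean includeUnused) {
    Seg s = segmentsById.get(id);
    if (s == null || (!s.used() && !includeUnused)) {
      return null; // missing, or unused while includeUnused is false
    }
    return s;
  }

  public static void main(String[] args) {
    SegmentLookupSketch store = new SegmentLookupSketch();
    store.put(new Seg("unused-seg", false));
    System.out.println(store.retrieveSegmentForId("unused-seg", false)); // prints null
    System.out.println(store.retrieveSegmentForId("unused-seg", true) != null); // prints true
  }
}

// Hypothetical stand-in for DataSegment plus its used flag.
record Seg(String id, boolean used) {}
```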
      • deleteUpgradeSegmentsForTask

        int deleteUpgradeSegmentsForTask(String taskId)
        Delete entries from the upgrade segments table after the corresponding replace task has ended.
        Parameters:
        taskId - id of the task with replace locks
        Returns:
        number of deleted entries from the metadata store