Interface IndexerMetadataStorageCoordinator
-
- All Known Implementing Classes:
IndexerSQLMetadataStorageCoordinator
public interface IndexerMetadataStorageCoordinator
-
-
Method Summary
Modifier and Type, Method, Description
- SegmentIdWithShardSpec allocatePendingSegment(String dataSource, String sequenceName, String previousSegmentId, org.joda.time.Interval interval, org.apache.druid.timeline.partition.PartialShardSpec partialShardSpec, String maxVersion, boolean skipSegmentLineageCheck)
  Allocate a new pending segment in the pending segments table.
- Map<SegmentCreateRequest,SegmentIdWithShardSpec> allocatePendingSegments(String dataSource, org.joda.time.Interval interval, boolean skipSegmentLineageCheck, List<SegmentCreateRequest> requests)
  Allocates pending segments for the given requests in the pending segments table.
- SegmentPublishResult commitAppendSegments(Set<org.apache.druid.timeline.DataSegment> appendSegments, Map<org.apache.druid.timeline.DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock)
  Commits segments created by an APPEND task.
- SegmentPublishResult commitAppendSegmentsAndMetadata(Set<org.apache.druid.timeline.DataSegment> appendSegments, Map<org.apache.druid.timeline.DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
  Commits segments created by an APPEND task.
- SegmentPublishResult commitMetadataOnly(String dataSource, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
  Similar to commitSegments(java.util.Set<org.apache.druid.timeline.DataSegment>), but meant for streaming ingestion tasks for handling the case where the task ingested no records and created no segments, but still needs to update the metadata with the progress that the task made.
- SegmentPublishResult commitReplaceSegments(Set<org.apache.druid.timeline.DataSegment> replaceSegments, Set<ReplaceTaskLock> locksHeldByReplaceTask)
  Commits segments created by a REPLACE task.
- Set<org.apache.druid.timeline.DataSegment> commitSegments(Set<org.apache.druid.timeline.DataSegment> segments)
  Attempts to insert a set of segments to the metadata storage.
- SegmentPublishResult commitSegmentsAndMetadata(Set<org.apache.druid.timeline.DataSegment> segments, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
  Attempts to insert a set of segments to the metadata storage.
- boolean deleteDataSourceMetadata(String dataSource)
  Removes entry for 'dataSource' from the dataSource metadata table.
- int deletePendingSegments(String dataSource)
  Delete all pending segments belonging to the given data source from the pending segments table.
- int deletePendingSegmentsCreatedInInterval(String dataSource, org.joda.time.Interval deleteInterval)
  Delete pending segments created in the given interval belonging to the given data source from the pending segments table.
- void deleteSegments(Set<org.apache.druid.timeline.DataSegment> segments)
- int deleteUpgradeSegmentsForTask(String taskId)
  Delete entries from the upgrade segments table after the corresponding replace task has ended.
- boolean insertDataSourceMetadata(String dataSource, DataSourceMetadata dataSourceMetadata)
  Insert dataSourceMetadata entry for 'dataSource'.
- int markSegmentsAsUnusedWithinInterval(String dataSource, org.joda.time.Interval interval)
  Mark as unused segments which include ONLY data within the given interval.
- int removeDataSourceMetadataOlderThan(long timestamp, @NotNull Set<String> excludeDatasources)
  Remove datasource metadata created before the given timestamp and not in given excludeDatasources set.
- boolean resetDataSourceMetadata(String dataSource, DataSourceMetadata dataSourceMetadata)
  Resets dataSourceMetadata entry for 'dataSource' to the one supplied.
- Collection<org.apache.druid.timeline.DataSegment> retrieveAllUsedSegments(String dataSource, Segments visibility)
  Retrieve all published used segments in the data source from the metadata store.
- DataSourceMetadata retrieveDataSourceMetadata(String dataSource)
  Retrieves data source's metadata from the metadata store.
- org.apache.druid.timeline.DataSegment retrieveSegmentForId(String id, boolean includeUnused)
  Retrieve the segment for a given id from the metadata store.
- List<org.apache.druid.timeline.DataSegment> retrieveUnusedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, Integer limit, org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
  Retrieve all published segments which include ONLY data within the given interval and are marked as unused from the metadata store.
- Collection<org.apache.druid.java.util.common.Pair<org.apache.druid.timeline.DataSegment,String>> retrieveUsedSegmentsAndCreatedDates(String dataSource, List<org.joda.time.Interval> intervals)
  Retrieve all published segments which are marked as used and the created_date of these segments belonging to the given data source and list of intervals from the metadata store.
- default Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, Segments visibility)
  Retrieve all published segments which may include any data in the interval and are marked as used from the metadata store.
- Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForIntervals(String dataSource, List<org.joda.time.Interval> intervals, Segments visibility)
  Retrieve all published segments which may include any data in the given intervals and are marked as used from the metadata store.
- void updateSegmentMetadata(Set<org.apache.druid.timeline.DataSegment> segments)
- Map<SegmentIdWithShardSpec,SegmentIdWithShardSpec> upgradePendingSegmentsOverlappingWith(Set<org.apache.druid.timeline.DataSegment> replaceSegments, Set<String> activeRealtimeSequencePrefixes)
  Creates and inserts new IDs for the pending segments that overlap with the given replace segments being committed.
-
-
-
Method Detail
-
retrieveUsedSegmentsForInterval
default Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, Segments visibility)
Retrieve all published segments which may include any data in the interval and are marked as used from the metadata store. The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in the collection only once.
- Parameters:
dataSource - The data source to query
interval - The interval for which all applicable and used segments are requested.
visibility - Whether only visible or visible as well as overshadowed segments should be returned. The visibility is considered within the specified interval: that is, a segment which is visible outside of the specified interval, but overshadowed within the specified interval will not be returned if Segments.ONLY_VISIBLE is passed. See more precise description in the doc for Segments.
- Returns:
- The DataSegments which include data in the requested interval. These segments may contain data outside the requested interval.
-
retrieveAllUsedSegments
Collection<org.apache.druid.timeline.DataSegment> retrieveAllUsedSegments(String dataSource, Segments visibility)
Retrieve all published used segments in the data source from the metadata store.
- Parameters:
dataSource - The data source to query
- Returns:
- all segments belonging to the given data source
- See Also:
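retrieveUsedSegmentsForInterval(java.lang.String, org.joda.time.Interval, org.apache.druid.indexing.overlord.Segments): similar to this method but also accepts a data interval.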
-
retrieveUsedSegmentsAndCreatedDates
Collection<org.apache.druid.java.util.common.Pair<org.apache.druid.timeline.DataSegment,String>> retrieveUsedSegmentsAndCreatedDates(String dataSource, List<org.joda.time.Interval> intervals)
Retrieve all published segments which are marked as used and the created_date of these segments belonging to the given data source and list of intervals from the metadata store. Unlike other similar methods in this interface, this method doesn't accept a Segments "visibility" parameter. The returned collection may include overshadowed segments and their created_dates, as if Segments.INCLUDING_OVERSHADOWED was passed. It's the responsibility of the caller to filter out overshadowed ones if needed.
- Parameters:
dataSource - The data source to query
intervals - The list of intervals to query
- Returns:
- The DataSegments and the related created_date of segments
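Because this method may return overshadowed segments, a caller that only wants visible ones has to filter the result itself. The following is a minimal sketch of such a filter under a deliberately simplified rule: among segments with an identical interval, only those with the lexicographically greatest version are kept. SegmentStub and OvershadowFilterSketch are hypothetical stand-ins, not Druid classes, and real overshadowing also has to handle partially overlapping intervals and partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a (DataSegment, created_date) pair; not a Druid class.
class SegmentStub {
    final String interval;    // e.g. "2024-01-01/2024-01-02"
    final String version;     // assumed to sort lexicographically
    final String createdDate;

    SegmentStub(String interval, String version, String createdDate) {
        this.interval = interval;
        this.version = version;
        this.createdDate = createdDate;
    }
}

class OvershadowFilterSketch {
    // Simplified rule (an assumption): for segments sharing the same interval,
    // keep only those that carry the greatest version.
    static List<SegmentStub> keepLatestVersionPerInterval(List<SegmentStub> segments) {
        Map<String, String> maxVersion = new HashMap<>();
        for (SegmentStub s : segments) {
            maxVersion.merge(s.interval, s.version, (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        List<SegmentStub> visible = new ArrayList<>();
        for (SegmentStub s : segments) {
            if (s.version.equals(maxVersion.get(s.interval))) {
                visible.add(s);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<SegmentStub> all = Arrays.asList(
            new SegmentStub("2024-01-01/2024-01-02", "v1", "2024-01-02T00:00:00Z"),
            new SegmentStub("2024-01-01/2024-01-02", "v2", "2024-01-03T00:00:00Z"),
            new SegmentStub("2024-01-02/2024-01-03", "v1", "2024-01-03T00:00:00Z"));
        for (SegmentStub s : keepLatestVersionPerInterval(all)) {
            System.out.println(s.interval + " " + s.version + " created " + s.createdDate);
        }
    }
}
```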
-
retrieveUsedSegmentsForIntervals
Collection<org.apache.druid.timeline.DataSegment> retrieveUsedSegmentsForIntervals(String dataSource, List<org.joda.time.Interval> intervals, Segments visibility)
Retrieve all published segments which may include any data in the given intervals and are marked as used from the metadata store. The order of segments within the returned collection is unspecified, but each segment is guaranteed to appear in the collection only once.
- Parameters:
dataSource - The data source to query
intervals - The intervals for which all applicable and used segments are requested.
visibility - Whether only visible or visible as well as overshadowed segments should be returned. The visibility is considered within the specified intervals: that is, a segment which is visible outside of the specified intervals, but overshadowed on the specified intervals will not be returned if Segments.ONLY_VISIBLE is passed. See more precise description in the doc for Segments.
- Returns:
- The DataSegments which include data in the requested intervals. These segments may contain data outside the requested intervals.
-
retrieveUnusedSegmentsForInterval
List<org.apache.druid.timeline.DataSegment> retrieveUnusedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, @Nullable Integer limit, @Nullable org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
Retrieve all published segments which include ONLY data within the given interval and are marked as unused from the metadata store.
- Parameters:
dataSource - The data source the segments belong to
interval - Filter the data segments to ones that include data in this interval exclusively.
limit - The maximum number of unused segments to retrieve. If null, no limit is applied.
maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in interval with used_status_last_updated no later than this time will be included in the kill task. Segments without used_status_last_updated time (due to an upgrade from legacy Druid) will have maxUsedStatusLastUpdatedTime ignored.
- Returns:
- DataSegments which include ONLY data within the requested interval and are marked as unused. Segments NOT returned here may include data in the interval
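The difference between "may include any data in the interval" (used-segment retrieval) and "include ONLY data within the interval" (unused-segment retrieval and marking as unused) comes down to overlap versus containment. The sketch below illustrates both predicates on half-open millisecond ranges. IntervalStub and IntervalMatchSketch are hypothetical helpers for illustration, not the Joda-Time or Druid types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical half-open interval [startMillis, endMillis); not org.joda.time.Interval.
class IntervalStub {
    final long startMillis;
    final long endMillis;

    IntervalStub(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // True if the two intervals share at least one instant.
    boolean overlaps(IntervalStub other) {
        return startMillis < other.endMillis && other.startMillis < endMillis;
    }

    // True if 'other' lies entirely inside this interval.
    boolean contains(IntervalStub other) {
        return startMillis <= other.startMillis && other.endMillis <= endMillis;
    }
}

class IntervalMatchSketch {
    // "May include any data in the interval": the segment interval overlaps the query.
    static List<IntervalStub> overlapping(IntervalStub query, List<IntervalStub> segments) {
        List<IntervalStub> result = new ArrayList<>();
        for (IntervalStub s : segments) {
            if (query.overlaps(s)) {
                result.add(s);
            }
        }
        return result;
    }

    // "Include ONLY data within the interval": the query fully contains the segment interval.
    static List<IntervalStub> fullyWithin(IntervalStub query, List<IntervalStub> segments) {
        List<IntervalStub> result = new ArrayList<>();
        for (IntervalStub s : segments) {
            if (query.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IntervalStub query = new IntervalStub(10, 20);
        List<IntervalStub> segments = Arrays.asList(
            new IntervalStub(5, 15), new IntervalStub(12, 18), new IntervalStub(20, 30));
        System.out.println("overlapping: " + overlapping(query, segments).size());
        System.out.println("fully within: " + fullyWithin(query, segments).size());
    }
}
```

In this sketch the segment spanning 5 to 15 overlaps the query but is not contained in it, which is why a segment not returned by the "ONLY" variant may still hold data in the interval.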
-
markSegmentsAsUnusedWithinInterval
int markSegmentsAsUnusedWithinInterval(String dataSource, org.joda.time.Interval interval)
Mark as unused segments which include ONLY data within the given interval.
- Parameters:
dataSource - The data source the segments belong to
interval - Filter the data segments to ones that include data in this interval exclusively.
- Returns:
- number of segments marked unused
-
commitSegments
Set<org.apache.druid.timeline.DataSegment> commitSegments(Set<org.apache.druid.timeline.DataSegment> segments) throws IOException
Attempts to insert a set of segments to the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added).
- Parameters:
segments - set of segments to add
- Returns:
- set of segments actually added
- Throws:
IOException
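The "segments actually added" contract means the return value can be smaller than the input. A minimal in-memory sketch of that behaviour follows. CommitSegmentsSketch is a hypothetical class that tracks segment ids as plain strings. It does not reflect how any real implementation talks to the metadata store.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of the "insert only what is new" contract.
class CommitSegmentsSketch {
    private final Set<String> storedIds = new HashSet<>();

    // Returns the subset of segmentIds that was not stored before this call.
    Set<String> commit(Set<String> segmentIds) {
        Set<String> added = new LinkedHashSet<>();
        for (String id : segmentIds) {
            // Set.add returns false when the id is already present.
            if (storedIds.add(id)) {
                added.add(id);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        CommitSegmentsSketch store = new CommitSegmentsSketch();
        System.out.println(store.commit(new LinkedHashSet<>(Arrays.asList("seg_a", "seg_b"))));
        System.out.println(store.commit(new LinkedHashSet<>(Arrays.asList("seg_b", "seg_c"))));
    }
}
```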
-
allocatePendingSegments
Map<SegmentCreateRequest,SegmentIdWithShardSpec> allocatePendingSegments(String dataSource, org.joda.time.Interval interval, boolean skipSegmentLineageCheck, List<SegmentCreateRequest> requests)
Allocates pending segments for the given requests in the pending segments table. The segment id allocated for a request will not be given out again unless a request is made with the same SegmentCreateRequest.
- Parameters:
dataSource - dataSource for which to allocate a segment
interval - interval for which to allocate a segment
skipSegmentLineageCheck - if false, perform lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in same order
requests - Requests for which to allocate segments. All the requests must share the same partition space.
- Returns:
- Map from request to allocated segment id. The map does not contain entries for failed requests.
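Since failed requests are simply absent from the returned map, a caller has to compare the map against the original request list to find them. The helper below is a sketch of that check. BatchAllocationCheckSketch and requestsWithoutAllocation are hypothetical names, and plain strings stand in for SegmentCreateRequest and SegmentIdWithShardSpec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchAllocationCheckSketch {
    // Returns the requests that have no entry in the allocation result,
    // preserving the order of the original request list.
    static <R, S> List<R> requestsWithoutAllocation(List<R> requests, Map<R, S> allocated) {
        List<R> missing = new ArrayList<>();
        for (R request : requests) {
            if (!allocated.containsKey(request)) {
                missing.add(request);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> requests = Arrays.asList("request1", "request2", "request3");
        Map<String, String> allocated = new HashMap<>();
        allocated.put("request1", "segmentId1");
        allocated.put("request3", "segmentId3");
        System.out.println(requestsWithoutAllocation(requests, allocated));
    }
}
```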
-
allocatePendingSegment
SegmentIdWithShardSpec allocatePendingSegment(String dataSource, String sequenceName, @Nullable String previousSegmentId, org.joda.time.Interval interval, org.apache.druid.timeline.partition.PartialShardSpec partialShardSpec, String maxVersion, boolean skipSegmentLineageCheck)
Allocate a new pending segment in the pending segments table. This segment identifier will never be given out again, unless another call is made with the same dataSource, sequenceName, and previousSegmentId. The sequenceName and previousSegmentId parameters are meant to make it easy for two independent ingestion tasks to produce the same series of segments. Note that a segment sequence may include segments with a variety of different intervals and versions.
- Parameters:
dataSource - dataSource for which to allocate a segment
sequenceName - name of the group of ingestion tasks producing a segment series
previousSegmentId - previous segment in the series; may be null or empty, meaning this is the first segment
interval - interval for which to allocate a segment
partialShardSpec - partialShardSpec containing all necessary information to create a shardSpec for the new segmentId
maxVersion - use this version if we have no better version to use. The returned segment identifier may have a version lower than this one, but will not have one higher.
skipSegmentLineageCheck - if false, perform lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in same order
- Returns:
- the pending segment identifier, or null if it was impossible to allocate a new segment
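The repeatability described above (same dataSource, sequenceName and previousSegmentId yield the same identifier) can be pictured as a lookup keyed on those three values. The sketch below models only that aspect. PendingSegmentAllocatorSketch is a hypothetical class, the generated id format is invented for illustration, and interval, shard spec and version handling are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of repeatable pending segment allocation.
class PendingSegmentAllocatorSketch {
    private final Map<String, String> allocatedByKey = new HashMap<>();
    private int nextPartitionNum = 0;

    String allocate(String dataSource, String sequenceName, String previousSegmentId) {
        // Null and empty both mean "first segment in the series".
        String previous = previousSegmentId == null ? "" : previousSegmentId;
        String key = dataSource + "|" + sequenceName + "|" + previous;
        // A repeated call with the same key returns the id handed out earlier;
        // any other key receives a fresh id (made-up format).
        return allocatedByKey.computeIfAbsent(key, k -> dataSource + "_partition_" + nextPartitionNum++);
    }

    public static void main(String[] args) {
        PendingSegmentAllocatorSketch allocator = new PendingSegmentAllocatorSketch();
        String first = allocator.allocate("wiki", "sequence_0", null);
        String replica = allocator.allocate("wiki", "sequence_0", "");
        String next = allocator.allocate("wiki", "sequence_0", first);
        System.out.println(first + " " + replica + " " + next);
    }
}
```

Two replica tasks that share a sequenceName and walk the series in the same order would therefore end up with the same identifiers in this model.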
-
deletePendingSegmentsCreatedInInterval
int deletePendingSegmentsCreatedInInterval(String dataSource, org.joda.time.Interval deleteInterval)
Delete pending segments created in the given interval belonging to the given data source from the pending segments table. The created_date field of the pending segments table is checked to find segments to be deleted. Note that the semantics of the interval (for created_dates) differ from the semantics of the interval parameters in some other methods in this class, such as retrieveUsedSegmentsForInterval(java.lang.String, org.joda.time.Interval, org.apache.druid.indexing.overlord.Segments) (where the interval is about the time column value in rows belonging to the segment).
- Parameters:
dataSource - dataSource
deleteInterval - interval to check the created_date of pendingSegments
- Returns:
- number of deleted pending segments
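To make the created_date semantics concrete, the sketch below deletes rows by creation time and ignores the data interval entirely. PendingSegmentStub and DeleteByCreatedDateSketch are hypothetical, the table is a plain list, and the delete interval is assumed to be half-open in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of the pending segments table; not a Druid class.
class PendingSegmentStub {
    final long dataStartMillis;    // start of the data interval (row timestamps)
    final long dataEndMillis;      // end of the data interval
    final long createdDateMillis;  // when the pending segment row was created

    PendingSegmentStub(long dataStartMillis, long dataEndMillis, long createdDateMillis) {
        this.dataStartMillis = dataStartMillis;
        this.dataEndMillis = dataEndMillis;
        this.createdDateMillis = createdDateMillis;
    }
}

class DeleteByCreatedDateSketch {
    // Removes rows whose created date falls in [startMillis, endMillis) and
    // returns how many were removed. The data interval is deliberately ignored.
    static int deleteCreatedIn(List<PendingSegmentStub> table, long startMillis, long endMillis) {
        int before = table.size();
        table.removeIf(p -> p.createdDateMillis >= startMillis && p.createdDateMillis < endMillis);
        return before - table.size();
    }

    public static void main(String[] args) {
        List<PendingSegmentStub> table = new ArrayList<>();
        table.add(new PendingSegmentStub(0, 100, 1000));    // old data, created recently
        table.add(new PendingSegmentStub(1000, 1100, 50));  // recent data, created long ago
        System.out.println(deleteCreatedIn(table, 900, 1100));
    }
}
```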
-
deletePendingSegments
int deletePendingSegments(String dataSource)
Delete all pending segments belonging to the given data source from the pending segments table.
- Returns:
- number of deleted pending segments
- See Also:
deletePendingSegmentsCreatedInInterval(java.lang.String, org.joda.time.Interval): similar to this method but also accepts an interval for the segments' created_dates
-
commitSegmentsAndMetadata
SegmentPublishResult commitSegmentsAndMetadata(Set<org.apache.druid.timeline.DataSegment> segments, @Nullable DataSourceMetadata startMetadata, @Nullable DataSourceMetadata endMetadata) throws IOException
Attempts to insert a set of segments to the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added). If startMetadata and endMetadata are set, this insertion will be atomic with a compare-and-swap on dataSource commit metadata.
- Parameters:
segments - set of segments to add, must all be from the same dataSource
startMetadata - dataSource metadata pre-insert must match this startMetadata according to DataSourceMetadata.matches(DataSourceMetadata). If null, this insert will not involve a metadata transaction
endMetadata - dataSource metadata post-insert will have this endMetadata merged in with DataSourceMetadata.plus(DataSourceMetadata). If null, this insert will not involve a metadata transaction
- Returns:
- segment publish result indicating transaction success or failure, and set of segments actually published. This method must only return a failure code if it is sure that the transaction did not happen. If it is not sure, it must throw an exception instead.
- Throws:
IllegalArgumentException - if startMetadata and endMetadata are not either both null or both non-null
RuntimeException - if the state of metadata storage after this call is unknown
IOException
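The compare-and-swap behaviour can be sketched with an in-memory store in which the dataSource metadata is reduced to a single offset. MetadataCasSketch is hypothetical. Representing metadata as a long, starting it at 0, and replacing it with the end value (rather than merging via DataSourceMetadata.plus) are simplifying assumptions. A boolean stands in for the success flag of SegmentPublishResult.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of "insert segments atomically with a metadata compare-and-swap".
class MetadataCasSketch {
    long committedOffset = 0L;                       // stand-in for stored dataSource metadata
    final Set<String> segmentIds = new HashSet<>();  // stand-in for the segments table

    // Returns true on success, false when the stored metadata does not match startOffset.
    // On a false return nothing has been written.
    boolean commit(Set<String> newSegmentIds, Long startOffset, Long endOffset) {
        if ((startOffset == null) != (endOffset == null)) {
            throw new IllegalArgumentException("start and end must be both null or both non-null");
        }
        if (startOffset != null) {
            if (committedOffset != startOffset.longValue()) {
                return false;                        // compare failed, no swap and no insert
            }
            committedOffset = endOffset;             // swap (simplified: replace, not merge)
        }
        segmentIds.addAll(newSegmentIds);
        return true;
    }

    public static void main(String[] args) {
        MetadataCasSketch store = new MetadataCasSketch();
        System.out.println(store.commit(Collections.singleton("seg_1"), 0L, 10L));
        // A second writer still assuming offset 0 loses the race.
        System.out.println(store.commit(Collections.singleton("seg_2"), 0L, 20L));
    }
}
```

The sketch only returns a failure when it knows nothing was written, mirroring the rule that an uncertain outcome must surface as an exception rather than a failure code.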
-
commitAppendSegments
SegmentPublishResult commitAppendSegments(Set<org.apache.druid.timeline.DataSegment> appendSegments, Map<org.apache.druid.timeline.DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock)
Commits segments created by an APPEND task. This method also handles segment upgrade scenarios that may result from concurrent append and replace.
- If a REPLACE task committed a segment that overlaps with any of the appendSegments while this APPEND task was in progress, the appendSegments are upgraded to the version of the replace segment.
- If an appendSegment is covered by a currently active REPLACE lock, then an entry is created for it in the upgrade_segments table, so that when the REPLACE task finishes, it can upgrade the appendSegment as required.
- Parameters:
appendSegments - All segments created by an APPEND task that must be committed in a single transaction.
appendSegmentToReplaceLock - Map from append segment to the currently active REPLACE lock (if any) covering it
-
commitAppendSegmentsAndMetadata
SegmentPublishResult commitAppendSegmentsAndMetadata(Set<org.apache.druid.timeline.DataSegment> appendSegments, Map<org.apache.druid.timeline.DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
Commits segments created by an APPEND task. This method also handles segment upgrade scenarios that may result from concurrent append and replace. Also commits start and end DataSourceMetadata.
- See Also:
commitAppendSegments(java.util.Set<org.apache.druid.timeline.DataSegment>, java.util.Map<org.apache.druid.timeline.DataSegment, org.apache.druid.metadata.ReplaceTaskLock>), commitSegmentsAndMetadata(java.util.Set<org.apache.druid.timeline.DataSegment>, org.apache.druid.indexing.overlord.DataSourceMetadata, org.apache.druid.indexing.overlord.DataSourceMetadata)
-
commitReplaceSegments
SegmentPublishResult commitReplaceSegments(Set<org.apache.druid.timeline.DataSegment> replaceSegments, Set<ReplaceTaskLock> locksHeldByReplaceTask)
Commits segments created by a REPLACE task. This method also handles the segment upgrade scenarios that may result from concurrent append and replace.
- If an APPEND task committed a segment to an interval locked by this task, the append segment is upgraded to the version of the corresponding lock. This is done with the help of entries created in the upgrade_segments table in commitAppendSegments(java.util.Set<org.apache.druid.timeline.DataSegment>, java.util.Map<org.apache.druid.timeline.DataSegment, org.apache.druid.metadata.ReplaceTaskLock>).
- Parameters:
replaceSegments - All segments created by a REPLACE task that must be committed in a single transaction.
locksHeldByReplaceTask - All active non-revoked REPLACE locks held by the task
-
upgradePendingSegmentsOverlappingWith
Map<SegmentIdWithShardSpec,SegmentIdWithShardSpec> upgradePendingSegmentsOverlappingWith(Set<org.apache.druid.timeline.DataSegment> replaceSegments, Set<String> activeRealtimeSequencePrefixes)
Creates and inserts new IDs for the pending segments that overlap with the given replace segments being committed. The newly created pending segment IDs:
- Have the same interval and version as that of an overlapping segment committed by the REPLACE task.
- Cannot be committed but are only used to serve realtime queries against those versions.
- Parameters:
replaceSegments - Segments being committed by a REPLACE task
activeRealtimeSequencePrefixes - Set of sequence prefixes of active and pending completion task groups of the supervisor (if any) for this datasource
- Returns:
- Map from originally allocated pending segment to its new upgraded ID.
-
retrieveDataSourceMetadata
@Nullable DataSourceMetadata retrieveDataSourceMetadata(String dataSource)
Retrieves data source's metadata from the metadata store. Returns null if there is no metadata.
-
deleteDataSourceMetadata
boolean deleteDataSourceMetadata(String dataSource)
Removes entry for 'dataSource' from the dataSource metadata table.
- Parameters:
dataSource - identifier
- Returns:
- true if the entry was deleted, false otherwise
-
resetDataSourceMetadata
boolean resetDataSourceMetadata(String dataSource, DataSourceMetadata dataSourceMetadata) throws IOException
Resets dataSourceMetadata entry for 'dataSource' to the one supplied.
- Parameters:
dataSource - identifier
dataSourceMetadata - value to set
- Returns:
- true if the entry was reset, false otherwise
- Throws:
IOException
-
insertDataSourceMetadata
boolean insertDataSourceMetadata(String dataSource, DataSourceMetadata dataSourceMetadata)
Insert dataSourceMetadata entry for 'dataSource'.
- Parameters:
dataSource - identifier
dataSourceMetadata - value to set
- Returns:
- true if the entry was inserted, false otherwise
-
removeDataSourceMetadataOlderThan
int removeDataSourceMetadataOlderThan(long timestamp, @NotNull Set<String> excludeDatasources)
Remove datasource metadata created before the given timestamp and not in given excludeDatasources set.
- Parameters:
timestamp - timestamp in milliseconds
excludeDatasources - set of datasource names to exclude from removal
- Returns:
- number of datasource metadata removed
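The two removal conditions (created before the timestamp, and not in the exclusion set) combine as a logical AND. The sketch below shows this on a map from datasource name to creation time in milliseconds. MetadataCleanupSketch and its map-based storage are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class MetadataCleanupSketch {
    // Removes entries created strictly before 'timestamp' whose datasource is not excluded.
    // Returns the number of removed entries.
    static int removeOlderThan(Map<String, Long> createdMillisByDataSource,
                               long timestamp,
                               Set<String> excludeDatasources) {
        int before = createdMillisByDataSource.size();
        createdMillisByDataSource.entrySet().removeIf(
            e -> e.getValue() < timestamp && !excludeDatasources.contains(e.getKey()));
        return before - createdMillisByDataSource.size();
    }

    public static void main(String[] args) {
        Map<String, Long> metadata = new HashMap<>();
        metadata.put("old_inactive", 100L);
        metadata.put("old_but_excluded", 100L);
        metadata.put("recent", 500L);
        int removed = removeOlderThan(metadata, 200L, Collections.singleton("old_but_excluded"));
        System.out.println(removed + " removed, remaining: " + metadata.keySet());
    }
}
```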
-
commitMetadataOnly
SegmentPublishResult commitMetadataOnly(String dataSource, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
Similar to commitSegments(java.util.Set<org.apache.druid.timeline.DataSegment>), but meant for streaming ingestion tasks for handling the case where the task ingested no records and created no segments, but still needs to update the metadata with the progress that the task made. The metadata should undergo the same validation checks as performed by commitSegments(java.util.Set<org.apache.druid.timeline.DataSegment>).
- Parameters:
dataSource - the datasource
startMetadata - dataSource metadata pre-insert must match this startMetadata according to DataSourceMetadata.matches(DataSourceMetadata).
endMetadata - dataSource metadata post-insert will have this endMetadata merged in with DataSourceMetadata.plus(DataSourceMetadata).
- Returns:
- segment publish result indicating transaction success or failure. This method must only return a failure code if it is sure that the transaction did not happen. If it is not sure, it must throw an exception instead.
- Throws:
IllegalArgumentException - if either startMetadata or endMetadata is null
RuntimeException - if the state of metadata storage after this call is unknown
-
updateSegmentMetadata
void updateSegmentMetadata(Set<org.apache.druid.timeline.DataSegment> segments)
-
deleteSegments
void deleteSegments(Set<org.apache.druid.timeline.DataSegment> segments)
-
retrieveSegmentForId
org.apache.druid.timeline.DataSegment retrieveSegmentForId(String id, boolean includeUnused)
Retrieve the segment for a given id from the metadata store. Returns null if no such segment exists. If includeUnused is set, the segment id retrieval should also consider the set of unused segments in the metadata store. Unused segments could be deleted by a kill task at any time and might lead to unexpected behaviour. This option exists mainly to provide a consistent view of the metadata, for example, in calls from MSQ controller and worker and would generally not be required.
- Parameters:
id - The segment id to retrieve
- Returns:
- DataSegment used segment corresponding to given id
-
deleteUpgradeSegmentsForTask
int deleteUpgradeSegmentsForTask(String taskId)
Delete entries from the upgrade segments table after the corresponding replace task has ended.
- Parameters:
taskId - id of the task with replace locks
- Returns:
- number of deleted entries from the metadata store
-
-