Class AbstractSegmentMetadataCache<T extends DataSourceInformation>
- java.lang.Object
-
- org.apache.druid.segment.metadata.AbstractSegmentMetadataCache<T>
-
- Type Parameters:
T - The type of information associated with the data source, which must extend DataSourceInformation.
- Direct Known Subclasses:
CoordinatorSegmentMetadataCache
public abstract class AbstractSegmentMetadataCache<T extends DataSourceInformation> extends Object
An abstract class that listens for segment change events and caches segment metadata. It periodically refreshes segments by fetching their metadata, which includes schema information, from sources such as data nodes, tasks, and the metadata database, and then builds the table schema.
At startup, the cache awaits the initialization of the timeline. If the cache employs a segment metadata query to retrieve segment schema, it attempts to refresh a maximum of MAX_SEGMENTS_PER_QUERY segments for each datasource in each refresh cycle. Once all datasources have undergone this process, the initial schema of each datasource is constructed, and the cache is marked as initialized. Subsequently, the cache continues to periodically refresh segments and update the datasource schema. A failure in segment refresh pauses the refresh work, which resumes in the next refresh cycle.
This class has an abstract method refresh(Set, Set), which the child class must override with the logic to build and cache the table schema.
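The per-datasource cap on each refresh cycle can be pictured with the sketch below. It is an illustration only, not Druid's implementation: the class RefreshBatcher, the method nextBatch, the use of plain strings as segment IDs, and the cap value of 3 are all assumptions made for this example. The value of MAX_SEGMENTS_PER_QUERY is not stated on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RefreshBatcher {
    // Hypothetical cap; stands in for MAX_SEGMENTS_PER_QUERY.
    static final int MAX_PER_CYCLE = 3;

    /**
     * Picks at most MAX_PER_CYCLE pending segments per datasource for one refresh cycle.
     * Segments that do not fit are simply not selected and remain pending for a later cycle.
     */
    static Map<String, List<String>> nextBatch(Map<String, List<String>> pendingByDataSource) {
        Map<String, List<String>> batch = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : pendingByDataSource.entrySet()) {
            List<String> pending = entry.getValue();
            int count = Math.min(MAX_PER_CYCLE, pending.size());
            batch.put(entry.getKey(), new ArrayList<>(pending.subList(0, count)));
        }
        return batch;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pending = new LinkedHashMap<>();
        pending.put("wikipedia", List.of("s1", "s2", "s3", "s4", "s5"));
        pending.put("metrics", List.of("m1"));
        System.out.println(nextBatch(pending));
    }
}
```

With a cap like this, a datasource with many pending segments needs several cycles before all of them have been refreshed.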
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | AbstractSegmentMetadataCache.ColumnTypeMergePolicy | ColumnTypeMergePolicy defines the rules of which type to use when faced with the possibility of different types for the same column from segment to segment.
static class | AbstractSegmentMetadataCache.FirstTypeMergePolicy | Classic logic, we use the first type we encounter.
static class | AbstractSegmentMetadataCache.LeastRestrictiveTypeMergePolicy | Resolves types using ColumnType.leastRestrictiveType(ColumnType, ColumnType) to find the ColumnType that can best represent all data contained across all segments.
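The difference between the two merge policies can be sketched as follows. This is a simplified model, not the Druid API: SimpleType, MergePolicy, and resolve are hypothetical stand-ins, and the assumption that "least restrictive" means "later in the enum declaration order" is made purely for illustration. The real rules live in ColumnType.leastRestrictiveType(ColumnType, ColumnType).

```java
import java.util.List;

final class MergePolicySketch {
    // Hypothetical stand-in for ColumnType; declaration order runs from most to least restrictive.
    enum SimpleType { LONG, DOUBLE, STRING }

    // Hypothetical stand-in for ColumnTypeMergePolicy.
    interface MergePolicy {
        SimpleType merge(SimpleType existing, SimpleType incoming);
    }

    // Keeps whichever type was encountered first.
    static final MergePolicy FIRST_TYPE =
        (existing, incoming) -> existing == null ? incoming : existing;

    // Keeps the type that can represent both inputs under the assumed ordering.
    static final MergePolicy LEAST_RESTRICTIVE = (existing, incoming) -> {
        if (existing == null) {
            return incoming;
        }
        return existing.compareTo(incoming) >= 0 ? existing : incoming;
    };

    // Folds the per-segment types of one column into a single type.
    static SimpleType resolve(MergePolicy policy, List<SimpleType> typesPerSegment) {
        SimpleType result = null;
        for (SimpleType type : typesPerSegment) {
            result = policy.merge(result, type);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleType> seen = List.of(SimpleType.LONG, SimpleType.DOUBLE, SimpleType.LONG);
        System.out.println(resolve(FIRST_TYPE, seen));
        System.out.println(resolve(LEAST_RESTRICTIVE, seen));
    }
}
```

Under the first-type policy the result depends on the order in which segments are visited, which is one reason a deterministic segment order matters.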
-
Field Summary
Fields
Modifier and Type | Field | Description
protected ExecutorService | callbackExec |
protected Set<String> | dataSourcesNeedingRebuild |
protected org.apache.druid.java.util.emitter.service.ServiceEmitter | emitter |
protected boolean | isServerViewInitialized |
protected Object | lock | This lock coordinates the access from multiple threads to those variables guarded by this lock.
protected TreeSet<org.apache.druid.timeline.SegmentId> | mutableSegments |
protected static com.google.common.collect.Interner<org.apache.druid.segment.column.RowSignature> | ROW_SIGNATURE_INTERNER |
protected static Comparator<org.apache.druid.timeline.SegmentId> | SEGMENT_ORDER |
protected ConcurrentHashMap<String,ConcurrentSkipListMap<org.apache.druid.timeline.SegmentId,AvailableSegmentMetadata>> | segmentMetadataInfo | DataSource -> Segment -> AvailableSegmentMetadata (contains RowSignature) for that segment.
protected TreeSet<org.apache.druid.timeline.SegmentId> | segmentsNeedingRefresh |
protected ConcurrentMap<String,T> | tables | Map of datasource and generic object extending DataSourceInformation.
-
Constructor Summary
Constructors
Constructor | Description
AbstractSegmentMetadataCache(QueryLifecycleFactory queryLifecycleFactory, SegmentMetadataCacheConfig config, Escalator escalator, InternalQueryConfig internalQueryConfig, org.apache.druid.java.util.emitter.service.ServiceEmitter emitter) |
-
Method Summary
Methods
Modifier and Type | Method | Description
void | addSegment(DruidServerMetadata server, org.apache.druid.timeline.DataSegment segment) |
void | awaitInitialization() |
org.apache.druid.segment.column.RowSignature | buildDataSourceRowSignature(String dataSource) |
protected void | doInLock(Runnable runnable) | This is a helper method for unit tests to emulate heavy work done with lock.
AvailableSegmentMetadata | getAvailableSegmentMetadata(String datasource, org.apache.druid.timeline.SegmentId segmentId) | Get metadata for the specified segment, which includes information like RowSignature, realtime & numRows.
T | getDatasource(String name) | Fetch schema for the given datasource.
Map<String,T> | getDataSourceInformationMap() |
Set<String> | getDatasourceNames() |
Set<String> | getDataSourcesNeedingRebuild() |
TreeSet<org.apache.druid.timeline.SegmentId> | getMutableSegments() |
Map<org.apache.druid.timeline.SegmentId,AvailableSegmentMetadata> | getSegmentMetadataSnapshot() | Get metadata for all the cached segments, which includes information like RowSignature, realtime & numRows etc.
TreeSet<org.apache.druid.timeline.SegmentId> | getSegmentsNeedingRefresh() |
int | getTotalSegments() | Returns total number of segments.
void | markDataSourceAsNeedRebuild(String datasource) |
abstract void | refresh(Set<org.apache.druid.timeline.SegmentId> segmentsToRefresh, Set<String> dataSourcesToRebuild) | The child classes must override this method with the logic to build and cache table schema.
Set<org.apache.druid.timeline.SegmentId> | refreshSegments(Set<org.apache.druid.timeline.SegmentId> segments) | Attempt to refresh "segmentSignatures" for a set of segments.
void | removeSegment(org.apache.druid.timeline.DataSegment segment) |
void | removeServerSegment(DruidServerMetadata server, org.apache.druid.timeline.DataSegment segment) |
org.apache.druid.java.util.common.guava.Sequence<org.apache.druid.query.metadata.metadata.SegmentAnalysis> | runSegmentMetadataQuery(Iterable<org.apache.druid.timeline.SegmentId> segments) | Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
void | setAvailableSegmentMetadata(org.apache.druid.timeline.SegmentId segmentId, AvailableSegmentMetadata availableSegmentMetadata) | This method is not thread-safe and must be used only in unit tests.
void | start() |
void | stop() |
-
-
-
Field Detail
-
SEGMENT_ORDER
protected static final Comparator<org.apache.druid.timeline.SegmentId> SEGMENT_ORDER
-
ROW_SIGNATURE_INTERNER
protected static final com.google.common.collect.Interner<org.apache.druid.segment.column.RowSignature> ROW_SIGNATURE_INTERNER
-
segmentMetadataInfo
protected final ConcurrentHashMap<String,ConcurrentSkipListMap<org.apache.druid.timeline.SegmentId,AvailableSegmentMetadata>> segmentMetadataInfo
DataSource -> Segment -> AvailableSegmentMetadata (contains RowSignature) for that segment. Use SortedMap for segments so they are merged in deterministic order, from older to newer.
This map is updated by these two threads:
- callbackExec can update it in addSegment(org.apache.druid.server.coordination.DruidServerMetadata, org.apache.druid.timeline.DataSegment), removeServerSegment(org.apache.druid.server.coordination.DruidServerMetadata, org.apache.druid.timeline.DataSegment), and removeSegment(org.apache.druid.timeline.DataSegment).
- cacheExec can update it in refreshSegmentsForDataSource(java.lang.String, java.util.Set<org.apache.druid.timeline.SegmentId>).
While it is being updated, this map is read by these two types of thread:
- cacheExec can iterate all AvailableSegmentMetadatas per datasource. See buildDataSourceRowSignature(java.lang.String).
- Query threads can create a snapshot of the entire map for processing queries on the system table. See getSegmentMetadataSnapshot().
As the access pattern of this map is read-intensive, we should minimize the contention between writers and readers. Since there are two threads that can update this map at the same time, those writers should lock the inner map first and then lock the entry before updating segment metadata. This can be done using ConcurrentMap.compute(K, java.util.function.BiFunction<? super K, ? super V, ? extends V>) as below. Note that if you need to update the variables guarded by lock inside of compute(), you should acquire the lock before calling compute() so that the function executed in compute() stays inexpensive.
    segmentMetadataInfo.compute(
        datasourceParam,
        (datasource, segmentsMap) -> {
          if (segmentsMap == null) return null;
          else {
            segmentsMap.compute(
                segmentIdParam,
                (segmentId, segmentMetadata) -> {
                  // update segmentMetadata and return the updated value
                }
            );
            return segmentsMap;
          }
        }
    );
Readers can simply delegate the locking to the concurrent map and iterate map entries.
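The nested compute() pattern described for this field can be tried out with JDK collections alone. The sketch below is a stand-alone illustration under stated assumptions: the class NestedComputeSketch and its methods are hypothetical, segment IDs are plain strings, and a Long row count stands in for AvailableSegmentMetadata.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class NestedComputeSketch {
    // datasource -> segment id -> row count (hypothetical stand-in for AvailableSegmentMetadata)
    final ConcurrentHashMap<String, ConcurrentSkipListMap<String, Long>> info =
        new ConcurrentHashMap<>();

    // Adds a segment, creating the inner map on first use.
    void addSegment(String dataSource, String segmentId, long numRows) {
        info.compute(dataSource, (ds, segments) -> {
            ConcurrentSkipListMap<String, Long> map =
                segments == null ? new ConcurrentSkipListMap<>() : segments;
            map.put(segmentId, numRows);
            return map;
        });
    }

    // Updates an existing segment only; unknown datasources and segments stay absent.
    void updateNumRows(String dataSource, String segmentId, long numRows) {
        info.compute(dataSource, (ds, segments) -> {
            if (segments == null) {
                return null; // returning null leaves the outer key absent
            }
            segments.compute(segmentId, (id, old) -> old == null ? null : Long.valueOf(numRows));
            return segments;
        });
    }

    public static void main(String[] args) {
        NestedComputeSketch cache = new NestedComputeSketch();
        cache.addSegment("wikipedia", "seg-1", 10L);
        cache.updateNumRows("wikipedia", "seg-1", 25L);
        cache.updateNumRows("wikipedia", "seg-unknown", 99L); // no effect
        System.out.println(cache.info);
    }
}
```

Both compute() calls run their functions atomically for the key involved, so the work done inside the lambdas should stay short.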
-
callbackExec
protected final ExecutorService callbackExec
-
isServerViewInitialized
protected boolean isServerViewInitialized
-
emitter
protected final org.apache.druid.java.util.emitter.service.ServiceEmitter emitter
-
tables
protected final ConcurrentMap<String,T extends DataSourceInformation> tables
Map of datasource and generic object extending DataSourceInformation. This structure can be accessed by cacheExec and callbackExec threads.
-
lock
protected final Object lock
This lock coordinates the access from multiple threads to those variables guarded by this lock. Currently, there are 2 threads that can access these variables:
- callbackExec executes the timeline callbacks whenever BrokerServerView changes.
- cacheExec periodically refreshes segment metadata and DataSourceInformation if necessary, based on the information collected via timeline callbacks.
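One common way to coordinate a callback thread and a refresh thread through a single lock object is sketched below. It is a minimal sketch, not this class's code: GuardedSetsSketch, onSegmentAdded, and drainPending are hypothetical names, and segment IDs are plain strings.

```java
import java.util.TreeSet;

final class GuardedSetsSketch {
    private final Object lock = new Object();
    // Guarded by lock.
    private final TreeSet<String> segmentsNeedingRefresh = new TreeSet<>();

    // Called from the callback thread when a segment is announced.
    void onSegmentAdded(String segmentId) {
        synchronized (lock) {
            segmentsNeedingRefresh.add(segmentId);
        }
    }

    // Called from the refresh thread: copies and clears the pending work under the lock,
    // so that the slow refresh itself can run without holding it.
    TreeSet<String> drainPending() {
        synchronized (lock) {
            TreeSet<String> copy = new TreeSet<>(segmentsNeedingRefresh);
            segmentsNeedingRefresh.clear();
            return copy;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedSetsSketch sketch = new GuardedSetsSketch();
        Thread callback = new Thread(() -> sketch.onSegmentAdded("seg-2"));
        callback.start();
        sketch.onSegmentAdded("seg-1");
        callback.join();
        System.out.println(sketch.drainPending());
    }
}
```

Copying under the lock keeps the critical section short, which matters when one of the two threads performs expensive queries.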
-
mutableSegments
protected final TreeSet<org.apache.druid.timeline.SegmentId> mutableSegments
-
segmentsNeedingRefresh
protected final TreeSet<org.apache.druid.timeline.SegmentId> segmentsNeedingRefresh
-
-
Constructor Detail
-
AbstractSegmentMetadataCache
public AbstractSegmentMetadataCache(QueryLifecycleFactory queryLifecycleFactory, SegmentMetadataCacheConfig config, Escalator escalator, InternalQueryConfig internalQueryConfig, org.apache.druid.java.util.emitter.service.ServiceEmitter emitter)
-
-
Method Detail
-
start
public void start() throws InterruptedException
- Throws:
InterruptedException
-
stop
public void stop()
-
awaitInitialization
public void awaitInitialization() throws InterruptedException
- Throws:
InterruptedException
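A blocking "wait until initialized" method of this shape is often built on a one-shot latch. The sketch below shows that general technique using java.util.concurrent.CountDownLatch. Whether this class uses a latch internally is an assumption, and InitLatchSketch, markInitialized, and isInitialized are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

final class InitLatchSketch {
    // Counts down exactly once, when the first full refresh has completed.
    private final CountDownLatch initialized = new CountDownLatch(1);

    // Called by the refresh thread after the initial schema has been built.
    void markInitialized() {
        initialized.countDown();
    }

    // Blocks the caller until markInitialized() has been called.
    void awaitInitialization() throws InterruptedException {
        initialized.await();
    }

    // Non-blocking check, convenient for tests and health checks.
    boolean isInitialized() {
        return initialized.getCount() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        InitLatchSketch cache = new InitLatchSketch();
        new Thread(cache::markInitialized).start();
        cache.awaitInitialization();
        System.out.println(cache.isInitialized());
    }
}
```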
-
getDatasource
public T getDatasource(String name)
Fetch schema for the given datasource.
- Parameters:
name - datasource
- Returns:
- schema information for the given datasource
-
getDataSourceInformationMap
public Map<String,T> getDataSourceInformationMap()
- Returns:
- Map of datasource and corresponding schema information.
-
getDatasourceNames
public Set<String> getDatasourceNames()
- Returns:
- Set of datasources for which schema information is cached.
-
getSegmentMetadataSnapshot
public Map<org.apache.druid.timeline.SegmentId,AvailableSegmentMetadata> getSegmentMetadataSnapshot()
Get metadata for all the cached segments, which includes information such as RowSignature, realtime, and numRows.
- Returns:
- Map of segmentId and corresponding metadata.
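Flattening a datasource -> segment -> metadata map into a single segment -> metadata snapshot can be sketched as follows. This is a simplified illustration, not the method's source: SnapshotSketch is a hypothetical class, segment IDs are strings, and a Long stands in for AvailableSegmentMetadata.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class SnapshotSketch {
    // Copies every inner entry into a new map; later changes to the cache do not affect the copy.
    static Map<String, Long> snapshot(
        ConcurrentHashMap<String, ConcurrentSkipListMap<String, Long>> info) {
        Map<String, Long> copy = new HashMap<>();
        for (ConcurrentSkipListMap<String, Long> segments : info.values()) {
            copy.putAll(segments);
        }
        return copy;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, ConcurrentSkipListMap<String, Long>> info =
            new ConcurrentHashMap<>();
        info.computeIfAbsent("wikipedia", ds -> new ConcurrentSkipListMap<>()).put("seg-1", 10L);
        info.computeIfAbsent("metrics", ds -> new ConcurrentSkipListMap<>()).put("seg-2", 20L);
        System.out.println(snapshot(info).size());
    }
}
```

Because the concurrent maps are iterated without an external lock, a snapshot taken during concurrent updates may reflect some of those updates and not others.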
-
getAvailableSegmentMetadata
@Nullable public AvailableSegmentMetadata getAvailableSegmentMetadata(String datasource, org.apache.druid.timeline.SegmentId segmentId)
Get metadata for the specified segment, which includes information like RowSignature, realtime & numRows.
- Parameters:
datasource - segment datasource
segmentId - segment Id
- Returns:
- Metadata information for the given segment
-
getTotalSegments
public int getTotalSegments()
Returns total number of segments. This method doesn't use the lock intentionally to avoid expensive contention. As a result, the returned value might be inexact.
-
refresh
public abstract void refresh(Set<org.apache.druid.timeline.SegmentId> segmentsToRefresh, Set<String> dataSourcesToRebuild) throws IOException
The child classes must override this method with the logic to build and cache table schema.
- Parameters:
segmentsToRefresh - segments for which the schema might have changed
dataSourcesToRebuild - datasources for which the schema might have changed
- Throws:
IOException - if querying segment schema from data nodes and tasks fails
-
addSegment
public void addSegment(DruidServerMetadata server, org.apache.druid.timeline.DataSegment segment)
-
removeSegment
public void removeSegment(org.apache.druid.timeline.DataSegment segment)
-
removeServerSegment
public void removeServerSegment(DruidServerMetadata server, org.apache.druid.timeline.DataSegment segment)
-
markDataSourceAsNeedRebuild
public void markDataSourceAsNeedRebuild(String datasource)
-
refreshSegments
public Set<org.apache.druid.timeline.SegmentId> refreshSegments(Set<org.apache.druid.timeline.SegmentId> segments) throws IOException
Attempt to refresh "segmentSignatures" for a set of segments. Returns the set of segments actually refreshed, which may be a subset of the asked-for set.
- Throws:
IOException
-
buildDataSourceRowSignature
@Nullable public org.apache.druid.segment.column.RowSignature buildDataSourceRowSignature(String dataSource)
-
getSegmentsNeedingRefresh
public TreeSet<org.apache.druid.timeline.SegmentId> getSegmentsNeedingRefresh()
-
getMutableSegments
public TreeSet<org.apache.druid.timeline.SegmentId> getMutableSegments()
-
runSegmentMetadataQuery
public org.apache.druid.java.util.common.guava.Sequence<org.apache.druid.query.metadata.metadata.SegmentAnalysis> runSegmentMetadataQuery(Iterable<org.apache.druid.timeline.SegmentId> segments)
Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
- Parameters:
segments - Iterable of SegmentId objects that are the subject of the SegmentMetadata query.
- Returns:
Sequence of SegmentAnalysis objects
-
setAvailableSegmentMetadata
public void setAvailableSegmentMetadata(org.apache.druid.timeline.SegmentId segmentId, AvailableSegmentMetadata availableSegmentMetadata)
This method is not thread-safe and must be used only in unit tests.
-
-