Class AbstractSegmentMetadataCache<T extends DataSourceInformation>

  • Type Parameters:
    T - The type of information associated with the data source, which must extend DataSourceInformation.
    Direct Known Subclasses:
    CoordinatorSegmentMetadataCache

    public abstract class AbstractSegmentMetadataCache<T extends DataSourceInformation>
    extends Object
    An abstract class that listens for segment change events and caches segment metadata. It periodically refreshes segments by fetching their metadata, which includes schema information, from sources such as data nodes, tasks, and the metadata database, and then builds the table schema.

    At startup, the cache waits for the timeline to be initialized. If the cache uses a segment metadata query to retrieve segment schema, it attempts to refresh at most MAX_SEGMENTS_PER_QUERY segments per datasource in each refresh cycle. Once every datasource has gone through this process, the initial schema of each datasource is built and the cache is marked as initialized. After that, the cache continues to refresh segments periodically and update the datasource schemas. Note that if a segment refresh fails, the refresh work is paused and resumes in the next refresh cycle.
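
    The per-datasource batching described above can be illustrated with a simplified model. This is a hypothetical sketch, not Druid's implementation: segment IDs are plain strings, SegmentBatchRefresher stands in for the segment metadata query, and the value 2 for MAX_SEGMENTS_PER_QUERY is chosen only for illustration. Each call to runCycle refreshes at most MAX_SEGMENTS_PER_QUERY segments per datasource and stops at the first failure, leaving the remaining segments pending for the next cycle.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for whatever fetches segment schema (e.g. a metadata query).
interface SegmentBatchRefresher {
  void refresh(String dataSource, List<String> segmentIds) throws IOException;
}

// Hypothetical model of one refresh cycle; not Druid's code.
class RefreshCycleSketch {
  // Illustrative value only; the real constant's value is not stated in the Javadoc.
  static final int MAX_SEGMENTS_PER_QUERY = 2;

  // Segment IDs still waiting for a refresh, grouped by datasource.
  private final Map<String, TreeSet<String>> pending = new TreeMap<>();

  void markForRefresh(String dataSource, String segmentId) {
    pending.computeIfAbsent(dataSource, k -> new TreeSet<>()).add(segmentId);
  }

  int pendingCount() {
    return pending.values().stream().mapToInt(TreeSet::size).sum();
  }

  // Refreshes at most MAX_SEGMENTS_PER_QUERY segments per datasource.
  // Returns false if a refresh failed; unrefreshed segments stay pending.
  boolean runCycle(SegmentBatchRefresher refresher) {
    for (Map.Entry<String, TreeSet<String>> entry : pending.entrySet()) {
      List<String> batch = new ArrayList<>();
      for (String id : entry.getValue()) {
        if (batch.size() == MAX_SEGMENTS_PER_QUERY) {
          break;
        }
        batch.add(id);
      }
      if (batch.isEmpty()) {
        continue;
      }
      try {
        refresher.refresh(entry.getKey(), batch);
      } catch (IOException e) {
        return false; // pause now; the remaining work resumes in the next cycle
      }
      entry.getValue().removeAll(batch);
    }
    pending.values().removeIf(TreeSet::isEmpty);
    return true;
  }
}
```

    With three pending segments in one datasource, the first cycle refreshes two of them and the third waits for the next cycle.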

    Subclasses must override the abstract method refresh(Set, Set) with the logic to build and cache table schema.

    • Method Detail

      • stop

        public void stop()
      • getDatasource

        public T getDatasource(String name)
        Fetch schema for the given datasource.
        Parameters:
        name - datasource name
        Returns:
        schema information for the given datasource
      • getDataSourceInformationMap

        public Map<String,T> getDataSourceInformationMap()
        Returns:
        Map of datasource name to its schema information.
      • getDatasourceNames

        public Set<String> getDatasourceNames()
        Returns:
        Set of datasources for which schema information is cached.
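
        The three datasource-level accessors above can be modeled with a concurrent map keyed by datasource name. This is a hypothetical sketch (DataSourceMapSketch is an invented name): the Javadoc does not state whether the returned map and set are copies or live views, so this version assumes immutable snapshots, and it returns null for an unknown datasource.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the datasource-level accessors; T plays the role of DataSourceInformation.
class DataSourceMapSketch<T> {
  private final Map<String, T> tables = new ConcurrentHashMap<>();

  void put(String dataSource, T info) {
    tables.put(dataSource, info);
  }

  // Like getDatasource(String); in this sketch, null when the datasource is not cached.
  T getDatasource(String name) {
    return tables.get(name);
  }

  // Assumption: callers receive an immutable snapshot rather than a live view.
  Map<String, T> getDataSourceInformationMap() {
    return Map.copyOf(tables);
  }

  // Assumption: also an immutable snapshot of the cached datasource names.
  Set<String> getDatasourceNames() {
    return Set.copyOf(tables.keySet());
  }
}
```

        Returning copies lets callers iterate without holding any lock, at the cost of not seeing later updates.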
      • getSegmentMetadataSnapshot

        public Map<org.apache.druid.timeline.SegmentId,AvailableSegmentMetadata> getSegmentMetadataSnapshot()
        Get metadata for all cached segments, including information such as the RowSignature, realtime status, and numRows.
        Returns:
        Map of segment ID to its metadata.
      • getAvailableSegmentMetadata

        @Nullable
        public AvailableSegmentMetadata getAvailableSegmentMetadata(String datasource,
                                                                    org.apache.druid.timeline.SegmentId segmentId)
        Get metadata for the specified segment, including information such as the RowSignature, realtime status, and numRows.
        Parameters:
        datasource - segment datasource
        segmentId - segment ID
        Returns:
        Metadata for the given segment; may be null.
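
        Because getAvailableSegmentMetadata is annotated @Nullable, callers should check for null before using the result. The hypothetical sketch below models the lookup with a nested datasource-to-segment map and flattens it into one map, analogous to getSegmentMetadataSnapshot. SegmentMetadataSketch and SegmentMetadataStore are invented stand-ins; string IDs replace SegmentId and are assumed to be unique across datasources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for AvailableSegmentMetadata.
final class SegmentMetadataSketch {
  final long numRows;
  final boolean realtime;

  SegmentMetadataSketch(long numRows, boolean realtime) {
    this.numRows = numRows;
    this.realtime = realtime;
  }
}

// Hypothetical store: datasource -> (segmentId -> metadata).
class SegmentMetadataStore {
  private final Map<String, Map<String, SegmentMetadataSketch>> byDataSource =
      new ConcurrentHashMap<>();

  void put(String dataSource, String segmentId, SegmentMetadataSketch metadata) {
    byDataSource.computeIfAbsent(dataSource, k -> new ConcurrentHashMap<>()).put(segmentId, metadata);
  }

  // Returns null when either the datasource or the segment is not cached.
  SegmentMetadataSketch get(String dataSource, String segmentId) {
    Map<String, SegmentMetadataSketch> segments = byDataSource.get(dataSource);
    return segments == null ? null : segments.get(segmentId);
  }

  // Flattens every cached segment into a single map keyed by segment ID.
  Map<String, SegmentMetadataSketch> snapshot() {
    Map<String, SegmentMetadataSketch> out = new HashMap<>();
    byDataSource.values().forEach(out::putAll);
    return out;
  }
}
```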
      • getTotalSegments

        public int getTotalSegments()
        Returns the total number of segments. This method intentionally does not acquire the lock, to avoid expensive contention; as a result, the returned value may be inexact.
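
        The trade-off described for getTotalSegments can be sketched as follows, assuming (hypothetically) that the count is kept in a volatile field that is updated only under the same lock that guards the segment set. Writers stay serialized, while readers skip the lock and may observe a value that lags an in-progress update. LockFreeCountSketch is an invented name, not Druid's code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a lock-free count read.
class LockFreeCountSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Set<String> segments = new HashSet<>();
  // Updated only while holding the lock; volatile so lock-free readers see a recent value.
  private volatile int totalSegments = 0;

  void addSegment(String segmentId) {
    lock.lock();
    try {
      if (segments.add(segmentId)) {
        totalSegments++; // safe: every writer holds the same lock
      }
    } finally {
      lock.unlock();
    }
  }

  void removeSegment(String segmentId) {
    lock.lock();
    try {
      if (segments.remove(segmentId)) {
        totalSegments--;
      }
    } finally {
      lock.unlock();
    }
  }

  // Never blocks on the lock, so the value can lag behind an in-progress update.
  int getTotalSegments() {
    return totalSegments;
  }
}
```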
      • refresh

        public abstract void refresh(Set<org.apache.druid.timeline.SegmentId> segmentsToRefresh,
                                     Set<String> dataSourcesToRebuild)
                              throws IOException
        Subclasses must override this method with the logic to build and cache table schema.
        Parameters:
        segmentsToRefresh - segments whose schema might have changed
        dataSourcesToRebuild - datasources whose schema might have changed
        Throws:
        IOException - if querying segment schema from data nodes and tasks fails
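
        To illustrate the contract of refresh, here is a hypothetical, self-contained analogue rather than a subclass of the real class: MiniSegmentMetadataCache and ColumnSetCache are invented names, segment IDs are strings, a table schema is just a sorted set of column names, and an in-memory map replaces the segment metadata query. Refreshed segments pull their datasources into the rebuild set, and a missing schema surfaces as an IOException.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, heavily simplified analogue of AbstractSegmentMetadataCache.
abstract class MiniSegmentMetadataCache<T> {
  protected final Map<String, T> tables = new HashMap<>();

  T getDatasource(String name) {
    return tables.get(name);
  }

  // Subclasses supply the logic to refresh segments and rebuild table schema.
  public abstract void refresh(Set<String> segmentsToRefresh, Set<String> dataSourcesToRebuild)
      throws IOException;
}

// Hypothetical subclass: a table "schema" is the sorted set of its segments' column names.
class ColumnSetCache extends MiniSegmentMetadataCache<Set<String>> {
  private final Map<String, String> dataSourceOf;       // segmentId -> datasource
  private final Map<String, Set<String>> remoteColumns; // simulated query results
  private final Map<String, Set<String>> cachedColumns = new HashMap<>();

  ColumnSetCache(Map<String, String> dataSourceOf, Map<String, Set<String>> remoteColumns) {
    this.dataSourceOf = dataSourceOf;
    this.remoteColumns = remoteColumns;
  }

  @Override
  public void refresh(Set<String> segmentsToRefresh, Set<String> dataSourcesToRebuild)
      throws IOException {
    Set<String> rebuild = new HashSet<>(dataSourcesToRebuild);
    for (String segmentId : segmentsToRefresh) {
      Set<String> columns = remoteColumns.get(segmentId);
      if (columns == null) {
        throw new IOException("no schema returned for " + segmentId);
      }
      cachedColumns.put(segmentId, columns);
      String dataSource = dataSourceOf.get(segmentId);
      if (dataSource != null) {
        rebuild.add(dataSource); // a changed segment may change its table's schema
      }
    }
    for (String dataSource : rebuild) {
      Set<String> union = new TreeSet<>();
      cachedColumns.forEach((segmentId, columns) -> {
        if (dataSource.equals(dataSourceOf.get(segmentId))) {
          union.addAll(columns);
        }
      });
      tables.put(dataSource, union);
    }
  }
}
```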
      • addSegment

        public void addSegment(DruidServerMetadata server,
                               org.apache.druid.timeline.DataSegment segment)
      • removeSegment

        public void removeSegment(org.apache.druid.timeline.DataSegment segment)
      • removeServerSegment

        public void removeServerSegment(DruidServerMetadata server,
                                        org.apache.druid.timeline.DataSegment segment)
      • markDataSourceAsNeedRebuild

        public void markDataSourceAsNeedRebuild(String datasource)
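
        The Javadoc leaves addSegment, removeSegment, removeServerSegment, and markDataSourceAsNeedRebuild undocumented. The following hypothetical sketch shows one plausible bookkeeping scheme, and every behavior in it is an assumption: adding a segment's first replica queues it for refresh, removing one server's replica only drops that replica, and removing a segment marks its datasource for rebuild. SegmentEventSketch is an invented name, and server and segment identities are plain strings rather than DruidServerMetadata and DataSegment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of segment-event bookkeeping; behavior is assumed, not taken from Druid.
class SegmentEventSketch {
  private final Map<String, Set<String>> serversBySegment = new HashMap<>();
  private final Map<String, String> dataSourceBySegment = new HashMap<>();
  private final TreeSet<String> segmentsNeedingRefresh = new TreeSet<>();
  private final Set<String> dataSourcesNeedingRebuild = new HashSet<>();

  synchronized void addSegment(String server, String dataSource, String segmentId) {
    boolean isNew = !serversBySegment.containsKey(segmentId);
    serversBySegment.computeIfAbsent(segmentId, k -> new HashSet<>()).add(server);
    dataSourceBySegment.put(segmentId, dataSource);
    if (isNew) {
      segmentsNeedingRefresh.add(segmentId); // first replica: schema not yet known
    }
  }

  // Assumption: only this server's replica is dropped; removeSegment removes the segment itself.
  synchronized void removeServerSegment(String server, String segmentId) {
    Set<String> servers = serversBySegment.get(segmentId);
    if (servers != null) {
      servers.remove(server);
    }
  }

  synchronized void removeSegment(String segmentId) {
    serversBySegment.remove(segmentId);
    segmentsNeedingRefresh.remove(segmentId);
    String dataSource = dataSourceBySegment.remove(segmentId);
    if (dataSource != null) {
      markDataSourceAsNeedRebuild(dataSource); // table schema may have lost columns
    }
  }

  synchronized void markDataSourceAsNeedRebuild(String dataSource) {
    dataSourcesNeedingRebuild.add(dataSource);
  }

  // Defensive copies so callers never see the sets change underneath them.
  synchronized TreeSet<String> getSegmentsNeedingRefresh() {
    return new TreeSet<>(segmentsNeedingRefresh);
  }

  synchronized Set<String> getDataSourcesNeedingRebuild() {
    return new HashSet<>(dataSourcesNeedingRebuild);
  }
}
```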
      • refreshSegments

        public Set<org.apache.druid.timeline.SegmentId> refreshSegments(Set<org.apache.druid.timeline.SegmentId> segments)
                                                                 throws IOException
        Attempt to refresh "segmentSignatures" for a set of segments. Returns the set of segments actually refreshed, which may be a subset of the requested set.
        Throws:
        IOException
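
        Because refreshSegments may refresh only part of what was requested, a caller can compute what is left by set difference and keep it for the next attempt. A hypothetical sketch: PartialRefresher is an invented functional stand-in for refreshSegments, and segment IDs are plain strings.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for refreshSegments(Set): returns the subset actually refreshed.
interface PartialRefresher {
  Set<String> refreshSegments(Set<String> segments) throws IOException;
}

final class RefreshRetrySketch {
  private RefreshRetrySketch() {}

  // Runs one refresh attempt and returns the requested segments that still need a refresh.
  static TreeSet<String> remainingAfter(PartialRefresher refresher, Set<String> requested)
      throws IOException {
    Set<String> refreshed = refresher.refreshSegments(requested);
    TreeSet<String> remaining = new TreeSet<>(requested);
    remaining.removeAll(refreshed);
    return remaining;
  }
}
```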
      • buildDataSourceRowSignature

        @Nullable
        public org.apache.druid.segment.column.RowSignature buildDataSourceRowSignature(String dataSource)
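
        The Javadoc does not describe how buildDataSourceRowSignature combines segment signatures. As a hypothetical illustration only, the sketch below merges per-segment column-to-type maps, keeping columns in first-seen order and letting the first type seen for each column win, and returns null when there are no segments (consistent with the @Nullable annotation). Druid's actual merge rule may differ; RowSignatureMergeSketch is an invented name and column types are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowSignatureMergeSketch {
  private RowSignatureMergeSketch() {}

  // Merges per-segment column -> type maps into one datasource signature.
  // Assumed rule: first-seen column order, first-seen type wins; null if there are no segments.
  static Map<String, String> merge(List<Map<String, String>> segmentSignatures) {
    if (segmentSignatures.isEmpty()) {
      return null;
    }
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> signature : segmentSignatures) {
      signature.forEach(merged::putIfAbsent);
    }
    return merged;
  }
}
```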
      • getSegmentsNeedingRefresh

        public TreeSet<org.apache.druid.timeline.SegmentId> getSegmentsNeedingRefresh()
      • getMutableSegments

        public TreeSet<org.apache.druid.timeline.SegmentId> getMutableSegments()
      • getDataSourcesNeedingRebuild

        public Set<String> getDataSourcesNeedingRebuild()
      • runSegmentMetadataQuery

        public org.apache.druid.java.util.common.guava.Sequence<org.apache.druid.query.metadata.metadata.SegmentAnalysis> runSegmentMetadataQuery(Iterable<org.apache.druid.timeline.SegmentId> segments)
        Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
        Parameters:
        segments - Iterable of SegmentId objects that are the subject of the SegmentMetadata query.
        Returns:
        Sequence of SegmentAnalysis objects
      • setAvailableSegmentMetadata

        public void setAvailableSegmentMetadata(org.apache.druid.timeline.SegmentId segmentId,
                                                AvailableSegmentMetadata availableSegmentMetadata)
        This method is not thread-safe and must be used only in unit tests.
      • doInLock

        protected void doInLock(Runnable runnable)
        This is a helper method for unit tests to emulate heavy work done while holding the lock. It must be used only in unit tests.