Class BaseDeleteLoader

  • All Implemented Interfaces:
    DeleteLoader

    public class BaseDeleteLoader
    extends java.lang.Object
    implements DeleteLoader
    • Constructor Summary

      Constructors 
      Constructor Description
      BaseDeleteLoader(java.util.function.Function<org.apache.iceberg.DeleteFile,org.apache.iceberg.io.InputFile> loadInputFile)
      BaseDeleteLoader(java.util.function.Function<org.apache.iceberg.DeleteFile,org.apache.iceberg.io.InputFile> loadInputFile, java.util.concurrent.ExecutorService workerPool)
    • Method Summary

      Modifier and Type Method Description
      protected boolean canCache(long size)
      Checks if the given number of bytes can be cached.
      protected <V> V getOrLoad(java.lang.String key, java.util.function.Supplier<V> valueSupplier, long valueSize)
      Gets the cached value for the key or populates the cache with a new mapping.
      org.apache.iceberg.util.StructLikeSet loadEqualityDeletes(java.lang.Iterable<org.apache.iceberg.DeleteFile> deleteFiles, org.apache.iceberg.Schema projection)
      Loads the content of equality delete files into a set.
      org.apache.iceberg.deletes.PositionDeleteIndex loadPositionDeletes(java.lang.Iterable<org.apache.iceberg.DeleteFile> deleteFiles, java.lang.CharSequence filePath)
      Loads the content of a deletion vector or position delete files for a given data file path into a position index.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • BaseDeleteLoader

        public BaseDeleteLoader(java.util.function.Function<org.apache.iceberg.DeleteFile,org.apache.iceberg.io.InputFile> loadInputFile)
      • BaseDeleteLoader

        public BaseDeleteLoader(java.util.function.Function<org.apache.iceberg.DeleteFile,org.apache.iceberg.io.InputFile> loadInputFile,
                                java.util.concurrent.ExecutorService workerPool)
    • Method Detail

      • canCache

        protected boolean canCache(long size)
        Checks if the given number of bytes can be cached.

        Subclasses that support caching should override this method and use the provided size to decide whether a value is eligible for caching. For instance, rejecting values that are too large can improve cache performance and utilization.
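
        As an illustration of a size-based eligibility check, the following is a minimal sketch in plain Java. The class name SizeThresholdPolicy and the maxEntrySizeBytes limit are hypothetical and not part of Iceberg; a real subclass would place equivalent logic in its canCache(long) override.

```java
// Hypothetical helper, not an Iceberg class: decides cache eligibility by size alone.
class SizeThresholdPolicy {
  private final long maxEntrySizeBytes;

  SizeThresholdPolicy(long maxEntrySizeBytes) {
    if (maxEntrySizeBytes < 0) {
      throw new IllegalArgumentException("maxEntrySizeBytes must be non-negative");
    }
    this.maxEntrySizeBytes = maxEntrySizeBytes;
  }

  // Mirrors the shape of canCache(long): true only when the value is small
  // enough that caching it will not crowd out other entries.
  boolean canCache(long size) {
    return size >= 0 && size <= maxEntrySizeBytes;
  }
}
```

        With a limit of 64 MB, for example, a 1 KB value would be accepted and a 128 MB value would be rejected.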

      • getOrLoad

        protected <V> V getOrLoad(java.lang.String key,
                                  java.util.function.Supplier<V> valueSupplier,
                                  long valueSize)
        Gets the cached value for the key or populates the cache with a new mapping.

        If the value for the specified key is in the cache, it should be returned. If the value is not in the cache, implementations should compute the value using the provided supplier, cache it, and then return it.

        This method will be called only if canCache(long) returned true.
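
        The contract described above (return the cached value if present, otherwise compute, cache, and return it, and only when canCache(long) allows it) can be sketched in plain Java as follows. InMemoryDeleteCache, its load method, and the size limit are assumptions made for illustration; this is not how any particular Iceberg subclass is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a caching subclass; it only demonstrates the contract.
class InMemoryDeleteCache {
  private final Map<String, Object> cache = new ConcurrentHashMap<>();
  private final long maxEntrySizeBytes;

  InMemoryDeleteCache(long maxEntrySizeBytes) {
    this.maxEntrySizeBytes = maxEntrySizeBytes;
  }

  // Same shape as canCache(long): values above the limit are not eligible.
  boolean canCache(long size) {
    return size <= maxEntrySizeBytes;
  }

  // Same shape as getOrLoad: the supplier runs only on a cache miss.
  // valueSize is unused here; a weighted cache could use it as the entry weight.
  @SuppressWarnings("unchecked")
  <V> V getOrLoad(String key, Supplier<V> valueSupplier, long valueSize) {
    return (V) cache.computeIfAbsent(key, ignored -> valueSupplier.get());
  }

  // Hypothetical caller: consults canCache before touching the cache,
  // and falls back to loading directly when the value is not eligible.
  <V> V load(String key, Supplier<V> valueSupplier, long valueSize) {
    if (canCache(valueSize)) {
      return getOrLoad(key, valueSupplier, valueSize);
    }
    return valueSupplier.get();
  }
}
```

        In this sketch, two calls to load with the same key and an eligible size invoke the supplier once, while an oversized value is recomputed on every call.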

      • loadEqualityDeletes

        public org.apache.iceberg.util.StructLikeSet loadEqualityDeletes(java.lang.Iterable<org.apache.iceberg.DeleteFile> deleteFiles,
                                                                         org.apache.iceberg.Schema projection)
        Description copied from interface: DeleteLoader
        Loads the content of equality delete files into a set.
        Specified by:
        loadEqualityDeletes in interface DeleteLoader
        Parameters:
        deleteFiles - equality delete files
        projection - a projection of columns to load
        Returns:
        a set of equality deletes
      • loadPositionDeletes

        public org.apache.iceberg.deletes.PositionDeleteIndex loadPositionDeletes(java.lang.Iterable<org.apache.iceberg.DeleteFile> deleteFiles,
                                                                                  java.lang.CharSequence filePath)
        Loads the content of a deletion vector or position delete files for a given data file path into a position index.

        The deletion vector is currently loaded without caching because the existing Puffin reader requires at least 3 requests to fetch the entire file. Caching a single deletion vector may be useful only when multiple data file splits are processed on the same node, which is unlikely as task locality is not guaranteed.

        For position delete files, however, there is no efficient way to read deletes for a particular data file. Therefore, caching may be more effective as such delete files potentially apply to many data files, especially in unpartitioned tables and tables with deep partitions. If a position delete file qualifies for caching, this method will attempt to cache a position index for each referenced data file.

        Specified by:
        loadPositionDeletes in interface DeleteLoader
        Parameters:
        deleteFiles - a deletion vector or position delete files
        filePath - the data file path for which to load deletes
        Returns:
        a position delete index for the provided data file path
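
        The idea of building one position index per referenced data file from a single pass over a position delete file can be sketched in plain Java as follows. PositionDeleteGrouping and its nested PositionDelete record are hypothetical, and a sorted set of positions stands in for PositionDeleteIndex; the sketch does not reflect the actual Iceberg implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not an Iceberg class.
class PositionDeleteGrouping {

  // Hypothetical row of a position delete file: a data file path and a row position.
  static final class PositionDelete {
    final String filePath;
    final long position;

    PositionDelete(String filePath, long position) {
      this.filePath = filePath;
      this.position = position;
    }
  }

  // Reads the delete rows once and groups deleted positions by data file path,
  // so an index for every referenced data file is available after a single scan.
  static Map<String, Set<Long>> indexByDataFile(List<PositionDelete> deletes) {
    Map<String, Set<Long>> indexes = new HashMap<>();
    for (PositionDelete delete : deletes) {
      indexes
          .computeIfAbsent(delete.filePath, ignored -> new TreeSet<>())
          .add(delete.position);
    }
    return indexes;
  }
}
```

        Each entry of the resulting map could then be cached under a key derived from the delete file and the data file path, so later tasks that read other data files covered by the same delete file do not have to scan it again.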