Class TupleDomainParquetPredicate

java.lang.Object
io.trino.parquet.predicate.TupleDomainParquetPredicate

public class TupleDomainParquetPredicate extends Object
  • Constructor Summary

    Constructors
    Constructor
    Description
    TupleDomainParquetPredicate(TupleDomain<org.apache.parquet.column.ColumnDescriptor> effectivePredicate, List<org.apache.parquet.column.ColumnDescriptor> columns, org.joda.time.DateTimeZone timeZone)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static long
    asLong(Object value)
     
    static boolean
    checkInBloomFilter(org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, Object predicateValue, Type sqlType)
    Checks whether predicateValue might be in the Bloom filter.
    static Domain
    getDomain(Type type, long columnValuesCount, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, ParquetDataSourceId id, org.apache.parquet.column.ColumnDescriptor descriptor, org.joda.time.DateTimeZone timeZone)
     
    static Domain
    getDomain(Type type, DictionaryDescriptor dictionaryDescriptor)
     
    static Domain
    getDomain(org.apache.parquet.column.ColumnDescriptor column, Type type, long columnValuesCount, org.apache.parquet.column.statistics.Statistics<?> statistics, ParquetDataSourceId id, org.joda.time.DateTimeZone timeZone)
     
    Optional<List<org.apache.parquet.column.ColumnDescriptor>>
    getIndexLookupCandidates(Map<org.apache.parquet.column.ColumnDescriptor,Long> valueCounts, Map<org.apache.parquet.column.ColumnDescriptor,org.apache.parquet.column.statistics.Statistics<?>> statistics, ParquetDataSourceId id)
    Determines whether the Parquet reader should process a file section with the specified statistics and, if so, returns the columns that are candidates for further inspection of more granular statistics from the column index and dictionary.
    boolean
    matches(BloomFilterStore bloomFilterStore, int domainCompactionThreshold)
    Determines whether the Parquet reader should process a file section, based on the specified Bloom filter store.
    boolean
    matches(DictionaryDescriptor dictionary)
    Determines whether the Parquet reader should process a file section, based on the specified single-column dictionary alone.
    boolean
    matches(Map<org.apache.parquet.column.ColumnDescriptor,Long> valueCounts, org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore columnIndexStore, ParquetDataSourceId id)
    Determines whether the Parquet reader should process a file section with the specified statistics.
    Optional<org.apache.parquet.filter2.predicate.FilterPredicate>
    toParquetFilter(org.joda.time.DateTimeZone timeZone)
    Converts the predicate to a Parquet filter if possible.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • TupleDomainParquetPredicate

      public TupleDomainParquetPredicate(TupleDomain<org.apache.parquet.column.ColumnDescriptor> effectivePredicate, List<org.apache.parquet.column.ColumnDescriptor> columns, org.joda.time.DateTimeZone timeZone)
  • Method Details

    • getIndexLookupCandidates

      public Optional<List<org.apache.parquet.column.ColumnDescriptor>> getIndexLookupCandidates(Map<org.apache.parquet.column.ColumnDescriptor,Long> valueCounts, Map<org.apache.parquet.column.ColumnDescriptor,org.apache.parquet.column.statistics.Statistics<?>> statistics, ParquetDataSourceId id) throws ParquetCorruptionException
      Determines whether the Parquet reader should process a file section with the specified statistics and, if so, returns the columns that are candidates for further inspection of more granular statistics from the column index and dictionary.
      Parameters:
      valueCounts - the number of values for a column in the segment; this can be used with Statistics to determine if a column is only null
      statistics - column statistics
      id - Parquet file name
      Returns:
      Optional.empty() if the statistics were sufficient to eliminate the file section. Otherwise, a list of columns for which page-level indices and the dictionary could be consulted to potentially eliminate the file section. An Optional containing an empty list is returned when examining the column index or dictionary would bring no benefit for any column.
      Throws:
      ParquetCorruptionException
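      The three possible outcomes described under Returns can be handled as in the minimal sketch below. It is not Trino code: column names are plain String values standing in for ColumnDescriptor, and the class name, the Action enum, and the decide method are hypothetical names introduced only for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: String stands in for org.apache.parquet.column.ColumnDescriptor.
class IndexLookupCandidatesSketch {
    enum Action { SKIP_SECTION, READ_SECTION, INSPECT_INDEXES }

    // Maps the documented return value onto what a caller might do next.
    static Action decide(Optional<List<String>> candidates) {
        if (!candidates.isPresent()) {
            // Statistics alone were sufficient to eliminate the file section.
            return Action.SKIP_SECTION;
        }
        if (candidates.get().isEmpty()) {
            // No column would benefit from a column index or dictionary lookup.
            return Action.READ_SECTION;
        }
        // Some columns are worth checking against page-level indices or dictionaries.
        return Action.INSPECT_INDEXES;
    }

    public static void main(String[] args) {
        System.out.println(decide(Optional.empty()));
        System.out.println(decide(Optional.of(List.of())));
        System.out.println(decide(Optional.of(List.of("orderkey"))));
    }
}
```

      Note the difference between an empty Optional and an Optional holding an empty list: only the former means the section can be skipped.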
    • matches

      public boolean matches(DictionaryDescriptor dictionary)
      Determines whether the Parquet reader should process a file section, based on the specified single-column dictionary alone. This is safe to check repeatedly, which avoids loading more Parquet dictionaries once the section can already be eliminated.
      Parameters:
      dictionary - The single column dictionary
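      The repeated, per-dictionary check described above can be sketched as a short-circuiting loop. This is an illustration under stated assumptions, not Trino's implementation: the Column class, the lazy Supplier-based dictionary loader, and the set-intersection test used as the "match" are all hypothetical stand-ins.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of checking one dictionary at a time and stopping early.
class DictionaryShortCircuitSketch {
    // Stand-in for one column: a lazily loaded dictionary and the values the predicate allows.
    static final class Column {
        final Supplier<Set<String>> dictionaryLoader;
        final Set<String> allowedValues;

        Column(Supplier<Set<String>> dictionaryLoader, Set<String> allowedValues) {
            this.dictionaryLoader = dictionaryLoader;
            this.allowedValues = allowedValues;
        }
    }

    // Assumed matching rule: the dictionary matches if it shares a value with the predicate.
    static boolean matches(Set<String> dictionary, Set<String> allowedValues) {
        return !Collections.disjoint(dictionary, allowedValues);
    }

    static boolean shouldReadSection(List<Column> columns) {
        for (Column column : columns) {
            Set<String> dictionary = column.dictionaryLoader.get(); // load only when needed
            if (!matches(dictionary, column.allowedValues)) {
                return false; // section eliminated; remaining dictionaries are never loaded
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
                new Column(() -> Set.of("a", "b"), Set.of("z")),
                new Column(() -> Set.of("c"), Set.of("c")));
        System.out.println(shouldReadSection(columns));
    }
}
```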
    • matches

      public boolean matches(Map<org.apache.parquet.column.ColumnDescriptor,Long> valueCounts, org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore columnIndexStore, ParquetDataSourceId id) throws ParquetCorruptionException
      Determines whether the Parquet reader should process a file section with the specified statistics.
      Parameters:
      valueCounts - the number of values for a column in the segment; this can be used with Statistics to determine if a column is only null
      columnIndexStore - column index (statistics) store
      id - Parquet file name
      Throws:
      ParquetCorruptionException
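      As a rough illustration of how valueCounts can be combined with statistics to detect a column that is only null, the sketch below compares a value count with a null count. The class and method names are hypothetical, and the exact rule Trino applies is an assumption here.

```java
// Hypothetical sketch: a column segment is "only null" when every value is null.
class OnlyNullSketch {
    static boolean isOnlyNull(long valueCount, long nullCount) {
        // Assumed rule: the null count reported by statistics equals the total value count.
        return nullCount == valueCount;
    }

    public static void main(String[] args) {
        System.out.println(isOnlyNull(100, 100));
        System.out.println(isOnlyNull(100, 40));
    }
}
```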
    • matches

      public boolean matches(BloomFilterStore bloomFilterStore, int domainCompactionThreshold)
      Determines whether the Parquet reader should process a file section, based on the specified Bloom filter store.
      Parameters:
      bloomFilterStore - Bloom filter store
    • toParquetFilter

      public Optional<org.apache.parquet.filter2.predicate.FilterPredicate> toParquetFilter(org.joda.time.DateTimeZone timeZone)
      Converts the predicate to a Parquet filter if possible.
      Parameters:
      timeZone - current Parquet timezone
      Returns:
      The converted Parquet filter, or an empty Optional if conversion is not possible
    • getDomain

      public static Domain getDomain(org.apache.parquet.column.ColumnDescriptor column, Type type, long columnValuesCount, org.apache.parquet.column.statistics.Statistics<?> statistics, ParquetDataSourceId id, org.joda.time.DateTimeZone timeZone) throws ParquetCorruptionException
      Throws:
      ParquetCorruptionException
    • getDomain

      public static Domain getDomain(Type type, long columnValuesCount, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, ParquetDataSourceId id, org.apache.parquet.column.ColumnDescriptor descriptor, org.joda.time.DateTimeZone timeZone) throws ParquetCorruptionException
      Throws:
      ParquetCorruptionException
    • getDomain

      public static Domain getDomain(Type type, DictionaryDescriptor dictionaryDescriptor)
    • asLong

      public static long asLong(Object value)
    • checkInBloomFilter

      public static boolean checkInBloomFilter(org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, Object predicateValue, Type sqlType)
      Checks whether predicateValue might be in the Bloom filter.
      Parameters:
      bloomFilter - Parquet Bloom filter
      predicateValue - effective discrete predicate value
      sqlType - type that carries information about the type schema from the connector's metadata
      Returns:
      true if predicateValue might be in the Bloom filter, false if predicateValue is definitely not in the Bloom filter
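      The "might be" versus "definitely not" contract is the general property of Bloom filters: false positives are possible, false negatives are not. The toy filter below illustrates that property only. It is not Parquet's Bloom filter format or hashing scheme, and every name in it is hypothetical.

```java
import java.util.BitSet;

// Toy Bloom filter for illustration; unrelated to Parquet's actual Bloom filter layout.
class ToyBloomFilter {
    private static final int HASH_COUNT = 3;
    private final BitSet bits;
    private final int size;

    ToyBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Derives a bit position from the value's hashCode and a per-probe seed.
    private int index(Object value, int seed) {
        int h = value.hashCode() * 31 + seed * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, size);
    }

    void insert(Object value) {
        for (int seed = 1; seed <= HASH_COUNT; seed++) {
            bits.set(index(value, seed));
        }
    }

    // true: the value may have been inserted; false: it was definitely never inserted.
    boolean mightContain(Object value) {
        for (int seed = 1; seed <= HASH_COUNT; seed++) {
            if (!bits.get(index(value, seed))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1024);
        filter.insert("apple");
        System.out.println(filter.mightContain("apple"));
    }
}
```

      A reader can therefore skip a file section only on a false result; a true result still requires reading the data to confirm.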