Package io.trino.parquet.predicate
Class TupleDomainParquetPredicate
java.lang.Object
io.trino.parquet.predicate.TupleDomainParquetPredicate
-
Constructor Summary
Constructor
TupleDomainParquetPredicate(TupleDomain<org.apache.parquet.column.ColumnDescriptor> effectivePredicate, List<org.apache.parquet.column.ColumnDescriptor> columns, org.joda.time.DateTimeZone timeZone)
-
Method Summary
Modifier and Type, Method, Description
-
static long asLong
-
static boolean checkInBloomFilter(org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, Object predicateValue, Type sqlType)
Checks whether the predicateValue might be in the Bloom filter.
-
static Domain getDomain(Type type, long columnValuesCount, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, ParquetDataSourceId id, org.apache.parquet.column.ColumnDescriptor descriptor, org.joda.time.DateTimeZone timeZone)
-
static Domain getDomain(Type type, DictionaryDescriptor dictionaryDescriptor)
-
static Domain getDomain(org.apache.parquet.column.ColumnDescriptor column, Type type, long columnValuesCount, org.apache.parquet.column.statistics.Statistics<?> statistics, ParquetDataSourceId id, org.joda.time.DateTimeZone timeZone)
-
Optional<List<org.apache.parquet.column.ColumnDescriptor>> getIndexLookupCandidates(Map<org.apache.parquet.column.ColumnDescriptor, Long> valueCounts, Map<org.apache.parquet.column.ColumnDescriptor, org.apache.parquet.column.statistics.Statistics<?>> statistics, ParquetDataSourceId id)
Determines whether the Parquet reader should process a file section with the specified statistics and, if so, returns the columns that are candidates for further inspection of more granular statistics from the column index and dictionary.
-
boolean matches(BloomFilterStore bloomFilterStore, int domainCompactionThreshold)
Determines whether the Parquet reader should process a file section with the specified Bloom filter store.
-
boolean matches(DictionaryDescriptor dictionary)
Determines whether the Parquet reader should process a file section, based on the dictionary of a single column.
-
boolean matches(Map<org.apache.parquet.column.ColumnDescriptor, Long> valueCounts, org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore columnIndexStore, ParquetDataSourceId id)
Determines whether the Parquet reader should process a file section with the specified statistics.
-
Optional<org.apache.parquet.filter2.predicate.FilterPredicate> toParquetFilter(org.joda.time.DateTimeZone timeZone)
Converts the predicate to a Parquet filter if possible.
-
Constructor Details
-
TupleDomainParquetPredicate
public TupleDomainParquetPredicate(TupleDomain<org.apache.parquet.column.ColumnDescriptor> effectivePredicate, List<org.apache.parquet.column.ColumnDescriptor> columns, org.joda.time.DateTimeZone timeZone)
-
-
Method Details
-
getIndexLookupCandidates
public Optional<List<org.apache.parquet.column.ColumnDescriptor>> getIndexLookupCandidates(Map<org.apache.parquet.column.ColumnDescriptor, Long> valueCounts, Map<org.apache.parquet.column.ColumnDescriptor, org.apache.parquet.column.statistics.Statistics<?>> statistics, ParquetDataSourceId id) throws ParquetCorruptionException
Determines whether the Parquet reader should process a file section with the specified statistics and, if so, returns the columns that are candidates for further inspection of more granular statistics from the column index and dictionary.
Parameters:
valueCounts - the number of values for each column in the segment; this can be used with Statistics to determine whether a column contains only nulls
statistics - column statistics
id - Parquet file name
Returns:
Optional.empty() if the statistics were sufficient to eliminate the file section. Otherwise, a list of columns for which page-level indices and dictionaries could be consulted to potentially eliminate the file section. An Optional containing an empty list is returned if there is no benefit in looking at the column index or dictionary for any column.
Throws:
ParquetCorruptionException
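The three outcomes described under Returns are easy to conflate, so the following is a minimal sketch of how a caller might branch on them. It runs on the JDK alone: column names stand in for ColumnDescriptor, and IndexLookupSketch, PruningDecision, and decide are hypothetical names invented for illustration, not part of Trino.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in: column names replace ColumnDescriptor so the sketch needs no Trino or Parquet classes.
class IndexLookupSketch
{
    enum PruningDecision { SKIP_SECTION, READ_SECTION, CONSULT_INDEXES }

    // Maps the three-state result onto an explicit decision.
    static PruningDecision decide(Optional<List<String>> candidates)
    {
        if (!candidates.isPresent()) {
            // Statistics alone eliminated the file section.
            return PruningDecision.SKIP_SECTION;
        }
        if (candidates.get().isEmpty()) {
            // The section survives, and no column would benefit from column index or dictionary checks.
            return PruningDecision.READ_SECTION;
        }
        // The section survives so far; the listed columns may still allow it to be eliminated.
        return PruningDecision.CONSULT_INDEXES;
    }
}
```

The point of the sketch is that an empty Optional and an Optional holding an empty list mean opposite things: the first skips the section, the second reads it without further index lookups.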
-
matches
public boolean matches(DictionaryDescriptor dictionary)
Determines whether the Parquet reader should process a file section, based on the dictionary of a single column. The check is safe to repeat for one column at a time, so a caller can avoid loading further Parquet dictionaries once the section has been eliminated.
Parameters:
dictionary - the single column dictionary
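Because the check works on one dictionary at a time, a caller can stop as soon as any single column rules the section out. The following is a minimal sketch of that loop under stated assumptions: column names stand in for dictionaries, a java.util.function.Predicate stands in for the real matches call, and DictionaryPruningSketch with its loaded counter is a hypothetical helper that exists only to make the early exit observable.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; the real class operates on DictionaryDescriptor instances.
class DictionaryPruningSketch
{
    // Number of dictionaries "loaded" by the last call; present only to show the early exit.
    static int loaded;

    // Returns true if the section must still be read, false once any single dictionary eliminates it.
    static boolean sectionMayMatch(List<String> columns, Predicate<String> dictionaryMatches)
    {
        loaded = 0;
        for (String column : columns) {
            loaded++; // stands in for the cost of reading one column's dictionary page
            if (!dictionaryMatches.test(column)) {
                return false; // eliminated: the remaining dictionaries are never loaded
            }
        }
        return true;
    }
}
```

With columns a, b, and c, where only the dictionary for b fails the check, the loop stops after two loads and never touches c.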
-
matches
public boolean matches(Map<org.apache.parquet.column.ColumnDescriptor, Long> valueCounts, org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore columnIndexStore, ParquetDataSourceId id) throws ParquetCorruptionException
Determines whether the Parquet reader should process a file section with the specified statistics.
Parameters:
valueCounts - the number of values for each column in the segment; this can be used with Statistics to determine whether a column contains only nulls
columnIndexStore - column index (statistics) store
id - Parquet file name
Throws:
ParquetCorruptionException
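The note on valueCounts can be made concrete: if a column's null count equals its value count, every value in the section is null, so a predicate that rejects nulls cannot match. The following is a minimal sketch with plain long values. OnlyNullSketch and its methods are hypothetical, the treatment of an empty section is an assumption of this sketch, and the real class works with Parquet Statistics and Trino Domain objects instead.

```java
// Hypothetical sketch of the only-null reasoning; not the Trino implementation.
class OnlyNullSketch
{
    // True when the null count accounts for every value in the section.
    // Assumption of this sketch: an empty section (valueCount == 0) is not treated as only-null.
    static boolean isOnlyNull(long valueCount, long nullCount)
    {
        return valueCount > 0 && nullCount == valueCount;
    }

    // The section can be skipped on this column when it is only null and the predicate rejects nulls.
    static boolean canSkip(long valueCount, long nullCount, boolean predicateAllowsNull)
    {
        return isOnlyNull(valueCount, nullCount) && !predicateAllowsNull;
    }
}
```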
-
matches
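public boolean matches(BloomFilterStore bloomFilterStore, int domainCompactionThreshold)
Determines whether the Parquet reader should process a file section with the specified Bloom filter store.
Parameters:
bloomFilterStore - Bloom filter store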
-
toParquetFilter
public Optional<org.apache.parquet.filter2.predicate.FilterPredicate> toParquetFilter(org.joda.time.DateTimeZone timeZone)
Converts the predicate to a Parquet filter if possible.
Parameters:
timeZone - current Parquet time zone
Returns:
the converted Parquet filter, or an empty Optional if conversion is not possible
-
getDomain
public static Domain getDomain(org.apache.parquet.column.ColumnDescriptor column, Type type, long columnValuesCount, org.apache.parquet.column.statistics.Statistics<?> statistics, ParquetDataSourceId id, org.joda.time.DateTimeZone timeZone) throws ParquetCorruptionException
Throws:
ParquetCorruptionException
-
getDomain
public static Domain getDomain(Type type, long columnValuesCount, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, ParquetDataSourceId id, org.apache.parquet.column.ColumnDescriptor descriptor, org.joda.time.DateTimeZone timeZone) throws ParquetCorruptionException
Throws:
ParquetCorruptionException
-
getDomain
public static Domain getDomain(Type type, DictionaryDescriptor dictionaryDescriptor)
-
asLong
-
checkInBloomFilter
public static boolean checkInBloomFilter(org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, Object predicateValue, Type sqlType)
Checks whether the predicateValue might be in the Bloom filter.
Parameters:
bloomFilter - Parquet Bloom filter
predicateValue - effective discrete predicate value
sqlType - Type that carries the type schema information from the connector's metadata
Returns:
true if the predicateValue might be in the Bloom filter, false if the predicateValue is definitely not in the Bloom filter
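The asymmetry in the return value ("might be" versus "definitely not") follows from how Bloom filters work in general. The following toy filter is a minimal sketch of that property only. ToyBloomFilter is a hypothetical class with an ad hoc hash scheme; it is not Parquet's BloomFilter, and it does not reproduce how checkInBloomFilter hashes values by SQL type.

```java
import java.util.BitSet;

// Toy Bloom filter, not Parquet's implementation: shows why a lookup can only answer "maybe" or "no".
class ToyBloomFilter
{
    private final BitSet bits;
    private final int size;

    ToyBloomFilter(int size)
    {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Sets two bit positions derived from the value's hash code.
    void add(Object value)
    {
        bits.set(index(value, 0));
        bits.set(index(value, 1));
    }

    // true: the value might have been added (other values can set the same bits, so false positives occur).
    // false: at least one of its bits is unset, so the value was definitely never added.
    boolean mightContain(Object value)
    {
        return bits.get(index(value, 0)) && bits.get(index(value, 1));
    }

    // Ad hoc mixing for illustration only; floorMod keeps the index non-negative.
    private int index(Object value, int seed)
    {
        int hash = value.hashCode() * 31 + seed * 0x9E3779B9;
        return Math.floorMod(hash, size);
    }
}
```

A value that was added always reports true, and an empty filter reports false for every value. A false answer is therefore enough to skip a file section, while a true answer still requires reading it.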
-