Class ParquetUtil


  • public class ParquetUtil
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      static long extractTimestampInt96(java.nio.ByteBuffer buffer)
      Reads a timestamp stored as a Parquet INT96 value from a ByteBuffer.
      static org.apache.iceberg.Metrics fileMetrics(org.apache.iceberg.io.InputFile file, org.apache.iceberg.MetricsConfig metricsConfig)
      static org.apache.iceberg.Metrics fileMetrics(org.apache.iceberg.io.InputFile file, org.apache.iceberg.MetricsConfig metricsConfig, org.apache.iceberg.mapping.NameMapping nameMapping)
      static org.apache.iceberg.Metrics footerMetrics(org.apache.parquet.hadoop.metadata.ParquetMetadata metadata, java.util.stream.Stream<org.apache.iceberg.FieldMetrics<?>> fieldMetrics, org.apache.iceberg.MetricsConfig metricsConfig)
      static org.apache.iceberg.Metrics footerMetrics(org.apache.parquet.hadoop.metadata.ParquetMetadata metadata, java.util.stream.Stream<org.apache.iceberg.FieldMetrics<?>> fieldMetrics, org.apache.iceberg.MetricsConfig metricsConfig, org.apache.iceberg.mapping.NameMapping nameMapping)
      static java.util.List<java.lang.Long> getSplitOffsets(org.apache.parquet.hadoop.metadata.ParquetMetadata md)
      Returns a list of offsets in ascending order determined by the starting position of the row groups.
      static boolean hasNoBloomFilterPages(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)
      static boolean hasNonDictionaryPages(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)
      static boolean isIntType(org.apache.parquet.schema.PrimitiveType primitiveType)
      static org.apache.parquet.column.Dictionary readDictionary(org.apache.parquet.column.ColumnDescriptor desc, org.apache.parquet.column.page.PageReader pageSource)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • fileMetrics

        public static org.apache.iceberg.Metrics fileMetrics(org.apache.iceberg.io.InputFile file,
                                                             org.apache.iceberg.MetricsConfig metricsConfig)
      • fileMetrics

        public static org.apache.iceberg.Metrics fileMetrics(org.apache.iceberg.io.InputFile file,
                                                             org.apache.iceberg.MetricsConfig metricsConfig,
                                                             org.apache.iceberg.mapping.NameMapping nameMapping)
      • footerMetrics

        public static org.apache.iceberg.Metrics footerMetrics(org.apache.parquet.hadoop.metadata.ParquetMetadata metadata,
                                                               java.util.stream.Stream<org.apache.iceberg.FieldMetrics<?>> fieldMetrics,
                                                               org.apache.iceberg.MetricsConfig metricsConfig)
      • footerMetrics

        public static org.apache.iceberg.Metrics footerMetrics(org.apache.parquet.hadoop.metadata.ParquetMetadata metadata,
                                                               java.util.stream.Stream<org.apache.iceberg.FieldMetrics<?>> fieldMetrics,
                                                               org.apache.iceberg.MetricsConfig metricsConfig,
                                                               org.apache.iceberg.mapping.NameMapping nameMapping)
      • getSplitOffsets

        public static java.util.List<java.lang.Long> getSplitOffsets(org.apache.parquet.hadoop.metadata.ParquetMetadata md)
        Returns a list of offsets in ascending order determined by the starting position of the row groups.
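        The ordering contract can be illustrated with plain Java. This is a minimal sketch, not the library's implementation: the class SplitOffsetsSketch and the method sortedOffsets are hypothetical names, and the input list stands in for the row-group starting positions that would normally be read from the ParquetMetadata footer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Iceberg or Parquet.
final class SplitOffsetsSketch {
  private SplitOffsetsSketch() {}

  // Takes row-group starting positions (assumed to have been collected from
  // the file footer already) and returns them in ascending order.
  static List<Long> sortedOffsets(List<Long> rowGroupStarts) {
    List<Long> offsets = new ArrayList<>(rowGroupStarts); // leave the input untouched
    Collections.sort(offsets);                            // natural (ascending) order
    return offsets;
  }
}
```

        Copying before sorting keeps the caller's list unmodified, which matters if the input is an immutable view of footer metadata.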
      • hasNonDictionaryPages

        public static boolean hasNonDictionaryPages(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)
      • hasNoBloomFilterPages

        public static boolean hasNoBloomFilterPages(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)
      • readDictionary

        public static org.apache.parquet.column.Dictionary readDictionary(org.apache.parquet.column.ColumnDescriptor desc,
                                                                          org.apache.parquet.column.page.PageReader pageSource)
      • isIntType

        public static boolean isIntType(org.apache.parquet.schema.PrimitiveType primitiveType)
      • extractTimestampInt96

        public static long extractTimestampInt96(java.nio.ByteBuffer buffer)
        Reads a timestamp stored as a Parquet INT96 value from a ByteBuffer. Reads 12 bytes from the buffer: 8 bytes holding the time of day in nanoseconds, followed by 4 bytes holding the Julian day.
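        The 12-byte layout described above can be decoded as in the following sketch. It is an illustration under stated assumptions rather than the library's code: the class Int96TimestampSketch and the method toEpochMicros are hypothetical names, the buffer is assumed to be in little-endian order, and returning microseconds since the Unix epoch is a choice made for this example, since the description above does not state the unit of the returned long.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Iceberg or Parquet.
final class Int96TimestampSketch {
  // Julian day number of 1970-01-01, the Unix epoch.
  static final int UNIX_EPOCH_JULIAN_DAY = 2_440_588;

  private Int96TimestampSketch() {}

  // Reads 12 bytes from the buffer's current position: 8 bytes of time-of-day
  // nanoseconds, then 4 bytes of Julian day. Assumes a little-endian buffer.
  // The result unit (microseconds since the epoch) is an assumption of this sketch.
  static long toEpochMicros(ByteBuffer buffer) {
    long timeOfDayNanos = buffer.getLong();
    int julianDay = buffer.getInt();
    return TimeUnit.DAYS.toMicros(julianDay - UNIX_EPOCH_JULIAN_DAY)
        + TimeUnit.NANOSECONDS.toMicros(timeOfDayNanos);
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putLong(1_000L);                   // 1,000 ns = 1 microsecond after midnight
    buffer.putInt(UNIX_EPOCH_JULIAN_DAY + 1); // 1970-01-02
    buffer.flip();
    System.out.println(toEpochMicros(buffer)); // prints 86400000001
  }
}
```

        Because the method uses relative reads, it advances the buffer position by 12 bytes; a caller that needs the position preserved could pass buffer.duplicate() with the byte order set again, since duplicate() resets the order to big-endian.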