Class ParquetReaderUtils

java.lang.Object
io.trino.parquet.ParquetReaderUtils

public final class ParquetReaderUtils extends Object
  • Method Details

    • toInputStream

      public static org.apache.parquet.bytes.ByteBufferInputStream toInputStream(io.airlift.slice.Slice slice)
    • toInputStream

      public static org.apache.parquet.bytes.ByteBufferInputStream toInputStream(DictionaryPage page)
    • readUleb128Int

      public static int readUleb128Int(SimpleSliceInputStream input)
      Reads an integer formatted in ULEB128 variable-width format described in ...
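      As an illustration of the ULEB128 format, the following is a minimal sketch that decodes a value from a plain byte array. The class Uleb128Sketch and its byte[]/offset signature are hypothetical and are used only to keep the example self-contained; the method documented here reads from a SimpleSliceInputStream and may be implemented differently.

```java
public final class Uleb128Sketch {
    private Uleb128Sketch() {}

    // Decodes one unsigned LEB128 value that starts at data[offset].
    // Each byte carries 7 payload bits, least significant group first;
    // a set high bit (0x80) means that another byte follows.
    public static int readUleb128Int(byte[] data, int offset) {
        int result = 0;
        int shift = 0;
        while (true) {
            if (shift >= 35) {
                // An int holds at most 5 groups of 7 bits.
                throw new IllegalArgumentException("ULEB128 value does not fit in an int");
            }
            byte current = data[offset++];
            result |= (current & 0x7F) << shift;
            if ((current & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }
}
```

      For example, the three bytes 0xE5 0x8E 0x26 decode to 624485.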
    • readUleb128Long

      public static long readUleb128Long(SimpleSliceInputStream input)
    • readFixedWidthInt

      public static int readFixedWidthInt(SimpleSliceInputStream input, int bytesWidth)
    • zigzagDecode

      public static long zigzagDecode(long value)
      DELTA_BINARY_PACKED encoding stores signed values (not the deltas themselves) by first applying zigzag encoding (...) to map negative values to positive ones and then applying ULEB128 to the result. This method reverses the zigzag step.
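      The zigzag mapping can be sketched with the usual bit-manipulation formulas. The class ZigzagSketch and the zigzagEncode helper are hypothetical and appear only to show the round trip; this is not necessarily how the method above is written.

```java
public final class ZigzagSketch {
    private ZigzagSketch() {}

    // Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    public static long zigzagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    // Inverse mapping: the lowest bit holds the sign, the remaining bits the magnitude.
    public static long zigzagDecode(long value) {
        return (value >>> 1) ^ -(value & 1);
    }
}
```

      With these definitions, zigzagDecode(3) yields -2 and zigzagDecode(zigzagEncode(x)) returns x for every long x.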
    • ceilDiv

      public static int ceilDiv(int dividend, int divisor)
      Returns the quotient of the arguments, rounded up.

      Works only for positive numbers. The sum of dividend and divisor must not exceed Integer.MAX_VALUE.
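      The restriction on the sum of the arguments is what one would expect from the common add-then-divide formula. The sketch below assumes that formula; the class CeilDivSketch is hypothetical and the actual implementation may differ.

```java
public final class CeilDivSketch {
    private CeilDivSketch() {}

    // Assumed formula: adding (divisor - 1) before the truncating division rounds the result up.
    // The intermediate sum overflows when dividend + divisor exceeds Integer.MAX_VALUE.
    public static int ceilDiv(int dividend, int divisor) {
        return (dividend + divisor - 1) / divisor;
    }
}
```

      Under this assumption, ceilDiv(7, 2) and ceilDiv(8, 2) both return 4, while ceilDiv(Integer.MAX_VALUE, 2) overflows and produces a negative result.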

    • propagateSignBit

      public static long propagateSignBit(long value, int bitsToPad)
      Propagate the sign bit in values that are shorter than 8 bytes.

      When a value shorter than 8 bytes is put into a long variable, the padding bytes on the left side of the number should be all zeros for a positive number or all ones for a negative one. This method performs the padding with the signed bit-shift operator, without branches.

      Parameters:
      value - Value to trim
      bitsToPad - Number of bits to pad
      Returns:
      Value with correct padding
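      A minimal sketch of the technique, assuming the padding is done with a left shift followed by a signed right shift by the same amount. The class SignBitSketch is hypothetical and the real method may be written differently.

```java
public final class SignBitSketch {
    private SignBitSketch() {}

    // The left shift moves the value's own sign bit into bit 63 of the long;
    // the signed right shift then copies that bit into all the padding bits.
    public static long propagateSignBit(long value, int bitsToPad) {
        return (value << bitsToPad) >> bitsToPad;
    }
}
```

      For a 3-byte value stored in a long, bitsToPad is 64 - 24 = 40: the input 0xFFFFFF becomes -1, and 0x7FFFFF stays 8388607.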
    • castToByte

      public static byte castToByte(boolean value)
      Simulates a cast from a boolean to a byte value. Despite using the ternary (?:) operator, the just-in-time compiler usually recognizes that this is a cast and turns it into a no-op.

      This method can be used to avoid branches, which can be costly for the CPU because of branch misprediction. The following code:

            boolean[] flags = ...
            int sum = 0;
            for (int i = 0; i < length; i++){
                if (flags[i])
                    sum++;
            }
       
      will perform better when rewritten as:
            boolean[] flags = ...
            int sum = 0;
            for (int i = 0; i < length; i++){
                sum += castToByte(flags[i]);
            }
       
    • castToByteNegate

      public static byte castToByteNegate(boolean value)
      Works the same as castToByte(boolean), but negates the boolean value before converting it.
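      As a sketch of how the negated variant might be used, the following counts the false entries of an array without an explicit if statement. The class CastToByteSketch, the countFalse helper, and the bodies of both cast methods are assumptions made for illustration, based only on the ternary operator mentioned above.

```java
public final class CastToByteSketch {
    private CastToByteSketch() {}

    // Assumed shape of the documented helpers: a ternary that maps the boolean to 0 or 1.
    public static byte castToByte(boolean value) {
        return (byte) (value ? 1 : 0);
    }

    public static byte castToByteNegate(boolean value) {
        return (byte) (value ? 0 : 1);
    }

    // Hypothetical helper: adds 1 for every false flag and 0 for every true flag.
    public static int countFalse(boolean[] flags) {
        int count = 0;
        for (int i = 0; i < flags.length; i++) {
            count += castToByteNegate(flags[i]);
        }
        return count;
    }
}
```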
    • toShortExact

      public static short toShortExact(long value)
    • toShortExact

      public static short toShortExact(int value)
    • toByteExact

      public static byte toByteExact(long value)
    • toByteExact

      public static byte toByteExact(int value)
    • isOnlyDictionaryEncodingPages

      public static boolean isOnlyDictionaryEncodingPages(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData columnMetaData)