Class SliceUtf8

java.lang.Object
io.airlift.slice.SliceUtf8

public final class SliceUtf8 extends Object
Utility methods for UTF-8 encoded slices.
  • Method Summary

    Modifier and Type
    Method
    Description
    static Slice
    codePointToUtf8(int codePoint)
    Convert the code point to UTF-8.
    static int
    compareUtf16BE(Slice utf8Left, Slice utf8Right)
    Compares two UTF-8 sequences using UTF-16 big endian semantics.
    static int
    countCodePoints(Slice utf8)
    Counts the code points within UTF-8 encoded slice.
    static int
    countCodePoints(Slice utf8, int offset, int length)
    Counts the code points within UTF-8 encoded slice up to length.
    static Slice
    fixInvalidUtf8(Slice slice)
     
    static Slice
    fixInvalidUtf8(Slice slice, OptionalInt replacementCodePoint)
     
    static int
    getCodePointAt(Slice utf8, int position)
    Gets the UTF-8 encoded code point at the position.
    static int
    getCodePointBefore(Slice utf8, int position)
    Gets the UTF-8 encoded code point before the position.
    static boolean
    isAscii(Slice utf8)
    Returns whether the slice contains only 7-bit ASCII characters.
    static Slice
    leftTrim(Slice utf8)
    Removes all white space characters from the left side of the string.
    static Slice
    leftTrim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the left side of the string.
    static int
    lengthOfCodePoint(int codePoint)
    Gets the UTF-8 sequence length of the code point.
    static int
    lengthOfCodePoint(Slice utf8, int position)
    Gets the UTF-8 sequence length of the code point at position.
    static int
    lengthOfCodePointFromStartByte(byte startByte)
    Gets the UTF-8 sequence length using the sequence start byte.
    static int
    lengthOfCodePointSafe(Slice utf8, int position)
    Gets the UTF-8 sequence length of the code point at position.
    static int
    offsetOfCodePoint(Slice utf8, int codePointCount)
    Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.
    static int
    offsetOfCodePoint(Slice utf8, int position, int codePointCount)
    Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice.
    static Slice
    reverse(Slice utf8)
    Reverses the slice code point by code point.
    static Slice
    rightTrim(Slice utf8)
    Removes all white space characters from the right side of the string.
    static Slice
    rightTrim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the right side of the string.
    static int
    setCodePointAt(int codePoint, Slice utf8, int position)
    Sets the UTF-8 sequence for code point at the position.
    static Slice
    substring(Slice utf8, int codePointStart, int codePointLength)
    Gets the substring starting at codePointStart and extending for codePointLength code points.
    static Slice
    toLowerCase(Slice utf8)
    Converts slice to lower case code point by code point.
    static Slice
    toUpperCase(Slice utf8)
    Converts slice to upper case code point by code point.
    static Slice
    trim(Slice utf8)
    Removes all white space characters from the left and right side of the string.
    static Slice
    trim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the left and right side of the string.
    static int
    tryGetCodePointAt(Slice utf8, int position)
    Tries to get the UTF-8 encoded code point at the position.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • isAscii

      public static boolean isAscii(Slice utf8)
      Returns whether the slice contains only 7-bit ASCII characters.
    • countCodePoints

      public static int countCodePoints(Slice utf8)
      Counts the code points within UTF-8 encoded slice.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.

    • countCodePoints

      public static int countCodePoints(Slice utf8, int offset, int length)
      Counts the code points within UTF-8 encoded slice up to length.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
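
      As an illustration of what counting code points in well-formed UTF-8 involves, the following JDK-only sketch counts every byte that is not a continuation byte. The class CodePointCountSketch and its byte[] signature are hypothetical and are not the library's implementation; like the methods above, the sketch does not validate its input.

```java
import java.nio.charset.StandardCharsets;

final class CodePointCountSketch {
    // Counts code points in utf8[offset, offset + length) by counting every
    // byte that is not a continuation byte (bit pattern 10xxxxxx).
    // Assumes well-formed UTF-8; malformed input gives a meaningless count.
    static int countCodePoints(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "h", U+00E9, "llo", U+20AC: 1 + 2 + 3 + 3 = 9 bytes, 6 code points
        byte[] bytes = "h\u00E9llo\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                            // 9
        System.out.println(countCodePoints(bytes, 0, bytes.length)); // 6
    }
}
```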

    • substring

      public static Slice substring(Slice utf8, int codePointStart, int codePointLength)
      Gets the substring starting at codePointStart and extending for codePointLength code points.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
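
      The point of this method is that both arguments are measured in code points, not in bytes or UTF-16 chars. A rough java.lang.String analogue, using only JDK calls, might look like the sketch below; CodePointSubstringSketch is a hypothetical name and the code does not use or reproduce the Slice API.

```java
final class CodePointSubstringSketch {
    // String-based analogue: start and length are counted in code points,
    // so a surrogate pair counts as one unit.
    static String substring(String s, int codePointStart, int codePointLength) {
        int begin = s.offsetByCodePoints(0, codePointStart);
        int end = s.offsetByCodePoints(begin, codePointLength);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        // "a", U+1F600 (a surrogate pair), "b", "c"
        String s = "a\uD83D\uDE00bc";
        // Code points 1 and 2 are U+1F600 and "b"
        System.out.println(substring(s, 1, 2).equals("\uD83D\uDE00b")); // true
    }
}
```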

    • reverse

      public static Slice reverse(Slice utf8)
      Reverses the slice code point by code point.

      Note: Invalid UTF-8 sequences are copied directly to the output.
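
      Reversing "code point by code point" means each multi-byte sequence keeps its internal byte order while the sequences themselves swap places. A minimal JDK-only sketch of that idea follows; Utf8ReverseSketch is hypothetical, assumes well-formed input, and treats any byte that is not a legal start byte as a one-byte unit, which is this sketch's own choice.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ReverseSketch {
    // Sequence length implied by a start byte. Bytes that are not legal
    // start bytes are treated as single-byte units (sketch assumption).
    static int sequenceLength(byte startByte) {
        int b = startByte & 0xFF;
        if (b >= 0xF0 && b <= 0xF7) return 4;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xC0 && b <= 0xDF) return 2;
        return 1;
    }

    // Copies each sequence, bytes in original order, to the mirrored
    // position in the output array.
    static byte[] reverse(byte[] utf8) {
        byte[] out = new byte[utf8.length];
        int i = 0;
        while (i < utf8.length) {
            int len = Math.min(sequenceLength(utf8[i]), utf8.length - i);
            System.arraycopy(utf8, i, out, utf8.length - i - len, len);
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] in = "a\u20AC\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        String reversed = new String(reverse(in), StandardCharsets.UTF_8);
        System.out.println(reversed.equals("\uD83D\uDE00\u20ACa")); // true
    }
}
```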

    • compareUtf16BE

      public static int compareUtf16BE(Slice utf8Left, Slice utf8Right)
      Compares two UTF-8 sequences using UTF-16 big endian semantics. This is equivalent to String.compareTo(String).
      Throws:
      InvalidUtf8Exception - if either UTF-8 sequence is invalid
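
      UTF-16 order and plain UTF-8 byte order can disagree, which is why a dedicated comparison exists. The JDK-only sketch below shows one such pair: a character above the surrogate range (U+FF61) against a supplementary character (U+1F600). Utf16OrderSketch and its helper names are hypothetical; the sketch does not call the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16OrderSketch {
    // Order used by String.compareTo: UTF-16 code unit by code unit.
    static int utf16Order(String left, String right) {
        return left.compareTo(right);
    }

    // Order obtained by comparing the UTF-8 bytes as unsigned values.
    static int utf8ByteOrder(String left, String right) {
        return Arrays.compareUnsigned(
                left.getBytes(StandardCharsets.UTF_8),
                right.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";         // U+FF61, a single UTF-16 char
        String emoji = "\uD83D\uDE00"; // U+1F600, a surrogate pair

        // UTF-16: 0xD83D < 0xFF61, so the emoji sorts first.
        System.out.println(utf16Order(emoji, bmp) < 0);    // true
        // UTF-8 bytes: EF BD A1 < F0 9F 98 80, so U+FF61 sorts first.
        System.out.println(utf8ByteOrder(bmp, emoji) < 0); // true
    }
}
```
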
    • toUpperCase

      public static Slice toUpperCase(Slice utf8)
      Converts slice to upper case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.

      Note: Invalid UTF-8 sequences are copied directly to the output.

    • toLowerCase

      public static Slice toLowerCase(Slice utf8)
      Converts slice to lower case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.

      Note: Invalid UTF-8 sequences are copied directly to the output.
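
      To make the limitation described for toUpperCase and toLowerCase concrete, the sketch below contrasts a per-code-point mapping with the JDK's full, locale-aware String.toUpperCase. It assumes Character.toUpperCase(int) as the per-code-point mapping, which may differ from what the library uses internally; SimpleCaseMappingSketch is a hypothetical name.

```java
import java.util.Locale;

final class SimpleCaseMappingSketch {
    // Maps each code point on its own: locale-independent, and the number
    // of code points never changes.
    static String toUpperCasePerCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().map(Character::toUpperCase).forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // One-to-many mapping: full case conversion expands U+00DF to "SS",
        // the per-code-point mapping leaves it unchanged.
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT));                           // STRASSE
        System.out.println(toUpperCasePerCodePoint("stra\u00DFe").equals("STRA\u00DFE"));     // true

        // Locale-sensitive mapping: Turkish upper-cases "i" to U+0130.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("i".toUpperCase(turkish).equals("\u0130")); // true
        System.out.println(toUpperCasePerCodePoint("i"));              // I
    }
}
```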

    • leftTrim

      public static Slice leftTrim(Slice utf8)
      Removes all white space characters from the left side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.

    • leftTrim

      public static Slice leftTrim(Slice utf8, int[] whiteSpaceCodePoints)
      Removes all whiteSpaceCodePoints from the left side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.

    • rightTrim

      public static Slice rightTrim(Slice utf8)
      Removes all white space characters from the right side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.

    • rightTrim

      public static Slice rightTrim(Slice utf8, int[] whiteSpaceCodePoints)
      Removes all whiteSpaceCodePoints from the right side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.

    • trim

      public static Slice trim(Slice utf8)
      Removes all white space characters from the left and right side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.

    • trim

      public static Slice trim(Slice utf8, int[] whiteSpaceCodePoints)
      Removes all whiteSpaceCodePoints from the left and right side of the string.

      Note: Invalid UTF-8 sequences are not trimmed.
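
      The trim variants that take whiteSpaceCodePoints strip a caller-supplied set of code points from the ends of the string. A String-based sketch of that idea, using only JDK calls, is shown below; CodePointTrimSketch and its helpers are hypothetical and say nothing about how the library treats invalid sequences.

```java
final class CodePointTrimSketch {
    static boolean contains(int[] codePoints, int codePoint) {
        for (int candidate : codePoints) {
            if (candidate == codePoint) {
                return true;
            }
        }
        return false;
    }

    // Strips code points listed in trimCodePoints from both ends of s.
    static String trim(String s, int[] trimCodePoints) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int codePoint = s.codePointAt(start);
            if (!contains(trimCodePoints, codePoint)) break;
            start += Character.charCount(codePoint);
        }
        while (end > start) {
            int codePoint = s.codePointBefore(end);
            if (!contains(trimCodePoints, codePoint)) break;
            end -= Character.charCount(codePoint);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        int[] spaces = {' ', 0x3000}; // ASCII space and ideographic space
        System.out.println(trim("\u3000 x y \u3000", spaces)); // x y
    }
}
```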

    • fixInvalidUtf8

      public static Slice fixInvalidUtf8(Slice slice)
    • fixInvalidUtf8

      public static Slice fixInvalidUtf8(Slice slice, OptionalInt replacementCodePoint)
    • tryGetCodePointAt

      public static int tryGetCodePointAt(Slice utf8, int position)
      Tries to get the UTF-8 encoded code point at the position. A positive return value means the UTF-8 sequence at the position is valid, and the result is the code point. A negative return value means the UTF-8 sequence at the position is invalid, and the length of the invalid sequence is the absolute value of the result.
      Returns:
      the code point, or the negated number of bytes in the invalid UTF-8 sequence.
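
      The return convention lends itself to a scan loop that advances past valid and invalid sequences alike. The sketch below implements a stand-in decoder with the same convention on a byte[] and uses it to tally valid code points and invalid bytes. TryDecodeSketch, tryDecode, and scan are hypothetical names, and the exact number of bytes the library reports for a given malformed sequence may differ from this sketch.

```java
final class TryDecodeSketch {
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Stand-in decoder: returns the code point (>= 0), or minus the number
    // of bytes that make up the invalid sequence at position.
    static int tryDecode(byte[] utf8, int position) {
        int first = utf8[position] & 0xFF;
        if (first < 0x80) return first;
        int length;
        if ((first & 0xE0) == 0xC0) length = 2;
        else if ((first & 0xF0) == 0xE0) length = 3;
        else if ((first & 0xF8) == 0xF0) length = 4;
        else return -1; // stray continuation byte or illegal start byte
        int codePoint = first & (0xFF >> (length + 1));
        for (int i = 1; i < length; i++) {
            if (position + i >= utf8.length || (utf8[position + i] & 0xC0) != 0x80) {
                return -i; // truncated or interrupted sequence
            }
            codePoint = (codePoint << 6) | (utf8[position + i] & 0x3F);
        }
        boolean overlong = utf8Length(codePoint) != length;
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (overlong || surrogate || codePoint > 0x10FFFF) return -length;
        return codePoint;
    }

    // Returns {valid code points, invalid bytes}.
    static int[] scan(byte[] utf8) {
        int valid = 0;
        int invalidBytes = 0;
        int position = 0;
        while (position < utf8.length) {
            int result = tryDecode(utf8, position);
            if (result >= 0) {
                valid++;
                position += utf8Length(result);
            } else {
                invalidBytes += -result;
                position += -result;
            }
        }
        return new int[] {valid, invalidBytes};
    }

    public static void main(String[] args) {
        // "a", U+00E9 (C3 A9), a lone 0xFF, "b"
        byte[] data = {'a', (byte) 0xC3, (byte) 0xA9, (byte) 0xFF, 'b'};
        int[] tally = scan(data);
        System.out.println(tally[0] + " valid, " + tally[1] + " invalid"); // 3 valid, 1 invalid
    }
}
```
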
    • offsetOfCodePoint

      public static int offsetOfCodePoint(Slice utf8, int codePointCount)
      Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.

    • offsetOfCodePoint

      public static int offsetOfCodePoint(Slice utf8, int position, int codePointCount)
      Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice. If the slice does not contain codePointCount code points after position, -1 is returned.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
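
      One way to picture the search described above is to walk the bytes from position and count start bytes until codePointCount code points have been skipped. The sketch below does that on a byte[] using only the JDK. OffsetSketch is hypothetical, assumes well-formed UTF-8, and its handling of edge cases (such as a count equal to the number of remaining code points) is this sketch's own and may not match the library.

```java
import java.nio.charset.StandardCharsets;

final class OffsetSketch {
    // Returns the byte index of the code point that lies codePointCount
    // code points after position, or -1 if there are not enough code points.
    static int offsetOfCodePoint(byte[] utf8, int position, int codePointCount) {
        int remaining = codePointCount;
        for (int i = position; i < utf8.length; i++) {
            boolean startByte = (utf8[i] & 0xC0) != 0x80;
            if (startByte) {
                if (remaining == 0) {
                    return i;
                }
                remaining--;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "h" at byte 0, U+00E9 at bytes 1-2, "l" at 3, "l" at 4, "o" at 5
        byte[] bytes = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(offsetOfCodePoint(bytes, 0, 2)); // 3
        System.out.println(offsetOfCodePoint(bytes, 0, 5)); // -1
    }
}
```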

    • lengthOfCodePoint

      public static int lengthOfCodePoint(Slice utf8, int position)
      Gets the UTF-8 sequence length of the code point at position.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.

    • lengthOfCodePointSafe

      public static int lengthOfCodePointSafe(Slice utf8, int position)
      Gets the UTF-8 sequence length of the code point at position.

      Truncated UTF-8 sequences, 5 and 6 byte sequences, and invalid code points are handled by this method without throwing an exception.

    • lengthOfCodePoint

      public static int lengthOfCodePoint(int codePoint)
      Gets the UTF-8 sequence length of the code point.
      Throws:
      InvalidCodePointException - if code point is not within a valid range
    • lengthOfCodePointFromStartByte

      public static int lengthOfCodePointFromStartByte(byte startByte)
      Gets the UTF-8 sequence length using the sequence start byte.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
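
      The sequence length is encoded in the high bits of the start byte, so it can be read with a few range checks. The following sketch shows that mapping; StartByteLengthSketch is a hypothetical name, and returning -1 for bytes that cannot start a sequence is this sketch's convention, not a documented return value of the library.

```java
final class StartByteLengthSketch {
    // Returns 1 to 4 for a legal start byte, or -1 otherwise
    // (the -1 is this sketch's own convention).
    static int lengthFromStartByte(byte startByte) {
        int b = startByte & 0xFF;  // treat the byte as unsigned
        if (b < 0x80) return 1;    // 0xxxxxxx: ASCII
        if (b < 0xC0) return -1;   // 10xxxxxx: continuation byte
        if (b < 0xE0) return 2;    // 110xxxxx
        if (b < 0xF0) return 3;    // 1110xxxx
        if (b < 0xF8) return 4;    // 11110xxx
        return -1;                 // obsolete 5 and 6 byte forms, 0xFE, 0xFF
    }

    public static void main(String[] args) {
        System.out.println(lengthFromStartByte((byte) 'a'));  // 1
        System.out.println(lengthFromStartByte((byte) 0xE2)); // 3
        System.out.println(lengthFromStartByte((byte) 0x80)); // -1
    }
}
```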

    • getCodePointAt

      public static int getCodePointAt(Slice utf8, int position)
      Gets the UTF-8 encoded code point at the position.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.

    • getCodePointBefore

      public static int getCodePointBefore(Slice utf8, int position)
      Gets the UTF-8 encoded code point before the position.

      Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.

    • codePointToUtf8

      public static Slice codePointToUtf8(int codePoint)
      Convert the code point to UTF-8.

      Throws:
      InvalidCodePointException - if code point is not within a valid range
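
      For readers unfamiliar with the encoding, the sketch below shows how a code point is split into one to four UTF-8 bytes. It returns a byte[] rather than a Slice and throws IllegalArgumentException as a stand-in for InvalidCodePointException; Utf8EncodeSketch is a hypothetical name, and rejecting surrogate code points is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8EncodeSketch {
    static byte[] encode(int codePoint) {
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (!Character.isValidCodePoint(codePoint) || surrogate) {
            // Stand-in for the library's InvalidCodePointException
            throw new IllegalArgumentException("Invalid code point: " + codePoint);
        }
        if (codePoint < 0x80) {
            return new byte[] {(byte) codePoint};
        }
        if (codePoint < 0x800) {
            return new byte[] {
                    (byte) (0xC0 | (codePoint >> 6)),
                    (byte) (0x80 | (codePoint & 0x3F))};
        }
        if (codePoint < 0x10000) {
            return new byte[] {
                    (byte) (0xE0 | (codePoint >> 12)),
                    (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                    (byte) (0x80 | (codePoint & 0x3F))};
        }
        return new byte[] {
                (byte) (0xF0 | (codePoint >> 18)),
                (byte) (0x80 | ((codePoint >> 12) & 0x3F)),
                (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                (byte) (0x80 | (codePoint & 0x3F))};
    }

    public static void main(String[] args) {
        // U+20AC encodes as E2 82 AC, matching the JDK's own encoder
        byte[] expected = "\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(0x20AC), expected)); // true
    }
}
```
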
    • setCodePointAt

      public static int setCodePointAt(int codePoint, Slice utf8, int position)
      Sets the UTF-8 sequence for code point at the position.
      Throws:
      InvalidCodePointException - if code point is not within a valid range