Class SliceUtf8
java.lang.Object
    io.airlift.slice.SliceUtf8
public final class SliceUtf8 extends Object
Utility methods for UTF-8 encoded slices.
-
-
Method Summary
static Slice codePointToUtf8(int codePoint)
    Convert the code point to UTF-8.
static int compareUtf16BE(Slice utf8Left, Slice utf8Right)
    Compares two UTF-8 sequences using UTF-16 big endian semantics.
static int countCodePoints(Slice utf8)
    Counts the code points within UTF-8 encoded slice.
static int countCodePoints(Slice utf8, int offset, int length)
    Counts the code points within UTF-8 encoded slice up to length.
static Slice fixInvalidUtf8(Slice slice)
static Slice fixInvalidUtf8(Slice slice, OptionalInt replacementCodePoint)
static int getCodePointAt(Slice utf8, int position)
    Gets the UTF-8 encoded code point at the position.
static int getCodePointBefore(Slice utf8, int position)
    Gets the UTF-8 encoded code point before the position.
static boolean isAscii(Slice utf8)
    Tests whether the slice contains only 7-bit ASCII characters.
static Slice leftTrim(Slice utf8)
    Removes all white space characters from the left side of the string.
static Slice leftTrim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the left side of the string.
static int lengthOfCodePoint(int codePoint)
    Gets the UTF-8 sequence length of the code point.
static int lengthOfCodePoint(Slice utf8, int position)
    Gets the UTF-8 sequence length of the code point at position.
static int lengthOfCodePointFromStartByte(byte startByte)
    Gets the UTF-8 sequence length using the sequence start byte.
static int lengthOfCodePointSafe(Slice utf8, int position)
    Gets the UTF-8 sequence length of the code point at position.
static int offsetOfCodePoint(Slice utf8, int codePointCount)
    Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.
static int offsetOfCodePoint(Slice utf8, int position, int codePointCount)
    Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice.
static Slice reverse(Slice utf8)
    Reverses the slice code point by code point.
static Slice rightTrim(Slice utf8)
    Removes all white space characters from the right side of the string.
static Slice rightTrim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the right side of the string.
static int setCodePointAt(int codePoint, Slice utf8, int position)
    Sets the UTF-8 sequence for code point at the position.
static Slice substring(Slice utf8, int codePointStart, int codePointLength)
    Gets the substring starting at codePointStart and extending for codePointLength code points.
static Slice toLowerCase(Slice utf8)
    Converts slice to lower case code point by code point.
static Slice toUpperCase(Slice utf8)
    Converts slice to upper case code point by code point.
static Slice trim(Slice utf8)
    Removes all white space characters from the left and right side of the string.
static Slice trim(Slice utf8, int[] whiteSpaceCodePoints)
    Removes all whiteSpaceCodePoints from the left and right side of the string.
static int tryGetCodePointAt(Slice utf8, int position)
    Tries to get the UTF-8 encoded code point at the position.
-
-
-
Method Detail
-
isAscii
public static boolean isAscii(Slice utf8)
Tests whether the slice contains only 7-bit ASCII characters.
-
countCodePoints
public static int countCodePoints(Slice utf8)
Counts the code points within UTF-8 encoded slice.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
countCodePoints
public static int countCodePoints(Slice utf8, int offset, int length)
Counts the code points within UTF-8 encoded slice up to length.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
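A minimal sketch of how code points can be counted in well-formed UTF-8. It assumes a plain byte[] rather than a Slice, the class name CountCodePointsSketch is hypothetical, and it is not taken from the library's implementation. Each code point contributes exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting those bytes gives the code point count. Like the methods above, the sketch does not validate its input.

```java
import java.nio.charset.StandardCharsets;

final class CountCodePointsSketch {
    // Counts code points in utf8[offset, offset + length), assuming well-formed UTF-8.
    static int countCodePoints(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            // Continuation bytes look like 10xxxxxx; every other byte starts a code point.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length + " bytes, "
                + countCodePoints(bytes, 0, bytes.length) + " code points");
    }
}
```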
-
substring
public static Slice substring(Slice utf8, int codePointStart, int codePointLength)
Gets the substring starting at codePointStart and extending for codePointLength code points.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
reverse
public static Slice reverse(Slice utf8)
Reverses the slice code point by code point.
Note: Invalid UTF-8 sequences are copied directly to the output.
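To illustrate what reversing "code point by code point" means at the byte level, here is a small sketch on a plain byte[]. The class ReverseSketch and its helper are hypothetical, and the handling of stray bytes (copying them one at a time) is an assumption of this sketch rather than a statement about the library. Each multi-byte sequence is moved as a unit, so the bytes inside a sequence keep their order.

```java
import java.nio.charset.StandardCharsets;

final class ReverseSketch {
    // Sequence length implied by the first byte; stray continuation bytes count as 1.
    static int sequenceLength(byte startByte) {
        int b = startByte & 0xFF;
        if (b < 0xC0) return 1; // ASCII, or a stray continuation byte copied as-is
        if (b < 0xE0) return 2;
        if (b < 0xF0) return 3;
        return 4;
    }

    // Copies each sequence, intact, to the mirrored position in the output.
    static byte[] reverse(byte[] utf8) {
        byte[] out = new byte[utf8.length];
        int in = 0;
        int outEnd = utf8.length;
        while (in < utf8.length) {
            // Clamp so a truncated trailing sequence cannot run past the array.
            int len = Math.min(sequenceLength(utf8[in]), utf8.length - in);
            outEnd -= len;
            System.arraycopy(utf8, in, out, outEnd, len);
            in += len;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] reversed = reverse("a\u20acb".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(reversed, StandardCharsets.UTF_8));
    }
}
```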
-
compareUtf16BE
public static int compareUtf16BE(Slice utf8Left, Slice utf8Right)
Compares two UTF-8 sequences using UTF-16 big endian semantics. This is equivalent to the ordering of java.lang.String.compareTo.
Throws:
InvalidUtf8Exception - if the UTF-8 is invalid
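The distinction matters because unsigned UTF-8 byte order follows code point order, while String.compareTo orders by UTF-16 code units, and the two disagree when a supplementary code point is compared with a BMP code point at or above U+E000. The sketch below only demonstrates that disagreement with JDK classes (Arrays.compareUnsigned requires Java 9 or later); the class Utf16OrderSketch is hypothetical and does not call compareUtf16BE.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16OrderSketch {
    // Orders by unsigned UTF-8 bytes, which matches code point order.
    static int compareUtf8Bytes(String left, String right) {
        return Arrays.compareUnsigned(
                left.getBytes(StandardCharsets.UTF_8),
                right.getBytes(StandardCharsets.UTF_8));
    }

    // Orders by UTF-16 code units, as String.compareTo does.
    static int compareUtf16(String left, String right) {
        return left.compareTo(right);
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";                 // U+FF21, one UTF-16 code unit
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair
        System.out.println("UTF-8 byte order sign:  "
                + Integer.signum(compareUtf8Bytes(bmp, supplementary)));
        System.out.println("UTF-16 unit order sign: "
                + Integer.signum(compareUtf16(bmp, supplementary)));
    }
}
```

With these inputs the byte-wise comparison places U+FF21 first, while the UTF-16 comparison places U+1F600 first, because its leading surrogate 0xD83D is smaller than 0xFF21.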
-
toUpperCase
public static Slice toUpperCase(Slice utf8)
Converts slice to upper case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.
Note: Invalid UTF-8 sequences are copied directly to the output.
-
toLowerCase
public static Slice toLowerCase(Slice utf8)
Converts slice to lower case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.
Note: Invalid UTF-8 sequences are copied directly to the output.
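The practical effect of skipping one-to-many mappings can be shown with JDK classes alone. The sketch below works on a String rather than a Slice and assumes a per-code-point mapping comparable to Character.toUpperCase(int); the class CaseMappingSketch is hypothetical and the library's own mapping tables may differ.

```java
import java.util.Locale;

final class CaseMappingSketch {
    // Upper-cases each code point independently; the result always has
    // the same number of code points as the input.
    static String toUpperCasePerCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            out.appendCodePoint(Character.toUpperCase(codePoint));
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "stra\u00dfe";
        // String.toUpperCase applies the one-to-many mapping for U+00DF.
        System.out.println(word.toUpperCase(Locale.ROOT));
        // The per-code-point version leaves U+00DF unchanged.
        System.out.println(toUpperCasePerCodePoint(word));
    }
}
```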
-
leftTrim
public static Slice leftTrim(Slice utf8)
Removes all white space characters from the left side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
leftTrim
public static Slice leftTrim(Slice utf8, int[] whiteSpaceCodePoints)
Removes all whiteSpaceCodePoints from the left side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
rightTrim
public static Slice rightTrim(Slice utf8)
Removes all white space characters from the right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
rightTrim
public static Slice rightTrim(Slice utf8, int[] whiteSpaceCodePoints)
Removes all whiteSpaceCodePoints from the right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
trim
public static Slice trim(Slice utf8)
Removes all white space characters from the left and right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
trim
public static Slice trim(Slice utf8, int[] whiteSpaceCodePoints)
Removes all whiteSpaceCodePoints from the left and right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
fixInvalidUtf8
public static Slice fixInvalidUtf8(Slice slice, OptionalInt replacementCodePoint)
-
tryGetCodePointAt
public static int tryGetCodePointAt(Slice utf8, int position)
Tries to get the UTF-8 encoded code point at the position. A non-negative return value means the UTF-8 sequence at the position is valid, and the result is the code point. A negative return value means the UTF-8 sequence at the position is invalid, and the length of the invalid sequence is the absolute value of the result.
Returns:
the code point, or the negated number of bytes in the invalid UTF-8 sequence.
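A caller scanning a slice typically needs to know how far to advance after each call. The sketch below shows one way to derive that from the return convention described above; the class TryGetResultSketch and both of its methods are hypothetical helpers, not part of SliceUtf8.

```java
final class TryGetResultSketch {
    // UTF-8 length of a valid code point (1 to 4 bytes).
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Bytes to skip given a result that follows the convention above:
    // non-negative means a valid code point, negative means an invalid
    // sequence whose byte length is the absolute value of the result.
    static int bytesToSkip(int result) {
        return result >= 0 ? utf8Length(result) : -result;
    }

    public static void main(String[] args) {
        System.out.println(bytesToSkip(0x20AC)); // valid code point U+20AC
        System.out.println(bytesToSkip(-2));     // invalid two-byte sequence
    }
}
```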
-
offsetOfCodePoint
public static int offsetOfCodePoint(Slice utf8, int codePointCount)
Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
offsetOfCodePoint
public static int offsetOfCodePoint(Slice utf8, int position, int codePointCount)
Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice. If the slice does not contain codePointCount code points after position, -1 is returned.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
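The mapping from a code point index to a byte offset can be sketched as a scan over start bytes. This version assumes a plain byte[], well-formed UTF-8, and a position that already sits on a code point boundary; the class OffsetOfCodePointSketch is hypothetical, and its handling of boundary cases (such as an index equal to the total number of code points) is this sketch's choice and may not match the library.

```java
import java.nio.charset.StandardCharsets;

final class OffsetOfCodePointSketch {
    // Returns the byte index where the code point with zero-based index
    // codePointCount (counted from position) begins, or -1 if there are
    // not enough code points after position.
    static int offsetOfCodePoint(byte[] utf8, int position, int codePointCount) {
        int seen = 0;
        for (int i = position; i < utf8.length; i++) {
            // Only start bytes (anything but 10xxxxxx) begin a code point.
            if ((utf8[i] & 0xC0) != 0x80) {
                if (seen == codePointCount) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u20acb".getBytes(StandardCharsets.UTF_8);
        System.out.println(offsetOfCodePoint(bytes, 0, 2));
    }
}
```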
-
lengthOfCodePoint
public static int lengthOfCodePoint(Slice utf8, int position)
Gets the UTF-8 sequence length of the code point at position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
lengthOfCodePointSafe
public static int lengthOfCodePointSafe(Slice utf8, int position)
Gets the UTF-8 sequence length of the code point at position.
Truncated UTF-8 sequences, 5 and 6 byte sequences, and invalid code points are handled by this method without throwing an exception.
-
lengthOfCodePoint
public static int lengthOfCodePoint(int codePoint)
Gets the UTF-8 sequence length of the code point.
Throws:
InvalidCodePointException - if code point is not within a valid range
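The length follows directly from the UTF-8 encoding ranges. A minimal sketch of that range check, assuming the valid range is U+0000 through U+10FFFF and using IllegalArgumentException as a stand-in for InvalidCodePointException; the class LengthOfCodePointSketch is hypothetical.

```java
final class LengthOfCodePointSketch {
    static int lengthOfCodePoint(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            // Stand-in for the library's InvalidCodePointException.
            throw new IllegalArgumentException("invalid code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;    // 7 payload bits
        if (codePoint < 0x800) return 2;   // 11 payload bits
        if (codePoint < 0x10000) return 3; // 16 payload bits
        return 4;                          // up to U+10FFFF
    }

    public static void main(String[] args) {
        System.out.println(lengthOfCodePoint(0x20AC));
    }
}
```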
-
lengthOfCodePointFromStartByte
public static int lengthOfCodePointFromStartByte(byte startByte)
Gets the UTF-8 sequence length using the sequence start byte.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
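In UTF-8 the leading bits of the start byte encode the sequence length, which is what makes this lookup possible without reading further bytes. The sketch below decodes those bit patterns; the class StartByteLengthSketch is hypothetical, and returning -1 for bytes that cannot start a sequence is this sketch's own convention, not documented library behavior.

```java
final class StartByteLengthSketch {
    static int lengthFromStartByte(byte startByte) {
        int b = startByte & 0xFF; // treat the byte as unsigned
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1; // continuation byte or invalid start byte
    }

    public static void main(String[] args) {
        System.out.println(lengthFromStartByte((byte) 0xE2));
    }
}
```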
-
getCodePointAt
public static int getCodePointAt(Slice utf8, int position)
Gets the UTF-8 encoded code point at the position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
getCodePointBefore
public static int getCodePointBefore(Slice utf8, int position)
Gets the UTF-8 encoded code point before the position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
codePointToUtf8
public static Slice codePointToUtf8(int codePoint)
Convert the code point to UTF-8.
Throws:
InvalidCodePointException - if code point is not within a valid range
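For reference, the standard UTF-8 bit layout behind such a conversion can be sketched as follows. The sketch returns a byte[] instead of a Slice, uses IllegalArgumentException as a stand-in for InvalidCodePointException, and does not reject surrogate code points; the class EncodeCodePointSketch is hypothetical and not the library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodeCodePointSketch {
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            // Stand-in for the library's InvalidCodePointException.
            throw new IllegalArgumentException("invalid code point: " + cp);
        }
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    public static void main(String[] args) {
        // Cross-check against the JDK encoder for U+20AC.
        byte[] expected = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(0x20AC), expected));
    }
}
```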
-
setCodePointAt
public static int setCodePointAt(int codePoint, Slice utf8, int position)
Sets the UTF-8 sequence for code point at the position.
Throws:
InvalidCodePointException - if code point is not within a valid range
-
-