Class SliceUtf8
-
Method Summary
Modifier and Type | Method | Description
static Slice | codePointToUtf8(int codePoint) | Convert the code point to UTF-8.
static int | compareUtf16BE(Slice utf8Left, Slice utf8Right) | Compares two UTF-8 sequences using UTF-16 big endian semantics.
static int | countCodePoints(Slice utf8) | Counts the code points within a UTF-8 encoded slice.
static int | countCodePoints(Slice utf8, int offset, int length) | Counts the code points within a UTF-8 encoded slice up to length.
static Slice | fixInvalidUtf8(Slice slice) |
static Slice | fixInvalidUtf8(Slice slice, OptionalInt replacementCodePoint) |
static int | getCodePointAt(Slice utf8, int position) | Gets the UTF-8 encoded code point at the position.
static int | getCodePointBefore(Slice utf8, int position) | Gets the UTF-8 encoded code point before the position.
static boolean | isAscii | Returns whether the slice contains only 7-bit ASCII characters.
static Slice | leftTrim | Removes all white space characters from the left side of the string.
static Slice | leftTrim | Removes all whiteSpaceCodePoints from the left side of the string.
static int | lengthOfCodePoint(int codePoint) | Gets the UTF-8 sequence length of the code point.
static int | lengthOfCodePoint(Slice utf8, int position) | Gets the UTF-8 sequence length of the code point at position.
static int | lengthOfCodePointFromStartByte(byte startByte) | Gets the UTF-8 sequence length using the sequence start byte.
static int | lengthOfCodePointSafe(Slice utf8, int position) | Gets the UTF-8 sequence length of the code point at position.
static int | offsetOfCodePoint(Slice utf8, int codePointCount) | Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.
static int | offsetOfCodePoint(Slice utf8, int position, int codePointCount) | Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice.
static Slice | reverse | Reverses the slice code point by code point.
static Slice | rightTrim | Removes all white space characters from the right side of the string.
static Slice | rightTrim | Removes all whiteSpaceCodePoints from the right side of the string.
static int | setCodePointAt(int codePoint, Slice utf8, int position) | Sets the UTF-8 sequence for code point at the position.
static Slice | substring | Gets the substring starting at codePointStart and extending for codePointLength code points.
static Slice | toLowerCase(Slice utf8) | Converts slice to lower case code point by code point.
static Slice | toUpperCase(Slice utf8) | Converts slice to upper case code point by code point.
static Slice | trim | Removes all white space characters from the left and right side of the string.
static Slice | trim | Removes all whiteSpaceCodePoints from the left and right side of the string.
static int | tryGetCodePointAt(Slice utf8, int position) | Tries to get the UTF-8 encoded code point at the position.
-
Method Details
-
isAscii
Returns whether the slice contains only 7-bit ASCII characters.
-
countCodePoints
Counts the code points within a UTF-8 encoded slice.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
countCodePoints
Counts the code points within a UTF-8 encoded slice up to length.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
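One common way to count code points without validating the input is to count every byte that is not a continuation byte. The following is a minimal sketch of that idea on a plain byte[] rather than a Slice; the class and method names are hypothetical, and it is not claimed to be how SliceUtf8 does it.

```java
final class CodePointCountSketch {
    // Counts code points in utf8[offset, offset + length) by counting each byte
    // that is NOT a continuation byte (continuation bytes have the form 10xxxxxx).
    // Like the documented method, this sketch performs no UTF-8 validation.
    static int countCodePoints(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

For well-formed input this agrees with String.codePointCount on the decoded string; for malformed input the result is simply the number of non-continuation bytes.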
-
substring
Gets the substring starting at codePointStart and extending for codePointLength code points.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
reverse
Reverses the slice code point by code point.
Note: Invalid UTF-8 sequences are copied directly to the output.
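Reversing "code point by code point" means the order of code points is flipped while the bytes inside each multi-byte sequence keep their order. A rough sketch on a plain byte[] follows; the names are hypothetical, and it groups bytes by looking for continuation bytes, which is an assumption of this example rather than the library's documented approach.

```java
final class Utf8ReverseSketch {
    // Copies each code point's byte sequence to the mirrored position in the output.
    // A sequence is taken to be one byte plus any continuation bytes (10xxxxxx)
    // that immediately follow it, so unexpected bytes are copied through unchanged.
    static byte[] reverse(byte[] utf8) {
        byte[] out = new byte[utf8.length];
        int position = 0;
        while (position < utf8.length) {
            int length = 1;
            while (position + length < utf8.length
                    && (utf8[position + length] & 0xC0) == 0x80) {
                length++;
            }
            System.arraycopy(utf8, position, out, utf8.length - position - length, length);
            position += length;
        }
        return out;
    }
}
```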
-
compareUtf16BE
Compares two UTF-8 sequences using UTF-16 big endian semantics. This is equivalent to java.lang.String.compareTo(String).
Throws:
InvalidUtf8Exception - if the UTF-8 is invalid
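The distinction matters because UTF-16 ordering and raw UTF-8 byte ordering disagree when a supplementary code point is compared with a BMP code point above the surrogate range. The sketch below uses only the Java standard library (not SliceUtf8) to show the two orderings side by side; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16OrderSketch {
    // Unsigned lexicographic comparison of the UTF-8 bytes,
    // which for well-formed input is the same as code point order.
    static int compareUtf8Bytes(String left, String right) {
        return Arrays.compareUnsigned(
                left.getBytes(StandardCharsets.UTF_8),
                right.getBytes(StandardCharsets.UTF_8));
    }

    // UTF-16 code unit comparison, as performed by String.compareTo.
    static int compareUtf16(String left, String right) {
        return left.compareTo(right);
    }
}
```

For example, U+FF5E sorts before U+1F600 in UTF-8 byte order, but after it under String.compareTo, because U+1F600 is stored as the surrogate pair D83D DE00 and 0xD83D is less than 0xFF5E.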
-
toUpperCase
Converts slice to upper case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.
Note: Invalid UTF-8 sequences are copied directly to the output.
-
toLowerCase
Converts slice to lower case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.
Note: Invalid UTF-8 sequences are copied directly to the output.
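To illustrate what a per-code-point mapping cannot express, the sketch below upper-cases a java.lang.String one code point at a time with Character.toUpperCase(int). It is a standard-library analogy with hypothetical names, not SliceUtf8 itself, and it is only assumed that the library's behavior is comparable.

```java
final class CaseMappingSketch {
    // Maps each code point independently. Character.toUpperCase(int) is
    // locale-insensitive and always returns exactly one code point, so
    // one-to-many mappings (such as sharp s to "SS") cannot happen here.
    static String upperPerCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().map(Character::toUpperCase).forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

With this sketch, "straße" becomes "STRAßE", whereas "straße".toUpperCase(Locale.ROOT) yields "STRASSE". Likewise "i" becomes "I", while String.toUpperCase with a Turkish locale yields the dotted capital U+0130.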
-
leftTrim
Removes all white space characters from the left side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
leftTrim
Removes all whiteSpaceCodePoints from the left side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
rightTrim
Removes all white space characters from the right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
rightTrim
Removes all whiteSpaceCodePoints from the right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
trim
Removes all white space characters from the left and right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
-
trim
Removes all whiteSpaceCodePoints from the left and right side of the string.
Note: Invalid UTF-8 sequences are not trimmed.
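Trimming against a caller-supplied set of code points can be sketched on a java.lang.String as follows. The int[] parameter type and all names here are assumptions made for illustration, since the parameter lists of the trim overloads are not shown above.

```java
final class TrimSketch {
    // Returns true if codePoint appears in the caller-supplied set.
    static boolean contains(int[] codePoints, int codePoint) {
        for (int candidate : codePoints) {
            if (candidate == codePoint) {
                return true;
            }
        }
        return false;
    }

    // Drops leading code points that are in whiteSpaceCodePoints.
    static String leftTrim(String s, int[] whiteSpaceCodePoints) {
        int start = 0;
        while (start < s.length()) {
            int codePoint = s.codePointAt(start);
            if (!contains(whiteSpaceCodePoints, codePoint)) {
                break;
            }
            start += Character.charCount(codePoint);
        }
        return s.substring(start);
    }

    // Drops trailing code points that are in whiteSpaceCodePoints.
    static String rightTrim(String s, int[] whiteSpaceCodePoints) {
        int end = s.length();
        while (end > 0) {
            int codePoint = s.codePointBefore(end);
            if (!contains(whiteSpaceCodePoints, codePoint)) {
                break;
            }
            end -= Character.charCount(codePoint);
        }
        return s.substring(0, end);
    }

    static String trim(String s, int[] whiteSpaceCodePoints) {
        return rightTrim(leftTrim(s, whiteSpaceCodePoints), whiteSpaceCodePoints);
    }
}
```

Working in whole code points means a supplementary character in the set is removed as a unit rather than one surrogate at a time.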
-
fixInvalidUtf8
-
fixInvalidUtf8
-
tryGetCodePointAt
Tries to get the UTF-8 encoded code point at the position. A positive return value means the UTF-8 sequence at the position is valid, and the result is the code point. A negative return value means the UTF-8 sequence at the position is invalid, and the length of the invalid sequence is the absolute value of the result.
Returns:
the code point, or the negated number of bytes in the invalid UTF-8 sequence.
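A caller has to branch on the sign of the result to tell the two cases apart. The helper below is a hypothetical illustration of that convention; it treats any non-negative value as a valid code point, which is an assumption of this sketch (the text above says "positive").

```java
final class TryGetResultSketch {
    // Interprets a result that follows the documented convention:
    // non-negative means a decoded code point, negative means an invalid
    // sequence whose byte length is the absolute value of the result.
    static String describe(int result) {
        if (result >= 0) {
            return String.format("code point U+%04X", result);
        }
        return "invalid sequence of " + (-result) + " bytes";
    }
}
```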
-
offsetOfCodePoint
Finds the index of the first byte of the code point at a position, or -1 if the position is not within the slice.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
offsetOfCodePoint
Starting from position bytes in utf8, finds the index of the first byte of the code point codePointCount in the slice. If the slice does not contain codePointCount code points after position, -1 is returned.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
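The lookup described above can be sketched as a linear scan over start bytes. This is a simplified illustration on a plain byte[] with hypothetical names; it omits argument checks and is not claimed to match the library's implementation or its behavior on edge cases.

```java
final class OffsetOfCodePointSketch {
    // Scans forward from the byte index `position` and returns the byte index at
    // which the code point numbered `codePointCount` (zero-based) begins, or -1
    // if fewer code points remain. Continuation bytes (10xxxxxx) are skipped.
    static int offsetOfCodePoint(byte[] utf8, int position, int codePointCount) {
        int remaining = codePointCount;
        for (int index = position; index < utf8.length; index++) {
            if ((utf8[index] & 0xC0) != 0x80) {
                if (remaining == 0) {
                    return index;
                }
                remaining--;
            }
        }
        return -1;
    }
}
```

A substring by code points can then be expressed as two such lookups: one for the start offset and one, starting from there, for the end offset.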
-
lengthOfCodePoint
Gets the UTF-8 sequence length of the code point at position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
lengthOfCodePointSafe
Gets the UTF-8 sequence length of the code point at position.
Truncated UTF-8 sequences, 5 and 6 byte sequences, and invalid code points are handled by this method without throwing an exception.
-
lengthOfCodePoint
public static int lengthOfCodePoint(int codePoint)
Gets the UTF-8 sequence length of the code point.
Throws:
InvalidCodePointException - if the code point is not within a valid range
-
lengthOfCodePointFromStartByte
public static int lengthOfCodePointFromStartByte(byte startByte)
Gets the UTF-8 sequence length using the sequence start byte.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
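In UTF-8 the high bits of the first byte encode how many bytes the sequence occupies. The sketch below shows that mapping; the names are hypothetical, and returning -1 for a byte that cannot start a sequence is a choice made for this example, not a statement about what the library returns.

```java
final class StartByteLengthSketch {
    // Derives the sequence length from the bit pattern of the start byte.
    static int lengthFromStartByte(byte startByte) {
        int unsigned = startByte & 0xFF;
        if (unsigned < 0x80) {
            return 1; // 0xxxxxxx: single-byte (ASCII)
        }
        if ((unsigned & 0xE0) == 0xC0) {
            return 2; // 110xxxxx
        }
        if ((unsigned & 0xF0) == 0xE0) {
            return 3; // 1110xxxx
        }
        if ((unsigned & 0xF8) == 0xF0) {
            return 4; // 11110xxx
        }
        return -1; // continuation byte or unsupported lead byte (sketch-only sentinel)
    }
}
```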
-
getCodePointAt
Gets the UTF-8 encoded code point at the position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
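Decoding without validation amounts to stripping the marker bits from each byte and concatenating the payload bits. The following sketch does this for a plain byte[]; the names are hypothetical and IllegalArgumentException is used here only as a stand-in for whatever the library throws.

```java
final class DecodeSketch {
    // Decodes the code point whose first byte is at `position`.
    // Continuation bytes are not checked, mirroring the "no explicit
    // validation" caveat in the text above.
    static int codePointAt(byte[] utf8, int position) {
        int b0 = utf8[position] & 0xFF;
        if (b0 < 0x80) {
            return b0;
        }
        if ((b0 & 0xE0) == 0xC0) {
            return ((b0 & 0x1F) << 6)
                    | (utf8[position + 1] & 0x3F);
        }
        if ((b0 & 0xF0) == 0xE0) {
            return ((b0 & 0x0F) << 12)
                    | ((utf8[position + 1] & 0x3F) << 6)
                    | (utf8[position + 2] & 0x3F);
        }
        if ((b0 & 0xF8) == 0xF0) {
            return ((b0 & 0x07) << 18)
                    | ((utf8[position + 1] & 0x3F) << 12)
                    | ((utf8[position + 2] & 0x3F) << 6)
                    | (utf8[position + 3] & 0x3F);
        }
        throw new IllegalArgumentException("not a UTF-8 start byte at " + position);
    }
}
```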
-
getCodePointBefore
Gets the UTF-8 encoded code point before the position.
Note: This method does not explicitly check for valid UTF-8, and may return incorrect results or throw an exception for invalid UTF-8.
-
codePointToUtf8
Convert the code point to UTF-8.
Throws:
InvalidCodePointException - if the code point is not within a valid range
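Encoding is the reverse of decoding: the code point's range selects the sequence length, and its bits are distributed over the lead and continuation bytes. This sketch returns a byte[] instead of a Slice; the names are hypothetical, IllegalArgumentException stands in for InvalidCodePointException, and rejecting surrogates is an assumption about what "valid range" means.

```java
final class EncodeSketch {
    // Encodes a single code point as UTF-8.
    static byte[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("invalid code point: " + codePoint);
        }
        if (codePoint < 0x80) {
            return new byte[] {(byte) codePoint};
        }
        if (codePoint < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (codePoint >> 6)),
                (byte) (0x80 | (codePoint & 0x3F))};
        }
        if (codePoint < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (codePoint >> 12)),
                (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                (byte) (0x80 | (codePoint & 0x3F))};
        }
        return new byte[] {
            (byte) (0xF0 | (codePoint >> 18)),
            (byte) (0x80 | ((codePoint >> 12) & 0x3F)),
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
            (byte) (0x80 | (codePoint & 0x3F))};
    }
}
```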
-
setCodePointAt
Sets the UTF-8 sequence for code point at the position.
Throws:
InvalidCodePointException - if the code point is not within a valid range
-