Class UCharacterName
java.lang.Object
org.graalvm.shadowed.com.ibm.icu.impl.UCharacterName
Internal class to manage character names.
Because name data is stored in an array of char, indexes used in this class
refer to a 2-byte (char) count unless otherwise stated. Where an index refers
to a byte count, the index is halved and, depending on whether the index is
even or odd, the MSB or LSB of the char at the halved index is returned.
For indexes into an array of int, the index is multiplied by 2, and the char
at the multiplied index and the char following it are returned together as
an int.
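The indexing rules above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name PackedCharArray and both helper methods are hypothetical, and the sketch assumes that an even byte index selects the MSB and that the first char of a pair holds the high 16 bits of the int.

```java
// Hypothetical helper illustrating byte and int reads from a char array.
final class PackedCharArray {
    private PackedCharArray() {}

    // Reads one byte: halve the byte index to find the char, then pick
    // the high byte for an even index or the low byte for an odd index
    // (assumed ordering).
    static int byteAt(char[] data, int byteIndex) {
        char unit = data[byteIndex >> 1];
        return (byteIndex & 1) == 0 ? (unit >> 8) & 0xFF : unit & 0xFF;
    }

    // Reads one int: double the int index, then combine the char at that
    // position (assumed high half) with the char that follows it.
    static int intAt(char[] data, int intIndex) {
        int i = intIndex << 1;
        return (data[i] << 16) | data[i + 1];
    }

    public static void main(String[] args) {
        char[] data = {0x1234, 0x5678};
        System.out.println(Integer.toHexString(byteAt(data, 1)));
        System.out.println(Integer.toHexString(intAt(data, 0)));
    }
}
```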
UCharacter acts as a public facade for this class.
Note: 0 - 0x1F are control characters without names in Unicode 3.0.
- Since:
- nov0700
-
Field Summary
Fields
Modifier and Type | Field | Description
static final UCharacterName | INSTANCE |
static final int | LINES_PER_GROUP_ | Number of lines per group, 1 << GROUP_SHIFT_
int | m_groupcount_ | Maximum number of groups
-
Method Summary
Modifier and Type | Method | Description
int | getAlgorithmEnd(int index) | Gets the end of the range
int | getAlgorithmLength() | Get the Algorithm range length
String | getAlgorithmName(int index, int codepoint) | Gets the Algorithmic name of the codepoint
int | getAlgorithmStart(int index) | Gets the start of the range
int | getCharFromName(int choice, String name) | Find a character by its name and return its code point value
void | getCharNameCharacters(UnicodeSet set) | Fills set with characters that are used in Unicode character names.
static int | getCodepointMSB(int codepoint) | Gets the MSB of the codepoint
String | getExtendedName(int ch) | Retrieves the extended name
String | getExtendedOr10Name(int ch) | Gets the extended and 1.0 name when the most current unicode names fail
int | getGroup(int codepoint) | Gets the group index for the codepoint, or the group before it.
int | getGroupLengths(int index, char[] offsets, char[] lengths) | Reads a block of compressed lengths of 32 strings and expands them into offsets and lengths for each string.
static int | getGroupLimit(int msb) | Gets the maximum codepoint + 1 of the group
static int | getGroupMin(int msb) | Gets the minimum codepoint of the group
static int | getGroupMinFromCodepoint(int codepoint) | CLOVER:OFF
int | getGroupMSB(int gindex) | Gets the MSB from the group index
String | getGroupName(int ch, int choice) | Gets the group name of the character
String | getGroupName(int index, int length, int choice) | Gets the name of the argument group index.
static int | getGroupOffset(int codepoint) | Gets the offset to a group
void | getISOCommentCharacters(UnicodeSet set) | CLOVER:OFF
int | getMaxCharNameLength() | Gets the maximum length of any codepoint name.
int | getMaxISOCommentLength() | CLOVER:OFF
String | getName(int ch, int choice) | Retrieve the name of a Unicode code point.
-
Field Details
-
INSTANCE
-
LINES_PER_GROUP_
public static final int LINES_PER_GROUP_
Number of lines per group, 1 << GROUP_SHIFT_
- See Also:
-
m_groupcount_
public int m_groupcount_
Maximum number of groups
-
-
Method Details
-
getName
Retrieve the name of a Unicode code point. Depending on choice, the character name written into the buffer is the "modern" name or the name that was defined in Unicode version 1.0. The name contains only "invariant" characters like A-Z, 0-9, space, and '-'.
- Parameters:
- ch - the code point for which to get the name.
- choice - Selector for which name to get.
- Returns:
- if code point is above 0x1fff, null is returned
-
getCharFromName
Find a character by its name and return its code point value.
- Parameters:
- choice - selector indicating whether the name argument is a Unicode 1.0 name or a name from the most current version
- name - the name to search for
- Returns:
- code point
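The name lookup and the reverse lookup by name form a round trip. Since this class is internal and normally reached through UCharacter, the sketch below uses the JDK's own java.lang.Character methods (getName, and codePointOf from Java 9 onward) as a stand-in that runs without ICU on the classpath. It does not call this class, and the helper isInvariantName is hypothetical, written only to illustrate the "invariant characters" remark for getName.

```java
// Stand-in illustration using java.lang.Character, not UCharacterName.
final class NameRoundTrip {
    private NameRoundTrip() {}

    // Hypothetical check: true if the name uses only A-Z, 0-9, space, '-'.
    static boolean isInvariantName(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                    || c == ' ' || c == '-';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String name = Character.getName(0x41);   // code point to name
        int codePoint = Character.codePointOf(name); // name back to code point
        System.out.println(name + " -> U+"
                + Integer.toHexString(codePoint).toUpperCase());
        System.out.println("invariant: " + isInvariantName(name));
    }
}
```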
-
getGroupLengths
public int getGroupLengths(int index, char[] offsets, char[] lengths)
Reads a block of compressed lengths of 32 strings and expands them into offsets and lengths for each string. Lengths are stored with a variable-width encoding in consecutive nibbles: if a nibble < 0xc, then it is the length itself (0 = empty string); if a nibble >= 0xc, then it forms a length value with the following nibble. The offsets and lengths arrays must be at least 33 (one more) long because there is no check here at the end if the last nibble is still used.
- Parameters:
- index - index of the group string object in the array
- offsets - array to store the string offsets
- lengths - array to store the string lengths
- Returns:
- next index of the data string immediately after the lengths in terms of byte address
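The nibble encoding described for getGroupLengths can be sketched as a small standalone decoder. The class NibbleLengths and its methods are hypothetical. The text above does not state how the two nibbles of a long length combine, so the formula used here, ((first - 0xC) << 4 | second) + 0xC, is an assumption, as is reading the high nibble of each byte first.

```java
// Hypothetical decoder for variable-width nibble-encoded lengths.
final class NibbleLengths {
    private NibbleLengths() {}

    // Decodes `count` lengths from `data`, starting at byte `start`.
    static int[] decodeLengths(byte[] data, int start, int count) {
        int[] lengths = new int[count];
        int produced = 0;
        int pending = -1; // high part of a two-nibble length, -1 if none
        for (int pos = start; produced < count; pos++) {
            int b = data[pos] & 0xFF;
            // High nibble first, then low nibble (assumed order).
            for (int shift = 4; shift >= 0 && produced < count; shift -= 4) {
                int nibble = (b >> shift) & 0x0F;
                if (pending < 0 && nibble >= 0xC) {
                    // First nibble of a two-nibble length (assumed formula).
                    pending = (nibble - 0xC) << 4;
                } else if (pending >= 0) {
                    lengths[produced++] = (pending | nibble) + 0xC;
                    pending = -1;
                } else {
                    lengths[produced++] = nibble; // nibble is the length
                }
            }
        }
        return lengths;
    }

    // Expands lengths into offsets; the result has one extra entry,
    // mirroring the "one more" requirement on the arrays.
    static int[] offsetsOf(int[] lengths) {
        int[] offsets = new int[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }
}
```

For example, under these assumptions the bytes 0x35, 0xC0, 0x10 decode to the lengths 3, 5, 12, 1 when four lengths are requested.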
-
getGroupName
Gets the name of the argument group index. UnicodeData.txt uses ';' as a field separator, so no field can contain ';' as part of its contents. In unames.icu, it is marked as token[';'] == -1 only if the semicolon is used in the data file, which is iff we have Unicode 1.0 names or ISO comments or aliases. So, it will be token[';'] == -1 if we store U1.0 names/ISO comments/aliases although we know that it will never be part of a name. Equivalent to ICU4C's expandName.
- Parameters:
- index - index of the group name string, in byte count
- length - length of the group name string
- choice - choice of Unicode 1.0 name or the most current name
- Returns:
- name of the group
-
getExtendedName
Retrieves the extended name
-
getGroup
public int getGroup(int codepoint)
Gets the group index for the codepoint, or the group before it.
- Parameters:
- codepoint - The codepoint index.
- Returns:
- group index containing codepoint or the group before it.
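One way to obtain the "containing group, or the group before it" behaviour is a floor-style binary search over the sorted MSB values of the groups. The sketch below is an illustration under assumptions, not a description of how this class is implemented: the class GroupSearch, the sortedMsbs array, and the GROUP_SHIFT value of 5 (inferred from the 32 strings per group) are all hypothetical.

```java
// Hypothetical floor search over sorted group MSB values.
final class GroupSearch {
    static final int GROUP_SHIFT = 5; // assumed: 32 code points per group

    private GroupSearch() {}

    // Returns the index of the last group whose MSB is <= the code
    // point's MSB; returns 0 when the code point precedes every group.
    static int findGroup(int[] sortedMsbs, int codepoint) {
        int msb = codepoint >> GROUP_SHIFT;
        int lo = 0;
        int hi = sortedMsbs.length; // search window is [lo, hi)
        while (lo < hi - 1) {
            int mid = (lo + hi) >>> 1;
            if (msb < sortedMsbs[mid]) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return lo;
    }
}
```

A caller would still need to compare the MSB of the returned group with the MSB of the code point to tell an exact hit from a "group before" result.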
-
getExtendedOr10Name
Gets the extended or Unicode 1.0 name when the lookup of the most current Unicode name fails.
- Parameters:
- ch - codepoint
- Returns:
- name of codepoint extended or 1.0
-
getGroupMSB
public int getGroupMSB(int gindex)
Gets the MSB from the group index.
- Parameters:
- gindex - group index
- Returns:
- the MSB of the group if gindex is valid, -1 otherwise
-
getCodepointMSB
public static int getCodepointMSB(int codepoint)
Gets the MSB of the codepoint.
- Parameters:
- codepoint - The codepoint value.
- Returns:
- the MSB of the codepoint
-
getGroupLimit
public static int getGroupLimit(int msb)
Gets the maximum codepoint + 1 of the group.
- Parameters:
- msb - most significant byte of the group
- Returns:
- limit codepoint of the group
-
getGroupMin
public static int getGroupMin(int msb)
Gets the minimum codepoint of the group.
- Parameters:
- msb - most significant byte of the group
- Returns:
- minimum codepoint of the group
-
getGroupOffset
public static int getGroupOffset(int codepoint)
Gets the offset to a group.
- Parameters:
- codepoint - The codepoint value.
- Returns:
- offset to a group
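The MSB, minimum, limit, and offset helpers are related by simple shift and mask arithmetic. The sketch below is a hedged reconstruction: the class GroupMath is hypothetical, and GROUP_SHIFT = 5 is an assumption inferred from LINES_PER_GROUP_ being 1 << GROUP_SHIFT_ and from the 32 strings per group mentioned for getGroupLengths. Under that assumption the "MSB" is the code point's high bits above the group offset rather than a literal byte.

```java
// Hypothetical shift/mask arithmetic for code point groups.
final class GroupMath {
    static final int GROUP_SHIFT = 5;                     // assumed value
    static final int LINES_PER_GROUP = 1 << GROUP_SHIFT;  // 32
    static final int GROUP_MASK = LINES_PER_GROUP - 1;    // low 5 bits

    private GroupMath() {}

    // High bits identifying the group that contains the code point.
    static int codepointMsb(int codepoint) {
        return codepoint >> GROUP_SHIFT;
    }

    // First code point of the group.
    static int groupMin(int msb) {
        return msb << GROUP_SHIFT;
    }

    // One past the last code point of the group.
    static int groupLimit(int msb) {
        return (msb << GROUP_SHIFT) + LINES_PER_GROUP;
    }

    // Position of the code point inside its group.
    static int groupOffset(int codepoint) {
        return codepoint & GROUP_MASK;
    }

    // First code point of the group, computed directly from a code point.
    static int groupMinFromCodepoint(int codepoint) {
        return codepoint & ~GROUP_MASK;
    }
}
```

For U+0041 this gives MSB 2, group range 0x40 (inclusive) to 0x60 (exclusive), and offset 1.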
-
getGroupMinFromCodepoint
public static int getGroupMinFromCodepoint(int codepoint)
CLOVER:OFF
-
getAlgorithmLength
public int getAlgorithmLength()
Gets the algorithm range length.
- Returns:
- Algorithm range length
-
getAlgorithmStart
public int getAlgorithmStart(int index)
Gets the start of the range.
- Parameters:
- index - algorithm index
- Returns:
- algorithm range start
-
getAlgorithmEnd
public int getAlgorithmEnd(int index)
Gets the end of the range.
- Parameters:
- index - algorithm index
- Returns:
- algorithm range end
-
getAlgorithmName
Gets the algorithmic name of the codepoint.
- Parameters:
- index - algorithmic range index
- codepoint - The codepoint value.
- Returns:
- algorithmic name of codepoint
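Algorithmic names are names derived by rule from the code point rather than stored individually; CJK unified ideographs, for example, are named with a fixed prefix followed by the hexadecimal code point. The sketch below shows how a range index, range start, and range end might fit together. The class AlgorithmicNames, the Range type, and the single range with illustrative bounds are hypothetical and do not reflect this class's data layout.

```java
import java.util.Locale;

// Hypothetical model of algorithmic name ranges.
final class AlgorithmicNames {
    // One range of code points that share a name-generation rule.
    static final class Range {
        final int start;
        final int end; // inclusive
        final String prefix;

        Range(int start, int end, String prefix) {
            this.start = start;
            this.end = end;
            this.prefix = prefix;
        }
    }

    // Illustrative data only; real range bounds depend on the Unicode version.
    static final Range[] RANGES = {
        new Range(0x4E00, 0x9FFF, "CJK UNIFIED IDEOGRAPH-")
    };

    private AlgorithmicNames() {}

    // Returns the index of the range containing the code point, or -1.
    static int findRange(int codepoint) {
        for (int i = 0; i < RANGES.length; i++) {
            if (codepoint >= RANGES[i].start && codepoint <= RANGES[i].end) {
                return i;
            }
        }
        return -1;
    }

    // Builds the name as prefix + uppercase hex, or null if out of range.
    static String nameOf(int index, int codepoint) {
        Range r = RANGES[index];
        if (codepoint < r.start || codepoint > r.end) {
            return null;
        }
        return r.prefix + Integer.toHexString(codepoint).toUpperCase(Locale.ROOT);
    }
}
```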
-
getGroupName
Gets the group name of the character.
- Parameters:
- ch - character whose group name is retrieved
- choice - selector choosing between the Unicode 1.0 name and the newer name
-
getMaxCharNameLength
public int getMaxCharNameLength()
Gets the maximum length of any codepoint name. Equivalent to uprv_getMaxCharNameLength.
- Returns:
- the maximum length of any codepoint name
-
getMaxISOCommentLength
public int getMaxISOCommentLength()
CLOVER:OFF
-
getCharNameCharacters
Fills set with characters that are used in Unicode character names. Equivalent to uprv_getCharNameCharacters.
- Parameters:
- set - USet to receive characters. Existing contents are deleted.
-
getISOCommentCharacters
CLOVER:OFF
-