public abstract class Trie extends Object
A trie is a kind of compressed, serializable table of values associated with Unicode code points (0..0x10ffff).
This class defines the basic structure of a trie and provides methods to retrieve the offsets to the actual data.
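The following sketch shows what "retrieving offsets" means in a two-stage table. It is restricted to the BMP and assumes a block size of 32 values (a stage 1 shift of 5) and a stage 2 shift of 2, which we believe match ICU's classic trie. The class name ToyBmpTrie and its build method are hypothetical and are not part of this API. Identical data blocks are stored only once, and that sharing is where the compression comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy two-stage table for BMP code units; not the ICU implementation. */
final class ToyBmpTrie {
    static final int SHIFT = 5;                  // assumed stage 1 shift (blocks of 32 values)
    static final int INDEX_SHIFT = 2;            // assumed stage 2 shift applied to index entries
    static final int BLOCK_LENGTH = 1 << SHIFT;  // 32 data values per block
    static final int MASK = BLOCK_LENGTH - 1;    // low bits select a value inside a block

    final char[] index;  // one entry per block: (start of that block in data) >> INDEX_SHIFT
    final int[] data;    // deduplicated data blocks

    private ToyBmpTrie(char[] index, int[] data) {
        this.index = index;
        this.data = data;
    }

    /** Builds the table from one value per BMP code unit (values.length must be 0x10000). */
    static ToyBmpTrie build(int[] values) {
        if (values.length != 0x10000) {
            throw new IllegalArgumentException("need exactly 0x10000 values");
        }
        char[] index = new char[0x10000 >> SHIFT];
        List<int[]> blocks = new ArrayList<>();
        for (int b = 0; b < index.length; b++) {
            int[] block = Arrays.copyOfRange(values, b << SHIFT, (b + 1) << SHIFT);
            int found = -1;
            for (int i = 0; i < blocks.size(); i++) {  // reuse an identical block if one exists
                if (Arrays.equals(blocks.get(i), block)) {
                    found = i;
                    break;
                }
            }
            if (found < 0) {
                blocks.add(block);
                found = blocks.size() - 1;
            }
            // Block starts are multiples of 32, so ">> INDEX_SHIFT" loses no bits.
            index[b] = (char) ((found * BLOCK_LENGTH) >> INDEX_SHIFT);
        }
        int[] data = new int[blocks.size() * BLOCK_LENGTH];
        for (int i = 0; i < blocks.size(); i++) {
            System.arraycopy(blocks.get(i), 0, data, i * BLOCK_LENGTH, BLOCK_LENGTH);
        }
        return new ToyBmpTrie(index, data);
    }

    /** Offset into data: (index[c >> SHIFT] << INDEX_SHIFT) + (c & MASK). */
    int offset(char c) {
        return (index[c >> SHIFT] << INDEX_SHIFT) + (c & MASK);
    }

    int get(char c) {
        return data[offset(c)];
    }
}
```

If only 'A' has a non-zero value, every other block is all zeros and is shared, so the data array holds just two blocks (64 values). Production builders typically compact further and reserve index space for lead surrogates. This sketch shows only the lookup arithmetic.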
Data will be in the form of an array of basic types, char or int.
The actual data format will have to be specified by the user in the inner static interface org.graalvm.shadowed.com.ibm.icu.impl.Trie.DataManipulate.
This trie implementation is optimized for getting offsets while walking forward through a UTF-16 string. Therefore, the simplest and fastest access macros are the fromLead() and fromOffsetTrail() methods. The fromBMP() method is a little more complicated: it gets offsets even for lead surrogate code points, while the fromLead() method gets special "folded" offsets for lead surrogate code units if there is relevant data associated with them. From such a folded offset, an offset needs to be extracted to supply to the fromOffsetTrail() methods. To handle such supplementary code points, some offset information is kept in the data.
Methods in org.graalvm.shadowed.com.ibm.icu.impl.Trie.DataManipulate are called to retrieve that offset from the folded value for the lead surrogate unit.
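The following sketch illustrates the lead/trail split described above. The interface FoldingOffset is a hypothetical stand-in for Trie.DataManipulate, and the class and method names are also hypothetical. The constants (a shift of 5, an index shift of 2, and a 10-bit trail mask) are taken from ICU's classic trie layout, not read from this class. The sketch treats a folded offset of zero or less as "no supplementary data for this lead", which is also an assumption.

```java
/** Hypothetical stand-in for Trie.DataManipulate: maps a folded lead value to an index offset. */
interface FoldingOffset {
    int getFoldingOffset(int leadValue);
}

final class SurrogateLookupSketch {
    static final int SHIFT = 5;                // assumed INDEX_STAGE_1_SHIFT_
    static final int INDEX_SHIFT = 2;          // assumed INDEX_STAGE_2_SHIFT_
    static final int MASK = (1 << SHIFT) - 1;  // assumed INDEX_STAGE_3_MASK_
    static final int SURROGATE_MASK = 0x3FF;   // low 10 bits of a trail surrogate

    /**
     * Returns the data offset for a lead/trail pair, or -1 if the lead has no folded data.
     * leadValue is the (folded) value stored for the lead surrogate code unit.
     */
    static int surrogateOffset(char[] index, int leadValue, FoldingOffset fold, char trail) {
        int offset = fold.getFoldingOffset(leadValue);
        if (offset <= 0) {
            return -1;  // no supplementary data behind this lead surrogate
        }
        int t = trail & SURROGATE_MASK;  // 0..0x3FF
        // t >> SHIFT selects one of the 32 index entries reserved for this lead;
        // t & MASK selects the value inside the data block that entry points to.
        return (index[offset + (t >> SHIFT)] << INDEX_SHIFT) + (t & MASK);
    }
}
```

Take U+1F600 as an example. The trail surrogate is 0xDE00, and its low 10 bits are 0x200. The lookup therefore reads index entry offset + 16 and position 0 inside the data block that entry points to.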
For examples of use, see org.graalvm.shadowed.com.ibm.icu.impl.CharTrie or org.graalvm.shadowed.com.ibm.icu.impl.IntTrie.
| Modifier and Type | Class and Description |
|---|---|
| static interface | Trie.DataManipulate: Character data in com.ibm.impl.Trie has different user-specified formats for different purposes. |
| Modifier and Type | Field and Description |
|---|---|
| protected static int | BMP_INDEX_LENGTH: Length of the BMP portion of the index (stage 1) array. |
| protected static int | DATA_BLOCK_LENGTH: Number of data values in a stage 2 (data array) block. |
| protected static int | HEADER_LENGTH_: Size of the Trie header in bytes. |
| protected static int | HEADER_OPTIONS_DATA_IS_32_BIT_ |
| protected static int | HEADER_OPTIONS_INDEX_SHIFT_ |
| protected static int | HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_: Latin-1 option mask. |
| protected static int | HEADER_SIGNATURE_: Constant number to authenticate the byte block. |
| protected static int | INDEX_STAGE_1_SHIFT_: Shift size for right-shifting the input index. |
| protected static int | INDEX_STAGE_2_SHIFT_: Shift size for left-shifting the index array values. |
| protected static int | INDEX_STAGE_3_MASK_: Mask for getting the lower bits from the input index. |
| protected static int | LEAD_INDEX_OFFSET_: Index displacement of lead surrogate code points in the index array. |
| protected int | m_dataLength_: Length of the data array. |
| protected Trie.DataManipulate | m_dataManipulate_: Internal TrieValue which handles the parsing of the data value. |
| protected int | m_dataOffset_: Start index of the data portion of the trie. |
| protected char[] | m_index_: Index or UTF-16 characters. |
| protected static int | SURROGATE_BLOCK_BITS: Number of bits of a trail surrogate that are used in index table lookups. |
| protected static int | SURROGATE_BLOCK_COUNT: Number of index (stage 1) entries per lead surrogate. |
| protected static int | SURROGATE_MASK_: Surrogate mask to use when shifting an offset to retrieve supplementary values. |
|
| Modifier | Constructor and Description |
|---|---|
| protected | Trie(ByteBuffer bytes, Trie.DataManipulate dataManipulate): Trie constructor for CharTrie use. |
| protected | Trie(char[] index, int options, Trie.DataManipulate dataManipulate): Trie constructor. |
|
| Modifier and Type | Method and Description |
|---|---|
| boolean | equals(Object other): Checks if the argument Trie has the same data as this Trie. |
| protected int | getBMPOffset(char ch): Gets the offset to the data which the BMP character points to. Treats a lead surrogate as a normal code point. |
| protected int | getCodePointOffset(int ch): Internal trie getter from a code point. |
| protected abstract int | getInitialValue(): Gets the default initial value. |
| protected int | getLeadOffset(char ch): Gets the offset to the data which this lead surrogate character points to. |
| protected int | getRawOffset(int offset, char ch): Gets the offset to the data which the index ch after variable offset points to. |
| int | getSerializedDataSize(): Gets the serialized data file size of the Trie. |
| protected abstract int | getSurrogateOffset(char lead, char trail): Gets the offset to the data which the surrogate pair points to. |
| protected abstract int | getValue(int index): Gets the value at the argument index. |
| int | hashCode() |
| protected boolean | isCharTrie(): Determines if this is a 16-bit trie. |
| protected boolean | isIntTrie(): Determines if this is a 32-bit trie. |
| boolean | isLatin1Linear(): Determines if this trie has a linear Latin-1 array. |
| protected void | unserialize(ByteBuffer bytes): Parses the byte buffer and creates the trie index with it. |
|
protected static final int LEAD_INDEX_OFFSET_
protected static final int INDEX_STAGE_1_SHIFT_
protected static final int INDEX_STAGE_2_SHIFT_
protected static final int DATA_BLOCK_LENGTH
protected static final int INDEX_STAGE_3_MASK_
protected static final int SURROGATE_BLOCK_BITS
protected static final int SURROGATE_BLOCK_COUNT
protected static final int BMP_INDEX_LENGTH
protected static final int SURROGATE_MASK_
protected char[] m_index_
protected Trie.DataManipulate m_dataManipulate_
protected int m_dataOffset_
protected int m_dataLength_
protected static final int HEADER_LENGTH_
protected static final int HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_
protected static final int HEADER_SIGNATURE_
protected static final int HEADER_OPTIONS_INDEX_SHIFT_
protected static final int HEADER_OPTIONS_DATA_IS_32_BIT_
protected Trie(ByteBuffer bytes, Trie.DataManipulate dataManipulate)
bytes - data of an ICU data file, containing the trie
dataManipulate - object containing the information to parse the trie data
protected Trie(char[] index, int options, Trie.DataManipulate dataManipulate)
index - array to be used for index
options - used by the trie
dataManipulate - object containing the information to parse the trie data
public final boolean isLatin1Linear()
public boolean equals(Object other)
public int getSerializedDataSize()
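getSerializedDataSize() can be read as header plus index plus data. The following sketch computes that sum. It assumes a 16-byte header (four 32-bit words), a 16-bit index, and data that is 16-bit for a char trie and 32-bit for an int trie. This matches our reading of ICU's classic layout, but this page does not state it. The names are hypothetical.

```java
/** Hypothetical size computation for a serialized trie (assumed layout). */
final class SerializedSizeSketch {
    static final int HEADER_LENGTH = 4 * 4;  // assumed: signature, options, index length, data length

    /**
     * @param indexLength number of 16-bit index entries
     * @param dataLength  number of data values
     * @param is32Bit     true for 32-bit (int) data, false for 16-bit (char) data
     */
    static int serializedSize(int indexLength, int dataLength, boolean is32Bit) {
        int size = HEADER_LENGTH;
        size += indexLength * 2;                 // index entries are always 16-bit
        size += dataLength * (is32Bit ? 4 : 2);  // data width depends on the trie type
        return size;
    }
}
```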
protected abstract int getSurrogateOffset(char lead, char trail)
lead - lead surrogate
trail - trailing surrogate
protected abstract int getValue(int index)
index - value at index will be retrieved
protected abstract int getInitialValue()
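The following sketch shows how the abstract getValue() and getInitialValue() hooks might be combined with an offset getter. A negative offset means "no data", so the lookup falls back to the initial value. Both classes are hypothetical and only mirror the method names on this page. The offset function is passed in so the example stays self-contained.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical analogue of the abstract value hooks; not this class's API. */
abstract class ValueTrieSketch {
    protected abstract int getValue(int index);
    protected abstract int getInitialValue();
    protected abstract int getCodePointOffset(int cp);

    /** Offsets below zero mean "no data": fall back to the initial (default) value. */
    final int valueOf(int cp) {
        int offset = getCodePointOffset(cp);
        return offset >= 0 ? getValue(offset) : getInitialValue();
    }
}

/** Hypothetical 16-bit variant backed by a char[] data array. */
final class CharValueTrieSketch extends ValueTrieSketch {
    private final char[] data;
    private final int initialValue;
    private final IntUnaryOperator offsets;  // stands in for the real offset computation

    CharValueTrieSketch(char[] data, int initialValue, IntUnaryOperator offsets) {
        this.data = data;
        this.initialValue = initialValue;
        this.offsets = offsets;
    }

    @Override
    protected int getValue(int index) {
        return data[index];
    }

    @Override
    protected int getInitialValue() {
        return initialValue;
    }

    @Override
    protected int getCodePointOffset(int cp) {
        return offsets.applyAsInt(cp);
    }
}
```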
protected final int getRawOffset(int offset, char ch)
Gets the offset to the data which the index ch after variable offset points to. To locate the data offset of a non-supplementary character, calling
getRawOffset(0, ch);
will do. Otherwise, if it is a supplementary character formed by the surrogates lead and trail, getRawOffset() has to be called with getFoldingIndexOffset(). See getSurrogateOffset().
offset - index offset which ch is to start from
ch - index to be used after offset
protected final int getBMPOffset(char ch)
ch - BMP character
protected final int getLeadOffset(char ch)
ch - lead surrogate character
protected final int getCodePointOffset(int ch)
ch - code point
protected void unserialize(ByteBuffer bytes)
Parses the byte buffer and creates the trie index with it.
The position of the input ByteBuffer must be right after the trie header.
This is overridden by the child classes.
bytes - buffer containing trie data
protected final boolean isIntTrie()
protected final boolean isCharTrie()
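The unserialize() entry requires the buffer to be positioned right after the trie header. The following sketch reads such a header and then the index that follows it. It assumes four big-endian 32-bit header words (signature, options, index length, data length), which is ByteBuffer's default byte order. It also assumes the signature is 0x54726965, "Trie" in ASCII. Both assumptions come from ICU's classic format as we understand it. TrieHeaderSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader for the 16-byte trie header; assumed layout, not this class's code. */
final class TrieHeaderSketch {
    static final int SIGNATURE = 0x54726965;  // assumed HEADER_SIGNATURE_: "Trie"

    final int options;
    final int indexLength;  // number of 16-bit index entries
    final int dataLength;   // number of data values

    /** Reads the header; afterwards the buffer is positioned right after it. */
    TrieHeaderSketch(ByteBuffer bytes) {
        int signature = bytes.getInt();
        if (signature != SIGNATURE) {
            throw new IllegalArgumentException("not a trie: bad signature 0x" + Integer.toHexString(signature));
        }
        options = bytes.getInt();
        indexLength = bytes.getInt();
        dataLength = bytes.getInt();
    }

    /** Reads the index entries that follow the header, advancing the buffer. */
    char[] readIndex(ByteBuffer bytes) {
        char[] index = new char[indexLength];
        for (int i = 0; i < indexLength; i++) {
            index[i] = bytes.getChar();
        }
        return index;
    }
}
```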