Class Trie2
java.lang.Object
org.graalvm.shadowed.com.ibm.icu.impl.Trie2
- All Implemented Interfaces:
Iterable<Trie2.Range>
- Direct Known Subclasses:
Trie2_16, Trie2_32, Trie2Writable
This is the interface and common implementation of a Unicode Trie2.
It is a kind of compressed table that maps from Unicode code points (0..0x10ffff)
to 16- or 32-bit integer values. It works best when there are ranges of
characters with the same value, which is generally the case with Unicode
character properties.
This is the second common version of a Unicode trie (hence the name Trie2).
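The following is a minimal sketch of why a block-based table compresses well when long runs of code points share a value. It is not the actual Trie2 data layout; the class name TwoStageTableSketch, the block size of 32, and the string-keyed block deduplication are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical two-stage lookup table; NOT the real Trie2 layout. */
final class TwoStageTableSketch {
    private static final int SHIFT = 5;            // assumed: 32 code points per block
    private static final int BLOCK = 1 << SHIFT;
    private static final int LIMIT = 0x110000;     // one past the last Unicode code point

    private final int[] index = new int[LIMIT >> SHIFT]; // block id for each 32-code-point slice
    private final List<int[]> blocks = new ArrayList<>(); // distinct blocks only

    TwoStageTableSketch(IntUnaryOperator valueOf) {
        Map<String, Integer> seen = new HashMap<>();
        for (int start = 0; start < LIMIT; start += BLOCK) {
            int[] block = new int[BLOCK];
            for (int i = 0; i < BLOCK; i++) {
                block[i] = valueOf.applyAsInt(start + i);
            }
            // Identical blocks are stored once and shared through the index.
            String key = Arrays.toString(block);
            Integer id = seen.get(key);
            if (id == null) {
                id = blocks.size();
                blocks.add(block);
                seen.put(key, id);
            }
            index[start >> SHIFT] = id;
        }
    }

    /** Two array reads: the index selects a block, the low bits select a slot in it. */
    int get(int codePoint) {
        if (codePoint < 0 || codePoint >= LIMIT) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return blocks.get(index[codePoint >> SHIFT])[codePoint & (BLOCK - 1)];
    }

    int blockCount() {
        return blocks.size();
    }
}
```

With a property that is 1 for 'a'..'z' and 0 elsewhere, this sketch stores only two distinct blocks for the whole code point range, which illustrates the compression effect described above.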
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
class | Trie2.CharSequenceIterator | An iterator that operates over an input CharSequence, and for each Unicode code point in the input returns the associated value from the Trie2.
static class | Trie2.CharSequenceValues | Struct-like class for holding the results returned by a UTrie2 CharSequence iterator.
static class | Trie2.Range | When iterating over the contents of a Trie2, elements of this type are produced.
static interface | Trie2.ValueMapper | When iterating over the contents of a Trie2, an instance of Trie2.ValueMapper may be used to remap the values from the Trie2.
-
Constructor Summary
Constructors
Trie2()
-
Method Summary
Modifier and Type | Method | Description
Trie2.CharSequenceIterator | charSequenceIterator(CharSequence text, int index) | Create an iterator that will produce the values from the Trie2 for the sequence of code points in an input text.
static Trie2 | createFromSerialized(ByteBuffer bytes) | Create a Trie2 from its serialized form.
final boolean | equals(Object other) | Equals function.
abstract int | get(int codePoint) | Get the value for a code point as stored in the Trie2.
abstract int | getFromU16SingleLead(char c) | Get the trie value for a UTF-16 code unit.
static int | getVersion(InputStream is, boolean littleEndianOk) | Get the UTrie version from an InputStream containing the serialized form of either a Trie (version 1) or a Trie2 (version 2).
int | hashCode() |
Iterator<Trie2.Range> | iterator() | Create an iterator over the value ranges in this Trie2.
Iterator<Trie2.Range> | iterator(Trie2.ValueMapper mapper) | Create an iterator over the value ranges from this Trie2.
Iterator<Trie2.Range> | iteratorForLeadSurrogate(char lead) | Create an iterator over the Trie2 values for the 1024 (0x400) code points corresponding to a given lead surrogate.
Iterator<Trie2.Range> | iteratorForLeadSurrogate(char lead, Trie2.ValueMapper mapper) | Create an iterator over the Trie2 values for the 1024 (0x400) code points corresponding to a given lead surrogate.
protected int | serializeHeader(DataOutputStream dos) | Serialize a Trie2 header and index onto an OutputStream.
Methods inherited from class Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Iterable
forEach, spliterator
-
Constructor Details
-
Trie2
public Trie2()
-
-
Method Details
-
createFromSerialized
public static Trie2 createFromSerialized(ByteBuffer bytes) throws IOException
Create a Trie2 from its serialized form. Inverse of utrie2_serialize(). Reads from the buffer's current position and leaves the position just past the end of the trie.
The serialized format is identical between ICU4C and ICU4J, so this function will work with serialized Trie2s from either.
The actual type of the returned Trie2 will be either Trie2_16 or Trie2_32, depending on the width of the data. To obtain the width of the Trie2, check the actual class type of the returned Trie2, or use the createFromSerialized() function of Trie2_16 or Trie2_32, which will return only Tries of their specific type/size.
The serialized Trie2 in the buffer may be in either little- or big-endian byte order. This allows using serialized Tries from ICU4C without needing to consider the byte order of the system that created them.
- Parameters:
bytes - a byte buffer containing the serialized form of a UTrie2.
- Returns:
- An unserialized Trie2, ready for use.
- Throws:
IllegalArgumentException - if the buffer does not contain a serialized Trie2.
IOException - if a read error occurs in the buffer.
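As a rough illustration of how a reader can accept either byte order, the sketch below probes the first 32-bit word of a buffer under both orders. It does not call createFromSerialized(); the class and method names are hypothetical, and the signature value 0x54726932 (ASCII "Tri2") is an assumption about the serialized header rather than something stated on this page.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of Trie2. */
final class ByteOrderProbeSketch {
    /** Assumed header signature: ASCII "Tri2" read as a big-endian int. */
    static final int ASSUMED_SIGNATURE = 0x54726932;

    /**
     * Returns the byte order under which the first four bytes match the
     * assumed signature, or null if neither order matches. The buffer's
     * position and configured byte order are left unchanged.
     */
    static ByteOrder detectOrder(ByteBuffer bytes) {
        if (bytes.remaining() < 4) {
            return null;
        }
        int pos = bytes.position();
        ByteOrder saved = bytes.order();
        try {
            ByteOrder[] candidates = {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN};
            for (ByteOrder order : candidates) {
                // Absolute read: does not advance the position.
                if (bytes.order(order).getInt(pos) == ASSUMED_SIGNATURE) {
                    return order;
                }
            }
            return null;
        } finally {
            bytes.order(saved); // restore the caller's byte order
        }
    }
}
```

A caller could apply the detected order to the buffer before reading the remaining header fields, so that data written on a little-endian system is read back correctly.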
-
getVersion
public static int getVersion(InputStream is, boolean littleEndianOk) throws IOException
Get the UTrie version from an InputStream containing the serialized form of either a Trie (version 1) or a Trie2 (version 2).
- Parameters:
is - an InputStream containing the serialized form of a UTrie, version 1 or 2. The stream must support mark() and reset(). The position of the input stream will be left unchanged.
littleEndianOk - If false, only big-endian (Java native) serialized forms are recognized. If true, little-endian serialized forms are recognized as well.
- Returns:
- the Trie version of the serialized form, or 0 if it is not recognized as a serialized UTrie
- Throws:
IOException - on errors in reading from the input stream.
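The mark()/reset() requirement can be pictured with the sketch below, which peeks at the first four bytes of a stream and then rewinds so the caller sees an unchanged position. This is not the library's implementation: sniffVersion is a hypothetical name, and the four-byte tags "Trie" and "Tri2" (and their byte-reversed forms) are assumptions about the serialized headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Trie2. */
final class VersionSniffSketch {
    /**
     * Returns 1 or 2 for the assumed version tags, or 0 if the tag is not
     * recognized. The stream position is restored before returning.
     */
    static int sniffVersion(InputStream is, boolean littleEndianOk) throws IOException {
        if (!is.markSupported()) {
            throw new IllegalArgumentException("stream must support mark() and reset()");
        }
        byte[] sig = new byte[4];
        int n = 0;
        is.mark(sig.length);
        try {
            // read() may return fewer bytes than requested, so loop.
            while (n < sig.length) {
                int r = is.read(sig, n, sig.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } finally {
            is.reset(); // leave the position where the caller had it
        }
        if (n < sig.length) {
            return 0;
        }
        switch (new String(sig, StandardCharsets.US_ASCII)) {
            case "Trie": return 1;                       // assumed version 1 tag
            case "Tri2": return 2;                       // assumed version 2 tag
            case "eirT": return littleEndianOk ? 1 : 0;  // byte-reversed version 1
            case "2irT": return littleEndianOk ? 2 : 0;  // byte-reversed version 2
            default: return 0;
        }
    }
}
```

Because the stream is reset before any decision is made, the same stream can be handed to a full deserializer afterwards.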
-
get
public abstract int get(int codePoint)
Get the value for a code point as stored in the Trie2.
- Parameters:
codePoint - the code point
- Returns:
- the value
-
getFromU16SingleLead
public abstract int getFromU16SingleLead(char c)
Get the trie value for a UTF-16 code unit.
A Trie2 stores two distinct values for input in the lead surrogate range, one for lead surrogates, which is the value that will be returned by this function, and a second value that is returned by Trie2.get(). For code units outside of the lead surrogate range, this function returns the same result as Trie2.get().
This function, together with the alternate value for lead surrogates, makes possible very efficient processing of UTF-16 strings without first converting surrogate pairs to their corresponding 32 bit code point values.
At build-time, enumerate the contents of the Trie2 to see if there is non-trivial (non-initialValue) data for any of the supplementary code points associated with a lead surrogate. If so, then set a special (application-specific) value for the lead surrogate code _unit_, with Trie2Writable.setForLeadSurrogateCodeUnit().
At runtime, use Trie2.getFromU16SingleLead(). If there is non-trivial data and the code unit is a lead surrogate, then check if a trail surrogate follows. If so, assemble the supplementary code point and look up its value with Trie2.get(); otherwise reset the lead surrogate's value or do a code point lookup for it.
If there is only trivial data for lead and trail surrogates, then processing can often skip them. For example, in normalization or case mapping all characters that do not have any mappings are simply copied as is.
- Parameters:
c - the code point or lead surrogate value.
- Returns:
- the value
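The runtime procedure described above can be sketched as a loop over UTF-16 code units. The nested Lookup interface below is a hypothetical stand-in for the two lookups (it is not the Trie2 class itself), TRIVIAL is an assumed initial value of 0, and summing the values is just a placeholder for application-specific processing.

```java
/** Hypothetical sketch of the UTF-16 fast path; not the library's code. */
final class Utf16FastPathSketch {
    /** Stand-in for the code point lookup and the code unit lookup. */
    interface Lookup {
        int get(int codePoint);
        int getFromU16SingleLead(char c);
    }

    /** Assumed "no data" value; a real application would use the trie's initial value. */
    static final int TRIVIAL = 0;

    static int sumValues(CharSequence s, Lookup trie) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int v = trie.getFromU16SingleLead(c);
            if (v != TRIVIAL && Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    // Non-trivial lead followed by a trail: assemble the
                    // supplementary code point and look up its real value.
                    v = trie.get(Character.toCodePoint(c, s.charAt(++i)));
                } else {
                    // Unpaired lead surrogate: fall back to a code point lookup.
                    v = trie.get(c);
                }
            }
            // Trivial values (including untouched surrogates) need no decoding.
            sum += v;
        }
        return sum;
    }
}
```

Only lead surrogates that were flagged at build time pay the cost of pairing and a second lookup; every other code unit is handled with a single lookup.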
-
equals
-
hashCode
-
iterator
Create an iterator over the value ranges in this Trie2. Values from the Trie2 are not remapped or filtered, but are returned as they are stored in the Trie2.
- Specified by:
iterator in interface Iterable<Trie2.Range>
- Returns:
- an Iterator
-
iterator
Create an iterator over the value ranges from this Trie2. Values from the Trie2 are passed through a caller-supplied remapping function, and it is the remapped values that determine the ranges that will be produced by the iterator.
- Parameters:
mapper - provides a function to remap values obtained from the Trie2.
- Returns:
- an Iterator
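To show how remapping changes the ranges that come out, the sketch below coalesces adjacent code points whose remapped values are equal. It uses plain IntUnaryOperator functions in place of a Trie2 and a value mapper, and the Run class is a hypothetical stand-in for the range elements; none of these names come from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch of range coalescing; not the library's iterator. */
final class RangeCoalesceSketch {
    /** One run of consecutive code points sharing a (remapped) value. */
    static final class Run {
        final int start;   // first code point, inclusive
        final int end;     // last code point, inclusive
        final int value;   // remapped value shared by the run

        Run(int start, int end, int value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    /** Collects runs over [start, limit) after applying mapper to each looked-up value. */
    static List<Run> runs(IntUnaryOperator lookup, IntUnaryOperator mapper, int start, int limit) {
        List<Run> out = new ArrayList<>();
        if (start >= limit) {
            return out;
        }
        int runStart = start;
        int runValue = mapper.applyAsInt(lookup.applyAsInt(start));
        for (int cp = start + 1; cp < limit; cp++) {
            int v = mapper.applyAsInt(lookup.applyAsInt(cp));
            if (v != runValue) {
                // The remapped value changed, so the current run ends here.
                out.add(new Run(runStart, cp - 1, runValue));
                runStart = cp;
                runValue = v;
            }
        }
        out.add(new Run(runStart, limit - 1, runValue));
        return out;
    }
}
```

A mapper that collapses several stored values into one produces fewer, longer runs, which is the effect the description above refers to.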
-
iteratorForLeadSurrogate
Create an iterator over the Trie2 values for the 1024 (0x400) code points corresponding to a given lead surrogate. For example, for the lead surrogate U+D87E it will enumerate the values for U+2F800 through U+2FBFF, that is, the half-open range [U+2F800..U+2FC00[. Used by data builder code that sets special lead surrogate code unit values for optimized UTF-16 string processing.
Do not modify the Trie2 during the iteration.
Except for the limited code point range, this functions just like Trie2.iterator().
-
iteratorForLeadSurrogate
Create an iterator over the Trie2 values for the 1024 (0x400) code points corresponding to a given lead surrogate. For example, for the lead surrogate U+D87E it will enumerate the values for U+2F800 through U+2FBFF, that is, the half-open range [U+2F800..U+2FC00[. Used by data builder code that sets special lead surrogate code unit values for optimized UTF-16 string processing.
Do not modify the Trie2 during the iteration.
Except for the limited code point range, this functions just like Trie2.iterator().
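The code point range that belongs to a lead surrogate follows directly from the UTF-16 encoding, and can be computed with java.lang.Character as in this small sketch. The class and method names are hypothetical and the sketch does not use Trie2 at all.

```java
/** Hypothetical helper showing which code points a lead surrogate covers. */
final class LeadSurrogateRangeSketch {
    private static void requireLead(char lead) {
        if (!Character.isHighSurrogate(lead)) {
            throw new IllegalArgumentException("not a lead surrogate: " + (int) lead);
        }
    }

    /** First supplementary code point encoded with this lead surrogate. */
    static int firstCodePoint(char lead) {
        requireLead(lead);
        return Character.toCodePoint(lead, Character.MIN_LOW_SURROGATE); // trail U+DC00
    }

    /** Last supplementary code point encoded with this lead surrogate (inclusive). */
    static int lastCodePoint(char lead) {
        requireLead(lead);
        return Character.toCodePoint(lead, Character.MAX_LOW_SURROGATE); // trail U+DFFF
    }
}
```

For the lead surrogate U+D87E this gives U+2F800 as the first code point and U+2FBFF as the last, a span of 1024 code points.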
serializeHeader
Serialize a Trie2 header and index onto an OutputStream. This is common code used for both the Trie2_16 and Trie2_32 serialize functions.
- Parameters:
dos - the stream to which the serialized Trie2 data will be written.
- Returns:
- the number of bytes written.
- Throws:
IOException
-
charSequenceIterator
Create an iterator that will produce the values from the Trie2 for the sequence of code points in an input text.
- Parameters:
text - A text string to be iterated over.
index - The starting iteration position within the input text.
- Returns:
- the CharSequenceIterator
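The idea of walking a text by code point and pairing each code point with its looked-up value can be sketched with standard java.lang.Character methods. The Step class and the walk method below are hypothetical illustrations, and an IntUnaryOperator stands in for the Trie2 lookup; this is not the library's iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch of a code point walk over a CharSequence. */
final class CodePointWalkSketch {
    /** Struct-like result: where the code point starts, what it is, and its value. */
    static final class Step {
        final int index;      // index of the first code unit in the text
        final int codePoint;  // decoded code point
        final int value;      // value returned by the lookup

        Step(int index, int codePoint, int value) {
            this.index = index;
            this.codePoint = codePoint;
            this.value = value;
        }
    }

    /** Visits each code point from index to the end of text, in order. */
    static List<Step> walk(CharSequence text, int index, IntUnaryOperator lookup) {
        List<Step> out = new ArrayList<>();
        while (index < text.length()) {
            int cp = Character.codePointAt(text, index); // joins surrogate pairs
            out.add(new Step(index, cp, lookup.applyAsInt(cp)));
            index += Character.charCount(cp);            // advance by 1 or 2 code units
        }
        return out;
    }
}
```

Starting the walk at a non-zero index simply skips the earlier part of the text, which mirrors the role of the index parameter described above.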
-