Class Normalizer2Impl
java.lang.Object
org.graalvm.shadowed.com.ibm.icu.impl.Normalizer2Impl
Low-level implementation of the Unicode Normalization Algorithm.
For the data structure and details see the documentation at the end of
C++ normalizer2impl.h and in the design doc at
https://unicode-org.github.io/icu/design/normalization/custom.html
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static final class
static final class Normalizer2Impl.ReorderingBuffer
    Writable buffer that takes care of canonical ordering.
static final class
-
Field Summary
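Fields
Modifier and Type    Field    Description
static final int COMP_1_LAST_TUPLE
static final int COMP_1_TRAIL_LIMIT
static final int COMP_1_TRAIL_MASK
static final int COMP_1_TRAIL_SHIFT
static final int COMP_1_TRIPLE
static final int COMP_2_TRAIL_MASK
static final int COMP_2_TRAIL_SHIFT
static final int DELTA_SHIFT
static final int DELTA_TCCC_0
static final int DELTA_TCCC_1
static final int DELTA_TCCC_GT_1
static final int DELTA_TCCC_MASK
static final int HAS_COMP_BOUNDARY_AFTER
static final int INERT
static final int IX_EXTRA_DATA_OFFSET
static final int IX_LIMIT_NO_NO
static final int IX_MIN_COMP_NO_MAYBE_CP
static final int IX_MIN_DECOMP_NO_CP
static final int IX_MIN_LCCC_CP
static final int IX_MIN_MAYBE_NO
    Two-way mappings; each starts with a character that combines backward.
static final int IX_MIN_MAYBE_NO_COMBINES_FWD
    Two-way mappings & compositions.
static final int IX_MIN_MAYBE_YES
static final int IX_MIN_NO_NO
    Mappings are comp-normalized.
static final int IX_MIN_NO_NO_COMP_BOUNDARY_BEFORE
    Mappings are not comp-normalized but have a comp boundary before.
static final int IX_MIN_NO_NO_COMP_NO_MAYBE_CC
    Mappings do not have a comp boundary before.
static final int IX_MIN_NO_NO_EMPTY
    Mappings to the empty string.
static final int IX_MIN_YES_NO
    Mappings & compositions in [minYesNo..minYesNoMappingsOnly[.
static final int IX_MIN_YES_NO_MAPPINGS_ONLY
    Mappings only in [minYesNoMappingsOnly..minNoNo[.
static final int IX_NORM_TRIE_OFFSET
static final int IX_RESERVED3_OFFSET
static final int IX_SMALL_FCD_OFFSET
static final int IX_TOTAL_SIZE
static final int JAMO_L
static final int JAMO_VT
static final int MAPPING_HAS_CCC_LCCC_WORD
static final int MAPPING_HAS_RAW_MAPPING
static final int MAPPING_LENGTH_MASK
static final int MAX_DELTA
static final int MIN_NORMAL_MAYBE_YES
static final int MIN_YES_YES_WITH_CC
static final int OFFSET_SHIFT
-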
Constructor Summary
Constructors
Normalizer2Impl()
-
Method Summary
Modifier and Type    Method    Description
void addCanonIterPropertyStarts
void addLcccChars(UnicodeSet set)
void addPropertyStarts
boolean compose(CharSequence s, int src, int limit, boolean onlyContiguous, boolean doCompose, Normalizer2Impl.ReorderingBuffer buffer)
void composeAndAppend(CharSequence s, boolean doCompose, boolean onlyContiguous, Normalizer2Impl.ReorderingBuffer buffer)
int composePair(int a, int b)
int composeQuickCheck(CharSequence s, int src, int limit, boolean onlyContiguous, boolean doSpan)
    Very similar to compose(): Make the same changes in both places if relevant.
void decompose(CharSequence s, int src, int limit, StringBuilder dest, int destLengthEstimate)
    Decomposes s[src, limit[ and writes the result to dest.
int decompose(CharSequence s, int src, int limit, Normalizer2Impl.ReorderingBuffer buffer)
decompose(CharSequence s, StringBuilder dest)
void decomposeAndAppend(CharSequence s, boolean doDecompose, Normalizer2Impl.ReorderingBuffer buffer)
ensureCanonIterData
    Builds the canonical-iterator data for this instance.
boolean getCanonStartSet(int c, UnicodeSet set)
    Returns true if there are characters whose decomposition starts with c.
int getCC(int norm16)
static int getCCFromNormalYesOrMaybe(int norm16)
static int getCCFromYesOrMaybeYes(int norm16)
int getCCFromYesOrMaybeYesCP(int c)
int getCompQuickCheck(int norm16)
getDecomposition(int c)
    Gets the decomposition for one code point.
int getFCD16(int c)
    Returns the FCD data for code point c.
int getFCD16FromNormData(int c)
    Gets the FCD value from the regular normalization data.
int getNorm16(int c)
getRawDecomposition(int c)
    Gets the raw decomposition for one code point.
int getRawNorm16(int c)
boolean hasCompBoundaryAfter(int c, boolean onlyContiguous)
boolean hasCompBoundaryBefore(int c)
boolean hasDecompBoundaryAfter(int c)
boolean hasDecompBoundaryBefore(int c)
boolean hasFCDBoundaryAfter(int c)
boolean hasFCDBoundaryBefore(int c)
boolean isAlgorithmicNoNo(int norm16)
boolean isCanonSegmentStarter(int c)
    Returns true if code point c starts a canonical-iterator string segment.
boolean isCompInert(int c, boolean onlyContiguous)
boolean isCompNo(int norm16)
boolean isDecompInert(int c)
boolean isDecompYes(int norm16)
boolean isFCDInert(int c)
load(ByteBuffer bytes)
int makeFCD(CharSequence s, int src, int limit, Normalizer2Impl.ReorderingBuffer buffer)
void makeFCDAndAppend(CharSequence s, boolean doMakeFCD, Normalizer2Impl.ReorderingBuffer buffer)
boolean norm16HasDecompBoundaryAfter(int norm16)
boolean norm16HasDecompBoundaryBefore(int norm16)
boolean singleLeadMightHaveNonZeroFCD16(int lead)
    Returns true if the single-or-lead code unit c might have non-zero FCD data.
-
Field Details
-
MIN_YES_YES_WITH_CC
public static final int MIN_YES_YES_WITH_CC
-
JAMO_VT
public static final int JAMO_VT
-
MIN_NORMAL_MAYBE_YES
public static final int MIN_NORMAL_MAYBE_YES
-
JAMO_L
public static final int JAMO_L
-
INERT
public static final int INERT
-
HAS_COMP_BOUNDARY_AFTER
public static final int HAS_COMP_BOUNDARY_AFTER
-
OFFSET_SHIFT
public static final int OFFSET_SHIFT
-
DELTA_TCCC_0
public static final int DELTA_TCCC_0
-
DELTA_TCCC_1
public static final int DELTA_TCCC_1
-
DELTA_TCCC_GT_1
public static final int DELTA_TCCC_GT_1
-
DELTA_TCCC_MASK
public static final int DELTA_TCCC_MASK
-
DELTA_SHIFT
public static final int DELTA_SHIFT
-
MAX_DELTA
public static final int MAX_DELTA
-
IX_NORM_TRIE_OFFSET
public static final int IX_NORM_TRIE_OFFSET
-
IX_EXTRA_DATA_OFFSET
public static final int IX_EXTRA_DATA_OFFSET
-
IX_SMALL_FCD_OFFSET
public static final int IX_SMALL_FCD_OFFSET
-
IX_RESERVED3_OFFSET
public static final int IX_RESERVED3_OFFSET
-
IX_TOTAL_SIZE
public static final int IX_TOTAL_SIZE
-
IX_MIN_DECOMP_NO_CP
public static final int IX_MIN_DECOMP_NO_CP
-
IX_MIN_COMP_NO_MAYBE_CP
public static final int IX_MIN_COMP_NO_MAYBE_CP
-
IX_MIN_YES_NO
public static final int IX_MIN_YES_NO
Mappings & compositions in [minYesNo..minYesNoMappingsOnly[.
-
IX_MIN_NO_NO
public static final int IX_MIN_NO_NO
Mappings are comp-normalized.
-
IX_LIMIT_NO_NO
public static final int IX_LIMIT_NO_NO
-
IX_MIN_MAYBE_YES
public static final int IX_MIN_MAYBE_YES
-
IX_MIN_YES_NO_MAPPINGS_ONLY
public static final int IX_MIN_YES_NO_MAPPINGS_ONLY
Mappings only in [minYesNoMappingsOnly..minNoNo[.
-
IX_MIN_NO_NO_COMP_BOUNDARY_BEFORE
public static final int IX_MIN_NO_NO_COMP_BOUNDARY_BEFORE
Mappings are not comp-normalized but have a comp boundary before.
-
IX_MIN_NO_NO_COMP_NO_MAYBE_CC
public static final int IX_MIN_NO_NO_COMP_NO_MAYBE_CC
Mappings do not have a comp boundary before.
-
IX_MIN_NO_NO_EMPTY
public static final int IX_MIN_NO_NO_EMPTY
Mappings to the empty string.
-
IX_MIN_LCCC_CP
public static final int IX_MIN_LCCC_CP
-
IX_MIN_MAYBE_NO
public static final int IX_MIN_MAYBE_NO
Two-way mappings; each starts with a character that combines backward.
-
IX_MIN_MAYBE_NO_COMBINES_FWD
public static final int IX_MIN_MAYBE_NO_COMBINES_FWD
Two-way mappings & compositions.
-
MAPPING_HAS_CCC_LCCC_WORD
public static final int MAPPING_HAS_CCC_LCCC_WORD
-
MAPPING_HAS_RAW_MAPPING
public static final int MAPPING_HAS_RAW_MAPPING
-
MAPPING_LENGTH_MASK
public static final int MAPPING_LENGTH_MASK
-
COMP_1_LAST_TUPLE
public static final int COMP_1_LAST_TUPLE
-
COMP_1_TRIPLE
public static final int COMP_1_TRIPLE
-
COMP_1_TRAIL_LIMIT
public static final int COMP_1_TRAIL_LIMIT
-
COMP_1_TRAIL_MASK
public static final int COMP_1_TRAIL_MASK
-
COMP_1_TRAIL_SHIFT
public static final int COMP_1_TRAIL_SHIFT
-
COMP_2_TRAIL_SHIFT
public static final int COMP_2_TRAIL_SHIFT
-
COMP_2_TRAIL_MASK
public static final int COMP_2_TRAIL_MASK
-
-
Constructor Details
-
Normalizer2Impl
public Normalizer2Impl()
-
-
Method Details
-
load
-
load
-
addLcccChars
-
addPropertyStarts
-
addCanonIterPropertyStarts
-
ensureCanonIterData
Builds the canonical-iterator data for this instance. This is required before either isCanonSegmentStarter(int) or getCanonStartSet(int, UnicodeSet) is called, or else they crash.
Returns:
this
-
getNorm16
public int getNorm16(int c)
-
getRawNorm16
public int getRawNorm16(int c)
-
getCompQuickCheck
public int getCompQuickCheck(int norm16)
-
isAlgorithmicNoNo
public boolean isAlgorithmicNoNo(int norm16)
-
isCompNo
public boolean isCompNo(int norm16)
-
isDecompYes
public boolean isDecompYes(int norm16)
-
getCC
public int getCC(int norm16)
-
getCCFromNormalYesOrMaybe
public static int getCCFromNormalYesOrMaybe(int norm16)
-
getCCFromYesOrMaybeYes
public static int getCCFromYesOrMaybeYes(int norm16)
-
getCCFromYesOrMaybeYesCP
public int getCCFromYesOrMaybeYesCP(int c)
-
getFCD16
public int getFCD16(int c)
Returns the FCD data for code point c.
Parameters:
c - A Unicode code point.
Returns:
The lccc(c) in bits 15..8 and tccc(c) in bits 7..0.
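The packed return value of getFCD16(int) can be split with a shift and a mask. The following is a minimal sketch of that bit layout only; the class Fcd16Fields and its methods are hypothetical helpers, not part of Normalizer2Impl, and the sample value is made up for illustration.

```java
/** Hypothetical helper (not part of Normalizer2Impl) that unpacks a 16-bit FCD value. */
final class Fcd16Fields {
    /** Extracts lccc from bits 15..8 of the packed value. */
    static int lccc(int fcd16) {
        return (fcd16 >> 8) & 0xFF;
    }

    /** Extracts tccc from bits 7..0 of the packed value. */
    static int tccc(int fcd16) {
        return fcd16 & 0xFF;
    }

    public static void main(String[] args) {
        int fcd16 = 0xE6DC; // illustrative value only, not read from real normalization data
        System.out.println("lccc=" + lccc(fcd16) + " tccc=" + tccc(fcd16));
    }
}
```

A value of 0 in both fields means that neither the lead nor the trail combining class is set.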
-
singleLeadMightHaveNonZeroFCD16
public boolean singleLeadMightHaveNonZeroFCD16(int lead)
Returns true if the single-or-lead code unit c might have non-zero FCD data.
-
getFCD16FromNormData
public int getFCD16FromNormData(int c)
Gets the FCD value from the regular normalization data.
-
getDecomposition
Gets the decomposition for one code point.
Parameters:
c - code point
Returns:
c's decomposition, if it has one; returns null if it does not have a decomposition
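The "decomposition or null" contract of getDecomposition(int) can be illustrated with the JDK alone. This is a sketch under stated assumptions: it uses java.text.Normalizer with form NFD, so it models canonical decomposition only, whereas a Normalizer2Impl instance answers according to whatever normalization data it was loaded with. The class DecompositionSketch is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical stand-in that mimics the "decomposition or null" contract using the JDK. */
final class DecompositionSketch {
    /** Returns the canonical (NFD) decomposition of c, or null if c does not decompose. */
    static String canonicalDecomposition(int c) {
        String original = new String(Character.toChars(c));
        String decomposed = Normalizer.normalize(original, Normalizer.Form.NFD);
        // Unchanged text means the code point has no decomposition.
        return decomposed.equals(original) ? null : decomposed;
    }

    public static void main(String[] args) {
        String d = canonicalDecomposition(0x00E1); // LATIN SMALL LETTER A WITH ACUTE
        d.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        System.out.println(canonicalDecomposition('a')); // no decomposition
    }
}
```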
-
getRawDecomposition
Gets the raw decomposition for one code point.
Parameters:
c - code point
Returns:
c's raw decomposition, if it has one; returns null if it does not have a decomposition
-
isCanonSegmentStarter
public boolean isCanonSegmentStarter(int c)
Returns true if code point c starts a canonical-iterator string segment. ensureCanonIterData() must have been called before this method, or else this method will crash.
Parameters:
c - A Unicode code point.
Returns:
true if c starts a canonical-iterator string segment.
-
getCanonStartSet
Returns true if there are characters whose decomposition starts with c. If so, then the set is cleared and then filled with those characters. ensureCanonIterData() must have been called before this method, or else this method will crash.
Parameters:
c - A Unicode code point.
set - A UnicodeSet to receive the characters whose decompositions start with c, if there are any.
Returns:
true if there are characters whose decomposition starts with c.
-
decompose
-
decompose
public void decompose(CharSequence s, int src, int limit, StringBuilder dest, int destLengthEstimate)
Decomposes s[src, limit[ and writes the result to dest. destLengthEstimate is the initial dest buffer capacity and can be -1. The note "limit can be NULL if src is NUL-terminated" applies to the C++ API; in Java, src and limit are int indexes into s.
-
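For the five-argument decompose(CharSequence, int, int, StringBuilder, int), the half-open range s[src, limit[ and the optional capacity hint can be sketched with the JDK as follows. This is not the ICU implementation: the class RangeDecomposeSketch is hypothetical, it uses java.text.Normalizer with NFD, and both clearing dest first and falling back to limit - src for a negative estimate are assumptions of this sketch.

```java
import java.text.Normalizer;

/** Hypothetical JDK-only sketch of decomposing a sub-range of a CharSequence into dest. */
final class RangeDecomposeSketch {
    static void decompose(CharSequence s, int src, int limit,
                          StringBuilder dest, int destLengthEstimate) {
        if (destLengthEstimate < 0) {
            destLengthEstimate = limit - src; // assumption: fall back to the input length
        }
        dest.setLength(0);                    // assumption: previous contents are discarded
        dest.ensureCapacity(destLengthEstimate);
        // subSequence takes the same half-open [src, limit) range as the documented method.
        dest.append(Normalizer.normalize(s.subSequence(src, limit), Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        StringBuilder dest = new StringBuilder();
        decompose("caf\u00E9!", 3, 4, dest, -1); // only the accented letter is in range
        System.out.println(dest.length());
    }
}
```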
decompose
-
decomposeAndAppend
public void decomposeAndAppend(CharSequence s, boolean doDecompose, Normalizer2Impl.ReorderingBuffer buffer)
-
compose
public boolean compose(CharSequence s, int src, int limit, boolean onlyContiguous, boolean doCompose, Normalizer2Impl.ReorderingBuffer buffer)
-
composeQuickCheck
public int composeQuickCheck(CharSequence s, int src, int limit, boolean onlyContiguous, boolean doSpan)
Very similar to compose(): Make the same changes in both places if relevant.
doSpan: spanQuickCheckYes (ignore bit 0 of the return value)
!doSpan: quickCheck
Returns:
bits 31..1: spanQuickCheckYes (==s.length() if "yes") and bit 0: set if "maybe"; otherwise, if the span length<s.length() then the quick check result is "no"
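The packed result of composeQuickCheck can be decoded as sketched below. The class QuickCheckResult, its Answer enum and its method names are hypothetical; only the bit layout (span length in bits 31..1, "maybe" flag in bit 0) comes from the description above, and length stands for s.length().

```java
/** Hypothetical decoder for the packed composeQuickCheck result described above. */
final class QuickCheckResult {
    enum Answer { YES, MAYBE, NO }

    /** Bits 31..1: length of the leading span that passed the quick check. */
    static int spanLength(int packed) {
        return packed >>> 1;
    }

    /** Bit 0 set means "maybe"; otherwise "yes" only if the span covers the whole input. */
    static Answer answer(int packed, int length) {
        if ((packed & 1) != 0) {
            return Answer.MAYBE;
        }
        return spanLength(packed) == length ? Answer.YES : Answer.NO;
    }

    public static void main(String[] args) {
        int packed = (3 << 1) | 1; // illustrative value: span of 3, "maybe" bit set
        System.out.println(spanLength(packed) + " " + answer(packed, 5));
    }
}
```

When doSpan is true, only spanLength is meaningful and bit 0 should be ignored, as the description states.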
-
composeAndAppend
public void composeAndAppend(CharSequence s, boolean doCompose, boolean onlyContiguous, Normalizer2Impl.ReorderingBuffer buffer)
-
makeFCD
-
makeFCDAndAppend
public void makeFCDAndAppend(CharSequence s, boolean doMakeFCD, Normalizer2Impl.ReorderingBuffer buffer)
-
hasDecompBoundaryBefore
public boolean hasDecompBoundaryBefore(int c)
-
norm16HasDecompBoundaryBefore
public boolean norm16HasDecompBoundaryBefore(int norm16)
-
hasDecompBoundaryAfter
public boolean hasDecompBoundaryAfter(int c)
-
norm16HasDecompBoundaryAfter
public boolean norm16HasDecompBoundaryAfter(int norm16)
-
isDecompInert
public boolean isDecompInert(int c)
-
hasCompBoundaryBefore
public boolean hasCompBoundaryBefore(int c)
-
hasCompBoundaryAfter
public boolean hasCompBoundaryAfter(int c, boolean onlyContiguous)
-
isCompInert
public boolean isCompInert(int c, boolean onlyContiguous)
-
hasFCDBoundaryBefore
public boolean hasFCDBoundaryBefore(int c)
-
hasFCDBoundaryAfter
public boolean hasFCDBoundaryAfter(int c)
-
isFCDInert
public boolean isFCDInert(int c)
-
composePair
public int composePair(int a, int b)
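composePair(int, int) carries no description here. As a rough illustration of what composing a pair of code points means, the sketch below approximates the idea with java.text.Normalizer and NFC. The class ComposePairSketch is hypothetical, returning -1 for "no composite" is this sketch's own convention, and the result may differ from Normalizer2Impl.composePair in edge cases because NFC normalizes the whole two-code-point string rather than looking up a single composition.

```java
import java.text.Normalizer;

/** Hypothetical JDK-only approximation of composing two code points into one. */
final class ComposePairSketch {
    /** Returns the single code point that a followed by b composes to under NFC, or -1. */
    static int composePair(int a, int b) {
        String pair = new StringBuilder().appendCodePoint(a).appendCodePoint(b).toString();
        String composed = Normalizer.normalize(pair, Normalizer.Form.NFC);
        if (composed.codePointCount(0, composed.length()) == 1) {
            return composed.codePointAt(0);
        }
        return -1; // sketch convention: the pair did not collapse into one code point
    }

    public static void main(String[] args) {
        System.out.printf("U+%04X%n", composePair('a', 0x0301));     // a + combining acute
        System.out.printf("U+%04X%n", composePair(0x1100, 0x1161)); // Hangul L + V
        System.out.println(composePair('a', 'b'));
    }
}
```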
-