Interface UProperty
Selection constants for Unicode properties.
These constants are used in functions like UCharacter.hasBinaryProperty(int) to select one of the Unicode properties.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
For details about the properties see http://www.unicode.org.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked "new" are unavailable or only partially available. Check UCharacter.getUnicodeVersion() to be sure.
Nested Class Summary
Nested Classes (Modifier and Type: static interface)
Selector constants for UCharacter.getPropertyName() and UCharacter.getPropertyValueName().
Field Summary
Fields (Modifier and Type: static final int)
AGE: String property Age.
ALPHABETIC: Binary property Alphabetic.
ASCII_HEX_DIGIT: Binary property ASCII_Hex_Digit (0-9 A-F a-f).
BIDI_CLASS: Enumerated property Bidi_Class.
BIDI_CONTROL: Binary property Bidi_Control.
BIDI_MIRRORED: Binary property Bidi_Mirrored.
BIDI_MIRRORING_GLYPH: String property Bidi_Mirroring_Glyph.
BINARY_LIMIT: One more than the last constant for binary Unicode properties.
BINARY_START: First constant for binary Unicode properties.
BLOCK: Enumerated property Block.
CANONICAL_COMBINING_CLASS: Enumerated property Canonical_Combining_Class.
CASE_FOLDING: String property Case_Folding.
CASE_SENSITIVE: Binary property Case_Sensitive.
DASH: Binary property Dash.
DECOMPOSITION_TYPE: Enumerated property Decomposition_Type.
DEFAULT_IGNORABLE_CODE_POINT: Binary property Default_Ignorable_Code_Point (new).
DEPRECATED: Binary property Deprecated (new).
DIACRITIC: Binary property Diacritic.
DOUBLE_LIMIT: One more than the last constant for double Unicode properties.
DOUBLE_START: First constant for double Unicode properties.
EAST_ASIAN_WIDTH: Enumerated property East_Asian_Width.
EXTENDER: Binary property Extender.
FULL_COMPOSITION_EXCLUSION: Binary property Full_Composition_Exclusion.
GENERAL_CATEGORY: Enumerated property General_Category.
GENERAL_CATEGORY_MASK: Bitmask property General_Category_Mask.
GRAPHEME_BASE: Binary property Grapheme_Base (new).
GRAPHEME_CLUSTER_BREAK: Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
GRAPHEME_EXTEND: Binary property Grapheme_Extend (new).
GRAPHEME_LINK: Binary property Grapheme_Link (new).
HANGUL_SYLLABLE_TYPE: Enumerated property Hangul_Syllable_Type (new in Unicode 4).
HEX_DIGIT: Binary property Hex_Digit.
HYPHEN: Binary property Hyphen.
ID_CONTINUE: Binary property ID_Continue.
ID_START: Binary property ID_Start.
IDEOGRAPHIC: Binary property Ideographic.
IDS_BINARY_OPERATOR: Binary property IDS_Binary_Operator (new).
IDS_TRINARY_OPERATOR: Binary property IDS_Trinary_Operator (new).
INT_LIMIT: One more than the last constant for enumerated/integer Unicode properties.
INT_START: First constant for enumerated/integer Unicode properties.
ISO_COMMENT: String property ISO_Comment.
JOIN_CONTROL: Binary property Join_Control.
JOINING_GROUP: Enumerated property Joining_Group.
JOINING_TYPE: Enumerated property Joining_Type.
LEAD_CANONICAL_COMBINING_CLASS: Enumerated property Lead_Canonical_Combining_Class.
LINE_BREAK: Enumerated property Line_Break.
LOGICAL_ORDER_EXCEPTION: Binary property Logical_Order_Exception (new).
LOWERCASE: Binary property Lowercase.
LOWERCASE_MAPPING: String property Lowercase_Mapping.
MASK_LIMIT: One more than the last constant for bit-mask Unicode properties.
MASK_START: First constant for bit-mask Unicode properties.
MATH: Binary property Math.
NAME: String property Name.
NFC_INERT: Binary property NFC_Inert.
NFC_QUICK_CHECK: Enumerated property NFC_Quick_Check.
NFD_INERT: Binary property NFD_Inert.
NFD_QUICK_CHECK: Enumerated property NFD_Quick_Check.
NFKC_INERT: Binary property NFKC_Inert.
NFKC_QUICK_CHECK: Enumerated property NFKC_Quick_Check.
NFKD_INERT: Binary property NFKD_Inert.
NFKD_QUICK_CHECK: Enumerated property NFKD_Quick_Check.
NONCHARACTER_CODE_POINT: Binary property Noncharacter_Code_Point.
NUMERIC_TYPE: Enumerated property Numeric_Type.
NUMERIC_VALUE: Double property Numeric_Value.
PATTERN_SYNTAX: Binary property Pattern_Syntax (new in Unicode 4.1).
PATTERN_WHITE_SPACE: Binary property Pattern_White_Space (new in Unicode 4.1).
POSIX_ALNUM: Binary property alnum (a C/POSIX character class).
POSIX_BLANK: Binary property blank (a C/POSIX character class).
POSIX_GRAPH: Binary property graph (a C/POSIX character class).
POSIX_PRINT: Binary property print (a C/POSIX character class).
POSIX_XDIGIT: Binary property xdigit (a C/POSIX character class).
QUOTATION_MARK: Binary property Quotation_Mark.
RADICAL: Binary property Radical (new).
S_TERM: Binary property STerm (new in Unicode 4.0.1).
SCRIPT: Enumerated property Script.
SEGMENT_STARTER: Binary property Segment_Starter.
SENTENCE_BREAK: Enumerated property Sentence_Break (new in Unicode 4.1).
SIMPLE_CASE_FOLDING: String property Simple_Case_Folding.
SIMPLE_LOWERCASE_MAPPING: String property Simple_Lowercase_Mapping.
SIMPLE_TITLECASE_MAPPING: String property Simple_Titlecase_Mapping.
SIMPLE_UPPERCASE_MAPPING: String property Simple_Uppercase_Mapping.
SOFT_DOTTED: Binary property Soft_Dotted (new).
STRING_LIMIT: One more than the last constant for string Unicode properties.
STRING_START: First constant for string Unicode properties.
TERMINAL_PUNCTUATION: Binary property Terminal_Punctuation.
TITLECASE_MAPPING: String property Titlecase_Mapping.
TRAIL_CANONICAL_COMBINING_CLASS: Enumerated property Trail_Canonical_Combining_Class.
UNICODE_1_NAME: String property Unicode_1_Name.
UNIFIED_IDEOGRAPH: Binary property Unified_Ideograph (new).
UPPERCASE: Binary property Uppercase.
UPPERCASE_MAPPING: String property Uppercase_Mapping.
VARIATION_SELECTOR: Binary property Variation_Selector (new in Unicode 4.0.1).
WHITE_SPACE: Binary property White_Space.
WORD_BREAK: Enumerated property Word_Break (new in Unicode 4.1).
XID_CONTINUE: Binary property XID_Continue.
XID_START: Binary property XID_Start.
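Every selector constant falls into one of the half-open ranges delimited by the *_START and *_LIMIT constants listed above. START is inclusive, and LIMIT is exclusive because each LIMIT is "one more than the last constant". The sketch below shows how a caller might classify a selector before deciding, for example, between hasBinaryProperty and getIntPropertyValue. The class `PropertyKinds` and every numeric value in it are hypothetical placeholders, not ICU4J's actual constants. Real code would compare against UProperty.BINARY_START, UProperty.BINARY_LIMIT, and so on.

```java
// Hypothetical sketch: all numeric values are placeholders, NOT ICU4J's
// UProperty constants. Real code would compare against UProperty.BINARY_START etc.
final class PropertyKinds {
    enum Kind { BINARY, INT, MASK, DOUBLE, STRING, UNKNOWN }

    // Each range is half-open: START is inclusive, LIMIT is exclusive.
    static final int BINARY_START = 0x0000, BINARY_LIMIT = 0x0040;
    static final int INT_START = 0x1000, INT_LIMIT = 0x1020;
    static final int MASK_START = 0x2000, MASK_LIMIT = 0x2001;
    static final int DOUBLE_START = 0x3000, DOUBLE_LIMIT = 0x3001;
    static final int STRING_START = 0x4000, STRING_LIMIT = 0x4010;

    private static boolean in(int p, int start, int limit) {
        return p >= start && p < limit;
    }

    // Classifies a property selector by the range it falls into.
    static Kind kindOf(int property) {
        if (in(property, BINARY_START, BINARY_LIMIT)) return Kind.BINARY;
        if (in(property, INT_START, INT_LIMIT)) return Kind.INT;
        if (in(property, MASK_START, MASK_LIMIT)) return Kind.MASK;
        if (in(property, DOUBLE_START, DOUBLE_LIMIT)) return Kind.DOUBLE;
        if (in(property, STRING_START, STRING_LIMIT)) return Kind.STRING;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        int[] samples = {BINARY_START, BINARY_LIMIT, INT_START, MASK_START, DOUBLE_START, STRING_START};
        for (int p : samples) {
            System.out.println(Integer.toHexString(p) + " -> " + kindOf(p));
        }
    }
}
```

Because each LIMIT is exclusive, a loop such as `for (int p = BINARY_START; p < BINARY_LIMIT; p++)` visits every binary selector exactly once.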
Field Details
ALPHABETIC
static final int ALPHABETIC
Binary property Alphabetic.
Same property as UCharacter.isUAlphabetic(); different from the property tested by UCharacter.isLetter().
Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic
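In ICU4J this property is queried with UCharacter.hasBinaryProperty(c, UProperty.ALPHABETIC). The sketch below illustrates the formula itself, using the JDK's java.lang.Character instead of ICU4J. This is an assumption: the JDK's Unicode data version may differ from ICU's.

The sketch evaluates only the General_Category part, Lu + Ll + Lt + Lm + Lo + Nl. The JDK does not expose Other_Alphabetic on its own. However, Character.isAlphabetic(int) (Java 7 and later) is documented as the same categories plus Other_Alphabetic, so it stands in for the full property.

```java
final class AlphabeticSketch {
    // General_Category part of Alphabetic: Lu + Ll + Lt + Lm + Lo + Nl.
    // Other_Alphabetic is deliberately left out (the JDK does not expose it separately).
    static boolean isAlphabeticByCategory(int cp) {
        switch (Character.getType(cp)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        int[] samples = {'A', 'z', '7', 0x216B /* ROMAN NUMERAL TWELVE, category Nl */};
        for (int cp : samples) {
            System.out.printf("U+%04X category part=%b full (JDK)=%b%n",
                    cp, isAlphabeticByCategory(cp), Character.isAlphabetic(cp));
        }
    }
}
```

Where the two columns disagree, the code point is an Other_Alphabetic character such as U+0345 COMBINING GREEK YPOGEGRAMMENI.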
BINARY_START
static final int BINARY_START
First constant for binary Unicode properties.
ASCII_HEX_DIGIT
static final int ASCII_HEX_DIGIT
Binary property ASCII_Hex_Digit (0-9 A-F a-f).
BIDI_CONTROL
static final int BIDI_CONTROL
Binary property Bidi_Control.
Format controls that have specific functions in the Bidi Algorithm.
BIDI_MIRRORED
static final int BIDI_MIRRORED
Binary property Bidi_Mirrored.
Characters that may change display in RTL text.
Property for UCharacter.isMirrored().
See the Bidi Algorithm, UTR #9.
DASH
static final int DASH
Binary property Dash.
Variations of dashes.
DEFAULT_IGNORABLE_CODE_POINT
static final int DEFAULT_IGNORABLE_CODE_POINT
Binary property Default_Ignorable_Code_Point (new).
Indicates that a code point is ignorable in most processing.
Code points (2060..206F, FFF0..FFFB, E0000..E0FFF) + Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)
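The derivation above can be evaluated directly. The sketch below uses java.lang.Character rather than ICU4J and implements the formula as stated, with two labeled simplifications:

- Other_Default_Ignorable_Code_Point is omitted, because the JDK does not expose it.
- White_Space is handled by excluding U+0009..U+000D and U+0085. These are the Cc code points in White_Space, and in current Unicode data no Cf or Cs code point is White_Space.

Later Unicode versions revised this derivation, so results may differ from a current UCD. The class name is hypothetical.

```java
final class DefaultIgnorableSketch {
    // The Cc code points that are White_Space: U+0009..U+000D and U+0085.
    private static boolean isControlWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D) || cp == 0x0085;
    }

    // Code points (2060..206F, FFF0..FFFB, E0000..E0FFF) + (Cf + Cc + Cs - White_Space).
    // Other_Default_Ignorable_Code_Point is NOT included in this sketch.
    static boolean isDefaultIgnorable(int cp) {
        if ((cp >= 0x2060 && cp <= 0x206F)
                || (cp >= 0xFFF0 && cp <= 0xFFFB)
                || (cp >= 0xE0000 && cp <= 0xE0FFF)) {
            return true;
        }
        int type = Character.getType(cp);
        boolean cfCcCs = type == Character.FORMAT
                || type == Character.CONTROL
                || type == Character.SURROGATE;
        return cfCcCs && !isControlWhiteSpace(cp);
    }
}
```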
DEPRECATED
static final int DEPRECATED
Binary property Deprecated (new).
The use of deprecated characters is strongly discouraged.
DIACRITIC
static final int DIACRITIC
Binary property Diacritic.
Characters that linguistically modify the meaning of another character to which they apply.
EXTENDER
static final int EXTENDER
Binary property Extender.
Characters that extend the value or shape of a preceding alphabetic character, e.g. length and iteration marks.
FULL_COMPOSITION_EXCLUSION
static final int FULL_COMPOSITION_EXCLUSION
Binary property Full_Composition_Exclusion.
CompositionExclusions.txt + Singleton Decompositions + Non-Starter Decompositions.
GRAPHEME_BASE
static final int GRAPHEME_BASE
Binary property Grapheme_Base (new).
For programmatic determination of grapheme cluster boundaries.
[0..10FFFF] - Cc - Cf - Cs - Co - Cn - Zl - Zp - Grapheme_Link - Grapheme_Extend - CGJ
GRAPHEME_EXTEND
static final int GRAPHEME_EXTEND
Binary property Grapheme_Extend (new).
For programmatic determination of grapheme cluster boundaries.
Me + Mn + Mc + Other_Grapheme_Extend - Grapheme_Link - CGJ
GRAPHEME_LINK
static final int GRAPHEME_LINK
Binary property Grapheme_Link (new).
For programmatic determination of grapheme cluster boundaries.
HEX_DIGIT
static final int HEX_DIGIT
Binary property Hex_Digit.
Characters commonly used for hexadecimal numbers.
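Hex_Digit is a superset of ASCII_Hex_Digit: it adds the fullwidth forms of the same digits and letters. The sketch below hard-codes the code point ranges for both properties as they are listed in the UCD file PropList.txt. Verify these ranges against the Unicode version your ICU build uses. The class and method names are hypothetical, not ICU4J API.

```java
final class HexDigitSketch {
    // ASCII_Hex_Digit: 0-9, A-F, a-f.
    static boolean isAsciiHexDigit(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'F') || (cp >= 'a' && cp <= 'f');
    }

    // Hex_Digit: ASCII_Hex_Digit plus the fullwidth forms
    // U+FF10..U+FF19 (digits), U+FF21..U+FF26 (A-F), U+FF41..U+FF46 (a-f).
    static boolean isHexDigit(int cp) {
        return isAsciiHexDigit(cp)
                || (cp >= 0xFF10 && cp <= 0xFF19)
                || (cp >= 0xFF21 && cp <= 0xFF26)
                || (cp >= 0xFF41 && cp <= 0xFF46);
    }

    // Numeric value of a Hex_Digit character, or -1 if it is not one.
    static int hexValue(int cp) {
        if (!isHexDigit(cp)) return -1;
        // Fullwidth forms U+FF01..U+FF5E correspond to ASCII U+0021..U+007E (offset 0xFEE0).
        int ascii = cp >= 0xFF01 ? cp - 0xFEE0 : cp;
        return Character.digit(ascii, 16);
    }
}
```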
HYPHEN
static final int HYPHEN
Binary property Hyphen.
Dashes used to mark connections between pieces of words, plus the Katakana middle dot.
ID_CONTINUE
static final int ID_CONTINUE
Binary property ID_Continue.
Characters that can continue an identifier.
ID_Start + Mn + Mc + Nd + Pc
ID_START
static final int ID_START
Binary property ID_Start.
Characters that can start an identifier.
Lu + Ll + Lt + Lm + Lo + Nl
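The two formulas above build on each other: ID_Continue is ID_Start plus marks, decimal digits, and connector punctuation. The sketch below evaluates them with java.lang.Character.getType, a JDK stand-in for ICU4J's UCharacter.getType, and uses them to test whether a string is an identifier. It follows the formulas exactly as written on this page. The full UCD definitions add a few further characters (Other_ID_Start, Other_ID_Continue) that are not covered here. The class and method names are hypothetical.

```java
final class IdentifierSketch {
    // ID_Start as given above: Lu + Ll + Lt + Lm + Lo + Nl.
    static boolean isIdStart(int cp) {
        switch (Character.getType(cp)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
                return true;
            default:
                return false;
        }
    }

    // ID_Continue as given above: ID_Start + Mn + Mc + Nd + Pc.
    static boolean isIdContinue(int cp) {
        if (isIdStart(cp)) return true;
        switch (Character.getType(cp)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.CONNECTOR_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // An identifier is one ID_Start code point followed by zero or more ID_Continue code points.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !isIdStart(s.codePointAt(0))) return false;
        return s.codePoints().skip(1).allMatch(IdentifierSketch::isIdContinue);
    }
}
```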
IDEOGRAPHIC
static final int IDEOGRAPHIC
Binary property Ideographic.
CJKV ideographs.
IDS_BINARY_OPERATOR
static final int IDS_BINARY_OPERATOR
Binary property IDS_Binary_Operator (new).
For programmatic determination of Ideographic Description Sequences.
IDS_TRINARY_OPERATOR
static final int IDS_TRINARY_OPERATOR
Binary property IDS_Trinary_Operator (new).
For programmatic determination of Ideographic Description Sequences.
JOIN_CONTROL
static final int JOIN_CONTROL
Binary property Join_Control.
Format controls for cursive joining and ligation.
LOGICAL_ORDER_EXCEPTION
static final int LOGICAL_ORDER_EXCEPTION
Binary property Logical_Order_Exception (new).
Characters that do not use logical order and require special handling in most processing.
LOWERCASE
static final int LOWERCASE
Binary property Lowercase.
Same as UCharacter.isULowercase(), different from UCharacter.isLowerCase().
Ll + Other_Lowercase
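Lowercase is wider than the Ll category alone because of Other_Lowercase. The sketch uses the JDK as a stand-in for ICU4J. For Java 7 and later, Character.isLowerCase(int) is documented as Ll plus Other_Lowercase, so it mirrors this property, while a getType check covers only Ll. The sketch counts code points that are Lowercase but not Ll. The exact count depends on the JDK's Unicode version, so it is printed rather than fixed. The same pattern applies to UPPERCASE, using Character.isUpperCase(int) and Lu. The class name is hypothetical.

```java
final class LowercaseSketch {
    // Category-only check: General_Category Ll.
    static boolean isLl(int cp) {
        return Character.getType(cp) == Character.LOWERCASE_LETTER;
    }

    // Counts code points that are Lowercase (Ll + Other_Lowercase, per the JDK)
    // but not Ll, i.e. the contribution of Other_Lowercase.
    static int countOtherLowercase() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isLowerCase(cp) && !isLl(cp)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Lowercase but not Ll: " + countOtherLowercase());
    }
}
```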
MATH
static final int MATH
Binary property Math.
Sm + Other_Math
NONCHARACTER_CODE_POINT
static final int NONCHARACTER_CODE_POINT
Binary property Noncharacter_Code_Point.
Code points that are explicitly defined as illegal for the encoding of characters.
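Noncharacter_Code_Point has a compact arithmetic definition in the Unicode Standard. It consists of the 32 code points U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+nFFFE and U+nFFFF), for 66 in total. The hypothetical helper below is not an ICU4J API. Instead of listing the planes, it tests the low 16 bits of the code point.

```java
final class NoncharacterSketch {
    static boolean isNoncharacter(int cp) {
        if (cp < 0 || cp > 0x10FFFF) return false;      // not a code point at all
        if (cp >= 0xFDD0 && cp <= 0xFDEF) return true;  // contiguous block in the BMP
        return (cp & 0xFFFE) == 0xFFFE;                 // U+nFFFE or U+nFFFF in any plane
    }

    // Counts noncharacters over the whole code space: 32 + 17 * 2 = 66.
    static int count() {
        int n = 0;
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            if (isNoncharacter(cp)) n++;
        }
        return n;
    }
}
```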
QUOTATION_MARK
static final int QUOTATION_MARK
Binary property Quotation_Mark.
RADICAL
static final int RADICAL
Binary property Radical (new).
For programmatic determination of Ideographic Description Sequences.
SOFT_DOTTED
static final int SOFT_DOTTED
Binary property Soft_Dotted (new).
Characters with a "soft dot", like i or j.
An accent placed on these characters causes the dot to disappear.
TERMINAL_PUNCTUATION
static final int TERMINAL_PUNCTUATION
Binary property Terminal_Punctuation.
Punctuation characters that generally mark the end of textual units.
UNIFIED_IDEOGRAPH
static final int UNIFIED_IDEOGRAPH
Binary property Unified_Ideograph (new).
For programmatic determination of Ideographic Description Sequences.
UPPERCASE
static final int UPPERCASE
Binary property Uppercase.
Same as UCharacter.isUUppercase(), different from UCharacter.isUpperCase().
Lu + Other_Uppercase
WHITE_SPACE
static final int WHITE_SPACE
Binary property White_Space.
Same as UCharacter.isUWhiteSpace(), different from UCharacter.isSpace() and UCharacter.isWhitespace().
Space characters + TAB + CR + LF - ZWSP - ZWNBSP
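White_Space differs from the JDK's Character.isWhitespace(int) in both directions:

- The JDK method excludes the no-break spaces U+00A0, U+2007 and U+202F, and also U+0085 NEXT LINE, all of which are White_Space.
- It includes the information separators U+001C..U+001F, which are not White_Space.

The sketch below hard-codes the White_Space code points as listed in PropList.txt for recent Unicode versions. Treat this list as an assumption to check against your data: for instance, U+180E MONGOLIAN VOWEL SEPARATOR was White_Space in the Unicode 4.x era but was removed in Unicode 6.3. The class name is hypothetical.

```java
final class WhiteSpaceSketch {
    // White_Space code points per PropList.txt in recent Unicode versions.
    static boolean isUWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)
                || cp == 0x0020 || cp == 0x0085 || cp == 0x00A0 || cp == 0x1680
                || (cp >= 0x2000 && cp <= 0x200A)
                || cp == 0x2028 || cp == 0x2029 || cp == 0x202F
                || cp == 0x205F || cp == 0x3000;
    }

    public static void main(String[] args) {
        // Print every code point on which White_Space and the JDK's isWhitespace disagree.
        for (int cp = 0; cp <= 0x3000; cp++) {
            if (isUWhiteSpace(cp) != Character.isWhitespace(cp)) {
                System.out.printf("U+%04X White_Space=%b isWhitespace=%b%n",
                        cp, isUWhiteSpace(cp), Character.isWhitespace(cp));
            }
        }
    }
}
```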
XID_CONTINUE
static final int XID_CONTINUE
Binary property XID_Continue.
ID_Continue modified to allow closure under normalization forms NFKC and NFKD.
XID_START
static final int XID_START
Binary property XID_Start.
ID_Start modified to allow closure under normalization forms NFKC and NFKD.
CASE_SENSITIVE
static final int CASE_SENSITIVE
Binary property Case_Sensitive.
Characters that are either the source of a case mapping or in the target of a case mapping. Not the same as the general category Cased_Letter.
S_TERM
static final int S_TERM
Binary property STerm (new in Unicode 4.0.1).
Sentence Terminal. Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/).
VARIATION_SELECTOR
static final int VARIATION_SELECTOR
Binary property Variation_Selector (new in Unicode 4.0.1).
Indicates all those characters that qualify as Variation Selectors. For details on the behavior of these characters, see StandardizedVariants.html and section 15.6, Variation Selectors, of the Unicode Standard.
NFD_INERT
static final int NFD_INERT
Binary property NFD_Inert.
ICU-specific property for characters that are inert under NFD, i.e., they do not interact with adjacent characters. Used, for example, in normalizing transforms in incremental mode to find the boundary of safely normalizable text despite possible text additions.
There is one such property per normalization form. These properties are computed as follows. An inert character is:
a) unassigned, or ALL of the following:
b) of combining class 0;
c) not decomposed by this normalization form;
AND, if NFC or NFKC,
d) can never compose with a previous character;
e) can never compose with a following character;
f) can never change if another character is added.
Example: a-breve might satisfy all but f, but if you add an ogonek it changes to a-ogonek + breve.
See also com.ibm.text.UCD.NFSkippable in the ICU4J repository, and icu/source/common/unormimp.h.
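The inertness rule above is a small boolean formula. The sketch below encodes it over a hypothetical `CharFacts` holder. Its fields stand for the per-character facts that a real implementation would look up in normalization data; neither the class nor its fields are ICU4J API. Conditions d through f are consulted only for the composing forms (NFC, NFKC), exactly as the rule states.

```java
final class InertSketch {
    // Hypothetical per-character facts; a real implementation would derive
    // them from normalization data. Not an ICU4J API.
    static final class CharFacts {
        final boolean assigned;
        final int ccc;                        // canonical combining class
        final boolean decomposedByForm;       // c) decomposed by this normalization form?
        final boolean composesWithPrevious;   // d)
        final boolean composesWithFollowing;  // e)
        final boolean changesWhenAppended;    // f)

        CharFacts(boolean assigned, int ccc, boolean decomposedByForm,
                  boolean composesWithPrevious, boolean composesWithFollowing,
                  boolean changesWhenAppended) {
            this.assigned = assigned;
            this.ccc = ccc;
            this.decomposedByForm = decomposedByForm;
            this.composesWithPrevious = composesWithPrevious;
            this.composesWithFollowing = composesWithFollowing;
            this.changesWhenAppended = changesWhenAppended;
        }
    }

    // composingForm is true for NFC/NFKC and false for NFD/NFKD.
    static boolean isInert(CharFacts f, boolean composingForm) {
        if (!f.assigned) return true;          // a) unassigned
        if (f.ccc != 0) return false;          // b) must have combining class 0
        if (f.decomposedByForm) return false;  // c) must not be decomposed
        if (!composingForm) return true;       // d)-f) apply only to NFC/NFKC
        return !f.composesWithPrevious         // d)
                && !f.composesWithFollowing    // e)
                && !f.changesWhenAppended;     // f)
    }
}
```

A character that meets every condition except f, as in the page's a-breve scenario, is therefore inert for a decomposing form but not for a composing one.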
NFKD_INERT
static final int NFKD_INERT
Binary property NFKD_Inert.
ICU-specific property for characters that are inert under NFKD, i.e., they do not interact with adjacent characters. Used, for example, in normalizing transforms in incremental mode to find the boundary of safely normalizable text despite possible text additions.
NFC_INERT
static final int NFC_INERT
Binary property NFC_Inert.
ICU-specific property for characters that are inert under NFC, i.e., they do not interact with adjacent characters. Used, for example, in normalizing transforms in incremental mode to find the boundary of safely normalizable text despite possible text additions.
NFKC_INERT
static final int NFKC_INERT
Binary property NFKC_Inert.
ICU-specific property for characters that are inert under NFKC, i.e., they do not interact with adjacent characters. Used, for example, in normalizing transforms in incremental mode to find the boundary of safely normalizable text despite possible text additions.
SEGMENT_STARTER
static final int SEGMENT_STARTER
Binary property Segment_Starter.
ICU-specific property for characters that are starters in terms of Unicode normalization and combining character sequences. They have ccc=0 and do not occur in non-initial position of the canonical decomposition of any character (as the combining diaeresis U+0308 does in NFD(a-umlaut), or a Jamo T does in NFD(Hangul LVT)). ICU uses this property for segmenting a string for generating a set of canonically equivalent strings, e.g. for canonical closure while processing collation tailoring rules.
PATTERN_SYNTAX
static final int PATTERN_SYNTAX
Binary property Pattern_Syntax (new in Unicode 4.1).
See UAX #31: Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/).
PATTERN_WHITE_SPACE
static final int PATTERN_WHITE_SPACE
Binary property Pattern_White_Space (new in Unicode 4.1).
See UAX #31: Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/).
POSIX_ALNUM
static final int POSIX_ALNUM
Binary property alnum (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the UCharacter class documentation.
POSIX_BLANK
static final int POSIX_BLANK
Binary property blank (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the UCharacter class documentation.
POSIX_GRAPH
static final int POSIX_GRAPH
Binary property graph (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the UCharacter class documentation.
POSIX_PRINT
static final int POSIX_PRINT
Binary property print (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the UCharacter class documentation.
POSIX_XDIGIT
static final int POSIX_XDIGIT
Binary property xdigit (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the UCharacter class documentation.
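UTS #18 Annex C defines alnum as the union of alpha (the Alphabetic property) and digit (General_Category Decimal_Number). The sketch below expresses that recommendation with JDK calls as a stand-in for ICU4J: Character.isAlphabetic for Alphabetic, and Character.getType for Nd. Unlike C's ASCII-only isalnum, it accepts, for example, Greek letters and Arabic-Indic digits. The class name is hypothetical.

```java
final class PosixAlnumSketch {
    // UTS #18 Annex C: alnum = \p{alpha} | \p{digit},
    // where alpha = Alphabetic and digit = General_Category Decimal_Number (Nd).
    static boolean isPosixAlnum(int cp) {
        return Character.isAlphabetic(cp)
                || Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER;
    }
}
```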
BINARY_LIMIT
static final int BINARY_LIMIT
One more than the last constant for binary Unicode properties.
BIDI_CLASS
static final int BIDI_CLASS
Enumerated property Bidi_Class.
Same as UCharacter.getDirection(int); returns UCharacterDirection values.
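In ICU4J, UCharacter.getDirection(int) returns these values. The JDK offers a comparable lookup, Character.getDirectionality(int), whose constants are named differently; for example, Bidi_Class AL is DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC. As a JDK-based stand-in, the sketch below uses it to test whether a string contains any strong right-to-left character (Bidi_Class R or AL). This is a common cheap check before running the full Bidi Algorithm. The class name is hypothetical.

```java
final class BidiClassSketch {
    // True if any code point has Bidi_Class R or AL (strong right-to-left).
    static boolean containsStrongRtl(String s) {
        return s.codePoints().anyMatch(cp -> {
            byte d = Character.getDirectionality(cp);
            return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
        });
    }
}
```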
INT_START
static final int INT_START
First constant for enumerated/integer Unicode properties.
BLOCK
static final int BLOCK
Enumerated property Block.
Same as UCharacter.UnicodeBlock.of(int); returns UCharacter.UnicodeBlock values.
CANONICAL_COMBINING_CLASS
static final int CANONICAL_COMBINING_CLASS
Enumerated property Canonical_Combining_Class.
Same as UCharacter.getCombiningClass(int); returns 8-bit numeric values.
DECOMPOSITION_TYPE
static final int DECOMPOSITION_TYPE
Enumerated property Decomposition_Type.
Returns UCharacter.DecompositionType values.
EAST_ASIAN_WIDTH
static final int EAST_ASIAN_WIDTH
Enumerated property East_Asian_Width.
See UAX #11 (http://www.unicode.org/reports/tr11/). Returns UCharacter.EastAsianWidth values.
GENERAL_CATEGORY
static final int GENERAL_CATEGORY
Enumerated property General_Category.
Same as UCharacter.getType(int); returns UCharacterCategory values.
JOINING_GROUP
static final int JOINING_GROUP
Enumerated property Joining_Group.
Returns UCharacter.JoiningGroup values.
JOINING_TYPE
static final int JOINING_TYPE
Enumerated property Joining_Type.
Returns UCharacter.JoiningType values.
LINE_BREAK
static final int LINE_BREAK
Enumerated property Line_Break.
Returns UCharacter.LineBreak values.
NUMERIC_TYPE
static final int NUMERIC_TYPE
Enumerated property Numeric_Type.
Returns UCharacter.NumericType values.
SCRIPT
static final int SCRIPT
Enumerated property Script.
Same as UScript.getScript(int); returns UScript values.
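Block and Script are easy to confuse. Block is a fixed, contiguous code point range, while Script is assigned to each character individually. The JDK exposes both, through Character.UnicodeBlock.of and (since Java 7) Character.UnicodeScript.of. The sketch below uses them in place of ICU4J's UCharacter.UnicodeBlock.of and UScript.getScript. For example, ASCII digits and Latin letters share the Basic Latin block but have different scripts. The class name is hypothetical.

```java
final class BlockVsScriptSketch {
    // Formats the Block and Script of one code point, using JDK lookups.
    static String describe(int cp) {
        return String.format("U+%04X block=%s script=%s", cp,
                Character.UnicodeBlock.of(cp), Character.UnicodeScript.of(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe('A'));
        System.out.println(describe('7'));
        System.out.println(describe(0x03B1)); // GREEK SMALL LETTER ALPHA
    }
}
```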
HANGUL_SYLLABLE_TYPE
static final int HANGUL_SYLLABLE_TYPE
Enumerated property Hangul_Syllable_Type (new in Unicode 4).
Returns HangulSyllableType values.
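For the precomposed syllables U+AC00..U+D7A3, the LV/LVT distinction follows arithmetically from the Unicode Standard's Hangul composition algorithm. There are 19 leading consonants, 21 vowels, and 28 trailing options (27 consonants or none). Each LV syllable is therefore followed by 27 LVT syllables, so a syllable is LV exactly when its offset from U+AC00 is a multiple of 28. The sketch below covers only precomposed syllables and leaves the conjoining jamo (L, V, T) out. Its names are hypothetical, not ICU4J's HangulSyllableType constants.

```java
final class HangulSyllableSketch {
    enum Type { NOT_APPLICABLE, LV_SYLLABLE, LVT_SYLLABLE }

    static final int S_BASE = 0xAC00;
    static final int T_COUNT = 28;            // 27 trailing consonants + "no trailing consonant"
    static final int S_COUNT = 19 * 21 * 28;  // 11172 precomposed syllables

    static Type syllableType(int cp) {
        int sIndex = cp - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return Type.NOT_APPLICABLE;       // conjoining jamo are not handled in this sketch
        }
        return sIndex % T_COUNT == 0 ? Type.LV_SYLLABLE : Type.LVT_SYLLABLE;
    }
}
```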
NFD_QUICK_CHECK
static final int NFD_QUICK_CHECK
Enumerated property NFD_Quick_Check.
Returns numeric values compatible with Normalizer.QuickCheckResult.
NFKD_QUICK_CHECK
static final int NFKD_QUICK_CHECK
Enumerated property NFKD_Quick_Check.
Returns numeric values compatible with Normalizer.QuickCheckResult.
NFC_QUICK_CHECK
static final int NFC_QUICK_CHECK
Enumerated property NFC_Quick_Check.
Returns numeric values compatible with Normalizer.QuickCheckResult.
NFKC_QUICK_CHECK
static final int NFKC_QUICK_CHECK
Enumerated property NFKC_Quick_Check.
Returns numeric values compatible with Normalizer.QuickCheckResult.
LEAD_CANONICAL_COMBINING_CLASS
static final int LEAD_CANONICAL_COMBINING_CLASS
Enumerated property Lead_Canonical_Combining_Class.
ICU-specific property for the ccc of the first code point of the decomposition, or lccc(c) = ccc(NFD(c)[0]). Useful for checking for canonically ordered text; see Normalizer.FCD and http://www.unicode.org/notes/tn5/#FCD. Returns 8-bit numeric values like CANONICAL_COMBINING_CLASS.
TRAIL_CANONICAL_COMBINING_CLASS
static final int TRAIL_CANONICAL_COMBINING_CLASS
Enumerated property Trail_Canonical_Combining_Class.
ICU-specific property for the ccc of the last code point of the decomposition, or tccc(c) = ccc(NFD(c)[last]). Useful for checking for canonically ordered text; see Normalizer.FCD and http://www.unicode.org/notes/tn5/#FCD. Returns 8-bit numeric values like CANONICAL_COMBINING_CLASS.
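The two formulas above combine into the FCD test described in UTN #5. Text is FCD if, for every pair of adjacent code points a and b, lccc(b) is 0 or at least tccc(a).

The JDK has no public API for canonical combining classes. The sketch therefore hard-codes ccc values for just three combining marks (U+0301 and U+0308 have ccc 230, U+0323 has ccc 220) and treats every other code point as ccc 0. This table is an illustrative assumption, not a real ccc lookup. NFD is computed with java.text.Normalizer, standing in for ICU4J's Normalizer. The class name is hypothetical.

```java
final class FcdSketch {
    // Illustrative ccc table covering only three marks; everything else is treated as 0.
    static int ccc(int cp) {
        switch (cp) {
            case 0x0301: // COMBINING ACUTE ACCENT
            case 0x0308: // COMBINING DIAERESIS
                return 230;
            case 0x0323: // COMBINING DOT BELOW
                return 220;
            default:
                return 0;
        }
    }

    private static String nfd(int cp) {
        return java.text.Normalizer.normalize(
                new String(Character.toChars(cp)), java.text.Normalizer.Form.NFD);
    }

    // lccc(c) = ccc(NFD(c)[0])
    static int lccc(int cp) {
        return ccc(nfd(cp).codePointAt(0));
    }

    // tccc(c) = ccc(NFD(c)[last])
    static int tccc(int cp) {
        String d = nfd(cp);
        return ccc(d.codePointBefore(d.length()));
    }

    // FCD: for each adjacent pair (a, b), lccc(b) == 0 or lccc(b) >= tccc(a).
    static boolean isFcd(String s) {
        int prevTccc = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int lead = lccc(cp);
            if (lead != 0 && lead < prevTccc) return false;
            prevTccc = tccc(cp);
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

For example, "\u00C1\u0323" (precomposed A-acute followed by a dot below) is not FCD: tccc(U+00C1) is 230, which exceeds lccc(U+0323) = 220.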
GRAPHEME_CLUSTER_BREAK
static final int GRAPHEME_CLUSTER_BREAK
Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns UGraphemeClusterBreak values.
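Grapheme_Cluster_Break values drive the boundary rules of UAX #29. As a JDK-only stand-in for ICU's own break iteration, java.text.BreakIterator.getCharacterInstance also finds user-perceived character boundaries. However, its rules may not match a given version of UAX #29 exactly. The sketch counts clusters; a base letter followed by a combining mark counts as one. The class name is hypothetical.

```java
final class GraphemeCountSketch {
    static int countGraphemeClusters(String text) {
        java.text.BreakIterator it =
                java.text.BreakIterator.getCharacterInstance(java.util.Locale.ROOT);
        it.setText(text);
        int count = 0;
        // The iterator starts at the first boundary; each later boundary closes one cluster.
        for (int end = it.next(); end != java.text.BreakIterator.DONE; end = it.next()) {
            count++;
        }
        return count;
    }
}
```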
SENTENCE_BREAK
static final int SENTENCE_BREAK
Enumerated property Sentence_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns USentenceBreak values.
WORD_BREAK
static final int WORD_BREAK
Enumerated property Word_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns UWordBreakValues values.
INT_LIMIT
static final int INT_LIMIT
One more than the last constant for enumerated/integer Unicode properties.
GENERAL_CATEGORY_MASK
static final int GENERAL_CATEGORY_MASK
Bitmask property General_Category_Mask.
This is the General_Category property returned as a bit mask. When used with UCharacter.getIntPropertyValue(int, int), it returns bit masks for UCharacterCategory values where exactly one bit is set. When used with UCharacter.getPropertyValueName() and UCharacter.getPropertyValueEnum(), a multi-bit mask is used for sets of categories like "Letters".
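The mask form turns membership in a set of categories into a single AND. ICU4J builds its masks from UCharacterCategory values. The sketch below instead builds masks as 1 << category from java.lang.Character.getType values, a JDK stand-in. It also defines a hypothetical multi-bit LETTER_MASK covering Lu, Ll, Lt, Lm and Lo, which plays the role of the "Letters" mask described above.

```java
final class CategoryMaskSketch {
    // Single-bit mask for one code point's General_Category (JDK category numbers).
    static int categoryMask(int cp) {
        return 1 << Character.getType(cp);
    }

    // Multi-bit mask for the set "Letters": Lu | Ll | Lt | Lm | Lo.
    static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER) | (1 << Character.LOWERCASE_LETTER)
            | (1 << Character.TITLECASE_LETTER) | (1 << Character.MODIFIER_LETTER)
            | (1 << Character.OTHER_LETTER);

    // True if the code point's category is one of the categories in setMask.
    static boolean isInCategorySet(int cp, int setMask) {
        return (categoryMask(cp) & setMask) != 0;
    }
}
```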
MASK_START
static final int MASK_START
First constant for bit-mask Unicode properties.
MASK_LIMIT
static final int MASK_LIMIT
One more than the last constant for bit-mask Unicode properties.
NUMERIC_VALUE
static final int NUMERIC_VALUE
Double property Numeric_Value.
Corresponds to UCharacter.getUnicodeNumericValue(int).
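Numeric_Value needs a floating-point type because some characters have fractional values. The sketch uses the JDK's Character.getNumericValue(int) as a stand-in, which has two limitations:

- It returns an int, and signals values that are not non-negative integers with -2.
- It gives the Latin letters radix values 10 through 35, which are not Unicode Numeric_Value.

The hypothetical helper below maps both cases to NaN. It is therefore only an approximation of UCharacter.getUnicodeNumericValue(int), and its NaN convention is its own, not ICU4J's.

```java
final class NumericValueSketch {
    // Approximate Numeric_Value as a double; NaN means "no value known here".
    static double approxNumericValue(int cp) {
        // The JDK gives Latin letters radix values 10..35, which are not
        // Unicode numeric values, so letters are excluded outright.
        if (Character.isLetter(cp)) return Double.NaN;
        int v = Character.getNumericValue(cp);
        // -1: no numeric value; -2: not a non-negative integer (e.g. U+00BD VULGAR FRACTION ONE HALF).
        return v < 0 ? Double.NaN : v;
    }
}
```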
DOUBLE_START
static final int DOUBLE_START
First constant for double Unicode properties.
DOUBLE_LIMIT
static final int DOUBLE_LIMIT
One more than the last constant for double Unicode properties.
AGE
static final int AGE
String property Age.
Corresponds to UCharacter.getAge(int).
STRING_START
static final int STRING_START
First constant for string Unicode properties.
BIDI_MIRRORING_GLYPH
static final int BIDI_MIRRORING_GLYPH
String property Bidi_Mirroring_Glyph.
Corresponds to UCharacter.getMirror(int).
CASE_FOLDING
static final int CASE_FOLDING
String property Case_Folding.
Corresponds to UCharacter.foldCase(String, boolean).
ISO_COMMENT
static final int ISO_COMMENT
String property ISO_Comment.
Corresponds to UCharacter.getISOComment(int).
LOWERCASE_MAPPING
static final int LOWERCASE_MAPPING
String property Lowercase_Mapping.
Corresponds to UCharacter.toLowerCase(String).
NAME
static final int NAME
String property Name.
Corresponds to UCharacter.getName(int).
SIMPLE_CASE_FOLDING
static final int SIMPLE_CASE_FOLDING
String property Simple_Case_Folding.
Corresponds to UCharacter.foldCase(int, boolean).
SIMPLE_LOWERCASE_MAPPING
static final int SIMPLE_LOWERCASE_MAPPING
String property Simple_Lowercase_Mapping.
Corresponds to UCharacter.toLowerCase(int).
SIMPLE_TITLECASE_MAPPING
static final int SIMPLE_TITLECASE_MAPPING
String property Simple_Titlecase_Mapping.
Corresponds to UCharacter.toTitleCase(int).
SIMPLE_UPPERCASE_MAPPING
static final int SIMPLE_UPPERCASE_MAPPING
String property Simple_Uppercase_Mapping.
Corresponds to UCharacter.toUpperCase(int).
TITLECASE_MAPPING
static final int TITLECASE_MAPPING
String property Titlecase_Mapping.
Corresponds to UCharacter.toTitleCase(String).
UNICODE_1_NAME
static final int UNICODE_1_NAME
String property Unicode_1_Name.
Corresponds to UCharacter.getName1_0(int).
UPPERCASE_MAPPING
static final int UPPERCASE_MAPPING
String property Uppercase_Mapping.
Corresponds to UCharacter.toUpperCase(String).
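The full mappings (Uppercase_Mapping, Titlecase_Mapping, Lowercase_Mapping) can map one code point to several, while the Simple_* variants always map one code point to exactly one. The JDK draws the same line, and the sketch below uses it as a stand-in for the ICU4J methods named above. String.toUpperCase applies full mappings, while Character.toUpperCase(int) applies simple ones. Locale.ROOT is used so that no language-specific rules (such as the Turkish dotted i) apply. The class name is hypothetical.

```java
final class CaseMappingSketch {
    // Full Uppercase_Mapping of one code point (the result may be longer).
    static String fullUpper(int cp) {
        return new String(Character.toChars(cp)).toUpperCase(java.util.Locale.ROOT);
    }

    // Simple_Uppercase_Mapping: always exactly one code point.
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    public static void main(String[] args) {
        int sharpS = 0x00DF; // LATIN SMALL LETTER SHARP S
        System.out.println(fullUpper(sharpS));                                  // SS
        System.out.println(Integer.toHexString(simpleUpper(sharpS)));           // df (unchanged)
        // Titlecase differs from uppercase for digraphs such as U+01C6 (dz with caron).
        System.out.println(Integer.toHexString(Character.toTitleCase(0x01C6))); // 1c5
        System.out.println(Integer.toHexString(Character.toUpperCase(0x01C6))); // 1c4
    }
}
```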
STRING_LIMIT
static final int STRING_LIMIT
One more than the last constant for string Unicode properties.