Package java.text

Class Collator

java.lang.Object
java.text.Collator
All Implemented Interfaces:
Cloneable, Comparator<Object>
Direct Known Subclasses:
RuleBasedCollator

public abstract class Collator
extends Object
implements Comparator<Object>, Cloneable
Performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 4 different levels of strength used in comparisons:

  • PRIMARY strength: Typically, this is used to denote differences between base characters (for example, "a" < "b"). It is the strongest difference. For example, dictionaries are divided into different sections by base character.
  • SECONDARY strength: Accents in the characters are considered secondary differences (for example, "as" < "às" < "at"). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings.
  • TERTIARY strength: Upper and lower case differences in characters are distinguished at tertiary strength (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs from the base form at the tertiary strength (such as "A" and "Ⓐ"). Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
  • IDENTICAL strength: When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. If no difference is found at the other levels, the Unicode code point values of the NFD form of each string are compared. For example, Hebrew cantillation marks are only distinguished at this strength. This strength should be used sparingly, because it is extremely rare for two strings to differ only in their code point values. Using this strength substantially decreases the performance of both comparison and collation key generation APIs. This strength also increases the size of the collation key.
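The strength levels listed above can be illustrated with a minimal sketch. The class name StrengthLevelsDemo and the helper equalAt are hypothetical names introduced only for this illustration, and the results assume the US English rules of a typical Java runtime.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthLevelsDemo {
    /** Hypothetical helper: true if a and b compare as equal at the given strength. */
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Different base letters: a primary difference, visible at every strength.
        System.out.println(equalAt(Collator.PRIMARY, "a", "b"));
        // Accent only: ignored at PRIMARY, distinguished from SECONDARY upward.
        System.out.println(equalAt(Collator.PRIMARY, "as", "\u00e0s"));
        System.out.println(equalAt(Collator.SECONDARY, "as", "\u00e0s"));
        // Case only: ignored at SECONDARY, distinguished at TERTIARY.
        System.out.println(equalAt(Collator.SECONDARY, "ao", "Ao"));
        System.out.println(equalAt(Collator.TERTIARY, "ao", "Ao"));
    }
}
```

Each call creates a fresh collator, so changing the strength in one comparison cannot leak into another.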

This Collator supports only two decomposition modes: canonical decomposition and no decomposition. The compatibility decomposition mode java.text.Collator.FULL_DECOMPOSITION is not supported here. If the canonical decomposition mode is set, Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.
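When decomposition is turned off, one way to meet the responsibility described above is to normalize text to NFD before handing it to the collator. The following is a minimal sketch that uses java.text.Normalizer; the class name PreNormalizeDemo and the helper toNfd are hypothetical names chosen for this illustration.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

class PreNormalizeDemo {
    /** Hypothetical helper: converts text to NFD so it can be compared without decomposition. */
    static String toNfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.NO_DECOMPOSITION);

        String precomposed = "\u00e0\u0325"; // a-grave followed by combining ring below
        String decomposed = "a\u0325\u0300"; // a, combining ring below, combining grave accent

        // Both strings share one NFD form, so after normalization they are the
        // same code point sequence and the collator sees identical input.
        System.out.println(toNfd(precomposed).equals(toNfd(decomposed)));
        System.out.println(collator.compare(toNfd(precomposed), toNfd(decomposed)) == 0);
    }
}
```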

Examples:

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }
 

The following example shows how to compare two strings using the collator for the default locale.

 // Compare two canonically equivalent strings in the default locale
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 String s1 = "\u00e0\u0325"; // "à" followed by a combining ring below
 String s2 = "a\u0325\u0300"; // "a", combining ring below, combining grave accent
 if (myCollator.compare(s1, s2) != 0) {
     System.out.println(s1 + " is not equal to " + s2 + " without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare(s1, s2) != 0) {
         System.out.println("Error: " + s1 + " should be equal to " + s2 + " with decomposition");
     } else {
         System.out.println(s1 + " is equal to " + s2 + " with decomposition");
     }
 } else {
     System.out.println("Error: " + s1 + " should not be equal to " + s2 + " without decomposition");
 }
 
See Also:
RuleBasedCollator, CollationKey
  • Field Summary

    Fields
    Modifier and Type Field Description
    static int CANONICAL_DECOMPOSITION
    Constant used to specify the decomposition rule.
    static int FULL_DECOMPOSITION
    Constant used to specify the decomposition rule.
    static int IDENTICAL
    Constant used to specify the collation strength.
    static int NO_DECOMPOSITION
    Constant used to specify the decomposition rule.
    static int PRIMARY
    Constant used to specify the collation strength.
    static int SECONDARY
    Constant used to specify the collation strength.
    static int TERTIARY
    Constant used to specify the collation strength.
  • Constructor Summary

    Constructors
    Modifier Constructor Description
    protected Collator()
    Constructs a new Collator instance.
  • Method Summary

    Modifier and Type Method Description
    Object clone()
    Returns a new collator with the same decomposition mode and strength value as this collator.
    int compare(Object object1, Object object2)
    Compares two objects to determine their relative order.
    abstract int compare(String string1, String string2)
    Compares two strings to determine their relative order.
    boolean equals(Object object)
    Compares this collator with the specified object and indicates if they are equal.
    boolean equals(String string1, String string2)
    Compares two strings using the collation rules to determine if they are equal.
    static Locale[] getAvailableLocales()
    Returns an array of locales for which custom Collator instances are available.
    abstract CollationKey getCollationKey(String string)
    Returns a CollationKey for the specified string for this collator with the current decomposition rule and strength value.
    int getDecomposition()
    Returns the decomposition rule for this collator.
    static Collator getInstance()
    Returns a Collator instance which is appropriate for the user's default Locale.
    static Collator getInstance(Locale locale)
    Returns a Collator instance which is appropriate for locale.
    int getStrength()
    Returns the strength value for this collator.
    abstract int hashCode()
    Returns an integer hash code for this object.
    void setDecomposition(int value)
    Sets the decomposition rule for this collator.
    void setStrength(int value)
    Sets the strength value for this collator.

    Methods inherited from class java.lang.Object

    finalize, getClass, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • NO_DECOMPOSITION

      public static final int NO_DECOMPOSITION
      Constant used to specify the decomposition rule.
      See Also:
      Constant Field Values
    • CANONICAL_DECOMPOSITION

      public static final int CANONICAL_DECOMPOSITION
      Constant used to specify the decomposition rule.
      See Also:
      Constant Field Values
    • FULL_DECOMPOSITION

      public static final int FULL_DECOMPOSITION
      Constant used to specify the decomposition rule. This value for decomposition is not supported.
      See Also:
      Constant Field Values
    • PRIMARY

      public static final int PRIMARY
      Constant used to specify the collation strength.
      See Also:
      Constant Field Values
    • SECONDARY

      public static final int SECONDARY
      Constant used to specify the collation strength.
      See Also:
      Constant Field Values
    • TERTIARY

      public static final int TERTIARY
      Constant used to specify the collation strength.
      See Also:
      Constant Field Values
    • IDENTICAL

      public static final int IDENTICAL
      Constant used to specify the collation strength.
      See Also:
      Constant Field Values
  • Constructor Details

    • Collator

      protected Collator()
      Constructs a new Collator instance.
  • Method Details

    • clone

      public Object clone()
      Returns a new collator with the same decomposition mode and strength value as this collator.
      Overrides:
      clone in class Object
      Returns:
      a shallow copy of this collator.
      See Also:
      Cloneable
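Because clone() returns an independent copy, it can be used to derive a collator with different settings without touching the original. A minimal sketch follows; the class name CloneDemo and the helper copyWithStrength are hypothetical names used only for this illustration.

```java
import java.text.Collator;
import java.util.Locale;

class CloneDemo {
    /** Hypothetical helper: returns a copy of base with another strength; base is not modified. */
    static Collator copyWithStrength(Collator base, int strength) {
        Collator copy = (Collator) base.clone();
        copy.setStrength(strength);
        return copy;
    }

    public static void main(String[] args) {
        Collator base = Collator.getInstance(Locale.US);
        base.setStrength(Collator.TERTIARY);

        Collator loose = copyWithStrength(base, Collator.PRIMARY);

        // The copy carries the new strength; the original keeps its own.
        System.out.println(loose.getStrength() == Collator.PRIMARY);
        System.out.println(base.getStrength() == Collator.TERTIARY);
    }
}
```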
    • compare

      public int compare(Object object1, Object object2)
      Compares two objects to determine their relative order. The objects must be strings.
      Specified by:
      compare in interface Comparator<Object>
      Parameters:
      object1 - the first string to compare.
      object2 - the second string to compare.
      Returns:
      a negative value if object1 is less than object2, 0 if they are equal, and a positive value if object1 is greater than object2.
      Throws:
      ClassCastException - if object1 or object2 is not a String.
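Since Collator implements Comparator<Object>, an instance can be passed directly to sorting APIs. The sketch below sorts a list of strings and also shows the ClassCastException described above; the class name ComparatorSortDemo and the helper sorted are hypothetical names for this illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ComparatorSortDemo {
    /** Hypothetical helper: returns a new list sorted with the collator for the given locale. */
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale)); // Collator used as a Comparator
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        System.out.println(sorted(words, Locale.US));

        // Passing a non-String selects compare(Object, Object), which rejects it.
        Object notAString = Integer.valueOf(42);
        try {
            Collator.getInstance(Locale.US).compare("abc", notAString);
        } catch (ClassCastException e) {
            System.out.println("compare(Object, Object) rejected a non-String argument");
        }
    }
}
```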
    • compare

      public abstract int compare(String string1, String string2)
      Compares two strings to determine their relative order.
      Parameters:
      string1 - the first string to compare.
      string2 - the second string to compare.
      Returns:
      a negative value if string1 is less than string2, 0 if they are equal, and a positive value if string1 is greater than string2.
    • equals

      public boolean equals(Object object)
      Compares this collator with the specified object and indicates if they are equal.
      Specified by:
      equals in interface Comparator<Object>
      Overrides:
      equals in class Object
      Parameters:
      object - the object to compare with this object.
      Returns:
      true if object is a Collator object and it has the same strength and decomposition values as this collator; false otherwise.
      See Also:
      hashCode()
    • equals

      public boolean equals(String string1, String string2)
      Compares two strings using the collation rules to determine if they are equal.
      Parameters:
      string1 - the first string to compare.
      string2 - the second string to compare.
      Returns:
      true if string1 and string2 are equal using the collation rules, false otherwise.
    • getAvailableLocales

      public static Locale[] getAvailableLocales()
      Returns an array of locales for which custom Collator instances are available.

      Note that Android does not support user-supplied locale service providers.
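As a small illustration, the returned array can be searched to check whether a particular locale is listed. The class name AvailableLocalesDemo and the helper isListed are hypothetical, and the set of locales reported depends on the runtime.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class AvailableLocalesDemo {
    /** Hypothetical helper: true if a collator is listed for exactly this locale. */
    static boolean isListed(Locale locale) {
        return Arrays.asList(Collator.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale[] locales = Collator.getAvailableLocales();
        System.out.println("Locales with collators: " + locales.length);
        System.out.println("en-US listed: " + isListed(Locale.US));
    }
}
```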

    • getCollationKey

      public abstract CollationKey getCollationKey(String string)
      Returns a CollationKey for the specified string for this collator with the current decomposition rule and strength value.
      Parameters:
      string - the source string that is converted into a collation key.
      Returns:
      the collation key for string.
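Collation keys are commonly used when the same strings are compared many times, for example while sorting a long list, because each key is computed once and key comparisons are then cheap. The following sketch assumes that use case; the class name CollationKeySortDemo and the helper sortWithKeys are hypothetical names for this illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationKeySortDemo {
    /** Hypothetical helper: sorts words by precomputed collation keys. */
    static List<String> sortWithKeys(List<String> words, Collator collator) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // one key per string
        }
        Collections.sort(keys); // CollationKey implements Comparable<CollationKey>

        List<String> result = new ArrayList<>();
        for (CollationKey key : keys) {
            result.add(key.getSourceString()); // recover the original string
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        System.out.println(sortWithKeys(words, Collator.getInstance(Locale.US)));
    }
}
```

Keys are only comparable with other keys produced by the same collator, so the sketch creates all of them from one instance.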
    • getDecomposition

      public int getDecomposition()
      Returns the decomposition rule for this collator.
      Returns:
      the decomposition rule, either NO_DECOMPOSITION or CANONICAL_DECOMPOSITION. FULL_DECOMPOSITION is not supported.
    • getInstance

      public static Collator getInstance()
      Returns a Collator instance which is appropriate for the user's default Locale. See "Be wary of the default locale".
    • getInstance

      public static Collator getInstance(Locale locale)
      Returns a Collator instance which is appropriate for locale.
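Passing the locale explicitly keeps the ordering independent of whatever default the device happens to have. The sketch below contrasts a locale-aware comparison for user-visible text with plain String.compareTo for locale-independent ordering. This split is an assumed design choice for the illustration, and ExplicitLocaleDemo, compareForDisplay, and compareForStorage are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

class ExplicitLocaleDemo {
    /** Hypothetical helper: orders text for display, using the given user locale. */
    static int compareForDisplay(String a, String b, Locale userLocale) {
        return Collator.getInstance(userLocale).compare(a, b);
    }

    /** Hypothetical helper: orders text by UTF-16 code unit, independent of any locale. */
    static int compareForStorage(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // The US English collator places "apple" before "Zebra".
        System.out.println(compareForDisplay("apple", "Zebra", Locale.US) < 0);
        // Code unit order places every uppercase ASCII letter before lowercase ones.
        System.out.println(compareForStorage("apple", "Zebra") < 0);
    }
}
```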
    • getStrength

      public int getStrength()
      Returns the strength value for this collator.
      Returns:
      the strength value, either PRIMARY, SECONDARY, TERTIARY or IDENTICAL.
    • hashCode

      public abstract int hashCode()
      Description copied from class: Object
      Returns an integer hash code for this object. By contract, any two objects for which Object.equals(java.lang.Object) returns true must return the same hash code value. This means that subclasses of Object usually override both methods or neither method.

      Note that hash values must not change over time unless information used in equals comparisons also changes.

      See "Writing a correct hashCode method" if you intend to implement your own hashCode method.

      Overrides:
      hashCode in class Object
      Returns:
      this object's hash code.
      See Also:
      Object.equals(java.lang.Object)
    • setDecomposition

      public void setDecomposition(int value)
      Sets the decomposition rule for this collator.
      Parameters:
      value - the decomposition rule, either NO_DECOMPOSITION or CANONICAL_DECOMPOSITION. FULL_DECOMPOSITION is not supported.
      Throws:
      IllegalArgumentException - if the provided decomposition rule is not valid. This includes FULL_DECOMPOSITION.
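The following sketch probes which decomposition values a collator accepts by catching the documented IllegalArgumentException. The class name DecompositionModeDemo and the helper isAccepted are hypothetical, and 12345 is simply an arbitrary value that is not one of the defined constants.

```java
import java.text.Collator;
import java.util.Locale;

class DecompositionModeDemo {
    /** Hypothetical helper: tries to set the mode and reports whether it was accepted. */
    static boolean isAccepted(Collator collator, int mode) {
        try {
            collator.setDecomposition(mode);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the collator is left with its previous mode
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);

        // A supported mode is stored and can be read back.
        System.out.println(isAccepted(collator, Collator.CANONICAL_DECOMPOSITION));
        System.out.println(collator.getDecomposition() == Collator.CANONICAL_DECOMPOSITION);

        // An arbitrary value that is not a defined constant is rejected.
        System.out.println(isAccepted(collator, 12345));
    }
}
```

The same helper can be called with Collator.FULL_DECOMPOSITION to check at run time whether the current platform accepts it.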
    • setStrength

      public void setStrength(int value)
      Sets the strength value for this collator.
      Parameters:
      value - the strength value, either PRIMARY, SECONDARY, TERTIARY, or IDENTICAL.
      Throws:
      IllegalArgumentException - if the provided strength value is not valid.
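Because setStrength changes the collator in place, code that shares one instance may want to restore the previous value after a one-off comparison. A minimal sketch of that pattern follows; the class name StrengthScopeDemo and the helper equalsAtStrength are hypothetical, and the sketch does not make a shared collator safe for concurrent use.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthScopeDemo {
    /** Hypothetical helper: compares at a temporary strength, then restores the old one. */
    static boolean equalsAtStrength(Collator collator, int strength, String a, String b) {
        int previous = collator.getStrength();
        collator.setStrength(strength); // throws IllegalArgumentException if invalid
        try {
            return collator.equals(a, b);
        } finally {
            collator.setStrength(previous);
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.TERTIARY);

        System.out.println(equalsAtStrength(collator, Collator.PRIMARY, "abc", "ABC"));
        // The collator is back at its original strength afterwards.
        System.out.println(collator.getStrength() == Collator.TERTIARY);
    }
}
```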