Class UniCharIterator

java.lang.Object
com.adobe.xfa.ut.UniCharIterator

public class UniCharIterator extends Object
Allow iteration by Unicode characters over a Java UTF-16 encoded string.

A Java character is only a 16-bit quantity. Unicode characters can have values up to 0x10FFFF, which exceeds the space available in a Java character. When such Unicode characters appear in a Java string, they are encoded using the UTF-16 encoding and occupy two consecutive Java characters, known as a surrogate pair.
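The relationship between Unicode characters and Java characters can be illustrated with the standard JDK alone. The following is a minimal sketch, not part of this class; the name SurrogatePairDemo and its helper javaCharCount are hypothetical and exist only for illustration.

```java
// Hypothetical demo class; uses only java.lang APIs, not UniCharIterator.
class SurrogatePairDemo {
    // Number of Java characters (UTF-16 code units) needed for one Unicode character.
    static int javaCharCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int supplementary = 0x1F600; // a value above 0xFFFF
        String s = new String(Character.toChars(supplementary));

        // One Unicode character, but two Java characters in the string.
        System.out.println(s.length());
        System.out.println(s.codePointCount(0, s.length()));

        // The two Java characters form a high/low surrogate pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
        System.out.println(Character.isLowSurrogate(s.charAt(1)));
    }
}
```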

This class allows the caller to step through a Java string in true Unicode character amounts. It also provides some static methods to generate Java characters from Unicode characters.

An iterator instance is associated with an instance of the Java CharSequence interface. This interface is implemented by both the String and StringBuilder classes.

At any given time, one can think of the iterator as being positioned between characters in the associated character sequence. It can also be positioned before the first character and after the last. Operations move the iterator forward or backward in the underlying character sequence and return the Unicode character passed over.

The iterator carries an index number that can be useful for indexing into the character sequence independently of the iterator. Index values start at zero and count up to the number of Java characters in the sequence. Index zero is before the first character, index one is between the first and second characters, and so on.

It does not make sense for the iterator to be positioned between the two Java characters making up a surrogate pair. Subsequent operations could lead to assertion errors and unpredictable results.
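A caller that computes index values independently may want to confirm that an index does not fall inside a surrogate pair before handing it to the iterator. The following is a minimal sketch using only standard JDK calls; the class SurrogateBoundaryCheck and the method splitsSurrogatePair are hypothetical names and are not part of this class.

```java
// Hypothetical helper; not part of UniCharIterator.
class SurrogateBoundaryCheck {
    // Returns true if the given index lies between the high and low halves
    // of a surrogate pair in the sequence.
    static boolean splitsSurrogatePair(CharSequence seq, int index) {
        if (index <= 0 || index >= seq.length()) {
            return false; // the start and end positions can never split a pair
        }
        return Character.isHighSurrogate(seq.charAt(index - 1))
            && Character.isLowSurrogate(seq.charAt(index));
    }
}
```

For the sequence "a" followed by U+1F600 and "b", the only index that splits a pair is 2.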

Note: The iterator caches the length of the given character sequence. If the caller is using an iterator and modifies the sequence in such a way that its length changes, it must call an attach() overload to re-establish the length.

  • Constructor Summary

    Constructors
    Constructor
    Description
    UniCharIterator()
    Default constructor.
    UniCharIterator(CharSequence charSequence)
    Construct an iterator associated with a given character sequence.
    UniCharIterator(CharSequence charSequence, int index)
    Construct an iterator associated with a given sequence, and initially positioned at a specified index.
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    append(StringBuilder s, int c)
    Append a Unicode character to a Java StringBuilder.
    void
    attach(CharSequence charSequence)
    Attach the iterator to a given character sequence.
    void
    attach(CharSequence charSequence, int index)
    Attach the iterator to a given sequence, and position it initially at a specified index.
    int
    getIndex()
    Get the current Java character index number of the iterator.
    boolean
    isAtEnd()
    Query whether the iterator is at the end of the text.
    boolean
    isAtStart()
    Query whether the iterator is at the start of the text.
    int
    next()
    Advance the iterator by one Unicode character.
    int
    prev()
    Back up the iterator by one Unicode character.
    void
    setIndex(int index)
    Set the iterator's index.
    static String
    toString(int c)
    Return a Java string that represents the given Unicode character.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • UniCharIterator

      public UniCharIterator()
      Default constructor.

      The iterator is not associated with any character sequence, and is not particularly useful until the attach() method is called.

    • UniCharIterator

      public UniCharIterator(CharSequence charSequence)
      Construct an iterator associated with a given character sequence. The iterator is initially positioned before the first character in the sequence.
      Parameters:
      charSequence - Character sequence to associate the iterator with.
    • UniCharIterator

      public UniCharIterator(CharSequence charSequence, int index)
      Construct an iterator associated with a given sequence, and initially positioned at a specified index.
      Parameters:
      charSequence - Character sequence to associate the iterator with.
      index - Index number into the character sequence, with meaning as described above.
  • Method Details

    • append

      public static void append(StringBuilder s, int c)
      Append a Unicode character to a Java StringBuilder. This method determines whether the Unicode character can be represented as a single Java character or must be a surrogate pair. It then appends the appropriate Java character(s) to the given StringBuilder.
      Parameters:
      s - StringBuilder to append to.
      c - Unicode character to be added.
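The decision described for append() can be sketched with the standard UTF-16 arithmetic. This is an illustration of the encoding rule, not the actual implementation of this method; the class AppendSketch and the method appendUnicode are hypothetical names.

```java
// Hypothetical sketch of the single-character versus surrogate-pair decision.
class AppendSketch {
    static void appendUnicode(StringBuilder s, int c) {
        if (c < 0x10000) {
            // Fits in one 16-bit Java character.
            s.append((char) c);
        } else {
            // Standard UTF-16 encoding: split the 20-bit offset into two 10-bit halves.
            int offset = c - 0x10000;
            s.append((char) (0xD800 + (offset >> 10)));   // high surrogate
            s.append((char) (0xDC00 + (offset & 0x3FF))); // low surrogate
        }
    }
}
```

For valid Unicode values this produces the same Java characters as the JDK's StringBuilder.appendCodePoint(int).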
    • attach

      public void attach(CharSequence charSequence)
      Attach the iterator to a given character sequence. The iterator is initially positioned before the first character in the sequence.
      Parameters:
      charSequence - Character sequence to associate the iterator with.
    • attach

      public void attach(CharSequence charSequence, int index)
      Attach the iterator to a given sequence, and position it initially at a specified index.
      Parameters:
      charSequence - Character sequence to associate the iterator with.
      index - Index number into the character sequence, with meaning as described above.
    • getIndex

      public int getIndex()
      Get the current Java character index number of the iterator.
      Returns:
      Index number, as described above.
    • isAtEnd

      public boolean isAtEnd()
      Query whether the iterator is at the end of the text.
      Returns:
      True if the iterator is positioned after the last character in the underlying text; false if not.
    • isAtStart

      public boolean isAtStart()
      Query whether the iterator is at the start of the text.
      Returns:
      True if the iterator is positioned before the first character in the underlying text; false if not.
    • next

      public int next()
      Advance the iterator by one Unicode character. The iterator will not be advanced if it is already positioned after the last Java character in the sequence. The iterator's index will increase by one or two, depending on the makeup of the Unicode character it advances over.
      Returns:
      Unicode character advanced over.
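The forward step can be approximated with standard JDK calls. The following is a minimal stand-in, not this class's implementation; ForwardStepSketch is a hypothetical name. This documentation does not state what next() returns when the iterator is already at the end, so the -1 used here is an assumption.

```java
// Hypothetical stand-in that mimics the documented forward step.
class ForwardStepSketch {
    private final CharSequence seq;
    private int index; // Java character index, starting before the first character

    ForwardStepSketch(CharSequence seq) {
        this.seq = seq;
    }

    int getIndex() {
        return index;
    }

    // Steps over one Unicode character and returns it.
    // Returns -1 at the end of the sequence (an assumption, see the text above).
    int next() {
        if (index >= seq.length()) {
            return -1; // index is left unchanged
        }
        int c = Character.codePointAt(seq, index);
        index += Character.charCount(c); // advances by one or two
        return c;
    }
}
```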
    • prev

      public int prev()
      Back up the iterator by one Unicode character. The iterator will not be moved if it is already positioned before the first Java character in the sequence. The iterator's index will decrease by one or two, depending on the makeup of the Unicode character it moves over.
      Returns:
      Unicode character passed over.
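The backward step mirrors the forward one and can be approximated with Character.codePointBefore from the JDK. This is a minimal stand-in, not this class's implementation; BackwardStepSketch is a hypothetical name, and returning -1 when no movement is possible is an assumption that this documentation does not confirm.

```java
// Hypothetical stand-in that mimics the documented backward step.
class BackwardStepSketch {
    private final CharSequence seq;
    private int index; // Java character index within seq

    BackwardStepSketch(CharSequence seq, int index) {
        this.seq = seq;
        this.index = index;
    }

    int getIndex() {
        return index;
    }

    // Steps back over one Unicode character and returns it.
    // Returns -1 when already before the first character (an assumption).
    int prev() {
        if (index <= 0) {
            return -1; // index is left unchanged
        }
        int c = Character.codePointBefore(seq, index);
        index -= Character.charCount(c); // backs up by one or two
        return c;
    }
}
```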
    • setIndex

      public void setIndex(int index)
      Set the iterator's index. This method changes the index, but keeps the iterator associated with the same character sequence.
      Parameters:
      index - New index to set for this iterator.
    • toString

      public static String toString(int c)
      Return a Java string that represents the given Unicode character.
      Parameters:
      c - Unicode character to convert to a Java string.
      Returns:
      Resulting String. If the character is less than 0x10000, the result will simply contain the single character passed in. Otherwise it will contain the two characters making up the surrogate pair.
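A comparable result can be obtained from the standard JDK, which may help when checking expectations about the length of the returned string. The following is a minimal sketch and not this method's implementation; ToStringSketch and unicodeToString are hypothetical names.

```java
// Hypothetical sketch built on Character.toChars from java.lang.
class ToStringSketch {
    // Returns a string of one Java character for values below 0x10000,
    // or of two Java characters (a surrogate pair) otherwise.
    static String unicodeToString(int c) {
        return new String(Character.toChars(c));
    }
}
```

Note that Character.toChars throws IllegalArgumentException for values that are not valid Unicode code points; how this class treats such values is not stated in this documentation.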