public abstract class UnicodeString extends java.lang.Object implements AtomicMatchKey, java.lang.Comparable<UnicodeString>
The interface is future-proofed to support code points in the range 0 to 2^31, and string lengths of up to 2^63 characters. Implementations may (and do) impose lower limits.
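Positions and lengths in this class count code points, whereas `java.lang.String` counts UTF-16 `char` units, so the two disagree whenever a string contains a character above U+FFFF. The sketch below uses only the standard `String` API to show the difference for a string containing U+1F600. The class `CodepointVsChar` and its methods are illustrative names, not part of Saxon.

```java
final class CodepointVsChar {
    // 'a', then U+1F600 written as a UTF-16 surrogate pair, then 'b'
    static final String TEXT = "a\uD83D\uDE00b";

    // Length in UTF-16 code units, as java.lang.String counts it
    static int charLength(String s) {
        return s.length();
    }

    // Length in code points, the unit in which UnicodeString counts lengths and positions
    static long codePointLength(String s) {
        return s.codePoints().count();
    }

    // Code point at a 0-based code point index (linear scan, for illustration only)
    static int codePointAtIndex(String s, long index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return s.codePoints().skip(index).findFirst()
                .orElseThrow(() -> new IndexOutOfBoundsException("index " + index));
    }

    public static void main(String[] args) {
        System.out.println(charLength(TEXT));      // 4
        System.out.println(codePointLength(TEXT)); // 3
        System.out.println(Integer.toHexString(codePointAtIndex(TEXT, 1))); // 1f600
    }
}
```

The linear scan in `codePointAtIndex` is there only to keep the example short; a representation that stores code points directly can answer the same question in constant time.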
| Constructor and Description |
|---|
| `UnicodeString()` |
| Modifier and Type | Method | Description |
|---|---|---|
| `AtomicValue` | `asAtomic()` | Get an atomic value that encapsulates this match key. |
| `protected void` | `checkSubstringBounds(long start, long end)` | |
| `abstract int` | `codePointAt(long index)` | Get the code point at a given position in the string. |
| `abstract IntIterator` | `codePoints()` | Get an iterator over the code points present in the string. |
| `int` | `compareTo(UnicodeString other)` | Compare this string to another using codepoint comparison. |
| `UnicodeString` | `concat(UnicodeString other)` | Concatenate with another string, returning a new string. |
| `UnicodeString` | `economize()` | |
| `boolean` | `equals(java.lang.Object obj)` | |
| `long` | `estimatedLength()` | Get the estimated length of the string, suitable for space allocation. |
| `abstract int` | `getWidth()` | Get the number of bits needed to hold all the characters in this string. |
| `int` | `hashCode()` | Compute a hash code. |
| `boolean` | `hasSubstring(UnicodeString other, long offset)` | Ask whether this string has another string as its content starting at a given offset. |
| `long` | `indexOf(int codePoint)` | Get the position of the first occurrence of the specified codepoint, starting the search at the beginning. |
| `abstract long` | `indexOf(int codePoint, long from)` | Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string. |
| `long` | `indexOf(UnicodeString other, long from)` | Get the first position, at or beyond `from`, where another string appears as a substring of this string, comparing codepoints. |
| `long` | `indexWhere(java.util.function.IntPredicate predicate, long from)` | Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string. |
| `boolean` | `isEmpty()` | Ask whether the string is empty. |
| `abstract long` | `length()` | Get the length of the string. |
| `int` | `length32()` | Get the length of the string, provided it is less than 2^31 characters. |
| `UnicodeString` | `prefix(long end)` | Get a substring of this string, starting at position 0, with a given end position. |
| `static int` | `requireInt(long value)` | Utility method for use where strings longer than 2^31 characters cannot yet be handled. |
| `UnicodeString` | `substring(long start)` | Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string. |
| `abstract UnicodeString` | `substring(long start, long end)` | Get a substring of this string, with a given start and end position. |
| `UnicodeString` | `tidy()` | Ensure that the implementation is capable of counting codepoints in the string. |
| `void` | `verifyCharacters()` | Diagnostic method: verify that all the characters in the string are valid XML codepoints. |
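Two behaviours listed above differ from `java.lang.String`: `compareTo` uses codepoint comparison, and, per the `hashCode()` detail entry, the hash is computed exactly as for `java.lang.String`, with each astral character split into a surrogate pair. The following sketch works over plain `int[]` arrays of code points. `CodepointOps`, `compareCodepoints`, and `stringCompatibleHash` are hypothetical names, and ordering a proper prefix before the longer string is an assumption rather than something this page states.

```java
final class CodepointOps {

    // Codepoint comparison: compare position by position, then by length
    // (shorter prefix first is assumed here)
    static int compareCodepoints(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // String-compatible hash: an astral code point contributes its two surrogates,
    // so the result equals String.hashCode() for the same text
    static int stringCompatibleHash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isSupplementaryCodePoint(cp)) {
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            } else {
                h = 31 * h + cp;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        String halfwidth = "\uFF61";    // U+FF61, inside the BMP
        String astral = "\uD83D\uDE00"; // U+1F600, a surrogate pair in UTF-16

        // UTF-16 order compares the surrogate 0xD83D with 0xFF61 ...
        System.out.println(halfwidth.compareTo(astral) > 0); // true
        // ... but codepoint order compares U+FF61 with U+1F600
        System.out.println(compareCodepoints(
                halfwidth.codePoints().toArray(), astral.codePoints().toArray()) < 0); // true

        String mixed = "a" + astral + "b";
        System.out.println(stringCompatibleHash(mixed.codePoints().toArray()) == mixed.hashCode()); // true
    }
}
```

The first two lines of output show why a codepoint-based `compareTo` cannot simply delegate to `String.compareTo` once astral characters are involved, while the hash can stay `String`-compatible at the cost of decomposing each astral character.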
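
`public UnicodeString tidy()`

Ensure that the implementation is capable of counting codepoints in the string.

Returns: this `UnicodeString`, or another that represents the same sequence of characters.

`public UnicodeString economize()`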
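One way to picture the `tidy()` contract, as a minimal sketch under assumptions: a hypothetical `CodepointText` holds either a UTF-16 `CharSequence`, which cannot be addressed by code point without scanning, or an `int[]` of code points. `tidy()` returns the object itself when it is already codepoint-addressable and otherwise builds an equivalent one. The accessors throw `UnsupportedOperationException` before tidying, echoing the "not been prepared for codePoint access" condition documented for the search methods. This is not Saxon's implementation.

```java
final class CodepointText {
    private final CharSequence utf16; // set while the text is not yet codepoint-addressable
    private final int[] codePoints;   // set once the text is codepoint-addressable

    private CodepointText(CharSequence utf16, int[] codePoints) {
        this.utf16 = utf16;
        this.codePoints = codePoints;
    }

    static CodepointText ofUtf16(CharSequence cs) {
        return new CodepointText(cs, null);
    }

    boolean canCountCodepoints() {
        return codePoints != null;
    }

    // Return this object if it can already count code points; otherwise return an
    // equivalent object holding the same characters as an array of code points
    CodepointText tidy() {
        if (canCountCodepoints()) {
            return this;
        }
        return new CodepointText(null, utf16.codePoints().toArray());
    }

    long length() {
        if (!canCountCodepoints()) {
            throw new UnsupportedOperationException("call tidy() first");
        }
        return codePoints.length;
    }

    int codePointAt(long index) {
        if (!canCountCodepoints()) {
            throw new UnsupportedOperationException("call tidy() first");
        }
        return codePoints[Math.toIntExact(index)];
    }
}
```

Returning `this` when no work is needed keeps repeated calls to `tidy()` cheap, which matters if callers invoke it defensively before every codepoint operation.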

`public abstract long length()`

Get the length of the string.
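
`public int length32()`

Get the length of the string, provided it is less than 2^31 characters.

Returns: the length of the string, as an `int`.

Throws: `java.lang.UnsupportedOperationException` if the string is longer than 2^31 characters.

`public long estimatedLength()`

Get the estimated length of the string, suitable for space allocation.

Returns: for a `UnicodeString`, the actual length of the string in codepoints.

`public boolean isEmpty()`

Ask whether the string is empty.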
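`length32()` and the static `requireInt(long)` both narrow a `long` length or position to an `int` and refuse values that do not fit. Below is a minimal sketch of that check. The class name `LengthChecks` is hypothetical, and its `length32` takes the length as an argument rather than reading it from a string.

```java
final class LengthChecks {

    // Narrow a long position or length to int, refusing values above Integer.MAX_VALUE
    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "String position or length too large for this operation: " + value);
        }
        return (int) value;
    }

    // A length32-style accessor layered on a codepoint length held as a long
    static int length32(long length) {
        return requireInt(length);
    }
}
```

`Math.toIntExact` performs a similar narrowing but throws `ArithmeticException`; the explicit check here throws `UnsupportedOperationException`, the exception both methods document.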

`public abstract int getWidth()`

Get the number of bits needed to hold all the characters in this string.
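This page does not say which values `getWidth()` returns. The sketch below assumes, hypothetically, that the width is reported as one of three storage sizes: 8 bits when every code point fits in Latin-1 (up to U+00FF), 16 when every code point is in the Basic Multilingual Plane, and 24 otherwise, which is enough for the Unicode maximum U+10FFFF (21 bits). `WidthSketch` is an illustrative name.

```java
final class WidthSketch {

    // Smallest of 8, 16 or 24 bits that can hold every code point in the array.
    // The bucket sizes are an assumption made for this sketch.
    static int width(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max <= 0xFF) {
            return 8;
        } else if (max <= 0xFFFF) {
            return 16;
        } else {
            return 24;
        }
    }
}
```

A caller could use such a value to pick the most compact backing store, for example one byte per code point for Latin-1 text.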
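
`public long indexOf(int codePoint)`

Get the position of the first occurrence of the specified codepoint, starting the search at the beginning.

Parameters:
- `codePoint`: the sought codepoint

Throws: `java.lang.UnsupportedOperationException` if the `UnicodeString` has not been prepared for codepoint access.

`public abstract long indexOf(int codePoint, long from)`

Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string.

Parameters:
- `codePoint`: the sought codepoint
- `from`: the position from which the search should start (0-based)

Throws: `java.lang.UnsupportedOperationException` if the `UnicodeString` has not been prepared for codepoint access.

`public long indexWhere(java.util.function.IntPredicate predicate, long from)`

Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string.

Parameters:
- `predicate`: the condition that the codepoint must satisfy
- `from`: the position from which the search should start (0-based)

Throws: `java.lang.UnsupportedOperationException` if the `UnicodeString` has not been prepared for codepoint access.

`public long indexOf(UnicodeString other, long from)`

Get the first position, at or beyond `from`, where another string appears as a substring of this string, comparing codepoints.

Parameters:
- `other`: the other (sought) string
- `from`: the position (0-based) where searching is to start (counting in codepoints)

`public boolean hasSubstring(UnicodeString other, long offset)`

Ask whether this string has another string as its content starting at a given offset.

Parameters:
- `other`: the other string
- `offset`: the starting position in this string (counting in codepoints)

`public abstract IntIterator codePoints()`

Get an iterator over the code points present in the string.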
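The search methods above can be sketched over an `int[]` of code points, which makes every returned position a code point offset rather than a `char` offset. `CodepointSearch` is a hypothetical helper. Returning -1 when nothing is found is an assumption borrowed from `String.indexOf`, since the return values are not described here, and the substring search is a naive scan chosen for clarity.

```java
import java.util.function.IntPredicate;

final class CodepointSearch {

    // First position at or after 'from' whose code point satisfies the predicate,
    // or -1 if there is none (the -1 convention is assumed, as in String.indexOf)
    static long indexWhere(int[] s, IntPredicate predicate, long from) {
        for (long i = Math.max(0, from); i < s.length; i++) {
            if (predicate.test(s[(int) i])) {
                return i;
            }
        }
        return -1;
    }

    // indexOf(codePoint, from) as the special case of an equality predicate
    static long indexOf(int[] s, int codePoint, long from) {
        return indexWhere(s, cp -> cp == codePoint, from);
    }

    // Does 'other' occur in 's' starting exactly at 'offset'?
    static boolean hasSubstring(int[] s, int[] other, long offset) {
        if (offset < 0 || offset + other.length > s.length) {
            return false;
        }
        for (int j = 0; j < other.length; j++) {
            if (s[(int) offset + j] != other[j]) {
                return false;
            }
        }
        return true;
    }

    // First position at or beyond 'from' where 'other' occurs (naive scan), or -1
    static long indexOf(int[] s, int[] other, long from) {
        for (long i = Math.max(0, from); i + other.length <= s.length; i++) {
            if (hasSubstring(s, other, i)) {
                return i;
            }
        }
        return -1;
    }
}
```

For example, searching for `'b'` in the code points of `"a\uD83D\uDE00b"` yields position 2, whereas `String.indexOf` reports 3 because the astral character occupies two `char` units.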

`public abstract int codePointAt(long index)`

Get the code point at a given position in the string.

Parameters:
- `index`: the given position (0-based)

Throws: `java.lang.IndexOutOfBoundsException` if the index is out of range.

`public UnicodeString substring(long start)`

Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string.

Parameters:
- `start`: the start position (0-based), that is, the position of the first code point to be included

Throws: `java.lang.IndexOutOfBoundsException` if the start position is out of range.

`public abstract UnicodeString substring(long start, long end)`

Get a substring of this string, with a given start and end position.

Parameters:
- `start`: the start position (0-based), that is, the position of the first code point to be included
- `end`: the end position (0-based), specifically the position of the first code point not to be included

Throws: `java.lang.IndexOutOfBoundsException` if the start/end positions are out of range (the conditions are the same as for `String.substring()`).

`public UnicodeString prefix(long end)`

Get a substring of this string, starting at position 0, with a given end position.

Parameters:
- `end`: the end position (0-based), specifically the position of the first code point not to be included

Throws: `java.lang.IndexOutOfBoundsException` if the end position is out of range.

`public UnicodeString concat(UnicodeString other)`

Concatenate with another string, returning a new string.

Parameters:
- `other`: the string to be appended

`protected void checkSubstringBounds(long start, long end)`
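The bounds rule for `substring(long start, long end)` is stated to match `String.substring()`: a range is valid when `0 <= start <= end <= length()`. The minimal sketch below assumes an `int[]` of code points and a hypothetical `CodepointSlices` helper whose `checkSubstringBounds` takes the length explicitly (the documented method is a protected instance method). It shows how `substring(start)` and `prefix(end)` reduce to the two-argument form and how `concat` returns a new sequence.

```java
import java.util.Arrays;

final class CodepointSlices {

    // Same conditions as String.substring: 0 <= start <= end <= length
    static void checkSubstringBounds(long start, long end, long length) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(
                    "start=" + start + ", end=" + end + ", length=" + length);
        }
    }

    // Code points from 'start' (inclusive) to 'end' (exclusive)
    static int[] substring(int[] s, long start, long end) {
        checkSubstringBounds(start, end, s.length);
        return Arrays.copyOfRange(s, (int) start, (int) end);
    }

    // substring(start) runs to the end of the sequence
    static int[] substring(int[] s, long start) {
        return substring(s, start, s.length);
    }

    // prefix(end) starts at position 0
    static int[] prefix(int[] s, long end) {
        return substring(s, 0, end);
    }

    // concat builds a new sequence; neither input is modified
    static int[] concat(int[] a, int[] b) {
        int[] result = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }
}
```

Because every position counts code points, `substring(s, 1, 3)` over the code points of `"a\uD83D\uDE00bc"` returns U+1F600 followed by `'b'`, with no risk of splitting a surrogate pair.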
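
`public void verifyCharacters()`

Diagnostic method: verify that all the characters in the string are valid XML codepoints.

Throws: `java.lang.IllegalStateException` if the contents are invalid.

`public boolean equals(java.lang.Object obj)`

Overrides: `equals` in class `java.lang.Object`

`public int hashCode()`

Compute a hash code. Implementations of `UnicodeString` use compatible hash codes, and the hashing algorithm is therefore identical to that for `java.lang.String`. This means that for strings containing astral characters (code points above U+FFFF), the hash code must be computed by decomposing each astral character into a surrogate pair.

Overrides: `hashCode` in class `java.lang.Object`

`public int compareTo(UnicodeString other)`

Compare this string to another using codepoint comparison.

Specified by: `compareTo` in interface `java.lang.Comparable<UnicodeString>`

Parameters:
- `other`: the other string

`public AtomicValue asAtomic()`

Get an atomic value that encapsulates this match key.

Specified by: `asAtomic` in interface `AtomicMatchKey`

`public static int requireInt(long value)`

Utility method for use where strings longer than 2^31 characters cannot yet be handled.

Parameters:
- `value`: the actual value of a character position within a string, or the length of a string

Throws: `java.lang.UnsupportedOperationException` if the supplied value exceeds `Integer.MAX_VALUE`.

Copyright (c) 2004-2022 Saxonica Limited. All rights reserved.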