public class UnicodeRegex extends Object implements Cloneable, Freezable<UnicodeRegex>, StringTransform
TODO: Move to org.graalvm.shadowed.com.ibm.icu.dev.somewhere. 2015-sep-03: This is used there, and also in CLDR and in UnicodeTools.
| Constructor and Description |
|---|
| UnicodeRegex() |
| Modifier and Type | Method and Description |
|---|---|
| static List<String> | appendLines(List<String> result, InputStream inputStream, String encoding): Utility for loading lines from a UTF-8 file. |
| static List<String> | appendLines(List<String> result, String file, String encoding): Utility for loading lines from a file. |
| UnicodeRegex | cloneAsThawed(): Provides for the clone operation. |
| static Pattern | compile(String regex): Compile a regex string, after processing by fix(...). |
| static Pattern | compile(String regex, int options): Compile a regex string, after processing by fix(...). |
| String | compileBnf(List<String> lines): Compile a composed string from a set of BNF lines, such as for composing a regex expression. |
| String | compileBnf(String bnfLines): Compile a composed string from a set of BNF lines; see the List version for more information. |
| static String | fix(String regex): Convenience static function, using standard parameters. |
| UnicodeRegex | freeze(): Freezes the object. |
| String | getBnfCommentString() |
| String | getBnfLineSeparator() |
| String | getBnfVariableInfix() |
| SymbolTable | getSymbolTable(): Get the symbol table for internal processing. |
| boolean | isFrozen(): Determines whether the object has been frozen or not. |
| void | setBnfCommentString(String bnfCommentString) |
| void | setBnfLineSeparator(String bnfLineSeparator) |
| void | setBnfVariableInfix(String bnfVariableInfix) |
| UnicodeRegex | setSymbolTable(SymbolTable symbolTable): Set the symbol table for internal processing. |
| String | transform(String regex): Adds full Unicode property support, with the latest version of Unicode, to Java Regex, bringing it up to Level 1 (see http://www.unicode.org/reports/tr18/). |
public SymbolTable getSymbolTable()
public UnicodeRegex setSymbolTable(SymbolTable symbolTable)
public String transform(String regex)
Not thread-safe; create a separate copy for different threads.
In the future, we may extend this to support other regex packages.
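The general idea of preprocessing a pattern string before handing it to Pattern.compile() can be illustrated with a toy sketch that uses only java.util.regex. This is not how UnicodeRegex is implemented: the class PreprocessSketch, its method names, and the single hard-coded rewrite of [:alphabetic:] to Java's \p{IsAlphabetic} are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in; not part of ICU and not UnicodeRegex's algorithm.
class PreprocessSketch {
    // Rewrites one POSIX-style property token into a form Java regex understands.
    // String.replace performs a literal (non-regex) substitution.
    static String rewrite(String regex) {
        return regex.replace("[:alphabetic:]", "\\p{IsAlphabetic}");
    }

    // Preprocess first, then compile with the standard Java engine.
    static Pattern compileRewritten(String regex) {
        return Pattern.compile(rewrite(regex));
    }

    public static void main(String[] args) {
        Pattern p = compileRewritten("[[:alphabetic:]]+");
        System.out.println(p.matcher("\u00e9t\u00e9").matches());
    }
}
```

A real implementation has to parse nested sets and every property name rather than substitute a single token, which is why the sketch should be read only as a picture of the "fix, then compile" flow.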
transform in interface StringTransform
transform in interface Transform<String,String>
regex - A modified Java regex pattern, as in the input to
Pattern.compile(), except that all "character classes" are
processed as if they were UnicodeSet patterns. Example:
"abc[:bc=N:]". See UnicodeSet for the differences in syntax.
public static String fix(String regex)
regex - as in transform(...)
public static Pattern compile(String regex)
regex - Raw regex pattern, as in fix(...).
public static Pattern compile(String regex, int options)
regex - Raw regex pattern, as in fix(...).
public String compileBnf(String bnfLines)
bnfLines - Series of BNF lines.
public String compileBnf(List<String> lines)
Example:
  uri = (?: (scheme) \\:)? (host) (?: \\? (query))? (?: \\u0023 (fragment))?;
  scheme = reserved+;
  host = // reserved+;
  query = [\\=reserved]+;
  fragment = reserved+;
  reserved = [[:ascii:][:alphabetic:]];
Caveats: at this point the parsing is simple. For example, # cannot be quoted (use \\u0023 instead); you can set the comment string to null to disable comments. The equality sign and a few other delimiters can be changed with the setBnf...() setters.
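The composition step can be pictured with a small stand-alone sketch. It is not the ICU implementation: the class BnfSketch and its compose method are hypothetical, and the sketch assumes that the first statement is the root rule, that # starts a comment, that = separates a name from its value, and that plain textual substitution (longest names first) is good enough.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of BNF-style composition; not UnicodeRegex itself.
class BnfSketch {
    static String compose(List<String> lines) {
        // Drop everything from '#' to the end of each line, then join the lines.
        StringBuilder joined = new StringBuilder();
        for (String line : lines) {
            int hash = line.indexOf('#');
            joined.append(hash < 0 ? line : line.substring(0, hash)).append(' ');
        }
        // Split into "name = value" statements terminated by ';'.
        Map<String, String> rules = new LinkedHashMap<>();
        for (String statement : joined.toString().split(";")) {
            int eq = statement.indexOf('=');
            if (eq < 0) {
                continue; // blank remainder after the last ';'
            }
            rules.put(statement.substring(0, eq).trim(), statement.substring(eq + 1).trim());
        }
        if (rules.isEmpty()) {
            throw new IllegalArgumentException("no statements found");
        }
        List<String> names = new ArrayList<>(rules.keySet());
        String rootName = names.get(0); // assumption: first statement is the root
        // Longest names first, so a short name never clobbers part of a longer one.
        names.sort((x, y) -> Integer.compare(y.length(), x.length()));
        String result = rules.get(rootName);
        // Substitute repeatedly so that rules referring to other rules are expanded.
        for (int pass = 0; pass < rules.size(); pass++) {
            String before = result;
            for (String name : names) {
                if (!name.equals(rootName)) {
                    result = result.replace(name, rules.get(name));
                }
            }
            if (result.equals(before)) {
                break;
            }
        }
        return result;
    }
}
```

With the three lines `root = (num)-(tag);  # comment`, `num = [0-9]+;` and `tag = [x-z];`, this sketch composes `([0-9]+)-([x-z])`. Because substitution is purely textual, variable names must not collide with literal text in the rules.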
lines - Series of lines that represent a BNF expression. The lines contain
a series of statements of the form x=y;. A statement can span
multiple lines, but there can't be multiple statements on a line.
A hash (#) starts a comment that runs to the end of the line.
public String getBnfCommentString()
public void setBnfCommentString(String bnfCommentString)
public String getBnfVariableInfix()
public void setBnfVariableInfix(String bnfVariableInfix)
public String getBnfLineSeparator()
public void setBnfLineSeparator(String bnfLineSeparator)
public static List<String> appendLines(List<String> result, String file, String encoding) throws IOException
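As a rough picture of what a line-loading utility with this shape does, here is a sketch built only on java.io. The class AppendLinesSketch is hypothetical, and two details are assumptions rather than documented behavior of appendLines: that the stream is closed after reading, and that lines are appended without any trimming or filtering. The one behavior taken from this page is that a null encoding means UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the ICU implementation.
class AppendLinesSketch {
    static List<String> appendLines(List<String> result, InputStream in, String encoding)
            throws IOException {
        // A null encoding falls back to UTF-8.
        String charsetName = (encoding == null) ? "UTF-8" : encoding;
        // Assumption: the sketch closes the stream when done.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line); // appended as-is, after any existing entries
            }
        }
        return result; // same list instance that was passed in
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("x = y;\ny = z;\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(appendLines(new ArrayList<>(), in, null));
    }
}
```

Returning the list that was passed in lets a caller chain the result straight into a method that accepts a List<String> of lines.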
result - The list to which the lines are appended.
file - The file to open for input.
encoding - if null, then UTF-8
IOException - If there were problems opening the file for input.
public static List<String> appendLines(List<String> result, InputStream inputStream, String encoding) throws UnsupportedEncodingException, IOException
result - The list to which the lines are appended.
inputStream - The input stream.
encoding - if null, then UTF-8
IOException - If there were problems opening the input stream for reading.
UnsupportedEncodingException
public UnicodeRegex cloneAsThawed()
cloneAsThawed in interface Freezable<UnicodeRegex>
public UnicodeRegex freeze()
freeze in interface Freezable<UnicodeRegex>
public boolean isFrozen()
isFrozen in interface Freezable<UnicodeRegex>
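The freeze(), isFrozen() and cloneAsThawed() trio follows a general freeze-then-share pattern, sketched here with a hypothetical class. FreezableSketch and its single commentString field are invented for illustration, and this page does not say whether UnicodeRegex itself rejects setter calls after freeze(), so the exception thrown below is an assumption about how a class might honor the contract.

```java
// Hypothetical illustration of the freezable pattern; not UnicodeRegex.
class FreezableSketch implements Cloneable {
    private String commentString = "#";
    private boolean frozen = false;

    boolean isFrozen() {
        return frozen;
    }

    // Marks the object immutable and returns it, so calls can be chained.
    FreezableSketch freeze() {
        frozen = true;
        return this;
    }

    // Returns a modifiable copy; the original stays frozen.
    FreezableSketch cloneAsThawed() {
        try {
            FreezableSketch copy = (FreezableSketch) super.clone();
            copy.frozen = false;
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e); // cannot happen: class is Cloneable
        }
    }

    String getCommentString() {
        return commentString;
    }

    void setCommentString(String commentString) {
        if (frozen) {
            // Assumption: mutation of a frozen object is rejected.
            throw new UnsupportedOperationException("Attempt to modify frozen object");
        }
        this.commentString = commentString;
    }
}
```

A typical use is to configure one instance, freeze it, share it, and call cloneAsThawed() whenever a thread or caller needs its own modifiable copy.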