public class Preprocess extends InternalModule
Can process following formats:
May include:

| Modifier and Type | Field and Description |
|---|---|
| protected String | cardinalRule |
| protected String | ordinalRule |
| protected String | yearRule |

Inherited fields: logger, state, MODULE_OFFLINE, MODULE_RUNNING

| Constructor and Description |
|---|
| Preprocess() |

| Modifier and Type | Method and Description |
|---|---|
| protected void | expand(Document doc)<br>Processes a document in MaryXML format, from tokens to words that can be phonemised. |
| protected String | expandAbbreviation(String abbrev, boolean isCapital) |
| protected String | expandAcronym(String acronym) |
| protected String | expandConsonants(String consonants)<br>Adds a space between each character of a string. |
| protected String | expandDate(String date) |
| protected String | expandDuration(String duration) |
| protected String | expandHashtag(String hashtag) |
| protected String | expandMoney(String money, String currency) |
| protected String | expandNumber(double number) |
| protected String | expandNumberS(String numberS)<br>Expands a digit followed by an s. |
| protected String | expandOrdinal(double number) |
| protected String | expandRange(String range) |
| protected String | expandRealNumber(String number) |
| protected String | expandTime(String time, boolean isNextTokenTime) |
| protected String | expandURL(String email)<br>Partially expands a URL string by splitting on @, / and . |
| protected String | expandWordNumber(String wordnumseq) |
| protected String | expandYear(double number) |
| protected String | expandYearBCAD(String year) |
| protected static String | getOrdinalRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)<br>Tries to extract the rule name for "expand ordinal" from the given RuleBasedNumberFormat. |
| protected static String | getYearRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)<br>Tries to extract the rule name for "expand year" from the given RuleBasedNumberFormat. |
| static Map<Object,Object> | loadAbbrevMap() |
| MaryData | process(MaryData d) |
| protected String | splitContraction(String contraction) |
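
The two string-splitting helpers summarised above (adding a space between characters, and splitting a URL on @, / and .) can be illustrated with a minimal sketch. This is not the MaryTTS source: the class name `ExpansionSketch`, the method names `spaceOut` and `splitUrl`, and the choice to keep each separator as its own token are assumptions made for illustration, since this page does not document the exact output format.

```java
import java.util.StringJoiner;

/** Illustrative stand-ins for two helpers named in the summary; not the MaryTTS source. */
class ExpansionSketch {

    /** Puts a single space between consecutive characters of the input. */
    static String spaceOut(String consonants) {
        StringJoiner joined = new StringJoiner(" ");
        for (char c : consonants.toCharArray()) {
            joined.add(String.valueOf(c));
        }
        return joined.toString();
    }

    /**
     * Splits a URL or e-mail address on '@', '/' and '.'.
     * Assumption: each separator is kept as its own token so that a later
     * stage could decide how to pronounce it; the real method may differ.
     */
    static String splitUrl(String url) {
        // Surround each separator with spaces, then collapse runs of whitespace.
        String spaced = url.replaceAll("([@/.])", " $1 ");
        return spaced.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("bbc"));
        System.out.println(splitUrl("info@example.org/docs"));
    }
}
```

Keeping the separators as tokens is only one possible design; replacing them with spoken words at this stage would be an equally plausible reading of "partially expand".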

Inherited methods: getInputType, getLocale, getOutputType, getState, inputType, name, outputType, powerOnSelfTest, shutdown, startup

protected final String cardinalRule
protected final String ordinalRule
protected final String yearRule
public MaryData process(MaryData d) throws Exception
Specified by: process in interface MaryModule
Overrides: process in class InternalModule
Throws:
Exception
protected void expand(Document doc) throws ParseException, IOException, MaryConfigurationException
Parameters:
doc - doc
Throws:
ParseException - parse exception
IOException - IO exception
MaryConfigurationException - Mary configuration exception
protected String expandNumber(double number)
protected String expandOrdinal(double number)
protected String expandYear(double number)
protected String expandURL(String email)
Parameters:
email - email
protected String expandConsonants(String consonants)
Parameters:
consonants - consonants
protected String expandNumberS(String numberS)
Parameters:
numberS - numberS
protected String expandAbbreviation(String abbrev, boolean isCapital)
Parameters:
abbrev - the token to be expanded
isCapital - whether the following token begins with a capital letter
protected String expandDate(String date) throws ParseException
Throws:
ParseException
protected String expandTime(String time, boolean isNextTokenTime)
Parameters:
time - the token to be expanded
isNextTokenTime - whether the following token contains am or pm
protected static String getOrdinalRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)
The rule name is locale sensitive, but usually starts with "%spellout-ordinal".
Parameters:
rbnf - The RuleBasedNumberFormat from where we will try to extract the rule name.
protected static String getYearRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)
The rule name is locale sensitive, but usually starts with "%spellout-numbering-year".
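
Because the rule names are locale sensitive, a lookup of this kind presumably matches on a prefix rather than an exact name. The sketch below shows one way such a prefix search could work. It is not the MaryTTS implementation: the class `RuleNameSketch` and method `findRule` are hypothetical, and the rule-set names are passed in as a plain array (standing in for what ICU4J's `RuleBasedNumberFormat.getRuleSetNames()` returns) so that the example runs without the ICU dependency.

```java
import java.util.Optional;

/** Illustrative prefix lookup over rule-set names; not the MaryTTS implementation. */
class RuleNameSketch {

    /**
     * Returns the first name that starts with the given prefix, if any.
     * ruleSetNames stands in for the array a caller would obtain from
     * RuleBasedNumberFormat.getRuleSetNames() in ICU4J.
     */
    static Optional<String> findRule(String[] ruleSetNames, String prefix) {
        for (String name : ruleSetNames) {
            if (name.startsWith(prefix)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample list; the real names depend on the locale.
        String[] names = {
            "%spellout-numbering",
            "%spellout-ordinal-masculine",
            "%spellout-numbering-year"
        };
        System.out.println(findRule(names, "%spellout-ordinal").orElse("(none)"));
        System.out.println(findRule(names, "%spellout-numbering-year").orElse("(none)"));
    }
}
```

Returning an `Optional` makes the "try to extract" wording explicit: a locale without a matching rule set yields an empty result instead of a guessed name.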
Parameters:
rbnf - The RuleBasedNumberFormat from where we will try to extract the rule name.
public static Map<Object,Object> loadAbbrevMap() throws IOException
Throws:
IOException

Copyright © 2000–2022 DFKI GmbH. All rights reserved.