Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides default Tokenizer implementations and
resources. Users can extend this class if their application requires
overriding the TokenContextGenerator, Dictionary, etc.
Constructor Summary
Constructors
TokenizerFactory()
    Creates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
    Creates a TokenizerFactory.
Method Summary
static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
    Factory method the framework uses to create a new TokenizerFactory.
createArtifactMap()
    Creates a Map with pairs of keys and objects.
createManifestEntries()
    Creates the manifest entries that will be added to the model manifest.
getAbbreviationDictionary()
    Gets the abbreviation dictionary.
getAlphaNumericPattern()
    Gets the alpha numeric pattern.
getContextGenerator()
    Gets the context generator.
getLanguageCode()
    Retrieves the language code.
boolean isUseAlphaNumericOptmization()
    Gets whether to use alphanumeric optimization.
void validateArtifactMap()
    Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
Constructor Details
TokenizerFactory
public TokenizerFactory()
Creates a TokenizerFactory that provides the default implementation of the resources.

TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
Parameters:
languageCode - the language of the natural text
abbreviationDictionary - an abbreviations dictionary
useAlphaNumericOptimization - if true, alpha numerics are skipped
alphaNumericPattern - null or a custom alphanumeric pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
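To make the default expression quoted above concrete, the following sketch uses only java.util.regex to show which strings that expression accepts, and how a custom Pattern that could be passed as alphaNumericPattern might be built. The class AlphaNumericPatternDemo and the UNICODE pattern are illustrative assumptions and are not part of OpenNLP.

```java
import java.util.regex.Pattern;

// Illustrative only: AlphaNumericPatternDemo is a hypothetical helper, not an OpenNLP class.
final class AlphaNumericPatternDemo {

    // Same expression the documentation quotes as the default.
    static final Pattern DEFAULT = Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical custom pattern that also accepts non-ASCII letters and digits.
    static final Pattern UNICODE = Pattern.compile("^[\\p{L}\\p{Nd}]+$");

    // True if the whole token matches the given pattern.
    static boolean matches(Pattern pattern, String token) {
        return pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(DEFAULT, "abc123"));       // true
        System.out.println(matches(DEFAULT, "don't"));        // false: apostrophe
        System.out.println(matches(DEFAULT, "Stra\u00dfe"));  // false: non-ASCII letter
        System.out.println(matches(UNICODE, "Stra\u00dfe"));  // true
    }
}
```

A pattern like UNICODE would be a candidate for text in languages whose words contain letters outside A-Z; whether that is appropriate depends on the model and the data.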
Method Details
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
Specified by:
validateArtifactMap in class BaseToolFactory
Throws:
InvalidFormatException
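The note about calling super.validateArtifactMap first can be sketched with stand-in classes. Everything below (SketchFormatException, SketchBaseFactory, SketchTokenizerFactory, and the "abbreviations" key) is hypothetical and only mimics the contract described above; it is not OpenNLP code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a checked "invalid format" exception.
class SketchFormatException extends Exception {
    SketchFormatException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the base factory.
class SketchBaseFactory {
    protected final Map<String, Object> artifacts = new HashMap<>();

    // Checks shared by all factories (none in this sketch).
    void validateArtifactMap() throws SketchFormatException {
    }
}

class SketchTokenizerFactory extends SketchBaseFactory {
    static final String ABBREVIATIONS_KEY = "abbreviations"; // hypothetical artifact key

    @Override
    void validateArtifactMap() throws SketchFormatException {
        // Run the inherited checks before the subclass-specific ones.
        super.validateArtifactMap();

        Object abbreviations = artifacts.get(ABBREVIATIONS_KEY);
        // The artifact is optional, but if present it must have the expected type.
        if (abbreviations != null && !(abbreviations instanceof Set)) {
            throw new SketchFormatException("Abbreviations artifact has an unexpected type");
        }
    }
}
```

Calling the superclass method first means the subclass checks can rely on whatever the base class has already verified.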

createArtifactMap
Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model implementation should call this method from the constructor that creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
Overrides:
createArtifactMap in class BaseToolFactory

createManifestEntries
Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.
Overrides:
createManifestEntries in class BaseToolFactory
Returns:
the manifest entries to be added to the model manifest

create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to create a new TokenizerFactory.
Parameters:
subclassName - the name of the class implementing the TokenizerFactory
languageCode - the language code the tokenizer should use
abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
useAlphaNumericOptimization - indicates whether the alpha numeric optimization should be enabled or disabled
alphaNumericPattern - the pattern the alpha numeric optimization should use
Returns:
the TokenizerFactory instance
Throws:
InvalidFormatException - if one of the input parameters doesn't comply with the expected format
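One plausible way a factory method can turn a class name into an instance is plain Java reflection, sketched below. This is an assumption for illustration only: SketchFactory, CustomSketchFactory, SketchFactoryLoader, the init method, and the fallback to a default instance when the name is null are all hypothetical, and OpenNLP's own loading mechanism may differ.

```java
// Hypothetical stand-in for a factory base class.
class SketchFactory {
    String languageCode;

    // Hands the settings to the freshly created instance.
    void init(String languageCode) {
        this.languageCode = languageCode;
    }
}

// Hypothetical user-defined subclass.
class CustomSketchFactory extends SketchFactory {
}

final class SketchFactoryLoader {

    // Assumption: a null name selects the default factory; otherwise the named
    // subclass is instantiated through its no-argument constructor.
    static SketchFactory create(String subclassName, String languageCode) {
        SketchFactory factory;
        if (subclassName == null) {
            factory = new SketchFactory();
        } else {
            try {
                factory = Class.forName(subclassName)
                        .asSubclass(SketchFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load factory: " + subclassName, e);
            }
        }
        factory.init(languageCode);
        return factory;
    }
}
```

In this sketch a bad class name surfaces as an unchecked IllegalArgumentException; the documented method instead declares InvalidFormatException.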

getAlphaNumericPattern
Gets the alpha numeric pattern.
Returns:
the user-specified alpha numeric pattern, or a default

isUseAlphaNumericOptmization
public boolean isUseAlphaNumericOptmization()
Gets whether to use alphanumeric optimization.
Returns:
true if the alpha numeric optimization is enabled, otherwise false
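The effect of this flag can be sketched as follows, assuming (as the constructor documentation puts it) that alpha numerics are skipped when it is true: whitespace-separated chunks that match the alphanumeric pattern are accepted as tokens without further inspection, and only the remaining chunks are examined for token boundaries. AlphaNumericShortcutSketch and chunksToInspect are hypothetical names; the code does not reproduce OpenNLP's tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the shortcut; not OpenNLP code.
final class AlphaNumericShortcutSketch {

    static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns the whitespace-separated chunks that would still need to be
    // examined for token boundaries. Purely alphanumeric chunks are skipped
    // when the optimization is enabled.
    static List<String> chunksToInspect(String text, boolean useOptimization) {
        List<String> result = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // empty input yields one empty chunk
            }
            boolean skip = useOptimization && ALPHANUMERIC.matcher(chunk).matches();
            if (!skip) {
                result.add(chunk);
            }
        }
        return result;
    }
}
```

For the input "Hello world, it's 2024", enabling the flag leaves only "world," and "it's" to inspect, while disabling it leaves all four chunks.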

getAbbreviationDictionary
Gets the abbreviation dictionary.
Returns:
null or the abbreviation dictionary

getLanguageCode
Retrieves the language code.
Returns:
the language code

getContextGenerator
Gets the context generator.
Returns:
a new instance of the context generator