Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
Constructor Summary
TokenizerFactory()
- Instantiates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
- Instantiates a TokenizerFactory.
-
Method Summary
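static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
- Factory method the framework uses to instantiate a new TokenizerFactory.
Map<String, Object> createArtifactMap()
- A model's implementation should call this method from the constructor that creates a model programmatically.
Map<String, String> createManifestEntries()
Dictionary getAbbreviationDictionary()
Pattern getAlphaNumericPattern()
TokenContextGenerator getContextGenerator()
String getLanguageCode()
boolean isUseAlphaNumericOptimization()
void validateArtifactMap()
- Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap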
-
Constructor Details
-
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
languageCode
- The ISO language code to be used for this factory.
abbreviationDictionary
- The Dictionary which holds abbreviations.
useAlphaNumericOptimization
- Whether alphanumerics are skipped, or not.
alphaNumericPattern
- null or a custom alphanumeric Pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
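To make the alphaNumericPattern parameter more concrete, the following minimal sketch applies the default expression quoted above to a few sample tokens, using only java.util.regex. The class name AlphaNumericPatternSketch and the helpers resolve and isAlphaNumeric are hypothetical and not part of OpenNLP; the null fallback mirrors the parameter description, not the library's actual code.

```java
import java.util.regex.Pattern;

class AlphaNumericPatternSketch {

    // The same regular expression the parameter description cites as the default.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical helper: fall back to the default when no custom pattern is given.
    static Pattern resolve(Pattern custom) {
        return custom != null ? custom : DEFAULT_ALPHANUMERIC;
    }

    // True when the entire token consists only of characters the pattern allows.
    static boolean isAlphaNumeric(String token, Pattern custom) {
        return resolve(custom).matcher(token).matches();
    }

    public static void main(String[] args) {
        Pattern digitsOnly = Pattern.compile("^[0-9]+$"); // example of a custom pattern
        for (String token : new String[] {"OpenNLP", "2024", "U.S.", "don't"}) {
            System.out.println(token
                    + " default=" + isAlphaNumeric(token, null)
                    + " digitsOnly=" + isAlphaNumeric(token, digitsOnly));
        }
    }
}
```

Tokens containing punctuation, such as "U.S." or "don't", do not match the default expression, so under this reading they would not qualify for the alphanumeric optimization.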
-
-
Method Details
-
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
validateArtifactMap in class BaseToolFactory
- Throws:
InvalidFormatException
- Thrown if validation found invalid states.
-
createArtifactMap
Description copied from class: BaseToolFactory
A model's implementation should call this method from the constructor that creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
createArtifactMap in class BaseToolFactory
- Returns:
- Retrieves a Map with pairs of keys and objects.
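The override contract described for createArtifactMap and validateArtifactMap (start from the base result, call super first) can be sketched with stand-in classes. Everything below is hypothetical: BaseFactory is a simplified model and not opennlp.tools.util.BaseToolFactory, the key name is illustrative, a Set stands in for a Dictionary, and the validation method takes the map as an argument and throws IllegalStateException only to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ArtifactMapSketch {

    // Stand-in base factory (hypothetical; not the OpenNLP class).
    static class BaseFactory {
        // Base implementation: an empty, mutable map that subclasses populate.
        Map<String, Object> createArtifactMap() {
            return new HashMap<>();
        }

        // Base validation: this stand-in only rejects a missing map.
        void validateArtifactMap(Map<String, Object> artifacts) {
            if (artifacts == null) {
                throw new IllegalStateException("artifact map is missing");
            }
        }
    }

    // Hypothetical subclass that contributes one optional artifact.
    static class AbbreviationFactory extends BaseFactory {
        static final String ABBREVIATIONS_KEY = "abbreviations"; // illustrative key name

        private final Set<String> abbreviations; // stand-in for a Dictionary; may be null

        AbbreviationFactory(Set<String> abbreviations) {
            this.abbreviations = abbreviations;
        }

        @Override
        Map<String, Object> createArtifactMap() {
            Map<String, Object> artifacts = super.createArtifactMap(); // start from the base map
            if (abbreviations != null) {
                artifacts.put(ABBREVIATIONS_KEY, abbreviations);
            }
            return artifacts;
        }

        @Override
        void validateArtifactMap(Map<String, Object> artifacts) {
            super.validateArtifactMap(artifacts); // run the base checks first
            Object entry = artifacts.get(ABBREVIATIONS_KEY);
            if (entry != null && !(entry instanceof Set)) {
                throw new IllegalStateException("abbreviations entry has an unexpected type");
            }
        }
    }

    public static void main(String[] args) {
        AbbreviationFactory factory = new AbbreviationFactory(Set.of("Dr.", "e.g."));
        Map<String, Object> artifacts = factory.createArtifactMap();
        factory.validateArtifactMap(artifacts); // passes: the entry is a Set
        System.out.println(artifacts.keySet());
    }
}
```

Calling the superclass method first means the subclass checks can rely on whatever the base class already guarantees, which is the ordering the note on validateArtifactMap recommends.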
-
createManifestEntries
- Overrides:
createManifestEntries in class BaseToolFactory
- Returns:
- Retrieves the manifest entries to be added to the model manifest.
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
- Parameters:
subclassName
- The name of the class implementing the TokenizerFactory.
languageCode
- The ISO language code the Tokenizer should use.
abbreviationDictionary
- An optional Dictionary containing abbreviations, or null if not present.
useAlphaNumericOptimization
- Whether the alphanumeric optimization is enabled or not.
alphaNumericPattern
- The Pattern the alphanumeric optimization should use, if enabled.
- Returns:
- A valid TokenizerFactory instance.
- Throws:
InvalidFormatException
- Thrown if one of the input parameters doesn't comply with the expected format.
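This page does not describe how create resolves subclassName. As a generic illustration of instantiating a factory from a class name, the sketch below uses Java reflection with stand-in types. DefaultFactory, CustomFactory, and the two-argument create are hypothetical and not OpenNLP classes or signatures; treating a null name as "use the default factory" and reporting failures as IllegalArgumentException are assumptions of this sketch.

```java
import java.lang.reflect.Constructor;

class FactoryByNameSketch {

    // Stand-in default factory (hypothetical; not an OpenNLP class).
    static class DefaultFactory {
        final String languageCode;

        DefaultFactory(String languageCode) {
            this.languageCode = languageCode;
        }
    }

    // Stand-in user extension that could be selected by its class name.
    static class CustomFactory extends DefaultFactory {
        CustomFactory(String languageCode) {
            super(languageCode);
        }
    }

    // Assumption of this sketch: a null name selects the default factory.
    static DefaultFactory create(String subclassName, String languageCode) {
        if (subclassName == null) {
            return new DefaultFactory(languageCode);
        }
        try {
            // Load the named class and make sure it really is a factory subtype.
            Class<? extends DefaultFactory> type =
                    Class.forName(subclassName).asSubclass(DefaultFactory.class);
            Constructor<? extends DefaultFactory> ctor =
                    type.getDeclaredConstructor(String.class);
            return ctor.newInstance(languageCode);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Unknown class, wrong type, or missing constructor: report one error type.
            throw new IllegalArgumentException("Cannot instantiate factory: " + subclassName, e);
        }
    }

    public static void main(String[] args) {
        DefaultFactory byDefault = create(null, "en");
        DefaultFactory byName = create(CustomFactory.class.getName(), "de");
        System.out.println(byDefault.getClass().getSimpleName() + " " + byDefault.languageCode);
        System.out.println(byName.getClass().getSimpleName() + " " + byName.languageCode);
    }
}
```

Funnelling every lookup failure into a single exception type plays the role that InvalidFormatException plays in the documented signature: callers only need to handle one failure mode.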
-
getAlphaNumericPattern
- Returns:
- Retrieves the (user-)specified alphanumeric Pattern or a default.
-
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
- Returns:
- true if the alphanumeric optimization is enabled, otherwise false.
-
getAbbreviationDictionary
- Returns:
- The abbreviation Dictionary or null if none is active.
-
getLanguageCode
- Returns:
- Retrieves the ISO language code in use.
-
getContextGenerator
- Returns:
- Retrieves a TokenContextGenerator instance.
-