Package opennlp.tools.tokenize
Class TokenizerFactory
- java.lang.Object
  - opennlp.tools.util.BaseToolFactory
    - opennlp.tools.tokenize.TokenizerFactory
public class TokenizerFactory extends BaseToolFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
-
Constructor Summary
Constructors
Constructor Description
TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory.
-
Method Summary
Modifier and Type Method Description
static TokenizerFactory
create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Factory method the framework uses to instantiate a new TokenizerFactory.
Map<String,Object>
createArtifactMap()
A model's implementation should call this method when creating a model programmatically.
Map<String,String>
createManifestEntries()
Dictionary
getAbbreviationDictionary()
Pattern
getAlphaNumericPattern()
TokenContextGenerator
getContextGenerator()
String
getLanguageCode()
boolean
isUseAlphaNumericOptimization()
void
validateArtifactMap()
Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
-
-
-
-
Constructor Detail
-
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
languageCode - The ISO language code to be used for this factory.
abbreviationDictionary - The Dictionary which holds abbreviations.
useAlphaNumericOptimization - Whether purely alphanumeric tokens are skipped, or not.
alphaNumericPattern - null or a custom alphanumeric Pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
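To make the default concrete, here is a minimal sketch that uses only java.util.regex. It compiles the pattern string quoted above and reports which sample tokens match it in full. The class name AlphaNumericPatternSketch and the helper isAlphaNumeric are hypothetical names chosen for this illustration and are not part of OpenNLP; how the tokenizer applies the pattern internally is not shown.

```java
import java.util.regex.Pattern;

public class AlphaNumericPatternSketch {

    // Same expression as the documented default. It is compiled locally here
    // rather than taken from Factory.DEFAULT_ALPHANUMERIC, so no OpenNLP
    // dependency is needed to run the sketch.
    public static final Pattern DEFAULT_ALPHANUMERIC =
            Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical helper: true only if the whole token consists of
    // ASCII letters and digits (at least one character).
    public static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Hello", "abc123", "U.S.", "well-known", ""};
        for (String token : samples) {
            System.out.println("\"" + token + "\" -> " + isAlphaNumeric(token));
        }
    }
}
```

Under this pattern, tokens such as "Hello" and "abc123" match, while tokens containing punctuation (for example "U.S." or "well-known") and the empty string do not. A custom Pattern passed as alphaNumericPattern would replace this expression.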
-
-
Method Detail
-
validateArtifactMap
public void validateArtifactMap() throws InvalidFormatException
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
validateArtifactMap in class BaseToolFactory
- Throws:
InvalidFormatException - Thrown if validation found invalid states.
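The recommendation to call super.validateArtifactMap first can be illustrated with a self-contained sketch. Every type below (SketchBaseFactory, SketchTokenizerFactory, SketchInvalidFormatException), the artifact keys "manifest" and "abbreviations", and the check helper are hypothetical stand-ins invented for this example. They only mimic the shape of the override so that the snippet compiles without OpenNLP on the classpath; they do not reproduce the library's actual checks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ValidateOrderSketch {

    // Hypothetical stand-in for a checked validation exception.
    public static class SketchInvalidFormatException extends Exception {
        public SketchInvalidFormatException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for a base factory that owns the artifact map.
    public static class SketchBaseFactory {
        protected final Map<String, Object> artifacts;

        public SketchBaseFactory(Map<String, Object> artifacts) {
            this.artifacts = new HashMap<>(artifacts);
        }

        public void validateArtifactMap() throws SketchInvalidFormatException {
            // Assumed base-level check: a "manifest" artifact must exist.
            if (!artifacts.containsKey("manifest")) {
                throw new SketchInvalidFormatException("missing manifest");
            }
        }
    }

    // Hypothetical subclass that adds its own validation step.
    public static class SketchTokenizerFactory extends SketchBaseFactory {
        public SketchTokenizerFactory(Map<String, Object> artifacts) {
            super(artifacts);
        }

        @Override
        public void validateArtifactMap() throws SketchInvalidFormatException {
            // Run the inherited checks first, as the note recommends.
            super.validateArtifactMap();
            // Then the subclass-specific check: an optional "abbreviations"
            // artifact must be a Set if it is present.
            Object abbreviations = artifacts.get("abbreviations");
            if (abbreviations != null && !(abbreviations instanceof Set)) {
                throw new SketchInvalidFormatException("abbreviations must be a Set");
            }
        }
    }

    // Convenience helper: returns "ok" or the validation error message.
    public static String check(Map<String, Object> artifacts) {
        try {
            new SketchTokenizerFactory(artifacts).validateArtifactMap();
            return "ok";
        } catch (SketchInvalidFormatException e) {
            return e.getMessage();
        }
    }
}
```

Because the inherited check runs first, a problem at the base level is reported before the subclass inspects its own artifacts, so the subclass can rely on the base invariants already holding.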
-
createArtifactMap
public Map<String,Object> createArtifactMap()
Description copied from class: BaseToolFactory
A model's implementation should call this method when creating a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
createArtifactMap in class BaseToolFactory
- Returns:
- Retrieves a Map with pairs of keys and objects.
-
createManifestEntries
public Map<String,String> createManifestEntries()
- Overrides:
createManifestEntries in class BaseToolFactory
- Returns:
- Retrieves the manifest entries to be added to the model manifest.
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
- Parameters:
subclassName - The name of the class implementing the TokenizerFactory.
languageCode - The ISO language code the Tokenizer should use.
abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
- Returns:
- A valid TokenizerFactory instance.
- Throws:
InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
-
getAlphaNumericPattern
public Pattern getAlphaNumericPattern()
- Returns:
- Retrieves the (user-)specified alphanumeric Pattern or a default.
-
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
- Returns:
- true if the alphanumeric optimization is enabled, otherwise false.
-
getAbbreviationDictionary
public Dictionary getAbbreviationDictionary()
- Returns:
- The abbreviation Dictionary or null if none is active.
-
getLanguageCode
public String getLanguageCode()
- Returns:
- Retrieves the ISO language code in use.
-
getContextGenerator
public TokenContextGenerator getContextGenerator()
- Returns:
- Retrieves a TokenContextGenerator instance.
-
-