Package opennlp.tools.tokenize
Class TokenizerFactory
- java.lang.Object
  - opennlp.tools.util.BaseToolFactory
    - opennlp.tools.tokenize.TokenizerFactory
public class TokenizerFactory extends BaseToolFactory
The factory that provides Tokenizer default implementations and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
-
Constructor Summary
Constructor and Description
- TokenizerFactory()
  Creates a TokenizerFactory that provides the default implementation of the resources.
- TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
  Creates a TokenizerFactory.
-
Method Summary
Modifier and Type, Method, and Description
- static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
  Factory method the framework uses to create a new TokenizerFactory.
- Map<String,Object> createArtifactMap()
  Creates a Map with pairs of keys and objects.
- Map<String,String> createManifestEntries()
  Creates the manifest entries that will be added to the model manifest.
- Dictionary getAbbreviationDictionary()
  Gets the abbreviation dictionary.
- Pattern getAlphaNumericPattern()
  Gets the alphanumeric pattern.
- TokenContextGenerator getContextGenerator()
  Gets the context generator.
- String getLanguageCode()
  Retrieves the language code.
- boolean isUseAlphaNumericOptmization()
  Gets whether to use alphanumeric optimization.
- void validateArtifactMap()
  Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
-
Constructor Detail
-
TokenizerFactory
public TokenizerFactory()
Creates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
languageCode - the language of the natural text
abbreviationDictionary - an abbreviations dictionary
useAlphaNumericOptimization - if true, alphanumerics are skipped
alphaNumericPattern - null or a custom alphanumeric pattern (the default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
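To make the default alphanumeric pattern concrete, the following sketch checks a few sample strings against the regular expression quoted above. It uses only java.util.regex and does not call OpenNLP; the class name AlphaNumericPatternSketch and the helper isAlphaNumeric are hypothetical names chosen for illustration. It shows which strings match the pattern, not how the tokenizer itself applies the optimization.

```java
import java.util.regex.Pattern;

// Illustration only: this class is not part of OpenNLP.
final class AlphaNumericPatternSketch {

    // The same regular expression the documentation quotes as the default.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns true when the whole token consists of ASCII letters and digits only.
    static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"OpenNLP", "2024", "don't", "U.S.", ""};
        for (String token : tokens) {
            System.out.println("\"" + token + "\" matches: " + isAlphaNumeric(token));
        }
    }
}
```

Strings containing punctuation, such as an apostrophe or a period, do not match, and neither does the empty string, because the pattern requires at least one character.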
-
-
Method Detail
-
validateArtifactMap
public void validateArtifactMap() throws InvalidFormatException
Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
validateArtifactMap in class BaseToolFactory
- Throws:
InvalidFormatException
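The note about invoking super.validateArtifactMap first can be sketched as follows. To keep the example self-contained, it uses stand-in classes rather than the real OpenNLP types: SketchInvalidFormatException, SketchBaseFactory, SketchCustomFactory, and the key "custom.resource" are all hypothetical, and the check performed is an assumed example of a subclass-specific rule.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the library's InvalidFormatException (illustration only).
class SketchInvalidFormatException extends Exception {
    SketchInvalidFormatException(String message) {
        super(message);
    }
}

// Stand-in for a base factory; NOT the real BaseToolFactory.
class SketchBaseFactory {
    final Map<String, Object> artifacts = new HashMap<>();

    public void validateArtifactMap() throws SketchInvalidFormatException {
        // The base class has nothing to check in this sketch.
    }
}

// Hypothetical subclass that expects a String under a made-up key.
class SketchCustomFactory extends SketchBaseFactory {
    static final String KEY = "custom.resource"; // made-up artifact key

    @Override
    public void validateArtifactMap() throws SketchInvalidFormatException {
        super.validateArtifactMap(); // run the inherited checks first
        Object value = artifacts.get(KEY);
        if (value != null && !(value instanceof String)) {
            throw new SketchInvalidFormatException(KEY + " must be a String");
        }
    }

    public static void main(String[] args) {
        SketchCustomFactory factory = new SketchCustomFactory();
        factory.artifacts.put(KEY, 42); // wrong type on purpose
        try {
            factory.validateArtifactMap();
        } catch (SketchInvalidFormatException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling the superclass method before the subclass checks means the subclass can assume that whatever the parent validates is already in a consistent state.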
-
createArtifactMap
public Map<String,Object> createArtifactMap()
Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model implementation should call this method from the constructor that creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
createArtifactMap in class BaseToolFactory
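The idea of a base implementation that returns a HashMap for subclasses to populate can be sketched as below. The classes ArtifactBaseSketch and ArtifactSubclassSketch and the key "example.resource" are hypothetical stand-ins, not OpenNLP types or entry names; the sketch only illustrates the override pattern described above.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in base class; NOT the real BaseToolFactory.
class ArtifactBaseSketch {
    public Map<String, Object> createArtifactMap() {
        return new HashMap<>(); // empty map for subclasses to populate
    }
}

// Hypothetical subclass that contributes one optional resource.
class ArtifactSubclassSketch extends ArtifactBaseSketch {
    static final String ENTRY_NAME = "example.resource"; // made-up key

    private final Object resource;

    ArtifactSubclassSketch(Object resource) {
        this.resource = resource;
    }

    @Override
    public Map<String, Object> createArtifactMap() {
        // Start from the parent's map so inherited entries are preserved.
        Map<String, Object> artifacts = super.createArtifactMap();
        if (resource != null) {
            artifacts.put(ENTRY_NAME, resource); // add only when present
        }
        return artifacts;
    }

    public static void main(String[] args) {
        System.out.println(new ArtifactSubclassSketch("abbreviations").createArtifactMap());
    }
}
```

Skipping the entry when the resource is null keeps optional artifacts out of the map entirely, which is one reasonable way to handle an optional resource such as an abbreviation dictionary.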
-
createManifestEntries
public Map<String,String> createManifestEntries()
Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.
- Overrides:
createManifestEntries in class BaseToolFactory
- Returns:
- the manifest entries to be added to the model manifest
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to create a new TokenizerFactory.
- Parameters:
subclassName - the name of the class implementing the TokenizerFactory
languageCode - the language code the tokenizer should use
abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
useAlphaNumericOptimization - indicates whether the alphanumeric optimization should be enabled or disabled
alphaNumericPattern - the pattern the alphanumeric optimization should use
- Returns:
- the instance of the TokenizerFactory
- Throws:
InvalidFormatException - if one of the input parameters doesn't comply with the expected format
-
getAlphaNumericPattern
public Pattern getAlphaNumericPattern()
Gets the alphanumeric pattern.
- Returns:
- the user-specified alphanumeric pattern, or a default.
-
isUseAlphaNumericOptmization
public boolean isUseAlphaNumericOptmization()
Gets whether to use alphanumeric optimization.
- Returns:
- true if the alphanumeric optimization is enabled, otherwise false
-
getAbbreviationDictionary
public Dictionary getAbbreviationDictionary()
Gets the abbreviation dictionary.
- Returns:
- null or the abbreviation dictionary
-
getLanguageCode
public String getLanguageCode()
Retrieves the language code.
- Returns:
- the language code
-
getContextGenerator
public TokenContextGenerator getContextGenerator()
Gets the context generator.
- Returns:
- a new instance of the context generator
-
-