Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
Constructor Summary
- TokenizerFactory() - Instantiates a TokenizerFactory that provides the default implementation of the resources.
- TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) - Instantiates a TokenizerFactory.

Method Summary
- static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) - Factory method the framework uses to instantiate a new TokenizerFactory.
- createArtifactMap() - A model's implementation should call this method when it creates a model programmatically.
- boolean isUseAlphaNumericOptimization()
- void validateArtifactMap() - Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory: create, create, createArtifactSerializersMap

Constructor Details
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.

TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
- languageCode - The ISO language code to be used for this factory.
- abbreviationDictionary - The Dictionary which holds abbreviations.
- useAlphaNumericOptimization - Whether alphanumerics are skipped or not.
- alphaNumericPattern - null or a custom alphanumeric Pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
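
To make the documented default pattern concrete, here is a minimal sketch that uses only java.util.regex to show which tokens the expression "^[A-Za-z0-9]+$" accepts. The class name AlphaNumericPatternDemo and the helper isAlphaNumeric are hypothetical and not part of OpenNLP. The pattern is compiled locally from the string quoted above rather than read from Factory.DEFAULT_ALPHANUMERIC. How the tokenizer applies the pattern internally is not covered here.

```java
import java.util.regex.Pattern;

public class AlphaNumericPatternDemo {

    // Same expression as the documented default; compiled locally (assumption:
    // this mirrors the default, it is not read from the OpenNLP constant).
    public static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns true when the whole token consists of ASCII letters and digits only.
    public static boolean isAlphaNumeric(String token) {
        return ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"OpenNLP", "v2", "e.g.", "don't", ""};
        for (String token : samples) {
            System.out.println("\"" + token + "\" -> " + isAlphaNumeric(token));
        }
    }
}
```

Tokens containing punctuation, such as abbreviations with a trailing period, do not match, which is one reason an abbreviation Dictionary can be supplied separately.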
 
 
Method Details
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
- validateArtifactMap in class BaseToolFactory
- Throws:
- InvalidFormatException - Thrown if validation found invalid states.
 
createArtifactMap
Description copied from class: BaseToolFactory
A model's implementation should call this method when it creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
- createArtifactMap in class BaseToolFactory
- Returns:
- Retrieves a Map with pairs of keys and objects.
 
createManifestEntries
- Overrides:
- createManifestEntries in class BaseToolFactory
- Returns:
- Retrieves the manifest entries to be added to the model manifest.
 
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
- Parameters:
- subclassName - The name of the class implementing the TokenizerFactory.
- languageCode - The ISO language code the Tokenizer should use.
- abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
- useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
- alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
- Returns:
- A valid TokenizerFactory instance.
- Throws:
- InvalidFormatException - Thrown if one of the input parameters does not comply with the expected format.
 
getAlphaNumericPattern
- Returns:
- Retrieves the (user-)specified alphanumeric Pattern or a default.
 
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
- Returns:
- true if the alphanumeric optimization is enabled, otherwise false.
 
getAbbreviationDictionary
- Returns:
- The abbreviation Dictionary or null if none is active.
 
getLanguageCode
- Returns:
- Retrieves the ISO language code in use.
 
getContextGenerator
- Returns:
- Retrieves a TokenContextGenerator instance.
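
The constructor and getters described above can be combined roughly as follows. This is a sketch, not a definitive usage pattern. It assumes the opennlp-tools library is on the classpath, and it relies on Dictionary's no-argument constructor, Dictionary.put(StringList), and the StringList varargs constructor, none of which are documented on this page. The class name TokenizerFactoryDemo, the language code "en", and the abbreviation entries are illustrative choices only.

```java
import java.util.regex.Pattern;

import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.util.StringList;

public class TokenizerFactoryDemo {

    // Builds a factory programmatically via the four-argument constructor.
    public static TokenizerFactory buildFactory() {
        // Illustrative abbreviation entries (assumption: Dictionary and
        // StringList behave as in the opennlp-tools library).
        Dictionary abbreviations = new Dictionary();
        abbreviations.put(new StringList("e.g."));
        abbreviations.put(new StringList("Dr."));

        // Custom pattern; here it simply repeats the documented default expression.
        Pattern alphaNumeric = Pattern.compile("^[A-Za-z0-9]+$");

        return new TokenizerFactory("en", abbreviations, true, alphaNumeric);
    }

    public static void main(String[] args) {
        TokenizerFactory factory = buildFactory();

        // Read the configuration back through the getters documented above.
        System.out.println("language: " + factory.getLanguageCode());
        System.out.println("optimization: " + factory.isUseAlphaNumericOptimization());
        System.out.println("pattern: " + factory.getAlphaNumericPattern().pattern());
        System.out.println("has abbreviations: " + (factory.getAbbreviationDictionary() != null));
    }
}
```

A subclass that overrides the TokenContextGenerator or Dictionary would typically be passed by name to the static create method instead of being constructed directly.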