public class TokenizerFactory extends BaseToolFactory

The factory that provides Tokenizer default implementations and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.

Constructor and Description |
---|
TokenizerFactory(): Creates a TokenizerFactory that provides the default implementation of the resources. |
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern): Creates a TokenizerFactory. |

Modifier and Type | Method and Description |
---|---|
static TokenizerFactory | create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern): Factory method the framework uses to create a new TokenizerFactory. |
Map<String,Object> | createArtifactMap(): Creates a Map with pairs of keys and objects. |
Map<String,String> | createManifestEntries(): Creates the manifest entries that will be added to the model manifest. |
Dictionary | getAbbreviationDictionary(): Gets the abbreviation dictionary. |
Pattern | getAlphaNumericPattern(): Gets the alphanumeric pattern. |
TokenContextGenerator | getContextGenerator(): Gets the context generator. |
String | getLanguageCode(): Retrieves the language code. |
boolean | isUseAlphaNumericOptmization(): Gets whether to use alphanumeric optimization. |
void | validateArtifactMap(): Validates the parsed artifacts. |

Methods inherited from class BaseToolFactory: create, create, createArtifactSerializersMap
public TokenizerFactory()

Creates a TokenizerFactory that provides the default implementation of the resources.

public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to create a factory programmatically.

Parameters:
languageCode - the language of the natural text
abbreviationDictionary - an abbreviation dictionary
useAlphaNumericOptimization - if true, tokens that match the alphanumeric pattern are skipped, that is, left unsplit
alphaNumericPattern - null or a custom alphanumeric pattern (default: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
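public void validateArtifactMap() throws InvalidFormatException

Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.

Specified by:
validateArtifactMap in class BaseToolFactory
Throws:
InvalidFormatException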
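A minimal JDK-only sketch of the convention in the note above. BaseFactory, ChildFactory, FormatException, and the entry names base.entry and child.entry are hypothetical stand-ins for BaseToolFactory, a TokenizerFactory subclass, InvalidFormatException, and real artifact names; this is not OpenNLP code. Because the subclass calls super.validateArtifactMap() first, a problem in an inherited artifact is reported before the subclass inspects its own entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: BaseFactory ~ BaseToolFactory, ChildFactory ~ a
// TokenizerFactory subclass, FormatException ~ InvalidFormatException.
public class ValidationSketch {

    public static class FormatException extends Exception {
        public FormatException(String message) { super(message); }
    }

    public static class BaseFactory {
        // Artifacts parsed from a model package, keyed by (hypothetical) entry name.
        protected final Map<String, Object> artifacts = new HashMap<>();

        public void validateArtifactMap() throws FormatException {
            // The base level checks only the artifact it owns.
            if (!(artifacts.get("base.entry") instanceof String)) {
                throw new FormatException("base.entry missing or not a String");
            }
        }
    }

    public static class ChildFactory extends BaseFactory {
        @Override
        public void validateArtifactMap() throws FormatException {
            super.validateArtifactMap(); // inherited artifacts are validated first
            // Then the artifact this subclass contributes.
            if (!(artifacts.get("child.entry") instanceof Integer)) {
                throw new FormatException("child.entry missing or not an Integer");
            }
        }
    }

    // Returns null if the entries pass validation, otherwise the failure message.
    public static String validate(Map<String, Object> entries) {
        ChildFactory factory = new ChildFactory();
        factory.artifacts.putAll(entries);
        try {
            factory.validateArtifactMap();
            return null;
        } catch (FormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> entries = new HashMap<>();
        entries.put("child.entry", 42);
        System.out.println(validate(entries)); // base.entry missing or not a String
        entries.put("base.entry", "x");
        System.out.println(validate(entries)); // null
    }
}
```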
public Map<String,Object> createArtifactMap()

Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model's implementation should call this method from the constructor that creates a model programmatically.
The base implementation returns a HashMap that should be populated by subclasses.

Overrides:
createArtifactMap in class BaseToolFactory
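The sketch below illustrates the contract described above using only java.util: the base method returns an empty HashMap, and a subclass adds its own artifact to it. BaseFactory, DictionaryFactory, and the key custom.dictionary are hypothetical; the entry is added only when a dictionary was supplied, in the spirit of the optional abbreviation dictionary.

```java
import java.util.HashMap;
import java.util.Map;

public class ArtifactMapSketch {

    public static class BaseFactory {
        // Base behaviour as documented: an empty, mutable HashMap.
        public Map<String, Object> createArtifactMap() {
            return new HashMap<>();
        }
    }

    public static class DictionaryFactory extends BaseFactory {
        private final Object dictionary; // placeholder for an abbreviation Dictionary

        public DictionaryFactory(Object dictionary) {
            this.dictionary = dictionary;
        }

        @Override
        public Map<String, Object> createArtifactMap() {
            Map<String, Object> artifacts = super.createArtifactMap();
            // Only register the artifact when one was actually supplied.
            if (dictionary != null) {
                artifacts.put("custom.dictionary", dictionary); // hypothetical key
            }
            return artifacts;
        }
    }

    public static void main(String[] args) {
        System.out.println(new DictionaryFactory("abbr").createArtifactMap()); // {custom.dictionary=abbr}
        System.out.println(new DictionaryFactory(null).createArtifactMap());   // {}
    }
}
```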
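public Map<String,String> createManifestEntries()

Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.

Overrides:
createManifestEntries in class BaseToolFactory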
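Manifest entries are a Map<String,String>, so non-string settings such as the optimization flag and the pattern have to be written as text and parsed back when a model is loaded. The JDK-only sketch below shows one way to do that round trip. The key names useAlphaNumericOptimization and alphaNumericPattern are assumptions for illustration and are not necessarily the keys TokenizerFactory writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ManifestSketch {

    // Assumed key names, chosen for illustration only.
    static final String USE_OPT_KEY = "useAlphaNumericOptimization";
    static final String PATTERN_KEY = "alphaNumericPattern";
    static final String DEFAULT_ALPHANUMERIC = "^[A-Za-z0-9]+$";

    // Writing side: every value is converted to its string form.
    public static Map<String, String> createManifestEntries(boolean useOpt, Pattern pattern) {
        Map<String, String> entries = new HashMap<>();
        entries.put(USE_OPT_KEY, Boolean.toString(useOpt));
        if (pattern != null) {
            entries.put(PATTERN_KEY, pattern.pattern()); // the regex source text
        }
        return entries;
    }

    // Reading side: a missing flag parses as false.
    public static boolean readUseOpt(Map<String, String> manifest) {
        return Boolean.parseBoolean(manifest.get(USE_OPT_KEY));
    }

    // Reading side: fall back to the default regex when no pattern was stored.
    public static Pattern readPattern(Map<String, String> manifest) {
        String regex = manifest.get(PATTERN_KEY);
        return Pattern.compile(regex != null ? regex : DEFAULT_ALPHANUMERIC);
    }

    public static void main(String[] args) {
        Map<String, String> manifest = createManifestEntries(true, null);
        System.out.println(readUseOpt(manifest));  // true
        System.out.println(readPattern(manifest)); // ^[A-Za-z0-9]+$
    }
}
```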
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException

Factory method the framework uses to create a new TokenizerFactory.

Parameters:
subclassName - the name of the class implementing the TokenizerFactory
languageCode - the language code the tokenizer should use
abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
useAlphaNumericOptimization - indicates whether the alphanumeric optimization should be enabled or disabled
alphaNumericPattern - the pattern the alphanumeric optimization should use
Throws:
InvalidFormatException - if one of the input parameters does not comply with the expected format

public Pattern getAlphaNumericPattern()

Gets the alphanumeric pattern.
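The static create(String subclassName, ...) method takes the factory subclass as a class name string. The sketch below shows one common way such a method can work, assuming reflective loading through Class.forName, a public no-argument constructor, and settings applied afterwards through an init method. DemoFactory, CustomDemoFactory, and init are hypothetical; treating a null name as a request for the default factory is an assumption of this sketch; IllegalArgumentException stands in for InvalidFormatException. It is not OpenNLP's implementation.

```java
public class CreateByNameSketch {

    // Hypothetical base type standing in for TokenizerFactory.
    public static class DemoFactory {
        private String languageCode;

        public DemoFactory() { }

        // Settings are applied after construction because reflection
        // instantiates the class through its no-argument constructor.
        void init(String languageCode) {
            this.languageCode = languageCode;
        }

        public String getLanguageCode() {
            return languageCode;
        }
    }

    // Hypothetical user subclass, referenced only by its class name.
    public static class CustomDemoFactory extends DemoFactory {
        public CustomDemoFactory() { }
    }

    public static DemoFactory create(String subclassName, String languageCode) {
        DemoFactory factory;
        if (subclassName == null) {
            factory = new DemoFactory(); // assumption: null means "use the default"
        } else {
            try {
                factory = Class.forName(subclassName)
                        .asSubclass(DemoFactory.class) // rejects unrelated classes
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Stands in for the InvalidFormatException the real method declares.
                throw new IllegalArgumentException("Cannot create factory " + subclassName, e);
            }
        }
        factory.init(languageCode);
        return factory;
    }

    public static void main(String[] args) {
        DemoFactory byDefault = create(null, "en");
        DemoFactory custom = create(CustomDemoFactory.class.getName(), "de");
        System.out.println(byDefault.getClass().getSimpleName() + " " + byDefault.getLanguageCode()); // DemoFactory en
        System.out.println(custom.getClass().getSimpleName() + " " + custom.getLanguageCode());       // CustomDemoFactory de
    }
}
```

Passing a name rather than a Class object lets the subclass be recorded as text (for example in a model manifest) and resolved later, which is why the sketch checks the loaded class with asSubclass before instantiating it.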
public boolean isUseAlphaNumericOptmization()

Gets whether to use alphanumeric optimization.
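The constructor documentation states that alphaNumericPattern may be null, in which case the default "^[A-Za-z0-9]+$" applies, and that alphanumerics are skipped when useAlphaNumericOptimization is true. A minimal JDK-only sketch of that decision follows. AlphaNumericSketch and its methods are hypothetical, and the sketch assumes a token counts as alphanumeric only when the entire token matches the pattern.

```java
import java.util.regex.Pattern;

public class AlphaNumericSketch {

    // Same regular expression the documentation lists as the default.
    static final String DEFAULT_ALPHANUMERIC = "^[A-Za-z0-9]+$";

    // Use the caller's pattern if one was given, otherwise the default.
    public static Pattern effectivePattern(Pattern custom) {
        return custom != null ? custom : Pattern.compile(DEFAULT_ALPHANUMERIC);
    }

    // True when the optimization is on and the whole token matches the pattern,
    // i.e. the token would be kept as-is instead of being examined for splits.
    public static boolean isSkipped(String token, Pattern custom, boolean useOptimization) {
        return useOptimization && effectivePattern(custom).matcher(token).matches();
    }

    public static void main(String[] args) {
        // abc123 and Hello match the default pattern; can't and U.S. do not.
        for (String token : new String[] {"abc123", "can't", "U.S.", "Hello"}) {
            System.out.println(token + " skipped=" + isSkipped(token, null, true));
        }
    }
}
```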
public Dictionary getAbbreviationDictionary()

Gets the abbreviation dictionary.

public String getLanguageCode()

Retrieves the language code.

public TokenContextGenerator getContextGenerator()

Gets the context generator.
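Overriding getContextGenerator is the hook the class description points to for supplying a custom TokenContextGenerator. The sketch below imitates that pattern with hypothetical JDK-only types: ContextGenerator is loosely modeled on a generator that returns string features for a candidate split position, and the feature names (char=, prev=, next=) are invented. It is not the OpenNLP interface or its feature set.

```java
public class ContextGeneratorSketch {

    // Hypothetical interface: string features describing a candidate split position.
    public interface ContextGenerator {
        String[] getContext(String text, int index);
    }

    // Minimal default: only the character at the position.
    public static class DefaultGenerator implements ContextGenerator {
        @Override
        public String[] getContext(String text, int index) {
            return new String[] {"char=" + text.charAt(index)};
        }
    }

    // Custom generator: also looks at the neighbouring characters.
    public static class NeighbourGenerator implements ContextGenerator {
        @Override
        public String[] getContext(String text, int index) {
            String prev = index > 0 ? String.valueOf(text.charAt(index - 1)) : "BOS";
            String next = index + 1 < text.length() ? String.valueOf(text.charAt(index + 1)) : "EOS";
            return new String[] {"char=" + text.charAt(index), "prev=" + prev, "next=" + next};
        }
    }

    // Base factory: returns a new default generator on each call.
    public static class BaseFactory {
        public ContextGenerator getContextGenerator() {
            return new DefaultGenerator();
        }
    }

    // Subclass that swaps in the custom generator by overriding the hook.
    public static class NeighbourFactory extends BaseFactory {
        @Override
        public ContextGenerator getContextGenerator() {
            return new NeighbourGenerator();
        }
    }

    public static void main(String[] args) {
        BaseFactory factory = new NeighbourFactory();
        String[] features = factory.getContextGenerator().getContext("can't", 3);
        System.out.println(String.join(" ", features)); // char=' prev=n next=t
    }
}
```

Code that only holds a reference to the base factory picks up the custom features without any other change, which is the point of exposing the generator through an overridable factory method.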
Copyright © 2021 The Apache Software Foundation. All rights reserved.