Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides the default Tokenizer implementation and its resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
Constructor Summary
- TokenizerFactory(): Instantiates a TokenizerFactory that provides the default implementation of the resources.
- TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern): Instantiates a TokenizerFactory.
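Method Summary
- static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern): Factory method the framework uses to instantiate a new TokenizerFactory.
- Map<String,Object> createArtifactMap(): A model's implementation should call this method from the constructor that creates a model programmatically.
- Map<String,String> createManifestEntries()
- Dictionary getAbbreviationDictionary()
- Pattern getAlphaNumericPattern()
- TokenContextGenerator getContextGenerator()
- String getLanguageCode()
- boolean isUseAlphaNumericOptimization()
- void validateArtifactMap(): Validates the parsed artifacts.

Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap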
Constructor Details

TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.

TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to create a factory programmatically.
Parameters:
- languageCode - The ISO language code to be used for this factory.
- abbreviationDictionary - The Dictionary which holds abbreviations.
- useAlphaNumericOptimization - Whether alphanumeric tokens are skipped (kept as single tokens) or not.
- alphaNumericPattern - null or a custom alphanumeric Pattern (default: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
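To see what the default pattern accepts, the stdlib-only sketch below applies the documented regex "^[A-Za-z0-9]+$" to a few sample tokens, using the documented rule that a null alphaNumericPattern means "use the default". The default is ASCII-only, so a word such as "Straße" does not match. The Unicode-aware pattern in main is a hypothetical example of a custom value, not an OpenNLP constant, and resolve() is an illustrative helper, not part of the OpenNLP API.

```java
import java.util.regex.Pattern;

/** Sketch: what the documented default alphanumeric regex accepts. */
public class AlphaNumericPatternDemo {

    // Same regex as the documented default (Factory.DEFAULT_ALPHANUMERIC).
    static final String DEFAULT_REGEX = "^[A-Za-z0-9]+$";

    /** Hypothetical helper mirroring the documented contract: null means "use the default". */
    static Pattern resolve(Pattern custom) {
        return custom != null ? custom : Pattern.compile(DEFAULT_REGEX);
    }

    /** True if the whole token matches the given pattern. */
    static boolean isAlphaNumeric(Pattern p, String token) {
        return p.matcher(token).matches();
    }

    public static void main(String[] args) {
        Pattern ascii = resolve(null);
        // Hypothetical custom pattern: any Unicode letters or digits.
        Pattern unicode = resolve(Pattern.compile("^[\\p{L}\\p{N}]+$"));
        String[] tokens = {"OpenNLP", "2024", "don't", "U.S.", "Stra\u00dfe"};
        for (String t : tokens) {
            System.out.printf("%-8s default=%b custom=%b%n",
                    t, isAlphaNumeric(ascii, t), isAlphaNumeric(unicode, t));
        }
    }
}
```

Punctuated tokens such as "don't" and "U.S." fail both patterns, so they would still be left to the tokenizer's model.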
Method Details

validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
Specified by:
validateArtifactMap in class BaseToolFactory
Throws:
InvalidFormatException - Thrown if validation found invalid states.
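The "invoke super first" convention can be sketched without OpenNLP on the classpath. In the example below, BaseFactory, CustomFactory, ValidationException, and the artifact keys "language" and "abbreviations.dictionary" are hypothetical stand-ins for BaseToolFactory, a user subclass, InvalidFormatException, and real artifact names. The point is only the ordering: inherited checks run before the subclass adds its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of the "call super.validateArtifactMap() first" convention (hypothetical types). */
public class ValidationChainDemo {

    static class ValidationException extends Exception {
        ValidationException(String msg) { super(msg); }
    }

    static class BaseFactory {
        protected final Map<String, Object> artifacts = new HashMap<>();

        // Checks that apply to every factory.
        void validateArtifactMap() throws ValidationException {
            if (!(artifacts.get("language") instanceof String)) {
                throw new ValidationException("missing language artifact");
            }
        }
    }

    static class CustomFactory extends BaseFactory {
        @Override
        void validateArtifactMap() throws ValidationException {
            super.validateArtifactMap(); // run the inherited checks first
            Object abb = artifacts.get("abbreviations.dictionary");
            // The abbreviation artifact is optional, but if present it must be a Set.
            if (abb != null && !(abb instanceof Set)) {
                throw new ValidationException("abbreviation artifact has wrong type");
            }
        }
    }

    /** Returns null if the factory validates, otherwise the failure message. */
    static String check(CustomFactory f) {
        try {
            f.validateArtifactMap();
            return null;
        } catch (ValidationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        CustomFactory f = new CustomFactory();
        System.out.println(check(f)); // the base check fails first
        f.artifacts.put("language", "en");
        f.artifacts.put("abbreviations.dictionary", "not a set");
        System.out.println(check(f)); // now the subclass check fails
    }
}
```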

createArtifactMap
Description copied from class: BaseToolFactory
A model's implementation should call this method from the constructor that creates a model programmatically. The base implementation returns a HashMap that should be populated by subclasses.
Overrides:
createArtifactMap in class BaseToolFactory
Returns:
A Map with pairs of keys and objects.
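The base-returns-an-empty-HashMap pattern can be illustrated with plain Java. The classes below are hypothetical stand-ins, and the key "abbreviations.dictionary" is an assumption chosen for the sketch, not necessarily the key OpenNLP uses. A Set<String> stands in for the Dictionary type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: a subclass populating the map returned by the base implementation. */
public class ArtifactMapDemo {

    static class BaseFactory {
        // Base implementation: an empty, mutable map for subclasses to fill.
        Map<String, Object> createArtifactMap() {
            return new HashMap<>();
        }
    }

    static class CustomFactory extends BaseFactory {
        private final Set<String> abbreviations; // optional, may be null

        CustomFactory(Set<String> abbreviations) {
            this.abbreviations = abbreviations;
        }

        @Override
        Map<String, Object> createArtifactMap() {
            Map<String, Object> artifacts = super.createArtifactMap();
            // Only register optional resources that are actually present.
            if (abbreviations != null) {
                artifacts.put("abbreviations.dictionary", abbreviations);
            }
            return artifacts;
        }
    }

    public static void main(String[] args) {
        System.out.println(new CustomFactory(Set.of("Dr.", "etc.")).createArtifactMap().keySet());
        System.out.println(new CustomFactory(null).createArtifactMap().isEmpty());
    }
}
```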

createManifestEntries
Overrides:
createManifestEntries in class BaseToolFactory
Returns:
The manifest entries to be added to the model manifest.
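Because manifest entries are plain key/value strings, settings such as the optimization flag and the pattern have to be written as text and parsed back when a model is loaded. The sketch below shows that round trip with the standard library only. The key names and the helper methods are hypothetical choices for illustration, not taken from OpenNLP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch: storing tokenizer settings as manifest strings and reading them back. */
public class ManifestEntriesDemo {

    // Hypothetical manifest keys chosen for this sketch.
    static final String KEY_OPT = "useAlphaNumericOptimization";
    static final String KEY_PATTERN = "alphaNumericPattern";

    /** Writes settings as plain strings, since manifest values are text. */
    static Map<String, String> createManifestEntries(boolean useOpt, Pattern pattern) {
        Map<String, String> entries = new HashMap<>();
        entries.put(KEY_OPT, Boolean.toString(useOpt));
        if (pattern != null) {
            entries.put(KEY_PATTERN, pattern.pattern());
        }
        return entries;
    }

    /** Reads the flag back; Boolean.parseBoolean(null) yields false for a missing entry. */
    static boolean readUseOpt(Map<String, String> entries) {
        return Boolean.parseBoolean(entries.get(KEY_OPT));
    }

    /** Reads the pattern back, or null if none was stored. */
    static Pattern readPattern(Map<String, String> entries) {
        String regex = entries.get(KEY_PATTERN);
        return regex == null ? null : Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Map<String, String> m = createManifestEntries(true, Pattern.compile("^[A-Za-z0-9]+$"));
        System.out.println(readUseOpt(m) + " " + readPattern(m).pattern());
    }
}
```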

create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
Parameters:
- subclassName - The name of the class implementing the TokenizerFactory.
- languageCode - The ISO language code the Tokenizer should use.
- abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
- useAlphaNumericOptimization - Whether the alphanumeric optimization should be enabled or not.
- alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
Returns:
A valid TokenizerFactory instance.
Throws:
InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
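A factory method that receives a subclass name typically loads the class reflectively, checks that it extends the expected base type, and calls a no-argument constructor before passing in the remaining settings. The sketch below shows that general technique with the standard library; it is not OpenNLP's implementation. BaseFactory, CustomFactory, the init() hook, and FactoryException (standing in for InvalidFormatException) are all hypothetical.

```java
/** Sketch of a "create by class name" factory method (hypothetical types). */
public class ReflectiveFactoryDemo {

    static class FactoryException extends Exception {
        FactoryException(String msg, Throwable cause) { super(msg, cause); }
    }

    static class BaseFactory {
        String languageCode;

        // Hypothetical init hook, called after the no-arg constructor.
        void init(String languageCode) {
            this.languageCode = languageCode;
        }
    }

    // A user-supplied extension; it only needs a no-arg constructor.
    static class CustomFactory extends BaseFactory { }

    static BaseFactory create(String subclassName, String languageCode) throws FactoryException {
        BaseFactory factory;
        if (subclassName == null) {
            factory = new BaseFactory(); // no subclass requested: use the default
        } else {
            try {
                // Load by name, verify the type, then call the no-arg constructor.
                factory = Class.forName(subclassName)
                        .asSubclass(BaseFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new FactoryException("Could not instantiate " + subclassName, e);
            }
        }
        factory.init(languageCode);
        return factory;
    }

    public static void main(String[] args) throws FactoryException {
        // Nested classes need their binary name (Outer$Inner), which getName() returns.
        BaseFactory f = create(CustomFactory.class.getName(), "en");
        System.out.println(f.getClass().getSimpleName() + " " + f.languageCode);
    }
}
```

Requiring a no-argument constructor plus a separate initialization step is what lets the framework instantiate user subclasses it knows only by name.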

getAlphaNumericPattern
Returns:
The user-specified alphanumeric Pattern, or the default if none was specified.

isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
Returns:
true if the alphanumeric optimization is enabled, otherwise false.
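The sketch below shows one way a tokenizer could consult this flag, assuming the behavior described for useAlphaNumericOptimization: whitespace-separated chunks that match the alphanumeric pattern are kept whole, and only the remaining chunks go to the statistical model. The method name chunksNeedingModel is hypothetical, and the model call itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: which chunks would still need the model when the optimization is on or off. */
public class AlphaNumericSkipDemo {

    /** Returns the whitespace chunks that are NOT skipped by the optimization. */
    static List<String> chunksNeedingModel(String text, boolean useOptimization, Pattern alphaNumeric) {
        List<String> pending = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) continue; // happens only for blank input
            boolean skip = useOptimization && alphaNumeric.matcher(chunk).matches();
            if (!skip) {
                pending.add(chunk);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^[A-Za-z0-9]+$");
        String text = "The tokenizer splits don't and U.S. here";
        System.out.println(chunksNeedingModel(text, true, p));         // only punctuated chunks remain
        System.out.println(chunksNeedingModel(text, false, p).size()); // every chunk remains
    }
}
```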

getAbbreviationDictionary
Returns:
The abbreviation Dictionary, or null if none is active.
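Since the abbreviation dictionary may be null, code that uses it needs a null guard. The sketch below illustrates this with a simplified, hypothetical rule, "split a trailing period unless the token is a known abbreviation". A Set<String> stands in for the Dictionary type. The rule is an illustration only and does not describe how OpenNLP actually uses abbreviations internally.

```java
import java.util.Set;

/** Sketch: a null-safe lookup in an optional abbreviation list (simplified rule). */
public class AbbreviationDemo {

    /** Returns true if the final '.' of the token should be split off. */
    static boolean splitTrailingPeriod(String token, Set<String> abbreviations) {
        if (!token.endsWith(".") || token.length() == 1) {
            return false; // nothing to split
        }
        // The dictionary is optional, so guard against null.
        if (abbreviations != null && abbreviations.contains(token)) {
            return false; // known abbreviation: keep "Dr." intact
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> abbr = Set.of("Dr.", "etc.");
        System.out.println(splitTrailingPeriod("Dr.", abbr));    // false
        System.out.println(splitTrailingPeriod("house.", abbr)); // true
        System.out.println(splitTrailingPeriod("Dr.", null));    // true
    }
}
```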

getLanguageCode
Returns:
The ISO language code in use.

getContextGenerator
Returns:
A TokenContextGenerator instance.
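A context generator turns a candidate split position into string features for the model to score. The sketch below conveys the idea with a hypothetical ContextGenerator interface and a deliberately small feature set (text before and after the position, plus coarse character classes). It is not OpenNLP's TokenContextGenerator interface or its default feature set. Subclassing the factory is how the class description suggests swapping in a different generator.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a context generator for candidate split positions (hypothetical API). */
public class ContextGeneratorDemo {

    interface ContextGenerator {
        /** Features for a split between token[index - 1] and token[index]; 1 <= index < length. */
        String[] getContext(String token, int index);
    }

    /** Describes a character with a coarse class label. */
    static String charClass(char c) {
        if (Character.isLetter(c)) return Character.isUpperCase(c) ? "upper" : "lower";
        if (Character.isDigit(c)) return "digit";
        return "other";
    }

    static final ContextGenerator SIMPLE = (token, index) -> {
        List<String> features = new ArrayList<>();
        // Text before and after the candidate split point.
        features.add("prefix=" + token.substring(0, index));
        features.add("suffix=" + token.substring(index));
        // Classes of the two characters adjacent to the split point.
        features.add("prevClass=" + charClass(token.charAt(index - 1)));
        features.add("nextClass=" + charClass(token.charAt(index)));
        return features.toArray(new String[0]);
    };

    public static void main(String[] args) {
        // Should "don't" be split between "do" and "n't"? Index 2 is the candidate.
        for (String f : SIMPLE.getContext("don't", 2)) {
            System.out.println(f);
        }
    }
}
```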