Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
Constructor Summary
Constructors
Constructor: TokenizerFactory()
Description: Instantiates a TokenizerFactory that provides the default implementation of the resources.
Constructor: TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Description: Instantiates a TokenizerFactory.
-
Method Summary
Modifier and Type: static TokenizerFactory
Method: create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Description: Factory method the framework uses to instantiate a new TokenizerFactory.
Modifier and Type: Map
Method: createArtifactMap()
Description: A model's implementation should call this constructor that creates a model programmatically.
Modifier and Type: boolean
Method: isUseAlphaNumericOptimization()
Modifier and Type: void
Method: validateArtifactMap()
Description: Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
-
Constructor Details
-
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
  languageCode - The ISO language code to be used for this factory.
  abbreviationDictionary - The Dictionary which holds abbreviations.
  useAlphaNumericOptimization - Whether alphanumerics are skipped, or not.
  alphaNumericPattern - null or a custom alphanumeric Pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
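To make the alphaNumericPattern parameter more concrete, the following minimal sketch compiles the default expression quoted above with java.util.regex and checks a few candidate strings against it. The class and method names (AlphaNumericPatternSketch, isAlphaNumeric) are hypothetical and not part of OpenNLP, and it is an assumption here that the pattern is applied to a whole candidate token.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of OpenNLP: shows which strings the default
// alphanumeric expression from the parameter description accepts.
class AlphaNumericPatternSketch {

    // Same expression as the documented default "^[A-Za-z0-9]+$".
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns true if the whole candidate consists only of ASCII letters and digits.
    static boolean isAlphaNumeric(String candidate) {
        return DEFAULT_ALPHANUMERIC.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // "Z\u00fcrich" contains a non-ASCII letter (u with diaeresis).
        String[] candidates = {"OpenNLP2", "e.g.", "don't", "Z\u00fcrich"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAlphaNumeric(candidate));
        }
    }
}
```

Only "OpenNLP2" matches here. Candidates containing punctuation or non-ASCII letters do not, which is one plausible reason to pass a custom Pattern instead of null for languages with a larger alphabet.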
-
-
Method Details
-
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by: validateArtifactMap in class BaseToolFactory
- Throws: InvalidFormatException - Thrown if validation found invalid states.
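The note about calling super.validateArtifactMap first can be illustrated with a self-contained sketch. All types below (SketchInvalidFormatException, SketchBaseFactory, SketchTokenizerFactory), the "abbreviations" key, and both validation rules are stand-ins invented for illustration; they are not the OpenNLP classes and do not reflect what the library actually checks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in exception for illustration only; NOT OpenNLP's InvalidFormatException.
class SketchInvalidFormatException extends Exception {
    SketchInvalidFormatException(String message) { super(message); }
}

// Stand-in base factory holding named artifacts (hypothetical).
class SketchBaseFactory {
    protected final Map<String, Object> artifacts = new HashMap<>();

    void put(String key, Object value) { artifacts.put(key, value); }

    // Hypothetical base rule: no artifact entry may be null.
    public void validateArtifactMap() throws SketchInvalidFormatException {
        for (Map.Entry<String, Object> entry : artifacts.entrySet()) {
            if (entry.getValue() == null) {
                throw new SketchInvalidFormatException("Missing artifact: " + entry.getKey());
            }
        }
    }
}

// Stand-in subclass that adds its own check after the inherited ones.
class SketchTokenizerFactory extends SketchBaseFactory {
    static final String ABBREVIATIONS_KEY = "abbreviations"; // hypothetical key

    @Override
    public void validateArtifactMap() throws SketchInvalidFormatException {
        // Run the inherited checks first, as the note recommends.
        super.validateArtifactMap();
        // Hypothetical subclass rule: abbreviations, if present, must be a Set.
        Object abbreviations = artifacts.get(ABBREVIATIONS_KEY);
        if (abbreviations != null && !(abbreviations instanceof Set)) {
            throw new SketchInvalidFormatException("Abbreviations must be a Set");
        }
    }

    public static void main(String[] args) {
        SketchTokenizerFactory factory = new SketchTokenizerFactory();
        factory.put(ABBREVIATIONS_KEY, new HashSet<String>());
        try {
            factory.validateArtifactMap();
            System.out.println("artifacts are valid");
        } catch (SketchInvalidFormatException e) {
            System.out.println("invalid: " + e.getMessage());
        }
    }
}
```

Calling the superclass check first means the subclass-specific rule only runs on artifacts that already passed the inherited validation, so it can rely on those guarantees.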
-
createArtifactMap
Description copied from class: BaseToolFactory
A model's implementation should call this constructor that creates a model programmatically.
The base implementation will return a HashMap that should be populated by subclasses.
- Overrides: createArtifactMap in class BaseToolFactory
- Returns: Retrieves a Map with pairs of keys and objects.
-
createManifestEntries
- Overrides: createManifestEntries in class BaseToolFactory
- Returns: Retrieves the manifest entries to be added to the model manifest.
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
- Parameters:
  subclassName - The name of the class implementing the TokenizerFactory.
  languageCode - The ISO language code the Tokenizer should use.
  abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
  useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
  alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
- Returns: A valid TokenizerFactory instance.
- Throws: InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
-
getAlphaNumericPattern
- Returns: Retrieves the (user-)specified alphanumeric Pattern or a default.
-
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
- Returns: true if the alphanumeric optimization is enabled, otherwise false.
-
getAbbreviationDictionary
- Returns: The abbreviation Dictionary or null if none is active.
-
getLanguageCode
- Returns: Retrieves the ISO language code in use.
-
getContextGenerator
- Returns: Retrieves a TokenContextGenerator instance.
-