Package opennlp.tools.postag
Class POSTaggerME
java.lang.Object
opennlp.tools.postag.POSTaggerME
- All Implemented Interfaces:
opennlp.tools.ml.Probabilistic, opennlp.tools.postag.POSTagger
@ThreadSafe
public class POSTaggerME
extends Object
implements opennlp.tools.postag.POSTagger, opennlp.tools.ml.Probabilistic
A part-of-speech tagger implementation that uses maximum entropy.
Tries to predict whether words are nouns, verbs, or any other POS tags depending on their surrounding context.
A POS tagger instance is thread-safe. One instance can be shared across multiple threads to save both
memory and model load time (loading a POSModel is the dominant startup cost; sharing one tagger
avoids paying it per-thread).
Note: Thread safety uses LastResultOwnerOrThreadLocal (and related patterns elsewhere) so
probs() sees per-thread last results without pinning unnecessary ThreadLocal entries for
single-threaded short-lived instances. In container environments with classloader isolation (e.g. Jakarta
EE), ensure instances do not outlive the application's lifecycle.
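The per-thread behaviour described in the note can be pictured with a plain ThreadLocal. The sketch below is a hypothetical stand-in, not the OpenNLP implementation, and it does not reproduce LastResultOwnerOrThreadLocal. The class name PerThreadLastResultSketch, the fake probabilities, and the helper probsLengthOnWorker are assumptions made purely for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, NOT the OpenNLP implementation: it only illustrates
// how one shared instance can keep its "last result" separate per thread.
class PerThreadLastResultSketch {

    private final ThreadLocal<double[]> lastProbs = new ThreadLocal<>();

    // Pretends to tag a sentence; the fake "probability" is 1 / sentence length.
    String[] tag(String[] sentence) {
        String[] tags = new String[sentence.length];
        double[] probs = new double[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            tags[i] = "X";
            probs[i] = 1.0 / sentence.length;
        }
        lastProbs.set(probs); // visible only to the calling thread
        return tags;
    }

    // Returns a copy of the probabilities from this thread's last tag() call.
    double[] probs() {
        double[] p = lastProbs.get();
        if (p == null) {
            throw new IllegalStateException("tag() has not been called on this thread");
        }
        return p.clone();
    }

    // Drops this thread's entry so a pooled thread does not keep it alive.
    void clearThreadLocalState() {
        lastProbs.remove();
    }

    // Tags on a worker thread and reports how many probabilities that thread saw.
    static int probsLengthOnWorker(PerThreadLastResultSketch tagger, String[] sentence) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = pool.submit(() -> {
                try {
                    tagger.tag(sentence);
                    return tagger.probs().length;
                } finally {
                    tagger.clearThreadLocalState(); // clean up before the thread is reused
                }
            });
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The same shape should carry over to the real tagger: call tag and read probs() on the same thread, and call clearThreadLocalState() in a finally block before a pooled thread is handed back.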
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static final int DEFAULT_BEAM_SIZE: The default beam size value is 3.
-
Constructor Summary
Constructors (Constructor, Description):
- POSTaggerME(String language): Initializes a POSTaggerME by downloading a default model for a given language.
- POSTaggerME(String language, POSTagFormat format): Initializes a POSTaggerME by downloading a default model for a given language.
- POSTaggerME(POSModel model): Initializes a POSTaggerME with the provided model.
- POSTaggerME(POSModel model, POSTagFormat format): Initializes a POSTaggerME with the provided model.
- POSTaggerME(POSModel model, POSTagFormat format, int contextCacheSize): Initializes a POSTaggerME with the provided model and explicit cache configuration.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- static Dictionary buildNGramDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, int cutoff): Constructs an nGram dictionary from an ObjectStream of samples.
- void clearThreadLocalState(): Removes thread-local state to prevent classloader leaks in container environments.
- String[] getAllPosTags()
- String[] getOrderedTags(List<String> words, List<String> tags, int index)
- String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
- static void populatePOSDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.postag.MutableTagDictionary dict, int cutoff): Populates a POSDictionary from an ObjectStream of samples.
- double[] probs(): The sequence was determined based on the previous call to tag(String[]).
- void probs(double[] probs): Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
- String[][] tag(int numTaggings, String[] sentence): Returns at most the specified numTaggings for the specified sentence.
- String[] tag(String[] sentence)
- String[] tag(String[] sentence, Object[] additionalContext)
- opennlp.tools.util.Sequence[] topKSequences(String[] sentence)
- opennlp.tools.util.Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
- static POSModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.util.TrainingParameters mlParams, POSTaggerFactory posFactory): Starts a training of a POSModel with the given parameters.
-
Field Details
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
The default beam size value is 3.
-
Constructor Details
-
POSTaggerME
public POSTaggerME(String language) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
- Parameters:
language - An ISO-conformant language code.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
-
POSTaggerME
public POSTaggerME(String language, POSTagFormat format) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
- Parameters:
language - An ISO-conformant language code.
format - A valid POSTagFormat.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
-
POSTaggerME
public POSTaggerME(POSModel model)
Initializes a POSTaggerME with the provided model.
- Parameters:
model - A valid POSModel.
-
POSTaggerME
public POSTaggerME(POSModel model, POSTagFormat format)
Initializes a POSTaggerME with the provided model.
- Parameters:
model - A valid POSModel.
format - A valid POSTagFormat.
-
POSTaggerME
public POSTaggerME(POSModel model, POSTagFormat format, int contextCacheSize)
Initializes a POSTaggerME with the provided model and explicit cache configuration.
- Parameters:
model - A valid POSModel.
format - A valid POSTagFormat.
contextCacheSize - Size of the per-thread context generator cache. Use 0 to disable caching, -1 for the default (the beam size), or a positive value for an explicit size. Values less than -1 are not allowed.
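The contextCacheSize contract above can be written out as a small check. This is a minimal sketch under stated assumptions: ContextCacheSizeSketch and resolve are hypothetical names, not part of OpenNLP, and the documentation does not say which exception the real constructor throws for values below -1, so IllegalArgumentException is an assumption.

```java
// Hypothetical helper, not part of OpenNLP: it mirrors the documented
// contract for the contextCacheSize constructor argument.
class ContextCacheSizeSketch {

    // Documented default beam size.
    static final int DEFAULT_BEAM_SIZE = 3;

    // Returns the effective cache size: 0 means caching is disabled,
    // -1 falls back to the beam size, any positive value is used as given.
    static int resolve(int contextCacheSize, int beamSize) {
        if (contextCacheSize < -1) {
            // Assumed exception type; the source only says such values are not allowed.
            throw new IllegalArgumentException(
                "contextCacheSize must be >= -1, got " + contextCacheSize);
        }
        return contextCacheSize == -1 ? beamSize : contextCacheSize;
    }
}
```

For example, resolve(-1, ContextCacheSizeSketch.DEFAULT_BEAM_SIZE) yields the beam size, while resolve(0, ...) yields 0, which the documentation describes as caching disabled.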
-
-
Method Details
-
getAllPosTags
public String[] getAllPosTags()
- Returns:
An array of all possible part-of-speech tags from the tagger.
-
tag
public String[] tag(String[] sentence)
- Specified by:
tag in interface opennlp.tools.postag.POSTagger
-
tag
public String[] tag(String[] sentence, Object[] additionalContext)
- Specified by:
tag in interface opennlp.tools.postag.POSTagger
-
tag
public String[][] tag(int numTaggings, String[] sentence)
Returns at most the specified numTaggings for the specified sentence.
- Parameters:
numTaggings - The number of taggings to be returned.
sentence - An array of tokens which make up a sentence.
- Returns:
At most the specified number of taggings for the specified sentence.
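The "at most" wording means the result can be shorter than numTaggings when fewer candidate sequences exist. The sketch below only illustrates that contract. It is a hypothetical illustration, not OpenNLP's search: TopTaggingsSketch, Candidate, and the scores are all made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; OpenNLP's own sequence search is not shown here.
class TopTaggingsSketch {

    // A candidate tag sequence with a made-up score (higher is better).
    static final class Candidate {
        final String[] tags;
        final double score;

        Candidate(String[] tags, double score) {
            this.tags = tags;
            this.score = score;
        }

        double score() {
            return score;
        }
    }

    // Returns the best-scoring tag sequences, never more than numTaggings.
    static String[][] best(List<Candidate> candidates, int numTaggings) {
        if (numTaggings < 0) {
            throw new IllegalArgumentException("numTaggings must be non-negative");
        }
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Candidate::score).reversed());
        int n = Math.min(numTaggings, sorted.size());
        String[][] result = new String[n][];
        for (int i = 0; i < n; i++) {
            result[i] = sorted.get(i).tags;
        }
        return result;
    }
}
```

Callers of the real method should therefore size any follow-up loop by the length of the returned array rather than by numTaggings.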
-
topKSequences
public opennlp.tools.util.Sequence[] topKSequences(String[] sentence)
- Specified by:
topKSequences in interface opennlp.tools.postag.POSTagger
-
topKSequences
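public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
- Specified by:
topKSequences in interface opennlp.tools.postag.POSTagger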
-
probs
public void probs(double[] probs)
Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
- Parameters:
probs - An array to put the probabilities into.
-
probs
public double[] probs()
Returns the probabilities of the last tagged sequence. The sequence was determined by the previous call to tag(String[]).
- Specified by:
probs in interface opennlp.tools.ml.Probabilistic
- Returns:
An array with as many probabilities as tokens were passed to tag(String[]) when it was last called.
-
clearThreadLocalState
public void clearThreadLocalState()
Removes thread-local state to prevent classloader leaks in container environments. Call this when the thread is returned to a pool or the tagger is no longer needed.
-
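getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index)
-
getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)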
-
train
public static POSModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.util.TrainingParameters mlParams, POSTaggerFactory posFactory) throws IOException
Starts a training of a POSModel with the given parameters.
- Parameters:
languageCode - The ISO language code to train the model. Must not be null.
samples - The ObjectStream of POSSample used as input for training.
mlParams - The TrainingParameters for the context of the training process.
posFactory - The POSTaggerFactory for creating related objects as defined via mlParams.
- Returns:
A valid, trained POSModel instance.
- Throws:
IOException - Thrown if IO errors occurred.
-
buildNGramDictionary
public static Dictionary buildNGramDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, int cutoff) throws IOException
Constructs an nGram dictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
cutoff - A non-negative cut-off value.
- Returns:
A valid Dictionary instance holding nGrams.
- Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
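To make the role of a frequency cut-off concrete, here is a minimal sketch that counts single tokens and drops the rare ones. It is a hypothetical illustration, not OpenNLP's implementation: CutoffDictionarySketch and build are made-up names, the input is a plain list of token arrays rather than an ObjectStream of POSSample, and the rule "keep entries seen at least cutoff times" is an assumption about the semantics.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a frequency cut-off; not OpenNLP's implementation.
class CutoffDictionarySketch {

    // Keeps every token seen at least `cutoff` times (assumed semantics).
    static Set<String> build(List<String[]> sentences, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be non-negative");
        }
        // Count how often each token occurs across all sentences.
        Map<String, Integer> counts = new HashMap<>();
        for (String[] sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // Retain only the tokens that reach the cut-off.
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

Under this assumed rule, a cut-off of 0 keeps every token, and raising it trades coverage for a smaller dictionary built from more reliable counts.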
-
populatePOSDictionary
public static void populatePOSDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.postag.MutableTagDictionary dict, int cutoff) throws IOException
Populates a POSDictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
dict - The MutableTagDictionary to use during population.
cutoff - A non-negative cut-off value.
- Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
-