Package opennlp.tools.lemmatizer
Class LemmatizerME
java.lang.Object
opennlp.tools.lemmatizer.LemmatizerME
- All Implemented Interfaces:
opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic
@ThreadSafe
public class LemmatizerME
extends Object
implements opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic
A probabilistic Lemmatizer implementation.
Tries to predict the induced permutation class for each word depending on its surrounding context.
A lemmatizer instance is thread-safe. One instance can be shared across multiple threads to save memory.
Note: In container environments with classloader isolation (e.g. Jakarta EE), ensure instances do
not outlive the application's lifecycle, as underlying components use ThreadLocal state that may
pin the classloader.
Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
-
Field Summary
Fields
Modifier and Type, Field
static final int DEFAULT_BEAM_SIZE
static final int LEMMA_NUMBER
-
Constructor Summary
Constructors
LemmatizerME(LemmatizerModel model)
-
Method Summary
Modifier and Type, Method: Description
void clearThreadLocalState(): Removes thread-local state to prevent classloader leaks in container environments.
static String[] decodeLemmas(String[] toks, String[] preds): Decodes the lemma from the word and the induced lemma class.
static String[] encodeLemmas(String[] toks, String[] lemmas): Encodes the word given its lemmas.
String[] lemmatize(String[], String[])
String[][] predictLemmas(int numLemmas, String[] toks, String[] tags): Predict all possible lemmas (using a default upper bound).
String[] predictSES(String[] toks, String[] tags): Predict Short Edit Script (automatically induced lemma class).
double[] probs(): The sequence was determined based on the previous call to lemmatize(String[], String[]).
void probs(double[] probs): Populates the specified array with the probabilities of the last decoded sequence.
opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags)
opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
static LemmatizerModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.lemmatizer.LemmaSample> samples, opennlp.tools.util.TrainingParameters params, LemmatizerFactory factory): Starts a training of a LemmatizerModel with the given parameters.
-
Field Details
-
LEMMA_NUMBER
public static final int LEMMA_NUMBER
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
-
-
Constructor Details
-
LemmatizerME
- Parameters:
model - The LemmatizerModel to be used.
-
-
Method Details
-
lemmatize
- Specified by:
lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
-
lemmatize
- Specified by:
lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
-
predictSES
Predict Short Edit Script (automatically induced lemma class).
- Parameters:
toks - An array of tokens.
tags - An array of POS tags.
- Returns:
An array of possible lemma classes for each token in toks.
-
predictLemmas
Predict all possible lemmas (using a default upper bound).
- Parameters:
numLemmas - The default number of lemmas.
toks - An array of tokens.
tags - An array of POS tags.
- Returns:
A 2-dimensional array containing all possible lemmas for each token and POS tag pair.
-
decodeLemmas
Decodes the lemma from the word and the induced lemma class.
- Parameters:
toks - An array of tokens.
preds - An array of predicted lemma classes.
- Returns:
The array of decoded lemmas.
-
encodeLemmas
Encodes the word given its lemmas.
- Parameters:
toks - An array of tokens.
lemmas - An array of lemmas.
- Returns:
The array of lemma classes.
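To make the idea of a lemma class concrete, the following minimal sketch round-trips tokens through a simplified suffix-rewrite label. It is only an illustration under stated assumptions and is not the Short Edit Script encoding used by decodeLemmas and encodeLemmas: the class SuffixLemmaClasses, its methods, and the "strip|suffix" label format are all hypothetical.

```java
/**
 * Hypothetical, simplified lemma classes (NOT the OpenNLP encoding).
 * A label "n|s" means: remove the last n characters of the token, then append s.
 */
final class SuffixLemmaClasses {

    /** Derives the label that rewrites token into lemma. */
    static String encode(String token, String lemma) {
        int common = 0;
        int max = Math.min(token.length(), lemma.length());
        // Length of the shared prefix of token and lemma.
        while (common < max && token.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        int strip = token.length() - common;
        return strip + "|" + lemma.substring(common);
    }

    /** Applies a label to a token to recover the lemma. */
    static String decode(String token, String label) {
        int sep = label.indexOf('|');
        int strip = Integer.parseInt(label.substring(0, sep));
        if (strip > token.length()) {
            throw new IllegalArgumentException("Label does not fit token: " + token);
        }
        return token.substring(0, token.length() - strip) + label.substring(sep + 1);
    }

    /** Array form: one label per token/lemma pair. */
    static String[] encodeAll(String[] toks, String[] lemmas) {
        if (toks.length != lemmas.length) {
            throw new IllegalArgumentException("toks and lemmas must have equal length");
        }
        String[] labels = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            labels[i] = encode(toks[i], lemmas[i]);
        }
        return labels;
    }

    /** Array form: one decoded lemma per token/label pair. */
    static String[] decodeAll(String[] toks, String[] labels) {
        if (toks.length != labels.length) {
            throw new IllegalArgumentException("toks and labels must have equal length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], labels[i]);
        }
        return lemmas;
    }
}
```

In this sketch, encode("studies", "study") yields the label 3|y, and applying that same label to an unseen token, decode("carries", "3|y"), yields carry. This is the reason a classifier can predict a small set of classes instead of the lemmas themselves.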
-
topKSequences
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
- Returns:
The top-k sequences.
-
topKSequences
public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
minSequenceScore - The minimum score to be achieved.
- Returns:
The top-k sequences.
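The interplay of a top-k limit and a minimum sequence score can be sketched with a tiny beam search over per-token label scores. This is a hypothetical illustration, not the search implementation behind topKSequences: the class TinyBeamSearch, its Scored type, and the assumption that a sequence score is a sum of log-probabilities are all introduced here for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical beam search sketch (not the OpenNLP implementation). */
final class TinyBeamSearch {

    /** A label sequence together with its accumulated log-probability. */
    static final class Scored {
        final int[] labels;
        final double score;

        Scored(int[] labels, double score) {
            this.labels = labels;
            this.score = score;
        }
    }

    /**
     * @param logProbs logProbs[t][c] is the assumed log-probability of label c at token t
     * @param k maximum number of sequences kept after each token
     * @param minSequenceScore sequences scoring below this value are dropped
     */
    static List<Scored> topK(double[][] logProbs, int k, double minSequenceScore) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<Scored> beam = new ArrayList<>();
        beam.add(new Scored(new int[0], 0.0));
        for (double[] tokenScores : logProbs) {
            List<Scored> next = new ArrayList<>();
            for (Scored prefix : beam) {
                for (int label = 0; label < tokenScores.length; label++) {
                    double score = prefix.score + tokenScores[label];
                    // Log-probabilities are <= 0, so a prefix below the
                    // threshold can never recover and is pruned early.
                    if (score < minSequenceScore) {
                        continue;
                    }
                    int[] labels = Arrays.copyOf(prefix.labels, prefix.labels.length + 1);
                    labels[labels.length - 1] = label;
                    next.add(new Scored(labels, score));
                }
            }
            // Best score first, then keep at most k candidates.
            next.sort((a, b) -> Double.compare(b.score, a.score));
            beam = new ArrayList<>(next.subList(0, Math.min(k, next.size())));
        }
        return beam;
    }
}
```

With label probabilities 0.6/0.4 for the first token and 0.9/0.1 for the second, k = 2 and no threshold keeps the sequences [0, 0] and [1, 0]. Raising the threshold to log(0.5) leaves only [0, 0], so the result may contain fewer than k sequences.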
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).
- Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.
-
probs
public double[] probs()
Returns the probabilities of the last decoded sequence, as determined by the previous call to lemmatize(String[], String[]).
- Specified by:
probs in interface opennlp.tools.ml.Probabilistic
- Returns:
An array with the same number of probabilities as tokens were sent to lemmatize(String[], String[]) when it was last called.
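The two probs variants differ only in who allocates the array. The sketch below mimics that contract with a hypothetical stand-in class, LastDecodeProbs; it does not reproduce LemmatizerME internals, and the explicit size check is an assumption made for the example, since the text above only states that the array should be large enough.

```java
/** Hypothetical stand-in showing the two ways to read per-token probabilities. */
final class LastDecodeProbs {

    private double[] last = new double[0];

    /** Simulates a decode call that records one probability per token. */
    void remember(double[] perTokenProbs) {
        last = perTokenProbs.clone();
    }

    /** Allocating variant: returns a fresh array, one entry per token. */
    double[] probs() {
        return last.clone();
    }

    /** Buffer variant: fills a caller-supplied array, which may be larger than needed. */
    void probs(double[] target) {
        if (target.length < last.length) {
            // Assumption for this sketch: reject buffers that are too small.
            throw new IllegalArgumentException(
                "Buffer holds " + target.length + " values, need " + last.length);
        }
        System.arraycopy(last, 0, target, 0, last.length);
    }
}
```

Reusing one buffer across calls avoids an allocation per sentence. Entries beyond the token count are left untouched in this sketch, so the caller needs to track how many values are valid.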
-
clearThreadLocalState
public void clearThreadLocalState()
Removes thread-local state to prevent classloader leaks in container environments. Call when the thread is returned to a pool or the lemmatizer is no longer needed.
-
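The cleanup pattern described for clearThreadLocalState can be sketched with plain JDK classes. PerThreadScratch below is a hypothetical stand-in for a component that caches per-thread state; it only shows where the cleanup call belongs when work runs on pooled threads and makes no claim about how LemmatizerME stores its state.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a component that keeps per-thread state. */
final class PerThreadScratch {

    private final ThreadLocal<double[]> lastProbs = new ThreadLocal<>();

    /** Simulates a call that stores results for the calling thread. */
    void process(int tokenCount) {
        lastProbs.set(new double[tokenCount]);
    }

    /** True if the calling thread currently holds cached state. */
    boolean hasState() {
        return lastProbs.get() != null;
    }

    /** Same intent as clearThreadLocalState(): drop the calling thread's entry. */
    void clearThreadLocalState() {
        lastProbs.remove();
    }

    /** Runs one task on a pooled thread and cleans up before the thread is reused. */
    static boolean runOnPooledThread(PerThreadScratch scratch) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> {
                try {
                    scratch.process(3);
                    return scratch.hasState();
                } finally {
                    // Always clear on the worker thread itself: ThreadLocal
                    // entries can only be removed by the thread that owns them.
                    scratch.clearThreadLocalState();
                }
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The key design point is the inner finally block: the cleanup runs on the pooled thread that did the work, so it also runs when the task fails.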
train
public static LemmatizerModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.lemmatizer.LemmaSample> samples, opennlp.tools.util.TrainingParameters params, LemmatizerFactory factory) throws IOException
Starts the training of a LemmatizerModel with the given parameters.
- Parameters:
languageCode - The ISO-conformant language code.
samples - The ObjectStream of LemmaSample used as input for training.
params - The TrainingParameters for the context of the training.
factory - The LemmatizerFactory for creating related objects defined via params.
- Returns:
A valid, trained LemmatizerModel instance.
- Throws:
IOException - Thrown if IO errors occurred.
-
topKLemmaClasses
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
- Returns:
The top-k lemma classes.
-
topKLemmaClasses
public opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
minSequenceScore - The minimum score to be achieved.
- Returns:
The top-k lemma classes.
-