Package opennlp.tools.lemmatizer
Class LemmatizerME
java.lang.Object
opennlp.tools.lemmatizer.LemmatizerME
- All Implemented Interfaces:
- Lemmatizer
A probabilistic Lemmatizer implementation.
Tries to predict the induced permutation class for each word depending on its surrounding context.
Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
- 
Field Summary
Fields
- static final int LEMMA_NUMBER
- static final int DEFAULT_BEAM_SIZE
- 
Constructor Summary
Constructors
- LemmatizerME(LemmatizerModel model)
- 
Method Summary
Modifier and Type, Method, Description
- static String[] decodeLemmas(String[] toks, String[] preds): Decodes the lemma from the word and the induced lemma class.
- static String[] encodeLemmas(String[] toks, String[] lemmas): Encodes the word given its lemmas.
- String[] lemmatize(String[] toks, String[] tags): Generates lemmas for the word and postag.
- lemmatize (second overload): Generates lemma tags for the word and postag.
- String[][] predictLemmas(int numLemmas, String[] toks, String[] tags): Predict all possible lemmas (using a default upper bound).
- String[] predictSES(String[] toks, String[] tags): Predict Short Edit Script (automatically induced lemma class).
- double[] probs(): Returns an array with the probabilities of the last decoded sequence.
- void probs(double[] probs): Populates the specified array with the probabilities of the last decoded sequence.
- Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
- Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
- Sequence[] topKSequences(String[] sentence, String[] tags)
- Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
- static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory): Starts a training of a LemmatizerModel with the given parameters.
- 
Field Details
LEMMA_NUMBER
public static final int LEMMA_NUMBER
- 
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
 
 
- 
- 
Constructor Details
LemmatizerME
- Parameters:
- model - The LemmatizerModel to be used.
 
 
- 
- 
Method Details
lemmatize
Description copied from interface: Lemmatizer
Generates lemmas for the word and postag.
- Specified by:
- lemmatize in interface Lemmatizer
- Parameters:
- toks - An array of the tokens.
- tags - An array of the POS tags.
- Returns:
- An array of possible lemmas for each token in the toks sequence.
 
- 
lemmatize
Description copied from interface: Lemmatizer
Generates lemma tags for the word and postag.
- Specified by:
- lemmatize in interface Lemmatizer
- Parameters:
- toks - An array of the tokens.
- tags - An array of the POS tags.
- Returns:
- A list of every possible lemma for each token in the toks sequence.
 
- 
predictSES
Predicts the Short Edit Script (SES), an automatically induced lemma class, for each token.
- Parameters:
- toks - An array of tokens.
- tags - An array of postags.
- Returns:
- An array of possible lemma classes for each token in toks.
 
- 
predictLemmas
Predict all possible lemmas (using a default upper bound).
- Parameters:
- numLemmas - The default number of lemmas.
- toks - An array of tokens.
- tags - An array of postags.
- Returns:
- A 2-dimensional array containing all possible lemmas for each token and postag pair.
 
- 
decodeLemmas
Decodes the lemma from the word and the induced lemma class.
- Parameters:
- toks - An array of tokens.
- preds - An array of predicted lemma classes.
- Returns:
- The array of decoded lemmas.
 
- 
encodeLemmas
Encodes the word given its lemmas.
- Parameters:
- toks - An array of tokens.
- lemmas - An array of lemmas.
- Returns:
- The array of lemma classes.
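
The relationship between encodeLemmas and decodeLemmas can be pictured with a small round-trip sketch. The class name SuffixScriptSketch and the "<strip>|<append>" class format below are assumptions made up for this illustration; they are a simplified stand-in and not the encoding OpenNLP actually uses. The sketch only shows the idea that a lemma class records how to turn a token into its lemma, so the same class can be reused for other tokens.

```java
import java.util.Arrays;

/**
 * Simplified illustration of a lemma class as a suffix edit.
 * ASSUMPTION: the "<strip>|<append>" format is invented for this sketch;
 * it is not the encoding used inside OpenNLP.
 */
final class SuffixScriptSketch {

  /** Encodes the token-to-lemma change as "<chars to strip>|<suffix to append>". */
  static String encode(String token, String lemma) {
    int common = 0;
    int max = Math.min(token.length(), lemma.length());
    // Find the length of the longest common prefix of token and lemma.
    while (common < max && token.charAt(common) == lemma.charAt(common)) {
      common++;
    }
    return (token.length() - common) + "|" + lemma.substring(common);
  }

  /** Applies a script produced by encode to a token, yielding the lemma. */
  static String decode(String token, String script) {
    int sep = script.indexOf('|');
    int strip = Integer.parseInt(script.substring(0, sep));
    return token.substring(0, token.length() - strip) + script.substring(sep + 1);
  }

  /** Array form, mirroring the shape of encodeLemmas(toks, lemmas). */
  static String[] encodeAll(String[] toks, String[] lemmas) {
    String[] classes = new String[toks.length];
    for (int i = 0; i < toks.length; i++) {
      classes[i] = encode(toks[i], lemmas[i]);
    }
    return classes;
  }

  /** Array form, mirroring the shape of decodeLemmas(toks, preds). */
  static String[] decodeAll(String[] toks, String[] preds) {
    String[] lemmas = new String[toks.length];
    for (int i = 0; i < toks.length; i++) {
      lemmas[i] = decode(toks[i], preds[i]);
    }
    return lemmas;
  }

  public static void main(String[] args) {
    String[] toks = {"walking", "mice"};
    String[] lemmas = {"walk", "mouse"};
    String[] classes = encodeAll(toks, lemmas);
    System.out.println(Arrays.toString(classes));                  // [3|, 3|ouse]
    System.out.println(Arrays.toString(decodeAll(toks, classes))); // [walk, mouse]
    // The class learned from "walking" also applies to an unseen token.
    System.out.println(decode("talking", classes[0]));             // talk
  }
}
```

Because the class describes an edit rather than a concrete lemma, a model that predicts classes can produce lemmas for tokens it never saw during training.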
 
- 
topKSequences
- Parameters:
- sentence - An array of tokens.
- tags - An array of postags.
- Returns:
- The top-k sequences.
 
- 
topKSequences
- Parameters:
- sentence - An array of tokens.
- tags - An array of postags.
- minSequenceScore - The minimum score to be achieved.
- Returns:
- The top-k sequences.
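
To make the role of minSequenceScore concrete, here is a minimal sketch of keeping only the best k candidates that score above a minimum. Everything in it is hypothetical: ScoredSequenceSketch is a stand-in type, not the OpenNLP Sequence class, the scores are assumed to behave like log-probabilities (higher is better), and the filtering is done after the fact, whereas the library may apply the threshold while it searches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a scored label sequence; not the OpenNLP Sequence class. */
final class ScoredSequenceSketch {
  final List<String> outcomes;
  final double score; // ASSUMPTION: log-probability-like, so higher is better

  ScoredSequenceSketch(List<String> outcomes, double score) {
    this.outcomes = outcomes;
    this.score = score;
  }

  /** Keeps candidates scoring above minScore, best first, at most k of them. */
  static List<ScoredSequenceSketch> topK(
      List<ScoredSequenceSketch> candidates, int k, double minScore) {
    List<ScoredSequenceSketch> kept = new ArrayList<>();
    for (ScoredSequenceSketch c : candidates) {
      if (c.score > minScore) {
        kept.add(c);
      }
    }
    // Sort by score, highest first.
    kept.sort(Comparator.comparingDouble((ScoredSequenceSketch c) -> c.score).reversed());
    return new ArrayList<>(kept.subList(0, Math.min(k, kept.size())));
  }

  public static void main(String[] args) {
    // Made-up candidates for illustration only.
    List<ScoredSequenceSketch> candidates = Arrays.asList(
        new ScoredSequenceSketch(Arrays.asList("A", "B"), -0.5),
        new ScoredSequenceSketch(Arrays.asList("A", "C"), -3.0),
        new ScoredSequenceSketch(Arrays.asList("D", "B"), -1.0));
    for (ScoredSequenceSketch s : topK(candidates, 2, -2.0)) {
      System.out.println(s.outcomes + " " + s.score);
    }
  }
}
```

With the sample values, the candidate scoring -3.0 falls below the minimum of -2.0 and is dropped, and the remaining two are returned best first.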
 
- 
probs
public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]). The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).
- Parameters:
- probs - An array used to hold the probabilities of the last decoded sequence.
 
- 
probs
public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
- Returns:
- An array with one probability per token passed to lemmatize(String[], String[]) when it was last called.
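
Since probs() yields one probability per token of the last lemmatize call, the two arrays can be walked in parallel, for example to flag lemmas the model was unsure about. The sketch below is a hypothetical helper, not part of OpenNLP: the class name LemmaConfidenceSketch, the method lowConfidence, and the 0.5 threshold are assumptions, and the sample arrays are made-up values standing in for real results of lemmatize and probs().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for inspecting per-token lemma probabilities. */
final class LemmaConfidenceSketch {

  /** Returns the indices of tokens whose lemma probability is below the threshold. */
  static List<Integer> lowConfidence(String[] lemmas, double[] probs, double threshold) {
    if (lemmas.length != probs.length) {
      throw new IllegalArgumentException("expected one probability per token");
    }
    List<Integer> flagged = new ArrayList<>();
    for (int i = 0; i < probs.length; i++) {
      if (probs[i] < threshold) {
        flagged.add(i);
      }
    }
    return flagged;
  }

  public static void main(String[] args) {
    // Made-up values standing in for the results of lemmatize(toks, tags) and probs().
    String[] lemmas = {"the", "mouse", "run"};
    double[] probs = {0.98, 0.41, 0.87};
    System.out.println(lowConfidence(lemmas, probs, 0.5)); // [1]
  }
}
```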
 
- 
train
public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) throws IOException
Trains a LemmatizerModel with the given parameters.
- Parameters:
- languageCode - The ISO-conformant language code.
- samples - The ObjectStream of LemmaSample used as input for training.
- params - The TrainingParameters for the context of the training.
- factory - The LemmatizerFactory for creating related objects defined via params.
- Returns:
- A valid, trained LemmatizerModel instance.
- Throws:
- IOException - Thrown if IO errors occurred.
 
- 
topKLemmaClasses
- Parameters:
- sentence - An array of tokens.
- tags - An array of postags.
- Returns:
- The top-k lemma classes.
 
- 
topKLemmaClasses
- Parameters:
- sentence - An array of tokens.
- tags - An array of postags.
- minSequenceScore - The minimum score to be achieved.
- Returns:
- The top-k lemma classes.
 
 