Package opennlp.tools.lemmatizer
Class LemmatizerME
- java.lang.Object
-
- opennlp.tools.lemmatizer.LemmatizerME
-
- All Implemented Interfaces:
Lemmatizer
public class LemmatizerME extends Object implements Lemmatizer
A probabilistic lemmatizer. Tries to predict the induced permutation class for each word depending on its surrounding context. Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University. http://grzegorz.chrupala.me/papers/phd-single.pdf
-
-
Field Summary
Fields
static int DEFAULT_BEAM_SIZE
static int LEMMA_NUMBER
-
Constructor Summary
Constructors
LemmatizerME(LemmatizerModel model) - Initializes the current instance with the provided model and the default beam size of 3.
-
Method Summary
Methods
static String[] decodeLemmas(String[] toks, String[] preds) - Decodes the lemma from the word and the induced lemma class.
static String[] encodeLemmas(String[] toks, String[] lemmas)
String[] lemmatize(String[] toks, String[] tags) - Generates lemmas for the word and postag, returning the result in an array.
List<List<String>> lemmatize(List<String> toks, List<String> tags) - Generates lemmas for the word and postag, returning the result in a list of every possible lemma for each token and postag.
String[][] predictLemmas(int numLemmas, String[] toks, String[] tags) - Predict all possible lemmas (using a default upper bound).
String[] predictSES(String[] toks, String[] tags) - Predict Short Edit Script (automatically induced lemma class).
double[] probs() - Returns an array with the probabilities of the last decoded sequence.
void probs(double[] probs) - Populates the specified array with the probabilities of the last decoded sequence.
Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
Sequence[] topKSequences(String[] sentence, String[] tags)
Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters trainParams, LemmatizerFactory posFactory)
-
-
-
Field Detail
-
LEMMA_NUMBER
public static final int LEMMA_NUMBER
- See Also:
- Constant Field Values
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
LemmatizerME
public LemmatizerME(LemmatizerModel model)
Initializes the current instance with the provided model and the default beam size of 3.
Parameters:
model - the model
-
-
Method Detail
-
lemmatize
public String[] lemmatize(String[] toks, String[] tags)
Description copied from interface: Lemmatizer
Generates lemmas for the word and postag, returning the result in an array.
Specified by:
lemmatize in interface Lemmatizer
Parameters:
toks - an array of the tokens
tags - an array of the pos tags
Returns:
- an array of possible lemmas for each token in the sequence.
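The contract above is positional: the token and tag arrays are parallel, and the result holds one lemma per token at the same index. The sketch below illustrates that calling pattern without depending on OpenNLP. `SimpleLemmatizer` and `TableLemmatizer` are hypothetical stand-ins that mirror only the documented `lemmatize(String[], String[])` signature; the lookup table and the lowercase fallback are assumptions made for illustration and are not how `LemmatizerME` predicts lemmas.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LemmatizeContractSketch {

    /** Hypothetical stand-in mirroring the documented lemmatize(String[], String[]) signature. */
    interface SimpleLemmatizer {
        String[] lemmatize(String[] toks, String[] tags);
    }

    /** Toy implementation backed by a "token/tag" lookup table; NOT the statistical model. */
    static final class TableLemmatizer implements SimpleLemmatizer {
        private final Map<String, String> table = new HashMap<>();

        TableLemmatizer add(String tok, String tag, String lemma) {
            table.put(tok.toLowerCase(Locale.ROOT) + "/" + tag, lemma);
            return this;
        }

        @Override
        public String[] lemmatize(String[] toks, String[] tags) {
            // The two inputs are parallel arrays, so their lengths must agree.
            if (toks.length != tags.length) {
                throw new IllegalArgumentException("toks and tags must have the same length");
            }
            String[] lemmas = new String[toks.length];
            for (int i = 0; i < toks.length; i++) {
                String lower = toks[i].toLowerCase(Locale.ROOT);
                // Assumed fallback for unknown pairs: the lowercased token itself.
                lemmas[i] = table.getOrDefault(lower + "/" + tags[i], lower);
            }
            return lemmas;
        }
    }

    /** Runs the toy lemmatizer on a three-token sentence. */
    static String[] demo() {
        SimpleLemmatizer lemmatizer = new TableLemmatizer()
                .add("dogs", "NNS", "dog")
                .add("barked", "VBD", "bark");
        String[] toks = {"Dogs", "barked", "loudly"};
        String[] tags = {"NNS", "VBD", "RB"};
        return lemmatizer.lemmatize(toks, tags);
    }

    public static void main(String[] args) {
        for (String lemma : demo()) {
            System.out.println(lemma);
        }
    }
}
```

With the real class, the same pattern would apply to an instance built from a loaded `LemmatizerModel`: pass the tokens and their POS tags, then read `result[i]` as the lemma for `toks[i]`.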
-
lemmatize
public List<List<String>> lemmatize(List<String> toks, List<String> tags)
Description copied from interface: Lemmatizer
Generates lemmas for the word and postag, returning the result in a list of every possible lemma for each token and postag.
Specified by:
lemmatize in interface Lemmatizer
Parameters:
toks - a list of the tokens
tags - a list of the pos tags
Returns:
- a list of every possible lemma for each token in the sequence.
-
predictSES
public String[] predictSES(String[] toks, String[] tags)
Predict Short Edit Script (automatically induced lemma class).
Parameters:
toks - the array of tokens
tags - the array of pos tags
Returns:
- an array containing the lemma classes
-
predictLemmas
public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
Predict all possible lemmas (using a default upper bound).
Parameters:
numLemmas - the default number of lemmas
toks - the tokens
tags - the postags
Returns:
- a two-dimensional array containing all possible lemmas for each token and postag pair
-
decodeLemmas
public static String[] decodeLemmas(String[] toks, String[] preds)
Decodes the lemma from the word and the induced lemma class.
Parameters:
toks - the array of tokens
preds - the predicted lemma classes
Returns:
- the array of decoded lemmas
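To make the idea of a lemma class concrete, the sketch below uses a deliberately simplified encoding: a class of the form `<n>:<suffix>` means "drop the last n characters of the word, then append the suffix". This format, the class name `LemmaClassSketch`, and its methods are all assumptions made for illustration; the edit-script format that `predictSES`, `encodeLemmas`, and `decodeLemmas` actually use is not specified on this page and may differ. What carries over is the principle that one class learned from one word can be applied to other words.

```java
public class LemmaClassSketch {

    /** Encodes (word, lemma) as "n:suffix": drop n trailing characters, then append suffix. */
    static String encode(String word, String lemma) {
        int common = 0;
        int max = Math.min(word.length(), lemma.length());
        // Length of the shared prefix between the word and its lemma.
        while (common < max && word.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (word.length() - common) + ":" + lemma.substring(common);
    }

    /** Applies a class produced by encode to a (possibly different) word. */
    static String decode(String word, String lemmaClass) {
        int sep = lemmaClass.indexOf(':');
        int strip = Integer.parseInt(lemmaClass.substring(0, sep));
        if (strip > word.length()) {
            throw new IllegalArgumentException("class strips more characters than the word has");
        }
        return word.substring(0, word.length() - strip) + lemmaClass.substring(sep + 1);
    }

    /** Decodes parallel arrays of tokens and predicted classes into lemmas. */
    static String[] decodeAll(String[] toks, String[] classes) {
        if (toks.length != classes.length) {
            throw new IllegalArgumentException("toks and classes must have the same length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], classes[i]);
        }
        return lemmas;
    }

    public static void main(String[] args) {
        String cls = encode("cities", "city");       // class induced from one word
        System.out.println(cls);
        System.out.println(decode("parties", cls));  // same class applied to another word
    }
}
```

Because the classifier predicts a class rather than the lemma string itself, it can produce lemmas for words it never saw in training, as long as they follow a transformation it has seen.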
-
topKSequences
public Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize. The specified array should be at least as large as the number of tokens in the previous call to lemmatize.
Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.
-
probs
public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize.
Returns:
- An array with the same number of probabilities as tokens were sent to lemmatize when it was last called.
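Since the probabilities line up index by index with the tokens of the preceding call, a common use is to flag tokens whose prediction looks uncertain. The sketch below shows that pairing on plain arrays; `LemmaConfidenceSketch`, `lowConfidence`, the threshold, and the probability values are all hypothetical. In real use the `double[]` would come from `probs()` called right after lemmatizing the same tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class LemmaConfidenceSketch {

    /** Returns the indices of tokens whose probability is below the threshold. */
    static List<Integer> lowConfidence(String[] toks, double[] probs, double threshold) {
        // The documentation requires at least one probability slot per token.
        if (probs.length < toks.length) {
            throw new IllegalArgumentException("probs must hold at least one value per token");
        }
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < toks.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String[] toks = {"She", "saw", "saws"};
        // Made-up values standing in for the result of probs().
        double[] probs = {0.98, 0.71, 0.42};
        for (int i : lowConfidence(toks, probs, 0.75)) {
            System.out.println(toks[i] + " " + probs[i]);
        }
    }
}
```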
-
train
public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters trainParams, LemmatizerFactory posFactory) throws IOException
- Throws:
IOException
-
-