Package opennlp.tools.lemmatizer
Class LemmatizerME
- java.lang.Object
-
- opennlp.tools.lemmatizer.LemmatizerME
-
- All Implemented Interfaces:
Lemmatizer
public class LemmatizerME extends Object implements Lemmatizer
A probabilistic lemmatizer. Tries to predict the induced permutation class for each word depending on its surrounding context. Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University. http://grzegorz.chrupala.me/papers/phd-single.pdf
-
-
Field Summary
Modifier and Type / Field / Description
static int
DEFAULT_BEAM_SIZE
static int
LEMMA_NUMBER
-
Constructor Summary
Constructor / Description
LemmatizerME(LemmatizerModel model)
Initializes the current instance with the provided model and the default beam size of 3.
-
Method Summary
Modifier and Type / Method / Description
static String[]
decodeLemmas(String[] toks, String[] preds)
Decodes the lemma from the word and the induced lemma class.
static String[]
encodeLemmas(String[] toks, String[] lemmas)
String[]
lemmatize(String[] toks, String[] tags)
Generates lemmas for the given tokens and POS tags, returning the result in an array.
List<List<String>>
lemmatize(List<String> toks, List<String> tags)
Generates lemmas for the given tokens and POS tags, returning a list of every possible lemma for each token and POS tag.
String[][]
predictLemmas(int numLemmas, String[] toks, String[] tags)
Predicts all possible lemmas (using a default upper bound).
String[]
predictSES(String[] toks, String[] tags)
Predicts the Short Edit Script (automatically induced lemma class).
double[]
probs()
Returns an array with the probabilities of the last decoded sequence.
void
probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence.
Sequence[]
topKLemmaClasses(String[] sentence, String[] tags)
Sequence[]
topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
Sequence[]
topKSequences(String[] sentence, String[] tags)
Sequence[]
topKSequences(String[] sentence, String[] tags, double minSequenceScore)
static LemmatizerModel
train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters trainParams, LemmatizerFactory posFactory)
-
-
-
Field Detail
-
LEMMA_NUMBER
public static final int LEMMA_NUMBER
- See Also:
- Constant Field Values
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
LemmatizerME
public LemmatizerME(LemmatizerModel model)
Initializes the current instance with the provided model and the default beam size of 3.
- Parameters:
model - the model
-
-
Method Detail
-
lemmatize
public String[] lemmatize(String[] toks, String[] tags)
Description copied from interface: Lemmatizer
Generates lemmas for the given tokens and POS tags, returning the result in an array.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
toks - an array of the tokens
tags - an array of the POS tags
- Returns:
an array of possible lemmas for each token in the sequence.
-
lemmatize
public List<List<String>> lemmatize(List<String> toks, List<String> tags)
Description copied from interface: Lemmatizer
Generates lemmas for the given tokens and POS tags, returning a list of every possible lemma for each token and POS tag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
toks - a list of the tokens
tags - a list of the POS tags
- Returns:
a list of every possible lemma for each token in the sequence.
-
predictSES
public String[] predictSES(String[] toks, String[] tags)
Predicts the Short Edit Script (automatically induced lemma class).
- Parameters:
toks - the array of tokens
tags - the array of POS tags
- Returns:
an array containing the lemma classes
-
predictLemmas
public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
Predicts all possible lemmas (using a default upper bound).
- Parameters:
numLemmas - the default number of lemmas
toks - the tokens
tags - the POS tags
- Returns:
a two-dimensional array containing all possible lemmas for each token and POS tag pair
-
decodeLemmas
public static String[] decodeLemmas(String[] toks, String[] preds)
Decodes the lemma from the word and the induced lemma class.
- Parameters:
toks - the array of tokens
preds - the predicted lemma classes
- Returns:
the array of decoded lemmas
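To illustrate what decoding a lemma from a word and an induced lemma class means, here is a minimal, self-contained sketch. It does not reproduce OpenNLP's actual Short Edit Script encoding. The class name `SuffixRuleSketch`, its methods, and the `strip:append` rule format are all hypothetical, chosen only to show the idea that a lemma class is a reusable edit that can be applied to any word.

```java
/**
 * Hypothetical illustration only: a simplified suffix-rewrite "lemma class".
 * This is NOT the edit-script format used by LemmatizerME.
 */
final class SuffixRuleSketch {

    private SuffixRuleSketch() {
    }

    /** Encodes (word, lemma) as "charsToStrip:suffixToAppend". */
    static String encode(String word, String lemma) {
        int common = 0;
        int max = Math.min(word.length(), lemma.length());
        // Length of the longest common prefix of word and lemma.
        while (common < max && word.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (word.length() - common) + ":" + lemma.substring(common);
    }

    /** Applies a rule produced by encode to a word. */
    static String decode(String word, String rule) {
        int sep = rule.indexOf(':');
        int strip = Integer.parseInt(rule.substring(0, sep));
        // Assumes the rule strips no more characters than the word has.
        return word.substring(0, word.length() - strip) + rule.substring(sep + 1);
    }

    /** Decodes one rule per word, mirroring the array-in, array-out shape. */
    static String[] decodeAll(String[] words, String[] rules) {
        String[] lemmas = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            lemmas[i] = decode(words[i], rules[i]);
        }
        return lemmas;
    }
}
```

In this sketch, `SuffixRuleSketch.encode("ponies", "pony")` yields the rule `3:y`, and applying that rule to `ponies` gives back `pony`. Because the rule is independent of the word it was learned from, a classifier can predict it as a label for unseen words, which is the role the induced lemma class plays here.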
-
topKSequences
public Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize. The specified array should be at least as large as the number of tokens in the previous call to lemmatize.
- Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.
-
probs
public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize.
- Returns:
An array with the same number of probabilities as tokens were sent to lemmatize when it was last called.
-
train
public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters trainParams, LemmatizerFactory posFactory) throws IOException
- Throws:
IOException
-
-