Package opennlp.tools.lemmatizer
Class LemmatizerME
java.lang.Object
opennlp.tools.lemmatizer.LemmatizerME
- All Implemented Interfaces:
Lemmatizer
A probabilistic Lemmatizer implementation.
Tries to predict the induced permutation class for each word depending on its surrounding context.
Based on: Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
-
Field Summary
Fields (Modifier and Type, Field):
static final int DEFAULT_BEAM_SIZE
static final int LEMMA_NUMBER
-
Constructor Summary
Constructors:
LemmatizerME(LemmatizerModel model)
-
Method Summary
Methods (Modifier and Type, Method, Description):
static String[] decodeLemmas(String[] toks, String[] preds) - Decodes the lemma from the word and the induced lemma class.
static String[] encodeLemmas(String[] toks, String[] lemmas) - Encodes the word given its lemmas.
String[] lemmatize(String[] toks, String[] tags) - Generates lemmas for the word and POS tag.
List<List<String>> lemmatize(List<String> toks, List<String> tags) - Generates lemma tags for the word and POS tag.
String[][] predictLemmas(int numLemmas, String[] toks, String[] tags) - Predicts all possible lemmas (using a default upper bound).
String[] predictSES(String[] toks, String[] tags) - Predicts the Short Edit Script (automatically induced lemma class).
double[] probs() - Returns an array with the probabilities of the last decoded sequence.
void probs(double[] probs) - Populates the specified array with the probabilities of the last decoded sequence.
Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
Sequence[] topKSequences(String[] sentence, String[] tags)
Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) - Starts training a LemmatizerModel with the given parameters.
-
Field Details
-
LEMMA_NUMBER
public static final int LEMMA_NUMBER
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
-
-
Constructor Details
-
LemmatizerME
public LemmatizerME(LemmatizerModel model)
- Parameters:
model - The LemmatizerModel to be used.
-
-
Method Details
-
lemmatize
public String[] lemmatize(String[] toks, String[] tags)
Description copied from interface: Lemmatizer
Generates lemmas for the word and POS tag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
toks - An array of the tokens.
tags - An array of the POS tags.
- Returns:
An array of possible lemmas for each token in the toks sequence.
-
lemmatize
public List<List<String>> lemmatize(List<String> toks, List<String> tags)
Description copied from interface: Lemmatizer
Generates lemma tags for the word and POS tag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
toks - A list of the tokens.
tags - A list of the POS tags.
- Returns:
A list of every possible lemma for each token in the toks sequence.
-
predictSES
public String[] predictSES(String[] toks, String[] tags)
Predicts the Short Edit Script (automatically induced lemma class).
- Parameters:
toks - An array of tokens.
tags - An array of POS tags.
- Returns:
An array of possible lemma classes for each token in toks.
-
predictLemmas
public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
Predicts all possible lemmas (using a default upper bound).
- Parameters:
numLemmas - The default number of lemmas.
toks - An array of tokens.
tags - An array of POS tags.
- Returns:
A 2-dimensional array containing all possible lemmas for each token and POS tag pair.
-
decodeLemmas
public static String[] decodeLemmas(String[] toks, String[] preds)
Decodes the lemma from the word and the induced lemma class.
- Parameters:
toks - An array of tokens.
preds - An array of predicted lemma classes.
- Returns:
The array of decoded lemmas.
-
encodeLemmas
public static String[] encodeLemmas(String[] toks, String[] lemmas)
Encodes the word given its lemmas.
- Parameters:
toks - An array of tokens.
lemmas - An array of lemmas.
- Returns:
The array of lemma classes.
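The round trip between encodeLemmas and decodeLemmas can be pictured with a toy lemma-class scheme. The sketch below is not OpenNLP's Short Edit Script encoding. It assumes a much simpler class of the form "<strip>|<append>": remove that many characters from the end of the token, then append the given suffix. The class SuffixLemmaClass and all of its methods are hypothetical names introduced only for illustration.

```java
/** Toy lemma-class codec (hypothetical; NOT OpenNLP's Short Edit Script format). */
final class SuffixLemmaClass {

    private SuffixLemmaClass() {}

    /** Encodes one token/lemma pair as "<charsToStrip>|<suffixToAppend>". */
    static String encode(String token, String lemma) {
        // Length of the longest common prefix of token and lemma.
        int common = 0;
        int max = Math.min(token.length(), lemma.length());
        while (common < max && token.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (token.length() - common) + "|" + lemma.substring(common);
    }

    /** Applies a lemma class produced by encode to a token. */
    static String decode(String token, String lemmaClass) {
        int sep = lemmaClass.indexOf('|'); // first '|' ends the numeric part
        int strip = Integer.parseInt(lemmaClass.substring(0, sep));
        if (strip > token.length()) {
            throw new IllegalArgumentException("class does not fit token: " + token);
        }
        return token.substring(0, token.length() - strip) + lemmaClass.substring(sep + 1);
    }

    /** Array version, mirroring the shape of an encode step over a sentence. */
    static String[] encodeAll(String[] toks, String[] lemmas) {
        if (toks.length != lemmas.length) {
            throw new IllegalArgumentException("toks and lemmas differ in length");
        }
        String[] classes = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            classes[i] = encode(toks[i], lemmas[i]);
        }
        return classes;
    }

    /** Array version, mirroring the shape of a decode step over a sentence. */
    static String[] decodeAll(String[] toks, String[] classes) {
        if (toks.length != classes.length) {
            throw new IllegalArgumentException("toks and classes differ in length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], classes[i]);
        }
        return lemmas;
    }
}
```

In this toy scheme "walking" and "talking" share the class "3|", which is the point of predicting a class rather than a lemma string: one learned class generalizes to tokens never seen with that lemma.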
-
topKSequences
public Sequence[] topKSequences(String[] sentence, String[] tags)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
- Returns:
The top-k sequences.
-
topKSequences
public Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
minSequenceScore - The minimum score to be achieved.
- Returns:
The top-k sequences.
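The two topKSequences overloads differ only in the minSequenceScore cut-off. As a rough mental model, and not a description of how OpenNLP searches, the sketch below drops hypothetical scored candidates that fall below a minimum score and keeps the k best of the rest. The TopKSketch class, its Candidate type, and the topK helper are invented for illustration, and whether the real cut-off is strict or inclusive is an assumption here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper illustrating "top-k above a minimum score"; not part of OpenNLP. */
final class TopKSketch {

    /** A scored candidate sequence of outcomes (illustrative stand-in only). */
    static final class Candidate {
        final String[] outcomes;
        final double score;

        Candidate(String[] outcomes, double score) {
            this.outcomes = outcomes;
            this.score = score;
        }
    }

    private TopKSketch() {}

    /** Returns at most k candidates whose score exceeds minScore, best first. */
    static List<Candidate> topK(List<Candidate> candidates, int k, double minScore) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.score > minScore) { // assumption: strict cut-off
                kept.add(c);
            }
        }
        // Highest score first.
        kept.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        int limit = Math.max(0, Math.min(k, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }
}
```

Passing Double.NEGATIVE_INFINITY as minScore keeps every finite-scored candidate, which corresponds to the overload without a score threshold in this simplified picture.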
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]). The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).
- Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.
-
probs
public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
- Returns:
An array with one probability for each token passed to lemmatize(String[], String[]) in its most recent call.
-
train
public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) throws IOException
Starts training a LemmatizerModel with the given parameters.
- Parameters:
languageCode - The ISO-conformant language code.
samples - The ObjectStream of LemmaSample used as input for training.
params - The TrainingParameters for the context of the training.
factory - The LemmatizerFactory for creating related objects defined via params.
- Returns:
A valid, trained LemmatizerModel instance.
- Throws:
IOException - Thrown if IO errors occurred.
-
topKLemmaClasses
public Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
- Returns:
The top-k lemma classes.
-
topKLemmaClasses
public Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
- Parameters:
sentence - An array of tokens.
tags - An array of POS tags.
minSequenceScore - The minimum score to be achieved.
- Returns:
The top-k lemma classes.
-