java.lang.Object

opennlp.tools.lemmatizer.LemmatizerME

All Implemented Interfaces: Lemmatizer

public class LemmatizerME extends Object implements Lemmatizer

A probabilistic Lemmatizer implementation.

Predicts the induced permutation class (the lemma class) of each word from its surrounding context.

Based on: Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
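To make the notion of an induced class concrete, the sketch below encodes each word/lemma pair as a small suffix edit ("strip N trailing characters, then append a suffix") and decodes it again. It is a deliberately simplified stand-in and not the edit-script format that LemmatizerME produces; the class name SuffixEditSketch and the "N|suffix" notation are invented for illustration.

```java
import java.util.Arrays;

/** Simplified stand-in for a lemma class: "strip N trailing chars, then append a suffix". */
final class SuffixEditSketch {

    /** Encodes how to turn word into lemma as "<charsToStrip>|<suffixToAppend>". */
    static String encode(String word, String lemma) {
        int common = 0;
        int max = Math.min(word.length(), lemma.length());
        // Length of the shared prefix of word and lemma.
        while (common < max && word.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (word.length() - common) + "|" + lemma.substring(common);
    }

    /** Applies an encoded class to a word; assumes the word is at least as long as the strip count. */
    static String decode(String word, String lemmaClass) {
        int sep = lemmaClass.indexOf('|');
        int strip = Integer.parseInt(lemmaClass.substring(0, sep));
        String suffix = lemmaClass.substring(sep + 1);
        return word.substring(0, word.length() - strip) + suffix;
    }

    public static void main(String[] args) {
        String[] toks = {"walking", "cities", "ran"};
        String[] lemmas = {"walk", "city", "run"};
        String[] classes = new String[toks.length];
        String[] decoded = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            classes[i] = encode(toks[i], lemmas[i]);
            decoded[i] = decode(toks[i], classes[i]);
        }
        System.out.println(Arrays.toString(classes)); // prints [3|, 3|y, 2|un]
        System.out.println(Arrays.toString(decoded)); // prints [walk, city, run]
    }
}
```

Because a class describes an edit rather than a concrete lemma, a single class such as 3| applies to many words (walking, talking), which is what allows a classifier over such classes to handle words it has not seen.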

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | DEFAULT_BEAM_SIZE | |
| static final int | LEMMA_NUMBER | |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| LemmatizerME(LemmatizerModel model) | Initializes a LemmatizerME with the provided model and a default beam size of 3. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static String[] | decodeLemmas(String[] toks, String[] preds) | Decodes the lemma from the word and the induced lemma class. |
| static String[] | encodeLemmas(String[] toks, String[] lemmas) | Encodes each token and its lemma as a lemma class. |
| String[] | lemmatize(String[] toks, String[] tags) | Generates lemmas for the given tokens and their POS tags. |
| List<List<String>> | lemmatize(List<String> toks, List<String> tags) | Generates lemmas for the given tokens and their POS tags. |
| String[][] | predictLemmas(int numLemmas, String[] toks, String[] tags) | Predicts all possible lemmas, bounded by numLemmas. |
| String[] | predictSES(String[] toks, String[] tags) | Predicts the Short Edit Script (SES), the automatically induced lemma class. |
| double[] | probs() | Returns an array with the probabilities of the last decoded sequence. |
| void | probs(double[] probs) | Populates the specified array with the probabilities of the last decoded sequence. |
| Sequence[] | topKLemmaClasses(String[] sentence, String[] tags) | |
| Sequence[] | topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore) | |
| Sequence[] | topKSequences(String[] sentence, String[] tags) | |
| Sequence[] | topKSequences(String[] sentence, String[] tags, double minSequenceScore) | |
| static LemmatizerModel | train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) | Trains a LemmatizerModel with the given parameters. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LEMMA_NUMBER
  
  public static final int LEMMA_NUMBER
  See Also:
  
  Constant Field Values
- DEFAULT_BEAM_SIZE
  
  public static final int DEFAULT_BEAM_SIZE
  See Also:
  
  Constant Field Values

Constructor Details
- LemmatizerME
  
  public LemmatizerME(LemmatizerModel model)
  
  Initializes a LemmatizerME with the provided model and a default beam size of 3.
  
  Parameters:
  
  model - The LemmatizerModel to be used.
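The beam size mentioned above limits how many partial label sequences are kept alive at each token during decoding. The following is a generic beam search sketch, not OpenNLP's implementation: the Scorer interface, the Hypothesis class, and the toy probabilities are all hypothetical and exist only to show the effect of a beam of size 3.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic beam search over label sequences; an illustration, not OpenNLP's implementation. */
final class BeamSearchSketch {

    /** Hypothetical scorer: log-probability of a label at a position, given the labels chosen so far. */
    interface Scorer {
        double logProb(List<String> previous, int position, String label);
    }

    /** A (partial) label sequence with its accumulated log-probability. */
    static final class Hypothesis {
        final List<String> labels;
        final double score;

        Hypothesis(List<String> labels, double score) {
            this.labels = labels;
            this.score = score;
        }
    }

    /** Returns up to beamSize complete sequences of the given length, best first. */
    static List<Hypothesis> search(int length, List<String> labelSet, Scorer scorer, int beamSize) {
        List<Hypothesis> beam = new ArrayList<Hypothesis>();
        beam.add(new Hypothesis(new ArrayList<String>(), 0.0));
        for (int pos = 0; pos < length; pos++) {
            List<Hypothesis> expanded = new ArrayList<Hypothesis>();
            for (Hypothesis h : beam) {
                for (String label : labelSet) {
                    List<String> next = new ArrayList<String>(h.labels);
                    next.add(label);
                    expanded.add(new Hypothesis(next, h.score + scorer.logProb(h.labels, pos, label)));
                }
            }
            // Keep only the beamSize highest-scoring hypotheses for the next position.
            expanded.sort((a, b) -> Double.compare(b.score, a.score));
            beam = new ArrayList<Hypothesis>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam;
    }

    /** Toy scorer with made-up probabilities over two labels, "A" and "B". */
    static double toyLogProb(List<String> previous, int position, String label) {
        double pA;
        if (previous.isEmpty()) {
            pA = 0.6;
        } else if (previous.get(previous.size() - 1).equals("A")) {
            pA = 0.1;
        } else {
            pA = 0.7;
        }
        return Math.log(label.equals("A") ? pA : 1.0 - pA);
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<String>();
        labels.add("A");
        labels.add("B");
        for (Hypothesis h : search(2, labels, BeamSearchSketch::toyLogProb, 3)) {
            System.out.println(h.labels);
        }
    }
}
```

With these toy numbers the four possible sequences have probabilities 0.54 (A, B), 0.28 (B, A), 0.12 (B, B) and 0.06 (A, A), so a beam of 3 returns the first three in that order and drops the last. A larger beam explores more alternatives at a higher cost per token.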
Method Details
- lemmatize
  
  public String[] lemmatize(String[] toks, String[] tags)
  
  Description copied from interface: Lemmatizer
  
  Generates lemmas for the given tokens and their POS tags.
  
  Specified by:
  
  lemmatize in interface Lemmatizer
  
  Parameters:
  
  toks - An array of tokens.
  
  tags - An array of POS tags.
  
  Returns:
  
  An array of possible lemmas for each token in the toks sequence.
- lemmatize
  
  public List<List<String>> lemmatize(List<String> toks, List<String> tags)
  
  Description copied from interface: Lemmatizer
  
  Generates lemmas for the given tokens and their POS tags.
  
  Specified by:
  
  lemmatize in interface Lemmatizer
  
  Parameters:
  
  toks - A list of tokens.
  
  tags - A list of POS tags.
  
  Returns:
  
  A list of every possible lemma for each token in the toks sequence.
- predictSES
  
  public String[] predictSES(String[] toks, String[] tags)
  
  Predicts the Short Edit Script (SES), the automatically induced lemma class.
  
  Parameters:
  
  toks - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  An array of possible lemma classes for each token in toks.
- predictLemmas
  
  public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
  
  Predicts all possible lemmas, bounded by numLemmas.
  
  Parameters:
  
  numLemmas - The upper bound on the number of lemmas to predict.
  
  toks - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  A 2-dimensional array containing all possible lemmas for each token and postag pair.
- decodeLemmas
  
  public static String[] decodeLemmas(String[] toks, String[] preds)
  
  Decodes the lemma from the word and the induced lemma class.
  
  Parameters:
  
  toks - An array of tokens.
  
  preds - An array of predicted lemma classes.
  
  Returns:
  
  The array of decoded lemmas.
- encodeLemmas
  
  public static String[] encodeLemmas(String[] toks, String[] lemmas)
  
  Encodes each token and its lemma as a lemma class.
  
  Parameters:
  
  toks - An array of tokens.
  
  lemmas - An array of lemmas.
  
  Returns:
  
  The array of lemma classes.
- topKSequences
  
  public Sequence[] topKSequences(String[] sentence, String[] tags)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  The top-k sequences.
- topKSequences
  
  public Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  minSequenceScore - The minimum score a sequence must achieve.
  
  Returns:
  
  The top-k sequences.
- probs
  
  public void probs(double[] probs)
  
  Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
  The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).
  
  Parameters:
  
  probs - An array used to hold the probabilities of the last decoded sequence.
- probs
  
  public double[] probs()
  
  Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
  
  Returns:
  
  An array containing one probability for each token passed to the last call to lemmatize(String[], String[]).
- train
  
  public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) throws IOException
  
  Trains a LemmatizerModel with the given parameters.
  
  Parameters:
  
  languageCode - The ISO-conformant language code.
  
  samples - The ObjectStream of LemmaSample used as input for training.
  
  params - The TrainingParameters for the context of the training.
  
  factory - The LemmatizerFactory for creating related objects defined via params.
  
  Returns:
  
  A valid, trained LemmatizerModel instance.
  
  Throws:
  
  IOException - Thrown if an I/O error occurs.
- topKLemmaClasses
  
  public Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  The top-k lemma classes.
- topKLemmaClasses
  
  public Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  minSequenceScore - The minimum score a sequence must achieve.
  
  Returns:
  
  The top-k lemma classes.
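The two probs overloads described above both report on the most recent decode, so their results change with every call to lemmatize(String[], String[]). The sketch below mimics that contract in isolation. It is a hypothetical model of the behavior, not the LemmatizerME source: the class name, the decode stand-in, the defensive copies, and the exception thrown for a too-small array are all assumptions.

```java
/** Models the "state of the last decode" contract of the two probs overloads. */
final class LastDecodeProbsSketch {

    // Probabilities of the most recently decoded sequence, one per token.
    private double[] lastProbs = new double[0];

    /** Stand-in for a decode step: remembers one (made-up) probability per token. */
    void decode(String[] toks, double[] probPerToken) {
        if (toks.length != probPerToken.length) {
            throw new IllegalArgumentException("need exactly one probability per token");
        }
        lastProbs = probPerToken.clone();
    }

    /** Returns a fresh array with the probabilities of the last decoded sequence. */
    double[] probs() {
        return lastProbs.clone();
    }

    /** Fills a caller-supplied array, which must be at least as long as the last token sequence. */
    void probs(double[] target) {
        if (target.length < lastProbs.length) {
            throw new IllegalArgumentException("target array is shorter than the last decoded sequence");
        }
        System.arraycopy(lastProbs, 0, target, 0, lastProbs.length);
    }

    public static void main(String[] args) {
        LastDecodeProbsSketch sketch = new LastDecodeProbsSketch();
        sketch.decode(new String[] {"She", "walked"}, new double[] {0.9, 0.8});

        double[] buffer = new double[8]; // reusable buffer, larger than needed
        sketch.probs(buffer);            // only the first two slots are written
        System.out.println(sketch.probs().length); // prints 2
    }
}
```

The caller-supplied variant avoids allocating a new array per sentence, which can matter when many sentences are processed in a loop; the no-argument variant is simpler when allocation is not a concern.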