Class LemmatizerME

java.lang.Object
opennlp.tools.lemmatizer.LemmatizerME
All Implemented Interfaces:
Lemmatizer

public class LemmatizerME extends Object implements Lemmatizer
A probabilistic Lemmatizer implementation.

Tries to predict the induced permutation class for each word depending on its surrounding context.

Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.

  • Method Details

    • lemmatize

      public String[] lemmatize(String[] toks, String[] tags)
      Description copied from interface: Lemmatizer
      Generates lemmas for each token and its POS tag.
      Specified by:
      lemmatize in interface Lemmatizer
      Parameters:
      toks - An array of the tokens.
      tags - An array of the POS tags.
      Returns:
      An array of possible lemmas for each token in the toks sequence.
    • lemmatize

      public List<List<String>> lemmatize(List<String> toks, List<String> tags)
      Description copied from interface: Lemmatizer
      Generates all possible lemmas for each token and its POS tag.
      Specified by:
      lemmatize in interface Lemmatizer
      Parameters:
      toks - A list of the tokens.
      tags - A list of the POS tags.
      Returns:
      A list of every possible lemma for each token in the toks sequence.
    • predictSES

      public String[] predictSES(String[] toks, String[] tags)
      Predicts the Short Edit Script (SES), that is, the automatically induced lemma class, for each token.
      Parameters:
      toks - An array of tokens.
      tags - An array of postags.
      Returns:
      An array of possible lemma classes for each token in toks.
    • predictLemmas

      public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
      Predicts all possible lemmas, using numLemmas as the upper bound.
      Parameters:
      numLemmas - The upper bound on the number of lemmas to predict.
      toks - An array of tokens.
      tags - An array of postags.
      Returns:
      A 2-dimensional array containing all possible lemmas for each token and postag pair.
    • decodeLemmas

      public static String[] decodeLemmas(String[] toks, String[] preds)
      Decodes the lemma from the word and the induced lemma class.
      Parameters:
      toks - An array of tokens.
      preds - An array of predicted lemma classes.
      Returns:
      The array of decoded lemmas.
    • encodeLemmas

      public static String[] encodeLemmas(String[] toks, String[] lemmas)
      Encodes each token, given its lemma, as an induced lemma class.
      Parameters:
      toks - An array of tokens.
      lemmas - An array of lemmas.
      Returns:
      The array of lemma classes.
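
The relationship between encodeLemmas and decodeLemmas can be illustrated with a deliberately simplified, suffix-only edit script. This is a minimal sketch, not the encoding OpenNLP actually uses: the class name SuffixEditScriptSketch, its methods, and the "drop|append" script format are all hypothetical and exist only for this illustration. It shows why a lemma class generalizes: the same script maps different tokens to their lemmas.

```java
// Hypothetical illustration only; this is NOT the Short Edit Script format used by LemmatizerME.
final class SuffixEditScriptSketch {

    // Encodes how to turn a token into its lemma as "<chars to drop from the end>|<suffix to append>".
    static String encode(String token, String lemma) {
        int common = 0;
        int limit = Math.min(token.length(), lemma.length());
        // Length of the longest common prefix of token and lemma.
        while (common < limit && token.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (token.length() - common) + "|" + lemma.substring(common);
    }

    // Applies a script produced by encode to a token, yielding a lemma.
    static String decode(String token, String script) {
        int bar = script.indexOf('|');
        int drop = Integer.parseInt(script.substring(0, bar));
        return token.substring(0, token.length() - drop) + script.substring(bar + 1);
    }

    // Array variants over parallel arrays, mirroring the shape of the signatures documented above.
    static String[] encodeAll(String[] toks, String[] lemmas) {
        if (toks.length != lemmas.length) {
            throw new IllegalArgumentException("toks and lemmas must have the same length");
        }
        String[] scripts = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            scripts[i] = encode(toks[i], lemmas[i]);
        }
        return scripts;
    }

    static String[] decodeAll(String[] toks, String[] scripts) {
        if (toks.length != scripts.length) {
            throw new IllegalArgumentException("toks and scripts must have the same length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], scripts[i]);
        }
        return lemmas;
    }
}
```

In this toy scheme, "studies" with lemma "study" encodes to "3|y", and applying "3|y" to "copies" yields "copy". A classifier that predicts the script rather than the lemma string can therefore handle tokens it has never seen.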
    • topKSequences

      public Sequence[] topKSequences(String[] sentence, String[] tags)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      Returns:
      The top-k sequences.
    • topKSequences

      public Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      minSequenceScore - The minimum score to be achieved.
      Returns:
      The top-k sequences.
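
One plausible reading of minSequenceScore is as a cutoff applied to candidate sequences before the best k are returned. The sketch below illustrates that reading under stated assumptions: scores are log-probabilities (higher is better), the cutoff is exclusive, and filtering happens once on finished candidates rather than during the search. The class TopKSketch and its nested Scored type are hypothetical stand-ins and are not the OpenNLP Sequence class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not how LemmatizerME is implemented.
final class TopKSketch {

    // Stand-in for a scored outcome sequence.
    static final class Scored {
        final List<String> outcomes;
        final double score; // assumed: log-probability, higher is better

        Scored(List<String> outcomes, double score) {
            this.outcomes = outcomes;
            this.score = score;
        }
    }

    // Keeps candidates scoring strictly above minScore, sorts them best first,
    // and returns at most k of them.
    static List<Scored> topK(List<Scored> candidates, int k, double minScore) {
        List<Scored> kept = new ArrayList<>();
        for (Scored c : candidates) {
            if (c.score > minScore) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble((Scored c) -> c.score).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(k, kept.size())));
    }
}
```

Under these assumptions the result can hold fewer than k entries, so callers should not index into it without checking its size.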
    • probs

      public void probs(double[] probs)
      Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).

      The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).

      Parameters:
      probs - An array used to hold the probabilities of the last decoded sequence.
    • probs

      public double[] probs()
      Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
      Returns:
      An array with one probability per token passed to the last call to lemmatize(String[], String[]).
    • train

      public static LemmatizerModel train(String languageCode, ObjectStream<LemmaSample> samples, TrainingParameters params, LemmatizerFactory factory) throws IOException
      Trains a LemmatizerModel with the given parameters.
      Parameters:
      languageCode - The ISO-conformant language code.
      samples - The ObjectStream of LemmaSample used as input for training.
      params - The TrainingParameters for the context of the training.
      factory - The LemmatizerFactory for creating related objects defined via params.
      Returns:
      A valid, trained LemmatizerModel instance.
      Throws:
      IOException - Thrown if IO errors occurred.
    • topKLemmaClasses

      public Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      Returns:
      The top-k lemma classes.
    • topKLemmaClasses

      public Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      minSequenceScore - The minimum score to be achieved.
      Returns:
      The top-k lemma classes.