Class LemmatizerME

java.lang.Object
opennlp.tools.lemmatizer.LemmatizerME
All Implemented Interfaces:
opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic

@ThreadSafe public class LemmatizerME extends Object implements opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic
A probabilistic Lemmatizer implementation.

Tries to predict the induced permutation class for each word depending on its surrounding context.

A lemmatizer instance is thread-safe. One instance can be shared across multiple threads to save memory.

Note: In container environments with classloader isolation (e.g. Jakarta EE), ensure instances do not outlive the application's lifecycle, as underlying components use ThreadLocal state that may pin the classloader.

Based on Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.

See Also:
  • Lemmatizer
  • Probabilistic
  • Field Details

  • Constructor Details

  • Method Details

    • lemmatize

      public String[] lemmatize(String[] toks, String[] tags)
      Specified by:
      lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
    • lemmatize

      public List<List<String>> lemmatize(List<String> toks, List<String> tags)
      Specified by:
      lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
    • predictSES

      public String[] predictSES(String[] toks, String[] tags)
      Predicts the Short Edit Script (SES), that is, the automatically induced lemma class, for each token.
      Parameters:
      toks - An array of tokens.
      tags - An array of postags.
      Returns:
      An array of possible lemma classes for each token in toks.
    • predictLemmas

      public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
      Predicts all possible lemmas, up to the given upper bound.
      Parameters:
      numLemmas - The upper bound on the number of lemmas to predict.
      toks - An array of tokens.
      tags - An array of postags.
      Returns:
      A 2-dimensional array containing all possible lemmas for each token and postag pair.
    • decodeLemmas

      public static String[] decodeLemmas(String[] toks, String[] preds)
      Decodes the lemma from the word and the induced lemma class.
      Parameters:
      toks - An array of tokens.
      preds - An array of predicted lemma classes.
      Returns:
      The array of decoded lemmas.
    • encodeLemmas

      public static String[] encodeLemmas(String[] toks, String[] lemmas)
      Encodes each token and its lemma as the corresponding lemma class.
      Parameters:
      toks - An array of tokens.
      lemmas - An array of lemmas.
      Returns:
      The array of lemma classes.
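
To make the idea of a lemma class concrete, here is a minimal, self-contained sketch of an encode/decode round trip. It is not OpenNLP's implementation and does not reproduce the Short Edit Script format used by encodeLemmas and decodeLemmas. The class SuffixLemmaClassSketch and its "<charsToDrop>|<suffixToAppend>" encoding are hypothetical and only illustrate that a lemma class is a rewrite rule that can be stored separately from the word it was derived from.

```java
// Hypothetical illustration only: a toy lemma class of the form
// "<charsToDrop>|<suffixToAppend>". This is NOT OpenNLP's SES encoding.
final class SuffixLemmaClassSketch {

    // Derives the toy lemma class that rewrites token into lemma.
    static String encode(String token, String lemma) {
        int common = 0;
        int max = Math.min(token.length(), lemma.length());
        // Length of the shared prefix of token and lemma.
        while (common < max && token.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        int drop = token.length() - common;
        return drop + "|" + lemma.substring(common);
    }

    // Applies a toy lemma class to a token and returns the resulting lemma.
    static String decode(String token, String lemmaClass) {
        int sep = lemmaClass.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("Malformed lemma class: " + lemmaClass);
        }
        int drop = Integer.parseInt(lemmaClass.substring(0, sep));
        if (drop > token.length()) {
            throw new IllegalArgumentException("Lemma class does not fit token: " + token);
        }
        return token.substring(0, token.length() - drop) + lemmaClass.substring(sep + 1);
    }

    // Array variants, mirroring the shape (not the behavior) of the documented methods.
    static String[] encodeAll(String[] toks, String[] lemmas) {
        if (toks.length != lemmas.length) {
            throw new IllegalArgumentException("toks and lemmas must have the same length");
        }
        String[] classes = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            classes[i] = encode(toks[i], lemmas[i]);
        }
        return classes;
    }

    static String[] decodeAll(String[] toks, String[] classes) {
        if (toks.length != classes.length) {
            throw new IllegalArgumentException("toks and classes must have the same length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], classes[i]);
        }
        return lemmas;
    }
}
```

In this toy encoding, the pair ("flies", "fly") yields the class 3|y, and applying that class to "tries" gives "try". A rule that carries over to other words in this way is what allows a lemma class to be predicted as a label.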
    • topKSequences

      public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      Returns:
      The top-k sequences.
    • topKSequences

      public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      minSequenceScore - The minimum score to be achieved.
      Returns:
      The top-k sequences.
    • probs

      public void probs(double[] probs)
      Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).

      The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).

      Parameters:
      probs - An array used to hold the probabilities of the last decoded sequence.
    • probs

      public double[] probs()
      Returns the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
      Specified by:
      probs in interface opennlp.tools.ml.Probabilistic
      Returns:
      An array with one probability for each token passed to the most recent call to lemmatize(String[], String[]).
    • clearThreadLocalState

      public void clearThreadLocalState()
      Removes thread-local state to prevent classloader leaks in container environments. Call when the thread is returned to a pool or the lemmatizer is no longer needed.
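
The cleanup advice above can be sketched as a try/finally block around the work done on a pooled thread. The example is self-contained and does not use OpenNLP: ThreadLocalHolderSketch is a hypothetical stand-in that mimics a component holding per-thread state, and hasThreadLocalState() exists only so the effect can be observed. With a real LemmatizerME, the call placed in the finally block would be clearThreadLocalState().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a shared component that keeps per-thread state.
final class ThreadLocalHolderSketch {
    private final ThreadLocal<double[]> lastProbs = new ThreadLocal<>();

    // Pretends to do work and leaves state behind on the calling thread.
    String[] process(String[] toks) {
        lastProbs.set(new double[toks.length]);
        return toks.clone();
    }

    // Observation hook for this sketch only.
    boolean hasThreadLocalState() {
        return lastProbs.get() != null;
    }

    // Same name as the documented method; removes the calling thread's state.
    void clearThreadLocalState() {
        lastProbs.remove();
    }
}

final class PooledCleanupSketch {

    // Runs one task on a single-thread pool, then checks on the same worker
    // thread whether per-thread state is still present.
    static boolean stateLeftBehind(boolean cleanUp) {
        ThreadLocalHolderSketch shared = new ThreadLocalHolderSketch();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> {
                try {
                    shared.process(new String[] {"a", "b"});
                } finally {
                    // Clean up before the thread goes back to the pool.
                    if (cleanUp) {
                        shared.clearThreadLocalState();
                    }
                }
            };
            pool.submit(task).get();

            Callable<Boolean> check = shared::hasThreadLocalState;
            return pool.submit(check).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Putting the cleanup in finally means it also runs when the work throws, so a failed task does not leave state attached to a long-lived worker thread.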
    • train

      public static LemmatizerModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.lemmatizer.LemmaSample> samples, opennlp.tools.util.TrainingParameters params, LemmatizerFactory factory) throws IOException
      Trains a LemmatizerModel with the given parameters.
      Parameters:
      languageCode - The ISO-conformant language code.
      samples - The ObjectStream of LemmaSample used as input for training.
      params - The TrainingParameters for the context of the training.
      factory - The LemmatizerFactory for creating related objects defined via params.
      Returns:
      A valid, trained LemmatizerModel instance.
      Throws:
      IOException - Thrown if IO errors occurred.
    • topKLemmaClasses

      public opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      Returns:
      The top-k lemma classes.
    • topKLemmaClasses

      public opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
      Parameters:
      sentence - An array of tokens.
      tags - An array of postags.
      minSequenceScore - The minimum score to be achieved.
      Returns:
      The top-k lemma classes.