opennlp.tools.lemmatizer.LemmatizerME

All Implemented Interfaces: opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic

@ThreadSafe public class LemmatizerME extends Object implements opennlp.tools.lemmatizer.Lemmatizer, opennlp.tools.ml.Probabilistic

A probabilistic Lemmatizer implementation.

Tries to predict the induced permutation class for each word depending on its surrounding context.

A lemmatizer instance is thread-safe. One instance can be shared across multiple threads to save memory.

Note: In container environments with classloader isolation (e.g. Jakarta EE), ensure instances do not outlive the application's lifecycle, as underlying components use ThreadLocal state that may pin the classloader.

Based on: Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.

Field Summary

- static final int DEFAULT_BEAM_SIZE
- static final int LEMMA_NUMBER

Constructor Summary

- LemmatizerME(LemmatizerModel model)
  Initializes a LemmatizerME with the provided model and a default beam size of 3.

Method Summary

- void clearThreadLocalState()
  Removes thread-local state to prevent classloader leaks in container environments.
- static String[] decodeLemmas(String[] toks, String[] preds)
  Decodes the lemma from the word and the induced lemma class.
- static String[] encodeLemmas(String[] toks, String[] lemmas)
  Encodes each word and its lemma as a lemma class.
- String[] lemmatize(String[] toks, String[] tags)
- List<List<String>> lemmatize(List<String> toks, List<String> tags)
- String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
  Predicts all possible lemmas, using numLemmas as the upper bound.
- String[] predictSES(String[] toks, String[] tags)
  Predicts the Short Edit Script (the automatically induced lemma class) for each token.
- double[] probs()
  Returns the probabilities of the last decoded sequence, as determined by the previous call to lemmatize(String[], String[]).
- void probs(double[] probs)
  Populates the specified array with the probabilities of the last decoded sequence.
- opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
- opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
- opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags)
- opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
- static LemmatizerModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.lemmatizer.LemmaSample> samples, opennlp.tools.util.TrainingParameters params, LemmatizerFactory factory)
  Trains a LemmatizerModel with the given parameters.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LEMMA_NUMBER
  
  public static final int LEMMA_NUMBER
  See Also:
  
  Constant Field Values
- DEFAULT_BEAM_SIZE
  
  public static final int DEFAULT_BEAM_SIZE
  See Also:
  
  Constant Field Values

Constructor Details
- LemmatizerME
  
  public LemmatizerME(LemmatizerModel model)
  
  Initializes a LemmatizerME with the provided model and a default beam size of 3.
  
  Parameters:
  
  model - The LemmatizerModel to be used.

Method Details
- lemmatize
  
  public String[] lemmatize(String[] toks, String[] tags)
  
  Specified by:
  
  lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
- lemmatize
  
  public List<List<String>> lemmatize(List<String> toks, List<String> tags)
  
  Specified by:
  
  lemmatize in interface opennlp.tools.lemmatizer.Lemmatizer
- predictSES
  
  public String[] predictSES(String[] toks, String[] tags)
  
  Predicts the Short Edit Script (the automatically induced lemma class) for each token.
  
  Parameters:
  
  toks - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  An array of possible lemma classes for each token in toks.
- predictLemmas
  
  public String[][] predictLemmas(int numLemmas, String[] toks, String[] tags)
  
  Predicts all possible lemmas, using numLemmas as the upper bound.
  
  Parameters:
  
  numLemmas - The upper bound on the number of lemmas to predict.
  
  toks - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  A 2-dimensional array containing all possible lemmas for each token and postag pair.
- decodeLemmas
  
  public static String[] decodeLemmas(String[] toks, String[] preds)
  
  Decodes the lemma from the word and the induced lemma class.
  
  Parameters:
  
  toks - An array of tokens.
  
  preds - An array of predicted lemma classes.
  
  Returns:
  
  The array of decoded lemmas.
- encodeLemmas
  
  public static String[] encodeLemmas(String[] toks, String[] lemmas)
  
  Encodes each word and its lemma as a lemma class.
  
  Parameters:
  
  toks - An array of tokens.
  
  lemmas - An array of lemmas.
  
  Returns:
  
  The array of lemma classes.
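
The relationship between a token, its lemma, and a lemma class can be illustrated with a deliberately simplified scheme. The sketch below is an assumption for illustration only: it encodes a lemma as "number of trailing characters to strip, then suffix to append", which is NOT the encoding that encodeLemmas and decodeLemmas actually use. The class name SuffixEditSketch and the `|` separator are hypothetical.

```java
// Simplified illustration of "lemma class = edit applied to the token".
// Hypothetical scheme; it does not reproduce OpenNLP's Short Edit Script format.
final class SuffixEditSketch {

    // Encodes the lemma relative to the token as "<chars to strip>|<suffix to append>".
    static String encode(String token, String lemma) {
        int common = 0;
        int max = Math.min(token.length(), lemma.length());
        while (common < max && token.charAt(common) == lemma.charAt(common)) {
            common++; // length of the shared prefix
        }
        return (token.length() - common) + "|" + lemma.substring(common);
    }

    // Applies an edit class produced by encode() to a (possibly different) token.
    static String decode(String token, String editClass) {
        int sep = editClass.indexOf('|');
        int strip = Integer.parseInt(editClass.substring(0, sep));
        int keep = Math.max(0, token.length() - strip); // never strip more than the token has
        return token.substring(0, keep) + editClass.substring(sep + 1);
    }

    // Array variants mirroring the shape of the static methods documented above.
    static String[] encodeAll(String[] toks, String[] lemmas) {
        String[] classes = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            classes[i] = encode(toks[i], lemmas[i]);
        }
        return classes;
    }

    static String[] decodeAll(String[] toks, String[] classes) {
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = decode(toks[i], classes[i]);
        }
        return lemmas;
    }
}
```

The point of such an encoding is that the class learned from one word generalizes to others: in this toy scheme, the class derived from "studies"/"study" also maps "copies" to "copy", which is what lets a classifier predict lemmas for unseen tokens.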
- topKSequences
  
  public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  The top-k sequences.
- topKSequences
  
  public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  minSequenceScore - The minimum score to be achieved.
  
  Returns:
  
  The top-k sequences.
- probs
  
  public void probs(double[] probs)
  
  Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
  The specified array should be at least as large as the number of tokens in the previous call to lemmatize(String[], String[]).
  
  Parameters:
  
  probs - An array used to hold the probabilities of the last decoded sequence.
- probs
  
  public double[] probs()
  
  Returns the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize(String[], String[]).
  
  Specified by:
  
  probs in interface opennlp.tools.ml.Probabilistic
  
  Returns:
  
  An array containing one probability for each token passed to the most recent call to lemmatize(String[], String[]).
- clearThreadLocalState
  
  public void clearThreadLocalState()
  
  Removes thread-local state to prevent classloader leaks in container environments. Call when the thread is returned to a pool or the lemmatizer is no longer needed.
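
The cleanup advice above follows the usual try/finally pattern for pooled threads. The following is a minimal sketch of that pattern, assuming a stand-in class: PerThreadComponent and PooledCleanupSketch are hypothetical and exist only so the example runs without a model file; only the method name clearThreadLocalState() is taken from this API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a component that keeps per-thread state.
// It is NOT LemmatizerME; it only mimics the clearThreadLocalState() contract.
final class PerThreadComponent {
    private final ThreadLocal<double[]> scratch = new ThreadLocal<>();

    String[] process(String[] toks) {
        scratch.set(new double[toks.length]); // state is now attached to the calling thread
        return toks.clone();
    }

    boolean hasThreadLocalState() {
        return scratch.get() != null;
    }

    void clearThreadLocalState() {
        scratch.remove();
    }
}

final class PooledCleanupSketch {

    // Runs one task on a single pooled thread, then asks that same thread
    // whether per-thread state is still attached to it.
    static boolean stateLeftAfterTask(PerThreadComponent shared, boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> {
                try {
                    shared.process(new String[] {"a", "b"});
                } finally {
                    if (cleanUp) {
                        // Clear before the thread is handed back to the pool.
                        shared.clearThreadLocalState();
                    }
                }
            };
            pool.submit(task).get();
            Callable<Boolean> probe = shared::hasThreadLocalState;
            return pool.submit(probe).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real LemmatizerME the same shape would apply: call lemmatize inside the try block and clearThreadLocalState() in the finally block of each pooled task, so that the cleanup runs on the thread that created the state.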
- train
  
  public static LemmatizerModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.lemmatizer.LemmaSample> samples, opennlp.tools.util.TrainingParameters params, LemmatizerFactory factory) throws IOException
  
  Trains a LemmatizerModel with the given parameters.
  
  Parameters:
  
  languageCode - The ISO-conformant language code.
  
  samples - The ObjectStream of LemmaSample used as input for training.
  
  params - The TrainingParameters for the context of the training.
  
  factory - The LemmatizerFactory for creating related objects defined via params.
  
  Returns:
  
  A valid, trained LemmatizerModel instance.
  
  Throws:
  
  IOException - Thrown if IO errors occurred.
- topKLemmaClasses
  
  public opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  Returns:
  
  The top-k lemma classes.
- topKLemmaClasses
  
  public opennlp.tools.util.Sequence[] topKLemmaClasses(String[] sentence, String[] tags, double minSequenceScore)
  
  Parameters:
  
  sentence - An array of tokens.
  
  tags - An array of postags.
  
  minSequenceScore - The minimum score to be achieved.
  
  Returns:
  
  The top-k lemma classes.
