Class POSTaggerME

java.lang.Object
opennlp.tools.postag.POSTaggerME
All Implemented Interfaces:
POSTagger

public class POSTaggerME extends Object implements POSTagger
A part-of-speech tagger implementation that uses maximum entropy.

Predicts whether words are nouns, verbs, or other parts of speech based on their surrounding context.
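A maximum entropy tagger scores each candidate tag using features drawn from the token and its neighbours. The stdlib-only sketch below shows what such contextual features might look like. ContextFeatures, its feature names, and the chosen feature set are hypothetical. They do not reproduce OpenNLP's own context generator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT OpenNLP's context generator: builds string features
// for the token at `index` from the token itself and its surrounding context.
class ContextFeatures {

  static List<String> features(String[] tokens, String[] prevTags, int index) {
    List<String> f = new ArrayList<>();
    String w = tokens[index];
    f.add("w=" + w);
    // Suffixes (up to 3 chars, shorter than the word) help with unseen words.
    for (int n = 1; n <= 3 && n < w.length(); n++) {
      f.add("suf=" + w.substring(w.length() - n));
    }
    f.add("prevW=" + (index > 0 ? tokens[index - 1] : "*BOS*"));
    f.add("nextW=" + (index + 1 < tokens.length ? tokens[index + 1] : "*EOS*"));
    // Tag already chosen for the previous token in the current hypothesis.
    f.add("prevT=" + (index > 0 ? prevTags[index - 1] : "*BOS*"));
    return f;
  }

  public static void main(String[] args) {
    String[] tokens = {"The", "dog", "barks"};
    String[] prevTags = {"DT", "NN", null};
    // prints [w=barks, suf=s, suf=ks, suf=rks, prevW=dog, nextW=*EOS*, prevT=NN]
    System.out.println(features(tokens, prevTags, 2));
  }
}
```

A maxent model would turn each feature string into a weighted indicator and normalize the scores over all tags into a probability distribution.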

  • Field Details

    • DEFAULT_BEAM_SIZE

      public static final int DEFAULT_BEAM_SIZE
      The default beam size value is 3.
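In beam search, the beam size is the number of partial tag sequences kept after each token is processed. The stdlib-only sketch below illustrates that idea. TagModel, Hyp, decode, and the toy probabilities are hypothetical, and the code is not OpenNLP's internal decoder. Because the final beam holds up to beamSize complete sequences, a decoder like this can return several ranked taggings, as topKSequences does, rather than only the best one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of beam search over tag sequences (not OpenNLP code).
class BeamSketch {

  // Assumed model shape: P(tag | tokens, position, previous tag).
  interface TagModel {
    Map<String, Double> probs(String[] tokens, int index, String prevTag);
  }

  // A (partial) tag sequence with the sum of its per-token log-probabilities.
  static final class Hyp {
    final List<String> tags;
    final double logProb;

    Hyp(List<String> tags, double logProb) {
      this.tags = tags;
      this.logProb = logProb;
    }
  }

  // Returns at most beamSize complete sequences, best first.
  static List<Hyp> decode(String[] tokens, TagModel model, int beamSize) {
    List<Hyp> beam = new ArrayList<>();
    beam.add(new Hyp(new ArrayList<>(), 0.0));
    for (int i = 0; i < tokens.length; i++) {
      List<Hyp> candidates = new ArrayList<>();
      for (Hyp h : beam) {
        String prev = h.tags.isEmpty() ? null : h.tags.get(h.tags.size() - 1);
        // Extend every surviving hypothesis with every possible tag.
        for (Map.Entry<String, Double> e : model.probs(tokens, i, prev).entrySet()) {
          List<String> extended = new ArrayList<>(h.tags);
          extended.add(e.getKey());
          candidates.add(new Hyp(extended, h.logProb + Math.log(e.getValue())));
        }
      }
      // Prune: keep only the beamSize highest-scoring partial sequences.
      candidates.sort((a, b) -> Double.compare(b.logProb, a.logProb));
      beam = new ArrayList<>(candidates.subList(0, Math.min(beamSize, candidates.size())));
    }
    return beam;
  }

  public static void main(String[] args) {
    // Toy model for "time flies": the second tag depends on the first.
    TagModel model = (tokens, index, prev) -> {
      Map<String, Double> p = new HashMap<>();
      if (index == 0) { p.put("NN", 0.6); p.put("VB", 0.4); }
      else if ("NN".equals(prev)) { p.put("VBZ", 0.7); p.put("NNS", 0.3); }
      else { p.put("NNS", 0.9); p.put("VBZ", 0.1); }
      return p;
    };
    for (Hyp h : decode(new String[] {"time", "flies"}, model, 3)) {
      System.out.println(h.tags + " p=" + Math.exp(h.logProb));
    }
  }
}
```

With a beam size of 3, the toy run keeps [NN, VBZ] (p = 0.42), [VB, NNS] (0.36) and [NN, NNS] (0.18), and it drops [VB, VBZ] (0.04). Summing log-probabilities instead of multiplying probabilities avoids numeric underflow on long sentences.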
  • Constructor Details

  • Method Details

    • getAllPosTags

      public String[] getAllPosTags()
      Returns:
      An array of all part-of-speech tags the tagger can assign.
    • tag

      public String[] tag(String[] sentence)
      Assigns POS tags to a sentence of tokens.
      Specified by:
      tag in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      Returns:
      An array of POS tags, one for each token in sentence.
    • tag

      public String[] tag(String[] sentence, Object[] additionalContext)
      Assigns POS tags to a sentence of tokens.
      Specified by:
      tag in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      additionalContext - Additional context information to take into account when tagging.
      Returns:
      An array of POS tags, one for each token in sentence.
    • tag

      public String[][] tag(int numTaggings, String[] sentence)
      Returns at most numTaggings possible taggings for the specified sentence.
      Parameters:
      numTaggings - The maximum number of taggings to be returned.
      sentence - An array of tokens which make up a sentence.
      Returns:
      At most the specified number of taggings for the specified sentence.
    • topKSequences

      public Sequence[] topKSequences(String[] sentence)
      Determines the top-k most probable tag sequences for the sentence.
      Specified by:
      topKSequences in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      Returns:
      An array of the top-k tag sequences for sentence; each sequence assigns a tag to every token.
    • topKSequences

      public Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
      Determines the top-k most probable tag sequences for the sentence.
      Specified by:
      topKSequences in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      additionalContext - The context to provide additional information with.
      Returns:
      An array of the top-k tag sequences for sentence; each sequence assigns a tag to every token.
    • probs

      public void probs(double[] probs)
      Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
      Parameters:
      probs - An array to put the probabilities into.
    • probs

      public double[] probs()
      Returns:
      An array with the probabilities for each tag of the last tagged sentence.
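Both probs variants report per-token probabilities from the most recent tagging call, so their results depend on the tagger's state. The stdlib-only sketch below shows that pattern. It assumes that the probabilities of the best sequence are stored at the end of each tagging call. LastProbs and remember are hypothetical names, not OpenNLP API.

```java
import java.util.Arrays;

// Hypothetical sketch: an object that remembers the per-token probabilities
// of the best sequence from its most recent tagging call.
class LastProbs {
  private double[] last = new double[0];

  // Assumed to be called at the end of each tagging call.
  void remember(double[] perTokenProbs) {
    last = perTokenProbs.clone();
  }

  // Variant 1: return a fresh copy the caller may modify freely.
  double[] probs() {
    return last.clone();
  }

  // Variant 2: copy into a caller-supplied array, which must have at least
  // as many elements as the last sentence had tokens.
  void probs(double[] out) {
    System.arraycopy(last, 0, out, 0, last.length);
  }

  public static void main(String[] args) {
    LastProbs tagger = new LastProbs();
    tagger.remember(new double[] {0.97, 0.88, 0.91});
    double[] reused = new double[3];
    tagger.probs(reused);
    System.out.println(Arrays.toString(reused)); // [0.97, 0.88, 0.91]
  }
}
```

The fill-in variant lets a caller reuse one buffer across many sentences. In this sketch the values belong to whichever call ran last, so one instance should not be shared between threads that tag concurrently.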
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index)
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
    • train

      public static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters mlParams, POSTaggerFactory posFactory) throws IOException
      Trains a POSModel with the given parameters.
      Parameters:
      languageCode - The ISO language code of the model to be trained. Must not be null.
      samples - The ObjectStream of POSSample used as input for training.
      mlParams - The TrainingParameters for the context of the training process.
      posFactory - The POSTaggerFactory for creating related objects as defined via mlParams.
      Returns:
      A valid, trained POSModel instance.
      Throws:
      IOException - Thrown if IO errors occurred.
    • buildNGramDictionary

      public static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) throws IOException
      Constructs an n-gram dictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      cutoff - A non-negative cut-off value.
      Returns:
      A valid Dictionary instance holding n-grams.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
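The sketch below uses only the standard library to count n-grams under a frequency cutoff. It assumes that entries occurring fewer than cutoff times are discarded. The n-gram order n, the class NGramCounts, and the Set-based result are hypothetical stand-ins for OpenNLP's Dictionary, not its implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: n-gram counting with a frequency cutoff.
class NGramCounts {

  // Returns every n-gram of order n that occurs at least `cutoff` times.
  static Set<List<String>> build(List<String[]> sentences, int n, int cutoff) {
    if (n < 1) throw new IllegalArgumentException("n must be positive");
    if (cutoff < 0) throw new IllegalArgumentException("cutoff must be non-negative");
    Map<List<String>, Integer> counts = new HashMap<>();
    for (String[] tokens : sentences) {
      for (int i = 0; i + n <= tokens.length; i++) {
        // List.of gives an immutable list with value-based equals/hashCode.
        List<String> gram = List.of(Arrays.copyOfRange(tokens, i, i + n));
        counts.merge(gram, 1, Integer::sum);
      }
    }
    Set<List<String>> kept = new HashSet<>();
    for (Map.Entry<List<String>, Integer> e : counts.entrySet()) {
      if (e.getValue() >= cutoff) {
        kept.add(e.getKey());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String[]> sentences = List.of(
        new String[] {"the", "dog", "barks"},
        new String[] {"the", "dog", "sleeps"});
    System.out.println(build(sentences, 2, 2)); // [[the, dog]]
  }
}
```

Raising the cutoff shrinks the dictionary to frequent, more reliable entries. A cutoff of 0 keeps every n-gram seen.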
    • populatePOSDictionary

      public static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) throws IOException
      Populates a tag dictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      dict - The MutableTagDictionary to populate.
      cutoff - A non-negative cut-off value.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
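The stdlib-only sketch below builds a word-to-tags dictionary from tagged sentences. It assumes that a word/tag pair is added once it has been seen at least cutoff times and that tags already present in the dictionary are kept. TagDictSketch, TaggedSentence, and the Map-based dictionary are hypothetical stand-ins for MutableTagDictionary, and the sketch ignores case handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of populating a word -> allowed-tags dictionary.
class TagDictSketch {

  // A tagged sentence: words[i] is annotated with tags[i].
  static final class TaggedSentence {
    final String[] words;
    final String[] tags;

    TaggedSentence(String[] words, String[] tags) {
      if (words.length != tags.length) {
        throw new IllegalArgumentException("words and tags must have the same length");
      }
      this.words = words;
      this.tags = tags;
    }
  }

  // Existing tag sets in dict must be mutable; new ones are created as TreeSets.
  static void populate(List<TaggedSentence> samples, Map<String, Set<String>> dict, int cutoff) {
    if (cutoff < 0) throw new IllegalArgumentException("cutoff must be non-negative");
    // First pass: count how often each word occurs with each tag.
    Map<String, Map<String, Integer>> counts = new HashMap<>();
    for (TaggedSentence s : samples) {
      for (int i = 0; i < s.words.length; i++) {
        counts.computeIfAbsent(s.words[i], w -> new HashMap<>())
            .merge(s.tags[i], 1, Integer::sum);
      }
    }
    // Second pass: add tags that meet the cutoff; tags already in dict are kept.
    for (Map.Entry<String, Map<String, Integer>> word : counts.entrySet()) {
      for (Map.Entry<String, Integer> tag : word.getValue().entrySet()) {
        if (tag.getValue() >= cutoff) {
          dict.computeIfAbsent(word.getKey(), w -> new TreeSet<>()).add(tag.getKey());
        }
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Set<String>> dict = new HashMap<>();
    dict.put("run", new TreeSet<>(Set.of("NN")));
    populate(List.of(
        new TaggedSentence(new String[] {"I", "run"}, new String[] {"PRP", "VBP"}),
        new TaggedSentence(new String[] {"We", "run"}, new String[] {"PRP", "VBP"})), dict, 2);
    System.out.println(dict.get("run")); // [NN, VBP]
  }
}
```

A dictionary like this can restrict each known word to the tags it was actually seen with, which narrows the candidates a decoder has to consider.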