Class POSTaggerME

java.lang.Object
opennlp.tools.postag.POSTaggerME
All Implemented Interfaces:
POSTagger

public class POSTaggerME extends Object implements POSTagger
A part-of-speech tagger that uses maximum entropy.

Tries to predict whether each word is a noun, a verb, or one of 70 other POS tags, depending on its surrounding context.

  • Field Details

  • Constructor Details

    • POSTaggerME

      public POSTaggerME(String language) throws IOException
      Initializes a POSTaggerME by downloading a default model for a given language.
      Parameters:
      language - An ISO-conformant language code.
      Throws:
      IOException - Thrown if the model could not be downloaded or saved.
    • POSTaggerME

      public POSTaggerME(POSModel model)
      Initializes a POSTaggerME with the provided model.
      Parameters:
      model - A valid POSModel.
  • Method Details

    • getAllPosTags

      public String[] getAllPosTags()
      Returns:
      An array of all possible part-of-speech tags from the tagger.
    • tag

      public String[] tag(String[] sentence)
      Description copied from interface: POSTagger
      Assigns POS tags to the tokens of the sentence.
      Specified by:
      tag in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      Returns:
      An array of POS tags, one for each token provided in sentence.
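
Because tag(String[]) returns one tag per token, the result can be lined up with the input by index. The following is a minimal sketch of that pairing, not part of OpenNLP: the class TokenTagPairs, its pair method, and the "_" separator are hypothetical, and the tag values are hand-written placeholders standing in for real tagger output.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenTagPairs {

    // Joins each token with the tag at the same index, e.g. "dog_NN".
    static List<String> pair(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("expected one tag per token");
        }
        List<String> pairs = new ArrayList<>(tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(tokens[i] + "_" + tags[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks", "."};
        // With a loaded POSModel, this array would come from the tagger:
        //   String[] tags = new POSTaggerME(model).tag(tokens);
        // The values below are placeholders, not real model output.
        String[] tags = {"DT", "NN", "VBZ", "."};
        System.out.println(String.join(" ", pair(tokens, tags)));
    }
}
```

The length check is a defensive choice of this sketch. It guards against passing arrays that come from different sentences.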
    • tag

      public String[] tag(String[] sentence, Object[] additionalContext)
      Description copied from interface: POSTagger
      Assigns POS tags to the tokens of the sentence.
      Specified by:
      tag in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      additionalContext - The context to provide additional information with.
      Returns:
      An array of POS tags, one for each token provided in sentence.
    • tag

      public String[][] tag(int numTaggings, String[] sentence)
      Returns at most the specified numTaggings for the specified sentence.
      Parameters:
      numTaggings - The maximum number of taggings to be returned.
      sentence - An array of tokens which make up a sentence.
      Returns:
      At most the specified number of taggings for the specified sentence.
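
Since this overload may return fewer rows than numTaggings, callers should probably loop over the length of the returned array rather than over the requested count. The sketch below illustrates that. It assumes each row holds one tag per token. The class KBestTaggings, its format method, and the sample tag values are hypothetical and are not real model output.

```java
final class KBestTaggings {

    // Formats each alternative tagging on its own line, numbered from 1.
    // Assumes every row of taggings has the same length as tokens.
    static String format(String[] tokens, String[][] taggings) {
        StringBuilder sb = new StringBuilder();
        // Iterate over what was actually returned, which may be
        // fewer rows than the numTaggings that was requested.
        for (int k = 0; k < taggings.length; k++) {
            sb.append(k + 1).append(':');
            for (int i = 0; i < tokens.length; i++) {
                sb.append(' ').append(tokens[i]).append('/').append(taggings[k][i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"time", "flies"};
        // With a loaded POSModel, this array would come from the tagger:
        //   String[][] taggings = new POSTaggerME(model).tag(3, tokens);
        // Placeholder rows; only two are present although three were "requested".
        String[][] taggings = {{"NN", "VBZ"}, {"VB", "NNS"}};
        System.out.print(format(tokens, taggings));
    }
}
```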
    • topKSequences

      public Sequence[] topKSequences(String[] sentence)
      Description copied from interface: POSTagger
      Computes the top-k tag sequences for the sentence.
      Specified by:
      topKSequences in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      Returns:
      An array of the top-k tag sequences for the tokens provided in sentence.
    • topKSequences

      public Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
      Description copied from interface: POSTagger
      Computes the top-k tag sequences for the sentence.
      Specified by:
      topKSequences in interface POSTagger
      Parameters:
      sentence - The sentence of tokens to be tagged.
      additionalContext - The context to provide additional information with.
      Returns:
      An array of the top-k tag sequences for the tokens provided in sentence.
    • probs

      public void probs(double[] probs)
      Populates the specified array with the probabilities for each tag of the last tagged sentence.
      Parameters:
      probs - An array to put the probabilities into.
    • probs

      public double[] probs()
      Returns:
      An array with the probabilities for each tag of the last tagged sentence.
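
One plausible use of these probabilities is to flag tokens whose tag the model was unsure about. The sketch below assumes the array holds one probability per token of the last tagged sentence, in token order. The class LowConfidenceTokens, its below method, the 0.9 threshold, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LowConfidenceTokens {

    // Returns the tokens whose tag probability is strictly below the threshold.
    static List<String> below(String[] tokens, double[] probs, double threshold) {
        if (tokens.length != probs.length) {
            throw new IllegalArgumentException("expected one probability per token");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(tokens[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "can", "rusts"};
        // With a loaded POSModel, the probabilities would be read after tagging:
        //   POSTaggerME tagger = new POSTaggerME(model);
        //   tagger.tag(tokens);
        //   double[] probs = tagger.probs();
        // Placeholder values, not real model output.
        double[] probs = {0.99, 0.55, 0.80};
        System.out.println(below(tokens, probs, 0.9));
    }
}
```

Because probs() refers to the last tagged sentence, it should be read before the same tagger instance tags another sentence.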
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index)
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
    • train

      public static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters trainParams, POSTaggerFactory posFactory) throws IOException
      Throws:
      IOException
    • buildNGramDictionary

      public static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) throws IOException
      Constructs an nGram dictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      cutoff - A non-negative cut-off value.
      Returns:
      A valid Dictionary instance holding nGrams.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
    • populatePOSDictionary

      public static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) throws IOException
      Populates a POSDictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      dict - The MutableTagDictionary to use during population.
      cutoff - A non-negative cut-off value.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
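
To make the role of a cut-off concrete, here is a stand-alone sketch of a frequency-filtered word-to-tags dictionary. It does not use OpenNLP and does not claim to reproduce what populatePOSDictionary does internally. In particular, reading the cut-off as "keep a tag only if the word/tag pair was seen at least cutoff times" is an assumption of this sketch. The class TagDictionarySketch and the sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class TagDictionarySketch {

    // Builds word -> allowed tags from parallel sentence arrays.
    // ASSUMPTION: a (word, tag) pair is kept only if seen at least `cutoff` times.
    static Map<String, Set<String>> build(String[][] words, String[][] tags, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be non-negative");
        }
        // First pass: count how often each word occurs with each tag.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int s = 0; s < words.length; s++) {
            for (int i = 0; i < words[s].length; i++) {
                counts.computeIfAbsent(words[s][i], w -> new HashMap<>())
                      .merge(tags[s][i], 1, Integer::sum);
            }
        }
        // Second pass: keep only the pairs that reach the cut-off.
        Map<String, Set<String>> dict = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> word : counts.entrySet()) {
            for (Map.Entry<String, Integer> tag : word.getValue().entrySet()) {
                if (tag.getValue() >= cutoff) {
                    dict.computeIfAbsent(word.getKey(), w -> new TreeSet<>())
                        .add(tag.getKey());
                }
            }
        }
        return dict;
    }

    public static void main(String[] args) {
        // Placeholder training data: "can" is tagged MD twice and NN once.
        String[][] words = {{"can", "can"}, {"can"}};
        String[][] tags = {{"MD", "MD"}, {"NN"}};
        System.out.println(build(words, tags, 2));
    }
}
```

Under this reading, a higher cut-off trades coverage for reliability: rare word/tag pairs, which are more likely to be annotation noise, are dropped from the dictionary.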