Package opennlp.tools.postag
Class POSTaggerME
- java.lang.Object
  - opennlp.tools.postag.POSTaggerME
- All Implemented Interfaces: POSTagger
public class POSTaggerME extends Object implements POSTagger
A part-of-speech tagger that uses maximum entropy. Tries to predict whether words are nouns, verbs, or any of 70 other POS tags depending on their surrounding context.
-
-
Field Summary
Fields
- static int DEFAULT_BEAM_SIZE
-
Constructor Summary
Constructors
- POSTaggerME(String language): Initializes a POSTaggerME by downloading a default model for a given language.
- POSTaggerME(POSModel model): Initializes a POSTaggerME with the provided model.
-
Method Summary
All Methods
- static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff): Constructs an nGram dictionary from an ObjectStream of samples.
- String[] getAllPosTags()
- String[] getOrderedTags(List<String> words, List<String> tags, int index)
- String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
- static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff): Populates a POSDictionary from an ObjectStream of samples.
- double[] probs()
- void probs(double[] probs): Populates the specified array with the probabilities for each tag of the last tagged sentence.
- String[][] tag(int numTaggings, String[] sentence): Returns at most the specified numTaggings for the specified sentence.
- String[] tag(String[] sentence): Assigns POS tags to the tokens of the sentence.
- String[] tag(String[] sentence, Object[] additionalContext): Assigns POS tags to the tokens of the sentence.
- Sequence[] topKSequences(String[] sentence): Assigns the sentence the top-k sequences.
- Sequence[] topKSequences(String[] sentence, Object[] additionalContext): Assigns the sentence the top-k sequences.
- static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters trainParams, POSTaggerFactory posFactory)
-
-
-
Field Detail
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
POSTaggerME
public POSTaggerME(String language) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
- Parameters:
language - An ISO-conformant language code.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
-
POSTaggerME
public POSTaggerME(POSModel model)
Initializes a POSTaggerME with the provided model.
- Parameters:
model - A valid POSModel.
-
-
Method Detail
-
getAllPosTags
public String[] getAllPosTags()
- Returns:
- An array of all possible part-of-speech tags from the tagger.
-
tag
public String[] tag(String[] sentence)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.
-
tag
public String[] tag(String[] sentence, Object[] additionalContext)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.
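The single-result tag variants above return a String[] of tags, which is expected to be parallel to the input tokens (one tag per token). The following is a minimal sketch of consuming that result. To keep it compilable without the library, the tagger is represented by a java.util.function.Function<String[], String[]>; the class name TagPairingSketch, its methods, and the fixed tags in main are hypothetical and not part of OpenNLP. With a real POSTaggerME instance, a method reference such as tagger::tag could presumably be passed in its place.

```java
import java.util.function.Function;

class TagPairingSketch {

    // Joins each token with its tag as "token_TAG"; both arrays must be parallel.
    static String pair(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("expected exactly one tag per token");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    // Applies a tagging function (a stand-in for a tagger's tag(String[]) method)
    // and pairs its output with the input tokens.
    static String tagAndPair(Function<String[], String[]> tagFn, String[] tokens) {
        return pair(tokens, tagFn.apply(tokens));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in that always returns the same two tags.
        Function<String[], String[]> fakeTagger = s -> new String[] {"UH", "NN"};
        // prints "Hello_UH world_NN"
        System.out.println(tagAndPair(fakeTagger, new String[] {"Hello", "world"}));
    }
}
```

The length check is deliberate: if a caller tokenizes differently from what the tagger received, the mismatch surfaces immediately instead of producing misaligned pairs.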
-
tag
public String[][] tag(int numTaggings, String[] sentence)
Returns at most the specified numTaggings for the specified sentence.
- Parameters:
numTaggings - The number of taggings to be returned.
sentence - An array of tokens which make up a sentence.
- Returns:
- At most the specified number of taggings for the specified sentence.
-
topKSequences
public Sequence[] topKSequences(String[] sentence)
Description copied from interface: POSTagger
Assigns the sentence the top-k sequences.
- Specified by:
topKSequences in interface POSTagger
- Parameters:
sentence - The sentence of tokens to be tagged.
- Returns:
- An array of sequences for each token provided in sentence.
-
topKSequences
public Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
Description copied from interface: POSTagger
Assigns the sentence the top-k sequences.
- Specified by:
topKSequences in interface POSTagger
- Parameters:
sentence - The sentence of tokens to be tagged.
additionalContext - The context to provide additional information with.
- Returns:
- An array of sequences for each token provided in sentence.
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities for each tag of the last tagged sentence.
- Parameters:
probs - An array to put the probabilities into.
-
probs
public double[] probs()
- Returns:
- An array with the probabilities for each tag of the last tagged sentence.
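As an illustration of how the probabilities might be consumed, the sketch below assumes that the array returned by probs() is parallel to the tokens and tags of the last tagged sentence (one probability per tag). The class TagConfidenceSketch and its methods are hypothetical helpers that operate on plain arrays, so the block compiles without the library; in real use the arrays would come from calling tag(sentence) followed by probs() on the same tagger instance.

```java
import java.util.Locale;

class TagConfidenceSketch {

    // Formats parallel arrays as "token/TAG (probability)", one entry per token.
    static String[] describe(String[] tokens, String[] tags, double[] probs) {
        if (tokens.length != tags.length || tokens.length != probs.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        String[] lines = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Locale.ROOT keeps the decimal separator stable across platforms.
            lines[i] = String.format(Locale.ROOT, "%s/%s (%.2f)", tokens[i], tags[i], probs[i]);
        }
        return lines;
    }

    // Returns the index of the lowest probability, or -1 for an empty array.
    static int leastConfident(double[] probs) {
        int min = -1;
        for (int i = 0; i < probs.length; i++) {
            if (min < 0 || probs[i] < probs[min]) {
                min = i;
            }
        }
        return min;
    }
}
```

Because the probabilities refer to the last tagged sentence, reading them before tagging another sentence on the same instance is the safer order of operations.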
-
getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
-
train
public static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters trainParams, POSTaggerFactory posFactory) throws IOException
- Throws:
IOException
-
buildNGramDictionary
public static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) throws IOException
Constructs an nGram dictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
cutoff - A non-negative cut-off value.
- Returns:
- A valid Dictionary instance holding nGrams.
- Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
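The description above does not spell out how the cut-off is applied. A common reading, assumed here and not confirmed by this page, is that entries observed fewer than cutoff times are dropped. The sketch below illustrates that idea with plain Java collections and single tokens standing in for nGrams; CutoffSketch and countWithCutoff are hypothetical names and do not reproduce the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CutoffSketch {

    // Counts every entry, then removes those seen fewer than `cutoff` times.
    // Insertion order is preserved so the result is easy to inspect.
    static Map<String, Integer> countWithCutoff(List<String> entries, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be non-negative");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entry : entries) {
            counts.merge(entry, 1, Integer::sum);
        }
        // Assumed semantics: keep an entry only if its count reaches the cutoff.
        counts.values().removeIf(count -> count < cutoff);
        return counts;
    }
}
```

Under this reading, a cut-off of 0 or 1 keeps every observed entry, while larger values trade coverage for a smaller dictionary.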
-
populatePOSDictionary
public static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) throws IOException
Populates a POSDictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
dict - The MutableTagDictionary to use during population.
cutoff - A non-negative cut-off value.
- Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
-
-