Package opennlp.tools.postag
Class POSTaggerME
- java.lang.Object
  - opennlp.tools.postag.POSTaggerME
- All Implemented Interfaces: POSTagger
public class POSTaggerME extends Object implements POSTagger
A part-of-speech tagger that uses maximum entropy. Tries to predict whether words are nouns, verbs, or any of 70 other POS tags depending on their surrounding context.
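To make "maximum entropy" more concrete, the following minimal sketch shows how a model of this family can turn context features into a probability distribution over tags: each tag's score is the sum of the weights of the active features, and the scores are normalized with a softmax. The class name MaxEntSketch, the feature strings, and the weights are hypothetical illustrations; this is not OpenNLP's implementation, and the numbers are not trained values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of OpenNLP.
final class MaxEntSketch {

    /**
     * Returns p(tag | features) for every tag in weights. A tag's score is the
     * sum of its weights for the active features; scores are normalized with a
     * softmax (the maximum is subtracted first for numerical stability).
     */
    static Map<String, Double> distribution(Map<String, Map<String, Double>> weights,
                                            List<String> activeFeatures) {
        Map<String, Double> scores = new LinkedHashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> tag : weights.entrySet()) {
            double sum = 0.0;
            for (String feature : activeFeatures) {
                sum += tag.getValue().getOrDefault(feature, 0.0);
            }
            scores.put(tag.getKey(), sum);
            max = Math.max(max, sum);
        }
        double normalizer = 0.0;
        for (double score : scores.values()) {
            normalizer += Math.exp(score - max);
        }
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            probs.put(e.getKey(), Math.exp(e.getValue() - max) / normalizer);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Made-up weights: "prev=the" favors a noun, "prev=to" favors a verb.
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        weights.put("NN", Map.of("prev=the", 1.5, "suffix=ing", -0.5));
        weights.put("VB", Map.of("prev=to", 2.0, "suffix=ing", 0.5));
        System.out.println(distribution(weights, List.of("prev=the")));
    }
}
```

In this sketch the surrounding context enters only through the list of active feature strings, which is one simple way to read "depending on their surrounding context".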
-
-
Field Summary
Modifier and Type | Field | Description
static int | DEFAULT_BEAM_SIZE |
-
Constructor Summary
Constructor | Description
POSTaggerME(String language) | Initializes a POSTaggerME by downloading a default model for a given language.
POSTaggerME(POSModel model) | Initializes a POSTaggerME with the provided model.
-
Method Summary
Modifier and Type | Method | Description
static Dictionary | buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) | Constructs an nGram dictionary from an ObjectStream of samples.
String[] | getAllPosTags() |
String[] | getOrderedTags(List<String> words, List<String> tags, int index) |
String[] | getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs) |
static void | populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) | Populates a POSDictionary from an ObjectStream of samples.
double[] | probs() |
void | probs(double[] probs) | Populates the specified array with the probabilities for each tag of the last tagged sentence.
String[][] | tag(int numTaggings, String[] sentence) | Returns at most the specified numTaggings for the specified sentence.
String[] | tag(String[] sentence) | Assigns POS tags to the tokens of the sentence.
String[] | tag(String[] sentence, Object[] additionalContext) | Assigns POS tags to the tokens of the sentence.
Sequence[] | topKSequences(String[] sentence) | Assigns the sentence the top-k sequences.
Sequence[] | topKSequences(String[] sentence, Object[] additionalContext) | Assigns the sentence the top-k sequences.
static POSModel | train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters trainParams, POSTaggerFactory posFactory) |
-
-
-
Field Detail
-
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
See Also: Constant Field Values
-
-
Constructor Detail
-
POSTaggerME
public POSTaggerME(String language) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
Parameters:
language - An ISO-conformant language code.
Throws:
IOException - Thrown if the model could not be downloaded or saved.
-
POSTaggerME
public POSTaggerME(POSModel model)
Initializes a POSTaggerME with the provided model.
Parameters:
model - A valid POSModel.
-
-
Method Detail
-
getAllPosTags
public String[] getAllPosTags()
Returns:
An array of all possible part-of-speech tags from the tagger.
-
tag
public String[] tag(String[] sentence)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.
-
tag
public String[] tag(String[] sentence, Object[] additionalContext)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.
-
tag
public String[][] tag(int numTaggings, String[] sentence)
Returns at most the specified numTaggings for the specified sentence.
Parameters:
numTaggings - The number of taggings to be returned.
sentence - An array of tokens which make up a sentence.
Returns:
At most the specified number of taggings for the specified sentence.
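The "at most numTaggings" contract can be illustrated with a small beam search that keeps only the best partial tag sequences after each token. This is a sketch under simplifying assumptions, not OpenNLP's implementation: the class KBestSketch, its explicit beamSize parameter, and the input format (one tag-to-probability map per token, independent of the previously chosen tags) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of OpenNLP.
final class KBestSketch {

    /** A partial or complete tag sequence with its accumulated log-probability. */
    static final class Hypothesis {
        final List<String> tags;
        final double logProb;

        Hypothesis(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    /**
     * Extends every hypothesis in the beam with every tag of the next token,
     * keeps the beamSize best ones, and finally returns at most numTaggings
     * complete sequences, best first.
     */
    static String[][] kBest(int numTaggings, List<Map<String, Double>> tokenTagProbs, int beamSize) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (Map<String, Double> dist : tokenTagProbs) {
            List<Hypothesis> next = new ArrayList<>();
            for (Hypothesis h : beam) {
                for (Map.Entry<String, Double> e : dist.entrySet()) {
                    List<String> tags = new ArrayList<>(h.tags);
                    tags.add(e.getKey());
                    next.add(new Hypothesis(tags, h.logProb + Math.log(e.getValue())));
                }
            }
            // Highest log-probability first, then prune to the beam width.
            next.sort((a, b) -> Double.compare(b.logProb, a.logProb));
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        int n = Math.min(numTaggings, beam.size());
        String[][] result = new String[n][];
        for (int i = 0; i < n; i++) {
            result[i] = beam.get(i).tags.toArray(new String[0]);
        }
        return result;
    }
}
```

Because the beam is pruned at every token, fewer than numTaggings sequences may survive, which is one plausible reason for the "at most" wording. With OpenNLP on the classpath, the corresponding call on a tagger instance would be tag(numTaggings, sentence) as documented above.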
-
topKSequences
public Sequence[] topKSequences(String[] sentence)
Description copied from interface: POSTagger
Assigns the sentence the top-k sequences.
Specified by:
topKSequences in interface POSTagger
Parameters:
sentence - The sentence of tokens to be tagged.
Returns:
An array of sequences for each token provided in sentence.
-
topKSequences
public Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
Description copied from interface: POSTagger
Assigns the sentence the top-k sequences.
Specified by:
topKSequences in interface POSTagger
Parameters:
sentence - The sentence of tokens to be tagged.
additionalContext - The context to provide additional information with.
Returns:
An array of sequences for each token provided in sentence.
-
probs
public void probs(double[] probs)
Populates the specified array with the probabilities for each tag of the last tagged sentence.
Parameters:
probs - An array to put the probabilities into.
-
probs
public double[] probs()
Returns:
An array with the probabilities for each tag of the last tagged sentence.
-
getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
-
train
public static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters trainParams, POSTaggerFactory posFactory) throws IOException
Throws:
IOException
-
buildNGramDictionary
public static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) throws IOException
Constructs an nGram dictionary from an ObjectStream of samples.
Parameters:
samples - The ObjectStream to process.
cutoff - A non-negative cut-off value.
Returns:
A valid Dictionary instance holding nGrams.
Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
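The role of a cut-off value in building an n-gram dictionary can be sketched as follows: count every n-gram in the samples, then keep only those seen often enough. The class NGramCutoffSketch, the explicit n-gram length n, and the inclusive comparison (count >= cutoff) are assumptions made for this illustration; the page above does not state which n-gram lengths OpenNLP collects or how exactly it compares against the cut-off.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of OpenNLP.
final class NGramCutoffSketch {

    /**
     * Counts all n-grams of length n over the tokenized sentences and keeps
     * those that occur at least cutoff times (inclusive cut-off is an assumption).
     */
    static Set<String> build(List<String[]> sentences, int n, int cutoff) {
        if (n < 1 || cutoff < 0) {
            throw new IllegalArgumentException("n must be positive and cutoff non-negative");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String[] tokens : sentences) {
            for (int i = 0; i + n <= tokens.length; i++) {
                String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

A higher cut-off yields a smaller dictionary that contains only frequent n-grams, which is the usual motivation for such a parameter.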
-
populatePOSDictionary
public static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) throws IOException
Populates a POSDictionary from an ObjectStream of samples.
Parameters:
samples - The ObjectStream to process.
dict - The MutableTagDictionary to use during population.
cutoff - A non-negative cut-off value.
Throws:
IOException - Thrown if IO errors occurred during dictionary construction.
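One way to picture populating a tag dictionary with a cut-off is to count how often each word appears with each tag and to record only the pairs that are frequent enough. The sketch below uses plain Java collections in place of ObjectStream and MutableTagDictionary; the class TagDictionarySketch and the inclusive comparison (count >= cutoff) are assumptions, and details of the real method such as the handling of entries already present in dict are not modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of OpenNLP.
final class TagDictionarySketch {

    /**
     * Builds a word-to-tags dictionary from parallel token and tag arrays,
     * keeping a tag for a word only if that pair was seen at least cutoff times.
     */
    static Map<String, Set<String>> populate(List<String[]> sentences,
                                             List<String[]> tagSequences,
                                             int cutoff) {
        if (sentences.size() != tagSequences.size()) {
            throw new IllegalArgumentException("one tag sequence is required per sentence");
        }
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int s = 0; s < sentences.size(); s++) {
            String[] words = sentences.get(s);
            String[] tags = tagSequences.get(s);
            if (words.length != tags.length) {
                throw new IllegalArgumentException("tokens and tags differ in length");
            }
            for (int i = 0; i < words.length; i++) {
                counts.computeIfAbsent(words[i], k -> new HashMap<>())
                      .merge(tags[i], 1, Integer::sum);
            }
        }
        Map<String, Set<String>> dict = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> word : counts.entrySet()) {
            for (Map.Entry<String, Integer> tag : word.getValue().entrySet()) {
                if (tag.getValue() >= cutoff) {
                    dict.computeIfAbsent(word.getKey(), k -> new TreeSet<>()).add(tag.getKey());
                }
            }
        }
        return dict;
    }
}
```

In this sketch a word with no sufficiently frequent tag gets no entry at all, so rare word and tag pairs do not constrain later tagging.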
-
-