Package opennlp.tools.postag
Class POSTaggerME
java.lang.Object
opennlp.tools.postag.POSTaggerME
- All Implemented Interfaces:
POSTagger
A part-of-speech tagger implementation that uses maximum entropy. From each word's surrounding context, it tries to predict whether the word is a noun, a verb, or any other part of speech.
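A maximum entropy (log-linear) classifier scores each candidate tag by summing the weights of the context features that are active. It then normalizes those scores with a softmax to obtain p(tag | context). The Java sketch below illustrates only that scoring step. The feature strings (`prev=DT`, `suffix=s`) and the weights are hypothetical, and the class is not OpenNLP's model or context generator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative log-linear (maximum entropy) scorer with hypothetical, hand-set weights. */
final class MaxentSketch {
    // weights.get(feature).get(tag) is the weight of the (feature, tag) pair.
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final List<String> tags;

    public MaxentSketch(List<String> tags) {
        this.tags = tags;
    }

    public void setWeight(String feature, String tag, double weight) {
        weights.computeIfAbsent(feature, k -> new HashMap<>()).put(tag, weight);
    }

    /** Returns p(tag | context) for every tag, in the order of the tag list. */
    public double[] probabilities(List<String> contextFeatures) {
        double[] scores = new double[tags.size()];
        for (int t = 0; t < tags.size(); t++) {
            for (String feature : contextFeatures) {
                Map<String, Double> byTag = weights.get(feature);
                if (byTag != null) {
                    scores[t] += byTag.getOrDefault(tags.get(t), 0.0);
                }
            }
        }
        // Softmax; subtracting the maximum score avoids overflow in Math.exp.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] probs = new double[scores.length];
        double z = 0.0;
        for (int t = 0; t < scores.length; t++) {
            probs[t] = Math.exp(scores[t] - max);
            z += probs[t];
        }
        for (int t = 0; t < probs.length; t++) {
            probs[t] /= z;
        }
        return probs;
    }
}
```

Suppose the tags are NN and VBZ, the weights are 2.0 for (prev=DT, NN) and 0.5 for (suffix=s, VBZ), and both features are active. NN then receives probability 1 / (1 + e^-1.5), about 0.82.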
Field Summary
- static final int DEFAULT_BEAM_SIZE: The default beam size value is 3.
Constructor Summary
- POSTaggerME(String language): Initializes a POSTaggerME by downloading a default model for a given language.
- POSTaggerME(String language, POSTagFormat format): Initializes a POSTaggerME by downloading a default model for a given language.
- POSTaggerME(POSModel model): Initializes a POSTaggerME with the provided model.
- POSTaggerME(POSModel model, POSTagFormat format): Initializes a POSTaggerME with the provided model.
Method Summary
- static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff): Constructs an n-gram dictionary from an ObjectStream of samples.
- String[] getAllPosTags()
- String[] getOrderedTags(List<String> words, List<String> tags, int index)
- String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
- static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff): Populates a POSDictionary from an ObjectStream of samples.
- double[] probs()
- void probs(double[] probs): Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
- String[][] tag(int numTaggings, String[] sentence): Returns at most numTaggings taggings for the specified sentence.
- String[] tag(String[] sentence): Assigns POS tags to the sentence of tokens.
- String[] tag(String[] sentence, Object[] additionalContext): Assigns POS tags to the sentence of tokens.
- Sequence[] topKSequences(String[] sentence): Assigns the top-k sequences to the sentence.
- Sequence[] topKSequences(String[] sentence, Object[] additionalContext): Assigns the top-k sequences to the sentence.
- static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters mlParams, POSTaggerFactory posFactory): Trains a POSModel with the given parameters.
Field Details
DEFAULT_BEAM_SIZE
public static final int DEFAULT_BEAM_SIZE
The default beam size value is 3.
Constructor Details
POSTaggerME
public POSTaggerME(String language) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
- Parameters:
language - An ISO-compliant language code.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
POSTaggerME
public POSTaggerME(String language, POSTagFormat format) throws IOException
Initializes a POSTaggerME by downloading a default model for a given language.
- Parameters:
language - An ISO-compliant language code.
format - A valid POSTagFormat.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
POSTaggerME
public POSTaggerME(POSModel model)
Initializes a POSTaggerME with the provided model.
- Parameters:
model - A valid POSModel.
POSTaggerME
public POSTaggerME(POSModel model, POSTagFormat format)
Initializes a POSTaggerME with the provided model.
- Parameters:
model - A valid POSModel.
format - A valid POSTagFormat.
Method Details
getAllPosTags
public String[] getAllPosTags()
- Returns:
An array of all possible part-of-speech tags of the tagger.
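tag
public String[] tag(String[] sentence)
Assigns POS tags to the sentence of tokens.
tag
public String[] tag(String[] sentence, Object[] additionalContext)
Assigns POS tags to the sentence of tokens.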
tag
public String[][] tag(int numTaggings, String[] sentence)
Returns at most numTaggings taggings for the specified sentence.
- Parameters:
numTaggings - The maximum number of taggings to return.
sentence - An array of tokens which make up a sentence.
- Returns:
At most the specified number of taggings for the specified sentence.
topKSequences
public Sequence[] topKSequences(String[] sentence)
Assigns the top-k sequences to the sentence.
- Specified by:
topKSequences in interface POSTagger
- Parameters:
sentence - The sentence of tokens to be tagged.
- Returns:
An array of sequences, each assigning a tag to every token in sentence.
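Top-k tag sequences are typically produced by beam search. At each token, every surviving partial sequence is extended with every tag, and only the best beamSize candidates are kept; the default beam size is 3, per DEFAULT_BEAM_SIZE. The sketch below shows that idea in plain Java and is not OpenNLP's own implementation. The per-token model is a hypothetical BiFunction standing in for the trained maxent model. Ranking candidates by summed log-probabilities is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

/** Illustrative beam search over tag sequences; the scoring model is a caller-supplied stand-in. */
final class BeamSearchSketch {

    /** A candidate tag sequence with its summed log-probability. */
    public static final class Candidate {
        public final List<String> tags;
        public final double logProb;

        Candidate(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    /**
     * Returns up to k best sequences, best first.
     * model.apply(i, prevTag) returns a distribution over tagSet for token i
     * (prevTag is null for the first token).
     */
    public static List<Candidate> topK(int length, List<String> tagSet, int beamSize, int k,
                                       BiFunction<Integer, String, double[]> model) {
        List<Candidate> beam = new ArrayList<>();
        beam.add(new Candidate(new ArrayList<>(), 0.0));
        for (int i = 0; i < length; i++) {
            List<Candidate> next = new ArrayList<>();
            for (Candidate c : beam) {
                String prev = c.tags.isEmpty() ? null : c.tags.get(c.tags.size() - 1);
                double[] probs = model.apply(i, prev);
                // Extend the partial sequence with every possible tag.
                for (int t = 0; t < tagSet.size(); t++) {
                    List<String> extended = new ArrayList<>(c.tags);
                    extended.add(tagSet.get(t));
                    next.add(new Candidate(extended, c.logProb + Math.log(probs[t])));
                }
            }
            // Keep only the beamSize highest-scoring partial sequences.
            next.sort(Comparator.comparingDouble((Candidate x) -> x.logProb).reversed());
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return new ArrayList<>(beam.subList(0, Math.min(k, beam.size())));
    }
}
```

Consider a toy two-token model where DT is likely first and NN is likely after DT. With a beam of 3, topK returns DT NN (probability 0.56) ahead of DT VB (0.16). Because low-scoring candidates are discarded early, beam search is not guaranteed to find the globally best sequence. A small beam such as 3 trades some accuracy for speed.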
topKSequences
public Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
Assigns the top-k sequences to the sentence.
- Specified by:
topKSequences in interface POSTagger
- Parameters:
sentence - The sentence of tokens to be tagged.
additionalContext - The context to provide additional information with.
- Returns:
An array of sequences, each assigning a tag to every token in sentence.
probs
public void probs(double[] probs)
Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
- Parameters:
probs - An array to put the probabilities into.
probs
public double[] probs()
- Returns:
An array with the probabilities for each tag of the last tagged sentence.
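The values from probs() are per-token probabilities for the tags of the last tagged sentence. One common way to post-process them is to multiply them into a single sentence-level confidence, usually by summing logarithms to avoid underflow. Another is to flag tokens whose tag is uncertain. The helpers below are a hypothetical sketch of that post-processing and do not call OpenNLP. In real code, the array would come from probs() right after a call to tag(...).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for post-processing the per-token tag probabilities of one sentence. */
final class TagConfidence {

    /** Sum of log-probabilities, i.e. the log of the product of the per-token probabilities. */
    public static double sequenceLogProb(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    /** Indices of tokens whose tag probability falls below the given threshold. */
    public static List<Integer> lowConfidence(double[] probs, double threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] < threshold) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For made-up probabilities 0.9, 0.5 and 0.8, the sentence probability is 0.36. With a threshold of 0.6, only the second token is flagged.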
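getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index)
getOrderedTags
public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)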
train
public static POSModel train(String languageCode, ObjectStream<POSSample> samples, TrainingParameters mlParams, POSTaggerFactory posFactory) throws IOException
Trains a POSModel with the given parameters.
- Parameters:
languageCode - The ISO language code of the model to train. Must not be null.
samples - The ObjectStream of POSSample used as input for training.
mlParams - The TrainingParameters that configure the training process.
posFactory - The POSTaggerFactory for creating related objects as defined via mlParams.
- Returns:
A valid, trained POSModel instance.
- Throws:
IOException - Thrown if an I/O error occurs.
buildNGramDictionary
public static Dictionary buildNGramDictionary(ObjectStream<POSSample> samples, int cutoff) throws IOException
Constructs an n-gram dictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
cutoff - A non-negative cut-off value.
- Returns:
A valid Dictionary instance holding n-grams.
- Throws:
IOException - Thrown if an I/O error occurs during dictionary construction.
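The cutoff parameter is documented only as a non-negative value. This sketch assumes the common reading of a minimum frequency: an entry is kept only if it occurs at least cutoff times in the samples. It counts single tokens (unigrams) across tokenized sentences, which is a simplifying assumption, and applies the threshold with plain JDK collections. It illustrates the idea and is not OpenNLP's Dictionary or n-gram code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative frequency-cutoff dictionary over unigrams; not OpenNLP's implementation. */
final class NGramCutoffSketch {

    /** Returns the tokens that occur at least cutoff times across all sentences. */
    public static Set<String> build(List<String[]> sentences, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be non-negative");
        }
        // Count every token occurrence.
        Map<String, Integer> counts = new HashMap<>();
        for (String[] sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // Keep only tokens whose count reaches the cutoff.
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

For the sentences "the dog runs" and "the cat runs" with a cutoff of 2, only "the" and "runs" survive. A cutoff of 0 or 1 keeps every token.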
populatePOSDictionary
public static void populatePOSDictionary(ObjectStream<POSSample> samples, MutableTagDictionary dict, int cutoff) throws IOException
Populates a POSDictionary from an ObjectStream of samples.
- Parameters:
samples - The ObjectStream to process.
dict - The MutableTagDictionary to use during population.
cutoff - A non-negative cut-off value.
- Throws:
IOException - Thrown if an I/O error occurs during dictionary construction.
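A tag dictionary maps each word to the tags it may receive. This sketch applies a minimum-frequency reading of cutoff, which is an assumption: a (word, tag) pair observed at least cutoff times in the training samples is added. Several stand-ins replace the real types. A plain Map<String, Set<String>> plays the role of MutableTagDictionary, and parallel word and tag arrays replace POSSample. New tags are merged into any tags already listed for a word, which is also an assumption. Case folding, which a real dictionary may apply, is not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative word-to-tags dictionary population with a frequency cutoff. */
final class TagDictionarySketch {

    /**
     * Adds to dict every (word, tag) pair seen at least cutoff times.
     * words[i][j] is the j-th token of sentence i and tags[i][j] is its tag.
     */
    public static void populate(String[][] words, String[][] tags,
                                Map<String, Set<String>> dict, int cutoff) {
        // Count how often each word occurs with each tag.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words[i].length; j++) {
                counts.computeIfAbsent(words[i][j], k -> new HashMap<>())
                      .merge(tags[i][j], 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Map<String, Integer>> wordEntry : counts.entrySet()) {
            for (Map.Entry<String, Integer> tagEntry : wordEntry.getValue().entrySet()) {
                if (tagEntry.getValue() >= cutoff) {
                    // Merge with any tags the dictionary already lists for this word.
                    dict.computeIfAbsent(wordEntry.getKey(), k -> new TreeSet<>())
                        .add(tagEntry.getKey());
                }
            }
        }
    }
}
```

Suppose "run" is tagged NN twice and VBP once, and the dictionary already lists VB for it. A cutoff of 2 adds only NN, so the entry for "run" becomes {NN, VB}.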