Package opennlp.tools.doccat
Class DocumentCategorizerME
java.lang.Object
    opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces:
DocumentCategorizer
public class DocumentCategorizerME extends Object implements DocumentCategorizer
A maximum entropy (MaxEnt) based implementation of DocumentCategorizer.
Constructor Summary
Constructor | Description
DocumentCategorizerME(DoccatModel model) | Initializes a DocumentCategorizerME instance with a doccat model.
Method Summary
Modifier and Type | Method | Description
double[] | categorize(String[] text) | Categorizes the given text, provided in separate tokens.
double[] | categorize(String[] text, Map<String,Object> extraInformation) | Categorizes the given text, provided as tokens, along with the provided extra information.
String | getAllResults(double[] results) | Retrieves a String listing each category name together with its probability.
String | getBestCategory(double[] outcome) | Retrieves the best category from previously generated outcome probabilities.
String | getCategory(int index) | Retrieves the category at a given index.
int | getIndex(String category) | Retrieves the index of a certain category.
int | getNumberOfCategories() | Retrieves the number of categories.
Map<String,Double> | scoreMap(String[] text) | Retrieves a Map in which the key is the category name and the value is the score.
SortedMap<Double,Set<String>> | sortedScoreMap(String[] text) | Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
static DoccatModel | train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) | Trains a DoccatModel with the given parameters.
Constructor Detail
DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes a DocumentCategorizerME instance with a doccat model. Default feature generation is used.
Parameters:
model - The DoccatModel to be used for categorization.
Method Detail
categorize
public double[] categorize(String[] text, Map<String,Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extra information.
Specified by:
categorize in interface DocumentCategorizer
Parameters:
text - The text tokens to categorize.
extraInformation - Additional context information to be used by the feature generator.
Returns:
The per-category probabilities.
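The extraInformation map is passed on to the feature generator. As an illustration of how a generator might combine tokens with such context, the following stdlib-only sketch emits one bag-of-words feature per token plus one feature taken from the map. The class name, the `bow=`/`source=` feature prefixes and the `"source"` key are hypothetical choices for this sketch, not OpenNLP's actual feature strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns tokens plus optional extra context into string features.
// Not OpenNLP code; the feature prefixes and the "source" key are illustrative assumptions.
final class ContextFeatureSketch {

    static Collection<String> extractFeatures(String[] text, Map<String, Object> extraInformation) {
        List<String> features = new ArrayList<>();
        // Bag-of-words: one feature per token; token order is not encoded.
        for (String token : text) {
            features.add("bow=" + token);
        }
        // Extra information: add a context feature only if the caller supplied one.
        Object source = extraInformation.get("source");
        if (source != null) {
            features.add("source=" + source);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"stocks", "rallied", "today"};
        System.out.println(extractFeatures(tokens, Map.of("source", "newswire")));
        // prints [bow=stocks, bow=rallied, bow=today, source=newswire]
    }
}
```

Keeping context features in a separate namespace (here a distinct prefix) lets a model weight them independently of the word features.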
categorize
public double[] categorize(String[] text)
Description copied from interface: DocumentCategorizer
Categorizes the given text, provided in separate tokens.
Specified by:
categorize in interface DocumentCategorizer
Parameters:
text - The tokens of text to categorize.
Returns:
The per-category probabilities.
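Entry i of the returned array belongs to the category returned by getCategory(i). To make "per-category probabilities" concrete, here is a stdlib-only sketch of how a maximum entropy (multinomial log-linear) model can turn bag-of-words features into such an array: sum the weights of the active features for each category, exponentiate, and normalize. The categories and weights are invented for illustration and the model is simplified (no bias feature); this is not the OpenNLP implementation, and a trained DoccatModel stores its own parameters.

```java
import java.util.Map;

// Hypothetical sketch of maximum entropy scoring: p(c | doc) = exp(score_c) / Z,
// where score_c sums the weights of the document's features for category c.
// Categories and weights are made-up values, not parameters of any trained DoccatModel.
final class MaxEntSketch {

    static final String[] CATEGORIES = {"sports", "politics"};

    // Maps a feature to one weight per category, in the same order as CATEGORIES.
    static final Map<String, double[]> WEIGHTS = Map.of(
            "bow=goal", new double[] {2.0, -1.0},
            "bow=vote", new double[] {-1.0, 2.0});

    static double[] categorize(String[] tokens) {
        double[] scores = new double[CATEGORIES.length];
        for (String token : tokens) {
            double[] w = WEIGHTS.get("bow=" + token);
            if (w == null) {
                continue; // feature without a weight: contributes nothing
            }
            for (int c = 0; c < scores.length; c++) {
                scores[c] += w[c];
            }
        }
        // Subtracting the maximum before exponentiating avoids overflow
        // and cancels out in the normalization, so the result is unchanged.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] probs = new double[scores.length];
        double z = 0.0;
        for (int c = 0; c < scores.length; c++) {
            probs[c] = Math.exp(scores[c] - max);
            z += probs[c];
        }
        for (int c = 0; c < probs.length; c++) {
            probs[c] /= z; // probabilities now sum to 1
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] p = categorize(new String[] {"late", "goal"});
        for (int c = 0; c < p.length; c++) {
            System.out.printf("%s: %.3f%n", CATEGORIES[c], p[c]);
        }
    }
}
```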
scoreMap
public Map<String,Double> scoreMap(String[] text)
Description copied from interface: DocumentCategorizer
Retrieves a Map in which the key is the category name and the value is the score.
Specified by:
scoreMap in interface DocumentCategorizer
Parameters:
text - The tokenized input text to classify.
Returns:
A Map with the category name as key and its score as value.
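A minimal stdlib-only sketch of how such a map relates to the probability array from categorize, assuming it is built by pairing each array index with the category name at that index (as getCategory(int) would return it). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: builds a category-name -> score map from a probability array.
// categories[i] plays the role of getCategory(i); probs[i] that of categorize(...)[i].
final class ScoreMapSketch {

    static Map<String, Double> scoreMap(String[] categories, double[] probs) {
        if (categories.length != probs.length) {
            throw new IllegalArgumentException("one probability per category expected");
        }
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < probs.length; i++) {
            scores.put(categories[i], probs[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> m = scoreMap(new String[] {"sports", "politics"}, new double[] {0.8, 0.2});
        System.out.println(m.get("sports")); // 0.8
    }
}
```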
sortedScoreMap
public SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Description copied from interface: DocumentCategorizer
Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories. Several categories can share the same score, hence the Set as value.
Specified by:
sortedScoreMap in interface DocumentCategorizer
Parameters:
text - The tokenized input text to classify.
Returns:
A SortedMap with the score as key and the Set of categories having that score as value.
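Because several categories may share a score, each key maps to a set. The stdlib-only sketch below shows one way to build a map with that shape: a TreeMap, whose natural key order is ascending, with sets of category names as values. It illustrates the documented contract under hypothetical names and is not the OpenNLP source; the choice of TreeSet for the values is this sketch's own.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: groups categories by score, with scores in ascending order.
final class SortedScoreMapSketch {

    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] probs) {
        SortedMap<Double, Set<String>> sorted = new TreeMap<>(); // natural order = ascending
        for (int i = 0; i < probs.length; i++) {
            // Ties share a key, so every category with that score goes into one set.
            sorted.computeIfAbsent(probs[i], k -> new TreeSet<>()).add(categories[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        SortedMap<Double, Set<String>> m = sortedScoreMap(
                new String[] {"sports", "politics", "weather"}, new double[] {0.4, 0.2, 0.4});
        System.out.println(m); // {0.2=[politics], 0.4=[sports, weather]}
        // With ascending order, the best-scoring categories sit under the last key.
        System.out.println(m.get(m.lastKey())); // [sports, weather]
    }
}
```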
getBestCategory
public String getBestCategory(double[] outcome)
Description copied from interface: DocumentCategorizer
Retrieves the best category from previously generated outcome probabilities.
Specified by:
getBestCategory in interface DocumentCategorizer
Parameters:
outcome - An array of computed outcome probabilities.
Returns:
The best category represented as String.
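Selecting the best category amounts to finding the index of the highest probability and mapping it back to a category name. A stdlib-only sketch with hypothetical names; it assumes ties resolve to the lowest index, which is this sketch's choice and not documented OpenNLP behavior.

```java
// Hypothetical sketch: the best category is the arg-max of the probability array.
final class BestCategorySketch {

    static String bestCategory(String[] categories, double[] outcome) {
        if (outcome.length == 0 || outcome.length != categories.length) {
            throw new IllegalArgumentException("one probability per category expected");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            // Strict '>' keeps the first index when scores tie (an assumption of this sketch).
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return categories[best];
    }

    public static void main(String[] args) {
        String[] cats = {"sports", "politics", "weather"};
        System.out.println(bestCategory(cats, new double[] {0.1, 0.7, 0.2})); // politics
    }
}
```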
getIndex
public int getIndex(String category)
Description copied from interface: DocumentCategorizer
Retrieves the index of a certain category.
Specified by:
getIndex in interface DocumentCategorizer
Parameters:
category - The category for which the index is to be found.
Returns:
The index of the given category.
getCategory
public String getCategory(int index)
Description copied from interface: DocumentCategorizer
Retrieves the category at a given index.
Specified by:
getCategory in interface DocumentCategorizer
Parameters:
index - The index for which the category shall be found.
Returns:
The category represented as String.
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Retrieves the number of categories.
Specified by:
getNumberOfCategories in interface DocumentCategorizer
Returns:
The number of categories.
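getIndex, getCategory and getNumberOfCategories can be read as three views of one fixed, ordered list of categories, and that index is also the position of a category's probability in the array returned by categorize. A stdlib-only sketch of that contract with hypothetical names; returning -1 for an unknown category is an assumption of this sketch, since the documentation here does not specify that case.

```java
import java.util.List;

// Hypothetical sketch: index <-> name lookups over one fixed, ordered category list.
final class CategoryIndexSketch {

    private final List<String> categories;

    CategoryIndexSketch(List<String> categories) {
        this.categories = List.copyOf(categories); // immutable, so indices never shift
    }

    int getIndex(String category) {
        return categories.indexOf(category); // -1 if absent (assumption of this sketch)
    }

    String getCategory(int index) {
        return categories.get(index);
    }

    int getNumberOfCategories() {
        return categories.size();
    }

    public static void main(String[] args) {
        CategoryIndexSketch c = new CategoryIndexSketch(List.of("sports", "politics"));
        for (int i = 0; i < c.getNumberOfCategories(); i++) {
            // Round trip: every valid index maps to a name and back to itself.
            System.out.println(i + " -> " + c.getCategory(i) + " -> " + c.getIndex(c.getCategory(i)));
        }
    }
}
```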
getAllResults
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Retrieves a String listing the name of each category together with its probability from the given results.
Specified by:
getAllResults in interface DocumentCategorizer
Parameters:
results - The probabilities of each category.
Returns:
A String with each category name and its associated probability.
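A summary string of this kind is mainly useful for logging or debugging a categorization. The stdlib-only sketch below renders every category with its probability; the `name[0.0000]` entry format with a two-space separator is an assumption modeled on the underlying maximum entropy model's output and may differ between OpenNLP versions. Names are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch: renders every category with its probability, e.g. "a[0.2500]  b[0.7500]".
// The bracket format, 4 decimals and two-space separator are assumptions, not a documented contract.
final class AllResultsSketch {

    static String allResults(String[] categories, double[] results) {
        if (categories.length != results.length) {
            throw new IllegalArgumentException("one probability per category expected");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < results.length; i++) {
            if (i > 0) {
                sb.append("  ");
            }
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
            sb.append(categories[i]).append('[')
              .append(String.format(Locale.ROOT, "%.4f", results[i])).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(allResults(new String[] {"sports", "politics"}, new double[] {0.8, 0.2}));
        // prints sports[0.8000]  politics[0.2000]
    }
}
```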
train
public static DoccatModel train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
Trains a DoccatModel with the given parameters.
Parameters:
lang - The ISO-conformant language code.
samples - The ObjectStream of DocumentSample used as input for training.
mlParams - The TrainingParameters for the context of the training.
factory - The DoccatFactory for creating related objects defined via mlParams.
Returns:
A valid, trained DoccatModel instance.
Throws:
IOException - Thrown if an I/O error occurs.