Package opennlp.tools.doccat
Class DocumentCategorizerME
- java.lang.Object
  - opennlp.tools.doccat.DocumentCategorizerME
 
- All Implemented Interfaces:
- DocumentCategorizer
 
public class DocumentCategorizerME extends Object implements DocumentCategorizer

Maxent implementation of DocumentCategorizer.
Constructor Summary
- DocumentCategorizerME(DoccatModel model): Initializes the current instance with a doccat model.
Method Summary
- double[] categorize(String[] text): Categorizes the given text.
- double[] categorize(String[] text, Map<String,Object> extraInformation): Categorizes the given text, provided as tokens, along with the provided extra information.
- String getAllResults(double[] results): Gets the name of the category associated with the given probabilities.
- String getBestCategory(double[] outcome): Gets the best category from previously generated outcome probabilities.
- String getCategory(int index): Gets the category at a given index.
- int getIndex(String category): Gets the index of a certain category.
- int getNumberOfCategories(): Gets the number of categories.
- Map<String,Double> scoreMap(String[] text): Returns a map in which the key is the category name and the value is the score.
- SortedMap<Double,Set<String>> sortedScoreMap(String[] text): Returns a map with the score as a key in ascending order.
- static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory)
 
Constructor Detail

DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model. Default feature generation is used.
- Parameters:
- model - the doccat model
 
 
Method Detail

categorize
public double[] categorize(String[] text, Map<String,Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extra information.
- Specified by:
- categorize in interface DocumentCategorizer
- Parameters:
- text - text tokens to categorize
- extraInformation - additional information
- Returns:
- per-category probabilities
 
categorize
public double[] categorize(String[] text)
Categorizes the given text.
- Specified by:
- categorize in interface DocumentCategorizer
- Parameters:
- text - the text to categorize
- Returns:
- per-category probabilities
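
As a minimal sketch of how the returned array might be consumed, the example below pairs each probability with its category name by index. It assumes that position i of the array corresponds to getCategory(i), which this page does not state explicitly. So that it compiles without OpenNLP on the classpath, it uses a hypothetical local interface, Categorizer, that mirrors three of the methods documented here, and a toy FixedCategorizer with made-up values. Neither is part of OpenNLP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CategorizeUsageSketch {

    // Hypothetical stand-in mirroring three methods documented on this page.
    interface Categorizer {
        double[] categorize(String[] text);
        String getCategory(int index);
        int getNumberOfCategories();
    }

    // Pairs each category name with its probability, in index order.
    // Assumes probs[i] belongs to getCategory(i).
    static Map<String, Double> label(Categorizer categorizer, String[] tokens) {
        double[] probs = categorizer.categorize(tokens);
        Map<String, Double> labelled = new LinkedHashMap<>();
        for (int i = 0; i < categorizer.getNumberOfCategories(); i++) {
            labelled.put(categorizer.getCategory(i), probs[i]);
        }
        return labelled;
    }

    // Toy categorizer that ignores its input and returns fixed, made-up values.
    static final class FixedCategorizer implements Categorizer {
        private final String[] categories = {"sports", "politics"};

        @Override
        public double[] categorize(String[] text) {
            return new double[] {0.75, 0.25};
        }

        @Override
        public String getCategory(int index) {
            return categories[index];
        }

        @Override
        public int getNumberOfCategories() {
            return categories.length;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "match", "ended", "in", "a", "draw"};
        // prints {sports=0.75, politics=0.25}
        System.out.println(label(new FixedCategorizer(), tokens));
    }
}
```

With a real DocumentCategorizerME, the same loop would call categorize, getCategory and getNumberOfCategories on the categorizer instance instead of the stand-in.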
 
scoreMap
public Map<String,Double> scoreMap(String[] text)
Returns a map in which the key is the category name and the value is the score.
- Specified by:
- scoreMap in interface DocumentCategorizer
- Parameters:
- text - the input text to classify
- Returns:
- the score map
 
sortedScoreMap
public SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Returns a map with the score as a key in ascending order. The value is a Set of categories with the score. Many categories can have the same score, hence the Set as value.
- Specified by:
- sortedScoreMap in interface DocumentCategorizer
- Parameters:
- text - the input text to classify
- Returns:
- the sorted score map
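
The shape of the sorted score map can be illustrated with plain Java collections. The sketch below is not the library's implementation. It only shows, with made-up category names and scores, how a SortedMap keyed by score groups tied categories into one Set and iterates in ascending score order. The class name ScoreMapSketch and the choice of TreeMap and TreeSet are assumptions made for the example.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

final class ScoreMapSketch {

    // Groups categories by score. TreeMap keeps keys in ascending order,
    // and categories sharing a score land in the same Set.
    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] scores) {
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            sorted.computeIfAbsent(scores[i], key -> new TreeSet<>()).add(categories[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // hypothetical labels
        double[] scores = {0.25, 0.5, 0.25};                  // hypothetical scores

        SortedMap<Double, Set<String>> sorted = sortedScoreMap(categories, scores);

        // prints {0.25=[sports, tech], 0.5=[politics]}
        System.out.println(sorted);

        // Because keys ascend, the highest-scoring categories sit at lastKey().
        // prints [politics]
        System.out.println(sorted.get(sorted.lastKey()));
    }
}
```

Since the order is ascending, a caller interested in the top-scoring categories reads from the end of the map rather than the beginning.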
 
getBestCategory
public String getBestCategory(double[] outcome)
Description copied from interface: DocumentCategorizer
Gets the best category from previously generated outcome probabilities.
- Specified by:
- getBestCategory in interface DocumentCategorizer
- Parameters:
- outcome - a vector of outcome probabilities
- Returns:
- the best category String
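
Assuming "best" means the category with the highest probability, the selection can be sketched as an argmax over the outcome vector. This is an illustration in plain Java, not the library's code. The class BestCategorySketch, the category names and the probabilities are all hypothetical, and the tie-breaking rule (lowest index wins) is a choice made for the example.

```java
final class BestCategorySketch {

    // Returns the index of the largest value; on ties the lowest index wins.
    static int bestIndex(double[] outcome) {
        if (outcome.length == 0) {
            throw new IllegalArgumentException("outcome must not be empty");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    // Maps the winning index back to its category name.
    static String bestCategory(String[] categories, double[] outcome) {
        return categories[bestIndex(outcome)];
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // hypothetical labels
        double[] outcome = {0.2, 0.7, 0.1};                   // hypothetical probabilities

        // prints politics
        System.out.println(bestCategory(categories, outcome));
    }
}
```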
 
getIndex
public int getIndex(String category)
Description copied from interface: DocumentCategorizer
Gets the index of a certain category.
- Specified by:
- getIndex in interface DocumentCategorizer
- Parameters:
- category - the category
- Returns:
- an index
 
getCategory
public String getCategory(int index)
Description copied from interface: DocumentCategorizer
Gets the category at a given index.
- Specified by:
- getCategory in interface DocumentCategorizer
- Parameters:
- index - the index
- Returns:
- a category
 
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Gets the number of categories.
- Specified by:
- getNumberOfCategories in interface DocumentCategorizer
- Returns:
- the number of categories
 
getAllResults
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Gets the name of the category associated with the given probabilities.
- Specified by:
- getAllResults in interface DocumentCategorizer
- Parameters:
- results - the probabilities of each category
- Returns:
- the name of the outcome
 
train
public static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
- Throws:
- IOException
 
 