Package opennlp.tools.doccat
Class DocumentCategorizerME
- java.lang.Object
  - opennlp.tools.doccat.DocumentCategorizerME
- All Implemented Interfaces:
DocumentCategorizer
public class DocumentCategorizerME extends Object implements DocumentCategorizer
Maxent implementation of DocumentCategorizer.
Constructor Summary
Constructor: Description
- DocumentCategorizerME(DoccatModel model): Initializes the current instance with a doccat model.
Method Summary
Modifier and Type, Method: Description
- double[] categorize(String[] text): Categorizes the given text.
- double[] categorize(String[] text, Map<String,Object> extraInformation): Categorizes the given text, provided as tokens, along with the provided extra information.
- String getAllResults(double[] results): Gets the name of the category associated with the given probabilities.
- String getBestCategory(double[] outcome): Gets the best category from previously generated outcome probabilities.
- String getCategory(int index): Gets the category at a given index.
- int getIndex(String category): Gets the index of a certain category.
- int getNumberOfCategories(): Gets the number of categories.
- Map<String,Double> scoreMap(String[] text): Returns a map in which the key is the category name and the value is the score.
- SortedMap<Double,Set<String>> sortedScoreMap(String[] text): Returns a map with the score as a key in ascending order.
- static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory)
Constructor Detail
-
DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model. Default feature generation is used.
- Parameters:
model - the doccat model
-
-
Method Detail
-
categorize
public double[] categorize(String[] text, Map<String,Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extra information.
- Specified by:
categorize in interface DocumentCategorizer
- Parameters:
text - the text tokens to categorize
extraInformation - additional information
- Returns:
the per-category probabilities
-
categorize
public double[] categorize(String[] text)
Categorizes the given text.
- Specified by:
categorize in interface DocumentCategorizer
- Parameters:
text - the text to categorize
- Returns:
the per-category probabilities
-
scoreMap
public Map<String,Double> scoreMap(String[] text)
Returns a map in which the key is the category name and the value is the score.
- Specified by:
scoreMap in interface DocumentCategorizer
- Parameters:
text - the input text to classify
- Returns:
the score map
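To make the shape of the returned map concrete, the following is a minimal sketch in plain Java that pairs category names with per-category probabilities by index. The class `ScoreMapSketch`, the helper `toScoreMap`, and the sample categories and values are hypothetical illustrations, not part of OpenNLP; the library's actual implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScoreMapSketch {

    // Hypothetical helper (not an OpenNLP method): pairs each category
    // name with the probability stored at the same index.
    public static Map<String, Double> toScoreMap(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("one probability per category is required");
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < categories.length; i++) {
            scores.put(categories[i], probabilities[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a categorizer's output.
        String[] categories = {"sports", "politics", "tech"};
        double[] probabilities = {0.2, 0.1, 0.7};
        System.out.println(toScoreMap(categories, probabilities));
    }
}
```

With a real categorizer, the names would presumably come from getCategory(int) and the values from categorize(String[]).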
-
sortedScoreMap
public SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Returns a map with the score as a key in ascending order. The value is a Set of the categories with that score. Several categories can share the same score, hence the Set as the value.
- Specified by:
sortedScoreMap in interface DocumentCategorizer
- Parameters:
text - the input text to classify
- Returns:
the sorted score map
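The structure described above (scores as ascending keys, each mapped to the set of categories sharing that score) can be sketched in plain Java as follows. `SortedScoreMapSketch`, `groupByScore`, and the sample scores are hypothetical names and values chosen for illustration; this is not the library's code, only one way to build a map of the same shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class SortedScoreMapSketch {

    // Hypothetical helper (not an OpenNLP method): groups categories by score.
    // TreeMap keeps its keys in ascending order, so iteration runs from the
    // lowest score to the highest.
    public static SortedMap<Double, Set<String>> groupByScore(Map<String, Double> scores) {
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            sorted.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample scores; two categories deliberately tie.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", 0.25);
        scores.put("politics", 0.25);
        scores.put("tech", 0.5);

        SortedMap<Double, Set<String>> sorted = groupByScore(scores);
        // Because keys ascend, the best score is the last key.
        Double best = sorted.lastKey();
        System.out.println("best score " + best + " -> " + sorted.get(best));
        System.out.println("all scores " + sorted);
    }
}
```

Because the keys ascend, a caller interested in the top-scoring categories reads the last entry rather than the first.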
-
getBestCategory
public String getBestCategory(double[] outcome)
Description copied from interface: DocumentCategorizer
Gets the best category from previously generated outcome probabilities.
- Specified by:
getBestCategory in interface DocumentCategorizer
- Parameters:
outcome - a vector of outcome probabilities
- Returns:
the best category String
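One plausible reading of "best category" is the category whose probability is largest. The sketch below shows that selection in plain Java. `BestCategorySketch`, `indexOfMax`, `bestCategory`, the sample data, and the first-wins tie-breaking are assumptions made for illustration; the source does not specify how the library resolves ties.

```java
public class BestCategorySketch {

    // Hypothetical helper (not an OpenNLP method): index of the largest
    // probability. On ties, the earliest index wins (an assumption).
    public static int indexOfMax(double[] outcome) {
        if (outcome.length == 0) {
            throw new IllegalArgumentException("outcome must not be empty");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    // Maps the winning index back to its category name.
    public static String bestCategory(String[] categories, double[] outcome) {
        return categories[indexOfMax(outcome)];
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a categorizer's output.
        String[] categories = {"sports", "politics", "tech"};
        double[] outcome = {0.2, 0.1, 0.7};
        System.out.println(bestCategory(categories, outcome));
    }
}
```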
-
getIndex
public int getIndex(String category)
Description copied from interface: DocumentCategorizer
Gets the index of a certain category.
- Specified by:
getIndex in interface DocumentCategorizer
- Parameters:
category - the category
- Returns:
an index
-
getCategory
public String getCategory(int index)
Description copied from interface: DocumentCategorizer
Gets the category at a given index.
- Specified by:
getCategory in interface DocumentCategorizer
- Parameters:
index - the index
- Returns:
a category
-
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Gets the number of categories.
- Specified by:
getNumberOfCategories in interface DocumentCategorizer
- Returns:
the number of categories
-
getAllResults
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Gets the name of the category associated with the given probabilities.
- Specified by:
getAllResults in interface DocumentCategorizer
- Parameters:
results - the probabilities of each category
- Returns:
the name of the outcome
-
train
public static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
- Throws:
IOException
-
-