Package opennlp.tools.doccat
Class DocumentCategorizerME
java.lang.Object
    opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces: DocumentCategorizer
public class DocumentCategorizerME extends Object implements DocumentCategorizer
Maxent implementation of DocumentCategorizer.
-
-
Constructor Summary
DocumentCategorizerME(DoccatModel model)
    Initializes the current instance with a doccat model.
-
Method Summary
double[] categorize(String[] text)
    Categorizes the given text.
double[] categorize(String[] text, Map<String,Object> extraInformation)
    Categorizes the given text, provided as tokens, along with the provided extra information.
String getAllResults(double[] results)
    Gets the name of the category associated with the given probabilities.
String getBestCategory(double[] outcome)
    Gets the best category from previously generated outcome probabilities.
String getCategory(int index)
    Gets the category at a given index.
int getIndex(String category)
    Gets the index of a certain category.
int getNumberOfCategories()
    Gets the number of categories.
Map<String,Double> scoreMap(String[] text)
    Returns a map in which the key is the category name and the value is the score.
SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
    Returns a map with the score as a key in ascending order.
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory)
-
-
-
Constructor Detail
-
DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model. Default feature generation is used.
- Parameters:
  model - the doccat model
-
-
Method Detail
-
categorize
public double[] categorize(String[] text, Map<String,Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extra information.
- Specified by: categorize in interface DocumentCategorizer
- Parameters:
  text - text tokens to categorize
  extraInformation - additional information
- Returns: per-category probabilities
-
categorize
public double[] categorize(String[] text)
Categorizes the given text.
- Specified by: categorize in interface DocumentCategorizer
- Parameters:
  text - the text to categorize
- Returns: per-category probabilities
-
scoreMap
public Map<String,Double> scoreMap(String[] text)
Returns a map in which the key is the category name and the value is the score.
- Specified by: scoreMap in interface DocumentCategorizer
- Parameters:
  text - the input text to classify
- Returns: the score map
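To make the shape of the returned map concrete, here is a minimal, library-free sketch of how a category-to-score map can be assembled from a per-category probability array such as the one categorize returns. The class ScoreMapSketch, the method toScoreMap, and the sample categories are hypothetical names introduced for illustration; this shows the data structure only and is not OpenNLP's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: pairs category names with scores.
final class ScoreMapSketch {

    // Assumes categories[i] names the category whose probability is probs[i].
    static Map<String, Double> toScoreMap(String[] categories, double[] probs) {
        if (categories.length != probs.length) {
            throw new IllegalArgumentException("one probability per category is required");
        }
        // LinkedHashMap keeps the category order of the input arrays.
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < categories.length; i++) {
            scores.put(categories[i], probs[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up category names
        double[] probs = {0.2, 0.5, 0.3};                      // made-up probabilities
        System.out.println(toScoreMap(categories, probs));
    }
}
```

A map of this shape is convenient when a caller wants to look up the score of one known category by name rather than by array index.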
-
sortedScoreMap
public SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Returns a map with the score as the key, in ascending order. Each value is the Set of categories with that score; several categories can share the same score, hence the Set as value.
- Specified by: sortedScoreMap in interface DocumentCategorizer
- Parameters:
  text - the input text to classify
- Returns: the sorted score map
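The grouping described above, scores as ascending keys with a Set of categories per score, can be sketched with plain JDK collections. SortedScoreMapSketch, groupByScore, and the sample data are hypothetical names and values chosen for illustration; this is an assumption about the general structure, not OpenNLP's code.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of OpenNLP: groups category names by score.
final class SortedScoreMapSketch {

    // Assumes categories[i] names the category whose probability is probs[i].
    static SortedMap<Double, Set<String>> groupByScore(String[] categories, double[] probs) {
        // TreeMap iterates its keys in ascending order.
        SortedMap<Double, Set<String>> byScore = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            // Categories with an identical score end up in the same Set.
            byScore.computeIfAbsent(probs[i], k -> new TreeSet<>()).add(categories[i]);
        }
        return byScore;
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up category names
        double[] probs = {0.4, 0.2, 0.4};                      // made-up probabilities
        SortedMap<Double, Set<String>> byScore = groupByScore(categories, probs);
        // Keys ascend, so the highest-scoring categories sit under lastKey().
        System.out.println(byScore.get(byScore.lastKey()));
    }
}
```

Because the keys ascend, a caller interested in the top-scoring categories reads the entry at lastKey() rather than the first entry.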
-
getBestCategory
public String getBestCategory(double[] outcome)
Description copied from interface: DocumentCategorizer
Gets the best category from previously generated outcome probabilities.
- Specified by: getBestCategory in interface DocumentCategorizer
- Parameters:
  outcome - a vector of outcome probabilities
- Returns: the best category, as a String
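As an illustration of what selecting the best category from an outcome vector amounts to, the sketch below picks the entry with the highest probability. It assumes that "best" means highest probability and that ties go to the lowest index; both are assumptions of this sketch, and BestCategorySketch and bestCategory are hypothetical names rather than OpenNLP API.

```java
// Hypothetical helper, not part of OpenNLP: arg-max over an outcome vector.
final class BestCategorySketch {

    // Assumes categories[i] names the category whose probability is outcome[i].
    static String bestCategory(String[] categories, double[] outcome) {
        if (outcome.length == 0 || outcome.length != categories.length) {
            throw new IllegalArgumentException("need one probability per category");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            // Strict comparison: on a tie the earlier index is kept.
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return categories[best];
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up category names
        double[] outcome = {0.1, 0.7, 0.2};                    // made-up probabilities
        System.out.println(bestCategory(categories, outcome));
    }
}
```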
-
getIndex
public int getIndex(String category)
Description copied from interface: DocumentCategorizer
Gets the index of a certain category.
- Specified by: getIndex in interface DocumentCategorizer
- Parameters:
  category - the category
- Returns: an index
-
getCategory
public String getCategory(int index)
Description copied from interface: DocumentCategorizer
Gets the category at a given index.
- Specified by: getCategory in interface DocumentCategorizer
- Parameters:
  index - the index
- Returns: a category
-
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Gets the number of categories.
- Specified by: getNumberOfCategories in interface DocumentCategorizer
- Returns: the number of categories
-
getAllResults
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Gets the name of the category associated with the given probabilities.
- Specified by: getAllResults in interface DocumentCategorizer
- Parameters:
  results - the probabilities of each category
- Returns: the name of the outcome
-
train
public static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
- Throws: IOException
-
-