Package opennlp.tools.doccat
Interface DocumentCategorizer
All Known Implementing Classes:
DocumentCategorizerME
public interface DocumentCategorizer
Interface for classes which categorize documents.
-
Method Summary
Modifier and Type, Method, Description

double[] categorize(String[] text)
    Categorizes the given text, provided in separate tokens.
double[] categorize(String[] text, Map<String,Object> extraInformation)
    Categorizes the given text, provided as tokens, along with the provided extraInformation.
String getAllResults(double[] results)
    Retrieves the name of the category associated with the given probabilities.
String getBestCategory(double[] outcome)
    Retrieves the best category from previously generated outcome probabilities.
String getCategory(int index)
    Retrieves the category at a given index.
int getIndex(String category)
    Retrieves the index of a certain category.
int getNumberOfCategories()
    Retrieves the number of categories.
Map<String,Double> scoreMap(String[] text)
    Retrieves a Map in which the key is the category name and the value is the score.
SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
    Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
-
Method Detail
-
categorize
double[] categorize(String[] text, Map<String,Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extraInformation.
Parameters:
text - The tokens of text to categorize.
extraInformation - The extra information used for this context.
Returns:
The per-category probabilities.
-
categorize
double[] categorize(String[] text)
Categorizes the given text, provided in separate tokens.
Parameters:
text - The tokens of text to categorize.
Returns:
The per-category probabilities.
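To make the shape of the returned array concrete, here is a minimal sketch of a toy categorizer. ToyKeywordCategorizer, its keyword lists, and its smoothing scheme are hypothetical and are not part of OpenNLP; the class only mirrors the categorize(String[]) signature, on the assumption that the array holds one probability per category in category-index order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, NOT part of OpenNLP: it only mirrors the shape of
// categorize(String[]) so that the returned array can be inspected.
final class ToyKeywordCategorizer {

    private final List<String> categories = List.of("sports", "tech");
    private final List<Set<String>> keywords = List.of(
            Set.of("goal", "match", "team"),
            Set.of("code", "java", "compiler"));

    // Returns one probability per category, in category-index order.
    double[] categorize(String[] text) {
        double[] scores = new double[categories.size()];
        Arrays.fill(scores, 1.0); // add-one smoothing so no category ends up at zero
        for (String token : text) {
            String normalized = token.toLowerCase(Locale.ROOT);
            for (int i = 0; i < keywords.size(); i++) {
                if (keywords.get(i).contains(normalized)) {
                    scores[i] += 1.0;
                }
            }
        }
        double total = Arrays.stream(scores).sum();
        for (int i = 0; i < scores.length; i++) {
            scores[i] /= total; // normalize so the values sum to 1
        }
        return scores;
    }

    String getCategory(int index) {
        return categories.get(index);
    }

    int getNumberOfCategories() {
        return categories.size();
    }

    public static void main(String[] args) {
        ToyKeywordCategorizer categorizer = new ToyKeywordCategorizer();
        double[] probs = categorizer.categorize(
                new String[] {"The", "team", "scored", "a", "goal"});
        for (int i = 0; i < categorizer.getNumberOfCategories(); i++) {
            System.out.println(categorizer.getCategory(i) + " = " + probs[i]);
        }
    }
}
```

A real implementation such as DocumentCategorizerME derives its probabilities from a trained model rather than from keyword counts; only the calling pattern carries over.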
-
getBestCategory
String getBestCategory(double[] outcome)
Retrieves the best category from previously generated outcome probabilities.
Parameters:
outcome - An array of computed outcome probabilities.
Returns:
The best category represented as String.
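One plausible reading of "best category" is the category with the highest probability. The sketch below shows that selection as a plain arg-max; BestCategorySketch and its tie-breaking rule are assumptions made for illustration, not OpenNLP's implementation.

```java
import java.util.List;

// Illustrative only: a hypothetical helper, not OpenNLP's implementation.
final class BestCategorySketch {

    // Returns the category with the highest probability.
    // Ties go to the lowest index (an assumption of this sketch).
    static String bestCategory(List<String> categories, double[] outcome) {
        if (outcome.length == 0 || outcome.length != categories.size()) {
            throw new IllegalArgumentException(
                    "outcome must hold one probability per category");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return categories.get(best);
    }

    public static void main(String[] args) {
        List<String> categories = List.of("sports", "tech", "politics");
        double[] outcome = {0.2, 0.7, 0.1};
        System.out.println(bestCategory(categories, outcome)); // prints "tech"
    }
}
```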
-
getIndex
int getIndex(String category)
Retrieves the index of a certain category.
Parameters:
category - The category for which the index is to be found.
Returns:
The index.
-
getCategory
String getCategory(int index)
Retrieves the category at a given index.
Parameters:
index - The index for which the category shall be found.
Returns:
The category represented as String.
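getIndex and getCategory read as inverse lookups over the same ordered list of categories. The sketch below illustrates that relationship with a hypothetical CategoryIndexSketch class; returning -1 for an unknown category is an assumption of the sketch, since the interface text does not specify that case.

```java
import java.util.List;

// Hypothetical illustration of the index <-> category relationship;
// not taken from OpenNLP.
final class CategoryIndexSketch {

    private final List<String> categories;

    CategoryIndexSketch(List<String> categories) {
        this.categories = List.copyOf(categories);
    }

    // Category name stored at the given position.
    String getCategory(int index) {
        return categories.get(index);
    }

    // Position of the given category, or -1 if it is unknown
    // (the -1 convention is an assumption of this sketch).
    int getIndex(String category) {
        return categories.indexOf(category);
    }

    int getNumberOfCategories() {
        return categories.size();
    }

    public static void main(String[] args) {
        CategoryIndexSketch lookup =
                new CategoryIndexSketch(List.of("sports", "tech", "politics"));
        // Round trip: every index maps to a category that maps back to it.
        for (int i = 0; i < lookup.getNumberOfCategories(); i++) {
            String category = lookup.getCategory(i);
            System.out.println(i + " -> " + category + " -> " + lookup.getIndex(category));
        }
    }
}
```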
-
getNumberOfCategories
int getNumberOfCategories()
Retrieves the number of categories.
Returns:
The number of categories.
-
getAllResults
String getAllResults(double[] results)
Retrieves the name of the category associated with the given probabilities.
Parameters:
results - The probabilities of each category.
Returns:
The name of the outcome.
-
scoreMap
Map<String,Double> scoreMap(String[] text)
Retrieves a Map in which the key is the category name and the value is the score.
Parameters:
text - The tokenized input text to classify.
Returns:
A Map with the category name as key and the score as value.
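The map described here pairs each category name with its score. As a minimal sketch, the hypothetical helper below builds such a map from a category list and an already computed probability array; the interface method itself takes the tokens and would obtain the scores internally, so the two-argument signature is only an assumption for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not OpenNLP code: pairs category names with scores.
final class ScoreMapSketch {

    // Builds a category-name -> score map, keeping category-index order.
    static Map<String, Double> scoreMap(List<String> categories, double[] probs) {
        if (categories.size() != probs.length) {
            throw new IllegalArgumentException(
                    "probs must hold one score per category");
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < probs.length; i++) {
            scores.put(categories.get(i), probs[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> scores =
                scoreMap(List.of("sports", "tech"), new double[] {0.75, 0.25});
        scores.forEach((category, score) ->
                System.out.println(category + " = " + score));
    }
}
```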
-
sortedScoreMap
SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
Many categories can have the same score, hence the Set as value.
Parameters:
text - The tokenized input text to classify.
Returns:
A SortedMap with the score as a key.
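Because several categories can share a score, each key of the sorted map carries a Set of category names. The following sketch groups categories by score using a TreeMap; SortedScoreMapSketch and its two-argument signature are hypothetical, and the scores are assumed to have been computed beforehand.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not OpenNLP code: groups categories by score.
final class SortedScoreMapSketch {

    // Keys are scores in ascending order; each value holds every category
    // that received exactly that score.
    static SortedMap<Double, Set<String>> sortedScoreMap(
            List<String> categories, double[] probs) {
        if (categories.size() != probs.length) {
            throw new IllegalArgumentException(
                    "probs must hold one score per category");
        }
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < probs.length; i++) {
            sorted.computeIfAbsent(probs[i], score -> new HashSet<>())
                  .add(categories.get(i));
        }
        return sorted;
    }

    public static void main(String[] args) {
        SortedMap<Double, Set<String>> sorted = sortedScoreMap(
                List.of("sports", "tech", "politics"),
                new double[] {0.25, 0.5, 0.25});
        // With ascending order, the last key is the highest score.
        System.out.println("highest score: " + sorted.lastKey()
                + " -> " + sorted.get(sorted.lastKey()));
    }
}
```

Since the order is ascending, the best-scoring categories sit at lastKey() rather than firstKey().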