public class DocumentCategorizerME extends Object implements DocumentCategorizer
Maxent implementation of DocumentCategorizer.
Constructor and Description |
---|
DocumentCategorizerME(DoccatModel model) Initializes the current instance with a doccat model. |
DocumentCategorizerME(DoccatModel model, FeatureGenerator... featureGenerators) Deprecated. Train a DoccatModel with a specific DoccatFactory to customize the FeatureGenerators. |
Modifier and Type | Method and Description |
---|---|
double[] | categorize(String documentText) Deprecated. Will be removed after the 1.7.1 release. Do not use it. |
double[] | categorize(String[] text) Categorizes the given text. |
double[] | categorize(String[] text, Map<String,Object> extraInformation) Categorizes the given text, provided in separate tokens. |
double[] | categorize(String documentText, Map<String,Object> extraInformation) Deprecated. Will be removed after the 1.7.1 release. Do not use it. |
String | getAllResults(double[] results) Gets the name of the category associated with the given probabilities. |
String | getBestCategory(double[] outcome) Gets the best category from previously generated outcome probabilities. |
String | getCategory(int index) Gets the category at a given index. |
int | getIndex(String category) Gets the index of a certain category. |
int | getNumberOfCategories() Gets the number of categories. |
Map<String,Double> | scoreMap(String text) Deprecated. Will be removed after the 1.7.1 release. Do not use it. |
Map<String,Double> | scoreMap(String[] text) Returns a map in which the key is the category name and the value is the score. |
SortedMap<Double,Set<String>> | sortedScoreMap(String text) Deprecated. Will be removed after the 1.7.1 release. Do not use it. |
SortedMap<Double,Set<String>> | sortedScoreMap(String[] text) Returns a map with the score as a key in ascending order. |
static DoccatModel | train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) |
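
Every overload that takes a single String is marked deprecated in the table above, so callers are expected to tokenize the text themselves and pass a String[]. The following is a minimal sketch of that preparation step in plain Java. The class name `TokenizeBeforeCategorize`, the whitespace-splitting rule, and the `categorizer` variable in the comment are assumptions made for illustration; a real tokenizer will generally split text differently, and the tokenization should match whatever was used when the model was trained.

```java
import java.util.Arrays;

// Illustrative only: a plain-Java stand-in for a real tokenizer.
final class TokenizeBeforeCategorize {

    // Splits on runs of whitespace; returns an empty array for blank input.
    static String[] tokenize(String documentText) {
        String trimmed = documentText.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("  The match ended in a draw  ");
        System.out.println(Arrays.toString(tokens));

        // With a trained model, the tokens would then be passed on, e.g.
        // (hypothetical variable name):
        // double[] outcome = categorizer.categorize(tokens);
    }
}
```

Note that splitting on whitespace leaves punctuation attached to the neighbouring word, which may or may not suit a given model.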
@Deprecated public DocumentCategorizerME(DoccatModel model, FeatureGenerator... featureGenerators)
Deprecated. Train a DoccatModel with a specific DoccatFactory to customize the FeatureGenerators.
Parameters:
model - the doccat model
featureGenerators - the feature generators

public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model.
Parameters:
model - the doccat model

public double[] categorize(String[] text, Map<String,Object> extraInformation)
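Description copied from interface: DocumentCategorizer
Categorizes the given text, provided in separate tokens.
Specified by: categorize in interface DocumentCategorizer
Parameters:
text - the tokens of text to categorize
extraInformation - optional extra information to pass for evaluation

public double[] categorize(String[] text)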
Categorizes the given text.
Specified by: categorize in interface DocumentCategorizer
Parameters:
text - the text to categorize

@Deprecated public double[] categorize(String documentText, Map<String,Object> extraInformation)
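Deprecated. Will be removed after the 1.7.1 release. Do not use it.
Categorizes the given text. The tokenizer is obtained from DoccatFactory.getTokenizer() and defaults to SimpleTokenizer.
Specified by: categorize in interface DocumentCategorizer
Parameters:
documentText - the text to categorize
extraInformation - extra metadata

@Deprecated public double[] categorize(String documentText)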
Deprecated. Will be removed after the 1.7.1 release. Do not use it.
Specified by: categorize in interface DocumentCategorizer
Parameters:
documentText - the text to categorize

@Deprecated public Map<String,Double> scoreMap(String text)
Deprecated. Will be removed after the 1.7.1 release. Do not use it.
Specified by: scoreMap in interface DocumentCategorizer
Parameters:
text - the input text to classify

public Map<String,Double> scoreMap(String[] text)
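Returns a map in which the key is the category name and the value is the score.
Specified by: scoreMap in interface DocumentCategorizer
Parameters:
text - the input text to classify

@Deprecated public SortedMap<Double,Set<String>> sortedScoreMap(String text)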
Deprecated. Will be removed after the 1.7.1 release. Do not use it.
Specified by: sortedScoreMap in interface DocumentCategorizer
Parameters:
text - the input text to classify

public SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
Returns a map with the score as a key in ascending order.
Specified by: sortedScoreMap in interface DocumentCategorizer
Parameters:
text - the input text to classify

public String getBestCategory(double[] outcome)
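Description copied from interface: DocumentCategorizer
Gets the best category from previously generated outcome probabilities.
Specified by: getBestCategory in interface DocumentCategorizer
Parameters:
outcome - a vector of outcome probabilities

public int getIndex(String category)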
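Description copied from interface: DocumentCategorizer
Gets the index of a certain category.
Specified by: getIndex in interface DocumentCategorizer
Parameters:
category - the category

public String getCategory(int index)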
Description copied from interface: DocumentCategorizer
Gets the category at a given index.
Specified by: getCategory in interface DocumentCategorizer
Parameters:
index - the index

public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Gets the number of categories.
Specified by: getNumberOfCategories in interface DocumentCategorizer
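
The lookup and scoring methods described above all read the same array of outcome probabilities. The sketch below shows one way they could fit together in plain Java. It is not the library's implementation: the class `CategoryResultsSketch`, the category labels, and the probabilities are hypothetical, and it assumes that position i of the outcome array belongs to the category returned by getCategory(i).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: mirrors the documented lookups and return shapes,
// not OpenNLP's implementation.
final class CategoryResultsSketch {
    // Hypothetical labels; assumed to be in the same order as the outcome array.
    private final List<String> categories;

    CategoryResultsSketch(String... categories) {
        this.categories = Arrays.asList(categories);
    }

    int getNumberOfCategories() { return categories.size(); }

    String getCategory(int index) { return categories.get(index); }

    // Returns -1 for an unknown category (a choice made for this sketch).
    int getIndex(String category) { return categories.indexOf(category); }

    // Picks the category with the largest probability; the first one wins on ties.
    String getBestCategory(double[] outcome) {
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return categories.get(best);
    }

    // Category name -> score.
    Map<String, Double> scoreMap(double[] outcome) {
        Map<String, Double> byCategory = new HashMap<>();
        for (int i = 0; i < outcome.length; i++) {
            byCategory.put(categories.get(i), outcome[i]);
        }
        return byCategory;
    }

    // Score -> categories sharing that score, with keys in ascending order.
    // The value is a Set because several categories can have the same score.
    SortedMap<Double, Set<String>> sortedScoreMap(double[] outcome) {
        SortedMap<Double, Set<String>> byScore = new TreeMap<>();
        for (int i = 0; i < outcome.length; i++) {
            byScore.computeIfAbsent(outcome[i], k -> new HashSet<>()).add(categories.get(i));
        }
        return byScore;
    }

    public static void main(String[] args) {
        CategoryResultsSketch sketch = new CategoryResultsSketch("sports", "politics", "tech");
        double[] outcome = {0.2, 0.6, 0.2}; // hypothetical probabilities

        System.out.println(sketch.getBestCategory(outcome));
        System.out.println(sketch.scoreMap(outcome));
        System.out.println(sketch.sortedScoreMap(outcome));
    }
}
```

Because the sorted map is keyed in ascending order, the highest-scoring categories sit under `lastKey()`, and tied categories share a single entry.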
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Gets the name of the category associated with the given probabilities.
Specified by: getAllResults in interface DocumentCategorizer
Parameters:
results - the probabilities of each category

public static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
Throws:
IOException
Copyright © 2017 The Apache Software Foundation. All rights reserved.