Class DocumentCategorizerME
java.lang.Object
opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces:
opennlp.tools.doccat.DocumentCategorizer
public class DocumentCategorizerME
extends Object
implements opennlp.tools.doccat.DocumentCategorizer
A Max-Ent based implementation of DocumentCategorizer.
Constructor Summary
Constructors
DocumentCategorizerME(DoccatModel model)
    Initializes a DocumentCategorizerME instance with a doccat model.
Method Summary
Modifier and Type / Method / Description
double[] categorize(String[] text)
double[] categorize(String[] text, Map<String, Object> extraInformation)
    Categorize the given text provided as tokens along with the provided extra information.
String getAllResults(double[] results)
String getBestCategory(double[] outcome)
String getCategory(int index)
int getIndex(String category)
int getNumberOfCategories()
Map<String, Double> scoreMap(String[] text)
SortedMap<Double, Set<String>> sortedScoreMap(String[] text)
static DoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, opennlp.tools.util.TrainingParameters mlParams, DoccatFactory factory)
    Starts a training of a DoccatModel with the given parameters.
-
Constructor Details
-
DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes a DocumentCategorizerME instance with a doccat model. Default feature generation is used.
Parameters:
model - the DoccatModel to be used for categorization.
-
-
Method Details
-
categorize
public double[] categorize(String[] text, Map<String, Object> extraInformation)
Categorize the given text provided as tokens along with the provided extra information.
Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
Parameters:
text - The text tokens to categorize.
extraInformation - Additional information for context to be used by the feature generator.
Returns:
The per-category probabilities.
-
categorize
public double[] categorize(String[] text)
Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
-
scoreMap
-
sortedScoreMap
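The two score-map methods carry no description on this page. As a hedged sketch of the shapes they are generally understood to return (a category-to-score map, and a score-to-categories map ordered by score), the JDK-only example below builds both from a label list and a probability array. ScoreMapSketch, its two helper methods, and the sample data are hypothetical and not part of OpenNLP; the real methods take text tokens and run the model rather than accepting probabilities.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; these helpers are not OpenNLP API.
final class ScoreMapSketch {

    // Maps each category label to its score.
    // Assumes probs[i] belongs to categories.get(i).
    static Map<String, Double> scoreMap(List<String> categories, double[] probs) {
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < probs.length; i++) {
            scores.put(categories.get(i), probs[i]);
        }
        return scores;
    }

    // Maps each score to the set of categories sharing it, in ascending
    // score order, so lastKey() is the highest score.
    static SortedMap<Double, Set<String>> sortedScoreMap(List<String> categories, double[] probs) {
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < probs.length; i++) {
            sorted.computeIfAbsent(probs[i], k -> new HashSet<>()).add(categories.get(i));
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> categories = List.of("a", "b", "c");
        double[] probs = {0.25, 0.5, 0.25};
        SortedMap<Double, Set<String>> sorted = sortedScoreMap(categories, probs);
        // Highest score and the categories that received it.
        System.out.println(sorted.lastKey() + " -> " + sorted.get(sorted.lastKey())); // prints "0.5 -> [b]"
    }
}
```

Grouping categories under a shared score, as the sorted variant does here, keeps ties visible instead of letting one label silently overwrite another.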
-
getBestCategory
Specified by:
getBestCategory in interface opennlp.tools.doccat.DocumentCategorizer
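To illustrate how the probability array returned by categorize relates to getBestCategory, the JDK-only sketch below picks the label with the highest probability. BestCategorySketch, the bestCategory helper, and the sample labels are hypothetical; in real code the labels come from the DoccatModel and you would call getBestCategory on the categorizer directly. The sketch assumes that the position of a probability in the array matches the category index used by getCategory and getIndex.

```java
import java.util.List;

// Hypothetical helper, not part of OpenNLP: mimics choosing the most
// probable category from a per-category probability array.
final class BestCategorySketch {

    // Returns the label whose probability is highest; ties go to the
    // lowest index. Assumes probs[i] belongs to categories.get(i).
    static String bestCategory(List<String> categories, double[] probs) {
        if (probs.length == 0 || probs.length != categories.size()) {
            throw new IllegalArgumentException("one probability per category is required");
        }
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return categories.get(best);
    }

    public static void main(String[] args) {
        List<String> categories = List.of("sports", "politics", "tech");
        double[] probs = {0.2, 0.1, 0.7};
        System.out.println(bestCategory(categories, probs)); // prints "tech"
    }
}
```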
-
getIndex
Specified by:
getIndex in interface opennlp.tools.doccat.DocumentCategorizer
-
getCategory
Specified by:
getCategory in interface opennlp.tools.doccat.DocumentCategorizer
-
getNumberOfCategories
public int getNumberOfCategories()
Specified by:
getNumberOfCategories in interface opennlp.tools.doccat.DocumentCategorizer
-
getAllResults
Specified by:
getAllResults in interface opennlp.tools.doccat.DocumentCategorizer
-
train
public static DoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, opennlp.tools.util.TrainingParameters mlParams, DoccatFactory factory) throws IOException
Starts a training of a DoccatModel with the given parameters.
Parameters:
lang - The ISO conform language code.
samples - The ObjectStream of DocumentSample used as input for training.
mlParams - The TrainingParameters for the context of the training.
factory - The DoccatFactory for creating related objects defined via mlParams.
Returns:
A valid, trained DoccatModel instance.
Throws:
IOException - Thrown if IO errors occurred.
-