Package opennlp.tools.doccat
Class DocumentCategorizerME
java.lang.Object
opennlp.tools.doccat.DocumentCategorizerME
- All Implemented Interfaces:
- DocumentCategorizer
A Max-Ent based implementation of DocumentCategorizer.
Constructor Summary
- DocumentCategorizerME(DoccatModel model): Initializes a DocumentCategorizerME instance with a doccat model.
Method Summary
- double[] categorize(String[] text): Categorizes the given text, provided in separate tokens.
- double[] categorize(String[] text, Map<String, Object> extraInformation): Categorizes the given text, provided as tokens, along with the provided extra information.
- String getAllResults(double[] results): Retrieves the name of the category associated with the given probabilities.
- String getBestCategory(double[] outcome): Retrieves the best category from previously generated outcome probabilities.
- String getCategory(int index): Retrieves the category at a given index.
- int getIndex(String category): Retrieves the index of a certain category.
- int getNumberOfCategories(): Retrieves the number of categories.
- Map<String, Double> scoreMap(String[] text): Retrieves a Map in which the key is the category name and the value is the score.
- SortedMap<Double, Set<String>> sortedScoreMap(String[] text): Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
- static DoccatModel train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory): Starts a training of a DoccatModel with the given parameters.
Constructor Details
DocumentCategorizerME
public DocumentCategorizerME(DoccatModel model)
Initializes a DocumentCategorizerME instance with a doccat model. Default feature generation is used.
- Parameters:
- model - The DoccatModel to be used for categorization.
 
 
Method Details
categorize
public double[] categorize(String[] text, Map<String, Object> extraInformation)
Categorizes the given text, provided as tokens, along with the provided extra information.
- Specified by:
- categorize in interface DocumentCategorizer
- Parameters:
- text - The text tokens to categorize.
- extraInformation - Additional information for context to be used by the feature generator.
- Returns:
- The per-category probabilities.
 
categorize
public double[] categorize(String[] text)
Description copied from interface: DocumentCategorizer
Categorizes the given text, provided in separate tokens.
- Specified by:
- categorize in interface DocumentCategorizer
- Parameters:
- text - The tokens of text to categorize.
- Returns:
- The per-category probabilities.
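Both categorize overloads expect the text already split into tokens. As a minimal sketch, assuming plain whitespace tokenization is acceptable for the input, the following helper produces such an array. TokenSketch and toTokens are hypothetical names and are not part of OpenNLP; in practice, a tokenizer consistent with the one used for the training data is usually the better choice.

```java
class TokenSketch {

    // Hypothetical helper (not an OpenNLP API): splits raw text on runs of whitespace.
    static String[] toTokens(String rawText) {
        String trimmed = rawText.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // no tokens, rather than a single empty token
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] text = toTokens("  The match ended   in a draw ");
        // An array like this is what would be passed as the text argument.
        System.out.println(String.join("|", text));
    }
}
```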
 
scoreMap
public Map<String, Double> scoreMap(String[] text)
Description copied from interface: DocumentCategorizer
Retrieves a Map in which the key is the category name and the value is the score.
- Specified by:
- scoreMap in interface DocumentCategorizer
- Parameters:
- text - The tokenized input text to classify.
- Returns:
- A Map with the category name as key and the score as value.
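To illustrate the shape of the result, here is a minimal plain-Java sketch that pairs category names with the probabilities at the same positions. It does not call OpenNLP; ScoreMapSketch, toScoreMap, the category labels, and the scores are all made up for illustration, and the library's own implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScoreMapSketch {

    // Hypothetical helper: maps each category name to the probability at the same index.
    static Map<String, Double> toScoreMap(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("categories and probabilities must have the same length");
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < categories.length; i++) {
            scores.put(categories[i], probabilities[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up labels
        double[] probabilities = {0.7, 0.2, 0.1};             // made-up scores
        System.out.println(toScoreMap(categories, probabilities));
    }
}
```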
 
sortedScoreMap
public SortedMap<Double, Set<String>> sortedScoreMap(String[] text)
Description copied from interface: DocumentCategorizer
Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories. Many categories can have the same score, hence the Set as value.
- Specified by:
- sortedScoreMap in interface DocumentCategorizer
- Parameters:
- text - The tokenized input text to classify.
- Returns:
- A SortedMap with the score as a key.
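The score-keyed layout described above can be sketched in plain Java with a TreeMap, which orders its keys ascending and lets categories with equal scores share one Set. This is an illustration only: SortedScoreMapSketch, toSortedScoreMap, and the sample data are hypothetical, and the sketch makes no claim about how OpenNLP builds the map internally.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class SortedScoreMapSketch {

    // Hypothetical helper: groups category names under their score, keys in ascending order.
    static SortedMap<Double, Set<String>> toSortedScoreMap(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("categories and probabilities must have the same length");
        }
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            // Categories with identical scores end up in the same Set.
            sorted.computeIfAbsent(probabilities[i], score -> new TreeSet<>()).add(categories[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up labels
        double[] probabilities = {0.4, 0.2, 0.4};             // made-up scores with a tie
        SortedMap<Double, Set<String>> sorted = toSortedScoreMap(categories, probabilities);
        // lastKey() is the highest score because keys are sorted ascending.
        System.out.println("highest score " + sorted.lastKey() + " -> " + sorted.get(sorted.lastKey()));
    }
}
```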
 
getBestCategory
public String getBestCategory(double[] outcome)
Description copied from interface: DocumentCategorizer
Retrieves the best category from previously generated outcome probabilities.
- Specified by:
- getBestCategory in interface DocumentCategorizer
- Parameters:
- outcome - An array of computed outcome probabilities.
- Returns:
- The best category represented as String.
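Conceptually, picking the best category amounts to finding the position of the highest probability and looking up the category at that position. The plain-Java sketch below shows that idea under stated assumptions: BestCategorySketch, argMax, and bestCategory are hypothetical names, the data is made up, and resolving ties in favor of the first index is a choice of this sketch, not documented library behavior.

```java
class BestCategorySketch {

    // Returns the index of the largest value; on ties the first such index wins (sketch choice).
    static int argMax(double[] outcome) {
        if (outcome.length == 0) {
            throw new IllegalArgumentException("outcome must not be empty");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    // Hypothetical helper: looks up the category name at the arg-max position.
    static String bestCategory(String[] categories, double[] outcome) {
        return categories[argMax(outcome)];
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"}; // made-up labels
        double[] outcome = {0.1, 0.6, 0.3};                   // made-up probabilities
        System.out.println(bestCategory(categories, outcome));
    }
}
```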
 
getIndex
public int getIndex(String category)
Description copied from interface: DocumentCategorizer
Retrieves the index of a certain category.
- Specified by:
- getIndex in interface DocumentCategorizer
- Parameters:
- category - The category for which the index is to be found.
- Returns:
- The index.
 
getCategory
public String getCategory(int index)
Description copied from interface: DocumentCategorizer
Retrieves the category at a given index.
- Specified by:
- getCategory in interface DocumentCategorizer
- Parameters:
- index - The index for which the category shall be found.
- Returns:
- The category represented as String.
 
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Retrieves the number of categories.
- Specified by:
- getNumberOfCategories in interface DocumentCategorizer
- Returns:
- The number of categories.
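The three lookup methods above (getIndex, getCategory, getNumberOfCategories) describe a two-way mapping between category names and positions in the probability array. A minimal plain-Java sketch of such a mapping follows. CategoryIndexSketch is a hypothetical stand-in that mirrors the method names for readability; it is not the OpenNLP class, and returning -1 for an unknown category is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CategoryIndexSketch {

    private final List<String> categories;

    CategoryIndexSketch(String... categories) {
        this.categories = new ArrayList<>(Arrays.asList(categories));
    }

    // Position of the category, or -1 if it is unknown (sketch choice).
    int getIndex(String category) {
        return categories.indexOf(category);
    }

    // Category name stored at the given position.
    String getCategory(int index) {
        return categories.get(index);
    }

    int getNumberOfCategories() {
        return categories.size();
    }

    public static void main(String[] args) {
        CategoryIndexSketch index = new CategoryIndexSketch("sports", "politics", "tech"); // made-up labels
        // Walk all positions and print the round trip index -> category -> index.
        for (int i = 0; i < index.getNumberOfCategories(); i++) {
            String category = index.getCategory(i);
            System.out.println(i + " -> " + category + " -> " + index.getIndex(category));
        }
    }
}
```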
 
getAllResults
public String getAllResults(double[] results)
Description copied from interface: DocumentCategorizer
Retrieves the name of the category associated with the given probabilities.
- Specified by:
- getAllResults in interface DocumentCategorizer
- Parameters:
- results - The probabilities of each category.
- Returns:
- The name of the outcome.
 
train
public static DoccatModel train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
Starts a training of a DoccatModel with the given parameters.
- Parameters:
- lang - The ISO-conformant language code.
- samples - The ObjectStream of DocumentSample used as input for training.
- mlParams - The TrainingParameters for the context of the training.
- factory - The DoccatFactory for creating related objects defined via mlParams.
- Returns:
- A valid, trained DoccatModel instance.
- Throws:
- IOException - Thrown if IO errors occurred.
 
 