Package opennlp.tools.doccat
Class DocumentCategorizerME
java.lang.Object
opennlp.tools.doccat.DocumentCategorizerME
- All Implemented Interfaces:
DocumentCategorizer
A Max-Ent based implementation of DocumentCategorizer.
Constructor Summary
Constructors (constructor: description)
- DocumentCategorizerME(DoccatModel model): Initializes a DocumentCategorizerME instance with a doccat model.
Method Summary
Methods (modifier and type, method: description)
- double[] categorize(String[] text): Categorizes the given text, provided in separate tokens.
- double[] categorize(String[] text, Map<String, Object> extraInformation): Categorizes the given text, provided as tokens, along with the provided extra information.
- String getAllResults(double[] results): Retrieves the name of the category associated with the given probabilities.
- String getBestCategory(double[] outcome): Retrieves the best category from previously generated outcome probabilities.
- String getCategory(int index): Retrieves the category at a given index.
- int getIndex(String category): Retrieves the index of a certain category.
- int getNumberOfCategories(): Retrieves the number of categories.
- Map<String, Double> scoreMap(String[] text): Retrieves a Map in which the key is the category name and the value is the score.
- SortedMap<Double, Set<String>> sortedScoreMap(String[] text): Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
- static DoccatModel train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory): Starts the training of a DoccatModel with the given parameters.
-
Constructor Details
-
DocumentCategorizerME
Initializes a DocumentCategorizerME instance with a doccat model. Default feature generation is used.
- Parameters:
model - The DoccatModel to be used for categorization.
-
-
Method Details
-
categorize
Categorizes the given text, provided as tokens, along with the provided extra information.
- Specified by:
categorize in interface DocumentCategorizer
- Parameters:
text - The text tokens to categorize.
extraInformation - Additional information for context to be used by the feature generator.
- Returns:
The per-category probabilities.
-
categorize
Description copied from interface: DocumentCategorizer
Categorizes the given text, provided in separate tokens.
- Specified by:
categorize in interface DocumentCategorizer
- Parameters:
text - The tokens of text to categorize.
- Returns:
The per-category probabilities.
-
scoreMap
Description copied from interface: DocumentCategorizer
Retrieves a Map in which the key is the category name and the value is the score.
- Specified by:
scoreMap in interface DocumentCategorizer
- Parameters:
text - The tokenized input text to classify.
- Returns:
A Map with the category name as key and the score as value.
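The mapping described for scoreMap can be pictured as pairing each category name with the probability at the same index. The following is a minimal, library-free sketch of that pairing and not OpenNLP's implementation; the class ScoreMapSketch, the method toScoreMap, and the sample categories and probabilities are hypothetical names and values chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of OpenNLP.
final class ScoreMapSketch {

  // Pairs each category name with the probability stored at the same index.
  static Map<String, Double> toScoreMap(String[] categories, double[] probabilities) {
    if (categories.length != probabilities.length) {
      throw new IllegalArgumentException("categories and probabilities must have equal length");
    }
    Map<String, Double> scores = new LinkedHashMap<>();
    for (int i = 0; i < categories.length; i++) {
      scores.put(categories[i], probabilities[i]);
    }
    return scores;
  }

  public static void main(String[] args) {
    // Assumed sample data standing in for a model's categories and a categorize(...) result.
    String[] categories = {"sports", "politics", "tech"};
    double[] probabilities = {0.2, 0.5, 0.3};
    System.out.println(toScoreMap(categories, probabilities));
  }
}
```

A LinkedHashMap is used here only so that the printed order follows the category index order; the ordering of the Map actually returned by the library is not stated in this page.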
-
sortedScoreMap
Description copied from interface: DocumentCategorizer
Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.
Many categories can have the same score, hence the Set as value.
- Specified by:
sortedScoreMap in interface DocumentCategorizer
- Parameters:
text - The tokenized input text to classify.
- Returns:
A SortedMap with the score as a key.
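To illustrate why the value type is a Set, the sketch below groups category names by score and keeps the scores in ascending order. It is a library-free approximation of the described result shape, not OpenNLP's code; SortedScoreMapSketch, toSortedScoreMap, and the sample data are hypothetical.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not part of OpenNLP.
final class SortedScoreMapSketch {

  // Groups category names by score; TreeMap keeps the score keys in ascending order.
  static SortedMap<Double, Set<String>> toSortedScoreMap(String[] categories, double[] probabilities) {
    if (categories.length != probabilities.length) {
      throw new IllegalArgumentException("categories and probabilities must have equal length");
    }
    SortedMap<Double, Set<String>> sorted = new TreeMap<>();
    for (int i = 0; i < categories.length; i++) {
      // Categories that share a score end up in the same Set.
      sorted.computeIfAbsent(probabilities[i], score -> new TreeSet<>()).add(categories[i]);
    }
    return sorted;
  }

  public static void main(String[] args) {
    // Assumed sample data: "sports" and "tech" tie at 0.4.
    String[] categories = {"sports", "politics", "tech"};
    double[] probabilities = {0.4, 0.2, 0.4};
    SortedMap<Double, Set<String>> sorted = toSortedScoreMap(categories, probabilities);
    System.out.println("lowest score:  " + sorted.firstKey() + " -> " + sorted.get(sorted.firstKey()));
    System.out.println("highest score: " + sorted.lastKey() + " -> " + sorted.get(sorted.lastKey()));
  }
}
```

Because the order is ascending, the highest-scoring categories sit at lastKey() rather than firstKey().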
-
getBestCategory
Description copied from interface: DocumentCategorizer
Retrieves the best category from previously generated outcome probabilities.
- Specified by:
getBestCategory in interface DocumentCategorizer
- Parameters:
outcome - An array of computed outcome probabilities.
- Returns:
The best category represented as String.
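Conceptually, choosing the best category amounts to finding the index with the highest probability and looking up its name. The sketch below shows that idea without the library; it is an assumption about the general technique (an argmax over the outcome array), not a copy of OpenNLP's implementation. BestCategorySketch, bestCategory, the tie-breaking rule, and the sample data are hypothetical.

```java
// Hypothetical illustration; not part of OpenNLP.
final class BestCategorySketch {

  // Returns the category whose probability is highest.
  // Sketch assumption: ties resolve to the lowest index.
  static String bestCategory(String[] categories, double[] outcome) {
    if (categories.length == 0 || categories.length != outcome.length) {
      throw new IllegalArgumentException("need equally sized, non-empty arrays");
    }
    int best = 0;
    for (int i = 1; i < outcome.length; i++) {
      if (outcome[i] > outcome[best]) {
        best = i;
      }
    }
    return categories[best];
  }

  public static void main(String[] args) {
    // Assumed sample data standing in for a categorize(...) result.
    String[] categories = {"sports", "politics", "tech"};
    double[] outcome = {0.1, 0.7, 0.2};
    System.out.println(bestCategory(categories, outcome));
  }
}
```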
-
getIndex
Description copied from interface: DocumentCategorizer
Retrieves the index of a certain category.
- Specified by:
getIndex in interface DocumentCategorizer
- Parameters:
category - The category for which the index is to be found.
- Returns:
The index.
-
getCategory
Description copied from interface: DocumentCategorizer
Retrieves the category at a given index.
- Specified by:
getCategory in interface DocumentCategorizer
- Parameters:
index - The index for which the category shall be found.
- Returns:
The category represented as String.
-
getNumberOfCategories
public int getNumberOfCategories()
Description copied from interface: DocumentCategorizer
Retrieves the number of categories.
- Specified by:
getNumberOfCategories in interface DocumentCategorizer
- Returns:
The number of categories.
-
getAllResults
Description copied from interface: DocumentCategorizer
Retrieves the name of the category associated with the given probabilities.
- Specified by:
getAllResults in interface DocumentCategorizer
- Parameters:
results - The probabilities of each category.
- Returns:
The name of the outcome.
-
train
public static DoccatModel train(String lang, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, DoccatFactory factory) throws IOException
Starts the training of a DoccatModel with the given parameters.
- Parameters:
lang - The ISO-conformant language code.
samples - The ObjectStream of DocumentSample used as input for training.
mlParams - The TrainingParameters for the context of the training.
factory - The DoccatFactory for creating related objects defined via mlParams.
- Returns:
A valid, trained DoccatModel instance.
- Throws:
IOException - Thrown if IO errors occurred.
-