opennlp.tools.doccat
Class DocumentCategorizerME

java.lang.Object
  extended by opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces:
DocumentCategorizer

public class DocumentCategorizerME
extends Object
implements DocumentCategorizer

Maxent implementation of DocumentCategorizer.


Constructor Summary
DocumentCategorizerME(DoccatModel model)
          Initializes the current instance with a doccat model.
DocumentCategorizerME(DoccatModel model, FeatureGenerator... featureGenerators)
          Initializes the current instance with a doccat model and custom feature generation.
DocumentCategorizerME(opennlp.model.MaxentModel model)
          Deprecated. Use DocumentCategorizerME(DoccatModel) instead.
DocumentCategorizerME(opennlp.model.MaxentModel model, FeatureGenerator... featureGenerators)
          Deprecated. Use DocumentCategorizerME(DoccatModel, FeatureGenerator...) instead.
 
Method Summary
 double[] categorize(String documentText)
          Categorizes the given text.
 double[] categorize(String[] text)
          Categorizes the given text.
 String getAllResults(double[] results)
           
 String getBestCategory(double[] outcome)
           
 String getCategory(int index)
           
 int getIndex(String category)
           
 int getNumberOfCategories()
           
static opennlp.model.AbstractModel train(DocumentCategorizerEventStream eventStream)
          Deprecated. 
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples)
          Trains a doccat model with default feature generation.
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, int cutoff, int iterations)
          Trains a doccat model with default feature generation.
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, int cutoff, int iterations, FeatureGenerator... featureGenerators)
          Trains a document categorizer model with custom feature generation.
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, TrainingParameters mlParams, FeatureGenerator... featureGenerators)
          Trains a doccat model with the given training parameters and custom feature generation.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentCategorizerME

public DocumentCategorizerME(DoccatModel model,
                             FeatureGenerator... featureGenerators)
Initializes the current instance with a doccat model and custom feature generation. The feature generation must be identical to the configuration used at training time.

Parameters:
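model - the doccat model to use
featureGenerators - the feature generators to apply; they must match the configuration used at training time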

DocumentCategorizerME

public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model. Default feature generation is used.

Parameters:
model - the doccat model to use

DocumentCategorizerME

@Deprecated
public DocumentCategorizerME(opennlp.model.MaxentModel model)
Deprecated. Use DocumentCategorizerME(DoccatModel) instead.

Initializes the current instance with the given MaxentModel.

Parameters:
model - the maxent model to use

DocumentCategorizerME

@Deprecated
public DocumentCategorizerME(opennlp.model.MaxentModel model,
                                        FeatureGenerator... featureGenerators)
Deprecated. Use DocumentCategorizerME(DoccatModel, FeatureGenerator...) instead.

Initializes the current instance with the given MaxentModel and FeatureGenerators.

Parameters:
model - the maxent model to use
featureGenerators - the feature generators to apply

Method Detail

categorize

public double[] categorize(String[] text)
Categorizes the given text, which is passed in as an array of tokens.

Specified by:
categorize in interface DocumentCategorizer
Parameters:
text - the tokens of the document to categorize

categorize

public double[] categorize(String documentText)
Categorizes the given text. The text is tokenized with the SimpleTokenizer before it is passed to the feature generation.

Specified by:
categorize in interface DocumentCategorizer

getBestCategory

public String getBestCategory(double[] outcome)
Specified by:
getBestCategory in interface DocumentCategorizer
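
As an illustration of what selecting the best category involves, the following is a minimal sketch. It assumes that the outcome array holds one score per category and that the best category is the one with the highest score. The class BestCategorySketch and the categories array are hypothetical stand-ins and are not part of OpenNLP.

```java
// Hypothetical stand-in for illustration; not part of OpenNLP.
final class BestCategorySketch {

    // Returns the index of the largest score, or -1 for an empty array.
    static int bestIndex(double[] outcome) {
        int best = -1;
        for (int i = 0; i < outcome.length; i++) {
            if (best < 0 || outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    // Maps the best index to a name, assuming categories[i] names outcome[i].
    static String bestCategory(String[] categories, double[] outcome) {
        int best = bestIndex(outcome);
        return best < 0 ? null : categories[best];
    }

    public static void main(String[] args) {
        String[] categories = {"sports", "politics", "tech"};
        double[] outcome = {0.1, 0.7, 0.2};
        System.out.println(bestCategory(categories, outcome));
    }
}
```

With a real categorizer, the outcome array would come from categorize and the name lookup would go through getBestCategory rather than a local array.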

getIndex

public int getIndex(String category)
Specified by:
getIndex in interface DocumentCategorizer

getCategory

public String getCategory(int index)
Specified by:
getCategory in interface DocumentCategorizer

getNumberOfCategories

public int getNumberOfCategories()
Specified by:
getNumberOfCategories in interface DocumentCategorizer
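
The lookup methods can be combined to pair every category name with its score. The sketch below assumes that scores[i] belongs to getCategory(i) and that the score array has getNumberOfCategories() entries. CategoryLookup is a hypothetical stand-in that mirrors two of the signatures listed here; it is not the OpenNLP DocumentCategorizer interface, and CategoryScoresSketch is likewise illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring two method signatures from this page; not the OpenNLP interface.
interface CategoryLookup {
    int getNumberOfCategories();
    String getCategory(int index);
}

final class CategoryScoresSketch {

    // Pairs each category name with its score, assuming scores[i] belongs to getCategory(i).
    static Map<String, Double> scoresByCategory(CategoryLookup lookup, double[] scores) {
        if (scores.length != lookup.getNumberOfCategories()) {
            throw new IllegalArgumentException("expected one score per category");
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < scores.length; i++) {
            result.put(lookup.getCategory(i), scores[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        final String[] names = {"sports", "politics"};
        CategoryLookup lookup = new CategoryLookup() {
            public int getNumberOfCategories() { return names.length; }
            public String getCategory(int index) { return names[index]; }
        };
        System.out.println(scoresByCategory(lookup, new double[] {0.25, 0.75}));
    }
}
```

A LinkedHashMap is used so that the entries keep the category index order.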

getAllResults

public String getAllResults(double[] results)
Specified by:
getAllResults in interface DocumentCategorizer

train

@Deprecated
public static opennlp.model.AbstractModel train(DocumentCategorizerEventStream eventStream)
                                         throws IOException
Deprecated. 

Trains a new model for the DocumentCategorizerME.

Parameters:
eventStream - the event stream to train on
Returns:
the new model
Throws:
IOException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples,
                                TrainingParameters mlParams,
                                FeatureGenerator... featureGenerators)
                         throws IOException
Trains a doccat model with the given training parameters and custom feature generation.

Throws:
IOException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples,
                                int cutoff,
                                int iterations,
                                FeatureGenerator... featureGenerators)
                         throws IOException
Trains a document categorizer model with custom feature generation.

Parameters:
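languageCode - the language code of the training data
samples - the stream of document samples to train on
cutoff - the minimum number of times a feature must occur to be included in the model
iterations - the number of training iterations
featureGenerators - the feature generators to use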
Returns:
the trained doccat model
Throws:
IOException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples,
                                int cutoff,
                                int iterations)
                         throws IOException
Trains a doccat model with default feature generation.

Parameters:
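languageCode - the language code of the training data
samples - the stream of document samples to train on
cutoff - the minimum number of times a feature must occur to be included in the model
iterations - the number of training iterations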
Returns:
the trained doccat model
Throws:
IOException
ObjectStreamException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples)
                         throws IOException
Trains a doccat model with default feature generation.

Parameters:
languageCode - the language code of the training data
samples - the stream of document samples to train on
Returns:
the trained doccat model
Throws:
IOException
ObjectStreamException


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.