Class DocumentCategorizerME

java.lang.Object
opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces:
opennlp.tools.doccat.DocumentCategorizer

public class DocumentCategorizerME extends Object implements opennlp.tools.doccat.DocumentCategorizer
A maximum-entropy (MaxEnt) based implementation of DocumentCategorizer.
  • Constructor Details

    • DocumentCategorizerME

      public DocumentCategorizerME(DoccatModel model)
      Initializes a DocumentCategorizerME instance with a doccat model. Default feature generation is used.
      Parameters:
      model - the DoccatModel to be used for categorization.
  • Method Details

    • categorize

      public double[] categorize(String[] text, Map<String,Object> extraInformation)
      Categorizes the given text, provided as tokens, along with the supplied extra information.
      Specified by:
      categorize in interface opennlp.tools.doccat.DocumentCategorizer
      Parameters:
      text - The text tokens to categorize.
      extraInformation - Additional information for context to be used by the feature generator.
      Returns:
      The per-category probabilities.
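To make "per-category probabilities" concrete, the following is a minimal sketch of textbook maximum-entropy scoring: sum the weights of the features present in the document for each category, exponentiate, and normalize so the values add up to 1. It is not OpenNLP's implementation. The class name MaxEntSketch, the method probabilities, and the weight layout (one token-to-weight map per category) are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;

// Conceptual sketch only; OpenNLP's internal model evaluation may differ.
final class MaxEntSketch {

    /**
     * Returns one probability per category.
     * weightsPerCategory.get(c) maps a token to its weight for category c
     * (hypothetical layout); unknown tokens contribute a weight of 0.
     */
    static double[] probabilities(String[] tokens,
                                  List<Map<String, Double>> weightsPerCategory) {
        double[] scores = new double[weightsPerCategory.size()];
        double total = 0.0;
        for (int c = 0; c < scores.length; c++) {
            double sum = 0.0;
            for (String token : tokens) {
                sum += weightsPerCategory.get(c).getOrDefault(token, 0.0);
            }
            scores[c] = Math.exp(sum); // unnormalized score, always positive
            total += scores[c];
        }
        for (int c = 0; c < scores.length; c++) {
            scores[c] /= total; // normalize so the array sums to 1
        }
        return scores;
    }
}
```

In this sketch, position i of the returned array belongs to category i, which mirrors how the array returned by categorize is read through getCategory(int) and getIndex(String).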
    • categorize

      public double[] categorize(String[] text)
      Categorizes the given text, provided as tokens, without any extra information.
      Specified by:
      categorize in interface opennlp.tools.doccat.DocumentCategorizer
    • scoreMap

      public Map<String,Double> scoreMap(String[] text)
      Specified by:
      scoreMap in interface opennlp.tools.doccat.DocumentCategorizer
    • sortedScoreMap

      public SortedMap<Double, Set<String>> sortedScoreMap(String[] text)
      Specified by:
      sortedScoreMap in interface opennlp.tools.doccat.DocumentCategorizer
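The two map-returning methods above differ in shape: scoreMap is keyed by category name, while sortedScoreMap is keyed by score and needs a Set as its value because several categories can share the same score. The sketch below builds both shapes from a plain array of category names and a parallel array of probabilities. It is an illustration under those assumptions, not OpenNLP's code; the class ScoreMapSketch and its two-argument methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: reproduces the return shapes, not OpenNLP internals.
final class ScoreMapSketch {

    // Category name -> score.
    static Map<String, Double> scoreMap(String[] categories, double[] probs) {
        Map<String, Double> result = new HashMap<>();
        for (int i = 0; i < categories.length; i++) {
            result.put(categories[i], probs[i]);
        }
        return result;
    }

    // Score -> all categories that received exactly that score.
    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories,
                                                         double[] probs) {
        SortedMap<Double, Set<String>> result = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            result.computeIfAbsent(probs[i], k -> new HashSet<>())
                  .add(categories[i]);
        }
        return result;
    }
}
```

Because the TreeMap in this sketch uses natural ordering, lastKey() gives the highest score and get(lastKey()) gives every category tied for it.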
    • getBestCategory

      public String getBestCategory(double[] outcome)
      Specified by:
      getBestCategory in interface opennlp.tools.doccat.DocumentCategorizer
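As a rough illustration of what selecting a best category from an outcome array involves, the sketch below picks the position with the highest probability and maps it back to a name. It assumes "best" means "highest probability" and that the category names are available as a parallel array; BestCategorySketch and its methods are hypothetical and do not reflect the library's internals.

```java
// Illustrative arg-max over an outcome array; not OpenNLP's implementation.
final class BestCategorySketch {

    // Index of the largest value; the first one wins on ties.
    static int bestIndex(double[] outcome) {
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    // Name of the category at the arg-max position.
    static String bestCategory(String[] categories, double[] outcome) {
        return categories[bestIndex(outcome)];
    }
}
```

With the real class, the equivalent flow is to pass the array returned by categorize directly to getBestCategory.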
    • getIndex

      public int getIndex(String category)
      Specified by:
      getIndex in interface opennlp.tools.doccat.DocumentCategorizer
    • getCategory

      public String getCategory(int index)
      Specified by:
      getCategory in interface opennlp.tools.doccat.DocumentCategorizer
    • getNumberOfCategories

      public int getNumberOfCategories()
      Specified by:
      getNumberOfCategories in interface opennlp.tools.doccat.DocumentCategorizer
    • getAllResults

      public String getAllResults(double[] results)
      Specified by:
      getAllResults in interface opennlp.tools.doccat.DocumentCategorizer
    • train

      public static DoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, opennlp.tools.util.TrainingParameters mlParams, DoccatFactory factory) throws IOException
      Trains a DoccatModel with the given parameters.
      Parameters:
      lang - The ISO-conformant language code.
      samples - The ObjectStream of DocumentSample used as input for training.
      mlParams - The TrainingParameters for the context of the training.
      factory - The DoccatFactory for creating related objects defined via mlParams.
      Returns:
      A valid, trained DoccatModel instance.
      Throws:
      IOException - Thrown if an I/O error occurs.