Interface DocumentCategorizer

All Known Implementing Classes:
DocumentCategorizerME

public interface DocumentCategorizer
Interface for classes which categorize documents.
  • Method Details

    • categorize

      double[] categorize(String[] text, Map<String,Object> extraInformation)
      Categorizes the given text, provided as tokens, taking the supplied extraInformation into account.
      Parameters:
      text - The tokens of text to categorize.
      extraInformation - The extra information used for this context.
      Returns:
      The per-category probabilities.
    • categorize

      double[] categorize(String[] text)
      Categorizes the given text, provided as separate tokens.
      Parameters:
      text - The tokens of text to categorize.
      Returns:
      The per-category probabilities.
    • getBestCategory

      String getBestCategory(double[] outcome)
      Retrieves the best category from previously generated outcome probabilities.
      Parameters:
      outcome - An array of computed outcome probabilities.
      Returns:
      The best category represented as String.
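
A minimal sketch of how an outcome array relates to getBestCategory, assuming that "best" means the highest probability and that outcome[i] belongs to the category at index i. The class BestCategorySketch, its category names, and the probabilities are hypothetical and not part of the library.

```java
// Hypothetical stand-in, not library code: picks the category with the highest probability.
final class BestCategorySketch {

    // Assumed category order: outcome[i] is the probability of CATEGORIES[i].
    static final String[] CATEGORIES = {"business", "sports", "tech"};

    static String getBestCategory(double[] outcome) {
        if (outcome.length != CATEGORIES.length) {
            throw new IllegalArgumentException("expected " + CATEGORIES.length + " probabilities");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i; // strict comparison keeps the first maximum on ties
            }
        }
        return CATEGORIES[best];
    }

    public static void main(String[] args) {
        // Made-up probabilities standing in for the result of categorize(String[]).
        double[] outcome = {0.2, 0.7, 0.1};
        System.out.println(getBestCategory(outcome));
    }
}
```

In real use the array would come from categorize rather than being written by hand.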
    • getIndex

      int getIndex(String category)
      Retrieves the index of a certain category.
      Parameters:
      category - The category for which the index is to be found.
      Returns:
      The index of the given category.
    • getCategory

      String getCategory(int index)
      Retrieves the category at a given index.
      Parameters:
      index - The index for which the category is to be found.
      Returns:
      The category represented as String.
    • getNumberOfCategories

      int getNumberOfCategories()
      Retrieves the number of categories.
      Returns:
      The number of categories.
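
The three lookup methods above (getIndex, getCategory, getNumberOfCategories) can be read as views of one ordered list of categories. The sketch below illustrates that relationship with a hypothetical class, CategoryIndexSketch. Returning -1 for an unknown category is an assumption of this sketch, not documented behavior of the interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not library code: category lookups over a fixed, ordered list.
final class CategoryIndexSketch {

    private final String[] categories;
    private final Map<String, Integer> indexByCategory = new HashMap<>();

    CategoryIndexSketch(String... categories) {
        this.categories = categories.clone();
        for (int i = 0; i < this.categories.length; i++) {
            indexByCategory.put(this.categories[i], i);
        }
    }

    int getIndex(String category) {
        // Assumption of this sketch: unknown categories map to -1.
        return indexByCategory.getOrDefault(category, -1);
    }

    String getCategory(int index) {
        return categories[index];
    }

    int getNumberOfCategories() {
        return categories.length;
    }

    public static void main(String[] args) {
        CategoryIndexSketch sketch = new CategoryIndexSketch("business", "sports", "tech");
        // getCategory and getIndex are inverse lookups for every valid index.
        for (int i = 0; i < sketch.getNumberOfCategories(); i++) {
            String category = sketch.getCategory(i);
            System.out.println(i + " <-> " + category + " <-> " + sketch.getIndex(category));
        }
    }
}
```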
    • getAllResults

      String getAllResults(double[] results)
      Retrieves a String representation of all categories together with the given probabilities.
      Parameters:
      results - The probabilities of each category.
      Returns:
      A String listing each category with its probability.
    • scoreMap

      Map<String,Double> scoreMap(String[] text)
      Retrieves a Map in which the key is the category name and the value is the score.
      Parameters:
      text - The tokenized input text to classify.
      Returns:
      A Map with the category name as key and the score as value.
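
A minimal sketch of the shape of the Map described above: category names as keys and their scores as values. The class ScoreMapSketch is hypothetical, and it takes the categories and probabilities as explicit arguments (an assumption made to keep the example self-contained) instead of classifying text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not library code: pairs each category with its score.
final class ScoreMapSketch {

    static Map<String, Double> scoreMap(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("categories and probabilities must align");
        }
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < categories.length; i++) {
            scores.put(categories[i], probabilities[i]); // key: category, value: score
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up categories and probabilities for illustration only.
        String[] categories = {"business", "sports", "tech"};
        double[] probabilities = {0.2, 0.7, 0.1};
        System.out.println(scoreMap(categories, probabilities).get("sports"));
    }
}
```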
    • sortedScoreMap

      SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
      Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.

      Many categories can have the same score, hence the Set as value.

      Parameters:
      text - The tokenized input text to classify.
      Returns:
      A SortedMap with the score as key and the Set of categories sharing that score as value.
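
A minimal sketch of why the value type is a Set: grouping categories by score in a TreeMap keeps the scores in ascending order and lets tied categories share one entry. The class SortedScoreMapSketch and its inputs are hypothetical. The categories and probabilities are passed in directly (an assumption for self-containment) rather than computed from text.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in, not library code: groups categories by score, ascending.
final class SortedScoreMapSketch {

    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("categories and probabilities must align");
        }
        // TreeMap orders its keys (the scores) in ascending natural order.
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            // Categories with an identical score end up in the same Set.
            sorted.computeIfAbsent(probabilities[i], score -> new HashSet<>()).add(categories[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up values: "business" and "tech" tie at 0.25.
        String[] categories = {"business", "sports", "tech"};
        double[] probabilities = {0.25, 0.5, 0.25};
        SortedMap<Double, Set<String>> sorted = sortedScoreMap(categories, probabilities);
        // The last key holds the highest score and therefore the best categories.
        System.out.println(sorted.lastKey() + " " + sorted.get(sorted.lastKey()));
    }
}
```

Because the order is ascending, the highest-scoring categories are found at lastKey(), not firstKey().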