Interface DocumentCategorizer

  • All Known Implementing Classes:
    DocumentCategorizerME

    public interface DocumentCategorizer
    Interface for classes which categorize documents.
    • Method Detail

      • categorize

        double[] categorize(String[] text,
                            Map<String,Object> extraInformation)
        Categorizes the given text, provided as tokens, along with the provided extra information.
        Parameters:
        text - the tokens of text to categorize
        extraInformation - additional information to use during categorization
        Returns:
        the per-category probabilities
      • categorize

        double[] categorize(String[] text)
        Categorizes the given text, provided as separate tokens.
        Parameters:
        text - the tokens of text to categorize
        Returns:
        the per-category probabilities
      • getBestCategory

        String getBestCategory(double[] outcome)
        Gets the best category from previously generated outcome probabilities.
        Parameters:
        outcome - a vector of outcome probabilities
        Returns:
        the best category String
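As an illustration of what "best category" means here, the following is a minimal sketch that picks the label with the highest probability. It does not use the library itself: the class `BestCategorySketch`, the `CATEGORIES` array, and the first-wins tie-breaking are assumptions made for the example, not the behaviour of any particular implementing class.

```java
public class BestCategorySketch {
    // Hypothetical category labels; CATEGORIES[i] corresponds to outcome[i].
    static final String[] CATEGORIES = {"sports", "politics", "tech"};

    // Returns the label with the highest probability.
    // On ties, the category with the lowest index wins (an assumption of this sketch).
    static String bestCategory(double[] outcome) {
        if (outcome.length != CATEGORIES.length) {
            throw new IllegalArgumentException("expected one probability per category");
        }
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return CATEGORIES[best];
    }

    public static void main(String[] args) {
        double[] outcome = {0.2, 0.7, 0.1};
        System.out.println(bestCategory(outcome)); // prints "politics"
    }
}
```

With a real implementation, the probability vector would typically come from one of the `categorize` methods rather than being written by hand.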
      • getIndex

        int getIndex(String category)
        Gets the index of the given category.
        Parameters:
        category - the category
        Returns:
        the index of the category
      • getCategory

        String getCategory(int index)
        Gets the category at the given index.
        Parameters:
        index - the index
        Returns:
        the category at that index
      • getNumberOfCategories

        int getNumberOfCategories()
        Gets the number of categories.
        Returns:
        the number of categories
      • getAllResults

        String getAllResults(double[] results)
        Gets the name of the category associated with the given probabilities.
        Parameters:
        results - the probabilities of each category
        Returns:
        the name of the outcome
      • scoreMap

        Map<String,Double> scoreMap(String[] text)
        Returns a map in which the key is the category name and the value is the score.
        Parameters:
        text - the tokens of the input text to classify
        Returns:
        a map with the category name as the key and the score as the value
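The shape of the returned map can be sketched as follows. This is an illustration only: the class `ScoreMapSketch` and its two-array signature are hypothetical, and stand in for the categorization step that a real implementation would perform on `text`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScoreMapSketch {
    // Builds a category-name -> score map from parallel arrays.
    // Both parameters are hypothetical inputs for this sketch; a real
    // categorizer would derive the scores from the input tokens.
    static Map<String, Double> scoreMap(String[] categories, double[] scores) {
        if (categories.length != scores.length) {
            throw new IllegalArgumentException("expected one score per category");
        }
        // LinkedHashMap keeps the categories in index order for readable output.
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < categories.length; i++) {
            result.put(categories[i], scores[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> scores =
            scoreMap(new String[] {"sports", "politics"}, new double[] {0.25, 0.75});
        System.out.println(scores.get("politics")); // prints 0.75
    }
}
```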
      • sortedScoreMap

        SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
        Gets a map of the scores, sorted in ascending order, together with their associated categories. Several categories can have the same score, hence the Set as the value.
        Parameters:
        text - the tokens of the input text to classify
        Returns:
        a map with the score as a key. The value is a Set of categories with the score.
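To show why the value type is a Set, here is a minimal sketch that groups categories by score in a `TreeMap`, which iterates its keys in ascending order. The class `SortedScoreMapSketch` and its two-array signature are hypothetical and stand in for the categorization step; this is not the code of any implementing class.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class SortedScoreMapSketch {
    // Groups category names by score. TreeMap keeps the scores in ascending order.
    // Both parameters are hypothetical inputs for this sketch.
    static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] scores) {
        if (categories.length != scores.length) {
            throw new IllegalArgumentException("expected one score per category");
        }
        SortedMap<Double, Set<String>> sorted = new TreeMap<>();
        for (int i = 0; i < categories.length; i++) {
            // Categories sharing a score end up in the same Set.
            sorted.computeIfAbsent(scores[i], k -> new HashSet<>()).add(categories[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        SortedMap<Double, Set<String>> sorted = sortedScoreMap(
            new String[] {"sports", "politics", "tech"},
            new double[] {0.25, 0.5, 0.25});
        System.out.println(sorted.firstKey()); // prints 0.25
        System.out.println(sorted.lastKey());  // prints 0.5
    }
}
```

Because the map is ascending, `lastKey()` gives the highest score and `get(lastKey())` gives the category or categories that achieved it.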