Interface DocumentCategorizer

  • All Known Implementing Classes:
    DocumentCategorizerME

    public interface DocumentCategorizer
    Interface for classes which categorize documents.
    • Method Detail

      • categorize

        double[] categorize(String[] text,
                            Map<String,Object> extraInformation)
        Categorizes the given text, provided as tokens, taking the supplied extraInformation into account.
        Parameters:
        text - The tokens of text to categorize.
        extraInformation - The extra information used for this context.
        Returns:
        The per-category probabilities.
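        How extraInformation is consumed is left to the implementation. As a minimal sketch, assuming a feature-based categorizer, entries in the map can contribute extra features next to the per-token features. The class ContextFeatures, its extract method, the feature strings, and the "source" key are all hypothetical and not part of this interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical feature extractor, not part of OpenNLP: shows one way
// extraInformation could influence categorization by adding features.
// The "source" key is an invented example.
final class ContextFeatures {

  static List<String> extract(String[] text, Map<String, Object> extraInformation) {
    List<String> features = new ArrayList<>();
    for (String token : text) {
      features.add("tok=" + token.toLowerCase()); // bag-of-words feature per token
    }
    Object source = extraInformation.get("source");
    if (source != null) {
      features.add("source=" + source); // context feature taken from extraInformation
    }
    return features;
  }
}
```

        With an empty map the result contains only token features, so the same text can be categorized differently depending on the context supplied.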
      • categorize

        double[] categorize(String[] text)
        Categorizes the given text, provided in separate tokens.
        Parameters:
        text - The tokens of text to categorize.
        Returns:
        The per-category probabilities.
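        A toy implementation makes the contract concrete: the returned array holds one probability per category, and the single-argument overload can simply delegate to the two-argument one with an empty map. ToyCategorizer and its keyword-counting scorer are hypothetical stand-ins, not DocumentCategorizerME (which uses a trained model); the assumption that position i of the array belongs to the category at index i is made explicit here through the categories array.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical toy categorizer, NOT OpenNLP's DocumentCategorizerME.
// Scores each category by counting tokens that match its keyword list,
// then normalizes the counts (with add-one smoothing) into probabilities.
final class ToyCategorizer {
  private final String[] categories; // index i holds the name of category i
  private final String[][] keywords; // keywords[i] are the cue words for category i

  ToyCategorizer(String[] categories, String[][] keywords) {
    if (categories.length != keywords.length) {
      throw new IllegalArgumentException("one keyword list per category is required");
    }
    this.categories = categories.clone();
    this.keywords = keywords.clone();
  }

  // Single-argument overload: delegates with an empty extraInformation map.
  double[] categorize(String[] text) {
    return categorize(text, Collections.emptyMap());
  }

  // extraInformation is accepted but ignored by this toy scorer.
  double[] categorize(String[] text, Map<String, Object> extraInformation) {
    double[] scores = new double[categories.length];
    double total = 0.0;
    for (int i = 0; i < categories.length; i++) {
      scores[i] = 1.0; // add-one smoothing so no category gets probability 0
      for (String token : text) {
        for (String kw : keywords[i]) {
          if (kw.equalsIgnoreCase(token)) {
            scores[i] += 1.0;
          }
        }
      }
      total += scores[i];
    }
    for (int i = 0; i < scores.length; i++) {
      scores[i] /= total; // normalize so the probabilities sum to 1
    }
    return scores;
  }
}
```

        For example, with categories {"sports", "politics"} and keywords {{"goal", "match"}, {"vote", "election"}}, the tokens {"the", "match", "ended", "with", "a", "goal"} score 3 and 1 before normalization, so categorize returns {0.75, 0.25}.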
      • getBestCategory

        String getBestCategory(double[] outcome)
        Retrieves the category with the highest probability from previously computed outcome probabilities.
        Parameters:
        outcome - An array of computed outcome probabilities.
        Returns:
        The best category represented as String.
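        getBestCategory amounts to an arg-max over the outcome array. A minimal sketch, assuming the array is indexed like the categories array; BestCategory is a hypothetical helper, and resolving ties to the lowest index is a choice of this sketch rather than documented behavior.

```java
// Hypothetical helper mirroring the getBestCategory contract:
// returns the name of the category with the highest probability.
final class BestCategory {
  static String best(String[] categories, double[] outcome) {
    if (outcome.length == 0 || outcome.length != categories.length) {
      throw new IllegalArgumentException("outcome must have one entry per category");
    }
    int bestIndex = 0;
    for (int i = 1; i < outcome.length; i++) {
      if (outcome[i] > outcome[bestIndex]) { // strict '>' keeps the first maximum on ties
        bestIndex = i;
      }
    }
    return categories[bestIndex];
  }
}
```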
      • getIndex

        int getIndex(String category)
        Retrieves the index of a certain category.
        Parameters:
        category - The category for which the index is to be found.
        Returns:
        The index.
      • getCategory

        String getCategory(int index)
        Retrieves the category at a given index.
        Parameters:
        index - The index for which the category shall be found.
        Returns:
        The category represented as String.
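        getIndex and getCategory are inverse lookups over the same category list. A sketch of that pairing, assuming an array-backed list; CategoryTable is hypothetical, and returning -1 for an unknown name is an assumption, since the interface does not state what happens in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical array-backed category table illustrating getIndex/getCategory.
// Returning -1 for an unknown category name is an assumption of this sketch.
final class CategoryTable {
  private final String[] categories; // index -> name
  private final Map<String, Integer> indexByName = new HashMap<>(); // name -> index

  CategoryTable(String... categories) {
    this.categories = categories.clone();
    for (int i = 0; i < categories.length; i++) {
      indexByName.put(categories[i], i);
    }
  }

  int getIndex(String category) {
    return indexByName.getOrDefault(category, -1);
  }

  String getCategory(int index) {
    return categories[index];
  }
}
```

        Combined with a probability array from categorize, probs[table.getIndex("sports")] reads the probability of a single named category.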
      • getNumberOfCategories

        int getNumberOfCategories()
        Retrieves the number of categories.
        Returns:
        The number of categories.
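        Together, getNumberOfCategories and getCategory let a caller walk a probability array without knowing the category names in advance. The sketch below collects every category whose probability reaches a threshold. CategoryNames is a hypothetical, trimmed-down interface declaring only those two methods (with the documented signatures); ArrayCategoryNames and ThresholdFilter are illustrative helpers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in declaring only the two methods used below.
interface CategoryNames {
  int getNumberOfCategories();

  String getCategory(int index);
}

// Array-backed implementation for the example.
final class ArrayCategoryNames implements CategoryNames {
  private final String[] categories;

  ArrayCategoryNames(String... categories) {
    this.categories = categories.clone();
  }

  @Override
  public int getNumberOfCategories() {
    return categories.length;
  }

  @Override
  public String getCategory(int index) {
    return categories[index];
  }
}

final class ThresholdFilter {
  // Returns, in index order, every category whose probability is at least minProb.
  static List<String> above(CategoryNames names, double[] probs, double minProb) {
    if (probs.length != names.getNumberOfCategories()) {
      throw new IllegalArgumentException("probs must have one entry per category");
    }
    List<String> hits = new ArrayList<>();
    for (int i = 0; i < names.getNumberOfCategories(); i++) {
      if (probs[i] >= minProb) {
        hits.add(names.getCategory(i));
      }
    }
    return hits;
  }
}
```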
      • getAllResults

        String getAllResults(double[] results)
        Retrieves a String that lists the categories together with the given probabilities.
        Parameters:
        results - The probabilities of each category.
        Returns:
        A String representation of all categories and their probabilities.
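        Judging by its name, getAllResults summarizes the whole probability array. The sketch below shows one way such a summary String could be built, listing every category with its probability. The ResultFormatter class, the name[0.0000] layout, and the two-space separator are assumptions of this sketch; a real implementation may format its output differently.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical formatter: renders every category with its probability.
// The "name[0.0000]" layout and the two-space separator are assumptions.
final class ResultFormatter {
  static String allResults(String[] categories, double[] results) {
    if (results.length != categories.length) {
      throw new IllegalArgumentException("results must have one entry per category");
    }
    StringJoiner joined = new StringJoiner("  ");
    for (int i = 0; i < results.length; i++) {
      // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default.
      joined.add(String.format(Locale.ROOT, "%s[%.4f]", categories[i], results[i]));
    }
    return joined.toString();
  }
}
```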
      • scoreMap

        Map<String,Double> scoreMap(String[] text)
        Retrieves a Map in which the key is the category name and the value is the score.
        Parameters:
        text - The tokenized input text to classify.
        Returns:
        A Map from category name to score.
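        scoreMap pairs each category name with its score. A sketch, assuming the scores are the per-category probabilities returned by categorize and that names and scores share the same index order; ScoreMaps is hypothetical, and LinkedHashMap is chosen here only to preserve index order (the interface promises just a Map).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a category-name -> score map from a probability array.
final class ScoreMaps {
  static Map<String, Double> scoreMap(String[] categories, double[] probs) {
    if (probs.length != categories.length) {
      throw new IllegalArgumentException("probs must have one entry per category");
    }
    Map<String, Double> scores = new LinkedHashMap<>(); // keeps category index order
    for (int i = 0; i < categories.length; i++) {
      scores.put(categories[i], probs[i]);
    }
    return scores;
  }
}
```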
      • sortedScoreMap

        SortedMap<Double,Set<String>> sortedScoreMap(String[] text)
        Retrieves a SortedMap of the scores sorted in ascending order, together with their associated categories.

        Many categories can have the same score, hence the Set as value.

        Parameters:
        text - The tokenized input text to classify.
        Returns:
        A SortedMap with the score as the key and the Set of categories having that score as the value.
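        Because several categories can share a score, sortedScoreMap groups them into one Set per key. A sketch using TreeMap, whose natural Double ordering yields ascending scores; SortedScores is a hypothetical helper that takes category names and probabilities directly instead of raw text.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: groups categories by score, ascending.
// Categories with equal scores end up in the same Set.
final class SortedScores {
  static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] probs) {
    if (probs.length != categories.length) {
      throw new IllegalArgumentException("probs must have one entry per category");
    }
    SortedMap<Double, Set<String>> byScore = new TreeMap<>(); // natural order: ascending
    for (int i = 0; i < categories.length; i++) {
      byScore.computeIfAbsent(probs[i], k -> new HashSet<>()).add(categories[i]);
    }
    return byScore;
  }
}
```

        Since the order is ascending, lastKey() yields the highest score and get(lastKey()) the categories that share it.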