Class NGramUtils

java.lang.Object
opennlp.tools.ngram.NGramUtils

public class NGramUtils extends Object
Utility class for ngrams. Some methods apply only to specific values of n, e.g. unigrams, bigrams, or trigrams.
  • Constructor Details

    • NGramUtils

      public NGramUtils()
  • Method Details

    • calculateLaplaceSmoothingProbability

      public static double calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k)
      Calculates the probability of an ngram in a vocabulary using Laplace smoothing.
      Parameters:
      ngram - the ngram to get the probability for
      set - the vocabulary
      k - the smoothing factor
      Returns:
      the Laplace-smoothed probability
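      As an illustration of the idea, the sketch below implements the textbook add-k estimate, (c(ngram) + k) / (c(prefix) + k * V), on plain Java lists. It is not the code behind NGramUtils: the class AddKSmoothingSketch, its helper methods, the way occurrences are counted, and the use of the number of distinct tokens as V are all assumptions made for this example, and the library may use a different denominator.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class AddKSmoothingSketch {

    // Counts every position at which `ngram` occurs contiguously in any sentence.
    static int count(List<String> ngram, List<List<String>> sentences) {
        int total = 0;
        for (List<String> sentence : sentences) {
            for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
                if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Textbook add-k estimate for an ngram with at least two tokens.
    // prefix = the ngram without its last token, V = number of distinct tokens (assumption).
    static double addK(List<String> ngram, List<List<String>> sentences, double k) {
        if (ngram.size() < 2) {
            throw new IllegalArgumentException("this sketch expects at least a bigram");
        }
        Set<String> vocabulary = new HashSet<>();
        for (List<String> sentence : sentences) {
            vocabulary.addAll(sentence);
        }
        List<String> prefix = ngram.subList(0, ngram.size() - 1);
        return (count(ngram, sentences) + k)
                / (count(prefix, sentences) + k * vocabulary.size());
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"));
        System.out.println(addK(List.of("the", "cat"), corpus, 1.0));  // seen bigram
        System.out.println(addK(List.of("the", "bird"), corpus, 1.0)); // unseen bigram
    }
}
```

      With k = 1 the unseen bigram still receives a non-zero probability, which is the point of smoothing.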
    • calculateUnigramMLProbability

      public static double calculateUnigramMLProbability(String word, Collection<StringList> set)
      Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
      Parameters:
      word - the only word in the unigram
      set - the vocabulary
      Returns:
      the maximum likelihood probability
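      For a unigram, the maximum likelihood estimate is usually taken to be the word's relative frequency. The sketch below shows that reading on plain Java lists. The class UnigramMleSketch and the choice to return 0.0 for an empty corpus are assumptions of this example, not documented behavior of NGramUtils.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class UnigramMleSketch {

    // Relative frequency: occurrences of `word` divided by the total number of tokens.
    static double unigramMle(String word, List<List<String>> sentences) {
        long occurrences = 0;
        long totalTokens = 0;
        for (List<String> sentence : sentences) {
            occurrences += Collections.frequency(sentence, word);
            totalTokens += sentence.size();
        }
        // Assumption: an empty corpus yields 0.0 rather than NaN.
        return totalTokens == 0 ? 0.0 : (double) occurrences / totalTokens;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"));
        // "the" accounts for 2 of the 6 tokens.
        System.out.println(unigramMle("the", corpus));
    }
}
```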
    • calculateBigramMLProbability

      public static double calculateBigramMLProbability(String x0, String x1, Collection<StringList> set)
      Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
      Parameters:
      x0 - the first word in the bigram
      x1 - the second word in the bigram
      set - the vocabulary
      Returns:
      the maximum likelihood probability
    • calculateTrigramMLProbability

      public static double calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set)
      Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
      Parameters:
      x0 - the first word in the trigram
      x1 - the second word in the trigram
      x2 - the third word in the trigram
      set - the vocabulary
      Returns:
      the maximum likelihood probability
    • calculateNgramMLProbability

      public static double calculateNgramMLProbability(StringList ngram, Iterable<StringList> set)
      Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
      Parameters:
      ngram - an ngram
      set - the vocabulary
      Returns:
      the maximum likelihood probability
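      For ngrams with two or more tokens, the maximum likelihood estimate is commonly defined as the count of the ngram divided by the count of its (n-1)-token prefix. The sketch below illustrates that definition for bigrams and trigrams alike. The class NgramMleSketch, its counting convention (every occurrence in every sentence), and the 0.0 result for an unseen prefix are assumptions of this example and may differ from what NGramUtils does.

```java
import java.util.List;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class NgramMleSketch {

    // Counts every position at which `ngram` occurs contiguously in any sentence.
    static int count(List<String> ngram, List<List<String>> sentences) {
        int total = 0;
        for (List<String> sentence : sentences) {
            for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
                if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
                    total++;
                }
            }
        }
        return total;
    }

    // P(last token | preceding tokens) = c(ngram) / c(ngram without its last token).
    static double mle(List<String> ngram, List<List<String>> sentences) {
        if (ngram.size() < 2) {
            throw new IllegalArgumentException("this sketch expects at least a bigram");
        }
        int prefixCount = count(ngram.subList(0, ngram.size() - 1), sentences);
        // Assumption: an unseen prefix yields 0.0 instead of dividing by zero.
        return prefixCount == 0 ? 0.0 : (double) count(ngram, sentences) / prefixCount;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "cat", "ran"),
                List.of("the", "dog", "sat"));
        System.out.println(mle(List.of("the", "cat"), corpus));        // bigram
        System.out.println(mle(List.of("the", "cat", "sat"), corpus)); // trigram
    }
}
```

      In this toy corpus "the" occurs three times and "the cat" twice, so the bigram estimate is 2/3; "the cat sat" occurs once, so the trigram estimate is 1/2.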
    • calculateBigramPriorSmoothingProbability

      public static double calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k)
      Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
      Parameters:
      x0 - the first word in the bigram
      x1 - the second word in the bigram
      set - the vocabulary
      k - the smoothing factor
      Returns:
      the prior Laplace smoothing probability
    • calculateTrigramLinearInterpolationProbability

      public static double calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3)
      Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
      Parameters:
      x0 - the first word in the trigram
      x1 - the second word in the trigram
      x2 - the third word in the trigram
      set - the vocabulary
      lambda1 - the trigram interpolation factor
      lambda2 - the bigram interpolation factor
      lambda3 - the unigram interpolation factor
      Returns:
      the linear interpolation probability
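      Linear interpolation is generally a weighted sum of the trigram, bigram, and unigram estimates. The sketch below shows only that combination step and takes the three estimates as inputs. The class LinearInterpolationSketch and the requirement that the weights be non-negative and sum to 1 are assumptions of this example; this page does not state which constraints NGramUtils places on the lambdas.

```java
// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class LinearInterpolationSketch {

    // Weighted sum of trigram, bigram and unigram estimates.
    // Assumption: weights are non-negative and sum to 1, so the result stays a probability.
    static double interpolate(double pTrigram, double pBigram, double pUnigram,
                              double lambda1, double lambda2, double lambda3) {
        if (lambda1 < 0 || lambda2 < 0 || lambda3 < 0
                || Math.abs(lambda1 + lambda2 + lambda3 - 1.0) > 1e-9) {
            throw new IllegalArgumentException("lambdas must be non-negative and sum to 1");
        }
        return lambda1 * pTrigram + lambda2 * pBigram + lambda3 * pUnigram;
    }

    public static void main(String[] args) {
        // An unseen trigram (estimate 0.0) still gets mass from the lower-order estimates.
        System.out.println(interpolate(0.0, 0.5, 0.25, 0.5, 0.25, 0.25));
    }
}
```

      Here the result is 0.5 * 0.0 + 0.25 * 0.5 + 0.25 * 0.25 = 0.1875, so the lower-order estimates keep an unseen trigram from receiving probability zero.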
    • calculateMissingNgramProbabilityMass

      public static double calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set)
      Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
      Parameters:
      ngram - the ngram
      discount - the discount factor
      set - the vocabulary
      Returns:
      the probability
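      In discounting methods, the missing probability mass of a history is often defined as 1 minus the sum, over every word w observed after that history, of (c(history, w) - discount) / c(history). The sketch below follows that reading. It is an assumption that this matches the method documented above; the class MissingMassSketch, the treatment of sentence-final occurrences, and the result of 1.0 for an unseen history are choices made only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class MissingMassSketch {

    static double missingMass(List<String> history, double discount,
                              List<List<String>> sentences) {
        int n = history.size();
        int historyCount = 0;
        Map<String, Integer> continuations = new HashMap<>();
        for (List<String> sentence : sentences) {
            for (int i = 0; i + n <= sentence.size(); i++) {
                if (!sentence.subList(i, i + n).equals(history)) {
                    continue;
                }
                historyCount++;
                // Record the token that follows this occurrence, if there is one.
                if (i + n < sentence.size()) {
                    continuations.merge(sentence.get(i + n), 1, Integer::sum);
                }
            }
        }
        if (historyCount == 0) {
            return 1.0; // assumption: an unseen history leaves all mass unassigned
        }
        double kept = 0.0;
        for (int c : continuations.values()) {
            kept += (c - discount) / historyCount;
        }
        return 1.0 - kept;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "cat", "ran"),
                List.of("the", "dog", "sat"));
        System.out.println(missingMass(List.of("the"), 0.5, corpus));
    }
}
```

      In this corpus "the" is followed by "cat" twice and "dog" once, so with a discount of 0.5 the retained mass is (1.5 + 0.5) / 3 and one third is left over for unseen continuations.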
    • getNMinusOneTokenFirst

      public static StringList getNMinusOneTokenFirst(StringList ngram)
      Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
      Parameters:
      ngram - an ngram
      Returns:
      an ngram
    • getNMinusOneTokenLast

      public static StringList getNMinusOneTokenLast(StringList ngram)
      Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.
      Parameters:
      ngram - an ngram
      Returns:
      an ngram
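      The two (n-1)-gram variants described above differ only in which end of the ngram is dropped. The sketch below shows both on plain Java lists. The class NMinusOneSketch and its method names are hypothetical, and the sketch assumes a non-empty input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class NMinusOneSketch {

    // Keeps the first n-1 tokens, i.e. drops the last one: [a, b, c] -> [a, b].
    static List<String> withoutLast(List<String> ngram) {
        return new ArrayList<>(ngram.subList(0, ngram.size() - 1));
    }

    // Keeps the last n-1 tokens, i.e. drops the first one: [a, b, c] -> [b, c].
    static List<String> withoutFirst(List<String> ngram) {
        return new ArrayList<>(ngram.subList(1, ngram.size()));
    }

    public static void main(String[] args) {
        List<String> trigram = List.of("the", "cat", "sat");
        System.out.println(withoutLast(trigram));  // [the, cat]
        System.out.println(withoutFirst(trigram)); // [cat, sat]
    }
}
```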
    • getNGrams

      public static Collection<StringList> getNGrams(StringList sequence, int size)
      Gets the ngrams of the given size from an input sequence of tokens.
      Parameters:
      sequence - a sequence of tokens
      size - the size of the resulting ngrams
      Returns:
      all the possible ngrams of the given size derivable from the input sequence
    • getNGrams

      public static Collection<String[]> getNGrams(String[] sequence, int size)
      Gets the ngrams of the given size from an input sequence of tokens.
      Parameters:
      sequence - a sequence of tokens
      size - the size of the resulting ngrams
      Returns:
      all the possible ngrams of the given size derivable from the input sequence
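      Extracting all ngrams of a given size is typically done with a sliding window over the token sequence. The sketch below shows this for the String[] variant. The class SlidingWindowSketch is hypothetical, and returning an empty list when the sequence is shorter than the requested size is an assumption; this page does not say how NGramUtils handles that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; it does not call opennlp.tools.ngram.NGramUtils.
final class SlidingWindowSketch {

    // Slides a window of `size` tokens over the sequence, one position at a time.
    static List<String[]> ngrams(String[] sequence, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String[]> result = new ArrayList<>();
        for (int start = 0; start + size <= sequence.length; start++) {
            result.add(Arrays.copyOfRange(sequence, start, start + size));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b", "c", "d"};
        for (String[] ngram : ngrams(tokens, 2)) {
            System.out.println(Arrays.toString(ngram));
        }
    }
}
```

      A sequence of m tokens yields m - size + 1 ngrams under this scheme, so the four tokens above produce three bigrams.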