java.lang.Object

opennlp.tools.ngram.NGramUtils

public class NGramUtils extends Object

Utility class for ngrams. Some methods apply only to specific values of n, e.g. trigrams, bigrams, or unigrams.

Constructor Summary

- NGramUtils()
Method Summary

- static double calculateBigramMLProbability(String x0, String x1, Collection<StringList> set)
  
  Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
- static double calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k)
  
  Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
- static double calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k)
  
  Calculates the probability of an ngram in a vocabulary using Laplace (additive) smoothing.
- static double calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
- static double calculateNgramMLProbability(StringList ngram, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
- static double calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3)
  
  Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
- static double calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set)
  
  Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
- static double calculateUnigramMLProbability(String word, Collection<StringList> set)
  
  Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
- static Collection<String[]> getNGrams(String[] sequence, int size)
  
  Gets all ngrams of the given size from an input sequence of tokens.
- static Collection<StringList> getNGrams(StringList sequence, int size)
  
  Gets all ngrams of the given size from an input sequence of tokens.
- static StringList getNMinusOneTokenFirst(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, i.e. the same ngram without its last word.
- static StringList getNMinusOneTokenLast(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, i.e. the same ngram without its first word.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- NGramUtils
  
  public NGramUtils()
Method Details
- calculateLaplaceSmoothingProbability
  
  public static double calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k)
  
  Calculates the probability of an ngram in a vocabulary using Laplace (additive) smoothing.
  Parameters:
  
  ngram - the ngram to get the probability for
  
  set - the vocabulary
  
  k - the smoothing factor
  
  Returns:
  
  the Laplace smoothing probability
  
  See Also:
  
  Additive Smoothing
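
Additive (Laplace) smoothing adds a constant k to every count so that unseen ngrams still receive a non-zero probability. The following is a minimal sketch of the textbook formula P(w | h) = (c(h, w) + k) / (c(h) + k * V), where V is the number of distinct words. It is not taken from the OpenNLP source, and the library's denominator may differ. The class `AdditiveSmoothingSketch`, its method names, and the use of `List<List<String>>` in place of a collection of `StringList` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of textbook additive smoothing; not the OpenNLP implementation.
final class AdditiveSmoothingSketch {

  // Counts how often `ngram` occurs as a contiguous run of tokens in the corpus.
  static int count(List<String> ngram, List<List<String>> corpus) {
    int occurrences = 0;
    for (List<String> sentence : corpus) {
      for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
        if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
          occurrences++;
        }
      }
    }
    return occurrences;
  }

  // Number of distinct tokens in the corpus (V in the formula).
  static int vocabularySize(List<List<String>> corpus) {
    Set<String> words = new HashSet<>();
    for (List<String> sentence : corpus) {
      words.addAll(sentence);
    }
    return words.size();
  }

  // (c(ngram) + k) / (c(history) + k * V), where history is the ngram without its last word.
  static double smoothed(List<String> ngram, List<List<String>> corpus, double k) {
    if (ngram.size() < 2) {
      throw new IllegalArgumentException("this sketch expects at least a bigram");
    }
    List<String> history = ngram.subList(0, ngram.size() - 1);
    return (count(ngram, corpus) + k) / (count(history, corpus) + k * vocabularySize(corpus));
  }

  public static void main(String[] args) {
    List<List<String>> corpus = Arrays.asList(
        Arrays.asList("the", "cat", "sat"),
        Arrays.asList("the", "cat", "ran"),
        Arrays.asList("the", "dog", "sat"));
    // Seen bigram: (2 + 1) / (3 + 1 * 5)
    System.out.println(smoothed(Arrays.asList("the", "cat"), corpus, 1.0)); // prints 0.375
    // Unseen bigram: (0 + 1) / (3 + 1 * 5)
    System.out.println(smoothed(Arrays.asList("the", "ran"), corpus, 1.0)); // prints 0.125
  }
}
```

With k = 0 the sketch reduces to the plain count ratio, so an unseen history then yields a division of zero by zero.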
- calculateUnigramMLProbability
  
  public static double calculateUnigramMLProbability(String word, Collection<StringList> set)
  
  Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  word - the only word in the unigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateBigramMLProbability
  
  public static double calculateBigramMLProbability(String x0, String x1, Collection<StringList> set)
  
  Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  x0 - first word in the bigram
  
  x1 - second word in the bigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateTrigramMLProbability
  
  public static double calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set)
  
  Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  x0 - first word in the trigram
  
  x1 - second word in the trigram
  
  x2 - third word in the trigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateNgramMLProbability
  
  public static double calculateNgramMLProbability(StringList ngram, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  ngram - the ngram to get the probability for
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
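
Maximum likelihood estimation for ngrams is usually a ratio of counts: the number of times the full ngram occurs, divided by the number of times its history (the ngram without its last word) occurs. For a unigram the denominator is the total token count. The sketch below illustrates that ratio in plain Java. `MleSketch`, its method names, the `List<List<String>>` corpus type, and the choice to return 0 for an unseen history are assumptions made for illustration, not the library's documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of count-ratio MLE; not the OpenNLP implementation.
final class MleSketch {

  // Counts how often `ngram` occurs as a contiguous run of tokens in the corpus.
  static int count(List<String> ngram, List<List<String>> corpus) {
    int occurrences = 0;
    for (List<String> sentence : corpus) {
      for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
        if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
          occurrences++;
        }
      }
    }
    return occurrences;
  }

  // Total number of tokens in the corpus; used as the unigram denominator.
  static int totalTokens(List<List<String>> corpus) {
    int total = 0;
    for (List<String> sentence : corpus) {
      total += sentence.size();
    }
    return total;
  }

  // c(ngram) / c(history); for a unigram, c(word) / total tokens.
  static double mle(List<String> ngram, List<List<String>> corpus) {
    if (ngram.isEmpty()) {
      throw new IllegalArgumentException("ngram must not be empty");
    }
    int denominator = ngram.size() == 1
        ? totalTokens(corpus)
        : count(ngram.subList(0, ngram.size() - 1), corpus);
    // An unseen history makes the ratio undefined; this sketch reports 0 in that case.
    return denominator == 0 ? 0.0 : (double) count(ngram, corpus) / denominator;
  }

  public static void main(String[] args) {
    List<List<String>> corpus = Arrays.asList(
        Arrays.asList("the", "cat", "sat"),
        Arrays.asList("the", "cat", "ran"),
        Arrays.asList("the", "dog", "sat"));
    // "the cat sat" occurs once, "the cat" twice.
    System.out.println(mle(Arrays.asList("the", "cat", "sat"), corpus)); // prints 0.5
    // "dog sat" occurs once, "dog" once.
    System.out.println(mle(Arrays.asList("dog", "sat"), corpus)); // prints 1.0
  }
}
```

The same ratio covers the unigram, bigram, and trigram cases; only the length of the history changes.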
- calculateBigramPriorSmoothingProbability
  
  public static double calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k)
  
  Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
  
  Parameters:
  
  x0 - the first word in the bigram
  
  x1 - the second word in the bigram
  
  set - the vocabulary
  
  k - the smoothing factor
  
  Returns:
  
  the prior Laplace smoothing probability
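
One common reading of smoothing with a prior is the textbook unigram-prior form, P(x1 | x0) = (c(x0, x1) + k * P(x1)) / (c(x0) + k), where P(x1) is the unigram maximum likelihood probability. The sketch below illustrates that formula only. It is an assumption that this matches the method's intent, and the library's denominator may be defined differently, so check the OpenNLP source before relying on it. `PriorSmoothingSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of textbook unigram-prior smoothing; not the OpenNLP implementation.
final class PriorSmoothingSketch {

  // Counts how often `ngram` occurs as a contiguous run of tokens in the corpus.
  static int count(List<String> ngram, List<List<String>> corpus) {
    int occurrences = 0;
    for (List<String> sentence : corpus) {
      for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
        if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
          occurrences++;
        }
      }
    }
    return occurrences;
  }

  // Unigram MLE: c(word) / total number of tokens.
  static double unigram(String word, List<List<String>> corpus) {
    int total = 0;
    for (List<String> sentence : corpus) {
      total += sentence.size();
    }
    return total == 0 ? 0.0 : (double) count(Arrays.asList(word), corpus) / total;
  }

  // (c(x0, x1) + k * P(x1)) / (c(x0) + k)
  static double priorSmoothed(String x0, String x1, List<List<String>> corpus, double k) {
    double numerator = count(Arrays.asList(x0, x1), corpus) + k * unigram(x1, corpus);
    double denominator = count(Arrays.asList(x0), corpus) + k;
    return numerator / denominator;
  }

  public static void main(String[] args) {
    List<List<String>> corpus = Arrays.asList(
        Arrays.asList("a", "b", "a", "c"),
        Arrays.asList("a", "d", "b", "c"));
    // c(a, b) = 1, P(b) = 2 / 8, c(a) = 3, so (1 + 0.25) / (3 + 1).
    System.out.println(priorSmoothed("a", "b", corpus, 1.0)); // prints 0.3125
  }
}
```

Compared with adding a flat k to every count, the unigram prior gives frequent words a larger share of the smoothing mass.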
- calculateTrigramLinearInterpolationProbability
  
  public static double calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3)
  
  Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
  
  Parameters:
  
  x0 - the first word in the trigram
  
  x1 - the second word in the trigram
  
  x2 - the third word in the trigram
  
  set - the vocabulary
  
  lambda1 - trigram interpolation factor
  
  lambda2 - bigram interpolation factor
  
  lambda3 - unigram interpolation factor
  
  Returns:
  
  the linear interpolation probability
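
Linear interpolation mixes the trigram, bigram, and unigram maximum likelihood estimates: lambda1 * P(x2 | x0, x1) + lambda2 * P(x2 | x1) + lambda3 * P(x2). Conventionally the weights are non-negative and sum to 1, so the result is again a probability. The sketch below shows that combination in plain Java. `InterpolationSketch`, its helpers, and the sum-to-1 check are assumptions made for illustration and do not describe how the library validates its arguments.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of trigram linear interpolation; not the OpenNLP implementation.
final class InterpolationSketch {

  // Counts how often `ngram` occurs as a contiguous run of tokens in the corpus.
  static int count(List<String> ngram, List<List<String>> corpus) {
    int occurrences = 0;
    for (List<String> sentence : corpus) {
      for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
        if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
          occurrences++;
        }
      }
    }
    return occurrences;
  }

  // Count-ratio MLE; the unigram case divides by the total token count.
  static double mle(List<String> ngram, List<List<String>> corpus) {
    int denominator = 0;
    if (ngram.size() == 1) {
      for (List<String> sentence : corpus) {
        denominator += sentence.size();
      }
    } else {
      denominator = count(ngram.subList(0, ngram.size() - 1), corpus);
    }
    return denominator == 0 ? 0.0 : (double) count(ngram, corpus) / denominator;
  }

  // lambda1 weights the trigram, lambda2 the bigram, lambda3 the unigram estimate.
  static double interpolate(String x0, String x1, String x2, List<List<String>> corpus,
                            double lambda1, double lambda2, double lambda3) {
    if (Math.abs(lambda1 + lambda2 + lambda3 - 1.0) > 1e-9) {
      throw new IllegalArgumentException("lambdas should sum to 1");
    }
    return lambda1 * mle(Arrays.asList(x0, x1, x2), corpus)
        + lambda2 * mle(Arrays.asList(x1, x2), corpus)
        + lambda3 * mle(Arrays.asList(x2), corpus);
  }

  public static void main(String[] args) {
    List<List<String>> corpus = Arrays.asList(
        Arrays.asList("a", "b", "c", "d"),
        Arrays.asList("a", "b", "d", "c"));
    // P(c | a, b) = 1/2, P(c | b) = 1/2, P(c) = 2/8.
    // 0.5 * 0.5 + 0.25 * 0.5 + 0.25 * 0.25
    System.out.println(interpolate("a", "b", "c", corpus, 0.5, 0.25, 0.25)); // prints 0.4375
  }
}
```

Because the unigram term is non-zero for any word seen in the corpus, an unseen trigram still gets a positive probability as long as lambda3 is positive.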
- calculateMissingNgramProbabilityMass
  
  public static double calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
  
  Parameters:
  
  ngram - the ngram
  
  discount - discount factor
  
  set - the vocabulary
  
  Returns:
  
  the probability
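
In discounting schemes such as Katz back-off, the missing probability mass of a history h is what remains after a fixed discount is subtracted from every observed continuation count: 1 - sum over seen w of (c(h, w) - discount) / c(h). The sketch below computes that textbook quantity. It assumes the ngram argument plays the role of the history h and sums only over continuations that actually occur; neither assumption is confirmed by this page, and the library may define the sum differently. `MissingMassSketch` and its method are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of discount-based missing mass; not the OpenNLP implementation.
final class MissingMassSketch {

  static double missingMass(List<String> history, double discount, List<List<String>> corpus) {
    if (history.isEmpty()) {
      throw new IllegalArgumentException("history must not be empty");
    }
    Map<String, Integer> continuations = new HashMap<>();
    int historyCount = 0;
    int n = history.size();
    for (List<String> sentence : corpus) {
      for (int i = 0; i + n <= sentence.size(); i++) {
        if (!sentence.subList(i, i + n).equals(history)) {
          continue;
        }
        historyCount++;
        // Record the word that follows this occurrence, if there is one.
        if (i + n < sentence.size()) {
          continuations.merge(sentence.get(i + n), 1, Integer::sum);
        }
      }
    }
    if (historyCount == 0) {
      return 1.0; // nothing observed after this history, so no mass has been assigned
    }
    double assigned = 0.0;
    for (int c : continuations.values()) {
      assigned += (c - discount) / historyCount;
    }
    return 1.0 - assigned;
  }

  public static void main(String[] args) {
    List<List<String>> corpus = Arrays.asList(
        Arrays.asList("a", "b", "c", "d"),
        Arrays.asList("a", "b", "d", "c"));
    // "b" is followed once by "c" and once by "d": 1 - (0.5 / 2 + 0.5 / 2)
    System.out.println(missingMass(Arrays.asList("b"), 0.5, corpus)); // prints 0.5
    // "a" is followed twice by "b": 1 - (1.5 / 2)
    System.out.println(missingMass(Arrays.asList("a"), 0.5, corpus)); // prints 0.25
  }
}
```

The freed mass is what a back-off model redistributes over continuations it has never seen after this history.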
- getNMinusOneTokenFirst
  
  public static StringList getNMinusOneTokenFirst(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, i.e. the same ngram without its last word.
  
  Parameters:
  
  ngram - an ngram
  
  Returns:
  
  the ngram without its last word
- getNMinusOneTokenLast
  
  public static StringList getNMinusOneTokenLast(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, i.e. the same ngram without its first word.
  
  Parameters:
  
  ngram - an ngram
  
  Returns:
  
  the ngram without its first word
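
The two (n-1)-gram helpers differ only in which end of the ngram they trim. The sketch below shows both operations on a plain token list. `NGramTrimSketch` and its method names are hypothetical, `List<String>` stands in for `StringList`, and rejecting an empty input is an assumption of the sketch rather than documented library behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the OpenNLP implementation.
final class NGramTrimSketch {

  // (w1 ... wn) -> (w1 ... wn-1): keeps everything except the last token.
  static List<String> withoutLast(List<String> ngram) {
    requireNonEmpty(ngram);
    return new ArrayList<>(ngram.subList(0, ngram.size() - 1));
  }

  // (w1 ... wn) -> (w2 ... wn): keeps everything except the first token.
  static List<String> withoutFirst(List<String> ngram) {
    requireNonEmpty(ngram);
    return new ArrayList<>(ngram.subList(1, ngram.size()));
  }

  private static void requireNonEmpty(List<String> ngram) {
    if (ngram.isEmpty()) {
      throw new IllegalArgumentException("ngram must contain at least one token");
    }
  }

  public static void main(String[] args) {
    List<String> trigram = Arrays.asList("new", "york", "city");
    System.out.println(withoutLast(trigram));  // prints [new, york]
    System.out.println(withoutFirst(trigram)); // prints [york, city]
  }
}
```

Dropping the last token yields the conditioning history used as the denominator in a count-ratio estimate; dropping the first yields the shorter context that back-off and interpolation schemes fall back to.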
- getNGrams
  
  public static Collection<StringList> getNGrams(StringList sequence, int size)
  
  Gets all ngrams of the given size from an input sequence of tokens.
  
  Parameters:
  
  sequence - a sequence of tokens
  
  size - the size of the resulting ngrams
  
  Returns:
  
  all the possible ngrams of the given size derivable from the input sequence
- getNGrams
  
  public static Collection<String[]> getNGrams(String[] sequence, int size)
  
  Gets all ngrams of the given size from an input sequence of tokens.
  
  Parameters:
  
  sequence - a sequence of tokens
  
  size - the size of the resulting ngrams
  
  Returns:
  
  all the possible ngrams of the given size derivable from the input sequence
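
Extracting all ngrams of a given size amounts to sliding a window of that width over the token sequence. The sketch below illustrates the idea with plain lists. `NGramWindowSketch` and its `ngrams` method are hypothetical, and returning an empty result when the sequence is shorter than the window is a choice made by the sketch; the library's behavior in that case is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the OpenNLP implementation.
final class NGramWindowSketch {

  // Returns every contiguous window of `size` tokens, in order of appearance.
  static List<List<String>> ngrams(List<String> tokens, int size) {
    if (size < 1) {
      throw new IllegalArgumentException("size must be at least 1");
    }
    List<List<String>> result = new ArrayList<>();
    // The last valid start index is tokens.size() - size.
    for (int start = 0; start + size <= tokens.size(); start++) {
      result.add(new ArrayList<>(tokens.subList(start, start + size)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
    System.out.println(ngrams(tokens, 2));
    // prints [[the, quick], [quick, brown], [brown, fox]]
  }
}
```

A sequence of m tokens produces m - size + 1 windows when m is at least size.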
