java.lang.Object

opennlp.tools.ngram.NGramUtils

public class NGramUtils extends Object

Utility class for ngrams. Some methods apply only to specific values of n, e.g. trigrams, bigrams, or unigrams.

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| NGramUtils() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static double | calculateBigramMLProbability(String x0, String x1, Collection<StringList> set) | Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation. |
| static double | calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k) | Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm. |
| static double | calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k) | Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm. |
| static double | calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set) | Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm. |
| static double | calculateNgramMLProbability(StringList ngram, Iterable<StringList> set) | Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation. |
| static double | calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3) | Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm. |
| static double | calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set) | Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation. |
| static double | calculateUnigramMLProbability(String word, Collection<StringList> set) | Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation. |
| static Collection<String[]> | getNGrams(String[] sequence, int size) | Gets the ngrams of the given size from an input sequence of tokens. |
| static Collection<StringList> | getNGrams(StringList sequence, int size) | Gets the ngrams of the given size from an input sequence of tokens. |
| static StringList | getNMinusOneTokenFirst(StringList ngram) | Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word. |
| static StringList | getNMinusOneTokenLast(StringList ngram) | Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- NGramUtils
  
  public NGramUtils()
Method Details
- calculateLaplaceSmoothingProbability
  
  public static double calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k)
  
  Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm.
  
  Parameters:
  
  ngram - the ngram to get the probability for
  
  set - the vocabulary
  
  k - the smoothing factor
  
  Returns:
  
  the Laplace smoothing probability
  
  See Also:
  
  Additive Smoothing
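
As a rough illustration of additive (Laplace) smoothing, the sketch below adds k to the observed count of an ngram and k times the number of possible outcomes to the count of its context. It is a standalone example on plain Java lists, not the OpenNLP implementation: the class `AdditiveSmoothingSketch`, its method names, and the explicit `vocabularySize` parameter are assumptions made for illustration, and the exact denominator used by `calculateLaplaceSmoothingProbability` is not documented on this page.

```java
import java.util.Arrays;
import java.util.List;

public class AdditiveSmoothingSketch {

    // Counts how often `target` occurs in `items` (hypothetical helper).
    static int count(List<String> target, List<List<String>> items) {
        int c = 0;
        for (List<String> item : items) {
            if (item.equals(target)) {
                c++;
            }
        }
        return c;
    }

    // Additive smoothing: (count(ngram) + k) / (count(context) + k * vocabularySize),
    // where the context is the ngram without its last token.
    static double smoothed(List<String> ngram, List<List<String>> items,
                           double k, int vocabularySize) {
        List<String> context = ngram.subList(0, ngram.size() - 1);
        return (count(ngram, items) + k) / (count(context, items) + k * vocabularySize);
    }

    public static void main(String[] args) {
        List<List<String>> items = Arrays.asList(
                Arrays.asList("the"),
                Arrays.asList("the"),
                Arrays.asList("the", "cat"),
                Arrays.asList("the", "dog"));
        // Seen bigram: (1 + 1) / (2 + 1 * 3)
        System.out.println(smoothed(Arrays.asList("the", "cat"), items, 1.0, 3));
        // Unseen bigram still gets a non-zero value: (0 + 1) / (2 + 1 * 3)
        System.out.println(smoothed(Arrays.asList("the", "fish"), items, 1.0, 3));
    }
}
```

The point of the added k is that an ngram never seen in the vocabulary no longer receives probability zero.
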
- calculateUnigramMLProbability
  
  public static double calculateUnigramMLProbability(String word, Collection<StringList> set)
  
  Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  word - the only word in the unigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateBigramMLProbability
  
  public static double calculateBigramMLProbability(String x0, String x1, Collection<StringList> set)
  
  Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  x0 - the first word in the bigram
  
  x1 - the second word in the bigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateTrigramMLProbability
  
  public static double calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set)
  
  Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  x0 - the first word in the trigram
  
  x1 - the second word in the trigram
  
  x2 - the third word in the trigram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
- calculateNgramMLProbability
  
  public static double calculateNgramMLProbability(StringList ngram, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
  
  Parameters:
  
  ngram - an ngram
  
  set - the vocabulary
  
  Returns:
  
  the maximum likelihood probability
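
A maximum likelihood estimate for an ngram is commonly computed as the count of the full ngram divided by the count of its (n-1)-token context. The sketch below shows that ratio on plain Java lists. It is not the OpenNLP code: `MleSketch` and its helpers are hypothetical names, and it assumes the vocabulary holds entries of several orders (unigrams and bigrams side by side), which this page does not confirm.

```java
import java.util.Arrays;
import java.util.List;

public class MleSketch {

    // Number of entries in `items` equal to `target` (hypothetical helper).
    static long count(List<String> target, List<List<String>> items) {
        return items.stream().filter(target::equals).count();
    }

    // MLE of the last token given the preceding ones:
    // count(w1..wn) / count(w1..wn-1).
    // Returns 0 when the context was never observed, to avoid dividing by zero.
    static double mle(List<String> ngram, List<List<String>> items) {
        long contextCount = count(ngram.subList(0, ngram.size() - 1), items);
        return contextCount == 0 ? 0.0 : (double) count(ngram, items) / contextCount;
    }

    public static void main(String[] args) {
        List<List<String>> items = Arrays.asList(
                Arrays.asList("a"),
                Arrays.asList("a"),
                Arrays.asList("a", "b"),
                Arrays.asList("a", "c"));
        // "a b" occurs once and "a" occurs twice, so the estimate is 1 / 2.
        System.out.println(mle(Arrays.asList("a", "b"), items));
        // A context that never occurs yields 0 in this sketch.
        System.out.println(mle(Arrays.asList("x", "y"), items));
    }
}
```
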
- calculateBigramPriorSmoothingProbability
  
  public static double calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k)
  
  Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
  
  Parameters:
  
  x0 - the first word in the bigram
  
  x1 - the second word in the bigram
  
  set - the vocabulary
  
  k - the smoothing factor
  
  Returns:
  
  the prior Laplace smoothing probability
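
One common formulation of smoothing with a unigram prior adds k times the unigram probability of the second word to the bigram count, and k to the context count. The sketch below shows that arithmetic on precomputed counts. Whether `calculateBigramPriorSmoothingProbability` uses exactly this formula is an assumption; `PriorSmoothingSketch` and its parameter names are hypothetical.

```java
public class PriorSmoothingSketch {

    // (count(x0 x1) + k * P(x1)) / (count(x0) + k)
    // Assumes k > 0 so the denominator is never zero.
    static double priorSmoothed(long bigramCount, long contextCount,
                                double unigramProbability, double k) {
        return (bigramCount + k * unigramProbability) / (contextCount + k);
    }

    public static void main(String[] args) {
        // Unseen bigram: (0 + 2 * 0.25) / (4 + 2)
        System.out.println(priorSmoothed(0, 4, 0.25, 2.0));
        // Seen bigram: (3 + 2 * 0.25) / (4 + 2)
        System.out.println(priorSmoothed(3, 4, 0.25, 2.0));
    }
}
```

Compared with plain additive smoothing, the mass given to an unseen bigram here scales with how frequent its second word is on its own.
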
- calculateTrigramLinearInterpolationProbability
  
  public static double calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3)
  
  Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
  
  Parameters:
  
  x0 - the first word in the trigram
  
  x1 - the second word in the trigram
  
  x2 - the third word in the trigram
  
  set - the vocabulary
  
  lambda1 - the trigram interpolation factor
  
  lambda2 - the bigram interpolation factor
  
  lambda3 - the unigram interpolation factor
  
  Returns:
  
  the linear interpolation probability
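
Linear interpolation is usually a weighted sum of the trigram, bigram, and unigram estimates, with weights that are non-negative and sum to 1. The sketch below takes the three estimates as inputs to show only the combination step. `InterpolationSketch` is a hypothetical name, and the argument check is an assumption of this example, not a documented behavior of the method above.

```java
public class InterpolationSketch {

    // Weighted sum of trigram, bigram and unigram estimates.
    // This sketch rejects weights that are negative or do not sum to 1.
    static double interpolate(double pTrigram, double pBigram, double pUnigram,
                              double lambda1, double lambda2, double lambda3) {
        if (lambda1 < 0 || lambda2 < 0 || lambda3 < 0
                || Math.abs(lambda1 + lambda2 + lambda3 - 1.0) > 1e-9) {
            throw new IllegalArgumentException("lambdas must be non-negative and sum to 1");
        }
        return lambda1 * pTrigram + lambda2 * pBigram + lambda3 * pUnigram;
    }

    public static void main(String[] args) {
        // An unseen trigram (estimate 0.0) still receives mass
        // from the lower-order estimates: 0.6 * 0.0 + 0.3 * 0.5 + 0.1 * 0.1
        System.out.println(interpolate(0.0, 0.5, 0.1, 0.6, 0.3, 0.1));
    }
}
```
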
- calculateMissingNgramProbabilityMass
  
  public static double calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set)
  
  Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
  
  Parameters:
  
  ngram - the ngram
  
  discount - the discount factor
  
  set - the vocabulary
  
  Returns:
  
  the probability
- getNMinusOneTokenFirst
  
  public static StringList getNMinusOneTokenFirst(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
  
  Parameters:
  
  ngram - an ngram
  
  Returns:
  
  the ngram without its last word
- getNMinusOneTokenLast
  
  public static StringList getNMinusOneTokenLast(StringList ngram)
  
  Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.
  
  Parameters:
  
  ngram - an ngram
  
  Returns:
  
  the ngram without its first word
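
The difference between getNMinusOneTokenFirst and getNMinusOneTokenLast is easy to mix up, so here is a minimal sketch of the two operations on a plain `List<String>`. The names `NMinusOneSketch`, `dropLast`, and `dropFirst` are hypothetical and stand in for the methods above. The sketch assumes a non-empty input.

```java
import java.util.Arrays;
import java.util.List;

public class NMinusOneSketch {

    // Everything except the last token, e.g. [a, b, c] -> [a, b].
    static List<String> dropLast(List<String> ngram) {
        return ngram.subList(0, ngram.size() - 1);
    }

    // Everything except the first token, e.g. [a, b, c] -> [b, c].
    static List<String> dropFirst(List<String> ngram) {
        return ngram.subList(1, ngram.size());
    }

    public static void main(String[] args) {
        List<String> trigram = Arrays.asList("a", "b", "c");
        System.out.println(dropLast(trigram));
        System.out.println(dropFirst(trigram));
    }
}
```
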
- getNGrams
  
  public static Collection<StringList> getNGrams(StringList sequence, int size)
  
  Gets the ngrams of the given size from an input sequence of tokens.
  
  Parameters:
  
  sequence - a sequence of tokens
  
  size - the size of the resulting ngrams
  
  Returns:
  
  all the possible ngrams of the given size derivable from the input sequence
- getNGrams
  
  public static Collection<String[]> getNGrams(String[] sequence, int size)
  
  Gets the ngrams of the given size from an input sequence of tokens.
  
  Parameters:
  
  sequence - a sequence of tokens
  
  size - the size of the resulting ngrams
  
  Returns:
  
  all the possible ngrams of the given size derivable from the input sequence
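
Extracting all ngrams of a fixed size from a token sequence amounts to sliding a window of that size over the sequence. The sketch below does this for a `String[]`. It is a standalone illustration, not the library code: `SlidingWindowSketch` and `windows` are hypothetical names, and returning an empty list for a non-positive size or a too-short sequence is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWindowSketch {

    // Returns every contiguous window of `size` tokens, in order of appearance.
    static List<String[]> windows(String[] tokens, int size) {
        List<String[]> result = new ArrayList<>();
        if (size <= 0) {
            return result;
        }
        for (int i = 0; i + size <= tokens.length; i++) {
            result.add(Arrays.copyOfRange(tokens, i, i + size));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b", "c", "d"};
        // Prints the bigrams [a, b], [b, c] and [c, d], one per line.
        for (String[] ngram : windows(tokens, 2)) {
            System.out.println(Arrays.toString(ngram));
        }
    }
}
```

A sequence of m tokens yields m - size + 1 windows when size is between 1 and m.
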
