Package opennlp.tools.ngram
Class NGramUtils
java.lang.Object
opennlp.tools.ngram.NGramUtils
Utility class for ngrams.
Some methods apply only to specific values of n, e.g. trigrams, bigrams, or unigrams.
-
Constructor Summary
Constructors: NGramUtils()
Method Summary
Modifier and Type | Method | Description
static double | calculateBigramMLProbability(String x0, String x1, Collection<StringList> set) | Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
static double | calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k) | Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
static double | calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k) | Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm.
static double | calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set) | Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
static double | calculateNgramMLProbability(StringList ngram, Iterable<StringList> set) | Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
static double | calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3) | Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
static double | calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set) | Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
static double | calculateUnigramMLProbability(String word, Collection<StringList> set) | Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
static Collection<String[]> | getNGrams(…) | Gets the ngrams of size n from an input sequence of tokens.
static Collection<StringList> | getNGrams(StringList sequence, int size) | Gets the ngrams of size n from an input sequence of tokens.
static StringList | getNMinusOneTokenFirst(StringList ngram) | Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
static StringList | getNMinusOneTokenLast(StringList ngram) | Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.
-
Constructor Details
-
NGramUtils
public NGramUtils()
-
-
Method Details
-
calculateLaplaceSmoothingProbability
public static double calculateLaplaceSmoothingProbability(StringList ngram, Iterable<StringList> set, Double k)
Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm.
Parameters:
ngram - the ngram to get the probability for
set - the vocabulary
k - the smoothing factor
Returns:
- the Laplace smoothing probability
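The page does not spell out the formula, so the following is only a minimal sketch of textbook add-k (Laplace) smoothing, not the library's implementation. It uses plain `List<String>` values as a stand-in for `StringList`, and the class name `AddKSmoothingSketch`, the helper `count`, and the `vocabularySize` parameter are assumptions introduced for illustration. It assumes n >= 2 and k > 0.

```java
import java.util.Arrays;
import java.util.List;

class AddKSmoothingSketch {

    // Counts contiguous occurrences of `ngram` across all sentences.
    static int count(List<String> ngram, List<List<String>> corpus) {
        int hits = 0;
        for (List<String> sentence : corpus) {
            for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
                if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Textbook add-k estimate: (c(ngram) + k) / (c(history) + k * V).
    // `vocabularySize` (V) is a parameter of this sketch, not of NGramUtils.
    static double addK(List<String> ngram, List<List<String>> corpus, double k, int vocabularySize) {
        List<String> history = ngram.subList(0, ngram.size() - 1);
        return (count(ngram, corpus) + k) / (count(history, corpus) + k * vocabularySize);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // Seen bigram: (1 + 1) / (2 + 1 * 4)
        System.out.println(addK(Arrays.asList("the", "cat"), corpus, 1.0, 4));
        // Unseen bigram still receives non-zero mass: (0 + 1) / (2 + 1 * 4)
        System.out.println(addK(Arrays.asList("the", "sat"), corpus, 1.0, 4));
    }
}
```

The point of the smoothing factor is visible in the second call: an ngram that never occurs in the vocabulary still gets a small positive probability.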
-
calculateUnigramMLProbability
Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
Parameters:
word - the only word in the unigram
set - the vocabulary
Returns:
- the maximum likelihood probability
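As a rough illustration, the maximum likelihood estimate for a unigram is usually the word's count divided by the total number of tokens. The sketch below assumes that reading; `UnigramMlSketch` and `unigramMl` are hypothetical names, `List<List<String>>` stands in for `Collection<StringList>`, and returning 0 for an empty vocabulary is a choice made here, not documented behaviour.

```java
import java.util.Arrays;
import java.util.List;

class UnigramMlSketch {

    // c(word) / total number of tokens; 0.0 for an empty corpus (sketch's choice).
    static double unigramMl(String word, List<List<String>> corpus) {
        long total = 0;
        long hits = 0;
        for (List<String> sentence : corpus) {
            total += sentence.size();
            for (String token : sentence) {
                if (token.equals(word)) {
                    hits++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // "the" occurs 2 times among 6 tokens.
        System.out.println(unigramMl("the", corpus));
    }
}
```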
-
calculateBigramMLProbability
Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
Parameters:
x0 - first word in the bigram
x1 - second word in the bigram
set - the vocabulary
Returns:
- the maximum likelihood probability
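For a bigram, the usual maximum likelihood estimate is c(x0, x1) / c(x0). The single-pass sketch below assumes that definition; `BigramMlSketch` and `bigramMl` are hypothetical names, and the 0 returned when x0 never occurs is a choice made for this example.

```java
import java.util.Arrays;
import java.util.List;

class BigramMlSketch {

    // c(x0, x1) / c(x0), computed in one pass over the corpus.
    static double bigramMl(String x0, String x1, List<List<String>> corpus) {
        long first = 0; // occurrences of x0
        long pair = 0;  // occurrences of x0 immediately followed by x1
        for (List<String> sentence : corpus) {
            for (int i = 0; i < sentence.size(); i++) {
                if (!sentence.get(i).equals(x0)) {
                    continue;
                }
                first++;
                if (i + 1 < sentence.size() && sentence.get(i + 1).equals(x1)) {
                    pair++;
                }
            }
        }
        return first == 0 ? 0.0 : (double) pair / first;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // "the" occurs twice, once followed by "cat".
        System.out.println(bigramMl("the", "cat", corpus));
    }
}
```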
-
calculateTrigramMLProbability
public static double calculateTrigramMLProbability(String x0, String x1, String x2, Iterable<StringList> set)
Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
Parameters:
x0 - first word in the trigram
x1 - second word in the trigram
x2 - third word in the trigram
set - the vocabulary
Returns:
- the maximum likelihood probability
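The page does not give the formula. Assuming the standard definition, the trigram maximum likelihood estimate divides the count of the full trigram by the count of its two-word history, where c(...) denotes the number of occurrences in the vocabulary:

```latex
P_{\mathrm{ML}}(x_2 \mid x_0, x_1) = \frac{c(x_0, x_1, x_2)}{c(x_0, x_1)}
```

For a purely hypothetical example, if "the cat" occurs 4 times and "the cat sat" occurs 3 times, the estimate is 3/4 = 0.75.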
-
calculateNgramMLProbability
Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
Parameters:
ngram - an ngram
set - the vocabulary
Returns:
- the maximum likelihood probability
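The same idea generalises to any n: divide the count of the ngram by the count of its (n-1)-word history. The sketch below assumes that definition and falls back to the total token count for unigrams; `NgramMlSketch`, `count`, and `ngramMl` are hypothetical names, and the handling of unseen histories (returning 0) is a choice made here.

```java
import java.util.Arrays;
import java.util.List;

class NgramMlSketch {

    // Counts contiguous occurrences of `ngram` across all sentences.
    static int count(List<String> ngram, List<List<String>> corpus) {
        int hits = 0;
        for (List<String> sentence : corpus) {
            for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
                if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // c(ngram) / c(history); for n == 1 the denominator is the total token count.
    static double ngramMl(List<String> ngram, List<List<String>> corpus) {
        int denominator;
        if (ngram.size() == 1) {
            denominator = 0;
            for (List<String> sentence : corpus) {
                denominator += sentence.size();
            }
        } else {
            denominator = count(ngram.subList(0, ngram.size() - 1), corpus);
        }
        return denominator == 0 ? 0.0 : (double) count(ngram, corpus) / denominator;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        System.out.println(ngramMl(Arrays.asList("the", "cat", "sat"), corpus));
        System.out.println(ngramMl(Arrays.asList("the", "dog"), corpus));
    }
}
```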
-
calculateBigramPriorSmoothingProbability
public static double calculateBigramPriorSmoothingProbability(String x0, String x1, Collection<StringList> set, Double k)
Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
Parameters:
x0 - the first word in the bigram
x1 - the second word in the bigram
set - the vocabulary
k - the smoothing factor
Returns:
- the prior Laplace smoothing probability
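The page does not define "prior Laplace smoothing". One common textbook reading is unigram-prior smoothing, where the k pseudo-counts are distributed according to the unigram probability of x1: (c(x0, x1) + k * P(x1)) / (c(x0) + k). The sketch below assumes that reading; the library's exact normalisation may differ, and `PriorSmoothingSketch` and `bigramPrior` are hypothetical names. It assumes k > 0.

```java
import java.util.Arrays;
import java.util.List;

class PriorSmoothingSketch {

    // Unigram-prior smoothing: (c(x0, x1) + k * P(x1)) / (c(x0) + k).
    static double bigramPrior(String x0, String x1, List<List<String>> corpus, double k) {
        long total = 0; // all tokens
        long c0 = 0;    // occurrences of x0
        long c1 = 0;    // occurrences of x1
        long pair = 0;  // occurrences of x0 immediately followed by x1
        for (List<String> sentence : corpus) {
            total += sentence.size();
            for (int i = 0; i < sentence.size(); i++) {
                if (sentence.get(i).equals(x1)) {
                    c1++;
                }
                if (sentence.get(i).equals(x0)) {
                    c0++;
                    if (i + 1 < sentence.size() && sentence.get(i + 1).equals(x1)) {
                        pair++;
                    }
                }
            }
        }
        double prior = total == 0 ? 0.0 : (double) c1 / total;
        return (pair + k * prior) / (c0 + k);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // (1 + 1 * 1/6) / (2 + 1)
        System.out.println(bigramPrior("the", "cat", corpus, 1.0));
    }
}
```

Compared with plain add-k smoothing, an unseen bigram here receives mass in proportion to how frequent its second word is overall.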
-
calculateTrigramLinearInterpolationProbability
public static double calculateTrigramLinearInterpolationProbability(String x0, String x1, String x2, Collection<StringList> set, Double lambda1, Double lambda2, Double lambda3)
Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
Parameters:
x0 - the first word in the trigram
x1 - the second word in the trigram
x2 - the third word in the trigram
set - the vocabulary
lambda1 - trigram interpolation factor
lambda2 - bigram interpolation factor
lambda3 - unigram interpolation factor
Returns:
- the linear interpolation probability
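Linear interpolation is commonly defined as a weighted sum of the trigram, bigram, and unigram maximum likelihood estimates: lambda1 * P(x2 | x0, x1) + lambda2 * P(x2 | x1) + lambda3 * P(x2), with the three weights usually summing to 1. The sketch below assumes that definition; `InterpolationSketch`, `count`, `ratio`, and `interpolate` are hypothetical names, and treating a zero denominator as probability 0 is a choice made here.

```java
import java.util.Arrays;
import java.util.List;

class InterpolationSketch {

    // Counts contiguous occurrences of `ngram` across all sentences.
    static int count(List<String> ngram, List<List<String>> corpus) {
        int hits = 0;
        for (List<String> sentence : corpus) {
            for (int i = 0; i + ngram.size() <= sentence.size(); i++) {
                if (sentence.subList(i, i + ngram.size()).equals(ngram)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Safe division: an unseen history contributes probability 0.
    static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    // lambda1 * P(x2 | x0, x1) + lambda2 * P(x2 | x1) + lambda3 * P(x2)
    static double interpolate(String x0, String x1, String x2, List<List<String>> corpus,
                              double lambda1, double lambda2, double lambda3) {
        int tokens = 0;
        for (List<String> sentence : corpus) {
            tokens += sentence.size();
        }
        double trigram = ratio(count(Arrays.asList(x0, x1, x2), corpus),
                count(Arrays.asList(x0, x1), corpus));
        double bigram = ratio(count(Arrays.asList(x1, x2), corpus),
                count(Arrays.asList(x1), corpus));
        double unigram = ratio(count(Arrays.asList(x2), corpus), tokens);
        return lambda1 * trigram + lambda2 * bigram + lambda3 * unigram;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // 0.5 * 1 + 0.3 * 1 + 0.2 * (2 / 6)
        System.out.println(interpolate("the", "cat", "sat", corpus, 0.5, 0.3, 0.2));
    }
}
```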
-
calculateMissingNgramProbabilityMass
public static double calculateMissingNgramProbabilityMass(StringList ngram, double discount, Iterable<StringList> set)
Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
Parameters:
ngram - the ngram
discount - discount factor
set - the vocabulary
Returns:
- the probability
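The description above is terse. One common interpretation, from back-off models with absolute discounting, treats the ngram as a history h and computes the mass left over after discounting every observed continuation: 1 - sum over seen w of (c(h, w) - discount) / c(h). The sketch below assumes that interpretation and may not match the library's computation; `MissingMassSketch` and `missingMass` are hypothetical names, and returning 1.0 for an unseen history is a choice made here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MissingMassSketch {

    // 1 - sum over seen continuations w of (c(history, w) - discount) / c(history).
    static double missingMass(List<String> history, double discount, List<List<String>> corpus) {
        int n = history.size();
        int historyCount = 0;
        Map<String, Integer> continuations = new HashMap<>();
        for (List<String> sentence : corpus) {
            for (int i = 0; i + n <= sentence.size(); i++) {
                if (!sentence.subList(i, i + n).equals(history)) {
                    continue;
                }
                historyCount++;
                if (i + n < sentence.size()) {
                    // Record the word that follows this occurrence of the history.
                    continuations.merge(sentence.get(i + n), 1, Integer::sum);
                }
            }
        }
        if (historyCount == 0) {
            return 1.0; // nothing observed, so all mass is "missing"
        }
        double kept = 0.0;
        for (int c : continuations.values()) {
            kept += (c - discount) / historyCount;
        }
        return 1.0 - kept;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"));
        // "the" is followed once by "cat" and once by "dog": 1 - (0.5 + 0.5) / 2
        System.out.println(missingMass(Arrays.asList("the"), 0.5, corpus));
    }
}
```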
-
getNMinusOneTokenFirst
Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
Parameters:
ngram - an ngram
Returns:
- an ngram
-
getNMinusOneTokenLast
Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.
Parameters:
ngram - an ngram
Returns:
- an ngram
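The two getNMinusOneToken methods differ only in which end of the ngram is dropped. A minimal sketch on plain lists, with the hypothetical names `NMinusOneSketch`, `withoutLast`, and `withoutFirst` standing in for the `StringList`-based methods; returning an empty list for an empty input is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NMinusOneSketch {

    // Analogue of getNMinusOneTokenFirst: keep the first n-1 tokens.
    static List<String> withoutLast(List<String> ngram) {
        if (ngram.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ngram.subList(0, ngram.size() - 1));
    }

    // Analogue of getNMinusOneTokenLast: keep the last n-1 tokens.
    static List<String> withoutFirst(List<String> ngram) {
        if (ngram.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ngram.subList(1, ngram.size()));
    }

    public static void main(String[] args) {
        List<String> trigram = Arrays.asList("the", "cat", "sat");
        System.out.println(withoutLast(trigram));  // prints [the, cat]
        System.out.println(withoutFirst(trigram)); // prints [cat, sat]
    }
}
```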
-
getNGrams
Gets the ngrams of size n from an input sequence of tokens.
Parameters:
sequence - a sequence of tokens
size - the size of the resulting ngrams
Returns:
- all the possible ngrams of the given size derivable from the input sequence
-
getNGrams
Gets the ngrams of size n from an input sequence of tokens.
Parameters:
sequence - a sequence of tokens
size - the size of the resulting ngrams
Returns:
- all the possible ngrams of the given size derivable from the input sequence
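Extracting ngrams is typically done with a sliding window over the token sequence. The sketch below shows that technique on plain lists; `SlidingWindowSketch` and `ngrams` are hypothetical names, and the edge-case behaviour (an empty result when `size` exceeds the sequence length, an exception for non-positive sizes) is an assumption that may differ from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindowSketch {

    // Returns every contiguous window of `size` tokens, in order of appearance.
    static List<List<String>> ngrams(List<String> tokens, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            // Copy the window so the result does not depend on the input list.
            result.add(new ArrayList<>(tokens.subList(i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c", "d");
        System.out.println(ngrams(tokens, 2)); // prints [[a, b], [b, c], [c, d]]
    }
}
```

A sequence of m tokens yields m - size + 1 ngrams when size is at most m.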
-