Package opennlp.tools.ngram
Class NGramModel
java.lang.Object
opennlp.tools.ngram.NGramModel
- All Implemented Interfaces:
Iterable<StringList>
- Direct Known Subclasses:
NGramLanguageModel
The NGramModel can be used to create ngrams and character ngrams.
-
Constructor Summary
Constructor  Description
NGramModel()  Instantiates an empty NGramModel instance.
NGramModel(InputStream in)  Instantiates an NGramModel via an InputStream reference.
-
Method Summary
Modifier and Type  Method  Description
void  add(CharSequence chars, int minLength, int maxLength)  Adds character ngrams to the current instance.
void  add(StringList ngram)  Adds one ngram; if it already exists, its count increases by one.
void  add(StringList ngram, int minLength, int maxLength)  Adds ngrams up to the specified length to the current instance.
boolean  contains(StringList tokens)  Checks if the given tokens are contained in the current instance.
void  cutoff(int cutoffUnder, int cutoffOver)  Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
boolean  equals(Object obj)
int  getCount(StringList ngram)  Retrieves the count of the given ngram.
int  hashCode()
Iterator<StringList>  iterator()  Retrieves an Iterator over all StringList entries.
int  numberOfGrams()  Retrieves the total count of all ngrams.
void  remove(StringList tokens)  Removes the specified tokens from the ngram model; they are simply dropped.
void  serialize(OutputStream out)  Writes the ngram instance to the given OutputStream.
void  setCount(StringList ngram, int count)  Sets the count of an existing ngram.
int  size()  Retrieves the number of StringList entries in the current instance.
Dictionary  toDictionary()  Creates a dictionary which contains all StringLists which are in the current NGramModel.
Dictionary  toDictionary(boolean caseSensitive)  Creates a dictionary which contains all StringLists which are in the current NGramModel.
String  toString()
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
NGramModel
public NGramModel()
Instantiates an empty NGramModel instance.
-
NGramModel
public NGramModel(InputStream in) throws IOException
Instantiates an NGramModel via an InputStream reference.
Parameters:
in - the serialized model stream
Throws:
IOException - Thrown if errors occurred reading from in.
-
-
Method Details
-
getCount
public int getCount(StringList ngram)
Retrieves the count of the given ngram.
Parameters:
ngram - an ngram
Returns:
count of the ngram or 0 if it is not contained
-
setCount
public void setCount(StringList ngram, int count)
Sets the count of an existing ngram.
Parameters:
ngram -
count -
-
add
public void add(StringList ngram)
Adds one ngram; if it already exists, its count increases by one.
Parameters:
ngram -
-
add
public void add(StringList ngram, int minLength, int maxLength)
Adds ngrams up to the specified length to the current instance.
Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, .. from.
minLength - minimal length
maxLength - maximal length
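To illustrate what "ngrams up to the specified length" means, here is a minimal sketch in plain Java. It is not the OpenNLP implementation: the class TokenNGramSketch and the method tokenNGrams are hypothetical names, and the sketch assumes that every contiguous run of tokens with a length between minLength and maxLength (inclusive) counts as one ngram.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of opennlp.tools.ngram.
class TokenNGramSketch {

    // Collects every contiguous run of tokens whose length lies in [minLength, maxLength].
    static List<List<String>> tokenNGrams(List<String> tokens, int minLength, int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
        List<List<String>> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            // Slide a window of the current length over the token sequence.
            for (int start = 0; start + length <= tokens.size(); start++) {
                grams.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // With lengths 1..2 this yields three uni-grams followed by two bi-grams.
        for (List<String> gram : tokenNGrams(List.of("the", "quick", "fox"), 1, 2)) {
            System.out.println(gram);
        }
    }
}
```

Under this assumption, three tokens with minLength 1 and maxLength 2 produce five ngrams; in the real model each of them would be counted as described for add(StringList).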
-
add
public void add(CharSequence chars, int minLength, int maxLength)
Adds character ngrams to the current instance.
Parameters:
chars -
minLength -
maxLength -
-
remove
public void remove(StringList tokens)
Removes the specified tokens from the ngram model; they are simply dropped.
Parameters:
tokens -
-
contains
public boolean contains(StringList tokens)
Checks if the given tokens are contained in the current instance.
Parameters:
tokens -
Returns:
true if the ngram is contained
-
size
public int size()
Retrieves the number of StringList entries in the current instance.
Returns:
number of different grams
-
iterator
public Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.
Specified by:
iterator in interface Iterable<StringList>
Returns:
iterator over all grams
-
numberOfGrams
public int numberOfGrams()
Retrieves the total count of all ngrams.
Returns:
total count of all ngrams
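The difference between size() (number of different grams) and numberOfGrams() (total count of all ngrams) is easy to miss. The sketch below models the documented counting behavior with a plain map; NGramCountSketch is a hypothetical stand-in that uses List<String> in place of StringList, and it should not be read as the actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of opennlp.tools.ngram.
class NGramCountSketch {

    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an ngram that already exists has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(List.copyOf(ngram), 1, Integer::sum);
    }

    // Count of the given ngram, or 0 if it is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Number of different ngrams.
    int size() {
        return counts.size();
    }

    // Total count over all ngrams.
    int numberOfGrams() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        NGramCountSketch model = new NGramCountSketch();
        model.add(List.of("a", "b"));
        model.add(List.of("a", "b"));
        model.add(List.of("b", "c"));
        System.out.println(model.size());          // prints 2
        System.out.println(model.numberOfGrams()); // prints 3
    }
}
```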
-
cutoff
public void cutoff(int cutoffUnder, int cutoffOver)
Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
Parameters:
cutoffUnder -
cutoffOver -
-
toDictionary
public Dictionary toDictionary()
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Entries which differ only in case are merged into one.
Calling this method is the same as calling toDictionary(boolean) with true.
Returns:
a dictionary of the ngrams
-
toDictionary
public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
Returns:
a dictionary of the ngrams
-
serialize
public void serialize(OutputStream out) throws IOException
Writes the ngram instance to the given OutputStream.
Parameters:
out -
Throws:
IOException - if an I/O error occurs during writing
-
equals
-
toString
-
hashCode
public int hashCode()
-