Package opennlp.tools.ngram
Class NGramModel
java.lang.Object
opennlp.tools.ngram.NGramModel
- All Implemented Interfaces:
- Iterable<StringList>
- Direct Known Subclasses:
- NGramLanguageModel
The NGramModel can be used to create ngrams and character ngrams.
- See Also:
- 
Constructor Summary
Constructors
- NGramModel() - Instantiates an empty NGramModel instance.
- NGramModel(InputStream in) - Instantiates an NGramModel via an InputStream reference.
- 
Method Summary
- void add(CharSequence chars, int minLength, int maxLength) - Adds character ngrams to the current instance.
- void add(StringList ngram) - Adds one ngram; if it already exists, its count is increased by one.
- void add(StringList ngram, int minLength, int maxLength) - Adds ngrams up to the specified length to the current instance.
- boolean contains(StringList tokens) - Checks if the given tokens are contained in the current instance.
- void cutoff(int cutoffUnder, int cutoffOver) - Deletes all ngrams which appear fewer than cutoffUnder times or more than cutoffOver times.
- boolean equals(Object)
- int getCount(StringList ngram) - Retrieves the count of the given ngram.
- int hashCode()
- Iterator<StringList> iterator() - Retrieves an Iterator over all StringList entries.
- int numberOfGrams() - Retrieves the total count of all ngrams.
- void remove(StringList tokens) - Removes the specified tokens from the ngram model; they are simply dropped.
- void serialize(OutputStream out) - Writes the ngram instance to the given OutputStream.
- void setCount(StringList ngram, int count) - Sets the count of an existing ngram.
- int size() - Retrieves the number of StringList entries in the current instance.
- Dictionary toDictionary() - Creates a dictionary which contains all StringList entries in the current NGramModel.
- Dictionary toDictionary(boolean caseSensitive) - Creates a dictionary which contains all StringList entries in the current NGramModel.
- String toString()
Methods inherited from interface java.lang.Iterable
- forEach, spliterator
- 
Constructor Details
NGramModel
public NGramModel()
Instantiates an empty NGramModel instance.
- 
NGramModel
Instantiates an NGramModel via an InputStream reference.
- Parameters:
- in - the serialized model stream
- Throws:
- IOException - Thrown if errors occurred reading from in.
 
 
- 
- 
Method Details
getCount
Retrieves the count of the given ngram.
- Parameters:
- ngram - an ngram
- Returns:
- count of the ngram or 0 if it is not contained
 
- 
setCount
Sets the count of an existing ngram.
- Parameters:
- ngram-
- count-
 
- 
add
Adds one ngram; if it already exists, its count is increased by one.
- Parameters:
- ngram-
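
The counting behaviour described for getCount and add can be pictured as a map from token lists to counts. The sketch below is only an illustration under stated assumptions, not the OpenNLP implementation: the class name NGramCountSketch is hypothetical, and List<String> stands in for StringList.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the counting behaviour; not the OpenNLP class.
final class NGramCountSketch {

    // Maps each ngram (a list of tokens) to the number of times it was added.
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an existing entry has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(List.copyOf(ngram), 1, Integer::sum);
    }

    // Returns the stored count, or 0 if the ngram is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        NGramCountSketch model = new NGramCountSketch();
        model.add(List.of("new", "york"));
        model.add(List.of("new", "york"));
        System.out.println(model.getCount(List.of("new", "york"))); // prints 2
        System.out.println(model.getCount(List.of("old", "york"))); // prints 0
    }
}
```

Adding the same token list twice yields a count of 2, while a token list that was never added reports 0 rather than raising an error, matching the return description of getCount.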
 
- 
add
Adds ngrams up to the specified length to the current instance.
- Parameters:
- ngram - the tokens from which to build the uni-grams, bi-grams, tri-grams, and so on.
- minLength - minimal length
- maxLength - maximal length
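
To make the role of minLength and maxLength concrete, the following sketch enumerates every contiguous run of tokens whose length lies between the two bounds. It is an illustration under assumptions, not the library's code: the class TokenNGramSketch is hypothetical, List<String> stands in for StringList, both bounds are treated as inclusive, minLength is assumed to be at least 1, and the output order (shorter ngrams first) is a choice made here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative token ngram expansion; not the OpenNLP implementation.
final class TokenNGramSketch {

    // Returns every contiguous run of tokens with a length in [minLength, maxLength].
    static List<List<String>> ngrams(List<String> tokens, int minLength, int maxLength) {
        List<List<String>> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            // Slide a window of the current length over the tokens.
            for (int start = 0; start + length <= tokens.size(); start++) {
                result.add(List.copyOf(tokens.subList(start, start + length)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "fox");
        // prints [[the], [quick], [fox], [the, quick], [quick, fox]]
        System.out.println(ngrams(tokens, 1, 2));
    }
}
```

With three tokens and bounds 1 and 2, the sketch produces three uni-grams and two bi-grams; the tri-gram is excluded because it exceeds maxLength.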
 
- 
add
Adds character ngrams to the current instance.
- Parameters:
- chars-
- minLength-
- maxLength-
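
Character ngrams follow the same windowing idea, applied to the characters of a sequence rather than to tokens. The sketch below is a hedged illustration: the class CharNGramSketch is hypothetical, the bounds are assumed to be inclusive with minLength at least 1, and no case normalization is applied because the description above does not say whether the library performs any.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative character ngram expansion; not the OpenNLP implementation.
final class CharNGramSketch {

    // Returns every substring of chars with a length in [minLength, maxLength].
    static List<String> charNGrams(CharSequence chars, int minLength, int maxLength) {
        List<String> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            // Slide a window of the current length over the characters.
            for (int start = 0; start + length <= chars.length(); start++) {
                result.add(chars.subSequence(start, start + length).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ab, bc, abc]
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```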
 
- 
remove
Removes the specified tokens from the ngram model; they are simply dropped.
- Parameters:
- tokens-
 
- 
contains
Checks if the given tokens are contained in the current instance.
- Parameters:
- tokens-
- Returns:
- true if the ngram is contained
 
- 
size
public int size()
Retrieves the number of StringList entries in the current instance.
- Returns:
- number of different grams
 
- 
iterator
Retrieves an Iterator over all StringList entries.
- Specified by:
- iterator in interface Iterable<StringList>
- Returns:
- iterator over all grams
 
- 
numberOfGrams
public int numberOfGrams()
Retrieves the total count of all ngrams.
- Returns:
- total count of all ngrams
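
The difference between size() (number of different grams) and numberOfGrams() (total count of all ngrams) is easiest to see on a small count table. The sketch below is an assumption-laden illustration, not the library's code: the class GramTotalsSketch is hypothetical and a plain map from token lists to counts stands in for the model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrates distinct entries versus summed counts; not the OpenNLP class.
final class GramTotalsSketch {

    // Number of different grams: one per map entry.
    static int size(Map<List<String>, Integer> counts) {
        return counts.size();
    }

    // Total count of all ngrams: the sum of every entry's count.
    static int numberOfGrams(Map<List<String>, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> counts = new HashMap<>();
        counts.put(List.of("new"), 3);
        counts.put(List.of("new", "york"), 2);
        System.out.println(size(counts));          // prints 2
        System.out.println(numberOfGrams(counts)); // prints 5
    }
}
```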
 
- 
cutoff
public void cutoff(int cutoffUnder, int cutoffOver)
Deletes all ngrams which appear fewer than cutoffUnder times or more than cutoffOver times.
- Parameters:
- cutoffUnder-
- cutoffOver-
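
One way to read the two thresholds is as a band of counts to keep: entries below cutoffUnder or above cutoffOver are removed, and everything in between survives. The sketch below illustrates that reading only. The class CutoffSketch is hypothetical, a map from strings to counts stands in for the model, and treating both comparisons as strict (so that a count equal to either threshold is kept) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative count-based pruning; not the OpenNLP implementation.
final class CutoffSketch {

    // Removes entries whose count is below cutoffUnder or above cutoffOver.
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        // Removing from the values view also removes the matching map entries.
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("common", 5);
        counts.put("everywhere", 100);
        cutoff(counts, 2, 50);
        System.out.println(counts.keySet()); // prints [common]
    }
}
```

Passing 0 for cutoffUnder or Integer.MAX_VALUE for cutoffOver disables the corresponding side of the filter in this sketch.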
 
- 
toDictionary
Creates a dictionary which contains all StringList entries in the current NGramModel. Entries which differ only in case are merged into one. Calling this method is the same as calling toDictionary(boolean) with false.
- Returns:
- a dictionary of the ngrams
 
- 
toDictionary
Creates a dictionary which contains all StringList entries in the current NGramModel.
- Parameters:
- caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
- Returns:
- a dictionary of the ngrams
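
The effect of the caseSensitive flag can be sketched as a choice between keeping entries verbatim and merging entries that differ only in case. This is an illustration under assumptions, not how the library builds its dictionary: the class CaseMergeSketch is hypothetical, List<String> stands in for StringList, a Set stands in for the dictionary, and lower-casing is just one simple way to merge entries.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative case-sensitive versus case-insensitive entry sets.
final class CaseMergeSketch {

    // Collects ngrams into a set, optionally merging entries that differ only in case.
    static Set<List<String>> entries(List<List<String>> ngrams, boolean caseSensitive) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (List<String> ngram : ngrams) {
            List<String> key = caseSensitive
                    ? List.copyOf(ngram)
                    : ngram.stream()
                           .map(token -> token.toLowerCase(Locale.ROOT))
                           .collect(Collectors.toList());
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> ngrams = List.of(List.of("New", "York"), List.of("new", "york"));
        System.out.println(entries(ngrams, true).size());  // prints 2
        System.out.println(entries(ngrams, false).size()); // prints 1
    }
}
```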
 
- 
serialize
Writes the ngram instance to the given OutputStream.
- Parameters:
- out-
- Throws:
- IOException - if an I/O error occurs during writing
 
- 
equals
- 
toString
- 
hashCode
public int hashCode()
 
-