public class NGramModel extends Object implements Iterable<StringList>
The NGramModel can be used to create ngrams and character ngrams.
See Also: StringList
| Constructor and Description |
|---|
| NGramModel(): Initializes an empty instance. |
| NGramModel(InputStream in): Initializes the current instance from the serialized model stream. |
| Modifier and Type | Method and Description |
|---|---|
| void | add(CharSequence chars, int minLength, int maxLength): Adds character ngrams to the current instance. |
| void | add(StringList ngram): Adds one ngram; if it already exists, its count is increased by one. |
| void | add(StringList ngram, int minLength, int maxLength): Adds ngrams up to the specified length to the current instance. |
| boolean | contains(StringList tokens): Checks if the given tokens are contained in the current instance. |
| void | cutoff(int cutoffUnder, int cutoffOver): Deletes all ngrams which appear fewer than cutoffUnder times or more than cutoffOver times. |
| boolean | equals(Object obj) |
| int | getCount(StringList ngram): Retrieves the count of the given ngram. |
| int | hashCode() |
| Iterator<StringList> | iterator(): Retrieves an Iterator over all StringList entries. |
| int | numberOfGrams(): Retrieves the total count of all ngrams. |
| void | remove(StringList tokens): Removes the specified tokens from the ngram model; they are simply dropped. |
| void | serialize(OutputStream out): Writes the ngram instance to the given OutputStream. |
| void | setCount(StringList ngram, int count): Sets the count of an existing ngram. |
| int | size(): Retrieves the number of StringList entries in the current instance. |
| Dictionary | toDictionary(): Creates a dictionary which contains all StringLists which are in the current NGramModel. |
| Dictionary | toDictionary(boolean caseSensitive): Creates a dictionary which contains all StringLists which are in the current NGramModel. |
| String | toString() |
Methods inherited from interface Iterable: forEach, spliterator
public NGramModel()
Initializes an empty instance.
public NGramModel(InputStream in) throws IOException
Initializes the current instance.
Parameters:
in - the serialized model stream
Throws:
IOException
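public int getCount(StringList ngram)
Retrieves the count of the given ngram.
Parameters:
ngram - an ngram
public void setCount(StringList ngram, int count)
Sets the count of an existing ngram.
Parameters:
ngram - the existing ngram
count - the count to set
public void add(StringList ngram)
Adds one ngram; if it already exists, its count is increased by one.
Parameters:
ngram - the ngram to add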
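public void add(StringList ngram, int minLength, int maxLength)
Adds ngrams up to the specified length to the current instance.
Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from
minLength - minimal length
maxLength - maximal length
public void add(CharSequence chars, int minLength, int maxLength)
Adds character ngrams to the current instance.
Parameters:
chars - the characters to build the ngrams from
minLength - minimal length
maxLength - maximal length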
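The two length-bounded add overloads can be pictured as a sliding window over tokens or characters. The following is a minimal plain-Java sketch of that idea, not the library's implementation: the class NGramWindowSketch and its helpers are hypothetical, it assumes that every contiguous run whose length lies between minLength and maxLength (inclusive) is produced, and it assumes minLength is at least 1. Any normalization the real method may apply to the input is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; it does not use or reproduce NGramModel.
class NGramWindowSketch {

    // Assumption: all contiguous token runs of each length from
    // minLength to maxLength (inclusive), shortest lengths first.
    static List<List<String>> tokenNGrams(List<String> tokens, int minLength, int maxLength) {
        List<List<String>> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                result.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return result;
    }

    // Same windowing applied to characters instead of tokens.
    static List<String> charNGrams(CharSequence chars, int minLength, int maxLength) {
        List<String> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                result.add(chars.subSequence(start, start + length).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[the], [quick], [fox], [the, quick], [quick, fox]]
        System.out.println(tokenNGrams(Arrays.asList("the", "quick", "fox"), 1, 2));
        // prints [ab, bc, abc]
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```

In this sketch a window longer than the input simply yields nothing, so passing a maxLength larger than the number of tokens is harmless.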
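public void remove(StringList tokens)
Removes the specified tokens from the ngram model; they are simply dropped.
Parameters:
tokens - the tokens to remove
public boolean contains(StringList tokens)
Checks if the given tokens are contained in the current instance.
Parameters:
tokens - the tokens to check
public int size()
Retrieves the number of StringList entries in the current instance.
public Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.
Specified by:
iterator in interface Iterable<StringList>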
public int numberOfGrams()
Retrieves the total count of all ngrams.
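The difference between size() (number of distinct entries) and numberOfGrams() (sum of all counts) is easy to mix up. The sketch below illustrates it with a plain map rather than the real class: NGramCountSketch is a hypothetical stand-in, token lists replace StringList, and returning 0 for an unknown ngram is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the NGramModel implementation.
class NGramCountSketch {
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adding an ngram that is already present increases its count by one.
    void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Assumption: an ngram that was never added has a count of 0.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Number of distinct ngram entries.
    int size() {
        return counts.size();
    }

    // Total count over all entries.
    int numberOfGrams() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        NGramCountSketch model = new NGramCountSketch();
        model.add(Arrays.asList("new", "york"));
        model.add(Arrays.asList("new", "york"));
        model.add(Arrays.asList("san", "jose"));
        System.out.println(model.size());          // prints 2
        System.out.println(model.numberOfGrams()); // prints 3
    }
}
```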
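public void cutoff(int cutoffUnder, int cutoffOver)
Deletes all ngrams which appear fewer than cutoffUnder times or more than cutoffOver times.
Parameters:
cutoffUnder - ngrams with a count below this value are deleted
cutoffOver - ngrams with a count above this value are deleted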
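As a rough illustration of the cutoff rule, the sketch below filters a plain count map. CutoffSketch is a hypothetical helper and not part of the library. It assumes both bounds are inclusive on the keep side, so an entry whose count equals cutoffUnder or cutoffOver survives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the cutoff rule on a plain map.
class CutoffSketch {

    // Removes every entry whose count is below cutoffUnder or above cutoffOver.
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("typical", 3);
        counts.put("frequent", 10);

        cutoff(counts, 2, 5);
        System.out.println(counts); // prints {typical=3}
    }
}
```

Under this reading, passing 0 as cutoffUnder and Integer.MAX_VALUE as cutoffOver leaves the counts untouched.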
public Dictionary toDictionary()
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Entries which differ only in case are merged into one.
Calling this method is the same as calling toDictionary(boolean) with false.
public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
public void serialize(OutputStream out) throws IOException
Writes the ngram instance to the given OutputStream.
Parameters:
out - the OutputStream to write to
Throws:
IOException - if an I/O error occurs during writing
Copyright © 2021 The Apache Software Foundation. All rights reserved.