Class NGramModel

java.lang.Object
opennlp.tools.ngram.NGramModel
All Implemented Interfaces:
Iterable<StringList>
Direct Known Subclasses:
NGramLanguageModel

public class NGramModel extends Object implements Iterable<StringList>
The NGramModel can be used to create ngrams and character ngrams.
  • Constructor Details

    • NGramModel

      public NGramModel()
      Initializes an empty instance.
    • NGramModel

      public NGramModel(InputStream in) throws IOException
      Initializes the current instance from a serialized model.
      Parameters:
      in - the serialized model stream
      Throws:
      IOException - if an I/O error occurs while reading the stream
  • Method Details

    • getCount

      public int getCount(StringList ngram)
      Retrieves the count of the given ngram.
      Parameters:
      ngram - an ngram
      Returns:
      count of the ngram or 0 if it is not contained
    • setCount

      public void setCount(StringList ngram, int count)
      Sets the count of an existing ngram.
      Parameters:
      ngram - the ngram whose count is set; it must already be contained in the model
      count - the new count of the ngram
    • add

      public void add(StringList ngram)
      Adds one ngram. If it already exists, its count is increased by one.
      Parameters:
      ngram - the ngram to add
    • add

      public void add(StringList ngram, int minLength, int maxLength)
      Adds ngrams of every length from minLength up to maxLength to the current instance.
      Parameters:
      ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from
      minLength - the minimal ngram length
      maxLength - the maximal ngram length
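      The enumeration described above can be illustrated with a minimal, JDK-only sketch. It is not the library's implementation: the class TokenNGramSketch, the method tokenNGrams, and the argument check are assumptions made for illustration, and plain lists stand in for StringList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not part of OpenNLP.
class TokenNGramSketch {

    // Collects every contiguous run of tokens whose length lies in
    // [minLength, maxLength], shorter lengths first.
    static List<List<String>> tokenNGrams(List<String> tokens, int minLength, int maxLength) {
        // Assumed validation: lengths must be positive and ordered.
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("require 1 <= minLength <= maxLength");
        }
        List<List<String>> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                grams.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [[the], [quick], [fox], [the, quick], [quick, fox]]
        System.out.println(tokenNGrams(Arrays.asList("the", "quick", "fox"), 1, 2));
    }
}
```

      With three tokens and lengths 1 to 2, the sketch yields three uni-grams and two bi-grams; in the model each of them would be added with the single-ngram add method.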
    • add

      public void add(CharSequence chars, int minLength, int maxLength)
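      Adds character ngrams to the current instance.
      Parameters:
      chars - the character sequence to build the character ngrams from
      minLength - the minimal ngram length
      maxLength - the maximal ngram length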
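      As a rough illustration of what a character ngram is, the following JDK-only sketch lists all substrings of the requested lengths. The names CharNGramSketch and charNGrams are hypothetical, and the sketch does not model any normalization the library may apply to the characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of OpenNLP.
class CharNGramSketch {

    // Collects every substring of chars whose length lies in
    // [minLength, maxLength], shorter lengths first.
    static List<String> charNGrams(CharSequence chars, int minLength, int maxLength) {
        List<String> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                grams.add(chars.subSequence(start, start + length).toString());
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [ab, bc, abc]
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```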
    • remove

      public void remove(StringList tokens)
      Removes the specified tokens from the ngram model; they are simply dropped.
      Parameters:
      tokens - the tokens to remove
    • contains

      public boolean contains(StringList tokens)
      Checks if the given tokens are contained in the current instance.
      Parameters:
      tokens - the tokens to look up
      Returns:
      true if the ngram is contained
    • size

      public int size()
      Retrieves the number of StringList entries in the current instance.
      Returns:
      number of different grams
    • iterator

      public Iterator<StringList> iterator()
      Retrieves an Iterator over all StringList entries.
      Specified by:
      iterator in interface Iterable<StringList>
      Returns:
      iterator over all grams
    • numberOfGrams

      public int numberOfGrams()
      Retrieves the total count of all ngrams.
      Returns:
      total count of all ngrams
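      The difference between size() and numberOfGrams() can be sketched with a plain count map. This is an illustration only, assuming that the "total count" is the sum of the per-ngram counts; the class CountSketch is hypothetical and lists of strings stand in for StringList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of OpenNLP.
class CountSketch {

    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an existing entry has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Count of the ngram, or 0 if it is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Number of different ngrams.
    int size() {
        return counts.size();
    }

    // Assumed meaning of the total count: the sum of all per-ngram counts.
    int numberOfGrams() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        CountSketch model = new CountSketch();
        model.add(Arrays.asList("a", "b"));
        model.add(Arrays.asList("a", "b"));
        model.add(Arrays.asList("b", "c"));
        System.out.println(model.size());          // prints 2
        System.out.println(model.numberOfGrams()); // prints 3
    }
}
```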
    • cutoff

      public void cutoff(int cutoffUnder, int cutoffOver)
      Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
      Parameters:
      cutoffUnder - the minimum count an ngram must have to be kept
      cutoffOver - the maximum count an ngram may have to be kept
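      A minimal sketch of the cutoff rule on a plain count map follows. It assumes strict comparisons, as the wording "less than" and "more often than" suggests; the class CutoffSketch and the string keys are hypothetical and are not the library's data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of OpenNLP.
class CutoffSketch {

    // Removes every entry whose count is below cutoffUnder or above cutoffOver.
    // Entries with a count equal to either bound are kept (assumed).
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("common", 5);
        counts.put("frequent", 50);
        cutoff(counts, 2, 10);
        // prints {common=5}
        System.out.println(counts);
    }
}
```

      Passing 0 for cutoffUnder and Integer.MAX_VALUE for cutoffOver leaves the map in this sketch unchanged, since no count can fall outside those bounds.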
    • toDictionary

      public Dictionary toDictionary()
      Creates a dictionary which contains all StringLists which are in the current NGramModel.

      Entries which differ only in case are merged into one.

      Calling this method is the same as calling toDictionary(boolean) with false.

      Returns:
      a dictionary of the ngrams
    • toDictionary

      public Dictionary toDictionary(boolean caseSensitive)
      Creates a dictionary which contains all StringLists which are in the current NGramModel.
      Parameters:
      caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
      Returns:
      a dictionary of the ngrams
    • serialize

      public void serialize(OutputStream out) throws IOException
      Writes the ngram instance to the given OutputStream.
      Parameters:
      out - the OutputStream to write to
      Throws:
      IOException - if an I/O error occurs during writing
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object