Class NGramModel

    • Constructor Detail

      • NGramModel

        public NGramModel()
        Initializes an empty instance.
      • NGramModel

        public NGramModel​(InputStream in)
                   throws IOException
        Initializes the current instance.
        Parameters:
        in - the serialized model stream
        Throws:
        IOException
    • Method Detail

      • getCount

        public int getCount​(StringList ngram)
        Retrieves the count of the given ngram.
        Parameters:
        ngram - an ngram
        Returns:
        count of the ngram or 0 if it is not contained
      • setCount

        public void setCount​(StringList ngram,
                             int count)
        Sets the count of an existing ngram.
        Parameters:
        ngram - the ngram whose count is set
        count - the new count
      • add

        public void add​(StringList ngram)
        Adds one ngram. If it already exists, its count is increased by one.
        Parameters:
        ngram - the ngram to add
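
The counting behaviour described for add and getCount can be sketched without the library. The class below is a hypothetical stand-in, not the NGramModel implementation: the name NGramCountSketch is invented for illustration, and List<String> plays the role of StringList.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the documented counting behaviour.
// This is NOT the NGramModel class; List<String> replaces StringList.
final class NGramCountSketch {
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an existing entry has its count increased by one.
    void add(List<String> ngram) {
        // Defensive copy so later changes to the caller's list do not alter the key.
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Returns the stored count, or 0 if the ngram is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }
}
```

Adding the same token sequence twice and then querying it yields 2, while an ngram that was never added yields 0.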
      • add

        public void add​(StringList ngram,
                        int minLength,
                        int maxLength)
        Adds all ngrams with lengths from minLength to maxLength to the current instance.
        Parameters:
        ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from
        minLength - minimal length
        maxLength - maximal length
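
To make the expansion concrete, here is a minimal sketch of how token ngrams of several lengths can be enumerated. It assumes both bounds are inclusive and that lengths start at 1, neither of which is stated above; TokenNGramSketch and expand are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class TokenNGramSketch {
    // Returns every contiguous run of tokens whose length lies between
    // minLength and maxLength (assumed inclusive), shorter lengths first.
    static List<List<String>> expand(List<String> tokens, int minLength, int maxLength) {
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
        List<List<String>> grams = new ArrayList<>();
        for (int len = minLength; len <= maxLength; len++) {
            // Slide a window of the current length across the token list.
            for (int start = 0; start + len <= tokens.size(); start++) {
                grams.add(new ArrayList<>(tokens.subList(start, start + len)));
            }
        }
        return grams;
    }
}
```

For the tokens a, b, c with minLength 1 and maxLength 2, this yields the three uni-grams followed by the bi-grams (a, b) and (b, c).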
      • add

        public void add​(CharSequence chars,
                        int minLength,
                        int maxLength)
        Adds character NGrams to the current instance.
        Parameters:
        chars - the character sequence to build the character ngrams from
        minLength - minimal length
        maxLength - maximal length
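
Character ngrams follow the same sliding-window idea, applied to characters instead of tokens. The sketch below is an illustration under the assumption that both bounds are inclusive; it performs no case normalization, and whether the library does so is not stated here. CharNGramSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class CharNGramSketch {
    // Returns every substring whose length lies between minLength and
    // maxLength (assumed inclusive), shorter lengths first.
    static List<String> expand(CharSequence chars, int minLength, int maxLength) {
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
        List<String> grams = new ArrayList<>();
        for (int len = minLength; len <= maxLength; len++) {
            // Slide a window of the current length across the characters.
            for (int start = 0; start + len <= chars.length(); start++) {
                grams.add(chars.subSequence(start, start + len).toString());
            }
        }
        return grams;
    }
}
```

With the input abc, minLength 2, and maxLength 3, the result is ab, bc, abc.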
      • remove

        public void remove​(StringList tokens)
        Removes the specified tokens from the NGram model; they are simply dropped.
        Parameters:
        tokens - the ngram to remove
      • contains

        public boolean contains​(StringList tokens)
        Checks if the given tokens are contained in the current instance.
        Parameters:
        tokens - the ngram to look up
        Returns:
        true if the ngram is contained
      • size

        public int size()
        Retrieves the number of StringList entries in the current instance.
        Returns:
        number of different grams
      • numberOfGrams

        public int numberOfGrams()
        Retrieves the total count of all Ngrams.
        Returns:
        total count of all ngrams
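
The difference between size() and numberOfGrams() is the difference between counting distinct entries and summing their counts. A minimal sketch over a plain map, with hypothetical names and String keys standing in for StringList:

```java
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class GramTotalsSketch {
    // Number of distinct entries, the analogue of size().
    static int distinct(Map<String, Integer> counts) {
        return counts.size();
    }

    // Sum of all counts, the analogue of numberOfGrams().
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }
}
```

A map holding one ngram with count 3 and another with count 1 has 2 distinct entries and a total of 4.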
      • cutoff

        public void cutoff​(int cutoffUnder,
                           int cutoffOver)
        Deletes all ngrams which appear fewer than cutoffUnder times or more than cutoffOver times.
        Parameters:
        cutoffUnder - ngrams with a count below this value are deleted
        cutoffOver - ngrams with a count above this value are deleted
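
The cutoff rule can be illustrated on a plain count map. This sketch assumes that counts equal to either bound are kept, which follows from the wording "less than" and "more often than"; CutoffSketch is a hypothetical name, and String keys stand in for StringList.

```java
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class CutoffSketch {
    // Removes, in place, every entry whose count is below cutoffUnder
    // or above cutoffOver. Counts equal to a bound are kept.
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(c -> c < cutoffUnder || c > cutoffOver);
    }
}
```

Applied with cutoffUnder 2 and cutoffOver 5 to counts of 1, 2, 5, and 9, only the entries with counts 2 and 5 remain.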
      • toDictionary

        public Dictionary toDictionary()
        Creates a dictionary which contains all StringLists which are in the current NGramModel. Entries which differ only in case are merged into one. Calling this method is the same as calling toDictionary(boolean) with false.
        Returns:
        a dictionary of the ngrams
      • toDictionary

        public Dictionary toDictionary​(boolean caseSensitive)
        Creates a dictionary which contains all StringLists which are in the current NGramModel.
        Parameters:
        caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
        Returns:
        a dictionary of the ngrams
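
The effect of the caseSensitive flag can be sketched as follows. This is an illustration only: it merges entries by lower-casing them, which is an assumption about one possible way to merge and not necessarily how Dictionary compares or stores entries. CaseMergeSketch is a hypothetical name, and String entries stand in for StringList.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
final class CaseMergeSketch {
    // Collects the distinct entries. When caseSensitive is false, entries
    // that differ only in case collapse into one lower-cased entry
    // (an assumption made for this sketch).
    static Set<String> entries(Collection<String> grams, boolean caseSensitive) {
        Set<String> result = new LinkedHashSet<>();
        for (String gram : grams) {
            result.add(caseSensitive ? gram : gram.toLowerCase(Locale.ROOT));
        }
        return result;
    }
}
```

Given the entries "New York", "new york", and "Boston", the case-sensitive result has three entries and the case-insensitive result has two.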
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object