Class WhitespaceTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class WhitespaceTokenizer
    extends Object
    This tokenizer splits the input text on whitespace. To obtain an instance of this tokenizer, use the static final INSTANCE field.
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(String d)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Specified by:
        tokenizePos in interface Tokenizer
        Parameters:
        d - The string to be tokenized.
        Returns:
        The Span[] with the spans (offsets into d) for each token as the individual array elements.
      • tokenize

        public String[] tokenize(String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
      • setKeepNewLines

        public void setKeepNewLines(boolean keepNewLines)
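
The relationship between tokenizePos and tokenize described above can be illustrated with a small standalone sketch. This is not the library's implementation: the class WhitespaceSketch and the nested Offsets type are hypothetical stand-ins (Offsets plays the role of Span), and the sketch assumes that offsets are start-inclusive and end-exclusive and that Character.isWhitespace is an acceptable definition of whitespace. It only shows the general idea of finding token boundaries first and then cutting the token strings out of the input.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSketch {

    /** Hypothetical stand-in for Span: start is inclusive, end is exclusive (assumed). */
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Finds the [start, end) offsets of each maximal run of non-whitespace characters. */
    static List<Offsets> tokenizePos(String d) {
        List<Offsets> spans = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            if (Character.isWhitespace(d.charAt(i))) {
                if (tokenStart >= 0) {
                    // Whitespace closes the token that was open.
                    spans.add(new Offsets(tokenStart, i));
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                // First non-whitespace character opens a new token.
                tokenStart = i;
            }
        }
        if (tokenStart >= 0) {
            // The input ended while a token was still open.
            spans.add(new Offsets(tokenStart, d.length()));
        }
        return spans;
    }

    /** Derives the token strings from the offsets, so both views always agree. */
    static String[] tokenize(String s) {
        List<Offsets> spans = tokenizePos(s);
        String[] tokens = new String[spans.size()];
        for (int i = 0; i < tokens.length; i++) {
            Offsets o = spans.get(i);
            tokens[i] = s.substring(o.start, o.end);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "  Hello,  world \n";
        for (Offsets o : tokenizePos(text)) {
            System.out.println("[" + o.start + ".." + o.end + ") " + text.substring(o.start, o.end));
        }
    }
}
```

Deriving the strings from the offsets keeps the two results consistent: every token returned by tokenize is exactly the substring of the input covered by the corresponding span. With the real class, the equivalent calls would go through the INSTANCE field mentioned in the class description.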