Class WhitespaceTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class WhitespaceTokenizer
    extends Object
    A basic Tokenizer implementation that splits text into tokens at whitespace.

    To obtain an instance of this tokenizer, use the static final INSTANCE field.

    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(String d)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        d - The string to be tokenized.
        Returns:
        The spans (offsets into d) for each token, as the individual array elements.
      • tokenize

        public String[] tokenize(String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
      • setKeepNewLines

        public void setKeepNewLines(boolean keepNewLines)
        Sets whether new lines are kept.
        Parameters:
        keepNewLines - true to keep new lines, false otherwise.
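
The relationship between tokenizePos (offsets) and tokenize (strings) described above can be illustrated with a small JDK-only sketch. It is not the library's implementation: the class name WhitespaceSpanSketch is hypothetical, spans are modelled as plain {start, end} int pairs rather than Span objects, whitespace is assumed to mean Character.isWhitespace, and the keepNewLines setting is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in class, not part of the documented API.
public class WhitespaceSpanSketch {

    // Returns {start, end} offset pairs (end exclusive) for each run of
    // non-whitespace characters in d. Assumption: "whitespace" is whatever
    // Character.isWhitespace reports.
    public static int[][] tokenizePos(String d) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            if (Character.isWhitespace(d.charAt(i))) {
                if (start >= 0) {
                    spans.add(new int[] {start, i}); // close the open token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, d.length()}); // token runs to end of input
        }
        return spans.toArray(new int[0][]);
    }

    // Derives the token strings by cutting the input at the computed offsets.
    public static String[] tokenize(String s) {
        int[][] spans = tokenizePos(s);
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick  brown\tfox";
        for (int[] span : tokenizePos(text)) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

In this sketch each token string is simply the substring of the input between a span's start and end offsets, which is why the two methods return arrays of the same length.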