Class SimpleTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class SimpleTokenizer
    extends Object
    A basic Tokenizer implementation which performs tokenization using character classes.

    To obtain an instance of this tokenizer use the static final INSTANCE field.

    • Constructor Detail

      • SimpleTokenizer

        @Deprecated
        public SimpleTokenizer()
        Deprecated.
        Use the INSTANCE field to obtain an instance instead. This constructor will be made private in the future.
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(String s)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        s - The string to be tokenized.
        Returns:
        The spans (offsets into s) for each token as the individual array elements.
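
The following is a minimal, self-contained sketch of how token boundaries could be found using character classes. It is an illustration only, not the library's source: the class name SpanSketch, the CharClass enum, the findSpans method, and the use of {start, end} int pairs with an exclusive end offset are all assumptions made for this example. It is also simplified in that any run of adjacent non-letter, non-digit, non-whitespace characters is treated as a single token.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not the library's source code.
class SpanSketch {

    // Hypothetical character classes assumed for this sketch.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    // Maps a character to one of the assumed character classes.
    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Returns {start, end} offset pairs into s; end is exclusive (an assumption here).
    // A boundary is placed wherever the character class changes; whitespace is skipped.
    static List<int[]> findSpans(String s) {
        List<int[]> spans = new ArrayList<>();
        int start = -1;                         // start of the current run
        CharClass state = CharClass.WHITESPACE; // class of the current run
        for (int i = 0; i < s.length(); i++) {
            CharClass cc = classify(s.charAt(i));
            if (cc != state) {
                if (state != CharClass.WHITESPACE) {
                    spans.add(new int[] {start, i});
                }
                start = i;
                state = cc;
            }
        }
        // Close the last run if the string does not end in whitespace.
        if (state != CharClass.WHITESPACE) {
            spans.add(new int[] {start, s.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "Pay 42 dollars!";
        for (int[] span : findSpans(s)) {
            System.out.println(span[0] + ".." + span[1] + " " + s.substring(span[0], span[1]));
        }
    }
}
```

For the input "Pay 42 dollars!" this sketch yields the offset pairs 0..3, 4..6, 7..14 and 14..15, so the trailing "!" becomes its own token because its character class differs from the letters before it.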
      • tokenize

        public String[] tokenize(String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
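
One way to think about the returned array is as the substrings of s covered by the token offsets. The sketch below illustrates that relationship under stated assumptions: the class SpansToStringsSketch and its toStrings method are hypothetical names, spans are represented as plain {start, end} int pairs with an exclusive end, and the offsets in main are written out by hand for the example rather than produced by the tokenizer.

```java
// Illustrative sketch only; this is not the library's source code.
class SpansToStringsSketch {

    // Each span is a {start, end} offset pair into s; end is exclusive (assumed here).
    static String[] toStrings(int[][] spans, String s) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Pay 42 dollars!";
        // Hand-written offsets for this example.
        int[][] spans = { {0, 3}, {4, 6}, {7, 14}, {14, 15} };
        System.out.println(String.join("|", toStrings(spans, s)));
    }
}
```

Working with offsets first and deriving strings second keeps each token's position in the original text available, which is useful when results have to be mapped back onto the input.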
      • setKeepNewLines

        public void setKeepNewLines(boolean keepNewLines)
        Sets whether new lines are kept during tokenization.
        Parameters:
        keepNewLines - true to keep new lines, false otherwise.