Class SimpleTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class SimpleTokenizer
    extends Object
    Performs tokenization using character classes.
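    As a rough illustration of what tokenizing by character class can look like, the sketch below groups consecutive characters of the same class into one token and skips whitespace. It is a self-contained approximation and not the library source: the class name CharClassSketch, the CharClass enum, and the methods classify and spans are hypothetical, and the real implementation may classify or group characters differently.

```java
import java.util.ArrayList;
import java.util.List;

class CharClassSketch {

    // Hypothetical character classes, used only for this illustration.
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Returns {start, end} offset pairs into s (end exclusive), one per token.
    static List<int[]> spans(String s) {
        List<int[]> result = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < s.length(); i++) {
            CharClass next = classify(s.charAt(i));
            if (next != current) {
                // A change of class closes the open token, if there is one.
                if (current != CharClass.WHITESPACE) {
                    result.add(new int[] {start, i});
                }
                start = (next == CharClass.WHITESPACE) ? -1 : i;
                current = next;
            }
        }
        // Close a token that runs to the end of the string.
        if (current != CharClass.WHITESPACE) {
            result.add(new int[] {start, s.length()});
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Room 42, please.";
        for (int[] span : spans(s)) {
            System.out.println(span[0] + ".." + span[1] + " " + s.substring(span[0], span[1]));
        }
    }
}
```

    In this sketch, whitespace only separates tokens and never produces one, so a string made up entirely of whitespace yields an empty list.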
    • Constructor Detail

      • SimpleTokenizer

        @Deprecated
        public SimpleTokenizer()
        Deprecated.
        Use the INSTANCE field to obtain an instance instead; the constructor will be made private in the future.
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(String s)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        s - The string to be tokenized.
        Returns:
        The Span[] with the spans (offsets into s) for each token as the individual array elements.
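        To make "offsets into s" concrete, the sketch below shows how a start/end offset pair maps back to token text. It assumes the start offset is inclusive and the end offset is exclusive. The Offsets class is a hypothetical stand-in for a span, and the offsets are written by hand for this example rather than produced by a tokenizer.

```java
import java.util.Arrays;

class SpanOffsetsSketch {

    // Hypothetical stand-in for a token span: start inclusive, end exclusive.
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Recovers the text that each offset pair covers in s.
    static String[] coveredText(String s, Offsets[] spans) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i].start, spans[i].end);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Hello, world";
        // Hand-written offsets for illustration only.
        Offsets[] spans = {
            new Offsets(0, 5), new Offsets(5, 6), new Offsets(7, 12)
        };
        System.out.println(Arrays.toString(coveredText(s, spans)));
    }
}
```

        Working with offsets rather than token strings keeps the link to the original text, which is useful when tokens need to be highlighted or aligned with other annotations on the same string.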
      • tokenize

        public String[] tokenize(String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
      • setKeepNewLines

        public void setKeepNewLines(boolean keepNewLines)