Class WhitespaceTokenizer

java.lang.Object
opennlp.tools.tokenize.WhitespaceTokenizer
All Implemented Interfaces:
Tokenizer

public class WhitespaceTokenizer extends Object
A basic Tokenizer implementation that splits its input on whitespace.

To obtain an instance of this tokenizer, use the static final INSTANCE field.

  • Field Details

  • Method Details

    • tokenizePos

      public Span[] tokenizePos(String d)
      Description copied from interface: Tokenizer
      Finds the boundaries of atomic parts in a string.
      Specified by:
      tokenizePos in interface Tokenizer
      Parameters:
      d - The string to be tokenized.
      Returns:
      The spans (offsets into d) for each token as the individual array elements.
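      The idea of returning offsets rather than substrings can be sketched in plain Java. This is a minimal illustration, not the OpenNLP implementation: the class WhitespaceSpanSketch and its spans method are hypothetical, each span is modelled as a {start, end} pair with an exclusive end (an assumption about the offset convention), and Character.isWhitespace stands in for whatever whitespace test the library applies.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical illustration only; this is not the OpenNLP source.
      final class WhitespaceSpanSketch {

          // Returns one {start, end} pair per whitespace-delimited token.
          // Assumption: end is exclusive, so d.substring(start, end) is the token.
          static int[][] spans(String d) {
              List<int[]> out = new ArrayList<>();
              int tokStart = -1; // -1 means "not currently inside a token"
              for (int i = 0; i < d.length(); i++) {
                  if (Character.isWhitespace(d.charAt(i))) {
                      if (tokStart >= 0) {
                          // Whitespace closes the token that was in progress.
                          out.add(new int[] {tokStart, i});
                          tokStart = -1;
                      }
                  } else if (tokStart < 0) {
                      // First non-whitespace character opens a new token.
                      tokStart = i;
                  }
              }
              if (tokStart >= 0) {
                  // The input ended while a token was still open.
                  out.add(new int[] {tokStart, d.length()});
              }
              return out.toArray(new int[0][]);
          }
      }
      ```

      Keeping offsets instead of copies lets a caller map every token back to its exact position in the original string, for example to highlight it or to align it with annotations.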
    • tokenize

      public String[] tokenize(String s)
      Description copied from interface: Tokenizer
      Splits a string into its atomic parts.
      Specified by:
      tokenize in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The String[] with the individual tokens as the array elements.
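      For comparison, whitespace splitting into a String[] can be approximated with the standard library alone. This is a rough sketch under stated assumptions: WhitespaceSplitSketch is a hypothetical class, and the regex class \s covers only ASCII whitespace by default, so the library's own notion of whitespace may differ.

      ```java
      // Hypothetical illustration only; this is not the OpenNLP source.
      final class WhitespaceSplitSketch {

          // Splits s on runs of whitespace and returns the non-empty pieces.
          static String[] tokens(String s) {
              // Trim first so split() does not produce a leading empty string.
              String trimmed = s.trim();
              if (trimmed.isEmpty()) {
                  return new String[0];
              }
              return trimmed.split("\\s+");
          }
      }
      ```

      Unlike the span-based form, this variant discards the position of each token in the input.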
    • setKeepNewLines

      public void setKeepNewLines(boolean keepNewLines)
      Sets whether new lines are kept during tokenization.
      Parameters:
      keepNewLines - True if new lines are kept, false otherwise.
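      One plausible reading of this flag is sketched below. It assumes that a kept new line surfaces as its own single-character token while all other whitespace is still discarded. That behavior is an assumption made for illustration, not something this page states, and KeepNewLinesSketch is a hypothetical class.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical illustration only; the exact library behavior is assumed.
      final class KeepNewLinesSketch {

          // Splits s on whitespace. When keepNewLines is true, each '\n' or '\r'
          // is emitted as a token of its own (assumed behavior).
          static List<String> tokens(String s, boolean keepNewLines) {
              List<String> out = new ArrayList<>();
              StringBuilder current = new StringBuilder();
              for (int i = 0; i < s.length(); i++) {
                  char c = s.charAt(i);
                  if (Character.isWhitespace(c)) {
                      if (current.length() > 0) {
                          // Whitespace closes the token that was in progress.
                          out.add(current.toString());
                          current.setLength(0);
                      }
                      if (keepNewLines && (c == '\n' || c == '\r')) {
                          out.add(String.valueOf(c));
                      }
                  } else {
                      current.append(c);
                  }
              }
              if (current.length() > 0) {
                  out.add(current.toString());
              }
              return out;
          }
      }
      ```

      Keeping line breaks in the token stream can help downstream steps that need to recover line or paragraph boundaries from the tokens.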