Class SimpleTokenizer

java.lang.Object
opennlp.tools.tokenize.SimpleTokenizer
All Implemented Interfaces:
Tokenizer

public class SimpleTokenizer extends Object
A basic Tokenizer implementation which performs tokenization using character classes.

To obtain an instance of this tokenizer, use the static final INSTANCE field.
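The following is a minimal, standard-library-only sketch of character-class tokenization. It is not OpenNLP's source. It assumes four hypothetical classes (whitespace, letter, digit, other), starts a new token whenever the class changes, drops whitespace, and emits every "other" character (such as punctuation) as a single-character token. OpenNLP's actual rules may differ in detail. The class name `CharClassTokenizerSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of character-class tokenization; not OpenNLP's implementation.
public class CharClassTokenizerSketch {

    // Assumed character classes.
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Splits s wherever the character class changes; whitespace is dropped
    // and each OTHER character becomes its own token (an assumption).
    public static String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                            // start of the open token, -1 if none
        CharClass current = CharClass.WHITESPACE;  // class of the previous character
        for (int i = 0; i < s.length(); i++) {
            CharClass cls = classify(s.charAt(i));
            boolean boundary = cls != current || cls == CharClass.OTHER;
            if (boundary) {
                if (start >= 0) {
                    tokens.add(s.substring(start, i));  // close the open token
                }
                start = (cls == CharClass.WHITESPACE) ? -1 : i;
            }
            current = cls;
        }
        if (start >= 0) {
            tokens.add(s.substring(start));  // close the final token
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints: [Mr, ., Smith, paid, 42, $, .]
        System.out.println(Arrays.toString(tokenize("Mr. Smith paid 42$.")));
    }
}
```

The sketch needs no trained model; it relies only on `Character.isWhitespace`, `Character.isLetter`, and `Character.isDigit`.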

  • Field Details

    • INSTANCE

      public static final SimpleTokenizer INSTANCE

  • Constructor Details

    • SimpleTokenizer

      @Deprecated public SimpleTokenizer()
      Deprecated.
      Use the INSTANCE field instead to obtain an instance. This constructor will be made private in the future.
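The deprecation note points toward the eager-singleton idiom: one shared instance exposed through a `public static final` field, with the constructor eventually made private. A minimal sketch of that idiom follows, using a hypothetical class `SingletonTokenizerSketch`; it illustrates the pattern only and is not OpenNLP's source.

```java
// Hypothetical sketch of the eager-singleton idiom; not OpenNLP's source.
public final class SingletonTokenizerSketch {

    // The single shared instance, created when the class is initialized.
    public static final SingletonTokenizerSketch INSTANCE = new SingletonTokenizerSketch();

    // Private constructor: callers cannot create further instances.
    private SingletonTokenizerSketch() {
    }

    // Placeholder tokenization: splits on runs of whitespace (illustration only).
    public String[] tokenize(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        SingletonTokenizerSketch a = SingletonTokenizerSketch.INSTANCE;
        SingletonTokenizerSketch b = SingletonTokenizerSketch.INSTANCE;
        System.out.println(a == b);                         // true: same object
        System.out.println(a.tokenize("to be or").length);  // 3
    }
}
```

With OpenNLP on the classpath, the equivalent change for callers is to write `SimpleTokenizer.INSTANCE.tokenize(text)` rather than `new SimpleTokenizer().tokenize(text)`.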
  • Method Details

    • tokenizePos

      public Span[] tokenizePos(String s)
      Description copied from interface: Tokenizer
      Finds the boundaries of atomic parts in a string.
      Parameters:
      s - The string to be tokenized.
      Returns:
      The spans (offsets into s) of the tokens, one token per array element.
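To make "offsets into s" concrete, here is a standard-library sketch that reports half-open `[start, end)` character offsets instead of strings. The nested `SpanSketch` class stands in for OpenNLP's `Span` and is hypothetical, as is `SpanTokenizerSketch`; the boundary rule is the same simplified character-class assumption (whitespace, letter, digit, other) and may not match OpenNLP exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: token boundaries reported as character offsets.
public class SpanTokenizerSketch {

    // Stand-in for a span type: start inclusive, end exclusive.
    static final class SpanSketch {
        final int start;
        final int end;

        SpanSketch(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ")";
        }
    }

    // Assumed classes: 0 = whitespace, 1 = letter, 2 = digit, 3 = other.
    static int classify(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;
    }

    public static SpanSketch[] tokenizePos(String s) {
        List<SpanSketch> spans = new ArrayList<>();
        int start = -1;   // -1 means no token is open
        int current = 0;  // class of the previous character; begin as whitespace
        for (int i = 0; i < s.length(); i++) {
            int cls = classify(s.charAt(i));
            if (cls != current || cls == 3) {  // class change, or a punctuation char
                if (start >= 0) spans.add(new SpanSketch(start, i));
                start = (cls == 0) ? -1 : i;
            }
            current = cls;
        }
        if (start >= 0) spans.add(new SpanSketch(start, s.length()));
        return spans.toArray(new SpanSketch[0]);
    }

    public static void main(String[] args) {
        String text = "Hi, you";
        for (SpanSketch span : tokenizePos(text)) {
            // Each span maps back to the original text via substring.
            System.out.println(span + " -> " + text.substring(span.start, span.end));
        }
    }
}
```

For `"Hi, you"` the sketch reports `[0..2)`, `[2..3)` and `[4..7)`, which `substring` maps back to `Hi`, `,` and `you`. Keeping offsets rather than strings lets a caller locate each token in the original text.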
    • tokenize

      public String[] tokenize(String s)
      Description copied from interface: Tokenizer
      Splits a string into its atomic parts.
      Specified by:
      tokenize in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The String[] with the individual tokens as the array elements.
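The String[] returned by tokenize can be viewed as the substrings covered by the spans that tokenizePos reports. Whether OpenNLP implements tokenize literally this way is not stated on this page, so treat the following as an assumption. The sketch uses plain `int[]` `{start, end}` pairs and a hypothetical helper `spansToStrings`.

```java
import java.util.Arrays;

// Hypothetical sketch: turning {start, end} offset pairs into token strings.
public class SpansToStringsSketch {

    // Each pair is {start (inclusive), end (exclusive)} into s.
    public static String[] spansToStrings(int[][] spans, String s) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hi, you";
        // Offsets as a tokenizePos-style method might report them.
        int[][] spans = { {0, 2}, {2, 3}, {4, 7} };
        System.out.println(Arrays.toString(spansToStrings(spans, text)));  // [Hi, ,, you]
    }
}
```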
    • setKeepNewLines

      public void setKeepNewLines(boolean keepNewLines)
      Sets whether new lines are kept.
      Parameters:
      keepNewLines - true if new lines are kept, false otherwise.
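This page does not spell out what "keeping" a new line means. One plausible reading, assumed here and not confirmed by the documentation, is that a line break is emitted as its own token instead of being discarded as whitespace. The sketch below illustrates that reading with a hypothetical `KeepNewLinesSketch` class wrapping a simple whitespace splitter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a keep-new-lines option on a whitespace splitter.
public class KeepNewLinesSketch {

    private boolean keepNewLines = false;

    public void setKeepNewLines(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    public String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Assumed semantics: with the flag on, '\n' survives as a token.
                if (keepNewLines && c == '\n') {
                    tokens.add("\n");
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        KeepNewLinesSketch t = new KeepNewLinesSketch();
        System.out.println(t.tokenize("one\ntwo").length);  // 2
        t.setKeepNewLines(true);
        System.out.println(t.tokenize("one\ntwo").length);  // 3
    }
}
```

In the sketch the flag is mutable instance state. If the real setter behaves similarly, calling it on a shared INSTANCE would affect every caller of that instance.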