Class WhitespaceTokenizer

java.lang.Object
opennlp.tools.tokenize.WhitespaceTokenizer
All Implemented Interfaces:
Tokenizer

public class WhitespaceTokenizer extends Object
A basic Tokenizer implementation that splits text on whitespace.

To obtain an instance of this tokenizer, use the static final INSTANCE field.

  • Field Details

  • Method Details

    • tokenizePos

      public Span[] tokenizePos(String d)
      Description copied from interface: Tokenizer
      Finds the boundaries of atomic parts in a string.
      Parameters:
      d - The string to be tokenized.
      Returns:
      The spans (offsets into d) for each token as the individual array elements.
    • tokenize

      public String[] tokenize(String s)
      Description copied from interface: Tokenizer
      Splits a string into its atomic parts.
      Specified by:
      tokenize in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The String[] with the individual tokens as the array elements.
    • setKeepNewLines

      public void setKeepNewLines(boolean keepNewLines)
      Sets whether new lines are kept.
      Parameters:
      keepNewLines - true to keep new lines, false otherwise.
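
The relationship between tokenizePos, tokenize, and the keepNewLines switch can be illustrated with a small standalone sketch. This is not the OpenNLP source: the class name WhitespaceSketch, the use of int pairs in place of Span, and the boolean argument in place of the setter are all hypothetical stand-ins chosen to keep the example free of dependencies. It also assumes that a kept line break is reported as its own one-character token, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the OpenNLP implementation. */
public class WhitespaceSketch {

    /** Returns {start, end} offset pairs (end exclusive) into d, one pair per token. */
    static int[][] tokenizePos(String d, boolean keepNewLines) {
        List<int[]> spans = new ArrayList<>();
        int tokStart = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            char c = d.charAt(i);
            if (Character.isWhitespace(c)) {
                if (tokStart >= 0) {
                    // Whitespace closes the token that was open.
                    spans.add(new int[] {tokStart, i});
                    tokStart = -1;
                }
                // Assumption: a kept line break becomes a one-character token.
                if (keepNewLines && (c == '\n' || c == '\r')) {
                    spans.add(new int[] {i, i + 1});
                }
            } else if (tokStart < 0) {
                tokStart = i; // first non-whitespace character opens a token
            }
        }
        if (tokStart >= 0) {
            spans.add(new int[] {tokStart, d.length()}); // token running to the end
        }
        return spans.toArray(new int[0][]);
    }

    /** Returns the token strings by cutting s at the offsets from tokenizePos. */
    static String[] tokenize(String s, boolean keepNewLines) {
        int[][] spans = tokenizePos(s, keepNewLines);
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello  world,\nagain";
        for (int[] span : tokenizePos(text, false)) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

Note that punctuation stays attached to its neighbouring word ("world,") because only whitespace separates tokens. With the library on the classpath, the corresponding calls would be WhitespaceTokenizer.INSTANCE.tokenize(s) and WhitespaceTokenizer.INSTANCE.tokenizePos(d), the latter returning Span objects instead of int pairs.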