opennlp.tools.tokenize
Class WhitespaceTokenizer

java.lang.Object
  extended by opennlp.tools.tokenize.WhitespaceTokenizer
All Implemented Interfaces:
Tokenizer

public class WhitespaceTokenizer
extends Object

This tokenizer splits the input text on whitespace. To obtain an instance of this tokenizer, use the static final INSTANCE field.


Field Summary
static WhitespaceTokenizer INSTANCE
          Use this static reference to retrieve an instance of the WhitespaceTokenizer.
 
Method Summary
 String[] tokenize(String s)
          Splits a string into its atomic parts.
 Span[] tokenizePos(String d)
          Finds the boundaries of atomic parts in a string.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INSTANCE

public static final WhitespaceTokenizer INSTANCE
Use this static reference to retrieve an instance of the WhitespaceTokenizer.

Method Detail

tokenizePos

public Span[] tokenizePos(String d)
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.

Specified by:
tokenizePos in interface Tokenizer
Parameters:
d - The string to be tokenized.
Returns:
The Span[] with the spans (offsets into d) for each token as the individual array elements.
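
To make the idea of token boundaries concrete, the following stand-alone sketch computes start and end offsets for each whitespace-delimited token using only the JDK. It is an illustration, not OpenNLP's implementation: the class WhitespaceSpanSketch and its spans method are hypothetical, each boundary is represented as a plain int pair instead of a Span, and it assumes half-open offsets (start inclusive, end exclusive) and that whitespace means Character.isWhitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the OpenNLP class.
public class WhitespaceSpanSketch {

    // Returns one {start, end} pair per whitespace-delimited token in d.
    // Assumption: start is inclusive, end is exclusive, so
    // d.substring(start, end) yields the token text.
    public static int[][] spans(String d) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // start of the current token, or -1 when between tokens
        for (int i = 0; i < d.length(); i++) {
            boolean ws = Character.isWhitespace(d.charAt(i));
            if (!ws && start < 0) {
                start = i;                      // a new token begins here
            } else if (ws && start >= 0) {
                out.add(new int[] {start, i});  // the token ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(new int[] {start, d.length()}); // token runs to end of input
        }
        return out.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        String text = "Hello  world";
        for (int[] s : spans(text)) {
            System.out.println("[" + s[0] + ", " + s[1] + ") " + text.substring(s[0], s[1]));
        }
    }
}
```

With OpenNLP on the classpath, the corresponding call would be WhitespaceTokenizer.INSTANCE.tokenizePos(d), which returns Span objects rather than int pairs.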

tokenize

public String[] tokenize(String s)
Description copied from interface: Tokenizer
Splits a string into its atomic parts.

Specified by:
tokenize in interface Tokenizer
Parameters:
s - The string to be tokenized.
Returns:
The String[] with the individual tokens as the array elements.
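
As a rough picture of what splitting on whitespace produces, the sketch below returns the maximal runs of non-whitespace characters in a string, using only the JDK. The class WhitespaceSplitSketch and its split method are hypothetical and are not part of OpenNLP; the library's own implementation may differ in detail, for example in exactly which characters it treats as whitespace (the sketch assumes Character.isWhitespace).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the OpenNLP class.
public class WhitespaceSplitSketch {

    // Returns the runs of non-whitespace characters in s, in order.
    // Leading, trailing, and repeated whitespace produce no empty tokens.
    public static String[] split(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start of the current token, or -1 when between tokens
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                if (start >= 0) {
                    tokens.add(s.substring(start, i)); // close the current token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start)); // last token reaches end of input
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String t : split("  The quick\tbrown fox. ")) {
            System.out.println(t);
        }
    }
}
```

Note that punctuation stays attached to the neighbouring word ("fox." is a single token), because only whitespace separates tokens. With OpenNLP on the classpath, the corresponding call would be WhitespaceTokenizer.INSTANCE.tokenize(s).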


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.