Package opennlp.tools.tokenize
Class SimpleTokenizer
java.lang.Object
    opennlp.tools.tokenize.SimpleTokenizer
Field Summary
Modifier and Type | Field | Description
static SimpleTokenizer | INSTANCE | Use this static reference to retrieve an instance of the SimpleTokenizer.
-
Constructor Summary
Constructor | Description
SimpleTokenizer() | Deprecated. Use the INSTANCE field instead to obtain an instance.
-
Method Summary
Modifier and Type | Method | Description
void | setKeepNewLines(boolean keepNewLines) | Switches whether to keep new lines or not.
String[] | tokenize(String s) | Splits a string into its atomic parts.
Span[] | tokenizePos(String s) | Finds the boundaries of atomic parts in a string.
-
Field Detail
-
INSTANCE
public static final SimpleTokenizer INSTANCE
Use this static reference to retrieve an instance of the SimpleTokenizer.
-
Constructor Detail
-
SimpleTokenizer
@Deprecated public SimpleTokenizer()
Deprecated. Use the INSTANCE field instead to obtain an instance. This constructor will be made private in the future.
-
Method Detail
-
tokenizePos
public Span[] tokenizePos(String s)
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
Parameters:
s - The string to be tokenized.
Returns:
The spans (offsets into s) for each token as the individual array elements.
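To make the offset convention concrete, here is a dependency-free sketch of how spans relate to the input string. It does not call OpenNLP: SpanSketch, CharClass, and classify are hypothetical names, and splitting wherever the character class changes is a simplification for illustration, not the library's implementation. The sketch also assumes half-open offsets, where the end offset is one past the token's last character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of OpenNLP.
final class SpanSketch {

    // Character classes assumed by this sketch.
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Returns {start, end} offset pairs into s; end is exclusive.
    static List<int[]> tokenizePos(String s) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        CharClass state = CharClass.WHITESPACE;
        for (int i = 0; i < s.length(); i++) {
            CharClass cc = classify(s.charAt(i));
            if (cc != state) {
                // A class change closes the open token, if any.
                if (start >= 0) spans.add(new int[] {start, i});
                // Whitespace never starts a token.
                start = (cc == CharClass.WHITESPACE) ? -1 : i;
                state = cc;
            }
        }
        if (start >= 0) spans.add(new int[] {start, s.length()});
        return spans;
    }

    public static void main(String[] args) {
        String s = "Hello, world!";
        for (int[] span : tokenizePos(s)) {
            System.out.println("[" + span[0] + ".." + span[1] + ") "
                    + s.substring(span[0], span[1]));
        }
    }
}
```

With OpenNLP on the classpath, the corresponding call would be SimpleTokenizer.INSTANCE.tokenizePos(s), which returns Span objects rather than the int pairs used here.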
-
tokenize
public String[] tokenize(String s)
Description copied from interface: Tokenizer
Splits a string into its atomic parts.
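As a rough sketch of how tokenize and tokenizePos relate, each token string can be read as the text covered by the corresponding span. The example below does not use OpenNLP: TokensFromSpans and cover are hypothetical names, the spans are written out by hand, and half-open offsets (end exclusive) are assumed.

```java
// Hypothetical illustration only; not part of OpenNLP.
final class TokensFromSpans {

    // Returns the substring of s covered by each {start, end} pair (end exclusive).
    static String[] cover(String s, int[][] spans) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Hello, world!";
        // Hand-written spans for the four tokens in s.
        int[][] spans = { {0, 5}, {5, 6}, {7, 12}, {12, 13} };
        System.out.println(String.join("|", cover(s, spans))); // prints Hello|,|world|!
    }
}
```

With the library available, SimpleTokenizer.INSTANCE.tokenize(s) would be the direct way to obtain the token strings.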
-
setKeepNewLines
public void setKeepNewLines(boolean keepNewLines)
Switches whether to keep new lines or not.
Parameters:
keepNewLines - True if new lines are kept, false otherwise.
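The description does not say how kept new lines appear in the output, so the following is only a sketch under a stated assumption: when the flag is true, each line feed is emitted as its own token, and when it is false, line feeds are dropped like any other whitespace. KeepNewLinesSketch is a hypothetical, dependency-free class and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of OpenNLP.
final class KeepNewLinesSketch {

    private boolean keepNewLines = false;

    void setKeepNewLines(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    // Splits on whitespace; assumption: a kept '\n' becomes a token of its own.
    List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the token being built, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (keepNewLines && c == '\n') tokens.add("\n");
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        KeepNewLinesSketch t = new KeepNewLinesSketch();
        System.out.println(t.tokenize("a b\nc").size()); // 3 tokens: a, b, c
        t.setKeepNewLines(true);
        System.out.println(t.tokenize("a b\nc").size()); // 4 tokens: a, b, \n, c
    }
}
```

Because the setter returns void and changes the state of the object it is called on, calling it on a shared reference such as INSTANCE would presumably affect every caller that uses that reference.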
-