Package opennlp.tools.tokenize
Class SimpleTokenizer
java.lang.Object
opennlp.tools.tokenize.SimpleTokenizer
- All Implemented Interfaces:
Tokenizer
-
Field Summary
Modifier and Type | Field | Description
static final SimpleTokenizer | INSTANCE | Use this static reference to retrieve an instance of the SimpleTokenizer.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | setKeepNewLines(boolean keepNewLines) | Switches whether to keep new lines or not.
String[] | tokenize(String s) | Splits a string into its atomic parts.
Span[] | tokenizePos(String s) | Finds the boundaries of atomic parts in a string.
-
Field Details
-
INSTANCE
Use this static reference to retrieve an instance of the SimpleTokenizer.
-
-
Constructor Details
-
SimpleTokenizer
Deprecated. Use the INSTANCE field instead to obtain an instance. This constructor will be made private in the future.
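Since the constructor is deprecated, callers are expected to go through the shared INSTANCE field. The following is a minimal usage sketch, assuming the opennlp-tools library is on the classpath; the class name SimpleTokenizerUsage and the helper method tokens are hypothetical names chosen for illustration and are not part of OpenNLP.

```java
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.Tokenizer;

// Hypothetical wrapper class, used only to illustrate the call pattern.
class SimpleTokenizerUsage {

    // Obtains the tokenizer through the static INSTANCE field rather than
    // the deprecated constructor, then splits the text into tokens.
    static String[] tokens(String text) {
        Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
        return tokenizer.tokenize(text);
    }

    public static void main(String[] args) {
        // Print each token on its own line.
        for (String token : tokens("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

Because INSTANCE is a single shared object, a call to setKeepNewLines on it is visible to every other user of that reference.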
-
-
Method Details
-
tokenizePos
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
- Parameters:
s - The string to be tokenized.
- Returns:
The spans (offsets into s) for each token as the individual array elements.
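To make the "offsets into s" contract concrete, here is a small standard-library-only sketch. It is not the SimpleTokenizer algorithm: the class SpanOffsetsSketch and the method whitespaceSpans are hypothetical stand-ins that split on whitespace only, and each span is modeled as a plain {start, end} pair with an exclusive end rather than an OpenNLP Span. The point is only that a span indexes back into the original string, so the token text can be recovered with substring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of OpenNLP.
class SpanOffsetsSketch {

    // Returns {start, end} pairs (end exclusive) for each run of
    // non-whitespace characters in s. Stand-in for a span-returning tokenizer.
    static List<int[]> whitespaceSpans(String s) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < s.length(); i++) {
            boolean ws = Character.isWhitespace(s.charAt(i));
            if (!ws && start < 0) {
                start = i;                       // a token begins here
            } else if (ws && start >= 0) {
                spans.add(new int[] {start, i}); // the token ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, s.length()}); // token runs to the end
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "tokenize this text";
        for (int[] span : whitespaceSpans(s)) {
            // The offsets index into the original string s.
            System.out.println(span[0] + ".." + span[1] + " -> " + s.substring(span[0], span[1]));
        }
    }
}
```

Working with offsets instead of copied strings keeps each token tied to its position in the input, which is useful when later steps need to highlight or annotate the original text.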
-
tokenize
Description copied from interface: Tokenizer
Splits a string into its atomic parts.
-
setKeepNewLines
public void setKeepNewLines(boolean keepNewLines)
Switches whether to keep new lines or not.
- Parameters:
keepNewLines - true if new lines are kept, false otherwise.
-