Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All tokenizers implement the Tokenizer interface. Currently there are the learnable TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer, which is a character-class tokenizer.
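To illustrate the Tokenizer contract (tokenize(String) returns a String[] of tokens), here is a minimal standalone sketch of what a whitespace tokenizer does. This is not the OpenNLP WhitespaceTokenizer implementation, only a self-contained approximation of its behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokenizerSketch {
    // Splits text on runs of whitespace, mirroring the
    // tokenize(String) -> String[] shape of the Tokenizer interface.
    public static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none open
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) {
                start = i; // a token begins
            } else if (ws && start >= 0) {
                tokens.add(text.substring(start, i)); // a token ends
                start = -1;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start)); // trailing token
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] toks = tokenize("  Mr. Smith  joined in 2020. ");
        System.out.println(String.join("|", toks));
        // Mr.|Smith|joined|in|2020.
    }
}
```

Note that punctuation stays attached to the adjacent word ("2020."), which is exactly the limitation that motivates the character-class and learnable tokenizers below.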
Class descriptions:

DefaultTokenContextGenerator: A default TokenContextGenerator which produces events for maxent decisions for tokenization.
Detokenizer: A Detokenizer merges tokens back to their detokenized representation.
Detokenizer.DetokenizationOperation: This enum contains an operation for every token to merge the tokens together to their detokenized form.
DetokenizerEvaluator: The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
DictionaryDetokenizer: A rule-based detokenizer.
SimpleTokenizer: A basic Tokenizer implementation which performs tokenization using character classes.
TokenContextGenerator: Interface for context generators required for TokenizerME.
Tokenizer: The interface for tokenizers, which segment a string into its tokens.
TokenizerCrossValidator: A cross validator for tokenizers.
TokenizerEvaluationMonitor: A marker interface for evaluating tokenizers.
TokenizerEvaluator: The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
TokenizerFactory: The factory that provides the default Tokenizer implementation and resources.
TokenizerME: A Tokenizer for converting raw text into separated tokens.
TokenizerModel: The TokenizerModel is the model used by a learnable Tokenizer.
TokenSample: A TokenSample is text with token spans.
TokenSampleStream: This class is a stream filter which reads in string-encoded samples and creates TokenSample objects out of them.
WhitespaceTokenizer: A basic Tokenizer implementation which performs tokenization using white spaces.
WhitespaceTokenStream: This stream formats an ObjectStream of TokenSample objects into whitespace-separated token strings.
WordpieceTokenizer: A Tokenizer implementation which performs tokenization using word pieces.
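The idea behind character-class tokenization can be sketched in a few lines: a new token starts whenever the class of the current character (letter, digit, whitespace, other) changes. This is a standalone approximation of SimpleTokenizer's behavior, not the OpenNLP source; the exact class boundaries used here are an assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassTokenizerSketch {
    // Assumed character classes: 0 = whitespace, 1 = letter,
    // 2 = digit, 3 = punctuation and everything else.
    private static int charClass(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;
    }

    // Emits a token for every maximal run of same-class,
    // non-whitespace characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;     // start of the current run, -1 if none open
        int prevClass = 0;  // class of the previous character
        for (int i = 0; i <= text.length(); i++) {
            // Treat end-of-input as whitespace to close the last run.
            int cls = (i == text.length()) ? 0 : charClass(text.charAt(i));
            if (cls != prevClass) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (cls == 0) ? -1 : i;
                prevClass = cls;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello,world! 42x"));
        // [Hello, ,, world, !, 42, x]
    }
}
```

Unlike the whitespace approach, this splits "Hello,world!" into separate word and punctuation tokens; the learnable TokenizerME goes further by using a maxent model (trained from TokenSample data into a TokenizerModel) to decide ambiguous boundaries, such as whether a period ends a token.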