Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently, there are the
learnable TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer,
which is a character class tokenizer.
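The two rule-based tokenizers are available through shared singleton
instances; a minimal sketch of their differing behavior (the sample
sentence is illustrative):

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    // SimpleTokenizer splits wherever the character class changes,
    // so punctuation becomes its own token.
    String[] simple = SimpleTokenizer.INSTANCE.tokenize("Hi, friend!");
    // -> ["Hi", ",", "friend", "!"]

    // WhitespaceTokenizer splits only on whitespace, leaving
    // punctuation attached to the neighboring word.
    String[] ws = WhitespaceTokenizer.INSTANCE.tokenize("Hi, friend!");
    // -> ["Hi,", "friend!"]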
Interface Summary

  Detokenizer: A Detokenizer merges tokens back to their detokenized representation.
  TokenContextGenerator: Interface for context generators required for TokenizerME.
  Tokenizer: The interface for tokenizers, which segment a string into its tokens (see the sketch below this summary).
  TokenizerEvaluationMonitor: A marker interface for evaluating tokenizers.
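Every Tokenizer can report its result either as token strings or as
character offsets into the input; a minimal sketch (the sample text is
illustrative):

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.util.Span;

    Tokenizer tokenizer = SimpleTokenizer.INSTANCE;

    // tokenize(...) returns the token strings directly.
    String[] tokens = tokenizer.tokenize("A short sentence.");

    // tokenizePos(...) returns the character offsets of each token,
    // which is useful when the original text must be preserved.
    Span[] spans = tokenizer.tokenizePos("A short sentence.");
    for (Span span : spans) {
        System.out.println(span.getStart() + ".." + span.getEnd());
    }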
Class Summary

  DefaultTokenContextGenerator: A default TokenContextGenerator which produces events for maxent decisions for tokenization.
  DetokenizationDictionary
  DetokenizerEvaluator: The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
  DictionaryDetokenizer: A rule-based detokenizer.
  SimpleTokenizer: A basic Tokenizer implementation which performs tokenization using character classes.
  TokenizerCrossValidator: A cross validator for tokenizers.
  TokenizerEvaluator: The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
  TokenizerFactory: The factory that provides Tokenizer default implementation and resources.
  TokenizerME: A Tokenizer for converting raw text into separated tokens (see the sketch below this summary).
  TokenizerModel: The TokenizerModel is the model used by a learnable Tokenizer.
  TokenizerStream
  TokenSample: A TokenSample is text with token spans.
  TokenSampleStream: This class is a stream filter which reads in string-encoded samples and creates samples out of them.
  TokSpanEventStream
  WhitespaceTokenizer: A basic Tokenizer implementation which performs tokenization using whitespace.
  WhitespaceTokenStream: This stream formats an ObjectStream of samples into whitespace-separated token strings.
  WordpieceTokenizer: A Tokenizer implementation which performs tokenization using word pieces.
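A learnable tokenizer is created by loading a TokenizerModel into
TokenizerME; a minimal sketch, assuming a trained model file is
available at the placeholder path "en-token.bin":

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    // "en-token.bin" is a placeholder for a trained tokenizer model.
    try (InputStream modelIn = new FileInputStream("en-token.bin")) {
        TokenizerModel model = new TokenizerModel(modelIn);
        TokenizerME tokenizer = new TokenizerME(model);

        String[] tokens = tokenizer.tokenize("He said \"hello\" to us.");

        // One probability per tokenization decision is available
        // after the call to tokenize(...).
        double[] probs = tokenizer.getTokenProbabilities();
    } catch (IOException e) {
        // handle a missing or invalid model file
    }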
Enum Summary

  DetokenizationDictionary.Operation
  Detokenizer.DetokenizationOperation: This enum contains an operation for every token to merge the tokens together to their detokenized form (see the sketch below this summary).
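A DictionaryDetokenizer produces one DetokenizationOperation per input
token; a minimal sketch built on a tiny hand-made dictionary (the
token-to-operation mapping below is an assumption and should be checked
against the DetokenizationDictionary.Operation javadoc):

    import opennlp.tools.tokenize.DetokenizationDictionary;
    import opennlp.tools.tokenize.Detokenizer;
    import opennlp.tools.tokenize.Detokenizer.DetokenizationOperation;
    import opennlp.tools.tokenize.DictionaryDetokenizer;

    // Assumed semantics: MOVE_LEFT attaches a token to its left
    // neighbor ("word ." -> "word."), MOVE_RIGHT to its right
    // neighbor ("( word" -> "(word").
    DetokenizationDictionary dict = new DetokenizationDictionary(
        new String[] { ".", "(" },
        new DetokenizationDictionary.Operation[] {
            DetokenizationDictionary.Operation.MOVE_LEFT,
            DetokenizationDictionary.Operation.MOVE_RIGHT });

    Detokenizer detokenizer = new DictionaryDetokenizer(dict);

    // One operation per token tells the caller how to glue the
    // tokens back together.
    DetokenizationOperation[] ops =
        detokenizer.detokenize(new String[] { "Hi", "." });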