Contains classes related to finding tokens or words in a string. All tokenizers implement the Tokenizer interface. Currently, there is the learnable TokenizerME and the SimpleTokenizer, which is a character class tokenizer.
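To illustrate what "character class tokenizer" means, here is a minimal, self-contained sketch. It is not the SimpleTokenizer source: the class name `CharClassTokenizerSketch`, the four character classes, and the rule that every non-alphanumeric character becomes its own token are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of character class tokenization; not the OpenNLP implementation.
class CharClassTokenizerSketch {

    // Character classes assumed for this sketch.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Starts a new token whenever the character class changes.
    // Whitespace is dropped, and each OTHER character is its own token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none is open
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            boolean boundary = current != previous || current == CharClass.OTHER;
            if (boundary) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Mr, ., Smith, paid, 42, $, .]
        System.out.println(tokenize("Mr. Smith paid 42$."));
    }
}
```

No training data is needed for this approach, which is the main contrast with a learnable tokenizer: the boundaries follow only from the class of each character.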
A TokenContextGenerator which produces events for maxent decisions for tokenization.
A Detokenizer merges tokens back to their detokenized representation.
This enum contains an operation for every token to merge the tokens together to their detokenized form.
A rule-based detokenizer.
A basic Tokenizer implementation which performs tokenization using character classes.
Interface for context generators required for TokenizerME.
The interface for tokenizers, which segment a string into its tokens.
A cross validator for tokenizers.
A marker interface for evaluating tokenizers.
The factory that provides the default Tokenizer implementation and resources.
A Tokenizer for converting raw text into separated tokens.
A TokenSample is text with token spans.
A basic Tokenizer implementation which performs tokenization using white spaces.
A Tokenizer implementation which performs tokenization using word pieces.
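
The detokenizer entries above describe assigning one merge operation to every token and then joining the tokens accordingly. The following sketch shows that idea in isolation. It does not use the OpenNLP classes: `DetokenizerSketch`, its `Operation` enum, and the `RULES` table are hypothetical names and rules chosen for this example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a rule-based detokenizer; not the OpenNLP implementation.
class DetokenizerSketch {

    // One operation per token, describing which neighbor it attaches to.
    enum Operation { MERGE_TO_LEFT, MERGE_TO_RIGHT, MERGE_BOTH, NO_OPERATION }

    // Assumed rule table; tokens that are not listed get NO_OPERATION.
    static final Map<String, Operation> RULES = Map.of(
            ".", Operation.MERGE_TO_LEFT,
            ",", Operation.MERGE_TO_LEFT,
            ")", Operation.MERGE_TO_LEFT,
            "(", Operation.MERGE_TO_RIGHT,
            "-", Operation.MERGE_BOTH);

    static Operation operationFor(String token) {
        return RULES.getOrDefault(token, Operation.NO_OPERATION);
    }

    // Joins tokens with single spaces, omitting the space wherever the
    // current token merges to the left or the previous token merges to the right.
    static String detokenize(List<String> tokens) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                Operation current = operationFor(tokens.get(i));
                Operation previous = operationFor(tokens.get(i - 1));
                boolean attachesLeft = current == Operation.MERGE_TO_LEFT
                        || current == Operation.MERGE_BOTH;
                boolean previousAttachesRight = previous == Operation.MERGE_TO_RIGHT
                        || previous == Operation.MERGE_BOTH;
                if (!attachesLeft && !previousAttachesRight) {
                    text.append(' ');
                }
            }
            text.append(tokens.get(i));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Prints: He said (well), state-of-the-art.
        System.out.println(detokenize(List.of(
                "He", "said", "(", "well", ")", ",",
                "state", "-", "of", "-", "the", "-", "art", ".")));
    }
}
```

Keeping the per-token operation separate from the string joining means the same rule table can also be used to decide where a training sample should mark token boundaries that are not whitespace.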