Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently there are the
learnable TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer,
which is a character class tokenizer.
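A minimal usage sketch of the three tokenizers (the model file name
en-token.bin is an assumption; it is the commonly distributed pre-trained
English model, but any trained TokenizerModel works):

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class TokenizeDemo {

        public static void main(String[] args) throws Exception {
            String text = "A car, a plane and a ship.";

            // The rule based tokenizers are singletons and need no model.
            String[] byWhitespace = WhitespaceTokenizer.INSTANCE.tokenize(text);
            String[] byCharClass = SimpleTokenizer.INSTANCE.tokenize(text);
            System.out.println(String.join("|", byWhitespace));
            System.out.println(String.join("|", byCharClass));

            // The learnable tokenizer is instantiated with a trained model.
            // "en-token.bin" is a placeholder for any tokenizer model file.
            try (InputStream modelIn = new FileInputStream("en-token.bin")) {
                TokenizerModel model = new TokenizerModel(modelIn);
                TokenizerME tokenizer = new TokenizerME(model);
                String[] tokens = tokenizer.tokenize(text);
                System.out.println(String.join("|", tokens));
            }
        }
    }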
Interface Summary

    Detokenizer
        A Detokenizer merges tokens back to their untokenized representation.
    TokenContextGenerator
        Interface for TokenizerME context generators.
    Tokenizer
        The interface for tokenizers, which segment a string into its tokens.
    TokenizerEvaluationMonitor
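The Tokenizer interface offers two views of the result: tokenize returns the
token strings, while tokenizePos returns their character offsets as Spans. A
small sketch using SimpleTokenizer:

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.util.Span;

    public class SpanDemo {

        public static void main(String[] args) {
            String text = "John's car is red.";
            Tokenizer tokenizer = SimpleTokenizer.INSTANCE;

            // The token strings.
            String[] tokens = tokenizer.tokenize(text);

            // The character offsets of the same tokens.
            Span[] spans = tokenizer.tokenizePos(text);
            for (Span span : spans) {
                System.out.println(span + " -> " + span.getCoveredText(text));
            }
        }
    }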
Class Summary

    DefaultTokenContextGenerator
        Generates events for maxent decisions for tokenization.
    DetokenizationDictionary
    DetokenizerEvaluator
        The DetokenizerEvaluator measures the performance of the given
        Detokenizer with the provided reference TokenSamples.
    DictionaryDetokenizer
        A rule-based detokenizer.
    SimpleTokenizer
        Performs tokenization using character classes.
    TokenizerCrossValidator
    TokenizerEvaluator
        The TokenizerEvaluator measures the performance of the given
        Tokenizer with the provided reference TokenSamples.
    TokenizerFactory
        The factory that provides Tokenizer default implementations and
        resources.
    TokenizerME
        A Tokenizer for converting raw text into separated tokens.
    TokenizerModel
        The TokenizerModel is the model used by a learnable Tokenizer.
    TokenizerStream
        The TokenizerStream uses a tokenizer to tokenize the input string
        and output TokenSamples.
    TokenSample
        A TokenSample is text with token spans.
    TokenSampleStream
        This class is a stream filter which reads in string-encoded samples
        and creates TokenSamples out of them.
    TokSpanEventStream
        This class reads the TokenSamples from the given Iterator and
        converts the TokenSamples into Events which can be used by the
        maxent library for training.
    WhitespaceTokenizer
        This tokenizer uses white spaces to tokenize the input text.
    WhitespaceTokenStream
        This stream formats TokenSamples into whitespace-separated token
        strings.
    WordpieceTokenizer
        A WordPiece tokenizer.
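A new TokenizerModel can also be trained from TokenSamples. A sketch of the
usual pipeline (the file names token-train.txt and my-token.bin and the
language code "eng" are assumptions; training lines mark token boundaries
that are not whitespace with the <SPLIT> tag):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerFactory;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TokenizerTrainingDemo {

        public static void main(String[] args) throws Exception {
            // One sentence per line; token boundaries without whitespace
            // are marked with <SPLIT>, e.g.: He ran<SPLIT>.
            // "token-train.txt" is a placeholder file name.
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("token-train.txt")),
                    StandardCharsets.UTF_8);

            try (ObjectStream<TokenSample> samples = new TokenSampleStream(lines)) {
                TokenizerFactory factory =
                        new TokenizerFactory("eng", null, false, null);
                TokenizerModel model = TokenizerME.train(
                        samples, factory, TrainingParameters.defaultParams());

                // "my-token.bin" is a placeholder output file name.
                try (OutputStream out = new FileOutputStream("my-token.bin")) {
                    model.serialize(out);
                }
            }
        }
    }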
Enum Summary

    DetokenizationDictionary.Operation
    Detokenizer.DetokenizationOperation
        This enum contains an operation for each token that describes how to
        merge the tokens back into their detokenized form.
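The detokenization classes reverse tokenization. A sketch with a tiny
hand-built DetokenizationDictionary (the tokens and operations chosen here
are illustrative assumptions; real dictionaries are typically loaded from an
XML resource):

    import opennlp.tools.tokenize.DetokenizationDictionary;
    import opennlp.tools.tokenize.DetokenizationDictionary.Operation;
    import opennlp.tools.tokenize.Detokenizer;
    import opennlp.tools.tokenize.DictionaryDetokenizer;

    public class DetokenizeDemo {

        public static void main(String[] args) {
            // Merge punctuation into the token on its left; let quotes
            // alternate between attaching right and left.
            DetokenizationDictionary dict = new DetokenizationDictionary(
                    new String[] { ".", ",", "\"" },
                    new Operation[] { Operation.MERGE_TO_LEFT,
                            Operation.MERGE_TO_LEFT,
                            Operation.RIGHT_LEFT_MATCHING });

            Detokenizer detokenizer = new DictionaryDetokenizer(dict);

            String[] tokens = { "He", "said", "\"", "hello", "\"", "." };

            // One merge operation per input token.
            Detokenizer.DetokenizationOperation[] ops =
                    detokenizer.detokenize(tokens);

            // Or directly as a string; a null split marker inserts nothing
            // at the merged positions.
            String text = detokenizer.detokenize(tokens, null);
            System.out.println(text);   // He said "hello".
        }
    }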