Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently, there are the
learnable TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer,
which is a character class tokenizer.
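
For example, the two rule-based tokenizers are ready to use through their INSTANCE singletons, while the learnable TokenizerME must first be loaded with a trained TokenizerModel. A minimal sketch; the model file name "en-token.bin" is a placeholder for a trained tokenizer model:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class TokenizeExample {

        public static void main(String[] args) throws IOException {
            // Rule-based tokenizers are stateless singletons.
            String[] wsTokens = WhitespaceTokenizer.INSTANCE.tokenize("Hello, world!");
            // -> ["Hello,", "world!"]

            String[] simpleTokens = SimpleTokenizer.INSTANCE.tokenize("Hello, world!");
            // -> ["Hello", ",", "world", "!"]

            // The learnable tokenizer needs a trained model;
            // "en-token.bin" is a placeholder path.
            try (InputStream modelIn = new FileInputStream("en-token.bin")) {
                TokenizerModel model = new TokenizerModel(modelIn);
                Tokenizer tokenizer = new TokenizerME(model);
                String[] meTokens = tokenizer.tokenize("Mr. Smith isn't here.");
            }
        }
    }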
Interface Summary

Detokenizer
    A Detokenizer merges tokens back to their detokenized representation.
TokenContextGenerator
    Interface for context generators required for TokenizerME.
Tokenizer
    The interface for tokenizers, which segment a string into its tokens.
TokenizerEvaluationMonitor
    A marker interface for evaluating tokenizers.
Class Summary

DefaultTokenContextGenerator
    A default TokenContextGenerator which produces events for maxent decisions for tokenization.
DetokenizationDictionary
DetokenizerEvaluator
    The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
DictionaryDetokenizer
    A rule based detokenizer.
SimpleTokenizer
    A basic Tokenizer implementation which performs tokenization using character classes.
TokenizerCrossValidator
    A cross validator for tokenizers.
TokenizerEvaluator
    The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
TokenizerFactory
    The factory that provides the Tokenizer default implementation and resources.
TokenizerME
    A Tokenizer for converting raw text into separated tokens.
TokenizerModel
    The TokenizerModel is the model used by a learnable Tokenizer.
TokenizerStream
TokenSample
    A TokenSample is text with token spans.
TokenSampleStream
    This class is a stream filter which reads in string encoded samples and creates samples out of them.
TokSpanEventStream
WhitespaceTokenizer
    A basic Tokenizer implementation which performs tokenization using white spaces.
WhitespaceTokenStream
    This stream formats an ObjectStream of samples into whitespace separated token strings.
WordpieceTokenizer
    A Tokenizer implementation which performs tokenization using word pieces.
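
The training-related classes above fit together as in the following sketch. The file name "en-token.train" and the language code "eng" are placeholders; each line of the training file is assumed to contain one sentence whose token boundaries without surrounding whitespace are marked with <SPLIT>:

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerFactory;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainTokenizerExample {

        public static void main(String[] args) throws IOException {
            // Placeholder training file with lines like:
            // Hello<SPLIT>, world<SPLIT>!
            try (ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("en-token.train")),
                    StandardCharsets.UTF_8);
                 ObjectStream<TokenSample> samples = new TokenSampleStream(lines)) {

                TokenizerModel model = TokenizerME.train(samples,
                    new TokenizerFactory("eng", null, false, null),
                    TrainingParameters.defaultParams());

                String[] tokens = new TokenizerME(model).tokenize("Hello, world!");
            }
        }
    }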
Enum Summary

DetokenizationDictionary.Operation
Detokenizer.DetokenizationOperation
    This enum contains an operation for every token to merge the tokens together to their detokenized form.
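
The detokenization classes can be combined as in the following sketch; the dictionary entries are illustrative:

    import opennlp.tools.tokenize.DetokenizationDictionary;
    import opennlp.tools.tokenize.DetokenizationDictionary.Operation;
    import opennlp.tools.tokenize.Detokenizer;
    import opennlp.tools.tokenize.DictionaryDetokenizer;

    public class DetokenizeExample {

        public static void main(String[] args) {
            // Each token is mapped to a rule describing how it attaches to
            // its neighbors: "." and "," attach to the token on their left,
            // quotes alternate between right and left attachment.
            DetokenizationDictionary dict = new DetokenizationDictionary(
                new String[] {".", ",", "\""},
                new Operation[] {Operation.MOVE_LEFT, Operation.MOVE_LEFT,
                    Operation.RIGHT_LEFT_MATCHING});

            Detokenizer detokenizer = new DictionaryDetokenizer(dict);

            // A null split marker merges tokens without marking the joins.
            String text = detokenizer.detokenize(
                new String[] {"Hello", ",", "world", "."}, null);
            // -> "Hello, world."
        }
    }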