Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All tokenizers implement the Tokenizer interface. Currently there are the learnable TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer, which is a character class tokenizer.
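
The following is a minimal usage sketch of the three tokenizers; the model file name en-token.bin is an assumption, any trained tokenizer model will do:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class TokenizeExample {
        public static void main(String[] args) throws Exception {
            String text = "Mr. Smith isn't here.";

            // The rule based tokenizers are singletons and need no model.
            String[] wsTokens = WhitespaceTokenizer.INSTANCE.tokenize(text);  // splits on whitespace only
            String[] simpleTokens = SimpleTokenizer.INSTANCE.tokenize(text);  // splits on character class changes

            // The learnable tokenizer needs a trained TokenizerModel.
            try (InputStream modelIn = new FileInputStream("en-token.bin")) {
                Tokenizer tokenizer = new TokenizerME(new TokenizerModel(modelIn));
                String[] meTokens = tokenizer.tokenize(text);
            }
        }
    }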
Interface Summary

Detokenizer - A Detokenizer merges tokens back to their untokenized representation.
TokenContextGenerator - Interface for TokenizerME context generators.
Tokenizer - The interface for tokenizers, which segment a string into its tokens.
TokenizerEvaluationMonitor
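
A hedged sketch of detokenization through the Detokenizer interface, using the rule based DictionaryDetokenizer implementation listed below; the dictionary file name latin-detokenizer.xml is hypothetical:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.tokenize.DetokenizationDictionary;
    import opennlp.tools.tokenize.Detokenizer;
    import opennlp.tools.tokenize.DictionaryDetokenizer;

    public class DetokenizeExample {
        public static void main(String[] args) throws Exception {
            String[] tokens = {"He", "said", "\"", "hello", "\"", "."};

            // The dictionary states, per token, how it attaches to its neighbors.
            try (InputStream dictIn = new FileInputStream("latin-detokenizer.xml")) {
                Detokenizer detokenizer =
                    new DictionaryDetokenizer(new DetokenizationDictionary(dictIn));

                // One DetokenizationOperation per token, e.g. MERGE_TO_LEFT for
                // the closing quote and the final period.
                Detokenizer.DetokenizationOperation[] ops = detokenizer.detokenize(tokens);
            }
        }
    }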
Class Summary

DefaultTokenContextGenerator - Generates events for maxent decisions for tokenization.
DetokenizationDictionary
DetokenizerEvaluator - The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference TokenSamples.
DictionaryDetokenizer - A rule based detokenizer.
SimpleTokenizer - Performs tokenization using character classes.
TokenizerCrossValidator
TokenizerEvaluator - The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference TokenSamples.
TokenizerFactory - The factory that provides Tokenizer default implementations and resources.
TokenizerME - A Tokenizer for converting raw text into separated tokens.
TokenizerModel - The TokenizerModel is the model used by a learnable Tokenizer.
TokenizerStream - The TokenizerStream uses a tokenizer to tokenize the input string and outputs TokenSamples.
TokenSample - A TokenSample is text with token spans.
TokenSampleStream - This class is a stream filter which reads in string encoded samples and creates TokenSamples out of them.
TokSpanEventStream - This class reads TokenSamples from the given Iterator and converts them into Events which can be used by the maxent library for training.
WhitespaceTokenizer - This tokenizer uses white spaces to tokenize the input text.
WhitespaceTokenStream - This stream formats TokenSamples into whitespace separated token strings.
WordpieceTokenizer - A WordPiece tokenizer.
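
Several of the classes above (TokenSampleStream, TokSpanEventStream, TokenizerFactory, TokenizerModel) exist to support training a learnable Tokenizer. A hedged training sketch, assuming the ObjectStream based training API; the training file name en-token.train is hypothetical, with one string encoded sample per line using <SPLIT> markers:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerFactory;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainTokenizerExample {
        public static void main(String[] args) throws Exception {
            // One string encoded sample per line; TokenSampleStream parses them.
            ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("en-token.train")),
                StandardCharsets.UTF_8);

            try (ObjectStream<TokenSample> samples = new TokenSampleStream(lines)) {
                // Language "en", no abbreviation dictionary, no alphanumeric
                // optimization, default alphanumeric pattern.
                TokenizerFactory factory = new TokenizerFactory("en", null, false, null);

                TokenizerModel model =
                    TokenizerME.train(samples, factory, TrainingParameters.defaultParams());

                try (OutputStream out = new FileOutputStream("en-token.bin")) {
                    model.serialize(out);
                }
            }
        }
    }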
Enum Summary

DetokenizationDictionary.Operation
Detokenizer.DetokenizationOperation - This enum contains an operation for every token to merge the tokens together to their detokenized form.