Interface | Description |
---|---|
Detokenizer | A Detokenizer merges tokens back to their untokenized representation. |
TokenContextGenerator | Interface for TokenizerME context generators. |
Tokenizer | The interface for tokenizers, which segment a string into its tokens. |
TokenizerEvaluationMonitor | |
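The Tokenizer interface segments a string into its tokens. The plain-Java sketch below shows one way such a contract can be shaped: the tokenizer reports character offsets into the original text, and the token strings are derived from those offsets. The method names `tokenize` and `tokenizePos` are modeled on the pair OpenNLP's Tokenizer is commonly used through, but `SketchTokenizer`, `TokenSpan`, and `WhitespaceSketchTokenizer` are hypothetical names invented for illustration, not OpenNLP classes. The implementation splits on whitespace only, the strategy the WhitespaceTokenizer entry describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demo entry point (first class in the file).
class TokenizerContractDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new WhitespaceSketchTokenizer();
        String text = "Hello,  world!";
        // Each span is a pair of offsets into the original text.
        for (TokenSpan span : tokenizer.tokenizePos(text)) {
            System.out.println(span.start + "-" + span.end + " " + span.coveredText(text));
        }
        // The token strings are derived from the spans.
        System.out.println(Arrays.toString(tokenizer.tokenize(text)));
    }
}

// Hypothetical span type: the half-open character range [start, end).
final class TokenSpan {
    final int start;
    final int end;

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    String coveredText(String text) {
        return text.substring(start, end);
    }
}

// Hypothetical tokenizer contract: report offsets, derive strings from them.
interface SketchTokenizer {
    TokenSpan[] tokenizePos(String text);

    default String[] tokenize(String text) {
        TokenSpan[] spans = tokenizePos(text);
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = spans[i].coveredText(text);
        }
        return tokens;
    }
}

// Whitespace-only segmentation: every maximal run of non-whitespace is a token.
final class WhitespaceSketchTokenizer implements SketchTokenizer {
    @Override
    public TokenSpan[] tokenizePos(String text) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1; // -1 means we are not inside a token
        for (int i = 0; i < text.length(); i++) {
            boolean space = Character.isWhitespace(text.charAt(i));
            if (space && start >= 0) {
                spans.add(new TokenSpan(start, i)); // close the current token
                start = -1;
            } else if (!space && start < 0) {
                start = i; // open a new token
            }
        }
        if (start >= 0) {
            spans.add(new TokenSpan(start, text.length())); // token runs to end of text
        }
        return spans.toArray(new TokenSpan[0]);
    }
}
```

Keeping offsets rather than only strings means the original spacing (here the double space before "world!") can still be recovered, which is also what makes span-based samples and evaluation possible.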
Class | Description |
---|---|
DefaultTokenContextGenerator | Generates events for maxent decisions for tokenization. |
DetokenizationDictionary | |
DetokenizerEvaluator | The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference TokenSamples. |
DictionaryDetokenizer | A rule-based detokenizer. |
SimpleTokenizer | Performs tokenization using character classes. |
TokenizerCrossValidator | |
TokenizerEvaluator | The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference TokenSamples. |
TokenizerFactory | The factory that provides Tokenizer default implementations and resources. |
TokenizerME | A Tokenizer for converting raw text into separated tokens. |
TokenizerModel | The TokenizerModel is the model used by a learnable Tokenizer. |
TokenizerStream | The TokenizerStream uses a tokenizer to tokenize the input string and output TokenSamples. |
TokenSample | A TokenSample is text with token spans. |
TokenSampleStream | This class is a stream filter that reads in string-encoded samples and creates TokenSamples out of them. |
TokSpanEventStream | This class reads the TokenSamples from the given Iterator and converts them into Events that can be used by the maxent library for training. |
WhitespaceTokenizer | This tokenizer uses whitespace to tokenize the input text. |
WhitespaceTokenStream | This stream formats TokenSamples into whitespace-separated token strings. |
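SimpleTokenizer is described as performing tokenization using character classes. The sketch below illustrates that idea under stated assumptions: it uses four classes (letter, digit, whitespace, other), starts a new token whenever the class changes, drops whitespace, and keeps a run of the same punctuation character (such as "...") together. These rules and the name `CharClassTokenizer` are a simplification chosen for illustration, not a copy of SimpleTokenizer's actual rules.

```java
import java.util.ArrayList;
import java.util.List;

class CharClassTokenizerDemo {
    public static void main(String[] args) {
        System.out.println(CharClassTokenizer.tokenize("Hello, world 42!"));
        System.out.println(CharClassTokenizer.tokenize("Wait... version 2.0"));
    }
}

// Hypothetical character-class tokenizer (illustration only, not SimpleTokenizer itself).
final class CharClassTokenizer {
    enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    static CharClass classify(char c) {
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        return CharClass.OTHER;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass previousClass = null;
        char previousChar = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            // A class change starts a new token; so does a different punctuation char,
            // while a run of the same punctuation char ("...") stays together.
            boolean boundary = cls != previousClass
                    || (cls == CharClass.OTHER && c != previousChar);
            if (boundary && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != CharClass.WHITESPACE) {
                current.append(c); // whitespace separates tokens but is never emitted
            }
            previousClass = cls;
            previousChar = c;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Note that this sketch splits "2.0" into "2", ".", and "0": rules based purely on character classes cannot tell a decimal point from a sentence-final period.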
Enum | Description |
---|---|
DetokenizationDictionary.Operation | |
Detokenizer.DetokenizationOperation | This enum contains an operation for every token to merge the tokens together to their detokenized form. |
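Detokenizer.DetokenizationOperation assigns one operation to every token, and DictionaryDetokenizer is described as rule based. The sketch below combines the two ideas: a small rule dictionary maps tokens to merge operations, and the tokens are then joined with single spaces except where an operation says a token attaches to its neighbor. The enum `MergeOp`, the class `RuleDetokenizer`, the rule set, and the merge condition are all hypothetical choices for illustration; they are not the DetokenizationDictionary format or the library API.

```java
import java.util.Arrays;
import java.util.Map;

class DetokenizerDemo {
    public static void main(String[] args) {
        String[] tokens = {"Hello", ",", "world", "(", "again", ")", "!"};
        System.out.println(Arrays.toString(RuleDetokenizer.operations(tokens)));
        System.out.println(RuleDetokenizer.detokenize(tokens)); // Hello, world (again)!
    }
}

// Hypothetical per-token merge operations (illustration only).
enum MergeOp { MERGE_TO_LEFT, MERGE_TO_RIGHT, MERGE_BOTH, NO_OPERATION }

final class RuleDetokenizer {
    // Hypothetical rule dictionary: which tokens attach to their neighbors.
    private static final Map<String, MergeOp> RULES = Map.of(
            ",", MergeOp.MERGE_TO_LEFT,
            ".", MergeOp.MERGE_TO_LEFT,
            "!", MergeOp.MERGE_TO_LEFT,
            ")", MergeOp.MERGE_TO_LEFT,
            "(", MergeOp.MERGE_TO_RIGHT,
            "-", MergeOp.MERGE_BOTH);

    // One operation for every token, defaulting to NO_OPERATION.
    static MergeOp[] operations(String[] tokens) {
        MergeOp[] ops = new MergeOp[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            ops[i] = RULES.getOrDefault(tokens[i], MergeOp.NO_OPERATION);
        }
        return ops;
    }

    // Joins tokens with single spaces, except where an operation says to merge.
    static String detokenize(String[] tokens) {
        MergeOp[] ops = operations(tokens);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            out.append(tokens[i]);
            if (i + 1 < tokens.length && !merges(ops[i], ops[i + 1])) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    // True if the gap between a token (left) and its successor (right) should vanish.
    private static boolean merges(MergeOp left, MergeOp right) {
        return left == MergeOp.MERGE_TO_RIGHT || left == MergeOp.MERGE_BOTH
                || right == MergeOp.MERGE_TO_LEFT || right == MergeOp.MERGE_BOTH;
    }
}
```

Separating the per-token operations from the final join keeps the rules inspectable: the operation array alone already determines where spaces are dropped.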
Tokenizer implementations in this package include the TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer, which is a character class tokenizer.

Copyright © 2021 The Apache Software Foundation. All rights reserved.