Package opennlp.tools.tokenize
Class ThreadSafeTokenizerME
java.lang.Object
opennlp.tools.tokenize.ThreadSafeTokenizerME
- All Implemented Interfaces:
AutoCloseable, Probabilistic, Tokenizer
@ThreadSafe
public class ThreadSafeTokenizerME
extends Object
implements Tokenizer, Probabilistic, AutoCloseable
A thread-safe version of TokenizerME. Using it is completely transparent.
You can also use it in a single-threaded context; it incurs only minimal overhead.
Note:
This implementation uses a ThreadLocal. Although the implementation is
lightweight because the model is not duplicated, you may run into memory problems
if you have many long-running threads.
Be careful when using this in a Jakarta EE application, for example.
The user is responsible for clearing the ThreadLocal by calling close().
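The note above describes a per-thread delegate held in a ThreadLocal and released through close(). The following is a minimal sketch of that pattern using only the JDK, not the OpenNLP source: the names PerThreadDelegate and PerThreadDelegateDemo are hypothetical, and a StringBuilder stands in for the per-thread tokenizer that would wrap the shared model.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of the ThreadLocal pattern; NOT the OpenNLP implementation.
final class PerThreadDelegate<T> implements AutoCloseable {

    private final ThreadLocal<T> local = new ThreadLocal<>();
    // Builds a cheap per-thread worker; shared state (e.g. a model) is captured, not copied.
    private final Supplier<T> factory;

    PerThreadDelegate(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the calling thread's worker, creating it on first use.
    T get() {
        T worker = local.get();
        if (worker == null) {
            worker = factory.get();
            local.set(worker);
        }
        return worker;
    }

    // Clears only the calling thread's entry; other threads keep theirs.
    @Override
    public void close() {
        local.remove();
    }
}

final class PerThreadDelegateDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();
        PerThreadDelegate<StringBuilder> delegate = new PerThreadDelegate<>(() -> {
            created.incrementAndGet();
            return new StringBuilder();
        });

        Runnable task = () -> {
            try {
                delegate.get();
                delegate.get(); // second call reuses this thread's worker
            } finally {
                delegate.close(); // release the entry before the thread is reused
            }
        };

        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("workers created: " + created.get());
    }
}
```

In this sketch, close() in a finally block matters most for pooled threads, such as those in an application server, because a pooled thread outlives the task and would otherwise keep its entry alive.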
Constructor Summary
Constructors (Constructor: Description)
- ThreadSafeTokenizerME(String language): Initializes a ThreadSafeTokenizerME by downloading a default model for a given language.
- ThreadSafeTokenizerME(TokenizerModel model): Initializes a ThreadSafeTokenizerME with the specified model.
- ThreadSafeTokenizerME(TokenizerModel model, Dictionary abbDict): Instantiates a ThreadSafeTokenizerME with an existing TokenizerModel.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- void close()
- double[] getProbabilities(): Deprecated, for removal: This API element is subject to removal in a future version.
- double[] probs(): Retrieves the probabilities of the last decoded sequence.
- String[] tokenize(String s): Splits a string into its atomic parts.
- Span[] tokenizePos(String s): Finds the boundaries of atomic parts in a string.
-
Constructor Details
-
ThreadSafeTokenizerME
Initializes a ThreadSafeTokenizerME by downloading a default model for a given language.
- Parameters:
language - An ISO-conformant language code.
- Throws:
IOException - Thrown if the model could not be downloaded or saved.
-
ThreadSafeTokenizerME
Initializes a ThreadSafeTokenizerME with the specified model.
- Parameters:
model - A valid TokenizerModel.
-
ThreadSafeTokenizerME
Instantiates a ThreadSafeTokenizerME with an existing TokenizerModel.
- Parameters:
model - The TokenizerModel to be used.
abbDict - The Dictionary to be used. It must fit the language of the model.
-
-
Method Details
-
tokenize
Description copied from interface: Tokenizer
Splits a string into its atomic parts.
-
tokenizePos
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
- Specified by:
tokenizePos in interface Tokenizer
- Parameters:
s - The string to be tokenized.
- Returns:
The spans (offsets into s) for each token as the individual array elements.
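To make "offsets into s" concrete, here is a small JDK-only sketch of how a span maps back to token text. It assumes a half-open range (start inclusive, end exclusive). The Offsets class is a hypothetical stand-in for Span, and the whitespace splitter is a toy replacement for the model-driven tokenizer, used only so the example runs without a trained model.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanOffsetsDemo {

    // Hypothetical stand-in for a span: start is inclusive, end is exclusive.
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Recovers the token text from the original string.
        String coveredText(String s) {
            return s.substring(start, end);
        }
    }

    // Toy whitespace splitter (an assumption for illustration, not a trained tokenizer).
    static List<Offsets> whitespaceSpans(String s) {
        List<Offsets> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= s.length(); i++) {
            boolean boundary = i == s.length() || Character.isWhitespace(s.charAt(i));
            if (!boundary && start < 0) {
                start = i; // token begins
            } else if (boundary && start >= 0) {
                spans.add(new Offsets(start, i)); // token ends just before i
                start = -1;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "Hello  world !";
        for (Offsets o : whitespaceSpans(s)) {
            System.out.println("[" + o.start + ", " + o.end + ") " + o.coveredText(s));
        }
    }
}
```

Because the offsets refer to the untouched input, callers can highlight or annotate tokens in the original text, which the plain String[] from tokenize does not allow.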
-
probs
public double[] probs()
Description copied from interface: Probabilistic
Retrieves the probabilities of the last decoded sequence.
- Specified by:
probs in interface Probabilistic
- Returns:
An array with one probability for each token sent to the computational method when it was last called.
-
getProbabilities
Deprecated, for removal: This API element is subject to removal in a future version.
Use probs() instead.
-
close
public void close()
- Specified by:
close in interface AutoCloseable