Package opennlp.tools.tokenize
Class ThreadSafeTokenizerME
java.lang.Object
opennlp.tools.tokenize.ThreadSafeTokenizerME
All Implemented Interfaces:
AutoCloseable, Probabilistic, Tokenizer
@ThreadSafe
public class ThreadSafeTokenizerME
extends Object
implements Tokenizer, Probabilistic, AutoCloseable
A thread-safe version of TokenizerME. Using it is completely transparent.
 You can also use it in a single-threaded context, where it incurs only minimal overhead.
 
 Note:
 This implementation uses a ThreadLocal. Although the implementation is
 lightweight because the model is not duplicated, if you have many long-running threads,
 you may run into memory problems.
 
Be careful when using this in a Jakarta EE application, for example.
The user is responsible for clearing the ThreadLocal by calling close().
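The note above can be illustrated with a small, self-contained sketch of the general ThreadLocal pattern. This is not OpenNLP's source code: StatefulTokenizer, ThreadLocalTokenizer and ThreadLocalDemo are hypothetical stand-ins, the "model" is just a shared string, and tokenization is plain whitespace splitting. The sketch also assumes that close() clears only the entry of the calling thread, which is how ThreadLocal.remove() behaves.

```java
/** Hypothetical stand-in for a tokenizer that is NOT thread-safe (it keeps per-call state). */
final class StatefulTokenizer {
    private final String sharedModel; // stands in for the read-only model, shared by reference
    private int lastTokenCount;       // mutable per-call state, unsafe to share across threads

    StatefulTokenizer(String sharedModel) {
        this.sharedModel = sharedModel;
    }

    String[] tokenize(String s) {
        String trimmed = s.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        lastTokenCount = tokens.length;
        return tokens;
    }

    int lastTokenCount() { return lastTokenCount; }

    String model() { return sharedModel; }
}

/** Hypothetical wrapper: one StatefulTokenizer per thread, all referring to the same model. */
final class ThreadLocalTokenizer implements AutoCloseable {
    private final String sharedModel;
    private final ThreadLocal<StatefulTokenizer> delegate = new ThreadLocal<>();

    ThreadLocalTokenizer(String sharedModel) {
        this.sharedModel = sharedModel;
    }

    private StatefulTokenizer current() {
        StatefulTokenizer t = delegate.get();
        if (t == null) {
            // Created lazily, once per thread; the model itself is not copied.
            t = new StatefulTokenizer(sharedModel);
            delegate.set(t);
        }
        return t;
    }

    String[] tokenize(String s) { return current().tokenize(s); }

    boolean hasInstanceForCurrentThread() { return delegate.get() != null; }

    /** Removes the per-thread instance of the CALLING thread only. */
    @Override
    public void close() { delegate.remove(); }
}

final class ThreadLocalDemo {
    /** Runs one unit of work on a worker thread and reports whether it cleaned up after itself. */
    static boolean workerCleansUp(ThreadLocalTokenizer tokenizer, String text) {
        boolean[] cleared = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                tokenizer.tokenize(text);
            } finally {
                tokenizer.close(); // clear this thread's entry before the thread is reused or ends
            }
            cleared[0] = !tokenizer.hasInstanceForCurrentThread();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cleared[0];
    }
}
```

In a pooled-thread environment such as an application server, threads outlive individual requests, so a per-thread entry stays reachable until it is removed. Under the assumptions of this sketch, calling close() in a finally block at the end of each unit of work keeps such entries from accumulating.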
Constructor Summary
Constructors
- ThreadSafeTokenizerME(String language): Initializes a ThreadSafeTokenizerME by downloading a default model for a given language.
- ThreadSafeTokenizerME(TokenizerModel model): Initializes a ThreadSafeTokenizerME with the specified model.
- ThreadSafeTokenizerME(TokenizerModel model, Dictionary abbDict): Instantiates a ThreadSafeTokenizerME with an existing TokenizerModel.

Method Summary
- void close()
- double[] getProbabilities(): Deprecated, for removal: This API element is subject to removal in a future version.
- double[] probs(): Retrieves the probabilities of the last decoded sequence.
- String[] tokenize(String s): Splits a string into its atomic parts.
- Span[] tokenizePos(String s): Finds the boundaries of atomic parts in a string.

Constructor Details

ThreadSafeTokenizerME(String language)
Initializes a ThreadSafeTokenizerME by downloading a default model for a given language.
Parameters:
- language - An ISO-conformant language code.
Throws:
- IOException - Thrown if the model could not be downloaded or saved.
 
ThreadSafeTokenizerME(TokenizerModel model)
Initializes a ThreadSafeTokenizerME with the specified model.
Parameters:
- model - A valid TokenizerModel.
 
ThreadSafeTokenizerME(TokenizerModel model, Dictionary abbDict)
Instantiates a ThreadSafeTokenizerME with an existing TokenizerModel.
Parameters:
- model - The TokenizerModel to be used.
- abbDict - The Dictionary to be used. It must fit the language of the model.
 
 
Method Details

tokenize
Description copied from interface: Tokenizer
Splits a string into its atomic parts.

tokenizePos
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
Specified by:
- tokenizePos in interface Tokenizer
Parameters:
- s - The string to be tokenized.
Returns:
- The spans (offsets into s) for each token as the individual array elements.
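To show what "offsets into s" means in practice, here is a minimal sketch that computes token boundaries and maps them back to substrings. It does not use OpenNLP: OffsetSpan and SpanSketch are hypothetical stand-ins, the splitting is whitespace-only rather than model-based, and the sketch assumes a start-inclusive, end-exclusive offset convention.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token span: offsets into the original string. */
final class OffsetSpan {
    final int start; // inclusive
    final int end;   // exclusive (assumption of this sketch)

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Recovers the token text from the string the offsets refer to. */
    String coveredText(String s) {
        return s.substring(start, end);
    }
}

final class SpanSketch {
    /** Whitespace-only boundary detection; a stand-in for a real, model-based tokenizer. */
    static List<OffsetSpan> whitespaceSpans(String s) {
        List<OffsetSpan> spans = new ArrayList<>();
        int start = -1; // -1 means "currently not inside a token"
        for (int i = 0; i < s.length(); i++) {
            boolean ws = Character.isWhitespace(s.charAt(i));
            if (!ws && start < 0) {
                start = i;                          // a token begins here
            } else if (ws && start >= 0) {
                spans.add(new OffsetSpan(start, i)); // the token ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new OffsetSpan(start, s.length())); // token running to the end of the input
        }
        return spans;
    }
}
```

Unlike a String[] of tokens, offsets keep the link to the original text, so whitespace and exact positions are not lost. That is useful when tokens must be highlighted or aligned with other annotations on the same string.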
 
probs
public double[] probs()
Description copied from interface: Probabilistic
Retrieves the probabilities of the last decoded sequence.
Specified by:
- probs in interface Probabilistic
Returns:
- An array with one probability for each token sent to the computational method when it was last called.
 
getProbabilities
Deprecated, for removal: This API element is subject to removal in a future version.
Use probs() instead.

close
public void close()
Specified by:
- close in interface AutoCloseable
 
 