Class ThreadSafeTokenizerME

java.lang.Object
opennlp.tools.tokenize.ThreadSafeTokenizerME
All Implemented Interfaces:
AutoCloseable, Tokenizer

@ThreadSafe public class ThreadSafeTokenizerME extends Object implements Tokenizer, AutoCloseable
A thread-safe version of TokenizerME. Using it is completely transparent. You can also use it in a single-threaded context, where it incurs only minimal overhead.

Note, however, that this implementation uses a ThreadLocal. Although the implementation is lightweight because the model is not duplicated, if you have many long-running threads, you may run into memory problems.

Be careful when using this in a Jakarta EE application, for example.

The user is responsible for clearing the ThreadLocal.
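The notes above can be illustrated with a minimal, JDK-only sketch of the ThreadLocal pattern. It is not the OpenNLP source: SketchWorker, ThreadLocalSketch, and the created counter are hypothetical names introduced for illustration, and the whitespace split merely stands in for real tokenization. The sketch assumes that each thread lazily receives its own stateful worker, that all workers share one model reference, and that close() removes only the calling thread's entry.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stateful, non-thread-safe worker (assumption, not OpenNLP code). */
final class SketchWorker {
    private final String sharedModel; // reference to the shared model; never copied per thread
    private int calls;                // mutable per-instance state, unsafe to share across threads

    SketchWorker(String sharedModel) {
        this.sharedModel = sharedModel;
    }

    String[] tokenize(String s) {
        calls++;
        // Placeholder logic: split on runs of whitespace.
        return s.trim().split("\\s+");
    }

    String model() {
        return sharedModel;
    }

    int calls() {
        return calls;
    }
}

/** Hypothetical wrapper that keeps one worker per thread in a ThreadLocal. */
final class ThreadLocalSketch implements AutoCloseable {
    private final String model;
    private final ThreadLocal<SketchWorker> local = new ThreadLocal<>();
    final AtomicInteger created = new AtomicInteger(); // counts workers built so far

    ThreadLocalSketch(String model) {
        this.model = model;
    }

    private SketchWorker worker() {
        SketchWorker w = local.get();
        if (w == null) {
            // First call on this thread: build a worker around the shared model.
            w = new SketchWorker(model);
            created.incrementAndGet();
            local.set(w);
        }
        return w;
    }

    String[] tokenize(String s) {
        return worker().tokenize(s);
    }

    /** Removes the worker of the calling thread only; other threads keep theirs. */
    @Override
    public void close() {
        local.remove();
    }
}
```

Under these assumptions, a worker stays reachable for as long as its thread lives unless that thread calls close(). This is why pooled, long-running threads, such as those managed by an application server, can accumulate workers that are never released.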
  • Constructor Details

    • ThreadSafeTokenizerME

      public ThreadSafeTokenizerME(TokenizerModel model)
  • Method Details

    • tokenize

      public String[] tokenize(String s)
      Description copied from interface: Tokenizer
      Splits a string into its atomic parts.
      Specified by:
      tokenize in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The String[] with the individual tokens as the array elements.
    • tokenizePos

      public Span[] tokenizePos(String s)
      Description copied from interface: Tokenizer
      Finds the boundaries of atomic parts in a string.
      Specified by:
      tokenizePos in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The spans (offsets into s) for each token as the individual array elements.
    • getProbabilities

      public double[] getProbabilities()
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
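The relationship between tokenize and tokenizePos, where one returns token strings and the other returns offsets into s, can be sketched without the library. The example below is a hedged illustration only: SpanSketch and OffsetDemo are hypothetical names, the whitespace rule is a stand-in for the model-based decisions of the real tokenizer, and the start-inclusive, end-exclusive convention is assumed to mirror OpenNLP's Span.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical offset pair: start is inclusive, end is exclusive (assumed convention). */
final class SpanSketch {
    final int start;
    final int end;

    SpanSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class OffsetDemo {

    /** Finds the boundaries of whitespace-separated parts of s. */
    static List<SpanSketch> whitespaceSpans(String s) {
        List<SpanSketch> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= s.length(); i++) {
            boolean boundary = i == s.length() || Character.isWhitespace(s.charAt(i));
            if (!boundary && start < 0) {
                start = i;                          // a token begins here
            } else if (boundary && start >= 0) {
                spans.add(new SpanSketch(start, i)); // the token ends before index i
                start = -1;
            }
        }
        return spans;
    }

    /** Recovers the token strings by slicing s at each span's offsets. */
    static String[] toTokens(String s, List<SpanSketch> spans) {
        String[] tokens = new String[spans.size()];
        for (int i = 0; i < tokens.length; i++) {
            SpanSketch span = spans.get(i);
            tokens[i] = s.substring(span.start, span.end);
        }
        return tokens;
    }
}
```

Keeping offsets rather than strings preserves each token's position in the original input, which is useful when results must be mapped back onto the source text.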