Class TokenSampleStream

java.lang.Object
opennlp.tools.util.FilterObjectStream<String, opennlp.tools.tokenize.TokenSample>
opennlp.tools.tokenize.TokenSampleStream
All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<opennlp.tools.tokenize.TokenSample>

public class TokenSampleStream extends FilterObjectStream<String, opennlp.tools.tokenize.TokenSample>
This class is a stream filter that reads string-encoded samples and creates TokenSample objects from them. The input string is split into tokens wherever whitespace or the special separator sequence occurs.

Sample:
"token1 token2 token3<SPLIT>token4"
The tokens token1 and token2 are separated by whitespace; token3 and token4 are separated by the special separator sequence. In this case, the default split sequence (see TokenSample.DEFAULT_SEPARATOR_CHARS) applies.

Note: The separator sequence must not otherwise occur in the input string, because it cannot be escaped.
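
The splitting rule described above can be illustrated with a small plain-Java sketch. It does not use OpenNLP and is not the library's implementation: the class SplitSketch and its tokens method are hypothetical names introduced only for this illustration. It assumes that each line is first split on whitespace and that every whitespace-delimited chunk is then split on the literal separator sequence.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // Hypothetical helper, not part of OpenNLP.
    // Splits on whitespace first, then on the literal separator
    // sequence inside each whitespace-delimited chunk.
    static List<String> tokens(String sample, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> result = new ArrayList<>();
        for (String chunk : sample.trim().split("\\s+")) {
            int start = 0;
            int hit;
            // Cut the chunk at every occurrence of the separator.
            while ((hit = chunk.indexOf(separator, start)) >= 0) {
                if (hit > start) {
                    result.add(chunk.substring(start, hit));
                }
                start = hit + separator.length();
            }
            // Keep whatever follows the last separator.
            if (start < chunk.length()) {
                result.add(chunk.substring(start));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [token1, token2, token3, token4]
        System.out.println(tokens("token1 token2 token3<SPLIT>token4", "<SPLIT>"));
    }
}
```

Because the separator is matched literally and there is no escape mechanism, any occurrence of it in the raw text would be treated as a token boundary, which is why the note above requires that it not appear elsewhere in the input.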

  • Constructor Details

    • TokenSampleStream

      public TokenSampleStream(opennlp.tools.util.ObjectStream<String> samples, String separatorChars)
      Initializes an instance.
      Parameters:
      samples - A plain text line stream. Must not be null.
      separatorChars - The characters to be considered separators. See TokenSample.DEFAULT_SEPARATOR_CHARS. Must not be null.
    • TokenSampleStream

      public TokenSampleStream(opennlp.tools.util.ObjectStream<String> sentences)
      Initializes an instance.
      Parameters:
      sentences - A plain text line stream. Must not be null.
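
The relationship between the two constructors and the wrapped line stream can be sketched in plain Java as follows. This is a self-contained model, not OpenNLP code: FilterSketch, Source, TokenLineFilter and linesOf are hypothetical stand-ins, and tokens are represented as a list of strings rather than as a TokenSample. The sketch assumes that the one-argument constructor falls back to the default separator sequence and that the end of the stream is signalled by a null return value.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

class FilterSketch {

    // Hypothetical stand-in for an object stream:
    // read() returns null once the source is exhausted.
    interface Source<T> {
        T read();
    }

    // Hypothetical stand-in for the filter: wraps a line source
    // and converts each line into a list of tokens.
    static final class TokenLineFilter implements Source<List<String>> {
        // Assumed to mirror the default split sequence shown in the sample.
        static final String DEFAULT_SEPARATOR = "<SPLIT>";

        private final Source<String> lines;
        private final String separator;

        TokenLineFilter(Source<String> lines, String separator) {
            // Both arguments must not be null.
            this.lines = Objects.requireNonNull(lines, "lines");
            this.separator = Objects.requireNonNull(separator, "separator");
        }

        // Assumption: the shorter constructor uses the default separator.
        TokenLineFilter(Source<String> lines) {
            this(lines, DEFAULT_SEPARATOR);
        }

        @Override
        public List<String> read() {
            String line = lines.read();
            if (line == null) {
                return null; // underlying source is exhausted
            }
            // Treat the separator like whitespace, then split on whitespace.
            String spaced = line.replace(separator, " ").trim();
            return spaced.isEmpty() ? List.of() : Arrays.asList(spaced.split("\\s+"));
        }
    }

    // Hypothetical helper that exposes fixed lines as a Source.
    static Source<String> linesOf(String... lines) {
        Iterator<String> it = Arrays.asList(lines).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        Source<List<String>> samples =
                new TokenLineFilter(linesOf("token1 token2 token3<SPLIT>token4", "token5"));
        List<String> tokens;
        while ((tokens = samples.read()) != null) {
            System.out.println(tokens);
        }
    }
}
```

The filter holds no state of its own beyond the wrapped source and the separator, so each call to read() consumes exactly one input line.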
  • Method Details