Class EmptyLinePreprocessorStream

  • All Implemented Interfaces:
    AutoCloseable, ObjectStream<String>

    public class EmptyLinePreprocessorStream
    extends FilterObjectStream<String,String>
    Stream that cleans up empty lines in document streams whose documents are separated by empty lines.
    - Skips empty lines at the start of the training data
    - Collapses multiple consecutive empty lines into one
    - Replaces whitespace-only lines with empty lines
    - TODO: Terminates the last document with an empty line if it is missing
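
The cleanup rules listed above can be illustrated with a small, self-contained sketch. It is not the library's source: `EmptyLineCleanupSketch` and `clean` are hypothetical names, and the sketch works on an in-memory list rather than an `ObjectStream<String>`. The TODO item (terminating the last document) is deliberately left out, since the documentation marks it as not yet done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyLineCleanupSketch {

    // Hypothetical helper, not part of OpenNLP: applies the three documented
    // cleanup rules to an in-memory list of lines.
    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Start as if an empty line was just seen, so leading blank lines are skipped.
        boolean lastLineWasEmpty = true;
        for (String line : lines) {
            // A whitespace-only line counts as an empty line.
            boolean blank = line.trim().isEmpty();
            if (blank) {
                // Emit a single "" per run of blank lines, and none at the start.
                if (!lastLineWasEmpty) {
                    out.add("");
                }
                lastLineWasEmpty = true;
            } else {
                out.add(line);
                lastLineWasEmpty = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("", "doc one", "   ", "", "doc two");
        System.out.println(clean(raw)); // prints [doc one, , doc two]
    }
}
```

In this sketch the leading empty line is dropped, and the whitespace-only line plus the empty line after it become one empty separator between the two documents.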

    This stream should be used by components that use empty lines to mark document boundaries.

    Note: This class is not thread-safe.
    Do not use this class; it is for internal use only!

    • Constructor Detail

      • EmptyLinePreprocessorStream

        public EmptyLinePreprocessorStream(ObjectStream<String> in)
    • Method Detail

      • read

        public String read()
                    throws IOException
        Description copied from interface: ObjectStream
        Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
        Returns:
        the next object or null to signal that the stream is exhausted
        Throws:
        IOException - if there is an error during reading
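
The read-until-null contract described above can be sketched as a simple loop. To keep the example runnable without the library, `LineSource` below is a hypothetical stand-in that models only the `read()` method of `ObjectStream<String>`; `ReadLoopSketch`, `drain`, and `fromList` are likewise illustrative names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    // Hypothetical stand-in for ObjectStream<String>; only read() is modeled.
    interface LineSource {
        String read() throws IOException;
    }

    // Collects every object by calling read() until it returns null.
    static List<String> drain(LineSource source) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = source.read()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Wraps an in-memory list as a LineSource that returns null once exhausted.
    static LineSource fromList(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        List<String> cleaned = Arrays.asList("doc one", "", "doc two");
        System.out.println(drain(fromList(cleaned)).size()); // prints 3
    }
}
```

An empty string is a valid return value here (it marks a document boundary), so the loop must compare against `null` rather than test for an empty line.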