Class DocumentSampleStream

All Implemented Interfaces:
AutoCloseable, ObjectStream<DocumentSample>

public class DocumentSampleStream extends FilterObjectStream<String,DocumentSample>
Reads string-encoded training samples, parses them, and outputs DocumentSample objects.

Format:
Each line contains one sample document.
The category is the first string on the line, followed by a tab character and then the whitespace-separated document tokens.

Sample line: category-string tab-char whitespace-separated-tokens line-break-char(s)
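
The line format above can be illustrated with a small standalone sketch. It does not use or reproduce the library's own parsing code: the class SampleLineSketch, the holder ParsedSample, and the method parse are hypothetical names introduced only for this example, and the handling of malformed lines (throwing IllegalArgumentException) is an assumption rather than documented behavior.

```java
import java.util.Arrays;

public class SampleLineSketch {

    // Hypothetical holder for one parsed line; this is NOT the library's DocumentSample.
    public static final class ParsedSample {
        public final String category;
        public final String[] tokens;

        ParsedSample(String category, String[] tokens) {
            this.category = category;
            this.tokens = tokens;
        }
    }

    // Splits "category<TAB>token token ..." into a category and its tokens.
    public static ParsedSample parse(String line) {
        int tab = line.indexOf('\t');
        if (tab <= 0) {
            // Assumption: a line without a leading category and a tab is rejected.
            throw new IllegalArgumentException("Expected '<category>\\t<tokens>': " + line);
        }
        String category = line.substring(0, tab);
        String rest = line.substring(tab + 1).trim();
        // Tokens are separated by one or more whitespace characters.
        String[] tokens = rest.isEmpty() ? new String[0] : rest.split("\\s+");
        return new ParsedSample(category, tokens);
    }

    public static void main(String[] args) {
        ParsedSample s = parse("sports\tthe team won the final");
        // prints: sports -> [the, team, won, the, final]
        System.out.println(s.category + " -> " + Arrays.toString(s.tokens));
    }
}
```

In this sketch the category ends at the first tab, so a category containing spaces would be kept intact. Whether the library itself treats such a category the same way is not stated on this page.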

  • Constructor Details

  • Method Details

    • read

      public DocumentSample read() throws IOException
      Description copied from interface: ObjectStream
      Returns the next object from the stream. Calling this method repeatedly until it returns null returns each object from the underlying source exactly once.
      Returns:
      The next object or null to signal that the stream is exhausted.
      Throws:
      IOException - Thrown if there is an error during reading.
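
The read-until-null contract described for read() can be sketched as a simple loop. To keep the example self-contained, it uses a hypothetical stand-in interface, NullTerminatedStream, that mirrors only the read() signature shown above; it is not the library's ObjectStream. The helper drain and the class ReadLoopSketch are also illustrative names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    // Hypothetical stand-in mirroring only the read() contract: next object, or null when exhausted.
    public interface NullTerminatedStream<T> {
        T read() throws IOException;
    }

    // Collects every object by calling read() until it returns null.
    public static <T> List<T> drain(NullTerminatedStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory source standing in for a stream of samples.
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        NullTerminatedStream<String> stream = () -> it.hasNext() ? it.next() : null;
        System.out.println(drain(stream)); // prints [a, b, c]
    }
}
```

With the real class, the same loop shape would apply to a DocumentSampleStream. Because the class implements AutoCloseable, opening it in a try-with-resources statement is a natural fit.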