LeipzigDoccatSampleStream (Apache OpenNLP Tools 1.5.3 API)

opennlp.tools.formats
Class LeipzigDoccatSampleStream

java.lang.Object
  opennlp.tools.util.FilterObjectStream<String,DocumentSample>
      opennlp.tools.formats.LeipzigDoccatSampleStream

All Implemented Interfaces: ObjectStream<DocumentSample>

public class LeipzigDoccatSampleStream
extends FilterObjectStream<String,DocumentSample>

Stream filter that produces document samples from a Leipzig sentences.txt file. In the Leipzig corpus, the encoding of each sentences.txt file is determined by its language. The language must therefore be specified: it is used to produce the category tags and to determine the correct input encoding.

The input text is tokenized with the SimpleTokenizer. Text that is later classified by the language model must also be tokenized with the SimpleTokenizer, so that tokenization is exactly the same during training and testing.
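
The tokenization requirement above can be illustrated with a minimal, library-free sketch. It assumes a simplified character-class tokenizer as a stand-in for the SimpleTokenizer; `CharClassTokenizerSketch`, `tokenize` and `normalize` are hypothetical names and are not part of the OpenNLP API, and the real tokenizer may split some inputs differently. The point is only that one and the same function should prepare both the training text and the text to be classified.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: NOT the OpenNLP SimpleTokenizer.
public class CharClassTokenizerSketch {

    // Coarse character classes; a token is a maximal run of one class.
    private enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Splits text wherever the character class changes and drops whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the open token, -1 if none is open
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass cc = classify(text.charAt(i));
            if (cc != current) {
                // Class changed: close the open token, if any.
                if (start >= 0) tokens.add(text.substring(start, i));
                start = (cc == CharClass.WHITESPACE) ? -1 : i;
                current = cc;
            }
        }
        if (start >= 0) tokens.add(text.substring(start));
        return tokens;
    }

    // Use this single entry point for training text AND for text to classify,
    // so both sides see identical token boundaries.
    public static String normalize(String text) {
        return String.join(" ", tokenize(text));
    }
}
```

With this sketch, `normalize("Hello, world 42!")` yields `Hello , world 42 !`. Applying the same normalization on both sides keeps the features seen at classification time aligned with those seen in training.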

Method Summary
`DocumentSample`	`read()` Returns the next object.

Methods inherited from class opennlp.tools.util.FilterObjectStream
`close, reset`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Method Detail

read

public DocumentSample read()
                    throws IOException

Description copied from interface: ObjectStream

Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.

Returns: the next object or null to signal that the stream is exhausted
Throws: IOException
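
The read-until-null contract described above can be sketched without the library. The `Source` interface below is a hypothetical stand-in that mirrors only the `read()` part of `ObjectStream`; `ReadUntilNullSketch`, `drain`, `lines` and `drainLines` are illustrative names, not OpenNLP API. A caller of the real stream would follow the same loop shape.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadUntilNullSketch {

    // Hypothetical stand-in for the read() part of the ObjectStream contract.
    public interface Source<T> {
        T read() throws IOException; // null signals that the source is exhausted
    }

    // Reads until null, so each object is collected exactly once.
    public static <T> List<T> drain(Source<T> source) throws IOException {
        List<T> items = new ArrayList<>();
        T item;
        while ((item = source.read()) != null) {
            items.add(item);
        }
        return items;
    }

    // A line-based source over in-memory text; readLine() returns null at the end.
    public static Source<String> lines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader::readLine;
    }

    // Convenience wrapper that rethrows IOException as an unchecked exception.
    public static List<String> drainLines(String text) {
        try {
            return drain(lines(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `null` is the end-of-stream signal, the loop needs no separate `hasNext()` check, and an empty source produces an empty list.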