public class LeipzigDoccatSampleStream extends FilterObjectStream<String,DocumentSample>
The input text is tokenized with the SimpleTokenizer. The input text classified by the language model must also be tokenized by the SimpleTokenizer to produce exactly the same tokenization during testing and training.
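To illustrate why the same tokenizer must be used at training and classification time, the following minimal sketch compares two tokenization schemes on the same input. It does not use OpenNLP: `charClassTokenize` is a hypothetical, simplified stand-in that splits whenever the character class changes (a rough approximation of what SimpleTokenizer does, not its actual implementation), and `whitespaceTokenize` is an arbitrary alternative chosen only for contrast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizationMismatchDemo {

    // Hypothetical stand-in for a character-class tokenizer: a new token starts
    // whenever the class (letter, digit, other) changes; whitespace only separates.
    public static List<String> charClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (char c : text.toCharArray()) {
            int charClass = Character.isWhitespace(c) ? 0
                    : Character.isLetter(c) ? 1
                    : Character.isDigit(c) ? 2 : 3;
            // Flush the pending token when the character class changes.
            if (charClass != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (charClass != 0) {
                current.append(c);
            }
            previousClass = charClass;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // A different scheme used only for contrast: split on whitespace.
    public static List<String> whitespaceTokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        System.out.println(charClassTokenize(text));
        System.out.println(whitespaceTokenize(text));
    }
}
```

For the input `Hello, world!` the first scheme yields four tokens (`Hello`, `,`, `world`, `!`) while the second yields two (`Hello,`, `world!`). A model trained on one token set would see unfamiliar tokens if the text to classify were split with the other.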
Constructor and Description |
---|
LeipzigDoccatSampleStream(String language, int sentencesPerDocument, InputStreamFactory in) Creates a new LeipzigDoccatSampleStream with the specified parameters. |
LeipzigDoccatSampleStream(String language, int sentencesPerDocument, Tokenizer tokenizer, InputStreamFactory in) Creates a new LeipzigDoccatSampleStream with the specified parameters. |
Modifier and Type | Method and Description |
---|---|
DocumentSample | read() Returns the next object. |
Methods inherited from class FilterObjectStream: close, reset
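The role of sentencesPerDocument in read() can be sketched without OpenNLP as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes each line of sentences.txt has the layout `<sentence number><TAB><sentence text>`, uses whitespace splitting as a placeholder for the real tokenizer, and the names `SentenceGroupingSketch`, `readDocument`, and `readAll` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SentenceGroupingSketch {

    // Reads up to sentencesPerDocument lines and returns their tokens as one
    // document, or null when no more tokens can be read.
    public static List<String> readDocument(BufferedReader lines, int sentencesPerDocument)
            throws IOException {
        List<String> documentTokens = new ArrayList<>();
        String line;
        int count = 0;
        while (count < sentencesPerDocument && (line = lines.readLine()) != null) {
            // Assumption: the line is "<sentence number><TAB><sentence text>".
            String[] parts = line.split("\t", 2);
            String sentence = parts.length == 2 ? parts[1] : parts[0];
            // Placeholder tokenization: split on whitespace.
            for (String token : sentence.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    documentTokens.add(token);
                }
            }
            count++;
        }
        return documentTokens.isEmpty() ? null : documentTokens;
    }

    // Groups the whole input into documents by calling readDocument until it returns null.
    public static List<List<String>> readAll(String contents, int sentencesPerDocument) {
        List<List<String>> documents = new ArrayList<>();
        try (BufferedReader lines = new BufferedReader(new StringReader(contents))) {
            List<String> document;
            while ((document = readDocument(lines, sentencesPerDocument)) != null) {
                documents.add(document);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documents;
    }

    public static void main(String[] args) {
        String contents = "1\tThe cat sat .\n2\tIt purred .\n3\tThen it slept .\n";
        System.out.println(readAll(contents, 2));
    }
}
```

With three input sentences and sentencesPerDocument set to 2, the sketch produces two documents: the first combines the tokens of sentences 1 and 2, and the second holds the remaining sentence on its own.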
public LeipzigDoccatSampleStream(String language, int sentencesPerDocument, Tokenizer tokenizer, InputStreamFactory in) throws IOException
language
- the language of the Leipzig input sentences.txt file
sentencesPerDocument
- the number of sentences which should be grouped into one DocumentSample
tokenizer
- the Tokenizer used to tokenize the input sentences
in
- the InputStreamFactory providing the contents of the sentences.txt input file
IOException
- if an I/O error occurs
public LeipzigDoccatSampleStream(String language, int sentencesPerDocument, InputStreamFactory in) throws IOException
language
- the language of the Leipzig input sentences.txt file
sentencesPerDocument
- the number of sentences which should be grouped into one DocumentSample
in
- the InputStreamFactory providing the contents of the sentences.txt input file
IOException
- if an I/O error occurs
public DocumentSample read() throws IOException
Specified by: read in interface ObjectStream<DocumentSample>
IOException
- if there is an error during reading
Copyright © 2017 The Apache Software Foundation. All rights reserved.