Package opennlp.tools.formats.ad
Class ADChunkSampleStream
java.lang.Object
opennlp.tools.formats.ad.ADChunkSampleStream
- All Implemented Interfaces:
AutoCloseable, ObjectStream<ChunkSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, with output intended for training the Portuguese Chunker.
The heuristics used to extract chunks are based on the paper 'A Machine Learning
Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero
Santos and Ruy Milidiú).
Data can be found on this web site.
Information about the format:
Susana Afonso.
"Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 de Fevereiro de 2006.
Detailed info about the NER tagset.
Note: Do not use this class, internal use only!
-
Field Summary
-
Constructor Summary
Constructor / Description
ADChunkSampleStream(InputStreamFactory in, String charsetName)
Instantiates an ADChunkSampleStream stream from an InputStreamFactory.
ADChunkSampleStream(ObjectStream<String> lineStream)
Instantiates an ADChunkSampleStream stream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
-
Method Summary
Modifier and Type / Method / Description
void close()
Closes the ObjectStream and releases all allocated resources.
static String convertFuncTag(String t, boolean useCGTags)
ChunkSample read()
Returns the next ObjectStream object.
void reset()
Repositions the stream at the beginning, so that the previously seen object sequence will be repeated exactly.
void setEnd(int aEnd)
void setStart(int aStart)
-
Field Details
-
OTHER
-
Constructor Details
-
ADChunkSampleStream
Instantiates an ADChunkSampleStream stream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
- Parameters:
lineStream - An ObjectStream<String> as input.
-
ADChunkSampleStream
Instantiates an ADChunkSampleStream stream from an InputStreamFactory.
- Parameters:
in - The InputStreamFactory for the corpus.
charsetName - The charset to use for reading of the corpus.
- Throws:
IOException
-
-
Method Details
-
read
Description copied from interface: ObjectStream
Returns the next ObjectStream object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
read in interface ObjectStream<ChunkSample>
- Returns:
- The next object or null to signal that the stream is exhausted.
- Throws:
IOException - Thrown if there is an error during reading.
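The read-until-null contract described above can be sketched as follows. This is a minimal illustration, not OpenNLP code: SimpleStream, fromList and drain are hypothetical stand-ins that mirror only the read() part of ObjectStream, so the example runs without the OpenNLP library on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

  /** Hypothetical stand-in for the read() part of ObjectStream<T>; not the OpenNLP interface. */
  public interface SimpleStream<T> {
    T read() throws IOException;
  }

  /** Wraps a list so that read() returns each element once, then null. */
  public static <T> SimpleStream<T> fromList(List<T> items) {
    Iterator<T> it = items.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  /** Drains the stream by calling read() until it returns null. */
  public static <T> List<T> drain(SimpleStream<T> stream) throws IOException {
    List<T> out = new ArrayList<>();
    T item;
    while ((item = stream.read()) != null) {
      out.add(item);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    SimpleStream<String> stream = fromList(List.of("s1", "s2", "s3"));
    System.out.println(drain(stream)); // prints [s1, s2, s3]
    System.out.println(stream.read()); // prints null: the stream is exhausted
  }
}
```

With the real class, the same loop shape would apply to an ObjectStream<ChunkSample>, with each non-null result being one chunk sample.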
-
convertFuncTag
-
setStart
public void setStart(int aStart)
-
setEnd
public void setEnd(int aEnd)
-
reset
Description copied from interface: ObjectStream
Repositions the stream at the beginning, so that the previously seen object sequence will be repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required.
The implementation of this method is optional.
- Specified by:
reset in interface ObjectStream<ChunkSample>
- Throws:
IOException - Thrown if there is an error during resetting the stream.
UnsupportedOperationException - Thrown if reset() is not supported. By default, this is the case.
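The optional nature of reset() can be illustrated with a small sketch. SimpleStream, ListStream and countRemaining are hypothetical names introduced here; the default method throws UnsupportedOperationException, matching the "by default" behavior described above, and the in-memory implementation overrides it to allow a second pass.

```java
import java.io.IOException;
import java.util.List;

public class ResetSketch {

  /** Hypothetical stand-in for ObjectStream<T>: reset() is optional and unsupported by default. */
  public interface SimpleStream<T> {
    T read() throws IOException;

    default void reset() throws IOException {
      throw new UnsupportedOperationException();
    }
  }

  /** In-memory stream that supports reset() by rewinding its position. */
  public static class ListStream<T> implements SimpleStream<T> {
    private final List<T> items;
    private int pos = 0;

    public ListStream(List<T> items) {
      this.items = items;
    }

    @Override
    public T read() {
      return pos < items.size() ? items.get(pos++) : null;
    }

    @Override
    public void reset() {
      pos = 0;
    }
  }

  /** Counts the objects still available by reading until null. */
  public static <T> int countRemaining(SimpleStream<T> stream) throws IOException {
    int n = 0;
    while (stream.read() != null) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    ListStream<String> stream = new ListStream<>(List.of("a", "b", "c"));
    System.out.println(countRemaining(stream)); // prints 3 (first pass)
    stream.reset();
    System.out.println(countRemaining(stream)); // prints 3 (second pass after reset)
  }
}
```

Code that needs several passes over a stream of unknown type may want to be prepared for UnsupportedOperationException rather than assume reset() is available.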
-
close
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, it is not allowed to call ObjectStream.read() or ObjectStream.reset().
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface ObjectStream<ChunkSample>
- Throws:
IOException - Thrown if there is an error during closing the stream.
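Because the stream is an AutoCloseable, a try-with-resources block is one way to make sure close() runs once reading is finished. The sketch below uses hypothetical stand-in types (SimpleStream, TrackedStream, countAndClose) rather than the OpenNLP classes, and only records whether close() was called; it does not model what happens if read() is called afterwards.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class CloseSketch {

  /** Hypothetical stand-in for ObjectStream<T>; not the OpenNLP interface. */
  public interface SimpleStream<T> extends AutoCloseable {
    T read() throws IOException;

    @Override
    void close() throws IOException;
  }

  /** In-memory stream that records whether close() was called. */
  public static class TrackedStream implements SimpleStream<String> {
    private final Iterator<String> it;
    private boolean closed = false;

    public TrackedStream(List<String> items) {
      this.it = items.iterator();
    }

    @Override
    public String read() {
      return it.hasNext() ? it.next() : null;
    }

    @Override
    public void close() {
      closed = true;
    }

    public boolean isClosed() {
      return closed;
    }
  }

  /** Reads everything inside try-with-resources and returns the item count. */
  public static int countAndClose(TrackedStream stream) throws IOException {
    int n = 0;
    try (SimpleStream<String> s = stream) {
      while (s.read() != null) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    TrackedStream stream = new TrackedStream(List.of("a", "b"));
    System.out.println(countAndClose(stream)); // prints 2
    System.out.println(stream.isClosed());     // prints true
  }
}
```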
-