Package opennlp.tools.formats.ad
Class ADChunkSampleStream
- java.lang.Object
-
- opennlp.tools.formats.ad.ADChunkSampleStream
-
- All Implemented Interfaces:
AutoCloseable, ObjectStream<ChunkSample>
public class ADChunkSampleStream extends Object implements ObjectStream<ChunkSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for training the Portuguese chunker. The heuristics used to extract chunks are based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).
Data can be found on this web site: http://www.linguateca.pt/floresta/corpus.html
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica". 12 de Fevereiro de 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf
Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names
Note: Do not use this class, internal use only!
-
-
Constructor Summary
Constructor Description
ADChunkSampleStream(InputStreamFactory in, String charsetName)
ADChunkSampleStream(ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.
-
Method Summary
Modifier and Type Method Description
void close()
Closes the ObjectStream and releases all allocated resources.
static String convertFuncTag(String t, boolean useCGTags)
ChunkSample read()
Returns the next object.
void reset()
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly.
void setEnd(int aEnd)
void setStart(int aStart)
-
-
-
Field Detail
-
OTHER
public static final String OTHER
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ADChunkSampleStream
public ADChunkSampleStream(ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.
- Parameters:
lineStream - a stream of lines as String
-
ADChunkSampleStream
public ADChunkSampleStream(InputStreamFactory in, String charsetName) throws IOException
- Throws:
IOException
-
-
Method Detail
-
read
public ChunkSample read() throws IOException
Description copied from interface: ObjectStream
Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
read in interface ObjectStream<ChunkSample>
- Returns:
- the next object or null to signal that the stream is exhausted
- Throws:
IOException
- if there is an error during reading
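The read-until-null contract described above can be sketched as follows. This is a minimal illustration, not the library's code: SimpleStream, drain and of are hypothetical stand-ins defined here so the example runs without OpenNLP on the classpath; with the real class, the same loop shape would apply to an ObjectStream<ChunkSample>.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    // Hypothetical stand-in for the read() part of the ObjectStream contract.
    interface SimpleStream<T> {
        T read() throws IOException;
    }

    // Calls read() until it returns null, collecting each object exactly once.
    static <T> List<T> drain(SimpleStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            out.add(item);
        }
        return out;
    }

    // Builds an in-memory stream over fixed items, for demonstration only.
    static <T> SimpleStream<T> of(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        List<String> samples = drain(of(Arrays.asList("a", "b", "c")));
        System.out.println(samples); // prints [a, b, c]
    }
}
```

The null check is the only end-of-stream signal in this contract, so the loop condition doubles as the termination test.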
-
setStart
public void setStart(int aStart)
-
setEnd
public void setEnd(int aEnd)
-
reset
public void reset() throws IOException, UnsupportedOperationException
Description copied from interface: ObjectStream
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required. The implementation of this method is optional.
- Specified by:
reset in interface ObjectStream<ChunkSample>
- Throws:
IOException - if there is an error while resetting the stream
UnsupportedOperationException
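As a rough illustration of why reset matters, the sketch below makes several passes over the same sequence. ListStream and countOverPasses are hypothetical names introduced for this example and are not part of OpenNLP; they only mirror the read/reset behaviour described above. A real caller would also need to allow for UnsupportedOperationException, since implementing reset is optional.

```java
import java.util.Arrays;
import java.util.List;

public class ResetSketch {

    // Hypothetical in-memory stream mirroring read() and reset() from the contract.
    static class ListStream<T> {
        private final List<T> items;
        private int pos = 0;

        ListStream(List<T> items) {
            this.items = items;
        }

        // Returns the next object, or null once the stream is exhausted.
        T read() {
            return pos < items.size() ? items.get(pos++) : null;
        }

        // Repositions at the beginning so the same sequence repeats.
        void reset() {
            pos = 0;
        }
    }

    // Reads the whole stream the given number of times and counts objects seen.
    static <T> int countOverPasses(ListStream<T> stream, int passes) {
        int seen = 0;
        for (int p = 0; p < passes; p++) {
            while (stream.read() != null) {
                seen++;
            }
            stream.reset(); // rewind for the next pass
        }
        return seen;
    }

    public static void main(String[] args) {
        ListStream<String> stream = new ListStream<>(Arrays.asList("a", "b", "c"));
        System.out.println(countOverPasses(stream, 2)); // prints 6
    }
}
```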
-
close
public void close() throws IOException
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, it is not allowed to call read or reset.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface ObjectStream<ChunkSample>
- Throws:
IOException
- if there is an error during closing the stream
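Because the stream is AutoCloseable, a try-with-resources block is one way to guarantee that close is called. The sketch below uses a hypothetical TrackedStream class defined here purely for illustration; throwing IllegalStateException on read after close is an assumption of this example, since the documentation above only says such calls are not allowed.

```java
public class CloseSketch {

    // Hypothetical stand-in that records whether close() has been called.
    static class TrackedStream implements AutoCloseable {
        private boolean closed = false;

        // Always signals exhaustion; rejects use after close (assumed behaviour).
        String read() {
            if (closed) {
                throw new IllegalStateException("stream is closed");
            }
            return null;
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Uses the stream inside try-with-resources and reports whether it was closed.
    static boolean closedAfterUse() {
        TrackedStream stream = new TrackedStream();
        try (TrackedStream s = stream) {
            s.read();
        }
        return stream.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse()); // prints true
    }
}
```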
-
-