opennlp.tools.formats.ad
Class ADChunkSampleStream

java.lang.Object
  extended by opennlp.tools.formats.ad.ADChunkSampleStream
All Implemented Interfaces:
ObjectStream<ChunkSample>

public class ADChunkSampleStream
extends Object
implements ObjectStream<ChunkSample>

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus. Its output is used to train the Portuguese chunker.

The heuristics used to extract chunks are based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

Note: Do not use this class; it is for internal use only.


Field Summary
static String OTHER
           
 
Constructor Summary
ADChunkSampleStream(InputStream in, String charsetName)
          Creates a new ChunkSample stream from an InputStream.
ADChunkSampleStream(ObjectStream<String> lineStream)
          Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>.
 
Method Summary
 void close()
          Closes the ObjectStream and releases all allocated resources.
static String convertFuncTag(String t, boolean useCGTags)
           
 ChunkSample read()
          Returns the next object.
 void reset()
          Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly.
 void setEnd(int aEnd)
           
 void setStart(int aStart)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OTHER

public static final String OTHER
See Also:
Constant Field Values
Constructor Detail

ADChunkSampleStream

public ADChunkSampleStream(ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.

Parameters:
lineStream - a stream of lines as String

ADChunkSampleStream

public ADChunkSampleStream(InputStream in,
                           String charsetName)
Creates a new ChunkSample stream from an InputStream.

Parameters:
in - the Corpus InputStream
charsetName - the charset of the Arvores Deitadas Corpus
Method Detail

read

public ChunkSample read()
                 throws IOException
Description copied from interface: ObjectStream
Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.

Specified by:
read in interface ObjectStream<ChunkSample>
Returns:
the next object or null to signal that the stream is exhausted
Throws:
IOException
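
The read contract above (call repeatedly until null is returned) can be illustrated with a minimal, self-contained sketch. The names ReadLoopSketch, SimpleStream, drain and of are hypothetical stand-ins invented for this example and are not part of OpenNLP; only the read-until-null pattern comes from the description above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Hypothetical stand-in that mirrors only the documented read() contract. */
    interface SimpleStream<T> {
        T read() throws IOException;
    }

    /** Reads until read() returns null and collects each object exactly once. */
    static <T> List<T> drain(SimpleStream<T> stream) throws IOException {
        List<T> items = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            items.add(item);
        }
        return items;
    }

    /** Builds an in-memory stream over fixed values, for illustration only. */
    static <T> SimpleStream<T> of(List<T> values) {
        Iterator<T> it = values.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        SimpleStream<String> stream = of(Arrays.asList("B-NP", "I-NP", "O"));
        System.out.println(drain(stream));
    }
}
```

With a real stream, the same loop shape would apply to any ObjectStream<ChunkSample>, with the loop body processing each ChunkSample instead of collecting strings.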

convertFuncTag

public static String convertFuncTag(String t,
                                    boolean useCGTags)

setStart

public void setStart(int aStart)

setEnd

public void setEnd(int aEnd)

reset

public void reset()
           throws IOException,
                  UnsupportedOperationException
Description copied from interface: ObjectStream
Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required. The implementation of this method is optional.

Specified by:
reset in interface ObjectStream<ChunkSample>
Throws:
IOException
UnsupportedOperationException
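
Because reset is optional, code that needs several passes may want to guard the call. The following is a minimal sketch of that idea under stated assumptions: ResetSketch, ResettableStream, ListStream, countRemaining and twoPassesAgree are hypothetical names made up for this example, and the in-memory stream only imitates the documented contract.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ResetSketch {

    /** Hypothetical stand-in mirroring the documented read()/reset() contract. */
    interface ResettableStream<T> {
        T read() throws IOException;
        void reset() throws IOException, UnsupportedOperationException;
    }

    /** In-memory stream used only for illustration. */
    static final class ListStream<T> implements ResettableStream<T> {
        private final List<T> values;
        private int next = 0;

        ListStream(List<T> values) {
            this.values = values;
        }

        @Override
        public T read() {
            return next < values.size() ? values.get(next++) : null;
        }

        @Override
        public void reset() {
            next = 0; // reposition at the beginning
        }
    }

    /** Counts the remaining objects by reading until null. */
    static <T> int countRemaining(ResettableStream<T> stream) throws IOException {
        int n = 0;
        while (stream.read() != null) {
            n++;
        }
        return n;
    }

    /** Makes two passes and reports whether both saw the same number of objects. */
    static <T> boolean twoPassesAgree(ResettableStream<T> stream) throws IOException {
        int first = countRemaining(stream);
        try {
            stream.reset();
        } catch (UnsupportedOperationException e) {
            return false; // reset is optional, so a second pass may not be possible
        }
        return first == countRemaining(stream);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(twoPassesAgree(new ListStream<>(Arrays.asList("a", "b", "c"))));
    }
}
```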

close

public void close()
           throws IOException
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, read and reset must not be called.

Specified by:
close in interface ObjectStream<ChunkSample>
Throws:
IOException
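
One way to make sure close is always called is a try/finally block around the read loop. The sketch below assumes hypothetical names (CloseSketch, ClosableStream, OneShotStream, consume) that are not part of OpenNLP. Throwing an IOException on read after close is a choice made for this sketch only; the documentation above says only that such calls are not allowed.

```java
import java.io.IOException;

public class CloseSketch {

    /** Hypothetical stand-in mirroring the documented read()/close() contract. */
    interface ClosableStream<T> {
        T read() throws IOException;
        void close() throws IOException;
    }

    /** Illustrative stream that yields a single object and tracks its closed state. */
    static final class OneShotStream implements ClosableStream<String> {
        boolean closed = false;
        private boolean consumed = false;

        @Override
        public String read() throws IOException {
            // Assumption of this sketch: reading after close fails fast.
            if (closed) {
                throw new IOException("stream already closed");
            }
            if (consumed) {
                return null;
            }
            consumed = true;
            return "sample";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads every object, then closes the stream even if read() throws. */
    static <T> int consume(ClosableStream<T> stream) throws IOException {
        int n = 0;
        try {
            while (stream.read() != null) {
                n++;
            }
        } finally {
            stream.close();
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        OneShotStream stream = new OneShotStream();
        System.out.println(consume(stream) + " object(s) read, closed=" + stream.closed);
    }
}
```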


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.