opennlp.tools.formats.ad
Class ADNameSampleStream

java.lang.Object
  extended by opennlp.tools.formats.ad.ADNameSampleStream
All Implemented Interfaces:
ObjectStream<NameSample>

public class ADNameSampleStream
extends Object
implements ObjectStream<NameSample>

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for Portuguese NER training.

The data contains ten named entity types: Person, Organization, Group, Place, Event, ArtProd, Abstract, Thing, Time and Numeric.

Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica" .
12 de Fevereiro de 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

Note: Do not use this class, internal use only!


Constructor Summary
ADNameSampleStream(InputStream in, String charsetName, boolean splitHyphenatedTokens)
          Creates a new NameSample stream from an InputStream.
ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens)
          Creates a new NameSample stream from a line stream, i.e. ObjectStream<String>, that could be a PlainTextByLineStream object.
 
Method Summary
 void close()
          Closes the ObjectStream and releases all allocated resources.
 NameSample read()
          Returns the next object.
 void reset()
          Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ADNameSampleStream

public ADNameSampleStream(ObjectStream<String> lineStream,
                          boolean splitHyphenatedTokens)
Creates a new NameSample stream from a line stream, i.e. ObjectStream<String>, that could be a PlainTextByLineStream object.

Parameters:
lineStream - a stream of lines as String
splitHyphenatedTokens - if true hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro"
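
The effect of splitHyphenatedTokens on a single token can be sketched in plain Java. This is only an illustration of the documented "carros-monstro" example: the class HyphenSplitSketch and the method splitHyphenated are hypothetical names, not part of OpenNLP, and the real parser may treat edge cases (for example a token consisting only of "-") differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of OpenNLP: mimics the documented effect of
// splitHyphenatedTokens = true on one token.
class HyphenSplitSketch {

    // Splits a token at every '-' and keeps each hyphen as its own token.
    static List<String> splitHyphenated(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) == '-') {
                if (i > start) {
                    parts.add(token.substring(start, i)); // text before the hyphen
                }
                parts.add("-");                           // the hyphen itself
                start = i + 1;
            }
        }
        if (start < token.length()) {
            parts.add(token.substring(start));            // trailing text
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [carros, -, monstro]
        System.out.println(splitHyphenated("carros-monstro"));
    }
}
```

A token without a hyphen comes back unchanged as a one-element list, which corresponds to what one would expect when the flag has nothing to split.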

ADNameSampleStream

public ADNameSampleStream(InputStream in,
                          String charsetName,
                          boolean splitHyphenatedTokens)
Creates a new NameSample stream from an InputStream.

Parameters:
in - the Corpus InputStream
charsetName - the charset of the Arvores Deitadas Corpus
splitHyphenatedTokens - if true hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro"
Method Detail

read

public NameSample read()
                throws IOException
Description copied from interface: ObjectStream
Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.

Specified by:
read in interface ObjectStream<NameSample>
Returns:
the next object or null to signal that the stream is exhausted
Throws:
IOException
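
The read contract (call repeatedly until null is returned) leads to a simple consumption loop. The sketch below is self-contained and does not use OpenNLP: SketchStream is a hypothetical stand-in that mirrors the read, reset and close methods documented here, and ListStream is a hypothetical in-memory implementation used only to make the loop runnable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReadLoopSketch {

    // Hypothetical stand-in mirroring the documented ObjectStream contract.
    interface SketchStream<T> {
        T read() throws IOException;     // next object, or null when exhausted
        void reset() throws IOException; // optional operation
        void close() throws IOException;
    }

    // Hypothetical in-memory stream, used only for illustration.
    static final class ListStream<T> implements SketchStream<T> {
        private final List<T> items;
        private int next = 0;

        ListStream(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        public T read() {
            return next < items.size() ? items.get(next++) : null;
        }

        public void reset() {
            next = 0;
        }

        public void close() {
            // nothing to release for an in-memory list
        }
    }

    // Reads every object exactly once and closes the stream afterwards.
    static <T> List<T> drain(SketchStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        try {
            T item;
            while ((item = stream.read()) != null) {
                out.add(item);
            }
        } finally {
            stream.close(); // read and reset must not be called after this
        }
        return out;
    }
}
```

With a real stream the same loop shape would apply to NameSample objects; closing in a finally block keeps resources from leaking if read throws.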

reset

public void reset()
           throws IOException,
                  UnsupportedOperationException
Description copied from interface: ObjectStream
Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required. The implementation of this method is optional.

Specified by:
reset in interface ObjectStream<NameSample>
Throws:
IOException
UnsupportedOperationException
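
Because reset is optional, code that needs several passes has to be prepared for UnsupportedOperationException. The following sketch shows two passes over a resettable stream. It does not use OpenNLP: SketchStream and ListStream are hypothetical stand-ins that follow the contract described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ResetSketch {

    // Hypothetical stand-in for the read/reset part of the documented contract.
    interface SketchStream<T> {
        T read() throws IOException;
        void reset() throws IOException; // may throw UnsupportedOperationException
    }

    // Hypothetical resettable in-memory stream.
    static final class ListStream<T> implements SketchStream<T> {
        private final List<T> items;
        private int next = 0;

        ListStream(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        public T read() {
            return next < items.size() ? items.get(next++) : null;
        }

        public void reset() {
            next = 0; // the same sequence will be returned again
        }
    }

    // Counts the objects remaining in the stream.
    static <T> int count(SketchStream<T> stream) throws IOException {
        int n = 0;
        while (stream.read() != null) {
            n++;
        }
        return n;
    }

    // Makes two full passes and returns the number of objects seen in each.
    static <T> int[] twoPasses(SketchStream<T> stream) throws IOException {
        int first = count(stream);
        stream.reset(); // reposition at the beginning for the second pass
        int second = count(stream);
        return new int[] {first, second};
    }
}
```

Both passes should report the same count, since the previously seen sequence is repeated exactly after reset.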

close

public void close()
           throws IOException
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, read and reset must no longer be called.

Specified by:
close in interface ObjectStream<NameSample>
Throws:
IOException


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.