Class ADNameSampleStream

  • All Implemented Interfaces:
    AutoCloseable, ObjectStream<NameSample>

    public class ADNameSampleStream
    extends Object
    implements ObjectStream<NameSample>
    Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus; its output is used for Portuguese NER training.

    The data contains ten named entity types: Person, Organization, Group, Place, Event, ArtProd, Abstract, Thing, Time and Numeric.

    Data can be found on this web site:
    http://www.linguateca.pt/floresta/corpus.html

    Information about the format:
    Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica" .
    12 de Fevereiro de 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

    Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

    Note: Do not use this class, internal use only!

    • Constructor Detail

      • ADNameSampleStream

        public ADNameSampleStream(ObjectStream<String> lineStream,
                                  boolean splitHyphenatedTokens)
        Creates a new NameSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.
        Parameters:
        lineStream - a stream of lines as String
        splitHyphenatedTokens - if true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro"
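
The effect of splitHyphenatedTokens described above can be illustrated with a minimal sketch. The class HyphenSplitSketch and its split method are hypothetical names introduced here for illustration; they are not part of OpenNLP, and the library's own handling of edge cases (for example a token that starts or ends with a hyphen) may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: it only mirrors the documented
// example "carros-monstro" > "carros" "-" "monstro".
final class HyphenSplitSketch {

    // Splits a token at each hyphen and keeps every hyphen as its own token.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c == '-') {
                // Flush the text collected so far, then emit the hyphen itself.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add("-");
            } else {
                current.append(c);
            }
        }
        // Flush whatever follows the last hyphen (or the whole token if none).
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("carros-monstro")); // prints [carros, -, monstro]
    }
}
```

A token without a hyphen is returned unchanged as a single-element list, which corresponds to what one would expect when splitHyphenatedTokens has nothing to split.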
      • ADNameSampleStream

        @Deprecated
        public ADNameSampleStream(InputStreamFactory in,
                                  String charsetName,
                                  boolean splitHyphenatedTokens)
                           throws IOException
        Deprecated.
        Creates a new NameSample stream from an InputStreamFactory.
        Parameters:
        in - the factory for the corpus InputStream
        charsetName - the charset of the Arvores Deitadas Corpus
        splitHyphenatedTokens - if true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro"
        Throws:
        IOException
    • Method Detail

      • read

        public NameSample read()
                        throws IOException
        Description copied from interface: ObjectStream
        Returns the next object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
        Specified by:
        read in interface ObjectStream<NameSample>
        Returns:
        the next object or null to signal that the stream is exhausted
        Throws:
        IOException - if there is an error during reading
      • close

        public void close()
                   throws IOException
        Description copied from interface: ObjectStream
        Closes the ObjectStream and releases all allocated resources. After close has been called, it is not allowed to call read or reset.
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface ObjectStream<NameSample>
        Throws:
        IOException - if there is an error during closing the stream
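
The read and close contracts above (read until null, then close and stop reading) can be sketched as follows. SketchStream is a hypothetical stand-in for the read/close part of ObjectStream so that the example compiles without the OpenNLP jar; with the library on the classpath, the same loop shape should apply to an ObjectStream<NameSample> such as this class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the read/close part of the ObjectStream contract.
interface SketchStream<T> extends AutoCloseable {
    T read() throws IOException;

    @Override
    void close() throws IOException;
}

final class ReadLoopSketch {

    // Wraps a list as a stream that returns each element once, then null.
    static <T> SketchStream<T> of(List<T> items) {
        Iterator<T> it = items.iterator();
        return new SketchStream<T>() {
            @Override
            public T read() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // Nothing to release in this in-memory sketch.
            }
        };
    }

    // Reads until null signals exhaustion; try-with-resources closes the
    // stream afterwards, so no read happens after close.
    static <T> List<T> drain(SketchStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        try (SketchStream<T> s = stream) {
            T item;
            while ((item = s.read()) != null) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(of(List.of("first", "second")))); // prints [first, second]
    }
}
```

Because the stream is closed by try-with-resources, it is also closed when read throws an IOException partway through.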