Class ADChunkSampleStream

  • All Implemented Interfaces:
    AutoCloseable, ObjectStream<ChunkSample>

    public class ADChunkSampleStream
    extends Object
    implements ObjectStream<ChunkSample>
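    Because the class implements ObjectStream<ChunkSample>, callers typically pull samples with read() until it returns null, then close the stream (the interface also extends AutoCloseable). This is a minimal sketch of that consumption pattern. It assumes a hypothetical SimpleObjectStream interface and an in-memory ListStream that stand in for the real OpenNLP types, so it runs without the library. It is not ADChunkSampleStream's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StreamDrainSketch {

    // Hypothetical stand-in mirroring the read()/close() part of the
    // ObjectStream contract: read() returns the next sample, or null at the end.
    interface SimpleObjectStream<T> extends AutoCloseable {
        T read() throws IOException;

        @Override
        void close() throws IOException;
    }

    // Hypothetical in-memory stream used only for this illustration.
    static final class ListStream<T> implements SimpleObjectStream<T> {
        private final Iterator<T> it;

        ListStream(List<T> items) {
            this.it = items.iterator();
        }

        @Override
        public T read() {
            return it.hasNext() ? it.next() : null;
        }

        @Override
        public void close() {
            // nothing to release for an in-memory list
        }
    }

    // Reads samples until read() signals the end with null; try-with-resources
    // closes the stream, and I/O failures are rethrown unchecked.
    static <T> List<T> drain(SimpleObjectStream<T> stream) {
        List<T> out = new ArrayList<>();
        try (SimpleObjectStream<T> s = stream) {
            T sample;
            while ((sample = s.read()) != null) {
                out.add(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> samples = drain(new ListStream<>(List.of("s1", "s2", "s3")));
        System.out.println(samples.size());
    }
}
```

Returning null as the end-of-stream marker keeps the loop condition simple; the same loop shape applies to any ObjectStream.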
    Parser for the Floresta Sintáctica Árvores Deitadas corpus; its output is used for Portuguese chunker training.
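    OpenNLP chunker training samples pair each token with a chunk tag in the B-/I-/O scheme (for example B-NP, I-NP, O). The sketch converts phrase spans into such tags. It illustrates that output format only, not the Floresta chunk-extraction heuristic. The Chunk class, the toBio method, and the NP/VP labels are hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class BioTaggingSketch {

    // Hypothetical chunk representation: tokens [start, end) form one phrase of the given type.
    static final class Chunk {
        final int start;
        final int end;
        final String type;

        Chunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Converts chunk spans (assumed non-overlapping) into one B-/I-/O tag per token.
    static String[] toBio(int tokenCount, List<Chunk> chunks) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O"); // tokens outside every chunk
        for (Chunk c : chunks) {
            if (c.start < 0 || c.end > tokenCount || c.start >= c.end) {
                throw new IllegalArgumentException("invalid span: [" + c.start + ", " + c.end + ")");
            }
            tags[c.start] = "B-" + c.type; // first token opens the chunk
            for (int i = c.start + 1; i < c.end; i++) {
                tags[i] = "I-" + c.type; // remaining tokens continue it
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String[] tokens = {"O", "menino", "comprou", "um", "livro", "."};
        List<Chunk> chunks = List.of(
                new Chunk(0, 2, "NP"),
                new Chunk(2, 3, "VP"),
                new Chunk(3, 5, "NP"));
        String[] tags = toBio(tokens.length, chunks);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + tags[i]);
        }
    }
}
```

Running main prints one "token tag" pair per line, ending with ". O" because the final punctuation token belongs to no chunk.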

    The chunk-extraction heuristic is based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

    Data can be found on this web site:
    http://www.linguateca.pt/floresta/corpus.html

    Information about the format:
    Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
    12 February 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

    Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

    Note: Do not use this class; it is for internal use only!