Class ADChunkSampleStream

  • All Implemented Interfaces:
    AutoCloseable, ObjectStream<ChunkSample>

    public class ADChunkSampleStream
    extends Object
    implements ObjectStream<ChunkSample>
    Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus; its output is used to train the Portuguese Chunker.

    The heuristic used to extract chunks is based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

    Data can be found on this web site:

    Information about the format:
    Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
    12 February 2006.

    Detailed info about the NER tagset:

    Note: Do not use this class, internal use only!