Class ADChunkSampleStream

java.lang.Object
opennlp.tools.formats.ad.ADChunkSampleStream
All Implemented Interfaces:
AutoCloseable, ObjectStream<ChunkSample>

@Internal public class ADChunkSampleStream extends Object implements ObjectStream<ChunkSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for Portuguese Chunker training.
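As a rough illustration of the kind of data a chunker training sample carries, the sketch below groups tokens into phrases from per-token chunk tags. It assumes BIO-style tags (`B-NP`, `I-NP`, `O`), the convention used by OpenNLP chunker training data. The class `ChunkTagSketch`, its `phrases` helper, and the example sentence and tags are hypothetical and are not part of OpenNLP or of this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; it does not depend on OpenNLP.
public class ChunkTagSketch {

    // Groups tokens into "TYPE: words" strings based on BIO chunk tags.
    // A "B-X" tag opens a chunk of type X, "I-X" continues the open chunk,
    // and any other tag (for example "O") closes it.
    static List<String> phrases(String[] tokens, String[] chunkTags) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;
        String type = null;
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            if (tag.startsWith("B-")) {
                if (current != null) {
                    out.add(type + ": " + current);
                }
                type = tag.substring(2);
                current = new StringBuilder(tokens[i]);
            } else if (tag.startsWith("I-") && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                if (current != null) {
                    out.add(type + ": " + current);
                }
                current = null;
                type = null;
            }
        }
        if (current != null) {
            out.add(type + ": " + current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example sentence and tags, for illustration only.
        String[] tokens = {"O", "menino", "comeu", "a", "maçã", "."};
        String[] tags = {"B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"};
        for (String phrase : phrases(tokens, tags)) {
            System.out.println(phrase);
        }
    }
}
```

For simplicity the sketch does not check that an `I-X` tag matches the type of the open chunk; a stricter reader might treat such a mismatch as the start of a new chunk.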

The heuristic used to extract chunks is based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

Data can be found on this web site.

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 de Fevereiro de 2006.

Detailed info about the NER tagset.

Note: Do not use this class; it is for internal use only!