Class ADChunkSampleStream

java.lang.Object
opennlp.tools.formats.ad.ADChunkSampleStream
All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>

@Internal public class ADChunkSampleStream extends Object implements opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus that produces output for training the Portuguese chunker.

The heuristic used to extract chunks is based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

Data can be found on this web site.

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
February 12, 2006.

Detailed info about the NER tagset.

Note: This class is for internal use only; do not use it directly.

  • Field Details

  • Constructor Details

    • ADChunkSampleStream

      public ADChunkSampleStream(opennlp.tools.util.ObjectStream<String> lineStream)
      Instantiates an ADChunkSampleStream from an opennlp.tools.util.ObjectStream<String>, which could be a PlainTextByLineStream object.
      Parameters:
      lineStream - An opennlp.tools.util.ObjectStream<String> as input.
    • ADChunkSampleStream

      public ADChunkSampleStream(opennlp.tools.util.InputStreamFactory in, String charsetName) throws IOException
      Instantiates an ADChunkSampleStream from an InputStreamFactory.
      Parameters:
      in - The InputStreamFactory for the corpus.
      charsetName - The name of the charset to use for reading the corpus.
      Throws:
      IOException
  • Method Details

    • read

      public opennlp.tools.chunker.ChunkSample read() throws IOException
      Specified by:
      read in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
      Throws:
      IOException
    • convertFuncTag

      public static String convertFuncTag(String t, boolean useCGTags)
    • setStart

      public void setStart(int aStart)
    • setEnd

      public void setEnd(int aEnd)
    • reset

      public void reset() throws IOException, UnsupportedOperationException
      Specified by:
      reset in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
      Throws:
      IOException
      UnsupportedOperationException
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
      Throws:
      IOException
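
The read, reset and close methods listed above follow the usual ObjectStream pattern: read is called repeatedly until it returns null, and the stream is closed afterwards. The following is a minimal sketch of that loop, not the library's own code. So that it runs without OpenNLP on the classpath, it uses a small local stand-in interface; SampleStream, listBacked and drain are hypothetical names introduced only for this illustration, and the sketch assumes that read returns null once the source is exhausted.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.ObjectStream<T>; it only
// mirrors the read/reset/close signatures listed on this page.
interface SampleStream<T> extends AutoCloseable {
    T read() throws IOException; // assumed to return null at end of stream

    void reset() throws IOException, UnsupportedOperationException;

    @Override
    void close() throws IOException;
}

final class ReadLoopSketch {

    // Builds an in-memory stream over a list, for illustration only.
    static <T> SampleStream<T> listBacked(List<T> items) {
        return new SampleStream<T>() {
            private Iterator<T> it = items.iterator();

            @Override
            public T read() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void reset() {
                // Start again from the first element.
                it = items.iterator();
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory list.
            }
        };
    }

    // Reads every remaining sample, then closes the stream.
    static <T> List<T> drain(SampleStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        try (SampleStream<T> s = stream) {
            T sample;
            while ((sample = s.read()) != null) {
                out.add(sample);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> samples = drain(listBacked(Arrays.asList("s1", "s2", "s3")));
        System.out.println(samples.size() + " samples read");
    }
}
```

With the real class the same loop shape applies, with opennlp.tools.chunker.ChunkSample as the element type. Because reset is declared to throw UnsupportedOperationException, callers should not rely on being able to rewind the stream.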