Class ADNameSampleStream

java.lang.Object
opennlp.tools.formats.ad.ADNameSampleStream
All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>

@Internal public class ADNameSampleStream extends Object implements opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for Portuguese NER training.

The data contains ten named entity types: Person, Organization, Group, Place, Event, ArtProd, Abstract, Thing, Time and Numeric.

Data can be found on this web site.

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 de Fevereiro de 2006.

Detailed info about the NER tagset.

Note: Do not use this class, internal use only!

  • Constructor Summary

    Constructors
    Constructor
    Description
    ADNameSampleStream(opennlp.tools.util.InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens)
    Deprecated, for removal: This API element is subject to removal in a future version.
    ADNameSampleStream(opennlp.tools.util.ObjectStream<String> lineStream, boolean splitHyphenatedTokens)
    Initializes a new ADNameSampleStream from an opennlp.tools.util.ObjectStream<String>, which could be a PlainTextByLineStream object.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    opennlp.tools.namefind.NameSample
    read()
    void
    reset()

    Methods inherited from class Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ADNameSampleStream

      public ADNameSampleStream(opennlp.tools.util.ObjectStream<String> lineStream, boolean splitHyphenatedTokens)
      Initializes a new ADNameSampleStream from an opennlp.tools.util.ObjectStream<String>, which could be a PlainTextByLineStream object.
      Parameters:
      lineStream - An opennlp.tools.util.ObjectStream<String> as input.
      splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" becomes "carros" "-" "monstro".
    • ADNameSampleStream

      @Deprecated(forRemoval=true) public ADNameSampleStream(opennlp.tools.util.InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) throws IOException
      Deprecated, for removal: This API element is subject to removal in a future version.
      Initializes a new ADNameSampleStream from an InputStreamFactory.
      Parameters:
      in - The Corpus InputStreamFactory.
      charsetName - The charset to use for reading of the corpus.
      splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" becomes "carros" "-" "monstro".
      Throws:
      IOException
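
The effect of splitHyphenatedTokens described for both constructors can be illustrated with a small standalone sketch. The class HyphenSplitSketch and its method splitToken are hypothetical names introduced here for illustration only; they are not part of OpenNLP and do not reproduce the parser's actual implementation, whose handling of edge cases (leading or trailing hyphens, for example) may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: illustrates the documented
// effect of the splitHyphenatedTokens flag on a single token.
class HyphenSplitSketch {

    static List<String> splitToken(String token, boolean splitHyphenatedTokens) {
        List<String> out = new ArrayList<>();
        // Keep the token as-is when splitting is disabled, when it is a
        // single character, or when it contains no hyphen at all.
        if (!splitHyphenatedTokens || token.length() < 2 || !token.contains("-")) {
            out.add(token);
            return out;
        }
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c == '-') {
                // Emit the text collected so far, then the hyphen as its own token.
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
                out.add("-");
            } else {
                current.append(c);
            }
        }
        // Emit whatever follows the last hyphen.
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitToken("carros-monstro", true));  // [carros, -, monstro]
        System.out.println(splitToken("carros-monstro", false)); // [carros-monstro]
    }
}
```

With the flag set to false the token passes through unchanged, so the choice mainly affects how token boundaries line up with the tokenizer used elsewhere in the training pipeline.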
  • Method Details