Class Conll02NameSampleStream

java.lang.Object
opennlp.tools.formats.Conll02NameSampleStream
All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>

@Internal public class Conll02NameSampleStream extends Object implements opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Parser for the Dutch and Spanish NER training files of the CoNLL 2002 shared task.

The Dutch data uses a DOCSTART tag to mark article boundaries; adaptive data in the feature generators is cleared before every article.
The Spanish data does not contain article boundaries, so adaptive data is cleared for every sentence.

The data contains four named entity types: Person, Organization, Location and Misc.
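
The boundary handling described above can be sketched in plain Java. This is a minimal illustration, not the OpenNLP implementation: the class Conll02BoundarySketch, its nested Sentence type, and the hasDocStart flag are hypothetical names introduced here. It assumes the usual CoNLL 2002 layout of one token per line with the entity tag in the last column, a blank line between sentences, and article markers on lines that begin with -DOCSTART-.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these are not OpenNLP types.
final class Conll02BoundarySketch {

    // Hypothetical holder for one parsed sentence.
    static final class Sentence {
        final List<String> tokens;
        final List<String> tags;
        final boolean clearAdaptiveData;

        Sentence(List<String> tokens, List<String> tags, boolean clearAdaptiveData) {
            this.tokens = tokens;
            this.tags = tags;
            this.clearAdaptiveData = clearAdaptiveData;
        }
    }

    // Emits the buffered sentence, if any, and empties the buffers.
    private static boolean flush(List<Sentence> out, List<String> tokens,
                                 List<String> tags, boolean clear) {
        if (tokens.isEmpty()) {
            return false;
        }
        out.add(new Sentence(new ArrayList<>(tokens), new ArrayList<>(tags), clear));
        tokens.clear();
        tags.clear();
        return true;
    }

    // hasDocStart = true models the Dutch data (clear once per article);
    // hasDocStart = false models the Spanish data (clear for every sentence).
    static List<Sentence> parse(List<String> lines, boolean hasDocStart) {
        List<Sentence> sentences = new ArrayList<>();
        List<String> tokens = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        boolean articleStart = false;

        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("-DOCSTART-")) {
                // Close any pending sentence, then mark the start of a new article.
                flush(sentences, tokens, tags, !hasDocStart || articleStart);
                articleStart = true;
            } else if (line.isEmpty()) {
                // A blank line ends the current sentence.
                if (flush(sentences, tokens, tags, !hasDocStart || articleStart)) {
                    articleStart = false;
                }
            } else {
                // First column is the token, last column is the entity tag.
                String[] fields = line.split("\\s+");
                tokens.add(fields[0]);
                tags.add(fields[fields.length - 1]);
            }
        }
        // Handle a final sentence that is not followed by a blank line.
        flush(sentences, tokens, tags, !hasDocStart || articleStart);
        return sentences;
    }
}
```

With hasDocStart set to true, only the first sentence after each DOCSTART line carries the clear flag; with false, every sentence does.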

Data can be found on this web site.

Note: Do not use this class; it is for internal use only.

  • Field Details

  • Constructor Details

    • Conll02NameSampleStream

      public Conll02NameSampleStream(Conll02NameSampleStream.LANGUAGE lang, opennlp.tools.util.ObjectStream<String> lineStream, int types)
      Parameters:
      lang - The language of the CoNLL 2002 data.
      lineStream - An opennlp.tools.util.ObjectStream<String> over the lines in the CoNLL 2002 data file.
      types - The entity types to include in the NameSample object stream.
    • Conll02NameSampleStream

      public Conll02NameSampleStream(Conll02NameSampleStream.LANGUAGE lang, opennlp.tools.util.InputStreamFactory in, int types) throws IOException
      Parameters:
      lang - The language of the CoNLL 2002 data.
      in - The InputStreamFactory for the input file.
      types - The entity types to include in the NameSample object stream.
      Throws:
      IOException - Thrown if an I/O error occurs.
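
Both constructors take the entity types as a single int, which suggests a set of bit flags combined with bitwise OR. The sketch below shows how such a mask could select entity types. It is an assumption-laden illustration: the class EntityTypeMaskSketch, the flag names and values, and the filterTag helper are hypothetical and are not taken from this page.

```java
// Illustrative sketch only; flag names and values are assumptions.
final class EntityTypeMaskSketch {

    // Hypothetical bit flags, one per CoNLL 2002 entity type.
    static final int PERSON = 1;
    static final int ORGANIZATION = 1 << 1;
    static final int LOCATION = 1 << 2;
    static final int MISC = 1 << 3;

    // Maps a tag suffix such as "PER" to its flag, or 0 if unknown.
    static int flagFor(String type) {
        switch (type) {
            case "PER":  return PERSON;
            case "ORG":  return ORGANIZATION;
            case "LOC":  return LOCATION;
            case "MISC": return MISC;
            default:     return 0;
        }
    }

    // Keeps a tag such as "B-PER" if its type is selected in the mask;
    // otherwise returns "O" so the token is treated as outside any entity.
    static String filterTag(String tag, int types) {
        int dash = tag.indexOf('-');
        if (dash < 0) {
            return "O";
        }
        int flag = flagFor(tag.substring(dash + 1));
        return (types & flag) != 0 ? tag : "O";
    }

    public static void main(String[] args) {
        int types = PERSON | LOCATION; // select two of the four types
        System.out.println(filterTag("B-PER", types));
        System.out.println(filterTag("B-ORG", types));
    }
}
```

Packing the selection into one int keeps the constructor signature small, at the cost of callers needing to know the flag values.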
  • Method Details