Class GermEval2014NameSampleStream

java.lang.Object
opennlp.tools.formats.GermEval2014NameSampleStream
All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>

@Internal public class GermEval2014NameSampleStream extends Object implements opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Parser for the GermEval 2014 Named Entity Recognition Shared Task data.

The data is in a tab-separated format with four columns:

  1. Token index (1-based per sentence)
  2. Token text
  3. Outer named entity tag (IOB2 scheme)
  4. Nested/embedded named entity tag (IOB2 scheme)

Comment lines starting with # mark document boundaries and carry the source URL and date as metadata. Blank lines separate sentences.
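
The column layout described above can be illustrated with a minimal, standalone sketch. It is not the code of this class: the names GermEvalLineSketch, Row and parseSentences are hypothetical, the sample rows are invented, and treating a # line as a sentence boundary is an assumption made here for robustness.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups four-column GermEval-style lines into sentences.
final class GermEvalLineSketch {

    // One data row: token index, token text, outer tag, nested tag.
    record Row(int index, String token, String outerTag, String innerTag) {}

    static List<List<Row>> parseSentences(List<String> lines) {
        List<List<Row>> sentences = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        for (String line : lines) {
            // Assumption: both '#' comment lines and blank lines close the open sentence.
            if (line.startsWith("#") || line.isBlank()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length != 4) {
                throw new IllegalArgumentException("Expected 4 tab-separated columns: " + line);
            }
            current.add(new Row(Integer.parseInt(cols[0]), cols[1], cols[2], cols[3]));
        }
        if (!current.isEmpty()) {
            sentences.add(current); // input ended without a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Invented sample input, not taken from the shared task data.
        List<String> lines = List.of(
            "#\thttp://example.org/article [2014-01-01]",
            "1\tBayern\tB-ORG\tB-LOC",
            "2\tMuenchen\tI-ORG\tB-LOC",
            "3\tgewinnt\tO\tO",
            "",
            "1\tEnde\tO\tO");
        for (List<Row> sentence : parseSentences(lines)) {
            System.out.println(sentence);
        }
    }
}
```

The sketch rejects rows that do not have exactly four columns instead of guessing at missing tags, which keeps malformed input visible.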

The data uses four main entity types: Person (PER), Location (LOC), Organization (ORG) and Other (OTH). Each type can carry a deriv or part suffix, marking derived forms and name parts, respectively (for example, LOCderiv or ORGpart).
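
As a rough illustration of how such tags decompose, the following standalone sketch splits an IOB2 tag into its prefix, base type and optional suffix. GermEvalTagSketch, Tag and parse are hypothetical names and not part of this class; the sketch assumes the suffix is appended directly to the type, as in LOCderiv.

```java
import java.util.List;

// Hypothetical sketch: splits tags such as "B-LOCderiv" into their parts.
final class GermEvalTagSketch {

    // prefix is 'B', 'I' or 'O'; suffix is "deriv", "part" or empty.
    record Tag(char prefix, String baseType, String suffix) {}

    static Tag parse(String tag) {
        if (tag.equals("O")) {
            return new Tag('O', "", "");
        }
        boolean wellFormed = tag.length() >= 3
            && tag.charAt(1) == '-'
            && (tag.charAt(0) == 'B' || tag.charAt(0) == 'I');
        if (!wellFormed) {
            throw new IllegalArgumentException("Not an IOB2 tag: " + tag);
        }
        String label = tag.substring(2);
        // Assumption: the suffix directly follows the base type (e.g. "LOC" + "deriv").
        for (String suffix : List.of("deriv", "part")) {
            if (label.endsWith(suffix) && label.length() > suffix.length()) {
                String base = label.substring(0, label.length() - suffix.length());
                return new Tag(tag.charAt(0), base, suffix);
            }
        }
        return new Tag(tag.charAt(0), label, "");
    }

    public static void main(String[] args) {
        for (String tag : List.of("B-PER", "I-ORGpart", "B-LOCderiv", "O")) {
            System.out.println(tag + " -> " + parse(tag));
        }
    }
}
```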

Because NameSample does not support overlapping spans, callers must select either the outer or the inner annotation layer via a GermEval2014NameSampleStream.NerLayer parameter.
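
To show why a single layer has to be chosen, the sketch below decodes one layer of IOB2 tags into non-overlapping token spans. It is a standalone illustration under stated assumptions, not this class's implementation: the Layer enum, the Span record (end index exclusive) and toSpans are hypothetical stand-ins, and the sample tags are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: picks one annotation layer and decodes its IOB2 tags.
final class GermEvalLayerSketch {

    // Stand-in for a layer selector; not the real NerLayer enum.
    enum Layer { OUTER, INNER }

    // Token span with exclusive end index.
    record Span(int start, int end, String type) {}

    static List<Span> toSpans(List<String> outerTags, List<String> innerTags, Layer layer) {
        List<String> tags = (layer == Layer.OUTER) ? outerTags : innerTags;
        List<Span> spans = new ArrayList<>();
        int start = -1;
        String type = null;
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            boolean continues = start >= 0
                && tag.startsWith("I-")
                && tag.substring(2).equals(type);
            if (continues) {
                continue; // still inside the open span
            }
            if (start >= 0) {
                spans.add(new Span(start, i, type)); // close the open span
                start = -1;
                type = null;
            }
            // Lenient choice: a stray I- tag opens a new span, like a B- tag.
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            spans.add(new Span(start, tags.size(), type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Invented tags for a four-token sentence with a nested annotation.
        List<String> outer = List.of("B-ORG", "I-ORG", "O", "O");
        List<String> inner = List.of("B-LOC", "B-LOC", "O", "O");
        System.out.println(toSpans(outer, inner, Layer.OUTER));
        System.out.println(toSpans(outer, inner, Layer.INNER));
    }
}
```

In this example the outer layer yields one two-token ORG span while the inner layer yields two one-token LOC spans that lie inside it. Merging both layers would produce overlapping spans, so only one can be used per sample.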

The data can be found on the GermEval 2014 shared task web site.

Note: This class is for internal use only. Do not use it directly!