Class MascSentence

java.lang.Object
opennlp.tools.util.Span
opennlp.tools.formats.masc.MascSentence
All Implemented Interfaces:
Serializable, Comparable<Span>

public class MascSentence extends Span
  • Constructor Details

    • MascSentence

      public MascSentence(int s, int e, String text, List<MascWord> sentenceQuarks, List<MascWord> allQuarks)
      Initializes a MascSentence with its associated text and quarks.
      Parameters:
      s - Start of the sentence within the corpus file
      e - End of the sentence within the corpus file
      text - A reference to the text of the corpus file
      sentenceQuarks - The quarks found in this sentence
      allQuarks - The reference to a list of all quarks in the file
  • Method Details

    • getNamedEntities

      public List<Span> getNamedEntities()
      Returns:
      The named entities, e.g. Span(1, 3, "org") for tokens [1,3).
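      The half-open token range in the example above can be read as follows. This is a minimal, self-contained sketch: TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span, and the token list is invented for illustration. It only shows how a span such as (1, 3, "org") selects tokens 1 and 2 and excludes token 3.

```java
import java.util.List;

public class NamedEntitySpanDemo {

    // Hypothetical stand-in for opennlp.tools.util.Span:
    // a half-open token range [start, end) plus an entity type.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by the span with single spaces.
    // The end index is exclusive, matching the [1,3) notation.
    static String coveredText(List<String> tokens, TokenSpan span) {
        return String.join(" ", tokens.subList(span.start, span.end));
    }

    public static void main(String[] args) {
        // Invented example sentence, already split into tokens.
        List<String> tokens = List.of("The", "Apache", "Foundation", "hosts", "OpenNLP");
        TokenSpan org = new TokenSpan(1, 3, "org");
        // prints "org: Apache Foundation"
        System.out.println(org.type + ": " + coveredText(tokens, org));
    }
}
```

      With the real class, the same indexing would presumably apply to the list returned by getTokenStrings(), though that pairing is an assumption rather than something this page states.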
    • getSentDetectText

      public String getSentDetectText()
      Returns:
      The text of the sentence as defined by the sentence segmentation annotation.
    • getTokenText

      public String getTokenText()
      Returns:
      The text of the sentence as defined by the tokens in it.
    • getTokenStrings

      public List<String> getTokenStrings()
      Returns:
      The texts of the individual tokens in the sentence.
    • getTokensSpans

      public List<Span> getTokensSpans()
      Retrieves the boundaries of individual tokens.
      Returns:
      The spans representing the tokens of the sentence, according to Penn tokenization.
    • getTags

      public List<String> getTags() throws IOException
      Returns:
      The individual tags of the tokens in the sentence.
      Throws:
      IOException - If called on an un-tokenized sentence.
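      Because getTags() declares a checked IOException, callers have to handle the un-tokenized case explicitly. The sketch below shows one way to do that. TaggedSentence is a hypothetical interface that mirrors only the getTokenStrings() and getTags() signatures from this page, so the example compiles without OpenNLP on the classpath. The sample tokens and tags are invented, and returning an empty list on failure is an illustrative choice, not documented behavior.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TagPairingDemo {

    // Hypothetical stand-in mirroring two MascSentence signatures from this page.
    interface TaggedSentence {
        List<String> getTokenStrings();
        List<String> getTags() throws IOException;
    }

    // Pairs each token with its tag as "token/tag".
    // Returns an empty list when the tags cannot be read.
    static List<String> pairTokensWithTags(TaggedSentence sentence) {
        List<String> tags;
        try {
            tags = sentence.getTags();
        } catch (IOException e) {
            // Documented above as the signal for an un-tokenized sentence.
            return List.of();
        }
        List<String> tokens = sentence.getTokenStrings();
        List<String> pairs = new ArrayList<>();
        // Guard against length mismatches instead of assuming equal sizes.
        int n = Math.min(tokens.size(), tags.size());
        for (int i = 0; i < n; i++) {
            pairs.add(tokens.get(i) + "/" + tags.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for a tokenized sentence.
        TaggedSentence tagged = new TaggedSentence() {
            public List<String> getTokenStrings() { return List.of("Dogs", "bark"); }
            public List<String> getTags() { return List.of("NNS", "VBP"); }
        };
        // prints "[Dogs/NNS, bark/VBP]"
        System.out.println(pairTokensWithTags(tagged));
    }
}
```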