Class MascSentence

    • Constructor Detail

      • MascSentence

        public MascSentence(int s,
                            int e,
                            String text,
                            List<MascWord> sentenceQuarks,
                            List<MascWord> allQuarks)
        Initializes a MascSentence with its associated text and quarks.
        Parameters:
        s - Start of the sentence within the corpus file
        e - End of the sentence within the corpus file
        text - A reference to the text of the corpus file
        sentenceQuarks - The quarks found in the sentence
        allQuarks - A reference to the list of all quarks in the file
    • Method Detail

      • getNamedEntities

        public List<Span> getNamedEntities()
        Returns:
        The named entities of the sentence, e.g. Span(1, 3, "org") covers tokens [1, 3), that is, tokens 1 and 2.
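
The half-open token range in the example above can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation: TokenSpan is a hypothetical stand-in for the returned Span type, and the token list and the coveredText helper are invented for the example.

```java
import java.util.List;

public class NamedEntitySpanSketch {

    // Hypothetical stand-in for a token-level span: [start, end) plus a type label.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by the span; the end index is exclusive.
    static String coveredText(List<String> tokens, TokenSpan span) {
        return String.join(" ", tokens.subList(span.start, span.end));
    }

    public static void main(String[] args) {
        // Invented sentence tokens, used only to show which tokens [1, 3) selects.
        List<String> tokens = List.of("The", "United", "Nations", "met", "today");
        TokenSpan org = new TokenSpan(1, 3, "org");
        System.out.println(org.type + ": " + coveredText(tokens, org)); // org: United Nations
    }
}
```

Because the end index is exclusive, the number of covered tokens is simply end - start.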
      • getSentDetectText

        public String getSentDetectText()
        Returns:
        The text of the sentence as defined by the sentence segmentation annotation.
      • getTokenText

        public String getTokenText()
        Returns:
        The text of the sentence as defined by the tokens in it.
      • getTokenStrings

        public List<String> getTokenStrings()
        Returns:
        The texts of the individual tokens in the sentence.
      • getTokensSpans

        public List<Span> getTokensSpans()
        Retrieves the boundaries of individual tokens.
        Returns:
        The spans representing the tokens of the sentence, according to Penn tokenization.
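
As a rough illustration of how character spans relate to token texts, the sketch below cuts one substring per span. It assumes, without confirmation from this page, that span offsets are relative to the sentence text and that the end offset is exclusive. The sentence, the hand-written spans, and the tokenStrings helper are hypothetical and do not call MascSentence.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSpanSketch {

    // Cuts one substring per [start, end) character span; end is exclusive.
    static List<String> tokenStrings(String sentence, int[][] spans) {
        List<String> tokens = new ArrayList<>();
        for (int[] span : spans) {
            tokens.add(sentence.substring(span[0], span[1]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "Don't stop.";
        // Hand-written spans in the Penn Treebank style, which splits off "n't"
        // and sentence-final punctuation as separate tokens.
        int[][] spans = { {0, 2}, {2, 5}, {6, 10}, {10, 11} };
        System.out.println(tokenStrings(sentence, spans)); // [Do, n't, stop, .]
    }
}
```

Note that adjacent spans such as [0, 2) and [2, 5) share a boundary with no whitespace between them, which is why spans carry more information than the token strings alone.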
      • getTags

        public List<String> getTags()
                             throws IOException
        Returns:
        The (individual) tags of the tokens in the sentence.
        Throws:
        IOException - Thrown if used on an un-tokenized sentence.
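
Since getTags() declares a checked IOException, callers have to handle the un-tokenized case explicitly. One possible pattern is sketched below. TaggedSentence is a hypothetical interface that mirrors only the documented signature, and falling back to an empty list is an assumption made for the example, not behavior of the library.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class TagAccessSketch {

    // Hypothetical stand-in exposing only the documented getTags() signature.
    interface TaggedSentence {
        List<String> getTags() throws IOException;
    }

    // Returns the tags, or an empty list when getTags() throws IOException.
    static List<String> tagsOrEmpty(TaggedSentence sentence) {
        try {
            return sentence.getTags();
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Invented sentences: one that yields tags, one that fails as an un-tokenized sentence would.
        TaggedSentence tokenized = () -> List.of("DT", "NN");
        TaggedSentence untokenized = () -> {
            throw new IOException("sentence is not tokenized");
        };
        System.out.println(tagsOrEmpty(tokenized));   // [DT, NN]
        System.out.println(tagsOrEmpty(untokenized)); // []
    }
}
```

Whether an empty list is an acceptable fallback depends on the caller; propagating the exception may be preferable when missing tags indicate a corrupt corpus file.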