Class MascSentence

    • Constructor Detail

      • MascSentence

        public MascSentence(int s,
                            int e,
                            String text,
                            List<MascWord> sentenceQuarks,
                            List<MascWord> allQuarks)
        Create a MascSentence containing its associated text and quarks
        Parameters:
        s - Start of the sentence within the corpus file
        e - End of the sentence within the corpus file
        text - A reference to the text of the corpus file
        sentenceQuarks - The quarks found in that sentence
        allQuarks - The reference to a list of all quarks in the file
    • Method Detail

      • getNamedEntities

        public List<Span> getNamedEntities()
        Get the named entities
        Returns:
        List of named entities defined as token spans, e.g. Span(1, 3, "org") covers tokens [1, 3), i.e. tokens 1 and 2
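
        The half-open convention above can be illustrated with a minimal, self-contained sketch. TokenSpan and coveredTokens are hypothetical stand-ins written for this example; they are not part of MascSentence or of the library's Span class. The sketch only shows that the end index is exclusive.

```java
import java.util.List;

public class NamedEntitySpanSketch {

    // Hypothetical stand-in for a typed span over token indices [start, end).
    public static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        public TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Returns the tokens covered by the span; the end index is exclusive.
    public static List<String> coveredTokens(List<String> tokens, TokenSpan span) {
        return tokens.subList(span.start, span.end);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "Acme", "Corp", "reported", "earnings");
        TokenSpan org = new TokenSpan(1, 3, "org");
        // prints "org: Acme Corp"
        System.out.println(org.type + ": " + String.join(" ", coveredTokens(tokens, org)));
    }
}
```

        With real data, the token list would presumably come from getTokenStrings() and the spans from getNamedEntities(); that pairing is an assumption of this sketch.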
      • getSentDetectText

        public String getSentDetectText()
        Get the sentence text
        Returns:
        Text of the sentence as defined by the sentence segmentation annotation.
      • getTokenText

        public String getTokenText()
        Get the text of the sentence tokens
        Returns:
        Text of the sentence as defined by the tokens in it.
      • getTokenStrings

        public List<String> getTokenStrings()
        Get the texts of the individual sentence tokens
        Returns:
        The texts of the individual tokens in the sentence
      • getTokensSpans

        public List<Span> getTokensSpans()
        Get the boundaries of individual tokens
        Returns:
        Spans representing the tokens of the sentence (according to Penn tokenization)
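
        As a rough sketch of how token boundaries relate to token strings, the example below cuts a sentence into tokens using character-offset spans. It assumes the spans are half-open character offsets into the sentence text, which this page does not state. CharSpan and tokenStrings are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSpanSketch {

    // Hypothetical stand-in for a character span [start, end) within a sentence.
    public static final class CharSpan {
        final int start;
        final int end;

        public CharSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Extracts the substring covered by each span, in order.
    public static List<String> tokenStrings(String sentence, List<CharSpan> spans) {
        List<String> out = new ArrayList<>();
        for (CharSpan span : spans) {
            out.add(sentence.substring(span.start, span.end));
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "Dogs bark.";
        // Penn-style tokenization splits the final period into its own token.
        List<CharSpan> spans = List.of(new CharSpan(0, 4), new CharSpan(5, 9), new CharSpan(9, 10));
        // prints "[Dogs, bark, .]"
        System.out.println(tokenStrings(sentence, spans));
    }
}
```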
      • getTags

        public List<String> getTags()
                             throws IOException
        Get the tags of tokens in the sentence
        Returns:
        A list of individual tags
        Throws:
        IOException - if used on an untokenized sentence
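
        Because getTags() declares a checked IOException, callers have to catch or propagate it. The sketch below shows one possible way to fall back to an empty tag list. The Tagged interface and tagsOrEmpty helper are hypothetical; Tagged mirrors only the getTags() signature documented above so that the example compiles without the library.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class TagsSketch {

    // Hypothetical stand-in exposing only the documented getTags() signature.
    public interface Tagged {
        List<String> getTags() throws IOException;
    }

    // Returns the tags, or an empty list if the sentence cannot provide them.
    public static List<String> tagsOrEmpty(Tagged sentence) {
        try {
            return sentence.getTags();
        } catch (IOException e) {
            // Documented failure case: the sentence was not tokenized.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        Tagged tokenized = () -> List.of("NNS", "VBP", ".");
        Tagged untokenized = () -> {
            throw new IOException("sentence is not tokenized");
        };
        System.out.println(tagsOrEmpty(tokenized));   // prints "[NNS, VBP, .]"
        System.out.println(tagsOrEmpty(untokenized)); // prints "[]"
    }
}
```

        Swallowing the exception is a design choice made for brevity here; code that must distinguish "no tags" from "not tokenized" would rethrow or log instead.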