Package opennlp.tools.formats.masc
Class MascSentence
- java.lang.Object
  - opennlp.tools.util.Span
    - opennlp.tools.formats.masc.MascSentence
- All Implemented Interfaces:
- Serializable, Comparable<Span>
public class MascSentence extends Span
- See Also:
- Serialized Form
Method Summary
Modifier and Type   Method                Description
List<Span>          getNamedEntities()    Get the named entities
String              getSentDetectText()   Get the sentence text
List<String>        getTags()             Get the tags of tokens in the sentence
List<Span>          getTokensSpans()      Get the boundaries of individual tokens
List<String>        getTokenStrings()     Get the text of the sentence tokens
String              getTokenText()        Get the text of the sentence tokens
Methods inherited from class opennlp.tools.util.Span
compareTo, contains, contains, crosses, equals, getCoveredText, getEnd, getProb, getStart, getType, hashCode, intersects, length, spansToStrings, spansToStrings, startsWith, toString, trim
Constructor Detail
MascSentence
public MascSentence(int s, int e, String text, List<MascWord> sentenceQuarks, List<MascWord> allQuarks)
Create a MascSentence, containing its associated text and quarks.
- Parameters:
- s - Start of the sentence within the corpus file
- e - End of the sentence within the corpus file
- text - A reference to the text of the corpus file
- sentenceQuarks - The quarks found in that sentence
- allQuarks - A reference to the list of all quarks in the file
Method Detail
getNamedEntities
public List<Span> getNamedEntities()
Get the named entities.
- Returns:
- List of named entities, each defined as a token span, e.g. Span(1, 3, "org") for tokens [1,3)
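The half-open token range in the example above, [1,3), can be illustrated with a minimal, self-contained sketch. It does not use OpenNLP: TokenSpan and coveredTokens are hypothetical stand-ins introduced here for illustration, and the token list is made up.

```java
import java.util.List;

class NamedEntitySpanSketch {
    /** Hypothetical stand-in for a typed token span; this is not the OpenNLP Span class. */
    static final class TokenSpan {
        final int start;   // index of the first covered token (inclusive)
        final int end;     // index one past the last covered token (exclusive)
        final String type; // entity label, e.g. "org"

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Returns the tokens covered by the half-open range [start, end). */
    static List<String> coveredTokens(List<String> tokens, TokenSpan span) {
        return tokens.subList(span.start, span.end);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "United", "Nations", "met", "today", ".");
        TokenSpan org = new TokenSpan(1, 3, "org");
        // Tokens 1 and 2 are covered; token 3 ("met") is not.
        System.out.println(org.type + ": " + String.join(" ", coveredTokens(tokens, org)));
        // prints "org: United Nations"
    }
}
```

With the real class, the same indexing would presumably apply to the list returned by getTokenStrings(), but that pairing is an assumption and is not stated on this page.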
getSentDetectText
public String getSentDetectText()
Get the sentence text.
- Returns:
- Text of the sentence as defined by the sentence segmentation annotation.
getTokenText
public String getTokenText()
Get the text of the sentence tokens.
- Returns:
- Text of the sentence as defined by the tokens in it.
getTokenStrings
public List<String> getTokenStrings()
Get the text of the sentence tokens.
- Returns:
- The texts of the individual tokens in the sentence
getTokensSpans
public List<Span> getTokensSpans()
Get the boundaries of individual tokens.
- Returns:
- Spans representing the tokens of the sentence (according to Penn tokenization)
getTags
public List<String> getTags() throws IOException
Get the tags of tokens in the sentence.
- Returns:
- A list of individual tags
- Throws:
- IOException - if used on an untokenized sentence
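Because getTags() declares a checked IOException for untokenized sentences, callers need to handle that case explicitly. The sketch below shows one possible pattern and is self-contained rather than built against OpenNLP: TaggedSentence is a hypothetical interface that mirrors only the two accessors used, and treating tokens and tags as parallel lists is an assumption, not something this page guarantees.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TagPairingSketch {
    /** Hypothetical stand-in mirroring two MascSentence accessors. */
    interface TaggedSentence {
        List<String> getTokenStrings();
        List<String> getTags() throws IOException;
    }

    /** Pairs tokens with tags as "token/TAG"; returns an empty list if tags are unavailable. */
    static List<String> pairTokensWithTags(TaggedSentence sentence) {
        List<String> tags;
        try {
            tags = sentence.getTags();
        } catch (IOException e) {
            // Untokenized sentence: there is nothing to pair.
            return List.of();
        }
        List<String> tokens = sentence.getTokenStrings();
        List<String> pairs = new ArrayList<>();
        // Assumption: tokens and tags are parallel lists; min() guards against a mismatch.
        int n = Math.min(tokens.size(), tags.size());
        for (int i = 0; i < n; i++) {
            pairs.add(tokens.get(i) + "/" + tags.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        TaggedSentence tokenized = new TaggedSentence() {
            public List<String> getTokenStrings() { return List.of("Dogs", "bark"); }
            public List<String> getTags() { return List.of("NNS", "VBP"); }
        };
        TaggedSentence untokenized = new TaggedSentence() {
            public List<String> getTokenStrings() { return List.of(); }
            public List<String> getTags() throws IOException {
                throw new IOException("sentence has no tokens");
            }
        };
        System.out.println(pairTokensWithTags(tokenized));   // prints "[Dogs/NNS, bark/VBP]"
        System.out.println(pairTokensWithTags(untokenized)); // prints "[]"
    }
}
```

Returning an empty list is only one option; rethrowing or logging the exception may suit a corpus-conversion pipeline better, since a silent fallback can hide untokenized input.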