Package opennlp.tools.formats.masc
Class MascDocument
java.lang.Object
opennlp.tools.formats.masc.MascDocument
Constructor Summary
Method Summary
- boolean hasNamedEntities(): Checks whether there is NER by GATE-5.0 ANNIE.
- boolean hasPennTags(): Checks whether there is Penn tagging produced by GATE-5.0 ANNIE.
- static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne): Initializes a MascDocument with all the stand-off annotations translated into the internal structure.
- read(): Retrieves the next sentence, or null if the end of the document has been reached.
- void reset(): Resets the reading of sentences to the beginning of the document.
Constructor Details
MascDocument
Method Details
parseDocument
public static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) throws IOException
Initializes a MascDocument with all the stand-off annotations translated into the internal structure.
- Parameters:
- path - The path where the document header is.
- f_primary - The file with the raw corpus text.
- f_seg - The file with segmentation into quarks.
- f_penn - The file with tokenization and Penn POS tags produced by the GATE-5.0 ANNIE application.
- f_s - The file with sentence boundaries.
- f_ne - The file with named entities.
- Returns:
- A document containing the text and its annotations. Immutability is not guaranteed yet.
- Throws:
- IOException - if the raw data cannot be read or the alignment of the raw data with the annotations fails.
hasPennTags
public boolean hasPennTags()
Checks whether there is Penn tagging produced by GATE-5.0 ANNIE.
- Returns:
- true if this file has aligned tags/tokens, false otherwise.
hasNamedEntities
public boolean hasNamedEntities()
Checks whether there is NER by GATE-5.0 ANNIE.
- Returns:
- true if this file has named entities, false otherwise.
read
- Returns:
- The next sentence, or null if the end of the document has been reached.
reset
public void reset()
Resets the reading of sentences to the beginning of the document.
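The read()/reset() contract described in the method details (return the next sentence, return null at the end of the document, rewind on reset()) can be illustrated with a small self-contained sketch. StubDocument and readAll are hypothetical names used only for illustration; they are not part of OpenNLP, and sentences are modelled as plain strings as a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SentenceReaderSketch {

    /** Hypothetical stand-in for a document that exposes read() and reset(). */
    static final class StubDocument {
        private final List<String> sentences;
        private Iterator<String> cursor;

        StubDocument(List<String> sentences) {
            this.sentences = new ArrayList<>(sentences);
            this.cursor = this.sentences.iterator();
        }

        /** Returns the next sentence, or null once the end is reached. */
        String read() {
            return cursor.hasNext() ? cursor.next() : null;
        }

        /** Rewinds reading to the first sentence. */
        void reset() {
            cursor = sentences.iterator();
        }
    }

    /** Collects all remaining sentences by calling read() until it returns null. */
    static List<String> readAll(StubDocument doc) {
        List<String> out = new ArrayList<>();
        String sentence;
        while ((sentence = doc.read()) != null) {
            out.add(sentence);
        }
        return out;
    }

    public static void main(String[] args) {
        StubDocument doc = new StubDocument(
                Arrays.asList("First sentence.", "Second sentence."));
        System.out.println(readAll(doc)); // first pass over the document
        doc.reset();                      // rewind before a second pass
        System.out.println(readAll(doc));
    }
}
```

With a real MascDocument the same loop shape would presumably apply: call read() until it returns null, and call reset() before iterating again. The null check matters because the end of the document is signalled by the return value rather than by an exception.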