Package opennlp.tools.formats.masc
Class MascDocument
- java.lang.Object
  - opennlp.tools.formats.masc.MascDocument
 
public class MascDocument extends Object

Constructor Summary
- MascDocument(String path, List<MascSentence> sentences)

Method Summary
- boolean hasNamedEntities()
- boolean hasPennTags() - Check whether there is Penn tagging produced by GATE-5.0 ANNIE.
- static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) - Creates a MASC document with all of the stand-off annotations translated into the internal structure.
- MascSentence read() - Get next sentence.
- void reset() - Return the reading of sentences to the beginning of the document.
 
Constructor Detail

MascDocument
public MascDocument(String path, List<MascSentence> sentences)
 
Method Detail

parseDocument
public static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) throws IOException
Creates a MASC document with all of the stand-off annotations translated into the internal structure.
Parameters:
- path - The path where the document header is.
- f_primary - The file with the raw corpus text.
- f_seg - The file with segmentation into quarks.
- f_ne - The file with named entities.
- f_penn - The file with tokenization and Penn POS tags produced by GATE-5.0 ANNIE application.
- f_s - The file with sentence boundaries.
Returns:
- A document containing the text and its annotations. Immutability is not guaranteed yet.
Throws:
- IOException - if the raw data cannot be read or the alignment of the raw data with annotations fails
 
hasPennTags
public boolean hasPennTags()
Check whether there is Penn tagging produced by GATE-5.0 ANNIE.
Returns:
- true if this file has aligned tags/tokens
 
hasNamedEntities
public boolean hasNamedEntities()

read
public MascSentence read()
Get next sentence.
Returns:
- Next sentence or null if end of document reached.
 
reset
public void reset()
Return the reading of sentences to the beginning of the document.
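
The read() and reset() descriptions imply a simple contract: read() hands back sentences one at a time and returns null at the end of the document, and reset() rewinds to the first sentence. The sketch below illustrates the usual drain-until-null loop for that kind of contract. It does not use OpenNLP itself: SentenceSource, ListSource, and readAll are hypothetical stand-ins introduced only for this illustration. With the real class, a MascDocument would presumably take the place of the source and MascSentence the place of T.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SentenceReadLoop {

    /** Hypothetical stand-in for the documented read()/reset() contract; not an OpenNLP type. */
    interface SentenceSource<T> {
        T read();     // next item, or null once the end is reached
        void reset(); // rewind so the next read() returns the first item again
    }

    /** List-backed source used only to make this sketch runnable. */
    static final class ListSource<T> implements SentenceSource<T> {
        private final List<T> items;
        private Iterator<T> cursor;

        ListSource(List<T> items) {
            this.items = new ArrayList<>(items);
            this.cursor = this.items.iterator();
        }

        @Override
        public T read() {
            return cursor.hasNext() ? cursor.next() : null;
        }

        @Override
        public void reset() {
            cursor = items.iterator();
        }
    }

    /** Reads from the current position until read() returns null. */
    static <T> List<T> readAll(SentenceSource<T> source) {
        List<T> out = new ArrayList<>();
        T next;
        while ((next = source.read()) != null) {
            out.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        ListSource<String> doc =
                new ListSource<>(Arrays.asList("First sentence.", "Second sentence."));
        List<String> firstPass = readAll(doc);
        doc.reset(); // without this, a second pass would find nothing left to read
        List<String> secondPass = readAll(doc);
        System.out.println(firstPass.equals(secondPass));
    }
}
```

Because the end of the document is signalled by null rather than an exception, a caller that wants a second pass has to call reset() explicitly before reading again.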
 