Package opennlp.tools.formats.masc
Class MascDocument
- java.lang.Object
  - opennlp.tools.formats.masc.MascDocument
public class MascDocument extends Object
-
Constructor Summary
Constructor
MascDocument(String path, List<MascSentence> sentences)
-
Method Summary
Modifier and Type, Method, Description
- boolean hasNamedEntities()
- boolean hasPennTags()
  Check whether there is Penn tagging produced by GATE-5.0 ANNIE.
- static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne)
  Creates a MASC document with all of the stand-off annotations translated into the internal structure.
- MascSentence read()
  Get next sentence.
- void reset()
  Return the reading of sentences to the beginning of the document.
-
Constructor Detail
-
MascDocument
public MascDocument(String path, List<MascSentence> sentences)
-
Method Detail
-
parseDocument
public static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) throws IOException
Creates a MASC document with all of the stand-off annotations translated into the internal structure.
- Parameters:
  path - The path where the document header is.
  f_primary - The file with the raw corpus text.
  f_seg - The file with segmentation into quarks.
  f_penn - The file with tokenization and Penn POS tags produced by GATE-5.0 ANNIE application.
  f_s - The file with sentence boundaries.
  f_ne - The file with named entities.
- Returns:
  A document containing the text and its annotations. Immutability is not guaranteed yet.
- Throws:
  IOException - if the raw data cannot be read or the alignment of the raw data with annotations fails
-
hasPennTags
public boolean hasPennTags()
Check whether there is Penn tagging produced by GATE-5.0 ANNIE.
- Returns:
  true if this file has aligned tags/tokens
-
hasNamedEntities
public boolean hasNamedEntities()
-
read
public MascSentence read()
Get next sentence.
- Returns:
  Next sentence or null if end of document reached.
-
reset
public void reset()
Return the reading of sentences to the beginning of the document.
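The read() and reset() contract described above (read() returns null at the end of the document, reset() rewinds to the first sentence) can be illustrated with a small sketch. To stay runnable without the OpenNLP jars, it uses a hypothetical stand-in class, SentenceCursor, that holds plain strings instead of MascSentence objects. The class names SentenceCursorSketch and SentenceCursor and the helper readAll are assumptions made for illustration and are not part of OpenNLP; the same loop shape should apply to a real MascDocument.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceCursorSketch {

    /** Hypothetical stand-in for MascDocument: same read()/reset() contract, strings as sentences. */
    static final class SentenceCursor {
        private final List<String> sentences;
        private int next = 0;

        SentenceCursor(List<String> sentences) {
            this.sentences = new ArrayList<>(sentences);
        }

        /** Returns the next sentence, or null once the end of the document is reached. */
        String read() {
            return next < sentences.size() ? sentences.get(next++) : null;
        }

        /** Rewinds so that the next read() returns the first sentence again. */
        void reset() {
            next = 0;
        }
    }

    /** Reads from the cursor's current position until read() returns null. */
    static List<String> readAll(SentenceCursor cursor) {
        List<String> out = new ArrayList<>();
        String sentence;
        while ((sentence = cursor.read()) != null) {
            out.add(sentence);
        }
        return out;
    }

    public static void main(String[] args) {
        SentenceCursor doc = new SentenceCursor(List.of("First sentence.", "Second sentence."));
        List<String> firstPass = readAll(doc);
        doc.reset(); // without this, a second pass would read nothing
        List<String> secondPass = readAll(doc);
        System.out.println(firstPass.equals(secondPass)); // prints true
    }
}
```

Because the end of the document is signalled by null and not by an exception, a caller that needs a second pass over the same document has to call reset() first.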