Package opennlp.tools.formats.ad
Class ADNameSampleStream
java.lang.Object
opennlp.tools.formats.ad.ADNameSampleStream
- All Implemented Interfaces:
AutoCloseable, ObjectStream<NameSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for
Portuguese NER training.
The data contains ten named entity types: Person, Organization, Group,
Place, Event, ArtProd, Abstract, Thing, Time, and Numeric.
Data can be found on this web site.
Information about the format:
Susana Afonso.
"Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006.
Detailed info about the NER tagset.
Note: Do not use this class, internal use only!
Constructor Summary
Constructor | Description
ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) | Deprecated, for removal: This API element is subject to removal in a future version.
ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens) | Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
Method Summary
Modifier and Type | Method | Description
void | close() | Closes the ObjectStream and releases all allocated resources.
NameSample | read() | Returns the next ObjectStream object.
void | reset() | Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly.
Constructor Details
ADNameSampleStream
Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
- Parameters:
lineStream - An ObjectStream<String> as input.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" becomes "carros", "-", "monstro".
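The splitHyphenatedTokens flag is described only through its example. The dependency-free sketch below reproduces that example so the effect is easy to see; it is an illustration, not OpenNLP's implementation. The class name HyphenSplitSketch and the choice to leave tokens that begin or end with a hyphen unchanged are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (hypothetical, not OpenNLP code) of the documented effect of
 * splitHyphenatedTokens = true: "carros-monstro" -> "carros", "-", "monstro".
 */
final class HyphenSplitSketch {

  private HyphenSplitSketch() {}

  /** Splits a token on internal hyphens, keeping each hyphen as its own token. */
  static List<String> split(String token) {
    List<String> parts = new ArrayList<>();
    // Assumption of this sketch: empty tokens and tokens that begin or end with
    // a hyphen (including a lone "-") are kept as they are.
    if (token.isEmpty() || token.startsWith("-") || token.endsWith("-")) {
      parts.add(token);
      return parts;
    }
    StringBuilder current = new StringBuilder();
    for (char c : token.toCharArray()) {
      if (c == '-') {
        // Flush the word collected so far, then emit the hyphen separately.
        if (current.length() > 0) {
          parts.add(current.toString());
          current.setLength(0);
        }
        parts.add("-");
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) {
      parts.add(current.toString());
    }
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(split("carros-monstro")); // [carros, -, monstro]
    System.out.println(split("Lisboa"));         // [Lisboa]
  }
}
```

Calling HyphenSplitSketch.split("carros-monstro") returns the three tokens from the parameter description, while an unhyphenated token such as "Lisboa" comes back as a single-element list.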
ADNameSampleStream
@Deprecated(forRemoval=true) public ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) throws IOException
Deprecated, for removal: This API element is subject to removal in a future version.
Initializes a new ADNameSampleStream from an InputStreamFactory.
- Parameters:
in - The corpus InputStreamFactory.
charsetName - The charset to use for reading the corpus.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" becomes "carros", "-", "monstro".
- Throws:
IOException
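Since the InputStreamFactory constructor is deprecated for removal, the remaining option is to decode the corpus into lines first and pass the resulting ObjectStream<String> to the other constructor; in OpenNLP that line stream would typically be a PlainTextByLineStream, as the constructor description notes. The sketch below shows, without any OpenNLP dependency, why the charset still matters at that step. The class LineStreamSketch is hypothetical, and ISO-8859-1 is used only as an example encoding, not as a statement about the corpus files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, dependency-free stand-in for a line-oriented ObjectStream<String>
 * such as PlainTextByLineStream. Not OpenNLP code.
 */
final class LineStreamSketch implements AutoCloseable {
  private final BufferedReader reader;

  LineStreamSketch(InputStream in, Charset charset) {
    // The charset is fixed here, at the point where bytes become text.
    this.reader = new BufferedReader(new InputStreamReader(in, charset));
  }

  /** Returns the next line, or null at end of input (the ObjectStream convention). */
  String read() throws IOException {
    return reader.readLine();
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }

  public static void main(String[] args) throws IOException {
    // "Árvores" encoded as ISO-8859-1 bytes (example encoding for this demo).
    byte[] data = "\u00C1rvores\nFloresta\n".getBytes(StandardCharsets.ISO_8859_1);
    try (LineStreamSketch lines =
        new LineStreamSketch(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)) {
      String line;
      while ((line = lines.read()) != null) {
        System.out.println(line);
      }
    }
  }
}
```

Decoding the same bytes as UTF-8 would not reproduce "Árvores", because the byte 0xC1 is not valid UTF-8; this is why the charset choice belongs to whoever builds the line stream.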
Method Details
read
Description copied from interface: ObjectStream
Returns the next ObjectStream object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
read in interface ObjectStream<NameSample>
- Returns:
The next object, or null to signal that the stream is exhausted.
- Throws:
IOException - Thrown if there is an error during reading.
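The read() contract above leads to the usual drain loop: call read() until it returns null. So that the sketch runs without OpenNLP on the classpath, it uses a hypothetical SketchStream interface and a list-backed ListSketchStream in place of ObjectStream<NameSample>; with ADNameSampleStream, the element type would be NameSample.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal stand-in for the ObjectStream read() contract; not OpenNLP code. */
interface SketchStream<T> {
  /** Returns the next object, or null once the stream is exhausted. */
  T read() throws IOException;
}

/** List-backed stream used only for this illustration. */
final class ListSketchStream<T> implements SketchStream<T> {
  private final Iterator<T> items;

  ListSketchStream(List<T> items) {
    this.items = items.iterator();
  }

  @Override
  public T read() {
    return items.hasNext() ? items.next() : null;
  }
}

final class DrainSketch {
  private DrainSketch() {}

  /** Calls read() until it returns null, collecting every object exactly once. */
  static <T> List<T> drain(SketchStream<T> stream) throws IOException {
    List<T> out = new ArrayList<>();
    T next;
    while ((next = stream.read()) != null) {
      out.add(next);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(drain(new ListSketchStream<>(List.of("s1", "s2", "s3")))); // [s1, s2, s3]
  }
}
```

Because null is the end-of-stream signal, a stream of this kind cannot deliver null as a regular element.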
reset
Description copied from interface: ObjectStream
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required. Implementing this method is optional.
- Specified by:
reset in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error while resetting the stream.
UnsupportedOperationException - Thrown if reset() is not supported. By default, this is the case.
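Because reset() is optional and throws UnsupportedOperationException by default, multi-pass code must either know that its stream supports resetting or be prepared for that exception. The sketch mirrors this with a hypothetical ResettableSketchStream interface whose default reset() throws, plus an in-memory ReplayableListStream that overrides it; none of these names are OpenNLP API.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in mirroring ObjectStream's optional reset(); not OpenNLP code. */
interface ResettableSketchStream<T> {
  /** Returns the next object, or null when the stream is exhausted. */
  T read() throws IOException;

  /** Optional operation: unsupported unless an implementation overrides it. */
  default void reset() throws IOException {
    throw new UnsupportedOperationException("reset() is not supported");
  }
}

/** In-memory stream that supports reset() by rewinding to the first element. */
final class ReplayableListStream<T> implements ResettableSketchStream<T> {
  private final List<T> items;
  private int position = 0;

  ReplayableListStream(List<T> items) {
    this.items = List.copyOf(items);
  }

  @Override
  public T read() {
    return position < items.size() ? items.get(position++) : null;
  }

  @Override
  public void reset() {
    position = 0; // the next read() starts the same sequence again
  }
}

final class MultiPassSketch {
  private MultiPassSketch() {}

  /** Reads the whole stream 'passes' times, calling reset() between passes. */
  static int countOverPasses(ResettableSketchStream<?> stream, int passes) throws IOException {
    int total = 0;
    for (int pass = 0; pass < passes; pass++) {
      if (pass > 0) {
        stream.reset(); // throws UnsupportedOperationException if not overridden
      }
      while (stream.read() != null) {
        total++;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    ResettableSketchStream<String> replayable = new ReplayableListStream<>(List.of("a", "b", "c"));
    System.out.println(countOverPasses(replayable, 2)); // 6

    ResettableSketchStream<String> oneShot = () -> null; // keeps the default reset()
    try {
      countOverPasses(oneShot, 2);
    } catch (UnsupportedOperationException e) {
      System.out.println("reset() not supported");
    }
  }
}
```

An alternative design for streams that cannot rewind is to collect the objects into memory once and iterate that copy, at the cost of holding the whole corpus in memory.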
close
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close() has been called, calling ObjectStream.read() or ObjectStream.reset() is not allowed.
- Specified by:
close in interface AutoCloseable
close in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error while closing the stream.
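ADNameSampleStream is listed as implementing AutoCloseable at the top of this page, so try-with-resources is the natural way to guarantee that close() runs. The sketch uses a hypothetical CloseableSketchStream rather than the real class; its IllegalStateException on read() after close() is an assumption added to make the "not allowed" rule visible, since the documentation does not say what a violating call does.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical AutoCloseable stream (not OpenNLP code). Unlike the interface contract,
 * which only says read() is not allowed after close(), this sketch enforces the rule.
 */
final class CloseableSketchStream implements AutoCloseable {
  private final Iterator<String> items;
  private boolean closed = false;

  CloseableSketchStream(List<String> items) {
    this.items = items.iterator();
  }

  String read() throws IOException {
    if (closed) {
      throw new IllegalStateException("read() called after close()");
    }
    return items.hasNext() ? items.next() : null;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() throws IOException {
    closed = true; // a real stream would release files or readers here
  }
}

final class TryWithResourcesSketch {
  private TryWithResourcesSketch() {}

  /** Counts all objects; close() runs automatically, even if read() throws. */
  static int countAndClose(CloseableSketchStream stream) throws IOException {
    int count = 0;
    try (CloseableSketchStream s = stream) {
      while (s.read() != null) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    CloseableSketchStream stream = new CloseableSketchStream(List.of("x", "y"));
    System.out.println(countAndClose(stream)); // 2
    System.out.println(stream.isClosed());     // true
  }
}
```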