Package opennlp.tools.formats.ad
Class ADNameSampleStream
java.lang.Object
opennlp.tools.formats.ad.ADNameSampleStream
- All Implemented Interfaces:
AutoCloseable, ObjectStream<NameSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for
Portuguese NER training.
The data contains ten named entity types: Person, Organization, Group,
Place, Event, ArtProd, Abstract, Thing, Time and Numeric.
Data can be found on this web site.
Information about the format:
Susana Afonso.
"Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 de Fevereiro de 2006.
Detailed info about the NER tagset.
Note: Do not use this class, internal use only!
-
Constructor Summary
Constructors (Constructor: Description)
ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens): Deprecated, for removal. This API element will be removed in a future version.
ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens): Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
-
Method Summary
Methods (Modifier and Type, Method: Description)
void close(): Closes the ObjectStream and releases all allocated resources.
NameSample read(): Returns the next object from the stream.
void reset(): Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly.
-
Constructor Details
-
ADNameSampleStream
Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
- Parameters:
lineStream - An ObjectStream<String> as input.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
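The effect of splitHyphenatedTokens described above can be pictured with a small standalone sketch. The class HyphenSplitSketch and the method splitHyphenated are hypothetical names invented for this illustration; this is not the OpenNLP implementation, only a minimal model of the "carros-monstro" > "carros" "-" "monstro" behavior, assuming every hyphen becomes its own token.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of OpenNLP. */
class HyphenSplitSketch {

    /** Splits a token at each hyphen and keeps the hyphen as a separate token. */
    static List<String> splitHyphenated(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c == '-') {
                // Flush the text collected so far, then emit the hyphen itself.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add("-");
            } else {
                current.append(c);
            }
        }
        // Flush whatever follows the last hyphen (or the whole token if none).
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitHyphenated("carros-monstro")); // [carros, -, monstro]
    }
}
```

With the flag set to false, the token would presumably stay intact as "carros-monstro"; the sketch models only the true case.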
-
ADNameSampleStream
@Deprecated(forRemoval=true)
public ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) throws IOException
Deprecated, for removal: This API element will be removed in a future version.
Initializes a new ADNameSampleStream from an InputStreamFactory.
- Parameters:
in - The corpus InputStreamFactory.
charsetName - The charset to use for reading the corpus.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
- Throws:
IOException
-
-
Method Details
-
read
Description copied from interface: ObjectStream
Returns the next object from the stream. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
read in interface ObjectStream<NameSample>
- Returns:
The next object, or null to signal that the stream is exhausted.
- Throws:
IOException - Thrown if there is an error during reading.
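The read-until-null contract described above can be sketched without OpenNLP on the classpath. SimpleStream is a hypothetical stand-in interface invented for this example; it mirrors only the documented behavior of read() and close(), and it omits the IOException that the real ObjectStream methods declare. The list-backed factory method of(...) is likewise an assumption, used so that the example runs without a corpus file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: models the read() contract with a stand-in interface. */
class ReadLoopSketch {

    /** Hypothetical stand-in; the real ObjectStream methods also declare IOException. */
    interface SimpleStream<T> extends AutoCloseable {
        T read();      // next object, or null once the stream is exhausted

        @Override
        void close();  // releases resources; no reads are allowed afterwards
    }

    /** List-backed stream so the example needs no input file. */
    static <T> SimpleStream<T> of(List<T> items) {
        Iterator<T> it = items.iterator();
        return new SimpleStream<T>() {
            @Override
            public T read() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory list.
            }
        };
    }

    /** Collects every object by calling read() until it returns null. */
    static <T> List<T> drain(SimpleStream<T> stream) {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        // try-with-resources closes the stream once the loop is done.
        try (SimpleStream<String> s = of(List.of("a", "b", "c"))) {
            System.out.println(drain(s)); // [a, b, c]
        }
    }
}
```

Against the real API, the same loop shape would apply to an ObjectStream<NameSample>, with the IOException either caught or propagated.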
-
reset
Description copied from interface: ObjectStream
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required.
The implementation of this method is optional.
- Specified by:
reset in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error while resetting the stream.
UnsupportedOperationException - Thrown if reset() is not supported. By default, this is the case.
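Because reset() is optional, code that needs several passes may want to guard the call. The sketch below uses hypothetical stand-ins (ResettableStream, ListStream, tryReset, countPass) rather than the OpenNLP types, and leaves out the checked IOException for brevity. It shows one possible pattern: attempt a reset, and treat UnsupportedOperationException as "no second pass available".

```java
import java.util.List;

/** Illustrative only: models an optional reset() with stand-in types. */
class ResetSketch {

    /** Hypothetical stand-in for a stream whose reset() may be unsupported. */
    interface ResettableStream<T> {
        T read();      // next object, or null once exhausted
        void reset();  // may throw UnsupportedOperationException
    }

    /** In-memory stream that does support reset(). */
    static final class ListStream<T> implements ResettableStream<T> {
        private final List<T> items;
        private int pos = 0;

        ListStream(List<T> items) {
            this.items = items;
        }

        @Override
        public T read() {
            return pos < items.size() ? items.get(pos++) : null;
        }

        @Override
        public void reset() {
            pos = 0; // the same sequence will be replayed from the start
        }
    }

    /** Reads to exhaustion and returns how many objects were seen. */
    static <T> int countPass(ResettableStream<T> stream) {
        int n = 0;
        while (stream.read() != null) {
            n++;
        }
        return n;
    }

    /** Returns true if the reset succeeded, false if it is unsupported. */
    static boolean tryReset(ResettableStream<?> stream) {
        try {
            stream.reset();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ResettableStream<String> s = new ListStream<>(List.of("x", "y"));
        int first = countPass(s);
        if (tryReset(s)) {
            int second = countPass(s);
            System.out.println(first + " " + second); // 2 2
        }
    }
}
```

When a reset is not available, an alternative is to reopen the underlying source and construct a fresh stream for the second pass.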
-
close
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, calling ObjectStream.read() or ObjectStream.reset() is not allowed.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error while closing the stream.
-