Class DictionaryLemmatizer
- All Implemented Interfaces:
Lemmatizer
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line: word<TAB>postag<TAB>lemma.
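The lookup described above can be pictured with the following minimal sketch. TabLemmaTable is a hypothetical class, not part of OpenNLP. It assumes one entry per line, skips malformed lines, and keys a HashMap by the (word, postag) pair, relying on the element-wise equals and hashCode contract of java.util.List.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not OpenNLP code): builds a (word, postag) -> lemma
// map from tab-separated lines and looks lemmas up in it.
final class TabLemmaTable {
    private final Map<List<String>, String> table = new HashMap<>();

    TabLemmaTable(Iterable<String> lines) {
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                continue; // assumption: malformed lines are skipped
            }
            // The key is the (word, postag) pair; List equality is element-wise.
            table.put(List.of(fields[0], fields[1]), fields[2]);
        }
    }

    // Returns the stored lemma, or null when the pair is not in the dictionary.
    String lookup(String word, String postag) {
        return table.get(List.of(word, postag));
    }
}
```

For example, new TabLemmaTable(List.of("ran\tVBD\trun")).lookup("ran", "VBD") returns "run", while a (word, postag) pair that does not appear in the file returns null.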
-
Constructor Summary
Constructors
- DictionaryLemmatizer(File dictionaryFile): Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary.
- DictionaryLemmatizer(File dictionaryFile, Charset charset): Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary.
- DictionaryLemmatizer(InputStream dictionaryStream): Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary.
- DictionaryLemmatizer(InputStream dictionaryStream, Charset charset): Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary.
- DictionaryLemmatizer(Path dictionaryPath): Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary.
-
Method Summary
-
Constructor Details
-
DictionaryLemmatizer(InputStream dictionaryStream, Charset charset)
Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary. The input file should have, for each line, word<TAB>postag<TAB>lemma. Alternatively, if multiple lemmas are possible for a word-postag pair, the format should be word<TAB>postag<TAB>lemma01#lemma02#lemma03.
- Parameters:
dictionaryStream - The dictionary referenced by an open InputStream.
charset - The character encoding of the dictionary.
- Throws:
IOException - Thrown if I/O errors occur while reading from dictionaryStream.
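A minimal sketch of reading the multi-lemma format from a stream decoded with an explicit charset follows. MultiLemmaReader is hypothetical and does not claim to mirror how DictionaryLemmatizer is implemented; it assumes that a single lemma is simply the one-element case of the #-separated list and that malformed lines are skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reads word<TAB>postag<TAB>lemma01#lemma02#... lines
// from a stream decoded with the given charset.
final class MultiLemmaReader {
    static Map<List<String>, List<String>> read(InputStream in, Charset charset) throws IOException {
        Map<List<String>, List<String>> dict = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t");
                if (fields.length != 3) {
                    continue; // assumption: malformed lines are skipped
                }
                // A single lemma yields a one-element list; "#" separates alternatives.
                List<String> lemmas = Arrays.asList(fields[2].split("#"));
                dict.put(List.of(fields[0], fields[1]), lemmas);
            }
        }
        return dict;
    }
}
```

Passing the charset the file was actually written in matters: ISO-8859-1 bytes for a word such as Häuser, decoded as UTF-8, produce a replacement character instead of ä, so the (word, postag) key would no longer match.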
-
DictionaryLemmatizer(InputStream dictionaryStream)
Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary. The input file should have, for each line, word<TAB>postag<TAB>lemma. Alternatively, if multiple lemmas are possible for a word-postag pair, the format should be word<TAB>postag<TAB>lemma01#lemma02#lemma03.
- Parameters:
dictionaryStream - The dictionary referenced by an open InputStream.
- Throws:
IOException - Thrown if I/O errors occur while reading from dictionaryStream.
-
DictionaryLemmatizer(File dictionaryFile)
Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary. The input file should have, for each line, word<TAB>postag<TAB>lemma. Alternatively, if multiple lemmas are possible for a word-postag pair, the format should be word<TAB>postag<TAB>lemma01#lemma02#lemma03.
- Parameters:
dictionaryFile - The dictionary referenced by a valid, readable File.
- Throws:
IOException - Thrown if I/O errors occur while reading from dictionaryFile.
-
DictionaryLemmatizer(File dictionaryFile, Charset charset)
Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary. The input file should have, for each line, word<TAB>postag<TAB>lemma. Alternatively, if multiple lemmas are possible for a word-postag pair, the format should be word<TAB>postag<TAB>lemma01#lemma02#lemma03.
- Parameters:
dictionaryFile - The dictionary referenced by a valid, readable File.
charset - The character encoding of the dictionary.
- Throws:
IOException - Thrown if I/O errors occur while reading from dictionaryFile.
-
DictionaryLemmatizer(Path dictionaryPath)
Initializes a DictionaryLemmatizer and related HashMap from the input tab-separated dictionary. The input file should have, for each line, word<TAB>postag<TAB>lemma. Alternatively, if multiple lemmas are possible for a word-postag pair, the format should be word<TAB>postag<TAB>lemma01#lemma02#lemma03.
- Parameters:
dictionaryPath - The dictionary referenced via a valid, readable Path.
- Throws:
IOException - Thrown if I/O errors occur while reading from dictionaryPath.
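The five constructors differ only in how the dictionary is located and decoded. One plausible way to structure such overloads, shown here as a hypothetical sketch rather than the OpenNLP source, is to funnel every overload into a single InputStream plus Charset code path. DictionaryLoaderSketch only collects the raw lines, and its UTF-8 default for overloads without a charset is an assumption; this page does not state which encoding those overloads use.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every overload delegates to the (InputStream, Charset)
// constructor, so decoding and reading happen in exactly one place.
final class DictionaryLoaderSketch {
    final List<String> lines = new ArrayList<>();

    DictionaryLoaderSketch(InputStream in, Charset charset) throws IOException {
        // try-with-resources closes the reader and therefore the underlying stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
    }

    DictionaryLoaderSketch(InputStream in) throws IOException {
        this(in, StandardCharsets.UTF_8); // assumed default charset
    }

    DictionaryLoaderSketch(Path path) throws IOException {
        this(Files.newInputStream(path));
    }

    DictionaryLoaderSketch(File file, Charset charset) throws IOException {
        this(Files.newInputStream(file.toPath()), charset);
    }

    DictionaryLoaderSketch(File file) throws IOException {
        this(file, StandardCharsets.UTF_8); // assumed default charset
    }
}
```

This sketch closes the stream it reads from, including one supplied by the caller; whether the real constructors do the same is not documented on this page.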
-
-
Method Details
-
getDictMap()
- Returns:
The Map containing the dictionary.
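Because getDictMap exposes the underlying map, callers can query the dictionary directly. The sketch below assumes the map is keyed by (word, postag) lists and holds lists of lemmas as values; those element types are not stated on this page. DictMapQueries.formsOf is a hypothetical helper that inverts the lookup to find every word form listed for a given lemma.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: given a map shaped like (word, postag) -> [lemma, ...],
// collect every word form that lists the requested lemma.
final class DictMapQueries {
    static Set<String> formsOf(Map<List<String>, List<String>> dictMap, String lemma) {
        Set<String> forms = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<List<String>, List<String>> e : dictMap.entrySet()) {
            if (e.getValue().contains(lemma)) {
                forms.add(e.getKey().get(0)); // index 0 holds the word form
            }
        }
        return forms;
    }
}
```

The scan is linear in the size of the dictionary, which is acceptable for occasional queries; repeated reverse lookups would justify building a separate lemma-to-forms index once.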
-
lemmatize(String[] tokens, String[] postags)
Description copied from interface: Lemmatizer
Generates lemmas for each word and its postag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
tokens - An array of the tokens.
postags - An array of the POS tags.
- Returns:
An array of possible lemmas, one for each token in the tokens sequence.
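The array-based contract can be sketched as a per-index lookup over parallel arrays. ArrayLemmatizeSketch is a hypothetical helper operating on a map assumed to be shaped like (word, postag) to list of lemmas. Two behaviors are assumptions of this sketch, not documented here: only the first candidate is returned when several lemmas are listed, and the placeholder "O" is returned for pairs missing from the dictionary.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the array-based contract: one lemma per token,
// looked up by the (token, postag) pair at the same index.
final class ArrayLemmatizeSketch {
    static final String UNKNOWN = "O"; // assumed placeholder for missing entries

    static String[] lemmatize(Map<List<String>, List<String>> dict, String[] tokens, String[] postags) {
        if (tokens.length != postags.length) {
            throw new IllegalArgumentException("tokens and postags must have the same length");
        }
        String[] lemmas = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            List<String> candidates = dict.get(List.of(tokens[i], postags[i]));
            // Keep only the first candidate when several lemmas are listed.
            lemmas[i] = (candidates == null || candidates.isEmpty()) ? UNKNOWN : candidates.get(0);
        }
        return lemmas;
    }
}
```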
-
lemmatize(List<String> tokens, List<String> posTags)
Description copied from interface: Lemmatizer
Generates lemma tags for each word and its postag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
tokens - A list of the tokens.
posTags - A list of the POS tags.
- Returns:
A list of every possible lemma for each token in the tokens sequence.
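The list-based overload differs in that it keeps every candidate lemma per token. ListLemmatizeSketch is a hypothetical helper that assumes a (word, postag) to list-of-lemmas map and returns a one-element ["O"] placeholder for unknown pairs; both are assumptions, not behavior stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the list-based contract: every candidate lemma per token.
final class ListLemmatizeSketch {
    static List<List<String>> lemmatize(Map<List<String>, List<String>> dict,
                                        List<String> tokens, List<String> posTags) {
        if (tokens.size() != posTags.size()) {
            throw new IllegalArgumentException("tokens and posTags must have the same size");
        }
        List<List<String>> result = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); i++) {
            List<String> candidates = dict.get(List.of(tokens.get(i), posTags.get(i)));
            // Assumption: unknown pairs get a one-element ["O"] placeholder.
            result.add(candidates == null ? List.of("O") : List.copyOf(candidates));
        }
        return result;
    }
}
```

For the token axes tagged NNS and a dictionary line axes<TAB>NNS<TAB>axe#axis, this returns [axe, axis] at that position, whereas a lookup that returns one lemma per token can report only one of the two.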
-