Package opennlp.tools.lemmatizer
Class DictionaryLemmatizer
- java.lang.Object
-
- opennlp.tools.lemmatizer.DictionaryLemmatizer
-
- All Implemented Interfaces:
Lemmatizer
public class DictionaryLemmatizer extends Object implements Lemmatizer
Lemmatizes by simple dictionary lookup in a hash map built from a file in which each line has the form word\tpostag\tlemma, where \t is a tab character.
- Version:
- 2014-07-08
Constructor Summary
Constructors:
- DictionaryLemmatizer(File dictionaryFile)
- DictionaryLemmatizer(File dictionaryFile, Charset charset)
- DictionaryLemmatizer(InputStream dictionary)
- DictionaryLemmatizer(InputStream dictionary, Charset charset)
  Constructs a hash map from the tab-separated input dictionary.
- DictionaryLemmatizer(Path dictionaryFile)
-
Method Summary
Methods:
- Map<List<String>,List<String>> getDictMap()
  Gets the Map containing the dictionary.
- String[] lemmatize(String[] tokens, String[] postags)
  Generates a lemma for each word and POS tag pair, returning the results in an array.
- List<List<String>> lemmatize(List<String> tokens, List<String> posTags)
  Generates the lemmas for each word and POS tag pair, returning a list of every possible lemma for each token and POS tag.
-
-
-
Constructor Detail
-
DictionaryLemmatizer
public DictionaryLemmatizer(InputStream dictionary, Charset charset) throws IOException
Constructs a hash map from the tab-separated input dictionary. Each line of the input should have the form word\tpostag\tlemma, where \t is a tab character. If multiple lemmas are possible for a word and postag pair, the line should instead have the form word\tpostag\tlemma01#lemma02#lemma03, with the lemmas separated by #.
- Parameters:
dictionary
- the input dictionary, as an input stream
charset
- the encoding of the input stream
- Throws:
IOException
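The line format described above can be illustrated with a small JDK-only sketch. It is not the OpenNLP implementation: the class name DictionaryFormatSketch, the parse helper, and the choice to skip lines with fewer than three fields are all assumptions made for illustration. It only shows how word\tpostag\tlemma01#lemma02 lines could map onto a Map<List<String>,List<String>> of the shape returned by getDictMap().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the OpenNLP class.
class DictionaryFormatSketch {

  // Turns lines of the form word\tpostag\tlemma01#lemma02 into a lookup map.
  static Map<List<String>, List<String>> parse(List<String> lines) {
    Map<List<String>, List<String>> dict = new HashMap<>();
    for (String line : lines) {
      String[] fields = line.split("\t");
      if (fields.length < 3) {
        continue; // Assumption: malformed lines are skipped silently.
      }
      // Assumption: the key is the (word, postag) pair.
      List<String> key = Arrays.asList(fields[0], fields[1]);
      // Multiple lemmas for one pair are separated by '#'.
      List<String> lemmas = Arrays.asList(fields[2].split("#"));
      dict.put(key, lemmas);
    }
    return dict;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("houses\tNNS\thouse", "saw\tVBD\tsee#saw");
    Map<List<String>, List<String>> dict = parse(lines);
    System.out.println(dict.get(Arrays.asList("saw", "VBD"))); // prints [see, saw]
  }
}
```

Using a List as the map key works because List defines equals and hashCode over its elements, so two lists holding the same word and postag find the same entry.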
-
DictionaryLemmatizer
public DictionaryLemmatizer(InputStream dictionary) throws IOException
- Throws:
IOException
-
DictionaryLemmatizer
public DictionaryLemmatizer(File dictionaryFile) throws IOException
- Throws:
IOException
-
DictionaryLemmatizer
public DictionaryLemmatizer(File dictionaryFile, Charset charset) throws IOException
- Throws:
IOException
-
DictionaryLemmatizer
public DictionaryLemmatizer(Path dictionaryFile) throws IOException
- Throws:
IOException
-
-
Method Detail
-
getDictMap
public Map<List<String>,List<String>> getDictMap()
Gets the Map containing the dictionary.
- Returns:
- dictMap, the Map
-
lemmatize
public String[] lemmatize(String[] tokens, String[] postags)
Description copied from interface: Lemmatizer
Generates a lemma for each word and POS tag pair, returning the results in an array.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
tokens
- an array of the tokens
postags
- an array of the POS tags
- Returns:
- an array of possible lemmas for each token in the sequence.
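As a rough illustration of the array-based lookup, the following JDK-only sketch returns one lemma per token. It is not the OpenNLP implementation: the class ArrayLookupSketch is hypothetical, and both the "O" placeholder for pairs missing from the dictionary and the choice of the first lemma when several are listed are assumptions that this page does not state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the OpenNLP class.
class ArrayLookupSketch {

  // Assumption: placeholder used when a (token, postag) pair is not in the dictionary.
  static final String UNKNOWN = "O";

  static String[] lemmatize(Map<List<String>, List<String>> dict,
                            String[] tokens, String[] postags) {
    if (tokens.length != postags.length) {
      throw new IllegalArgumentException("tokens and postags must have the same length");
    }
    String[] lemmas = new String[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      List<String> found = dict.get(Arrays.asList(tokens[i], postags[i]));
      // Assumption: when several lemmas are listed, the first one is used.
      lemmas[i] = (found == null || found.isEmpty()) ? UNKNOWN : found.get(0);
    }
    return lemmas;
  }

  public static void main(String[] args) {
    Map<List<String>, List<String>> dict = new HashMap<>();
    dict.put(Arrays.asList("houses", "NNS"), Arrays.asList("house"));
    dict.put(Arrays.asList("saw", "VBD"), Arrays.asList("see", "saw"));

    String[] result = lemmatize(dict,
        new String[] {"saw", "houses", "xyzzy"},
        new String[] {"VBD", "NNS", "NN"});
    System.out.println(Arrays.toString(result)); // prints [see, house, O]
  }
}
```

The result array is parallel to the input arrays, so the lemma at index i belongs to the token and POS tag at index i.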
-
lemmatize
public List<List<String>> lemmatize(List<String> tokens, List<String> posTags)
Description copied from interface: Lemmatizer
Generates the lemmas for each word and POS tag pair, returning a list of every possible lemma for each token and POS tag.
- Specified by:
lemmatize in interface Lemmatizer
- Parameters:
tokens
- a list of the tokens
posTags
- a list of the POS tags
- Returns:
- a list of every possible lemma for each token in the sequence.
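The list-based variant differs from the array-based one in that it keeps every candidate lemma rather than a single one per token. The JDK-only sketch below shows that shape. The class ListLookupSketch is hypothetical, and returning a one-element list holding "O" for unknown pairs is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the OpenNLP class.
class ListLookupSketch {

  static List<List<String>> lemmatize(Map<List<String>, List<String>> dict,
                                      List<String> tokens, List<String> posTags) {
    if (tokens.size() != posTags.size()) {
      throw new IllegalArgumentException("tokens and posTags must have the same size");
    }
    List<List<String>> all = new ArrayList<>();
    for (int i = 0; i < tokens.size(); i++) {
      List<String> found = dict.get(Arrays.asList(tokens.get(i), posTags.get(i)));
      // Assumption: an unknown pair yields a one-element list holding "O".
      all.add(found == null ? Collections.singletonList("O") : new ArrayList<>(found));
    }
    return all;
  }

  public static void main(String[] args) {
    Map<List<String>, List<String>> dict = new HashMap<>();
    dict.put(Arrays.asList("saw", "VBD"), Arrays.asList("see", "saw"));

    List<List<String>> result = lemmatize(dict,
        Arrays.asList("saw", "xyzzy"),
        Arrays.asList("VBD", "NN"));
    System.out.println(result); // prints [[see, saw], [O]]
  }
}
```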
-
-