Package opennlp.tools.dictionary
Class Dictionary
java.lang.Object
opennlp.tools.dictionary.Dictionary
- All Implemented Interfaces:
Iterable<StringList>, SerializableArtifact
An iterable and serializable dictionary implementation.
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| Dictionary() | Initializes an empty Dictionary. |
| Dictionary(boolean caseSensitive) | Initializes an empty Dictionary. |
| Dictionary(InputStream in) | Initializes the Dictionary from an existing dictionary resource. |
-
Method Summary
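| Modifier and Type | Method | Description |
| --- | --- | --- |
| Set<String> | asStringSet() | Converts this Dictionary to a Set<String>. |
| boolean | contains(StringList tokens) | Checks if this dictionary has the given entry. |
| boolean | equals(Object obj) | |
| Class<?> | getArtifactSerializerClass() | Retrieves the class which can serialize and recreate this artifact. |
| int | getMaxTokenCount() | |
| int | getMinTokenCount() | |
| int | hashCode() | |
| boolean | isCaseSensitive() | |
| Iterator<StringList> | iterator() | |
| static Dictionary | parseOneEntryPerLine(Reader in) | Reads a Dictionary which has one entry per line. |
| void | put(StringList tokens) | Adds the tokens to the dictionary as one new entry. |
| void | remove(StringList tokens) | Removes the given tokens from the current instance. |
| void | serialize(OutputStream out) | Writes the current instance to the given OutputStream. |
| int | size() | |
| String | toString() | |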
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
Dictionary
public Dictionary()
Initializes an empty Dictionary. By default, the resulting instance will not be case-sensitive.
-
Dictionary
public Dictionary(boolean caseSensitive)
Initializes an empty Dictionary.
- Parameters:
caseSensitive - Whether the new instance will operate case-sensitively.
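The effect of the caseSensitive flag can be pictured with a small JDK-only sketch. CaseFlagSketch is a hypothetical stand-in, not the OpenNLP class, and lower-casing each token is only an assumption about how case-insensitive matching might be modeled; the library's internal comparison may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in, NOT opennlp.tools.dictionary.Dictionary.
class CaseFlagSketch {
    private final boolean caseSensitive;
    private final Set<List<String>> entries = new HashSet<>();

    CaseFlagSketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Assumption: case-insensitive matching is modeled by lower-casing every token.
    private List<String> normalize(String... tokens) {
        return Arrays.stream(tokens)
                .map(t -> caseSensitive ? t : t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Stores the tokens as one entry, normalized according to the flag.
    void put(String... tokens) {
        entries.add(normalize(tokens));
    }

    // Looks up the whole entry, normalized the same way as in put.
    boolean contains(String... tokens) {
        return entries.contains(normalize(tokens));
    }

    public static void main(String[] args) {
        CaseFlagSketch insensitive = new CaseFlagSketch(false);
        insensitive.put("New", "York");
        System.out.println(insensitive.contains("new", "york")); // true

        CaseFlagSketch sensitive = new CaseFlagSketch(true);
        sensitive.put("New", "York");
        System.out.println(sensitive.contains("new", "york")); // false
    }
}
```

With the flag set to false, the lookup succeeds regardless of letter case; with true, only an exact match is found.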
-
Dictionary
public Dictionary(InputStream in) throws IOException
Initializes the Dictionary from an existing dictionary resource.
- Parameters:
in - The InputStream that references the dictionary content.
- Throws:
IOException - Thrown if IO errors occurred.
-
-
Method Details
-
put
public void put(StringList tokens)
Adds the tokens to the dictionary as one new entry.
- Parameters:
tokens - The new entry.
-
getMinTokenCount
public int getMinTokenCount()
-
getMaxTokenCount
public int getMaxTokenCount()
-
contains
public boolean contains(StringList tokens)
Checks if this dictionary has the given entry.
- Parameters:
tokens - The query of tokens to be checked for.
- Returns:
true if it contains the entry, false otherwise.
-
remove
public void remove(StringList tokens)
Removes the given tokens from the current instance.
- Parameters:
tokens - The tokens to be filtered out (= removed).
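The put, contains, and remove descriptions above all treat a token sequence as one entry. A minimal JDK-only sketch of that idea follows; EntrySetSketch is a hypothetical stand-in built on a Set of token lists, and it assumes that an entry only matches when the complete token sequence matches. It does not reproduce the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, NOT opennlp.tools.dictionary.Dictionary.
class EntrySetSketch {
    // Each entry is the full token sequence, stored as one element.
    private final Set<List<String>> entries = new HashSet<>();

    void put(String... tokens) {
        entries.add(new ArrayList<>(Arrays.asList(tokens)));
    }

    // Assumption: only the complete token sequence counts as a match.
    boolean contains(String... tokens) {
        return entries.contains(Arrays.asList(tokens));
    }

    void remove(String... tokens) {
        entries.remove(Arrays.asList(tokens));
    }

    public static void main(String[] args) {
        EntrySetSketch sketch = new EntrySetSketch();
        sketch.put("New", "York");
        System.out.println(sketch.contains("New", "York")); // true
        System.out.println(sketch.contains("New"));         // false
        sketch.remove("New", "York");
        System.out.println(sketch.contains("New", "York")); // false
    }
}
```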
-
iterator
public Iterator<StringList> iterator()
- Specified by:
iterator in interface Iterable<StringList>
- Returns:
A token Iterator over all elements.
-
size
public int size()
- Returns:
The number of tokens in the current instance.
-
serialize
public void serialize(OutputStream out) throws IOException
Writes the current instance to the given OutputStream.
- Parameters:
out - A valid OutputStream, ready for serialization.
- Throws:
IOException - Thrown if IO errors occurred.
-
equals
public boolean equals(Object obj)
-
hashCode
public int hashCode()
-
toString
public String toString()
-
parseOneEntryPerLine
public static Dictionary parseOneEntryPerLine(Reader in) throws IOException
Reads a Dictionary which has one entry per line. The tokens inside an entry are whitespace-delimited.
- Parameters:
in - A Reader instance used to parse the dictionary from.
- Returns:
The parsed Dictionary instance; guaranteed to be non-null.
- Throws:
IOException - Thrown if IO errors occurred during read and parse operations.
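The input format described above (one entry per line, tokens separated by whitespace) can be illustrated with a JDK-only sketch. OneEntryPerLineSketch and its methods are hypothetical names, and skipping blank lines is an assumption; the sketch shows the line format only and is not the library's parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "one entry per line" format.
class OneEntryPerLineSketch {

    // Reads every line and splits it into whitespace-delimited tokens.
    static List<List<String>> parse(Reader in) throws IOException {
        List<List<String>> entries = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // Assumption: blank lines carry no entry.
            }
            entries.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return entries;
    }

    // Convenience wrapper for in-memory text.
    static List<List<String>> parseString(String text) {
        try {
            return parse(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseString("New York\nBerlin\n"));
        // prints [[New, York], [Berlin]]
    }
}
```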
-
asStringSet
public Set<String> asStringSet()
Converts this Dictionary to a Set<String>.
Note: Only the AbstractCollection.iterator(), AbstractCollection.size() and AbstractCollection.contains(Object) methods are implemented.
If the entries of this dictionary contain multiple tokens, only the first token of each entry will be part of the Set.
- Returns:
A Set containing all entries of this Dictionary.
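The first-token rule above can be shown with a JDK-only sketch. FirstTokenSetSketch is a hypothetical helper that copies the first token of each entry into a new Set; the method documented here only implements iterator(), size() and contains(Object), so the copy is a simplification for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "first token only" rule.
class FirstTokenSetSketch {

    // Collects the first token of every non-empty entry.
    static Set<String> firstTokens(List<List<String>> entries) {
        Set<String> result = new LinkedHashSet<>();
        for (List<String> entry : entries) {
            if (!entry.isEmpty()) {
                result.add(entry.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> entries = List.of(
                List.of("New", "York"),
                List.of("Berlin"));
        System.out.println(firstTokens(entries)); // prints [New, Berlin]
    }
}
```

In this sketch the second token "York" never reaches the resulting set, which mirrors the caveat for multi-token entries.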
-
getArtifactSerializerClass
public Class<?> getArtifactSerializerClass()
Description copied from interface: SerializableArtifact
Retrieves the class which can serialize and recreate this artifact.
Note: The serializer class must have a public zero-argument constructor, or an exception is thrown during model serialization/loading.
- Specified by:
getArtifactSerializerClass in interface SerializableArtifact
- Returns:
The serializer class for Dictionary.
-
isCaseSensitive
public boolean isCaseSensitive()
- Returns:
true, if this Dictionary is case-sensitive.
-