Package opennlp.tools.tokenize
Class BPETokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.BPETokenizerFactory
A BaseToolFactory for BPE tokenization that manages the BPE merge rules
artifact and its serialization within a BPEModel.
This factory is responsible for:
- Providing the BPETokenizerFactory.BPEMergesSerializer that reads and writes BPE merge rules as a text-based artifact (bpe.merges) inside the model ZIP package.
- Supplying the merge rules to the BPEModel via BaseToolFactory.createArtifactMap().
- Validating that a loaded model contains valid merge rules.
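The idea of a text-based merge rules artifact can be illustrated with a small round trip between a rule list and its text form. This is a minimal sketch only: MergeRulesTextSketch is a hypothetical stand-in and not the BPETokenizerFactory.BPEMergesSerializer, and the layout of one space-separated symbol pair per line is an assumption, since the actual bpe.merges format is not described here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of OpenNLP.
final class MergeRulesTextSketch {

    // Assumed layout: one rule per line, "left right", in merge priority order.
    static String format(List<Map.Entry<String, String>> rules) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, String> rule : rules) {
            text.append(rule.getKey()).append(' ').append(rule.getValue()).append('\n');
        }
        return text.toString();
    }

    // Parses the assumed layout back into an ordered list of symbol pairs.
    static List<Map.Entry<String, String>> parse(String text) {
        List<Map.Entry<String, String>> rules = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = line.split(" ");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed merge rule: " + line);
            }
            rules.add(new AbstractMap.SimpleImmutableEntry<>(parts[0], parts[1]));
        }
        return rules;
    }
}
```

Keeping the rules in a list rather than a set preserves their order, which matters if earlier merges are meant to take priority over later ones.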
This class is typically not used directly. It is
instantiated internally by BPETokenizerTrainer
during training and by BPEModel during model
loading.
-
Constructor Summary
Constructors
Constructor                            Description
BPETokenizerFactory()                  Creates a BPETokenizerFactory.
BPETokenizerFactory(String langCode)   Creates a BPETokenizerFactory with the given language code.
-
Method Summary
Modifier and Type   Method                           Description
Map                 createArtifactSerializersMap()   Creates a Map with pairs of keys and ArtifactSerializer.
void                validateArtifactMap()            Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactMap
-
Constructor Details
-
BPETokenizerFactory
public BPETokenizerFactory()
Creates a BPETokenizerFactory. Required empty constructor for model loading.
-
BPETokenizerFactory
public BPETokenizerFactory(String langCode)
Creates a BPETokenizerFactory with the given language code.
- Parameters:
  langCode - The ISO language code. Must not be null.
- Throws:
  IllegalArgumentException - if langCode is null.
-
-
Method Details
-
createArtifactSerializersMap
Creates a Map with pairs of keys and ArtifactSerializer. The model's implementation should call this method from BaseModel#createArtifactSerializersMap.
The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
  createArtifactSerializersMap in class BaseToolFactory
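The populate-the-base-map pattern described above can be sketched with stand-in types. Everything named Sketch* is hypothetical and only mirrors the shape of BaseToolFactory and ArtifactSerializer; the "merges" key is an assumption, chosen to match the extension of the bpe.merges artifact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an artifact serializer type.
interface SketchSerializer { }

// Hypothetical stand-in for the base factory.
class SketchBaseFactory {
    // Base implementation: returns an empty, mutable map for subclasses to fill.
    public Map<String, SketchSerializer> createArtifactSerializersMap() {
        return new HashMap<>();
    }
}

// Hypothetical stand-in for a merge rules serializer.
class SketchMergesSerializer implements SketchSerializer { }

class SketchBpeFactory extends SketchBaseFactory {
    @Override
    public Map<String, SketchSerializer> createArtifactSerializersMap() {
        // Start from the base map so entries registered there are kept.
        Map<String, SketchSerializer> serializers = super.createArtifactSerializersMap();
        // Assumed key; registers the serializer for the merge rules artifact.
        serializers.put("merges", new SketchMergesSerializer());
        return serializers;
    }
}
```

Calling the super method first, rather than creating a fresh map, means a subclass adds to whatever its parent registered instead of replacing it.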
-
createManifestEntries
- Overrides:
  createManifestEntries in class BaseToolFactory
- Returns:
  The manifest entries to be added to the model manifest.
-
validateArtifactMap
public void validateArtifactMap() throws opennlp.tools.util.InvalidFormatException
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
  validateArtifactMap in class BaseToolFactory
- Throws:
  opennlp.tools.util.InvalidFormatException - Thrown if validation found invalid states.
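The note about invoking super.validateArtifactMap first can be sketched as follows. All Sketch* types are hypothetical stand-ins, not OpenNLP classes. What counts as "valid merge rules" is an assumption here (a non-empty list stored under bpe.merges), and custom.extra is an invented artifact name used only to give the subclass something of its own to check.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a checked format exception.
class SketchInvalidFormatException extends Exception {
    SketchInvalidFormatException(String message) {
        super(message);
    }
}

// Hypothetical factory that validates the merge rules artifact.
class SketchBpeValidatingFactory {
    protected final Map<String, Object> artifacts = new HashMap<>();

    public void validateArtifactMap() throws SketchInvalidFormatException {
        Object merges = artifacts.get("bpe.merges");
        // Assumed rule: the artifact must be a non-empty list.
        if (!(merges instanceof List) || ((List<?>) merges).isEmpty()) {
            throw new SketchInvalidFormatException("Missing or empty bpe.merges artifact");
        }
    }
}

// Hypothetical subclass adding one more required artifact.
class SketchCustomFactory extends SketchBpeValidatingFactory {
    @Override
    public void validateArtifactMap() throws SketchInvalidFormatException {
        // Run the parent checks first so its invariants hold below.
        super.validateArtifactMap();
        if (!artifacts.containsKey("custom.extra")) {
            throw new SketchInvalidFormatException("Missing custom.extra artifact");
        }
    }
}
```

With this ordering, a model lacking merge rules fails with the parent's message before any subclass-specific check runs.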
-
getLanguageCode
- Returns:
  The ISO language code for this factory.
-