Package opennlp.tools.tokenize
Class BPEModel
java.lang.Object
opennlp.tools.util.model.BaseModel
opennlp.tools.tokenize.BPEModel
- All Implemented Interfaces:
Serializable, opennlp.tools.util.model.ArtifactProvider
The BPEModel stores learned byte pair encoding (BPE) merge operations and can be
serialized and deserialized for reuse.
A model is created by the BPETokenizerTrainer and contains an ordered
list of BPETokenizer.SymbolPair merge operations that define the BPE
vocabulary. The model is persisted as a standard OpenNLP ZIP package with a
bpe.merges artifact containing the merge rules.
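To illustrate why the order of the merge list matters, here is a minimal sketch of applying ordered merge rules to a single word. It does not use the OpenNLP classes: the nested Pair class is a hypothetical stand-in for BPETokenizer.SymbolPair, and the actual BPETokenizer may apply merges differently.

```java
import java.util.ArrayList;
import java.util.List;

public class BpeMergeSketch {

  // Hypothetical stand-in for BPETokenizer.SymbolPair; the real class may differ.
  static final class Pair {
    final String left;
    final String right;

    Pair(String left, String right) {
      this.left = left;
      this.right = right;
    }
  }

  // Splits the word into single code points, then applies each merge rule in list order.
  static List<String> applyMerges(String word, List<Pair> merges) {
    List<String> symbols = new ArrayList<>();
    for (int i = 0; i < word.length(); ) {
      int cp = word.codePointAt(i);
      symbols.add(new String(Character.toChars(cp)));
      i += Character.charCount(cp);
    }
    for (Pair merge : merges) {
      List<String> next = new ArrayList<>();
      int i = 0;
      while (i < symbols.size()) {
        boolean matches = i + 1 < symbols.size()
            && symbols.get(i).equals(merge.left)
            && symbols.get(i + 1).equals(merge.right);
        if (matches) {
          next.add(merge.left + merge.right); // fuse the adjacent pair into one symbol
          i += 2;
        } else {
          next.add(symbols.get(i));
          i++;
        }
      }
      symbols = next;
    }
    return symbols;
  }

  public static void main(String[] args) {
    List<Pair> merges = List.of(new Pair("l", "o"), new Pair("lo", "w"), new Pair("e", "r"));
    System.out.println(applyMerges("lower", merges)); // prints [low, er]
  }
}
```

The rule ("lo", "w") can only fire because ("l", "o") appears earlier in the list, which is why the model keeps the merges ordered.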
Usage:
// Create via training
BPETokenizerTrainer trainer = new BPETokenizerTrainer();
BPEModel model = trainer.train(corpus, 10000, "en");
// Save to disk
model.serialize(Path.of("bpe-en.bin"));
// Load from disk
BPEModel loaded = new BPEModel(Path.of("bpe-en.bin"));
// Use for tokenization
BPETokenizer tokenizer = new BPETokenizer(loaded);
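Because the model is persisted as a ZIP package, its bpe.merges entry can in principle be located with the standard java.util.zip classes. The sketch below builds a small in-memory ZIP and reads the entry back. The text layout of the entry (one space-separated pair per line) is an assumption for illustration only, and a real OpenNLP model package contains further entries that this sketch does not write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class BpeZipSketch {

  // Writes a ZIP holding a single "bpe.merges" entry (content format is assumed).
  static byte[] writePackage(String mergesText) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry("bpe.merges"));
      zip.write(mergesText.getBytes(StandardCharsets.UTF_8));
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Returns the UTF-8 text of the named entry, or null if the entry is absent.
  static String readEntry(byte[] zipBytes, String name) {
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.getName().equals(name)) {
          return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return null;
  }

  public static void main(String[] args) {
    byte[] pkg = writePackage("l o\nlo w\n");
    System.out.print(readEntry(pkg, "bpe.merges"));
  }
}
```

For normal use, loading through one of the BPEModel constructors is preferable to reading the archive by hand.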
Field Summary
Fields inherited from class opennlp.tools.util.model.BaseModel
TRAINING_CUTOFF_PROPERTY, TRAINING_EVENTHASH_PROPERTY, TRAINING_ITERATIONS_PROPERTY
Constructor Summary
Constructors
BPEModel(InputStream in) - Initializes a BPEModel from an InputStream.
BPEModel(List<BPETokenizer.SymbolPair> merges, Map<String, String> manifestInfoEntries, BPETokenizerFactory factory) - Creates a BPEModel from trained merge rules.
Method Summary
Methods inherited from class opennlp.tools.util.model.BaseModel
getArtifact, getLanguage, getManifestProperty, getVersion, isLoadedFromSerialized, serialize, serialize, serialize
Constructor Details
BPEModel
public BPEModel(List<BPETokenizer.SymbolPair> merges, Map<String, String> manifestInfoEntries, BPETokenizerFactory factory)
Creates a BPEModel from trained merge rules.
- Parameters:
merges - The ordered list of merge operations. Must not be null.
manifestInfoEntries - Additional manifest info.
factory - The BPETokenizerFactory.
BPEModel
Initializes a BPEModel from an InputStream.
- Parameters:
in - The InputStream for loading the model.
- Throws:
IOException - Thrown if IO errors occurred.
BPEModel
- Parameters:
modelFile - The File for loading the model.
- Throws:
IOException - Thrown if IO errors occurred.
BPEModel
- Parameters:
modelPath - The Path for loading the model.
- Throws:
IOException - Thrown if IO errors occurred.
BPEModel
- Parameters:
modelURL - The URL for loading the model.
- Throws:
IOException - Thrown if IO errors occurred.
Method Details
getFactory
- Returns:
The active BPETokenizerFactory.
getMerges
- Returns:
An unmodifiable, ordered list of BPE merge operations stored in this model.
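As a rough illustration of what an unmodifiable return value means for callers, the sketch below shows that mutation attempts on such a list fail and that a defensive copy is needed before editing. It uses plain strings in place of the real merge type, and Collections.unmodifiableList is an assumption: the source does not say how BPEModel produces its read-only list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableMergesSketch {

  // Hypothetical stand-in for a getter like getMerges(); strings replace the real pair type.
  static List<String> readOnlyView(List<String> merges) {
    return Collections.unmodifiableList(merges);
  }

  // Returns true if the list accepted the element, false if it rejected the mutation.
  static boolean tryAdd(List<String> list, String element) {
    try {
      list.add(element);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> view = readOnlyView(new ArrayList<>(List.of("l o", "lo w")));
    System.out.println(tryAdd(view, "e r")); // prints false

    // Copy first when a modifiable list is needed; the order is preserved.
    List<String> copy = new ArrayList<>(view);
    System.out.println(tryAdd(copy, "e r")); // prints true
    System.out.println(copy);                // prints [l o, lo w, e r]
  }
}
```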