The Apache OpenNLP team is pleased to announce the release of Apache OpenNLP 3.0.0-M3.
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
Apache OpenNLP 3.0.0-M3 binary and source distributions are available for download from our download page.
The OpenNLP library is also distributed via Maven Central. See the Maven dependency page for more details.
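For Maven users, the core toolkit can be pulled in with the usual coordinates (opennlp-tools is the main artifact; check Maven Central for the other modules):

```xml
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>3.0.0-M3</version>
</dependency>
```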
This release focuses on security hardening, new NLP capabilities, and dependency maintenance.
Three security issues are addressed in this release; the fixes have also been backported to 2.5.9.
DictionaryEntryPersistor (OPENNLP-1819, CVE-2026-40682)
The DictionaryEntryPersistor previously used a SAXParserFactory that neither enabled secure processing nor disabled DTD handling, leaving external entity resolution active. A malicious dictionary file could exploit this for local file disclosure or SSRF before any dictionary entry was processed.
The parsing path is now aligned with the project’s existing XmlUtil helper, which properly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl.
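As a rough sketch of the hardening involved (this is the standard JAXP pattern, not OpenNLP's exact XmlUtil code), a SAX parser with secure processing enabled and doctype declarations rejected looks like this:

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SecureSaxSketch {

    // Builds a hardened SAXParser: secure processing on, DTDs rejected,
    // so external entity resolution can never be triggered by input files.
    static SAXParser newSecureParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newSecureParser();
        System.out.println(parser != null ? "parser created" : "failed");
    }
}
```

With disallow-doctype-decl set, any document containing a DOCTYPE (and therefore any external entity declaration) is rejected at parse time.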
ExtensionLoader (OPENNLP-1820, CVE-2026-42027)
ExtensionLoader.instantiateExtension() performed its isAssignableFrom type check after Class.forName() had already executed the target class's static initializer, allowing a crafted model archive to trigger the static initializer of any class on the classpath.
The fix introduces a package-prefix allowlist consulted before Class.forName() is invoked:
Classes under opennlp.* remain permitted by default.
Other packages must be opted in via ExtensionLoader.registerAllowedPackage(String) or the OPENNLP_EXT_ALLOWED_PACKAGES system property (comma-separated list).
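The effect of the allowlist can be illustrated with a simplified sketch (this mirrors the behavior described above but is not OpenNLP's actual implementation; only registerAllowedPackage comes from the release notes):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AllowlistSketch {

    // Package prefixes permitted to be loaded; opennlp.* is allowed by default.
    static final Set<String> ALLOWED = ConcurrentHashMap.newKeySet();
    static { ALLOWED.add("opennlp."); }

    // Mirrors ExtensionLoader.registerAllowedPackage(String): opt in a prefix.
    static void registerAllowedPackage(String prefix) {
        ALLOWED.add(prefix.endsWith(".") ? prefix : prefix + ".");
    }

    // Consulted BEFORE Class.forName(), so no static initializer can run
    // for a class outside the allowlist.
    static boolean isAllowed(String className) {
        return ALLOWED.stream().anyMatch(className::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("opennlp.tools.tokenize.SimpleTokenizer"));
        System.out.println(isAllowed("sun.misc.Unsafe"));
        registerAllowedPackage("com.example.plugins");
        System.out.println(isAllowed("com.example.plugins.MyFeatureGenerator"));
    }
}
```

The key design point is ordering: because the prefix check happens before any class loading, a malicious archive cannot even cause a disallowed class's static initializer to execute.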
AbstractModelReader (OPENNLP-1821, CVE-2026-42440)
getOutcomes(), getOutcomePatterns(), and getPredicates() read attacker-controlled 32-bit count fields from binary model streams and passed them directly to array allocations. A crafted .bin file could trigger an immediate OutOfMemoryError and crash the JVM.
Each count is now bounded (default 10,000,000, configurable via -DOPENNLP_MAX_ENTRIES=<n>), with negative or oversized values failing fast via IllegalArgumentException.
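The general pattern is to validate an untrusted count before allocating. A hypothetical illustration (method and class names here are mine, not OpenNLP's; only the OPENNLP_MAX_ENTRIES property and the default of 10,000,000 come from the release notes):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BoundedCountSketch {

    // Upper bound on entry counts read from model streams, overridable
    // via -DOPENNLP_MAX_ENTRIES=<n> on the command line.
    static final int MAX_ENTRIES = Integer.getInteger("OPENNLP_MAX_ENTRIES", 10_000_000);

    // Reads a count field and fails fast on negative or oversized values,
    // instead of handing an attacker-controlled size to an array allocation.
    static int readBoundedCount(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > MAX_ENTRIES) {
            throw new IllegalArgumentException("Invalid entry count: " + count);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a crafted stream claiming Integer.MAX_VALUE entries.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(Integer.MAX_VALUE);
        try {
            readBoundedCount(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // fails fast rather than OOM
        }
    }
}
```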
Warning: for all three issues, users who cannot upgrade immediately should restrict input (dictionary and model files) to trusted sources only.
New features and improvements:
- Roberta-based model support via ONNX (OPENNLP-1518)
- Byte Pair Encoding (BPE) tokenization (OPENNLP-1220)
- Parse.createFromTokens() convenience method for tokenized input (OPENNLP-53)
- Thread-safe ME classes by eliminating shared mutable instance state (OPENNLP-1816)
Dependency updates:
- Update log4j2 to 2.25.4 (OPENNLP-1817)
- Update zlibsvm-core to 3.0.0 (OPENNLP-1818)
- Update ONNX runtime to 1.25.0 (OPENNLP-1822)
For further details, check the full list of changes via the project’s issue tracker.
-- The Apache OpenNLP Team
01 May 2026