The Apache OpenNLP team is pleased to announce the release of version 1.7.0 of Apache OpenNLP.
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
The OpenNLP 1.7.0 binary and source distributions are available for download from our download page: https://opennlp.apache.org/cgi-bin/download.cgi
The OpenNLP library is also available from Maven Central. See the Maven Dependency page for more details: https://opennlp.apache.org/maven-dependency.html
This release introduces many new features, improvements, and bug fixes. The API has been made more consistent, and deprecated methods have been removed. Java 1.8 and Maven 3.3.9 are now required.
Additionally, the release contains the following noteworthy changes:
OpenNLP is up to 50% faster at analyzing content
A lot of deprecated code has been removed
The code base has been cleaned up
There is a new brat annotation service
The documentation was improved and extended
A Naive Bayesian Classifier implementation was added
The Morfologik addon is now included
A language model component was added
A CLI was added to the lemmatizer component
A supervised statistical lemmatizer was added
The lemmatizer component API has been entirely rewritten. The changes to the existing dictionary-based lemmatizer are not backward compatible.
A detailed list of the issues related to this release can be found in the release notes.
For a complete list of fixed bugs and improvements, please see the RELEASE_NOTES file included in the distribution.
--The Apache OpenNLP Team
31 December 2016