Apache OpenNLP 1.7.0 released

The Apache OpenNLP team is pleased to announce the release of version 1.7.0 of Apache OpenNLP.

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

The OpenNLP 1.7.0 binary and source distributions are available for download from our download page: https://opennlp.apache.org/cgi-bin/download.cgi

The OpenNLP library is also distributed via Maven Central. See the Maven Dependency page for more details: https://opennlp.apache.org/maven-dependency.html

What is new in Apache OpenNLP 1.7.0

This release introduces many new features, improvements, and bug fixes. The API has been made more consistent, and deprecated methods have been removed. Java 1.8 and Maven 3.3.9 are now required.

Additionally, the release contains the following noteworthy changes:

  • OpenNLP is up to 50% faster at analyzing content

  • A lot of deprecated code has been removed

  • Code base has been cleaned up

  • There is a new brat annotation service

  • Documentation was improved and extended

  • A Naive Bayesian Classifier implementation was added

  • Morfologik addon is now included

  • Added a language model component

  • Added a CLI to the lemmatizer component

  • Added a supervised statistical lemmatizer

  • The lemmatizer component API has been entirely rewritten; the changes to the existing dictionary-based lemmatizer are not backward compatible

A detailed list of the issues resolved in this release, including fixed bugs and improvements, can be found in the release notes and in the RELEASE_NOTES file included in the distribution.

--The Apache OpenNLP Team

31 December 2016