This list contains common questions asked in mailing lists and forums.
Where can I download the pre-trained models used in OpenNLP?
Models for 23 languages are available from the project's Models download page; some of them (sentence detection, tokenization, POS tagging) are also bundled in JAR files distributed via Maven Central.
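Once downloaded, a model file can be loaded through the corresponding model class and passed to the matching tool. A minimal sketch for a tokenizer model, assuming a file named `en-token.bin` has been downloaded from the Models page (the file name and path are assumptions for illustration):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class LoadModelExample {

    public static void main(String[] args) throws Exception {
        // "en-token.bin" stands in for a model file downloaded from the
        // Models page; adjust the path to your local copy.
        try (InputStream in = new FileInputStream("en-token.bin")) {
            TokenizerModel model = new TokenizerModel(in);
            TokenizerME tokenizer = new TokenizerME(model);

            String[] tokens = tokenizer.tokenize("OpenNLP models are easy to load.");
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }
}
```

The same pattern (model class wrapping an `InputStream`, then a `*ME` tool wrapping the model) applies to the sentence detector and POS tagger models as well.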
How do I train a Named Entity Recognition (NER) model?
To train a name finder model you need training data that contains the entities you would like to detect. Have a look at our manual, in particular the sections under the Name Finder Training API. The beginning of that section shows how the data has to be marked up. Please note that you need many sentences to successfully train the name finder.
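For reference, the expected markup wraps each entity span in `<START:type> ... <END>` tags, with one tokenized sentence per line, for example:

```
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
```

A minimal training sketch using the Name Finder Training API, assuming the marked-up data is stored in a file named `en-ner-person.train` (the input and output file names are assumptions for illustration):

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainNameFinder {

    public static void main(String[] args) throws Exception {
        // Read the marked-up training data, one sentence per line.
        try (ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("en-ner-person.train")),
                StandardCharsets.UTF_8);
             ObjectStream<NameSample> samples = new NameSampleDataStream(lines)) {

            TokenNameFinderModel model = NameFinderME.train(
                    "en", "person", samples,
                    TrainingParameters.defaultParams(),
                    new TokenNameFinderFactory());

            // Persist the trained model for later use.
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream("en-ner-person.bin"))) {
                model.serialize(out);
            }
        }
    }
}
```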
How can I speed up the training time of (MaxEnt) models?
By default, training runs single-threaded. Try tweaking the value of TrainingParameters.THREADS_PARAM, and set it to match your target environment; a good starting point is the number of CPU cores available at runtime. Please note, however, that only the compute-intensive parts of the training benefit from this parameter.
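A sketch of setting the parameter before training, using the number of available CPU cores as the starting point described above:

```java
import opennlp.tools.util.TrainingParameters;

public class ThreadedTrainingSetup {

    public static void main(String[] args) {
        TrainingParameters params = TrainingParameters.defaultParams();

        // A reasonable starting point: one training thread per available CPU core.
        int cores = Runtime.getRuntime().availableProcessors();
        params.put(TrainingParameters.THREADS_PARAM, Integer.toString(cores));

        // 'params' would then be passed to a train(...) call, for example
        // NameFinderME.train("en", "person", samples, params, factory).
    }
}
```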
Will my models trained with a previous version of OpenNLP still work with a newer version?
You should expect them to work. The corpora used are normally the same. However, the behavior may change when we fix bugs or add new features. The test results in the project Wiki may contain useful information about model compatibility.
Is there a commercial license for OpenNLP?
OpenNLP is licensed under the business-friendly Apache License, Version 2.0. You can read its Wikipedia page for more information.
How can I start contributing to this project?
Have a look at our Getting Involved page. We have a list of issues needing help there, as well as instructions to get started contributing to OpenNLP. You may also consider making a donation to the Apache Software Foundation.
If you have a suggestion for another question that could be added, please do not hesitate to get in touch.