Models Download

Use the links in the table below to download the pre-trained models for Apache OpenNLP.

Important
All models are zip-compressed (like a jar file) and must not be uncompressed.
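
Because a model file is an ordinary zip archive, its contents can be listed without unpacking it. The following is a minimal sketch using only `java.util.zip` from the Java standard library. The class name `ModelArchivePeek` and the default file path are hypothetical, and the sketch makes no assumption about which entries a given model contains.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache OpenNLP.
final class ModelArchivePeek {

    // Returns the entry names of a zip-compressed model file.
    // Nothing is extracted to disk; the file itself is left untouched.
    static List<String> entryNames(Path modelFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(modelFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of a downloaded model; pass the real path as the first argument.
        Path model = Paths.get(args.length > 0 ? args[0] : "opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin");
        for (String name : entryNames(model)) {
            System.out.println(name);
        }
    }
}
```

When using a model with OpenNLP, pass the downloaded `.bin` file as-is; the listing above is only for inspection.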
| Component | Language | Compatibility | Description | README and Reports | File | Signatures |
|---|---|---|---|---|---|---|
| Language Detector | Detects 103 languages | >= 1.8.3 | Detects 103 languages in ISO 639-3 standard. Works well with longer texts that have at least 2 sentences from the same language. | README Effectiveness Misclassified | langdetect-183.bin | md5 sha1 asc |
| Sentence | fr | >= 1.0.0 | Sentence detection model for French | README Evaluation Logs | opennlp-fr-ud-ftb-sentence-1.0-1.9.3.bin | sha512 asc |
| Sentence | de | >= 1.0.0 | Sentence detection model for German | README Evaluation Logs | opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin | sha512 asc |
| Sentence | en | >= 1.0.0 | Sentence detection model for English | README Evaluation Logs | opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin | sha512 asc |
| Sentence | it | >= 1.0.0 | Sentence detection model for Italian | README Evaluation Logs | opennlp-it-ud-vit-sentence-1.0-1.9.3.bin | sha512 asc |
| Sentence | nl | >= 1.0.0 | Sentence detection model for Dutch | README Evaluation Logs | opennlp-nl-ud-alpino-sentence-1.0-1.9.3.bin | sha512 asc |
| Parts of Speech | de | >= 1.0.0 | Parts of speech model for German | README Evaluation Logs | opennlp-de-ud-gsd-pos-1.0-1.9.3.bin | sha512 asc |
| Parts of Speech | en | >= 1.0.0 | Parts of speech model for English | README Evaluation Logs | opennlp-en-ud-ewt-pos-1.0-1.9.3.bin | sha512 asc |
| Parts of Speech | fr | >= 1.0.0 | Parts of speech model for French | README Evaluation Logs | opennlp-fr-ud-ftb-pos-1.0-1.9.3.bin | md5 sha512 asc |
| Parts of Speech | it | >= 1.0.0 | Parts of speech model for Italian | README Evaluation Logs | opennlp-it-ud-vit-pos-1.0-1.9.3.bin | sha512 asc |
| Parts of Speech | nl | >= 1.0.0 | Parts of speech model for Dutch | README Evaluation Logs | opennlp-nl-ud-alpino-pos-1.0-1.9.3.bin | sha512 asc |
| Tokens | de | >= 1.0.0 | Tokenizer model for German | README Evaluation Logs | opennlp-de-ud-gsd-tokens-1.0-1.9.3.bin | sha512 asc |
| Tokens | en | >= 1.0.0 | Tokenizer model for English | README Evaluation Logs | opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin | sha512 asc |
| Tokens | fr | >= 1.0.0 | Tokenizer model for French | README Evaluation Logs | opennlp-fr-ud-ftb-tokens-1.0-1.9.3.bin | sha512 asc |
| Tokens | it | >= 1.0.0 | Tokenizer model for Italian | README Evaluation Logs | opennlp-it-ud-vit-tokens-1.0-1.9.3.bin | sha512 asc |
| Tokens | nl | >= 1.0.0 | Tokenizer model for Dutch | README Evaluation Logs | opennlp-nl-ud-alpino-tokens-1.0-1.9.3.bin | sha512 asc |

Verifying Signatures

The md5, sha1, and sha512 files are checksums, and the asc files are OpenPGP signatures. Both can be used to verify the integrity of a downloaded model file.

Use the following commands to verify the integrity:

  • gpg --print-md MD5 fileName.bin

  • gpg --print-md SHA1 fileName.bin

  • gpg --print-md SHA512 fileName.bin

  • gpg --verify fileName.bin.asc

To verify the asc signatures, it may be necessary to import the KEYS file first.

That can be done with:

  • gpg --import KEYS

More information about release signing and verifying signatures can be found here.
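
The checksum comparison can also be done programmatically, for example in a build or deployment step. The following is a minimal sketch using only the Java standard library. The class `ModelChecksum` and its methods are hypothetical names, and the expected digest is assumed to be supplied as a plain hex string, since the exact layout of the published checksum files is not described here. It checks integrity only; verifying the asc signature still requires gpg as described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Apache OpenNLP.
final class ModelChecksum {

    // Computes the SHA-512 digest of a file and returns it as lowercase hex.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares the computed digest with an expected hex string, ignoring case and surrounding whitespace.
    static boolean matches(Path file, String expectedHex) throws IOException, NoSuchAlgorithmException {
        return sha512Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ModelChecksum <model-file> <expected-sha512-hex>");
            System.exit(2);
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
        System.exit(ok ? 0 : 1);
    }
}
```

The file is streamed through the digest in fixed-size chunks, so the whole model never has to be held in memory.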

SourceForge Models

The models on SourceForge for 1.5.0 are found here and are fully compatible with Apache OpenNLP 2.3.2.

The models can be used for testing or getting started. Please train your own models for all other use cases.