Models Download

The Apache OpenNLP project provides several pre-trained model files:

  • 1 generic model to conduct language detection on a specified text input

  • Language-specific models for 23 languages, covering sentence detection, part-of-speech tagging, and tokenization.

Pre-trained models can be used for testing or getting started. Train your own models for specific use cases.

Models

Important
All models are zip compressed (like a jar file) and must not be uncompressed.
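Because a model file is a zip archive, its contents can be inspected in place without unpacking it. The following is a minimal Python sketch, assuming only that the downloaded file is a valid zip archive as stated above; the function name `list_model_entries` is hypothetical and not part of OpenNLP.

```python
import zipfile


def list_model_entries(model_path):
    """Return the names of the entries inside a model file without extracting it."""
    # The archive is opened read-only, so the file on disk is left untouched.
    with zipfile.ZipFile(model_path, "r") as archive:
        return archive.namelist()
```

Calling `list_model_entries` on a downloaded model returns its entry names; the model file itself should still be passed to OpenNLP unmodified.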

Use the URLs in the tables below to download the pre-trained models for use with the Apache OpenNLP toolkit.

Language detection

Language | Description | Compatibility | README and Reports | File | Signatures
Detects 103 languages | Detects 103 languages in the ISO 639-3 standard. Works well with longer texts that contain at least 2 sentences in the same language. | >= 1.8.3 | README, Effectiveness, Misclassified | langdetect-183.bin | md5 sha1 asc

Sentence detection

Note
All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures
bg | Bulgarian | 2.4.0 | 2.14 | opennlp-bg-ud-btb-sentence-1.1-2.4.0.bin | sha512 asc
cs | Czech | 2.4.0 | 2.14 | opennlp-cs-ud-pdt-sentence-1.1-2.4.0.bin | sha512 asc
da | Danish | 2.4.0 | 2.14 | opennlp-da-ud-ddt-sentence-1.1-2.4.0.bin | sha512 asc
de | German | 2.4.0 | 2.14 | opennlp-de-ud-gsd-sentence-1.1-2.4.0.bin | sha512 asc
en | English | 2.4.0 | 2.14 | opennlp-en-ud-ewt-sentence-1.1-2.4.0.bin | sha512 asc
es | Spanish | 2.4.0 | 2.14 | opennlp-es-ud-gsd-sentence-1.1-2.4.0.bin | sha512 asc
et | Estonian | 2.4.0 | 2.14 | opennlp-et-ud-edt-sentence-1.1-2.4.0.bin | sha512 asc
fi | Finnish | 2.4.0 | 2.14 | opennlp-fi-ud-tdt-sentence-1.1-2.4.0.bin | sha512 asc
fr | French | 2.4.0 | 2.14 | opennlp-fr-ud-gsd-sentence-1.1-2.4.0.bin | sha512 asc
hr | Croatian | 2.4.0 | 2.14 | opennlp-hr-ud-set-sentence-1.1-2.4.0.bin | sha512 asc
it | Italian | 2.4.0 | 2.14 | opennlp-it-ud-vit-sentence-1.1-2.4.0.bin | sha512 asc
lv | Latvian | 2.4.0 | 2.14 | opennlp-lv-ud-lvtb-sentence-1.1-2.4.0.bin | sha512 asc
nl | Dutch | 2.4.0 | 2.14 | opennlp-nl-ud-alpino-sentence-1.1-2.4.0.bin | sha512 asc
no | Norwegian | 2.4.0 | 2.14 | opennlp-no-ud-bokmaal-sentence-1.1-2.4.0.bin | sha512 asc
pl | Polish | 2.4.0 | 2.14 | opennlp-pl-ud-pdb-sentence-1.1-2.4.0.bin | sha512 asc
pt | Portuguese | 2.4.0 | 2.14 | opennlp-pt-ud-gsd-sentence-1.1-2.4.0.bin | sha512 asc
ro | Romanian | 2.4.0 | 2.14 | opennlp-ro-ud-rrt-sentence-1.1-2.4.0.bin | sha512 asc
ru | Russian | 2.4.0 | 2.14 | opennlp-ru-ud-gsd-sentence-1.1-2.4.0.bin | sha512 asc
sk | Slovak | 2.4.0 | 2.14 | opennlp-sk-ud-snk-sentence-1.1-2.4.0.bin | sha512 asc
sl | Slovenian | 2.4.0 | 2.14 | opennlp-sl-ud-ssj-sentence-1.1-2.4.0.bin | sha512 asc
sr | Serbian | 2.4.0 | 2.14 | opennlp-sr-ud-set-sentence-1.1-2.4.0.bin | sha512 asc
sv | Swedish | 2.4.0 | 2.14 | opennlp-sv-ud-talbanken-sentence-1.1-2.4.0.bin | sha512 asc
uk | Ukrainian | 2.4.0 | 2.14 | opennlp-uk-ud-iu-sentence-1.1-2.4.0.bin | sha512 asc

Part of Speech Tagging

Note
All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures
bg | Bulgarian | 2.4.0 | 2.14 | opennlp-bg-ud-btb-pos-1.1-2.4.0.bin | sha512 asc
cs | Czech | 2.4.0 | 2.14 | opennlp-cs-ud-pdt-pos-1.1-2.4.0.bin | sha512 asc
da | Danish | 2.4.0 | 2.14 | opennlp-da-ud-ddt-pos-1.1-2.4.0.bin | sha512 asc
de | German | 2.4.0 | 2.14 | opennlp-de-ud-gsd-pos-1.1-2.4.0.bin | sha512 asc
en | English | 2.4.0 | 2.14 | opennlp-en-ud-ewt-pos-1.1-2.4.0.bin | sha512 asc
es | Spanish | 2.4.0 | 2.14 | opennlp-es-ud-gsd-pos-1.1-2.4.0.bin | sha512 asc
et | Estonian | 2.4.0 | 2.14 | opennlp-et-ud-edt-pos-1.1-2.4.0.bin | sha512 asc
fi | Finnish | 2.4.0 | 2.14 | opennlp-fi-ud-tdt-pos-1.1-2.4.0.bin | sha512 asc
fr | French | 2.4.0 | 2.14 | opennlp-fr-ud-gsd-pos-1.1-2.4.0.bin | sha512 asc
hr | Croatian | 2.4.0 | 2.14 | opennlp-hr-ud-set-pos-1.1-2.4.0.bin | sha512 asc
it | Italian | 2.4.0 | 2.14 | opennlp-it-ud-vit-pos-1.1-2.4.0.bin | sha512 asc
lv | Latvian | 2.4.0 | 2.14 | opennlp-lv-ud-lvtb-pos-1.1-2.4.0.bin | sha512 asc
nl | Dutch | 2.4.0 | 2.14 | opennlp-nl-ud-alpino-pos-1.1-2.4.0.bin | sha512 asc
no | Norwegian | 2.4.0 | 2.14 | opennlp-no-ud-bokmaal-pos-1.1-2.4.0.bin | sha512 asc
pl | Polish | 2.4.0 | 2.14 | opennlp-pl-ud-pdb-pos-1.1-2.4.0.bin | sha512 asc
pt | Portuguese | 2.4.0 | 2.14 | opennlp-pt-ud-gsd-pos-1.1-2.4.0.bin | sha512 asc
ro | Romanian | 2.4.0 | 2.14 | opennlp-ro-ud-rrt-pos-1.1-2.4.0.bin | sha512 asc
ru | Russian | 2.4.0 | 2.14 | opennlp-ru-ud-gsd-pos-1.1-2.4.0.bin | sha512 asc
sk | Slovak | 2.4.0 | 2.14 | opennlp-sk-ud-snk-pos-1.1-2.4.0.bin | sha512 asc
sl | Slovenian | 2.4.0 | 2.14 | opennlp-sl-ud-ssj-pos-1.1-2.4.0.bin | sha512 asc
sr | Serbian | 2.4.0 | 2.14 | opennlp-sr-ud-set-pos-1.1-2.4.0.bin | sha512 asc
sv | Swedish | 2.4.0 | 2.14 | opennlp-sv-ud-talbanken-pos-1.1-2.4.0.bin | sha512 asc
uk | Ukrainian | 2.4.0 | 2.14 | opennlp-uk-ud-iu-pos-1.1-2.4.0.bin | sha512 asc

Tokenization

Note
All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures
bg | Bulgarian | 2.4.0 | 2.14 | opennlp-bg-ud-btb-tokens-1.1-2.4.0.bin | sha512 asc
cs | Czech | 2.4.0 | 2.14 | opennlp-cs-ud-pdt-tokens-1.1-2.4.0.bin | sha512 asc
da | Danish | 2.4.0 | 2.14 | opennlp-da-ud-ddt-tokens-1.1-2.4.0.bin | sha512 asc
de | German | 2.4.0 | 2.14 | opennlp-de-ud-gsd-tokens-1.1-2.4.0.bin | sha512 asc
en | English | 2.4.0 | 2.14 | opennlp-en-ud-ewt-tokens-1.1-2.4.0.bin | sha512 asc
es | Spanish | 2.4.0 | 2.14 | opennlp-es-ud-gsd-tokens-1.1-2.4.0.bin | sha512 asc
et | Estonian | 2.4.0 | 2.14 | opennlp-et-ud-edt-tokens-1.1-2.4.0.bin | sha512 asc
fi | Finnish | 2.4.0 | 2.14 | opennlp-fi-ud-tdt-tokens-1.1-2.4.0.bin | sha512 asc
fr | French | 2.4.0 | 2.14 | opennlp-fr-ud-gsd-tokens-1.1-2.4.0.bin | sha512 asc
hr | Croatian | 2.4.0 | 2.14 | opennlp-hr-ud-set-tokens-1.1-2.4.0.bin | sha512 asc
it | Italian | 2.4.0 | 2.14 | opennlp-it-ud-vit-tokens-1.1-2.4.0.bin | sha512 asc
lv | Latvian | 2.4.0 | 2.14 | opennlp-lv-ud-lvtb-tokens-1.1-2.4.0.bin | sha512 asc
nl | Dutch | 2.4.0 | 2.14 | opennlp-nl-ud-alpino-tokens-1.1-2.4.0.bin | sha512 asc
no | Norwegian | 2.4.0 | 2.14 | opennlp-no-ud-bokmaal-tokens-1.1-2.4.0.bin | sha512 asc
pl | Polish | 2.4.0 | 2.14 | opennlp-pl-ud-pdb-tokens-1.1-2.4.0.bin | sha512 asc
pt | Portuguese | 2.4.0 | 2.14 | opennlp-pt-ud-gsd-tokens-1.1-2.4.0.bin | sha512 asc
ro | Romanian | 2.4.0 | 2.14 | opennlp-ro-ud-rrt-tokens-1.1-2.4.0.bin | sha512 asc
ru | Russian | 2.4.0 | 2.14 | opennlp-ru-ud-gsd-tokens-1.1-2.4.0.bin | sha512 asc
sk | Slovak | 2.4.0 | 2.14 | opennlp-sk-ud-snk-tokens-1.1-2.4.0.bin | sha512 asc
sl | Slovenian | 2.4.0 | 2.14 | opennlp-sl-ud-ssj-tokens-1.1-2.4.0.bin | sha512 asc
sr | Serbian | 2.4.0 | 2.14 | opennlp-sr-ud-set-tokens-1.1-2.4.0.bin | sha512 asc
sv | Swedish | 2.4.0 | 2.14 | opennlp-sv-ud-talbanken-tokens-1.1-2.4.0.bin | sha512 asc
uk | Ukrainian | 2.4.0 | 2.14 | opennlp-uk-ud-iu-tokens-1.1-2.4.0.bin | sha512 asc
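
The file names for sentence detection, part-of-speech tagging, and tokenization models follow a visible pattern, which makes it possible to pick a model programmatically. The sketch below splits such a name into its parts. The pattern is inferred from the listed names only: the last field matches the "Trained with OpenNLP" value, while reading the `1.1` field as a model version is an assumption. The names `MODEL_NAME` and `parse_model_name` are hypothetical.

```python
import re

# Pattern inferred from the listed file names; the group names are this sketch's own labels.
MODEL_NAME = re.compile(
    r"^opennlp-(?P<iso>[a-z]{2})-ud-(?P<treebank>[a-z]+)-"
    r"(?P<task>sentence|pos|tokens)-(?P<model_version>\d+(?:\.\d+)*)-"
    r"(?P<opennlp_version>\d+(?:\.\d+)*)\.bin$"
)


def parse_model_name(file_name):
    """Split a model file name into its parts, or return None if it does not match."""
    match = MODEL_NAME.match(file_name)
    return match.groupdict() if match else None
```

The language detection model (`langdetect-183.bin`) uses a different naming scheme, so `parse_model_name` returns `None` for it.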

Verifying Signatures

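The sha512, sha1, and md5 files contain checksums, and the asc files contain OpenPGP signatures. Together they can be used to verify the integrity of a downloaded file.

Use the following commands to verify the integrity, substituting the name of the downloaded file for the placeholder file names: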

  • gpg --print-md SHA512 fileName.tar.gz

  • gpg --print-md SHA1 fileName.tar.gz

  • gpg --print-md MD5 fileName.zip

  • gpg --verify fileName.tar.gz.asc
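
The checksum comparison can also be scripted. The following is a minimal Python sketch that computes the SHA-512 digest of a downloaded file and compares it with the published value. It assumes the first whitespace-separated token of the sha512 file is the hex digest, which may not hold for every checksum file. The function names are hypothetical. It covers the checksum only; verifying the asc signature still requires `gpg --verify`.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, checksum_path):
    """Compare a file's SHA-512 digest with the one stored in a checksum file."""
    # Assumption: the first whitespace-separated token of the checksum file is the hex digest.
    with open(checksum_path, "r", encoding="utf-8") as handle:
        tokens = handle.read().split()
    if not tokens:
        return False
    return sha512_of(path) == tokens[0].lower()
```

A result of `False` means the download should be discarded and fetched again.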

To verify the asc signatures, you may first need to import the KEYS file:

  • gpg --import KEYS

More information about release signing and verifying signatures can be found here.

SourceForge Resources

The models on SourceForge for 1.5.0 are found here. They are fully compatible with Apache OpenNLP 2.5.0.