The Apache OpenNLP project provides several pre-trained model files:
1 generic model for language detection on a given text input
23 language-specific models for sentence detection, part-of-speech tagging, and tokenization
Pre-trained models can be used for testing or getting started. Train your own models for specific use cases.
Important: All models are zip-compressed (like a jar file) and must not be uncompressed.
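Because a model file is an ordinary zip archive, its contents can be inspected in place without unpacking it to disk. The following is a minimal sketch using only the JDK's `java.util.zip` classes. The class name `ModelArchivePeek`, the helper `writeDemoArchive`, and the entry name `placeholder.properties` are made up for illustration; the demo archive is a stand-in and not a real OpenNLP model.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ModelArchivePeek {

    /** Lists the entry names of a zip-compressed file without extracting anything. */
    static List<String> listEntries(Path archive) {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            List<String> names = new ArrayList<>();
            zip.stream().forEach(entry -> names.add(entry.getName()));
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a tiny stand-in archive (hypothetical content, not a real model). */
    static Path writeDemoArchive() {
        try {
            Path demo = Files.createTempFile("demo-model", ".bin");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(demo))) {
                out.putNextEntry(new ZipEntry("placeholder.properties"));
                out.write("key=value\n".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return demo;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the path of a downloaded model as the first argument,
        // or run without arguments to inspect the stand-in archive.
        Path archive = args.length > 0 ? Path.of(args[0]) : writeDemoArchive();
        listEntries(archive).forEach(System.out::println);
    }
}
```

The file itself is only read, never modified or unpacked, so it remains usable as downloaded.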
Use the URLs in the tables below to download the pre-trained models for use with the Apache OpenNLP toolkit.
Language | Description | Compatibility | README and Reports | File | Signatures |
---|---|---|---|---|---|
Detects 103 languages | Detects 103 languages in the ISO 639-3 standard. Works well with longer texts that contain at least two sentences in the same language. | >= 1.8.3 | | | |
Note: All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures |
---|---|---|---|---|---|
bg | Bulgarian | 2.4.0 | 2.14 | | |
cs | Czech | 2.4.0 | 2.14 | | |
da | Danish | 2.4.0 | 2.14 | | |
de | German | 2.4.0 | 2.14 | | |
en | English | 2.4.0 | 2.14 | | |
es | Spanish | 2.4.0 | 2.14 | | |
et | Estonian | 2.4.0 | 2.14 | | |
fi | Finnish | 2.4.0 | 2.14 | | |
fr | French | 2.4.0 | 2.14 | | |
hr | Croatian | 2.4.0 | 2.14 | | |
it | Italian | 2.4.0 | 2.14 | | |
lv | Latvian | 2.4.0 | 2.14 | | |
nl | Dutch | 2.4.0 | 2.14 | | |
no | Norwegian | 2.4.0 | 2.14 | | |
pl | Polish | 2.4.0 | 2.14 | | |
pt | Portuguese | 2.4.0 | 2.14 | | |
ro | Romanian | 2.4.0 | 2.14 | | |
ru | Russian | 2.4.0 | 2.14 | | |
sk | Slovak | 2.4.0 | 2.14 | | |
sl | Slovenian | 2.4.0 | 2.14 | | |
sr | Serbian | 2.4.0 | 2.14 | | |
sv | Swedish | 2.4.0 | 2.14 | | |
uk | Ukrainian | 2.4.0 | 2.14 | | |
Note: All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures |
---|---|---|---|---|---|
bg | Bulgarian | 2.4.0 | 2.14 | | |
cs | Czech | 2.4.0 | 2.14 | | |
da | Danish | 2.4.0 | 2.14 | | |
de | German | 2.4.0 | 2.14 | | |
en | English | 2.4.0 | 2.14 | | |
es | Spanish | 2.4.0 | 2.14 | | |
et | Estonian | 2.4.0 | 2.14 | | |
fi | Finnish | 2.4.0 | 2.14 | | |
fr | French | 2.4.0 | 2.14 | | |
hr | Croatian | 2.4.0 | 2.14 | | |
it | Italian | 2.4.0 | 2.14 | | |
lv | Latvian | 2.4.0 | 2.14 | | |
nl | Dutch | 2.4.0 | 2.14 | | |
no | Norwegian | 2.4.0 | 2.14 | | |
pl | Polish | 2.4.0 | 2.14 | | |
pt | Portuguese | 2.4.0 | 2.14 | | |
ro | Romanian | 2.4.0 | 2.14 | | |
ru | Russian | 2.4.0 | 2.14 | | |
sk | Slovak | 2.4.0 | 2.14 | | |
sl | Slovenian | 2.4.0 | 2.14 | | |
sr | Serbian | 2.4.0 | 2.14 | | |
sv | Swedish | 2.4.0 | 2.14 | | |
uk | Ukrainian | 2.4.0 | 2.14 | | |
Note: All models below are compatible with OpenNLP versions >= 1.0.0. The README and evaluation logs refer to every language listed below.
ISO code | Language | Trained with OpenNLP | UD version | File | Signatures |
---|---|---|---|---|---|
bg | Bulgarian | 2.4.0 | 2.14 | | |
cs | Czech | 2.4.0 | 2.14 | | |
da | Danish | 2.4.0 | 2.14 | | |
de | German | 2.4.0 | 2.14 | | |
en | English | 2.4.0 | 2.14 | | |
es | Spanish | 2.4.0 | 2.14 | | |
et | Estonian | 2.4.0 | 2.14 | | |
fi | Finnish | 2.4.0 | 2.14 | | |
fr | French | 2.4.0 | 2.14 | | |
hr | Croatian | 2.4.0 | 2.14 | | |
it | Italian | 2.4.0 | 2.14 | | |
lv | Latvian | 2.4.0 | 2.14 | | |
nl | Dutch | 2.4.0 | 2.14 | | |
no | Norwegian | 2.4.0 | 2.14 | | |
pl | Polish | 2.4.0 | 2.14 | | |
pt | Portuguese | 2.4.0 | 2.14 | | |
ro | Romanian | 2.4.0 | 2.14 | | |
ru | Russian | 2.4.0 | 2.14 | | |
sk | Slovak | 2.4.0 | 2.14 | | |
sl | Slovenian | 2.4.0 | 2.14 | | |
sr | Serbian | 2.4.0 | 2.14 | | |
sv | Swedish | 2.4.0 | 2.14 | | |
uk | Ukrainian | 2.4.0 | 2.14 | | |
The sha512, sha1, and md5 files are checksums, and the asc file is a PGP signature. Together they can be used to verify the integrity of the downloaded distribution package.
Use the following commands to print the checksums and verify the signature:
gpg --print-md SHA512 fileName.tar.gz
gpg --print-md SHA1 fileName.tar.gz
gpg --print-md MD5 fileName.tar.gz
gpg --verify fileName.tar.gz.asc
You may need to import the KEYS file before the asc signatures can be verified.
To do so, run:
gpg --import KEYS
More information about release signing and verifying signatures can be found here.
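For cases where the checksum comparison should happen inside a Java program rather than on the command line, the following sketch computes a digest with the JDK's `MessageDigest` and compares it to a published value. It is an illustration under stated assumptions, not part of the OpenNLP toolkit: the class name `ModelChecksum` and its methods are hypothetical, and the checksum file is assumed to contain only the hex digest (possibly with spaces, line breaks, or upper-case letters). It does not replace the `gpg --verify` signature check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class ModelChecksum {

    /** Computes a lower-case hex digest; algorithm is a JCA name such as "SHA-512". */
    static String hexDigest(String algorithm, InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compares two hex digests, ignoring case and any whitespace. */
    static boolean sameDigest(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    private static String normalize(String digest) {
        return digest.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ModelChecksum <downloaded-file> <sha512-file>");
            return;
        }
        // Assumption: the second file holds only the published SHA-512 hex digest.
        String expected = Files.readString(Path.of(args[1]));
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            String actual = hexDigest("SHA-512", in);
            System.out.println(sameDigest(expected, actual) ? "OK" : "MISMATCH");
        }
    }
}
```

The digest is computed from a stream so that large files are not loaded into memory at once. Normalizing whitespace and case before comparing lets the expected value be pasted in the grouped, upper-case form that `gpg --print-md` prints.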
The models on Sourceforge for 1.5.0 are found here. They are fully compatible with Apache OpenNLP 2.5.0.