Class LanguageDetectorME
- java.lang.Object
  - opennlp.tools.langdetect.LanguageDetectorME
- All Implemented Interfaces:
Serializable, LanguageDetector
public class LanguageDetectorME extends Object implements LanguageDetector
Implements a learnable LanguageDetector.
This will process the entire string when called with predictLanguage(CharSequence) or predictLanguages(CharSequence).
If you want this to stop early, use probingPredictLanguages(CharSequence) or probingPredictLanguages(CharSequence, LanguageDetectorConfig). When run in probing mode, this starts at the beginning of the char sequence and runs language detection on chunks of text. The language detector stops and reports the results when the end of the string is reached, or when all of the following hold: there are LanguageDetectorConfig.getMinConsecImprovements() consecutive predictions for the best language, the confidence increases over those last predictions, and the difference in confidence between the highest-confidence language and the second-highest-confidence language is greater than LanguageDetectorConfig.getMinDiff().
The authors wish to thank Ken Krugler and Yalder for the inspiration for many of the design components of this detector.
- See Also:
- Serialized Form
-
Constructor Summary
LanguageDetectorME(LanguageDetectorModel model)
Initializes an instance with a specific LanguageDetectorModel.
-
Method Summary
String[] getSupportedLanguages()
Language predictLanguage(CharSequence content)
Predicts the Language for the full content length.
Language[] predictLanguages(CharSequence content)
Predicts the languages for the full content length.
ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content)
This will stop processing early if the stopping criteria specified in LanguageDetectorConfig.DEFAULT_LANGUAGE_DETECTOR_CONFIG are met.
ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content, LanguageDetectorConfig config)
This will stop processing early if the stopping criteria specified in the given config are met.
static LanguageDetectorModel train(ObjectStream<LanguageSample> samples, TrainingParameters mlParams, LanguageDetectorFactory factory)
Starts a training of a LanguageDetectorModel with the given parameters.
-
Constructor Detail
-
LanguageDetectorME
public LanguageDetectorME(LanguageDetectorModel model)
Initializes an instance with a specific LanguageDetectorModel. Default feature generation is used.
- Parameters:
model - the LanguageDetectorModel to be used.
-
-
Method Detail
-
predictLanguages
public Language[] predictLanguages(CharSequence content)
Description copied from interface: LanguageDetector
Predicts the languages for the full content length.
- Specified by:
predictLanguages in interface LanguageDetector
- Parameters:
content - The textual content to detect potential languages from.
- Returns:
the predicted languages
-
predictLanguage
public Language predictLanguage(CharSequence content)
Description copied from interface: LanguageDetector
Predicts the Language for the full content length.
- Specified by:
predictLanguage in interface LanguageDetector
- Parameters:
content - The textual content to detect potential languages from.
- Returns:
the language with the highest confidence
-
getSupportedLanguages
public String[] getSupportedLanguages()
- Specified by:
getSupportedLanguages in interface LanguageDetector
- Returns:
An array of the language codes that are supported by a LanguageDetector.
-
probingPredictLanguages
public ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content)
This will stop processing early if the stopping criteria specified in LanguageDetectorConfig.DEFAULT_LANGUAGE_DETECTOR_CONFIG are met.
- Parameters:
content - The textual content to process.
- Returns:
A computed ProbingLanguageDetectionResult.
-
probingPredictLanguages
public ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content, LanguageDetectorConfig config)
This will stop processing early if the stopping criteria specified in the given config are met.
- Parameters:
content - The textual content to process.
config - The LanguageDetectorConfig to customize detection.
- Returns:
A computed ProbingLanguageDetectionResult.
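The stopping criteria from the class description can be illustrated with a small, library-free sketch. This is a simplified reading of the documented rule, not the OpenNLP implementation: the class name `ProbingStopSketch`, the method `stopIndex`, and the array-based inputs are all hypothetical, and chunking and n-gram counting are left out. Each array position stands for the prediction made after reading one more chunk.

```java
// Hypothetical sketch of the documented stopping rule; NOT the OpenNLP implementation.
final class ProbingStopSketch {

    /**
     * Returns the index of the prediction after which probing would stop.
     * bestLang[i], bestConf[i] and secondConf[i] describe the prediction
     * made after chunk i (assumed inputs, not an OpenNLP data structure).
     */
    static int stopIndex(String[] bestLang, double[] bestConf, double[] secondConf,
                         int minConsecImprovements, double minDiff) {
        // Number of consecutive predictions that kept the same best language
        // while raising its confidence.
        int consecutive = 0;
        for (int i = 0; i < bestLang.length; i++) {
            boolean improved = i > 0
                    && bestLang[i].equals(bestLang[i - 1])
                    && bestConf[i] > bestConf[i - 1];
            consecutive = improved ? consecutive + 1 : 0;

            // The best language must lead the runner-up by more than minDiff.
            boolean clearLead = bestConf[i] - secondConf[i] > minDiff;
            if (consecutive >= minConsecImprovements && clearLead) {
                return i; // criteria met: stop early
            }
        }
        return bestLang.length - 1; // end of the content reached
    }

    public static void main(String[] args) {
        String[] lang = {"eng", "deu", "deu", "deu", "deu"};
        double[] best = {0.30, 0.35, 0.50, 0.65, 0.80};
        double[] second = {0.28, 0.30, 0.25, 0.20, 0.10};
        System.out.println(stopIndex(lang, best, second, 2, 0.2)); // prints 3
    }
}
```

With the sample data, the best language switches to "deu" at index 1, its confidence rises at indices 2 and 3, and the lead over the runner-up at index 3 is 0.45, so the sketch stops there instead of reading the last chunk. In the real API, the corresponding thresholds come from the LanguageDetectorConfig passed to probingPredictLanguages.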
-
train
public static LanguageDetectorModel train(ObjectStream<LanguageSample> samples, TrainingParameters mlParams, LanguageDetectorFactory factory) throws IOException
Starts a training of a LanguageDetectorModel with the given parameters.
- Parameters:
samples - The ObjectStream of LanguageSample used as input for training.
mlParams - The TrainingParameters for the context of the training.
factory - The LanguageDetectorFactory for creating related objects defined via mlParams.
- Returns:
A valid, trained LanguageDetectorModel instance.
- Throws:
IOException - Thrown if IO errors occurred.
-
-