public class LanguageDetectorME extends Object implements LanguageDetector
This will process the entire string when called with predictLanguage(CharSequence) or predictLanguages(CharSequence).
If you want this to stop early, use probingPredictLanguages(CharSequence) or probingPredictLanguages(CharSequence, LanguageDetectorConfig).
When run in probing mode, the detector starts at the beginning of the CharSequence
and runs language detection on successive chunks of text. It stops and reports the
results when the end of the string is reached, or when all of the following hold:
the last LanguageDetectorConfig.getMinConsecImprovements() consecutive predictions
agree on the best language, the confidence increases over those predictions, and the
difference in confidence between the highest-confidence language and the
second-highest-confidence language is greater than LanguageDetectorConfig.getMinDiff().
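The stopping rule described above can be illustrated with a small, self-contained sketch. It does not use OpenNLP and is not the library's implementation: the class `ProbingStopSketch`, the `Chunk` type, and the method `chunksConsumed` are hypothetical names introduced here, and each chunk's prediction is reduced to the best language, its confidence, and the runner-up's confidence. The two thresholds stand in for the values returned by LanguageDetectorConfig.getMinConsecImprovements() and LanguageDetectorConfig.getMinDiff().

```java
import java.util.Arrays;
import java.util.List;

public class ProbingStopSketch {

    /** Hypothetical summary of one chunk's prediction (not an OpenNLP type). */
    public static final class Chunk {
        final String bestLang;   // language with the highest confidence
        final double bestConf;   // confidence of that language
        final double secondConf; // confidence of the runner-up language

        public Chunk(String bestLang, double bestConf, double secondConf) {
            this.bestLang = bestLang;
            this.bestConf = bestConf;
            this.secondConf = secondConf;
        }
    }

    /**
     * Returns how many chunks are read before the detector would stop,
     * under this sketch's reading of the stopping criteria.
     */
    public static int chunksConsumed(List<Chunk> chunks,
                                     int minConsecImprovements,
                                     double minDiff) {
        String lastLang = null;
        double lastConf = 0.0;
        int run = 0; // consecutive predictions: same best language, rising confidence

        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            boolean continuesRun = c.bestLang.equals(lastLang) && c.bestConf > lastConf;
            run = continuesRun ? run + 1 : 1;
            lastLang = c.bestLang;
            lastConf = c.bestConf;

            boolean enoughAgreement = run >= minConsecImprovements;
            boolean clearWinner = (c.bestConf - c.secondConf) > minDiff;
            if (enoughAgreement && clearWinner) {
                return i + 1; // stop early after this chunk
            }
        }
        return chunks.size(); // end of the string reached
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("en", 0.40, 0.35),
                new Chunk("en", 0.55, 0.30),
                new Chunk("en", 0.70, 0.20),
                new Chunk("en", 0.90, 0.05));
        System.out.println(chunksConsumed(chunks, 3, 0.2)); // prints 3
    }
}
```

In this sketch a change of best language, or a confidence that fails to rise, restarts the run at one, so noisy input simply falls through to the end-of-string case.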
The authors wish to thank Ken Krugler and Yalder for the inspiration for many of the design components of this detector.
Constructor and Description |
---|
LanguageDetectorME(LanguageDetectorModel model) Initializes the current instance with a language detector model. |
Modifier and Type | Method and Description |
---|---|
String[] | getSupportedLanguages() |
Language | predictLanguage(CharSequence content) This will process the full content length. |
Language[] | predictLanguages(CharSequence content) This will process the full content length. |
ProbingLanguageDetectionResult | probingPredictLanguages(CharSequence content) This will stop processing early if the stopping criteria specified in LanguageDetectorConfig.DEFAULT_LANGUAGE_DETECTOR_CONFIG are met. |
ProbingLanguageDetectionResult | probingPredictLanguages(CharSequence content, LanguageDetectorConfig config) This will stop processing early if the stopping criteria specified in the given config are met. |
static LanguageDetectorModel | train(ObjectStream<LanguageSample> samples, TrainingParameters mlParams, LanguageDetectorFactory factory) |
public LanguageDetectorME(LanguageDetectorModel model)
Parameters:
model - the language detector model

public Language[] predictLanguages(CharSequence content)
Specified by:
predictLanguages in interface LanguageDetector
Parameters:
content - content to be processed

public Language predictLanguage(CharSequence content)
Specified by:
predictLanguage in interface LanguageDetector
Parameters:
content - content to be processed

public String[] getSupportedLanguages()
Specified by:
getSupportedLanguages in interface LanguageDetector

public ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content)
This will stop processing early if the stopping criteria specified in LanguageDetectorConfig.DEFAULT_LANGUAGE_DETECTOR_CONFIG are met.
Parameters:
content - content to be processed

public ProbingLanguageDetectionResult probingPredictLanguages(CharSequence content, LanguageDetectorConfig config)
This will stop processing early if the stopping criteria specified in the given config are met.
Parameters:
content - content to process
config - config to customize detection

public static LanguageDetectorModel train(ObjectStream<LanguageSample> samples, TrainingParameters mlParams, LanguageDetectorFactory factory) throws IOException
Throws:
IOException
Copyright © 2020 The Apache Software Foundation. All rights reserved.