opennlp.tools.sentdetect
Class SentenceDetectorME

java.lang.Object
  extended by opennlp.tools.sentdetect.SentenceDetectorME
All Implemented Interfaces:
SentenceDetector

public class SentenceDetectorME
extends Object
implements SentenceDetector

A sentence detector for splitting up raw text into sentences.

A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
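
As a rough illustration of the first step described above, the sketch below only locates the candidate characters in a string. It is not the library's implementation: the class and method names are hypothetical, and the decision about whether each candidate really ends a sentence (the job of the maximum entropy model) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: finds the offsets of the
// characters that a sentence detector would have to classify.
final class EosCandidateScanner {

    // Candidate end-of-sentence characters named in the class description.
    private static final String EOS_CHARS = ".!?";

    // Returns the zero-based offset of every candidate character in text.
    static List<Integer> candidates(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (EOS_CHARS.indexOf(text.charAt(i)) >= 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // The "." after "Mr" is a candidate too; rejecting it is the model's task.
        System.out.println(candidates("Mr. Smith left. Did he say why?"));
        // prints [2, 14, 30]
    }
}
```

The abbreviation "Mr." shows why a plain character scan is not enough: every offset found here is only a candidate that still needs to be accepted or rejected.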


Field Summary
static String NO_SPLIT
          Constant indicating no sentence split.
static String SPLIT
          Constant indicating a sentence split.
 
Constructor Summary
SentenceDetectorME(SentenceModel model)
          Initializes the current instance.
SentenceDetectorME(SentenceModel model, Factory factory)
          Deprecated. Use a SentenceDetectorFactory to extend SentenceDetector functionality.
 
Method Summary
 double[] getSentenceProbabilities()
          Returns the probabilities associated with the most recent call to sentDetect().
 String[] sentDetect(String s)
          Detect sentences in a String.
 Span[] sentPosDetect(String s)
          Detect the position of the first words of sentences in a String.
static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations)
          Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.
static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations)
          Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.
static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, TrainingParameters mlParams)
          Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.
static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, SentenceDetectorFactory sdFactory, TrainingParameters mlParams)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SPLIT

public static final String SPLIT
Constant indicating a sentence split.

See Also:
Constant Field Values

NO_SPLIT

public static final String NO_SPLIT
Constant indicating no sentence split.

See Also:
Constant Field Values
Constructor Detail

SentenceDetectorME

public SentenceDetectorME(SentenceModel model)
Initializes the current instance.

Parameters:
model - the SentenceModel

SentenceDetectorME

public SentenceDetectorME(SentenceModel model,
                          Factory factory)
Deprecated. Use a SentenceDetectorFactory to extend SentenceDetector functionality.

Method Detail

sentDetect

public String[] sentDetect(String s)
Detect sentences in a String.

Specified by:
sentDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
Returns:
A string array containing individual sentences as elements.

sentPosDetect

public Span[] sentPosDetect(String s)
Detect the position of the first words of sentences in a String.

Specified by:
sentPosDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
Returns:
An array of Span objects, one per detected sentence, giving the position of each sentence in the string.
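
To show how the returned offsets relate to the input string, here is a minimal sketch. It assumes each span has an inclusive start and an exclusive end offset, as opennlp.tools.util.Span does. The Offsets class is a hypothetical stand-in for Span so that the example compiles without the library, and the offsets are written by hand rather than produced by a model.

```java
// Hypothetical stand-in for opennlp.tools.util.Span:
// start is inclusive, end is exclusive.
final class Offsets {
    final int start;
    final int end;

    Offsets(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class SpanToText {

    // Cuts one substring per span out of the original text.
    static String[] cover(String text, Offsets[] spans) {
        String[] out = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            out[i] = text.substring(spans[i].start, spans[i].end);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "First one. Second one.";
        // Hand-written offsets; with OpenNLP they would come from
        //   Span[] spans = detector.sentPosDetect(text);
        Offsets[] spans = { new Offsets(0, 10), new Offsets(11, 22) };
        for (String sentence : cover(text, spans)) {
            System.out.println(sentence);
        }
    }
}
```

Working with offsets rather than strings keeps the link to the original text, which helps when annotations have to be mapped back onto the source document.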

getSentenceProbabilities

public double[] getSentenceProbabilities()
Returns the probabilities associated with the most recent call to sentDetect().

Returns:
The probability for each sentence returned by the most recent call to sentDetect. If not applicable, an empty array is returned.
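
One possible use of these probabilities is to flag sentences that the detector was unsure about. The sketch below assumes the probability array is index-aligned with the sentence array from the same call. The helper class, the threshold, and the sample values are hypothetical and not part of OpenNLP. Note that it rejects arrays of different lengths, so the "empty array" case described above has to be handled by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
final class LowConfidenceFilter {

    // Returns the sentences whose probability is below the threshold.
    // Assumes sentences[i] and probs[i] describe the same sentence.
    static List<String> below(String[] sentences, double[] probs, double threshold) {
        if (sentences.length != probs.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(sentences[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // With OpenNLP on the classpath, these would come from
        //   String[] sentences = detector.sentDetect(text);
        //   double[] probs = detector.getSentenceProbabilities();
        // The values below are made up for illustration.
        String[] sentences = { "It rained.", "See p. 4 for details." };
        double[] probs = { 0.98, 0.61 };
        System.out.println(below(sentences, probs, 0.8));
    }
}
```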

train

public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations,
                                  TrainingParameters mlParams)
                           throws IOException
Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.

Throws:
IOException

train

public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  SentenceDetectorFactory sdFactory,
                                  TrainingParameters mlParams)
                           throws IOException
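Trains a SentenceModel from the given samples, configured by the supplied SentenceDetectorFactory and TrainingParameters.

Throws: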
IOException

train

@Deprecated
public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations,
                                  int cutoff,
                                  int iterations)
                           throws IOException
Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.

Throws:
IOException

train

public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations)
                           throws IOException
Deprecated. Use train(String, ObjectStream, SentenceDetectorFactory, TrainingParameters) and pass in a SentenceDetectorFactory.

Throws:
IOException


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.