opennlp.tools.namefind
Class NameFinderME

java.lang.Object
  extended by opennlp.tools.namefind.NameFinderME
All Implemented Interfaces:
TokenNameFinder

public class NameFinderME
extends Object
implements TokenNameFinder

Class for creating a maximum-entropy-based name finder.


Field Summary
static String CONTINUE
           
static int DEFAULT_BEAM_SIZE
           
static String OTHER
           
static String START
           
 
Constructor Summary
NameFinderME(opennlp.model.MaxentModel mod)
          Deprecated. Use the new model API!
NameFinderME(opennlp.model.MaxentModel mod, NameContextGenerator cg)
          Deprecated. 
NameFinderME(opennlp.model.MaxentModel mod, NameContextGenerator cg, int beamSize)
          Deprecated. 
NameFinderME(TokenNameFinderModel model)
           
NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator generator, int beamSize)
           
NameFinderME(TokenNameFinderModel model, AdaptiveFeatureGenerator generator, int beamSize, SequenceValidator<String> sequenceValidator)
          Initializes the name finder with the specified model.
NameFinderME(TokenNameFinderModel model, int beamSize)
           
 
Method Summary
 void clearAdaptiveData()
          Forgets all adaptive data which was collected during previous calls to one of the find methods.
static Span[] dropOverlappingSpans(Span[] spans)
          Removes spans which intersect or cross in any way.
 Span[] find(String[] tokens)
          Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.
 Span[] find(String[] tokens, String[][] additionalContext)
          Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.
 double[] probs()
          Returns an array with the probabilities of the last decoded sequence.
 void probs(double[] probs)
          Populates the specified array with the probabilities of the last decoded sequence.
 double[] probs(Span[] spans)
          Returns one probability for each of the specified spans, computed as the arithmetic mean of the probabilities of the outcomes which make up the span.
static opennlp.maxent.GISModel train(opennlp.model.EventStream es, int iterations, int cut)
          Deprecated. 
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, AdaptiveFeatureGenerator generator, Map<String,Object> resources, int iterations, int cutoff)
          Trains a name finder model.
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, byte[] generatorDescriptor, Map<String,Object> resources, int iterations, int cutoff)
          Deprecated. Use train(String, String, ObjectStream, TrainingParameters, byte[], Map) instead and pass in a TrainingParameters object.
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, Map<String,Object> resources)
           
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, Map<String,Object> resources, int iterations, int cutoff)
          Deprecated. Use train(String, String, ObjectStream, TrainingParameters, AdaptiveFeatureGenerator, Map) instead and pass in a TrainingParameters object.
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, TrainingParameters trainParams, AdaptiveFeatureGenerator generator, Map<String,Object> resources)
          Trains a name finder model.
static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, TrainingParameters trainParams, byte[] featureGeneratorBytes, Map<String,Object> resources)
          Trains a name finder model.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BEAM_SIZE

public static final int DEFAULT_BEAM_SIZE
See Also:
Constant Field Values

START

public static final String START
See Also:
Constant Field Values

CONTINUE

public static final String CONTINUE
See Also:
Constant Field Values

OTHER

public static final String OTHER
See Also:
Constant Field Values
Constructor Detail

NameFinderME

public NameFinderME(TokenNameFinderModel model)

NameFinderME

public NameFinderME(TokenNameFinderModel model,
                    AdaptiveFeatureGenerator generator,
                    int beamSize,
                    SequenceValidator<String> sequenceValidator)
Initializes the name finder with the specified model.

Parameters:
model - The model to be used to find names.
beamSize - The size of the beam to be used in decoding this model.

NameFinderME

public NameFinderME(TokenNameFinderModel model,
                    AdaptiveFeatureGenerator generator,
                    int beamSize)

NameFinderME

public NameFinderME(TokenNameFinderModel model,
                    int beamSize)

NameFinderME

@Deprecated
public NameFinderME(opennlp.model.MaxentModel mod)
Deprecated. Use the new model API!

Creates a new name finder with the specified model.

Parameters:
mod - The model to be used to find names.

NameFinderME

@Deprecated
public NameFinderME(opennlp.model.MaxentModel mod,
                               NameContextGenerator cg)
Deprecated. 

Creates a new name finder with the specified model and context generator.

Parameters:
mod - The model to be used to find names.
cg - The context generator to be used with this name finder.

NameFinderME

@Deprecated
public NameFinderME(opennlp.model.MaxentModel mod,
                               NameContextGenerator cg,
                               int beamSize)
Deprecated. 

Creates a new name finder with the specified model and context generator.

Parameters:
mod - The model to be used to find names.
cg - The context generator to be used with this name finder.
beamSize - The size of the beam to be used in decoding this model.
Method Detail

find

public Span[] find(String[] tokens)
Description copied from interface: TokenNameFinder
Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.

Specified by:
find in interface TokenNameFinder
Parameters:
tokens - an array of the tokens or words of the sequence, typically a sentence.
Returns:
an array of spans for each of the names identified.
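
To make "token spans" concrete, the sketch below turns one outcome per token into start/end token index pairs. It is an illustration only, not the library's code: the outcome strings ("person-start", "person-cont", "other"), the suffix values chosen for START, CONTINUE and OTHER, the exclusive end index, and the class OutcomeSpanSketch with its int[] span representation are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of opennlp.tools.namefind.
class OutcomeSpanSketch {
    // Assumed tag values standing in for NameFinderME.START, CONTINUE and OTHER.
    static final String START = "start";
    static final String CONTINUE = "cont";
    static final String OTHER = "other";

    // Converts one outcome per token into {start, end} pairs, end exclusive.
    static List<int[]> toSpans(String[] outcomes) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // index of the first token of the open name, or -1 if none
        for (int i = 0; i < outcomes.length; i++) {
            String outcome = outcomes[i];
            if (outcome.endsWith(START)) {
                if (start != -1) spans.add(new int[] {start, i}); // close the previous name
                start = i;
            } else if (outcome.endsWith(OTHER)) {
                if (start != -1) spans.add(new int[] {start, i});
                start = -1;
            }
            // An outcome ending in CONTINUE extends the open name; nothing to do.
        }
        if (start != -1) spans.add(new int[] {start, outcomes.length});
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", "joined", "Elsevier", "."};
        String[] outcomes = {"person-start", "person-cont", "other", "organization-start", "other"};
        for (int[] span : toSpans(outcomes)) {
            // Prints "[0, 2] Pierre Vinken" and then "[3, 4] Elsevier".
            System.out.println(Arrays.toString(span) + " "
                + String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
        }
    }
}
```

The point of the sketch is that a span indexes tokens, not characters, so the caller needs the original token array to recover the name text.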

find

public Span[] find(String[] tokens,
                   String[][] additionalContext)
Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.

Parameters:
tokens - an array of the tokens or words of the sequence, typically a sentence.
additionalContext - features which are based on context outside of the sentence but which should also be used.
Returns:
an array of spans for each of the names identified.

clearAdaptiveData

public void clearAdaptiveData()
Forgets all adaptive data which was collected during previous calls to one of the find methods. This method is typically called at the end of a document.

Specified by:
clearAdaptiveData in interface TokenNameFinder
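
The calling pattern described above (call find once per sentence, then clear the adaptive data once per document) can be sketched with a toy stand-in. AdaptiveFinderSketch and its capitalization rule are hypothetical and say nothing about how NameFinderME works internally. The sketch only shows why state carried between find calls should be reset at a document boundary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; NOT how NameFinderME works internally.
class AdaptiveFinderSketch {
    // Adaptive data: tokens already tagged as names in the current document.
    private final Set<String> seen = new HashSet<>();

    // Toy rule: a token is a name if it is capitalized and not sentence-initial,
    // or if it was tagged as a name earlier in the same document.
    List<Integer> find(String[] tokens) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            boolean capitalized = Character.isUpperCase(tokens[i].charAt(0));
            if ((capitalized && i > 0) || seen.contains(tokens[i])) {
                hits.add(i);
                seen.add(tokens[i]);
            }
        }
        return hits;
    }

    // Forgets everything collected by earlier find calls.
    void clearAdaptiveData() {
        seen.clear();
    }

    public static void main(String[] args) {
        AdaptiveFinderSketch finder = new AdaptiveFinderSketch();
        String[][][] documents = {
            {{"We", "met", "Vinken", "."}, {"Vinken", "left", "."}},
            {{"Vinken", "is", "a", "surname", "."}}
        };
        for (String[][] document : documents) {
            for (String[] sentence : document) {
                System.out.println(finder.find(sentence));
            }
            finder.clearAdaptiveData(); // end of document: forget what was collected
        }
    }
}
```

In this toy run the sentence-initial "Vinken" is found in the first document because it was seen one sentence earlier, and is not found in the second document because the data was cleared in between.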

probs

public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined by the previous call to find. The specified array should be at least as large as the number of tokens passed in that call.

Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.

probs

public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined by the previous call to find.

Returns:
An array with one probability for each token passed to find when it was last called.

probs

public double[] probs(Span[] spans)
Returns one probability for each of the specified spans, computed as the arithmetic mean of the probabilities of the outcomes which make up the span.

Parameters:
spans - The spans of the names for which probabilities are desired.
Returns:
an array of probabilities for each of the specified spans.
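
The averaging described above can be sketched as follows. This is a stand-alone illustration, not the library's implementation: SpanProbSketch is a hypothetical class, spans are represented as {start, end} token index pairs with an exclusive end, and every span is assumed to cover at least one token.

```java
// Hypothetical illustration; not part of opennlp.tools.namefind.
class SpanProbSketch {
    // tokenProbs[i] is the probability of the outcome chosen for token i.
    // Each span is {start, end} in token indices, end exclusive and non-empty.
    static double[] spanProbs(double[] tokenProbs, int[][] spans) {
        double[] result = new double[spans.length];
        for (int si = 0; si < spans.length; si++) {
            double sum = 0.0;
            for (int ti = spans[si][0]; ti < spans[si][1]; ti++) {
                sum += tokenProbs[ti];
            }
            // Arithmetic mean over the tokens covered by the span.
            result[si] = sum / (spans[si][1] - spans[si][0]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] tokenProbs = {0.5, 0.75, 0.25, 1.0, 0.5};
        int[][] spans = {{0, 2}, {3, 4}};
        double[] p = spanProbs(tokenProbs, spans);
        System.out.println(p[0] + " " + p[1]); // prints "0.625 1.0"
    }
}
```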

train

public static TokenNameFinderModel train(String languageCode,
                                         String type,
                                         ObjectStream<NameSample> samples,
                                         TrainingParameters trainParams,
                                         AdaptiveFeatureGenerator generator,
                                         Map<String,Object> resources)
                                  throws IOException
Trains a name finder model.

Parameters:
languageCode - the language of the training data
type - null or an override type for all types in the training data
samples - the training data
trainParams - machine learning train parameters
generator - null or the feature generator
resources - the resources for the name finder or null if none
Returns:
the newly trained model
Throws:
IOException

train

public static TokenNameFinderModel train(String languageCode,
                                         String type,
                                         ObjectStream<NameSample> samples,
                                         TrainingParameters trainParams,
                                         byte[] featureGeneratorBytes,
                                         Map<String,Object> resources)
                                  throws IOException
Trains a name finder model.

Parameters:
languageCode - the language of the training data
type - null or an override type for all types in the training data
samples - the training data
trainParams - machine learning train parameters
featureGeneratorBytes - descriptor to configure the feature generation or null
resources - the resources for the name finder or null if none
Returns:
the newly trained model
Throws:
IOException

train

public static TokenNameFinderModel train(String languageCode,
                                         String type,
                                         ObjectStream<NameSample> samples,
                                         AdaptiveFeatureGenerator generator,
                                         Map<String,Object> resources,
                                         int iterations,
                                         int cutoff)
                                  throws IOException
Trains a name finder model.

Parameters:
languageCode - the language of the training data
type - null or an override type for all types in the training data
samples - the training data
iterations - the number of iterations
cutoff -
resources - the resources for the name finder or null if none
Returns:
the newly trained model
Throws:
IOException
ObjectStreamException

train

@Deprecated
public static TokenNameFinderModel train(String languageCode,
                                                    String type,
                                                    ObjectStream<NameSample> samples,
                                                    Map<String,Object> resources,
                                                    int iterations,
                                                    int cutoff)
                                  throws IOException
Deprecated. Use train(String, String, ObjectStream, TrainingParameters, AdaptiveFeatureGenerator, Map) instead and pass in a TrainingParameters object.

Throws:
IOException

train

public static TokenNameFinderModel train(String languageCode,
                                         String type,
                                         ObjectStream<NameSample> samples,
                                         Map<String,Object> resources)
                                  throws IOException
Throws:
IOException

train

@Deprecated
public static TokenNameFinderModel train(String languageCode,
                                                    String type,
                                                    ObjectStream<NameSample> samples,
                                                    byte[] generatorDescriptor,
                                                    Map<String,Object> resources,
                                                    int iterations,
                                                    int cutoff)
                                  throws IOException
Deprecated. Use train(String, String, ObjectStream, TrainingParameters, byte[], Map) instead and pass in a TrainingParameters object.

Throws:
IOException

train

@Deprecated
public static opennlp.maxent.GISModel train(opennlp.model.EventStream es,
                                                       int iterations,
                                                       int cut)
                                     throws IOException
Deprecated. 

Throws:
IOException

dropOverlappingSpans

public static Span[] dropOverlappingSpans(Span[] spans)
Removes spans which intersect or cross in any way.

The following rules are used to remove the spans:
Identical spans: The first span in the array after sorting it remains
Intersecting spans: The first span after sorting remains
Contained spans: All spans which are contained by another are removed

Parameters:
spans -
Returns:
non-overlapping spans
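
The three rules above can be sketched as a sort followed by a single pass. This is a minimal illustration under stated assumptions, not the library's code: DropOverlapSketch is a hypothetical class, spans are {start, end} token index pairs with an exclusive end, and the sort order (start ascending, then longer spans first) is an assumption about what "after sorting" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of opennlp.tools.namefind.
class DropOverlapSketch {
    // Each span is {start, end} in token indices, end exclusive and non-empty.
    static int[][] dropOverlapping(int[][] spans) {
        int[][] sorted = spans.clone();
        // Assumed order: by start ascending, then longer spans first.
        Arrays.sort(sorted, (a, b) -> a[0] != b[0]
            ? Integer.compare(a[0], b[0])
            : Integer.compare(b[1], a[1]));

        List<int[]> kept = new ArrayList<>();
        int[] last = null; // most recently kept span
        for (int[] span : sorted) {
            // Two half-open ranges overlap when each starts before the other ends.
            boolean overlaps = last != null && span[0] < last[1] && last[0] < span[1];
            if (!overlaps) {
                kept.add(span);
                last = span;
            }
        }
        return kept.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[][] spans = {{0, 2}, {1, 3}, {0, 2}, {4, 6}, {4, 5}, {7, 8}};
        // Prints [[0, 2], [4, 6], [7, 8]]
        System.out.println(Arrays.deepToString(dropOverlapping(spans)));
    }
}
```

In the example, the duplicate {0, 2} is dropped as an identical span, {1, 3} as an intersecting span, and {4, 5} as a span contained in {4, 6}. Comparing each span only against the most recently kept one is enough here because the kept spans are sorted and do not overlap each other.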


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.