Class GISTrainer
- All Implemented Interfaces:
Trainer, EventTrainer
The reference paper for this implementation is Adwait Ratnaparkhi's technical report at the University of Pennsylvania's Institute for Research in Cognitive Science, available at ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.
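The core GIS iteration from Ratnaparkhi's report can be illustrated with a short, self-contained sketch. It assumes binary predicate/outcome features (each event lists the indices of its active predicates), a correction constant C equal to the largest number of active predicates in any event, and no slack feature, smoothing, or prior. The class `GisSketch` and its methods are hypothetical and do not reproduce OpenNLP's internals. Each iteration computes the model's expected count for every feature and moves that feature's weight by (1/C) · log(observed / expected).

```java
import java.util.Arrays;

/** Hypothetical sketch of GIS training for a conditional maxent model; not OpenNLP code. */
final class GisSketch {

    /** p(o | context) = exp(sum of active weights for o) / Z, computed with a max shift for stability. */
    static double[] probabilities(double[][] lambda, int[] context, int numOutcomes) {
        double[] p = new double[numOutcomes];
        for (int o = 0; o < numOutcomes; o++) {
            for (int pred : context) {
                p[o] += lambda[pred][o];
            }
        }
        double max = Arrays.stream(p).max().orElse(0.0);
        double z = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            p[o] = Math.exp(p[o] - max);
            z += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= z;
        }
        return p;
    }

    /** contexts[e] holds the active predicate indices of event e; outcomes[e] is its observed outcome. */
    static double[][] train(int[][] contexts, int[] outcomes, int numPreds, int numOutcomes, int iterations) {
        // Correction constant C: the largest number of active features in any event.
        int c = 1;
        for (int[] ctx : contexts) {
            c = Math.max(c, ctx.length);
        }

        // Empirical (observed) count of every predicate/outcome feature.
        double[][] observed = new double[numPreds][numOutcomes];
        for (int e = 0; e < contexts.length; e++) {
            for (int pred : contexts[e]) {
                observed[pred][outcomes[e]] += 1.0;
            }
        }

        double[][] lambda = new double[numPreds][numOutcomes];
        for (int it = 0; it < iterations; it++) {
            // Expected feature counts under the current model.
            double[][] expected = new double[numPreds][numOutcomes];
            for (int[] ctx : contexts) {
                double[] p = probabilities(lambda, ctx, numOutcomes);
                for (int pred : ctx) {
                    for (int o = 0; o < numOutcomes; o++) {
                        expected[pred][o] += p[o];
                    }
                }
            }
            // GIS update: lambda += (1/C) * log(observed / expected), for features seen in training.
            for (int pred = 0; pred < numPreds; pred++) {
                for (int o = 0; o < numOutcomes; o++) {
                    if (observed[pred][o] > 0 && expected[pred][o] > 0) {
                        lambda[pred][o] += Math.log(observed[pred][o] / expected[pred][o]) / c;
                    }
                }
            }
        }
        return lambda;
    }
}
```

With three events that all share predicate 0 and have outcomes 0, 0 and 1, the first update already yields p(0 | {0}) = 2/3, the empirical ratio, and later iterations leave it unchanged.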
The slack parameter used in that implementation is removed from the computation by default, and a method for updating with Gaussian smoothing has been added, following Investigating GIS and Smoothing for Maximum Entropy Taggers, Clark and Curran (2003). http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf.
The slack parameter can be used by setting useSlackParameter to true.
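GIS in its original form requires the feature values of every event to sum to the same constant C; the slack (correction) parameter adds one extra feature per event that makes up the difference. A minimal sketch, assuming each event's feature values are given as a plain `double[]` (the names `SlackFeatureSketch`, `correctionConstant` and `slackValues` are hypothetical, not OpenNLP API):

```java
/** Hypothetical sketch of the GIS slack (correction) feature; not OpenNLP code. */
final class SlackFeatureSketch {

    /** Correction constant C: the largest total feature value over all events. */
    static double correctionConstant(double[][] featureValues) {
        double c = 0.0;
        for (double[] values : featureValues) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            c = Math.max(c, sum);
        }
        return c;
    }

    /** Slack value per event: C minus the event's total, so every event then sums to exactly C. */
    static double[] slackValues(double[][] featureValues) {
        double c = correctionConstant(featureValues);
        double[] slack = new double[featureValues.length];
        for (int i = 0; i < featureValues.length; i++) {
            double sum = 0.0;
            for (double v : featureValues[i]) {
                sum += v;
            }
            slack[i] = c - sum;
        }
        return slack;
    }
}
```

For events with 3, 1 and 2 active binary features, C is 3 and the slack values are 0, 2 and 1, so every event sums to exactly 3.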
Gaussian smoothing can be used by setting useGaussianSmoothing to true.
A Prior can be used to train models which converge to the distribution which minimizes the
relative entropy between the distribution specified by the empirical constraints of the training
data and the specified prior. By default, the uniform distribution is used as the prior.
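Minimizing relative entropy to a prior q under the empirical constraints gives a model of the form p(o | x) ∝ q(o | x) · exp(Σ λ_i f_i(x, o)), so a uniform q reduces to the ordinary maximum entropy model. The hypothetical `PriorSketch` below shows only this combination step for a single context, assuming the summed feature weights per outcome are already known; it does not use OpenNLP's `Prior` type.

```java
import java.util.Arrays;

/** Hypothetical sketch of combining a prior with maxent feature weights; not OpenNLP code. */
final class PriorSketch {

    /** p(o | x) proportional to prior[o] * exp(scores[o]); scores[o] is the sum of active weights for o. */
    static double[] posterior(double[] prior, double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] p = new double[scores.length];
        double z = 0.0;
        for (int o = 0; o < scores.length; o++) {
            p[o] = prior[o] * Math.exp(scores[o] - max); // max shift avoids overflow
            z += p[o];
        }
        for (int o = 0; o < scores.length; o++) {
            p[o] /= z;
        }
        return p;
    }

    /** The default case: every outcome equally likely a priori. */
    static double[] uniformPrior(int numOutcomes) {
        double[] q = new double[numOutcomes];
        Arrays.fill(q, 1.0 / numOutcomes);
        return q;
    }
}
```

With all weights at zero the result is just the prior, while with a uniform prior the weights alone determine the distribution.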
-
Field Summary
Fields inherited from class opennlp.tools.ml.AbstractEventTrainer
DATA_INDEXER_ONE_PASS_REAL_VALUE, DATA_INDEXER_ONE_PASS_VALUE, DATA_INDEXER_PARAM, DATA_INDEXER_TWO_PASS_VALUE
Fields inherited from interface opennlp.tools.ml.EventTrainer
EVENT_VALUE
-
Constructor Summary
GISTrainer(): Initializes a GISTrainer.
-
Method Summary
doTrain(DataIndexer indexer)
void init(TrainingParameters trainingParameters, Map<String, String> reportMap)
boolean isSortAndMerge()
void setGaussianSigma(double sigmaValue): Sets the Gaussian sigma value used for smoothing.
void setSmoothing(boolean smooth): Sets whether this trainer will use smoothing while training the model.
void setSmoothingObservation(double timesSeen): Sets how many times the trainer should assume it observed a feature that it never actually saw.
GISModel trainModel(int iterations, DataIndexer di): Trains a model using the GIS algorithm.
GISModel trainModel(int iterations, DataIndexer di, int threads): Trains a model using the GIS algorithm.
GISModel trainModel(int iterations, DataIndexer di, Prior modelPrior, int threads): Trains a model using the GIS algorithm.
GISModel trainModel(ObjectStream<Event> eventStream): Trains a model using the GIS algorithm, assuming 100 iterations and no cutoff.
GISModel trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff): Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff.
Methods inherited from class opennlp.tools.ml.AbstractEventTrainer
getDataIndexer, train, train, validate
Methods inherited from class opennlp.tools.ml.AbstractTrainer
getAlgorithm, getCutoff, getIterations
-
Field Details
-
LOG_LIKELIHOOD_THRESHOLD_PARAM
- See also: Constant Field Values
-
LOG_LIKELIHOOD_THRESHOLD_DEFAULT
public static final double LOG_LIKELIHOOD_THRESHOLD_DEFAULT
- See also: Constant Field Values
-
MAXENT_VALUE
- See also: Constant Field Values
-
-
Constructor Details
-
GISTrainer
public GISTrainer()
Initializes a GISTrainer.
Note: The resulting instance does not print progress messages about training to STDOUT.
-
-
Method Details
-
isSortAndMerge
public boolean isSortAndMerge()
- Specified by:
isSortAndMerge in class AbstractEventTrainer
-
init
- Specified by:
init in interface Trainer
- Overrides:
init in class AbstractTrainer
- Parameters:
trainingParameters - The TrainingParameters to use.
reportMap - The Map instance used as report map.
-
doTrain
- Specified by:
doTrain in class AbstractEventTrainer
- Throws:
IOException
-
setSmoothing
public void setSmoothing(boolean smooth)
Sets whether this trainer will use smoothing while training the model.
Note: This can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
- Parameters:
smooth - true if smoothing is desired, false if not.
-
setSmoothingObservation
public void setSmoothingObservation(double timesSeen)
Sets how many times the trainer should assume it observed a feature that it never actually saw; this value is used when smoothing.
Note: This can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
- Parameters:
timesSeen - The "number" of times we want the trainer to imagine it saw a feature that it actually didn't see.
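One way to read the `timesSeen` parameter: in the plain GIS update, a feature that never occurs in the training data has an observed count of 0, so log(observed / expected) is negative infinity. Substituting a small pseudo-count for that zero keeps the update finite. The sketch below is a hypothetical illustration of this reading, not OpenNLP's code; `SmoothingObservationSketch` and its methods are invented names.

```java
/** Hypothetical illustration of pseudo-count smoothing in a GIS step; not OpenNLP code. */
final class SmoothingObservationSketch {

    /** Count used in the update: an unseen feature (count 0) gets the pseudo-count timesSeen. */
    static double smoothedObserved(double observedCount, double timesSeen) {
        return observedCount > 0 ? observedCount : timesSeen;
    }

    /** One GIS step, (1/C) * log(observed / expected), using the smoothed observed count. */
    static double gisStep(double observedCount, double expectedCount,
                          double correctionConstant, double timesSeen) {
        double observed = smoothedObserved(observedCount, timesSeen);
        return Math.log(observed / expectedCount) / correctionConstant;
    }
}
```

With timesSeen = 0, an unseen feature with expected count 0.5 gets a step of negative infinity; with timesSeen = 0.1 the step is log(0.2) ≈ -1.609, a finite push toward a lower weight.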
-
setGaussianSigma
public void setGaussianSigma(double sigmaValue)
Sets the Gaussian sigma value used for smoothing.
Note: This can improve model accuracy, though training will potentially take longer and use more memory. Model size will also be larger.
- Parameters:
sigmaValue - The Gaussian sigma value used for smoothing.
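In the standard formulation of GIS with a zero-mean Gaussian prior of variance σ² on each weight, the step δ for a weight λ no longer has a closed form: it is the root of E_obs = E_model · exp(C · δ) + (λ + δ) / σ². The residual is increasing and convex in δ, so Newton's method converges quickly. The hypothetical `GaussianUpdateSketch` below solves this equation; OpenNLP's own update may scale the penalty term differently.

```java
/** Hypothetical sketch of a Gaussian-smoothed GIS step solved by Newton's method; not OpenNLP code. */
final class GaussianUpdateSketch {

    /** Solves observed = expected * exp(C * delta) + (lambda + delta) / sigma^2 for delta. */
    static double gaussianDelta(double lambda, double observed, double expected,
                                double correctionConstant, double sigma) {
        double sigmaSq = sigma * sigma;
        double delta = 0.0;
        for (int i = 0; i < 50; i++) {
            double scaled = expected * Math.exp(correctionConstant * delta);
            double g = scaled + (lambda + delta) / sigmaSq - observed;   // residual
            double gPrime = correctionConstant * scaled + 1.0 / sigmaSq; // derivative, always > 0
            double next = delta - g / gPrime;
            if (Math.abs(next - delta) < 1e-10) {
                return next;
            }
            delta = next;
        }
        return delta;
    }
}
```

With λ = 0, observed = 2, expected = 1, C = 1 and σ = 1, the step is about 0.443 instead of the unsmoothed log 2 ≈ 0.693: the prior pulls the weight toward zero, and the pull weakens as σ grows.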
-
trainModel
Trains a model using the GIS algorithm, assuming 100 iterations and no cutoff.
- Parameters:
eventStream - The eventStream holding the data on which this model will be trained.
- Returns:
A trained GISModel which can be used immediately or saved to disk using a GISModelWriter.
- Throws:
IOException
-
trainModel
public GISModel trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff) throws IOException
Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff.
- Parameters:
eventStream - A stream of all events.
iterations - The number of iterations to use for GIS.
cutoff - The number of times a feature must occur to be included.
- Returns:
A trained GISModel which can be used immediately or saved to disk using a GISModelWriter.
- Throws:
IOException
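The count cutoff can be pictured as a filter applied to the predicates before training. This sketch assumes that every occurrence of a predicate in an event context counts once and that a predicate whose count equals the cutoff is kept, consistent with "must occur" in the parameter description; `CutoffSketch` is a hypothetical name, not OpenNLP API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a feature count cutoff; not OpenNLP code. */
final class CutoffSketch {

    /** Returns the predicates that occur at least `cutoff` times across all event contexts. */
    static Set<String> retainedPredicates(List<String[]> contexts, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] context : contexts) {
            for (String predicate : context) {
                counts.merge(predicate, 1, Integer::sum); // one count per occurrence
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

A cutoff of 0 keeps every predicate, which corresponds to the "no cutoff" default of the single-argument trainModel overload.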
-
trainModel
Trains a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The DataIndexer used to compress events in memory.
- Returns:
A trained GISModel which can be used immediately or saved to disk using a GISModelWriter.
- Throws:
IllegalArgumentException - Thrown if parameters are invalid.
-
trainModel
Trains a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The DataIndexer used to compress events in memory.
threads - The number of threads to train with. Must be greater than 0.
- Returns:
A trained GISModel which can be used immediately or saved to disk using a GISModelWriter.
- Throws:
IllegalArgumentException - Thrown if parameters are invalid.
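A common way to use several threads in GIS is to parallelize the expensive step, computing model expectations over all events. The hypothetical sketch below (not OpenNLP's code; `ParallelExpectationSketch` and its array layout are assumptions) splits the events across a fixed pool, lets each worker accumulate into a private array, and adds the partial arrays once all workers finish. Like the documented method, it rejects a thread count that is not greater than 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of multi-threaded expectation computation; not OpenNLP code. */
final class ParallelExpectationSketch {

    /** Accumulates expected[pred][o] += probs[e][o] for each active predicate of each event e. */
    static double[][] expectedCounts(int[][] contexts, double[][] probs,
                                     int numPreds, int numOutcomes, int threads) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be greater than 0");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<double[][]>> parts = new ArrayList<>();
            int chunk = (contexts.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                final int to = Math.min(contexts.length, from + chunk);
                parts.add(pool.submit(() -> {
                    // Each worker writes only to its own array, so no locking is needed.
                    double[][] partial = new double[numPreds][numOutcomes];
                    for (int e = from; e < to; e++) {
                        for (int pred : contexts[e]) {
                            for (int o = 0; o < numOutcomes; o++) {
                                partial[pred][o] += probs[e][o];
                            }
                        }
                    }
                    return partial;
                }));
            }
            // Combine the partial sums after every worker has finished.
            double[][] total = new double[numPreds][numOutcomes];
            for (Future<double[][]> part : parts) {
                double[][] partial = part.get();
                for (int p = 0; p < numPreds; p++) {
                    for (int o = 0; o < numOutcomes; o++) {
                        total[p][o] += partial[p][o];
                    }
                }
            }
            return total;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a worker failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Giving each worker its own array trades some memory for lock-free accumulation; the result is the same as a single-threaded pass up to floating-point summation order.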
-
trainModel
Trains a model using the GIS algorithm.
- Parameters:
iterations - The number of GIS iterations to perform.
di - The DataIndexer used to compress events in memory.
modelPrior - The Prior distribution used to train this model.
threads - The number of threads to train with.
- Returns:
A trained GISModel which can be used immediately or saved to disk using a GISModelWriter.
- Throws:
IllegalArgumentException - Thrown if parameters are invalid.
-