AbstractBottomUpParser |
Abstract class which contains code to tag and chunk parses for bottom-up parsing and
leaves the implementation of advancing and completing parses to extending classes.
AbstractContextGenerator |
Abstract class containing many of the methods used to generate contexts for parsing.
AbstractDataIndexer |
Abstract class for collecting event and context counts used in training.
AbstractEventModelSequenceTrainer |
AbstractEventStream<T> |
AbstractEventTrainer |
AbstractModel |
AbstractModel.ModelType |
AbstractModelReader |
AbstractModelWriter |
AbstractObjectStream<T> |
AbstractParserEventStream |
Abstract class extended by parser event streams which perform tagging and chunking.
AbstractSampleStreamFactory<T> |
Base class for sample stream factories.
AbstractSequenceTrainer |
AbstractToSentenceSampleStream<T> |
AbstractTrainer |
AdaptiveFeatureGenerator |
An interface for generating features for named entity identification and for
updating document-level contexts.
ADChunkSampleStream |
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus; output is used for
Portuguese chunker training.
ADChunkSampleStreamFactory |
A factory to create an Arvores Deitadas ChunkStream from the command line.
AdditionalContextFeatureGenerator |
ADNameSampleStream |
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus; output is used for
Portuguese NER training.
ADNameSampleStreamFactory |
A factory to create an Arvores Deitadas NameSampleDataStream from the command line.
ADPOSSampleStream |
Note: Do not use this class, internal use only!
ADPOSSampleStreamFactory |
Note: Do not use this class, internal use only!
ADSentenceSampleStream |
Note: Do not use this class, internal use only!
ADSentenceSampleStreamFactory |
Note: Do not use this class, internal use only!
ADSentenceStream |
Stream filter which merges text lines into sentences, following the Arvores
Deitadas syntax.
ADSentenceStream.Sentence |
ADSentenceStream.SentenceParser |
Parses a sample of AD corpus.
ADTokenSampleStreamFactory |
Note: Do not use this class, internal use only!
AggregateCharSequenceNormalizer |
AggregatedFeatureGenerator |
AggregatedFeatureGeneratorFactory |
AncoraSpanishHeadRules |
Class for storing the Ancora Spanish head rules associated with parsing.
AncoraSpanishHeadRules.HeadRulesSerializer |
AnnotationConfiguration |
AnnotatorNoteAnnotation |
arabicStemmer |
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
ArrayMath |
Utility class for simple vector arithmetic.
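As an illustration of the kind of vector arithmetic such a utility covers, here is a minimal sketch of an inner product and an L2 norm. The class and method names (VectorMathSketch, innerProduct, l2Norm) are assumptions made for this example and are not taken from the OpenNLP API.

```java
// Hypothetical illustration; names are assumptions, not the OpenNLP ArrayMath API.
class VectorMathSketch {

    /** Sum of element-wise products of two equal-length vectors. */
    static double innerProduct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean (L2) norm: square root of the vector's inner product with itself. */
    static double l2Norm(double[] v) {
        return Math.sqrt(innerProduct(v, v));
    }
}
```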
ArrayMath |
Deprecated. |
ArtifactProvider |
Provides access to model persisted artifacts.
ArtifactSerializer<T> |
ArtifactToSerializerMapper |
Deprecated. |
AttachContextGenerator |
AttributeAnnotation |
Attributes |
BagOfWordsFeatureGenerator |
Generates a feature for each word in a document.
BaseLink |
Stores a minimal tuple of information.
BaseModel |
This model is a common base which can be used by the components'
model classes.
BaseToolFactory |
Base class for all tool factories.
BasicContextGenerator |
Generate contexts for maxent decisions, assuming that the input
given to the getContext() method is a String containing contextual
predicates separated by spaces.
BasicFormatParams |
Common format parameters.
BasicTrainingParams |
Common training parameters.
BeamSearch<T> |
Performs k-best search over a sequence.
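The idea of a k-best (beam) search over a sequence can be sketched as follows: at each position every surviving hypothesis is extended with every candidate label, and only the k highest-scoring hypotheses are kept. This is a generic sketch, not the OpenNLP implementation; BeamSearchSketch, Scorer and Hypothesis are hypothetical names, and scores are assumed to be log-probabilities that add along the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of beam search; not the OpenNLP BeamSearch class.
class BeamSearchSketch {

    /** Returns log-probabilities for each label at a position, given earlier labels. */
    interface Scorer {
        double[] logProbs(int position, List<Integer> previousLabels);
    }

    /** A partial or complete label sequence with its accumulated log score. */
    static final class Hypothesis {
        final List<Integer> labels;
        final double score;

        Hypothesis(List<Integer> labels, double score) {
            this.labels = labels;
            this.score = score;
        }
    }

    /** Returns up to beamSize label sequences of the given length, best first. */
    static List<Hypothesis> search(int length, int beamSize, Scorer scorer) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int i = 0; i < length; i++) {
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                double[] lp = scorer.logProbs(i, h.labels);
                for (int label = 0; label < lp.length; label++) {
                    List<Integer> next = new ArrayList<>(h.labels);
                    next.add(label);
                    expanded.add(new Hypothesis(next, h.score + lp[label]));
                }
            }
            // Keep only the beamSize best hypotheses (highest score first).
            expanded.sort((a, b) -> Double.compare(b.score, a.score));
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam;
    }
}
```

A small beam trades exactness for speed: with a scorer that depends on earlier labels, the best full sequence can be pruned early if its prefix scores poorly.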
BeamSearchContextGenerator<T> |
Interface for context generators used with a sequence beam search.
BigramNameFeatureGenerator |
BigramNameFeatureGeneratorFactory |
BilouCodec |
BilouNameFinderSequenceValidator |
BinaryFileDataReader |
BinaryGISModelReader |
A reader for GIS models stored in binary format.
BinaryGISModelWriter |
Model writer that saves models in binary format.
BinaryNaiveBayesModelReader |
BinaryNaiveBayesModelWriter |
Model writer that saves models in binary format.
BinaryPerceptronModelReader |
BinaryPerceptronModelWriter |
Model writer that saves models in binary format.
BinaryQNModelReader |
A reader for quasi-newton models stored in binary format.
BinaryQNModelWriter |
BioCodec |
BioNLP2004NameSampleStream |
Parser for the training files of the BioNLP/NLPBA 2004 shared task.
BioNLP2004NameSampleStreamFactory |
BratAnnotation |
BratAnnotationStream |
Reads the annotations from the brat .ann annotation file.
BratDocument |
BratDocumentParser |
BratDocumentStream |
BratNameSampleStream |
Generates Name Sample objects for a Brat Document object.
BratNameSampleStreamFactory |
BrownBigramFeatureGenerator |
Generates Brown cluster features for token bigrams.
BrownCluster |
Class to load a Brown cluster document with lines of the form word\tword_class\tprob.
The file containing the clustering lexicon has to be passed as the
value of the dict attribute of each BrownCluster feature generator.
BrownCluster.BrownClusterSerializer |
BrownClusterBigramFeatureGeneratorFactory |
Generates Brown clustering features for token bigrams.
BrownClusterTokenClassFeatureGeneratorFactory |
Generates Brown clustering features for token classes.
BrownClusterTokenFeatureGeneratorFactory |
Generates Brown clustering features for current token.
BrownTokenClasses |
Obtain the paths listed in the pathLengths array from the Brown class.
BrownTokenClassFeatureGenerator |
Generates Brown cluster features for current token and token class.
BrownTokenFeatureGenerator |
Generates Brown cluster features for current token.
BuildContextGenerator |
Class to generate predictive contexts for deciding how constituents should be combined together.
BuildContextGenerator |
Creates the features or contexts for the building phase of parsing.
BuildModelUpdaterTool |
ByteArraySerializer |
Cache<K,V> |
Provides a fixed-size, pre-allocated cache with least-recently-used replacement.
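Least-recently-used replacement can be sketched with the JDK alone. The example below builds on java.util.LinkedHashMap in access order; LruCacheSketch is a hypothetical name, and this is not how the OpenNLP Cache class is necessarily implemented (in particular, nothing is pre-allocated here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the OpenNLP Cache class.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the fixed size is exceeded.
        return size() > capacity;
    }
}
```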
CachedFeatureGenerator |
CachedFeatureGeneratorFactory |
catalanStemmer |
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
CensusDictionaryCreatorTool |
This tool helps create a loadable dictionary for the NameFinder,
from data collected from US Census data.
CharacterNgramFeatureGenerator |
CharacterNgramFeatureGeneratorFactory |
CharSequenceNormalizer |
A char sequence normalizer, used to adjust (prune, substitute, add, etc.)
characters in order to remove noise from text.
CheckContextGenerator |
Class for generating predictive context for deciding when a constituent is complete.
CheckContextGenerator |
CheckModelUpdaterTool |
ChunkContextGenerator |
Creates predictive context for the pre-chunking phases of parsing.
Chunker |
The interface for chunkers which provide chunk tags for a sequence of tokens.
ChunkerContextGenerator |
Interface for the context generator used in syntactic chunking.
ChunkerConverterTool |
Tool to convert multiple data formats into native OpenNLP chunker training
ChunkerCrossValidator |
ChunkerCrossValidatorTool |
ChunkerDetailedFMeasureListener |
ChunkerEvaluationMonitor |
ChunkerEvaluator |
ChunkerEvaluatorTool |
ChunkerEventStream |
Class for creating an event stream out of data files for training a chunker.
ChunkerFactory |
ChunkerME |
The class represents a maximum-entropy-based chunker.
ChunkerMETool |
ChunkerModel |
ChunkerModelLoader |
Loads a Chunker Model for the command line tools.
ChunkerModelSerializer |
ChunkerSampleStreamFactory |
ChunkerTrainerTool |
ChunkEvaluationErrorListener |
ChunkSample |
Class for holding chunks for a single unit of text.
ChunkSampleSequenceStream |
ChunkSampleStream |
Parses the conll 2000 shared task shallow parser training data.
ChunkSampleStream |
CollectionObjectStream<E> |
ComparableEvent |
A maxent event representation which we can use to sort based on the
predicates indexes contained in the events.
ComparablePredicate |
A maxent predicate representation which we can use to sort based on the
ConfigurablePOSContextGenerator |
A context generator for the POS Tagger.
Conll02NameSampleStream |
Parser for the Dutch and Spanish NER training files of the CoNLL 2002 shared task.
Conll02NameSampleStream.LANGUAGE |
Conll02NameSampleStreamFactory |
Note: Do not use this class, internal use only!
Conll03NameSampleStream |
An import stream which can parse the CONLL03 data.
Conll03NameSampleStream.LANGUAGE |
Conll03NameSampleStreamFactory |
ConlluLemmaSampleStream |
ConlluLemmaSampleStreamFactory |
Note: Do not use this class, internal use only!
ConlluPOSSampleStream |
ConlluPOSSampleStreamFactory |
Note: Do not use this class, internal use only!
ConlluSentence |
ConlluSentenceSampleStream |
ConlluSentenceSampleStreamFactory |
ConlluStream |
The CoNLL-U Format is specified here:
ConlluTagset |
ConlluTokenSampleStream |
ConlluTokenSampleStreamFactory |
ConlluWordLine |
ConllXPOSSampleStream |
Parses the data from the CONLL 06 shared task into POS Samples.
ConllXPOSSampleStreamFactory |
Note: Do not use this class, internal use only!
ConllXSentenceSampleStreamFactory |
Note: Do not use this class, internal use only!
ConllXTokenSampleStreamFactory |
Note: Do not use this class, internal use only!
Cons |
Class to hold feature information about a specific parse node.
ConstitParseSampleStream |
ConstitParseSampleStreamFactory |
Constituent |
Class used to hold constituents when reading parses.
Context |
Class which associates a real valued parameter or expected value with a particular contextual
predicate or feature.
ContextGenerator<T> |
Generate contexts for maxent decisions.
CrossValidationPartitioner<E> |
Provides access to training and test partitions for n-fold cross validation.
CrossValidationPartitioner.TrainingSampleStream<E> |
The TrainingSampleStream which iterates over
all training elements.
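To make the n-fold split concrete, here is a minimal sketch in which the item at index i belongs to the test partition of fold i % folds and to the training partition of every other fold. The modulo assignment and the names (FoldPartitionSketch, testPartition, trainingPartition) are assumptions for this example, not a description of the OpenNLP class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-fold partitioning; not the OpenNLP API.
class FoldPartitionSketch {

    /** Items held out for evaluation in the given fold. */
    static <E> List<E> testPartition(List<E> items, int folds, int testFold) {
        return select(items, folds, testFold, true);
    }

    /** Items used for training in the given fold (everything not held out). */
    static <E> List<E> trainingPartition(List<E> items, int folds, int testFold) {
        return select(items, folds, testFold, false);
    }

    private static <E> List<E> select(List<E> items, int folds, int testFold, boolean wantTest) {
        if (folds < 2 || testFold < 0 || testFold >= folds) {
            throw new IllegalArgumentException("need folds >= 2 and 0 <= testFold < folds");
        }
        List<E> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // Assumption of this sketch: index modulo fold count picks the test fold.
            boolean inTestFold = i % folds == testFold;
            if (inTestFold == wantTest) {
                out.add(items.get(i));
            }
        }
        return out;
    }
}
```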
CustomFeatureGenerator |
Deprecated. |
CVParams |
Common cross validator parameters.
DataIndexer |
Object which compresses events in memory and performs feature selection.
DataIndexerFactory |
DataReader |
DataStream |
An interface for objects which can deliver a stream of training data to be
supplied to an EventStream.
DefaultChunkerContextGenerator |
Features based on chunking model described in Fei Sha and Fernando Pereira.
DefaultChunkerSequenceValidator |
DefaultEndOfSentenceScanner |
DefaultLanguageDetectorContextGenerator |
A context generator for language detector.
DefaultLemmatizerContextGenerator |
Simple feature generator for learning statistical lemmatizers.
DefaultLemmatizerSequenceValidator |
DefaultNameContextGenerator |
Class for determining contextual features for a tag/chunk style
named-entity recognizer.
DefaultPOSContextGenerator |
A context generator for the POS Tagger.
DefaultPOSSequenceValidator |
DefaultSDContextGenerator |
Generate event contexts for maxent decisions for sentence detection.
DefaultTokenContextGenerator |
Generate events for maxent decisions for tokenization.
DefinitionFeatureGeneratorFactory |
DetailedFMeasureEvaluatorParams |
EvaluatorParams for Chunker.
DetokenEvaluationErrorListener |
DetokenizationDictionary |
DetokenizationDictionary.Operation |
Detokenizer |
A Detokenizer merges tokens back to their untokenized representation.
Detokenizer.DetokenizationOperation |
This enum contains an operation for every token to merge the
tokens together to their detokenized form.
DetokenizerEvaluator |
DetokenizerParameter |
DetokenizerSampleStreamFactory<T> |
Base class for factories which need detokenizer.
DetokenizeSentenceSampleStream |
Dictionary |
This class is a dictionary.
DictionaryBuilderTool |
DictionaryDetokenizer |
A rule based detokenizer.
DictionaryDetokenizerTool |
DictionaryEntryPersistor |
This class is used for reading and writing dictionaries of all kinds.
DictionaryFeatureGenerator |
DictionaryFeatureGeneratorFactory |
DictionaryLemmatizer |
Lemmatizes by simple dictionary lookup in a hash map built from a file
containing, on each line, word\tpostag\tlemma (tab-separated).
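A dictionary lookup of this kind can be sketched as a hash map keyed on the word and its POS tag, filled from lines holding a tab-separated word, POS tag and lemma. DictLemmaSketch is a hypothetical class, and returning "O" for unknown entries is an assumption of this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the OpenNLP DictionaryLemmatizer.
class DictLemmaSketch {
    private final Map<String, String> lemmas = new HashMap<>();

    /** Builds the lookup table from lines of the form word<TAB>postag<TAB>lemma. */
    DictLemmaSketch(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length == 3) { // silently skip malformed lines
                lemmas.put(key(parts[0], parts[1]), parts[2]);
            }
        }
    }

    private static String key(String word, String postag) {
        return word + "\t" + postag;
    }

    /** Returns the lemma, or "O" (an assumption of this sketch) if the pair is unknown. */
    String lemmatize(String word, String postag) {
        return lemmas.getOrDefault(key(word, postag), "O");
    }
}
```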
DictionaryNameFinder |
This is a dictionary based name finder, it scans text
for names inside a dictionary.
DictionarySerializer |
DirectorySampleStream |
The directory sample stream allows for creating a stream
from a directory listing of files.
DoccatConverterTool |
DoccatCrossValidator |
Cross validator for document categorization
DoccatCrossValidatorTool |
DoccatEvaluationErrorListener |
DoccatEvaluationMonitor |
DoccatEvaluatorTool |
DoccatFactory |
The factory that provides Doccat default implementations and resources
DoccatFineGrainedReportListener |
Generates a detailed report for the document categorizer.
DoccatModel |
A model for document categorization
DoccatModelLoader |
Loads a Document Categorizer Model for the command line tools.
DoccatTool |
DoccatTrainerTool |
DocumentBeginFeatureGenerator |
DocumentBeginFeatureGeneratorFactory |
DocumentCategorizer |
Interface for classes which categorize documents.
DocumentCategorizerEvaluator |
DocumentCategorizerEventStream |
Iterator-like class for modeling document classification events.
DocumentCategorizerME |
DocumentNameFinder |
Name finding interface which processes an entire document allowing the name finder to use context
from the entire document.
DocumentSample |
Class which holds a classified document and its category.
DocumentSampleStream |
This class reads in string encoded training samples, parses them and
outputs DocumentSample objects.
DocumentSampleStreamFactory |
DocumentToLineStream |
Reads a plain text file and returns each line as a String object.
DownloadUtil |
This class facilitates the downloading of pretrained OpenNLP models.
DownloadUtil.ModelType |
The type of model.
DynamicEvalParameters |
EmojiCharSequenceNormalizer |
Normalizer for emojis.
EmptyLinePreprocessorStream |
Stream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing
This stream should be used by the components that mark empty lines to mark document boundaries.
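The first three cleanup rules listed above can be sketched over an in-memory list of lines as follows. EmptyLineCleanupSketch is a hypothetical helper, not the OpenNLP stream class, and the TODO item (terminating the last document with an empty line) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the cleanup rules; not the OpenNLP stream class.
class EmptyLineCleanupSketch {

    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Starts as true so that empty lines at the very beginning are skipped.
        boolean lastWasEmpty = true;
        for (String line : lines) {
            // Whitespace-only lines count as empty lines.
            boolean empty = line.trim().isEmpty();
            if (empty) {
                // Emit at most one empty line for a run of empty lines.
                if (!lastWasEmpty) {
                    out.add("");
                }
            } else {
                out.add(line);
            }
            lastWasEmpty = empty;
        }
        return out;
    }
}
```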
EncodingParameter |
Encoding parameter.
EndOfSentenceScanner |
Scans Strings, StringBuffers, and char[] arrays for the offsets of
sentence ending characters.
EntityLinker<T extends Span> |
EntityLinkers establish connections to external data to enrich extracted
EntityLinkerFactory |
Generates an EntityLinker implementation via properties file configuration
EntityLinkerProperties |
Properties wrapper for the EntityLinker framework
EntityLinkerTool |
Entry |
EntryInserter |
EvalitaNameSampleStream |
Parser for the Italian NER training files of the Evalita 2007 and 2009 NER shared tasks.
EvalitaNameSampleStream.LANGUAGE |
EvalitaNameSampleStreamFactory |
Note: Do not use this class, internal use only!
EvalParameters |
This class encapsulates the variables used in producing probabilities from a model
and facilitates passing these variables to the eval method.
EvaluationMonitor<T> |
Evaluator<T> |
The Evaluator is an abstract base class for evaluators.
EvaluatorParams |
Common evaluation parameters.
Event |
The context of a decision point during training.
EventAnnotation |
EventModelSequenceTrainer |
EventTraceStream |
EventTrainer |
Experimental |
Indicates that the API is not stable.
ExtensionLoader |
The ExtensionLoader is responsible to load extensions to the OpenNLP library.
ExtensionNotLoadedException |
Exception indicates that an OpenNLP extension could not be loaded.
ExtensionServiceKeys |
Factory |
Factory |
FeatureGenerator |
Interface for generating features for document categorization.
FeatureGeneratorResourceProvider |
FeatureGeneratorUtil |
This class provide common utilities for feature generation.
FileEventStream |
Class for using a file of events as an event stream.
FileToByteArraySampleStream |
FileToStringSampleStream |
Provides the ability to read the contents of files
contained in an object stream of files.
FilterObjectStream<S,T> |
FineGrainedEvaluatorParams |
Common evaluation parameters.
FMeasure |
The FMeasure is a utility class for evaluators
which measure precision, recall and the resulting f-measure.
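For reference, precision is the share of predicted items that are correct, recall is the share of reference items that were found, and the (balanced) f-measure is their harmonic mean. The sketch below accumulates counts and derives the three values; FMeasureSketch and its method names are hypothetical and do not mirror the OpenNLP API.

```java
// Hypothetical illustration; names are assumptions, not the OpenNLP FMeasure API.
class FMeasureSketch {
    private long truePositives;
    private long predicted;
    private long reference;

    /** Adds the counts of one evaluated sample to the running totals. */
    void update(long truePositives, long predicted, long reference) {
        this.truePositives += truePositives;
        this.predicted += predicted;
        this.reference += reference;
    }

    /** Correct predictions divided by all predictions (0 if nothing was predicted). */
    double precision() {
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    /** Correct predictions divided by all reference items (0 if there are none). */
    double recall() {
        return reference == 0 ? 0.0 : (double) truePositives / reference;
    }

    /** Harmonic mean of precision and recall (0 if both are 0). */
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```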
Function |
Interface for a function
GapLabeler |
Interface for labeling nodes which contain traces so that these traces can be predicted
by the parser.
GeneratorFactory |
Creates a set of feature generators based on a provided XML descriptor.
GeneratorFactory.AbstractXmlFeatureGeneratorFactory |
GenericModelReader |
GenericModelSerializer |
GenericModelWriter |
GISModel |
A maximum entropy model which has been trained using the Generalized
Iterative Scaling procedure (implemented in
GISModelReader |
Abstract parent class for readers of GISModels.
GISModelWriter |
Abstract parent class for GISModel writers.
GISTrainer |
An implementation of Generalized Iterative Scaling.
Glove |
Warning: Experimental new feature, see OPENNLP-1144 for details, the API might be changed anytime.
greekStemmer |
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
HashSumEventStream |
HeadRules |
Interface for encoding the head rules associated with parsing.
HeadRules |
Class for storing the English head rules associated with parsing.
HeadRules.HeadRulesSerializer |
Index |
indonesianStemmer |
This class implements the stemming algorithm defined by a snowball script.
InputStreamFactory |
Allows repeated reads through a stream for certain types of model building.
InSpanGenerator |
Generates features if the tokens are recognized by the provided
TokenNameFinder.
InsufficientTrainingDataException |
This exception indicates that the provided training data is
insufficient to train the desired model.
InvalidFormatException |
This exception indicates that a resource violates the expected data format.
IrishSentenceBankDocument |
A structure to hold an Irish Sentence Bank document, which is a collection
of tokenized sentences.
IrishSentenceBankDocument.IrishSentenceBankFlex |
IrishSentenceBankDocument.IrishSentenceBankSentence |
IrishSentenceBankSentenceStreamFactory |
IrishSentenceBankTokenSampleStreamFactory |
irishStemmer |
This class was automatically generated by a Snowball to Java compiler.
It implements the stemming algorithm defined by a snowball script.
Language |
Class for holding the document language and its confidence
LanguageDetector |
The interface for language detectors, which provide the Language according to the context.
LanguageDetectorConfig |
LanguageDetectorContextGenerator |
A context generator interface for language detector.
LanguageDetectorConverterTool |
LanguageDetectorCrossValidator |
Cross validator for language detector
LanguageDetectorCrossValidatorTool |
LanguageDetectorEvaluationErrorListener |
LanguageDetectorEvaluationMonitor |
LanguageDetectorEvaluator |
LanguageDetectorEvaluatorTool |
LanguageDetectorEventStream |
Iterator-like class for modeling language detector events.
LanguageDetectorFactory |
Default factory used by Language Detector.
LanguageDetectorFineGrainedReportListener |
Generates a detailed report for the language detector.
LanguageDetectorME |
Implements learnable Language Detector
LanguageDetectorModel |
A model for language detection
LanguageDetectorModelLoader |
Loads a Language Detector Model for the command line tools.
LanguageDetectorSampleStream |
This class reads in string encoded training samples, parses them and
outputs LanguageSample objects.
LanguageDetectorSampleStreamFactory |
LanguageDetectorTool |
LanguageDetectorTrainerTool |
LanguageModel |
A language model can calculate the probability p (between 0 and 1) of a
certain sequence of tokens, given its underlying vocabulary.
LanguageParams |
LanguageSample |
Class which holds a classified document and its Language.
LanguageSampleStreamFactory<T> |
Stream factory for those streams which carry language.
LeipzigLanguageSampleStream |
LeipzigLanguageSampleStreamFactory |
Note: Do not use this class, internal use only!
LemmaEvaluationErrorListener |
LemmaSample |
Represents a lemmatized sentence.
LemmaSampleEventStream |
Class for creating an event stream out of data files for training a probabilistic lemmatizer.
LemmaSampleSequenceStream |
LemmaSampleStream |
Reads data for training and testing the lemmatizer.
Lemmatizer |
The interface for lemmatizers.
LemmatizerContextGenerator |
Interface for the context generator used for probabilistic lemmatizer.
LemmatizerEvaluationMonitor |
Interface for the lemmatizer evaluator.
LemmatizerEvaluator |
LemmatizerEvaluatorTool |
LemmatizerFactory |
LemmatizerFineGrainedReportListener |
Generates a detailed report for the Lemmatizer.
LemmatizerME |
A probabilistic lemmatizer.
LemmatizerMETool |
LemmatizerModel |
LemmatizerModelLoader |
Loads a Lemmatizer Model for the command line tools.
LemmatizerSampleStreamFactory |
LemmatizerTrainerTool |
LetsmtDocument |
A structure to hold the letsmt document.
LetsmtDocument.LetsmtDocumentHandler |
LetsmtDocument.LetsmtSentence |
LetsmtSentenceStreamFactory |
LineSearch |
Class that performs line search to find minimum
LineSearch.LineSearchResult |
Class to store lineSearch result
LinkedSpan<T extends BaseLink> |
A "default" extended span that holds additional information about the Span
LogProbabilities<T> |
Class implementing the probability distribution over labels returned by
a classifier as a log of probabilities.
LogProbability<T> |
Class implementing the probability for a label.
MarkableFileInputStreamFactory |
A factory that creates MarkableFileInputStream from a File
MascDocument |
MascDocumentStream |
MascNamedEntityParser |
A class to process the MASC Named entity stand-off annotation file
MascNamedEntitySampleStream |
MascNamedEntitySampleStreamFactory |
MascPennTagParser |
A class for parsing MASC's Penn tagging/tokenization stand-off annotation
MascPOSSampleStream |
MascPOSSampleStreamFactory |
MascSentence |
MascSentenceSampleStream |
MascSentenceSampleStreamFactory |
MascToken |
MascTokenSampleStream |
MascTokenSampleStreamFactory |
MascWord |
MaxentModel |
Interface for maximum entropy models.
Mean |
Calculates the arithmetic mean of values
added with the Mean.add(double) method.
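A running arithmetic mean only needs the sum and the number of values seen so far. The sketch below shows that idea; MeanSketch is a hypothetical class, and returning 0 when no values were added is an assumption of this example.

```java
// Hypothetical illustration of a running arithmetic mean; not the OpenNLP Mean class.
class MeanSketch {
    private double sum;
    private long count;

    /** Adds one value to the running total. */
    void add(double value) {
        sum += value;
        count++;
    }

    /** Number of values added so far. */
    long count() {
        return count;
    }

    /** Arithmetic mean of the added values, or 0 if none were added (sketch assumption). */
    double mean() {
        return count == 0 ? 0.0 : sum / count;
    }
}
```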
ModelParameterChunker |
A helper class that handles Strings with more than 64k (65535 bytes) in length.
ModelType |
ModelUtil |
MosesSentenceSampleStream |
MosesSentenceSampleStreamFactory |
Muc6NameSampleStreamFactory |
MucNameContentHandler |
MucNameSampleStream |
MutableContext |
Class used to store parameters or expected values associated with this context which
can be updated or assigned.
MutableInt |
This is a non-thread-safe mutable int.
MutableTagDictionary |
Interface that allows TagDictionary entries to be added and removed.
NaiveBayesEvalParameters |
Parameters for the evaluation of a Naive Bayes classifier
NaiveBayesModel |
Class implementing the multinomial Naive Bayes classifier model.
NaiveBayesModelReader |
Abstract parent class for readers of NaiveBayes.
NaiveBayesModelWriter |
Abstract parent class for NaiveBayes writers.
NaiveBayesTrainer |
Trains models using the combination of the EM algorithm and the Naive Bayes classifier
described in "Text Classification from Labeled and Unlabeled Documents using EM"
by Nigam, McCallum, et al. (2000).
NameContextGenerator |
Interface for generating the context for a name finder by specifying a set of feature generators.
NameEvaluationErrorListener |
NameFinderCensus90NameStream |
This class helps to read the US Census data from the files to build a
StringList for each dictionary entry in the name-finder dictionary.
NameFinderEventStream |
Class for creating an event stream out of data files for training a name
NameFinderME |
Class for creating a maximum-entropy-based name finder.
NameFinderSequenceValidator |
NameSample |
Class for holding names for a single unit of text.
NameSampleCountersStream |
Counts tokens, sentences and names by type
NameSampleDataStream |
NameSampleDataStreamFactory |
NameSampleDataStreamFactory.Parameters |
NameSampleSequenceStream |
NameSampleTypeFilter |
A stream which removes Name Samples which do not have a certain type.
NameToSentenceSampleStream |
Note: Do not use this class, internal use only!
NameToSentenceSampleStreamFactory |
Note: Do not use this class, internal use only!
NameToTokenSampleStream |
Note: Do not use this class, internal use only!
NameToTokenSampleStreamFactory |
Note: Do not use this class, internal use only!
NegLogLikelihood |
Evaluate negative log-likelihood and its gradient from DataIndexer.
NewlineSentenceDetector |
The Newline Sentence Detector assumes that sentences are line delimited and
recognizes one sentence per non-empty line.
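The line-delimited assumption makes the detection step very small, as the following sketch shows. NewlineSplitSketch is a hypothetical helper that returns the sentence strings; treating whitespace-only lines as empty is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the OpenNLP NewlineSentenceDetector.
class NewlineSplitSketch {

    /** Returns one sentence per non-empty line of the input text. */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...).
        for (String line : text.split("\\R")) {
            // Assumption of this sketch: whitespace-only lines count as empty.
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }
}
```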
NGramCharModel |
NGramFeatureGenerator |
Generates ngram features for a document.
NGramGenerator |
Generates an nGram, with optional separator, and returns the grams as a list
of strings
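Generating n-grams with a separator amounts to sliding a window of n tokens over the input and joining each window. The sketch below illustrates this; NGramSketch and generate are hypothetical names and the signature is an assumption, not the OpenNLP method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the OpenNLP NGramGenerator API.
class NGramSketch {

    /** Joins every window of n consecutive tokens with the separator. */
    static List<String> generate(List<String> tokens, int n, String separator) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // The last window starts at index size - n; shorter inputs yield no grams.
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return grams;
    }
}
```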
NGramLanguageModel |
NGramLanguageModelTool |
NGramModel |
The NGramModel can be used to create ngrams and character ngrams.
NGramUtils |
Utility class for ngrams.
NKJPSegmentationDocument |
NKJPSegmentationDocument.Pointer |
NKJPSentenceSampleStream |
NKJPSentenceSampleStreamFactory |
NKJPTextDocument |
NumberCharSequenceNormalizer |
Normalizer for numbers
ObjectDataReader |
ObjectStream<T> |
Reads Objects from a stream.
ObjectStreamUtils |
OnePassDataIndexer |
An indexer for maxent model data which handles cutoffs for uncommon
contextual predicates and provides a unique integer index for each of the
OnePassRealValueDataIndexer |
An indexer for maxent model data which handles cutoffs for uncommon
contextual predicates and provides a unique integer index for each of the
predicates and maintains event values.
OntoNotesFormatParameters |
OntoNotesNameSampleStream |
Name Sample Stream parser for the OntoNotes 4.0 corpus.
OntoNotesNameSampleStreamFactory |
OntoNotesParseSampleStream |
OntoNotesParseSampleStreamFactory |
OntoNotesPOSSampleStreamFactory |
OSGiExtensionLoader |
OSGi bundle activator which can use an OSGi service as
an OpenNLP extension.
OutcomePriorFeatureGenerator |
The definition feature maps the underlying distribution of outcomes.
ParagraphStream |
Stream filter which merges text lines into paragraphs.
ParallelNegLogLikelihood |
Evaluate negative log-likelihood and its gradient in parallel
Parse |
Data structure for holding parse constituents.
Parser |
Class for a shift reduce style parser based on Adwait Ratnaparkhi's 1998 thesis.
Parser |
Interface for full-syntactic parsers.
Parser |
Build/attach parser.
ParserChunkerFactory |
ParserChunkerSequenceValidator |
ParserConverterTool |
ParserCrossValidator |
ParserEvaluationMonitor |
ParserEvaluator |
Class for ParserEvaluator.
ParserEvaluatorTool |
ParserEventStream |
Wrapper class for one of four parser event streams.
ParserEventStream |
ParserEventTypeEnum |
Enumerated type of event types for the parser.
ParserFactory |
ParserModel |
This is an abstract base class for ParserModel implementations.
ParserModelLoader |
Loads a Parser Model for the command line tools.
ParserTool |
ParserTrainerTool |
ParserType |
ParseSampleStream |
ParseSampleStreamFactory |
ParseSampleStreamFactory.Parameters |
ParseToPOSSampleStream |
Note: Do not use this class, internal use only!
ParseToPOSSampleStreamFactory |
Note: Do not use this class, internal use only!
ParseToSentenceSampleStreamFactory |
Note: Do not use this class, internal use only!
ParseToTokenSampleStreamFactory |
Note: Do not use this class, internal use only!
PerceptronModel |
PerceptronModelReader |
Abstract parent class for readers of Perceptron.
PerceptronModelWriter |
Abstract parent class for Perceptron writers.
PerceptronTrainer |
Trains models using the perceptron algorithm.
PlainTextByLineStream |
Reads a plain text file and returns each line as a String object.
PlainTextFileDataReader |
PlainTextNaiveBayesModelReader |
PlainTextNaiveBayesModelWriter |
Model writer that saves models in plain text format.
PorterStemmer |
Stemmer, implementing the Porter Stemming Algorithm.
The Stemmer class transforms a word into its root form.
PortugueseContractionUtility |
Utility class to handle Portuguese contractions.
POSContextGenerator |
The interface for a context generator for the POS Tagger.
POSDictionary |
Provides a means of determining which tags are valid for a particular word
based on a tag dictionary read from a file.
POSEvaluationErrorListener |
POSEvaluator |
POSModel |
POSModelLoader |
Loads a POS Tagger Model for the command line tools.
POSModelSerializer |
POSSample |
Represents a POS-tagged sentence.
POSSampleEventStream |
POSSampleSequenceStream |
PosSampleStream |
POSTagger |
The interface for part of speech taggers.
POSTaggerConverterTool |
POSTaggerCrossValidator |
POSTaggerCrossValidatorTool |
POSTaggerEvaluationMonitor |
POSTaggerEvaluatorTool |
POSTaggerFactory |
The factory that provides POS Tagger default implementations and resources
POSTaggerFactory.POSDictionarySerializer |
PosTaggerFeatureGenerator |
PosTaggerFeatureGeneratorFactory |
POSTaggerFineGrainedReportListener |
Generates a detailed report for the POS Tagger.
POSTaggerME |
A part-of-speech tagger that uses maximum entropy.
POSTaggerNameFeatureGenerator |
Adds the token POS Tag as feature.
POSTaggerNameFeatureGeneratorFactory |
POSTaggerTool |
POSTaggerTrainerTool |
POSToSentenceSampleStream |
Note: Do not use this class, internal use only!
POSToSentenceSampleStreamFactory |
Note: Do not use this class, internal use only!
POSToTokenSampleStream |
Note: Do not use this class, internal use only!
POSToTokenSampleStreamFactory |
Note: Do not use this class, internal use only!
PrefixFeatureGenerator |
PrefixFeatureGeneratorFactory |
PreviousMapFeatureGenerator |
PreviousMapFeatureGeneratorFactory |
PreviousTwoMapFeatureGenerator |
Prior |
This interface allows one to implement a prior distribution for use in
maximum entropy model training.
Probabilities<T> |
Class implementing the probability distribution over labels returned by a classifier.
Probability<T> |
Class implementing the probability for a label.
ProbingLanguageDetectionResult |
QNMinimizer |
Implementation of L-BFGS which supports L1-, L2-regularization
and Elastic Net for solving convex optimization problems.
QNMinimizer.Evaluator |
Evaluate quality of training parameters.
QNMinimizer.L2RegFunction |
L2-regularized objective function
QNModel |
QNModelReader |
QNModelWriter |
QNTrainer |
Maxent model trainer using L-BFGS algorithm.
RealBasicEventStream |
RealValueFileEventStream |
RegexNameFinder |
Name finder based on a series of regular expressions.
RegexNameFinderFactory |
Returns a RegexNameFinder based on a selection of
defaults or a configuration and a selection of defaults
RegexNameFinderFactory.RegexAble |
RelationAnnotation |
ResetableIterator<E> |
This interface makes an Iterator resettable.
ReverseListIterator<T> |
An iterator for a list which returns values in the opposite order to the typical list iterator.
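Reverse iteration over a list can be sketched by walking an index from the last element down to the first. ReverseIteratorSketch is a hypothetical class written for this example, not the OpenNLP implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration; not the OpenNLP ReverseListIterator.
class ReverseIteratorSketch<T> implements Iterator<T> {
    private final List<T> list;
    private int index;

    ReverseIteratorSketch(List<T> list) {
        this.list = list;
        this.index = list.size() - 1; // start at the last element
    }

    @Override
    public boolean hasNext() {
        return index >= 0;
    }

    @Override
    public T next() {
        if (index < 0) {
            throw new NoSuchElementException();
        }
        return list.get(index--);
    }
}
```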
SDContextGenerator |
SDCrossValidator |
A cross validator for the sentence detector.
SDEventStream |
SegmenterObjectStream<S,T> |
SentenceContextGenerator |
Creates contexts/features for end-of-sentence detection in Thai text.
SentenceDetector |
The interface for sentence detectors, which find the sentence boundaries in
a text.
SentenceDetectorConverterTool |
SentenceDetectorCrossValidatorTool |
SentenceDetectorEvaluationMonitor |
SentenceDetectorEvaluator |
SentenceDetectorEvaluatorTool |
SentenceDetectorFactory |
The factory that provides SentenceDetector default implementations and
SentenceDetectorME |
A sentence detector for splitting up raw text into sentences.
SentenceDetectorTool |
A sentence detector which uses a maxent model to predict the sentences.
SentenceDetectorTrainerTool |
SentenceEvaluationErrorListener |
SentenceFeatureGenerator |
This feature generator creates sentence begin and end features.
SentenceFeatureGeneratorFactory |
SentenceModel |
SentenceSample |
A SentenceSample contains a document with
begin indexes of the individual sentences.
SentenceSampleStream |
This class is a stream filter which reads sentence-per-line samples from
a Reader and converts them into SentenceSample objects.
SentenceSampleStreamFactory |
Sequence<T> |
Class which models a sequence.
Sequence |
Represents a weighted sequence of outcomes.
SequenceClassificationModel<T> |
A classification model that can label an input sequence.
SequenceCodec<T> |
SequenceStream |
Interface for streams of sequences used to train sequence models.
SequenceStreamEventStream |
Class which turns a sequence stream into an event stream.
SequenceTrainer |
SequenceValidator<T> |
SerializableArtifact |
SgmlParser |
SAX style SGML parser.
SgmlParser.ContentHandler |
ShrinkCharSequenceNormalizer |
Normalizer to shrink repeated spaces / chars
SimplePerceptronSequenceTrainer |
Trains models for sequences using the perceptron algorithm.
SimpleTokenizer |
Performs tokenization using character classes.
SimpleTokenizerTool |
SnowballStemmer |
SnowballStemmer.ALGORITHM |
Span |
Class for storing start and end integer offsets.
SpanAnnotation |
Stemmer |
A stemmer reduces a word to its stem.
StringList |
StringPattern |
Recognizes predefined patterns in strings.
StringUtil |
SuffixFeatureGenerator |
SuffixFeatureGeneratorFactory |
TagDictionary |
Interface to determine which tags are valid for a particular word
based on a tag dictionary.
TaggerModelReplacerTool |
TokenClassFeatureGenerator |
Generates features for the class of the token.
TokenClassFeatureGeneratorFactory |
TokenContextGenerator |
TokenEvaluationErrorListener |
TokenFeatureGenerator |
Generates a feature which contains the token itself.
TokenFeatureGeneratorFactory |
Tokenizer |
The interface for tokenizers, which segment a string into its tokens.
TokenizerConverterTool |
TokenizerCrossValidator |
TokenizerCrossValidatorTool |
TokenizerEvaluationMonitor |
TokenizerEvaluator |
TokenizerFactory |
The factory that provides Tokenizer default implementations and resources.
TokenizerME |
A Tokenizer for converting raw text into separated tokens.
TokenizerMEEvaluatorTool |
TokenizerMETool |
TokenizerModel |
TokenizerModelLoader |
Loads a Tokenizer Model for the command line tools.
TokenizerStream |
TokenizerTrainerTool |
TokenNameFinder |
The interface for name finders which provide name tags for a sequence of tokens.
TokenNameFinderConverterTool |
Tool to convert multiple data formats into native OpenNLP name finder training format.
TokenNameFinderCrossValidator |
TokenNameFinderCrossValidatorTool |
TokenNameFinderDetailedFMeasureListener |
TokenNameFinderEvaluationMonitor |
TokenNameFinderEvaluator |
TokenNameFinderEvaluatorTool |
TokenNameFinderFactory |
TokenNameFinderFineGrainedReportListener |
Generates a detailed report for the NameFinder.
TokenNameFinderModel |
TokenNameFinderModel.FeatureGeneratorCreationError |
TokenNameFinderModelLoader |
Loads a Token Name Finder Model for the command line tools.
TokenNameFinderTool |
TokenNameFinderTrainerTool |
TokenPatternFeatureGenerator |
Partitions tokens into sub-tokens based on character classes and generates
class features for each of the sub-tokens and combinations of those sub-tokens.
TokenPatternFeatureGeneratorFactory |
TokenSample |
TokenSampleStream |
Class which produces an Iterator<TokenSample> from a file of space-delimited tokens.
TokenSampleStream |
This class is a stream filter which reads in string encoded samples and creates
TokenSamples out of them.
TokenSampleStreamFactory |
TokenTag |
TokSpanEventStream |
TrainerFactory |
TrainerFactory.TrainerType |
TrainingParameters |
TrainingToolParams |
Common training parameters.
TrigramNameFeatureGenerator |
Adds trigram features based on tokens and token classes.
TrigramNameFeatureGeneratorFactory |
TwentyNewsgroupSampleStream |
TwentyNewsgroupSampleStreamFactory |
TwitterCharSequenceNormalizer |
Normalizer for Twitter character sequences
TwoPassDataIndexer |
Collects event and context counts by making two passes over the events.
UncloseableInputStream |
UniformPrior |
Provides a maximum entropy model with a uniform prior.
UrlCharSequenceNormalizer |
Normalizer that removes URLs and email addresses.
Version |
The Version class represents the OpenNLP Tools library version.
WhitespaceTokenizer |
This tokenizer uses white spaces to tokenize the input text.
WhitespaceTokenStream |
This stream formats TokenSamples into whitespace-separated
token strings.
WindowFeatureGenerator |
WindowFeatureGeneratorFactory |
WordClusterDictionary |
WordClusterDictionary.WordClusterDictionarySerializer |
WordClusterFeatureGenerator |
WordClusterFeatureGeneratorFactory |
Defines a word cluster generator factory; it reads an element containing
'w2vwordcluster' as a tag name; these clusters are typically produced by
word2vec or Clark POS induction systems.
WordpieceTokenizer |
A WordPiece tokenizer.
WordTagSampleStream |
A stream filter which reads one sentence per line, containing
words and tags in word_tag format, and outputs POSSample objects.
WordTagSampleStreamFactory |
Note: Do not use this class, internal use only!
WordTagSampleStreamFactory.Parameters |
WordVector |
A word vector.
WordVectorTable |
A table that maps tokens to word vectors.
WordVectorType |
XmlUtil |