All Classes and Interfaces (Apache OpenNLP Tools 2.3.2 API)

Class

Description

AbstractBottomUpParser

Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.

AbstractContextGenerator

Abstract class containing many of the methods used to generate contexts for parsing.

AbstractDataIndexer

Abstract DataIndexer implementation for collecting event and context counts used in training.

AbstractEventModelSequenceTrainer

A basic EventModelSequenceTrainer implementation that processes events.

AbstractEventStream<T>

A base ObjectStream implementation for events.

AbstractEventTrainer

A basic EventTrainer implementation.

AbstractMLModelWriter

AbstractModel

A basic MaxentModel implementation.

AbstractModel.ModelType

AbstractModelReader

An abstract, basic implementation of a model reader.

AbstractModelWriter

An abstract, basic implementation of a model writer.

AbstractObjectStream<T>

A base ObjectStream implementation.

AbstractParserEventStream

Abstract class extended by parser event streams which perform tagging and chunking.

AbstractSampleStreamFactory<T,P>

Base class for sample stream factories.

AbstractToSentenceSampleStream<T>

AbstractTrainer

AdaptiveFeatureGenerator

An interface for generating features for name entity identification and for updating document level contexts.

ADChunkSampleStream

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, with output for Portuguese chunker training.

ADChunkSampleStreamFactory

A factory to create an Árvores Deitadas ChunkStream from the command line utility.

AdditionalContextFeatureGenerator

The AdditionalContextFeatureGenerator generates the context from the passed in additional context.

ADNameSampleStream

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, with output for Portuguese NER training.

ADNameSampleStreamFactory

A factory to create an Árvores Deitadas NameSampleDataStream from the command line utility.

ADPOSSampleStream

Note: Do not use this class, internal use only!

ADPOSSampleStreamFactory

Note: Do not use this class, internal use only!

ADSentenceSampleStream

Note: Do not use this class, internal use only!

ADSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

ADSentenceStream

Stream filter which merges text lines into sentences, following the Árvores Deitadas syntax.

ADSentenceStream.Sentence

ADSentenceStream.SentenceParser

Parses a sample of AD corpus.

ADSentenceStream.SentenceParser.Leaf

Represents the AD leaf

ADSentenceStream.SentenceParser.Node

Represents the AD node

ADSentenceStream.SentenceParser.TreeElement

Represents a tree element, Node or Leaf

ADTokenSampleStreamFactory

Note: Do not use this class, internal use only!

AggregateCharSequenceNormalizer

A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.

AggregatedFeatureGenerator

The AggregatedFeatureGenerator aggregates a set of AdaptiveFeatureGenerators and calls them to generate the features.

AggregatedFeatureGeneratorFactory

Among

AncoraSpanishHeadRules

Class for storing the Ancora Spanish head rules associated with parsing.

AncoraSpanishHeadRules.HeadRulesSerializer

AnnotationConfiguration

AnnotatorNoteAnnotation

arabicStemmer

This class implements the stemming algorithm defined by a snowball script.

ArrayMath

Utility class for simple vector arithmetic.

ArtifactProvider

Provides access to model persisted artifacts.

ArtifactSerializer<T>

Responsible for creating an artifact from an InputStream.

AttachContextGenerator

Generates predictive contexts for deciding how constituents should be attached.

AttributeAnnotation

Attributes

The Attributes class stores name value pairs.

BagOfWordsFeatureGenerator

Generates a feature for each word in a document.

BaseLink

Represents a minimal tuple of information.

BaseModel

This is a common base model which can be used by the components' specific model classes.

BaseToolFactory

Base class for all tool factories.

BasicContextGenerator

A ContextGenerator implementation for maxent decisions, assuming that the input given to the BasicContextGenerator.getContext(String) method is a String containing contextual predicates separated by spaces, for instance:

BasicFormatParams

Common format parameters.

BasicTrainingParams

Common training parameters.

BeamSearch<T>

Performs k-best search over a sequence.

BeamSearchContextGenerator<T>

Interface for context generators used with a sequence beam search.

BigramNameFeatureGenerator

BigramNameFeatureGeneratorFactory

BilouCodec

The default SequenceCodec implementation according to the BILOU scheme.

BilouNameFinderSequenceValidator

A SequenceValidator implementation for the BilouCodec.

BinaryFileDataReader

A DataReader that reads files from a binary format.

BinaryGISModelReader

A GISModelReader that reads models from a binary format.

BinaryGISModelWriter

A GISModelWriter that writes models in a binary format.

BinaryNaiveBayesModelReader

A NaiveBayesModelReader that reads models from a binary format.

BinaryNaiveBayesModelWriter

A NaiveBayesModelWriter that writes models in a binary format.

BinaryPerceptronModelReader

A PerceptronModelReader that reads models from a binary format.

BinaryPerceptronModelWriter

A PerceptronModelWriter that writes models in a binary format.

BinaryQNModelReader

A QNModelReader that reads models from a binary format.

BinaryQNModelWriter

A QNModelWriter that writes models in a binary format.

BioCodec

The default SequenceCodec implementation according to the BIO scheme:
- B: 'beginning' of a NE
- I: 'inside', the word is inside a NE
- O: 'outside', the word is a regular word outside a NE

See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
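
As an illustration of the BIO scheme only, the following minimal sketch decodes a tag sequence into entity spans. The class `BioDecodeSketch`, the `B-type` / `I-type` / `O` tag spelling, and the `type[start,end)` output format are assumptions made for this example; they are not BioCodec's API and not necessarily the outcome strings OpenNLP uses internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: decodes BIO tags into spans.
class BioDecodeSketch {

    // Returns spans formatted as "type[start,end)" with an exclusive end index.
    static List<String> decode(String[] tags) {
        List<String> spans = new ArrayList<>();
        int start = -1;      // start index of the open entity, or -1 if none
        String type = null;  // type of the open entity
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            // An open entity continues only on a matching "I-" tag.
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (!continues && start >= 0) {
                // The open entity ends before this token.
                spans.add(type + "[" + start + "," + i + ")");
                start = -1;
            }
            if (tag.startsWith("B-")) {
                // A "B-" tag always opens a new entity.
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            // Close an entity that runs to the end of the sequence.
            spans.add(type + "[" + start + "," + tags.length + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tags = {"B-person", "I-person", "O", "B-location"};
        System.out.println(decode(tags)); // prints [person[0,2), location[3,4)]
    }
}
```

In this sketch, an `I-` tag with no open entity is ignored; a real validator might instead reject such a sequence.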

BioNLP2004NameSampleStream

A sample stream for the training files of the BioNLP/NLPBA 2004 shared task.

BioNLP2004NameSampleStreamFactory

BratAnnotation

BratAnnotationStream

Reads the annotations from the brat .ann annotation file.

BratDocument

Brat (brat rapid annotation tool) is based on the stav visualiser which was originally made in order to visualise BioNLP'11 Shared Task data.

BratDocumentParser

BratDocumentStream

BratNameSampleStream

Generates Name Sample objects for a Brat Document object.

BratNameSampleStreamFactory

BrownBigramFeatureGenerator

Generates Brown cluster features for token bigrams.

BrownCluster

Class to load a Brown cluster document: word\tword_class\tprob

BrownCluster.BrownClusterSerializer

BrownClusterBigramFeatureGeneratorFactory

Generates Brown clustering features for token bigrams.

BrownClusterTokenClassFeatureGeneratorFactory

Generates Brown clustering features for token classes.

BrownClusterTokenFeatureGeneratorFactory

Generates Brown clustering features for current token.

BrownTokenClasses

Obtain the paths listed in the pathLengths array from the Brown class.

BrownTokenClassFeatureGenerator

Generates BrownCluster features for current token and token class.

BrownTokenFeatureGenerator

Generates BrownCluster features for current token.

BuildContextGenerator

Generates predictive contexts for deciding how constituents should be combined.

BuildContextGenerator

Creates the features or contexts for the building phase of parsing.

BuildModelUpdaterTool

ByteArraySerializer

An ArtifactSerializer implementation for binary data, kept in byte[].

Cache<K,V>

Provides fixed size, pre-allocated, least recently used replacement cache.
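
For readers unfamiliar with least-recently-used replacement, here is a minimal sketch of the policy built on the JDK's `LinkedHashMap`. `LruCacheSketch` is a hypothetical name; this is not how OpenNLP's Cache is implemented, and unlike the class described above it does not pre-allocate its entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of LRU replacement; not OpenNLP's Cache class.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evicts the least recently used entry when over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```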

CachedFeatureGenerator

Caches features of the aggregated generators.

CachedFeatureGeneratorFactory

catalanStemmer

This class implements the stemming algorithm defined by a snowball script.

CensusDictionaryCreatorTool

This tool helps create a loadable dictionary for the NameFinder from US Census data.

CharacterNgramFeatureGenerator

The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.

CharacterNgramFeatureGeneratorFactory

CharSequenceNormalizer

A char sequence normalizer, used to adjust (prune, substitute, add, etc.) characters.

CheckContextGenerator

Generates predictive context for deciding when a constituent is complete.

CheckContextGenerator

Generates predictive context for deciding when a constituent is complete.

CheckModelUpdaterTool

Trains a new check model.

ChunkContextGenerator

Creates predictive context for the pre-chunking phases of parsing.

Chunker

The interface for chunkers which provide chunk tags for a sequence of tokens.

ChunkerContextGenerator

Interface for a BeamSearchContextGenerator used in syntactic chunking.

ChunkerConverterTool

Tool to convert multiple data formats into native OpenNLP chunker training format.

ChunkerCrossValidator

Cross validator for Chunker.

ChunkerCrossValidatorTool

ChunkerDetailedFMeasureListener

ChunkerEvaluationMonitor

A marker interface for evaluating chunkers.

ChunkerEvaluator

The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.

ChunkerEvaluatorTool

A default ChunkSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

ChunkerEventStream

Class for creating an event stream out of data files for training a Chunker.

ChunkerFactory

ChunkerME

The class represents a maximum-entropy-based Chunker.

ChunkerMETool

ChunkerModel

The ChunkerModel is the model used by a learnable Chunker.

ChunkerModelLoader

Loads a ChunkerModel for the command line tools.

ChunkerModelSerializer

An ArtifactSerializer implementation for models.

ChunkerSampleStreamFactory

Factory producing OpenNLP ChunkSampleStreams.

ChunkerTrainerTool

ChunkEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

ChunkSample

Class for holding chunks for a single unit of text.

ChunkSampleSequenceStream

A SequenceStream implementation encapsulating samples.

ChunkSampleStream

Parses the conll 2000 shared task shallow parser training data.

ChunkSampleStream

CollectionObjectStream<E>

An ObjectStream implementation that works on a Collection of E as the source for elements.

ComparableEvent

A maxent event representation which we can use to sort based on the predicate indexes contained in the events.

ComparablePredicate

A maxent predicate representation which we can use to sort based on the outcomes.

ConfigurablePOSContextGenerator

A configurable context generator for a POSTagger.

Conll02NameSampleStream

Parser for the Dutch and Spanish NER training files of the CONLL 2002 shared task.

Conll02NameSampleStream.LANGUAGE

Conll02NameSampleStreamFactory

Note: Do not use this class, internal use only!

Conll03NameSampleStream

An import stream which can parse the CONLL03 data.

Conll03NameSampleStream.LANGUAGE

Conll03NameSampleStreamFactory

ConlluLemmaSampleStream

ConlluLemmaSampleStreamFactory

Note: Do not use this class, internal use only!

ConlluPOSSampleStream

ConlluPOSSampleStreamFactory

Note: Do not use this class, internal use only!

ConlluSentence

ConlluSentenceSampleStream

ConlluSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

ConlluStream

The CoNLL-U Format is specified here.

ConlluTagset

ConlluTokenSampleStream

ConlluTokenSampleStreamFactory

Note: Do not use this class, internal use only!

ConlluWordLine

ConllXPOSSampleStream

Parses the data from the CONLL 06 shared task into POS Samples.

ConllXPOSSampleStreamFactory

Note: Do not use this class, internal use only!

ConllXSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

ConllXTokenSampleStreamFactory

Note: Do not use this class, internal use only!

Cons

Holds feature information about a specific Parse node.

ConstitParseSampleStream

ConstitParseSampleStreamFactory

Note: Do not use this class, internal use only!

Constituent

Holds constituents when reading parses.

Context

Class which associates a real valued parameter or expected value with a particular contextual predicate or feature.

ContextGenerator<T>

Represents a generator of contexts for maxent decisions.

CrossValidationPartitioner<E>

Provides access to training and test partitions for n-fold cross validation.

CrossValidationPartitioner.TrainingSampleStream<E>

The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.

CVParams

Common cross validator parameters.

danishStemmer

This class implements the stemming algorithm defined by a snowball script.

DataIndexer

Represents an indexer which compresses events in memory and performs feature selection.

DataIndexerFactory

A factory that produces DataIndexer instances.

DataReader

Describes generic ways to read data from a DataInputStream.

DataStream

An interface for objects which can deliver a stream of training data to be supplied to an EventStream.

DefaultChunkerContextGenerator

Features based on chunking model described in Fei Sha and Fernando Pereira.

DefaultChunkerSequenceValidator

The default chunker SequenceValidator implementation.

DefaultEndOfSentenceScanner

Default implementation of the EndOfSentenceScanner.

DefaultLanguageDetectorContextGenerator

A context generator for language detector.

DefaultLemmatizerContextGenerator

Simple feature generator for learning statistical lemmatizers.

DefaultLemmatizerSequenceValidator

The default lemmatizer SequenceValidator implementation.

DefaultNameContextGenerator

A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.

DefaultPOSContextGenerator

A default context generator for a POSTagger.

DefaultPOSSequenceValidator

The default POS tagger SequenceValidator implementation.

DefaultSDContextGenerator

Generate event contexts for maxent decisions for sentence detection.

DefaultTokenContextGenerator

A default TokenContextGenerator which produces events for maxent decisions for tokenization.

DefinitionFeatureGeneratorFactory

DetokenEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

DetokenizationDictionary

DetokenizationDictionary.Operation

Detokenizer

A Detokenizer merges tokens back to their detokenized representation.

Detokenizer.DetokenizationOperation

This enum contains an operation for every token to merge the tokens together to their detokenized form.

DetokenizerEvaluator

The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.

DetokenizerParameter

DetokenizerSampleStreamFactory<T,P>

Base class for factories which need a Detokenizer.

DetokenizeSentenceSampleStream

Dictionary

An iterable and serializable dictionary implementation.

DictionaryBuilderTool

DictionaryDetokenizer

A rule based detokenizer.

DictionaryDetokenizerTool

DictionaryEntryPersistor

A persistor used for reading and writing dictionaries of all kinds.

DictionaryFeatureGenerator

The DictionaryFeatureGenerator uses the DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.

DictionaryFeatureGeneratorFactory

DictionaryLemmatizer

A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line:

DictionaryNameFinder

This is a Dictionary based name finder.

DictionarySerializer

An ArtifactSerializer implementation for dictionaries.

DirectorySampleStream

The directory sample stream allows for creating an ObjectStream<File> from a directory listing of files.

DoccatConverterTool

Tool to convert multiple data formats into native OpenNLP doccat training format.

DoccatCrossValidator

Cross validator for DocumentCategorizer.

DoccatCrossValidatorTool

DoccatEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

DoccatEvaluationMonitor

A marker interface for evaluating doccat.

DoccatEvaluatorTool

A default DocumentSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

DoccatFactory

The factory that provides Doccat default implementations and resources.

DoccatFineGrainedReportListener

Generates a detailed report for the document categorizer.

DoccatModel

A model for document categorization

DoccatModelLoader

Loads a DoccatModel for the command line tools.

DoccatTool

DoccatTrainerTool

DocumentBeginFeatureGenerator

DocumentBeginFeatureGeneratorFactory

DocumentCategorizer

Interface for classes which categorize documents.

DocumentCategorizerEvaluator

The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.

DocumentCategorizerEventStream

Iterator-like class for modeling document classification events.

DocumentCategorizerME

A Max-Ent based implementation of DocumentCategorizer.

DocumentNameFinder

Interface for processing an entire document allowing a TokenNameFinder to use context from the entire document.

DocumentSample

Class which holds a classified document and its category.

DocumentSampleStream

Reads in string encoded training samples, parses them and outputs DocumentSample objects.

DocumentSampleStreamFactory

Factory producing OpenNLP DocumentSampleStreams.

DocumentToLineStream

Reads a plain text file and returns each line as a String object.

DownloadUtil

This class facilitates the downloading of pretrained OpenNLP models.

DownloadUtil.ModelType

The type of model.

dutchStemmer

This class implements the stemming algorithm defined by a snowball script.

DynamicEvalParameters

EmojiCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of emojis.

EmptyLinePreprocessorStream

ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing

This stream should be used by components that use empty lines to mark document boundaries.
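
The three implemented clean-up rules listed above can be sketched over an in-memory list of lines. `EmptyLineSketch.clean` is a hypothetical helper written for this illustration, not the stream's actual code, and it deliberately leaves out the behaviour marked TODO.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the empty-line rules; not the OpenNLP implementation.
class EmptyLineSketch {

    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Starting as if an empty line was just seen makes leading empty lines disappear.
        boolean lastWasEmpty = true;
        for (String line : lines) {
            // Whitespace-only lines count as empty lines.
            boolean empty = line.trim().isEmpty();
            if (!empty) {
                out.add(line);
            } else if (!lastWasEmpty) {
                // Only the first empty line of a run is kept, as a truly empty string.
                out.add("");
            }
            lastWasEmpty = empty;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("", "  ", "doc one", "", "\t", "", "doc two");
        System.out.println(clean(raw)); // prints [doc one, , doc two]
    }
}
```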

EncodingParameter

Encoding parameter.

EndOfSentenceScanner

Scans CharSequence, StringBuffer, and char[] for the offsets of sentence ending characters.

englishStemmer

This class implements the stemming algorithm defined by a snowball script.

EntityLinker<T extends Span>

EntityLinkers establish connections with external data to enrich extracted entities.

EntityLinkerFactory

Generates EntityLinker instances via a properties file configuration.

EntityLinkerProperties

Properties wrapper for EntityLinker implementations.

EntityLinkerTool

Entry

An Entry is a StringList which can optionally be mapped to attributes.

EntryInserter

EvalitaNameSampleStream

Parser for the Italian NER training files of the Evalita 2007 and 2009 NER shared tasks.

EvalitaNameSampleStream.LANGUAGE

EvalitaNameSampleStreamFactory

Note: Do not use this class, internal use only!

EvalParameters

This class encapsulates the variables used in producing probabilities from a model and facilitates passing these variables to the eval method.

EvaluationMonitor<T>

Evaluator<T>

An abstract base class for evaluators.

EvaluatorParams

Common evaluation parameters.

Event

The context of a decision point during training.

EventAnnotation

EventModelSequenceTrainer<T>

A specialized Trainer that is based on a 'EventModelSequence' approach.

EventTraceStream

EventTrainer

A specialized Trainer that is based on an Event approach.

Experimental

Indicates that a certain API feature is not stable and might change with a new release.

ExtensionLoader

The ExtensionLoader is responsible for loading extensions to the OpenNLP library.

ExtensionNotLoadedException

Exception indicates that an OpenNLP extension could not be loaded.

ExtensionServiceKeys

Factory

FeatureGenerator

Interface for generating features for document categorization.

FeatureGeneratorResourceProvider

The FeatureGeneratorResourceProvider provides access to the resources available in the model.

FeatureGeneratorUtil

This class provides common utilities for feature generation.

FileEventStream

Class for using a file of events as an event stream.

FileToByteArraySampleStream

Note: Do not use this class, internal use only!

FileToStringSampleStream

Provides the ability to read the contents of files contained in an object stream of files.

FilterObjectStream<S,T>

Abstract base class for filtering streams.

FineGrainedEvaluatorParams

Common evaluation parameters.

finnishStemmer

This class implements the stemming algorithm defined by a snowball script.

FMeasure

The FMeasure is a utility class for evaluators which measures precision, recall and the resulting f-measure.
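
As a reminder of the quantities involved, the sketch below computes precision, recall and the balanced F-measure from raw counts. `FMeasureSketch` and its method names are hypothetical, and returning 0 for an empty denominator is an assumption of this example rather than documented FMeasure behaviour.

```java
// Hypothetical illustration of precision, recall and F1; not OpenNLP's FMeasure class.
class FMeasureSketch {

    // Fraction of predicted items that are correct.
    static double precision(int truePositives, int predicted) {
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // Fraction of reference items that were found.
    static double recall(int truePositives, int reference) {
        return reference == 0 ? 0.0 : (double) truePositives / reference;
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // 8 correct predictions out of 10 predicted spans, with 16 reference spans.
        double p = precision(8, 10);
        double r = recall(8, 16);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```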

frenchStemmer

This class implements the stemming algorithm defined by a snowball script.

Function

Interface for a function.

GapLabeler

Represents a labeler for nodes which contain traces so that these traces can be predicted by a Parser.

GeneratorFactory

Creates a set of feature generators based on a provided XML descriptor.

GeneratorFactory.AbstractXmlFeatureGeneratorFactory

GenericModelReader

A generic AbstractModelReader implementation.

GenericModelSerializer

An ArtifactSerializer implementation for models.

GenericModelWriter

A generic AbstractModelWriter implementation.

germanStemmer

This class implements the stemming algorithm defined by a snowball script.

GISModel

A maximum entropy model which has been trained using the Generalized Iterative Scaling (GIS) procedure.

GISModelReader

The base class for readers of GIS models.

GISModelWriter

The base class for writers of GIS models.

GISTrainer

An implementation of Generalized Iterative Scaling (GIS).

Glove

GloVe is an unsupervised learning algorithm for obtaining vector representations for words.

greekStemmer

This class implements the stemming algorithm defined by a snowball script.

HashSumEventStream

A hash sum based AbstractObjectStream implementation.

HeadRules

Encoder for head rules associated with parsing.

HeadRules

Class for storing the English HeadRules associated with parsing.

HeadRules.HeadRulesSerializer

hungarianStemmer

This class implements the stemming algorithm defined by a snowball script.

Index

This class indexes string lists.

indonesianStemmer

This class implements the stemming algorithm defined by a snowball script.

InputStreamFactory

Allows repeated reads through a stream for certain model building types.

InSpanGenerator

Generates features if the tokens are recognized by the provided TokenNameFinder.

InsufficientTrainingDataException

This exception indicates that the provided training data is insufficient to train a desired model.

Internal

Classes, fields, or methods annotated @Internal are for OpenNLP internal use only.

InvalidFormatException

This exception indicates that a resource violates the expected data format.

IrishSentenceBankDocument

A structure to hold an Irish Sentence Bank document, which is a collection of tokenized sentences.

IrishSentenceBankDocument.IrishSentenceBankFlex

IrishSentenceBankDocument.IrishSentenceBankSentence

IrishSentenceBankSentenceStreamFactory

IrishSentenceBankTokenSampleStreamFactory

irishStemmer

This class implements the stemming algorithm defined by a snowball script.

italianStemmer

This class implements the stemming algorithm defined by a snowball script.

Language

Class for holding the document language and its confidence

LanguageDetector

The interface for LanguageDetector which predicts the Language for a context.

LanguageDetectorConfig

LanguageDetectorContextGenerator

A context generator interface for LanguageDetector.

LanguageDetectorConverterTool

Tool to convert multiple data formats into native OpenNLP language detection training format.

LanguageDetectorCrossValidator

Cross validator for LanguageDetector.

LanguageDetectorCrossValidatorTool

LanguageDetectorEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

LanguageDetectorEvaluationMonitor

A marker interface for evaluating language detectors.

LanguageDetectorEvaluator

The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.

LanguageDetectorEvaluatorTool

A default LanguageSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

LanguageDetectorEventStream

Iterator-like class for modeling an event stream of samples.

LanguageDetectorFactory

Default factory used by LanguageDetector.

LanguageDetectorFineGrainedReportListener

Generates a detailed report for the language detector.

LanguageDetectorME

Implements a learnable LanguageDetector.

LanguageDetectorModel

The LanguageDetectorModel is the model used by a learnable LanguageDetector.

LanguageDetectorModelLoader

Loads a LanguageDetectorModel for the command line tools.

LanguageDetectorSampleStream

This class reads in string encoded training samples, parses them and outputs LanguageSample objects.

LanguageDetectorSampleStreamFactory

Factory producing OpenNLP LanguageDetectorSampleStreams.

LanguageDetectorTool

LanguageDetectorTrainerTool

LanguageModel

A language model can calculate the probability p (between 0 and 1) of a certain sequence of tokens, given its underlying vocabulary.

LanguageParams

LanguageSample

Holds a classified document and its Language.

LanguageSampleStreamFactory<T,P>

Stream factory for those streams which carry language.

LeipzigLanguageSampleStream

LeipzigLanguageSampleStreamFactory

Note: Do not use this class, internal use only!

LemmaEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

LemmaSample

Represents a lemmatized sentence.

LemmaSampleEventStream

Class for creating an event stream out of data files for training a probabilistic Lemmatizer.

LemmaSampleSequenceStream

A SequenceStream implementation encapsulating samples.

LemmaSampleStream

Reads data for training and testing the Lemmatizer.

Lemmatizer

The common interface for lemmatizers.

LemmatizerContextGenerator

Interface for the context generator used for probabilistic Lemmatizer.

LemmatizerEvaluationMonitor

A marker interface for evaluating lemmatizers.

LemmatizerEvaluator

The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.

LemmatizerEvaluatorTool

A default LemmaSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

LemmatizerFactory

The factory that provides Lemmatizer default implementation and resources.

LemmatizerFineGrainedReportListener

Generates a detailed report for the Lemmatizer.

LemmatizerME

A probabilistic Lemmatizer implementation.

LemmatizerMETool

LemmatizerModel

The LemmatizerModel is the model used by a learnable Lemmatizer.

LemmatizerModelLoader

Loads a LemmatizerModel for the command line tools.

LemmatizerSampleStreamFactory

Factory producing OpenNLP LemmaSampleStreams.

LemmatizerTrainerTool

LetsmtDocument

A structure to hold the letsmt document.

LetsmtDocument.LetsmtDocumentHandler

A content handler to receive and process SAX events.

LetsmtDocument.LetsmtSentence

LetsmtSentenceStreamFactory

LineSearch

Class that performs line search to find minimum.

LineSearch.LineSearchResult

Represents a LineSearch result.

LinkedSpan<T extends BaseLink>

A default, extended Span that holds additional information about a Span.

LogPrintStream

This class serves as an adapter for a Logger used within a PrintStream.

LogProbabilities<T>

Class implementing the probability distribution over labels returned by a classifier as a log of probabilities.

LogProbability<T>

A class implementing the logarithmic Probability for a label.

MarkableFileInputStreamFactory

A factory that creates MarkableFileInputStream from a File

MascDocument

MascDocumentStream

MascNamedEntityParser

A class to process the MASC Named entity stand-off annotation file

MascNamedEntitySampleStream

MascNamedEntitySampleStreamFactory

MascPennTagParser

A class for parsing MASC's Penn tagging/tokenization stand-off annotation

MascPOSSampleStream

MascPOSSampleStreamFactory

MascSentence

MascSentenceSampleStream

MascSentenceSampleStreamFactory

MascToken

A specialized Span to express tokens in documents.

MascTokenSampleStream

MascTokenSampleStreamFactory

MascWord

MaxentModel

Interface for maximum entropy models.

Mean

Calculates the arithmetic mean of values added with the Mean.add(double) method.

ModelParameterChunker

A helper class that handles Strings with more than 64k (65535 bytes) in length.

ModelType

Enumeration of supported model types.

ModelUtil

Utility class for handling of models.

MosesSentenceSampleStream

MosesSentenceSampleStreamFactory

Factory producing OpenNLP MosesSentenceSampleStream objects.

Muc6NameSampleStreamFactory

MucNameContentHandler

MucNameSampleStream

MutableContext

An extension of Context used to store parameters or expected values associated with this context which can be updated or assigned.

MutableInt

This is a non-thread safe mutable int.

MutableTagDictionary

Interface that allows TagDictionary entries to be added and removed.

NaiveBayesEvalParameters

Specialized parameters for the evaluation of a naive bayes classifier

NaiveBayesModel

A MaxentModel implementation of the multinomial Naive Bayes classifier model.

NaiveBayesModelReader

The base class for readers of models.

NaiveBayesModelWriter

The base class for NaiveBayesModel writers.

NaiveBayesTrainer

Trains models using the combination of EM algorithm and Naive Bayes classifier which is described in:

NameContextGenerator

Interface for generating the context for a name finder by specifying a set of feature generators.

NameEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

NameFinderCensus90NameStream

This class helps to read the US Census data from the files to build a StringList for each dictionary entry in the name-finder dictionary.

NameFinderEventStream

Class for creating an event stream out of data files for training a TokenNameFinder.

NameFinderME

A maximum-entropy-based name finder implementation.

NameFinderSequenceValidator

The default name finder SequenceValidator implementation.

NameSample

Encapsulates names for a single unit of text.

NameSampleCountersStream

Counts tokens, sentences and names by type.

NameSampleDataStream

The NameSampleDataStream class converts tagged strings provided by a DataStream to NameSample objects.

NameSampleDataStreamFactory

Factory producing OpenNLP NameSampleDataStreams.

NameSampleDataStreamFactory.Parameters

NameSampleSequenceStream

A SequenceStream implementation encapsulating samples.

NameSampleTypeFilter

A stream which removes name samples which do not have a certain type.

NameToSentenceSampleStream

Note: Do not use this class, internal use only!

NameToSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

NameToTokenSampleStream

Note: Do not use this class, internal use only!

NameToTokenSampleStreamFactory

Note: Do not use this class, internal use only!

NegLogLikelihood

Evaluate negative log-likelihood and its gradient from DataIndexer.

NewlineSentenceDetector

The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.

NGramCharModel

The NGramCharModel can be used to create character ngrams.

NGramFeatureGenerator

Generates ngram features for a document.

NGramGenerator

Generates an nGram, via an optional separator, and returns the grams as a list of strings
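
The idea of joining consecutive tokens with a separator can be sketched as follows. `NGramSketch.generate` is a hypothetical stand-in written for this illustration; its signature and behaviour are assumptions and should not be read as the NGramGenerator API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-gram generation; not OpenNLP's NGramGenerator.
class NGramSketch {

    // Joins every window of n consecutive tokens with the given separator.
    static List<String> generate(List<String> tokens, int n, String separator) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        // The last window starts at index size - n; shorter inputs yield no n-grams.
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(generate(List.of("a", "b", "c"), 2, "-")); // prints [a-b, b-c]
    }
}
```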

NGramLanguageModel

A LanguageModel based on an NGramModel using Stupid Backoff to get the probabilities of the ngrams.

NGramLanguageModelTool

Command line tool for NGramLanguageModel.

NGramModel

The NGramModel can be used to create ngrams and character ngrams.

NGramUtils

Utility class for ngrams.

NKJPSegmentationDocument

NKJPSegmentationDocument.Pointer

NKJPSentenceSampleStream

NKJPSentenceSampleStreamFactory

NKJPTextDocument

The National Corpus of Polish (NKJP) format.

norwegianStemmer

This class implements the stemming algorithm defined by a snowball script.

NumberCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of numbers.

ObjectDataReader

A DataReader implementation based on ObjectInputStream.

ObjectStream<T>

Reads objects from a stream.

ObjectStreamUtils

OnePassDataIndexer

A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates and provides a unique integer index for each of the predicates.
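The idea of applying a cutoff and assigning each surviving predicate a unique integer index can be sketched without the library as follows. PredicateIndexSketch, its method signature, and the "count at least cutoff" rule are illustrative assumptions rather than the actual DataIndexer behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the OpenNLP DataIndexer.
final class PredicateIndexSketch {

    static Map<String, Integer> index(List<List<String>> eventContexts, int cutoff) {
        // First count how often each contextual predicate occurs.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> context : eventContexts) {
            for (String predicate : context) {
                counts.merge(predicate, 1, Integer::sum);
            }
        }
        // Then keep predicates seen at least `cutoff` times (assumed rule)
        // and number them in order of first appearance.
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                indices.put(entry.getKey(), indices.size());
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        List<List<String>> contexts = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("a", "c"), Arrays.asList("a", "b"));
        System.out.println(index(contexts, 2));
    }
}
```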

OnePassRealValueDataIndexer

A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates and provides a unique integer index for each of the predicates and maintains event values.

OntoNotesFormatParameters

OntoNotesNameSampleStream

Name Sample Stream parser for the OntoNotes 4.0 corpus.

OntoNotesNameSampleStreamFactory

OntoNotesParseSampleStream

OntoNotesParseSampleStreamFactory

OntoNotesPOSSampleStreamFactory

OutcomePriorFeatureGenerator

The definition feature maps the underlying distribution of outcomes.

ParagraphStream

A FilterObjectStream which merges text lines into paragraphs.
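A minimal sketch of merging lines into paragraphs is shown below. It assumes that a blank line marks a paragraph boundary and that merged lines are joined with a newline; both choices, and the name ParagraphMergeSketch, are assumptions for illustration and not a description of the OpenNLP class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the OpenNLP ParagraphStream.
final class ParagraphMergeSketch {

    static List<String> paragraphs(List<String> lines) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Assumed boundary: a blank line closes the current paragraph.
                if (current.length() > 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(paragraphs(Arrays.asList("a", "b", "", "c")));
    }
}
```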

ParallelNegLogLikelihood

Evaluate negative log-likelihood and its gradient in parallel.

Parse

Data structure for holding parse constituents.

Parser

A shift reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.

Parser

Defines common methods for full-syntactic parsers.

Parser

A built-attach Parser implementation.

ParserChunkerFactory

ParserChunkerSequenceValidator

The parser chunker SequenceValidator implementation.

ParserConverterTool

Tool to convert multiple data formats into native OpenNLP parser format.

ParserCrossValidator

Cross validator for a Parser.

ParserEvaluationMonitor

A marker interface for evaluating parsers.

ParserEvaluator

This implementation of Evaluator<Parse> behaves like EVALB with no exceptions, e.g., without removing punctuation tags or treating ADVP and PRT as equal, as in the COLLINS convention.

ParserEvaluatorTool

A default Parse-centric implementation of AbstractEvaluatorTool that prints to an output stream.

ParserEventStream

Wrapper class for one of four shift-reduce parser event streams.

ParserEventStream

Wrapper class for one of four built-attach parser event streams.

ParserEventTypeEnum

Enumeration of event types for a Parser.

ParserFactory

ParserModel

This is the default ParserModel implementation.

ParserModelLoader

Loads a ParserModel for the command line tools.

ParserTool

ParserTrainerTool

ParserType

Enumeration of supported Parser types.

ParseSampleStream

ParseSampleStreamFactory

Factory producing OpenNLP ParseSampleStreams.

ParseSampleStreamFactory.Parameters

ParseToPOSSampleStream

Note: Do not use this class, internal use only!

ParseToPOSSampleStreamFactory

Note: Do not use this class, internal use only!

ParseToSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

ParseToTokenSampleStreamFactory

Note: Do not use this class, internal use only!

PerceptronModel

A model implementation based on the perceptron algorithm.

PerceptronModelReader

The base class for readers of models.

PerceptronModelWriter

The base class for PerceptronModel writers.

PerceptronTrainer

Trains models using the perceptron algorithm.

PlainTextByLineStream

Reads a plain text file and returns each line as a String object.

PlainTextFileDataReader

A generic DataReader implementation for plain text files.

PlainTextNaiveBayesModelReader

A NaiveBayesModelReader that reads models from a plain text format.

PlainTextNaiveBayesModelWriter

A NaiveBayesModelWriter that writes models in a plain text format.

porterStemmer

This class implements the stemming algorithm defined by a snowball script.

PorterStemmer

A Stemmer implementing the Porter Stemming Algorithm.

PortugueseContractionUtility

Utility class to handle Portuguese contractions.

portugueseStemmer

This class implements the stemming algorithm defined by a snowball script.

POSContextGenerator

Interface for a BeamSearchContextGenerator used in POS tagging.

POSDictionary

Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.

POSEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

POSEvaluator

The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.

POSModel

The POSModel is the model used by a learnable POSTagger.

POSModelLoader

Loads a POSModel for the command line tools.

POSModelSerializer

An ArtifactSerializer implementation for models.

POSSample

Represents a POS-tagged sentence.

POSSampleEventStream

Reads the samples from an Iterator and converts those samples into events which can be used by the maxent library for training.

POSSampleSequenceStream

A SequenceStream implementation encapsulating samples.

PosSampleStream

POSTagger

The interface for part of speech taggers.

POSTaggerConverterTool

Tool to convert multiple data formats into native OpenNLP part of speech tagging training format.

POSTaggerCrossValidator

POSTaggerCrossValidatorTool

POSTaggerEvaluationMonitor

A marker interface for evaluating POS taggers.

POSTaggerEvaluatorTool

A default POSSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

POSTaggerFactory

The factory that provides POSTagger default implementations and resources.

POSTaggerFactory.POSDictionarySerializer

PosTaggerFeatureGenerator

PosTaggerFeatureGeneratorFactory

POSTaggerFineGrainedReportListener

Generates a detailed report for the POS Tagger.

POSTaggerME

A part-of-speech tagger that uses maximum entropy.

POSTaggerNameFeatureGenerator

Adds the token POS Tag as feature.

POSTaggerNameFeatureGeneratorFactory

POSTaggerTool

POSTaggerTrainerTool

POSToSentenceSampleStream

Note: Do not use this class, internal use only!

POSToSentenceSampleStreamFactory

Note: Do not use this class, internal use only!

POSToTokenSampleStream

Note: Do not use this class, internal use only!

POSToTokenSampleStreamFactory

Note: Do not use this class, internal use only!

PrefixFeatureGenerator

PrefixFeatureGeneratorFactory

PreviousMapFeatureGenerator

This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.

PreviousMapFeatureGeneratorFactory

PreviousTwoMapFeatureGenerator

This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.

Prior

This interface allows one to implement a prior distribution for use in maximum entropy model training.

Probabilities<T>

Class implementing the probability distribution over labels returned by a classifier.

Probability<T>

Class implementing the probability for a label.

ProbingLanguageDetectionResult

A data container encapsulating language detection results.

QNMinimizer

Implementation of L-BFGS which supports L1-, L2-regularization and Elastic Net for solving convex optimization problems.

QNMinimizer.Evaluator

Evaluate quality of training parameters.

QNMinimizer.L2RegFunction

L2-regularized objective Function.

QNModel

A maximum entropy model which has been trained using the Quasi Newton (QN) algorithm.

QNModelReader

The base class for readers of QN models.

QNModelWriter

The base class for writers of models.

QNTrainer

A Maxent model Trainer using L-BFGS algorithm.

RealBasicEventStream

Class for real-valued events as an event stream.

RealValueFileEventStream

Class for using a file of real-valued events as an event stream.

RegexNameFinder

A TokenNameFinder implementation based on a series of regular expressions.

RegexNameFinderFactory

Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.

RegexNameFinderFactory.DEFAULT_REGEX_NAME_FINDER

Enumeration of typical regular expressions available in OpenNLP.

RegexNameFinderFactory.RegexAble

RelationAnnotation

ResetableIterator<E>

This interface makes an Iterator resettable.

ReverseListIterator<T>

An iterator for a list which returns values in the reverse order of the typical list iterator.

romanianStemmer

This class implements the stemming algorithm defined by a snowball script.

russianStemmer

This class implements the stemming algorithm defined by a snowball script.

Sample

Represents a generic type of processable elements.

SDContextGenerator

Interface for SentenceDetectorME context generators.

SDCrossValidator

A cross validator for sentence detectors.

SDEventStream

SegmenterObjectStream<S,T>

SentenceContextGenerator

Creates contexts/features for end-of-sentence detection in Thai text.

SentenceDetector

The interface for sentence detectors, which find the sentence boundaries in a text.

SentenceDetectorConverterTool

Tool to convert multiple data formats into native OpenNLP sentence detector training format.

SentenceDetectorCrossValidatorTool

SentenceDetectorEvaluationMonitor

SentenceDetectorEvaluator

The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.

SentenceDetectorEvaluatorTool

A default SentenceSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

SentenceDetectorFactory

The factory that provides SentenceDetector default implementations and resources.

SentenceDetectorME

A sentence detector for splitting up raw text into sentences.

SentenceDetectorTool

A sentence detector which uses a maxent model to predict the sentences.

SentenceDetectorTrainerTool

SentenceEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

SentenceFeatureGenerator

This feature generator creates sentence begin and end features.

SentenceFeatureGeneratorFactory

SentenceModel

The SentenceModel is the model used by a learnable SentenceDetector.

SentenceSample

A SentenceSample contains a document with begin indexes of the individual sentences.

SentenceSampleStream

This class is a stream filter which reads a sentence by line samples from an ObjectStream and converts them into SentenceSample objects.

SentenceSampleStreamFactory

Factory producing OpenNLP SentenceSampleStreams.

Sequence<T>

Class which models a sequence.

Sequence

Represents a weighted sequence of outcomes.

SequenceClassificationModel<T>

A classification model that can label an input Sequence.

SequenceCodec<T>

A codec for sequences of type SequenceCodec.

SequenceStream<S>

Interface for streams of sequences used to train sequence models.

SequenceStreamEventStream

Class which turns a SequenceStream into an event stream.

SequenceTrainer

SequenceValidator<T>

SerializableArtifact

A marker interface so that implementing classes can refer to the corresponding ArtifactSerializer implementation.

SgmlParser

SAX style SGML parser.

SgmlParser.ContentHandler

ShrinkCharSequenceNormalizer

A CharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.

SimplePerceptronSequenceTrainer

Trains models with sequences using the perceptron algorithm.

SimpleTokenizer

A basic Tokenizer implementation which performs tokenization using character classes.
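Tokenization by character class can be pictured with the following standalone sketch: runs of characters that share a class (letter, digit, other) form one token, and whitespace only separates tokens. The class CharClassTokenizerSketch and its exact grouping rules are assumptions for illustration; the OpenNLP class may treat some characters differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the OpenNLP SimpleTokenizer.
final class CharClassTokenizerSketch {

    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass next = classify(text.charAt(i));
            if (next != current) {
                // A class change ends the running token, unless it was whitespace.
                if (current != CharClass.WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
                current = next;
            }
        }
        if (current != CharClass.WHITESPACE) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Room 42, ok!"));
    }
}
```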

SimpleTokenizerTool

SnowballStemmer

SnowballStemmer.ALGORITHM

Span

Class for storing start and end integer offsets.

SpanAnnotation

spanishStemmer

This class implements the stemming algorithm defined by a snowball script.

Stemmer

A stemmer reduces a word to its stem.

StringInterner

A marker-interface for a String interner implementation.

StringInterners

Provides string interning utility methods.

StringList

A StringList is an immutable list of Strings.

StringPattern

Recognizes predefined patterns in strings.

StringUtil

SuffixFeatureGenerator

SuffixFeatureGeneratorFactory

swedishStemmer

This class implements the stemming algorithm defined by a snowball script.

TagDictionary

Interface to determine which tags are valid for a particular word based on a tag dictionary.

TaggerModelReplacerTool

ThreadSafe

Classes, fields, or methods annotated @ThreadSafe are safe to use in multithreading contexts.

TokenClassFeatureGenerator

Generates features for the class of the token.

TokenClassFeatureGeneratorFactory

TokenContextGenerator

Interface for context generators required for TokenizerME.

TokenEvaluationErrorListener

A default implementation of EvaluationMonitor that prints to an output stream.

TokenFeatureGenerator

Generates a feature which contains the token itself.

TokenFeatureGeneratorFactory

Tokenizer

The interface for tokenizers, which segment a string into its tokens.

TokenizerConverterTool

Tool to convert multiple data formats into native OpenNLP tokenizer training format.

TokenizerCrossValidator

A cross validator for tokenizers.

TokenizerCrossValidatorTool

TokenizerEvaluationMonitor

A marker interface for evaluating tokenizers.

TokenizerEvaluator

The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.

TokenizerFactory

The factory that provides Tokenizer default implementation and resources.

TokenizerME

A Tokenizer for converting raw text into separated tokens.

TokenizerMEEvaluatorTool

A default TokenSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

TokenizerMETool

TokenizerModel

The TokenizerModel is the model used by a learnable Tokenizer.

TokenizerModelLoader

Loads a TokenizerModel for the command line tools.

TokenizerStream

The TokenizerStream uses a Tokenizer to tokenize the input string and output samples.

TokenizerTrainerTool

TokenNameFinder

The interface for name finders which provide name tags for a sequence of tokens.

TokenNameFinderConverterTool

Tool to convert multiple data formats into native OpenNLP name finder training format.

TokenNameFinderCrossValidator

Cross validator for TokenNameFinder.

TokenNameFinderCrossValidatorTool

TokenNameFinderDetailedFMeasureListener

TokenNameFinderEvaluationMonitor

A marker interface for evaluating name finders.

TokenNameFinderEvaluator

The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.

TokenNameFinderEvaluatorTool

A default NameSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.

TokenNameFinderFactory

The factory that provides TokenNameFinder default implementations and resources.

TokenNameFinderFineGrainedReportListener

Generates a detailed report for the NameFinder.

TokenNameFinderModel

The TokenNameFinderModel is the model used by a learnable TokenNameFinder.

TokenNameFinderModel.FeatureGeneratorCreationError

TokenNameFinderModelLoader

Loads a TokenNameFinderModel for the command line tools.

TokenNameFinderTool

TokenNameFinderTrainerTool

TokenPatternFeatureGenerator

Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.

TokenPatternFeatureGeneratorFactory

TokenSample

A TokenSample is text with token spans.

TokenSampleStream

Class which produces an Iterator<TokenSample> from a file of space-delimited tokens.

TokenSampleStream

This class is a stream filter which reads in string encoded samples and creates samples out of them.

TokenSampleStreamFactory

Factory producing OpenNLP TokenSampleStreams.

TokenTag

TokSpanEventStream

This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.

Trainer

Represents a common base for training implementations.

TrainerFactory

A factory to initialize Trainer instances depending on a trainer type configured via TrainingParameters.

TrainerFactory.TrainerType

TrainingParameters

Declares and handles default parameters used for or during training models.

TrainingToolParams

Common training parameters.

TrigramNameFeatureGenerator

Adds trigram features based on tokens and token classes.

TrigramNameFeatureGeneratorFactory

turkishStemmer

This class implements the stemming algorithm defined by a snowball script.

TwentyNewsgroupSampleStream

TwentyNewsgroupSampleStreamFactory

TwitterCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.

TwoPassDataIndexer

Collecting event and context counts by making two passes over the events.

UncloseableInputStream

An InputStream which cannot be closed.

UniformPrior

Provide a maximum entropy model with a uniform Prior.

UrlCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.

Version

The Version class represents the OpenNLP Tools library version.

WhitespaceTokenizer

A basic Tokenizer implementation which performs tokenization using white spaces.

WhitespaceTokenStream

This stream formats ObjectStream of samples into whitespace separated token strings.

WindowFeatureGenerator

Generates previous and next features for a given AdaptiveFeatureGenerator.

WindowFeatureGeneratorFactory

WordClusterDictionary

WordClusterDictionary.WordClusterDictionarySerializer

WordClusterFeatureGenerator

WordClusterFeatureGeneratorFactory

Defines a word cluster generator factory; it reads an element containing 'w2vwordcluster' as a tag name; these clusters are typically produced by word2vec or Clark POS induction systems.

WordpieceTokenizer

A Tokenizer implementation which performs tokenization using word pieces.

WordTagSampleStream

A stream filter which reads one sentence per line, containing words and tags in word_tag format, and outputs POSSample objects.

WordTagSampleStreamFactory

Note: Do not use this class, internal use only!

WordTagSampleStreamFactory.Parameters

WordVector

A word vector.

WordVectorTable

A table that maps tokens to word vectors.

WordVectorType

XmlUtil