All Classes and Interfaces

Class
Description
Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.
Abstract class containing many of the methods used to generate contexts for parsing.
A base ObjectStream implementation for events.
Abstract class extended by parser event streams which perform tagging and chunking.
The AdditionalContextFeatureGenerator generates the context from the passed in additional context.
A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.
The AggregatedFeatureGenerator aggregates a set of feature generators and calls them to generate the features.
Internal class used by Snowball stemmers
Class for storing the Ancora Spanish head rules associated with parsing.
 
This class implements the stemming algorithm defined by a snowball script.
Generates predictive contexts for deciding how constituents should be attached.
The Attributes class stores name value pairs.
Generates a feature for each word in a document.
This is a common base model which can be used by the components' specific model classes.
Base class for all tool factories.
Adds bigram features based on tokens and token classes.
The default SequenceCodec implementation according to the BILOU scheme.
A SequenceValidator implementation for the BilouCodec.
The default SequenceCodec implementation according to the BIO scheme: B ('beginning' of a NE); I ('inside', the word is inside a NE); O ('outside', the word is a regular word outside a NE). See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
Generates Brown cluster features for token bigrams.
Class to load a Brown cluster document in the format: word\tword_class\tprob.
 
Obtain the paths listed in the pathLengths array from the Brown class.
Generates BrownCluster features for current token and token class.
Generates BrownCluster features for current token.
Generates predictive contexts for deciding how constituents should be combined.
Creates the features or contexts for the building phase of parsing.
An ArtifactSerializer implementation for binary data, kept in byte[].
Caches features of the aggregated generators.
This class implements the stemming algorithm defined by a snowball script.
The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.
Generates predictive context for deciding when a constituent is complete.
Generates predictive context for deciding when a constituent is complete.
Creates predictive context for the pre-chunking phases of parsing.
Cross validator for Chunker.
The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.
Class for creating an event stream out of data files for training a Chunker.
 
The class represents a maximum-entropy-based Chunker.
The ChunkerModel is the model used by a learnable Chunker.
An ArtifactSerializer implementation for models.
A SequenceStream implementation encapsulating samples.
Parses the conll 2000 shared task shallow parser training data.
 
An ObjectStream implementation that works on a Collection of E as source for elements.
A configurable context generator for a POSTagger.
Holds feature information about a specific Parse node.
Provides access to training and test partitions for n-fold cross validation.
The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.
This class implements the stemming algorithm defined by a snowball script.
Features based on chunking model described in Fei Sha and Fernando Pereira.
The default chunker SequenceValidator implementation.
Default implementation of the EndOfSentenceScanner.
A context generator for language detector.
Simple feature generator for learning statistical lemmatizers.
The default lemmatizer SequenceValidator implementation.
A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.
A default context generator for a POSTagger.
The default POS tagger SequenceValidator implementation.
Generate event contexts for maxent decisions for sentence detection.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
 
 
An iterable and serializable dictionary implementation.
A rule based detokenizer.
A persistor used for reading and writing dictionaries of all kinds.
The DictionaryFeatureGenerator uses a DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line:
This is a Dictionary based name finder.
An ArtifactSerializer implementation for dictionaries.
Cross validator for DocumentCategorizer.
The factory that provides Doccat default implementations and resources.
A model for document categorization
This feature generator creates document begin features.
The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.
Iterator-like class for modeling document classification events.
A Max-Ent based implementation of DocumentCategorizer.
Reads in string encoded training samples, parses them and outputs DocumentSample objects.
This class facilitates the downloading of pretrained OpenNLP models.
This class implements the stemming algorithm defined by a snowball script.
An EmojiCharSequenceNormalizer implementation that normalizes text in terms of emojis.
ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing

This stream should be used by the components that mark empty lines to mark document boundaries.
This class implements the stemming algorithm defined by a snowball script.
Generates EntityLinker instances via a properties file configuration.
An Entry is a StringList which can optionally be mapped to attributes.
 
An abstract base class for evaluators.
 
 
 
This class provides common utilities for feature generation.
Abstract base class for filtering streams.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Creates a set of feature generators based on a provided XML descriptor.
 
An ArtifactSerializer implementation for models.
This class implements the stemming algorithm defined by a snowball script.
GloVe is an unsupervised learning algorithm for obtaining vector representations for words.
This class implements the stemming algorithm defined by a snowball script.
Class for storing the English HeadRules associated with parsing.
 
This class implements the stemming algorithm defined by a snowball script.
This class indexes string lists.
This class implements the stemming algorithm defined by a snowball script.
Generates features if the tokens are recognized by the provided TokenNameFinder.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
 
Cross validator for LanguageDetector.
The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.
Iterator-like class for modeling an event stream of samples.
Default factory used by LanguageDetector.
Implements a learnable LanguageDetector.
The LanguageDetectorModel is the model used by a learnable LanguageDetector.
This class reads in string encoded training samples, parses them and outputs LanguageSample objects.
Class for creating an event stream out of data files for training a probabilistic Lemmatizer.
A SequenceStream implementation encapsulating samples.
Reads data for training and testing the Lemmatizer.
The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.
The factory that provides Lemmatizer default implementation and resources.
A probabilistic Lemmatizer implementation.
The LemmatizerModel is the model used by a learnable Lemmatizer.
This class serves as an adapter for a Logger used within a PrintStream.
Utility class for handling of models.
This is a non-thread-safe mutable int.
Class for creating an event stream out of data files for training a TokenNameFinder.
A maximum-entropy-based name finder implementation.
The default name finder SequenceValidator implementation.
The NameSampleDataStream class converts tagged tokens provided by a DataStream to NameSample objects.
A SequenceStream implementation encapsulating samples.
A stream which removes name samples which do not have a certain type.
The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.
The NGramCharModel can be used to create character ngrams.
Generates ngram features for a document.
Generates an nGram, via an optional separator, and returns the grams as a list of strings
A LanguageModel based on a NGramModel using Stupid Backoff to get the probabilities of the ngrams.
The NGramModel can be used to create ngrams and character ngrams.
Utility class for ngrams.
This class implements the stemming algorithm defined by a snowball script.
A NumberCharSequenceNormalizer implementation that normalizes text in terms of numbers.
The definition feature maps the underlying distribution of outcomes.
A shift reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.
A build-attach Parser implementation.
 
The parser chunker SequenceValidator implementation.
Wrapper class for one of four shift-reduce parser event streams.
Wrapper class for one of four build-attach parser event streams.
 
This is the default ParserModel implementation.
 
This class implements the stemming algorithm defined by a snowball script.
A Stemmer, implementing the Porter Stemming Algorithm
This class implements the stemming algorithm defined by a snowball script.
Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.
The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.
The POSModel is the model used by a learnable POSTagger.
An ArtifactSerializer implementation for models.
Reads the samples from an Iterator and converts those samples into events which can be used by the maxent library for training.
A SequenceStream implementation encapsulating samples.
 
Defines the format for part-of-speech tagging, i.e.
A mapping implementation for converting between different POS tag formats.
 
 
The factory that provides POSTagger default implementations and resources.
 
A POS tagging driven feature generator.
A part-of-speech tagger implementation that uses maximum entropy.
Adds the token POS tag as feature.
A feature generator implementation that generates prefix-based features.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.
A data container encapsulating language detection results.
A TokenNameFinder implementation based on a series of regular expressions.
Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.
Enumeration of typical regular expressions available in OpenNLP.
 
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
A cross validator for sentence detectors.
 
Creates contexts/features for end-of-sentence detection in Thai text.
The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.
The factory that provides SentenceDetector default implementations and resources
A sentence detector for splitting up raw text into sentences.
This feature generator creates sentence begin and end features.
The SentenceModel is the model used by a learnable SentenceDetector.
This class is a stream filter which reads sentence-per-line samples from an ObjectStream and converts them into SentenceSample objects.
Class for using a Context Generator for Sentiment Analysis.
Class for performing cross validation on the Sentiment Analysis Parser.
The SentimentEvaluator measures the performance of the given SentimentME with the provided reference SentimentSamples.
Class for creating events for Sentiment Analysis that are later sent to MaxEnt.
Class for creating sentiment factories for training.
A SentimentDetector implementation for creating and using maximum-entropy-based Sentiment Analysis models.
Class for the basis of the Sentiment Analysis model.
Class for converting Strings through Data Stream to SentimentSample using tokenised text.
Class for creating a type filter.
A ShrinkCharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.
A basic Tokenizer implementation which performs tokenization using character classes.
Base class for a snowball stemmer
 
 
This class implements the stemming algorithm defined by a snowball script.
Provides string interning utility methods.
A StringList is an immutable list of Strings.
Recognizes predefined patterns in strings.
A feature generator implementation that generates suffix-based features.
This class implements the stemming algorithm defined by a snowball script.
A thread-safe version of the ChunkerME.
A thread-safe version of the LanguageDetectorME.
A thread-safe version of the LemmatizerME.
A thread-safe version of NameFinderME.
A thread-safe version of the POSTaggerME.
A thread-safe version of SentenceDetectorME.
A thread-safe version of TokenizerME.
Generates features for the class of a token.
Generates a feature which contains a token itself.
A cross validator for tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides Tokenizer default implementation and resources.
A Tokenizer for converting raw text into separated tokens.
The TokenizerModel is the model used by a learnable Tokenizer.
The TokenizerStream uses a Tokenizer to tokenize the input string and output samples.
Cross validator for TokenNameFinder.
The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.
The factory that provides TokenNameFinder default implementations and resources.
The TokenNameFinderModel is the model used by a learnable TokenNameFinder.
 
Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.
Class which produces an Iterator<TokenSample> from a file of space delimited tokens.
This class is a stream filter which reads in string encoded samples and creates samples out of them.
This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.
A factory to initialize Trainer instances depending on a trainer type configured via Parameters.
 
Adds trigram features based on tokens and token classes.
This class implements the stemming algorithm defined by a snowball script.
A TwitterCharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.
An InputStream which cannot be closed.
A UrlCharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.
The Version class represents the OpenNLP Tools library version.
This stream formats ObjectStream of samples into whitespace separated token strings.
Generates previous (left-sided) and next (right-sided) features for a given AdaptiveFeatureGenerator.
 
 
An AdaptiveFeatureGenerator implementation of a word cluster feature generator.
Defines a word cluster GeneratorFactory; it reads an element containing 'w2vwordcluster' as a tag name.
A stream filter which reads a sentence per line that contains words and tags in word_tag format and outputs POSSample objects.
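
The BIO scheme mentioned in the SequenceCodec description above (B for the beginning of a named entity, I for inside, O for outside) can be illustrated with a small decoder. This is a minimal, self-contained sketch and not the library's implementation: the class name BioDecodeSketch, the "B-type" / "I-type" / "O" label spelling, and the "type:start-end" result format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it does not use or mirror the OpenNLP API.
final class BioDecodeSketch {

    /**
     * Decodes per-token BIO outcomes into entity spans.
     * Each span is rendered as "type:start-end", where start is inclusive
     * and end is exclusive (token indices). Label spelling is assumed to be
     * "B-<type>", "I-<type>" and "O".
     */
    static List<String> decode(String[] outcomes) {
        List<String> spans = new ArrayList<>();
        int start = -1;      // start index of the currently open span, or -1
        String type = null;  // entity type of the currently open span

        for (int i = 0; i < outcomes.length; i++) {
            String outcome = outcomes[i];

            // An "I-" label with the same type extends the open span.
            if (start >= 0 && outcome.equals("I-" + type)) {
                continue;
            }

            // Any other label closes the open span, if there is one.
            if (start >= 0) {
                spans.add(type + ":" + start + "-" + i);
                start = -1;
                type = null;
            }

            // A "B-" label opens a new span; "O" and stray "I-" labels are skipped.
            if (outcome.startsWith("B-")) {
                start = i;
                type = outcome.substring(2);
            }
        }

        // Close a span that runs to the end of the sequence.
        if (start >= 0) {
            spans.add(type + ":" + start + "-" + outcomes.length);
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] outcomes = {"B-person", "I-person", "O", "B-location", "O"};
        System.out.println(decode(outcomes));
    }
}
```

In this sketch an "I-" label that does not continue an open span of the same type is simply ignored; a sequence validator, as described for the codecs above, would normally reject such a sequence before decoding.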