All Classes and Interfaces
Class
Description
Abstract class which contains code to tag and chunk parses for bottom-up parsing and
leaves implementation of advancing parses and completing parses to extending classes.
Abstract class containing many of the methods used to generate contexts for parsing.
A base ObjectStream implementation for events.
Abstract class extended by parser event streams which perform tagging and chunking.
The AdditionalContextFeatureGenerator generates the context from the passed in additional context.
A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.
The AggregatedFeatureGenerator aggregates a set of feature generators and calls them to generate the features.
A GeneratorFactory that produces AggregatedFeatureGenerator instances when AggregatedFeatureGeneratorFactory.create() is called.
Internal class used by Snowball stemmers
Class for storing the Ancora Spanish head rules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
Generates predictive contexts for deciding how constituents should be attached.
The Attributes class stores name value pairs.
Generates a feature for each word in a document.
This is a common base model which can be used by the components' specific model classes.
Base class for all tool factories.
Adds bigram features based on tokens and token classes.
A GeneratorFactory that produces BigramNameFeatureGenerator instances when BigramNameFeatureGeneratorFactory.create() is called.
The default SequenceCodec implementation according to the BILOU scheme.
A SequenceValidator implementation for the BilouCodec.
The default SequenceCodec implementation according to the BIO scheme:
B: 'beginning' of a NE
I: 'inside', the word is inside a NE
O: 'outside', the word is a regular word outside a NE
See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
Generates Brown cluster features for token bigrams.
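The BIO scheme listed above (B, I, O) can be illustrated with a small decoder that turns a tag sequence into entity spans. This is only a sketch in plain Java and does not use the OpenNLP SequenceCodec API; the class name BioDecodeSketch, the bare tag strings "B", "I" and "O", and the lenient handling of a stray "I" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: decodes B/I/O tags into token spans.
public class BioDecodeSketch {

    /** Returns {start, end} pairs (end exclusive) for each entity in the tag sequence. */
    public static List<int[]> decode(String[] tags) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means no entity is currently open
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            if (tag.equals("B")) {
                // A new entity begins; close the open one first, if any.
                if (start >= 0) {
                    spans.add(new int[] {start, i});
                }
                start = i;
            } else if (tag.equals("I")) {
                // Assumption: a stray "I" with no open entity starts a new one.
                if (start < 0) {
                    start = i;
                }
            } else {
                // "O" (or anything else) closes the open entity.
                if (start >= 0) {
                    spans.add(new int[] {start, i});
                }
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, tags.length});
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tags = {"B", "I", "O", "O", "B", "O"};
        for (int[] span : decode(tags)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

A sequence validator for this scheme would typically reject an "I" that does not follow a "B" or another "I"; the sketch accepts it instead to keep the decoder total.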
Class to load a Brown cluster document in the format: word\tword_class\tprob.
A GeneratorFactory that produces BrownBigramFeatureGenerator instances when BrownClusterBigramFeatureGeneratorFactory.create() is called.
A GeneratorFactory that produces BrownTokenClassFeatureGenerator instances when BrownClusterTokenClassFeatureGeneratorFactory.create() is called.
A GeneratorFactory that produces BrownTokenFeatureGenerator instances when BrownClusterTokenFeatureGeneratorFactory.create() is called.
Obtain the paths listed in the pathLengths array from the Brown class.
Generates BrownCluster features for current token and token class.
Generates BrownCluster features for current token.
Generates predictive contexts for deciding how constituents should be combined.
Creates the features or contexts for the building phase of parsing.
An ArtifactSerializer implementation for binary data, kept in byte[].
Caches features of the aggregated generators.
A GeneratorFactory that produces CachedFeatureGenerator instances when CachedFeatureGeneratorFactory.create() is called.
This class implements the stemming algorithm defined by a snowball script.
The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.
A GeneratorFactory that produces CharacterNgramFeatureGenerator instances when CharacterNgramFeatureGeneratorFactory.create() is called.
Generates predictive context for deciding when a constituent is complete.
Generates predictive context for deciding when a constituent is complete.
Creates predictive context for the pre-chunking phases of parsing.
Cross validator for Chunker.
The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.
Class for creating an event stream out of data files for training a Chunker.
The class represents a maximum-entropy-based Chunker.
The ChunkerModel is the model used by a learnable Chunker.
An ArtifactSerializer implementation for models.
A SequenceStream implementation encapsulating samples.
Parses the CoNLL 2000 shared task shallow parser training data.
A configurable context generator for a POSTagger.
Holds feature information about a specific Parse node.
Provides access to training and test partitions for n-fold cross validation.
The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.
This class implements the stemming algorithm defined by a snowball script.
Features based on the chunking model described by Fei Sha and Fernando Pereira.
The default chunker SequenceValidator implementation.
Default implementation of the EndOfSentenceScanner.
A context generator for the language detector.
Simple feature generator for learning statistical lemmatizers.
The default lemmatizer SequenceValidator implementation.
A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.
A default context generator for a POSTagger.
The default POS tagger SequenceValidator implementation.
Generates event contexts for maxent decisions for sentence detection.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
A GeneratorFactory that produces OutcomePriorFeatureGenerator instances when DefinitionFeatureGeneratorFactory.create() is called.
An iterable and serializable dictionary implementation.
A rule based detokenizer.
A persistor used for reading and writing dictionaries of all kinds.
The DictionaryFeatureGenerator uses a DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.
A GeneratorFactory that produces DictionaryFeatureGenerator instances when DictionaryFeatureGeneratorFactory.create() is called.
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line:
This is a Dictionary based name finder.
An ArtifactSerializer implementation for dictionaries.
Cross validator for DocumentCategorizer.
The factory that provides Doccat default implementations and resources.
A model for document categorization
This feature generator creates document begin features.
A GeneratorFactory that produces DocumentBeginFeatureGenerator instances when DocumentBeginFeatureGeneratorFactory.create() is called.
The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.
Iterator-like class for modeling document classification events.
A Max-Ent based implementation of DocumentCategorizer.
Reads in string encoded training samples, parses them and outputs DocumentSample objects.
This class facilitates the downloading of pretrained OpenNLP models.
This class implements the stemming algorithm defined by a snowball script.
An EmojiCharSequenceNormalizer implementation that normalizes text in terms of emojis.
ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing
This stream should be used by the components that use empty lines to mark document boundaries.
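The empty-line clean-up rules listed above can be sketched over an in-memory list of lines. This is a plain Java illustration, not the OpenNLP ObjectStream implementation; the class name EmptyLineCleanupSketch and the List-based signature are assumptions. The rule marked TODO in the listing (terminating the last document) is deliberately left out here as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: normalizes empty lines between documents.
public class EmptyLineCleanupSketch {

    public static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Starts as true so that empty lines at the very beginning are skipped.
        boolean lastWasEmpty = true;
        for (String line : lines) {
            // Whitespace-only lines count as empty lines.
            boolean empty = line.trim().isEmpty();
            if (empty) {
                // Emit at most one empty line per run of empty lines.
                if (!lastWasEmpty) {
                    out.add("");
                }
            } else {
                out.add(line);
            }
            lastWasEmpty = empty;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("", "  ", "doc one", "", "\t", "", "doc two");
        System.out.println(clean(raw));
    }
}
```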
This class implements the stemming algorithm defined by a snowball script.
Generates EntityLinker instances via a properties file configuration.
An Entry is a StringList which can optionally be mapped to attributes.
An abstract base class for evaluators.
This class provides common utilities for feature generation.
Abstract base class for filtering streams.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Creates a set of feature generators based on a provided XML descriptor.
An ArtifactSerializer implementation for models.
This class implements the stemming algorithm defined by a snowball script.
GloVe is an unsupervised learning algorithm for obtaining vector representations for words.
This class implements the stemming algorithm defined by a snowball script.
Class for storing the English HeadRules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
This class indexes string lists.
This class implements the stemming algorithm defined by a snowball script.
Generates features if the tokens are recognized by the provided TokenNameFinder.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Cross validator for LanguageDetector.
The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.
Iterator-like class for modeling an event stream of samples.
Default factory used by LanguageDetector.
Implements a learnable LanguageDetector.
The LanguageDetectorModel is the model used by a learnable LanguageDetector.
This class reads in string encoded training samples, parses them and outputs LanguageSample objects.
Class for creating an event stream out of data files for training a probabilistic Lemmatizer.
A SequenceStream implementation encapsulating samples.
Reads data for training and testing the Lemmatizer.
The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.
The factory that provides Lemmatizer default implementation and resources.
A probabilistic Lemmatizer implementation.
The LemmatizerModel is the model used by a learnable Lemmatizer.
This class serves as an adapter for a Logger used within a PrintStream.
Utility class for handling of models.
This is a non-thread-safe mutable int.
Class for creating an event stream out of data files for training a TokenNameFinder.
A maximum-entropy-based name finder implementation.
The default name finder SequenceValidator implementation.
The NameSampleDataStream class converts tagged tokens provided by a DataStream to NameSample objects.
A SequenceStream implementation encapsulating samples.
A stream which removes name samples which do not have a certain type.
The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.
The NGramCharModel can be used to create character ngrams.
Generates ngram features for a document.
Generates an nGram, via an optional separator, and returns the grams as a list of strings.
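The n-gram generation described above (tokens joined by an optional separator and returned as a list of strings) can be sketched as follows. This is a stand-alone illustration in plain Java; the class name NGramSketch and the exact method signature are assumptions and should not be read as the library's own API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: builds n-grams from a token list.
public class NGramSketch {

    /** Joins every window of n consecutive tokens with the given separator. */
    public static List<String> generate(List<String> tokens, int n, String separator) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of size n over the tokens; too-short inputs yield no grams.
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        System.out.println(generate(tokens, 2, "-"));
    }
}
```

Passing an empty separator gives concatenated grams, which is one way to read "optional separator" in the description.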
A LanguageModel based on a NGramModel using Stupid Backoff to get the probabilities of the ngrams.
The NGramModel can be used to create ngrams and character ngrams.
Utility class for ngrams.
This class implements the stemming algorithm defined by a snowball script.
A NumberCharSequenceNormalizer implementation that normalizes text in terms of numbers.
The definition feature maps the underlying distribution of outcomes.
A shift reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.
A built-attach Parser implementation.
The parser chunker SequenceValidator implementation.
Wrapper class for one of four shift-reduce parser event streams.
Wrapper class for one of four built-attach parser event streams.
This is the default ParserModel implementation.
This class implements the stemming algorithm defined by a snowball script.
A Stemmer, implementing the Porter Stemming Algorithm.
This class implements the stemming algorithm defined by a snowball script.
Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.
The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.
The POSModel is the model used by a learnable POSTagger.
An ArtifactSerializer implementation for models.
Reads the samples from an Iterator and converts those samples into events which can be used by the maxent library for training.
A SequenceStream implementation encapsulating samples.
Defines the format for part-of-speech tagging, i.e.
A mapping implementation for converting between different POS tag formats.
The factory that provides POSTagger default implementations and resources.
A POS tagging driven feature generator.
A GeneratorFactory that produces PosTaggerFeatureGenerator instances when PosTaggerFeatureGeneratorFactory.create() is called.
A part-of-speech tagger implementation that uses maximum entropy.
Adds the token POS tag as feature.
A GeneratorFactory that produces POSTaggerNameFeatureGenerator instances when POSTaggerNameFeatureGeneratorFactory.create() is called.
A feature generator implementation that generates prefix-based features.
A GeneratorFactory that produces PrefixFeatureGenerator instances when PrefixFeatureGeneratorFactory.create() is called.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.
A GeneratorFactory that produces PreviousMapFeatureGenerator instances when PreviousMapFeatureGeneratorFactory.create() is called.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.
A data container encapsulating language detection results.
A TokenNameFinder implementation based on a series of regular expressions.
Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.
Enumeration of typical regex expressions available in OpenNLP.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
A cross validator for sentence detectors.
Creates contexts/features for end-of-sentence detection in Thai text.
The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.
The factory that provides SentenceDetector default implementations and resources.
A sentence detector for splitting up raw text into sentences.
This feature generator creates sentence begin and end features.
A GeneratorFactory that produces SentenceFeatureGenerator instances when SentenceFeatureGeneratorFactory.create() is called.
The SentenceModel is the model used by a learnable SentenceDetector.
This class is a stream filter which reads sentence-per-line samples from an ObjectStream and converts them into SentenceSample objects.
Class for using a Context Generator for Sentiment Analysis.
Class for performing cross validation on the Sentiment Analysis Parser.
The SentimentEvaluator measures the performance of the given SentimentME with the provided reference SentimentSamples.
Class for creating events for Sentiment Analysis that are later sent to MaxEnt.
Class for creating sentiment factories for training.
A SentimentDetector implementation for creating and using maximum-entropy-based Sentiment Analysis models.
Class for the basis of the Sentiment Analysis model.
Class for converting Strings through Data Stream to SentimentSample using tokenised text.
Class for creating a type filter.
A ShrinkCharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.
A basic Tokenizer implementation which performs tokenization using character classes.
Base class for a snowball stemmer
This class implements the stemming algorithm defined by a snowball script.
Provides string interning utility methods.
A StringList is an immutable list of Strings.
Recognizes predefined patterns in strings.
A feature generator implementation that generates suffix-based features.
A GeneratorFactory that produces SuffixFeatureGenerator instances when SuffixFeatureGeneratorFactory.create() is called.
This class implements the stemming algorithm defined by a snowball script.
A thread-safe version of the ChunkerME.
A thread-safe version of the LanguageDetectorME.
A thread-safe version of the LemmatizerME.
A thread-safe version of NameFinderME.
A thread-safe version of the POSTaggerME.
A thread-safe version of SentenceDetectorME.
A thread-safe version of TokenizerME.
Generates features for the class of a token.
A GeneratorFactory that produces TokenClassFeatureGenerator instances when TokenClassFeatureGeneratorFactory.create() is called.
Generates a feature which contains a token itself.
A GeneratorFactory that produces TokenFeatureGenerator instances when TokenFeatureGeneratorFactory.create() is called.
A cross validator for tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides Tokenizer default implementation and resources.
A Tokenizer for converting raw text into separated tokens.
The TokenizerModel is the model used by a learnable Tokenizer.
Cross validator for TokenNameFinder.
The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.
The factory that provides TokenNameFinder default implementations and resources.
The TokenNameFinderModel is the model used by a learnable TokenNameFinder.
Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.
A GeneratorFactory that instantiates TokenPatternFeatureGenerator instances when TokenPatternFeatureGeneratorFactory.create() is called.
Class which produces an Iterator<TokenSample> from a file of space delimited tokens.
This class is a stream filter which reads in string encoded samples and creates samples out of them.
This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.
A factory to initialize Trainer instances depending on a trainer type configured via Parameters.
Adds trigram features based on tokens and token classes.
A GeneratorFactory that produces TrigramNameFeatureGenerator instances when TrigramNameFeatureGeneratorFactory.create() is called.
This class implements the stemming algorithm defined by a snowball script.
A TwitterCharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.
An InputStream which cannot be closed.
A UrlCharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.
The Version class represents the OpenNLP Tools library version.
This stream formats ObjectStream of samples into whitespace separated token strings.
Generates previous (left-sided) and next (right-sided) features for a given AdaptiveFeatureGenerator.
A GeneratorFactory that produces WindowFeatureGenerator instances when WindowFeatureGeneratorFactory.create() is called.
An AdaptiveFeatureGenerator implementation of a word cluster feature generator.
Defines a word cluster GeneratorFactory; it reads an element containing 'w2vwordcluster' as a tag name.
A stream filter which reads a sentence per line that contains words and tags in word_tag format and outputs POSSample objects.
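The word_tag format mentioned in the last entry can be illustrated with a small parser that splits one sentence line into parallel word and tag arrays. This is a sketch in plain Java rather than the OpenNLP stream itself; the class name WordTagSketch, the returned String[][] shape, and the choice to split on the last underscore are assumptions.

```java
// Hypothetical helper, not part of OpenNLP: parses one "word_tag word_tag ..." line.
public class WordTagSketch {

    /** Returns {words, tags} for a whitespace separated line of word_tag tokens. */
    public static String[][] parse(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[][] {new String[0], new String[0]};
        }
        String[] tokens = trimmed.split("\\s+");
        String[] words = new String[tokens.length];
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Assumption: split on the last underscore so words may contain '_' themselves.
            int split = tokens[i].lastIndexOf('_');
            if (split < 0) {
                throw new IllegalArgumentException("Missing tag in token: " + tokens[i]);
            }
            words[i] = tokens[i].substring(0, split);
            tags[i] = tokens[i].substring(split + 1);
        }
        return new String[][] {words, tags};
    }

    public static void main(String[] args) {
        String[][] parsed = parse("The_DT dog_NN barks_VBZ");
        System.out.println(String.join(" ", parsed[0]));
        System.out.println(String.join(" ", parsed[1]));
    }
}
```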