All Classes and Interfaces (Apache OpenNLP :: Core)

Class

Description

Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.

AbstractContextGenerator

Abstract class containing many of the methods used to generate contexts for parsing.

AbstractEventStream<T>

A base ObjectStream implementation for events.

AbstractParserEventStream

Abstract class extended by parser event streams which perform tagging and chunking.

AdditionalContextFeatureGenerator

The AdditionalContextFeatureGenerator generates the context from the passed in additional context.

AggregateCharSequenceNormalizer

A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.

AggregatedFeatureGenerator

The AggregatedFeatureGenerator aggregates a set of feature generators and calls them to generate the features.

AggregatedFeatureGeneratorFactory

A GeneratorFactory that produces AggregatedFeatureGenerator instances when AggregatedFeatureGeneratorFactory.create() is called.

Among

Internal class used by Snowball stemmers.

AncoraSpanishHeadRules

Class for storing the Ancora Spanish head rules associated with parsing.

AncoraSpanishHeadRules.HeadRulesSerializer

arabicStemmer

This class implements the stemming algorithm defined by a snowball script.

AttachContextGenerator

Generates predictive contexts for deciding how constituents should be attached.

Attributes

The Attributes class stores name value pairs.

BagOfWordsFeatureGenerator

Generates a feature for each word in a document.

BaseModel

This is a common base model which can be used by the components' specific model classes.

BaseToolFactory

Base class for all tool factories.

BigramNameFeatureGenerator

Adds bigram features based on tokens and token classes.

BigramNameFeatureGeneratorFactory

A GeneratorFactory that produces BigramNameFeatureGenerator instances when BigramNameFeatureGeneratorFactory.create() is called.

BilouCodec

The default SequenceCodec implementation according to the BILOU scheme.

BilouNameFinderSequenceValidator

A SequenceValidator implementation for the BilouCodec.

BioCodec

The default SequenceCodec implementation according to the BIO scheme:
- B: 'beginning' of a NE
- I: 'inside', the word is inside a NE
- O: 'outside', the word is a regular word outside a NE

See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
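
The BIO scheme can be illustrated with a small, self-contained sketch. This is not the BioCodec implementation: the class `BioSketch`, the method `encode`, and the `B-`/`I-`/`O` label spelling are assumptions made for illustration, and spans are taken to be non-overlapping `{start, endExclusive}` token index pairs of a single entity type.

```java
import java.util.Arrays;

final class BioSketch {
    // Hypothetical helper: tags tokenCount tokens given entity spans of one type.
    static String[] encode(int tokenCount, int[][] spans, String type) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");                  // default: outside any named entity
        for (int[] span : spans) {
            tags[span[0]] = "B-" + type;         // first token of the entity
            for (int i = span[0] + 1; i < span[1]; i++) {
                tags[i] = "I-" + type;           // remaining tokens inside the entity
            }
        }
        return tags;
    }
}
```

For five tokens with one entity covering tokens 1 and 2, the sketch yields `O, B-person, I-person, O, O`.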

BrownBigramFeatureGenerator

Generates Brown cluster features for token bigrams.

BrownCluster

Class to load a Brown cluster document in the format: word\tword_class\tprob.
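
A minimal sketch of reading lines in that three-column, tab-separated layout is shown below. It does not reproduce how BrownCluster itself loads data: the class `BrownLineSketch`, the method `parse`, and the policy of silently skipping malformed lines are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class BrownLineSketch {
    // Maps each word to its word class; the third column is ignored in this sketch.
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> wordToClass = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length != 3) {
                continue;                        // assumed policy: skip malformed lines
            }
            wordToClass.put(cols[0], cols[1]);
        }
        return wordToClass;
    }
}
```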

BrownCluster.BrownClusterSerializer

BrownClusterBigramFeatureGeneratorFactory

A GeneratorFactory that produces BrownBigramFeatureGenerator instances when BrownClusterBigramFeatureGeneratorFactory.create() is called.

BrownClusterTokenClassFeatureGeneratorFactory

A GeneratorFactory that produces BrownTokenClassFeatureGenerator instances when BrownClusterTokenClassFeatureGeneratorFactory.create() is called.

BrownClusterTokenFeatureGeneratorFactory

A GeneratorFactory that produces BrownTokenFeatureGenerator instances when BrownClusterTokenFeatureGeneratorFactory.create() is called.

BrownTokenClasses

Obtain the paths listed in the pathLengths array from the Brown class.

BrownTokenClassFeatureGenerator

Generates BrownCluster features for current token and token class.

BrownTokenFeatureGenerator

Generates BrownCluster features for current token.

BuildContextGenerator

Generates predictive contexts for deciding how constituents should be combined.

BuildContextGenerator

Creates the features or contexts for the building phase of parsing.

ByteArraySerializer

An ArtifactSerializer implementation for binary data, kept in byte[].

CachedFeatureGenerator

Caches features of the aggregated generators.

CachedFeatureGeneratorFactory

A GeneratorFactory that produces CachedFeatureGenerator instances when CachedFeatureGeneratorFactory.create() is called.

catalanStemmer

This class implements the stemming algorithm defined by a snowball script.

CharacterNgramFeatureGenerator

The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.

CharacterNgramFeatureGeneratorFactory

A GeneratorFactory that produces CharacterNgramFeatureGenerator instances when CharacterNgramFeatureGeneratorFactory.create() is called.

CheckContextGenerator

Generates predictive context for deciding when a constituent is complete.

CheckContextGenerator

Generates predictive context for deciding when a constituent is complete.

ChunkContextGenerator

Creates predictive context for the pre-chunking phases of parsing.

ChunkerCrossValidator

Cross validator for Chunker.

ChunkerEvaluator

The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.

ChunkerEventStream

Class for creating an event stream out of data files for training a Chunker.

ChunkerFactory

ChunkerME

The class represents a maximum-entropy-based Chunker.

ChunkerModel

The ChunkerModel is the model used by a learnable Chunker.

ChunkerModelSerializer

An ArtifactSerializer implementation for models.

ChunkSampleSequenceStream

A SequenceStream implementation encapsulating samples.

ChunkSampleStream

Parses the CoNLL 2000 shared task shallow parser training data.

ChunkSampleStream

CollectionObjectStream<E>

An ObjectStream implementation that works on a Collection of E as source for elements.

ConfigurablePOSContextGenerator

A configurable context generator for a POSTagger.

Cons

Holds feature information about a specific Parse node.

CrossValidationPartitioner<E>

Provides access to training and test partitions for n-fold cross validation.
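
The idea behind n-fold partitioning can be sketched without the library. The class `FoldSketch`, the method `split`, and the modulo-based assignment of elements to folds are assumptions made for this illustration and are not claimed to match what CrossValidationPartitioner does internally.

```java
import java.util.ArrayList;
import java.util.List;

final class FoldSketch {
    // Returns [training, test] for one fold; element i is a test element when i % folds == fold.
    static <E> List<List<E>> split(List<E> items, int folds, int fold) {
        List<E> training = new ArrayList<>();
        List<E> test = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i % folds == fold) {
                test.add(items.get(i));
            } else {
                training.add(items.get(i));
            }
        }
        List<List<E>> result = new ArrayList<>();
        result.add(training);
        result.add(test);
        return result;
    }
}
```

Calling `split` once per fold index from 0 to `folds - 1` places every element in exactly one test partition.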

CrossValidationPartitioner.TrainingSampleStream<E>

The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.

danishStemmer

This class implements the stemming algorithm defined by a snowball script.

DefaultChunkerContextGenerator

Features based on the chunking model described by Fei Sha and Fernando Pereira.

DefaultChunkerSequenceValidator

The default chunker SequenceValidator implementation.

DefaultEndOfSentenceScanner

Default implementation of the EndOfSentenceScanner.

DefaultLanguageDetectorContextGenerator

A context generator for a language detector.

DefaultLemmatizerContextGenerator

Simple feature generator for learning statistical lemmatizers.

DefaultLemmatizerSequenceValidator

The default lemmatizer SequenceValidator implementation.

DefaultNameContextGenerator

A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.

DefaultPOSContextGenerator

A default context generator for a POSTagger.

DefaultPOSSequenceValidator

The default POS tagger SequenceValidator implementation.

DefaultSDContextGenerator

Generate event contexts for maxent decisions for sentence detection.

DefaultTokenContextGenerator

A default TokenContextGenerator which produces events for maxent decisions for tokenization.

DefinitionFeatureGeneratorFactory

A GeneratorFactory that produces OutcomePriorFeatureGenerator instances when DefinitionFeatureGeneratorFactory.create() is called.

DetokenizationDictionary

DetokenizationDictionary.Operation

Dictionary

An iterable and serializable dictionary implementation.

DictionaryDetokenizer

A rule based detokenizer.

DictionaryEntryPersistor

A persistor used for reading and writing dictionaries of all kinds.

DictionaryFeatureGenerator

The DictionaryFeatureGenerator uses a DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.

DictionaryFeatureGeneratorFactory

A GeneratorFactory that produces DictionaryFeatureGenerator instances when DictionaryFeatureGeneratorFactory.create() is called.

DictionaryLemmatizer

A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line:

DictionaryNameFinder

This is a Dictionary based name finder.

DictionarySerializer

An ArtifactSerializer implementation for dictionaries.

DoccatCrossValidator

Cross validator for DocumentCategorizer.

DoccatFactory

The factory that provides Doccat default implementations and resources.

DoccatModel

A model for document categorization.

DocumentBeginFeatureGenerator

This feature generator creates document begin features.

DocumentBeginFeatureGeneratorFactory

A GeneratorFactory that produces DocumentBeginFeatureGenerator instances when DocumentBeginFeatureGeneratorFactory.create() is called.

DocumentCategorizerEvaluator

The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.

DocumentCategorizerEventStream

Iterator-like class for modeling document classification events.

DocumentCategorizerME

A Max-Ent based implementation of DocumentCategorizer.

DocumentSampleStream

Reads in string encoded training samples, parses them and outputs DocumentSample objects.

DownloadUtil

This class facilitates the downloading of pretrained OpenNLP models.

dutchStemmer

This class implements the stemming algorithm defined by a snowball script.

EmojiCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of emojis.

EmptyLinePreprocessorStream

ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing

This stream should be used by the components that mark empty lines to mark document boundaries.
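
The first three rules listed above can be sketched over an in-memory list of lines. This is an illustration only: the class `EmptyLineSketch` and the method `clean` are hypothetical, the real class operates on an ObjectStream, and the item marked TODO is not implemented here.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyLineSketch {
    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean lastWasEmpty = true;             // starts true so leading empty lines are skipped
        for (String line : lines) {
            boolean empty = line.trim().isEmpty();   // whitespace-only lines count as empty
            if (!empty) {
                out.add(line);
            } else if (!lastWasEmpty) {
                out.add("");                     // keep a single empty line per run
            }
            lastWasEmpty = empty;
        }
        return out;
    }
}
```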

englishStemmer

This class implements the stemming algorithm defined by a snowball script.

EntityLinkerFactory

Generates EntityLinker instances via a properties file configuration.

Entry

An Entry is a StringList which can optionally be mapped to attributes.

EntryInserter

Evaluator<T>

An abstract base class for evaluators.

EventTraceStream

Factory

FeatureGeneratorUtil

This class provides common utilities for feature generation.

FilterObjectStream<S,T>

Abstract base class for filtering streams.

finnishStemmer

This class implements the stemming algorithm defined by a snowball script.

frenchStemmer

This class implements the stemming algorithm defined by a snowball script.

GeneratorFactory

Creates a set of feature generators based on a provided XML descriptor.

GeneratorFactory.AbstractXmlFeatureGeneratorFactory

GenericModelSerializer

An ArtifactSerializer implementation for models.

germanStemmer

This class implements the stemming algorithm defined by a snowball script.

Glove

GloVe is an unsupervised learning algorithm for obtaining vector representations for words.

greekStemmer

This class implements the stemming algorithm defined by a snowball script.

HeadRules

Class for storing the English HeadRules associated with parsing.

HeadRules.HeadRulesSerializer

hungarianStemmer

This class implements the stemming algorithm defined by a snowball script.

Index

This class indexes string lists.

indonesianStemmer

This class implements the stemming algorithm defined by a snowball script.

InSpanGenerator

Generates features if the tokens are recognized by the provided TokenNameFinder.

irishStemmer

This class implements the stemming algorithm defined by a snowball script.

italianStemmer

This class implements the stemming algorithm defined by a snowball script.

LanguageDetectorConfig

LanguageDetectorCrossValidator

Cross validator for LanguageDetector.

LanguageDetectorEvaluator

The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.

LanguageDetectorEventStream

Iterator-like class for modeling an event stream of samples.

LanguageDetectorFactory

Default factory used by LanguageDetector.

LanguageDetectorME

Implements a learnable LanguageDetector.

LanguageDetectorModel

The LanguageDetectorModel is the model used by a learnable LanguageDetector.

LanguageDetectorSampleStream

This class reads in string encoded training samples, parses them and outputs LanguageSample objects.

LemmaSampleEventStream

Class for creating an event stream out of data files for training a probabilistic Lemmatizer.

LemmaSampleSequenceStream

A SequenceStream implementation encapsulating samples.

LemmaSampleStream

Reads data for training and testing the Lemmatizer.

LemmatizerEvaluator

The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.

LemmatizerFactory

The factory that provides Lemmatizer default implementation and resources.

LemmatizerME

A probabilistic Lemmatizer implementation.

LemmatizerModel

The LemmatizerModel is the model used by a learnable Lemmatizer.

LogPrintStream

This class serves as an adapter for a Logger used within a PrintStream.

ModelUtil

Utility class for handling of models.

MutableInt

This is a non-thread-safe mutable int.

NameFinderEventStream

Class for creating an event stream out of data files for training a TokenNameFinder.

NameFinderME

A maximum-entropy-based name finder implementation.

NameFinderSequenceValidator

The default name finder SequenceValidator implementation.

NameSampleDataStream

The NameSampleDataStream class converts tagged tokens provided by a DataStream to NameSample objects.

NameSampleSequenceStream

A SequenceStream implementation encapsulating samples.

NameSampleTypeFilter

A stream which removes name samples which do not have a certain type.

NewlineSentenceDetector

The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.
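
A rough sketch of the one-sentence-per-non-empty-line rule follows. The class `NewlineSentencesSketch` and the method `detect` are hypothetical, and treating whitespace-only lines as empty is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class NewlineSentencesSketch {
    static List<String> detect(String text) {
        List<String> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {  // \R matches any line break sequence
            if (!line.trim().isEmpty()) {        // assumption: whitespace-only lines are skipped
                sentences.add(line);
            }
        }
        return sentences;
    }
}
```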

NGramCharModel

The NGramCharModel can be used to create character ngrams.

NGramFeatureGenerator

Generates ngram features for a document.

NGramGenerator

Generates n-grams, joined by an optional separator, and returns them as a list of strings.
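
To make the idea concrete, here is a minimal sliding-window sketch. The class `NGramSketch` and the signature of `generate` are assumptions for illustration and do not mirror the NGramGenerator API.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {
    // Joins every run of n consecutive tokens with the given separator.
    static List<String> generate(List<String> tokens, int n, String separator) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return grams;
    }
}
```

With tokens `a, b, c`, `n = 2`, and separator `-`, the sketch returns `a-b` and `b-c`; inputs shorter than `n` yield an empty list.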

NGramLanguageModel

A LanguageModel based on an NGramModel using Stupid Backoff to get the probabilities of the ngrams.
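
Stupid Backoff scores a word by its relative frequency after the given context when the full n-gram has been seen, and otherwise multiplies a fixed factor by the score under a shortened context. The sketch below is a generic version of that technique, not the NGramLanguageModel code: the class `StupidBackoffSketch`, its methods, and the factor 0.4 (the value commonly used in the literature) are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class StupidBackoffSketch {
    private static final double ALPHA = 0.4;     // assumed backoff factor
    private final Map<String, Integer> counts = new HashMap<>();
    private int totalTokens = 0;

    // Counts every n-gram of order 1..maxOrder in one tokenized sentence.
    void add(String[] tokens, int maxOrder) {
        totalTokens += tokens.length;
        for (int n = 1; n <= maxOrder; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                String key = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                counts.merge(key, 1, Integer::sum);
            }
        }
    }

    // Scores the last token given the preceding ones; a relative score, not a normalized probability.
    double score(String[] ngram) {
        if (ngram.length == 1) {
            return totalTokens == 0 ? 0.0 : counts.getOrDefault(ngram[0], 0) / (double) totalTokens;
        }
        int full = counts.getOrDefault(String.join(" ", ngram), 0);
        String context = String.join(" ", Arrays.copyOfRange(ngram, 0, ngram.length - 1));
        int contextCount = counts.getOrDefault(context, 0);
        if (full > 0 && contextCount > 0) {
            return full / (double) contextCount;
        }
        // Unseen n-gram: drop the oldest context word and discount.
        return ALPHA * score(Arrays.copyOfRange(ngram, 1, ngram.length));
    }
}
```

Because the discounted scores are not renormalized, the values are useful for ranking but do not sum to one.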

NGramModel

The NGramModel can be used to create ngrams and character ngrams.

NGramUtils

Utility class for ngrams.

norwegianStemmer

This class implements the stemming algorithm defined by a snowball script.

NumberCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of numbers.

OutcomePriorFeatureGenerator

The definition feature maps the underlying distribution of outcomes.

Parser

A shift-reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.

Parser

A built-attach Parser implementation.

ParserChunkerFactory

ParserChunkerSequenceValidator

The parser chunker SequenceValidator implementation.

ParserEventStream

Wrapper class for one of four shift-reduce parser event streams.

ParserEventStream

Wrapper class for one of four built-attach parser event streams.

ParserFactory

ParserModel

This is the default ParserModel implementation.

ParseSampleStream

porterStemmer

This class implements the stemming algorithm defined by a snowball script.

PorterStemmer

A Stemmer implementing the Porter Stemming Algorithm.

portugueseStemmer

This class implements the stemming algorithm defined by a snowball script.

POSDictionary

Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.

POSEvaluator

The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.

POSModel

The POSModel is the model used by a learnable POSTagger.

POSModelSerializer

An ArtifactSerializer implementation for models.

POSSampleEventStream

Reads the samples from an Iterator and converts those samples into events which can be used by the maxent library for training.

POSSampleSequenceStream

A SequenceStream implementation encapsulating samples.

PosSampleStream

POSTagFormat

Defines the format for part-of-speech tagging.

POSTagFormatMapper

A mapping implementation for converting between different POS tag formats.

POSTagFormatMapper.NoOp

POSTaggerCrossValidator

POSTaggerFactory

The factory that provides POSTagger default implementations and resources.

POSTaggerFactory.POSDictionarySerializer

PosTaggerFeatureGenerator

A POS tagging driven feature generator.

PosTaggerFeatureGeneratorFactory

A GeneratorFactory that produces PosTaggerFeatureGenerator instances when PosTaggerFeatureGeneratorFactory.create() is called.

POSTaggerME

A part-of-speech tagger implementation that uses maximum entropy.

POSTaggerNameFeatureGenerator

Adds the token POS tag as feature.

POSTaggerNameFeatureGeneratorFactory

A GeneratorFactory that produces POSTaggerNameFeatureGenerator instances when POSTaggerNameFeatureGeneratorFactory.create() is called.

PrefixFeatureGenerator

A feature generator implementation that generates prefix-based features.

PrefixFeatureGeneratorFactory

A GeneratorFactory that produces PrefixFeatureGenerator instances when PrefixFeatureGeneratorFactory.create() is called.

PreviousMapFeatureGenerator

This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.

PreviousMapFeatureGeneratorFactory

A GeneratorFactory that produces PreviousMapFeatureGenerator instances when PreviousMapFeatureGeneratorFactory.create() is called.

PreviousTwoMapFeatureGenerator

This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.

ProbingLanguageDetectionResult

A data container encapsulating language detection results.

RegexNameFinder

A TokenNameFinder implementation based on a series of regular expressions.

RegexNameFinderFactory

Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.

RegexNameFinderFactory.DEFAULT_REGEX_NAME_FINDER

Enumeration of typical regular expressions available in OpenNLP.

RegexNameFinderFactory.RegexAble

romanianStemmer

This class implements the stemming algorithm defined by a snowball script.

russianStemmer

This class implements the stemming algorithm defined by a snowball script.

SDCrossValidator

A cross validator for sentence detectors.

SDEventStream

SentenceContextGenerator

Creates contexts/features for end-of-sentence detection in Thai text.

SentenceDetectorEvaluator

The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.

SentenceDetectorFactory

The factory that provides SentenceDetector default implementations and resources.

SentenceDetectorME

A sentence detector for splitting up raw text into sentences.

SentenceFeatureGenerator

This feature generator creates sentence begin and end features.

SentenceFeatureGeneratorFactory

A GeneratorFactory that produces SentenceFeatureGenerator instances when SentenceFeatureGeneratorFactory.create() is called.

SentenceModel

The SentenceModel is the model used by a learnable SentenceDetector.

SentenceSampleStream

This class is a stream filter which reads one-sentence-per-line samples from an ObjectStream and converts them into SentenceSample objects.

SentimentContextGenerator

Class for using a Context Generator for Sentiment Analysis.

SentimentCrossValidator

Class for performing cross validation on the Sentiment Analysis Parser.

SentimentEvaluator

The SentimentEvaluator measures the performance of the given SentimentME with the provided reference SentimentSamples.

SentimentEventStream

Class for creating events for Sentiment Analysis that are later sent to MaxEnt.

SentimentFactory

Class for creating sentiment factories for training.

SentimentME

A SentimentDetector implementation for creating and using maximum-entropy-based Sentiment Analysis models.

SentimentModel

Class for the basis of the Sentiment Analysis model.

SentimentSampleStream

Class for converting Strings from a data stream into SentimentSample objects using tokenized text.

SentimentSampleTypeFilter

Class for creating a type filter.

ShrinkCharSequenceNormalizer

A CharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.

SimpleTokenizer

A basic Tokenizer implementation which performs tokenization using character classes.
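
Tokenizing by character class can be sketched as follows: a token ends wherever the class of the next character differs from the class of the current run, and whitespace runs are dropped. The class `CharClassTokenizerSketch` and its four classes (whitespace, letter, digit, other) are assumptions for illustration and may differ in detail from SimpleTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

final class CharClassTokenizerSketch {
    private static int classOf(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;                                // punctuation and everything else
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            // Close the current run at the end of the text or when the character class changes.
            if (i == text.length() || classOf(text.charAt(i)) != classOf(text.charAt(start))) {
                if (classOf(text.charAt(start)) != 0) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
        }
        return tokens;
    }
}
```

For the input `Rate: 42%` the sketch produces the tokens `Rate`, `:`, `42`, and `%`.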

SnowballProgram

Base class for a snowball stemmer.

SnowballStemmer

SnowballStemmer.ALGORITHM

spanishStemmer

This class implements the stemming algorithm defined by a snowball script.

StringInterners

Provides string interning utility methods.

StringList

A StringList is an immutable list of Strings.

StringPattern

Recognizes predefined patterns in strings.

SuffixFeatureGenerator

A feature generator implementation that generates suffix-based features.

SuffixFeatureGeneratorFactory

A GeneratorFactory that produces SuffixFeatureGenerator instances when SuffixFeatureGeneratorFactory.create() is called.

swedishStemmer

This class implements the stemming algorithm defined by a snowball script.

ThreadSafeChunkerME

A thread-safe version of the ChunkerME.

ThreadSafeLanguageDetectorME

A thread-safe version of the LanguageDetectorME.

ThreadSafeLemmatizerME

A thread-safe version of the LemmatizerME.

ThreadSafeNameFinderME

A thread-safe version of NameFinderME.

ThreadSafePOSTaggerME

A thread-safe version of the POSTaggerME.

ThreadSafeSentenceDetectorME

A thread-safe version of SentenceDetectorME.

ThreadSafeTokenizerME

A thread-safe version of TokenizerME.

TokenClassFeatureGenerator

Generates features for the class of a token.

TokenClassFeatureGeneratorFactory

A GeneratorFactory that produces TokenClassFeatureGenerator instances when TokenClassFeatureGeneratorFactory.create() is called.

TokenFeatureGenerator

Generates a feature which contains a token itself.

TokenFeatureGeneratorFactory

A GeneratorFactory that produces TokenFeatureGenerator instances when TokenFeatureGeneratorFactory.create() is called.

TokenizerCrossValidator

A cross validator for tokenizers.

TokenizerEvaluator

The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.

TokenizerFactory

The factory that provides Tokenizer default implementation and resources.

TokenizerME

A Tokenizer for converting raw text into separated tokens.

TokenizerModel

The TokenizerModel is the model used by a learnable Tokenizer.

TokenizerStream

The TokenizerStream uses a Tokenizer to tokenize the input string and output samples.

TokenNameFinderCrossValidator

Cross validator for TokenNameFinder.

TokenNameFinderEvaluator

The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.

TokenNameFinderFactory

The factory that provides TokenNameFinder default implementations and resources.

TokenNameFinderModel

The TokenNameFinderModel is the model used by a learnable TokenNameFinder.

TokenNameFinderModel.FeatureGeneratorCreationError

TokenPatternFeatureGenerator

Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.

TokenPatternFeatureGeneratorFactory

A GeneratorFactory that produces TokenPatternFeatureGenerator instances when TokenPatternFeatureGeneratorFactory.create() is called.

TokenSampleStream

Class which produces an Iterator<TokenSample> from a file of space-delimited tokens.

TokenSampleStream

This class is a stream filter which reads in string encoded samples and creates samples out of them.

TokSpanEventStream

This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.

TrainerFactory

A factory to initialize Trainer instances depending on a trainer type configured via Parameters.

TrainerFactory.TrainerType

TrigramNameFeatureGenerator

Adds trigram features based on tokens and token classes.

TrigramNameFeatureGeneratorFactory

A GeneratorFactory that produces TrigramNameFeatureGenerator instances when TrigramNameFeatureGeneratorFactory.create() is called.

turkishStemmer

This class implements the stemming algorithm defined by a snowball script.

TwitterCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.

UncloseableInputStream

An InputStream which cannot be closed.

UrlCharSequenceNormalizer

A CharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.

Version

The Version class represents the OpenNLP Tools library version.

WhitespaceTokenStream

This stream formats ObjectStream of samples into whitespace separated token strings.

WindowFeatureGenerator

Generates previous (left-sided) and next (right-sided) features for a given AdaptiveFeatureGenerator.

WindowFeatureGeneratorFactory

A GeneratorFactory that produces WindowFeatureGenerator instances when WindowFeatureGeneratorFactory.create() is called.

WordClusterDictionary

WordClusterDictionary.WordClusterDictionarySerializer

WordClusterFeatureGenerator

An AdaptiveFeatureGenerator implementation of a word cluster feature generator.

WordClusterFeatureGeneratorFactory

Defines a word cluster GeneratorFactory; it reads an element containing 'w2vwordcluster' as a tag name.

WordTagSampleStream

A stream filter which reads one sentence per line, with words and tags in word_tag format, and outputs POSSample objects.
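
The word_tag layout can be illustrated with a short parsing sketch. The class `WordTagSketch`, the method `parse`, and the choice to split each token at its last underscore (so that words may themselves contain underscores) are assumptions of this sketch, not a description of WordTagSampleStream internals.

```java
final class WordTagSketch {
    // Returns {words, tags} for one whitespace-separated sentence of word_tag tokens.
    static String[][] parse(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        String[] words = new String[tokens.length];
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            int split = tokens[i].lastIndexOf('_');
            if (split < 0) {
                throw new IllegalArgumentException("Missing tag in token: " + tokens[i]);
            }
            words[i] = tokens[i].substring(0, split);
            tags[i] = tokens[i].substring(split + 1);
        }
        return new String[][] {words, tags};
    }
}
```

For the line `The_DT dog_NN barks_VBZ` the sketch returns the words `The, dog, barks` and the tags `DT, NN, VBZ`.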

XmlUtil