All Classes and Interfaces
Class
Description
Abstract class which contains code to tag and chunk parses for bottom-up parsing and
leaves the implementation of advancing and completing parses to extending classes.
Abstract class containing many of the methods used to generate contexts for parsing.
Abstract DataIndexer implementation for collecting event and context counts used in training.
A basic EventModelSequenceTrainer implementation that processes events.
A base ObjectStream implementation for events.
A basic EventTrainer implementation.
A basic MaxentModel implementation.
An abstract, basic implementation of a model reader.
An abstract, basic implementation of a model writer.
A base ObjectStream implementation.
Abstract class extended by parser event streams which perform tagging and chunking.
Base class for sample stream factories.
An interface for generating features for name entity identification and for
updating document level contexts.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, with output for
Portuguese Chunker training.
A factory to create an Arvores Deitadas ChunkStream from the command line utility.
The AdditionalContextFeatureGenerator generates the context from the passed-in additional context.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, with output for
Portuguese NER training.
A factory to create an Arvores Deitadas NameSampleDataStream from the command line utility.
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Stream filter which merges text lines into sentences, following the Arvores
Deitadas syntax.
Parses a sample of AD corpus.
Represents the AD leaf
Represents the AD node
Represents a tree element, Node or Leaf
Note:
Do not use this class, internal use only!
A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.
The AggregatedFeatureGenerator aggregates a set of AdaptiveFeatureGenerators and calls them to generate the features.
Class for storing the Ancora Spanish head rules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
Utility class for simple vector arithmetic.
Provides access to model persisted artifacts.
Responsible for creating an artifact from an InputStream.
Generates predictive contexts for deciding how constituents should be attached.
The Attributes class stores name-value pairs.
Generates a feature for each word in a document.
Represents a minimal tuple of information.
This is a common base model which can be used by the components' specific
model classes.
Base class for all tool factories.
A ContextGenerator implementation for maxent decisions, assuming that the input
given to the BasicContextGenerator.getContext(String) method is a String containing
contextual predicates separated by spaces.
Common format parameters.
Common training parameters.
Performs k-best search over a sequence.
Interface for context generators used with a sequence beam search.
The default SequenceCodec implementation according to the BILOU scheme.
A SequenceValidator implementation for the BilouCodec.
A DataReader that reads files from a binary format.
A GISModelReader that reads models from a binary format.
A GISModelWriter that writes models in a binary format.
A NaiveBayesModelReader that reads models from a binary format.
A NaiveBayesModelWriter that writes models in a binary format.
A PerceptronModelReader that reads models from a binary format.
A PerceptronModelWriter that writes models in a binary format.
A QNModelReader that reads models from a binary format.
A QNModelWriter that writes models in a binary format.
The default SequenceCodec implementation according to the BIO scheme:
B: 'beginning' of a NE
I: 'inside', the word is inside a NE
O: 'outside', the word is a regular word outside a NE
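The B/I/O tagging described above can be turned back into entity spans with a single pass over the tags. The following is a minimal sketch of that idea only; the class name BioDecodeSketch, the bare "B"/"I"/"O" tag strings, and the span representation are assumptions made for illustration and are not the OpenNLP SequenceCodec API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: decodes plain "B"/"I"/"O" tags.
public final class BioDecodeSketch {

    /** Returns [start, end) token index pairs, one per entity in the tag sequence. */
    public static List<int[]> decode(String[] tags) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means no entity is currently open
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            boolean continues = tag.equals("I") && start >= 0;
            if (!continues && start >= 0) {
                spans.add(new int[] {start, i}); // close the open entity
                start = -1;
            }
            if (tag.equals("B")) {
                start = i; // open a new entity
            }
            // A stray "I" without a preceding "B" is ignored in this sketch.
        }
        if (start >= 0) {
            spans.add(new int[] {start, tags.length}); // entity runs to the end
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tags = {"O", "B", "I", "O", "B"};
        for (int[] span : decode(tags)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

A real codec would also carry the entity type alongside each tag; that is left out here to keep the sketch short.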
See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
A sample stream for the training files of the BioNLP/NLPBA 2004 shared task.
Reads the annotations from the brat .ann annotation file.
Brat (brat rapid annotation tool) is based on the stav visualiser
which was originally made in order to visualise BioNLP'11 Shared Task data.
Generates Name Sample objects for a Brat Document object.
Generates Brown cluster features for token bigrams.
Class to load a Brown cluster document: word\tword_class\tprob
Generates Brown clustering features for token bigrams.
Generates Brown clustering features for token classes.
Generates Brown clustering features for current token.
Obtain the paths listed in the pathLengths array from the Brown class.
Generates BrownCluster features for current token and token class.
Generates BrownCluster features for current token.
Generates predictive contexts for deciding how constituents should be combined.
Creates the features or contexts for the building phase of parsing.
An ArtifactSerializer implementation for binary data, kept in byte[].
Provides fixed size, pre-allocated, least recently used replacement cache.
Caches features of the aggregated generators.
This class implements the stemming algorithm defined by a snowball script.
This tool helps create a loadable dictionary for the NameFinder, from data collected from US Census data.
The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.
A char sequence normalizer, used for adjusting (prune, substitute, add, etc.)
Generates predictive context for deciding when a constituent is complete.
Generates predictive context for deciding when a constituent is complete.
Trains a new check model.
Creates predictive context for the pre-chunking phases of parsing.
The interface for chunkers which provide chunk tags for a sequence of tokens.
Interface for a BeamSearchContextGenerator used in syntactic chunking.
Tool to convert multiple data formats into native OpenNLP chunker training format.
Cross validator for Chunker.
A marker interface for evaluating chunkers.
The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.
A default ChunkSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Class for creating an event stream out of data files for training a Chunker.
The class represents a maximum-entropy-based Chunker.
The ChunkerModel is the model used by a learnable Chunker.
Loads a ChunkerModel for the command line tools.
An ArtifactSerializer implementation for models.
Factory producing OpenNLP ChunkSampleStreams.
A default implementation of EvaluationMonitor that prints to an output stream.
Class for holding chunks for a single unit of text.
A SequenceStream implementation encapsulating samples.
Parses the CoNLL 2000 shared task shallow parser training data.
An ObjectStream implementation that works on a Collection of CollectionObjectStream as source for elements.
A maxent event representation which we can use to sort based on the
predicates indexes contained in the events.
A maxent predicate representation which we can use to sort based on the
outcomes.
A configurable context generator for a POSTagger.
Parser for the Dutch and Spanish NER training files of the CoNLL 2002 shared task.
Note:
Do not use this class, internal use only!
An import stream which can parse the CONLL03 data.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
The CoNLL-U Format is specified
here.
Note:
Do not use this class, internal use only!
Parses the data from the CONLL 06 shared task into POS Samples.
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Holds feature information about a specific Parse node.
Note:
Do not use this class, internal use only!
Holds constituents when reading parses.
Class which associates a real valued parameter or expected value with a particular contextual
predicate or feature.
Represents a generator of contexts for maxent decisions.
Provides access to training and test partitions for n-fold cross validation.
The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.
Common cross validator parameters.
This class implements the stemming algorithm defined by a snowball script.
Represents an indexer which compresses events in memory and performs feature selection.
A factory that produces DataIndexer instances.
Describes generic ways to read data from a DataInputStream.
An interface for objects which can deliver a stream of training data to be
supplied to an EventStream.
Features based on chunking model described in Fei Sha and Fernando Pereira.
The default chunker SequenceValidator implementation.
Default implementation of the EndOfSentenceScanner.
A context generator for language detector.
Simple feature generator for learning statistical lemmatizers.
The default lemmatizer SequenceValidator implementation.
A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.
A default context generator for a POSTagger.
The default POS tagger SequenceValidator implementation.
Generate event contexts for maxent decisions for sentence detection.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
A default implementation of EvaluationMonitor that prints to an output stream.
A Detokenizer merges tokens back to their detokenized representation.
This enum contains an operation for every token to merge the
tokens together to their detokenized form.
The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
Base class for factories which need a Detokenizer.
An iterable and serializable dictionary implementation.
A rule based detokenizer.
A persistor used for reading and writing dictionaries of all kinds.
The DictionaryFeatureGenerator uses the DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a line-oriented file.
This is a Dictionary based name finder.
An ArtifactSerializer implementation for dictionaries.
The directory sample stream allows for creating an ObjectStream<File> from a directory listing of files.
Tool to convert multiple data formats into native OpenNLP doccat training format.
Cross validator for DocumentCategorizer.
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating doccat.
A default DocumentSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Doccat default implementations and resources.
Generates a detailed report for the POS Tagger.
A model for document categorization
Loads a DoccatModel for the command line tools.
Interface for classes which categorize documents.
The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.
Iterator-like class for modeling document classification events.
A Max-Ent based implementation of DocumentCategorizer.
Interface for processing an entire document allowing a TokenNameFinder to use context from the entire document.
Class which holds a classified document and its category.
Reads in string encoded training samples, parses them and outputs DocumentSample objects.
Factory producing OpenNLP DocumentSampleStreams.
Reads a plain text file and returns each line as a String object.
This class facilitates the downloading of pretrained OpenNLP models.
The type of model.
This class implements the stemming algorithm defined by a snowball script.
An EmojiCharSequenceNormalizer implementation that normalizes text in terms of emojis.
ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing
This stream should be used by the components that use empty lines to mark document boundaries.
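The clean-up rules listed above can be sketched over an in-memory list of lines. This is only an illustration of the three implemented rules under stated assumptions: the class name EmptyLineCleanupSketch and the List-based signature are hypothetical, the actual class works on an ObjectStream, and the TODO item is deliberately not implemented here either.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: applies the empty-line rules to a list of lines.
public final class EmptyLineCleanupSketch {

    public static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Start as if an empty line was just seen, so leading empty lines are skipped.
        boolean lastWasEmpty = true;
        for (String line : lines) {
            boolean empty = line.trim().isEmpty(); // whitespace-only lines count as empty
            if (empty) {
                if (!lastWasEmpty) {
                    out.add(""); // keep one empty line per run of empty lines
                }
                lastWasEmpty = true;
            } else {
                out.add(line);
                lastWasEmpty = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cleaned = clean(Arrays.asList("", "doc one", "   ", "", "doc two"));
        System.out.println(cleaned);
    }
}
```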
Encoding parameter.
This class implements the stemming algorithm defined by a snowball script.
EntityLinkers establish connections with external data to enrich extracted
entities.
Generates EntityLinker instances via a properties file configuration.
Properties wrapper for EntityLinker implementations.
An Entry is a StringList which can optionally be mapped to attributes.
Parser for the Italian NER training files of the Evalita 2007 and 2009 NER shared tasks.
Note:
Do not use this class, internal use only!
This class encapsulates the variables used in producing probabilities from a model
and facilitates passing these variables to the eval method.
An abstract base class for evaluators.
Common evaluation parameters.
The context of a decision point during training.
A specialized Trainer that is based on an 'EventModelSequence' approach.
Indicates that a certain API feature is not stable
and might change with a new release.
The ExtensionLoader is responsible for loading extensions to the OpenNLP library.
Exception indicates that an OpenNLP extension could not be loaded.
Interface for generating features for document categorization.
The FeatureGeneratorResourceProvider provides access to the resources available in the model.
This class provides common utilities for feature generation.
Class for using a file of events as an event stream.
Note:
Do not use this class, internal use only!
Provides the ability to read the contents of files
contained in an object stream of files.
Abstract base class for filtering streams.
Common evaluation parameters.
This class implements the stemming algorithm defined by a snowball script.
The FMeasure is a utility class for evaluators which measures precision, recall and the resulting f-measure.
This class implements the stemming algorithm defined by a snowball script.
Interface for a function.
Represents a labeler for nodes which contain traces so that these traces can be predicted by a Parser.
Creates a set of feature generators based on a provided XML descriptor.
A generic AbstractModelReader implementation.
An ArtifactSerializer implementation for models.
A generic AbstractModelWriter implementation.
This class implements the stemming algorithm defined by a snowball script.
A maximum entropy model which has been trained using the Generalized
Iterative Scaling (GIS) procedure.
The base class for readers of GIS models.
The base class for writers of GIS models.
An implementation of Generalized Iterative Scaling (GIS).
GloVe is an unsupervised learning algorithm for obtaining vector representations for words.
This class implements the stemming algorithm defined by a snowball script.
A hash sum based AbstractObjectStream implementation.
Encoder for head rules associated with parsing.
Class for storing the English HeadRules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
This class indexes string lists.
This class implements the stemming algorithm defined by a snowball script.
Allows repeated reads through a stream for certain model building types.
Generates features if the tokens are recognized by the provided TokenNameFinder.
This exception indicates that the provided training data is
insufficient to train a desired model.
Classes, fields, or methods annotated @Internal are for OpenNLP internal use only.
This exception indicates that a resource violates the expected data format.
A structure to hold an Irish Sentence Bank document, which is a collection
of tokenized sentences.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Class for holding the document language and its confidence
The interface for LanguageDetector which predicts the Language for a context.
A context generator interface for LanguageDetector.
Tool to convert multiple data formats into native OpenNLP language detection
training format.
Cross validator for LanguageDetector.
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating language detectors.
The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.
A default LanguageSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Iterator-like class for modeling an event stream of samples.
Default factory used by LanguageDetector.
Generates a detailed report for the POS Tagger.
Implements a learnable LanguageDetector.
The LanguageDetectorModel is the model used by a learnable LanguageDetector.
Loads a LanguageDetectorModel for the command line tools.
This class reads in string encoded training samples, parses them and outputs LanguageSample objects.
Factory producing OpenNLP DocumentSampleStreams.
A language model can calculate the probability p (between 0 and 1) of a
certain sequence of tokens, given its underlying vocabulary.
Holds a classified document and its Language.
Stream factory for those streams which carry language.
Note:
Do not use this class, internal use only!
A default implementation of EvaluationMonitor that prints to an output stream.
Represents a lemmatized sentence.
Class for creating an event stream out of data files for training a probabilistic Lemmatizer.
A SequenceStream implementation encapsulating samples.
Reads data for training and testing the Lemmatizer.
The common interface for lemmatizers.
Interface for the context generator used for probabilistic Lemmatizer.
A marker interface for evaluating lemmatizers.
The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.
A default LemmaSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Lemmatizer default implementation and resources.
Generates a detailed report for the Lemmatizer.
A probabilistic Lemmatizer implementation.
The LemmatizerModel is the model used by a learnable Lemmatizer.
Loads a LemmatizerModel for the command line tools.
Factory producing OpenNLP LemmaSampleStreams.
A structure to hold the letsmt document.
A content handler to receive and process SAX events.
Class that performs line search to find minimum.
Represents a LineSearch result.
This class serves as an adapter for a Logger used within a PrintStream.
Class implementing the probability distribution over labels returned by
a classifier as a log of probabilities.
A class implementing the logarithmic Probability for a label.
A factory that creates MarkableFileInputStream from a File
A class to process the MASC Named entity stand-off annotation file
A class for parsing MASC's Penn tagging/tokenization stand-off annotation
Interface for maximum entropy models.
Calculates the arithmetic mean of values added with the Mean.add(double) method.
A helper class that handles Strings with more than 64k (65535 bytes) in length.
Enumeration of supported model types.
Utility class for handling of models.
Factory producing OpenNLP MosesSentenceSampleStream objects.
An extension of Context used to store parameters or expected values
associated with this context which can be updated or assigned.
This is a non-thread safe mutable int.
Interface that allows TagDictionary entries to be added and removed.
Specialized parameters for the evaluation of a naive bayes classifier
A MaxentModel implementation of the multinomial Naive Bayes classifier model.
The base class for readers of models.
The base class for NaiveBayesModel writers.
Trains models using the combination of the EM algorithm and a Naive Bayes classifier.
Interface for generating the context for a name finder by specifying a set of feature generators.
A default implementation of EvaluationMonitor that prints to an output stream.
This class helps to read the US Census data from the files to build a
StringList for each dictionary entry in the name-finder dictionary.
Class for creating an event stream out of data files for training a TokenNameFinder.
A maximum-entropy-based name finder implementation.
The default name finder SequenceValidator implementation.
Encapsulates names for a single unit of text.
Counts tokens, sentences and names by type.
The NameSampleDataStream class converts tagged strings provided by a DataStream to NameSample objects.
Factory producing OpenNLP NameSampleDataStreams.
A SequenceStream implementation encapsulating samples.
A stream which removes name samples which do not have a certain type.
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Evaluate negative log-likelihood and its gradient from DataIndexer.
The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.
The NGramCharModel can be used to create character ngrams.
Generates ngram features for a document.
Generates an nGram, via an optional separator, and returns the grams as a list
of strings
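As an illustration of what generating n-grams with a separator means, here is a minimal sliding-window sketch. The class name NGramSketch and the generate(tokens, n, separator) signature are assumptions for this example and should not be read as the signature of the OpenNLP class described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: joins each window of n tokens with a separator.
public final class NGramSketch {

    public static List<String> generate(List<String> tokens, int n, String separator) {
        List<String> grams = new ArrayList<>();
        if (n <= 0) {
            return grams; // no meaningful n-grams for non-positive n
        }
        // Slide a window of size n over the tokens; shorter inputs yield no grams.
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(generate(Arrays.asList("the", "quick", "fox"), 2, "-"));
    }
}
```

With the tokens "the", "quick", "fox", n = 2 and "-" as separator, the sketch yields "the-quick" and "quick-fox".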
A LanguageModel based on a NGramModel using Stupid Backoff to get the probabilities of the ngrams.
Command line tool for NGramLanguageModel.
The NGramModel can be used to create ngrams and character ngrams.
Utility class for ngrams.
The National corpus of Polish (NKJP) format.
This class implements the stemming algorithm defined by a snowball script.
A NumberCharSequenceNormalizer implementation that normalizes text in terms of numbers.
A DataReader implementation based on ObjectInputStream.
Reads objects from a stream.
A DataIndexer for maxent model data which handles cutoffs for uncommon
contextual predicates and provides a unique integer index for each of the
predicates.
A DataIndexer for maxent model data which handles cutoffs for uncommon
contextual predicates and provides a unique integer index for each of the
predicates and maintains event values.
Name Sample Stream parser for the OntoNotes 4.0 corpus.
The definition feature maps the underlying distribution of outcomes.
A FilterObjectStream which merges text lines into paragraphs.
Evaluate negative log-likelihood and its gradient in parallel
Data structure for holding parse constituents.
A shift reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.
Defines common methods for full-syntactic parsers.
A built-attach Parser implementation.
The parser chunker SequenceValidator implementation.
Tool to convert multiple data formats into native OpenNLP parser format.
Cross validator for a Parser.
A marker interface for evaluating parsers.
This implementation of Evaluator<Parse> behaves like EVALB with no exceptions,
e.g., without removing punctuation tags, or equality between ADVP and PRT, as in the COLLINS convention.
A default Parse-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Wrapper class for one of four shift-reduce parser event streams.
Wrapper class for one of four built-attach parser event streams.
Enumeration of event types for a Parser.
This is the default ParserModel implementation.
Loads a ParserModel for the command line tools.
Enumeration of supported Parser types.
Factory producing OpenNLP ParseSampleStreams.
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
A model implementation based on the perceptron algorithm.
The base class for readers of models.
The base class for PerceptronModel writers.
Trains models using the perceptron algorithm.
Reads a plain text file and returns each line as a String object.
A generic DataReader implementation for plain text files.
A NaiveBayesModelReader that reads models from a plain text format.
A NaiveBayesModelWriter that writes models in a plain text format.
This class implements the stemming algorithm defined by a snowball script.
A Stemmer, implementing the Porter Stemming Algorithm
Utility class to handle Portuguese contractions.
This class implements the stemming algorithm defined by a snowball script.
Interface for a BeamSearchContextGenerator used in POS tagging.
Provides a means of determining which tags are valid for a particular word
based on a TagDictionary read from a file.
A default implementation of EvaluationMonitor that prints to an output stream.
The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.
Loads a POSModel for the command line tools.
An ArtifactSerializer implementation for models.
Represents a POS-tagged sentence.
A SequenceStream implementation encapsulating samples.
The interface for part of speech taggers.
Tool to convert multiple data formats into native OpenNLP part of speech tagging
training format.
A marker interface for evaluating pos taggers.
A default POSSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides POSTagger default implementations and resources.
Generates a detailed report for the POS Tagger.
A part-of-speech tagger that uses maximum entropy.
Adds the token POS Tag as feature.
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
Note:
Do not use this class, internal use only!
This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.
This interface allows one to implement a prior distribution for use in
maximum entropy model training.
Class implementing the probability distribution over labels returned by a classifier.
Class implementing the probability for a label.
A data container encapsulating language detection results.
Implementation of L-BFGS which supports L1-, L2-regularization
and Elastic Net for solving convex optimization problems.
Evaluate quality of training parameters.
L2-regularized objective Function.
A maximum entropy model which has been trained using the Quasi Newton (QN) algorithm.
The base class for readers of QN models.
The base class for writers of models.
A Maxent model Trainer using L-BFGS algorithm.
Class for real-valued events as an event stream.
Class for using a file of real-valued events as an event stream.
A TokenNameFinder implementation based on a series of regular expressions.
Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.
Enumeration of typical regex expressions available in OpenNLP.
This interface makes an Iterator resettable.
An iterator for a list which returns values in the opposite order as the typical list iterator.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Represents a generic type of processable elements.
Interface for SentenceDetectorME context generators.
A cross validator for sentence detectors.
Creates contexts/features for end-of-sentence detection in Thai text.
The interface for sentence detectors, which find the sentence boundaries in
a text.
Tool to convert multiple data formats into native OpenNLP sentence detector
training format.
The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.
A default SentenceSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides SentenceDetector default implementations and resources.
A sentence detector for splitting up raw text into sentences.
A sentence detector which uses a maxent model to predict the sentences.
A default implementation of EvaluationMonitor that prints to an output stream.
This feature generator creates sentence begin and end features.
The SentenceModel is the model used by a learnable SentenceDetector.
A SentenceSample contains a document with begin indexes of the individual sentences.
This class is a stream filter which reads sentence-per-line samples from
an ObjectStream and converts them into SentenceSample objects.
Factory producing OpenNLP SentenceSampleStreams.
Class which models a sequence.
Represents a weighted sequence of outcomes.
A classification model that can label an input Sequence.
A codec for sequences of type SequenceCodec.
Interface for streams of sequences used to train sequence models.
Class which turns a SequenceStream into an event stream.
A marker interface so that implementing classes can refer to
the corresponding ArtifactSerializer implementation.
SAX style SGML parser.
A ShrinkCharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.
Trains models with sequences using the perceptron algorithm.
A basic Tokenizer implementation which performs tokenization using character classes.
Class for storing start and end integer offsets.
This class implements the stemming algorithm defined by a snowball script.
A stemmer reduces a word to its stem.
A marker-interface for a String interner implementation.
Provides string interning utility methods.
A StringList is an immutable list of Strings.
Recognizes predefined patterns in strings.
This class implements the stemming algorithm defined by a snowball script.
Interface to determine which tags are valid for a particular word
based on a tag dictionary.
Classes, fields, or methods annotated @ThreadSafe are safe to use in multithreading contexts.
Generates features for the class of the token.
Interface for context generators required for TokenizerME.
A default implementation of EvaluationMonitor that prints to an output stream.
Generates a feature which contains the token itself.
The interface for tokenizers, which segment a string into its tokens.
Tool to convert multiple data formats into native OpenNLP sentence detector
training format.
A cross validator for tokenizers.
A marker interface for evaluating tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides Tokenizer default implementation and resources.
A Tokenizer for converting raw text into separated tokens.
A default TokenSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The TokenizerModel is the model used by a learnable Tokenizer.
Loads a TokenizerModel for the command line tools.
The interface for name finders which provide name tags for a sequence of tokens.
Tool to convert multiple data formats into native OpenNLP name finder
training format.
Cross validator for TokenNameFinder.
A marker interface for evaluating name finders.
The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.
A default NameSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides TokenNameFinder default implementations and resources.
Generates a detailed report for the NameFinder.
The TokenNameFinderModel is the model used by a learnable TokenNameFinder.
Loads a TokenNameFinderModel for the command line tools.
Partitions tokens into sub-tokens based on character classes and generates
class features for each of the sub-tokens and combinations of those sub-tokens.
A TokenSample is text with token spans.
Class which produces an Iterator<TokenSample> from a file of space delimited tokens.
This class is a stream filter which reads in string encoded samples and creates samples out of them.
Factory producing OpenNLP TokenSampleStreams.
Represents a common base for training implementations.
A factory to initialize Trainer instances depending on a trainer type configured via TrainingParameters.
Declares and handles default parameters used for or during training models.
Common training parameters.
Adds trigram features based on tokens and token classes.
This class implements the stemming algorithm defined by a snowball script.
A TwitterCharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.
Collecting event and context counts by making two passes over the events.
An InputStream which cannot be closed.
Provide a maximum entropy model with a uniform Prior.
A UrlCharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.
The Version class represents the OpenNLP Tools library version.
A basic Tokenizer implementation which performs tokenization using white spaces.
This stream formats ObjectStream of samples into whitespace separated token strings.
Generates previous and next features for a given AdaptiveFeatureGenerator.
Defines a word cluster generator factory; it reads an element containing
'w2vwordcluster' as a tag name; these clusters are typically produced by
word2vec or clark pos induction systems.
A Tokenizer implementation which performs tokenization using word pieces.
A stream filter which reads a sentence per line which contains
words and tags in word_tag format and outputs POSSample objects.
Note:
Do not use this class, internal use only!
A word vector.
A table that maps tokens to word vectors.