All Classes and Interfaces

Class
Description
Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.
Abstract class containing many of the methods used to generate contexts for parsing.
Abstract DataIndexer implementation for collecting event and context counts used in training.
A basic EventModelSequenceTrainer implementation that processes events.
A base ObjectStream implementation for events.
A basic EventTrainer implementation.
 
A basic MaxentModel implementation.
 
An abstract, basic implementation of a model reader.
An abstract, basic implementation of a model writer.
A base ObjectStream implementation.
Abstract class extended by parser event streams which perform tagging and chunking.
Base class for sample stream factories.
 
 
An interface for generating features for named entity identification and for updating document level contexts.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, producing output for Portuguese Chunker training.
A Factory to create an Arvores Deitadas ChunkStream from the command line utility.
The AdditionalContextFeatureGenerator generates the context from the passed in additional context.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, producing output for Portuguese NER training.
A Factory to create an Arvores Deitadas NameSampleDataStream from the command line utility.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Stream filter which merges text lines into sentences, following the Arvores Deitadas syntax.
 
Parses a sample of AD corpus.
Represents an AD leaf.
Represents an AD node.
Represents a tree element, either a Node or a Leaf.
Note: Do not use this class, internal use only!
A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.
The AggregatedFeatureGenerator aggregates a set of AdaptiveFeatureGenerators and calls them to generate the features.
 
Class for storing the Ancora Spanish head rules associated with parsing.
 
 
 
This class was automatically generated by a Snowball to Java compiler. It implements the stemming algorithm defined by a snowball script.
Utility class for simple vector arithmetic.
Provides access to model persisted artifacts.
Responsible for creating an artifact from an InputStream.
Generates predictive contexts for deciding how constituents should be attached.
 
The Attributes class stores name value pairs.
Generates a feature for each word in a document.
Represents a minimal tuple of information.
This is a common base model which can be used by the components' specific model classes.
Base class for all tool factories.
A ContextGenerator implementation for maxent decisions, assuming that the input given to the BasicContextGenerator.getContext(String) method is a String containing contextual predicates separated by spaces, for instance:
Common format parameters.
Common training parameters.
Performs k-best search over a sequence.
Interface for context generators used with a sequence beam search.
 
 
The default SequenceCodec implementation according to the BILOU scheme.
A SequenceValidator implementation for the BilouCodec.
A DataReader that reads files from a binary format.
A GISModelReader that reads models from a binary format.
A GISModelWriter that writes models in a binary format.
A NaiveBayesModelReader that reads models from a binary format.
A NaiveBayesModelWriter that writes models in a binary format.
A PerceptronModelReader that reads models from a binary format.
A PerceptronModelWriter that writes models in a binary format.
A QNModelReader that reads models from a binary format.
A QNModelWriter that writes models in a binary format.
The default SequenceCodec implementation according to the BIO scheme:
- B: 'beginning' of a NE
- I: 'inside', the word is inside a NE
- O: 'outside', the word is a regular word outside a NE

See also the paper by Roth D. and Ratinov L.: Design Challenges and Misconceptions in Named Entity Recognition.
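As a rough illustration of the BIO scheme described above, the following sketch decodes a tag sequence into entity spans. It is plain Java and does not use the OpenNLP SequenceCodec API; the class name `BioSketch`, the `B-TYPE`/`I-TYPE`/`O` tag spelling, and the `start..end:TYPE` output format are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

public class BioSketch {

    /** Decodes tags such as "B-PER", "I-PER", "O" into "start..end:TYPE" strings (end exclusive). */
    public static List<String> decode(String[] tags) {
        List<String> spans = new ArrayList<>();
        int start = -1;      // index where the open span began, or -1 if no span is open
        String type = null;  // entity type of the open span
        for (int i = 0; i <= tags.length; i++) {
            // A virtual "O" after the last token closes a span that runs to the end.
            String tag = i < tags.length ? tags[i] : "O";
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (start >= 0 && !continues) {
                spans.add(start + ".." + i + ":" + type);
                start = -1;
                type = null;
            }
            if (tag.startsWith("B-")) {
                start = i;
                type = tag.substring(2);
            }
            // An "I-" tag with no open span of the same type is ignored in this sketch.
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tags = {"B-PER", "I-PER", "O", "B-LOC"};
        System.out.println(decode(tags)); // prints [0..2:PER, 3..4:LOC]
    }
}
```

A real codec also needs the reverse direction (spans to tags) and a validator that rejects an `I` tag that does not follow a matching `B` or `I`; both are omitted here.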
A sample stream for the training files of the BioNLP/NLPBA 2004 shared task.
 
 
Reads the annotations from the brat .ann annotation file.
Brat (brat rapid annotation tool) is based on the stav visualiser which was originally made in order to visualise BioNLP'11 Shared Task data.
 
 
Generates Name Sample objects for a Brat Document object.
 
Generates Brown cluster features for token bigrams.
Class to load a Brown cluster document: word\tword_class\tprob
 
Generates Brown clustering features for token bigrams.
Generates Brown clustering features for token classes.
Generates Brown clustering features for current token.
Obtain the paths listed in the pathLengths array from the Brown class.
Generates BrownCluster features for current token and token class.
Generates BrownCluster features for current token.
Generates predictive contexts for deciding how constituents should be combined.
Creates the features or contexts for the building phase of parsing.
 
An ArtifactSerializer implementation for binary data, kept in byte[].
Provides fixed size, pre-allocated, least recently used replacement cache.
Caches features of the aggregated generators.
 
This class was automatically generated by a Snowball to Java compiler. It implements the stemming algorithm defined by a snowball script.
This tool helps create a loadable dictionary for the NameFinder from US Census data.
The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.
 
A char sequence normalizer, used to adjust (prune, substitute, add, etc.) characters.
Generates predictive context for deciding when a constituent is complete.
Generates predictive context for deciding when a constituent is complete.
Trains a new check model.
Creates predictive context for the pre-chunking phases of parsing.
The interface for chunkers which provide chunk tags for a sequence of tokens.
Interface for a BeamSearchContextGenerator used in syntactic chunking.
Tool to convert multiple data formats into native OpenNLP chunker training format.
Cross validator for Chunker.
 
 
A marker interface for evaluating chunkers.
The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.
A default ChunkSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Class for creating an event stream out of data files for training a Chunker.
 
The class represents a maximum-entropy-based Chunker.
 
The ChunkerModel is the model used by a learnable Chunker.
Loads a ChunkerModel for the command line tools.
An ArtifactSerializer implementation for models.
Factory producing OpenNLP ChunkSampleStreams.
 
A default implementation of EvaluationMonitor that prints to an output stream.
Class for holding chunks for a single unit of text.
A SequenceStream implementation encapsulating samples.
Parses the CoNLL 2000 shared task shallow parser training data.
 
An ObjectStream implementation that works on a Collection of CollectionObjectStream as source for elements.
A maxent event representation which we can use to sort based on the predicate indexes contained in the events.
A maxent predicate representation which we can use to sort based on the outcomes.
A configurable context generator for a POSTagger.
Parser for the Dutch and Spanish NER training files of the CoNLL 2002 shared task.
 
Note: Do not use this class, internal use only!
An import stream which can parse the CoNLL03 data.
 
 
 
Note: Do not use this class, internal use only!
 
Note: Do not use this class, internal use only!
 
 
Note: Do not use this class, internal use only!
The CoNLL-U format is specified here.
 
 
Note: Do not use this class, internal use only!
 
Parses the data from the CoNLL 06 shared task into POS Samples.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Holds feature information about a specific Parse node.
 
Note: Do not use this class, internal use only!
Holds constituents when reading parses.
Class which associates a real valued parameter or expected value with a particular contextual predicate or feature.
Represents a generator of contexts for maxent decisions.
Provides access to training and test partitions for n-fold cross validation.
The CrossValidationPartitioner.TrainingSampleStream which iterates over all training elements.
Common cross validator parameters.
Represents an indexer which compresses events in memory and performs feature selection.
A factory that produces DataIndexer instances.
Describes generic ways to read data from a DataInputStream.
An interface for objects which can deliver a stream of training data to be supplied to an EventStream.
Features based on the chunking model described by Fei Sha and Fernando Pereira.
The default chunker SequenceValidator implementation.
Default implementation of the EndOfSentenceScanner.
A context generator for a language detector.
Simple feature generator for learning statistical lemmatizers.
The default lemmatizer SequenceValidator implementation.
A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.
A default context generator for a POSTagger.
The default POS tagger SequenceValidator implementation.
Generate event contexts for maxent decisions for sentence detection.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
 
A default implementation of EvaluationMonitor that prints to an output stream.
 
 
A Detokenizer merges tokens back to their detokenized representation.
This enum contains an operation for every token to merge the tokens together to their detokenized form.
The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
 
Base class for factories which need a Detokenizer.
 
An iterable and serializable dictionary implementation.
 
A rule based detokenizer.
 
A persistor used for reading and writing dictionaries of all kinds.
The DictionaryFeatureGenerator uses the DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.
 
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a file containing, for each line:
This is a dictionary-based name finder.
An ArtifactSerializer implementation for dictionaries.
The directory sample stream allows for creating an ObjectStream<File> from a directory listing of files.
Tool to convert multiple data formats into native OpenNLP doccat training format.
Cross validator for DocumentCategorizer.
 
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating doccat.
A default DocumentSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Doccat default implementations and resources.
Generates a detailed report for the document categorizer.
A model for document categorization.
Loads a DoccatModel for the command line tools.
 
 
 
 
Interface for classes which categorize documents.
The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.
Iterator-like class for modeling document classification events.
A Max-Ent based implementation of DocumentCategorizer.
Interface for processing an entire document allowing a TokenNameFinder to use context from the entire document.
Class which holds a classified document and its category.
Reads in string encoded training samples, parses them and outputs DocumentSample objects.
Factory producing OpenNLP DocumentSampleStreams.
Reads a plain text file and returns each line as a String object.
This class facilitates the downloading of pretrained OpenNLP models.
The type of model.
 
An EmojiCharSequenceNormalizer implementation that normalizes text in terms of emojis.
ObjectStream to clean up empty lines for empty line separated document streams.
- Skips empty line at training data start
- Transforms multiple empty lines in a row into one
- Replaces white space lines with empty lines
- TODO: Terminates last document with empty line if it is missing

This stream should be used by components that rely on empty lines to mark document boundaries.
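The three implemented clean-up rules listed above can be sketched in plain Java as follows. This is an illustration only, not the OpenNLP stream implementation: it works on an in-memory list rather than an ObjectStream, the class name `EmptyLineSketch` is hypothetical, and the behaviour marked TODO above (terminating the last document) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyLineSketch {

    /** Returns a copy of the lines with empty-line handling normalized. */
    public static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        // Starts as true so that empty lines at the start of the data are skipped.
        boolean lastWasEmpty = true;
        for (String line : lines) {
            // Whitespace-only lines are treated as empty lines.
            boolean empty = line.trim().isEmpty();
            if (!empty) {
                out.add(line);
            } else if (!lastWasEmpty) {
                // Emit only the first empty line of a run.
                out.add("");
            }
            lastWasEmpty = empty;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("", "doc one", "   ", "", "doc two", "");
        System.out.println(clean(input).size()); // prints 4
    }
}
```

For the input above the result is `"doc one"`, an empty line, `"doc two"`, and a final empty line: the leading empty line is dropped, and the whitespace-only line followed by an empty line collapses into a single empty line.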
Encoding parameter.
Scans CharSequence, StringBuffer, and char[] for the offsets of sentence ending characters.
EntityLinkers establish connections with external data to enrich extracted entities.
Generates EntityLinker instances via a properties file configuration.
Properties wrapper for EntityLinker implementations.
 
An Entry is a StringList which can optionally be mapped to attributes.
 
Parser for the Italian NER training files of the Evalita 2007 and 2009 NER shared tasks.
 
Note: Do not use this class, internal use only!
This class encapsulates the variables used in producing probabilities from a model and facilitates passing these variables to the eval method.
 
An abstract base class for evaluators.
Common evaluation parameters.
The context of a decision point during training.
 
A specialized Trainer that is based on a 'EventModelSequence' approach.
 
A specialized Trainer that is based on an Event approach.
Indicates that a certain API feature is not stable and might change with a new release.
The ExtensionLoader is responsible for loading extensions to the OpenNLP library.
Exception indicating that an OpenNLP extension could not be loaded.
 
 
 
Interface for generating features for document categorization.
The FeatureGeneratorResourceProvider provides access to the resources available in the model.
This class provides common utilities for feature generation.
Class for using a file of events as an event stream.
Note: Do not use this class, internal use only!
Provides the ability to read the contents of files contained in an object stream of files.
Abstract base class for filtering streams.
Common evaluation parameters.
The FMeasure is a utility class for evaluators which measures precision, recall and the resulting f-measure.
Interface for a function.
Represents a labeler for nodes which contain traces so that these traces can be predicted by a Parser.
Creates a set of feature generators based on a provided XML descriptor.
 
A generic AbstractModelReader implementation.
An ArtifactSerializer implementation for models.
A generic AbstractModelWriter implementation.
A maximum entropy model which has been trained using the Generalized Iterative Scaling (GIS) procedure.
The base class for readers of GIS models.
The base class for writers of GIS models.
An implementation of Generalized Iterative Scaling (GIS).
GloVe is an unsupervised learning algorithm for obtaining vector representations for words.
This class was automatically generated by a Snowball to Java compiler. It implements the stemming algorithm defined by a snowball script.
A hash sum based AbstractObjectStream implementation.
Encoder for head rules associated with parsing.
Class for storing the English HeadRules associated with parsing.
 
This class indexes string lists.
This class implements the stemming algorithm defined by a snowball script.
Allows repeated reads through a stream for certain model building types.
Generates features if the tokens are recognized by the provided TokenNameFinder.
This exception indicates that the provided training data is insufficient to train a desired model.
Classes, fields, or methods annotated @Internal are for OpenNLP internal use only.
This exception indicates that a resource violates the expected data format.
A structure to hold an Irish Sentence Bank document, which is a collection of tokenized sentences.
 
 
 
 
This class was automatically generated by a Snowball to Java compiler. It implements the stemming algorithm defined by a snowball script.
Class for holding the document language and its confidence.
The interface for LanguageDetector which predicts the Language for a context.
 
A context generator interface for LanguageDetector.
Tool to convert multiple data formats into native OpenNLP language detection training format.
Cross validator for LanguageDetector.
 
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating language detectors.
The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.
A default LanguageSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Iterator-like class for modeling an event stream of samples.
Default factory used by LanguageDetector.
Generates a detailed report for the language detector.
Implements a learnable LanguageDetector.
The LanguageDetectorModel is the model used by a learnable LanguageDetector.
Loads a LanguageDetectorModel for the command line tools.
This class reads in string encoded training samples, parses them and outputs LanguageSample objects.
Factory producing OpenNLP LanguageDetectorSampleStreams.
 
 
A language model can calculate the probability p (between 0 and 1) of a certain sequence of tokens, given its underlying vocabulary.
 
Holds a classified document and its Language.
Stream factory for those streams which carry language.
 
Note: Do not use this class, internal use only!
A default implementation of EvaluationMonitor that prints to an output stream.
Represents a lemmatized sentence.
Class for creating an event stream out of data files for training a probabilistic Lemmatizer.
A SequenceStream implementation encapsulating samples.
Reads data for training and testing the Lemmatizer.
The common interface for lemmatizers.
Interface for the context generator used for probabilistic Lemmatizer.
A marker interface for evaluating lemmatizers.
The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.
A default LemmaSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Lemmatizer default implementation and resources.
Generates a detailed report for the Lemmatizer.
A probabilistic Lemmatizer implementation.
 
The LemmatizerModel is the model used by a learnable Lemmatizer.
Loads a LemmatizerModel for the command line tools.
Factory producing OpenNLP LemmaSampleStreams.
 
A structure to hold the letsmt document.
A content handler to receive and process SAX events.
 
 
Class that performs line search to find minimum.
Represents a LineSearch result.
A default, extended Span that holds additional information about a Span.
This class serves as an adapter for a Logger used within a PrintStream.
Class implementing the probability distribution over labels returned by a classifier as a log of probabilities.
A class implementing the logarithmic Probability for a label.
A factory that creates a MarkableFileInputStream from a File.
 
 
A class to process the MASC named entity stand-off annotation file.
 
 
A class for parsing MASC's Penn tagging/tokenization stand-off annotation.
 
 
 
 
 
A specialized Span to express tokens in documents.
 
 
 
Interface for maximum entropy models.
Calculates the arithmetic mean of values added with the Mean.add(double) method.
A helper class that handles Strings with more than 64k (65535 bytes) in length.
Enumeration of supported model types.
Utility class for handling of models.
 
Factory producing OpenNLP MosesSentenceSampleStream objects.
 
 
 
An extension of Context used to store parameters or expected values associated with this context which can be updated or assigned.
This is a mutable int that is not thread-safe.
Interface that allows TagDictionary entries to be added and removed.
Specialized parameters for the evaluation of a Naive Bayes classifier.
A MaxentModel implementation of the multinomial Naive Bayes classifier model.
The base class for readers of models.
The base class for NaiveBayesModel writers.
Trains models using the combination of EM algorithm and Naive Bayes classifier which is described in:
Interface for generating the context for a name finder by specifying a set of feature generators.
A default implementation of EvaluationMonitor that prints to an output stream.
This class helps to read the US Census data from the files to build a StringList for each dictionary entry in the name-finder dictionary.
Class for creating an event stream out of data files for training a TokenNameFinder.
A maximum-entropy-based name finder implementation.
The default name finder SequenceValidator implementation.
Encapsulates names for a single unit of text.
Counts tokens, sentences and names by type.
The NameSampleDataStream class converts tagged strings provided by a DataStream to NameSample objects.
Factory producing OpenNLP NameSampleDataStreams.
 
A SequenceStream implementation encapsulating samples.
A stream which removes name samples which do not have a certain type.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Evaluate negative log-likelihood and its gradient from DataIndexer.
The Newline SentenceDetector assumes that sentences are line delimited and recognizes one sentence per non-empty line.
The NGramCharModel can be used to create character ngrams.
Generates ngram features for a document.
Generates ngrams, joined by an optional separator, and returns them as a list of strings.
A LanguageModel based on a NGramModel using Stupid Backoff to get the probabilities of the ngrams.
Command line tool for NGramLanguageModel.
The NGramModel can be used to create ngrams and character ngrams.
Utility class for ngrams.
 
 
 
 
The National corpus of Polish (NKJP) format.
A NumberCharSequenceNormalizer implementation that normalizes text in terms of numbers.
A DataReader implementation based on ObjectInputStream.
Reads objects from a stream.
 
A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates and provides a unique integer index for each of the predicates.
A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates and provides a unique integer index for each of the predicates and maintains event values.
 
Name Sample Stream parser for the OntoNotes 4.0 corpus.
 
 
 
 
The definition feature maps the underlying distribution of outcomes.
A FilterObjectStream which merges text lines into paragraphs.
Evaluate negative log-likelihood and its gradient in parallel.
Data structure for holding parse constituents.
A shift reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.
Defines common methods for full-syntactic parsers.
A build-attach Parser implementation.
 
The parser chunker SequenceValidator implementation.
Tool to convert multiple data formats into native OpenNLP parser format.
Cross validator for a Parser.
A marker interface for evaluating parsers.
This implementation of Evaluator<Parse> behaves like EVALB with no exceptions, e.g., without removing punctuation tags, or equality between ADVP and PRT, as in the COLLINS convention.
A default Parse-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Wrapper class for one of four shift-reduce parser event streams.
Wrapper class for one of four build-attach parser event streams.
Enumeration of event types for a Parser.
 
This is the default ParserModel implementation.
Loads a ParserModel for the command line tools.
 
 
Enumeration of supported Parser types.
 
Factory producing OpenNLP ParseSampleStreams.
 
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
A model implementation based on the perceptron algorithm.
The base class for readers of models.
The base class for PerceptronModel writers.
Trains models using the perceptron algorithm.
Reads a plain text file and returns each line as a String object.
A generic DataReader implementation for plain text files.
A NaiveBayesModelReader that reads models from a plain text format.
A NaiveBayesModelWriter that writes models in a plain text format.
Utility class to handle Portuguese contractions.
Interface for a BeamSearchContextGenerator used in POS tagging.
Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.
A default implementation of EvaluationMonitor that prints to an output stream.
The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.
The POSModel is the model used by a learnable POSTagger.
Loads a POSModel for the command line tools.
An ArtifactSerializer implementation for models.
Represents a POS-tagged sentence.
Reads the samples from an Iterator and converts those samples into events which can be used by the maxent library for training.
A SequenceStream implementation encapsulating samples.
 
The interface for part of speech taggers.
Tool to convert multiple data formats into native OpenNLP part of speech tagging training format.
 
 
A marker interface for evaluating POS taggers.
A default POSSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides POSTagger default implementations and resources.
 
 
 
Generates a detailed report for the POS Tagger.
A part-of-speech tagger that uses maximum entropy.
Adds the token POS Tag as feature.
 
 
 
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
 
 
This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.
 
This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.
This interface allows one to implement a prior distribution for use in maximum entropy model training.
Class implementing the probability distribution over labels returned by a classifier.
Class implementing the probability for a label.
A data container encapsulating language detection results.
Implementation of L-BFGS which supports L1-, L2-regularization and Elastic Net for solving convex optimization problems.
Evaluate quality of training parameters.
L2-regularized objective Function.
A maximum entropy model which has been trained using the Quasi Newton (QN) algorithm.
The base class for readers of QN models.
The base class for writers of models.
A Maxent model Trainer using L-BFGS algorithm.
Class for real-valued events as an event stream.
Class for using a file of real-valued events as an event stream.
A TokenNameFinder implementation based on a series of regular expressions.
Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.
Enumeration of typical regular expressions available in OpenNLP.
 
 
This interface makes an Iterator resettable.
An iterator for a list which returns values in the opposite order as the typical list iterator.
Represents a generic type of processable elements.
Interface for SentenceDetectorME context generators.
A cross validator for sentence detectors.
 
 
Creates contexts/features for end-of-sentence detection in Thai text.
The interface for sentence detectors, which find the sentence boundaries in a text.
Tool to convert multiple data formats into native OpenNLP sentence detector training format.
 
 
The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.
A default SentenceSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides SentenceDetector default implementations and resources.
A sentence detector for splitting up raw text into sentences.
A sentence detector which uses a maxent model to predict the sentences.
 
A default implementation of EvaluationMonitor that prints to an output stream.
This feature generator creates sentence begin and end features.
 
The SentenceModel is the model used by a learnable SentenceDetector.
A SentenceSample contains a document with begin indexes of the individual sentences.
This class is a stream filter which reads a sentence by line samples from an ObjectStream and converts them into SentenceSample objects.
Factory producing OpenNLP SentenceSampleStreams.
Class which models a sequence.
Represents a weighted sequence of outcomes.
A classification model that can label an input Sequence.
A codec for sequences of type SequenceCodec.
Interface for streams of sequences used to train sequence models.
Class which turns a SequenceStream into an event stream.
 
 
A marker interface so that implementing classes can refer to the corresponding ArtifactSerializer implementation.
SAX style SGML parser.
 
A ShrinkCharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.
Trains models with sequences using the perceptron algorithm.
A basic Tokenizer implementation which performs tokenization using character classes.
 
 
 
Class for storing start and end integer offsets.
 
The stemmer reduces a word to its stem.
A StringList is an immutable list of Strings.
Recognizes predefined patterns in strings.
 
 
 
Interface to determine which tags are valid for a particular word based on a tag dictionary.
 
Classes, fields, or methods annotated @ThreadSafe are safe to use in multithreading contexts.
Generates features for the class of the token.
 
Interface for context generators required for TokenizerME.
A default implementation of EvaluationMonitor that prints to an output stream.
Generates a feature which contains the token itself.
 
The interface for tokenizers, which segment a string into its tokens.
Tool to convert multiple data formats into native OpenNLP tokenizer training format.
A cross validator for tokenizers.
 
A marker interface for evaluating tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides Tokenizer default implementation and resources.
A Tokenizer for converting raw text into separated tokens.
A default TokenSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
 
The TokenizerModel is the model used by a learnable Tokenizer.
Loads a TokenizerModel for the command line tools.
The TokenizerStream uses a Tokenizer to tokenize the input string and output samples.
 
The interface for name finders which provide name tags for a sequence of tokens.
Tool to convert multiple data formats into native OpenNLP name finder training format.
Cross validator for TokenNameFinder.
 
 
A marker interface for evaluating name finders.
The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.
A default NameSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides TokenNameFinder default implementations and resources.
Generates a detailed report for the NameFinder.
The TokenNameFinderModel is the model used by a learnable TokenNameFinder.
 
Loads a TokenNameFinderModel for the command line tools.
 
 
Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.
 
A TokenSample is text with token spans.
Class which produces an Iterator<TokenSample> from a file of space-delimited tokens.
This class is a stream filter which reads in string encoded samples and creates samples out of them.
Factory producing OpenNLP TokenSampleStreams.
 
This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.
Represents a common base for training implementations.
A factory to initialize Trainer instances depending on a trainer type configured via TrainingParameters.
 
Declares and handles default parameters used for or during training models.
Common training parameters.
Adds trigram features based on tokens and token classes.
 
 
 
A TwitterCharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.
Collecting event and context counts by making two passes over the events.
An InputStream which cannot be closed.
Provide a maximum entropy model with a uniform Prior.
A UrlCharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.
The Version class represents the OpenNLP Tools library version.
A basic Tokenizer implementation which performs tokenization using white spaces.
This stream formats an ObjectStream of samples into whitespace-separated token strings.
Generates previous and next features for a given AdaptiveFeatureGenerator.
 
 
 
 
Defines a word cluster generator factory; it reads an element containing 'w2vwordcluster' as a tag name; these clusters are typically produced by word2vec or Clark POS induction systems.
A Tokenizer implementation which performs tokenization using word pieces.
A stream filter which reads one sentence per line, with words and tags in word_tag format, and outputs POSSample objects.
Note: Do not use this class, internal use only!
 
A word vector.
A table that maps tokens to word vectors.