opennlp.tools.tokenize
Class TokenizerFactory

java.lang.Object
  extended by opennlp.tools.util.BaseToolFactory
      extended by opennlp.tools.tokenize.TokenizerFactory

public class TokenizerFactory
extends BaseToolFactory

The factory that provides the default Tokenizer implementations and resources. Users can extend this class if their application needs to override the TokenContextGenerator, the Dictionary, etc.


Constructor Summary
TokenizerFactory()
          Creates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
          Creates a TokenizerFactory.
 
Method Summary
static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
          Factory method the framework uses to create a new TokenizerFactory.
 Map<String,Object> createArtifactMap()
          Creates a Map with pairs of keys and objects.
 Map<String,String> createManifestEntries()
          Creates the manifest entries that will be added to the model manifest.
 Dictionary getAbbreviationDictionary()
          Gets the abbreviation dictionary.
 Pattern getAlphaNumericPattern()
          Gets the alphanumeric pattern.
 TokenContextGenerator getContextGenerator()
          Gets the context generator.
 String getLanguageCode()
          Gets the language code.
 boolean isUseAlphaNumericOptmization()
          Gets whether to use alphanumeric optimization.
 void validateArtifactMap()
          Validates the parsed artifacts.
 
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenizerFactory

public TokenizerFactory()
Creates a TokenizerFactory that provides the default implementation of the resources.


TokenizerFactory

public TokenizerFactory(String languageCode,
                        Dictionary abbreviationDictionary,
                        boolean useAlphaNumericOptimization,
                        Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to programmatically create a factory.

Parameters:
languageCode - the language of the natural text
abbreviationDictionary - an abbreviations dictionary
useAlphaNumericOptimization - if true, alphanumeric tokens are skipped (emitted as-is instead of being split further by the model)
alphaNumericPattern - null or a custom alphanumeric pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
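The last two parameters work together. The following is a minimal sketch of their interaction. It assumes that "skipped" means a whitespace-separated chunk that fully matches the alphanumeric pattern is emitted as one token without consulting the model. The class AlphaNumericShortcut and its method are hypothetical and use only java.util.regex. They illustrate the idea and do not reproduce TokenizerME.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reports which whitespace-separated chunks could bypass
// the statistical model when alphanumeric optimization is enabled.
class AlphaNumericShortcut {

    // Same regex the documentation gives as the default alphanumeric pattern.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    /** Returns the chunks that would be emitted as-is, without model calls. */
    static List<String> chunksSkippingModel(String text, boolean useOptimization, Pattern pattern) {
        // A null pattern falls back to the default, mirroring the parameter doc.
        Pattern p = (pattern != null) ? pattern : DEFAULT_ALPHANUMERIC;
        List<String> skipped = new ArrayList<>();
        if (!useOptimization) {
            return skipped; // optimization off: every chunk goes through the model
        }
        for (String chunk : text.trim().split("\\s+")) {
            // matches() requires the whole chunk to match, not just a prefix
            if (p.matcher(chunk).matches()) {
                skipped.add(chunk);
            }
        }
        return skipped;
    }
}
```

Consider "Mr. Smith paid 42 dollars." with the optimization on and no custom pattern. The chunks Smith, paid and 42 bypass the model. "Mr." and "dollars." still go through it because the period does not match the pattern.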

Method Detail

validateArtifactMap

public void validateArtifactMap()
                         throws InvalidFormatException
Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.

Specified by:
validateArtifactMap in class BaseToolFactory
Throws:
InvalidFormatException
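The validation contract can be shown without OpenNLP. In this sketch, BaseFactorySketch, TokenizerFactorySketch and InvalidArtifactException are hypothetical stand-ins for BaseToolFactory, TokenizerFactory and InvalidFormatException. The entry name "abbreviations.dictionary" is an assumption. The subclass runs the inherited checks first and then rejects an entry of the wrong type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for opennlp.tools.util.InvalidFormatException.
class InvalidArtifactException extends Exception {
    InvalidArtifactException(String message) { super(message); }
}

// Hypothetical minimal base factory holding the artifacts loaded from a model.
class BaseFactorySketch {
    protected final Map<String, Object> artifacts = new HashMap<>();

    void validateArtifactMap() throws InvalidArtifactException {
        // The base class has nothing to check in this sketch.
    }
}

// Hypothetical subclass that validates its own entry after the base checks.
class TokenizerFactorySketch extends BaseFactorySketch {
    // Assumed entry name, chosen for illustration only.
    static final String ABBREVIATIONS_ENTRY = "abbreviations.dictionary";

    @Override
    void validateArtifactMap() throws InvalidArtifactException {
        super.validateArtifactMap(); // inherited checks run first
        Object abbreviations = artifacts.get(ABBREVIATIONS_ENTRY);
        // The entry is optional, but if present it must have the expected type.
        if (abbreviations != null && !(abbreviations instanceof Set)) {
            throw new InvalidArtifactException("Expected a Set for '" + ABBREVIATIONS_ENTRY
                    + "' but found " + abbreviations.getClass().getName());
        }
    }
}
```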

createArtifactMap

public Map<String,Object> createArtifactMap()
Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model implementation should call this method from the constructor that creates a model programmatically.

The base implementation returns a HashMap that subclasses should populate.

Overrides:
createArtifactMap in class BaseToolFactory
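The following sketches the pattern in which a subclass fills the map that the base class returns. It assumes a factory whose only artifact is an optional abbreviation list. ArtifactBaseSketch and TokenizerArtifactsSketch are hypothetical. A Set<String> stands in for an OpenNLP Dictionary, and the entry name is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical base: returns an empty, mutable map for subclasses to fill.
class ArtifactBaseSketch {
    Map<String, Object> createArtifactMap() {
        return new HashMap<>();
    }
}

// Hypothetical tokenizer factory that contributes its optional dictionary.
class TokenizerArtifactsSketch extends ArtifactBaseSketch {
    static final String ABBREVIATIONS_ENTRY = "abbreviations.dictionary"; // assumed key

    // Stands in for an OpenNLP Dictionary; may be null.
    private final Set<String> abbreviations;

    TokenizerArtifactsSketch(Set<String> abbreviations) {
        this.abbreviations = abbreviations;
    }

    @Override
    Map<String, Object> createArtifactMap() {
        Map<String, Object> artifacts = super.createArtifactMap(); // start from the base map
        if (abbreviations != null) {
            artifacts.put(ABBREVIATIONS_ENTRY, abbreviations);
        }
        return artifacts;
    }
}
```

The sketch adds the entry only when a dictionary is present. This keeps null values out of the map, so code that serializes the artifacts does not need to handle them.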

createManifestEntries

public Map<String,String> createManifestEntries()
Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.

Overrides:
createManifestEntries in class BaseToolFactory
Returns:
the manifest entries to be added to the model manifest
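Manifest entries are String-to-String pairs, so any setting that is not a string must be encoded as text. The hypothetical ManifestSketch stores the optimization flag with Boolean.toString and the pattern as its regex source, then reads both back. The key names are illustrative assumptions and are not confirmed OpenNLP constants.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: encodes factory settings as manifest strings and decodes them.
class ManifestSketch {
    // Assumed key names, for illustration only.
    static final String USE_ALPHANUMERIC_OPTIMIZATION = "useAlphaNumericOptimization";
    static final String ALPHANUMERIC_PATTERN = "alphaNumericPattern";

    static Map<String, String> createManifestEntries(boolean useOptimization, Pattern pattern) {
        Map<String, String> entries = new HashMap<>();
        entries.put(USE_ALPHANUMERIC_OPTIMIZATION, Boolean.toString(useOptimization));
        if (pattern != null) {
            // A Pattern is not a String, so store its regex source instead.
            entries.put(ALPHANUMERIC_PATTERN, pattern.pattern());
        }
        return entries;
    }

    static boolean readOptimization(Map<String, String> manifest) {
        // Boolean.parseBoolean returns false for a missing (null) value.
        return Boolean.parseBoolean(manifest.get(USE_ALPHANUMERIC_OPTIMIZATION));
    }

    static Pattern readPattern(Map<String, String> manifest) {
        String regex = manifest.get(ALPHANUMERIC_PATTERN);
        return (regex == null) ? null : Pattern.compile(regex);
    }
}
```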

create

public static TokenizerFactory create(String subclassName,
                                      String languageCode,
                                      Dictionary abbreviationDictionary,
                                      boolean useAlphaNumericOptimization,
                                      Pattern alphaNumericPattern)
                               throws InvalidFormatException
Factory method the framework uses to create a new TokenizerFactory.

Throws:
InvalidFormatException
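The subclassName parameter suggests that the factory is instantiated by class name. The following is a minimal sketch of that idea using plain reflection. It assumes the subclass has a public no-argument constructor and that settings are applied afterwards through an init step. SimpleTokenizerFactory, CustomTokenizerFactory and init are hypothetical. In this sketch a null subclassName falls back to the default factory, and an unchecked IllegalArgumentException replaces InvalidFormatException to keep the example self-contained.

```java
import java.util.regex.Pattern;

// Hypothetical factory illustrating creation by class name.
class SimpleTokenizerFactory {
    String languageCode;
    boolean useAlphaNumericOptimization;
    Pattern alphaNumericPattern;

    // Public no-arg constructor, required for reflective instantiation.
    public SimpleTokenizerFactory() {}

    // Hypothetical init step: settings are applied after construction.
    void init(String languageCode, boolean useOptimization, Pattern pattern) {
        this.languageCode = languageCode;
        this.useAlphaNumericOptimization = useOptimization;
        this.alphaNumericPattern = pattern;
    }

    static SimpleTokenizerFactory create(String subclassName, String languageCode,
                                         boolean useOptimization, Pattern pattern) {
        SimpleTokenizerFactory factory;
        if (subclassName == null) {
            factory = new SimpleTokenizerFactory(); // no subclass requested: use the default
        } else {
            try {
                Class<?> cls = Class.forName(subclassName);
                // asSubclass rejects classes that are not factories before instantiation.
                factory = cls.asSubclass(SimpleTokenizerFactory.class)
                             .getDeclaredConstructor()
                             .newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot instantiate factory " + subclassName, e);
            }
        }
        factory.init(languageCode, useOptimization, pattern);
        return factory;
    }
}

// Hypothetical user-defined subclass.
class CustomTokenizerFactory extends SimpleTokenizerFactory {
    public CustomTokenizerFactory() {}
}
```

A framework can then store only the class name in the model and rebuild the matching factory at load time. For example, SimpleTokenizerFactory.create(CustomTokenizerFactory.class.getName(), "de", false, null) returns a CustomTokenizerFactory.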

getAlphaNumericPattern

public Pattern getAlphaNumericPattern()
Gets the alphanumeric pattern.

Returns:
the user-specified alphanumeric pattern, or the default if none was specified.
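The "user pattern or default" contract amounts to a null check. The following sketch uses a hypothetical PatternSettings class and the default regex quoted in the constructor documentation.

```java
import java.util.regex.Pattern;

// Hypothetical holder illustrating the "user pattern or default" contract.
class PatternSettings {
    // Default regex as documented for Factory.DEFAULT_ALPHANUMERIC.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    private final Pattern userPattern; // may be null

    PatternSettings(Pattern userPattern) {
        this.userPattern = userPattern;
    }

    Pattern getAlphaNumericPattern() {
        // Fall back to the default only when no custom pattern was supplied.
        return (userPattern != null) ? userPattern : DEFAULT_ALPHANUMERIC;
    }
}
```

A Unicode-aware pattern such as ^[\p{L}\p{Nd}]+$ lets the optimization also cover words like "Straße1", which the ASCII-only default rejects.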

isUseAlphaNumericOptmization

public boolean isUseAlphaNumericOptmization()
Gets whether to use alphanumeric optimization.


getAbbreviationDictionary

public Dictionary getAbbreviationDictionary()
Gets the abbreviation dictionary.

Returns:
the abbreviation dictionary, or null if none is set

getLanguageCode

public String getLanguageCode()
Gets the language code.


getContextGenerator

public TokenContextGenerator getContextGenerator()
Gets the context generator.



Copyright © 2013 The Apache Software Foundation. All Rights Reserved.