Class TokenizerFactory

java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory

public class TokenizerFactory extends BaseToolFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
  • Constructor Details

    • TokenizerFactory

      public TokenizerFactory()
      Instantiates a TokenizerFactory that provides the default implementation of the resources.
    • TokenizerFactory

      public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
      Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
      Parameters:
      languageCode - The ISO language code to be used for this factory.
      abbreviationDictionary - The Dictionary which holds abbreviations.
      useAlphaNumericOptimization - Whether alphanumerics are skipped or not.
      alphaNumericPattern - null or a custom alphanumeric Pattern (default is: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
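      To illustrate which tokens the default pattern quoted above accepts, here is a minimal sketch that uses only java.util.regex. The class name AlphaNumericPatternDemo and the helper isAlphaNumeric are made up for this illustration and are not part of OpenNLP; only the pattern string is taken from the parameter description. How the tokenizer applies the pattern internally is not shown.

```java
import java.util.regex.Pattern;

class AlphaNumericPatternDemo {

    // Pattern string copied from the parameter description above.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical helper: true if the whole token consists of
    // one or more ASCII letters or digits.
    static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"OpenNLP", "2024", "e.g.", "don't", ""};
        for (String token : tokens) {
            System.out.println("'" + token + "' -> " + isAlphaNumeric(token));
        }
    }
}
```

      Tokens containing punctuation, whitespace, or non-ASCII letters do not match this pattern, which is one reason a caller might supply a custom Pattern instead of null.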
  • Method Details

    • validateArtifactMap

      public void validateArtifactMap() throws InvalidFormatException
      Description copied from class: BaseToolFactory
      Validates the parsed artifacts.

      Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.

      Specified by:
      validateArtifactMap in class BaseToolFactory
      Throws:
      InvalidFormatException - Thrown if validation found invalid states.
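      The note about calling super.validateArtifactMap first can be sketched as follows. This is not OpenNLP code: BaseFactory and CustomFactory are hypothetical stand-ins, IOException stands in for InvalidFormatException, and the key "abbreviations" and the checks themselves are assumptions chosen only to show the call order.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ValidateOrderSketch {

    // Stand-in for a base factory; an assumption, not the OpenNLP class.
    static class BaseFactory {
        protected final Map<String, Object> artifacts = new HashMap<>();

        void put(String key, Object value) {
            artifacts.put(key, value);
        }

        public void validateArtifactMap() throws IOException {
            // Base-level check: no artifact may be null.
            if (artifacts.containsValue(null)) {
                throw new IOException("artifact map contains a null value");
            }
        }
    }

    // Stand-in for a subclass that adds its own validation.
    static class CustomFactory extends BaseFactory {
        static final String ABBREVIATIONS_KEY = "abbreviations"; // hypothetical key

        @Override
        public void validateArtifactMap() throws IOException {
            // Run the inherited checks first, as the note recommends.
            super.validateArtifactMap();

            // Then apply the subclass-specific check.
            Object abbreviations = artifacts.get(ABBREVIATIONS_KEY);
            if (abbreviations != null && !(abbreviations instanceof Set)) {
                throw new IOException("abbreviations artifact has an unexpected type");
            }
        }
    }

    // Helper that turns the checked exception into a boolean result.
    static boolean isValid(BaseFactory factory) {
        try {
            factory.validateArtifactMap();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CustomFactory factory = new CustomFactory();
        factory.put(CustomFactory.ABBREVIATIONS_KEY, Set.of("e.g.", "Dr."));
        System.out.println("valid: " + isValid(factory));
    }
}
```

      Calling the superclass method first means the subclass checks can rely on whatever the base class has already verified.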
    • createArtifactMap

      public Map<String,Object> createArtifactMap()
      Description copied from class: BaseToolFactory
      A model's implementation should call this method when it creates a model programmatically.

      The base implementation will return a HashMap that should be populated by subclasses.

      Overrides:
      createArtifactMap in class BaseToolFactory
      Returns:
      Retrieves a Map with pairs of keys and objects.
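      The "base returns a HashMap, subclasses populate it" arrangement described above might look like the sketch below. BaseFactory and CustomFactory are hypothetical stand-ins rather than the OpenNLP classes, and the key "abbreviations" is an assumption used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ArtifactMapSketch {

    // Stand-in for a base factory; an assumption, not the OpenNLP class.
    static class BaseFactory {
        public Map<String, Object> createArtifactMap() {
            // The base implementation only supplies an empty, mutable map.
            return new HashMap<>();
        }
    }

    // Stand-in for a subclass that contributes one optional artifact.
    static class CustomFactory extends BaseFactory {
        static final String ABBREVIATIONS_KEY = "abbreviations"; // hypothetical key

        private final Set<String> abbreviations; // may be null

        CustomFactory(Set<String> abbreviations) {
            this.abbreviations = abbreviations;
        }

        @Override
        public Map<String, Object> createArtifactMap() {
            // Start from the map created by the base class.
            Map<String, Object> artifactMap = super.createArtifactMap();

            // Add the optional artifact only when it is present.
            if (abbreviations != null) {
                artifactMap.put(ABBREVIATIONS_KEY, abbreviations);
            }
            return artifactMap;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withDict = new CustomFactory(Set.of("e.g.")).createArtifactMap();
        Map<String, Object> withoutDict = new CustomFactory(null).createArtifactMap();
        System.out.println("with dictionary: " + withDict.keySet());
        System.out.println("without dictionary: " + withoutDict.keySet());
    }
}
```

      Skipping the entry when the artifact is null keeps the map free of null values, so later validation does not have to treat them as a special case.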
    • createManifestEntries

      public Map<String,String> createManifestEntries()
      Overrides:
      createManifestEntries in class BaseToolFactory
      Returns:
      Retrieves the manifest entries to be added to the model manifest.
    • create

      public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
      Factory method the framework uses to instantiate a new TokenizerFactory.
      Parameters:
      subclassName - The name of the class implementing the TokenizerFactory.
      languageCode - The ISO language code the Tokenizer should use.
      abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
      useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
      alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
      Returns:
      A valid TokenizerFactory instance.
      Throws:
      InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
    • getAlphaNumericPattern

      public Pattern getAlphaNumericPattern()
      Returns:
      Retrieves the (user-)specified alphanumeric Pattern or a default.
    • isUseAlphaNumericOptimization

      public boolean isUseAlphaNumericOptimization()
      Returns:
      true if the alphanumeric optimization is enabled, otherwise false.
    • getAbbreviationDictionary

      public Dictionary getAbbreviationDictionary()
      Returns:
      The abbreviation Dictionary or null if none is active.
    • getLanguageCode

      public String getLanguageCode()
      Returns:
      Retrieves the ISO language code in use.
    • getContextGenerator

      public TokenContextGenerator getContextGenerator()
      Returns:
      Retrieves a TokenContextGenerator instance.