Class TokenizerFactory

java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory

public class TokenizerFactory extends BaseToolFactory
The factory that provides the default Tokenizer implementation and its resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
  • Constructor Details

    • TokenizerFactory

      public TokenizerFactory()
      Instantiates a TokenizerFactory that provides the default implementation of the resources.
    • TokenizerFactory

      public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
      Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
      Parameters:
      languageCode - The ISO language code to be used for this factory.
      abbreviationDictionary - The Dictionary which holds abbreviations.
      useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled, that is, whether alphanumeric tokens are skipped.
      alphaNumericPattern - null or a custom alphanumeric Pattern (the default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
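      The default expression quoted above can be tried out with plain java.util.regex. The following is a minimal sketch, not OpenNLP code: the class name AlphaNumericPatternDemo and the helper isPlainAlphaNumeric are hypothetical, and the pattern is declared locally instead of being read from Factory.DEFAULT_ALPHANUMERIC. It only shows which strings the expression accepts in full.

```java
import java.util.regex.Pattern;

public class AlphaNumericPatternDemo {

    // The expression the documentation quotes as the default. It is compiled
    // locally (an assumption of this sketch) so no OpenNLP dependency is needed.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical helper: true only if the whole token consists of one or
    // more ASCII letters or digits.
    static boolean isPlainAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {"Model42", "U.S.", "don't", "2024"};
        for (String token : tokens) {
            System.out.println(token + " -> " + isPlainAlphaNumeric(token));
        }
    }
}
```

      Strings containing punctuation, such as abbreviations with periods, do not match. A custom Pattern passed to the constructor would replace this expression.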
  • Method Details

    • validateArtifactMap

      public void validateArtifactMap() throws InvalidFormatException
      Description copied from class: BaseToolFactory
      Validates the parsed artifacts.

      Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.

      Specified by:
      validateArtifactMap in class BaseToolFactory
      Throws:
      InvalidFormatException - Thrown if validation found invalid states.
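      The note about invoking super.validateArtifactMap first can be illustrated with a self-contained sketch. Everything in it is a hypothetical stand-in: BaseFactory, CustomFactory, InvalidArtifactException, the "abbreviations" key and the null check are assumptions made for illustration and do not reproduce OpenNLP's classes or its actual validation rules.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ValidateFirstSketch {

    // Hypothetical stand-in for a checked validation exception.
    static class InvalidArtifactException extends Exception {
        InvalidArtifactException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for a base factory that owns the artifact map.
    static class BaseFactory {
        final Map<String, Object> artifacts = new HashMap<>();

        void validateArtifactMap() throws InvalidArtifactException {
            // Base check (an assumption of this sketch): no artifact may be null.
            for (Map.Entry<String, Object> entry : artifacts.entrySet()) {
                if (entry.getValue() == null) {
                    throw new InvalidArtifactException("Missing artifact: " + entry.getKey());
                }
            }
        }
    }

    // Subclass that adds its own check after the inherited ones.
    static class CustomFactory extends BaseFactory {
        @Override
        void validateArtifactMap() throws InvalidArtifactException {
            // Run the inherited checks first, as the note recommends.
            super.validateArtifactMap();

            // Subclass-specific check; "abbreviations" is a made-up key.
            Object abbreviations = artifacts.get("abbreviations");
            if (abbreviations != null && !(abbreviations instanceof Set)) {
                throw new InvalidArtifactException("abbreviations must be a Set");
            }
        }
    }

    // Convenience wrapper: reports whether the given artifacts pass validation.
    static boolean isValid(Map<String, Object> candidate) {
        CustomFactory factory = new CustomFactory();
        factory.artifacts.putAll(candidate);
        try {
            factory.validateArtifactMap();
            return true;
        } catch (InvalidArtifactException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> artifacts = new HashMap<>();
        artifacts.put("abbreviations", new HashSet<String>());
        System.out.println("valid: " + isValid(artifacts));
    }
}
```

      Calling the superclass method first means the subclass checks can rely on whatever the base class has already verified.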
    • createArtifactMap

      public Map<String,Object> createArtifactMap()
      Description copied from class: BaseToolFactory
      A model's implementation should call this method when a model is created programmatically.

      The base implementation will return a HashMap that should be populated by subclasses.

      Overrides:
      createArtifactMap in class BaseToolFactory
      Returns:
      A Map with pairs of keys and objects.
    • createManifestEntries

      public Map<String,String> createManifestEntries()
      Overrides:
      createManifestEntries in class BaseToolFactory
      Returns:
      The manifest entries to be added to the model manifest.
    • create

      public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
      Factory method the framework uses to instantiate a new TokenizerFactory.
      Parameters:
      subclassName - The name of the class implementing the TokenizerFactory.
      languageCode - The ISO language code the Tokenizer should use.
      abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
      useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
      alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
      Returns:
      A valid TokenizerFactory instance.
      Throws:
      InvalidFormatException - Thrown if one of the input parameters does not comply with the expected format.
    • getAlphaNumericPattern

      public Pattern getAlphaNumericPattern()
      Returns:
      The (user-)specified alphanumeric Pattern or a default.
    • isUseAlphaNumericOptimization

      public boolean isUseAlphaNumericOptimization()
      Returns:
      true if the alphanumeric optimization is enabled, otherwise false.
    • getAbbreviationDictionary

      public Dictionary getAbbreviationDictionary()
      Returns:
      The abbreviation Dictionary or null if none is active.
    • getLanguageCode

      public String getLanguageCode()
      Returns:
      The ISO language code in use.
    • getContextGenerator

      public TokenContextGenerator getContextGenerator()
      Returns:
      A TokenContextGenerator instance.