Class TokenizerFactory

    • Constructor Detail

      • TokenizerFactory

        public TokenizerFactory()
        Instantiates a TokenizerFactory that provides the default implementation of the resources.
      • TokenizerFactory

        public TokenizerFactory(String languageCode,
                                Dictionary abbreviationDictionary,
                                boolean useAlphaNumericOptimization,
                                Pattern alphaNumericPattern)
        Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
        Parameters:
        languageCode - The ISO language code to be used for this factory.
        abbreviationDictionary - The Dictionary which holds abbreviations.
        useAlphaNumericOptimization - Whether alphanumerics are skipped or not.
        alphaNumericPattern - null or a custom alphanumeric Pattern (default is: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
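
        The default expression quoted above can be tried out with java.util.regex alone. The following is a minimal sketch, not the library's implementation: the class AlphaNumericPatternDemo and its helper methods are hypothetical, the constant is redeclared locally rather than taken from Factory, and the sketch assumes that a null pattern means "use the default" and that the pattern is tested against a whole token.

```java
import java.util.regex.Pattern;

public class AlphaNumericPatternDemo {

    // Same expression as the documented default; redeclared here for illustration only.
    public static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Assumed fallback rule: a null custom pattern selects the default.
    public static Pattern resolve(Pattern custom) {
        return custom != null ? custom : DEFAULT_ALPHANUMERIC;
    }

    // Returns true if the whole token is accepted by the resolved pattern.
    public static boolean isAlphaNumeric(String token, Pattern custom) {
        return resolve(custom).matcher(token).matches();
    }

    public static void main(String[] args) {
        // A custom pattern that additionally accepts underscores.
        Pattern withUnderscore = Pattern.compile("^[A-Za-z0-9_]+$");

        System.out.println(isAlphaNumeric("abc123", null));               // true
        System.out.println(isAlphaNumeric("can't", null));                // false
        System.out.println(isAlphaNumeric("snake_case", null));           // false
        System.out.println(isAlphaNumeric("snake_case", withUnderscore)); // true
    }
}
```

        A custom Pattern is mainly useful when the default character class is too narrow for the target language or domain, as the underscore case shows.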
    • Method Detail

      • createArtifactMap

        public Map<String,Object> createArtifactMap()
        Description copied from class: BaseToolFactory
        Creates a Map with pairs of keys and objects. A model implementation should call this method when it creates a model programmatically.

        The base implementation will return a HashMap that should be populated by subclasses.

        Overrides:
        createArtifactMap in class BaseToolFactory
        Returns:
        A Map with pairs of keys and objects.
      • create

        public static TokenizerFactory create(String subclassName,
                                              String languageCode,
                                              Dictionary abbreviationDictionary,
                                              boolean useAlphaNumericOptimization,
                                              Pattern alphaNumericPattern)
                                       throws InvalidFormatException
        Factory method the framework uses to instantiate a new TokenizerFactory.
        Parameters:
        subclassName - The name of the class implementing the TokenizerFactory.
        languageCode - The ISO language code the Tokenizer should use.
        abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
        useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
        alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
        Returns:
        A valid TokenizerFactory instance.
        Throws:
        InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
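
        Instantiating a factory from a class name, as this method does with subclassName, is commonly done through reflection. The sketch below illustrates that general technique only and makes no claim about how the library implements it: ReflectiveFactoryDemo, BaseFactory and CustomFactory are hypothetical stand-ins, the fallback to the base class for a null name is an assumption, and an unchecked IllegalArgumentException stands in for InvalidFormatException.

```java
public class ReflectiveFactoryDemo {

    // Hypothetical stand-in for a factory base class; not the library type.
    public static class BaseFactory {
        public String describe() {
            return "default";
        }
    }

    // Hypothetical custom subclass that a caller would refer to by name.
    public static class CustomFactory extends BaseFactory {
        @Override
        public String describe() {
            return "custom";
        }
    }

    // Instantiates the named subclass, or the base class if the name is null (assumed behaviour).
    public static BaseFactory create(String subclassName) {
        if (subclassName == null) {
            return new BaseFactory();
        }
        try {
            Class<?> type = Class.forName(subclassName);
            // asSubclass rejects classes that do not extend BaseFactory.
            return type.asSubclass(BaseFactory.class)
                       .getDeclaredConstructor()
                       .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Stand-in for the checked exception a real factory method might declare.
            throw new IllegalArgumentException("Could not instantiate " + subclassName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(null).describe());                          // default
        System.out.println(create(CustomFactory.class.getName()).describe()); // custom
    }
}
```

        Loading by name lets the class to use be stored as a plain string, at the cost of moving errors such as a misspelled class name from compile time to run time.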
      • getAlphaNumericPattern

        public Pattern getAlphaNumericPattern()
        Returns:
        The user-specified alphanumeric Pattern, or a default if none was specified.
      • isUseAlphaNumericOptimization

        public boolean isUseAlphaNumericOptimization()
        Returns:
        true if the alphanumeric optimization is enabled, otherwise false.
      • getAbbreviationDictionary

        public Dictionary getAbbreviationDictionary()
        Returns:
        The abbreviation Dictionary or null if none is active.
      • getLanguageCode

        public String getLanguageCode()
        Returns:
        The ISO language code in use.