Class TokenizerFactory

    • Constructor Detail

      • TokenizerFactory

        public TokenizerFactory()
        Creates a TokenizerFactory that provides the default implementation of the resources.
      • TokenizerFactory

        public TokenizerFactory(String languageCode,
                                Dictionary abbreviationDictionary,
                                boolean useAlphaNumericOptimization,
                                Pattern alphaNumericPattern)
        Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
        Parameters:
        languageCode - the language of the natural text
        abbreviationDictionary - an abbreviations dictionary
        useAlphaNumericOptimization - if true, alphanumeric tokens are skipped
        alphaNumericPattern - null or a custom alphanumeric pattern (the default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
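
        The default expression quoted in the alphaNumericPattern description can be tried out with java.util.regex alone. The following is a minimal sketch, not the library's code: the class AlphaNumericPatternDemo and the helpers resolvePattern and isAlphaNumeric are hypothetical names, and falling back to the default when null is passed is an assumption drawn from the parameter description.

        ```java
        import java.util.regex.Pattern;

        final class AlphaNumericPatternDemo {

            // Same expression as the documented default, redefined here so the
            // sketch stays standalone (it does not reference the real Factory class).
            static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

            // Hypothetical helper: use the custom pattern if given, else the default.
            static Pattern resolvePattern(Pattern custom) {
                return custom != null ? custom : DEFAULT_ALPHANUMERIC;
            }

            // Hypothetical helper: true if the whole token matches the resolved pattern.
            static boolean isAlphaNumeric(String token, Pattern custom) {
                return resolvePattern(custom).matcher(token).matches();
            }

            public static void main(String[] args) {
                System.out.println(isAlphaNumeric("abc123", null));     // true
                System.out.println(isAlphaNumeric("U.S.", null));       // false, contains dots
                System.out.println(isAlphaNumeric("\u00e9tat", null));  // false, non-ASCII letter

                // A custom pattern that also accepts Unicode letters and digits.
                Pattern unicode = Pattern.compile("^[\\p{L}\\p{N}]+$");
                System.out.println(isAlphaNumeric("\u00e9tat", unicode)); // true
            }
        }
        ```

        The default only covers ASCII letters and digits, which is one reason a caller might pass a custom pattern for languages with accented or non-Latin characters.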
    • Method Detail

      • createArtifactMap

        public Map<String,Object> createArtifactMap()
        Description copied from class: BaseToolFactory
        Creates a Map with pairs of keys and objects. A model implementation should call this method when it creates a model programmatically.

        The base implementation will return a HashMap that should be populated by sub-classes.

        Overrides:
        createArtifactMap in class BaseToolFactory
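
        The override pattern described above (a base method returning a HashMap that sub-classes populate) might look roughly like the sketch below. It uses stand-in classes only: BaseFactorySketch, TokenizerFactorySketch, and the key "abbreviations.dictionary" are hypothetical and are not taken from the library.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        final class ArtifactMapDemo {

            // Stand-in for a base factory: returns an empty, mutable map.
            static class BaseFactorySketch {
                public Map<String, Object> createArtifactMap() {
                    return new HashMap<>();
                }
            }

            // Stand-in for a sub-class that adds its own artifacts to the base map.
            static class TokenizerFactorySketch extends BaseFactorySketch {
                private final Object abbreviations; // may be null

                TokenizerFactorySketch(Object abbreviations) {
                    this.abbreviations = abbreviations;
                }

                @Override
                public Map<String, Object> createArtifactMap() {
                    // Start from whatever the base class provides, then add entries.
                    Map<String, Object> artifacts = super.createArtifactMap();
                    if (abbreviations != null) {
                        // Hypothetical key, chosen for illustration only.
                        artifacts.put("abbreviations.dictionary", abbreviations);
                    }
                    return artifacts;
                }
            }

            public static void main(String[] args) {
                Map<String, Object> artifacts =
                        new TokenizerFactorySketch("dict").createArtifactMap();
                System.out.println(artifacts.keySet()); // [abbreviations.dictionary]
            }
        }
        ```

        Calling super.createArtifactMap() first keeps any entries contributed higher up the hierarchy, so each level only adds what it owns.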
      • create

        public static TokenizerFactory create(String subclassName,
                                              String languageCode,
                                              Dictionary abbreviationDictionary,
                                              boolean useAlphaNumericOptimization,
                                              Pattern alphaNumericPattern)
                                       throws InvalidFormatException
        Factory method the framework uses to create a new TokenizerFactory.
        Parameters:
        subclassName - the name of the class implementing the TokenizerFactory
        languageCode - the language code the tokenizer should use
        abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
        useAlphaNumericOptimization - whether the alphanumeric optimization should be enabled
        alphaNumericPattern - the pattern the alpha numeric optimization should use
        Returns:
        the TokenizerFactory instance
        Throws:
        InvalidFormatException - if one of the input parameters doesn't comply with the expected format
      • getAlphaNumericPattern

        public Pattern getAlphaNumericPattern()
        Gets the alpha numeric pattern.
        Returns:
        the user specified alpha numeric pattern or a default.
      • isUseAlphaNumericOptmization

        public boolean isUseAlphaNumericOptmization()
        Gets whether to use alphanumeric optimization.
        Returns:
        true if the alpha numeric optimization is enabled, otherwise false
      • getAbbreviationDictionary

        public Dictionary getAbbreviationDictionary()
        Gets the abbreviation dictionary.
        Returns:
        null or the abbreviation dictionary
      • getLanguageCode

        public String getLanguageCode()
        Retrieves the language code.
        Returns:
        the language code
      • getContextGenerator

        public TokenContextGenerator getContextGenerator()
        Gets the context generator.
        Returns:
        a new instance of the context generator