Class LemmatizerME

  • All Implemented Interfaces:
    Lemmatizer

    public class LemmatizerME
    extends Object
    implements Lemmatizer
    A probabilistic lemmatizer. Tries to predict the induced permutation class for each word depending on its surrounding context. Based on: Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University. http://grzegorz.chrupala.me/papers/phd-single.pdf
    • Constructor Detail

      • LemmatizerME

        public LemmatizerME(LemmatizerModel model)
        Initializes the current instance with the provided model and the default beam size of 3.
        Parameters:
        model - the model
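        The beam size bounds how many partial label sequences are kept at each token position during decoding. The following is a minimal, generic beam search sketch, not the library's implementation: the Scorer interface, the Hypothesis class, and the toy probabilities are all hypothetical names and values introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BeamSearchSketch {

    /** Hypothetical scorer: log-probability of a label given its position and the previous label. */
    interface Scorer {
        double logProb(int position, String previousLabel, String label);
    }

    /** A partial label sequence together with its accumulated log-probability. */
    static final class Hypothesis {
        final List<String> labels;
        final double score;

        Hypothesis(List<String> labels, double score) {
            this.labels = labels;
            this.score = score;
        }
    }

    /** Returns at most beamSize complete sequences, best first. */
    static List<Hypothesis> search(int length, String[] labelSet, Scorer scorer, int beamSize) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int pos = 0; pos < length; pos++) {
            // Extend every surviving hypothesis with every label.
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                String prev = h.labels.isEmpty() ? null : h.labels.get(h.labels.size() - 1);
                for (String label : labelSet) {
                    List<String> next = new ArrayList<>(h.labels);
                    next.add(label);
                    expanded.add(new Hypothesis(next, h.score + scorer.logProb(pos, prev, label)));
                }
            }
            // Prune: keep only the beamSize highest-scoring hypotheses.
            expanded.sort(Comparator.comparingDouble((Hypothesis h) -> h.score).reversed());
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam;
    }

    public static void main(String[] args) {
        String[] labels = {"A", "B"};
        // Toy model: "A" is likelier at the start; afterwards, switching labels is likelier than repeating.
        Scorer scorer = (pos, prev, label) -> {
            if (prev == null) {
                return Math.log(label.equals("A") ? 0.6 : 0.4);
            }
            return Math.log(label.equals(prev) ? 0.3 : 0.7);
        };
        List<Hypothesis> beam = search(3, labels, scorer, 3);
        System.out.println(beam.get(0).labels); // prints "[A, B, A]"
    }
}
```

        A larger beam explores more alternatives at a higher cost; a beam of 1 reduces to greedy decoding.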
    • Method Detail

      • lemmatize

        public String[] lemmatize(String[] toks,
                                  String[] tags)
        Description copied from interface: Lemmatizer
        Generates lemmas for the given tokens and POS tags, returning the result in an array.
        Specified by:
        lemmatize in interface Lemmatizer
        Parameters:
        toks - an array of the tokens
        tags - an array of the POS tags
        Returns:
        an array of possible lemmas for each token in the sequence.
      • lemmatize

        public List<List<String>> lemmatize(List<String> toks,
                                            List<String> tags)
        Description copied from interface: Lemmatizer
        Generates lemmas for the given tokens and POS tags, returning a list of every possible lemma for each token and POS tag pair.
        Specified by:
        lemmatize in interface Lemmatizer
        Parameters:
        toks - a list of the tokens
        tags - a list of the POS tags
        Returns:
        a list of every possible lemma for each token in the sequence.
      • predictSES

        public String[] predictSES(String[] toks,
                                   String[] tags)
        Predicts the Short Edit Script (the automatically induced lemma class) for each token.
        Parameters:
        toks - the array of tokens
        tags - the array of POS tags
        Returns:
        an array containing the lemma classes
      • predictLemmas

        public String[][] predictLemmas(int numLemmas,
                                        String[] toks,
                                        String[] tags)
        Predicts all possible lemmas, using numLemmas as the upper bound.
        Parameters:
        numLemmas - the upper bound on the number of lemmas
        toks - the tokens
        tags - the POS tags
        Returns:
        a two-dimensional array containing all possible lemmas for each token and POS tag pair
      • decodeLemmas

        public static String[] decodeLemmas(String[] toks,
                                            String[] preds)
        Decodes the lemma from the word and the induced lemma class.
        Parameters:
        toks - the array of tokens
        preds - the predicted lemma classes
        Returns:
        the array of decoded lemmas
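        To illustrate the idea of decoding a lemma from a word plus an induced class, here is a deliberately simplified sketch. It assumes a hypothetical suffix-rewrite rule (strip N trailing characters, then append a string) as the lemma class; this is not the edit-script encoding the library uses, and EditScriptSketch, SuffixRule, induce, and decode are names invented for this example.

```java
public class EditScriptSketch {

    /** Hypothetical lemma class: drop `strip` trailing characters, then append `append`. */
    static final class SuffixRule {
        final int strip;
        final String append;

        SuffixRule(int strip, String append) {
            this.strip = strip;
            this.append = append;
        }

        /** Induces the rule that rewrites `word` into `lemma`, based on their common prefix. */
        static SuffixRule induce(String word, String lemma) {
            int common = 0;
            int max = Math.min(word.length(), lemma.length());
            while (common < max && word.charAt(common) == lemma.charAt(common)) {
                common++;
            }
            return new SuffixRule(word.length() - common, lemma.substring(common));
        }

        /** Applies the rule; a word shorter than `strip` is returned unchanged. */
        String apply(String word) {
            if (strip > word.length()) {
                return word;
            }
            return word.substring(0, word.length() - strip) + append;
        }
    }

    /** Decodes one lemma per token from parallel arrays of tokens and predicted rules. */
    static String[] decode(String[] toks, SuffixRule[] preds) {
        if (toks.length != preds.length) {
            throw new IllegalArgumentException("toks and preds must have the same length");
        }
        String[] lemmas = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            lemmas[i] = preds[i].apply(toks[i]);
        }
        return lemmas;
    }

    public static void main(String[] args) {
        SuffixRule ing = SuffixRule.induce("walking", "walk"); // strip 3, append ""
        SuffixRule mice = SuffixRule.induce("mice", "mouse");  // strip 3, append "ouse"
        String[] lemmas = decode(new String[] {"talking", "lice"}, new SuffixRule[] {ing, mice});
        System.out.println(String.join(" ", lemmas)); // prints "talk louse"
    }
}
```

        Because a rule is independent of the word it was induced from, the same class generalizes to unseen words, which is what makes the class predictable by a classifier.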
      • topKSequences

        public Sequence[] topKSequences(String[] sentence,
                                        String[] tags,
                                        double minSequenceScore)
      • probs

        public void probs(double[] probs)
        Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize. The specified array should be at least as large as the number of tokens in the previous call to lemmatize.
        Parameters:
        probs - An array used to hold the probabilities of the last decoded sequence.
      • probs

        public double[] probs()
        Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to lemmatize.
        Returns:
        An array with the same number of probabilities as tokens were sent to lemmatize when it was last called.
      • topKLemmaClasses

        public Sequence[] topKLemmaClasses(String[] sentence,
                                           String[] tags,
                                           double minSequenceScore)