public class StringUtil extends Object

| Constructor and Description |
|---|
| StringUtil() |

| Modifier and Type | Method and Description |
|---|---|
| static void | computeShortestEditScript(String wordForm, String lemma, int[][] distance, StringBuffer permutations) Computes the Shortest Edit Script (SES) to convert a word into its lemma. |
| static String | decodeShortestEditScript(String wordForm, String permutations) Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the wordForm. |
| static String | getShortestEditScript(String wordForm, String lemma) Get the SES required to go from a word to a lemma. |
| static boolean | isEmpty(CharSequence theString) |
| static boolean | isWhitespace(char charCode) Determines if the specified character is a whitespace. |
| static boolean | isWhitespace(int charCode) Determines if the specified character is a whitespace. |
| static int[][] | levenshteinDistance(String wordForm, String lemma) Computes the Levenshtein distance of two strings in a matrix. |
| static String | toLowerCase(CharSequence string) Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file. |
| static String | toUpperCase(CharSequence string) Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file. |
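
Two of the helpers summarized above are easy to illustrate without the library on the classpath. The following is a minimal sketch, not OpenNLP's source: the class name `StringUtilSketch` is hypothetical, the whitespace check follows the rule documented for isWhitespace (a Character.isWhitespace(int) whitespace, or a member of the Character.SPACE_SEPARATOR category, so that no-break spaces count), and the distance matrix uses the textbook dynamic-programming recurrence, which may differ from the library's handling of edge cases such as empty inputs.

```java
// Hypothetical stand-in for illustration only; not the OpenNLP implementation.
final class StringUtilSketch {

    // Whitespace as described for isWhitespace: the JDK definition, extended
    // with the Unicode space-separator category so no-break spaces also count.
    static boolean isWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint)
            || Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    // Textbook Levenshtein matrix: d[i][j] is the edit distance between the
    // first i characters of wordForm and the first j characters of lemma.
    static int[][] levenshteinDistance(String wordForm, String lemma) {
        int rows = wordForm.length();
        int cols = lemma.length();
        int[][] d = new int[rows + 1][cols + 1];
        for (int i = 0; i <= rows; i++) {
            d[i][0] = i; // delete i characters
        }
        for (int j = 0; j <= cols; j++) {
            d[0][j] = j; // insert j characters
        }
        for (int i = 1; i <= rows; i++) {
            for (int j = 1; j <= cols; j++) {
                int cost = wordForm.charAt(i - 1) == lemma.charAt(j - 1) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int noBreakSpace = 0x00A0;
        System.out.println(Character.isWhitespace(noBreakSpace)); // false
        System.out.println(isWhitespace(noBreakSpace));           // true

        int[][] d = levenshteinDistance("walked", "walk");
        System.out.println(d[6][4]);                              // 2
    }
}
```

The bottom-right cell of the matrix holds the overall distance; the full matrix is what a caller would hand to a method such as computeShortestEditScript through its distance parameter.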
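
public static boolean isWhitespace(char charCode)
Determines if the specified character is a whitespace. A character is considered a whitespace when it is a Character.isWhitespace(int) whitespace, or when it belongs to the Unicode space-separator category Zs (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces. In OpenNLP no-break spaces are also considered as white spaces.
Parameters: charCode - the character to test

public static boolean isWhitespace(int charCode)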
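Determines if the specified character is a whitespace. A character is considered a whitespace when it is a Character.isWhitespace(int) whitespace, or when it belongs to the Unicode space-separator category Zs (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces. In OpenNLP no-break spaces are also considered as white spaces.
Parameters: charCode - the character (code point) to test

public static String toLowerCase(CharSequence string)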
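Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file.
Parameters: string - the character sequence to convert

public static String toUpperCase(CharSequence string)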
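Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file.
Parameters: string - the character sequence to convert

public static boolean isEmpty(CharSequence theString)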
Returns: true if CharSequence.length() is 0, otherwise false

public static int[][] levenshteinDistance(String wordForm, String lemma)
Computes the Levenshtein distance of two strings in a matrix.
Parameters: wordForm - the form
lemma - the lemma

public static void computeShortestEditScript(String wordForm, String lemma, int[][] distance, StringBuffer permutations)
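Computes the Shortest Edit Script (SES) to convert a word into its lemma.
Parameters: wordForm - the token
lemma - the target lemma
distance - the Levenshtein distance matrix
permutations - the buffer that receives the permutations

public static String decodeShortestEditScript(String wordForm, String permutations)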
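Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the wordForm.
Parameters: wordForm - the wordForm
permutations - the permutations predicted by the lemmatizer model

Copyright © 2021 The Apache Software Foundation. All rights reserved.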