StringUtil (Apache OpenNLP Tools 1.9.4 API)

java.lang.Object
- opennlp.tools.util.StringUtil

```
public class StringUtil
extends Object
```

Constructor Summary

Constructors
Constructor and Description
`StringUtil()`

Method Summary

Modifier and Type	Method and Description
`static void`	`computeShortestEditScript(String wordForm, String lemma, int[][] distance, StringBuffer permutations)` Computes the Shortest Edit Script (SES) to convert a word into its lemma.
`static String`	`decodeShortestEditScript(String wordForm, String permutations)` Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the word form.
`static String`	`getShortestEditScript(String wordForm, String lemma)` Gets the SES required to go from a word to a lemma.
`static boolean`	`isEmpty(CharSequence theString)` Returns `true` if the given `CharSequence` is `null` or its length is `0`.
`static boolean`	`isWhitespace(char charCode)` Determines if the specified character is whitespace.
`static boolean`	`isWhitespace(int charCode)` Determines if the specified character is whitespace.
`static int[][]`	`levenshteinDistance(String wordForm, String lemma)` Computes the Levenshtein distance of two strings in a matrix.
`static String`	`toLowerCase(CharSequence string)` Converts to lower case independent of the current locale via `Character.toLowerCase(int)`, which uses mapping information from the UnicodeData file.
`static String`	`toUpperCase(CharSequence string)` Converts to upper case independent of the current locale via `Character.toUpperCase(char)`, which uses mapping information from the UnicodeData file.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - StringUtil
```
public StringUtil()
```
- Method Detail
  - isWhitespace
```
public static boolean isWhitespace(char charCode)
```
    Determines if the specified character is whitespace. A character is considered whitespace when one of the following conditions is met:
    - It is whitespace according to Character.isWhitespace(int).
    - It is part of the Unicode Zs category (Character.SPACE_SEPARATOR).
    Character.isWhitespace(int) does not include no-break spaces. In OpenNLP, no-break spaces are also considered whitespace.
    Parameters:
    
    charCode - the character to test
    
    Returns:
    
    true if the character is whitespace, otherwise false
  - isWhitespace
```
public static boolean isWhitespace(int charCode)
```
    Determines if the specified character is whitespace. A character is considered whitespace when one of the following conditions is met:
    - It is whitespace according to Character.isWhitespace(int).
    - It is part of the Unicode Zs category (Character.SPACE_SEPARATOR).
    Character.isWhitespace(int) does not include no-break spaces. In OpenNLP, no-break spaces are also considered whitespace.
    Parameters:
    
    charCode - the character (Unicode code point) to test
    
    Returns:
    
    true if the character is whitespace, otherwise false
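    
    The two conditions above can be sketched with plain JDK calls. This is an illustrative re-statement of the documented rule, not the OpenNLP source; the class name WhitespaceSketch is hypothetical.

```java
// Sketch only: restates the two documented conditions using the JDK.
// WhitespaceSketch is a hypothetical name, not part of OpenNLP.
final class WhitespaceSketch {

    static boolean isWhitespace(int codePoint) {
        // Condition 1: whitespace according to Character.isWhitespace(int).
        // Condition 2: member of the Unicode Zs category (SPACE_SEPARATOR),
        // which also covers the no-break space U+00A0.
        return Character.isWhitespace(codePoint)
            || Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(0x00A0)); // false: JDK excludes no-break space
        System.out.println(isWhitespace(0x00A0));           // true: Zs category
        System.out.println(isWhitespace('a'));              // false
    }
}
```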
  - toLowerCase
```
public static String toLowerCase(CharSequence string)
```
    Converts to lower case independent of the current locale via Character.toLowerCase(int), which uses mapping information from the UnicodeData file.
    
    Parameters:
    
    string - the text to convert
    
    Returns:
    
    the lower-cased String
  - toUpperCase
```
public static String toUpperCase(CharSequence string)
```
    Converts to upper case independent of the current locale via Character.toUpperCase(char), which uses mapping information from the UnicodeData file.
    
    Parameters:
    
    string - the text to convert
    
    Returns:
    
    the upper-cased String
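    
    To illustrate why locale independence matters for the two case conversions above, the sketch below maps each char with the Character methods and compares the result with String.toLowerCase under a Turkish locale. It is an assumption-laden illustration rather than the OpenNLP source; CaseSketch is a hypothetical name.

```java
// Sketch only: per-character case mapping that ignores the default locale.
// CaseSketch is a hypothetical name, not part of OpenNLP.
final class CaseSketch {

    static String toLowerCase(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            // Character.toLowerCase(char) uses the UnicodeData mapping only.
            out.append(Character.toLowerCase(text.charAt(i)));
        }
        return out.toString();
    }

    static String toUpperCase(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(Character.toUpperCase(text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        java.util.Locale turkish = java.util.Locale.forLanguageTag("tr");
        // Under Turkish rules 'I' lower-cases to a dotless i, so this prints false.
        System.out.println("TITLE".toLowerCase(turkish).equals("title"));
        // The per-character mapping is unaffected by locale, so this prints true.
        System.out.println(toLowerCase("TITLE").equals("title"));
    }
}
```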
  - isEmpty
```
public static boolean isEmpty(CharSequence theString)
```
    Returns true if the given CharSequence is null or its length is 0.
    
    Returns:
    
    true if theString is null or has length 0, otherwise false
    
    Since:
    
    1.5.1
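    
    A minimal sketch of the null-or-zero-length check described above, assuming nothing beyond the documented contract; EmptySketch is a hypothetical name.

```java
// Sketch only: the documented contract restated with plain JDK calls.
// EmptySketch is a hypothetical name, not part of OpenNLP.
final class EmptySketch {

    static boolean isEmpty(CharSequence text) {
        // null is treated the same as a zero-length sequence.
        return text == null || text.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null)); // true
        System.out.println(isEmpty(""));   // true
        System.out.println(isEmpty(" "));  // false: a blank is still a character
    }
}
```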
  - levenshteinDistance
```
public static int[][] levenshteinDistance(String wordForm,
                                          String lemma)
```
    Computes the Levenshtein distance of two strings in a matrix. Based on the pseudo-code at https://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance, which in turn is based on Wagner, Robert A.; Fischer, Michael J. (1974), "The String-to-String Correction Problem", Journal of the ACM 21 (1): 168-173.
    
    Parameters:
    
    wordForm - the word form
    
    lemma - the lemma
    
    Returns:
    
    the distance matrix
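    
    The matrix form of the Wagner-Fischer algorithm referenced above can be sketched as follows. This is a textbook version written for illustration, not the OpenNLP source, and edge-case behaviour (for example with empty inputs) may differ; LevenshteinSketch and distanceMatrix are hypothetical names.

```java
// Sketch only: textbook Wagner-Fischer dynamic programming.
// LevenshteinSketch and distanceMatrix are hypothetical names, not part of OpenNLP.
final class LevenshteinSketch {

    static int[][] distanceMatrix(String source, String target) {
        int n = source.length();
        int m = target.length();
        int[][] d = new int[n + 1][m + 1];

        // Turning a prefix into the empty string costs one deletion per character.
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        // Building a prefix from the empty string costs one insertion per character.
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = distanceMatrix("kitten", "sitting");
        // The bottom-right cell holds the distance between the full strings.
        System.out.println(d[6][7]); // 3
    }
}
```

    The bottom-right cell of the returned matrix is the edit distance between the complete strings; the remaining cells hold the distances between their prefixes.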
  - computeShortestEditScript
```
public static void computeShortestEditScript(String wordForm,
                                             String lemma,
                                             int[][] distance,
                                             StringBuffer permutations)
```
    Computes the Shortest Edit Script (SES) to convert a word into its lemma. This is based on Chrupala's PhD thesis (2008).
    
    Parameters:
    
    wordForm - the token
    
    lemma - the target lemma
    
    distance - the Levenshtein distance matrix
    
    permutations - the buffer to which the computed edit script is appended
  - decodeShortestEditScript
```
public static String decodeShortestEditScript(String wordForm,
                                              String permutations)
```
    Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the word form.
    
    Parameters:
    
    wordForm - the word form
    
    permutations - the permutations predicted by the lemmatizer model
    
    Returns:
    
    the lemma
  - getShortestEditScript
```
public static String getShortestEditScript(String wordForm,
                                           String lemma)
```
    Gets the SES required to go from a word to a lemma.
    
    Parameters:
    
    wordForm - the word
    
    lemma - the lemma
    
    Returns:
    
    the shortest edit script
