Class NameFinderME

java.lang.Object
opennlp.tools.namefind.NameFinderME
All Implemented Interfaces:
TokenNameFinder

public class NameFinderME extends Object implements TokenNameFinder
A maximum-entropy-based name finder implementation.
  • Field Details

  • Constructor Details

  • Method Details

    • find

      public Span[] find(String[] tokens)
      Description copied from interface: TokenNameFinder
      Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.
      Specified by:
      find in interface TokenNameFinder
      Parameters:
      tokens - An array of the tokens or words of the sequence, typically a sentence.
      Returns:
      An array of token spans, one for each of the names identified.
    • find

      public Span[] find(String[] tokens, String[][] additionalContext)
      Generates name tags for the given sequence, typically a sentence, returning token spans for any identified names.
      Parameters:
      tokens - An array of the tokens or words of the sequence, typically a sentence.
      additionalContext - Features that are based on context outside of the sentence but should also be used.
      Returns:
      An array of token spans, one for each of the names identified.
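
      Because both find methods return token spans rather than strings, a caller usually maps each span back onto the token array. The following is a minimal, library-free sketch of that mapping. TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span, assuming the start index is inclusive and the end index is exclusive; the span in main is an assumed result, not real model output.

```java
import java.util.Arrays;

class SpanTextSketch {

    // Hypothetical stand-in for opennlp.tools.util.Span (an assumption for this sketch):
    // start is the index of the first token, end is one past the last token.
    record TokenSpan(int start, int end, String type) {}

    // Joins the tokens covered by the span into a single space-separated string.
    static String coveredText(String[] tokens, TokenSpan span) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start(), span.end()));
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", "joined", "the", "board", "."};
        // Assumed result of a find(tokens) call, hard-coded for illustration only.
        TokenSpan name = new TokenSpan(0, 2, "person");
        System.out.println(name.type() + ": " + coveredText(tokens, name));
    }
}
```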
    • clearAdaptiveData

      public void clearAdaptiveData()
      Description copied from interface: TokenNameFinder
      Forgets all adaptive data which was collected during previous calls to one of the find methods.

      Note: This method should typically be called at the end of the processing of a document.

      Specified by:
      clearAdaptiveData in interface TokenNameFinder
    • probs

      public void probs(double[] probs)
      Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to find(String[]). The specified array should be at least as large as the number of tokens in the previous call to find(String[]).
      Parameters:
      probs - The array to populate with the probabilities of the last decoded sequence.
    • probs

      public double[] probs()
      Retrieves the probabilities of the last decoded sequence. The sequence was determined based on the previous call to find(String[]).
      Returns:
      An array containing one probability for each token passed to find(String[]) when it was last called.
    • probs

      public double[] probs(Span[] spans)
      Retrieves a probability for each of the specified spans, computed as the arithmetic mean of the probabilities of the outcomes which make up the span.
      Parameters:
      spans - The spans of the names for which probabilities are requested.
      Returns:
      An array of probabilities, one for each of the specified spans.
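
      The averaging described above can be sketched without the library. This is an illustration of the arithmetic only, not the class's actual code: TokenSpan is a hypothetical stand-in for Span with an exclusive end index, tokenProbs plays the role of the per-token values returned by probs(), and every span is assumed to cover at least one token.

```java
class SpanProbabilitySketch {

    // Hypothetical stand-in for opennlp.tools.util.Span; end is assumed to be exclusive.
    record TokenSpan(int start, int end) {}

    // Returns, for each span, the arithmetic mean of the token probabilities it covers.
    // Assumes each span is non-empty and lies within tokenProbs.
    static double[] spanProbs(double[] tokenProbs, TokenSpan[] spans) {
        double[] result = new double[spans.length];
        for (int i = 0; i < spans.length; i++) {
            double sum = 0.0;
            for (int t = spans[i].start(); t < spans[i].end(); t++) {
                sum += tokenProbs[t];
            }
            result[i] = sum / (spans[i].end() - spans[i].start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up per-token probabilities for a four-token sentence.
        double[] tokenProbs = {0.5, 0.75, 0.9, 0.25};
        TokenSpan[] spans = {new TokenSpan(0, 2), new TokenSpan(3, 4)};
        for (double p : spanProbs(tokenProbs, spans)) {
            System.out.println(p);
        }
    }
}
```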
    • train

      public static TokenNameFinderModel train(String languageCode, String type, ObjectStream<NameSample> samples, TrainingParameters params, TokenNameFinderFactory factory) throws IOException
      Starts a training of a TokenNameFinderModel with the given parameters.
      Parameters:
      languageCode - The ISO-conformant language code.
      type - The type to use.
      samples - The ObjectStream of NameSample used as input for training.
      params - The TrainingParameters for the context of the training.
      factory - The TokenNameFinderFactory for creating related objects defined via params.
      Returns:
      A valid, trained TokenNameFinderModel instance.
      Throws:
      IOException - Thrown if IO errors occurred during training.
    • dropOverlappingSpans

      public static Span[] dropOverlappingSpans(Span[] spans)
      Removes spans which intersect or cross in any way.

      The following rules are used to remove the spans:
      Identical spans: The first span in the array after sorting it remains.
      Intersecting spans: The first span after sorting remains.
      Contained spans: All spans which are contained by another are removed.

      Parameters:
      spans - The input spans.
      Returns:
      The resulting non-overlapping spans.
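
      The three rules above can be approximated with a sort followed by a single pass that keeps a span only if it does not overlap the last span kept. The sketch below is an illustration under stated assumptions, not the library's implementation: TokenSpan is a hypothetical stand-in for Span with an exclusive end index, and the sort order (ascending start, longer span first when starts are equal) is an assumption about what "after sorting" means here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DropOverlapSketch {

    // Hypothetical stand-in for opennlp.tools.util.Span; end is assumed to be exclusive.
    record TokenSpan(int start, int end) {
        // Two half-open ranges overlap if each one starts before the other ends.
        boolean overlaps(TokenSpan other) {
            return start() < other.end() && other.start() < end();
        }
    }

    static List<TokenSpan> dropOverlapping(TokenSpan[] spans) {
        TokenSpan[] sorted = spans.clone();
        // Assumed order: ascending start; for equal starts, the longer span comes first.
        Arrays.sort(sorted, (TokenSpan a, TokenSpan b) -> a.start() != b.start()
                ? Integer.compare(a.start(), b.start())
                : Integer.compare(b.end(), a.end()));

        List<TokenSpan> kept = new ArrayList<>();
        TokenSpan last = null;
        for (TokenSpan span : sorted) {
            // Identical, intersecting, and contained spans all overlap the last kept span.
            if (last != null && last.overlaps(span)) {
                continue;
            }
            kept.add(span);
            last = span;
        }
        return kept;
    }

    public static void main(String[] args) {
        TokenSpan[] input = {
            new TokenSpan(4, 5), new TokenSpan(0, 2), new TokenSpan(1, 3),
            new TokenSpan(0, 2), new TokenSpan(4, 6), new TokenSpan(7, 8)
        };
        System.out.println(dropOverlapping(input));
    }
}
```

      In this sketch the duplicate (0, 2), the intersecting (1, 3), and the contained (4, 5) are dropped, so each of the three rules is exercised once.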