opennlp.tools.parser
Class AbstractBottomUpParser

java.lang.Object
  extended by opennlp.tools.parser.AbstractBottomUpParser
All Implemented Interfaces:
Parser
Direct Known Subclasses:
opennlp.tools.parser.chunking.Parser, opennlp.tools.parser.treeinsert.Parser

public abstract class AbstractBottomUpParser
extends Object
implements Parser

Abstract class which contains the code to tag and chunk parses for bottom-up parsing, and leaves the implementation of advancing and completing parses to extending classes.

Note:
The nodes within the returned parses are shared with other parses, and therefore their parent node references will not be consistent with their child node references. setParents can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.
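
The note above can be illustrated with a minimal, self-contained sketch. The Node class and the setParents helper below are hypothetical stand-ins written for this illustration; they are not opennlp.tools.parser.Parse or the library's implementation. The sketch only shows why a node that appears in two candidate parses can point back at one parent at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; not the OpenNLP classes.
class SharedNodeSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent; // a node stores a single parent, even if two trees contain it

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Walks the tree top-down and points every child back at the node that lists it.
    static void setParents(Node root) {
        for (Node child : root.children) {
            child.parent = root;
            setParents(child);
        }
    }

    public static void main(String[] args) {
        Node shared = new Node("NP");
        Node first = new Node("S", shared);     // candidate parse 1
        Node second = new Node("FRAG", shared); // candidate parse 2 reuses the same NP node

        setParents(first);
        System.out.println(shared.parent.label); // parent now belongs to parse 1
        setParents(second);
        System.out.println(shared.parent.label); // parent now belongs to parse 2
    }
}
```

After the second call, the parent reference that was valid for the first parse no longer is, which is the invalidation the note warns about.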


Field Summary
static String COMPLETE
          Outcome used when a constituent is complete.
static String CONT
          Prefix for outcomes continuing a constituent.
static double defaultAdvancePercentage
          The default amount of probability mass required of advanced outcomes.
static int defaultBeamSize
          The default beam size used if no beam size is given.
static String INC_NODE
          The label for the top node of an incomplete parse.
static String INCOMPLETE
          Outcome used when a constituent is incomplete.
static String OTHER
          Outcome for token which is not contained in a basal constituent.
static String START
          Prefix for outcomes starting a constituent.
static String TOK_NODE
          The label for a token node.
static String TOP_NODE
          The label for the top node.
static Integer ZERO
          The integer 0.
 
Constructor Summary
AbstractBottomUpParser(POSTagger tagger, Chunker chunker, HeadRules headRules, int beamSize, double advancePercentage)
           
 
Method Summary
static Dictionary buildDictionary(ObjectStream<Parse> data, HeadRules rules, int cutoff)
          Creates an n-gram dictionary from the specified data stream using the specified head rules and the specified cut-off.
static Dictionary buildDictionary(ObjectStream<Parse> data, HeadRules rules, TrainingParameters params)
          Creates an n-gram dictionary from the specified data stream using the specified head rules and the cut-off contained in the specified training parameters.
static Parse[] collapsePunctuation(Parse[] chunks, Set<String> punctSet)
          Removes the punctuation from the specified set of chunks, adds it to the parses adjacent to the punctuation, and returns a new array of parses with the punctuation removed.
 Parse parse(Parse tokens)
          Returns a parse for the specified parse of tokens.
 Parse[] parse(Parse tokens, int numParses)
          Returns the specified number of parses or fewer for the specified tokens.
 void setErrorReporting(boolean errorReporting)
          Specifies whether the parser should report when it was unable to find a parse for a particular sentence.
static void setParents(Parse p)
          Assigns parent references for the specified parse so that they are consistent with the children references.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultBeamSize

public static final int defaultBeamSize
The default beam size used if no beam size is given.

See Also:
Constant Field Values

defaultAdvancePercentage

public static final double defaultAdvancePercentage
The default amount of probability mass required of advanced outcomes.

See Also:
Constant Field Values
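
One way to read "probability mass required of advanced outcomes" is sketched below: outcomes are taken in order of decreasing probability until their cumulative probability reaches the required mass. This reading is an assumption made for illustration; the class and method names are hypothetical, and the way concrete subclasses apply the value may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a probability-mass cutoff; not OpenNLP code.
class AdvanceMassSketch {
    // Returns the indices of the outcomes kept. `probs` must be sorted in
    // descending order; outcomes are added until their sum reaches requiredMass.
    static List<Integer> select(double[] probs, double requiredMass) {
        List<Integer> kept = new ArrayList<>();
        double mass = 0.0;
        for (int i = 0; i < probs.length && mass < requiredMass; i++) {
            kept.add(i);
            mass += probs[i];
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.25, 0.125, 0.125};
        System.out.println(select(probs, 0.75)); // keeps the first two outcomes
        System.out.println(select(probs, 0.8));  // needs a third outcome to reach 0.8
    }
}
```

A higher required mass lets more low-probability outcomes through, which widens the search at the cost of more work per step.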

TOP_NODE

public static final String TOP_NODE
The label for the top node.

See Also:
Constant Field Values

INC_NODE

public static final String INC_NODE
The label for the top node of an incomplete parse.

See Also:
Constant Field Values

TOK_NODE

public static final String TOK_NODE
The label for a token node.

See Also:
Constant Field Values

ZERO

public static final Integer ZERO
The integer 0.


START

public static final String START
Prefix for outcomes starting a constituent.

See Also:
Constant Field Values

CONT

public static final String CONT
Prefix for outcomes continuing a constituent.

See Also:
Constant Field Values

OTHER

public static final String OTHER
Outcome for token which is not contained in a basal constituent.

See Also:
Constant Field Values
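
The START, CONT, and OTHER outcomes can be pictured as per-token chunk tags. The sketch below groups tokens into labelled chunks from such tags. It is a hypothetical illustration, not library code, and the literal strings "S-", "C-", and "O" are assumed values; the Constant Field Values page is the authority on the actual values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of start/continue/other chunk outcomes.
class ChunkOutcomeSketch {
    // Assumed values for illustration; confirm them on the Constant Field Values page.
    static final String START = "S-";
    static final String CONT = "C-";
    static final String OTHER = "O";

    // Groups tokens into bracketed chunks; tokens tagged OTHER stay outside any chunk.
    static List<String> group(String[] tokens, String[] outcomes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            String outcome = outcomes[i];
            if (outcome.startsWith(START)) {
                // A new constituent begins: close the open one, if any.
                if (current != null) chunks.add(current.append(']').toString());
                String label = outcome.substring(START.length());
                current = new StringBuilder("[" + label + " " + tokens[i]);
            } else if (outcome.startsWith(CONT) && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                // OTHER, or a continuation with nothing to continue.
                if (current != null) chunks.add(current.append(']').toString());
                current = null;
                chunks.add(tokens[i]);
            }
        }
        if (current != null) chunks.add(current.append(']').toString());
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barked", "."};
        String[] outcomes = {START + "NP", CONT + "NP", START + "VP", OTHER};
        System.out.println(group(tokens, outcomes));
    }
}
```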

COMPLETE

public static final String COMPLETE
Outcome used when a constituent is complete.

See Also:
Constant Field Values

INCOMPLETE

public static final String INCOMPLETE
Outcome used when a constituent is incomplete.

See Also:
Constant Field Values

Constructor Detail

AbstractBottomUpParser

public AbstractBottomUpParser(POSTagger tagger,
                              Chunker chunker,
                              HeadRules headRules,
                              int beamSize,
                              double advancePercentage)

Method Detail

setErrorReporting

public void setErrorReporting(boolean errorReporting)
Specifies whether the parser should report when it was unable to find a parse for a particular sentence.

Parameters:
errorReporting - If true, sentences that could not be parsed are reported; if false, they are not.

setParents

public static void setParents(Parse p)
Assigns parent references for the specified parse so that they are consistent with the children references.

Parameters:
p - The parse whose parent references need to be assigned.

collapsePunctuation

public static Parse[] collapsePunctuation(Parse[] chunks,
                                          Set<String> punctSet)
Removes the punctuation from the specified set of chunks, adds it to the parses adjacent to the punctuation, and returns a new array of parses with the punctuation removed.

Parameters:
chunks - An array of parses.
punctSet - The set of punctuation which is to be removed.
Returns:
An array of parses which is a subset of chunks with punctuation removed.
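
The behaviour described for collapsePunctuation can be sketched on a simplified data structure. The Chunk class and its two punctuation lists below are hypothetical stand-ins rather than the Parse API, and the rule for which neighbour receives the punctuation is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; Chunk is not opennlp.tools.parser.Parse.
class CollapsePunctuationSketch {
    static final class Chunk {
        final String text;
        final List<String> previousPunctuation = new ArrayList<>();
        final List<String> nextPunctuation = new ArrayList<>();

        Chunk(String text) {
            this.text = text;
        }
    }

    // Drops punctuation chunks and records each one on its non-punctuation neighbours.
    static Chunk[] collapse(Chunk[] chunks, Set<String> punctSet) {
        List<Chunk> kept = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // punctuation seen since the last kept chunk
        for (Chunk chunk : chunks) {
            if (punctSet.contains(chunk.text)) {
                if (!kept.isEmpty()) {
                    kept.get(kept.size() - 1).nextPunctuation.add(chunk.text);
                }
                pending.add(chunk.text);
            } else {
                chunk.previousPunctuation.addAll(pending);
                pending.clear();
                kept.add(chunk);
            }
        }
        return kept.toArray(new Chunk[0]);
    }

    public static void main(String[] args) {
        Chunk[] chunks = {
            new Chunk("Yes"), new Chunk(","), new Chunk("we"), new Chunk("can"), new Chunk(".")
        };
        Set<String> punctSet = new HashSet<>(Arrays.asList(",", "."));
        for (Chunk c : collapse(chunks, punctSet)) {
            System.out.println(c.previousPunctuation + " " + c.text + " " + c.nextPunctuation);
        }
    }
}
```

The returned array is shorter than the input, while the punctuation itself stays reachable from the chunks on either side of it.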

parse

public Parse[] parse(Parse tokens,
                     int numParses)
Description copied from interface: Parser
Returns the specified number of parses or fewer for the specified tokens.
Note: The nodes within the returned parses are shared with other parses, and therefore their parent node references will not be consistent with their child node references. setParents(Parse) can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.

Specified by:
parse in interface Parser
Parameters:
tokens - A parse containing the tokens with a single parent node.
numParses - The number of parses desired.
Returns:
The specified number of parses, or fewer, for the specified tokens.

parse

public Parse parse(Parse tokens)
Description copied from interface: Parser
Returns a parse for the specified parse of tokens.

Specified by:
parse in interface Parser
Parameters:
tokens - The root node of a flat parse containing only tokens.
Returns:
A full parse of the specified tokens, or the flat chunks of the tokens if a full parse could not be found.

buildDictionary

public static Dictionary buildDictionary(ObjectStream<Parse> data,
                                         HeadRules rules,
                                         TrainingParameters params)
                                  throws IOException
Creates an n-gram dictionary from the specified data stream using the specified head rules and the cut-off contained in the specified training parameters.

Parameters:
data - The data stream of parses.
rules - The head rules for the parses.
params - The training parameters; these can contain a cutoff, the minimum number of entries required for the n-gram to be saved as part of the dictionary.
Returns:
A dictionary object.
Throws:
IOException

buildDictionary

public static Dictionary buildDictionary(ObjectStream<Parse> data,
                                         HeadRules rules,
                                         int cutoff)
                                  throws IOException
Creates an n-gram dictionary from the specified data stream using the specified head rules and the specified cut-off.

Parameters:
data - The data stream of parses.
rules - The head rules for the parses.
cutoff - The minimum number of entries required for the n-gram to be saved as part of the dictionary.
Returns:
A dictionary object.
Throws:
IOException
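
The role of the cutoff can be sketched independently of the library. The example below counts n-grams over plain word arrays and keeps those seen at least cutoff times. It is a hypothetical simplification: the class name, the maxN parameter, and the use of plain strings are assumptions, and the real method works on a stream of Parse objects with head rules.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a count cutoff; not the OpenNLP implementation.
class NGramCutoffSketch {
    // Counts every n-gram of length 1..maxN and keeps those seen at least `cutoff` times.
    static Set<String> build(List<String[]> sentences, int maxN, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] words : sentences) {
            for (int start = 0; start < words.length; start++) {
                for (int n = 1; n <= maxN && start + n <= words.length; n++) {
                    String gram = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        Set<String> dictionary = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                dictionary.add(entry.getKey());
            }
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<String[]> sentences = Arrays.asList(
            new String[] {"the", "dog", "barked"},
            new String[] {"the", "dog", "slept"});
        // With a cutoff of 2, only n-grams that occur in both sentences survive.
        System.out.println(build(sentences, 2, 2));
    }
}
```

Raising the cutoff shrinks the dictionary to the more frequent n-grams; a cutoff of 1 keeps everything that was observed.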


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.