This class represents the code associated with a UDR.
#include <sqludr.h>
Public Member Functions
    UDR ()
    virtual ~UDR ()
    virtual void describeParamsAndColumns (UDRInvocationInfo &info)
    virtual void describeDataflowAndPredicates (UDRInvocationInfo &info)
    virtual void describeConstraints (UDRInvocationInfo &info)
    virtual void describeStatistics (UDRInvocationInfo &info)
    virtual void describeDesiredDegreeOfParallelism (UDRInvocationInfo &info, UDRPlanInfo &plan)
    virtual void describePlanProperties (UDRInvocationInfo &info, UDRPlanInfo &plan)
    virtual void completeDescription (UDRInvocationInfo &info, UDRPlanInfo &plan)
    virtual void processData (UDRInvocationInfo &info, UDRPlanInfo &plan)
    bool getNextRow (UDRInvocationInfo &info, int tableIndex=0)
    void emitRow (UDRInvocationInfo &info)
    virtual void debugLoop ()
    virtual int getFeaturesSupportedByUDF ()
UDR writers can create a derived class and implement these methods for their specific UDR. The base class provides default implementations for all methods except the runtime method, processData(). See https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface for examples.
To use this interface, the UDR writer must provide a function of type CreateInterfaceObjectFunc whose name is the UDR's external name and that has "C" linkage. For example, assuming the external name of the UDF is MYUDF:
#include <sqludr.h>

// the UDR classes are declared in namespace tmudr
using namespace tmudr;

// define a class that is derived from UDR
class MyUDFInterface : public UDR
{
  // Override any virtual methods where the UDF author would
  // like to change the default behavior. It is fine to add
  // other methods and data members, just make sure to free
  // up all resources in the destructor.
  ...
};

// define a "factory" function to return an object of this class
extern "C" SQLUDR_LIBFUNC UDR * MYUDF()
{
  return new MyUDFInterface;
}
UDR::UDR ()
Default constructor.
Use this in the constructor of a derived class.
UDR::~UDR () [virtual]
Virtual destructor.
Override this destructor to deallocate any resources of a derived class, if necessary. Note that a UDR object may be used for several UDR invocations, sometimes at the same time, in one or more queries. Therefore, this class is meant for storing resources that can be shared among multiple invocations. Note also that compile time and run time may happen in different processes, so an object of this class cannot carry state from the compile-time calls of an invocation to its run-time calls. To pass invocation-related information between these phases, use UDRPlanInfo::addPlanData() in completeDescription(), described below.
Exceptions:
    UDRException
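The following is a minimal, self-contained sketch of this lifecycle. Because sqludr.h is not available here, `UDRStandIn` is a hypothetical stand-in for the real UDR base class, and `LookupUDF`, `liveObjects` and `createLookupUDF()` are illustrative names. The derived object holds data meant to be shared by every invocation that uses it, and deleting it through a base-class pointer runs the derived destructor because the base destructor is virtual.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real UDR base class from sqludr.h, so that
// this sketch compiles on its own. The real class has many more methods.
class UDRStandIn {
public:
  virtual ~UDRStandIn() = default;
};

// Hypothetical derived UDR holding a lookup table that every invocation
// using this object may share. Nothing invocation-specific is stored here.
class LookupUDF : public UDRStandIn {
public:
  static int liveObjects;  // counts live objects so cleanup can be checked

  LookupUDF() : table_{{"one", 1}, {"two", 2}} { ++liveObjects; }

  // Explicitly acquired resources (file handles, connections, ...) would be
  // released here; members such as table_ are destroyed automatically.
  ~LookupUDF() override { --liveObjects; }

  // Read-only access to the shared data; returns -1 for unknown keys.
  int lookup(const std::string &key) const {
    auto it = table_.find(key);
    return it == table_.end() ? -1 : it->second;
  }

private:
  std::unordered_map<std::string, int> table_;
};

int LookupUDF::liveObjects = 0;

// Factory in the style of the MYUDF example, returning a base-class pointer.
UDRStandIn *createLookupUDF() { return new LookupUDF; }
```

Keeping the shared data read-only after construction is one simple way to accommodate an object that serves several invocations at the same time.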
void UDR::completeDescription (UDRInvocationInfo &info, UDRPlanInfo &plan) [virtual]
Seventh and final method of the compiler interface for TMUDFs (optional).
This final compile time call gives the UDF writer the opportunity to examine the chosen query plan, to pass information on to the runtime method, using UDRPlanInfo::addPlanData(), and to clean up any resources related to the compile phase of a particular TMUDF invocation.
The default implementation does nothing.
Parameters:
    info  A description of the UDR invocation.
    plan  Plan-related description of the UDR invocation.
Exceptions:
    UDRException
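Since compile time and run time may happen in different processes, whatever completeDescription() passes on has to be plain bytes. Below is a minimal sketch of such an encoding, assuming a hypothetical `MyPlanState` struct and hypothetical helpers `encodePlanState()` and `decodePlanState()`, none of which are part of sqludr.h. In a real UDR the encoded string would be handed to UDRPlanInfo::addPlanData() and decoded again in processData(); consult sqludr.h for the exact signatures of the plan-data methods.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical compile-time decisions a UDF might hand to processData().
struct MyPlanState {
  std::int32_t keyColumn;     // a column number chosen at compile time
  std::int32_t rowsPerInput;  // e.g. a fan-out factor decided during planning
};

// Flatten the struct into a byte string. Only plain bytes survive the trip
// from the compiler process to the runtime process, so no pointers are stored.
std::string encodePlanState(const MyPlanState &s) {
  std::string bytes(sizeof(MyPlanState), '\0');
  std::memcpy(&bytes[0], &s, sizeof(MyPlanState));
  return bytes;
}

// Rebuild the struct at run time; returns false if the blob has the wrong size.
bool decodePlanState(const std::string &bytes, MyPlanState &out) {
  if (bytes.size() != sizeof(MyPlanState)) return false;
  std::memcpy(&out, bytes.data(), sizeof(MyPlanState));
  return true;
}
```

The raw memcpy approach assumes both processes load the same build of the UDR library, so struct layout and byte order match; a versioned or text-based format is more robust if that assumption may not hold.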
void UDR::debugLoop () [virtual]
Debugging hook for UDRs.
This method is called in debug Trafodion builds when certain flags are set in the UDR_DEBUG_FLAGS CQD (CONTROL QUERY DEFAULT). See https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface#Tutorial:Theobject-orientedUDFinterface-DebuggingUDFcode for details.
The default implementation prints out the process id and then goes into an endless loop. The UDF writer can then attach a debugger, set breakpoints and force the execution out of the loop.
Note that the printout of the pid may not always be displayed on a terminal, for example if the process is executing on a different node.
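The default behavior can be approximated as follows. This is a hypothetical POSIX sketch, not the Trafodion implementation; `gUdrDebugWait` and `waitForDebugger()` are illustrative names. The flag is `volatile` so that every loop iteration re-reads it, which lets a debugger end the wait, for example with gdb's `set var gUdrDebugWait = 0` after attaching to the printed pid.

```cpp
#include <cstdio>
#include <unistd.h>  // getpid(), sleep() on POSIX systems

// Hypothetical flag a debugger clears to leave the loop.
volatile bool gUdrDebugWait = true;

// Print the process id, then wait until a debugger clears the flag.
void waitForDebugger() {
  std::printf("UDR debug: attach a debugger to pid %d\n",
              static_cast<int>(getpid()));
  std::fflush(stdout);  // make the pid visible even if stdout is buffered
  while (gUdrDebugWait) {
    sleep(1);  // avoid burning a full CPU while waiting
  }
}
```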
void UDR::describeConstraints (UDRInvocationInfo &info) [virtual]
Third method of the compiler interface (optional).
Set up logical constraints on the UDF result table.
When the compiler calls this method, it will have synthesized constraints on the table-valued inputs, if any. The UDR writer can now indicate constraints on the table-valued result.
The default implementation does nothing.
Parameters:
    info  A description of the UDR invocation.
Exceptions:
    UDRException
void UDR::describeDataflowAndPredicates (UDRInvocationInfo &info) [virtual]
Second method of the compiler interface (optional).
Eliminate unneeded columns and decide where to execute predicates.
This is the second call in the compiler interface, after describeParamsAndColumns(). When the compiler calls this, it will have marked the UDF result columns with a usage code, indicating any output columns that are not required for this particular query. It will also have created a list of predicates that need to be evaluated.
This method should do three things:
The default implementation does not mark any of the table-valued input columns as NOT_USED. It also does not mark any output columns as NOT_PRODUCED. Predicate handling in the default implementation depends on the function type:
NOTE: When eliminating columns from the table-valued inputs or the table-valued result, column numbers may change in the next call, as these columns are actually removed from the lists. If the UDF carries state between calls and if that state refers to column numbers, they will need to be updated. This is best done in this describeDataflowAndPredicates() call.
Parameters:
    info  A description of the UDR invocation.
Exceptions:
    UDRException
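Updating stored column numbers after columns are removed amounts to building an old-to-new index map. A minimal sketch, assuming a hypothetical helper `remapColumnNumbers()` whose input flags would be derived from the column usage codes (for example NOT_USED); it does not read the real UDRInvocationInfo structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: keep[i] says whether column i survives elimination.
// Returns, for each old column number, its new number after the unused
// columns are removed from the list, or -1 if the column was removed.
std::vector<int> remapColumnNumbers(const std::vector<bool> &keep) {
  std::vector<int> newIndex(keep.size(), -1);
  int next = 0;
  for (std::size_t i = 0; i < keep.size(); ++i) {
    if (keep[i]) newIndex[i] = next++;  // surviving columns are renumbered densely
  }
  return newIndex;
}
```

A UDF that stored a column number `k` before this call would replace it with `m[k]`, where `m` is the returned map and -1 means the column has been eliminated.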
void UDR::describeDesiredDegreeOfParallelism (UDRInvocationInfo &info, UDRPlanInfo &plan) [virtual]
Fifth method of the compiler interface (optional).
Describe the desired parallelism of a UDR.
This method can be used to specify a desired degree of parallelism, either in absolute or relative terms.
The default behavior is to allow any degree of parallelism for TMUDFs of function type UDRInvocationInfo::MAPPER or UDRInvocationInfo::REDUCER that have exactly one table-valued input. The default behavior forces serial execution in all other cases. The reason is that for a single table-valued input, there is a natural way to parallelize the function by parallelizing its input a la MapReduce. In all other cases, parallel execution requires active participation by the UDF, which is why the UDF needs to signal explicitly that it can handle such flavors of parallelism.
Default implementation:
if (info.getNumTableInputs() == 1 &&
    (info.getFuncType() == UDRInvocationInfo::MAPPER ||
     info.getFuncType() == UDRInvocationInfo::REDUCER))
  plan.setDesiredDegreeOfParallelism(UDRPlanInfo::ANY_DEGREE_OF_PARALLELISM);
else
  plan.setDesiredDegreeOfParallelism(1); // serial execution
Parameters:
    info  A description of the UDR invocation.
    plan  Plan-related description of the UDR invocation.
Exceptions:
    UDRException
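The default rule can also be isolated as a pure function, which makes it easy to unit-test a variant before moving it into an override. In this sketch `FuncType` and `kAnyDegreeOfParallelism` are hypothetical stand-ins for UDRInvocationInfo's function types and UDRPlanInfo::ANY_DEGREE_OF_PARALLELISM; the value 0 is a placeholder, not the real constant.

```cpp
// Hypothetical stand-ins; the real enum values and constant live in sqludr.h.
enum class FuncType { OTHER, MAPPER, REDUCER };
const int kAnyDegreeOfParallelism = 0;  // placeholder value only

// Mirrors the default rule: any degree of parallelism for a mapper or reducer
// with exactly one table-valued input, serial execution (1) otherwise.
int defaultDesiredDop(int numTableInputs, FuncType type) {
  if (numTableInputs == 1 &&
      (type == FuncType::MAPPER || type == FuncType::REDUCER))
    return kAnyDegreeOfParallelism;
  return 1;  // serial execution
}
```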
void UDR::describeParamsAndColumns (UDRInvocationInfo &info) [virtual]
First method of the compiler interface (optional).
Describe the output columns of a TMUDF, based on a description of its parameters (including parameter values that are specified as a constant) and the description of the table-valued input columns.
When the compiler calls this, it will have set up the formal and actual parameter descriptions as well as an output column description containing all the output parameters defined in the CREATE FUNCTION DDL (if any).
This method should perform general checks on anything it expects that can already be validated at this time. Things to check:
Setting the function type with the UDRInvocationInfo::setFuncType() method will help the compiler generate more efficient code.
The method should then generate a description of the table-valued output columns, if applicable and if the columns provided at DDL time are not sufficient. The "See also" section points to methods to set these values.
Columns of the table-valued output can be declared as "pass-thru" columns to make many optimizations simpler.
This method must also add to or alter the formal parameter list to match the list of actual parameters.
The default implementation does nothing.
Parameters:
    info  A description of the UDR invocation.
Exceptions:
    UDRException
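As a rough illustration of generating the output column description, the following sketch passes every table-valued input column through unchanged and appends one computed column. `ColumnDesc` and `describeOutputColumns()` are hypothetical simplifications; in a real UDR the columns would be described through the API in sqludr.h, not through these types.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified column descriptor; the real column and type
// descriptions in sqludr.h carry much more information.
struct ColumnDesc {
  std::string name;
  std::string sqlType;
  bool passThru;  // true if copied unchanged from the table-valued input
};

// Output = all input columns as pass-thru columns, then one computed column.
std::vector<ColumnDesc> describeOutputColumns(
    const std::vector<ColumnDesc> &inputCols, const ColumnDesc &computed) {
  std::vector<ColumnDesc> out;
  out.reserve(inputCols.size() + 1);
  for (const ColumnDesc &c : inputCols)
    out.push_back({c.name, c.sqlType, true});  // mark as pass-thru
  out.push_back({computed.name, computed.sqlType, false});
  return out;
}
```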
void UDR::describePlanProperties (UDRInvocationInfo &info, UDRPlanInfo &plan) [virtual]
Sixth method of the compiler interface (optional).
The query optimizer calls this method once for every plan alternative considered for a UDR invocation. It provides the required partitioning and ordering of the result. The UDR writer can decide whether these requirements are acceptable to the UDR and whether any partitioning or ordering of the table-valued inputs is required to produce the required result properties.
TBD: Default behavior.
Parameters:
    info  A description of the UDR invocation.
    plan  Plan-related description of the UDR invocation.
Exceptions:
    UDRException
void UDR::describeStatistics (UDRInvocationInfo &info) [virtual]
Fourth method of the compiler interface (optional).
Set up statistics for the table-valued result.
When the optimizer calls this method, it will have synthesized some statistics for the table-valued inputs, if any. The UDR writer can now indicate the estimated row count for the table-valued result and estimated number of unique values for the output columns.
The default implementation does nothing. If no estimated cardinality is set for the output table and no estimated number of unique values is set for output columns, the optimizer will make default assumptions. Here are some of these default assumptions:
Parameters:
    info  A description of the UDR invocation.
Exceptions:
    UDRException
void UDR::emitRow (UDRInvocationInfo &info)
Emit a row of the table-valued result.
This method can only be called from within processData().
Parameters:
    info  A description of the UDR invocation.
Exceptions:
    UDRException
int UDR::getFeaturesSupportedByUDF () [virtual]
For versioning, return features supported by the UDR writer.
This method can be used in the future to facilitate changes in the UDR interface. UDR writers will be able to indicate through this method whether they support new features.
The default implementation returns 0 (no extra features are supported).
bool UDR::getNextRow (UDRInvocationInfo &info, int tableIndex = 0)
Read a row of a table-valued input.
This method can only be called from within processData().
Parameters:
    info        A description of the UDR invocation.
    tableIndex  Index of the table-valued input from which to read a row.
Exceptions:
    UDRException
void UDR::processData (UDRInvocationInfo &info, UDRPlanInfo &plan) [virtual]
Runtime code for UDRs (required).
This is the only method that is mandatory in the implementation of a UDR (in addition to the factory method).
This method needs to set the output column values and emit rows by calling the emitRow() method. It can read rows from table-valued inputs using the getNextRow() method.
Parameters:
    info  A description of the UDR invocation.
    plan  Plan-related description of the UDR invocation.
Exceptions:
    UDRException
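The typical structure of processData() is a loop that reads input rows with getNextRow(), sets the output values and calls emitRow() for each result row. The sketch below runs that loop without Trafodion: `RuntimeStandIn` is a hypothetical stand-in that reads rows from a vector and records emitted rows, and `SuffixUDF` is an illustrative UDF. The real getNextRow() and emitRow() take a UDRInvocationInfo argument, getNextRow()'s tableIndex selects among multiple table-valued inputs, and the assumption that getNextRow() returns false at the end of the input belongs to this stand-in.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime half of the UDR base class.
class RuntimeStandIn {
public:
  explicit RuntimeStandIn(std::vector<std::string> rows)
      : input_(std::move(rows)) {}
  virtual ~RuntimeStandIn() = default;
  virtual void processData() = 0;  // the one method a UDF must implement
  const std::vector<std::string> &emitted() const { return output_; }

protected:
  std::string inRow;   // current input row, filled by getNextRow()
  std::string outRow;  // output value, set before calling emitRow()

  // Stand-in behavior: returns false once the input is exhausted.
  bool getNextRow() {
    if (next_ >= input_.size()) return false;
    inRow = input_[next_++];
    return true;
  }
  void emitRow() { output_.push_back(outRow); }

private:
  std::vector<std::string> input_;
  std::vector<std::string> output_;
  std::size_t next_ = 0;
};

// Illustrative UDF: one output row per input row, with a suffix appended.
class SuffixUDF : public RuntimeStandIn {
public:
  using RuntimeStandIn::RuntimeStandIn;
  void processData() override {
    while (getNextRow()) {
      outRow = inRow + "!";  // set the output column value
      emitRow();             // emit one row of the table-valued result
    }
  }
};
```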