tmudr::UDR Class Reference

This class represents the code associated with a UDR.

#include <sqludr.h>

Public Member Functions
	UDR ()
virtual	~UDR ()
virtual void	describeParamsAndColumns (UDRInvocationInfo &info)
virtual void	describeDataflowAndPredicates (UDRInvocationInfo &info)
virtual void	describeConstraints (UDRInvocationInfo &info)
virtual void	describeStatistics (UDRInvocationInfo &info)
virtual void	describeDesiredDegreeOfParallelism (UDRInvocationInfo &info, UDRPlanInfo &plan)
virtual void	describePlanProperties (UDRInvocationInfo &info, UDRPlanInfo &plan)
virtual void	completeDescription (UDRInvocationInfo &info, UDRPlanInfo &plan)
virtual void	processData (UDRInvocationInfo &info, UDRPlanInfo &plan)
bool	getNextRow (UDRInvocationInfo &info, int tableIndex=0)
void	emitRow (UDRInvocationInfo &info)
virtual void	debugLoop ()
virtual int	getFeaturesSupportedByUDF ()

Detailed Description

This class represents the code associated with a UDR.

UDR writers can create a derived class and implement these methods for their specific UDR. The base class also has default methods for all but the runtime call. See https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface for examples.

To use this interface, the UDR writer must provide a function of type CreateInterfaceObjectFunc whose name is the UDR external name and which has "C" linkage. For example, assuming the external name of the UDF is MYUDF:

  // define a class that is derived from UDR
  class MyUDFInterface : public UDR
  {
    // Override any virtual methods where the UDF author would
    // like to change the default behavior. It is fine to add
    // other methods and data members, just make sure to free
    // up all resources in the destructor.
    ...
  };
  // define a "factory" function to return an object of this class
  extern "C"
  SQLUDR_LIBFUNC UDR * MYUDF()
  {
    return new MyUDFInterface;
  }

If the describeParamsAndColumns() interface is not used, all parameters and result table columns must be declared in the CREATE TABLE MAPPING FUNCTION DDL.
When using the describeParamsAndColumns() interface, additional parameters and all output columns can be defined at compile time.
A UDR writer can decide to override none, some or all of the virtual methods in the compiler interface. The run-time interface, processData(), must always be provided.
See file sqludr.cpp for the default implementation of these methods.
When overriding methods, the UDR writer has the option to call the default method to do part of the work, and then to implement additional logic.
Multiple UDRs could share the same subclass of UDR. The UDR name is passed in UDRInvocationInfo, so the logic can depend on the name.
A single query may invoke the same UDR more than once. A different UDRInvocationInfo object will be passed for each such invocation.
The UDR object or the object of its derived class may be reused for multiple queries, so its lifetime can exceed that of a UDRInvocationInfo object.
Different instances of UDR (or derived class) objects will be created in the processes that compile and execute a query.
Based on the previous three bullets, UDR writers should not store state that relates to a UDR invocation in a UDR (or derived) object. There are special classes to do that. It is OK to use the UDR derived class to store resources that are shared between UDR invocations, such as connections to server processes. These need to be cleaned up by overriding the destructor.
The optimizer may try different execution plans for a UDR invocation, e.g. with different partitioning and ordering of input and/or output data. These alternative plans share the same UDRInvocationInfo object but they will use different UDRPlanInfo objects.
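
The points above about sharing one subclass between UDRs and keeping invocation state out of the UDR object can be illustrated with a minimal sketch. It does not use sqludr.h: InvocationStub, UDRStub, SharedUDF and the UDR names COUNT_ROWS and COUNT_PAIRS are all hypothetical stand-ins, chosen only to make the example self-contained.

```cpp
#include <string>

// Hypothetical stand-ins for the real tmudr classes, reduced to what this
// sketch needs. They are NOT the declarations from sqludr.h.
struct InvocationStub {
  std::string udrName;  // name of the UDR being invoked
  int rowsSeen;         // invocation-specific state lives here, not in the UDR
};

class UDRStub {
public:
  virtual ~UDRStub() {}
  virtual void processData(InvocationStub &info) = 0;
};

// One subclass serving two made-up UDRs, COUNT_ROWS and COUNT_PAIRS.
class SharedUDF : public UDRStub {
public:
  SharedUDF() : connected_(false), timesConnected_(0) {}
  ~SharedUDF() override { connected_ = false; }  // release shared resources

  void processData(InvocationStub &info) override {
    ensureConnected();  // shared resource, set up once per object
    // Branch on the UDR name; keep the result in the invocation object.
    info.rowsSeen += (info.udrName == "COUNT_PAIRS") ? 2 : 1;
  }

  int timesConnected() const { return timesConnected_; }

private:
  // Not thread-safe; a real UDR object may be used concurrently.
  void ensureConnected() {
    if (!connected_) {
      connected_ = true;
      ++timesConnected_;
    }
  }
  bool connected_;
  int timesConnected_;
};
```

In this sketch a single SharedUDF object can serve several invocations: each invocation object accumulates its own rowsSeen, while the simulated connection is opened only once.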

Constructor & Destructor Documentation

UDR::UDR ( )

Default constructor.

Use this in the constructor of a derived class.

UDR::~UDR ( ) [virtual]

Virtual Destructor.

Override this destructor and deallocate any resources of a derived class, if necessary. Note that a UDR object may be used for several UDR invocations, sometimes at the same time, in one or more queries. Therefore, this class is for storing resources that can be shared among multiple invocations. Note also that compile time and run time may happen in different processes, so it is not possible to carry state from compile time to run time calls for invocations with this class. See below for how to carry invocation-related information between the different phases.

See also: UDRInvocationInfo::setUDRWriterCompileTimeData(); UDRPlanInfo::setUDRWriterCompileTimeData(); UDRPlanInfo::addPlanData()

Exceptions:

UDRException

Member Function Documentation

void UDR::completeDescription ( UDRInvocationInfo & info, UDRPlanInfo & plan ) [virtual]

Seventh and final method of the compiler interface for TMUDFs (optional).

This final compile time call gives the UDF writer the opportunity to examine the chosen query plan, to pass information on to the runtime method, using UDRPlanInfo::addPlanData(), and to clean up any resources related to the compile phase of a particular TMUDF invocation.

The default implementation does nothing.

See also: UDRPlanInfo::addPlanData(); UDRPlanInfo::getUDRWriterCompileTimeData(); UDRInvocationInfo::getUDRWriterCompileTimeData()

Parameters:

	info	A description of the UDR invocation.
	plan	Plan-related description of the UDR invocation.

Exceptions:

UDRException

void UDR::debugLoop ( ) [virtual]

Debugging hook for UDRs.

This method is called in debug Trafodion builds when certain flags are set in the UDR_DEBUG_FLAGS CQD (CONTROL QUERY DEFAULT). See https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface#Tutorial:Theobject-orientedUDFinterface-DebuggingUDFcode for details.

The default implementation prints out the process id and then goes into an endless loop. The UDF writer can then attach a debugger, set breakpoints and force the execution out of the loop.

Note that the printout of the pid may not always be displayed on a terminal, for example if the process is executing on a different node.

void UDR::describeConstraints ( UDRInvocationInfo & info ) [virtual]

Third method of the compiler interface (optional).

Set up logical constraints on the UDF result table.

When the compiler calls this method, it will have synthesized constraints on the table-valued inputs, if any. The UDR writer can now indicate constraints on the table-valued result.

The default implementation does nothing.

See also: TableInfo::getNumConstraints(); TableInfo::getConstraint(); TableInfo::addCardinalityConstraint(); TableInfo::addUniquenessConstraint(); UDRInvocationInfo::propagateConstraintsFor1To1UDFs()

Parameters:

info

A description of the UDR invocation.

Exceptions:

UDRException

void UDR::describeDataflowAndPredicates ( UDRInvocationInfo & info ) [virtual]

Second method of the compiler interface (optional).

Eliminate unneeded columns and decide where to execute predicates.

This is the second call in the compiler interface, after describeParamsAndColumns(). When the compiler calls this, it will have marked the UDF result columns with a usage code, indicating any output columns that are not required for this particular query. It will also have created a list of predicates that need to be evaluated.

This method should do three things:

Mark columns of the table-valued inputs as not used, based on the result column usage and internal needs of the UDF. Such input columns will later be eliminated.
Mark output columns that are not used and that can be easily suppressed by the UDF as NOT_PRODUCED. Such columns will be eliminated as well.
Decide where to evaluate each predicate, a) on the UDF result (default), b) inside the UDF by code written by the UDF writer, or c) in the table-valued inputs.

The default implementation does not mark any of the table-valued input columns as NOT_USED. It also does not mark any output columns as NOT_PRODUCED. Predicate handling in the default implementation depends on the function type:

UDRInvocationInfo::GENERIC: No predicates are pushed down, because the compiler does not know whether any of the eliminated rows might have altered the output of the UDF. One example is the "sessionize" UDF, where eliminated rows can lead to differences in session ids.
UDRInvocationInfo::MAPPER: All predicates on pass-thru columns are pushed down to table-valued inputs. Since the UDF carries no state between the input rows it sees, eliminating any input rows will not alter any results for other rows.
UDRInvocationInfo::REDUCER: Only predicates on the PARTITION BY columns will be pushed to table-valued inputs. These predicates may eliminate entire groups of rows (partitions), and since no state is carried between such groups that is valid.
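
The default predicate handling described above can be modeled as a small decision function. This is only a sketch of the rules as stated in the text, not the code from sqludr.cpp: PushdownFuncType, PredicateStub and pushedToInputByDefault are hypothetical names, and the two boolean flags are a simplification of how the real interface describes predicates.

```cpp
// Hypothetical stand-ins, NOT the sqludr.h declarations.
enum PushdownFuncType { PD_GENERIC, PD_MAPPER, PD_REDUCER };

struct PredicateStub {
  bool onPassThruColumns;     // predicate refers only to pass-thru columns
  bool onPartitionByColumns;  // predicate refers only to PARTITION BY columns
};

// Returns true if, under the default rules described in the text, the
// predicate would be pushed down to the table-valued input.
bool pushedToInputByDefault(PushdownFuncType type, const PredicateStub &p) {
  switch (type) {
    case PD_MAPPER:
      // No state is carried between rows, so filtering input rows is safe.
      return p.onPassThruColumns;
    case PD_REDUCER:
      // Only whole partitions may be eliminated.
      return p.onPartitionByColumns;
    case PD_GENERIC:
    default:
      // Eliminated rows might change the output, so nothing is pushed down.
      return false;
  }
}
```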

NOTE: When eliminating columns from the table-valued inputs or the table-valued result, column numbers may change in the next call, as these columns are actually removed from the lists. If the UDF carries state between calls and if that state refers to column numbers, they will need to be updated. This is best done in this describeDataflowAndPredicates() call.

See also: ColumnInfo::getUsage(); ColumnInfo::setUsage() (to mark output columns as NOT_PRODUCED); UDRInvocationInfo::setFuncType(); UDRInvocationInfo::setChildColumnUsage() (to mark unused input columns); UDRInvocationInfo::setUnusedPassthruColumns(); UDRInvocationInfo::pushPredicatesOnPassthruColumns(); UDRInvocationInfo::setPredicateEvaluationCode()

Parameters:

info

A description of the UDR invocation.

Exceptions:

UDRException

void UDR::describeDesiredDegreeOfParallelism ( UDRInvocationInfo & info, UDRPlanInfo & plan ) [virtual]

Fifth method of the compiler interface (optional).

Describe the desired parallelism of a UDR.

This method can be used to specify a desired degree of parallelism, either in absolute or relative terms.

The default behavior is to allow any degree of parallelism for TMUDFs of function type UDRInvocationInfo::MAPPER or UDRInvocationInfo::REDUCER that have exactly one table-valued input. The default behavior forces serial execution in all other cases. The reason is that for a single table-valued input, there is a natural way to parallelize the function by parallelizing its input a la MapReduce. In all other cases, parallel execution requires active participation by the UDF, which is why the UDF needs to signal explicitly that it can handle such flavors of parallelism.

Default implementation:

  if (info.getNumTableInputs() == 1 &&
      (info.getFuncType() == UDRInvocationInfo::MAPPER ||
       info.getFuncType() == UDRInvocationInfo::REDUCER))
    plan.setDesiredDegreeOfParallelism(UDRPlanInfo::ANY_DEGREE_OF_PARALLELISM);
  else
    plan.setDesiredDegreeOfParallelism(1); // serial execution

See also: UDRPlanInfo::setDesiredDegreeOfParallelism(); UDRInvocationInfo::setFuncType()

Parameters:

	info	A description of the UDR invocation.
	plan	Plan-related description of the UDR invocation.

Exceptions:

UDRException

void UDR::describeParamsAndColumns ( UDRInvocationInfo & info ) [virtual]

First method of the compiler interface (optional).

Describe the output columns of a TMUDF, based on a description of its parameters (including parameter values that are specified as a constant) and the description of the table-valued input columns.

When the compiler calls this, it will have set up the formal and actual parameter descriptions as well as an output column description containing all the output parameters defined in the CREATE FUNCTION DDL (if any).

This method should validate everything about the invocation that can be checked at this point. Things to check:

Number, types and values of actual parameters.
Number of table-valued inputs and columns of these inputs.
PARTITION BY and ORDER BY clause specified for input tables.
Other things like user ids, etc.
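
A sketch of what such checks might look like is shown below. It is self-contained and does not use sqludr.h: CheckInfo, CheckException, checkInvocation and passesChecks are hypothetical stand-ins, and the specific requirements (one parameter, one table-valued input, a PARTITION BY clause) are invented for illustration. In a real UDR the checks would read the invocation description and report failures with UDRException.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the sqludr.h declarations.
struct CheckException : std::runtime_error {
  explicit CheckException(const std::string &msg) : std::runtime_error(msg) {}
};

struct CheckInfo {
  int numActualParams;                 // number of actual parameters
  std::vector<int> inputColumnCounts;  // one entry per table-valued input
  bool hasPartitionBy;                 // PARTITION BY given for the input
};

// Validate the invocation against the requirements of a made-up UDF.
void checkInvocation(const CheckInfo &info) {
  if (info.numActualParams != 1)
    throw CheckException("expected exactly one parameter");
  if (info.inputColumnCounts.size() != 1)
    throw CheckException("expected exactly one table-valued input");
  if (info.inputColumnCounts[0] < 1)
    throw CheckException("table-valued input has no columns");
  if (!info.hasPartitionBy)
    throw CheckException("a PARTITION BY clause is required");
}

// Convenience wrapper that turns the exception into a boolean result.
bool passesChecks(const CheckInfo &info) {
  try {
    checkInvocation(info);
    return true;
  } catch (const CheckException &) {
    return false;
  }
}
```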

Setting the function type with the UDRInvocationInfo::setFuncType() method will help the compiler generate more efficient code.

The method should then generate a description of the table-valued output columns, if applicable and if the columns provided at DDL time are not sufficient. The "See also" section points to methods to set these values.

Columns of the table-valued output can be declared as "pass-thru" columns to make many optimizations simpler.

This method must also add to or alter the formal parameter list to match the list of actual parameters.

The default implementation does nothing.

See also: UDRInvocationInfo::par(); UDRInvocationInfo::getNumTableInputs(); UDRInvocationInfo::in(); UDRInvocationInfo::setFuncType(); UDRInvocationInfo::addFormalParameter(); UDRInvocationInfo::addPassThruColumns(); TupleInfo::addColumn(); TupleInfo::addIntegerColumn(); TupleInfo::addLongColumn(); TupleInfo::addCharColumn(); TupleInfo::addVarCharColumn(); TupleInfo::addColumns(); TupleInfo::addColumnAt(); TupleInfo::deleteColumn(int); TupleInfo::deleteColumn(const std::string &)

Parameters:

info

A description of the UDR invocation.

Exceptions:

UDRException

void UDR::describePlanProperties ( UDRInvocationInfo & info, UDRPlanInfo & plan ) [virtual]

Sixth method of the compiler interface (optional).

The query optimizer calls this method once for every plan alternative considered for a UDR invocation. It provides the required partitioning and ordering of the result. The UDR writer can decide whether these requirements are acceptable to the UDR and whether any partitioning or ordering of the table-valued inputs is required to produce the required result properties.

TBD: Default behavior.

Parameters:

	info	A description of the UDR invocation.
	plan	Plan-related description of the UDR invocation.

Exceptions:

UDRException

void UDR::describeStatistics ( UDRInvocationInfo & info ) [virtual]

Fourth method of the compiler interface (optional).

Set up statistics for the table-valued result.

When the optimizer calls this method, it will have synthesized some statistics for the table-valued inputs, if any. The UDR writer can now indicate the estimated row count for the table-valued result and estimated number of unique values for the output columns.

The default implementation does nothing. If no estimated cardinality is set for the output table and no estimated number of unique values is set for output columns, the optimizer will make default assumptions. Here are some of these default assumptions:

UDRs of type UDRInvocationInfo::MAPPER return one output row for each row in their largest input table.
UDRs of type UDRInvocationInfo::REDUCER return one output row for every partition in their largest partitioned input table.
For output columns that are passthru columns, the estimated unique entries are the same as for the underlying column in the table-valued input.
Other default cardinality and unique entry counts can be influenced with defaults (CONTROL QUERY DEFAULT) in Trafodion SQL.
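
The first two default assumptions can be written down as a small model. This is a sketch of the rules as stated above, not the optimizer's actual code. StatsFuncType, InputEstimate and defaultOutputRows are hypothetical names, and "largest partitioned input table" is interpreted here as the partitioned input with the highest estimated row count, which is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, NOT the sqludr.h declarations.
enum StatsFuncType { STATS_GENERIC, STATS_MAPPER, STATS_REDUCER };

struct InputEstimate {
  long rows;        // estimated row count of this table-valued input
  long partitions;  // estimated partition count, 0 if not partitioned
};

// Returns the default output row estimate, or -1 where this sketch has no
// rule (such cases are governed by CONTROL QUERY DEFAULT settings).
long defaultOutputRows(StatsFuncType type,
                       const std::vector<InputEstimate> &inputs) {
  long bestRows = -1;  // row count of the largest qualifying input so far
  long result = -1;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    const InputEstimate &in = inputs[i];
    if (type == STATS_MAPPER && in.rows > bestRows) {
      // MAPPER: one output row per row of the largest input.
      bestRows = in.rows;
      result = in.rows;
    } else if (type == STATS_REDUCER && in.partitions > 0 &&
               in.rows > bestRows) {
      // REDUCER: one output row per partition of the largest partitioned input.
      bestRows = in.rows;
      result = in.partitions;
    }
  }
  return result;
}
```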

See also: UDRInvocationInfo::setFuncType(); ColumnInfo::getEstimatedUniqueEntries(); ColumnInfo::setEstimatedUniqueEntries(); TableInfo::getEstimatedNumRows(); TableInfo::setEstimatedNumRows(); TableInfo::getEstimatedNumPartitions()

Parameters:

info

A description of the UDR invocation.

Exceptions:

UDRException

void UDR::emitRow ( UDRInvocationInfo & info )

Emit a row of the table-valued result.

This method can only be called from within processData().

Parameters:

info

A description of the UDR invocation.

Exceptions:

UDRException

int UDR::getFeaturesSupportedByUDF ( ) [virtual]

For versioning, return features supported by the UDR writer.

This method can be used in the future to facilitate changes in the UDR interface. UDR writers will be able to indicate through this method whether they support new features.

The default implementation returns 0 (no extra features are supported).

Returns: A yet to be determined set of bit flags or codes for supported features.

bool UDR::getNextRow ( UDRInvocationInfo & info, int tableIndex = 0 )

Read a row of a table-valued input.

This method can only be called from within processData().

Parameters:

	info	A description of the UDR invocation.
	tableIndex	Index of the table-valued input to read from (default 0).

Returns: true if another row could be read, false if it reached end of data.

Exceptions:

UDRException

void UDR::processData ( UDRInvocationInfo & info, UDRPlanInfo & plan ) [virtual]

Runtime code for UDRs (required).

This is the only method that is mandatory in the implementation of a UDR (in addition to the factory method).

This method needs to set the output column values and emit rows by calling the emitRow() method. It can read rows from table-valued inputs, using the getNextRow() method.
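
The read, set and emit pattern described here can be sketched with in-memory stand-ins. The example below does not use sqludr.h: RowStub, LoopInfo, LoopUDRStub and DoubleIds are hypothetical, and rows are plain structs. In the real interface, input and output values are accessed through TupleInfo methods such as getInt() and setInt(), and getNextRow() and emitRow() are provided by the UDR base class.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the sqludr.h declarations.
struct RowStub {
  int id;
  std::string text;
  RowStub(int i = 0, const std::string &t = "") : id(i), text(t) {}
};

struct LoopInfo {
  std::vector<RowStub> input;    // rows of the single table-valued input
  std::size_t cursor;            // position of the next row to read
  RowStub currentIn;             // row most recently read
  RowStub currentOut;            // row being built for output
  std::vector<RowStub> emitted;  // rows emitted so far
  LoopInfo() : cursor(0) {}
};

class LoopUDRStub {
public:
  virtual ~LoopUDRStub() {}
  virtual void processData(LoopInfo &info) = 0;

protected:
  // Simulates getNextRow(): false once the input is exhausted.
  bool getNextRow(LoopInfo &info) {
    if (info.cursor >= info.input.size()) return false;
    info.currentIn = info.input[info.cursor++];
    return true;
  }
  // Simulates emitRow(): appends the current output row to the result.
  void emitRow(LoopInfo &info) { info.emitted.push_back(info.currentOut); }
};

// A made-up UDF that doubles the id column and passes the text through.
class DoubleIds : public LoopUDRStub {
public:
  void processData(LoopInfo &info) override {
    while (getNextRow(info)) {
      info.currentOut.id = info.currentIn.id * 2;
      info.currentOut.text = info.currentIn.text;
      emitRow(info);
    }
  }
};
```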

See also: TupleInfo::setInt(); TupleInfo::setString(); emitRow(); getNextRow(); TupleInfo::getInt(); TupleInfo::getString(); UDRInvocationInfo::copyPassThruData()

Parameters:

	info	A description of the UDR invocation.
	plan	Plan-related description of the UDR invocation.

Exceptions:

UDRException

The documentation for this class was generated from the following files:

sqludr.h
sqludr.cpp
