This is the source-only release of the Apache Trafodion (incubating) project. In addition to including a number of new features and improvements across the project, the focus of this release is to comply with Apache release guidelines.

Build instructions are available here.

Supported Platforms

The following platforms are supported in this release.

Operating Systems       CentOS 6.5 – 6.7
Hadoop Distributions    Cloudera distribution CDH 5.3.x
                        Hortonworks distribution HDP 2.2
Java Version            JDK 1.7.0_67 or newer
HBase Version           HBase 0.98.x

Enhancements

This release contains the following new features.

Marketability, Infrastructure, and Scalability
  • Critical and high defect repairs.
  • Infrastructure refresh, including support for HDP 2.2 and CDH 5.3
Performance
  • Query plan quality improvements identified by PoC/benchmarks, such as costing changes to help generate MDAM plans when appropriate.
  • Performance and hardening improvements to user-defined routines (UDRs) (that is, stored procedures and C scalar user-defined functions (UDFs)).
  • Hybrid Query Cache, which improves transaction performance and efficiency by moving the compiler’s SQL similarity detection check to the parser phase. Query caching allows reuse of pre-existing optimized SQL execution plans, thereby eliminating costly compile and optimization overhead.
  • Skew Buster, a patented Trafodion feature that recognizes situations where data is skewed in intermediate stages of a query and adjusts the query plan, redistributing intermediate data at execution time so that it is spread evenly over all processing nodes.
  • Immediate Update Statistics for the entire table based on the sample taken during fast data loading. (Technology Preview–Complete But Not Fully Tested)
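The plan-reuse idea behind the Hybrid Query Cache can be illustrated with a small sketch. This is a generic illustration of query-plan caching, not Trafodion's implementation: queries that differ only in literal values are reduced to the same cache key, so a previously compiled plan can be reused without re-running the optimizer.

```python
import re

class PlanCache:
    """Toy query-plan cache: queries that differ only in literals share a plan."""

    def __init__(self):
        self.cache = {}
        self.compilations = 0

    def _skeleton(self, sql):
        # Replace string and numeric literals with a placeholder so that
        # "similar" queries (same shape, different constants) match.
        sql = re.sub(r"'[^']*'", "?", sql)
        return re.sub(r"\b\d+\b", "?", sql)

    def _compile(self, skeleton):
        self.compilations += 1          # stands in for costly optimization
        return ("PLAN", skeleton)

    def get_plan(self, sql):
        key = self._skeleton(sql)
        if key not in self.cache:
            self.cache[key] = self._compile(key)
        return self.cache[key]

cache = PlanCache()
cache.get_plan("select * from t where a = 1")
cache.get_plan("select * from t where a = 42")   # same shape: reuses the plan
print(cache.compilations)                        # prints 1
```

The earlier the similarity check runs (here, a cheap textual normalization standing in for the parser-phase check), the less work is wasted before a cache hit is detected.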
High Availability (HA) and Distributed Transaction Management (DTM)
  • Transaction management efficiency and performance enhancements.
  • Improved overall cluster HA: the Trafodion Transaction Manager (TM) is now a persistent process, so a TM failure no longer triggers a node failure.
  • DTM local transaction support to minimize the overhead of transactions by eliminating interactions with the TM process when the scope is local to the client that began the transaction. (Technology Preview–Work in Progress)
  • Ability to run DDL statements in transactions, thus providing database consistency protection for DDL operations. (Technology Preview–Work in Progress)
  • Stateless/Stateful Concurrency Control (SSCC), which ensures that transactions avoid the data-contention anomalies that can corrupt the consistency of a database. It works by preventing user transactions from interfering with each other’s data. SSCC is an extension of the Snapshot Isolation (SI) algorithm. SI prevents the majority of anomalies associated with data corruption and provides stronger isolation than the Multi-Version Concurrency Control (MVCC) used by DTM today. (Technology Preview–Work in Progress)
Usability
Manageability
  • Support for HP Data Services Manager (HP DSM), a unified, browser-based tool for management of Hadoop, Vertica, and now Trafodion data services. NOTE: The version of HP DSM that integrates with Trafodion is not yet available.
  • Stability improvements and optimizations that reduce the overhead of capturing and maintaining query performance information (in repository tables).
  • Query cancel for DDL, update statistics, and additional child query operations. For details, see the CONTROL QUERY CANCEL statement in the Trafodion SQL Reference Manual.

Security
  • Security subsystem hardening improvements including performance and QA testing.
  • Security enhancements for the Trafodion metadata, data loader, and Data Connectivity Services (DCS).
  • Upgrade authorization.
  • Ability to grant privileges on behalf of a role using the GRANTED BY clause. For details, see the GRANT statements in the Trafodion SQL Reference Manual.
Installer
  • Prompts to configure and enable security.
  • Support for the latest distributions, HDP 2.2 and CDH 5.3.

Fixes

This release contains fixes for approximately 96 defects, including 17 critical, 53 high, 20 medium, and 2 low defects. These defects were filed through Launchpad.

Known Issues

EXECUTE.BATCH update creates core-file

Defect: 1274962

Symptom: EXECUTE.BATCH hangs for a long time doing updates, and the update creates a core file.

Cause: To be determined.

Solution: None at this time. Batch updates and ODBC row arrays do not currently work.

Random update statistics failures with HBase OutOfOrderScannerNextException

Defect: 1391271

Symptom: While running update statistics commands, you see HBase OutOfOrderScannerNextException errors.

Cause: The default hbase.rpc.timeout and hbase.client.scanner.timeout.period values might be too low given the size of the tables. Sampling in update statistics is implemented using the HBase Random RowFilter. For very large tables with several billion rows, the sampling ratio required to get a sample of 1 million rows is very small. This can result in HBase client connection timeout errors since there may be no row returned by a RegionServer for an extended period of time.

Solution: Increase the hbase.rpc.timeout and hbase.client.scanner.timeout.period values. We have found that increasing these values to 600 seconds (10 minutes) prevents many timeout-related errors. For more information, see the HBase Configuration and Fine-Tuning Recommendations.
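For example, both properties can be raised in hbase-site.xml on the client and RegionServer nodes. Note that the HBase values are expressed in milliseconds, so 600 seconds is 600000; the exact value to use on your cluster is a tuning choice, and HBase must be restarted for the change to take effect:

```xml
<!-- hbase-site.xml: raise RPC and client scanner timeouts to 600 seconds -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
```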

If increasing the hbase.rpc.timeout and hbase.client.scanner.timeout.period values does not work, try increasing the sample size. For large tables, choose a sampling percentage that yields more rows than the default sample of 1 million. For example, suppose table T has one billion rows. The following UPDATE STATISTICS statement samples one million rows, or approximately one-tenth of one percent of the total rows:

update statistics for table T on every column sample;

To sample one percent of the rows, regardless of the table size, you must explicitly state the sampling rate as follows:

update statistics for table T on every column sample random 1 percent;
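The arithmetic behind the timeouts can be sketched as follows. This is an illustration of Bernoulli-style row sampling in the spirit of HBase's RandomRowFilter, not Trafodion's actual code: at a 0.1% sampling ratio, a RegionServer scans roughly 1,000 rows for every row it returns, so on very large regions long stretches can pass with nothing sent back to the client.

```python
import random

def bernoulli_sample(rows, ratio, seed=1):
    """Keep each row independently with probability `ratio`,
    mimicking the behavior of a random row filter."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < ratio]

total_rows = 1_000_000_000        # 1 billion rows in table T
target_sample = 1_000_000         # default sample size
ratio = target_sample / total_rows
print(ratio)                      # 0.001: one row kept per ~1,000 scanned

# On a small demonstration range, the sample size is close to ratio * n:
kept = bernoulli_sample(range(100_000), ratio=0.001)
print(len(kept))                  # roughly 100 rows
```

Raising the sampling rate to 1 percent shortens the expected gap between returned rows by a factor of ten, which is why a larger sample can avoid the scanner timeout.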

Following update statistics, stats do not take effect immediately

Defect: 1409937

Symptom: Immediately following an update statistics operation, the generated query plan does not seem to reflect the existence of statistics. For example, in one session, you create and populate tables, run update statistics on them, prepare a query, and exit. A serial plan is generated, and the estimated cardinality is 100 for both tables. In a new session, you prepare the same query, and a parallel plan is generated where the estimated cardinality reflects the statistics.

Cause: This is a day-one issue.

Solution: Use one of the following workarounds:

  • Retry the query after two minutes.
  • Set CQD HIST_NO_STATS_REFRESH_INTERVAL to ‘0’.
  • Run an UPDATE STATISTICS statement.
  • Perform the DML operations in a different session.
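For example, the CQD workaround can be applied in the session before preparing the query. The statement name and query below are placeholders, not from the defect report:

```sql
control query default HIST_NO_STATS_REFRESH_INTERVAL '0';
prepare s1 from select * from T where c1 = 10;
```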

