HivePatch  
Hive patch description.
Updated Dec 29, 2011 by huaiyin....@gmail.com

Introduction

We have implemented a Hive patch (HIVE-2206) that provides a subset of the functionality of the correlation optimizer in YSmart. This patch supports TPC-H Q17 and TPC-H Q18. However, the correlation optimizer in HIVE-2206 cannot optimize TPC-H Q21, because Hive uses all columns as partition keys for aggregation functions with the DISTINCT keyword and uses incremental aggregation functions (e.g., count). To support TPC-H Q21, you can apply an additional patch that we provide on the Downloads page. This patch adds a HashSet-based UDF, count_distinct, so in TPC-H Q21 you can replace count(distinct l_suppkey) with count_distinct(l_suppkey). With this change, the correlation optimizer in HIVE-2206 should be able to optimize TPC-H Q21.
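To illustrate the idea behind a HashSet-based count_distinct, here is a minimal standalone Java sketch of distinct counting with a set as the aggregation state. It is not the code in the Downloads patch: the class name CountDistinctSketch is hypothetical, and it does not use any Hive classes. Only the method names (init, iterate, terminatePartial, merge, terminate) mirror the lifecycle of Hive's classic UDAF evaluators.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, self-contained sketch of a HashSet-based distinct count.
 * Method names follow the classic Hive UDAF evaluator lifecycle, but this
 * class has no Hive dependency and is not the patch's implementation.
 */
public class CountDistinctSketch {

    // Distinct non-null values seen so far by this evaluator instance.
    private Set<Object> seen = new HashSet<>();

    /** Reset the state before aggregating a new group. */
    public void init() {
        seen = new HashSet<>();
    }

    /** Add one input value; NULLs are skipped, as COUNT(DISTINCT ...) does. */
    public boolean iterate(Object value) {
        if (value != null) {
            seen.add(value);
        }
        return true;
    }

    /** Emit the partial state: a copy of the set itself, not a partial count. */
    public Set<Object> terminatePartial() {
        return new HashSet<>(seen);
    }

    /** Union a partial set into the current state. */
    public boolean merge(Set<Object> partial) {
        if (partial != null) {
            seen.addAll(partial);
        }
        return true;
    }

    /** Final result: number of distinct values across all merged partials. */
    public long terminate() {
        return seen.size();
    }

    public static void main(String[] args) {
        // Two partial aggregations see overlapping values for the same group.
        CountDistinctSketch part1 = new CountDistinctSketch();
        part1.init();
        for (Object v : new Object[] {10, 20, 20, null}) {
            part1.iterate(v);
        }
        CountDistinctSketch part2 = new CountDistinctSketch();
        part2.init();
        for (Object v : new Object[] {20, 30}) {
            part2.iterate(v);
        }

        // Final aggregation unions the partial sets: {10, 20, 30}.
        CountDistinctSketch fin = new CountDistinctSketch();
        fin.init();
        fin.merge(part1.terminatePartial());
        fin.merge(part2.terminatePartial());
        System.out.println(fin.terminate()); // prints 3
    }
}
```

Because the partial state is the set of values rather than a running count, merging partials cannot count a value twice: summing the two partial distinct counts above would give 2 + 2 = 4, while the set union gives the correct 3. The trade-off is memory, since each group keeps every distinct value it has seen.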

Usage

  • Enable the correlation optimizer in HIVE-2206: set hive.optimize.correlation=true;
  • Disable the correlation optimizer in HIVE-2206: set hive.optimize.correlation=false;

Note: This optimizer is disabled by default.

