Published on February 20, 2014
Hive on Tez
Gunther Hagleitner (email@example.com)
© Hortonworks Inc. 2013.
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative: a broad, community-based effort to drive the next generation of HIVE
Goals:
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Stinger Project (announced February 2013)
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Coming Soon:
• Hive on Apache Tez
• Query Service
• Buffer Cache
• Cost Based Optimizer (Optiq)
• Vectorized Processing
SQL: Enhancing SQL Semantics (SQL Compliance)
Hive 0.12 provides a wide array of SQL data types and semantics so your existing tools integrate more seamlessly with Hadoop. The lists below mix features available in Hive 0.12 with roadmap items.
Hive SQL Datatypes:
• INT
• TINYINT/SMALLINT/BIGINT
• BOOLEAN
• FLOAT
• DOUBLE
• STRING
• TIMESTAMP
• BINARY
• DECIMAL
• ARRAY, MAP, STRUCT, UNION
• DATE
• VARCHAR
• CHAR
Hive SQL Semantics:
• SELECT, INSERT
• GROUP BY, ORDER BY, SORT BY
• JOIN on explicit join key
• Inner, outer, cross and semi joins
• Sub-queries in FROM clause
• ROLLUP and CUBE
• UNION
• Windowing Functions (OVER, RANK, etc.)
• Custom Java UDFs
• Standard Aggregation (SUM, AVG, etc.)
• Advanced UDFs (ngram, XPath, URL)
• Sub-queries in WHERE, HAVING
• Expanded JOIN Syntax
• SQL Compliant Security (GRANT, etc.)
• INSERT/UPDATE/DELETE (ACID)
Stinger: Hive performance
• Tez Integration: Tez is a significantly better engine than MapReduce. Benefit: Latency
• Vectorized Query: Take advantage of modern hardware by processing thousand-row blocks rather than row-at-a-time. Benefit: Throughput
• Query Planner: Using extensive statistics now available in the Metastore to better plan and optimize queries, including predicate pushdown during compilation to eliminate portions of input (beyond partition pruning). Benefit: Latency
• ORC File: Columnar, type-aware format with indices. Benefit: Latency
• Cost Based Optimizer (Optiq): Join re-ordering and other optimizations based on column statistics, including histograms etc. (future). Benefit: Latency
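To make the "thousand-row blocks" idea concrete, here is a minimal Python sketch (not Hive's implementation) that evaluates `SUM(price) WHERE qty > threshold` both row-at-a-time and over column batches with a selection vector. The batch size of 1024 and all function names are assumptions for illustration; pure Python will not show the speedup, only the shape of the computation.

```python
from typing import List

# Assumed batch size, chosen to match the slide's "thousand-row blocks".
BATCH_SIZE = 1024


def sum_row_at_a_time(qty: List[int], price: List[int], threshold: int) -> int:
    """Row-at-a-time: the predicate and the aggregate are evaluated once per row."""
    total = 0
    for q, p in zip(qty, price):
        if q > threshold:
            total += p
    return total


def sum_vectorized(qty: List[int], price: List[int], threshold: int) -> int:
    """Batch-at-a-time: each operator runs a tight loop over one column slice."""
    total = 0
    for start in range(0, len(qty), BATCH_SIZE):
        qty_batch = qty[start:start + BATCH_SIZE]
        price_batch = price[start:start + BATCH_SIZE]
        # Filter operator: produce a selection vector of qualifying row positions.
        selected = [i for i, q in enumerate(qty_batch) if q > threshold]
        # Aggregate operator: touch only the selected positions of the price column.
        total += sum(price_batch[i] for i in selected)
    return total


if __name__ == "__main__":
    qty = list(range(5000))
    price = [i % 7 for i in range(5000)]
    print(sum_row_at_a_time(qty, price, 2500) == sum_vectorized(qty, price, 2500))
```

Both functions return the same answer; the vectorized form simply reorganizes the work so each step is a loop over one column, which is what lets a real engine exploit CPU caches and tight inner loops.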
Hive on Tez – Basics
• Hive/Tez integration in Hive 0.13 (Tez 0.2.0)
• Hive/Tez 0.3.0 available in branch
• Hive on MR works unchanged on Hadoop 1 and 2
• Hive on Tez is only available on Hadoop 2
• Turn on: hive.execution.engine=tez
• Hive looks and feels the same on both MR and Tez
  – CLI/JDBC/UDFs/SQL/Metastore
• Explain/reporting is similar to MR, but reflects differences in plan/execution
• Deployment is simple (Tez comes as a client-side library)
• Job client can submit MR via Tez
  – And emulate NES too (duck hunt anyone?)
Query 88
select *
from
  (select count(*) h8_30_to_9
   from store_sales
   JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
   JOIN time_dim ON store_sales.ss_sold_time_sk = time_dim.t_time_sk
   JOIN store ON store_sales.ss_store_sk = store.s_store_sk
   where time_dim.t_hour = 8
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count <= 3+2) or
          (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count <= 0+2) or
          (household_demographics.hd_dep_count = 1 and household_demographics.hd_vehicle_count <= 1+2))
     and store.s_store_name = 'ese') s1
  JOIN
  (select count(*) h9_to_9_30
   from store_sales
   ...
• 8 full table scans
Query 88: M/R
Total MapReduce jobs = 29
...
Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 403.28 seconds, Fetched: 1 row(s)
Query 88: Tez
Map 1: 1/1  Map 11: 1/1  Map 12: 1/1  Map 13: 1/1  Map 14: 1/1  Map 15: 1/1  Map 16: 241/241  Map 18: 1/1
Map 19: 1/1  Map 2: 1/1  Map 20: 1/1  Map 21: 1/1  Map 22: 1/1  Map 23: 1/1  Map 24: 241/241  Map 26: 1/1
Map 27: 1/1  Map 28: 1/1  Map 29: 1/1  Map 3: 241/241  Map 30: 240/240  Map 32: 241/241  Map 34: 1/1  Map 35: 1/1
Map 36: 1/1  Map 37: 1/1  Map 38: 1/1  Map 39: 241/241  Map 42: 1/1  Map 43: 1/1  Map 44: 240/240  Map 46: 241/241
Reducer 10: 1/1  Reducer 17: 1/1  Reducer 25: 1/1  Reducer 31: 1/1  Reducer 33: 1/1  Reducer 4: 1/1  Reducer 40: 1/1  Reducer 41: 1/1
Reducer 45: 1/1  Reducer 47: 1/1  Reducer 5: 1/1  Reducer 6: 1/1  Reducer 7: 1/1  Reducer 8: 1/1  Reducer 9: 1/1
Status: Finished successfully
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 90.233 seconds, Fetched: 1 row(s)
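The two Query 88 runs can be compared with a little arithmetic. The numbers below are copied from the M/R and Tez console outputs for Query 88 (the vertex count is tallied from the Tez status lines: 32 Map and 15 Reducer vertices); the variable names are only for illustration.

```python
# Figures copied from the Query 88 console outputs.
mr_seconds = 403.28      # Hive on MR, 29 MapReduce jobs
tez_seconds = 90.233     # Hive on Tez, one DAG
tez_vertices = 32 + 15   # 32 "Map" vertices and 15 "Reducer" vertices

mr_row = [345617, 687625, 686131, 1032842, 1030364, 606859, 604232, 692428]
tez_row = [345617, 687625, 686131, 1032842, 1030364, 606859, 604232, 692428]

speedup = mr_seconds / tez_seconds
print(f"Same result: {mr_row == tez_row}")
print(f"Wall-clock speedup: {speedup:.2f}x with {tez_vertices} vertices in one DAG")
```

Both engines return the identical row, and the Tez run is roughly 4.5 times faster in wall-clock time while replacing 29 chained jobs with a single DAG.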
HIVE-4660: Hive-on-MR vs. Hive-on-Tez
SELECT g1.x, g1.avg, g2.cnt
FROM (SELECT a.x, AVG(a.y) AS avg FROM a GROUP BY a.x) g1
JOIN (SELECT b.x, COUNT(b.y) AS cnt FROM b GROUP BY b.x) g2
ON (g1.x = g2.x)
ORDER BY avg;
[Figure: Hive – MR vs. Hive – Tez plans for this query. With MR, GROUP a BY a.x, GROUP b BY b.x, JOIN (a,b) and ORDER BY run as separate map/reduce stages that write intermediate results to HDFS; with Tez the same steps are chained without those HDFS writes.]
Tez avoids unnecessary writes to HDFS.
Hive on Tez - Execution
• Tez Session: Overcomes Map-Reduce job-launch latency by prelaunching the Tez AppMaster. Benefit: Latency
• Tez Container PreLaunch: Overcomes Map-Reduce latency by pre-launching hot containers ready to serve queries. Benefit: Latency
• Tez Container Re-Use: Finished maps and reduces pick up more work rather than exiting. Reduces latency and eliminates difficult split-size tuning. Out of box performance! Benefit: Latency
• Runtime re-configuration of DAG: Runtime query tuning by picking aggregation parallelism using online query statistics. Benefit: Throughput
• Tez In-Memory Cache: Hot data kept in RAM for fast access. Benefit: Latency
• Complex DAGs: Tez Broadcast Edge and Map-Reduce-Reduce pattern improve query scale and throughput. Benefit: Throughput
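Session prelaunch and container re-use can be pictured with a toy model: a session launches a pool of containers up front, and finished tasks hand their container back instead of exiting. This is a hypothetical Python sketch; the `ToySession` class and its counters are invented for illustration and do not mirror Tez's API, and tasks run one at a time to keep the model simple.

```python
from collections import deque


class ToySession:
    """Hypothetical model of a Tez-session-like container pool (not the Tez API)."""

    def __init__(self, pool_size: int, reuse: bool = True) -> None:
        self.reuse = reuse
        self.launches = 0           # number of (expensive) container launches
        self.idle = deque()
        for _ in range(pool_size):  # prelaunch "hot" containers when the session starts
            self.idle.append(self._launch())

    def _launch(self) -> str:
        self.launches += 1
        return f"container-{self.launches}"

    def run_query(self, num_tasks: int) -> None:
        for _ in range(num_tasks):
            # Use an idle container if one exists, otherwise pay for a new launch.
            container = self.idle.popleft() if self.idle else self._launch()
            # ... the task would run inside `container` here ...
            if self.reuse:
                self.idle.append(container)  # pick up more work instead of exiting


if __name__ == "__main__":
    for reuse in (True, False):
        session = ToySession(pool_size=4, reuse=reuse)
        session.run_query(100)
        session.run_query(50)  # a second query in the same session
        print(f"reuse={reuse}: {session.launches} container launches")
```

With re-use, the four prelaunched containers serve both queries; without it, every task after the first four pays a launch, which is the latency the slide's first three rows target.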
Pipelined Splits
Reduce start-up latency by launching tasks as soon as they are ready.
[Diagram: Tez AM and Map tasks, annotated with hive.orc.compute.splits.num.threads=10 and tez.am.grouping.split-waves=1.4]
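The pipelining idea can be sketched with a thread pool: splits for many files are computed in parallel (the `num_threads=10` default only echoes `hive.orc.compute.splits.num.threads=10`), and each file's splits are handed to the scheduler as soon as that file is done, rather than after all files. `compute_splits`, `pipelined_splits` and `target_task_count` are hypothetical; in particular, reading `tez.am.grouping.split-waves` as "aim for that many rounds of tasks over the free slots" is an assumption about its intent, and real split grouping also weighs data size and locality.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Split = Tuple[str, int]  # (file path, split index): hypothetical representation


def compute_splits(path: str, num_splits: int) -> List[Split]:
    """Hypothetical stand-in for reading an ORC file's metadata and cutting it into splits."""
    return [(path, i) for i in range(num_splits)]


def target_task_count(available_slots: int, waves: float) -> int:
    """Assumed reading of split-waves: aim for `waves` rounds of tasks over the free slots."""
    return max(1, math.ceil(available_slots * waves))


def pipelined_splits(files: Dict[str, int], launch: Callable[[Split], None],
                     num_threads: int = 10) -> int:
    """Compute splits for all files in parallel; launch each file's splits as soon as
    that file finishes instead of waiting for every file. Returns the number launched."""
    launched = 0
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(compute_splits, path, n) for path, n in files.items()]
        for future in as_completed(futures):  # yields futures in completion order
            for split in future.result():
                launch(split)                 # hand the split to the scheduler right away
                launched += 1
    return launched
```

The key design point is `as_completed`: the first task can start while slower files are still being inspected, so start-up latency is bounded by the fastest file rather than the slowest.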
Pipelined Early Exit (Limit)
[Diagram: Map tasks with a "Done!" marker, annotated with hive.orc.compute.splits.num.threads=10 and tez.am.grouping.split-waves=1.4]
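Early exit for LIMIT can be modelled with a lazy pipeline: splits are only processed while the consumer still wants rows, so once LIMIT rows have arrived the remaining splits are never touched. The helper names below are hypothetical, and the sketch is sequential; it only shows why the remaining work can be skipped.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple


def run_with_limit(splits: Iterable[int],
                   process_split: Callable[[int], List[int]],
                   limit: int) -> Tuple[List[int], int]:
    """Return the first `limit` rows and how many splits had to be processed."""
    processed = 0

    def rows():
        nonlocal processed
        for split in splits:
            processed += 1  # this split's work is actually performed
            yield from process_split(split)

    # islice stops pulling as soon as `limit` rows are collected ("Done!").
    result = list(islice(rows(), limit))
    return result, processed


if __name__ == "__main__":
    result, processed = run_with_limit(range(100), lambda s: [s * 10 + j for j in range(10)], 25)
    print(len(result), processed)  # 25 rows from only 3 of 100 splits
```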
HIVE-5775: Statistics and Cost-based optimization
• Statistics:
  – Hive has table and column level statistics
  – Used to determine parallelism, join selection
• Optiq: Open source, Apache licensed query execution framework in Java
  – Used by Apache Drill, Cascading Lingual, LucidDB, …
  – Based on Volcano paper
  – 20 man-years of development, more than 50 optimization rules
• Goals for Hive:
  – Ease of use: no manual tuning for queries, make choices automatically based on cost
  – View chaining/ad hoc queries involving multiple views
  – Help enable BI tools front-ending Hive
  – Emphasis on latency reduction
• Cost computation will be used for:
  – Join ordering
  – Join algorithm selection
  – Tez vertex boundary selection
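As a rough illustration of cost-based join ordering, the sketch below enumerates left-deep join orders and keeps the one with the smallest sum of estimated intermediate-result sizes, where a join's size is estimated as |L| × |R| × selectivity. This is a deliberately simple exhaustive search with an assumed cost model, not Optiq's Volcano-style rule engine; table names, cardinalities and selectivities are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple


def best_left_deep_order(card: Dict[str, float],
                         sel: Dict[FrozenSet[str], float]) -> Tuple[List[str], float]:
    """Exhaustively search left-deep orders; cost = sum of intermediate result sizes."""
    best_order, best_cost = [], float("inf")
    for order in permutations(card):
        size, cost = card[order[0]], 0.0
        joined = [order[0]]
        for table in order[1:]:
            # Multiply in every join predicate linking `table` to tables already joined;
            # a pair with no predicate is a cross product (selectivity 1.0).
            factor = 1.0
            for prev in joined:
                factor *= sel.get(frozenset((prev, table)), 1.0)
            size = size * card[table] * factor
            cost += size
            joined.append(table)
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


if __name__ == "__main__":
    cardinalities = {"A": 1000, "B": 1000, "C": 10}
    selectivities = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.001}
    print(best_left_deep_order(cardinalities, selectivities))
```

Here joining the small, highly selective pair B and C first keeps every intermediate result tiny, while starting with A and B produces 10,000 rows before C can filter them; a real optimizer reaches the same kind of decision from column statistics, but with far smarter search than enumerating all permutations.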
Broadcast Join
• Similar to map-join without the need to build a hash table on the client
• Will work with any level of sub-query nesting
• Uses stats to determine if applicable
• How it works:
  – Broadcast result set is computed in parallel on the cluster
  – Join processors are spun up in parallel
  – Broadcast set is streamed to the join processors
  – Join processors build a hash table
  – Other relation is joined with the hash table
• Tez handles:
  – Best parallelism
  – Best data transfer of the hashed relation
  – Best scheduling to avoid latencies
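The "How it works" steps map onto a classic broadcast hash join. In this minimal Python sketch (hypothetical helper names, sequential instead of parallel), every join processor receives the full small relation, builds its own hash table from it, and streams its partition of the large relation through that table; the column names are borrowed from Query 88 purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def join_processor(broadcast_rows: Iterable[Row], big_partition: Iterable[Row],
                   small_key: str, big_key: str) -> Iterator[Row]:
    """One join processor: build a hash table from the broadcast set, then probe it."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in broadcast_rows:
        table[row[small_key]].append(row)
    for row in big_partition:                      # the large relation is streamed
        for match in table.get(row[big_key], []):  # .get avoids inserting empty keys
            yield {**match, **row}


def broadcast_join(small: List[Row], big_partitions: List[List[Row]],
                   small_key: str, big_key: str) -> List[Row]:
    """Run one processor per partition of the large relation (sequentially here)."""
    out: List[Row] = []
    for partition in big_partitions:
        out.extend(join_processor(small, partition, small_key, big_key))
    return out
```

Because every processor holds the whole small side, the large relation never needs to be shuffled or sorted; the trade-off is that the small side must fit in each processor's memory.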
1-1 Edge
• Typical star-schema queries join a large number of tables
• Dimensions aren’t always tiny (e.g. the Customer dimension)
• It might not be possible to handle all dimensions in a single vertex as broadcast joins
• Tez allows streaming records from one processor to the next via a 1-1 Edge
  – Transfer details (streaming, files, etc.) are handled transparently
  – Scheduling/cluster capacity is worked out by Tez
• Allows Hive to build a pipeline of in-memory joins that records can be streamed through
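A pipeline of in-memory joins can be pictured as chained generators: each stage holds one dimension's hash table and streams fact records straight into the next stage, much as a 1-1 edge streams records from one processor to the next. This is a hypothetical single-process Python sketch; it assumes unique dimension keys and inner-join semantics.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def hash_join_stage(upstream: Iterable[Row], dim_rows: List[Row],
                    fact_key: str, dim_key: str) -> Iterator[Row]:
    """One in-memory join: probe a dimension hash table with each incoming record."""
    table = {row[dim_key]: row for row in dim_rows}  # assumes unique dimension keys
    for rec in upstream:
        dim = table.get(rec[fact_key])
        if dim is not None:                          # inner join: drop non-matches
            yield {**rec, **dim}


def star_join_pipeline(fact_rows: Iterable[Row],
                       dims: List[Tuple[List[Row], str, str]]) -> Iterator[Row]:
    """Chain one join stage per dimension; records flow through without materializing."""
    stream: Iterator[Row] = iter(fact_rows)
    for dim_rows, fact_key, dim_key in dims:
        stream = hash_join_stage(stream, dim_rows, fact_key, dim_key)
    return stream
```

Each stage only has to hold one dimension in memory, which is why splitting large dimensions across stages helps when they would not all fit in a single vertex.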
Dynamically partitioned Hash join
• Kicks in when the large table is bucketed, either:
  – as a bucketed table, or
  – dynamically, as part of query processing
• Uses a custom edge to partition the smaller table to match the large table's bucketing
• Allows hash-join in cases where broadcast would be too large
• Tez gives us the option of building custom edges and vertex managers
  – Fine grained control over how the data is replicated and partitioned
  – Scheduling and actual data transfer is handled by Tez
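The dynamically partitioned hash join can be sketched as: bucket the large side by the join key, route the smaller side into the same number of buckets with the same hash function (the job the custom edge does), and then run an ordinary hash join per bucket, so no processor ever needs the whole small table. Function names and the use of Python's `hash` for bucketing are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def bucketize(rows: List[Row], key: str, num_buckets: int) -> List[List[Row]]:
    """Assign each row to a bucket by hashing its join key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets


def partitioned_hash_join(big: List[Row], small: List[Row],
                          big_key: str, small_key: str, num_buckets: int) -> List[Row]:
    big_buckets = bucketize(big, big_key, num_buckets)        # large table's bucketing
    small_buckets = bucketize(small, small_key, num_buckets)  # smaller side matched to it
    out: List[Row] = []
    for i in range(num_buckets):
        # Each bucket pair is joined independently; only small_buckets[i] is hashed.
        table: Dict[object, List[Row]] = defaultdict(list)
        for row in small_buckets[i]:
            table[row[small_key]].append(row)
        for row in big_buckets[i]:
            for match in table.get(row[big_key], []):
                out.append({**match, **row})
    return out
```

Matching keys always land in the same bucket index on both sides, so the per-bucket joins together produce exactly the full join result.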
Shuffle join and Grouping w/o sort
• In the MR model, joins, grouping and window functions are typically mapped onto a shuffle phase
• That means sorting will be used to implement the operator
• With Tez it’s easy to switch out the implementation
  – Decouples partitioning, algorithm and transfer of the data
  – The nitty-gritty details are still abstracted away (data movement, scheduling, parallelism)
• With the right statistics, Hive can pick the right implementation for the query at hand
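The difference between MR-style sort-based grouping and a hash-based alternative fits in a few lines. Both Python functions below compute `COUNT(*) ... GROUP BY key`; the first mimics the shuffle-and-sort approach, the second does a single pass with a hash table and no sort. Names are hypothetical, and the sketch ignores partitioning and spilling.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List


def group_count_sorted(rows: List[Dict[str, object]], key: str) -> Dict[object, int]:
    """MR-style: sort by key, then count each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(key))
    return {k: sum(1 for _ in grp) for k, grp in groupby(ordered, key=itemgetter(key))}


def group_count_hashed(rows: List[Dict[str, object]], key: str) -> Dict[object, int]:
    """Hash aggregation: one pass, no sort; output order is not key order."""
    counts: Dict[object, int] = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts
```

Both produce the same groups; the sort only pays off when a later operator actually needs key order, which is the kind of choice statistics can inform.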
Union all
• Common operation in decision support queries
• Caused additional no-op stages in MR plans
  – Last stage spins up a multi-input mapper to write the result
  – Intermediate unions have to be materialized before additional processing
• Tez has a union that handles these cases transparently, without any intermediate steps
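The contrast between a materialized and a streaming UNION ALL can be sketched in Python: the first function copies every branch into a staging list (standing in for the intermediate HDFS write), the second lets the consumer read the branches directly, one after another. Both are hypothetical illustrations, not Hive or Tez code.

```python
from itertools import chain
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def union_all_materialized(*branches: Iterable[T]) -> List[T]:
    """MR-style: every branch is fully written out before anything downstream runs."""
    staged = [list(branch) for branch in branches]  # stand-in for an intermediate write
    return [row for branch in staged for row in branch]


def union_all_streaming(*branches: Iterable[T]) -> Iterator[T]:
    """Streaming: downstream consumes each branch directly, with no intermediate step."""
    return chain.from_iterable(branches)
```

The streaming version produces its first row after reading a single input row, whereas the materialized version must drain every branch first.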
Multi-insert queries
• Allows the same input to be split and written to different tables or partitions
  – Avoids duplicate scans/processing
  – Useful for ETL
  – Similar to “Splits” in Pig
• In MR, a “split” in the operator pipeline has to be written to HDFS and processed by multiple additional MR jobs
• Tez allows sending the multiple outputs directly to downstream processors
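In HiveQL a multi-insert is written as a single `FROM source` followed by several `INSERT ... SELECT` clauses. The Python sketch below models only the execution idea the slide describes: the source is scanned once and every row is routed to each output whose predicate it satisfies, with no per-output rescan. The `multi_insert` function, the branch format and the table names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Branch = Tuple[Callable[[Row], bool], Callable[[Row], Row]]  # (WHERE clause, SELECT list)


def multi_insert(source: Iterable[Row],
                 branches: Dict[str, Branch]) -> Tuple[Dict[str, List[Row]], int]:
    """Scan `source` once and feed each row to every branch whose predicate matches."""
    outputs: Dict[str, List[Row]] = {name: [] for name in branches}
    rows_scanned = 0
    for row in source:
        rows_scanned += 1
        for name, (predicate, projection) in branches.items():
            if predicate(row):
                outputs[name].append(projection(row))
    return outputs, rows_scanned
```

However many outputs are declared, the input is read exactly once; in the MR plan each extra branch would instead become additional jobs reading an HDFS copy.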
Putting it all together
TPCDS – Query-27 with Hive on Tez
• Tez Session populates container pool
• Dimension table calculation and HDFS split generation in parallel
• Dimension tables broadcast to Hive MapJoin tasks
• Final Reducer prelaunched and fetches completed inputs
TPC-DS 10 TB Query Times (Tuned Queries)
• Data: 10 TB data loaded into ORCFile using defaults.
• Hardware: 20 nodes: 24 CPU threads, 64 GB RAM, 5 local SATA drives each.
• Queries: TPC-DS queries with partition filters added.
TPC-DS 30 TB Query Times (Tuned Queries)
• Data: 30 TB data loaded into ORCFile using defaults.
• Hardware: 14 nodes: 2x Intel E5-2660 v2, 256 GB RAM, 23x 7.2k RPM drives each.
• Queries: TPC-DS queries with partition filters added.
Tez Concurrency Over 30 TB Dataset
• Test:
  – Launch 20 queries concurrently and measure total time to completion.
  – Used the 20 tuned TPC-DS queries from the previous slide.
  – Each query used once.
  – Queries hit multiple different fact tables with different access patterns.
• Results:
  – The 20 queries finished within 27.5 minutes.
  – Average 82.65 seconds per query.
• Data and Hardware details:
  – 30 Terabytes of data loaded into ORCFile using all defaults.
  – Hardware: 14 nodes: 2x Intel E5-2660 v2, 256 GB RAM, 23x 7.2k RPM drives each.