Presto

Information about Presto
Software

Published on April 22, 2014

Author: chenchunss

Source: slideshare.net

Presto chenchun@meituan.com Thursday, 24 April, 14

Content
• Background
• Architecture
• Key points for low query latency
• TPC-H benchmark test
• What we do
• Reference

Background
• 300+ PB of data stored in Hadoop/HDFS-based clusters
• Running more queries and getting results faster improves the productivity of analysts, data scientists, and engineers
• MapReduce and Hive are designed for large-scale, reliable computation rather than interactive, low-latency queries
• External projects were either too nascent or did not meet our requirements for flexibility and scale

Architecture

Key points for low latency
• In-memory parallel computing
• Pipelining
• Data-local computation
• Data caching
• Dynamic compilation of parts of the plan to bytecode
• Careful use of memory and data structures
• BlinkDB-like approximate queries
• Traditional SQL optimization
• GC control

Compile flow

In-memory parallel computing
select c1.rank, count(*)
from dim.city c1 join dim.city c2 on c1.id = c2.id
where c1.id > 10
group by c1.rank
limit 10;

In-memory parallel computing
• PlanDistribution=Source
  – InputSplit[] splits = inputFormat.getSplits(jobConf, 0);
• PlanDistribution=Hash
  – Hash shuffle
  – Fixed workers
  – query.initial-hash-partitions
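The hash distribution above can be illustrated with a minimal sketch: each row is routed to one of a fixed number of workers by hashing its key, so rows with equal join or group-by keys meet on the same worker. The class and method names below are hypothetical and this is not Presto's implementation; `partitionCount` only stands in for the role that `query.initial-hash-partitions` plays on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash partitioning; not Presto's code.
final class HashPartitionSketch {
    // Maps a key to a partition index in [0, partitionCount).
    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Distributes keys into partitionCount buckets, one bucket per worker.
    static List<List<Integer>> shuffle(List<Integer> keys, int partitionCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Integer key : keys) {
            buckets.get(partitionFor(key, partitionCount)).add(key);
        }
        return buckets;
    }
}
```

Because the number of partitions is fixed up front, every producer can compute the destination of a row locally without coordinating with other workers.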

Pipeline - TaskExecutor
• SplitRunner thread count: task.shard.max-threads = availableProcessors() * 4
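As a rough sketch of the sizing rule above, the following creates a fixed pool with four threads per available processor and runs a batch of placeholder "splits" on it. The class, the methods, and the use of a plain `ExecutorService` are assumptions for illustration; Presto's TaskExecutor schedules splits with its own logic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are not Presto's.
final class SplitRunnerPoolSketch {
    // Mirrors the default from the slide: availableProcessors() * 4.
    static int defaultRunnerThreads() {
        return Runtime.getRuntime().availableProcessors() * 4;
    }

    // Runs splitCount trivial tasks on the pool and returns how many completed.
    static int runSplits(int splitCount) {
        ExecutorService pool = Executors.newFixedThreadPool(defaultRunnerThreads());
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < splitCount; i++) {
            // A real split would process pages here; this one only counts itself.
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```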

Pipeline - Operator process flow
• Page (max page size: 1 MB, max rows: 16 * 1024)
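To make the two page limits concrete, here is a minimal sketch of a builder that considers a page full once either limit is reached. The class is hypothetical, and the "flush when either limit is hit" rule is an assumption drawn from the slide's wording rather than from Presto's source.

```java
// Hypothetical sketch of the page limits; not Presto's PageBuilder.
final class PageBuilderSketch {
    static final int MAX_PAGE_BYTES = 1024 * 1024; // 1 MB, from the slide
    static final int MAX_PAGE_ROWS = 16 * 1024;    // 16,384 rows, from the slide

    private int rows;
    private long bytes;

    void append(int rowSizeBytes) {
        rows++;
        bytes += rowSizeBytes;
    }

    // Assumption: a page is handed downstream when either limit is reached.
    boolean isFull() {
        return rows >= MAX_PAGE_ROWS || bytes >= MAX_PAGE_BYTES;
    }

    void reset() {
        rows = 0;
        bytes = 0;
    }

    // Counts how many pages a stream of equally sized rows would produce.
    static int pagesFor(int rowCount, int rowSizeBytes) {
        PageBuilderSketch builder = new PageBuilderSketch();
        int pages = 0;
        for (int i = 0; i < rowCount; i++) {
            builder.append(rowSizeBytes);
            if (builder.isFull()) {
                pages++;
                builder.reset();
            }
        }
        // A partially filled page is still emitted at the end of the stream.
        return builder.rows > 0 ? pages + 1 : pages;
    }
}
```

With narrow rows the row limit is reached first; with wide rows (for example 1 KB each) the byte limit is reached after 1,024 rows.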

Pipeline - ExchangeOperator

Data-local computation
• Select acceptable nodes (at least 10 nodes by default)
  – Nodes with the same address as the data
  – If not enough, add nodes in the same rack
  – If not enough, randomly select nodes in other racks
• Select the node with the smallest number of assignments (pending tasks)
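The two-step selection above can be sketched as follows: first widen the candidate set from same-host to same-rack to random remote nodes until the minimum is met, then pick the candidate with the fewest pending tasks. All names here are hypothetical, node identities are plain strings, and the exact tie-breaking and limits in Presto's scheduler may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of locality-aware node selection; not Presto's scheduler.
final class NodeSelectorSketch {
    static final int MIN_CANDIDATES = 10; // default from the slide

    // Step 1: collect candidates, preferring closer nodes.
    static List<String> candidates(List<String> sameHost, List<String> sameRack,
                                   List<String> otherRacks, int minCandidates, Random random) {
        Set<String> chosen = new LinkedHashSet<>(sameHost);
        for (String node : sameRack) {
            if (chosen.size() >= minCandidates) {
                break;
            }
            chosen.add(node);
        }
        // Remote nodes are considered in random order.
        List<String> shuffled = new ArrayList<>(otherRacks);
        Collections.shuffle(shuffled, random);
        for (String node : shuffled) {
            if (chosen.size() >= minCandidates) {
                break;
            }
            chosen.add(node);
        }
        return new ArrayList<>(chosen);
    }

    // Step 2: pick the candidate with the fewest pending tasks.
    static String leastLoaded(List<String> candidates, Map<String, Integer> pendingTasks) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (String node : candidates) {
            int count = pendingTasks.getOrDefault(node, 0);
            if (count < bestCount) {
                best = node;
                bestCount = count;
            }
        }
        return best;
    }
}
```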

Data cache
• Google Guava LoadingCache
• Cached objects
  – Hive metadata: database, table, partition
  – Bytecode classes: FilterAndProjectOperatorFactoryFactory, ScanFilterAndProjectOperatorFactoryFactory
  – Functions
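The load-on-miss behaviour that a loading cache provides can be sketched with the JDK alone. The class below is a hypothetical stand-in built on `ConcurrentHashMap.computeIfAbsent`, not Guava's `LoadingCache`; Guava additionally offers size limits, expiry, and refresh, which this sketch omits.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical JDK-only stand-in for a loading cache; no eviction or expiry.
final class LoadOnMissCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    LoadOnMissCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on the first request for a key.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the loader actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

In the setting of the slide, the loader would be an expensive step such as fetching Hive metadata or compiling an operator class, so repeated queries pay that cost once.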

Dynamic compilation of the plan to bytecode
• Presto dynamically compiles FilterAndProjectOperator and ScanFilterAndProjectOperator to bytecode, which lets the JIT optimize it and generate native machine code
• How much does it speed things up?
• ScanFilterAndProjectOperator

Careful use of memory and data structures
• Slice
  – Unsafe#copyMemory
  – 20% to 30% speedup for ORCFile write performance
• ThreadLocalRandom
  – ThreadLocal seed instead of AtomicLong
  – 100% speedup
• ListenableFuture
  – Async callback
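The ThreadLocalRandom point can be shown in a few lines. `java.util.Random` keeps its seed in a shared `AtomicLong`, so threads sharing one instance contend on every call, whereas `ThreadLocalRandom.current()` returns a generator whose seed belongs to the calling thread. The class and method names below are hypothetical, and the sketch makes no claim about the size of the speedup.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper showing contention-free random numbers per thread.
final class ThreadLocalRandomSketch {
    // Picks a bucket in [0, bucketCount) using the calling thread's own generator.
    static int randomBucket(int bucketCount) {
        // current() must be called on the thread that uses the generator;
        // the returned instance should not be shared across threads.
        return ThreadLocalRandom.current().nextInt(bucketCount);
    }
}
```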

Approximate queries
• approx_avg, approx_distinct, approx_percentile
• +50% speedup

Traditional SQL optimization
• ImplementSampleAsFilter
• LimitPushDown
• MaterializeSamplePullUp
• MergeProjections
• PredicatePushDown
• PruneRedundantProjections
• PruneUnreferencedOutputs
• SetFlatteningOptimizer
• SimplifyExpressions
• UnaliasSymbolReferences

GC control
• A JDK 1.7 bug
• When the code cache fills up, there is a chance that the JIT stops compiling bytecode to native code.
• By forcing classes to unload from the perm gen, we let the code cache evictor make room before the cache fills up.
• System.gc()

TPC-H benchmark test
• Run presto-main/src/test/java/com/facebook/presto/benchmark/BenchmarkSuite.java
• Part of the result is shown below

What we do
• Support Kerberos authentication
• Implicit type coercion
• Support reading LZO-compressed tables
• Implement useful functions
• Fix a planning issue when using DISTINCT aggregations in a HAVING clause
• https://github.com/MTDATA/presto/commits/mt-0.60

Reference
• http://prestodb.io/
• https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
• http://www.slideshare.net/zhusx/presto-overview?from_search=1
• http://www.slideshare.net/frsyuki/hadoop-source-code-reading-15-in-japan-presto

Thanks
