Top 10 Performance Gotchas for scaling in-memory Algorithms.

50 %
50 %
Information about Top 10 Performance Gotchas for scaling in-memory Algorithms.
Technology

Published on November 17, 2013

Author: srisatish

Source: slideshare.net

Description

Top 10 Data Parallelism and Model Parallelism lessons from scaling H2O.

"Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view in the making of the world's most scalable and fastest machine learning framework, H2O, and the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users.

Top 10 Performance Gotchas is about the white hot stories of i/o wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. Of good data distribution & fine-grain decomposition of Algorithms to fine-grain blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl while keeping the statistical properties of the data and accuracy of the algorithm."

Better Predictions! H2O – The Open Source Math Engine !

H2O – Open Source in-memory Machine Learning for Big Data 4/23/13

Universe is sparse. Life is messy. 
 Data is sparse & messy.! - Lao Tzu

Hadoop = opportunity Not enough Data Scientists Analysts won’t code java

Group  By   Grep   Messy   NAs   Classifica-on   Regression   Clustering                           Ensembles 100’s       nanos     models                           H 2O Big Data the Adhoc   Explora-on   Math   Modeling   Real-­‐-me   Scoring   Prediction Engine

No New API! Big  Data   Explora-on   Modeling   Scoring   Real-­‐-me     H 2O the Prediction Engine Approximate! results each step!

Intellectual   Legacy     Math  needs     to  be  free     Open  Source     Support and Innovation hFps://github.com/0xdata/h2o   H 2O the Prediction Engine

All Top 10ʼs are binary! - Anonymous

10      Move Code not Data   Data chunks > code chunks TCP for Data. UDP for Control. >> Generated Java Assist

A Chunk, Unit of Parallel Access A Frame: Vec[] age   sex   zip   ID   car   JVM 1 Heap JVM 2 Heap JVM 3 Heap JVM 4 Heap Vecs aligned in heaps l Optimized for concurrent access l Random access any row, any JVM l 

9      Chunk-ing Express!   season for Variable-sized chunks and a season Uniform chunks. Tightly-packed! (chunk is also unit of batch!)

8      Reduce early. Reduce Often!   No Expensive intermediate states. Fine-grain parallelism wins! >> Fork / Join

8      Reduce early. Reduce Often!   Vec   Vec   Vec   Vec   Vec   All CPUs grab Chunks in parallel Map/Reduce & F/J handles all sync JVM 1 Hea p JVM 2 Hea p JVM 3 Hea p JVM 4 Hea p

7      Slow is not different from Dead   Debugging slow >> Heartbeats, Messages Two General’s Paradox

6      Memory Manager   in-memory system as good as your memory manager! lazy eviction. compress. align. Corollary: Track down Leaks!

5      Memory Overheads   Use primitives // A Distributed Vector // much more than 2billion elements class Vec { long length(); // more than an int's worth // fast random access double at(long idx); // Get the idx'th elem boolean isNA(long idx); } void set(long idx, double d); // writable void append(double d); // variable sized

4      Cache-­‐Oblivious   Tree size Bin size Recursively divide Till Data à Cache

3      EC2 – Nothing is bounded   User-mode reliability S3 Readers will TCP Reset Mux your connections Not all toolkits are equal. >> JetS3

2 No Locks, No Cry   Non-Blocking Data Structures. // VOLATILE READ before key compare. // CAS private final boolean CAS_kvs( final Object[] oldkvs, final Object[] newkvs ) { return _unsafe.compareAndSwapObject(this, _kvs_offset, oldkvs, newkvs ); }

1 endian wars ended! Keep-It-Simple-Serialization.   byte[ ]. roll-your-own. fast. public AutoBuffer putA1 ( byte[] ary, int sofar, int length ) { while( sofar < length ) { int len = Math.min(length - sofar, _bb.remaining()); _bb.put(ary, sofar, len); sofar += len; if( sofar < length ) sendPartial(); } return this; }

Data Movement is a Defect. Slowing down helps communication. Got Speed?  

0      Math always produces a number   Accuracy rules over speed. Predictive Performance

1      Shuffle   Data presentation bias. Sorted data => interesting results

2      Random acts of Kindness?  

3      Convex Problems: ADMM  

4  Amdahl strikes: Cholesky / QR Decomposition   Matrix operations jama, jblas.. all single node. Distributed version needs data transfer!

5    Random  Forests   embarrassingly parallel binning tree-building splits

6    Boos-ng   iterate & stage weak-learners => strong learners each tree can be parallel minimize communication

7    Neural  Nets  &  Clustering   embarrassingly parallel pre-calculate base stats distance calculation weight matrices – small footprint

8    Ensembles   Daisy chain a bunch of models Interleave. JIT – Minimize loops over data.

9      Tools   Deterministic versions first! Got Pen & Paper? Optimize often. Test Big Data soon.

Replace NAs to improves
 predictive performance by about 10pc.
 
 
 ! - Newton

Munging Missing Features
 impute NAs with mean
 impute NAs with knn
 impute with recursive pca! - Boyd

Unbalanced data
 single rare classes
 Fraud / No-Fraud! Stratify

Unbalanced data
 multiple rare classes
 Browse, Click, Purchase! Stratify

10      Data is the System   Use Customer Data Algorithms for Sparse vs. Dense Unbalanced Data. Robustness under noise

Before H2O Velocity:  Events   Online  Scoring   Volume:  HDFS   Rule  Engine   Munging slice n dice Features HIVE/SQL Applications Explora-on   Data Scientist        Modeling   Offline  Scoring   Engineer Business Analyst Ensemble models Low latency Classification Regression Clustering Optimal Model Predictions

Big  Data   Explora-on   Modeling   Scoring   Real-­‐-me     Big Data beats Better Algorithms!

Big  Data   Explora-on   Modeling   Scoring   Real-­‐-me     Big Data and Better Algorithms! Scale & Parallelism!

Intellectual   Legacy     Math  needs     to  be  free     Open  Source     Support and Innovation hFps://github.com/0xdata/h2o   H 2O the Prediction Engine

Better Predictions! H2O – The Open Source Math Engine !

Distributed Coding Taxonomy l  No Distribution Coding: l  l  l  Whole Algorithms, Whole Vector-Math! REST + JSON: e.g. load data, GLM, get results! Simple Data-Parallel Coding: l  l  l  Per-Row (or neighbor row) Math! Map/Reduce-style: e.g. Any dense linear algebra! Complex Data-Parallel Coding l  K/V Store, Graph Algo's, e.g. PageRank! 0xdata.c45  

Distributed Coding Taxonomy l  No Distribution Coding: l  l  Whole Algorithms, Whole Vector-Math! l  REST + JSON: e.g. load data, GLM, get results! Simple Data-Parallel Coding: l  Per-Row (or neighbor row) Math! l  l  Read  the  docs!   This  talk!   Map/Reduce-style: e.g. Any dense linear algebra! Complex Data-Parallel Coding l  K/V Store, Graph Algo's, e.g. PageRank! Join  our  GIT!   46  

Distributed Data Taxonomy Frame – a collection of Vecs Vec – a collection of Chunks Chunk – a collection of 1e3 to 1e6 elems elem – a java double Row i – i'th elements of all the Vecs in a Frame 0xdata.c47  

Usecases Conversion, Retention & Churn! •  Lead Conversion! •  Engagement! •  Product Placement! •  Recommendations! Pricing Engine! Fraud Detection!

Add a comment

Related presentations

Related pages

Top 10 Performance Gotchas in Scaling In-memory Algorithms

Mark Derricutt discusses the importance of having different read and write data models when working with RESTful web APIs.
Read more

Top 10 Performance Gotchas in scaling in-memory Algorithms ...

Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ...
Read more

Presentation: Top 10 Performance Gotchas in Scaling In ...

Bio. SriSatish Ambati is co-founder and CEO of 0xdata (@hexadata), the makers of H2O. H2O brings better in-memory math algorithms to big data and ...
Read more

Algorithms - InfoQ: Software Development News, Videos & Books

... offer several algorithms useful for ... Gotchas in Scaling In-memory Algorithms by ... Algorithms For Ultimate Performance by ...
Read more

Image scaling - Wikipedia, the free encyclopedia

... image scaling is the process of resizing ... A scaling algorithm that relies on sampling a specific number of pixels would sample non ... [10] when ...
Read more

In-memory database - Wikipedia, the free encyclopedia

An in-memory database ... Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than ...
Read more

Top 10 SQL Server 2005 Performance Issues for Data ...

Add Custom Data Mining Algorithms to ... Top 10 SQL Server 2005 Performance Issues ... The top performance bottlenecks or gotchas to avoid ...
Read more

a bitmap resize algorithm - davids wiskunde website

A Bitmap Resize Algorithm: download ... read the BMresize procedure code: Preface. The bitmap resize program has been ... The scaling factor f is replaced ...
Read more