Casual mass parallel computing

50 %
50 %
Information about Casual mass parallel computing

Published on March 14, 2014

Author: aragozin



Slide deck from NoSQL day in Minsk

Casual mass parallel data processing in Java Alexey Ragozin Mar 2014

Building new bicycle …

Build Vs. Buy Build • No dedicated team to support infrastructure • Very specific tasks • Exclusive use of infrastructure • Reasonable scale Buy • Product can bought as service (internal or external) • Large scale • Multi tenancy • You are going to use advanced features (e.g. map/reduce)

“Casual” computing • Small computation farms (< 100 servers) • Team owns both application and grid • Java platform • Reasonably short batches (< 24 hours) • Reasonably small data sets (< 10 TiB)

Simple master slave topology Master process Task queue Slave Slave Slave Scheduler AdvertiseTask Report

Simple master slave topology Control plane  RMI Queue / scheduler  Simple in memory queue  May be more complex than just task queue Data plane …

Data plane Never, ever, try to send data over RMI  File system  Avoid network mounts! In-memory key-value  Client side sharding works best Disk database (RDBMS or NoSQL)  Consider prefetch of data Direct socket streaming …

Distributed objects revised Pit falls of CORBA/RMI • IDL – functional contract • IDL – protocol Separating concerns • Functional contract – wrapper object • Protocol – hidden remote interface

Distributed objects revised Renewed distributed objects paradigm Strong • Polymorphism • Encapsulation  Network protocol, caching aspects etc Weak • Homogenous code base required • Synchronous network communications

Brute force  Build / package  Deploy / SCP  Restart slaves  Start batch  Change code, repeat Deployment problem Computation grid software  Compile and run batch Behind scene  Your classes would be collected  Associated with batch  Deployed on participating slaves

Central scheduler topology Batch controller Slave Slave Slave Pull task Task Report Queue server Task queue Batch controller Add tasks Consume reports

Or more elaborated

Flow organized tasks • Input data available before task starts • e.g. Map/Reduce Collaborative tasks • Tasks communicate intermediate results to each other • e.g. physic simulations Flavors of parallel processing

Get back to data plane Rules of thumb • Insert / delete – never update • Write locally (reducing risks) • Read remotely (retry on error) • Store input as is  File system  Document / column oriented NoSQL • Input and temporary data is different  Choose right store for each

Exploiting file system Avoid network file systems • File system concept is not designed to be distributed • Good network file system cannot not exists • Use simple remote file access protocols • SCP (unencrypteddatatransferoptionsaddedbyCERNguys) • HTTP (ifyoureallydonotwantSCP) Cheap SAN could be build from open source

Algorithmic optimization Parallel computing • N times speed up will increase your OPEX and CAPEX cost by N*lg(N) Algorithmic optimization • Up front costs only • Orders of magnitude optimization opportunities • Exciting coding • Ecological way of computing 

Streaming algorithms Finding N most frequent elements • Min-Count Estimating number of unique values • HyperLogLog Distribution histograms

NanoCloud – drastically simplified coding for computing clusters

@Test public void hello_remote_world() { Cloud cloud = CloudFactory.createSimpleSshCloud(); cloud.node("").exec(new Callable<Void>(){ @Override public Void call() throws Exception { String localhost = InetAddress.getLocalHost().toString(); System.out.println("Hi! I'm running on " + localhost); return null; } }); } As easy as …

All you need is … NanoCloud requirements  SSHd  Java (1.6 and above) present  Works though NAT and firewalls  Works on Amazon EC2  Works everywhere where SSH works

Master – slave communications Master process Slave hostSSH (Single TCP) Slave Slave RMI (TCP) std err std out std in diag Slave controller Slave controller multiplexed slave streams Agent

Links NanoCloud • • Maven Central: org.gridkit.lab:telecontrol-ssh:0.7.23 • ANT task •

Thank you Alexey Ragozin - my articles - my open source code - community events in Moscow

Add a comment

Related presentations

Presentación que realice en el Evento Nacional de Gobierno Abierto, realizado los ...

In this presentation we will describe our experience developing with a highly dyna...

Presentation to the LITA Forum 7th November 2014 Albuquerque, NM

Un recorrido por los cambios que nos generará el wearabletech en el futuro

Um paralelo entre as novidades & mercado em Wearable Computing e Tecnologias Assis...

Microsoft finally joins the smartwatch and fitness tracker game by introducing the...

Related pages

Casual mass parallel data processing in Java - YouTube

Casual mass parallel data processing in Java Altoros. Subscribe Subscribed Unsubscribe 340 340. ... Association for Computing Machinery (ACM) ...
Read more

Parallel computing - Documents

2. Overview What is Parallel Computing Why Use Parallel Computing Concepts and Terminology Parallel Computer Memory Architectures Parallel Programming ...
Read more

Parallel Computing - Documents

A nice book about parallel computing. A nice book about parallel computing.
Read more

Parallel Computing - Documents

Parallel Computing Lecturer: Satinder Pal Singh E-mail: CS-517 Parallel Efficient Algorithms Slide 1 Course Contents Slide 2 Recommended ...
Read more

Parallel Computing - Documents

Fundamentals of Parallel Computer Architecture Multichip and Multicore Systems Yan Solihin North Carolina State University ... Share Parallel Computing.
Read more

Parallel Computing - Documents

Parallel Computing Parallel Computing Lecturer: Satinder Pal Singh E-mail: CS-517 Parallel Efficient Algorithms Slide 1 Course Contents ...
Read more