Scaling Systems for Research Computing


Published on February 25, 2014

Author: adamkraut



Molecular Medicine Triconference 2014


1 Intro to BioTeam: Who, What, Why
2 The ‘Meta’ Issue: What is driving all of this?
3 Scalable Infrastructure
4 Scalable Software
5 Compliance
6 Q&A

BioTeam: Who, What, Why ...
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ 10+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...

Bioinformatics and Big Iron

BioTeam Culture
‣ We are a distributed company
  • BioTeam is 100% REMOTE
  • All employees are MANAGERS
  • Workflow is mostly ASYNCHRONOUS
‣ Prefer small interdisciplinary TEAMS
  • Value placed on TRUST and PERFORMANCE

BioTeam Today
‣ 10 full-time employees in 2014
  • 2 dedicated to HPC Infrastructure
  • 2 dedicated to Software Development
  • 1 dedicated to Products
  • 1 dedicated to Government Services
  • 1 dedicated to Cloud Computing
‣ 10+ years supporting Life Sciences Research

The ‘meta’ issue

Science is changing faster than IT infrastructure

Cloud Computing

Amazon vs. Other Clouds
‣ AWS has by far the most useful IaaS building blocks today
  • First choice for most Bio-IT use cases
‣ AWS quietly rolls out killer features
  • Spot Market
  • Virtual Private Cloud
‣ Provider decision may be based on where your data actually resides

Real world simulation project

Google: Massive resources and APIs galore
‣ Google started with PaaS and worked down
‣ Google Exacycle for Visiting Faculty (closed)
  • 1 billion core hours on demand; what’s next?
‣ Google is DEVELOPER centric; everything has an API
‣ Culture is based on Science and Engineering

Tools and Techniques

Configuration Management: DevOps
‣ Required in almost every cloud project
‣ Chef/Puppet/Ansible/Fabric
  • Domain-specific languages; agent-based versus SSH; abstraction
‣ Key is reducing institutionalized knowledge and sharing recipes
‣ Docker/LXC could be disruptive
  • Lightweight differential images; not very HPC friendly at this point
‣ Orchestration tools are lagging behind provisioning and configuration
‣ Best techniques are making their way back into HPC

DevOps

MIT StarCluster: open-source cluster computing toolkit
‣ Ideal for most HPC use cases
  • Includes Grid Engine, NFS, and MPI
  • NEW Support for Virtual Private Cloud!
‣ Works with Spot Instances
‣ Extensible via plugins
  • Hadoop
  • HTCondor
  • GlusterFS
  • IPython Notebook

Private Clouds

Private Cloud: Where is your datacenter?

Public Cloud: AWS Regions

Public Cloud: Google Datacenters

Scalable Software

Types of Parallelism in modern processors and coprocessors
‣ Instruction Level
  • Micro-architectural techniques such as pipelined execution, out-of/in-order execution, super-scalar execution, branch prediction…
‣ Vector Level
  • Using SIMD vector processing instructions for SSE, AVX, Phi
‣ Thread Level
  • Multi-core architectures with or without Hyper-Threading
  • Many-core architecture with smart round robin hardware multithreading
‣ Node Level
  • Distributed Computing
  • Cluster Computing

Intel Xeon Phi Coprocessor: Fully functional multi-thread execution unit
‣ 50+ cores with a ring interconnect
‣ 64-bit addressing
‣ Scalar unit based on Intel Pentium family
‣ Vector unit with 512-bit SIMD instructions
‣ 4 hardware threads per core
‣ Highly parallel device
‣ SMP on-a-chip

Programming Xeon Phi: Choices

Offloaded
‣ Pragma/directives based
‣ Better serial processing
‣ More memory
‣ Better file access
‣ Makes full use of available resources

Native
‣ Simpler programming model
‣ Quicker to test key kernels
‣ Some constraints
‣ Memory availability
‣ File I/O access

Intel Optimization Example: Mapping with Burrows-Wheeler Aligner (BWA)
‣ Replace pthreads with OpenMP
‣ Better load balancing
‣ Overlap I/O and Compute
‣ Better thread usage
‣ Efficient memory allocation
‣ Vectorized performance critical loops
‣ Data prefetch to reduce memory latency
[Chart: Xeon (baseline), Xeon (optimized), Xeon + Phi; bar labels 1, 1.24, 1.86; axis 0 to 2]
Source: Life Sciences Optimization - Intel - SC13
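The "Overlap I/O and Compute" item can be illustrated with a small producer/consumer sketch. This is not the Intel BWA change itself (which is C code using OpenMP); it is a minimal Python illustration of the same pattern, in which a reader thread fetches the next batches while the main thread works on the current one. The function names, the batch size, and the "compute" step (summing read lengths) are all hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def read_batches(lines, batch_size, out_queue):
    """Producer: group input lines into batches and hand them to the queue."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            out_queue.put(batch)
            batch = []
    if batch:
        out_queue.put(batch)
    out_queue.put(_DONE)


def process_batch(batch):
    """Hypothetical compute step: total length of the reads in one batch."""
    return sum(len(read.strip()) for read in batch)


def run_overlapped(lines, batch_size=2, max_pending=4):
    """Compute on one batch while the reader thread fetches the next ones."""
    # Bounded queue, so reading cannot run arbitrarily far ahead of compute.
    pending = queue.Queue(maxsize=max_pending)
    reader = threading.Thread(
        target=read_batches, args=(lines, batch_size, pending), daemon=True
    )
    reader.start()
    total = 0
    while True:
        batch = pending.get()
        if batch is _DONE:
            break
        total += process_batch(batch)
    reader.join()
    return total
```

`lines` can be any iterable, including an open file object, so a call such as `run_overlapped(open(path), batch_size=10000)` would read from disk in the background. In CPython the overlap only helps when the reader spends its time waiting on I/O, since the GIL still serializes pure-Python compute.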

Intel Optimization Example: Protein sequence analysis with MPI-HMMER
‣ No source code changes required
‣ Use #pragma unroll to improve loop performance
‣ Double nested loop in Viterbi algorithm is auto-vectorized for Xeon and Phi by Intel compilers
[Chart: Xeon, Xeon + Phi; bar labels 1, 1.56; axis 0 to 2]
Source: Life Sciences Optimization - Intel - SC13

Intel Optimization Example: Assembly with Velour
‣ Intel and UIUC released open-source alternative to velveth
‣ > 10x reduction in memory usage
  • Intelligently caching portions of assembly to disk
  • 700GB to 60GB
‣ Cook, Jeffrey J. 2011. Scaling short read de novo DNA sequence assembly to gigabase genomes.

Programming Xeon Phi: Recommendations
‣ Host can have multiple Phi cards
‣ MKL libraries are pre-optimized
‣ OpenMP is applicable to multi-core and many-core programming
  • omp offload target(mic)
‣ MPI supports distributed computation and combines with other models
  • OpenMP within nodes and MPI between nodes
‣ Xeon optimizations translate well to Phi

Parallel Programming in the Life Sciences
‣ Targets: CPU, Coprocessors, GPU, FPGA, ASIC
‣ There is no silver bullet
‣ Problem decomposition is the most critical step
‣ Think in parallel
‣ Using Intel compilers can yield ~30% speedup in many cases
  • VTune and other analysis tools are available
‣ Must optimize at one or more levels


Parallel Programming: Recommendations
‣ Leaving performance on the table
  • Low-hanging fruit: splitting input files into parts
  • Avoid languages with a poor concurrency model or a GIL (global interpreter lock)
‣ Exploit thread-level parallelism
  • Use multi-threading and multi-processing to fully utilize multicore processors
‣ Use Intel’s auto-vectorizing compiler
  • Take advantage of SIMD parallelism and wider vectors on Phi
‣ Prepare for a heterogeneous many-core future
  • Hybrid Programming (OpenMP + MPI)
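The "splitting input files into parts" item can be sketched as follows. This is a minimal illustration, not code from the talk: it parses FASTA records, deals them into parts, and hands each part to a separate process so that the work is not serialized by the GIL. The helper names and the per-part task (counting G and C bases) are hypothetical stand-ins for a real analysis step.

```python
from concurrent.futures import ProcessPoolExecutor


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_into_parts(records, n_parts):
    """Deal records round-robin into n_parts lists of near-equal size."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def gc_count(part):
    """Hypothetical per-part work: count G and C bases in one part."""
    return sum(seq.upper().count("G") + seq.upper().count("C") for _, seq in part)


def total_gc(records, n_parts=4):
    """Process each part in its own worker process and combine the results."""
    parts = split_into_parts(records, n_parts)
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        return sum(pool.map(gc_count, parts))
```

When run as a script, `total_gc(list(read_fasta(open(path))))` should be called from under an `if __name__ == "__main__":` guard, which process pools require on platforms that spawn rather than fork workers. The same parts could equally be written to separate files and submitted as independent cluster jobs.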

I <3 Julia: A fresh approach to technical computing
‣ Homoiconic; Dynamic type system
‣ Designed for parallelism and distributed computation
‣ MATLAB-like syntax and extensive math library
‣ Call C functions directly
‣ Call Python functions
‣ IJulia Notebook
‣ Open Source

Compliance

Compliance Overview
‣ Need a compliance apparatus
‣ Often a barrier to competition
‣ Compute and Storage are easy
  • Policy and procedures are harder
‣ AWS and Google will now sign a BAA (Business Associate Agreement)

Compliance Strategy
‣ Keys are protecting data and preventing access
‣ Data management - points of control
‣ Encrypt data in flight and at rest
  • Use S3 server-side encryption
  • Google Persistent Disks are automatically encrypted
‣ Use credential rotation policies
‣ Lock down security groups and firewalls
‣ Use VPN for all public connections
‣ Log everything and audit often
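As one way to act on "Use S3 server-side encryption", the sketch below requests SSE-S3 (AES-256) encryption on every upload. It assumes a boto3-style S3 client, whose `put_object` call accepts a `ServerSideEncryption` argument; the talk does not name a client library, so that choice is an assumption. `put_encrypted` and `RecordingClient` are hypothetical names, and the recording client exists only so the example runs without AWS credentials.

```python
def put_encrypted(client, bucket, key, body):
    """Upload one object and ask S3 to encrypt it at rest with SSE-S3 (AES-256)."""
    return client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="AES256",
    )


class RecordingClient:
    """Stand-in for a real S3 client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {"ServerSideEncryption": kwargs.get("ServerSideEncryption")}


# Example with the stand-in client; the bucket and key names are placeholders.
client = RecordingClient()
response = put_encrypted(client, "example-bucket", "reads/sample.fastq", b"ACGT")
```

With boto3 installed and credentials configured, `boto3.client("s3")` could be passed in place of `RecordingClient()`. Routing every upload through one helper like this gives a single point of control for the encryption policy.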

ACK
