
What Is High Throughput Distributed Computing


Published on October 19, 2008

Author: guest3bd2a12

Source: slideshare.net


What is High Throughput Distributed Computing?
CERN Computing Summer School 2001, Santander
Les Robertson, CERN - IT Division
[email_address]

Outline

High Performance Computing (HPC) and High Throughput Computing (HTC)

Parallel processing

so difficult with HPC applications

so easy with HTC

Some models of distributed computing

HEP applications

Offline computing for LHC

Extending HTC to the Grid

“Speeding Up” the Calculation?

Use the fastest processor available

-- but this gives only a small factor over modest (PC) processors

Use many processors, performing bits of the problem in parallel

-- and since quite fast processors are inexpensive we can think of using very many processors in parallel

High Performance – or – High Throughput?

The key questions are – granularity & degree of parallelism

Have you got one big problem or a bunch of little ones? To what extent can the “problem” be decomposed into sort-of-independent parts ( grains ) that can all be processed in parallel?

Granularity

fine-grained parallelism – the independent bits are small, need to exchange information, synchronise often

coarse-grained – the problem can be decomposed into large chunks that can be processed independently

Practical limits on the degree of parallelism –

how many grains can be processed in parallel?

degree of parallelism v. grain size

grain size limited by the efficiency of the system at synchronising grains

High Performance – v. – High Throughput?

fine-grained problems need a high performance system

that enables rapid synchronisation between the bits that can be processed in parallel

and runs the bits that are difficult to parallelise as fast as possible

coarse-grained problems can use a high throughput system

that maximises the number of parts processed per minute

High Throughput Systems use a large number of inexpensive processors, inexpensively interconnected

while High Performance Systems use a smaller number of more expensive processors expensively interconnected

High Performance – v. – High Throughput?

There is nothing fundamental here – it is just a question of financial trade-offs like:

how much more expensive is a “fast” computer than a bunch of slower ones?

how much is it worth to get the answer more quickly?

how much investment is necessary to improve the degree of parallelisation of the algorithm?

But the target is moving -

Since the cost chasm first opened between fast and slower computers 12-15 years ago an enormous effort has gone into finding parallelism in “big” problems

Inexorably decreasing computer costs and de-regulation of the wide area network infrastructure have opened the door to ever larger computing facilities – clusters → fabrics → (inter)national grids – demanding ever-greater degrees of parallelism

High Performance Computing

A quick look at HPC problems

Classical high-performance applications

numerical simulations of complex systems such as

weather

climate

combustion

mechanical devices and structures

crash simulation

electronic circuits

manufacturing processes

chemical reactions

image processing applications like

medical scans

military sensors

earth observation, satellite reconnaissance

seismic prospecting

Approaches to parallelism

Domain decomposition

Functional decomposition

(graphics from Designing and Building Parallel Programs (Online), by Ian Foster – http://www-unix.mcs.anl.gov/dbpp/)
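A much-simplified sketch of the domain-decomposition item above: a toy 1-D rod is split into sub-domains, each "worker" updates only its own cells, and neighbouring sub-domains exchange one boundary ("ghost") cell per step – the synchronisation that fine-grained parallelism has to pay for. The problem, sizes and update rule are invented for illustration.

```python
# Domain decomposition sketch: split a 1-D problem, update sub-domains
# independently, exchange ghost cells at every synchronised step.
def split(cells, nworkers):
    """Divide the cells into contiguous sub-domains, one per worker."""
    size = len(cells) // nworkers
    return [cells[i * size:(i + 1) * size] for i in range(nworkers)]

def step(sub, left_ghost, right_ghost):
    """One smoothing update on a sub-domain, using ghost cells at its edges."""
    ext = [left_ghost] + sub + [right_ghost]
    return [0.5 * (ext[i - 1] + ext[i + 1]) for i in range(1, len(ext) - 1)]

temperature = [100.0] + [0.0] * 15          # heat source at the left end
parts = split(temperature, nworkers=4)

for _ in range(10):                         # each iteration is one synchronised step
    left = [100.0] + [p[-1] for p in parts[:-1]]    # boundary value + neighbour exchange
    right = [p[0] for p in parts[1:]] + [0.0]
    parts = [step(p, lg, rg) for p, lg, rg in zip(parts, left, right)]

print([round(x, 1) for x in sum(parts, [])])
```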

Of course – it’s not that simple (graphic from Designing and Building Parallel Programs (Online), by Ian Foster – http://www-unix.mcs.anl.gov/dbpp/)

The design process – graphic from Designing and Building Parallel Programs (Online), by Ian Foster – http://www-unix.mcs.anl.gov/dbpp/

Data or functional decomposition

→ building an abstract task model

Building a model for communication between tasks

→ interaction patterns

Agglomeration – to fit the abstract model to the constraints of the target hardware

interconnection topology

speed, latency, overhead of communications

Mapping the tasks to the processors

load balancing

task scheduling

Large scale parallelism – the need for standards

“Supercomputer” market is in trouble; diminishing number of suppliers; questionable future

Increasingly risky to design for specific tightly coupled architectures like - SGI (Cray, Origin), NEC, Hitachi

Require a standard for communication between partitions/tasks that works also on loosely coupled systems (“massively parallel processors” – MPP – IBM SP, Compaq)

Paradigm is message passing rather than shared memory – tasks rather than threads

Parallel Virtual Machine - PVM

MPI – Message Passing Interface

MPI – Message Passing Interface

industrial standard – http://www.mpi-forum.org

source code portability

widely available; efficient implementations

SPMD (Single Program Multiple Data) model

Point-to-point communication (send/receive/wait; blocking/non-blocking)

Collective operations (broadcast; scatter/gather; reduce)

Process groups, topologies

comprehensive and rich functionality
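A minimal sketch of the SPMD model and the two communication flavours listed above, written with the mpi4py Python bindings (an assumption of this example – MPI itself is language neutral); it would be run with something like "mpirun -n 4 python ring.py".

```python
# SPMD sketch: every process runs this same program and is distinguished by its rank.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's identity
size = comm.Get_size()        # number of processes running the same program

# Point-to-point: pass a token around a ring (sendrecv avoids deadlock)
token = comm.sendrecv(rank, dest=(rank + 1) % size, source=(rank - 1) % size)

# Collective: every process contributes a partial sum, rank 0 receives the total
partial = sum(range(rank * 1000, (rank + 1) * 1000))
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} processes; token from neighbour: {token}; grand total: {total}")
```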

MPI – Collective operations
Defining high level data functions allows highly efficient implementations, e.g. minimising data copies

IBM Redbook - http://www.redbooks.ibm.com/redbooks/SG245380.html

The limits of parallelism - Amdahl’s Law

If we have N processors:

Speedup = (s + p) / (s + p/N)

taking s as the fraction of the time spent in the sequential part of the program ( s + p = 1 )

Speedup = 1 / (s + (1-s)/N) → 1/s as N → ∞

s – time spent in a serial processor on serial parts of the code
p – time spent in a serial processor on parts that could be executed in parallel

Amdahl, G.M., Validity of single-processor approach to achieving large scale computing capability, Proc. AFIPS Conf., Reston, VA, 1967, pp. 483-485
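A quick numerical check of the formula above, for illustrative values of s and N: the speedup saturates at 1/s however many processors are added.

```python
# Amdahl's law: speedup as a function of the serial fraction s and processor count N.
def amdahl_speedup(s, n):
    """Speedup = 1 / (s + (1 - s)/N) for serial fraction s on N processors."""
    return 1.0 / (s + (1.0 - s) / n)

for s in (0.1, 0.01, 0.001):
    print(f"s={s:<6} N=16: {amdahl_speedup(s, 16):6.1f}   "
          f"N=1024: {amdahl_speedup(s, 1024):7.1f}   limit 1/s: {1/s:7.1f}")
```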

Amdahl’s Law - maximum speedup (chart)

Load Balancing - real life is (much) worse

Often have to use barrier synchronisation between each step, and different cells require different amounts of computation

Real time sequential part: s = Σ_i s_i

Real time parallelisable part on a sequential processor: p = Σ_k Σ_j p_kj

Real time parallelised: T = s + Σ_k max_j(p_kj)

T = s + Σ_k max_j(p_kj) >> s + p/N
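A toy model of this effect (the workloads are invented): with a barrier after every step, each step costs as much as its slowest grain, so the measured time drifts towards s + Σ_k max_j(p_kj) rather than the ideal s + p/N.

```python
# Simulate K barrier-synchronised steps on N processors with uneven grain sizes.
import random

random.seed(1)
N, K, s = 16, 50, 1.0                     # processors, synchronised steps, serial time
steps = [[random.uniform(0.5, 1.5) for _ in range(N)] for _ in range(K)]

p = sum(sum(step) for step in steps)      # total parallelisable work
ideal = s + p / N                         # perfect load balance
actual = s + sum(max(step) for step in steps)   # every step waits for its slowest grain

print(f"ideal {ideal:.1f}s   with barriers {actual:.1f}s   ({actual / ideal:.2f}x slower)")
```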

Gustafson’s Interpretation

The problem size scales with the number of processors

With a lot more processors (computing capacity) available you can and will do much more work in less time

The complexity of the application rises to fill the capacity available

But the sequential part remains approximately constant

Gustafson, J.L., Re-evaluating Amdahl’s Law, CACM 31(5), 1988, pp. 532-533
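For contrast with the fixed-size Amdahl limit, the sketch below evaluates Gustafson’s scaled speedup, N - s·(N - 1); the formula comes from the cited paper, not from this slide, and the values of s and N are illustrative.

```python
# Gustafson: if the parallel part grows to keep N processors busy while the
# sequential part stays fixed, the scaled speedup keeps growing with N.
def scaled_speedup(s, n):
    """Speedup for a problem scaled to keep N processors busy."""
    return n - s * (n - 1)

s = 0.001                                  # 0.1% of the (scaled) run stays sequential
for n in (16, 1024, 100_000):
    print(f"N={n:>7}: fixed-size Amdahl limit {1/s:,.0f}x, "
          f"scaled speedup {scaled_speedup(s, n):,.0f}x")
```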

Amdahl’s Law - maximum speedup with Gustafson’s appetite: potential 1,000X speedup with 0.1% sequential code

The importance of the network

Communication overhead adds to the inherent sequential part of the program to limit the Amdahl speedup

Latency – the round-trip time (RTT) to communicate between two processors

Communications overhead: c = latency + data_transfer_time

Speedup = (s + p) / (s + c + p/N)

For fine grained parallel programs the problem is latency , not bandwidth

Latency
Comparison – efficient MPI implementation on a Linux cluster (source: Real World Computing Partnership, Tsukuba Research Center)

Network                          Bandwidth (MByte/sec)   RTT latency (microseconds)
Myrinet                          146                     20
Gigabit Ethernet (SysKonnect)    73                      61
Fast Ethernet (EEPRO100)         11                      100
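A rough back-of-the-envelope model (mine, not the slide’s) of why latency rather than bandwidth limits fine-grained programs: if every grain of work is followed by one synchronisation costing one RTT, the useful fraction of each processor is t_grain / (t_grain + latency). The RTT values are taken from the table above; the grain sizes are illustrative.

```python
# Efficiency of a processor that synchronises (one RTT) after every grain of work.
latency_us = {"Myrinet": 20, "Gigabit Ethernet": 61, "Fast Ethernet": 100}

for grain_us in (100, 1_000, 10_000):          # compute time per grain, in microseconds
    row = ", ".join(f"{net} {grain_us / (grain_us + rtt):.0%}"
                    for net, rtt in latency_us.items())
    print(f"grain {grain_us:>6} us:  {row}")
```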

High Throughput Computing

High Throughput Computing - HTC

Roughly speaking –

HPC – deals with one large problem

HTC – is appropriate when the problem can be decomposed into many (very many) smaller problems that are essentially independent

Build a profile of all MasterCard customers who purchased an airline ticket and rented a car in August

Analyse the purchase patterns of Walmart customers in the LA area last month

Generate 10^6 CMS events

Web surfing, Web searching

Database queries

HPC – problems that are hard to parallelise – single processor performance is important

HTC – problems that are easy to parallelise – can be adapted to very large numbers of processors

HTC - HPC

High Throughput

Granularity can be selected to fit the environment

Load balancing easy

Mixing workloads is easy

Sustained throughput is the key goal

the order in which the individual tasks execute is (usually) not important

if some equipment goes down the work can be re-run later

easy to re-schedule dynamically the workload to different configurations

High Performance

Granularity largely defined by the algorithm, limitations in the hardware

Load balancing difficult

Hard to schedule different workloads

Reliability is all important

if one part fails the calculation stops (maybe even aborts!)

check-pointing essential – all the processes must be restarted from the same synchronisation point

hard to dynamically reconfigure for a smaller number of processors

Distributed Computing


Local distributed systems

Clusters

Parallel computers (IBM SP)

Geographically distributed systems

Computational Grids

HPC – as we have seen

Needs low latency AND good communication bandwidth

HTC distributed systems

The bandwidth is important, the latency is less significant

If latency is poor more processes can be run in parallel to cover the waiting time

Shared Data

If the granularity is coarse enough – the different parts of the problem can be synchronised simply by sharing data

Example – event reconstruction

all of the events to be reconstructed are stored in a large data store

processes (jobs) read successive raw events, generating processed event records, until there are no raw events left

the result is the concatenation of the processed events (and folding together some histogram data)

synchronisation overhead can be minimised by partitioning the input and output data

Data Sharing - Files

Global file namespace

maps universal name to network node, local name

Remote data access

Caching strategies

Local or intermediate caching

Replication

Migration

Access control, authentication issues

Locking issues

NFS

AFS

Web folders

Highly scalable for read-only data

Data Sharing – Databases, Objects

File sharing is probably the simplest paradigm for building distributed systems

Database and object sharing look the same

But –

Files are universal, fundamental systems concepts – standard interfaces, functionality

Databases are not yet fundamental, built-in concepts – there are only a few standards

Objects even less so – still at the application level – so harder to implement efficient and universal caching, remote access, etc.

Client-server
(diagram: the client sends a request to the server, which returns a response)

Examples

Web browsing

Online banking

Order entry

…… ..

The functionality is divided between the two parts – for example

exploit locality of data (e.g. perform searches, transformations on node where data resides)

exploit different hardware capabilities (e.g. central supercomputer, graphics workstation)

security concerns – restrict sensitive data to defined geographical locations (e.g. account queries)

reliability concerns (e.g. perform database updates on highly reliable servers)

Usually the server implements pre-defined, standardised functions
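A minimal sketch of this split using only the Python standard library; the data set, the "search" function and the query parameter are invented for illustration. The server holds the data and runs the pre-defined search where the data resides, so the client receives only the small extract it asked for.

```python
# Client-server sketch: the server performs the search next to the data,
# the client sends a small request and gets back only the matching records.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

EVENTS = [{"id": i, "energy": 10.0 * i} for i in range(1000)]   # data lives on the server

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        threshold = float(parse_qs(urlparse(self.path).query).get("min_energy", ["0"])[0])
        hits = [e for e in EVENTS if e["energy"] >= threshold]   # search where the data is
        body = json.dumps(hits).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):        # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), SearchHandler)             # pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urlopen(f"http://127.0.0.1:{port}/?min_energy=9950") as resp:   # the client's request
    print(json.loads(resp.read()))                                   # ... and the response
server.shutdown()
```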

3-Tier client-server
(diagram: a database server feeds several intermediate servers, each serving many clients)

data extracts replicated on intermediate servers

changes batched for asynchronous treatment by database server

Enables -

scaling up client query capacity

isolation of main database

Peer-to-Peer - P2P

Peer-to-Peer → decentralisation of function and control

Taking advantage of the computational resources at the edge of the network

The functions are shared between the distributed parts – without central control

Programs to cooperate without being designed as a single application

So P2P is just a democratic form of parallel programming -

SETI

The parallel HPC problems we have looked at, using MPI

All the buzz of P2P is because new interfaces promise to bring this to the commercial world; allow different communities, businesses to collaborate through the internet

XML

SOAP

.NET

JXTA

Simple Object Access Protocol - SOAP

SOAP – simple, lightweight mechanism for exchanging objects between peers in a distributed environment using XML carried over HTTP

SOAP consists of three parts:

The SOAP envelope - what is in a message; who should deal with it, and whether it is optional or mandatory

The SOAP encoding rules - serialisation definition for exchanging instances of application-defined datatypes.

The SOAP Remote Procedure Call representation
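A hand-rolled sketch of what such an exchange looks like on the wire, using only the Python standard library; the service host, the namespace and the method name ("getEventCount") are hypothetical, chosen only to show the envelope, an encoded parameter and the RPC-style body.

```python
# Build a SOAP 1.1 envelope by hand and show how it would be POSTed over HTTP.
import http.client

ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:getEventCount xmlns:m="urn:example-experiment">
      <run xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:int">4711</run>
    </m:getEventCount>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def call_service(host="soap.example.org", path="/rpc"):
    """POST the envelope over HTTP -- needs a real SOAP endpoint to answer."""
    conn = http.client.HTTPConnection(host)
    conn.request("POST", path, body=ENVELOPE.encode("utf-8"),
                 headers={"Content-Type": "text/xml; charset=utf-8",
                          "SOAPAction": "urn:example-experiment#getEventCount"})
    return conn.getresponse().read()

print(ENVELOPE)    # the XML message that would travel over HTTP
```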

Microsoft’s .NET

.NET is a framework, or environment for building, deploying and running Web services and other internet applications

Common Language Runtime - C++, C#, Visual Basic and JScript

Framework classes

Aiming at a standard but Windows only

JXTA – http://www.jxta.org/project/www/docs/TechOverview.pdf

Interoperability

locating JXTA peers

communication

Platform, language and network independence

Implementable on anything –

phone – VCR - PDA – PC

A set of protocols

Security model

Peer discovery

Peer groups

XML encoding

End of Part 1
Tomorrow: HEP applications; Offline computing for LHC; Extending HTC to the Grid

HEP Applications

Data Handling and Computation for Physics Analysis
(diagram of the data flow: detector, event filter (selection & reconstruction), raw data, event reprocessing, event simulation, event summary data, batch physics analysis, analysis objects (extracted by physics topic), interactive physics analysis, processed data)

HEP Computing Characteristics

Large numbers of independent events - trivial parallelism – “job” granularity

Modest floating point requirement - SPECint performance

Large data sets - smallish records, mostly read-only

Modest I/O rates - few MB/sec per fast processor

Simulation

cpu-intensive

mostly static input data

very low output data rate

Reconstruction

very modest I/O

easy to partition input data

easy to collect output data

Analysis

ESD analysis

modest I/O rates

read only ESD

BUT

Very large input database

Chaotic workload –

unpredictable, no limit to the requirements

AOD analysis

potentially very high I/O rates

but modest database

HEP Computing Characteristics

Large numbers of independent events - trivial parallelism – “job” granularity

Large data sets - smallish records, mostly read-only

Modest I/O rates - few MB/sec per fast processor

Modest floating point requirement - SPECint performance

Chaotic workload –

research environment → unpredictable, no limit to the requirements

Very large aggregate requirements – computation, data

Scaling up is not just big – it is also complex

… and once you exceed the capabilities of a single geographical installation ………?

Task Farming


Decompose the data into large independent chunks

Assign one task (or job) to each chunk

Put all the tasks in a queue for a scheduler which manages a large “farm” of processors, each of which has access to all of the data

The scheduler runs one or more jobs on each processor

When a job finishes the next job in the queue is started

Until all the jobs have been run

Collect the output files
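A minimal task-farm sketch of the recipe above, in Python: multiprocessing stands in for the batch scheduler and its farm of processors, and the "chunks" and per-job work are invented placeholders.

```python
# Task farm: independent jobs are queued to a pool of workers and the output collected.
from multiprocessing import Pool

def run_job(chunk):
    """Process one independent chunk of data and return its result."""
    events = range(chunk * 1000, (chunk + 1) * 1000)   # stand-in for reading the chunk
    return chunk, sum(e % 7 for e in events)           # stand-in for reconstruction work

if __name__ == "__main__":
    chunks = range(32)                        # 32 independent chunks of data
    with Pool(processes=8) as farm:           # a "farm" of 8 worker processes
        outputs = farm.map(run_job, chunks)   # queue the jobs, collect the output
    print(sorted(outputs)[:3])
```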

Task farming is good for a very large problem which has selectable granularity, largely independent tasks, and loosely shared data
HEP – Simulation, Reconstruction, and much of the Analysis


The SHIFT Software Model (1990)
From the application’s viewpoint – this is simply file sharing – all data available to all processes
standard APIs – disk I/O; mass storage; job scheduler; can be implemented over an IP network
mass storage model – tape data cached on disk (stager)
physical implementation - transparent to the application/user
scalable, heterogeneous
flexible evolution – scalable capacity; multiple platforms; seamless integration of new technologies
(diagram: disk servers, application servers, stage (migration) servers, tape servers and queue servers connected by an IP network)

Current Implementation of SHIFT
(diagram: application servers – racks of dual-cpu Linux PCs; data cache – Linux PC controllers with IDE disks; mass storage – STK Powderhorn robots with STK 9840, STK 9940 and IBM 3590 drives; Ethernet 100BaseT and Gigabit network; WAN)

Fermilab Reconstruction Farms

1991 – farms of RISC workstations introduced for reconstruction

replaced special purpose processors (emulators, ACP)

Ethernet network

Integrated with tape systems

cps – job scheduler, event manager

Condor – a hunter of unused cycles – http://www.cs.wisc.edu/condor/

The hunter of idle workstations (1986)

ClassAd Matchmaking

users advertise their requirements

systems advertise their capabilities & constraints

Directed Acyclic Graph Manager – DAGman

define dependencies between jobs

Checkpoint – reschedule – restart

if the owner of the workstation returns

or if there is some failure

Share data through files

global shared files

Condor file system calls

Flocking

interconnecting pools of Condor workstations
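A toy illustration of the ClassAd matchmaking idea above – this is ordinary Python, not the real ClassAd language, and all attribute names and values are invented: jobs advertise requirements, machines advertise capabilities and constraints, and a matchmaker pairs them.

```python
# ClassAd-style matchmaking, reduced to dictionaries and predicates.
machines = [
    {"Name": "desktop01", "Arch": "INTEL", "Memory": 512, "IdleMinutes": 45},
    {"Name": "desktop02", "Arch": "INTEL", "Memory": 128, "IdleMinutes": 2},
]
jobs = [
    {"Owner": "les",
     "Requirements": lambda m: m["Arch"] == "INTEL" and m["Memory"] >= 256},
]
machine_policy = lambda m, job: m["IdleMinutes"] > 15   # only hunt truly idle workstations

for job in jobs:
    for m in machines:
        if job["Requirements"](m) and machine_policy(m, job):
            print(f"match: {job['Owner']}'s job -> {m['Name']}")
```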

Layout of the Condor Pool
(diagram: the Central Manager runs master, collector, negotiator, schedd and startd daemons; each Desktop runs master, schedd and startd; each Cluster Node runs master and startd; arrows show ClassAd communication pathways and spawned processes)
http://www.cs.wisc.edu/condor

How Flocking Works
Add a line to your condor_config:
FLOCK_HOSTS = Pool-Foo, Pool-Bar
(diagram: the submit machine’s schedd talks to the collector and negotiator of its own Central Manager (CONDOR_HOST) and to those of the Pool-Foo and Pool-Bar Central Managers)

Friendly Condor Pool
(diagram: 600 Condor jobs flocking from the Home Condor Pool to a Friendly Condor Pool)
http://www.cs.wisc.edu/condor

Finer grained HTC

The food chain in reverse –
-- The PC has consumed the market for larger computers, destroying the species
-- There is no choice but to harness the PCs

Berkeley - Networks of Workstations (1994)

Single system view

Shared resources

Virtual machine

Single address space

Global Layer Unix – GLUnix

Serverless Network File Service – xFS

Research project

A Case for Networks of Workstations: NOW, IEEE Micro, Feb. 1995, Thomas E. Anderson, David E. Culler, David A. Patterson – http://now.cs.berkeley.edu

Beowulf

NASA Goddard (Thomas Sterling, Donald Becker) - 1994

16 Intel PCs – Ethernet - Linux

Caltech/JPL, Los Alamos

Parallel applications from the Supercomputing community

Oak Ridge – 1996 – The Stone SouperComputer

problem – generate an eco-region map of the US, 1 km grid

64-way PC cluster proposal rejected

re-cycle rejected desktop systems

The experience, emphasis on do-it-yourself, packaging of some of the tools, and probably the name – stimulated widespread adoption of clusters in the supercomputing world

Parallel ROOT Facility - PROOF

ROOT object oriented analysis tool

Queries are performed in parallel on an arbitrary number of processors

Load balancing:

Slaves receive work from Master process in “packets”

Packet size is adapted to current load, number of slaves, etc.
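A toy sketch of the pull-based, adaptive packet scheduling described above – not the real PROOF implementation; the slave rates and the target packet duration are invented for illustration.

```python
# Master hands out "packets" sized to roughly one second of work for each slave.
TARGET_SECONDS = 1.0
events = list(range(10_000))                  # the query's event list
rates = {"slave1": 800.0, "slave2": 200.0}    # events/sec (PROOF measures these as slaves run)
cursor = 0

def next_packet(slave):
    """Return the next packet, sized to the rate this slave achieved so far."""
    global cursor
    size = max(1, int(rates[slave] * TARGET_SECONDS))
    packet = events[cursor:cursor + size]
    cursor += len(packet)
    return packet

assigned = {"slave1": 0, "slave2": 0}
while cursor < len(events):                   # slaves pull work until none is left
    for slave in assigned:
        assigned[slave] += len(next_packet(slave))

print(assigned)    # the faster slave ends up with proportionally more events
```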

LHC Computing

CERN's Users in the World
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users

The Large Hadron Collider Project
4 detectors – CMS, ATLAS, LHCb
Storage – raw recording rate 0.1 – 1 GBytes/sec; accumulating at 5-8 PetaBytes/year; 10 PetaBytes of disk
Processing – 200,000 of today’s fastest PCs


Worldwide distributed computing system

Small fraction of the analysis at CERN

ESD analysis – using 12-20 large regional centres

how to use the resources efficiently

establishing and maintaining a uniform physics environment

Data exchange – with tens of smaller regional centres, universities, labs

Planned capacity evolution at CERN
(chart: mass storage, disk and CPU capacity for LHC and the other experiments, compared with Moore’s law)

Are Grids a solution?

The Grid – Ian Foster, Carl Kesselman – The Globus Project

“Dependable, consistent, pervasive access to [high-end] resources”

Dependable:

provides performance and functionality guarantees

Consistent:

uniform interfaces to a wide variety of resources

Pervasive:

ability to “plug in” from anywhere

The Grid
The GRID: ubiquitous access to computation, in the sense that the WEB provides ubiquitous access to information

Globus Architecture – www.globus.org
(layer diagram)
Applications
High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
(the high-level and core services form the grid middleware – a uniform application program interface to grid resources and the grid infrastructure primitives – mapped to local implementations, architectures, policies)


The nodes of the Grid

are managed by different people

so have different access and usage policies

and may have different architectures

The geographical distribution

means that there cannot be a central status

status information and resource availability is “published” (remember Condor Classified Ads)

Grid schedulers can only have an approximate view of resources

The Grid Middleware tries to present this as a coherent virtual computing centre

Core Services

Security

Information Service

Resource Management – Grid scheduler, standard resource allocation

Remote Data Access – global namespace, caching, replication

Performance and Status Monitoring

Fault detection

Error Recovery Management

The Promise of Grid Technology

What does the Grid do for you?

you submit your work

and the Grid

Finds convenient places for it to be run

Optimises use of the widely dispersed resources

Organises efficient access to your data

Caching, migration, replication

Deals with authentication to the different sites that you will be using

Interfaces to local site resource allocation mechanisms, policies

Runs your jobs

Monitors progress

Recovers from problems

.. and .. Tells you when your work is complete

LHC Computing Model 2001 - evolving
The LHC Computing Centre – the opportunity of Grid technology
(diagram: CMS, ATLAS and LHCb feeding the Tier 0 centre at CERN; Tier 1 regional centres in Germany, USA, UK, France, Italy, … and at CERN; Tier 2 regional-group centres at labs and universities; Tier 3 physics-department resources; desktops)
