
Linking Programming models between Grids, Web 2.0 and Multicore


Published on May 31, 2007

Author: Foxsden

Source: slideshare.net

Description

Distributed Programming Abstractions are discussed using Web Services, Grids, Parallel Computing and Multicore architectures

Linking Programming models between Grids, Web 2.0 and Multicore
Distributed Programming Abstractions Workshop, NeSC, Edinburgh UK, May 31 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401
[email_address] http://www.infomall.org

Points in Talk

All parallel programming projects more or less fail

All distributed programming projects report success

There are several hundred in Grid workflow area alone

Few constraints on distributed programming

Composition (in distributed computing) v decomposition (in parallel computing)

There is not much difference between distributed programming and a key paradigm of parallel computing (functional parallelism)

Pervasive use of 64 core chips in the future will often require one to build a Grid on a chip i.e. to execute a traditional distributed application on a chip

XML is a pretty dubious syntax for expressing programs

Web 2.0 is pretty scruffy but there are some large companies and many users behind it.

Web 2.0 and Grids will converge and features of both will survive or disappear in merged environment

Web 2.0 has a more plausible approach to distributed programming than Web Services/Grids

Dominant Distributed Programming models will support Multicore, Web 2.0 and Grids

Some More points

Services could be universal abstraction in parallel and distributed computing

Whereas objects could not be universal, so perhaps one should move away from their use

Gateways/Portals (Portlets, Widgets, Gadgets) are natural user (application usage) interface to a collection of services

Important Data (SQL, WFS, RSS Feeds) abstractions

Divide Parallel Programming Run-time (matching application structure) into 3 or 4 Broad classes

Inter-entity communication times are characteristic of the different programming models

1-5 µs for MPI/thread switching, 25 µs for services inside a chip, and 1-1000 milliseconds for services on the Grid

Multicore Commodity Programming Model

The Marine Corps writes libraries in “HLA++”, MPI or dynamic threads (internally one microsecond latency), expressed as services

Services composed/mashed up by “millions”

Many composition or mashup approaches (a small sketch follows this list)

Functional (cf. Google Map Reduce for data transformations)

Dataflow

Workflow

Visual

Script
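As an illustration of the composition idea, here is a minimal sketch (not from the talk) in which three hypothetical Python functions stand in for services and are composed dataflow-style:

```python
# Minimal sketch: composing "services" as Python functions, in the
# spirit of dataflow/mashup composition. All names are illustrative.
def fetch(source):            # stand-in for a data service
    return [f"{source}-record-{i}" for i in range(3)]

def transform(records):       # stand-in for an analysis service
    return [r.upper() for r in records]

def render(records):          # stand-in for a visualization service
    return "\n".join(records)

# Dataflow-style composition: output of one service feeds the next.
pipeline = lambda src: render(transform(fetch(src)))
print(pipeline("sensor"))
```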

The difficulties of making effective use of multicore chips will be so great that they will be the main driver of new programming environments

Microsoft CCR/DSS is a good example of the unification of parallel and distributed computing

Some Details

See http://www.slideshare.net/Foxsden or more conventionally

Web 2.0 and Grid Tutorial

http://grids.ucs.indiana.edu/ptliupages/presentations/CTSpartIMay21-07.ppt

http://grids.ucs.indiana.edu/ptliupages/presentations/Web20Tutorial_CTS.ppt

Multicore and Parallel Computing Tutorial

http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/index.html

“Web 2.0” citation site http://www.connotea.org/user/crmc

Web 2.0 and Web Services I

Web Services have clearly defined protocols (SOAP) and a well defined mechanism (WSDL) to define service interfaces

There is good .NET and Java support

The so-called WS-* specifications provide a rich, sophisticated, but complicated standard set of capabilities for security, fault tolerance, meta-data, discovery, notification etc.

“Narrow Grids” build on Web Services and provide a robust managed environment with growing adoption in Enterprise systems and distributed science (so called e-Science)

Web 2.0 supports a similar architecture to Web Services but has developed in a more chaotic yet remarkably successful fashion, with a service architecture using a variety of protocols including those of Web and Grid services

Over 400 interfaces defined at http://www.programmableweb.com/apis

Web 2.0 also has many well known capabilities with Google Maps and Amazon Compute/Storage services of clear general relevance

There are also Web 2.0 services supporting novel collaboration modes and user interaction with the web, as seen in social networking sites and portals such as MySpace and YouTube

Web 2.0 and Web Services II

I once thought Web Services were inevitable but this is no longer clear to me

Web services are complicated, slow and non-functional

WS-Security is unnecessarily slow and pedantic (canonicalization of XML)

WS-RM (Reliable Messaging) seems to have poor adoption and doesn’t work well in collaboration

WSDM (distributed management) specifies a lot

There are de facto standards like Google Maps and powerful suppliers like Google which “define the rules”

One can easily combine SOAP (Web Service) based services/systems with HTTP messages but the “lowest common denominator” suggests additional structure/complexity of SOAP will not easily survive

We can use the term Grids strictly for Narrow Grids that are collections of Web Services (or even more strictly OGSA Grids), or just call any collection of services a “Broad Grid”, which actually is quite often done

Attack of the Killer Multicores

Today commodity Intel systems are sold with 8 cores spread over two processors

Specialized chips such as GPUs and the IBM Cell processor have substantially more cores

Moore’s Law now implies, and will be satisfied by, an exponentially increasing number of cores, doubling every 1.5-3 years

Modest increase in clock speed

Intel has already prototyped an 80 core server chip, ready in 2011?

Huge activity in parallel computing programming (recycled from the past?)

Some programming models and application styles similar to Grids

We will have a Grid on a chip …

Grids meet Multicore Systems

The expected rapid growth in the number of cores per chip has important implications for Grids

With 16-128 cores on a single commodity system 5 years from now, one will both be able to build a Grid-like application on a chip and indeed must build such an application to get the Moore’s Law performance increase

Otherwise you will “waste” cores

One will not want to reprogram as you move your application from a 64 node cluster or transcontinental implementation to a single chip Grid

However multicore chips have a very different architecture from Grids

Shared not Distributed Memory

Latencies measured in microseconds not milliseconds

Thus Grid and multicore technologies will need to “converge”, and the converged technology model will have different requirements from current Grid assumptions

Grid versus Multicore Applications

It seems likely that future multicore applications will involve a loosely coupled mix of multiple modules that fall into three classes

Data access/query/store

Analysis and/or simulation

User visualization and interaction

This is precisely the mix that Grids support, but Grids of course involve distributed modules

Grids and Web 2.0 use service oriented architectures to describe systems at the module level – is this an appropriate model for multicore programming?

Where do multicore systems get their data from?

RMS: Recognition Mining Synthesis

(Figure: Intel’s RMS taxonomy. Recognition (“What is …?”) is model-based multimodal recognition; Mining (“Is it …?”) finds a model instance; Synthesis (“What if …?”) creates a model instance. Today: model-less, real-time streaming and transactions on static, structured datasets with very limited realism. Tomorrow: real-time analytics on dynamic, unstructured, multimodal datasets with photo-realism and physics-based animation.)

Intel has probably the most sophisticated analysis of future “killer” multicore applications – they are “just” standard Grid and parallel computing

(Figure: tumor imaging examples – What is a tumor? Is there a tumor here? What if the tumor progresses? It is all about dealing efficiently with complex multimodal datasets: Recognition, Mining, Synthesis. Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html)

Intel’s Application Stack

Role of Data in Grid/Multicore I

One is typically told to place compute (analysis) at the data, but most of the computing power is in multicore clients on the edge

These multicore clients can get data from the Internet, i.e. distributed sources

This could reflect personal interests of the client and be used by the client to help the user interact with the world

It could be cached or copied

It could be a standalone calculation or part of a distributed coordinated computation (SETI@Home)

Or they could get data from a set of local sensors (video-cams and environmental sensors), naturally stored on the client or locally to the client

Role of Data in Grid/Multicore II

Note that as you increase sophistication of data analysis, you increase ratio of compute to I/O

A typical modern datamining approach like the Support Vector Machine is sophisticated (dense) matrix algebra and not just text matching

http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/PC07BYOPA.ppt

The time complexity of sophisticated data analysis will make it more attractive to fetch data from the Internet and cache/store it on the client

It will also help with memory bandwidth problems in multicore chips

In this vision, the Grid “just” acts as a source of data and the Grid application runs locally

Multicore Programming Paradigms

At a very high level, there are three or four broad classes of parallelism

Coarse grain functional parallelism typified by workflow and often used to build composite “metaproblems” whose parts are also parallel

“Compute-File”, Database/Sensor, Community, Service, Pleasing Parallel (Master-worker) are sub-classes

Large Scale loosely synchronous data parallelism where dynamic irregular work has clear synchronization points as in most large scale scientific and engineering problems

Fine grain (asynchronous) thread parallelism as used in search algorithms which are often data parallel (over choices) but don’t have universal synchronization points

Discrete Event Simulations are either a fourth class or a variant of thread parallelism

Data Parallel Time Dependence

A simple form of data parallel applications are synchronous with all elements of the application space being evolved with essentially the same instructions

Such applications are suitable for SIMD computers and run well on vector supercomputers (and GPUs but these are more general than just synchronous)

However synchronous applications also run fine on MIMD machines

SIMD CM-2 evolved to MIMD CM-5 with same data parallel language CMFortran

The iterative solutions to Laplace’s equation are synchronous, as are many full matrix algorithms

Synchronization on MIMD machines is accomplished by messaging; it is automatic on SIMD machines!

(Figure: application space evolving synchronously through time steps t0 … t4, with identical evolution algorithms at every point.)
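As an illustration, here is a minimal NumPy sketch (an assumption, not code from the talk) of synchronous Jacobi sweeps for Laplace’s equation, where every interior point is updated with the same instructions:

```python
import numpy as np

# Minimal sketch of a synchronous data-parallel update: Jacobi sweeps
# for Laplace's equation on an assumed 2D grid with fixed boundaries.
u = np.zeros((64, 64))
u[0, :] = 1.0                      # illustrative boundary condition
for _ in range(100):
    # The right-hand side is evaluated for all points before the
    # assignment, so every point advances in lockstep (synchronously).
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
```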

Local Messaging for Synchronization

(Figure: 8 processors alternating compute phases and communication phases across application space and processor time.)

MPI_SENDRECV is the typical primitive (a minimal sketch follows this list)

Processors do a send followed by a receive or a receive followed by a send

In two stages (needed to avoid race conditions), one has a complete left shift

Often followed by an equivalent right shift, to get a complete exchange

This logic guarantees correctly updated data is sent to processors that have their data at same simulation time
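A minimal sketch of this shift/exchange pattern using mpi4py (an assumed Python MPI binding; the talk itself does not give code):

```python
# Run with e.g. `mpiexec -n 4 python shift.py` (mpi4py assumed installed).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

# The combined send+receive avoids the race/deadlock of separate calls.
data = f"payload from rank {rank}"
from_right = comm.sendrecv(data, dest=left, source=right)   # left shift
from_left = comm.sendrecv(data, dest=right, source=left)    # right shift
# After both shifts, each rank holds both neighbors' data: an exchange.
```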

Loosely Synchronous Applications

(Figure: application space evolving through time steps t0 … t4, with distinct evolution algorithms for each data point in each processor.)

This is the most common style in large scale science and engineering: one has the traditional data parallelism, but now each data point has in general a different update

Comes from heterogeneity in problems that would be synchronous if homogeneous

Time steps are typically uniform, but sometimes one needs to support variable time steps across the application space – however, ensure small time steps satisfy δt = (t1 - t0)/n for an integer n, so subspaces with finer time steps do synchronize with the full domain

The time synchronization via messaging is still valid

However one no longer load balances (ensuring each processor does equal work in each time step) by putting an equal number of points in each processor

Load balancing, although NP-complete, is in practice surprisingly easy

MPI Futures?

MPI likely to become more important as multicore systems become more common

Should use MPI when MPI is needed, and use other messaging for other cases (such as linking services) where different features/performance are appropriate

MPI has too many primitives which will handicap broad implementation/adoption

Perhaps have only one collective primitive, as CCR does, which allows general collective operations to be built by the user (see the sketch below)
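A minimal sketch, again with mpi4py, of the idea that a user can build collective operations from point-to-point primitives; `my_reduce` is a hypothetical helper, not part of MPI or CCR:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def my_reduce(value, op=lambda a, b: a + b, root=0):
    """Naive linear reduction built only from send/recv."""
    if rank == root:
        acc = value
        for src in range(size):
            if src != root:
                acc = op(acc, comm.recv(source=src))
        return acc                 # only the root gets the result
    comm.send(value, dest=root)
    return None

total = my_reduce(rank)            # root gets 0 + 1 + ... + (size - 1)
```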

Fine Grain Dynamic Applications

Here there is no natural universal ‘time’ as there is in science algorithms where an iteration number or Mother Nature’s time gives global synchronization

Loose (zero) coupling or special features of application needed for successful parallelization

In computer chess, the minimax scores at parent nodes provide multiple dynamic synchronization points (a sketch of this style follows)
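A minimal sketch (using Python’s standard thread pool; not the author’s chess code) of data parallelism “over choices”, where the max taken at the parent is the only synchronization point:

```python
# Dynamic thread parallelism over choices: score candidate moves in
# parallel; there is no universal synchronization point, only the max
# at the parent node. `score` is a hypothetical stand-in evaluator.
from concurrent.futures import ThreadPoolExecutor

def score(move, depth):
    # Stand-in for a recursive minimax evaluation.
    return -hash((move, depth)) % 100

moves = ["e4", "d4", "Nf3", "c4"]
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(lambda m: score(m, depth=4), moves))
best = moves[scores.index(max(scores))]
print("best move:", best)
```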

Computer Chess

(Figure: performance with increasing search depth.)

Thread level parallelism unlike position evaluation parallelism used in other systems

Competed with poor reliability and results in 1987 and 1988 ACM Computer Chess Championships

Discrete Event Simulations

(Image: Battle of Hastings.)

These are familiar in military and circuit (system) simulations when one uses macroscopic approximations

Also probably paradigm of most multiplayer Internet games/worlds

Note Nature is perhaps synchronous when viewed quantum mechanically in terms of uniform fundamental elements (quarks and gluons etc.)

It is loosely synchronous when considered in terms of particles and mesh points

It is asynchronous when viewed in terms of tanks, people, arrows etc.

Circuit simulations can be done loosely synchronously, but this is inefficient as many elements are inactive (a minimal event-queue sketch follows)
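A minimal event-queue sketch of discrete event simulation (illustrative only): events are processed in time order from a priority queue, so inactive elements cost nothing, unlike a loosely synchronous sweep over every element:

```python
import heapq

# Discrete event simulation: pop the earliest event, process it, and
# possibly schedule new events. Event names here are illustrative.
events = []                              # heap of (time, name) pairs
heapq.heappush(events, (0.0, "gate A fires"))
heapq.heappush(events, (2.5, "gate B fires"))

while events:
    now, name = heapq.heappop(events)
    print(f"t={now}: {name}")
    if name == "gate A fires" and now < 5.0:
        heapq.heappush(events, (now + 1.0, "gate A fires"))
```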

Programming Models

The three major models are supported by HPCS languages which are very interesting but too monolithic

So the Fine grain thread parallelism and Large Scale loosely synchronous data parallelism styles are distinctive to parallel computing while

Coarse grain functional parallelism of multicore overlaps with workflows from Grids and Mashups from Web 2.0

It seems plausible that a more uniform approach will evolve for the coarse grain case, although this is the least constrained of the programming styles as typically latency issues are not critical

Multicore would have strongest performance constraints

Web 2.0 and Multicore the most important usability constraints

A possible model for broad use of multicores is that the difficult parallel algorithms are coded as libraries (fine grain thread parallelism and large scale loosely synchronous data parallelism styles) while the general user composes with visual interfaces, scripting and systems like Google MapReduce

Google MapReduce: Simplified Data Processing on Large Clusters

http://labs.google.com/papers/mapreduce.html

This is a dataflow model between services where services can do useful document oriented data parallel applications including reductions

The decomposition of services onto cluster engines is automated

The large I/O requirements of datasets change the efficiency analysis in favor of dataflow

Services (count words in example) can obviously be extended to general parallel applications

There are many alternative languages for expressing dataflow and/or parallel operations, and indeed one should support multiple languages in the spirit of services (a word-count sketch follows)
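A minimal single-process sketch of the word-count example (the real MapReduce distributes the map and reduce phases over a cluster):

```python
# Word count in the MapReduce style, collapsed into one process.
from collections import defaultdict

def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:          # group by key, then sum
        counts[word] += n
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [p for doc in docs for p in map_phase(doc)]
print(reduce_phase(pairs))         # e.g. {'the': 3, 'fox': 2, ...}
```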

Programming Models

The services and objects in distributed computing are usually “natural” (come from application) whereas parts connected by MPI (or created by parallelizing compiler) come from “artificial” decompositions and not naturally considered services

Services in multicore (parallel computing) are the original modules before decomposition, and it is these modules that coarse grain functional parallelism addresses

Most of the “difficult” issues in parallel computing concern treatment of decomposition

Parallel Software Paradigms: Top Level

In the conventional two-level Grid/Web Service programming model, one programs each individual service and then separately programs their interaction

This is the Grid-aware Services programming model

SAGA supports Grid-aware programs?

This is generalized to multicore with “Marine Corps” programming services for “difficult” cases

Loosely Synchronous

Fine Grain threading

Discrete Event Simulation

“Average” Programmer produces mashups or workflows from these parallelized services

The Marine Corps Lack of Programming Paradigm Library Model

One could assume that parallel computing is “just too hard for real people” and assume that we use a Marine Corps of programmers to build as libraries excellent parallel implementations of “all” core capabilities

e.g. the primitives identified in the Intel application analysis

e.g. the primitives supported in Google MapReduce, HPF, PeakStream, Microsoft Data Parallel .NET etc.

These primitives are orchestrated (linked together) by overall frameworks such as workflow or mashups

The Marine Corps probably is content with efficient rather than easy to use programming models

Component Parallel and Program Parallel

The component parallel paradigm is where one explicitly programs the different parts of a parallel application, with the linkage either specified externally as in workflow or in the components themselves as in most other component parallel approaches

In Grids, components are natural

In Parallel computing, components are produced by decomposition

In the program parallel paradigm, one writes a single program to describe the whole application and some combination of compiler and runtime breaks up the program into the multiple parts that execute in parallel

Note that a program parallel approach will often call a built-in runtime library written in component parallel fashion

A parallelizing compiler could call an MPI library routine

Could perhaps better call “Program Parallel” “Implicitly Parallel” and “Component Parallel” “Explicitly Parallel” (a small contrast sketch follows)
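A small illustrative contrast (a sketch, with NumPy assumed): the same computation written program-parallel style as one whole-array expression, and component-parallel style as an explicit decomposition into chunks:

```python
import numpy as np

# Program parallel (implicit): one whole-array expression; the
# library/runtime owns the decomposition.
a = np.arange(1_000_000, dtype=np.float64)
b = np.sqrt(a) + 1.0

# Component parallel (explicit): the programmer decomposes into
# chunks that could be handed to separate workers or ranks.
chunks = np.array_split(a, 4)
partial = [np.sqrt(c) + 1.0 for c in chunks]   # one task per chunk
b2 = np.concatenate(partial)
assert np.allclose(b, b2)
```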

Component Parallel and Program Parallel

Program Parallel approaches include

Data structure parallel as in Google MapReduce, HPF (High Performance Fortran), HPCS (High-Productivity Computing Systems) or “SIMD” co-processor languages (PeakStream, ClearSpeed and Microsoft Data Parallel .NET)

Parallelizing compilers including OpenMP annotation

Note OpenMP and HPF have failed in some sense for large scale parallel computing (writing algorithm in standard sequential languages throws away information needed for parallelization)

Component Parallel approaches include

MPI (and related systems like PVM) parallel message passing

PGAS (Partitioned Global Address Space: CAF, UPC, Titanium, HPJava)

C++ futures and active objects

CSP … Microsoft CCR and DSS

Workflow and Mashups

Discrete Event Simulation

Why people like MPI!

(Figures: BenchC performance across clusters, including results after optimization of UPC.)

Jason J. Beech-Brandt and Andrew A. Johnson, AHPCRC Minneapolis

BenchC is an unstructured finite element CFD solver

Looked at OpenMP on shared memory Altix with some effort to optimize

Optimized UPC on several machines

MPI always good but other approaches erratic

Other studies reach similar conclusions?

Web 2.0 Systems are Portals, Services, Resources

Captures the incredible development of interactive Web sites enabling people to create and collaborate

The world does itself in large numbers!

Mashups v Workflow?

Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63

Workflow Tools are reviewed by Gannon and Fox http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf

Both include scripting in PHP, Python, sh etc. as both implement distributed programming at level of services

Mashups use all types of service interfaces and do not have the potential robustness (security) of Grid service approach

Typically “pure” HTTP (REST)
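A minimal sketch of the REST style: a mashup call is often just an HTTP GET returning JSON or XML. The endpoint below is hypothetical, not a real API:

```python
# REST-style service call: plain HTTP GET, no SOAP envelope.
from urllib.request import urlopen
import json

url = "http://api.example.com/geocode?q=Bloomington,IN"  # hypothetical
with urlopen(url) as resp:
    data = json.load(resp)       # many Web 2.0 APIs return JSON (or XML)
print(data)
```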

Web 2.0 APIs

http://www.programmableweb.com/apis has (May 14 2007) 431 Web 2.0 APIs, with Google Maps the most often used in Mashups

This site acts as a “UDDI” for Web 2.0

The List of Web 2.0 API’s

Each site has API and its features

Divided into broad categories

Only a few used a lot (42 APIs used in more than 10 mashups)

RSS feed of new APIs

Amazon S3 growing in popularity

APIs/Mashups per Protocol Distribution

(Figure: numbers of APIs and of mashups by protocol – REST, SOAP, XML-RPC, combinations of these, JS and other – with the most-used APIs including Google Maps, Netvibes, live.com, Virtual Earth, Google Search, Amazon S3, Amazon ECS, Flickr, eBay, YouTube, 411sync, del.icio.us, Yahoo! Search, Yahoo! Geocoding, Technorati, Yahoo! Images, Trynt and Yahoo! Local.)

4 more Mashups each day

For a total of 1906 on April 17 2007 (4.0 a day over the last month)

Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)

Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games

Growing number of commercial Mashup Tools

Implication for Grid Technology of Multicore and Web 2.0 I

Web 2.0 and Grids are addressing a similar application class although Web 2.0 has focused on user interactions

So technology has similar requirements

Multicore differs significantly from Grids in component location and this seems particularly significant for data

Not clear therefore how similar applications will be

The Intel RMS multicore application class is pretty similar to Grids

Multicore has more stringent software requirements than Grids, as the latter has intrinsic network overhead

Implication for Grid Technology of Multicore and Web 2.0 II

Multicore chips require low overhead protocols to exploit low latency; that suggests simplicity

We need to simplify MPI AND Grids!

Web 2.0 chooses simplicity (REST rather than SOAP) to lower barrier to everyone participating

Web 2.0 and Multicore tend to use traditional (possibly visual) scripting languages for the equivalent of workflow, whereas Grids use a visual interface backend recorded in BPEL

Google MapReduce illustrates a popular Web 2.0 and Multicore approach to dataflow

Implication for Grid Technology of Multicore and Web 2.0 III

Web 2.0 and Grids both use SOA (Service Oriented Architectures)

It seems likely that Multicore will also adopt SOA, although a more conventional object oriented approach is also possible

Services should help multicore applications integrate modules from different sources

Multicore will use fine grain objects but coarse grain services

“System of Systems”: Grids, Web 2.0 and Multicore are likely to build systems hierarchically out of smaller systems

We need to support Grids of Grids, Webs of Grids, Grids of Multicores etc. i.e. systems of systems of all sorts

The Ten areas covered by the 60 core WS-* Specifications

WS-* Specification Area, with typical Grid/Web Service examples:

1: Core Service Model - XML, WSDL, SOAP
2: Service Internet - WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MOTM
3: Notification - WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions - BPEL, WS-Choreography, WS-Coordination
5: Security - WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery - UDDI, WS-Discovery
7: System Metadata and State - WSRF, WS-MetadataExchange, WS-Context
8: Management - WSDM, WS-Management, WS-Transfer
9: Policy and Agreements - WS-Policy, WS-Agreement
10: Portals and User Interfaces - WSRP (Remote Portlets)

WS-* Areas and Web 2.0

WS-* Specification Area, with the Web 2.0 approach:

1: Core Service Model - XML becomes optional but still useful; SOAP becomes JSON, RSS, ATOM; WSDL becomes REST with API as GET, PUT etc.; Axis becomes XmlHttpRequest
2: Service Internet - No special QoS; use JMS or equivalent?
3: Notification - Hard with HTTP without polling; JMS perhaps?
4: Workflow and Transactions (no Transactions in Web 2.0) - Mashups, Google MapReduce, scripting with PHP, JavaScript ….
5: Security - SSL, HTTP Authentication/Authorization; OpenID is Web 2.0 Single Sign on
6: Service Discovery - http://www.programmableweb.com
7: System Metadata and State - Processed by application, no system state; Microformats are a universal metadata approach
8: Management==Interaction - WS-Transfer style protocols GET, PUT etc.
9: Policy and Agreements - Service dependent; processed by application
10: Portals and User Interfaces - Start Pages, AJAX and Widgets (Netvibes), Gadgets

WS-* Areas and Multicore

WS-* Specification Area, with the Multicore approach:

1: Core Service Model - Fine grain Java, C#, C++ objects and coarse grain services as in DSS; information passed explicitly or by handles; MPI needs to be updated to handle non scientific applications as in CCR
2: Service Internet - Not so important intrachip
3: Notification - Publish-Subscribe for events and interrupts
4: Workflow and Transactions - Many approaches; scripting languages popular
5: Security - Not so important intrachip
6: Service Discovery - Use libraries
7: System Metadata and State - Environment variables
8: Management == Interaction - Interaction between objects is a key issue in parallel programming, trading off efficiency versus performance
9: Policy and Agreements - Handled by application
10: Portals and User Interfaces - Web 2.0 technology popular

CCR as an example of a Cross Paradigm Run Time

Naturally supports fine grain thread switching and message passing, with around 4 microsecond latency for 4 threads switching to 4 others on an AMD PC with C#. Threads spawned – no rendezvous

Has around 50 microsecond latency for coarse grain service interactions with DSS extension which supports Web 2.0 style messaging

MPI Collectives – Shift and Exchange vary from 10 to 20 microsecond latency in rendezvous mode

Not as good as the best MPIs, but managed code, and supports Grids, Web 2.0 and parallel computing

See http://grids.ucs.indiana.edu/ptliupages/publications/CCRApril16open.pdf

Microsoft CCR

Supports exchange of messages between threads using named ports

FromHandler: Spawn threads without reading ports

Receive: Each handler reads one item from a single port

MultipleItemReceive: Each handler reads a prescribed number of items of a given type from a given port. Note items in a port can be general structures but all must have same type.

MultiplePortReceive: Each handler reads one item of a given type from multiple ports.

JoinedReceive: Each handler reads one item from each of two ports. The items can be of different type.

Choice: Execute a choice of two or more port-handler pairings

Interleave: Consists of a set of arbiters (port – handler pairs) of 3 types that are Concurrent, Exclusive or Teardown (called at end for clean up). Concurrent arbiters are run concurrently but exclusive handlers are serialized, running one at a time

http://msdn.microsoft.com/robotics/
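A rough Python analogue (an assumption for illustration; CCR’s actual API is .NET) of the Receive pattern above, with a queue standing in for a named port:

```python
# Rough analogue of CCR's "Receive": a named port is a queue, and a
# handler runs when one item arrives on it. Not CCR's real API.
import queue
import threading

port = queue.Queue()                 # stands in for a CCR port

def receive(port, handler):
    """Spawn a one-shot handler for the next item on a single port."""
    def run():
        handler(port.get())          # blocks until an item is posted
    threading.Thread(target=run).start()

receive(port, lambda msg: print("handled:", msg))
port.put("hello")                    # post an item; the handler fires once
```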

(Figure) Overhead (latency) of an AMD 4-core PC with 4 execution threads on MPI style Rendezvous Messaging for Shift and Exchange, implemented either as two shifts or as a custom CCR pattern; time in microseconds versus stages in millions, for rendezvous shift, rendezvous exchange as two shifts, and rendezvous exchange customized for MPI. Compute time is 10 seconds divided by the number of stages.

(Figure) Overhead (latency) of an INTEL 8-core PC with 8 execution threads on MPI style Rendezvous Messaging for Shift and Exchange, implemented either as two shifts or as a custom CCR pattern; time in microseconds versus stages in millions, for rendezvous shift, rendezvous exchange as two shifts, and rendezvous exchange customized for MPI. Compute time is 15 seconds divided by the number of stages.

DSS Service Measurements

(Figure) Timing of an HP Opteron multicore as a function of the number of simultaneous two-way service messages processed (November 2006 DSS Release).

CGL Measurements of Axis 2 shows about 500 microseconds – DSS is 10 times better
