exercise results

Information about exercise results

Published on January 16, 2008

Author: Diana

Source: authorstream.com

Slide1:

    /*----------------------*/
    /* Parallel hello world */
    /*----------------------*/
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int taskid, ntasks;
        double pi;

        /*------------------------------------*/
        /* establish the parallel environment */
        /*------------------------------------*/
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

        /*------------------------------------*/
        /* say hello from each MPI task       */
        /*------------------------------------*/
        printf("Hello from task %d.\n", taskid);

        if (taskid == 0) pi = 4.0*atan(1.0);
        else             pi = 0.0;

        /*------------------------------------*/
        /* do a broadcast from node 0 to all  */
        /*------------------------------------*/
        MPI_Bcast(&pi, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        printf("node %d: pi = %.10lf\n", taskid, pi);

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

Slide2:

OUTPUT FROM hello.c on 4 processors:

    Hello from task 0.
    node 0: pi = 3.1415926536
    Hello from task 1.
    Hello from task 2.
    Hello from task 3.
    node 1: pi = 3.1415926536
    node 2: pi = 3.1415926536
    node 3: pi = 3.1415926536

1. Why is the order messed up?
2. What would you do to fix it?

Slide3:

1. Answer: The control flow on different processors is not ordered – each processor runs its own copy of the executable independently. Each task therefore writes its output independently of the others, which leaves the combined output unordered.

2. To fix it:

    export MP_STDOUTMODE=ordered

Then the output will look like the following:

    Hello from task 0.
    node 0: pi = 3.1415926536
    Hello from task 1.
    node 1: pi = 3.1415926536
    Hello from task 2.
    node 2: pi = 3.1415926536
    Hello from task 3.
    node 3: pi = 3.1415926536

Slide4:

    msglen =   32000 bytes, elapsed time = 0.3494 msec
    msglen =   40000 bytes, elapsed time = 0.4000 msec
    msglen =   48000 bytes, elapsed time = 0.4346 msec
    msglen =   56000 bytes, elapsed time = 0.4490 msec
    msglen =   64000 bytes, elapsed time = 0.5072 msec
    msglen =   72000 bytes, elapsed time = 0.5504 msec
    msglen =   80000 bytes, elapsed time = 0.5503 msec
    msglen =  100000 bytes, elapsed time = 0.6499 msec
    msglen =  120000 bytes, elapsed time = 0.7484 msec
    msglen =  140000 bytes, elapsed time = 0.8392 msec
    msglen =  160000 bytes, elapsed time = 0.9485 msec
    msglen =  240000 bytes, elapsed time = 1.2639 msec
    msglen =  320000 bytes, elapsed time = 1.5975 msec
    msglen =  400000 bytes, elapsed time = 1.9967 msec
    msglen =  480000 bytes, elapsed time = 2.3739 msec
    msglen =  560000 bytes, elapsed time = 2.7295 msec
    msglen =  640000 bytes, elapsed time = 3.0754 msec
    msglen =  720000 bytes, elapsed time = 3.4746 msec
    msglen =  800000 bytes, elapsed time = 3.7441 msec
    msglen = 1000000 bytes, elapsed time = 4.6994 msec

    latency = 50.0 microseconds
    bandwidth = 212.79 MBytes/sec
    (approximate values for MPI_Isend/MPI_Irecv/MPI_Waitall)

3. How do you find the bandwidth and latency from this data?

Slide5:

Fit the data to Y = X/B + L, where Y is the elapsed time, X is the message length in bytes, B is the bandwidth (bytes/sec), and L is the latency.
5. Monte Carlo to Compute π

Main Idea
- Consider a unit square with an embedded circle
- Generate random points inside the square
- Out of N trials, m points fall inside the circle
- Then π ~ 4m/N
- Error ~ 1/√N
- Simple to parallelize

Modeling Method:

[Figure: unit square, 0 to 1 on each axis, with an inscribed circle. THROW MANY DARTS – the FRACTION INSIDE the CIRCLE approaches π/4.]

Slide8: MPI PROGRAM DEFINES WORKING NODES

Slide9: EACH NODE COMPUTES ESTIMATE OF PI INDEPENDENTLY

Slide10: NODE 0 COMPUTES AVERAGES AND WRITES OUTPUT

Slide11:

    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>
    #include "MersenneTwister.h"

    void mcpi(int, int, int);
    int monte_carlo(int, int);

    //=========================================
    // Main Routine
    //=========================================
    int main(int argc, char *argv[])
    {
        int ntasks, taskid, nworkers;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);

        if (taskid == 0) {
            printf(" #cpus  #trials  pi(est)  err(est)  err(abs)  time(s)  Mtrials/s\n");
        }

        /*--------------------------------------------------*/
        /* do monte-carlo with a variable number of workers */
        /*--------------------------------------------------*/
        for (nworkers = ntasks; nworkers >= 1; nworkers = nworkers/2) {
            mcpi(nworkers, taskid, ntasks);
        }

        MPI_Finalize();
        return 0;
    }

Slide12:

    //============================================================
    // Routine to split tasks into groups and distribute the work
    //============================================================
    void mcpi(int nworkers, int taskid, int ntasks)
    {
        MPI_Comm comm;
        int worker, my_hits, total_hits, my_trials;
        int total_trials = 64000000;
        double tbeg, tend, elapsed, rate;
        double pi_estimate, est_error, abs_error;

        /*---------------------------------------------*/
        /* make a group consisting of just the workers */
        /*---------------------------------------------*/
        if (taskid < nworkers) worker = 1;
        else                   worker = 0;
        MPI_Comm_split(MPI_COMM_WORLD, worker, taskid, &comm);

Slide13:

        if (worker) {
            /* divide the work among all of the workers */
            my_trials = total_trials / nworkers;
            MPI_Barrier(comm);
            tbeg = MPI_Wtime();

            /* each worker gets a unique seed, and works independently */
            my_hits = monte_carlo(taskid, my_trials);

            /* add the hits from each worker to get total_hits */
            MPI_Reduce(&my_hits, &total_hits, 1, MPI_INT, MPI_SUM, 0, comm);
            tend = MPI_Wtime();
            elapsed = tend - tbeg;
            rate = 1.0e-6*double(total_trials)/elapsed;

            /* report the results including elapsed times and rates */
            if (taskid == 0) {
                pi_estimate = 4.0*double(total_hits)/double(total_trials);
                est_error = pi_estimate/sqrt(double(total_hits));
                abs_error = fabs(M_PI - pi_estimate);
                printf("%6d %9d %9.5lf %9.5lf %9.5lf %8.3lf %9.2lf\n",
                       nworkers, total_trials, pi_estimate, est_error,
                       abs_error, elapsed, rate);
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

Slide14:

    //=========================================
    // Monte Carlo worker routine: return hits
    //=========================================
    int monte_carlo(int taskid, int trials)
    {
        int hits = 0;
        int xseed, yseed;
        double xr, yr;

        xseed = 1    * (taskid + 1);
        yseed = 1357 * (taskid + 1);
        MTRand xrandom( xseed );
        MTRand yrandom( yseed );

        for (int t = 0; t < trials; t++) {
            xr = xrandom();
            yr = yrandom();
            if ( (xr*xr + yr*yr) < 1.0 ) hits++;
        }
        return hits;
    }

SHOW DEMO

Slide18: Exact = 3.1415926…

Solutions to Some Problems

Homework:

Consider a set S of n variables. Variable j communicates with j+1 and j-1, with periodic boundary conditions. Write a code that maps these variables to a line of points in random order. Define an appropriate cost function to measure the communication time for such a map. Write a routine that swaps the variables between the points on the line, using simulated annealing (SA) or some other method, to decrease the cost function. Can you find the minimum of your function? What application does this remind you of?

Slide21:

Define x(N), M(N).
Initialize map M(i) to a random value in 1:N without repetition.
Let H(i,j) = min(abs(i-j), N-abs(i-j)),
    C(i,j) = delta(mod(i-2+N,N)+1, j) + delta(mod(i,N)+1, j),
    F = sum_{i,j} C(i,j) H(M(i),M(j)) / 2N

Code:

    Define parameters N-b-values, N-at-fixed-b
    Loop over N-b-values
        Define b (vary from 0 to 20)
        Compute F for map M (say F0)
        Loop over i = 1, N-at-fixed-b
            Pick two nodes at random (say j, k)
            Swap map(j) and map(k) in map M to define a new map MP
            Compute F for new map MP (say F1); generate random number R in (0,1)
            If (R < exp[-(F1-F0)*b]) then M <- MP, F0 <- F1
        End loop over N-at-fixed-b
        Print b, F0
    End loop over N-b-values

The minimum value of F should be 1. What application does this remind you of? Sorting.

Generic Parallelization Problem:

N = K x L x M variables, P = x·y·z processors. Each processor holds KLM/(xyz) variables. Each variable does F flops before communicating B bytes from each face to the near-neighbor processor on the processor grid. Processing speed is f flops/s, with compute efficiency c. Computations and communications do not overlap. Latency and bandwidth are d and b, respectively; communication efficiency is e.

a. Write a formula for the time between communication events.
b. Write a formula for the time between computation events.
c. Let K = L = M = N^(1/3) and x = y = z = P^(1/3). Write a formula for the efficiency E = [time to compute]/[time to compute + time to communicate]. Explore this as a function of F, B, P, and N. Use c ~ e ~ 1. To make this relevant to BG/L, use d = 10 microsec, b = 1.4 Gb/s, f = 2.8 GFlops/s.
Solution to Generic Parallelization Problem:

A_compute = amount of computation per processor = NF/P
t_compute = A_compute/(fc) = NF/(Pfc)

A_communicate = amount of data communicated by each processor
              = B[KL/(xy) + LM/(yz) + MK/(zx)]
t_communicate = 6d + A_communicate/(be)
              = 6d + (B/(be))[KL/(xy) + LM/(yz) + MK/(zx)]
              → 6d + 3(B/(be))[N/P]^(2/3) for the symmetric case

Slide24:

E = 1/[1 + 6d(fc/F)x + 3(c/e)(fB/(Fb))x^(1/3)], where x = P/N

Weak scaling: x = constant as P → ∞.

Latency dominates when the second term in the denominator is biggest: x > 5e-6 (F/c) and x > 5e-6 (B/e)^(3/2), using BG/L parameters.

a. F = B = 1 (transaction processing): latency bound for N/P < 200,000
b. F = 100, B = 1: latency bound if N/P < 2000
c. F = 1000, B = 100: latency bound if N/P < 200

Slide25: [Plot: E vs. variables per node for F = 1, 10, 100, with c = e = 0.5, b = 1.4 Gb/s, f = 2.8 GF/s (BG/L parameters); the axis runs from few nodes/many variables to nodes = variables.]

Slide26: [Same plot for F = 10, 100, 1000.]

Slide27: [Same plot for F = 100, 1000, 10000.]

Slide28:

E = 1/[1 + 6d(fc/F)[P/N] + 3(c/e)(fB/(Fb))[P/N]^(1/3)] = 1/[E1 + EL + EB]

E1: compute dominated. EL: latency dominated. EB: bandwidth affected.

Strong Scaling:

N fixed, P → ∞. For any N, eventually P will win! Dominated by latency and bandwidth in various regimes; large values of N scale better (chart on the next slide).

Slide30: [Plot comparing strong-scaling curves for N = 1000000 and N = 100000.]

Lesson: In the best of cases, strong scaling is very hard to achieve.

Thank You for the Honor of the Opportunity to Lecture at Tokyo University!
