Information about bluegene01

Published on September 18, 2007

Author: Malbern


Emulating Massively Parallel (PetaFLOPS) Machines
Neelam Saboo, Arun Kumar Singla, Joshua Mostkoff Unger, Gengbin Zheng, Laxmikant V. Kalé
Department of Computer Science, Parallel Programming Laboratory

Roadmap
BlueGene architecture; need for an emulator; Charm++ BlueGene; Converse BlueGene; future work.

Blue Gene: Processor-in-Memory Case Study
Five steps to a PetaFLOPS, taken from:
Functional model: a 34 x 34 x 36 cube of shared-memory nodes, each having 25 processors.

SMP Node
25 processors (200 processing elements).
Input/output buffer: 32 x 128 bytes.
Network: connected to six neighbors via duplex links; 16 bits @ 500 MHz = 1 gigabyte/s.
Latencies: 5 cycles per hop, 75 cycles per turn.

Processor
Clock: 500 MHz.
A memory-side cache eliminates coherency problems.
Cache timings: 10 cycles local cache, 20 cycles remote cache, 10 cycles cache miss.
8 integer units sharing 2 floating-point units.
8 x 25 x ~40,000 = ~8 x 10^6 processing elements!

Need for an Emulator
An emulator enables the programmer to develop, compile, and run software using the same programming interface that will be used on the actual machine.

Emulator Objectives
Emulate Blue Gene and other petaFLOPS machines.
Memory and time limitations on a single processor require that the simulation MUST be performed on a parallel architecture.
Issues: we assume that programs written for a processor-in-memory machine will handle out-of-order execution and messaging themselves, so a complex event queue with rollback is not needed.

Emulator Implementation
What are the basic data structures and interface?
Machine configuration (topology) and handler registration.
Nodes with node-level shared data.
Threads (associated with each node) representing processing elements.
Communication between nodes.
How should all these objects be handled on a parallel architecture? How should object-to-object communication be handled?
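The slides list the basic pieces of the emulator's machine description (topology and handler registration) and quote the Blue Gene figures that go into it. The Python below is a minimal sketch under stated assumptions, not the emulator's actual interface: `MachineConfig`, its methods, and `link_bandwidth_bytes_per_s` are hypothetical names; wraparound links are assumed so that every node has exactly six neighbors (the slides do not say how nodes on the faces of the cube are wired); and each of a processor's 8 integer units is counted as one processing element, as the slide's 8 x 25 x ~40,000 product does.

```python
from typing import Callable, List, Tuple

Coord = Tuple[int, int, int]


class MachineConfig:
    """Hypothetical machine description: a 3D grid of SMP nodes."""

    def __init__(self, x: int, y: int, z: int, procs_per_node: int, pes_per_proc: int):
        self.dims = (x, y, z)
        self.procs_per_node = procs_per_node
        self.pes_per_proc = pes_per_proc  # assumption: 8 integer units = 8 PEs
        self.handlers: List[Callable] = []

    def register_handler(self, fn: Callable) -> int:
        """Register a message handler; messages refer to it by the returned id."""
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def num_nodes(self) -> int:
        x, y, z = self.dims
        return x * y * z

    def num_pes(self) -> int:
        return self.num_nodes() * self.procs_per_node * self.pes_per_proc

    def neighbors(self, c: Coord) -> List[Coord]:
        """The six neighbors along -x, +x, -y, +y, -z, +z.

        Assumes wraparound links, so nodes on the faces also have six neighbors.
        """
        result = []
        for axis in range(3):
            for step in (-1, 1):
                n = list(c)
                n[axis] = (n[axis] + step) % self.dims[axis]
                result.append(tuple(n))
        return result


def link_bandwidth_bytes_per_s(width_bits: int, clock_hz: float) -> float:
    """Raw link bandwidth: bits per cycle times cycles per second, in bytes."""
    return width_bits * clock_hz / 8


bg = MachineConfig(34, 34, 36, procs_per_node=25, pes_per_proc=8)
print(bg.num_nodes())                         # 41616
print(bg.num_pes())                           # 8323200
print(link_bandwidth_bytes_per_s(16, 500e6))  # 1000000000.0
print(bg.neighbors((0, 0, 0)))
```

With the functional model's parameters this gives 41,616 nodes and 8,323,200 processing elements, which is the ~8 x 10^6 figure on the slide, and a 16-bit link at 500 MHz works out to 10^9 bytes/s, matching the quoted 1 gigabyte/s.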
Difficulties of implementation are eased by using Charm++, an object-oriented parallel programming paradigm.

Experiments on Emulator
Sample applications implemented: primes, Jacobi relaxation, and an MD prototype.
ApoA-I: 92k atoms. 40,000 atoms, no bonds calculated, nearest-neighbor cutoff.
Ran the full Blue Gene (with 8 x 10^6 threads) on ~100 ASCI-Red processors.

Collective Operations
Explore different algorithms for broadcasts and reductions: RING, LINE, OCTREE (along x, y, z).
Used a "primitive" 30 x 30 x 20 (10 threads) Blue Gene emulation on a 50-processor Linux cluster.

Converse BlueGene Emulator Objective
Performance estimation (with proper time stamping).
Provide an API for building Charm++ on top of the emulator.

BlueGene Emulator
Node structure: communication threads; non-affinity message queue; affinity message queue; worker thread; inBuffer.

Performance
Pingpong: close to Converse pingpong; 81-103 µs vs. 92 µs RTT.
Charm++ pingpong: 116 µs RTT.
Charm++ BlueGene pingpong: 134-175 µs RTT.

Charm++ on top of Emulator
A BlueGene thread represents a Charm++ node.
Name conflicts: Cpv, Ctv, MsgSend, etc.; CkMyPe(), CkNumPes(), etc.

Future Work: Simulator
LeanMD: a fully functional MD with only a cutoff.
How can we examine the performance of algorithms on variants of the processor-in-memory design in a massive system?
Several layers of detail to measure:
Basic: correctly model performance; timestamp messages with correction for out-of-order execution.
More detailed: network performance, memory access, modeling the sharing of floating-point units, estimation techniques.
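The node structure on the BlueGene Emulator slide (inBuffer, communication threads, affinity and non-affinity message queues, worker threads) can be pictured with a small single-process Python model. This is a hypothetical sketch, not the Converse emulator's code: the names `Message`, `EmulatedNode`, `communication_step`, and `worker_step` are made up for illustration, real threads are replaced by explicit step calls so the order is deterministic, and the rule that a worker checks its own affinity queue before the shared non-affinity queue is an assumption about how the two queues interact.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


class Message:
    """A message addressed to a node; `thread` is None when it has no affinity."""

    def __init__(self, handler: int, payload: object, thread: Optional[int] = None):
        self.handler = handler   # id of a registered handler (hypothetical scheme)
        self.payload = payload
        self.thread = thread     # target worker thread, or None for "any worker"


class EmulatedNode:
    """Toy model of one emulated node; threads are simulated by explicit steps."""

    def __init__(self, num_workers: int, handlers: Dict[int, Callable]):
        self.handlers = handlers
        self.in_buffer: deque = deque()                  # incoming messages
        self.non_affinity: deque = deque()               # shared by all workers
        self.affinity: List[deque] = [deque() for _ in range(num_workers)]
        self.shared: dict = {}                           # node-level shared data

    def communication_step(self) -> None:
        """Communication thread: move everything in inBuffer to the right queue."""
        while self.in_buffer:
            msg = self.in_buffer.popleft()
            if msg.thread is None:
                self.non_affinity.append(msg)
            else:
                self.affinity[msg.thread].append(msg)

    def worker_step(self, worker: int) -> bool:
        """Worker thread: run one message, own affinity queue first (assumed policy)."""
        if self.affinity[worker]:
            msg = self.affinity[worker].popleft()
        elif self.non_affinity:
            msg = self.non_affinity.popleft()
        else:
            return False                                 # nothing to do
        self.handlers[msg.handler](self, worker, msg.payload)
        return True


def record(node: EmulatedNode, worker: int, payload: object) -> None:
    """Example handler: log which worker ran which payload in node-shared data."""
    node.shared.setdefault("log", []).append((worker, payload))


node = EmulatedNode(num_workers=2, handlers={0: record})
node.in_buffer.extend([Message(0, "a", thread=1), Message(0, "b"), Message(0, "c", thread=0)])
node.communication_step()
node.worker_step(0)   # runs "c" from worker 0's affinity queue
node.worker_step(1)   # runs "a" from worker 1's affinity queue
node.worker_step(0)   # affinity queue empty, so takes "b" from the shared queue
print(node.shared["log"])  # [(0, 'c'), (1, 'a'), (0, 'b')]
```

In this model a message that names a worker thread can only run on that thread, while a message without affinity is picked up by whichever worker reaches the shared queue first.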
