Published on September 18, 2007
Emulating Massively Parallel (PetaFLOPS) Machines:
Neelam Saboo, Arun Kumar Singla, Joshua Mostkoff Unger, Gengbin Zheng, Laxmikant V. Kalé
Department of Computer Science, Parallel Programming Laboratory
http://charm.cs.uiuc.edu

Roadmap:
BlueGene Architecture
Need for an Emulator
Charm++ BlueGene
Converse BlueGene
Future Work

Blue Gene: Processor-in-memory Case Study:
Five steps to a PetaFLOPS, taken from: http://www.research.ibm.com/bluegene/
Functional model: 34 x 34 x 36 cube of shared-memory nodes, each having 25 processors.

SMP Node:
25 processors, 200 processing elements
Input/output buffer: 32 x 128 bytes
Network: connected to six neighbors via duplex links, 16 bit @ 500 MHz = 1 Gigabyte/s
Latencies: 5 cycles per hop, 75 cycles per turn

Processor:
Stats: 500 MHz
Memory-side cache eliminates coherency problems
10 cycles local cache
20 cycles remote cache
10 cycles cache miss
8 integer units sharing 2 floating point units
8 x 25 x ~40,000 = ~8 x 10^6 processing elements!

Need for Emulator:
Emulator: enables the programmer to develop, compile, and run software using the programming interface that will be used on the actual machine.

Emulator Objectives:
Emulate Blue Gene and other petaFLOPS machines.
Memory and time limitations on a single processor require that the simulation MUST be performed on a parallel architecture.
Issues: Assume that a program written for a processor-in-memory machine will handle out-of-order execution and messaging. Therefore a complex event queue/rollback is not needed.

Emulator Implementation:
What are the basic data structures/interface?
    Machine configuration (topology), handler registration
    Nodes with node-level shared data
    Threads (associated with each node) representing processing elements
    Communication between nodes
How to handle all these objects on a parallel architecture?
How to handle object-to-object communication?
Difficulties of implementation are eased by using Charm++, an object-oriented parallel programming paradigm.

Experiments on Emulator:
Sample applications implemented:
    Primes
    Jacobi relaxation
    MD prototype: 40,000 atoms, no bonds calculated, nearest neighbor cutoff
    ApoA-I: 92k atoms
Ran full Blue Gene (with 8 x 10^6 threads) on ~100 ASCI-Red processors

Collective Operations:
Explore different algorithms for broadcasts and reductions: ring, line, octree
[Figure: ring, line, and octree patterns on x, y, z axes]
Use a 'primitive' 30 x 30 x 20 (10 threads) Blue Gene emulation on a 50 processor Linux cluster

Converse BlueGene Emulator Objective:
Performance estimation (with proper time stamping)
Provide an API for building Charm++ on top of the emulator.

Bluegene Emulator:
[Figure: node structure, showing communication threads, non-affinity message queue, affinity message queue, worker thread, inBuffer]

Performance:
Pingpong: close to Converse pingpong; 81-103 us vs. 92 us RTT
Charm++ pingpong: 116 us RTT
Charm++ Bluegene pingpong: 134-175 us RTT

Charm++ on top of Emulator:
BlueGene thread represents a Charm++ node
Name conflicts:
    Cpv, Ctv
    MsgSend, etc.
    CkMyPe(), CkNumPes(), etc.

Future Work: Simulator:
LeanMD: fully functional MD with only cutoff
How can we examine the performance of algorithms on variants of the processor-in-memory design in a massive system?
Several layers of detail to measure:
    Basic: correctly model performance, timestamp messages with correction for out-of-order execution
    More detailed: network performance, memory access, modeling sharing of floating-point unit, estimation techniques
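
The machine-size and network figures quoted on the Blue Gene slides can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the emulator: all function and constant names are hypothetical, and the latency estimate assumes dimension-ordered routing on a non-wrapping mesh with one "turn" charged per change of dimension, which the slides do not state.

```python
# Parameters taken from the slides.
MESH = (34, 34, 36)        # nodes along x, y, z
PROCESSORS_PER_NODE = 25
PES_PER_PROCESSOR = 8      # 25 processors -> 200 processing elements per node
CLOCK_HZ = 500e6           # 500 MHz
LINK_WIDTH_BITS = 16
CYCLES_PER_HOP = 5
CYCLES_PER_TURN = 75


def node_count(mesh=MESH):
    """Number of SMP nodes in the cube."""
    x, y, z = mesh
    return x * y * z


def processing_elements(mesh=MESH):
    """Total processing elements in the full machine."""
    return node_count(mesh) * PROCESSORS_PER_NODE * PES_PER_PROCESSOR


def link_bandwidth_bytes_per_s():
    """One link direction: 16 bits per cycle at 500 MHz, converted to bytes."""
    return LINK_WIDTH_BITS * CLOCK_HZ / 8


def route_cycles(src, dst):
    """Estimated network cycles between two nodes.

    ASSUMPTION (not from the slides): dimension-ordered routing on a
    mesh without wraparound; a turn is charged whenever the route
    switches from one dimension to the next.
    """
    deltas = [abs(a - b) for a, b in zip(src, dst)]
    hops = sum(deltas)
    dims_used = sum(1 for d in deltas if d > 0)
    turns = max(dims_used - 1, 0)
    return hops * CYCLES_PER_HOP + turns * CYCLES_PER_TURN


if __name__ == "__main__":
    print("nodes:", node_count())
    print("processing elements:", processing_elements())
    print("link bandwidth (bytes/s):", link_bandwidth_bytes_per_s())
    corner_to_corner = route_cycles((0, 0, 0), (33, 33, 35))
    print("corner-to-corner cycles:", corner_to_corner)
    print("corner-to-corner seconds:", corner_to_corner / CLOCK_HZ)
```

The exact node count is 41,616, which the slides round to ~40,000; multiplying by 200 processing elements per node gives the ~8 x 10^6 figure. Under the routing assumption above, per-hop cost dominates over turns for long routes because at most two turns occur on any path.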