Minimal Overhead Fault Tolerance Protocol for High Performance Computing Systems
PhD Candidate: Yijian Yang
Advisor: Yuan Shi
Temple University

Outline:
- Problem statement
- Key concepts
- Solution space
- Open problem
- Stateless parallel processing: Synergy
- Proposed solution
- Future work

Problem statement:
High-performance multiprocessor computing systems have not been designed to handle component failures, and the problem grows with the size of modern systems:
- More processors mean more points of failure.
- Low-cost, custom-assembled clusters imply a higher failure rate than custom multiprocessors.
- Application running times far exceed the system's MTBF.
Multiprocessor fault tolerance is the property that enables any application to continue operating in the event of multiple component failures in the multiprocessor system.

Key concepts:
- Consistent global system states.
- Failure models: stop-failure and Byzantine.
- Stable storage: storage that guarantees the recovery data persist through the tolerated failures and their corresponding recoveries.
- Garbage collection: the deletion of useless recovery information.
[Figure: (a) a consistent and (b) an inconsistent global state of processes P0, P1, P2 exchanging messages M1 and M2]

Existing solutions:
Fault tolerance can be achieved through redundancy in space or redundancy in time (rollback-recovery):
- Redundancy in space: replicated systems, group communication (Isis [Kaashoek M. 1990]), and transaction processing (MOM [Cannon S. 1994]). Drawbacks: extra operations, blocking send operations, a bottleneck at the sequencer, and two-phase commit (2PC) leading to low availability.
- Redundancy in time (rollback-recovery): checkpoint-based protocols (uncoordinated; coordinated, with blocking or non-blocking coordination; communication-induced) and log-based protocols (pessimistic, optimistic, causal), at the cost of overhead for saving intermediate results.
In rollback-recovery, processes achieve fault tolerance by using a stable storage device to save recovery information periodically during failure-free execution. Upon a failure, a failed process uses the saved information to restart the computation from an intermediate state, thereby reducing the amount of lost computation.

Rollback recovery:
[Figure: the design choices compared on the following slides: checkpoint-based vs. log-based, coordinated vs. uncoordinated, non-blocking vs. blocking, system-level vs. application-level]

Checkpoint-based vs. log-based:
[Comparison figure]

Coordinated vs. uncoordinated:
Coordinated
- Pros: simplified recovery; not susceptible to the domino effect; only one permanent checkpoint on stable storage, so no garbage collection is needed.
- Cons: latency in committing output.
Uncoordinated
- Pros: maximal autonomy; a process may reduce overhead by taking checkpoints whenever the amount of state information to be saved is small.
- Cons: the domino effect, useless checkpoints, and the need for garbage collection.

Non-blocking vs. blocking:
Blocking coordination incurs a large overhead; non-blocking coordination avoids it but must cope with messages that are in transit when the checkpoint request arrives, as illustrated in the sketch below.
[Figure: an initiator sends checkpoint requests to P0 and P1 while message m is in flight, showing checkpoints C0,x and C1,x: (a) checkpoint inconsistency, (b) with FIFO channels, (c) with non-FIFO channels]
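To make the FIFO case (panel b above) concrete, here is a minimal sketch, not Synergy code, of the standard marker rule in the spirit of Chandy-Lamport: a process takes its local checkpoint as soon as the checkpoint request arrives and forwards the request ahead of any later message, so a message sent after the sender's checkpoint can never appear as received in the receiver's checkpoint. The Process class, the channel layout, and the MARKER constant are illustrative assumptions.

```python
from collections import deque

MARKER = "CHECKPOINT_REQUEST"

class Process:
    def __init__(self, name):
        self.name = name
        self.received = []        # application messages delivered so far
        self.sent = []            # application messages this process has sent
        self.checkpoint = None    # snapshot of (received, sent) at checkpoint time

    def take_checkpoint(self):
        self.checkpoint = (list(self.received), list(self.sent))

# One FIFO channel per direction between P0 and P1.
p0, p1 = Process("P0"), Process("P1")
channels = {("P1", "P0"): deque(), ("P0", "P1"): deque()}

def send(src, dst, msg):
    src.sent.append(msg)
    channels[(src.name, dst.name)].append(msg)

def deliver_one(src, dst):
    """Deliver the next item on the src -> dst channel in FIFO order."""
    if not channels[(src.name, dst.name)]:
        return
    msg = channels[(src.name, dst.name)].popleft()
    if msg == MARKER:
        # Non-blocking rule: checkpoint before delivering anything queued
        # behind the request, then keep computing.
        if dst.checkpoint is None:
            dst.take_checkpoint()
    else:
        dst.received.append(msg)

# The initiator's request reaches P1 first. P1 checkpoints immediately and
# forwards the request on its FIFO channel ahead of its next message.
p1.take_checkpoint()
channels[("P1", "P0")].append(MARKER)
send(p1, p0, "m")                 # sent after P1's checkpoint

deliver_one(p1, p0)               # P0 sees the request first -> checkpoints
deliver_one(p1, p0)               # m is delivered outside P0's checkpoint

assert "m" not in p1.checkpoint[1] and "m" not in p0.checkpoint[0]
print("consistent: m is excluded from both local checkpoints")
```

With non-FIFO channels (panel c) this ordering guarantee disappears, which is exactly why the late and early messages discussed next have to be handled explicitly.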
System-level vs. application-level:
[Comparison figure]

Open problem - late and early messages:
[Figure: processes P, Q, R advance through epochs 0, 1, 2 delimited by Global Checkpoint 1 and Global Checkpoint 2 after the start of the program; messages are intra-epoch, late, or early]
- Late message: eA < eB.
- Early message: eA > eB.
Here eA is the epoch number of sender A at the point in the application program where the send command is executed, and eB is the epoch number of receiver B when the message is delivered to the application program.

Observations:
- Fault tolerance for multiprocessor systems is in general very hard.
- For high-performance multiprocessor systems, the lower the overhead, the better the solution.

Synergy:
- Synergy is a parallel computing system built on the Stateless Parallel Processing (SPP) principle.
- SPP is based on coarse-grain dataflow processing.
- Synergy uses passive objects for inter-process(or) communication.

Synergy vs. MPI:
[Figure: in Synergy, a master and its workers communicate indirectly through a tuple space; in MPI, processes P0, P1, ..., Pn communicate directly]

MPI (direct communication) fault tolerance:
Message-passing systems complicate rollback-recovery because messages induce inter-process dependencies during failure-free operation. Automated application-level checkpointing of MPI programs cooperates with a special compiler to save and restore state. Example: automated application-level checkpointing with a coordination layer and C3 (the Cornell checkpoint pre-compiler); the approach is non-blocking and application-level.

Synergy (indirect communication) fault tolerance:
An automatic, system-level, checkpoint-based, non-blocking, coordinated rollback-recovery protocol in which the master keeps shadow tuples for its workers.

Tuple space:
[Figure: each tuple space keeps an available list (Tuple1 ... Tuplen), a shadow list (Tuple1 ... Tuplen, each entry tagged with its operation type and worker ID), and a children list (Tuple1 ... Tuplen)]

Worker fault tolerance:
The worker-side bookkeeping maintains an available set, a shadow set, a children set, and a request set, updated on the following events:
- a tuple request when a matching tuple is found on the available set;
- a tuple request when no matching tuple is found on the available set;
- intermediate tuple generation;
- completion of the parent tuple;
- arrival of a new tuple that satisfies a previous request;
- a timeout.

A non-blocking coordination protocol:
- Phase 1: An initiator sends a control message to all participating processors requesting a checkpoint.
- Phase 2: Each master process takes a local checkpoint some time after receiving the request message from the initiator.
- Phase 3: After taking its local checkpoint, each master process records (1) any early messages, (2) every late message it receives, and (3) the result of every non-deterministic event. It then sends a checkpoint-finished message back to the initiator.
- Phase 4: After receiving the checkpoint-finished messages from all processes, the initiator records the global checkpoint to stable storage and sends a stop message to all processes.
- Phase 5: Upon receiving the stop message from the initiator, all processes stop recording.

Recovery:
- The failed master restarts from its most recent checkpoint.
- It sends every other master the signatures of all early messages sent before the failure, so that these sends can be suppressed during recovery.
- It informs all masters about the late messages, which will be replayed.
- After all late and early messages have been handled, the failed master is back on track for further execution.
A sketch of this recording and recovery logic follows below.
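To tie the protocol phases and the recovery steps to the late/early definitions from the open-problem slide, here is a minimal sketch of how a master might classify messages by epoch, record them during phase 3, and hand them to recovery for replay and suppression. The Master class, its methods, and the logging format are illustrative assumptions, not Synergy's actual implementation.

```python
class Master:
    def __init__(self, name):
        self.name = name
        self.epoch = 0           # incremented at each local checkpoint
        self.recording = False   # True between the local checkpoint and "stop"
        self.late_log = []       # late messages: sender epoch < receiver epoch
        self.early_log = []      # signatures of early messages: sender epoch > receiver epoch

    # --- Phases 2 and 3: take the local checkpoint, then record traffic ---
    def local_checkpoint(self):
        self.epoch += 1          # start a new epoch; the state itself is saved elsewhere
        self.recording = True

    def on_send(self, payload, receiver_epoch):
        if self.recording and self.epoch > receiver_epoch:
            self.early_log.append(("sig", payload))   # early: suppress on replay

    def on_deliver(self, payload, sender_epoch):
        if self.recording and sender_epoch < self.epoch:
            self.late_log.append(payload)             # late: replay on recovery

    def stop_recording(self):    # Phase 5, on the initiator's stop message
        self.recording = False

    # --- Recovery: restart from the checkpoint, replay late, suppress early ---
    def recover(self):
        replay = list(self.late_log)                  # messages to re-deliver
        suppress = list(self.early_log)               # sends other masters skip
        return replay, suppress

# Example: a master checkpoints (epoch 0 -> 1), then sees one late and one
# early message before the initiator's stop arrives.
m = Master("M0")
m.local_checkpoint()
m.on_deliver("late-msg", sender_epoch=0)   # sender was still in epoch 0
m.on_send("early-msg", receiver_epoch=0)   # receiver was still in epoch 0
m.stop_recording()
print(m.recover())   # (['late-msg'], [('sig', 'early-msg')])
```

The key point the sketch illustrates is that only the boundary-crossing messages and non-deterministic results need to be logged, which is what keeps the checkpoint size, and hence the overhead, small.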
The Big Picture:
[State diagram: Normal running -> Local checkpoint, upon receiving a checkpoint request from the initiator; Local checkpoint -> Recording, upon finishing the checkpoint before getting all late messages; Recording -> Committing, upon finishing all late messages; Committing -> Confirmation, informing the initiator; back to Normal running upon all processes reporting the finish. On failure: Restore checkpoint -> Restore, upon restoration; Restore -> Normal running, upon finishing the late and early messages]

Conclusions:
- Promising minimal overhead: small checkpoint size.
- Simple schema.
- Fast recovery.

Future work:
- Implementation of system-level checkpointing.
- Implementation of a coordination protocol for multiple dependent stateful processes.
- Performance studies against MPI counterparts.

Bibliography:
- Shi, Y. 2004. Stateless Parallel Processing.
- Schulz, M. 2004. Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs.
- Bronevetsky, G. 2003. Automated Application-Level Checkpointing of MPI Programs.
- Elnozahy, E. 2002. A Survey of Rollback-Recovery Protocols in Message-Passing Systems.
- Kaashoek, M. 1991. Fault Tolerance Using Group Communication.
- Cannon, S. 1994. Adding Fault-Tolerant Transaction Processing to Linda.
- Chandy, K. 1985. Distributed Snapshots: Determining Global States of Distributed Systems.

Thanks.
