
The Paxos Commit Algorithm

Information about the Paxos Commit algorithm

Published on May 25, 2007

Author: paolos84

Source: slideshare.net

Description

This is the presentation I used to give a seminar on the "Paxos Commit" algorithm. It is one of Leslie Lamport's works (in this case, a joint work between him and Jim Gray). You can find the original paper here:
http://research.microsoft.com/users/lamport/pubs/pubs.html#paxos-commit

Feel free to post comments ;)
Enjoy.

Agenda

Paxos Commit Algorithm: Overview

The participating processes

The resource managers

The leader

The acceptors

Paxos Commit Algorithm: the base version

Failure scenarios

Optimizations for Paxos Commit

Performance

Paxos Commit vs. Two-Phase Commit

Using a dynamic set of resource managers

Paxos Commit Algorithm: Overview

Paxos was applied to Transaction Commit by L. Lamport and Jim Gray in "Consensus on Transaction Commit"

One instance of Paxos (consensus algorithm) is executed for each resource manager, in order to agree upon a value (Prepared/Aborted) proposed by it (see the sketch after this list)

“Not-synchronous” commit algorithm

Fault-tolerant (unlike 2PC)

Intended to be used in systems where failures are fail-stop only, for both processes and network

Safety is guaranteed (unlike 3PC)

Formally specified and checked

Can be optimized to achieve the theoretically best performance
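
As a minimal illustration of this structure (names are illustrative, not from the paper): the fate of the transaction is a pure function of the N per-instance decisions, one per resource manager.

from typing import Dict

def transaction_outcome(instance_decisions: Dict[str, str]) -> str:
    """Commit iff every RM's Paxos instance decided 'prepared'."""
    if all(d == 'prepared' for d in instance_decisions.values()):
        return 'commit'
    return 'abort'

# Example: RM2 could not prepare, so the whole transaction aborts.
print(transaction_outcome({'RM1': 'prepared', 'RM2': 'aborted', 'RM3': 'prepared'}))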

Participants: the resource managers

N resource managers (“RM”) execute the distributed transaction, then choose a value (“locally chosen value” or “LCV”; ‘p’ for prepared iff it is willing to commit)

Every RM tries to get its LCV accepted by a majority set of acceptors (“MS”: any subset with a cardinality strictly greater than half of the total).

Each RM is the first proposer in its own instance of Paxos

Participants: the leader

Coordinates the commit algorithm

All the instances of Paxos share the same leader

It is not a single point of failure (unlike 2PC)

Assumed to be always defined (true: many leader-(s)election algorithms exist) and unique (not necessarily true, but unlike 3PC, safety does not rely on it)

Participants: the acceptors

[Figure: three Paxos instances (RM1 proposes 'a', RM2 and RM3 propose 'p') sharing one consensus box of five acceptors (Acc1–Acc5); each acceptor's per-instance entries are merged into the aState table]

A denotes the set of acceptors

All the instances of Paxos share the same set A of acceptors

2F+1 acceptors are involved in order to achieve tolerance to F failures

We will consider only F+1 acceptors, leaving F more for “spare” purposes (less communication overhead)

Each acceptor keeps track of its own progress in an Nx1 vector

These vectors need to be merged into an Nx|MS| table, called aState, in order to take the global decision (we want “many” p’s)
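
A minimal sketch of this bookkeeping, with illustrative names: each acceptor holds an Nx1 vector of (ballot, value) entries, one per RM instance, and the vectors of a majority set of acceptors are merged into the aState table.

from typing import Dict, Tuple

Entry = Tuple[int, str]              # (ballot, value), value is 'p' or 'a'
AcceptorVector = Dict[str, Entry]    # one entry per RM instance

def merge_astate(vectors: Dict[str, AcceptorVector]) -> Dict[str, Dict[str, Entry]]:
    """Merge per-acceptor vectors into aState[rm][acceptor] = (ballot, value)."""
    astate: Dict[str, Dict[str, Entry]] = {}
    for acceptor, vector in vectors.items():
        for rm, entry in vector.items():
            astate.setdefault(rm, {})[acceptor] = entry
    return astate

# Example matching the figure: RM1 proposed 'a', RM2 and RM3 proposed 'p',
# and a majority set of 3 acceptors (out of 2F+1 = 5, F = 2) responded.
vectors = {
    'Acc1': {'RM1': (0, 'a'), 'RM2': (0, 'p'), 'RM3': (0, 'p')},
    'Acc2': {'RM1': (0, 'a'), 'RM2': (0, 'p'), 'RM3': (0, 'p')},
    'Acc3': {'RM1': (0, 'a'), 'RM2': (0, 'p'), 'RM3': (0, 'p')},
}
print(merge_astate(vectors)['RM1'])  # the RM1 row of aState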

Paxos Commit (base)

[Figure: message flow of the base protocol for N=5 resource managers and F=2, with writes to stable storage marked. The initiating RM sends BeginCommit to the leader; the leader sends N-1 prepare messages to the other RMs; each RM sends a phase 2a message with its value v(rm) to the acceptors; the acceptors reply with phase 2b messages; the leader evaluates the global commit condition and sends N phase 3 messages (commit if it holds, abort otherwise). The protocol is not blocked as long as a majority (F+1) of the 2F+1 acceptors respond. The failure points T1 and T2 are discussed below.]

Global Commit Condition

That is: there must be one and only one row for each RM involved in the commitment, and in each of those rows there must be at least F+1 entries that have ‘p’ as a value and refer to the same ballot.
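
The same condition as a small sketch (illustrative names; aState[rm][acceptor] = (ballot, value) as in the sketch above):

from collections import Counter
from typing import Dict, Iterable, Tuple

Entry = Tuple[int, str]

def global_commit(astate: Dict[str, Dict[str, Entry]],
                  rms: Iterable[str], F: int) -> bool:
    """Commit iff, for every RM, at least F+1 acceptors reported 'p'
    (prepared) for one and the same ballot in that RM's instance."""
    for rm in rms:
        row = astate.get(rm, {})
        prepared = Counter(ballot for ballot, value in row.values() if value == 'p')
        if not prepared or max(prepared.values()) < F + 1:
            return False
    return True

# Example with F = 1: two matching 'p' entries per row are enough.
astate = {
    'RM1': {'Acc1': (0, 'p'), 'Acc2': (0, 'p')},
    'RM2': {'Acc1': (0, 'p'), 'Acc2': (0, 'a'), 'Acc3': (0, 'p')},
}
print(global_commit(astate, ['RM1', 'RM2'], F=1))  # True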

[T1] What if some RMs do not submit their LCV?

The leader starts a new ballot b_L1 > 0 of the missing RM's instance with one majority of acceptors:

“prepare?” (p1a) — Leader: «Has resource manager j ever proposed you a value?» (the acceptors promise not to answer any ballot b_L2 < b_L1)

“promise” (p1b) — each acceptor i replies either (1) «Yes, in my last session (ballot) b_i with it I accepted its proposal v_i», or (2) «No, never»

“accept?” (p2a) — if at least |MS| acceptors answered: if case (2) holds for all of them, then V = 'a' [FREE], else V = v_i of the maximum b_i [FORCED]; Leader: «I am j, I propose V»
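
A minimal sketch of the FREE/FORCED rule above (illustrative names; each promise is either None for «No, never» or the (ballot, value) the acceptor last accepted):

from typing import List, Optional, Tuple

def choose_value(promises: List[Optional[Tuple[int, str]]],
                 majority_size: int) -> Optional[str]:
    """Value the leader proposes in phase 2a on behalf of RM j,
    or None if it has not yet heard from a majority of acceptors."""
    if len(promises) < majority_size:
        return None                    # keep waiting for more p1b replies
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return 'a'                     # FREE: nobody ever accepted a value for j
    _, value = max(accepted)           # FORCED: value of the highest ballot b_i
    return value

print(choose_value([None, None, None], majority_size=3))          # 'a' (free)
print(choose_value([None, (1, 'p'), (2, 'p')], majority_size=3))  # 'p' (forced)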

[T2] What if the leader fails?

[Figure: two leaders L1 and L2 are alternately trusted by the majority set; after each timeout T the currently ignored one starts phase 1 again with a higher ballot (b_1 < b_2 < b_3 < b_4), so neither ever completes phase 2]

If the leader fails, some leader-(s)election algorithm is executed. A faulty election (two or more leaders) does not preclude safety (unlike 3PC), but can impede progress…

Non-terminating example: infinite sequence of p1a-p1b-p2a messages from 2 leaders

Not really likely to happen

It can be avoided (e.g., with a random timeout T, as sketched below)
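
The slide only hints at a random timeout; as an illustration (an assumption, not something specified in the paper), a randomized retry delay for a would-be leader could look like this, so that with high probability one leader completes phases 1 and 2 uncontested:

import random

def next_retry_delay(base: float, attempt: int, cap: float = 10.0) -> float:
    """Randomized (exponential) backoff before a leader retries phase 1a."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

for attempt in range(4):
    print(round(next_retry_delay(0.1, attempt), 3))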

Optimizations for Paxos Commit (1)

[Figure: with co-location, each acceptor (AC0–AC2) runs on the same node as an RM and the initial leader runs on the node of the initiating RM, so the BeginCommit message stays local; with the real-time assumption the leader's N-1 prepare messages are not needed anymore]

Co-Location: each acceptor is on the same node as a RM and the initiating RM is on the same node as the initial leader

Saves one message phase (BeginCommit) and F+2 messages

“Real-time assumptions”: RMs can prepare spontaneously. The prepare phase is not needed anymore; RMs just “know” they have to prepare within some amount of time

Saves one message phase (Prepare) and N-1 messages
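
As a small illustration of the two optimizations just listed (phase names follow these slides; this is a sketch, not code from the paper), they can be read as simply deleting message phases from the base protocol's flow:

BASE_PHASES = ['BeginCommit', 'Prepare', 'Phase2a', 'Phase2b', 'Phase3']

def with_colocation(phases):
    # BeginCommit becomes a local call when the initiating RM hosts the leader
    return [p for p in phases if p != 'BeginCommit']

def with_realtime_assumption(phases):
    # RMs prepare spontaneously, so the leader's Prepare messages disappear
    return [p for p in phases if p != 'Prepare']

print(with_realtime_assumption(with_colocation(BASE_PHASES)))
# ['Phase2a', 'Phase2b', 'Phase3'] -- three message phases left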

Optimizations for Paxos Commit (2)

[Figure: with phase 3 elimination, the acceptors send their phase 2b messages (the columns of aState) directly to every RM, so the leader's phase 3 messages are no longer needed]

Phase 3 elimination: the acceptors send their phase 2b messages (the columns of aState) directly to the RMs, which evaluate the global commit condition themselves

Paxos Commit + Phase 3 Elimination = Faster Paxos Commit (FPC)

FPC + Co-location + R.T.A. = Optimal Consensus Algorithm

Performance

                                 2PC                    Paxos Commit               Faster Paxos Commit
                                 Coloc.     No coloc.   Coloc.       No coloc.     Coloc.         No coloc.
Message delays*                  3          4           4            5             3              4
Messages*                        3N-3       3N-1        NF+3N-3      NF+F+3N-1     2FN-2F+3N-3    2NF+3N-1
Stable storage write delays**    2                      2                          2
Stable storage writes**          N+1                    N+F+1                      N+F+1

* Not assuming RMs' concurrent preparation (slides-like scenario)
** Assuming RMs' concurrent preparation (real-time constraints needed)

If we deploy only one acceptor for Paxos Commit (F=0), its fault tolerance and cost are the same as 2PC’s. Are they exactly the same protocol in that case?
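
A quick check of the cost half of this question, using the message-count formulas from the table above (function names are illustrative): with F = 0 the Paxos Commit counts collapse to the 2PC counts, and the asserts below verify it.

def messages_2pc(N, coloc=False):
    return 3 * N - 3 if coloc else 3 * N - 1

def messages_paxos_commit(N, F, coloc=False):
    return N * F + 3 * N - 3 if coloc else N * F + F + 3 * N - 1

for N in (3, 5, 10):
    assert messages_paxos_commit(N, F=0) == messages_2pc(N)
    assert messages_paxos_commit(N, F=0, coloc=True) == messages_2pc(N, coloc=True)
    print(N, messages_2pc(N), messages_paxos_commit(N, F=1), messages_paxos_commit(N, F=2))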

Paxos Commit vs. Two-Phase Commit

[Figure: side-by-side message diagrams of 2PC as described in Lamport and Gray's paper and 2PC as presented in the course slides, between the TM, RM1 and the other RMs, with the failure points T1 and T2 marked]

Yes, but…

… two slightly different versions of 2PC!

Using a dynamic set of RMs

[Figure: the registrar (REG) receives join messages from RM1, RM2 and RM3 and gets the set RM1;RM2;RM3 decided in its own instance of Paxos, run over the same majority set of acceptors as the RMs' instances]

You add one process, the registrar, that acts just like another resource manager, except for the following:

RMs can join the transaction until the Commit Protocol begins

The global commit condition is now evaluated over the set of resource managers proposed by the registrar and decided in its own instance of Paxos (see the sketch below).
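
A minimal sketch of how the condition changes (illustrative names, same aState shape as in the earlier sketches): the rows to check are no longer fixed in advance but are exactly the set of RMs decided in the registrar's instance.

from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Entry = Tuple[int, str]

def dynamic_global_commit(astate: Dict[str, Dict[str, Entry]],
                          registrar_decision: Optional[Iterable[str]],
                          F: int) -> bool:
    """Commit iff the registrar's instance decided a set of RMs and the
    global commit condition holds for exactly that set."""
    if registrar_decision is None:     # the registrar's instance aborted
        return False
    for rm in registrar_decision:      # exactly the RMs that joined in time
        counts = Counter(b for b, v in astate.get(rm, {}).values() if v == 'p')
        if not counts or max(counts.values()) < F + 1:
            return False
    return True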

Thank You! Questions?
