Fail over fail_back

50 %
50 %
Information about Fail over fail_back

Published on April 4, 2014

Author: PGExperts



Talk from pgDay NYC about how to plan and manage failover in a PostgreSQL replication system.

Fail Fail over back Josh Berkus PostgreSQL Experts Inc. NYC PgDay 2014

mozilla logo is a trademark of the Mozilla corporation. Used here under fair use.

2 servers 1 command

admin executes failover connect to master? no what error? shutdown master no response other erroryes fail to shutdown BRINGUP standby success standby is standing by? yes no

Automated Failover

image from used under creative commons share-alike

Fail over


1. Minimize Downtime

2. Minimize data loss

3. Don't make it worse! ?

Planned vs. Emergency

Failover once a quarter ● Postgres updates ● Kernel updates ● Disaster Recovery drills ● Just for fun!

Automated or Not? ● < 1hr ● false failover ● testing testing testing ● complex SW ● >= 1hr ● 2am call ● training ● simple script

sysadmin > software

failover in 3 parts (1) Detecting Failure(2) Failing Over DB (3) Failing Over App

1. Detecting Failure

can't connect to master could not connect to server: Connection refused Is the server running on host "" and accepting TCP/IP connections on port 5432?

can't connect to master ● down? ● too busy? ● network problem? ● configuration error?

can't connect to master ● down? › failover ● too busy? › don't fail over

pg_isready pg_isready -h -p 6433 -t 15 - accepting connections

pg_isready 0 == running and accepting connections (even if too busy) 1 == running but rejecting connections (security settings) 2 == not responding (down?)

more checks can ssh? master is down; failover no postgres processes on master? yes exit with error yes attempt restart no master is OK; no failover succeed fail

check replica pg_isready? OK to failover yes exit with error no is replica?yes no

some rules ● don't just ping 5432 ● misconfiguration > downtime ● tradeoff: › confidence › time to failover

failover time master poll fail: ssh master: attempt restart: verify replica: failover: 1 – 10 1 – 10 3 – 15 1 – 5 3 – 20 9 – 60

AppServer One AppServer Two PARTITION

AppServer One AppServer Two PARTITION

AppServer One AppServer Two Broker

AppServer One AppServer Two Proxy

Failing Over the DB

Failing Over the DB 1. choose a replica target 2. shutdown the master 3. promote the replica 4. verify the replica 5. remaster other replicas

Choosing a replica A. One replica B. Designated replica C. Furthest ahead replica

One Replica fail over to it or don't well, that's easy

Designated Replica ● load-free replica, or ● cascade master, or ● syncronous replica

“Furthest Ahead” ● Pool of replicas ● Least data loss ● Least downtime ● Other replicas can remaster … but what's “furthest ahead”?

receive vs. replay ● receive == data it has ● replay == data it applied

receive vs. replay ● receive == data it has › “furthest ahead” ● replay == data it applied › “most caught up”

receive vs. replay “get the furthest ahead, but not more than 2 hours behind on replay”

receive vs. replay “get the furthest ahead, but not more than 1GB behind on replay”

timestamp? pg_last_xact_replay_timestamp() ● last transaction commit ● not last data ● same timestamp, different receive positions

Position? pg_xlog_location_diff() ● compare two XLOG locations ● byte position ● comparable granularly

Position? select pg_xlog_location_diff( pg_current_xlog_location(), '0/0000000'); --------------- 701505732608

Position? ● rep1: 701505732608 ● rep2: 701505737072 ● rep3: 701312124416

Replay? ● more replay == slower promotion ● figure out max. acceptable ● “sacrifice” the delayed replica

Replay? SELECT pg_xlog_location_diff( pg_last_xlog_receive_location(), pg_last_xlog_replay_location() ); --------------- 1232132

Replay? SELECT pg_xlog_location_diff( pg_last_xlog_receive_location(), pg_last_xlog_replay_location() ); --------------- 4294967296

master shutdown ● STONITH ● make sure master can't restart ● or can't be reached

Terminate or Isolate

promotion pg_ctl promote ● make sure it worked ● may have to wait › how long?


remastering pre-9.3 ● all replicas are set to: recovery_target_timeline = 'latest' ● change primary_conninfo to new master ● all must pull from common archive ● restart replicas

remastering pre-9.3

remastering pre-9.3

remastering pre-9.3

remastering pre-9.3

remastering post-9.3

remastering post-9.3

remastering post-9.3

remastering post-9.3 ● all replicas are set to: recovery_target_timeline = 'latest' ● change primary_conninfo to new master ● restart replicas

restart problem ● must restart to remaster › not likely to change soon ● break connections vs. fall behind

3. Application Failover

3. Application Failover ● old master → new master for read-write ● old replicas → new replicas for load balancing ● fast: prevent split-brain

CMS method 1. update Configuration Management System 2. push change to all application servers

CMS method ● slow ● asynchronous ● hard to confirm 100% complete ● network split?

zookeeper method 1. write new connection config to zookeeper 2. application servers pull connection info from zookeeper

zookeeper method ● asynchronous › or poor response time ● delay to verify ● network split?

Pacemaker method 1. master has virtual IP 2. applications connect to VIP 3. Pacemaker reassigns VIP on fail

Pacemaker advantages ● 2-node solution (mostly) ● synchronous ● fast ● absolute isolation

Pacemaker drawbacks ● really hard to configure ● poor integration with load- balancing ● automated failure detection too simple › can't be disabled

proxy method 1. application servers connect to db via proxies 2. change proxy config 3. restart/reload proxies

AppServer One AppServer Two Proxy

AppServer One AppServer Two Proxy

proxies ● pgBouncer ● pgPool ● HAProxy ● Zeus, BigIP, Cisco ● FEMEBE


what? ● after failover, make the old master the master again

why? ● old master is better machine? ● some server locations hardcoded? ● doing maintenance on both servers?

why not? ● bad infrastructure design? ● takes a while? ● need to verify old master? ● just spin up a new instance?



rsync ● reduce time/data for old master recopy ● doesn't work as well as you'd expect › hint bits

pg_rewind ++ ● use XLOG + data files for rsync ● super fast master resync

pg_rewind -- ● not yet stable ● need to have all XLOGs › doesn't yet support archives ● need checksums › or 9.4's wal_log_hints

Automated Failback Fork it on Github!

Questions? ● › fork it! ● Josh Berkus: › PGX: › Blog: Copyright 2014 PostgreSQL Experts Inc. Released under the Creative Commons Share-Alike 3.0 License. All images, logos and trademarks are the property of their respective owners and are used under principles of fair use unless otherwise noted.

Add a comment

Related presentations

Online rank tracker!

Online rank tracker!

October 20, 2014

Rank tracker is a tool that helps everyone to find the traffic which a particular ...

Facebook y sus avances

Facebook y sus avances

November 11, 2014

facebook y sus avances

Preguntas Investigación

Preguntas Investigación

November 11, 2014

es una presentación de preguntas interesantes

Cartes interactives, frises chronologiques, diaporamas, carte Google Streetview in...

Related pages

fail-over, fail-back - Deutsch-Übersetzung – Linguee ...

Viele übersetzte Beispielsätze mit "fail-over, fail-back" – Deutsch-Englisch Wörterbuch und Suchmaschine für Millionen von Deutsch-Übersetzungen.
Read more

Failing over and failing back a volume - Dell

Failing over and failing back a ... you can fail over to the secondary group and allow users to access the ... you can fail back to the primary group.
Read more

Failover / Failback Policies on Microsoft Cluster Server

Microsoft Cluster Server ... The failoverthreshold is the number of times the group can fail over within the numberof hours specified by the failover period.
Read more

Failover - Wikipedia, the free encyclopedia

Failover and switchover are essentially the same operation, ... Some systems, intentionally, do not fail over entirely automatically, ...
Read more

Fail back VMware virtual machines and physical servers to ...

This articles describes how to fail back Azure virtual machines from Azure to the on-premises site. Follow the instructions in this article when ...
Read more

How to perform a Fail Over / Fail Over, an alternative to ...

Fail Over / Fail Over Method This method can be used as an alternative to the Fail Over / Fail Back and Replicat 156009
Read more

Fail-over/Fail-back - MidoNet Operations Guide - 5.2-rev4

In combination with the Spanning Tree Protocol (STP) enabled on the physical bridges, MidoNet VABs are able to provide fail-over capabilities by forwarding ...
Read more

DFSR for High availability - fail over/fail back

Background 5 offices, 1 domain and full mesh VPN connetivity. 90 users total. DC in each site. Each office has a directory which will be ...
Read more

Failover in Site Recovery | Microsoft Azure

Failover in Site Recovery. ... Fail over to a secondary VMM site—create a ... Follow these procedures to fail back to the original primary ...
Read more

What is failover? - Definition from - SearchStorage

Failover is a backup operational ... paths fail. The capacity for automatic failover means that ... Data protection has evolved at such a rapid pace over ...
Read more