Linux HA anno 2014


Published on April 5, 2014

Author: roidelapluie

Source: slideshare.net

Description

Conference given at LOADays.org 2014

Linux HA anno 2014
Julien Pivotto
LOADays, Antwerp
April 4th, 2014

whoami
• sysadmin @ inuits
• open-source defender for 7+ years
• devops believer
• @roidelapluie on Twitter/GitHub

Introduction

What is HA
• High Availability
• One service fails ⇒ another takes over its job
• Transparent for the end user

Where HA will NOT help
• It is not about scalability
• It will not fix your application
• It will not make your application stable
• It is not a one-size-fits-all solution
• It is not about performance
• It is not backup

Why care about HA?
• Service goes down at 5pm on Friday?
• Downtime makes users unhappy
• Downtime costs money

What will not work
• Virtualization will not make your app HA
• VM mirroring is not HA
• Live migrations are not HA
• Containers are not HA
• Cloud lol

HA is about services

Start on a good basis
• Automation
• Monitoring
• CI / CD
• Testing
• ... Then, start working on HA

Eliminate the SPOF
• Single Points of Failure
• Hardware fails
• Disks always fail
• etc.
• Replicate...

Split-Brain
• Nodes can't talk to each other
• Each one thinks it is alone
• Each one takes decisions and leadership
• Data inconsistency

Fencing
• Shoot the other node in the head
• Be sure a node is dead
• Preserve integrity of the data
• Combine with quorums
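The "combine with quorums" point can be sketched as a simple majority rule. This is only an illustration of the idea, not how Corosync or Pacemaker implement it; the function name `has_quorum` and the vote counts are assumptions made for the example.

```python
def has_quorum(votes_seen: int, expected_votes: int) -> bool:
    """Return True when this partition holds a strict majority of the votes."""
    if expected_votes <= 0:
        raise ValueError("expected_votes must be positive")
    return votes_seen > expected_votes // 2


# A 3-node cluster split into partitions of 2 nodes and 1 node:
# only the 2-node side holds a majority.
majority_side = has_quorum(votes_seen=2, expected_votes=3)
minority_side = has_quorum(votes_seen=1, expected_votes=3)

# A 2-node cluster split 1/1: neither side holds a majority,
# so quorum alone cannot decide and fencing has to.
two_node_split = has_quorum(votes_seen=1, expected_votes=2)
```

The partition without quorum should stop touching shared data, and fencing makes sure it really has stopped.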

Monitoring
• Checking only that a PID is running is useless
• Result-based monitoring
• Extract data out of the service
• E.g. try to insert a row in the DB
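A result-based check along the lines of "try to insert in DB" might look like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the real service; the table name `ha_probe` and the function name `check_database` are hypothetical.

```python
import sqlite3
import time


def check_database(conn: sqlite3.Connection) -> bool:
    """Result-based check: write a row, read it back, then clean up."""
    token = str(time.time_ns())
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS ha_probe (token TEXT)")
        conn.execute("INSERT INTO ha_probe (token) VALUES (?)", (token,))
        row = conn.execute(
            "SELECT token FROM ha_probe WHERE token = ?", (token,)
        ).fetchone()
        conn.execute("DELETE FROM ha_probe WHERE token = ?", (token,))
        conn.commit()
    except sqlite3.Error:
        # Any database error means the service is not doing its job.
        return False
    return row is not None and row[0] == token


# Stand-in database for the example; a real check would connect to the
# monitored service instead.
conn = sqlite3.connect(":memory:")
healthy = check_database(conn)
conn.close()
```

Unlike a PID check, this fails when the process is alive but cannot actually serve writes.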

Cluster?
• Active/active: everything is active
• Active/passive: nodes in standby
• N+1: One node waiting in standby
• N+M: Nodes waiting in standby
• Can mix them, etc.

http://clusterlabs.org/wiki (GFDL 1.2 licence)


KISS KISS KISS

Fix your app

The State
• Stateless application
• Everything in DB
• Avoid temp files
• Disaster recovery

The right tools
• Make relevant choices for your app
• Look for HA in databases
• Look for HA in queuing systems
• Look for HA in filesystems?
• Master/Master vs Master/Slave

The configuration
• Same config everywhere
• Use Puppet, Chef, ...
• Config in one place
• KISS

Pacemaker

Pacemaker
• It is the brain
• Decides what to do, when
• Gets information from resources
• Depends on a messaging layer and a cluster manager
• Does not require shared storage

Decisions
• A node fails, now what
• A service fails, now what
• Restart? Move?
• Needs to be quick and without intervention
• Scores, policies
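The role of scores in such a decision can be sketched as follows. This is a deliberately simplified model: the real policy engine combines many more inputs (constraints, stickiness, failure counts). The function `pick_node` and the nodes `apache-2` and `apache-3` are hypothetical.

```python
from typing import Dict, Optional, Set

INFINITY = 1_000_000  # value Pacemaker uses for an INFINITY score


def pick_node(scores: Dict[str, int], online: Set[str]) -> Optional[str]:
    """Pick the online node with the highest score, or None if none is allowed."""
    candidates = {
        node: score
        for node, score in scores.items()
        if node in online and score > -INFINITY
    }
    if not candidates:
        return None
    # Sort names first so that ties are broken deterministically.
    return max(sorted(candidates), key=lambda node: candidates[node])


scores = {"apache-1": 50, "apache-2": 0, "apache-3": -INFINITY}

# All nodes online: the preferred node wins.
normal = pick_node(scores, {"apache-1", "apache-2", "apache-3"})
# The preferred node fails: the resource moves to the next best node.
after_failure = pick_node(scores, {"apache-2", "apache-3"})
# Only a forbidden node is left: the resource stays stopped.
nowhere = pick_node(scores, {"apache-3"})
```

A score of -INFINITY acts as a ban, so the resource stays stopped rather than running on a forbidden node.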

CIB
• Cluster Information Base
• XML shared across the cluster
• Updated using "pcs"
• Contains knowledge about the cluster
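To give an idea of what "XML shared across the cluster" means, the sketch below parses a small, hand-written and heavily simplified CIB-like fragment with Python's standard library. The fragment, its `id` values and the helper `list_primitives` are assumptions for illustration; a real CIB contains far more (operations, constraints, status).

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment; not the output of a real cluster.
CIB_FRAGMENT = """
<cib>
  <configuration>
    <resources>
      <primitive id="ClusterIP" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ClusterIP-params">
          <nvpair id="ClusterIP-ip" name="ip" value="192.168.122.101"/>
          <nvpair id="ClusterIP-mask" name="cidr_netmask" value="32"/>
        </instance_attributes>
      </primitive>
    </resources>
  </configuration>
</cib>
"""


def list_primitives(cib_xml: str) -> dict:
    """Map each primitive id to its agent name and instance parameters."""
    root = ET.fromstring(cib_xml.strip())
    result = {}
    for prim in root.iter("primitive"):
        agent = ":".join(prim.get(key) for key in ("class", "provider", "type"))
        params = {
            nv.get("name"): nv.get("value")
            for nv in prim.findall("instance_attributes/nvpair")
        }
        result[prim.get("id")] = (agent, params)
    return result


primitives = list_primitives(CIB_FRAGMENT)
```

In practice the CIB is edited through tools such as pcs rather than by hand; the point here is only that the configuration is structured data every node can read.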

Primitives
• Service, IP address, mountpoint, ...
• Building blocks of a cluster
• Take a lot of parameters

    primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" \
        op monitor interval="30s"

Resource Agent
• Script
• How to start
• How to stop
• How to change state (promote, demote)
• How to monitor (real monitoring)
• An init script, but way better
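The shape of a resource agent can be sketched as a dispatcher over actions that returns OCF exit codes. Real agents are usually shell scripts and their monitor action should test the service's actual result; here a state file stands in for the service, and the file name and function names are hypothetical.

```python
import os
import tempfile

# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical state file standing in for a real service.
STATE_FILE = os.path.join(
    tempfile.gettempdir(), "demo-agent-%d.state" % os.getpid()
)


def monitor() -> int:
    """Report whether the (fake) service is running."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start() -> int:
    """Start the service; starting an already running service succeeds."""
    if monitor() == OCF_SUCCESS:
        return OCF_SUCCESS
    with open(STATE_FILE, "w") as handle:
        handle.write("running\n")
    return monitor()


def stop() -> int:
    """Stop the service; stopping an already stopped service succeeds."""
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def run_action(name: str) -> int:
    """Dispatch an action name to its handler, as the cluster would call it."""
    handler = ACTIONS.get(name)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

Idempotent start and stop matter because the cluster may repeat an action when it is unsure of the resource's state.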

Clones
• Same resource running on multiple hosts
• Define minimum and maximum of running primitives
• Possible to run multiple on the same node

    clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"

Master Slave (ms)
• Set of primitives with a role
• Masters and slaves (e.g. MySQL, LDAP)
• Can promote slaves to master
• Can demote masters to slave
• Multiple slaves / masters

    ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1"

Group
• Group of primitives of different kinds
• Implies colocation
• Starts in a fixed order
• Stops in the opposite order
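The ordering behaviour of a group can be sketched like this. It is a toy model, not Pacemaker code; the helper names and the choice of group members are assumptions for the example.

```python
from typing import Callable, List, Sequence


def start_group(members: Sequence[str], start: Callable[[str], bool]) -> List[str]:
    """Start members in order and stop at the first failure.

    Returns the members that were actually started.
    """
    started: List[str] = []
    for member in members:
        if not start(member):
            break
        started.append(member)
    return started


def stop_group(started: Sequence[str], stop: Callable[[str], None]) -> List[str]:
    """Stop members in the reverse of their start order."""
    stopped: List[str] = []
    for member in reversed(started):
        stop(member)
        stopped.append(member)
    return stopped


log: List[str] = []
group = ["ClusterIP", "WebFS", "WebSite"]  # hypothetical group members

started = start_group(group, lambda name: log.append("start " + name) is None)
stopped = stop_group(started, lambda name: log.append("stop " + name))
```

Here `log` ends up listing the three starts in group order followed by the three stops in reverse order.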

Colocation
• Constraint
• Must run on the same hosts
• Has a score
• Order matters
• e.g. VIP with service

    colocation website-with-ip inf: WebSite ClusterIP

Location
• Set preferred location
• Has a score

    location prefer-apache-1 WebSite 50: apache-1
    ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1"

Order
• What starts after what
• Even across nodes
• Has a score

    order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start

Properties
• expected-quorum-votes
• stonith-enabled

    property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdde4d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"

Maintenance
• Manually move resources
• Set a DO-NOT-MANAGE flag
• Do not forget to revert

CMAN and Corosync

CMAN
• Manages membership and quorum
• Notifies Pacemaker when something changes
• Starts and manages Corosync
• Needs a cluster.conf that contains all the nodes
• Managed via ccs
• Will propagate the changes

Corosync
• Messaging layer
• Controlled via CMAN
• Next version will take over from CMAN

Environment

Distributions
• Developed mainly by Red Hat and SUSE
• Used with OpenStack too
• Converging on a single stack
• Available in * distros

crmsh vs pcs
• crmsh used to be the more widely used tool
• Disappeared in CentOS 6.4
• Getting used to pcs
• One goal: modify the CIB
• pcs is young and not widely used

A PCS primer

Create a resource

    pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
        ip=192.168.0.120 cidr_netmask=32 \
        op monitor interval=30s

Create constraints

    pcs constraint colocation add WebFS WebDataClone INFINITY with-rsc-role=Master
    pcs constraint order promote WebDataClone then start WebFS

Standby a host

    pcs cluster standby node1

Check the status of the cluster

    pcs status

    Last updated: Fri Sep 14 12:41:12 2012
    Last change: Fri Sep 14 12:41:08 2012 via crm_attribute on pcmk-1
    Stack: corosync
    Current DC: pcmk-1 (1) - partition with quorum
    Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
    2 Nodes configured, unknown expected votes
    5 Resources configured.

    Node pcmk-1 (1): standby
    Online: [ pcmk-2 ]

    Full list of resources:

     ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
     WebSite (ocf::heartbeat:apache): Started pcmk-2
     Master/Slave Set: WebDataClone [WebData]
         Masters: [ pcmk-2 ]
         Stopped: [ WebData:1 ]
     WebFS (ocf::heartbeat:Filesystem): Started pcmk-2

Percona Replication Manager

Percona Replication Manager
• MySQL replication with Pacemaker
• Complete documentation
• Resource agents
• Supports multi-slave setups
• Good documentation

MySQL Resource Agent
• Keeps track of a score for each slave
• In case of failure, will switch to the "best scored" slave
• Can be reused in your cluster
• https://github.com/percona/percona-pacemaker-agents

Puppet

puppetlabs-corosync
• "Reference" module
• Creates resources, constraints, ...
• crmsh provider
• https://github.com/clusterlabs/puppetlabs-corosync
• https://github.com/roidelapluie/puppetlabs-corosync (pcs)

Old Puppet code

    exec { 'loadcrmconfig':
      refreshonly => true,
      command     => '/bin/sleep 5; /usr/sbin/crm configure load replace /etc/mediasalsa/cib.cfg >>/tmp/empty 2>&1',
      subscribe   => File['/etc/project/cib.cfg'],
    }

New Puppet code

    include cman
    include pacemaker

    cman_cluster { 'mycluster':
      ensure  => 'present',
      logging => 'to_logfile="off"',
      require => Class['cman'],
    }

    cman_clusternode { $clusternodes:
      ensure  => 'present',
      require => Cman_cluster['mycluster'];
    }

New Puppet code

    class { 'cman::service':
      ensure  => 'running',
      enable  => true,
      require => Cman_clusternode[$clusternodes],
    }

    class { 'pacemaker::service':
      ensure  => 'running',
      enable  => true,
      require => Class['cman::service'],
    }

New Puppet code

    Class['cman::service', 'pacemaker::service'] ->
    Cs_property<||> ->
    Cs_primitive<||>

    cs_property { 'stonith-enabled':
      ensure => present,
      value  => 'false',
    }

    cs_property { 'no-quorum-policy':
      ensure => present,
      value  => 'ignore',
    }

New Puppet code

    cs_primitive { "mysql-${name}":
      provider        => 'pcs',
      primitive_class => 'ocf',
      primitive_type  => 'mysql',
      provided_by     => 'inuits',
      parameters      => {
        'test_user'   => $test_user,
        'test_passwd' => $test_passwd,
        'test_table'  => $test_table,
      },
      operations      => {
        'monitor' => {
          'interval' => '10s',
        },
      },
      require         => Cs_property['stonith-enabled'],
    }

Conclusions

Be clever
• KISS
• Automate
• Monitor
• Be realistic

Do not promise the impossible
• WONTFIX your app
• Working together (devops)
• Not about scale
• Not about stability
• Do not talk in nines

Linux HA
• Reliable
• Pacemaker, Corosync, CMAN
• pcs, crmsh, ccs
• A lot of reading
• A lot of experience to build

RTFM
• http://clusterlabs.org
• Clusters From Scratch
• Pacemaker Explained
• http://blog.clusterlabs.org
• Old: http://linux-ha.org

Thank you
Any questions?

Contact
Julien Pivotto
julien@inuits.eu
@roidelapluie
INUITS bvba
Belgium
+32 473 441 636
https://inuits.eu
