Spil Games @ FOSDEM: Galera Replicator IRL

Published on January 30, 2014

Author: spil-engineering

Source: slideshare.net

Galera Replicator IRL
Art van Scheppingen, Head of Database Engineering

Overview
1. Who are we?
2. What is Galera?
3. What is Spil Games using Galera for?
4. What have we learned?
5. Future technologies
6. Conclusion

Who are we? Who is Spil Games?

Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 60M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)

Geographic Reach
[Map: 180 Million Monthly Active Users (*)]
(*) Source: Google Analytics, August 2012

What is Galera? How to get Highly Available and beyond

What is Galera?
1. Replication plugin for MySQL by Codership
  • Synchronous (parallel) replication
  • Supports InnoDB
  • MyISAM “works”
  • Committing transactions actually replicates data
2. Allows clustering of nodes
  • Minimum of 3 nodes for HA
  • Galera Arbitrator allows 2 nodes
  • One node elected as Primary Component
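The "minimum of 3 nodes" and "arbitrator allows 2 nodes" points follow from majority voting. A minimal sketch, assuming every voter (data node or arbitrator) counts equally and quorum means a strict majority; the function names are hypothetical, and real Galera quorum handling has more to it (for example weighted votes):

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """Return True when strictly more than half of the voters are reachable."""
    if total_voters <= 0 or not 0 <= reachable_voters <= total_voters:
        raise ValueError("invalid voter counts")
    return 2 * reachable_voters > total_voters


def tolerated_failures(total_voters: int) -> int:
    """Largest number of voters that can fail while the rest keep a majority."""
    if total_voters <= 0:
        raise ValueError("total_voters must be positive")
    return (total_voters - 1) // 2


if __name__ == "__main__":
    # Two data nodes alone cannot lose a node; adding an arbitrator makes 3 voters.
    for voters in (2, 3, 5):
        print(voters, "voters tolerate", tolerated_failures(voters), "failure(s)")
```

With two voters a single failure leaves one of two, which is not a majority; a third voter (a data node or an arbitrator) is what lets the cluster survive one failure.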

How does Galera work?
[Diagram: Server-1, Server-2, and Server-n connect/read/write to any node; the MySQL nodes form a Galera cluster with synchronous replication.]

Galera replication
[Diagram: Server-n issues a commit on one MySQL node; Galera replication carries the transaction to the other MySQL nodes; the client receives OK; the transaction is applied to the slaves.]

High Availability (1)
[Diagram: Server-1, Server-2, and Server-n, each paired with a MySQL node; the MySQL nodes are joined by Galera.]

High Availability (2)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer / query router to the MySQL nodes in Galera.]

High Availability (3)
[Diagram: Server-1, Server-2, and Server-n with three load balancers in front of the MySQL nodes in Galera.]

High Availability (4)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer (port 3306 read, 3307 write); one MySQL node is read+write, the other two are read only; all nodes are in Galera.]
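The port split on this slide can be sketched as a small routing rule. This is an illustration only, not the load balancer configuration used at Spil Games: the `PortRouter` class and the node names are hypothetical, the port numbers are the ones on this slide, and reads are assumed to be spread round-robin over all nodes.

```python
import itertools


class PortRouter:
    """Pick a backend node based on the port the client connected to."""

    def __init__(self, nodes, writer, read_port=3306, write_port=3307):
        if writer not in nodes:
            raise ValueError("writer must be one of the nodes")
        self.writer = writer
        self.read_port = read_port
        self.write_port = write_port
        # Reads rotate over every node, including the writer.
        self._readers = itertools.cycle(list(nodes))

    def backend_for(self, port):
        if port == self.write_port:
            # All writes land on a single node.
            return self.writer
        if port == self.read_port:
            return next(self._readers)
        raise ValueError(f"no backend pool for port {port}")


if __name__ == "__main__":
    router = PortRouter(["db1", "db2", "db3"], writer="db1")
    print([router.backend_for(3306) for _ in range(4)])
    print(router.backend_for(3307))
```

Sending every write to one node keeps conflicting writes off different nodes while the read-only nodes still share the read load.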

Node joining: SST (State Snapshot Transfer)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer. A new node requests to join the cluster and receives an SST; the cluster drains a node, so read/write goes to two nodes; the MySQL nodes in Galera replicate synchronously.]

Node joining: IST (Incremental State Transfer)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer. An existing node requests to join the cluster and receives an IST; the MySQL nodes in Galera replicate synchronously.]

Galera replication over WAN
[Diagram: Server-n commits to a MySQL node in DC1; Galera replication runs between the MySQL nodes in DC1 and DC2; the commit is delayed by the RTT.]
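"Commit is delayed by RTT" puts a ceiling on sequential commits. A rough back-of-the-envelope sketch, assuming each commit waits for exactly one round trip between the data centers and ignoring all local processing time; the function name is hypothetical:

```python
def max_sequential_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on commits per second for one connection committing one
    transaction at a time, when every commit waits one round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


if __name__ == "__main__":
    for rtt in (1, 20, 100):
        print(f"RTT {rtt} ms -> at most "
              f"{max_sequential_commits_per_second(rtt):.0f} commits/s per connection")
```

The bound applies per connection; many connections committing in parallel can still reach a higher total throughput.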

WAN replication Galera 2.x
[Diagram: Node 1 to Node 6 spread over DC1 and DC2.]

WAN replication Galera 3.x
[Diagram: Node 1 to Node 6 spread over DC1 and DC2.]

What are we using Galera for? Synchronous replication for the masses

Our systems
1. Legacy services databases
  • MySQL Master-Master
2. SSP (Spil Storage Platform)
  • MySQL Master-Master (to be phased out)
  • Galera
3. ROAR (Read Often, Alter Rarely)
  • Galera

Master-Master setup used at Spil Games
[Diagram: Server-1, Server-2, and Server-n connect to two MySQL masters managed by MMM, with asynchronous replication between them. The active master is shown with read+write on db-something (192.168.1.1) and read only on db-something-r1 (192.168.1.2); the inactive master is shown with db-something-r2 (192.168.1.3).]

Master-Master setup used at Spil Games (with slaves)
[Diagram: MMM manages an active master, an inactive master, and two slaves. Labels: read+write, read only. Addresses shown: db-something (192.168.1.1), db-something-r1 (192.168.1.2), db-something-r2 (192.168.1.3), db-something-r3 (192.168.1.4), db-something-r4 (192.168.1.5).]

Migrating legacy dbs to Galera (lab)
[Diagram: the inactive masters legacy1, legacy2, and legacy3 feed a Galera cluster of MySQL nodes. Steps shown: clone database (innobackupex), feed database dump (mysqldump), start slaving.]

Scaling Galera (1)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer (port 3306 write+read) to MySQL nodes in Galera.]

Scaling Galera (2)
[Diagram: Server-1, Server-2, and Server-n go through a load balancer (port 3306 write, 3307 read). MySQL nodes in Galera feed read-only MySQL nodes through asynchronous replication.]

Why consolidate legacy systems?
1. Around 20 legacy database clusters
  • 50 servers in total
2. Maintenance
  • Master-Master requires a lot of (manual) maintenance
3. Replacement is needed
  • 35 of them will be older than 3 years in 2014
4. Current state: tested in lab

SSP (Spil Storage Platform)
• Storage API between application and databases
• All data is sharded
  • User
  • Function
  • Location
• Every cluster (two masters) will contain two shards
  • Data written interleaved
  • HA for both shards
  • Both masters active and “warmed up”
[Diagram: SSP in front of Shard 1 and Shard 2.]

SSP Master-Master setup
[Diagram: Server-1, Server-2, and Server-n connect to two active MySQL masters managed by MMM, both read+write: db-ssp001 (192.168.2.1) and db-ssp002 (192.168.2.2), with asynchronous replication between them.]

SSP Master-Master setup (one master broken)
[Diagram: MMM with one broken master and one active master. The active master is shown with read+write and both addresses, db-ssp002 (192.168.2.2) and db-ssp001 (192.168.2.1).]

SSP Galera setup
[Diagram: Server-1, Server-2, and Server-n go through a load balancer and read/write to any node; the MySQL nodes in Galera replicate synchronously.]

Current state of the SSP
1. Total of 4 old style SSP shard nodes (2 clusters)
2. Total of 6 Galera SSP shard nodes (2 clusters)
3. Add Galera nodes/clusters when necessary

What have we learned so far? Pitfalls, hurdles, etc

Creating backups
1. Two ways to make backups:
  • Issue SST
    • Either mysqldump or Innobackupex
  • Regular Innobackupex
    • --galera_info
    • set global wsrep_desync=on to remove node

Backup SST
[Diagram: Server-1, Server-2, and Server-n go through a load balancer. A backup receiver requests an SST; the cluster drains a node, so read/write goes to two nodes until the backup receiver is done; the MySQL nodes in Galera replicate synchronously.]

Backup Innobackupex
[Diagram: Server-1, Server-2, and Server-n go through a load balancer and read/write to two nodes. The third node is set to wsrep_desync=ON, streams its backup to BackupPC, and is then set back to wsrep_desync=OFF; the MySQL nodes in Galera replicate synchronously.]
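The desync, backup, resync order on this slide can be sketched as a context manager, so the node is resynced even when the backup fails. This is a minimal sketch, not the tooling used at Spil Games: `execute` (runs one SQL statement on the node) and `run_backup` (streams the backup) are hypothetical callables supplied by the caller.

```python
from contextlib import contextmanager


@contextmanager
def desynced(execute):
    """Desync the node for the duration of the block, then resync it."""
    execute("SET GLOBAL wsrep_desync = ON")
    try:
        yield
    finally:
        # Always resync, even when the backup raised an exception.
        execute("SET GLOBAL wsrep_desync = OFF")


def backup_node(execute, run_backup):
    """Run the backup while the node is desynced from the cluster."""
    with desynced(execute):
        run_backup()


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to MySQL.
    backup_node(print, lambda: print("(stream backup here)"))
```

In practice `execute` would wrap a MySQL client connection and `run_backup` would start the backup tool; both are left abstract here on purpose.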

Restoring backups
1. Restored backup can be used to prevent SST of new joiners
2. Automated backup verification
  • Restores (randomly) chosen backup
  • Installs necessary MySQL version (5.1/5.5)
  • Performs basic checks
  • Enables replication
  • Will not work fully as it needs a working cluster to join

Monitoring
1. Cluster
  • Nodes in the cluster
  • Warning at 2, critical at 1
  • Availability of the address
2. Load balancer
  • Node checks
3. Performance monitoring
  • Adding metrics to mysql_statsd is easy
  • wsrep_flow_control
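The "warning at 2, critical at 1" rule can be sketched as a Nagios-style check. This is an illustration under assumptions: the node count is taken to come from Galera's `wsrep_cluster_size` status variable, the return values follow the usual Nagios exit codes, and the function name is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_cluster_size(cluster_size: int, warning_at: int = 2, critical_at: int = 1) -> int:
    """Map the number of nodes in the cluster to a monitoring state."""
    if cluster_size <= critical_at:
        return CRITICAL
    if cluster_size <= warning_at:
        return WARNING
    return OK


if __name__ == "__main__":
    for size in (3, 2, 1):
        print(size, "node(s) ->", check_cluster_size(size))
```

The thresholds are parameters so that a larger cluster can warn earlier than a three-node one.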

Flow control
1. Usage of replication threads
  • Scale from 0.0 to 1.0
2. Recommended to stay below 0.1 (10% blocked)
3. Adding more nodes will not solve your problem
4. Increase replication threads
  • Recommended 2*CPU cores
  • What if 64 is not enough?
  • How do you close flood gates?
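The two rules of thumb on this slide fit in a couple of lines. A sketch under assumptions: the 0.0 to 1.0 value is taken to be Galera's `wsrep_flow_control_paused` status variable and the thread count to map to the `wsrep_slave_threads` setting; the slide names neither explicitly, and the function names are hypothetical.

```python
def flow_control_ok(paused_fraction: float, threshold: float = 0.1) -> bool:
    """True while the fraction of time replication was paused stays below the threshold."""
    if not 0.0 <= paused_fraction <= 1.0:
        raise ValueError("paused_fraction must be between 0.0 and 1.0")
    return paused_fraction < threshold


def recommended_replication_threads(cpu_cores: int) -> int:
    """Rule of thumb from the slide: twice the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return 2 * cpu_cores


if __name__ == "__main__":
    print(flow_control_ok(0.05), flow_control_ok(0.25))
    print(recommended_replication_threads(32))
```

A 32-core machine already lands on 64 threads under this rule, which is where the slide's "what if 64 is not enough?" question starts.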

Other things we bumped into…
1. MySQL version updates
  • Update one by one
  • PXC SST changes
2. Availability after restart
  • Joins cluster after IST/SST
  • LRU still loading
3. Nondescriptive errors during SST
  • Local user authentication (after starting mysqld with sudo!)
4. Schema changes

Future for Galera at Spil Games What will we do in the near future?

OpenStack
1. Offer DAAS to our (internal) customers
2. Spawning (automated) database nodes and clusters when necessary
3. Mix and match Galera and regular MySQL replication

WAN Replication
1. No immediate use case (yet)
  • No need for WAN in sharded environment
  • Game catalogue might need it in the future
2. Wait for Galera 3.0
  • Datacenter awareness

MaxScale
1. Beta testing MaxScale for SkySQL
  • Works flawlessly in the lab (so far)
  • Not yet tested with mixed Galera/MySQL replication
2. MaxScale itself is not HA (yet)
  • Keepalived?

Conclusion What is our verdict?

Conclusion(s)
1. Galera definitely lives up to expectations
2. Decreased cluster-wide performance
3. Increased replication performance
4. High investment in time for initial setup/tools
5. Maintenance is easier
6. Well worth the investment for us

Thank you!
• Presentation can be found at: http://spil.com/fosdem2014
• Mysql_statsd can be found at: http://spil.com/mysqlstatsd and http://github.com/spilgames/mysql-statsd
• If you wish to contact me:
  • Email: art@spilgames.com
  • Twitter: @banpei
• Engineering @ Spil Games:
  • Blog: http://engineering.spilgames.com
  • Twitter: @spilengineering

Photo sources
• Our current HA environment: http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/
• What we have learned so far: http://renaissanceronin.wordpress.com/2009/10/05/playing-with-plasma-cutters/
• Near future: http://www.example-infographics.com/envisioning-the-near-future-of-technology/
• Conclusion: http://www.flickr.com/photos/louisephotography/5796499806/in/photostream/
