Cassandra - Beyond Read-Modify-Write by Al Tobey

Information about Cassandra - Beyond Read-Modify-Write by Al Tobey
Technology

Published on February 13, 2014

Author: planetcassandra

Source: slideshare.net

Description

YouTube Video: TBA
As we move into the world of Big Data and the Internet of Things, the systems architectures and data models we've relied on for decades are becoming a hindrance. At the core of the problem is the read-modify-write cycle. In this session, Al will talk about how to build systems that don't rely on RMW, with a focus on Cassandra. Finally, for those times when RMW is unavoidable, he will cover how and when to use Cassandra's lightweight transactions and collections.

Beyond Read-Modify-Write @AlTobey Open Source Mechanic | DataStax Apache Cassandra open source evangelist ©2014 DataStax Obsessed with infrastructure my whole life. See distributed systems everywhere.

The Problem Skype video call with grandma at 3 hours old. Using Netflix solo at 18 months. Playing many games with online metrics at 4 years old.

The Problem Users expect their infrastructure to Just Work. Explain to my 4 year old why he can’t watch cartoons on Netflix.

The Problem Lag kills.

The Problem Even when everything is working, or even especially then, usage can explode, often unexpectedly.

Evolution A high-level overview of internet architectures: Client/Server, Classic 3-tier, 3-tier + read-scaled DB + cache.

Client-server Client - server database architecture. Obsolete. The original “funnel-shaped” architecture

3-tier Client - client - server database architecture. Still suitable for small applications. e.g. LAMP, RoR

3-tier master/slave Client - client - server database architecture. Still suitable for small applications. Diagram labels: master, slave, slave.

3-tier + caching More complex. Cache coherency is a hard problem. Cascading failures are common. Diagram labels: cache, master, slave, slave. Next: Out: the funnel. In: the ring.

Webscale Outer ring: clients (cell phones, etc.). Middle ring: application servers. Inside ring: Cassandra servers. Serving millions of clients with mere hundreds or thousands of nodes requires a different approach to applications!

When it Rains scale out … but at a cost

Beyond Read-Modify-Write •Practical Safety •Eventual Consistency •Overwrites •Key / Value •Journal / Logging / Time-series •Content-addressable-storage •Cassandra Collection Types •Cassandra Lightweight Transactions

Theory & Practice “In theory there is no difference between theory and practice. In practice there is.” -Yogi Berra I could talk about CAP theorem, but there’s plenty of that going around. And then there’s this quote.

Safety Fun happens when you turn the safety off. But you can’t sell it.

Safety But you don’t have to throw it out entirely. - roll cages - kill switches - rev limiters - protective clothing

Safety Max speed is 320 km/h (200 mph). Tested to 443 km/h (275 mph) and 581 km/h (361 mph, world record). Safety is ultimately a property of the system. But it can be expensive: lots of maintenance, inspections, reputation.

Read-Modify-Write
UPDATE Employees SET Rank=4, Promoted='2014-01-24' WHERE EmployeeID=1337;
Before: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 3, Promoted null
After: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 4, Promoted 2014-01-24
This might be what it looks like from SQL / CQL, but …

Read-Modify-Write
UPDATE Employees SET Rank=4, Promoted='2014-01-24' WHERE EmployeeID=1337;
Before: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 3, Promoted null
After: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 4, Promoted 2014-01-24
RDBMS: TNSTAAFL (there is no such thing as a free lunch). If you’re lucky, the cell is in cache. Otherwise, it’s a disk access to read, another to write.
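
Besides the extra disk access, a read-modify-write cycle can lose updates when two clients interleave. The sketch below illustrates that hazard against a hypothetical in-memory row (the method names are made up for this example, and no database is involved), and contrasts it with a blind write that carries the final value.

```ruby
# Hypothetical in-memory "row"; no real database is involved.

# Two clients each read the rank, add one, and write it back.
# Both read before either writes, so one increment is lost.
def interleaved_read_modify_write(row)
  a_seen = row[:rank]       # client A reads
  b_seen = row[:rank]       # client B reads the same stale value
  row[:rank] = a_seen + 1   # client A writes
  row[:rank] = b_seen + 1   # client B overwrites A's result
  row
end

# A blind write carries the final value, so it needs no prior read
# and can be replayed safely.
def blind_write(row, new_rank)
  row.merge(rank: new_rank)
end

puts interleaved_read_modify_write({ rank: 3 })[:rank]   # two increments, but only one survives
puts blind_write(blind_write({ rank: 3 }, 4), 4)[:rank]  # replaying the write changes nothing
```

Two increments from 3 should give 5, but the interleaved version ends at 4. The blind write ends at 4 however many times it is replayed.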

Eventual Consistency
UPDATE Employees SET Rank=4, Promoted='2014-01-24' WHERE EmployeeID=1337;
Before: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 3, Promoted null
After: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 4, Promoted 2014-01-24
Diagram label: Coordinator.
Explain distributed RMW. More complicated. Will talk about how it’s abstracted in CQL later.

Eventual Consistency
UPDATE Employees SET Rank=4, Promoted='2014-01-24' WHERE EmployeeID=1337;
Before: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 3, Promoted null
After: EmployeeID 1337, Name アルトビー, StartDate 2013-10-01, Rank 4, Promoted 2014-01-24
Diagram labels: Coordinator, read, write.
Memory replication on write, depending on RF, usually RF=3. Reads AND writes remain available through partitions. Hinted handoff.

Overwriting
CREATE TABLE host_lookup ( name varchar, id uuid, PRIMARY KEY(name) );
INSERT INTO host_lookup (name,id) VALUES ('www.tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);
INSERT INTO host_lookup (name,id) VALUES ('tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);
INSERT INTO host_lookup (name,id) VALUES ('www.tobert.org', 463b03ec-fcc1-4428-bac8-80ccee1c2f77);
SELECT id FROM host_lookup WHERE name='tobert.org';
Beware of expensive compaction. Best for: small indexes, lookup tables. Compaction handles RMW at the storage level in the background. Under heavy writes, clock synchronization is very important to avoid timestamp collisions. In practice, this isn’t a problem very often, and even when it goes wrong, not much harm is done.
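
Why timestamps matter for overwrites can be sketched with a last-write-wins merge. This is an illustration only, not Cassandra's storage code: `Cell` and `merge_cells` are hypothetical names, and breaking timestamp ties on the value is a simplifying assumption made so the result is deterministic.

```ruby
# A cell is a value plus the write timestamp supplied with it.
Cell = Struct.new(:value, :timestamp)

# Last-write-wins: keep the cell with the highest timestamp.
# On a timestamp collision, fall back to comparing values so every
# replica picks the same winner (a simplifying assumption for this sketch).
def merge_cells(cells)
  cells.max_by { |c| [c.timestamp, c.value] }
end

older = Cell.new('463b03ec-old', 1_390_436_043_000_000)
newer = Cell.new('463b03ec-new', 1_390_436_044_000_000)
puts merge_cells([newer, older]).value
```

The merge does not depend on arrival order, which is what lets it run in the background. When two writes carry the same timestamp, the clock can no longer tell which one was really last.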

Key/Value
CREATE TABLE keyval ( key varchar, value blob, PRIMARY KEY(key) );
INSERT INTO keyval (key,value) VALUES (?, ?);
SELECT value FROM keyval WHERE key=?;
e.g. memcached. Don’t do this. But it works when you really need it.

Journaling / Logging / Time-series
CREATE TABLE tsdb ( time_bucket timestamp, time timestamp, value blob, PRIMARY KEY(time_bucket, time) );
INSERT INTO tsdb (time_bucket, time, value) VALUES (
  '2014-10-24', -- 1-day bucket (UTC)
  '2014-10-24T12:12:12Z', -- ALWAYS USE UTC
  textAsBlob('{"foo": "bar"}')
);
Oversimplified, use normalization over blobs whenever possible. ALWAYS USE UTC :)

Journaling / Logging / Time-series
Partition 2014-01-24: 2014-01-24T12:12:12Z => {"key": "value"}, 2014-01-24T21:21:21Z => {"key": "value"}
Partition 2014-01-25: 2014-01-25T13:13:13Z => {"key": "value"}
As a nested map: { "2014-01-24" => { "2014-01-24T12:12:12Z" => '{"foo": "bar"}' } }
Oversimplified, use normalization over blobs whenever possible. ALWAYS USE UTC :)
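
The bucket and timestamp values for a row like the ones above can be derived from a single event time. A minimal sketch, assuming 1-day buckets as on the slide; `bucket_and_timestamp` is a hypothetical helper name.

```ruby
require 'time'

# Returns [bucket, timestamp] strings for an event time.
# Both are computed in UTC so that every writer agrees on the bucket.
def bucket_and_timestamp(event_time)
  utc = event_time.getutc
  [utc.strftime('%Y-%m-%d'), utc.iso8601]
end

bucket, ts = bucket_and_timestamp(Time.utc(2014, 1, 24, 12, 12, 12))
puts bucket
puts ts
```

Converting to UTC before truncating matters. An event at 23:30 in a UTC-8 zone belongs to the next day's bucket, and computing the bucket from local time would put it in the wrong partition.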

Content Addressable Storage
CREATE TABLE objects ( cid varchar, content blob, PRIMARY KEY(cid) );
INSERT INTO objects (cid,content) VALUES (?, ?);
SELECT content FROM objects WHERE cid=?;
The address of the data can be created by using the data itself, e.g. SHA1 (160 bits), MD5, Whirlpool, etc.

Content Addressable Storage
require 'cql'
require 'digest/sha1'
# connect to a local node and select the keyspace
dbh = Cql::Client.connect(hosts: ['127.0.0.1'])
dbh.use('cas')
data = { :timestamp => 1390436043, :value => 1234 }
# the content id is the hex-encoded SHA1 of the serialized data
cid = Digest::SHA1.hexdigest(data.to_s)
sth = dbh.prepare('SELECT content FROM objects WHERE cid=?')
sth.execute(cid).first['content']
Oversimplified! e.g. data.to_s is a BAD idea. ALWAYS USE UTC :)
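
One reason data.to_s is a bad basis for a content address is that its output depends on key insertion order and on how the Ruby version formats hashes, so equal data can hash to different ids. A minimal sketch of a more stable alternative, assuming a flat hash with symbol or string keys; `canonical_bytes` and `content_id` are hypothetical helper names, and sorted-key JSON is just one possible canonical form.

```ruby
require 'digest/sha1'
require 'json'

# Serialize with keys sorted so that equal data always yields equal bytes.
# Assumes a flat hash; nested structures would need recursive sorting.
def canonical_bytes(data)
  JSON.generate(data.sort_by { |key, _| key.to_s }.to_h)
end

# The content id is the hex-encoded SHA1 of the canonical bytes.
def content_id(data)
  Digest::SHA1.hexdigest(canonical_bytes(data))
end

a = content_id({ :timestamp => 1390436043, :value => 1234 })
b = content_id({ :value => 1234, :timestamp => 1390436043 })
puts a == b
```

Whatever serialization is chosen, it becomes part of the storage format. Changing it later changes every address.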

In Practice • In practice, RMW is sometimes unavoidable • Recent versions of Cassandra support RMW • Use them only when necessary • Or when performance hit is mitigated elsewhere or irrelevant

Cassandra Collections
CREATE TABLE posts ( id uuid, body varchar, created timestamp, authors set<varchar>, tags set<varchar>, PRIMARY KEY(id) );
INSERT INTO posts (id,body,created,authors,tags) VALUES ( ea4aba7d-9344-4d08-8ca5-873aa1214068, 'アルトビーの犬はばかね', dateOf(now()), {'アルトビー', 'ィオートビー'}, {'dog', 'silly', '犬', 'ばか'} );
Quick story about 犬ばかね ("silly dog"; the post body reads "Al Tobey's dog is silly"). Sets & maps are CRDTs, safe to modify.

Cassandra Collections CREATE TABLE metrics ( bucket timestamp, time timestamp, value blob, labels map<varchar,varchar>, PRIMARY KEY(bucket) ); sets & maps are CRDTs, safe to modify
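
The "safe to modify" property can be illustrated with a grow-only set, where merging two replicas is a plain union. This is a sketch of the CRDT idea only, not of how Cassandra stores collections; `merge_tags` is a hypothetical name.

```ruby
require 'set'

# Merging two replicas of a grow-only set is a union. Union is
# commutative, associative, and idempotent, so replicas converge
# regardless of the order in which additions arrive.
def merge_tags(replica_a, replica_b)
  replica_a | replica_b
end

left  = Set['dog', 'silly']
right = Set['silly', '犬']
puts merge_tags(left, right) == merge_tags(right, left)
```

Because the merge needs no knowledge of the previous state, adding an element never requires reading the set first.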

Lightweight Transactions • Cassandra 2.0 and later support LWT based on Paxos • Paxos is a distributed consensus protocol • Given a constraint, Cassandra ensures correct ordering

Lightweight Transactions
UPDATE users SET username='tobert' WHERE id=68021e8a-9eb0-436c-8cdd-aac629788383 IF username='renice';
INSERT INTO users (id, username) VALUES (68021e8a-9eb0-436c-8cdd-aac629788383, 'renice') IF NOT EXISTS;
On conflict the write is not applied and the client is told so (the result row has [applied] = false).
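
The semantics of the two conditional statements above can be sketched as compare-and-set operations. This is an in-memory simulation only: `UsersTable` and its methods are hypothetical, and a local mutex stands in for the ordering that Paxos provides across nodes.

```ruby
# Hypothetical in-memory stand-in for the users table: id => username.
class UsersTable
  def initialize
    @rows = {}
    @lock = Mutex.new   # stands in for distributed consensus in this sketch
  end

  # Mirrors INSERT ... IF NOT EXISTS: applies only when the id is absent.
  # Returns true if the write was applied, false otherwise.
  def insert_if_not_exists(id, username)
    @lock.synchronize do
      return false if @rows.key?(id)
      @rows[id] = username
      true
    end
  end

  # Mirrors UPDATE ... IF username = expected: applies only when the
  # current value matches. Returns true if the write was applied.
  def update_if(id, expected, username)
    @lock.synchronize do
      return false unless @rows[id] == expected
      @rows[id] = username
      true
    end
  end

  def username(id)
    @rows[id]
  end
end

users = UsersTable.new
id = '68021e8a-9eb0-436c-8cdd-aac629788383'
puts users.insert_if_not_exists(id, 'renice')
puts users.update_if(id, 'renice', 'tobert')
puts users.update_if(id, 'renice', 'someone-else')
```

The check and the write happen as one step, so the caller never reads the row itself. It only learns whether its condition held.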

Conclusion • Businesses are scaling further and faster than ever • Assume you have to provide utility-grade service • Data models and application architectures need to change to keep up • Avoiding Read/Modify/Write makes high-performance easier • Cassandra provides tools for safe RMW when you need it • Questions?

Related pages

Strategies for Designing Scalable Architectures: Avoid RMW

“Beyond Read-Modify-Write” was presented by Al Tobey, Open Source Mechanic at DataStax, at Cassandra Day Silicon Valley 2014 and as part of Hakka Labs ...