CQL3 and Data Modeling 101 with Apache Cassandra

50 %
50 %
Information about CQL3 and Data Modeling 101 with Apache Cassandra
Technology

Published on February 28, 2014

Author: chrismceniry

Source: slideshare.net

Description

Introduction to CQL3 and Cassandra Data Modeling.

Presented at SD Cassandra Meetup 2014.02.27

CQL3 & Data Modeling 101 with Apache Cassandra San Diego Cassandra Meetup Feb 27, 2014

Not how you’ve done it before…

In the beginning… There was the Row, and the Column And the Row was fast to find and scale, And the Column was fast to order.

Cassandra Properties

C* • Column Oriented • Log Structured • Distributed Database

Column Oriented • Columns actually hold the data • Key/Value pair • Name can be used to store meaning as well

Distributed Database • Rows are used to distribute • C* pulls the entire row into memory • Can pull out individual parts or write to individual parts, but it’s still considered together

Log Structured Updates • Commitlog and sstables are log structured • Oriented around appending (streaming at a know location) • ==> Writes quickly • And you want to avoid rewrites

Random Reads • Data is scattered around the store (have to get location and random read to look it up) • Some indexing, and hopefully it’s in the vfs page cache, but still. • ==> Reads “slower”

General Rules of Thumb • De-normalize Everything • Duplicate your data • Organize it for reading

Containers

Keyspace • For modeling, not much • All data lives inside of Keyspaces

Columnfamily • aka Table • Grouping of similar data • Has unique key/row space • Where some structure is applied

Row • Unique inside of a column family • Key/Value where the Value is all of the columns in the row • Can handle some additional meaning to the row name • Typically “bucketing”

Column • Holds data values • Key/Value • Can have meaning in the column name as well

Thrift Interface • Operates on the raw rows and columns • Many different language drivers • Can use cassandra-cli to do this on the command line

Data Patterns

Users CF mac@mac .com NAME TWITTER TAGS mac @macmceniry admin,super,cool jsmith@mac .com NAME ICQ Employer Hobby John 89403270 Smithco Miniature Horses bb@example .com NAME IRC Bobtholomew bb@DAL NAME TWITTER Food TAGS Elizabeth @liz Cheesecake admin NAME IRC Steven steve@DARK liz@example .com steve@my .net

Lookup by chat handle ICQ CF IRC CF EMAIL 89403270 EMAIL bb@DAL bb@example .com jsmith@mac.com EMAIL steve@ DARK steve@my .net

Lookup by chat handle ICQ CF EMAIL IRC CF Name 89403270 EMAIL Name bb@example .com Bobtholomew EMAIL Name steve@my .net Steve bb@DAL jsmith@mac.com John steve@ DARK

HandleCF EMAIL NAME jsmith@mac.com John EMAIL NAME bb@example.com Bobtholomew EMAIL NAME steve@my.net Steven 89403270 bb@DAL steve@DARK

HandleCF TYPE EMAIL NAME ICQ jsmith@mac.com John TYPE EMAIL NAME IRC bb@example.com Bobtholomew TYPE EMAIL NAME IRC steve@my.net Steven 89403270 bb@DAL steve@DARK

How do I create these? [default@userdb] create column family usersCF; 5ecec19a-3a43-3490-8c9a-3eb2901e2e97 Waiting for schema agreement... ... schemas agree across the cluster [default@userdb] create column family handleCF; df82135c-eb1f-3abf-b9df-02c605d571d5 Waiting for schema agreement... ... schemas agree across the cluster

How do I insert data? [default@userdb] set handleCF[utf8(‘bb@DAL’)] … [utf8(‘NAME’)] = utf8('Bobtholomew'); Value inserted. Elapsed time: 22 msec(s). [default@userdb] set handleCF[utf8(‘bb@DAL')] … [utf8(‘EMAIL’)] = utf8(‘bb@example.com’); Value inserted. Elapsed time: 3.43 msec(s).

Users CF - TAGS mac@mac .com NAME TWITTER TAGS mac @macmceniry admin,super,cool jsmith@mac .com NAME ICQ Employer Hobby John 89403270 Smithco Miniature Horses bb@example .com NAME IRC Bobtholomew bb@DAL NAME TWITTER Food TAGS Elizabeth @liz Cheesecake admin NAME IRC Steven steve@DARK liz@example .com steve@my .net

Users CF - TAGS mac@mac .com jsmith@mac .com bb@example .com liz@example .com steve@my .net NAME TWITTER TAGS:admin TAGS:cool mac @macmceniry NAME ICQ Employer Hobby John 89403270 Smithco Miniature Horses NAME IRC Bobtholomew bb@DAL NAME TWITTER Food TAGS:admin Elizabeth @liz Cheesecake NAME IRC Steven steve@DARK TAGS:super

“Types” [default@userdb] set handleCF[utf8(‘bb@DAL’)] ! • Key Validator • (Column) Comparator • (Column Value) Default Validator, Metadata • BytesType, AsciiType, UTF8Type, IntegerType, Int32Type, LongType, UUIDType, TimeUUIDType, DateType, BooleanType, FloatType, DoubleType, DecimalType, CounterColumnType (, CompositeTypes)

What’s in a name? • Can use row names and column names to add meaning • Row name meaning creates a new distribution bin • Column name meaning can create a data hierarchy • No real change to the column family creation in the thrift interface (well, types depending on what you’re doing)

EventCF mac@mac .com: 20140203 08:10:15 08:15:15 09:10:15 join update logout mac@mac .com: 20140204 08:11:23 08:14:57 18:45:12 18:50:52 19:01:29 login logout login logout logout mac@mac .com: 20140205 09:23:23 09:57:44 login logout liz@example .com: 20140203 11:22:33 11:44:55 22:10:05 22:52:02 login logout login logout liz@example .com: 20140205 08:11:23 08:14:57 login logout

That was then, this is…

Now • Same underlying structure none of that has changed • • • Rows - reference quickly use for searching Columns - scan quickly user for ordering But now have usage patterns • Some have been codified into CQL

CQL • Thrift alternative • Simpler API • Hides the structure of the internal storage • 3 generations • Only looking at CQL3 here • cqlsh [-3]

How does handle look here? cqlsh:userdb> CREATE TABLE handles ( … handlename VARCHAR, … email VARCHAR, … name VARCHAR, … PRIMARY KEY (handlename) … ); cqlsh:userdb> INSERT INTO handles … (handlename, email, name) VALUES … (‘bb@DAL’, ‘bb@example.com’, ‘Bobtholomew’);

handles Table email name bb@example.com Bobtholomew bb@DAL

How does handle look here? cqlsh:userdb> SELECT * FROM handles; handlename | email | name ————————————|————————————————|————————————— bb@DAL | bb@example.com | Bobtholomew cqlsh:userdb> SELECT * FROM handles WHERE … handlename = ‘bb@DAL’; handlename | email | name ————————————|————————————————|————————————— bb@DAL | bb@example.com | Bobtholomew

How do I change it? cqlsh:userdb> UPDATE handles SET email=‘none’ … WHERE handlename = ‘bb@DAL’; cqlsh:userdb> SELECT * FROM handles; handlename | email | name ————————————|————————————————|————————————— bb@DAL | none | Bobtholomew

upsert • Update instead of Insert • • Does the same thing (as long as it’s not a key) Insert instead of Update • Overwrites data if it’s already there

What about our event buckets from earlier? • Can do the same thing • Creating a composite key • • USERNAME:DATE Creating a composite column • hh:mm:ss

EventCF mac@mac .com: 20140203 08:10:15 08:15:15 09:10:15 join update logout mac@mac .com: 20140204 08:11:23 08:14:57 18:45:12 18:50:52 19:01:29 login logout login logout logout mac@mac .com: 20140205 09:23:23 09:57:44 login logout liz@example .com: 20140203 11:22:33 11:44:55 22:10:05 22:52:02 login logout login logout liz@example .com: 20140205 08:11:23 08:14:57 login logout

cqlsh:userdb> CREATE TABLE events ( … username VARCHAR, … d VARCHAR, … hr INT, … min INT, … sec INT, … event VARCHAR, … PRIMARY KEY ( (username,d), hr, min, sec ) );

… PRIMARY KEY ( (username,d), hr, min, sec ) ); ROW NAME (C* 1.2) COLUMN NAME

Tags • CQL has collections • map, list, set • Collections are build similar to small/special composite columns • Can add to our existing handle table

cqlsh:userdb> ALTER TABLE handles ADD tags SET; cqlsh:userdb> UPDATE TABLE handles … SET tags = (‘admin’, ‘foo’); email name bb@example.com Bobtholomew bb@DAL tags:admin tags:foo

Design the data model so that it’s idempotent (eBay) • Counter versus Collection (what question is being asked?) Count 100 Count 200 Count 300 Likes A Likes B Likes C Likes A Likes B Likes C +user11 1393287359 +user12 1393287359 +user11 1393287359 +user12 1393280912 -user11 1393281942 +user13 1393212345 1393287100 +user12 1393287100 1393287100 +user13 1393287100 1393287100 +user14 1393287100

Go Forth and Model! Thank You! PS… Sony Network is hiring!

Add a comment

Related presentations

Related pages

CQL3 - Data modeling - m7a.me - m7a.me

CQL3 data modeling slide from Cassandra Conference Japn 2014. ... Accessing Data Stored in Apache Cassandra. ... Start using CQL3, NOW.
Read more

CQL3 Data Modeling - YouTube

CQL3 Data Modeling ... Getting Started with Apache Cassandra with ... Eric Evans "Cassandra By Example: Data modeling with CQL3 ...
Read more

#bbuzz: Eric Evans "Cassandra By Example: Data modeling ...

Eric Evans http://berlinbuzzwords.de/sessions/cassandra-example-data-modeling-cql3 Apache Cassandra is an increasingly popular choice of data store.
Read more

Cassandra Data Modeling - Home - Springer

Cassandra Data Modeling. ... and placement_strategy='org.apache.cassandra.locator.SimpleStrategy'; ... Cassandra CQL3 is the driving factor at present.
Read more

The Apache Cassandra Project

Welcome to Apache Cassandra ™ The Apache Cassandra database is the right choice when you need scalability and high availability without compromising ...
Read more

DataModel - Cassandra Wiki - Apache Software Foundation

... Data Modeling with Apache Cassandra. ... Apache Cassandra 2.0: Data Model on Fire: Video, Slides; Real Data Models of Silicon Valley ...
Read more

Data Modeling | DataStax

Data Modeling. DataStax Enterprise, powered by Apache Cassandra, lets people solve challenges, avoid risks, and transform how they interact with their ...
Read more

DS220: Data Modeling | DataStax Academy: Free Cassandra ...

DS220: Data Modeling. ... DS220 teaches you data modeling techniques essential to a successful Apache Cassandra™ and DataStax Enterprise deployment.
Read more