Published on March 4, 2014
Cassandra How Stuff Works Sergey Enin (firstname.lastname@example.org)
AGENDA
• Introduction
• Architecture
• Partitioning & Replication
• Data management
• Data model
INTRODUCTION: SELECTED CASES
Who uses Cassandra?
• eBay has Cassandra supporting multiple applications (Social Signals, Hunch, and many time series use cases) with clusters spanning several data centers.
• Netflix uses Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.
• Shazam uses a Cassandra cluster to power its recommendation system.
• And many others: see http://www.datastax.com/cassandrausers
INTRODUCTION: MAIN ADVANTAGES
The main advantages of Cassandra are:
• Fast writes.
• Tunable consistency.
• Decentralization.
• Integration with Hadoop.
ARCHITECTURE: FAST WRITES
Cassandra is very fast on writes because it uses a log-structured merge tree (LSM-tree).
[Figure: process of inserting a new record into Cassandra]
ARCHITECTURE: FAST WRITES
How the LSM-tree is implemented: memtables and SSTables
1. Commit log – all data is written to the commit log for durability.
2. SSTables are immutable. A row is typically stored across multiple SSTable files.
3. Each SSTable has a bloom filter associated with it. The bloom filter is used to check whether a requested row key exists in the SSTable before doing any disk seeks.
4. Deleted data is not immediately removed from disk; it is marked with a tombstone. A deleted column can reappear if routine node repair is not run.
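The four points above can be sketched as a toy write path. This is only an illustration under simplifying assumptions, not Cassandra's implementation: the names `ToyStore`, `SSTable`, `BloomFilter` and `TOMBSTONE` are hypothetical, each key holds a single value rather than a row of columns, and nothing is actually written to disk.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


TOMBSTONE = object()  # marker stored in place of a deleted value


class SSTable:
    """Sorted snapshot of a flushed memtable; never modified after creation."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class ToyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes (durability)
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it stores a tombstone.
        self.write(key, TOMBSTONE)

    def flush(self):
        # Turn the memtable into a new SSTable; older SSTables are untouched.
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def read(self, key):
        value = None
        if key in self.memtable:
            value = self.memtable[key]
        else:
            for table in reversed(self.sstables):  # newest first
                # The Bloom filter lets us skip tables that cannot hold the key.
                if table.bloom.might_contain(key) and key in table.rows:
                    value = table.rows[key]
                    break
        return None if value is TOMBSTONE else value
```

In this sketch a write never modifies an existing SSTable, which is why writes stay cheap, and a deleted key still sits in older SSTables until something like compaction (not modelled here) drops it.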
ARCHITECTURE: NETWORK ARCHITECTURE
• All nodes are peers (there is no master).
• The client specifies a set of Cassandra nodes and connects to the first live node.
• Nodes use a gossip protocol to exchange state information.
Partitioning & replication
PARTITIONING & REPLICATION: DATA PARTITIONING
The partitioner determines where the first replica of a row lives in the ring.
• RandomPartitioner – the default before Cassandra 1.2 (Murmur3Partitioner has been the default since 1.2); spreads load roughly evenly across all nodes.
• ByteOrderedPartitioner – orders rows lexically by key bytes; allows range scans, but is not recommended.
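A minimal sketch of how a hash-based partitioner could map a row key to the node holding its first replica. The `Ring` class, the `md5_token` helper and the node names are hypothetical; the token is derived from MD5 in the spirit of RandomPartitioner, and the exact token range Cassandra uses is not reproduced here.

```python
import bisect
import hashlib


def md5_token(row_key: str) -> int:
    """Hash a row key to an integer token (assumption: plain MD5 as an integer)."""
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}. A node owns the range that ends
        # at its own token and starts just after the previous node's token.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def node_for_token(self, token):
        # First node whose token is >= the key's token, wrapping around the ring.
        index = bisect.bisect_left(self.tokens, token) % len(self.entries)
        return self.entries[index][1]

    def first_replica(self, row_key):
        return self.node_for_token(md5_token(row_key))
```

Because the hash scatters keys over the token space, neighbouring keys land on unrelated nodes. That evens out load but gives up ordered range scans, which is the trade-off against ByteOrderedPartitioner.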
PARTITIONING & REPLICATION: REPLICATION
Replication = replication factor + replica placement strategy
Replica placement strategies:
SimpleStrategy:
• the default strategy;
• does not take network topology into account.
NetworkTopologyStrategy:
• preferred when you know the network layout (data centers and racks) of your nodes.
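As a rough illustration of topology-unaware placement, SimpleStrategy can be pictured as walking clockwise around the ring from the first replica until the replication factor is met. The function name and its arguments are hypothetical, and the ring is reduced to a list of node names in token order.

```python
def simple_strategy_replicas(ring_nodes, first_index, replication_factor):
    """Pick replicas by walking clockwise from the first replica.

    ring_nodes: node names in token order (assumed input shape).
    first_index: position of the node chosen by the partitioner.
    """
    # A cluster cannot hold more distinct replicas than it has nodes.
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(first_index + i) % len(ring_nodes)] for i in range(count)]
```

With four nodes and a replication factor of 3, a row whose first replica is the last node wraps around to the first two nodes. A topology-aware strategy would additionally spread those replicas across racks and data centers, which this sketch does not attempt.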
Data management
DATA MANAGEMENT: DATA ACCESSING
Reads and writes:
• Tunable consistency. The consistency level specifies how many replicas must respond to a read or write request (writes are still sent to all replicas).
• Batches – a batch sets a global consistency level and a client-supplied timestamp for all columns written by the statements in the batch.
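The arithmetic behind tunable consistency can be sketched in a few lines. The helper names are hypothetical; the rule they encode is the usual one: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas that acknowledged the write exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_nodes: int, write_nodes: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_nodes + write_nodes > replication_factor
```

For a replication factor of 3, quorum reads and quorum writes (2 + 2 > 3) always overlap in at least one replica, while reading and writing at a single replica (1 + 1) trades that guarantee for lower latency.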
DATA MANAGEMENT: ACID
• Atomicity – writes are atomic at row level.
• Consistency – tunable consistency.
• Isolation – writes are invisible until they are complete.
• Durability – writes are durable.
• Repair mechanisms: read repair, anti-entropy node repair, hinted handoff.
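Of the repair mechanisms listed, read repair is the easiest to picture. The sketch below is a simplification: each replica is a plain dict mapping a key to a `(timestamp, value)` pair, the newest timestamp wins, and the function name `read_with_repair` is hypothetical.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    # The answer with the highest timestamp is treated as the current one.
    newest = max(answers, key=lambda pair: pair[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[1]
```

Hinted handoff and anti-entropy repair address the same divergence at other moments (while a node is down, and as a background comparison of replicas) and are not modelled here.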
Data model
DATA MODEL: CASSANDRA'S DATA MODEL
Relational databases – you design the schema based on entities and relationships.
Cassandra – you design the schema based on the queries you want to perform.
DATA MODEL: INDEXES
An index is a data structure that allows for fast, efficient lookup of data matching a given condition.
Primary key – the unique key used to identify each row in a table.
Secondary indexes – indexes on column values.
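The difference between the two kinds of index can be illustrated with an in-memory table. This is a conceptual sketch only: the `UsersTable` class is hypothetical, its column names follow the users table from the CQL3 slide, and it says nothing about how Cassandra stores its indexes.

```python
from collections import defaultdict


class UsersTable:
    def __init__(self):
        self.rows = {}                     # primary key (user_name) -> row
        self.by_state = defaultdict(set)   # secondary index: state -> user_names

    def insert(self, user_name, password, state=None):
        old = self.rows.get(user_name)
        if old is not None and old["state"] is not None:
            # Keep the secondary index in step when a row is overwritten.
            self.by_state[old["state"]].discard(user_name)
        self.rows[user_name] = {
            "user_name": user_name, "password": password, "state": state,
        }
        if state is not None:
            self.by_state[state].add(user_name)

    def get(self, user_name):
        # Primary-key lookup: exactly one row or nothing.
        return self.rows.get(user_name)

    def find_by_state(self, state):
        # Secondary-index lookup: any number of rows sharing a column value.
        return [self.rows[name] for name in sorted(self.by_state.get(state, ()))]
```

The primary key answers "which row is this?", while the secondary index answers "which rows have this value?" at the cost of extra bookkeeping on every write.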
DATA MODEL: CQL3
cqlsh> INSERT INTO users (user_name, password) VALUES ('jsmith', 'ch@ngem3a');
cqlsh> SELECT * FROM users WHERE user_name='jsmith';

 user_name | password  | state
-----------+-----------+-------
    jsmith | ch@ngem3a |  null
Thank you!