Running Cassandra in AWS


Published on November 27, 2013

Author: planetcassandra



For this upcoming meetup, we welcome Patrick Eaton, PhD, Systems Architect at Stackdriver, and Joey Imbasciano, Cloud Platform Engineer at Stackdriver.

What You'll Learn At This Meetup:
• Why Stackdriver chose Cassandra over other DB offerings
• Stackdriver's data pipeline that runs into Cassandra
• Operating Cassandra on AWS
• Stackdriver's approach to disaster recovery

Patrick and Joey will present their use of Apache Cassandra at Stackdriver, share some lessons learned and technical tips, and end the evening with a Q&A.

Running Cassandra in AWS
Patrick Eaton, PhD (@PatrickREaton)
Joey Imbasciano (@_joeyi)

Stackdriver at a Glance
Stackdriver's hosted intelligent monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations
● Cloud-native and cloud-aware
● Designed for complex distributed applications
● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
● Team of ~25, based in Downtown Boston

Intelligent Monitoring
Discover customer's cloud-hosted applications
● Infrastructure inventory
● Logical units, like groups/clusters
● Services, hosted and self-managed
● Elastic resources
Monitor
● Various data sources
  ○ Provider metrics
  ○ Host metrics
  ○ Custom metrics
  ○ Endpoints
  ○ Events
  ○ Health
● Rich visualizations
Analyze
● Integrate data sources
● Aggregate metrics
● Report utilization, cost, etc.
● Detect policy violations
● Recommend actions

Lambda Architecture
● Typical of modern architectures for on-line applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
  ○ Store of record
  ○ Compute arbitrary views
● Speed layer
  ○ Low latency updates
  ○ Streaming algorithms
● Serving layer
  ○ Combine data from batch and speed layers to answer queries
[Diagram labels: Serving, Speed, Batch, Data]
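The three layers can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Stackdriver's implementation: the names `batch_view`, `SpeedView`, and `serve` are hypothetical, and simple per-key sums stand in for whatever views a real system would compute.

```python
def batch_view(store_of_record):
    """Batch layer: recompute per-key totals from the full store of record."""
    totals = {}
    for key, value in store_of_record:
        totals[key] = totals.get(key, 0) + value
    return totals


class SpeedView:
    """Speed layer: fold in each new event as it arrives (low latency)."""

    def __init__(self):
        self.totals = {}

    def update(self, key, value):
        self.totals[key] = self.totals.get(key, 0) + value


def serve(key, batch, speed):
    """Serving layer: combine the batch and speed views to answer a query."""
    return batch.get(key, 0) + speed.totals.get(key, 0)


batch = batch_view([("cpu", 3), ("disk", 5), ("cpu", 4)])
speed = SpeedView()
speed.update("cpu", 2)  # an event that arrived after the last batch run
print(serve("cpu", batch, speed))  # 9
```

The query for "cpu" sees both the batch total (7) and the not-yet-batched event (2), which is the point of the serving layer.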

Stackdriver Architecture
● Shares characteristics of lambda architecture
● Indexing (speed) path
  ○ Make "live" data available "pre-analysis"
● Analysis (batch) path
  ○ Compute aggregations
  ○ Create recommendations
● Query (serving) layer
  ○ Combine "live" and analyzed data to answer queries
  ○ May require on-the-fly analysis
● Alerting (speed) path (not discussed here)
  ○ Stream processing to detect policy-based anomalies
[Diagram labels: Data, Indexing (Speed), Analysis (Batch), Alerting (Speed), Database, Query (Serving), Notification (Serving)]

Database Options
● We chose Cassandra!
  ○ True P2P architecture
  ○ Good support for write-heavy workloads
  ○ Compatible data model for time series data
    ■ Column per metric type, timestamps as columns
● Why not MySQL?
  ○ Experience with operating large, sharded deployments
  ○ Relational data model not a good match
● Why not HBase?
  ○ Operational complexity - zk, hadoop, hdfs, ...
  ○ Special "Master" role
● Why not Dynamo?
  ○ Avoid vendor lock-in and high cost

Stackdriver Architecture ++
● Archival pipeline stores all data
  ○ Very small surface area, battle-tested
  ○ Critical for disaster recovery
  ○ S3 considered durable enough
  ○ Replicated for availability
● Archive means Cassandra is "soft state"
● C* consolidates analysis and indexing results
● Properties of data in C*
  ○ Immutable data
  ○ Append-only
  ○ Read-1, write-1 consistency
● Scales out easily
  ○ Indexers, archivers, analyzers, query servers
[Diagram labels: Data, Archive, S3, Index, Analyze, Cassandra (Data Series, Inventory, Roll-ups, Recs), Analysis, Query]
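For context on "Read-1, write-1 consistency": in a replicated store, a read is only guaranteed to touch at least one replica that acknowledged the latest write when the read and write replica counts together exceed the replication factor. The sketch below checks that general rule; the function name is hypothetical. With the replication factor of 3 listed on the cluster configuration slide, read-1/write-1 gives no such guarantee, which is presumably acceptable here because the data is immutable and append-only (an inference from the slide, not something it states).

```python
def replicas_overlap(read_replicas, write_replicas, replication_factor):
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


print(replicas_overlap(1, 1, 3))  # False: read-1/write-1 at RF 3
print(replicas_overlap(2, 2, 3))  # True: quorum reads and writes at RF 3
```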

Cassandra at Stackdriver
Cluster Configuration
● Version: DataStax Community Edition 1.2.10
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
  ○ Aids in request efficiency
  ○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
  ○ Used to determine when nodes are down
  ○ AWS network can be spotty
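As a rough illustration, the node-level settings above map onto `cassandra.yaml` keys roughly as follows. This is a sketch, not the presenters' actual file: the `num_tokens` value is an assumption (the slide says only "Vnodes"), and `render_overrides` is a hypothetical helper. The replication factor is absent because it is set per keyspace rather than in `cassandra.yaml`.

```python
# Slide settings expressed as cassandra.yaml keys.
OVERRIDES = {
    "num_tokens": 256,  # assumption: the slide gives no token count
    "partitioner": "org.apache.cassandra.dht.Murmur3Partitioner",
    "endpoint_snitch": "Ec2Snitch",
    "phi_convict_threshold": 12,  # raised from the default of 8
}


def render_overrides(overrides):
    """Render the overrides as flat 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in overrides.items())


print(render_overrides(OVERRIDES))
```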

Cassandra Topology in AWS
[Diagrams: "Where we started..." and "Where we are...", showing availability zones us-east-1a, us-east-1b, and us-east-1c]
Keep it balanced!
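"Keep it balanced!" can be turned into a simple check: count nodes per availability zone and confirm the counts match. The sketch below assumes a hypothetical mapping of node names to zones; in practice that inventory would come from the AWS API or from the cluster itself.

```python
from collections import Counter

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def zone_counts(node_zones, zones):
    """Count nodes per zone, including zones that have no nodes."""
    counts = Counter(node_zones.values())
    return {zone: counts.get(zone, 0) for zone in zones}


def is_balanced(node_zones, zones):
    """True when every zone holds the same number of nodes."""
    return len(set(zone_counts(node_zones, zones).values())) == 1


# Hypothetical six-node cluster, two nodes per zone.
cluster = {
    "node1": "us-east-1a", "node2": "us-east-1b", "node3": "us-east-1c",
    "node4": "us-east-1a", "node5": "us-east-1b", "node6": "us-east-1c",
}
print(is_balanced(cluster, ZONES))  # True
```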

Cassandra EC2 Node Configuration
● m1.xlarge
  ○ 4 cores
  ○ 15 GB RAM
  ○ 4 ephemeral disks available
● 4 disks RAID-0 for Data Volume and CommitLog
  ○ ext4 - defaults,noatime
  ○ mdadm RAID-0
  ○ Compactions
  ○ Heavy Read/Write IO
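A sketch of the commands such a disk layout implies is shown below. The device names, array name, and mount point are assumptions (the slide names none of them), and the function only builds the command lines; it does not run anything.

```python
def raid0_commands(devices, array="/dev/md0", mount_point="/var/lib/cassandra"):
    """Build (but do not run) commands to stripe, format, and mount the disks."""
    return [
        ["mdadm", "--create", array, "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        ["mkfs.ext4", array],
        ["mount", "-o", "defaults,noatime", array, mount_point],
    ]


# Assumed names for the four ephemeral disks on an m1.xlarge.
EPHEMERAL = ["/dev/xvdb", "/dev/xvdc", "/dev/xvdd", "/dev/xvde"]
for command in raid0_commands(EPHEMERAL):
    print(" ".join(command))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and to hand to a tool such as `subprocess.run` later.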

Cassandra Automation and Operations
● Combination of Boto, Fabric, & Puppet
  ○ Boto for AWS API
  ○ Fabric + Puppet for Bootstrapping
  ○ Fabric for Operations
● One command to:
  ○ Launch a new cluster
  ○ Upsize a cluster
  ○ Replace a dead node
  ○ Remove existing nodes
  ○ List nodes in a cluster
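The "one command" idea can be sketched as a single entry point with one subcommand per operation. The example below uses Python's standard `argparse` as a stand-in for the Fabric tasks the slide describes; the program name, subcommand names, and arguments are all hypothetical, and no AWS or Puppet calls are made.

```python
import argparse


def build_parser():
    """One entry point with a subcommand for each operation on the slide."""
    parser = argparse.ArgumentParser(prog="cassandra-ops")
    sub = parser.add_subparsers(dest="command", required=True)

    launch = sub.add_parser("launch", help="launch a new cluster")
    launch.add_argument("cluster")
    launch.add_argument("--nodes", type=int, default=3)  # assumed default

    upsize = sub.add_parser("upsize", help="add nodes to a cluster")
    upsize.add_argument("cluster")
    upsize.add_argument("--add", type=int, required=True)

    replace = sub.add_parser("replace", help="replace a dead node")
    replace.add_argument("cluster")
    replace.add_argument("node")

    remove = sub.add_parser("remove", help="remove existing nodes")
    remove.add_argument("cluster")
    remove.add_argument("nodes", nargs="+")

    listing = sub.add_parser("list", help="list nodes in a cluster")
    listing.add_argument("cluster")
    return parser


args = build_parser().parse_args(["upsize", "demo", "--add", "3"])
print(args.command, args.cluster, args.add)  # upsize demo 3
```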

Our (Internal) Slogan

Cassandra Backups using S3
● No Cassandra Powered Backups
● Restore from S3
● Useful for major version upgrades
1. Data is archived when it is received
2. Bulk loader reads from S3
3. M/R re-analyzes data
4. Cassandra is repopulated
[Diagram labels: Data, S3, Bulk Loader, Map Reduce, Cassandra]
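The four steps can be walked through with an in-memory toy. Everything here is an assumption made for illustration: a Python list stands in for the S3 archive, a dict stands in for Cassandra, and per-minute averages stand in for whatever aggregations the real Map Reduce jobs compute.

```python
from collections import defaultdict


def archive(record, s3_archive):
    """Step 1: archive each data point as it is received."""
    s3_archive.append(record)


def bulk_load(s3_archive):
    """Step 2: stream archived records back out of the archive."""
    yield from s3_archive


def reanalyze(records, bucket_seconds=60):
    """Step 3: recompute roll-ups (here, per-bucket averages) from raw points."""
    sums = defaultdict(lambda: [0.0, 0])
    for metric, timestamp, value in records:
        bucket = timestamp - timestamp % bucket_seconds
        accumulator = sums[(metric, bucket)]
        accumulator[0] += value
        accumulator[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def repopulate(records, rollups):
    """Step 4: rebuild the soft-state database from raw points and roll-ups."""
    return {"series": list(records), "rollups": rollups}


s3 = []
for point in [("cpu", 0, 1.0), ("cpu", 30, 3.0), ("cpu", 60, 5.0)]:
    archive(point, s3)

raw = list(bulk_load(s3))
database = repopulate(raw, reanalyze(raw))
print(database["rollups"])  # {('cpu', 0): 2.0, ('cpu', 60): 5.0}
```

Because the archive holds every raw point, both the series and the derived roll-ups can be rebuilt from it alone.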

Disaster Recovery in the Wild
● October 23, Stackdriver suffered a total loss of our C* cluster
  ○ Exhausted memory due to number of open file descriptors (see graph)
● We did not notice the problem until it was too late
  ○ Nodes began crashing, resulting in an inconsistent view of the ring
● Attempted to restart the cluster unsuccessfully for ~2 hours
● Provisioned new 36 node cluster in ~2 hours
● Directed "live" data to new cluster
● Started bulk restore operation from archive
  ○ Full-fidelity data and aggregations
● No data loss due to archival pipeline
● See
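Since the failure went unnoticed until too late, one simple safeguard is to watch the open file descriptor count against the process limit. The sketch below is an illustration only, not the monitoring Stackdriver put in place: it is Linux-specific (it reads `/proc`), the 80% threshold is an arbitrary assumption, and the function names are hypothetical.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count a process's open descriptors via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_alert(open_count, limit, threshold=0.8):
    """True when usage reaches the threshold fraction of a finite limit."""
    return limit > 0 and open_count >= threshold * limit


if os.path.isdir("/proc/self/fd"):
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    count = open_fd_count()
    print(count, soft_limit, fd_alert(count, soft_limit))
```

Passing a Cassandra process ID instead of `"self"` would inspect that process, subject to file permissions.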

Cluster Restoration Process
[Diagram labels: S3, Map Reduce, Bulk Loader, Historical Data, New Cluster, UI, API, Gateway, New Data, Old Cluster]

Thank you! Yes, we are hiring!
Patrick Eaton - @PatrickREaton
Joey Imbasciano - @_joeyi
