C* Summit EU 2013: Analytics On Top of Cassandra and Hadoop

Published on November 12, 2013

Author: planetcassandra

Source: slideshare.net


Speaker: Dmitry Mezhensky

Analytics on top of Cassandra and Hadoop
Dmitry Mezhensky | Mirantis Inc
#CASSANDRAEU

What we will discuss today
● Analytics on Cassandra using Hadoop
● Various types of statistics & implementation
● Scalability of approach

Problems
● Too many statistics (more than 100)
● Various types
  ○ Top N
  ○ Time series
  ○ Min/max/average/median
  ○ Extremum values on a time interval
  ○ Fraud analysis
● Huge amount of data
● Scalability of approach

Statistics implementation on Hadoop

Top N
● Map phase generates <Key, Value> pairs; the top N is built by Value
● Reduce phase accumulates values; persistence to Cassandra is done via a custom output format
● A suitable comparator was used for the top N entities in Cassandra

Top N
● At the write stage, sorting in Cassandra is done by value
● At the read stage, the first N records are the top N values
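
The top N flow can be sketched in plain Python, with no Hadoop or Cassandra involved. This is only an illustration under stated assumptions: the function names are hypothetical, summation is assumed as the accumulation step, and a descending sort stands in for the ordering that a Cassandra comparator would provide at write time.

```python
from collections import defaultdict


def topn_map(records):
    """Map phase: emit one <key, value> pair per input record."""
    for entity, amount in records:
        yield entity, amount


def topn_reduce(pairs):
    """Reduce phase: accumulate values per key (summation is an assumption)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def topn_write(totals):
    """Stand-in for the write stage: keep entries ordered by value, descending.

    In Cassandra this ordering would come from the comparator, not from
    an explicit sort in client code.
    """
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def topn_read(stored_row, n):
    """Read stage: the first n entries of the sorted row are the top n."""
    return stored_row[:n]


# Hypothetical input: (entity, amount) events.
events = [("a", 3), ("b", 5), ("a", 4), ("c", 1), ("b", 1)]
stored_row = topn_write(topn_reduce(topn_map(events)))
top2 = topn_read(stored_row, 2)
print(top2)
```

Because the data is already ordered when it is stored, the read side needs no sorting and only slices the first N entries.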

Time series
● Map phase generates <Time, Value> pairs
● Reduce phase accumulates values (behaviour varies by statistic)
● Persistence to Cassandra uses a custom output format, with one row key per statistic and one column per date
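
The time series layout can be illustrated with a small in-memory sketch. It is not the custom output format from the talk: a nested dictionary stands in for Cassandra, the statistic name `daily_logins` and the dates are made up, and `sum` is just one possible accumulation.

```python
from collections import defaultdict
from datetime import date


def ts_map(events):
    """Map phase: emit <time, value> pairs, here at day granularity."""
    for day, value in events:
        yield day, value


def ts_reduce(pairs, accumulate=sum):
    """Reduce phase: combine all values of a day; the accumulation
    function differs from one statistic to another."""
    grouped = defaultdict(list)
    for day, value in pairs:
        grouped[day].append(value)
    return {day: accumulate(values) for day, values in grouped.items()}


def ts_persist(store, statistic, per_day):
    """Stand-in for persistence: one row key per statistic,
    one column per date (ISO date string used as the column name)."""
    row = store.setdefault(statistic, {})
    for day, value in per_day.items():
        row[day.isoformat()] = value


# Hypothetical input events and statistic name.
events = [
    (date(2013, 10, 17), 2),
    (date(2013, 10, 17), 3),
    (date(2013, 10, 18), 7),
]
store = {}
ts_persist(store, "daily_logins", ts_reduce(ts_map(events)))
print(store)
```

Passing a different `accumulate` function (for example `max` or `len`) reuses the same map and persist steps for another statistic.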

Maximum, minimum, extremum on interval
● Max/min values are simple to calculate
● Extremum on an interval is calculated similarly to time series
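
One way to read "similarly to time series" is that values are grouped by interval first and the extremum is taken per group. The sketch below assumes fixed-width intervals keyed by their start timestamp; the function name, the interval width, and the sample data are all hypothetical.

```python
def interval_extrema(samples, interval_seconds):
    """Return {interval_start: (minimum, maximum)} for (timestamp, value) samples.

    Each sample is assigned to a fixed-width bucket, playing the role of
    the <Time, Value> key, and the running min and max are kept per bucket.
    """
    extrema = {}
    for timestamp, value in samples:
        bucket = timestamp - timestamp % interval_seconds
        low, high = extrema.get(bucket, (value, value))
        extrema[bucket] = (min(low, value), max(high, value))
    return extrema


# Hypothetical samples: (seconds since start, measured value).
samples = [(0, 5), (10, 2), (59, 9), (60, 4), (119, 7)]
per_minute = interval_extrema(samples, 60)
overall_max = max(value for _, value in samples)
overall_min = min(value for _, value in samples)
print(per_minute, overall_min, overall_max)
```

The global maximum and minimum need no bucketing at all, which matches the slide's remark that they are simple to calculate.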

Fraud analysis
● Fraud analysis runs after all statistics are calculated
● Processed data is filtered by fraud filters

Scalability approach
● Data is read from and written to Cassandra only
● Hadoop is elastically scalable
● Cassandra is elastically scalable
● No bottleneck


