Big Data Lambda Architecture (Spark, Kafka, Cassandra)

Published on July 11, 2016

Author: zigiella

Source: slideshare.net

1. Lambda Architecture for Twitter content-based Recommendation System. Barcelona Tourist City Monitor & Insights. 01.07.2016. #SPARK #KAFKA #CASSANDRA. Juan Pablo López, Rodica Fazakas, Yulia Zvyagelskaya, Beatriz Martín. Big Data Management and Analytics Postgraduate Course, Final Project.

2. Data

3. Data Source. Languages: English, French, Russian. Tweets geolocated in Barcelona: bounding box [41.34,2.03,41.45,2.25]. Tweets with Barcelona KW (keywords): Barcelona, Sagradafa, MWC.
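
The two selection rules on this slide (bounding box and keywords, restricted to three languages) can be sketched in plain Python. This is only an illustration, not the authors' collector: it assumes the box is ordered [lat_min, lon_min, lat_max, lon_max], and the tweet dictionary layout (`lang`, `text`, `lat`, `lon`) and all function names are hypothetical.

```python
# Bounding box from the slide; the order lat_min, lon_min, lat_max, lon_max is an assumption.
BCN_BBOX = (41.34, 2.03, 41.45, 2.25)
# Language codes for English, French, Russian.
LANGUAGES = {"en", "fr", "ru"}
# Keywords copied from the slide as written, lowercased for matching.
KEYWORDS = ("barcelona", "sagradafa", "mwc")


def in_bbox(lat, lon, bbox=BCN_BBOX):
    """Return True when the point lies inside the bounding box."""
    lat_min, lon_min, lat_max, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def matches_keyword(text, keywords=KEYWORDS):
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def keep_tweet(tweet):
    """Keep a tweet in an accepted language that is geolocated in the box or matches a keyword.

    `tweet` is a hypothetical dict with 'lang', 'text' and optional 'lat'/'lon' keys.
    """
    if tweet.get("lang") not in LANGUAGES:
        return False
    lat, lon = tweet.get("lat"), tweet.get("lon")
    geolocated = lat is not None and lon is not None and in_bbox(lat, lon)
    return geolocated or matches_keyword(tweet.get("text", ""))
```

For example, `keep_tweet({"lang": "fr", "text": "Visite de Barcelona"})` is accepted through the keyword rule, while a Spanish-language tweet is rejected regardless of location.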

4. Data Source (amount of data). Geolocated tweets [41.34,2.03,41.45,2.25]: all languages 20,000 tweets/day; only EN, FR, RU 7,000 tweets/day. Keyword tweets (Barcelona, Sagradafa, MWC): all languages 250,000 tweets/day; only EN, FR, RU 80,000 tweets/day.

5. Data Management

6. Cluster topology

7. Architecture

8. Architecture

9. Data Collect Layer

10. Data Collect Layer Collect Process

11. Data Collect Layer: Apache Kafka. Distributed publish-subscribe messaging service ● Fault-tolerant ● Decoupling, Simplicity, Efficiency ● Fast ● Topics: twittergeobcn, twitterkwbcn, rtstats, rtpredictions
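
To make the publish-subscribe decoupling concrete, here is a toy in-memory stand-in written in plain Python. It is not Kafka and does not use any Kafka client API: `ToyBroker` and its methods are hypothetical names, and unlike Kafka it neither persists messages nor distributes them across machines. Only the four topic names come from the slide.

```python
# Topic names taken from the slide.
TOPICS = ("twittergeobcn", "twitterkwbcn", "rtstats", "rtpredictions")


class ToyBroker:
    """Minimal publish-subscribe hub: producers and consumers only share topic names."""

    def __init__(self, topics=TOPICS):
        # One subscriber list per known topic.
        self._subscribers = {topic: [] for topic in topics}

    def subscribe(self, topic, callback):
        """Register a callable invoked for every message published on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every current subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)


broker = ToyBroker()
received = []
# The consumer knows the topic name, not the producer.
broker.subscribe("twittergeobcn", received.append)
# The producer knows the topic name, not the consumer.
broker.publish("twittergeobcn", {"text": "hola"})
# No subscriber on this topic: the toy broker simply drops the message.
broker.publish("rtstats", {"count": 1})
```

The point of the sketch is that the collector and the processing jobs never reference each other directly, which is the decoupling property the slide lists.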

12. Data Collect Layer: Collect Process. Topics: twittergeobcn, twitterkwbcn

13. Data Collect Layer

14. Data Collection: Apache Flume

15. Processing Analytics Layer

16. Processing Analytics Layer

17. Batch Processing: Pre Process (stages: Collect, Pre Process) ● Read geolocated tweets stored in HDFS ● Clean tweet text (lowercase; remove numbers, spaces, tabs, etc.) ● Categorize users (tourist or resident) by comparing the geolocation of their last 200 tweets ● Save in Cassandra for the ML processes
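
The cleaning and user-categorization steps above can be sketched in plain Python. The slide does not give the actual rules, so everything beyond "lowercase, numbers, spaces, tabs" and "last 200 tweets" is an assumption: the bounding-box order, the 50% threshold in `resident_share`, the `"unknown"` fallback, and the function names are all hypothetical. Reading from HDFS and writing to Cassandra are left out.

```python
import re

# Bounding box from the deck; the order lat_min, lon_min, lat_max, lon_max is an assumption.
BCN_BBOX = (41.34, 2.03, 41.45, 2.25)


def clean_text(text):
    """Lowercase the text, drop digit runs, and collapse spaces, tabs and newlines."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def categorize_user(recent_coords, bbox=BCN_BBOX, window=200, resident_share=0.5):
    """Label a user from the (lat, lon) pairs of their most recent tweets.

    Assumed rule: a user is a 'resident' when at least `resident_share` of the
    last `window` geolocated tweets fall inside the box, otherwise a 'tourist'.
    """
    lat_min, lon_min, lat_max, lon_max = bbox
    last = recent_coords[-window:]
    if not last:
        return "unknown"  # hypothetical fallback for users with no geolocated tweets
    inside = sum(
        1 for lat, lon in last
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    )
    return "resident" if inside / len(last) >= resident_share else "tourist"
```

With these assumptions, a user whose last 200 tweets include 150 inside the box is labeled a resident, and one with only 10 inside is labeled a tourist.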

18. Batch Processing: Topic Modelling Process ● Collect ● TP Process

19. Batch Processing: SVM Process ● Collect ● SVM Process Model

20. Streaming Process: Collect ● Stats Process (topic: twittergeobcn, topic: rtstats) ● Predict Process (topic: rtpredictions, Model)
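
As a rough illustration of what the two streaming jobs might compute per micro-batch, here is a plain-Python sketch. The deck does not say which statistics are produced or how the model is invoked, so the per-language counts, the record layouts, and the `model` callable are assumptions. It is also assumed that the Stats Process reads from twittergeobcn and writes to rtstats, and that predictions go to rtpredictions. No Spark Streaming or Kafka API is used here.

```python
from collections import Counter


def stats_for_batch(tweets):
    """Summarize one micro-batch of tweets into a record intended for the rtstats topic."""
    by_lang = Counter(tweet.get("lang", "und") for tweet in tweets)
    return {"total": len(tweets), "by_lang": dict(by_lang)}


def predict_for_batch(tweets, model):
    """Apply a trained model to each tweet; records are intended for the rtpredictions topic.

    `model` is any callable mapping text to a label (a hypothetical stand-in
    for the model produced by the batch layer).
    """
    return [{"text": tweet["text"], "label": model(tweet["text"])} for tweet in tweets]


batch = [
    {"lang": "en", "text": "great tapas"},
    {"lang": "fr", "text": "belle ville"},
    {"lang": "en", "text": "long queue"},
]
stats = stats_for_batch(batch)
predictions = predict_for_batch(batch, model=lambda text: "food" if "tapas" in text else "other")
```

In a real deployment each function would run inside the streaming job and its output would be published back to Kafka for the API layer to read.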

21. API Layer

22. API Layer REST API

23. Dashboard HTML

24. DEMO

25. Thank you!
