Big Data and APIs - a recon tour on how to successfully do Big Data analytics


Published on March 11, 2014

Author: natalinobusa

Source: slideshare.net

Big Data & APIs: A recon tour on how to successfully do Big Data

More events, users. Facebook users post 4.5 billion items a day (as of Sep 2013). Facebook MAU: 1.2 billion (as of Sep 2013).

More messages, transactions. WhatsApp: from 0 to 31 billion messages sent daily (as of Aug 2013).

Example: Notify all my friends. 1 billion posts a day!

    for {
      x       <- post.stream
      user    <- getUser(x)
      message <- getData(x)
      friend  <- getFriends(user)
    } yield notifyFriend(friend, user, message.id)
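The fan-out above can be tried on a small scale. The following is a minimal sketch, not the author's implementation: the `Post` and `Notification` types, the in-memory `friendsOf` map, and the `fanOut` name are all assumptions made for illustration, and a plain `List` stands in for the post stream.

```scala
object NotifyFriends {
  // Hypothetical data model; the slide does not define these types.
  final case class Post(id: Long, userId: String, text: String)
  final case class Notification(friend: String, user: String, messageId: Long)

  // Assumed stand-in for a social graph lookup.
  val friendsOf: Map[String, List[String]] = Map(
    "alice" -> List("bob", "carol"),
    "bob"   -> List("alice")
  )

  def getUser(post: Post): String = post.userId
  def getFriends(user: String): List[String] = friendsOf.getOrElse(user, Nil)

  // Emits one notification per (post, friend) pair.
  def fanOut(posts: List[Post]): List[Notification] =
    for {
      post   <- posts
      user    = getUser(post)
      friend <- getFriends(user)
    } yield Notification(friend, user, post.id)
}
```

Each post is multiplied by the author's friend count, which is why the volume of notifications grows much faster than the volume of posts.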

News filtering. This is a tough problem. You cannot read all that stuff!

News filtering: “a machine feeds you what to read”

Notify only those who care. 1 billion posts a day! The context is much bigger now.

    for {
      x           <- post.stream
      user        <- getUser(x)
      message     <- getData(x)
      friend      <- getFriends(user)
      hustle      <- getFriendNonsense(friend)
      weather     <- getWeather(user)
      mood        <- getMood(user)
      vibe        <- getMood(friend)
      topics      <- getTrendingTopics(friend)
      market      <- getChart('gold, 'bigmac)
      interesting <- hal9000(hustle, weather, mood, vibe, topics, market)
      if interesting
    } yield notifyFriend(friend, user, message.id)
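The guard (`if interesting`) is what turns the fan-out into a filter. Here is a small sketch under stated assumptions: the `Context` fields, the sample values, and the weighted threshold that replaces the slide's `hal9000` model are all invented for illustration.

```scala
object FilteredNotify {
  // Hypothetical context signals; names echo the slide, values are made up.
  final case class Context(mood: Double, topicOverlap: Double)
  final case class Post(id: Long, user: String)

  val friendsOf: Map[String, List[String]] =
    Map("alice" -> List("bob", "carol"))

  val contextOf: Map[String, Context] = Map(
    "bob"   -> Context(mood = 0.9, topicOverlap = 0.8),
    "carol" -> Context(mood = 0.2, topicOverlap = 0.1)
  )

  // Placeholder for the slide's hal9000: a fixed weighted score and threshold.
  def interesting(ctx: Context): Boolean =
    0.5 * ctx.mood + 0.5 * ctx.topicOverlap > 0.5

  // Keeps only the (friend, post id) pairs whose context passes the guard.
  def notifyWhoCares(posts: List[Post]): List[(String, Long)] =
    for {
      post   <- posts
      friend <- friendsOf.getOrElse(post.user, Nil)
      ctx    <- contextOf.get(friend).toList
      if interesting(ctx)
    } yield (friend, post.id)
}
```

Every extra generator in the comprehension is another lookup per (post, friend) pair, so the cost of the context grows with both the number of posts and the number of signals consulted.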

Dealing with context

Machine learning to the rescue. The problem. Constraints.

Data science: random forests from bigml.com Solve a classification problem

Millions of features. Millions of users and preferences. Very large sparse matrix!
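A sparse matrix of this kind is usually stored by its non-zero entries only. The sketch below is one possible representation, not something the slides prescribe: the `SparseRow` alias and the `dot` helper are illustrative names.

```scala
object SparseRows {
  // A sparse row keeps only non-zero entries: feature index -> value.
  type SparseRow = Map[Int, Double]

  // Dot product that iterates over the smaller row only,
  // treating missing indices in the other row as zero.
  def dot(a: SparseRow, b: SparseRow): Double = {
    val (small, large) = if (a.size <= b.size) (a, b) else (b, a)
    small.foldLeft(0.0) { case (acc, (i, v)) =>
      acc + v * large.getOrElse(i, 0.0)
    }
  }
}
```

With this layout, memory and compute scale with the number of stored preferences rather than with the full users-by-features grid.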

Data science: Time series prediction. Extract features. Correlate time series. Very large sparse matrix!

RAM: 100 terabytes, disk: 100 petabytes, CPU: 100 teraflops

Bummer.

why?

Nature went that way too. Ain’t that funny? “Evolving to multicellular organisms”. More resilient: cells die, the organism lives on. Complex tasks cannot be handled by a single cell.

High Availability. A system can be up, but not available (think of a network outage). How to improve it:
● Replication / Redundancy: 3 or 5 replicas are common in highly available systems
● Dynamic Commission - Decommission: re-balance the cluster for dead/new nodes
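A back-of-the-envelope calculation shows why a handful of replicas goes a long way. This sketch assumes that replicas fail independently and that each one is reachable with the same probability `p`; both are simplifying assumptions, and the function name is illustrative.

```scala
object ReplicaAvailability {
  // Probability that at least one of `replicas` copies is reachable,
  // assuming each is reachable with probability p, independently.
  def atLeastOneUp(p: Double, replicas: Int): Double =
    1.0 - math.pow(1.0 - p, replicas.toDouble)
}
```

Under these assumptions, replicas that are each reachable 90% of the time give roughly 99.9% with three copies and roughly 99.999% with five. Correlated failures, such as a shared network outage, reduce the benefit.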

CAP theorem: 12 years later. The CAP theorem is largely misunderstood.

Tuning CAP: understand your use cases http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
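One common way to reason about such tuning in replicated stores is the quorum overlap rule: with N replicas, W write acknowledgements and R read acknowledgements, a read is guaranteed to contact at least one replica that saw the latest acknowledged write when R + W > N. The sketch below only encodes that arithmetic; the `Tuning` type and its field names are hypothetical and do not correspond to any driver API.

```scala
object QuorumTuning {
  // Hypothetical description of a per-use-case consistency setting.
  final case class Tuning(replicas: Int, writeAcks: Int, readAcks: Int) {
    // True when every read set intersects every write set.
    def overlapping: Boolean = readAcks + writeAcks > replicas
  }

  // Majority quorum for a given replica count.
  def quorum(replicas: Int): Int = replicas / 2 + 1
}
```

For example, with three replicas, quorum writes and quorum reads overlap, while single-replica writes and reads do not: the second setting trades consistency for lower latency and higher availability, which may be acceptable for some use cases.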

A proven stack today: Functional. Hadoop Distributed FS, Hadoop Distributed Run-Time (Map-Reduce), Hive (DB), Python, R, Cassandra (distributed low-latency datastore), Akka (web server, in-memory runtime)

A proven stack today: Monitoring-Logging. Atmos, DataStax OpsCenter, Hue, Ambari, Ganglia, Elasticsearch, Logstash, Kibana, Marvel

Everything Distributed All Things Distributed - Werner Vogels' weblog on building scalable and robust distributed systems.

Latency tradeoffs

Hmm, that's a complex system. How to manage? lazy evaluated scheduled

APIs are everywhere.

Thanks

Geoffrey Moore, Author of Crossing the Chasm
