Analytics Drives Big Data Drives Infrastructure


Published on March 12, 2014

Author: xale4

Source: slideshare.net

Description

A personal perspective on how analytics has evolved from the 1980s to the present, and how it has driven demands on computing and storage infrastructure. Examples range from machine learning ("AI") techniques using neural networks and genetic algorithms in the 80s and 90s, to Aumnidata's social media analytics in 2008-10, to real-time intent detection by Cruxly from 2011 onwards.

Analytics Drives Big Data Drives Infrastructure
Confessions of Storage turned Analytics Geeks
Dr. Aloke Guha
29th IEEE Conference on Massive Data Storage, May 8th, 2013
aloke@cruxly.com

What’s Common Between a Sensor that could Distinguish a fine Cognac, and Predicting Movies You’d Like on Netflix?
Aloke Guha: Analytics Drives Big Data Drives Infrastructure, 29th IEEE MSST 2013

The Sommelier “Robot”

Predicting What Movies You’d Watch

(Analytics, BigData, DataStore)+

Many Analytics Techniques . . .
• Statistics, Regression, Linear, Time-Series, Decision Trees, R
• AI (McCarthy) 1956, Expert Systems, Machine Learning
• Neural Networks, SVM, LDA, Naïve Bayes, K-nearest neighbor, Random Forests . . .
• Genetic Algorithms
• SNARC (Minsky) 1951, Dendral (Feigenbaum) 1965, Fraser and Burnell (1970) . . . Vapnik (1992), Ihaka and Gentleman (1993)

Common Analytics Processing pre-2000
• Sources: Local
• Data: Numeric, Homogeneous
• Processing: Local
• Consumer: Local
• Analytics: Linear/Non-Linear Regression, Neural Networks, SVM, LDA, LSA, Decision Trees, Monte Carlo, Lin-Ops, Expert Systems . . .

Flavor Predictor – Neural Networks
USPTO #5,373,452 (1994)
1988

Pattern Recognition – Genetic Algorithms
USPTO #5,140,530 (1992)

Small to Big
http://article.wn.com/view/2013/04/04/Big_data_forefather_Michael_Stonebraker_shows_no_signs_of_sl/#/related_news

Typical Analytics: 2000-2006
• Sources: Global, Social Networks
• Data: Heterogeneous, Numeric, Text
• Processing: Hosted/Scale
• Consumer: Global
• Analytics: Batch Mode, Social Media Marketing, Churn Detection, Sentiment Analysis, etc.

2007- : Internet Data Analytics

Financial Risk Scoring: Detect
Risk Scoring: detect incremental change in the number of occurrences where corporate officers mention “risk” (or equivalent terms) during the earnings call

Financial Risk Scoring: Listen
*Risk Scoring: detect incremental change in occurrences where corporate officers mention “risk” (or semantically equivalent terms) during the corporate earnings call
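The risk-scoring idea on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the talk: the term list `RISK_TERMS`, the function names, and the sample transcripts are all hypothetical, and simple word matching stands in for whatever "semantically equivalent" detection was actually used.

```python
import re

# Hypothetical term list: the slide only says "risk" or semantically
# equivalent terms, so these entries are an assumption.
RISK_TERMS = ("risk", "risks", "risky", "exposure", "uncertainty")


def count_risk_mentions(transcript, terms=RISK_TERMS):
    """Count whole-word, case-insensitive occurrences of any risk term."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return len(re.findall(pattern, transcript, flags=re.IGNORECASE))


def incremental_changes(transcripts):
    """Return the call-over-call change in risk mentions, oldest call first."""
    counts = [count_risk_mentions(t) for t in transcripts]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Made-up earnings-call excerpts, oldest first.
calls = [
    "Demand was strong and we see limited risk this quarter.",
    "Currency risk and supply uncertainty increased our exposure.",
]
print(incremental_changes(calls))  # [2]
```

A positive value means officers mentioned risk more often than on the previous call, which is the incremental change the slide describes.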

Banking: Credit Worthiness – remember 2008?
Analyze bank reports to assess loans, payments, recoveries, etc. for key bank indexes, groups of banks, or individual banks

Share of Voice: Online Buzz

Sentiment Analysis

Analytics Processing: 2007-
• Sources: Global, Mobile, New Social (Instagram, . . .)
• Data: Multi-Dimensional, Heterogeneous, Audio/Video
• Processing: Hosted/Scale
• Consumer: Global
• Analytics: Batch, Streaming, . . .

2008 - : Real-Time/Streaming Analytics

Brand Marketing

Brand Management

Customer Support

Customer Support

Lead Generation

. . . More Data, Faster
http://www.cioinsight.com/it-strategy/big-data/data-analytics-allows-pg-to-turn-on-a-dime/?kc=CIOMINUTE05062013CIOA

“Internet of Things”
http://www.news-sap.com/survey-by-sap-and-harris-interactive-finds-brazil-china-germany-and-india-most-ready-for-m2m-technology-to-drive-connected-smarter-cities/
• Message Queuing Telemetry Transport
• Machine-to-Machine

AumniData: Batch Processing
• Sources: Twitter, Blog/Web Site, YouTube
• Data Collector (Batch Scheduled) (multiple instances), RSS/ATOM Feed Requestor / URL Scanner
• NLP+ Cruxly Intent Detection (AWS) (multiple instances)
• NLP Stack + AumniData Classifier + Analytics* (RackSpace VM)
• Content Store, Content / Metadata Index (MySQL), Dashboard Store (SQL Server)
• Dashboard Application (3rd party App), Dashboard Configuration (TomCat)
• Custom Analytics, Display, Ad-Hoc Query, Summary

Cruxly: Stream Processing
• Twitter
• Streaming API Client (Heroku Worker) (24x7) (multiple instances): Request (Keywords), Tweets (Keywords)
• NLP (NER, etc.) + Cruxly Intent Detection (AWS) (multiple instances)
• Tweet ID + Intent Signal (Heroku PostgreSQL)
• Tweets Content Store (DynamoDB)
• Reports / Dashboard
• Tracker Editor (web app - Heroku)
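The stream-processing layout above can be approximated by a small Python sketch: tweets are filtered by tracked keywords, passed through an intent step, and written to two separate stores (full content in one, tweet ID plus intent signal in the other). Everything here is an assumption for illustration: `detect_intent` is a toy rule, not Cruxly's intent detection, and a plain dict and list stand in for the content store and the signal store.

```python
def keyword_filter(tweets, keywords):
    """Yield only tweets whose text mentions at least one tracked keyword."""
    lowered = [k.lower() for k in keywords]
    for tweet in tweets:
        text = tweet["text"].lower()
        if any(k in text for k in lowered):
            yield tweet


def detect_intent(text):
    """Toy stand-in for the NLP stage (hypothetical rules)."""
    text = text.lower()
    if "?" in text:
        return "question"
    if any(word in text for word in ("want", "need", "buy")):
        return "buy"
    return "none"


def process_stream(tweets, keywords, content_store, signal_store):
    """Process tweets one at a time, as they arrive, instead of in a batch."""
    for tweet in keyword_filter(tweets, keywords):
        # Full text goes to the content store, keyed by tweet ID.
        content_store[tweet["id"]] = tweet["text"]
        # Only the small (tweet ID, intent) pair goes to the signal store.
        signal_store.append((tweet["id"], detect_intent(tweet["text"])))


# Made-up sample stream; a real client would read from a streaming API.
stream = iter([
    {"id": 1, "text": "I need a new phone"},
    {"id": 2, "text": "Lovely weather today"},
    {"id": 3, "text": "Is the phone any good?"},
])
content_store, signal_store = {}, []
process_stream(stream, ["phone"], content_store, signal_store)
print(signal_store)  # [(1, 'buy'), (3, 'question')]
```

Keeping the intent signals apart from the tweet content mirrors the split between the two stores named on the slide: reports can query the small signal table without touching the bulk content.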

Data Analytics Demands . . .
• Store, Process, Analyze, View
• Storm
• Data Collector: Text / Sensor Data / Stream . . .
• NLP, Classify, Index
• Query / RT Query, Ad Hoc / Search / SQL
• Custom Analytics, Dashboards, Chart, Report
• Machine Learning Library, Stats Library, R
• Yarn

Storage Implications: Back to the Future
• MB/s – Batch
• IOPS – Stream
• Both?
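A back-of-the-envelope calculation shows why the two workloads stress storage differently. The sketch below is a rough model only, and every number in it is hypothetical: batch jobs are sized by how much data must be scanned within a time window (bandwidth), while streams are sized by how many small I/O operations each incoming event triggers (IOPS).

```python
def batch_bandwidth_mb_s(dataset_gb, window_s):
    """Sustained MB/s needed to scan dataset_gb within window_s seconds.

    Uses decimal units (1 GB = 1000 MB).
    """
    return dataset_gb * 1000 / window_s


def stream_iops(events_per_s, ios_per_event):
    """I/O operations per second generated by a stream of small events."""
    return events_per_s * ios_per_event


# Hypothetical batch job: scan 3,600 GB within a one-hour window.
print(batch_bandwidth_mb_s(3600, 3600))  # 1000.0

# Hypothetical stream: 5,000 events/s, each causing 2 small I/Os.
print(stream_iops(5000, 2))  # 10000
```

A system that must serve both workloads at once has to meet the bandwidth figure and the IOPS figure together, which is the "Both?" question on the slide.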

Storage Implications: Back to the Future II, III
• Nodes: Master, Slave #1 . . . Slave #N, Mgmt Node
• MapReduce: Job Tracker, Task tracker
• HDFS: Name Node, Data Node, HDFS client
• Zookeeper, Hive, Pig, Oozie, HUE
• Storage Capacity Scaling?
• Storage Tiering?
• Import/Export Data?

A More General Data Analytics Framework?
• Data Ingesters (Basic), Data Ingesters (Smart)
• Content Store, Metadata / In-Mem Store
• Processing: Stream and Batch
• Analytics Processing
• Sensor Processing: Data Integration
• Visualization Library / Interactive Query
• Local Storage / Flash / DAS
• MapReduce / Distributed Data Store

Conclusion
• Data Analytics ⇒ Big Data ⇒ Scale-Out
• Variety ⇒ Infrastructure
• Volume ⇒ Bandwidth Support
• Velocity ⇒ Streaming Support
• We Solved the Processing Problem
• We Need to Solve the Larger Storage Problem

Grateful Acknowledgements
• Kapil Tundwal
• Dr. Kirill Kireyev
• Dr. Andrew Lampert
• Venky Madireddy
• Dr. Shumin Wu
• Joan Wrabetz
