Large Scale Data Analysis with AWS

Technology

Published on February 28, 2014

Author: AmazonWebServices

Source: slideshare.net

Description

This presentation from the AWS Lab at Cloud Expo Europe 2014 explores large-scale data analysis on AWS. The cost of data generation is falling. Storing, analyzing and sharing data with the tools that AWS offers is a low-cost, easy-to-use way to create value from your data assets.

LARGE SCALE DATA ANALYSIS WITH AWS Carlos Conde – Sr. Mgr. Solutions Architecture carlosco@amazon.com @caarlco

THE MORE DATA YOU COLLECT THE MORE VALUE YOU CAN DERIVE FROM IT

THE COST OF DATA GENERATION IS FALLING

DATA VOLUME (chart: generated data vs. data available for analysis) Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

GENERATE  STORE  ANALYZE  SHARE

Lower cost, higher throughput GENERATE  STORE  ANALYZE  SHARE

Lower cost, higher throughput GENERATE  STORE  ANALYZE  SHARE Highly constrained

+ ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS

GENERATE  STORE  ANALYZE  SHARE

AWS Import/Export, AWS Direct Connect GENERATE  STORE  ANALYZE  SHARE

Inbound data transfer is free: multipart upload to S3, physical media, AWS Direct Connect
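
To make the multipart upload idea concrete, here is a minimal sketch of how a large object could be cut into parts before upload. It does not call AWS at all; the function name `plan_parts` and the 8 MiB default part size are assumptions for illustration. The two limits it respects (parts of at least 5 MiB except the last, and at most 10,000 parts per upload) are the documented S3 multipart limits.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum size for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts in one upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering total_size bytes.

    Hypothetical helper: it only plans the byte ranges, it uploads nothing.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    smallest_allowed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each planned range can then be uploaded independently and in parallel, which is what makes multipart upload attractive for large inbound transfers.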

Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2 GENERATE  STORE  ANALYZE  SHARE

AMAZON S3 SIMPLE STORAGE SERVICE

CASE STUDY: SPOTIFY ADDS 20,000 TRACKS/DAY TO ITS CATALOGUE

AMAZON DYNAMODB HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE

DURABLE & AVAILABLE CONSISTENT, DISK-ONLY WRITES (SSD)

LOW LATENCY AVERAGE READS < 5MS, WRITES < 10MS

NO ADMINISTRATION

CASE STUDY: SHAZAM SUPPORTED 500,000 WRITES/SEC DURING SUPER BOWL

AMAZON REDSHIFT FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS

AMAZON REDSHIFT DESIGN OBJECTIVES: A petabyte-scale data warehouse service that was… A Lot Faster, A Lot Cheaper, A Whole Lot Simpler

30 MINUTES DOWN TO 12 SECONDS

AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG Extra Large Node (HS1.XL): Single Node (2 TB), Cluster 2-32 Nodes (4 TB – 64 TB). Eight Extra Large Node (HS1.8XL): Cluster 2-100 Nodes (32 TB – 1.6 PB)
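
The cluster sizes on this slide imply 2 TB per HS1.XL node and 16 TB per HS1.8XL node. As a rough sketch under that assumption, the helper below lists the smallest cluster of each node type that would hold a given data volume. The function name `cluster_options` is hypothetical, and the calculation looks at raw capacity only; it ignores compression, working space and pricing.

```python
import math

# Per-node capacity and cluster limits derived from the figures on the slide.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}
SINGLE_NODE_TB = 2  # the single-node configuration uses one HS1.XL node


def cluster_options(data_tb):
    """Return (node_type, node_count) pairs able to hold data_tb terabytes."""
    options = []
    if data_tb <= SINGLE_NODE_TB:
        options.append(("HS1.XL", 1))
    for name, spec in NODE_TYPES.items():
        needed = math.ceil(data_tb / spec["tb_per_node"])
        nodes = max(spec["min_nodes"], needed)
        if nodes <= spec["max_nodes"]:
            options.append((name, nodes))
    return options


for size_tb in (1, 50, 1000):
    print(size_tb, "TB ->", cluster_options(size_tb))
```

An empty result means the volume exceeds the 1.6 PB upper bound shown on the slide.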

CREATE A DATA WAREHOUSE IN MINUTES

JDBC/ODBC

ON-DEMAND PRICING

PRICE PER TB / YEAR

GENERATE  STORE  ANALYZE  SHARE Amazon EC2 Amazon Elastic MapReduce

AMAZON EC2 ELASTIC COMPUTE CLOUD

3 HOURS FOR $4828.85/hr

Instead of $20+ MILLION in infrastructure

GPU INSTANCES G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h. CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h

ON A SINGLE INSTANCE COMPUTE TIME: 4h COST: 4h x $2.1 = $8.4

ON MULTIPLE INSTANCES COMPUTE TIME: 1h COST: 1h x 4 x $2.1 = $8.4
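
The arithmetic on the two slides above can be written as a small function. This is a sketch under two assumptions that are not stated on the slides: the job parallelizes perfectly across instances, and each instance is billed in whole hours. The name `job_cost` is made up for the example.

```python
import math


def job_cost(instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, cost) for a perfectly parallel job.

    instance_hours: total compute the job needs on a single instance.
    Assumes each instance is billed for every started hour.
    """
    wall_clock_hours = instance_hours / instances
    billed_instance_hours = math.ceil(wall_clock_hours) * instances
    return wall_clock_hours, billed_instance_hours * hourly_rate


# The slide's example: a 4-hour job at $2.10/h on 1 instance and on 4 instances.
for n in (1, 4):
    hours, cost = job_cost(4, n, 2.10)
    print(f"{n} instance(s): {hours:.0f}h, ${cost:.2f}")
```

With 1 or 4 instances the cost is the same and only the waiting time changes. With a count that does not divide the work evenly, such as 3, whole-hour billing makes the job slightly more expensive.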

AMAZON ELASTIC MAPREDUCE HADOOP AS A SERVICE

• SPLITS DATA INTO PIECES • LETS PROCESSING OCCUR • GATHERS THE RESULTS
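
The three steps can be illustrated with a toy word count in plain Python. This is only a single-process sketch of the split / process / gather pattern, not Hadoop or the Amazon EMR API; the function names are invented for the example.

```python
from collections import Counter


def split(records, pieces):
    """Split the input into `pieces` roughly equal chunks."""
    return [records[i::pieces] for i in range(pieces)]


def process(piece):
    """Count words in one chunk; on a cluster each chunk runs on its own node."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def gather(partial_counts):
    """Merge the per-chunk results into a single total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(records, pieces=3):
    return gather(process(piece) for piece in split(records, pieces))


print(word_count(["store analyze share", "analyze share", "share"]))
```

Because each chunk is processed independently, adding nodes shortens the run without changing the result.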

CASE STUDY: "WITH AMAZON EMR WE CAN ANALYZE 100% OF THE DATA, NOT JUST A SAMPLE" - Sanjeevan Bala, Head of Data Planning & Analytics, Channel 4

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 GENERATE  STORE  ANALYZE  SHARE

PUBLIC DATA SETS http://aws.amazon.com/publicdatasets

GENERATE  STORE  ANALYZE  SHARE

GENERATE  STORE  ANALYZE  SHARE BATCH PROCESSING

GENERATE  STREAM PROCESSING  SHARE

AMAZON KINESIS REAL-TIME DATA STREAM PROCESSING

Real-time response to content in semi-structured data streams Relatively simple computations on data (aggregates, filters, sliding window, etc.)
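
A filter combined with a sliding-window aggregate is one example of the "relatively simple computations" meant here. The sketch below counts server errors seen in the last 60 seconds of a stream. It uses no Amazon Kinesis API; the record shape `(timestamp_seconds, http_status)`, the function name and the assumption that records arrive in timestamp order are all made up for illustration.

```python
from collections import deque


def error_counts(records, window_seconds=60):
    """For each record, return (timestamp, errors seen in the last window)."""
    window = deque()  # timestamps of recent errors, oldest first
    results = []
    for timestamp, status in records:
        if status >= 500:          # filter: keep only server errors
            window.append(timestamp)
        # Slide the window: drop errors older than window_seconds.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        results.append((timestamp, len(window)))  # aggregate: a count
    return results


stream = [(0, 200), (10, 500), (30, 503), (65, 500), (75, 200), (100, 200)]
for timestamp, errors in error_counts(stream):
    print(f"t={timestamp:>3}s errors in last 60s: {errors}")
```

The state kept per window is small and is updated once per record, which is why this kind of computation suits real-time processing.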

Hourly server logs: how your systems went wrong an hour ago
Real-time metrics: what just went wrong now
Weekly / Monthly Bill: What you spent this past billing cycle
Real-time spending alerts/caps: guaranteeing you can’t overspend
Daily customer report from your website: tells you what deal or ad to try next time
Real-time analysis: what to offer the current customer now
Daily fraud reports: tells you if there was fraud yesterday
Real-time detection: blocks fraudulent use now
Daily business reports: tells me how customers used AWS services yesterday
Fast ETL into Amazon Redshift: how are customers using services now

GENERATE  STORE  ANALYZE  SHARE

GENERATE  STORE  ANALYZE  SHARE
GENERATE to STORE: AWS Import/Export, AWS Direct Connect
STORE: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2
ANALYZE: Amazon EC2, Amazon Elastic MapReduce
SHARE: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

GENERATE  STREAM PROCESSING  SHARE

GENERATE  STREAM PROCESSING  SHARE
STREAM PROCESSING: Amazon Kinesis, Stream Processing on Amazon EC2
SHARE: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

FROM DATA TO ACTIONABLE INFORMATION

THANK YOU Carlos Conde – Sr. Mgr. Solutions Architecture carlosco@amazon.com @caarlco
