Complement Your Existing Data Warehouse with Big Data & Hadoop


Published on February 6, 2014

Author: Datameer

Source: slideshare.net

Description

To view the full webinar, please go to: http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html

With 40% yearly growth in data volumes, traditional data warehouses have become increasingly expensive and difficult to scale.

Many of today's new data sources are unstructured, which makes the structured data warehouse an unsuitable platform for analyzing them. As a result, organizations now look to Hadoop as a scalable, flexible and cost-effective platform for data storage and analysis that complements their existing BI data warehouses.

Join Datameer and Cloudera in this webinar to discuss how Hadoop and big data analytics can help to:

- Get all the data your business needs quickly into one environment
- Shorten the time to insight from months to days
- Extend the life of your existing data warehouse investments
- Enable your business analysts to ask and answer bigger questions

Complement Your Existing Data Warehouse with Big Data & Hadoop
© 2013 Datameer, Inc. All rights reserved.

View Recording: You can view the recording of this webinar at http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html

About our Speakers: Karen Hsu
- Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, she has co-authored 4 patents and worked in a variety of engineering, marketing and sales roles.
- Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B and data security solutions to market.
- Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.

About our Speakers: Jeff Bean
- Jeff Bean has been at Cloudera since 2010. He has helped several of Cloudera's most important customers and partners through their adoptions of Hadoop and HBase, including cluster sizing, deployment, operations, application design, and optimization.
- Jeff has also spent time on Cloudera's training team, where he focused on partner enablement, training hundreds of field personnel in Hadoop, its usage, and its position in the market. Jeff currently does partner engineering at Cloudera, where he handles field support, certifications, and joint engagements with partners such as Datameer.

How Big Data Analytics and Hadoop Complement Your Existing Data Warehouse
Jeff Bean (Cloudera) and Karen Hsu (Datameer)

Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion

Data Has Changed in the Last 30 Years
Data growth from 1980 to 2013 has been driven by end-user applications, the Internet, mobile devices and sophisticated machines. Unstructured data: 90%; structured data: 10%.

EDW Expansion: A Vicious Cycle
The enterprise data warehouse faces:
- Increasing numbers of users
- Growing volumes of data
- Additional data sources
- New use cases
The result is degraded quality of service, an inability to meet SLAs, and constant pressure to purchase additional capacity.

Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads (Today)
All growth is accommodated by incremental investment in the data warehouse:
- Current volume: 100 TB, with 100% data growth
- Growth requires 100 TB + 100 TB: more capacity in the data warehouse
- Data warehouse cost: $20,000 - $100,000 / TB
- Incremental spend: $2 to $10 million
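As a quick check of the slide's numbers, the incremental spend is the new capacity multiplied by the per-terabyte cost. The Python sketch below assumes the 100 TB of growth and the $20,000 to $100,000 per TB range stated on the slide; the function name is hypothetical.

```python
# Worked arithmetic for the "Today" scenario: all growth absorbed by the DW.
GROWTH_TB = 100                      # 100 TB of new data (100% growth on 100 TB)
DW_COST_PER_TB = (20_000, 100_000)   # slide's cost range, USD per TB


def incremental_dw_spend(growth_tb, cost_per_tb):
    """Return the incremental data warehouse spend in USD (hypothetical helper)."""
    return growth_tb * cost_per_tb


low, high = (incremental_dw_spend(GROWTH_TB, c) for c in DW_COST_PER_TB)
print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M")  # $2M to $10M
```

The result matches the slide's $2 to $10 million range.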

Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads (Future)
Hadoop offloads data and workloads to defer or avoid incremental spend and reduce data management TCO.
- Keep the right data (high value data) in the data warehouse system: operational analytics, reporting, business analytics
- Use Hadoop for everything else (lower value data): historical data, data processing, ad hoc exploratory analysis, transformation / batch, data hub
- Cloudera / Datameer (total cost of cluster): $1,000 - $2,000 / TB
- Incremental spend: $240,000 - $300,000 ACV
- Savings: $1.85 to $9.8 million

Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion

Assessing Workloads and Data
Data warehouse workloads: operational business intelligence, self-service BI, analytics, and data processing (ELT). Data: operational data, archival data, staged data.
- Data Processing (ELT): staged data, to be processed; temp tables, BLOB/CLOB types, …
- Analytics / Machine Learning: deep and broad data sets, within and beyond the warehouse
- Self-Service BI (Ad-Hoc Query): operational data, actively used for BI; archival data, inactively used for BI
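One way to make this assessment concrete is a lookup from workload type to a target platform. The Python sketch below is hypothetical, not a Datameer or Cloudera API: the category names are assumptions distilled from these slides, with operational BI, reporting and business analytics staying in the warehouse, and ELT, analytics / machine learning, self-service BI and archival data flagged as offload candidates.

```python
from dataclasses import dataclass

# Hypothetical categories distilled from the slides (not a vendor API).
KEEP_IN_EDW = {"operational_bi", "reporting", "business_analytics"}
OFFLOAD = {"elt", "analytics_ml", "self_service_bi", "archival"}


@dataclass
class Workload:
    name: str
    kind: str  # expected to be one of the category names above


def placement(w: Workload) -> str:
    """Suggest where a workload should run."""
    if w.kind in KEEP_IN_EDW:
        return "EDW"
    if w.kind in OFFLOAD:
        return "Hadoop"
    return "review"  # unknown kinds need a manual assessment


workloads = [
    Workload("nightly staging loads", "elt"),
    Workload("executive dashboards", "operational_bi"),
    Workload("churn model training", "analytics_ml"),
    Workload("legacy extract", "unknown"),
]
for w in workloads:
    print(w.name, "->", placement(w))
```

Returning "review" for unrecognized kinds keeps the sketch from silently moving a workload it knows nothing about.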

Offload Data Processing (ELT)
Key capabilities:
- Integrate any type of data with pre-built connectors
- High-scale batch data processing
- High availability, disaster recovery, downtime-less upgrades
- Low-latency SQL processing
Benefits of Cloudera and Datameer:
- Over 2X the performance at 1/10th the cost
- 96% reduction in ETL time
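The ELT pattern named here (load raw data first, then transform it inside the processing engine) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption purely for illustration; on a Hadoop cluster the same steps would run on the cluster's own SQL or batch engine. Table names and rows are hypothetical.

```python
import sqlite3

# SQLite stands in for the processing engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# 1. Extract + Load: land the raw, untyped rows in a staging table as-is.
conn.execute("CREATE TABLE staged_sales (store TEXT, amount TEXT)")
raw_rows = [("north", "12.50"), ("south", "7.25"), ("north", " 3.00 "), ("south", "")]
conn.executemany("INSERT INTO staged_sales VALUES (?, ?)", raw_rows)

# 2. Transform: clean, type and aggregate inside the engine with SQL.
conn.execute("""
    CREATE TABLE sales_by_store AS
    SELECT store, ROUND(SUM(CAST(TRIM(amount) AS REAL)), 2) AS total
    FROM staged_sales
    WHERE TRIM(amount) <> ''
    GROUP BY store
""")

for store, total in conn.execute("SELECT store, total FROM sales_by_store ORDER BY store"):
    print(store, total)
```

Because the raw rows stay in the staging table, the transformation can be changed and re-run later without extracting the data again.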

Offload Analytics / Machine Learning
What? Training and scoring predictive models on deep and broad data sets
Key capabilities:
- Drag-and-drop data mining and machine learning for a business analyst
- Automated support for clustering, recommendations, decision trees, and column dependencies
- Ability to run SAS and R natively on the same cluster
Benefits of Cloudera and Datameer:
- Greater flexibility at 1/10th the cost
- Expand data mining and machine learning to analysts
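Clustering, one of the automated techniques listed above, groups similar records, such as customers with similar spend. The sketch below is a tiny one-dimensional k-means (Lloyd's algorithm) in plain Python, written only to illustrate the idea; it is not how Datameer implements clustering, and the spend figures are made up.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (Lloyd's algorithm); assumes len(values) >= k."""
    data = sorted(values)
    # Deterministic initialisation: spread the initial centroids over the sorted data.
    step = (len(data) - 1) / max(k - 1, 1)
    centroids = [data[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids


annual_spend = [120, 135, 150, 900, 950, 1000]  # hypothetical customer spend, USD
print(kmeans_1d(annual_spend))  # [135.0, 950.0]
```

The two centroids suggest a low-spend and a high-spend segment; real segmentation would use more features and a library implementation.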

Offload Self-Service Business Intelligence
Workload: self-service BI, exploratory BI, data discovery; unknown questions
Key capabilities:
- 250+ prebuilt analytics functions
- Open source interactive SQL
- Transparency and governance
Benefits of Cloudera and Datameer:
- Better flexibility at 1/10th the cost
- Reduce analysis time from 4 weeks to 3 days

Complementing the Data Warehouse (diagram)
- Data warehouse (high $/byte): loaded via ETL from enterprise applications and OLTP systems; serves operational BI and query
- Cloudera / Datameer: integrate, storage, batch process, analyze, visualize; receives archived data; serves search, business intelligence, archival data, exploration and analytics

Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion

Process: Define → Integrate → Prepare and Analyze → Visualize and Validate → Deploy (from ad hoc to production)

Define
Identify, profile and assess, and prioritize candidates based on:
- Workloads in the EDW
- Constraints
- Use cases
- Ability to migrate
- Portability
- Return on investment
- Size of data set
- Disruption
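The prioritization step can be approximated with a weighted score per candidate workload. The sketch below is hypothetical: the 1 to 5 ratings, the weights and the choice of criteria (portability, return on investment, size of data set, disruption) are assumptions for illustration, not a method from the webinar.

```python
from dataclasses import dataclass

# Hypothetical weights; each criterion is rated 1 (poor) to 5 (good) by the team.
WEIGHTS = {"portability": 0.25, "roi": 0.35, "data_size": 0.2, "low_disruption": 0.2}


@dataclass
class Candidate:
    name: str
    portability: int  # how easily the workload moves to Hadoop
    roi: int          # expected return on investment
    data_size: int    # larger data sets gain more from offloading
    disruption: int   # 1 = minimal disruption, 5 = major disruption


def score(c: Candidate) -> float:
    ratings = {
        "portability": c.portability,
        "roi": c.roi,
        "data_size": c.data_size,
        "low_disruption": 6 - c.disruption,  # invert so less disruption scores higher
    }
    return sum(WEIGHTS[k] * v for k, v in ratings.items())


def prioritize(candidates):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


ranked = prioritize([
    Candidate("finance reports", portability=2, roi=2, data_size=2, disruption=5),
    Candidate("staging ELT", portability=5, roi=4, data_size=5, disruption=1),
])
print([c.name for c in ranked])
```

A weighted sum keeps the trade-offs explicit; teams would calibrate the weights against their own constraints.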

Integrate
- Migration: data ingest paths; map EDW workload to Cloudera
- Codeless integration: ELT, not ETL; 50+ Datameer connectors, plug-in API

Prepare and Analyze
- Interactive data preparation: ensure data quality; enrich data
- Interactive + smart analytics: 250+ built-in functions; automated machine learning
- Transparency + governance: visual data lineage; complete audit trail; metadata catalog

Visualize and Validate
- Visualization anywhere: infographic or dashboard; run on tablets and smartphones
- Validate: verify results; tune

Deploy
- Security: LDAP / Active Directory; role-based access control; support for Kerberos
- Scheduling: dependency triggers; data synchronization; external scheduling integration
- Monitoring: system, jobs, performance, throughput; error handling; log management
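Dependency triggers mean a job starts only after the jobs it depends on have finished. A minimal sketch of that ordering uses Python's standard graphlib module (Python 3.9+); the job names are hypothetical, and a real deployment would rely on its scheduler's own trigger mechanism.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
jobs = {
    "ingest_pos": set(),
    "ingest_logs": set(),
    "cleanse": {"ingest_pos", "ingest_logs"},
    "enrich": {"cleanse"},
    "dashboard_refresh": {"enrich"},
}

executed = []  # records the order in which jobs were triggered


def run(job):
    """Placeholder for real work; here the job is only recorded."""
    executed.append(job)


sorter = TopologicalSorter(jobs)
sorter.prepare()
while sorter.is_active():
    # get_ready() returns every job whose dependencies have all completed.
    for job in sorter.get_ready():
        run(job)
        sorter.done(job)  # marking it done can make its dependents ready

print(executed)
```

Jobs returned together by get_ready() have no dependencies on each other, so a scheduler could run them in parallel.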

Roles and Responsibilities
- Admin: set up and maintain the environment
- Business Analyst: work with partners to define requirements and goals
- Deployment Team: set up monitoring and scheduling
- ETL Architect: prepare and cleanse data

Roles Mapped to Process
- Define (BA): define goals, results, sources, requirements
- Integrate (Admin): source data, secure for ad hoc use
- Prepare & Analyze (BA / Architect): cleanse, combine, enrich data; create analysis
- Visualize (BA): create infographics, dashboards
- Deploy (Admin / Deployment Team): business: validate with end users; technical: secure, monitor, schedule

Use Cases: Customer, Operational, Fraud and Compliance

Customer: Reduce customer acquisition costs by 30%

Fraud and Compliance: Identify $2B in fraudulent transactions using POS reports, location data, transactions and authorizations

Operational: Improve customer service, development and sales using structured logs, unstructured logs and network data, which are doubling in size every 15 months

Calculating ROI is a process

Apply ROI to Multiple Projects

Calculating Return
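As a minimal sketch of the calculation, assuming the common definition ROI = (benefit - cost) / cost, the Python below computes ROI per project and for a portfolio of projects. The project names and dollar figures are hypothetical, not from the webinar.

```python
def roi(total_benefit, total_cost):
    """Simple return on investment: net gain divided by cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical projects: (annual benefit, annual cost) in USD.
projects = {
    "EDW offload": (1_500_000, 300_000),
    "fraud analytics": (900_000, 450_000),
}

for name, (benefit, cost) in projects.items():
    print(f"{name}: ROI {roi(benefit, cost):.0%}")

# Applying ROI across multiple projects: pool benefits and costs first.
portfolio = roi(sum(b for b, _ in projects.values()),
                sum(c for _, c in projects.values()))
print(f"portfolio: ROI {portfolio:.0%}")
```

Pooling benefits and costs weights each project by its size, unlike a plain average of the individual ROI figures.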

Business Benefits
- Funnel optimization: increase customer conversion by 3x
- Behavioral analytics: increase revenue by 2x
- Fraud prevention: identify $2B in potential fraud
- Customer segmentation: lower customer acquisition costs by 30%

EDW Optimization: Enterprise Data Warehouse
- Discover fraud in less time (from 2 days to 2 hours); save $30M on DR
- Avoid tens of millions in expansion purchases
- Offload 90% of all data
- Shrank EDW footprint by 4PB, 20x performance boost

Call to Action
- ROI and Solution Development Consultation
- Join us at Hadoop World
- Contacts: Jeff Bean, jwfbean@cloudera.com; Karen Hsu, khsu@datameer.com
