IEG 201402 INTUIT Building Big Data Analytics Platform


Published on February 14, 2014

Author: informationexcellence



Information Excellence Group 2014 Spring "Business Analytics Industry Summit", Building Big Data Analytics Platform, Neeta Pande, Data Architect, INTUIT


Building Big Data Analytics Platform at Intuit
Neeta Pande, 8/Feb/2014

Roadmap
• Setting context and introduction to the Analytics Platform at Intuit
• Key highlights that differentiate the platform
• Sharing experiences building the platform
• Wish-list of capabilities for the future of Big Data technologies

Setting Context and Intro to the Analytics Platform

Quick look into Intuit Offerings

Introduction to the Analytical Platform
• Central repository of analytical data from
  – Intuit products
  – Intuit business systems
  – Intuit master systems
  – External data sources
• Caters to
  – Product Managers
  – Product Developers
  – Data Analysts
  – Data Scientists
  – Experience Designers
Enterprise-wide platform for cross-Intuit data analytics

Technologies used to build the platform
[Figure: technology logos; only the HCatalog label is recoverable]

Key highlights that differentiate the platform

Capability View of the Platform
• Users: Management, PM, PD, Data Analyst, Data Scientist
• Policy-based access control
• Central Analytics Platform
• Data integration: batch, near real time, real time
• Data sources: product user-entered data, product usage data, business data, master data, external data

Key differentiators of the Platform
• DWH semantic layers on Hadoop
• Sensitive information co-hosted on the same infrastructure
• Batch, near real time, and real time on the same infrastructure
• Mobile, web, and desktop offerings
• Enterprise-wide data across all offerings and cross-offerings

Data Pipeline and Challenges
Pipeline stages (numbered 1 to 8 on the slide; only the first four numbers are recoverable): Data Acquisition (1), Data Securitization (2), Data Cleansing (3), Data Standardization (4), Entity Mastering, Incremental load, DWH load, Data Consumption
• Encryption of sensitive information
• Tokenization for join optimization on sensitive fields
• Analytical information must be extracted before encryption
• Loading data from transactional sources is a challenge
• Cleansing and standardization need third-party libraries
• Both are part of the same flow and need Hadoop integration
• DWH patterns such as SCD, surrogate keys, and fact updates are challenging
• MDM solutions from major vendors do not provide mastering in Hadoop
• Interactive exploration happens in an MPP RDBMS because of advanced SQL and query performance
• Sampling and extraction for building models in R
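The slide does not say how tokens are generated, so the following is only a minimal sketch of the idea behind "tokenization for join optimization on sensitive fields". It assumes a deterministic keyed hash (HMAC-SHA256), which is one common way to produce a token that is the same for equal inputs and can therefore serve as a join key without exposing the raw value. The key, the field names, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real key would come from a key
# manager, never from source code.
TOKEN_KEY = b"demo-only-tokenization-key"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic keyed token for a sensitive value.

    Equal inputs (after trimming and lower-casing) always map to the same
    token, so the token can replace the raw value as a join key.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Two hypothetical datasets that share a sensitive field (an email address).
customers = [
    {"email": "Ann@Example.com", "segment": "small business"},
    {"email": "bob@example.com", "segment": "consumer"},
]
usage_events = [
    {"email": "ann@example.com", "feature": "invoicing"},
    {"email": "bob@example.com", "feature": "tax filing"},
    {"email": "ann@example.com", "feature": "payroll"},
]

# Replace the sensitive field with its token before the data is shared.
customers_tok = [
    {"email_token": tokenize(c["email"]), "segment": c["segment"]}
    for c in customers
]
events_tok = [
    {"email_token": tokenize(e["email"]), "feature": e["feature"]}
    for e in usage_events
]

# Join on the token alone; no raw email address is needed at query time.
segment_by_token = {c["email_token"]: c["segment"] for c in customers_tok}
joined = [(segment_by_token[e["email_token"]], e["feature"]) for e in events_tok]

for segment, feature in joined:
    print(segment, feature)
```

A deterministic token trades some privacy for joinability: anyone holding the key can test guesses against a token, which is one reason the key would need the same protection as an encryption key.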

Sharing Experiences building the platform

• Custom implementation of a mastering solution in Hadoop
• Custom implementation of symmetric-key encryption/decryption
• Hadoop does not provide an out-of-the-box solution
• Leading MDM solutions do not have Hadoop integration
• Evaluated third-party solutions; not mature enough
• Some open-source tools have MDM capabilities, but they are not mature or widely adopted
• Key management using an HSM (SafeNet)
• Decryption UDFs in MR, Pig, and Hive shield developers and users from the security implementation
• Evaluated Informatica Data Quality and found it a good fit for data cleansing and standardization, integrated in the same flow as batch data integration
• Batch data integration: evaluated the Big Data Integration capabilities of Informatica and found them relevant for the platform
• Real time: using Flume for real-time use cases; found Kafka and Storm to be a good fit for several requirements
• Traditional DWH and incremental loads are challenging on Hadoop
• Upserts and SCD are handled best in HBase and exposed via HCatalog for querying
• Ad hoc query capabilities are still not mature or widely adopted, so an MPP RDBMS is still preferred
• Large-scale machine learning infrastructure is still being adopted, so widely used technology options are not yet in place
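To make the "upserts and SCD" point concrete, here is a small in-memory sketch of a Type 2 slowly changing dimension with surrogate keys. It is plain Python, not HBase or HCatalog code, and it is not the implementation described in the talk; the class names, fields, and sample values are hypothetical. It only shows why the pattern needs in-place updates (closing the old version), which is awkward on append-only storage.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import List, Optional


@dataclass
class DimRow:
    surrogate_key: int          # warehouse-generated key, unique per version
    natural_key: str            # business key from the source system
    attributes: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


class CustomerDimension:
    """Hypothetical Type 2 dimension kept in memory for illustration."""

    def __init__(self) -> None:
        self.rows: List[DimRow] = []
        self._next_key = count(1)

    def current(self, natural_key: str) -> Optional[DimRow]:
        """Return the open (current) version for a business key, if any."""
        for row in self.rows:
            if row.natural_key == natural_key and row.valid_to is None:
                return row
        return None

    def upsert(self, natural_key: str, attributes: dict, as_of: date) -> DimRow:
        """Insert a new key, or version an existing one if attributes changed."""
        existing = self.current(natural_key)
        if existing is not None:
            if existing.attributes == attributes:
                return existing          # nothing changed, keep current version
            existing.valid_to = as_of    # the in-place update: close old version
        new_row = DimRow(next(self._next_key), natural_key, dict(attributes), as_of)
        self.rows.append(new_row)
        return new_row


dim = CustomerDimension()
dim.upsert("cust-42", {"plan": "basic"}, date(2014, 1, 1))
dim.upsert("cust-42", {"plan": "premium"}, date(2014, 2, 1))
dim.upsert("cust-42", {"plan": "premium"}, date(2014, 3, 1))  # no change

for row in dim.rows:
    print(row.surrogate_key, row.attributes, row.valid_from, row.valid_to)
```

The step that sets `valid_to` on the old row is the part that requires a mutable store; the rest is append-only.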

Wish-list for future of Hadoop

• Data security support built into the platform
• MDM solutions integrated and optimized for the platform
• Interactive querying capabilities on big data platforms (Impala, Tez)
• Better support for traditional DWH capabilities
• Integrated real-time, near-real-time, and batch processing pipelines
• Distributed machine learning technologies with comprehensive and advanced capabilities
• Open-source end-to-end data quality solutions integrated with the platform

Q&A Thank you

About Information Excellence Group Community Focused Volunteer Driven Knowledge Share Accelerated Learning Collective Excellence Distilled Knowledge Shared, Non Conflicting Goals Validation / Brainstorm platform Progress Information Excellence Towards an Enriched Profession, Business and Society Mentor, Guide, Coach Satisfied, Empowered Professional Richer Industry and Academia

