Published on February 14, 2014
INTUIT: Neeta Pande Building Big Data Analytics Platform at Intuit
Building Big Data Analytics Platform at Intuit Neeta Pande 8/Feb/2014
Roadmap
• Setting context and introduction to the Analytics Platform at Intuit
• Key highlights that differentiate the platform
• Sharing experiences building the platform
• Wish-list of capabilities for the future of Big Data technologies
Setting Context and Intro to the Analytics Platform
Quick look into Intuit Offerings
Introduction to the Analytical Platform
• Central repository of Analytical Data from
– Intuit products
– Intuit Business Systems
– Intuit Master Systems
– External Data Sources
• Caters to
– Product Managers
– Product Developers
– Data Analysts
– Data Scientists
– Experience Designers
Enterprise Wide Platform for cross Intuit Data Analytics
Technologies used to build the platform [logo slide: HCatalog]
Key highlights that differentiates the platform
Capability View of the Platform
• Consumers: Management, PM, PD, Data Analyst, Data Scientist
• Policy based Access Control
• Central Analytics Platform
• Data Integration: Batch, Near Realtime, Realtime
• Data sources: Product User Entered Data, Product Usage Data, Business Data, Master Data, External Data
Key differentiators of the Platform
• DWH Semantic layers on Hadoop
• Cohost Sensitive Information on same infrastructure
• Batch, Near Real Time, Real time on the same infrastructure
• Mobile, Web, Desktop Offerings
• Enterprise wide data across all offerings and cross-offerings
Data Pipeline and Challenges
Pipeline stages (numbered 1 to 8 on the slide): Data Acquisition (1), Data Securitization (2), Data Cleansing (3), Data Standardization (4), Entity Mastering, DWH load, Incremental load, Data Consumption
• Encryption of sensitive information
• Tokenization for join optimization on sensitive fields
• Extract analytical information before encryption
• Loading data from transactional sources is a challenge
• Cleansing and standardization need third-party libraries that are part of the same flow and integrate with Hadoop
• DWH patterns like SCD, surrogate keys and fact updates are challenging
• MDM solutions from major vendors do not provide mastering in Hadoop
• Interactive exploration happens in an MPP RDBMS because of advanced SQL and query performance
• Sampling and extraction for building models in R
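The tokenization and "extract before encryption" points can be illustrated with a small sketch. It is not the implementation described in the talk: it assumes a keyed hash (HMAC-SHA256) as the tokenization scheme, and the key, field names and helper functions (`TOKEN_KEY`, `tokenize`, `secure_record`) are hypothetical. The idea is that equal sensitive values map to equal tokens, so two datasets can be joined on the token without decrypting anything.

```python
import hashlib
import hmac

# Hypothetical demo key; the deck mentions HSM-based key management instead.
TOKEN_KEY = b"demo-key-not-for-production"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic, keyed token for a sensitive value."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def secure_record(record: dict, sensitive_field: str) -> dict:
    """Replace the sensitive field with a token, keeping a derived attribute."""
    secured = dict(record)
    raw = secured.pop(sensitive_field)
    # Extract analytical information before the raw value is hidden:
    # here only the e-mail domain, which is an illustrative choice.
    secured["email_domain"] = raw.strip().lower().split("@")[-1]
    secured[sensitive_field + "_token"] = tokenize(raw)
    return secured


users = [{"user_id": 1, "email": "Ann@Example.com"}]
orders = [{"order_id": 10, "email": "ann@example.com ", "amount": 25.0}]

secured_users = [secure_record(u, "email") for u in users]
secured_orders = [secure_record(o, "email") for o in orders]

# Join on the token; the raw e-mail never appears in the joined output.
by_token = {u["email_token"]: u for u in secured_users}
joined = [
    {"order_id": o["order_id"], "user_id": by_token[o["email_token"]]["user_id"]}
    for o in secured_orders
    if o["email_token"] in by_token
]
```

A keyed hash is one-way, so this sketch covers only the join use case; recovering the original value would still go through the encrypted copy of the field.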
Sharing Experiences building the platform
• Custom implementation of a mastering solution in Hadoop
• Custom implementation of symmetric key encryption/decryption
• Hadoop does not provide an out-of-the-box solution
• Leading MDM solutions do not have Hadoop integration
• Evaluated third-party solutions; not mature enough
• Some open source tools have MDM capabilities, but are not mature or widely adopted
• Key management using HSM (SafeNet)
• Decryption UDFs in MR, Pig and Hive shield developers/users from the security implementation
• Evaluated Informatica Data Quality and found it a good fit for data cleansing and standardization, integrated in the same flow as batch data integration
• Batch data integration: evaluated the Big Data Integration capabilities of Informatica and found them relevant for the platform
• Real time: using Flume for real time use cases; found Kafka and Storm to be a good fit from the point of view of several requirements
• Traditional DWH and incremental loads are challenging on Hadoop
• Upserts and SCD are handled best in HBase and exposed via HCatalog for querying
• Ad hoc query capabilities are still not mature or widely adopted, hence MPP-RDBMS is still preferred
• Large-scale machine learning infrastructure is still being adopted, hence widely used technology options are not in place
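To make the upsert and SCD point concrete, here is a minimal sketch of a Type 2 slowly changing dimension with surrogate keys. It is an in-memory stand-in for a keyed store, not an HBase client and not the platform's code; the class and field names (`Scd2Dimension`, `DimensionRow`, `valid_from`, `valid_to`) are hypothetical. A keyed store suits this pattern because each natural key can be looked up and updated in place, which append-only files on HDFS do not offer directly.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class DimensionRow:
    surrogate_key: int
    natural_key: str
    attributes: dict
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


class Scd2Dimension:
    """In-memory stand-in for a keyed store; illustrative only."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[DimensionRow]] = {}
        self._next_key = count(1)

    def upsert(self, natural_key: str, attributes: dict, as_of: str) -> DimensionRow:
        history = self._rows.setdefault(natural_key, [])
        current = history[-1] if history else None
        if current is not None and current.attributes == attributes:
            return current  # unchanged record, nothing to write
        if current is not None:
            current.valid_to = as_of  # close the previous version
        row = DimensionRow(next(self._next_key), natural_key, dict(attributes), as_of)
        history.append(row)
        return row

    def current(self, natural_key: str) -> Optional[DimensionRow]:
        history = self._rows.get(natural_key, [])
        return history[-1] if history else None

    def history(self, natural_key: str) -> List[DimensionRow]:
        return list(self._rows.get(natural_key, []))


dim = Scd2Dimension()
dim.upsert("cust-42", {"plan": "basic"}, as_of="2014-01-01")
dim.upsert("cust-42", {"plan": "basic"}, as_of="2014-01-15")  # unchanged, ignored
dim.upsert("cust-42", {"plan": "premium"}, as_of="2014-02-01")
```

After these three calls the dimension holds two versions for `cust-42`: the closed "basic" row and the current "premium" row with a new surrogate key.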
Wish-list for future of Hadoop
• Data Security support built in to the platform
• MDM solutions integrated and optimized for the platform
• Interactive querying capabilities on the big data platforms (Impala, Tez)
• Better support for traditional DWH capabilities
• Integrated Real time, Near real time and Batch processing pipelines
• Distributed machine learning technologies with comprehensive and advanced capabilities
• Opensource end to end data quality solutions integrated with the platform
Q&A Thank you
About Information Excellence Group
• Community Focused
• Volunteer Driven
• Knowledge Share
• Accelerated Learning
• Collective Excellence
• Distilled Knowledge
• Shared, Non Conflicting Goals
• Validation / Brainstorm platform
• Progress Information Excellence Towards an Enriched Profession, Business and Society
• Mentor, Guide, Coach
• Satisfied, Empowered Professional
• Richer Industry and Academia
About Information Excellence Group
Reach us at:
blog: http://informationexcellence.wordpress.com/
presentations: http://www.slideshare.net/informationexcellence
linked in: http://www.linkedin.com/groups/Information-Excellence-3893869
Facebook: http://www.facebook.com/pages/Information-excellence-group/171892096247159
Google+: https://plus.google.com/u/0/communities/102316155996060621595
twitter: #infoexcel
email: email@example.com, firstname.lastname@example.org
Have you enriched yourself by contributing to the community Knowledge Share?