OOP 2014

Information about OOP 2014

Published on February 4, 2014

Author: EmilAndreasSiemes

Source: slideshare.net


Data-Lake talk at OOP 2014

Hortonworks: We Do Hadoop.
Our mission is to enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
Emil A. Siemes, Solution Engineer
esiemes@hortonworks.com
January 2014

Our Mission: Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
Headquarters: Palo Alto, CA
Employees: 300+ and growing
Trusted Partners
Our Commitment
• Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process
• Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
• Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills
Page 2

A Traditional Approach Under Pressure
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Data growth (Source: IDC): 2.8 ZB in 2012, 85% from New Data Types, 15x Machine Data by 2020, 40 ZB by 2020
Page 3

Emerging Modern Data Architecture
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DEV & DATA TOOLS: BUILD & TEST
OPERATIONAL TOOLS: MANAGE & MONITOR
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Page 4

Drivers of Hadoop Adoption
New Business Applications: From NEW types of Data (or existing types for longer)
Page 5

Most Common NEW TYPES OF DATA
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
+ Keep existing data longer!

Drivers of Hadoop Adoption
New Business Applications
Architectural: A Modern Data Architecture
Complement your existing data systems: the right workload in the right place
Page 7

Let’s build a Data Lake… Instructions on: hadoopwrangler.com Page 8

HDP Data Lake Solution Architecture
Manage Steps 1-4: Data Lifecycle with Falcon (data pipeline & flow management)
Step 1: Extract & Load (ingestion via Sqoop, Flume, WebHDFS, NFS; inputs: File, JMS, REST, HTTP, Streaming)
Step 2: Model/Apply Metadata (HCatalog: table & user-defined metadata)
Step 3: Transform, Aggregate & Materialize (Hive, Pig, Mahout: data processing)
Step 4: Schedule and Orchestrate (Oozie: batch scheduler)
Source data: ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW
Data Lake HDP Grid (compute & storage): YARN, MR2, Tez, Storm, HDFS, HBase
Use Case Type 1: Materialize & Exchange (Sqoop/Hive to downstream data sources: EDW (Teradata), OLTP, HBase Client)
Use Case Type 2: Explore/Visualize (interactive Hive Server (Tez/Stinger) for query/analytics/reporting tools: Tableau/Excel, Datameer/Platfora/SAP, SAS)
Streaming YARN Apps: Stream Processing, Real-time Search (Elastic Search), MPI
Ambari
Knox – Perimeter Level Security
Opens up Hadoop to many new use cases
Page 9
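
The four numbered steps on this slide can be illustrated without any Hadoop component. What follows is a minimal in-memory sketch in plain Python, not the HDP implementation: the function names, the sample clickstream lines and the schema are all hypothetical, and each function only stands in for the step of the same number.

```python
from collections import Counter

# Hypothetical raw clickstream lines in the form "user_id,page,timestamp".
RAW_LINES = [
    "u1,/home,2014-01-20T10:00:00",
    "u2,/home,2014-01-20T10:01:00",
    "u1,/pricing,2014-01-20T10:02:00",
]

# Hypothetical schema; it plays the role of the Step 2 metadata.
SCHEMA = ("user_id", "page", "timestamp")


def extract_and_load(lines):
    """Step 1: land the raw data unchanged."""
    return list(lines)


def apply_metadata(raw, schema):
    """Step 2: attach field names to each raw line."""
    return [dict(zip(schema, line.split(","))) for line in raw]


def transform_and_aggregate(records):
    """Step 3: materialize an aggregate, here page views per page."""
    return dict(Counter(record["page"] for record in records))


def run_pipeline(lines, schema):
    """Step 4: run the steps in a fixed order (the orchestration role)."""
    raw = extract_and_load(lines)
    records = apply_metadata(raw, schema)
    return transform_and_aggregate(records)


if __name__ == "__main__":
    print(run_pipeline(RAW_LINES, SCHEMA))
```

Keeping the raw lines untouched in step 1 and applying the schema only in step 2 mirrors the order of the steps on the slide: the data is loaded first and modeled afterwards.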

Hadoop 2: The Introduction of YARN
Store all data in a single place, interact in multiple ways
1st Gen of Hadoop: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)
HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
• Standard Query Processing: Hive, Pig
• Batch: MapReduce
• Interactive: Tez
• Online Data Processing: HBase, Accumulo
• Real Time Stream Processing: Storm
• … others
• Efficient Cluster Resource Management & Shared Services (YARN)
• Redundant, Reliable Storage (HDFS)
Page 10

Let’s start simple…
• A solution unifying all data sources of a mobile app
  – Allowing analytics over all data in one place
  – In real time and long term
• Mobile apps have multiple channels for data:
  – Data created on the handset (e.g. geo location)
  – Data created on servers accessed by the mobile app (e.g. app data, logs)
  – Data from backend services (e.g. RDBMS)
  – Store data (e.g. iTunes Connect, Google Play)
  – Social data (Twitter, App Reviews, etc.)
Page 11

Why Should We Care?
• How much revenue did I make? (Not as easy to answer as one might think)
• Where are my customers now?
• Can you fulfill requirements from the business like: “Tell me when our customers are in a coffee shop so we can offer them e.g. Wifi”
• What are my customers thinking about my app/brand?
• Are the ones complaining really using it (correctly)?
• How can I support marketing activities?
• How can I evaluate local marketing activities?
• Does positive/negative sentiment affect my downloads?
• Will my servers be able to deal with the load in 3 months?
• …
Page 12
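
The coffee-shop requirement quoted above comes down to a proximity check between handset positions and known shop locations. The following is a minimal sketch under stated assumptions, not something shown in the talk: the shop list, the customer positions, the 50 m radius and the function names are all hypothetical. It uses the haversine great-circle distance, which is one common choice for this kind of check.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def customers_in_shops(positions, shops, radius_m=50.0):
    """Return (customer_id, shop_name) pairs for customers within radius_m of a shop."""
    hits = []
    for customer_id, (lat, lon) in positions.items():
        for shop_name, (shop_lat, shop_lon) in shops.items():
            if haversine_m(lat, lon, shop_lat, shop_lon) <= radius_m:
                hits.append((customer_id, shop_name))
    return hits


# Hypothetical sample data: one shop and two customer positions.
SHOPS = {"Cafe A": (48.1374, 11.5755)}
POSITIONS = {
    "c1": (48.13741, 11.57551),  # a few metres from the shop
    "c2": (48.15, 11.60),        # well over a kilometre away
}

if __name__ == "__main__":
    print(customers_in_shops(POSITIONS, SHOPS))
```

The nested loop is adequate for a handful of shops. With many shops and a continuous stream of positions, a spatial index or grid bucketing would be the usual next step.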

Design Goals
• Use as much as we have in our stack as possible
• Minimize dependencies on stacks beyond Hadoop
  – Still make it useful and complete
• Make it fit into an 8GB MacBook/Laptop
• Release early & release often
Page 13

iiCaptain Page 14

Types Of Data For iiCaptain
• Geo location data
• Store Data
  – iTunes Connect, Google Play, Amazon via AppAnnie
• Twitter
• RDBMS (Sqoop)
• Logs
Page 15

iiCaptain’s Data Ocean / Data Lake Page 16

More Details Page 17

Analytics Page 18

SQL Interactive Query & Apache Hive
Apache Hive
• The de facto standard for Hadoop SQL access
• Used by your current data center partners
• Built for batch AND interactive query
Stinger Initiative
Broad, community based effort to deliver the next generation of Apache Hive
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support broadest range of SQL semantics for analytic applications against Hadoop
Key Services: Platform, operational and data services essential for the enterprise
Skills: Leverage your existing skills: development, analytics, operations
Integration: Interoperable with existing data center investments
Page 19

Build Process, Shining With Savanna Page 20

Roadmap
• Servlet Engine in YARN
• Project Savanna: Continuous Delivery end-2-end
• Sentiment Analysis with Flume/Hive and App Reviews
• Knox
• Falcon
• Phoenix
Page 21

HDP 2.0: Enterprise Hadoop Platform
Hortonworks Data Platform (HDP)
OPERATIONAL SERVICES: AMBARI (Cluster Mgmnt), FALCON* (Dataset Mgmnt), OOZIE (Schedule)
DATA SERVICES: FLUME, SQOOP (Data Movement); LOAD & EXTRACT: NFS, WebHDFS, KNOX*; HIVE & HCATALOG, PIG, HBASE (Data Access)
CORE SERVICES: MAP REDUCE, TEZ (Process); YARN (Resource Management); HDFS (Storage)
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
OS/VM, Cloud, Appliance
• Integrates full range of enterprise-ready services
• The ONLY 100% open source and most current platform
• Certified and tested at scale
• Engineered for deep ecosystem interoperability
Page 22

Hortonworks: The Value of “Open” for You
Validate & Try
1. Download the Hortonworks Sandbox
2. Learn Hadoop using the technical tutorials
3. Investigate a business case using the step-by-step business case scenarios
4. Validate YOUR business case using your data in the sandbox
Engage
1. Execute a Business Case Discovery Workshop with our architects
2. Build a business case for Hadoop today
Connect With the Hadoop Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community
Avoid Vendor Lock-In: Hortonworks Data Platform remains as close to the open source trunk as possible and is developed 100% in the open so you are never locked in
The Partners you Rely On, Rely On Hortonworks: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments
Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use
Support from the Experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience
Page 23
