Published on February 24, 2014
Adam Muise – Solution Architect, Hortonworks. HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
Who are we?
Who is Hortonworks?
- 100% Open Source – Democratized access to data
- The leaders of Hadoop's development – We do Hadoop
- Drive innovation in the platform – We lead the roadmap
- Community driven, Enterprise focused
We do Hadoop successfully: Support, Training, Professional Services
Enter the Hadoop. (Image: http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/)
Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
Traditional architecture didn't scale enough… [diagram: tiers of apps fanned out across ever more databases and SANs]
Databases can become bloated and useless
Traditional architectures cost too much at that volume… $upercomputing, $pecial hardware, high $/TB
So what is the answer?
If you could design a system that would handle this, what would it look like?
It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [diagram: a grid of storage nodes]
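To make "resilient and self-healing" concrete, here is a toy sketch of the core idea (not HDFS itself, and the node names and block size are made up for illustration): split a file into fixed-size blocks, keep several copies of each block on different nodes, and note that losing any one node still leaves every block readable.

```python
import itertools

BLOCK_SIZE = 4    # bytes per block; HDFS uses 128 MB, tiny here for illustration
REPLICATION = 3   # copies of every block, mirroring HDFS's default of 3

def place_blocks(data, nodes):
    """Split data into fixed-size blocks and assign each block to
    REPLICATION distinct nodes, round-robin across the cluster."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    ring = itertools.cycle(nodes)
    return {idx: [next(ring) for _ in range(REPLICATION)]
            for idx in range(len(blocks))}

def surviving_replicas(placement, dead_node):
    """What remains after one node fails: every block should still
    have copies elsewhere, which the cluster can then re-replicate."""
    return {idx: [n for n in nodes if n != dead_node]
            for idx, nodes in placement.items()}

# A 12-byte 'file' spread over a hypothetical 4-node cluster.
placement = place_blocks(b"hello world!", ["n1", "n2", "n3", "n4"])
after_failure = surviving_replicas(placement, "n1")
```

Because every block lives on three distinct nodes, killing `n1` leaves at least two copies of each block, and a real system would copy those back up to three in the background.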
It would probably need a completely parallel processing framework that took tasks to the data… [diagram: processing co-located with storage on every node]
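That framework is MapReduce, and its three phases can be sketched in a few lines of plain Python. This is a single-process word count mimicking what the framework does across many nodes: mappers emit key–value pairs where the data lives, the framework shuffles them by key, and reducers aggregate each group.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: runs on the node holding the data; emits (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: the framework groups all values by key between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregates the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop takes tasks to the data", "the data stays put"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
```

On a real cluster the mapper and reducer are the only parts you write; distribution, shuffling, and failure handling are the framework's job.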
It would probably run on commodity hardware, virtualized machines, and common OS platforms
It would probably be open source so innovation could happen as quickly as possible
It would need a critical mass of users
Apache Hadoop: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume
Hortonworks Data Platform: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume
We are going to learn how to work with Hadoop in less than an hour.
To do this, we need to install Hadoop, right?
Enter the Sandbox.
The Sandbox is 'Hadoop in a Can'. It contains one copy of each of the Master and Worker node processes used in a cluster, only packed into a single virtual Linux node instead of spread across many machines.
Getting started with the Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the Sandbox VM and find the IP it displays
- Go to that address in a browser, e.g. http://172.16.130.131
- Register, then click on 'Start Tutorials'
- In the left-hand nav, click on 'HCatalog, Basic Pig & Hive Commands'
In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
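The Hive and Pig steps above both boil down to the same kind of aggregation over a file you have landed in HDFS: Hive expresses it as SQL (`SELECT key, COUNT(*) … GROUP BY key`), Pig as a dataflow (`GROUP … BY`, then `FOREACH … GENERATE`). Here is that logic sketched in plain Python over a made-up CSV; the file contents and column names are hypothetical, purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV file landed in HDFS.
sample = io.StringIO("city,amount\nToronto,10\nOttawa,5\nToronto,7\n")
rows = list(csv.DictReader(sample))

# Hive-style:  SELECT city, COUNT(*) FROM sales GROUP BY city;
counts = Counter(row["city"] for row in rows)

# Pig-style:   grouped = GROUP sales BY city;
#              sums = FOREACH grouped GENERATE group, SUM(sales.amount);
sums = {}
for row in rows:
    sums[row["city"]] = sums.get(row["city"], 0) + int(row["amount"])
```

The point of the tutorial is that once HCatalog holds the schema, Hive and Pig both read the same table by name, so you can pick whichever style of the two queries above suits the job.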
Try the other tutorials.
Hadoop is the new Modern Data Architecture for the Enterprise
There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation. © Hortonworks Inc. 2012