2014 feb 24_big_datacongress_hadoopsession1_hadoop101

Technology

Published on February 24, 2014

Author: adammuise

Source: slideshare.net

Description

A hands-on introduction to Hadoop using the Hortonworks Sandbox

Adam Muise – Solution Architect, Hortonworks
HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX

Who are we?

Who is Hortonworks?

We do Hadoop
- 100% Open Source – Democratized Access to Data
- The leaders of Hadoop’s development
- Drive Innovation in the platform – We lead the roadmap
- Community driven, Enterprise Focused

We do Hadoop successfully: Support, Training, Professional Services

Enter the Hadoop.
[Image source: http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/]

Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn

Traditional architecture didn’t scale enough…
[Diagram: groups of App servers, each backed by DB servers on a SAN]

Databases can become bloated and useless

Traditional architectures cost too much at that volume…
[Diagram labels: $upercomputing, $/TB, $pecial Hardware]

So what is the answer?

If you could design a system that would handle this, what would it look like?

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: a grid of Storage nodes]
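As a rough illustration of what "resilient" and "self-healing" can mean for a distributed file system, here is a minimal Python sketch. It is a toy model, not how HDFS is implemented: the node names, the block IDs, the replica count of 3, and the helper functions `place_blocks` and `heal` are all assumptions made for this example.

```python
import random

REPLICATION = 3  # assumed replica count for this toy model


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct storage nodes."""
    rng = random.Random(seed)
    return {b: set(rng.sample(sorted(nodes), replication)) for b in block_ids}


def heal(placement, live_nodes, replication=REPLICATION, seed=0):
    """Drop replicas on dead nodes, then re-replicate onto live nodes."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        survivors = holders & live_nodes
        if not survivors:
            raise RuntimeError(f"block {block} lost all replicas")
        candidates = sorted(live_nodes - survivors)
        missing = min(replication - len(survivors), len(candidates))
        healed[block] = survivors | set(rng.sample(candidates, missing))
    return healed


# Hypothetical five-node cluster storing three blocks of one file.
nodes = {"node1", "node2", "node3", "node4", "node5"}
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)

# node2 fails; every block is brought back up to three replicas.
after_failure = heal(placement, nodes - {"node2"})
```

Because every block lives on several cheap machines, losing one machine costs no data, and the system can restore the replica count on its own.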

It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: a grid of nodes, each pairing Processing with Storage]
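The idea of taking tasks to the data can be sketched as a word count in plain Python. This is only a single-process simulation under stated assumptions: `node_storage`, `map_task`, and `reduce_task` are hypothetical names, and a real cluster would run each map task on the machine that holds the data.

```python
from collections import Counter
from functools import reduce

# Hypothetical layout: each node already stores some lines of one large file.
node_storage = {
    "node1": ["hadoop stores data", "hadoop processes data"],
    "node2": ["tasks go to the data"],
    "node3": ["data stays where it is stored"],
}


def map_task(lines):
    """Runs where `lines` are stored; returns a small partial word count."""
    return Counter(word for line in lines for word in line.split())


def reduce_task(left, right):
    """Merges two partial counts; only these small summaries are moved."""
    return left + right


# One map task per node, then a reduce over the partial results.
partials = [map_task(lines) for lines in node_storage.values()]
totals = reduce(reduce_task, partials, Counter())
```

The raw lines never leave their node in this model; only the compact per-node counts are combined, which is the point of moving processing to storage rather than the reverse.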

It would probably run on commodity hardware, virtualized machines, and common OS platforms
[Diagram: a grid of nodes, each pairing Processing with Storage]

It would probably be open source so innovation could happen as quickly as possible

It would need a critical mass of users

Apache Hadoop (component diagram): Tez, Storm, YARN, Pig, HDFS, MapReduce, HCatalog, Hive, HBase, Ambari, Knox, Sqoop, Falcon, Flume

Hortonworks Data Platform (component diagram): Storm, Tez, Pig, YARN, HDFS, MapReduce, HCatalog, Hive, HBase, Ambari, Knox, Sqoop, Falcon, Flume

We are going to learn how to work with Hadoop in less than an hour.

To do this, we need to install Hadoop, right?

Nope.

Enter the Hortonworks Sandbox.

The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node.
[Diagram: paired Processing and Storage nodes of a cluster, alongside a single Processing and Storage pair inside one Linux VM]

Getting started with the Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the Sandbox VM
- Find the IP address displayed when the VM starts
- Go to that address in a browser, e.g. http://172.16.130.131
- Register
- Click on ‘Start Tutorials’
- On the left-hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’

In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
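The four tutorial steps above can be mimicked on a laptop to get a feel for the workflow before opening the Sandbox. The sketch below is an analogy only: it uses Python's built-in `csv` and `sqlite3` modules as stand-ins and calls no Hadoop, HCatalog, Hive, or Pig API. The `trips` table and its sample rows are made up for illustration and are not the tutorial's data.

```python
import csv
import io
import sqlite3

# 1. "Land" a file: raw CSV text stands in for a file uploaded to HDFS.
raw_file = io.StringIO("driver,miles\nA1,120\nA2,300\nA1,80\n")

# 2. Assign metadata: declare column names and types for the raw data
#    (the role HCatalog plays in the tutorial).
schema = [("driver", "TEXT"), ("miles", "INTEGER")]
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{name} {ctype}" for name, ctype in schema)
conn.execute(f"CREATE TABLE trips ({columns})")
rows = [(r["driver"], int(r["miles"])) for r in csv.DictReader(raw_file)]
conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)

# 3. Query with SQL (the role Hive plays in the tutorial).
sql_totals = dict(conn.execute(
    "SELECT driver, SUM(miles) FROM trips GROUP BY driver"))

# 4. Compute the same result as a step-by-step data flow
#    (the style of processing Pig is used for).
flow_totals = {}
for driver, miles in rows:
    flow_totals[driver] = flow_totals.get(driver, 0) + miles
```

Both routes produce the same per-driver totals; the difference is whether the work is expressed as a declarative query or as an explicit sequence of steps.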

Try the other tutorials.

Hadoop is the new Modern Data Architecture for the Enterprise

There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
