A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters


A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters

Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose
miguel.xavier@acad.pucrs.br
Faculty of Informatics, PUCRS
Porto Alegre, Brazil

February 13, 2014

Outline
• Introduction
• Container-based Virtualization
• MapReduce
• Evaluation
• Conclusion

Introduction
• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios
  • Better resource sharing
  • Cloud computing
• However, hypervisor-based technologies have traditionally been avoided in MapReduce environments

Container-based Virtualization
• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system

[Figure: container-based virtualization (guest processes on a shared host OS) vs. hypervisor-based virtualization (guest OSes running on a virtualization layer above the host OS)]

Container-based Virtualization
• Each container has:
  • Its own network interface (and IP address)
    • Bridged, routed, …
  • Its own filesystem
  • Isolation (security): containers A and B can't see each other
  • Isolation (resource usage): RAM, CPU, I/O
• Current systems
  • Linux-VServer, OpenVZ, LXC

Container-based Virtualization
• Implements Linux namespaces (see the sketch after this slide)
  • Mount – mounting/unmounting file systems
  • UTS – hostname, domain name
  • IPC – SysV message queues, semaphores, shared memory segments
  • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID – its own set of PIDs
• Chroot is a filesystem namespace
• Current systems
  • Linux-VServer, OpenVZ, LXC
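
Not on the original slide, but to make the namespace mechanism concrete, here is a minimal Go sketch that starts a shell in fresh UTS and PID namespaces. It is Linux-only, needs root (CAP_SYS_ADMIN), and /bin/sh as the payload is an arbitrary choice.

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in new UTS and PID namespaces (Linux only, run as root).
	// Inside it, `hostname container-a` changes only the namespaced hostname,
	// and the shell sees itself as PID 1.
	cmd := exec.Command("/bin/sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```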

Container-based Systems
• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
• Linux Containers (LXC)
  • Based on cgroups (illustrated below)
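
As a rough illustration of the cgroups interface that LXC builds on, the Go sketch below caps a process group at half a CPU via the cgroup v1 CPU controller. The mount point /sys/fs/cgroup/cpu, the group name "demo", and root privileges are all assumptions about the host.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	cg := "/sys/fs/cgroup/cpu/demo" // assumes cgroup v1 CPU controller mounted here
	must(os.MkdirAll(cg, 0755))
	// Allow 50 ms of CPU time per 100 ms period: half of one core.
	must(os.WriteFile(filepath.Join(cg, "cpu.cfs_period_us"), []byte("100000"), 0644))
	must(os.WriteFile(filepath.Join(cg, "cpu.cfs_quota_us"), []byte("50000"), 0644))
	// Move the current process (and its future children) into the group.
	must(os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(fmt.Sprint(os.Getpid())), 0644))
	fmt.Println("pid", os.Getpid(), "is now CPU-limited by", cg)
}
```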

Hypervisor- vs. Container-based Systems

  Hypervisor                    Container
  ----------------------------  --------------------------
  Different kernel per guest    Single kernel
  Device emulation              Syscalls
  Many FS caches                Single FS cache
  Limits per machine            Limits per process
  High performance overhead     Low performance overhead

MapReduce
• MapReduce
  • A parallel programming model
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to addressing the parallelism problem
  • Highly visible cases where MapReduce has been successfully used by companies like Google, Yahoo!, Facebook and Amazon
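
For readers new to the model, here is a minimal, framework-free sketch of the map/shuffle/reduce phases as a word count, written in Go rather than Hadoop's Java API; the input lines are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type pair struct {
	key string
	val int
}

// Map phase: emit one (word, 1) pair per word, as in the classic WordCount.
func mapper(line string) []pair {
	var out []pair
	for _, w := range strings.Fields(line) {
		out = append(out, pair{w, 1})
	}
	return out
}

// Reduce phase: sum every value the shuffle grouped under one key.
func reducer(key string, vals []int) pair {
	sum := 0
	for _, v := range vals {
		sum += v
	}
	return pair{key, sum}
}

func main() {
	input := []string{"deer bear river", "car car river", "deer car bear"}

	var intermediate []pair
	for _, line := range input {
		intermediate = append(intermediate, mapper(line)...)
	}

	// Shuffle: group intermediate values by key.
	groups := map[string][]int{}
	for _, p := range intermediate {
		groups[p.key] = append(groups[p.key], p.val)
	}

	// Reduce each group and print the results in key order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		r := reducer(k, groups[k])
		fmt.Printf("%s\t%d\n", r.key, r.val)
	}
}
```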

MapReduce and Containers
• Apache Mesos
  • Shares a cluster between multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications

Evaluation
• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (no hardware threads) per node
  • 16 GB of memory per node
  • 146 GB of disk per node
• Performance analysis
  • Through micro-benchmarks (invocation sketched below)
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Isolation analysis
  • Through the IBS benchmark
• At least 50 executions were performed for each experiment
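
The slides do not show how the benchmarks were launched; as an illustration, a hedged Go wrapper around the Hadoop CLI is below. The TestDFSIO flags are standard, but the tests jar name varies across Hadoop versions and the values are examples, not the authors' parameters.

```go
package main

import (
	"os"
	"os/exec"
)

// runHadoop shells out to the `hadoop` CLI found on PATH.
func runHadoop(args ...string) error {
	cmd := exec.Command("hadoop", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// TestDFSIO write test: 16 files of 1000 MB each (example values;
	// "hadoop-test.jar" matches Hadoop 1.x layouts).
	if err := runHadoop("jar", "hadoop-test.jar", "TestDFSIO",
		"-write", "-nrFiles", "16", "-fileSize", "1000"); err != nil {
		panic(err)
	}
}
```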

HDFS Evaluation
• Settings:
  • Replication factor of 3
  • File size from 100 MB to 3000 MB
• All container-based systems have performance similar to native
• OpenVZ's results show a loss of about 3 Mbps
• This is due to the CFQ I/O scheduler

[Figure: HDFS throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]

HDFS Evaluation
• All container-based systems obtained performance results similar to native
• Linux-VServer uses a physical-based network

[Figure: HDFS throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]

NameNode Evaluation using NNBench
• Generates operations on 1000 files on HDFS

                       Native   LXC     OpenVZ   VServer
  Open/Read (ms)       0.51     0.52    0.51     0.49
  Create/Write (ms)    54.65    56.89   51.96    48.90

• The NNBench benchmark was chosen to evaluate the NameNode component
• Linux-VServer reaches an average latency of about 48 ms, while LXC obtained the worst result with an average of about 56 ms
• The differences are not significant when the absolute numbers are considered
• However, the strengths are that no exception was observed under the heavy HDFS management stress, and that all systems were able to respond as effectively as native

MapReduce Evaluation using MRBench

                    Native   LXC     OpenVZ   VServer
  Execution time    14251    13577   14304    13614

• The results obtained from MRBench show that the MapReduce layer suffers no substantial effect while running on the different container-based virtualization systems

Analyzing Performance with WordCount
• 30 GB of input data
• The peak of performance degradation from OpenVZ is explained by the I/O scheduler overhead

[Figure: WordCount execution time (seconds) for native, LXC, OpenVZ and VServer]

Analyzing Performance with TeraSort
• Standard map/reduce sort
• An HDFS block size of 64 MB
• Steps:
  • Generate 30 GB of input data
  • Run the sort on that input data

[Figure: TeraSort execution time (seconds) for native, LXC, OpenVZ and VServer]

Performance Isolation
• Baseline: run the application alone in container A and record its execution time
• Stress: run the application in container A while a stress test runs in container B, and record the execution time again
• Performance degradation (%) is derived from the two execution times
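
The slide gives only the diagram; spelled out, the standard way to compute the metric is: degradation (%) = 100 × (T_stress − T_baseline) / T_baseline, where T_baseline and T_stress are the two measured execution times.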

Performance Isolation

        CPU   Memory   I/O    Fork bomb
  LXC   0%    8.3%     5.5%   0%

• We chose LXC as the representative of container-based virtualization to be evaluated
• The per-container limits on CPU usage work well
  • No significant impact was noted
• The small performance degradation under memory and I/O stress needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that ensures feasibility
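
The slides do not say how the fork bomb is contained; on kernels newer than this work, one mechanism is the cgroup pids controller, sketched below in Go. The mount point, group name and limit are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	cg := "/sys/fs/cgroup/pids/demo" // assumes cgroup v1 pids controller (kernel 4.3+)
	must(os.MkdirAll(cg, 0755))
	// Cap the group at 64 tasks; once reached, further fork()s fail with EAGAIN,
	// so a fork bomb inside the group cannot exhaust the host.
	must(os.WriteFile(filepath.Join(cg, "pids.max"), []byte("64"), 0644))
	// Move the current process (and its future children) into the group.
	must(os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(fmt.Sprint(os.Getpid())), 0644))
	fmt.Println("processes under", cg, "are capped at 64")
}
```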

Conclusions
• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its ability to restrict resources among containers
• Although some works already take advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters

Future Work
• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects of green computing, such as the trade-off between performance and energy consumption

Thank you for your attention!
