A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose
email@example.com
Faculty of Informatics, PUCRS, Porto Alegre, Brazil
February 13, 2014
Outline
• Introduction
• Container-based Virtualization
• MapReduce
• Evaluation
• Conclusion
Introduction
• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios
    • Better resource sharing
    • Cloud computing
• However, hypervisor-based technologies have traditionally been avoided in MapReduce environments
Container-based Virtualization
• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system
[Figure: container-based virtualization (guest processes running directly on a shared host OS) vs. hypervisor-based virtualization (guest OSes on top of a virtualization layer over the host OS and hardware)]
Container-based Virtualization
• Each container has:
  • Its own network interface (and IP address)
    • Bridged, routed, …
  • Its own filesystem
  • Isolation (security)
    • Containers A and B can't see each other
  • Isolation (resource usage)
    • RAM, CPU, I/O (see the cgroup sketch after this list)
• Current systems
  • Linux-VServer, OpenVZ, LXC
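To make the resource-usage isolation concrete, here is a minimal sketch, not from the slides, of how a runtime such as LXC caps RAM and CPU through the cgroup v1 control files. The cgroup name mr-container and the mount point /sys/fs/cgroup are assumptions for illustration; the program must run as root on Linux.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Hypothetical cgroup name; real container runtimes create one per container.
	limits := map[string]map[string]string{
		"memory": {"memory.limit_in_bytes": "1073741824"}, // cap RAM at 1 GiB
		"cpu":    {"cpu.shares": "512"},                   // half the default CPU weight (1024)
	}
	for controller, files := range limits {
		dir := filepath.Join("/sys/fs/cgroup", controller, "mr-container")
		if err := os.MkdirAll(dir, 0o755); err != nil {
			panic(err)
		}
		for name, value := range files {
			if err := os.WriteFile(filepath.Join(dir, name), []byte(value), 0o644); err != nil {
				panic(err)
			}
		}
	}
	// A process becomes subject to these limits once its PID is written
	// to the cgroup's "tasks" file.
	fmt.Println("limits applied to cgroup mr-container")
}
```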
Container-based Virtualization
• Implements Linux namespaces (see the sketch below):
  • Mount – mounting/unmounting file systems
  • UTS – hostname, domain name
  • IPC – SysV message queues, semaphores, shared memory segments
  • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID – own set of PIDs
• Chroot is a filesystem namespace
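To make the namespace idea concrete, here is a minimal sketch (not from the slides; it assumes Linux and root privileges) that starts a shell in its own UTS namespace, so a hostname change inside it stays invisible to the host:

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

// Runs a shell in a new UTS namespace: the hostname set inside it
// does not leak to the host. Requires root; Linux only.
func main() {
	cmd := exec.Command("/bin/sh", "-c",
		"hostname container-demo && hostname") // prints "container-demo"
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS, // isolate hostname/domainname
	}
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
	// The host's hostname is unchanged after the child exits.
}
```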
Container-based Systems
• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
• Linux Containers (LXC)
  • Based on cgroups
Hypervisor- vs Container-based Systems

Hypervisor                 | Container
---------------------------|--------------------------
Different kernel per OS    | Single kernel
Device emulation           | Syscalls
Many FS caches             | Single FS cache
Limits per machine         | Limits per process
High performance overhead  | Low performance overhead
MapReduce
• MapReduce
  • A parallel programming model
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis (see the word-count sketch below)
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to addressing the parallelism problem
  • A highly visible case: MapReduce has been successfully used by companies like Google, Yahoo!, Facebook and Amazon
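The slides contain no code, but a toy in-process word count, the canonical MapReduce example, illustrates the map, shuffle and reduce phases of the model (this is a self-contained sketch, not the Hadoop API):

```go
package main

import (
	"fmt"
	"strings"
)

type pair struct {
	word  string
	count int
}

// Map phase: emit a (word, 1) pair for each word in one input record.
func mapPhase(record string) []pair {
	var out []pair
	for _, w := range strings.Fields(strings.ToLower(record)) {
		out = append(out, pair{w, 1})
	}
	return out
}

// Reduce phase: sum the counts grouped under one word.
func reducePhase(word string, counts []int) pair {
	total := 0
	for _, c := range counts {
		total += c
	}
	return pair{word, total}
}

func main() {
	records := []string{"the quick brown fox", "the lazy dog", "the fox"}

	// Shuffle: group intermediate values by key, as the framework would.
	grouped := map[string][]int{}
	for _, r := range records {
		for _, p := range mapPhase(r) {
			grouped[p.word] = append(grouped[p.word], p.count)
		}
	}

	// Output order is nondeterministic; a real framework sorts by key.
	for word, counts := range grouped {
		fmt.Println(reducePhase(word, counts))
	}
}
```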
MapReduce and Containers
• Apache Mesos
  • Shares a cluster between multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications
Evaluation
• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (no hardware threads) per node
  • 16 GB of memory per node
  • 146 GB of disk per node
• Analysis of performance
  • Through micro-benchmarks
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Analysis of isolation
  • Through the IBS benchmark
• At least 50 executions were performed for each experiment
HDFS Evaluation
• Settings:
  • Replication factor of 3
  • File sizes from 100 MB to 3000 MB
• All container-based systems have performance similar to native
• The OpenVZ results show a loss of about 3 Mbps, due to the CFQ I/O scheduler
[Figure: throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
HDFS Evaluation
• All container-based systems obtained performance results similar to native
• Linux-VServer uses a physical-based network approach
[Figure: throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
NameNode Evaluation using NNBench
• Generates operations on 1000 files on HDFS
• The NNBench benchmark was chosen to evaluate the NameNode component

                    Native   LXC     OpenVZ   VServer
Create/Write (ms)   0.51     0.52    0.51     0.49
Open/Read (ms)      54.65    56.89   51.96    48.90

• Linux-VServer reaches an average open/read latency of about 48 ms, while LXC obtained the worst result, with an average of about 56 ms
• The differences are not significant when the absolute numbers are considered
• However, no exceptions were observed under heavy HDFS management stress, and all systems responded as effectively as native
MapReduce Evaluation using MRBench

                 Native   LXC     OpenVZ   VServer
Execution time   14251    13577   14304    13614

• The MRBench results show that the MapReduce layer suffers no substantial effect when running on the different container-based virtualization systems
Analyzing Performance with WordCount
• 30 GB of input data
• The peak of performance degradation seen with OpenVZ is explained by I/O scheduler overhead
[Figure: WordCount execution time (seconds) for Native, LXC, OpenVZ and VServer]
Analyzing Performance with TeraSort
• Standard map/reduce sort
• HDFS block size of 64 MB
• Steps:
  • Generate 30 GB of input data
  • Run the sort on that input data
[Figure: TeraSort execution time (seconds) for Native, LXC, OpenVZ and VServer]
Performance Isolation
• Methodology: a baseline application runs alone in container A; it then runs again while a stress test runs in container B
• The two execution times are compared to obtain the performance degradation (%), as sketched below
[Figure: methodology diagram — baseline application in container A, stress test in container B, execution times compared to yield performance degradation (%)]
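A sketch of the degradation metric implied by the diagram; the formula is our reading of the slide rather than something stated explicitly, and the numbers are hypothetical:

```go
package main

import "fmt"

// degradation returns the relative slowdown (in %) of a run under
// interference with respect to the baseline run.
func degradation(baselineSecs, stressedSecs float64) float64 {
	return (stressedSecs - baselineSecs) / baselineSecs * 100
}

func main() {
	// Hypothetical execution times, only to show the calculation.
	fmt.Printf("%.1f%%\n", degradation(120.0, 130.0)) // prints "8.3%"
}
```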
Performance Isolation

       CPU   Memory   I/O    Fork bomb
LXC    0%    8.3%     5.5%   0%

• We chose LXC as the representative container-based virtualization system to be evaluated
• The per-container CPU usage limits work well: no significant impact was noted
• A little performance degradation under the memory and I/O stress tests needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that keeps the system functional under this attack
Conclusions
• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its capabilities for restricting resources among containers
• Although some works are already taking advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters
Future Work
• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects of green computing, such as the trade-off between performance and energy consumption
Thank you for your attention!