Why Virtualization is important

Published on September 26, 2014

Author: sawjd

Source: slideshare.net

Description

Hadoop Virtualization
A Discussion of Hadoop Use Cases & Runtime Environments
Tom Phelan, Chief Architect of BlueData
Los Angeles Hadoop Users Group, Sept 25, 2014

… or when should I virtualize my Hadoop cluster?

First - Some Definitions

Physical Hadoop Cluster
• AKA “bare metal” installation
• The Hadoop distribution is installed as an application on top of the operating system.
• A set of physical servers run the various Hadoop services, forming the Hadoop cluster.
  – File System (HDFS, NameNode, etc.)
  – Processing Framework (JobTracker, etc.)
• Original design goal was reduced cost and not necessarily improved performance.

[Diagram: Physical Hadoop Cluster. One controller server runs the NameNode and JobTracker; two worker servers each run a DataNode and TaskTracker. Each server has three local disks.]

Virtual Hadoop Cluster
• The Hadoop distribution is installed as an application running within the context of a collection of virtual machines.
• A virtual machine is software that presents an abstraction that is identical to the underlying hardware. In general, the software running within the VM cannot tell the difference from a physical server.
• If the collection of virtual machines is spread across more than one physical server, it is typically referred to as a cloud. The cloud can be either public or private.
• The type of virtualization technology used can be one of:
  – Type I Hypervisor, e.g. VMware ESX
  – Type II Hypervisor, e.g. KVM
  – Linux Containers, e.g. LXC

Virtual Hadoop Cluster – Public Cloud
• IaaS – infrastructure as a service
• The type of virtualization is unknown
• Typically the physical hosts are not located within the enterprise data center
• Data security can be an issue
• Can be expensive
Examples: AWS, Azure

Virtual Hadoop Cluster – Private Cloud
• IaaS – infrastructure as a service
• The type of virtualization is known but not specified
• Typically the physical hosts are located within the enterprise data center
• Data security enforced by the enterprise
• Can be expensive
Examples: VMware vSphere, OpenStack, CloudStack

Virtual Hadoop Cluster – Private Cloud - Hypervisor
• IaaS – infrastructure as a service
• The type of virtualization is a Type I or Type II hypervisor, generically referred to as “hypervisor” or “virtual machine”
• Typically the physical hosts are located within the enterprise data center
• Data security enforced by the enterprise
• Strong fault isolation - a fault in the VM cannot cause the physical cluster to crash
• Strong resource partitioning
• Moderate amount of “overhead” to implement the virtualization
• Can be expensive
Examples: VMware vSphere, OpenStack

Virtual Hadoop Cluster – Private Cloud - Containers
• IaaS – infrastructure as a service
• The type of virtualization is Linux Containers (LXC)
• Typically the physical hosts are located within the enterprise data center
• Data security enforced by the enterprise
• Currently weak fault isolation - a fault in the container can cause the physical cluster to crash
• Moderate resource partitioning
• Low amount of “overhead” to implement the virtualization
• Can be expensive
Examples: Docker, Mesos, CoreOS, LXC
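The trade-offs listed on the hypervisor and container slides can be read as a small lookup table. The following is a minimal Python sketch of that reading, not something from the talk: the class name `VirtualizationOption`, the function `candidates`, and the numeric ordering in `LEVELS` are all assumptions made for illustration. The slides give only qualitative labels.

```python
from dataclasses import dataclass
from typing import List

# Ordinal scale assumed for this sketch only; the slides use words, not numbers.
LEVELS = {"low": 1, "weak": 1, "moderate": 2, "strong": 3}


@dataclass(frozen=True)
class VirtualizationOption:
    name: str
    fault_isolation: str        # "weak" or "strong"
    resource_partitioning: str  # "moderate" or "strong"
    overhead: str               # "low" or "moderate"


# Values copied from the two private-cloud slides.
OPTIONS = [
    VirtualizationOption("hypervisor", "strong", "strong", "moderate"),
    VirtualizationOption("containers", "weak", "moderate", "low"),
]


def candidates(min_fault_isolation: str, min_partitioning: str) -> List[str]:
    """Return the options that meet both minimums, lowest overhead first."""
    ok = [
        o for o in OPTIONS
        if LEVELS[o.fault_isolation] >= LEVELS[min_fault_isolation]
        and LEVELS[o.resource_partitioning] >= LEVELS[min_partitioning]
    ]
    return [o.name for o in sorted(ok, key=lambda o: LEVELS[o.overhead])]


if __name__ == "__main__":
    print(candidates("weak", "moderate"))    # both options qualify
    print(candidates("strong", "moderate"))  # only the hypervisor qualifies
```

Sorting by overhead reflects the idea that, when both technologies satisfy the requirements, the one with the lower virtualization cost is listed first.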

[Diagram: Virtual Hadoop Cluster - Hypervisor. A controller VM (NameNode, JobTracker) and two worker VMs (DataNode and TaskTracker each) run Hadoop servers backed by vDisks. The VMs are hosted on four cloud servers, each with three physical disks.]

[Diagram: Virtual Hadoop Cluster - Containers. A controller container (JobTracker) and two worker containers (TaskTracker each) run on four cloud servers, each with three disks. The diagram also shows a NameNode and three DataNodes.]

[Diagram: Virtual Hadoop Cluster - Hypervisors. Four cloud servers, each with three disks.]

Virtual Hadoop Cluster – Private Cloud – Data Paravirtualization
• Paravirtualization means that the abstraction the virtualization software provides is similar, but not identical, to the underlying hardware.
• The differences are designed to reduce the virtualization “overhead” by taking advantage of some knowledge about the tasks running in the virtual machine.
Examples: BlueData

[Diagram: Virtual Hadoop Cluster – Paravirtualization. A controller VM (NameNode, JobTracker) and two worker VMs (DataNode and TaskTracker each) run Hadoop servers backed by vDisks on four cloud servers, each with three physical disks. Three data connections link the cluster to NFS, HDFS, and GlusterFS.]

In which situations should an enterprise run their Hadoop jobs in a virtual or physical environment?

Evaluation based on:
• Faster …
  – Deployment
  – Runtime
• Easier …
  – Deployment
  – Management
• Cheaper …
  – Hardware costs
  – Management costs

Questions not to ask
• How fast does the job need to run?
• How much does the cluster cost?
• How easy is it to use?
Any application can be run with the needed speed in either a virtual or physical environment if enough money is spent. Any tool is easy to use once you are familiar with it.
Other attributes indicate if the best solution is with physical or virtual clusters.

Answers
• There are multiple clusters and each is lightly used.
  – Virtual cluster
• There is one cluster, and it runs a single Hadoop query job. It runs 7 x 24 and demands instant response.
  – Physical cluster
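The "multiple lightly used clusters" answer comes down to simple arithmetic: capacity that sits idle in dedicated clusters can be shared once the clusters are virtualized. The sketch below illustrates this with made-up numbers. The function name `servers_needed`, the 25% headroom, and the example sizes and utilizations are all assumptions, not figures from the talk. It also ignores virtualization overhead and the chance that several clusters peak at the same time.

```python
import math
from typing import Sequence, Tuple


def servers_needed(cluster_sizes: Sequence[int],
                   utilizations: Sequence[float],
                   headroom: float = 0.25) -> Tuple[int, int]:
    """Estimate server counts for dedicated vs. consolidated clusters.

    cluster_sizes -- servers each dedicated physical cluster would have
    utilizations  -- average busy fraction (0..1) of each cluster
    headroom      -- spare capacity kept on the shared pool (assumed value)
    """
    dedicated = sum(cluster_sizes)
    # Servers' worth of work actually being done across all clusters.
    busy = sum(n * u for n, u in zip(cluster_sizes, utilizations))
    shared = math.ceil(busy * (1 + headroom))
    return dedicated, shared


if __name__ == "__main__":
    # Hypothetical example: four 10-server clusters, each 10-20% utilized.
    dedicated, shared = servers_needed([10, 10, 10, 10], [0.15, 0.2, 0.1, 0.15])
    print(f"dedicated: {dedicated} servers, shared pool: {shared} servers")
```

A single cluster that runs 7 x 24 at high utilization gains nothing from this kind of sharing, which fits the physical-cluster answer for the second scenario.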

Answers
• Test & Dev environment where Hadoop clusters need to be built quickly and have short lifespan. Each developer gets their own cluster. No security concerns.
  – Virtual cluster - LXC
• An environment with multiple Hadoop applications constantly running and requiring access to a common data set. No expected change in applications or load.
  – Physical cluster

Answers
• IaaS environment with multiple external customers, each with different QoS agreements, Hadoop distros, and data security needs.
  – Virtual cluster - Hypervisor

Those scenarios are too easy!

What the obvious answers tell us:
• Situations that require many distinct Hadoop clusters, or clusters that require frequent provisioning, or clusters that have a relatively short lifespan are well suited for virtualized Hadoop.
  – Flexibility and speed of cluster creation are critical.
• Situations that require few distinct Hadoop clusters, have long lifespans, and static configurations are well suited for Bare Metal Hadoop.
  – No reason to pay virtualization “tax” in exchange for flexibility.
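The two rules above can be written as a small decision function. This is a hedged sketch only: the `Workload` fields, the function `suggest_environment`, and every threshold (what counts as "many" clusters, a "short" lifespan, or "frequent" provisioning) are assumptions chosen for illustration. The talk does not give numeric cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    cluster_count: int           # distinct Hadoop clusters needed
    lifespan_days: int           # expected lifetime of a typical cluster
    reprovisions_per_month: int  # how often clusters are rebuilt
    static_config: bool          # True if configuration is not expected to change


def suggest_environment(w: Workload,
                        many_clusters: int = 3,
                        short_lifespan_days: int = 90,
                        frequent_reprovisions: int = 1) -> str:
    """Apply the two 'obvious answer' rules; thresholds are assumed values."""
    # Rule 1: many clusters OR frequent provisioning OR short lifespan.
    if (w.cluster_count >= many_clusters
            or w.reprovisions_per_month >= frequent_reprovisions
            or w.lifespan_days <= short_lifespan_days):
        return "virtual"
    # Rule 2: few clusters AND long lifespan AND static configuration.
    if w.static_config:
        return "physical"
    # Neither rule applies cleanly; further questions are needed.
    return "undecided"


if __name__ == "__main__":
    print(suggest_environment(Workload(10, 14, 4, False)))
    print(suggest_environment(Workload(1, 1095, 0, True)))
```

The "undecided" result is deliberate: cases that match neither rule are the ones where fault isolation, idle time, and data source have to be weighed.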

Questions to ask
• How many clusters will be needed?
  – Over what time span?
• What is the life span of the clusters?
• Will the clusters have idle time?
• What are the fault isolation needs?
• What is the source of the big data?

Other questions to ask*
• Are multiple levels of job priority required?
• Are multiple levels of data security required?
• Is resource usage tracking/billing required?
* The implementation of these may differ between distributions of Hadoop, so the level of effort to implement them in virtual and physical environments may also differ.

Use Case I
Large manufacturing company – Internal Customers
• Started out with one Hadoop cluster – Success!
• Soon everyone wanted one.
  – Many lightly used
  – Different configurations
  – “Cluster Sprawl”
Virtual Clusters
• Either hypervisor or LXC could be used

Use Case II
Development group within a large tech company – Internal Customers
• Built out a physical Hadoop cluster.
  – No data security requirement
  – No expectation of growth in foreseeable future
Single Physical Cluster

Use Case III
Large service company – Internal and External Customers
• No use of Hadoop.
• Fault containment and data security required
• Small IT department tasked with all Hadoop support for the company.
  – No clear way to predict growth.
Virtual Clusters - hypervisor

Use Case IV
Large tech company – Internal Customers
• Constant stream of low priority jobs
• Bursty stream of very high priority, low latency jobs
• No future demand growth.
• Very Hadoop savvy IT organization.
Two Physical Clusters OR virtual clusters

Use Case V
Startup online service company – Selling information gathered using unique Hadoop analytics
• External Customers
• Multiple data sources
• Customer data security requirements
• Rapid growth of customer base
• No in-house Hadoop expertise
Virtual Clusters
• Could benefit from paravirtualization

Q & A

Contact: Tom Phelan, tap@bluedata.com
