An Introduction to Apache Hadoop

Published on February 18, 2014

Author: MindfireSolutions

Source: slideshare.net

Description

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.

Introduction to Apache Hadoop
Presenter: Prem Chand Mali, Mindfire Solutions
Date: 30/01/2014

About Me
• SCJP/OCJP: Oracle Certified Java Programmer
• MCP 70-480: Specialist certification in HTML5 with JavaScript and CSS3
• Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, PrototypeJS, ExtJS
Connect Me:
https://www.facebook.com/prem.c.mali
http://www.linkedin.com/in/premmali
https://twitter.com/prem_mali
https://plus.google.com/106150245941317924019/about/p/pub
Contact Me: premchandm@mindfiresolutions.com / prem.c.mali@gmail.com
mfsi_premchandm

Agenda
• History
• What is Apache Hadoop?
• Why Apache Hadoop?
• HDFS
• MapReduce
• Q&A

History
• Nutch: a crawler-based web search engine, the Apache project from which Hadoop was later split out.
• Google published the GFS (2003) and MapReduce (2004) papers.
• Yahoo! hired Doug Cutting (2006) and gave him a dedicated team.

What is Apache Hadoop?
• Apache Hadoop is an open-source software framework, licensed under the Apache License 2.0, that supports data-intensive distributed applications. It runs applications on large clusters of commodity hardware.
• Hadoop is designed around the fundamental assumption that hardware failures (of individual machines or of whole racks) are common and should therefore be handled automatically in software by the framework.
• Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.

What is Apache Hadoop?
• The Apache Hadoop framework is composed of the following modules:
  – Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
  – Hadoop MapReduce: a programming model for large-scale data processing.
  – Hadoop Common: libraries and utilities needed by the other Hadoop modules.
  – Hadoop YARN: a resource-management platform responsible for managing compute resources in the cluster and using them to schedule users' applications.

Why Apache Hadoop?
• State of data
  – 90% of the world's data was generated in the past three years.
  – Types of data:
    • Unstructured
    • Semi-structured
    • Relational
  – Traditional relational systems are designed for gigabyte-scale data.
• What Hadoop offers:
  – Distributed
  – Scalable
  – Flexible
  – Fault tolerant
  – Intelligent

HDFS
• HDFS is the primary distributed storage system used by Hadoop applications. It consists of two types of components:
  – NameNode: keeps the file-system metadata (the namespace and the location of every block).
  – DataNode: stores the actual data blocks.
• HDFS is well suited for distributed storage and distributed processing on commodity hardware.
• Hadoop provides shell-like commands for interacting with HDFS directly (for example, hadoop fs -ls and hadoop fs -put).
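To make the NameNode/DataNode split concrete, here is a toy, single-process Java model of the metadata a NameNode keeps. It splits a file into fixed-size blocks and records which DataNodes hold each replica. When a DataNode is lost, it re-replicates the affected blocks.

This is a sketch under stated assumptions, not HDFS code:
• The class and method names (HdfsBlockSketch, placeBlocks, handleDataNodeFailure) are hypothetical.
• The 128 MB block size and replication factor of 3 are assumed to match the usual Hadoop 2 defaults (Hadoop 1.x used 64 MB blocks).
• The simple round-robin placement stands in for HDFS's real rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of HDFS metadata (hypothetical, not Hadoop's API): a "NameNode"
 * map from block index to the "DataNodes" holding that block's replicas.
 */
public class HdfsBlockSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // assumed Hadoop 2.x default block size
    static final int REPLICATION = 3;                  // assumed default replication factor

    /** Number of blocks needed for a file; the last block may be partial. */
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    /** Assigns REPLICATION DataNodes to each block, round-robin (NOT HDFS's rack-aware policy). */
    static Map<Long, List<String>> placeBlocks(long fileSizeBytes, List<String> dataNodes) {
        if (dataNodes.size() < REPLICATION) {
            throw new IllegalArgumentException("need at least " + REPLICATION + " DataNodes");
        }
        Map<Long, List<String>> blockMap = new LinkedHashMap<>();
        for (long b = 0; b < blockCount(fileSizeBytes); b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            blockMap.put(b, replicas);
        }
        return blockMap;
    }

    /**
     * Drops a failed DataNode and copies each affected block to the first live node
     * that does not already hold it. (Real HDFS detects the failure via missed
     * heartbeats.) If no such node exists, the block stays under-replicated.
     */
    static void handleDataNodeFailure(Map<Long, List<String>> blockMap, String deadNode,
                                      List<String> liveNodes) {
        for (List<String> replicas : blockMap.values()) {
            if (replicas.remove(deadNode)) {
                for (String candidate : liveNodes) {
                    if (!replicas.contains(candidate)) {
                        replicas.add(candidate);
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(List.of("dn1", "dn2", "dn3", "dn4"));
        Map<Long, List<String>> blockMap = placeBlocks(300L * 1024 * 1024, nodes); // a 300 MB file
        System.out.println("blocks: " + blockMap);
        nodes.remove("dn2");
        handleDataNodeFailure(blockMap, "dn2", nodes);
        System.out.println("after dn2 fails: " + blockMap);
    }
}
```

For a 300 MB file the model produces three blocks: two full 128 MB blocks and one 44 MB block. Each block keeps three replicas even after dn2 disappears.

The same arithmetic scales up. A 1 TiB file needs blockCount(1L << 40) = 8192 blocks. At replication 3 that is 24,576 block replicas, or about 3 TiB of raw disk.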

HDFS [diagram slide; image not reproduced]

MapReduce
• MapReduce combines three phases:
  – Map
  – Shuffle
  – Reduce
• In Hadoop 1.x, MapReduce jobs run through the JobTracker and TaskTrackers. In Hadoop 2, YARN takes over this role through the ResourceManager, the NodeManagers and a per-job ApplicationMaster.
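The classic word-count example illustrates the three phases. The sketch below is a plain-Java, in-memory simulation of the data flow only. It does not use Hadoop's Mapper/Reducer API or any JobTracker, and its class and method names are hypothetical.

The partition method uses the same formula as Hadoop's default HashPartitioner: (hashCode & Integer.MAX_VALUE) % numReduceTasks. Here it is applied to java.lang.String keys rather than Hadoop's Text type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of map, shuffle and reduce for word count (not the Hadoop API). */
public class WordCountSketch {

    /** Map phase: emit (word, 1) for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** HashPartitioner-style formula: chooses which reducer receives a key. */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Shuffle phase: route each pair to its reducer and group values by key, sorted by key. */
    static List<TreeMap<String, List<Integer>>> shuffle(List<Map.Entry<String, Integer>> pairs,
                                                        int numReducers) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : pairs) {
            partitions.get(partition(pair.getKey(), numReducers))
                      .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
        }
        return partitions;
    }

    /** Reduce phase: sum all counts grouped under one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map over every line, shuffles the pairs, then reduces each partition. */
    static Map<String, Integer> run(List<String> lines, int numReducers) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : shuffle(intermediate, numReducers)) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop stores data in HDFS",
                                     "Hadoop processes data with MapReduce");
        // prints {data=2, hadoop=2, hdfs=1, in=1, mapreduce=1, processes=1, stores=1, with=1}
        System.out.println(run(input, 2));
    }
}
```

Within each partition the keys come out sorted, mirroring the sort Hadoop performs during the shuffle. In a real cluster the map calls run in parallel on separate nodes, ideally on the nodes that hold the HDFS blocks they read.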

MapReduce [three diagram slides; images not reproduced]

Question and Answer

Thank you

www.mindfiresolutions.com
https://www.facebook.com/MindfireSolutions
http://www.linkedin.com/company/mindfire-solutions
http://twitter.com/mindfires
