Hadoop HBase introduction


Published on May 31, 2014

Author: jaksky

Source: slideshare.net

Description

A brief introduction to Hadoop and HBase.

Introduction

Hadoop motivation
• Hardware improvements from 1990 to 2010:
  – CPU speed: 40 MIPS -> 50 GIPS => 1,250x
  – RAM: 640 kB -> 8 GB => 12,500x
  – Disk capacity and cost: 40 MB (for $400) -> 1 TB (for $100) => 25,000x in capacity
• What about disk read throughput and latency?
  – Throughput: 4.4 MB/s in 1990 -> 100 MB/s in 2010 => only about 23x faster
  – => read from multiple disks in parallel
  – and not just reads: writes have to be parallel as well
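The case for parallel reads comes down to simple arithmetic. The sketch below assumes ideal linear speedup, decimal units (1 TB = 1,000,000 MB), and the 100 MB/s per-disk figure from the slide; real clusters lose some of that speedup to network and coordination overhead. `read_time_seconds` is a hypothetical helper.

```python
# Back-of-the-envelope read times using the 2010 disk throughput from the slide.
# Assumes perfect parallel speedup and decimal units (1 TB = 1,000,000 MB).


def read_time_seconds(data_mb: float, disk_mb_per_s: float, disks: int = 1) -> float:
    """Seconds to read data_mb when it is spread evenly over `disks` drives,
    each streaming at disk_mb_per_s, all read at the same time."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return data_mb / (disk_mb_per_s * disks)


ONE_TB_MB = 1_000_000

single = read_time_seconds(ONE_TB_MB, 100)          # one 100 MB/s disk
parallel = read_time_seconds(ONE_TB_MB, 100, 100)   # 100 such disks in parallel

print(f"1 disk:    {single:,.0f} s (~{single / 3600:.1f} h)")      # 1 disk:    10,000 s (~2.8 h)
print(f"100 disks: {parallel:,.0f} s (~{parallel / 60:.1f} min)")  # 100 disks: 100 s (~1.7 min)
```

At 100 MB/s a single disk needs close to three hours to read 1 TB, while 100 disks read in parallel finish in under two minutes.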

Hadoop motivation: issues
• Parallel reads and writes bring challenges:
  – Hardware failure
    • Disk failure => replication? RAID?
  – Data combination
    • Combining the data read from many disks
• Solution: Hadoop
  – Hadoop Distributed File System (HDFS)
  – MapReduce programming model: an analysis system
    • Abstracts away disk reads and writes into computation over sets of keys and values
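Replication is the answer to the disk-failure question raised above. This toy model assumes a replication factor of 3 (the HDFS default) and a simple round-robin placement; HDFS's real placement policy is rack-aware. The node names and helper functions are hypothetical.

```python
# Toy model of block replication: each block is stored on `replication`
# distinct nodes, so losing any single node still leaves a readable copy.
# Round-robin placement is a simplification of HDFS's rack-aware policy.
from typing import Dict, List, Set


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # consecutive indices modulo len(nodes) are always distinct nodes
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> Set[int]:
    """Blocks that still have at least one replica on a live node."""
    return {b for b, replicas in placement.items()
            if any(node not in failed for node in replicas)}


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(10, nodes)
print(placement[0])  # ['dn1', 'dn2', 'dn3']
# every single-node failure leaves all 10 blocks readable
print(all(len(readable_blocks(placement, {n})) == 10 for n in nodes))  # True
```

Losing a whole replica set at once (here dn1, dn2 and dn3 together) does make some blocks unreadable. That is why real placement spreads replicas across racks instead of across neighbouring nodes.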

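Hadoop HDFS
• A big virtual file system spanning the disks of many machines
• Master–slave architecture: the NameNode (master) keeps the file system metadata, and the DataNodes (slaves) store the data blocks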
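A toy model of the metadata kept by the master makes the master/slave split concrete. It assumes a 128 MB block size (the Hadoop 2 default; Hadoop 1 used 64 MB). `ToyNameNode` and `split_into_blocks` are hypothetical names, not Hadoop APIs.

```python
# Toy model of HDFS metadata: the NameNode (master) records which blocks make
# up each file, while DataNodes (slaves) hold the block contents.
# A 128 MB block size is assumed (Hadoop 2 default; Hadoop 1 used 64 MB).
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """(offset, length) of each block; only the last one may be shorter."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode namespace: path -> block IDs."""
    files: Dict[str, List[int]] = field(default_factory=dict)
    next_block_id: int = 0

    def create(self, path: str, file_size: int) -> List[int]:
        block_ids = []
        for _offset, _length in split_into_blocks(file_size):
            block_ids.append(self.next_block_id)
            self.next_block_id += 1
        self.files[path] = block_ids
        return block_ids


nn = ToyNameNode()
print(nn.create("/data/events.log", 300 * MiB))             # [0, 1, 2]
print([n // MiB for _, n in split_into_blocks(300 * MiB)])  # [128, 128, 44]
```

The master only stores small per-block records. This is why a single NameNode can track a very large file system, and also why it matters so much when that NameNode fails.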

Map-Reduce
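The MapReduce model can be illustrated in a single process with the classic word count. This sketch covers the data flow only. `map_reduce`, `wc_map` and `wc_reduce` are hypothetical helpers, not the Hadoop Java API, which runs many map and reduce tasks across the cluster.

```python
# Single-process sketch of the MapReduce data flow (word count).
# The "shuffle" here is just a dictionary that groups mapper output by key.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterable[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                # map phase
        for key, value in mapper(record):
            groups[key].append(value)     # shuffle: collect values per key
    # reduce phase, keys processed in sorted order
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def wc_map(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


lines = ["HBase runs on HDFS", "HDFS stores blocks", "HBase stores rows"]
print(map_reduce(lines, wc_map, wc_reduce))
# {'blocks': 1, 'hbase': 2, 'hdfs': 2, 'on': 1, 'rows': 1, 'runs': 1, 'stores': 2}
```

The programmer writes only the mapper and the reducer. Distributing records, grouping by key and handling failed tasks are the framework's job.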

Hadoop vs. relational DB

             Relational DB            MapReduce
Data size    GBs                      TBs / PBs
Access       Interactive and batch    Batch
Updates      Read and write           Write once, read many times
Structure    Static schema            Dynamic schema (analyst chooses it)
Integrity    High                     Low
Scaling      Non-linear               Linear

Hadoop 1 vs. Hadoop 2
• Hadoop 1
  – The NameNode is a single point of failure (SPOF)
  – Security
• Hadoop 2
  – Turns the cluster into a "universal computational cluster" (via YARN)
  – Removes bottlenecks in MapReduce

NameNode
• http://bd-prg-c03-nn01:50070/dfshealth.jsp
• http://bd-prg-c03-nn02:50070/dfshealth.jsp

YARN ResourceManager
• http://bd-prg-c03-rm01:8088/cluster
• http://aimc2rm1:8088/cluster

Job history server
• http://bd-prg-c03-rm01:19888/jobhistory

HBase features
• NoSQL database
• Column-oriented database (data grouped into column families)
• Open-source implementation of Google's Bigtable design
• Linear and modular scalability
• Strictly consistent reads and writes
• Automatic and configurable sharding of tables
• Automatic failover support between RegionServers
• Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
• Easy-to-use Java API for client access
• Block cache and Bloom filters for real-time queries
• Query predicate pushdown via server-side filters
• Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options
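HBase's data model is often described as a sparse, sorted, multidimensional map. The toy class below sketches that idea in memory. `ToyTable` and its methods are hypothetical and do not correspond to the HBase Java client API. The `max_versions` parameter loosely mirrors the per-column-family VERSIONS setting.

```python
# Toy in-memory model of HBase's logical data model: a sparse, sorted map
#   row key -> "family:qualifier" -> timestamp -> value
# NOT the HBase client API; all names here are hypothetical.
import bisect
from collections import defaultdict
from typing import Dict, List, Optional


class ToyTable:
    def __init__(self, max_versions: int = 3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions      # loosely mirrors a family's VERSIONS setting
        self.rows: Dict[bytes, Dict[str, Dict[int, bytes]]] = {}
        self.sorted_keys: List[bytes] = []    # row keys kept in byte order

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = defaultdict(dict)  # sparse: only written columns exist
        versions = self.rows[row][column]
        versions[ts] = value
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]                # drop versions beyond the limit

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Newest version of a cell, or None if the cell does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


t = ToyTable(max_versions=2)
for ts, v in [(1, b"v1"), (2, b"v2"), (3, b"v3")]:
    t.put(b"user#002", "info:name", ts, v)
t.put(b"user#001", "info:name", 1, b"ann")
print(t.get(b"user#002", "info:name"))   # b'v3'
print(t.scan(b"user#000", b"user#002"))  # [b'user#001']
```

Keeping row keys sorted is what makes range scans cheap. It also makes it natural to shard a table by contiguous key ranges.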

HBase Architecture
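In the HBase architecture, a table is split into regions. Each region covers a contiguous range of row keys and is served by one RegionServer. The sketch below shows how a row key maps to a region, assuming fixed split points (as when a table is pre-split). Real HBase also splits regions automatically as they grow, and clients find region locations through HBase's catalog table. The split keys, server names and `ToyRegionLocator` are hypothetical.

```python
# Toy region lookup: n sorted split keys define n + 1 regions, each covering
# [start_key, next_start_key) and served by one RegionServer.
import bisect
from typing import List


class ToyRegionLocator:
    def __init__(self, split_keys: List[bytes], servers: List[str]):
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        if any(a >= b for a, b in zip(split_keys, split_keys[1:])):
            raise ValueError("split keys must be strictly increasing")
        self.start_keys = [b""] + split_keys  # b"" = start of the table
        self.servers = servers

    def locate(self, row: bytes) -> str:
        # rightmost region whose start key is <= row (start keys are inclusive)
        idx = bisect.bisect_right(self.start_keys, row) - 1
        return self.servers[idx]


locator = ToyRegionLocator([b"g", b"p"], ["rs1", "rs2", "rs3"])
print(locator.locate(b"apple"))  # rs1
print(locator.locate(b"g"))      # rs2
print(locator.locate(b"zebra"))  # rs3
```

Because regions are key ranges, adding RegionServers and moving regions onto them spreads the load. This is the basis of the "automatic and configurable sharding" listed among the features.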

HBase
• http://bd-prg-c03-nn02:60010/master-status
