Published on May 31, 2014
Hadoop Motivation

• HW improvements through the years:
  – CPU speed: 40 MIPS (1990) -> 50 GIPS (2010) => 1,250x
  – RAM: 640 kB (1990) -> 8 GB (2010) => 12,500x
  – Disk capacity & cost: 40 MB for $400 (1990) -> 1 TB for $100 (2010) => 25,000x
• What about disk read speed / disk latency?
  – 4.4 MB/s in 1990
  – 100 MB/s in 2010 => just ~25x faster
  – => read from multiple disks in parallel
  – and it is not just reads: parallel writes matter as well
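The throughput gap above is the whole argument for parallelism. A back-of-the-envelope calculation, using the 2010-era numbers from the slide (the 100-disk figure is an illustrative assumption):

```python
# Why a single disk is the bottleneck: numbers from the slide above.
disk_capacity_mb = 1_000_000   # 1 TB of data to scan
read_speed_mb_s = 100          # ~100 MB/s sequential read (2010)

single_disk_s = disk_capacity_mb / read_speed_mb_s
print(f"One disk, full scan: {single_disk_s / 3600:.1f} hours")   # ~2.8 hours

# Spread the same data over 100 disks and read them in parallel:
disks = 100
parallel_s = disk_capacity_mb / (read_speed_mb_s * disks)
print(f"{disks} disks in parallel: {parallel_s:.0f} seconds")     # ~100 seconds
```

Capacity grew ~25,000x while read speed grew only ~25x, so scanning a full disk takes roughly 1,000x longer than it did in 1990; reading from many disks at once is the only way to keep scan times reasonable.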
Hadoop Motivation – Issues

• Parallel reads and writes bring challenges:
  – Hardware failure
    • Disk failures => replication? RAID?
  – Data combination
    • Combining data read from many disks
• The solution: Hadoop
  – Hadoop Distributed File System (HDFS) – storage
  – MapReduce programming model – analysis
  – Abstracts away disk reads/writes into computation over sets of keys and values
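The key/value abstraction can be shown without Hadoop at all. A minimal pure-Python sketch of the map -> shuffle -> reduce phases using word count, the canonical example (this is an illustration of the model, not the Hadoop API):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here (word, 1) for word count."""
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's list of values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big cluster", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

The programmer writes only the map and reduce functions; the framework handles splitting the input across disks, the shuffle, and retrying failed tasks.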
Hadoop HDFS

• Big virtual file system spanning the whole cluster
• Master-slave architecture: one master (the NameNode, holding metadata) and many slaves (DataNodes, holding the actual data blocks)
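The master-slave split can be sketched as bookkeeping: the NameNode stores only which blocks make up a file and where their replicas live, while DataNodes store the blocks themselves. A toy illustration (block size and replication factor mirror common HDFS defaults; the function names and round-robin placement are made up for illustration — real HDFS placement is rack-aware):

```python
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS block-size default
REPLICATION = 3                  # default replication factor

datanodes = ["dn01", "dn02", "dn03", "dn04"]
node_ring = cycle(datanodes)     # naive round-robin placement for the sketch

def split_into_blocks(file_size):
    """NameNode-side metadata: how many blocks a file occupies."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE

def place_replicas(num_blocks):
    """Assign each block to REPLICATION DataNodes; the NameNode keeps only this map."""
    return {block_id: [next(node_ring) for _ in range(REPLICATION)]
            for block_id in range(num_blocks)}

one_gb = 1024 * 1024 * 1024
blocks = split_into_blocks(one_gb)
print(blocks)                     # 8 blocks for a 1 GB file
print(place_replicas(blocks)[0])  # ['dn01', 'dn02', 'dn03']
```

Because a client reads different blocks from different DataNodes, a large file is effectively read from many disks in parallel, which is exactly the motivation from the first slide.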
Hadoop vs. Relational DB

               Relational DB           MapReduce
  Data size    GBs                     TBs / PBs
  Access       Interactive and batch   Batch
  Updates      Read and write          Write once, read many times
  Structure    Static schema           Dynamic schema (analyst chooses it)
  Integrity    High                    Low
  Scaling      Nonlinear               Linear
Hadoop 1 vs. Hadoop 2

• Hadoop 1: the NameNode is a single point of failure (SPOF)
• Hadoop 1: limited security
• Hadoop 2 promotes the cluster to a “universal computational cluster” (YARN)
• Hadoop 2 removes bottlenecks in MapReduce
NameNode • http://bd-prg-c03-nn01:50070/dfshealth.jsp • http://bd-prg-c03-nn02:50070/dfshealth.jsp
YARN - Resource Manager • http://bd-prg-c03-rm01:8088/cluster • http://aimc2rm1:8088/cluster
History server • http://bd-prg-c03-rm01:19888/jobhistory
HBase Features

• NoSQL database
• Column-oriented data store
• Open-source implementation of Google’s BigTable
• Linear and modular scalability
• Strictly consistent reads and writes
• Automatic and configurable sharding of tables
• Automatic failover support between RegionServers
• Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
• Easy-to-use Java API for client access
• Block cache and Bloom filters for real-time queries
• Query predicate push-down via server-side Filters
• Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encodings
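The BigTable data model behind HBase — rows sorted by key, each holding cells addressed by (column family, qualifier) — can be sketched in a few lines of plain Python. This is a toy illustration of the data model, not the HBase client API:

```python
class ToyTable:
    """Toy HBase-style table: sorted row keys, cells keyed by (family, qualifier)."""

    def __init__(self):
        self.rows = {}  # row_key -> {(family, qualifier): value}

    def put(self, row_key, family, qualifier, value):
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key):
        return self.rows.get(row_key, {})

    def scan(self, start, stop):
        """Range scan over the sorted row-key space, [start, stop)."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

t = ToyTable()
t.put("row1", "info", "name", "Alice")
t.put("row1", "info", "city", "Prague")
t.put("row2", "info", "name", "Bob")

print(t.get("row1")[("info", "name")])         # Alice
print([k for k, _ in t.scan("row1", "row3")])  # ['row1', 'row2']
```

Keeping rows sorted by key is what makes range scans cheap, and it is also the basis for sharding: contiguous key ranges (regions) are split automatically across RegionServers as the table grows.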
HBase • http://bd-prg-c03-nn02:60010/master-status