Information about Hadoop.NET

Published on July 12, 2016

Author: HakeemMohammed4

Source: slideshare.net

1. Hakeem S Mohammad @hakeemsm

2. What is Hadoop?
• A data processing framework 10 years in the making
• Can mean different things depending on the context
• Written primarily in Java
• Started as a framework for writing applications using the MapReduce pattern
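
To make the MapReduce pattern concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real job would run these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The map and reduce functions never share state, which is what lets a framework distribute them across many machines.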

3. Design
• Code is pushed to the data
• Core pieces are HDFS, a distributed file system, & YARN, the cluster resource management system
• Data must be loaded into HDFS before any processing
• Results can be written back to HDFS or to a different destination
• Client applications can be written in any language with the Streaming feature
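
As a rough illustration of the Streaming feature: Hadoop Streaming runs any executable as a mapper or reducer, feeding it lines on standard input and reading tab-separated key/value lines from standard output, with the reducer input sorted by key. The sketch below shows only that line-level contract as two Python generators; the names `mapper` and `reducer` are illustrative, and in a real job each would be wrapped in a script that iterates over `sys.stdin` and prints the results.

```python
from itertools import groupby


def mapper(lines):
    """Turn each input line into 'word<TAB>1' output lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Local stand-in for the sort that the framework performs between the two stages.
mapped = sorted(mapper(["to be or not to be"]))
counts = list(reducer(mapped))
```

Because the contract is just lines on stdin and stdout, the same two stages could be written in any language.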

4. HDFS
• Filesystem designed for storing very large files
• Write-once, read-many (WORM) access pattern
• Works by replicating data to different nodes in a cluster
• Files are split into blocks, & each block is replicated
• Composed of a namenode (master) & many datanodes (workers)
• Provides a CLI, a RESTful API & a Java API for I/O
• Not a good fit for low-latency data access, lots of small files or multiple writers
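
The block and replication idea can be sketched in a few lines. The example assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the round-robin placement is a deliberately simplified stand-in (real HDFS uses a rack-aware policy), and both function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (Hadoop 2.x)
REPLICATION = 3                 # assumed default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement; NOT the rack-aware policy real HDFS uses."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }
```

A 300 MB file, for example, becomes two full blocks plus one 44 MB block, and each block lands on three distinct datanodes. This also hints at why many small files are a poor fit: every file costs at least one block entry in the namenode regardless of its size.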

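5. HDFS Write
[Diagram: an HDFS client (Distributed file system, FSDataOutputStream), a NameNode and a pipeline of three DataNodes]
1: Create (client to Distributed file system)
2: Create (Distributed file system to NameNode)
3: Write (client to FSDataOutputStream)
4: Write packet (forwarded along the DataNode pipeline)
5: Ack packet (returned back along the pipeline)
6: Close (client to FSDataOutputStream)
7: Complete (Distributed file system to NameNode)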
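
Steps 4 and 5 of the write path (write packet, ack packet) can be mimicked with a toy in-memory pipeline. This is only a sketch of the idea: `ToyDataNode`, `build_pipeline` and `write_file` are hypothetical names, nothing here talks to a real cluster, and the tiny `packet_size` is chosen purely for readability.

```python
class ToyDataNode:
    """Hypothetical in-memory stand-in for one datanode in a write pipeline."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.packets = []

    def write_packet(self, packet):
        """Store the packet, forward it downstream, then return the acks."""
        self.packets.append(packet)
        acks = self.downstream.write_packet(packet) if self.downstream else []
        return acks + [self.name]  # acks travel back up the pipeline


def build_pipeline(names):
    """Chain datanodes so that the first name is the head of the pipeline."""
    node = None
    for name in reversed(names):
        node = ToyDataNode(name, downstream=node)
    return node


def write_file(data, pipeline_head, packet_size=4):
    """Send data packet by packet and count the packets that were acknowledged."""
    acked = 0
    for start in range(0, len(data), packet_size):
        if pipeline_head.write_packet(data[start:start + packet_size]):
            acked += 1
    return acked
```

The client only ever talks to the head of the pipeline; each datanode is responsible for passing the packet on and relaying the acknowledgements back.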

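6. HDFS Read
[Diagram: an HDFS client (Distributed file system, FSDataInputStream), a NameNode and three DataNodes]
1: Open (client to Distributed file system)
2: Get block locations (Distributed file system to NameNode)
3: Read (client to FSDataInputStream)
4: Read (FSDataInputStream to the first DataNode)
5: Read (FSDataInputStream to the next DataNode)
6: Close (client to FSDataInputStream)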
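
The read path can be sketched the same way: look up the block locations for a path, then fetch each block in order from a datanode that holds it. The dictionaries standing in for the namenode and datanodes, and the names `read_block` and `read_file`, are assumptions made for this example; the fallback to the next replica holder is a simplification of what a real client does when a datanode is unavailable.

```python
def read_block(block_id, holders, datanodes):
    """Try each replica holder in order and return the first copy found."""
    for name in holders:
        stored = datanodes.get(name, {})
        if block_id in stored:
            return stored[block_id]
    raise IOError(f"no reachable replica for block {block_id}")


def read_file(path, namenode, datanodes):
    """Look up block locations (step 2), then read the blocks in order (steps 3-5)."""
    block_locations = namenode[path]
    return b"".join(
        read_block(block_id, holders, datanodes)
        for block_id, holders in block_locations
    )


# Toy cluster state: the namenode maps a path to (block id, replica holders).
namenode = {"/logs/a.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn3", "dn1"])]}
datanodes = {
    "dn1": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn2": {"blk_1": b"hello "},
}
content = read_file("/logs/a.txt", namenode, datanodes)
```

Note that the namenode hands out only metadata; the file bytes flow directly between the client and the datanodes.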

7. YARN
• A resource manager manages the use of resources across the cluster
• Node managers (NMs) run on every node in the cluster
• Code is shipped to containers, which run inside the NMs
• Resource Manager & Namenode can be co-located
• Node managers run on DataNodes

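8. YARN Application
[Diagram: a client node, a Resource Manager node and two node manager nodes, each running a NodeManager with an application process in a container]
1: Submit YARN application (client to Resource Manager)
2a: Start container (Resource Manager to a NodeManager)
2b: Launch (NodeManager launches the application process in a container)
3: Allocate resources (application process to Resource Manager)
4a: Start container (application process to a second NodeManager)
4b: Launch (second NodeManager launches its application process in a container)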
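
The submission flow can be approximated with a toy scheduler that tracks free memory per node manager. This is a sketch under loud assumptions: `ToyResourceManager` and `submit_application` are hypothetical names, memory is the only resource modelled, and the first-fit choice of node does not reflect any actual YARN scheduler.

```python
class ToyResourceManager:
    """Hypothetical resource manager tracking free memory per node manager."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory in MB

    def allocate(self, memory_mb):
        """Return the first node that can host the container, or None."""
        for node, free in self.free.items():
            if free >= memory_mb:
                self.free[node] = free - memory_mb
                return node
        return None


def submit_application(rm, master_memory_mb, task_memory_mb, num_tasks):
    """Steps 1-2: start the first application process; steps 3-4: start task containers."""
    master_node = rm.allocate(master_memory_mb)
    if master_node is None:
        raise RuntimeError("no node can host the first application process")
    task_nodes = [rm.allocate(task_memory_mb) for _ in range(num_tasks)]
    return master_node, task_nodes
```

A request that no node can satisfy comes back as `None` here; a real cluster would queue such a request until resources free up.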

9. Hadoop Demo
• HDFS & YARN

10. Hadoop.NET
• C# is cool & powerful
• With .NET Core, cross-platform (xplat) support is a real possibility
• Enterprises that are primarily .NET shops can adopt it immediately

11. Roadmap
• Core: IO, HDFS, YARN
• Custom datatypes
• Implement an orchestrator (ZooKeeper)
• Extensions for Azure, AWS, OpenStack et al.

12. NEED YOUR HELP!

13. Metadata
{
  "name": "Hakeem S Mohammad",
  "twitter": "@hakeemsm",
  "location": "Atlanta",
  "interests": "Cloud, []data",
  "blog": "http://code-cafe.blogspot.com/",
  "github": "https://github.com/hakeemsm",
  "email": "hakeemosrc@gmail.com"
}
