Plugging the Holes: Security and Compatibility in Hadoop

Published on October 9, 2009

Author: oom65



My presentation from Hadoop World in NYC on 2 October 2009. I covered the plans for adding security into Hadoop in 0.22.

Plugging the Holes: Security and Compatibility
Owen O’Malley
Yahoo! Hadoop Team

Who Am I?
•  Software Architect working on Hadoop since Jan 2006
   –  Before Hadoop, worked on Yahoo Search’s WebMap
   –  My first patch on Hadoop was Nutch-197
   –  First Yahoo Hadoop committer
   –  Most prolific contributor to Hadoop (by patch count)
   –  Won the 2008 1TB and 2009 Minute and 100TB Sort Benchmarks
•  Apache VP of Hadoop
   –  Chair of the Hadoop Project Management Committee
   –  Quarterly reports on the state of Hadoop for Apache Board
Hadoop World NYC - 2009

What are the Problems?
•  Our shared clusters increase:
   –  Developer and operations productivity
   –  Hardware utilization
   –  Access to data
•  Yahoo! wants to put customer and financial data on our Hadoop clusters.
   –  Great for providing access to all of the parts of Yahoo!
   –  Need to make sure that only the authorized people have access.
•  Rolling out new versions of Hadoop is painful
   –  Clients need to change and recompile their code

Hadoop Security
•  Currently, the Hadoop servers trust the users to declare who they are.
   –  It is very easy to spoof, especially with open source.
   –  For private clusters, we will leave non-security as an option
•  We need to ensure that users are who they claim to be.
•  All access to HDFS (and therefore MapReduce) must be authenticated.
•  The standard distributed authentication service is Kerberos (including ActiveDirectory).
•  User code isn’t affected, since the security happens in the RPC layer.

HDFS Security
•  Hadoop security is grounded in HDFS security.
   –  Other services such as MapReduce store their state in HDFS.
•  Use of Kerberos allows a single sign-on where the Hadoop commands pick up and use the user’s tickets.
•  The framework authenticates the user to the Name Node using Kerberos before any operations.
•  The Name Node is also authenticated to the user.
•  Client can request an HDFS Access Token to get access later without going through Kerberos again.
   –  Prevents authorization storms as MapReduce jobs launch!

Accessing a File
•  User uses Kerberos (or an HDFS Access Token) to authenticate to the Name Node.
•  They request to open a file X.
•  If they have permission to file X, the Name Node returns a token for reading the blocks of X.
•  The user uses these tokens when communicating with the Data Nodes to show they have access.
•  There are also tokens for writing blocks when the file is being created.
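
The token flow above can be sketched with plain JDK classes. The slides do not say how block tokens are built, so this sketch assumes the Name Node and Data Nodes share a secret and that a token is an HMAC over the user, block id, and access mode. The class and method names (`BlockTokenSketch`, `issue`, `verify`) are hypothetical and are not Hadoop's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: hypothetical names, not Hadoop's classes. */
final class BlockTokenSketch {
    private final SecretKeySpec key;

    /** Assumption: Name Node and Data Nodes are configured with the same secret. */
    BlockTokenSketch(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA256");
    }

    /** Name Node side: called only after the permission check on the file passes. */
    byte[] issue(String user, long blockId, String mode) {
        return hmac(user + "|" + blockId + "|" + mode);
    }

    /** Data Node side: recompute the HMAC and compare in constant time. */
    boolean verify(String user, long blockId, String mode, byte[] token) {
        return MessageDigest.isEqual(hmac(user + "|" + blockId + "|" + mode), token);
    }

    private byte[] hmac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    public static void main(String[] args) {
        BlockTokenSketch shared =
            new BlockTokenSketch("demo-secret".getBytes(StandardCharsets.UTF_8));
        byte[] readToken = shared.issue("alice", 42L, "READ");
        System.out.println("read accepted:  " + shared.verify("alice", 42L, "READ", readToken));
        System.out.println("write accepted: " + shared.verify("alice", 42L, "WRITE", readToken));
    }
}
```

Because the Data Node can check the token on its own, it does not need to call back to the Name Node on every block access. A real design would also need an expiry time inside the signed payload, which this sketch omits.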

MapReduce Security
•  Framework authenticates user to Job Tracker before they can submit, modify, or kill jobs.
•  The Job Tracker authenticates itself to the user.
•  Job’s logs (including stdout) are only visible to the user.
•  Map and Reduce tasks actually run as the user.
•  Tasks’ working directories are protected from others.
•  The Job Tracker’s system directory is no longer readable and writable by everyone.
•  Only the reduce tasks can get the map outputs.

Interactions with HDFS
•  MapReduce jobs need to read and write HDFS files as the user.
•  Currently, we store the user name in the job.
•  With security enabled, we will store HDFS Access Tokens in the job.
•  The job needs a token for each HDFS cluster.
•  The tokens will be renewed by the Job Tracker so they don’t expire for long-running jobs.
•  When the job completes, the tokens will be cancelled.
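
The renew-then-cancel lifecycle described above can be modelled in a few lines. This is a minimal sketch, assuming a fixed token lifetime that is extended on each renewal; the class `RenewableTokenSketch` and its methods are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
/** Illustrative model only: hypothetical names, not Hadoop's token classes. */
final class RenewableTokenSketch {
    private final long lifetimeMillis;
    private long expiresAt;
    private boolean cancelled;

    RenewableTokenSketch(long issuedAt, long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
        this.expiresAt = issuedAt + lifetimeMillis;
    }

    /** A token is usable only if it was never cancelled and has not expired. */
    boolean isValid(long now) {
        return !cancelled && now < expiresAt;
    }

    /** The Job Tracker would call this periodically while the job runs. */
    boolean renew(long now) {
        if (!isValid(now)) {
            return false; // expired or cancelled tokens cannot be revived
        }
        expiresAt = now + lifetimeMillis;
        return true;
    }

    /** The Job Tracker would call this when the job completes. */
    void cancel() {
        cancelled = true;
    }

    public static void main(String[] args) {
        RenewableTokenSketch token = new RenewableTokenSketch(0L, 100L);
        System.out.println("renewed at t=90:  " + token.renew(90L));
        System.out.println("valid at t=150:   " + token.isValid(150L));
        token.cancel();
        System.out.println("valid after cancel: " + token.isValid(150L));
    }
}
```

Cancelling at job completion limits how long a leaked token stays useful, which is the point of the last bullet.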

Interactions with Higher Layers
•  Yahoo uses a workflow manager named Oozie to submit MapReduce jobs on behalf of the user.
•  We could store the user’s credentials with a modifier (oom/oozie) in Oozie to access Hadoop as the user.
•  Or we could create token-granting tokens for HDFS and MapReduce and store those in Oozie.
•  In either case, such proxies are a potential source of security problems, since they are storing a large number of users’ access credentials.

Web UIs
•  Hadoop and especially MapReduce make heavy use of the Web UIs.
•  These need to be authenticated also…
•  Fortunately, there is a standard solution for Kerberos and HTTP, named SPNEGO.
•  SPNEGO is supported by all of the major browsers.
•  All of the servlets will use SPNEGO to authenticate the user and enforce permissions appropriately.
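
On the wire, SPNEGO over HTTP uses the `Negotiate` scheme: the server answers an unauthenticated request with `401` and `WWW-Authenticate: Negotiate`, and the browser retries with `Authorization: Negotiate <base64 token>`. The sketch below shows only the header-parsing step a servlet filter would need. The class name `NegotiateHeaderSketch` is hypothetical, and validating the extracted token against Kerberos is deliberately left out.

```java
import java.util.Base64;
import java.util.Optional;

/** Illustrative only: hypothetical helper, not part of Hadoop or the servlet API. */
final class NegotiateHeaderSketch {
    /** Value a server would send in WWW-Authenticate alongside a 401 response. */
    static final String CHALLENGE = "Negotiate";

    /**
     * Returns the raw SPNEGO token if the Authorization header carries one.
     * An empty result means the caller should reply with 401 and the challenge.
     */
    static Optional<byte[]> extractToken(String authorizationHeader) {
        if (authorizationHeader == null) {
            return Optional.empty();
        }
        String prefix = "Negotiate ";
        // Scheme names are case-insensitive, so compare ignoring case.
        if (!authorizationHeader.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return Optional.empty();
        }
        String encoded = authorizationHeader.substring(prefix.length()).trim();
        if (encoded.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Base64.getDecoder().decode(encoded));
        } catch (IllegalArgumentException malformed) {
            return Optional.empty(); // not valid base64: treat as unauthenticated
        }
    }

    public static void main(String[] args) {
        String header = "Negotiate " + Base64.getEncoder().encodeToString(new byte[] {1, 2, 3});
        System.out.println("token present: " + extractToken(header).isPresent());
        System.out.println("basic auth accepted: " + extractToken("Basic abc").isPresent());
    }
}
```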

Remaining Security Issues
•  We are not encrypting on the wire.
   –  It will be possible within the framework, but not in 0.22.
•  We are not encrypting on disk.
   –  For either HDFS or MapReduce.
•  Encryption is expensive in terms of CPU and IO speed.
•  Our current threat model is that the attacker has access to a user account, but not root.
   –  They can’t sniff the packets on the network.

Backwards Compatibility
•  API
•  Protocols
•  File Formats
•  Configuration

API Compatibility
•  Need to mark APIs with
   –  Audience: Public, Limited Private, Private
   –  Stability: Stable, Evolving, Unstable
      @InterfaceAudience.Public
      @InterfaceStability.Stable
      public class Xxxx {…}
   –  Developers need to ensure that 0.22 is backwards compatible with 0.21
•  Defined new APIs designed to be future-proof:
   –  MapReduce – Context objects in org.apache.hadoop.mapreduce
   –  HDFS – FileContext in org.apache.hadoop.fs
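
One way such markers can be put to work is a reflection check that tells tooling which classes carry a compatibility promise. The sketch below declares local stand-ins for the two annotation families so it compiles without Hadoop on the classpath. Giving them runtime retention is an assumption made so reflection can see them, and `ApiContractCheck` and `InternalHelper` are hypothetical names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Hypothetical checker: reports whether a class promises a public, stable API. */
final class ApiContractCheck {
    static boolean isPublicStable(Class<?> type) {
        return type.isAnnotationPresent(InterfaceAudience.Public.class)
            && type.isAnnotationPresent(InterfaceStability.Stable.class);
    }

    public static void main(String[] args) {
        System.out.println("Xxxx is public+stable:           " + isPublicStable(Xxxx.class));
        System.out.println("InternalHelper is public+stable: " + isPublicStable(InternalHelper.class));
    }
}

/** Local stand-in for the audience markers named on the slide. */
final class InterfaceAudience {
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface LimitedPrivate {}
    @Retention(RetentionPolicy.RUNTIME) @interface Private {}
}

/** Local stand-in for the stability markers named on the slide. */
final class InterfaceStability {
    @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}
}

/** The slide's example: safe for users to depend on across releases. */
@InterfaceAudience.Public
@InterfaceStability.Stable
class Xxxx {}

/** Counter-example: internal code that may change at any time. */
@InterfaceAudience.Private
@InterfaceStability.Unstable
class InternalHelper {}
```

A build step could run a check like this over every class touched by a patch and flag changes to anything marked public and stable.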

Protocol Compatibility
•  Currently all clients of a server must be the same version (0.18, 0.19, 0.20, 0.21).
•  Want to enable forward and backward compatibility
•  Started work on Avro
   –  Includes the schema of the information as well as the data
   –  Can support different schemas on the client and server
   –  Still need to make the code tolerant of version differences
   –  Avro provides the mechanisms
•  Avro will be used for file version compatibility too
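
The idea of a client and server holding different schemas can be illustrated without the Avro library. The sketch below is not Avro's API. It models a record as a map and applies two resolution rules: fields the reader does not know are dropped, and fields the writer did not send take the reader's default. `SchemaResolutionSketch` and the field names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a toy model of reader/writer schema resolution, not Avro. */
final class SchemaResolutionSketch {
    /**
     * @param written        the record as the (possibly older or newer) peer wrote it
     * @param readerDefaults the reader's fields, each mapped to its default value;
     *                       this sketch assumes every reader field has a default
     * @return the record as the reader sees it
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            // Use the peer's value when present, otherwise fall back to the default.
            resolved.put(name, written.containsKey(name) ? written.get(name) : field.getValue());
        }
        return resolved; // writer-only fields were never copied, so they are ignored
    }

    public static void main(String[] args) {
        Map<String, Object> fromOldClient = new HashMap<>();
        fromOldClient.put("path", "/data/x");
        fromOldClient.put("legacyFlag", true); // unknown to the newer server

        Map<String, Object> serverSchema = new LinkedHashMap<>();
        serverSchema.put("path", "");
        serverSchema.put("replication", 3); // added after the client was built

        System.out.println(resolve(fromOldClient, serverSchema));
    }
}
```

As the slide notes, resolution only gets the data across. The code on each side still has to behave sensibly when it receives a default instead of a real value.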

Configuration
•  Configuration in Hadoop is a string to string map.
•  Maintaining backwards compatibility of configuration knobs was done case by case.
•  Now we have standard infrastructure for declaring old knobs deprecated.
•  Also have cleaned up a lot of the names in 0.21.
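
A string-to-string map with declared deprecations might look like the following. This is a minimal sketch of the idea rather than Hadoop's `Configuration` class: `DeprecatingConfigSketch` and its methods are hypothetical, and the key names in `main` are used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not Hadoop's Configuration API. */
final class DeprecatingConfigSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, String> renamed = new HashMap<>(); // old key -> new key

    /** Declare once, centrally, that oldKey has been replaced by newKey. */
    void deprecate(String oldKey, String newKey) {
        renamed.put(oldKey, newKey);
    }

    /** Values are always stored under the new name, whichever name the caller used. */
    void set(String key, String value) {
        values.put(canonical(key), value);
    }

    /** Reads through either name return the same value. */
    String get(String key) {
        return values.get(canonical(key));
    }

    private String canonical(String key) {
        String replacement = renamed.get(key);
        if (replacement == null) {
            return key;
        }
        System.err.println("WARN: " + key + " is deprecated; use " + replacement);
        return replacement;
    }

    public static void main(String[] args) {
        DeprecatingConfigSketch conf = new DeprecatingConfigSketch();
        conf.deprecate("mapred.reduce.tasks", "mapreduce.job.reduces");
        conf.set("mapred.reduce.tasks", "7"); // an old job file still works
        System.out.println(conf.get("mapreduce.job.reduces"));
    }
}
```

Keeping the rename table in one place replaces the case-by-case handling the slide mentions, and the warning tells users which knob to switch to.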

Questions?
•  Thanks for coming!
•  Mailing lists: – – –
•  Slides posted on the Hadoop wiki page –
