How Hue integrates Hadoop with Django

Published on March 6, 2014

Author: gethue

Source: slideshare.net

Description

Given their differing structures, big data systems can be difficult to query and even more difficult to explore. Hue, a Django-driven web application, integrates with these components and provides a clean, easy-to-use interface. In this discussion, we'll cover how the Hue project addressed communicating with HBase, HDFS, and various query engines, and the reasons behind these design decisions.

Django + NoSQL: How Hue Integrates with Hadoop. Abraham Elmahrek, Cloudera, March 5th, 2014. Monday, March 3, 14

What is Hue? HUE 1: desktop-like in a browser. It did its job, but it was pretty slow, leaked memory, and was not very IE-friendly; still, it was definitely advanced for its time (2009-2010).

HISTORY: HUE 2. The first flat-structure port, with Twitter Bootstrap all over the place.

HISTORY: HUE 2.5. New apps, plus an improved UX with nice new features like autocomplete and drag & drop.

HISTORY: HUE 3 ALPHA. A proposed design that didn't make it.

HISTORY: HUE 3. Transition to the new UI, with major improvements and new apps.

HISTORY: HUE 3.5+

[Figure: APPS, a wheel of Hue app names]

APPS (diagram of Hue, Hue Plugins, and the services they integrate with): YARN, JobTracker, Pig, Oozie, Cloudera Impala, HiveServer2, HDFS, Hive Metastore, HBase, Solr, ZooKeeper, Sqoop2, LDAP, and SAML.

FAST PACE: last month, 91 issues were created and 90 resolved, by the core team plus the community.

STACK. Backend: Python 2.6+ and Django 1.4.5. Frontend: jQuery, Bootstrap, Knockout.js, and Love.

HADOOP INTERFACES. REST & Thrift: many Hadoop interfaces are used, including WebHDFS, the YARN API (RM, NM, MR...), HiveServer2, Impala, HBase, Oozie, Sqoop2, ZooKeeper, ... Custom clients: Hue provides custom clients for more explicit API definitions.
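To illustrate what "custom clients for more explicit API definitions" can mean, here is a minimal sketch, assuming a generic REST helper plus an explicit wrapper on top of it. The names `RestClient`, `RestException`, `FileListingClient`, `get_json`, and `list_dir` are hypothetical, not Hue's classes; the transport is injected so the example runs without a cluster. The only real API detail relied on is the shape of a WebHDFS `LISTSTATUS` response.

```python
import json
from urllib.parse import urlencode


class RestException(Exception):
    """Hypothetical error type raised in place of raw HTTP failures."""


class RestClient:
    """Generic REST helper: builds URLs, decodes JSON, maps HTTP errors to exc_class."""

    def __init__(self, base_url, transport, exc_class=RestException):
        self.base_url = base_url.rstrip("/")
        # transport(method, url) -> (status_code, body_text); injected so the
        # sketch runs without a network (a real client might use python-requests)
        self.transport = transport
        self.exc_class = exc_class

    def get_json(self, path, **params):
        url = "%s/%s" % (self.base_url, path.lstrip("/"))
        if params:
            url += "?" + urlencode(params)
        status, body = self.transport("GET", url)
        if status >= 400:
            raise self.exc_class("GET %s returned HTTP %d" % (url, status))
        return json.loads(body)


class FileListingClient:
    """Explicit API on top of the generic helper: callers never build URLs or ops."""

    def __init__(self, rest):
        self.rest = rest

    def list_dir(self, path):
        # WebHDFS LISTSTATUS responses nest entries under FileStatuses.FileStatus
        data = self.rest.get_json(path, op="LISTSTATUS")
        return [entry["pathSuffix"] for entry in data["FileStatuses"]["FileStatus"]]
```

Views then call `list_dir("/user/hue")` instead of assembling query strings, and catch a single exception type, which is the same idea as passing `exc_class=WebHdfsException` to Hue's HTTP client.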

PROTOCOLS

REST: use python-requests and a custom client to streamline RESTful interface calls.

    # Excerpt from a client factory; the enclosing function is not shown on the slide
    client = http_client.HttpClient(url, exc_class=WebHdfsException, logger=LOG)
    if security_enabled:
        client.set_kerberos_auth()
    return client

Thrift: custom connection pooling and socket multiplexing streamline Thrift calls.

    thrift_util.get_client(TCLIService.Client,
                           query_server['server_host'],
                           query_server['server_port'],
                           service_name=query_server['server_name'],
                           kerberos_principal=kerberos_principal_short_name,
                           use_sasl=use_sasl,
                           mechanism=mechanism,
                           username=user.username,
                           timeout_seconds=conf.SERVER_CONN_TIMEOUT.get(),
                           use_ssl=conf.SSL.ENABLED.get(),
                           ca_certs=conf.SSL.CACERTS.get(),
                           keyfile=conf.SSL.KEY.get(),
                           certfile=conf.SSL.CERT.get(),
                           validate=conf.SSL.VALIDATE.get())
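Connection pooling is the part of the Thrift story that is easiest to show in isolation. The following is a minimal sketch, assuming clients are opaque objects produced by a factory and keyed by `(host, port)`; `ConnectionPool`, `acquire`, `release`, and `connection` are hypothetical names, not `thrift_util`'s API, and socket multiplexing is not covered.

```python
import contextlib
import queue
import threading


class ConnectionPool:
    """Hypothetical pool that reuses idle clients per (host, port) key."""

    def __init__(self, factory, max_idle=4):
        self.factory = factory      # factory(host, port) -> new client object
        self.max_idle = max_idle
        self._pools = {}
        self._lock = threading.Lock()

    def _idle(self, key):
        with self._lock:
            if key not in self._pools:
                self._pools[key] = queue.LifoQueue(maxsize=self.max_idle)
            return self._pools[key]

    def acquire(self, host, port):
        try:
            return self._idle((host, port)).get_nowait()  # reuse an idle client
        except queue.Empty:
            return self.factory(host, port)               # none idle: open a new one

    def release(self, host, port, client):
        try:
            self._idle((host, port)).put_nowait(client)   # keep it for the next caller
        except queue.Full:
            close = getattr(client, "close", None)        # pool full: discard the extra
            if close is not None:
                close()

    @contextlib.contextmanager
    def connection(self, host, port):
        client = self.acquire(host, port)
        try:
            yield client
        except Exception:
            # Don't recycle a client whose call failed; its socket may be broken
            close = getattr(client, "close", None)
            if close is not None:
                close()
            raise
        else:
            self.release(host, port, client)
```

A LIFO queue hands back the most recently used client, which is the one least likely to have been closed by a server-side idle timeout.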

ACCESSIBILITY

Middleware: make Hadoop interfaces accessible in request objects.

    class ClusterMiddleware(object):
        def process_view(self, request, ...):
            # Attach the HDFS client for this request's cluster
            request.fs = cluster.get_hdfs(request.fs_ref)
            if request.user.is_authenticated():
                if request.fs is not None:
                    # Act on HDFS as the logged-in Hue user
                    request.fs.setuser(request.user.username)

    def download(request, path):
        # Views can then use request.fs directly
        if not request.fs.exists(path):
            raise Http404(_("File not found."))
        if not request.fs.isfile(path):
            raise PopupException(_("not a file."))
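The middleware pattern can be exercised without a Django project. Below is a minimal sketch that mirrors the shape of a `process_view` hook (Django calls it just before the view); `RequestFilesystem`, `ClusterMiddlewareSketch`, `get_fs`, and the `"default"` cluster reference are hypothetical stand-ins, not Hue's `cluster.get_hdfs`.

```python
class RequestFilesystem:
    """Stand-in for an HDFS client (hypothetical); real code would talk to WebHDFS."""

    def __init__(self, cluster_ref):
        self.cluster_ref = cluster_ref
        self.user = None

    def setuser(self, username):
        self.user = username


class ClusterMiddlewareSketch:
    """Same shape as a Django process_view hook, written without importing Django."""

    def __init__(self, get_fs):
        # get_fs(fs_ref) -> RequestFilesystem or None; returning a fresh object per
        # request keeps setuser() from leaking one user's identity into another request
        self.get_fs = get_fs

    def process_view(self, request, view_func, view_args, view_kwargs):
        request.fs = self.get_fs(getattr(request, "fs_ref", "default"))
        # Newer Django exposes is_authenticated as a property; Django 1.4 (the
        # version on the slide) calls it as a method.
        if request.user.is_authenticated and request.fs is not None:
            request.fs.setuser(request.user.username)
        return None  # returning None lets Django go on to call the view
```

Because the filesystem handle carries the user identity, sharing one handle across concurrent requests would be unsafe unless its user state is per-thread; this sketch sidesteps that by building a new handle for each request.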

HDFS. Goal: easily browse, create, read, update, and delete files in HDFS.

HDFS - Communication

REST: the NameNode provides a RESTful server called WebHDFS.

    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
    ...

Explicit client: provide an API that is explicit.

    class WebHdfs(Hdfs):
        def create(self, path, ...):
            ...
        def read(self, path, ...):
            ...

Request accessible: provide a middleware for populating a request member.

    def download(request, path):
        if not request.fs.exists(path):
            raise Http404(_("File not found."))
        if not request.fs.isfile(path):
            raise PopupException(_("not a file."))
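The URL scheme on this slide is simple enough to build by hand. Here is a minimal sketch of a URL builder, assuming plain HTTP and no Kerberos; `webhdfs_url` is a hypothetical helper, while `op`, `user.name`, `doas`, `offset`, and `length` are real WebHDFS query parameters. Port 50070 in the usage example is only an illustration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user=None, doas=None, **params):
    """Build http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>[&...] for a WebHDFS call."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute: %r" % (path,))
    query = {"op": op.upper()}
    if user is not None:
        query["user.name"] = user   # identifies the caller when Kerberos is off
    if doas is not None:
        query["doas"] = doas        # asks to act on behalf of another (proxy) user
    query.update(params)            # op-specific parameters, e.g. offset/length for OPEN
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), urlencode(query))
```

For example, `webhdfs_url("namenode", 50070, "/user/hue/data.csv", "OPEN", offset=0, length=4096)` asks for the first 4 KB of a file. The URL alone does not pick the HTTP method: per the WebHDFS spec, OPEN is a GET, while CREATE is a PUT that the NameNode answers with a redirect to a DataNode.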

HDFS - Cool Things. MIME type detection: detect the various kinds of files being read (Avro, GZIP, etc.). Pagination: nice pagination by block size when viewing a file (soon to work more like a PDF reader, with content added automatically).
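Both features can be sketched in a few lines: type detection from a file's leading "magic" bytes, and page arithmetic for reading one fixed-size chunk at a time. This is an illustrative sketch, not Hue's file viewer; `detect_type`, `page_count`, and `page_bounds` are hypothetical names, the magic-byte table is a small subset, and the text/binary fallback is a crude heuristic.

```python
MAGIC_PREFIXES = [
    (b"Obj\x01", "avro"),   # Avro object container file header
    (b"\x1f\x8b", "gzip"),  # gzip header bytes
    (b"%PDF", "pdf"),
]


def detect_type(head):
    """Guess a file's type from its first bytes (illustrative subset, not exhaustive)."""
    for prefix, kind in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return kind
    # Crude heuristic: NUL bytes rarely appear in text files
    return "binary" if b"\x00" in head else "text"


def page_count(file_size, page_size):
    """Number of pages needed to show the file; an empty file still has one page."""
    return max(1, -(-file_size // page_size))


def page_bounds(page, file_size, page_size):
    """(offset, length) of a 1-based page, e.g. with page_size set to the block size."""
    if not 1 <= page <= page_count(file_size, page_size):
        raise ValueError("page %d out of range" % page)
    offset = (page - 1) * page_size
    return offset, min(page_size, file_size - offset)
```

The `(offset, length)` pair maps directly onto a ranged read, so only the page being viewed has to be fetched.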

HBase. Goal: make it easy to view and search HBase.

HBase - Technical Risk. Two dimensions: effectively unbounded numbers of rows and columns. Sparseness: column names will often differ from row to row.

HBase - Communication

Thrift: communicate with HBase using Thrift for better filtering.

Explicit client: provide an API that is explicit.

    class HBaseApi(object):
        def createTable(self, cluster, tableName, ...):
            ...
        def getRows(self, cluster, tableName, columns, ...):
            ...

HBase - Results. Improved view: an intelligent view that collapses null cells. Better search: improved searchability of HBase via flexible search. MIME type detection: able to view documents stored in HBase (PDF, images, etc.).
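The sparseness problem and the "collapse null cells" view fit together naturally. Below is a minimal sketch, assuming rows are held as a dict from row key to a dict of `"family:qualifier"` names to values, so only cells that actually exist are stored. `all_columns` and `collapse_nulls` are hypothetical helpers for illustration, not the HBase Browser's code.

```python
def all_columns(rows):
    """Union of 'family:qualifier' names across sparse rows, sorted for display."""
    return sorted({column for cells in rows.values() for column in cells})


def collapse_nulls(rows, columns):
    """Per row: the non-null cells among `columns` and how many null cells were hidden."""
    view = []
    for row_key in sorted(rows):
        cells = rows[row_key]
        shown = [(c, cells[c]) for c in columns if cells.get(c) is not None]
        view.append((row_key, shown, len(columns) - len(shown)))
    return view
```

With `rows = {"r1": {"cf:a": "1", "cf:b": "2"}, "r2": {"cf:c": "3"}}`, a naive grid would have three columns with half its cells empty; the collapsed view shows two cells for `r1` and one for `r2`, plus a count of what was hidden.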

Hive. Goal: make it easy to run queries in Hive.

Hive - Communication

Thrift: communicate with HiveServer2 using Thrift.

Explicit client: provide a higher-level API that is explicit and easy to configure.

DBMS: extend the capabilities of the DBMS layer in Hue.

    thrift_util.get_client(TCLIService.Client,
                           query_server['server_host'],
                           query_server['server_port'],
                           service_name=query_server['server_name'],
                           ...)

    class HiveServerClient:
        # Maps Hue's auth setting to the SASL mechanism used by the Thrift transport
        HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}

        def __init__(self, query_server, user, ...):
            thrift_util.get_client(TCLIService.Client, ...)

    class HiveServer2Dbms(object):
        def get_databases(self):
            return self.client.get_databases()
        ...
        def select_star_from(self, database, table):
            hql = "SELECT * FROM `%s.%s` %s" % (database, table.name, self._get_browse_limit_clause(table))
            return self.execute_statement(hql)
        ...
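Two small pieces of the DBMS layer can be tried on their own: choosing SASL settings from the `HS2_MECHANISMS` table, and building a "browse table" query. This is a sketch under assumptions: `sasl_settings`, `quote_identifier`, and this `select_star_from` are hypothetical, the rule "use SASL unless the mechanism is NOSASL" is assumed rather than taken from Hue, and the LIMIT stands in for `_get_browse_limit_clause`. Database and table are quoted separately, escaping a backtick by doubling it as Hive's quoted identifiers allow.

```python
HS2_MECHANISMS = {"KERBEROS": "GSSAPI", "NONE": "PLAIN", "NOSASL": "NOSASL"}


def sasl_settings(hive_auth):
    """Map a HiveServer2 auth setting to (use_sasl, mechanism); the rule is assumed."""
    mechanism = HS2_MECHANISMS[hive_auth.upper()]
    return mechanism != "NOSASL", mechanism


def quote_identifier(name):
    """Backtick-quote a Hive identifier, doubling any embedded backticks."""
    return "`%s`" % name.replace("`", "``")


def select_star_from(database, table, limit=None):
    """Build the 'browse a table' query with an optional row limit."""
    hql = "SELECT * FROM %s.%s" % (quote_identifier(database), quote_identifier(table))
    if limit is not None:
        hql += " LIMIT %d" % int(limit)
    return hql
```

Quoting each part separately keeps the query valid even for names that collide with reserved words, and the limit keeps a browse click from scanning a whole table.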

Hive - Results. One-page app: an intelligent view that lets users focus on their queries. Secure: some level of security achieved through SASL, Kerberos, and SSL. Navigation: databases and tables are easy to navigate.
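The SSL options passed to `thrift_util.get_client` (`ca_certs`, `keyfile`, `certfile`, `validate`) map naturally onto a client-side TLS context. Here is a minimal sketch using Python's standard `ssl` module; `make_ssl_context` is a hypothetical helper, and treating `validate=False` as "skip certificate and hostname checks" is an assumption about what that flag means, not a description of Hue's socket code.

```python
import ssl


def make_ssl_context(validate=True, ca_certs=None, certfile=None, keyfile=None):
    """Build a client-side TLS context from SSL-style settings (hypothetical mapping)."""
    # cafile=None falls back to the system's default trusted CAs
    ctx = ssl.create_default_context(cafile=ca_certs)
    if not validate:
        # check_hostname must be switched off before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if certfile:
        # Optional client certificate, e.g. when the server requires mutual TLS
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

A socket is then wrapped with `ctx.wrap_socket(sock, server_hostname=host)` before the Thrift transport uses it. Turning validation off should be reserved for testing, since it accepts any certificate.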

DEMO TIME

Missed something? GET STARTED: take a closer look at REST and Thrift communication in Hue, the inner workings of the Filebrowser, the fundamentals of the HBase browser, and the concepts behind the Beeswax app.

What else does Hue do with Django?
- Extensible settings: configuration of settings.py is provided through hue.ini.
- Security: configurable session timeouts, SAML authentication, etc.
- Doc model: polymorphic documents via a base document model.
- Authentication: LDAP, PAM, OAuth, etc., provided through authentication backends.
- Permissions: per-app permissions configurable in the UserAdmin.
- Testing: mocked and functional tests via nose + django-nose.
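The "settings.py configured through hue.ini" idea can be sketched with the standard `configparser` module: read INI options, apply defaults, and emit Django setting names. This assumes a flattened `[desktop]` section; `time_zone` is a real hue.ini option, but `session_ttl`, the defaults, and `load_settings` are hypothetical (Hue's actual file has many more options and nested subsections that `configparser` does not parse).

```python
import configparser

# Hypothetical defaults for a flattened [desktop] section; Hue's real hue.ini
# has more options and nested subsections.
DEFAULTS = {"time_zone": "America/Los_Angeles", "session_ttl": str(60 * 60 * 24 * 14)}


def load_settings(ini_text):
    """Translate ini options into Django setting names (illustrative mapping)."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read_string(ini_text)
    if not parser.has_section("desktop"):
        parser.add_section("desktop")   # fall back entirely to defaults
    desktop = parser["desktop"]
    return {
        "TIME_ZONE": desktop.get("time_zone"),
        "SESSION_COOKIE_AGE": desktop.getint("session_ttl"),  # seconds
    }
```

`TIME_ZONE` and `SESSION_COOKIE_AGE` are standard Django settings, so a settings.py could merge the returned dict into its module globals; that is how a configurable session timeout can be exposed without users editing Python.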

GET HUE
- Cloudera’s CDH: stable and highly tested releases, perfectly integrated with the Hadoop ecosystem and automagically configured by Cloudera Manager.
- Tarball: try the latest and greatest in advance, but you'll have to configure everything on your own.
- Cloudera’s Demo VM: get to play with Hue and various Hadoop components in 5 minutes. It's a self-contained CDH environment, ready to use.
- Hortonworks*: HDP ships an old, forked version of Hue 2.3.
- MapR*: a newer version than HDP's, close to the original 2.5 minus apps like HBase, Impala, Sqoop, and Search.
- HP Cloud*: the newest addition; ships Hue 3.0 through the GreenButton products.
- Bigtop
- Embedded/demo in ind. companies
(*) Your mileage may vary.

LINKS
- Website: http://gethue.com
- GitHub: https://github.com/cloudera/hue/
- Blog: http://blog.gethue.com
- Twitter: @gethue
- User group: hue-user@

THANKS. QUESTIONS? gethue.com
