Making Big Data Portable - Strata 2014 Presentation


Published on February 22, 2014

Author: altiscale



The growing popularity of Hadoop has led to an increasing number of clusters worldwide, often several within the same organization. However, to leverage this computing capacity, the clusters must first be loaded with data. Frequently, this entails uploading existing client repositories to a remote cluster. Such a move can be challenging for the following reasons:

* size: the size of the data to be transferred can be very large. Typically, enterprises do not consider adopting Big Data technologies unless they are actively experiencing pain owing to their current system being unable to handle the existing volume. At that point, their data has usually grown to significant levels and, consequently, is much more difficult to manage.

* networks: if the target cluster is remote, one option is to move data via wide area networks. This presents hurdles in terms of limited available throughput, bandwidth and security. Transferring large volumes of data via this approach can be very time consuming. A special case is when the source and destination clusters are within the same data center but belong to different organizations. This scenario requires a different set of specialized skills to set up a network architecture that allows data to flow.

* lack of domain knowledge & tools: there is little understanding of the various approaches for bulk data uploads to a Hadoop cluster. In addition, widely used data transfer tools such as scp, ftp and rsync do not interface directly with HDFS, and alternatives are not available. While there are tools to facilitate cluster-to-cluster copies, using them across organizations and multiple Hadoop versions is challenging.

* security: data is particularly vulnerable during transit. Safely transporting high-volume data across organizational boundaries and networks demands a thorough understanding of security protocols and practices.
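
To give a rough sense of why wide area transfers can be so time consuming, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the function name `transfer_days`, the `efficiency` factor and the sample figures (100 TB over a 1 Gbit/s link) are illustrative assumptions, not numbers from the talk.

```python
def transfer_days(data_terabytes: float,
                  link_megabits_per_s: float,
                  efficiency: float = 0.7) -> float:
    """Estimate the days needed to push a data set over a network link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually achieved (protocol overhead, contention, and so on).
    """
    if data_terabytes < 0 or link_megabits_per_s <= 0:
        raise ValueError("size must be >= 0 and link rate must be > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes to bits
    effective_bits_per_s = link_megabits_per_s * 1e6 * efficiency
    seconds = total_bits / effective_bits_per_s
    return seconds / 86_400                            # seconds per day


if __name__ == "__main__":
    # Hypothetical scenario: 100 TB over a 1 Gbit/s WAN link.
    for eff in (1.0, 0.7, 0.3):
        days = transfer_days(100, 1000, eff)
        print(f"efficiency {eff:.0%}: about {days:.1f} days")
```

Even at the full nominal rate, the hypothetical 100 TB upload takes more than a week, which is one reason alternatives to a plain network copy are worth weighing for very large data sets.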

In this talk, we present a number of techniques and best practices for uploading large quantities of data to a remote Hadoop cluster. Our presentation is based on real world experience in transferring large amounts of data on behalf of various clients. Topics covered will include DistCp, S3, disks, Flume and Kafka.

Making Big Data Portable at O'Reilly Strata Conference ...

Making Big Data Portable A session at O'Reilly Strata Conference 2014. Charles Wimmer; ... Date Wed 12th February 2014. Where.