An introduction to Apache Crunch

Published on December 17, 2013

Author: mikejf12



A short introduction to Apache Crunch. What is it, and how does it simplify and aid the
creation of Hadoop pipelines?

Apache Crunch
● What is it?
● How does it work?
● Why use it?
● Hadoop MapReduce pipelines
● Scrunch
● Joins

Apache Crunch – Pipeline
● Crunch is based on Google's FlumeJava
● Provides a Java-based API for MapReduce (M/R) pipelines
● It uses an MST (multiple serializable type) data model
● Good for processing complex data types
● Better for “non-tuple” data types, e.g.
  – Images
  – Audio
  – Seismic data

Apache Crunch – Pipeline
● What is a MapReduce pipeline?
  – Map
  – Shuffle
  – Reduce
  – Combine
● Stages are arranged in sequence and/or in parallel
● Potentially very long chains
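
The four stages named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the MapReduce idea, not Crunch's API: the function names (`map_phase`, `combine`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and everything runs in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def combine(pairs):
    """Combine: pre-aggregate one mapper's output before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map and combine once per input split, then shuffle and reduce."""
    combined = []
    for split in splits:
        combined.extend(combine(map_phase(split)))
    return reduce_phase(shuffle(combined))


# Two hypothetical input splits, each standing in for one mapper's share.
splits = [["to be or not to be"], ["to crunch or not"]]
counts = word_count(splits)
print(counts)
```

A longer pipeline would feed `counts` into a further map stage, which is the kind of chaining the slide describes.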

Apache Crunch – Scala
● Scrunch is a Scala wrapper for Apache Crunch
● Reduced code
● Functional and OO styles
● Uses type inference for Map / Reduce
● Incorporates Java Materialize functionality
● Includes a REPL (read-eval-print loop)

Apache Crunch – Joins
● Details of joins available in Crunch
  – Inner / Outer, like SQL joins
  – Same with Left / Right / Full joins
  – MapSide join is an in-memory join
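
To make the join types above concrete, here is a minimal plain-Python sketch of their semantics. It is not Crunch code: the `join` function, its `how` parameter, and the `users` / `orders` data are hypothetical names invented for this illustration. Building a dictionary from one side and probing it with the other is, roughly, the idea behind an in-memory (map-side) join, which assumes that side fits in memory.

```python
def join(left, right, how="inner"):
    """Join two lists of (key, value) pairs with SQL-like semantics.

    how is one of "inner", "left", "right" or "full".
    """
    # Index the right side in memory, keyed by join key.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)

    rows = []
    matched = set()
    for key, left_value in left:
        if key in index:
            matched.add(key)
            rows.extend((key, (left_value, rv)) for rv in index[key])
        elif how in ("left", "full"):
            # Keep unmatched left rows, padding the right side with None.
            rows.append((key, (left_value, None)))

    if how in ("right", "full"):
        # Keep unmatched right rows, padding the left side with None.
        for key, right_values in index.items():
            if key not in matched:
                rows.extend((key, (None, rv)) for rv in right_values)
    return rows


# Hypothetical sample data: user names and order items keyed by user id.
users = [(1, "ann"), (2, "bob"), (3, "cat")]
orders = [(1, "book"), (1, "pen"), (4, "ink")]

inner_rows = join(users, orders, "inner")
left_rows = join(users, orders, "left")
right_rows = join(users, orders, "right")
full_rows = join(users, orders, "full")
```

An inner join keeps only keys present on both sides; left, right and full joins additionally keep the unmatched rows from one or both inputs.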

Apache Crunch – Performance
● A lightweight API that runs efficiently
● Crunch is a thin veneer on top of MapReduce
● Two implementations available
  – Hadoop Writables
  – Avro
● Avro implementation much faster

Apache Crunch – API
● Data Model
  – Pipeline
  – MRPipeline
  – MemPipeline
  – PCollection
  – PTable
  – PGroupedTable
  – Source
  – Target
  – Emitter
  – PType
● Operators
  – DoFn
  – CombineFn
  – FilterFn
  – Joins
  – Cartesian
  – Sort
  – Secondary Sort
  – PObject
  – BloomFilters

Contact Us
● Feel free to contact us at – –
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems
